CN118013030A

CN118013030A - Method, device and electronic device for generating business process rule file

Info

Publication number: CN118013030A
Application number: CN202211398836.2A
Authority: CN
Inventors: 常晓花; 吕品; 王思琪; 姜飞宇; 张译戈; 陆璐
Original assignee: China Mobile Communications Group Co Ltd; Research Institute of China Mobile Communication Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; Research Institute of China Mobile Communication Co Ltd
Priority date: 2022-11-09
Filing date: 2022-11-09
Publication date: 2024-05-10

Abstract

The present invention provides a method, device and electronic device for generating a business process rule file, wherein the method comprises: obtaining a first sample feature field and a business label of a process characteristic analysis software package PCAP sample package; performing clustering processing on the first sample feature field to obtain a second sample feature field; the second sample feature field is used to characterize a common feature; analyzing the association between the business label and the second sample feature field to obtain an association rule; and generating a business process rule file according to the association rule. The generation method of the present invention can achieve high generation efficiency and can also ensure high accuracy of the business rule file.

Description

Method, device and electronic device for generating business process rule file

Technical Field

The embodiments of the present invention relate to the technical field of networks and data mining, and in particular to a method, device and electronic device for generating a business process rule file.

Background

With the continuing evolution of 5G technology, the number of new 4G/5G service types keeps growing. The volume of service data carried over HTTP/HTTPS, TCP/UDP, WEB3.0 and the like is huge, and the service data is complex.

In the prior art, business flow rules are obtained as follows:

The process relies on manual dial testing and packet capture. A mobile phone is connected to a computer for dial testing, and Wireshark is used to capture the network card traffic so as to obtain the traffic of the application data. The data packets are then inspected by eye to determine whether they contain feature data that can characterize the business flow, and those features are extracted and organized into the rules of that business flow.

Generating business rule files by manually obtaining business flow rules in this way is inefficient, and the generated business rule documents are prone to errors.

Summary of the Invention

The embodiments of the present invention provide a method, device and electronic device for generating a business process rule file, so as to solve the problems that the existing method of generating business rule files by manually obtaining business flow rules is inefficient and that the generated business rule documents are prone to errors.

To solve the above technical problems, the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a method for generating a business process rule file, comprising:

obtaining a first sample feature field and a business label of a process characteristic analysis software package (PCAP) sample package;

performing clustering processing on the first sample feature field to obtain a second sample feature field, the second sample feature field being used to characterize a common feature;

analyzing the association between the business label and the second sample feature field to obtain an association rule;

generating a business process rule file according to the association rule.

Optionally,

performing clustering processing on the first sample feature field includes:

pre-classifying the first sample feature field using a clustering algorithm;

applying a longest common substring algorithm to the pre-classified first sample feature fields to obtain longest common substring values in one-to-one correspondence with the pre-classified first sample feature fields;

updating the field values of the pre-classified first sample feature fields with the corresponding longest common substring values to obtain the second sample feature field.

Optionally,

pre-classifying the first sample feature field using a clustering algorithm includes:

assembling the first sample feature fields into a first sample set and sending the first sample set to an interactive terminal associated with a user, the first sample set being used by the user to remove the first sample feature fields that are not strongly correlated, so as to obtain a second sample set containing only the strongly correlated first sample feature fields;

receiving the second sample set sent by the interactive terminal;

pre-classifying the first sample feature fields in the second sample set using a clustering algorithm.

Optionally,

before pre-classifying the first sample feature fields in the second sample set using a clustering algorithm, the method includes:

performing format detection on the first sample feature fields in the second sample set to obtain a first detection result;

if the first detection result is that the first sample feature field is in character format, converting the first sample feature fields in the second sample set into numeric format using the one-hot encoding method.

Optionally,

before pre-classifying the first sample feature field using a clustering algorithm, the method includes:

performing format detection on the first sample feature field to obtain a second detection result;

if the second detection result is that the first sample feature field is in character format, converting the first sample feature field into numeric format using the one-hot method.

Optionally,

analyzing the association between the business label and the second sample feature field includes:

acquiring a preset training set, the preset training set including training business labels and training sample feature fields;

training an association analysis model with the training business labels and the training sample feature fields to obtain a target association analysis model;

analyzing the association between the business label and the second sample feature field using the target association analysis model.

Optionally,

after generating the business process rule file according to the association rule, the method includes:

loading the business process rule file into a rule base and updating the rule base information.

Optionally,

after updating the rule base information, the method includes:

receiving a request instruction sent by a configuration library;

returning the rule base information according to the request instruction. The rule base information is used by the configuration library to determine whether the business process rule files it stores are up to date. If they are not up to date and the number of business process rule files that differ from the rule base exceeds a preset difference threshold, the configuration library sends a full synchronization instruction; if they are not up to date and the number of differing business process rule files does not exceed the preset difference threshold, the configuration library sends an incremental synchronization instruction.
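The choice between full and incremental synchronization can be illustrated with a minimal Python sketch. The text does not specify how the configuration library counts differing files, so the sketch assumes that each side can be summarized as a mapping from rule file name to a digest string; the function name `choose_sync_mode` and the file names are hypothetical.

```python
def choose_sync_mode(local_files, rule_base_files, diff_threshold):
    """Decide how the configuration library should synchronize.

    Both mappings go from rule file name to a digest string (an
    assumption of this sketch). A file counts as differing when it is
    missing on one side or its digest does not match.
    """
    differing = sorted(
        name
        for name in set(local_files) | set(rule_base_files)
        if local_files.get(name) != rule_base_files.get(name)
    )
    if not differing:
        return "none", []            # already up to date
    if len(differing) > diff_threshold:
        return "full", differing     # too many differences: resend everything
    return "incremental", differing  # only the differing files are needed


# Hypothetical example data.
rule_base = {"a.rule": "d1", "b.rule": "d2", "c.rule": "d3"}
local = {"a.rule": "d1", "b.rule": "old"}
mode, files = choose_sync_mode(local, rule_base, diff_threshold=5)
```

With a threshold of 5 the two differing files lead to an incremental synchronization; lowering the threshold to 1 would switch the same input to a full synchronization.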

Optionally,

after returning the rule base information according to the request instruction, the method includes:

receiving the full synchronization instruction sent by the configuration library;

sending all business process rule files in the rule base to the configuration library, so as to update all business process rule files stored in the configuration library.

Optionally,

receiving the incremental synchronization instruction sent by the configuration library;

sending business process rule files to the configuration library according to the incremental synchronization instruction, so as to update the business process rule files stored in the configuration library that differ from the rule base. The incremental synchronization instruction indicates the business process rule files that differ between the configuration library and the rule base.

Optionally,

the request instruction includes a first message digest algorithm;

returning the rule base information according to the request instruction includes:

calculating a digest value of the rule base using the first message digest algorithm, and returning the digest value to the configuration library.

Optionally,

before calculating the digest value of the rule base using the first message digest algorithm, the method includes:

verifying whether the first message digest algorithm belongs to a preset specified algorithm set, to obtain a verification result;

if the verification result is that the first message digest algorithm does not belong to the specified algorithm set, determining one algorithm in the specified algorithm set as a target message digest algorithm, calculating the digest value using the target message digest algorithm, and returning the digest value and the target message digest algorithm to the configuration library, the target message digest algorithm being used by the configuration library to update the first message digest algorithm that it stores.
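The verification and fallback step can be sketched in Python with the standard `hashlib` module. The text does not name the algorithms in the specified set, so the contents of `ALLOWED_DIGESTS`, the choice of its first entry as the fallback, and the function name `rule_base_digest` are all assumptions made for illustration.

```python
import hashlib

# Hypothetical preset algorithm set; the source does not list its members.
ALLOWED_DIGESTS = ("sha256", "sha512")


def rule_base_digest(rule_base_bytes, requested_algorithm):
    """Return (algorithm actually used, hex digest of the rule base).

    If the requested algorithm is not in the preset set, fall back to
    one from the set so the caller can update its stored algorithm.
    """
    if requested_algorithm in ALLOWED_DIGESTS:
        algorithm = requested_algorithm
    else:
        algorithm = ALLOWED_DIGESTS[0]  # assumed fallback choice
    digest = hashlib.new(algorithm, rule_base_bytes).hexdigest()
    return algorithm, digest


used, value = rule_base_digest(b"[SoSoMap]\n#Rule_ID:17463\n", "md5")
```

Here the request asks for `md5`, which is outside the assumed set, so the response carries `sha256` together with the digest value.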

Optionally,

the request instruction is carried in a first request message, and the first request message includes:

a message digest indication field, used to indicate whether to generate a message digest and/or the algorithm used to generate the message digest;

the rule base information is carried in a first response message, and the first response message includes:

a response result indication field, used to indicate whether the response succeeded;

a rule base version information field, used to indicate the current rule base version information;

a digest algorithm indication field, used to indicate the message digest algorithm;

a digest value indication field, used to indicate the digest value that corresponds to the rule base version information and is calculated by the message digest algorithm.
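The fields of the first request and response messages can be modelled as plain records. The following Python sketch is only an illustration: the class names, attribute names and types are hypothetical, the wire encoding is not described in this section, and comparing both the version and the digest value is an assumed way of checking freshness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionQueryRequest:
    generate_digest: bool                    # whether a digest is requested
    digest_algorithm: Optional[str] = None   # algorithm to use, if any


@dataclass
class VersionQueryResponse:
    success: bool            # response result indication
    rule_base_version: str   # current rule base version information
    digest_algorithm: str    # digest algorithm actually used
    digest_value: str        # digest corresponding to that version


def is_up_to_date(response, local_version, local_digest):
    """Assumed freshness check performed by the configuration library."""
    return (
        response.success
        and response.rule_base_version == local_version
        and response.digest_value == local_digest
    )


request = VersionQueryRequest(generate_digest=True, digest_algorithm="sha256")
response = VersionQueryResponse(True, "v42", "sha256", "abc123")
```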

Optionally,

the full synchronization instruction is carried in a second request message, and the second request message includes:

a rule base synchronization mode indication field, used to indicate whether the synchronization mode is full data synchronization or incremental data synchronization;

a rule base information indication field, used to indicate that the entire rule base content is carried when the synchronization mode is full data synchronization, or that the differing rule base content is carried when the synchronization mode is incremental data synchronization;

all business process rule files in the rule base are carried in a second response message, and the second response message includes:

a response result indication field, used to indicate whether the response succeeded.

Optionally,

the incremental synchronization instruction is carried in a second request message, and the second request message includes:

a rule base synchronization mode indication field, used to indicate whether the synchronization mode is full data synchronization or incremental data synchronization;

the business process rule files sent according to the incremental synchronization instruction are carried in a second response message, and the second response message includes:

In a second aspect, an embodiment of the present invention provides a device for generating a business process rule file, including:

an acquisition module, used to obtain a first sample feature field and a business label of a process characteristic analysis software package (PCAP) sample package;

a clustering module, used to perform clustering processing on the first sample feature field to obtain a second sample feature field, the second sample feature field being used to characterize a common feature;

an analysis module, used to analyze the association between the business label and the second sample feature field to obtain an association rule;

a generation module, used to generate a business process rule file according to the association rule.

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method for generating a business process rule file according to any one of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the method for generating a business process rule file according to any one of the first aspect.

The embodiments of the present invention avoid the inefficient process of manually inspecting data packets by eye to determine whether they contain feature data that can characterize the business flow, which improves the efficiency of generating business process rule files. Because inefficient manual labor is avoided, the embodiments can generate more rule files in the same amount of time and cover more software packages, which broadens coverage and increases the number of samples. This prevents extreme cases caused by narrow coverage and small samples from interfering with the association rules, and further improves the accuracy of the business rule files. In addition, the embodiments obtain the first sample feature field and the business label of the process characteristic analysis software package (PCAP) sample package; cluster the first sample feature field to obtain the second sample feature field, which is used to characterize a common feature; analyze the association between the business label and the second sample feature field to obtain the association rule; and generate the business process rule file according to the association rule, which improves the accuracy of the business process rule file.

Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the detailed description of the preferred embodiments below. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

FIG. 1 is a schematic flow chart of a method for generating a business process rule file according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of an application of the method for generating a business process rule file according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the interaction flow between the rule base configuration device and the user plane DPI system;

FIG. 4 is a schematic diagram of the execution flow of the rule base configuration device;

FIG. 5 is a schematic diagram of the newly added message types;

FIG. 6 is a schematic diagram of the rule base version information query request message body;

FIG. 7 is a schematic diagram of the rule base version information query response message body;

FIG. 8 is a schematic diagram of the rule base configuration request message body;

FIG. 9 is a schematic diagram of the rule base configuration response message body;

FIG. 10 is a functional block diagram of a device for generating a business process rule file according to an embodiment of the present invention;

FIG. 11 is a functional block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

As 4G/5G technology evolves, the volume of services has increased sharply. Generating business flow rules from manually captured packets is inefficient, has narrow coverage and relies on small samples. In addition, the generated business rule documents are prone to errors and their versions are difficult to maintain.

Obtaining business rules manually has the following disadvantages:

1) The acquisition cycle is too long: generating the business rules for a single application takes 1-2 weeks.

2) Manual efficiency is low: obtaining the business rules of each application relies on manual packet capture and visual analysis.

3) The coverage of service types is narrow: because rule discovery is manual, only a few applications can be analyzed in a short period.

4) Rule versions are inconvenient to maintain and error-prone: rule files that have been analyzed and organized manually are prone to format errors, and the format cannot be verified.

An embodiment of the present invention provides a method for generating a business process rule file. Referring to FIG. 1, which is a schematic flow chart of the method according to an embodiment of the present invention, the method includes:

Step 11: obtaining a first sample feature field and a business label of a process characteristic analysis software package (PCAP) sample package;

Step 12: performing clustering processing on the first sample feature field to obtain a second sample feature field, the second sample feature field being used to characterize a common feature;

Step 13: analyzing the association between the business label and the second sample feature field to obtain an association rule;

Step 14: generating a business process rule file according to the association rule.

As an example, referring to FIG. 2, which is a schematic flow chart of an application of the method for generating a business process rule file according to an embodiment of the present invention, the following steps are included:

1) Data parsing and feature extraction:

A1. To facilitate subsequent algorithmic analysis and improve the accuracy of the flow rules, traffic parsing is first performed on the application PCAP packets obtained by dial testing. Traffic parsing includes IP parsing, TCP/UDP parsing and HTTP/HTTPS parsing. Specifically, the pcap4j framework can be used to parse the application traffic layer by layer and obtain the feature fields of each protocol layer, which correspond to the first sample feature field in the embodiments of the present invention:

IP layer: source IP, destination IP, number of IP fragments, packet length;

TCP/UDP layer: source port, destination port, window size, upstream and downstream traffic, number of out-of-order packets, number of retransmitted packets, payload length, etc.;

HTTP/HTTPS: HOST, URI, User-Agent, Cookie, ServerName, IdAtCommonName, etc.

B1. Features strongly related to the business are extracted from the parsed feature fields, and irrelevant data is removed so that it does not affect subsequent results.

2) Automatic rule analysis:

A2. Feature encoding: one-hot encoding is used to map the values of the feature fields from character format to numeric format, which facilitates the subsequent cluster analysis.
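As a minimal illustration of the one-hot mapping described in A2, the following Python sketch encodes the values of a single character-format field. The helper name `one_hot_encode` and the sample host `map.example.com` are hypothetical; a real pipeline would typically encode every character-format field and concatenate the resulting vectors.

```python
def one_hot_encode(values):
    """Map each distinct string value to a one-hot vector.

    Returns the sorted list of categories and one vector per input
    value, with a single 1 at the index of that value's category.
    """
    categories = sorted(set(values))
    index = {value: i for i, value in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


# Hypothetical ServerName values taken from three parsed flows.
hosts = ["s.hongyibo.com.cn", "map.example.com", "s.hongyibo.com.cn"]
categories, vectors = one_hot_encode(hosts)
```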

B2. Cluster analysis: K-means (or DBScan) cluster analysis is performed on the encoded multi-dimensional feature data to pre-classify all feature vectors, which facilitates the subsequent substring analysis of each feature field.

Here, the multi-dimensional feature data consists of the detailed fields of each protocol obtained by parsing the IP, TCP/UDP and HTTP/HTTPS protocols.

C2. Substring analysis: within each sub-class of the pre-classified data, the longest common substring algorithm is used to find the longest common substring of each feature field, and the original value of the feature field is replaced with that longest common substring. The resulting sample feature field corresponds to the second sample feature field in the embodiments of the present invention. The longest common substring represents the common feature shared by the feature field values, so the resulting second sample feature field characterizes the common feature.
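The substring analysis can be sketched as follows. This is a simple brute-force version that tries the substrings of the shortest value from longest to shortest; the source does not state which longest common substring implementation is used, and the function name and sample values are hypothetical.

```python
def longest_common_substring(strings):
    """Return the longest substring shared by every string in the list.

    Candidates are taken from the shortest string, longest first, so the
    first candidate found in all strings is a longest common substring.
    """
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""


# Hypothetical ServerName values that fell into the same cluster.
cluster_values = ["s.hongyibo.com.cn", "m.hongyibo.com.cn", "api.hongyibo.com.cn"]
common = longest_common_substring(cluster_values)

# Replace every original value in the cluster with the common substring.
second_sample_values = [common] * len(cluster_values)
```

For long fields or large clusters, a suffix-automaton or dynamic-programming variant would scale better than this brute-force search.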

D2. Association analysis: association analysis is performed on the features produced by the substring analysis together with the business label data. The support and confidence of the FPGrowth algorithm are tuned over multiple experiments until suitable parameters are selected, and a target association analysis model is obtained by training. The target association analysis model is then used to analyze the association rules between the business label and one or more second sample feature fields. These association rules serve as the basis for generating the business flow rules.

If the association rule algorithm FPGrowth is used, its input data can be the multi-dimensional feature vectors and the business label data corresponding to those vectors, and its output data is the association rules related to the business labels.
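To make the roles of support and confidence concrete, the following Python sketch mines rules of the form "feature items ==> business label" by brute-force enumeration. It is deliberately not FPGrowth: it applies the same two thresholds but skips the FP-tree, so it is only suitable for tiny inputs. The function name, the default thresholds, the second label `AppInfo__0000000001` and the host `other.example.com` are hypothetical.

```python
from itertools import combinations


def mine_label_rules(transactions, min_support=0.5, min_confidence=0.8, max_items=2):
    """Enumerate rules (antecedent items) ==> label.

    transactions: list of (set of feature items, label).
    support    = share of all transactions containing antecedent and label
    confidence = share of antecedent-matching transactions with that label
    """
    total = len(transactions)
    all_items = sorted({item for items, _ in transactions for item in items})
    rules = []
    for size in range(1, max_items + 1):
        for antecedent in combinations(all_items, size):
            wanted = set(antecedent)
            labels = [label for items, label in transactions if wanted <= items]
            if not labels:
                continue
            for label in sorted(set(labels)):
                hits = labels.count(label)
                support = hits / total
                confidence = hits / len(labels)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, label, support, confidence))
    return rules


transactions = [
    ({"SERVER_NAME__s.hongyibo.com.cn", "ID_AT_COMMON_NAME__*.hongyibo.com.cn"},
     "AppInfo__1900061000"),
    ({"SERVER_NAME__s.hongyibo.com.cn", "ID_AT_COMMON_NAME__*.hongyibo.com.cn"},
     "AppInfo__1900061000"),
    ({"SERVER_NAME__other.example.com"}, "AppInfo__0000000001"),
]
rules = mine_label_rules(transactions)
```

Raising `min_support` or `min_confidence` prunes more candidate rules, which is the trade-off the repeated experiments in D2 are tuning.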

3) Business flow rule generation:

The business-related association rules generated in step 2) are converted into a file in the standard rule format that DPI can recognize, yielding the business flow rule file.

Here, conversion means format conversion. As an example, in the embodiments of the present invention an association rule may take the following form:

SERVER_NAME__s.hongyibo.com.cn&&ID_AT_COMMON_NAME__*.hongyibo.com.cn==>AppInfo__1900061000

This association rule means that server_name and id_at_common_name are associated with appinfo_1900061000.

After conversion, the above association rule may have the following format:

[SoSoMap]

#Rule_ID:17463

Server_name=s.hongyibo.com.cn

Id_at_common_name=*.hongyibo.com.cn

Name=SoSoMap

The conversion steps are as follows:

Step 1: convert the code that follows appinfo_ (1900061000 in this example) into the corresponding business name, for example SoSoMap;

Step 2: find the business name from step 1 in the original rule base file and compare the Rule_ID values under that business name; the new Rule_ID is the largest existing Rule_ID plus 1;

Step 3: list each field associated with appinfo, in the form given in the association rule;

Step 4: set the Name field to the business name corresponding to this rule.
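The four conversion steps can be sketched in Python as below. The lookup table from application code to business name, the list of existing Rule_ID values and the function name `rule_to_dpi_section` are assumptions for illustration; the source only shows one input rule and one output section.

```python
def rule_to_dpi_section(rule, app_names, existing_rule_ids):
    """Convert 'FIELD__value&&...==>AppInfo__code' into a rule file section.

    app_names: mapping from application code to business name (assumed).
    existing_rule_ids: mapping from business name to the Rule_IDs already
    present in the rule base file (assumed).
    """
    antecedent, consequent = rule.split("==>")
    # Step 1: map the code after 'AppInfo__' to the business name.
    code = consequent.split("__", 1)[1]
    name = app_names[code]
    # Step 2: new Rule_ID is the largest existing one plus 1.
    rule_id = max(existing_rule_ids.get(name) or [0]) + 1
    lines = [f"[{name}]", f"#Rule_ID:{rule_id}"]
    # Step 3: list each associated field as Field=value.
    for item in antecedent.split("&&"):
        field, value = item.split("__", 1)
        lines.append(f"{field.capitalize()}={value}")
    # Step 4: add the Name field.
    lines.append(f"Name={name}")
    return "\n".join(lines)


rule = ("SERVER_NAME__s.hongyibo.com.cn&&"
        "ID_AT_COMMON_NAME__*.hongyibo.com.cn==>AppInfo__1900061000")
section = rule_to_dpi_section(
    rule,
    app_names={"1900061000": "SoSoMap"},
    existing_rule_ids={"SoSoMap": [17461, 17462]},  # hypothetical existing IDs
)
```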

参见图2所示，图中“带标注的特征向量”，是由于拨测抓包时会对PCAP打标签，该标签会最终成为特征向量的标注。As shown in FIG. 2 , the “labeled feature vector” in the figure is because the PCAP will be labeled when the packet is captured, and the label will eventually become the label of the feature vector.

在本发明实施例中，避免了人工肉眼查看数据报文中是否包含能够表征业务流的特征数据的低效率流程，提高了业务流程规则文件的生成效率；由于避免了人工的低效劳动，本发明实施例能够在相同的时间内生成更多的规则文件，对更多的软件包进行生成，拓宽了覆盖面、提高了样本数量，由此避免了过窄的覆盖面和小样本造成的极端个例对关联规则的干扰，进一步提高了业务规则文件的准确率；此外，本发明实施例通过获取过程特性分析软件包PCAP样本包的第一样本特征字段及业务标签；对所述第一样本特征字段进行聚类处理，得到第二样本特征字段；所述第二样本特征字段用于表征公共特征；分析所述业务标签与所述第二样本特征字段之间的关联，得到关联规则；根据所述关联规则生成业务流程规则文件，提高了生成的业务流程规则文件的准确率。In the embodiment of the present invention, the inefficient process of manually checking with the naked eye whether the data message contains characteristic data that can characterize the business flow is avoided, and the efficiency of generating business process rule files is improved; since inefficient manual labor is avoided, the embodiment of the present invention can generate more rule files in the same time, generate more software packages, broaden the coverage, and increase the number of samples, thereby avoiding the interference of extreme cases caused by too narrow coverage and small samples on the association rules, and further improving the accuracy of the business rule files; in addition, the embodiment of the present invention obtains the first sample feature field and business label of the process characteristic analysis software package PCAP sample package; clusters the first sample feature field to obtain the second sample feature field; the second sample feature field is used to characterize common features; analyzes the association between the business label and the second sample feature field to obtain the association rule; generates the business process rule file according to the association rule, and improves the accuracy of the generated business process rule file.

本发明的一些实施例中，可选地，对第一样本特征字段进行聚类处理，包括：In some embodiments of the present invention, optionally, clustering the first sample feature field includes:

采用聚类算法对第一样本特征字段进行预分类；Using a clustering algorithm to pre-classify the first sample feature field;

采用最长公共子串算法对预分类后的第一样本特征字段进行求解，得到与预分类后的第一样本特征字段一一对应的最长公共子串值；The longest common substring algorithm is used to solve the first sample characteristic field after pre-classification to obtain the longest common substring value corresponding to the first sample characteristic field after pre-classification;

采用最长公共子串值对应更新预分类后的第一样本特征字段的字段值，得到第二样本特征字段。The field value of the first sample characteristic field after pre-classification is updated corresponding to the longest common substring value to obtain the second sample characteristic field.

本发明的一些实施例中，聚类算法可以包括以下至少一项：K-means及DBScan。In some embodiments of the present invention, the clustering algorithm may include at least one of the following: K-means and DBScan.

The K-means algorithm takes an input k and divides n data objects into k clusters such that objects in the same cluster are highly similar to each other, while objects in different clusters have low similarity. Cluster similarity is computed with respect to a "central object" (center of gravity) obtained as the mean of the objects in each cluster.

The working process of the K-means algorithm is as follows:

First, k objects are arbitrarily selected from the n data objects as the initial cluster centers. Each of the remaining objects is then assigned to the cluster (represented by its cluster center) to which it is most similar, according to its similarity (distance) to the cluster centers.

Then the cluster center of each newly obtained cluster (the mean of all objects in that cluster) is recomputed. This process is repeated until the standard measure function converges.
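The K-means procedure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `kmeans`, the use of squared Euclidean distance as the similarity measure, and the stopping test (assignments no longer change) are assumptions made for the example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (assignment, centers), where assignment[i] is the cluster index
    of points[i]. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k arbitrary objects as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign every object to its nearest center (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_assignment == assignment:  # assignments stable: treat as converged
            break
        assignment = new_assignment
        # Recompute each center as the mean of the objects assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers
```

Because the initial centers are chosen arbitrarily, different seeds can give different clusterings on data that is not well separated.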

DBScan (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as a maximal set of density-connected points, can divide regions of sufficiently high density into clusters, and can find clusters of arbitrary shape in noisy spatial databases.
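A compact sketch of the density-based idea is given below. It follows the textbook DBSCAN formulation; the names `dbscan`, `eps` (neighborhood radius), and `min_pts` (minimum number of points, including the point itself, for a dense neighborhood) are the usual textbook parameters and are not taken from the patent.

```python
def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on a list of numeric tuples.

    Returns one label per point: 0, 1, ... for clusters and -1 for noise.
    Illustrative sketch only; neighbor search is brute force.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels
```

Unlike K-means, the number of clusters is not an input here; it follows from `eps` and `min_pts`, and isolated points are reported as noise.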

In the embodiment of the present invention, as an example, refer to FIG. 2 together with step C2 of 1) data parsing and feature extraction:

Substring analysis: for the pre-classified data, within each sub-class, the longest common substring algorithm is used to find the longest common substring of each feature field, and the resulting longest common substring replaces the original value of that feature field. The sample feature field obtained in this way corresponds to the second sample feature field in the embodiment of the present invention. The longest common substring represents the feature shared by the individual feature fields, so the resulting second sample feature field characterizes the common feature.
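The substring analysis step can be illustrated with the sketch below. It assumes each sample is a dictionary of string-valued fields and that pre-classification has already produced one cluster id per sample; the function names, the `host` field, and the example host names are hypothetical. The longest common substring is found by brute force over the shortest value, which is adequate for short header fields but is only one of several ways to compute it.

```python
def longest_common_substring(values):
    """Longest contiguous substring that occurs in every string in `values`."""
    shortest = min(values, key=len)
    for length in range(len(shortest), 0, -1):          # try longest candidates first
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in v for v in values):
                return candidate
    return ""


def replace_with_common_substring(records, cluster_ids, field):
    """Within each pre-classified cluster, overwrite `field` with the cluster's
    longest common substring (the 'second sample feature field')."""
    by_cluster = {}
    for rec, cid in zip(records, cluster_ids):
        by_cluster.setdefault(cid, []).append(rec[field])
    common = {cid: longest_common_substring(vals) for cid, vals in by_cluster.items()}
    return [{**rec, field: common[cid]} for rec, cid in zip(records, cluster_ids)]


samples = [
    {"host": "api.video.example.com"},
    {"host": "cdn.video.example.com"},
    {"host": "im.chat.example.com"},
]
updated = replace_with_common_substring(samples, [0, 0, 1], "host")
```

In this example the two samples in cluster 0 end up sharing the value `.video.example.com`, while the single sample in cluster 1 keeps its full value.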

In the embodiment of the present invention, the common feature of the samples is obtained by solving for the longest common substring values. The field values of the pre-classified first sample feature fields are then updated with the corresponding longest common substring values to obtain the second sample feature field. This helps prevent non-common features in the samples from interfering with the association rules and helps improve the accuracy of the generated business process rule file.

In some embodiments of the present invention, optionally, pre-classifying the first sample feature field by using a clustering algorithm includes:

forming a first sample set from the first sample feature fields, and sending the first sample set to an interaction terminal associated with a user, where the first sample set is used by the user to remove first sample feature fields that are not strongly relevant, so as to obtain a second sample set that contains only strongly relevant first sample feature fields;

receiving the second sample set sent by the interaction terminal; and

pre-classifying the first sample feature fields in the second sample set by using the clustering algorithm.

As an example, refer to FIG. 2 together with step B1 of 1) data parsing and feature extraction:

In the embodiment of the present invention, features that are strongly related to the business are extracted from the parsed feature fields, and irrelevant data is removed so that it does not affect subsequent results.

In some embodiments of the present invention, optionally, before pre-classifying the first sample feature fields in the second sample set by using the clustering algorithm, the method further includes:

performing format detection on the first sample feature fields in the second sample set to obtain a first detection result; and

if the first detection result is that the first sample feature field is in character format, converting the first sample feature fields in the second sample set into numeric format by using the one-hot encoding method.

As an example, refer to FIG. 2 together with step A2 of 1) data parsing and feature extraction:

Feature encoding: one-hot encoding is used to map the values of the feature fields from character format to numeric format, to facilitate the subsequent cluster analysis.
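The format detection and one-hot encoding steps might look like the sketch below. The helper names are hypothetical, and treating "character format" as "any value is a Python `str`" is an assumption made for the example.

```python
def is_character_format(values):
    """Format detection: True if any field value is a string rather than a number."""
    return any(isinstance(v, str) for v in values)


def one_hot_encode(values):
    """Map each distinct value to a 0/1 vector with a single 1.

    Returns (categories, vectors); categories[i] is the value represented
    by position i of every vector.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors


methods = ["GET", "POST", "GET"]
if is_character_format(methods):
    categories, encoded = one_hot_encode(methods)
```

The resulting vectors can be passed to a distance-based clustering algorithm, which cannot operate on raw strings.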

In the embodiment of the present invention, when the first detection result is that the first sample feature field is in character format, the first sample feature fields in the second sample set are converted into numeric format by using the one-hot encoding method. This prevents the process from stalling or reporting errors because first sample feature fields in character form cannot be clustered, saves the time needed to troubleshoot errors and fix problems, and further improves the efficiency of rule file generation.

In some embodiments of the present invention, optionally, before pre-classifying the first sample feature field by using the clustering algorithm, the method further includes:

performing format detection on the first sample feature field to obtain a second detection result; and

if the second detection result is that the first sample feature field is in character format, converting the first sample feature field into numeric format by using the one-hot method.

In the embodiment of the present invention, when the second detection result is that the first sample feature field is in character format, the first sample feature field is converted into numeric format by using the one-hot method. This prevents the process from stalling or reporting errors because first sample feature fields in character form cannot be clustered, saves the time needed to troubleshoot errors and fix problems, and further improves the efficiency of rule file generation.

In some embodiments of the present invention, optionally, analyzing the association between the business label and the second sample feature field includes:

obtaining a preset training set, where the preset training set includes training business labels and training sample feature fields;

training an association analysis model with the training business labels and the training sample feature fields to obtain a target association analysis model; and

analyzing the association between the business label and the second sample feature field by using the target association analysis model.

As an example, refer to FIG. 2 together with step D2 of 1) data parsing and feature extraction:

Association analysis: association analysis is performed jointly on the features produced by the substring analysis and the business label data. The support and confidence of the FPGrowth algorithm are tuned over multiple experiments, suitable parameters are finally selected, and the target association analysis model is obtained through training. The target association analysis model is used to analyze the association rules between a business label and one or more second sample feature fields. These association rules serve as the basis for generating business flow rules.
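To make the roles of support and confidence concrete, the sketch below mines rules of the form "feature values imply business label" by brute-force counting. It is not FPGrowth: it enumerates antecedents of at most `max_items` field/value pairs directly, which is only practical for small data, and the function name, thresholds, and sample data are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_label_rules(samples, min_support=0.3, min_confidence=0.8, max_items=2):
    """samples: list of (features, label), where features maps field name -> value.

    Returns (antecedent_dict, label, support, confidence) tuples, where
    support = P(antecedent and label) and
    confidence = P(label | antecedent).
    """
    n = len(samples)
    antecedent_count = Counter()
    joint_count = Counter()
    for features, label in samples:
        items = sorted(features.items())
        for size in range(1, max_items + 1):
            for combo in combinations(items, size):
                antecedent_count[combo] += 1
                joint_count[(combo, label)] += 1
    rules = []
    for (combo, label), joint in joint_count.items():
        support = joint / n
        confidence = joint / antecedent_count[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((dict(combo), label, support, confidence))
    return rules


samples = (
    [({"host": ".video.example.com", "port": "443"}, "video")] * 3
    + [({"host": ".chat.example.com", "port": "443"}, "chat")] * 2
)
rules = mine_label_rules(samples)
```

Here the shared `port` value alone does not yield a rule because its confidence for either label is below the threshold, while each `host` value does. Raising or lowering the two thresholds changes which rules survive, which is the tuning the paragraph above refers to.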

In the embodiment of the present invention, the association analysis model is trained with the training business labels and the training sample feature fields to obtain the target association analysis model, and the target association analysis model is then used to analyze the association between the business label and the second sample feature field. This avoids the inefficient process of manually inspecting data packets by eye to check whether they contain feature data that can characterize a business flow. Moreover, manually derived business rule files are limited by the knowledge of the personnel involved and cannot keep up with the new sample features and new rules that emerge as communication technology develops. By using the target association analysis model, the embodiment overcomes this limitation, improves adaptability to new sample features and new rules, and ensures that rule files are generated efficiently and accurately.

In some embodiments of the present invention, optionally, after generating the business process rule file according to the association rule, the method further includes:

loading the business process rule file into a rule base, and updating the rule base information.

As an example, refer to FIG. 2, 3) business flow rule generation:

DPI (Deep Packet Inspection) is a packet-based deep inspection technology. It performs deep inspection of different network application-layer payloads (for example, HTTP and DNS) and determines the legitimacy of a packet by inspecting its payload.

By inspecting and analyzing the traffic and packet content at key points in the network, a DPI device can filter and control the inspected traffic according to predefined policies. It can perform functions such as fine-grained identification of the services on its link, analysis of service traffic direction, statistics on service traffic share, shaping of service share, handling of application-layer denial-of-service attacks, filtering of viruses and Trojans, and control of P2P abuse.

In the embodiment of the present invention, loading the business process rule file into the rule base and updating the rule base information may be loading the business process rule file into the rule base of a DPI system and updating the rule base information of the DPI system.

Specifically, the business flow rule file is loaded directly into the DPI system, and the different versions of the rule base, together with the version checking and control logic, are maintained through interaction between a configuration device and the DPI system. The configuration device may be an independent system or service, or a sub-module of another system or service. The interaction between the configuration device and the user plane DPI system uses the existing SDTP (Safe Data Transfer Protocol) as the bearer protocol. A rule base version negotiation message type is added on top of SDTP to implement rule base version query and version consistency checking. A rule base configuration message type is added on top of SDTP to implement delivery and configuration of the rule base.

Referring to FIG. 3 and FIG. 4, FIG. 3 is a schematic diagram of the interaction flow between the rule base configuration device and the user plane DPI system, in which:

the three black blocks in each XDR correspond to App_Type (business category), App_Sub_Typ (business subcategory), and App_Content (subdivision of the business subcategory), which are the three fields that mark the business type;

UE: user equipment;

eNB: 4G base station;

gNB: 5G base station;

SGW: Serving GateWay, the serving gateway;

UPF: User Plane Function, mainly responsible for routing and forwarding user plane data packets in the 5G core network;

S1-U: the user plane interface that interconnects the eNodeB and the S-GW, used for data packet transmission;

N3: the interface between the 5G RAN (Radio Access Network) and the UPF (User Plane Function), mainly used to carry uplink and downlink user plane data between the 5G RAN and the UPF.

In FIG. 3, the user plane DPI system collects data from the S1-U interface of the 4G core network or the N3 interface of the 5G core network, identifies services based on the rule base, generates the three fields of business category, business subcategory, and subdivision of the business subcategory, and writes the three fields into the XDR data, providing data support for upper-layer application systems such as business analysis and network management.

In FIG. 3, the black circles numbered 1 to 5 indicate the order of the interaction flow between the user plane DPI system and the rule base configuration device, as follows:

No. 1: the rule base configuration device sends a RuleVer_Req message to the user plane DPI system, requesting the current rule base version and specifying which algorithm is to be used to generate the message digest value;

No. 2: the user plane DPI system sends a RuleVer_Resp message to the rule base configuration device, returning the query result, the version number of the rule base currently in use, the message digest algorithm, and the message digest value computed with that message digest algorithm;

No. 3: the rule base configuration device checks whether the version information and message digest value returned in the RuleVer_Resp message are normal. If they are normal, it checks whether the current version is the latest version. If the current version is the latest version, the flow ends. If the current version is not the latest version, the rule base management device obtains the latest version held in the device, generates one digest value for each service, compares these with the digest values in the RuleVer_Resp message of No. 2, and determines whether the number of services whose rules differ exceeds half of the total number of services in the rule base;

No. 4: if the number does not exceed half of the total number of services in the rule base, the rules of the differing services are obtained and a RuleConfig_Req message is sent to the user plane DPI system with the data synchronization type set to incremental, requesting that the latest rule base version information and the incremental rule information be pushed; if the number exceeds half of the total number of services in the rule base, a RuleConfig_Req message is sent to the user plane DPI system with the data synchronization type set to full, requesting that the latest rule base version information and the entire rule base information be pushed;

No. 5: the user plane DPI system parses the data synchronization type in the RuleConfig_Req message. For full data synchronization, it directly replaces the rule base file; for incremental data synchronization, it updates and replaces the business rules in the rule base file that need to be updated. The user plane DPI system then sends a RuleConfig_Resp message to the rule base configuration device, returning the configuration result.

FIG. 4 is a schematic diagram of the execution flow of the rule base configuration device, which includes the following steps (step 1 to step 10):

1. The rule base configuration device sends a RuleVer_Req message to the user plane DPI system, requesting the current rule base version and specifying which algorithm is to be used to generate the message digest value;

2. The user plane DPI system sends a RuleVer_Resp message to the rule base configuration device, returning the query result, the version number of the rule base currently in use, the message digest algorithm, and the message digest value computed with that message digest algorithm;

3. The rule base configuration device checks whether the version information and message digest value returned in the RuleVer_Resp message are normal; if so, the flow proceeds to step 4, otherwise it proceeds to step 5;

4. Check whether the current version is the latest version; if so, the flow ends, otherwise it proceeds to step 5;

5. The rule base management device obtains the latest version held in the device, generates one digest value for each service, and compares these with the digest values in the RuleVer_Resp message of step 2;

6. Determine whether the number of services whose rules differ exceeds half of the total number of services in the rule base; if not, the flow proceeds to step 7, otherwise it proceeds to step 8;

7. Obtain the rules of the differing services and send a RuleConfig_Req message to the user plane DPI system with the data synchronization type set to incremental, requesting that the latest rule base version information and the incremental rule information be pushed;

8. Send a RuleConfig_Req message to the user plane DPI system with the data synchronization type set to full, requesting that the latest rule base version information and the entire rule base information be pushed;

9. The user plane DPI system parses the data synchronization type in the RuleConfig_Req message. For full data synchronization, it directly replaces the rule base file; for incremental data synchronization, it updates and replaces the business rules in the rule base file that need to be updated.

10. The user plane DPI system sends a RuleConfig_Resp message to the rule base configuration device, returning the configuration result.
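Steps 9 and 10 on the user plane DPI side can be sketched as follows. The sketch assumes the rule base is held in memory as a dictionary with a version and a per-service rule mapping, and that the synchronization type arrives as the string `"full"` or `"incremental"`; the text does not specify the actual file format or field encoding, so these representations and the function name are hypothetical.

```python
def apply_rule_config(rule_base, sync_type, version, rules):
    """Apply the content of a RuleConfig_Req and report the outcome.

    rule_base: {"version": ..., "rules": {service_name: rule_text}}
    rules:     the full rule set (full sync) or only the changed services
               (incremental sync).
    Returns (new_rule_base, ok); `ok` is the result to carry in RuleConfig_Resp.
    """
    if sync_type == "full":
        # Full synchronization: replace the rule base outright.
        return {"version": version, "rules": dict(rules)}, True
    if sync_type == "incremental":
        # Incremental synchronization: overwrite only the services that changed.
        merged = dict(rule_base["rules"])
        merged.update(rules)
        return {"version": version, "rules": merged}, True
    return rule_base, False  # unknown sync type: leave the rule base untouched


current = {"version": 1, "rules": {"video": "r1", "chat": "r2"}}
updated, ok = apply_rule_config(current, "incremental", 2, {"chat": "r2b"})
```

Returning a new dictionary rather than mutating the old one means a failed or unrecognized request cannot leave the rule base half updated.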

In step 1, the RuleVer_Req message sent by the rule base configuration device to the user plane DPI system specifies which algorithm is to be used to generate the message digest value. In step 2, the RuleVer_Resp message sent by the user plane DPI system to the rule base configuration device also carries the message digest algorithm. Together these form a digest algorithm negotiation between the rule base configuration device and the user plane DPI system. Specifically:

The user plane DPI system may verify whether the digest algorithm specified in the RuleVer_Req message is one that it is configured to support. If it is not, the user plane DPI system computes the message digest value with a digest algorithm that it does support, and returns that digest algorithm together with the message digest value computed with it in the RuleVer_Resp message, so that the rule base configuration device also adopts the digest algorithm supported by the user plane DPI system. This completes the negotiation.
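The negotiation can be sketched from the DPI system's point of view as below. The numeric codes follow the Message-Digest field values given in this description (0: do not generate, 1: MD5, 2: SHA1, 3: SHA256). Falling back to the first entry of the supported list, and digesting the concatenation of version bytes and rule bytes, are assumptions made for the example; the text only says that a supported algorithm is used.

```python
import hashlib

# Message-Digest codes: 0 = do not generate, 1 = MD5, 2 = SHA1, 3 = SHA256.
DIGESTS = {1: hashlib.md5, 2: hashlib.sha1, 3: hashlib.sha256}


def answer_version_query(requested_alg, supported_algs, version, rules_blob):
    """DPI side of the RuleVer_Req / RuleVer_Resp digest negotiation.

    requested_alg:  code carried in RuleVer_Req.
    supported_algs: non-empty list of codes the DPI system is configured to support.
    version, rules_blob: bytes to be digested (hypothetical representation).
    Returns (algorithm_code, digest_bytes) to place in RuleVer_Resp.
    """
    if requested_alg == 0:
        return 0, b""  # requester asked for no digest
    if requested_alg in supported_algs:
        alg = requested_alg
    else:
        alg = supported_algs[0]  # assumption: fall back to the first supported algorithm
    return alg, DIGESTS[alg](version + rules_blob).digest()


# The configuration device asks for MD5 (1) but the DPI system only supports SHA256 (3).
alg, digest = answer_version_query(1, [3], b"1.0", b"rules")
```

On receiving the response, the configuration device would switch to the returned algorithm code before computing its own digests for comparison.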

In step 6, the number of services whose rules differ is obtained by counting the digest values that do not match. Because one digest value is generated for each service, counting the services whose rules differ is the same as counting the digest values that differ.

In step 6, determining whether the number of services whose rules differ exceeds half of the total number of services in the rule base amounts to determining whether the number of business process rule files that differ from the rule base exceeds a preset difference threshold. The preset difference threshold may be half of the total number of services.
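The comparison in steps 5 to 8 can be sketched as follows. The sketch assumes the per-service digests reported by the DPI system are available as a mapping from service name to hex digest, that SHA256 is the negotiated algorithm, and that "total number of services" refers to the latest rule base held by the configuration device; none of these details is fixed by the text, and the function names are hypothetical.

```python
import hashlib


def per_service_digests(rules):
    """rules: {service_name: rule_text}. One digest value per service."""
    return {svc: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for svc, text in rules.items()}


def choose_sync(latest_rules, dpi_digests):
    """Decide between incremental and full synchronization.

    Returns (sync_type, rules_to_push), where sync_type is
    "none", "incremental", or "full".
    """
    latest = per_service_digests(latest_rules)
    changed = [svc for svc, digest in latest.items() if dpi_digests.get(svc) != digest]
    if not changed:
        return "none", {}
    if len(changed) > len(latest) / 2:  # more than half of the services differ
        return "full", dict(latest_rules)
    return "incremental", {svc: latest_rules[svc] for svc in changed}


latest_rules = {"a": "r1", "b": "r2", "c": "r3", "d": "r4"}
dpi_side = per_service_digests({"a": "r1", "b": "r2", "c": "r3", "d": "old"})
sync_type, payload = choose_sync(latest_rules, dpi_side)
```

With one of four services differing, the sketch selects incremental synchronization and pushes only that service's rules; exactly half differing still counts as "not exceeding" the threshold.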

In the embodiment of the present invention, configuration management of business flow rules allows a new version of the rule base to be pushed quickly. Standardizing the rule configuration management interface makes rule base updates faster and more frequent, and addresses the problem of large differences in business identification results across provinces and vendors.

In some embodiments of the present invention, optionally, after updating the rule base information, the method further includes:

returning the rule base information in response to a request instruction, where the rule base information is used by a configuration library to determine whether the business process rule files it stores are up to date. If they are not up to date and the number of business process rule files that differ from the rule base exceeds a preset difference threshold, the configuration library sends a full synchronization instruction; if they are not up to date and the number of business process rule files that differ from the rule base does not exceed the preset difference threshold, the configuration library sends an incremental synchronization instruction.

In the examples of FIG. 3 and FIG. 4 of the embodiment of the present invention, in step 6, the number of services whose rules differ is obtained by counting the digest values that do not match. Because one digest value is generated for each service, counting the services whose rules differ is the same as counting the digest values that differ. Determining whether the number of services whose rules differ exceeds half of the total number of services in the rule base amounts to determining whether the number of business process rule files that differ from the rule base exceeds the preset difference threshold, and the preset difference threshold may be half of the total number of services.

In some embodiments of the present invention, optionally, after returning the rule base information in response to the request instruction, the method further includes:

receiving the full synchronization instruction sent by the configuration library; and

sending all business process rule files in the rule base to the configuration library, to update all business process rule files stored in the configuration library.

In the examples of FIG. 3 and FIG. 4 of the embodiment of the present invention, in step 8, a RuleConfig_Req message is sent to the user plane DPI system with the data synchronization type set to full, requesting that the latest rule base version information and the entire rule base information be pushed.

receiving the incremental synchronization instruction sent by the configuration library; and

sending business process rule files to the configuration library according to the incremental synchronization instruction, to update the business process rule files stored in the configuration library that differ from the rule base, where the incremental synchronization instruction is used to indicate the business process rule files that differ between the configuration library and the rule base.

In the examples of FIG. 3 and FIG. 4 of the embodiment of the present invention, in step 7, the rules of the differing services are obtained and a RuleConfig_Req message is sent to the user plane DPI system with the data synchronization type set to incremental, requesting that the latest rule base version information and the incremental rule information be pushed.

In some embodiments of the present invention, optionally, the request instruction includes a first message digest algorithm;

returning the rule base information in response to the request instruction includes:

computing a message digest value of the rule base by using the first message digest algorithm, and returning the message digest value to the configuration library.

In some embodiments of the present invention, optionally, before computing the message digest value of the rule base by using the first message digest algorithm, the method further includes:

verifying whether the first message digest algorithm is a digest algorithm in a preset specified algorithm set, to obtain a verification result; and

if the verification result is that the first message digest algorithm is not a digest algorithm in the specified algorithm set, determining one algorithm in the specified algorithm set as a target message digest algorithm, computing the message digest value by using the target message digest algorithm, and returning the message digest value and the target message digest algorithm to the configuration library, where the target message digest algorithm is used by the configuration library to update the first message digest algorithm that it stores.

The user plane DPI system may verify whether the digest algorithm specified in the RuleVer_Req message (corresponding to the first message digest algorithm in the embodiment of the present invention) is one that it is configured to support. If it is not, the user plane DPI system computes the message digest value with a digest algorithm that it does support (corresponding to the target message digest algorithm in the embodiment of the present invention), and returns that digest algorithm together with the message digest value computed with it in the RuleVer_Resp message, so that the rule base configuration device also adopts the digest algorithm supported by the user plane DPI system. This completes the negotiation (corresponding to the target message digest algorithm being used by the configuration library to update the first message digest algorithm that it stores).

In some embodiments of the present invention, optionally, the request instruction is carried in a first request message, and the first request message includes:

The rule base information is carried in a first response message, and the first response message includes:

Referring to FIG. 3, with SDTP as the transport protocol, two pairs of request/response message types are added, used respectively for querying the rule base version information and for pushing the rule base configuration. Referring to FIG. 5, the dashed rectangle in the lower right corner of the figure contains the two newly added request/response pairs. The rule base version information query request and response messages are named RuleVer_Req and RuleVer_Resp, with MessageType values 0x0011 and 0x8011 respectively. The rule base configuration request and response messages are named RuleConfig_Req and RuleConfig_Resp, with MessageType values 0x0012 and 0x8012 respectively.
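For reference, the four MessageType values can be collected in an enumeration such as the one below. The Python names and the `response_type` helper are hypothetical; only the numeric values and the message names come from the text. For these two pairs, each response value equals the request value with the 0x8000 bit set, but the text does not state this as a general SDTP rule, so the sketch uses an explicit table rather than relying on it.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """The two request/response pairs added on top of SDTP."""
    RULE_VER_REQ = 0x0011      # RuleVer_Req: rule base version query request
    RULE_VER_RESP = 0x8011     # RuleVer_Resp: rule base version query response
    RULE_CONFIG_REQ = 0x0012   # RuleConfig_Req: rule base configuration request
    RULE_CONFIG_RESP = 0x8012  # RuleConfig_Resp: rule base configuration response


# Explicit request -> response pairing.
_RESPONSE_FOR = {
    MessageType.RULE_VER_REQ: MessageType.RULE_VER_RESP,
    MessageType.RULE_CONFIG_REQ: MessageType.RULE_CONFIG_RESP,
}


def response_type(request):
    """Return the response MessageType that answers `request`."""
    return _RESPONSE_FOR[MessageType(request)]
```

A receiver could use `response_type(0x0011)` to check that a reply carries the expected type before parsing its body.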

Referring to FIG. 6, FIG. 6 is a schematic diagram of the message body of the rule base version information query request. The first request message in the embodiment of the present invention corresponds to the rule base version information query request RuleVer_Req, and includes:

a message digest indication field, for example Message-Digest, which indicates the algorithm used to generate the message digest value; 0: do not generate, 1: MD5, 2: SHA1, 3: SHA256.

FIG. 6 also shows the first request message in the SDTP-based stack structure, where the black box denotes the message header (MessageHeader) and the white box denotes the message body (MessageBody). The message digest indication field, for example Message-Digest, is part of the message body (MessageBody) and occupies 1 byte in the SDTP-based stack structure.

参见图7所示，图7为规则库版本信息查询应答消息体示意图，本发明实施例中的第一应答消息即相当于规则库版本信息应答请求RuleVer_Resp，包括：Referring to FIG. 7 , FIG. 7 is a schematic diagram of a rule base version information query response message body. The first response message in the embodiment of the present invention is equivalent to a rule base version information response request RuleVer_Resp, including:

应答结果指示字段，如：Result，0：失败、1：成功；Response result indication field, such as: Result, 0: failure, 1: success;

规则库版本信息字段，如：Version，当前规则库版本信息；Rule base version information field, such as: Version, current rule base version information;

摘要算法指示字段，如：Message-Digest，消息摘要算法；Digest algorithm indication field, such as: Message-Digest, message digest algorithm;

摘要值指示字段，如：Message-Digest-Value，版本号+每种业务的规则，通过指定消息摘要算法计算出来的值。Digest value indicator field, such as: Message-Digest-Value, version number + rules for each service, the value calculated by the specified message digest algorithm.

图7还示意了基于SDTP的栈结构的第一应答消息，其中，黑色框示意消息头(MessageHeader)，白色框示意消息体(MessageBody)；应答结果指示字段，如：Result，作为消息体(MessageBody)在基于SDTP的栈结构中占据1字节；规则库版本信息字段，如：Version，作为消息体(MessageBody)在基于SDTP的栈结构中占据3字节；摘要算法指示字段，如：Message-Digest，作为消息体(MessageBody)在基于SDTP的栈结构中占据1字节；摘要值指示字段，如：Message-Digest-Value，作为消息体(MessageBody)在基于SDTP的栈结构中占据的字节量不定长。Figure 7 also illustrates the first response message of the stack structure based on SDTP, wherein the black box indicates the message header (MessageHeader), and the white box indicates the message body (MessageBody); the response result indication field, such as: Result, occupies 1 byte as the message body (MessageBody) in the stack structure based on SDTP; the rule base version information field, such as: Version, occupies 3 bytes as the message body (MessageBody) in the stack structure based on SDTP; the digest algorithm indication field, such as: Message-Digest, occupies 1 byte as the message body (MessageBody) in the stack structure based on SDTP; the digest value indication field, such as: Message-Digest-Value, occupies an indefinite amount of bytes as the message body (MessageBody) in the stack structure based on SDTP.
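The byte layout described for Figures 6 and 7 can be illustrated with a short sketch that packs and unpacks the two message bodies. Several details are assumptions that the text does not specify: the fields appear in the order listed, the 3-byte Version is treated as a big-endian unsigned integer, and the variable-length Message-Digest-Value simply fills the rest of the body. The SDTP message header is omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


def pack_rule_ver_req(digest_alg: int) -> bytes:
    """RuleVer_Req body: a single Message-Digest byte (0..3)."""
    return bytes([digest_alg])


@dataclass
class RuleVerResp:
    result: int          # 1 byte: 0 = failure, 1 = success
    version: int         # 3 bytes (assumed big-endian unsigned)
    digest_alg: int      # 1 byte: 0 none, 1 MD5, 2 SHA1, 3 SHA256
    digest_value: bytes  # variable length (assumed: rest of the body)

    def pack(self) -> bytes:
        return (bytes([self.result])
                + self.version.to_bytes(3, "big")
                + bytes([self.digest_alg])
                + self.digest_value)

    @classmethod
    def unpack(cls, body: bytes) -> "RuleVerResp":
        if len(body) < 5:
            raise ValueError("RuleVer_Resp body needs at least 5 bytes")
        return cls(result=body[0],
                   version=int.from_bytes(body[1:4], "big"),
                   digest_alg=body[4],
                   digest_value=body[5:])
```

A real implementation would also need the header handling defined by SDTP and, if the digest is not the last field on the wire, an explicit length for it.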

In some embodiments of the present invention, optionally, the full synchronization instruction is carried in a second request message, and the second request message includes:

All business process rule files in the rule base are carried in a second response message, and the second response message includes:

Figure 8 is a schematic diagram of the body of the rule base configuration request message. The second request message in this embodiment corresponds to the rule base configuration request RuleConfig_Req and includes:

A rule base synchronization mode indication field, e.g. Sync-Type: 0: full data synchronization; 1: incremental data synchronization;

A rule base information indication field, e.g. Rule: the rule base information. For full data synchronization it carries the entire rule base content; for incremental data synchronization it carries only the business rule content that differs.

Figure 8 also shows the second request message in the SDTP-based stack structure, where the black box denotes the message header (MessageHeader) and the white box denotes the message body (MessageBody). Within the message body, the rule base version information field (Version) occupies 3 bytes, the rule base synchronization mode indication field (Sync-Type) occupies 1 byte, and the rule base information indication field (Rule) has a variable length.
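As an illustration of the Figure 8 layout, the sketch below serializes a RuleConfig_Req body as Version (3 bytes), Sync-Type (1 byte) and Rule (variable length). The field order follows the order in which the Figure 8 description lists them. Treating Version as a big-endian integer and Rule as the remaining bytes of the body are assumptions, as are all Python names. The encoding of the rule content itself is not specified in the text and is left as opaque bytes.

```python
from dataclasses import dataclass

SYNC_FULL = 0         # Sync-Type 0: full data synchronization
SYNC_INCREMENTAL = 1  # Sync-Type 1: incremental data synchronization


@dataclass
class RuleConfigReq:
    version: int    # 3 bytes (assumed big-endian unsigned)
    sync_type: int  # 1 byte
    rule: bytes     # variable length (assumed: rest of the body)

    def pack(self) -> bytes:
        if self.sync_type not in (SYNC_FULL, SYNC_INCREMENTAL):
            raise ValueError("unknown Sync-Type")
        return (self.version.to_bytes(3, "big")
                + bytes([self.sync_type])
                + self.rule)

    @classmethod
    def unpack(cls, body: bytes) -> "RuleConfigReq":
        if len(body) < 4:
            raise ValueError("RuleConfig_Req body needs at least 4 bytes")
        return cls(version=int.from_bytes(body[:3], "big"),
                   sync_type=body[3],
                   rule=body[4:])
```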

Figure 9 is a schematic diagram of the body of the rule base configuration response message. The second response message in this embodiment corresponds to the rule base configuration response RuleConfig_Resp and includes:

A response result indication field, e.g. Result: 0: failure; 1: success.

Figure 9 also shows the second response message in the SDTP-based stack structure, where the black box denotes the message header (MessageHeader) and the white box denotes the message body (MessageBody). The response result indication field (Result) occupies 1 byte of the message body.

In some embodiments of the present invention, optionally, the incremental synchronization instruction is carried in a second request message, and the second request message includes:

The business process rule file sent according to the incremental synchronization instruction is carried in a second response message, and the second response message includes:

An embodiment of the present invention provides a device for generating a business process rule file. Referring to Figure 10, which is a schematic block diagram of the device according to an embodiment of the present invention, the device 100 for generating a business process rule file includes:

an acquisition module 101, configured to obtain a first sample feature field and a business label of a process characteristic analysis software package (PCAP) sample package;

a clustering module 102, configured to perform clustering processing on the first sample feature field to obtain a second sample feature field, the second sample feature field being used to characterize a common feature;

an analysis module 103, configured to analyze the association between the business label and the second sample feature field to obtain an association rule;

a generation module 104, configured to generate a business process rule file according to the association rule.

In some embodiments of the present invention, optionally,

the clustering module 102 is further configured to pre-classify the first sample feature field using a clustering algorithm;

the clustering module 102 is further configured to apply a longest common substring algorithm to the pre-classified first sample feature fields to obtain longest common substring values in one-to-one correspondence with the pre-classified first sample feature fields;

the clustering module 102 is further configured to update the field values of the pre-classified first sample feature fields with the corresponding longest common substring values to obtain the second sample feature field.
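The longest-common-substring step can be sketched as follows. The example assumes that pre-classification has already assigned each field value to a cluster; the clustering algorithm itself is not shown. The function names and the host names in the usage note are hypothetical. The brute-force search is chosen for readability and is only a stand-in for whatever longest common substring implementation an embodiment actually uses.

```python
from typing import Dict, List


def longest_common_substring(values: List[str]) -> str:
    """Longest substring shared by every string in `values`.

    Candidates are taken from the shortest string, longest first,
    so the first match found is a longest common substring.
    """
    if not values:
        return ""
    shortest = min(values, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in value for value in values):
                return candidate
    return ""


def update_with_common_feature(clusters: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Replace every field value in a cluster by that cluster's
    longest common substring (the 'common feature')."""
    updated = {}
    for cluster_id, values in clusters.items():
        common = longest_common_substring(values)
        updated[cluster_id] = [common for _ in values]
    return updated
```

For a cluster containing `api.video.example.com`, `cdn.video.example.com` and `img.video.example.com`, every value is replaced by `.video.example.com`.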

The clustering module 102 is further configured to assemble the first sample feature fields into a first sample set and send the first sample set to an interactive terminal associated with a user, so that the user can remove the first sample feature fields that are not strongly correlated, yielding a second sample set that contains only the strongly correlated first sample feature fields;

the clustering module 102 is further configured to receive the second sample set sent by the interactive terminal;

the clustering module 102 is further configured to pre-classify the first sample feature fields in the second sample set using a clustering algorithm.

The clustering module 102 is further configured to perform format detection on the first sample feature fields in the second sample set to obtain a first detection result;

the clustering module 102 is further configured to, if the first detection result indicates that the first sample feature fields are in character format, convert the first sample feature fields in the second sample set into numeric format using one-hot encoding.

The clustering module 102 is further configured to perform format detection on the first sample feature field to obtain a second detection result;

the clustering module 102 is further configured to, if the second detection result indicates that the first sample feature field is in character format, convert the first sample feature field into numeric format using one-hot encoding.
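The format detection and one-hot conversion described above might look like the following sketch. It assumes that "character format" means a value that cannot be parsed as a number. That definition, the sorted category order, and the function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def is_character_format(values: Sequence[object]) -> bool:
    """Format detection: True if any value cannot be parsed as a number."""
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            return True
    return False


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode values; returns (categories, encoded rows).

    Each row contains exactly one 1, at the index of the value's category.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = [[1 if index[value] == i else 0 for i in range(len(categories))]
            for value in values]
    return categories, rows


def to_numeric(values: Sequence[str]) -> List[List[float]]:
    """Convert a feature field to numeric form before clustering."""
    if is_character_format(values):
        return [[float(x) for x in row] for row in one_hot(values)[1]]
    return [[float(value)] for value in values]
```

One-hot vectors keep categorical values equidistant, which avoids implying an ordering between them when a distance-based clustering algorithm is applied.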

The analysis module 103 is further configured to obtain a preset training set, the preset training set including training business labels and training sample feature fields;

the analysis module 103 is further configured to train an association analysis model using the training business labels and the training sample feature fields to obtain a target association analysis model;

the analysis module 103 is further configured to analyze the association between the business label and the second sample feature field using the target association analysis model.
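The text does not specify the association analysis model in this passage. Purely as an illustration, the sketch below uses a simple support/confidence count as a stand-in: it keeps rules of the form "(field, value) implies business label" that occur often enough and are reliable enough in a labeled sample set. The thresholds, the data layout and all names are hypothetical and should not be read as the model used in the embodiments.

```python
from collections import Counter
from typing import Dict, List, Tuple

Item = Tuple[str, str]             # (field name, field value)
Sample = Tuple[Dict[str, str], str]  # (feature fields, business label)


def mine_rules(samples: List[Sample],
               min_support: float = 0.2,
               min_confidence: float = 0.8) -> List[Tuple[Item, str, float, float]]:
    """Return (item, label, support, confidence) for rules item -> label."""
    total = len(samples)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for features, label in samples:
        for item in features.items():
            item_counts[item] += 1
            pair_counts[(item, label)] += 1

    rules = []
    for (item, label), count in pair_counts.items():
        support = count / total                # share of samples with item and label
        confidence = count / item_counts[item]  # share of item's samples with label
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, label, support, confidence))
    return rules
```

In this stand-in, a field value shared by several business labels (for example a common port) has low confidence for each label and is dropped, while a value that appears only with one label is kept as a rule.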

The generation module 104 is further configured to load the business process rule file into a rule base and update the rule base information.

The generation module 104 is further configured to receive a request instruction sent by a configuration library;

the generation module 104 is further configured to return the rule base information according to the request instruction. The configuration library uses the rule base information to determine whether the business process rule files it stores are up to date. If they are not up to date and the number of business process rule files that differ from the rule base exceeds a preset difference threshold, the configuration library sends a full synchronization instruction; if they are not up to date and the number of differing business process rule files does not exceed the preset difference threshold, the configuration library sends an incremental synchronization instruction.
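The threshold decision made by the configuration library can be sketched as below. How the configuration library counts differing files is not detailed in this passage. The sketch assumes it can compare a per-file fingerprint (for example a digest) for each rule file, which is an assumption. The enumeration reuses the Sync-Type values 0 and 1 given for RuleConfig_Req; the function names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Optional


class SyncType(IntEnum):
    FULL = 0         # full data synchronization
    INCREMENTAL = 1  # incremental data synchronization


def count_differences(local: Dict[str, str], remote: Dict[str, str]) -> int:
    """Number of rule files that are missing, extra, or changed.

    Both arguments map a rule file name to a fingerprint of its content.
    """
    names = set(local) | set(remote)
    return sum(1 for name in names if local.get(name) != remote.get(name))


def choose_sync(local: Dict[str, str], remote: Dict[str, str],
                threshold: int) -> Optional[SyncType]:
    """Return None if up to date, otherwise the synchronization mode."""
    differences = count_differences(local, remote)
    if differences == 0:
        return None
    if differences > threshold:
        return SyncType.FULL
    return SyncType.INCREMENTAL
```

With a threshold of 2, one changed file leads to `SyncType.INCREMENTAL`, while three changed files lead to `SyncType.FULL`.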

The generation module 104 is further configured to receive the full synchronization instruction sent by the configuration library;

the generation module 104 is further configured to send all business process rule files in the rule base to the configuration library, so as to update all business process rule files stored in the configuration library.

The generation module 104 is further configured to receive the incremental synchronization instruction sent by the configuration library;

the generation module 104 is further configured to send business process rule files to the configuration library according to the incremental synchronization instruction, so as to update those business process rule files stored in the configuration library that differ from the rule base, the incremental synchronization instruction indicating the business process rule files that differ between the configuration library and the rule base.

The generation module 104 is further configured to calculate a message digest value of the rule base using the first message digest algorithm and return the message digest value to the configuration library.

The generation module 104 is further configured to verify whether the first message digest algorithm belongs to a preset specified algorithm set, obtaining a verification result;

the generation module 104 is further configured to, if the verification result indicates that the first message digest algorithm does not belong to the specified algorithm set, select an algorithm from the specified algorithm set as a target message digest algorithm, calculate the message digest value using the target message digest algorithm, and return the message digest value and the target message digest algorithm to the configuration library, the target message digest algorithm being used by the configuration library to update the first message digest algorithm that it stores.
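The digest verification and fallback can be illustrated with Python's `hashlib`. The algorithm codes (1: MD5, 2: SHA1, 3: SHA256, 0: not generated) come from the Message-Digest field described earlier. Everything else is an assumption made for this sketch: the way "version number + rules of each service" is serialized, the default specified algorithm set, and the choice of the highest-numbered allowed algorithm as the target.

```python
import hashlib
from typing import Dict, FrozenSet, Tuple

# Message-Digest codes from the RuleVer_Req description (0 = not generated).
DIGESTS = {1: hashlib.md5, 2: hashlib.sha1, 3: hashlib.sha256}


def serialize_rule_base(version: int, rules: Dict[str, str]) -> bytes:
    """Hypothetical serialization: version, then rules sorted by service."""
    parts = [version.to_bytes(3, "big")]
    for service in sorted(rules):
        parts.append(service.encode("utf-8"))
        parts.append(rules[service].encode("utf-8"))
    return b"\x00".join(parts)


def digest_rule_base(requested_alg: int, version: int, rules: Dict[str, str],
                     allowed: FrozenSet[int] = frozenset({3})) -> Tuple[int, bytes]:
    """Return (algorithm actually used, digest value).

    If the requested algorithm is not in the allowed set, fall back to a
    target algorithm from that set (assumed here: the highest code).
    """
    if requested_alg == 0:
        return 0, b""  # digest not requested
    algorithm = requested_alg if requested_alg in allowed else max(allowed)
    value = DIGESTS[algorithm](serialize_rule_base(version, rules)).digest()
    return algorithm, value
```

Returning the algorithm alongside the value lets the caller place both in the Message-Digest and Message-Digest-Value fields of the response, so the configuration library can see that a different algorithm was used.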

For the generation module 104, the request instruction is carried in a first request message, and the first request message includes:

For the generation module 104, the full synchronization instruction is carried in a second request message, and the second request message includes:

For the generation module 104, the incremental synchronization instruction is carried in a second request message, and the second request message includes:

The device for generating a business process rule file provided in the embodiments of the present application can implement each process of the method embodiments of Figures 1 to 9 and achieve the same technical effects. To avoid repetition, the details are not described again here.

An embodiment of the present invention provides an electronic device 110. Referring to Figure 11, which is a schematic block diagram of the electronic device 110 according to an embodiment of the present invention, the electronic device includes a processor 111, a memory 112, and a program or instructions stored in the memory 112 and executable on the processor 111. When executed by the processor, the program or instructions implement the steps of any one of the methods for generating a business process rule file of the present invention.

An embodiment of the present invention provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of any of the above embodiments of the method for generating a business process rule file and achieve the same technical effects. To avoid repetition, the details are not described again here.

The readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific implementations described, which are illustrative rather than restrictive. Guided by the present invention, a person of ordinary skill in the art may devise many other forms without departing from the spirit of the present invention or the scope protected by the claims, and all of these fall within the protection of the present invention.

Claims

1. A method for generating a business process rule file, characterized by comprising:

obtaining a first sample feature field and a business label of a process characteristic analysis software package (PCAP) sample package;

performing clustering processing on the first sample feature field to obtain a second sample feature field, the second sample feature field being used to characterize a common feature;

analyzing the association between the business label and the second sample feature field to obtain an association rule;

generating a business process rule file according to the association rule.

2. The method for generating a business process rule file according to claim 1, characterized in that:

performing clustering processing on the first sample feature field comprises:

pre-classifying the first sample feature field using a clustering algorithm;

applying a longest common substring algorithm to the pre-classified first sample feature fields to obtain longest common substring values in one-to-one correspondence with the pre-classified first sample feature fields;

updating the field values of the pre-classified first sample feature fields with the corresponding longest common substring values to obtain the second sample feature field.

3. The method for generating a business process rule file according to claim 2, characterized in that:

pre-classifying the first sample feature field using a clustering algorithm comprises:

assembling the first sample feature fields into a first sample set and sending the first sample set to an interactive terminal associated with a user, wherein the first sample set is used by the user to remove the first sample feature fields that are not strongly correlated, so as to obtain a second sample set that contains only the strongly correlated first sample feature fields;

receiving the second sample set sent by the interactive terminal;

pre-classifying the first sample feature fields in the second sample set using a clustering algorithm.

4. The method for generating a business process rule file according to claim 3, characterized in that:

pre-classifying the first sample feature fields in the second sample set using a clustering algorithm comprises:

performing format detection on the first sample feature fields in the second sample set to obtain a first detection result;

if the first detection result indicates that the first sample feature fields are in character format, converting the first sample feature fields in the second sample set into numeric format using one-hot encoding.

5. The method for generating a business process rule file according to claim 2, characterized in that:

pre-classifying the first sample feature field using a clustering algorithm comprises:

performing format detection on the first sample feature field to obtain a second detection result;

if the second detection result indicates that the first sample feature field is in character format, converting the first sample feature field into numeric format using one-hot encoding.

6. The method for generating a business process rule file according to claim 1, characterized in that:

analyzing the association between the business label and the second sample feature field comprises:

obtaining a preset training set, the preset training set comprising training business labels and training sample feature fields;

training an association analysis model using the training business labels and the training sample feature fields to obtain a target association analysis model;

analyzing the association between the business label and the second sample feature field using the target association analysis model.

7. The method for generating a business process rule file according to claim 1, characterized in that:

after generating the business process rule file according to the association rule, the method further comprises:

loading the business process rule file into a rule base and updating the rule base information.

8. The method for generating a business process rule file according to claim 7, characterized in that:

updating the rule base information comprises:

receiving a request instruction sent by a configuration library;

returning the rule base information according to the request instruction, wherein the configuration library uses the rule base information to determine whether the business process rule files it stores are up to date; if they are not up to date and the number of business process rule files that differ from the rule base exceeds a preset difference threshold, the configuration library sends a full synchronization instruction; if they are not up to date and the number of differing business process rule files does not exceed the preset difference threshold, the configuration library sends an incremental synchronization instruction.

9. The method for generating a business process rule file according to claim 8, characterized in that:

after returning the rule base information according to the request instruction, the method further comprises:

receiving the full synchronization instruction sent by the configuration library;

sending all business process rule files in the rule base to the configuration library, so as to update all business process rule files stored in the configuration library.

10. The method for generating a business process rule file according to claim 8, characterized in that:

receiving the incremental synchronization instruction sent by the configuration library;

sending business process rule files to the configuration library according to the incremental synchronization instruction, so as to update those business process rule files stored in the configuration library that differ from the rule base, the incremental synchronization instruction indicating the business process rule files that differ between the configuration library and the rule base.

11. The method for generating a business process rule file according to claim 8, characterized in that:

the request instruction comprises a first message digest algorithm;

returning the rule base information according to the request instruction comprises:

calculating a message digest value of the rule base using the first message digest algorithm, and returning the message digest value to the configuration library.

12. The method for generating a business process rule file according to claim 11, characterized in that:

calculating the message digest value of the rule base using the first message digest algorithm comprises:

verifying whether the first message digest algorithm belongs to a preset specified algorithm set, to obtain a verification result;

if the verification result indicates that the first message digest algorithm does not belong to the specified algorithm set, selecting an algorithm from the specified algorithm set as a target message digest algorithm, calculating the message digest value using the target message digest algorithm, and returning the message digest value and the target message digest algorithm to the configuration library, the target message digest algorithm being used by the configuration library to update the first message digest algorithm that it stores.

13. The method for generating a business process rule file according to claim 8, characterized in that:

the request instruction is carried in a first request message, and the first request message comprises:

a message digest indication field, used to indicate whether to generate a message digest and/or the algorithm used to generate the message digest;

the rule base information is carried in a first response message, and the first response message comprises:

a response result indication field, used to indicate whether the response is successful;

a rule base version information field, used to indicate the current rule base version information;

a digest algorithm indication field, used to indicate the message digest algorithm;

a digest value indication field, used to indicate the digest value, calculated by the message digest algorithm, that corresponds to the rule base version information.

14. The method for generating a business process rule file according to claim 9, characterized in that:

the full synchronization instruction is carried in a second request message, and the second request message comprises:

a rule base synchronization mode indication field, used to indicate whether the synchronization mode is full data synchronization or incremental data synchronization;

a rule base information indication field, used to indicate that the entire rule base content is carried when the synchronization mode is full data synchronization, or that the differing rule base content is carried when the synchronization mode is incremental data synchronization;

all business process rule files in the rule base are carried in a second response message, and the second response message comprises:

a response result indication field, used to indicate whether the response is successful.

15. The method for generating a business process rule file according to claim 10, characterized in that:

the incremental synchronization instruction is carried in a second request message, and the second request message comprises:

a rule base synchronization mode indication field, used to indicate whether the synchronization mode is full data synchronization or incremental data synchronization;

the business process rule file sent according to the incremental synchronization instruction is carried in a second response message, and the second response message comprises:

16. A device for generating a business process rule file, characterized by comprising:

an acquisition module, configured to obtain a first sample feature field and a business label of a process characteristic analysis software package (PCAP) sample package;

a clustering module, configured to perform clustering processing on the first sample feature field to obtain a second sample feature field, the second sample feature field being used to characterize a common feature;

an analysis module, configured to analyze the association between the business label and the second sample feature field to obtain an association rule;

a generation module, configured to generate a business process rule file according to the association rule.

17. An electronic device, characterized in that it includes a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps in the method for generating a business process rule file as described in any one of claims 1 to 15.

18. A readable storage medium, characterized in that: the readable storage medium stores a program or instruction, and when the program or instruction is executed by a processor, the steps in the method for generating a business process rule file as described in any one of claims 1 to 15 are implemented.