CN111950573A

CN111950573A - Method and device for clustering abnormal problems

Info

Publication number: CN111950573A
Application number: CN201910407682.0A
Authority: CN
Inventors: 孙佩霞; 刘喜文; 祁宏伟
Original assignee: Beijing Xiaomi Intelligent Technology Co Ltd
Current assignee: Beijing Xiaomi Intelligent Technology Co Ltd
Priority date: 2019-05-16
Filing date: 2019-05-16
Publication date: 2020-11-17

Abstract

The present disclosure relates to a method and device for clustering abnormal problems, which improve the clustering of exception stacks and help identify abnormal problems more accurately. The method includes: extracting exception description information from each of a plurality of exception stacks; calculating the similarity of the exception description information between exception stacks; and clustering the exception stacks according to the similarity and determining the cluster centers.

Description

Method and device for clustering abnormal problems

Technical field

The present disclosure relates to the fields of communication and computer processing, and in particular to a method and device for clustering abnormal problems.

Background

In the related art, a mobile terminal system may encounter a large number of abnormal problems every day. Information about these problems is stored in dedicated stacks, called exception stacks. Clustering the exception stacks classifies the abnormal problems and supports subsequent analysis and other processing. How to cluster more effectively has therefore been a subject of ongoing research in the industry.

Summary of the invention

To overcome the problems in the related art, the present disclosure provides a method and device for clustering abnormal problems.

According to a first aspect of the embodiments of the present disclosure, a method for clustering abnormal problems is provided, including:

extracting exception description information from each of a plurality of exception stacks;

calculating the similarity of the exception description information between exception stacks to obtain a similarity; and

clustering the exception stacks according to the similarity, and determining the cluster centers.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: in this embodiment, exception stacks are clustered according to the exception description information they contain. Because the exception description information describes the abnormal problem more accurately, the clustering result is also better.

In one embodiment, the exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of functions related to the exception.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: this embodiment provides multiple kinds of exception description information, which allows more effective clustering and suits a variety of application scenarios.

In one embodiment, calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:

using an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, to obtain a sub-similarity for each of these items of exception description information;

using an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, to obtain a sub-similarity for each of these items of exception description information; and

obtaining the similarity between the exception stacks from the sub-similarities of the individual items of exception description information.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: this embodiment applies a similarity calculation suited to each kind of exception description information, so the resulting similarity is more accurate, which helps improve the accuracy of the subsequent clustering.

In one embodiment, using the edit distance algorithm to calculate, between exception stacks, the similarity of the library information and class information of the functions related to the exception, to obtain the corresponding sub-similarities, includes:

calculating a sub-similarity for each level of the library information and class information;

averaging the multiple sub-similarities within the same level; and

computing a weighted sum of the sub-similarities of the multiple levels, where a lower stack level is given a larger weight.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: this embodiment provides a finer-grained similarity calculation that takes into account the hierarchical nature of the library information and class information of the functions related to the exception, so the resulting similarity is more accurate.

In one embodiment, before calculating the similarity of the exception description information between exception stacks to obtain the similarity, the method further includes:

uniquely encoding the exception description information in each exception stack to obtain a unique code; and

deduplicating the exception stacks according to the unique codes.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: deduplicating in advance reduces the number of similarity calculations and is therefore more efficient.

In one embodiment, calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:

vector-encoding the exception description information in each exception stack to obtain a vector encoding result; and

calculating the similarity of the vector encoding results between exception stacks to obtain the similarity.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: vector-encoding the exception description information allows the similarity to be calculated more quickly.

In one embodiment, clustering the exception stacks according to the similarity and determining the cluster centers includes:

counting the number of occurrences of each identical exception stack; and

clustering the exception stacks according to the similarity, in descending order of that count, and determining the cluster centers.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: clustering in descending order of the number of identical exception stacks yields the categories with the highest coverage sooner.

In one embodiment, clustering the exception stacks according to the similarity in descending order of the count, and determining the cluster centers, includes:

storing the obtained similarities in a preset storage structure, the storage structure including a two-level hash map or an array; and

traversing the storage structure in descending order of the count, performing the clustering, and determining the cluster centers.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effect: storing and traversing the similarities through a preset storage structure improves the efficiency of the clustering.

According to a second aspect of the embodiments of the present disclosure, a device for clustering abnormal problems is provided, including:

an extraction module, configured to extract exception description information from each of a plurality of exception stacks;

a calculation module, configured to calculate the similarity of the exception description information between exception stacks to obtain a similarity; and

a clustering module, configured to cluster the exception stacks according to the similarity and determine the cluster centers.

In one embodiment, the calculation module includes:

a first calculation submodule, configured to use an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, to obtain a sub-similarity for each of these items of exception description information;

a second calculation submodule, configured to use an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, to obtain a sub-similarity for each of these items of exception description information; and

a statistics submodule, configured to obtain the similarity between the exception stacks from the sub-similarities of the individual items of exception description information.

In one embodiment, the first calculation submodule calculates a sub-similarity for each level of the library information and class information, averages the multiple sub-similarities within the same level, and computes a weighted sum of the sub-similarities of the multiple levels, where a lower stack level is given a larger weight.

In one embodiment, the device further includes:

a unique encoding module, configured to uniquely encode the exception description information in each exception stack to obtain a unique code;

a deduplication module, configured to deduplicate the exception stacks according to the unique codes;

a vector encoding submodule, configured to vector-encode the exception description information in each exception stack to obtain a vector encoding result; and

a third calculation submodule, configured to calculate the similarity of the vector encoding results between exception stacks to obtain the similarity.

In one embodiment, the clustering module includes:

a counting submodule, configured to count the number of occurrences of each identical exception stack; and

a clustering submodule, configured to cluster the exception stacks according to the similarity, in descending order of that count, and determine the cluster centers.

In one embodiment, the clustering submodule stores the obtained similarities in a preset storage structure, the storage structure including a two-level hash map or an array; traverses the storage structure in descending order of the count; performs the clustering; and determines the cluster centers.

According to a third aspect of the embodiments of the present disclosure, a device for clustering abnormal problems is provided, including:

a processor; and

a memory for storing processor-executable instructions;

wherein the processor is configured to:

According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer instructions are stored, characterized in that the instructions, when executed by a processor, implement the above method for clustering abnormal problems.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles.

Fig. 1 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment.

Fig. 2 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment.

Fig. 3 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment.

Fig. 4 is a block diagram of a device for clustering abnormal problems according to an exemplary embodiment.

Fig. 5 is a block diagram of a calculation module according to an exemplary embodiment.

Fig. 6 is a block diagram of a device for clustering abnormal problems according to an exemplary embodiment.

Fig. 7 is a block diagram of a calculation module according to an exemplary embodiment.

Fig. 8 is a block diagram of a clustering module according to an exemplary embodiment.

Fig. 9 is a block diagram of a device according to an exemplary embodiment.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

In the related art, one possible approach to clustering exception stacks is to judge the similarity of the functions in the different exception stacks and cluster on that basis. However, the function names in an exception stack do not necessarily reflect the abnormal problem, so clustering by function name does not work well.

To solve this problem, this embodiment clusters exception stacks according to the exception description information they contain. Because the exception description information describes the abnormal problem directly, the clustering result is better.

Fig. 1 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment. As shown in Fig. 1, the method may be implemented by a computer device and includes the following steps:

In step 101, exception description information is extracted from each of a plurality of exception stacks.

In step 102, the similarity of the exception description information between exception stacks is calculated to obtain a similarity.

In step 103, the exception stacks are clustered according to the similarity, and the cluster centers are determined.

An exception stack in this embodiment is a stack that stores information about an abnormal problem. The exception description information in this embodiment is the information in an exception stack, other than the functions, that relates to the abnormal problem. Because the exception description information describes the abnormal problem more accurately and completely, clustering the exception stacks according to it also gives a more accurate result. Determining the cluster centers means determining the categories obtained by the clustering.

This embodiment is applicable to native crash problems in a mobile terminal system. Such problems occur frequently, on the order of millions per day, and no labeled data is available for them. This embodiment clusters these abnormal problems according to their exception description information, and the results are relatively accurate.

In this embodiment, the exception description information may take several forms. For example, the exception language description information may be the information stored in the abortmessage (abort message) field; it resembles natural language and amounts to a verbal description of the abnormal problem. The exception flag may be the information stored in the abi field, which holds various flags; several of these flags indicate particular kinds of abnormal problems, so the abnormal problem can be determined from the flag stored in this field. The exception value may be the information stored in the signal field, which holds various values; several of these values indicate particular kinds of abnormal problems, so the abnormal problem can be determined from the value stored in this field. The exception code identifier may be the information stored in the code field, which holds the identifiers of various exception codes; because an exception code reflects an abnormal problem, its identifier serves as a marker for that kind of problem, and the abnormal problem can be determined from the identifier stored in this field. The library information and class information of the functions related to the exception may be the library information and class information of the backtrace; this is also natural-language-like information and amounts to a verbal description of the abnormal problem. The backtrace is the function call stack; the language description information of a stack can be determined from the library information and class information of its backtrace, and for an exception stack this information describes the abnormal problem.
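The extraction of these fields can be illustrated with a minimal sketch. The log layout below is an assumption, loosely modeled on the text of an Android native crash report; the disclosure does not specify the input format. The names `ExceptionDescription`, `extract_description`, and `FRAME_RE`, and the choice to keep the signal number and the code name, are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical frame layout: "#00 pc 0001a2b4  /path/lib.so (symbol+offset)"
FRAME_RE = re.compile(r"#(\d+) pc [0-9a-f]+\s+(\S+)(?: \((.*)\+\d+\))?")


@dataclass
class ExceptionDescription:
    abort_message: str = ""   # exception language description information
    abi: str = ""             # exception flag
    signal: str = ""          # exception value
    code: str = ""            # exception code identifier
    # One (library, symbol) pair per backtrace frame, innermost frame first.
    backtrace: List[Tuple[str, str]] = field(default_factory=list)


def extract_description(text: str) -> ExceptionDescription:
    """Pull the description fields out of one crash report (assumed layout)."""
    desc = ExceptionDescription()
    match = re.search(r"ABI: '([^']*)'", text)
    if match:
        desc.abi = match.group(1)
    match = re.search(r"signal (\d+) \((\w+)\), code (-?\d+) \((\w+)\)", text)
    if match:
        desc.signal = match.group(1)   # numeric signal value
        desc.code = match.group(4)     # symbolic code name
    match = re.search(r"Abort message: '([^']*)'", text)
    if match:
        desc.abort_message = match.group(1)
    for frame in FRAME_RE.finditer(text):
        desc.backtrace.append((frame.group(2), frame.group(3) or ""))
    return desc
```

Fields that are absent from a report are simply left empty, so stacks with partial information can still be compared on the fields they share.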

This embodiment provides multiple kinds of exception description information, which reflect the abnormal problem from different angles, allow exception stacks to be clustered more accurately, and suit a variety of application scenarios.

In one embodiment, step 102 includes steps A1 to A3.

In step A1, an edit distance algorithm is used to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, to obtain a sub-similarity for each of these items of exception description information.

In step A2, an exact matching algorithm is used to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, to obtain a sub-similarity for each of these items of exception description information.

In step A3, the similarity between the exception stacks is obtained from the sub-similarities of the individual items of exception description information.

In this embodiment, steps A1 and A2 may be performed in parallel.

In this embodiment, the exception language description information and the library information and class information of the functions related to the exception are all natural-language-like descriptions. An edit distance algorithm is therefore used to calculate the sub-similarity of the exception language description information between different exception stacks, and the sub-similarity of the library information and class information of the functions related to the exception between different exception stacks. An edit distance algorithm measures how similar two character strings are. Several edit distance algorithms exist, and all of them are applicable to this embodiment.

The character strings of the exception flag, the exception value, and the exception code identifier are short and carry no semantics, so they are suited to exact matching. The resulting sub-similarity is either 1 (100% similar) or 0 (completely dissimilar).

Through steps A1 and A2, this embodiment obtains the sub-similarity of each item of exception description information between two exception stacks. Step A3 then aggregates these sub-similarities to obtain the similarity between the two exception stacks, for example by summing the sub-similarities or by computing a weighted sum. The weight of each item's sub-similarity may be configured in advance.
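Steps A1 to A3 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the disclosure: Levenshtein distance is chosen as one possible edit distance algorithm, it is normalized by the longer string length to give a value in [0, 1], and the weights in `WEIGHTS` are hypothetical preconfigured values. The library and class information is left out here for brevity.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Step A1: edit distance mapped to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def exact_similarity(a: str, b: str) -> float:
    """Step A2: 1 for identical strings, 0 otherwise."""
    return 1.0 if a == b else 0.0


# Hypothetical preconfigured weights; the disclosure gives no values.
WEIGHTS = {"abort_message": 0.4, "abi": 0.2, "signal": 0.2, "code": 0.2}


def stack_similarity(x: dict, y: dict, weights: dict = WEIGHTS) -> float:
    """Step A3: weighted sum of the per-item sub-similarities."""
    subs = {
        "abort_message": edit_similarity(x["abort_message"], y["abort_message"]),
        "abi": exact_similarity(x["abi"], y["abi"]),
        "signal": exact_similarity(x["signal"], y["signal"]),
        "code": exact_similarity(x["code"], y["code"]),
    }
    return sum(weights[name] * value for name, value in subs.items())
```

Because the weights sum to 1, the combined similarity also stays in [0, 1], which makes a single clustering threshold easier to choose.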

In one embodiment, step A1 includes steps A11 to A13.

In step A11, a sub-similarity is calculated for each level of the library information and class information.

In step A12, the multiple sub-similarities within the same level are averaged.

In step A13, a weighted sum of the sub-similarities of the multiple levels is computed, where a lower stack level is given a larger weight.

In this embodiment, the library information and class information may span multiple levels, that is, an exception stack may have a multi-level calling relationship. A single level of library information and class information may contain multiple items, that is, there may be multiple exception stacks within one level.

If the library information and class information have only one level, the weighted sum across levels in step A13 may be omitted. If each level contains only one item of library information or class information, the averaging in step A12 may be omitted.

When the sub-similarity of the library information and class information of two exception stacks is calculated, the stacks are compared level by level, the items within each level are compared one by one, and the results are then aggregated. This embodiment treats every item of library information and class information within a level as equally important, so the sub-similarities within a level are averaged to obtain the sub-similarity of that level. For multi-level library information and class information, levels closer to the bottom are considered more important, so a weighted sum of the sub-similarities of the levels is computed, where a lower stack level is given a larger weight.
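The level-by-level comparison of steps A11 to A13 can be sketched as below. Several points are assumptions made for illustration: index 0 is taken to be the lowest stack level, the weights decay geometrically by a hypothetical factor `decay` and are normalized to sum to 1, items within a level are paired by position, and the item-level similarity function `item_sim` (for example an edit-distance-based one) is supplied by the caller.

```python
from typing import Callable, List, Sequence


def layered_similarity(
    levels_x: Sequence[Sequence[str]],
    levels_y: Sequence[Sequence[str]],
    item_sim: Callable[[str, str], float],
    decay: float = 0.5,
) -> float:
    """Weighted sum over levels of the per-level average item similarity."""
    depth = min(len(levels_x), len(levels_y))
    if depth == 0:
        return 0.0
    # Level 0 (assumed lowest) gets the largest weight; weights sum to 1.
    raw: List[float] = [decay ** level for level in range(depth)]
    total = sum(raw)
    score = 0.0
    for level in range(depth):
        # Step A11: compare the items of this level pairwise, by position.
        pairs = list(zip(levels_x[level], levels_y[level]))
        # Step A12: average the sub-similarities within the level.
        level_sim = (
            sum(item_sim(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
        )
        # Step A13: accumulate the weighted per-level similarity.
        score += raw[level] / total * level_sim
    return score
```

Levels or items present in only one of the two stacks are ignored by this sketch; a stricter variant could count them as similarity 0 instead.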

This embodiment provides a finer-grained similarity calculation that takes into account the hierarchical nature of the library information and class information, so the resulting similarity is more accurate.

In one embodiment, before the similarity of the exception description information between exception stacks is calculated to obtain the similarity, the method further includes steps B1 and B2.

In step B1, the exception description information in each exception stack is uniquely encoded to obtain a unique code.

In step B2, the exception stacks are deduplicated according to the unique codes.

Before calculating the similarity, this embodiment may encode all the items of exception description information in an exception stack as a whole, using a unique encoding method. If the exception description information of two exception stacks is not exactly the same, the resulting codes also differ, so exception stacks that are exactly the same can be identified. Deduplication is then performed, so that identical exception stacks do not need to be compared again when similarities are calculated, which reduces the amount of computation. The deduplication is also a preliminary clustering, in that identical exception stacks are grouped into one category.

Unique encoding makes it possible to identify identical exception stacks quickly. Several unique encoding methods are available, such as MD5 (Message-Digest Algorithm 5).
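Steps B1 and B2 can be sketched with the MD5 digest from Python's standard `hashlib` module. The field list `FIELDS`, the separator character, and the function names are assumptions made for illustration. The sketch also keeps a count per unique code, since the same pass yields the number of identical exception stacks.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical field names for the items of exception description information.
FIELDS = ("abort_message", "abi", "signal", "code", "backtrace")


def unique_code(desc: dict) -> str:
    """Step B1: encode all description items of one stack as a single digest."""
    # A separator that should not occur in the fields keeps the items apart.
    joined = "\x1f".join(str(desc.get(name, "")) for name in FIELDS)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def deduplicate(stacks: Iterable[dict]) -> Tuple[Dict[str, dict], Counter]:
    """Step B2: keep one representative per unique code and count duplicates."""
    representatives: Dict[str, dict] = {}
    counts: Counter = Counter()
    for stack in stacks:
        key = unique_code(stack)
        representatives.setdefault(key, stack)
        counts[key] += 1
    return representatives, counts
```

Only the representatives then need pairwise similarity calculations, while the counts record how many reports each representative stands for.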

In one embodiment, step 102 includes steps C1 and C2.

In step C1, the exception description information in each exception stack is vector-encoded to obtain a vector encoding result.

In step C2, the similarity of the vector encoding results between exception stacks is calculated to obtain the similarity.

This embodiment vector-encodes each item of exception description information in an exception stack separately, which makes it quick to compare similarity. The vector encoding method may be a numerical encoding method that encodes each item of exception description information as a corresponding numerical value. In addition, a clustering algorithm that makes use of statistics of the original data may be used when clustering, to further improve the accuracy of the clustering.
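One simple reading of steps C1 and C2 is sketched below: each distinct value of a field is mapped to an integer, so that every exception stack becomes a tuple of integers, and two tuples are compared by the fraction of positions that agree. The disclosure does not fix the encoding or the vector similarity measure, so both choices, and the names `encode_stacks` and `vector_similarity`, are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def encode_stacks(stacks: Sequence[dict], fields: Sequence[str]) -> List[Tuple[int, ...]]:
    """Step C1: map each field value to an integer id, one id table per field."""
    tables: Dict[str, Dict[str, int]] = {name: {} for name in fields}
    vectors = []
    for stack in stacks:
        vector = []
        for name in fields:
            table = tables[name]
            # A value seen for the first time receives the next free id.
            vector.append(table.setdefault(stack[name], len(table)))
        vectors.append(tuple(vector))
    return vectors


def vector_similarity(u: Sequence[int], v: Sequence[int]) -> float:
    """Step C2: fraction of vector components that are equal."""
    if not u:
        return 0.0
    return sum(a == b for a, b in zip(u, v)) / len(u)
```

With this encoding, comparing two stacks costs a handful of integer comparisons, but it only captures exact agreement per field; approximate matching of long text fields would need a richer vector representation.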

在一个实施例中，所述步骤103包括：步骤D1-步骤D2。In one embodiment, the step 103 includes: step D1-step D2.

在步骤D1中，统计相同异常栈的个数。该步骤可以与步骤B2同步进行，去重和统计个数都是确定相同异常栈的过程。In step D1, the number of the same exception stack is counted. This step can be performed synchronously with step B2, and both deduplication and counting are the process of determining the same exception stack.

在步骤D2中，按照个数由高到低的顺序，根据所述相似度对异常栈进行聚类处理，并确定类中心。In step D2, according to the sequence of the number from high to low, cluster processing is performed on the abnormal stack according to the similarity, and the cluster center is determined.

本实施例先统计相同异常栈的个数，相当于先按照相似度100％进行初步聚类，并确定初步聚类后各个类别中异常栈的个数。再按照该个数由高到低的顺序进行二次聚类，相当于优先确定覆盖率较高的类别，也就是异常问题出现次数较多问题较严重的类别。在实际应用中，比较关心异常问题较严重的类别，因此不一定要对所有异常栈进行聚类处理，比如在聚类过程中得到类别数量达到预设数量，或者聚类的覆盖率达到预设的覆盖率，便可结束聚类处理，也是提高了聚类的处理效率。通过本实施例的方法得到的聚类结果，覆盖率又高又准确。In this embodiment, the number of the same exception stacks is first counted, which is equivalent to performing preliminary clustering according to 100% similarity, and determining the number of exception stacks in each category after the preliminary clustering. Then perform secondary clustering in descending order of the number, which is equivalent to prioritizing the category with higher coverage, that is, the category with more abnormal problems and more serious problems. In practical applications, we are more concerned about the categories with more serious exception problems, so it is not necessary to cluster all exception stacks. If the coverage rate is higher, the clustering process can be ended, which also improves the processing efficiency of the clustering. The clustering results obtained by the method of this embodiment have high and accurate coverage.

During clustering, exception stacks whose similarity exceeds a preset threshold are grouped into one category. After one round of clustering, the accuracy of the result may be checked by sampling. If the accuracy does not meet expectations, the similarity threshold may be adjusted and clustering performed again, until the accuracy meets expectations.
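A minimal sketch of threshold-based grouping is given here, assuming a greedy scheme in which stacks are visited in descending order of count and the first stack of each category serves as its class center. The greedy scheme, the choice of center, and the toy `positional_similarity` function are assumptions for illustration; the text above only states that stacks whose similarity exceeds the threshold fall into one category.

```python
from typing import Callable, Dict, List, Tuple

Stack = Tuple[str, ...]


def positional_similarity(a: Stack, b: Stack) -> float:
    """Toy similarity: fraction of positions holding the same frame."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


def cluster_by_threshold(
    ranked: List[Tuple[Stack, int]],
    similarity: Callable[[Stack, Stack], float],
    threshold: float,
) -> Dict[Stack, List[Stack]]:
    """Greedy clustering: join the first center that is similar enough.

    `ranked` must already be sorted by count, highest first, so that the
    most frequent stack of each category becomes its center (assumption).
    """
    clusters: Dict[Stack, List[Stack]] = {}
    for stack, _count in ranked:
        for center in clusters:
            if similarity(center, stack) > threshold:
                clusters[center].append(stack)
                break
        else:
            # No existing center is similar enough: open a new category.
            clusters[stack] = [stack]
    return clusters


# Example data (made up): (stack, count) pairs sorted by count.
ranked = [(("a", "b", "c"), 5), (("a", "b", "d"), 3), (("x", "y", "z"), 2)]
loose = cluster_by_threshold(ranked, positional_similarity, threshold=0.6)
strict = cluster_by_threshold(ranked, positional_similarity, threshold=0.7)
```

Raising the threshold from 0.6 to 0.7 splits the first two stacks into separate categories, which mirrors the tuning loop described above: sample the result, adjust the threshold, and cluster again.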

In one embodiment, step D2 includes steps D21 and D22.

In step D21, the obtained similarities are stored in a preset storage structure; the storage structure includes a two-layer hash map or an array.

In step D22, the storage structure is traversed in descending order of the counts, clustering is performed, and the class centers are determined.

This embodiment stores the previously obtained similarities in a storage structure such as a two-layer hash map or an array, so that the structure can be traversed quickly in descending order of the counts, which improves clustering efficiency.
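One way to picture the two-layer hash map is a dictionary keyed by one stack whose values are dictionaries keyed by the other stack. The sketch below is an assumption about the layout; the class name `SimilarityStore`, the symmetric storage, and the default of 0.0 for unknown pairs are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List


class SimilarityStore:
    """Two-layer hash map: outer key is one stack id, inner key the other."""

    def __init__(self) -> None:
        self._table: Dict[str, Dict[str, float]] = defaultdict(dict)

    def put(self, a: str, b: str, score: float) -> None:
        # Store both directions so lookups do not depend on argument order.
        self._table[a][b] = score
        self._table[b][a] = score

    def get(self, a: str, b: str) -> float:
        if a == b:
            return 1.0
        # Unknown pairs default to 0.0 (assumption for this sketch).
        return self._table.get(a, {}).get(b, 0.0)


def traversal_order(counts: Dict[str, int]) -> List[str]:
    """Stack ids sorted by occurrence count, highest first."""
    return sorted(counts, key=lambda key: counts[key], reverse=True)


# Example data (made up).
store = SimilarityStore()
store.put("s1", "s2", 0.9)
store.put("s1", "s3", 0.2)
order = traversal_order({"s2": 3, "s1": 5, "s3": 1})
```

With this layout, fetching the similarity of any pair during traversal is two hash lookups, rather than a recomputation of the similarity.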

The implementation process is described in detail below through several embodiments.

FIG. 2 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment. As shown in FIG. 2, the method may be implemented by a computer device and includes the following steps:

In step 201, each item of exception description information is extracted from each of a plurality of exception stacks.

In step 202, all exception description information in each exception stack is uniquely encoded as a whole to obtain a unique code.

In step 203, the exception stacks are deduplicated according to the unique codes.

In step 204, each item of exception description information in each exception stack is vector-encoded to obtain vector encoding results.

In step 205, an edit distance algorithm is used to calculate, between exception stacks, the similarity of the vector encoding results of the exception language description information and of the library information and class information of the functions related to the exception, yielding a sub-similarity for each of these items of exception description information.

In step 206, an exact matching algorithm is used to calculate, between exception stacks, the similarity of the vector encoding results of the exception flag, the exception value, and the exception code identifier, yielding a sub-similarity for each of these items of exception description information.

In step 207, the similarity of the exception stacks is obtained from the sub-similarities of the items of exception description information.

In step 208, the exception stacks are clustered according to the similarity, and the class centers are determined.
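Steps 205 to 207 can be sketched as follows, assuming a standard Levenshtein edit distance normalized by the longer length, a 0-or-1 score for exact matching, and an unweighted mean to combine the sub-similarities. The normalization, the combination rule, and the field names (`message`, `library`, `class_name`, `flag`, `value`, `code_id`) are assumptions; this passage does not state how the sub-similarities are combined.

```python
from typing import Dict, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: Sequence, b: Sequence) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def exact_similarity(a: str, b: str) -> float:
    """Exact matching: 1.0 when equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical field names for the six items of description information.
EDIT_FIELDS = ("message", "library", "class_name")
EXACT_FIELDS = ("flag", "value", "code_id")


def stack_similarity(s1: Dict[str, str], s2: Dict[str, str]) -> float:
    """Combine the sub-similarities with an unweighted mean (assumption)."""
    subs = [edit_similarity(s1[f], s2[f]) for f in EDIT_FIELDS]
    subs += [exact_similarity(s1[f], s2[f]) for f in EXACT_FIELDS]
    return sum(subs) / len(subs)
```

The same functions accept either raw strings or the integer sequences produced by vector encoding, since the edit distance only compares elements for equality.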

FIG. 3 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment. As shown in FIG. 3, the method may be implemented by a computer device and includes the following steps:

In step 301, exception description information is extracted from each of a plurality of exception stacks.

In step 302, the exception description information in each exception stack is uniquely encoded to obtain a unique code.

In step 303, the exception stacks are deduplicated according to the unique codes, and the number of identical exception stacks is counted.

In step 304, the similarity of the exception description information between exception stacks is calculated to obtain a similarity.

In step 305, the obtained similarities are stored in a preset storage structure; the storage structure includes a two-layer hash map or an array.

In step 306, the storage structure is traversed in descending order of the counts, clustering is performed, and the class centers are determined.
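Steps 302 and 303 can be illustrated with a hash-based unique code. The use of SHA-256 over a canonical, key-sorted serialization is an assumption made for this sketch; this passage does not name a particular unique encoding.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def unique_code(description: Dict[str, str]) -> str:
    """Hash a canonical serialization of the description information.

    Keys are sorted so that two stacks with the same items always get
    the same code (SHA-256 is a hypothetical choice of encoding).
    """
    canonical = "\x1f".join(f"{key}={description[key]}" for key in sorted(description))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(stacks: List[Dict[str, str]]) -> List[Tuple[Dict[str, str], int]]:
    """Return one representative per unique code with its count, highest first."""
    counts: Counter = Counter()
    first_seen: Dict[str, Dict[str, str]] = {}
    for stack in stacks:
        code = unique_code(stack)
        counts[code] += 1
        first_seen.setdefault(code, stack)
    return [(first_seen[code], n) for code, n in counts.most_common()]


# Example data (made up): the first two stacks carry identical information.
stacks = [
    {"message": "NPE", "flag": "1"},
    {"flag": "1", "message": "NPE"},
    {"message": "OOM", "flag": "2"},
]
unique_stacks = deduplicate(stacks)
```

Because deduplication and counting share a single pass over the unique codes, the counts needed for the later descending-order traversal come at no extra cost.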

The above embodiments may be combined in various ways according to actual needs.

The above description covers the process of clustering abnormal problems, which is implemented by a computer. The internal structure and functions of the device are described below.

FIG. 4 is a schematic diagram of an apparatus for clustering abnormal problems according to an exemplary embodiment. Referring to FIG. 4, the apparatus includes an extraction module 401, a calculation module 402, and a clustering module 403.

The extraction module 401 is configured to extract exception description information from each of a plurality of exception stacks.

The calculation module 402 is configured to calculate the similarity of the exception description information between exception stacks to obtain a similarity.

The clustering module 403 is configured to cluster the exception stacks according to the similarity and determine the class centers.

In one embodiment, as shown in FIG. 5, the calculation module 402 includes a first calculation sub-module 501, a second calculation sub-module 502, and a statistics sub-module 503.

The first calculation sub-module 501 is configured to use an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, yielding a sub-similarity for each of these items of exception description information.

The second calculation sub-module 502 is configured to use an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, yielding a sub-similarity for each of these items of exception description information.

The statistics sub-module 503 is configured to obtain the similarity of the exception stacks from the sub-similarities of the items of exception description information.

In one embodiment, the first calculation sub-module 501 calculates a sub-similarity for each layer in the library information and the class information; averages the multiple sub-similarities at the same level; and performs a weighted summation over the sub-similarities of the multiple levels, where a sub-similarity at a lower stack level is given a larger weight.
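The per-level averaging and weighted summation can be sketched as follows. The linearly decreasing, normalized weights and the convention that index 0 is the lowest stack level are assumptions for illustration; the text only requires that lower stack levels receive larger weights.

```python
from typing import List


def level_weights(num_levels: int) -> List[float]:
    """Linearly decreasing weights that sum to 1 (hypothetical scheme).

    Index 0 is taken to be the lowest stack level and gets the largest weight.
    """
    raw = [num_levels - i for i in range(num_levels)]
    total = sum(raw)
    return [w / total for w in raw]


def layered_similarity(per_level: List[List[float]]) -> float:
    """Average the sub-similarities within each level, then weight and sum.

    per_level[i] holds the sub-similarities of level i, for example the
    library sub-similarity and the class sub-similarity of that frame.
    Each level is assumed to contain at least one value.
    """
    means = [sum(level) / len(level) for level in per_level]
    weights = level_weights(len(means))
    return sum(w * m for w, m in zip(weights, means))


# Example data (made up): three stack levels, lowest level first.
score = layered_similarity([[1.0, 0.5], [0.5], [0.0]])
```

With three levels the weights are 1/2, 1/3, and 1/6, so a mismatch at the lowest level lowers the score three times as much as the same mismatch at the highest level.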

In one embodiment, as shown in FIG. 6, the apparatus further includes a unique encoding module 601 and a deduplication module 602.

The unique encoding module 601 is configured to uniquely encode the exception description information in each exception stack to obtain a unique code.

The deduplication module 602 is configured to deduplicate the exception stacks according to the unique codes.

In one embodiment, as shown in FIG. 7, the calculation module 402 includes a vector encoding sub-module 701 and a third calculation sub-module 702.

The vector encoding sub-module 701 is configured to vector-encode the exception description information in each exception stack to obtain vector encoding results.

The third calculation sub-module 702 is configured to calculate the similarity of the vector encoding results between exception stacks to obtain a similarity.

In one embodiment, as shown in FIG. 8, the clustering module 403 includes a counting sub-module 801 and a clustering sub-module 802.

The counting sub-module 801 is configured to count the number of identical exception stacks.

The clustering sub-module 802 is configured to cluster the exception stacks according to the similarity, in descending order of the counts, and determine the class centers.

In one embodiment, the clustering sub-module 802 stores the obtained similarities in a preset storage structure, the storage structure including a two-layer hash map or an array; traverses the storage structure in descending order of the counts; performs clustering; and determines the class centers.

With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

FIG. 9 is a block diagram of an apparatus 900 for clustering abnormal problems according to an exemplary embodiment. For example, the apparatus 900 may be provided as a computer. Referring to FIG. 9, the apparatus 900 includes a processing component 922, which further includes one or more processors, and memory resources represented by a memory 932 for storing instructions executable by the processing component 922, such as application programs. An application program stored in the memory 932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 922 is configured to execute the instructions to perform the above method for clustering abnormal problems.

The apparatus 900 may also include a power supply component 926 configured to perform power management of the apparatus 900, a wired or wireless network interface 950 configured to connect the apparatus 900 to a network, and an input/output (I/O) interface 958. The apparatus 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

An apparatus for clustering abnormal problems, comprising:

a processor;

wherein the processor is configured to:

The processor may be further configured to:

The exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of the functions related to the exception.

The processor may be further configured to:

Calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:

using an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, yielding a sub-similarity for each of these items of exception description information;

The processor may be further configured to:

Using the edit distance algorithm to calculate, between exception stacks, the similarity of the library information and class information of the functions related to the exception, yielding a sub-similarity for each of these items of exception description information, includes:

averaging the multiple sub-similarities at the same level;

The processor may be further configured to:

Before calculating the similarity of the exception description information between exception stacks to obtain the similarity, the method further includes:

The processor may be further configured to:

Calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:

The processor may be further configured to:

Clustering the exception stacks according to the similarity and determining the class centers includes:

counting the number of identical exception stacks;

The processor may be further configured to:

Clustering the exception stacks according to the similarity, in descending order of the counts, and determining the class centers includes:

A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method for clustering abnormal problems, the method comprising:

The instructions in the storage medium may further include:

counting the number of identical exception stacks;

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for clustering abnormal problems, comprising:

extracting the abnormal description information from a plurality of abnormal stacks respectively;

calculating the similarity of the abnormal description information between abnormal stacks to obtain a similarity;

and clustering the abnormal stacks according to the similarity, and determining a class center.

2. The method for abnormal problem clustering according to claim 1, wherein the abnormal description information includes at least one of: exception language description information, exception tags, exception values, exception code identification, library information for functions associated with the exception, and class information.

3. The method for clustering abnormal problems according to claim 2, wherein calculating the similarity of the abnormal description information between abnormal stacks to obtain the similarity comprises:

calculating, by an edit distance algorithm, the similarity between abnormal stacks of the exception language description information and of the library information and class information of the functions associated with the exception, to obtain a sub-similarity for each corresponding item of abnormal description information;

calculating, by an exact matching algorithm, the similarity between abnormal stacks of the exception tags, the exception values, and the exception code identification, to obtain a sub-similarity for each corresponding item of abnormal description information;

and obtaining the similarity of the abnormal stacks according to the sub-similarity of each item of abnormal description information.

4. The method for clustering abnormal problems according to claim 3, wherein calculating, by the edit distance algorithm, the similarity between abnormal stacks of the library information and class information of the functions associated with the exception, to obtain a sub-similarity for each corresponding item of abnormal description information, comprises:

calculating a sub-similarity for each layer in the library information and the class information;

averaging the multiple sub-similarities at the same level;

and performing a weighted summation over the sub-similarities of the multiple levels, wherein a sub-similarity at a lower stack level corresponds to a larger weight.

5. The method for clustering abnormal problems according to claim 1, wherein before calculating the similarity of the abnormal description information between abnormal stacks to obtain the similarity, the method further comprises:

uniquely encoding the abnormal description information in each abnormal stack to obtain unique codes;

and performing duplicate removal processing on the abnormal stack according to the unique code.

6. The method for clustering abnormal problems according to claim 1, wherein calculating the similarity of the abnormal description information between abnormal stacks to obtain the similarity comprises:

carrying out vector coding on the abnormal description information in each abnormal stack to obtain a vector coding result;

and calculating the similarity of the vector coding results between abnormal stacks to obtain the similarity.

7. The method for clustering abnormal problems according to claim 1, wherein clustering the abnormal stacks according to the similarity and determining the class center comprises:

counting the number of identical abnormal stacks;

and clustering the abnormal stacks according to the similarity, in descending order of the counts, and determining the class center.

8. The method for clustering abnormal problems according to claim 7, wherein clustering the abnormal stacks according to the similarity, in descending order of the counts, and determining the class center comprises:

storing the obtained similarity in a preset storage structure, the storage structure including a two-layer hash map or an array;

and traversing the storage structure in descending order of the counts, performing clustering, and determining the class center.

9. An apparatus for clustering abnormal problems, comprising:

an extraction module, configured to extract the abnormal description information from a plurality of abnormal stacks respectively;

a calculation module, configured to calculate the similarity of the abnormal description information between abnormal stacks to obtain a similarity;

and a clustering module, configured to cluster the abnormal stacks according to the similarity and determine a class center.

10. The apparatus for clustering abnormal problems according to claim 9, wherein the abnormal description information includes at least one of: exception language description information, exception tags, exception values, exception code identification, library information for functions associated with the exception, and class information.

11. The apparatus for clustering abnormal problems according to claim 10, wherein the calculation module comprises:

a first calculation submodule, configured to calculate, by an edit distance algorithm, the similarity between abnormal stacks of the exception language description information and of the library information and class information of the functions associated with the exception, to obtain a sub-similarity for each corresponding item of abnormal description information;

a second calculation submodule, configured to calculate, by an exact matching algorithm, the similarity between abnormal stacks of the exception tags, the exception values, and the exception code identification, to obtain a sub-similarity for each corresponding item of abnormal description information;

and a statistics submodule, configured to obtain the similarity of the abnormal stacks according to the sub-similarity of each item of abnormal description information.

12. The apparatus for clustering abnormal problems according to claim 11, wherein the first calculation submodule calculates a sub-similarity for each layer in the library information and the class information; averages the multiple sub-similarities at the same level; and performs a weighted summation over the sub-similarities of the multiple levels, wherein a sub-similarity at a lower stack level corresponds to a larger weight.

13. The apparatus for abnormal problem clustering according to claim 9, wherein the apparatus further comprises:

the unique coding module is used for uniquely coding the abnormal description information in each abnormal stack to obtain a unique code;

and the duplicate removal module is used for carrying out duplicate removal processing on the abnormal stack according to the unique code.

14. The apparatus for clustering abnormal problems according to claim 9, wherein the calculation module comprises:

a vector coding submodule, configured to carry out vector coding on the abnormal description information in each abnormal stack to obtain a vector coding result;

and a third calculation submodule, configured to calculate the similarity of the vector coding results between abnormal stacks to obtain the similarity.

15. The apparatus for abnormal problem clustering according to claim 9, wherein the clustering module comprises:

a counting submodule, configured to count the number of identical abnormal stacks;

and a clustering submodule, configured to cluster the abnormal stacks according to the similarity, in descending order of the counts, and determine the class center.

16. The apparatus for clustering abnormal problems according to claim 15, wherein the clustering submodule stores the obtained similarity in a preset storage structure, the storage structure including a two-layer hash map or an array; traverses the storage structure in descending order of the counts; performs clustering; and determines the class center.

17. An apparatus for clustering abnormal problems, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

18. A computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the method of any one of claims 1 to 8.