CN111950573A - Method and device for clustering abnormal problems - Google Patents
- Publication number
- CN111950573A, CN201910407682.0A
- Authority
- CN
- China
- Prior art keywords
- abnormal
- similarity
- exception
- clustering
- stack
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to a method and apparatus for clustering abnormal problems, which improve the clustering of exception stacks and make it easier to identify abnormal problems accurately. The method includes: extracting exception description information from each of a plurality of exception stacks; calculating the similarity of the exception description information between exception stacks to obtain a similarity; and clustering the exception stacks according to the similarity and determining the cluster centers.
Description
Technical Field
The present disclosure relates to the fields of communication and computer processing, and in particular to a method and apparatus for clustering abnormal problems.
Background
In the related art, a mobile terminal system may experience a large number of abnormal problems every day. Information about these abnormal problems is stored in dedicated stacks, called exception stacks. Clustering the exception stacks classifies the abnormal problems, which helps with subsequent analysis and other processing. How to cluster more effectively has therefore long been studied in the industry.
Summary of the Invention
To overcome the problems in the related art, the present disclosure provides a method and apparatus for clustering abnormal problems.
According to a first aspect of the embodiments of the present disclosure, a method for clustering abnormal problems is provided, including:
extracting exception description information from each of a plurality of exception stacks;
calculating the similarity of the exception description information between exception stacks to obtain a similarity;
clustering the exception stacks according to the similarity and determining the cluster centers.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: the exception stacks are clustered according to the exception description information they contain, and because this information describes the abnormal problem more accurately, the clustering result is also better.
In one embodiment, the exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of the functions related to the exception.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: this embodiment provides several kinds of exception description information, which allows more effective clustering and suits a variety of application scenarios.
In one embodiment, calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:
using an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, obtaining a sub-similarity for each of these items of exception description information;
using an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, obtaining a sub-similarity for each of these items of exception description information;
obtaining the similarity of the exception stacks from the sub-similarity of each item of exception description information.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: this embodiment uses a similarity calculation suited to each kind of exception description information, so the resulting similarity is more accurate, which helps improve the accuracy of the subsequent clustering.
In one embodiment, using the edit distance algorithm to calculate, between exception stacks, the similarity of the library information and class information of the functions related to the exception, and obtaining the corresponding sub-similarities, includes:
calculating a sub-similarity for each level of the library information and class information;
averaging the multiple sub-similarities of the same level;
computing a weighted sum of the sub-similarities of the multiple levels, where the sub-similarity of a lower stack level is given a larger weight.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: this embodiment provides a finer-grained similarity calculation that reflects the hierarchical nature of the library information and class information of the functions related to the exception, so the resulting similarity is more accurate.
In one embodiment, before calculating the similarity of the exception description information between exception stacks to obtain the similarity, the method further includes:
uniquely encoding the exception description information in each exception stack to obtain a unique code;
deduplicating the exception stacks according to the unique code.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: deduplicating in advance reduces the number of similarity calculations and is therefore more efficient.
In one embodiment, calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:
vector-encoding the exception description information in each exception stack to obtain a vector encoding result;
calculating the similarity of the vector encoding results between exception stacks to obtain the similarity.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: vector-encoding the exception description information allows the similarity to be calculated more quickly.
In one embodiment, clustering the exception stacks according to the similarity and determining the cluster centers includes:
counting the number of identical exception stacks;
clustering the exception stacks according to the similarity, in descending order of that count, and determining the cluster centers.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: clustering in descending order of the number of identical exception stacks yields the categories with higher coverage more quickly.
In one embodiment, clustering the exception stacks according to the similarity in descending order of the count and determining the cluster centers includes:
storing the obtained similarities in a preset storage structure, the storage structure including a two-layer hash map or an array;
traversing the storage structure in descending order of the count, performing the clustering, and determining the cluster centers.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effect: storing and traversing the similarities through a preset storage structure improves the efficiency of the clustering.
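As an illustration of the two-layer hash map mentioned above, the following minimal Python sketch stores pairwise similarities in a dictionary of dictionaries. The class name `SimilarityStore`, its methods, and the choice to store each unordered pair only once are assumptions made for this sketch; the disclosure does not specify the layout.

```python
from collections import defaultdict


class SimilarityStore:
    """Hypothetical two-layer hash map: outer key -> inner key -> similarity."""

    def __init__(self):
        self._table = defaultdict(dict)

    def put(self, a, b, sim):
        # Store each unordered pair once, under its sorted key order (an assumption).
        lo, hi = sorted((a, b))
        self._table[lo][hi] = sim

    def get(self, a, b, default=None):
        if a == b:
            return 1.0  # a stack is treated as fully similar to itself
        lo, hi = sorted((a, b))
        return self._table.get(lo, {}).get(hi, default)


store = SimilarityStore()
store.put("stack_b", "stack_a", 0.8)
print(store.get("stack_a", "stack_b"))
```

Keying the outer map by one stack and the inner map by the other gives constant-time lookup of any pair during the traversal, at the cost of holding all computed pairs in memory.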
According to a second aspect of the embodiments of the present disclosure, an apparatus for clustering abnormal problems is provided, including:
an extraction module, configured to extract exception description information from each of a plurality of exception stacks;
a calculation module, configured to calculate the similarity of the exception description information between exception stacks to obtain a similarity;
a clustering module, configured to cluster the exception stacks according to the similarity and determine the cluster centers.
In one embodiment, the exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of the functions related to the exception.
In one embodiment, the calculation module includes:
a first calculation sub-module, configured to use an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, obtaining a sub-similarity for each of these items of exception description information;
a second calculation sub-module, configured to use an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, obtaining a sub-similarity for each of these items of exception description information;
a statistics sub-module, configured to obtain the similarity of the exception stacks from the sub-similarity of each item of exception description information.
In one embodiment, the first calculation sub-module calculates a sub-similarity for each level of the library information and class information; averages the multiple sub-similarities of the same level; and computes a weighted sum of the sub-similarities of the multiple levels, where the sub-similarity of a lower stack level is given a larger weight.
In one embodiment, the apparatus further includes:
a unique encoding module, configured to uniquely encode the exception description information in each exception stack to obtain a unique code;
a deduplication module, configured to deduplicate the exception stacks according to the unique code.
In one embodiment, the calculation module includes:
a vector encoding sub-module, configured to vector-encode the exception description information in each exception stack to obtain a vector encoding result;
a third calculation sub-module, configured to calculate the similarity of the vector encoding results between exception stacks to obtain the similarity.
In one embodiment, the clustering module includes:
a counting sub-module, configured to count the number of identical exception stacks;
a clustering sub-module, configured to cluster the exception stacks according to the similarity, in descending order of that count, and determine the cluster centers.
In one embodiment, the clustering sub-module stores the obtained similarities in a preset storage structure, the storage structure including a two-layer hash map or an array; and traverses the storage structure in descending order of the count, performs the clustering, and determines the cluster centers.
According to a third aspect of the embodiments of the present disclosure, an apparatus for clustering abnormal problems is provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
extract exception description information from each of a plurality of exception stacks;
calculate the similarity of the exception description information between exception stacks to obtain a similarity;
cluster the exception stacks according to the similarity and determine the cluster centers.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided on which computer instructions are stored, characterized in that the instructions, when executed by a processor, implement the above method for clustering abnormal problems.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Description of Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles.
Fig. 1 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment.
Fig. 2 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment.
Fig. 3 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment.
Fig. 4 is a block diagram of an apparatus for clustering abnormal problems according to an exemplary embodiment.
Fig. 5 is a block diagram of a calculation module according to an exemplary embodiment.
Fig. 6 is a block diagram of an apparatus for clustering abnormal problems according to an exemplary embodiment.
Fig. 7 is a block diagram of a calculation module according to an exemplary embodiment.
Fig. 8 is a block diagram of a clustering module according to an exemplary embodiment.
Fig. 9 is a block diagram of an apparatus according to an exemplary embodiment.
Detailed Description
Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
In the related art, one possible way to cluster exception stacks is to judge the similarity of the functions in the different exception stacks and cluster on that basis. However, the function names in an exception stack do not necessarily reflect the abnormal problem, so clustering by function name does not work well.
To solve this problem, this embodiment clusters exception stacks according to the exception description information they contain. The exception description information describes the abnormal problem directly, so the clustering result is better.
Fig. 1 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment. As shown in Fig. 1, the method can be implemented by a computer device and includes the following steps:
In step 101, exception description information is extracted from each of a plurality of exception stacks.
In step 102, the similarity of the exception description information between exception stacks is calculated to obtain a similarity.
In step 103, the exception stacks are clustered according to the similarity, and the cluster centers are determined.
An exception stack in this embodiment is a stack that stores information about an abnormal problem. The exception description information in this embodiment is the information in the exception stack, other than the functions, that relates to the abnormal problem. Because the exception description information describes the abnormal problem more accurately and completely, clustering the exception stacks by this information also gives a more accurate result. Determining the cluster centers amounts to determining the categories obtained after clustering.
This embodiment applies to native crash problems in mobile terminal systems. Such problems occur frequently: their number has reached the order of millions per day, and no labeled data is available. This embodiment clusters these abnormal problems according to their exception description information, and the result is relatively accurate.
In one embodiment, the exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of the functions related to the exception.
In this embodiment there may be several kinds of exception description information. For example, the exception language description information may be the information stored in the abortmessage (abort message) field; it resembles natural language and amounts to a verbal description of the abnormal problem. The exception flag may be the information stored in the abi field. The abi field stores various flags, several of which indicate particular kinds of abnormal problems, so the abnormal problem can be determined from the flag stored in this field. The exception value may be the information stored in the signal (value) field. The signal field stores various values, several of which indicate particular kinds of abnormal problems, so the abnormal problem can be determined from the value stored in this field. The exception code identifier may be the information stored in the code field. The code field stores the identifiers of various exception codes; an exception code reflects an abnormal problem, so its identifier marks the kind of abnormal problem, and the abnormal problem can be determined from the identifier stored in this field. The library information and class information of the functions related to the exception may be the library information and class information of the backtrace; they resemble natural language and amount to a verbal description of the abnormal problem. backtrace is a call-stack function: the language description of a stack can be determined from the library information and class information of the backtrace, and for an exception stack this information describes the abnormal problem.
This embodiment provides several kinds of exception description information that reflect the abnormal problem from different angles, so exception stacks can be clustered more accurately and the approach suits a variety of application scenarios.
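The fields listed above can be gathered into one record per exception stack. The following Python sketch shows one possible shape; the class name `ExceptionDescription`, the attribute names, the nested-tuple layout of the backtrace, and all sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExceptionDescription:
    """Hypothetical container for the exception description information of one stack."""
    abort_message: str  # exception language description (abortmessage field)
    abi: str            # exception flag (abi field)
    signal: str         # exception value (signal field)
    code: str           # exception code identifier (code field)
    # Library/class information of the backtrace: one tuple of entries per stack level.
    backtrace: Tuple[Tuple[str, ...], ...] = ()


# Illustrative values only; they are not taken from the disclosure.
sample = ExceptionDescription(
    abort_message="Check failed: buffer != nullptr",
    abi="arm64",
    signal="6",
    code="-6",
    backtrace=(("libc.so abort",), ("libexample.so Example::run",)),
)
print(sample.signal, len(sample.backtrace))
```

Making the record immutable (`frozen=True`) lets it serve as a dictionary key, which is convenient for the counting and deduplication steps described later.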
In one embodiment, step 102 includes step A1 to step A3.
In step A1, an edit distance algorithm is used to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, obtaining a sub-similarity for each of these items of exception description information.
In step A2, an exact matching algorithm is used to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, obtaining a sub-similarity for each of these items of exception description information.
In step A3, the similarity of the exception stacks is obtained from the sub-similarity of each item of exception description information.
In this embodiment, step A1 and step A2 may be performed at the same time.
In this embodiment, the exception language description information and the library information and class information of the functions related to the exception are all descriptions that resemble natural language. An edit distance algorithm is therefore used to calculate, between different exception stacks, the sub-similarity of the exception language description information and the sub-similarity of the library information and class information. An edit distance algorithm calculates the similarity between two strings. Several edit distance algorithms exist, and all of them are applicable to this embodiment.
The strings of the exception flag, the exception value, and the exception code identifier are short and carry no semantics, so they suit an exact matching algorithm. The resulting sub-similarity is either 1 (100% similar) or 0 (not similar at all).
Through step A1 and step A2, this embodiment obtains the sub-similarity of each item of exception description information between two exception stacks. Step A3 then aggregates these sub-similarities to obtain the similarity between the two exception stacks, for example as a sum or a weighted sum of the sub-similarities. The weight of each item's sub-similarity can be configured in advance.
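Steps A1 to A3 can be sketched as follows in Python. This is a minimal sketch under stated assumptions: the Levenshtein distance is used as the edit distance, it is normalized to a similarity in [0, 1] by dividing by the longer string's length, the backtrace is treated here as a single string, and the field names and weights in `WEIGHTS` are hypothetical. The disclosure only says that the weights can be preconfigured.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Step A1: normalized similarity in [0, 1] (the normalization is an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def exact_similarity(a: str, b: str) -> float:
    """Step A2: 1 if the two values are identical, otherwise 0."""
    return 1.0 if a == b else 0.0


# Hypothetical preconfigured weights; they sum to 1 so the result stays in [0, 1].
WEIGHTS = {"abort_message": 0.4, "backtrace": 0.3, "abi": 0.1, "signal": 0.1, "code": 0.1}
EDIT_FIELDS = ("abort_message", "backtrace")


def stack_similarity(x: dict, y: dict, weights: dict = WEIGHTS) -> float:
    """Step A3: weighted sum of the per-field sub-similarities."""
    total = 0.0
    for name, weight in weights.items():
        compare = edit_similarity if name in EDIT_FIELDS else exact_similarity
        total += weight * compare(x[name], y[name])
    return total


a = {"abort_message": "null pointer dereference", "backtrace": "libfoo.so Foo::run",
     "abi": "arm64", "signal": "11", "code": "1"}
b = dict(a, abort_message="null pointer dereferenced")
print(round(stack_similarity(a, b), 3))
```

With weights that sum to 1, two stacks that differ only in one exact-match field lose exactly that field's weight, which makes the contribution of each item easy to reason about when tuning.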
In one embodiment, step A1 includes step A11 to step A13.
In step A11, a sub-similarity is calculated for each level of the library information and class information.
In step A12, the multiple sub-similarities of the same level are averaged.
In step A13, a weighted sum of the sub-similarities of the multiple levels is computed, where the sub-similarity of a lower stack level is given a larger weight.
In this embodiment, the library information and class information may have multiple levels, that is, the exception stack may have a multi-level calling relationship. One level of library information and class information may contain multiple items, that is, there may be multiple exception stacks within one level.
If the library information and class information have only one level, step A13 can be omitted. If each level contains only one item of library information or class information, step A12 can be omitted.
When the sub-similarity of the library information and class information of two exception stacks is calculated, the stacks are compared level by level, the items within the same level are compared one by one, and the results are then aggregated. This embodiment treats every item of library information and class information within a level as equally important, so the multiple sub-similarities of a level are averaged to obtain the sub-similarity of that level. For library information and class information with multiple levels, levels closer to the bottom are considered more important, so a weighted sum of the per-level sub-similarities is computed in which the sub-similarity of a lower stack level is given a larger weight.
This embodiment provides a more detailed similarity calculation that reflects the hierarchical nature of the library information and class information, so the resulting similarity is more accurate.
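Steps A11 to A13 can be sketched as follows. The sketch makes several assumptions that the disclosure leaves open: a backtrace is a list of levels with index 0 as the lowest level, each level is a list of strings, items are paired by position, a missing item is compared against an empty string, and the level weights decrease linearly and are normalized to sum to 1. The function names are hypothetical.

```python
from itertools import zip_longest


def edit_similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1] (normalization is an assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - prev[-1] / longest


def level_similarity(level_a, level_b) -> float:
    """Steps A11 and A12: compare items pairwise, then average within the level."""
    pairs = list(zip_longest(level_a, level_b, fillvalue=""))
    if not pairs:
        return 1.0
    return sum(edit_similarity(a, b) for a, b in pairs) / len(pairs)


def backtrace_similarity(bt_a, bt_b) -> float:
    """Step A13: weighted sum over levels; level 0 (assumed lowest) weighs most."""
    levels = list(zip_longest(bt_a, bt_b, fillvalue=()))
    n = len(levels)
    if n == 0:
        return 1.0
    raw = [n - i for i in range(n)]  # hypothetical linear weights: n, n-1, ..., 1
    total = sum(raw)
    return sum(w / total * level_similarity(a, b) for w, (a, b) in zip(raw, levels))


same_bottom = backtrace_similarity([["libc.so"], ["abc"]], [["libc.so"], ["xyz"]])
same_top = backtrace_similarity([["abc"], ["libc.so"]], [["xyz"], ["libc.so"]])
print(same_bottom > same_top)  # a match at the lower level counts for more
```

With two levels the weights are 2/3 and 1/3, so a match at level 0 alone yields 2/3 while a match at level 1 alone yields 1/3.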
In one embodiment, before the similarity of the exception description information between exception stacks is calculated to obtain the similarity, the method further includes steps B1 and B2.
In step B1, the exception description information in each exception stack is uniquely encoded to obtain a unique code.
In step B2, the exception stacks are deduplicated according to the unique code.
In this embodiment, before the similarity is calculated, all items of exception description information in an exception stack may first be encoded as a whole using a unique encoding method. If the exception description information of two exception stacks is not exactly the same, the resulting codes also differ, so exactly identical exception stacks can be identified. Deduplication is then performed, so identical exception stacks need not be compared again in the later similarity comparison, which reduces the amount of similarity computation. The deduplication is also a preliminary clustering step, in that exactly identical exception stacks are grouped into one class.
Unique encoding allows exactly identical exception stacks to be found quickly. Many unique encoding methods are available, such as the MD5 algorithm (Message-Digest Algorithm 5).
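Steps B1 and B2 can be sketched with Python's standard `hashlib`. The field names, the sorted-key canonical form, and the `\x1f` separator are assumptions made for illustration; the text only specifies that the description information is encoded as a whole, with MD5 as one possible method.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Tuple


def unique_code(fields: Dict[str, str]) -> str:
    """Step B1: hash all description fields of one exception stack as a whole."""
    # Sorting the keys makes the code independent of dictionary ordering.
    canonical = "\x1f".join(f"{key}={fields[key]}" for key in sorted(fields))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def deduplicate(
    stacks: Iterable[Dict[str, str]],
) -> Tuple[Dict[str, Dict[str, str]], Counter]:
    """Step B2: keep one representative per code and count the duplicates."""
    representatives: Dict[str, Dict[str, str]] = {}
    counts: Counter = Counter()
    for stack in stacks:
        code = unique_code(stack)
        representatives.setdefault(code, stack)  # first occurrence wins
        counts[code] += 1
    return representatives, counts
```

The per-code counts come for free during deduplication, which is convenient for any later step that needs to know how often each identical stack occurred. MD5 is not collision-resistant against deliberate attacks, so a different digest may be preferable if the input is untrusted.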
In one embodiment, step 102 includes steps C1 and C2.
In step C1, the exception description information in each exception stack is vector encoded to obtain a vector encoding result.
In step C2, the similarity of the vector encoding results between exception stacks is calculated to obtain the similarity.
In this embodiment, each item of exception description information in an exception stack is vector encoded separately, which makes similarity comparison faster. The vector encoding method may be a numerical encoding method that encodes each item of exception description information into a corresponding numerical value. In addition, a clustering algorithm that carries statistical information about the original data can be used during clustering to further improve clustering accuracy.
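One possible reading of steps C1 and C2 is sketched below. The text does not define the numerical encoding, so the scheme here is entirely an assumption: each distinct value of a field receives an integer ID, a stack becomes a tuple of IDs in a fixed field order, and similarity is the fraction of positions that match. The class and field names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


class VectorEncoder:
    """Assign an integer ID to each distinct value of each field (assumed scheme)."""

    def __init__(self, field_order: Sequence[str]) -> None:
        self.field_order = tuple(field_order)
        self._ids: Dict[str, Dict[str, int]] = {f: {} for f in self.field_order}

    def encode(self, stack: Dict[str, str]) -> Tuple[int, ...]:
        """Step C1: turn one exception stack into a tuple of integer codes."""
        vector = []
        for field in self.field_order:
            table = self._ids[field]
            value = stack.get(field, "")
            # A value seen for the first time gets the next free ID for its field.
            vector.append(table.setdefault(value, len(table)))
        return tuple(vector)


def vector_similarity(u: Sequence[int], v: Sequence[int]) -> float:
    """Step C2: fraction of positions whose codes are equal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        return 1.0
    return sum(1 for x, y in zip(u, v) if x == y) / len(u)
```

Comparing small integers is cheaper than comparing the original strings, which is the speed benefit the embodiment points to. Fields that need a graded similarity, such as free-text messages, would need a richer encoding than a single ID.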
In one embodiment, step 103 includes steps D1 and D2.
In step D1, the number of occurrences of each identical exception stack is counted. This step may be performed together with step B2, since deduplication and counting both involve identifying identical exception stacks.
In step D2, the exception stacks are clustered according to the similarity, in descending order of count, and the cluster centers are determined.
This embodiment first counts identical exception stacks, which amounts to a preliminary clustering at 100% similarity, and determines the number of exception stacks in each category after that preliminary clustering. A second clustering is then performed in descending order of that count, which amounts to giving priority to categories with higher coverage, that is, categories whose abnormal problems occur more often and are more serious. In practice, the categories with more serious abnormal problems are of most concern, so it is not necessary to cluster all exception stacks. For example, clustering can end once the number of categories obtained reaches a preset number, or once the coverage of the clusters reaches a preset coverage, which also improves clustering efficiency. The clustering result obtained with the method of this embodiment has high coverage and is accurate.
In the clustering process, exception stacks whose similarity exceeds a preset threshold are grouped into one category. After one round of clustering, the accuracy of the clustering can be checked by sampling. If the accuracy falls short of expectations, the similarity threshold can be adjusted and clustering run again, until the accuracy meets expectations.
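A count-ordered, threshold-based clustering with the two early-stop conditions can be sketched as below. The text does not say how a cluster center is chosen; taking the most frequent unassigned stack as the center is an assumption, as are the function name, the parameter names, and the greedy single-pass assignment.

```python
from typing import Callable, Dict, List, Optional


def cluster_by_count(
    counts: Dict[str, int],
    similarity: Callable[[str, str], float],
    threshold: float,
    max_clusters: Optional[int] = None,
    target_coverage: Optional[float] = None,
) -> List[dict]:
    """Greedy clustering of deduplicated stacks in descending order of count."""
    total = sum(counts.values())
    # Ties are broken by key so the result is deterministic.
    order = sorted(counts, key=lambda key: (-counts[key], key))
    assigned = set()
    clusters: List[dict] = []
    covered = 0
    for center in order:
        if center in assigned:
            continue
        # Early stop: enough categories, or enough of the raw stacks covered.
        if max_clusters is not None and len(clusters) >= max_clusters:
            break
        if target_coverage is not None and total and covered / total >= target_coverage:
            break
        members = [center]
        assigned.add(center)
        for other in order:
            if other not in assigned and similarity(center, other) > threshold:
                members.append(other)
                assigned.add(other)
        size = sum(counts[m] for m in members)
        covered += size
        clusters.append({"center": center, "members": members, "size": size})
    return clusters
```

Because the most frequent stacks are processed first, stopping early still yields the categories that account for most occurrences. Re-running the function with a different `threshold` corresponds to the sample-and-adjust loop described above.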
In one embodiment, step D2 includes steps D21 and D22.
In step D21, the obtained similarities are stored in a preset storage structure; the storage structure includes a two-layer hash map or an array.
In step D22, the storage structure is traversed in descending order of count, clustering is performed, and the cluster centers are determined.
This embodiment stores the previously obtained similarities in a storage structure such as a two-layer hash map or an array, so that the structure can be traversed quickly in descending order of count, which improves clustering efficiency.
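A two-layer hash map for pairwise similarities might look like the following sketch. The class name, the symmetric storage, and the default of 0.0 for unknown pairs are assumptions; the text only names the structure.

```python
from collections import defaultdict
from typing import Dict, List


class SimilarityStore:
    """Two-layer hash map: outer key is one stack code, inner key is the other."""

    def __init__(self) -> None:
        self._table: Dict[str, Dict[str, float]] = defaultdict(dict)

    def put(self, a: str, b: str, score: float) -> None:
        # Store both directions so a lookup never needs to order the keys.
        self._table[a][b] = score
        self._table[b][a] = score

    def get(self, a: str, b: str, default: float = 0.0) -> float:
        if a == b:
            return 1.0
        # .get() on the outer map avoids creating empty inner maps on lookup.
        return self._table.get(a, {}).get(b, default)

    def neighbours(self, a: str, threshold: float) -> List[str]:
        """All stacks whose stored similarity to `a` exceeds the threshold."""
        row = self._table.get(a, {})
        return sorted(key for key, score in row.items() if score > threshold)
```

During traversal, the stacks would be visited from the highest count down and `neighbours` called for each candidate center, so every similarity is read by hash lookup instead of being recomputed. A dense array indexed by stack position is the alternative the text mentions and may be faster when most pairs have a stored value.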
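The implementation process is described in detail below through several embodiments.
FIG. 2 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment. As shown in FIG. 2, the method can be implemented by a computer device and includes the following steps:
In step 201, the items of exception description information are extracted from each of a plurality of exception stacks.
In step 202, all the exception description information in each exception stack is uniquely encoded as a whole to obtain a unique code.
In step 203, the exception stacks are deduplicated according to the unique code.
In step 204, each item of exception description information in each exception stack is vector encoded to obtain a vector encoding result.
In step 205, an edit distance algorithm is used to calculate, between exception stacks, the similarity of the respective vector encoding results of the exception language description information and of the library information and class information of the functions related to the exception, giving a sub-similarity for each of these items of exception description information.
In step 206, an exact matching algorithm is used to calculate, between exception stacks, the similarity of the respective vector encoding results of the exception flag, the exception value, and the exception code identifier, giving a sub-similarity for each of these items of exception description information.
In step 207, the similarity of the exception stacks is obtained from the sub-similarities of the items of exception description information.
In step 208, the exception stacks are clustered according to the similarity, and the cluster centers are determined.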
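Steps 205 to 207 can be sketched as follows, working directly on strings for brevity rather than on vector encoding results. The field names are hypothetical, and combining the sub-similarities by an (optionally weighted) mean is an assumption, because the text does not specify how step 207 aggregates them.

```python
from typing import Dict, Optional

# Hypothetical field names for the items of exception description information.
EDIT_FIELDS = ("message", "library", "class_name")   # compared by edit distance
EXACT_FIELDS = ("flag", "value", "code_id")          # compared by exact match


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Step 205: normalized edit-distance similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def exact_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Step 206: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def stack_similarity(s1: Dict[str, str], s2: Dict[str, str],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Step 207: combine the sub-similarities (weighted mean is an assumption)."""
    subs = {f: edit_similarity(s1.get(f, ""), s2.get(f, "")) for f in EDIT_FIELDS}
    subs.update({f: exact_similarity(s1.get(f), s2.get(f)) for f in EXACT_FIELDS})
    if weights is None:
        weights = {f: 1.0 for f in subs}
    total = sum(weights[f] for f in subs)
    return sum(weights[f] * subs[f] for f in subs) / total
```

Free-text fields tolerate small differences through the edit distance, while identifiers such as flags and codes either match or do not. Two stacks that differ only in their flag therefore score 5/6 under equal weights.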
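FIG. 3 is a flowchart of a method for clustering abnormal problems according to an exemplary embodiment. As shown in FIG. 3, the method can be implemented by a computer device and includes the following steps:
In step 301, exception description information is extracted from each of a plurality of exception stacks.
In step 302, the exception description information in each exception stack is uniquely encoded to obtain a unique code.
In step 303, the exception stacks are deduplicated according to the unique code, and the number of occurrences of each identical exception stack is counted.
In step 304, the similarity of the exception description information between exception stacks is calculated to obtain the similarity.
In step 305, the obtained similarities are stored in a preset storage structure; the storage structure includes a two-layer hash map or an array.
In step 306, the storage structure is traversed in descending order of count, clustering is performed, and the cluster centers are determined.
The above embodiments can be combined in various ways according to actual needs.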
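The above describes how the clustering of abnormal problems is implemented by a computer. The internal structure and functions of the device are described below.
FIG. 4 is a schematic diagram of an apparatus for clustering abnormal problems according to an exemplary embodiment. Referring to FIG. 4, the apparatus includes an extraction module 401, a calculation module 402, and a clustering module 403.
The extraction module 401 is configured to extract exception description information from each of a plurality of exception stacks.
The calculation module 402 is configured to calculate the similarity of the exception description information between exception stacks to obtain the similarity.
The clustering module 403 is configured to cluster the exception stacks according to the similarity and to determine the cluster centers.
In one embodiment, the exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of functions related to the exception.
In one embodiment, as shown in FIG. 5, the calculation module 402 includes a first calculation sub-module 501, a second calculation sub-module 502, and a statistics sub-module 503.
The first calculation sub-module 501 is configured to use an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, giving a sub-similarity for each of these items of exception description information.
The second calculation sub-module 502 is configured to use an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, giving a sub-similarity for each of these items of exception description information.
The statistics sub-module 503 is configured to obtain the similarity of the exception stacks from the sub-similarities of the items of exception description information.
In one embodiment, the first calculation sub-module 501 calculates a sub-similarity for each layer of the library information and class information, averages the sub-similarities within the same layer, and performs a weighted sum of the sub-similarities of the layers, in which a lower stack level receives a larger weight.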
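In one embodiment, as shown in FIG. 6, the apparatus further includes a unique encoding module 601 and a deduplication module 602.
The unique encoding module 601 is configured to uniquely encode the exception description information in each exception stack to obtain a unique code.
The deduplication module 602 is configured to deduplicate the exception stacks according to the unique code.
In one embodiment, as shown in FIG. 7, the calculation module 402 includes a vector encoding sub-module 701 and a third calculation sub-module 702.
The vector encoding sub-module 701 is configured to vector encode the exception description information in each exception stack to obtain a vector encoding result.
The third calculation sub-module 702 is configured to calculate the similarity of the vector encoding results between exception stacks to obtain the similarity.
In one embodiment, as shown in FIG. 8, the clustering module 403 includes a counting sub-module 801 and a clustering sub-module 802.
The counting sub-module 801 is configured to count the number of occurrences of each identical exception stack.
The clustering sub-module 802 is configured to cluster the exception stacks according to the similarity, in descending order of count, and to determine the cluster centers.
In one embodiment, the clustering sub-module 802 stores the obtained similarities in a preset storage structure, the storage structure including a two-layer hash map or an array, then traverses the storage structure in descending order of count, performs clustering, and determines the cluster centers.
For the apparatus in the above embodiments, the specific manner in which each module operates has been described in detail in the method embodiments and is not repeated here.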
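FIG. 9 is a block diagram of an apparatus 900 for clustering abnormal problems according to an exemplary embodiment. For example, the apparatus 900 may be provided as a computer. Referring to FIG. 9, the apparatus 900 includes a processing component 922, which further includes one or more processors, and memory resources represented by a memory 932 for storing instructions executable by the processing component 922, such as application programs. The application programs stored in the memory 932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 922 is configured to execute the instructions to perform the above method for clustering abnormal problems.
The apparatus 900 may also include a power component 926 configured to perform power management of the apparatus 900, a wired or wireless network interface 950 configured to connect the apparatus 900 to a network, and an input/output (I/O) interface 958. The apparatus 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.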
An apparatus for clustering abnormal problems, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
extract exception description information from each of a plurality of exception stacks;
calculate the similarity of the exception description information between exception stacks to obtain the similarity;
cluster the exception stacks according to the similarity, and determine the cluster centers.
The processor may also be configured such that:
the exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of functions related to the exception.
The processor may also be configured such that:
calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:
using an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, giving a sub-similarity for each of these items of exception description information;
using an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, giving a sub-similarity for each of these items of exception description information;
obtaining the similarity of the exception stacks from the sub-similarities of the items of exception description information.
The processor may also be configured such that:
using the edit distance algorithm to calculate, between exception stacks, the similarity of the library information and class information of the functions related to the exception, giving a sub-similarity for each of these items of exception description information, includes:
calculating a sub-similarity for each layer of the library information and class information;
averaging the sub-similarities within the same layer;
performing a weighted sum of the sub-similarities of the layers, in which a lower stack level receives a larger weight.
The processor may also be configured such that:
before the similarity of the exception description information between exception stacks is calculated to obtain the similarity, the method further includes:
uniquely encoding the exception description information in each exception stack to obtain a unique code;
deduplicating the exception stacks according to the unique code.
The processor may also be configured such that:
calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:
vector encoding the exception description information in each exception stack to obtain a vector encoding result;
calculating the similarity of the vector encoding results between exception stacks to obtain the similarity.
The processor may also be configured such that:
clustering the exception stacks according to the similarity and determining the cluster centers includes:
counting the number of occurrences of each identical exception stack;
clustering the exception stacks according to the similarity, in descending order of count, and determining the cluster centers.
The processor may also be configured such that:
clustering the exception stacks according to the similarity, in descending order of count, and determining the cluster centers includes:
storing the obtained similarities in a preset storage structure, the storage structure including a two-layer hash map or an array;
traversing the storage structure in descending order of count, performing clustering, and determining the cluster centers.
A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method for clustering abnormal problems, the method comprising:
extracting exception description information from each of a plurality of exception stacks;
calculating the similarity of the exception description information between exception stacks to obtain the similarity;
clustering the exception stacks according to the similarity, and determining the cluster centers.
The instructions in the storage medium may also provide that:
the exception description information includes at least one of the following: exception language description information, an exception flag, an exception value, an exception code identifier, and library information and class information of functions related to the exception.
The instructions in the storage medium may also provide that:
calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:
using an edit distance algorithm to calculate, between exception stacks, the similarity of the exception language description information and of the library information and class information of the functions related to the exception, giving a sub-similarity for each of these items of exception description information;
using an exact matching algorithm to calculate, between exception stacks, the similarity of the exception flag, the exception value, and the exception code identifier, giving a sub-similarity for each of these items of exception description information;
obtaining the similarity of the exception stacks from the sub-similarities of the items of exception description information.
The instructions in the storage medium may also provide that:
using the edit distance algorithm to calculate, between exception stacks, the similarity of the library information and class information of the functions related to the exception, giving a sub-similarity for each of these items of exception description information, includes:
calculating a sub-similarity for each layer of the library information and class information;
averaging the sub-similarities within the same layer;
performing a weighted sum of the sub-similarities of the layers, in which a lower stack level receives a larger weight.
The instructions in the storage medium may also provide that:
before the similarity of the exception description information between exception stacks is calculated to obtain the similarity, the method further includes:
uniquely encoding the exception description information in each exception stack to obtain a unique code;
deduplicating the exception stacks according to the unique code.
The instructions in the storage medium may also provide that:
calculating the similarity of the exception description information between exception stacks to obtain the similarity includes:
vector encoding the exception description information in each exception stack to obtain a vector encoding result;
calculating the similarity of the vector encoding results between exception stacks to obtain the similarity.
The instructions in the storage medium may also provide that:
clustering the exception stacks according to the similarity and determining the cluster centers includes:
counting the number of occurrences of each identical exception stack;
clustering the exception stacks according to the similarity, in descending order of count, and determining the cluster centers.
The instructions in the storage medium may also provide that:
clustering the exception stacks according to the similarity, in descending order of count, and determining the cluster centers includes:
storing the obtained similarities in a preset storage structure, the storage structure including a two-layer hash map or an array;
traversing the storage structure in descending order of count, performing clustering, and determining the cluster centers.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910407682.0A CN111950573A (en) | 2019-05-16 | 2019-05-16 | Method and device for clustering abnormal problems |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111950573A (en) | 2020-11-17 |
Family
ID=73335879
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910407682.0A Pending CN111950573A (en) | 2019-05-16 | 2019-05-16 | Method and device for clustering abnormal problems |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111950573A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103886048A (en) * | 2014-03-13 | 2014-06-25 | 浙江大学 | Cluster-based increment digital book recommendation method |
| US20170124324A1 (en) * | 2015-10-29 | 2017-05-04 | International Business Machines Corporation | Using call stack snapshots to detect anomalous computer behavior |
| CN106933689A (en) * | 2015-12-29 | 2017-07-07 | 伊姆西公司 | A kind of method and apparatus for computing device |
| CN108509975A (en) * | 2018-01-26 | 2018-09-07 | 北京三快在线科技有限公司 | A kind of exception on-line talking method and device, electronic equipment |
| CN108647106A (en) * | 2018-05-11 | 2018-10-12 | 深圳市腾讯网络信息技术有限公司 | Using abnormality eliminating method, storage medium and computer equipment |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113849330A (en) * | 2021-09-22 | 2021-12-28 | 北京基调网络股份有限公司 | A method, device and storage medium for monitoring and analyzing application failure causes |
| CN116107789A (en) * | 2021-09-22 | 2023-05-12 | 北京基调网络股份有限公司 | Method for monitoring and analyzing application fault reasons and storage medium |
| CN114595244A (en) * | 2022-03-11 | 2022-06-07 | 北京字节跳动网络技术有限公司 | Collapse data aggregation method and device, electronic equipment and storage medium |
| CN114595244B (en) * | 2022-03-11 | 2023-10-17 | 抖音视界有限公司 | Method and device for aggregating crash data, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201117 |