CN109992476B

CN109992476B - Log analysis method, server and storage medium

Info

Publication number: CN109992476B
Application number: CN201910213011.0A
Authority: CN
Inventors: 陈涛
Original assignee: Wangsu Science and Technology Co Ltd
Current assignee: Wangsu Science and Technology Co Ltd
Priority date: 2019-03-20
Filing date: 2019-03-20
Publication date: 2023-08-18
Anticipated expiration: 2039-03-20
Also published as: CN109992476A

Abstract

Embodiments of the present application relate to the field of data processing and disclose a log analysis method, a server, and a storage medium. In some embodiments, the log analysis method includes: acquiring a first log to be processed; processing the first log to obtain a bag of words of the first log; determining the similarity between the bag of words of the first log and the bag of words of a reference log in a mapping file, where the mapping file includes the bag of words of the reference log together with the fault category and/or the fault level of the reference log; and determining the fault category and/or the fault level of the first log according to that similarity. In this implementation, the server can use the mapping file to analyze the first log and determine its fault category and/or fault level, which makes the server more intelligent and reduces the log analysis burden on maintenance personnel.

Description

A log analysis method, server and storage medium

Technical Field

Embodiments of the present invention relate to the field of data processing, and in particular to a log analysis method, a server, and a storage medium.

Background

The kernel log is the main means by which a running server records the performance status of itself and of the processes and modules it runs. However, some kernel messages cannot be recorded in the kernel log. For example, when the system crashes (panic), part of the information is displayed directly on the screen and, because of the crash, cannot be written to the kernel log. After the system restarts, this information is lost. Some transmission tools, such as netconsoles, currently address this problem: they send this part of the kernel log over the network to another server for storage, so that the retained kernel log is as complete as possible.

However, the inventor found at least the following problem in the prior art: the volume of logs generated every day is huge, and for an enterprise-scale number of servers it is enormous, so manually processing the logs of each server wastes a great deal of time and effort.

It should be noted that the information disclosed in the background section above is only intended to aid understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.

Summary of the Invention

The purpose of the embodiments of the present invention is to provide a log analysis method, a server, and a storage medium that reduce the number of recorded logs and the time and effort spent on manual log processing.

To solve the above technical problem, an embodiment of the present invention provides a log analysis method, including the following steps: acquiring a first log to be processed; processing the first log to obtain a bag of words of the first log; determining the similarity between the bag of words of the first log and the bag of words of a reference log in a mapping file, where the mapping file includes the bag of words of the reference log together with the fault category of the reference log and/or the fault level of the reference log; and determining the fault category of the first log and/or the fault level of the first log according to the similarity between the bag of words of the first log and the bag of words of the reference log.

An embodiment of the present invention also provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the log analysis method described in the above embodiment.

An embodiment of the present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the log analysis method described in the above embodiment.

Compared with the prior art, the embodiments of the present invention compare the bag of words of the log to be processed with the bag of words of a historical log and can thereby determine the relationship between the two bags of words, which reflects the relationship between the log to be processed and the historical log. Because the server can determine the relationship between the log to be processed and the historical log, it can selectively retain recorded logs according to that relationship, which reduces the number of recorded logs and the burden of manual log processing.

In addition, determining the fault category and/or the fault level of the first log according to the similarity between the bag of words of the first log and the bag of words of the reference log specifically includes: taking the fault category of the reference log whose bag of words is most similar to that of the first log as the fault category of the first log, and/or taking the fault level of that reference log as the fault level of the first log. In this implementation, the fault category and/or fault level of a log can be determined automatically, which improves intelligence and helps maintenance personnel understand the type and level of the fault that occurred on the server.

In addition, determining the fault category and/or the fault level of the first log according to the similarity between the bag of words of the first log and the bag of words of the reference log specifically includes: judging whether the mapping file contains a reference log whose bag of words has a similarity to the bag of words of the first log greater than a second preset value; if so, taking the fault category of the reference log most similar to the first log as the fault category of the first log, and/or taking the fault level of that reference log as the fault level of the first log; otherwise, determining that the fault category of the first log is an unknown category, and/or that the fault level of the first log is an unknown level. This implementation makes it possible to identify new faults automatically.
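The threshold rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ReferenceEntry`, `classify`, and the `("unknown", None)` return value are hypothetical, and the similarity measure is passed in as a function because the mapping file layout and numeric fault levels are assumptions made only for this example.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Optional, Tuple


class ReferenceEntry(NamedTuple):
    """One row of a hypothetical in-memory mapping file."""
    words: FrozenSet[str]
    category: str
    level: int


def classify(
    first_bag: FrozenSet[str],
    mapping: List[ReferenceEntry],
    similarity: Callable[[FrozenSet[str], FrozenSet[str]], float],
    threshold: float,
) -> Tuple[str, Optional[int]]:
    """Return (category, level) of the most similar reference log,
    or ("unknown", None) when no similarity exceeds the threshold."""
    best_entry, best_score = None, 0.0
    for entry in mapping:
        score = similarity(first_bag, entry.words)
        if best_entry is None or score > best_score:
            best_entry, best_score = entry, score
    # "Greater than the second preset value" is read here as a strict comparison.
    if best_entry is None or best_score <= threshold:
        return ("unknown", None)
    return (best_entry.category, best_entry.level)
```

Passing the similarity function as a parameter keeps the sketch independent of any particular measure; an "unknown" result is the case in which the log would be reported for manual labelling.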

In addition, the mapping file includes the bag of words, the fault category, and the fault level of the reference log. After the fault category of the first log is determined to be an unknown category and the fault level of the first log is determined to be an unknown level, the log analysis method further includes: reporting the first log; determining the fault category and the fault level of the first log according to the fault category and fault level specified by a user; and updating the mapping file according to the bag of words, the fault category, and the fault level of the first log. In this implementation, the mapping file can be updated with logs of newly identified fault categories and thus continuously expanded, so that subsequent logs can be analyzed more accurately.
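As a rough sketch of the update step, the example below appends a user-labelled log to an in-memory mapping and serializes it. The text does not specify the mapping file format, so the JSON layout, the field names `words`, `category`, and `level`, and the function names are all assumptions.

```python
import json
from typing import Dict, Iterable, List


def add_reference(mapping: List[Dict], bag: Iterable[str],
                  category: str, level: int) -> List[Dict]:
    """Return a new mapping with the user-labelled log appended."""
    entry = {"words": sorted(bag), "category": category, "level": level}
    return mapping + [entry]


def dump_mapping(mapping: List[Dict]) -> str:
    """Serialize the mapping to JSON text (hypothetical on-disk format)."""
    return json.dumps(mapping, ensure_ascii=False, indent=2)
```

Returning a new list instead of mutating the argument keeps the previous mapping available until the updated version has been written out.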

In addition, the mapping file includes the bag of words, the fault category, and the fault level of the reference log. After the fault category and the fault level of the first log are determined according to the similarity between the bag of words of the first log and the bag of words of the reference log, the log analysis method further includes: judging whether a second log exists among the recorded logs, where the second log is a log belonging to the same fault category as the first log; if it exists, comparing the fault level of the first log with the fault level of the second log and updating the recorded logs according to the comparison result; if it does not exist, recording the first log.

In addition, updating the recorded logs according to the comparison result specifically includes: if the comparison result indicates that the fault level of the first log is higher than the fault level of the second log, overwriting the second log with the first log; if the comparison result indicates that the fault level of the first log is not higher than the fault level of the second log, not overwriting the second log with the first log. In this implementation, the log with the higher fault level within the same fault category is recorded, so that the importance of the reference log keeps rising, which achieves the effect of continuously escalating alarms.
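The record-or-overwrite rule can be illustrated with a small sketch. It assumes, purely for illustration, that recorded logs are kept in a dictionary keyed by fault category and that a larger integer means a higher fault level; the function name `update_recorded` is hypothetical.

```python
from typing import Dict, Tuple


def update_recorded(recorded: Dict[str, Tuple[str, int]],
                    log: str, category: str, level: int) -> Dict[str, Tuple[str, int]]:
    """Keep, per fault category, only the log with the highest fault level."""
    existing = recorded.get(category)
    if existing is None:
        # No second log of this category exists: record the first log.
        recorded[category] = (log, level)
    elif level > existing[1]:
        # The first log has a strictly higher level: overwrite the second log.
        recorded[category] = (log, level)
    # Otherwise the recorded log is left unchanged.
    return recorded
```

With this rule, a log of equal level never replaces the recorded one, which matches the "not higher than" branch in the text.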

In addition, determining the similarity between the bag of words of the first log and the bag of words of the reference log in the mapping file specifically includes: calculating the similarity according to a constraint relationship among the bag of words of the first log, the bag of words of the reference log, and the similarity, where the constraint relationship is:

similarity = (number of words appearing in both the bag of words of the first log and the bag of words of the reference log) / (number of words in the bag of words of the first log + number of words in the bag of words of the reference log - number of words appearing in both bags of words).
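The ratio above, shared words divided by the size of the union, is the quantity commonly known as the Jaccard index. A minimal sketch, assuming each bag of words is held as a Python set; the function name and the choice to return 0.0 for two empty bags are assumptions, since the text does not cover that case.

```python
from typing import AbstractSet


def bag_similarity(first_bag: AbstractSet[str], reference_bag: AbstractSet[str]) -> float:
    """Shared words divided by (|first| + |reference| - shared)."""
    common = len(first_bag & reference_bag)
    denominator = len(first_bag) + len(reference_bag) - common
    # Assumption: two empty bags are treated as having similarity 0.
    return common / denominator if denominator else 0.0
```

For example, two five-word bags that share four words give 4 / (5 + 5 - 4), which is about 0.67; identical non-empty bags give 1, and bags with no shared word give 0.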

In addition, before the similarity is calculated according to the constraint relationship among the bag of words of the first log, the bag of words of the reference log, and the similarity, the log analysis method further includes: removing invalid words from the bag of words of the first log and from the bag of words of the reference log, where the invalid words are words specified in advance. This implementation prevents invalid words from affecting the similarity between the bag of words of the first log and the bag of words of the reference log in the mapping file.
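Removing invalid words amounts to a set difference applied to both bags before they are compared. The sketch below is illustrative only; the word list `INVALID_WORDS` is hypothetical, since the text says only that invalid words are specified in advance.

```python
from typing import AbstractSet, FrozenSet

# Hypothetical pre-specified invalid words; the real list is a deployment choice.
INVALID_WORDS: FrozenSet[str] = frozenset({"the", "on", "a", "of"})


def drop_invalid(bag: AbstractSet[str],
                 invalid: AbstractSet[str] = INVALID_WORDS) -> FrozenSet[str]:
    """Return the bag without the pre-specified invalid words."""
    return frozenset(bag) - frozenset(invalid)
```

As an illustration of the effect, the bags {the, on, panic} and {the, on, error} share two of four distinct words, giving a similarity of 0.5 under the ratio described in the text; after filtering with the list above they share nothing and the similarity drops to 0.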

In addition, processing the first log to obtain the bag of words of the first log specifically includes: deleting variables from the first log, where the variables are preset parameters; and splitting the first log, after the variables are deleted, into N words to generate the bag of words of the first log, where N is a positive integer.

In addition, the preset parameters include at least any one of bad track location information, bad track number information, bad block location information, and bad block number information.

In addition, deleting the variables from the first log specifically includes: identifying the digits in the body part of the first log; and deleting the digits from the body part of the first log.

Brief Description of the Drawings

One or more embodiments are illustrated by the figures in the corresponding drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures are not drawn to scale.

Fig. 1 is a flowchart of the log processing method of the first embodiment of the present invention;

Fig. 2 is a flowchart of the log processing method of the second embodiment of the present invention;

Fig. 3 is a flowchart of the log analysis method of the third embodiment of the present invention;

Fig. 4 is a flowchart of the log analysis method of the fourth embodiment of the present invention;

Fig. 5 is a schematic structural diagram of the server of the fifth embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the server of the sixth embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments so that the reader can better understand the present application. However, the technical solutions claimed in this application can be realized even without these technical details and with various changes and modifications based on the following embodiments.

The first embodiment of the present invention relates to a log processing method applied to a server. As shown in Fig. 1, the log processing method includes:

Step 101: Acquire the log to be processed.

Specifically, the log to be processed may be a log generated by the server itself, or a log of another server that is stored on this server. The other server may transmit its logs to this server through netconsoles, or its logs may be copied to this server by other means.

It should be noted that, as those skilled in the art will understand, the log processing method can be applied when the server processes multiple logs that have already been recorded, and can also be applied when the server decides, after a log is generated, whether to record it. This embodiment does not limit the application scenario of the log processing method.

For clarity, this embodiment assumes that, after receiving a first log file, the server uses the log processing method of this embodiment to process each log in order from oldest to newest. Those skilled in the art will understand that, in practical applications, the way the server processes its own logs can follow the relevant content of this embodiment, which is not repeated here.

Step 102: Process the log to be processed to obtain its bag of words.

Specifically, a log consists mainly of words. Each log is converted into a bag of words made up of several words, with no repeated words in the bag. The relationship between logs can then be determined from the relationship between their bags of words.
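A bag of words without repeated words maps naturally onto a set. The sketch below shows one way to build it; the tokenization rule (runs of letters, digits, and "/") and the name `to_bag` are assumptions, since the text does not say how a log is split into words.

```python
import re
from typing import FrozenSet

# Assumed tokenization rule: a word is a run of letters, digits, or "/",
# so that a token such as "I/O" stays in one piece.
_WORD = re.compile(r"[A-Za-z0-9/]+")


def to_bag(text: str) -> FrozenSet[str]:
    """Split log text into words and drop duplicates."""
    return frozenset(_WORD.findall(text))
```

Using `frozenset` makes the bag hashable and gives direct access to the subset, equality, and intersection operators needed when bags are compared.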

In one example, the server first deletes the variables from the log to be processed, then splits the log, after the variables are deleted, into N words to generate the bag of words of the log to be processed, where N is a positive integer. In other words, the server compresses the log to be processed before converting it into a bag of words. The log to be processed may contain information that is overly detailed or that maintenance personnel do not care about. Such information is of little use for analyzing the running state of the server, so maintenance personnel can designate it as variables, and the server deletes it when processing the log, thereby compressing the log.

The process by which the server deletes the variables from the log to be processed is illustrated below.

In one example, the variables are preset parameters. The preset parameters include at least any one of bad track location information, bad track number information, bad block location information, and bad block number information. In this case, the server may delete the variables as follows: identify the digits in the body part of the log to be processed, and delete those digits from the body part.

Each log can be split into two parts: the timestamp part and the body part. In the body part, the preset variables are removed and the constants are retained. The constants are the more important information in the log. Which parts count as variables and which as constants can be set according to experience and requirements.
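The split into timestamp and body can be sketched with a regular expression. This assumes a syslog-style timestamp of the form "Nov 20 00:01:02" at the start of the line; the pattern and the name `split_log` are illustrative, and other log formats would need a different pattern.

```python
import re
from typing import Tuple

# Assumed layout: "<Mon> <day> <HH:MM:SS> <body>".
_SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<body>.*)$"
)


def split_log(line: str) -> Tuple[str, str]:
    """Return (timestamp, body); the timestamp is empty if none is found."""
    stripped = line.strip()
    match = _SYSLOG_LINE.match(stripped)
    if match is None:
        return ("", stripped)
    return (match.group("timestamp"), match.group("body"))
```

Separating the timestamp first means that any later digit removal only touches the body, so the time of the event is not lost.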

In one example, the variables may include, but are not limited to, the following information:

1. Overly detailed information, such as bad track location information (for example, an address), bad block location information, bad track number information, and bad block number information. Such information consists of a string of digits, or of digits combined with English letters.

2. Less important information. For example, the 1 in sda1 denotes the first partition of the disk named sda; here sda is important information and 1 is unimportant information.

Because all of the above information contains digits, the server can determine the variables in a log by identifying digits. Identifying variables by their digits does not mean that every digit is unimportant. For example, in a log whose content is "Nov 26 00:24:04 CPU27:Package power limit notification(total events=173318)", the 27 in CPU27 is as important as the a in the disk name sda and should not be deleted. This log indicates that at 00:24:04 on November 26, central processing unit (CPU) number 27 issued a package power limit notification (total events = 173318). In such cases, the deletion rule can be modified to avoid deleting important information by mistake. For example, before deleting a number from the body part of the log to be processed, the server judges whether the word preceding the number is CPU, and deletes the number only if it is not.
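The digit-deletion rule with the CPU exception can be sketched as below. This is a minimal illustration under stated assumptions: the tuple `PROTECTED_PREFIXES` and the name `strip_variables` are hypothetical, the "word before the number" is taken to be the run of letters immediately attached to the digits, and hexadecimal addresses are not handled and would need an additional rule.

```python
import re

# Hypothetical list of prefixes whose trailing number must be kept.
PROTECTED_PREFIXES = ("CPU",)

# A run of letters (possibly empty) immediately followed by a run of digits.
_LETTERS_THEN_DIGITS = re.compile(r"([A-Za-z]*)(\d+)")


def strip_variables(body: str) -> str:
    """Delete digit runs from a log body unless they follow a protected prefix."""
    def replace(match: "re.Match[str]") -> str:
        prefix, digits = match.group(1), match.group(2)
        if prefix.upper() in PROTECTED_PREFIXES:
            return prefix + digits   # e.g. keep "CPU27"
        return prefix                # e.g. "sdc1" becomes "sdc", "1057" is dropped
    cleaned = _LETTERS_THEN_DIGITS.sub(replace, body)
    # Collapse the whitespace left behind by removed numbers.
    return " ".join(cleaned.split())
```

Applied to the body "I/O error on device sdc1, logical block 1057", this yields "I/O error on device sdc, logical block", while "CPU27" is left intact in the power limit notification.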

It should be noted that, as those skilled in the art will understand, when the preset variables change, the method by which the server identifies the variables may also change. In practical applications, this method can be set as required.

The process of compressing the log to be processed is described below using a practical example.

For example, the log to be processed is: "Nov 20 00:01:02 I/O error on device sdc1, logical block 1057". The body part of this log is: "I/O error on device sdc1, logical block 1057". Here sdc1 refers to the first partition of the sdc disk and 1057 denotes logical block number 1057; both carry variables. The constants are "I/O error on device" and "logical block". The log therefore tells us that an error occurred while reading or writing logical block 1057 on the first partition of the sdc disk. Compressing the log means modifying the information in the current log, or in other words discarding some of the less important information. For the log in this example, the number 1057 and the "first partition" information of sdc can be discarded. After compression, the resulting information is "I/O error on device sdc, logical block", which indicates that a logical block read/write error occurred on the disk named sdc.

As the above shows, compressing the log to be processed retains its more important information and discards its less important information, which reduces the storage space occupied by the log.

It should be noted that, as those skilled in the art will understand, in practical applications the server may also perform other processing on the log to be processed. This embodiment uses compression as an example, but compression is not a required step in processing the log to be processed and can be performed selectively.

Step 103: Compare the bag of words of the log to be processed with the bag of words of the historical log, and determine the relationship between the two bags of words.

Specifically, the relationship between the bag of words of the log to be processed (hereinafter bag of words 1) and the bag of words of the historical log (hereinafter bag of words 2) includes, but is not limited to, a first relationship, a second relationship, a third relationship, a fourth relationship, and a fifth relationship. In the first relationship, the bag of words of the historical log contains the bag of words of the log to be processed. In the second relationship, the two bags of words are equal. In the third relationship, the bag of words of the log to be processed contains the bag of words of the historical log. In the fourth relationship, the two bags of words intersect. In the fifth relationship, the two bags of words are independent.

The relationships between bag of words 1 and bag of words 2 are explained below.

First, the first and third relationships are explained, that is, the containment relationship in which bag of words 1 contains bag of words 2 or bag of words 2 contains bag of words 1. Bag of words 1 contains bag of words 2 when every word of bag of words 2 appears in bag of words 1, while some words of bag of words 1 are not in bag of words 2. Bag of words 2 contains bag of words 1 when every word of bag of words 1 appears in bag of words 2, while some words of bag of words 2 are not in bag of words 1. When a problem occurs during log transmission, the bags of words of two identical logs may be in a containment relationship. For example, for two log entries with exactly the same content, one entry may lose some elements while netconsoles is transmitting the logs, so that the server receives one complete log and one incomplete log. In this case, the bag of words of the complete log contains the bag of words of the incomplete log, and the two bags of words are in a containment relationship.

Next, the second relationship is explained, that is, the equality relationship in which bag of words 1 equals bag of words 2. The two bags are equal when the words in bag of words 1 are exactly the same as the words in bag of words 2. For example, two identical logs produce equal bags of words, as do logs whose important information is the same.

Next, the fourth relationship is explained, that is, the intersection relationship between bag of words 1 and bag of words 2. The two bags intersect when some words appear in both bags, but each bag also has some words that do not appear in the other. When some of the important information in the log to be processed is the same as in the historical log and some is different, bag of words 1 and bag of words 2 intersect. For example, the log to be processed is "Nov 25 18:09:11 Kernel panic - not syncing: Fatal hardware error!" and the historical log is "Nov 20 00:01:02 I/O error on device sdc1, logical block 1057". Both bags of words contain the word "error", yet the two logs are completely different. The first log indicates that at 18:09:11 on November 25 the system suffered a kernel panic (not syncing) caused by a fatal hardware error. The second log indicates that at 00:01:02 on November 20 a read/write error occurred on logical block 1057 of the first partition of the disk named sdc.

Finally, the fifth relationship is explained, that is, the case in which bag of words 1 and bag of words 2 are independent. The two bags are independent when they have no words in common, and the corresponding logs are completely unrelated.
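The five relationships correspond directly to set comparisons. A minimal sketch, assuming each bag of words is a Python set; the enum `Relation` and the function `relation` are illustrative names, not part of the original text.

```python
from enum import Enum
from typing import AbstractSet


class Relation(Enum):
    HISTORY_CONTAINS_PENDING = 1   # first relationship
    EQUAL = 2                      # second relationship
    PENDING_CONTAINS_HISTORY = 3   # third relationship
    INTERSECT = 4                  # fourth relationship
    INDEPENDENT = 5                # fifth relationship


def relation(pending: AbstractSet[str], history: AbstractSet[str]) -> Relation:
    """Classify how the pending log's bag relates to a historical log's bag."""
    if pending == history:
        return Relation.EQUAL
    if pending < history:          # proper subset
        return Relation.HISTORY_CONTAINS_PENDING
    if pending > history:          # proper superset
        return Relation.PENDING_CONTAINS_HISTORY
    if pending & history:          # some shared words, neither contains the other
        return Relation.INTERSECT
    return Relation.INDEPENDENT
```

Equality is tested first so that the subset and superset checks only ever see proper containment, which matches the definitions given for the first and third relationships.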

In one example, masked bags of words are configured on the server. Before executing step 103, the server confirms that the bag of words of the log to be processed does not include all the words of any masked bag of words. A masked bag of words contains masked words; when the bag of words of the log to be processed includes all the words in a masked bag of words, the server deletes that log.

In one example, 16 masked bags of words are configured on the server, containing the following masked words:

1. audit
2. inode (metadata node)
3. hook
4. hung, task, timeout, and secs (seconds)
5. CAP (permission), NET (network), and ADMIN (administrator)
6. filesystem (file system)
7. IPVS (IP Virtual Server)
8. the, kdump, crash, and info (information)
9. USB (Universal Serial Bus)
10. bitmap
11. connect, debounce, and failed
12. eth (network card), Reset, and adapter
13. loading, buddy, and information
14. license and expired
15. bus and error
16. error and device

When the log to be processed includes all the words in any one of these masked bags of words, the log to be processed is deleted.

It should be noted that, as those skilled in the art will understand, the number of masked bags can be set as required in practical applications; this embodiment does not limit the number of masked bags.

It should be noted that, as those skilled in the art will understand, the masked words in each masked bag can likewise be set as required and are not enumerated here.

It is worth mentioning that, by discarding some logs directly according to the configured masked bags, the server reduces its own processing load and further reduces the number of logs.
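The masked-bag check described above amounts to a subset test. The following is a minimal sketch rather than the patented implementation: the names `MASKED_BAGS`, `is_masked` and `filter_logs` are hypothetical, only a few of the 16 bags are shown, and case-sensitive matching is an assumption.

```python
# Hypothetical subset of the masked bags listed in the example; each bag is a set of words.
MASKED_BAGS = [
    {"audit"},
    {"hung", "task", "timeout", "secs"},
    {"license", "expired"},
    {"bus", "error"},
]


def is_masked(log_bag, masked_bags=MASKED_BAGS):
    """Return True if the log's bag contains every word of at least one masked bag."""
    return any(masked <= log_bag for masked in masked_bags)


def filter_logs(log_bags):
    """Keep only the bags that do not fully contain any masked bag."""
    return [bag for bag in log_bags if not is_masked(bag)]


bags = [
    {"task", "hung", "timeout", "secs", "blocked"},  # contains all of bag 4: dropped
    {"bus", "reset"},                                # only part of {"bus", "error"}: kept
    {"disk", "error"},                               # kept
]
kept = filter_logs(bags)
```

Note that a log is dropped only when it contains every word of a masked bag; sharing a single word with a multi-word bag is not enough.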

Step 104: Determine whether to keep the log to be processed according to the relationship between its bag of words and the bag of words of a historical log.

Specifically, while the first log file is being processed, the logs in the first log file that have already been processed and that the server has decided to keep are saved in a second log file. The historical logs are the logs in the second log file.

It should be noted that, as those skilled in the art will understand, if the server processes each log as soon as it is generated and records the log only after deciding to keep it, the historical logs are the logs already recorded. This embodiment does not limit the meaning of the historical log.

In one example, if the server determines that the relationship between the bag of words of the log to be processed and the bag of words of the historical log is the first relationship or the second relationship, it deletes the log to be processed; if the relationship is the third relationship, it keeps the timestamp of the historical log together with the bag of words of the log to be processed; if the relationship is the fourth relationship or the fifth relationship, it keeps the timestamp and the bag of words of the log to be processed.

The first relationship indicates that the log to be processed may be incomplete, or that the historical log records more detail than the log to be processed. The third relationship indicates that the historical log may be incomplete, or that the log to be processed records more detail than the historical log. In both cases the log with the larger bag of words is kept, and the earlier of the two timestamps is used as its timestamp. The second relationship indicates that the log to be processed may be identical to the historical log, so the log to be processed can be deleted. The fourth relationship indicates that the two logs share some parameters and differ in others: they may record different fault types on the same disk, the same fault type on different disks, or merely share some descriptive words while being entirely different in substance. Both logs therefore need to be kept. The fifth relationship indicates that the two logs are completely unrelated, so both also need to be kept.
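The keep-or-delete rules above can be sketched with set comparisons. The five relationships are defined elsewhere in the description; the set-based definitions used here (proper subset, equality, proper superset, partial overlap, disjoint) are an assumption inferred from the explanations in this passage, and the names `Relation`, `relation` and `process` are hypothetical. Checking a new log against every historical entry before appending it is also an assumption.

```python
from enum import Enum


class Relation(Enum):
    FIRST = 1   # assumed: pending bag is a proper subset of the historical bag
    SECOND = 2  # assumed: the two bags are identical
    THIRD = 3   # assumed: historical bag is a proper subset of the pending bag
    FOURTH = 4  # assumed: the bags overlap only partially
    FIFTH = 5   # assumed: the bags share no words


def relation(pending, historical):
    """Classify the relationship between two bags of words (sets)."""
    if pending == historical:
        return Relation.SECOND
    if pending < historical:
        return Relation.FIRST
    if historical < pending:
        return Relation.THIRD
    if pending & historical:
        return Relation.FOURTH
    return Relation.FIFTH


def process(pending_ts, pending_bag, history):
    """Apply the rules to one pending log; history is a list of [timestamp, bag], updated in place."""
    for entry in history:
        rel = relation(pending_bag, entry[1])
        if rel in (Relation.FIRST, Relation.SECOND):
            return  # pending log is incomplete or a duplicate: delete it
        if rel is Relation.THIRD:
            entry[1] = pending_bag  # keep the larger bag with the historical timestamp
            return
    history.append([pending_ts, pending_bag])  # fourth or fifth relationship: keep both


history = [[100, {"disk", "error", "sda"}]]
process(200, {"disk", "error"}, history)                 # first relationship: dropped
process(300, {"disk", "error", "sda", "read"}, history)  # third relationship: bag replaced
process(400, {"cpu", "overheat"}, history)               # fifth relationship: appended
```

After these three calls the history holds two entries: the enlarged disk log, still stamped with time 100, and the unrelated CPU log.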

As can be seen from the above, the log processing method of this embodiment focuses on discovering the intrinsic relationships between logs, so that key logs can be found as accurately as possible. Processing the first log file with this method yields the second log file. Compared with the first log file, the second log file contains far fewer duplicated and damaged logs, and overly detailed information has been stripped out. Each log is a unique record stamped with the time of its first occurrence. This reduces the storage space occupied by logs, removes irrelevant or erroneous logs, and merges duplicate logs, which speeds up analysis. In verification, log files processed with this method occupied 90% less storage space, and analysis efficiency increased severalfold.

It should be noted that the above is merely an example and does not limit the technical solution of the present invention.

Compared with the prior art, in the log processing method provided in this embodiment the server compares the bag of words of the log to be processed with the bag of words of the historical log and can thereby determine the relationship between the two bags of words, which reflects the relationship between the log to be processed and the historical log. Because the server can determine that relationship, it can selectively keep logs accordingly, which reduces the number of recorded logs and the burden of manual log processing.

The second embodiment of the present invention relates to a log processing method. It is a further improvement on the first embodiment: after all logs to be processed have been processed, a mapping file is generated from the reference logs so that subsequently received logs can be analyzed.

Specifically, as shown in FIG. 2, this embodiment comprises steps 201 to 208. Steps 201 to 204 are substantially the same as steps 101 to 104 of the first embodiment and are not described again here. The following mainly describes the differences between the second embodiment and the first embodiment:

Execute steps 201 to 204.

After all logs to be processed have been processed, perform the following steps:

Step 205: Obtain the retained logs, use them as reference logs, and determine the similarity between the reference logs.

Specifically, to determine the similarity between any two reference logs, the server determines the similarity between the bags of words of the two reference logs and uses it as the similarity between the two logs. For example, suppose the reference logs include log 1 and log 2, the bag of words of log 1 is bag 3, and the bag of words of log 2 is bag 4. Then the similarity between log 1 and log 2 equals the similarity between bag 3 and bag 4.

In a first example, similarity between bag 3 and bag 4 = (number of words appearing in both bag 3 and bag 4) / (number of words in bag 3 + number of words in bag 4 - number of words appearing in both bag 3 and bag 4) × 100%.

In a second example, the server removes words that carry no meaning, such as prepositions and conjunctions, from bag 3 to obtain bag 5 and from bag 4 to obtain bag 6. Then similarity between bag 3 and bag 4 = (number of words appearing in both bag 5 and bag 6) / (number of words in bag 5 + number of words in bag 6 - number of words appearing in both bag 5 and bag 6) × 100%.
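Both formulas are the ratio of the size of the intersection to the size of the union of the two bags (the Jaccard index). A minimal sketch follows, assuming bags are Python sets; the `similarity` function, the `STOP_WORDS` list and the sample bags are hypothetical, and the result is returned as a fraction between 0 and 1 rather than a percentage.

```python
# Hypothetical list of words without meaning (prepositions, conjunctions, articles).
STOP_WORDS = frozenset({"the", "of", "in", "on", "and", "or", "to", "at"})


def similarity(bag_a, bag_b, stop_words=frozenset()):
    """Shared words divided by (|A| + |B| - shared words), after removing stop words."""
    a = bag_a - stop_words
    b = bag_b - stop_words
    common = len(a & b)
    total = len(a) + len(b) - common
    return common / total if total else 0.0


bag3 = {"the", "disk", "read", "error"}
bag4 = {"the", "disk", "write", "error", "on", "of"}

first_example = similarity(bag3, bag4)               # 3 shared / (4 + 6 - 3) = 3/7
second_example = similarity(bag3, bag4, STOP_WORDS)  # 2 shared / (3 + 3 - 2) = 0.5
```

Removing the stop words raises the score here because the two bags differ mostly in words that say nothing about the fault.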

Step 206: Classify the reference logs according to the similarity between them.

Specifically, logs of the same class have a similarity greater than a first preset value. The first preset value may be any percentage greater than 0 and less than 100%, for example a percentage between 30% and 60%, such as 40%.

For example, suppose the first preset value is 40% and the reference logs comprise five logs numbered 1 to 5. Table 1 shows, for each log, the information before processing, the information after processing, and the size of its bag of words.

Table 1

Here, mce:[Hardware Error]:Machine check:Processor context corrupt reports a machine check exception: [hardware error]: machine check: processor context corrupted; Kernel panic-not syncing:Timeout:Not all CPU entered broadcast exception handler reports a kernel panic, not syncing: timeout: not all CPUs entered the broadcast exception handler; sbridge:Lost 47 memory errors reports that 47 memory errors were lost; sbridge:HANDLING MCE MEMORY ERROR reports that an MCE memory error is being handled; mce:[Hardware Error]:CPU 17:Machine Check Exception:5 Bank 12:be00003f001000c3 reports that an exception was detected on CPU 17, located at 5 Bank 12:be00003f001000c3. The similarity between each pair of logs is calculated with the method of the second example, and the results are shown in Table 2.

Table 2

As the table above shows, log 1 and log 5 reflect the same fault category, log 3 and log 4 reflect the same fault category, and log 2 forms a category of its own. By continually learning from existing logs, the fault categories of the reference logs can be continually enriched.
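The grouping in step 206 can be sketched as follows. The description only states that logs of the same class have a similarity above the first preset value; merging logs transitively with a union-find structure is an assumption, as are the names `jaccard` and `group_logs`. The bags below are hand-made illustrations loosely modeled on the five messages and are not the contents of Table 1, so the similarities they yield are not those of Table 2.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared words divided by the size of the union of two bags."""
    common = len(a & b)
    total = len(a) + len(b) - common
    return common / total if total else 0.0


def group_logs(bags, threshold=0.4):
    """bags maps a log number to its bag of words; returns sorted groups of log numbers."""
    parent = {k: k for k in bags}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(bags, 2):
        if jaccard(bags[i], bags[j]) > threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for k in bags:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical bags for five reference logs.
reference_bags = {
    1: {"mce", "hardware", "error", "machine", "check", "processor", "context", "corrupt"},
    2: {"kernel", "panic", "not", "syncing", "timeout", "cpu", "entered",
        "broadcast", "exception", "handler"},
    3: {"sbridge", "lost", "memory", "errors"},
    4: {"sbridge", "handling", "memory", "errors"},
    5: {"mce", "hardware", "error", "cpu", "machine", "check", "exception", "bank"},
}
groups = group_logs(reference_bags, threshold=0.4)
```

With these bags, logs 1 and 5 score 5/11 and logs 3 and 4 score 3/5, both above 40%, while log 2 stays below the threshold against every other log.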

Step 207: Determine the fault category of each class of logs and the fault level of each reference log.

Specifically, the server displays each class of logs to maintenance personnel, who determine and enter the fault category of that class; the server determines the fault category of each class from this input. The server may likewise display each reference log to maintenance personnel, who determine and enter its fault level; the server determines the fault level of each reference log from this input.

It should be noted that, as those skilled in the art will understand, in practical applications the server may also automatically recognize the words in each log's bag of words and determine the log's fault category and fault level. This embodiment does not limit how the fault category of each class of logs and the fault level of each reference log are determined.

In one example, the bags of words of the same fault category are divided into five fault levels, A, B, C, D and E, in descending order of importance.

Step 208: Generate a mapping file from the reference logs, their fault categories and their fault levels.

Specifically, the mapping file maps each reference log to its fault category and to its fault level. It is used to analyze subsequently received logs and determine their fault categories and fault levels.
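The description does not specify a format for the mapping file. As one possible illustration, the sketch below stores each reference log's bag of words with its fault category and fault level as JSON; the field names `bag`, `category` and `level`, the function names, and the choice of JSON are all assumptions.

```python
import json


def build_mapping(reference_logs):
    """reference_logs: iterable of (bag, category, level) tuples; bags are sets of words."""
    return [
        {"bag": sorted(bag), "category": category, "level": level}
        for bag, category, level in reference_logs
    ]


def save_mapping(mapping, path):
    """Write the mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, ensure_ascii=False, indent=2)


def load_mapping(path):
    """Read the mapping back and turn each stored word list into a set again."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    for entry in entries:
        entry["bag"] = set(entry["bag"])
    return entries


mapping = build_mapping([
    ({"mce", "hardware", "error"}, "cpu", "A"),
    ({"sbridge", "memory", "errors"}, "memory", "B"),
])
```

Bags are stored as sorted lists because JSON has no set type; `load_mapping` restores them to sets so that set operations can be applied during analysis.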

In one example, after the mapping file is generated, the server uses it to analyze subsequently received logs as follows: the server obtains a log to be analyzed; processes it to obtain its bag of words; determines the similarity between that bag of words and the bag of words of each reference log in the mapping file; and determines the fault category and fault level of the log to be analyzed according to these similarities.

In one example, the ways in which the server determines the fault category and fault level of the log to be analyzed from the similarity between its bag of words and the bags of words of the reference logs include but are not limited to the following two:

Method 1: The server takes the fault category of the reference log whose bag of words is most similar to that of the log to be analyzed as the fault category of the log to be analyzed, and takes the fault level of that reference log as the fault level of the log to be analyzed.

Method 2: The server judges whether the mapping file contains a reference log whose bag of words has a similarity to the bag of words of the log to be analyzed greater than a second preset value. If so, it takes the fault category of the most similar reference log as the fault category of the log to be analyzed, and the fault level of that reference log as the fault level of the log to be analyzed. Otherwise, it determines that the fault category of the log to be analyzed is the unknown category and that its fault level is the unknown level.

In one example, the server calculates the similarity between the bag of words of the log to be analyzed and the bag of words of a reference log in the mapping file according to the following constraint relation: similarity = (number of words appearing in both bags) / (number of words in the bag of the log to be analyzed + number of words in the bag of the reference log - number of words appearing in both bags).

It should be noted that the process by which the server analyzes the log to be analyzed may follow the process by which the server analyzes the first log in the third and fourth embodiments. It is not detailed here; those skilled in the art may refer to the third and fourth embodiments.

Compared with the prior art, in the log processing method provided in this embodiment the server can determine the relationship between the log to be processed and the historical log and selectively keep logs accordingly, which reduces the number of recorded logs and the burden of manual log processing. The server also generates a mapping file from the processed logs so that it can analyze subsequently received logs automatically, which makes the server more intelligent, reduces the workload of maintenance personnel, and relieves the pressure of manual log analysis.

The third embodiment of the present invention relates to a log analysis method applied to a server. As shown in FIG. 3, it includes the following steps:

Step 301: Obtain a first log to be processed.

Step 302: Process the first log to obtain the bag of words of the first log.

In one example, the server deletes the variables in the first log, where a variable is a preset parameter, splits the first log with the variables deleted into N words, and generates the bag of words of the first log, N being a positive integer. The preset parameters include at least any one of the location information of a bad track, the number information of a bad track, the location information of a bad block, and the number information of a bad block.

In one example, the server deletes the variables in the first log by identifying the digits in the body of the first log and deleting them.
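The digit-removal variant of step 302 can be sketched with a regular expression. This is an illustration under assumptions: the function name `to_bag` is hypothetical, the timestamp is assumed to have been separated from the body already, and treating runs of letters and underscores as words is a tokenization choice the description does not prescribe.

```python
import re


def to_bag(body):
    """Delete the digits in a log body, then split what remains into a set of words."""
    without_digits = re.sub(r"\d+", "", body)
    words = re.findall(r"[A-Za-z_]+", without_digits)
    return set(words)


bag = to_bag("sbridge: Lost 47 memory errors")
```

Because the count 47 is removed, two logs that differ only in how many memory errors were lost produce the same bag of words.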

It should be noted that the process by which the server processes the first log to obtain its bag of words is substantially the same as the process in the first embodiment of processing the log to be processed to obtain its bag of words. Those skilled in the art may perform this step with reference to the first embodiment.

Step 303: Determine the similarity between the bag of words of the first log and the bag of words of a reference log in the mapping file.

Specifically, the mapping file includes the bag of words of the reference log, together with the fault category of the reference log and/or the fault level of the reference log. For how the mapping file is created, refer to the log processing method of the second embodiment; details are not repeated here.

The ways in which the server determines the similarity between the bag of words of the first log and the bag of words of a reference log in the mapping file include but are not limited to the following two:

Method 1: The server calculates the similarity according to the following constraint relation among the bag of words of the first log, the bag of words of the reference log and the similarity: similarity = (number of words appearing in both bags) / (number of words in the bag of the first log + number of words in the bag of the reference log - number of words appearing in both bags).

Method 2: The server removes the invalid words from the bag of words of the first log and from the bag of words of the reference log. Invalid words are pre-specified words, for example prepositions, conjunctions and other words that carry no meaning. After removing the invalid words, the server calculates the similarity according to the same constraint relation.

It is worth mentioning that two logs sharing invalid words does not mean that their fault categories and/or fault levels are the same. Removing the invalid words from both bags therefore prevents them from distorting the similarity between the bag of words of the first log and the bag of words of the reference log.

In one example, a masked bag of words is configured on the server. Before determining the similarity between the bag of words of the first log and the bag of words of the reference log, the server judges whether the bag of words of the first log contains all the words of the masked bag. If so, the server ignores the first log; otherwise, it executes the subsequent steps.

Step 304: Determine the fault category of the first log and/or the fault level of the first log according to the similarity between the bag of words of the first log and the bag of words of the reference log.

Specifically, because the mapping file includes the bag of words of the reference log together with the fault category of the reference log and/or the fault level of the reference log, the server can use the mapping file to analyze the first log.

The ways in which the server uses the mapping file to analyze the first log are illustrated below.

Method a: The server takes the fault category of the reference log in the mapping file whose bag of words is most similar to that of the first log as the fault category of the first log, and/or takes the fault level of that reference log as the fault level of the first log.

Specifically, if the mapping file includes the bags of words and fault categories of the reference logs, the server takes the fault category of the most similar reference log as the fault category of the first log. If the mapping file includes the bags of words and fault levels of the reference logs, the server takes the fault level of the most similar reference log as the fault level of the first log. If the mapping file includes the bags of words, fault categories and fault levels of the reference logs, the server takes both the fault category and the fault level of the most similar reference log as those of the first log.

Method b: The server judges whether the mapping file contains a reference log whose bag of words has a similarity to the bag of words of the first log greater than a second preset value. If so, it takes the fault category of the most similar reference log as the fault category of the first log, and/or takes the fault level of that reference log as the fault level of the first log. Otherwise, it determines that the fault category of the first log is the unknown category, and/or that the fault level of the first log is the unknown level. The second preset value can be set as required to a value greater than 0 and less than 1, for example a value between 30% and 60%, such as 40%.
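Methods a and b differ only in whether a minimum similarity is required. The sketch below illustrates both under assumptions: the mapping is taken to be a list of dictionaries with hypothetical keys `bag`, `category` and `level`, and the names `jaccard`, `classify` and `UNKNOWN` are invented for illustration.

```python
UNKNOWN = "unknown"


def jaccard(a, b):
    """Shared words divided by the size of the union of two bags."""
    common = len(a & b)
    total = len(a) + len(b) - common
    return common / total if total else 0.0


def classify(bag, mapping, threshold=None):
    """Return (category, level) of the most similar reference log.

    threshold=None follows method a; a numeric threshold follows method b and
    yields the unknown category and level when no similarity exceeds it.
    """
    best = max(mapping, key=lambda entry: jaccard(bag, entry["bag"]), default=None)
    if best is None:
        return UNKNOWN, UNKNOWN
    if threshold is not None and jaccard(bag, best["bag"]) <= threshold:
        return UNKNOWN, UNKNOWN
    return best["category"], best["level"]


# Hypothetical mapping with two reference logs.
mapping = [
    {"bag": {"sbridge", "memory", "errors", "lost"}, "category": "memory", "level": "B"},
    {"bag": {"mce", "hardware", "error", "machine", "check"}, "category": "cpu", "level": "A"},
]

known = classify({"sbridge", "memory", "errors", "handling"}, mapping, threshold=0.4)
novel = classify({"usb", "disconnect"}, mapping, threshold=0.4)
forced = classify({"usb", "disconnect"}, mapping)
```

The last call shows the practical difference: without a threshold, a log that shares no words with any reference log still inherits the labels of some reference log, whereas method b reports it as unknown.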

In one example, logs of the same fault category may be divided into five fault levels, A, B, C, D and E, where E is the unknown level. Logs of the same fault category and the same fault level may still differ somewhat in importance. In that case, M sub-levels can be derived under each fault level; for example, fault level A can be divided into sub-levels A1, A2, A3, A4, A5, A6, A7, A8, A9 and A10, so that the bags of words of logs at the same fault level can still be distinguished.

It is worth mentioning that when the mapping file contains no reference log whose bag of words has a similarity to the bag of words of the first log greater than the second preset value, the first log does not belong to the same fault category as any reference log in the mapping file. The server marks the fault category of the first log as the unknown category, which helps maintenance personnel promptly notice newly emerging fault categories that have not yet been identified.

In one example, the mapping file includes the bags of words, fault categories and fault levels of the reference logs. After the server determines that the fault category of the first log is the unknown category and that its fault level is the unknown level, the server reports the first log; determines the fault category and fault level of the first log according to the fault category and fault level specified by the user; and updates the mapping file according to the bag of words, fault category and fault level of the first log.

It is worth mentioning that the server promptly reports logs of unknown category and unknown level and updates the mapping file according to the fault category and fault level assessed by the user. The mapping file can thus be continually expanded and refined, which improves the accuracy of the server's log analysis.
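The update step can be sketched as an append to an in-memory mapping. This is an illustration only: the structure of the mapping (a list of dictionaries with hypothetical keys `bag`, `category` and `level`) and the function name `update_mapping` are assumptions, and overwriting the labels when the same bag is already present is a design choice the description does not address.

```python
def update_mapping(mapping, bag, category, level):
    """Record the user-assigned category and level for a previously unknown log."""
    for entry in mapping:
        if entry["bag"] == bag:
            # Same bag already present: refresh its labels instead of duplicating it.
            entry["category"], entry["level"] = category, level
            return mapping
    mapping.append({"bag": set(bag), "category": category, "level": level})
    return mapping


mapping = [{"bag": {"sbridge", "memory", "errors"}, "category": "memory", "level": "B"}]

# A log reported as unknown, then labelled by the user as a level C USB fault.
update_mapping(mapping, {"usb", "disconnect"}, "usb", "C")
# Labelling the same bag again replaces the earlier labels.
update_mapping(mapping, {"usb", "disconnect"}, "usb", "D")
```

Once the entry is added, later logs with a similar bag of words are matched against it and no longer fall into the unknown category.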

It should be noted that, in extreme cases, several reference logs may tie for the highest similarity, that is, the bags of words of several reference logs have the same, highest similarity to the bag of words of the first log. In that case the server may set the fault category of the first log to the unknown category and its fault level to the unknown level.
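The tie rule can be sketched by collecting every reference log that reaches the top score. As before, the mapping structure and the names `jaccard` and `classify_unique` are hypothetical, and comparing floating-point scores for exact equality is a simplification that works here because tied scores come from identical integer ratios.

```python
def jaccard(a, b):
    """Shared words divided by the size of the union of two bags."""
    common = len(a & b)
    total = len(a) + len(b) - common
    return common / total if total else 0.0


def classify_unique(bag, mapping):
    """Return the labels of the single most similar reference log, or unknown on a tie."""
    scores = [jaccard(bag, entry["bag"]) for entry in mapping]
    if not scores:
        return "unknown", "unknown"
    top = max(scores)
    winners = [entry for entry, score in zip(mapping, scores) if score == top]
    if len(winners) > 1:
        return "unknown", "unknown"  # several reference logs tie for the highest similarity
    return winners[0]["category"], winners[0]["level"]


# Hypothetical mapping in which two reference logs can tie.
mapping = [
    {"bag": {"disk", "read"}, "category": "X", "level": "A"},
    {"bag": {"disk", "write"}, "category": "Y", "level": "B"},
]

tied = classify_unique({"disk"}, mapping)            # both score 1/2
unique = classify_unique({"disk", "read"}, mapping)  # 1.0 against 1/3
```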

Compared with the prior art, in the log analysis method provided by this embodiment the server can use the mapping file to analyze the first log and determine its fault category and/or fault level, which makes the server more intelligent and relieves maintenance personnel of the pressure of analyzing logs. In addition, during log analysis, logs of unknown category are reported promptly, and the mapping file is updated promptly with the fault category and fault level specified for them. The mapping file is thus continually refined, and the more complete the mapping file, the more accurate the conclusions drawn from analyzing logs with it.

本发明的第四实施方式涉及一种日志的分析方法，本实施方式是对第三实施方式的进一步改进，具体改进之处为：在步骤304之后，增加了其他相关步骤。The fourth embodiment of the present invention relates to a log analysis method. This embodiment is a further improvement on the third embodiment. The specific improvement is that after step 304, other related steps are added.

Specifically, as shown in FIG. 4, this embodiment includes steps 401 to 408. Steps 401 to 403 are substantially the same as steps 301 to 303 in the third embodiment and are not described again here. The following mainly describes the differences between the fourth embodiment and the third embodiment:

Steps 401 to 403 are performed.

Step 404: Determine the fault category and the fault level of the first log according to the similarity between the bag of words of the first log and the bag of words of the reference log.

Specifically, the mapping file includes the bag of words, the fault category and the fault level of each reference log. According to the similarity between the bag of words of the first log and the bags of words of the reference logs, the server identifies the reference log most similar to the first log, takes the fault category of that reference log as the fault category of the first log, and takes the fault level of that reference log as the fault level of the first log.

Step 405: Determine whether a second log exists among the recorded logs.

Specifically, the second log is a log that belongs to the same fault category as the first log. If the server determines that a second log exists among the recorded logs, step 406 is performed; otherwise, step 407 is performed.

Step 406: Compare the fault level of the first log with the fault level of the second log, and update the recorded logs according to the comparison result.

Specifically, if the server determines that the comparison result indicates that the fault level of the first log is higher than the fault level of the second log, it overwrites the second log with the first log. If the comparison result indicates that the fault level of the first log is not higher than the fault level of the second log, the server does not overwrite the second log. In this way, a log with a higher fault level replaces a log with a lower fault level.

It is worth mentioning that overwriting logs of a lower fault level with logs of a higher fault level reduces the number of recorded logs and saves maintenance personnel the time and effort of analyzing them. Maintenance personnel can see at a glance the key log with the highest fault level in each fault category and can therefore repair the more serious faults promptly.

It should be noted that, as those skilled in the art will understand, the recorded logs may also be updated in other ways in practice. For example, the first log and the second log may be stored in the server in the form of a table: if the fault level of the first log is higher than that of the second log, the first log is recorded before the second log; if the fault level of the first log is lower than that of the second log, the first log is recorded after the second log. This embodiment does not limit the method of updating the logs.
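The table-based alternative can be sketched as an ordered insert. This is only an illustration under stated assumptions: the table is modelled as a list of `(log, level)` pairs for one fault category, a larger integer means a more severe fault, and `insert_ordered` is a hypothetical helper. The text does not say where a log of equal level goes; the sketch places it after existing logs of the same level.

```python
def insert_ordered(table, log, level):
    """Insert (log, level) so that the table stays ordered from highest to lowest level."""
    pos = 0
    # Skip every recorded log whose level is at least as high as the new one.
    while pos < len(table) and table[pos][1] >= level:
        pos += 1
    table.insert(pos, (log, level))
    return table


table = []
insert_ordered(table, "I/O error on device", 1)
insert_ordered(table, "kernel panic", 3)
insert_ordered(table, "filesystem remounted read-only", 2)
```

Unlike overwriting, this variant keeps every log, with the most severe one always in the first row of the table.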

Step 407: Record the first log.

Specifically, because no log of this fault category has been recorded yet, the server may record the first log in a log file so that maintenance personnel can read its contents.
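Steps 405 to 407 can be summarized in a short sketch. It assumes that the recorded logs are held in a dictionary keyed by fault category with one log per category, and that a larger integer means a more severe fault; `update_records` is a hypothetical name, and a real server would persist the result to a log file rather than keep it in memory.

```python
def update_records(records, first_log, category, level):
    """records maps a fault category to the (log, level) currently recorded for it."""
    second = records.get(category)              # step 405: look for a second log
    if second is None:
        records[category] = (first_log, level)  # step 407: nothing recorded yet
    elif level > second[1]:
        records[category] = (first_log, level)  # step 406: overwrite the lower level
    # Otherwise the first log is not higher, so the second log is kept.
    return records


records = {}
update_records(records, "sda: read error", "disk", 1)
update_records(records, "sda: device offline", "disk", 3)
update_records(records, "sda: retrying", "disk", 2)
```

After the three calls the only entry for the `disk` category is the level 3 log, which mirrors the escalation effect described for this embodiment.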

Compared with the prior art, in the log analysis method provided by this embodiment, when the fault level of the first log is higher than that of the recorded log of the same fault category, the recorded log is replaced with the first log. The importance of the recorded logs therefore only increases, which achieves the effect of continuously escalating alarms.

The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into several steps; as long as the same logical relationship is included, such variants fall within the scope of protection of this patent. Adding insignificant modifications to, or introducing insignificant designs into, an algorithm or flow without changing its core design also falls within the scope of protection of this patent.

The fifth embodiment of the present invention relates to a server which, as shown in FIG. 5, includes: at least one processor 501; and a memory 502 communicatively connected to the at least one processor 501. The memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the log processing method described in the above embodiments.

The sixth embodiment of the present invention relates to a server which, as shown in FIG. 6, includes: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601. The memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can perform the log analysis method described in the above embodiments.

In the fifth and sixth embodiments, the server includes one or more processors and a memory; FIG. 5 and FIG. 6 take one processor as an example. The processor and the memory may be connected by a bus or by other means; FIG. 5 and FIG. 6 take a bus connection as an example. As a non-volatile computer-readable storage medium, the memory can store non-volatile software programs, non-volatile computer-executable programs and modules. By running the non-volatile software programs, instructions and modules stored in the memory, the processor executes the various functional applications and data processing of the device.

The memory may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store an option list and the like. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to an external device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

One or more modules are stored in the memory and, when executed by the one or more processors, perform the log processing method or the log analysis method of any of the above method embodiments.

The above product can perform the methods provided in the embodiments of the present application and has the functional modules and beneficial effects corresponding to those methods. For technical details not described in this embodiment, refer to the methods provided in the embodiments of the present application.

The seventh embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above embodiments of the log processing method.

The eighth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above embodiments of the log analysis method.

That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Those of ordinary skill in the art will understand that the above embodiments are specific examples of implementing the present invention, and that in practice various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.

Claims

1. A method of analyzing a log, comprising:

acquiring a first log to be processed;

processing the first log to obtain a word bag of the first log;

determining similarity of a word bag of the first log and a word bag of a reference log in a mapping file, wherein the mapping file comprises the word bag of the reference log and a fault class of the reference log and/or a fault level of the reference log;

determining a fault class of the first log and/or a fault level of the first log according to the similarity of the word bags of the first log and the reference log;

after determining the fault class of the first log and the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log, the method for analyzing the log further comprises:

judging whether a second log exists in the recorded logs, wherein the second log is a log belonging to the same fault class as the first log;

if the second log is determined to exist, comparing the fault level of the first log with the fault level of the second log, and updating the recorded log according to a comparison result;

and if the second log is determined not to exist, recording the first log.

2. The method for analyzing the log according to claim 1, wherein the determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log specifically comprises:

and taking the fault class of the reference log with the highest similarity with the word bag of the first log as the fault class of the first log, and/or taking the fault level of the reference log with the highest similarity with the word bag of the first log as the fault level of the first log.

3. The method for analyzing the log according to claim 1, wherein the determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log specifically comprises:

judging whether a word bag of a reference log with similarity to the word bag of the first log larger than a second preset value exists in the mapping file or not;

If so, taking the fault class of the reference log with the highest similarity with the word bag of the first log as the fault class of the first log, and/or taking the fault level of the reference log with the highest similarity with the word bag of the first log as the fault level of the first log;

otherwise, determining the fault class of the bag of words of the first log as an unknown class, and/or determining the fault level of the first log as an unknown level.

4. The method for analyzing a log according to claim 3, wherein the mapping file comprises the word bag of the reference log, the fault class of the reference log and the fault level of the reference log;

after determining that the fault class of the bag of words of the first log is an unknown class and determining that the fault level of the first log is an unknown level, the log analysis method further comprises:

reporting the first log;

determining the fault class of the first log and the fault level of the first log according to the fault class and the fault level designated by the user;

and updating the mapping file according to the word bag of the first log, the fault class of the first log and the fault level of the first log.

5. The method for analyzing a log according to claim 1, wherein updating the recorded log according to the comparison result specifically comprises:

if the comparison result is determined to indicate that the fault level of the first log is higher than that of the second log, covering the second log by the first log;

and if the comparison result is determined to indicate that the fault level of the first log is not higher than the fault level of the second log, the second log is not covered by the first log.

6. A method of analyzing logs according to any of claims 1 to 3, wherein determining the similarity of the bag of words of the first log to the bag of words of the reference log in the mapping file comprises:

calculating the similarity according to a constraint relation among the word bag of the first log, the word bag of the reference log and the similarity; wherein the constraint relation is: similarity = (number of words that occur in both the word bag of the first log and the word bag of the reference log) / (number of words in the word bag of the first log + number of words in the word bag of the reference log - number of words that occur in both the word bag of the first log and the word bag of the reference log).

7. The method according to claim 6, wherein before the calculation of the similarity according to the constraint relation of the bag of words of the first log, the bag of words of the reference log, and the similarity, the method further comprises:

removing invalid words in the word bags of the first log and the reference log; wherein the invalid word is a pre-specified word.

8. The method for analyzing a log according to claim 1, wherein the processing the first log to obtain a bag of words of the first log specifically includes:

deleting a variable in the first log, wherein the variable is a preset parameter;

splitting the first log after deleting the variables into N words, generating a word bag of the log to be processed, wherein N is a positive integer.

9. The method according to claim 8, wherein the preset parameters include at least any one of position information of a bad track, number information of a bad track, position information of a bad block, and number information of a bad block.

10. The method for analyzing a log according to claim 9, wherein deleting the variable in the first log specifically includes:

identifying numbers in the body portion of the first log;

and deleting the numbers in the body portion of the first log.

11. A server, comprising: at least one processor; and

a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of log analysis of any one of claims 1 to 10.

12. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method of analyzing logs according to any of claims 1 to 10.