CN112115019A - Application log monitoring method and system for an application program

Info

Publication number: CN112115019A
Application number: CN202010869818.2A
Authority: CN
Inventors: 周晔; 穆海洁; 何晓楠
Original assignee: Shanghai Huifu Data Service Co ltd
Current assignee: Shanghai Huifu Data Service Co ltd
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2020-12-22

Abstract

The invention discloses an application log monitoring method and system for an application program. The application log monitoring method comprises the following steps: checking the validity of collected application logs to screen out valid application logs; judging whether the screened valid application logs belong to a stack; performing foreseeable-exception and unforeseen-exception analysis on the valid application logs that do not belong to the stack, according to configuration items of predetermined rule indicators; and sending alarm information related to the analysis result according to the configuration items. The invention aims at least to analyze, and raise alarms on, defects that may occur in the business logic reflected in the logs, in real time and in a targeted manner.

Description

Application log monitoring method and system for an application program

Technical Field

The present invention relates to the field of computer technology, and in particular to an application log monitoring method and system for an application program.

Background

Logging is a widely used concept in computer systems: operating system kernels, application servers of all kinds, and ordinary programs may all output logs. Logs vary in content, size, and purpose, which makes it difficult to generalize about them. As software technology has developed, code bases have grown very large, and maintaining software has become complicated and difficult. Typically, logging code for key business operations is added while the code is written, so that the handling of those operations is recorded in log files while the software runs. When the software behaves abnormally, the log information can be analyzed and appropriate action taken.

Existing log analysis systems usually consist of a log collection agent and a log analysis management system, and can analyze and process a small number of log files. However, they are not capable of analyzing and processing massive volumes of log files, and they lack real-time analysis, query, and early-warning capabilities. No product on the market yet provides a concrete implementation of monitoring business indicators and exception stacks through real-time log retrieval.

On the other hand, as computer technology matures, applications running on smart devices are becoming more and more complete, yet errors inevitably still occur in actual operation. Engineers therefore usually rely on a logging system to record the running status and operations of an application, so that personnel can review them and use them as a basis for debugging. The logging system records the application's various running states and operation information and generates log files.

However, logs are currently viewed and analyzed with traditional tools driven by Linux scripts. These tools have the following disadvantages: they are not timely, so problems cannot be found as soon as they occur; and they are unintuitive and hard to extend, so they are applicable only to a small number of hosts and log file types. In existing methods for monitoring application logs, the application server records the application logs and periodically uploads the log files to a remote monitoring server, and technical maintenance personnel then periodically retrieve the log files from the monitoring server for analysis. As a result, logs cannot be processed accurately in real time, the application-level requirement for real-time alarm monitoring cannot be met, and log monitoring is inefficient.

Summary of the Invention

In view of the problems in the related art, the present invention proposes an application log monitoring method and system for an application program, which can analyze, and raise alarms on, defects that may occur in the business logic reflected in the logs, in real time and in a targeted manner.

The technical solution of the present invention is realized as follows:

According to one aspect of the present invention, an application log monitoring method for an application program is provided, comprising:

checking the validity of collected application logs to screen out valid application logs;

judging whether the screened valid application logs belong to a stack;

performing foreseeable-exception and unforeseen-exception analysis on the valid application logs that do not belong to the stack, according to configuration items of predetermined rule indicators; and

sending alarm information related to the analysis result according to the configuration items.

According to an embodiment of the present invention, the application log monitoring method further comprises: monitoring, through a visual console and according to monitoring indicators, at least one of an error code type and a statistical indicator type, wherein the monitoring indicators include at least one of a monitoring level, whether to escalate an alarm, a notification frequency, and a notification recipient.

According to an embodiment of the present invention, sending the alarm information comprises: using a minute-level scheduled task to determine whether an alarm condition is satisfied.

According to an embodiment of the present invention, sending the alarm information related to the analysis result according to the configuration items comprises: obtaining the result of the corresponding statistical item based on an indicator, and comparing the result of the statistical item with the threshold of the indicator; and, if the result of the statistical item reaches the threshold of the indicator, sending the alarm information.

According to an embodiment of the present invention, the application log monitoring method further comprises: installing, locally to the application program, a collection tool for capturing the application logs to be collected.

According to another aspect of the present invention, an application log monitoring system for an application program is provided, comprising:

a collection tool, installed locally to the application program, for collecting application logs into a distributed message queue;

a log routing module for checking the validity of the application logs in the distributed message queue, so as to screen out valid application logs;

a log processing module for judging whether the screened valid application logs belong to a stack, and for performing foreseeable-exception and unforeseen-exception analysis on the valid application logs that do not belong to the stack, according to configuration items of predetermined rule indicators; and

an alarm notification module for sending alarm information related to the analysis result according to the configuration items.

According to an embodiment of the present invention, the alarm notification module is further configured to receive a socket request or a microservice request sent by a log-free system, and the information from the log-free application is converted into a valid application log.

According to an embodiment of the present invention, the application log monitoring system further includes a batch module for periodically loading the configuration items and related indicators.

According to an embodiment of the present invention, the application log monitoring system further includes a visual console for monitoring at least one of an error code type and a statistical indicator type according to monitoring indicators, wherein the monitoring indicators include at least one of a monitoring level, whether to escalate an alarm, a notification frequency, and a notification recipient.

According to an embodiment of the present invention, the application log monitoring system further includes a distributed cache or a database for storing the analysis results of the log processing module.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an application log monitoring method for an application program according to an embodiment of the present invention;

FIG. 2 is a flowchart of the alarm step of the application log monitoring method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the architecture of an application log monitoring system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the error-code configuration in the visual console of the application log monitoring system according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of configuration items for judging stack log storage according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments fall within the protection scope of the present invention.

In view of the shortcomings of the prior art, the present invention provides an application log monitoring method and an application log monitoring system for an application program. The method includes steps S1 to S4 and is described below with reference to FIG. 1.

(1) Step S1

In step S1, the validity of the collected application logs is first checked to screen out valid application logs. Specifically, the application logs may be collected before their validity is judged: in step S11, a regular-expression check is performed by Filebeat (a log data collector for local files); in step S12, it is determined whether the application log passes the check; if it passes, the method proceeds to step S13, in which the collected application log is uploaded to a cluster, for example a Kafka cluster. Then, in step S14 of FIG. 1, the log routing module consumes the application logs from the cluster and determines whether each application log is valid. If the application log is determined to be valid, the method proceeds to step S2.
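The flow of steps S11 to S14 can be illustrated with a minimal Python sketch. The log line format, the regular expression `LOG_PATTERN`, the `known_apps` validity rule, and all function names are assumptions made for illustration; the patent does not specify them. An in-memory `queue.Queue` stands in for the Kafka cluster.

```python
import queue
import re

# Hypothetical log format:
# "2020-08-26 10:00:00 [ERROR] pay-service E1001 channel timeout"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] (?P<app>[\w-]+) (?P<body>.+)$"
)

def collect(lines, cluster):
    """Steps S11-S13: regex-check each raw line and upload the ones that pass."""
    for line in lines:
        if LOG_PATTERN.match(line):
            cluster.put(line)

def route(cluster, known_apps):
    """Step S14: consume the queue and keep only logs from monitored applications."""
    valid = []
    while not cluster.empty():
        fields = LOG_PATTERN.match(cluster.get()).groupdict()
        if fields["app"] in known_apps:  # assumed validity rule
            valid.append(fields)
    return valid

cluster = queue.Queue()  # stand-in for the Kafka cluster
collect(
    [
        "2020-08-26 10:00:00 [ERROR] pay-service E1001 channel timeout",
        "garbage line without a timestamp",
        "2020-08-26 10:00:01 [INFO] other-app started",
    ],
    cluster,
)
valid_logs = route(cluster, known_apps={"pay-service"})
```

Here the malformed line is rejected at collection time, and the well-formed line from an unmonitored application is rejected by the routing step, so only one log proceeds to step S2.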

(2) Step S2

In step S2, it is judged whether the screened valid application logs belong to a stack. In the example flow shown in FIG. 1, this judgment is made at step S15. The rules for judging whether a log belongs to a stack can be configured according to actual needs. When a valid application log is judged to belong to a stack, step S16 is performed and the stack application log is stored; when it is judged not to belong to a stack, step S17 is performed and the valid application log that needs to be monitored is processed. The storage logic for stack logs at step S16 in FIG. 1 is the same as that for valid logs that do not belong to a stack. FIG. 5 is a schematic diagram of configuration items for judging stack log storage according to an embodiment of the present invention.
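The configurable stack judgment of steps S15 to S17 might look like the following Python sketch. The rule patterns assume Java-style stack traces, and the names `STACK_RULES`, `belongs_to_stack`, and `dispatch` are hypothetical; the patent leaves the actual rules to configuration.

```python
import re

# Hypothetical, configurable rules for recognizing exception stack output.
STACK_RULES = [
    # e.g. "    at com.foo.Bar.baz(Bar.java:42)"
    re.compile(r"^\s+at [\w.$]+\(.*\)$", re.MULTILINE),
    re.compile(r"^Caused by: ", re.MULTILINE),
    # e.g. "java.lang.NullPointerException"
    re.compile(r"\b[\w.]+(Exception|Error)\b"),
]

def belongs_to_stack(body, rules=STACK_RULES):
    """Step S15: a log belongs to a stack if any configured rule matches."""
    return any(rule.search(body) for rule in rules)

def dispatch(log_body, stack_store, monitor_queue):
    """Steps S16/S17: store stack logs, pass the rest on for indicator analysis."""
    if belongs_to_stack(log_body):
        stack_store.append(log_body)
    else:
        monitor_queue.append(log_body)
```

Keeping the rules in a plain list makes them easy to load from configuration, which matches the statement that the judgment rules can be adapted to actual needs.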

(3) Step S3

In step S3, foreseeable-exception and unforeseen-exception analysis is performed on the valid application logs that do not belong to the stack, according to the configuration items of the predetermined rule indicators. In the example flow shown in FIG. 1, step S18 determines whether configuration items exist. If they do, step S19 is performed: the screened valid logs are further analyzed, counted, and subjected to logical operations; they are matched against the configuration items of the predetermined rule indicators; the computation runs asynchronously, analyzing unforeseen exceptions and foreseeable exceptions separately; and the final result is persisted. According to an embodiment of the present invention, the configuration items for stack storage (stack exceptions) may be the same as the configuration items of the predetermined rule indicators used for monitoring valid logs.
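A minimal Python sketch of the matching and counting in step S19 follows. It assumes that each configuration item ties an error code to a named statistic, and that an ERROR-level log matching no configuration item counts as an unforeseen exception; both are illustrative assumptions, as are the names `ConfigItem` and `analyze`. Asynchronous execution and persistence are omitted.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigItem:
    """Hypothetical configuration item: an error code tied to a statistic name."""
    error_code: str
    statistic: str

def analyze(logs, config_items):
    """Count foreseeable exceptions per statistic; count other errors as unforeseen."""
    by_code = {item.error_code: item.statistic for item in config_items}
    foreseeable, unforeseen = Counter(), Counter()
    for log in logs:
        statistic = by_code.get(log["code"])
        if statistic is not None:
            foreseeable[statistic] += 1
        elif log["level"] == "ERROR":  # assumed criterion for "unforeseen"
            unforeseen[log["app"]] += 1
    return foreseeable, unforeseen

items = [ConfigItem("E1001", "channel_connect_failures")]
logs = [
    {"app": "pay-service", "level": "WARN", "code": "E1001"},
    {"app": "pay-service", "level": "WARN", "code": "E1001"},
    {"app": "pay-service", "level": "ERROR", "code": "E9999"},
    {"app": "pay-service", "level": "INFO", "code": ""},
]
foreseeable, unforeseen = analyze(logs, items)
```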

(4) Step S4

In step S4, alarm information related to the analysis result is sent according to the configuration items. The persisted final result is notified to the person in charge of each application, following defined rules based on the indicator configuration items and related events. In some embodiments, the generated alarm information may be sent by email, SMS, DingTalk, voice call, or similar channels.
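The channel-based delivery of step S4 can be sketched in Python as a lookup from channel name to a sender callable. The recipient structure and the `senders` mapping are hypothetical, and the senders used here only append to an in-memory list; no real email, SMS, or DingTalk API is implied.

```python
def notify(alarm, recipients, senders):
    """Send one alarm to every recipient over that recipient's configured channel."""
    delivered = []
    for recipient in recipients:
        send = senders.get(recipient["channel"])
        if send is None:
            continue  # unknown channel: skip rather than fail the whole batch
        send(recipient["address"], alarm)
        delivered.append((recipient["channel"], recipient["address"]))
    return delivered

outbox = []  # stand-in for real delivery back ends
senders = {
    "email": lambda address, text: outbox.append(("email", address, text)),
    "sms": lambda address, text: outbox.append(("sms", address, text)),
}
recipients = [
    {"channel": "email", "address": "owner@example.com"},
    {"channel": "pager", "address": "n/a"},
]
delivered = notify("channel_connect_failures reached 5", recipients, senders)
```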

By monitoring newly written content in the application log files in real time and establishing corresponding handling, the application log monitoring method described above solves the technical problem that existing methods cannot aggregate, analyze, and raise alarms on possible business logic defects in the logs in a real-time, targeted manner. Systems and applications can thus be handled accurately in real time, the application-level requirement for real-time alarm monitoring is met, and both monitoring efficiency and the immediacy of problem discovery improve. The invention monitors the relevant indicators and stack exceptions without intervening in the application code, which effectively improves monitoring efficiency and saves labor costs.

The above solution can be applied to a variety of applications, for example applications concerned with the success rates of the various payment types in a card-swiping business, the success rates of deposits and withdrawals through bank channels, or interceptions by a risk control system.

In some embodiments, the application log monitoring method of the present invention uses a minute-level scheduled task to determine whether an alarm condition is satisfied. As shown in FIG. 2, when the minute-level scheduled task judges that the alarm condition is met, it adds a task to the to-be-sent queue to await sending; when the alarm condition is not met, the statistical results are cleared.
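One run of such a scheduled task could be sketched in Python as follows. The function name `run_minute_task` and the dictionary layout of statistics and thresholds are assumptions; in practice a scheduler would call the function once per minute, which is not shown here.

```python
from collections import deque

def run_minute_task(stats, thresholds, send_queue):
    """One run of the scheduled task: enqueue alarms or clear the statistics."""
    triggered = [
        (name, count)
        for name, count in stats.items()
        if name in thresholds and count >= thresholds[name]
    ]
    if triggered:
        send_queue.extend(triggered)  # wait in the to-be-sent queue
    else:
        stats.clear()                 # condition not met: reset the counters
    return bool(triggered)

send_queue = deque()
fired = run_minute_task(
    {"channel_connect_failures": 5},
    {"channel_connect_failures": 3},
    send_queue,
)
```

Whether the statistics are also reset after an alarm has been queued is not stated in the source, so the sketch leaves them untouched in that branch.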

FIG. 3 is a schematic diagram of the architecture of an application log monitoring system according to an embodiment of the present invention. The application log monitoring method described above can be applied to the system shown in FIG. 3. As shown in FIG. 3, the application log monitoring system includes a collection tool 21, a log routing module 22, a log processing module 23, and an alarm notification module 24.

The collection tool 21 can be installed locally to the application program and is used to collect application logs into a distributed message queue. The log routing module 22 checks the validity of the application logs in the distributed message queue so as to screen out valid application logs. The log processing module 23 judges whether the screened valid application logs belong to a stack and, according to the configuration items of the predetermined rule indicators, performs foreseeable-exception and unforeseen-exception analysis on the valid application logs that do not belong to the stack. The alarm notification module 24 sends alarm information related to the analysis result according to the configuration items.

In the embodiment shown in FIG. 3, when the monitored objects are Nginx (an HTTP and reverse-proxy web server) system logs or application logs written in various languages (such as Java or Python), the collection tool 21 can be Logstash (an application log management platform) or Filebeat, either of which collects log file information into the distributed message queue Kafka (a distributed publish-subscribe messaging system). When the monitored object is a log-free system, the collection tool 21 can actively send a socket request or a microservice request to the alarm notification module 24 for processing. The log routing module 22 consumes messages from Kafka into the routing application, converts them into byte objects, and verifies the validity of the logs. Valid logs are screened out by predetermined rules and await the next stage of analysis. The log processing module 23 further analyzes and counts the screened valid logs and applies logical operations to them, matches them against the configuration items of the predetermined rule indicators, runs the computation asynchronously, analyzes unforeseen exceptions and foreseeable exceptions separately, and persists the final results.

In one specific example, a foreseeable exception may be the following: a channel service becomes unavailable because of a release or for some other reason; instead of throwing an exception, the application prints a log entry, which this system then picks up. In this example, the predetermined rule indicator may be: for the foreseeable exception above, if the occurrence is sporadic or differs from the expected pattern, an alarm is raised in the same way as for a stack exception; alternatively, statistics can be collected by defining error codes, for example raising an alarm when a specified number of occurrences is reached within one minute. In this example, the configuration items may include the number of connection failures for the service, the transaction success rate, and so on.
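The "specified number of occurrences within one minute" rule and the transaction success rate can be sketched in Python as below. The class `WindowCounter`, its sliding-window interpretation of "within one minute", and the helper `success_rate` are illustrative assumptions rather than the patent's implementation.

```python
from collections import deque

class WindowCounter:
    """Counts occurrences of one error code within a sliding time window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, now):
        """Record one occurrence at time `now` (seconds); return True if an alarm is due."""
        self.timestamps.append(now)
        # Drop occurrences that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.limit

def success_rate(succeeded, total):
    """Transaction success rate; None when there were no transactions."""
    return succeeded / total if total else None

burst = WindowCounter(limit=3)
burst_results = [burst.record(t) for t in (0, 10, 20)]
spread = WindowCounter(limit=3)
spread_results = [spread.record(t) for t in (0, 50, 70)]
```

Three failures within 20 seconds trigger the alarm on the third occurrence, while three failures spread over 70 seconds do not, because the first has left the window by the time the third arrives.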

The alarm notification module 24 notifies the person in charge of each application of the persisted final result, following defined rules based on the indicator configuration items and related events. The alarm notification module 24 is further configured to receive a socket request or a microservice request sent by a log-free system, and the information from the log-free application is converted into a valid application log.
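How a request from a log-free system might be converted into a log record is sketched below in Python. The JSON payload, the required fields in `REQUIRED_FIELDS`, and the output record layout are all hypothetical; the patent states only that such requests are received and converted.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = ("app", "code", "message")  # hypothetical request schema

def request_to_log(payload, received_at=None):
    """Convert a JSON request from a log-free system into a log record, or None if invalid."""
    try:
        data = json.loads(payload)
    except ValueError:  # covers json.JSONDecodeError and bad byte encodings
        return None
    if not isinstance(data, dict) or any(f not in data for f in REQUIRED_FIELDS):
        return None
    received_at = received_at or datetime.now(timezone.utc)
    return {
        "ts": received_at.isoformat(),
        "app": data["app"],
        "code": data["code"],
        "body": data["message"],
        "source": "request",  # marks records that did not come from a log file
    }

record = request_to_log(
    '{"app": "legacy-gw", "code": "E2001", "message": "upstream refused"}',
    datetime(2020, 8, 26, tzinfo=timezone.utc),
)
```

Once converted, such a record can pass through the same stack judgment and indicator analysis as a log collected from a file.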

With continued reference to FIG. 3, the application log monitoring system of the present invention further includes a batch module 25 and a visual console 26. The batch module 25 periodically loads the configuration items and related indicators. The visual console 26 monitors at least one of an error code type and a statistical indicator type according to monitoring indicators, which include at least one of a monitoring level, whether to escalate an alarm, a notification frequency, and a notification recipient. As shown in FIG. 4, the visual console 26 provides visual monitoring and can monitor error code types, statistical indicator types, and so on separately. The indicators involved include, but are not limited to, the monitoring level, screen projection status, whether to escalate alarms automatically, whether a contingency plan exists, notification recipients, a description, and the notification frequency.
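The monitoring indicators configured in the console can be modeled as a small record type, as in the following Python sketch. The field names, the `Kind` enumeration, and the throttling helper `due_for_notification` are illustrative assumptions; FIG. 4 is not reproduced here, so the actual console layout may differ.

```python
from dataclasses import dataclass, field
from enum import Enum

class Kind(Enum):
    ERROR_CODE = "error_code"
    STATISTIC = "statistic"

@dataclass
class MonitorIndicator:
    """One row of the console configuration; field names are illustrative."""
    name: str
    kind: Kind
    level: int                  # monitoring level
    auto_escalate: bool         # whether to escalate the alarm automatically
    notify_every_minutes: int   # notification frequency
    notify: list = field(default_factory=list)  # notification recipients
    description: str = ""

def due_for_notification(indicator, minutes_since_last):
    """Throttle repeat notifications according to the configured frequency."""
    return minutes_since_last >= indicator.notify_every_minutes

indicator = MonitorIndicator(
    name="E1001",
    kind=Kind.ERROR_CODE,
    level=1,
    auto_escalate=True,
    notify_every_minutes=5,
    notify=["ops-oncall"],
)
```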

The application log monitoring system of the present invention further includes a database 27, which includes the distributed cache Redis or the database PolarDB, for storing the analysis results of the log processing module.

The application log monitoring system captures the application logs of an application by installing the Logstash or Filebeat collection tool locally to the application or resource to be monitored. By configuring some specific code on the collection tool, operations such as format splicing can be performed on the application logs. For middleware such as Redis, specific scripts can also be used. The application logs are then sent to the Kafka cluster. The Filebeat collection tool does not filter the local logs in any way; all filtering is performed by the method of the present invention.

The monitoring system first obtains the logs from the distributed message queue Kafka and then distributes them to the back-end core log processing nodes. A core node converts each log into an internal object, analyzes and counts the logs according to the configured statistical items, and places the statistical results in the distributed cache Redis or the database PolarDB. The alarm system periodically obtains the results of the statistical items that correspond to each indicator, aggregates and computes them, and finally compares the outcome with the indicator's threshold; if the threshold is reached, an alarm is raised.
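The split between the processing side, which writes statistics to a store, and the alarm side, which aggregates them and compares against a threshold, is sketched below in Python. A plain dictionary stands in for Redis or PolarDB, and the indicator definition (a success rate with a configurable comparison direction) is a hypothetical example.

```python
import operator

store = {}  # stand-in for Redis/PolarDB: statistic name -> value

def record(statistic, amount=1):
    """Processing side: accumulate one statistic in the store."""
    store[statistic] = store.get(statistic, 0) + amount

# Hypothetical indicator: alarm when the success rate falls to 0.9 or below.
INDICATOR = {
    "name": "pay_success_rate",
    "compute": lambda s: s.get("pay_ok", 0) / s["pay_total"] if s.get("pay_total") else None,
    "compare": operator.le,
    "threshold": 0.9,
}

def check(indicator, stats):
    """Alarm side: aggregate the statistics and compare with the threshold."""
    value = indicator["compute"](stats)
    if value is not None and indicator["compare"](value, indicator["threshold"]):
        return f"ALARM {indicator['name']}={value:.2f} threshold={indicator['threshold']}"
    return None

for ok in (True, True, False, True, False):
    record("pay_total")
    if ok:
        record("pay_ok")
alarm = check(INDICATOR, store)
```

Making the comparison operator part of the indicator lets the same check serve both "count reaches N" rules (`operator.ge`) and "rate drops to X" rules (`operator.le`).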

Traditional log monitoring technology only keeps statistical records of logs and offers only host-level monitoring of resources such as CPU and memory; it does not monitor foreseeable and unforeseen errors in the specific logic of the application itself. As a result, exceptions are discovered late: one waits for the problem to surface and then logs in to the machine to read the logs and trace the cause, so that what is examined is often not the original scene of the failure but a later snapshot.

In the present invention, no intrusive code is implanted in the systems or applications. The application simply prints a standardized log entry for each foreseeable error, and the person in charge is notified in real time as soon as a problem occurs, so that a decision (a rollback or another remedial measure) can be made immediately. This is especially valuable for applications with low fault tolerance, such as payment applications.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. An application log monitoring method for an application program, comprising:

checking the validity of collected application logs to screen out valid application logs;

judging whether the screened valid application logs belong to a stack;

performing foreseeable-exception and unforeseen-exception analysis on the valid application logs that do not belong to the stack, according to configuration items of predetermined rule indicators; and

sending alarm information related to the analysis result according to the configuration items.

2. The application log monitoring method for an application program according to claim 1, further comprising:

monitoring, through a visual console and according to monitoring indicators, at least one of an error code type and a statistical indicator type, wherein the monitoring indicators comprise at least one of a monitoring level, whether to escalate an alarm, a notification frequency, and a notification recipient.

3. The application log monitoring method for the application program according to claim 1, wherein sending the alarm information comprises:

using a minute-level scheduled task to determine whether an alarm condition is met.

4. The application log monitoring method for the application program according to claim 1, wherein sending alarm information related to the analysis result according to the configuration item comprises:

obtaining the result of a corresponding statistical item based on an indicator, and comparing the result of the statistical item with a threshold of the indicator; and

if the result of the statistical item reaches the threshold of the indicator, sending the alarm information.

5. The application log monitoring method for an application program according to claim 1, further comprising:

installing, locally to the application program, a collection tool for capturing the application logs to be collected.

6. An application log monitoring system for an application program, comprising:

a collection tool, installed locally to the application program, for collecting application logs into a distributed message queue;

a log routing module for checking the validity of the application logs in the distributed message queue so as to screen out valid application logs;

a log processing module for judging whether the screened valid application logs belong to a stack, and for performing foreseeable-exception and unforeseen-exception analysis on the valid application logs that do not belong to the stack, according to configuration items of predetermined rule indicators; and

an alarm notification module for sending alarm information related to the analysis result according to the configuration items.

7. The application log monitoring system for an application program of claim 6, wherein

the alarm notification module is further configured to receive a socket request or a microservice request sent by a log-free system, and the information from the log-free application is converted into a valid application log.

8. The application log monitoring system for an application program of claim 6, further comprising:

a batch module for periodically loading the configuration items and related indicators.

9. The application log monitoring system for an application program of claim 6, further comprising:

a visual console for monitoring at least one of an error code type and a statistical indicator type according to monitoring indicators, wherein the monitoring indicators comprise at least one of a monitoring level, whether to escalate an alarm, a notification frequency, and a notification recipient.

10. The application log monitoring system for an application program of claim 6, further comprising:

a distributed cache or a database for storing the analysis results of the log processing module.