
CN114090366A - Method, device and system for monitoring data - Google Patents


Info

Publication number
CN114090366A
Authority
CN
China
Prior art keywords
data
index data
indicator data
indicator
monitoring
Legal status
Granted
Application number
CN202010904700.9A
Other languages
Chinese (zh)
Other versions
CN114090366B (en)
Inventor
王毅
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010904700.9A
Publication of CN114090366A
Application granted
Publication of CN114090366B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G06F11/3037: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G06F11/3051: Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474: Sequence data queries, e.g. querying versioned data
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/546: Message passing systems or structures, e.g. queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract



The invention discloses a method, device and system for monitoring data, and relates to the field of computer technology. In one specific embodiment of the method, a server receives indicator data collected by a client and uses a unified interface to perform read and write operations on third-party time series databases, so that both the storage and the reading of the indicator data are monitored. This overcomes the limitation of existing systems, which can only write to third-party time series databases, improves the efficiency of data monitoring, and improves the mobility and utilization of the indicator data. Determining a global business label solves the problem of poor correlation among collected indicator data caused by non-standardized business labels, and placing the received indicator data in a message queue solves the problem of highly concurrent ingestion of massive data.


Description

A method, device and system for monitoring data

Technical field

The present invention relates to the field of computer technology, and in particular to a method, device and system for monitoring data.

Background

Prometheus is a monitoring system that provides a complete data monitoring solution. Thanks to its open ecosystem and flexible multi-component architecture, it has been widely deployed in the field of data monitoring.

In the process of realizing the present invention, the inventor found at least the following problems in the prior art:

Indicator data received by the Prometheus system is stored on a local storage medium, and the limited storage space makes storing massive amounts of data difficult. When indicator data is stored in a third-party database, the system only supports writing to that database and does not support reading data from it directly. When there are multiple third-party databases, this increases the complexity of monitoring reads and writes against them, and it also leads to poor data mobility and low data utilization.

The Prometheus system acquires indicator data in many different ways, for example by scraping a target data source, by fetching indicator data stored in a data gateway, or by acquiring indicator data found through service discovery. As a result, monitoring data labels are not standardized, the data lacks correlation, and later correlation analysis becomes difficult. When collecting massive amounts of data, the Prometheus system also handles highly concurrent collection poorly: data may be lost while massive volumes are being collected, and centralized processing of massive data puts the Prometheus system at risk of downtime.

Summary of the invention

In view of this, embodiments of the present invention provide a method, device and system for monitoring data. A server receives indicator data collected by a client and uses a unified interface to perform read and write operations on third-party time series databases, so that both the storage and the reading of indicator data are monitored. This overcomes the write-only access of existing systems to third-party time series databases, improves the efficiency of data monitoring, and improves the mobility and utilization of indicator data. Determining a global business label solves the problem of poor correlation among collected indicator data caused by non-standardized business labels, and placing received indicator data in a message queue solves the problem of highly concurrent ingestion of massive data.

To achieve the above object, according to one aspect of the embodiments of the present invention, a method for monitoring data applied to the Prometheus system is provided, characterized by comprising: receiving indicator data; determining, according to the category of the indicator data, a global business label corresponding to the indicator data, and adding the global business label and its content to each item of the indicator data to form target indicator data, which is stored in a message queue; using a Prometheus server to obtain the target indicator data from the message queue, monitoring the target indicator data according to a configured monitoring policy, and storing the target indicator data in a time series database, so as to monitor the storage of the indicator data; receiving a query request and determining, according to the target indicator data in the query request, the time series database corresponding to that target indicator data; and obtaining the target indicator data from the time series database through the Prometheus server, so as to monitor the reading of the indicator data.

Optionally, the method for monitoring data is characterized in that:

the data format of indicator data in a second format is converted into a first format according to the format rules of the first format.

Optionally, the method for monitoring data is characterized in that:

when the indicator data contains a non-numeric value, the non-numeric value is converted into a corresponding number according to a predefined mapping between non-numeric values and numbers.
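The patent only says the mapping between non-numeric values and numbers is "predefined". The sketch below assumes a hypothetical lookup table (`NON_NUMERIC_TO_NUMBER`, whose strings and numbers are illustrative) and a hypothetical helper `to_numeric` that passes real numbers through unchanged, so that every sample value stored by Prometheus ends up as a float.

```python
# Hypothetical predefined mapping; the patent does not list the actual pairs.
NON_NUMERIC_TO_NUMBER = {
    "failed": 0.0,
    "success": 1.0,
    "down": 0.0,
    "up": 1.0,
}


def to_numeric(value, mapping=NON_NUMERIC_TO_NUMBER):
    """Return value as a float, translating non-numeric strings via mapping."""
    if isinstance(value, bool):          # bool is a subclass of int, check it first
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    try:
        return float(text)               # numeric strings such as "42" or "3.5"
    except ValueError:
        pass
    key = text.lower()                   # case-insensitive lookup in the table
    if key not in mapping:
        raise ValueError(f"no numeric mapping defined for {value!r}")
    return mapping[key]


print(to_numeric("FAILED"), to_numeric("42"), to_numeric(7))  # prints: 0.0 42.0 7.0
```

Raising on unknown strings, rather than silently storing a default, is a design choice of this sketch: it surfaces unmapped states instead of recording misleading zeros.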

Optionally, the method for monitoring data is characterized in that:

when a query request for the target indicator data is obtained, the operators contained in the query request are converted according to the syntax rules of the time series database.
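The conversion rules themselves are not given. As one possible instance, the sketch below assumes PromQL-style label matchers on the way in and an OpenTSDB-style target (OpenTSDB's `literal_or`, `not_literal_or` and `regexp` tag filter types) on the way out. The function name `convert_selector` and the output dictionary layout are assumptions for illustration. Values containing embedded double quotes are not handled.

```python
import re

# Metric selector such as: query_info_12345{status="failed",cluster=~"A.*"}
SELECTOR = re.compile(r'^\s*([a-zA-Z_:][a-zA-Z0-9_:]*)\s*(?:\{(.*)\})?\s*$')
# One label matcher followed by a comma or the end of the matcher list.
MATCHER = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*(=~|!~|!=|=)\s*"([^"]*)"\s*(?:,|$)')

# Assumed operator mapping onto OpenTSDB-style filter types. Note that
# literal_or treats '|' in a value as a list of alternatives.
OPERATOR_TO_FILTER = {"=": "literal_or", "!=": "not_literal_or", "=~": "regexp"}


def convert_selector(selector):
    """Convert 'metric{label op "value", ...}' into an OpenTSDB-style sub-query dict."""
    m = SELECTOR.match(selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    metric, body = m.group(1), (m.group(2) or "").strip()
    filters, pos = [], 0
    while pos < len(body):
        mm = MATCHER.match(body, pos)
        if mm is None:
            raise ValueError(f"cannot parse matcher at {body[pos:]!r}")
        label, op, value = mm.groups()
        if op not in OPERATOR_TO_FILTER:
            # '!~' has no single-filter equivalent in this assumed target syntax.
            raise NotImplementedError(f"operator {op!r} is not supported")
        if op == "=~":
            value = f"^(?:{value})$"   # PromQL regex matchers are fully anchored
        filters.append({"type": OPERATOR_TO_FILTER[op], "tagk": label,
                        "filter": value, "groupBy": False})
        pos = mm.end()
    return {"metric": metric, "filters": filters}


print(convert_selector('query_info_12345{status="failed",cluster=~"A.*"}'))
```

Wrapping regular expressions in `^(?:...)$` preserves the PromQL "whole value must match" semantics regardless of how the target database applies the pattern.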

Optionally, the method for monitoring data is characterized in that:

the indicator data is stored in, and read from, the time series database based on a remote procedure call model.

To achieve the above object, according to a second aspect of the embodiments of the present invention, a method for monitoring data is provided, characterized by comprising: collecting indicator data, and sending, at a set period, the indicator data together with a configured global business label to a configured network address.
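The client-side method amounts to a periodic loop: collect, attach the configured global business label, send to the configured address. The sketch below assumes a JSON payload layout, a label value (`cluster11111`, reused from the embodiment below), and a placeholder address under the reserved `.example` domain. All of these are hypothetical. The transport (`send`) and the clock (`sleep`) are injected, so the loop can be exercised without a network. In practice, `send` could be an HTTP POST.

```python
import json
import time

# Hypothetical configuration; the patent does not specify these values.
GLOBAL_LABEL = {"serviceId": "cluster11111"}
TARGET_ADDRESS = "http://gateway.example:9091/ingest"


def build_payload(samples, global_label):
    """Attach the configured global business label to every collected sample."""
    return json.dumps([
        {"metric": name, "value": value, "tags": dict(global_label)}
        for name, value in samples.items()
    ])


def push_periodically(collect, send, period_s, rounds, sleep=time.sleep):
    """Every period_s seconds, collect samples and send them to TARGET_ADDRESS."""
    for i in range(rounds):
        send(TARGET_ADDRESS, build_payload(collect(), GLOBAL_LABEL))
        if i + 1 < rounds:           # no need to wait after the last round
            sleep(period_s)


sent = []
push_periodically(
    collect=lambda: {"cpu_usage": 0.42, "process_count": 128},
    send=lambda address, body: sent.append((address, body)),  # stand-in for an HTTP POST
    period_s=15,
    rounds=2,
    sleep=lambda seconds: None,      # skip real waiting in this demo
)
print(len(sent))
```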

Optionally, the method for monitoring data is characterized in that:

the indicator data is collected using a data collection software package.

Optionally, the method for monitoring data is characterized in that:

a custom indicator is added using the registration method provided by the data collection software package, and indicator data corresponding to the custom indicator is collected using the data collection software package.
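The "registration method" of a data collection package can be pictured as a registry that maps indicator names to collector functions. The official Prometheus client libraries use a registry in a similar spirit. The stand-alone `Registry` class below is a hypothetical, standard-library-only sketch, not the API of any specific package.

```python
class Registry:
    """Minimal stand-in for a collection package's metric registry (hypothetical)."""

    def __init__(self):
        self._collectors = {}

    def register(self, name, collect_fn):
        """Register an indicator: name -> zero-argument function returning a number."""
        if name in self._collectors:
            raise ValueError(f"indicator {name!r} already registered")
        self._collectors[name] = collect_fn

    def collect(self):
        """Invoke every registered collector and return {name: value}."""
        return {name: fn() for name, fn in self._collectors.items()}


registry = Registry()
registry.register("process_count", lambda: 128)        # built-in style indicator
registry.register("orders_per_minute", lambda: 350)    # custom business indicator
print(registry.collect())  # prints: {'process_count': 128, 'orders_per_minute': 350}
```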

Optionally, the method for monitoring data is characterized in that:

the indicator data is collected using an indicator data collection script.

To achieve the above object, according to a third aspect of the embodiments of the present invention, a device for monitoring data applied to the Prometheus system is provided, characterized by comprising a data processing module and a data read/write module, wherein:

the data processing module is configured to receive indicator data; determine, according to the category of the indicator data, a global business label corresponding to the indicator data; add the global business label and its content to each item of the indicator data to form target indicator data; and store the target indicator data in a message queue;

the data read/write module is configured to use a Prometheus server to obtain the target indicator data from the message queue, monitor the target indicator data according to a configured monitoring policy, and store the target indicator data in a time series database, so as to monitor the storage of the indicator data; receive a query request and determine, according to the target indicator data in the query request, the corresponding time series database; and obtain the target indicator data from that time series database through the Prometheus server, so as to monitor the reading of the indicator data.

Optionally, the device for monitoring data is characterized in that:

the data format of indicator data in a second format is converted into a first format according to the format rules of the first format.

Optionally, the device for monitoring data is characterized in that:

when the indicator data contains a non-numeric value, the non-numeric value is converted into a corresponding number according to a predefined mapping between non-numeric values and numbers.

Optionally, the device for monitoring data is characterized in that:

when a query request for the target indicator data is obtained, the operators contained in the query request are converted according to the syntax rules of the time series database.

Optionally, the device for monitoring data is characterized in that:

the indicator data is stored in, and read from, the time series database based on a remote procedure call model.

To achieve the above object, according to a fourth aspect of the embodiments of the present invention, a device for monitoring data is provided, characterized by comprising a data collection module, wherein the data collection module is configured to collect indicator data and to send, at a set period, the indicator data together with a configured global business label to a configured network address.

Optionally, the device for monitoring data is characterized in that:

the indicator data is collected using a data collection software package.

Optionally, the device for monitoring data is characterized in that:

a custom indicator is added using the registration method provided by the data collection software package, and indicator data corresponding to the custom indicator is collected using the data collection software package.

Optionally, the device for monitoring data is characterized in that:

the indicator data is collected using an indicator data collection script.

To achieve the above object, according to a fifth aspect of the embodiments of the present invention, a system 800 for monitoring data is provided, comprising the device 600 for monitoring data provided in the third aspect and the device 700 for monitoring data provided in the fourth aspect.

To achieve the above object, according to a sixth aspect of the embodiments of the present invention, an electronic device for monitoring data is provided, characterized by comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the methods for monitoring data described above.

To achieve the above object, according to a seventh aspect of the embodiments of the present invention, a computer-readable medium storing a computer program is provided, characterized in that the program, when executed by a processor, implements any of the methods for monitoring data described above.

An embodiment of the above invention has the following advantages or beneficial effects. A server receives indicator data collected by a client and uses a unified interface to perform read and write operations on third-party time series databases, so that both the storage and the reading of indicator data are monitored. This overcomes the write-only access of existing systems to third-party time series databases, improves the efficiency of data monitoring, and improves the mobility and utilization of indicator data. Determining a global business label solves the problem of poor correlation among collected indicator data caused by non-standardized business labels, and placing received indicator data in a message queue solves the problem of highly concurrent ingestion of massive data.

Further effects of the above non-conventional optional implementations are described below in conjunction with the specific embodiments.

Description of drawings

The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:

Figure 1 is a schematic flowchart of a method for monitoring data according to an embodiment of the present invention;

Figure 2 is a schematic flowchart of a method for collecting indicator data according to an embodiment of the present invention;

Figure 3 is a schematic flowchart of data monitoring according to an embodiment of the present invention;

Figure 4 is a schematic diagram of an existing Prometheus system;

Figure 5 is a schematic diagram of an improved Prometheus system according to an embodiment of the present invention;

Figure 6 is a schematic structural diagram of a device for monitoring data according to an embodiment of the present invention;

Figure 7 is a schematic structural diagram of a device for collecting indicator data according to an embodiment of the present invention;

Figure 8 is a schematic structural diagram of a system for monitoring data according to an embodiment of the present invention;

Figure 9 is a diagram of an exemplary system architecture to which an embodiment of the present invention may be applied;

Figure 10 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

As shown in Figure 1, an embodiment of the present invention provides a method for a server to monitor data. The method may include the following steps:

Step S101: receive indicator data; determine, according to the category of the indicator data, the global business label corresponding to the indicator data, and add the global business label to each item of the indicator data, forming target indicator data that is stored in a message queue.

Specifically, the indicator data is the data monitored by the Prometheus monitoring system. It includes, for example: indicator data related to physical machines (such as the temperature and hardware fault information of routers, switches and servers); system indicator data for the operating system running on a physical machine (such as CPU utilization, memory utilization, disk utilization, network interface traffic, TCP state and number of processes); service indicator data (such as runtime indicators of services used by the system, like Nginx, Tomcat, PHP, MySQL and Redis); and business indicator data generated by specific business scenarios (for example, how many orders an e-commerce website receives per minute). Because the present invention improves on the component framework of the Prometheus system, the range of monitored indicator data is similar to the range handled by the Prometheus system. Indicator data in the Prometheus system has the following format:

metric{tagk="tagv",tagk1="tagv1",...} value

where metric is the identifier of the monitored indicator; tagk and tagk1 are the names of parameters (labels) associated with the monitored indicator; tagv and tagv1 are the values of tagk and tagk1; and value is the value of the metric.
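To make the layout concrete, the sketch below models one indicator as a small data class and renders it in the Prometheus text exposition style. Label values are quoted and backslash, double quote and newline are escaped. The class name `Sample` and its fields are illustrative assumptions. Special float values (NaN, infinities) are not normalized to Prometheus spelling here.

```python
from dataclasses import dataclass, field


def _escape(value):
    """Escape a label value the way the Prometheus text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


@dataclass
class Sample:
    """One indicator in the metric{tagk="tagv",...} value layout."""
    metric: str                                  # identifier of the monitored indicator
    labels: dict = field(default_factory=dict)   # tagk -> tagv
    value: float = 0.0

    def render(self):
        v = float(self.value)
        number = str(int(v)) if v.is_integer() else repr(v)  # "200" rather than "200.0"
        if not self.labels:
            return f"{self.metric} {number}"
        body = ",".join(f'{key}="{_escape(val)}"' for key, val in self.labels.items())
        return f"{self.metric}{{{body}}} {number}"


print(Sample("query_info_12345", {"status": "failed", "cluster": "AAAA"}, 200).render())
# prints: query_info_12345{status="failed",cluster="AAAA"} 200
```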

Further, the global business label corresponding to the indicator data is determined according to the category of the indicator data. The category distinguishes indicator data by source, business type and so on. For example, indicator data from different clusters, from different network address ranges, or from different businesses (such as logistics and e-commerce) is assigned to different categories. The global business label is then determined from the category; for example, the global business label for indicator data originating from cluster 1 may be serviceId="cluster11111". The present invention does not limit the specific categories of indicator data or the specific content of the global business labels.
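Category-based label assignment can be read as an ordered rule table. The sketch below checks a sample's source metadata against rules for cluster, network address range and business type, and returns the first matching global business label. Only `serviceId="cluster11111"` comes from the text. The other rules, the `unclassified` fallback and the function name `global_label_for` are hypothetical.

```python
import ipaddress

# Hypothetical classification rules, checked in order; the first match wins.
RULES = [
    ("cluster", "cluster1", {"serviceId": "cluster11111"}),
    ("network", "10.1.0.0/16", {"serviceId": "net-10-1"}),
    ("business", "logistics", {"serviceId": "biz-logistics"}),
    ("business", "ecommerce", {"serviceId": "biz-ecommerce"}),
]


def global_label_for(source):
    """Pick the global business label for a sample's source metadata dict."""
    for kind, key, label in RULES:
        if kind == "network" and "ip" in source:
            # Network rules match when the source IP lies inside the configured range.
            if ipaddress.ip_address(source["ip"]) in ipaddress.ip_network(key):
                return dict(label)
        elif kind in source and source[kind] == key:
            return dict(label)
    return {"serviceId": "unclassified"}


print(global_label_for({"cluster": "cluster1"}))  # prints: {'serviceId': 'cluster11111'}
```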

Still further, the global business label is added to each item of the indicator data. For example, received indicator data may be identified as finishJobAvgTime, and that identifier may originate from different services such as HBase, Hadoop or Spark. Because the global business label is assigned by category, the sources of collected indicator data can be distinguished even when their identifiers are identical. The global business label therefore makes each received indicator unique and allows indicator data to be correlated, overcoming the low data correlation caused by non-standardized indicator labels. For example, the global business label serviceID="123456abcdef" may be set for indicator data of the HBase service and serviceID="789010abcdef" for indicator data of the Hadoop service, and each label is added to every item in the corresponding batch of indicator data to form target indicator data. These steps solve the problem of poor data correlation caused by inconsistent indicator labels. Preferably, the global business label is determined from the category of the indicator data based on a global business label received with it, or a received global business label is converted into one that matches a user-defined global business label configuration rule. In addition, if the client sends other business labels together with the global business label, the server may also add those labels to every item in the batch of indicator data.
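The enrichment step can be sketched as a pure function over a batch. Each sample is assumed to be a dict with `metric`, `labels` and `value` keys, a hypothetical shape. The global label and any extra client-supplied labels are merged into copies of the samples. Applying the global label last, so a client cannot override it, is a precedence choice of this sketch rather than something the patent specifies.

```python
def enrich_batch(batch, global_label, extra_labels=None):
    """Return copies of each sample with the global (and any extra) labels added.

    Each sample is a dict {"metric": str, "labels": dict, "value": float}.
    Global labels are applied last so client-supplied labels cannot override them.
    """
    extra_labels = extra_labels or {}
    enriched = []
    for sample in batch:
        labels = {**sample["labels"], **extra_labels, **global_label}
        enriched.append({**sample, "labels": labels})   # original sample is untouched
    return enriched


# Two indicators with the same identifier become distinguishable by source.
hbase = enrich_batch([{"metric": "finishJobAvgTime", "labels": {}, "value": 12.5}],
                     {"serviceID": "123456abcdef"})
hadoop = enrich_batch([{"metric": "finishJobAvgTime", "labels": {}, "value": 30.0}],
                      {"serviceID": "789010abcdef"})
print(hbase[0]["labels"], hadoop[0]["labels"])
```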

Further, the target indicator data is stored in a message queue. The existing Prometheus system pulls data from target network units or data gateways at a set period and cannot handle highly concurrent data collection. A stress test produced the following example figures: 40,000 data processing units generated about 219 GB of related data per day, or 15 billion indicators per day, which is about 173,000 indicators per second (15,000,000,000 / 86,400 s ≈ 173,600 per second). Unscheduled massive data of this kind may reduce the stability of the Prometheus server. Therefore, before massive indicator data is sent to the Prometheus server, it is passed through a message queue (for example, Kafka), whose buffering mechanism solves the concurrent-processing problem for massive data. For example, in one embodiment of the present invention, the data gateway pushes received indicator data to the message queue under separate topics (for example, topic1 for HBase-related indicator data and topic2 for Hadoop-related indicator data). Setting separate topics isolates the reads and writes of different businesses within the message queue (Kafka). Preferably, the Prometheus server pulls indicator data from the message queue at a set period, or the message queue pushes the indicator data to the Prometheus server at a set period. This partially solves the task blocking and poor server stability caused by pushing massive data directly to the Prometheus server.
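The buffering idea can be illustrated without a broker. The sketch below uses one bounded `queue.Queue` per topic as an in-process stand-in for a message queue such as Kafka. It is an assumption for illustration, not a Kafka client. Producers publish under a per-service topic (`topic1`/`topic2` as in the example above), and the consumer on the Prometheus side drains a bounded batch per periodic poll. The class name `TopicQueue` and the `TOPIC_FOR_SERVICE` table are hypothetical.

```python
import queue

# Topic per service, following the topic1/topic2 example in the text.
TOPIC_FOR_SERVICE = {"hbase": "topic1", "hadoop": "topic2"}


class TopicQueue:
    """In-process stand-in for a topic-partitioned message queue (hypothetical)."""

    def __init__(self, maxsize=100_000):
        self._topics = {}
        self._maxsize = maxsize

    def publish(self, topic, sample):
        """Buffer a sample under its topic; put() blocks when the topic is full."""
        q = self._topics.get(topic)
        if q is None:
            q = self._topics[topic] = queue.Queue(self._maxsize)
        q.put(sample)

    def poll(self, topic, max_records):
        """Drain up to max_records samples from one topic without blocking."""
        q = self._topics.get(topic)
        records = []
        while q is not None and len(records) < max_records:
            try:
                records.append(q.get_nowait())
            except queue.Empty:
                break
        return records


mq = TopicQueue()
for i in range(5):
    mq.publish(TOPIC_FOR_SERVICE["hbase"], {"metric": "finishJobAvgTime", "value": i})
mq.publish(TOPIC_FOR_SERVICE["hadoop"], {"metric": "finishJobAvgTime", "value": 30.0})

first = mq.poll("topic1", max_records=3)    # one periodic pull by the consumer
second = mq.poll("topic1", max_records=3)   # the next pull gets the remainder
other = mq.poll("topic2", max_records=3)    # topics are isolated from each other
print(len(first), len(second), len(other))  # prints: 3 2 1
```

Capping each poll at `max_records` is what keeps a burst of producer traffic from reaching the consumer all at once; the excess simply waits in the buffer until the next period.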

Still further, indicator data in a second format is converted into a first format according to the format rules of the first format. Specifically, the first format is a data format supported by the existing Prometheus system. The second format is any data format that differs from the one supported by the existing Prometheus system but can be converted into the first format. The present invention does not limit the specific second format.

The indicator data may use the data format defined by the Prometheus system (that is, the first format). An example of the first format is:

metric{tagk="tagv",tagk1="tagv1",...} value

where metric is the identifier of the monitored indicator; tagk and tagk1 are the names of parameters associated with the monitored indicator; tagv and tagv1 are the values of tagk and tagk1; and value is the value of the metric. The received indicator data may also be in the second format. Taking the data format defined by the OpenTSDB database as the second format, an example is:

[Figure: example indicator data in the second (OpenTSDB) format; image not reproduced]

Here, metric is the identifier of the monitored indicator; in the example above, it is "query_info_12345". The tags field contains the names and values of the parameters associated with the indicator. For example, parameter 1 is "status" with the value "failed".

The indicator data in the second format is converted into the first format according to the first format's rules. For example, the second-format sample above becomes:

query_info_12345{status="failed",cluster="AAAA"} 200

Following the first format's rules, received second-format indicator data is mapped into the first format according to its tags and content. The first-format data can then be further monitored and analyzed. This broadens the range of indicator data that can be monitored, and unifying the data format reduces the complexity of data processing.
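The conversion rule can be sketched as below. This is a minimal sketch that assumes the second format is the JSON data-point shape accepted by OpenTSDB's `/api/put` endpoint (`metric`, `timestamp`, `value`, `tags`); the patent's exact sample is in a figure that is not reproduced. Names outside Prometheus's allowed character set (for example OpenTSDB's dotted metric names) are rewritten with `_`, and label values are escaped as the Prometheus text format requires.

```python
import json
import re

_BAD_METRIC_CHARS = re.compile(r"[^a-zA-Z0-9_:]")  # Prometheus metric names
_BAD_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")    # Prometheus label names


def _sanitize(name, pattern):
    """Replace characters Prometheus does not allow (e.g. OpenTSDB's dots) with '_'."""
    name = pattern.sub("_", name)
    return "_" + name if name[:1].isdigit() else name


def _escape(value):
    """Escape a label value as the Prometheus text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def opentsdb_to_prometheus(point):
    """Convert one OpenTSDB-style data point (dict) into a first-format line."""
    name = _sanitize(point["metric"], _BAD_METRIC_CHARS)
    labels = ",".join(
        f'{_sanitize(k, _BAD_LABEL_CHARS)}="{_escape(v)}"'
        for k, v in point.get("tags", {}).items()
    )
    return f"{name}{{{labels}}} {point['value']}" if labels else f"{name} {point['value']}"


if __name__ == "__main__":
    raw = ('{"metric": "query_info_12345", "timestamp": 1598918400, '
           '"value": 200, "tags": {"status": "failed", "cluster": "AAAA"}}')
    print(opentsdb_to_prometheus(json.loads(raw)))
    # query_info_12345{status="failed",cluster="AAAA"} 200
```

The timestamp is dropped here so that the scraping side assigns its own. Keeping it would require converting OpenTSDB seconds to the milliseconds used by the Prometheus text format.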

Step S102: The Prometheus server performs the following:

- obtains the target indicator data from the message queue;
- monitors the target indicator data according to a set monitoring policy;
- stores the target indicator data in a time series database, thereby monitoring the storage of the indicator data;
- receives a query request and determines the time series database that corresponds to the target indicator data named in the request;
- obtains the target indicator data from that time series database, thereby monitoring the reading of the indicator data.

Specifically, the Prometheus server (i.e., the monitoring server) obtains the target indicator data from the message queue, preferably on a set period such as 30 seconds. The period is chosen according to:

- the specific business or scenario;
- the frequency and granularity at which the user needs monitoring data.

For indicator data with strict real-time requirements, a shorter period, such as 1 second, may be set.

The Prometheus server then monitors the received indicator data according to a set monitoring policy. The policy is defined for the monitored business scenario and indicators. It covers alert-triggering rules, thresholds, monitoring periods, and the indicator data to be monitored. Monitoring received indicator data according to such a policy is an existing capability of the Prometheus server, and the present invention does not discuss it further.
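The sketch below shows the kind of threshold-and-duration check such a monitoring policy expresses. This is an illustration only: in practice the policy is configured as Prometheus alerting rules rather than reimplemented. The `AlertRule` fields and the metric name `jvm_memory_used_ratio` are hypothetical. The `consecutive` field loosely mirrors the "for" duration of a Prometheus alerting rule, counted here in collection periods.

```python
import operator
from dataclasses import dataclass

_COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class AlertRule:
    metric: str            # monitored indicator
    op: str                # comparison against the threshold
    threshold: float
    consecutive: int = 1   # periods the condition must hold, similar to a "for" clause


def should_fire(rule, samples):
    """Return True once `samples` (one value per collection period) breach the
    threshold for `rule.consecutive` periods in a row."""
    compare = _COMPARATORS[rule.op]
    streak = 0
    for value in samples:
        streak = streak + 1 if compare(value, rule.threshold) else 0
        if streak >= rule.consecutive:
            return True
    return False


if __name__ == "__main__":
    rule = AlertRule("jvm_memory_used_ratio", ">", 0.9, consecutive=2)
    print(should_fire(rule, [0.95, 0.50, 0.92, 0.93]))  # True
```

Requiring several consecutive breaches suppresses alerts caused by a single noisy sample. This is one reason the collection period and the alert duration are usually tuned together.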

Further, as described in step S101, the received data is processed into target indicator data of a consistent format. The Prometheus server stores this data in a time series database, thereby monitoring the storage of the indicator data.

The HBase-based time series database OpenTSDB is used as the time series database. Indicator data may be stored across multiple time series databases according to its category, and each of these may be a third-party database. Storing data in multiple third-party time series databases removes the capacity limit of local storage. This step also allows the storage of indicator data across these databases to be monitored.

Further, the time series database (OpenTSDB) does not support storing indicator data with non-numeric values. Such values are therefore converted into numbers according to a predefined correspondence between non-numeric values and numbers. For example, the service-state value "active" corresponds to 1 and "standby" corresponds to 0.
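A minimal sketch of the predefined correspondence follows. Only the `"active"` → 1 and `"standby"` → 0 entries come from the text. The case-insensitive lookup and the choice to reject unmapped strings are assumptions. Rejecting avoids silently writing a wrong number to OpenTSDB.

```python
# Predefined correspondence between non-numeric values and numbers.
# "active" -> 1 and "standby" -> 0 come from the example above.
STATE_TO_NUMBER = {"active": 1, "standby": 0}


def to_numeric(value, mapping=STATE_TO_NUMBER):
    """Return a value OpenTSDB can store, mapping known non-numeric strings."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, (int, float)):
        return value
    try:
        return float(value)          # numeric strings such as "3.5"
    except (TypeError, ValueError):
        pass
    key = str(value).strip().lower()  # assumption: matching ignores case/whitespace
    if key not in mapping:
        raise ValueError(f"no numeric mapping defined for {value!r}")
    return mapping[key]
```

For example, `to_numeric("active")` returns `1`, while an unmapped state such as `"unknown"` raises `ValueError`. A deployment could instead map it to a sentinel number.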

Further, a query request is received and the time series database corresponding to the target indicator data named in the request is determined. The target indicator data is then obtained from that database, thereby monitoring the reading of the indicator data.

Compared with the existing Prometheus system, this approach lets the Prometheus server monitor reads from the target time series database and fetch indicator data from it on request. This increases the liquidity and utilization of massive data and makes it easier to analyze the indicator data and generate analysis information. Having the Prometheus server read indicator data directly from the time series database also improves the efficiency of monitoring and managing third-party time series databases. It overcomes the prior-art limitation that the Prometheus server cannot read data from a third-party database.
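Selecting the time series database for a request can be sketched as a category lookup shared by the write and read paths. The base URLs and the rule that a metric's category is its name prefix are hypothetical. The request body follows the JSON shape of OpenTSDB's `/api/query` endpoint (`start`, optional `end`, and `queries` entries with `aggregator` and `metric`).

```python
import json

# Hypothetical: one OpenTSDB instance per indicator category.
TSDB_BY_CATEGORY = {
    "hbase": "http://tsdb-hbase.example:4242",
    "hadoop": "http://tsdb-hadoop.example:4242",
}


def category_of(metric_name):
    """Assumption: the category is the metric-name prefix before the first '_'."""
    return metric_name.split("_", 1)[0]


def resolve_tsdb(metric_name):
    """Return the base URL of the time series database holding this metric.
    The same lookup serves both the write path and the read (query) path."""
    category = category_of(metric_name)
    if category not in TSDB_BY_CATEGORY:
        raise LookupError(f"no time series database configured for {category!r}")
    return TSDB_BY_CATEGORY[category]


def build_query(metric_name, start, end=None, aggregator="sum"):
    """Build the URL and JSON body of an OpenTSDB /api/query request."""
    body = {"start": start, "queries": [{"aggregator": aggregator, "metric": metric_name}]}
    if end is not None:
        body["end"] = end
    return resolve_tsdb(metric_name) + "/api/query", json.dumps(body)
```

Because writes and queries resolve through the same function, a metric is always read back from the database it was written to.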

The query syntax of the time series database (OpenTSDB) differs in part from that of the Prometheus system. Prometheus queries support four label-matching operators: "=", "!=", "=~", and "!~". OpenTSDB's query syntax does not contain these operators. The query request for the target indicator data is therefore converted according to the syntax rules of the time series database:

- "=" and "!=" can be converted into the OpenTSDB filter type "literal_or".
- The regular-expression (fuzzy) matching operators "=~" and "!~" have no corresponding filter in OpenTSDB. For these, preferably, the results matching the fuzzy condition are obtained first. They are then combined with the OR operator into a query that follows OpenTSDB's syntax rules, and that query is executed.

After the Prometheus server obtains the target indicator data from OpenTSDB, any further computation and processing is performed in the Prometheus server.
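The operator conversion can be sketched as follows, under one stated assumption: the set of tag values already present in OpenTSDB is available from a preliminary lookup, which plays the role of the "returned results" in the text. Every matcher is then reduced to the values it selects and emitted as an OpenTSDB `literal_or` filter, whose `filter` string joins alternatives with `|`. The input shape `(tag_key, op, value)` is hypothetical.

```python
import re


def matchers_to_filters(matchers, known_tag_values):
    """Translate Prometheus label matchers into OpenTSDB 'literal_or' filters.

    matchers: list of (tag_key, op, value), op one of "=", "!=", "=~", "!~".
    known_tag_values: tag_key -> list of values already stored in OpenTSDB,
        assumed to come from a preliminary lookup (the "returned results").
    """
    filters = []
    for tagk, op, value in matchers:
        candidates = known_tag_values.get(tagk, [])
        if op == "=":
            selected = [value]
        elif op == "!=":
            selected = [v for v in candidates if v != value]
        elif op in ("=~", "!~"):
            pattern = re.compile(value)  # Prometheus regex matchers are fully anchored
            hits = [v for v in candidates if pattern.fullmatch(v)]
            selected = hits if op == "=~" else [v for v in candidates if v not in hits]
        else:
            raise ValueError(f"unsupported operator {op!r}")
        filters.append({
            "type": "literal_or",
            "tagk": tagk,
            "filter": "|".join(selected),  # '|' joins alternatives: the OR step
            "groupBy": False,
        })
    return filters
```

If a matcher selects no values, the resulting filter string is empty. A caller can answer such a query with no data instead of sending it to OpenTSDB; that case is not handled here.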

Further, the indicator data is written to and read from the target database based on a remote procedure call model. The monitoring server (Prometheus server) reads from and writes to the time series database (for example, OpenTSDB) through a remote procedure call framework (for example, gRPC), and it monitors these operations. gRPC defines its interfaces with a structured-data serialization method (for example, Protocol Buffers). This method serializes data into a compact binary encoding, which reduces the amount of data transmitted and improves transmission performance.

As shown in FIG. 2, an embodiment of the present invention provides a method for a client to collect indicator data. The method may include the following steps:

Step S201: Collect indicator data. Then, on a set period, send the indicator data and the configured global service label to the configured network address.

Here, the client is the server, computer, or other device used to collect indicator data; the present invention does not limit the specific device. The client can collect indicator data in either of the following two ways.

The first method: collect indicator data with a data collection software package.

Specifically, in one embodiment of the present invention, the client collects indicator data from Java applications using a data collection software package (for example, javaagent.jar). The client adds the following parameter to the Java virtual machine arguments:

-javaagent:{javaagent.jar}={IP:Port},labels={serviceId:abcd},file={a.yml},jobName={abc}

The parameters in this example are described in Table 1.

[Table 1 image not reproduced]

Table 1: Parameters for collecting indicator data from Java applications
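A Java agent receives everything after `javaagent.jar=` as the single `agentArgs` string of its `premain(String agentArgs, Instrumentation inst)` method. The agent therefore has to split that string itself. The sketch below shows such parsing logic in Python for illustration. The braces in the text are read as placeholders. Two parts are assumptions not stated in the patent: the concrete syntax, and the `;` separator for several labels.

```python
def parse_agent_args(args):
    """Parse a hypothetical agent argument string such as
    '10.0.0.8:9091,labels=serviceId:abcd,file=a.yml,jobName=abc'."""
    parts = [p.strip() for p in args.split(",") if p.strip()]
    if not parts or "=" in parts[0]:
        raise ValueError("the first argument must be the data gateway IP:Port")
    host, _, port = parts[0].rpartition(":")
    config = {"gateway": (host, int(port)), "labels": {}, "file": None, "jobName": None}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        if key == "labels":
            # Assumption: several labels are separated by ';', each as key:value.
            for pair in value.split(";"):
                k, _, v = pair.partition(":")
                config["labels"][k] = v
        elif key in ("file", "jobName"):
            config[key] = value
        else:
            raise ValueError(f"unknown agent argument {key!r}")
    return config
```

For the example string, the result names gateway `("10.0.0.8", 9091)`, the global label `{"serviceId": "abcd"}`, configuration file `a.yml`, and job name `abc`.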

Specifically, the data collection software package provides two functions:

- adding global service labels through startup parameters;
- pushing indicator data directly to the data gateway at a configured network address.

The client configures its global service labels and dynamically configures the data gateway address. After the application starts, it scans and collects data using Java's probe (agent instrumentation) technology. It then sends the collected indicator data over HTTP to the configured data gateway. The data gateway pushes the received indicator data to the message queue (Kafka) according to topic information, and the Prometheus server can obtain the indicator data from the message queue.

Further, a client may need to report indicator data for custom indicators based on its own business logic. It can do so with the registration method provided by the data collection software package (javaagent.jar). For example, a custom indicator can be registered with metricRegistry.register(name, metricType), where metricRegistry.register is the registration method and metricType is the indicator type. Five types are available:

- Gauge records the instantaneous value of an indicator, such as the service's current Java virtual machine usage, including memory utilization, CPU utilization, and thread states.
- Counter accumulates a value through increment and decrement operations, for example the total number of tasks submitted in a cluster.
- Meter measures the rate at which events occur, for example network traffic over the last 1, 5, and 15 minutes, and is used for aggregate calculations.
- Timer measures distributions, for example the request rate and latency of an interface.
- Histogram records the distribution of indicator values, for example the minimum, maximum, mean, median, 75th percentile, and 90th percentile.

The client can define an indicator and place it where its code throws an exception. When the exception occurs, the exception-handling code sends the corresponding indicator data to the data gateway. In this way, the client calls the registration method to add a custom indicator and uses the data collection software package to collect the indicator data for it.
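The registration pattern can be illustrated with a small Python analog. The type names echo the Counter and Histogram types listed above, and `register(name, metric)` echoes the registration call. This sketch is not the javaagent.jar API. Meter, Timer, and Gauge are omitted, and the nearest-rank percentile method is an assumption. The service and metric names are hypothetical.

```python
import math
import statistics


class Counter:
    """Cumulative value changed by increment/decrement operations."""
    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Histogram:
    """Keeps every recorded value and summarizes their distribution on demand."""
    def __init__(self):
        self._values = []

    def update(self, value):
        self._values.append(value)

    def snapshot(self):
        ordered = sorted(self._values)
        if not ordered:
            raise ValueError("histogram is empty")

        def percentile(q):
            # Nearest-rank percentile (an assumption; real libraries may interpolate).
            return ordered[max(0, math.ceil(q * len(ordered)) - 1)]

        return {
            "min": ordered[0], "max": ordered[-1],
            "mean": statistics.mean(ordered), "median": statistics.median(ordered),
            "p75": percentile(0.75), "p90": percentile(0.90),
        }


class MetricRegistry:
    """Name -> metric map; register() mirrors the register(name, metric) idea."""
    def __init__(self):
        self._metrics = {}

    def register(self, name, metric):
        if name in self._metrics:
            raise ValueError(f"metric {name!r} is already registered")
        self._metrics[name] = metric
        return metric


registry = MetricRegistry()
order_errors = registry.register("order_service_exceptions", Counter())


def place_order(quantity):
    try:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        return "accepted"
    except ValueError:
        order_errors.inc()  # custom indicator updated where the exception is thrown
        raise
```

Updating the counter inside the `except` block and then re-raising keeps the business code's error handling unchanged. The indicator only observes the failure.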

The second method: collect indicator data with an indicator data collection script.

Specifically, for applications developed in languages other than Java, the client can collect indicator data with an indicator data collection script. The script formats custom indicators in the data format set by the data gateway. It then sends the indicator data and the configured global service label over HTTP to the configured network address (for example, the network address of the data gateway). The script may be written in Python, Perl, or another language; the present invention does not limit its content or implementation.
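A collection script of this kind might look like the sketch below. It assumes that the gateway's data format is the Prometheus text format. It also assumes a hypothetical `/metrics` endpoint on the gateway. The patent leaves both to the gateway's configuration. Label values are assumed to need no escaping.

```python
import time
import urllib.request


def format_payload(metrics, global_labels):
    """Render (name, value, labels) tuples as text-format lines; the global
    service labels are added to every metric and win on a key clash.
    Label values are assumed to need no escaping."""
    lines = []
    for name, value, labels in metrics:
        merged = {**labels, **global_labels}
        label_str = ",".join(f'{k}="{v}"' for k, v in merged.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


def build_request(gateway, metrics, global_labels):
    """Build, without sending, an HTTP POST of the payload to the data gateway."""
    return urllib.request.Request(
        f"http://{gateway}/metrics",  # hypothetical gateway endpoint path
        data=format_payload(metrics, global_labels).encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="POST",
    )


def run_forever(gateway, collect, global_labels, period_s=30):
    """Collect and push once per set period; `collect` returns the metric tuples."""
    while True:
        request = build_request(gateway, collect(), global_labels)
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()
        time.sleep(period_s)
```

Separating payload formatting and request building from the send loop keeps the formatting testable without a running gateway.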

As shown in FIG. 3, an embodiment of the present invention provides a flowchart for monitoring data. The method may include the following steps:

Step S301: The client collects indicator data.

Specifically, the client collects indicator data with a data collection software package or a data collection script, as described in step S201; details are not repeated here. On a set period, the client sends the indicator data and the configured global service label to the configured network address.

Either collection method may be used:

- With the data collection software package, the package's registration method can also add custom indicators and collect their indicator data.
- With an indicator data collection script, the script collects the indicator data corresponding to the service identifier.

Step S302: The data gateway receives the indicator data from the client.

Specifically, the data gateway receives the indicator data and determines the global service label for it according to its category. It adds that label to each item of indicator data, forming target indicator data, which is stored in the message queue.

Receiving and processing the indicator data are as described in step S101 and are not repeated here.
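The gateway's labeling step can be sketched as a category lookup followed by a label merge. The category names, the label key `globalService`, and its values are hypothetical. Unknown categories are rejected rather than guessed. This is a design choice: an unlabeled record would recreate the data-silo problem the global label is meant to solve.

```python
# Hypothetical mapping from indicator category to its global service label.
GLOBAL_LABEL_BY_CATEGORY = {
    "hbase": {"globalService": "storage-hbase"},
    "hadoop": {"globalService": "compute-hadoop"},
}


def attach_global_labels(indicators):
    """Return new records with the category's global service label merged into
    each record's labels; unknown categories are rejected rather than guessed."""
    tagged = []
    for record in indicators:
        category = record["category"]
        if category not in GLOBAL_LABEL_BY_CATEGORY:
            raise KeyError(f"no global service label for category {category!r}")
        labels = {**record.get("labels", {}), **GLOBAL_LABEL_BY_CATEGORY[category]}
        tagged.append({**record, "labels": labels})
    return tagged
```

New dictionaries are built instead of mutating the input. The same received batch can therefore be retried safely if pushing to the message queue fails.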

Step S303: The data gateway puts the target indicator data into the message queue.

Specifically, the data gateway puts the target indicator data into the message queue as described in step S101; details are not repeated here.

Steps S304 and S305: The Prometheus server performs the following:

- obtains the target indicator data from the message queue;
- monitors the target indicator data according to a set monitoring policy;
- stores the target indicator data in a time series database, thereby monitoring the storage of the indicator data;
- receives a query request and determines the time series database that corresponds to the target indicator data named in the request;
- obtains the target indicator data from that time series database, thereby monitoring the reading of the indicator data.

The Prometheus server obtains data from the message queue and monitors storage in and reads from the time series database as described in step S102; details are not repeated here.

FIG. 4 is a schematic diagram of an existing Prometheus system.

FIG. 5 is a schematic diagram of an improved Prometheus system provided by an embodiment of the present invention.

Embodiments of the present invention are described below by comparing FIG. 4 and FIG. 5.

1) In the existing Prometheus system, indicator data is stored locally, so storing massive data runs into capacity limits. Indicator data can also be stored in a third-party database (the TSDB shown in FIG. 4). However, the interaction with that database is one-way and supports writes only. The indicator data stored there is therefore static, with poor liquidity and low utilization. This leads to missing data correlations and possible data format incompatibilities.

In contrast to FIG. 4, the present invention, as shown in FIG. 5, enables the Prometheus server both to write to and read from the time series database and to monitor these writes and reads. This supports monitoring, storing, and reading indicator data across multiple third-party time series databases.

For example, the Prometheus server used for monitoring stores indicator data in, and reads it from, a third-party database while monitoring these operations. In FIG. 5 that database is the TSDB, i.e., the OpenTSDB described in the present invention. Reading and writing third-party databases through the Prometheus server's unified interface brings three benefits:

- monitoring indicator data becomes more efficient;
- the liquidity and utilization of indicator data improve;
- the Prometheus performance problems caused by storing massive data on the local hard disk are resolved.

That is, the Prometheus server performs the following:

- obtains the target indicator data from the message queue;
- monitors it according to a set monitoring policy;
- stores it in a time series database, thereby monitoring the storage of the indicator data;
- receives a query request and determines the time series database that corresponds to the target indicator data named in the request;
- obtains the target indicator data from that database, thereby monitoring the reading of the indicator data.

2) As shown in FIG. 4, in the existing Prometheus system, Prometheus periodically pulls indicator data from statically configured monitoring targets (targets) or from the data gateway (Pushgateway). It can also obtain indicator data from the container platform Kubernetes (k8s) through service discovery. Data collection is thus complex and varied. The labels attached to collected indicator data may not follow a consistent convention, producing data silos (i.e., data that lacks correlation) that hamper later correlation analysis. FIG. 4 also shows that the k8s machines are tightly coupled with the monitoring service.

As shown in FIG. 5, the Prometheus server can periodically obtain the data in the message queue (i.e., the target indicator data). Compared with the existing Prometheus system, the many collection paths are replaced by a single path from the message queue (Kafka).

The target indicator data in the message queue (Kafka) is produced by the data gateway (Pushgateway) processing the received indicator data. The data gateway determines a global service label according to the category of the received indicator data and adds it to the indicator data. With this technical solution, indicator data is obtained uniformly, and its correlation and uniqueness are established. The monitoring service is also decoupled from k8s, i.e., indicator data is not collected directly from k8s.

That is, indicator data is received, and the global service label for it is determined according to its category. The label is added to each item of indicator data, forming target indicator data, which is stored in the message queue.

3) As described in 2), one embodiment of the present invention handles the collection of highly concurrent, massive data with a message queue (Kafka), a component added to the existing Prometheus system. The queue's buffering mechanism partially relieves the pressure that high-throughput, low-latency demands place on the monitoring server. It also provides good fault tolerance under high concurrency.

As shown in FIG. 6, an embodiment of the present invention provides an apparatus 600 for monitoring data, including a data processing module 601 and a data reading and writing module 602, where:

the data processing module 601 is configured to receive indicator data and determine the global service label for it according to its category. It adds the label to each item of indicator data and forms target indicator data, which is stored in a message queue;

the data reading and writing module 602 is configured to use the Prometheus server to:

- obtain the target indicator data from the message queue;
- monitor it according to a set monitoring policy;
- store it in a time series database, thereby monitoring the storage of the indicator data;
- receive a query request and determine the time series database that corresponds to the target indicator data named in the request;
- obtain the target indicator data from that database, thereby monitoring the reading of the indicator data.

Optionally, the data processing module 601 is further configured to convert indicator data in a second format into a first format according to the format rules of the first format.

Optionally, the data processing module 601 is further configured to convert a non-numeric value in the indicator data into the corresponding number, according to a predefined correspondence between non-numeric values and numbers.

Optionally, the data reading and writing module 602 is further configured to obtain the query request for the target indicator data and convert the operators it contains according to the syntax rules of the time series database.

Optionally, the data reading and writing module 602 is further configured to write the indicator data to, and read it from, the time series database based on a remote procedure call model.

As shown in FIG. 7, an embodiment of the present invention provides an apparatus 700 for monitoring data, including a data collection module 701, where:

the data collection module 701 is configured to collect indicator data and, on a set period, send the indicator data and the configured global service label to the configured network address.

Optionally, the data collection module 701 is further configured to collect indicator data with a data collection software package.

Optionally, the data collection module 701 is further configured to add a custom indicator using the registration method provided by the indicator data collection software package, and to collect the indicator data for that custom indicator with the package.

Optionally, the data collection module 701 is further configured to collect indicator data with an indicator data collection script.

As shown in FIG. 8, an embodiment of the present invention provides a system for monitoring data, including the apparatus for monitoring data shown in FIG. 6 and the apparatus for monitoring data shown in FIG. 7.

An embodiment of the present invention further provides an electronic device for monitoring data. The device includes one or more processors and a storage device storing one or more programs. When executed by the one or more processors, the programs cause the processors to implement the method provided by any of the foregoing embodiments.

An embodiment of the present invention further provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method provided by any of the foregoing embodiments.

FIG. 9 shows an exemplary system architecture 900 to which the method or apparatus for monitoring data of embodiments of the present invention may be applied.

As shown in FIG. 9, the system architecture 900 may include terminal devices 901, 902, and 903, a network 904, and a server 905. The network 904 is the medium providing communication links between the terminal devices 901, 902, 903 and the server 905. It may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user may use the terminal devices 901, 902, 903 to interact with the server 905 through the network 904, for example to receive or send messages. Various client applications may be installed on the terminal devices 901, 902, 903, such as web browsers, search applications, instant messaging tools, and email clients.

The terminal devices 901, 902, 903 may be any of various electronic devices that have a display screen and support web browsing. These include, but are not limited to, servers, smartphones, tablet computers, laptop computers, and desktop computers.

The server 905 may be a server that provides various services. One example is a back-end management server that supports the data monitoring requests users make through the terminal devices 901, 902, 903. The back-end management server may process received data such as data monitoring requests, store the received indicator data, and return the results of indicator data analysis to the terminal devices.

It should be noted that the method for monitoring data provided by embodiments of the present invention is generally executed by the server 905. Accordingly, the apparatus for monitoring data is generally disposed in the server 905.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 9 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.

FIG. 10 is a schematic structural diagram of a computer system 1000 suitable for implementing a terminal device according to an embodiment of the present invention. The terminal device shown in FIG. 10 is merely an example. It should not impose any limitation on the functions or scope of use of embodiments of the present invention.

As shown in FIG. 10, the computer system 1000 includes a central processing unit (CPU) 1001. The CPU 1001 can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage section 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores the various programs and data required for the operation of the system 1000. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another through a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

The following components are connected to the I/O interface 1005:

- an input section 1006, including a keyboard, a mouse, and the like;
- an output section 1007, including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like;
- a storage section 1008, including a hard disk and the like; and
- a communication section 1009, including a network interface card such as a LAN card or a modem.

The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed. A computer program read from the medium can then be installed into the storage section 1008 as needed.

In particular, according to the disclosed embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium. The computer program contains program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed by the central processing unit (CPU) 1001, it performs the functions defined above in the system of the present invention.

It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.

A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.

In the present invention, a computer-readable signal medium may include a data signal that carries computer-readable program code and is propagated in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A computer-readable signal medium may also be any computer-readable medium, other than a computer-readable storage medium, that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of these.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions.

It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in a different order from the one shown in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. Each block in the block diagrams or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations. Alternatively, it may be implemented by a combination of dedicated hardware and computer instructions.

The modules and/or units involved in embodiments of the present invention may be implemented in software or in hardware. The described modules and/or units may also be provided in a processor. For example, a processor may be described as including a data processing module, a data reading and writing module, and a data collection module. In some cases, the name of a module does not limit the module itself. For example, the data processing module may also be described as "a module that receives indicator data, converts the data format of the indicator data into a first format, and adds a global business tag to the indicator data".

In another aspect, the present invention further provides a computer-readable medium. The medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into the device. The computer-readable medium carries one or more programs that, when executed by the device, cause the device to:

- receive indicator data, determine the global business tag corresponding to the indicator data according to the category of the indicator data, and add the global business tag to each piece of indicator data to form target indicator data, which is stored in a message queue;
- use a Prometheus server to obtain the target indicator data from the message queue, monitor it according to a set monitoring policy, and store it in a time series database, so as to monitor the storage of the indicator data;
- receive a query request, and determine the time series database corresponding to the target indicator data in the query request; and
- obtain the target indicator data from that time series database through the Prometheus server, so as to monitor the reading of the indicator data.

Alternatively, the programs cause the device to collect indicator data and send the indicator data and the configured global business tag to the configured network address at a set period.
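The server-side flow carried by the medium (tagging, queuing, storing, and querying) can be sketched in-process with Python's standard `queue` module. Everything here is an assumption for illustration:

- `drain_and_store` only stands in for the Prometheus server;
- a threshold check stands in for the "set monitoring policy"; and
- Python lists stand in for the routed time series databases.

```python
import queue
from collections import defaultdict
from typing import Dict, List, Tuple

# (name, global business tags, value, timestamp)
Sample = Tuple[str, Dict[str, str], float, float]

# Hypothetical category -> global business tag table.
TAGS_BY_CATEGORY: Dict[str, Dict[str, str]] = {
    "order": {"biz": "order"},
    "payment": {"biz": "payment"},
}
# Hypothetical routing table: which time series database serves which business.
TSDB_BY_BIZ: Dict[str, str] = {"order": "tsdb-a", "payment": "tsdb-b"}

message_queue: "queue.Queue[Sample]" = queue.Queue()
databases: Dict[str, List[Sample]] = defaultdict(list)


def receive(name: str, category: str, value: float, ts: float) -> None:
    """Determine the global business tag from the category, then enqueue the sample."""
    tags = dict(TAGS_BY_CATEGORY.get(category, {"biz": "unknown"}))
    message_queue.put((name, tags, value, ts))


def drain_and_store(max_value: float) -> List[Sample]:
    """Stand-in for the Prometheus server: pull from the queue, apply a toy
    threshold policy, and write each sample to its routed database."""
    alerts: List[Sample] = []
    while True:
        try:
            sample = message_queue.get_nowait()
        except queue.Empty:
            break
        if sample[2] > max_value:
            alerts.append(sample)
        databases[TSDB_BY_BIZ.get(sample[1]["biz"], "tsdb-default")].append(sample)
    return alerts


def query(name: str, biz: str) -> List[Sample]:
    """Resolve the database from the business tag, then read matching samples."""
    db = TSDB_BY_BIZ.get(biz, "tsdb-default")
    return [s for s in databases[db] if s[0] == name and s[1]["biz"] == biz]


# Usage
receive("order_created_total", "order", 5.0, 1.0)
receive("payment_failures_total", "payment", 120.0, 2.0)
alerts = drain_and_store(max_value=100.0)
print([a[0] for a in alerts])  # ['payment_failures_total']
print(query("order_created_total", "order"))
```

Reads and writes both go through the same `TSDB_BY_BIZ` table, which loosely mirrors the "unified interface" idea: the query side resolves the database from the tag exactly as the write side does.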

In the technical solutions of embodiments of the present invention, the server receives indicator data collected by clients. It then uses a unified interface to perform read and write operations on a third-party time series database, so that both the storage and the reading of the indicator data are monitored. This overcomes a defect of existing systems, which only write to third-party time series databases in one direction. It also improves monitoring efficiency and the flow and utilization of indicator data. Determining a global business tag solves the problem of poor correlation among collected indicator data caused by non-standard business tags. Placing received indicator data into a message queue addresses high concurrency with massive data volumes.

The specific embodiments above do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A method for monitoring data, applied to a Prometheus system, comprising the following steps:
receiving indicator data, determining a global business tag corresponding to the indicator data according to the category of the indicator data, adding the global business tag to each piece of indicator data to form target indicator data, and storing the target indicator data in a message queue;
obtaining the target indicator data from the message queue by using a Prometheus server, monitoring the target indicator data according to a set monitoring policy, and storing the target indicator data in a time series database, so as to monitor the storage of the indicator data; and
receiving a query request, and determining the time series database corresponding to the target indicator data in the query request; and obtaining the target indicator data from the time series database through a Prometheus server, so as to monitor the reading of the indicator data.
2. The method of claim 1, further comprising:
converting indicator data in a second format into a first format according to the format rules of the first format.
3. The method of claim 1, further comprising:
when the indicator data contains non-numerical values, converting the non-numerical values into corresponding numbers according to a predefined correspondence between non-numerical values and numerical values.
4. The method of claim 1, further comprising:
obtaining a query request for the target indicator data, and converting the operators contained in the query request according to the syntax rules of the time series database.
5. The method of claim 1, further comprising:
storing the indicator data in the time series database and reading the indicator data from the time series database based on a remote procedure call model.
6. A method of monitoring data, comprising:
collecting indicator data, and sending the indicator data and a configured global business tag to a configured network address at a set period.
7. The method of claim 6, further comprising:
collecting the indicator data by using a data collection software package.
8. The method of claim 7, further comprising:
adding a custom indicator by using a registration method contained in the indicator data collection software package, and collecting indicator data corresponding to the custom indicator by using the data collection software package.
9. The method of claim 6, further comprising:
collecting the indicator data by using an indicator data collection script.
10. An apparatus for monitoring data, applied to a Prometheus system, comprising a data processing module and a data reading and writing module, wherein:
the data processing module is configured to receive indicator data, determine a global business tag corresponding to the indicator data according to the category of the indicator data, add the global business tag and its corresponding content to each piece of indicator data to form target indicator data, and store the target indicator data in a message queue; and
the data reading and writing module is configured to: obtain the target indicator data from the message queue by using a Prometheus server, monitor the target indicator data according to a set monitoring policy, and store the target indicator data in a time series database, so as to monitor the storage of the indicator data; receive a query request, and determine the time series database corresponding to the target indicator data in the query request; and obtain the target indicator data from the time series database through a Prometheus server, so as to monitor the reading of the indicator data.
11. An apparatus for monitoring data, comprising a data collection module, wherein the data collection module is configured to collect indicator data and send the indicator data and a configured global business tag to a configured network address at a set period.
12. A system for monitoring data, comprising the apparatus for monitoring data of claim 10 and the apparatus for monitoring data of claim 11.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method of any one of claims 1-5 or 6-9.
14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5 or 6-9.
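The conversions in claims 3 and 4 can be sketched as two small functions. The state-to-number table and the SQL-like input operators are assumptions. PromQL is used as the example target grammar because its comparison operators are `==`/`!=` and its logical operators are lowercase `and`/`or`. Label matchers in braces and quoted strings are passed through unchanged, so any `=` inside them is not rewritten.

```python
import re
from typing import Dict, Union

# Claim 3: hypothetical predefined correspondence between non-numeric values and numbers.
STATE_TO_NUMBER: Dict[str, float] = {"up": 1.0, "down": 0.0, "degraded": 0.5}


def to_numeric(value: Union[str, int, float]) -> float:
    """Return a float, mapping known non-numeric states through STATE_TO_NUMBER."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(value)  # numeric strings such as "3.5"
    except ValueError:
        pass
    key = value.strip().lower()
    if key not in STATE_TO_NUMBER:
        raise ValueError(f"no numeric mapping defined for {value!r}")
    return STATE_TO_NUMBER[key]


# Claim 4: hypothetical SQL-like operators -> PromQL binary operators.
OPERATOR_MAP: Dict[str, str] = {"=": "==", "<>": "!=", "AND": "and", "OR": "or"}

# Braced label matchers and quoted strings come first so they are matched whole
# and passed through; longer operators precede "=" so "==", ">=", "<=" stay intact.
_TOKEN = re.compile(r'\{[^}]*\}|"[^"]*"|<>|!=|==|>=|<=|=|\bAND\b|\bOR\b', re.IGNORECASE)


def convert_operators(expr: str) -> str:
    """Rewrite operators in expr into the target time series database's syntax."""
    def repl(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.startswith(("{", '"')):
            return token
        return OPERATOR_MAP.get(token.upper(), token)

    return _TOKEN.sub(repl, expr)


print(to_numeric("UP"))  # 1.0
print(convert_operators('cpu_usage{job="api"} > 80 AND mem_free{job="api"} <> 0'))
# cpu_usage{job="api"} > 80 and mem_free{job="api"} != 0
```

Unknown non-numeric states raise an error rather than defaulting to a number, so gaps in the mapping table surface instead of silently producing misleading samples.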

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010904700.9A CN114090366B (en) 2020-09-01 2020-09-01 Method, device and system for monitoring data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010904700.9A CN114090366B (en) 2020-09-01 2020-09-01 Method, device and system for monitoring data

Publications (2)

Publication Number Publication Date
CN114090366A true CN114090366A (en) 2022-02-25
CN114090366B CN114090366B (en) 2025-09-12

Family

ID=80295782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010904700.9A Active CN114090366B (en) 2020-09-01 2020-09-01 Method, device and system for monitoring data

Country Status (1)

Country Link
CN (1) CN114090366B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595080A (en) * 2022-03-02 2022-06-07 平凯星辰(北京)科技有限公司 Data processing method, apparatus, electronic device, and computer-readable storage medium
CN114911843A (en) * 2022-05-11 2022-08-16 中国平安人寿保险股份有限公司 Method, device, and computer-readable storage medium for reporting business indicators
CN115309612A (en) * 2022-10-10 2022-11-08 凯美瑞德(苏州)信息科技股份有限公司 Method and device for monitoring data
CN115314452A (en) * 2022-07-11 2022-11-08 中电通商数字技术(上海)有限公司 Monitoring index acquisition and storage method, system and storage medium based on message queue
CN115460264A (en) * 2022-08-23 2022-12-09 曙光信息产业股份有限公司 Target server access method and system
CN116303571A (en) * 2023-03-01 2023-06-23 杭州网易云音乐科技有限公司 Data query method, device, equipment and storage medium
CN116303804A (en) * 2023-05-19 2023-06-23 北京拓普丰联信息科技股份有限公司 Data comparison method, device, equipment and medium
CN117194562A (en) * 2023-07-25 2023-12-08 中国人民银行数字货币研究所 Data synchronization method and device, electronic equipment and computer readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844399A (en) * 2017-10-10 2018-03-27 武汉斗鱼网络科技有限公司 Method, storage medium, electronic equipment and the system of automatic monitoring data storehouse service
CN109408347A (en) * 2018-09-28 2019-03-01 北京九章云极科技有限公司 A kind of index real-time analyzer and index real-time computing technique
CN109726074A (en) * 2018-08-31 2019-05-07 网联清算有限公司 Log processing method, device, computer equipment and storage medium
CN111553560A (en) * 2020-04-01 2020-08-18 车智互联(北京)科技有限公司 Service index monitoring method, monitoring server and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595080A (en) * 2022-03-02 2022-06-07 平凯星辰(北京)科技有限公司 Data processing method, apparatus, electronic device, and computer-readable storage medium
CN114911843A (en) * 2022-05-11 2022-08-16 中国平安人寿保险股份有限公司 Method, device, and computer-readable storage medium for reporting business indicators
CN115314452A (en) * 2022-07-11 2022-11-08 中电通商数字技术(上海)有限公司 Monitoring index acquisition and storage method, system and storage medium based on message queue
CN115314452B (en) * 2022-07-11 2025-01-24 中电通商数字技术(上海)有限公司 Monitoring indicator collection and storage method, system and storage medium based on message queue
CN115460264A (en) * 2022-08-23 2022-12-09 曙光信息产业股份有限公司 Target server access method and system
CN115309612A (en) * 2022-10-10 2022-11-08 凯美瑞德(苏州)信息科技股份有限公司 Method and device for monitoring data
CN116303571A (en) * 2023-03-01 2023-06-23 杭州网易云音乐科技有限公司 Data query method, device, equipment and storage medium
CN116303804A (en) * 2023-05-19 2023-06-23 北京拓普丰联信息科技股份有限公司 Data comparison method, device, equipment and medium
CN116303804B (en) * 2023-05-19 2023-08-15 北京拓普丰联信息科技股份有限公司 Data comparison method, device, equipment and medium
CN117194562A (en) * 2023-07-25 2023-12-08 中国人民银行数字货币研究所 Data synchronization method and device, electronic equipment and computer readable medium
CN117194562B (en) * 2023-07-25 2025-03-07 中国人民银行数字货币研究所 Data synchronization method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN114090366B (en) 2025-09-12

Similar Documents

Publication Publication Date Title
CN114090366A (en) Method, device and system for monitoring data
CN111858248B (en) Application monitoring method, device, equipment and storage medium
US8447851B1 (en) System for monitoring elastic cloud-based computing systems as a service
WO2021151312A1 (en) Method for determining inter-service dependency, and related apparatus
US10116534B2 (en) Systems and methods for WebSphere MQ performance metrics analysis
US11570078B2 (en) Collecting route-based traffic metrics in a service-oriented system
CN114490268A (en) Full-link monitoring method, apparatus, device, storage medium and program product
CN109039817B (en) Information processing method, device, equipment and medium for flow monitoring
WO2015018226A1 (en) Method,apparatus,and system for monitoring website
CN111294218B (en) Information processing method, device, system and storage medium
CN110532322B (en) Operation and maintenance interaction method, system, computer-readable storage medium and device
CN110928934A (en) Data processing method and device for business analysis
CN114625597A (en) Monitoring operation and maintenance system, method, device, electronic device and storage medium
CN113760677A (en) Abnormal link analysis method, device, equipment and storage medium
CN110262951A (en) A kind of business second grade monitoring method and system, storage medium and client
CN113642300A (en) Report generation method and device, electronic equipment and computer readable medium
CN114138720A (en) Log processing method, log processing device, electronic device and storage medium
CN117176802B (en) Full-link monitoring method and device for service request, electronic equipment and medium
CN110633191B (en) Method and system for real-time monitoring of software system business health
CN117370053A (en) Information system service operation-oriented panoramic monitoring method and system
CN115048279A (en) Server processing method and device, electronic equipment and computer readable medium
CN115514618A (en) Alarm event processing method and device, electronic equipment and medium
CN116167556A (en) Job monitoring method, job monitoring device, job monitoring system, job monitoring equipment and computer readable storage medium
CN114625763A (en) Information analysis method and device for database, electronic equipment and readable medium
CN110852537B (en) Quality of service detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant