CN108268355A

CN108268355A - Monitoring system and method for a data center

Info

Publication number: CN108268355A
Application number: CN201611268506.6A
Authority: CN
Inventors: 曾祥洪
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Sichuan Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Sichuan Co Ltd
Priority date: 2016-12-31
Filing date: 2016-12-31
Publication date: 2018-07-10

Abstract

This application relates to a monitoring system and method for a data center. The monitoring system includes a monitoring resource layer, a monitoring data aggregation layer, a monitoring center, and a configuration center. The monitoring data aggregation layer has a multi-layer structure, and the monitoring resource layer includes multiple monitoring resource groups. Each monitoring resource group collects monitoring data according to rules issued by the configuration center and reports the monitoring data to the monitoring data aggregation layer. The monitoring data aggregation layer receives the monitoring data sent by the monitoring resource layer, classifies it, and aggregates each class of monitoring data. The monitoring center stores the monitoring data. The configuration center configures the grouping information of the monitoring resource groups, together with the aggregation-node grouping information and aggregation policy of the monitoring data aggregation layer, and issues them to the monitoring resource layer and the monitoring data aggregation layer.

Description

Monitoring system and method for a data center

Technical Field

The present application relates generally to the field of computer and network technology and, more specifically, to a monitoring system and method for a data center.

Background

With the continuing advance of information technology across society and the rapid spread of the Internet, the number of computing devices keeps growing. In the near future, ultra-large-scale data centers can be expected to contain hundreds of thousands or even millions of devices. The volume of data to be processed grows accordingly, and demand for massive-data processing is rising in every field. Because the storage capacity and computing power of a single machine can no longer meet this demand, distributed and parallel computing have developed and spread rapidly, eventually evolving into grid computing. A large-scale distributed system produces a massive volume of monitoring information from multi-level, multi-source monitoring resources, and the dynamic, complex nature of big data platforms makes monitoring them considerably harder.

Summary

According to one aspect of the present application, a monitoring system for a data center is provided. The monitoring system includes a monitoring resource layer, a monitoring data aggregation layer, a monitoring center, and a configuration center, where the monitoring data aggregation layer has a multi-layer structure. The monitoring resource layer includes multiple monitoring resource groups; each monitoring resource group collects monitoring data according to rules issued by the configuration center and reports the monitoring data to the monitoring data aggregation layer. The monitoring data aggregation layer receives the monitoring data sent by the monitoring resource layer, classifies it, and aggregates each class of monitoring data. The monitoring center is configured to store the monitoring data. The configuration center is configured to set the grouping information of the monitoring resource groups and the aggregation-node grouping information and aggregation policy of the monitoring data aggregation layer, and to issue them to the monitoring resource layer and the monitoring data aggregation layer.

According to another aspect of the present application, a monitoring method for a data center is provided. The method is performed in a monitoring system that includes a monitoring resource layer, a monitoring data aggregation layer, a monitoring center, and a configuration center, where the monitoring data aggregation layer has a multi-layer structure. The method includes: the monitoring resource layer collecting monitoring data and reporting it to the monitoring data aggregation layer, where the monitoring resource layer includes multiple monitoring resource groups and each group collects monitoring data according to rules issued by the configuration center; the monitoring data aggregation layer receiving the monitoring data sent by the monitoring resource layer, classifying it, and aggregating each class of monitoring data; the monitoring center storing the monitoring data; and the configuration center configuring the grouping information of the monitoring resource groups and the aggregation-node grouping information and aggregation policy of the monitoring data aggregation layer, and issuing them to the monitoring resource layer and the monitoring data aggregation layer.

The monitoring system and method for a data center according to the embodiments of the present application provide a technical solution that reduces the load on the monitoring center.

Brief Description of the Drawings

The present application can be better understood from the following description of its embodiments, taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic structural diagram of a monitoring system for a data center according to an embodiment of the present application;

Fig. 2 is a flowchart of a monitoring method for a data center according to an embodiment of the present application;

Fig. 3 is an operational flowchart of the monitoring center according to an embodiment of the present application; and

Fig. 4 is a schematic structural diagram of an information processing device that can implement one or more of the monitoring resource layer nodes, monitoring data aggregation layer nodes, monitoring center, and configuration center of the present application.

Detailed Description

Features and exemplary embodiments of various aspects of the present application are described in detail below. The description includes numerous specific details in order to provide a thorough understanding of the application. It will be apparent to those skilled in the art, however, that the application can be practiced without some of these details. The embodiments below serve only to give a clearer understanding of the application by way of example. The application is in no way limited to any specific configuration presented below; it covers any modification, substitution, or improvement of the related features, structures, operations, and so on that does not depart from the spirit of the application.

Monitoring is an important part of a big data platform. As the volume of monitoring data grows markedly, existing systems and methods can no longer meet the monitoring needs of ever-larger data centers. The result is monitoring delays, partial loss of collected data, and persistently high resource consumption on monitoring-center equipment, all of which prevent effective monitoring.

In addition, when the server cluster of a big data system gathers statistics on and summarizes the relevant hardware and software information, the large number of servers, the wide variety of deployed software, and the complexity of the relevant indicators can make abnormality alarming labor-intensive and leave monitoring and alarming inefficient.

The present application adopts a multi-layer monitoring structure: each layer gathers the corresponding resources and analyzes and processes them at that layer. This greatly reduces the load on the monitoring center and enables effective monitoring of big data.

Fig. 1 is a schematic structural diagram of a monitoring system 100 for a data center according to an embodiment of the present application.

In one embodiment, the system 100 may include a monitoring resource layer 102, a monitoring data aggregation layer 101, a monitoring center 103, and a configuration center 104. In one embodiment, the monitoring data aggregation layer 101 has a multi-layer structure. In one embodiment, depending on the scale of the data center and the volume to be monitored, the monitoring data aggregation layer can be divided into N layers by business, region, network conditions, and so on, where N is a positive integer; for example, as shown in Fig. 1, monitoring data aggregation layer one 101_1, monitoring data aggregation layer two 101_2, monitoring data aggregation layer three 101_3, ..., and monitoring data aggregation layer N 101_N. The present application does not limit the number of monitoring data aggregation layers. However, each additional processing layer adds monitoring delay, so the number of layers should be set according to the business volume and business characteristics of the data center.

In one embodiment, the monitoring resource layer 102 may serve as the base layer of the system 100. In one embodiment, the monitoring resource layer 102 may include multiple monitoring resource groups 1021, and each monitoring resource group 1021 may collect monitoring data according to the rules issued by the configuration center 104 and report the monitoring data to the monitoring data aggregation layer 101.

In one embodiment, at least one of the monitoring resource groups 1021 may include resources to be monitored. A resource may be, for example, any of various attributes of software, hardware, or a combination thereof, such as a service attribute.

In one embodiment, a monitoring node may be deployed for each resource to be monitored, to monitor the resource and collect data. In one embodiment, the monitoring node may be an agent.

In one embodiment, the monitoring data aggregation layer 101 receives the monitoring data sent by the monitoring resource layer 102, classifies it, and aggregates each class of monitoring data.

In one embodiment, the monitoring center 103 is configured to store the monitoring data.

In one embodiment, the configuration center 104 configures the grouping information of the monitoring resource groups and the aggregation-node grouping information and aggregation policy of the monitoring data aggregation layer, and issues them to the monitoring resource layer and the monitoring data aggregation layer.

In one embodiment, each monitoring node may report the data it collects on its own, for example to the first monitoring data aggregation layer (e.g., monitoring data aggregation layer one 101_1). In one embodiment, a master monitoring node may be selected from the monitoring nodes in each monitoring resource group according to a predetermined rule; the other monitoring nodes send the data they collect to the master monitoring node, which then reports it. In one embodiment, either of these two reporting modes may be applied to different monitoring resource groups according to a predetermined rule.
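The two reporting modes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Sample` and `ResourceGroup` types and the `via_master` flag are hypothetical stand-ins for whatever rule the configuration center issues, and the function only records which node sends which samples to the first aggregation layer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    node: str      # monitoring node (agent) that produced the sample
    metric: str
    value: float


@dataclass
class ResourceGroup:
    name: str
    nodes: List[str]
    via_master: bool  # hypothetical rule flag: True = report through the master node
    master: str = ""  # master monitoring node, if one has been selected


def report(group: ResourceGroup, samples: List[Sample]) -> Dict[str, List[Sample]]:
    """Return {sending node: samples it sends to the first aggregation layer}."""
    if group.via_master:
        # Member nodes hand their samples to the master, which reports one batch.
        return {group.master: list(samples)}
    # Direct mode: every node reports its own samples.
    batches: Dict[str, List[Sample]] = {}
    for s in samples:
        batches.setdefault(s.node, []).append(s)
    return batches
```

Routing through a master cuts the number of upstream connections per group to one, at the cost of making the master a hotspot, which is why the document ties master selection to per-group resource pressure.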

In one embodiment, the monitoring data aggregation layer 101 may include a first monitoring data aggregation layer and a second monitoring data aggregation layer, for example, monitoring data aggregation layer one 101_1 and monitoring data aggregation layer two 101_2 (not shown in Fig. 1), where the second monitoring data aggregation layer may be an upper aggregation layer of the first.

In one embodiment, the first monitoring data aggregation layer receives the monitoring data sent by the monitoring resource layer 102, aggregates it to obtain first monitoring information, and sends the first monitoring information to the second monitoring data aggregation layer.

In one embodiment, the second monitoring data aggregation layer receives the first monitoring information from the first monitoring data aggregation layer, aggregates it to obtain second monitoring information, and sends the second monitoring information to the monitoring center 103 or to the aggregation layer above the second monitoring data aggregation layer (e.g., monitoring data aggregation layer three 101_3).
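The two-stage aggregation described above can be illustrated with mergeable summaries. The document does not specify which aggregation functions are used; count, sum, and maximum are an assumption chosen here because they merge exactly, so the upper layer gets the same result however the samples were split among lower-layer nodes. `Summary`, `first_layer`, and `upper_layer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Summary:
    count: int
    total: float
    maximum: float

    def merge(self, other: "Summary") -> "Summary":
        # Count/sum/max combine exactly, so merging order does not matter.
        return Summary(self.count + other.count, self.total + other.total,
                       max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count


def first_layer(samples: Iterable[Tuple[str, float]]) -> Dict[str, Summary]:
    """Turn raw (metric, value) samples into per-metric 'first monitoring information'."""
    out: Dict[str, Summary] = {}
    for metric, value in samples:
        s = Summary(1, value, value)
        out[metric] = out[metric].merge(s) if metric in out else s
    return out


def upper_layer(children: Iterable[Dict[str, Summary]]) -> Dict[str, Summary]:
    """Merge summaries from lower-layer nodes into 'second monitoring information'."""
    out: Dict[str, Summary] = {}
    for child in children:
        for metric, s in child.items():
            out[metric] = out[metric].merge(s) if metric in out else s
    return out
```

Because `upper_layer` consumes the same `Summary` type it produces, the same function can be applied again at layer three and beyond, which matches the N-layer structure.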

In one embodiment, the monitoring center 103 may be responsible for one or more of the following: storing and analyzing monitoring information, analyzing the actions of the automatic system-fault response mechanism and issuing automatic fault-handling actions, and sending alarm information.

In one embodiment, the monitoring center 103 may include a monitoring data center 1031. In one embodiment, the monitoring data center 1031 may be used to store monitoring data. In one embodiment, the monitoring data center 1031 stores all of the monitoring data. In another embodiment, the monitoring data center 1031 stores part of the monitoring data.

In one embodiment, the monitoring center 103 may include a monitoring data analysis center 1032, which may be used to analyze the monitoring data. In one embodiment, the monitoring data analysis center 1032 may perform one or more of the following: preliminary analysis and determination of fault information, system performance data analysis, service performance status data analysis, resource trend analysis, equipment capacity analysis, resource expansion requirement analysis, and generation of various monitoring-related reports.

In one embodiment, the monitoring center 103 may include a monitoring notification and alarm center 1033, which may be responsible for sending the monitoring system's notifications and alarms. In one embodiment, the monitoring notification and alarm center 1033 may send each notification or alarm to the appropriate personnel according to the corresponding configuration.
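A minimal sketch of configuration-driven alarm routing follows. The routing table keyed by (resource group, severity), the `"*"` wildcard, and the fallback recipient list are all assumptions made for illustration; the document only says alarms go to the appropriate personnel according to configuration.

```python
from typing import Dict, List, Tuple

# Hypothetical routing table: (resource group, severity) -> recipients.
RoutingTable = Dict[Tuple[str, str], List[str]]


def route_alarm(table: RoutingTable, group: str, severity: str,
                default: List[str]) -> List[str]:
    """Pick recipients: exact rule first, then the group's '*' rule, then a default list."""
    return table.get((group, severity)) or table.get((group, "*")) or default
```

For example, with `{("db", "critical"): ["dba-oncall"], ("db", "*"): ["dba-team"]}`, a critical database alarm pages the on-call DBA, other database alarms go to the team, and alarms from unlisted groups go to the default list.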

In one embodiment, the monitoring center 103 may include a custom self-healing analysis center 1034. In one embodiment, the custom self-healing analysis center 1034 may further analyze faults that the monitoring data analysis center 1032 has preliminarily analyzed, and generate corresponding instructions for resolving them (for example, device crashes, device system-level faults, and service faults).

In one embodiment, the monitoring center 103 may include a custom action delivery center 1035. In one embodiment, the custom action delivery center 1035 may deliver the instructions generated by the custom self-healing analysis center 1034 to the corresponding device nodes to resolve the fault. In one embodiment, the delivered instructions may include, but are not limited to: restarting the system, restarting the device, restarting the service, and powering the device off or power-cycling it for a hard restart (when a corresponding power management device is connected).
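The fault-to-instruction step can be pictured as a small decision table. The specific mapping below is a hypothetical policy, not one stated in the document; it only reuses the fault types and instructions listed above, including the condition that a hard power cycle requires an attached power management device.

```python
from enum import Enum
from typing import Optional


class Fault(Enum):
    SERVICE_FAILURE = "service_failure"
    SYSTEM_FAULT = "system_fault"
    DEVICE_HANG = "device_hang"


def choose_action(fault: Fault, has_power_mgmt: bool) -> Optional[str]:
    """Map an analysed fault to a remediation instruction (hypothetical policy)."""
    if fault is Fault.SERVICE_FAILURE:
        return "restart_service"
    if fault is Fault.SYSTEM_FAULT:
        return "restart_system"
    if fault is Fault.DEVICE_HANG:
        # A hung device cannot act on a software restart; a power cycle needs
        # an attached power-management device, otherwise escalate (return None).
        return "power_cycle" if has_power_mgmt else None
    return None
```

Returning `None` rather than a guessed instruction leaves escalation to the notification and alarm path instead of issuing an action the device cannot carry out.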

In one embodiment, the monitoring data aggregation layer may include multiple data aggregation groups 1011, each of which operates according to rules issued by the configuration center. In one embodiment, a data aggregation group 1011 may include multiple nodes. In one embodiment, one of the nodes in a data aggregation group 1011 may be selected as the master aggregation node, which sends the monitoring information.

In one embodiment, classifying the monitoring data by the monitoring data aggregation layer specifically includes: dividing the monitoring data into real-time monitoring information, non-real-time monitoring information, and non-monitoring information according to its real-time monitoring attribute and monitoring validity.
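A minimal sketch of the three-way classification follows. The document names the two criteria but not how they are represented, so the `realtime` and `valid` flags and the staleness threshold `max_age` are assumptions: here a record counts as non-monitoring information when it is flagged invalid or is too old to be useful.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    REAL_TIME = 1
    NON_REAL_TIME = 2
    NON_MONITORING = 3


@dataclass
class Record:
    metric: str
    timestamp: float
    realtime: bool      # hypothetical "real-time monitoring attribute"
    valid: bool = True  # hypothetical validity flag


def classify(rec: Record, now: float, max_age: float) -> Category:
    """Classify one record; invalid or stale data is treated as non-monitoring."""
    if not rec.valid or now - rec.timestamp > max_age:
        return Category.NON_MONITORING
    return Category.REAL_TIME if rec.realtime else Category.NON_REAL_TIME
```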

In one embodiment, aggregating each class of monitoring data by the monitoring data aggregation layer specifically includes: if the received monitoring data is real-time monitoring information, sending it immediately; if it is non-real-time monitoring information, sending it to the monitoring center asynchronously; and if it is non-monitoring information, discarding it. This improves response speed and reduces the load on the monitoring center.

In one embodiment, real-time information has the highest priority, and the master aggregation node of the data aggregation group 1011 can send it in real time to the monitoring center 103 or to a higher-level monitoring data aggregation layer. In one embodiment, the master aggregation node may send non-real-time information directly to the monitoring center 103 asynchronously, depending on resource usage, network traffic, and so on. In one embodiment, non-monitoring information (e.g., invalid information or information that is no longer needed) can be discarded either directly by each node or by the master aggregation node. This reduces the load on the monitoring center 103 and on higher-level monitoring data aggregation layers.
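The per-class handling on a master aggregation node might look like the sketch below. The class name, the two callbacks, and the `flush(budget)` hook are hypothetical: the idea is that real-time items go upstream at once, non-real-time items wait in a buffer until some external check (resource usage, network traffic) decides there is capacity to ship them straight to the monitoring center, and non-monitoring items are dropped.

```python
from collections import deque
from typing import Callable, Deque, List


class MasterAggregator:
    """Hypothetical master aggregation node applying per-category handling."""

    def __init__(self, send_upstream: Callable[[str], None],
                 send_center: Callable[[List[str]], None]) -> None:
        self.send_upstream = send_upstream  # next layer or monitoring center
        self.send_center = send_center      # direct path to the monitoring center
        self.pending: Deque[str] = deque()
        self.dropped = 0

    def handle(self, category: str, payload: str) -> None:
        if category == "realtime":
            self.send_upstream(payload)   # highest priority: forward immediately
        elif category == "nonrealtime":
            self.pending.append(payload)  # defer; shipped later by flush()
        else:
            self.dropped += 1             # non-monitoring information: discard

    def flush(self, budget: int) -> None:
        """Called when load allows; sends up to `budget` deferred items as one batch."""
        batch = [self.pending.popleft() for _ in range(min(budget, len(self.pending)))]
        if batch:
            self.send_center(batch)
```

Keeping the flush trigger outside the class lets the configured asynchronous-upload policy (timing, batch size) change without touching the classification path.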

In one embodiment, the configuration center 104 may be used to store various configuration policies and issue them to the corresponding nodes.

In one embodiment, the configuration policies may include: the grouping information of the monitoring resource layer 102, the master monitoring node selection mechanism of the monitoring resource layer 102, the monitoring policy, the aggregation policy of the monitoring data aggregation layer, and the master aggregation node selection mechanism of the monitoring data aggregation layer.

In one embodiment, the configuration center 104 may include a monitoring group configuration 1041, which may store the grouping configuration of the monitored resources. In other words, the monitoring group configuration 1041 may store information about the monitored resources, including but not limited to the monitoring resource group to which each resource belongs. In one embodiment, the configuration center 104 may group monitored resources by business information, physical area, business logic, network area, and so on. In one embodiment, because a monitoring node (e.g., an agent) is deployed for each resource, once the monitoring resource group of each resource is configured, each monitoring node also belongs to the corresponding group.
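Grouping by business, region, or network area can be sketched as keying each resource on a configurable tuple of attributes. The `Resource` fields and the choice of keys are assumptions for illustration; since one agent is deployed per resource, the group key computed for a resource is also the agent's group.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Resource:
    name: str
    business: str
    region: str
    network_zone: str


def build_groups(resources: List[Resource],
                 keys: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[str]]:
    """Group resources by the chosen attribute names; each agent inherits its resource's group."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for r in resources:
        groups[tuple(getattr(r, k) for k in keys)].append(r.name)
    return dict(groups)
```

For example, `build_groups(resources, ("business", "region"))` puts every billing host in one region into a single monitoring resource group, whatever network zone it sits in.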

In one embodiment, the configuration center 104 may include an aggregation group configuration 1042, which may store the node grouping configuration of the monitoring data aggregation layer. In other words, the aggregation group configuration 1042 may store information about the nodes of the monitoring data aggregation layer, including but not limited to the data aggregation group to which each node belongs and the layer to which each data aggregation group belongs.

In one embodiment, the configuration center 104 may include a monitoring policy configuration 1043, which may store monitoring-related policy configurations. In one embodiment, the policy configuration includes, but is not limited to: the information to be collected, the collection frequency, the collection method, and the node of the monitoring data aggregation layer to which the monitoring resource layer reports. In one embodiment, the policy configuration may also include probing policies that the master monitoring node actively initiates against resources in its own group, for example, intra-group network latency checks and intra-group node liveness checks.
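One possible shape for a monitoring policy, covering the items listed above, is sketched here. The field names, the per-item period in seconds, and the `due_items` scheduling helper are hypothetical; the document lists what a policy contains but not its format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonitoringPolicy:
    """Hypothetical shape of a monitoring policy pushed to one resource group."""
    intervals_s: Dict[str, int]  # item to collect -> collection period in seconds
    method: str                  # collection method, e.g. "agent" (illustrative)
    report_to: str               # aggregation-layer node that receives reports
    master_probes: List[str] = field(default_factory=list)  # e.g. ["latency", "liveness"]


def due_items(policy: MonitoringPolicy, t: int) -> List[str]:
    """Items whose collection period divides the current tick t (seconds)."""
    return sorted(item for item, period in policy.intervals_s.items() if t % period == 0)
```

With `{"cpu": 10, "disk": 60}`, CPU is collected every 10 s and disk every minute, so at t = 60 both are due.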

In one embodiment, the configuration center 104 may include an aggregation policy configuration 1044, which may store the aggregation policy configuration of the monitoring data aggregation layer. In one embodiment, the aggregation policy configuration may include: the monitoring data aggregation layer's policy for preliminary analysis of aggregated information; the information classification policy (dividing information into real-time, non-real-time, and non-monitoring information, etc.); the upstream receiving-node configuration of the master aggregation node (e.g., which node in which data aggregation group receives the information); and the policy for asynchronously uploading non-real-time monitoring information to the monitoring center (e.g., upload timing and method).

In one embodiment, the configuration center 104 may include a monitoring group/aggregation group master node selection mechanism configuration 1045, which stores the mechanisms for selecting the master monitoring node of each monitoring group and the master aggregation node of each aggregation group. Because large data centers have complex environments and heterogeneous resources, the master node selection policy of each monitoring group or aggregation group is constrained by the corresponding node resources, network resources, input/output (I/O) resources, and so on, as well as by the corresponding monitoring policy.

In one embodiment, each monitoring group or aggregation group may set its own selection mechanism according to its business characteristics. For example, in a monitoring resource group under heavy I/O pressure, nodes with lower I/O pressure may be given priority as the master monitoring node. Likewise, in a monitoring resource group that consumes a large amount of processing resources (e.g., processor time), nodes with lower processing-resource utilization may be given priority. In more complex environments, the master node selection policy may need to combine several factors.

These policies may vary with the specific business; for example, the policies for analytical services differ from those for customer-facing business services. In one embodiment, analytical services need not consider processing-resource utilization at a specific point or period in time, whereas customer-facing business services do need to consider this attribute.

In one embodiment, the attributes considered by the policies for different services include, but are not limited to, one or more of the following: I/O resource usage, processing resource usage, network traffic, number of connections, and so on. In one embodiment, some services may need to consider none of these attributes, while others may need to consider all of them.

In one embodiment, the attributes considered may be real-time data or non-real-time historical data.
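The selection rules above (prefer the least-loaded node, weight attributes per business, allow real-time or historical values) can be combined into a weighted score. This is one possible reading, not the mechanism the document prescribes: the `WEIGHTS` table, its business names, and the linear scoring are assumptions, and a zero weight models "this business ignores that attribute", as in the analytics example.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical per-business weights; a zero weight means the attribute is ignored.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "analytics": {"io": 1.0, "cpu": 0.0, "net": 0.5},
    "customer_facing": {"io": 0.5, "cpu": 1.0, "net": 0.5},
}


def load(history: List[float], use_history: bool) -> float:
    """Latest sample (real-time) or the mean of past samples (historical)."""
    return mean(history) if use_history else history[-1]


def pick_master(nodes: Dict[str, Dict[str, List[float]]], business: str,
                use_history: bool = False) -> str:
    """Return the node with the lowest weighted load; ties go to the first name."""
    w = WEIGHTS[business]

    def score(name: str) -> float:
        return sum(weight * load(nodes[name][attr], use_history)
                   for attr, weight in w.items() if weight)

    return min(sorted(nodes), key=score)
```

A node with heavy I/O but idle CPU loses the analytics election yet can win the customer-facing one, which is the business-dependent behaviour the text describes.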

In one embodiment, the configuration center 104 may include a configuration policy delivery center 1046, which may be responsible for delivering configuration policies to the monitoring resource layer nodes and monitoring data aggregation layer nodes, that is, delivering the policy configuration of one or more of the monitoring group configuration 1041, aggregation group configuration 1042, monitoring policy configuration 1043, aggregation policy configuration 1044, and monitoring group/aggregation group master node selection mechanism configuration 1045 to the corresponding nodes.
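Per-node delivery can be sketched as selecting only the configuration sections relevant to a node's layer and group. The section names, the `{section: {group: settings}}` layout, and the version stamp (so a node can ignore an older bundle) are all hypothetical details added for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    layer: str  # "resource" or "aggregation"
    group: str


def bundle_for(node: Node, configs: Dict[str, Dict[str, dict]], version: int) -> dict:
    """Select the config sections relevant to one node (hypothetical layout).

    `configs` maps section name -> {group name -> settings}.
    """
    sections: List[str] = (
        ["monitor_group", "monitor_policy", "master_selection"]
        if node.layer == "resource"
        else ["aggregation_group", "aggregation_policy", "master_selection"]
    )
    # Skip sections that are absent or have no entry for this node's group.
    bundle = {s: configs[s][node.group] for s in sections if node.group in configs.get(s, {})}
    bundle["version"] = version
    return bundle
```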

Fig. 2 is a flowchart 200 of a monitoring method for a data center according to an embodiment of the present application. The method can be performed in a monitoring system that includes a monitoring resource layer, a monitoring data aggregation layer, a monitoring center, and a configuration center, where the monitoring data aggregation layer has a multi-layer structure.

At step 201, the configuration center configures the grouping information of the monitoring resource groups and the aggregation-node grouping information and aggregation policy of the monitoring data aggregation layer, and issues them to the monitoring resource layer and the monitoring data aggregation layer.

At step 202, the monitoring resource layer collects monitoring data and reports it to the monitoring data aggregation layer. The monitoring resource layer includes multiple monitoring resource groups, each of which collects monitoring data according to rules issued by the configuration center.

At step 203, the monitoring data aggregation layer receives the monitoring data sent by the monitoring resource layer, classifies it, and aggregates each class of monitoring data.

At step 204, the monitoring center stores the monitoring data.
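Steps 201 to 204 can be traced end to end with in-memory stand-ins. Everything here is an assumption for illustration: the configuration is passed in as plain arguments, per-(group, metric) aggregation uses the maximum, and the returned dictionary stands in for the monitoring center's storage, with non-real-time entries kept in a separate "deferred" bucket to mark that they would arrive asynchronously.

```python
from collections import defaultdict
from typing import Dict, List, Set


def monitoring_cycle(groups: Dict[str, List[str]],
                     readings: Dict[str, Dict[str, float]],
                     collect: Set[str],
                     realtime: Set[str]) -> Dict[str, Dict[tuple, float]]:
    """One pass of steps 201-204 using in-memory stand-ins (names hypothetical)."""
    # Step 201: the configuration center's rules arrive as `groups`, `collect`, `realtime`.
    # Step 202: each monitoring resource group collects only the configured metrics.
    samples = defaultdict(list)
    for group, hosts in groups.items():
        for host in hosts:
            for metric, value in readings.get(host, {}).items():
                if metric in collect:
                    samples[(group, metric)].append(value)
    # Step 203: aggregate per (group, metric), then split by the real-time attribute.
    store: Dict[str, Dict[tuple, float]] = {"realtime": {}, "deferred": {}}
    for key, values in samples.items():
        bucket = "realtime" if key[1] in realtime else "deferred"
        store[bucket][key] = max(values)
    # Step 204: the returned dict stands in for the monitoring center's storage.
    return store
```

Metrics outside the configured `collect` set never leave the node in this sketch, which is one simple way the configuration issued in step 201 limits what reaches the monitoring center.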

In one embodiment, the monitoring resource layer may include multiple monitoring resource groups, and each monitoring resource group may include one or more nodes.

In one embodiment, the monitoring data aggregation layer includes at least a first monitoring data aggregation layer and a second monitoring data aggregation layer, the second being an upper aggregation layer of the first, and the method 200 may include: the first monitoring data aggregation layer receiving the monitoring data sent by the monitoring resource layer, aggregating it to obtain first monitoring information, and sending the first monitoring information to the second monitoring data aggregation layer; and the second monitoring data aggregation layer receiving the first monitoring information from the first monitoring data aggregation layer, aggregating it to obtain second monitoring information, and sending the second monitoring information to the monitoring center or to the aggregation layer above the second monitoring data aggregation layer.

In one embodiment, the classification of the monitoring data by the monitoring data aggregation layer in step 203 may include: dividing the monitoring data into real-time monitoring information, non-real-time monitoring information, and non-monitoring information according to its real-time monitoring attribute and its monitoring validity.
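The three-way classification can be sketched as follows. The record fields `valid` (monitoring validity) and `realtime` (real-time monitoring attribute) are hypothetical names for the two criteria the text mentions.

```python
from enum import Enum


class Category(Enum):
    REAL_TIME = "real-time monitoring information"
    NON_REAL_TIME = "non-real-time monitoring information"
    NON_MONITORING = "non-monitoring information"


def classify(record):
    """Classify one record using two hypothetical fields: 'valid' and 'realtime'."""
    if not record.get("valid", False):
        return Category.NON_MONITORING   # not valid monitoring data at all
    if record.get("realtime", False):
        return Category.REAL_TIME
    return Category.NON_REAL_TIME


samples = [
    {"metric": "cpu", "valid": True, "realtime": True},
    {"metric": "daily_report", "valid": True},
    {"metric": "agent_debug", "valid": False, "realtime": True},
]
print([classify(r).name for r in samples])  # ['REAL_TIME', 'NON_REAL_TIME', 'NON_MONITORING']
```

Validity is checked first, so an invalid record is treated as non-monitoring information even if it is flagged as real-time.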

In one embodiment, the aggregation of each class of monitoring data by the monitoring data aggregation layer in step 203 may include: if the received monitoring data is real-time monitoring information, sending it in real time; if it is non-real-time monitoring information, sending it asynchronously to the monitoring center; and if it is non-monitoring information, discarding it.
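A minimal sketch of the three handling rules, assuming a background thread draining a `queue.Queue` as the asynchronous channel. The source does not name the real-time destination, so `send_now` is a hypothetical upstream callable; the class name and category strings are also illustrative.

```python
import queue
import threading


class AggregationNode:
    """Hypothetical aggregation node applying the per-class handling rules."""

    def __init__(self, send_now, store):
        self.send_now = send_now        # synchronous upstream channel for real-time data
        self.store = store              # monitoring-center sink used by the async worker
        self.backlog = queue.Queue()    # buffer for non-real-time data
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def _drain(self):
        while True:
            item = self.backlog.get()
            if item is None:            # sentinel: stop the worker
                break
            self.store(item)

    def handle(self, record, category):
        if category == "realtime":
            self.send_now(record)       # forwarded immediately on the caller's thread
        elif category == "non-realtime":
            self.backlog.put(record)    # returns at once; the worker delivers it later
        # "non-monitoring": discarded, nothing is forwarded

    def close(self):
        self.backlog.put(None)
        self.worker.join()


sent, stored = [], []
node = AggregationNode(sent.append, stored.append)
node.handle({"cpu": 99}, "realtime")
node.handle({"disk": 70}, "non-realtime")
node.handle({"debug": "x"}, "non-monitoring")
node.close()
print(sent, stored)  # [{'cpu': 99}] [{'disk': 70}]
```

Decoupling the non-real-time path through a queue keeps bulk data from delaying the real-time path; `close()` flushes the backlog before returning.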

In one embodiment, method 200 further includes the configuration center configuring the following items: the main monitoring node selection mechanism of the monitoring resource layer, the monitoring strategy, the aggregation strategy of the monitoring data aggregation layer, and the main aggregation node selection mechanism of the monitoring data aggregation layer. For example, a monitoring-related strategy is configured in the configuration center, which, as described above, delivers it to the nodes in the monitoring resource layer. Likewise, the aggregation strategy of the monitoring data aggregation layer is configured in the configuration center, which delivers it to the nodes in each monitoring data aggregation layer.
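A minimal push-based sketch of the configuration center, assuming it keeps one strategy per group and pushes it to every registered member. `ConfigCenter`, `Node`, and the strategy keys (`metrics`, `interval_s`, `master_selection`, `strategy`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    policy: dict = field(default_factory=dict)

    def apply(self, policy):
        self.policy = dict(policy)  # keep a private copy of the issued strategy


class ConfigCenter:
    """Hypothetical configuration center: one strategy per group, pushed to all members."""

    def __init__(self):
        self.groups = {}    # group name -> list of Node
        self.policies = {}  # group name -> strategy dict

    def register(self, group, node):
        self.groups.setdefault(group, []).append(node)
        if group in self.policies:
            node.apply(self.policies[group])  # late joiners receive the current strategy

    def configure(self, group, **policy):
        self.policies[group] = policy
        for node in self.groups.get(group, []):
            node.apply(policy)


cc = ConfigCenter()
r1, a1 = Node("res-1"), Node("agg-1")
cc.register("resource-group-A", r1)
cc.register("aggregation-group-1", a1)
cc.configure("resource-group-A", metrics=["cpu", "mem"], interval_s=30, master_selection="lowest-id")
cc.configure("aggregation-group-1", strategy="mean", master_selection="lowest-id")
print(r1.policy["interval_s"], a1.policy["strategy"])  # 30 mean
```

Resource groups and aggregation groups are handled the same way here; only the contents of the strategy differ.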

In one embodiment, the configuration center may configure the following items: the main monitoring node selection mechanism of the monitoring resource layer, the monitoring strategy, the aggregation strategy of the monitoring data aggregation layer, and the main aggregation node selection mechanism of the monitoring data aggregation layer.

In one embodiment, the monitoring data aggregation layer may include multiple data aggregation groups, and method 200 may further include each data aggregation group operating according to the rules issued by the configuration center.

In one embodiment, method 200 may further include selecting a main aggregation node from the multiple nodes in a monitoring data aggregation group to send the monitoring information.
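The source leaves the main aggregation node selection mechanism to the configuration center. As one hypothetical mechanism, the sketch below picks the alive node with the smallest integer ID, where "alive" means a heartbeat within `timeout_s` seconds.

```python
import time


def select_master(heartbeats, timeout_s, now=None):
    """Return the alive node with the smallest ID, or None if no node is alive.

    heartbeats: dict of integer node ID -> timestamp (seconds) of its last heartbeat.
    This lowest-ID rule is an illustrative assumption, not the patented mechanism.
    """
    now = time.time() if now is None else now
    alive = [n for n, ts in heartbeats.items() if now - ts <= timeout_s]
    return min(alive) if alive else None


beats = {2: 100.0, 1: 80.0, 3: 99.0}
print(select_master(beats, timeout_s=10, now=105.0))  # 2
```

Node 1 has the smallest ID but its last heartbeat is 25 seconds old, so node 2 takes over; integer IDs avoid the string-ordering surprise where `"agg-10"` sorts before `"agg-2"`.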

In one embodiment, each node of a monitoring resource group in the monitoring resource layer forwards its collected data to the main monitoring node of that group, and the main monitoring node sends the collected data up to the monitoring data aggregation layer. In another embodiment, each node of a monitoring resource group sends its collected data directly up to the monitoring data aggregation layer.
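The two reporting embodiments can be contrasted in a few lines. The mode names `"via_master"` and `"direct"` and the message shape `(sender, batch)` are hypothetical.

```python
def report(samples, master, mode):
    """Return the (sender, batch) messages a resource group sends to the aggregation layer.

    samples: dict of node name -> list of samples collected by that node
    master:  the group's main monitoring node
    mode:    "via_master" or "direct" (hypothetical names for the two embodiments)
    """
    if mode == "via_master":
        merged = [s for node in sorted(samples) for s in samples[node]]
        return [(master, merged)]                                      # one upstream message per group
    if mode == "direct":
        return [(node, batch) for node, batch in sorted(samples.items())]  # one message per node
    raise ValueError(f"unknown mode: {mode}")


samples = {"n1": [1, 2], "n2": [3]}
print(report(samples, "n1", "via_master"))  # [('n1', [1, 2, 3])]
print(report(samples, "n1", "direct"))      # [('n1', [1, 2]), ('n2', [3])]
```

Routing through the main monitoring node reduces the number of upstream connections, while direct reporting avoids making that node a single point of failure.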

In one embodiment, the nodes of the monitoring data aggregation layer may classify the data received from the monitoring resource layer or from a lower-level monitoring data aggregation layer, and then aggregate the corresponding data at the main aggregation node of their data aggregation group. In another embodiment, the main aggregation node of the monitoring data aggregation layer may classify the data after collecting it from the other nodes in its data aggregation group.
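A sketch comparing where classification happens, assuming a hypothetical two-class rule (`class_of`) based on a `realtime` flag. Either placement yields the same per-class result at the main aggregation node; what differs is which nodes do the work.

```python
from collections import defaultdict


def class_of(record):
    # Hypothetical rule: records flagged realtime form one class, all others another.
    return "realtime" if record.get("realtime") else "other"


def classify_at_members(member_batches):
    """Embodiment 1: each member classifies locally; the master merges the per-class buckets."""
    merged = defaultdict(list)
    for batch in member_batches:
        local = defaultdict(list)
        for rec in batch:
            local[class_of(rec)].append(rec)
        for cls, recs in local.items():
            merged[cls].extend(recs)
    return dict(merged)


def classify_at_master(member_batches):
    """Embodiment 2: members forward raw data; the master classifies after collecting it."""
    result = defaultdict(list)
    for rec in (r for batch in member_batches for r in batch):
        result[class_of(rec)].append(rec)
    return dict(result)


batches = [[{"id": 1, "realtime": True}, {"id": 2}], [{"id": 3, "realtime": True}]]
print(classify_at_members(batches) == classify_at_master(batches))  # True
```

Classifying at the members spreads the CPU cost across the group; classifying at the master keeps the members simpler at the cost of concentrating work on one node.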

FIG. 3 shows an operation flowchart 300 of the monitoring center according to an embodiment of the present application.

In one embodiment, at 301, the monitoring data aggregation layer 101 sends the collected real-time or non-real-time data to the monitoring center 103 synchronously or asynchronously, and the monitoring center 103 stores the data in the monitoring data center 1031.

In one embodiment, at 302, the monitoring data analysis center 1032 reads data from the monitoring data center 1031 for analysis.

In one embodiment, at 303, the monitoring data analysis center 1032 stores the analysis results in the monitoring data center 1031.

In one embodiment, the monitoring data analysis center 1032 analyzes the data read from the monitoring data center 1031 and, if it finds a fault, notifies the configuration center 104 at 304 to update the configuration information of the nodes related to the faulty node.
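The source does not specify how a fault is detected. As one hypothetical rule, the sketch below flags a node whose last `consecutive` samples all exceed `threshold`, then builds an illustrative update for the configuration center (step 304). The `{"monitored": False}` payload is an assumption, not a format from the source.

```python
def find_faulty(samples, threshold, consecutive):
    """Return nodes whose last `consecutive` samples all exceed `threshold`.

    samples: dict of node -> chronologically ordered metric values, a hypothetical
    view of what the analysis center reads from the monitoring data center.
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    faulty = []
    for node, values in sorted(samples.items()):
        recent = values[-consecutive:]
        if len(recent) == consecutive and all(v > threshold for v in recent):
            faulty.append(node)
    return faulty


samples = {"n1": [50, 95, 97, 99], "n2": [95, 40, 96], "n3": [99]}
faulty = find_faulty(samples, threshold=90, consecutive=3)
# Hypothetical step-304 notification: ask the configuration center to update these nodes.
config_updates = {node: {"monitored": False} for node in faulty}
print(faulty, config_updates)  # ['n1'] {'n1': {'monitored': False}}
```

Requiring several consecutive breaches filters out one-off spikes such as n2's, and nodes with too little history (n3) are not judged yet.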

In one embodiment, the monitoring data analysis center 1032 performs a preliminary analysis of the fault.

In one embodiment, at 305, if the fault is not within the scope of the self-healing strategy, the monitoring data analysis center 1032 notifies the monitoring notification and alarm center 1033 to send the corresponding alarm notification to the responsible personnel for manual handling.

In one embodiment, at 306, if the fault is within the scope of the self-healing strategy, the monitoring data analysis center 1032 sends the fault information to the monitoring custom self-healing analysis center 1034.

In one embodiment, at 307, the monitoring custom self-healing analysis center 1034 analyzes the fault information, generates a corresponding self-healing instruction, and sends it to the monitoring custom action issuing center 1035.

In one embodiment, the monitoring custom action issuing center 1035 sends the corresponding self-healing operation instructions to the relevant nodes, which then perform the self-healing operations. For example, at 309, the monitoring custom action issuing center 1035 sends a self-healing operation instruction for a monitoring resource node to that monitoring resource node. Likewise, at 309, it sends a self-healing operation instruction for a monitoring data aggregation node to that monitoring data aggregation node.

In one embodiment, if the self-healing operation on the faulty node succeeds at 310, the configuration center 104 is notified at 311 to issue the strategies of the relevant nodes, bringing those nodes back into the monitoring scope.

In one embodiment, if the self-healing operation on the faulty node fails at 312, the monitoring notification and alarm center 1033 is notified at 313 to send the corresponding alarm notification to the responsible personnel for manual handling.
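The decision flow of steps 305 to 313 can be summarized as a single routing function. This is a sketch under stated assumptions: the fault dict keys (`node`, `kind`), the set of healable fault kinds standing in for the self-healing strategy scope, and the `try_heal` callback are all hypothetical. The comments map each branch to the step numbers in FIG. 3.

```python
def handle_fault(fault, healable_kinds, try_heal, log):
    """Route one fault through steps 305-313; return "healed" or "manual".

    fault:          dict with hypothetical keys "node" and "kind"
    healable_kinds: fault kinds inside the self-healing strategy scope (assumed format)
    try_heal:       callable performing the self-healing action, True on success
    log:            list collecting the notifications issued
    """
    node, kind = fault["node"], fault["kind"]
    if kind not in healable_kinds:
        log.append(f"alarm center: manual handling needed for {node} ({kind})")  # 305
        return "manual"
    log.append(f"action center: self-healing instruction sent to {node}")       # 306-309
    if try_heal(fault):
        log.append(f"config center: re-issue policy for {node}")                 # 310-311
        return "healed"
    log.append(f"alarm center: self-healing failed on {node}")                   # 312-313
    return "manual"


def restart_agent(fault):
    # Hypothetical healing action: pretend that only agent restarts succeed.
    return fault["kind"] == "agent-crashed"


log = []
healable = {"agent-crashed", "disk-full"}
for f in ({"node": "n1", "kind": "agent-crashed"},
          {"node": "n2", "kind": "disk-full"},
          {"node": "n3", "kind": "nic-failure"}):
    print(f["node"], handle_fault(f, healable, restart_agent, log))
# n1 healed
# n2 manual
# n3 manual
```

Both failure paths end at the alarm center, so every fault is either resolved automatically or reaches a person.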

FIG. 4 shows a schematic structural diagram of an information processing device 400. One or more of the monitoring resource layer nodes, the monitoring data aggregation layer nodes, the monitoring center, and the configuration center in the embodiments of the present application may be implemented by the information processing device 400. As shown in FIG. 4, the device 400 may include one or more of the following components: a processor 420, a memory 430, a power supply component 440, an input/output (I/O) interface 460, and a communication interface 480. These components may, for example, be communicatively connected via a bus 410.

The processor 420 controls the overall operation of the device 400, such as operations associated with data communication and computation. The processor 420 may include one or more processing cores and can execute instructions to implement all or some of the steps of the methods described in this application. The processor 420 may be any device with processing capability, including but not limited to a general-purpose processor, a special-purpose processor, a microprocessor, a microcontroller, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a field-programmable gate array (FPGA). The processor 420 may include, or communicate with, a cache 425 to speed up data access.

The memory 430 is configured to store various types of instructions and/or data to support the operation of the device 400. Examples of such data include instructions and data for any application or method operating on the device 400. The memory 430 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. It may include semiconductor memory such as random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory. The memory 430 may also include any memory using paper, magnetic, and/or optical media, such as paper tape, hard disks, magnetic tape, floppy disks, magneto-optical (MO) disks, CDs, DVDs, and Blu-ray discs.

The power supply component 440 provides power to the various components of the device 400. It may include an internal battery and/or an external power interface, as well as a power management system and other components associated with generating, managing, and distributing power for the device 400.

The I/O interface 460 provides an interface through which a user can interact with the device 400. It may include, for example, interfaces based on PS/2, RS-232, USB, FireWire, Lightning, VGA, HDMI, DisplayPort, and similar technologies, allowing the user to interact with the device 400 through peripherals such as a keyboard, mouse, touchpad, touch screen, joystick, buttons, microphone, speaker, display, camera, or projection port.

The communication interface 480 is configured to enable the device 400 to communicate with other devices in a wired or wireless manner. Through the communication interface 480, the device 400 can access wireless networks based on one or more communication standards, such as Wi-Fi, 2G, 3G, and 4G networks. In an exemplary embodiment, the communication interface 480 may also receive a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. Exemplary communication interfaces 480 may include interfaces based on near field communication (NFC), radio frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), Bluetooth (BT), and other communication technologies.

The functional blocks shown in the structural block diagrams above may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, or function cards. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium, or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transmit information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical discs, hard disks, fiber-optic media, and radio frequency (RF) links. The code segments may be downloaded via a computer network such as the Internet or an intranet.

The phrase "one embodiment" is used above; however, it should be understood that the features mentioned in each embodiment are not necessarily limited to that embodiment and may be used in, or combined with, other embodiments.

The present application has been described above with reference to specific embodiments, but those skilled in the art will understand that the implementation methods mentioned herein are all statements of the present application and that the specific embodiments listed are merely application examples. The application is not limited to such examples, and various modifications, combinations, and changes may be made to these specific embodiments without departing from the spirit and scope of the present application as defined by the appended claims or their equivalents.

Claims

1. A monitoring system for a data center, characterized in that the monitoring system comprises a monitoring resource layer, a monitoring data aggregation layer, a monitoring center, and a configuration center, the monitoring data aggregation layer having a multi-layer structure, wherein:

the monitoring resource layer comprises multiple monitoring resource groups, and each monitoring resource group collects monitoring data according to the rules issued by the configuration center and reports the monitoring data to the monitoring data aggregation layer;

the monitoring data aggregation layer receives the monitoring data sent by the monitoring resource layer, classifies the monitoring data, and aggregates each class of monitoring data;

the monitoring center is configured to store the monitoring data; and

the configuration center is configured to configure the grouping information of the monitoring resource groups and the aggregation node grouping information and aggregation strategy of the monitoring data aggregation layer, and to issue them to the monitoring resource layer and the monitoring data aggregation layer.

2. The monitoring system according to claim 1, characterized in that the monitoring data aggregation layer comprises at least a first monitoring data aggregation layer and a second monitoring data aggregation layer, the second monitoring data aggregation layer being the upper aggregation layer of the first monitoring data aggregation layer;

the first monitoring data aggregation layer receives the monitoring data sent by the monitoring resource layer, aggregates the monitoring data to obtain first monitoring information, and sends the first monitoring information to the second monitoring data aggregation layer; and

the second monitoring data aggregation layer receives the first monitoring information from the first monitoring data aggregation layer, aggregates the first monitoring information to obtain second monitoring information, and sends the second monitoring information to the monitoring center or to the aggregation layer above the second monitoring data aggregation layer.

3. The monitoring system according to claim 1 or 2, characterized in that the classification of the monitoring data by the monitoring data aggregation layer specifically comprises:

dividing the monitoring data into real-time monitoring information, non-real-time monitoring information, and non-monitoring information according to the real-time monitoring attribute and the monitoring validity of the monitoring data.

4. The monitoring system according to claim 3, characterized in that the aggregation of each class of monitoring data by the monitoring data aggregation layer specifically comprises:

if the received monitoring data is real-time monitoring information, sending the real-time monitoring information in real time;

if the received monitoring data is non-real-time monitoring information, sending the non-real-time monitoring information asynchronously to the monitoring center; and

if the received monitoring data is non-monitoring information, discarding the non-monitoring information.

5. The monitoring system according to claim 1, characterized in that the configuration center is further configured to configure:

the main monitoring node selection mechanism of the monitoring resource layer, the monitoring strategy, the aggregation strategy of the monitoring data aggregation layer, and the main aggregation node selection mechanism of the monitoring data aggregation layer.

6. The monitoring system according to claim 1, characterized in that the monitoring data aggregation layer comprises multiple data aggregation groups, and each data aggregation group operates according to the rules issued by the configuration center.

7. A monitoring method for a data center, characterized in that the monitoring method is performed in a monitoring system comprising a monitoring resource layer, a monitoring data aggregation layer, a monitoring center, and a configuration center, wherein the monitoring data aggregation layer has a multi-layer structure, and the monitoring method comprises:

the configuration center configuring the grouping information of the monitoring resource groups and the aggregation node grouping information and aggregation strategy of the monitoring data aggregation layer, and issuing them to the monitoring resource layer and the monitoring data aggregation layer;

the monitoring resource layer collecting monitoring data and reporting the monitoring data to the monitoring data aggregation layer, wherein the monitoring resource layer comprises multiple monitoring resource groups and each monitoring resource group collects monitoring data according to the rules issued by the configuration center;

the monitoring data aggregation layer receiving the monitoring data sent by the monitoring resource layer, classifying the monitoring data, and aggregating each class of monitoring data; and

the monitoring center storing the monitoring data.

8. The monitoring method according to claim 7, characterized in that the monitoring data aggregation layer comprises at least a first monitoring data aggregation layer and a second monitoring data aggregation layer, the second monitoring data aggregation layer being the upper aggregation layer of the first monitoring data aggregation layer, and the method comprises:

the second monitoring data aggregation layer receiving first monitoring information from the first monitoring data aggregation layer, aggregating the first monitoring information to obtain second monitoring information, and sending the second monitoring information to the monitoring center or to the aggregation layer above the second monitoring data aggregation layer.

9. The monitoring method according to claim 7 or 8, characterized in that the classification of the monitoring data by the monitoring data aggregation layer comprises:

10. The monitoring method according to claim 9, characterized in that the aggregation of each class of monitoring data by the monitoring data aggregation layer comprises:

11. The monitoring method according to claim 7, characterized in that the configuration center further configures the following items: the main monitoring node selection mechanism of the monitoring resource layer, the monitoring strategy, the aggregation strategy of the monitoring data aggregation layer, and the main aggregation node selection mechanism of the monitoring data aggregation layer.

12. The monitoring method according to claim 7, characterized in that the monitoring data aggregation layer comprises multiple data aggregation groups, and the monitoring method comprises: each data aggregation group operating according to the rules issued by the configuration center.