CN110750369B

CN110750369B - A distributed node management method and system

Info

Publication number: CN110750369B
Application number: CN201910955865.6A
Authority: CN
Inventors: 许成喜; 陈诚; 胡淼; 沈毅; 张旻; 马慧敏
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2019-10-09
Filing date: 2019-10-09
Publication date: 2022-07-05
Anticipated expiration: 2039-10-09
Also published as: CN110750369A

Abstract

The present invention provides a distributed node management method and system. The method includes: establishing a secure communication channel between a management server and a working node, the management server sending task information to the working node through the secure communication channel; the working node determining, according to the task information, whether an update operation needs to be started; if so, the working node establishing a message queue with a message server and sending an update request to the management server through the message server; the management server receiving and processing the update request and sending update data to the working node; and the working node periodically sending its node information to the management server through the message queue, the management server receiving and updating the node information of the working node. With this solution, node management in a distributed system cluster achieves controllability, flexibility and real-time performance at the same time.

Description

A distributed node management method and system

Technical Field

The invention belongs to the field of distributed system cluster management, and in particular relates to a distributed node management method and system.

Background

At present, distributed detection technology is widely used in the field of Internet information detection. Working nodes are deployed in a distributed manner, and a distribution and scheduling server weighs the number of tasks against the computing and network resources of the working nodes when scheduling them to perform Internet information detection tasks. Node management covers the installation and configuration of working nodes, initialization, plug-in deployment and plug-in update, node status control, and node resource management. The efficiency of node management directly affects the operating efficiency of the whole system, so node management plays a very important role in such systems. Automating node management greatly reduces the manual work required during system deployment and management, and is of great significance for improving system scalability and scaling up the number of detection nodes.

Generally speaking, node management includes three basic modes: active management, passive management and exception management. In active management, the management node sends management instructions to the working nodes according to a certain strategy, such as polling or random selection, to deploy and initialize working nodes and to collect node status. In passive management, the management node passively receives messages reported by the working nodes, such as requests and node status, and takes the corresponding management actions. Passive management offers good real-time performance, but its reliability is hard to guarantee when a working node becomes abnormal. Exception management refers to management operations such as restart, retry and update that are taken when a working node fails in computing, storage or networking, or while a plug-in is running; it can be implemented in different ways depending on the specific exception.

To sum up, the prior art has the following shortcomings. Active management suits scenarios such as node installation, plug-in deployment and initialization, but not management operations on individual nodes; it is highly operable and easy to control, but not flexible enough. Passive management offers good real-time performance, but its reliability is hard to guarantee when a working node becomes abnormal. Exception management applies only when a working node is abnormal and cannot be used in other situations.

Summary of the Invention

To solve the above technical problems, the present invention proposes a distributed node management method comprising the following steps:

Step 1: a secure communication channel is established between a management server and a working node, and the management server sends task information to the working node through the secure communication channel;

Step 2: the working node determines, according to the task information, whether an update operation needs to be started; if so, the working node establishes a message queue with a message server and sends an update request to the management server through the message server;

Step 3: the management server receives and processes the update request, and sends update data to the working node;

Step 4: the working node periodically sends its node information to the management server through the message queue, and the management server receives and updates the node information of the working node.

Further, on the basis of the above technical solution, step 1 includes:

The management server actively establishes an SSH channel to the working node. After the SSH channel is established, the management server sends the task information to the working node through the SSH channel and controls the working node to complete the corresponding function or operation.

Further, on the basis of the above technical solution, the task information includes the plug-ins required to perform the task and their version information.

Further, on the basis of the above technical solution, step 2 includes:

The working node compares the received required plug-ins and their version information with its local plug-in information to determine whether the update operation needs to be started; if the two are inconsistent, the update operation needs to be started.
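The comparison in step 2 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that both the task information and the local plug-in information are mappings from plug-in name to version string; the function names and the sample plug-in names are hypothetical and do not come from the source.

```python
from typing import Dict, List


def plugins_to_update(required: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Return the required plug-ins that are missing locally or whose
    local version differs from the version named in the task information."""
    return sorted(
        name for name, version in required.items()
        if local.get(name) != version
    )


def needs_update(required: Dict[str, str], local: Dict[str, str]) -> bool:
    """True when at least one required plug-in is inconsistent with the local copy."""
    return bool(plugins_to_update(required, local))


# Hypothetical task information and local cache, for illustration only.
task_plugins = {"port_scan": "1.4.0", "banner_grab": "2.0.1"}
local_plugins = {"port_scan": "1.3.9", "banner_grab": "2.0.1", "dns_probe": "0.9"}
stale = plugins_to_update(task_plugins, local_plugins)
```

Only plug-ins named in the task are checked, so an extra local plug-in such as `dns_probe` does not trigger an update in this sketch.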

Further, on the basis of the above technical solution, the update operation includes:

The working node establishes an RPC message queue with the message server and sends an update request to the message server through the RPC message queue; the message server then sends the update request to the management server.

Further, on the basis of the above technical solution, the RPC message queue is provided by RabbitMQ.

Further, on the basis of the above technical solution, step 3 includes:

The management server sends the latest version of the plug-in to the working node through the message queue.

Further, on the basis of the above technical solution, step 4 includes:

The working node periodically sends its node information to the RPC message queue, thereby reporting its resource and status information to the management server through the message server; after receiving the node information, the management server updates the node information it holds for the working node.

Further, on the basis of the above technical solution, the node information in step 4 includes node resource usage and node status.

Further, on the basis of the above technical solution, the periodic sending in step 4 follows a preset heartbeat interval.

In another aspect, the present invention also provides a distributed node management system comprising a processor and a memory, the memory having a medium storing program code; when the processor reads the program code stored in the medium, the system can execute the method described in the above technical solution.

The method and management system of the present invention solve the problem that existing node management methods cannot achieve controllability, flexibility and real-time performance at the same time. Because the present invention combines active and passive management, it achieves at least the following technical effects: (1) automated batch initialization of working nodes, which improves deployment efficiency and provides good controllability; (2) automated real-time update of detection plug-ins, which ensures that every working node performs detection tasks with the latest plug-in version and improves the real-time performance of node management; (3) timed management and control of node resources and status, with node status collected and managed on a timer, which provides a high degree of flexibility.

Description of the Drawings

FIG. 1 is a schematic flowchart of the distributed node management method proposed by the present invention;

FIG. 2 is a schematic diagram of automated node deployment based on an SSH channel in the distributed node management method proposed by the present invention;

FIG. 3 is a schematic diagram of automatic node plug-in update based on an RPC message queue in the distributed node management method proposed by the present invention;

FIG. 4 is a schematic diagram of timed management of node resources and status based on an RPC message queue in the distributed node management method proposed by the present invention.

Detailed Description

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

To better illustrate the present invention and make its technical solutions easier to understand, typical but non-limiting embodiments are given below. It should be noted that the embodiments listed in this description are exemplary implementations given only for ease of explanation; they should not be understood as the only correct way of implementing the invention, and still less as limiting its scope of protection.

The goals of distributed node management are: to minimize manual involvement and improve the deployment efficiency of working nodes; to let working nodes update their detection plug-ins promptly when the system's detection plug-in version is updated, so that detection tasks run with the latest plug-in version; and to let the management node periodically collect the status and resource usage of the working nodes.

To this end, the present invention proposes a new technical solution that integrates SSH (Secure Shell Protocol) channel technology, RPC message queues and timed management into node management. The flow is shown schematically in FIG. 1 and comprises steps 1 to 4 set out above.

To make the above steps easier to understand, each step is illustrated below through a preferred specific embodiment of the technical solution proposed by the present invention.

Referring to FIG. 1, in a preferred specific embodiment of step 1 of the proposed technical solution, the management server uses node automation management technology based on an SSH (Secure Shell Protocol) channel: using the SSH protocol, for example via the libssh library (see http://www.libssh.org/), an SSH channel is established between the management server and the working node. This SSH channel gives the management server a path for actively managing the working node. Once the SSH channel is established, the management server can send commands containing task information to the working node and control it to complete the corresponding functions or operations.

In practice, this channel can be used to make working nodes download an automated initialization plug-in, enabling batch management of working nodes and improving deployment efficiency.

Referring to FIG. 2, the automated node deployment process based on the SSH channel is as follows: (1) an SSH channel is established between the management server and the working node (see FIG. 1: (1) establish SSH channel); (2) the management server sends a command to the working node through the SSH channel (see FIG. 1: (2) send command); (3) the working node executes the command and completes a specific function (see FIG. 1: (3) command executed); (4) the working node returns the execution result to the management server through the SSH channel (see FIG. 1: (4) return result).
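The four-step flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the source: instead of the libssh library mentioned in the description, it shells out to the system OpenSSH client through `subprocess`, and the account name `deploy`, the function names and the injectable `runner` parameter are all hypothetical.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# A runner takes an argv list and returns (exit_code, output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def ssh_runner(argv: Sequence[str]) -> Tuple[int, str]:
    """Run the command with the system `ssh` client and capture its output."""
    done = subprocess.run(list(argv), capture_output=True, text=True, timeout=60)
    return done.returncode, done.stdout + done.stderr


def build_ssh_argv(host: str, command: str, user: str = "deploy") -> List[str]:
    """Steps (1) and (2): open an SSH session to the node and pass it one command."""
    # BatchMode=yes disables interactive prompts, so automation fails instead of hanging.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", command]


def deploy(hosts: Iterable[str], command: str,
           runner: Runner = ssh_runner) -> Dict[str, Tuple[int, str]]:
    """Send the same command to every node and collect (exit_code, output) per host."""
    results = {}
    for host in hosts:
        # Step (3): the node executes the command.
        # Step (4): the result comes back over the same channel.
        results[host] = runner(build_ssh_argv(host, command))
    return results
```

Passing a different `runner` lets the batch logic be exercised without network access; with the default runner, `deploy(["10.0.0.1"], "uname -a")` would attempt a real SSH connection using key-based authentication.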

Node automation management based on the SSH channel gives the management server a technical means of actively managing working nodes, and is used mainly in the initial deployment and debugging stages of working nodes. While the system is running, detection plug-ins are continually improved and updated, so the plug-in versions cached on working nodes may become outdated. Detecting plug-in version changes in real time and updating plug-ins promptly is important for ensuring the quality and consistency of detection results. However, relying on the management server to check the plug-in versions of working nodes one by one through the SSH channel has two problems: first, a check is needed every time a task is issued, causing a lot of unnecessary traffic; second, the IP address of the management server is easily exposed, which may lead remote data centers to block it.

To avoid these problems, refer to FIG. 3, which shows a preferred specific embodiment of step 2 of the proposed technical solution. The task information distributed to a working node contains the plug-ins required to execute the task and their version information. The working node compares this information with its local plug-in information and, if they are inconsistent, starts a plug-in update. At this point a remote procedure call (RPC) message queue is established between the working node and the management server. For example, the working node uses the RPC message queue provided by RabbitMQ (open-source message broker software, also known as message-oriented middleware, developed by Rabbit, that implements the Advanced Message Queuing Protocol (AMQP)) to send a plug-in request to the message server, which forwards the plug-in update request to the management server. The management server receives and processes the plug-in request and places the latest version of the plug-in in a temporary message queue on the message server; the plug-in request result is then delivered to the working node through that queue, implementing automatic update of detection plug-ins.

Updating node plug-ins based on the RPC message queue follows a passive-management, real-time-update model. Referring to FIG. 3, the specific process is as follows: (1) During task distribution, the management server (Manage Server) specifies the plug-in and version that will execute the task according to the latest plug-in list and versions, identified by the plug-in's unique MD5 value; the task is distributed to a working node (Worker) through the message server. (2) After receiving the task, the Worker checks whether a matching plug-in required by the task exists locally; if not, it starts the plug-in update process. (3) The Worker places a plug-in update request in the RPC message queue on the message server and, at the same time, creates a dedicated temporary message queue to hold the plug-in request result returned by the management server. The Worker keeps listening on this temporary queue until it retrieves the plug-in request result. (4) The Manage Server continuously listens on the RPC message queue of the message server (MQ Server), retrieves new messages from it, uniquely identifies the plug-in requested by the Worker from the plug-in MD5 value, and places the latest version of the plug-in in the corresponding temporary message queue. (5) The Worker retrieves the plug-in request result from the temporary message queue it created, completing the real-time update of the plug-in version, and then deletes the corresponding temporary message queue.
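The request/reply pattern in steps (2) to (5) can be sketched in Python as follows. To keep the example self-contained, the message server is replaced by an in-memory stand-in built on `queue.Queue`; it is not RabbitMQ and uses no RabbitMQ API. The class `Broker`, the queue name `plugin_rpc`, the `tmp-` prefix and all function names are assumptions made for illustration, and the sketch simplifies step (4) by returning the plug-in bytes whose MD5 matches the request.

```python
import hashlib
import queue
import threading
import uuid


class Broker:
    """In-memory stand-in for the message server: a set of named FIFO queues."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def declare(self, name):
        """Create the queue if needed and return it."""
        with self._lock:
            return self._queues.setdefault(name, queue.Queue())

    def delete(self, name):
        with self._lock:
            self._queues.pop(name, None)

    def publish(self, name, message):
        self.declare(name).put(message)


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def manager_serve_once(broker, plugins_by_md5, rpc_queue="plugin_rpc"):
    """Step (4): take one request off the RPC queue and answer on its reply queue."""
    request = broker.declare(rpc_queue).get(timeout=5)
    payload = plugins_by_md5.get(request["md5"])  # None if the MD5 is unknown
    broker.publish(request["reply_to"], payload)


def worker_fetch_plugin(broker, md5, local_cache, rpc_queue="plugin_rpc"):
    """Steps (2) to (5): return the plug-in bytes, requesting them only if not cached."""
    if md5 in local_cache:                       # (2) a matching local plug-in exists
        return local_cache[md5]
    reply_to = f"tmp-{uuid.uuid4().hex}"         # (3) dedicated temporary queue
    replies = broker.declare(reply_to)
    broker.publish(rpc_queue, {"md5": md5, "reply_to": reply_to})
    try:
        payload = replies.get(timeout=5)         # wait until the result arrives
    finally:
        broker.delete(reply_to)                  # (5) remove the temporary queue
    if payload is None or md5_of(payload) != md5:
        raise ValueError("plug-in missing or MD5 mismatch")
    local_cache[md5] = payload
    return payload
```

In an actual RabbitMQ deployment the same pattern is usually expressed with an exclusive callback queue named in the message's `reply_to` property, together with a `correlation_id` that matches replies to requests.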

Referring to FIG. 4, a preferred specific embodiment of step 3 of the proposed technical solution is as follows. The working node sends its resource usage and node status to the RPC message queue at a preset heartbeat interval, reporting node resources and status to the management server through the message server; the management server receives this and updates the corresponding node information. In a more preferred embodiment, the Worker sends its resource usage and node status to the RPC message queue as a timed request at the preset heartbeat interval; the timed request is passed through the message server (MQ Server) to the Manage Server, which receives it and updates the corresponding node information. Managing node resources and status over the RPC message queue thus differs from automatic plug-in update in that plug-in updates occur at irregular times, whereas resource and status management is a timed, routine task. Managing node resources and status over the RPC message queue therefore follows a passive-management, timed-update model. Referring to FIG. 3, the specific process is as follows: (1) The Worker sets a heartbeat timer (for example, 30 seconds) and sends timed requests to the MQ Server, periodically placing its resource usage and node status information in the RPC message queue; at the same time it creates a dedicated temporary queue Qt to hold the timed request result returned by the management server. The Worker keeps listening on Qt until it retrieves the timed request result. (2) The Manage Server continuously listens on the RPC message queue of the MQ Server, retrieves new messages (that is, new timed requests), parses them, and updates the node status and resource usage. At the same time, according to user operations, it places any commands the Worker needs to execute in the corresponding temporary message queue as the timed request result. (3) The Worker retrieves the timed request result from the temporary message queue Qt, executes the specified commands, and then waits for the next heartbeat to report node status and resource usage.
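The management-server side of this heartbeat exchange can be sketched in Python as follows. The sketch leaves out the message transport and the timer and shows only the bookkeeping: each timed request updates the node table, and any commands queued by a user are handed back as the timed request result. The message fields (`node_id`, `status`, `cpu_percent`), the class names and the sample command are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeRecord:
    status: str
    cpu_percent: float
    last_seen: float  # time of the most recent heartbeat, in seconds


@dataclass
class Manager:
    nodes: Dict[str, NodeRecord] = field(default_factory=dict)
    pending: Dict[str, List[str]] = field(default_factory=dict)

    def queue_command(self, node_id: str, command: str) -> None:
        """Record a user-issued command to hand over at the node's next heartbeat."""
        self.pending.setdefault(node_id, []).append(command)

    def handle_heartbeat(self, msg: dict, now: float) -> List[str]:
        """Step (2): update the node table and return the queued commands
        as the timed request result (an empty list if there are none)."""
        self.nodes[msg["node_id"]] = NodeRecord(msg["status"], msg["cpu_percent"], now)
        return self.pending.pop(msg["node_id"], [])


def heartbeat_message(node_id: str, status: str, cpu_percent: float) -> dict:
    """Step (1): build the timed request a Worker would publish every interval."""
    return {"node_id": node_id, "status": status, "cpu_percent": cpu_percent}
```

A Worker loop would call `heartbeat_message` once per interval (30 seconds in the example given in the description), publish it, and execute whatever commands come back before waiting for the next heartbeat.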

It is obvious to those skilled in the art that the embodiments of the present invention are not limited to the details of the above exemplary embodiments, and that they can be implemented in other specific forms without departing from their spirit or essential features. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive. The scope of the embodiments of the present invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced therein. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units, modules or devices recited in a system, device or terminal claim may also be implemented by a single unit, module or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the embodiments of the present invention. Although the embodiments have been described in detail with reference to the above preferred embodiments, those of ordinary skill in the art should understand that the technical solutions may be modified or equivalently replaced without departing from their spirit and scope.

Claims

1. A distributed node management method, characterized in that the following steps are executed among a management server, a message server and a working node:

step 1: establishing a secure communication channel between the management server and the working node, the management server sending task information to the working node through the secure communication channel, wherein the management server actively establishes an SSH channel to the working node and, after the SSH channel is established, sends the task information to the working node through the SSH channel to control the working node to complete corresponding functions or operations, and the task information comprises the plug-ins required for task execution and their version information;

step 2: the working node determines, according to the task information, whether an update operation needs to be started; if so, the working node establishes a message queue with the message server and sends an update request to the management server through the message server, wherein the working node compares the received required plug-ins and their version information with the local plug-in information of the working node to determine whether the update operation needs to be started; if they are inconsistent, the update operation needs to be started; the message queue established between the working node and the message server is an RPC message queue, the update request is sent to the message server through the RPC message queue, and the message server then sends the update request to the management server;

step 3: the management server receives and processes the update request and sends update data to the working node;

step 4: the working node periodically sends its node information to the management server through the message queue, and the management server receives and updates the node information of the working node.

2. The method of claim 1, wherein the RPC message queue is provided by RabbitMQ.

3. The method of claim 2, wherein said step 3 comprises:

the management server sends the latest version of the plug-in to the working node through the message queue.

4. The method of claim 3, wherein said step 4 comprises:

the working node sends the node information of the working node to the RPC message queue periodically, the resource and state information of the working node is reported to the management server through the message server, and the management server updates the node information of the working node stored by the management server after receiving the node information.

5. The method of claim 4, wherein the node information in step 4 includes node resource occupation and node status.

6. The method according to claim 5, wherein the periodicity in step 4 is implemented according to a preset timed heartbeat interval.

7. A distributed node management apparatus comprising a processor and a memory, the memory having a medium with program code stored thereon, the apparatus being capable of performing the method of any one of claims 1 to 6 when the processor reads the program code stored on the medium.