CN111198754B - Task scheduling method and device
- Publication number: CN111198754B
- Application number: CN201811376902.XA
- Authority
- CN
- China
- Prior art keywords
- node
- neural network
- resource usage
- resource
- period
- Prior art date: 2018-11-19
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical Field
The present invention relates to the field of software technology, and in particular to a task scheduling method and device.
Background Art
The Kubernetes system is an open-source container cluster management project designed and developed by Google. It is designed to provide an automated, scalable, and extensible operating platform for container clusters. The Kubernetes system makes it easy to manage containerized applications and solves the communication problems between containers.
The design of a task scheduling method in the Kubernetes system needs to start from the perspective of maximizing resource utilization, so that dynamic scheduling can be triggered in advance, before a resource bottleneck appears. For the Kubernetes system to respond before a resource bottleneck appears, the resource demand of an application in a future time period must be predicted, and dynamic resource scheduling must then be performed according to the predicted values. Current prediction methods mainly use simple regression models to predict future resource demand. However, these methods are easily affected by the subjective factors of the forecaster, place relatively high demands on the quality of historical data, and have low prediction accuracy.
On this basis, a task scheduling method is urgently needed to solve the problem in the prior art that scheduling accuracy is low because future resource demand is predicted with low accuracy during task scheduling.
Summary of the Invention
Embodiments of the present invention provide a task scheduling method and device to solve the technical problem in the prior art that scheduling accuracy is low because future resource demand is predicted with low accuracy during task scheduling.
An embodiment of the present invention provides a task scheduling method, the method comprising:
obtaining resource usage data of each node in a current period;
using the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model, and predicting the resource usage data of each node in the next period, wherein the bidirectional recurrent neural network model is obtained by training on the resource usage data of each node in historical periods, a historical period being a period before the current period; and
scheduling the tasks in each node according to the resource usage data of each node in the next period.
In this way, the embodiment of the present invention uses the resource usage data of the current period as the input of the bidirectional recurrent neural network model to predict the resource usage data of each node in the next period, and the tasks in each node can then be scheduled accordingly. Using a bidirectional recurrent neural network model allows the resource usage data of each node in the next period to be predicted more accurately, thereby improving the accuracy with which the tasks in each node are scheduled. Further, the embodiment of the present invention can automatically monitor, collect, and process data, and perform learning, analysis, and prediction through the bidirectional recurrent neural network model; the whole process is intelligent and requires no manual intervention. Furthermore, in the embodiment of the present invention, the thresholds on which scheduling is based can be flexibly adjusted according to the resource usage data of each node in the current period and the resource usage data of the next period, so that task scheduling is more flexible and more reasonable, and tasks can be scheduled in a timely, accurate, and dynamic manner.
In a possible implementation, the resource usage data includes at least one resource usage amount and a resource usage status, the resource usage status being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is obtained by training on the at least one resource usage amount and the resource usage status of each node in the historical periods.
In a possible implementation, the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput.
In a possible implementation, training the bidirectional recurrent neural network model on the at least one resource usage amount and the resource usage status of each node in the historical periods includes:
obtaining at least one resource usage amount and the resource usage status of each node in any historical period;
using the at least one resource usage amount and the resource usage status of each node in a first historical period as input parameters of a training sample, and using the at least one resource usage amount and the resource usage status of each node in a second historical period as output parameters of the training sample, the first historical period being the period immediately preceding the second historical period; and
training a bidirectional recurrent neural network model using the training samples to obtain the bidirectional recurrent neural network model.
In a possible implementation, the bidirectional recurrent neural network model includes forward neural network layers and backward neural network layers;
the method further includes:
receiving a model update instruction from a user; and
modifying the number of forward neural network layers and/or backward neural network layers in the bidirectional recurrent neural network model according to the model update instruction.
An embodiment of the present invention provides a task scheduling device, the device comprising:
a collection unit, configured to obtain resource usage data of each node in a current period;
a processing unit, configured to use the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period, wherein the bidirectional recurrent neural network model is obtained by training on the resource usage data of each node in historical periods, a historical period being a period before the current period; and
a scheduling unit, configured to schedule the tasks in each node according to the resource usage data of each node in the next period.
In a possible implementation, the resource usage data includes at least one resource usage amount and a resource usage status, the resource usage status being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is obtained by training on the at least one resource usage amount and the resource usage status of each node in the historical periods.
In a possible implementation, the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput.
In a possible implementation, the processing unit is specifically configured to:
obtain at least one resource usage amount and the resource usage status of each node in any historical period;
use the at least one resource usage amount and the resource usage status of each node in a first historical period as input parameters of a training sample, and use the at least one resource usage amount and the resource usage status of each node in a second historical period as output parameters of the training sample, the first historical period being the period immediately preceding the second historical period; and
train a bidirectional recurrent neural network model using the training samples to obtain the bidirectional recurrent neural network model.
In a possible implementation, the bidirectional recurrent neural network model includes forward neural network layers and backward neural network layers;
the collection unit is further configured to receive a model update instruction from a user; and
the processing unit is further configured to modify the number of forward neural network layers and/or backward neural network layers in the bidirectional recurrent neural network model according to the model update instruction.
An embodiment of the present application further provides a device having the function of implementing the task scheduling method described above. This function may be implemented by hardware executing corresponding software. In a possible design, the device includes a processor, a transceiver, and a memory; the memory is configured to store computer-executable instructions, the transceiver is configured to enable the device to communicate with other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, so that the device performs the task scheduling method described above.
An embodiment of the present invention further provides a computer storage medium that stores a software program. When the software program is read and executed by one or more processors, it implements the task scheduling method described in the various possible implementations above.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the task scheduling method described in the various possible implementations above.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Figure 1 is a schematic diagram of a system architecture to which an embodiment of the present invention is applicable;
Figure 2 is a schematic flowchart corresponding to a task scheduling method provided by an embodiment of the present invention;
Figure 3 is a schematic structural diagram of a bidirectional recurrent neural network model provided by an embodiment of the present invention;
Figure 4 is a schematic structural diagram of a task scheduling device provided by an embodiment of the present invention.
Detailed Description of Embodiments
The present application is described in detail below with reference to the accompanying drawings; the specific operations in the method embodiments can also be applied to the device embodiments.
Figure 1 exemplarily shows a schematic diagram of a system architecture to which an embodiment of the present invention is applicable. As shown in Figure 1, the system 100 to which the embodiment of the present invention is applicable includes a scheduling device 101 and at least one node, for example node 1021, node 1022, and node 1023 shown in Figure 1. The scheduling device 101 may connect to each node through a server interface (API Server); for example, as shown in Figure 1, the scheduling device 101 may connect to node 1021 through the API Server, to node 1022 through the API Server, and to node 1023 through the API Server.
Further, taking the case where the system 100 is a Kubernetes system as an example, the scheduling device 101 may include a management module 1011, a resource monitoring module 1012, a deep analysis module 1013, and a resource scheduling module 1014. Each node may include a Kubelet component and at least one Pod; for example, node 1021 may include Kubelet component 1211, Pod 1212, and Pod 1213; node 1022 may include Kubelet component 1221, Pod 1222, and Pod 1223; and node 1023 may include Kubelet component 1231, Pod 1232, and Pod 1233.
Specifically, the management module 1011 may be responsible for creating Pods in the nodes; for example, the management module 1011 may create Pod 1212 and Pod 1213 in node 1021, Pod 1222 and Pod 1223 in node 1022, and Pod 1232 and Pod 1233 in node 1023.
The resource monitoring module 1012 may be used to collect resource usage data from each node, aggregate it with the Pod as the unit, and provide it to the deep analysis module 1013.
The deep analysis module 1013 may train a deep learning model on the resource usage data provided by the resource monitoring module 1012 and predict the resource usage data for a subsequent period of time.
Further, as shown in Figure 1, the deep analysis module 1013 may include a data processing module 1131, a deep prediction module 1132, and a model update module 1133. The data processing module 1131 may be used to preprocess the resource usage data collected by the resource monitoring module 1012; the deep prediction module 1132 may feed the processing results of the data processing module 1131 into the deep learning model for learning and predict future resource usage data; and the model update module 1133 may receive a model update instruction from an administrator over the network and update the model structure as required.
The resource scheduling module 1014 may generate a scheduling policy according to the prediction results of the deep analysis module 1013 and deliver it to each node.
Further, as shown in Figure 1, the resource scheduling module 1014 may include a policy generation module 1141 and a policy delivery module 1142. The policy generation module 1141 may generate a corresponding scheduling policy according to the processing results of the deep analysis module 1013, and the policy delivery module 1142 may deliver the policy generated by the policy generation module 1141 to each node, thereby achieving reasonable scheduling of system resources.
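The module split above can be illustrated with a minimal, self-contained Python sketch. The class names (ResourceMonitor, DeepAnalyzer, ResourceScheduler), the metric fields, and the hard-coded sample values are hypothetical stand-ins for modules 1012-1014 and are not taken from the patent; the sketch only shows how collected usage data would flow from monitoring through prediction to policy generation.
```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class NodeMetrics:
    """Per-node usage for one period (fractions of capacity); field names are illustrative."""
    memory: float
    cpu: float
    disk: float
    io: float

class ResourceMonitor:
    """Stand-in for the resource monitoring module 1012: collects current-period usage."""
    def collect(self) -> Dict[str, NodeMetrics]:
        # A real deployment would query the cluster; fixed values are used purely for illustration.
        return {"node-1021": NodeMetrics(0.45, 0.80, 0.90, 0.60),
                "node-1022": NodeMetrics(0.60, 1.00, 0.75, 0.90)}

class DeepAnalyzer:
    """Stand-in for the deep analysis module 1013: the trained model would run here."""
    def predict_next_period(self, current: Dict[str, NodeMetrics]) -> Dict[str, NodeMetrics]:
        return current  # placeholder for the bidirectional RNN prediction

class ResourceScheduler:
    """Stand-in for the resource scheduling module 1014: turns predictions into a policy."""
    def make_policy(self, predicted: Dict[str, NodeMetrics]) -> List[str]:
        return [f"rebalance {name}" for name, m in predicted.items() if m.cpu > 0.9]

if __name__ == "__main__":
    current = ResourceMonitor().collect()
    predicted = DeepAnalyzer().predict_next_period(current)
    print(ResourceScheduler().make_policy(predicted))   # ['rebalance node-1022']
```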
Based on the system architecture shown in Figure 1, Figure 2 exemplarily shows a schematic flowchart corresponding to a task scheduling method provided by an embodiment of the present invention, which includes the following steps:
Step 201: obtain the resource usage data of each node in the current period.
Step 202: use the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period.
Step 203: schedule the tasks in each node according to the resource usage data of each node in the next period.
In this way, the embodiment of the present invention uses the resource usage data of the current period as the input of the bidirectional recurrent neural network model to predict the resource usage data of each node in the next period, and the tasks in each node can then be scheduled accordingly. Using a bidirectional recurrent neural network model allows the resource usage data of each node in the next period to be predicted more accurately, thereby improving the accuracy with which the tasks in each node are scheduled. Further, the embodiment of the present invention can automatically monitor, collect, and process data, and perform learning, analysis, and prediction through the bidirectional recurrent neural network model; the whole process is intelligent and requires no manual intervention. Furthermore, in the embodiment of the present invention, the thresholds on which scheduling is based can be flexibly adjusted according to the resource usage data of each node in the current period and the resource usage data of the next period, so that task scheduling is more flexible and more reasonable, and tasks can be scheduled in a timely, accurate, and dynamic manner.
Specifically, the resource usage data may include at least one resource usage amount and a resource usage status.
The resource usage amount may be memory usage, CPU usage, disk usage, IO throughput, and the like, which is not specifically limited. Further, as can be seen from Figure 1, each node may contain multiple Pods. When obtaining the resource usage of each node, the resource usage of each Pod in the node may be obtained first, and the resource usage of the node may then be determined from the resource usage of its Pods.
Table 1 shows an example of resource usage in an embodiment of the present invention. Node A includes Pod A-1 and Pod A-2; Pod A-1 has a memory usage of 20%, a CPU usage of 50%, a disk usage of 10%, and an IO throughput of 30%, while Pod A-2 has a memory usage of 25%, a CPU usage of 30%, a disk usage of 80%, and an IO throughput of 30%. It follows that Node A has a memory usage of 45%, a CPU usage of 80%, a disk usage of 90%, and an IO throughput of 60%. Node B includes Pod B-1 and Pod B-2; Pod B-1 has a memory usage of 35%, a CPU usage of 55%, a disk usage of 20%, and an IO throughput of 30%, while Pod B-2 has a memory usage of 25%, a CPU usage of 45%, a disk usage of 55%, and an IO throughput of 60%. It follows that Node B has a memory usage of 60%, a CPU usage of 100%, a disk usage of 75%, and an IO throughput of 90%.
Table 1: An example of resource usage in an embodiment of the present invention
| Node | Pod | Memory usage | CPU usage | Disk usage | IO throughput |
|---|---|---|---|---|---|
| Node A | Pod A-1 | 20% | 50% | 10% | 30% |
| Node A | Pod A-2 | 25% | 30% | 80% | 30% |
| Node A | Total | 45% | 80% | 90% | 60% |
| Node B | Pod B-1 | 35% | 55% | 20% | 30% |
| Node B | Pod B-2 | 25% | 45% | 55% | 60% |
| Node B | Total | 60% | 100% | 75% | 90% |
It should be noted that the method of calculating a node's resource usage shown above is only an example; in other possible implementations, a node's resource usage may also be calculated from the resource usage of each Pod and the weight of each Pod, which will not be described in detail here.
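A minimal sketch of the Pod-to-node aggregation described above; the function name, the dictionary layout of the metrics, and the unit weights are illustrative assumptions, and the example simply reproduces Node A from Table 1 by summing its two Pods.
```python
from typing import Dict, List, Optional

# Each Pod's usage is a dict such as {"memory": 20, "cpu": 50, "disk": 10, "io": 30} (percent).
def aggregate_node_usage(pod_usages: List[Dict[str, float]],
                         weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Sum (or weight) Pod usage into node usage, as described above."""
    if weights is None:
        weights = [1.0] * len(pod_usages)
    node: Dict[str, float] = {}
    for pod, w in zip(pod_usages, weights):
        for metric, value in pod.items():
            node[metric] = node.get(metric, 0.0) + w * value
    return node

# Reproduces Node A from Table 1: Pod A-1 + Pod A-2 with unit weights.
pod_a1 = {"memory": 20, "cpu": 50, "disk": 10, "io": 30}
pod_a2 = {"memory": 25, "cpu": 30, "disk": 80, "io": 30}
print(aggregate_node_usage([pod_a1, pod_a2]))
# {'memory': 45.0, 'cpu': 80.0, 'disk': 90.0, 'io': 60.0}
```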
In the embodiment of the present invention, the resource usage status may be determined according to at least one resource usage amount. Specifically, taking the resource usage of Node A shown in Table 1 as an example, the resource usage status of Node A may be determined from Node A's memory usage, CPU usage, disk usage, and IO throughput. For example, when calculating the resource usage status of Node A, it can be judged whether the number of Node A's resource usage amounts that exceed a usage threshold is greater than a count threshold; if so, the resource usage status of Node A can be determined to be a high-load state, and otherwise it can be determined to be a low-load state. As another example, when calculating the resource usage status of Node A, the load of Node A may first be determined from Node A's memory usage, CPU usage, disk usage, and IO throughput, and it may then be judged whether the load of Node A exceeds a load threshold; if so, the resource usage status of Node A can be determined to be a high-load state, and otherwise it can be determined to be a low-load state.
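The two ways of deriving the resource usage status can be sketched as follows; the concrete thresholds, the use of a plain average as the aggregate load, and the "high"/"low" labels are assumptions made for illustration, since the embodiment does not fix them.
```python
from typing import Dict

def state_by_count(usage: Dict[str, float], usage_threshold: float = 70.0,
                   count_threshold: int = 2) -> str:
    """First variant: high load if more than count_threshold metrics exceed the usage threshold."""
    exceeded = sum(1 for v in usage.values() if v > usage_threshold)
    return "high" if exceeded > count_threshold else "low"

def state_by_load(usage: Dict[str, float], load_threshold: float = 75.0) -> str:
    """Second variant: derive one load figure (here a plain average) and compare it to a threshold."""
    load = sum(usage.values()) / len(usage)
    return "high" if load > load_threshold else "low"

node_b = {"memory": 60, "cpu": 100, "disk": 75, "io": 90}
print(state_by_count(node_b), state_by_load(node_b))  # high high
```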
Before step 202 is performed, the embodiment of the present invention may first preprocess the resource usage data of each node. The preprocessing may include two steps: format conversion and dimension reconstruction. Format conversion refers to converting the obtained resource usage data into a format that the neural network model can recognize. The conversion differs according to the type of the resource usage data: for example, Boolean data fields may be converted into binary values as the input data format; alternatively, text data fields may be converted with the Bag of Words (BOW) method and used as the input data format; or numeric data fields may keep their original type as the input data format.
Further, dimension reconstruction may be performed on the format-converted data, so as to construct dimensions that meet the input requirements of the deep learning model.
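A sketch of this preprocessing step, assuming NumPy is available; the tiny vocabulary used for the bag-of-words conversion, the field ordering, and the (batch, time_steps, features) target shape are illustrative choices rather than requirements of the embodiment.
```python
import numpy as np

def convert_field(value):
    """Format conversion as sketched above: booleans -> 0/1, numbers kept as-is,
    text -> a toy bag-of-words count vector over a small assumed vocabulary."""
    vocab = ["high", "low"]                      # assumed vocabulary, illustration only
    if isinstance(value, bool):
        return [1.0 if value else 0.0]
    if isinstance(value, (int, float)):
        return [float(value)]
    if isinstance(value, str):
        tokens = value.split()
        return [float(tokens.count(w)) for w in vocab]
    raise TypeError(f"unsupported field type: {type(value)}")

def reconstruct_dimensions(periods):
    """Dimension reconstruction: flatten each period's fields and stack them into the
    (batch, time_steps, features) shape that a recurrent model expects."""
    rows = [[x for field in period for x in convert_field(field)] for period in periods]
    return np.asarray(rows, dtype=np.float32)[np.newaxis, :, :]

# One node observed over three periods: memory, cpu, disk, io (percent) plus a status string.
history = [(45, 80, 90, 60, "low"), (60, 100, 75, 90, "high"), (50, 70, 65, 55, "low")]
print(reconstruct_dimensions(history).shape)  # (1, 3, 6)
```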
In step 202, Figure 3 shows a schematic structural diagram of a bidirectional recurrent neural network model provided by an embodiment of the present invention. The bidirectional recurrent neural network model may contain one input layer, three forward neural network layers, three backward neural network layers, two fully connected layers, and one output layer. During computation, the preprocessed data is first fed into the input layer; after the input layer has processed it, the result is fed simultaneously into the forward neural network layers and the backward neural network layers; after the forward and backward layers have finished computing, their results are merged and fed into the fully connected layers; and after the fully connected layers have processed the data, the prediction is produced through the output layer.
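Assuming the model is built with TensorFlow/Keras, the layer arrangement of Figure 3 (one input layer, three forward and three backward recurrent layers whose outputs are merged, two fully connected layers, and one output layer) could look roughly like the sketch below; the LSTM cell type, layer widths, optimizer, loss, and shapes are assumptions not specified in the patent.
```python
import tensorflow as tf

def build_birnn(time_steps: int, features: int, outputs: int) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(time_steps, features))             # 1 input layer
    fwd = inputs
    for i in range(3):                                                # 3 forward recurrent layers
        fwd = tf.keras.layers.LSTM(64, return_sequences=(i < 2))(fwd)
    # 3 backward recurrent layers: the same sequence read in reverse time order
    bwd = tf.keras.layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(inputs)
    for i in range(3):
        bwd = tf.keras.layers.LSTM(64, return_sequences=(i < 2))(bwd)
    merged = tf.keras.layers.concatenate([fwd, bwd])                  # merge both directions
    x = tf.keras.layers.Dense(128, activation="relu")(merged)         # 2 fully connected layers
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    out = tf.keras.layers.Dense(outputs)(x)                           # 1 output layer
    return tf.keras.Model(inputs, out)

# Assumed shapes: 12 time steps per period, 6 features per node, 6 predicted values.
model = build_birnn(time_steps=12, features=6, outputs=6)
model.compile(optimizer="adam", loss="mse")
model.summary()
```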
Specifically, the bidirectional recurrent neural network model may be obtained by training on the resource usage data of each node in historical periods, where a historical period may be a period before the current period. Further, the bidirectional recurrent neural network model is obtained by training on at least one resource usage amount and the resource usage status of each node in the historical periods.
Further, when training the bidirectional recurrent neural network model, the resource usage data may be preprocessed and assembled into time series as training samples. Specifically, at least one resource usage amount and the resource usage status of each node in any historical period may first be obtained; then, the at least one resource usage amount and the resource usage status of each node in a first historical period are used as the input parameters of a training sample, and the at least one resource usage amount and the resource usage status of each node in a second historical period are used as the output parameters of the training sample, where the first historical period is the period immediately preceding the second historical period; finally, the training samples can be used to train a bidirectional recurrent neural network model, thereby obtaining the bidirectional recurrent neural network model.
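A small sketch of how the training pairs described above could be assembled: each period's features form the sample input and the immediately following period's features form the sample output. The example history values and the encoding of the usage status as a 0/1 flag are illustrative assumptions.
```python
import numpy as np

def make_training_pairs(period_features: np.ndarray):
    """Pair each period with the next one: period t is the sample input,
    period t+1 is the sample output, exactly as described above."""
    x = period_features[:-1]   # first historical periods
    y = period_features[1:]    # the immediately following periods
    return x, y

# Six consecutive periods of one node: memory, cpu, disk, io, status flag (1 = high load).
history = np.array([[45, 80, 90, 60, 0],
                    [60, 100, 75, 90, 1],
                    [50, 70, 65, 55, 0],
                    [55, 75, 70, 58, 0],
                    [65, 95, 80, 85, 1],
                    [48, 72, 60, 50, 0]], dtype=np.float32)

x, y = make_training_pairs(history)
print(x.shape, y.shape)   # (5, 5) (5, 5): five input/output pairs of five features each
# For the recurrent model sketched earlier, x would additionally be reshaped to
# (samples, time_steps, features) before training.
```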
In step 203, a corresponding scheduling policy can be generated according to the resource usage data of each node in the next period, and the Pods in each node can then be scheduled according to the scheduling policy, thereby achieving reasonable resource scheduling.
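The patent does not spell out the policy-generation rule, so the following is only a toy illustration: nodes whose predicted CPU usage exceeds an assumed threshold are paired with the least-loaded node as a candidate target for rescheduling their Pods.
```python
from typing import Dict, List, Tuple

def make_policy(predicted_cpu: Dict[str, float], threshold: float = 90.0) -> List[Tuple[str, str]]:
    """Toy policy generator: for every node whose predicted CPU usage exceeds the threshold,
    propose moving work to the node with the lowest predicted CPU usage."""
    coolest = min(predicted_cpu, key=predicted_cpu.get)
    return [(hot, coolest) for hot, cpu in predicted_cpu.items()
            if cpu > threshold and hot != coolest]

predicted = {"node-1021": 95.0, "node-1022": 40.0, "node-1023": 92.0}
print(make_policy(predicted))   # [('node-1021', 'node-1022'), ('node-1023', 'node-1022')]
```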
Considering that the accuracy of the bidirectional recurrent neural network model may decline as the amount of data keeps growing, in the embodiment of the present invention the accuracy of the bidirectional recurrent neural network model can also be improved by updating the model. Specifically, a model update instruction from a user may first be received, and the number of forward and/or backward neural network layers in the bidirectional recurrent neural network model may then be modified according to the model update instruction. In this way, when the accuracy of the bidirectional recurrent neural network model declines, the number of model layers can be increased, the choice of optimizer can be changed, and the learning rate can be lowered.
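Under the same TensorFlow/Keras assumption as above, handling a model update instruction could amount to rebuilding the network with a different depth per direction, switching the optimizer, and lowering the learning rate; the specific depth, the SGD optimizer, the layer sizes, and the learning-rate value below are illustrative only.
```python
import tensorflow as tf

def rebuild_for_update(num_recurrent_layers: int, learning_rate: float) -> tf.keras.Model:
    """Rebuild the model after a model update instruction: change the number of recurrent
    layers per direction, switch the optimizer, and lower the learning rate."""
    inputs = tf.keras.Input(shape=(12, 6))
    x = inputs
    for i in range(num_recurrent_layers):
        last = i == num_recurrent_layers - 1
        # Bidirectional wraps one forward and one backward LSTM per layer.
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(64, return_sequences=not last))(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(6)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate), loss="mse")
    return model

# An update instruction might, for example, raise the depth from 3 to 4 layers per
# direction and cut the learning rate.
updated_model = rebuild_for_update(num_recurrent_layers=4, learning_rate=1e-4)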
Based on the same concept, an embodiment of the present invention provides a task scheduling device. As shown in Figure 4, the device includes a collection unit 401, a processing unit 402, and a scheduling unit 403, wherein:
the collection unit 401 is configured to obtain resource usage data of each node in a current period;
the processing unit 402 is configured to use the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period, wherein the bidirectional recurrent neural network model is obtained by training on the resource usage data of each node in historical periods, a historical period being a period before the current period; and
the scheduling unit 403 is configured to schedule the tasks in each node according to the resource usage data of each node in the next period.
In a possible implementation, the resource usage data includes at least one resource usage amount and a resource usage status, the resource usage status being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is obtained by training on the at least one resource usage amount and the resource usage status of each node in the historical periods.
In a possible implementation, the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput.
In a possible implementation, the processing unit 402 is specifically configured to:
obtain at least one resource usage amount and the resource usage status of each node in any historical period;
use the at least one resource usage amount and the resource usage status of each node in a first historical period as input parameters of a training sample, and use the at least one resource usage amount and the resource usage status of each node in a second historical period as output parameters of the training sample, the first historical period being the period immediately preceding the second historical period; and
train a bidirectional recurrent neural network model using the training samples to obtain the bidirectional recurrent neural network model.
In a possible implementation, the bidirectional recurrent neural network model includes forward neural network layers and backward neural network layers;
the collection unit 401 is further configured to receive a model update instruction from a user; and
the processing unit 402 is further configured to modify the number of forward neural network layers and/or backward neural network layers in the bidirectional recurrent neural network model according to the model update instruction.
An embodiment of the present application further provides a device having the function of implementing the task scheduling method described above. This function may be implemented by hardware executing corresponding software. In a possible design, the device includes a processor, a transceiver, and a memory; the memory is configured to store computer-executable instructions, the transceiver is configured to enable the device to communicate with other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, so that the device performs the task scheduling method described above.
An embodiment of the present invention further provides a computer storage medium that stores a software program. When the software program is read and executed by one or more processors, it implements the task scheduling method described in the various possible implementations above.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the task scheduling method described in the various possible implementations above.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments as well as all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these changes and variations.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811376902.XA CN111198754B (en) | 2018-11-19 | 2018-11-19 | Task scheduling method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811376902.XA CN111198754B (en) | 2018-11-19 | 2018-11-19 | Task scheduling method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111198754A CN111198754A (en) | 2020-05-26 |
| CN111198754B true CN111198754B (en) | 2023-07-14 |
Family
ID=70743876
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811376902.XA (CN111198754B, active) | Task scheduling method and device | 2018-11-19 | 2018-11-19 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111198754B (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112000450B (en) * | 2020-08-18 | 2025-03-25 | 中国银联股份有限公司 | Neural network architecture search method and device |
| CN114327807A (en) * | 2020-11-25 | 2022-04-12 | 中科聚信信息技术(北京)有限公司 | Adaptive task scheduling method and device for distributed rule engine and electronic equipment |
| CN112667398B (en) * | 2020-12-28 | 2023-09-01 | 北京奇艺世纪科技有限公司 | Resource scheduling method and device, electronic equipment and storage medium |
| CN114816690B (en) * | 2021-01-29 | 2024-11-15 | 中移(苏州)软件技术有限公司 | A task allocation method, device, equipment and storage medium |
| CN115604273A (en) * | 2021-06-28 | 2023-01-13 | 伊姆西Ip控股有限责任公司(Us) | Method, apparatus and program product for managing a computing system |
| US11868812B2 (en) | 2021-08-12 | 2024-01-09 | International Business Machines Corporation | Predictive scaling of container orchestration platforms |
| CN114528098B (en) * | 2022-01-25 | 2025-02-14 | 华南理工大学 | An automatic scaling method for cloud platforms based on fixed-length service queuing model |
| CN116800843A (en) * | 2022-03-14 | 2023-09-22 | 中国移动通信集团青海有限公司 | Resource processing methods, devices, equipment and storage media based on Kafka |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103699440A (en) * | 2012-09-27 | 2014-04-02 | 北京搜狐新媒体信息技术有限公司 | Method and device for cloud computing platform system to distribute resources to task |
| CN107613030A (en) * | 2017-11-06 | 2018-01-19 | 网宿科技股份有限公司 | A method and system for processing service requests |
| CN107734035A (en) * | 2017-10-17 | 2018-02-23 | 华南理工大学 | A kind of Virtual Cluster automatic telescopic method under cloud computing environment |
| CN107729126A (en) * | 2016-08-12 | 2018-02-23 | 中国移动通信集团浙江有限公司 | A kind of method for scheduling task and device of container cloud |
| WO2018168521A1 (en) * | 2017-03-14 | 2018-09-20 | Omron Corporation | Learning result identifying apparatus, learning result identifying method, and program therefor |
| CN108563755A (en) * | 2018-04-16 | 2018-09-21 | 辽宁工程技术大学 | A kind of personalized recommendation system and method based on bidirectional circulating neural network |
- 2018-11-19: Application CN201811376902.XA filed in China; granted as patent CN111198754B (status: active)
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103699440A (en) * | 2012-09-27 | 2014-04-02 | 北京搜狐新媒体信息技术有限公司 | Method and device for cloud computing platform system to distribute resources to task |
| CN107729126A (en) * | 2016-08-12 | 2018-02-23 | 中国移动通信集团浙江有限公司 | A kind of method for scheduling task and device of container cloud |
| WO2018168521A1 (en) * | 2017-03-14 | 2018-09-20 | Omron Corporation | Learning result identifying apparatus, learning result identifying method, and program therefor |
| CN107734035A (en) * | 2017-10-17 | 2018-02-23 | 华南理工大学 | A kind of Virtual Cluster automatic telescopic method under cloud computing environment |
| CN107613030A (en) * | 2017-11-06 | 2018-01-19 | 网宿科技股份有限公司 | A method and system for processing service requests |
| CN108563755A (en) * | 2018-04-16 | 2018-09-21 | 辽宁工程技术大学 | A kind of personalized recommendation system and method based on bidirectional circulating neural network |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111198754A (en) | 2020-05-26 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |