CN111198754B - Task scheduling method and device
- Publication number: CN111198754B
- Application number: CN201811376902.XA
- Authority
- CN
- China
- Prior art keywords
- node
- neural network
- resource usage
- resource
- period
- Prior art date: 2018-11-19
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical Field
The present invention relates to the field of software technology, and in particular to a task scheduling method and device.
Background Art
The Kubernetes system is an open-source container cluster management project designed and developed by Google. It is designed to provide an automated, scalable, and extensible operating platform for container clusters. The Kubernetes system makes it easy to manage containerized applications and solves the communication problems between containers.
The design of a task scheduling method in the Kubernetes system needs to start from the perspective of maximizing resource utilization, so that dynamic scheduling can be triggered in advance, before a resource bottleneck appears. For the Kubernetes system to respond before a resource bottleneck appears, the resource demand of an application in a future time period must be predicted, and dynamic resource scheduling must then be performed according to the predicted values. Current prediction methods mainly use simple regression models to predict future resource demand. However, these methods are easily affected by the subjective factors of the forecaster, place relatively high demands on the quality of historical data, and have low prediction accuracy.
On this basis, a task scheduling method is urgently needed to solve the problem in the prior art that scheduling accuracy is low because future resource demand is predicted with low accuracy during task scheduling.
Summary of the Invention
Embodiments of the present invention provide a task scheduling method and device to solve the technical problem in the prior art that scheduling accuracy is low because future resource demand is predicted with low accuracy during task scheduling.
An embodiment of the present invention provides a task scheduling method, the method comprising:
obtaining resource usage data of each node in a current period;
using the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model, and predicting the resource usage data of each node in the next period, wherein the bidirectional recurrent neural network model is obtained by training on the resource usage data of each node in historical periods, a historical period being a period before the current period; and
scheduling the tasks in each node according to the resource usage data of each node in the next period.
In this way, the embodiment of the present invention uses the resource usage data of the current period as the input of the bidirectional recurrent neural network model to predict the resource usage data of each node in the next period, and the tasks in each node can then be scheduled accordingly. Using a bidirectional recurrent neural network model allows the resource usage data of each node in the next period to be predicted more accurately, thereby improving the accuracy with which the tasks in each node are scheduled. Further, the embodiment of the present invention can automatically monitor, collect, and process data, and perform learning, analysis, and prediction through the bidirectional recurrent neural network model; the whole process is intelligent and requires no manual intervention. Furthermore, in the embodiment of the present invention, the thresholds on which scheduling is based can be flexibly adjusted according to the resource usage data of each node in the current period and the resource usage data of the next period, so that task scheduling is more flexible and more reasonable, and tasks can be scheduled in a timely, accurate, and dynamic manner.
In a possible implementation, the resource usage data includes at least one resource usage amount and a resource usage status, the resource usage status being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is obtained by training on the at least one resource usage amount and the resource usage status of each node in the historical periods.
In a possible implementation, the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput.
In a possible implementation, training the bidirectional recurrent neural network model on the at least one resource usage amount and the resource usage status of each node in the historical periods includes:
obtaining at least one resource usage amount and the resource usage status of each node in any historical period;
using the at least one resource usage amount and the resource usage status of each node in a first historical period as input parameters of a training sample, and using the at least one resource usage amount and the resource usage status of each node in a second historical period as output parameters of the training sample, the first historical period being the period immediately preceding the second historical period; and
training a bidirectional recurrent neural network model using the training samples to obtain the bidirectional recurrent neural network model.
In a possible implementation, the bidirectional recurrent neural network model includes forward neural network layers and backward neural network layers;
the method further includes:
receiving a model update instruction from a user; and
modifying the number of forward neural network layers and/or backward neural network layers in the bidirectional recurrent neural network model according to the model update instruction.
An embodiment of the present invention provides a task scheduling device, the device comprising:
a collection unit, configured to obtain resource usage data of each node in a current period;
a processing unit, configured to use the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period, wherein the bidirectional recurrent neural network model is obtained by training on the resource usage data of each node in historical periods, a historical period being a period before the current period; and
a scheduling unit, configured to schedule the tasks in each node according to the resource usage data of each node in the next period.
In a possible implementation, the resource usage data includes at least one resource usage amount and a resource usage status, the resource usage status being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is obtained by training on the at least one resource usage amount and the resource usage status of each node in the historical periods.
In a possible implementation, the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput.
In a possible implementation, the processing unit is specifically configured to:
obtain at least one resource usage amount and the resource usage status of each node in any historical period;
use the at least one resource usage amount and the resource usage status of each node in a first historical period as input parameters of a training sample, and use the at least one resource usage amount and the resource usage status of each node in a second historical period as output parameters of the training sample, the first historical period being the period immediately preceding the second historical period; and
train a bidirectional recurrent neural network model using the training samples to obtain the bidirectional recurrent neural network model.
In a possible implementation, the bidirectional recurrent neural network model includes forward neural network layers and backward neural network layers;
the collection unit is further configured to receive a model update instruction from a user; and
the processing unit is further configured to modify the number of forward neural network layers and/or backward neural network layers in the bidirectional recurrent neural network model according to the model update instruction.
An embodiment of the present application further provides a device having the function of implementing the task scheduling method described above. This function may be implemented by hardware executing corresponding software. In a possible design, the device includes a processor, a transceiver, and a memory; the memory is configured to store computer-executable instructions, the transceiver is configured to enable the device to communicate with other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, so that the device performs the task scheduling method described above.
An embodiment of the present invention further provides a computer storage medium that stores a software program. When the software program is read and executed by one or more processors, it implements the task scheduling method described in the various possible implementations above.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the task scheduling method described in the various possible implementations above.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Figure 1 is a schematic diagram of a system architecture to which an embodiment of the present invention is applicable;
Figure 2 is a schematic flowchart corresponding to a task scheduling method provided by an embodiment of the present invention;
Figure 3 is a schematic structural diagram of a bidirectional recurrent neural network model provided by an embodiment of the present invention;
Figure 4 is a schematic structural diagram of a task scheduling device provided by an embodiment of the present invention.
Detailed Description of Embodiments
The present application is described in detail below with reference to the accompanying drawings; the specific operations in the method embodiments can also be applied to the device embodiments.
Figure 1 exemplarily shows a schematic diagram of a system architecture to which an embodiment of the present invention is applicable. As shown in Figure 1, the system 100 to which the embodiment of the present invention is applicable includes a scheduling device 101 and at least one node, for example node 1021, node 1022, and node 1023 shown in Figure 1. The scheduling device 101 may connect to each node through a server interface (API Server); for example, as shown in Figure 1, the scheduling device 101 may connect to node 1021 through the API Server, to node 1022 through the API Server, and to node 1023 through the API Server.
Further, taking the case where the system 100 is a Kubernetes system as an example, the scheduling device 101 may include a management module 1011, a resource monitoring module 1012, a deep analysis module 1013, and a resource scheduling module 1014. Each node may include a Kubelet component and at least one Pod; for example, node 1021 may include Kubelet component 1211, Pod 1212, and Pod 1213; node 1022 may include Kubelet component 1221, Pod 1222, and Pod 1223; and node 1023 may include Kubelet component 1231, Pod 1232, and Pod 1233.
Specifically, the management module 1011 may be responsible for creating Pods in the nodes; for example, the management module 1011 may create Pod 1212 and Pod 1213 in node 1021, Pod 1222 and Pod 1223 in node 1022, and Pod 1232 and Pod 1233 in node 1023.
The resource monitoring module 1012 may be used to collect resource usage data from each node, aggregate it with the Pod as the unit, and provide it to the deep analysis module 1013.
The deep analysis module 1013 may train a deep learning model on the resource usage data provided by the resource monitoring module 1012 and predict the resource usage data for a subsequent period of time.
Further, as shown in Figure 1, the deep analysis module 1013 may include a data processing module 1131, a deep prediction module 1132, and a model update module 1133. The data processing module 1131 may be used to preprocess the resource usage data collected by the resource monitoring module 1012; the deep prediction module 1132 may feed the processing results of the data processing module 1131 into the deep learning model for learning and predict future resource usage data; and the model update module 1133 may receive a model update instruction from an administrator over the network and update the model structure as required.
The resource scheduling module 1014 may generate a scheduling policy according to the prediction results of the deep analysis module 1013 and deliver it to each node.
Further, as shown in Figure 1, the resource scheduling module 1014 may include a policy generation module 1141 and a policy delivery module 1142. The policy generation module 1141 may generate a corresponding scheduling policy according to the processing results of the deep analysis module 1013, and the policy delivery module 1142 may deliver the policy generated by the policy generation module 1141 to each node, thereby achieving reasonable scheduling of system resources.
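The module split above can be illustrated with a minimal, self-contained Python sketch. The class names (ResourceMonitor, DeepAnalyzer, ResourceScheduler), the metric fields, and the hard-coded sample values are hypothetical stand-ins for modules 1012-1014 and are not taken from the patent; the sketch only shows how collected usage data would flow from monitoring through prediction to policy generation.
```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class NodeMetrics:
    """Per-node usage for one period (fractions of capacity); field names are illustrative."""
    memory: float
    cpu: float
    disk: float
    io: float

class ResourceMonitor:
    """Stand-in for the resource monitoring module 1012: collects current-period usage."""
    def collect(self) -> Dict[str, NodeMetrics]:
        # A real deployment would query the cluster; fixed values are used purely for illustration.
        return {"node-1021": NodeMetrics(0.45, 0.80, 0.90, 0.60),
                "node-1022": NodeMetrics(0.60, 1.00, 0.75, 0.90)}

class DeepAnalyzer:
    """Stand-in for the deep analysis module 1013: the trained model would run here."""
    def predict_next_period(self, current: Dict[str, NodeMetrics]) -> Dict[str, NodeMetrics]:
        return current  # placeholder for the bidirectional RNN prediction

class ResourceScheduler:
    """Stand-in for the resource scheduling module 1014: turns predictions into a policy."""
    def make_policy(self, predicted: Dict[str, NodeMetrics]) -> List[str]:
        return [f"rebalance {name}" for name, m in predicted.items() if m.cpu > 0.9]

if __name__ == "__main__":
    current = ResourceMonitor().collect()
    predicted = DeepAnalyzer().predict_next_period(current)
    print(ResourceScheduler().make_policy(predicted))   # ['rebalance node-1022']
```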
Based on the system architecture shown in Figure 1, Figure 2 exemplarily shows a schematic flowchart corresponding to a task scheduling method provided by an embodiment of the present invention, which includes the following steps:
Step 201: obtain the resource usage data of each node in the current period.
Step 202: use the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period.
Step 203: schedule the tasks in each node according to the resource usage data of each node in the next period.
In this way, the embodiment of the present invention uses the resource usage data of the current period as the input of the bidirectional recurrent neural network model to predict the resource usage data of each node in the next period, and the tasks in each node can then be scheduled accordingly. Using a bidirectional recurrent neural network model allows the resource usage data of each node in the next period to be predicted more accurately, thereby improving the accuracy with which the tasks in each node are scheduled. Further, the embodiment of the present invention can automatically monitor, collect, and process data, and perform learning, analysis, and prediction through the bidirectional recurrent neural network model; the whole process is intelligent and requires no manual intervention. Furthermore, in the embodiment of the present invention, the thresholds on which scheduling is based can be flexibly adjusted according to the resource usage data of each node in the current period and the resource usage data of the next period, so that task scheduling is more flexible and more reasonable, and tasks can be scheduled in a timely, accurate, and dynamic manner.
Specifically, the resource usage data may include at least one resource usage amount and a resource usage status.
The resource usage amount may be memory usage, CPU usage, disk usage, IO throughput, and the like, which is not specifically limited. Further, as can be seen from Figure 1, each node may contain multiple Pods. When obtaining the resource usage of each node, the resource usage of each Pod in the node may be obtained first, and the resource usage of the node may then be determined from the resource usage of its Pods.
Table 1 shows an example of resource usage in an embodiment of the present invention. Node A includes Pod A-1 and Pod A-2; Pod A-1 has a memory usage of 20%, a CPU usage of 50%, a disk usage of 10%, and an IO throughput of 30%, while Pod A-2 has a memory usage of 25%, a CPU usage of 30%, a disk usage of 80%, and an IO throughput of 30%. It follows that Node A has a memory usage of 45%, a CPU usage of 80%, a disk usage of 90%, and an IO throughput of 60%. Node B includes Pod B-1 and Pod B-2; Pod B-1 has a memory usage of 35%, a CPU usage of 55%, a disk usage of 20%, and an IO throughput of 30%, while Pod B-2 has a memory usage of 25%, a CPU usage of 45%, a disk usage of 55%, and an IO throughput of 60%. It follows that Node B has a memory usage of 60%, a CPU usage of 100%, a disk usage of 75%, and an IO throughput of 90%.
Table 1: An example of resource usage in an embodiment of the present invention
| Node | Pod | Memory usage | CPU usage | Disk usage | IO throughput |
|---|---|---|---|---|---|
| Node A | Pod A-1 | 20% | 50% | 10% | 30% |
| Node A | Pod A-2 | 25% | 30% | 80% | 30% |
| Node A | Total | 45% | 80% | 90% | 60% |
| Node B | Pod B-1 | 35% | 55% | 20% | 30% |
| Node B | Pod B-2 | 25% | 45% | 55% | 60% |
| Node B | Total | 60% | 100% | 75% | 90% |
It should be noted that the method of calculating a node's resource usage shown above is only an example; in other possible implementations, a node's resource usage may also be calculated from the resource usage of each Pod and the weight of each Pod, which will not be described in detail here.
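A minimal sketch of the Pod-to-node aggregation described above; the function name, the dictionary layout of the metrics, and the unit weights are illustrative assumptions, and the example simply reproduces Node A from Table 1 by summing its two Pods.
```python
from typing import Dict, List, Optional

# Each Pod's usage is a dict such as {"memory": 20, "cpu": 50, "disk": 10, "io": 30} (percent).
def aggregate_node_usage(pod_usages: List[Dict[str, float]],
                         weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Sum (or weight) Pod usage into node usage, as described above."""
    if weights is None:
        weights = [1.0] * len(pod_usages)
    node: Dict[str, float] = {}
    for pod, w in zip(pod_usages, weights):
        for metric, value in pod.items():
            node[metric] = node.get(metric, 0.0) + w * value
    return node

# Reproduces Node A from Table 1: Pod A-1 + Pod A-2 with unit weights.
pod_a1 = {"memory": 20, "cpu": 50, "disk": 10, "io": 30}
pod_a2 = {"memory": 25, "cpu": 30, "disk": 80, "io": 30}
print(aggregate_node_usage([pod_a1, pod_a2]))
# {'memory': 45.0, 'cpu': 80.0, 'disk': 90.0, 'io': 60.0}
```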
In the embodiment of the present invention, the resource usage status may be determined according to at least one resource usage amount. Specifically, taking the resource usage of Node A shown in Table 1 as an example, the resource usage status of Node A may be determined from Node A's memory usage, CPU usage, disk usage, and IO throughput. For example, when calculating the resource usage status of Node A, it can be judged whether the number of Node A's resource usage amounts that exceed a usage threshold is greater than a count threshold; if so, the resource usage status of Node A can be determined to be a high-load state, and otherwise it can be determined to be a low-load state. As another example, when calculating the resource usage status of Node A, the load of Node A may first be determined from Node A's memory usage, CPU usage, disk usage, and IO throughput, and it may then be judged whether the load of Node A exceeds a load threshold; if so, the resource usage status of Node A can be determined to be a high-load state, and otherwise it can be determined to be a low-load state.
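The two ways of deriving the resource usage status can be sketched as follows; the concrete thresholds, the use of a plain average as the aggregate load, and the "high"/"low" labels are assumptions made for illustration, since the embodiment does not fix them.
```python
from typing import Dict

def state_by_count(usage: Dict[str, float], usage_threshold: float = 70.0,
                   count_threshold: int = 2) -> str:
    """First variant: high load if more than count_threshold metrics exceed the usage threshold."""
    exceeded = sum(1 for v in usage.values() if v > usage_threshold)
    return "high" if exceeded > count_threshold else "low"

def state_by_load(usage: Dict[str, float], load_threshold: float = 75.0) -> str:
    """Second variant: derive one load figure (here a plain average) and compare it to a threshold."""
    load = sum(usage.values()) / len(usage)
    return "high" if load > load_threshold else "low"

node_b = {"memory": 60, "cpu": 100, "disk": 75, "io": 90}
print(state_by_count(node_b), state_by_load(node_b))  # high high
```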
Before step 202 is performed, the embodiment of the present invention may first preprocess the resource usage data of each node. The preprocessing may include two steps: format conversion and dimension reconstruction. Format conversion refers to converting the obtained resource usage data into a format that the neural network model can recognize. The conversion differs according to the type of the resource usage data: for example, Boolean data fields may be converted into binary values as the input data format; alternatively, text data fields may be converted with the Bag of Words (BOW) method and used as the input data format; or numeric data fields may keep their original type as the input data format.
Further, dimension reconstruction may be performed on the format-converted data, so as to construct dimensions that meet the input requirements of the deep learning model.
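A sketch of this preprocessing step, assuming NumPy is available; the tiny vocabulary used for the bag-of-words conversion, the field ordering, and the (batch, time_steps, features) target shape are illustrative choices rather than requirements of the embodiment.
```python
import numpy as np

def convert_field(value):
    """Format conversion as sketched above: booleans -> 0/1, numbers kept as-is,
    text -> a toy bag-of-words count vector over a small assumed vocabulary."""
    vocab = ["high", "low"]                      # assumed vocabulary, illustration only
    if isinstance(value, bool):
        return [1.0 if value else 0.0]
    if isinstance(value, (int, float)):
        return [float(value)]
    if isinstance(value, str):
        tokens = value.split()
        return [float(tokens.count(w)) for w in vocab]
    raise TypeError(f"unsupported field type: {type(value)}")

def reconstruct_dimensions(periods):
    """Dimension reconstruction: flatten each period's fields and stack them into the
    (batch, time_steps, features) shape that a recurrent model expects."""
    rows = [[x for field in period for x in convert_field(field)] for period in periods]
    return np.asarray(rows, dtype=np.float32)[np.newaxis, :, :]

# One node observed over three periods: memory, cpu, disk, io (percent) plus a status string.
history = [(45, 80, 90, 60, "low"), (60, 100, 75, 90, "high"), (50, 70, 65, 55, "low")]
print(reconstruct_dimensions(history).shape)  # (1, 3, 6)
```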
In step 202, Figure 3 shows a schematic structural diagram of a bidirectional recurrent neural network model provided by an embodiment of the present invention. The bidirectional recurrent neural network model may contain one input layer, three forward neural network layers, three backward neural network layers, two fully connected layers, and one output layer. During computation, the preprocessed data is first fed into the input layer; after the input layer has processed it, the result is fed simultaneously into the forward neural network layers and the backward neural network layers; after the forward and backward layers have finished computing, their results are merged and fed into the fully connected layers; and after the fully connected layers have processed the data, the prediction is produced through the output layer.
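Assuming the model is built with TensorFlow/Keras, the layer arrangement of Figure 3 (one input layer, three forward and three backward recurrent layers whose outputs are merged, two fully connected layers, and one output layer) could look roughly like the sketch below; the LSTM cell type, layer widths, optimizer, loss, and shapes are assumptions not specified in the patent.
```python
import tensorflow as tf

def build_birnn(time_steps: int, features: int, outputs: int) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(time_steps, features))             # 1 input layer
    fwd = inputs
    for i in range(3):                                                # 3 forward recurrent layers
        fwd = tf.keras.layers.LSTM(64, return_sequences=(i < 2))(fwd)
    # 3 backward recurrent layers: the same sequence read in reverse time order
    bwd = tf.keras.layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(inputs)
    for i in range(3):
        bwd = tf.keras.layers.LSTM(64, return_sequences=(i < 2))(bwd)
    merged = tf.keras.layers.concatenate([fwd, bwd])                  # merge both directions
    x = tf.keras.layers.Dense(128, activation="relu")(merged)         # 2 fully connected layers
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    out = tf.keras.layers.Dense(outputs)(x)                           # 1 output layer
    return tf.keras.Model(inputs, out)

# Assumed shapes: 12 time steps per period, 6 features per node, 6 predicted values.
model = build_birnn(time_steps=12, features=6, outputs=6)
model.compile(optimizer="adam", loss="mse")
model.summary()
```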
Specifically, the bidirectional recurrent neural network model may be obtained by training on the resource usage data of each node in historical periods, where a historical period may be a period before the current period. Further, the bidirectional recurrent neural network model is obtained by training on at least one resource usage amount and the resource usage status of each node in the historical periods.
Further, when training the bidirectional recurrent neural network model, the resource usage data may be preprocessed and assembled into time series as training samples. Specifically, at least one resource usage amount and the resource usage status of each node in any historical period may first be obtained; then, the at least one resource usage amount and the resource usage status of each node in a first historical period are used as the input parameters of a training sample, and the at least one resource usage amount and the resource usage status of each node in a second historical period are used as the output parameters of the training sample, where the first historical period is the period immediately preceding the second historical period; finally, the training samples can be used to train a bidirectional recurrent neural network model, thereby obtaining the bidirectional recurrent neural network model.
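A small sketch of how the training pairs described above could be assembled: each period's features form the sample input and the immediately following period's features form the sample output. The example history values and the encoding of the usage status as a 0/1 flag are illustrative assumptions.
```python
import numpy as np

def make_training_pairs(period_features: np.ndarray):
    """Pair each period with the next one: period t is the sample input,
    period t+1 is the sample output, exactly as described above."""
    x = period_features[:-1]   # first historical periods
    y = period_features[1:]    # the immediately following periods
    return x, y

# Six consecutive periods of one node: memory, cpu, disk, io, status flag (1 = high load).
history = np.array([[45, 80, 90, 60, 0],
                    [60, 100, 75, 90, 1],
                    [50, 70, 65, 55, 0],
                    [55, 75, 70, 58, 0],
                    [65, 95, 80, 85, 1],
                    [48, 72, 60, 50, 0]], dtype=np.float32)

x, y = make_training_pairs(history)
print(x.shape, y.shape)   # (5, 5) (5, 5): five input/output pairs of five features each
# For the recurrent model sketched earlier, x would additionally be reshaped to
# (samples, time_steps, features) before training.
```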
In step 203, a corresponding scheduling policy can be generated according to the resource usage data of each node in the next period, and the Pods in each node can then be scheduled according to the scheduling policy, thereby achieving reasonable resource scheduling.
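The patent does not spell out the policy-generation rule, so the following is only a toy illustration: nodes whose predicted CPU usage exceeds an assumed threshold are paired with the least-loaded node as a candidate target for rescheduling their Pods.
```python
from typing import Dict, List, Tuple

def make_policy(predicted_cpu: Dict[str, float], threshold: float = 90.0) -> List[Tuple[str, str]]:
    """Toy policy generator: for every node whose predicted CPU usage exceeds the threshold,
    propose moving work to the node with the lowest predicted CPU usage."""
    coolest = min(predicted_cpu, key=predicted_cpu.get)
    return [(hot, coolest) for hot, cpu in predicted_cpu.items()
            if cpu > threshold and hot != coolest]

predicted = {"node-1021": 95.0, "node-1022": 40.0, "node-1023": 92.0}
print(make_policy(predicted))   # [('node-1021', 'node-1022'), ('node-1023', 'node-1022')]
```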
Considering that the accuracy of the bidirectional recurrent neural network model may decline as the amount of data keeps growing, in the embodiment of the present invention the accuracy of the bidirectional recurrent neural network model can also be improved by updating the model. Specifically, a model update instruction from a user may first be received, and the number of forward and/or backward neural network layers in the bidirectional recurrent neural network model may then be modified according to the model update instruction. In this way, when the accuracy of the bidirectional recurrent neural network model declines, the number of model layers can be increased, the choice of optimizer can be changed, and the learning rate can be lowered.
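Under the same TensorFlow/Keras assumption as above, handling a model update instruction could amount to rebuilding the network with a different depth per direction, switching the optimizer, and lowering the learning rate; the specific depth, the SGD optimizer, the layer sizes, and the learning-rate value below are illustrative only.
```python
import tensorflow as tf

def rebuild_for_update(num_recurrent_layers: int, learning_rate: float) -> tf.keras.Model:
    """Rebuild the model after a model update instruction: change the number of recurrent
    layers per direction, switch the optimizer, and lower the learning rate."""
    inputs = tf.keras.Input(shape=(12, 6))
    x = inputs
    for i in range(num_recurrent_layers):
        last = i == num_recurrent_layers - 1
        # Bidirectional wraps one forward and one backward LSTM per layer.
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(64, return_sequences=not last))(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(6)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate), loss="mse")
    return model

# An update instruction might, for example, raise the depth from 3 to 4 layers per
# direction and cut the learning rate.
updated_model = rebuild_for_update(num_recurrent_layers=4, learning_rate=1e-4)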
Based on the same concept, an embodiment of the present invention provides a task scheduling device. As shown in Figure 4, the device includes a collection unit 401, a processing unit 402, and a scheduling unit 403, wherein:
the collection unit 401 is configured to obtain resource usage data of each node in a current period;
the processing unit 402 is configured to use the resource usage data of each node in the current period as input to a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period, wherein the bidirectional recurrent neural network model is obtained by training on the resource usage data of each node in historical periods, a historical period being a period before the current period; and
the scheduling unit 403 is configured to schedule the tasks in each node according to the resource usage data of each node in the next period.
In a possible implementation, the resource usage data includes at least one resource usage amount and a resource usage status, the resource usage status being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is obtained by training on the at least one resource usage amount and the resource usage status of each node in the historical periods.
In a possible implementation, the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput.
In a possible implementation, the processing unit 402 is specifically configured to:
obtain at least one resource usage amount and the resource usage status of each node in any historical period;
use the at least one resource usage amount and the resource usage status of each node in a first historical period as input parameters of a training sample, and use the at least one resource usage amount and the resource usage status of each node in a second historical period as output parameters of the training sample, the first historical period being the period immediately preceding the second historical period; and
train a bidirectional recurrent neural network model using the training samples to obtain the bidirectional recurrent neural network model.
In a possible implementation, the bidirectional recurrent neural network model includes forward neural network layers and backward neural network layers;
the collection unit 401 is further configured to receive a model update instruction from a user; and
the processing unit 402 is further configured to modify the number of forward neural network layers and/or backward neural network layers in the bidirectional recurrent neural network model according to the model update instruction.
An embodiment of the present application further provides a device having the function of implementing the task scheduling method described above. This function may be implemented by hardware executing corresponding software. In a possible design, the device includes a processor, a transceiver, and a memory; the memory is configured to store computer-executable instructions, the transceiver is configured to enable the device to communicate with other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, so that the device performs the task scheduling method described above.
An embodiment of the present invention further provides a computer storage medium that stores a software program. When the software program is read and executed by one or more processors, it implements the task scheduling method described in the various possible implementations above.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the task scheduling method described in the various possible implementations above.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments as well as all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these changes and variations.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811376902.XA CN111198754B (en) | 2018-11-19 | 2018-11-19 | Task scheduling method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811376902.XA CN111198754B (en) | 2018-11-19 | 2018-11-19 | Task scheduling method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111198754A CN111198754A (en) | 2020-05-26 |
| CN111198754B true CN111198754B (en) | 2023-07-14 |
Family
ID=70743876
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811376902.XA (CN111198754B, active) | Task scheduling method and device | 2018-11-19 | 2018-11-19 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111198754B (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112000450B (en) * | 2020-08-18 | 2025-03-25 | 中国银联股份有限公司 | Neural network architecture search method and device |
| CN114327807A (en) * | 2020-11-25 | 2022-04-12 | 中科聚信信息技术(北京)有限公司 | Adaptive task scheduling method and device for distributed rule engine and electronic equipment |
| CN112667398B (en) * | 2020-12-28 | 2023-09-01 | 北京奇艺世纪科技有限公司 | Resource scheduling method and device, electronic equipment and storage medium |
| CN114816690B (en) * | 2021-01-29 | 2024-11-15 | 中移(苏州)软件技术有限公司 | A task allocation method, device, equipment and storage medium |
| CN115604273A (en) * | 2021-06-28 | 2023-01-13 | 伊姆西Ip控股有限责任公司(Us) | Method, apparatus and program product for managing a computing system |
| US11868812B2 (en) | 2021-08-12 | 2024-01-09 | International Business Machines Corporation | Predictive scaling of container orchestration platforms |
| CN114528098B (en) * | 2022-01-25 | 2025-02-14 | 华南理工大学 | An automatic scaling method for cloud platforms based on fixed-length service queuing model |
| CN116800843A (en) * | 2022-03-14 | 2023-09-22 | 中国移动通信集团青海有限公司 | Resource processing methods, devices, equipment and storage media based on Kafka |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103699440A (en) * | 2012-09-27 | 2014-04-02 | 北京搜狐新媒体信息技术有限公司 | Method and device for cloud computing platform system to distribute resources to task |
| CN107613030A (en) * | 2017-11-06 | 2018-01-19 | 网宿科技股份有限公司 | A method and system for processing service requests |
| CN107734035A (en) * | 2017-10-17 | 2018-02-23 | 华南理工大学 | A kind of Virtual Cluster automatic telescopic method under cloud computing environment |
| CN107729126A (en) * | 2016-08-12 | 2018-02-23 | 中国移动通信集团浙江有限公司 | A kind of method for scheduling task and device of container cloud |
| WO2018168521A1 (en) * | 2017-03-14 | 2018-09-20 | Omron Corporation | Learning result identifying apparatus, learning result identifying method, and program therefor |
| CN108563755A (en) * | 2018-04-16 | 2018-09-21 | 辽宁工程技术大学 | A kind of personalized recommendation system and method based on bidirectional circulating neural network |
- 2018-11-19: Application CN201811376902.XA filed in China; granted as patent CN111198754B (status: active)
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103699440A (en) * | 2012-09-27 | 2014-04-02 | 北京搜狐新媒体信息技术有限公司 | Method and device for cloud computing platform system to distribute resources to task |
| CN107729126A (en) * | 2016-08-12 | 2018-02-23 | 中国移动通信集团浙江有限公司 | A kind of method for scheduling task and device of container cloud |
| WO2018168521A1 (en) * | 2017-03-14 | 2018-09-20 | Omron Corporation | Learning result identifying apparatus, learning result identifying method, and program therefor |
| CN107734035A (en) * | 2017-10-17 | 2018-02-23 | 华南理工大学 | A kind of Virtual Cluster automatic telescopic method under cloud computing environment |
| CN107613030A (en) * | 2017-11-06 | 2018-01-19 | 网宿科技股份有限公司 | A method and system for processing service requests |
| CN108563755A (en) * | 2018-04-16 | 2018-09-21 | 辽宁工程技术大学 | A kind of personalized recommendation system and method based on bidirectional circulating neural network |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111198754A (en) | 2020-05-26 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |