
CN119356828B - Computing power arrangement method, device, equipment and storage medium - Google Patents


Info

Publication number
CN119356828B
CN119356828B (application CN202411918639.8A)
Authority
CN
China
Prior art keywords
contention
tasks
accelerator
task
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411918639.8A
Other languages
Chinese (zh)
Other versions
CN119356828A (en)
Inventor
刘杨
翟福民
原超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jincheng Digital Government Service Center
Beijing University of Posts and Telecommunications
Original Assignee
Jincheng Digital Government Service Center
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jincheng Digital Government Service Center, Beijing University of Posts and Telecommunications filed Critical Jincheng Digital Government Service Center
Priority to CN202411918639.8A priority Critical patent/CN119356828B/en
Publication of CN119356828A publication Critical patent/CN119356828A/en
Application granted granted Critical
Publication of CN119356828B publication Critical patent/CN119356828B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)
  • Debugging And Monitoring (AREA)

Abstract


The present application provides a computing power orchestration method, device, equipment and storage medium. It relates to the field of computer data processing technology. The method includes: virtualizing accelerator resources to achieve flexible and scalable task allocation; using an adaptive flow control mechanism to detect and manage contention between tasks; and coordinating resource allocation and contention management through an integrated runtime system. The present application ensures the achievement of service level agreements (SLAs) and performance stability by integrating adaptive accelerator allocation and dynamic contention management functions. Combined with virtualized accelerator orchestration and real-time contention regulation, accelerator and memory resources can be flexibly allocated, DNNs can be effectively deployed between multiple applications, and their efficient and reliable execution can be guaranteed.

Description

Computing power arrangement method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer data processing technologies, and in particular, to a computing power arrangement method, device, apparatus, and storage medium.
Background
With the rapid development of DNN applications in multiple fields, including edge computing and data centers, there is an increasing demand for parallel, efficient execution of multiple DNN workloads on a shared hardware platform. Existing solutions either suffer from inefficiency by statically partitioning resources or fail to cope with performance fluctuations caused by resource contention. There is a need for an integrated approach to optimize resource utilization and maintain performance stability while meeting quality of service requirements (QoS) for co-resident DNN tasks.
Disclosure of Invention
The present application provides a computing power orchestration method, apparatus, device and storage medium that ensure the achievement of Service Level Agreements (SLAs) and performance stability by integrating adaptive accelerator allocation and dynamic contention management. Combined with virtualized accelerator orchestration and real-time contention adjustment, accelerator and memory resources can be flexibly allocated, DNNs can be effectively deployed across multiple applications, and their efficient and reliable execution is guaranteed.
In a first aspect, the present application provides a computing power orchestration method, comprising:
Virtualizing accelerator resources to achieve flexible, extensible task allocation;
detecting and managing contention between tasks using an adaptive flow control mechanism;
coordinating resource allocation and contention management through an integrated runtime system.
Preferably, virtualizing accelerator resources to enable flexible, extensible task allocation includes:
Abstracting the physical accelerator resources into a virtual resource pool which is flexibly allocated, so that a plurality of tasks are not limited by a fixed physical accelerator, and thus, the resources are dynamically acquired according to real-time requirements;
Through a virtual interface, each task does not need to directly access a specific physical accelerator when requesting resources, but maps the task to a proper virtual accelerator according to the requirements of the current workload and the system resource state, wherein the virtual interface is used for hiding the specific position and state of the physical accelerator and simplifying the interaction between the task and the accelerator;
and dynamically distributing virtual accelerators for each task according to the priority, delay requirement and real-time resource condition of the task.
Preferably, the use of an adaptive flow control mechanism to detect and manage contention between tasks includes:
detecting resource contention points in real time according to the memory bandwidth of each task, the contention conditions of the accelerator computing unit and the cache resources;
According to the resource contention point, automatically adjusting the memory access rate or execution rate of the task so as to realize contention management, increasing resource allocation for high-priority tasks, and limiting the access rate of low-priority tasks or non-critical tasks so as to ensure the QoS of critical tasks;
and dynamically adjusting the memory access frequency of each task according to the real-time feedback of the task through the self-adaptive flow control mechanism, thereby guaranteeing the performance stability of the task and avoiding excessive contention of resources, wherein the real-time feedback comprises delay and bandwidth demand change.
Preferably, coordinating resource allocation and contention management by an integrated runtime system includes:
the resource allocation strategy is dynamically adjusted by the runtime system according to the task load, the system resource state and the contention condition, so that the resource optimization in the multi-tenant environment is realized;
The resource allocation strategy comprises two modes, a balanced mode and a guaranteed mode, wherein the balanced mode aims at fairness when no special priority requirement exists, balancing resource allocation among tasks so that the SLAs of all tasks are basically met, and the guaranteed mode preferentially allocates resources to critical tasks that must meet specific QoS targets while appropriately rate-limiting non-critical tasks;
when the system runs, after detecting the use change or the contention condition of the resources, the system immediately responds to adjust the resource allocation strategy and the flow control, realizes the dynamic adaptation to the workload and the contention through an efficient feedback and response mechanism, and optimizes the system performance in a multi-tenant environment.
In a second aspect, the present application provides a computing power orchestration device, the device comprising:
the virtualized accelerator management unit is used for flexibly distributing resources;
The contention monitoring and adjusting engine is used for adjusting the memory access rate in real time;
an adaptive runtime system for synchronizing resource allocation and contention policies according to workload demands.
Preferably, the virtualized accelerator management unit is further configured to abstract physical accelerator resources in the SoC, create a set of flexibly allocated virtual accelerators, so that multiple DNN tasks can share the same accelerator resource without binding to a fixed physical accelerator, and the virtualized accelerator management unit includes:
the dynamic resource pool management module is used for monitoring the resource state of the current system, mapping the task to a proper virtual accelerator according to the real-time requirement of the task, and realizing the dynamic management and allocation of the resource pool;
the task priority and demand matching module is used for dynamically adjusting the distributed virtual resources according to the priority of the task, the real-time bandwidth demand and the delay target;
A virtualization interface for enabling each task to directly access virtual accelerator resources without binding with a specific physical accelerator.
Preferably, the contention monitoring and adjustment engine comprises:
the contention monitoring module is used for monitoring the occupation condition of each task on system resources in real time, tracking the memory access mode and the bandwidth use condition of the task by using a counter and a register, and identifying a contention area possibly existing in the system;
the dynamic contention detection module is used for detecting and identifying the contention condition when the contention is aggravated due to the increase of the resource demand of a certain task or the change of the system load, and timely reporting the current contention state to the self-adaptive runtime system through signal transmission so as to trigger corresponding adjustment measures;
and the contention adjustment module is used for dynamically adjusting resources based on the feedback results of the contention monitoring module: increasing the memory access rate or priority of critical tasks, limiting the bandwidth use of low-priority tasks, and reducing the negative impact of contention on system performance by adjusting the memory access rate and delay of tasks.
Preferably, the adaptive runtime system comprises:
the task scheduling module is used for allocating resources according to the different priorities and resource requirements of tasks based on a task scheduling strategy, wherein the task scheduling strategy comprises a guarantee strategy and a balance strategy: the guarantee strategy preferentially guarantees the resource requirements of high-priority or low-delay tasks and, when system resources are tight, appropriately limits the resource use of low-priority tasks; the balance strategy adopts fair allocation to balance resource use among ordinary tasks without strict QoS requirements;
the real-time response module is used for adjusting the current resource allocation and contention management strategy when the use state of the system resources changes;
And the feedback and optimization module is used for optimizing a future resource allocation scheme according to the delay feedback of the task and the resource use mode so as to improve the response speed and the resource utilization efficiency of the system.
In a third aspect, an embodiment of the present application provides an electronic device, including at least one processor and a memory, where the memory stores computer-executable instructions, and where the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the computing power orchestration method according to the first aspect and the various possible designs of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the computing power orchestration method as described above in the first aspect and its various possible designs.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements the computing power orchestration method as described above in the first aspect and its various possible designs.
The computing power orchestration method, device, equipment and storage medium provided herein can deploy scalable accelerators for multi-tenant workloads; by co-designing the accelerator software and hardware stacks, virtualized accelerator orchestration is supported, allowing current workloads to be adaptively bound to available accelerators.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a block diagram of an integrated DNN multi-tenant execution system provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for arranging computing power according to an embodiment of the present application;
FIG. 3 is a flow chart of accelerator resource virtualization provided by an embodiment of the present application;
FIG. 4 is a flow chart of an adaptive flow control mechanism for managing contention among tasks provided by an embodiment of the present application;
FIG. 5 is a flow chart of coordinating resource allocation and contention management by an integrated runtime system provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computing power orchestration device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
In the technical scheme of the application, the related processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the information such as financial data or user data are in accordance with the regulations of related laws and regulations, and the public welfare is not violated.
It should be noted that, in the embodiments of the present application, some existing solutions in the industry such as software, components, models, etc. may be mentioned, and they should be regarded as exemplary, only for illustrating the feasibility of implementing the technical solution of the present application, but it does not mean that the applicant has or must not use the solution.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Computing power orchestration refers to the process of scheduling and managing computing power resources: through intelligent means, computing power resources are dynamically allocated and adjusted according to user requirements to ensure efficient utilization and optimized cost. Computing power orchestration is an important component of a computing power network, realizing flexible management and optimization of computing power resources through orchestration and scheduling.
In queue-based, first-come-first-serve accelerator scheduling, a task thread requests its target number of accelerators by directly binding accelerators to meet its QoS requirements. When a scheduling conflict occurs, i.e., fewer accelerators are available in the system, the scheduler attempts to take an accelerator from the thread that will finish blocking on its current layer earliest, and execution begins after synchronization and adjustment of accelerator affinity. Because physical accelerators are fixedly bound to user threads, significant overhead is incurred when reassigning an accelerator.
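For illustration, the queue-based, first-come-first-serve binding described above can be sketched as a minimal model (a hedged sketch; class and method names are illustrative, not taken from the patent):

```python
from collections import deque

class FcfsBinder:
    """Minimal model of first-come-first-serve physical accelerator binding."""

    def __init__(self, num_accelerators):
        self.free = deque(range(num_accelerators))  # available physical accelerators
        self.bound = {}                             # thread id -> accelerator id

    def request(self, thread_id):
        # Bind a free physical accelerator directly to the requesting thread.
        if self.free:
            acc = self.free.popleft()
            self.bound[thread_id] = acc
            return acc
        return None  # scheduling conflict: no accelerator available

    def reassign(self, from_thread, to_thread):
        # Taking an accelerator from another thread requires unbinding,
        # synchronization and affinity adjustment -- the costly step that
        # the virtualized approach below avoids.
        acc = self.bound.pop(from_thread)
        self.bound[to_thread] = acc
        return acc
```

The `reassign` path models why fixed physical binding is expensive: the accelerator must be explicitly handed over between threads rather than remapped behind a virtual handle.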
Based on the above, an embodiment of the present application provides a computing power orchestration method: a new, scalable, full-stack approach aimed at solving the low resource utilization efficiency and unstable task execution performance caused by inflexible accelerator resource allocation and limited contention-adjustment capability in existing multi-tenant DNN execution environments. The method flexibly and efficiently allocates accelerator resources in a multi-tenant environment through virtualized accelerator resources, an adaptive flow control mechanism and a unified scheduling system, guaranteeing the system's quality of service (QoS) and throughput.
Fig. 1 is a block diagram of an integrated DNN multi-tenant execution system provided in an embodiment of the present application. As shown in fig. 1, an embodiment of the present application provides an integrated DNN multi-tenant execution system, and the computing power arrangement method is implemented based on the integrated DNN multi-tenant execution system. The integrated DNN multi-tenant execution system includes a virtualized accelerator layer 101, a contention-aware management layer 102, and an integrated scheduling and runtime system 103. The virtualized accelerator layer 101 supports flexible, real-time allocation of accelerator resources among multiple DNN workloads. Through the virtualized interface, the system optimizes the SLA achievement rate and simultaneously ensures the whole throughput. The contention-aware management layer 102 monitors system-level resource usage and manages memory contention using an adaptive flow control mechanism. According to the QoS requirement of the task, the memory access rate and the processing rate are dynamically adjusted, the interference between the tasks is reduced, and the stability of the system performance is improved. The integrated scheduling and runtime system 103 serves as a central component for coordinating accelerator allocation and contention management. The integrated scheduling and runtime system 103 determines in real-time the optimal resource allocation and contention-mitigation strategy based on the workload demand and the current contention level.
The integrated DNN multi-tenant execution system supports efficient DNN deployment for a variety of delay-sensitive applications through coordinated resource allocation and contention management. Compared with traditional stand-alone methods, it remarkably improves system throughput, fairness and performance stability.
Specifically, the system provides the following four capabilities:
First, virtualized accelerator allocation, the system abstracts physical accelerators into virtual entities using virtualized accelerator orchestration techniques. This virtualization allows multiple applications to flexibly share accelerator resources without static binding, greatly improving the scalability and resource utilization of the system.
Second, the contention-aware management mechanism, the system integrates lightweight hardware and runtime, performs real-time contention detection and adjustment. By monitoring memory access and adjusting traffic levels, the system can control interference caused by memory occupation, ensure that QoS requirements of high priority tasks are achieved, and simultaneously effectively share resources.
Third, unified scheduling and runtime: the runtime comprehensively monitors the resource requirements of each DNN task. It allocates accelerators, balances contention, and applies custom policies based on latency and bandwidth requirements. The scheduler decides between the "balancing" and "guaranteeing" contention policies based on task priorities and system load, thereby efficiently managing memory bandwidth and computing resources.
Fourth, performance-adaptive algorithm: the system uses an adaptive algorithm to evaluate and adjust task resource requirements. For example, when a delay-sensitive application may miss its target due to contention, the system can limit low-priority tasks and reallocate resources to maintain QoS.
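The fourth point can be illustrated with a small decision rule (a hedged sketch; the threshold, scaling factors and field names are assumptions, not from the patent):

```python
def rebalance(tasks, slack_threshold=0.1):
    """Throttle low-priority tasks when a latency-sensitive task risks
    missing its QoS target, reallocating bandwidth to maintain the SLA.

    Each task is a dict with keys: name, priority ('high'|'low'),
    observed_latency, target_latency, rate (relative bandwidth share).
    Returns the adjusted task list (tasks are modified in place).
    """
    # A high-priority task is "at risk" once its observed latency eats
    # into the slack below its target.
    at_risk = [t for t in tasks
               if t["priority"] == "high"
               and t["observed_latency"] > t["target_latency"] * (1 - slack_threshold)]
    if at_risk:
        for t in tasks:
            if t["priority"] == "low":
                t["rate"] *= 0.5                       # limit low-priority access rate
        for t in at_risk:
            t["rate"] = min(1.0, t["rate"] * 1.5)      # boost the at-risk task
    return tasks
```

The specific factors (0.5 and 1.5) are placeholders for whatever adjustment policy the runtime actually applies.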
The integrated DNN multi-tenant execution system has the following application scenarios and expansibility:
1. The system is suitable for multitasking execution of edge computing and data centers, and is particularly suitable for scenes such as edge computing equipment, data centers and the like which need multitasking parallel execution. In these scenarios, multiple DNN tasks need to share resources, and the system can ensure QoS and performance stability for each task.
2. And supporting large-scale DNN deployment, wherein the virtual accelerator arrangement and dynamic contention-aware management module of the system provides scalability support for large-scale DNN tasks. By means of resource virtualization and real-time monitoring, the requirement of DNN tasks on rapid expansion can be effectively met.
3. The system is suitable for intelligent devices requiring high performance and real-time response, such as automatic driving, intelligent monitoring, augmented reality and other scenes. These applications need to handle a large number of DNN reasoning tasks, and the contention management function of the system ensures performance stability under high loads.
In summary, the embodiment of the application provides an efficient, fair and extensible multi-tenant DNN execution system. By integrating virtual accelerator arrangement and dynamic contention aware management, the system realizes flexible allocation and contention management of resources, thereby guaranteeing the service quality of key tasks in a multi-tenant environment, optimizing resource utilization and ensuring the overall performance stability of the system.
The computing power orchestration method provided by the embodiment of the present application can be implemented in the system described above. It orchestrates accelerator resources for multi-tenant DNN workloads and integrates dynamic orchestration with contention management; this combination improves the resource utilization of multi-tenant DNN execution on a shared hardware platform, meets strict QoS targets and reduces delay fluctuation, which is of great significance for real-time applications. FIG. 2 is a flowchart of a computing power orchestration method according to an embodiment of the present application. As shown in FIG. 2, the method includes the following steps S210 to S230.
S210: virtualize the accelerator resources to realize flexible and extensible task allocation.
In some embodiments, as shown in fig. 3, a flowchart of accelerator resource virtualization is provided in an embodiment of the present application. Step S210 is specifically implemented by the following steps:
the step S211 is the definition of virtualization, namely the virtualization of the accelerator resources refers to abstracting the physical accelerator resources into a virtual resource pool capable of being flexibly allocated, so that a plurality of tasks are not limited by fixed physical accelerators, and the resources can be dynamically acquired according to real-time requirements.
S212, virtualization implementation: through a virtual interface, each task does not directly access a specific physical accelerator when requesting resources; instead, the system maps the task to an appropriate virtual accelerator according to the current workload requirements and system resource status. The virtualization interface hides the specific location and state of the physical accelerators, simplifying the interaction between tasks and accelerators.
S213, task allocation mechanism: the system dynamically allocates virtual accelerators to each task according to the task's priority, delay requirement and real-time resource conditions. The system can thus flexibly adjust a task's resource allocation according to its Service Level Agreement (SLA), ensure a balance of resources among tasks, and avoid the resource waste caused by static resource binding.
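Steps S211 to S213 can be sketched as a minimal virtual resource pool (an illustrative sketch; the class, method names and priority encoding are assumptions, not from the patent):

```python
class VirtualAcceleratorPool:
    """S211: abstract physical accelerators into a flexibly allocated pool.
    S212: tasks see only virtual handles; the physical id stays hidden."""

    def __init__(self, num_physical):
        self._free = list(range(num_physical))  # backing physical accelerators
        self._backing = {}                      # task name -> physical id behind its handle

    def allocate(self, requests):
        """S213: serve requests in priority order (highest first).

        `requests` is a list of (task_name, priority) pairs. Returns the
        names of the tasks that received a virtual accelerator.
        """
        granted = []
        for name, _prio in sorted(requests, key=lambda r: -r[1]):
            if not self._free:
                break  # remaining tasks wait; no static binding is held
            self._backing[name] = self._free.pop()
            granted.append(name)
        return granted

    def release(self, name):
        # Returning a virtual accelerator puts its backing device back in
        # the pool, ready for remapping to any other task.
        self._free.append(self._backing.pop(name))
```

Because tasks hold only virtual handles, remapping a freed accelerator to a new task is a dictionary update rather than a thread-affinity handover.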
S220: detect and manage contention between tasks using an adaptive flow control mechanism.
In some embodiments, as shown in fig. 4, a flow chart is provided for an adaptive flow control mechanism to manage contention among tasks according to an embodiment of the present application. Step S220 is specifically implemented by the following steps:
S221, contention detection: the system monitors the resource usage of each task, in particular contention for memory bandwidth, accelerator computation units and cache resources. By monitoring each task's memory access rate and bandwidth usage, the system can detect resource contention points in real time.
S222, adaptive adjustment: the system automatically adjusts a task's memory access rate or execution rate according to the contention situation, thereby realizing contention management. For high-priority tasks the system may increase the resource allocation, while for low-priority or non-critical tasks it may limit the access rate to ensure the QoS of critical tasks.
S223, dynamic adjustment: through the adaptive flow control mechanism, the system dynamically adjusts each task's memory access frequency according to real-time feedback (such as delay and bandwidth demand changes), guaranteeing the performance stability of tasks while avoiding excessive contention for resources.
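Steps S221 to S223 can be sketched as one iteration of a feedback loop (an illustrative sketch; the rate-adjustment factors are assumptions, not from the patent):

```python
def flow_control_step(tasks, bandwidth_capacity):
    """One iteration of the adaptive flow control loop.

    S221: detect a contention point when aggregate bandwidth demand
    exceeds capacity.
    S222/S223: on contention, throttle low-priority access rates and
    raise high-priority rates based on the real-time feedback.
    Each task is a dict with priority ('high'|'low') and access_rate.
    Returns True when contention was detected and rates were adjusted.
    """
    demand = sum(t["access_rate"] for t in tasks)
    if demand <= bandwidth_capacity:
        return False  # no contention point detected this round
    for t in tasks:
        if t["priority"] == "low":
            t["access_rate"] *= 0.8   # limit non-critical tasks
        else:
            t["access_rate"] *= 1.1   # protect QoS of critical tasks
    return True
```

In a real system the loop would run continuously, with the 0.8/1.1 factors replaced by adjustments derived from measured delay and bandwidth-demand changes.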
S230: coordinate resource allocation and contention management through the integrated runtime system.
In some embodiments, the runtime system is the core of overall accelerator orchestration and contention management. It has global coordination in task scheduling, resource allocation, contention detection, and regulation. The runtime system can dynamically adjust the resource allocation strategy according to the task load, the system resource state and the contention condition, and realizes the resource optimization in the multi-tenant environment. As shown in fig. 5, a flowchart of coordinating resource allocation and contention management by an integrated runtime system according to an embodiment of the present application is provided, and step S230 is specifically implemented by the following steps:
S231, executing a scheduling strategy, which comprises a balanced mode and a guaranteed mode.
Balanced mode: when there is no special priority requirement, the system aims at fairness and balances resource allocation among tasks so that the SLAs of all tasks can be basically met.
Guaranteed mode: when a critical task must meet a specific QoS target, the system preferentially allocates resources to that task while appropriately rate-limiting non-critical tasks, thereby guaranteeing the service quality of the critical task.
S232, executing a real-time response mechanism: the system responds immediately after detecting a change in resource usage or a contention situation, adjusting the resource allocation strategy and flow control. Through an efficient feedback-and-response mechanism, the system dynamically adapts to workload and contention, optimizing system performance in a multi-tenant environment.
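The two scheduling modes of S231 can be sketched as follows (a hedged illustration; the allocation formulas are assumptions, not taken from the patent):

```python
def schedule(tasks, total_resource, mode):
    """Allocate `total_resource` across tasks under one of the S231 modes.

    'balanced': fair, equal shares so all SLAs are roughly satisfied.
    'guaranteed': critical tasks receive their declared demand first;
    the remaining resource is split among non-critical tasks, which
    effectively rate-limits them.
    Each task is a dict with name, critical (bool) and demand.
    Returns a {name: share} mapping.
    """
    if mode == "balanced":
        share = total_resource / len(tasks)
        return {t["name"]: share for t in tasks}
    if mode == "guaranteed":
        alloc = {}
        remaining = total_resource
        for t in tasks:
            if t["critical"]:
                alloc[t["name"]] = min(t["demand"], remaining)
                remaining -= alloc[t["name"]]
        others = [t for t in tasks if not t["critical"]]
        for t in others:
            alloc[t["name"]] = remaining / len(others)
        return alloc
    raise ValueError("unknown mode")
```

A runtime implementing S232 would re-invoke such a function whenever its monitoring detects a resource-usage change or contention, switching modes as task QoS requirements dictate.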
In summary, the computing power arrangement method provided by the embodiment of the application can realize flexible, efficient, and contention-aware accelerator resource management, provides good performance support for a multi-tenant DNN execution environment, and ensures the service quality and fairness of the system under high load, as embodied in the following three points:
Firstly, flexible resource utilization, namely, through virtualized accelerator resources, resource waste caused by static binding is avoided, and flexible allocation of resources in a multi-tenant DNN environment is realized.
Second, the adaptive flow control mechanism ensures performance stability during concurrent execution of multiple tasks and reduces delay fluctuations suffered by high priority tasks due to contention.
Thirdly, the high-efficiency service quality is ensured, namely, the resource allocation of different tasks can be effectively managed through the integrated runtime system, the service quality of the key tasks is ensured not to be affected, and the resource utilization rate and throughput of the whole system are improved.
The embodiment of the application also provides a computing power arrangement device, which is an efficient hardware device for realizing resource management and contention control in a multi-tenant Deep Neural Network (DNN) execution environment on a system-on-chip (SoC). The device is suitable for DNN multi-task concurrency scenarios, achieves efficient utilization of system resources through intelligent resource allocation and contention adjustment, and ensures fairness among different tasks and the quality of service (QoS) of critical tasks. Fig. 6 is a schematic structural diagram of a computing power arrangement device according to an embodiment of the present application. As shown in fig. 6, the computing power arrangement device includes:
a virtualized accelerator management unit 610 for flexibly allocating resources;
A contention monitor and adjust engine 620 for adjusting the memory access rate in real time;
an adaptive runtime system 630 for synchronizing resource allocation with contention policies according to workload demands.
In some embodiments, virtualized accelerator management unit 610 is configured to abstract physical accelerator resources within the SoC, creating a set of flexibly allocatable virtual accelerators, enabling multiple DNN tasks to share the same accelerator resource without binding to a fixed physical accelerator. As shown in fig. 6, the virtualized accelerator management unit 610 includes a dynamic resource pool management module 611, a task priority and demand matching module 612, and a virtualized interface 613.
For the dynamic resource pool management module 611, the virtualized accelerator management unit creates a pool of accelerator resources from which virtual accelerator instances are allocated when tasks request resources. The management unit monitors the resource state of the current system, maps each task to a suitable virtual accelerator according to the task's real-time requirements, and realizes dynamic management and allocation of the resource pool.
The task priority and demand matching module 612 can dynamically adjust the assigned virtual resources based on the priority of the tasks, real-time bandwidth demands, and latency targets. For example, for tasks with high priority or low latency requirements, the system may prioritize more computing resources, ensuring that they meet QoS requirements.
The virtualized accelerator management unit 610 enables each task to directly access virtual accelerator resources through the virtualized interface 613 without binding to a specific physical accelerator. This design of the virtualized interface 613 simplifies the task scheduling process while allowing for more flexibility and scalability in resource allocation.
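The behavior of the virtualized accelerator management unit 610 described above can be sketched as a small resource pool. The class and method names, the unit-based capacity model, and the priority headroom rule are hypothetical illustrations; the patent does not define a concrete API.

```python
# Minimal sketch of the virtual-accelerator pool (unit 610).
# Names and the capacity model are illustrative assumptions.

class VirtualAcceleratorPool:
    def __init__(self, total_units):
        self.free_units = total_units   # abstracted physical capacity
        self.instances = {}             # task_id -> allocated units

    def request(self, task_id, demand, priority):
        """Map a task to a virtual accelerator instance.

        High-priority tasks may receive extra headroom; the task never
        sees which physical accelerator backs its instance.
        """
        if self.free_units == 0:
            return None  # no capacity; the caller may retry or queue
        units = min(demand + (1 if priority == "high" else 0),
                    self.free_units)
        self.free_units -= units
        self.instances[task_id] = units
        return task_id  # handle exposed via the virtualization interface

    def release(self, task_id):
        """Return a task's units to the pool when the task completes."""
        self.free_units += self.instances.pop(task_id, 0)
```

Because tasks hold only the returned handle, the pool is free to remap instances onto physical accelerators without the tasks noticing, which is the decoupling the virtualization interface 613 provides.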
In some embodiments, as shown in fig. 6, the contention monitoring and adjustment engine 620 includes:
The contention monitor module 621 monitors the occupation of system resources by each task in real time, especially the use of memory bandwidth and cache resources. The monitoring module uses counters and registers to track the memory access patterns and bandwidth usage of tasks and to identify possible contention areas within the system.
The dynamic contention detection module 622 is capable of quickly detecting and identifying contention as it is exacerbated by increased resource demands for a task or by changes in system load. The system can timely report the current contention status to the adaptive runtime system through signaling to trigger the corresponding adjustment measures.
Contention adjustment module 623 dynamically adjusts the resources based on the results fed back by the contention monitor module. For example, for critical tasks, the system may increase its memory access rate or priority, while for low priority tasks it may limit its bandwidth usage. The adjustment module can reduce the negative influence of contention on the system performance by means of adjusting the memory access rate of tasks, delay adjustment and the like.
The contention adjustment module 623 has a real-time adjustment mechanism, and optimizes resource allocation by an adaptive algorithm according to the contention status monitored in real time, so as to ensure the overall stability of the system. By timely increasing or decreasing the resource use frequency of the tasks, the interference among the tasks is ensured to be minimized, and the performance requirements of the key tasks are met.
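The monitor, detect, and adjust modules of the contention monitoring and adjustment engine 620 can be sketched together as one loop. The saturation threshold and the rate-adjustment factors below are illustrative assumptions, not values from the patent.

```python
# Hedged sketch of the monitor -> detect -> adjust loop of engine 620.

BANDWIDTH_CAPACITY = 100.0   # assumed memory-bandwidth budget
CONTENTION_THRESHOLD = 0.9   # contention flagged above 90% utilisation

def detect_contention(usage):
    """Module 622: flag contention when aggregate bandwidth saturates."""
    return sum(usage.values()) > CONTENTION_THRESHOLD * BANDWIDTH_CAPACITY

def adjust(tasks, usage):
    """Module 623: raise critical tasks' rates, throttle the rest."""
    if not detect_contention(usage):
        return tasks
    for info in tasks.values():
        if info["critical"]:
            info["rate"] *= 1.1   # increase access rate of critical tasks
        else:
            info["rate"] *= 0.8   # limit low-priority bandwidth use
    return tasks
```

In an actual SoC, `usage` would be populated from the hardware counters and registers that module 621 reads, and the adjusted rates would be written back to per-task throttling registers.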
In some embodiments, the adaptive runtime system 630 acts as a core management module of the computing orchestration device, responsible for scheduling and allocating resources, and coordinating the operation of the functional units according to real-time task demands and contention conditions. The adaptive runtime system 630 can dynamically analyze the performance requirements of tasks and adjust resource allocation and contention management policies.
As shown in fig. 6, the adaptive runtime system 630 includes a task scheduling module 631, a real-time response module 632, and a feedback and optimization module 633:
The task scheduling module 631 is configured to execute task scheduling policies, where the task scheduling module 631 includes multiple task scheduling policies, and can perform appropriate resource allocation according to different priorities and resource requirements of tasks.
For example, the task scheduling policy includes:
And the guarantee strategy is to guarantee the resource demand of the self-adaptive running system by aiming at the tasks with high priority or low delay, and properly limit the resource use of the tasks with low priority when the system resources are tense.
And (3) balancing strategy, namely for common tasks without strict QoS requirements, adopting a fair allocation mode by the self-adaptive runtime system, balancing resource use, and preventing individual tasks from influencing the overall performance of the system due to contention.
The real-time response module 632 is used to execute a real-time response mechanism. When the use state of the system resources changes, the self-adaptive runtime system can quickly respond and adjust the current resource allocation and contention management strategy. For example, when an increase in system load is detected and a critical task is disturbed, the system immediately prioritizes allocation of resources to the critical task.
The feedback and optimization module 633 continuously optimizes the resource allocation decisions based on the execution feedback of the current task. For example, the system may optimize a future resource allocation scheme according to the delayed feedback of the task and the resource usage pattern, so as to improve the response speed and the resource utilization efficiency of the system.
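One simple way to realize the continuous optimization of module 633 is to smooth per-task latency feedback before it steers the next allocation round. The exponential-moving-average form and the smoothing factor below are assumptions for illustration.

```python
# Illustrative sketch of feedback smoothing in module 633: an
# exponential moving average of each task's observed latency.

def update_latency_estimate(history, task_id, observed, alpha=0.5):
    """Blend the new latency sample into the task's running estimate.

    `history` maps task ids to their current smoothed latency; the
    first sample for a task initialises its estimate.
    """
    prev = history.get(task_id, observed)
    history[task_id] = alpha * observed + (1 - alpha) * prev
    return history[task_id]
```

The smoothed estimate damps transient spikes, so the runtime system reallocates resources based on sustained trends rather than single noisy samples.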
In summary, the computing power arrangement device provided by the embodiment of the application has the following beneficial effects:
1) Resource utilization flexibility: through the virtualized accelerator management unit, the system realizes flexible allocation of resources, avoiding resource waste caused by static binding and improving the adaptability of the SoC in a multi-tenant environment.
2) Contention management effectiveness: the contention monitoring and adjustment engine can timely discover resource contention problems and rapidly adjust the resource use of tasks, guaranteeing stable system performance in a high-concurrency environment, especially the QoS achievement rate of critical tasks.
3) Efficient runtime scheduling: the adaptive runtime system has real-time response and intelligent tuning capabilities, enabling resource allocation and contention policies to be adjusted promptly as workload demands and contention conditions change.
The embodiment of the application also provides an electronic device. The electronic device may include a processor and a memory, wherein the processor and the memory can communicate; exemplarily, the processor and the memory communicate over a communication bus.
The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the aspects of the embodiments described above. The processor may be a general purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like, or may be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figures, but this does not mean there is only one bus or one type of bus. The transceiver is used to enable communication between the database access device and other computers (e.g., clients, read-write libraries, and read-only libraries). The memory may include random access memory (RAM) and may also include non-volatile memory.
The electronic device provided by the embodiment of the application can be the terminal device of the embodiment.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores computer instructions, and when the computer instructions run on a computer, the computer is caused to execute the technical scheme of the computing power arrangement method.
The embodiment of the application also provides a computer program product, which comprises a computer program stored in a computer readable storage medium; at least one processor can read the computer program from the computer readable storage medium, and the technical scheme of the computing power arrangement method in the embodiment can be realized when the at least one processor executes the computer program.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to implement the solution of this embodiment.
In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The units formed by the modules can be realized in a form of hardware or a form of hardware and software functional units.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some of the steps of the methods of the various embodiments of the application.
It should be appreciated that the processor may be a Central Processing Unit (CPU), or may be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory NVM, such as at least one magnetic disk memory, and may also be a U-disk, a removable hard disk, a read-only memory, a magnetic disk or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may reside as discrete components in an electronic control unit or master control device.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of implementing the various method embodiments described above may be implemented by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs the steps comprising the method embodiments described above, and the storage medium described above includes various media capable of storing program code, such as ROM, RAM, magnetic or optical disk.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present application.

Claims (6)

1. A method of computing power orchestration, the method comprising:
Virtualizing accelerator resources to achieve flexible, extensible task allocation;
detecting and managing contention between tasks using an adaptive flow control mechanism;
coordinating resource allocation and contention management by an integrated runtime system;
virtualizing accelerator resources to enable flexible, extensible task allocation, comprising:
Abstracting physical accelerator resources into a flexibly allocated virtual resource pool, abstracting the physical accelerator resources in the SoC, and creating a group of flexibly allocated virtual accelerators, so that a plurality of DNN tasks can share the same accelerator resources without binding to a fixed physical accelerator;
Each task does not need to directly access a specific physical accelerator when requesting resources through a virtualization interface, and the tasks are mapped to a proper virtual accelerator according to the requirements of the current workload and the system resource state, wherein the virtualization interface is used for hiding the specific position and state of the physical accelerator and simplifying the interaction between the tasks and the accelerator;
dynamically distributing virtual accelerators for each task according to the priority, delay requirement and real-time resource condition of the task;
coordinating resource allocation and contention management by an integrated runtime system, comprising:
the resource allocation strategy is dynamically adjusted by the runtime system according to the task load, the system resource state and the contention condition, so that the resource optimization in the multi-tenant environment is realized;
The dynamic adjustment resource allocation strategy comprises a balance mode and a guarantee mode, wherein the balance mode aims at fairness under the condition of no special priority requirement, balances the resource allocation among tasks and enables the SLAs of all tasks to be basically met;
when the system runs, after detecting the use change or the contention condition of the resources, the system immediately responds to adjust the resource allocation strategy and the flow control, realizes the dynamic adaptation to the workload and the contention through an efficient feedback and response mechanism, and optimizes the system performance in a multi-tenant environment.
2. The method of claim 1, wherein detecting and managing contention between tasks using an adaptive flow control mechanism comprises:
detecting resource contention points in real time according to the memory bandwidth of each task, the contention conditions of the accelerator computing unit and the cache resources;
According to the resource contention point, automatically adjusting the memory access rate or execution rate of the task so as to realize contention management, increasing resource allocation for high-priority tasks, and limiting the access rate of low-priority tasks or non-critical tasks so as to ensure the QoS of critical tasks;
and dynamically adjusting the memory access frequency of each task according to the real-time feedback of the task through the self-adaptive flow control mechanism, thereby guaranteeing the performance stability of the task and avoiding excessive contention of resources, wherein the real-time feedback comprises delay and bandwidth demand change.
3. A computing force orchestration device, the device comprising:
a virtualized accelerator management unit for virtualizing accelerator resources to achieve flexible, extensible task allocation;
A contention monitoring and adjustment engine for detecting and managing contention between tasks using an adaptive flow control mechanism;
An adaptive runtime system for coordinating resource allocation and contention management by an integrated runtime system;
the virtualized accelerator management unit is further configured to:
Abstracting physical accelerator resources into a flexibly allocated virtual resource pool, abstracting the physical accelerator resources in the SoC, and creating a group of flexibly allocated virtual accelerators, so that a plurality of DNN tasks can share the same accelerator resources without binding to a fixed physical accelerator;
Each task does not need to directly access a specific physical accelerator when requesting resources through a virtualization interface, and the tasks are mapped to a proper virtual accelerator according to the requirements of the current workload and the system resource state, wherein the virtualization interface is used for hiding the specific position and state of the physical accelerator and simplifying the interaction between the tasks and the accelerator;
dynamically distributing virtual accelerators for each task according to the priority, delay requirement and real-time resource condition of the task;
The adaptive runtime system is further configured to:
the resource allocation strategy is dynamically adjusted by the runtime system according to the task load, the system resource state and the contention condition, so that the resource optimization in the multi-tenant environment is realized;
The dynamic adjustment resource allocation strategy comprises a balance mode and a guarantee mode, wherein the balance mode aims at fairness under the condition of no special priority requirement, balances the resource allocation among tasks and enables the SLAs of all tasks to be basically met;
when the system runs, after detecting the use change or the contention condition of the resources, the system immediately responds to adjust the resource allocation strategy and the flow control, realizes the dynamic adaptation to the workload and the contention through an efficient feedback and response mechanism, and optimizes the system performance in a multi-tenant environment.
4. The computing power orchestration device according to claim 3, wherein the contention monitoring and adjustment engine comprises:
the contention monitoring module is used for monitoring the occupation condition of each task on system resources in real time, tracking the memory access mode and the bandwidth use condition of the task by using a counter and a register, and identifying a contention area possibly existing in the system;
the dynamic contention detection module is used for detecting and identifying the contention condition when the contention is aggravated due to the increase of the resource demand of a certain task or the change of the system load, and timely reporting the current contention state to the self-adaptive runtime system through signal transmission so as to trigger corresponding adjustment measures;
And the contention adjusting module is used for dynamically adjusting the resources through the contention adjusting module based on the feedback result of the contention monitoring module, increasing the memory access rate or priority of the critical tasks, limiting the bandwidth use of the low-priority tasks, and reducing the negative influence of the contention on the system performance through the means of adjusting the memory access rate and delay adjustment of the tasks.
5. An electronic device comprising a processor and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
The processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-2.
6. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any of claims 1-2.
CN202411918639.8A 2024-12-25 2024-12-25 Computing power arrangement method, device, equipment and storage medium Active CN119356828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411918639.8A CN119356828B (en) 2024-12-25 2024-12-25 Computing power arrangement method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411918639.8A CN119356828B (en) 2024-12-25 2024-12-25 Computing power arrangement method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN119356828A CN119356828A (en) 2025-01-24
CN119356828B true CN119356828B (en) 2025-04-22

Family

ID=94304733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411918639.8A Active CN119356828B (en) 2024-12-25 2024-12-25 Computing power arrangement method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN119356828B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024000443A1 (en) * 2022-06-30 2024-01-04 Intel Corporation Enforcement of maximum memory access latency for virtual machine instances

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9769050B2 (en) * 2014-12-23 2017-09-19 Intel Corporation End-to-end datacenter performance control
US11216314B2 (en) * 2018-11-02 2022-01-04 EMC IP Holding Company LLC Dynamic reallocation of resources in accelerator-as-a-service computing environment
CN118349351A (en) * 2024-04-26 2024-07-16 琅琛信息技术(上海)有限公司 Cloud primary computing resource intelligent arrangement and virtualization management center and method
CN118916315A (en) * 2024-07-16 2024-11-08 苏州元脑智能科技有限公司 Compatible processing method, system, electronic equipment and computer readable medium
CN118916147B (en) * 2024-08-29 2025-07-29 深圳华易数字能源有限公司 Multi-source calculation force data integration and intelligent scheduling system and method
CN119065859B (en) * 2024-11-05 2025-01-03 安徽省交通规划设计研究总院股份有限公司 BIM software automated full life cycle management method for virtualized resource pool

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024000443A1 (en) * 2022-06-30 2024-01-04 Intel Corporation Enforcement of maximum memory access latency for virtual machine instances

Also Published As

Publication number Publication date
CN119356828A (en) 2025-01-24

Similar Documents

Publication Publication Date Title
KR100420420B1 (en) Method, system and program products for managing central processing unit resources of a computing environment
CN114327843B (en) Task scheduling method and device
KR100420421B1 (en) Method, system and program products for managing logical processors of a computing environment
US7665090B1 (en) System, method, and computer program product for group scheduling of computer resources
US9086925B2 (en) Methods of processing core selection for applications on manycore processors
US8752055B2 (en) Method of managing resources within a set of processes
KR100420419B1 (en) Method, system and program products for managing groups of partitions of a computing environment
JP3872343B2 (en) Workload management in a computer environment
Hashem et al. MapReduce scheduling algorithms: a review
US8195784B2 (en) Linear programming formulation of resources in a data center
Sun et al. Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
CN112052068A (en) Method and device for binding CPU (central processing unit) of Kubernetes container platform
KR20010050504A (en) Processing channel subsystem pending i/o work queues based on priorities
US20120072624A1 (en) Numa i/o framework
KR20040065981A (en) Dynamic allocation of computer resources based on thread type
JP2013506179A (en) Execution management system combining instruction threads and management method
US20090049449A1 (en) Method and apparatus for operating system independent resource allocation and control
US11776087B2 (en) Function-as-a-service (FAAS) model for specialized processing units
WO2023159652A1 (en) Ai system, memory access control method, and related device
CN115129465A (en) Method for operating a computing unit
CN117311910B (en) High-performance virtual password machine operation method
CN119356828B (en) Computing power arrangement method, device, equipment and storage medium
CN116841751B (en) Policy configuration method, device and storage medium for multi-task thread pool
CN118409705A (en) Real-time guaranteeing method and device for operating system block storage subsystem
Soundararajan et al. Towards end-to-end quality of service: controlling I/O interference in shared storage servers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant