CN109918195B

CN109918195B - Processor resource scheduling method for many-core systems based on thermal-aware dynamic task migration

Info

Publication number: CN109918195B
Application number: CN201910049800.5A
Authority: CN
Inventors: 文生雁; 王小航
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2023-06-20
Anticipated expiration: 2039-01-18
Also published as: CN109918195A

Abstract

The invention discloses a resource scheduling method for many-core system processors based on thermal perception dynamic task migration, which includes the following steps: step 1, detecting whether the waiting sequence is empty; if it is not empty, map the application, and then perform step 2; Whether the arrival queue is empty; if it is not empty, go to step 3; step 3, check whether there is no application running in the system, if not, use the model to estimate the running and waiting time of the bubbles, and search for the best bubble distribution through the branch and bound algorithm Result; Step 4, searching for the best bubble allocation result through the branch-and-bound algorithm; Step 5, applying the mapping stage, and Step 6, applying the running stage. This method utilizes the black silicon phenomenon, adapts to application arrival queues of different lengths according to the application arrival rate and the calculation-sensitive or communication-sensitive characteristics of the application, maintains the task running frequency, can respond to the variable application arrival rate, and effectively improves the system throughput rate , which improves system performance.

Description

Processor resource scheduling method for many-core systems based on thermal-aware dynamic task migration

技术领域technical field

本发明涉及通信技术领域，具体涉及一种基于热感知动态任务迁移的众核系统处理器资源调度方法。The invention relates to the field of communication technology, in particular to a resource scheduling method for many-core system processors based on dynamic task migration based on thermal perception.

背景技术Background technique

众核芯片是云计算、移动计算等领域的处理器部件之一。众核处理器广泛应用于服务器，数据中心等领域。在计算机系统的发展中，众核芯片正在成为一个越来越重要的平台。随着应用对计算需求的增加，众核芯片的集成度和性能也不断提高，随之而来的，是快速增长的芯片功率密度和温度，温度成为限制芯片性能的一个重要因素。长时过高的温度会影响众核芯片的可靠性和使用寿命。由于散热条件的限制且为保障系统运行的安全性，一般为芯片系统设置功率约束。为保持高速计算性能的同时满足功率约束，芯片上的一部分处理器不得不处于关闭状态，这被称为黑硅现象。Many-core chips are one of the processor components in cloud computing, mobile computing and other fields. Many-core processors are widely used in servers, data centers and other fields. Many-core chips are becoming an increasingly important platform in the development of computer systems. With the increase of computing demands of applications, the integration and performance of many-core chips are also continuously improved, followed by the rapid increase of chip power density and temperature, and temperature has become an important factor limiting chip performance. Excessively high temperature for a long time will affect the reliability and service life of many-core chips. Due to the limitation of heat dissipation conditions and to ensure the safety of system operation, power constraints are generally set for chip systems. To meet power constraints while maintaining high-speed computing performance, part of the processor on the chip has to be turned off, which is known as the black silicon phenomenon.

一般称芯片系统内未激活的空闲处理器为气泡。由于气泡的散热性较好，一些做法将气泡置于激活处理器的周围，令该处理器以较高的频率运行以提高计算性能。该类做法仍处于静态任务-处理器核映射范畴，由于任务映射位置是固定不变的，在运行频率不变的状态下，依然有可能产生热点。一些方法应用动态的任务迁移来减少热点，当原处理器的温度逐渐上升到高于设定的温度阈值时，它们将线程或任务从过热核迁移到其他核上。一类方法倾向于选择全局温度最低的处理器或者随机选取空闲处理器作为任务的迁移目标，这会大幅度增加通信距离。如果应用任务间的通信较多，这显然导致了过高的通信代价。另一类方法每次选取与过热处理器相邻的一个处理器迁移任务，经过多次这样的迁移后，应用离开系统时可能会留下一个不连续的空闲区域，新到达或者在队列中等待的应用无法连续映射到空闲处理器上。以上两类方法都可能面临的情况是，任务间通信时可能要经过多个正在运行其它应用的处理器核心，导致应用的任务间通信发生冲突，降低了通信效率。The inactive idle processors in the system-on-a-chip are generally referred to as bubbles. Because the air bubbles dissipate heat better, some practices place the air bubbles around the active processor, causing the processor to run at a higher frequency to increase computing performance. This type of approach is still in the category of static task-processor core mapping. Since the task mapping position is fixed, hot spots may still occur when the operating frequency remains unchanged. Some approaches apply dynamic task migration to reduce hotspots by migrating threads or tasks from overheated cores to other cores when the temperature of the original processor gradually rises above a set temperature threshold. One class of methods tends to choose the processor with the lowest global temperature or randomly select an idle processor as the migration target of the task, which will greatly increase the communication distance. If there are many communications between application tasks, this obviously leads to high communication costs. Another type of method selects a processor adjacent to the overheated processor to migrate tasks each time. After multiple such migrations, a discontinuous free area may be left when the application leaves the system, and the newly arrived or waiting in the queue applications cannot be mapped consecutively to idle processors. Both of the above two types of methods may face the situation that the inter-task communication may pass through multiple processor cores running other applications, resulting in conflicts in the application inter-task communication and reducing the communication efficiency.

由于用户请求的多样性和复杂性，服务器系统被期望于能够应对多变的工作负载，在尽可能短的时间内响应。一种任务迁移方法令每个正在运行任务的处理器都超频运行，当达到温度阈值时将所有的任务全部迁移到另一块连续的空闲处理器区域内。尽管保持了任务间的通信距离不变，这类方法只适用于负载较低时的系统，当应用到达较多，需要的处理器资源较多时，这种低系统利用率的处理器资源调度方法会带来过长的应用等待时间，抵消超频下的性能增益，增加响应时间。Due to the diversity and complexity of user requests, the server system is expected to be able to respond to changing workloads in the shortest possible time. A method of task migration overclocks each processor that is running a task, and migrates all tasks to another contiguous free processor area when the temperature threshold is reached. Although the communication distance between tasks remains unchanged, this type of method is only suitable for systems with low loads. When applications arrive more and require more processor resources, this method of processor resource scheduling with low system utilization It will bring too long application waiting time, offset the performance gain under overclocking, and increase the response time.

授权公告号为CN201310059705的发明专利中公开了一种“一种核资源分配方法、装置及众核系统”，该发明专利主要提出了一种核资源分配方法，根据用户进程的线程数(所需的空闲核数目)将分散的核分区进行合并以将形成的连续的核分区分配给用户进程，优化通信代价。该方法根据核分区迁移代价，从至少两个分散核分区中选取一个基准核分区和一个从核分区，以使总的核分区迁移代价最小。其次迁移从核分区的空闲核，使从核分区空闲核与基准核分区的空闲核合并形成一个连续的核分区。上述专利主要从去碎片化的角度对片上的处理器资源进行整合，使到达的应用能够实现连续映射，它的缺点是没有考虑系统的功率约束和温度约束，没有考虑可能出现的温度峰值和不均匀的温度分布，系统存在过热风险。The invention patent with the authorized announcement number CN201310059705 discloses a "nuclear resource allocation method, device, and many-core system". The invention patent mainly proposes a nuclear resource allocation method. The number of idle cores) merge the scattered core partitions to assign the formed continuous core partitions to user processes, optimizing the communication cost. According to the core partition migration cost, the method selects a reference core partition and a slave core partition from at least two scattered core partitions, so as to minimize the total core partition migration cost. Secondly, the idle cores of the slave core partition are migrated so that the idle cores of the slave core partition and the idle cores of the reference core partition are merged to form a continuous core partition. The above-mentioned patent mainly integrates on-chip processor resources from the perspective of de-fragmentation, so that the arriving applications can realize continuous mapping. Its disadvantage is that it does not consider the power constraints and temperature constraints of the system, and does not consider the possible temperature peaks and abnormalities. Uniform temperature distribution, risk of system overheating.

以上有关任务迁移的方法都没有将对气泡的利用考虑在内。在黑硅现象存在时，同时考虑芯片温度约束和负载，如何设计众核系统的动态处理器资源调度，是维持其高性能的关键。None of the above methods for task transfer take the use of bubbles into account. When the black silicon phenomenon exists, how to design the dynamic processor resource scheduling of the many-core system is the key to maintaining its high performance while considering the chip temperature constraints and load.

发明内容Contents of the invention

本发明是在黑硅现象下的处理器资调度方法，在二维片上网络众核系统任务调度仿真系统内实现，可以在保持较高的处理器运行频率的同时规避热风险且考虑通信代价，用以实现提高系统的吞吐率。The present invention is a processor resource scheduling method under the black silicon phenomenon, which is implemented in a two-dimensional on-chip network many-core system task scheduling simulation system, which can avoid thermal risks and consider communication costs while maintaining a high processor operating frequency. It is used to improve the throughput of the system.

为此，本发明提出的一种基于热感知动态任务迁移的众核系统处理器资源调度方法，包括以下步骤：For this reason, the present invention proposes a resource scheduling method for many-core system processors based on thermal perception dynamic task migration, including the following steps:

步骤一、检测等待序列是否为空；不为空，则映射等待序列里的每一个应用，再进行步骤二；若空则结束本次资源调度，等待下一个时钟周期开始时，进行步骤一；Step 1. Check whether the waiting sequence is empty; if it is not empty, map each application in the waiting sequence, and then proceed to step 2; if it is empty, end this resource scheduling, and wait for the start of the next clock cycle to proceed to step 1;

步骤二、检测到达队列是否为空；若到达队列非空，则进行步骤三；若到达队列为空，则该时钟周期没有新应用到达，不需进行处理器资源调度，结束本次资源调度，等待下一个时钟周期开始时，进行步骤一；Step 2: Detect whether the arrival queue is empty; if the arrival queue is not empty, proceed to step 3; if the arrival queue is empty, then no new application arrives in this clock cycle, processor resource scheduling is not required, and this resource scheduling ends. When waiting for the start of the next clock cycle, proceed to step 1;

步骤三、检测系统内是否没有应用运行，若没有应用运行，即系统的N*N个处理器资源均可用，则用气泡-性能模型和等待时间模型分别为每个应用估算不同气泡下的执行时间和等待时间，作为分支界限中代价函数的计算输入，通过分支界限算法搜索应用的总响应时间最短的气泡分配结果，使总响应时间最短；若有应用运行，则进行步骤四；Step 3. Check whether there is no application running in the system. If there is no application running, that is, the N*N processor resources of the system are all available, use the bubble-performance model and the waiting time model to estimate the execution under different bubbles for each application Time and waiting time, as the calculation input of the cost function in the branch and bound, search for the bubble distribution result with the shortest total response time of the application through the branch and bound algorithm, so as to make the total response time the shortest; if there is an application running, go to step 4;

步骤四：若有应用运行，则系统的可用处理器被已占用的应用区域分割为一组不连续的空闲可用区域。用气泡-性能模型和等待时间模型分别为每个应用估算可选气泡数下的执行时间和等待时间，作为分支界限中代价函数的计算输入，通过分支界限算法搜索最佳的气泡分配结果，使总响应时间最短。Step 4: If there are applications running, the available processors of the system are divided into a group of discontinuous free available areas by the occupied application areas. The bubble-performance model and the waiting time model are used to estimate the execution time and waiting time under the optional number of bubbles for each application respectively, as the calculation input of the cost function in the branch and bound, and search for the best bubble allocation result through the branch and bound algorithm, so that The overall response time is the shortest.

步骤五、应用映射阶段：采用首次适应启发式算法(First-Fit Heuristic)选择空闲区域，再选择应用的映射模式进行映射；所述应用的映射模式包括方形映射模式和通信优先的映射模式；Step 5, the application mapping stage: using the First-Fit Heuristic algorithm (First-Fit Heuristic) to select the free area, and then selecting the application mapping mode for mapping; the application mapping mode includes a square mapping mode and a communication priority mapping mode;

步骤六、应用运行阶段：根据应用的映射模式和其自身的计算量-通信量比例(Computation communication rate，CCR)作为选择依据，为不同类型的应用选择迁移模式。Step 6: Application running phase: According to the mapping mode of the application and its own calculation-communication rate (CCR) as the basis for selection, the migration mode is selected for different types of applications.

进一步的，所述气泡-性能模型用于计算各应用在不同气泡数目下对应的执行时间，该模型为多项式回归模型，令应用的执行时间为∏_i，应用区域包含的气泡数目b_i，应用的关键路径跳数即加权路径最长的路径跳数h_i，应用中任务的平均计算时间c_i，应用中任务平均通信时间t_i，气泡-性能模型如下：Further, the bubble-performance model is used to calculate the corresponding execution time of each application under different numbers of bubbles, the model is a polynomial regression model, let the execution time of the application be ∏ _i , the number of bubbles b _i contained in the application area, and the application The hop count of the critical path is the hop count h _i of the longest weighted path, the average computing time _ci of the task in the application, and the average communication time t _i of the task in the application. The bubble-performance model is as follows:

其中n₁～n₄为多项式阶数，α_k，β_k，γ_k，θ_k为模型的拟合系数，取值由极大似然法得到。

为应用i的气泡数的k次方，/>

为应用i关键路径跳数的k次方，/>

为应用i中平均任务计算时间的k次方，/>

为应用i的平均每两个任务间通信时间。Among them, n ₁ to n ₄ are polynomial orders, α _k , β _k , γ _k , and θ _k are the fitting coefficients of the model, and the values are obtained by the maximum likelihood method.

is the kth power of the number of bubbles applied to i, />

is the k-th power of the hop count of the critical path of application i, />

is the kth power of the average task computing time in application i, />

is the average communication time between every two tasks of application i.

进一步的，所述等待时间模型为给定当前应用的区域大小即气泡数和任务数之和，用于计算各应用在不同气泡数目下和当前到达率下对应的等待时间，该模型为多项式回归模型，当前应用的区域面积记为R，系统总处理器数量记为|T|，已映射应用的平均任务数记为|A_i|，令当前应用的等待时间为η_i，η_i由以下变量建模：已映射应用的平均气泡数-任务数比例r_i，已映射应用的平均执行时间e_i，应用到达率λ，等待时间模型如下：Further, the waiting time model is given the area size of the current application, that is, the sum of the number of bubbles and the number of tasks, and is used to calculate the corresponding waiting time of each application under different numbers of bubbles and the current arrival rate. The model is polynomial regression Model, the area of the current application is denoted as R, the total number of processors in the system is denoted as |T|, the average number of tasks of the mapped application is denoted as |A _i |, let the waiting time of the current application be η _i , η _i is given by the following Variable modeling: the average number of bubbles of the mapped application - the ratio of the number of tasks r _i , the average execution time e _i of the mapped application, the application arrival rate λ, and the waiting time model are as follows:

其中a₀为常数项，z为多项式阶数，r^j为已映射应用的平均气泡数-任务数比例的j次方，e^j为已映射应用的平均执行时间的j次方，λ^j为当前应用到达率的j次方。δ_j，ε_j，μ_j为由极大似然回归得到的对应各参数项的拟合系数。where a ₀ is a constant term, z is the polynomial order, r ^j is the jth power of the average bubble number-task number ratio of the mapped application, e ^j is the jth power of the average execution time of the mapped application, and ^λj is The jth power of the current application arrival rate. δ _j , ε _j , μ _j are the fitting coefficients corresponding to each parameter item obtained by maximum likelihood regression.

进一步的，应用任务数为应用映射占用的处理器数量。一个应用为多个小任务的合集，每个任务运行时执行应用中不同的指令部分，从而并行的执行应用。每个任务映射要占用一个处理器核心，应用任务数等于应用运行要占用的处理器数量。Further, the number of application tasks is the number of processors occupied by the application mapping. An application is a collection of multiple small tasks, and each task executes a different part of the application's instructions when running, thereby executing the application in parallel. Each task map occupies one processor core, and the number of application tasks is equal to the number of processors occupied by the application to run.

进一步的，所述可选气泡数为每个空闲区域的处理器总数与应用任务数之差；应用任务数指等于应用运行要占用的处理器数量。Further, the optional number of bubbles is the difference between the total number of processors in each free area and the number of application tasks; the number of application tasks is equal to the number of processors occupied by the application to run.

进一步的，步骤三由全局管理器(Global manager)实现，当后续应用到达，全局管理器统计当前系统内所有的可用处理器区域，为每个后续到达应用估算这些可用处理器区域对应的执行时间和等待时间，由分支界限算法，找到代价最小的应用-可用区域对应关系。Further, step 3 is implemented by the global manager. When subsequent applications arrive, the global manager counts all available processor areas in the current system, and estimates the execution time corresponding to these available processor areas for each subsequent application. and waiting time, use the branch-and-bound algorithm to find the application-available area correspondence with the least cost.

进一步的，所述首次适应算法具体实现过程为：在系统内从左到右，从上到下搜索空闲处理器，判断其是否能作为应用区域的起始点，能作为起始点的条件是起始点同排右侧空闲处理器数与同列下方空闲处理器数的乘积大于或等于应用区域面积；当找到第一个可作为应用区域的起始点的处理器，则将该处理器做为最左上角的空闲区域分配给该应用，若找不到可作为应用区域的起始点的处理器，则将应用放进等待队列；区域选定后，将应用以选择的模式映射在该区域内。所述选择应用的映射模式的方式为：当应用的气泡与任务数满足1∶1时，应用选择方形映射模式；当应用的气泡与任务数不满足1∶1时，应用选择通信优先的映射模式。Further, the specific implementation process of the first adaptation algorithm is: search for idle processors in the system from left to right and from top to bottom, and judge whether it can be used as the starting point of the application area. The condition that can be used as the starting point is that the starting point The product of the number of free processors on the right side of the same row and the number of free processors below the same column is greater than or equal to the area of the application area; when the first processor that can be used as the starting point of the application area is found, this processor is used as the upper left corner The free area of the application is allocated to the application. If no processor can be found that can be used as the starting point of the application area, the application will be put into the waiting queue; after the area is selected, the application will be mapped in the area in the selected mode. The method of selecting the mapping mode of the application is: when the number of bubbles and tasks of the application meet the ratio of 1:1, the application selects the square mapping mode; when the number of bubbles and tasks of the application does not meet the ratio of 1:1, the application selects the mapping of communication priority model.

进一步的，所述迁移模式包括三种迁移模式，分别为：方形迁移模式、区域内最冷核迁移模式和区域内最冷邻居核迁移模式；所述步骤五根据应用的映射模式和其自身的计算量-通信量比例作为选择依据，各应用气泡数量最大不得超过其任务数量，选择迁移模式的过程如下：Further, the migration mode includes three migration modes, namely: square migration mode, the coldest core migration mode in the region, and the coldest neighbor core migration mode in the region; the fifth step is based on the applied mapping mode and its own The calculation volume-communication volume ratio is used as the basis for selection. The maximum number of application bubbles should not exceed the number of tasks. The process of selecting the migration mode is as follows:

若气泡数与任务数相等，为应用选择方形迁移模式以保持其通信距离在迁移过程中不变，且处理器将超频运行；If the number of bubbles is equal to the number of tasks, select the square migration mode for the application to keep its communication distance constant during the migration process, and the processor will run overclocked;

若气泡数与任务数不相等，为应用选择通信优先映射。阈值取值为2h/(h-1)，h为应用关键路径跳数即加权路径最长的路径跳数，为CCR大于阈值的应用选择区域最冷核迁移，为CCR小于阈值的应用选择区域最冷邻居核迁移。If the number of bubbles is not equal to the number of tasks, the communication priority mapping is selected for the application. The threshold value is 2h/(h-1), h is the hop count of the application critical path, that is, the longest path hop count of the weighted path, the coldest core migration area is selected for applications with CCR greater than the threshold value, and the selection area is selected for applications with CCR less than the threshold value Coldest neighbor nuclear migration.

进一步的，所述三种迁移模式具体实现过程如下：Further, the specific implementation process of the three migration modes is as follows:

(1)方形迁移模式，应用需满足方形映射且应用区域大小至少为任务数的两倍，调用时将所有任务与其当前映射的处理器解绑，将任务按任务序号(ID)顺序依次映射到应用区域的另一块空闲区域内；(1) Square migration mode, the application needs to meet the square mapping and the size of the application area is at least twice the number of tasks. When calling, unbind all tasks from their current mapped processors, and map tasks to In another free area of the application area;

(2)区域内最冷核迁移模式，调用时由热点检测得到的任务ID定位到超出温度阈值的处理器即过热处理器上，接着在应用区域内搜索温度最低的处理器核，当找到一个温度最低的处理器核且区域内最冷核迁移模式的迁移条件，即该处理器为气泡即没有映射任务的空闲处理器，处理器上没有映射任务，将任务与过热处理器即原来的处理器解绑并映射到该处理器上，若无法满足任务的迁移条件，即最冷核不是气泡，对过热处理器做降频处理；(2) The coldest core migration mode in the area. When calling, the task ID obtained by hotspot detection is located on the processor that exceeds the temperature threshold, that is, the overheated processor. Then, the processor core with the lowest temperature is searched in the application area. When one is found The migration condition of the processor core with the lowest temperature and the coldest core migration mode in the region, that is, the processor is a bubble, that is, an idle processor without mapping tasks, and there is no mapping task on the processor. The processor is unbound and mapped to the processor. If the migration conditions of the task cannot be met, that is, the coldest core is not a bubble, the overheated processor will be downclocked;

(3)区域内最冷邻居核迁移模式，调用时由热点检测得到的任务ID定位到过热处理器上，在与该处理器相邻的8个相邻核中搜索温度最低的处理器核，若搜索到的处理器核满足区域内最冷邻居核迁移模式的迁移条件，即其上已映射了任务但温度不高于阈值温度的三分之二，或者该处理器为气泡即是空闲处理器，则执行任务迁移，若该处理器中已有任务，交换该处理器和过热处理器上的任务，即双次解绑-重映射；若该处理器核处于空闲状态，执行单次解绑-重映射，若无法满足任务的迁移条件，即其温度高于阈值温度的三分之二，对过热处理器做降频处理。(3) The coldest neighbor core migration mode in the area. When calling, the task ID obtained by hotspot detection is located on the overheated processor, and the processor core with the lowest temperature is searched among the 8 adjacent cores adjacent to the processor. If the searched processor core meets the migration conditions of the coldest neighbor core migration mode in the area, that is, tasks have been mapped on it but the temperature is not higher than two-thirds of the threshold temperature, or the processor is a bubble, that is, idle processing If there is a task in the processor, the tasks on the processor and the overheated processor are exchanged, that is, double unbinding-remapping; if the processor core is in an idle state, a single unbinding Binding-remapping, if the migration condition of the task cannot be met, that is, its temperature is higher than two-thirds of the threshold temperature, the overheated processor will be down-clocked.

进一步的，所述分支界限算法为：为每个应用分配不同数量的气泡，在每个节点处计算当前的总响应时间作为代价函数；总响应时间σ为该节点处已被分配气泡的应用的等待时间与执行时间之和的最大值：Further, the branch-and-bound algorithm is as follows: assign different numbers of bubbles to each application, and calculate the current total response time at each node as a cost function; The maximum sum of waiting time and execution time:

σ＝max{η_i+∏_i}，i∈{0，1，...，n}σ=max{η _i +∏ _i }, i∈{0,1,...,n}

其中η_i为应用i的等待时间，∏_i为应用i的执行时间。每次优先拓展下界增长最缓慢的分支，并剪掉比下一层节点代价更高即响应时间更长的上层节点，最终得到总响应时间最短的的气泡分配结果，即应用区域划分结果。Where η _i is the waiting time of application i, and ∏ _i is the execution time of application i. Each time, the branch with the slowest growth in the lower bound is first expanded, and the upper-level nodes that are more costly than the lower-level nodes, that is, the response time is longer, are cut off. Finally, the bubble allocation result with the shortest total response time is obtained, that is, the application area division result.

进一步的，应用到达后，根据气泡-性能模型估测计算不同气泡数下应用的执行时间。各应用气泡数量最大不得超过其任务数量。当气泡数量等于任务数量时，估算全部迁移模式下的执行时间。当气泡数量少于任务数量时，为CCR高于阈值的应用估算利于通信的邻居最冷核任务迁移模式下的执行时间；为CCR低于阈值的应用估算利于计算的区域内最冷核任务迁移模式。Further, after the application arrives, the execution time of the application under different numbers of bubbles is estimated and calculated according to the bubble-performance model. The maximum number of bubbles for each application cannot exceed the number of tasks. Estimate the execution time in full migration mode when the number of bubbles is equal to the number of tasks. When the number of bubbles is less than the number of tasks, estimate the execution time in the neighbor’s coldest core task migration mode that is conducive to communication for applications whose CCR is higher than the threshold; estimate the coldest core task migration in the area that is conducive to computing for applications whose CCR is lower than the threshold model.

进一步的，若区域包含的处理器数量大于应用任务数，则根据映射后区域内气泡和任务数的比例计算该区域内活动处理器的频率，令处理器超频运行。Further, if the number of processors included in the area is greater than the number of application tasks, the frequency of the active processors in the area is calculated according to the ratio of the number of bubbles in the mapped area to the number of tasks, and the processors are overclocked.

进一步的，在应用运行阶段，系统在每一个控制间隔内检测热点。当出现热点时，在对应的应用区域内按选择的迁移模式进行任务迁移。Furthermore, in the application running phase, the system detects hot spots in every control interval. When a hotspot occurs, perform task migration in the corresponding application area according to the selected migration mode.

进一步的，所述通信优先的映射模式，以通信权重最大的节点作为首选任务映射在应用区域的几何中心附近的处理器核上，并将与其相连的节点依次映射到距其曼哈顿距离最短的可用处理器核上。其次选取未映射任务已映射的父亲节点，将与其相连的节点依次映射到距其曼哈顿距离最短的可用处理器核上。以此类推，直到应用的所有任务都映射到处理器上。Further, in the communication priority mapping mode, the node with the largest communication weight is used as the preferred task to map on the processor core near the geometric center of the application area, and the nodes connected to it are sequentially mapped to the available nodes with the shortest Manhattan distance. on the processor core. Secondly, select the mapped parent node of the unmapped task, and map the nodes connected to it to the available processor core with the shortest distance from Manhattan. And so on, until all tasks of the application are mapped to the processor.

进一步的，所述方型映射模式，取应用任务数的平方根向下取整为边长，将所有任务连续映射在一个长方形的区域内。Further, in the square mapping mode, the square root of the number of applied tasks is rounded down to the side length, and all tasks are continuously mapped in a rectangular area.

选取与采样间隔相同的控制间隔，每过一个控制间隔检测是否有处理器温度高于给定的温度阈值。如果没有，则不进行任务迁移。如果存在超出温度阈值的处理器核，则由热点上映射的任务定位到对应的应用，在绑定了该应用的事件类实例中按选择的的模式进行任务迁移。热点检测给出过热的应用ID和其过热任务ID。A control interval that is the same as the sampling interval is selected, and whether a processor temperature is higher than a given temperature threshold is detected every time a control interval passes. If not, no task migration is done. If there is a processor core exceeding the temperature threshold, the task mapped on the hotspot is located to the corresponding application, and the task migration is performed according to the selected mode in the event class instance bound to the application. Hotspot detection gives the overheating application ID and its overheating task ID.

通过建立应用的气泡-性能模型、等待时间模型、设计三种迁移模式和拓展二维片上网络众核系统。在资源分配上，利用黑硅现象以获得较优的计算性能。在任务迁移上，根据应用的映射模式和其自身的计算量-通信量比例(Computation communication rate，CCR)作为选择依据，为不同类型的应用选择迁移模式。较高的CCR一般意味着计算性能对应用的整体性能的贡献更大，反之则意味着通信性能在整体性能中更加重要。By establishing the application bubble-performance model, waiting time model, designing three migration modes and expanding the two-dimensional network-on-chip many-core system. In terms of resource allocation, the black silicon phenomenon is used to obtain better computing performance. In task migration, the migration mode is selected for different types of applications according to the application mapping mode and its own calculation-communication rate (CCR) as the basis for selection. A higher CCR generally means that the computing performance contributes more to the overall performance of the application, and vice versa means that the communication performance is more important in the overall performance.

本发明与现有技术相比具有以下的有益效果：Compared with the prior art, the present invention has the following beneficial effects:

1、在这种调度方法中，气泡不仅可以用作散热，也可同时作为过热处理器上任务的迁移对象。1. In this scheduling method, bubbles can not only be used for heat dissipation, but also serve as migration objects for tasks on overheated processors.

2、系统能够动态响应当前的应用到达率，在到达应用较少时充分利用黑硅现象提升应用的运行性能，在到达应用较多时提高系统利用率，避免应用的等待时间过长。总体上，较其他的传统的任务迁移方法提升了系统吞吐率，降低了应用的平均响应时间。2. The system can dynamically respond to the current application arrival rate, make full use of the black silicon phenomenon to improve the operating performance of the application when there are few applications, and improve the system utilization rate when there are many applications to avoid excessive waiting time for applications. In general, compared with other traditional task migration methods, the system throughput rate is improved, and the average response time of the application is reduced.

附图说明Description of drawings

图1是本发明拓展后的仿真系统的结构框图。Fig. 1 is a structural block diagram of the expanded simulation system of the present invention.

图2为本发明中分支界分配气泡的算法示例图。Fig. 2 is an example diagram of an algorithm for allocating bubbles in branch boundaries in the present invention.

图3为本发明中处理器资源调度方法的流程图。FIG. 3 is a flow chart of a processor resource scheduling method in the present invention.

具体实施方式Detailed ways

下面结合实施例及附图对本发明作进一步详细的描述，但本发明的实施方式不限于此。The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

本实施例的一种基于热感知动态任务迁移的众核系统处理器资源调度方法，采用的应用1对应CCR阈值为2.6，应用2对应CCR阈值为2.6，应用3对应CCR阈值为2.4。系统规模为5*5共含25个处理器核，处理器核的时钟频率为1GHZ，时钟周期为1纳秒。In the method for resource scheduling of many-core system processors based on thermal-aware dynamic task migration in this embodiment, application 1 corresponds to a CCR threshold of 2.6, application 2 corresponds to a CCR threshold of 2.6, and application 3 corresponds to a CCR threshold of 2.4. The system scale is 5*5 and contains a total of 25 processor cores. The clock frequency of the processor core is 1GHZ, and the clock cycle is 1 nanosecond.

如图3所示，包括以下步骤：As shown in Figure 3, the following steps are included:

步骤一、检测等待序列是否为空；不为空，则映射等待序列里的每一个应用，再进行步骤二；若空则结束本次资源调度，等待下一个时钟周期；Step 1. Check whether the waiting sequence is empty; if it is not empty, map each application in the waiting sequence, and then proceed to step 2; if it is empty, end this resource scheduling and wait for the next clock cycle;

步骤二、检测到达队列是否为空；若到达队列非空，则进行步骤三；若到达队列为空，则该时钟周期没有新应用到达，不需进行处理器资源调度，结束本次资源调度，等待下一个时钟周期开始时，进行步骤一。Step 2: Detect whether the arrival queue is empty; if the arrival queue is not empty, proceed to step 3; if the arrival queue is empty, then no new application arrives in this clock cycle, processor resource scheduling is not required, and this resource scheduling ends. When waiting for the start of the next clock cycle, proceed to step 1.

步骤三、检测系统内是否没有应用运行，若没有应用运行，即系统的5*5个处理器资源均可用，则用气泡-性能模型和等待时间模型分别为每个应用估算不同气泡下的执行时间和等待时间，作为分支界限中代价函数的计算输入，通过分支界限算法搜索最佳的气泡分配结果；若有应用运行，则进行步骤四。Step 3. Check whether there is no application running in the system. If there is no application running, that is, 5*5 processor resources of the system are available, then use the bubble-performance model and the waiting time model to estimate the execution under different bubbles for each application The time and waiting time are used as the calculation input of the cost function in the branch and bound, and the optimal bubble allocation result is searched through the branch and bound algorithm; if there is an application running, go to step 4.

步骤四：若有应用运行，则系统的可用处理器被已占用的应用区域分割为一组不连续的空闲可用区域。用气泡-性能模型和等待时间模型分别为每个应用估算可选气泡数下(可选气泡数为每个空闲区域的处理器总数与应用任务数之差)的执行时间和等待时间，作为分支界限中代价函数的计算输入，通过分支界限算法搜索最佳的气泡分配结果，使总响应时间最短。Step 4: If there are applications running, the available processors of the system are divided into a group of discontinuous free available areas by the occupied application areas. Use the bubble-performance model and the waiting time model to estimate the execution time and waiting time under the number of optional bubbles (the number of optional bubbles is the difference between the total number of processors in each free area and the number of application tasks) for each application, as a branch The calculation input of the cost function in the bounds, searches for the best bubble distribution result through the branch and bound algorithm, so that the total response time is the shortest.

其中n₁～n₄为多项式阶数，α_k，β_k，γ_k，θ_k为模型系数，取值由极大似然法得到。

为应用i的气泡数的k次方，/>

为应用i关键路径跳数的k次方，/>

为应用i中平均任务计算时间的k次方，/>

为应用i的平均每两个任务间通信时间。Among them, n ₁ to n ₄ are polynomial orders, α _k , β _k , γ _k , and θ _k are model coefficients, and the values are obtained by the maximum likelihood method.

is the kth power of the number of bubbles applied to i, />

is the k-th power of the hop count of the critical path of application i, />

is the kth power of the average task computing time in application i, />

is the average communication time between every two tasks of application i.

进一步的，所述等待时间模型为给定当前应用的区域大小即气泡数和任务数之和，用于计算各应用在不同气泡数目下和当前到达率下对应的等待时间，该模型为多项式回归模型，当前应用的区域面积记为R，系统总处理器数量记为|T|，已映射应用的平均任务数记为|A_i|，令当前应用的等待时间为η_i，η_i由以下变量建模：已映射应用的平均气泡数-任务数比例r_i，已映射应用的平均执行时间e_i，应用到达率λ，等待时间模型如下：Further, the waiting time model is given the area size of the current application, that is, the sum of the number of bubbles and the number of tasks, and is used to calculate the corresponding waiting time of each application under different numbers of bubbles and the current arrival rate. The model is a polynomial regression Model, the area of the current application is denoted as R, the total number of processors in the system is denoted as |T|, the average number of tasks of the mapped application is denoted as |A _i |, let the waiting time of the current application be η _i , η _i is given by the following Variable modeling: the average number of bubbles of the mapped application - the ratio of the number of tasks r _i , the average execution time e _i of the mapped application, the application arrival rate λ, and the waiting time model are as follows:

进一步的，步骤三由全局管理器(Global manager)实现，即当后续应用到达，全局管理器统计当前系统内所有的可用处理器区域，为每个后续到达应用估算这些可用处理器区域对应的执行时间和等待时间，由分支界限算法，找到代价最小的应用-可用区域对应关系。Further, step 3 is implemented by the global manager, that is, when subsequent applications arrive, the global manager counts all available processor areas in the current system, and estimates the execution corresponding to these available processor areas for each subsequent arriving application Time and waiting time, use the branch and bound algorithm to find the application-available area correspondence with the least cost.

进一步的，所述首次适应算法具体实现过程为：在系统内从左到右，从上到下搜索空闲处理器，判断其是否能作为应用区域的起始点，能作为起始点的条件是起始点同排右侧空闲处理器数与同列下方空闲处理器数的乘积大于或等于应用区域面积；当找到第一个可作为应用区域的起始点的处理器，则将该处理器做为最左上角的空闲区域分配给该应用，若找不到可作为应用区域的起始点的处理器，则将应用放进等待队列；区域选定后，将应用以选择的模式映射在该区域内。Further, the specific implementation process of the first adaptation algorithm is: search for idle processors in the system from left to right and from top to bottom, and judge whether it can be used as the starting point of the application area. The condition that can be used as the starting point is that the starting point The product of the number of free processors on the right side of the same row and the number of free processors below the same column is greater than or equal to the area of the application area; when the first processor that can be used as the starting point of the application area is found, this processor is used as the upper left corner The free area of the application is allocated to the application. If no processor can be found that can be used as the starting point of the application area, the application will be put into the waiting queue; after the area is selected, the application will be mapped in the area in the selected mode.

所述选择应用的映射模式的方式为：当应用的气泡与任务数满足1∶1时，应用选择方形映射模式；当应用的气泡与任务数不满足1∶1时，应用选择通信优先的映射模式。The method of selecting the mapping mode of the application is: when the number of bubbles and tasks of the application meet the ratio of 1:1, the application selects the square mapping mode; when the number of bubbles and tasks of the application does not meet the ratio of 1:1, the application selects the mapping of communication priority model.

若气泡数与任务数不相等，为应用选择通信优先映射。为CCR大于阈值的应用选择区域最冷核迁移，为CCR小于阈值的应用选择区域最冷邻居核迁移。If the number of bubbles is not equal to the number of tasks, the communication priority mapping is selected for the application. For applications with a CCR greater than the threshold, select the region's coldest kernel migration, and for applications with a CCR less than the threshold, select the region's coldest neighbor kernel migration.

(1)方形迁移模式，应用需满足方形映射且应用区域大小至少为任务数的两倍，调用时将所有任务与其当前映射的处理器解绑，重新按顺序依次映射到应用区域的另一块空闲区域内；(1) Square migration mode, the application needs to meet the square mapping and the size of the application area is at least twice the number of tasks. When calling, unbind all tasks from their current mapped processors, and map them to another idle area in the application area in sequence. within the area;

(2)区域内最冷核迁移模式，调用时由热点检测得到的任务ID定位到过热处理器上，接着在应用区域内搜索温度最低的处理器核，当找到一个温度最低的处理器核且满足迁移条件，即该处理器为气泡，处理器上没有映射任务，将任务与过热处理器解绑并重新映射到该处理器上，若无法满足任务的迁移条件，即最冷核不是气泡，对过热处理器做降频处理；(2) The coldest core migration mode in the area. When calling, the task ID obtained by hotspot detection is located on the overheated processor, and then the processor core with the lowest temperature is searched in the application area. When a processor core with the lowest temperature is found and Satisfy the migration conditions, that is, the processor is a bubble, and there is no mapping task on the processor. Unbind the task from the overheated processor and remap it to the processor. If the migration conditions of the task cannot be met, that is, the coldest core is not a bubble, Reduce the frequency of the overheated processor;

(3)区域内最冷邻居核迁移模式，调用时由热点检测得到的任务ID定位到过热处理器上，在与该处理器相邻的8个相邻核中搜索温度最低的处理器核，若搜索到的处理器核满足迁移条件，即其上已映射了任务但温度不高于阈值温度的三分之二，或该处理器为气泡，则选定该处理器为任务迁移的目标处理器。若该处理器中已有任务，交换该处理器和过热处理器上的任务，即执行两次解绑-重新映射，将过热处理器与其映射的任务X解绑，将目标处理器与其映射的任务Y解绑，再将任务X重新映射到目标处理器上，将任务Y重新映射到过热处理器上；若该处理器核为空闲处理器，执行单次解绑-重新映射，若无法满足任务的迁移条件，即其温度高于阈值温度的三分之二，对过热处理器做降频处理。(3) The coldest neighbor core migration mode in the area. When calling, the task ID obtained by hotspot detection is located on the overheated processor, and the processor core with the lowest temperature is searched among the 8 adjacent cores adjacent to the processor. If the searched processor core meets the migration conditions, that is, the task has been mapped on it but the temperature is not higher than two-thirds of the threshold temperature, or the processor is a bubble, then the processor is selected as the target processing of the task migration device. If there is already a task in the processor, exchange the tasks on the processor and the overheated processor, that is, perform two unbinding-remapping, unbind the overheated processor from its mapped task X, and unbind the target processor from its mapped task X. Unbind task Y, then remap task X to the target processor, and remap task Y to the overheated processor; if the processor core is an idle processor, perform a single unbind-remap, if not satisfied The migration condition of the task is that its temperature is higher than two-thirds of the threshold temperature, and the overheated processor is frequency-down processed.

进一步额，在应用运行阶段，系统在每一个控制间隔内检测热点。当出现热点时，在对应的应用区域内按选择的迁移模式进行任务迁移。Furthermore, the system detects hotspots at every control interval during the application run phase. When a hotspot occurs, perform task migration in the corresponding application area according to the selected migration mode.

选取与采样间隔相同的控制间隔，每过一个控制间隔检测是否有处理器温度高于给定的温度阈值。如果没有，则不进行任务迁移。如果存在超出温度阈值为80摄氏度的处理器核，则由热点上映射的任务定位到对应的应用，在绑定了该应用的事件类实例中按选择的的模式进行任务迁移。热点检测给出过热的应用ID和其过热任务ID。A control interval that is the same as the sampling interval is selected, and whether a processor temperature is higher than a given temperature threshold is detected every time a control interval passes. If not, no task migration is done. If there is a processor core whose temperature exceeds the threshold of 80 degrees Celsius, the task mapped on the hot spot is located to the corresponding application, and the task migration is performed according to the selected mode in the event class instance bound to the application. Hotspot detection gives the overheating application ID and its overheating task ID.

本发明在一个仿真系统中实现，参见图1，该二维众核系统任务调度仿真系统包括应用生成器，事件驱动的模拟多核系统，及HotSpot(中文名热点)温度仿真器是美国弗吉尼大学利用电阻-电容等效模型开发的温度仿真模型，具有准确快速的特点，使用的是最新发布版本HotSpot6.0温度仿真器。应用生成器随机生成任务图即创建出模拟应用的任务个数和任务计算量(范围100-800)，任务间的通信拓扑及每条通信边的通信量(范围50-500)，任务图被表示为带权重的有向无环图，包含了各个任务的计算量和任务间的通信依赖。The present invention is implemented in a simulation system, referring to Fig. 1, this two-dimensional many-core system task scheduling simulation system includes an application generator, an event-driven simulated multi-core system, and a HotSpot (Chinese name hot spot) temperature simulator is an American Virginia The temperature simulation model developed by the university using the resistance-capacitance equivalent model is accurate and fast, and the latest version of the HotSpot6.0 temperature simulator is used. The task graph is randomly generated by the application generator, that is, the number of tasks and task calculations (range 100-800) of the simulated application, the communication topology between tasks and the communication amount of each communication edge (range 50-500), the task graph is Expressed as a directed acyclic graph with weights, it contains the calculation amount of each task and the communication dependencies between tasks.

事件驱动的模拟多核系统模拟了单任务实例和模拟处理器的绑定关系，以及任务的执行计算和任务间通信。处理器资源调度算法包含向应用分配气泡，任务映射和任务运行迁移模拟。任务映射由单任务-单处理器的绑定实现。当应用结束运行后，其任务所占用的处理器都会被释放。模拟任务运行时，任务的计算速度等同于对应处理器的运行频率(系统内每个模拟处理器的运行频率都可调控)，每对任务间的通信速度等同于处理器的路由频率/对应的两处理器间的曼哈顿距离。The event-driven simulated multi-core system simulates the binding relationship between a single task instance and a simulated processor, as well as the task execution calculation and inter-task communication. Processor resource scheduling algorithms include allocation bubbles to applications, task mapping, and task run migration simulation. Task mapping is implemented by single-task-single-processor bindings. When the application finishes running, the processors occupied by its tasks are released. When the simulation task is running, the calculation speed of the task is equal to the operating frequency of the corresponding processor (the operating frequency of each analog processor in the system can be adjusted), and the communication speed between each pair of tasks is equal to the routing frequency of the processor/corresponding Manhattan distance between two processors.

在任务映射阶段，提供了两种可选映射方法，方形映射和通信优先映射：In the task mapping phase, two optional mapping methods are provided, square mapping and communication-first mapping:

方形映射简单的将应用的所有任务按序号顺序连续映射在一个近似方形区域内，该区域占整体应用区域(已分配的空闲连续处理器区域)的一半。Square mapping simply maps all the tasks of the application sequentially in an approximate square area in order of serial number, which accounts for half of the overall application area (allocated free continuous processor area).

通信优先映射首先创建三个动态数组MAP，MET，UNM分别代表已映射的任务，与已映射任务相连(即有通信)的任务，及不在前两个队列中的其他任务。节点代表模拟处理器。初始化时，选择应用区域(已分配的空闲连续处理器区域)的近似几何中心作为第一节点，选取累计通信量最高的任务作为首选任务并映射到第一节点上，放入MAP数组。将与首选任务相连的所有任务依次放入MUP数组，剩余任务放入UNM数组。对MET中的第一个任务，在MAP中找到它的父亲任务及对应的节点，依次从与该节点曼哈顿距离为1，2，3，…的节点处开始搜索，直到找到第一个可用节点，将MET数组内的任务映射在节点上，将其从MET中移动到MAP数组中，并把与其相连且在UNM中的任务从UNM移动到MET中。对MET中每个节点都执行以上步骤，直到MAP数组的大小等于应用的任务数量，映射完成。Communication priority mapping first creates three dynamic arrays MAP, MET, and UNM to represent mapped tasks, tasks connected to mapped tasks (that is, with communication), and other tasks not in the first two queues. Nodes represent analog processors. During initialization, select the approximate geometric center of the application area (allocated free continuous processor area) as the first node, select the task with the highest cumulative traffic as the preferred task and map it to the first node, and put it into the MAP array. Put all the tasks connected with the preferred task into the MUP array in turn, and put the remaining tasks into the UNM array. For the first task in the MET, find its parent task and the corresponding node in the MAP, and start searching from the nodes whose Manhattan distance is 1, 2, 3, ... until the first available node is found , map the tasks in the MET array on the nodes, move them from the MET to the MAP array, and move the tasks connected to it and in the UNM from the UNM to the MET. Perform the above steps for each node in the MET until the size of the MAP array is equal to the number of tasks applied, and the mapping is completed.

根据单处理器的功率模型，在每微秒内计算得出众核系统内各个处理器的的功率，作为当前时刻的功率踪迹。以固定时间段作为采样间隔，每过一个采样间隔即调用HotSpot温度仿真器对系统温度进行仿真，以系统运行消耗时间的每一时刻的功率轨迹作为输入，Hotspot返回该时刻系统内每个处理器核的瞬时模拟温度。当某个处理器的温度超过设定的温度阈值时，热点对应的应用将会执行任务迁移。According to the power model of a single processor, the power of each processor in the many-core system is calculated in every microsecond as the power trace at the current moment. With a fixed time period as the sampling interval, the HotSpot temperature emulator is called to simulate the system temperature every time a sampling interval passes, and the power trajectory at each moment of the system running time is used as input, and Hotspot returns each processor in the system at that moment The instantaneous simulated temperature of the nucleus. When the temperature of a certain processor exceeds the set temperature threshold, the application corresponding to the hotspot will perform task migration.

在任务迁移阶段，提供了三种可选的迁移模式，方形迁移模式，区域内最冷核迁移模式，和区域内最冷邻居核迁移模式。In the task migration phase, three optional migration modes are provided, the square migration mode, the coldest core migration mode in the area, and the coldest neighbor core migration mode in the area.

方形迁移模式基于方形映射，将映射为长方形的任务整体迁移到应用区域内的另一个长方形区域内。The square migration mode is based on the square mapping, and migrates the task mapped as a rectangle to another rectangular area in the application area as a whole.

区域内最冷核迁移模式在应用区域内搜索温度最低的处理器核，当找到一个温度最低的处理器核且满足迁移条件，将过热的任务迁移到最冷核上。The coldest core migration mode in the area searches for the processor core with the lowest temperature in the application area. When a processor core with the lowest temperature is found and the migration conditions are met, the overheated task is migrated to the coldest core.

区域内最冷邻居核迁移模式在与过热处理器相邻的8个核中搜索温度最低的处理器核，若满足一定温度条件，则交换该处理器和过热处理器上的任务，并适当降低过热处理器的频率。The coldest neighbor core migration mode in the area searches for the processor core with the lowest temperature among the 8 cores adjacent to the overheated processor. If a certain temperature condition is met, the tasks on this processor and the overheated processor are exchanged, and the The frequency of the overheated processor.

图2展示了一个实例，表示系统初次到达三个应用后，根据分支界限算法为三个应用分配不同的气泡数量以获得最短的总响应时间。Figure 2 shows an example where after the system first arrives at three applications, different numbers of bubbles are assigned to the three applications according to the branch-and-bound algorithm to obtain the shortest overall response time.

应用1包含七个任务，由气泡-性能模型得到气泡0，1，3，5，7对应的执行时间分别为80、76、62、51、40(为简化表示，仅考虑可以构成规则区域的气泡数，应用区域的最长边小于系统的边长)。Application 1 contains seven tasks, and the execution times corresponding to bubbles 0, 1, 3, 5, and 7 obtained from the bubble-performance model are 80, 76, 62, 51, and 40 respectively (for simplified representation, only those that can form regular regions are considered number of bubbles, the longest side of the application area is less than the side length of the system).

应用2包含五个任务，由气泡-性能模型得到气泡0，1，3，4，5对应的执行时间分别为300、275、250、217、150。Application 2 contains five tasks. According to the bubble-performance model, the execution times corresponding to bubbles 0, 1, 3, 4, and 5 are 300, 275, 250, 217, and 150, respectively.

应用3包含八个任务，由气泡-性能模型得到气泡0，1，2，4，6，8对应的执行时间分别为195、184、166、147、125、99。Application 3 contains eight tasks. According to the bubble-performance model, the execution times corresponding to bubbles 0, 1, 2, 4, 6, and 8 are 195, 184, 166, 147, 125, and 99, respectively.

实施例中b1，b2，b3分别代表分配给应用1的气泡，分配给应用2的气泡，和分配给应用3的气泡。由分支界限法的特性，每次优先扩展响应时间增长最缓慢的分支(总响应时间即下界)，修剪下界大于低层分支下界的上层分支(如第三层的第一个分支的下界为195，修剪第二层所有下界大于195的分支)。在最底层中找到总响应时间最短的分支，即为最优解，实施例中总响应时间的最优解为165，修剪所有下界大于最优解的分支。最优解对应的气泡分配给三个应用，实施例的最优解对应的气泡为{b1＝7，b2＝5，b3＝6}。如图2，最优解节点对应的祖先节点为b1＝7即给应用1分配7个气泡，对应的父亲节点为b2＝5即给应用2分配5个气泡，该节点自己对应b3＝6即给应用3分配6个气泡。In the embodiment, b1, b2, and b3 represent the bubbles allocated to application 1, the bubbles allocated to application 2, and the bubbles allocated to application 3, respectively. According to the characteristics of the branch-and-bound method, the branch with the slowest growth in response time is preferentially expanded each time (the total response time is the lower bound), and the upper-level branch whose lower bound is greater than the lower bound of the lower-level branch is pruned (such as the lower bound of the first branch of the third layer is 195, Prune all branches with a lower bound greater than 195 on the second level). Find the branch with the shortest total response time in the bottom layer, which is the optimal solution. In the embodiment, the optimal solution of the total response time is 165, and all branches whose lower bound is greater than the optimal solution are pruned. The bubbles corresponding to the optimal solution are assigned to three applications, and the bubbles corresponding to the optimal solution in the embodiment are {b1=7, b2=5, b3=6}. As shown in Figure 2, the ancestor node corresponding to the optimal solution node is b1=7, that is, 7 bubbles are allocated to application 1, and the corresponding parent node is b2=5, that is, 5 bubbles are allocated to application 2, and the node itself corresponds to b3=6, that is Dispense 6 bubbles to Application 3.

应用区域划分及资源调度如下：应用1的应用区域需映射7个任务且包含7个气泡，为其分配大小为14的处理器区域，首次适应算法为其搜索到3*5的连续空闲区域(其中一个处理器不被占用)；应用2的应用区域需映射5个任务且包含5个气泡，为其分配大小为10的处理器区域，首次适应算法为其搜索到2*5的连续空闲区域；应用3的应用区域需映射8个任务且包含6个气泡，系统内已没有足够大的空闲区域，应用3在队列中等待。Application area division and resource scheduling are as follows: the application area of application 1 needs to map 7 tasks and contain 7 bubbles, and allocate a processor area with a size of 14 for it. The first adaptation algorithm searches for 3*5 continuous free areas for it ( One of the processors is not occupied); the application area of application 2 needs to map 5 tasks and contain 5 bubbles, and allocate a processor area with a size of 10 for it, and the first adaptation algorithm searches for 2*5 continuous free areas for it ; The application area of application 3 needs to map 8 tasks and contain 6 bubbles. There is not enough free area in the system, and application 3 is waiting in the queue.

对应用1，任务数与气泡数相等，将其按方形映射模式映射到区域内，并选择方形迁移模式，应用1开始运行；对应用2，任务数与气泡数相等，将其按方形映射模式映射到区域内，并选择方形迁移模式，应用2开始运行。For application 1, the number of tasks is equal to the number of bubbles, and it is mapped to the area according to the square mapping mode, and the square migration mode is selected, and application 1 starts running; for application 2, the number of tasks is equal to the number of bubbles, and it is mapped according to the square mapping mode Map to the region, and select the square migration mode, and application 2 starts running.

应用1先结束运行，其应用区域占用的处理器核心被释放。应用3的应用区域需映射8个任务且包含6个气泡，为其分配大小为14的连续空闲处理器区域，首次适应算法为其搜索到3*5的空闲区域。Application 1 finishes running first, and the processor cores occupied by its application area are released. The application area of application 3 needs to map 8 tasks and contain 6 bubbles, and allocate 14 continuous free processor areas for it, and the first adaptation algorithm searches for 3*5 free areas for it.

对应用3，气泡数小于任务数，将其按通信优先映射模式映射到区域内。计算其CCR为3.2大于阈值，为其选择区域最冷核迁移模式，应用3开始运行。For application 3, the number of bubbles is less than the number of tasks, and they are mapped to the area according to the communication priority mapping mode. Its CCR is calculated to be 3.2 greater than the threshold, and the coldest core migration mode in the region is selected for it, and application 3 starts running.

上述内容是结合具体的实施方式对本发明进行的详细说明，但并不能认定本发明的具体实施只限于此内容。对于本发明所属技术领域的普通技术人员而言，在不脱离本发明的原理和精神的前提下，还可以对这些实施进行若干调整、修改、替换和/或变型。本发明的保护范围由所附权利要求及其等同内容限定。The above content is a detailed description of the present invention in conjunction with specific implementation modes, but it cannot be assumed that the specific implementation of the present invention is limited to this content. For those skilled in the art to which the present invention belongs, several adjustments, modifications, substitutions and/or variations can be made to these implementations without departing from the principle and spirit of the present invention. The protection scope of the present invention is defined by the appended claims and their equivalents.

Claims

1. The method for scheduling the processor resources of the many-core system based on the thermal perception dynamic task migration is characterized by comprising the following steps:

step one, detecting whether a waiting sequence is empty; if not, mapping each application in the waiting sequence, and then carrying out the second step; if the clock is empty, ending the resource scheduling, and waiting for the starting of the next clock cycle, and performing the step one;

step two, detecting whether an arrival queue is empty; if the queue is not empty, performing a third step; if the arrival queue is empty, no new application arrives in the clock cycle, the resource scheduling of the processor is not needed, the resource scheduling is ended, and the step one is carried out when the next clock cycle is waited to start;

detecting whether no application is running in the many-core system, if no application is running, using a bubble-performance model and a waiting time model to estimate the execution time and the waiting time under different bubbles for each application respectively, and searching a bubble distribution result with the shortest total response time of the application through a branch limit algorithm, wherein the execution time and the waiting time are used as calculation input of a cost function in a branch limit; if the application runs, performing a fourth step;

the bubble-performance model is used for calculating the corresponding execution time of each application under different numbers of bubbles, and is a polynomial regression model, so that the execution time of the application is pi _i Number of bubbles contained in application area b _i The critical path hop count of the application, i.e. the path hop count h with the longest weighted path _i Average computation time c of tasks in application _i Task average communication time t in application _i The bubble-performance model is as follows:

wherein n is ₁ ～n ₄ Is polynomial order, alpha _k ,β _k ,γ _k ,θ _k The fitting coefficient of the model is obtained by a maximum likelihood method;

to the k-th power of the number of bubbles applying i, < >>

To the k-th power of the number of hops applying the i critical path,/-th>

Computing the k-th power of the time for the average task in application i,/-th>

Average inter-task communication time per two tasks for application i;

the waiting time model is given to the sum of the area size of the current application, namely the bubble number and the task number, and is used for calculating the waiting time of each application corresponding to different bubble numbers and the current arrival rate, the model is a polynomial regression model, the area of the current application is recorded as R, the total number of processors of the system is recorded as |T|, and the average task number of the mapped application is recorded as |A _i Let the waiting time of the current application be eta _i ，η _i Modeling was performed by the following variables: r is the average bubble number-task number ratio of the mapped application, e is the average execution time of the mapped application, λ is the application arrival rate, and the waiting time model is as follows:

wherein a is ₀ Is a constant term, z is a polynomial order, r ^j To the power j, e of the average bubble number-task number ratio of the mapped application ^j To the power j, λ of the average execution time of the mapped application ^j J is a value from 1 to z to the power j of the arrival rate of the current application, delta _j ，ε _j ，μ _j Fitting coefficients of parameter items obtained by maximum likelihood regression;

step four: if the application runs, the available processor of the system is divided into a group of discontinuous idle available areas by the occupied application area, the execution time and the waiting time under the selectable bubble number are estimated for each application by using a bubble-performance model and a waiting time model respectively, the execution time and the waiting time are used as the calculation input of a cost function in a branch limit, and the optimal bubble distribution result is searched by a branch limit algorithm;

fifth, the mapping stage is applied: selecting an idle area by adopting a first-time adaptive heuristic algorithm, and then selecting an applied mapping mode for mapping; the first adaptation heuristic algorithm is as follows: searching idle processors from left to right and from top to bottom in the system, judging whether the idle processors can be used as a starting point of an application area, wherein the condition that the starting point can be used as the starting point is that the product of the number of idle processors on the right side of the same row and the number of idle processors under the same row is larger than or equal to the area of the application area; when a first processor which can be used as the starting point of the application area is found, the processor is used as the idle area at the leftmost upper corner to be distributed to the application, and if the processor which can be used as the starting point of the application area is not found, the application is put into a waiting queue; after the region is selected, mapping the application in the selected mode in the region; the mode of selecting the mapping mode of the application is as follows: when the number of bubbles and tasks applied satisfies 1:1, selecting a square mapping mode by application; when the number of bubbles and tasks applied does not satisfy 1:1, applying a mapping mode for selecting communication priority;

step six, application operation phase: selecting a migration mode for different types of applications according to the mapping mode of the application and the calculation amount-traffic ratio (Computation communication rate, CCR) of the application as selection basis; the migration modes comprise three migration modes, namely: square migration mode, coldest core migration mode in area and coldest neighbor core migration mode in area;

the specific implementation process of the three migration modes is as follows:

(1) The square migration mode is applied, the square mapping is required to be met, the size of an application area is at least twice the number of tasks, all tasks are unbinding with a currently mapped processor when the application is called, and the tasks are mapped into an idle area of the application area in sequence according to the sequence of task serial numbers;

(2) The method comprises the steps that a coldest core migration mode in an area is located on a processor which is a overheat processor and is obtained through hot spot detection when the coldest core migration mode is called, then the processor core with the lowest temperature is searched in an application area, when a lowest-temperature processor core is found and migration conditions of the coldest core migration mode in the area are met, namely the processor is an air bubble, namely an idle processor without a mapping task, the processor is not mapped with the mapping task, the task and the overheat processor, namely the original processor are unbinding and mapped to the processor, and if migration conditions of the coldest core migration mode in the area cannot be met, namely the coldest core is not an air bubble, the overheat processor is subjected to down-conversion treatment;

(3) The method comprises the steps that a coldest neighbor core migration mode in an area is located on a overheat processor when a task ID obtained through hot spot detection is called, the processor core with the lowest temperature is searched in 8 adjacent cores adjacent to the processor, if the searched processor core meets the migration condition of the coldest neighbor core migration mode in the area, namely, the task is mapped on the processor core but the temperature of the processor core is not higher than two thirds of the threshold temperature, or the processor is a bubble, the task migration is executed, and if the task is already in the processor, the tasks on the processor and the overheat processor are exchanged, namely, double unbinding-remapping is carried out; if the processor core is in an idle state, executing single unbinding-remapping, and if the migration condition of the task cannot be met, namely the temperature of the processor core is higher than two thirds of the threshold temperature, performing down-conversion treatment on the overheat processor;

step five, according to the mapping mode of the application and the calculation amount-traffic ratio of the application, the maximum number of bubbles of each application must not exceed the task number, and the condition of selecting the migration mode is as follows:

if the number of bubbles is equal to the number of tasks, a square migration mode is selected for the application to keep the communication distance unchanged in the migration process, and the processor operates in an over-frequency mode;

if the bubble number is not equal to the task number, selecting the coldest core migration of the area for the application with the CCR larger than the threshold value, selecting the coldest neighbor core migration of the area for the application with the CCR smaller than the threshold value, wherein the threshold value is 2 h/(h-1), and h is the hop number of the application critical path, namely the longest path hop number of the weighted path.

2. The method according to claim 1, wherein step three is implemented by a global manager, and when the subsequent applications arrive, the global manager counts all available processor areas in the current system, estimates the execution time and the waiting time corresponding to the available processor areas for each subsequent arriving application, and finds the application-available area correspondence relationship with the minimum cost by a branch limit algorithm.

3. The processor resource scheduling method of claim 1, wherein the branch boundary algorithm is: allocating a different number of bubbles for each application, calculating a current total response time at each node as a cost function; the total response time σ is the maximum of the sum of the latency and execution time of the application at the node to which the bubble has been allocated:

σ＝max{η _i +Π _i }，i∈{0,1,…,n}

wherein eta _i To apply i's latency, pi _i The execution time for application i; and (3) expanding branches with slowest lower bound growth in priority each time, cutting off upper nodes with higher cost than the next nodes, namely longer response time, and finally obtaining a bubble distribution result with shortest total response time, namely an application region division result.

4. The processor resource scheduling method according to claim 1, wherein the selectable number of bubbles is a difference between a total number of processors per free area and a number of application tasks; the number of application tasks refers to the number of processors that are equal to the number of processors that the application is running.