CN119248511A

CN119248511A - Task scheduling method, electronic device, computer storage medium and computer program product

Info

Publication number: CN119248511A
Application number: CN202411756408.1A
Authority: CN
Inventors: 高聪慧
Original assignee: Uc Mobile Co ltd
Current assignee: Uc Mobile Co ltd
Priority date: 2024-12-02
Filing date: 2024-12-02
Publication date: 2025-01-03

Abstract

The embodiment of the present application provides a task scheduling method, electronic device, computer storage medium and computer program product, the task scheduling method includes: obtaining multiple tasks to be scheduled; the multiple tasks include: an upstream task composed of multiple upstream task slices, and a downstream task composed of multiple downstream task slices; determining the task slice dependency, the task slice dependency characterizes the dependency between each upstream task slice and each downstream task slice; in response to the completion of the execution of the current upstream task slice, assigning a task executor to the target downstream task slice that depends on the completed upstream task slice, so that the task executor executes the target downstream task; the target downstream task slice is determined by the task slice dependency. The embodiment of the present application can improve the resource utilization of the task executor, improve the efficiency of task processing, shorten the task execution time, and reduce the delay of task processing.

Description

Task scheduling method, electronic device, computer storage medium and computer program product

Technical Field

Embodiments of the present application relate to the field of computer technologies, and in particular, to a task scheduling method, an electronic device, a computer storage medium, and a computer program product.

Background

Batch processing of tasks refers to a way of processing a set of data related to a task at one time, and is usually performed after a certain amount of data has been accumulated. Batch processing is delayed in nature, and is suitable for large-scale data analysis tasks, which can be run periodically by task scheduling, such as daily, hourly processing of specific data sets. For example, in the technical field of web search based on indexes, the full-scale index construction process can be divided into three tasks, namely a light feature merging task, a data selecting task and a heavy feature merging task in sequence according to the execution sequence. Wherein, each task can be executed in a batch processing mode.

In the conventional case, for a multitasking scenario with a dependency relationship between tasks, that is, a scenario in which an output of an upstream task is used as an input of a downstream task, the upstream task and the downstream task are often sequentially executed in order. With the above-described index building process, it is common for all light feature combinations to be completed before the data pick-up of the downstream task is started.

However, when one or some subtasks (i.e., task slices) of the light feature merge task at the upstream have long execution time, the completion progress of the entire light feature merge task may be significantly slowed down, thereby affecting the start timing of the data pick task at the downstream. Therefore, the batch processing method of the sequential execution not only reduces the resource utilization rate, but also can cause a large delay in task processing.

Disclosure of Invention

Accordingly, embodiments of the present application provide a task scheduling scheme to at least partially solve the above-mentioned problems.

According to a first aspect of an embodiment of the present application, there is provided a task scheduling method, including:

the method comprises the steps of obtaining a plurality of tasks to be scheduled, wherein the plurality of tasks comprise an upstream task formed by a plurality of upstream task fragments and a downstream task formed by a plurality of downstream task fragments;

Determining task slicing dependency relationships, wherein the task slicing dependency relationships represent the dependency relationships existing between each upstream task slicing and each downstream task slicing;

and in response to completion of execution of the current upstream task fragment, allocating a task executor for a target downstream task fragment depending on the completed upstream task fragment to execute the target downstream task by the task executor, wherein the target downstream task fragment is determined by the task fragment dependency relationship.

According to a second aspect of an embodiment of the present application, there is provided another task scheduling method, including:

Determining task slicing dependency relationship in a scheduling process aiming at a plurality of tasks, wherein the plurality of tasks comprise an upstream task formed by a plurality of upstream task slices and a downstream task formed by a plurality of downstream task slices;

receiving a task allocation request sent by a target task executor;

And distributing unexecuted target downstream task fragments to the target task executor to execute the target downstream task fragments through the target task executor, wherein the target downstream task fragments are determined according to the task fragment dependency relationship and depend on the completed upstream task fragments.

According to a third aspect of the embodiment of the present application, there is provided a further task scheduling method, including:

Sending a task allocation request to a task scheduler so that the task scheduler allocates unexecuted target downstream task fragments to the target task executor in a scheduling process for a plurality of tasks;

Acquiring the target downstream task fragments distributed by a task scheduler and executing the target downstream task fragments;

the plurality of tasks comprise an upstream task composed of a plurality of upstream task slices and a downstream task composed of a plurality of downstream task slices, and the target downstream task slices are downstream task slices depending on the completed upstream task slices.

According to a fourth aspect of an embodiment of the present application, there is provided an electronic device, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface complete communication with each other through the communication bus, and the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform an operation corresponding to the method according to any one of the first to third aspects.

According to a fifth aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the first to third aspects.

According to the task scheduling scheme provided by the embodiment of the application, a plurality of tasks to be scheduled are acquired, the plurality of tasks comprise an upstream task formed by a plurality of upstream task slices and a downstream task formed by a plurality of downstream task slices, task slice dependency relationships are determined, the task slice dependency relationships represent the dependency relationships existing between each upstream task slice and each downstream task slice, task executors are distributed for target downstream task slices depending on the completed upstream task slices in response to completion of execution of the current upstream task slices, and the target downstream task slices are determined through the task slice dependency relationships.

In the embodiment of the application, after a plurality of batch processing type upstream and downstream tasks with dependency relationships are obtained, the dependency decomposition of the downstream task on the overall level of the upstream task is converted into the dependency of the downstream task fragments on individual upstream task fragments through the dependency relationships among the task fragments. According to the dependency relationship among the task slices, in the task slice scheduling and executing process, the downstream task slice executing program with the dependency relationship with the individual upstream task slices can be started after the individual upstream task slices in the upstream task are executed without waiting for the overall execution of the upstream task. Therefore, by the scheme of the embodiment of the application, after receiving the batch processing upstream and downstream tasks with the dependency relationship, the upstream and downstream tasks which are originally processed in a batch processing mode can be converted into the task fragments which are processed in a stream processing mode, so that the stream processing of the task fragment layer is realized while the batch processing operation on the data inside each task fragment is reserved.

By the method, the task submitter can still set and submit task logic according to the batch processing task mode without any change of task logic, and in the task scheduling and executing process, the scheme provided by the embodiment of the application can automatically convert batch processing tasks into stream processing task fragments. Therefore, the pseudo-streaming processing of the batch task is realized, so that the batch task executing process has the advantages of continuous data circulation, improved resource utilization rate of a task executor, improved task processing efficiency, shortened task executing time and reduced task processing delay.

In addition, in the embodiment of the application, once the execution of the upstream task fragments is completed, the corresponding downstream task fragment execution program can be started without waiting for the completion of the execution of other upstream task fragments. Therefore, for the first set of slicing execution results corresponding to the first executed upstream task slicing, the embodiment of the application can effectively shorten the output duration of the first set of slicing execution results. For the scene of checking the logical correctness of the task based on the task execution result, the checking efficiency can be effectively improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

FIG. 1 is a schematic diagram of a plurality of batch tasks with associations in an index build scenario;

FIG. 2 is a schematic diagram of a batch task scheduling process in the related art;

FIG. 3 is a flow chart illustrating steps of a task scheduling method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of task shard dependencies in the batch task scenario shown in FIG. 1;

FIG. 5 is a schematic diagram showing a comparison between a batch task scheduling process according to an embodiment of the present application and a batch task scheduling process according to the related art;

FIG. 6 is another comparative schematic diagram between a batch task scheduling process according to an embodiment of the present application and a batch task scheduling process according to the related art;

FIG. 7 is a schematic diagram of an exemplary system to which a task scheduling method of an embodiment of the present application is applied;

FIG. 8 is a flowchart illustrating steps of another task scheduling method according to an embodiment of the present application;

FIG. 9 is a schematic diagram of an interaction flow of a task scheduling process corresponding to the embodiment shown in FIG. 8;

FIG. 10 is a functional architecture diagram of a task scheduler for performing the embodiment of FIG. 8;

FIG. 11 is a flowchart illustrating steps of a task scheduling method according to yet another embodiment of the present application;

FIG. 12 is a schematic diagram of an interaction flow of a task scheduling process in a long-tail task slice execution process;

FIG. 13 is a block diagram of a task scheduler according to an embodiment of the present application;

FIG. 14 is a block diagram of another task scheduler according to an embodiment of the present application;

FIG. 15 is a block diagram of a task scheduler according to yet another embodiment of the present application;

Fig. 16 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to better understand the technical solutions in the embodiments of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the present application, shall fall within the scope of protection of the embodiments of the present application.

Background of the application overview

Referring to fig. 1, fig. 1 is a schematic diagram of a plurality of batch tasks having an association relationship in an index building scenario. The index construction process comprises three tasks of light feature combination, data selection and heavy feature combination, which are sequentially executed according to the execution sequence. The light feature merging task specifically refers to an initial document table containing different original dimension feature values of each page, wherein a plurality of preset light features (new features with smaller influence on index construction) are newly added to each page in the table to obtain a light feature table, the light feature table contains a plurality of groups of data, one group of data corresponds to one page, the data selecting task refers to selecting data from the plurality of groups of data contained in the light feature table according to a preset selecting rule to obtain a plurality of selected groups of data, and the heavy feature merging task refers to carrying out feature addition on the plurality of groups of data again, namely, a plurality of preset heavy features (new features with larger influence on index construction) are newly added to each group of data in the plurality of groups of data to obtain a heavy feature table, so that index construction is carried out based on the obtained heavy feature table.

From the above, it is clear that the three tasks have a dependency relationship in which the execution of the data selection task depends on the result obtained by the light feature merging task, and the execution of the heavy feature merging task depends on the result obtained by the data selection task. That is, for the first two tasks, the lightweight merge task is the upstream task and the data pick task is the downstream task, while for the latter two tasks, the data pick task is the upstream task and the heavy merge task is the downstream task.

In order to improve task execution efficiency, during task execution, a complete task may be generally divided into a plurality of task slices (which may also be referred to as subtasks or task segments, etc.), so that each task slice is executed in parallel. In fig. 1, the light feature merge task is divided into 65536 task slices, which are respectively denoted as 0-0 task slices and 0-1 task slices, the data selection task is divided into 65536 task slices, which are respectively denoted as 1-0 task slices and 1-1 task slices, the heavy feature merge task is divided into 131072 task slices, which are respectively denoted as 2-0 task slices, 2-1 task slices, 2-2 task slices and 2-3 task slices (the number of the task slices obtained by specific division can be customized according to practical conditions, and the embodiment of the application is not limited).

In the related task scheduling scheme, for a multitasking batch processing scenario with a dependency relationship, an upstream task and a downstream task are often sequentially executed. Referring to fig. 2, fig. 2 is a schematic diagram of a batch task scheduling process in the related art. Fig. 2 is a schematic diagram of a multi-task scheduling and executing process in the index construction scenario shown in fig. 1, and for convenience of explanation, it is assumed that, unlike fig. 1, the three tasks of light feature merging, data selecting and heavy feature merging in fig. 2 are all divided into 3 task slices, wherein the task slices included in the light feature merging task are respectively 0-0, 0-1 and 0-2, the task slices included in the data selecting task are respectively 1-0, 1-1 and 1-2, and the task slices included in the heavy feature merging task are respectively 2-0, 2-1 and 2-2. The specific scheduling and executing process is as follows:

And starting 3 task executors in parallel, and respectively distributing 0-0 task slices, 0-1 task slices and 0-2 task slices in the light feature merging task to the 3 task executors. In the light feature merging task execution process, after the 0-0 task slice and the 0-1 task slice are executed, the execution process of the downstream data selection task is not started because the 0-2 task slice is not executed, but after the 0-2 task slice is executed, the 1-0 task slice, the 1-1 task slice and the 1-2 task slice in the data selection task are respectively distributed to the 3 task executors. Similarly, in the execution process of the data selecting task, when the 1-0 task segment and the 1-1 task segment are executed, the execution process of the downstream duplicate feature merging task is not started because the 1-2 task segment is not executed, but after the 1-2 task segment is executed, the 2-0 task segment, the 2-1 task segment and the 2-2 task segment in the duplicate feature merging are respectively distributed to the 3 task executors.

As can be seen from fig. 2, no matter which task of the three tasks is light feature merging, data selecting and heavy feature merging, there is a period of idle waiting period before the task slices No. 0 and No. 1 are executed and the task slice No. 2 is executed. During this idle waiting period, the resources of the task executor are in an idle state, although there are unexecuted downstream task slices. Therefore, the batch processing method of the sequential execution not only reduces the resource utilization rate, but also causes delay in task processing.

Detailed implementation of embodiments of the application

Referring to fig. 3, fig. 3 is a flowchart illustrating steps of a task scheduling method according to an embodiment of the present application. The task scheduling method provided by the embodiment of the application can comprise the following steps:

Step 302, obtaining a plurality of tasks to be scheduled, wherein the plurality of tasks comprise an upstream task formed by a plurality of upstream task fragments and a downstream task formed by a plurality of downstream task fragments.

Specifically, the plurality of tasks in the embodiment of the present application may be a plurality of tasks having an upstream-downstream dependency relationship with each other, and the specific number of the tasks is not limited. The specific number may be 2, or any natural number greater than 2, such as 3,4, etc. When the number of tasks to be scheduled is greater than 2, any two adjacent tasks have an upstream-downstream dependency relationship according to the execution sequence. In the index construction scenario shown in fig. 1, there are 3 tasks to be scheduled, and according to the execution sequence, the light feature merging task is an upstream task and the data selecting task is a downstream task for the first two tasks, while the data selecting task is an upstream task and the heavy feature merging task is a downstream task for the last two tasks.

To improve task execution efficiency, and facilitate parallel execution, a complete task may be generally divided into multiple task slices. In the case of an upstream task, it may be divided into a plurality of upstream task slices, and likewise in the case of a downstream task, it may be divided into a plurality of downstream task slices. In the actual scheduling process, the number of the task fragments contained in each task can be preset by a task submitter, or can be set or calculated by a task scheduler in combination with the total amount of data to be processed contained in the task and the calculation power condition of the task scheduler before the task scheduling.

After the number of task fragments contained in the task is determined, the plurality of pieces of data to be processed can be classified into one task fragment according to the data amount of each piece of data to be processed contained in the task. Further, when the data to be processed is classified, the data processing amount contained in each finally obtained task fragment can be balanced as much as possible from the data amount perspective.

Step 304, determining task shard dependency relationships, wherein the task shard dependency relationships represent dependency relationships existing between each upstream task shard and each downstream task shard.

Specifically, if a dependency relationship exists between a certain upstream task segment and a certain downstream task segment, it indicates that the execution of the downstream task segment depends on an execution result corresponding to the upstream task segment, the downstream task segment can start to execute after the execution of the upstream task segment is completed, and the execution result of the upstream task segment corresponding to the upstream task segment is used as an input condition for the execution of the downstream task segment.

The dependency relationship between task slices can be determined according to the relationship between the number of upstream task slices and the number of downstream task slices. When the number of the upstream task slices is equal to that of the downstream task slices, a one-to-one dependency relationship exists between the downstream task slices and the upstream task slices, namely, an upstream task slice execution result corresponding to one upstream task slice is used as an execution input condition of one downstream task slice, and when the number of the downstream task slices is N (N is a natural number greater than 1) times that of the upstream task slices, an N-to-one dependency relationship exists between the downstream task slices and the upstream task slices, namely, an upstream task slice execution result corresponding to one upstream task slice is used as an execution input condition of N downstream task slices.

As shown in fig. 4, fig. 4 is a schematic diagram of task slicing dependency in the batch task scenario shown in fig. 1. In terms of the light feature merging task and the data selecting task, since the number of task slices included in each task is 65536, there is a one-to-one dependency relationship between the downstream data selecting task slices and the upstream light feature merging task slices, that is, an upstream task slice execution result corresponding to one upstream task slice will be used as an execution input condition of one downstream task slice, while in terms of the data selecting task and the heavy feature merging task, when the number of downstream task slices included in the downstream heavy feature merging task is 2 times that of the upstream task slices, there is a 2-to-one dependency relationship between the downstream task slices and the upstream task slices, that is, an upstream task slice execution result corresponding to one upstream data selecting task slice will be used as an execution input condition of 2 downstream heavy feature merging task slices.

And step 306, in response to the completion of the current upstream task shard execution, allocating a task executor to the target downstream task shard depending on the completed upstream task shard so as to execute the target downstream task by the task executor, wherein the target downstream task shard is determined by the task shard dependency relationship.

Specifically, the current upstream task tile is the upstream task tile currently being executed. In the embodiment of the application, when the current execution of the upstream task segment is determined to be completed, the execution result corresponding to the upstream task segment is indicated at the moment and obtained, so that a task executor can be allocated to a target downstream task segment depending on the completed execution of the upstream task segment, and the target downstream task segment can be executed by taking the execution result as an input condition.

Referring to fig. 5, fig. 5 is a diagram illustrating a comparison between a batch task scheduling process according to an embodiment of the present application and a batch task scheduling process according to the related art. Fig. 5 (a) is a diagram illustrating a batch task scheduling process according to the related art, and fig. 5 (b) is a diagram illustrating a skin-free scheduling process according to an embodiment of the present application. Specifically:

as can be seen from the graph (b) in fig. 5, the specific scheduling and execution process is:

Similarly to fig. (a), 3 task executors are started in parallel, and 0-0 task slices, 0-1 task slices, and 0-2 task slices in the light feature merge task are respectively allocated to the above 3 task executors. In the light feature merging task executing process, after the 0-0 task slice and the 0-1 task slice are executed, the 1-0 task slice depending on the 0-0 task slice and the 1-1 task slice depending on the 0-1 task slice in the downstream data selecting task are started respectively without waiting for the execution of the 0-2 task slice after the execution of the 0-0 task slice and the 0-1 task slice is finished, then, after the execution of the 1-0 task slice is finished, the 2-0 task slice depending on the 1-0 task slice in the downstream heavy feature merging task can be started, after the execution of the 1-1 task slice is finished, the 2-1 task slice depending on the 1-1 task slice in the downstream heavy feature merging task can be started without waiting for the execution of the rest task slice in the data selecting task slice after the execution of the 1-1 task slice is finished, and similarly, after the execution of the 0-2 task slice depending on the 1-2 task slice in the downstream data selecting task slice can be started, and after the execution of the 1-2 task slice is finished, the 2 task slice can be started.

Comparing the graph (a) and the graph (b) in fig. 5, it can be known that, by the task scheduling scheme provided by the embodiment of the application, the idle waiting period originally existing in the graph (a) can be eliminated in the task scheduling and executing process, so that the resource utilization rate of the task executor is improved, and the total time consumed in the task executing process is further reduced.

In summary, in the task scheduling method provided by the embodiment of the present application, after a plurality of batch-type upstream and downstream tasks with dependency relationships are obtained, the dependency decomposition of the downstream task on the overall level of the upstream task is converted into the dependency of the downstream task partition on the individual upstream task partition through the dependency relationships among the task partitions. According to the dependency relationship among the task slices, in the task slice scheduling and executing process, the downstream task slice executing program with the dependency relationship with the individual upstream task slices can be started after the individual upstream task slices in the upstream task are executed without waiting for the overall execution of the upstream task. Therefore, by the scheme of the embodiment of the application, after receiving the batch processing upstream and downstream tasks with the dependency relationship, the upstream and downstream tasks which are originally processed in a batch processing mode can be converted into the task fragments which are processed in a stream processing mode, so that the stream processing of the task fragment layer is realized while the batch processing operation on the data inside each task fragment is reserved.

The task scheduling method provided by the embodiment of the application can be executed by any appropriate device with data processing capability, including but not limited to a PC, a server and the like.

Optionally, in some of these embodiments, in response to completion of the current upstream task shard execution, the process of assigning a task executor to a target downstream task shard that depends on the completed upstream task shard may include:

judging whether unexecuted upstream task fragments exist or not according to the completion of the execution of the current upstream task fragments;

If no unexecuted upstream task fragments exist, a task executor is distributed for the target downstream task fragments which depend on the completed upstream task fragments;

If the unexecuted upstream task fragment exists, a task executor is allocated to the unexecuted upstream task fragment, the unexecuted upstream task fragment is used as the updated current upstream task fragment, the unexecuted upstream task fragment is executed by the task executor, and the step of judging whether the unexecuted upstream task fragment exists or not in response to the completion of the execution of the current upstream task fragment is returned until the allocation of each task fragment is completed.

Specifically, when the number of task processors is limited and the number of task slices obtained by dividing the task is greater than the number of task processors capable of performing task slicing processing, all the task slices cannot be executed in parallel, but a part of the task slices are executed in parallel by a plurality of task processors, and the rest of the task slices are executed after the execution of the part of the task slices is waited. Therefore, in view of the above situation, in the embodiment of the present application, after the current upstream task is executed, it may be first determined whether there is an upstream task partition that is not executed (i.e., an upstream task partition to be scheduled), and when there is an upstream task partition that is not executed, the upstream task partition that is not executed is preferentially executed. After all the upstream task fragments are scheduled and executed, starting the execution process of the downstream task fragments until all the task fragments are executed.

Referring to fig. 6, fig. 6 is another comparative schematic diagram between a batch task scheduling process according to an embodiment of the present application and a batch task scheduling process according to the related art. For ease of understanding, the above-described embodiment of the present application is explained below with reference to fig. 6:

fig. 6 is a schematic diagram of a multi-task scheduling and executing process in the index construction scenario shown in fig. 1, for convenience of explanation, it is assumed that, unlike fig. 1, the three tasks of light feature merging, data selecting and heavy feature merging in fig. 6 are all divided into 4 task slices, wherein the task slices included in the light feature merging task are respectively 0-0, 0-1, 0-2 and 0-3, the task slices included in the data selecting task are respectively 1-0, 1-1, 1-2 and 1-3, and the task slices included in the heavy feature merging task are respectively 2-0, 2-1, 2-2 and 2-3. In addition, it is also assumed that the number of task executors that can run in parallel at the same time is 3 due to limitation of computational power resources.

Fig. 6 (a) is a diagram of a batch task scheduling process according to the related art, and fig. 6 (b) is a diagram of a task scheduling process according to an embodiment of the present application.

Specifically, as can be seen from the graph (a) in fig. 6, the specific task scheduling and executing process of the related art is that 3 task executors are started in parallel, and 0-0, 0-1 and 0-2 task slices are respectively allocated to the 3 task executors, and when the 0-0 and 0-1 task slices are simultaneously executed, any task executor executing the task slices can be allocated to execute the 0-3 task slices again (the task executor executing the 0-1 task slices is adopted to execute the 0-3 task slices in fig. 6), and at this time, the 0-2 task slices are also synchronously executed. After the 0-2 task is executed, the 0-3 task fragments are not executed, so that the task executors which originally execute the 0-2 task and the 0-0 task are in an idle waiting period until the 0-3 task fragments are executed, and then the execution of the 1-0 task fragments, the 1-1 task fragments and the 1-2 task fragments are started in parallel. Then, the 1-1 task segment is executed first, and at this time, the task executor that distributes and executes the 1-1 task segment executes the 1-3 task segment again, and at this time, the 1-0 task segment and the 1-2 task segment are executed synchronously. After the execution of 1-0 or 1-2 is completed, the task executor which has previously executed 1-0 and 1-2 is in idle waiting period because the 1-3 task fragments have not been executed, until the 1-3 task fragments are executed, and then the execution of 2-0, 2-1 and 2-2 task fragments are started in parallel. And then, the 2-0 task fragments are executed and completed at first, and a task executor for reassigning and executing the 2-0 task fragments executes the 2-3 task fragments until the 2-3 task fragments are executed.

As can be seen from the graph (b) in fig. 6, the specific scheduling and execution process is:

Similarly to the figure (a), 3 task executors are started in parallel, and 0-0, 0-1 and 0-2 task slices are respectively distributed to the 3 task executors, when the 0-0 and 0-1 task slices are simultaneously executed, whether an unexecuted (to be scheduled) upstream task exists is judged, the 0-1 task executor can be distributed to execute the 0-3 task slices again because the 0-3 upstream task slice exists, at the moment, the unexecuted (to be scheduled) upstream task does not exist, and therefore, the downstream task slices depending on the executed 0-0 and 0-1 can be determined, in particular, the downstream task slices depending on the executed 0-0 are 1-0, the downstream task slices depending on the executed 0-1 are 1-1, therefore, the task executor which can be distributed to execute the 0-0 can start to execute the 1-0, when the 0-2 is executed, the unexecuted upstream task slices do not exist, and thus the downstream task slices depending on the 0-1 can be distributed to the 0-2 can be distributed, and the task slices can be distributed to the 0-2 can be distributed to the downstream task slices shown in the figure (2, and the figure 2 can be distributed to the figure 2) and the figure (2) can be distributed to execute the task slices) according to the figure (2).

Comparing the graph (a) and the graph (b) in fig. 6, it can be known that, by the task scheduling scheme provided by the embodiment of the application, the idle waiting period originally existing in the graph (a) can be eliminated in the task scheduling and executing process, so that the resource utilization rate of the task executor is improved, and the total time consumed in the task executing process is further reduced.

Further, in the above embodiment of the present application, when there is an upstream task segment that is executed, it is determined whether there is an upstream task segment that is not executed, and when there is an upstream task segment that is executed, the upstream task segment is preferentially executed, and then, when there is no upstream task segment that is not executed, the target downstream task segment that depends on the executed upstream task segment is reassigned to be executed. Through the above process, for the legacy upstream task slices which cannot be executed in the previous parallel execution process, the corresponding execution results can be obtained as soon as possible, and further, the downstream task slice execution process which depends on the legacy upstream task slices can be started as soon as possible, thereby being beneficial to improving the acquisition efficiency of the final slice execution results corresponding to the task slices.

Optionally, in some embodiments, the task scheduling method may further include:

in the current task sharding executing process, recording task executing time length of a first task executor on the current task sharding, wherein the first task executor is a task executor configured to execute the current task sharding;

If the execution time length exceeds the preset time length, a second task executor is allocated to the current task fragment so as to execute the current task fragment in parallel through the first task executor and the second task executor;

And ending the execution process of the current task fragment in response to the task execution result of the current task fragment.

In particular, during task slicing scheduling and executing, multiple task slices may be scheduled and executed at the same time, but because different task slices may involve different data amounts and complexities, and the computing power resources or running states of different task executors may also be different, a problem that the executing process of some task slices may take too long time may be caused (for convenience of description, task slices with executing time longer than a preset time are referred to as long-tail task slices in the embodiment of the present application). In this case, the completion time of the overall task is affected by the task slicing having an excessively long execution time, which may cause an increase in the waiting time. This is especially the case in large-scale data processing, which in turn affects overall system performance and resource utilization.

In order to solve the problems, in the above embodiment of the present application, a solution is adopted in which long-tail task fragments run in parallel, when the execution duration of a certain task executor (referred to as a first task executor in the embodiment of the present application for convenience of description) for a certain task fragment is too long (if the execution duration exceeds a preset duration), the first task executor is reserved to execute the task fragment, and simultaneously, the second task executor executes the task fragment in parallel, and when any one of the first task executor and the second task executor obtains the task execution result of the long-tail task fragment, the execution process for the long-tail task fragment can be ended. For example, if the second task executor obtains the task execution result of the long-tail task partition first, the execution process of the long-tail task partition by the first task executor can be ended, otherwise, if the first task executor obtains the task execution result of the long-tail task partition first, the execution process of the long-tail task partition by the second task executor can be ended.

When the computing power resource of the first task executor is insufficient or the long tail of the task segmentation execution duration process is caused by the abnormal running state of the first task executor, the execution result of the long tail task segmentation can be obtained again in a shorter time through the parallel running solution provided by the embodiment of the application, so that the overall efficiency and stability of task scheduling and execution are improved.

In addition, in the embodiment of the application, the specific value and the specific determination mode of the preset duration are not limited, and the preset duration can be set in a self-defined manner according to actual conditions. For example, the specific value of the preset duration can be set according to experience, the data quantity and the repetition degree of the historical task fragments can be combined, the execution time of the historical task fragments is analyzed and counted, the average execution duration of each task fragment is estimated, and the preset duration is set based on the average execution duration. Illustratively, if the execution duration of the task segment exceeds 1.5 times the average execution duration, it may be determined that the long tail problem described above occurs.

Referring to fig. 7, fig. 7 is a schematic diagram of an exemplary system to which the task scheduling method according to the embodiment of the present application is applied, and for convenience of understanding, an application scenario of the task scheduling method according to the embodiment of the present application is first explained with reference to fig. 7.

As shown in fig. 7, the system 700 may include a task scheduler 702, a communication network 704, and/or one or more task executors 706, which are illustrated in fig. 7 as multiple task executors.

Task scheduler 702 may be any suitable device for performing task scheduling including, but not limited to, a server cluster, a computing cloud server cluster, and the like. In some embodiments, task scheduler 702 may perform any suitable function. For example, in some embodiments, task scheduler 702 may receive a task allocation request sent by task executor 706 and allocate the appropriate task shards to task executor 706 for execution by task executor 706.

In some embodiments, communication network 704 may be any suitable combination of one or more wired and/or wireless networks. For example, the communication network 704 can include any one or more of the Internet, an intranet, a wide area network (Wide Area Network, WAN), a local area network (Local Area Network, LAN), a wireless network, a digital subscriber line (Digital Subscriber Line, DSL) network, a frame relay network, an asynchronous transfer mode (Asynchronous Transfer Mode, ATM) network, a virtual private network (Virtual Private Network, VPN), and/or any other suitable communication network. Task executor 706 is capable of being connected to communication network 704 via one or more communication links (e.g., communication link 712), which communication network 704 is capable of being linked to task scheduler 702 via one or more communication links (e.g., communication link 714). The communication link may be any communication link suitable for transferring data between task executor 706 and task scheduler 702, such as a network link, a dial-up link, a wireless link, a hardwired link, any other suitable communication link, or any suitable combination of such links.

Task executor 706 may include any one or more devices suitable for performing task processing. In some embodiments, task executor 706 may include any suitable type of device. For example, in some embodiments, task executor 706 may include a laptop computer, a desktop computer, and/or any other suitable type of user device.

Referring to fig. 8, fig. 8 is a flowchart illustrating steps of another task scheduling method according to an embodiment of the present application. The task scheduling method provided by the embodiment of the application can be executed by the task scheduler 702 in the system shown in fig. 7. Specifically, the task scheduling method provided in the present embodiment includes the following steps:

Step 802, determining task slicing dependency relationship in a scheduling process for a plurality of tasks, wherein the plurality of tasks comprise an upstream task formed by a plurality of upstream task slices and a downstream task formed by a plurality of downstream task slices, and the task slicing dependency relationship represents the dependency relationship between each upstream task slice and each downstream task slice.

Step 804, a task allocation request sent by a target task executor is received.

Specifically, the target task executor in the embodiment of the present application may be any task executor in the system shown in fig. 7. When the target task executor is in an idle state or in a task slicing execution state, the own computing power resource can support the target task executor to start a plurality of task slicing execution processes in parallel, the task processor can send a task allocation request to the task scheduler.

Step 806, distributing the unexecuted target downstream task segment to the target task executor to execute the target downstream task segment by the target task executor, wherein the target downstream task segment is determined according to the task segment dependency relationship and depends on the completed downstream task segment.

Optionally, in some of these embodiments, assigning the unexecuted target downstream task shard to the target task executor to execute the target downstream task shard by the target task executor includes:

the method comprises the steps of judging whether unexecuted upstream task fragments exist, if unexecuted upstream task fragments exist, distributing unexecuted target downstream task fragments to a target task executor to execute the target downstream task fragments through the target task executor, and if unexecuted upstream task fragments exist, distributing unexecuted upstream task fragments to the target task executor to execute unexecuted upstream task fragments through the target task executor.

Specifically, in the above embodiment of the present application, when an upstream task segment is executed, it is determined whether there is an upstream task segment that has not been executed, and when there is an upstream task segment that has been executed, the upstream task segment is preferentially executed, and then, when there is no upstream task segment that has not been executed, the target downstream task segment that depends on the executed upstream task segment is reassigned for execution. Through the above process, for the legacy upstream task slices which cannot be executed in the previous parallel execution process, the corresponding execution results can be obtained as soon as possible, and further, the downstream task slice execution process which depends on the legacy upstream task slices can be started as soon as possible, thereby being beneficial to improving the acquisition efficiency of the final slice execution results corresponding to the task slices.

Judging whether long-tail task fragments with execution duration exceeding the preset duration exist or not, if not, distributing unexecuted target downstream task fragments to a target task executor to execute the target downstream task fragments through the target task executor, if so, distributing long-tail task fragments to the target task executor to execute the long-tail task fragments through the target task executor, and after a task execution result of the long-tail task fragments is obtained, ending the execution process of the long-tail task fragments.

Specifically, for the long-tail task fragments with possibly long execution duration, in the above embodiment of the present application, a solution of parallel running of the long-tail task fragments is adopted, where after the task scheduler determines that the long-tail task fragments exist, the long-tail task fragments are executed in parallel by a new task executor (the target task executor) that sends a task allocation request while the task fragments are reserved, and when any task executor obtains a task execution result of the long-tail task fragments, the execution process of the long-tail task fragments can be ended.

When the problem of long tail of the task segmentation execution duration process is caused by insufficient computational power resources of a task executor abnormal running state, the execution result of the long tail task segmentation can be obtained again in a shorter time through the parallel running solution provided by the embodiment of the application, so that the overall efficiency and stability of task scheduling and execution are improved.

Optionally, in some embodiments, receiving the task allocation request sent by the target task executor includes:

The method comprises the steps of receiving a heartbeat signal sent by a target task executor regularly, wherein the heartbeat signal comprises current task fragment identification information and task request identification information, the task request identification information represents whether the target task executor requests a new task fragment, and if the target task executor requests the new task fragment according to the task request identification information, determining that a task allocation request sent by the target task executor is received.

Correspondingly, distributing the long-tail task fragments to the target task executor, wherein the method comprises the steps of judging whether the task fragments being executed by the target task executor are long-tail task fragments according to the current task fragment identification information, and if not, distributing the long-tail task fragments to the target task executor.

Specifically, as shown in fig. 7, a task scheduling system generally includes a larger number of task schedulers, and the specific execution states of the task schedulers are not the same in the task scheduling and executing process. In order to facilitate the task scheduler to know the task execution status of each task executor in time and the progress of the whole task scheduling job, in the above embodiment of the present application, the task executor periodically sends a heartbeat signal containing the current task slice identification information and the task request identification information to the task scheduler, so as to report the relevant information of the task slice being executed by itself (such as the identification information of the task slice being executed currently, the task slice execution detail information, etc.) to the task scheduler through the heartbeat signal, and inform the task scheduler whether the task scheduler can receive and execute the new task slice at the current moment. And after receiving the heartbeat signal, the task scheduler updates the task fragmentation execution condition according to the current task fragmentation identification information contained in the heartbeat signal. In addition, when the task request identification information contained in the heartbeat signal indicates that the target task executor requests a new task fragment, a new task fragment to be executed is allocated to the target task executor according to the task fragment execution condition.

And after the task scheduler distributes a certain task fragment to a certain task executor, if the task scheduler fails to receive the heartbeat signal sent by the task executor on time, the task scheduler characterizes that the task executor is abnormal in operation, and at this time, the task scheduler can redistribute the task fragment originally distributed to the task processor to other task executors. By the method, the task scheduler can timely find out abnormal conditions in the task segmentation executing process, and further timely adopts corresponding remedial measures to avoid the situation that the task segmentation is not successfully executed.

In addition, for the task executor that executes the long-tail task fragments in parallel, when the task scheduler receives the fragment execution result fed back by one of the task executors through the heartbeat signal, the task executor may return an execution stop notification to stop the continuous execution of the long-tail task fragments when the other task executors that execute the long-tail task fragments in parallel send the heartbeat signal.

In summary, according to the above embodiment of the present application, the task scheduler may timely learn the task execution status of each task executor and the progress of the whole task scheduling job, so as to better perform the scheduling of the subsequent task slices. In addition, when the task executor executing the task fragments runs abnormally, the task scheduler can timely find out and adopt corresponding remedial measures, so that the situation that the task fragments cannot be successfully executed is avoided.

Referring to fig. 9, fig. 9 is a schematic diagram of a task scheduling process interaction flow corresponding to the embodiment shown in fig. 8. The task scheduling process described above is explained below with reference to fig. 9:

The task executor periodically sends heartbeat signals to the task scheduler in the task execution process. As shown in fig. 9, for one heartbeat cycle, the transmitted heartbeat signal may include the current task fragment identification information and the task request identification information. And assuming that the task executor is currently executing the task fragments, the task executor can report relevant information such as the identification of the task fragments currently executing to the task scheduler, and can report task request identification information of the task fragments expected to be applied to the task scheduler under the condition of permission of computing power resources.

After the heartbeat signal is received, the task scheduler enters a judging stage, namely whether a task slice to be executed exists currently or not, whether the task slice to be executed and the task slice being executed by the task executor are the same task slice under the condition that the task slice to be executed exists, if the task slice to be executed currently and the task slice to be executed are different from the task slice being executed by the task executor, the task slice to be executed is distributed to the task executor, otherwise, if the task slice to be executed does not exist, a new execution task slice notification is returned. Thereafter, the task scheduler may update the heartbeat signal record according to the current interaction process.

In addition, when a task executor (for convenience of description, referred to herein as a first task executor) is allocated to execute long-tail tasks in parallel, the task scheduler may also return a notification to the first task executor about whether to continue executing according to whether a long-tail task execution completion notification reported by other task executors is received. Specifically, when the long-tail task execution completion notification reported by other task executors is received, the execution stopping notification can be returned to the first task executor, otherwise, if the long-tail task execution completion notification reported by other task executors is not received, the continuous execution notification is returned to the first task executor.

Referring to fig. 10, fig. 10 is a functional architecture diagram of a task scheduler for executing the embodiment shown in fig. 8. Referring to fig. 10, the task scheduler according to the embodiment of the present application may perform task management on a plurality of batch tasks having a dependency relationship, and has the capabilities of "batch task pseudo-streaming", "long-tail parallel operation", and "failed retry". Specifically, in the task scheduling and management process, the upstream and downstream tasks which are originally processed in a batch processing mode can be converted into task fragments which are processed in a stream processing mode, batch processing operation is reserved for internal data of each task fragment, meanwhile stream processing of a task fragment layer is achieved, namely, batch task pseudo-streaming capability shown in fig. 10 is reserved, for long-tail task fragments with execution duration exceeding a preset duration, a solution of parallel operation of the long-tail task fragments is adopted, namely, the capability of parallel operation of the long-tail task fragments shown in fig. 10 is adopted, in addition, after a task scheduler distributes a certain task fragment to a certain task executor, if the task scheduler fails to receive a heartbeat signal sent by the task executor on time, abnormal operation of the task executor is represented, and at the moment, the task scheduler can redistribute the task fragment which is originally distributed to the task executor to other task executors, namely, fail retry capability shown in fig. 10.

Referring to fig. 11, fig. 11 is a flowchart illustrating steps of a task scheduling method according to still another embodiment of the present application. The task scheduling method provided by the embodiment of the application can be executed by the task executor 706 in the system shown in fig. 7. Specifically, the task scheduling method provided in the present embodiment includes the following steps:

Step 1102, a task allocation request is sent to a task scheduler, so that the task scheduler allocates unexecuted target downstream task slices to target task executors in a scheduling process for a plurality of tasks.

Step 1104, obtaining a target downstream task slice allocated by the task scheduler, and executing the target downstream task slice.

The plurality of tasks comprises an upstream task consisting of a plurality of upstream task slices and a downstream task consisting of a plurality of downstream task slices, and the target downstream task slices are downstream task slices depending on the completed upstream task slices.

Optionally, in some embodiments, the target downstream task tile is a long tail task tile, the method further comprising:

after the target downstream task fragment execution is completed, a target catalog writing request is sent to a task dispatcher;

And in response to receiving the write permission notification returned by the task scheduler, writing the task execution result of the target downstream task partition into the target directory.

Specifically, in the embodiment of the application, for different task executors that execute long-tail task fragments in parallel, data generated in the execution process can be stored in the corresponding private domain catalogs respectively, but not the target catalogs corresponding to the long-tail task fragments finally. And then, when the execution of a certain task executor for executing the long-tail task fragments in parallel is completed, applying for the task scheduler that the data in the private domain directory is written into the target directory. The specific request process is that a target catalog writing request is sent to a task scheduler, if the task scheduler determines that other task executors which have completed the long tail task slicing do not exist currently, a writing permission notification is returned, and at the moment, the task executors can write data in a private catalog into the target catalog.

Through the application mechanism (also called as an application locking mechanism), the task executor for executing the long-tail task fragments in parallel can be ensured to smoothly store the execution result to the target directory by the first task executor for completing the execution task, and the execution efficiency of the long-tail task fragments can be improved while the parallel writing conflict is avoided.

Optionally, in some embodiments, after sending the target directory write request to the task scheduler, the method further comprises:

and in response to receiving the write permission occupation notification returned by the task scheduler, periodically sending a target directory write request to the task scheduler according to a preset time interval.

Specifically, when the task scheduler determines that there are other task executors that have completed the long-tail task sharding, the write permission is already occupied by the other task executors, so the task scheduler may return a write permission occupation notification to the current task executor, where the current task executor cannot write data into the target directory.

Further, a write error may occur in consideration of a data write operation into the target directory. Thus, in the above embodiment of the present application, the current task executor may continuously and periodically send a write request to the task scheduler before it has not been able to determine that the other task executor has been successfully written. After the task scheduler determines that the writing of the other task executors is successful, after the task scheduler receives the heartbeat signal sent by the current task executor again, the task scheduler returns a notice of stopping task execution to the current task executor as feedback, and the current task executor stops the operation of sending the target catalog writing request to the task scheduler.

Through the process, the reliability of the acquisition of the execution result of the long-tail task segmentation can be effectively improved, and the problem that the final execution result of the long-tail task segmentation cannot be acquired due to data writing errors is avoided.

Optionally, in some embodiments, sending the task allocation request to the task scheduler includes:

The method comprises the steps of periodically sending a heartbeat signal to a task scheduler, wherein the heartbeat signal comprises task request identification information, the task request identification information represents whether a task executor requests a new task partition, if the task executor requests the new task partition according to the task request identification information, determining to send a task allocation request to the task scheduler, and if the target downstream task partition fails to be executed, recording a failure reason and stopping sending the heartbeat signal to the task scheduler.

Specifically, through the heartbeat mechanism in the above embodiment of the present application, the task scheduler can timely understand the task execution state of each task executor and the progress condition of the whole task scheduling work, so as to better perform the scheduling of the subsequent task fragments. In addition, when the task executor executing the task fragments runs abnormally, the task scheduler can timely find out and adopt corresponding remedial measures, so that the situation that the task fragments cannot be successfully executed is avoided.

Referring to fig. 12, fig. 12 is a schematic diagram of a task scheduling process interaction flow in a long-tail task slice execution process. The following explains the long-tail task slice execution process with reference to fig. 12:

in the process of executing the task, a task executor corresponding to the long-tail task fragments firstly enters a first judging stage, if the task is successfully executed, the failure reason is recorded and the heartbeat signal is stopped to be sent to a task dispatcher, if the task is successfully executed, a target catalog preemption lock request (namely, a target catalog writing request) is sent to the task dispatcher, and a second judging stage, if the condition that other task executors which have completed the long-tail task fragments at the moment are not judged to exist, preemption succeeds, the task dispatcher returns preemption success notification (namely, a writing permission notification), at the moment, the task executor can write data in a private catalog into the target catalog, and accordingly, the task executor can record relevant state data of the long-tail task fragments, statistics of executing time consumption, deleting process data and the like to the task dispatcher, if the condition that other task executors which have completed the long-tail task fragments at the moment exist are judged to fail, the task executor returns preemption success notification (namely, the task executor is also scheduled to write permission notification) at the moment, and then the task executor sends preemption lock request to the target catalog at regular intervals.

Fig. 13 is a block diagram illustrating a task scheduling device according to an embodiment of the present application. The task scheduling device provided by the embodiment of the application comprises the following components:

A task obtaining module 1302, configured to obtain a plurality of tasks to be scheduled, where the plurality of tasks includes an upstream task that is formed by a plurality of upstream task slices, and a downstream task that is formed by a plurality of downstream task slices;

a dependency determination module 1304, configured to determine task shard dependencies, where the task shard dependencies characterize dependencies existing between each upstream task shard and each downstream task shard;

A scheduling module 1306, configured to allocate a task executor for a target downstream task tile that depends on the completed upstream task tile in response to completion of execution of the current upstream task tile, where the target downstream task tile is determined by a task tile dependency relationship.

Optionally, in some embodiments, the scheduling module 1306, when assigning the task executor to the target downstream task shard that depends on the completed upstream task shard in response to the current upstream task shard execution being completed, is specifically configured to:

Optionally, in some embodiments, the task scheduling device further includes:

The system comprises an execution duration recording module, a first task executor, a second task executor, a first task partition, a second task partition, a third task partition, a fourth task partition and a fourth task partition, wherein the execution duration recording module is used for recording the task execution duration of the first task executor for the current task partition in the current task partition execution process;

the parallel distribution module is used for distributing a second task executor for the current task fragment if the execution time length exceeds the preset time length so as to execute the current task fragment in parallel through the first task executor and the second task executor;

and the execution stopping module is used for ending the execution process of the current task fragment in response to the task execution result of the current task fragment.

The task scheduling device in this embodiment is configured to implement the corresponding task scheduling method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which is not described herein again. In addition, the functional implementation of each module in the task scheduling device of the present embodiment may refer to the description of the corresponding portion in the foregoing method embodiment, which is not repeated herein.

Fig. 14 is a block diagram illustrating a task scheduler according to another embodiment of the present application. The task scheduling device provided by the embodiment of the application is positioned in a task executor, and the task scheduling module comprises:

A relationship determination module 1402, configured to determine task shard dependency relationships in a scheduling process for a plurality of tasks, where the plurality of tasks includes an upstream task that is composed of a plurality of upstream task shards and a downstream task that is composed of a plurality of downstream task shards;

a request receiving module 1404, configured to receive a task allocation request sent by a target task executor;

And a task allocation module 1406 for allocating the unexecuted target downstream task slices to the target task executor to execute the target downstream task slices by the target task executor, wherein the target downstream task slices are determined according to the task slice dependency relationship and depend on the completed upstream task slices.

Optionally, in some of these embodiments, the task allocation module 1406 is specifically configured to:

judging whether long tail task fragments with execution time length exceeding a preset time length exist or not;

if the long tail task fragments do not exist, distributing the unexecuted target downstream task fragments to a target task executor so as to execute the target downstream task fragments through the target task executor;

If the long-tail task fragments exist, distributing the long-tail task fragments to a target task executor so as to execute the long-tail task fragments through the target task executor;

And after the task execution result of the long-tail task sharding is obtained, ending the execution process of the long-tail task sharding.

Optionally, in some embodiments, the request receiving module 1404 is specifically configured to receive a heartbeat signal periodically sent by the target task executor, where the heartbeat signal includes current task fragment identification information and task request identification information; if the target task executor requests the new task fragments according to the task request identification information, determining that a task allocation request sent by the target task executor is received;

Correspondingly, the task allocation module 1406 is specifically configured to, when executing the step of allocating the long-tail task shard to the target task executor, determine, according to the current task shard identification information, whether the task shard being executed by the target task executor is the long-tail task shard, and if not, allocate the long-tail task shard to the target task executor.

Fig. 15 is a block diagram illustrating a task scheduling device according to still another embodiment of the present application. The task scheduling device provided by the embodiment of the application is positioned in a task executor, and the task scheduling module comprises:

a request sending module 1502, configured to send a task allocation request to a task scheduler, so that the task scheduler allocates, in a scheduling process for a plurality of tasks, a target downstream task segment that is not executed to a target task executor;

The task shard acquisition module 1504 is configured to acquire a target downstream task shard allocated by the task scheduler, and execute the target downstream task shard;

Optionally, in some embodiments, the target downstream task slice is a long-tail task slice, and the task scheduling device further includes:

and the result writing module is used for sending a target directory writing request to the task scheduler after the target downstream task fragment is executed, and writing the task execution result of the target downstream task fragment into the target directory in response to receiving the writing permission notification returned by the task scheduler.

Optionally, in some embodiments, the result writing module is further configured to, after sending the target directory writing request to the task scheduler, send the target directory writing request to the task scheduler periodically at a preset time interval in response to receiving a writing authority occupation notification returned by the task scheduler.

Optionally, in some embodiments, the request sending module 1502 is specifically configured to, when executing the step of sending the task allocation request to the task scheduler, send a heartbeat signal to the task scheduler periodically, where the heartbeat signal includes task request identification information;

Optionally, the task scheduling device further includes:

and a failure recording module, configured to record a failure reason and trigger the request sending module 1502 to stop sending the heartbeat signal to the task scheduler if the target downstream task partition fails to execute.

Referring to fig. 16, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, and the specific embodiment of the present application is not limited to the specific implementation of the electronic device.

As shown in FIG. 16, the electronic device may include a processor 1602, a communication interface (Communications Interface) 1604, a memory 1606, and a communication bus 1608.

Wherein:

The processor 1602, communication interface 1604, and memory 1606 communicate with each other via a communication bus 1608.

Communication interface 1604 for communicating with other electronic devices or servers.

The processor 1602 is configured to execute the program 1610, and may specifically perform relevant steps in the method embodiments described above.

In particular, program 1610 may include program code including computer operating instructions.

The processor 1602 may be a CPU, or an Application-specific integrated Circuit ASIC (Application SPECIFIC INTEGRATED Circuit), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the smart device may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.

A memory 1606 for storing programs 1610. The memory 1606 may include high-speed RAM memory or may also include non-volatile memory (non-volatile memory), such as at least one disk memory.

The program 1610 may include a plurality of computer instructions, and the program 1610 may specifically enable the processor 1602 to perform operations corresponding to the methods described in any of the foregoing method embodiments.

The specific implementation of each step in the procedure 1610 may refer to the corresponding steps and corresponding descriptions in the units in the above method embodiments, and have corresponding beneficial effects, which are not described herein. It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and modules described above may refer to corresponding procedure descriptions in the foregoing method embodiments, which are not repeated herein.

The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method described in any of the preceding method embodiments. The computer storage media includes, but is not limited to, compact disk read-Only Memory (CD-ROM), random access Memory (Random Access Memory, RAM), floppy disk, hard disk, or magneto-optical disk.

Embodiments of the present application also provide a computer program product comprising computer instructions that instruct a computing device to perform operations corresponding to any one of the above-described method embodiments.

In addition, it should be noted that, the information related to the user (including, but not limited to, user equipment information, user personal information, etc.) and the data related to the embodiment of the present application (including, but not limited to, sample data for training the model, data for analyzing, stored data, presented data, etc.) are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide a corresponding operation entry for the user to select authorization or rejection.

It should be noted that, according to implementation requirements, each component/step described in the embodiments of the present application may be split into more components/steps, or two or more components/steps or part of operations of the components/steps may be combined into new components/steps, so as to achieve the objects of the embodiments of the present application.

The methods according to embodiments of the present application described above may be implemented in hardware, firmware, or as software or computer code storable in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium and to be stored in a local recording medium downloaded through a network, so that the methods described herein may be processed by such software on a recording medium using a general purpose computer, a special purpose processor, or programmable or dedicated hardware such as an Application SPECIFIC INTEGRATED Circuit (ASIC), or field programmable gate array (Field Programmable GATE ARRAY, FPGA). It is understood that a computer, processor, microprocessor controller, or programmable hardware includes a Memory component (e.g., random access Memory (Random Access Memory, RAM), read-Only Memory (ROM), flash Memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, performs the methods described herein. Furthermore, when a general purpose computer accesses code for implementing the methods illustrated herein, execution of the code converts the general purpose computer into a special purpose computer for performing the methods illustrated herein.

Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.

The above embodiments are only for illustrating the embodiments of the present application, but not for limiting the embodiments of the present application, and various changes and modifications may be made by one skilled in the relevant art without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also fall within the scope of the embodiments of the present application, and the scope of the embodiments of the present application should be defined by the claims.

Claims

1. A task scheduling method, comprising:

Acquire multiple tasks to be scheduled; the multiple tasks include: an upstream task composed of multiple upstream task slices, and a downstream task composed of multiple downstream task slices;

Determine a task slice dependency relationship, wherein the task slice dependency relationship represents a dependency relationship between each upstream task slice and each downstream task slice;

In response to the completion of the execution of the current upstream task slice, a task executor is allocated to a target downstream task slice that depends on the completed upstream task slice, so that the target downstream task is executed by the task executor; the target downstream task slice is determined by the task slice dependency.

2. The method according to claim 1, wherein, in response to the completion of the execution of the current upstream task slice, allocating a task executor to a target downstream task slice that depends on the completed upstream task slice comprises:

In response to the current upstream task slice being executed, determining whether there is an unexecuted upstream task slice;

If there are no unexecuted upstream task slices, assign a task executor to the target downstream task slice that depends on the completed upstream task slice.

3. The method according to claim 2, wherein, after determining whether there are unexecuted upstream task slices, the method further comprises:

If there are any unexecuted upstream task slices, a task executor is assigned to the unexecuted upstream task slice, so that the unexecuted upstream task slice is used as the updated current upstream task slice, and the unexecuted upstream task slice is executed by the task executor; and the step of judging whether there are any unexecuted upstream task slices in response to the completion of the execution of the current upstream task slice is returned until all task slices are allocated.

4. The method according to any one of claims 1 to 3, wherein the method further comprises:

During the execution of the current task slice, the task execution time of the first task executor on the current task slice is recorded; the first task executor is a task executor configured to execute the current task slice; the current task slice represents the upstream task slice currently being executed, or the downstream task slice being executed;

If the execution time exceeds the preset time, a second task executor is allocated to the current task slice, so that the current task slice is executed in parallel by the first task executor and the second task executor;

In response to obtaining the task execution result of the current task slice, the execution process for the current task slice is terminated.

5. A task scheduling method, comprising:

In a scheduling process for multiple tasks, determining a task slice dependency relationship; the multiple tasks include: an upstream task composed of multiple upstream task slices, and a downstream task composed of multiple downstream task slices; the task slice dependency relationship represents a dependency relationship between each upstream task slice and each downstream task slice;

Receive a task assignment request sent by a target task executor;

Allocate the unexecuted target downstream task slice to the target task executor so as to execute the target downstream task slice through the target task executor; the target downstream task slice is a downstream task slice determined according to the task slice dependency and depends on the completed upstream task slice.

6. The method according to claim 5, wherein the step of allocating the unexecuted target downstream task slice to the target task executor so as to execute the target downstream task slice through the target task executor comprises:

Determine whether there are unexecuted upstream task shards;

If there is no unexecuted upstream task slice, the unexecuted target downstream task slice is allocated to the target task executor, so that the target downstream task slice is executed by the target task executor;

If there are unexecuted upstream task slices, the unexecuted upstream task slices are allocated to the target task executor so as to execute the unexecuted upstream task slices through the target task executor.

7. The method according to claim 5, wherein the step of allocating the unexecuted target downstream task slice to the target task executor so as to execute the target downstream task slice through the target task executor comprises:

Determine whether there are long-tail task slices whose execution time exceeds the preset time;

If there is no long-tail task slice, the unexecuted target downstream task slice is allocated to the target task executor, so that the target downstream task slice is executed by the target task executor;

If there is a long-tail task slice, the long-tail task slice is allocated to the target task executor so as to execute the long-tail task slice through the target task executor;

After obtaining the task execution result of the long-tail task slice, the execution process for the long-tail task slice is ended.

8. The method according to claim 7, wherein the receiving the task allocation request sent by the target task executor comprises:

Receive a heartbeat signal periodically sent by a target task executor, wherein the heartbeat signal includes: current task slice identification information and task request identification information; the task request identification information indicates whether the target task executor requests a new task slice;

If it is determined according to the task request identification information that the target task executor requests a new task slice, it is determined that a task allocation request sent by the target task executor is received;

The allocating the long-tail task slices to the target task executor includes:

It is determined whether the task slice being executed by the target task executor is the long-tail task slice according to the current task slice identification information; if not, the long-tail task slice is allocated to the target task executor.

9. A task scheduling method, comprising:

Sending a task allocation request to a task scheduler so that the task scheduler allocates unexecuted target downstream task slices to target task executors during the scheduling process of multiple tasks;

Obtain the target downstream task slice assigned by the task scheduler, and execute the target downstream task slice;

The multiple tasks include: an upstream task composed of multiple upstream task slices, and a downstream task composed of multiple downstream task slices; the target downstream task slice is a downstream task slice that depends on the completed upstream task slice.

10. The method according to claim 9, wherein the target downstream task slice is a long-tail task slice, and the method further comprises:

After the target downstream task slice is executed, a target directory write request is sent to the task scheduler;

In response to receiving a write permission notification returned by the task scheduler, the task execution result of the target downstream task slice is written into the target directory.

11. The method according to claim 10, wherein after sending the target directory write request to the task scheduler, the method further comprises:

In response to receiving the write permission occupied notification returned by the task scheduler, the target directory write request is periodically sent to the task scheduler at a preset time interval.

12. The method according to any one of claims 9 to 11, wherein sending the task allocation request to the task scheduler comprises:

Periodically sending a heartbeat signal to the task scheduler, wherein the heartbeat signal includes: task request identification information; the task request identification information indicates whether the task executor requests a new task slice;

If it is determined according to the task request identification information that the task executor requests a new task slice, then it is determined to send a task allocation request to the task scheduler;

The method further comprises:

If the target downstream task slice fails to execute, the failure reason is recorded and the heartbeat signal is stopped from being sent to the task scheduler.

13. An electronic device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus;

The memory is used to store at least one executable instruction, and the executable instruction enables the processor to perform an operation corresponding to the method according to any one of claims 1 to 12.

14. A computer storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the method according to any one of claims 1 to 12 is implemented.

15. A computer program product, comprising computer instructions, wherein the computer instructions instruct a computing device to execute operations corresponding to the method according to any one of claims 1 to 12.