
CN114840392B - Task scheduling anomaly monitoring method, device, medium and program product - Google Patents

Task scheduling anomaly monitoring method, device, medium and program product

Info

Publication number
CN114840392B
Authority
CN
China
Prior art keywords
task
time
scheduling
consuming
data
Legal status
Active
Application number
CN202210646351.4A
Other languages
Chinese (zh)
Other versions
CN114840392A (en)
Inventor
刘林
王志远
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN202210646351.4A
Publication of CN114840392A
Application granted
Publication of CN114840392B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract


The present application provides a task scheduling anomaly monitoring method, device, medium and program product. Historical task data of one or more historical task cycles is obtained according to the work plan of the current task cycle; a plurality of task time-consuming types and a scheduling relationship graph are determined from the historical task data, where the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data meets a preset proportion requirement; at least one timed detection task is determined from the task time-consuming types and the scheduling relationship graph; based on the detection results of each timed detection task at each detection time point, it is predicted whether the probability of a task scheduling anomaly in the target system meets a preset warning requirement; and if so, one or more pieces of warning information are determined and output. This solves the technical problems of existing anomaly monitoring: poor response timeliness, inflexible configuration, and monitoring only at the logical level with low coupling to the actual business.

Description

Task scheduling abnormality monitoring method, device, medium and program product
Technical Field
The present application relates to the field of financial technology (Fintech), and in particular to a method, apparatus, medium, and program product for monitoring task scheduling anomalies.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). The offline data that the finance and internet industries must process every day is large in scale and subject to strict timeliness requirements. Financial enterprises in particular handle a large volume of regulatory reporting data: if a data processing task is not completed, the relevant data cannot be delivered on time, the enterprise may even incur regulatory liability, and its rating and reputation suffer. Monitoring of, and response to anomalies in, the task scheduling system is therefore particularly important.
Open-source task scheduling frameworks and tools such as Azkaban, Airflow, and Oozie are now mature, but most monitoring functions configured or developed with them rely on abnormal task states or on fixed parameters configured from experience. As a result, by the time a monitoring alarm is raised, the task is usually already in an abnormal state or has already had an actual impact.
Existing anomaly monitoring therefore suffers from poor response timeliness, inflexible configuration, and monitoring only at the logical level with low coupling to the actual business, which increases the difficulty and workload of operations and maintenance.
Disclosure of Invention
The present application provides a task scheduling anomaly monitoring method, device, medium and program product, which are used to solve the technical problems of conventional anomaly monitoring: poor response timeliness, inflexible configuration, and monitoring only at the logical level with low coupling to the actual business.
In a first aspect, the present application provides a method for monitoring task scheduling abnormality, including:
According to the working plan of the current task period, acquiring historical task data of one or more historical task periods, wherein the similarity between the historical working plan of the historical task period and the working plan of the current task period meets preset requirements, and the historical task data comprises configuration data of each historical task and time-consuming data for executing each historical task;
Performing clustering on each historical task cycle according to the time-consuming data by using a preset clustering model until a plurality of task time-consuming types are determined, wherein the ratio of the data volume contained in each task time-consuming type to the total volume of the time-consuming data meets a preset proportion requirement;
determining a scheduling relation graph according to the configuration data, and determining at least one timing detection task according to the task time-consuming types and the scheduling relation graph, wherein the scheduling relation graph represents the dependency relationships among the historical tasks that arise from their calling one another's processing results;
and according to the detection results of each timing detection task at each detection time point, pre-judging whether the probability of a task scheduling anomaly in the target system meets a preset early warning requirement, and if so, determining and outputting one or more pieces of early warning information.
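As an illustration of the scheduling relation graph used in the steps above, the following sketch builds a downstream adjacency map from configuration data and collects every task transitively affected by a given task. It assumes, hypothetically, that the configuration data is a mapping from each task name to the list of upstream tasks whose results it consumes; the task names and the functions `build_scheduling_graph` and `all_downstream` are illustrative only.

```python
def build_scheduling_graph(config: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each task to the tasks that directly consume its output.

    `config` uses a hypothetical shape: task name -> list of upstream task names.
    """
    downstream: dict[str, list[str]] = {task: [] for task in config}
    for task, upstreams in config.items():
        for upstream in upstreams:
            # An upstream task that is missing from `config` still gets a node.
            downstream.setdefault(upstream, []).append(task)
    return downstream


def all_downstream(graph: dict[str, list[str]], task: str) -> set[str]:
    """Collect every task reachable from `task`, i.e. its transitive downstream."""
    seen: set[str] = set()
    stack = list(graph.get(task, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen
```

For example, with `{"ods_load": [], "dwd_clean": ["ods_load"], "report": ["dwd_clean", "ods_load"]}`, a delay in `ods_load` reaches both `dwd_clean` and `report`, which is the kind of information the warning steps need to decide who else should be notified.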
In one possible design, the one or more historical task cycles include the most recent task cycle before the current cycle, or several consecutive task cycles immediately preceding the current cycle.
In one possible design, using a preset clustering model, performing clustering on each historical task cycle according to time-consuming data until determining a plurality of task time-consuming types, including:
randomly extracting the time-consuming data of a plurality of historical tasks from all the historical tasks to serve as cluster centers;
performing first clustering on each historical task according to the cluster centers by using the preset clustering model, to determine one or more first time-consuming types;
judging whether the data volume proportion of each first time-consuming type meets the preset proportion requirement;
If yes, determining that the first time-consuming type is a task time-consuming type;
if not, re-determining the cluster centers and performing the clustering again to re-determine the first time-consuming types, until the data volume proportion of every first time-consuming type meets the preset proportion requirement;
wherein the data volume proportion is the ratio of the volume of data contained in a first time-consuming type to the total volume of the time-consuming data.
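The first clustering pass can be sketched as a one-dimensional k-means over task durations. This is an assumption: the text only says "a preset clustering model", so k-means, the iteration cap, and the names `first_clustering` and `data_volume_proportions` are illustrative choices. The initial centers are drawn at random from the historical durations, as in the step above, and the data volume proportion of each resulting type is reported so it can be checked against the preset requirement.

```python
import random


def _assign(durations: list[float], centers: list[float]) -> list[list[float]]:
    """Put each duration into the group of its nearest center."""
    groups: list[list[float]] = [[] for _ in centers]
    for d in durations:
        nearest = min(range(len(centers)), key=lambda i: abs(d - centers[i]))
        groups[nearest].append(d)
    return groups


def first_clustering(durations: list[float], k: int, max_iter: int = 20, seed=None):
    """One-dimensional k-means seeded with k randomly drawn historical durations."""
    rng = random.Random(seed)
    centers = rng.sample(durations, k)
    for _ in range(max_iter):
        groups = _assign(durations, centers)
        # An empty group keeps its old center so that k types are always returned.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, _assign(durations, centers)


def data_volume_proportions(groups: list[list[float]], total: int) -> list[float]:
    """Share of all time-consuming records that falls into each type."""
    return [len(g) / total for g in groups]
```

Random seeding can land in a poor local optimum (for example, all three seeds drawn from the shortest runs), which is exactly the case the proportion check and re-clustering steps are meant to catch.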
In one possible design, the preset proportion requirement is that the data volume proportion is greater than or equal to a first proportion threshold and less than or equal to a second proportion threshold.
Optionally, the first proportion threshold takes a value in the range of 1% to 10%, and the second proportion threshold takes a value in the range of 40% to 60%.
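A minimal check of the preset proportion requirement might look like the following. The concrete thresholds of 5% and 50% are assumed values picked from the optional ranges above, and the function names are hypothetical. Each type is labeled so that the re-clustering step knows whether to drop it, split it, or keep it.

```python
LOW_THRESHOLD = 0.05   # assumed first proportion threshold (optional range 1%-10%)
HIGH_THRESHOLD = 0.50  # assumed second proportion threshold (optional range 40%-60%)


def classify_type(size: int, total: int,
                  low: float = LOW_THRESHOLD, high: float = HIGH_THRESHOLD) -> str:
    """Label a time-consuming type by its data volume proportion size / total."""
    ratio = size / total
    if ratio < low:
        return "too_small"
    if ratio > high:
        return "too_large"
    return "ok"


def all_types_valid(sizes: list[int], total: int) -> bool:
    """True when every type satisfies low <= proportion <= high."""
    return all(classify_type(s, total) == "ok" for s in sizes)
```

Both bounds are inclusive, matching "greater than or equal to" and "less than or equal to" in the design above.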
In one possible design, re-determining the cluster centers and performing the clustering again includes:
deleting each first time-consuming type whose data volume proportion is less than the first proportion threshold, and/or,
randomly extracting at least two historical tasks from each first time-consuming type whose data volume proportion is greater than the second proportion threshold, to serve as new cluster centers;
for each first time-consuming type that meets the preset proportion requirement, re-determining a new cluster center in a preset manner;
and performing the clustering again according to the new cluster centers by using the preset clustering model, to determine new first time-consuming types.
In one possible design, for a first time-consuming type that meets the preset proportion requirement, re-determining a new cluster center in a preset manner includes:
when the first time-consuming type meets the preset proportion requirement, taking the average time consumption of that type as the new cluster center.
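The re-seeding rules above can be sketched as one function that turns the current groups into new cluster centers: types below the first threshold are dropped, two members are drawn at random from each type above the second threshold, and each compliant type contributes its mean duration. Drawing exactly two members, the default thresholds, and the name `reseed_centers` are assumptions for illustration.

```python
import random


def reseed_centers(groups: list[list[float]], total: int,
                   low: float = 0.05, high: float = 0.50, rng=None) -> list[float]:
    """Derive new cluster centers from the current first time-consuming types."""
    rng = rng or random.Random()
    centers: list[float] = []
    for group in groups:
        ratio = len(group) / total
        if not group or ratio < low:
            continue  # under-populated type: delete it
        if ratio > high and len(group) >= 2:
            centers.extend(rng.sample(group, 2))  # oversized type: split it
        else:
            centers.append(sum(group) / len(group))  # compliant type: use its mean
    return centers
```

Feeding the returned centers back into the clustering model and repeating until every type passes the proportion check yields the loop described in the steps above.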
In one possible design, determining at least one timing detection task based on a plurality of task time consuming types and a scheduling relationship graph includes:
determining, according to a preset screening requirement, a first target type and a second target type from the task time-consuming types;
determining a first fluctuation range and a second fluctuation range according to time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm;
And determining detection objects and detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range and the starting time of the execution of the historical task in the time-consuming data.
In one possible design, determining the first fluctuation range and the second fluctuation range from each time-consuming data in the first target type and the second target type using a preset fluctuation algorithm includes:
determining a first fluctuation range according to the first average time consumption and the first standard deviation of all time consumption data in the first target type;
And determining the second fluctuation range according to the second average time consumption and the second standard deviation of all time-consuming data in the second target type.
In one possible design, determining the first fluctuation range according to the first average time consumption and the first standard deviation of all time-consuming data in the first target type includes:
the first fluctuation range is equal to the sum of the first average time consumption and N times the first standard deviation;
determining the second fluctuation range according to the second average time consumption and the second standard deviation of all time-consuming data in the second target type includes:
the second fluctuation range is equal to the difference between the second average time consumption and M times the second standard deviation.
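Written out, the two fluctuation ranges are as follows. Here $t_{j,i}$ is the $i$-th of the $n_j$ durations in target type $j$, $\bar{t}_j$ and $\sigma_j$ are their mean and standard deviation, and $N$, $M$ are the preset multipliers; these symbols are introduced for readability, and the population form of the standard deviation is an assumption, since the text does not say which form is used.

```latex
R_1 = \bar{t}_1 + N\,\sigma_1, \qquad R_2 = \bar{t}_2 - M\,\sigma_2,
\quad\text{where}\quad
\bar{t}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} t_{j,i}, \qquad
\sigma_j = \sqrt{\frac{1}{n_j}\sum_{i=1}^{n_j}\bigl(t_{j,i}-\bar{t}_j\bigr)^2}.
```

With $N, M > 0$ and nonzero spread, $R_1$ lies above a typical duration of the first target type and $R_2$ lies below a typical duration of the second.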
In one possible design, the detection times include a first detection time and a second detection time: the first detection time is obtained by adding the first fluctuation range to the start time, and the second detection time is obtained by adding the second fluctuation range to the start time.
In one possible design, according to the detection results of each timing detection task at each detection time point, it is pre-determined whether the probability of abnormal task scheduling of the target system meets a preset early warning requirement, including:
If it is determined from the detection result that the detection object has not finished executing at the first detection time, determining that the first probability of an anomaly in the task's execution progress meets the early warning requirement;
If it is determined from the detection result that the detection object has already finished executing at the second detection time, determining that the second probability of an anomaly in the data volume of the task scheduled by the target system meets the early warning requirement.
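The sketch below combines the fluctuation ranges with a task's start time to obtain the two detection times and then applies the two judgment rules above. Assumptions: durations are measured in minutes, the standard deviation is the population form (`statistics.pstdev`), the second range is clamped at zero, and the default multipliers and function names are illustrative only.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def fluctuation_ranges(first_type, second_type, n=2.0, m=2.0):
    """First range = mean + n * std; second range = mean - m * std (in minutes)."""
    r1 = mean(first_type) + n * pstdev(first_type)
    r2 = max(0.0, mean(second_type) - m * pstdev(second_type))
    return r1, r2


def detection_times(start: datetime, r1: float, r2: float):
    """Superimpose each fluctuation range on the task's start time."""
    return start + timedelta(minutes=r1), start + timedelta(minutes=r2)


def judge(done_at_first: bool, done_at_second: bool) -> list[str]:
    """Apply the two early warning rules to the observed execution progress."""
    alerts = []
    if not done_at_first:
        # Still running at the late check point: execution progress is likely delayed.
        alerts.append("execution progress anomaly")
    if done_at_second:
        # Already finished at the early check point: the data volume is suspect.
        alerts.append("data volume anomaly")
    return alerts
```

For example, with first-type durations of 50 and 70 minutes, second-type durations of 20 and 40 minutes, N = 2 and M = 1, a task starting at 01:00 is checked for completion at 01:20 (finishing by then suggests too little data) and for lateness at 02:20.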
In one possible design, determining and outputting one or more alert information includes:
Calculating the association degree of the previous task and the next task in the scheduling relation map according to a preset association model;
If the association degree is in the first association interval, determining that the early warning information comprises first early warning information and second early warning information, wherein the early warning level of the first early warning information is the same as that of the second early warning information, the first early warning information is used for representing that the scheduling abnormality exists in the previous task and has an association effect on the scheduling of the next task, and the second early warning information is used for representing that the scheduling abnormality of the next task is derived from the delay of the previous task;
and outputting the first early warning information to the previous task and outputting the second early warning information to the next task.
In one possible design, determining and outputting one or more alert information includes:
Calculating the association degree of the previous task and the next task in the scheduling relation map according to a preset association model;
if the association degree is in the second association interval, determining that the early warning information includes first early warning information and second early warning information, wherein the first early warning level of the first early warning information is higher than the second early warning level of the second early warning information, the first early warning information indicates that the previous task has a scheduling anomaly that affects the scheduling of the next task, and the second early warning information indicates that the scheduling anomaly of the next task is caused by the delay of the previous task;
and outputting the first early warning information to the previous task and outputting the second early warning information to the next task.
In one possible design, determining and outputting one or more alert information includes:
Calculating the association degree of the previous task and the next task in the scheduling relation map according to a preset association model;
if the association degree is in the third association interval, outputting early warning information to the previous task, wherein the early warning information is used for representing that the previous task has scheduling abnormality.
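How the association degree selects among the three designs above can be sketched as follows. The text does not give the interval bounds or the level values, so the thresholds 0.7 and 0.3, the integer levels, and the `Alert`/`route_alerts` names are hypothetical; the association model itself is out of scope, and its output is simply passed in as a number in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Alert:
    target: str
    level: int
    message: str


def route_alerts(prev_task: str, next_task: str, association: float,
                 base_level: int = 2, first_lower: float = 0.7,
                 second_lower: float = 0.3) -> list[Alert]:
    """Choose warnings for an adjacent task pair from their association degree."""
    upstream = Alert(prev_task, base_level,
                     f"{prev_task} has a scheduling anomaly affecting {next_task}")
    derived = f"{next_task} anomaly derived from delay of {prev_task}"
    if association >= first_lower:
        # First interval: both warnings share the same level.
        return [upstream, Alert(next_task, base_level, derived)]
    if association >= second_lower:
        # Second interval: the upstream warning outranks the downstream one.
        return [upstream, Alert(next_task, base_level - 1, derived)]
    # Third interval: only the previous task is warned.
    return [Alert(prev_task, base_level, f"{prev_task} has a scheduling anomaly")]
```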
In one possible design, the early warning information includes a weight feedback link;
after determining and outputting the one or more pieces of early warning information, the method further includes:
Receiving adjustment information input by a user through the weight feedback link;
and adjusting the early warning weight of the detection object corresponding to the timing detection task according to the adjustment information.
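A minimal sketch of applying the feedback, assuming the adjustment information arrives as a signed delta that is added to the detection object's current weight and clamped to [0, 1]. The delta format, the default weight of 0.5, the clamp range, and `apply_feedback` are all hypothetical; the text only states that the weight is adjusted according to the user's input.

```python
def apply_feedback(weights: dict[str, float], task: str, delta: float,
                   default: float = 0.5, lo: float = 0.0, hi: float = 1.0) -> float:
    """Adjust a detection object's early warning weight from user feedback."""
    # Unknown tasks start from the assumed default weight before adjustment.
    updated = min(hi, max(lo, weights.get(task, default) + delta))
    weights[task] = updated
    return updated
```

Clamping keeps repeated feedback from pushing a weight outside the range the level calculation expects.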
In one possible design, the method further includes: when a scheduling anomaly of the detection object is detected at the first detection time, determining a third detection time for the detection object according to a preset delay time, wherein the detection object is the currently executing task;
when it is detected at the third detection time that the currently executing task still has a scheduling anomaly, determining a first early warning level of the currently executing task according to a first preset early warning weight and the number of early warning triggers of the currently executing task;
judging whether the first early warning level meets a preset early warning condition;
If yes, sending the early warning information to the currently executing task again.
In one possible design, when it is detected at the third detection time that the currently executing task still has a scheduling anomaly, the method further includes:
determining the association degree between the currently executing task and the next task according to the scheduling relation graph by using a preset association model;
Determining a second early warning level of the next task according to a second early warning weight, the association degree, and the number of early warning triggers of the next task;
judging whether the second early warning level meets the preset early warning condition;
if yes, sending early warning information to the next task, wherein the early warning information indicates that the scheduling anomaly of the next task is caused by a scheduling delay of the currently executing task.
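The re-check and escalation steps can be sketched as follows. The text names the inputs of each early warning level (a preset weight, the trigger count and, for the next task, the association degree) but not how they are combined, so the product used here, the threshold comparison, the minute-based delay, and all function names are assumptions.

```python
from datetime import datetime, timedelta


def third_detection_time(first_detection: datetime, delay_minutes: float) -> datetime:
    """Schedule a re-check a preset delay after the first detection time."""
    return first_detection + timedelta(minutes=delay_minutes)


def warning_level(weight: float, trigger_count: int, association: float = 1.0) -> float:
    """Assumed combination: weight x association degree x number of triggers."""
    return weight * association * trigger_count


def escalate(current: str, next_task: str, current_weight: float, next_weight: float,
             association: float, current_triggers: int, next_triggers: int,
             threshold: float = 1.0) -> list[str]:
    """Return the tasks that should (re)receive early warning information."""
    targets = []
    if warning_level(current_weight, current_triggers) >= threshold:
        targets.append(current)  # re-send to the currently executing task
    if warning_level(next_weight, next_triggers, association) >= threshold:
        targets.append(next_task)  # warn downstream: the delay comes from `current`
    return targets
```

Scaling the next task's level by the association degree means a weakly associated downstream task is only warned after more triggers, which matches the intent of not flooding unrelated tasks.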
In a second aspect, the present application provides a task scheduling abnormality monitoring apparatus, including:
The acquisition module is used for acquiring historical task data of one or more historical task periods according to the working plan of the current task period, wherein the similarity between the historical working plan of the historical task period and the working plan of the current task period meets the preset requirement, and the historical task data comprises configuration data of each historical task and time-consuming data for executing each historical task;
A processing module for:
Performing clustering on each historical task cycle according to the time-consuming data by using a preset clustering model until a plurality of task time-consuming types are determined, wherein the ratio of the data volume contained in each task time-consuming type to the total volume of the time-consuming data meets a preset proportion requirement;
determining a scheduling relation graph according to the configuration data, and determining at least one timing detection task according to the task time-consuming types and the scheduling relation graph, wherein the scheduling relation graph represents the dependency relationships among the historical tasks that arise from their calling one another's processing results;
pre-judging, according to the detection results of each timing detection task at each detection time point, whether the probability of a task scheduling anomaly in the target system meets a preset early warning requirement;
and the output module is used for outputting early warning information to the detection object of the timing detection task.
In a third aspect, the present application provides an electronic device comprising:
A memory for storing program instructions;
a processor for calling and executing program instructions in said memory, performing any one of the possible methods provided in the first aspect.
In a fourth aspect, the present application provides a storage medium having stored therein a computer program for executing any one of the possible task scheduling anomaly monitoring methods provided in the first aspect.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements any one of the possible task scheduling anomaly monitoring methods provided in the first aspect.
In the task scheduling anomaly monitoring method, device, medium and program product provided by the present application, historical task data of one or more historical task cycles is acquired according to the work plan of the current task cycle, where the similarity between the historical work plan of each historical task cycle and the work plan of the current task cycle meets a preset requirement, and the historical task data includes configuration data of each historical task and time-consuming data for executing each historical task. Clustering is performed on each historical task cycle according to the time-consuming data by using a preset clustering model until a plurality of task time-consuming types are determined, where the ratio of the data volume contained in each task time-consuming type to the total volume of the time-consuming data meets a preset proportion requirement. A scheduling relation graph, which represents the dependency relationships among the historical tasks that arise from their calling one another's processing results, is determined according to the configuration data, and at least one timing detection task is determined according to the task time-consuming types and the scheduling relation graph. According to the detection results of each timing detection task at each detection time point, it is pre-judged whether the probability of a task scheduling anomaly in the target system meets a preset early warning requirement, and if so, one or more pieces of early warning information are determined and output. This solves the technical problems of existing anomaly monitoring: poor response timeliness, inflexible configuration, and monitoring only at the logical level with low coupling to the actual business.
The following technical effects are achieved: early warning ensures timely response and leaves sufficient time for handling problems, and warnings are pushed according to the dependency associations among tasks, which helps locate problems quickly and coordinate resources.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of an application scenario of a task scheduling anomaly monitoring method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a task scheduling anomaly monitoring method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of iteratively determining the plurality of task time-consuming types in step S202 of the embodiment shown in FIG. 2;
FIG. 4 is a schematic flowchart of determining at least one timing detection task in step S203 of the embodiment shown in FIG. 2;
FIG. 5 is a schematic flowchart of another task scheduling anomaly monitoring method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a task scheduling anomaly monitoring device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device provided by the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments, including but not limited to combinations of the described embodiments, obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The following explains the terms related to the present application:
MQ (Message Queue): a first-in-first-out (FIFO) data structure. Data to be transmitted (also called messages) is delivered through a queuing mechanism: a producer generates messages and puts them into the queue, and a consumer then processes them. A consumer can either pull messages from a designated queue or subscribe to a queue, in which case the MQ server pushes messages to it.
Acquisition period: the number of days of data collected for analysis and comparison by the early warning model; it can be adjusted according to the data scale.
DATACHECK (data check): checking the integrity of the dependent data before a scheduled task processes it.
Job Server: the server that receives and executes the concrete work of a scheduled task.
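To make the MQ and DATACHECK terms concrete, the sketch below uses Python's standard `queue.Queue` as a FIFO stand-in for a message queue: upstream tasks publish readiness messages, and the consumer runs a DATACHECK-style gate so that a task starts only after all of its dependencies have reported. The message format, the dependency map, and the function names are hypothetical; a real MQ server and a real DATACHECK would verify data integrity, not just the arrival of a message.

```python
import queue


def datacheck(required: set[str], ready: set[str]) -> bool:
    """Pass only when every dependency has reported its data as ready."""
    return required <= ready


def consume(q: "queue.Queue[str]", deps: dict[str, list[str]]) -> list[str]:
    """Drain messages in FIFO order and start tasks whose DATACHECK passes."""
    ready: set[str] = set()
    started: list[str] = []
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break
        ready.add(msg)
        for task, required in deps.items():
            if task not in started and datacheck(set(required), ready):
                started.append(task)
    return started
```

If `ods_load` and then `dim_user` are published, a task depending only on `ods_load` starts after the first message, while a task depending on both waits for the second.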
The offline data that the finance and internet industries must process is large in scale and subject to strict timeliness requirements; financial enterprises in particular handle a large volume of regulatory reporting data. Monitoring of, and response to anomalies in, the task scheduling system is therefore particularly important.
Open-source task scheduling frameworks and tools such as Azkaban, Airflow, and Oozie are now mature, but most monitoring functions configured or developed with them rely on abnormal task states or on fixed parameters configured from experience. By the time a monitoring alarm is raised, the task is often already abnormal or has already had an actual impact. Response timeliness is poor, configuration is inflexible, the difficulty and workload of operations and maintenance increase, and because monitoring happens only at the logical level, its coupling to the actual business is low.
To address these situations, an existing solution uses a big data cluster deployment engine to collect cluster health information before and after each task's scheduled time and raises early warnings on that basis. The early warning algorithm commonly used at present fits a normal distribution to the entire sample for prediction. It does not distinguish between periods, requires a large amount of historical data, and increases the computation time of the early warning system while still producing inaccurate predictions.
Note that because the normal distribution is fitted over the entirety of all historical periods, a large amount of historical data is required. Moreover, tasks exhibit periodic characteristics over time, so individual historical periods, or runs of several consecutive periods, behave differently; treating all samples as a single whole cannot distinguish between them.
In summary, the existing anomaly monitoring has the following technical problems:
(1) Existing monitoring schemes have poor response timeliness, inflexible configuration, a heavy operations and maintenance workload, and low coupling to the actual business;
(2) The large volume of data collection required for early warning places an additional burden on the big data cluster;
(3) Existing algorithms are highly complex, discriminate poorly between data, are computationally expensive, and produce inaccurate predictions.
In seeking to improve existing anomaly monitoring, the inventors of the present application found through analysis that the following technical obstacles exist:
(1) Most existing monitoring schemes are based on abnormal scheduling of big data cluster resources and on task states. When a large amount of business data is scheduled across tasks, downstream tasks are slow to perceive that a single task has entered an abnormal state, and the impact on downstream tasks cannot be analyzed accurately.
(2) Real-time data analysis and monitoring require frequent interaction with the system, which occupies system resources and increases the system's computational load.
(3) Existing algorithms are applied without regard to the actual scenario, which consumes extra resources and increases data errors.
To solve the above problems, the inventive concept of the present application is as follows. The underlying logic of the task scheduling system is left unchanged:
(1) From the task configuration information, analyze three things: the lineage relationships among tasks (i.e., their interdependencies during scheduling); the expected normal completion time interval of each task; and the degree of correlation, hereinafter the association degree, between lineage-related tasks (i.e., tasks with a scheduling-order relationship). Early warnings are pushed separately to the upstream and downstream tasks (i.e., two tasks adjacent in execution order), which improves anomaly response efficiency and processing timeliness.
(2) Analyze the historical sample data within the period and dynamically adjust the early warning detection time. This reduces the manual configuration effort of O&M staff and makes early warning configuration more flexible.
(3) Add an early warning level weight module that takes the business attention level fed back by O&M personnel and combines it with the early warning detection results to generate dynamic early warning pushes. The warnings are thus coupled with actual business requirements, and the scope of an anomaly's impact is prevented from escalating.
(4) Use the existing task configuration information and historical information, which the scheduling platform can provide directly. The method is therefore decoupled from big data cluster resources and adds no burden to the cluster.
Fig. 1 is a schematic diagram of an application scenario of the task scheduling anomaly monitoring method provided by the present application. As shown in Fig. 1, an anomaly monitoring system 200 is provided independently of the task scheduling system 100. The anomaly monitoring system 200 executes the task scheduling anomaly monitoring method provided by the present application to determine a plurality of timing detection tasks, and it re-determines their detection times in every task period. The anomaly monitoring system 200 does not wait until task scheduling has become abnormal before alarming. Instead, it monitors the execution progress of each task and sends early warning information to both the preceding and the following task of any pair with an execution-order requirement. The aim is to detect from the execution progress, as early as possible before a scheduling anomaly actually occurs, that the probability of an anomaly exceeds the early warning requirement, and to send the corresponding early warning information at that point.
The task scheduling anomaly monitoring method is described in detail below.
Fig. 2 is a flow chart of a task scheduling anomaly monitoring method according to an embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
S201, according to a work plan of a current task period, historical task data of one or more historical task periods are obtained.
In this step, the similarity between the work plan of each selected historical task period and the work plan of the current task period meets a preset requirement. The historical task data includes the configuration data of each historical task and the time-consuming data (execution duration) of each historical task.
Note that the prior art fits a normal distribution to the whole sample of all historical task periods. The anomaly monitoring method of this embodiment instead compares the work plan of each historical task period with that of the current task period. When or before the current task period starts, it acquires the historical task data of only those one or more historical task periods whose work-plan similarity meets the preset requirement. The execution progress of each task can therefore be monitored in a more targeted and flexible way: the points of timed supervision vary rather than being fixed, and the early warnings discriminate better and are more flexible.
The work plans of a financial enterprise tend to remain stable over a span of time, for example several task periods. The historical task periods may therefore include any of the following: the task period of the previous year at the same or a similar position in the year as the current task period; the most recent task period before the current one; or several consecutive task periods closest to the current one.
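The selection rule in S201 leaves the similarity measure open. The Python sketch below illustrates one possible rule under two assumptions: a work plan is modeled as the set of task names scheduled in a period, and similarity is the Jaccard index compared against a threshold of 0.8. `TaskPeriod`, `plan_similarity` and `select_history` are hypothetical names and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPeriod:
    period_id: str
    planned_tasks: frozenset  # hypothetical: a work plan modeled as a set of task names


def plan_similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two work plans (an assumed similarity metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_history(current: TaskPeriod, candidates: list, threshold: float = 0.8) -> list:
    """Keep candidate periods whose plan similarity meets the assumed threshold."""
    return [p for p in candidates
            if plan_similarity(current.planned_tasks, p.planned_tasks) >= threshold]


current = TaskPeriod("2022-06", frozenset({"load", "clean", "agg", "report"}))
same_month_last_year = TaskPeriod("2021-06", frozenset({"load", "clean", "agg", "report"}))
last_period = TaskPeriod("2022-05", frozenset({"load", "clean", "agg", "audit"}))
selected = select_history(current, [same_month_last_year, last_period])
```

Here the previous period shares 3 of 5 distinct tasks (similarity 0.6), so only the same month of the previous year is kept.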
S202, use a preset clustering model to cluster the historical tasks of each historical task period according to the time-consuming data in the historical task data, until a plurality of task time-consuming types are determined.
In this step, the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data satisfies a preset duty ratio requirement.
Specifically, a preset number of cluster centers is first extracted from the time-consuming data of the historical tasks, as required by the preset clustering model. Different preset clustering models may use different numbers of initial cluster centers. The preset clustering model then clusters the time-consuming data of all historical tasks to obtain a first clustering result, i.e., the first set of at least one task time-consuming type.

Next, it is determined whether the ratio of the amount of data in each task time-consuming type to the total amount of time-consuming data meets the preset duty ratio requirement. If it does, the method proceeds to step S203. Otherwise, the cluster centers are reset as required by the preset clustering model, clustering is performed again, and the ratio check is repeated on the resulting task time-consuming types. This loop continues until every task time-consuming type meets the preset duty ratio requirement.
Notably, resetting the cluster centers involves two aspects: the number of cluster centers, and which time-consuming data points serve as cluster centers. Optionally, the number of cluster centers may be changed (increased or decreased) or kept unchanged, as those skilled in the art see fit for the actual application scenario.
It should also be noted that the preset clustering model used in each round of clustering may be the same or different. The same model may be used in every round of the loop, a different model may be used in each round, or one model may be applied several times.
S203, determine a scheduling relationship graph according to the configuration data, and determine at least one timing detection task according to the plurality of task time-consuming types and the scheduling relationship graph.
In this step, the scheduling relationship graph characterizes the dependencies among the historical tasks, i.e., which tasks call (consume) each other's processing results, or the execution order among the historical tasks.
Specifically, the upstream/downstream calling relationships of the historical tasks are extracted from the task configuration information; these relationships include the DATACHECK and MQ interaction types. A lineage graph of task scheduling, i.e., the scheduling relationship graph, is then generated.
Determining at least one timing detection task according to the plurality of task time-consuming types and the scheduling relationship graph includes the following steps:
According to a preset screening requirement, determine a first target type and a second target type from the task time-consuming types, where the first target type is a type with long time consumption and the second target type is a type with short time consumption;
Using a preset fluctuation algorithm, determine a first fluctuation range from the time-consuming data in the first target type and a second fluctuation range from the time-consuming data in the second target type. The two ranges may be determined from the normal distributions corresponding to the first and second target types;
Determine the detection object and detection times of each timing detection task according to the scheduling relationship graph, the first fluctuation range, the second fluctuation range and the execution start time of the historical tasks in the time-consuming data.
There is no strict ordering between step S202 and the determination of the scheduling relationship graph in step S203. The two may be performed simultaneously or in either order.
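Building the scheduling relationship graph can be sketched as follows. The sketch assumes a hypothetical configuration format in which each task lists its upstream tasks together with the dependency type (for example DATACHECK or MQ). It produces the upstream-to-downstream edges and the (previous task, next task) pairs that the association-degree step later operates on. The function names and task names are illustrative.

```python
from collections import defaultdict


def build_schedule_graph(task_config: dict) -> dict:
    """Map each upstream task to the sorted list of tasks that depend on it.

    task_config uses a hypothetical format: {task: [(upstream_task, dep_type), ...]}.
    """
    downstream = defaultdict(set)
    for task, deps in task_config.items():
        for upstream, _dep_type in deps:
            # Both DATACHECK and MQ dependencies create a scheduling edge.
            downstream[upstream].add(task)
    return {up: sorted(downs) for up, downs in downstream.items()}


def adjacent_pairs(graph: dict) -> list:
    """(previous task, next task) pairs that are adjacent in execution order."""
    return sorted((up, down) for up, downs in graph.items() for down in downs)


config = {
    "ods_load": [],
    "dw_agg": [("ods_load", "DATACHECK")],
    "report": [("dw_agg", "MQ"), ("ods_load", "DATACHECK")],
}
graph = build_schedule_graph(config)
```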
S204, determine, according to the detection results of each timing detection task at each detection time point, whether the probability of a task scheduling anomaly in the target system meets the preset early warning requirement.
In this step, if it does, S205 is executed. If it does not, no anomaly has been detected, and the method waits for the next timing detection task to perform detection and analysis.
S205, determining and outputting one or more pieces of early warning information.
This step includes at least three possible embodiments.
1. A first possible embodiment is as follows:
First, calculate the association degree between a previous task and a next task in the scheduling relationship graph according to a preset association model;
Then, if the association degree falls in a first association interval, determine that the early warning information includes first early warning information and second early warning information of the same early warning level. The first early warning information indicates that the previous task has a scheduling anomaly that affects the scheduling of the next task. The second early warning information indicates that the scheduling anomaly of the next task stems from the delay of the previous task;
Finally, output the first early warning information to the previous task and the second early warning information to the next task.
2. A second possible embodiment is as follows:
First, calculate the association degree between a previous task and a next task in the scheduling relationship graph according to a preset association model;
Then, if the association degree falls in a second association interval, determine that the early warning information includes first early warning information and second early warning information, where the first early warning level of the first early warning information is higher than the second early warning level of the second early warning information. The first early warning information indicates that the previous task has a scheduling anomaly that affects the scheduling of the next task. The second early warning information indicates that the scheduling anomaly of the next task stems from the delay of the previous task;
Finally, output the first early warning information to the previous task and the second early warning information to the next task.
3. A third possible embodiment is as follows:
First, calculate the association degree between a previous task and a next task in the scheduling relationship graph according to a preset association model;
Then, if the association degree falls in a third association interval, output early warning information only to the previous task, indicating that the previous task has a scheduling anomaly.
In all three embodiments, the association degree between the previous task and the next task in the scheduling relationship graph is calculated with a preset association model. This model may be chosen according to the actual situation; in one embodiment it is given by formula (x):
$$r=\frac{\operatorname{cov}(X,Y)}{S_X\,S_Y}$$
where $r$ is the association degree between the previous task $X$ (also called the upstream task) and the next task $Y$ (also called the downstream task). $S_X$ is the standard deviation of the sample data of task $X$ in the historical task periods (i.e., its time-consuming data in each set of historical data), and $S_Y$ is the standard deviation of the sample data of task $Y$ in the historical task periods. $\operatorname{cov}(X,Y)$ is the covariance of the sample data of tasks $X$ and $Y$ over the acquisition period.
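Formula (x) is the Pearson correlation coefficient. The sketch below computes it from paired historical durations of tasks X and Y. It uses sample (n - 1) statistics; the ratio is the same with population statistics, provided the covariance and the standard deviations use the same divisor. The function name is illustrative.

```python
import math


def association_degree(x: list, y: list) -> float:
    """Pearson correlation r = cov(X, Y) / (S_X * S_Y) of paired task durations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("a task with constant duration has no defined correlation")
    return cov / (s_x * s_y)
```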
Note that in this embodiment the sample data is time-consuming data, and the historical data obtained is offline data. Offline data volumes are very large, so most processing is based on map/reduce. Apart from fluctuations in data volume, the most direct indicator is the task's execution time. The scheduling system necessarily records this, so the data can be obtained directly, and no cluster resources are spent on an additionally deployed collection module. The time-consuming data here includes not only the data processing time but also the time spent waiting for upstream data (i.e., the processing result of the task immediately preceding the current task). In this embodiment, every task, including the tasks in the current task period and the historical tasks in the historical task periods, is linked by lineage: the tasks call each other's processing results or have an execution order. The tasks are therefore necessarily related. On this basis, the correlation of duration fluctuations mainly reflects how tasks depend on waiting for upstream data, which is what the delay early warning addresses.
Specifically, for the three embodiments, when the association degree falls in the first association interval, e.g., 0.6 < r < 1, the two tasks are considered strongly correlated. The task-level weight configuration is read; as an example, the first early warning takes the default weight of 1. First early warning information about task X is generated, with added warning content about the impact on task Y. At the same time, a delay early warning of the same level is generated for task Y with respect to task X, i.e., the second early warning information.
When the association degree falls in the second association interval, e.g., 0.3 < r ≤ 0.6, the two tasks are considered moderately correlated. The first early warning information is generated in the same way, and second early warning information with a lower early warning level is generated for task Y.
When the association degree falls in the third association interval, e.g., 0 < r ≤ 0.3, the two tasks are considered weakly correlated. The first early warning information is generated in the same way, but no early warning is issued for task Y. If an anomaly is later detected for task Y itself, the related impact content is regenerated and added to the first early warning information.
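The three association intervals can be sketched as a small dispatch function. The bounds 0.6 and 0.3 come from the examples above. The sketch makes three assumptions: the early warning level is represented as a numeric weight (default 1.0); the downstream weight in the second interval is half the upstream weight, whereas the text only says it is lower; and r ≤ 0 is treated like the third interval. `plan_warnings` is a hypothetical name.

```python
def plan_warnings(r: float, upstream: str, downstream: str, weight: float = 1.0) -> list:
    """Return (task, weight, message) tuples for a delayed upstream task."""
    first = (upstream, weight, f"{upstream} is delayed and may affect {downstream}")
    if r > 0.6:  # first interval: strong correlation, same level for both tasks
        return [first, (downstream, weight, f"{downstream} may be delayed by {upstream}")]
    if r > 0.3:  # second interval: medium correlation, lower (assumed halved) level downstream
        return [first, (downstream, weight / 2, f"{downstream} may be delayed by {upstream}")]
    # Third interval (0 < r <= 0.3), and by assumption r <= 0: warn the upstream task only.
    return [(upstream, weight, f"{upstream} is delayed")]
```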
It should also be noted that the upstream/downstream relationships are derived from lineage analysis. The early warning analysis covers the full set of tasks, so in one possible design only a one-layer separation is considered between two tasks, or between a task and an interacting upstream or downstream system. The nodes within a single task may be separated by several layers. For the correlation computation over the overall task duration, however, node delays in the upper and lower layers count the same; even the duration of a lower-layer check node is measured from the scheduling of the whole task. Consequently, the longer the waiting time, the closer the node's duration fluctuation curve is to that of the whole task, i.e., the higher the probability that the upstream and downstream tasks are affected simultaneously.
In summary, the embodiment of the present application provides a task scheduling anomaly monitoring method with the following steps. Historical task data of one or more historical task periods is acquired according to the work plan of the current task period. The historical tasks are clustered by their time-consuming data with a preset clustering model until a plurality of task time-consuming types are determined, where the share of the total time-consuming data contained in each type meets the preset duty ratio requirement. A scheduling relationship graph is determined from the configuration data; it characterizes the dependencies among historical tasks that call each other's processing results. At least one timing detection task is then determined from the task time-consuming types and the scheduling relationship graph. Finally, the detection result of each timing detection task at each detection time point is used to determine whether the probability of a task scheduling anomaly in the target system meets the preset early warning requirement. The method solves the technical problems that existing anomaly monitoring responds slowly, is inflexibly configured, monitors only at the logical level and is poorly coupled with the actual business. Early warnings ensure timely response and leave sufficient time to handle problems. Pushing early warning information according to the dependencies among tasks also helps locate problems quickly and coordinate resources.
To facilitate an understanding of several possible embodiments corresponding to S202, a specific description is provided below.
Fig. 3 is a schematic flow chart of determining time-consuming types of a plurality of tasks in a loop in step S202 in the embodiment shown in fig. 2 according to the present application. As shown in fig. 3, the specific steps include:
S301, randomly extract the time-consuming data of several historical tasks from all historical tasks as cluster centers.
S302, performing first clustering processing on each historical task according to a clustering center by using a preset clustering model to determine one or more first time consumption types.
In this embodiment, the clustering performed by the preset clustering model may be expressed by formula (1):
$$C_i=\left\{x_j:\ \lvert x_j-c_i\rvert\le\lvert x_j-c_l\rvert,\ l=1,\dots,k\right\},\qquad i=1,\dots,k$$
where $C_i$ denotes the $i$-th first time-consuming type and $k$ is the number of initial cluster centers. $c_i$ is the time-consuming data of the historical task serving as the $i$-th cluster center, i.e., the execution time of that initial cluster center. $x_j$ is the execution time of the $j$-th sample, i.e., of each historical task in the one or more historical periods. In other words, each sample is assigned to the cluster center closest to it in execution time.
Notably, formula (1) is tailored to the characteristics of task scheduling. It reduces conventional k-means to a single dimension (execution time), which lowers algorithm complexity while preserving data accuracy, reduces the early warning deployment burden and improves operating efficiency.
S303, judging whether the data volume duty ratio in each first time-consuming type meets the preset duty ratio requirement.
In this step, the data volume duty ratio is the ratio of the amount of data in the first time-consuming type to the total amount of time-consuming data. If the requirement is met, step S304 is executed; otherwise, step S305 is executed.
In this embodiment, the preset duty ratio requirement is that the data volume duty ratio be greater than or equal to a first duty ratio threshold and less than or equal to a second duty ratio threshold. Optionally, the first duty ratio threshold ranges from 1% to 10% and the second duty ratio threshold ranges from 40% to 60%. Preferably, the first duty ratio threshold is 5% and the second duty ratio threshold is 50%.
S304, determining that the first time-consuming type is a task time-consuming type.
S305, re-determine the cluster centers and cluster again to re-determine the first time-consuming types, until the data volume duty ratio of each first time-consuming type meets the preset duty ratio requirement.
In this embodiment, the method specifically includes:
S3051, delete the first time-consuming types whose data volume duty ratio is below the first duty ratio threshold, and/or randomly extract at least two historical tasks as new cluster centers from each first time-consuming type whose data volume duty ratio exceeds the second duty ratio threshold.
For example, a class containing fewer than 5% of all samples, i.e., a first time-consuming type whose data volume duty ratio is below 5%, is deleted. From a class containing more than 50% of all samples, 2 samples are randomly extracted as new cluster centers.
S3052, for each first time-consuming type that meets the preset duty ratio requirement, re-determine a new cluster center in a preset manner.
In one possible embodiment, when a first time-consuming type meets the preset duty ratio requirement, its average time consumption is taken as its new cluster center.
Specifically, the time-consuming data of the new cluster center can be expressed by formula (2):
$$a_i=\frac{1}{\lvert C_i\rvert}\sum_{x\in C_i}x$$
where $C_i$ is the classification set, i.e., the set corresponding to the first time-consuming type, and $\lvert C_i\rvert$ is the number of samples in the set. $x$ is the time-consuming data of each sample in the set, and $a_i$ is the time-consuming data of the new cluster center.
After this step, the method returns to step S302. The preset clustering model clusters again around the new cluster centers to determine new first time-consuming types, and the loop continues until the data volume duty ratio of each first time-consuming type meets the preset duty ratio requirement.
The method for iteratively determining the task time-consuming types provided by this embodiment reduces conventional k-means to one dimension in line with the characteristics of task scheduling. This lowers algorithm complexity while preserving data accuracy, reduces the early warning deployment burden and improves operating efficiency. Classifying the historical data also reduces the complexity and cost of the early warning algorithm and improves its accuracy. Because the method uses existing task configuration information and historical information, the scheduling platform can directly provide the data needed to build the scheduling relationship graph and the classified sample data of the task time-consuming types. The method is therefore decoupled from big data cluster resources and adds no burden to the cluster.
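The loop S301 to S305 can be sketched in Python as a one-dimensional k-means over task durations, using the preferred 5% and 50% thresholds. This is an illustrative reading, not the patented implementation. Several elements are assumptions: the function name, the optional `init_centers` argument (added so that runs can be reproduced), the `max_rounds` cap, and the fallback reseeding when every type has been deleted.

```python
import random


def cluster_durations(durations, k=4, low=0.05, high=0.50,
                      init_centers=None, max_rounds=100, seed=0):
    """Return the task time-consuming types as lists of durations."""
    rng = random.Random(seed)
    # S301: initial centers drawn at random from the historical durations
    centers = list(init_centers) if init_centers else rng.sample(durations, k)
    total = len(durations)
    for _ in range(max_rounds):
        # S302, formula (1): assign each duration to its nearest center
        clusters = [[] for _ in centers]
        for x in durations:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        clusters = [c for c in clusters if c]
        ratios = [len(c) / total for c in clusters]
        # S303/S304: accept once every type holds between low and high of all samples
        if all(low <= r <= high for r in ratios):
            return clusters
        # S305: rebuild the centers, then cluster again
        new_centers = []
        for c, r in zip(clusters, ratios):
            if r < low:
                continue  # S3051: delete the undersized type
            if r > high:
                distinct = sorted(set(c))  # S3051: split the oversized type
                new_centers.extend(rng.sample(distinct, min(2, len(distinct))))
            else:
                new_centers.append(sum(c) / len(c))  # S3052, formula (2): mean duration
        centers = new_centers or rng.sample(durations, k)
    raise RuntimeError("duty ratio requirement not met within max_rounds")
```

The thresholds also bound the number of types. With a 50% upper limit at least two types are needed, and with a 5% lower limit at most twenty can survive.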
To facilitate understanding of possible implementations of the "determine at least one timing detection task from a plurality of task time-consuming types and a scheduling relationship map" in step S203 in the embodiment shown in fig. 2, a specific embodiment is described below.
Fig. 4 is a schematic flow chart of determining at least one timing detection task in step S203 in the embodiment shown in fig. 2 according to an embodiment of the present application. As shown in fig. 4, the specific steps include:
S401, determining a first target type and a second target type from the time-consuming types of each task according to preset screening requirements.
In this embodiment, the task time-consuming type whose cluster center has the largest time-consuming value is selected as the first target type, and the type whose cluster center has the smallest time-consuming value is selected as the second target type.
S402, determining a first fluctuation range and a second fluctuation range according to time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm.
In this step, specifically, the method includes:
S4021, determining a first fluctuation range according to the first average time consumption of all time consumption data in the first target type and the first standard deviation.
In one possible design, the first fluctuation range $B_1$ equals the first average time consumption plus N times the first standard deviation, as shown in formula (3):
$$B_1=\bar{x}_1+N\,S_1$$
where $\bar{x}_1$ is the first average time consumption and $S_1$ is the first standard deviation.
Preferably, if the sample fluctuation interval as a whole follows a normal distribution, about 99.7% of samples lie within three standard deviations of the mean, so N can be set to 3. Those skilled in the art may set N according to the distribution actually followed by the sample fluctuation interval; no limitation is imposed here.
S4022, determine the second fluctuation range from the second average time consumption and the second standard deviation of all time-consuming data in the second target type.
In one possible design, the second fluctuation range $B_2$ equals the second average time consumption minus M times the second standard deviation, as shown in formula (4):
$$B_2=\bar{x}_2-M\,S_2$$
where $\bar{x}_2$ is the second average time consumption and $S_2$ is the second standard deviation.
Preferably, if the sample fluctuation interval as a whole follows a normal distribution, about 99.7% of samples lie within three standard deviations of the mean, so M can be set to 3. Those skilled in the art may set M according to the distribution actually followed by the sample fluctuation interval; no limitation is imposed here.
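Steps S401 and S402 can be sketched with Python's `statistics` module. The sketch assumes that each task time-consuming type is given as a list of durations with at least two values (`statistics.stdev` requires two data points), and that the sample standard deviation is intended; the source does not say which. The function name is hypothetical.

```python
import statistics


def fluctuation_ranges(types: list, n: float = 3, m: float = 3) -> tuple:
    """Return (B1, B2) from the longest- and shortest-running time-consuming types."""
    # S401: the type with the largest mean (cluster center) is the first target type,
    # and the type with the smallest mean is the second target type.
    longest = max(types, key=statistics.mean)
    shortest = min(types, key=statistics.mean)
    # Formula (3): B1 = mean + N * S1
    b1 = statistics.mean(longest) + n * statistics.stdev(longest)
    # Formula (4): B2 = mean - M * S2
    b2 = statistics.mean(shortest) - m * statistics.stdev(shortest)
    return b1, b2


b1, b2 = fluctuation_ranges([[10, 12, 14], [100, 110, 120]])
```

With these inputs, B1 = 110 + 3 × 10 = 140 and B2 = 12 - 3 × 2 = 6. Depending on the data, B2 can come out negative; the source does not specify how to handle that case.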
The clusters with the maximum and minimum time consumption are selected, and a duration fluctuation interval is calculated for each. This reduces the extra error caused by differences between sample categories and requires less computation than a normal distribution over the whole sample. It is therefore better suited to big data cluster environments with large numbers of scheduled tasks.
S403, determining detection objects and detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range and the starting time of the execution of the historical task in the time-consuming data.
In one possible design, the detection times include a first detection time, obtained by adding the first fluctuation range to the start time, and a second detection time, obtained by adding the second fluctuation range to the start time.
Specifically, using the result of S402, the current execution start time $T$ of the scheduled task is read, and a timing detection job (timing detection task) is generated. Its detection times $T_1$ and $T_2$ are given by formula (5):
$$T_1=T+B_1,\qquad T_2=T+B_2$$
where $B_1$ is the first fluctuation range and $B_2$ is the second fluctuation range.
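Formula (5) simply offsets the start time by each fluctuation range. A sketch using Python's `datetime` follows; it assumes that $B_1$ and $B_2$ are expressed in minutes, since the source does not state a unit, and `detection_times` is a hypothetical name.

```python
from datetime import datetime, timedelta


def detection_times(start: datetime, b1_minutes: float, b2_minutes: float) -> tuple:
    """Formula (5): T1 = T + B1 and T2 = T + B2, with B1 and B2 assumed to be minutes."""
    t1 = start + timedelta(minutes=b1_minutes)
    t2 = start + timedelta(minutes=b2_minutes)
    return t1, t2


t1, t2 = detection_times(datetime(2022, 6, 10, 1, 0), 140.0, 6.0)
```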
Notably, building on the embodiment shown in Fig. 4, step S204 of the embodiment shown in Fig. 2 covers two aspects. That step determines, from the detection results of each timing detection task at each detection time point, whether the probability of a task scheduling anomaly in the target system meets the preset early warning requirement. The two aspects are:
If the detection result shows that the detection object has not finished executing at the first detection time, the first probability, i.e., the probability that the task's execution progress is abnormal, is determined to meet the preset early warning requirement;
If the detection result shows that the detection object has already finished executing at the second detection time, the second probability, i.e., the probability that the data magnitude of the target system's scheduled task is abnormal, is determined to meet the preset early warning requirement.
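The two checks above can be sketched as follows. The `Alert` enum and the `check_at` function are hypothetical names used only for illustration, and the detection points are identified by the strings "T1" and "T2" for simplicity.

```python
from enum import Enum
from typing import Optional


class Alert(Enum):
    DELAY = "execution progress abnormal: not finished by T1"
    DATA_VOLUME = "data magnitude abnormal: already finished by T2"


def check_at(detection_point: str, finished: bool) -> Optional[Alert]:
    """Evaluate one timing detection result at detection point "T1" or "T2"."""
    if detection_point == "T1" and not finished:
        return Alert.DELAY  # still running after the long-type upper bound
    if detection_point == "T2" and finished:
        return Alert.DATA_VOLUME  # finished before the short-type lower bound
    return None  # no early warning at this detection point
```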
Fig. 5 is a flowchart of another task scheduling anomaly monitoring method according to an embodiment of the present application. As shown in fig. 5, the specific steps of the method include:
S501, acquire historical task data of one or more historical task periods according to the work plan of the current task period.
In this step, the similarity between the work plan of each selected historical task period and the work plan of the current task period meets a preset requirement. The historical task data includes the configuration data of each historical task and the time-consuming data of each historical task.
S502, use a preset clustering model to cluster the historical tasks of each historical task period according to the time-consuming data, until a plurality of task time-consuming types are determined.
In this step, the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data satisfies a preset duty ratio requirement.
S503, determining a scheduling relation graph according to the configuration data, and determining at least one timing detection task according to the time consumption types of the tasks and the scheduling relation graph.
In this step, the scheduling relationship map is used to characterize the dependency relationship of the inter-calling processing results between each history task.
S504, judging, according to the detection results of each timing detection task at each detection time point, whether the probability of a task scheduling abnormality in the target system meets the preset early warning requirement.
If not, this step is repeated, that is, the next timing detection task continues to be executed. If yes, the judgment covers two cases, corresponding to the detection results at the first detection time $T_1$ and the second detection time $T_2$:
(1) If the detection result shows that the execution progress of the detection object is incomplete at the first detection time, it is determined that the first probability, that the execution progress of the task is abnormal, meets the early warning requirement.
(2) If the detection result shows that the execution progress of the detection object is complete at the second detection time, it is determined that the second probability, that the data volume of the task scheduled by the target system is abnormal, meets the early warning requirement.
In this embodiment, the first detection time $T_1$ and the second detection time $T_2$ are given by formula (5).
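The two cases in S504 can be read as a simple decision rule on each detection result. A sketch, assuming each detection result reduces to a single flag saying whether the detection object had finished by the detection time; the function name `judge_detection` and its string labels are hypothetical.

```python
from typing import Optional


def judge_detection(point: str, finished: bool) -> Optional[str]:
    """Classify one detection result at a detection time point.

    point:    "T1" for the first detection time, "T2" for the second.
    finished: whether the detection object had completed by that time.
    Returns the kind of abnormality suggested, or None if none.
    """
    if point == "T1" and not finished:
        # Case (1): still running at T1 -> execution progress abnormal.
        return "execution_progress"
    if point == "T2" and finished:
        # Case (2): already complete at T2 -> data volume abnormal.
        return "data_volume"
    return None  # no early warning; S504 moves on to the next detection task
```

Given that the device embodiment defines the first fluctuation range as a mean plus N standard deviations and the second as a mean minus M standard deviations, $T_1$ typically acts as a "finished too late" check and $T_2$ as a "finished suspiciously early" check.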
Note that this embodiment elaborates only the subsequent processing for the first case, namely steps S505 to S512. The second case may be handled by analogy with the first, or may independently adopt a different early warning mode, for example sending the early warning information only once.
S505, when a scheduling abnormality of the detection object is detected at the first detection time, sending early warning information to the current task.
In this step, the detection object is the currently executing task. Because the scheduling abnormality is detected at the first detection time, the early warning information is a delay-type early warning.
S506, determining a third detection time of the detection object according to a preset delay time.
In this step, a delay-type early warning may be ignored and thus fail to achieve its intended effect, that is, the abnormality may not be corrected in time, before a large number of tasks are scheduled abnormally. To guard against this, the preset delay time is added to the first detection time $T_1$, at which the delay-type early warning was sent, to obtain the third detection time $T_3 = T_1 + T_d$, where $T_d$ is the preset delay time. Optionally, $T_d$ equals $S_1$ in formula (3), i.e., $T_3 = T_1 + S_1$.
Likewise, if the current task is detected again at the third detection time and a delay-type early warning is still sent, steps S506 to S512 are repeated.
S507, when it is detected at the third detection time that the currently executing task still has the scheduling abnormality, determining a first early warning level of the currently executing task according to its first preset early warning weight and the number of times its early warning has been triggered.
In this step, if the current task again meets the preset early warning requirement at the third detection time, its early warning level is recalculated. This prevents a lower-level early warning from going unsent and a small problem from growing into a serious scheduling incident.
In this embodiment, assuming that the current task (also referred to as the previous task) is task x, the reference value $f(x)$ of the early warning level of task x can be calculated by formula (6):
$$f(x) = u_x \cdot t \qquad (6)$$
where $u_x$ is the initial early warning weight of task x, which defaults to 1, and $t$ is the number of early warnings issued: the first early warning is issued at the first detection time $T_1$, the second at the third detection time $T_3$, and so on.
S508, judging whether the first early warning level meets the preset early warning condition.
In this step, if the condition is met, step S505 is executed again, that is, early warning information is sent to the current task, and steps S509 to S512 are also executed; if not, the flow ends.
Specifically, if $f(x) \le 0.3$, the first early warning level is low and no early warning processing is performed; if $0.3 < f(x) \le 0.6$, the first early warning level is medium; and if $f(x) > 0.6$, the first early warning level is high.
After the first early warning level is determined, the corresponding early warning information may be generated as described in S205, which is not repeated here.
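Formula (6) and the level thresholds above can be combined as follows. This sketch reads the extracted formula (6) as the product $u_x \cdot t$, which is an assumption about the original typesetting; the function names are hypothetical.

```python
def early_warning_level(score: float) -> str:
    """Map a reference value f to a level using the thresholds 0.3 and 0.6."""
    if score <= 0.3:
        return "low"  # low-level: no early warning processing
    if score <= 0.6:
        return "medium"
    return "high"


def current_task_score(u_x: float, t: int) -> float:
    """Formula (6), read as f(x) = u_x * t (assumed product form).

    u_x: early warning weight of task x (defaults to 1 in the text).
    t:   number of early warnings issued (1 at T1, 2 at T3, ...).
    """
    return u_x * t


if __name__ == "__main__":
    # With a hypothetical feedback-adjusted weight of 0.25, repeated
    # triggers escalate the level: t=1 -> low, t=2 -> medium, t=3 -> high.
    for t in (1, 2, 3):
        print(t, early_warning_level(current_task_score(0.25, t)))
```

Under this reading, a weight left at its default of 1 already yields a high level on the first trigger, so the low and medium bands matter once the weight has been adjusted, for example through the weight feedback link.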
S509, determining the association degree between the currently executing task and the next task according to the scheduling relationship map by using a preset association model.
In this step, the association degree r between the currently executing task (task x) and the next task (task y) is calculated as given by formula (x) in S205, which is not repeated here.
S510, determining a second early warning level of the next task according to the second early warning weight of the next task, the association degree, and the number of early warning triggers.
In this embodiment, the next task after the current task x in the scheduling relationship map is task y, and the reference value $f(y)$ of the early warning level of task y can be calculated by formula (7):
$$f(y) = u_y \cdot t \cdot r \qquad (7)$$
where $u_y$ is the initial early warning weight of task y, which defaults to 1; $r$ is the association degree (correlation coefficient) between task x and task y; and $t$ is the number of early warnings issued: the first early warning is issued at the first detection time $T_1$, the second at the third detection time $T_3$, and so on.
S511, judging whether the second early warning level meets the preset early warning condition.
In this step, if yes, step S512 is executed.
Specifically, if $f(y) \le 0.3$, the second early warning level is low and no early warning processing is performed; if $0.3 < f(y) \le 0.6$, the second early warning level is medium; and if $f(y) > 0.6$, the second early warning level is high.
S512, sending early warning information to the next task.
In this step, the early warning information includes a scheduling-delay notice indicating that the scheduling abnormality of the next task originates from the currently executing task.
Specifically, after the early warning level is determined, the corresponding early warning information may be generated following the specific steps of S205.
Note that S509 to S510 may be executed concurrently with S507, and S508 and S511 may likewise be executed concurrently.
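Steps S506 to S512 for a single re-check can be sketched as one function. Assumptions not stated in the text: formulas (6) and (7) are read as products, a level "meets the preset early warning condition" when it is medium or high, and `Alert`, `recheck_at_t3` and their fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    target: str   # task that receives the early warning information
    level: str    # "medium" or "high"
    message: str


def level_of(score: float) -> str:
    """Thresholds from S508/S511: <=0.3 low, <=0.6 medium, else high."""
    if score <= 0.3:
        return "low"
    return "medium" if score <= 0.6 else "high"


def recheck_at_t3(task_x: str, task_y: str, t1: float, t_d: float,
                  still_abnormal: bool, t: int,
                  u_x: float = 1.0, u_y: float = 1.0, r: float = 1.0):
    """Return (T3, alerts) for one pass of S506-S512.

    t is the early warning trigger count at T3 (2 for the first re-check).
    """
    t3 = t1 + t_d                      # S506: T3 = T1 + Td
    alerts: List[Alert] = []
    if not still_abnormal:
        return t3, alerts              # abnormality cleared, nothing to send
    level_x = level_of(u_x * t)        # S507: formula (6)
    if level_x == "low":
        return t3, alerts              # S508 not met: flow ends
    alerts.append(Alert(task_x, level_x, "delay-type early warning (repeated)"))
    level_y = level_of(u_y * t * r)    # S509-S510: formula (7)
    if level_y != "low":               # S511
        alerts.append(Alert(task_y, level_y,
                            f"scheduling delay originates from {task_x}"))  # S512
    return t3, alerts
```

Because the association degree r scales only the next task's score, a weakly associated downstream task stays below the warning threshold while the current task is still escalated.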
It should further be noted that, in the task scheduling anomaly monitoring method provided in each embodiment, the early warning information may include a weight feedback link, and after one or more pieces of early warning information are determined and output, the method further includes:
Receiving adjustment information input by a user through a weight feedback link;
and adjusting the pre-warning weight of the detection object corresponding to the timing detection task according to the adjustment information.
Specifically, the early warning information provides a weight feedback link for the early warning level. If the early warning level does not match the business response level, the user feeds back the adjusted weight value through the link, and the feedback value is updated and recorded in the early warning system database for generating subsequent early warnings.
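The feedback loop can be sketched with an in-memory stand-in for the early warning system database. The class and method names are hypothetical, and a real deployment would persist the values; the default weight of 1 matches the default stated for $u_x$ and $u_y$.

```python
from typing import Dict


class WarningWeightStore:
    """Hypothetical in-memory stand-in for the early warning system database."""

    DEFAULT_WEIGHT = 1.0  # default early warning weight stated for u_x and u_y

    def __init__(self) -> None:
        self._weights: Dict[str, float] = {}

    def weight(self, task: str) -> float:
        """Current early warning weight of a task (default if never adjusted)."""
        return self._weights.get(task, self.DEFAULT_WEIGHT)

    def apply_feedback(self, task: str, new_weight: float) -> None:
        """Record a weight submitted through the feedback link for later warnings."""
        self._weights[task] = new_weight
```

Subsequent early warning levels would then be computed with `store.weight(task)` in place of the fixed initial weight.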
In general, the task scheduling abnormality monitoring method provided by the embodiments of the present application has at least the following advantages:
(1) Early warnings preserve response timeliness, leaving enough time to handle problems;
(2) Pushing associated warnings to dependent tasks helps locate problems quickly and coordinate resources;
(3) The early warning strategy is adjusted dynamically according to the data in each task period, so problems are found in time;
(4) A feedback interface is provided, so that the business's level of attention to each scheduled task is reflected in the early warnings;
(5) The method does not share computing resources with the scheduling Server or the Job Server, and so has essentially no impact on the system;
(6) Following the dependency inversion principle, the method invokes the system's interface implementations directly.
Fig. 6 is a schematic structural diagram of a task scheduling abnormality monitoring device according to an embodiment of the present application. The task scheduling anomaly monitoring device 600 may be implemented in software, hardware, or a combination of both.
As shown in fig. 6, the task scheduling abnormality monitoring apparatus 600 includes:
the acquiring module 601 is configured to acquire historical task data of one or more historical task periods according to a work plan of a current task period, where similarity between the historical work plan of the historical task period and the work plan of the current task period meets a preset requirement, and the historical task data includes configuration data of each historical task and time-consuming data for executing each historical task;
a processing module 602, configured to:
performing clustering processing on the historical tasks according to the time-consuming data by using a preset clustering model until a plurality of task time-consuming types are determined, wherein the ratio of the amount of data contained in each task time-consuming type to the total amount of the time-consuming data meets the preset proportion requirement;
determining a scheduling relationship map according to the configuration data, and determining at least one timing detection task according to the plurality of task time-consuming types and the scheduling relationship map, wherein the scheduling relationship map characterizes how the historical tasks depend on one another's processing results;
predicting, according to the detection results of each timing detection task at each detection time point, whether the probability of a task scheduling abnormality in the target system meets the preset early warning requirement;
and the output module 603 is configured to output early warning information to a detection object of the timing detection task.
In one possible design, the one or more historical task cycles include a last task cycle that is closest to the current cycle, or a consecutive plurality of task cycles that is closest to the current cycle.
In one possible design, the processing module 602 is configured to:
randomly extracting the time-consuming data of a plurality of historical tasks from all the historical tasks to serve as cluster centers;
performing a first clustering process on each historical task according to the cluster centers by using the preset clustering model, to determine one or more first time-consuming types;
judging whether the data volume proportion of each first time-consuming type meets the preset proportion requirement;
if yes, determining that the first time-consuming type is a task time-consuming type;
if not, re-determining the cluster centers and performing the clustering process again to re-determine the first time-consuming types, until the data volume proportion of every first time-consuming type meets the preset proportion requirement;
wherein the data volume proportion is the ratio of the amount of data contained in a first time-consuming type to the total amount of time-consuming data.
In one possible design, the preset proportion requirement includes that the data volume proportion is greater than or equal to a first proportion threshold and less than or equal to a second proportion threshold.
Optionally, the first value range of the first proportion threshold is 1% to 10%, and the second value range of the second proportion threshold is 40% to 60%.
In one possible design, the processing module 602 is configured to:
deleting each first time-consuming type whose data volume proportion is less than the first proportion threshold; and/or,
randomly extracting at least two historical tasks from each first time-consuming type whose data volume proportion is greater than the second proportion threshold to serve as new cluster centers;
for each first time-consuming type that meets the preset proportion requirement, re-determining one new cluster center in a preset manner;
and performing clustering again according to each new cluster center by using the preset clustering model, to determine new first time-consuming types.
In one possible design, the processing module 602 is further configured to:
when a first time-consuming type meets the preset proportion requirement, taking the average time consumption of that first time-consuming type as its new cluster center.
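The clustering loop above can be sketched as follows. Assumptions: the "preset clustering model" is taken to be plain one-dimensional k-means on task durations, the thresholds 5% and 50% are example values picked from the ranges 1% to 10% and 40% to 60%, and all function names are hypothetical.

```python
import random
from statistics import mean
from typing import List


def kmeans_1d(durations: List[float], centers: List[float],
              iters: int = 20) -> List[List[float]]:
    """Plain 1-D k-means (Lloyd's algorithm); empty clusters are dropped."""
    clusters: List[List[float]] = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for d in durations:
            nearest = min(range(len(centers)), key=lambda i: abs(d - centers[i]))
            clusters[nearest].append(d)
        clusters = [c for c in clusters if c]
        new_centers = [mean(c) for c in clusters]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return clusters


def cluster_time_types(durations: List[float], k: int = 3, low: float = 0.05,
                       high: float = 0.5, max_rounds: int = 50,
                       seed: int = 0) -> List[List[float]]:
    """Re-cluster until every type's data proportion lies in [low, high]."""
    rng = random.Random(seed)
    total = len(durations)
    centers = rng.sample(durations, k)  # random historical tasks as centers
    for _ in range(max_rounds):
        clusters = kmeans_1d(durations, centers)
        ratios = [len(c) / total for c in clusters]
        if all(low <= p <= high for p in ratios):
            return clusters  # every first time-consuming type is accepted
        centers = []
        for c, p in zip(clusters, ratios):
            if p < low:
                continue  # delete a type that is too small
            if p > high:
                centers.extend(rng.sample(c, min(2, len(c))))  # split: 2 random tasks
            else:
                centers.append(mean(c))  # keep: its average time consumption
        if not centers:
            centers = rng.sample(durations, k)  # fallback: restart from random seeds
    raise RuntimeError("proportion requirement not met within max_rounds")
```

Deleting a type removes only its center; its tasks are reassigned to the remaining types in the next round, so the proportions still sum to 100%.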
In one possible design, the processing module 602 is further configured to:
according to preset screening requirements, determining a first target type and a second target type from the time-consuming types of each task;
determining a first fluctuation range and a second fluctuation range according to time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm;
And determining detection objects and detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range and the starting time of the execution of the historical task in the time-consuming data.
In one possible design, the processing module 602 is further configured to:
determining a first fluctuation range according to the first average time consumption and the first standard deviation of all time consumption data in the first target type;
And determining a second fluctuation range according to the second average time consumption and the second standard deviation of all time-consuming data in the second target type.
In one possible design, the processing module 602 is configured to calculate the first fluctuation range as the sum of the first average time consumption and N times the first standard deviation, and the second fluctuation range as the difference between the second average time consumption and M times the second standard deviation.
In one possible design, the detection times include a first detection time, obtained by adding the first fluctuation range to the start time, and a second detection time, obtained by adding the second fluctuation range to the start time.
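The two detection times follow directly from these definitions. In this sketch the multipliers N and M (here `n` and `m`, defaulting to 2) and the use of the population standard deviation are assumptions, since the text specifies neither; durations and the start time are assumed to share one time unit.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def detection_times(start: float, first_type: Sequence[float],
                    second_type: Sequence[float],
                    n: float = 2.0, m: float = 2.0) -> Tuple[float, float]:
    """Return (T1, T2) for a task starting at `start`.

    first range  = mean(first_type)  + n * std(first_type)
    second range = mean(second_type) - m * std(second_type)
    The population standard deviation and n = m = 2 are assumptions.
    """
    first_range = mean(first_type) + n * pstdev(first_type)
    second_range = mean(second_type) - m * pstdev(second_type)
    return start + first_range, start + second_range
```

For example, with durations of 28 and 32 in the first target type and 18 and 22 in the second, a task starting at 100 gets $T_1 = 134$ and $T_2 = 116$.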
In one possible design, the processing module 602 is configured to:
If the execution progress of the detection object at the first detection time is determined to be incomplete according to the detection result, determining that the first probability of abnormality in the execution progress of the task meets the early warning requirement;
If the detection result shows that the execution progress of the detection object is complete at the second detection time, determining that the second probability, that the data volume of the task scheduled by the target system is abnormal, meets the early warning requirement.
In one possible design, the output module 603 is configured to:
Calculating the association degree of the previous task and the next task in the scheduling relation map according to a preset association model;
If the association degree is in the first association interval, determining that the early warning information comprises first early warning information and second early warning information at the same early warning level, wherein the first early warning information indicates that the previous task has a scheduling abnormality that affects the scheduling of the next task, and the second early warning information indicates that the scheduling abnormality of the next task originates from the delay of the previous task;
and outputting the first early warning information to the previous task and outputting the second early warning information to the next task.
In one possible design, the output module 603 is configured to:
Calculating the association degree of the previous task and the next task in the scheduling relation map according to a preset association model;
if the association degree is in the second association interval, determining that the early warning information comprises first early warning information and second early warning information, wherein the first early warning level of the first early warning information is higher than the second early warning level of the second early warning information, the first early warning information indicates that the previous task has a scheduling abnormality that affects the scheduling of the next task, and the second early warning information indicates that the scheduling abnormality of the next task originates from the delay of the previous task;
and outputting the first early warning information to the previous task and outputting the second early warning information to the next task.
In one possible design, the output module 603 is configured to:
Calculating the association degree of the previous task and the next task in the scheduling relation map according to a preset association model;
if the association degree is in the third association interval, outputting early warning information to the previous task, wherein the early warning information is used for representing that the previous task has scheduling abnormality.
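The three association intervals can be sketched as a routing rule. The text gives no numeric bounds, so the interval values below are hypothetical, as is the choice to make the next task's warning exactly one level lower in the second interval (the text only requires it to be lower); levels are integers where larger means more severe.

```python
from typing import List, Tuple

# Hypothetical interval bounds; the text does not give numeric values.
FIRST_INTERVAL = (0.7, 1.0)   # strong association
SECOND_INTERVAL = (0.3, 0.7)  # moderate association


def route_warnings(prev_task: str, next_task: str, r: float,
                   level: int) -> List[Tuple[str, int, str]]:
    """Return (recipient, level, meaning) tuples; larger level = more severe."""
    if FIRST_INTERVAL[0] <= r <= FIRST_INTERVAL[1]:
        # First interval: both warnings carry the same level.
        return [(prev_task, level, "abnormality affects next task"),
                (next_task, level, "abnormality caused by delay of previous task")]
    if SECOND_INTERVAL[0] <= r < SECOND_INTERVAL[1]:
        # Second interval: the next task's warning is lower (assumed: by one).
        return [(prev_task, level, "abnormality affects next task"),
                (next_task, level - 1, "abnormality caused by delay of previous task")]
    # Third interval: warn only the previous task.
    return [(prev_task, level, "scheduling abnormality")]
```

Keeping the bounds as module-level constants makes them easy to tune per deployment without touching the routing logic.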
In one possible design, the early warning information includes a weight feedback link;
the acquiring module 601 is further configured to receive adjustment information input by a user through a weight feedback link;
The processing module 602 is further configured to adjust an early warning weight of a detection object corresponding to the timing detection task according to the adjustment information.
In one possible design, the processing module 602 is further configured to:
when a scheduling abnormality of the detection object is detected at the first detection time, determining a third detection time for the detection object according to the preset delay time, wherein the detection object is the currently executing task;
when it is detected at the third detection time that the currently executing task still has the scheduling abnormality, determining a first early warning level of the currently executing task according to its first preset early warning weight and the number of early warning triggers;
judging whether the first early warning level meets the preset early warning condition;
if yes, sending the early warning information to the currently executing task again.
In one possible design, the processing module 602 is further configured to:
determining the association degree between the currently executing task and the next task according to the scheduling relationship map by using a preset association model;
determining a second early warning level of the next task according to the second early warning weight of the next task, the association degree, and the number of early warning triggers;
judging whether the second early warning level meets the preset early warning condition;
if yes, sending early warning information to the next task, wherein the early warning information includes a scheduling-delay notice indicating that the scheduling abnormality of the next task originates from the currently executing task.
It should be noted that, the apparatus provided in the embodiment shown in fig. 6 may perform the method provided in any of the above method embodiments, and the specific implementation principles, technical features, explanation of terms, and technical effects are similar, and are not repeated herein.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 7, the electronic device 700 may include at least one processor 701 and a memory 702. Fig. 7 takes one processor as an example.
A memory 702 for storing programs. In particular, the program may include program code including computer-operating instructions.
The memory 702 may include a high-speed RAM, and may further include a non-volatile memory, such as at least one magnetic disk memory.
The processor 701 is configured to execute computer-executable instructions stored in the memory 702 to implement the methods described in the above method embodiments.
The processor 701 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Alternatively, the memory 702 may be separate or integrated with the processor 701. When the memory 702 is a device separate from the processor 701, the electronic device 700 may further include:
A bus 703 for connecting the processor 701 and the memory 702. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on; showing a single bus does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the memory 702 and the processor 701 are integrated on a single chip, the memory 702 and the processor 701 may communicate through an internal interface.
An embodiment of the present application further provides a computer-readable storage medium, which may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. Specifically, the computer-readable storage medium stores program instructions for performing the methods in the above method embodiments.
The embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the method of the above-described method embodiments.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (21)

1. A method for monitoring task scheduling anomalies, comprising: obtaining, according to the work plan of a current task cycle, historical task data of one or more historical task cycles, wherein the similarity between the historical work plans of the historical task cycles and the work plan of the current task cycle meets a preset requirement, and the historical task data includes configuration data of each historical task and time-consuming data of executing each historical task; clustering each of the historical tasks according to the time-consuming data by using a preset clustering model until a plurality of task time-consuming types are determined, wherein the ratio of the amount of data contained in each task time-consuming type to the total amount of the time-consuming data meets a preset proportion requirement; determining a scheduling relationship map based on the configuration data, and determining at least one timed detection task based on the plurality of task time-consuming types and the scheduling relationship map, wherein the scheduling relationship map represents the dependency relationships among the historical tasks in calling one another's processing results; and predicting, based on the detection results of each timed detection task at each detection time point, whether the probability of an abnormality in the task scheduling of a target system meets a preset warning requirement, and if so, determining and outputting one or more pieces of warning information; wherein clustering each of the historical tasks according to the time-consuming data by using the preset clustering model until the plurality of task time-consuming types are determined comprises: randomly extracting the time-consuming data of a plurality of the historical tasks from all the historical tasks as cluster centers; performing a first clustering process on each of the historical tasks according to the cluster centers by using the preset clustering model, to determine one or more first time-consuming types; determining whether the data volume proportion of each first time-consuming type meets the preset proportion requirement; if so, determining that the first time-consuming type is a task time-consuming type; if not, re-determining the cluster centers and re-performing the clustering process to re-determine the first time-consuming types, until the data volume proportion of each first time-consuming type meets the preset proportion requirement; wherein the data volume proportion represents the ratio of the amount of data contained in a first time-consuming type to the total amount of the time-consuming data.
2. The task scheduling anomaly monitoring method according to claim 1, wherein the one or more historical task cycles include: the previous task cycle closest to the current task cycle, or a plurality of consecutive task cycles closest to the current task cycle.
3. The task scheduling anomaly monitoring method according to any one of claims 1 to 2, wherein the preset proportion requirement includes: the data volume proportion is greater than or equal to a first proportion threshold and less than or equal to a second proportion threshold.
4. The task scheduling anomaly monitoring method according to claim 3, wherein a first value range of the first proportion threshold includes 1% to 10%, and a second value range of the second proportion threshold includes 40% to 60%.
5. The task scheduling anomaly monitoring method according to claim 3, wherein re-determining the cluster centers and re-performing the clustering process comprises: deleting each first time-consuming type whose data volume proportion is less than the first proportion threshold; and/or randomly selecting at least two of the historical tasks from each first time-consuming type whose data volume proportion is greater than the second proportion threshold as new cluster centers; for each first time-consuming type that meets the preset proportion requirement, re-determining one new cluster center according to a preset method; and re-performing the clustering process according to each new cluster center by using the preset clustering model, to determine new first time-consuming types.
6. The task scheduling anomaly monitoring method according to claim 5, wherein re-determining one new cluster center according to the preset method for the first time-consuming type that meets the preset proportion requirement comprises: when the first time-consuming type meets the preset proportion requirement, using the average time consumption of the first time-consuming type as the new cluster center.
7. The task scheduling anomaly monitoring method according to claim 1, wherein determining at least one timed detection task based on the plurality of task time-consuming types and the scheduling relationship map comprises: determining a first target type and a second target type from the task time-consuming types according to preset screening requirements; determining a first fluctuation range and a second fluctuation range from the time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm; and determining the detection object and the detection time of each timed detection task according to the scheduling relationship map, the first fluctuation range, the second fluctuation range, and the start time of execution of the historical tasks in the time-consuming data.
8. The task scheduling anomaly monitoring method according to claim 7, wherein determining the first fluctuation range and the second fluctuation range from the time-consuming data in the first target type and the second target type by using the preset fluctuation algorithm comprises: determining the first fluctuation range according to a first average time consumption and a first standard deviation of all the time-consuming data in the first target type; and determining the second fluctuation range according to a second average time consumption and a second standard deviation of all the time-consuming data in the second target type.
9. The task scheduling anomaly monitoring method according to claim 8, wherein determining the first fluctuation range according to the first average time consumption and the first standard deviation comprises: the first fluctuation range is equal to the sum of the first average time consumption and N times the first standard deviation; and determining the second fluctuation range according to the second average time consumption and the second standard deviation comprises: the second fluctuation range is equal to the difference between the second average time consumption and M times the second standard deviation.
10. The task scheduling anomaly monitoring method according to any one of claims 7 to 9, wherein the detection time includes a first detection time and a second detection time, the first detection time being the start time plus the first fluctuation range, and the second detection time being the start time plus the second fluctuation range.
11. The task scheduling anomaly monitoring method according to claim 10, wherein predicting, based on the detection results of each timed detection task at each detection time point, whether the probability of an abnormality in the task scheduling of the target system meets the preset warning requirement comprises: if it is determined from the detection result that the execution progress of the detection object at the first detection time is incomplete, determining that a first probability that the execution progress of the task is abnormal meets the preset warning requirement; and if it is determined from the detection result that the execution progress of the detection object at the second detection time is complete, determining that a second probability that the data volume of the task scheduled by the target system is abnormal meets the preset warning requirement.
12. The task scheduling anomaly monitoring method according to any one of claims 1 to 2 and 7 to 9, wherein determining and outputting one or more pieces of warning information comprises: calculating the correlation degree between a previous task and a next task in the scheduling relationship map according to a preset correlation model; if the correlation degree is within a first correlation interval, determining that the warning information includes first warning information and second warning information at the same warning level, wherein the first warning information indicates that a scheduling anomaly exists in the previous task and affects the scheduling of the next task, and the second warning information indicates that the scheduling anomaly of the next task is caused by a delay of the previous task; and outputting the first warning information to the previous task and the second warning information to the next task.
13.
The method for monitoring task scheduling anomalies according to any one of claims 1 to 2 and 7 to 9, wherein determining and outputting one or more warning information comprises: 根据预设关联模型,计算所述调度关系图谱中前一个任务与后一个任务的关联度;Calculating the correlation between the previous task and the next task in the scheduling relationship graph according to a preset correlation model; 若所述关联度在第二关联区间内,则确定所述预警信息包括第一预警信息和第二预警信息,且所述第一预警信息的第一预警级别大于所述第二预警信息的第二预警级别,所述第一预警信息用于表征所述前一个任务存在调度异常,且对所述后一个任务的调度产生关联影响,所述第二预警信息用于表征所述后一个任务的调度异常源自于所述前一个任务的延迟;If the correlation degree is within a second correlation interval, it is determined that the warning information includes first warning information and second warning information, and the first warning level of the first warning information is greater than the second warning level of the second warning information, the first warning information is used to indicate that a scheduling anomaly exists in the previous task and has an associated impact on the scheduling of the subsequent task, and the second warning information is used to indicate that the scheduling anomaly of the subsequent task is caused by a delay of the previous task; 向所述前一个任务输出所述第一预警信息,并向所述后一个任务输出所述第二预警信息。The first warning information is output to the preceding task, and the second warning information is output to the succeeding task. 14.根据权利要求1-2和7-9中任意一项所述的任务调度异常监控方法,其特征在于,所述确定并输出一个或多个预警信息,包括:14. 
The method for monitoring task scheduling anomalies according to any one of claims 1 to 2 and 7 to 9, wherein determining and outputting one or more warning information comprises: 根据预设关联模型,计算所述调度关系图谱中前一个任务与后一个任务的关联度;Calculating the correlation between the previous task and the next task in the scheduling relationship graph according to a preset correlation model; 若所述关联度在第三关联区间内,则向所述前一个任务输出所述预警信息,所述预警信息用于表征所述前一个任务存在调度异常。If the correlation degree is within a third correlation interval, the warning information is output to the previous task, where the warning information is used to indicate that a scheduling anomaly exists in the previous task. 15.根据权利要求1-2和7-9中任意一项所述的任务调度异常监控方法,其特征在于,所述预警信息中包括:权重反馈链接;15. The method for monitoring task scheduling anomalies according to any one of claims 1-2 and 7-9, wherein the warning information includes: a weight feedback link; 在所述确定并输出一个或多个预警信息之后,还包括:After determining and outputting one or more warning information, the method further includes: 接收用户通过所述权重反馈链接输入的调整信息;receiving adjustment information input by the user through the weight feedback link; 根据所述调整信息调整所述定时检测任务对应的检测对象的预警权重。The warning weight of the detection object corresponding to the scheduled detection task is adjusted according to the adjustment information. 16.根据权利要求10所述的任务调度异常监控方法,其特征在于,还包括:当在所述第一检测时间检测到所述检测对象存在所述调度异常时,根据预设延迟时间,确定对所述检测对象的第三检测时间,所述检测对象为当前执行任务;16. 
The method for monitoring task scheduling anomalies according to claim 10, further comprising: when the scheduling anomaly is detected in the detection object at the first detection time, determining a third detection time for the detection object according to a preset delay time, wherein the detection object is a currently executing task; 当在所述第三检测时间检测到所述当前执行任务仍然存在所述调度异常时,根据所述当前执行任务的第一预设预警权重以及预警触发次数,确定所述当前执行任务的第一预警级别;When it is detected at the third detection time that the scheduling anomaly still exists in the currently executed task, determining a first warning level of the currently executed task according to the first preset warning weight of the currently executed task and the number of warning triggers; 判断所述第一预警级别是否满足预设预警条件;Determining whether the first warning level meets the preset warning conditions; 若是,则向所述当前执行任务再次发送所述预警信息。If so, the warning information is sent again to the currently executing task. 17.根据权利要求16所述的任务调度异常监控方法,其特征在于,当在所述第三检测时间检测到所述当前执行任务仍然存在所述调度异常时,还包括:17. The method for monitoring task scheduling anomalies according to claim 16, wherein when it is detected at the third detection time that the scheduling anomaly still exists in the currently executed task, the method further comprises: 利用预设关联模型,根据所述调度关系图谱,确定所述当前执行任务与下一个任务的关联度;Determine the correlation between the currently executed task and the next task according to the scheduling relationship map using a preset correlation model; 根据所述下一个任务的第二预警权重、所述关联度以及所述预警触发次数,确定所述下一个任务的第二预警级别;determining a second warning level of the next task according to the second warning weight of the next task, the correlation degree, and the number of warning triggering times; 判断所述第二预警级别是否满足所述预设预警条件;Determining whether the second warning level meets the preset warning condition; 若是,则向所述下一个任务发送所述预警信息,所述预警信息包括提示所述下一个任务的调度异常源自于所述当前执行任务的调度延迟。If so, the warning information is sent to the next task, where the warning information includes a prompt indicating that the scheduling anomaly of the next task is caused by the 
scheduling delay of the currently executed task. 18.一种任务调度异常监控装置,其特征在于,包括:18. A task scheduling anomaly monitoring device, comprising: 获取模块,用于根据当前任务周期的工作计划,获取一个或多个历史任务周期的历史任务数据,所述历史任务周期的历史工作计划与所述当前任务周期的所述工作计划的相似度满足预设要求,所述历史任务数据包括:各个历史任务的配置数据以及执行各个所述历史任务的耗时数据;An acquisition module is configured to acquire historical task data of one or more historical task cycles based on a work plan of a current task cycle, wherein the similarity between the historical work plans of the historical task cycles and the work plan of the current task cycle meets a preset requirement, and the historical task data includes configuration data of each historical task and time-consuming data for executing each historical task; 处理模块,用于:Processing module for: 利用预设聚类模型,根据所述耗时数据,对各个所述历史任务循环进行聚类处理,直至确定多个任务耗时类型,并且每个所述任务耗时类型中所包含的数据量与所述耗时数据的总量的比值满足预设占比要求;Using a preset clustering model, clustering is performed on each of the historical task cycles according to the time-consuming data until a plurality of task time-consuming types are determined, and the ratio of the amount of data contained in each task time-consuming type to the total amount of the time-consuming data meets a preset ratio requirement; 根据所述配置数据确定调度关系图谱,并根据多个所述任务耗时类型以及所述调度关系图谱,确定至少一个定时检测任务,所述调度关系图谱用于表征各个所述历史任务之间相互调用处理结果的依赖关系;Determine a scheduling relationship map based on the configuration data, and determine at least one timed detection task based on the multiple task time-consuming types and the scheduling relationship map, wherein the scheduling relationship map is used to represent the dependency relationship between the processing results of each of the historical tasks; 根据各个所述定时检测任务在各个检测时间点的检测结果,预判目标系统的任务调度出现异常的概率是否满足预设预警要求;若是,则确定一个或多个预警信息;Based on the detection results of each of the scheduled detection tasks at each detection time point, it is predicted whether the probability of abnormality in the task scheduling of the target system meets the preset warning requirements; if so, one or more warning information is 
determined; 输出模块,用于向所述定时检测任务的检测对象输出所述预警信息;An output module, configured to output the warning information to the detection object of the scheduled detection task; 所述处理模块,具体用于:The processing module is specifically used to: 从所有所述历史任务中随机抽取多个所述历史任务的所述耗时数据作为聚类中心;Randomly extracting the time-consuming data of a plurality of the historical tasks from all the historical tasks as cluster centers; 利用所述预设聚类模型,根据所述聚类中心,对各个所述历史任务进行第一次所述聚类处理,以确定一个或多个第一耗时类型;Using the preset clustering model, and according to the cluster centers, performing the first clustering process on each of the historical tasks to determine one or more first time-consuming types; 判断每个所述第一耗时类型中的数据量占比是否满足所述预设占比要求;Determine whether the proportion of the data volume in each of the first time-consuming types meets the preset proportion requirement; 若是,则确定所述第一耗时类型为任务耗时类型;If so, determining that the first time-consuming type is a task time-consuming type; 若否,则重新确定所述聚类中心,并重新进行所述聚类处理,以重新确定所述第一耗时类型,直至每个所述第一耗时类型对应的所述数据量占比满足所述预设占比要求;If not, re-determine the cluster center and re-perform the clustering process to re-determine the first time-consuming type until the data volume ratio corresponding to each first time-consuming type meets the preset ratio requirement; 其中,所述数据量占比用于表征所述第一耗时类型所含数据量与所述耗时数据的总数据量的比值。The data volume ratio is used to represent the ratio of the data volume contained in the first time-consuming type to the total data volume of the time-consuming data. 19.一种电子设备,其特征在于,包括:19. An electronic device, comprising: 处理器;以及,processor; and, 存储器,用于存储所述处理器的计算机程序;a memory for storing a computer program for the processor; 其中,所述处理器配置为经由执行所述计算机程序来执行权利要求1至17任一项所述的任务调度异常监控方法。The processor is configured to execute the task scheduling exception monitoring method according to any one of claims 1 to 17 by executing the computer program. 20.一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至17任一项所述的任务调度异常监控方法。20. 
A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the method for monitoring task scheduling anomalies according to any one of claims 1 to 17 is implemented. 21.一种计算机程序产品,包括计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至17任一项所述的任务调度异常监控方法。21. A computer program product, comprising a computer program, wherein when the computer program is executed by a processor, the method for monitoring task scheduling anomalies according to any one of claims 1 to 17 is implemented.
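Claims 6 and 18 describe the clustering loop only in terms of a "preset clustering model" and a "preset proportion requirement". The sketch below fills both in with assumptions: one-dimensional k-means over task durations, and a requirement that every cluster hold at least `min_share` of all records. Clusters that satisfy the requirement move their center to their average duration (as in claim 6); undersized clusters are re-seeded with a randomly chosen historical duration. The loop also keeps iterating until the centers stop moving, which is a k-means convention rather than something the claims state. `assign` and `cluster_durations` are hypothetical names.

```python
import random
from statistics import mean


def assign(durations, centers):
    """Put each duration into the cluster of its nearest center (1-D distance)."""
    clusters = [[] for _ in centers]
    for d in durations:
        nearest = min(range(len(centers)), key=lambda i: abs(d - centers[i]))
        clusters[nearest].append(d)
    return clusters


def cluster_durations(durations, k, min_share, max_rounds=100, seed=0):
    """Group historical task durations into k "time-consuming types".

    Assumptions (not specified by the patent): 1-D k-means as the preset
    clustering model, and a proportion requirement of "every cluster holds
    at least min_share of all records".
    """
    rng = random.Random(seed)
    total = len(durations)
    # Randomly pick k historical durations as the initial cluster centers.
    centers = rng.sample(durations, k)
    for _ in range(max_rounds):
        clusters = assign(durations, centers)
        share_ok = [bool(c) and len(c) / total >= min_share for c in clusters]
        if all(share_ok):
            new_centers = [mean(c) for c in clusters]
            if new_centers == centers:
                # Centers are stable and every type meets the proportion requirement.
                return sorted(clusters, key=mean)
            centers = new_centers
        else:
            # Clusters meeting the requirement move to their average duration;
            # undersized clusters are re-seeded with a random historical duration.
            centers = [mean(c) if ok else rng.choice(durations)
                       for c, ok in zip(clusters, share_ok)]
    raise RuntimeError("no clustering met the proportion requirement")
```

For example, `cluster_durations([10, 11, 12, 10, 11, 50, 52, 51, 49, 50], k=2, min_share=0.2)` returns the five short runs and the five long runs as two types. Real values for `k`, `min_share` and `max_rounds` would have to be tuned against actual scheduling history.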
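Claims 8 to 10 turn the two target types into two check times: the first fluctuation range is the first type's mean plus N standard deviations, the second is the second type's mean minus M standard deviations, and each is added to the task's start time. This is a minimal sketch under assumptions the claims leave open: durations measured in minutes, the population standard deviation, N = 3 and M = 2 as defaults, and a negative lower bound clamped to zero. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def first_fluctuation_range(durations, n):
    """Claim 9: first average duration plus N times the first standard deviation."""
    return mean(durations) + n * pstdev(durations)


def second_fluctuation_range(durations, m):
    """Claim 9: second average duration minus M times the second standard deviation."""
    return mean(durations) - m * pstdev(durations)


def detection_times(start, first_type, second_type, n=3, m=2):
    """Claim 10: offset the task's start time by each fluctuation range.

    Durations are assumed to be in minutes; the lower bound is clamped at
    zero so the second check never precedes the start time (an assumption).
    """
    first = start + timedelta(minutes=first_fluctuation_range(first_type, n))
    second = start + timedelta(minutes=max(0.0, second_fluctuation_range(second_type, m)))
    return first, second
```

With `[20, 40]` as the first type and `[10, 14]` as the second, a task starting at 01:00 is checked at 02:00 (30 + 3 x 10 minutes) and at 01:08 (12 - 2 x 2 minutes).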
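Claim 11 reads the two checks asymmetrically: a task still running at the late (first) detection time suggests abnormal execution progress, while a task already finished at the early (second) detection time suggests an abnormal data volume. The sketch below assumes the monitor can tell whether the task had finished by each check time, modelled here by a hypothetical `finished_at` timestamp (None while the task is running); the warning codes are likewise invented for illustration.

```python
from datetime import datetime
from typing import List, Optional


def predict_anomalies(first_check: datetime, second_check: datetime,
                      finished_at: Optional[datetime]) -> List[str]:
    """Interpret the two scheduled checks as described in claim 11.

    finished_at stands in for the task status a real monitor would poll
    (None means the task is still running); the warning codes are hypothetical.
    """
    def done_by(t: datetime) -> bool:
        return finished_at is not None and finished_at <= t

    warnings = []
    # Still unfinished at the late check: execution progress is probably abnormal.
    if not done_by(first_check):
        warnings.append("EXECUTION_PROGRESS")
    # Already finished at the early check: the data volume was probably abnormal.
    if done_by(second_check):
        warnings.append("DATA_VOLUME")
    return warnings
```

A task that finishes between the two check times raises no warning, which matches the idea that its duration fell inside the historical fluctuation band.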
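Claims 12 to 14 route warnings according to the correlation interval in which an upstream/downstream pair falls. The patent leaves the correlation model and the interval bounds open, so this sketch assumes a correlation in [0, 1], treats values of at least 0.7 as the first interval, values in [0.3, 0.7) as the second, and anything lower as the third, and lowers the downstream warning level by one step in the second interval. `Alert` and `dispatch_alerts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alert:
    target: str   # task that receives the warning
    level: int    # larger means more severe (hypothetical scale)
    message: str


def dispatch_alerts(upstream: str, downstream: str, correlation: float,
                    level: int = 3, high: float = 0.7, low: float = 0.3) -> List[Alert]:
    """Route warnings by correlation interval (claims 12 to 14).

    The interval bounds `high`/`low` and the level arithmetic are assumptions.
    """
    up = Alert(upstream, level,
               f"{upstream} has a scheduling anomaly affecting {downstream}")
    down_msg = f"anomaly of {downstream} originates from the delay of {upstream}"
    if correlation >= high:   # first interval: same warning level for both tasks
        return [up, Alert(downstream, level, down_msg)]
    if correlation >= low:    # second interval: upstream warned more severely
        return [up, Alert(downstream, level - 1, down_msg)]
    # Third interval: in this sketch only the preceding task is notified.
    return [Alert(upstream, level, f"{upstream} has a scheduling anomaly")]
```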
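Claims 15 to 17 add re-checking and escalation. The claims say only that a warning level depends on a task's warning weight and the number of warning triggers (plus the degree of correlation for the next task), so the sketch below assumes the level is the product of those factors and that a warning is sent once it reaches a threshold of 1.0. The weight feedback link of claim 15 is modelled as a plain dictionary update. All function names, the scoring formula and the threshold are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def third_detection_time(first_detection: datetime, delay: timedelta) -> datetime:
    """Claim 16: re-check the current task after a preset delay."""
    return first_detection + delay


def warning_level(weight: float, trigger_count: int, correlation: float = 1.0) -> float:
    """Hypothetical scoring: weight x correlation x number of triggers."""
    return weight * correlation * trigger_count


def escalate(current: str, nxt: Optional[str], weights: Dict[str, float],
             trigger_count: int, correlation: float,
             threshold: float = 1.0) -> List[Tuple[str, str]]:
    """Claims 16-17: decide who is warned when the anomaly persists at the third check."""
    alerts = []
    if warning_level(weights[current], trigger_count) >= threshold:
        alerts.append((current, "scheduling anomaly persists"))
    if nxt is not None and warning_level(weights[nxt], trigger_count, correlation) >= threshold:
        alerts.append((nxt, f"scheduling anomaly caused by the delay of {current}"))
    return alerts


def apply_feedback(weights: Dict[str, float], task: str, new_weight: float) -> Dict[str, float]:
    """Claim 15: return a copy of the weights with the user's adjustment applied."""
    updated = dict(weights)
    updated[task] = new_weight
    return updated
```

With weights of 0.6 for the current task A and 0.5 for the next task B and a correlation of 0.8, A is warned again on the second trigger, while B is only drawn in on the third.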
CN202210646351.4A 2022-06-09 2022-06-09 Task scheduling anomaly monitoring method, device, medium and program product Active CN114840392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210646351.4A CN114840392B (en) 2022-06-09 2022-06-09 Task scheduling anomaly monitoring method, device, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210646351.4A CN114840392B (en) 2022-06-09 2022-06-09 Task scheduling anomaly monitoring method, device, medium and program product

Publications (2)

Publication Number Publication Date
CN114840392A 2022-08-02
CN114840392B 2025-08-26

Family

ID=82574833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210646351.4A Active CN114840392B (en) 2022-06-09 2022-06-09 Task scheduling anomaly monitoring method, device, medium and program product

Country Status (1)

Country Link
CN (1) CN114840392B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587008A (en) * 2022-09-27 2023-01-10 北京沃东天骏信息技术有限公司 Task monitoring method, device, server and storage medium
CN115292141B (en) * 2022-09-29 2023-02-03 深圳联友科技有限公司 Scheduling abnormity early warning method based on sliding time window and monitoring server
CN117271100B (en) * 2023-11-21 2024-02-06 北京国科天迅科技股份有限公司 Algorithm chip cluster scheduling method, device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6438573B1 (en) * 1996-10-09 2002-08-20 Iowa State University Research Foundation, Inc. Real-time programming method
CN107241205A (en) * 2016-03-28 2017-10-10 阿里巴巴集团控股有限公司 abnormality monitoring method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008082B (en) * 2019-03-16 2022-06-17 平安科技(深圳)有限公司 Abnormal task intelligent monitoring method, device, equipment and storage medium
CN110275814A (en) * 2019-06-28 2019-09-24 深圳前海微众银行股份有限公司 A monitoring method and device for a business system
CN112346829B (en) * 2019-08-07 2023-02-17 上海云盾信息技术有限公司 Method and equipment for task scheduling
CN111522707B (en) * 2020-03-27 2025-03-25 中国平安财产保险股份有限公司 Big data platform scheduling warning method, device and computer-readable storage medium

Also Published As

Publication number Publication date
CN114840392A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN114840392B (en) Task scheduling anomaly monitoring method, device, medium and program product
US8516499B2 (en) Assistance in performing action responsive to detected event
US10402225B2 (en) Tuning resources based on queuing network model
CN106802826B (en) A thread pool-based business processing method and device
US8490108B2 (en) Method of estimating a processing time of each of a plurality of jobs and apparatus thereof
JP6447120B2 (en) Job scheduling method, data analyzer, data analysis apparatus, computer system, and computer-readable medium
US11966778B2 (en) Cloud application scaler
JP6260130B2 (en) Job delay detection method, information processing apparatus, and program
CN111475267B (en) System task automatic scheduling method, device, computer equipment and storage medium
CN111680085A (en) Data processing task analysis method and device, electronic equipment and readable storage medium
CN113360270A (en) Data cleaning task processing method and device
WO2023115856A1 (en) Task exception alert method and apparatus
CN114861909A (en) Model quality monitoring method and device, electronic equipment and storage medium
US8180716B2 (en) Method and device for forecasting computational needs of an application
CN112749013A (en) Thread load detection method and device, electronic equipment and storage medium
US12327134B2 (en) Method and system for predicting batch processes
CN120124975A (en) A method, system, computer device and storage medium for executing alarm tasks
CN110795239A (en) Application memory leakage detection method and device
CN112685390B (en) Database instance management method and device and computing equipment
CN117032916A (en) Event-based task scheduling algorithm, device and storage medium
Poltavtseva et al. Planning of aggregation and normalization of data from the Internet of Things for processing on a multiprocessor cluster
CN119645578B (en) Transaction processing method, electronic device, storage medium, and program product
CN117290113B (en) Task processing method, device, system and storage medium
CN118409843A (en) Early warning method and device for operation aging, storage medium and electronic equipment
CN118796597A (en) A method, device, equipment, storage medium and product for managing scheduled tasks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant