CN114840392A - Method, apparatus, medium, and program product for monitoring task scheduling exception - Google Patents

Info

Publication number
CN114840392A
Authority
CN
China
Prior art keywords
task
early warning
time
scheduling
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210646351.4A
Other languages
Chinese (zh)
Inventor
刘林
王志远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method, an apparatus, a medium, and a program product for monitoring task scheduling exceptions. Historical task data of one or more historical task cycles is acquired according to the work plan of the current task cycle; a plurality of task time-consuming types and a scheduling relation map are determined from the historical task data, wherein the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data meets a preset ratio requirement; at least one timing detection task is determined according to the plurality of task time-consuming types and the scheduling relation map; according to the detection result of each timing detection task at each detection time point, it is judged in advance whether the probability of a task scheduling exception in the target system meets a preset early warning requirement; and if so, one or more pieces of early warning information are determined and output. This addresses the technical problems of existing exception monitoring: poor response timeliness, inflexible configuration, monitoring only at the logical level, and weak coupling with actual services.

Description

Method, apparatus, medium, and program product for monitoring task scheduling exception
Technical Field
The present application relates to the field of financial technology (Fintech), and in particular, to a method, an apparatus, a medium, and a program product for monitoring task scheduling exceptions.
Background
With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). At present, the offline data that the financial and internet industries must process every day is large in scale and subject to strict timeliness requirements. Financial enterprises in particular process large volumes of regulatory reporting data; if a data processing task is not completed, the relevant data cannot be delivered on time, and the enterprise may even be held accountable by regulators, which affects its rating and reputation. Monitoring of, and exception response for, the task scheduling system is therefore particularly important.
Open-source task scheduling frameworks and tools such as Azkaban, Airflow, and Oozie are now mature. However, most monitoring functions configured or developed with these frameworks or tools rely on abnormal task states or on fixed parameters configured from experience, so by the time a monitoring alarm is raised the task is usually already in an abnormal state or has already had a real impact.
That is, existing exception monitoring suffers from poor response timeliness, inflexible configuration, monitoring only at the logical level, and weak coupling with actual services, all of which increase the difficulty and workload of operation and maintenance.
Disclosure of Invention
The application provides a method, an apparatus, a medium, and a program product for monitoring task scheduling exceptions, which are used to solve the technical problems of existing exception monitoring: poor response timeliness, inflexible configuration, monitoring only at the logical level, and weak coupling with actual services.
In a first aspect, the present application provides a method for monitoring task scheduling exception, including:
acquiring historical task data of one or more historical task cycles according to the work plan of the current task cycle, wherein the similarity between the historical work plan of each historical task cycle and the work plan of the current task cycle meets a preset requirement, and the historical task data includes: configuration data of each historical task and time-consuming data for executing each historical task;
clustering each historical task cycle by using a preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined, wherein the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data meets a preset ratio requirement;
determining a scheduling relation map according to the configuration data, and determining at least one timing detection task according to the plurality of task time-consuming types and the scheduling relation map, wherein the scheduling relation map represents the dependency relationships by which the historical tasks call one another's processing results;
judging in advance, according to the detection result of each timing detection task at each detection time point, whether the probability of a task scheduling exception in the target system meets a preset early warning requirement; and if so, determining and outputting one or more pieces of early warning information.
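As an illustration of the scheduling relation map in the third step, the following minimal sketch builds a mapping from each task to the tasks that consume its results. The `depends_on` field and the task names are hypothetical; the application does not specify the format of the configuration data.

```python
from collections import defaultdict


def build_schedule_map(task_configs):
    """Return {upstream_task: [downstream_task, ...]} from per-task configs.

    Assumes (hypothetically) that each task's configuration lists the tasks
    whose processing results it depends on under the key "depends_on".
    """
    downstream = defaultdict(list)
    for task, cfg in task_configs.items():
        for dep in cfg.get("depends_on", []):
            downstream[dep].append(task)
    return dict(downstream)


# Hypothetical configuration data for three historical tasks.
configs = {
    "load_raw": {"depends_on": []},
    "clean": {"depends_on": ["load_raw"]},
    "report": {"depends_on": ["clean", "load_raw"]},
}
schedule_map = build_schedule_map(configs)
```

Here `schedule_map["load_raw"]` lists `clean` and `report` as the tasks that wait on `load_raw`, which is the information later needed to decide which downstream tasks should receive a warning.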
In one possible design, the one or more historical task cycles include: the most recent task cycle before the current cycle, or a plurality of consecutive task cycles closest to the current cycle.
In one possible design, clustering each historical task cycle by using the preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined includes:
randomly extracting the time-consuming data of a plurality of historical tasks from all historical tasks as clustering centers;
performing first clustering processing on each historical task by using the preset clustering model according to the clustering centers, to determine one or more first time-consuming types;
judging whether the data volume proportion of each first time-consuming type meets the preset ratio requirement;
if so, determining that the first time-consuming type is a task time-consuming type;
if not, re-determining the clustering centers and performing the clustering processing again to re-determine the first time-consuming types, until the data volume proportion of each first time-consuming type meets the preset ratio requirement;
wherein the data volume proportion represents the ratio of the amount of data contained in a first time-consuming type to the total amount of time-consuming data.
In one possible design, the preset ratio requirement includes: the data volume proportion is greater than or equal to a first ratio threshold and less than or equal to a second ratio threshold.
Optionally, the first ratio threshold takes a value in a first range of 1% to 10%, and the second ratio threshold takes a value in a second range of 40% to 60%.
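The ratio requirement above can be checked with a few lines of code. The sketch below is illustrative only; the default thresholds of 5% and 50% are example values picked from within the stated ranges, and the function name is hypothetical.

```python
def meets_ratio_requirement(cluster_sizes, low=0.05, high=0.5):
    """Check that every cluster holds between `low` and `high` of all data.

    `low` and `high` stand for the first and second ratio thresholds; the
    defaults are example values inside the 1%-10% and 40%-60% ranges.
    """
    total = sum(cluster_sizes)
    if total == 0:
        return False
    return all(low <= size / total <= high for size in cluster_sizes)
```

For example, three types holding 30, 30, and 40 records pass the check, whereas types holding 2 and 98 records fail it on both thresholds.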
In one possible design, re-determining the clustering centers and performing the clustering processing again includes:
deleting each first time-consuming type whose data volume proportion is smaller than the first ratio threshold; and/or,
randomly extracting at least two historical tasks from each first time-consuming type whose data volume proportion is greater than the second ratio threshold, to serve as new clustering centers;
for each first time-consuming type that meets the preset ratio requirement, re-determining a new clustering center in a preset manner;
and performing clustering again by using the preset clustering model according to each new clustering center, to determine new first time-consuming types.
In one possible design, for a first time-consuming type that meets the preset ratio requirement, re-determining a new clustering center in a preset manner includes:
when the first time-consuming type meets the preset ratio requirement, taking the average time consumption of the first time-consuming type as the new clustering center.
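The re-seeding rules described above (drop undersized types, split oversized types, keep the mean of compliant types) can be sketched as a single step that produces the centers for the next clustering pass. This is a minimal one-dimensional illustration, not the application's implementation: the function name, the threshold defaults, and the choice of exactly two samples per oversized type are assumptions.

```python
import random
from statistics import mean


def next_centers(clusters, low=0.05, high=0.5, rng=random):
    """Derive new cluster centers from the current first time-consuming types.

    `clusters` is a list of lists of task durations. Thresholds are example
    values; the members of a dropped type are simply left to be reassigned
    in the next clustering pass (an assumption of this sketch).
    """
    total = sum(len(c) for c in clusters)
    centers = []
    for durations in clusters:
        share = len(durations) / total
        if share < low:
            continue  # undersized type: its center is deleted
        if share > high:
            # oversized type: draw two members as new centers to split it
            centers.extend(rng.sample(durations, min(2, len(durations))))
        else:
            centers.append(mean(durations))  # compliant type: keep its mean
    return centers
```

Repeating this step, each time followed by a clustering pass with the preset clustering model, until every type meets the ratio requirement gives the loop described in the preceding designs.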
In one possible design, determining at least one timing detection task according to a plurality of task time consumption types and a scheduling relationship map includes:
determining a first target type and a second target type from each task time consumption type according to a preset screening requirement;
determining a first fluctuation range and a second fluctuation range according to each time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm;
and determining the detection object and the detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range and the starting time of the execution of the historical tasks in the time-consuming data.
In one possible design, determining the first fluctuation range and the second fluctuation range according to the respective time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm includes:
determining a first fluctuation range according to first average consumed time of all consumed time data in a first target type and a first standard deviation;
and determining a second fluctuation range according to a second average consumed time and a second standard deviation of all consumed time data in the second target type.
In one possible design, determining a first fluctuation range according to a first average elapsed time and a first standard deviation of all elapsed time data in a first target type includes:
the first fluctuation range is equal to the sum of the first average elapsed time and N times of the first standard deviation;
determining a second fluctuation range according to a second average consumed time and a second standard deviation of all consumed time data in the second target type includes:
the second fluctuation range is equal to a difference between the second average elapsed time and M times the second standard deviation.
In one possible design, the detection time includes: a first detection time and a second detection time, wherein the first detection time is the start time plus the first fluctuation range, and the second detection time is the start time plus the second fluctuation range.
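As a worked illustration of the two fluctuation ranges and the resulting detection times, the sketch below takes durations in minutes for the first and second target types. The use of the population standard deviation, the defaults N = M = 2, and the sample values are assumptions made for the example only.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def detection_times(start, first_type, second_type, n=2, m=2):
    """Return (first_detection_time, second_detection_time).

    first range  = mean + n * std of the first target type (upper bound)
    second range = mean - m * std of the second target type (lower bound)
    Durations are in minutes; n, m and the use of pstdev are assumptions.
    """
    first_range = mean(first_type) + n * pstdev(first_type)
    second_range = mean(second_type) - m * pstdev(second_type)
    return (start + timedelta(minutes=first_range),
            start + timedelta(minutes=second_range))


start = datetime(2022, 6, 8, 1, 0)
first_check, second_check = detection_times(start, [58, 62], [28, 32])
```

With these sample values the first range is 60 + 2 × 2 = 64 minutes and the second range is 30 − 2 × 2 = 26 minutes, so the task is checked for early completion at 01:26 and for late completion at 02:04.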
In one possible design, judging in advance, according to the detection result of each timing detection task at each detection time point, whether the probability of a task scheduling exception in the target system meets the preset early warning requirement includes:
if the detection result shows that the detection object has not completed execution at the first detection time, determining that a first probability that the task execution progress is abnormal meets the early warning requirement;
and if the detection result shows that the detection object has already completed execution at the second detection time, determining that a second probability that the data magnitude of the task scheduled by the target system is abnormal meets the early warning requirement.
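The two checks above reduce to a small decision rule: a task still running at the first (late) detection time suggests a progress problem, and a task already finished at the second (early) detection time suggests an unusual data volume. The sketch below is a hypothetical rendering; the string labels and function name are not from the application.

```python
def evaluate_check(check_kind, is_finished):
    """Map one timing detection result to a warning type, or None.

    check_kind: "first" for the first detection time (upper bound),
                "second" for the second detection time (lower bound).
    """
    if check_kind == "first" and not is_finished:
        return "progress_abnormal"      # not done by the expected latest time
    if check_kind == "second" and is_finished:
        return "data_volume_abnormal"   # done earlier than the expected minimum
    return None
```

A task that is still running at the second detection time and finished by the first detection time yields `None` from both checks, so no warning is raised.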
In one possible design, determining and outputting one or more warning messages includes:
calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
if the association degree is in the first association interval, determining that the early warning information comprises first early warning information and second early warning information, and the early warning levels of the first early warning information and the second early warning information are the same, wherein the first early warning information is used for representing that a former task has scheduling abnormality and has association influence on the scheduling of the latter task, and the second early warning information is used for representing that the scheduling abnormality of the latter task is derived from the delay of the former task;
and outputting first early warning information to the previous task and outputting second early warning information to the next task.
In one possible design, determining and outputting one or more warning messages includes:
calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
if the association degree is in a second association interval, determining that the early warning information comprises first early warning information and second early warning information, wherein a first early warning level of the first early warning information is greater than a second early warning level of the second early warning information, the first early warning information is used for representing that a former task has scheduling abnormality and has association influence on scheduling of a latter task, and the second early warning information is used for representing that the scheduling abnormality of the latter task is originated from delay of the former task;
and outputting first early warning information to the previous task and outputting second early warning information to the next task.
In one possible design, determining and outputting one or more warning messages includes:
calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
and if the association degree is in the third association interval, outputting early warning information to the previous task, wherein the early warning information is used for representing that the previous task has scheduling abnormity.
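The three association intervals can be pictured as a routing rule for warnings. In the sketch below the interval boundaries (0.7 and 0.3), the numeric levels, and the message wording are all assumptions; the application states only that three intervals exist and how the warnings differ between them.

```python
def route_warnings(upstream, downstream, association, level=2):
    """Return (task, level, message) tuples for one upstream/downstream pair.

    Interval bounds are hypothetical: >= 0.7 is treated as the first
    association interval, 0.3 to 0.7 as the second, below 0.3 as the third.
    """
    cause = f"{upstream} has a scheduling exception that may affect {downstream}"
    effect = f"{downstream} is delayed because {upstream} is delayed"
    if association >= 0.7:
        # first interval: both tasks are warned at the same level
        return [(upstream, level, cause), (downstream, level, effect)]
    if association >= 0.3:
        # second interval: the upstream warning outranks the downstream one
        return [(upstream, level, cause), (downstream, level - 1, effect)]
    # third interval: only the upstream task is warned
    return [(upstream, level, f"{upstream} has a scheduling exception")]
```

Calling `route_warnings("clean", "report", 0.5)` under these assumptions returns two warnings, with the one addressed to `clean` at a higher level than the one addressed to `report`.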
In one possible design, the warning information includes: a weight feedback link;
after determining and outputting one or more pieces of early warning information, the method further comprises the following steps:
receiving adjustment information input by a user through the weight feedback link;
and adjusting the early warning weight of the detection object corresponding to the timing detection task according to the adjustment information.
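One simple way to apply the feedback is to treat the adjustment information as a signed delta on a per-task weight. This is only a sketch under that assumption; the application does not define the form of the adjustment information, and the names below are hypothetical.

```python
def apply_feedback(weights, task, adjustment, floor=0.0):
    """Adjust the early warning weight of `task` by a signed delta.

    `weights` maps task names to weights; unknown tasks start at 1.0.
    The additive update and the floor are assumptions of this sketch.
    """
    weights[task] = max(floor, weights.get(task, 1.0) + adjustment)
    return weights[task]


warning_weights = {"report": 1.0}
raised = apply_feedback(warning_weights, "report", 0.5)
```

A positive adjustment makes later warnings for the task more likely to cross the warning threshold, and a negative one mutes a task that operation and maintenance staff consider less important.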
In one possible design, the method further includes: when the scheduling abnormality of the detection object is detected at the first detection time, determining third detection time of the detection object according to preset delay time, wherein the detection object is a current execution task;
when the scheduling exception still exists in the current execution task at the third detection time, determining a first early warning level of the current execution task according to a first preset early warning weight and early warning triggering times of the current execution task;
judging whether the first early warning level meets a preset early warning condition or not;
and if so, sending the early warning information to the current execution task again.
In one possible design, when it is detected that there is still a scheduling exception for the currently executed task at the third detection time, the method further includes:
determining the association degree of the current execution task and the next task according to the scheduling relation map by using a preset association model;
determining a second early warning level of the next task according to the second early warning weight, the association degree and the early warning triggering times of the next task;
judging whether the second early warning level meets a preset early warning condition or not;
and if so, sending early warning information to the next task, wherein the early warning information includes a prompt that the scheduling exception of the next task originates from the scheduling delay of the currently executed task.
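The repeat-check and escalation logic in the last two designs can be sketched as follows. The application does not give the formula that combines weight, association degree, and trigger count, so the products used here, the 10-minute delay, and the threshold of 1.5 are purely illustrative assumptions.

```python
from datetime import datetime, timedelta


def third_detection_time(first_detection, delay_minutes=10):
    """Schedule the follow-up check a preset delay after the first one."""
    return first_detection + timedelta(minutes=delay_minutes)


def current_task_level(weight, trigger_count):
    """Assumed formula: level grows with the weight and with each trigger."""
    return weight * trigger_count


def next_task_level(weight, association, trigger_count):
    """Assumed formula: the downstream level is scaled by the association."""
    return weight * association * trigger_count


def should_warn(level, threshold=1.5):
    """Preset early warning condition, modelled here as a simple threshold."""
    return level >= threshold
```

Under these assumptions, a task with weight 1.0 that is still abnormal on its second trigger reaches level 2.0 and is warned again, while a downstream task with association degree 0.5 reaches only 1.0 and is not yet notified.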
In a second aspect, the present application provides a task scheduling exception monitoring apparatus, including:
an acquisition module, configured to acquire historical task data of one or more historical task cycles according to the work plan of the current task cycle, wherein the similarity between the historical work plan of each historical task cycle and the work plan of the current task cycle meets a preset requirement, and the historical task data includes: configuration data of each historical task and time-consuming data for executing each historical task;
a processing module, configured to:
cluster each historical task cycle by using a preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined, wherein the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data meets a preset ratio requirement;
determine a scheduling relation map according to the configuration data, and determine at least one timing detection task according to the plurality of task time-consuming types and the scheduling relation map, wherein the scheduling relation map represents the dependency relationships by which the historical tasks call one another's processing results;
judge in advance, according to the detection result of each timing detection task at each detection time point, whether the probability of a task scheduling exception in the target system meets a preset early warning requirement; and if so, determine one or more pieces of early warning information;
and an output module, configured to output the early warning information to the detection object of the timing detection task.
In a third aspect, the present application provides an electronic device comprising:
a memory for storing program instructions;
and the processor is used for calling and executing the program instructions in the memory to execute any one of the possible methods provided by the first aspect.
In a fourth aspect, the present application provides a storage medium, where a computer program is stored in the storage medium, where the computer program is configured to execute any one of the possible task scheduling exception monitoring methods provided in the first aspect.
In a fifth aspect, the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements any one of the possible task scheduling exception monitoring methods provided in the first aspect.
The application provides a method, an apparatus, a medium, and a program product for monitoring task scheduling exceptions. Historical task data of one or more historical task cycles is acquired according to the work plan of the current task cycle, wherein the similarity between the historical work plan of each historical task cycle and the work plan of the current task cycle meets a preset requirement, and the historical task data includes configuration data of each historical task and time-consuming data for executing each historical task. Each historical task cycle is clustered by using a preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined, wherein the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data meets a preset ratio requirement. A scheduling relation map is determined according to the configuration data, and at least one timing detection task is determined according to the plurality of task time-consuming types and the scheduling relation map, wherein the scheduling relation map represents the dependency relationships by which the historical tasks call one another's processing results. According to the detection result of each timing detection task at each detection time point, it is judged in advance whether the probability of a task scheduling exception in the target system meets a preset early warning requirement; if so, one or more pieces of early warning information are determined and output. This addresses the technical problems of existing exception monitoring: poor response timeliness, inflexible configuration, monitoring only at the logical level, and weak coupling with actual services.
Issuing warnings in advance ensures response timeliness and leaves sufficient time for handling problems, and pushing early warning information according to the dependency associations between tasks makes it possible to locate problems quickly and coordinate resources.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic diagram of an application scenario of a task scheduling exception monitoring method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a task scheduling exception monitoring method according to the present application;
Fig. 3 is a schematic flowchart of cyclically determining a plurality of task time-consuming types in step S202 of the embodiment shown in Fig. 2;
Fig. 4 is a schematic flowchart of determining at least one timing detection task in step S203 of the embodiment shown in Fig. 2 according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of another task scheduling exception monitoring method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a task scheduling exception monitoring apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device provided in the present application.
The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concept in any manner, but to illustrate the inventive concept to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, including but not limited to combinations of embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any inventive step are within the scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The following explanations are made for terms to which this application refers:
MQ (Message Queue): a "first-in, first-out" data structure among the basic data structures. Data to be transmitted (also called messages) is placed in a queue, and the queue mechanism is used to pass messages: a producer generates a message and puts it in the queue, and a consumer then processes it. The consumer can pull messages from a designated queue, or subscribe to a queue and have the MQ server push messages to it.
Acquisition period: the number of days over which the data that the early warning model needs for analysis and comparison is collected; it can be adjusted according to the data scale.
Datacheck: data checking, in which a scheduling task checks the integrity of the data it depends on before processing data.
Job Server: the job server, i.e. the server that receives and executes the specific work content of a scheduling task.
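To make the MQ term concrete, the following sketch shows the producer/consumer pattern in pull mode using an in-process queue from the Python standard library. It only illustrates the first-in, first-out behaviour; a real MQ server, and the push (subscribe) mode, are outside the scope of this example, and the message strings are made up.

```python
import queue


def produce(q, messages):
    """Producer: put each message on the queue in order."""
    for message in messages:
        q.put(message)


def consume_all(q):
    """Consumer (pull mode): take messages until the queue is empty."""
    received = []
    while not q.empty():
        received.append(q.get())
    return received


mq = queue.Queue()  # FIFO queue
produce(mq, ["task_a finished", "task_b started"])
received = consume_all(mq)
```

The consumer receives the messages in the order in which the producer sent them, which is what allows message send and receive times to be used as timing evidence for task execution.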
The offline data that the financial and internet industries must process every day is large in scale and subject to strict timeliness requirements. Financial enterprises in particular process large volumes of regulatory reporting data; if a data processing task is not completed, the relevant data cannot be delivered on time, and the enterprise may even be held accountable by regulators, which affects its rating and reputation. Monitoring of, and exception response for, the task scheduling system is therefore particularly important.
Open-source task scheduling frameworks and tools such as Azkaban, Airflow, and Oozie are now mature. However, most monitoring functions configured or developed with these frameworks or tools rely on abnormal task states or on fixed parameters configured from experience, so by the time a monitoring alarm is raised the task is usually already abnormal or has already had a real impact. Response timeliness is poor, configuration is inflexible, and the difficulty and workload of operation and maintenance increase; the monitoring works only at the logical level and is weakly coupled with actual services.
To address these situations, existing solutions deploy an engine in the big data cluster and collect the cluster's health status before and after the scheduling time of each task in order to issue early warnings. This approach typically consumes additional cluster resources; when the data volume of the scheduled tasks is large, data has to be collected almost around the clock, which increases the burden on the cluster and the likelihood of exceptions. In addition, the commonly used early warning algorithm applies a normal-distribution prediction to the whole sample without any differentiation; it needs a large amount of historical data, increases the computation time of the early warning system, and still yields insufficiently accurate predictions.
It should be noted that a large amount of historical data is required because the normal distribution is fitted to all historical cycles as a whole. Moreover, the tasks executed in each historical cycle, in several consecutive historical cycles, or within a given span of time have stage-specific characteristics, so using the whole sample blurs these differences.
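The loss of discrimination can be seen in a small numerical example. The durations below are invented for illustration, and the "mean plus two standard deviations" limit is an assumed warning rule rather than one prescribed by the existing solutions.

```python
from statistics import mean, pstdev

# Invented durations (minutes) for two kinds of tasks in the same history.
fast = [10, 11, 9, 10]
slow = [60, 61, 59, 60]
whole = fast + slow


def upper_limit(durations, k=2):
    """Assumed warning limit: mean + k standard deviations."""
    return mean(durations) + k * pstdev(durations)


whole_limit = upper_limit(whole)  # limit fitted to the whole sample
fast_limit = upper_limit(fast)    # limit fitted to the fast tasks only
```

A normally 10-minute task that takes 40 minutes stays well under the whole-sample limit of roughly 85 minutes and goes unnoticed, but it clearly exceeds the limit of about 11.4 minutes computed from the fast tasks alone.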
In summary, the conventional anomaly monitoring has the following technical problems:
(1) existing monitoring schemes have poor response timeliness, inflexible configuration, a heavy operation and maintenance workload, and weak coupling with actual services;
(2) the large amount of data collection required for early warning places an additional burden on the big data cluster;
(3) existing algorithms are highly complex, discriminate poorly between data, are costly to compute, and give inaccurate predictions.
In seeking to improve the existing anomaly monitoring methods, the inventors of the present application found through analysis that the following technical obstacles stand in the way:
(1) Existing monitoring schemes are mostly based on abnormal resource scheduling in the big data cluster and on task states. When a large amount of service data is cross-scheduled, an abnormal state of a single task is perceived slowly by downstream tasks, and its influence on them cannot be analyzed accurately.
(2) Real-time data analysis and monitoring requires frequent interaction with the system, which occupies system resources and increases the computing pressure on the system.
(3) Existing algorithms are applied without regard to the actual scenario, which causes extra resource consumption and increases data errors.
In order to solve the above problems, the inventive concept of the present application is:
(1) without changing the underlying logic of the task scheduling system, the blood relationship (namely, the interdependency during scheduling), the expected normal completion time interval of the tasks, and the correlation (namely, the degree of association) between blood-related tasks (namely, tasks that have a scheduling-order relationship) are analyzed according to the task configuration information, the time consumed by task execution, and the MQ sending and receiving times; early warnings are then pushed separately to the upstream and downstream tasks (namely, two tasks adjacent in execution order), which improves the efficiency of responding to anomalies and the timeliness of handling them;
(2) historical sample data within a period is analyzed and the early warning detection time is adjusted dynamically, which reduces manual configuration by operation and maintenance personnel and makes early warning configuration more flexible;
(3) an early warning level weight module is added, together with an early warning feedback interface that receives the service attention fed back by operation and maintenance personnel; dynamic early warning pushes are generated in combination with the early warning detection results, which couples the early warning to actual service requirements and prevents the scope of an anomaly's impact from escalating;
(4) classifying the historical data reduces the complexity of the early warning algorithm, lowers the early warning cost, and improves the early warning accuracy;
(5) the existing task configuration information and historical information can be provided directly by the scheduling platform, decoupled from big data cluster resources, which avoids increasing the cluster load.
Fig. 1 is a schematic view of an application scenario of the task scheduling exception monitoring method provided in the present application. As shown in fig. 1, an anomaly monitoring system 200 is provided independently of the task scheduling system 100. By executing the task scheduling exception monitoring method provided by the present application, the anomaly monitoring system 200 determines a plurality of timing detection tasks and re-determines their detection times in every task cycle. Rather than raising an alarm after task scheduling has already become abnormal, the anomaly monitoring system 200 monitors the execution progress of each task; when the execution progress indicates in advance that the probability of an anomaly exceeds the early warning requirement, it sends the corresponding early warning information, before the anomaly occurs, to the two tasks that are adjacent in the required execution order.
The task scheduling exception monitoring method provided by the present application is described in detail below.
Fig. 2 is a flowchart illustrating a method for monitoring task scheduling exceptions according to an embodiment of the present application. As shown in fig. 2, the specific steps of the task scheduling exception monitoring method include:
S201, acquiring historical task data of one or more historical task periods according to the work plan of the current task period.
In this step, the similarity between the historical work plan of the historical task period and the work plan of the current task period meets the preset requirement, and the historical task data includes: configuration data for each historical task and time-consuming data for performing each historical task.
Unlike the prior art, which fits a normal distribution to the whole sample of all historical task periods, the anomaly monitoring method of this embodiment compares the work plan of each historical task period with the work plan of the current task period. When, or before, the current task period starts, it obtains the historical task data of the one or more historical task periods whose work plans meet the preset similarity requirement. The execution progress of each task can therefore be supervised in a more targeted and flexible way: the time points of the timed supervision vary rather than being fixed, and the early warning intervals are delineated more finely and flexibly.
Because the work plan of a financial enterprise has staged characteristics, that is, it is stable over a span of time such as several task periods, the historical task periods include: the task period of the previous year whose position within the year is the same as or similar to that of the current task period, the task period immediately preceding the current one, or several consecutive task periods closest to the current one.
S202, iteratively clustering the historical tasks with a preset clustering model according to the time-consuming data in the historical task data, until a plurality of task time-consuming types are determined.
In this step, the ratio of the data amount contained in each task time consumption type to the total amount of the time consumption data satisfies the preset proportion requirement.
Specifically, according to the requirements of the preset clustering model, a preset number of clustering centers are extracted from the time-consuming data of the historical tasks. Note that different preset clustering models may correspond to different numbers of initial clustering centers. The time-consuming data of all historical tasks are then clustered with the preset clustering model to obtain a first clustering result, namely at least one task time-consuming type obtained for the first time. Next, it is determined whether the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data meets the preset ratio requirement. If so, the next step S203 is performed; otherwise, the clustering centers are reset according to the requirements of the preset clustering model, clustering is performed again, and the ratio check is repeated. This loop continues until the ratio of the amount of data contained in every task time-consuming type to the total amount of time-consuming data meets the preset ratio requirement.
Notably, resetting the clustering centers involves two aspects: the number of clustering centers, and which time-consuming data are substituted in as clustering centers. Optionally, the number of clustering centers may be changed (i.e., increased or decreased) or kept unchanged; those skilled in the art may set it according to the needs of the actual application scenario.
It should be noted that, in this embodiment, the preset clustering models that perform clustering each time may be the same or different, that is, when performing cyclic clustering for multiple times, the same preset clustering model may be used, different preset clustering models may be used each time, or one preset clustering model may perform clustering for a preset number of times.
S203, determining a scheduling relation map according to the configuration data, and determining at least one timing detection task according to the time consumption types of the multiple tasks and the scheduling relation map.
In this step, the scheduling relationship map is used to represent the dependency relationship of the mutual calling processing results between the historical tasks, or the execution sequence between the historical tasks.
Specifically, according to the task configuration information, the upstream and downstream call relationships of the historical tasks, including datacheck and MQ interactions, are split out, and the blood relationship map of task scheduling, namely the scheduling relation map, is then generated.
Determining at least one timing detection task according to the plurality of task time-consuming types and the scheduling relation map includes the following steps:
determining a first target type and a second target type from the task time-consuming types according to preset screening requirements, wherein the first target type comprises a type with longer time consumption and the second target type comprises a type with shorter time consumption;
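As an illustration of how a scheduling relation map might be derived from task configuration, the following minimal sketch builds an upstream-to-downstream adjacency map. The configuration fields `name` and `depends_on` and both function names are hypothetical; the source does not specify the configuration format, and datacheck or MQ interactions would simply contribute further dependency entries.

```python
def build_schedule_graph(task_configs):
    """Return {upstream task: sorted list of direct downstream tasks}.

    Assumption: each config is a dict with a hypothetical "name" field and an
    optional "depends_on" list naming its direct upstream tasks.
    """
    graph = {}
    for cfg in task_configs:
        graph.setdefault(cfg["name"], set())      # every task becomes a node
        for upstream in cfg.get("depends_on", []):
            graph.setdefault(upstream, set()).add(cfg["name"])
    return {task: sorted(children) for task, children in graph.items()}


def adjacent_pairs(graph):
    """List (previous task, next task) pairs that are one layer apart."""
    return [(up, down) for up in sorted(graph) for down in graph[up]]


configs = [
    {"name": "load_orders", "depends_on": []},
    {"name": "clean_orders", "depends_on": ["load_orders"]},
    {"name": "daily_report", "depends_on": ["clean_orders", "load_orders"]},
]
graph = build_schedule_graph(configs)
pairs = adjacent_pairs(graph)
```

Each pair returned by `adjacent_pairs` is a candidate "previous task / next task" pair for the association calculation described later.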
determining a first fluctuation range and a second fluctuation range according to each time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm, wherein the first fluctuation range and the second fluctuation range can be determined according to normal distribution graphs corresponding to the first target type and the second target type;
and determining the detection object and the detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range and the starting time of the execution of the historical tasks in the time-consuming data.
It should be noted that there is no strict ordering requirement between step S202 and the "determination of the scheduling relation map" in step S203: the two may be executed simultaneously, or either one may be executed first.
S204, pre-judging whether the probability of the task scheduling abnormity of the target system meets the preset early warning requirement or not according to the detection result of each timing detection task at each detection time point.
In this step, if yes, S205 is executed; if no, it is determined that no anomaly has been detected, and the next timing detection task is awaited for detection and analysis.
And S205, determining and outputting one or more pieces of early warning information.
At least three possible embodiments are included in this step.
The first possible implementation is as follows:
firstly, calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
then, if the association degree is in a first association interval, determining that the early warning information comprises first early warning information and second early warning information, wherein the early warning levels of the first early warning information and the second early warning information are the same, the first early warning information is used for representing that a former task has scheduling abnormality and has association influence on the scheduling of a latter task, and the second early warning information is used for representing that the scheduling abnormality of the latter task is originated from the delay of the former task;
and finally, outputting first early warning information to the previous task and outputting second early warning information to the next task.
The second possible implementation is as follows:
firstly, calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
then, if the association degree is in a second association interval, determining that the early warning information comprises first early warning information and second early warning information, wherein a first early warning level of the first early warning information is greater than a second early warning level of the second early warning information, the first early warning information is used for representing that a former task has scheduling abnormality and has association influence on scheduling of a latter task, and the second early warning information is used for representing that the scheduling abnormality of the latter task is caused by delay of the former task;
and finally, outputting first early warning information to the previous task and outputting second early warning information to the next task.
The third possible implementation mode is as follows:
firstly, calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
and then, if the association degree is in a third association interval, outputting early warning information to the previous task, wherein the early warning information is used for representing that the previous task has scheduling abnormity.
In the three embodiments above, the association degree between a previous task and a next task in the scheduling relation map is calculated according to a preset association model. The preset association model may be selected according to the actual situation; for example, in one embodiment it may be expressed by the following formula:
$$r = \frac{\operatorname{cov}(X, Y)}{S_x S_y}$$
where r represents the degree of association between a previous task x (also referred to as an upstream task) and a subsequent task y (also referred to as a downstream task), $S_x$ is the standard deviation of the sample data (i.e., the time-consuming data in the historical data) of task x in the historical task periods, $S_y$ is the standard deviation of the sample data of task y in the historical task periods, and cov(X, Y) is the covariance of the sample data of task x and task y over the acquisition period.
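The association degree described here has the form of a Pearson correlation coefficient. The sketch below computes it from paired time-consuming samples using only the standard library. The function name `association_degree` and the sample values are illustrative, and the use of the sample (n - 1) estimators for both covariance and standard deviation is an assumption, since the source does not say which estimator is used.

```python
from statistics import mean, stdev


def association_degree(x, y):
    """Pearson correlation r = cov(X, Y) / (S_x * S_y) of paired durations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_x, mean_y = mean(x), mean(y)
    # Sample covariance with the n - 1 denominator, matching statistics.stdev.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))  # undefined if either series is constant


# Illustrative durations in minutes: the downstream task tracks the upstream one.
upstream = [30, 32, 35, 40, 31]
downstream = [50, 54, 60, 70, 52]
r = association_degree(upstream, downstream)
```

Because the downstream durations in this example are an exact linear function of the upstream ones, r comes out as 1 up to floating-point rounding.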
It should be noted that the sample data in this embodiment are time-consuming data, and the historical data acquired are offline data. Because offline data are large in volume and are mostly processed with map/reduce, the most intuitive indicator, apart from fluctuations in data volume, is the time consumed by task execution. Such data are always recorded by the scheduling system and can be obtained directly, which avoids consuming cluster resources on an additionally deployed acquisition module. The time-consuming data here include, in addition to the data processing time, the time spent waiting for upstream data (i.e., the processing result of the task immediately preceding the current task). In this embodiment, every task (both the tasks in the current task period and the historical tasks in the historical task periods) is based on a blood relationship (that is, a dependency between processing results or between execution orders), so the tasks are necessarily related. On this basis, the correlation of time-consumption fluctuations is reflected mainly in the dependency of tasks waiting for upstream data, that is, in delay early warning.
Specifically, for the three embodiments above, when the association degree is in the first association interval, for example 0.6 < r < 1, the two tasks are considered strongly associated. The task level weight configuration is read and, taking a default level of 1 for the first early warning as an example, first early warning information concerning task x is generated, with content about the influence on task y added to it; at the same time, second early warning information is generated at the same level, namely a delay early warning for task y attributed to task x.
When the association degree is in the second association interval, for example 0.3 < r ≤ 0.6, the two tasks are considered moderately associated. The first early warning information is generated in the manner described above, and a secondary early warning, namely the second early warning information, is generated at the same time.
When the association degree is in the third association interval, for example 0 < r ≤ 0.3, the two tasks are considered weakly associated. The first early warning information is generated in the manner described above and no early warning is issued for task y; if an anomaly is found when task y is detected, the relevant influence content is generated and added to the first early warning information.
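The three association intervals can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the dictionary layout, the field names, and the function name `plan_warnings` are hypothetical; level 1 is taken as the most severe level and "secondary" as level 2; and values of r outside the intervals given in the text (r ≥ 1 or r ≤ 0) are folded into the strong and weak branches respectively, which the source does not specify.

```python
def plan_warnings(r, upstream, downstream, base_level=1):
    """Return the early warning messages to push for one upstream/downstream pair."""
    first = {"task": upstream, "level": base_level, "kind": "scheduling_abnormal"}
    if r > 0.6:
        # Strong association: both tasks are warned at the same level.
        first["affects"] = downstream
        second = {"task": downstream, "level": base_level,
                  "kind": "delayed_by_upstream", "cause": upstream}
        return [first, second]
    if r > 0.3:
        # Medium association: the downstream task gets a secondary warning.
        first["affects"] = downstream
        second = {"task": downstream, "level": base_level + 1,
                  "kind": "delayed_by_upstream", "cause": upstream}
        return [first, second]
    # Weak association: only the upstream task is warned.
    return [first]


strong = plan_warnings(0.8, "task_x", "task_y")
medium = plan_warnings(0.5, "task_x", "task_y")
weak = plan_warnings(0.2, "task_x", "task_y")
```

In the weak case, influence content for the downstream task would be appended to the first warning only if a later detection of that task finds an anomaly; that follow-up step is omitted here.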
It should also be noted that the upstream and downstream relationships are based on blood relationship analysis. For two tasks, or for a task that interacts with upstream and downstream systems, one possible design considers only a separation of one layer, because the early warning analysis here covers the full set of tasks. The nodes within a single task may be several layers apart, but within the time consumption of the whole task, a delay of an upper-layer node and a delay of a lower-layer node count the same for the correlation calculation. Even for a lower-layer check node, the time consumption is measured from the scheduling of the whole task, so the longer the waiting time, the closer the fluctuation curve of the node's time consumption is to that of the whole task, that is, the higher the probability that upstream and downstream are affected at the same time.
The embodiment of the present application provides a method for monitoring task scheduling anomalies. Historical task data of one or more historical task periods are obtained according to the work plan of the current task period, where the similarity between the historical work plan of each historical task period and the work plan of the current task period meets the preset requirement, and the historical task data include configuration data of each historical task and time-consuming data for executing each historical task. The historical tasks are iteratively clustered with a preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined, such that the ratio of the amount of data contained in each task time-consuming type to the total amount of time-consuming data meets the preset ratio requirement. A scheduling relation map is determined according to the configuration data, and at least one timing detection task is determined according to the task time-consuming types and the scheduling relation map, where the scheduling relation map represents the dependency relationships by which the historical tasks call one another's processing results. According to the detection result of each timing detection task at each detection time point, it is pre-judged whether the probability of a task scheduling anomaly in the target system meets the preset early warning requirement; if so, one or more pieces of early warning information are determined and output. This solves the technical problems that existing anomaly monitoring has poor response timeliness and inflexible configuration, works only at a logical level, and is weakly coupled with actual services.
Early warning ensures timely response and leaves sufficient time for handling problems; pushing early warning information according to the dependency associations between tasks makes it possible to locate problems quickly and coordinate resources.
To facilitate an understanding of several possible embodiments corresponding to S202, the following description is provided in detail.
Fig. 3 is a flowchart illustrating a process of circularly determining time consumption types of multiple tasks in step S202 in the embodiment shown in fig. 2 according to an implementation of the present application. As shown in fig. 3, the specific steps include:
S301, randomly extracting the time-consuming data of a plurality of historical tasks from all historical tasks to serve as clustering centers.
S302, performing first clustering processing on each historical task according to a clustering center by using a preset clustering model to determine one or more first time-consuming types.
In this embodiment, the clustering process of the preset clustering model can be represented by formula (1):
Figure BDA0003686065710000101
where C represents a first time-consuming type, k is the number of initial clustering centers, $C_i$ is the time-consuming data of the historical task corresponding to each of the k clustering centers, namely the execution time of the initial clustering center, and $x_j$ is the execution time of each sample in the one or more historical periods, namely the time consumed by each historical task.
It is worth noting that formula (1) takes the characteristics of task scheduling into account and reduces the dimensionality of conventional k-means, which lowers the algorithm complexity while preserving data accuracy, reduces the early warning deployment burden, and improves operating efficiency.
S303, judging whether the data volume ratio in each first time consumption type meets the preset ratio requirement.
In this step, the data volume ratio is the ratio of the amount of data contained in a first time-consuming type to the total amount of time-consuming data. If yes, step S304 is executed; if no, step S305 is executed.
In this embodiment, the preset ratio requirement includes: the data volume ratio is greater than or equal to a first ratio threshold and less than or equal to a second ratio threshold. Optionally, the value range of the first ratio threshold is 1% to 10%, and the value range of the second ratio threshold is 40% to 60%. Preferably, the first ratio threshold is 5% and the second ratio threshold is 50%.
S304, determining that the first time consumption type is a task time consumption type.
S305, re-determining the clustering center, and re-performing clustering processing to re-determine the first time-consuming type until the data volume ratio corresponding to each first time-consuming type meets the preset ratio requirement.
In this embodiment, the method specifically includes:
S3051, deleting each first time-consuming type whose data volume ratio is smaller than the first ratio threshold; and/or randomly extracting at least two historical tasks as new clustering centers from each first time-consuming type whose data volume ratio is larger than the second ratio threshold.
Specifically, for example, a class whose number of samples is less than 5% of the total number of samples is deleted, that is, a first time-consuming type whose data volume ratio is less than 5% is deleted; and 2 samples are randomly drawn as new clustering centers from each class that holds more than 50% of the total number of samples.
S3052, for the first time-consuming type meeting the preset proportion requirement, re-determining a new clustering center according to a preset mode.
In one possible implementation, when a first time-consuming type meets the preset ratio requirement, the average time consumption of that first time-consuming type is used as its new clustering center.
Specifically, the time-consuming data corresponding to the new cluster center can be represented by formula (2):
$$a_i = \frac{1}{|C_i|} \sum_{x \in C_i} x \tag{2}$$
where $C_i$ is a classified set, i.e., the set corresponding to a first time-consuming type, $|C_i|$ is the number of samples in the set, x is the time-consuming data of each sample in the set, and $a_i$ is the time-consuming data corresponding to the new clustering center.
After the step is executed, the process returns to step S302, that is, the clustering process is performed again according to each new clustering center by using the preset clustering model to determine new first time-consuming types until the data volume proportion corresponding to each first time-consuming type meets the preset proportion requirement.
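The loop of steps S301 to S305 can be sketched as follows. This is a simplified illustration, not the claimed implementation. It assumes that samples are assigned to the nearest center by absolute difference in duration, that "deleting" an undersized type means dropping its center so that its samples are reassigned in the next round, and that a `max_rounds` cap (not in the source) guards against data for which the ratio requirement can never be met. All function and parameter names are hypothetical.

```python
import random
from statistics import mean


def assign(samples, centers):
    """Put each duration into the cluster of its nearest center (1-D distance)."""
    clusters = [[] for _ in centers]
    for x in samples:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    return clusters


def cluster_durations(samples, k=3, low=0.05, high=0.50, max_rounds=100, seed=0):
    """Return (clusters, ok); ok tells whether every cluster met the ratio rule."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)                         # S301: random centers
    clusters, ok = [], False
    for _ in range(max_rounds):
        clusters = [c for c in assign(samples, centers) if c]    # S302: cluster
        ratios = [len(c) / len(samples) for c in clusters]
        ok = all(low <= ratio <= high for ratio in ratios)   # S303: ratio check
        if ok:
            break                                            # S304: accept types
        centers = []                                         # S305: reset centers
        for members, ratio in zip(clusters, ratios):
            if ratio < low:
                continue                                     # S3051: drop small type
            if ratio > high:                                 # S3051: split large type
                centers += rng.sample(members, min(2, len(members)))
            else:
                centers.append(mean(members))                # S3052: formula (2)
        if not centers:                                      # safety net (assumption)
            centers = rng.sample(samples, k)
    return clusters, ok


durations = [10, 11, 12, 13, 30, 31, 32, 33, 60, 61, 62, 63]
clusters, ok = cluster_durations(durations)
```

Because the initial centers and the split points are drawn at random, the resulting clusters depend on the seed; callers should check `ok` before treating the clusters as task time-consuming types.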
The method for circularly determining the time consumption types of the multiple tasks provided by the embodiment combines the characteristics of task scheduling, and reduces the dimension of the conventional kmeans, thereby reducing the complexity of the algorithm, ensuring the accuracy of the data, reducing the early warning deployment burden and improving the operation efficiency. By classifying the historical data, the complexity of an early warning algorithm is reduced, the early warning cost is reduced, and the early warning accuracy is improved; by using the existing task configuration information and historical information, the scheduling platform can directly provide all data for constructing the scheduling relation map and classification sample data of the task time consumption type, and is decoupled with big data cluster resources, so that the increase of cluster burden is avoided.
In order to facilitate understanding of possible implementation manners of "determining at least one timing detection task according to multiple task time consumption types and a scheduling relationship map" in step S203 in the embodiment shown in fig. 2, a description is given below by using a specific embodiment.
Fig. 4 is a schematic flowchart of a process of determining at least one timing detection task in step S203 in the embodiment shown in fig. 2 according to an embodiment of the present application. As shown in fig. 4, the specific steps include:
S401, determining a first target type and a second target type from the task time-consuming types according to preset screening requirements.
In this embodiment, among the task time-consuming types, the type whose clustering center has the largest time-consuming value is selected as the first target type, and the type whose clustering center has the smallest time-consuming value is selected as the second target type.
S402, determining a first fluctuation range and a second fluctuation range according to each time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm.
In this step, it specifically includes:
S4021, determining the first fluctuation range according to the first average time consumption and the first standard deviation of all time-consuming data in the first target type.
In one possible design, the first fluctuation range $B_1$ is equal to the sum of the first average time consumption and N times the first standard deviation, as shown in formula (3):
$$B_1 = \bar{x}_1 + N \cdot S_1 \tag{3}$$
where $\bar{x}_1$ represents the first average time consumption and $S_1$ represents the first standard deviation.
Preferably, if the sample fluctuation interval as a whole is normally distributed, about 99.7% of the samples fall within three standard deviations of the mean, so the value of N may be set to 3. It is understood that those skilled in the art may set the value of N according to the distribution that the sample fluctuation interval follows, which is not limited herein.
S4022, determining the second fluctuation range according to the second average time consumption and the second standard deviation of all time-consuming data in the second target type.
In one possible design, the second fluctuation range $B_2$ is equal to the second average time consumption minus M times the second standard deviation, as shown in formula (4):
$$B_2 = \bar{x}_2 - M \cdot S_2 \tag{4}$$
where $\bar{x}_2$ represents the second average time consumption and $S_2$ represents the second standard deviation.
Preferably, if the sample fluctuation interval as a whole is normally distributed, about 99.7% of the samples fall within three standard deviations of the mean, so the value of M may be set to 3. It is understood that those skilled in the art may set the value of M according to the distribution that the sample fluctuation interval follows, which is not limited herein.
In the step, the task time consumption fluctuation intervals are respectively calculated by selecting the clusters with the maximum time consumption and the minimum time consumption, so that the calculation amount is reduced compared with a normal distribution algorithm of the whole sample while the extra error caused by the difference among sample categories is reduced, and the method is more suitable for a large data cluster environment with a large number of scheduling tasks.
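A minimal sketch of steps S401 and S402, under the assumptions that each cluster's mean stands in for its clustering center, that the sample standard deviation (n - 1 denominator) is used, and that every cluster holds at least two samples. The function name and the example durations are illustrative only.

```python
from statistics import mean, stdev


def fluctuation_ranges(clusters, n=3, m=3):
    """Return (B1, B2) from the longest-running and shortest-running clusters."""
    longest = max(clusters, key=mean)    # first target type (S401)
    shortest = min(clusters, key=mean)   # second target type (S401)
    b1 = mean(longest) + n * stdev(longest)     # upper bound on normal duration
    b2 = mean(shortest) - m * stdev(shortest)   # lower bound on normal duration
    return b1, b2


# Illustrative durations in minutes for two task time-consuming types.
b1, b2 = fluctuation_ranges([[10, 12, 14], [58, 60, 62]])
```

Here both clusters have a standard deviation of 2, so B1 = 60 + 3 × 2 = 66 and B2 = 12 − 3 × 2 = 6.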
S403, determining the detection object and the detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range, and the start time of execution of the historical tasks in the time-consuming data.
In one possible design, the detection time includes: a first detection time and a second detection time, the first detection time comprising: superimposing the first fluctuation range on the basis of the start time, the second detection time comprising: the second fluctuation range is superimposed on the basis of the start time.
Specifically, based on the result obtained in S402, the start execution time T of the current scheduling task is read and a timing detection job (timing detection task) is generated, with the detection times $T_1$ and $T_2$ as shown in formula (5):
$$T_1 = T + B_1, \qquad T_2 = T + B_2 \tag{5}$$
where $B_1$ denotes the first fluctuation range and $B_2$ denotes the second fluctuation range.
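Superimposing the fluctuation ranges on the start time can be illustrated with the standard `datetime` module. The unit of the fluctuation ranges (minutes), the function name, and the example values are assumptions made for the sketch.

```python
from datetime import datetime, timedelta


def detection_times(start, b1_minutes, b2_minutes):
    """Return (T1, T2): the start time shifted by each fluctuation range."""
    t1 = start + timedelta(minutes=b1_minutes)   # latest normal completion time
    t2 = start + timedelta(minutes=b2_minutes)   # earliest normal completion time
    return t1, t2


start = datetime(2022, 6, 8, 2, 0)               # task starts at 02:00
t1, t2 = detection_times(start, 66, 6)
```

With these inputs the first detection time falls at 03:06 and the second at 02:06 on the same day.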
It should be noted that, on the basis of the embodiment shown in fig. 4, step S204 of the embodiment shown in fig. 2, namely pre-judging whether the probability of a task scheduling anomaly in the target system meets the preset early warning requirement according to the detection result of each timing detection task at each detection time point, covers two aspects:
if the execution progress of the detection object in the first detection time is determined to be incomplete according to the detection result, determining that a first probability that the execution progress of the task is abnormal meets a preset early warning requirement;
and if the execution progress of the detection object in the second detection time is determined to be completed according to the detection result, determining that a second probability that the data magnitude of the target system scheduling task is abnormal meets a preset early warning requirement.
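The two checks can be sketched as one function. For simplicity, this example evaluates a task after the fact from its completion timestamp, whereas the method described here runs a timed check at each detection time; the function name and the warning labels are hypothetical.

```python
from datetime import datetime


def detect(finished_at, t1, t2):
    """Return the anomaly labels implied by a task's completion time.

    finished_at is None when the task has not completed.
    """
    warnings = []
    if finished_at is not None and finished_at <= t2:
        # Completed by the second detection time: suspiciously fast, which
        # suggests an abnormal data magnitude.
        warnings.append("data_magnitude_abnormal")
    if finished_at is None or finished_at > t1:
        # Not completed by the first detection time: execution progress is late.
        warnings.append("execution_progress_abnormal")
    return warnings


t1 = datetime(2022, 6, 8, 3, 6)
t2 = datetime(2022, 6, 8, 2, 6)
too_fast = detect(datetime(2022, 6, 8, 2, 3), t1, t2)
normal = detect(datetime(2022, 6, 8, 2, 40), t1, t2)
late = detect(None, t1, t2)
```

A task that completes between the two detection times triggers neither check.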
Fig. 5 is a flowchart illustrating another task scheduling exception monitoring method according to an embodiment of the present application. As shown in fig. 5, the method includes the following specific steps:
S501, obtaining historical task data of one or more historical task periods according to the work plan of the current task period.
In this step, the similarity between the historical work plan of the historical task period and the work plan of the current task period meets the preset requirement, and the historical task data includes: configuration data for each historical task and time-consuming data for performing each historical task.
And S502, clustering each historical task cycle by using a preset clustering model according to the time consumption data until a plurality of task time consumption types are determined.
In this step, the ratio of the data amount contained in each task time consumption type to the total amount of the time consumption data satisfies the preset proportion requirement.
S503, determining a scheduling relation map according to the configuration data, and determining at least one timing detection task according to the time consumption types of the multiple tasks and the scheduling relation map.
In this step, the scheduling relationship map is used to represent the dependency relationship of the mutual calling processing results among the historical tasks.
S504, according to the detection result of each timing detection task at each detection time point, whether the probability of abnormity occurring in task scheduling of the target system meets the preset early warning requirement is judged in advance.
If not, this step is repeated, that is, the next timing detection task is executed. If so, the judgment specifically covers the detection results at the first detection time $T_1$ and the second detection time $T_2$:
(1) If it is determined from the detection result that the detection object has not completed execution at the first detection time, it is determined that the first probability that the execution progress of the task is abnormal meets the early warning requirement.
(2) If it is determined from the detection result that the detection object has already completed execution at the second detection time, it is determined that the second probability that the data magnitude of the task scheduled by the target system is abnormal meets the early warning requirement.
Note that, in this embodiment, the first detection time $T_1$ and the second detection time $T_2$ are as given in equation (5).
It should be noted that the following describes, as an example, only the processing for case (1), namely steps S505 to S512. Case (2) may be handled in the same way, or a separate warning scheme may be adopted, such as sending the warning information only once.
S505, when a scheduling exception of the detection object is detected at the first detection time, sending early warning information to the current task.
In this step, the detection object is the currently executed task. If the scheduling exception of the detection object is detected at the first detection time, the type of the early warning information is a delay-type early warning.
S506, determining a third detection time of the detection object according to the preset delay time.
In this step, a delay-type early warning may be ignored, in which case the expected warning effect is not achieved, that is, the exception is not corrected in time before it causes a large number of task scheduling exceptions. To guard against this, after the delay-type early warning is sent, a preset delay time is added to the first detection time $T_1$ to obtain the third detection time, namely $T_3 = T_1 + T_d$, where $T_d$ is the preset delay time. Optionally, the value of $T_d$ equals $S_1$ in formula (3), that is, $T_3 = T_1 + S_1$.
Similarly, if a delay-type early warning is still sent after the current task is detected again at the third detection time, steps S506 to S512 are executed again.
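The repeated re-check can be sketched as follows. The source states only that $T_3 = T_1 + T_d$ and that S506 to S512 repeat while the delay-type warning persists; the sketch assumes that each further re-check adds the same delay to the previous detection time, and the name `recheck_times` and the `max_rechecks` cap are hypothetical.

```python
from datetime import datetime, timedelta


def recheck_times(first_detection, delay, max_rechecks):
    """Yield T3 = T1 + Td, then further re-check times spaced Td apart.

    Assumption: every repeat of S506 adds the preset delay Td to the
    previous detection time; max_rechecks is an illustrative safety cap.
    """
    current = first_detection
    for _ in range(max_rechecks):
        current = current + delay
        yield current
```

A caller would stop consuming the generator as soon as a re-check finds that the task no longer has a scheduling exception.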
S507, when it is detected at the third detection time that the currently executed task still has a scheduling exception, determining a first early warning level of the currently executed task according to the first preset early warning weight and the number of times the currently executed task has triggered an early warning.
In this step, if the current task still meets the preset early warning requirement when it is detected again at the third detection time, its early warning level needs to be recalculated. This prevents a warning that was at a low level, and was therefore not sent, from letting a small problem grow into a serious scheduling accident.
In this embodiment, assuming that the current task (also referred to as the preceding task) is task x, the reference value f(x) of the early warning level of task x may be calculated by equation (6):
$$f(x) = u_x t \tag{6}$$
where $u_x$ is the initial early warning weight of task x, which defaults to 1, and $t$ is the number of early warnings triggered: the detection at the first detection time $T_1$ counts as the first warning, the detection at the third detection time $T_3$ as the second warning, and so on.
S508, judging whether the first early warning level meets the preset early warning condition.
In this step, if so, step S505 is executed again, that is, the early warning information is sent to the current task, and steps S509 to S512 are also executed; if not, the flow ends directly.
Specifically, if f(x) ≤ 0.3, the first early warning level is judged to be a low-level early warning, and no early warning processing is carried out; if 0.3 < f(x) ≤ 0.6, the first early warning level is judged to be a medium-level early warning; if f(x) > 0.6, the first early warning level is judged to be a high-level early warning.
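As a rough illustration of equation (6) and the thresholds above, the following Python sketch computes the reference value and maps it to a level. It assumes that the juxtaposition in equation (6) denotes a product of the weight and the trigger count; the function names and level labels are hypothetical.

```python
def warning_reference(weight, trigger_count):
    """Equation (6), read as a product: f(x) = u_x * t (assumption)."""
    return weight * trigger_count


def classify(reference_value):
    """Map a reference value to a level using the 0.3 and 0.6 thresholds."""
    if reference_value <= 0.3:
        return "low"      # no early warning processing is carried out
    if reference_value <= 0.6:
        return "medium"
    return "high"
```

Under this reading, a task whose weight has been lowered to 0.25 stays at the low level on its first trigger and rises to the medium level on its second, while a task with the default weight of 1 is already at the high level on its first trigger.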
After the first warning level is determined, reference may be made to S205 to generate corresponding warning information, which is not described herein again.
S509, determining the association degree between the currently executed task and the next task according to the scheduling relation map by using a preset association model.
In this step, the specific calculation manner of the relevance r between the currently executed task (i.e., task x) and the next task (i.e., task y) may refer to the formula (×) in S205, which is not described herein again.
S510, determining a second early warning level of the next task according to the second early warning weight, the association degree and the early warning triggering times of the next task.
In this embodiment, the task that follows the current task (task x) in the scheduling relation map is task y, and the reference value f(y) of the early warning level of task y can be calculated by equation (7):
$$f(y) = u_y t r \tag{7}$$
where $u_y$ is the initial early warning weight of task y, which defaults to 1; $r$ is the association degree (correlation coefficient) between task x and task y; and $t$ is the number of early warnings triggered: the detection at the first detection time $T_1$ counts as the first warning, the detection at the third detection time $T_3$ as the second warning, and so on.
S511, judging whether the second early warning level meets the preset early warning condition.
In this step, if so, step S512 is executed.
Specifically, if f (y) is less than or equal to 0.3, the second early warning level is judged to be low-level early warning, and early warning processing is not carried out; if the f (y) is more than 0.3 and less than or equal to 0.6, judging that the second early warning level is a middle-level early warning; if f (y) is more than 0.6, the second early warning level is judged to be a high-level early warning.
S512, sending early warning information to the next task.
In this step, the early warning information includes a prompt that the scheduling exception of the next task originates from the scheduling delay of the currently executed task.
Specifically, after the early warning level is determined, the specific step of S205 may be referred to generate corresponding early warning information.
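Steps S509 to S512 can be sketched in Python as below. The sketch takes the association degree r as an input rather than computing it, reads equation (7) as a product, and reuses the 0.3 and 0.6 thresholds; the function name `downstream_warning`, the dictionary layout and the message wording are hypothetical.

```python
def downstream_warning(current_task, next_task, weight, trigger_count, association):
    """Return a warning for next_task, or None if its level is low.

    Equation (7), read as a product: f(y) = u_y * t * r (assumption).
    """
    reference = weight * trigger_count * association
    if reference <= 0.3:
        return None  # low level: no early warning processing
    level = "medium" if reference <= 0.6 else "high"
    return {
        "task": next_task,
        "level": level,
        "message": f"scheduling exception originates from the delay of {current_task}",
    }
```

Because the association degree scales the reference value, a weakly associated downstream task receives no warning until the upstream task has triggered several warnings.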
It should be noted that S509 to S510 may be executed synchronously with S507, and that S508 and S511 may likewise be executed synchronously.
It should be further noted that, in the method for monitoring task scheduling exception provided in each of the foregoing embodiments, the early warning information may include: a weight feedback link; after determining and outputting one or more pieces of early warning information, the method further comprises the following steps:
receiving adjustment information input by a user through the weight feedback link;
and adjusting the early warning weight of the detection object corresponding to the timing detection task according to the adjustment information.
Specifically, the early warning information provides a weight feedback link for the early warning level. If the early warning level does not match the response level required by the business, the weight value to be adjusted is fed back through the link, and the fed-back value is recorded in the early warning system database for use in generating subsequent early warnings.
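A minimal sketch of the weight feedback path is given below. It keeps the weights in memory instead of in the early warning system database mentioned above, and the class name `WarningWeights`, its methods and the non-negativity check are assumptions made for illustration.

```python
class WarningWeights:
    """In-memory stand-in for the stored early warning weights."""

    def __init__(self, default=1.0):
        self._default = default   # initial weight defaults to 1
        self._weights = {}

    def get(self, task_id):
        return self._weights.get(task_id, self._default)

    def apply_feedback(self, task_id, new_weight):
        """Record the weight a user submitted through the feedback link."""
        if new_weight < 0:
            raise ValueError("weight must be non-negative")
        self._weights[task_id] = new_weight
```

Subsequent level calculations for the same task would then read the adjusted weight through `get`.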
In summary, the task scheduling exception monitoring method provided in the embodiments of the present application has at least the following beneficial effects:
(1) early warning can ensure response timeliness and reserve sufficient time for problem handling;
(2) pushing associated warnings to dependent tasks helps with rapid problem localization and resource coordination;
(3) dynamically adjusting the early warning strategy according to the data within a period helps problems to be found in time;
(4) a feedback interface is provided, which ties the business side's level of concern to the scheduled tasks;
(5) the system does not need to share computing resources with a scheduling Server and a Job Server, and has no influence on the system basically;
(6) following the dependency inversion principle, the system interface implementation is invoked directly.
Fig. 6 is a schematic structural diagram of a task scheduling exception monitoring apparatus according to an embodiment of the present application. The task scheduling abnormality monitoring apparatus 600 may be implemented by software, hardware, or a combination of both.
As shown in fig. 6, the task scheduling abnormality monitoring apparatus 600 includes:
the obtaining module 601 is configured to obtain historical task data of one or more historical task periods according to a work plan of a current task period, where a similarity between the historical work plan of the historical task period and the work plan of the current task period meets a preset requirement, and the historical task data includes: configuration data of each historical task and time-consuming data for executing each historical task;
a processing module 602 configured to:
clustering each historical task cycle by using a preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined, and the ratio of the data amount contained in each task time-consuming type to the total amount of the time-consuming data meets the preset ratio requirement;
determining a scheduling relation map according to the configuration data, and determining at least one timing detection task according to the time consumption types of the tasks and the scheduling relation map, wherein the scheduling relation map is used for representing the dependency relationship of the mutual calling processing results among the historical tasks;
according to the detection result of each timing detection task at each detection time point, whether the probability of abnormal task scheduling of the target system meets the preset early warning requirement is judged in advance; if yes, determining one or more pieces of early warning information;
and an output module 603, configured to output the warning information to a detection object of the timing detection task.
In one possible design, the one or more historical task cycles include: the last task cycle nearest to the current cycle, or a consecutive number of task cycles nearest to the current cycle.
In one possible design, the processing module 602 is configured to:
randomly extracting time-consuming data of a plurality of historical tasks from all historical tasks as a clustering center;
performing first clustering processing on each historical task by using a preset clustering model according to a clustering center to determine one or more first time-consuming types;
judging whether the data volume ratio in each first time-consuming type meets the preset ratio requirement or not;
if so, determining that the first time consumption type is a task time consumption type;
if not, re-determining the clustering center, and re-performing clustering processing to re-determine the first time-consuming types until the data volume ratio corresponding to each first time-consuming type meets the preset ratio requirement;
the data volume proportion is used for representing the ratio of the data volume contained in the first time consumption type to the total data volume of the time consumption data.
In one possible design, the predetermined duty cycle requirement includes: the data amount ratio is greater than or equal to a first ratio threshold and less than or equal to a second ratio threshold.
Optionally, the first value range of the first ratio threshold includes: 1% to 10%, and the second value range of the second ratio threshold includes: 40% to 60%.
In one possible design, the processing module 602 is configured to:
deleting a first time-consuming type with the data volume proportion smaller than a first ratio threshold; and/or,
randomly extracting at least two historical tasks from each first time-consuming type with the data volume proportion larger than a second ratio threshold to serve as new clustering centers;
for a first time-consuming type meeting the preset proportion requirement, re-determining a new clustering center according to a preset mode;
and performing clustering again by using a preset clustering model according to each new clustering center to determine a new first time-consuming type.
In one possible design, the processing module 602 is further configured to:
and when the first time consumption type meets the preset proportion requirement, taking the average time consumption of the first time consumption type as a new clustering center.
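The iterative clustering described in the designs above can be sketched as follows. This is a simplified one-dimensional, nearest-center illustration and not the preset clustering model itself: the function name `cluster_durations`, the default thresholds of 5% and 50%, the `max_rounds` cap and the fixed random seed are all assumptions. Deleting an undersized type is interpreted here as dropping its center so that its data is reassigned in the next round.

```python
import random
from statistics import mean


def cluster_durations(durations, k, low=0.05, high=0.5, max_rounds=100, seed=0):
    """Return (clusters, converged) for a list of task durations."""
    rng = random.Random(seed)
    centers = rng.sample(durations, k)  # random initial clustering centers
    clusters = []
    for _ in range(max_rounds):
        # Assign every duration to its nearest center.
        clusters = [[] for _ in centers]
        for d in durations:
            nearest = min(range(len(centers)), key=lambda i: abs(d - centers[i]))
            clusters[nearest].append(d)
        shares = [len(c) / len(durations) for c in clusters]
        if all(low <= s <= high for s in shares):
            return clusters, True
        new_centers = []
        for members, share in zip(clusters, shares):
            if share < low:
                continue  # undersized type: drop its center
            if share > high:
                # oversized type: draw two members as new centers
                new_centers.extend(rng.sample(members, min(2, len(members))))
            else:
                new_centers.append(mean(members))  # keep type, recenter on mean
        centers = new_centers or rng.sample(durations, k)
    return clusters, False
```

The function returns the last assignment together with a flag, because this simplified loop is not guaranteed to satisfy the ratio requirement within the round limit.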
In one possible design, the processing module 602 is further configured to:
determining a first target type and a second target type from each task time consumption type according to a preset screening requirement;
determining a first fluctuation range and a second fluctuation range according to each time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm;
and determining the detection object and the detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range and the starting time of the execution of the historical tasks in the time-consuming data.
In one possible design, the processing module 602 is further configured to:
determining a first fluctuation range according to the first average consumed time and the first standard deviation of all consumed time data in the first target type;
and determining a second fluctuation range according to the second average consumed time and the second standard deviation of all consumed time data in the second target type.
In one possible design, the processing module 602 is configured to calculate a first fluctuation range equal to a sum of the first average elapsed time and N times the first standard deviation; a second fluctuation range is calculated equal to the difference between the second average elapsed time and M times the second standard deviation.
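The two fluctuation ranges can be computed as in the following Python sketch. It assumes the population standard deviation is intended (the text does not say which estimator is used); the function name `fluctuation_ranges` and the default multipliers of 3 are hypothetical.

```python
from statistics import mean, pstdev


def fluctuation_ranges(first_type, second_type, n=3.0, m=3.0):
    """Return (B1, B2) from the durations of the two target types.

    B1 = first average + N * first standard deviation
    B2 = second average - M * second standard deviation
    """
    b1 = mean(first_type) + n * pstdev(first_type)
    b2 = mean(second_type) - m * pstdev(second_type)
    return b1, b2
```

For the durations 2, 4, 4, 4, 5, 5, 7, 9 (mean 5, population standard deviation 2), N = 3 gives a first fluctuation range of 11 and M = 2 gives a second fluctuation range of 1.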
In one possible design, the detection time includes: a first detection time and a second detection time, the first detection time comprising: superimposing the first fluctuation range on the basis of the start time, the second detection time comprising: the second fluctuation range is superimposed on the basis of the start time.
In one possible design, the processing module 602 is configured to:
if the execution progress of the detection object in the first detection time is determined to be incomplete according to the detection result, determining that a first probability that the execution progress of the task is abnormal meets an early warning requirement;
and if the execution progress of the detection object in the second detection time is determined to be completed according to the detection result, determining that the second probability that the data magnitude of the target system scheduling task is abnormal meets the early warning requirement.
In one possible design, the output module 603 is configured to:
calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
if the association degree is in the first association interval, determining that the early warning information comprises first early warning information and second early warning information, and the early warning levels of the first early warning information and the second early warning information are the same, wherein the first early warning information is used for representing that a former task has scheduling abnormality and has association influence on the scheduling of the latter task, and the second early warning information is used for representing that the scheduling abnormality of the latter task is derived from the delay of the former task;
and outputting first early warning information to the previous task and outputting second early warning information to the next task.
In one possible design, the output module 603 is configured to:
calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
if the association degree is in a second association interval, determining that the early warning information comprises first early warning information and second early warning information, wherein a first early warning level of the first early warning information is greater than a second early warning level of the second early warning information, the first early warning information is used for representing that a former task has scheduling abnormality and has association influence on scheduling of a latter task, and the second early warning information is used for representing that the scheduling abnormality of the latter task is originated from delay of the former task;
and outputting first early warning information to the previous task and outputting second early warning information to the next task.
In one possible design, the output module 603 is configured to:
calculating the association degree of a previous task and a next task in a scheduling relation map according to a preset association model;
and if the association degree is in the third association interval, outputting early warning information to the previous task, wherein the early warning information is used for representing that the previous task has scheduling abnormity.
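The three output designs above differ only in which association interval the association degree falls into, which the following sketch makes explicit. The interval boundaries (0.8 and 0.3), the integer warning levels, the one-step reduction for the later task and the name `plan_warnings` are all hypothetical, since this passage does not give the interval limits.

```python
def plan_warnings(previous_task, next_task, r, level, first_min=0.8, second_min=0.3):
    """Return (task, level, message) tuples for one pair of dependent tasks."""
    first_msg = f"{previous_task} has a scheduling exception affecting {next_task}"
    second_msg = f"scheduling exception originates from the delay of {previous_task}"
    if r >= first_min:
        # First association interval: both tasks warned at the same level.
        return [(previous_task, level, first_msg), (next_task, level, second_msg)]
    if r >= second_min:
        # Second association interval: the later task gets a lower level.
        return [(previous_task, level, first_msg), (next_task, level - 1, second_msg)]
    # Third association interval: only the earlier task is warned.
    return [(previous_task, level, f"{previous_task} has a scheduling exception")]
```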
In one possible design, the warning information includes: a weight feedback link;
the obtaining module 601 is further configured to receive adjustment information input by a user through the weight feedback link;
the processing module 602 is further configured to adjust an early warning weight of a detection object corresponding to the timing detection task according to the adjustment information.
In one possible design, the processing module 602 is further configured to:
when the scheduling abnormality of the detection object is detected at the first detection time, determining third detection time of the detection object according to preset delay time, wherein the detection object is a current execution task;
when the scheduling exception still exists in the current execution task at the third detection time, determining a first early warning level of the current execution task according to a first preset early warning weight and early warning triggering times of the current execution task;
judging whether the first early warning level meets a preset early warning condition or not;
and if so, sending the early warning information to the current execution task again.
In one possible design, the processing module 602 is further configured to:
determining the association degree of the current execution task and the next task according to the scheduling relation map by using a preset association model;
determining a second early warning level of the next task according to the second early warning weight, the association degree and the early warning triggering times of the next task;
judging whether the second early warning level meets a preset early warning condition or not;
and if so, sending early warning information to the next task, wherein the early warning information includes a prompt that the scheduling exception of the next task originates from the scheduling delay of the currently executed task.
It should be noted that the apparatus provided in the embodiment shown in fig. 6 can execute the method provided in any of the above method embodiments, and the specific implementation principle, technical features, term explanation and technical effects thereof are similar and will not be described herein again.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 700 may include: at least one processor 701 and a memory 702. Fig. 7 shows an electronic device with one processor as an example.
The memory 702 stores programs. In particular, the program may include program code including computer operating instructions.
The memory 702 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor 701 is configured to execute computer-executable instructions stored by the memory 702 to implement the methods described in the method embodiments above.
The processor 701 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Alternatively, the memory 702 may be separate or integrated with the processor 701. When the memory 702 is a device independent from the processor 701, the electronic device 700 may further include:
a bus 703 for connecting the processor 701 and the memory 702. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on; this classification does not imply that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 702 and the processor 701 are implemented in a single chip, the memory 702 and the processor 701 may communicate via an internal interface.
Embodiments of the present application further provide a computer-readable storage medium, which may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. In particular, the computer-readable storage medium stores program instructions for the methods in the above method embodiments.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method in the foregoing method embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (22)

1. A task scheduling exception monitoring method is characterized by comprising the following steps:
acquiring historical task data of one or more historical task periods according to a working plan of a current task period, wherein the similarity between the historical working plan of the historical task period and the working plan of the current task period meets a preset requirement, and the historical task data comprises the following steps: configuration data of each historical task and time-consuming data for executing each historical task;
clustering each historical task cycle by using a preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined, wherein the ratio of the data amount contained in each task time-consuming type to the total amount of the time-consuming data meets the preset ratio requirement;
determining a scheduling relation map according to the configuration data, and determining at least one timing detection task according to a plurality of task time consumption types and the scheduling relation map, wherein the scheduling relation map is used for representing the dependency relationship of mutual calling processing results among the historical tasks;
according to the detection result of each timing detection task at each detection time point, whether the probability of abnormal task scheduling of a target system meets the preset early warning requirement is judged in advance; and if so, determining and outputting one or more pieces of early warning information.
2. The method of claim 1, wherein one or more of the historical task cycles comprises: a last task cycle nearest to the current cycle, or a plurality of consecutive task cycles nearest to the current cycle.
3. The method for monitoring task scheduling anomalies according to claim 1, wherein the clustering, by using a preset clustering model and according to the time-consuming data, each historical task cycle until a plurality of task time-consuming types are determined includes:
randomly extracting the time-consuming data of a plurality of historical tasks from all the historical tasks as a clustering center;
performing first clustering processing on each historical task according to the clustering center by using the preset clustering model to determine one or more first time-consuming types;
judging whether the data volume ratio in each first time-consuming type meets the preset ratio requirement or not;
if so, determining that the first time consumption type is a task time consumption type;
if not, re-determining the clustering center, and re-performing the clustering processing to re-determine the first time-consuming type until the data volume ratio corresponding to each first time-consuming type meets the preset ratio requirement;
wherein the data volume proportion is used for characterizing the ratio of the data volume contained in the first time consumption type to the total data volume of the time consumption data.
4. The method for monitoring task scheduling exception according to any one of claims 1 to 3, wherein the preset ratio requirement comprises: the data volume ratio is greater than or equal to a first ratio threshold and less than or equal to a second ratio threshold.
5. The method of claim 4, wherein the first value range of the first ratio threshold comprises: 1% to 10%, and the second value range of the second ratio threshold comprises: 40% to 60%.
6. The method for monitoring task scheduling exception according to claim 4, wherein the re-determining the clustering center and re-clustering the task scheduling exception comprises:
deleting a first time-consuming type of which the data volume ratio is smaller than the first ratio threshold; and/or,
randomly extracting at least two historical tasks from each first time-consuming type whose data volume ratio is larger than the second ratio threshold as new clustering centers;
for the first time-consuming type meeting the preset proportion requirement, re-determining a new clustering center according to a preset mode;
and re-clustering by using a preset clustering model according to each new clustering center to determine a new first time-consuming type.
7. The method for monitoring task scheduling anomalies according to claim 6, wherein the step of re-determining a new clustering center according to a preset manner for the first time-consuming type meeting the preset ratio requirement includes:
when the first time consumption type meets the preset proportion requirement, taking the average time consumption of the first time consumption type as a new clustering center.
8. The method for monitoring task scheduling exception according to claim 1, wherein the determining at least one timing detection task according to the plurality of task time consumption types and the scheduling relationship map comprises:
determining a first target type and a second target type from each task time consumption type according to a preset screening requirement;
determining a first fluctuation range and a second fluctuation range according to the time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm;
and determining the detection object and the detection time of each timing detection task according to the scheduling relation map, the first fluctuation range, the second fluctuation range and the starting time of the execution of the historical tasks in the time-consuming data.
9. The method for monitoring task scheduling exception according to claim 8, wherein the determining a first fluctuation range and a second fluctuation range according to each of the time-consuming data in the first target type and the second target type by using a preset fluctuation algorithm comprises:
determining the first fluctuation range according to the first average consumed time and the first standard deviation of all the consumed time data in the first target type;
and determining the second fluctuation range according to the second average consumed time and the second standard deviation of all the consumed time data in the second target type.
10. The method for monitoring task scheduling exception according to claim 9, wherein the determining the first fluctuation range according to a first average elapsed time and a first standard deviation of all the elapsed time data in the first target type includes:
the first fluctuation range is equal to the sum of the first average elapsed time and N times the first standard deviation;
determining the second fluctuation range according to a second average consumed time and a second standard deviation of all the consumed time data in the second target type, including:
the second fluctuation range is equal to a difference between the second average elapsed time and M times the second standard deviation.
11. The method according to any one of claims 8 to 10, wherein the detection time includes: a first detection time and a second detection time, the first detection time comprising: superimposing the first fluctuation range on the basis of the start time, the second detection time including: superimposing the second fluctuation range on the basis of the start time.
12. The method for monitoring task scheduling anomalies according to claim 11, wherein the step of predicting whether the probability of occurrence of the task scheduling anomalies of the target system meets a preset early warning requirement according to the detection results of the timing detection tasks at the detection time points comprises:
if the execution progress of the detection object at the first detection time is determined to be incomplete according to the detection result, determining that a first probability that the execution progress of the task is abnormal meets the preset early warning requirement;
and if the execution progress of the detection object at the second detection time is determined to be completed according to the detection result, determining that the second probability that the data magnitude of the task scheduled by the target system is abnormal meets the preset early warning requirement.
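The two checks of claim 12 can be sketched in Python as follows. The function name and the string labels are hypothetical, and the reading that a task finished by the second (earlier) detection time suggests an unusually small data volume is an interpretation of the claim, not something it states.

```python
def predict_anomalies(done_at_first: bool, done_at_second: bool) -> list:
    """Map the two detection results to the anomaly kinds of claim 12.

    done_at_first: whether the detection object had completed at the
        first detection time.
    done_at_second: whether it had completed at the second detection time.
    """
    anomalies = []
    if not done_at_first:
        # Still running at the first detection time: execution progress anomaly.
        anomalies.append("progress_anomaly")
    if done_at_second:
        # Already complete at the second detection time: data magnitude anomaly.
        anomalies.append("data_magnitude_anomaly")
    return anomalies
```

A task that is incomplete at the second detection time and complete at the first one triggers neither warning in this sketch.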
13. The method for monitoring task scheduling exceptions according to any one of claims 1-3 and 8-10, wherein the determining and outputting one or more pieces of early warning information comprises:
calculating the association degree of a previous task and a next task in the scheduling relation map according to a preset association model;
if the association degree is in a first association interval, determining that the early warning information comprises first early warning information and second early warning information, and the early warning levels of the first early warning information and the second early warning information are the same, wherein the first early warning information is used for representing that the previous task has a scheduling abnormality that affects the scheduling of the next task, and the second early warning information is used for representing that the scheduling abnormality of the next task is caused by the delay of the previous task;
and outputting the first early warning information to the previous task and outputting the second early warning information to the next task.
14. The method for monitoring task scheduling exceptions according to any one of claims 1-3 and 8-10, wherein the determining and outputting one or more pieces of early warning information comprises:
calculating the association degree of a previous task and a next task in the scheduling relation map according to a preset association model;
if the association degree is in a second association interval, determining that the early warning information comprises first early warning information and second early warning information, wherein a first early warning level of the first early warning information is greater than a second early warning level of the second early warning information, the first early warning information is used for representing that the previous task has a scheduling abnormality that affects the scheduling of the next task, and the second early warning information is used for representing that the scheduling abnormality of the next task is caused by the delay of the previous task;
and outputting the first early warning information to the previous task and outputting the second early warning information to the next task.
15. The method for monitoring task scheduling exceptions according to any one of claims 1-3 and 8-10, wherein the determining and outputting one or more pieces of early warning information comprises:
calculating the association degree of a previous task and a next task in the scheduling relation map according to a preset association model;
and if the association degree is in a third association interval, outputting the early warning information to the previous task, wherein the early warning information is used for representing that the previous task has a scheduling abnormality.
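Claims 13 to 15 route warnings differently depending on which interval the association degree falls in. The Python sketch below illustrates that branching only. The interval boundaries (0.7 and 0.3), the ordering of the intervals from high to low association, the numeric levels, and the names `Alert` and `route_alerts` are all assumptions; the claims specify neither the association model nor the intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    target: str   # task that receives the warning
    level: int    # early warning level (higher means more urgent)
    message: str


def route_alerts(degree, prev_task, next_task, base_level=2, high=0.7, low=0.3):
    """Return the alerts to send for one previous/next task pair."""
    upstream = Alert(prev_task, base_level,
                     "scheduling abnormality; affects the next task")
    if degree >= high:
        # Assumed first interval (claim 13): both tasks warned at the same level.
        return [upstream,
                Alert(next_task, base_level, "delay caused by the previous task")]
    if degree >= low:
        # Assumed second interval (claim 14): next task warned at a lower level.
        return [upstream,
                Alert(next_task, base_level - 1, "delay caused by the previous task")]
    # Assumed third interval (claim 15): only the previous task is warned.
    return [Alert(prev_task, base_level, "scheduling abnormality")]
```

Under these assumed thresholds, `route_alerts(0.5, "extract", "load")` returns one alert for `extract` at level 2 and one for `load` at level 1.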
16. The method for monitoring task scheduling exception according to any one of claims 1 to 3 and 8 to 10, wherein the early warning information includes: a weight feedback link;
after the determining and outputting one or more pieces of early warning information, the method further comprises:
receiving adjustment information input by a user through the weight feedback link;
and adjusting the early warning weight of the detection object corresponding to the timing detection task according to the adjustment information.
17. The method for monitoring task scheduling exceptions according to claim 11, further comprising: when a scheduling abnormality of the detection object is detected at the first detection time, determining a third detection time of the detection object according to a preset delay time, wherein the detection object is a currently executed task;
when the scheduling abnormality of the currently executed task is detected at the third detection time, determining a first early warning level of the currently executed task according to a first preset early warning weight and early warning triggering times of the currently executed task;
judging whether the first early warning level meets a preset early warning condition or not;
and if so, sending the early warning information to the currently executed task again.
18. The task scheduling anomaly monitoring method according to claim 17, further comprising, when it is detected at the third detection time that the scheduling anomaly still exists in the currently executed task:
determining the association degree of the currently executed task and the next task according to the scheduling relation map by using a preset association model;
determining a second early warning level of the next task according to the second early warning weight, the association degree and the early warning triggering times of the next task;
judging whether the second early warning level meets the preset early warning condition or not;
and if so, sending the early warning information to the next task, wherein the early warning information indicates that the scheduling abnormality of the next task originates from a scheduling delay of the currently executed task.
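Claims 17 and 18 name the inputs of the two early warning levels but give no formula for combining them. The Python sketch below therefore assumes a simple product and a fixed threshold as the preset early warning condition; the function names, the product rule, and the threshold of 1.0 are hypothetical and serve only to show how the re-check could escalate.

```python
def upstream_level(weight, trigger_count):
    """First early warning level of the currently executed task.

    Assumed rule: preset weight multiplied by the number of warnings
    already triggered for this task.
    """
    return weight * trigger_count


def downstream_level(weight, degree, trigger_count):
    """Second early warning level of the next task.

    Assumed rule: the next task's weight scaled by its association degree
    with the currently executed task and by its own trigger count.
    """
    return weight * degree * trigger_count


def meets_condition(level, threshold=1.0):
    """Assumed preset early warning condition: level reaches the threshold."""
    return level >= threshold
```

With a weight of 0.5, the currently executed task is warned again only after its second trigger under these assumptions, while a next task with an association degree of 0.5 needs four triggers to reach the same threshold.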
19. A task scheduling exception monitoring apparatus, comprising:
an acquisition module configured to acquire historical task data of one or more historical task periods according to a working plan of a current task period, wherein the similarity between the historical working plan of the historical task period and the working plan of the current task period meets a preset requirement, and the historical task data comprises: configuration data of each historical task and time-consuming data for executing each historical task;
a processing module configured for:
clustering each historical task cycle by using a preset clustering model according to the time-consuming data until a plurality of task time-consuming types are determined, wherein the ratio of the data amount contained in each task time-consuming type to the total amount of the time-consuming data meets the preset ratio requirement;
determining a scheduling relation map according to the configuration data, and determining at least one timing detection task according to a plurality of task time consumption types and the scheduling relation map, wherein the scheduling relation map is used for representing the dependency relationship of mutual calling processing results among the historical tasks;
predicting, according to the detection result of each timing detection task at each detection time point, whether the probability of abnormal task scheduling of the target system meets the preset early warning requirement; and if so, determining one or more pieces of early warning information;
and an output module configured to output the early warning information to a detection object of the timing detection task.
20. An electronic device, comprising:
a processor; and
a memory for storing a computer program for the processor;
wherein the processor is configured to perform the method of task scheduling exception monitoring of any of claims 1 to 18 via execution of the computer program.
21. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the task scheduling exception monitoring method according to any one of claims 1 to 18.
22. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method for task scheduling exception monitoring of any one of claims 1 to 18.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210646351.4A CN114840392A (en) 2022-06-09 2022-06-09 Method, apparatus, medium, and program product for monitoring task scheduling exception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210646351.4A CN114840392A (en) 2022-06-09 2022-06-09 Method, apparatus, medium, and program product for monitoring task scheduling exception

Publications (1)

Publication Number Publication Date
CN114840392A true CN114840392A (en) 2022-08-02

Family

ID=82574833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210646351.4A Pending CN114840392A (en) 2022-06-09 2022-06-09 Method, apparatus, medium, and program product for monitoring task scheduling exception

Country Status (1)

Country Link
CN (1) CN114840392A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292141A (en) * 2022-09-29 2022-11-04 深圳联友科技有限公司 Scheduling abnormity early warning method based on sliding time window and monitoring server
CN115292141B (en) * 2022-09-29 2023-02-03 深圳联友科技有限公司 Scheduling abnormity early warning method based on sliding time window and monitoring server
CN117271100A (en) * 2023-11-21 2023-12-22 北京国科天迅科技股份有限公司 Algorithm chip cluster scheduling method, device, computer equipment and storage medium
CN117271100B (en) * 2023-11-21 2024-02-06 北京国科天迅科技股份有限公司 Algorithm chip cluster scheduling method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US10402225B2 (en) Tuning resources based on queuing network model
CN114840392A (en) Method, apparatus, medium, and program product for monitoring task scheduling exception
US8490108B2 (en) Method of estimating a processing time of each of a plurality of jobs and apparatus thereof
US11966778B2 (en) Cloud application scaler
JP6260130B2 (en) Job delay detection method, information processing apparatus, and program
US9558091B2 (en) Information processing device, fault avoidance method, and program storage medium
CN111444060B (en) Abnormality detection model training method, abnormality detection method and related devices
US20160004620A1 (en) Detection apparatus, detection method, and recording medium
US20120072456A1 (en) Adaptive resource allocation for multiple correlated sub-queries in streaming systems
CN111680085A (en) Data processing task analysis method and device, electronic equipment and readable storage medium
US20140244846A1 (en) Information processing apparatus, resource control method, and program
CN114741187A (en) Resource scheduling method, system, electronic device and medium
CN113515358B (en) Task scheduling method and device, electronic equipment and storage medium
CN116186017B (en) Big data collaborative supervision method and platform
US20100030732A1 (en) System and method to create process reference maps from links described in a business process model
JP7307215B2 (en) Operation support system and method
JP2007164346A (en) Decision tree changing method, abnormality determination method, and program
CN112035236B (en) Task scheduling method, device and storage medium based on multi-factor cooperation
Poltavtseva et al. Planning of aggregation and normalization of data from the Internet of Things for processing on a multiprocessor cluster
CN113517998A (en) Processing method, device and equipment of early warning configuration data and storage medium
US20150277858A1 (en) Performance evaluation device, method, and medium for information system
CN112650687B (en) Method, device, equipment and medium for testing execution priority of engine scheduling action
CN118069620A (en) Database fault prevention method, device, computer equipment and storage medium
CN112463556B (en) Volume visible latency prediction method, system, device and medium
CN117290113B (en) Task processing method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination