CN112559287B - Optimization method and device for task flow of data center station - Google Patents

Info

Publication number
CN112559287B
Authority
CN
China
Prior art keywords
task
task flow
execution
key
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011448500.3A
Other languages
Chinese (zh)
Other versions
CN112559287A (en)
Inventor
姜水琴
路平
张敬谊
胡杉文
王维任
袁峰
张鑫金
方幸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WONDERS INFORMATION CO Ltd
Original Assignee
WONDERS INFORMATION CO Ltd

Classifications

    • G06F11/3017: Monitoring arrangements specially adapted to the computing system or computing system component being monitored, where the computing system is implementing multitasking
    • G06F11/0715: Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment, in a system implementing multitasking
    • G06F11/0757: Error or fault detection not based on redundancy, by exceeding limits, by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F11/079: Root cause analysis, i.e. error or fault diagnosis
    • G06F11/3055: Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G06F11/327: Monitoring with visual or acoustical indication of the functioning of the machine; display of status information; alarm or error message display
    • G06F11/3423: Recording or statistical evaluation of computer activity for performance assessment by assessing time, where the assessed time is active or idle time
    • G06F11/3447: Performance evaluation by modeling
    • G06F11/3476: Performance evaluation by tracing or monitoring; data logging
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention aims to optimize the task flow of a data center station. To this end, it provides a method for optimizing the task flow of a data center station, and further provides a device for optimizing that task flow. The invention monitors the execution of the task flow and raises alarms, thereby supervising the task flow effectively. When the task flow is abnormal, the method not only raises an alarm but also accurately locates the key abnormal node on the task flow, predicts the execution duration of the optimized task flow, and judges whether the optimized task flow can be completed by a preset time, thereby optimizing the task flow. The invention can therefore optimize the task flow of the data center station and improve its execution efficiency, stability, and reliability.

Description

Optimization method and device for task flow of data center station
Technical Field
The invention relates to a method and a device for optimizing the task flow of a data center station, and belongs to the technical field of data center stations.
Background
The data center station brings together multi-source heterogeneous data, manages enterprise data in a unified way, supports enterprise business, and provides efficient services to clients; it is where an enterprise's business and data accumulate. A data center station reduces the cost of repeated construction and helps an enterprise keep its differentiated competitive advantage. A stable and efficient data center station has become infrastructure for enterprise strategy.
A series of task flows run on the data center station. A task flow usually consists of many tasks of different types, such as SQL (Structured Query Language), script, and ETL (extract, transform, load) tasks. The code for these task types is developed collaboratively by many people and depends on different runtime environments and resources, so manually configured resources rarely match what the tasks actually need; running efficiency therefore varies widely and resource utilization is low. Task dependencies within a task flow are complex. A downstream task often depends on the successful execution of its upstream tasks, so a delayed upstream task slows the whole task flow and makes it hard to finish by the preset time. Dependencies also exist across scheduling periods, for example when the current execution of a task depends on the result of its previous scheduled execution. The differences in running efficiency and the complex dependencies often cause task anomalies such as congestion in the task flow, so the execution efficiency of task flows on the data center station is low.
The data center station executes tasks serially and in parallel according to a directed acyclic graph. When a task is abnormal, and in particular when an upstream task fails, alarms are raised for the upstream task together with its downstream tasks, which makes it hard to locate the key abnormal node quickly. While tasks run, the task flow is not effectively supervised and its execution duration cannot be predicted in advance, so it is hard to finish by the preset time, and the stability and reliability of the data center station suffer. An optimization method for the task flow of the data center station is therefore needed.
Disclosure of Invention
The purpose of the invention is to optimize the task flow of a data center station.
To achieve the above purpose, one technical solution of the invention provides a method for optimizing the task flow of a data center station, characterized by comprising the following steps:
Step S1: reading at least the task names of all single tasks on which the target task flow depends and the dependency relationship table among those single tasks;
Step S2: monitoring the execution result of the target task flow. During execution of the target task flow, its execution result is judged to be one of: task completion, task error, or task timeout. When the execution result is task error or task timeout, it is judged abnormal and the method proceeds to step S3; when the execution result is task completion, the execution durations of the single tasks on which the target task flow depends are recorded and the method returns to step S2;
Step S3: for a target task flow whose execution result is abnormal, calculating the critical path of the target task flow and the longest execution duration of the whole target task flow from the execution durations of the single tasks on which it depends, comprising the following steps:
Step S301: establishing a directed acyclic graph from the dependency relationships among the single tasks on which the task flow depends and the execution duration of each single task;
Step S302: calculating the critical path of the directed acyclic graph and taking it as the critical path of the target task flow; the execution duration of the critical path of the directed acyclic graph is the longest execution duration of the whole target task flow;
Step S4: determining the key abnormal node of the target task flow from the critical path and the longest execution duration of the whole target task flow, where a single task is defined as a node. In step S4:
when the execution result is task error, finding the key fault node, i.e. the single task in which the error occurred, raising an error alarm, and sending the critical path of the target task flow and the key fault node to the user;
when the execution result is task timeout, finding the key timeout node, i.e. the task with the longest execution duration in the key node set CPL, raising a timeout alarm, and sending the critical path of the task flow and the key timeout node to the user;
Step S5: forming an optimized task flow according to the key abnormal node, and predicting the longest execution duration of the optimized task flow, wherein:
the key abnormal node is the key fault node or the key timeout node obtained in step S4;
predicting the longest execution duration of the optimized task flow comprises: constructing a mathematical model from the historical execution records in the log, using the model to predict the execution duration of each single task on which the optimized task flow depends, and calculating the predicted execution duration of the task flow from the predicted execution durations of the single tasks;
Step S6: judging, from the predicted longest execution duration of the optimized task flow, whether the optimized task flow can be completed by the preset time; if it cannot, raising an alarm; if it can, continuing to monitor the execution state of the task flow and recording its execution duration.
Preferably, in step S2 and step S3, the execution duration of a single task is calculated from the start time and end time of that task recorded in the log.
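As an illustration of deriving a task's execution duration from logged start and end times, the following is a minimal sketch. The log layout (task name plus ISO 8601 timestamps), the task names, and the function name `execution_seconds` are assumptions made for the example; the source does not specify a log format.

```python
from datetime import datetime

# Hypothetical log records: (task name, start time, end time) as ISO 8601 strings.
LOG = [
    ("extract_orders", "2020-12-01T01:00:00", "2020-12-01T01:12:30"),
    ("load_orders", "2020-12-01T01:12:30", "2020-12-01T01:20:00"),
]


def execution_seconds(start: str, end: str) -> float:
    """Return the execution duration in seconds between two ISO 8601 timestamps."""
    started = datetime.fromisoformat(start)
    ended = datetime.fromisoformat(end)
    if ended < started:
        raise ValueError("end time precedes start time")
    return (ended - started).total_seconds()


# Execution duration of every single task found in the log.
durations = {name: execution_seconds(s, e) for name, s, e in LOG}
```

These per-task durations are the values that would later serve as edge weights of the directed acyclic graph.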
Preferably, the directed acyclic graph is denoted $G = (V, E)$, and in step S301, establishing the directed acyclic graph $G = (V, E)$ comprises the following steps:
Each event in the target task flow is defined as a vertex of the directed acyclic graph $G$. The $i$-th vertex of $G$ is denoted $v_i$, and the vertices corresponding to all $n$ events in the target task flow form the vertex set $V = \{v_1, v_2, \ldots, v_n\}$;
Each single task on which the target task flow depends is defined as a directed edge of the directed acyclic graph $G$. $e_{ij}$ denotes the directed edge of $G$ pointing from vertex $v_i$ to vertex $v_j$, and the directed edges corresponding to all single tasks in the target task flow form the edge set $E = \{e_{ij} \mid (v_i, v_j)\}$. The weight of each directed edge is the execution duration of the corresponding single task; the weight of directed edge $e_{ij}$ is $c_{ij}$.
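The graph construction above can be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to already give, for each single task, its start event, end event, and duration (the `TASKS` table and its contents are hypothetical), and Kahn's algorithm is used here as one common way to confirm that the result is acyclic.

```python
from collections import defaultdict, deque

# Hypothetical input: task name -> (from_event i, to_event j, duration c_ij in seconds).
TASKS = {
    "A": (1, 2, 30.0),
    "B": (1, 3, 20.0),
    "C": (2, 4, 40.0),
    "D": (3, 4, 10.0),
}


def build_dag(tasks):
    """Return (V, E): the vertex set and a map (i, j) -> (task name, c_ij)."""
    vertices = set()
    edges = {}
    for name, (i, j, cost) in tasks.items():
        if (i, j) in edges:
            raise ValueError(f"duplicate edge {(i, j)}")
        vertices.update((i, j))
        edges[(i, j)] = (name, cost)
    return vertices, edges


def topological_order(vertices, edges):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for (i, j) in edges:
        successors[i].append(j)
        indegree[j] += 1
    queue = deque(sorted(v for v in vertices if indegree[v] == 0))
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) != len(vertices):
        raise ValueError("task flow contains a cycle")
    return order


V, E = build_dag(TASKS)
ORDER = topological_order(V, E)
```

Keeping the task name next to the weight on each edge makes it straightforward to report tasks, rather than event pairs, once the critical path is known.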
Preferably, in step S302, calculating the critical path of the directed acyclic graph and the execution duration of the critical path comprises the following steps:
The earliest start time and the latest start time of each vertex are calculated on the directed acyclic graph $G$ established in step S301. If the earliest start time and the latest start time of the current vertex are equal, the current vertex is added to the bottleneck event set and the directed edge corresponding to the current vertex is added to the key node set CPL. That is, let the earliest start time of vertex $v_j$ be $ES_j$ and its latest start time be $LS_j$; if $ES_j = LS_j$, vertex $v_j$ is added to the bottleneck event set and the directed edge $e_{ij}$ corresponding to vertex $v_j$ is added to the key node set CPL;
After all vertices of the directed acyclic graph $G$ have been traversed, the resulting key node set CPL is the critical path of $G$; the sum of the execution durations of all single tasks in CPL is the execution duration of the critical path, namely the longest execution duration of the whole target task flow.
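The earliest/latest start time calculation can be sketched as a forward and a backward pass over the graph. The edge table below is hypothetical, and one detail is an assumption that goes beyond the text: an edge is placed in CPL only if both of its end events are bottleneck events and the edge itself has zero slack, which avoids picking up a non-critical edge that happens to join two critical events.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical edges: (i, j) -> (task name, duration c_ij in seconds).
EDGES = {
    (1, 2): ("A", 30.0),
    (1, 3): ("B", 20.0),
    (2, 4): ("C", 40.0),
    (3, 4): ("D", 10.0),
}


def _same(a, b):
    """Tolerant equality for floating-point times."""
    return math.isclose(a, b, abs_tol=1e-9)


def critical_path(edges):
    """Return (bottleneck events, CPL task names, critical-path duration)."""
    preds = {}
    for (i, j) in edges:
        preds.setdefault(i, set())
        preds.setdefault(j, set()).add(i)
    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: earliest start time ES of each event.
    es = {v: 0.0 for v in order}
    for v in order:
        for (i, j), (_, cost) in edges.items():
            if j == v:
                es[v] = max(es[v], es[i] + cost)
    total = max(es.values())

    # Backward pass: latest start time LS of each event.
    ls = {v: total for v in order}
    for v in reversed(order):
        for (i, j), (_, cost) in edges.items():
            if i == v:
                ls[v] = min(ls[v], ls[j] - cost)

    bottleneck = [v for v in order if _same(es[v], ls[v])]
    # Assumption: keep an edge only if it also has zero slack.
    cpl = [name for (i, j), (name, cost) in edges.items()
           if _same(es[i], ls[i]) and _same(es[j], ls[j])
           and _same(es[i] + cost, es[j])]
    return bottleneck, cpl, total


BOTTLENECK, CPL, LONGEST = critical_path(EDGES)
```

In this example the path through tasks A and C determines the longest execution duration, while tasks B and D have slack.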
Preferably, in step S5, the optimized task flow is formed as follows:
for a key fault node, the cause of the error is checked and the corresponding error is corrected, thereby forming the optimized task flow;
for a key timeout node, resources are reconfigured according to the data volume, the processor utilization, and the memory usage, thereby forming the optimized task flow.
Preferably, in step S5, predicting the longest execution duration of the optimized task flow specifically comprises the following steps:
Step S501: according to the number of processors requested by the user for the task, the memory usage, and the data volume of the task, N historical tasks similar to the current single task are found in the records of historically executed tasks using the K nearest neighbor algorithm; the mean execution duration of these N historical tasks is calculated and used as the predicted execution duration of the corresponding single task;
Step S502: a directed acyclic graph is established from the dependency relationships among the single tasks on which the optimized task flow depends and their predicted execution durations calculated in step S501, and the critical path of this directed acyclic graph is calculated; the execution duration of the critical path is the predicted longest execution duration of the optimized task flow.
Preferably, in step S501, the N tasks similar to the current single task are found by means of a similarity measure. Let the current single task be $J_1$ and any historical task be $J_2$; their similarity is $\mathrm{sim}(J_1, J_2)$, and there is:
where $x_{a1}$ and $x_{a2}$ are the $a$-th features of the current single task $J_1$ and the historical task $J_2$ respectively, and $m$ is the total number of features.
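The nearest-neighbor prediction of step S501 can be sketched as below. The similarity formula is not reproduced in this text, so the sketch assumes a similarity of 1 / (1 + Euclidean distance) over the m = 3 features; that choice, the `HISTORY` records, and the function names are all assumptions for illustration. In practice the features would likely need to be scaled to comparable ranges first.

```python
import math

# Hypothetical history: (processors, memory in GB, data volume in GB, duration in s).
HISTORY = [
    (4, 8.0, 10.0, 100.0),
    (4, 8.0, 12.0, 120.0),
    (8, 16.0, 10.0, 60.0),
    (2, 4.0, 50.0, 900.0),
]


def similarity(x1, x2):
    """Assumed similarity: 1 / (1 + Euclidean distance) over the m features."""
    return 1.0 / (1.0 + math.dist(x1, x2))


def predict_duration(features, history, n=2):
    """Average the durations of the n historical tasks most similar to `features`."""
    if not 0 < n <= len(history):
        raise ValueError("n must be between 1 and the history size")
    ranked = sorted(history,
                    key=lambda rec: similarity(features, rec[:3]),
                    reverse=True)
    nearest = ranked[:n]
    return sum(rec[3] for rec in nearest) / n


# Predicted duration for a task requesting 4 processors, 8 GB memory, 11 GB of data.
PREDICTED = predict_duration((4, 8.0, 11.0), HISTORY, n=2)
```

Averaging the N most similar historical runs keeps the predictor simple and lets it improve as more execution records accumulate in the log.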
Preferably, in step S6, whether the optimized task flow can be completed by the preset time is judged according to a rule: if the rule is satisfied, the task flow is judged able to complete; otherwise it is judged unable to complete. The rule is given by the following formula:
$T_s + t_{max} + t_{cut} \le T_f$
where $T_s$ is the preset start time of the optimized task flow, $t_{max}$ is the longest execution duration of the optimized task flow obtained in step S5, $t_{cut}$ is a preset threshold, and $T_f$ is the preset completion time of the optimized task flow.
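The completion rule translates directly into a comparison of timestamps. The sketch below is illustrative; the function name `can_finish` and the sample times are assumptions, and the threshold is treated as a safety margin added to the predicted duration.

```python
from datetime import datetime, timedelta


def can_finish(t_start: datetime, t_max: timedelta,
               t_cut: timedelta, t_finish: datetime) -> bool:
    """Return True when T_s + t_max + t_cut <= T_f."""
    return t_start + t_max + t_cut <= t_finish


# Hypothetical schedule: start at 01:00, must finish by 03:00, 15-minute threshold.
T_S = datetime(2020, 12, 1, 1, 0)
T_F = datetime(2020, 12, 1, 3, 0)
T_CUT = timedelta(minutes=15)

ON_TIME = can_finish(T_S, timedelta(minutes=90), T_CUT, T_F)   # 02:45 <= 03:00
TOO_LATE = can_finish(T_S, timedelta(minutes=110), T_CUT, T_F)  # 03:05 > 03:00
```

A False result corresponds to the case in which step S6 raises an alarm.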
Another technical solution of the invention provides a device for optimizing the task flow of a data center station, which runs the above optimization method and comprises:
a data reading module, which reads data such as the task names of the single tasks on which the task flow depends and the dependency relationship table among the single tasks;
a calculation module, which calculates the execution duration of each single task and the critical path of the task flow;
a prediction module, which predicts the execution duration of each single task and the overall execution duration of the task flow;
and a monitoring module, which judges whether the task flow can be completed by the preset time and raises an alarm if it cannot.
The invention monitors the execution of the task flow and raises alarms, thereby supervising the task flow effectively. When the task flow is abnormal, the method not only raises an alarm but also accurately locates the key abnormal node on the task flow, predicts the execution duration of the optimized task flow, and judges whether the optimized task flow can be completed by a preset time, thereby optimizing the task flow. The invention can therefore optimize the task flow of the data center station and improve its execution efficiency, stability, and reliability.
Drawings
FIG. 1 is a flowchart of a method for optimizing the task flow of a data center station according to an embodiment of the invention;
FIG. 2 is a functional block diagram of a device for optimizing the task flow of a data center station according to an embodiment of the invention;
FIG. 3 is a diagram of a data warehouse task flow of a data center station according to an embodiment of the invention;
FIG. 4 is the directed acyclic graph corresponding to the data warehouse task flow of a data center station according to an embodiment of the invention;
FIG. 5 is a diagram of a machine learning task flow of a data center station according to an embodiment of the invention;
FIG. 6 is the directed acyclic graph corresponding to the machine learning task flow of a data center station according to an embodiment of the invention.
Detailed Description
The application will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Furthermore, it should be understood that various changes and modifications can be made by one skilled in the art after reading the teachings of the present application, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
Referring to FIG. 1, the method for optimizing the task flow of a data center station according to an embodiment of the invention comprises the following steps:
Step S1: reading data such as the task names of all single tasks on which the target task flow depends and the dependency relationship table among those single tasks;
Step S2: monitoring the execution result of the target task flow and calculating the execution durations of the single tasks on which the target task flow depends.
In this embodiment, the execution result of the target task flow is judged during its execution; the possible results are task completion, task error, and task timeout.
The execution duration of each single task on which the target task flow depends is calculated from the start time and end time of that task recorded in the log.
If the execution result of the target task flow is abnormal, the method proceeds to step S3; in this embodiment, the execution result is judged abnormal if it is task error or task timeout. Otherwise, that is, when the execution result is task completion, the actual execution durations of the single tasks on which the target task flow depends are recorded, each calculated from the start time and end time of the task recorded in the log.
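The three-way judgment of step S2 can be sketched as a small classifier. How an error or a timeout is detected is not detailed in the text, so the inputs used here (a success flag, the elapsed time, and a time limit) and all names are assumptions for illustration.

```python
from enum import Enum


class Result(Enum):
    COMPLETED = "task completion"
    ERROR = "task error"
    TIMEOUT = "task timeout"


def classify(exit_ok: bool, elapsed_s: float, limit_s: float) -> Result:
    """Judge an execution result from hypothetical monitoring inputs."""
    if not exit_ok:
        return Result.ERROR
    if elapsed_s > limit_s:
        return Result.TIMEOUT
    return Result.COMPLETED


def is_abnormal(result: Result) -> bool:
    """Task error and task timeout are abnormal and lead to step S3."""
    return result in (Result.ERROR, Result.TIMEOUT)
```

For example, `is_abnormal(classify(True, 75.0, 60.0))` is true because the task exceeded its limit, whereas a completed run within the limit would simply have its durations recorded.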
Step S3: for a target task flow whose execution result is abnormal, calculating the critical path of the target task flow and the longest execution duration of the whole target task flow from the execution durations of the single tasks on which it depends.
In this step, the calculation comprises the following steps:
Step S301: establishing a directed acyclic graph from the dependency relationships among the single tasks on which the task flow depends and the execution duration of each single task;
Step S302: calculating the critical path of the directed acyclic graph and taking it as the critical path of the target task flow; the execution duration of the critical path of the directed acyclic graph is the longest execution duration of the whole target task flow.
The directed acyclic graph is denoted $G = (V, E)$, and in step S301, establishing the directed acyclic graph $G = (V, E)$ comprises the following steps:
Each event in the target task flow is defined as a vertex of the directed acyclic graph $G$. The $i$-th vertex of $G$ is denoted $v_i$, and the vertices corresponding to all $n$ events in the target task flow form the vertex set $V = \{v_1, v_2, \ldots, v_n\}$.
Each single task on which the target task flow depends is defined as a directed edge of the directed acyclic graph $G$. $e_{ij}$ denotes the directed edge of $G$ pointing from vertex $v_i$ to vertex $v_j$, and the directed edges corresponding to all single tasks in the target task flow form the edge set $E = \{e_{ij} \mid (v_i, v_j)\}$. The weight of each directed edge is the execution duration of the corresponding single task; the weight of directed edge $e_{ij}$ is $c_{ij}$.
In step S302, calculating the critical path of the directed acyclic graph and the execution duration of the critical path comprises the following steps:
The earliest start time and the latest start time of each vertex are calculated on the directed acyclic graph $G$ established in step S301. If the earliest start time and the latest start time of the current vertex are equal, the current vertex is added to the bottleneck event set and the directed edge corresponding to the current vertex is added to the key node set CPL. That is, let the earliest start time of vertex $v_j$ be $ES_j$ and its latest start time be $LS_j$; if $ES_j = LS_j$, vertex $v_j$ is added to the bottleneck event set and the directed edge $e_{ij}$ corresponding to vertex $v_j$ is added to the key node set CPL. After all vertices of $G$ have been traversed, the resulting key node set CPL is the critical path of $G$. The sum of the execution durations of all single tasks in CPL is the execution duration of the critical path, namely the longest execution duration of the whole target task flow.
Step S4: determining the key abnormal node of the target task flow from the critical path and the longest execution duration of the whole target task flow. In the invention, a single task is defined as a node, and in step S4:
when the execution result is task error, the key fault node, i.e. the single task in which the error occurred, is found, an error alarm is raised, and the critical path of the target task flow and the key fault node are sent to the user;
when the execution result is task timeout, the key timeout node, i.e. the task with the longest execution duration in the key node set CPL, is found, a timeout alarm is raised, and the critical path of the task flow and the key timeout node are sent to the user.
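Locating the key abnormal node in step S4 can be sketched as follows. The timeout branch follows the text (the task with the longest execution duration in CPL). For the error branch the text does not say how the failing task is chosen when several have failed, so the sketch assumes CPL is listed in execution order and returns the first failed task on it; the function name and arguments are hypothetical.

```python
def key_abnormal_node(result, cpl, durations, failed=frozenset()):
    """Return the key fault node or key timeout node, or None if not found.

    result    -- "error" or "timeout"
    cpl       -- task names on the critical path, assumed to be in execution order
    durations -- task name -> execution duration
    failed    -- names of tasks that reported an error
    """
    if result == "error":
        # Assumption: the earliest failed task on the critical path is the key
        # fault node, since downstream failures follow from it.
        for task in cpl:
            if task in failed:
                return task
        return None
    if result == "timeout":
        # The task with the longest execution duration in CPL.
        return max(cpl, key=lambda task: durations[task])
    raise ValueError(f"unexpected result: {result!r}")
```

For a critical path of tasks A (30 s) and C (40 s), a timeout points at C; the returned node and the critical path together form the content of the alarm sent to the user.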
Step S5: forming an optimized task flow according to the key abnormal node, and predicting the longest execution duration of the optimized task flow. The key abnormal node is the key fault node or the key timeout node obtained in step S4.
The optimized task flow is formed as follows:
for a key fault node, the cause of the error is checked and the corresponding error is corrected, thereby forming the optimized task flow;
for a key timeout node, resources are reconfigured according to the data volume, the processor utilization, and the memory usage, thereby forming the optimized task flow.
In step S5, predicting the longest execution duration of the optimized task flow comprises the following:
A mathematical model is constructed from the historical execution records in the log, the model is used to predict the execution duration of each single task on which the optimized task flow depends, and the predicted execution duration of the task flow is calculated from the predicted execution durations of the single tasks. This comprises the following steps:
Step S501: according to the number of processors requested by the user for the task, the memory usage, and the data volume of the task, N historical tasks similar to the current single task are found in the records of historically executed tasks using the K nearest neighbor algorithm; the mean execution duration of these N historical tasks is calculated and used as the predicted execution duration of the corresponding single task.
In this embodiment, the N tasks similar to the current single task are found by means of a similarity measure. Let the current single task be $J_1$ and any historical task be $J_2$; their similarity is $\mathrm{sim}(J_1, J_2)$, and there is:
where $x_{a1}$ and $x_{a2}$ are the $a$-th features of the current single task $J_1$ and the historical task $J_2$ respectively, and $m$ is the total number of features.
Step S502: a directed acyclic graph is established from the dependency relationships among the single tasks on which the optimized task flow depends and their predicted execution durations calculated in step S501, and the critical path of this directed acyclic graph is calculated; the execution duration of the critical path is the predicted longest execution duration of the optimized task flow.
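If only the predicted longest execution duration is needed in step S502, a forward pass over the graph is sufficient. The sketch below assumes the graph is acyclic and uses hypothetical edges and predicted durations; in this example task C is imagined to have been given more resources, so its predicted duration is shorter than before.

```python
from functools import lru_cache

# Hypothetical optimized flow: edge (i, j) -> task name.
FLOW = {(1, 2): "A", (1, 3): "B", (2, 4): "C", (3, 4): "D"}
# Hypothetical predicted durations in seconds, as produced by step S501.
PREDICTED_S = {"A": 30.0, "B": 20.0, "C": 25.0, "D": 10.0}


def predicted_flow_duration(edges, predicted):
    """Longest path length over the DAG, using predicted per-task durations."""
    incoming = {}
    for (i, j), task in edges.items():
        incoming.setdefault(j, []).append((i, predicted[task]))

    @lru_cache(maxsize=None)
    def earliest(v):
        # An event with no incoming task starts at time 0.
        return max((earliest(i) + cost for i, cost in incoming.get(v, [])),
                   default=0.0)

    events = {v for edge in edges for v in edge}
    return max(earliest(v) for v in events)


T_MAX = predicted_flow_duration(FLOW, PREDICTED_S)
```

The resulting value plays the role of the longest execution duration that step S6 compares against the preset completion time.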
Step S6: judging whether the optimized task flow can be completed at a preset moment according to the predicted longest execution duration of the optimized task flow, and if not, giving an alarm; if the task flow can be completed, continuously monitoring the execution state of the task flow and recording the execution time of the task flow.
In this embodiment, determining whether the optimized task flow can be completed by the preset time further includes the following step:
The determination is made according to a rule: if the rule is satisfied, the task flow is judged able to complete on time; otherwise it is judged unable to. In this embodiment, the rule is the following inequality:
$T_s + t_{max} + t_{cut} \le T_f$
where $T_s$ is the preset start time of the optimized task flow, $t_{max}$ is the longest execution duration of the optimized task flow obtained in step S5, $t_{cut}$ is a preset threshold, and $T_f$ is the preset completion time of the optimized task flow.
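As an illustration of this rule, the check reduces to one comparison. The sketch below is hypothetical: the function name, the dates, and the use of minutes as the time unit are assumptions, since the text does not state units.

```python
from datetime import datetime, timedelta

def can_finish_on_time(start: datetime, predicted_max: timedelta,
                       margin: timedelta, deadline: datetime) -> bool:
    """Return True when T_s + t_max + t_cut <= T_f."""
    return start + predicted_max + margin <= deadline

# Hypothetical schedule: start at 01:00, must finish by 02:00.
t_s = datetime(2020, 12, 11, 1, 0)
t_f = datetime(2020, 12, 11, 2, 0)

# 53 minutes of predicted work plus a 5-minute threshold fits the window.
ok = can_finish_on_time(t_s, timedelta(minutes=53), timedelta(minutes=5), t_f)
# The same work with a 10-minute threshold does not, so an alarm would be raised.
late = can_finish_on_time(t_s, timedelta(minutes=53), timedelta(minutes=10), t_f)
print(ok, late)
```

The threshold $t_{cut}$ acts as a safety margin: a larger value makes the alarm fire earlier, at the cost of more false alarms.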
To achieve the above object, an embodiment of the present application further provides an optimization apparatus for a task flow of a data center station. Fig. 2 is a block diagram of the apparatus according to an embodiment of the present application. Referring to Fig. 2, the apparatus includes:
a data reading module 201, which reads data such as the task names of the single tasks on which the task flow depends and the dependency relationship table among those single tasks;
a calculation module 202, which calculates the execution duration of each single task and the critical path of the task flow;
a prediction module 203, which predicts the execution duration of each single task and the overall execution duration of the task flow;
a monitoring module 204, which determines whether the task flow can be completed by the preset time and raises an alarm if it cannot.
To further illustrate the optimization method for the task flow of the data center station in the embodiments of the invention, a data warehouse task flow and a machine learning task flow of the data center station are described below as examples.
The data warehouse task flow optimization method of the data center station in this embodiment of the invention specifically comprises the following steps:
Read data such as the task names of all single tasks on which the task flow depends and the dependency relationship table among those single tasks;
Monitor the execution result of the target task flow and calculate the execution duration of each single task on which the target task flow depends. In the data warehouse task flow of this embodiment, the dependency relationships among the single tasks and their execution durations are shown in Fig. 3, where the events are data tables: T_ODS is the original-layer data table, T_DWD1 to T_DWD3 are detail-layer data tables, T_DWS1 to T_DWS5 are wide tables of the aggregation layer, and T_ADS is the application-layer data table. $SQL_1$ to $SQL_{13}$ are the running nodes of the workflow, and the number preceding each $SQL_i$ is the execution time required by that SQL script.
According to the task flow, a weighted directed acyclic graph is built, as shown in Fig. 4. The data tables of each layer form the vertex set $V = \{v_1, v_2, \ldots, v_{10}\}$ of the directed acyclic graph; the single-task nodes $SQL_i$ of the task flow and their execution order form the set $E$ of directed edges $e_{ij}$; and the execution durations of the $SQL_i$ form the set $C$ of weights $c_{ij}$ of the directed edges.
The earliest start time $ES_i$ calculated for each vertex $v_i$ is:
$\{ES_1 = 0,\ ES_2 = 4,\ ES_3 = 6,\ ES_4 = 11,\ ES_5 = 15,\ ES_6 = 21,\ ES_7 = 18,\ ES_8 = 42,\ ES_9 = 26,\ ES_{10} = 53\}$
The latest start time $LS_i$ of each vertex $v_i$ is:
$\{LS_1 = 0,\ LS_2 = 27,\ LS_3 = 16,\ LS_4 = 11,\ LS_5 = 38,\ LS_6 = 21,\ LS_7 = 45,\ LS_8 = 42,\ LS_9 = 48,\ LS_{10} = 53\}$
Comparing the earliest start time $ES_i$ and the latest start time $LS_i$ of each vertex $v_i$ obtained above gives $ES_1 = LS_1$, $ES_4 = LS_4$, $ES_6 = LS_6$, $ES_8 = LS_8$ and $ES_{10} = LS_{10}$. The vertices $\{v_1, v_4, v_6, v_8, v_{10}\}$ therefore form the bottleneck event set, and the corresponding directed edges $\{e_{1,4}, e_{4,6}, e_{6,8}, e_{8,10}\}$ form the critical node set CPL. That is, the critical path of the target task flow in this embodiment consists of the task flow nodes $\{SQL_3, SQL_6, SQL_9, SQL_{12}\}$, and the longest execution duration of the entire task flow is 53.
According to Fig. 3, the single task with the longest execution duration on the critical path $\{SQL_3, SQL_6, SQL_9, SQL_{12}\}$ is $SQL_9$. The system issues an abnormality alarm and sends the user the critical path $\{SQL_3, SQL_6, SQL_9, SQL_{12}\}$ of the task flow and the key abnormal node $SQL_9$.
The user optimizes the task flow according to the key abnormal node and the specific circumstances, forming an optimized task flow. Based on the number of processors and the memory requested by the user for the optimized tasks and on the task data volume, the K-nearest-neighbor algorithm is applied to the historical execution data of the task flow to predict the execution duration of each single task of the optimized task flow, from which the predicted longest execution duration $t_{max}$ of the task flow is obtained.
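The numbers in this example can be cross-checked from the ES and LS values listed above. The sketch below relies on two assumptions stated here rather than in the patent: that on a critical path each edge weight equals the difference between the earliest start times of its endpoints, and that the edges map to $SQL_3$, $SQL_6$, $SQL_9$, $SQL_{12}$ in the order the text lists them.

```python
# Earliest and latest start times as listed in the embodiment (vertex -> time).
ES = {1: 0, 2: 4, 3: 6, 4: 11, 5: 15, 6: 21, 7: 18, 8: 42, 9: 26, 10: 53}
LS = {1: 0, 2: 27, 3: 16, 4: 11, 5: 38, 6: 21, 7: 45, 8: 42, 9: 48, 10: 53}

# Bottleneck events: vertices with zero slack, ordered by earliest start time.
bottleneck = sorted((v for v in ES if ES[v] == LS[v]), key=ES.get)

# Consecutive bottleneck vertices give the critical edges e_14, e_46, e_68, e_8,10.
path_edges = list(zip(bottleneck, bottleneck[1:]))

# Assumption: on the critical path, edge weight = ES[head] - ES[tail].
durations = {(u, v): ES[v] - ES[u] for u, v in path_edges}
total = sum(durations.values())
slowest = max(durations, key=durations.get)

print(bottleneck)   # vertices on the critical path
print(durations)    # duration of each critical task
print(total, slowest)
```

The derived durations are 11, 10, 21 and 11, which sum to 53, and the longest one sits on edge $e_{6,8}$, consistent with $SQL_9$ being reported as the key abnormal node.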
Whether the optimized task flow can be completed on time is determined by the following inequality:
$T_s + t_{max} + t_{cut} \le T_f$
where $T_s$ is the preset start time of the task flow, $t_{max}$ is the predicted longest execution duration obtained above, $t_{cut}$ is a threshold, and $T_f$ is the preset completion time of the task flow.
If, according to this rule, the optimized task flow cannot be completed on time, an alarm is raised. If it can be completed, the execution state of the task flow continues to be monitored and its execution duration is recorded.
The machine learning task flow optimization method of the data center station in another embodiment of the invention specifically comprises the following steps:
Read data such as the task names in the task flow and the dependency relationship table among the single tasks of the task flow;
Monitor the execution result of the target task flow and calculate the execution duration of each single task on which the target task flow depends. The dependency relationships among the single tasks of the machine learning task flow of this embodiment and their execution durations are shown in Fig. 5.
A directed acyclic graph is constructed from the machine learning workflow described above, as shown in Fig. 6. The data tables of each layer form the vertex set $V = \{v_1, v_2, \ldots, v_{15}\}$ of the directed acyclic graph; the single-task nodes {load dataset i, data cleaning operator i, data merge i, feature encoding operator i, machine learning operator i, calculate final result} of the task flow and their execution order form the set $E$ of directed edges $e_{ij}$; and the execution duration of each single-task node forms the set $C$ of weights $c_{ij}$ of the directed edges.
The earliest start time $ES_i$ calculated for each vertex $v_i$ is:
$\{ES_1 = 0,\ ES_2 = 0,\ ES_3 = 3,\ ES_4 = 4,\ ES_5 = 9,\ ES_6 = 11,\ ES_7 = 13,\ ES_8 = 21,\ ES_9 = 20,\ ES_{10} = 22,\ ES_{11} = 19,\ ES_{12} = 25,\ ES_{13} = 36,\ ES_{14} = 45,\ ES_{15} = 49\}$
The latest start time $LS_i$ calculated for each vertex $v_i$ is:
$\{LS_1 = 2,\ LS_2 = 0,\ LS_3 = 5,\ LS_4 = 4,\ LS_5 = 11,\ LS_6 = 11,\ LS_7 = 13,\ LS_8 = 22,\ LS_9 = 22,\ LS_{10} = 22,\ LS_{11} = 22,\ LS_{12} = 25,\ LS_{13} = 45,\ LS_{14} = 45,\ LS_{15} = 49\}$
Comparing the earliest start time $ES_i$ and the latest start time $LS_i$ of each vertex $v_i$ obtained above gives $ES_2 = LS_2$, $ES_4 = LS_4$, $ES_6 = LS_6$, $ES_7 = LS_7$, $ES_{10} = LS_{10}$, $ES_{12} = LS_{12}$, $ES_{14} = LS_{14}$ and $ES_{15} = LS_{15}$. The vertices $\{v_2, v_4, v_6, v_7, v_{10}, v_{12}, v_{14}, v_{15}\}$ therefore form the bottleneck event set, and the corresponding directed edges $\{e_{2,4}, e_{4,6}, e_{6,7}, e_{7,10}, e_{10,12}, e_{12,14}, e_{14,15}\}$ form the critical node set CPL. That is, the critical path of this embodiment consists of the task flow nodes {load dataset 1, data cleaning operator 2, data merge 1, feature encoding operator 3, data merge 2, machine learning operator 2, calculate final result}. The longest execution duration of the entire task flow is 49.
According to Fig. 5, the single task with the longest execution duration on the critical path {load dataset 1, data cleaning operator 2, data merge 1, feature encoding operator 3, data merge 2, machine learning operator 2, calculate final result} of this embodiment is machine learning operator 2. The system issues an abnormality alarm and sends the user the critical path {load dataset 1, data cleaning operator 2, data merge 1, feature encoding operator 3, data merge 2, machine learning operator 2, calculate final result} of the task flow and the key abnormal node {machine learning operator 2}.
The user optimizes the task flow according to the key abnormal node and the specific circumstances, forming an optimized task flow. Based on the number of processors and the memory requested by the user for the optimized tasks and on the task data volume, the K-nearest-neighbor algorithm is applied to the historical execution data of the task flow to predict the execution duration of each single task of the optimized task flow, from which the predicted longest execution duration $t_{max}$ of the task flow is obtained.
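The result for this embodiment can be checked by hand from the listed earliest start times, under the assumption (not stated in the patent) that each weight on the critical path equals the difference between the earliest start times of its two endpoints:

```latex
\begin{aligned}
c_{2,4}   &= ES_4 - ES_2 = 4 - 0 = 4 \\
c_{4,6}   &= ES_6 - ES_4 = 11 - 4 = 7 \\
c_{6,7}   &= ES_7 - ES_6 = 13 - 11 = 2 \\
c_{7,10}  &= ES_{10} - ES_7 = 22 - 13 = 9 \\
c_{10,12} &= ES_{12} - ES_{10} = 25 - 22 = 3 \\
c_{12,14} &= ES_{14} - ES_{12} = 45 - 25 = 20 \\
c_{14,15} &= ES_{15} - ES_{14} = 49 - 45 = 4 \\
\sum c_{ij} &= 4 + 7 + 2 + 9 + 3 + 20 + 4 = 49
\end{aligned}
```

The largest term, 20, lies on edge $e_{12,14}$, the sixth edge of the path, which matches the sixth listed task, machine learning operator 2.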
Whether the optimized task flow can be completed on time is determined by the following inequality:
$T_s + t_{max} + t_{cut} \le T_f$
where $T_s$ is the preset start time of the task flow, $t_{max}$ is the predicted longest execution duration obtained above, $t_{cut}$ is a threshold, and $T_f$ is the preset completion time of the task flow.
If, according to this rule, the optimized task flow cannot be completed on time, an alarm is raised. If it can be completed, the execution state of the task flow continues to be monitored and its execution duration is recorded.
Compared with the prior art, the method and apparatus for optimizing the task flow of a data center station have the following beneficial effects:
The invention monitors the execution of the task flow and raises alarms, thereby effectively supervising the task flow. When the task flow is abnormal, the method not only raises an alarm but also accurately locates the key abnormal node on the task flow, predicts the execution duration of the optimized task flow, and determines whether the optimized task flow can be completed within the preset time, thereby optimizing the task flow. The invention can therefore optimize the task flow of the data center station, improve its execution efficiency, and improve its stability and reliability.

Claims (9)

1. A method for optimizing a task flow of a data center station, characterized by comprising the following steps:
Step S1: reading at least the task names of all single tasks on which the target task flow depends and the dependency relationship table among those single tasks;
Step S2: monitoring the execution of the target task flow and judging its execution result during execution, the result being one of: task completed, task error, or task timeout; when the execution result is task error or task timeout, judging the execution result of the target task flow to be abnormal and proceeding to step S3; when the execution result is task completed, recording the execution durations of the single tasks on which the target task flow depends and returning to step S2;
Step S3: for a target task flow whose execution result is abnormal, calculating the critical path of the target task flow and the longest execution duration of the whole target task flow from the execution durations of the single tasks on which the target task flow depends, comprising the following steps:
Step S301: building a directed acyclic graph from the dependency relationships among the single tasks on which the task flow depends and the execution duration of each single task;
Step S302: calculating the critical path of the directed acyclic graph as the critical path of the target task flow, the execution duration of the critical path of the directed acyclic graph being the longest execution duration of the whole target task flow;
Step S4: determining the key abnormal node of the target task flow from the critical path and the longest execution duration of the whole target task flow, a single task being defined as a node, wherein in step S4:
when the execution result is task error, the key fault node, at which a single task has failed, is located, an error alarm is raised, and the critical path of the target task flow and the key fault node are sent to the user;
when the execution result is task timeout, the key timeout node, at which a single task has the longest execution duration, is located, the task with the longest execution duration in the critical node set CPL being the key timeout node; a timeout alarm is raised, and the critical path of the task flow and the key timeout node are sent to the user;
Step S5: forming an optimized task flow according to the key abnormal node, and predicting the longest execution duration of the optimized task flow, wherein:
the key abnormal node is the key fault node or the key timeout node obtained in step S4;
predicting the longest execution duration of the optimized task flow comprises: constructing a mathematical model from the historical execution records in the log, predicting with the mathematical model the execution duration of each single task on which the optimized task flow depends, and calculating the predicted execution duration of the task flow from the predicted execution durations of the single tasks;
Step S6: determining, from the predicted longest execution duration of the optimized task flow, whether the optimized task flow can be completed by the preset time; if it cannot, raising an alarm; if it can, continuing to monitor the execution state of the task flow and recording its execution duration.
2. The method for optimizing a task flow of a data center station according to claim 1, wherein in step S2 and step S3 the execution duration of a single task is calculated from the start time and the end time of that single task recorded in the log.
3. The method for optimizing a task flow of a data center station according to claim 1, wherein the directed acyclic graph is denoted $G = (V, E)$, and building the directed acyclic graph $G = (V, E)$ in step S301 further comprises the following steps:
defining each event in the target task flow as a vertex of the directed acyclic graph $G$, the $i$-th vertex of $G$ being denoted $v_i$, so that the vertices corresponding to all $n$ events in the target task flow form the vertex set $V = \{v_1, v_2, \ldots, v_n\}$;
defining each single task on which the target task flow depends as a directed edge of the directed acyclic graph $G$, $e_{ij}$ denoting the directed edge from vertex $v_i$ to vertex $v_j$ in $G$, so that the directed edges corresponding to all single tasks in the target task flow form the edge set $E = \{e_{ij} \mid (v_i, v_j)\}$; the weight of each directed edge is the execution duration of the corresponding single task, the weight of directed edge $e_{ij}$ being $c_{ij}$.
4. The method for optimizing a task flow of a data center station according to claim 3, wherein in step S302, calculating the critical path of the directed acyclic graph and the execution duration of the critical path comprises the following steps:
calculating the earliest start time and the latest start time of each vertex of the directed acyclic graph $G$ built in step S301; if the earliest start time and the latest start time of the current vertex are equal, adding the current vertex to the bottleneck event set and adding the directed edge corresponding to the current vertex to the critical node set CPL; that is, letting the earliest start time of vertex $v_j$ be $ES_j$ and its latest start time be $LS_j$, if $ES_j$ equals $LS_j$, vertex $v_j$ is added to the bottleneck event set and the directed edge $e_{ij}$ corresponding to vertex $v_j$ is added to the critical node set CPL;
after all vertices of the directed acyclic graph $G$ have been traversed, the resulting critical node set CPL is the critical path of the directed acyclic graph $G$; the sum of the execution durations of all single tasks in the critical node set CPL is the execution duration of the critical path, that is, the longest execution duration of the whole target task flow.
5. The method for optimizing a task flow of a data center station according to claim 1, wherein in step S5, when the optimized task flow is formed:
for the key fault node, the cause of the error is examined and the corresponding error is corrected, thereby forming the optimized task flow;
for the key timeout node, resources are reconfigured according to the data volume, the processor utilization and the memory usage, thereby forming the optimized task flow.
6. The method for optimizing a task flow of a data center station according to claim 5, wherein in step S5, predicting the longest execution duration of the optimized task flow specifically comprises the following steps:
Step S501: based on the number of processors and the memory the user has requested for the task, and on the data volume of the task, finding with a K-nearest-neighbor algorithm the N historical tasks most similar to the current single task in the records of historically executed tasks, calculating the mean execution duration of the N historical tasks, and using that mean as the predicted execution duration of the corresponding single task;
Step S502: building a directed acyclic graph from the dependency relationships among the single tasks on which the optimized task flow depends and from their predicted execution durations calculated in step S501, and calculating the critical path of the directed acyclic graph, the execution duration of the critical path being the predicted execution duration of the optimized task flow.
7. The method for optimizing a task flow of a data center station according to claim 6, wherein in step S501 the N tasks similar to the current single task are found by similarity; letting the current single task be $J_1$ and any historical task be $J_2$, their similarity $\mathrm{sim}(J_1, J_2)$ is given by:
[similarity formula not reproduced in the source text]
where $x_{a1}$ and $x_{a2}$ are the $a$-th features of the current single task $J_1$ and the historical task $J_2$, respectively, and $m$ is the total number of features.
8. The method for optimizing a task flow of a data center station according to claim 1, wherein in step S6 whether the optimized task flow can be completed by the preset time is determined according to a rule: if the rule is satisfied, the task flow is judged able to complete; otherwise it is judged unable to complete; the rule being the following inequality:
$T_s + t_{max} + t_{cut} \le T_f$
where $T_s$ is the preset start time of the optimized task flow, $t_{max}$ is the longest execution duration of the optimized task flow obtained in step S5, $t_{cut}$ is a preset threshold, and $T_f$ is the preset completion time of the optimized task flow.
9. An optimization apparatus for a task flow of a data center station, configured to execute the optimization method according to claim 1, the optimization apparatus comprising:
a data reading module, which reads the task names of all single tasks on which the task flow depends and the dependency relationship table data among the single tasks;
a calculation module, which calculates the execution duration of each single task and the critical path of the task flow;
a prediction module, which predicts the execution duration of each single task and the overall execution duration of the task flow;
a monitoring module, which determines whether the task flow can be completed by the preset time and raises an alarm if it cannot.
CN202011448500.3A 2020-12-11 2020-12-11 Optimization method and device for task flow of data center station Active CN112559287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011448500.3A CN112559287B (en) 2020-12-11 2020-12-11 Optimization method and device for task flow of data center station

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011448500.3A CN112559287B (en) 2020-12-11 2020-12-11 Optimization method and device for task flow of data center station

Publications (2)

Publication Number Publication Date
CN112559287A CN112559287A (en) 2021-03-26
CN112559287B true CN112559287B (en) 2024-08-06

Family

ID=75062681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011448500.3A Active CN112559287B (en) 2020-12-11 2020-12-11 Optimization method and device for task flow of data center station

Country Status (1)

Country Link
CN (1) CN112559287B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220542B (en) * 2021-04-01 2022-10-28 深圳市云网万店科技有限公司 Early warning method and device for computing task, computer equipment and storage medium
CN113434323A (en) * 2021-06-28 2021-09-24 浙江大华技术股份有限公司 Task flow control method of data center station and related device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107036618A (en) * 2017-05-24 2017-08-11 合肥工业大学(马鞍山)高新技术研究院 A kind of AGV paths planning methods based on shortest path depth optimization algorithm
CN110188792A (en) * 2019-04-18 2019-08-30 万达信息股份有限公司 The characteristics of image acquisition methods of prostate MRI 3-D image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837457B (en) * 2019-11-19 2022-05-13 支付宝(杭州)信息技术有限公司 Task management method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112559287A (en) 2021-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant