CN107241205A - abnormality monitoring method and device - Google Patents


Info

Publication number
CN107241205A
Authority
CN
China
Prior art keywords
task
abnormal
time
benchmark
rerunning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610184288.1A
Other languages
Chinese (zh)
Inventor
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610184288.1A (CN107241205A)
Priority to TW106105604A (TW201737084A)
Priority to PCT/CN2017/076891 (WO2017167021A1)
Publication of CN107241205A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

This application provides an abnormality monitoring method and device. The abnormality monitoring method includes: determining an abnormal task in a task scheduling system according to a benchmark task set in advance in the task scheduling system; determining the latest start time for rerunning the abnormal task according to the preset benchmark completion time of the benchmark task; and performing alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time. The application makes alarming on abnormal tasks more flexible, reduces the probability of late or unnecessary alarms, and improves alarm accuracy.

Description

Abnormality monitoring method and device
Technical field
This application relates to communication technology, and in particular to an abnormality monitoring method and device.
Background
In the big data era, data is analyzed and used ever more widely. However, because data volumes are large and the collection process is complex, instability and errors are unavoidable; in distributed systems in particular, failures and retries are hard to avoid. When a problem occurs, a timely warning, or even an advance warning, can greatly reduce the losses caused by data errors.
In a task scheduling system, a task monitoring scheme is generally used so that abnormal tasks can be discovered in time. In prior-art task monitoring schemes, the user essentially has to configure a large amount of information, including the alarm trigger condition, alarm time, alarm object and alarm mode. The running of tasks is monitored based on this configuration, and when a task that meets the alarm trigger condition is found, an alarm is sent to the configured alarm object, in the configured alarm mode, at the configured alarm time. In this approach the alarm time is configured in advance, so flexibility is poor and alarms tend to be either late or unnecessary, which results in poor alarm accuracy.
Summary of the invention
This application provides an abnormality monitoring method and device, in order to make alarming on abnormal tasks more flexible, reduce the probability of late or unnecessary alarms, and improve alarm accuracy.
To achieve the above purpose, the embodiments of this application adopt the following technical solutions:
In a first aspect, an abnormality monitoring method is provided, including:
determining an abnormal task in a task scheduling system according to a benchmark task set in advance in the task scheduling system;
determining the latest start time for rerunning the abnormal task according to the preset benchmark completion time of the benchmark task;
performing alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time.
In a second aspect, an abnormality monitoring device is provided, including:
an abnormal task determining module, configured to determine an abnormal task in a task scheduling system according to a benchmark task set in advance in the task scheduling system;
a latest time determining module, configured to determine the latest start time for rerunning the abnormal task according to the preset benchmark completion time of the benchmark task;
an alarm processing module, configured to perform alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time.
As can be seen from the above technical solutions, this application presets a benchmark task in the task scheduling system together with its benchmark completion time. During task scheduling, the abnormal task is determined according to the benchmark task, the latest start time for rerunning the abnormal task is then determined according to the benchmark completion time of the benchmark task, and alarm processing is performed on the abnormal task according to that latest start time and the current time. Unlike the prior art, alarm processing does not have to wait until a preconfigured alarm time arrives. This is more flexible, helps reduce the probability of late or unnecessary alarms, and improves alarm accuracy.
The above is only an overview of the technical solutions of this application. Specific embodiments of the application are set out below so that its technical means can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the application become more apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the detailed description of the preferred embodiments below. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of this application. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 is a schematic flowchart of the abnormality monitoring method provided by one embodiment of this application;
Fig. 2 is a schematic diagram of task dependencies in a task scheduling system provided by another embodiment of this application;
Fig. 3 is a schematic diagram of task dependencies in a task scheduling system provided by yet another embodiment of this application;
Fig. 4 is a schematic structural diagram of the abnormality monitoring device provided by yet another embodiment of this application;
Fig. 5 is a schematic structural diagram of the abnormality monitoring device provided by yet another embodiment of this application.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
A task scheduling system is a system that schedules and executes a series of instructions or tasks in a manner and at times set in advance. In existing task scheduling systems, a task monitoring scheme is generally used so that abnormal tasks can be discovered in time. In existing task monitoring schemes, the user essentially has to configure a large amount of information, including the alarm trigger condition, alarm time, alarm object and alarm mode. The running of tasks is monitored based on this configuration, and when a task that meets the alarm trigger condition is found, an alarm is sent to the configured alarm object, in the configured alarm mode, at the configured alarm time. In this approach the alarm time is configured in advance, so flexibility is poor and alarms tend to be either late or unnecessary, which results in poor alarm accuracy.
To address this technical problem, this application provides a solution whose main principle is as follows: a benchmark task in the task scheduling system and its benchmark completion time are configured in advance; the abnormal task is determined according to the benchmark task; the latest start time for rerunning the abnormal task is determined according to the benchmark completion time of the benchmark task; and alarm processing is then performed on the abnormal task according to that latest start time and the current time. Alarm processing no longer has to wait for a preconfigured alarm time as in the prior art. This is more flexible, helps reduce the probability of late or unnecessary alarms, and improves alarm accuracy.
It is worth noting that the technical solution provided by this application applies to task scheduling systems, and is preferably, but not exclusively, applied to offline task scheduling systems used in data warehouse development. The tasks scheduled in an offline task scheduling system are offline tasks. In contrast to online or real-time tasks, an offline task is one whose result is not applied to the online business system immediately; instead, the data obtained after a series of asynchronous processing steps flows back into the online business system.
The following embodiments of this application are described using an offline task scheduling system as an example. For those skilled in the art, however, it is easy to apply the technical solution to an online task scheduling system on the basis of the technical teaching of these embodiments.
The technical solution is described in detail below with reference to the embodiments and the drawings.
Fig. 1 is a schematic flowchart of the abnormality monitoring method provided by one embodiment of this application. As shown in Fig. 1, the method includes:
101. Determine the abnormal task in the task scheduling system according to a benchmark task set in advance in the task scheduling system.
102. Determine the latest start time for rerunning the abnormal task according to the preset benchmark completion time of the benchmark task.
103. Perform alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time.
This embodiment provides an abnormality monitoring method that can be performed by an abnormality monitoring device. It is used to perform alarm processing on abnormal tasks more flexibly, reduce the probability of late or unnecessary alarms, and improve alarm accuracy.
In a task scheduling system, tasks have upstream and downstream dependencies: a downstream task can only be executed after its upstream tasks have finished. Fig. 2 shows an example of the dependencies between tasks in a task scheduling system. The task scheduling system shown in Fig. 2 includes task A, task B, task C, task D, task E and task F. Task B and task C depend on task A, so task A is the upstream task of task B and task C, and task B and task C are downstream tasks of task A. Similarly, task F depends on task A and task C, so task A and task C are upstream tasks of task F, and task F is a downstream task of task A and task C. Task D and task E depend on task A and task B, so task A and task B are upstream tasks of task D and task E, and task D and task E are downstream tasks of task A and task B.
It is worth noting that the upstream and downstream relationships shown in Fig. 2 include both direct and indirect ones. For example, task A is the direct upstream task of task B and task C, and task B and task C are direct downstream tasks of task A; task A is an indirect upstream task of task D, task E and task F, and task D, task E and task F are indirect downstream tasks of task A. The embodiments of this application do not distinguish between direct and indirect upstream and downstream tasks.
Because the tasks in a task scheduling system have upstream and downstream dependencies, this embodiment presets a benchmark task in the task scheduling system together with its benchmark completion time. The benchmark task and its benchmark completion time serve as the baseline for abnormality monitoring, and abnormal-task monitoring and alarm processing are carried out against this baseline.
The benchmark completion time of a benchmark task is the latest completion time of that task. In other words, the benchmark task must be completed before the benchmark completion time; otherwise there may be serious negative consequences, for example the whole task scheduling system may report errors, or the normal operation of an online business system that depends on the task scheduling system may be affected.
Optionally, the benchmark task can be determined according to the importance of each task in the task scheduling system, for example by taking a task whose importance meets a certain condition (such as the most important task) as the benchmark task. Alternatively, the benchmark task can be determined according to the dependencies between the tasks in the task scheduling system, for example by taking a task whose number of upstream tasks and number of downstream tasks both meet a certain condition (such as being the largest, or exceeding a specified number) as the benchmark task. If a task has many upstream and downstream tasks, it is relatively central and has a wide impact, so it is necessary to ensure that it completes before its latest completion time; setting it as a benchmark task therefore helps ensure that more tasks run on time.
Accordingly, once the benchmark task has been determined, its benchmark completion time can be determined according to how the benchmark task is used. For example, if the online business system needs to use the data computed by the benchmark task at 9:00 every morning, the benchmark completion time of the benchmark task can be set to 9:00, which means that the benchmark task must be completed before 9:00 every day. As another example, if the relevant personnel need to view a report generated from the data computed by the benchmark task at 10:00 every morning, the benchmark completion time of the benchmark task can be set to 10:00, which means that the benchmark task must be completed before 10:00.
Note that this embodiment does not limit the number of benchmark tasks; there can be one or several. When there are several benchmark tasks, different benchmark completion times can be set for different benchmark tasks, or the same benchmark completion time can be set. As shown in Fig. 2, task D and task E, drawn in boxes, are set as benchmark tasks. Both of them need to be completed before 6:00 in the morning, so the same benchmark completion time, 6:00, can be set for both.
After the benchmark task and its benchmark completion time have been set, the abnormal task in the task scheduling system can be determined according to the dependencies between the benchmark task and the other tasks in the task scheduling system.
In an optional embodiment, the abnormality monitoring device can determine, according to the dependencies between the benchmark task and the other tasks in the task scheduling system, the tasks that have a dependency relationship with the benchmark task and take them as tasks to be monitored. It then monitors the running of the tasks to be monitored and takes any task to be monitored whose running state is abnormal as an abnormal task.
Further, the tasks that have a dependency relationship with the benchmark task include both its upstream tasks and its downstream tasks. However, it is the upstream tasks that directly affect the start time and completion time of the benchmark task, while the downstream tasks have relatively little influence on it, so the downstream tasks can be ignored. On this basis, the abnormality monitoring device can take the tasks in the task scheduling system that the benchmark task depends on as the tasks to be monitored, monitor their running, and take any task to be monitored whose running state is abnormal as an abnormal task. In this embodiment the number of tasks to be monitored is relatively small, which saves the resources consumed by monitoring and makes discovering abnormal tasks more efficient. In addition, only the benchmark task has to be preset: the abnormality monitoring device can derive all upstream tasks of the benchmark task from the dependencies between tasks and then monitor them automatically, instead of requiring a trigger condition, alarm time and so on to be configured for every upstream task as in the prior art. This has the advantages of less configuration and a wider monitoring range, and is particularly suitable for task scheduling systems with a large number of tasks.
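The derivation of the tasks to be monitored from a benchmark task can be sketched as follows. This is a minimal illustration rather than the device's actual implementation: the `DEPENDS_ON` dictionary is a hypothetical encoding of the Fig. 2 example, and `tasks_to_monitor` is an illustrative name.

```python
# Hypothetical encoding of the Fig. 2 example: task -> tasks it directly depends on.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "E": ["B"],
    "F": ["C"],
}

def tasks_to_monitor(benchmark_tasks, depends_on):
    """Collect every direct or indirect upstream task of the benchmark tasks."""
    monitored = set()
    stack = [up for task in benchmark_tasks for up in depends_on[task]]
    while stack:
        task = stack.pop()
        if task not in monitored:
            monitored.add(task)
            # Walk further up the dependency chain.
            stack.extend(depends_on[task])
    return monitored
```

With task D and task E as the benchmark tasks, only task A and task B are monitored; task C and task F are skipped because no benchmark task depends on them.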
In the above process of obtaining the abnormal task, an abnormal task is a task to be monitored whose running state is abnormal. An abnormal running state is defined relative to a normal running state.
In an optional embodiment, a normal state condition that represents a normal running state can be preset. On this basis, the running of a task to be monitored can be monitored to judge whether its running state meets the normal state condition. If it does, the running state of the task to be monitored is determined to be normal; if it does not, the running state of the task to be monitored is determined to be abnormal and the task is taken as an abnormal task. Alternatively,
in another optional embodiment, an abnormal state condition that represents an abnormal running state can be preset. On this basis, the running of a task to be monitored can be monitored to judge whether its running state meets the abnormal state condition. If it does not, the running state of the task to be monitored is determined to be normal; if it does, the running state of the task to be monitored is determined to be abnormal and the task is taken as an abnormal task.
Of course, in other optional embodiments, a normal state condition representing a normal running state and an abnormal state condition representing an abnormal running state can both be set.
Further optionally, the above abnormal state condition includes at least one of the following:
Run error: a task whose run has failed is an abnormal task;
Slow running speed: a task whose running speed has slowed down is an abnormal task.
Based on the above abnormal state conditions, the abnormality monitoring device can obtain the abnormal task through at least one of the following operations:
obtaining, from the tasks to be monitored, a task whose run has failed as the abnormal task; and
obtaining, from the tasks to be monitored, a task whose running speed has slowed down as the abnormal task.
Further, whether a task's running speed has slowed down can be determined from the task's running duration. Specifically, the abnormality monitoring device can take a task to be monitored whose running duration meets a specified duration condition as a task whose running speed has slowed down, that is, as an abnormal task.
Optionally, the specified duration condition includes, but is not limited to, at least one of the following conditions:
Exceeding a preset duration threshold: the running duration of the task to be monitored must exceed the preset duration threshold for the task to be considered slow;
Exceeding the average running duration in a specified period by a specified ratio: the running duration of the task to be monitored must exceed its average running duration in the specified period by the specified ratio for the task to be considered slow.
The duration threshold can be set to suit the application scenario and the task attributes, and can be, for example, 1 hour, 30 minutes or 2 hours. Likewise, the specified period and the specified ratio can be set to suit the application scenario and the task attributes. For example, the specified period can be 10 days, 15 days or 1 month, and the specified ratio can be 30%, 20% or 15%, or even a range such as 15% to 30%.
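The two duration conditions can be sketched as a single check. This is an illustrative sketch under assumptions: the function name `is_slow`, its parameters, and the choice to treat a task as slow when any configured condition is met are not taken from the source.

```python
from statistics import mean

def is_slow(run_minutes, history_minutes, max_minutes=None, extra_ratio=None):
    """Return True if the run meets any configured duration condition.

    run_minutes     -- running duration of the current run, in minutes
    history_minutes -- running durations within the specified period
    max_minutes     -- preset duration threshold (None disables this condition)
    extra_ratio     -- specified ratio, e.g. 0.3 for 30% (None disables it)
    """
    # Condition 1: the run exceeds the preset duration threshold.
    if max_minutes is not None and run_minutes > max_minutes:
        return True
    # Condition 2: the run exceeds the historical average by the given ratio.
    if extra_ratio is not None and history_minutes:
        if run_minutes > mean(history_minutes) * (1 + extra_ratio):
            return True
    return False
```

For example, with a history of 40, 50 and 60 minutes the average is 50 minutes, so a 66-minute run exceeds a 30% ratio, while a 55-minute run does not exceed a 20% ratio.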
Through the above operations, the abnormal task in the task scheduling system can be determined. An abnormal task is a task in which an abnormality has occurred, so it needs to be rerun. Furthermore, because the benchmark task depends on the abnormal task and the benchmark task must be completed before its benchmark completion time, the abnormal task cannot be rerun at an arbitrary time: the rerun must start before some latest time so that the benchmark task that depends on the abnormal task can still be completed before the benchmark completion time. On this basis, the abnormality monitoring device can determine the latest start time for rerunning the abnormal task according to the preset benchmark completion time of the benchmark task.
Specifically, the abnormality monitoring device can work backwards from the dependency between the benchmark task and the abnormal task, the benchmark completion time of the benchmark task, the average running duration of the benchmark task and the average running duration of the abnormal task, and thereby determine the latest start time for rerunning the abnormal task.
As an example, suppose that the tasks in a task scheduling system and the dependencies between them are as shown in Fig. 3. The task scheduling system includes task A, task B, task C, task D, task E and task F. Task B is a direct downstream task of task A; task C, task D and task E are each direct downstream tasks of task B; and task F is a direct downstream task of task E. In addition, in the task scheduling system shown in Fig. 3, task C and task D are set as one group of benchmark tasks with a benchmark completion time of 6:00, which means that task C and task D both need to be completed before 6:00. Task E and task F are set as another group of benchmark tasks with a benchmark completion time of 5:00, which means that task E and task F both need to be completed before 5:00.
In addition to the above, the average running duration of each task is also known: the average running duration of task E is 0.5 hours, that of task F is 20 minutes, that of task C is 1.5 hours, that of task D is 2 hours, that of task B is 2 hours, and that of task A is 10 minutes.
Suppose task A is found to be an abnormal task. The abnormality monitoring device can then use the above information to work backwards along the dependencies, starting from the benchmark tasks. It first determines the latest completion time of task B, the downstream task of abnormal task A, and then determines the latest start time for rerunning abnormal task A from the latest completion time of task B.
Specifically, for task E and task F: both must be completed before the benchmark completion time, so the latest start time of task E followed by task F is the benchmark completion time of task E and task F minus the average running durations of task E and task F, that is, 5:00 - 20 minutes - 0.5 hours = 4:10. This latest start time is also the latest completion time of task B as calculated from task E and task F, namely 4:10;
For task C: task C must be completed before the benchmark completion time, so the latest start time of task C is the benchmark completion time of task C minus the average running duration of task C, that is, 6:00 - 1.5 hours = 4:30. The latest start time of task C is also the latest completion time of task B as calculated from task C, namely 4:30;
For task D: task D must be completed before the benchmark completion time, so the latest start time of task D is the benchmark completion time of task D minus the average running duration of task D, that is, 6:00 - 2 hours = 4:00. The latest start time of task D is also the latest completion time of task B as calculated from task D, namely 4:00;
Taking the earliest of these three values, the latest completion time of task B is 4:00;
Then, because task B needs to be completed before 4:00, the latest start time of task B is the latest completion time of task B minus the average running duration of task B, that is, 4:00 - 2 hours = 2:00. The latest start time of task B is also the latest completion time of task A;
Because task A needs to be completed before 2:00, the latest start time of task A is the latest completion time of task A minus the average running duration of task A, that is, 2:00 - 10 minutes = 1:50.
Of course, if the current time is known, the time margin of task A can also be calculated, that is, the difference between the latest start time of task A and the current time. For example, if the current time is 1:00, the time margin of task A is 50 minutes.
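The backward calculation above can be reproduced with a short recursive sketch. The dictionaries below encode the Fig. 3 example, with times expressed as minutes after midnight; the function names and the data layout are illustrative assumptions, not part of the source.

```python
# Fig. 3 example: task -> direct downstream tasks.
DOWNSTREAM = {"A": ["B"], "B": ["C", "D", "E"], "C": [], "D": [], "E": ["F"], "F": []}
# Average running durations in minutes.
AVG_MINUTES = {"A": 10, "B": 120, "C": 90, "D": 120, "E": 30, "F": 20}
# Benchmark completion times in minutes after midnight (6:00 and 5:00).
DEADLINE = {"C": 360, "D": 360, "E": 300, "F": 300}

def latest_finish(task):
    """Earliest of the task's own deadline and its downstream latest starts."""
    candidates = [latest_start(d) for d in DOWNSTREAM[task]]
    if task in DEADLINE:
        candidates.append(DEADLINE[task])
    # Assumes every task without downstream tasks is a benchmark task.
    return min(candidates)

def latest_start(task):
    """Latest completion time minus the task's average running duration."""
    return latest_finish(task) - AVG_MINUTES[task]

def hhmm(minutes):
    """Format minutes after midnight as H:MM."""
    return f"{minutes // 60}:{minutes % 60:02d}"

print(hhmm(latest_finish("B")))   # 4:00
print(hhmm(latest_start("A")))    # 1:50
print(latest_start("A") - 60)     # time margin in minutes at 1:00 -> 50
```

The recursion takes the minimum over all downstream paths, which is why task D (4:00) rather than task C (4:30) or task E (4:10) determines the latest completion time of task B.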
After determining the latest start time for rerunning the abnormal task, the abnormality monitoring device can flexibly perform alarm processing on the abnormal task according to that latest start time and the current time.
For example, if the latest start time is close to the current time, alarm processing can be performed on the abnormal task immediately so that the abnormal task is handled in time. If the latest start time is far from the current time, alarm processing can be performed on the abnormal task somewhat later, so that the alarm is raised at a reasonable time, users are disturbed less, and unnecessary alarms are reduced.
Determining the abnormal alarm time is the key to performing alarm processing on an abnormal task. The abnormality monitoring device determines the abnormal alarm time mainly according to the latest start time for rerunning the abnormal task and the current time, and then performs alarm processing on the abnormal task when the abnormal alarm time arrives.
The latest start time for rerunning the abnormal task and the current time are the main factors that affect the abnormal alarm time. Other factors also play a role, such as the time period in which alarms must be raised promptly and the abnormality type of the abnormal task. For some application scenarios, a time range in which alarms must be raised promptly can be specified in advance, referred to as the specified time range. The specified time range can be working hours, for example 9:00 to 20:00.
On this basis, the abnormality monitoring device can judge whether the current time is within the specified time range. If it is, the current time is taken as the abnormal alarm time, and alarm processing is performed on the abnormal task when the abnormal alarm time arrives, that is, immediately. If it is not, the abnormal alarm time can be determined according to the abnormality type of the abnormal task and the latest start time for rerunning the abnormal task, and alarm processing is performed on the abnormal task when the abnormal alarm time arrives.
Optionally, take as an example the case where the abnormality types of an abnormal task include run error and slow running speed.
If the abnormality type of the abnormal task is run error, it can be judged whether the latest start time for rerunning the abnormal task is later than a preset first time. If it is, a second time that is later than the current time but earlier than the first time is set as the abnormal alarm time. If it is not, that is, the latest start time for rerunning the abnormal task is earlier than or equal to the preset first time, the current time is set as the abnormal alarm time, which means that alarm processing is performed on the abnormal task immediately. Alarming when the second time arrives amounts to a delayed alarm, which helps avoid the user's rest time and disturbs the user less; in the long run it also widens the interval between two alarms, which reduces the number of alarms and saves resources. Taking the current time as the abnormal alarm time allows a timely alarm and avoids the problems caused by late alarms.
Note that this embodiment does not limit the values of the first time and the second time, which can be set to suit the application scenario. For example, the preset first time can be 11:00; accordingly, if the current time is before 9:00, the second time can be 9:00, but it is not limited to this.
If the exception type of the abnormal task is a slowed running speed, it may be judged whether the time difference between the latest start time for rerunning the abnormal task and the current time is greater than a preset time difference threshold. If so, a third time that is earlier than the latest start time for rerunning the abnormal task by the time difference threshold is set as the abnormal alarm time. If not, that is, the time difference between the latest start time for rerunning the abnormal task and the current time is less than or equal to the preset time difference threshold, the current time is set as the abnormal alarm time. Using the third time as the abnormal alarm time amounts to a delayed alarm, which helps avoid the user's rest time and reduces disturbance to the user. In the long run it also widens the interval between two successive alarms, which helps reduce the number of alarms and saves resources. Using the current time as the abnormal alarm time allows a timely alarm and avoids the problems caused by a late alarm.
It should be noted that this embodiment does not limit the value of the time difference threshold, which may be set adaptively according to the application scenario. For example, the time difference threshold may be 2 hours, but it is not limited thereto.
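The slowed-running-speed branch reduces to a single comparison, sketched below in Python. The function name `alarm_time_for_slowdown` is hypothetical, and the default threshold of 2 hours is simply the example value mentioned above.

```python
from datetime import datetime, timedelta


def alarm_time_for_slowdown(now, latest_start, threshold=timedelta(hours=2)):
    """Pick the alarm time for a task whose exception type is a slowed running speed."""
    if latest_start - now > threshold:
        # Third time: earlier than the latest start time by the threshold.
        return latest_start - threshold
    # Little or no slack left: alarm immediately.
    return now


now = datetime(2016, 3, 28, 8, 0)
print(alarm_time_for_slowdown(now, datetime(2016, 3, 28, 12, 0)))  # 2016-03-28 10:00:00
print(alarm_time_for_slowdown(now, datetime(2016, 3, 28, 9, 30)))  # 2016-03-28 08:00:00
```

In the first call the rerun may start as late as 12:00, so the alarm is deferred to 10:00; in the second call only 1.5 hours of slack remain, so the alarm is raised at the current time.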
Further, an alarm object and an alarm mode may be preset. The alarm object mainly refers to the responsible person or person in charge who needs to handle the abnormal task; for example, the alarm object may be configured in a duty roster. The alarm mode includes at least one of the following: voice alarm, SMS alarm, e-mail alarm, alarm lamp, instant messaging alarm, and so on. On this basis, performing alarm processing on the abnormal task specifically means: according to the pre-configured duty roster, alarming the corresponding responsible person or person in charge in the configured alarm mode, for example by sending an SMS message or e-mail to that person's terminal device, or by giving that person a voice prompt.
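A minimal Python sketch of roster-driven alarm dispatch follows. It rests on several assumptions that are not in the text: the roster is modelled as a dictionary keyed by task identifier, each entry lists its own alarm modes, and each alarm mode is represented by a plain callable. `RosterEntry`, `dispatch_alarm`, and the channel names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RosterEntry:
    owner: str            # responsible person or person in charge
    channels: List[str]   # configured alarm modes, e.g. "sms", "mail"


def dispatch_alarm(task_id: str,
                   roster: Dict[str, RosterEntry],
                   senders: Dict[str, Callable[[str, str], None]]) -> List[str]:
    """Alarm the owner of task_id over every configured channel that has a sender.

    Returns the list of channels that were actually used.
    """
    entry = roster[task_id]
    used = []
    for channel in entry.channels:
        send = senders.get(channel)
        if send is None:
            continue  # no sender registered for this alarm mode
        send(entry.owner, f"Task {task_id} is abnormal")
        used.append(channel)
    return used


outbox = []
senders = {
    "sms": lambda owner, text: outbox.append(("sms", owner, text)),
    "mail": lambda owner, text: outbox.append(("mail", owner, text)),
}
roster = {"daily_report": RosterEntry(owner="on_call_engineer", channels=["sms", "voice"])}
print(dispatch_alarm("daily_report", roster, senders))  # ['sms']
```

Keeping the senders as injected callables means the same dispatch logic can be exercised without a real SMS or mail gateway.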
As can be seen from the above, the abnormality monitoring device can flexibly determine the abnormal alarm time according to the latest start time for rerunning the abnormal task and the current time. This helps perform alarm processing on the abnormal task at a suitable time, rather than having to perform alarm processing when a pre-configured alarm time arrives, as in the prior art. The scheme is more flexible: it can alarm in time while reducing unnecessary alarms, which lowers the probability of late or unnecessary alarms and improves alarm accuracy. It is an intelligent alarm scheme.
Fig. 4 is a schematic structural diagram of an abnormality monitoring device provided by another embodiment of the present application. As shown in Fig. 4, the device includes: an abnormal task determining module 41, a latest time determining module 42, and an alarm processing module 43.
The abnormal task determining module 41 is configured to determine an abnormal task in a task scheduling system according to a benchmark task preset in the task scheduling system.
The latest time determining module 42 is configured to determine the latest start time for rerunning the abnormal task according to a preset benchmark completion time of the benchmark task.
The alarm processing module 43 is configured to perform alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time.
In an optional embodiment, as shown in Fig. 5, one implementation structure of the abnormal task determining module 41 includes: a monitoring task determining unit 411 and an abnormal task acquiring unit 412.
The monitoring task determining unit 411 is configured to determine the tasks in the task scheduling system on which the benchmark task depends as the tasks to be monitored.
The abnormal task acquiring unit 412 is configured to acquire, from the tasks to be monitored, the tasks whose running state is abnormal as the abnormal tasks.
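Determining the tasks to be monitored amounts to walking the dependency graph upstream from the benchmark task. The Python sketch below assumes the dependencies are available as a dictionary that maps each task to the tasks it directly depends on; this representation and the name `tasks_to_monitor` are hypothetical. The traversal is transitive, so indirect upstream tasks are included as well.

```python
from collections import deque
from typing import Dict, Iterable, Set


def tasks_to_monitor(benchmark: str, depends_on: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every task that the benchmark task depends on, directly or indirectly."""
    seen: Set[str] = set()
    queue = deque(depends_on.get(benchmark, ()))
    while queue:
        task = queue.popleft()
        if task in seen:
            continue  # already visited via another dependency path
        seen.add(task)
        queue.extend(depends_on.get(task, ()))
    return seen


depends_on = {
    "report": ["join"],
    "join": ["load_a", "load_b"],
    "load_a": ["extract"],
    "unrelated": ["extract"],
}
print(sorted(tasks_to_monitor("report", depends_on)))
# ['extract', 'join', 'load_a', 'load_b']
```

Only the benchmark task has to be configured; tasks such as `unrelated` that the benchmark does not depend on are left out of monitoring.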
Further, the abnormal task acquiring unit 412 is specifically configured to perform at least one of the following operations:
acquiring, from the tasks to be monitored, the tasks that ran with an error as the abnormal tasks;
acquiring, from the tasks to be monitored, the tasks whose running speed has slowed as the abnormal tasks.
Further, when acquiring the tasks whose running speed has slowed as the abnormal tasks, the abnormal task acquiring unit 412 is specifically configured to:
acquire, from the tasks to be monitored, the tasks whose running duration meets a specified duration condition as the abnormal tasks, wherein the specified duration condition includes at least one of the following:
the running duration exceeds a preset duration threshold;
the running duration exceeds the average running duration in a specified period by a specified ratio.
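The two duration conditions can be sketched as a small predicate. The sketch below is illustrative only: the name `is_slow` and its parameters are hypothetical, durations are taken to be plain numbers in a common unit, and "exceeds the average by a specified ratio" is interpreted as `duration > average * (1 + ratio)`, which is an assumption about the intended formula.

```python
from statistics import mean
from typing import Optional, Sequence


def is_slow(duration: float,
            history: Sequence[float],
            max_duration: Optional[float] = None,
            ratio: Optional[float] = None) -> bool:
    """Return True if a task's running duration meets either duration condition.

    history      -- running durations observed in the specified period
    max_duration -- preset duration threshold (condition 1), or None to skip
    ratio        -- specified ratio over the average (condition 2), or None to skip
    """
    if max_duration is not None and duration > max_duration:
        return True
    if ratio is not None and history:
        return duration > mean(history) * (1 + ratio)
    return False


print(is_slow(130, [100, 100, 100], ratio=0.2))  # True: more than 20% over the average of 100
print(is_slow(110, [100, 100, 100], ratio=0.2))  # False
print(is_slow(50, [], max_duration=40))          # True: over the fixed threshold
```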
In an optional embodiment, as shown in Fig. 5, one implementation structure of the alarm processing module 43 includes: a first alarm processing unit 431 and a second alarm processing unit 432.
The first alarm processing unit 431 is configured to perform alarm processing on the abnormal task immediately when the current time is within the specified time range.
The second alarm processing unit 432 is configured to, when the current time is not within the specified time range, determine the abnormal alarm time according to the exception type of the abnormal task and the latest start time for rerunning the abnormal task, and perform alarm processing on the abnormal task when the abnormal alarm time is reached.
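The split between the two alarm processing units can be sketched as a simple time-range check. In the Python sketch below, the specified time range is assumed to be a daily window given by two clock times, and the deferred branch is passed in as a callable so that the exception-type logic stays separate; `in_specified_range`, `choose_alarm_time`, and the example window of 9:00 to 18:00 are all hypothetical.

```python
from datetime import datetime, time
from typing import Callable


def in_specified_range(now: datetime, start: time, end: time) -> bool:
    """Check whether the clock time of `now` falls within [start, end]."""
    t = now.time()
    if start <= end:
        return start <= t <= end
    # The window wraps past midnight, e.g. 22:00 to 06:00.
    return t >= start or t <= end


def choose_alarm_time(now: datetime, start: time, end: time,
                      deferred: Callable[[datetime], datetime]) -> datetime:
    """Alarm immediately inside the window, otherwise defer to `deferred`."""
    if in_specified_range(now, start, end):
        return now            # role of the first alarm processing unit
    return deferred(now)      # role of the second alarm processing unit


def next_nine(now: datetime) -> datetime:
    # Stand-in for the exception-type-specific rule.
    return now.replace(hour=9, minute=0, second=0, microsecond=0)


print(choose_alarm_time(datetime(2016, 3, 28, 10, 0), time(9), time(18), next_nine))
print(choose_alarm_time(datetime(2016, 3, 28, 3, 0), time(9), time(18), next_nine))
```

The first call falls inside the window and alarms at 10:00; the second call falls outside it and hands the decision to the deferred rule.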
Further, the second alarm processing unit 432 is specifically configured to:
if the exception type of the abnormal task is a running error, set a second time that is later than the current time but earlier than a preset first time as the abnormal alarm time when the latest start time for rerunning the abnormal task is later than the first time, or set the current time as the abnormal alarm time when the latest start time for rerunning the abnormal task is earlier than or equal to the first time;
if the exception type of the abnormal task is a slowed running speed, set a third time that is earlier than the latest start time for rerunning the abnormal task by a preset time difference threshold as the abnormal alarm time when the time difference between the latest start time for rerunning the abnormal task and the current time is greater than the time difference threshold, or set the current time as the abnormal alarm time when that time difference is less than or equal to the time difference threshold.
During task scheduling, the abnormality monitoring device provided by this embodiment determines the abnormal task according to the preset benchmark task, determines the latest start time for rerunning the abnormal task according to the preset benchmark completion time of the benchmark task, and performs alarm processing on the abnormal task according to that latest start time and the current time, rather than having to perform alarm processing when a pre-configured alarm time arrives, as in the prior art. This is more flexible, lowers the probability of late or unnecessary alarms, and improves alarm accuracy.
In addition, with the abnormality monitoring device provided by this embodiment, only the benchmark task and its benchmark completion time need to be preset. The device can infer all upstream tasks of the benchmark task from the dependencies between the benchmark task and the other tasks in the task scheduling system, and then monitor all of those upstream tasks automatically, rather than requiring a trigger condition, an alarm time, and so on to be configured for every upstream task, as in the prior art. The device therefore needs little configuration and covers a wide monitoring range, and it is particularly suitable for task scheduling systems with a large number of tasks.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. An abnormality monitoring method, characterized by comprising:
determining an abnormal task in a task scheduling system according to a benchmark task preset in the task scheduling system;
determining a latest start time for rerunning the abnormal task according to a preset benchmark completion time of the benchmark task;
performing alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time.
2. The method according to claim 1, characterized in that determining the abnormal task in the task scheduling system according to the benchmark task preset in the task scheduling system comprises:
determining the tasks in the task scheduling system on which the benchmark task depends as tasks to be monitored;
acquiring, from the tasks to be monitored, a task whose running state is abnormal as the abnormal task.
3. The method according to claim 2, characterized in that acquiring, from the tasks to be monitored, a task whose running state is abnormal as the abnormal task comprises at least one of the following operations:
acquiring, from the tasks to be monitored, a task that ran with an error as the abnormal task;
acquiring, from the tasks to be monitored, a task whose running speed has slowed as the abnormal task.
4. The method according to claim 3, characterized in that acquiring, from the tasks to be monitored, a task whose running speed has slowed as the abnormal task comprises:
acquiring, from the tasks to be monitored, a task whose running duration meets a specified duration condition as the abnormal task, wherein the specified duration condition includes at least one of the following:
the running duration exceeds a preset duration threshold;
the running duration exceeds the average running duration in a specified period by a specified ratio.
5. The method according to any one of claims 1 to 4, characterized in that performing alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time comprises:
if the current time is within a specified time range, performing alarm processing on the abnormal task immediately;
if the current time is not within the specified time range, determining an abnormal alarm time according to the exception type of the abnormal task and the latest start time for rerunning the abnormal task, and performing alarm processing on the abnormal task when the abnormal alarm time is reached.
6. The method according to claim 5, characterized in that determining the abnormal alarm time according to the exception type of the abnormal task and the latest start time for rerunning the abnormal task comprises:
if the exception type of the abnormal task is a running error, setting a second time that is later than the current time but earlier than a preset first time as the abnormal alarm time when the latest start time for rerunning the abnormal task is later than the first time, or setting the current time as the abnormal alarm time when the latest start time for rerunning the abnormal task is earlier than or equal to the first time;
if the exception type of the abnormal task is a slowed running speed, setting a third time that is earlier than the latest start time for rerunning the abnormal task by a preset time difference threshold as the abnormal alarm time when the time difference between the latest start time for rerunning the abnormal task and the current time is greater than the time difference threshold, or setting the current time as the abnormal alarm time when that time difference is less than or equal to the time difference threshold.
7. An abnormality monitoring device, characterized by comprising:
an abnormal task determining module, configured to determine an abnormal task in a task scheduling system according to a benchmark task preset in the task scheduling system;
a latest time determining module, configured to determine a latest start time for rerunning the abnormal task according to a preset benchmark completion time of the benchmark task;
an alarm processing module, configured to perform alarm processing on the abnormal task according to the latest start time for rerunning the abnormal task and the current time.
8. The device according to claim 7, characterized in that the abnormal task determining module comprises:
a monitoring task determining unit, configured to determine the tasks in the task scheduling system on which the benchmark task depends as tasks to be monitored;
an abnormal task acquiring unit, configured to acquire, from the tasks to be monitored, a task whose running state is abnormal as the abnormal task.
9. The device according to claim 8, characterized in that the abnormal task acquiring unit is specifically configured to perform at least one of the following operations:
acquiring, from the tasks to be monitored, a task that ran with an error as the abnormal task;
acquiring, from the tasks to be monitored, a task whose running speed has slowed as the abnormal task.
10. The device according to claim 9, characterized in that the abnormal task acquiring unit is specifically configured to:
acquire, from the tasks to be monitored, a task whose running duration meets a specified duration condition as the abnormal task, wherein the specified duration condition includes at least one of the following:
the running duration exceeds a preset duration threshold;
the running duration exceeds the average running duration in a specified period by a specified ratio.
11. The device according to any one of claims 7 to 10, characterized in that the alarm processing module comprises:
a first alarm processing unit, configured to perform alarm processing on the abnormal task immediately when the current time is within a specified time range;
a second alarm processing unit, configured to, when the current time is not within the specified time range, determine an abnormal alarm time according to the exception type of the abnormal task and the latest start time for rerunning the abnormal task, and perform alarm processing on the abnormal task when the abnormal alarm time is reached.
12. The device according to claim 11, characterized in that the second alarm processing unit is specifically configured to:
if the exception type of the abnormal task is a running error, set a second time that is later than the current time but earlier than a preset first time as the abnormal alarm time when the latest start time for rerunning the abnormal task is later than the first time, or set the current time as the abnormal alarm time when the latest start time for rerunning the abnormal task is earlier than or equal to the first time;
if the exception type of the abnormal task is a slowed running speed, set a third time that is earlier than the latest start time for rerunning the abnormal task by a preset time difference threshold as the abnormal alarm time when the time difference between the latest start time for rerunning the abnormal task and the current time is greater than the time difference threshold, or set the current time as the abnormal alarm time when that time difference is less than or equal to the time difference threshold.
CN201610184288.1A 2016-03-28 2016-03-28 abnormality monitoring method and device Pending CN107241205A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201610184288.1A CN107241205A (en) 2016-03-28 2016-03-28 abnormality monitoring method and device
TW106105604A TW201737084A (en) 2016-03-28 2017-02-20 Abnormality monitoring method and device
PCT/CN2017/076891 WO2017167021A1 (en) 2016-03-28 2017-03-16 Abnormality monitoring method and device

Publications (1)

Publication Number Publication Date
CN107241205A true CN107241205A (en) 2017-10-10

Family

ID=59963429

Country Status (3)

Country Link
CN (1) CN107241205A (en)
TW (1) TW201737084A (en)
WO (1) WO2017167021A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108011782B (en) * 2017-12-06 2020-10-16 北京百度网讯科技有限公司 Method and device for pushing alarm information
CN110113201B (en) * 2019-04-30 2022-12-23 平安科技(深圳)有限公司 Monitoring data processing method and device and monitoring system
CN110348718B (en) * 2019-06-28 2023-11-14 北京淇瑀信息科技有限公司 Service index monitoring method and device and electronic equipment
CN112817686B (en) * 2019-11-15 2023-07-25 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for detecting virtual machine abnormality
CN111010292A (en) * 2019-11-26 2020-04-14 苏宁云计算有限公司 Offline task delay warning system and method and computer system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070283351A1 (en) * 2006-05-31 2007-12-06 Degenaro Louis R Unified job processing of interdependent heterogeneous tasks
CN101110041A (en) * 2007-08-23 2008-01-23 南京联创科技股份有限公司 Method for managing group task
CN101425024A (en) * 2008-10-24 2009-05-06 中国移动通信集团山东有限公司 Multitasking method and device
CN102004973A (en) * 2010-12-30 2011-04-06 用友软件股份有限公司 Task making method and device
CN103034554A (en) * 2012-12-30 2013-04-10 焦点科技股份有限公司 ETL (Extraction-Transformation-Loading) dispatching system and method for error-correction restarting and automatic-judgment starting

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245127A (en) * 2019-06-12 2019-09-17 成都九洲电子信息系统股份有限公司 A kind of data migration method based on Row control
CN111324650A (en) * 2020-02-16 2020-06-23 广州信安数据有限公司 Task processing efficiency real-time evaluation early warning method, computer readable storage medium and enterprise data management system
CN111427748A (en) * 2020-03-31 2020-07-17 携程计算机技术(上海)有限公司 Task warning method, system, equipment and storage medium
CN111858065A (en) * 2020-07-28 2020-10-30 中国平安财产保险股份有限公司 Data processing method, device, storage medium and device
CN111858065B (en) * 2020-07-28 2023-02-03 中国平安财产保险股份有限公司 Data processing method, device, storage medium and device
CN112328377A (en) * 2020-11-04 2021-02-05 北京字节跳动网络技术有限公司 Baseline monitoring method and device, readable medium and electronic equipment
CN112328377B (en) * 2020-11-04 2022-04-19 北京字节跳动网络技术有限公司 Baseline monitoring method and device, readable medium and electronic equipment
US11853792B2 (en) 2020-11-04 2023-12-26 Beijing Bytedance Network Technology Co., Ltd. Baseline monitoring method and apparatus, readable medium, and electronic device

Also Published As

Publication number Publication date
WO2017167021A1 (en) 2017-10-05
TW201737084A (en) 2017-10-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171010