CN110309024B - Data processing system and method for executing data processing task - Google Patents

Info

Publication number
CN110309024B
Authority
CN
China
Prior art keywords
data processing
target
queue
task
semaphore
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910327896.7A
Other languages
Chinese (zh)
Other versions
CN110309024A (en)
Inventor
王晟
陈少龙
邹艺林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wangsu Science and Technology Co Ltd
Priority to CN201910327896.7A
Publication of CN110309024A
Application granted
Publication of CN110309024B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/546: Message passing systems or structures, e.g. queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072: Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3089: Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/54: Indexing scheme relating to G06F9/54
    • G06F2209/548: Queue
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the invention provide a data processing system and a method for executing a data processing task, belonging to the technical field of communications. The system comprises a scheduling device and a data processing platform, and the method comprises the following steps: the scheduling device acquires a target timing task and generates a target task queue set corresponding to the target timing task, wherein the target task queue set comprises at least a queue to be executed and an execution queue; the scheduling device periodically generates data processing tasks based on the target timing task and adds a semaphore corresponding to each data processing task to the queue to be executed; when a target semaphore is detected in the queue to be executed, the data processing platform transfers the target semaphore to the execution queue and executes the target data processing task corresponding to the target semaphore; and if execution of the target data processing task is abnormal, the scheduling device re-adds the target semaphore to the queue to be executed. With the invention, a data processing task whose execution is abnormal can be re-executed automatically, without manual intervention.

Description

Data processing system and method for executing data processing task
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a data processing system and a method for executing a data processing task.
Background
A data processing platform can analyze the large amount of application data generated while a service application is running. Based on the data processing results fed back by the platform, operation and maintenance personnel of the service application can quickly locate abnormal events that occur during operation and thereby optimize the running performance of the service application.
In the prior art, a data processing platform can be made to generate data processing tasks periodically by setting timing tasks. Existing data processing platforms typically process continuously generated application data by batch computing. Specifically, the data processing platform periodically generates data processing tasks for the application data based on a timing task, and then creates a job process for each data processing task to process all application data generated by the service application within the time window corresponding to that task. For example, operation and maintenance personnel of the service application set a timing task in the data processing platform that, starting at 8:00, checks every three minutes whether the service application has run abnormally. At 8:03, the data processing platform generates a data processing task and creates a corresponding job process, which analyzes the application data generated by the service application from 8:00 to 8:03 to determine whether the service application has run abnormally. At 8:06, the data processing platform generates another data processing task and creates a corresponding job process, which analyzes the application data generated from 8:03 to 8:06, and so on.
Through studying how existing batch-computing data processing platforms execute data processing tasks, the inventors of the present patent application found that the prior art has at least the following problems. First, a data processing platform that uses batch computing cannot automatically re-execute data processing tasks whose execution is abnormal (including execution timeout and execution failure). Operation and maintenance personnel must actively check whether any data processing task has timed out or failed and intervene manually to re-execute it, so manual maintenance is inefficient and costly. Second, when executing a large number of data processing tasks that correspond to short time windows (usually no more than ten minutes), the data processing platform must frequently create and delete a large number of job processes, which consumes platform resources and affects the stability of the data processing platform.
Disclosure of Invention
The present application is directed to a data processing system and a method for performing data processing tasks thereof, so as to solve some or all of the problems in the prior art.
To achieve the above object, in one aspect, the present application provides a method for executing a data processing task, the method being applied to a data processing system that includes a scheduling device and a data processing platform, the method comprising: the scheduling device acquires a target timing task and generates a target task queue set corresponding to the target timing task, wherein the target task queue set comprises at least a queue to be executed and an execution queue; the scheduling device periodically generates data processing tasks based on the target timing task and adds a semaphore corresponding to each data processing task to the queue to be executed; when a target semaphore is detected in the queue to be executed, the data processing platform transfers the target semaphore to the execution queue and executes the target data processing task corresponding to the target semaphore; and if execution of the target data processing task is abnormal, the scheduling device re-adds the target semaphore to the queue to be executed.
Further, the system also comprises a management device; the steps before the scheduling device acquires the target timing task include: the management device configures the target timed task based on the CRON expression and adds the CRON expression to the task configuration table to cause the scheduling device to read the target timed task from the task configuration table.
In one embodiment, the method further comprises: the data processing platform acquires the target semaphore from the queue to be executed according to the position of the target semaphore in the queue to be executed; if a request to modify the execution priority of the target data processing task is detected, the scheduling device modifies the position of the target semaphore in the queue to be executed, so that the data processing platform acquires the target semaphore from the queue to be executed according to the modified order.
In one embodiment, the target task queue set further comprises a failure queue; the step in which the scheduling device re-adds the target semaphore to the queue to be executed if execution of the target data processing task is abnormal comprises: if the target data processing task fails to execute, the data processing platform transfers the target semaphore to the failure queue; when the target semaphore is detected in the failure queue, the scheduling device transfers the target semaphore to the queue to be executed.
In one embodiment, the target task queue set further comprises a timeout queue; the step in which the scheduling device re-adds the target semaphore to the queue to be executed if execution of the target data processing task is abnormal comprises: if the storage time of the target semaphore in the execution queue exceeds a preset storage time, the scheduling device transfers the target semaphore to the timeout queue; when the target semaphore is detected in the timeout queue, the scheduling device transfers the target semaphore to the queue to be executed.
In one embodiment, the target task queue set further comprises an archive queue; the method further comprises: the data processing platform records, in the semaphore, an execution result indicating that the data processing task succeeded or failed, and adds the semaphore carrying the execution result to the archive queue; the scheduling device records, in the semaphore, an execution result indicating that the data processing task timed out, and adds the semaphore carrying the execution result to the archive queue.
Further, the data processing system further comprises an archiving device; the steps after the scheduling device or the data processing platform adds the semaphore carrying the execution result to the archive queue include: when a semaphore corresponding to a data processing task is detected in the archive queue, the archiving device acquires the semaphore, reads the execution result, and stores the execution result in a task execution result set in an external database.
Further, the data processing system also comprises an alarm device; the steps after the archiving device stores the execution result in the task execution result set in the external database include: when it is detected from the task execution result set that the number of times the target data processing task has failed or timed out is greater than a preset number of abnormal executions, the alarm device raises an alarm.
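The archiving and alarm steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary layout of a semaphore, the result labels "success", "failed" and "timeout", the in-memory list standing in for the external database, and the function names are all assumptions made for illustration.

```python
from collections import Counter, deque


def archive(archive_queue: deque, result_set: list) -> None:
    """Drain the archive queue and store each execution result.

    `result_set` is a plain list standing in for the task execution
    result set in the external database (an assumption).
    """
    while archive_queue:
        semaphore = archive_queue.popleft()
        result_set.append((semaphore["task_id"], semaphore["result"]))


def tasks_to_alarm(result_set: list, max_abnormal: int) -> list:
    """Return task IDs whose failure or timeout count exceeds the preset limit."""
    abnormal = Counter(
        task_id
        for task_id, result in result_set
        if result in ("failed", "timeout")
    )
    return sorted(task_id for task_id, n in abnormal.items() if n > max_abnormal)
```

With a preset limit of 1, a task that has one failure and one timeout on record would be reported, while a task with only successful results would not.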
In one embodiment, the data processing platform uses stream computing to execute data processing tasks; the method further comprises: the data processing platform checks, at a preset period and through a resident process corresponding to the target timing task, whether a semaphore exists in the queue to be executed.
To achieve the above object, another aspect of the present application provides a data processing system that includes a scheduling device and a data processing platform. The scheduling device is configured to acquire a target timing task and generate a target task queue set corresponding to the target timing task, wherein the target task queue set includes at least a queue to be executed, an execution queue and a failure queue; the scheduling device is further configured to periodically generate data processing tasks based on the target timing task and add the semaphore corresponding to each data processing task to the queue to be executed; the data processing platform is configured to, when a target semaphore is detected in the queue to be executed, transfer the target semaphore to the execution queue and execute the target data processing task corresponding to the target semaphore; and the scheduling device is further configured to re-add the target semaphore to the queue to be executed if execution of the target data processing task is abnormal.
Further, the system is used for realizing the method for executing the data processing task.
Compared with the prior art, the present application has the following advantages. First, data processing tasks are scheduled by scheduling their corresponding semaphores, and the semaphores are stored in a task queue set, which reduces the coupling between the scheduling device and the data processing platform and strengthens the ability of each to operate independently. Second, the execution states of data processing tasks are classified by the various queues in the task queue set, and the semaphores corresponding to data processing tasks that timed out or failed are rescheduled, so that those tasks are re-executed automatically and manual intervention is avoided. Third, the archiving device stores the execution results of data processing tasks uniformly and persistently in an external database, so that operation and maintenance personnel of the service application can conveniently review and analyze historical execution results. Fourth, the alarm device periodically acquires the execution results stored in the external database, so that an alarm can be raised when execution of a data processing task is abnormal. Fifth, the data processing platform can process data by stream computing through resident processes and does not need to frequently create and delete a large number of job processes for executing data processing tasks, which improves the stability of the data processing system during operation and maintenance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for performing data processing tasks according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a transfer flow of a semaphore corresponding to a successfully executed data processing task according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a transfer flow of a semaphore corresponding to a failed data processing task according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a transfer flow of a semaphore corresponding to a data processing task whose execution timed out according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating an overall architecture of a data processing system according to another embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides a method for executing data processing tasks, which is applied to a data processing system and can be implemented jointly by a scheduling device and a data processing platform in the data processing system. The scheduling device is used to schedule data processing tasks and plan the execution time of each data processing task: it generates a corresponding task queue set for each timing task, generates data processing tasks and the semaphore corresponding to each data processing task according to the target timing task, and stores the semaphores in the corresponding task queue set. It should be noted that the scheduling device may be a distributed device comprising a plurality of scheduling hosts. The scheduling hosts may enable a distributed lock service and request the lock at a preset period; at any moment only one scheduling host holds the lock, and that host implements the functions of the scheduling device, while the other scheduling hosts serve as failover backups. The data processing platform can acquire a semaphore from the task queue set and execute the data processing task corresponding to the acquired semaphore; the data processing platform can also re-add the semaphore corresponding to a data processing task whose execution failed to the corresponding task queue set, so that the corresponding data processing task can be re-executed. It should also be noted that the data processing platform may include a data transceiver device, which is configured to obtain from other devices the data to be processed for a data processing task and to send task execution results to other devices.
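The single-active-host arrangement described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `LeaseLock` class is an in-memory stand-in for a distributed lock service, and the lease duration, host names and `elect` helper are hypothetical and not taken from the patent.

```python
from typing import Iterable, Optional


class LeaseLock:
    """In-memory stand-in for a distributed lock with a lease (assumption)."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, host: str, now: float) -> bool:
        # The lock can be taken if it is free, if the lease has lapsed,
        # or if the current holder is renewing its own lease.
        if self.holder is None or now >= self.expires_at or self.holder == host:
            self.holder = host
            self.expires_at = now + self.lease_seconds
            return True
        return False


def elect(lock: LeaseLock, live_hosts: Iterable[str], now: float) -> Optional[str]:
    """Let every live host request the lock; return the host that now acts as scheduler."""
    active = None
    for host in live_hosts:
        if lock.try_acquire(host, now):
            active = host
    return active
```

In this sketch a backup host takes over only after the active host stops renewing and its lease expires, which is one common way to realize the failover behavior described.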
Fig. 1 is a flowchart of a method for performing a data processing task according to an embodiment of the present invention.
S101, a scheduling device acquires a target timing task, and generates a target task queue set corresponding to the target timing task, wherein the target task queue set at least comprises a queue to be executed and an execution queue.
In this embodiment, when operation and maintenance personnel of the service application configure the target timing task, the data processing system may generate a task ID for the target timing task. The task ID is a globally unique identifier (for example, an auto-increment ID) used to mark the target timing task; it will be appreciated that in this data processing system different timing tasks have different task IDs. When the scheduling device acquires the target timing task, it can acquire the task ID of the target timing task at the same time. After acquiring the target timing task and its task ID, the scheduling device can generate a target task queue set corresponding to the target timing task and mark the target task queue set with the task ID, thereby establishing the correspondence between the target timing task and the target task queue set. The target task queue set is used to store the semaphores corresponding to data processing tasks and comprises at least a queue to be executed and an execution queue. A semaphore stored in the queue to be executed indicates that the corresponding data processing task is waiting to be executed; a semaphore stored in the execution queue indicates that the corresponding data processing task is being executed.
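The relationship between a task ID and its task queue set can be sketched in Python as below. The names `TaskQueueSet`, `queue_sets` and `register_timed_task`, and the use of in-process deques rather than an external queue service, are assumptions made only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count

# Auto-increment source of globally unique task IDs (illustrative only).
_next_task_id = count(1)


@dataclass
class TaskQueueSet:
    """Queues holding the semaphores of one timing task."""
    task_id: int
    pending: deque = field(default_factory=deque)    # queue to be executed
    executing: deque = field(default_factory=deque)  # execution queue


# Task ID -> queue set; this mapping is the "mark" that ties them together.
queue_sets: dict = {}


def register_timed_task() -> int:
    """Assign a task ID and create the queue set marked with that ID."""
    task_id = next(_next_task_id)
    queue_sets[task_id] = TaskQueueSet(task_id)
    return task_id
```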
Optionally, the data processing system may further comprise a management device by which the operator may configure the timing tasks. Accordingly, the steps before the scheduling device acquires the target timing task may include: the management device configures the target timed task based on the CRON expression and adds the CRON expression to the task configuration table to cause the scheduling device to read the target timed task from the task configuration table.
In one embodiment, the management device may have a UI (User Interface) to facilitate operation by operation and maintenance personnel of the service application. The personnel can manually configure the target timing task on the management device through a CRON expression, and the configuration data recorded in the CRON expression can comprise the data processing instructions of the data processing task and the generation conditions of the data processing task. The generation conditions may include parameters such as the time at which the data processing task is first generated and the frequency at which it is generated. After the target timing task is configured, the management device may store it in the task configuration table so that the scheduling device can read it from there. It will be appreciated that the scheduling device may continuously monitor the task configuration table or periodically read timing tasks from it. In addition, operation and maintenance personnel of the service application can modify a target timing task that already exists in the task configuration table through the management device; after reading the modified target timing task, the scheduling device generates data processing tasks according to it.
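As an illustration of how a scheduling device might evaluate generation conditions read from the task configuration table, the following Python sketch matches a simplified five-field CRON expression against a point in time. The table layout, the "instruction" value and the function names are hypothetical. The matcher supports only `*`, `*/n` and single integers, and it requires every field to match, which differs from full cron semantics for the day-of-month and day-of-week fields.

```python
from datetime import datetime

# Hypothetical task configuration table: task ID -> configuration data.
task_config_table = {
    1: {"cron": "*/3 * * * *", "instruction": "detect_anomalies"},
}


def field_matches(field: str, value: int) -> bool:
    """Match one CRON field; only '*', '*/n' and a single integer are handled."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` satisfies the simplified five-field expression."""
    minute, hour, day, month, weekday = expr.split()
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, when.isoweekday() % 7)  # 0 means Sunday
    )


def due_tasks(table: dict, when: datetime) -> list:
    """Task IDs whose generation condition is met at `when`."""
    return [tid for tid, cfg in table.items() if cron_matches(cfg["cron"], when)]
```

With the expression `*/3 * * * *`, the task is due at 8:00, 8:03, 8:06 and so on, matching the three-minute example in the Background section.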
S102, the scheduling device periodically generates data processing tasks based on the target timing task, and adds the semaphore corresponding to each data processing task to the queue to be executed.
In this embodiment, the data recorded in a semaphore may include the task ID and the remaining number of executions of the data processing task. It will be understood that each time the data processing task is executed, the remaining number of executions recorded in the corresponding semaphore is reduced by one. After the scheduling device acquires the target timing task, it can automatically generate data processing tasks at regular intervals, starting from a specific time, according to the configuration data of the target timing task, without manual intervention. To reduce the coupling between the scheduling device and the data processing platform, the scheduling device may generate a semaphore for each data processing task when it generates the task and, through a scheduling process, add the semaphore to the queue to be executed in the target task queue set corresponding to the target timing task, so that the data processing platform can acquire the semaphore and then execute the corresponding data processing task.
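A semaphore of this kind and the enqueue step can be sketched in Python as follows. The `sequence` field, the default of three executions and the function name `generate_task` are assumptions added for illustration; the description itself only names the task ID and the remaining number of executions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Semaphore:
    """Lightweight token standing for one generated data processing task."""
    task_id: int                # ID of the timing task it belongs to
    sequence: int               # hypothetical counter distinguishing generations
    remaining_executions: int   # reduced by one on each execution


def generate_task(pending: deque, task_id: int, sequence: int,
                  max_executions: int = 3) -> Semaphore:
    """Create the semaphore for a newly generated task and enqueue it."""
    semaphore = Semaphore(task_id, sequence, max_executions)
    pending.append(semaphore)  # tail of the queue to be executed
    return semaphore


def consume_execution(semaphore: Semaphore) -> bool:
    """Reduce the remaining count by one; return False if none were left."""
    if semaphore.remaining_executions <= 0:
        return False
    semaphore.remaining_executions -= 1
    return True
```

Because only the small semaphore object passes through the queue, the scheduler and the platform need to agree only on its fields, which is the decoupling the paragraph describes.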
It should be noted that the storage space of each queue in the target task queue set may have a fixed, preset size, or its size may be adjusted dynamically according to actual requirements; the present invention does not limit this.
S103, when the target semaphore is detected in the queue to be executed, the data processing platform transfers the target semaphore to the execution queue and executes a target data processing task corresponding to the target semaphore.
In this embodiment, the data processing platform executes a data processing task only after acquiring the corresponding semaphore, so the data processing platform may check at regular intervals whether a semaphore exists in the queue to be executed in the target task queue set. When the target semaphore is detected in the queue to be executed, the data processing platform determines that the corresponding data processing task can be executed. The data processing platform then transfers the target semaphore to the execution queue and executes the target data processing task corresponding to the target semaphore.
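One polling step of the platform can be sketched in Python as below. The function name `poll_once` and the `run` callback that stands for executing the data processing task are hypothetical; what happens to the semaphore after a successful run is outside this passage, so the sketch leaves it in the execution queue.

```python
from collections import deque
from typing import Callable, Optional


def poll_once(pending: deque, executing: deque,
              run: Callable[[object], None]) -> Optional[object]:
    """Check the queue to be executed once and start at most one task.

    Returns the semaphore that was picked up, or None if the queue was empty.
    """
    if not pending:
        return None
    semaphore = pending.popleft()   # take from the head of the queue
    executing.append(semaphore)     # mark the task as being executed
    run(semaphore)                  # execute the corresponding task
    return semaphore
```

A resident process could call `poll_once` at a preset period instead of creating a new job process for every task.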
Optionally, operation and maintenance personnel of the service application may manually intervene in the execution priority of a data processing task to change the order in which data processing tasks are executed. Correspondingly, the method provided by the present invention may further include: the data processing platform acquires the target semaphore from the queue to be executed according to the position of the target semaphore in the queue to be executed; if a request to modify the execution priority of the target data processing task is detected, the scheduling device modifies the position of the target semaphore in the queue to be executed, so that the data processing platform acquires the target semaphore from the queue to be executed according to the modified order.
In one embodiment, the higher the execution priority of a data processing task, the closer its semaphore is stored to the head of the queue to be executed. The data processing platform acquires semaphores in the order in which they are arranged in the queue to be executed and executes the corresponding data processing tasks in that order. Therefore, when the execution priority of the target data processing task needs to be modified, operation and maintenance personnel of the service application can modify the storage position of the target semaphore in the queue to be executed through the scheduling device, and the data processing platform then acquires the target semaphore according to its new storage position. For example, the scheduling device detects a request to change the execution priority of the target data processing task to the highest; the scheduling device may then store the target semaphore at the head of the queue to be executed. It will be appreciated that if the target semaphore is stored at the tail of the queue to be executed, the scheduling device may move it from the tail to the head; if the target semaphore is stored in the failure queue, the scheduling device may move it from the failure queue to the head of the queue to be executed. In either case, the data processing platform acquires the target semaphore first.
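Raising a task to the highest priority can be sketched in Python as moving its semaphore to the head of the queue to be executed. The function name `promote_to_head` and the use of `collections.deque` are illustrative assumptions.

```python
from collections import deque


def promote_to_head(target, pending: deque, failed: deque) -> bool:
    """Move `target` to the head of the queue to be executed.

    The semaphore may currently sit anywhere in the queue to be executed
    or in the failure queue. Returns False if it is in neither queue.
    """
    for queue in (pending, failed):
        if target in queue:
            queue.remove(target)        # take it out of its current position
            pending.appendleft(target)  # the head is served first by the platform
            return True
    return False
```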
S104, if the target data processing task is abnormal in execution, the scheduling device re-adds the target semaphore to the queue to be executed.
In this embodiment, the execution result of the target data processing task may be either abnormal execution or successful execution. When execution of the target data processing task is abnormal, the scheduling device may re-add the target semaphore to the queue to be executed so that the data processing platform can acquire the target semaphore again and re-execute the corresponding target data processing task.
Optionally, so that the data processing platform can automatically re-execute a data processing task whose execution failed, and so that failed data processing tasks can be distinguished from other data processing tasks, a failure queue may be set in the target task queue set. Correspondingly, step S104 may specifically include: if the target data processing task fails to execute, the data processing platform transfers the target semaphore to the failure queue; when the target semaphore is detected in the failure queue, the scheduling device transfers the target semaphore to the queue to be executed.
In one embodiment, if the target data processing task fails during execution, the data processing platform may first transfer the target semaphore from the execution queue to the failure queue; after detecting the target semaphore in the failure queue, the scheduling device may then transfer it to the queue to be executed. It should be noted that the scheduling device may periodically check whether any semaphore exists in the failure queue of the target timing task; when one or more semaphores are detected, the scheduling device transfers them to the queue to be executed one by one in the order in which they are arranged. A data processing task may fail for external reasons. For example, the data processing platform may need to acquire the data to be processed from a database outside the system in order to execute the target data processing task; if the interface of the external database fails temporarily during execution, the task cannot access the database and therefore fails.
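The failure path can be sketched in Python as two cooperating functions, one on the platform side and one on the scheduler side. The function names and the `run` callback are hypothetical, and treating any raised exception as an execution failure is an assumption of this sketch.

```python
from collections import deque
from typing import Callable


def execute(semaphore, run: Callable[[object], None],
            executing: deque, failed: deque) -> bool:
    """Platform side: run the task; on failure move its semaphore to the failure queue."""
    try:
        run(semaphore)
        return True
    except Exception:
        # Assumption: any exception counts as an execution failure.
        executing.remove(semaphore)
        failed.append(semaphore)
        return False


def requeue_failed(failed: deque, pending: deque) -> int:
    """Scheduler side: move failed semaphores back, one by one, in queue order."""
    moved = 0
    while failed:
        pending.append(failed.popleft())
        moved += 1
    return moved
```

For a transient cause such as a temporarily unavailable external database interface, the requeued task may simply succeed on a later attempt.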
Optionally, so that the data processing platform can automatically re-execute a data processing task whose execution timed out, and so that timed-out data processing tasks can be distinguished from other data processing tasks, a timeout queue may be set in the target task queue set. Correspondingly, step S104 may specifically include: if the storage time of the target semaphore in the execution queue exceeds the preset storage time, the scheduling device transfers the target semaphore to the timeout queue; when the target semaphore is detected in the timeout queue, the scheduling device transfers the target semaphore to the queue to be executed.
In one embodiment, the preset storage time of a semaphore is the longest time that the semaphore may remain in the execution queue, and represents the allowable duration for executing the data processing task corresponding to the semaphore. If the scheduling device detects that the storage time of the target semaphore in the execution queue exceeds the preset storage time, the time consumed by the data processing platform in executing the target data processing task has exceeded the allowable duration. In that case, the execution result of the target data processing task is an execution timeout, and the data processing platform ends execution of the target data processing task. The scheduling device then transfers the target semaphore to the timeout queue and, after detecting the target semaphore in the timeout queue, transfers it to the queue to be executed. It should be noted that the scheduling device may periodically detect whether any semaphore exists in the timeout queue of the target timing task, and when one or more semaphores are detected there, the scheduling device transfers them to the queue to be executed one by one in their arrangement order.
It is worth mentioning that the scheduling device may determine, by means of a timeout checking process, whether the storage time of the target semaphore in the execution queue exceeds the preset storage time. For example, when the target semaphore begins to be stored in the execution queue, the timeout checking process may set a timer whose timeout period equals the preset storage time of the target semaphore. If the target semaphore has not been removed from the execution queue when the timer reaches the timeout period, the storage time of the target semaphore in the execution queue has exceeded the preset storage time.
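By way of illustration only, a timeout check of this kind can be sketched in Python. Instead of one timer per semaphore, this sketch records the time at which each semaphore entered the execution queue and compares it against the preset storage time on each check, which gives the same decision. The function name `check_timeouts`, the dictionary layout, and the `now` parameter are assumptions made for the example.

```python
import time
from collections import deque


def check_timeouts(executing, timeout_queue, max_seconds, now=None):
    """Move semaphores that stayed in the execution queue longer than
    max_seconds to the timeout queue.

    executing maps a semaphore id to the time it entered the execution queue.
    """
    now = time.monotonic() if now is None else now
    expired = [sid for sid, started in executing.items()
               if now - started > max_seconds]
    for sid in expired:
        del executing[sid]
        timeout_queue.append(sid)
    return expired
```

Passing `now` explicitly makes the check easy to exercise without waiting for real time to elapse.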
In one embodiment, to prevent a semaphore storage failure caused by insufficient remaining storage space in the queue to be executed, an unallocated queue may be set as a buffer in front of the queue to be executed. This embodiment applies both when the data processing platform transfers a semaphore from the execution queue to the failure queue and when the scheduling device transfers a semaphore from the execution queue to the timeout queue. Before adding the semaphores in the failure queue and the timeout queue to the queue to be executed, the scheduling device may first add them to the unallocated queue in the target task queue set. Thus, when the number of semaphores increases suddenly within a certain period, for example because the data processing tasks corresponding to the target timing task frequently execute abnormally, the unallocated queue must store not only newly generated semaphores but also the semaphores corresponding to the abnormally executed data processing tasks. At that point the number of semaphores to be stored exceeds the capacity of the queue to be executed, and the semaphores that the queue to be executed cannot hold are temporarily stored in the unallocated queue. The scheduling device may periodically detect whether any semaphore exists in the unallocated queue and whether the remaining storage space of the queue to be executed is sufficient to store one or more semaphores. When both conditions hold, the scheduling device may transfer the semaphores in the unallocated queue to the queue to be executed one by one in their arrangement order.
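By way of illustration only, the buffering behaviour can be sketched in Python. The capacity of the queue to be executed is modelled as a maximum number of semaphores; the function name `drain_unallocated` and the `capacity` parameter are hypothetical.

```python
from collections import deque


def drain_unallocated(unallocated, pending, capacity):
    """Move semaphores from the unallocated queue into the queue to be
    executed, in arrangement order, until the latter is full."""
    moved = 0
    while unallocated and len(pending) < capacity:
        pending.append(unallocated.popleft())
        moved += 1
    return moved
```

Semaphores that do not fit simply remain in the unallocated queue until a later periodic check finds free space.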
Optionally, the target task queue set may further include an archive queue for storing execution results of the data processing task. Correspondingly, the method provided by the present invention may further include: the data processing platform records, in the semaphore, an execution result of execution success or execution failure of the data processing task, and adds the semaphore in which the execution result is recorded to the archive queue; the scheduling device records, in the semaphore, an execution result of execution timeout of the data processing task, and adds the semaphore in which the execution result is recorded to the archive queue.
In one embodiment, the two execution results of execution failure and execution success can be determined directly by the data processing platform when execution of the data processing task ends, whereas the execution result of execution timeout is determined by the scheduling device. Specifically, the scheduling device can determine whether the corresponding data processing task has timed out by using the timeout checking process to check how long the semaphore has been stored in the execution queue. To facilitate subsequent retrieval of execution results, all three kinds of execution result may be stored uniformly in the archive queue. Therefore, the data processing platform can record the execution result of execution failure or execution success in the semaphore corresponding to the data processing task and add the semaphore in which the execution result is recorded to the archive queue. The scheduling device may record the execution result of execution timeout in the semaphore corresponding to the data processing task and add the semaphore in which the execution result is recorded to the archive queue.
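By way of illustration only, the division of responsibility for recording results can be sketched in Python. The `Semaphore` fields, the role names `"platform"` and `"scheduler"`, and the function `record_and_archive` are all assumptions made for the example; the embodiment does not specify the layout of a semaphore.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Which component may record which execution result.
ALLOWED = {
    "platform": {"success", "failure"},
    "scheduler": {"timeout"},
}


@dataclass
class Semaphore:
    task_id: str
    run_id: int
    result: Optional[str] = None  # filled in when execution ends


def record_and_archive(sem, result, recorded_by, archive_queue):
    """Record the execution result in the semaphore and add it to the archive queue."""
    if result not in ALLOWED.get(recorded_by, set()):
        raise ValueError(f"{recorded_by!r} may not record {result!r}")
    sem.result = result
    archive_queue.append(sem)
```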
For the cases of execution success, execution failure, and execution timeout of the data processing task, the transfer flow of the corresponding semaphore is shown in FIG. 2 to FIG. 4, respectively, where the sequence numbers in FIG. 2 to FIG. 4 indicate the order in which the semaphore moves.
Optionally, in order to permanently store the execution results of all the data processing tasks, the data processing system may further include an archiving device, and the step after the scheduling device or the data processing platform adds the semaphore carrying the execution result to the archiving queue may include: when the semaphore corresponding to the data processing task is detected from the archiving queue, the archiving device acquires the semaphore, reads the execution result, and stores the execution result in a task execution result set in an external database.
In one embodiment, the archiving device may periodically detect whether any semaphore exists in the archive queue. When semaphores corresponding to data processing tasks are detected in the archive queue, the archiving device may acquire all semaphores stored in the archive queue, read the execution result of the data processing task corresponding to each semaphore, and store the execution results in a task execution result set in an external database. The task execution result set may subsequently serve as the data source for displaying an execution result page and for raising alarms on abnormal execution of data processing tasks. The external database may be a distributed storage device. It should be noted that the archiving device may itself be a distributed device comprising a plurality of archiving hosts. The archiving hosts may enable a distributed lock service and request the lock according to a preset period; at any time only one archiving host holds the lock. The archiving host that holds the lock performs the function of the archiving device, and the other archiving hosts serve as failover backups for the archiving device.
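By way of illustration only, the single-active-archiver behaviour can be sketched in Python. A local `threading.Lock` stands in for the distributed lock service, and a plain list stands in for the task execution result set in the external database; both substitutions, and the function name `archive_once`, are assumptions made for the example.

```python
import threading
from collections import deque


def archive_once(host, lock, archive_queue, result_store):
    """One periodic attempt by an archiving host.

    Only the host that wins the lock drains the archive queue; any other
    host returns immediately and remains a standby.
    """
    if not lock.acquire(blocking=False):
        return 0  # another host is the active archiver
    try:
        stored = 0
        while archive_queue:
            sem = archive_queue.popleft()
            result_store.append({
                "task_id": sem["task_id"],
                "result": sem["result"],
                "archived_by": host,
            })
            stored += 1
        return stored
    finally:
        lock.release()
```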
Optionally, in order to raise an alarm automatically when a data processing task executes abnormally, the data processing system may further comprise an alarm device. Accordingly, the step after the archiving device stores the execution result in the task execution result set in the external database may include: when it is detected from the task execution result set that the number of times the execution result of the target data processing task is execution failure or execution timeout is greater than a preset number of abnormal executions, the alarm device raises an alarm.
In one embodiment, a schematic diagram of the overall framework of the data processing system may be seen in FIG. 5. In order to raise a timely alarm for a data processing task that still executes abnormally after being re-executed multiple times, a preset number of abnormal executions may be set on the alarm device in the data processing system; when the number of abnormal executions of a data processing task exceeds this preset number, the alarm device automatically raises an alarm to notify operation and maintenance personnel. For example, the alarm device may access the task execution result set of the external database according to a certain statistical period, count the number of times in each period that the execution result of the target data processing task is execution failure or execution timeout, and raise an alarm when the count is greater than the preset number of abnormal executions. It should be noted that the alarm device may also set different alarm triggering conditions for different timing tasks. For example, the preset number of abnormal executions for the data processing tasks corresponding to a timing task may be set according to the importance of that timing task, and a smaller value may be used for more important timing tasks.
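By way of illustration only, the counting rule can be sketched in Python. The record layout, the per-task `thresholds` mapping, and the default threshold of 3 are assumptions made for the example; the embodiment leaves these values to configuration.

```python
from collections import Counter


def tasks_to_alarm(results, thresholds, default_threshold=3):
    """Return the task ids whose failure-plus-timeout count in one statistical
    period is greater than their preset number of abnormal executions."""
    abnormal = Counter(
        r["task_id"] for r in results if r["result"] in ("failure", "timeout")
    )
    return sorted(
        task_id for task_id, count in abnormal.items()
        if count > thresholds.get(task_id, default_threshold)
    )
```

A more important timing task can be given a smaller entry in `thresholds`, so that it triggers an alarm after fewer abnormal executions.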
Optionally, in order to ensure that a data processing task that processes application data within a short time window can be executed in time, in any of the embodiments described above the data processing platform may use streaming computing to execute the data processing task. Correspondingly, the method provided by the invention may further include the following step: the data processing platform detects, according to a preset period and through a resident process corresponding to the target timing task, whether a semaphore exists in the queue to be executed.
In one embodiment, when configuring a target timing task, an operator of the service application may create and start a resident process corresponding to the target timing task on the data processing platform, and establish a correspondence between the target timing task and the resident process through the task ID of the target timing task. The data processing platform can then detect, according to a preset period and through the resident process corresponding to the task ID, whether a semaphore exists in the queue to be executed in the target task queue set.
It should be noted that the data processing platform may use any computing framework capable of implementing streaming computing, which is not limited by the present invention. For example, the data processing platform may use the Storm computing framework: a spout thread in the resident process periodically detects whether a semaphore exists in the queue to be executed in the target task queue set and acquires the target semaphore and the corresponding data to be processed, and a bolt thread in the resident process executes the target data processing task corresponding to the target semaphore, which is not described in detail herein. It should be noted that each resident process may include a plurality of spout threads and a plurality of bolt threads, so the resident process corresponding to the target timing task may execute a plurality of data processing tasks corresponding to the target timing task simultaneously.
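By way of illustration only, the division of work between a polling thread and worker threads inside a resident process can be sketched with plain Python threads. This sketch does not use the Storm API; the names `resident_process`, `spout`, and `bolt` merely mirror the roles described above, and the polling period and thread counts are assumptions.

```python
import queue
import threading

_STOP = object()  # sentinel telling a bolt thread to exit


def resident_process(pending, handler, stop, n_bolts=2, poll_seconds=0.01):
    """Start one spout thread that polls the queue to be executed and
    n_bolts bolt threads that run the task for each semaphore."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def spout():
        # Poll the pending queue periodically until asked to stop.
        while not stop.is_set():
            try:
                sem = pending.get(timeout=poll_seconds)
            except queue.Empty:
                continue
            work.put(sem)
        for _ in range(n_bolts):
            work.put(_STOP)

    def bolt():
        while True:
            sem = work.get()
            if sem is _STOP:
                return
            out = handler(sem)
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=spout)]
    threads += [threading.Thread(target=bolt) for _ in range(n_bolts)]
    for t in threads:
        t.start()
    return threads, results
```

Because the threads live for as long as the process does, no job process has to be created or destroyed for each individual data processing task.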
Compared with the prior art, the embodiments of the invention have the following advantages. First, data processing tasks are scheduled by scheduling their corresponding semaphores, and the semaphores are stored in the task queue set, which reduces the coupling between the scheduling device and the data processing platform and strengthens the ability of each to operate independently. Second, the execution states of data processing tasks are classified by the various queues in the task queue set, and the semaphores corresponding to data processing tasks whose execution timed out or failed are rescheduled, so that those tasks are re-executed automatically without manual intervention. Third, the execution results of the data processing tasks are stored uniformly and permanently in an external database by the archiving device, so that operation and maintenance personnel of the service application can conveniently review and analyze historical execution results. Fourth, the alarm device periodically acquires the execution results stored in the external database, so that an alarm can be raised on abnormal execution of data processing tasks. Fifth, the data processing platform can process data using streaming computing through resident processes, so a large number of job processes for executing data processing tasks do not need to be created and deleted frequently, which improves the stability of the data processing system in operation and maintenance.
Based on the same technical concept, an embodiment of the invention also provides a data processing system comprising a scheduling device and a data processing platform, wherein the scheduling device is used for acquiring a target timing task and generating a target task queue set corresponding to the target timing task, and the target task queue set at least comprises a queue to be executed, an execution queue and a failure queue;
the scheduling device is further used for generating data processing tasks at regular times based on the target timing task and adding the semaphores corresponding to the data processing tasks to the queue to be executed;
when a target semaphore is detected in the queue to be executed, the data processing platform is used for transferring the target semaphore to the execution queue and executing a target data processing task corresponding to the target semaphore;
and if the target data processing task executes abnormally, the scheduling device is further used for re-adding the target semaphore to the queue to be executed.
Optionally, the data processing system may further comprise a management device; before the scheduling device acquires the target timing task, the management device is used for configuring the target timing task based on a CRON expression and adding the CRON expression to a task configuration table, so that the scheduling device reads the target timing task from the task configuration table.
Optionally, the data processing platform is further configured to obtain the target semaphore from the queue to be executed according to an arrangement order of the target semaphore in the queue to be executed; if a modification request of the arrangement sequence of the target semaphore is detected, the scheduling device is further configured to modify the arrangement sequence of the target semaphore according to the modification request, so that the data processing platform obtains the target semaphore from the queue to be executed according to the modified arrangement sequence.
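By way of illustration only, modifying the arrangement order of a semaphore can be sketched in Python with a `collections.deque` standing in for the queue to be executed. The function name `move_semaphore` and the use of a positional index to express the modification request are assumptions made for the example.

```python
from collections import deque


def move_semaphore(pending, semaphore, new_index):
    """Scheduler side: change the position of a semaphore in the queue to be
    executed. Raises ValueError if the semaphore is not in the queue."""
    pending.remove(semaphore)
    pending.insert(new_index, semaphore)
```

Because the data processing platform always takes semaphores from the head of the queue, moving a semaphore to index 0 causes its task to be executed next.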
Optionally, the target task queue set may further include a failure queue; if the target data processing task fails to execute, the data processing platform is specifically configured to transfer the target semaphore to the failure queue; the scheduling device is configured to transfer the target semaphore to the queue to be executed when the target semaphore is detected in the failure queue.
Optionally, the target task queue set may further include a timeout queue; if the storage time of the target semaphore in the execution queue exceeds a preset storage time, the scheduling device is specifically configured to transfer the target semaphore to the timeout queue; the scheduling device is configured to transfer the target semaphore to the queue to be executed when the target semaphore is detected in the timeout queue.
Optionally, the target task queue set may further include an archive queue; the data processing platform is also used for recording the execution result of successful or failed execution of the data processing task in the semaphore and adding the semaphore recorded with the execution result to the archiving queue; the scheduling device is further configured to record an execution result of the execution timeout of the data processing task in the semaphore, and add the semaphore with the recorded execution result to the archive queue.
Further, the system may also include an archiving device; when the semaphore corresponding to the data processing task is detected from the archiving queue, the archiving device is used for acquiring the semaphore, reading the execution result, and storing the execution result in a task execution result set in an external database.
Further, the system may also comprise an alarm device; when it is detected from the task execution result set that the number of times the execution result of the target data processing task is execution failure or execution timeout is greater than a preset number of abnormal executions, the alarm device raises an alarm.
Optionally, the data processing platform performs the data processing task using streaming computing; the data processing platform can be used for detecting whether the semaphore exists in the queue to be executed or not according to a preset period through a resident process corresponding to the target timing task.
The scheduling device and the data processing platform provided in this embodiment jointly implement the above-mentioned method for executing the data processing task. The principle of implementation and the technical effects to be achieved are already discussed above and are not described in detail here.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the foregoing technical solution, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer to perform the method described in the various embodiments or in some parts of the embodiments.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of protection of the invention.

Claims (12)

1. A method of performing data processing tasks, the method being applied to a data processing system, the system comprising a scheduling device and a data processing platform, the method comprising:
the scheduling device acquires a target timing task and generates a target task queue set corresponding to the target timing task, wherein the target task queue set at least comprises a queue to be executed and an execution queue;
the scheduling device generates data processing tasks at regular times based on the target timing task, and adds a semaphore corresponding to each data processing task to the queue to be executed;
when a target semaphore is detected in the queue to be executed, the data processing platform transfers the target semaphore to the execution queue and executes a target data processing task corresponding to the target semaphore;
and if the target data processing task executes abnormally, the scheduling device re-adds the target semaphore to the queue to be executed.
2. The method of claim 1, wherein the system further comprises a management device;
the steps before the scheduling device acquires the target timing task include:
the management device configures the target timed task based on a CRON expression and adds the CRON expression to a task configuration table to cause the scheduling device to read the target timed task from the task configuration table.
3. The method of claim 1, wherein the method further comprises:
the data processing platform acquires the target semaphore from the queue to be executed according to the arrangement sequence of the target semaphore in the queue to be executed;
if a modification request of the execution priority of the target data processing task is detected, the scheduling device modifies the arrangement sequence of the target semaphore in the queue to be executed, so that the data processing platform obtains the target semaphore from the queue to be executed according to the modified arrangement sequence.
4. The method of claim 1, wherein the set of target task queues further comprises a failure queue;
the step of the scheduling device re-adding the target semaphore to the queue to be executed if the target data processing task is abnormal in execution includes:
if the target data processing task fails to execute, the data processing platform transfers the target semaphore to the failure queue;
when the target semaphore is detected in the failure queue, the scheduling device transfers the target semaphore to the queue to be executed.
5. The method of claim 1, wherein the set of target task queues further comprises a timeout queue;
the step of the scheduling device re-adding the target semaphore to the queue to be executed if the target data processing task is abnormal in execution includes:
if the storage time of the target semaphore in the execution queue exceeds the preset storage time, the scheduling device transfers the target semaphore to the timeout queue;
when the target semaphore is detected in the timeout queue, the scheduling device transfers the target semaphore to the queue to be executed.
6. The method of claim 1, wherein the set of target task queues further comprises an archive queue;
the method further comprises the steps of:
the data processing platform records the execution result of successful or failed execution of the data processing task in the semaphore and adds the semaphore recorded with the execution result to the archiving queue;
the scheduling device records the execution result of the execution timeout of the data processing task in the semaphore and adds the semaphore recorded with the execution result to the archive queue.
7. The method of claim 6, wherein the system further comprises an archiving device;
the step after the scheduling device or the data processing platform adds the semaphore recorded with the execution result to the archive queue includes:
when the semaphore corresponding to the data processing task is detected from the archiving queue, the archiving device acquires the semaphore, reads an execution result recorded in the semaphore, and stores the execution result in a task execution result set in an external database.
8. The method of claim 7, wherein the system further comprises an alarm device;
the step after the archiving device stores the execution result in the task execution result set in the external database includes:
when it is detected from the task execution result set that the number of times the execution result of the target data processing task is execution failure or execution timeout is greater than a preset number of abnormal executions, the alarm device raises an alarm.
9. The method of any of claims 1-8, wherein the data processing platform employs streaming computing to perform the data processing tasks;
the method further comprises the steps of:
the data processing platform detects, according to a preset period and through a resident process corresponding to the target timing task, whether the semaphore exists in the queue to be executed.
10. A data processing system, characterized in that the data processing system comprises a scheduling device and a data processing platform, wherein,
the scheduling device is used for acquiring a target timing task and generating a target task queue set corresponding to the target timing task, wherein the target task queue set at least comprises a queue to be executed, an execution queue and a failure queue;
the scheduling device is further used for generating data processing tasks at regular times based on the target timing task and adding the semaphores corresponding to the data processing tasks to the queue to be executed;
when a target semaphore is detected in the queue to be executed, the data processing platform is used for transferring the target semaphore to the execution queue and executing a target data processing task corresponding to the target semaphore;
and if the target data processing task executes abnormally, the scheduling device is further used for re-adding the target semaphore to the queue to be executed.
11. A system according to claim 10, wherein the system is adapted to implement a method of performing a data processing task according to any of claims 2 to 8.
12. The system of claim 10, wherein the system is configured to implement the method of performing data processing tasks of claim 9.
CN201910327896.7A 2019-04-23 2019-04-23 Data processing system and method for executing data processing task Active CN110309024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910327896.7A CN110309024B (en) 2019-04-23 2019-04-23 Data processing system and method for executing data processing task


Publications (2)

Publication Number Publication Date
CN110309024A CN110309024A (en) 2019-10-08
CN110309024B true CN110309024B (en) 2023-07-18

Family

ID=68074529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910327896.7A Active CN110309024B (en) 2019-04-23 2019-04-23 Data processing system and method for executing data processing task

Country Status (1)

Country Link
CN (1) CN110309024B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888760A (en) * 2019-11-26 2020-03-17 中国工商银行股份有限公司 Data recovery method and device, and data processing method and device
CN111209112A (en) * 2019-12-31 2020-05-29 杭州迪普科技股份有限公司 Exception handling method and device
CN111290744B (en) * 2020-01-22 2023-07-21 北京百度网讯科技有限公司 Stream type computing job processing method, stream type computing system and electronic equipment
CN113672358A (en) * 2020-05-15 2021-11-19 北京沃东天骏信息技术有限公司 Timing task processing method, device and system, electronic equipment and storage medium
CN111625391B (en) * 2020-05-29 2023-06-13 北京思特奇信息技术股份有限公司 Task processing method, system and electronic equipment
CN111562969B (en) * 2020-07-15 2020-10-20 百度在线网络技术(北京)有限公司 Intelligent contract implementation method, device, equipment and medium for block chain
CN111930490B (en) * 2020-09-25 2021-06-15 武汉中科通达高新技术股份有限公司 Streaming media task management method and device
CN112416589A (en) * 2020-11-21 2021-02-26 广州西麦科技股份有限公司 Method for timing operation peak-shifting execution of operation and maintenance platform
CN112965799B (en) * 2021-03-05 2023-08-18 北京百度网讯科技有限公司 Task state prompting method and device, electronic equipment and medium
CN113873033B (en) * 2021-09-27 2022-12-13 江苏方天电力技术有限公司 Intelligent edge computing gateway platform with fault-tolerant function
CN113975815B (en) * 2021-11-04 2022-12-23 上海鱼尔网络科技有限公司 Task transfer method, system, device, equipment and storage medium
CN116501477B (en) * 2023-06-28 2023-09-15 中国电子科技集团公司第十五研究所 Automatic data processing method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013110816A2 (en) * 2012-01-27 2013-08-01 Tymis Method of using a shared memory
WO2014173339A1 (en) * 2013-08-07 2014-10-30 中兴通讯股份有限公司 Task scheduling service system and method
WO2016112701A1 (en) * 2015-01-16 2016-07-21 华为技术有限公司 Method and device for task scheduling on heterogeneous multi-core reconfigurable computing platform
CN109634733A (en) * 2018-12-13 2019-04-16 成都四方伟业软件股份有限公司 Task scheduling and managing method, device and operation management server

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3394416B2 (en) * 1997-04-24 2003-04-07 三菱電機株式会社 Program trace device
WO2015165073A1 (en) * 2014-04-30 2015-11-05 Oracle International Corporation System and method for supporting adaptive self-tuning locking mechanism in transactional middleware machine environment
US9772888B2 (en) * 2013-04-10 2017-09-26 Wind River Systems, Inc. Semaphore with timeout and lock-free fast path for message passing architectures
JP6462521B2 (en) * 2015-07-31 2019-01-30 株式会社日立超エル・エス・アイ・システムズ An API that prevents a normal part failure from propagating to a safety part and its processing part
CN106681836B (en) * 2016-12-28 2021-03-05 华为技术有限公司 Semaphore creation method and semaphore creation device
CN106844151B (en) * 2017-01-04 2019-11-12 南京国电南自电网自动化有限公司 A kind of network task method for detecting abnormality of VxWorks system
CN106933681B (en) * 2017-02-05 2019-10-11 深圳怡化电脑股份有限公司 Multi-object blocking method and system
CN108536532B (en) * 2018-04-23 2021-06-22 中国农业银行股份有限公司 Batch task processing method and system

Also Published As

Publication number Publication date
CN110309024A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
CN110309024B (en) Data processing system and method for executing data processing task
US8689050B2 (en) Restarting event and alert analysis after a shutdown in a distributed processing system
US8943366B2 (en) Administering checkpoints for incident analysis
US20130073726A1 (en) Restarting event and alert analysis after a shutdown in a distributed processing system
US9170860B2 (en) Parallel incident processing
US10545807B2 (en) Method and system for acquiring parameter sets at a preset time interval and matching parameters to obtain a fault scenario type
US9086968B2 (en) Checkpointing for delayed alert creation
JP3951835B2 (en) Business management method and business processing system
WO2012056561A1 (en) Device monitoring system, method, and program
CN111782360A (en) Distributed task scheduling method and device
US20120260260A1 (en) Managing Job Execution
JP2012099092A (en) Management method, system, and computer program for incident pool
CN104601668B (en) Data push method, device and system based on condition managing
JP2005011023A (en) Job scheduling method and system
CN112200505B (en) Cross-business system process monitoring device and method, corresponding equipment and storage medium
CN110659147B (en) Self-repairing method and system based on module self-checking behavior
CN115543740A (en) Method, system, equipment and storage medium for monitoring abnormal operation of service
US20150281037A1 (en) Monitoring omission specifying program, monitoring omission specifying method, and monitoring omission specifying device
JP2010176303A (en) Batch processing system, information terminal apparatus for use in the same, and method for recovering batch processing
US20160266808A1 (en) Information processing device, information processing method, and recording medium
CN107168849B (en) Task scheduling operation monitoring method and device
CN105897498A (en) Business monitoring method and device
US20120210176A1 (en) Method for controlling information processing apparatus and information processing apparatus
US8589354B1 (en) Probe based group selection
CN115168137A (en) Monitoring method and system for timing task, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant