CN109684053B - Task scheduling method and system for big data - Google Patents


Info

Publication number
CN109684053B
Authority
CN
China
Prior art keywords
task
execution
scheduling
template
tasks
Prior art date
Legal status
Active
Application number
CN201811308063.8A
Other languages
Chinese (zh)
Other versions
CN109684053A (en)
Inventor
方秋水
刘强
何建兵
陈卫国
吴金成
罗鸣鸣
冷梦甜
Current Assignee
Guangdong Lingnanpass Co ltd
Original Assignee
Guangdong Lingnanpass Co ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Lingnanpass Co ltd
Priority to CN201811308063.8A
Publication of CN109684053A
Application granted
Publication of CN109684053B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a big data task scheduling method comprising the following steps: creating a task template according to the task type; selecting a task template and a task scheduling service number to create a task and form a configuration file, wherein the created task comprises a task name, task content and a task execution period, the task content is configured as key-value (kv) pairs, and dependencies between tasks are defined in a kv file format; and reading the task, generating a task instance, and acquiring task execution process information. The invention also discloses a big data task scheduling system. Taking into account the particular scheduling, distribution and execution requirements of big data scenarios and their different data types, the invention emphasizes features such as separation by task type, multiple triggering modes, policy-based scheduling and lineage dependencies, builds an internal scheduling ecosystem for big data, and manages big data tasks according to different policies for different task types.

Description

Task scheduling method and system for big data
Technical Field
The invention relates to the technical field of big data task scheduling management, in particular to a big data task scheduling method and a big data task scheduling system.
Background
In big data business applications, as business metrics are iterated and grow more complex, managing the related big data applications becomes a headache: problems such as dependency-based job scheduling, monitoring of task execution, and detection of abnormal conditions complicate daily work.
In big data analysis systems, some scripts or execution units must start at a specific time, and others may start only after certain conditions are met. This is difficult to handle manually. Some systems do offer timed-task configuration, but such configuration is cumbersome to manage, and some of it must intrude into the system of the execution machine, which introduces serious risks.
Disclosure of Invention
To overcome the shortcomings of the prior art, one object of the invention is to provide a big data task scheduling method that takes into account the particular scheduling, distribution and execution requirements of big data scenarios and their different data types, emphasizes features such as separation by task type, multiple triggering modes, policy-based scheduling and lineage dependencies, builds an internal scheduling ecosystem for big data, and manages big data tasks according to different policies for different task types.
A second object of the invention is to provide a big data task scheduling system that better solves the problem of manual configuration management and, on the same basis (the scheduling, distribution and execution requirements of big data scenarios and their data types), provides separation by task type, multiple triggering modes, policy-based scheduling and lineage dependencies, builds an internal scheduling ecosystem for big data, and manages big data tasks according to different policies for different task types.
The first object of the invention is achieved by the following technical solution:
A big data task scheduling method, comprising the following steps:
creating a task template according to the task type;
selecting a task template and a task scheduling service number to create a task and form a configuration file, wherein the created task comprises a task name, task content and a task execution period, the task content is configured as key-value (kv) pairs, and dependencies between tasks are defined in a kv file format;
and reading the task, generating a task instance, and acquiring task execution process information.
Further, creating a task template according to the task type comprises:
setting a template name and/or automatically generating a template ID;
generating template data items, and entering, according to the task type, the corresponding value and attributes of each template data item.
Further, selecting a task template and a task scheduling service number to create a task comprises:
creating a task name, and selecting a scheduling type and a task execution period;
creating the task by selecting a task template according to the task type and configuring the task content as kv pairs; and defining dependencies between tasks in a kv file format.
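The task record created in this step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `key=value` line syntax for kv content, the period names (`once`, `hourly`, `daily`, `weekly`), and the names `parse_kv`, `Task` and `create_task` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical period identifiers; the document mentions one-off tasks and
# periodic trigger points (hourly/daily/weekly) but does not name them.
PERIODS = {"once", "hourly", "daily", "weekly"}


def parse_kv(text: str) -> dict[str, str]:
    """Parse 'key=value' lines; blank lines and '#' comments are ignored."""
    pairs: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        pairs[key.strip()] = value.strip()
    return pairs


@dataclass
class Task:
    name: str
    template_id: int
    service_no: str        # task scheduling service number
    schedule_type: str     # e.g. "data_cleaning" (hypothetical label)
    period: str
    content: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject periods the scheduler would not know how to trigger.
        if self.period not in PERIODS:
            raise ValueError(f"unknown execution period {self.period!r}")


def create_task(name, template_id, service_no, schedule_type, period, kv_text) -> Task:
    """Build a task record from user input plus its kv-formatted content."""
    return Task(name, template_id, service_no, schedule_type, period, parse_kv(kv_text))
```

Keeping the content as plain string pairs mirrors the flexibility the text claims for kv configuration: each task type can interpret its own keys.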
Further, reading the task, generating a task instance, and acquiring task execution process information comprises:
generating a task execution list according to the task instance;
and monitoring the task execution list, and executing a task when it meets its trigger condition.
Further, generating a task execution list according to the task instance comprises:
reading the configuration file to obtain the task check interval and the task generation time range;
finding all tasks in the to-be-checked state whose next execution time lies within the task generation time range from the current time, together with their execution times, wherein a to-be-checked task is a task for which no task execution list entry has yet been generated;
generating a task execution list from the task name, task content and task execution period of each to-be-checked task, wherein the task execution list comprises the task name, the task execution time and the task priority;
updating the state of each to-be-checked task to "job generated";
monitoring the task execution list and executing a task when it meets its trigger condition comprises:
checking for to-be-checked tasks at the task check interval to generate the task execution list;
and cyclically checking the task execution times in the task execution list at a preset interval and, if the current time has reached a task execution time, performing the following:
creating a task execution sub-thread, calling a different class according to the task type in the task content, and reading the task content; parsing the parameters of the task content to generate a task instance, and executing the target task according to the task instance, the target task being the task whose execution time has been reached.
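The list-generation steps above can be sketched as follows. This is an illustrative sketch, not the patented code: the INI section and key names, the state labels `pending_check` and `job_generated`, the convention that a smaller priority number is more urgent, and the choice to also pick up overdue tasks are all assumptions.

```python
from __future__ import annotations

import configparser
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical state labels for "to be checked" and "job generated".
PENDING_CHECK = "pending_check"
JOB_GENERATED = "job_generated"


def load_settings(ini_text: str) -> tuple[timedelta, timedelta]:
    """Read the check interval and generation range (section/key names are assumed)."""
    cp = configparser.ConfigParser()
    cp.read_string(ini_text)
    section = cp["scheduler"]
    return (timedelta(seconds=section.getint("check_interval_seconds")),
            timedelta(minutes=section.getint("generation_range_minutes")))


@dataclass
class TaskInfo:
    name: str
    next_run: datetime
    priority: int = 5            # assumption: smaller number = more urgent
    state: str = PENDING_CHECK


@dataclass(frozen=True)
class ListEntry:
    name: str
    run_time: datetime
    priority: int


def generate_execution_list(tasks: list[TaskInfo], now: datetime,
                            generation_range: timedelta) -> list[ListEntry]:
    """Pick pending tasks due within the generation range and mark them as generated."""
    horizon = now + generation_range
    entries = []
    for task in tasks:
        # Overdue tasks (next_run < now) are also picked up; this is a design choice.
        if task.state == PENDING_CHECK and task.next_run <= horizon:
            entries.append(ListEntry(task.name, task.next_run, task.priority))
            task.state = JOB_GENERATED   # prevents a duplicate entry on the next check
    entries.sort(key=lambda e: (e.run_time, e.priority))
    return entries
```

Marking each selected task immediately is what makes repeated checks at the check interval idempotent: a second call with the same inputs returns an empty list.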
The second object of the invention is achieved by the following technical solution:
A big data task scheduling system, comprising:
a task template creation module, configured to create a task template according to the task type;
a task scheduling management module, configured to select a task template and a task scheduling service number to create a task and form a configuration file, wherein the created task comprises a task name, task content and a task execution period, the task content is configured as kv pairs, and dependencies between tasks are defined in a kv file format;
and a task execution module, configured to read the task, generate a task instance, and acquire task execution process information.
Further, the task template creation module comprises:
a setting unit, configured to set a template name and/or automatically generate a template ID;
a first generation unit, configured to generate template data items and to enter, according to the task type, the corresponding value and attributes of each template data item.
Further, the task scheduling management module comprises:
a first creation unit, configured to create a task name and select a scheduling type and a task execution period;
a second creation unit, configured to create the task by selecting a task template according to the task type and configuring the task content as kv pairs, and to define dependencies between tasks in a kv file format.
Further, the task execution module comprises:
a second generation unit, configured to generate a task execution list according to the task instance;
and a trigger unit, configured to monitor the task execution list and execute a task when it meets its trigger condition.
Further, the second generation unit comprises:
a reading subunit, configured to read the configuration file to obtain the task check interval and the task generation time range;
a detection subunit, configured to find all tasks in the to-be-checked state whose next execution time lies within the task generation time range from the current time, together with their execution times, wherein a to-be-checked task is a task for which no task execution list entry has yet been generated;
a first generation subunit, configured to generate a task execution list from the task name, task content and task execution period of each to-be-checked task, wherein the task execution list comprises the task name, the task execution time and the task priority;
an updating subunit, configured to update the state of each to-be-checked task to "job generated";
the trigger unit comprises:
a second generation subunit, configured to check for to-be-checked tasks at the task check interval so as to generate the task execution list;
a judging subunit, configured to cyclically check the task execution times in the task execution list at a preset interval and, if the current time has reached a task execution time, to trigger:
an execution subunit, configured to create a task execution sub-thread, call a different class according to the task type in the task content, and read the task content; to parse the parameters of the task content to generate a task instance; and to execute the target task according to the task instance, the target task being the task whose execution time has been reached.
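The two duties of the trigger unit (regenerating the list at the check interval, and handing over entries whose time has come on each short poll) can be sketched with an externally driven clock, which keeps the logic testable. The class name `Monitor`, the `generate` hook and its `(name, run_time)` return format are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple

Entry = Tuple[str, datetime]


class Monitor:
    """Sketch of the trigger unit's two duties, driven by the caller's clock."""

    def __init__(self, check_interval: timedelta,
                 generate: Callable[[datetime], List[Entry]]):
        self.check_interval = check_interval
        self.generate = generate          # hypothetical hook returning new list entries
        self.last_generation: Optional[datetime] = None
        self.schedule: List[Entry] = []

    def tick(self, now: datetime) -> List[str]:
        # Duty 1: extend the execution list once per check interval.
        if self.last_generation is None or now - self.last_generation >= self.check_interval:
            self.schedule.extend(self.generate(now))
            self.last_generation = now
        # Duty 2: return every entry whose execution time has been reached.
        due = [name for name, run_time in self.schedule if run_time <= now]
        self.schedule = [(n, t) for n, t in self.schedule if t > now]
        return due
```

In a live scheduler, a loop would call `tick(datetime.now())` and then `time.sleep(1)`, matching the 1-second (configurable) check described in the embodiment, and pass each returned name to the execution subunit.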
Compared with the prior art, the invention has the following beneficial effects:
1. Timed, planned triggering: flexible trigger points (daily, weekly, hourly, etc.) are set for different task types, computation tasks are split by time period and executed in parallel as far as possible, which shortens execution time and enlarges the overall time window available for task execution.
2. Flexible dependencies between tasks: any task can serve as the parent task of another task for dependency triggering; task executions can depend on one another, and if an upstream task fails, the downstream tasks that depend on it are not executed.
3. Flexible and varied alarm rules: task failures trigger timely and effective alarms, which eases the work of operations and maintenance staff. Besides failure alarms, rules such as a timeout for unfinished tasks and a timeout for tasks that have not started are supported.
4. A complete, easy-to-use Web user interface through which users configure, submit, query and monitor tasks and their dependencies.
5. Complete logging: the standard output and standard error produced while a task runs are collected, recorded and exposed over HTTP, so a user can view a task's run log by opening the log URL corresponding to that task.
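The three alarm rules in item 3 can be expressed as a small evaluation function. This sketch assumes hypothetical run states (`waiting`, `running`, `success`, `failed`) and two global thresholds; a real platform would likely allow per-task thresholds and would route alarms to a notification channel rather than return them.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RunStatus:
    name: str
    planned_start: datetime
    state: str                          # hypothetical: waiting/running/success/failed
    started_at: Optional[datetime] = None


def evaluate_alarms(runs: list[RunStatus], now: datetime,
                    start_grace: timedelta, max_runtime: timedelta) -> list[tuple[str, str]]:
    """Return (task, reason) pairs for the failure, not-started and unfinished rules."""
    alarms = []
    for run in runs:
        if run.state == "failed":
            alarms.append((run.name, "failed"))
        elif run.state == "waiting" and now - run.planned_start > start_grace:
            alarms.append((run.name, "not started in time"))
        elif (run.state == "running" and run.started_at is not None
              and now - run.started_at > max_runtime):
            alarms.append((run.name, "not finished in time"))
    return alarms
```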
Drawings
FIG. 1 is a flow chart of a task scheduling method of big data according to the present invention;
FIG. 2 is a flow chart of creating a task template in accordance with the present invention;
FIG. 3 is a flow chart of task scheduling management according to the present invention;
FIG. 4 is a flow chart of task execution according to the present invention;
FIG. 5 is a block diagram of a big data task scheduling system according to the present invention.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments. It should be understood that, provided there is no conflict, the embodiments or technical features described below may be combined arbitrarily to form new embodiments.
In the big data task scheduling method, a big data task scheduling platform (implemented in software and/or hardware) is built that provides fast, efficient and flexible task scheduling suited to the scheduling characteristics and task types of the different stages of data acquisition, data cleaning and data analysis. Referring to FIG. 1, the method comprises the following steps:
110. Create a task template according to the task type.
To execute scheduling tasks better, different task templates are customized according to task type and stored in a database; they can then be retrieved to create scheduling tasks quickly.
A task template is created mainly by configuring different parameters for different task types and scheduling tasks, which removes the need to configure scheduling tasks repeatedly by hand. Task templates define the rules of data processing tasks such as data acquisition, data cleaning and data analysis, carry the parameters of those tasks, and serve as the basis for creating tasks. The template creation process is shown in FIG. 2: different task templates are customized according to task type, stored in a database, and offered for selection when scheduling tasks are created. Specifically, a template name is set and/or a template ID is generated automatically; template data items are generated, and the value and attributes of each data item are entered according to the task type; the data items together with their values and attributes form the template content; and the template state is entered. In other words, a task template consists mainly of three input values (template name, template content and template state) plus an automatically generated template number (template ID). Each task template is an operator. The operator is the smallest unit in the platform: each operator carries out one piece of business logic, i.e., maps to one actual operation such as invoking a script or calling an interface. The big data task scheduling platform has many types of operators, each with its own function and applicability.
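Storing templates in a database with an auto-generated ID can be sketched with Python's built-in `sqlite3`. The table and column names, the `enabled` state value, and the JSON layout `{item: {"value": ..., "attrs": {...}}}` for template content are assumptions chosen for illustration; the patent does not specify its schema.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # INTEGER PRIMARY KEY AUTOINCREMENT lets SQLite generate the template ID.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_template ("
        " template_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " task_type TEXT NOT NULL,"
        " content TEXT NOT NULL,"   # JSON: {item: {"value": ..., "attrs": {...}}}
        " state TEXT NOT NULL)"
    )


def create_template(conn, name, task_type, items, state="enabled") -> int:
    """Insert a template (name, content, state) and return its generated ID."""
    cur = conn.execute(
        "INSERT INTO task_template (name, task_type, content, state) VALUES (?, ?, ?, ?)",
        (name, task_type, json.dumps(items), state),
    )
    conn.commit()
    return cur.lastrowid


def templates_for(conn, task_type):
    """List enabled templates of one task type, for selection when creating a task."""
    rows = conn.execute(
        "SELECT template_id, name, content FROM task_template"
        " WHERE task_type = ? AND state = 'enabled' ORDER BY template_id",
        (task_type,),
    ).fetchall()
    return [(tid, name, json.loads(content)) for tid, name, content in rows]
```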
120. Select a task template and a task scheduling service number to create a task and form a configuration file, wherein the created task comprises a task name, task content and a task execution period, the task content is configured as kv pairs, and dependencies between tasks are defined in a kv file format.
Different scheduling tasks are created from the task templates; a scheduling task comprises a scheduling name, scheduling period, scheduling type and the tasks to be scheduled. The platform can set parameters for various tasks according to task type to achieve different kinds of scheduling management; during task configuration it can restrict which servers execute a task, classifying tasks by task object and executing them on the designated servers. Task scheduling management mainly sets up different scheduling tasks for the different stages (data acquisition, data cleaning, data analysis, etc.) using different templates. Each scheduling task contains one or more operators chosen from the existing operators; operators can be used alone or chained in series to complete a scheduling task.
Referring to FIG. 3, the process is as follows: different scheduling tasks are created from the task templates, and the task scheduling information table, created and initialized beforehand, is obtained (before step 110, the scheduling service numbers and service names are initialized; these data identify the service to which a task belongs). A task template and a task scheduling service number are selected, and the task content, scheduling plan and other information are filled in to create a task plan.
The created task (i.e., the scheduling task) includes information such as the task name, task content and task execution period. These data are stored in a database, and the task schedulers of the data acquisition, data cleaning and data analysis modules read them and execute the tasks according to the configured content. Functionally, the system supports task scheduling management for the three stages of data acquisition, data cleaning and data analysis, and it is extensible, allowing other scheduling tasks to be configured flexibly according to business needs. The created tasks are processed by Spark-based Python scripts, with one or more Python files per task. The platform defines a kv file format for establishing dependencies between tasks: any task can serve as a parent task for dependency triggering, task executions can depend on one another, and if an upstream task fails, the downstream tasks that depend on it are not executed. Task content can be configured with parameters of different types as kv pairs, which keeps configuration flexible. Through the task execution period, tasks are executed on a time-period basis, so one-off and periodic tasks can both be configured flexibly in one place, and the task scheduler executes tasks at a set time or periodically according to the configured parameters, achieving automatic task execution.
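The dependency rule (parents run first; a failed upstream task blocks its dependents) can be sketched with Python's standard `graphlib` module (3.9+). The dependency-file layout `child=parent1,parent2` is a hypothetical stand-in for the platform's unspecified kv format, and `run` is a placeholder callback returning whether a task succeeded.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter   # Python 3.9+
from typing import Callable


def parse_dependencies(text: str) -> dict[str, set[str]]:
    """Parse lines like 'child=parent1,parent2' (a hypothetical kv layout)."""
    deps: dict[str, set[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        child, _, parents = line.partition("=")
        deps.setdefault(child.strip(), set()).update(
            p.strip() for p in parents.split(",") if p.strip())
    return deps


def run_in_order(deps: dict[str, set[str]], run: Callable[[str], bool]) -> dict[str, str]:
    """Run tasks parents-first; mark a task 'skipped' if any parent did not succeed."""
    try:
        order = list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
    status: dict[str, str] = {}
    for task in order:
        if any(status[p] != "success" for p in deps.get(task, ())):
            status[task] = "skipped"          # upstream failure blocks this task
        else:
            status[task] = "success" if run(task) else "failed"
    return status
```

A topological order guarantees every parent's status is known before its children are considered, so the skip check never looks up a task that has not been processed.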
130. Read the task, generate a task instance, and acquire task execution process information.
Task scheduling is separated from the task execution scripts to achieve low coupling, so modifying the specific content of a task does not affect task scheduling.
Referring to FIG. 4, task execution mainly comprises three parts: task generation and monitoring, log processing, and historical data processing.
Task generation and monitoring further comprises:
A1. Generating the task execution list according to the different task types.
The system configuration file is read to obtain the task check interval and the task generation time range. All tasks in the to-be-checked state whose next execution time lies within the task generation time range from the current time are found, together with their job execution times. A task execution schedule is generated, containing the task number, task execution time, task priority, etc. The task scheduler then updates the state of these tasks in the task basic information table to "job generated".
A2. Monitoring the generated task execution list. Task monitoring has two functions: the first runs task execution list generation at the task check interval; the second checks every 1 second (configurable) whether any task execution time in the schedule has been reached and, if so, performs "task execution".
A3. Executing the scheduled tasks in the task execution list. When a scheduled task meets its trigger condition, the task execution module creates a task execution sub-thread, reads the task content, executes the task, and updates the task information when execution completes. The task execution module first reads the task content, calls a different class according to the task type in the task content, reads the content, parses the parameters, generates a task instance and executes the task. After execution, the last run time, next run time and task state in the task basic information table are updated, the state being reset to "job not generated". The historical data management module is then called to write the task and job execution status into the task execution record table and the job execution record table.
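The class-per-task-type dispatch in A3 can be sketched as a registry lookup followed by a worker thread. The handler classes, the `type` key in the task content, and the `on_done` callback are hypothetical; the handlers here only return a description of the command rather than launching Sqoop or Spark.

```python
from __future__ import annotations

import threading
from typing import Callable


class TaskHandler:
    """Base class: one subclass per task type (names here are hypothetical)."""

    def __init__(self, params: dict[str, str]):
        self.params = params

    def run(self) -> str:
        raise NotImplementedError


class SqoopImport(TaskHandler):
    def run(self) -> str:
        # A real handler would launch the job; this sketch only describes it.
        return f"sqoop import --table {self.params['table']}"


class PySparkJob(TaskHandler):
    def run(self) -> str:
        return f"spark-submit {self.params['pyfilepath']}"


HANDLERS: dict[str, type[TaskHandler]] = {"sqoop": SqoopImport, "pyspark": PySparkJob}


def make_instance(content: dict[str, str]) -> TaskHandler:
    """Choose the class by the 'type' key and pass the remaining pairs as parameters."""
    params = dict(content)              # copy so the stored task content is untouched
    handler_cls = HANDLERS[params.pop("type")]
    return handler_cls(params)


def execute_in_thread(name: str, content: dict[str, str],
                      on_done: Callable[[str, bool, str], None]) -> threading.Thread:
    """Run one task in a sub-thread and report (name, ok, output) when it ends."""
    def worker() -> None:
        try:
            on_done(name, True, make_instance(content).run())
        except Exception as exc:        # report failures instead of killing the scheduler
            on_done(name, False, repr(exc))

    thread = threading.Thread(target=worker, name=f"task-{name}")
    thread.start()
    return thread
```

Adding a new task type then means registering one more subclass, which is in line with the extensibility the description claims.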
Historical data processing mainly consists of writing task execution records into the task execution history table and managing the history files of each stage according to the configuration file.
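The post-run bookkeeping from A3 (last run time, next run time, state reset, execution record) can be sketched as below. The period names, state labels, the rule of skipping slots that elapsed during a long run, and retiring one-off tasks as `finished` are all assumptions; the in-memory `history` list merely stands in for the execution record tables.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical period names and state labels.
STEP = {"hourly": timedelta(hours=1), "daily": timedelta(days=1), "weekly": timedelta(weeks=1)}
JOB_NOT_GENERATED = "job_not_generated"
FINISHED = "finished"       # assumption: one-off tasks are retired after running


@dataclass
class TaskRow:
    """Stand-in for a row of the task basic information table."""
    name: str
    period: str                      # "once" or a key of STEP
    next_run: Optional[datetime]
    last_run: Optional[datetime] = None
    state: str = "job_generated"


def next_run_after(scheduled: datetime, period: str, finished: datetime) -> Optional[datetime]:
    """Next slot on the task's period grid that is strictly after the finish time."""
    if period == "once":
        return None
    step = STEP[period]
    nxt = scheduled + step
    while nxt <= finished:           # skip slots that elapsed while the task ran
        nxt += step
    return nxt


def finish_run(task: TaskRow, started: datetime, finished: datetime, ok: bool,
               history: list[dict]) -> None:
    """Append an execution record, then update last/next run time and state."""
    history.append({"task": task.name, "start": started, "end": finished, "ok": ok})
    task.last_run = started
    scheduled = task.next_run if task.next_run is not None else started
    task.next_run = next_run_after(scheduled, task.period, finished)
    task.state = JOB_NOT_GENERATED if task.next_run is not None else FINISHED
```

Resetting the state lets the next list-generation pass pick the task up again for its new slot.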
Log processing mainly consists of writing the various logs produced while the system runs, using the log classes in the common code.
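Capturing a task's standard output and standard error into per-run files, and exposing them under a URL as described in the beneficial effects, can be sketched with `subprocess`. The directory layout `<log_dir>/<task>/<run_id>/` and the base URL are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical host and port of the log HTTP service.
LOG_BASE_URL = "http://scheduler.example.com:8000"


def run_with_logs(task_name: str, run_id: int, argv: list[str],
                  log_dir: str) -> tuple[int, str]:
    """Run one task process, saving stdout/stderr to per-run files; return (exit code, log URL)."""
    run_dir = Path(log_dir) / task_name / str(run_id)
    run_dir.mkdir(parents=True, exist_ok=True)
    with open(run_dir / "stdout.log", "wb") as out, open(run_dir / "stderr.log", "wb") as err:
        proc = subprocess.run(argv, stdout=out, stderr=err, check=False)
    return proc.returncode, f"{LOG_BASE_URL}/{task_name}/{run_id}/"
```

For a quick trial, `python -m http.server 8000 --directory <log_dir>` would serve these files at matching paths; a production deployment would more likely put them behind the platform's own web server.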
Task scheduling management for data acquisition mainly covers structured data acquisition and network (web) data acquisition.
Structured data acquisition mainly collects record data stored in databases; the task scheduler schedules and executes the tasks to complete the acquisition. Structured data acquisition and scheduling is achieved by configuring Sqoop tasks on top of the scheduler.
When a task is created, task templates of different types are created according to business requirements to implement data extraction tasks.
Web crawler scheduling tasks run Python spider scripts invoked from Java, so the web crawler task template contains the path of the Python script. All such crawler task templates are defined as:
{FilePath:defaultvalue}
Task scheduling management for data cleaning and data analysis: Spark-based Python scripts are used, with one or more Python files per task. The task template sets its parameters in a JSON-like format, as follows:
{pyfilepath:pathvalue}
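Turning the two template formats above into a launch command can be sketched as follows. `spark-submit` and its `--py-files` option are real Spark CLI features; treating a comma-separated `pyfilepath` whose first entry is the entry point as the way to express "one or more Python files", and letting task kv content override template defaults, are assumptions.

```python
from __future__ import annotations


def build_command(template: dict[str, str], overrides: dict[str, str],
                  python: str = "python3") -> list[str]:
    """Turn template parameters (plus task-level kv overrides) into an argv list."""
    params = {**template, **overrides}      # task content wins over template defaults
    if "pyfilepath" in params:
        # Assumption: several files are comma-separated; the first is the entry point.
        files = [p.strip() for p in params["pyfilepath"].split(",") if p.strip()]
        if not files:
            raise ValueError("pyfilepath is empty")
        argv = ["spark-submit"]
        if len(files) > 1:
            argv += ["--py-files", ",".join(files[1:])]
        return argv + [files[0]]
    if "FilePath" in params:
        # Web crawler templates: run the spider script with a Python interpreter.
        return [python, params["FilePath"]]
    raise KeyError("template defines neither pyfilepath nor FilePath")
```

Because the result is a plain argv list, it could be passed to `subprocess.run` in Python or, as in the crawler case described above, to a Java process launcher such as `ProcessBuilder`.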
The big data task scheduling platform predefines various task templates and generates tasks by calling a template and configuring its parameters; the scheduler then fetches task information periodically to generate a task list and executes tasks automatically according to their execution periods. The method creates scheduling tasks through flexible template configuration, manages the dependencies among tasks, and supports the whole life cycle of big data acquisition, cleaning and analysis. It eliminates repeated manual configuration of scheduling tasks, provides flexible, efficient and stable scheduling task management, and supports improving the performance of the overall big data system.
Big data task scheduling plays an overarching role in data ETL: the production, delivery and consumption of data all pass through it. Big data task scheduling management therefore has to start from the characteristics of task scheduling and meet the requirements of the frameworks and business scenarios that use big data, so as to build a highly available, efficient and flexible big data scheduling platform.
The big data task scheduling platform provides a batch workflow task scheduler for running a set of jobs and flows in a specific order within a workflow. It defines a KV file format for establishing the dependencies between tasks and provides an easy-to-use Web user interface for configuring, managing, monitoring and tracking scheduling tasks.
The platform receives workflows submitted by users, communicates with the metadata store, and saves information such as the configuration, dependencies, run history, resource configuration and alarm configuration of scheduling tasks. It is responsible for unified configuration, maintenance, triggering, scheduling and monitoring of tasks, executes the tasks submitted by users, monitors workflows, and stores the information, states and logs of all workflows.
Embodiment Two
A big data task scheduling system, which is the virtual apparatus corresponding to the big data task scheduling method of the first embodiment. Referring to FIG. 5, it comprises:
a task template creation module 510, configured to create a task template according to the task type;
a task scheduling management module 520, configured to select a task template and a task scheduling service number to create a task and form a configuration file, wherein the created task comprises a task name, task content and a task execution period, the task content is configured as kv pairs, and dependencies between tasks are defined in a kv file format;
a task execution module 530, configured to read the task, generate a task instance, and acquire task execution process information.
The task template creation module 510 comprises:
a setting unit, configured to set a template name and/or automatically generate a template ID;
a first generation unit, configured to generate template data items and to enter, according to the task type, the corresponding value and attributes of each template data item.
The task scheduling management module 520 comprises:
a first creation unit, configured to create a task name and select a scheduling type and a task execution period;
a second creation unit, configured to create the task by selecting a task template according to the task type and configuring the task content as kv pairs, and to define dependencies between tasks in a kv file format.
The task execution module 530 comprises:
a second generation unit, configured to generate a task execution list according to the task instance;
and a trigger unit, configured to monitor the task execution list and execute a task when it meets its trigger condition.
Further, the second generation unit comprises:
a reading subunit, configured to read the configuration file to obtain the task check interval and the task generation time range;
a detection subunit, configured to find all tasks in the to-be-checked state whose next execution time lies within the task generation time range from the current time, together with their execution times, wherein a to-be-checked task is a task for which no task execution list entry has yet been generated;
a first generation subunit, configured to generate a task execution list from the task name, task content and task execution period of each to-be-checked task, wherein the task execution list comprises the task name, the task execution time and the task priority;
an updating subunit, configured to update the state of each to-be-checked task to "job generated";
the trigger unit comprises:
a second generation subunit, configured to check for to-be-checked tasks at the task check interval so as to generate the task execution list;
a judging subunit, configured to cyclically check the task execution times in the task execution list at a preset interval and, if the current time has reached a task execution time, to trigger:
an execution subunit, configured to create a task execution sub-thread, call a different class according to the task type in the task content, and read the task content; to parse the parameters of the task content to generate a task instance; and to execute the target task according to the task instance, the target task being the task whose execution time has been reached.
The above embodiments are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited thereto; any insubstantial change or substitution made by a person skilled in the art on the basis of the present invention falls within the scope of protection claimed by the present invention.

Claims (6)

1. A task scheduling method for big data, characterized by comprising the following steps:
creating a task template according to the task type;
selecting a task template and a task scheduling service number to create a task and form a configuration file, wherein the created task comprises a task name, task content and a task execution period, the task content is configured as key-value (kv) pairs, and dependency relationships between tasks are established in a kv file format;
reading the task, generating a task instance, and acquiring task execution process information;
wherein reading the task, generating a task instance, and acquiring task execution process information comprises:
generating a task execution list according to the task instance;
monitoring the task execution list and executing a task when it meets the trigger condition;
wherein generating a task execution list according to the task instance comprises the following steps:
reading the configuration file to obtain the task check interval and the task generation time range;
finding all tasks whose state is "to be checked" and for which the interval between the next execution time and the current time lies within the task generation time range, and obtaining their task execution times, wherein a task to be checked is a task for which no task execution list has yet been generated;
generating a task execution list according to the task name, the task content and the task execution period of the task to be checked, wherein the task execution list comprises the task name, the task execution time and the task priority;
updating the state of each task to be checked to indicate that its task execution list has been generated;
wherein monitoring the task execution list and executing a task when it meets the trigger condition comprises:
scanning for tasks to be checked at each task check interval to generate task execution lists;
and cyclically checking the task execution times in the task execution list at preset intervals; if the current time has reached a task execution time, performing:
creating a task execution sub-thread, calling the corresponding task type module according to the task type of the task content, and reading the task content; parsing the parameters of the task content to generate a task instance; and executing the target task according to the task instance, the target task being a task whose task execution time has been reached by the current time.
2. The task scheduling method for big data according to claim 1, wherein creating a task template according to the task type comprises:
setting a template name and/or automatically generating a template ID;
generating template data items, and inputting corresponding data item values and attributes for each template data item according to the task type.
3. The task scheduling method for big data according to claim 1, wherein selecting a task template and a task scheduling service number to create a task comprises:
creating a task name, and selecting a scheduling type and a task execution period;
creating a task, selecting a task template according to the task type, configuring the task content as key-value (kv) pairs, and establishing dependency relationships between tasks in a kv file format.
4. A task scheduling system for big data, characterized by comprising:
the task template creation module is used for creating a task template according to the task type;
the task scheduling management module is used for selecting a task template and a task scheduling service number to create a task and form a configuration file, wherein the created task comprises a task name, task content and a task execution period, the task content is configured as key-value (kv) pairs, and dependency relationships between tasks are established in a kv file format;
the task execution module is used for reading the task, generating a task instance and acquiring task execution process information;
the task execution module includes:
the second generating unit is used for generating a task execution list according to the task instance;
the trigger unit is used for monitoring the task execution list and executing a task when it meets the trigger condition;
the second generating unit includes:
the reading subunit is used for reading the configuration file and acquiring the task check interval and the task generation time range;
the detection subunit is used for finding all tasks whose state is "to be checked" and for which the interval between the next execution time and the current time lies within the task generation time range, and for obtaining their task execution times, wherein a task to be checked is a task for which no task execution list has yet been generated;
the first generation subunit is used for generating a task execution list according to the task name, the task content and the task execution period of the task to be checked, wherein the task execution list comprises the task name, the task execution time and the task priority;
the updating subunit is used for updating the state of each task to be checked to indicate that its task execution list has been generated;
the trigger unit includes:
the second generation subunit is used for scanning for tasks to be checked at each task check interval so as to generate task execution lists;
a judging subunit, configured to cyclically check the task execution times in the task execution list at preset intervals; if the current time has reached a task execution time, then:
the execution subunit is used for creating a task execution sub-thread, calling the corresponding task type module according to the task type of the task content, and reading the task content; parsing the parameters of the task content to generate a task instance; and executing the target task according to the task instance, the target task being a task whose task execution time has been reached by the current time.
5. The task scheduling system for big data according to claim 4, wherein the task template creation module comprises:
a setting unit for setting a template name and/or automatically generating a template ID;
the first generation unit is used for generating template data items, and inputting corresponding data item values and attributes for each template data item according to the task type.
6. The task scheduling system for big data according to claim 4, wherein the task scheduling management module comprises:
the first creating unit is used for creating a task name and selecting a scheduling type and a task execution period;
the second creating unit is used for creating a task, selecting a task template according to the task type, configuring the task content as key-value (kv) pairs, and establishing dependency relationships between tasks in a kv file format.
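Claims 2, 3, 5 and 6 cover two things:

- templates that have a name, an automatically generated ID and data items carrying values and attributes;
- tasks created from a template together with a scheduling type, an execution period and a task scheduling service number.

The following is a minimal sketch under stated assumptions. Template IDs come from a process-local counter, and `required` is just one example data-item attribute. The class names, the `service_no` field and the cron-style period string are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

_next_template_id = itertools.count(1)  # assumption: sequential auto-generated IDs


@dataclass
class DataItem:
    key: str
    default: str = ""
    required: bool = False   # one example attribute (hypothetical)


@dataclass
class TaskTemplate:
    task_type: str
    name: str
    items: list              # list of DataItem
    template_id: int = field(default_factory=lambda: next(_next_template_id))


@dataclass
class TaskDefinition:
    name: str
    scheduling_type: str     # hypothetical values, e.g. "periodic" or "once"
    period: str              # hypothetical cron-style period string
    service_no: str          # task scheduling service number
    template_id: int
    content: dict            # kv task content


def create_task(name, template, scheduling_type, period, service_no, values):
    """Fill the template's data items from user values to form kv content."""
    content = {"type": template.task_type}
    for item in template.items:
        value = values.get(item.key, item.default)
        if item.required and not value:
            raise ValueError(f"missing required data item {item.key!r}")
        content[item.key] = value
    return TaskDefinition(name, scheduling_type, period, service_no,
                          template.template_id, content)


def to_kv(content):
    """Serialize kv content as key=value lines for the configuration file."""
    return "\n".join(f"{k}={v}" for k, v in content.items())
```

Keeping the template separate from the task definition lets many tasks of the same type share one set of data-item attributes. Only the per-task values end up in the kv configuration.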
CN201811308063.8A 2018-11-05 2018-11-05 Task scheduling method and system for big data Active CN109684053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811308063.8A CN109684053B (en) 2018-11-05 2018-11-05 Task scheduling method and system for big data

Publications (2)

Publication Number Publication Date
CN109684053A CN109684053A (en) 2019-04-26
CN109684053B true CN109684053B (en) 2023-08-01

Family

ID=66185095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811308063.8A Active CN109684053B (en) 2018-11-05 2018-11-05 Task scheduling method and system for big data

Country Status (1)

Country Link
CN (1) CN109684053B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110308979A (en) * 2019-06-27 2019-10-08 苏宁消费金融有限公司 Asynchronous processing framework based on task scheduling and implementation method thereof
CN110363410A (en) * 2019-06-28 2019-10-22 北京银企融合技术开发有限公司 Task response timeliness evaluation method, scheduling method, device and storage medium
CN111176802B (en) * 2019-07-26 2023-03-14 腾讯科技(深圳)有限公司 Task processing method and device, electronic equipment and storage medium
CN110688216B (en) * 2019-08-23 2022-06-17 北京浪潮数据技术有限公司 Custom cloud plan task creation method and device
CN110750308B (en) * 2019-09-11 2024-03-01 东软集团股份有限公司 Task processing method and device, storage medium and electronic equipment
CN110751384A (en) * 2019-09-30 2020-02-04 口碑(上海)信息技术有限公司 Service monitoring method and device
CN110837519A (en) * 2019-11-18 2020-02-25 北京明略软件系统有限公司 Index data management method and device, electronic equipment and machine-readable storage medium
CN111209101B (en) * 2020-01-06 2023-05-02 深圳市同洲电子股份有限公司 Big data calculation task multi-dependency scheduling system
CN113378007B (en) * 2020-03-09 2022-08-23 网易(杭州)网络有限公司 Data backtracking method and device, computer readable storage medium and electronic device
CN111813417B (en) * 2020-05-29 2023-07-28 杭州览众数据科技有限公司 Task scheduling method based on page configuration of data warehouse tasks and model tasks
CN111736929B (en) * 2020-06-22 2024-04-12 北京百度网讯科技有限公司 Method, apparatus, device and readable storage medium for creating task instance
CN112348348A (en) * 2020-11-02 2021-02-09 北京首钢自动化信息技术有限公司 Task data processing method and system
CN112486658A (en) * 2020-12-17 2021-03-12 华控清交信息科技(北京)有限公司 Task scheduling method and device for task scheduling
CN116416706A (en) 2020-12-18 2023-07-11 北京百度网讯科技有限公司 Data acquisition method and device
CN113395251B (en) * 2021-01-20 2024-06-14 腾讯科技(深圳)有限公司 Machine learning security scene detection method and device
CN113641739B (en) * 2021-07-05 2022-09-06 南京联创信息科技有限公司 Spark-based intelligent data conversion method
CN113645130A (en) * 2021-07-14 2021-11-12 一汽奔腾轿车有限公司 Configurable task scheduling method based on CAN bus gateway
CN113326117B (en) * 2021-07-15 2021-10-29 中国电子科技集团公司第十五研究所 Task scheduling method, device and equipment
CN113781195A (en) * 2021-09-09 2021-12-10 平安国际智慧城市科技股份有限公司 Financial data monitoring method and device
CN115328639B (en) * 2022-10-13 2022-12-13 北京云枢创新软件技术有限公司 Task scheduling system based on chip verification regression

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104166590A (en) * 2013-05-20 2014-11-26 阿里巴巴集团控股有限公司 Task scheduling method and system
CN107766132A (en) * 2017-06-25 2018-03-06 平安科技(深圳)有限公司 Multi-task scheduling method, application server and computer-readable recording medium
CN108108375A (en) * 2016-11-25 2018-06-01 深圳市创梦天地科技有限公司 Big data extraction method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9645845B2 (en) * 2007-09-27 2017-05-09 Sap Se Triggering job execution in application servers based on asynchronous messages sent by scheduling tasks

Non-Patent Citations (2)

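Title
"Design and Implementation of a Hadoop-Based Big Data Processing System for Transportation and Logistics" (基于Hadoop的交通物流大数据处理系统设计与实现); 王寅田 (Wang Yintian); China Master's Theses Full-text Database (Information Science and Technology); 2015-06-15; full text *
"Research on Data Service Applications Based on Big Data" (基于大数据的数据服务应用研究); 陈光 (Chen Guang); Computer Technology and Development; 2018-03-07 (No. 08); full text *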

Also Published As

Publication number Publication date
CN109684053A (en) 2019-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant