CN111857983B - Task scheduling method and device based on distributed data acquisition

Info

Publication number: CN111857983B
Application number: CN202010355882.9A
Authority: CN
Inventors: 刘春阳; 张旭; 王鹏
Original assignee: Beijing Blue Light Wit Network Technology Co ltd; National Computer Network and Information Security Management Center
Current assignee: Beijing Blue Light Wit Network Technology Co ltd; National Computer Network and Information Security Management Center
Priority date: 2020-04-29
Filing date: 2020-04-29
Publication date: 2023-02-28
Anticipated expiration: 2040-04-29
Also published as: CN111857983A

Abstract

The invention discloses a task scheduling method based on distributed data acquisition, comprising the following steps: acquiring tasks to be processed; classifying the tasks to be processed to obtain a plurality of periodically repeated tasks; recording the start and end time points of each of the previous n executions of each periodically repeated task; calculating the average interval duration of each periodically repeated task over the previous n executions; determining the next execution order of the plurality of periodically repeated tasks according to the start time point of the next execution of each periodically repeated task; recording the load of each task executor; sending task request signals to a plurality of task executors and receiving feedback signals from the task executors; and sending the periodically repeated task that is to be executed first to the task executor corresponding to the first feedback signal received. The invention also discloses a task scheduling device based on distributed data acquisition. The invention enables a distributed system to achieve the optimal effect when executing tasks and to operate more stably.

Description

Task scheduling method and device based on distributed data acquisition

Technical Field

The invention relates to the technical field of computer information. More specifically, the present invention relates to a task scheduling method and apparatus based on distributed data acquisition.

Background

In recent years, with the development of computer systems, distributed systems have attracted attention as a necessary means of solving current computing problems. A distributed system is a loosely coupled system in which a plurality of processors, each corresponding to a personal computer, are interconnected by communication lines, and in which tasks are divided and coordinated through unified scheduling. How tasks are scheduled is therefore crucial to exploiting the overall performance of the distributed system and to keeping the system load-balanced, robust, and highly available. At present, most general scheduling algorithms each solve one specific problem, for example the round-robin method, the weighting method, the hashing method, the least connection method, the least deletion method, and the fastest response method. Used alone, a single algorithm easily causes system oscillation and an uneven distribution of tasks among nodes, which disturbs normal load balancing and prevents the distributed system from working efficiently.

Disclosure of Invention

An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.

The invention also aims to provide a task scheduling method and device based on distributed data acquisition, which can achieve the purpose of obtaining the optimal effect when a distributed system executes tasks by classifying and grading the tasks and dynamically allocating the tasks according to the state of a task executor, so that the distributed system can operate more stably.

To achieve these objects and other advantages and in accordance with the purpose of the invention, a task scheduling method based on distributed data collection is provided, which is applied to a distributed system including a plurality of task executors, the method including:

acquiring a task to be processed;

classifying the tasks to be processed to obtain a plurality of periodically repeated tasks;

recording starting and stopping time points of each execution of each periodically repeated task in the previous n executions;

calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the interval duration $t_i$ between every two adjacent executions, and calculating from the interval durations $t_i$ the average interval duration $\bar{t}_i$ of each periodically repeated task over the previous n executions;

calculating, from the end time point of the last execution of each periodically repeated task and the average interval duration $\bar{t}_i$, the start time point of the next execution of each periodically repeated task, and determining the next execution order of the plurality of periodically repeated tasks;

recording the load of each task executor, wherein the load comprises a CPU utilization rate, a memory utilization rate and an IO utilization rate;

respectively sending task request signals to a plurality of task executors, and receiving feedback signals of each task executor, wherein the feedback signals are sent by each task executor after delaying according to own load after receiving the task request signals, and different loads correspond to different delay durations; and sending the periodical repeated task which needs to be executed first in the plurality of periodical repeated tasks to the task executor corresponding to the received first feedback signal.

Preferably, the method further comprises the following steps:

calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the execution duration $t_j$ of each execution, and then calculating from the execution durations $t_j$ the average execution duration $\bar{t}_j$ of each periodically repeated task over the previous n executions;

determining, according to the load of the task executor to which the periodically repeated task is allocated and the average execution duration $\bar{t}_j$ of the periodically repeated task, whether the periodically repeated task needs to be split into a plurality of subtasks;

and splitting the periodically repeated task to be split into a plurality of subtasks, and rescheduling the plurality of subtasks.

Preferably, the method further comprises:

classifying the tasks to be processed to obtain a plurality of one-time tasks;

and distributing the plurality of one-time tasks to the plurality of task executors according to a round robin scheduling algorithm or a fastest response scheduling algorithm.

The invention also provides a task scheduling device based on distributed data acquisition, which is applied to a distributed system comprising a plurality of task executors, and the device comprises:

the task acquisition module is used for acquiring a task to be processed;

the task classification module is used for classifying the tasks to be processed to obtain a plurality of periodically repeated tasks;

the task state acquisition module is used for recording the starting and stopping time points of each execution of each periodically repeated task in the previous n executions;

a task priority classification module for calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the interval duration $t_i$ between every two adjacent executions, and calculating from the interval durations $t_i$ the average interval duration $\bar{t}_i$ of each periodically repeated task over the previous n executions;

the task executor state acquisition module is used for recording the load of each task executor, wherein the load comprises a CPU utilization rate, a memory utilization rate and an IO utilization rate;

the task scheduling module is used for respectively sending task request signals to the plurality of task executors and receiving feedback signals of the task executors, wherein the feedback signals are sent by the task executors after the task executors delay according to self loads after receiving the task request signals, and different loads correspond to different delay durations; and sending the periodical repeated task which needs to be executed first in the plurality of periodical repeated tasks to the task executor corresponding to the received first feedback signal.

Preferably, the task priority classification module is further configured to calculate, from the start and end time points of each of the previous n executions of each periodically repeated task, the execution duration $t_j$ of each execution, and to calculate from the execution durations $t_j$ the average execution duration $\bar{t}_j$ of each periodically repeated task over the previous n executions;

The device further comprises:

a task executor load analysis module for analyzing the load of the task executor to which the periodically repeated task is allocated and the average execution duration $\bar{t}_j$ of the periodically repeated task;

the task splitting module is used for splitting the periodically repeated task to be split into a plurality of subtasks;

the task scheduling module is further configured to reschedule the plurality of subtasks.

Preferably, the task classification module is further configured to classify the tasks to be processed to obtain a plurality of one-time tasks;

the task scheduling module is further configured to distribute the plurality of one-time tasks to the plurality of task executors according to a round robin scheduling algorithm or a fastest response scheduling algorithm.

The present invention also provides an electronic device, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to execute the method.

The invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, carries out the method as described above.

The invention at least comprises the following beneficial effects: the method can dynamically adjust the execution period of the periodically repeated tasks in real time according to the multiple execution times of the periodically repeated tasks, and dynamically distribute the tasks by combining the load states of the task executors.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.

Drawings

Fig. 1 is a flowchart of a task scheduling method based on distributed data acquisition according to an embodiment of the present invention;

Fig. 2 is a flowchart illustrating splitting of a task in a task scheduling method based on distributed data acquisition according to another embodiment of the present invention;

Fig. 3 is a flowchart illustrating scheduling of a one-time task in a task scheduling method based on distributed data acquisition according to another embodiment of the present invention.

Detailed Description

The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.

It should be noted that in the description of the present invention, the terms "lateral", "longitudinal", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.

As shown in fig. 1 to 3, the present invention provides a task scheduling method based on distributed data acquisition, which is applied to a distributed system including a plurality of task executors, where the task executors are configured to receive tasks and execute operations related to the tasks according to descriptions and configuration information of the tasks. The method described herein is performed by a central processing unit of a distributed system, the method comprising:

S101, acquiring a task to be processed;

The task data may be input into the distributed system by a user. Acquiring the task data to be processed further comprises converting the task data into structured data and storing the structured data.

S102, classifying the tasks to be processed to obtain a plurality of periodically repeated tasks;

The classification here distinguishes the tasks to be processed by the number of times they are to be executed, dividing them into periodically repeated tasks and one-time tasks.
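As a minimal sketch of S101 and S102, assuming each submitted task arrives as a dictionary carrying a hypothetical `repeat` flag (the task schema is not specified in this document, so every field name below is illustrative), the raw input could be converted into a structured record and then split into the two classes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Structured form of a submitted task (field names are illustrative)."""
    task_id: str
    description: str
    repeat: bool  # True: periodically repeated task, False: one-time task


def to_structured(raw: dict) -> Task:
    """Convert raw user input into a structured Task record."""
    return Task(
        task_id=str(raw["id"]),
        description=str(raw.get("description", "")),
        repeat=bool(raw.get("repeat", False)),
    )


def classify(tasks: list[Task]) -> tuple[list[Task], list[Task]]:
    """Split tasks into (periodically repeated, one-time)."""
    periodic = [t for t in tasks if t.repeat]
    one_time = [t for t in tasks if not t.repeat]
    return periodic, one_time


raw_tasks = [
    {"id": "crawl-news", "description": "collect news pages", "repeat": True},
    {"id": "backfill-2019", "description": "one-off backfill"},
]
periodic, one_time = classify([to_structured(r) for r in raw_tasks])
```

A task without the flag is treated as a one-time task here; that default is a design choice of the sketch, not something the method prescribes.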

S103, recording starting and stopping time points of each periodic repetitive task executed each time in the previous n times of execution;

The previous n executions are the n most recent executions, counted back from the latest one; the start time point and the end time point of each of these executions are recorded for each periodically repeated task.
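One way to keep only the n most recent executions, offered as a sketch rather than as the recorded implementation, is a bounded queue per task. The names `history` and `record_execution`, and the value chosen for n, are assumptions made for illustration:

```python
from collections import defaultdict, deque

N = 5  # number of past executions to keep (the n of S103; value is illustrative)

# task_id -> the last N (start, end) pairs, oldest first
history: dict[str, deque] = defaultdict(lambda: deque(maxlen=N))


def record_execution(task_id: str, start: float, end: float) -> None:
    """Store one execution's start and end time points (in seconds)."""
    if end < start:
        raise ValueError("end time precedes start time")
    history[task_id].append((start, end))


# Seven executions are recorded, but only the latest five are retained.
for k in range(7):
    record_execution("crawl-news", start=100.0 * k, end=100.0 * k + 20.0)
```

Because the queue has a fixed maximum length, the oldest record is dropped automatically whenever a new execution is appended.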

S104, calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the interval duration $t_i$ between every two adjacent executions, and then calculating from the interval durations $t_i$ the average interval duration $\bar{t}_i$ of each periodically repeated task over the previous n executions; then calculating, from the end time point of the last execution of each periodically repeated task and the average interval duration $\bar{t}_i$, the start time point of the next execution of each periodically repeated task, and determining the next execution order of the plurality of periodically repeated tasks;

In this way, the execution period of each periodically repeated task can be updated in real time before its next execution, which avoids the uneven task allocation and sudden drops in task executor load that conventional task scheduling systems often incur with a fixed execution period.
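The calculation in S104 can be sketched as follows. The sketch assumes that the interval between two adjacent executions is measured from the end of the earlier execution to the start of the later one, which is consistent with predicting the next start as the last end time plus the average interval; the function names are hypothetical.

```python
def average_interval(runs: list[tuple[float, float]]) -> float:
    """Mean gap between the end of one run and the start of the next."""
    if len(runs) < 2:
        raise ValueError("need at least two executions")
    gaps = [nxt_start - prev_end
            for (_, prev_end), (nxt_start, _) in zip(runs, runs[1:])]
    return sum(gaps) / len(gaps)


def next_start(runs: list[tuple[float, float]]) -> float:
    """Predicted start of the next run: last end time plus the mean gap."""
    return runs[-1][1] + average_interval(runs)


def execution_order(history: dict[str, list[tuple[float, float]]]) -> list[str]:
    """Task ids sorted by predicted next start time, earliest first."""
    return sorted(history, key=lambda task_id: next_start(history[task_id]))


history = {
    # (start, end) pairs in seconds, oldest first
    "A": [(0, 10), (60, 70), (130, 140)],  # gaps 50 and 60, mean 55
    "B": [(0, 5), (35, 40), (70, 80)],     # gaps 30 and 30, mean 30
}
order = execution_order(history)  # B is predicted at 110 s, A at 195 s
```

Since n executions yield n - 1 gaps, at least two recorded executions are needed before a prediction can be made.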

S105, recording the load of each task executor, wherein the load comprises a CPU utilization rate, a memory utilization rate and an IO utilization rate;

S106, sending task request signals to the plurality of task executors respectively, and receiving a feedback signal from each task executor, wherein each task executor sends its feedback signal after a delay determined by its own load once it has received the task request signal, and different loads correspond to different delay durations; and sending the periodically repeated task that needs to be executed first among the plurality of periodically repeated tasks to the task executor corresponding to the first feedback signal received.

Here, different loads correspond to different delay durations, which means that: the delay time of the task executor with the heavy load is set to be longer, and the delay time of the task executor with the light load is set to be shorter.
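The mapping from load to delay is not given numerically here. A minimal sketch, assuming a weighted sum of the three recorded utilization figures scaled to a maximum delay (the weights, the ceiling, and the names `Load` and `feedback_delay` are all illustrative assumptions), might look like this:

```python
from dataclasses import dataclass


@dataclass
class Load:
    cpu: float     # CPU utilization, 0.0 to 1.0
    memory: float  # memory utilization, 0.0 to 1.0
    io: float      # IO utilization, 0.0 to 1.0


# Illustrative weights and ceiling; no concrete values are given in the text.
WEIGHTS = (0.5, 0.3, 0.2)
MAX_DELAY_S = 0.2


def feedback_delay(load: Load) -> float:
    """Delay before replying to a task request; grows with load."""
    score = (WEIGHTS[0] * load.cpu
             + WEIGHTS[1] * load.memory
             + WEIGHTS[2] * load.io)
    score = min(max(score, 0.0), 1.0)  # clamp against out-of-range readings
    return MAX_DELAY_S * score


light = feedback_delay(Load(cpu=0.10, memory=0.20, io=0.05))
heavy = feedback_delay(Load(cpu=0.90, memory=0.80, io=0.70))
```

Any monotonically increasing function of load would serve the same purpose; the linear form is chosen only for readability.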

More specific implementation processes of the step can comprise:

1) At time $t_1$, the central processing unit of the distributed system sends a first SYN request to each of the plurality of task executors in preparation for establishing a connection;

2) After receiving the first SYN request, each task executor delays its handling of the request according to its own load;

3) A lightly loaded task executor has a short delay and therefore sends its feedback signal (SYN + ACK packet) first; a heavily loaded task executor has a long delay and therefore sends its feedback signal later;

4) At time $t_2$, the feedback signal sent by the lightly loaded task executor reaches the central processing unit of the distributed system first. The central processing unit then sends a second SYN request to the lightly loaded task executor. On receiving the second SYN request, the lightly loaded task executor checks whether the ACK sequence number in the request is consistent with its own initial sequence number, that is, whether $N_{ACK}$ equals $N_{ISN} + 1$. If so, the connection between the task executor and the central processing unit of the distributed system is allowed to be established; if not, an RST packet is sent to the upper TCP protocol stack to cancel the connection;

5) At time $t_3$, the feedback signal sent by the heavily loaded task executor reaches the central processing unit of the distributed system. Because the central processing unit has already received the feedback signal sent by the lightly loaded task executor, the TCP protocol stack automatically discards the feedback signal sent by the heavily loaded task executor.
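The selection logic of the steps above (the first reply wins, later replies are discarded) can be simulated without real sockets. The following sketch uses `asyncio` timers in place of the TCP handshake; the executor names and delay values are hypothetical, and the sequence number check of step 4 is not modeled.

```python
import asyncio


async def executor_reply(name: str, delay_s: float) -> str:
    """Simulated executor: wait for its load-dependent delay, then reply."""
    await asyncio.sleep(delay_s)
    return name


async def pick_first_responder(delays: dict[str, float]) -> str:
    """Send a request to every executor and return the first one to answer."""
    pending = [asyncio.create_task(executor_reply(n, d))
               for n, d in delays.items()]
    done, still_pending = await asyncio.wait(
        pending, return_when=asyncio.FIRST_COMPLETED)
    for task in still_pending:  # later replies are discarded
        task.cancel()
    await asyncio.gather(*still_pending, return_exceptions=True)
    # If several replies land in the same loop iteration, take any one.
    return next(iter(done)).result()


# Hypothetical delays in seconds; a lighter load means a shorter delay.
delays = {"executor-1": 0.15, "executor-2": 0.02, "executor-3": 0.08}
winner = asyncio.run(pick_first_responder(delays))
```

The task that must run first would then be sent to `winner`, so the scheduler never needs to compare load figures itself: the ordering of the replies already encodes them.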

Compared with existing task scheduling methods, the method of this embodiment avoids situations in which a task executor is lightly loaded when it receives a task but becomes heavily loaded during execution because multiple tasks overlap, so that the distributed system runs more stably and achieves the optimal effect when executing tasks.

In another example, the method further comprises:

S201, calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the execution duration $t_j$ of each execution, and then calculating from the execution durations $t_j$ the average execution duration $\bar{t}_j$ of each periodically repeated task over the previous n executions;

S202, determining, according to the load of the task executor to which the periodically repeated task is allocated and the average execution duration $\bar{t}_j$ of the periodically repeated task, whether the periodically repeated task needs to be split into a plurality of subtasks;

Specifically, when the load of the task executor is so high that the task cannot be completed within the average execution duration $\bar{t}_j$, the periodically repeated task needs to be split into a plurality of subtasks, the number of subtasks being limited to what is just enough for the task to be completed within the average execution duration $\bar{t}_j$.

S203, splitting the periodically repeated task to be split into a plurality of subtasks, and rescheduling the plurality of subtasks.

The rescheduling here may again allocate the subtasks in the manner described in the previous embodiment.

In this embodiment, analyzing the load of the task executor effectively avoids delays in task execution caused by insufficient spare capacity on the task executor, and greatly improves task processing efficiency.
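How the load and the average execution duration are combined into a split decision is not spelled out. As one hedged illustration, the sketch below assumes that a run on an executor with utilization `load` takes the average duration divided by (1 - load), and chooses the smallest number of equal subtasks that each fit within the average duration. The slowdown model and all function names are assumptions.

```python
import math


def average_duration(runs: list[tuple[float, float]]) -> float:
    """Mean execution time over the recorded (start, end) pairs."""
    if not runs:
        raise ValueError("no executions recorded")
    return sum(end - start for start, end in runs) / len(runs)


def subtasks_needed(avg_duration: float, load: float) -> int:
    """Smallest number of equal subtasks that each fit in avg_duration.

    Assumes a run on an executor with utilization `load` takes
    avg_duration / (1 - load); this slowdown model is illustrative.
    """
    if avg_duration <= 0:
        raise ValueError("avg_duration must be positive")
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    projected = avg_duration / (1.0 - load)
    return max(1, math.ceil(projected / avg_duration))


def split(items: list, parts: int) -> list[list]:
    """Divide a task's work items into at most `parts` contiguous chunks."""
    size = math.ceil(len(items) / parts)
    return [items[i:i + size] for i in range(0, len(items), size)]


runs = [(0, 18), (100, 122), (200, 220)]  # durations 18, 22, 20
avg = average_duration(runs)               # 20.0
k = subtasks_needed(avg, load=0.75)        # projected 80 s, so 4 subtasks
subtasks = split(list(range(10)), k) if k > 1 else [list(range(10))]
```

A result of 1 means the task fits within its average duration and is left whole; otherwise each chunk would be rescheduled as its own subtask.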

In another embodiment, the method further comprises:

S301, classifying the tasks to be processed to obtain a plurality of one-time tasks;

The one-time tasks here include both bursty one-time tasks and non-bursty one-time tasks.

S302, distributing the plurality of one-time tasks to the plurality of task executors according to a round robin scheduling algorithm or a fastest response scheduling algorithm.

The round robin scheduling algorithm is used to schedule non-bursty one-time tasks. Specifically, the one-time tasks are rotated linearly through the sequence of task executors; internally this is implemented as a first-in, first-out queue, which preserves the order of the tasks. Each new task request is issued to the next task executor in the queue.

The fastest response scheduling algorithm is used to schedule bursty one-time tasks. Specifically, each task is distributed to the task executor with the shortest network response time, measured after the task executors receive the task request.
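Both strategies for one-time tasks are simple enough to sketch in a few lines. The executor names, task names, and response times below are hypothetical, and the response times are assumed to have been measured elsewhere:

```python
from itertools import cycle


def round_robin(tasks: list[str], executors: list[str]) -> dict[str, str]:
    """Assign tasks in arrival order to executors in a fixed rotation."""
    rotation = cycle(executors)
    return {task: next(rotation) for task in tasks}


def fastest_response(response_times: dict[str, float]) -> str:
    """Pick the executor with the shortest measured network response time."""
    return min(response_times, key=response_times.get)


executors = ["executor-1", "executor-2", "executor-3"]

# Non-bursty one-time tasks go through the rotation.
assignment = round_robin(["t1", "t2", "t3", "t4"], executors)

# A bursty one-time task goes to the quickest responder (times in seconds
# are hypothetical measurements).
chosen = fastest_response(
    {"executor-1": 0.120, "executor-2": 0.045, "executor-3": 0.300})
```

Round robin ignores executor state and so suits a steady stream of tasks, whereas the fastest response rule reacts to current conditions, which is why it is paired with bursty tasks.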

In the above embodiment, after the tasks with different properties are distributed according to different scheduling methods, the tasks can be processed in a fastest and optimal manner according to the priorities, and the task executors can have uniform load and the distributed system can operate stably.

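The invention also provides a task scheduling device based on distributed data acquisition, which is applied to a distributed system comprising a plurality of task executors, and the device comprises:

the task acquisition module is used for acquiring a task to be processed;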

the task state acquisition module is used for recording the starting and stopping time point of each periodic repeated task in the previous n times of execution;

a task priority classification module for calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the interval duration $t_i$ between every two adjacent executions, and then calculating from the interval durations $t_i$ the average interval duration $\bar{t}_i$ of each periodically repeated task over the previous n executions; and for calculating, from the end time point of the last execution of each periodically repeated task and the average interval duration $\bar{t}_i$, the start time point of the next execution of each periodically repeated task, and determining the next execution order of the plurality of periodically repeated tasks;

the task executor state acquisition module is used for recording the load of each task executor, wherein the load comprises a CPU (Central processing Unit) utilization rate, a memory utilization rate and an IO (input output) utilization rate;

the task scheduling module is used for respectively sending task request signals to the plurality of task executors and receiving feedback signals of the task executors, wherein the feedback signals are sent by the task executors after the task executors receive the task request signals and delay according to the loads of the task executors, and different loads correspond to different delay durations; and sending the periodical repeated task which needs to be executed first in the plurality of periodical repeated tasks to the task executor corresponding to the received first feedback signal.

In another embodiment, the task priority classification module is further configured to calculate, from the start and end time points of each of the previous n executions of each periodically repeated task, the execution duration $t_j$ of each execution, and then to calculate from the execution durations $t_j$ the average execution duration $\bar{t}_j$ of each periodically repeated task over the previous n executions;

The device further comprises:

In another embodiment, the task classification module is further configured to classify the tasks to be processed to obtain a plurality of one-time tasks.

The task scheduling device based on distributed data acquisition adopts the scheduling method based on distributed data acquisition.

The invention also provides a storage medium on which a computer program is stored which, when executed by a processor, implements the method described above.

While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.

Claims

1. The task scheduling method based on distributed data acquisition is characterized by being applied to a distributed system comprising a plurality of task executors, and the method comprises the following steps:

acquiring a task to be processed;

recording the starting and ending time points of each periodic repetitive task in the previous n times of execution;

calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the interval duration $t_i$ between every two adjacent executions, and then calculating from the interval durations $t_i$ the average interval duration $\bar{t}_i$ of each periodically repeated task over the previous n executions;

2. The distributed data collection based task scheduling method of claim 1, further comprising:

calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the execution duration $t_j$ of each execution, and then calculating from the execution durations $t_j$ the average execution duration $\bar{t}_j$ of each periodically repeated task over the previous n executions;

3. The distributed data collection based task scheduling method of claim 1, further comprising:

4. The task scheduling device based on distributed data acquisition is characterized by being applied to a distributed system comprising a plurality of task executors, and the device comprises:

the task acquisition module is used for acquiring a task to be processed;

a task priority classification module for calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the interval duration $t_i$ between every two adjacent executions, and then calculating from the interval durations $t_i$ the average interval duration $\bar{t}_i$ of each periodically repeated task over the previous n executions;

5. The distributed data collection based task scheduling apparatus of claim 4,

the task priority classification module is also used for calculating, from the start and end time points of each of the previous n executions of each periodically repeated task, the execution duration $t_j$ of each execution, and for calculating from the execution durations $t_j$ the average execution duration $\bar{t}_j$ of each periodically repeated task over the previous n executions;

The device further comprises:

6. The distributed data collection based task scheduling apparatus of claim 4,

the task classification module is also used for classifying the tasks to be processed to obtain a plurality of one-time tasks;

7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of any of claims 1-3.

8. Storage medium on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1-3.