CN111221632A - Distributed parallel task scheduling method and device, computer equipment and storage medium


Info

Publication number
CN111221632A
Authority
CN
China
Prior art keywords: data, node, task, nodes, processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910979783.5A
Other languages
Chinese (zh)
Other versions
CN111221632B (en)
Inventor
钟泽峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN201910979783.5A
Publication of CN111221632A
Application granted
Publication of CN111221632B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/48 — Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806 — Task transfer initiation or dispatching
    • G06F 9/4843 — Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 — Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The embodiments of the application provide a distributed parallel task scheduling method and apparatus, a computer device, and a storage medium, relating to the technical field of computer networks. The method comprises: when a processing request for a data task is received, responding to the processing request through a master node among the distributed nodes; distributing the data tasks through the master node to each data node for data processing, where the master node and the data nodes adopt the same configuration; and, when the master node goes down, electing a new master node from the data nodes, with the new master node responding to processing requests for new data tasks, thereby avoiding the interruption of data-task responses that a master-node failure would otherwise cause. Because the scheme performs data processing based on kernel-level scheduling of threads, a new master node can be elected from the data nodes quickly, improving the operating efficiency of the distributed system; at the same time, because all distributed nodes share the same configuration, development cost is reduced and lightweight development is achieved.

Description

Distributed parallel task scheduling method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of computer networks, in particular to a distributed parallel task scheduling method, a distributed parallel task scheduling device, computer equipment and a storage medium.
Background
In existing task scheduling technology, task scheduling systems based on technologies such as Azkaban and Quartz schedule specific services in a service system through inter-process communication: an operation instruction sent by a user is processed by the task server and the execution server in the task scheduling system and then transmitted to the service system over a communication network for task scheduling.
The task server, the execution server and the service system are separate processes, and the independence of their data spaces means that task scheduling must rely on an external scheduling mechanism. As a result, a large amount of resources is consumed by inter-process communication during task scheduling, which affects task scheduling efficiency; in particular, when a single point of failure occurs, the system is difficult to recover.
Disclosure of Invention
The present invention aims to solve at least one of the above technical defects, particularly the problem that task scheduling among processes affects task scheduling efficiency and a single point of failure causes difficulty in system recovery, and provides a distributed parallel task scheduling method, apparatus, computer device and storage medium.
In a first aspect, this embodiment provides a distributed parallel task scheduling method, including the steps of:
when a processing request of a data task is received, responding to the processing request through a main node in a distributed node;
distributing the data tasks in the processing request to each data node through the main node for data processing; the main node and the data node adopt the same configuration;
monitoring the running states of the main node and the data node in the data processing process;
and when the master node goes down, electing a new master node from the data nodes, and responding to processing requests of new data tasks through the new master node.
In one embodiment, the step of responding to the processing request by the master node in the distributed node further comprises:
and traversing the nodes, finding out a target node which is configured with a main node identification in advance, and determining the target node as a main node.
In an embodiment, the step of allocating, by the master node, the data task in the processing request to each data node for data processing includes:
acquiring performance state information and task execution information of each data node, and determining the task amount distributed by each data node according to the performance state information and the task execution information;
and distributing the data tasks to corresponding data nodes for data processing according to the task amount through the main node.
In an embodiment, the step of obtaining performance state information and task execution information of each data node, and determining a task amount allocated to each data node according to the performance state information and the task execution information includes:
acquiring a topological relation and a dependency relation of the data task, and determining a data node for processing the data task;
calculating performance state information and task execution information of the data nodes; the performance state information comprises CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of executed tasks, the execution progress and the executed time of the executed tasks;
and determining the task quantity of the data nodes in a balanced mode according to the performance state information and the task execution information.
In an embodiment, the step of allocating, by the master node, the data task to the corresponding data node according to the task amount to perform data processing includes:
analyzing the data tasks distributed to the data nodes to obtain processing information of the data tasks;
and according to the processing information, carrying out data processing in parallel through the data nodes.
In one embodiment, the step of electing a new master node from the data nodes comprises:
acquiring performance state information and task execution information of the data nodes, and selecting target data nodes according to the performance state information and the task execution information;
and taking the target data node as a new main node and replacing the main node which is down.
In an embodiment, the step of monitoring the operating states of the master node and the data node during the data task processing includes:
monitoring heartbeat messages sent to each data node through the main node;
and if the heartbeat message received by the data node is abnormal, judging that the main node is down.
In a second aspect, the present embodiment provides a distributed parallel task scheduling apparatus, including:
the processing request response module is used for responding the processing request through the main node in the distributed nodes when receiving the processing request of the data task;
the data task allocation module is used for allocating the data tasks in the processing requests to all the data nodes through the main node for data processing; the main node and the data node adopt the same configuration;
the operation state monitoring module is used for monitoring the operation states of the main node and the data node in the data processing process;
and the new main node election module is used for electing a new main node from the data nodes when the main node goes down and responding to a processing request of a new data task through the new main node.
In a third aspect, this embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the distributed parallel task scheduling method according to any embodiment of the first aspect when executing the program.
In a fourth aspect, the present embodiment provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for distributed parallel task scheduling according to any of the embodiments of the first aspect.
In the distributed parallel task scheduling method, apparatus, computer device, and storage medium provided in the foregoing embodiments, the master node and the data nodes adopt the same configuration; the master node responds to a processing request for a data task and allocates the data task to each data node for data processing. If the master node goes down, a new master node is elected from the data nodes, which share its configuration, and the new master node responds to processing requests for new data tasks. Because the distributed task scheduling in this technical scheme is realized through inter-thread communication within the same service system, with each distributed node executing one thread, the resource consumption of multi-process communication is avoided and task scheduling efficiency can be improved. Meanwhile, when the master node crashes, a new master node can be elected from the data nodes quickly because all distributed nodes share the same configuration, which improves the efficiency of crash recovery and the working efficiency of the distributed system.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a distributed parallel task scheduling method provided by an embodiment;
FIG. 2 is a detailed flowchart of step S120 according to one embodiment;
FIG. 3 is another detailed flowchart of step S120 according to an embodiment;
FIG. 4 is a detailed flowchart of step S130 according to one embodiment;
FIG. 5 is a detailed flowchart of step S140 according to one embodiment;
FIG. 6 is a system framework employed by a conventional task scheduling method;
FIG. 7 is a system framework adopted by the distributed parallel task scheduling method provided in the present solution;
FIG. 8 is a block diagram illustrating an exemplary distributed parallel task scheduler;
fig. 9 is a schematic structural diagram of a distributed parallel task scheduling apparatus according to an embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
In some of the flows described in the specification, claims and figures of this application, a number of operations are included that occur in a particular order; however, it should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel. Operation numbers such as 110 and 120 merely distinguish the various operations and do not themselves represent any required order of execution. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
FIG. 1 is a flowchart of a distributed parallel task scheduling method according to an embodiment. Before explaining the present solution in detail, a system architecture adopted by the distributed task scheduling method of the present solution is introduced. The system architecture comprises a plurality of distributed nodes and a client, wherein the distributed nodes are connected with one another. The distributed task scheduling method can be executed on any one of the distributed nodes, and the node selected as the main node serves as a server to respond to a service request initiated by a client and is distributed to each data node for processing.
Specifically, referring to fig. 1, the distributed parallel task scheduling method provided in this embodiment includes the following steps:
s110, when a processing request of a data task is received, the processing request is responded through the main node in the distributed node.
Wherein the plurality of distributed nodes comprises a master node and a plurality of data nodes, and the master node is in communication connection with each data node. The master node is the node that responds to processing requests for data tasks and distributes the tasks to the data nodes; correspondingly, a data node is a node that performs the data tasks assigned by the master node. The client accesses the distributed task scheduling system (i.e., the service system) and sends it a processing request for a data task; the processing request contains the processing information of the data task. The master node in the service system responds to the processing request and obtains the processing information of the data task from the request sent by the client.
And S120, distributing the data tasks in the processing request to each data node through the main node for data processing.
The main node and the data node adopt the same configuration.
Unlike the prior art, in which the master node's configuration differs from that of the data nodes, every distributed node in this embodiment has the same configuration, so any node can serve as the master node. Even if the master node goes down, no configuration file needs to be modified, and a new master node can be quickly re-elected from the data nodes, improving data-task processing efficiency. Meanwhile, because the master node and the data nodes adopt the same configuration and deployment, development difficulty is reduced.
In an embodiment, after receiving the data request, the master node allocates the data task to each data node to perform processing of the data task. After receiving the data tasks, each data node starts at a specified time according to the processing information of the data tasks, and executes the data tasks in parallel.
And S130, monitoring the running states of the main node and the data node in the data processing process.
During data task processing, in order to understand the working condition of each distributed node, the running states of the master node and the data nodes need to be monitored in real time: whether a fault has occurred, changes in performance parameters, task execution conditions, and so on. Specifically, heartbeat messages are sent from the master node to the data nodes to monitor whether the master node has failed; and the running state of each data node is monitored by calculating its performance parameters, such as CPU (Central Processing Unit) usage, memory usage, and I/O (input/output) resource usage, together with load parameters such as the number of executing tasks, execution progress, and execution time. Optionally, the running states of the master node and the data nodes may be fed back to the client, so that the job condition of the distributed task scheduling system can be checked in time.
And S140, when the master node goes down, a new master node is elected from the data nodes, and the new master node responds to processing requests of new data tasks.
And when the main node is down, the main node cannot respond to the processing request of the data task. Because the configuration of the data nodes is the same as that of the main nodes, at this time, a new main node can be selected from the data nodes quickly, the new main node replaces the main node which is down, and the processing request of the data task is responded. The new main node distributes the data tasks to the other data nodes except the new main node, so that the problem that the whole distributed task scheduling system cannot continuously perform task scheduling due to the fact that the current main node is down is avoided, and the processing efficiency of the data tasks is improved.
It should be noted that when the master node goes down permanently, a new master node is elected from the data nodes; the new master node responds to processing requests for data tasks and allocates them to the data nodes other than itself for execution. If the master node goes down only temporarily, then when it recovers, either the original master node may take back the role from the newly elected master and continue to respond to processing requests, or the newly elected master node may keep responding to processing requests while the recovered original master node rejoins as a data node and receives the data tasks allocated by the new master node.
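The failover policy above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `pick_new_master` and `on_master_recovered`, the score table, and the `policy` flag are all assumptions introduced here.

```python
# Illustrative failover sketch: elect the best-scoring data node as the new
# master; on recovery of the old master, either resume it or keep the new one.
def pick_new_master(data_nodes, scores):
    # All nodes share one configuration, so any data node is eligible;
    # here we take the one with the highest (assumed) fitness score.
    return max(data_nodes, key=lambda n: scores[n])

def on_master_recovered(old_master, new_master, policy="keep_new"):
    # Temporary outage: the recovered node may resume as master, or the newly
    # elected master keeps the role and the old one rejoins as a data node.
    if policy == "resume_old":
        return old_master
    return new_master

scores = {"data-1": 0.4, "data-2": 0.9}
new_master = pick_new_master(["data-1", "data-2"], scores)
print(new_master)                                   # the best-scoring node
print(on_master_recovered("master-0", new_master))  # new master keeps the role
```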
It should be noted that task scheduling in the prior art is based on scheduling between processes: the independence of the processes' data spaces means that communication must be implemented by the operating system, i.e., scheduling is external. The distributed parallel scheduling method provided by this embodiment executes on threads instead; since multiple threads share the same address space and data space, the task data of one thread can be used directly by other threads without going through the operating system. Because scheduling is handled by the kernel within the same system, inter-thread communication during task scheduling is highly efficient, which improves the data-task processing efficiency of the task scheduling system and saves resources.
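The thread-based design described above can be illustrated with a small sketch, assuming an in-memory queue as the hand-off mechanism (the patent does not specify one): worker threads share the process's address space, so the master thread passes task objects directly instead of serializing them over inter-process channels.

```python
import queue
import threading

# Threads share the address space: the master hands tasks to a worker via an
# in-memory queue, with no OS-level inter-process communication involved.
tasks = queue.Queue()
done = []

def worker():
    while True:
        task = tasks.get()
        if task is None:          # sentinel: no more tasks
            break
        done.append(task)         # the object is shared directly, not copied
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    tasks.put(f"task-{i}")
tasks.put(None)
t.join()
print(done)  # -> ['task-0', 'task-1', 'task-2']
```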
In the distributed parallel task scheduling method provided by this embodiment, task scheduling is based on threads and all distributed nodes adopt the same configuration. The master node responds to processing requests for data tasks and distributes the data tasks to each data node for data processing. When the master node goes down, any data node can become the master node because it is configured identically, and no configuration file needs to be modified; a new master node is therefore elected quickly and the system recovers rapidly from the crash, improving the working efficiency of the distributed system. Meanwhile, because all distributed nodes share the same configuration, development cost is reduced and lightweight development is achieved.
In an embodiment, before responding to the processing request through the master node in the distributed node in step S110, the method may further include:
s100, traversing the nodes, finding out a target node which is configured with a main node identifier in advance, and determining the target node as a main node.
In order to facilitate the determination of the master node, one of the distributed nodes may be selected as the master node, and the master node identifier may be configured in advance. And traversing each node in the distributed task scheduling system, and finding out the target node which is pre-configured with the main node identification, wherein the target node serves as the main node.
In other embodiments, because the configuration of the master node and the data node is the same, any one of the distributed nodes can be selected as the master node, the master node can be quickly confirmed, and the efficiency of data task processing is improved.
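Step S100 above can be sketched as a simple traversal. The `Node` class and the `is_master` flag are illustrative stand-ins for the pre-configured master identifier mentioned in the text.

```python
# Illustrative sketch of S100: traverse the distributed nodes and pick the one
# pre-configured with a master identifier.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    is_master: bool = False  # stand-in for the pre-configured master identifier

def find_master(nodes):
    """Return the node carrying the master identifier, or None if unset."""
    for node in nodes:
        if node.is_master:
            return node
    return None  # no identifier configured: any node may then be chosen

nodes = [Node("data-1"), Node("data-2", is_master=True), Node("data-3")]
master = find_master(nodes)
print(master.name)  # -> data-2
```

If no node carries the identifier, the fallback in the following paragraph applies: since all nodes are configured identically, any one of them can be chosen as master.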
In order to make the technical solution clearer and easier to understand, specific implementation processes and modes of the steps in the technical solution are described in detail below.
Fig. 2 is a detailed flowchart of step S120 according to an embodiment, as shown in fig. 2, in an embodiment, step S120 allocates the data task in the processing request to each data node in parallel for data processing through the master node, and may further include:
s121, acquiring performance state information and task execution information of each data node, and determining the task amount distributed by each data node according to the performance state information and the task execution information.
The performance state information of the data node refers to parameter information describing the performance state of the data node, and includes state parameters such as a CPU master frequency, a memory size, a CPU utilization rate, a memory utilization rate, and an I/O resource utilization rate, and the task execution information of the data node refers to parameter information describing the task execution state of the data node, and includes task parameters such as the number of tasks in progress, a task execution progress, an executed time, and a remaining execution time.
The performance state information and task execution information of a data node affect its efficiency in processing data tasks. In order to distribute data tasks reasonably and evenly, the resources of the distributed task scheduling system should be fully utilized. After receiving the performance state information and task execution information periodically sent by each data node, the master node determines the task amount allocated to each data node accordingly. Illustratively, if data node A is processing 2 data tasks with an expected full-load capacity of 5, and data node B is processing 0 data tasks with an expected full-load capacity of 6, then data node A can accept 3 more tasks and data node B can accept 6, so newly received data tasks are preferentially allocated to data node B.
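The worked example above reduces to a small remaining-capacity calculation; the tuple layout `(running, full_load)` is an illustrative assumption.

```python
# Remaining capacity per node, using the example figures from the text:
# node A runs 2 of an expected 5 tasks, node B runs 0 of an expected 6.
def remaining_capacity(running, full_load):
    return full_load - running

nodes = {"A": (2, 5), "B": (0, 6)}
capacity = {n: remaining_capacity(r, f) for n, (r, f) in nodes.items()}
print(capacity)  # -> {'A': 3, 'B': 6}

# New tasks are preferentially routed to the node with the most headroom.
best = max(capacity, key=capacity.get)
print(best)  # -> B
```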
And S122, distributing the data tasks to corresponding data nodes for data processing according to the task amount through the main node.
After the task quantity required to be executed by each data node is determined, the data tasks are distributed to the corresponding data nodes through the main node according to the task quantity. Each data node can execute a plurality of data tasks in parallel without waiting for the completion of the execution of one data task and then starting to execute another data task, so that the efficiency of executing the data tasks is improved.
Fig. 3 is another detailed flowchart of step S120 according to an embodiment, and as shown in fig. 3, step S121 may specifically include the following sub-steps:
s1201, acquiring the topological relation and the dependency relation of the data task, and determining a data node for processing the data task.
The processing request of the data task contains information such as the topological relation and dependency relation for processing the data task. The topological relation refers to the interrelation of the data nodes determined from the interrelations among the various data in the data task; the dependency relation refers to one event depending on the completion of another, i.e., the task processed by one data node depends on the completion of tasks processed by other data nodes. Optionally, in an embodiment, the topological relation corresponding to the processing request is analyzed using a project manager, and the dependency relation of the processing request is analyzed using ikexpress.
In the embodiment, the distributed task scheduling system analyzes the topological relation and the dependency relation corresponding to the processing request of the data task, and searches for the data node related to the execution of the data task.
And S1202, calculating the performance state information and the task execution information of the data node.
The performance state information comprises CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of executed tasks, the execution progress and the executed time of the executed tasks.
In an embodiment, the lower the values of the data node's parameters such as CPU utilization, memory utilization, and I/O resource utilization, the better the data node's performance state. For example, one performance parameter, such as memory utilization, may be set as the main parameter for evaluating the performance state; alternatively, a weighting formula may be set that considers several performance parameters together. For instance, with the weight of CPU utilization set to 0.3, the weight of memory utilization to 0.5, and the weight of I/O resource utilization to 0.2, the performance state of the current data node is calculated as CPU utilization × 0.3 + memory utilization × 0.5 + I/O resource utilization × 0.2.
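The weighted formula above, with the example weights (0.3 CPU, 0.5 memory, 0.2 I/O), can be written directly; the utilization figures in the call are made up for illustration.

```python
# Weighted performance score from the text: lower utilization -> lower score
# -> better performance state for the data node.
def performance_score(cpu, mem, io, w_cpu=0.3, w_mem=0.5, w_io=0.2):
    return cpu * w_cpu + mem * w_mem + io * w_io

# e.g. 40% CPU, 60% memory, 20% I/O utilization
score = performance_score(0.40, 0.60, 0.20)
print(round(score, 2))  # -> 0.46
```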
Similarly, the smaller the number of tasks being performed by the data node, the faster the task execution progress, the shorter the remaining execution time, and the like, the better the task execution state of the data node is, and in an embodiment, a certain task parameter, such as the number of tasks being performed, may be set as a main parameter for evaluating the task execution state, or a calculation formula may be set, and the task execution state is comprehensively evaluated in consideration of a plurality of task parameters.
S1203, determining the task quantity of the data nodes in a balanced mode according to the performance state information and the task execution information.
In most cases, the performance states of the data nodes differ, as do their task execution states at different times; the task amount is therefore distributed to the data nodes in a balanced manner according to their performance states and task execution states. Illustratively, based on the performance state information and task execution information, more tasks are allocated to data nodes with better performance states or task execution states, and correspondingly fewer tasks to data nodes with worse states, so that the total task load across the data nodes is balanced and the resource utilization of the distributed task scheduling system is improved.
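One simple way to realize the balanced assignment described above is a greedy loop over a min-heap of node load scores; this is a sketch under the assumptions that lower scores are better and that each assigned task raises a node's score by a fixed `per_task_cost` (both assumptions are not from the patent).

```python
import heapq

# Greedy balanced assignment: each task goes to the node with the lowest
# current load score; assigning a task raises that node's score.
def balance(num_tasks, node_scores, per_task_cost=0.1):
    heap = [(score, name) for name, score in node_scores.items()]
    heapq.heapify(heap)
    assignment = {name: 0 for name in node_scores}
    for _ in range(num_tasks):
        score, name = heapq.heappop(heap)
        assignment[name] += 1
        heapq.heappush(heap, (score + per_task_cost, name))
    return assignment

# Node B starts less loaded than A, so it absorbs most of the 5 tasks.
print(balance(5, {"A": 0.46, "B": 0.20}))  # -> {'A': 1, 'B': 4}
```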
Further, the step S122 may specifically include the following sub-steps:
s1204, analyzing the data tasks distributed to the data nodes to obtain the processing information of the data tasks.
The data task includes processing information. The data tasks may include: storing workflow execution logs, deleting data information, starting or closing a certain program, updating clearing information, and the like. After receiving the data task, the data node analyzes the data task to obtain corresponding processing information, such as the type of the task, the time for starting execution, the period for executing the task, and the like.
And S1205, according to the processing information, carrying out data processing in parallel through the data nodes.
One data node can execute a plurality of data tasks at the same time, and the same data task can be executed by a plurality of data nodes. Meanwhile, a plurality of data tasks can be executed independently, and each data node can execute the data tasks in parallel.
Illustratively, data node A receives a data task that performs data clearing at midnight, data node B receives a data task that starts modifying data at 1 a.m., and data node C receives a data task that sends data at some other given time. Data nodes A, B and C can execute the received data tasks in parallel, without waiting for any one data node to finish its data task first, which improves the working efficiency of the task scheduling system.
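Steps S1204 and S1205 might be sketched as follows. The task fields (type, start, period) and the use of one thread per data node are illustrative assumptions; the embodiment does not fix a concrete task format.

```python
# Illustrative sketch of S1204/S1205: each data node parses the processing
# information of a received data task (type, start time, period), and the
# nodes run concurrently. Field names and handlers are assumptions.
import threading

def parse_task(raw):
    # S1204: extract processing information from the received data task.
    return {"type": raw.get("type", "unknown"),
            "start": raw.get("start", "00:00"),
            "period": raw.get("period", "once")}

def run_node(name, tasks, results, lock):
    # S1205: one data node executes its tasks; nodes execute in parallel.
    for raw in tasks:
        info = parse_task(raw)
        with lock:
            results.append((name, info["type"]))

results, lock = [], threading.Lock()
workers = [
    threading.Thread(target=run_node,
                     args=("A", [{"type": "clear", "start": "00:00"}], results, lock)),
    threading.Thread(target=run_node,
                     args=("B", [{"type": "modify", "start": "01:00"}], results, lock)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(results))  # -> [('A', 'clear'), ('B', 'modify')]
```

Neither node waits for the other: each thread parses and executes its own task list, matching the description that no data node blocks on another's completion.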
Fig. 4 is a detailed flowchart of step S130 according to an embodiment, as shown in fig. 4, in an embodiment, the step S130 monitors the operation states of the master node and the data node during the data processing process, and may include the following steps:
S1301, monitoring the heartbeat messages sent by the master node to each data node.
In order to ensure normal communication between the master node and the plurality of data nodes, the master node sends heartbeat messages to the data nodes at a fixed period. Whether the master node is down is judged by monitoring the heartbeat messages sent by the master node to each data node.
S1302, judging whether the heartbeat message received by the data node is abnormal; if so, proceeding to S1303, otherwise proceeding to S1304.
If the data node cannot receive the heartbeat message, receives it only after a timeout, or receives heartbeat messages out of order, the heartbeat message received by the data node is judged to be abnormal.
And S1303, judging that the main node is down.
When the heartbeat message received by the data node is abnormal, the master node is judged to be down; in this state the master node cannot respond to the data task processing requests sent by the client, or cannot distribute data tasks to the data nodes.
And S1304, judging the main node to be normal.
Specifically, when the data node normally receives the heartbeat messages sent by the master node, the master node normally responds to processing requests for data tasks and distributes the data tasks to the data nodes.
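The heartbeat check of S1301-S1304 reduces to a timeout test on the last heartbeat seen, sketched below. The heartbeat period and the three-missed-beats timeout are illustrative assumptions; the embodiment only requires a fixed sending period.

```python
# Illustrative sketch of S1301-S1304: the master sends heartbeats on a fixed
# period; a data node judges the master down when the heartbeat is overdue.
# The concrete period and timeout values are assumptions for this example.

HEARTBEAT_PERIOD = 5.0                     # seconds between master heartbeats
HEARTBEAT_TIMEOUT = 3 * HEARTBEAT_PERIOD   # miss three beats -> assume down

def master_is_down(last_heartbeat_at, now):
    """True if the heartbeat is overdue (S1303), False if normal (S1304)."""
    return (now - last_heartbeat_at) > HEARTBEAT_TIMEOUT

print(master_is_down(last_heartbeat_at=100.0, now=105.0))  # within timeout -> False
print(master_is_down(last_heartbeat_at=100.0, now=120.0))  # overdue -> True
```

Injecting the timestamps keeps the judgment pure and testable; a real data node would record the arrival time of each heartbeat and call this check on a timer.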
Fig. 5 is a detailed flowchart of step S140 according to an embodiment, and as shown in fig. 5, the step S140 elects a new master node from the data nodes, which may include the following steps:
S1401, acquiring performance state information and task execution information of the data nodes, and electing a target data node according to the performance state information and the task execution information.
Because the configuration and deployment of each data node are the same, if the current master node goes down, processing requests for data tasks can no longer be responded to. At this time, a new master node is elected according to the performance state information and the task execution information of the data nodes. For example, the data node with the best performance state information may be selected as the target data node, the data node with the best task execution state information may be selected as the target data node, or the data node that is best on both may be selected as the target data node.
As an example, the process of electing the target data node may be: acquiring the performance state information and the task execution state information of each data node, and selecting the target data node accordingly. For instance, the data node with the lowest CPU utilization may be selected as the target data node according to the performance state information; the data node with the fewest current tasks may be selected according to the task execution state information; or the data node with both the lowest CPU utilization and the fewest current tasks may be selected as the target data node.
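The three election alternatives just listed (lowest CPU utilization, fewest current tasks, or both combined) might be sketched as below. The combination rule — compare CPU usage first and break ties on task count — and the metric names are assumptions for illustration.

```python
# Illustrative sketch of S1401: elect the target data node by the criteria
# the text mentions. The lexicographic "combined" rule is an assumption.

def elect_master(nodes, by="combined"):
    if by == "cpu":
        key = lambda n: nodes[n]["cpu_usage"]
    elif by == "tasks":
        key = lambda n: nodes[n]["running_tasks"]
    else:  # combined: lowest CPU usage first, then fewest current tasks
        key = lambda n: (nodes[n]["cpu_usage"], nodes[n]["running_tasks"])
    return min(nodes, key=key)

nodes = {
    "A": {"cpu_usage": 0.2, "running_tasks": 5},
    "B": {"cpu_usage": 0.2, "running_tasks": 2},
    "C": {"cpu_usage": 0.7, "running_tasks": 1},
}
print(elect_master(nodes, by="cpu"))       # A (ties on CPU resolve to the first node)
print(elect_master(nodes, by="tasks"))     # C (fewest current tasks)
print(elect_master(nodes, by="combined"))  # B (low CPU, then fewer tasks)
```

Whichever criterion is used, the elected node then takes over as the new master (S1402) and begins responding to processing requests.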
And S1402, taking the target data node as a new main node and replacing the main node which is down.
After the target data node becomes the new master node, processing requests for data tasks are switched over to it, and the new master node responds to the processing requests in place of the former master node, i.e. the master node that went down.
It should be noted that, because the master node and the data nodes in the distributed nodes adopt the same configuration, when the master node goes down, a new master node may alternatively be elected from the data nodes at random, without considering the current performance state and task execution state of each data node.
In addition, the data node elected as the new master node can continue to execute its previously assigned data tasks; however, when responding to processing requests for new data tasks, it distributes those new data tasks to the other data nodes.
The following describes the distributed parallel task scheduling method provided by the present technical solution with reference to fig. 6 and fig. 7.
Fig. 6 is a schematic diagram of the system architecture adopted by a task scheduling method in the conventional technique. As shown in Fig. 6, a user sends a processing request for a data task to the scheduling system through a client; a task server serving as the master node responds to the processing request and sends data tasks to each execution server serving as a data node; after receiving a data task, each execution server executes the task against the service system over a network connection according to the processing information in the data task. In this architecture, the task server, the execution servers and the service system are all separate processes, and process-level communication consumes more resources.
Fig. 7 is a schematic diagram of the system architecture adopted by the distributed parallel task scheduling method according to an embodiment of the present invention. As shown in Fig. 7, each distributed node runs as a thread: when a node is the master node, it allocates data tasks to the data nodes through a scheduler; when a node is a data node, it executes the data tasks it receives through a scheduler. The distributed parallel task scheduling method of this scheme is thus based on scheduling among threads within the service system; communication between threads is faster than communication between processes, which improves resource processing efficiency, and inter-thread communication is also safer and incurs a smaller resource overhead.
Fig. 8 is a schematic structural diagram of a distributed parallel task scheduling apparatus according to an embodiment, where the distributed parallel task scheduling apparatus according to this embodiment may be integrated in a computer device.
As shown in fig. 8, the distributed parallel task scheduling apparatus includes: a processing request response module 110, a data task allocation module 120, an operation status monitoring module 130 and a new master node election module 140.
The processing request responding module 110 is configured to, when receiving a processing request for a data task, respond to the processing request through a master node in a distributed node; a data task allocation module 120, configured to allocate, by the master node, data tasks in the processing request to each data node in parallel for data processing; the main node and the data node adopt the same configuration; an operation state monitoring module 130, configured to monitor operation states of the master node and the data node in a data processing process; and a new master node election module 140, configured to elect a new master node from the data nodes when the master node goes down, and respond to a processing request of a new data task through the new master node.
The distributed parallel task scheduling apparatus provided by this embodiment performs distributed task scheduling among threads. By using the same configuration for all distributed nodes, the master node responds to processing requests for data tasks and allocates the data tasks to each data node for data processing. If the master node goes down, a new master node is elected from the data nodes, which share the master node's configuration, and the new master node responds to processing requests for new data tasks. This avoids the interruption of data task responses caused by master node downtime, allows a new master node to be elected quickly so that the scheduling system recovers quickly, and improves the operating efficiency of the distributed system. At the same time, because all distributed nodes share the same configuration, development cost is reduced and lightweight development is achieved.
Fig. 9 is another schematic structural diagram of a distributed parallel task scheduling apparatus according to an embodiment, as shown in fig. 9, in one embodiment, the distributed parallel task scheduling apparatus further includes: and the traversal module 100 is configured to traverse the nodes, find a target node preconfigured with a master node identifier, and determine the target node as a master node.
In one embodiment, the data task assignment module 120 includes: a task amount determining unit 1201, configured to obtain performance state information and task execution information of each data node, and determine a task amount allocated to each data node according to the performance state information and the task execution information; and a task amount allocation unit 1202, configured to allocate, by the master node, the data task to the corresponding data node in parallel according to the task amount.
In one embodiment, the task amount determining unit 1201 includes: the data node determining subunit is used for acquiring the topological relation and the dependency relation of the data task and determining a data node for processing the data task; the information calculation subunit is used for calculating the performance state information and the task execution information of the data nodes; the performance state information comprises CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of executed tasks, the execution progress and the executed time of the executed tasks; and the task quantity distribution subunit is used for determining the task quantity of the data node in a balanced manner according to the performance state information and the task execution information.
In one embodiment, the task amount allocation unit 1202 includes: the data task analysis subunit is used for analyzing the data tasks distributed to the data nodes to obtain the processing information of the data tasks; and the data task processing subunit is used for performing data processing in parallel through the data nodes according to the processing information.
In one embodiment, the operation status monitoring module 130 includes: a heartbeat message monitoring unit 1301, configured to monitor the heartbeat messages sent by the master node to each data node; and a downtime determining unit 1302, configured to determine that the master node is down if the heartbeat message received by a data node is abnormal.
In one embodiment, the new master node election module 140 includes: a target data node election unit 1401, configured to acquire performance state information and task execution information of the data node, and elect a data node with the best performance state information and/or the best task execution information as a target data node; a master node replacing unit 1402, configured to take the target data node as a new master node, and replace the master node that is down.
For specific limitations of the distributed parallel task scheduling apparatus, reference may be made to the above limitations of the distributed parallel task scheduling method, which are not repeated here. The modules in the distributed parallel task scheduling apparatus may be implemented wholly or partially by software, by hardware, or by a combination thereof. The modules may be embedded, in hardware form, in or independent of the processor of the computer device, or stored, in software form, in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
The distributed parallel task scheduling device provided by the above can be used for executing the distributed parallel task scheduling method provided by any of the above embodiments, and has corresponding functions and beneficial effects.
An embodiment of the present invention further provides a computer device, including: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and, when executed, implement the distributed parallel task scheduling method of any of the above embodiments.
Embodiments of the present invention further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a distributed parallel task scheduling method, including:
when a processing request of a data task is received, responding to the processing request through a main node in a distributed node;
distributing the data tasks in the processing request to each data node through the main node for data processing; the main node and the data node adopt the same configuration;
monitoring the running states of the main node and the data node in the data processing process;
and when the main nodes are down, selecting new main nodes from the data nodes, and responding to a processing request of a new data task through the new main nodes.
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present invention is not limited to the operations of the distributed parallel task scheduling method described above, and may also perform related operations in the distributed parallel task scheduling method provided in any embodiment of the present invention, and has corresponding functions and advantages.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software plus the necessary general-purpose hardware, and certainly also by hardware alone, though the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH memory (FLASH), a hard disk or an optical disk of a computer, and which includes several instructions that enable a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the distributed parallel task scheduling method of any embodiment of the present invention.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A distributed parallel task scheduling method is characterized by comprising the following steps:
when a processing request of a data task is received, responding to the processing request through a main node in a distributed node;
distributing the data tasks in the processing request to each data node through the main node for data processing; the main node and the data node adopt the same configuration;
monitoring the running states of the main node and the data node in the data processing process;
and when the main nodes are down, selecting new main nodes from the data nodes, and responding to a processing request of a new data task through the new main nodes.
2. The distributed parallel task scheduling method of claim 1, wherein the step of responding to the processing request by the master node in the distributed node is preceded by:
and traversing the nodes, finding out a target node which is configured with a main node identification in advance, and determining the target node as a main node.
3. The distributed parallel task scheduling method according to claim 1, wherein the step of allocating, by the master node, the data task in the processing request to each data node for data processing comprises:
acquiring performance state information and task execution information of each data node, and determining the task amount distributed by each data node according to the performance state information and the task execution information;
and distributing the data tasks to corresponding data nodes for data processing according to the task amount through the main node.
4. The distributed parallel task scheduling method according to claim 3, wherein the step of obtaining performance state information and task execution information of each data node and determining the task amount allocated to each data node according to the performance state information and the task execution information comprises:
acquiring a topological relation and a dependency relation of the data task, and determining a data node for processing the data task;
calculating performance state information and task execution information of the data nodes; the performance state information comprises CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of executed tasks, the execution progress and the executed time of the executed tasks;
and determining the task quantity of the data nodes in a balanced mode according to the performance state information and the task execution information.
5. The distributed parallel task scheduling method according to claim 3, wherein the step of allocating, by the master node, the data task to the corresponding data node according to the task amount to perform data processing includes:
analyzing the data tasks distributed to the data nodes to obtain processing information of the data tasks;
and according to the processing information, carrying out data processing in parallel through the data nodes.
6. The distributed parallel task scheduling method of claim 1, wherein the step of electing a new master node from the data nodes comprises:
acquiring performance state information and task execution information of the data nodes, and selecting target data nodes according to the performance state information and the task execution information;
and taking the target data node as a new main node and replacing the main node which is down.
7. The distributed parallel task scheduling method according to claim 1, wherein the step of monitoring the operating states of the master node and the data node during the data task processing includes:
monitoring heartbeat messages sent to each data node through the main node;
and if the heartbeat message received by the data node is abnormal, judging that the main node is down.
8. A distributed parallel task scheduler, comprising:
the processing request response module is used for responding the processing request through the main node in the distributed nodes when receiving the processing request of the data task;
the data task allocation module is used for allocating the data tasks in the processing requests to all the data nodes through the main node for data processing; the main node and the data node adopt the same configuration;
the operation state monitoring module is used for monitoring the operation states of the main node and the data node in the data processing process;
and the new main node election module is used for electing a new main node from the data nodes when the main node goes down and responding to a processing request of a new data task through the new main node.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the distributed parallel task scheduling method according to any of claims 1-7 when executing the program.
10. A storage medium containing computer-executable instructions for performing the distributed parallel task scheduling method of any of claims 1-7 when executed by a computer processor.
CN201910979783.5A 2019-10-15 2019-10-15 Distributed parallel task scheduling method, device, computer equipment and storage medium Active CN111221632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910979783.5A CN111221632B (en) 2019-10-15 2019-10-15 Distributed parallel task scheduling method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910979783.5A CN111221632B (en) 2019-10-15 2019-10-15 Distributed parallel task scheduling method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111221632A true CN111221632A (en) 2020-06-02
CN111221632B CN111221632B (en) 2024-06-25

Family

ID=70830550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910979783.5A Active CN111221632B (en) 2019-10-15 2019-10-15 Distributed parallel task scheduling method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111221632B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858012A (en) * 2020-08-05 2020-10-30 杭州安恒信息技术股份有限公司 Task scheduling method, device, equipment and medium
CN111913793A (en) * 2020-07-31 2020-11-10 同盾控股有限公司 Distributed task scheduling method, device, node equipment and system
CN112217906A (en) * 2020-06-30 2021-01-12 北京来也网络科技有限公司 Communication method, device, equipment and storage medium combining AI and RPA
CN112395140A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Decentralized task scheduling method, device, equipment and medium
CN112596893A (en) * 2020-11-23 2021-04-02 中标慧安信息技术股份有限公司 Monitoring method and system for multi-node edge computing device
CN112685157A (en) * 2020-12-29 2021-04-20 平安普惠企业管理有限公司 Task processing method and device, computer equipment and storage medium
CN112860393A (en) * 2021-01-20 2021-05-28 北京科技大学 Distributed task scheduling method and system
CN112948106A (en) * 2020-09-07 2021-06-11 深圳市明源云科技有限公司 Task allocation method and device
CN112965796A (en) * 2021-03-01 2021-06-15 亿企赢网络科技有限公司 Task scheduling system, method and device
CN114428682A (en) * 2022-01-23 2022-05-03 苏州浪潮智能科技有限公司 Overtime task processing method, system, storage medium and equipment
WO2023185556A1 (en) * 2022-03-28 2023-10-05 深圳市兆珑科技有限公司 Multi-device management method, first device, central device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521044A (en) * 2011-12-30 2012-06-27 北京拓明科技有限公司 Distributed task scheduling method and system based on messaging middleware
CN105589748A (en) * 2014-10-22 2016-05-18 阿里巴巴集团控股有限公司 Service request processing method and apparatus
CN108984284A (en) * 2018-06-26 2018-12-11 杭州比智科技有限公司 DAG method for scheduling task and device based on off-line calculation platform
CN109977161A (en) * 2019-03-28 2019-07-05 上海中通吉网络技术有限公司 The monitoring system of presto cluster


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LU KAI, et al.: "SW_PVM: a PVM-based distributed multithreaded communication mechanism" (SW_PVM:一个基于PVM的分布式多线程通讯), Mini-Micro Systems (小型微型计算机系统), vol. 20, no. 10, 31 October 1999 (1999-10-31), pages 747-790 *
DU GENYUAN: "Research on Spatial Data Retrieval and Rapid Processing Technology" (空间数据检索及快速处理技术研究), vol. 2015, 31 October 2015, Wuhan University Press, pages 142-143 *
ZHAO CHUNZHEN (ed.): "Research on Digital Library Management and Service Innovation" (数字图书馆管理与服务创新研究), vol. 2017, 31 May 2017, Jilin Literature and History Press, pages 341-342 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112217906A (en) * 2020-06-30 2021-01-12 北京来也网络科技有限公司 Communication method, device, equipment and storage medium combining AI and RPA
CN111913793A (en) * 2020-07-31 2020-11-10 同盾控股有限公司 Distributed task scheduling method, device, node equipment and system
CN111858012A (en) * 2020-08-05 2020-10-30 杭州安恒信息技术股份有限公司 Task scheduling method, device, equipment and medium
CN112948106A (en) * 2020-09-07 2021-06-11 深圳市明源云科技有限公司 Task allocation method and device
CN112948106B (en) * 2020-09-07 2024-05-31 深圳市明源云科技有限公司 Task allocation method and device
CN112395140A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Decentralized task scheduling method, device, equipment and medium
CN112395140B (en) * 2020-11-17 2023-01-17 平安科技(深圳)有限公司 Decentralized task scheduling method, apparatus, device and medium
CN112596893B (en) * 2020-11-23 2021-10-08 中标慧安信息技术股份有限公司 Monitoring method and system for multi-node edge computing device
CN112596893A (en) * 2020-11-23 2021-04-02 中标慧安信息技术股份有限公司 Monitoring method and system for multi-node edge computing device
CN112685157A (en) * 2020-12-29 2021-04-20 平安普惠企业管理有限公司 Task processing method and device, computer equipment and storage medium
CN112685157B (en) * 2020-12-29 2024-04-02 武汉安辰鑫信息科技有限公司 Task processing method, device, computer equipment and storage medium
CN112860393A (en) * 2021-01-20 2021-05-28 北京科技大学 Distributed task scheduling method and system
CN112860393B (en) * 2021-01-20 2024-03-15 北京科技大学 Distributed task scheduling method and system
CN112965796A (en) * 2021-03-01 2021-06-15 亿企赢网络科技有限公司 Task scheduling system, method and device
CN112965796B (en) * 2021-03-01 2024-04-09 亿企赢网络科技有限公司 Task scheduling system, method and device
CN114428682B (en) * 2022-01-23 2023-11-03 苏州浪潮智能科技有限公司 Overtime task processing method, overtime task processing system, storage medium and storage device
CN114428682A (en) * 2022-01-23 2022-05-03 苏州浪潮智能科技有限公司 Overtime task processing method, system, storage medium and equipment
WO2023185556A1 (en) * 2022-03-28 2023-10-05 深圳市兆珑科技有限公司 Multi-device management method, first device, central device and storage medium

Also Published As

Publication number Publication date
CN111221632B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
CN111221632A (en) Distributed parallel task scheduling method and device, computer equipment and storage medium
CN108881495B (en) Resource allocation method, device, computer equipment and storage medium
US20200137151A1 (en) Load balancing engine, client, distributed computing system, and load balancing method
CN112162865B (en) Scheduling method and device of server and server
US8677353B2 (en) Provisioning a standby virtual machine based on the prediction of a provisioning request being generated
US7340654B2 (en) Autonomic monitoring in a grid environment
US20210006505A1 (en) A bursty traffic allocation method, device and proxy server
CA3168286A1 (en) Data flow processing method and system
EP3432159B1 (en) Garbage collection method and device
US20130103835A1 (en) Resource management method, resource management device, and program product
CN109564528B (en) System and method for computing resource allocation in distributed computing
US11438271B2 (en) Method, electronic device and computer program product of load balancing
CN111258746A (en) Resource allocation method and service equipment
US20190171492A1 (en) Method for managing computational resources of a data center
CN112732401A (en) Virtual machine resource allocation method, system, device and medium
Zhengbing et al. Resource management in a distributed computer system with allowance for the level of trust to computational components
CN116010064A (en) DAG job scheduling and cluster management method, system and device
JP2007328413A (en) Method for distributing load
US9575865B2 (en) Information processing system and monitoring method
CN113051063B (en) Task scheduling method and device for distributed tasks and electronic equipment
Mendis et al. Bio-inspired distributed task remapping for multiple video stream decoding on homogeneous NoCs
JP2016099972A (en) Process migration method and cluster system
US20190213049A1 (en) Data controlling method of distributed computing system and distributed computing system
US11704242B1 (en) System and method for dynamic memory optimizer and manager for Java-based microservices
JP4703681B2 (en) Cluster system and takeover node determination method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant