CN111221632B - Distributed parallel task scheduling method, device, computer equipment and storage medium - Google Patents

Distributed parallel task scheduling method, device, computer equipment and storage medium

Info

Publication number
CN111221632B
CN111221632B (application CN201910979783.5A)
Authority
CN
China
Prior art keywords
data
node
task
distributed
master node
Prior art date
Legal status
Active
Application number
CN201910979783.5A
Other languages
Chinese (zh)
Other versions
CN111221632A (en)
Inventor
钟泽峰
Current Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN201910979783.5A
Publication of CN111221632A
Application granted
Publication of CN111221632B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system


Abstract

An embodiment of the application provides a distributed parallel task scheduling method, device, computer equipment and storage medium, relating to the technical field of computer networks. A master node distributes data tasks to each data node for data processing, and the master node and the data nodes adopt the same configuration. When the master node goes down, a new master node is elected from the data nodes and responds to processing requests for new data tasks, so the response to data tasks is not interrupted by a master-node failure. Because the scheme performs data processing based on kernel scheduling between threads, a new master node can be elected from the data nodes quickly, improving the operating efficiency of the distributed system; at the same time, the distributed nodes are configured identically, which reduces development cost and realizes lightweight development.

Description

Distributed parallel task scheduling method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer networks, and in particular, to a distributed parallel task scheduling method, apparatus, computer device, and storage medium.
Background
In the existing task scheduling technology, task scheduling systems based on Azkaban and Quartz schedule specific services in a service system through inter-process communication: an operation instruction sent by a user is processed by a task server and an execution server in the task scheduling system, and then transmitted to the service system over a communication network for task scheduling.
The task server, the execution server and the service system are all separate processes, and the independence of their data spaces means that task scheduling must rely on external scheduling. As a result, a large amount of resources is consumed by inter-process communication during task scheduling, which reduces task scheduling efficiency; in particular, when a single-point failure occurs, the system is difficult to recover.
Disclosure of Invention
The invention aims to solve at least one of the above technical defects, particularly the problems that inter-process task scheduling reduces scheduling efficiency and that a single-point failure makes the system difficult to recover, and provides a distributed parallel task scheduling method, device, computer equipment and storage medium.
In a first aspect, this embodiment provides a distributed parallel task scheduling method, including the steps of:
when receiving a processing request for a data task, responding to the processing request through a master node among the distributed nodes;
distributing the data task in the processing request to each data node through the master node for data processing; the master node and the data nodes adopt the same configuration;
monitoring the running states of the master node and the data nodes during data processing;
and when the master node goes down, electing a new master node from the data nodes, and responding to a processing request for a new data task through the new master node.
In an embodiment, the step of responding to the processing request by the master node in the distributed node further comprises:
and traversing the nodes, finding the target node pre-configured with the master node identifier, and determining the target node as the master node.
In one embodiment, the step of distributing, by the master node, the data task in the processing request to each data node for data processing includes:
acquiring performance state information and task execution information of each data node, and determining the task amount allocated to each data node according to the performance state information and the task execution information;
and distributing the data task to the corresponding data nodes through the master node according to the task amounts for data processing.
In an embodiment, the step of obtaining performance state information and task execution information of each data node, and determining the task amount allocated to each data node according to the performance state information and the task execution information includes:
acquiring the topological relation and dependency relation of the data task, and determining the data nodes for processing the data task;
calculating performance state information and task execution information of the data nodes; the performance state information comprises the CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of tasks being executed, the execution progress of those tasks, and the executed time;
and determining the task amount of each data node in a balanced manner according to the performance state information and the task execution information.
In an embodiment, the step of distributing, by the master node, the data task to a corresponding data node according to the task amount for data processing includes:
analyzing the data task distributed to the data node to obtain the processing information of the data task;
and performing data processing in parallel through the data nodes according to the processing information.
In one embodiment, the step of selecting a new master node from the data nodes includes:
acquiring performance state information and task execution information of the data nodes, and determining a target data node through competition according to the performance state information and the task execution information;
and taking the target data node as the new master node to replace the master node that went down.
In one embodiment, the step of monitoring the operation states of the master node and the data node during the data task processing process includes:
monitoring the heartbeat messages sent by the master node to each data node;
and if the heartbeat message received by a data node is abnormal, determining that the master node is down.
In a second aspect, the present embodiment provides a distributed parallel task scheduling device, including:
a processing request response module, configured to respond, when a processing request for a data task is received, to the processing request through a master node among the distributed nodes;
a data task distribution module, configured to distribute the data task in the processing request to each data node through the master node for data processing, the master node and the data nodes adopting the same configuration;
a running state monitoring module, configured to monitor the running states of the master node and the data nodes during data processing;
and a new master node election module, configured to elect a new master node from the data nodes when the master node goes down, and to respond to a processing request for a new data task through the new master node.
In a third aspect, the present embodiment provides a computer device, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor implements the distributed parallel task scheduling method according to any embodiment of the first aspect when executing the program.
In a fourth aspect, the present embodiment provides a storage medium containing computer executable instructions, which when executed by a computer processor are configured to perform the distributed parallel task scheduling method according to any of the embodiments of the first aspect.
According to the distributed parallel task scheduling method, device, computer equipment and storage medium, the master node and the data nodes are configured identically. The master node responds to the processing request for a data task and distributes the data task to each data node for data processing; if the master node goes down, a new master node is elected from the data nodes, which share the master node's configuration, and the new master node responds to processing requests for new data tasks. Because the distributed nodes each execute as a thread communicating within the same service system, the resource consumption of multi-process communication is avoided and task scheduling efficiency is improved. Meanwhile, when a master node goes down, since every distributed node has the same configuration, a new master node can be elected from the data nodes quickly, improving recovery from downtime and the operating efficiency of the distributed system.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a distributed parallel task scheduling method provided by an embodiment;
FIG. 2 is a detailed flowchart of step S120 in one embodiment;
FIG. 3 is another detailed flowchart of step S120 in one embodiment;
FIG. 4 is a detailed flowchart of step S130 in one embodiment;
FIG. 5 is a detailed flowchart of step S140 in one embodiment;
FIG. 6 is a system framework employed by a conventional task scheduling method;
FIG. 7 is a system framework employed by the distributed parallel task scheduling method provided in the present solution;
FIG. 8 is a schematic diagram of a distributed parallel task scheduler according to one embodiment;
FIG. 9 is another structural schematic diagram of a distributed parallel task scheduling device according to an embodiment.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Some of the flows described in the specification, claims and figures of the present application include a plurality of operations that appear in a particular order, but it should be understood that the operations may be performed out of the order shown or in parallel. Operation numbers such as 110 and 120 merely distinguish the operations; the numbers themselves do not represent any execution order. In addition, the flows may include more or fewer operations, which may be performed sequentially or in parallel.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
FIG. 1 is a flow chart of a distributed parallel task scheduling method provided by an embodiment. Before explaining the scheme in detail, a system architecture adopted by the distributed task scheduling method of the scheme is introduced. The system architecture includes a plurality of distributed nodes and clients, the plurality of distributed nodes being interconnected. The distributed task scheduling method can be executed on any one of the plurality of distributed nodes, and the node selected as the master node is used as a server to respond to a service request initiated by a client and is distributed to each data node for processing.
Specifically, referring to fig. 1, the distributed parallel task scheduling method provided in this embodiment includes the following steps:
S110, when receiving a processing request for a data task, responding to the processing request through a master node among the distributed nodes.
Here, the plurality of distributed nodes includes a master node and a plurality of data nodes, and the master node is communicatively coupled to each data node. The master node is the node that responds to processing requests for data tasks and distributes tasks to the data nodes; correspondingly, a data node is a node that performs the data tasks assigned by the master node. A client accesses the distributed task scheduling system, i.e. the service system, and sends it a processing request for a data task, the processing request containing the processing information of the data task. The master node in the service system responds to the processing request and obtains the processing information of the data task from the request sent by the client.
S120, distributing the data task in the processing request to each data node through the master node for data processing.
The master node and the data node adopt the same configuration.
Unlike the prior art, in which the master node is configured differently from the data nodes, every distributed node in this embodiment has the same configuration, so any node can act as the master node. Even if the master node goes down, a new master node can be quickly re-elected from the data nodes without modifying a configuration file, improving data task processing efficiency. Meanwhile, because the master node and the data nodes adopt the same configuration and deployment, development difficulty is reduced.
In one embodiment, after receiving the processing request, the master node distributes the data tasks to the data nodes for processing. After receiving its data tasks, each data node starts at the scheduled times according to the processing information of the tasks and executes the data tasks in parallel.
S130, monitoring the running states of the master node and the data nodes during data processing.
During data task processing, in order to track the working condition of each distributed node, the running states of the master node and the data nodes need to be monitored in real time: whether a failure occurs, changes in performance parameters, task execution conditions, and so on. Specifically, heartbeat messages sent by the master node to the data nodes are used to monitor whether the master node has failed, while the running state of each data node is monitored by calculating its performance parameters, such as resource usage (CPU usage, memory usage, I/O resource usage) and load parameters (number of executing tasks, execution progress, execution time). Optionally, the running states of the master node and the data nodes can be fed back to the client, so that the operation of the distributed task scheduling system can be checked in time.
S140, when the master node goes down, electing a new master node from the data nodes, and responding to a processing request for a new data task through the new master node.
When the master node goes down, it can no longer respond to processing requests for data tasks. Because the data nodes have the same configuration as the master node, a new master node can be rapidly elected from the data nodes to replace the downed master node and respond to processing requests. The new master node distributes the data tasks to all other data nodes, which avoids the whole distributed task scheduling system halting because the current master node is down, and improves data task processing efficiency.
When the master node is permanently down, a new master node is elected from the data nodes; the new master node responds to processing requests for data tasks and distributes them to the data nodes other than itself for execution. If the master node is only temporarily down, then after it recovers, either it replaces the newly elected master node and continues to respond to processing requests, or the newly elected master node keeps responding to requests while the recovered original master node acts as a data node and receives the data tasks distributed by the new master node.
It should be noted that task scheduling in the prior art is based on scheduling between processes; the independence of the data spaces of different processes means that communication must pass through the operating system and rely on external scheduling. The distributed parallel scheduling method provided by this embodiment executes on threads: because multiple threads share the same address space and data space, the task data of one thread can be used directly by other threads without going through the operating system. Based on kernel scheduling within the same system, inter-thread communication during task scheduling is efficient, which improves the data task processing efficiency of the task scheduling system and saves resources.
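As a minimal sketch of this idea, and not the patented implementation itself: worker threads can consume tasks from a single shared in-memory queue, so a master thread hands work to data-node threads without any operating-system IPC. The function name `run_inline_scheduler` and the worker count of 3 are illustrative assumptions.

```python
import queue
import threading

def run_inline_scheduler(tasks):
    """Master thread enqueues tasks; worker threads consume them directly
    from shared memory, so no inter-process communication is needed."""
    task_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:              # sentinel: no more work for this worker
                task_queue.task_done()
                break
            value = task()                # task data lives in the shared address space
            with lock:
                results.append(value)     # result is immediately visible to all threads
            task_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(3)]
    for w in workers:
        w.start()
    for t in tasks:                       # the "master" distributes the work
        task_queue.put(t)
    for _ in workers:                     # one sentinel per worker
        task_queue.put(None)
    for w in workers:
        w.join()
    return results
```

The whole exchange happens through one `queue.Queue` object in shared memory, which is the contrast the paragraph draws with process-based schedulers that must serialize every message through the operating system.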
According to the distributed parallel task scheduling method provided by this embodiment, task scheduling is performed between threads and every distributed node adopts the same configuration. The master node responds to processing requests for data tasks and distributes them to the data nodes for processing. When the master node goes down, any data node can become the master node because the master node and the data nodes are identically configured, and no configuration file needs to be modified; a new master node is therefore rapidly re-elected, recovery from downtime is fast, and the operating efficiency of the distributed system improves. Meanwhile, because every distributed node has the same configuration, development cost is reduced and lightweight development is realized.
In an embodiment, before the master node in the distributed node responds to the processing request in step S110, the method may further include:
S100, traversing the nodes, finding the target node pre-configured with the master node identifier, and determining the target node as the master node.
In order to facilitate determination of the master node, one of the distributed nodes may be selected as the master node, and preconfigured with a master node identifier. Traversing each node in the distributed task scheduling system to find out the target node which is preconfigured with the main node identification, wherein the target node acts as the main node.
In other embodiments, because the configuration of the master node and the data nodes is the same, any one node in the distributed nodes can be selected as the master node, so that the master node can be quickly confirmed, and the data task processing efficiency is improved.
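A minimal sketch of this lookup, using hypothetical node records; the `tag` field, the `"MASTER"` identifier and the function name are assumptions for illustration. The fallback to the first node reflects the observation that any identically configured node can serve as master.

```python
def find_master(nodes, master_id="MASTER"):
    """Traverse identically configured nodes; the one pre-tagged with the
    master identifier becomes the master. If none is tagged, fall back to
    the first node, since every node has the same configuration."""
    for node in nodes:
        if node.get("tag") == master_id:
            return node
    return nodes[0] if nodes else None
```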
In order to make the technical scheme clearer and more convenient to understand, the specific implementation process and mode of each step in the technical scheme are described in detail below.
FIG. 2 is a detailed flowchart of step S120 in an embodiment. As shown in FIG. 2, step S120, distributing the data tasks in the processing request to each data node through the master node for data processing, may further include:
S121, acquiring performance state information and task execution information of each data node, and determining the task amount allocated to each data node according to the performance state information and the task execution information.
The performance state information of the data node refers to parameter information describing the performance state of the data node, which includes state parameters such as CPU main frequency, memory size, CPU utilization rate, memory utilization rate, I/O resource utilization rate, etc., and the task execution information of the data node refers to parameter information describing the task execution state of the data node, which includes task parameters such as the number of tasks in progress, task execution progress, executed time, remaining execution time, etc.
The performance state information and task execution information of a data node affect how efficiently it processes data tasks. To distribute data tasks reasonably and evenly, the resources of the distributed task scheduling system should be fully utilized. After receiving the performance state information and task execution information periodically sent by each data node, the master node determines the task amount allocated to each data node accordingly. For example, if data node A is handling 2 data tasks with a full capacity of 5, and data node B is handling 0 data tasks with a full capacity of 6, then data node A can accept 3 more tasks while data node B can accept 6, so newly received data tasks are preferentially allocated to data node B.
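The spare-capacity reasoning in the example above can be sketched as follows; the node records, field names and function name are hypothetical, and only the "assign to the node with the most headroom" rule comes from the text.

```python
def assign_task(nodes):
    """Pick the data node with the most spare capacity (capacity - running).
    Mirrors the example: A runs 2 of 5 (3 spare), B runs 0 of 6 (6 spare),
    so a new task goes to B."""
    best = max(nodes, key=lambda n: n["capacity"] - n["running"])
    if best["capacity"] - best["running"] <= 0:
        return None                 # every node is already full
    best["running"] += 1            # record the newly assigned task
    return best["name"]
```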
S122, distributing the data task to the corresponding data node through the main node according to the task quantity to perform data processing.
After the task amount each data node needs to execute is determined, the master node distributes the data tasks to the corresponding data nodes according to those amounts. Each data node can execute several data tasks in parallel rather than waiting for one data task to finish before starting another, improving the execution efficiency of the data tasks.
Fig. 3 is another detailed flowchart of step S120 in one embodiment, and as shown in fig. 3, further, step S121 may specifically include the following sub-steps:
S1201, obtaining the topological relation and the dependency relation of the data task, and determining the data node for processing the data task.
The processing request for a data task contains information such as the topological relation and dependency relation for processing the task. The topological relation refers to the interrelation of the data nodes, determined from the interrelations among the items of data in the data task; the dependency relation means that one event depends on the completion of another, i.e. the task processed by one data node may depend on tasks processed by other data nodes completing first. The topological relation and dependency relation are analyzed according to the actual task scheduling structure; optionally, in an embodiment, the topological relation corresponding to the processing request is analyzed by ProjectManager, and the dependency relation of the processing request is analyzed by IKExpression.
In an embodiment, the distributed task scheduling system analyzes a topological relation and a dependency relation corresponding to a processing request of a data task, and searches a data node related to executing the data task.
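One common way to turn such dependency relations into a valid execution order is a topological sort (Kahn's algorithm). The source does not specify which algorithm it uses, so this is an illustrative sketch with hypothetical task names:

```python
from collections import deque

def schedule_order(deps):
    """deps maps each task to the set of tasks it depends on. Returns an
    execution order that respects every dependency (Kahn's topological
    sort), or raises ValueError if the graph contains a cycle."""
    indegree = {t: len(p) for t, p in deps.items()}
    dependents = {t: [] for t in deps}
    for task, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(task)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for d in dependents[task]:        # a finished task unblocks its dependents
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(deps):
        raise ValueError("cyclic dependency")
    return order
```

Tasks whose indegree reaches zero together are independent of one another and could be dispatched to different data nodes in parallel.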
S1202, calculating performance state information and task execution information of the data node.
The performance state information comprises CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of executed tasks, execution progress of the executed tasks and executed time.
In an embodiment, the lower the CPU utilization rate, memory utilization rate, I/O resource utilization rate and similar parameters of a data node, the better its performance state. A single performance parameter, such as memory usage, may be set as the main parameter for evaluating the performance state, or a weighting formula may be set that considers multiple performance parameters together, for example a weight of 0.3 for CPU usage, 0.5 for memory usage and 0.2 for I/O resource usage, giving for the current data node: performance state = CPU usage × 0.3 + memory usage × 0.5 + I/O resource usage × 0.2.
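The weighting formula above, expressed directly in code; the function name is an assumption, and the default weights are the example weights from the text (lower score means a better performance state):

```python
def performance_score(cpu, mem, io, weights=(0.3, 0.5, 0.2)):
    """Weighted performance state: CPU, memory and I/O utilizations are
    fractions in [0, 1]; lower scores indicate a less-loaded node."""
    w_cpu, w_mem, w_io = weights
    return cpu * w_cpu + mem * w_mem + io * w_io
```

For example, a node at 50% CPU, 40% memory and 30% I/O scores 0.5 × 0.3 + 0.4 × 0.5 + 0.3 × 0.2 = 0.41.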
Similarly, the fewer the ongoing tasks, the faster the task execution progress and the shorter the remaining execution time, the better the task execution state of the data node. In an embodiment, a single task parameter, such as the number of ongoing tasks, may be set as the main parameter for evaluating the task execution state, or a formula may be set that evaluates the task execution state from several task parameters together.
S1203, determining the task amount of each data node in a balanced manner according to the performance state information and the task execution information.
In most cases the data nodes differ in performance state, and their task execution states differ over time, so task amounts can be distributed to the data nodes in a balanced manner based on their performance states and task execution states. For example, more tasks are allocated to data nodes with better performance or task execution states, and correspondingly fewer to nodes with worse states, so that the total task load across all data nodes is balanced and the resource utilization of the distributed task scheduling system improves.
Further, the step S122 may specifically include the following substeps:
S1204, analyzing the data task distributed to the data node to obtain the processing information of the data task.
The data task includes processing information. The data tasks may include: storing a workflow log, deleting data information, starting or closing a certain program, updating clearing information, and the like. After the data node receives the data task, the data task is analyzed to obtain corresponding processing information, such as the type of the task, the starting execution time, the task execution period and the like.
S1205, performing data processing in parallel through the data nodes according to the processing information.
One data node may perform multiple data tasks simultaneously, and the same data task may be performed by multiple data nodes. Multiple data tasks can be executed independently, and each data node can execute its data tasks in parallel.
For example, the data node a receives a data task of zero point data clearing, and the data node B receives a data task of 1 am starting modification data; the data node C receives the data tasks and the like which send data every whole point, and the data nodes A, B and C can execute the received data tasks in parallel without waiting for the completion of the execution of the data task of one data node, thereby improving the operation efficiency of the task scheduling system.
FIG. 4 is a detailed flowchart of an embodiment of step S130. As shown in FIG. 4, in an embodiment, the step S130 of monitoring the operation states of the master node and the data nodes during data processing may include the following steps:
S1301, monitoring the heartbeat messages sent by the master node to each data node.
To ensure normal communication between the master node and the data nodes, the master node sends heartbeat messages to the data nodes at a fixed period. Whether the master node is down is judged by monitoring the heartbeat messages the master node sends to each data node.
S1302, judging whether reception of the heartbeat message at the data node is abnormal; if so, jump to S1303, otherwise jump to S1304.
If a data node cannot receive the heartbeat message, times out while waiting for it, or receives heartbeat messages out of order, reception of the heartbeat message at that data node is judged to be abnormal.
S1303, judging that the master node is down.
When reception of the heartbeat message at a data node is abnormal, the master node is judged to be down; at this point the master node can no longer respond to data task processing requests sent by clients or distribute data tasks to the data nodes.
S1304, judging that the master node is normal.
Specifically, the data nodes normally receive the heartbeat messages sent by the master node, and the master node normally responds to data task processing requests and distributes data tasks to the data nodes.
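Steps S1301 to S1304 can be sketched from the data node's perspective as a small monitor that flags the anomaly cases described above (missing or timed-out heartbeat, out-of-order heartbeat). The timeout value and the use of sequence numbers are assumptions for the sketch:

```python
# Hypothetical data-node-side heartbeat monitor. The master is judged
# down if no heartbeat arrives within the timeout; a repeated or
# out-of-order sequence number counts as disordered (abnormal) reception.
import time

class HeartbeatMonitor:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seq = -1
        self.last_seen = time.monotonic()

    def on_heartbeat(self, seq):
        if seq <= self.last_seq:      # disordered heartbeat => abnormal
            return "abnormal"
        self.last_seq = seq
        self.last_seen = time.monotonic()
        return "ok"

    def master_down(self, now=None):
        # No heartbeat within the timeout => judge the master node down.
        now = time.monotonic() if now is None else now
        return (now - self.last_seen) > self.timeout_s
```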
Fig. 5 is a detailed flowchart of step S140 in an embodiment. As shown in Fig. 5, the step S140 of electing a new master node from the data nodes may include the following steps:
S1401, acquiring the performance state information and task execution information of the data nodes, and electing a target data node according to the performance state information and the task execution information.
Because every data node has the same configuration and deployment, once the current master node goes down, processing requests for data tasks can no longer be responded to. At this point a new master node is re-elected according to the performance state information and task execution information of the data nodes. For example, the data node with the best performance state may be selected as the target data node, the data node with the best task execution state may be selected as the target data node, or the data node that is best on both criteria may be selected as the target data node.
Illustratively, the election of the target node may proceed as follows: acquire the performance state information and task execution information of each data node, and elect the target data node accordingly. For example, the data node with the lowest CPU utilization according to the performance state information may be taken as the target data node; the data node currently executing the fewest tasks according to the task execution information may be taken as the target data node; or the data node with both the lowest CPU utilization and the fewest executing tasks may be taken as the target data node.
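One of those criteria can be written down directly. This sketch picks the node with the lowest CPU utilization and breaks ties by the fewest currently executing tasks; the field names are illustrative assumptions:

```python
# Hypothetical election of a new master node: lowest CPU utilization
# wins, ties broken by the fewest tasks currently being executed.

def elect_master(data_nodes):
    return min(data_nodes,
               key=lambda n: (n["cpu_usage"], n["running_tasks"]))["name"]
```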
S1402, taking the target data node as the new master node to replace the downed master node.
After the target data node becomes the new master node, processing requests for data tasks are switched over and sent to it; the new master node responds to these requests in place of the previous, downed master node.
It should be noted that, because the master node and the data nodes among the distributed nodes adopt the same configuration, a new master node may also be elected directly from the data nodes when the master node goes down, without considering their current performance states and task execution states.
In addition, the data node elected as the new master node may continue executing its previously allocated data tasks; when it responds to a processing request for a new data task, however, it allocates that new task to the other data nodes.
The distributed parallel task scheduling method provided by the technical scheme is further described below with reference to fig. 6 and 7.
Fig. 6 is a schematic diagram of the system architecture used by a conventional task scheduling method. As shown in Fig. 6, a user sends a data task processing request to the scheduling system through a client; a task server acting as the master node responds to the request and sends data tasks to the execution servers acting as data nodes; and each execution server, after receiving its data task, performs the task against the service system over a network connection according to the processing information in the task. In this architecture, the task server, the execution servers, and the service system are all separate processes, and process-level communication consumes more resources.
FIG. 7 is a schematic diagram of the system architecture adopted by the distributed parallel task scheduling method provided by an embodiment of the present invention. As shown in FIG. 7, each distributed node executes one thread. When a node is the master node, it distributes data tasks to the data nodes through the scheduler; when a node is a data node, it executes a data task through the scheduler after receiving it. The distributed parallel task scheduling method of this scheme is therefore based on scheduling among threads within the service system: communication between threads is faster than communication between processes, which improves resource processing efficiency, and inter-thread communication is safer and has a lower resource cost.
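A minimal single-process sketch of this thread-per-node design: the master thread dispatches tasks to data-node threads over in-memory queues, so all communication stays between threads rather than between processes. The round-robin dispatch policy is an assumption for the sketch:

```python
# Hypothetical sketch of Fig. 7: every node is a thread in one process.
# The master thread pulls requests from a shared queue and dispatches
# them round-robin to data-node inboxes; data-node threads execute the
# task callables and report results. All channels are in-memory queues.
import queue
import threading

def master_loop(requests, node_inboxes, stop):
    i = 0
    while not stop.is_set():
        try:
            task = requests.get(timeout=0.05)
        except queue.Empty:
            continue
        node_inboxes[i % len(node_inboxes)].put(task)  # dispatch to a data node
        i += 1

def data_loop(inbox, results, stop):
    while not stop.is_set():
        try:
            task = inbox.get(timeout=0.05)
        except queue.Empty:
            continue
        results.put(task())  # execute the task callable, report the result
```

Because the queues live in one address space, "sending" a task is just handing over an object reference, which is what gives thread-level scheduling its speed and cost advantage over process-level communication.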
Fig. 8 is a schematic structural diagram of a distributed parallel task scheduling device provided in an embodiment, where the distributed parallel task scheduling device provided in the embodiment may be integrated in a computer device.
As shown in fig. 8, the distributed parallel task scheduling device includes: a processing request response module 110, a data task allocation module 120, a running status monitoring module 130, and a new master election module 140.
The processing request response module 110 is configured to respond to a processing request of a data task through a master node in the distributed nodes when the processing request is received; the data task allocation module 120 is configured to allocate, by the master node, data tasks in the processing request to each data node in parallel for data processing; the master node and the data node adopt the same configuration; an operation state monitoring module 130, configured to monitor an operation state of the master node and the data node during a data processing process; and the new master node election module 140 is used for electing a new master node from the data nodes when the master node is down, and responding to a processing request of a new data task through the new master node.
The distributed parallel task scheduling device provided by this embodiment is based on distributed task scheduling among kernel-scheduled threads, with all distributed nodes adopting the same configuration. The master node responds to data task processing requests and distributes the data tasks to the data nodes for processing; if the master node goes down, a new master node is elected from the identically configured data nodes and responds to processing requests for new data tasks. This avoids interrupted responses to data tasks when the master node goes down, allows a new master node to be elected quickly so that the scheduling system recovers quickly, and improves the operating efficiency of the distributed system. Meanwhile, because every distributed node has the same configuration, development cost is reduced and lightweight development is achieved.
Fig. 9 is another schematic structural diagram of a distributed parallel task scheduling device provided in an embodiment, as shown in fig. 9, in one embodiment, the distributed parallel task scheduling device further includes: and the traversing module 100 is configured to traverse the nodes, find out a target node configured with a master node identifier in advance, and determine the target node as a master node.
In one embodiment, the data task allocation module 120 includes: a task amount determining unit 1201, configured to obtain performance state information and task execution information of each data node, and determine an amount of tasks allocated to each data node according to the performance state information and the task execution information; and a task amount distribution unit 1202, configured to distribute, by the master node, the data tasks to corresponding data nodes in parallel according to the task amount.
In one embodiment, the task amount determination unit 1201 includes: a data node determining subunit, configured to acquire the topological relation and dependency relation of the data task and determine the data nodes for processing the data task; an information calculating subunit, configured to calculate the performance state information and task execution information of the data nodes, where the performance state information includes CPU utilization, memory utilization, and I/O resource utilization, and the task execution information includes the number of tasks being executed, their execution progress, and the elapsed execution time; and a task amount distribution subunit, configured to determine the task amount of each data node in a balanced manner according to the performance state information and the task execution information.
In one embodiment, the task volume allocation unit 1202 includes: the data task analysis subunit is used for analyzing the data task distributed to the data node to obtain the processing information of the data task; and the data task processing subunit is used for carrying out data processing in parallel through the data nodes according to the processing information.
In one embodiment, the operational status monitoring module 130 includes: a heartbeat message monitoring unit 1301, configured to monitor heartbeat messages sent to each data node by the master node; and the downtime determination unit 1302 is configured to determine that the master node is downtime if the data node receives the heartbeat message and is abnormal.
In one embodiment, the new master node election module 140 includes: a target data node election unit 1401, configured to acquire the performance state information and task execution information of the data nodes and elect the data node with the best performance state information and/or task execution information as the target data node; and a master node replacement unit 1402, configured to take the target data node as the new master node, replacing the downed master node.
For specific limitations of the distributed parallel task scheduling device, reference may be made to the limitations of the distributed parallel task scheduling method above, which are not repeated here. The modules in the distributed parallel task scheduling device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
The distributed parallel task scheduling device provided by the above can be used for executing the distributed parallel task scheduling method provided by any embodiment, and has corresponding functions and beneficial effects.
An embodiment of the invention also provides a computer device, comprising: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and when executed, implement the distributed parallel task scheduling method mentioned in any of the embodiments above.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are for performing a distributed parallel task scheduling method comprising:
when receiving a processing request of a data task, responding to the processing request through a main node in the distributed nodes;
Distributing the data task in the processing request to each data node through the master node for data processing; the master node and the data node adopt the same configuration;
monitoring the running states of the master node and the data node in the data processing process;
when the master node is down, a new master node is selected from the data nodes, and a processing request of a new data task is responded through the new master node.
Of course, the storage medium containing computer-executable instructions provided by the embodiment of the invention is not limited to the operations of the distributed parallel task scheduling method described above; it can also execute the related operations in the distributed parallel task scheduling method provided by any embodiment of the invention, with the corresponding functions and beneficial effects.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software plus necessary general-purpose hardware, or by hardware alone, though in many cases the former is the preferred embodiment. Based on such understanding, the technical solution of the present invention, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH memory, a hard disk, or an optical disk, and containing a number of instructions for causing a computer device (which may be a robot, a personal computer, a server, a network device, etc.) to execute the distributed parallel task scheduling method according to any embodiment of the present invention.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations also fall within the scope of the present invention.

Claims (7)

1. The distributed parallel task scheduling method is characterized by comprising the following steps of:
When receiving a processing request of a data task, traversing each node in a distributed task scheduling system, searching a target node which is pre-configured with a main node identifier, determining the target node as a main node, responding to the processing request through the main node in the distributed nodes, executing one thread by each distributed node, and sharing an address space and a data space by all threads;
The data task in the processing request is distributed to each data node through the master node for data processing, and the method comprises the following steps: acquiring performance state information and task execution information of each data node, and determining task quantity allocated to each data node according to the performance state information and the task execution information, wherein the performance state information comprises CPU main frequency, memory size, CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of tasks in progress, task execution progress, executed time and residual execution time; distributing the data task to a corresponding data node through the main node according to the task quantity to perform data processing; the master node and the data node adopt the same configuration;
monitoring the running states of the master node and the data node in the data processing process;
When the master node is down, a new master node is selected from the data nodes, including: acquiring performance state information and task execution information of the data node, and competing for a target data node with optimal performance state information or optimal task execution state according to the performance state information and the task execution information; taking the target data node as a new main node, and replacing the main node with downtime, wherein the new main node continuously executes the distributed data task;
responding to the processing request of the new data task through the new master node.
2. The distributed parallel task scheduling method according to claim 1, wherein the step of acquiring performance state information and task execution information of each data node and determining the task amount allocated to each data node according to the performance state information and task execution information includes:
acquiring topological relation and dependency relation of the data task, and determining a data node for processing the data task;
Calculating performance state information and task execution information of the data node;
And determining the task quantity of the data node in a balanced mode according to the performance state information and the task execution information.
3. The distributed parallel task scheduling method according to claim 1, wherein the step of distributing the data task to the corresponding data node by the master node according to the task amount for data processing includes:
Analyzing the data task distributed to the data node to obtain the processing information of the data task;
and carrying out data processing in parallel through the data nodes according to the processing information.
4. A distributed parallel task scheduling method according to claim 1, wherein the step of monitoring the operating states of the master node and the data node during data task processing includes:
monitoring heartbeat messages sent to each data node by the master node;
and if the data node receives the heartbeat message and is abnormal, judging that the main node is down.
5. A distributed parallel task scheduling device, comprising:
the processing request response module is used for traversing each node in the distributed task scheduling system when receiving a processing request of a data task, searching a target node which is pre-configured with a main node identifier, determining the target node as a main node, responding the processing request through the main node in the distributed nodes, executing one thread by each distributed node, and sharing an address space and a data space by all threads;
The data task allocation module is configured to allocate, by the master node, a data task in a processing request to each data node for data processing, and includes: acquiring performance state information and task execution information of each data node, and determining task quantity allocated to each data node according to the performance state information and the task execution information, wherein the performance state information comprises CPU main frequency, memory size, CPU utilization rate, memory utilization rate and I/O resource utilization rate, and the task execution information comprises the number of tasks in progress, task execution progress, executed time and residual execution time; distributing the data task to a corresponding data node through the main node according to the task quantity to perform data processing; the master node and the data node adopt the same configuration;
the running state monitoring module is used for monitoring the running states of the main node and the data node in the data processing process;
And the new master node election module is used for electing a new master node from the data nodes when the master node is down, and comprises the following steps: acquiring performance state information and task execution information of the data node, and competing for a target data node with optimal performance state information or optimal task execution state according to the performance state information and the task execution information; taking the target data node as a new main node, and replacing the main node with downtime, wherein the new main node continuously executes the distributed data task; responding to the processing request of the new data task through the new master node.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the distributed parallel task scheduling method of any of claims 1-4 when the program is executed.
7. A storage medium containing computer executable instructions which, when executed by a computer processor, are for performing the distributed parallel task scheduling method of any one of claims 1-4.
CN201910979783.5A 2019-10-15 2019-10-15 Distributed parallel task scheduling method, device, computer equipment and storage medium Active CN111221632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910979783.5A CN111221632B (en) 2019-10-15 2019-10-15 Distributed parallel task scheduling method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111221632A CN111221632A (en) 2020-06-02
CN111221632B true CN111221632B (en) 2024-06-25

Family

ID=70830550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910979783.5A Active CN111221632B (en) 2019-10-15 2019-10-15 Distributed parallel task scheduling method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111221632B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111726420A (en) * 2020-06-30 2020-09-29 北京来也网络科技有限公司 Communication method, device, equipment and storage medium based on RPA
CN111913793A (en) * 2020-07-31 2020-11-10 同盾控股有限公司 Distributed task scheduling method, device, node equipment and system
CN111858012A (en) * 2020-08-05 2020-10-30 杭州安恒信息技术股份有限公司 Task scheduling method, device, equipment and medium
CN112948106B (en) * 2020-09-07 2024-05-31 深圳市明源云科技有限公司 Task allocation method and device
CN112395140B (en) * 2020-11-17 2023-01-17 平安科技(深圳)有限公司 A decentralized task scheduling method apparatus, device and medium
CN112596893B (en) * 2020-11-23 2021-10-08 中标慧安信息技术股份有限公司 Monitoring method and system for multi-node edge computing device
CN112685157B (en) * 2020-12-29 2024-04-02 武汉安辰鑫信息科技有限公司 Task processing method, device, computer equipment and storage medium
CN112860393B (en) * 2021-01-20 2024-03-15 北京科技大学 Distributed task scheduling method and system
CN112965796B (en) * 2021-03-01 2024-04-09 亿企赢网络科技有限公司 Task scheduling system, method and device
CN114428682B (en) * 2022-01-23 2023-11-03 苏州浪潮智能科技有限公司 Overtime task processing method, overtime task processing system, storage medium and storage device
CN114598605A (en) * 2022-03-28 2022-06-07 百富计算机技术(深圳)有限公司 Multi-device management method, first device, center device and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105589748A (en) * 2014-10-22 2016-05-18 阿里巴巴集团控股有限公司 Service request processing method and apparatus
CN108984284A (en) * 2018-06-26 2018-12-11 杭州比智科技有限公司 DAG method for scheduling task and device based on off-line calculation platform
CN109977161A (en) * 2019-03-28 2019-07-05 上海中通吉网络技术有限公司 The monitoring system of presto cluster

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN102521044B (en) * 2011-12-30 2013-12-25 北京拓明科技有限公司 Distributed task scheduling method and system based on messaging middleware




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant