CN106502791A - Task allocation method and device

Task allocation method and device

Info

Publication number
CN106502791A
CN106502791A
Authority
CN
China
Prior art keywords
task
intensive
parallel
application information
parallel task
Prior art date
Legal status
Granted
Application number
CN201610898038.4A
Other languages
Chinese (zh)
Other versions
CN106502791B (en)
Inventor
周云锋
亓开元
苏志远
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201610898038.4A
Publication of CN106502791A
Application granted
Publication of CN106502791B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a task allocation method and device. The method includes: determining each computing node in a distributed computing platform as a compute-priority computing node or a storage-priority computing node according to the resource information corresponding to that node; acquiring a target calculation task and analyzing it into at least two calculation processes with a time sequence; analyzing the at least two calculation processes in sequence, and analyzing the current calculation process into at least two parallel tasks; analyzing each parallel task to determine the attribute information corresponding to it; determining the current parallel task as a data-intensive parallel task or a compute-intensive parallel task according to the attribute information corresponding to each parallel task; and allocating the compute-intensive parallel tasks to compute-priority computing nodes and the data-intensive parallel tasks to storage-priority computing nodes. The technical scheme of the invention can reduce the calculation duration of the target calculation task.

Description

Task allocation method and device
Technical Field
The invention relates to the technical field of computers, in particular to a task allocation method and a task allocation device.
Background
With the arrival of the big data era, big data computing requirements are difficult to meet with a single high-performance node, which has given rise to distributed computing models such as MapReduce, Spark and Storm.
At present, a distributed computing model is built from a plurality of computing nodes, and a target computing task with a large amount of computation is split into serial computing processes according to a time sequence. When a current serial computing process can be further split into a plurality of parallel tasks, each parallel task in the current serial computing process is randomly allocated to a computing node in the distributed computing model, and each computing node executes the parallel tasks allocated to it.
Because the hardware performance of the computing nodes in the distributed computing model differs, when a node with lower hardware performance cannot meet the computing requirement or the storage requirement of a parallel task allocated to it, that task has to be killed on the current node and reallocated to another node. Frequently killing and restarting parallel tasks consumes a lot of time and thereby increases the computing time of the target computing task.
Disclosure of Invention
The embodiment of the invention provides a task allocation method and a task allocation device, which can reduce the calculation time of a target calculation task.
In a first aspect, an embodiment of the present invention provides a task allocation method, including:
S1: acquiring resource information corresponding to each computing node in an external distributed computing model, and determining a current computing node as a compute-priority computing node or a storage-priority computing node according to the resource information corresponding to each computing node;
S2: acquiring a target calculation task, and analyzing the target calculation task into at least two calculation processes with a time sequence;
S3: analyzing the at least two calculation processes in sequence, and when the current calculation process can be analyzed into at least two parallel tasks, analyzing the current calculation process into at least two parallel tasks;
S4: analyzing each parallel task to determine attribute information corresponding to each parallel task;
S5: determining the current parallel task as a data-intensive parallel task or a compute-intensive parallel task according to the attribute information corresponding to each parallel task;
S6: allocating each of the compute-intensive parallel tasks to the at least one compute-priority computing node, and allocating each of the data-intensive parallel tasks to the at least one storage-priority computing node.
Preferably,
the analyzing each parallel task to determine attribute information corresponding to each parallel task includes: and analyzing each parallel task to determine the calculated amount, the data amount and the algorithm complexity corresponding to each parallel task.
Preferably,
further comprising: determining resource demand information corresponding to each parallel task according to the computation amount, the data amount and the algorithm complexity corresponding to each parallel task;
the allocating each of the compute-intensive parallel tasks to the at least one compute-priority computing node and each of the data-intensive parallel tasks to the at least one storage-priority computing node respectively includes: allocating each compute-intensive parallel task to the at least one compute-priority computing node according to the resource demand information corresponding to each compute-intensive parallel task, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task.
Preferably,
the resource demand information comprises at least the following three kinds of application information: memory resource application information, IO resource application information and CPU resource application information; the method further comprises:
a1: setting the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information which are respectively corresponding to each parallel task as 1;
a2: determining whether at least one target parallel task has data skew according to the calculated amount, the data amount and the algorithm complexity which correspond to each parallel task respectively, if so, adding 1 to the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information which correspond to the target parallel task, and then executing the step A3; otherwise, go to step A3;
a3: adding 1 to the weight of the memory resource application information and the weight of the IO resource application information corresponding to each data intensive parallel task, and adding 1 to the weight of the CPU resource application information corresponding to each compute intensive parallel task;
the allocating each compute-intensive parallel task to the at least one compute-priority computing node according to the resource demand information corresponding to each compute-intensive parallel task, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task, includes: allocating each compute-intensive parallel task to the at least one compute-priority computing node according to the memory resource application information, the IO resource application information and the CPU resource application information corresponding to each compute-intensive parallel task and the respective weights of that information, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the memory resource application information, the IO resource application information and the CPU resource application information corresponding to each data-intensive parallel task and the respective weights of that information.
Preferably, the method further comprises the following steps:
when the current computing process cannot be analyzed into at least two parallel tasks, analyzing the current computing process to determine the computation amount, the data amount and the algorithm complexity corresponding to the current computing process;
determining the type of the current computing process as data-intensive or compute-intensive according to the computation amount, the data amount and the algorithm complexity corresponding to the current computing process;
when the type of the current computing process is data intensive, allocating the current computing process to the at least one storage-priority computing node; or, when the type of the current computing process is compute intensive, assigning the current computing process to the at least one compute-priority computing node.
In a second aspect, an embodiment of the present invention provides a task allocation apparatus, including:
the node management module is used for acquiring resource information corresponding to each computing node in the external distributed computing model respectively, and determining the current computing node as a computing priority computing node or a storage priority computing node according to the resource information corresponding to each computing node respectively;
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a target calculation task and analyzing the target calculation task into at least two calculation processes with a time sequence;
the first analysis module is used for analyzing the at least two calculation processes in sequence, and when the current calculation process can be analyzed into at least two parallel tasks, the current calculation process is analyzed into at least two parallel tasks;
the second analysis module is used for analyzing each parallel task to determine attribute information corresponding to each parallel task;
the determining module is used for determining the current parallel task as a data intensive parallel task or a calculation intensive parallel task according to the attribute information corresponding to each parallel task;
the task processing module is used for respectively allocating each computation-intensive parallel task to the at least one computation-priority computing node and allocating each data-intensive parallel task to the at least one storage-priority computing node, so that the at least one storage-priority computing node allocated to the data-intensive parallel tasks and the at least one computation-priority computing node allocated to the computation-intensive parallel tasks respectively execute the corresponding parallel tasks.
Preferably,
the second analysis module is used for analyzing each parallel task to determine the calculated amount, the data amount and the algorithm complexity corresponding to each parallel task.
Preferably, the method further comprises the following steps:
the quantitative processing module is used for determining resource demand information corresponding to each parallel task according to the calculated amount, the data amount and the algorithm complexity corresponding to each parallel task;
the task processing module is configured to allocate each compute-intensive parallel task to the at least one compute-priority computing node according to resource demand information corresponding to each compute-intensive parallel task, and allocate each compute-intensive parallel task to the at least one storage-priority computing node according to resource demand information corresponding to each data-intensive parallel task.
Preferably,
further comprising: the device comprises a weight configuration module, a judgment module, a first correction module and a second correction module; wherein,
the weight configuration module is used for setting the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information which are respectively corresponding to each parallel task to be 1;
the judging module is used for determining whether at least one target parallel task has data inclination according to the calculated amount, the data amount and the algorithm complexity which are respectively corresponding to each parallel task, and if so, the first correcting module is triggered; otherwise, triggering the second correction module;
the first correction module is used for adding 1 to the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information corresponding to the target parallel task under the triggering of the judgment module, and then triggering the second correction module;
the second correction module is configured to add 1 to the weight of the memory resource application information and the weight of the IO resource application information corresponding to each data intensive parallel task, and add 1 to the weight of the CPU resource application information corresponding to each computation intensive parallel task, under the trigger of the determination module or the first correction module;
the task processing module is used for allocating each of the compute-intensive parallel tasks to at least one compute-priority computing node according to the weight of memory resource application information, IO resource application information, CPU resource application information and CPU resource application information corresponding to the compute-intensive parallel tasks respectively and the weight of the memory resource application information corresponding to the compute-intensive parallel tasks respectively and the weight of the IO resource application information according to each of the memory resource application information, the IO resource application information, the CPU resource application information and the current data-intensive parallel tasks respectively corresponding to the data-intensive parallel tasks and the weight of the IO resource application information respectively, and allocating each of the data-intensive parallel tasks to the at least one storage-priority computing node respectively.
Preferably,
the second analysis module is further configured to analyze the current computing process when the current computing process cannot be analyzed into at least two parallel tasks, so as to determine a computation amount, a data amount and an algorithm complexity corresponding to the current computing process;
the determining module is further configured to determine, according to the calculation amount, the data amount and the algorithm complexity corresponding to the current calculation process, that the type of the current calculation process is data intensive or calculation intensive;
the task processing module is further used for distributing the current computing process to the at least one storage priority type computing node when the type of the current computing process is data intensive; or, when the type of the current computing process is compute intensive, assigning the current computing process to the at least one compute-priority computing node.
The embodiment of the invention provides a task allocation method and device. In the method, resource information corresponding to each computing node in an external distributed computing model is first acquired, and each computing node is determined as a compute-priority computing node or a storage-priority computing node according to the acquired resource information. After a target task is acquired and split into a plurality of computing processes with a time sequence, each splittable computing process is in turn split into a plurality of parallel tasks that can be executed in parallel, and the attribute information of each parallel task is determined. The current parallel task is then determined as a data-intensive parallel task or a compute-intensive parallel task according to its attribute information, each compute-intensive parallel task is allocated to a compute-priority computing node, and each data-intensive parallel task is allocated to a storage-priority computing node. In summary, compute-intensive tasks are allocated to compute-priority computing nodes with better computing performance and data-intensive tasks are allocated to storage-priority computing nodes with better storage performance; this avoids the frequent killing and restarting of parallel tasks caused by computing nodes in the distributed computing model failing to meet the computing or storage requirements of the tasks allocated to them, and can therefore shorten the computing time of the target computing task.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a task allocation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another task allocation method provided by an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a task allocation apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another task assigning apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of another task allocation apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a task allocation method, including:
S1: acquiring resource information corresponding to each computing node in an external distributed computing model, and determining a current computing node as a compute-priority computing node or a storage-priority computing node according to the resource information corresponding to each computing node;
S2: acquiring a target calculation task, and analyzing the target calculation task into at least two calculation processes with a time sequence;
S3: analyzing the at least two calculation processes in sequence, and when the current calculation process can be analyzed into at least two parallel tasks, analyzing the current calculation process into at least two parallel tasks;
S4: analyzing each parallel task to determine attribute information corresponding to each parallel task;
S5: determining the current parallel task as a data-intensive parallel task or a compute-intensive parallel task according to the attribute information corresponding to each parallel task;
S6: allocating each of the compute-intensive parallel tasks to the at least one compute-priority computing node, and allocating each of the data-intensive parallel tasks to the at least one storage-priority computing node, so that the at least one storage-priority computing node allocated the data-intensive parallel tasks and the at least one compute-priority computing node allocated the compute-intensive parallel tasks execute their corresponding parallel tasks.
In the above embodiment of the present invention, the resource information corresponding to each computing node in the external distributed computing model is first acquired, and each computing node is determined as a compute-priority computing node or a storage-priority computing node according to the acquired resource information. After the target task is acquired and split into a plurality of computing processes with a time sequence, each splittable computing process is in turn split into a plurality of parallel tasks that can be executed in parallel, and the attribute information of each parallel task is determined. The current parallel task is then determined as a data-intensive parallel task or a compute-intensive parallel task according to its attribute information, each compute-intensive parallel task is allocated to a compute-priority computing node, and each data-intensive parallel task is allocated to a storage-priority computing node. In summary, compute-intensive tasks are allocated to compute-priority computing nodes with better computing performance and data-intensive tasks are allocated to storage-priority computing nodes with better storage performance; this avoids the frequent killing and restarting of parallel tasks caused by computing nodes failing to meet the computing or storage requirements of the tasks allocated to them, and can therefore shorten the computing time of the target computing task.
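By way of illustration only, the flow of steps S1 to S6 can be sketched in Python as follows; every name, data structure and scoring rule here is an assumption made for the sketch, since the patent leaves the concrete analyses open:

    # Illustrative end-to-end sketch of S1-S6; not the patent's implementation.
    # Nodes carry assumed (cpu, disk, memory) benchmark scores; parallel tasks
    # carry assumed (computation amount, data amount) pairs.
    def allocate(processes, nodes):
        # S1: classify every node from its resource information
        pools = {"compute-priority": [], "storage-priority": []}
        for name, (cpu, disk, mem) in nodes.items():
            kind = "compute-priority" if cpu > max(disk, mem) else "storage-priority"
            pools[kind].append(name)

        plan = {}
        # S2/S3: processes arrive in time order, each already split into tasks
        for tasks in processes:
            counters = {"compute-priority": 0, "storage-priority": 0}
            for tname, (computation, data) in tasks.items():
                # S4/S5: classify the task from its attribute information
                kind = ("compute-priority" if computation >= data
                        else "storage-priority")
                # S6: compute-intensive -> compute-priority node, and so on
                pool = pools[kind]
                plan[tname] = pool[counters[kind] % len(pool)]
                counters[kind] += 1
        return plan

    nodes = {"A": (9, 4, 5), "B": (8, 5, 5), "C": (3, 9, 8), "D": (3, 8, 9)}
    job1 = {"task1": (90, 10), "task2": (60, 15), "task3": (10, 80), "task4": (10, 70)}
    print(allocate([job1], nodes))
    # {'task1': 'A', 'task2': 'B', 'task3': 'C', 'task4': 'D'}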
Specifically, in an embodiment of the present invention, when a current computing node is determined as a storage-priority computing node or a computing-priority computing node according to resource information corresponding to each computing node in a distributed computing model, if resource information corresponding to the current computing node reflects that the performance of a CPU is higher than that of a hard disk and a memory, and when the current computing node executes a computing task, the CPU waits for reading and writing of the hard disk and the memory under most conditions, the current computing node may be determined as the computing-priority computing node; on the contrary, if the resource information corresponding to the current computing node reflects that the CPU performance is lower than that of the hard disk and the memory, and when the current computing node executes a computing task, the CPU occupancy rate is 100% in most cases, and reading and writing of the hard disk and the memory can be completed in a very short time to cause IO idle, the current computing node can be determined as a storage priority computing node.
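A minimal sketch of this classification rule, assuming the resource information has already been reduced to comparable benchmark scores for the CPU, hard disk and memory (the scores and the comparison are illustrative assumptions, not something the patent prescribes):

    # Hypothetical node classification from simplified resource scores.
    def classify_node(cpu_score, disk_score, memory_score):
        # CPU outperforms storage: the CPU would mostly wait on hard-disk and
        # memory reads and writes, so treat the node as compute-priority.
        if cpu_score > max(disk_score, memory_score):
            return "compute-priority"
        # Otherwise the CPU saturates while IO sits idle: storage-priority.
        return "storage-priority"

    print(classify_node(9.0, 4.0, 5.0))  # compute-priority
    print(classify_node(3.0, 8.0, 7.0))  # storage-priority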
Further, in order to facilitate determining a task type of a parallel task and determine whether one or more parallel tasks have data skew in the parallel tasks in the same calculation process, in an embodiment of the present invention, the analyzing each of the parallel tasks to determine attribute information corresponding to each of the parallel tasks includes: and analyzing each parallel task to determine the calculated amount, the data amount and the algorithm complexity corresponding to each parallel task.
In the above embodiment of the present invention, after determining the computation amount, the data amount, and the algorithm complexity respectively corresponding to a plurality of parallel tasks in the same computation process, if the computation amount, the data amount, or the algorithm complexity corresponding to at least one target parallel task is far higher than the computation amount, the data amount, or the algorithm complexity corresponding to other parallel tasks, it may be determined that data skew occurs in the current target parallel task.
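The patent does not define "far higher" numerically; the sketch below takes one plausible reading, flagging a task when any of its three attributes exceeds the mean of its peers' values by an assumed factor:

    # Hypothetical data-skew test over (computation, data, complexity) tuples;
    # the factor of 3 is an invented threshold, not taken from the patent.
    from statistics import mean

    def has_data_skew(attrs, peer_attrs, factor=3.0):
        for i in range(3):  # computation amount, data amount, complexity
            peers = [p[i] for p in peer_attrs]
            if peers and attrs[i] > factor * mean(peers):
                return True
        return False

    tasks = {"task1": (90, 10, 5), "task2": (30, 10, 5),
             "task3": (10, 80, 5), "task4": (10, 70, 5)}
    for name, attrs in tasks.items():
        others = [a for n, a in tasks.items() if n != name]
        print(name, has_data_skew(attrs, others))  # only task1 -> True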
Further, in order to save the computing resources, an embodiment of the present invention further includes: determining resource demand information corresponding to each parallel task according to the calculated amount, the data amount and the algorithm complexity corresponding to each parallel task;
said allocating each of said compute-intensive parallel tasks to said at least one compute-priority computing node and each of said data-intensive parallel tasks to said at least one storage-priority computing node respectively comprises: allocating each compute-intensive parallel task to the at least one compute-priority computing node according to the resource demand information corresponding to each compute-intensive parallel task, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task.
In the above embodiments of the present invention, according to the computation amount, the data amount, and the algorithm complexity corresponding to the parallel task, the resource size (for example, the memory resource size, the IO resource size, the hard disk resource size, and the CPU resource size) required by the compute node when executing the current parallel task may be quantized, and when executing the current parallel task, the corresponding compute node may configure the virtual machine according to the quantized resource size, so as to execute the current parallel task by using the virtual machine, thereby avoiding the parallel task from consuming a large amount of resources of the current compute node.
It should be understood that when the quantized resources are not sufficient for executing the current parallel task, the virtual machine in the current computing node may be dynamically expanded without dropping the running parallel task in the current computing node.
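Purely to illustrate the quantization idea (the patent gives no formula), a task's resource demand could be derived from its three attributes along the following lines; the coefficients and units are invented for the sketch:

    # Hypothetical quantization of a parallel task's resource demand, used to
    # size the virtual machine that will run the task. The linear rules and
    # coefficients are illustrative assumptions.
    def quantize_demand(computation, data, complexity):
        return {
            "cpu_cores":   max(1, round(computation * complexity / 50)),
            "memory_gb":   max(1, round(data * 0.1)),
            "io_mb_per_s": max(10, round(data * 2)),
        }

    print(quantize_demand(computation=90, data=10, complexity=5))
    # {'cpu_cores': 9, 'memory_gb': 1, 'io_mb_per_s': 20}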
Furthermore, when the same calculation process is split into a plurality of parallel tasks that can be executed in parallel, data skew may occur in one or more of those tasks: for example, an uneven key-value distribution in the calculation process may concentrate too much data at a single point, making the data amount of one or more parallel tasks excessive; meanwhile, some parallel tasks may involve high-complexity mining algorithms to which the distributed computing model cannot be applied, so their execution time is also excessive. To allow the parallel tasks with data skew to be processed preferentially and to be allocated more computing resources, so that the computing duration of the current computing process is reduced as much as possible, in an embodiment of the present invention the resource demand information includes at least the following three kinds of application information: memory resource application information, IO resource application information and CPU resource application information; the method further comprises:
a1: setting the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information which are respectively corresponding to each parallel task as 1;
a2: determining whether at least one target parallel task has data skew according to the calculated amount, the data amount and the algorithm complexity which correspond to each parallel task respectively, if so, adding 1 to the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information which correspond to the target parallel task, and then executing the step A3; otherwise, go to step A3;
a3: adding 1 to the weight of the memory resource application information and the weight of the IO resource application information corresponding to each data intensive parallel task, and adding 1 to the weight of the CPU resource application information corresponding to each compute intensive parallel task;
the allocating each compute-intensive parallel task to the at least one compute-priority computing node according to the resource demand information corresponding to each compute-intensive parallel task, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task, includes: allocating each compute-intensive parallel task to the at least one compute-priority computing node according to the memory resource application information, the IO resource application information and the CPU resource application information corresponding to each compute-intensive parallel task and the respective weights of that information, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the memory resource application information, the IO resource application information and the CPU resource application information corresponding to each data-intensive parallel task and the respective weights of that information.
For example, suppose the target calculation task is split into calculation processes job1 and job2 with a time sequence, and job1 is split into four parallel tasks task1, task2, task3 and task4, where the computation amounts of task1 and task2 are relatively high and their data amounts relatively small, while the computation amounts of task3 and task4 are relatively small and their data amounts relatively high; that is, task1 and task2 are determined as compute-intensive parallel tasks and task3 and task4 as data-intensive parallel tasks. First, the weights of the memory resource application information, IO resource application information and CPU resource application information corresponding to task1, task2, task3 and task4 are all set to 1. When the computation amount corresponding to task1 is far higher than that of task2, task3 and task4, the data amount corresponding to task1 is far higher than that of task2, task3 and task4, or the algorithm complexity corresponding to task1 is far higher than that of task2, task3 and task4, the weights of the memory resource application information, IO resource application information and CPU resource application information corresponding to task1 are each increased by 1. Then the weights of the memory resource application information and IO resource application information corresponding to task3 and task4 are each increased by 1, and the weights of the CPU resource application information corresponding to task1 and task2 are each increased by 1. The resulting weights of the memory resource application information, IO resource application information and CPU resource application information corresponding to task1, task2, task3 and task4 are: task1: (2,2,3); task2: (1,1,2); task3: (2,2,1); task4: (2,2,1). Thus, when allocating the four parallel tasks to computing nodes, task1 may be preferentially allocated to the compute-priority computing node with the best performance, task2 may then be allocated to a corresponding compute-priority computing node, and task3 and task4 may be allocated to corresponding storage-priority computing nodes, in combination with the memory resource application information, IO resource application information and CPU resource application information respectively corresponding to task1, task2, task3 and task4.
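The A1 to A3 weighting for this example can be reproduced in a few lines; the task classes and the skew judgement are hard-coded below purely to match the example (in the method they come from the analyses described above):

    # Sketch of steps A1-A3 for the job1 example; each weight tuple is ordered
    # (memory, IO, CPU). task1's skew flag is an assumed analysis result.
    kinds = {"task1": "compute", "task2": "compute",
             "task3": "data", "task4": "data"}
    skewed = {"task1"}

    weights = {}
    for name, kind in kinds.items():
        mem, io, cpu = 1, 1, 1                        # A1: initialise all to 1
        if name in skewed:                            # A2: skewed task, +1 each
            mem, io, cpu = mem + 1, io + 1, cpu + 1
        if kind == "data":                            # A3: favour memory and IO
            mem, io = mem + 1, io + 1
        else:                                         # A3: favour CPU
            cpu += 1
        weights[name] = (mem, io, cpu)

    print(weights)
    # {'task1': (2, 2, 3), 'task2': (1, 1, 2),
    #  'task3': (2, 2, 1), 'task4': (2, 2, 1)}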
In summary, by preferentially allocating the parallel tasks with data skew to the corresponding compute-priority or storage-priority compute nodes with the best performance, the parallel tasks with data skew can be processed by the corresponding compute nodes in the distributed compute model first in a plurality of parallel tasks, and the corresponding compute nodes can give more resource support to the parallel tasks with data skew.
Further, in an embodiment of the present invention, the method further includes:
when the current computing process cannot be analyzed into at least two parallel tasks, analyzing the current computing process to determine the computation amount, the data amount and the algorithm complexity corresponding to the current computing process;
determining the type of the current computing process as data-intensive or compute-intensive according to the computation amount, the data amount and the algorithm complexity corresponding to the current computing process;
when the type of the current computing process is data intensive, allocating the current computing process to the at least one storage-priority computing node; or, when the type of the current computing process is compute intensive, assigning the current computing process to the at least one compute-priority computing node.
In the embodiment of the invention, for the calculation process which cannot be split into a plurality of parallel tasks, the calculation process is firstly analyzed to determine the corresponding calculation amount, data amount and algorithm complexity, the type of the current calculation process is determined to be data intensive or calculation intensive according to the calculation amount, data amount and algorithm complexity corresponding to the current calculation process, and the calculation process is pertinently distributed to the calculation nodes of the corresponding type according to the type of the current calculation process, so that the calculation nodes of the corresponding type can provide enough resource support for the calculation process, frequent kill and parallel task restart due to the fact that the calculation nodes cannot meet the calculation requirement or storage requirement of the distributed parallel tasks are avoided, and the calculation time of the target calculation task is further reduced.
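The patent does not fix how the three attributes decide the type of a non-splittable computing process; a deliberately simple stand-in rule might weight the computation amount by the algorithm complexity and compare it with the data amount:

    # Hypothetical classification of a non-splittable computing process;
    # the scoring rule is an assumption made for illustration only.
    def classify_process(computation, data, complexity):
        if computation * complexity >= data:
            return "compute-intensive"
        return "data-intensive"

    print(classify_process(computation=80, data=20, complexity=4))  # compute-intensive
    print(classify_process(computation=5, data=900, complexity=1))  # data-intensive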
To more clearly illustrate the objects and advantages of the technical solution provided by the present invention, as shown in fig. 2, an embodiment of the present invention provides another task allocation method, which may include the following steps:
step 201, acquiring resource information corresponding to the computing nodes A, B, C, D in the external distributed computing model.
In step 202, the computing nodes A, B, C, D are determined as the computing-priority computing nodes or the storage-priority computing nodes according to the resource information respectively corresponding to the computing nodes A, B, C, D.
For example, when determining computing node A, B, C or D as a storage-priority computing node or a compute-priority computing node according to its resource information: if the resource information corresponding to computing nodes A and B reflects that CPU performance is higher than that of the hard disk and memory, so that when executing computing tasks the CPU mostly waits for reads and writes of the hard disk and memory, computing nodes A and B may be determined as compute-priority computing nodes; on the contrary, if the resource information corresponding to computing nodes C and D reflects that CPU performance is lower than that of the hard disk and memory, so that when executing computing tasks the CPU occupancy is mostly 100% while reads and writes of the hard disk and memory complete in a very short time and IO sits idle, computing nodes C and D may be determined as storage-priority computing nodes.
Step 203, acquiring target computing task Application.
In step 204, the Application is parsed into two calculation processes job1 and job2 with time sequence.
Step 205, analyzing job1 into parallel tasks task1, task2, task3 and task4.
Step 206, analyzing task1, task2, task3 and task4 respectively to determine the computation amount, the data amount and the algorithm complexity corresponding to each of them.
Step 207, respectively determining task1, task2, task3 and task4 as data-intensive parallel tasks or compute-intensive parallel tasks according to the computation amount, the data amount and the algorithm complexity respectively corresponding to task1, task2, task3 and task4.
For example, when the amount of computation corresponding to task1 and task2, respectively, is relatively high and the amount of data is relatively small, and the amount of computation corresponding to task3 and task4, respectively, is relatively small and the amount of data is relatively high, then task1 and task2 may be determined to be computation-intensive parallel tasks, and task3 and task4 may be determined to be data-intensive parallel tasks, respectively.
Step 208, determining the memory resource application information, IO resource application information and CPU resource application information respectively corresponding to task1, task2, task3 and task4 according to the computation amount, the data amount and the algorithm complexity respectively corresponding to them.
Here, it is to be noted that the resources required by the corresponding computing node when executing the corresponding parallel task are quantized in advance, for example, the memory resource size, the IO resource size, the hard disk resource size, the CPU resource size, and the like corresponding to the current parallel task are quantized, and when executing the current parallel task, the corresponding computing node may configure a virtual machine according to the quantized resource size, so as to execute the current parallel task by using the virtual machine, thereby avoiding the parallel task from consuming a large amount of resources of the current computing node.
Step 209, setting the weights of the memory resource application information, IO resource application information and CPU resource application information respectively corresponding to task1, task2, task3 and task4 to 1.
Step 210, determining whether data skew occurs in any target parallel task among task1, task2, task3 and task4 according to the computation amount, the data amount and the algorithm complexity respectively corresponding to them; if so, executing step 211; otherwise, executing step 212.
In step 211, the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information corresponding to the target parallel task task1 are each increased by 1.
For example, in steps 209 to 211: in step 209 the weights of the memory resource application information, IO resource application information and CPU resource application information respectively corresponding to the four parallel tasks task1, task2, task3 and task4 are all set to 1; when step 210 determines that the computation amount corresponding to task1 is far higher than that of task2, task3 and task4, that the data amount corresponding to task1 is far higher than that of task2, task3 and task4, or that the algorithm complexity corresponding to task1 is far higher than that of task2, task3 and task4, the weights of the memory resource application information, IO resource application information and CPU resource application information corresponding to task1 are each increased by 1 in step 211.
Step 212, adding 1 to the weight of the memory resource application information and the weight of the IO resource application information respectively corresponding to task3 and task4, and adding 1 to the weight of the CPU resource application information respectively corresponding to task1 and task 2.
In the embodiment of the present invention, after performing corresponding configuration and correction processing on the weight of the memory resource application information, the weight of the IO resource application information, and the weight of the CPU resource application information respectively corresponding to task1, task2, task3, and task4 in steps 209 to 212, the weight of the memory resource application information, the weight of the IO resource application information, and the weight of the CPU resource application information respectively corresponding to task1, task2, task3, and task4 are:
task1: (2,2,3);
task2: (1,1,2);
task3: (2,2,1);
task4: (2,2,1).
Step 213, according to the memory resource application information, IO resource application information and CPU resource application information respectively corresponding to task1, task2, task3 and task4 and the weights of that information, allocating task1 to computing node A, task2 to computing node A or B, and task3 and task4 to computing node C and/or computing node D.
Through the above steps, on the one hand, for the plurality of parallel tasks within the same computing process, the compute-intensive tasks task1 and task2 are allocated to the compute-priority computing nodes A and B with better computing performance, and the data-intensive tasks task3 and task4 are allocated to the storage-priority computing nodes C and D with better storage performance, which avoids frequent killing and restarting of parallel tasks caused by computing nodes in the distributed computing model failing to meet the computing or storage requirements of the tasks allocated to them, and can reduce the computing time of the target computing task; on the other hand, among the compute-intensive tasks task1 and task2, the target parallel task task1 with data skew can be preferentially allocated to the compute-priority computing node A with the best computing performance, so that node A can fully support task1, speeding up its execution and shortening the computing time of job1; finally, task1, task2, task3 and task4 can be allocated to corresponding computing nodes according to the quantized memory resource application information, IO resource application information and CPU resource application information, and when executing the current parallel task a computing node can configure a virtual machine according to the quantized resources and execute the task with that virtual machine, preventing the parallel task from consuming a large amount of the node's resources.
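One way to realise "the skewed task goes to the best node first", as in step 213, is to rank each class of tasks by total weight and pair them with nodes ranked by performance; the pairing policy below is an assumption made for illustration:

    # Hypothetical weight-ordered dispatch for the compute-intensive tasks of
    # the example; node A is assumed to outperform node B.
    compute_tasks = {"task1": (2, 2, 3), "task2": (1, 1, 2)}
    compute_nodes = ["A", "B"]  # ranked best-first

    ranked = sorted(compute_tasks, key=lambda t: sum(compute_tasks[t]),
                    reverse=True)
    print(dict(zip(ranked, compute_nodes)))  # {'task1': 'A', 'task2': 'B'}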
Step 214, analyzing job2 to determine the computation amount, the data amount and the algorithm complexity corresponding to job2.
Step 215, determining the type of job2 as data-intensive or compute-intensive according to the computation amount, the data amount and the algorithm complexity corresponding to job2.
Step 216, when the type of job2 is data-intensive, allocating job2 to computing node C; when the type of job2 is compute-intensive, allocating job2 to computing node A.
As shown in fig. 3, an embodiment of the present invention provides a task allocation apparatus, including:
the node management module 301 is configured to acquire resource information corresponding to each computing node in the external distributed computing model, and determine a current computing node as a compute-priority computing node or a storage-priority computing node according to the resource information corresponding to each computing node;
an obtaining module 302, configured to obtain a target computing task, and analyze the target computing task into at least two computing processes with a time sequence;
a first parsing module 303, configured to parse the at least two computing processes in sequence, and when a current computing process can be parsed into at least two parallel tasks, parse the current computing process into at least two parallel tasks;
a second parsing module 304, configured to parse each parallel task to determine attribute information corresponding to each parallel task;
a determining module 305, configured to determine, according to attribute information respectively corresponding to each of the parallel tasks, a current parallel task as a data-intensive parallel task or a computation-intensive parallel task;
a task processing module 306, configured to respectively allocate each of the compute-intensive parallel tasks to the at least one compute-priority computing node, and respectively allocate each of the data-intensive parallel tasks to the at least one storage-priority computing node.
Further, in order to conveniently determine the task type of the parallel task and determine whether one or more parallel tasks have data skew in the parallel tasks in the same calculation process, in an embodiment of the present invention, the second parsing module 304 is configured to parse each of the parallel tasks to determine a computation amount, a data amount, and an algorithm complexity corresponding to each of the parallel tasks.
Further, in order to save resources, as shown in fig. 4, in an embodiment of the present invention, the method further includes:
the quantization processing module 401 is configured to determine resource demand information corresponding to each parallel task according to a computation amount, a data amount, and an algorithm complexity corresponding to each parallel task;
the task processing module 306 is configured to allocate each compute-intensive parallel task to the at least one compute-priority computing node according to the resource demand information corresponding to each compute-intensive parallel task, and to allocate each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task.
Further, in order to facilitate priority processing on the parallel task with data skew, and at the same time, to facilitate allocation of more computing resources for the parallel task, and to reduce the computing duration corresponding to the current computing process as much as possible, as shown in fig. 5, an embodiment of the present invention further includes: a weight configuration module 501, a judgment module 502, a first correction module 503 and a second correction module 504; wherein,
the weight configuration module 501 is configured to set a weight of the memory resource application information, a weight of the IO resource application information, and a weight of the CPU resource application information, which correspond to each parallel task, to 1;
the judging module 502 is configured to determine whether at least one target parallel task has data skew according to the computation amount, the data amount and the algorithm complexity respectively corresponding to each parallel task, and if so, to trigger the first correcting module 503; otherwise, to trigger the second correcting module 504;
the first correcting module 503 is configured to add 1 to the weight of the memory resource application information, the weight of the IO resource application information and the weight of the CPU resource application information corresponding to the target parallel task under the trigger of the judging module 502, and then to trigger the second correcting module 504;
the second correcting module 504 is configured to, under the trigger of the judging module 502 or the first correcting module 503, add 1 to the weight of the memory resource application information and the weight of the IO resource application information corresponding to each data-intensive parallel task, and to add 1 to the weight of the CPU resource application information corresponding to each compute-intensive parallel task;
the task processing module 306 is configured to allocate the compute-intensive parallel tasks to the at least one compute-prioritized computing node according to the weights of the memory resource application information, the IO resource application information, the CPU resource application information, and the CPU resource application information corresponding to the compute-intensive parallel tasks, respectively, and allocate each of the data-intensive parallel tasks to the at least one storage-prioritized computing node according to the weights of the memory resource application information, the IO resource application information, the CPU resource application information, and the memory resource application information corresponding to the current data-intensive parallel tasks, and the IO resource application information, respectively, corresponding to the data-intensive parallel tasks.
In an embodiment of the present invention, the second parsing module 304 is further configured to parse the current computing process when the current computing process cannot be parsed into at least two parallel tasks, so as to determine a computation amount, a data amount, and an algorithm complexity corresponding to the current computing process;
the determining module 305 is further configured to determine, according to the calculation amount, the data amount, and the algorithm complexity corresponding to the current calculation process, that the type of the current calculation process is data intensive or calculation intensive;
the task processing module 306 is further configured to allocate the current computing process to the at least one storage-priority computing node when the type of the current computing process is data intensive; or, when the type of the current computing process is compute intensive, assigning the current computing process to the at least one compute-priority computing node.
Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
In summary, the embodiments of the present invention have at least the following advantages:
1. In an embodiment of the invention, resource information corresponding to each computing node in the external distributed computing model is first acquired, and each computing node is determined as a compute-priority computing node or a storage-priority computing node according to the acquired resource information. After the target task is acquired and split into a plurality of computing processes with a time sequence, each splittable computing process is in turn split into a plurality of parallel tasks that can be executed in parallel, and the attribute information of each parallel task is determined. The current parallel task is then determined as a data-intensive parallel task or a compute-intensive parallel task according to its attribute information, each compute-intensive parallel task is allocated to a compute-priority computing node, and each data-intensive parallel task is allocated to a storage-priority computing node. In summary, compute-intensive tasks are allocated to compute-priority computing nodes with better computing performance and data-intensive tasks are allocated to storage-priority computing nodes with better storage performance, which avoids frequent killing and restarting of parallel tasks caused by computing nodes in the distributed computing model failing to meet the computing or storage requirements of the parallel tasks allocated to them, and can shorten the computing time of the target computing task.
2. In an embodiment of the present invention, according to the computation amount, the data amount, and the algorithm complexity corresponding to the parallel task, the resource size (for example, the memory resource size, the IO resource size, the hard disk resource size, and the CPU resource size) required by the compute node when executing the current parallel task may be quantized, and when executing the current parallel task, the corresponding compute node may configure a virtual machine according to the quantized resource size, so as to execute the current parallel task by using the virtual machine, thereby avoiding the parallel task from consuming a large amount of resources of the current compute node.
3. In an embodiment of the invention, a parallel task with data skew is processed preferentially: according to its computation type, it is preferentially allocated to the compute-priority or storage-priority computing node with the best performance, so that among a plurality of parallel tasks the one with data skew is processed first by the corresponding computing node in the distributed computing model, and that node can give it more resource support, reducing the computing duration of the current computing process as much as possible.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Finally, it should be noted that the above description covers only preferred embodiments of the present invention and is intended to illustrate, not to limit, the technical solutions of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A task allocation method, comprising:
acquiring resource information corresponding to each computing node in an external distributed computing model, and determining each computing node as a computation-priority computing node or a storage-priority computing node according to its respective resource information;
acquiring a target computing task, and parsing the target computing task into at least two computing processes having a time sequence;
parsing the at least two computing processes in sequence, and, when a current computing process can be split into at least two parallel tasks, splitting the current computing process into the at least two parallel tasks;
parsing each parallel task to determine attribute information corresponding to each parallel task;
determining each parallel task as a data-intensive parallel task or a computation-intensive parallel task according to its corresponding attribute information;
and allocating each computation-intensive parallel task to at least one computation-priority computing node, and allocating each data-intensive parallel task to at least one storage-priority computing node.
2. The task allocation method according to claim 1,
the parsing each parallel task to determine the attribute information corresponding to each parallel task comprises: parsing each parallel task to determine the computation amount, the data amount, and the algorithm complexity corresponding to each parallel task.
3. The task allocation method according to claim 2,
further comprising: determining resource demand information corresponding to each parallel task according to the computation amount, the data amount, and the algorithm complexity corresponding to each parallel task;
wherein the allocating each computation-intensive parallel task to the at least one computation-priority computing node and allocating each data-intensive parallel task to the at least one storage-priority computing node comprises: allocating each computation-intensive parallel task to the at least one computation-priority computing node according to the resource demand information corresponding to each computation-intensive parallel task, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task.
4. The task allocation method according to claim 3,
the resource demand information comprises at least the following three types of application information: memory resource application information, IO resource application information, and CPU resource application information; and the method further comprises:
A1: setting the weight of the memory resource application information, the weight of the IO resource application information, and the weight of the CPU resource application information corresponding to each parallel task to 1;
A2: determining, according to the computation amount, the data amount, and the algorithm complexity corresponding to each parallel task, whether at least one target parallel task has data skew; if so, adding 1 to the weight of the memory resource application information, the weight of the IO resource application information, and the weight of the CPU resource application information corresponding to the target parallel task, and then performing step A3; otherwise, performing step A3 directly;
A3: adding 1 to the weight of the memory resource application information and the weight of the IO resource application information corresponding to each data-intensive parallel task, and adding 1 to the weight of the CPU resource application information corresponding to each computation-intensive parallel task;
wherein the allocating each computation-intensive parallel task to the at least one computation-priority computing node according to the resource demand information corresponding to each computation-intensive parallel task, and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task comprises: allocating each computation-intensive parallel task to the at least one computation-priority computing node according to the memory resource application information, the IO resource application information, and the CPU resource application information corresponding to each computation-intensive parallel task and the respective weights thereof; and allocating each data-intensive parallel task to the at least one storage-priority computing node according to the memory resource application information, the IO resource application information, and the CPU resource application information corresponding to each data-intensive parallel task and the respective weights thereof.
5. The task allocation method according to any one of claims 1 to 4, further comprising:
when the current computing process cannot be split into at least two parallel tasks, parsing the current computing process to determine the computation amount, the data amount, and the algorithm complexity corresponding to the current computing process;
determining the type of the current computing process as data-intensive or computation-intensive according to the computation amount, the data amount, and the algorithm complexity corresponding to the current computing process;
and when the type of the current computing process is data-intensive, allocating the current computing process to the at least one storage-priority computing node; or, when the type of the current computing process is computation-intensive, allocating the current computing process to the at least one computation-priority computing node.
6. A task allocation device, comprising:
a node management module, configured to acquire resource information corresponding to each computing node in an external distributed computing model, and to determine each computing node as a computation-priority computing node or a storage-priority computing node according to its respective resource information;
an acquisition module, configured to acquire a target computing task and parse the target computing task into at least two computing processes having a time sequence;
a first parsing module, configured to parse the at least two computing processes in sequence and, when a current computing process can be split into at least two parallel tasks, split the current computing process into the at least two parallel tasks;
a second parsing module, configured to parse each parallel task to determine attribute information corresponding to each parallel task;
a determining module, configured to determine each parallel task as a data-intensive parallel task or a computation-intensive parallel task according to its corresponding attribute information;
and a task processing module, configured to allocate each computation-intensive parallel task to at least one computation-priority computing node and each data-intensive parallel task to at least one storage-priority computing node, so that the storage-priority computing nodes allocated the data-intensive parallel tasks and the computation-priority computing nodes allocated the computation-intensive parallel tasks execute their corresponding parallel tasks.
7. The task allocation device according to claim 6,
wherein the second parsing module is configured to parse each parallel task to determine the computation amount, the data amount, and the algorithm complexity corresponding to each parallel task.
8. The task allocation device according to claim 7,
further comprising: a quantitative processing module, configured to determine resource demand information corresponding to each parallel task according to the computation amount, the data amount, and the algorithm complexity corresponding to each parallel task;
wherein the task processing module is configured to allocate each computation-intensive parallel task to the at least one computation-priority computing node according to the resource demand information corresponding to each computation-intensive parallel task, and to allocate each data-intensive parallel task to the at least one storage-priority computing node according to the resource demand information corresponding to each data-intensive parallel task.
9. The task allocation device according to claim 8,
further comprising: a weight configuration module, a judgment module, a first correction module, and a second correction module; wherein
the weight configuration module is configured to set the weight of the memory resource application information, the weight of the IO resource application information, and the weight of the CPU resource application information corresponding to each parallel task to 1;
the judgment module is configured to determine, according to the computation amount, the data amount, and the algorithm complexity corresponding to each parallel task, whether at least one target parallel task has data skew, and if so, to trigger the first correction module; otherwise, to trigger the second correction module;
the first correction module is configured to, under the trigger of the judgment module, add 1 to the weight of the memory resource application information, the weight of the IO resource application information, and the weight of the CPU resource application information corresponding to the target parallel task, and then trigger the second correction module;
the second correction module is configured to, under the trigger of the judgment module or the first correction module, add 1 to the weight of the memory resource application information and the weight of the IO resource application information corresponding to each data-intensive parallel task, and add 1 to the weight of the CPU resource application information corresponding to each computation-intensive parallel task;
and the task processing module is configured to allocate each computation-intensive parallel task to the at least one computation-priority computing node according to the memory resource application information, the IO resource application information, and the CPU resource application information corresponding to each computation-intensive parallel task and the respective weights thereof, and to allocate each data-intensive parallel task to the at least one storage-priority computing node according to the memory resource application information, the IO resource application information, and the CPU resource application information corresponding to each data-intensive parallel task and the respective weights thereof.
10. The task allocation device according to any one of claims 6 to 9,
wherein the second parsing module is further configured to, when the current computing process cannot be split into at least two parallel tasks, parse the current computing process to determine the computation amount, the data amount, and the algorithm complexity corresponding to the current computing process;
the determining module is further configured to determine the type of the current computing process as data-intensive or computation-intensive according to the computation amount, the data amount, and the algorithm complexity corresponding to the current computing process;
and the task processing module is further configured to allocate the current computing process to the at least one storage-priority computing node when the type of the current computing process is data-intensive, or to allocate the current computing process to the at least one computation-priority computing node when the type of the current computing process is computation-intensive.
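As a reading aid only, and not as part of the claims, the weighting steps A1 to A3 recited in claims 4 and 9 can be restated as the following Python sketch. The dict-based task records and the abstract has_skew predicate are assumptions, since the claims do not fix a particular data-skew criterion.

```python
def apply_weights(tasks, has_skew):
    for t in tasks:
        # A1: initialize the three application-information weights to 1.
        t["w_mem"], t["w_io"], t["w_cpu"] = 1, 1, 1
    for t in tasks:
        # A2: if a target task exhibits data skew, add 1 to all three weights.
        if has_skew(t):
            t["w_mem"] += 1; t["w_io"] += 1; t["w_cpu"] += 1
    for t in tasks:
        # A3: data-intensive tasks get +1 on the memory and IO weights;
        # computation-intensive tasks get +1 on the CPU weight.
        if t["kind"] == "data":
            t["w_mem"] += 1; t["w_io"] += 1
        else:
            t["w_cpu"] += 1
    return tasks

tasks = apply_weights(
    [{"name": "t1", "kind": "data"}, {"name": "t2", "kind": "compute"}],
    has_skew=lambda t: t["name"] == "t1",
)
# t1 (data-intensive, skewed):  w_mem=3, w_io=3, w_cpu=2
# t2 (computation-intensive):   w_mem=1, w_io=1, w_cpu=2
```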
CN201610898038.4A 2016-10-14 2016-10-14 A kind of method for allocating tasks and device Active CN106502791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610898038.4A CN106502791B (en) 2016-10-14 2016-10-14 A kind of method for allocating tasks and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610898038.4A CN106502791B (en) 2016-10-14 2016-10-14 A kind of method for allocating tasks and device

Publications (2)

Publication Number Publication Date
CN106502791A 2017-03-15
CN106502791B CN106502791B (en) 2019-06-25

Family

ID=58293993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610898038.4A Active CN106502791B (en) 2016-10-14 2016-10-14 A kind of method for allocating tasks and device

Country Status (1)

Country Link
CN (1) CN106502791B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819540A (en) * 2009-02-27 2010-09-01 国际商业机器公司 Method and system for scheduling task in cluster
US20140006534A1 (en) * 2012-06-27 2014-01-02 Nilesh K. Jain Method, system, and device for dynamic energy efficient job scheduling in a cloud computing environment
CN105141541A (en) * 2015-09-23 2015-12-09 浪潮(北京)电子信息产业有限公司 Task-based dynamic load balancing scheduling method and device

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254842A (en) * 2017-07-12 2019-01-22 腾讯科技(深圳)有限公司 Method for managing resource, device and the readable storage medium storing program for executing of distributive type system
CN108874520A (en) * 2018-06-06 2018-11-23 成都四方伟业软件股份有限公司 Calculation method and device
CN111083189B (en) * 2018-10-18 2023-04-18 北京京东尚科信息技术有限公司 System and method for processing data skew at runtime
CN111083189A (en) * 2018-10-18 2020-04-28 北京京东尚科信息技术有限公司 System and method for processing data skew at runtime
CN110209496A (en) * 2019-05-20 2019-09-06 中国平安财产保险股份有限公司 Task sharding method, device and sliced service device based on data processing
CN112379935B (en) * 2019-07-29 2024-07-19 南京中兴新软件有限责任公司 Spark performance optimization control method, device, equipment and storage medium
WO2021017701A1 (en) * 2019-07-29 2021-02-04 中兴通讯股份有限公司 Spark performance optimization control method and apparatus, and device and storage medium
CN112379935A (en) * 2019-07-29 2021-02-19 中兴通讯股份有限公司 Spark performance optimization control method, device, equipment and storage medium
CN110659137A (en) * 2019-09-24 2020-01-07 支付宝(杭州)信息技术有限公司 Processing resource allocation method and system for offline tasks
CN110659137B (en) * 2019-09-24 2022-02-08 支付宝(杭州)信息技术有限公司 Processing resource allocation method and system for offline tasks
CN110889492A (en) * 2019-11-25 2020-03-17 北京百度网讯科技有限公司 Method and apparatus for training deep learning models
CN110889492B (en) * 2019-11-25 2022-03-08 北京百度网讯科技有限公司 Method and apparatus for training deep learning models
CN110968895A (en) * 2019-11-29 2020-04-07 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and storage medium
CN110968895B (en) * 2019-11-29 2022-04-05 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and storage medium
CN111078153B (en) * 2019-12-20 2023-08-01 同方知网数字出版技术股份有限公司 Distributed storage method based on file
CN111078153A (en) * 2019-12-20 2020-04-28 同方知网(北京)技术有限公司 Distributed storage method based on files
CN113391914A (en) * 2020-03-11 2021-09-14 上海商汤智能科技有限公司 Task scheduling method and device
CN113391886A (en) * 2020-03-11 2021-09-14 上海商汤智能科技有限公司 Task scheduling method and device
WO2021180092A1 (en) * 2020-03-11 2021-09-16 上海商汤智能科技有限公司 Task dispatching method and apparatus
TWI786564B (en) * 2020-03-11 2022-12-11 大陸商上海商湯智能科技有限公司 Task scheduling method and apparatus, storage media and computer equipment
CN111580990A (en) * 2020-05-08 2020-08-25 中国建设银行股份有限公司 Task scheduling method, scheduling node, centralized configuration server and system
CN113741788A (en) * 2020-05-27 2021-12-03 华为技术有限公司 Method, device and storage medium for distributing operation task amount
CN113778727A (en) * 2020-06-19 2021-12-10 北京沃东天骏信息技术有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN112015554A (en) * 2020-08-27 2020-12-01 郑州阿帕斯数云信息科技有限公司 Task processing method and device
CN112015554B (en) * 2020-08-27 2023-02-28 郑州阿帕斯数云信息科技有限公司 Task processing method and device
CN112948106B (en) * 2020-09-07 2024-05-31 深圳市明源云科技有限公司 Task allocation method and device
CN112948106A (en) * 2020-09-07 2021-06-11 深圳市明源云科技有限公司 Task allocation method and device
CN114462900B (en) * 2022-04-13 2022-07-29 云智慧(北京)科技有限公司 Method, device and equipment for splitting service active node
CN114462900A (en) * 2022-04-13 2022-05-10 云智慧(北京)科技有限公司 Method, device and equipment for splitting service active node
CN115408122B (en) * 2022-08-01 2023-05-23 无锡雪浪数制科技有限公司 Decentralized distributed parallel computing framework and computing method
CN115408122A (en) * 2022-08-01 2022-11-29 无锡雪浪数制科技有限公司 Decentralized distributed parallel computing framework and computing method
CN117151381A (en) * 2023-08-16 2023-12-01 北京航天晨信科技有限责任公司 Emergency resource measuring and calculating method and system for complex concurrency multitasking
CN117407178A (en) * 2023-12-14 2024-01-16 成都凯迪飞研科技有限责任公司 Acceleration sub-card management method and system for self-adaptive load distribution
CN117407178B (en) * 2023-12-14 2024-04-02 成都凯迪飞研科技有限责任公司 Acceleration sub-card management method and system for self-adaptive load distribution

Also Published As

Publication number Publication date
CN106502791B (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN106502791B (en) A kind of method for allocating tasks and device
US10871998B2 (en) Usage instrumented workload scheduling
US9965324B2 (en) Process grouping for improved cache and memory affinity
US9910888B2 (en) Map-reduce job virtualization
US10484236B2 (en) Performance of multi-processor computer systems
US8332862B2 (en) Scheduling ready tasks by generating network flow graph using information receive from root task having affinities between ready task and computers for execution
US9875145B2 (en) Load based dynamic resource sets
JP6233413B2 (en) Task assignment determination device, control method, and program
EP2437168A2 (en) Method and device for balancing load of multiprocessor system
JP2017524202A5 (en)
US10719363B2 (en) Resource claim optimization for containers
CN111708639A (en) Task scheduling system and method, storage medium and electronic device
WO2012154177A1 (en) Varying a characteristic of a job profile relating to map and reduce tasks according to a data size
US9619288B2 (en) Deploying software in a multi-instance node
CN108153594B (en) Resource fragment sorting method of artificial intelligence cloud platform and electronic equipment
Huang et al. Novel heuristic speculative execution strategies in heterogeneous distributed environments
CN114048006A (en) Virtual machine dynamic migration method and device and storage medium
US20200272526A1 (en) Methods and systems for automated scaling of computing clusters
CN111124644B (en) Method, device and system for determining task scheduling resources
Goh et al. Design and performance evaluation of combined first-fit task allocation and migration strategies in mesh multiprocessor systems
Liu et al. High-responsive scheduling with MapReduce performance prediction on hadoop YARN
CN109189581B (en) Job scheduling method and device
JP2019008454A (en) Information processing system and resource allocation method
Ghit et al. Tyrex: Size-based resource allocation in mapreduce frameworks
Wang et al. Improving utilization through dynamic VM resource allocation in hybrid cloud environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant