CN109298897A - System and method for task distribution using resource groups
- Publication number
- CN109298897A (CN201810695002.5A)
- Authority
- CN
- China
- Prior art keywords
- task
- plug
- unit
- execution
- agency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44521—Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5022—Mechanisms to release resources
Abstract
The invention discloses a system and method for task distribution using resource groups. A task distribution system comprises: a request layer, which receives a request submitted by a tenant to execute a task; a decision layer, which receives the request from the request layer and determines where to distribute the task of the request; and an execution layer, which comprises at least one execution agent and at least one resource group, each of the at least one execution agent being bound to one of the at least one resource group. Each execution agent reports the resource usage of the resource group to which it is bound to the decision layer, and the decision layer decides whether to distribute the task to an execution agent based at least on the resource usage reported by each execution agent.
Description
Technical field
The present invention relates to computer-implemented property control, and in particular to task distribution on a computing platform.
Background
Currently, with the growing development of big data technology, a variety of computing frameworks and tools have appeared, such as the distributed computing framework Hadoop, the data warehouse tool Hive, the in-memory distributed computing system Spark, and the column-oriented database HBase. On a multi-tenant big data platform, the traditional way to execute a task requires the submitter to know the type of the task being submitted (for example, a Hive task, a Spark task, or a MapReduce task), to submit the task to the cluster where the corresponding service is deployed, and then to track whether the task succeeds or fails, whether the cluster resources are sufficient, and so on. In this situation, when multiple task flows need to submit tasks, management can become difficult; moreover, when the failure of one task flow causes tasks to block, the execution of other tasks may be affected.
In addition, a big data platform carries a large number and variety of tasks. When executing tasks of different types, data mining personnel often have to switch between different resource environments and plan the use of limited computing resources as a whole. When a task fails, they may not notice in time, or may not find an available alternative environment in which to re-execute the task, so that blocking in the failed environment occupies resources and wastes time.
The prior art teaches some solutions to the above problems. For example, tasks are distributed to a target resource group according to the identifier of a business line, and rules are set on the resource group to limit the resources that the business line may use. As another example, a scanner is used to discover the resource occupancy of the execution nodes at distribution time, and the task is issued to a suitable node for execution according to the resources each node is consuming.
In prior-art solutions, a common approach is to distribute a task to a target resource group according to the task identifier. If the resources of the target resource group are occupied, or a task fails in the target resource group for some reason, the entire task queue becomes blocked.
Prior-art solutions that use a scanner to execute tasks are usually suitable only for single-task systems and are not suited to the multiple task types found in a big data environment. A better way to distribute tasks is therefore urgently needed.
Summary of the invention
One aspect of the present invention is a task distribution system comprising: a request layer, which receives a request submitted by a tenant to execute a task; a decision layer, which receives the request from the request layer and determines where to distribute the task of the request; and an execution layer, which comprises at least one execution agent and at least one resource group, each of the at least one execution agent being bound to one of the at least one resource group. Each execution agent reports the resource usage of the resource group to which it is bound to the decision layer, and the decision layer decides whether to distribute the task to an execution agent based at least on the resource usage reported by each execution agent.
The decision layer maintains information on the resource groups included in the execution layer. The information on a resource group includes information on its execution agents, information on the resources of the environment corresponding to each execution agent, and information on executable permissions. An execution agent of the at least one execution agent can send its own heartbeat to the decision layer so that the decision layer can judge the health of that execution agent. The decision layer can identify the type of the task and the environment and resources the task requires, and select a suitable execution agent for the task. The decision layer can distribute the task to at least two resource groups according to the request for the task. When a distributed task fails during execution, the decision layer can redistribute the task to another execution agent and rerun it.
An execution agent of the at least one execution agent can report its own resource status to the decision layer, for the decision layer to use when selecting an execution agent. An execution agent can perform at least one of start monitoring, stop monitoring, deduplication monitoring, and status monitoring on the tasks distributed to it. When a task distributed to it ends, fails, or stops, an execution agent can report the current state to the decision layer and clean up and release the resources used by the task. An execution agent can execute a task according to the type and related information of the task distributed by the decision layer. An execution agent is embedded in the at least one resource group in the form of a plug-in, and the plug-in can be dynamically loaded or unloaded.
The plug-in is loaded or unloaded by a plug-in monitor. When the plug-in monitor detects a command to unload a plug-in, it checks the plug-in and proceeds as follows: if the plug-in has no running task and no task waiting to run, the plug-in is deleted from the plug-in directory, thereby unloading it; if the plug-in has a running task or a task waiting to run, the user is informed that the plug-in has tasks running or waiting, and is asked whether to close the running tasks or to unload the plug-in after the waiting tasks have executed. The plug-in monitor has at least one of the following functions: monitoring the plug-in directory, setting plug-in rules, and verifying plug-in rules. A plug-in rule defines the functions of the plug-in, the naming convention of the plug-in, and the version specification of the plug-in. When a plug-in is added to the plug-in directory, the plug-in monitor checks the functionality, security, and suitability of the plug-in. When a plug-in is added to the plug-in directory, the plug-in monitor also verifies whether the plug-in implements its functions and whether its name and version conform to the naming convention and version specification in the plug-in rule.
One aspect of the present invention is a method of distributing tasks on a big data platform, wherein the big data platform comprises at least one execution agent and at least one resource group, and each of the at least one execution agent is bound to one of the at least one resource group. The method comprises: (1) receiving a request submitted by a tenant to execute a task; and (2) deciding whether to distribute the task to an execution agent based at least on the resource usage of each of the at least one execution agent.
Step (2) further comprises identifying the type of the task and the environment and resources the task requires, and selecting a suitable execution agent for the task. Step (2) further comprises distributing the task to at least two resource groups according to the request for the task. Step (2) further comprises, when a distributed task fails during execution, redistributing the task to another execution agent and rerunning it.
The method of distributing tasks on a big data platform further comprises: (3) performing, by an execution agent of the at least one execution agent, at least one of start monitoring, stop monitoring, deduplication monitoring, and status monitoring on the tasks distributed to it. The method of distributing tasks on a big data platform further comprises: (3) reporting, by an execution agent of the at least one execution agent, the current state to the decision layer when a task distributed to it ends, fails, or stops, and cleaning up and releasing the resources used by the task.
One aspect of the present invention is a computer-readable medium storing computer-readable instructions which, when executed by a computer, are capable of performing the method of distributing tasks described above.
In one embodiment of the invention, the execution agent can monitor the execution state of a task and report progress to the control node in a timely manner. The control node decides whether the task continues to wait or is actively switched to another execution agent to continue execution. Execution agents can be expanded and upgraded online without affecting task execution, which makes it convenient for users to carry out their business.
One embodiment of the present invention can connect to the resources of different clusters (such as Hadoop clusters) through multiple resource groups. This ensures isolation of task issuing and isolation of execution environments, gives more flexibility in choosing the execution environment of a task, and provides fault tolerance with respect to the environment during task execution.
Brief description of the drawings
Fig. 1 shows a task distribution system according to an embodiment of the present invention.
Figs. 2 to 4 show the process of task distribution according to an embodiment of the present invention.
Detailed description
As shown in Fig. 1, one embodiment of the present invention is a task distribution system comprising: a request layer, a decision layer (including a master node), and an execution layer (including execution agent nodes and clusters). In this disclosure, "execution agent node" and "execution agent" are used interchangeably.
In the system shown in Fig. 1, the master node, acting as the control node, is responsible for receiving, processing, and scheduling the tenant's task execution requests to the platform; for maintaining the resource groups under the tenant's workspace (the execution agent nodes, and the resources and executable permissions of the environment corresponding to each execution agent node); and for deciding which execution agent node a task is issued to. An execution agent node is responsible for reporting the real-time resource status of the environment bound to it and for issuing tasks into that environment for execution. Execution agent nodes can be replaced, expanded, and upgraded without affecting task execution. A task that fails while running under one execution agent node can be switched to another execution agent node at any time, which ensures that tasks complete reliably and stably.
Request layer
A user (also called a tenant) submits a task request to the master node. The task type may be a Hive task, a Spark task, a Flink task, a shell task, and so on, and each kind of task may comprise a series of tasks rather than a single one. When submitting a task, the tenant does not need to be concerned with which cluster the task is submitted to, whether the resources of the cluster are sufficient, or whether a failed task needs to be restarted. The information on a task includes: task type, task content, task parameters, task resource dependencies, and so on. One embodiment of a task is as follows:
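A task request carrying these four kinds of information might be modelled as in the following minimal Python sketch. The class name, field names, and all values are illustrative assumptions and are not taken from the original filing.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRequest:
    """Hypothetical container for the task information listed above."""
    task_type: str                      # e.g. "hive", "spark", "flink", "shell"
    content: str                        # the statement or script to run
    parameters: dict = field(default_factory=dict)
    resource_dependencies: dict = field(default_factory=dict)


# Illustrative request; the tenant names no cluster, only what the task needs.
request = TaskRequest(
    task_type="hive",
    content="SELECT COUNT(*) FROM example_table",
    parameters={"retries": 2},
    resource_dependencies={"cores": 8, "memory_gb": 16},
)
```

Note that the request carries no cluster address, which matches the point that the tenant does not choose where the task runs.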
Decision layer
In the decision layer, the master node serves as the control node of the task distribution system. Its functions include: receiving task requests from tenants; maintaining information on the resource groups under the platform; receiving heartbeats from the resource groups to judge the health of each execution agent node in a resource group; automatically identifying the task type and the environment and resources a task requires, and selecting a suitable execution agent node for the task; issuing the same task to different resource groups at the same time, according to the task request; and, after a task fails, switching to another execution agent node to rerun the task.
Here, "heartbeat" is a figurative term. It refers to a node periodically reporting node status information and task status information to the master node, pulsing at a certain frequency like a heart. Once the master node can no longer receive heartbeats from a node, that is, after the node has been out of contact for a certain time or a certain number of heartbeats, the master node may consider the node unhealthy. The heartbeat serves to: (1) determine whether the node is alive; (2) report the node's own usage of memory, CPU, disk, network, I/O, and so on to the master node; and (3) report the status and progress of the tasks running on the node to the master node.
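The heartbeat check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class names, the 5-second interval, and the threshold of 3 missed heartbeats are hypothetical, since the text only speaks of "a certain time or a certain number".

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    node_id: str
    timestamp: float     # seconds, as reported by the node
    cpu_used: float      # fraction of CPU in use (purpose 2)
    memory_used: float   # fraction of memory in use (purpose 2)
    task_states: dict    # task id -> state or progress (purpose 3)


class HeartbeatMonitor:
    """Hypothetical master-side bookkeeping for node liveness (purpose 1)."""

    def __init__(self, interval_s=5.0, max_missed=3):
        self.interval_s = interval_s   # assumed reporting period
        self.max_missed = max_missed   # assumed tolerance before "unhealthy"
        self.last_seen = {}

    def record(self, hb):
        self.last_seen[hb.node_id] = hb

    def is_healthy(self, node_id, now):
        hb = self.last_seen.get(node_id)
        if hb is None:
            return False               # never heard from this node
        missed = (now - hb.timestamp) / self.interval_s
        return missed < self.max_missed
```

A node whose last heartbeat is older than `interval_s * max_missed` seconds is treated as unhealthy and would be skipped when the master node selects an execution agent.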
According to the needs of the user, the decision layer can issue the same task to different execution agents (or resource groups) at the same time. According to one embodiment of the present invention, when a parent company initiates a task request, the task can be issued simultaneously to the resource groups belonging to multiple subsidiaries for execution. If the resource groups of the subsidiaries are independent of one another, the objects involved in task execution (such as databases) are also separate. Here, the objects involved are the data and environment used when executing the task. When each subsidiary executes the task, the task is the same, but the data, environment, and so on used to execute it belong to that subsidiary. For example, a project team has n members, and each of them has to perform the task "submit a weekly report", but each member writes the report according to his or her own actual situation.
In addition, according to one embodiment of the present invention, suppose a user submits a Hadoop task. The task requires 8 cores and 16 GB of memory, and the Hadoop version apache2.6. At this time, the environments bound to two execution agent nodes each satisfy the processor and memory requirements of the task; one Hadoop environment is the apache2.6 version, and the other is a cdh version. The master node identifies this automatically and issues the task to the execution agent where the apache environment resides.
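The matching in this embodiment can be sketched as a filter over the environments reported by the execution agents. The class, function, and agent names below are hypothetical; only the figures (8 cores, 16 GB, apache2.6 versus cdh) come from the text, and the free capacity of each agent is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class AgentEnv:
    name: str
    free_cores: int
    free_memory_gb: int
    hadoop_version: str


def match_agents(agents, cores, memory_gb, hadoop_version):
    """Return names of agents whose environment satisfies all requirements."""
    return [
        a.name
        for a in agents
        if a.free_cores >= cores
        and a.free_memory_gb >= memory_gb
        and a.hadoop_version == hadoop_version
    ]


agents = [
    AgentEnv("agent-apache", 16, 32, "apache2.6"),  # capacities are assumed
    AgentEnv("agent-cdh", 16, 32, "cdh"),
]
print(match_agents(agents, 8, 16, "apache2.6"))  # ['agent-apache']
```

Both agents have enough processor and memory capacity, so the version check is what singles out the apache environment.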
In addition, according to one embodiment of the present invention, suppose a tenant has 2 resource groups, each resource group has 3 execution agent nodes, and each execution agent node connects to one Hadoop cluster. Suppose the master node receives a task request whose task type is hiveserver2 and which, at the bottom layer, is actually a MapReduce task executed on a Hadoop cluster. The master node first issues a checksource query request to each execution agent and collects each agent's currently available resources and health status. Suppose two nodes are idle and have sufficient resources at this time; the master node randomly selects one of these execution agent nodes and issues the task to it, and that node executes the task.
The resource status includes, for example, the queue space size of the Hadoop cluster, the HDFS directory space size, the number of tasks that can be executed, the number of shuffles in the cluster, and so on. The following is an example:
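A resource report of the kind a checksource query might return, together with the random choice among idle agents, can be sketched as follows. The field names, units, and the selection criteria beyond "idle with sufficient resources" are illustrative assumptions and are not taken from the original filing.

```python
import random
from dataclasses import dataclass


@dataclass
class ResourceReport:
    """Hypothetical reply of one execution agent to a checksource query."""
    agent: str
    healthy: bool
    idle: bool
    queue_space_gb: float      # queue space of the Hadoop cluster
    hdfs_dir_space_gb: float   # HDFS directory space
    executable_tasks: int      # number of tasks that can still be executed
    shuffle_count: int         # shuffles currently in the cluster


def choose_agent(reports, min_queue_gb, rng=random):
    """Pick one healthy, idle agent with enough queue space at random."""
    candidates = [
        r.agent
        for r in reports
        if r.healthy and r.idle
        and r.queue_space_gb >= min_queue_gb
        and r.executable_tasks > 0
    ]
    return rng.choice(candidates) if candidates else None
```

Returning `None` when no agent qualifies leaves the decision of whether to wait or reject the request to the caller.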
Execution layer
A resource group usually contains several execution agent nodes, which can connect to one or more Hadoop cluster environments or other kinds of cluster environments. In a preferred embodiment, an execution agent node connects to only one cluster environment, but a cluster environment can accept multiple execution agent nodes.
The functions of an execution agent node include:
(1) actively reporting heartbeats to the master node, so that the master node can confirm the health of the execution agent node;
(2) reporting its own resource status to the master node, so that when selecting an execution agent node to execute a task the master node can assess the condition of each execution agent node reasonably and select a suitable one;
(3) processing the tasks issued by the master node, which includes executing the corresponding task according to the type and related information of the issued task;
(4) starting, stopping, deduplicating, and monitoring the status of tasks, and obtaining execution results and execution logs;
(5) when a task ends, fails, or stops, reporting the state to the master node and then cleaning up and releasing cluster resources.
Here, deduplication means that when an execution agent node finds that a task distributed to it is identical to a task it is already executing, it sends a warning to the master node; that is, an execution agent node should not execute two identical tasks at the same time.
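The deduplication behaviour can be sketched as a set of running task keys kept by each execution agent. How two tasks are judged "identical" is not specified in the text, so the string key used here is an assumption, as are the class and method names.

```python
class ExecutionAgent:
    """Hypothetical agent that refuses a task identical to a running one."""

    def __init__(self, name):
        self.name = name
        self.running = set()   # keys of tasks currently executing
        self.warnings = []     # stands in for warnings sent to the master node

    def accept(self, task_key):
        if task_key in self.running:
            self.warnings.append(f"duplicate task {task_key} on {self.name}")
            return False
        self.running.add(task_key)
        return True

    def finish(self, task_key):
        # Once a task ends, the same task may be accepted again.
        self.running.discard(task_key)
```

In a real system the warning would travel to the master node, for example with the next heartbeat; here it is only recorded locally.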
In addition, in one embodiment, execution agent nodes are embedded in resource groups in the form of plug-ins, which can be dynamically loaded or unloaded by means such as an API or a plug-in monitor. The functions of the plug-in monitor include: monitoring the plug-in directory, setting plug-in rules, verifying plug-in rules, and loading or unloading plug-ins. These functions are as follows:
(1) Monitoring the plug-in directory: for example, in the ZooKeeper distributed coordination service, the plug-in monitor maintains a plug-in directory and constantly monitors it for changes. When the plug-in directory changes (for example, a plug-in is added to it), the plug-in monitor checks the functionality, security, and suitability of the newly added plug-in.
(2) Setting plug-in rules: a template (that is, a plug-in rule) is set up for a particular type of plug-in. The template stores the functions that plug-ins of that type implement, the naming convention of the plug-in, the version specification, and so on.
(3) Verifying plug-in rules: when a plug-in is added to the plug-in directory, the plug-in monitor verifies the functions the new plug-in implements, and whether its name, version, and so on conform to the naming convention, version specification, and so on in the plug-in rule.
(4) Loading or unloading plug-ins: if a plug-in conforms to the plug-in rule that has been set, the plug-in monitor loads it. After detecting a command to unload a plug-in, the plug-in monitor first checks the plug-in. If the plug-in has no task running and no task waiting to run, the plug-in is deleted from the plug-in directory, thereby unloading it. Otherwise, the user is informed that the plug-in has tasks running or waiting to run, and is asked whether to close the tasks or to unload the plug-in after the tasks have finished executing.
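Functions (3) and (4) above can be sketched as follows. This is a minimal in-memory illustration: the plug-in directory is a plain dictionary rather than a ZooKeeper node, the rule is expressed with regular expressions, and all class names, patterns, and return values are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class PluginRule:
    required_functions: frozenset   # functions a plug-in of this type must implement
    name_pattern: str               # naming convention, as a regex
    version_pattern: str            # version specification, as a regex


@dataclass
class Plugin:
    name: str
    version: str
    functions: frozenset
    running_tasks: int = 0
    waiting_tasks: int = 0


class PluginMonitor:
    def __init__(self, rule):
        self.rule = rule
        self.directory = {}         # stands in for the plug-in directory

    def verify(self, plugin):
        return (
            self.rule.required_functions <= plugin.functions
            and re.fullmatch(self.rule.name_pattern, plugin.name) is not None
            and re.fullmatch(self.rule.version_pattern, plugin.version) is not None
        )

    def load(self, plugin):
        if not self.verify(plugin):
            return False
        self.directory[plugin.name] = plugin
        return True

    def unload(self, name):
        plugin = self.directory[name]
        if plugin.running_tasks == 0 and plugin.waiting_tasks == 0:
            del self.directory[name]
            return "unloaded"
        # The caller should now ask the user whether to close the tasks
        # or to unload after they have finished.
        return "busy"
```

The `"busy"` result is where the prompt to the user would be raised; the sketch leaves that interaction to the caller.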
In this disclosure, because the execution agents operate independently of one another while tasks are issued to a resource group, replacing, expanding, or upgrading one execution agent node does not affect the tasks being executed on other execution agent nodes.
In one embodiment of the invention, suppose tenant A has two resource groups, each containing execution agent nodes: resource group 1 has 3 execution agent nodes, which connect to a Hadoop cluster, and resource group 2 has two execution agent nodes, which connect to a Flink cluster.
There are now three task flows: task flow 1, task flow 2, and task flow 3. Task flow 1 has 10 subtasks, all Hive tasks; task flow 2 has 5 tasks, all Spark tasks; task flow 3 is a Flink task. The two kinds of tasks in task flows 1 and 2 are submitted to resource group 1 at the same time. Node 2 is executing a task at this point. As shown in Fig. 2, when checking resources the master node finds that node 2 has insufficient resources, so task flows 1 and 2 are submitted to node 1 and node 3 respectively.
Task flow 3 serves as a control group and is submitted from the start to an execution agent node in resource group 2. Task flow 3 is created to show that, whether the tasks in resource group 1 fail or succeed (that is, block or execute), task flow 3 is not affected at all. This reflects the environment isolation and resource isolation of task execution between resource groups.
At this point, a Hive task of task flow 1 fails for some reason and the task flow blocks. Task flow 2 continues to execute; the two task flows do not affect each other. The resource status is shown in Fig. 3.
The execution agent node finds that task flow 1 has failed and starts to rerun the tasks. When execution reaches the 4th task, node 2 is found to have finished its task and to have sufficient resources, so the remaining tasks are assigned to node 2 for execution, as shown in Fig. 4. Here, rerunning a task means the following: when the master node detects from the heartbeat that the task state is abnormal, for example that the task is blocked or has failed because of non-human factors (including a Java virtual machine crash and the like), the master node finds a healthy execution agent node for the task again and re-issues the task. Throughout this process, task flow 3 (the Flink task) is unaffected.
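The rerun behaviour can be sketched as a loop that re-issues a failed task to the next healthy execution agent. The exception type, function names, and node names are hypothetical, and the list of agents is assumed to have been filtered for health beforehand.

```python
class TaskFailed(Exception):
    """Raised by the hypothetical runner when a task fails or blocks."""


def run_with_failover(task, agents, run_on):
    """Issue the task to each candidate agent in turn until one succeeds."""
    attempts = []
    for agent in agents:
        attempts.append(agent)
        try:
            result = run_on(agent, task)
        except TaskFailed:
            continue               # switch to the next execution agent
        return agent, result, attempts
    return None, None, attempts


def demo_runner(agent, task):
    # Assumed behaviour for the demo: node-1 always fails, others succeed.
    if agent == "node-1":
        raise TaskFailed(f"{task} failed on {agent}")
    return f"{task} done on {agent}"


agent, result, attempts = run_with_failover(
    "hive-task-4", ["node-1", "node-2"], demo_runner
)
print(agent, attempts)  # node-2 ['node-1', 'node-2']
```

Only the failed task is re-issued; tasks on other agents, such as the Flink task flow in resource group 2, are never touched by this loop.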
In this disclosure, the term "User" denotes a user, and "node" denotes a node.
As needed, the systems, methods, and apparatus of the embodiments of the present invention may be implemented purely in software (for example, a software program written in Java or C++), purely in hardware (for example, a dedicated ASIC chip or FPGA chip), or as a system combining software and hardware (for example, a firmware system storing fixed code, or a system with a general-purpose memory and processor).
Another aspect of the present invention is a computer-readable medium storing computer-readable instructions which, when executed, can carry out the methods of the embodiments of the present invention.
Various embodiments of the present invention have been described above. The description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The scope of the claimed subject matter is defined only by the appended claims.
Claims (22)
1. A task distribution system, comprising:
a request layer, which receives a request submitted by a tenant to execute a task;
a decision layer, which receives the request from the request layer and determines where to distribute the task of the request; and
an execution layer, which comprises at least one execution agent and at least one resource group, wherein each of the at least one execution agent is bound to one of the at least one resource group,
wherein each of the at least one execution agent reports the resource usage of the resource group to which it is bound to the decision layer, and the decision layer decides whether to distribute the task to an execution agent based at least on the resource usage reported by each of the at least one execution agent.
2. The task distribution system according to claim 1, wherein the decision layer maintains information on the resource groups included in the execution layer, the information on a resource group including information on its execution agents, information on the resources of the environment corresponding to each execution agent, and information on executable permissions.
3. The task distribution system according to claim 1, wherein an execution agent of the at least one execution agent can send its own heartbeat to the decision layer, so that the decision layer can judge the health of that execution agent.
4. The task distribution system according to claim 1, wherein the decision layer can identify the type of the task and the environment and resources the task requires, and select a suitable execution agent for the task.
5. The task distribution system according to claim 1, wherein the decision layer can distribute the task to at least two resource groups according to the request for the task.
6. The task distribution system according to claim 1, wherein, when a distributed task fails during execution, the decision layer can redistribute the task to another execution agent and rerun the task.
7. The task distribution system according to claim 1, wherein an execution agent of the at least one execution agent can report its own resource status to the decision layer, for the decision layer to use when selecting an execution agent.
8. The task distribution system according to claim 1, wherein an execution agent of the at least one execution agent can perform at least one of start monitoring, stop monitoring, deduplication monitoring, and status monitoring on the tasks distributed to it.
9. The task distribution system according to claim 1, wherein, when a task distributed to it ends, fails, or stops, an execution agent of the at least one execution agent can report the current state to the decision layer and clean up and release the resources used by the task.
10. The task distribution system according to claim 1, wherein an execution agent of the at least one execution agent can execute a task according to the type and related information of the task distributed by the decision layer.
11. The task distribution system according to claim 1, wherein an execution agent of the at least one execution agent is embedded in the at least one resource group in the form of a plug-in, and the plug-in can be dynamically loaded or unloaded.
12. The task distribution system according to claim 11, wherein the plug-in is loaded or unloaded by a plug-in monitor, and when the plug-in monitor detects a command to unload a plug-in, the plug-in monitor checks the plug-in and proceeds as follows:
If the plug-in has no task currently running and no task waiting to run, the plug-in is deleted from the plug-in directory, so that the plug-in is unloaded.
If the plug-in has a task currently running or a task waiting to run, the user is notified that the plug-in has a task running or waiting to run, and is asked whether to close the running task or to unload the plug-in only after the task waiting to run has finished executing.
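The two branches of claim 12 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `Plugin`, `UnloadDecision`, and `handle_unload` are hypothetical names, and the plug-in directory is modelled as an in-memory dictionary rather than a file-system directory.

```python
from dataclasses import dataclass, field
from enum import Enum


class UnloadDecision(Enum):
    UNLOADED = "unloaded"
    NEEDS_USER_CHOICE = "needs_user_choice"


@dataclass
class Plugin:
    name: str
    running_tasks: list = field(default_factory=list)
    waiting_tasks: list = field(default_factory=list)


def handle_unload(plugin_dir: dict, name: str) -> UnloadDecision:
    """Apply the two branches of claim 12 to one unload command."""
    plugin = plugin_dir[name]
    if not plugin.running_tasks and not plugin.waiting_tasks:
        # Idle plug-in: deleting it from the directory unloads it.
        del plugin_dir[name]
        return UnloadDecision.UNLOADED
    # Busy plug-in: leave it in place so the caller can prompt the user.
    return UnloadDecision.NEEDS_USER_CHOICE
```

Returning a decision value instead of prompting directly keeps the user interaction outside the monitor logic; that separation is a design choice of this sketch, not something the claim prescribes.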
13. The task distribution system according to claim 11, wherein the plug-in is loaded or unloaded by a plug-in monitor, and the plug-in monitor has at least one of a function of monitoring the plug-in directory, a function of setting plug-in rules, and a function of verifying plug-in rules, the plug-in rules defining the function of the plug-in, the naming convention of the plug-in, and the version specification of the plug-in.
14. The task distribution system according to claim 13, wherein when a plug-in is newly added to the plug-in directory, the plug-in monitor checks the functionality, security, and suitability of the plug-in.
15. The task distribution system according to claim 13, wherein when a plug-in is newly added to the plug-in directory, the plug-in monitor verifies whether the plug-in implements its function, and whether the name and version of the plug-in conform to the naming convention and version specification in the plug-in rules.
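The verification in claim 15 might look like the Python sketch below. The concrete rules are assumptions made only for illustration: the claims do not say which function a plug-in must implement, what the naming convention is, or how versions are written, so the `execute` method name and both regular expressions are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginRule:
    required_method: str = "execute"              # assumed function name
    name_pattern: str = r"[a-z][a-z0-9_]*_agent"  # assumed naming convention
    version_pattern: str = r"\d+\.\d+\.\d+"       # assumed version specification


def verify_plugin(plugin: object, name: str, version: str, rule: PluginRule) -> list:
    """Return a list of rule violations; an empty list means the plug-in passes."""
    problems = []
    if not callable(getattr(plugin, rule.required_method, None)):
        problems.append("missing function")
    if not re.fullmatch(rule.name_pattern, name):
        problems.append("bad name")
    if not re.fullmatch(rule.version_pattern, version):
        problems.append("bad version")
    return problems
```

Collecting all violations rather than stopping at the first lets a monitor show a newly added plug-in's author every problem at once.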
16. A method of distributing tasks in a big data platform, wherein the big data platform comprises at least one execution agent and at least one resource group, each execution agent of the at least one execution agent being bound to one resource group of the at least one resource group, the method comprising:
(1) receiving a request, submitted by a tenant, to execute a task; and
(2) deciding, based at least on the resource usage of each execution agent of the at least one execution agent, whether to schedule the task to that execution agent.
17. The method of distributing tasks in a big data platform according to claim 16, wherein step (2) further comprises identifying the type of the task and the environment and resources required by the task, and selecting a suitable execution agent for the task.
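Steps (1) and (2), together with the selection described in claim 17, can be sketched in Python as follows. This is a sketch under assumptions, not the claimed method itself: the fields of `AgentStatus` and `Task` are hypothetical, and the tie-breaking policy (prefer the agent with the most free CPU, then the most free memory) is chosen only for illustration, since the claims require only that resource usage and task type inform the decision.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AgentStatus:
    agent_id: str
    resource_group: str          # each agent is bound to exactly one resource group
    supported_types: frozenset   # task types this agent's environment can run
    free_cpu: int
    free_mem_mb: int


@dataclass
class Task:
    task_type: str
    cpu: int
    mem_mb: int


def select_agent(task: Task, agents: Iterable[AgentStatus]) -> Optional[AgentStatus]:
    """Pick an agent that supports the task type and has enough free resources."""
    candidates = [
        a for a in agents
        if task.task_type in a.supported_types
        and a.free_cpu >= task.cpu
        and a.free_mem_mb >= task.mem_mb
    ]
    if not candidates:
        return None  # no suitable agent: the task is not scheduled
    # Assumed policy: prefer the least loaded candidate.
    return max(candidates, key=lambda a: (a.free_cpu, a.free_mem_mb))
```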
18. the method for the distributed tasks according to claim 16 in big data platform, wherein the step (2) is also wrapped
The task is distributed at least two resource groups by the request for including the task.
19. the method for the distributed tasks according to claim 16 in big data platform, wherein the step (2) is also wrapped
Include one be distributed and execute mission failure when, which is re-distributed to another execution and acts on behalf of and reruns this
Business.
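The redistribution in claim 19 can be sketched in Python as a simple failover loop. The function name `dispatch_with_failover` and the `run` callback (which returns `True` on success) are hypothetical; how candidate agents are ordered and how failure is detected are not specified by the claim and are assumed here.

```python
from typing import Callable, Iterable, Optional


def dispatch_with_failover(
    task: object,
    agents: Iterable[str],
    run: Callable[[str, object], bool],
) -> Optional[str]:
    """Try agents in order; return the id of the first agent that succeeds."""
    for agent in agents:
        if run(agent, task):
            return agent
        # This agent failed: fall through and re-run the task on the next one.
    return None  # every candidate agent failed
```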
20. the method for the distributed tasks according to claim 16 in big data platform, further includes:
(3) by it is described at least one execute executions in agency act on behalf of to being distributed to thereon for task carry out starting monitoring,
Cease listening for, duplicate removal monitor and status monitoring at least one of.
21. the method for the distributed tasks according to claim 16 in big data platform, further includes:
(3) by executing agency at least one execution agency, terminate, fail or stop in the task thereon that is distributed to
In the case where only, current state is reported to the decision-making level, and clear up and discharge resource used in the task.
22. a kind of computer-readable medium is stored thereon with computer-readable instruction, the computer-readable instruction is by computer
The method as described in one of any in claim 16-21 is able to carry out when execution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810695002.5A CN109298897A (en) | 2018-06-29 | 2018-06-29 | A kind of system and method that the task using resource group is distributed |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810695002.5A CN109298897A (en) | 2018-06-29 | 2018-06-29 | A kind of system and method that the task using resource group is distributed |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109298897A true CN109298897A (en) | 2019-02-01 |
Family
ID=65168282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810695002.5A Pending CN109298897A (en) | 2018-06-29 | 2018-06-29 | A kind of system and method that the task using resource group is distributed |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109298897A (en) |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8903981B2 (en) * | 2008-05-05 | 2014-12-02 | International Business Machines Corporation | Method and system for achieving better efficiency in a client grid using node resource usage and tracking |
CN101715001A (en) * | 2009-10-21 | 2010-05-26 | 南京邮电大学 | Method for controlling execution of grid task |
CN103685066A (en) * | 2012-09-18 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | Dynamic distributed scheduling method and system |
US8706798B1 (en) * | 2013-06-28 | 2014-04-22 | Pepperdata, Inc. | Systems, methods, and devices for dynamic resource monitoring and allocation in a cluster system |
CN103699443A (en) * | 2013-12-16 | 2014-04-02 | 北京神州绿盟信息安全科技股份有限公司 | Task distributing method and scanner |
CN103986766A (en) * | 2014-05-19 | 2014-08-13 | 中国工商银行股份有限公司 | Self-adaptation load balancing job task scheduling method and device |
CN104317658A (en) * | 2014-10-17 | 2015-01-28 | 华中科技大学 | MapReduce based load self-adaptive task scheduling method |
US20160266930A1 (en) * | 2015-03-11 | 2016-09-15 | Accenture Global Services Limited | Queuing tasks in a computer system |
CN105653365A (en) * | 2016-02-22 | 2016-06-08 | 青岛海尔智能家电科技有限公司 | Task processing method and device |
US20170262319A1 (en) * | 2016-03-11 | 2017-09-14 | Chris Newburn | Task mapping for heterogeneous platforms |
US9886328B2 (en) * | 2016-03-11 | 2018-02-06 | Intel Corporation | Flexible binding of tasks to target resources |
KR20170116439A (en) * | 2016-04-11 | 2017-10-19 | 한국전자통신연구원 | Apparatus for scheduling task |
CN106888256A (en) * | 2017-02-21 | 2017-06-23 | 广州神马移动信息科技有限公司 | Distributed monitoring system and its monitoring and dispatching method and device |
CN106933662A (en) * | 2017-03-03 | 2017-07-07 | 广东神马搜索科技有限公司 | Distributed system and its dispatching method and dispatching device |
CN107621978A (en) * | 2017-09-29 | 2018-01-23 | 郑州云海信息技术有限公司 | A kind of High Availabitity task processing Controlling model under parallel computation environment |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459666A (en) * | 2020-03-26 | 2020-07-28 | 北京金山云网络技术有限公司 | Task dispatching method and device, task execution system and server |
CN112910796A (en) * | 2021-01-27 | 2021-06-04 | 北京百度网讯科技有限公司 | Traffic management method, apparatus, device, storage medium, and program product |
CN112910796B (en) * | 2021-01-27 | 2022-12-16 | 北京百度网讯科技有限公司 | Traffic management method, apparatus, device, storage medium, and program product |
CN113032141A (en) * | 2021-02-10 | 2021-06-25 | 山东英信计算机技术有限公司 | AI platform resource switching method, system and medium |
CN113032141B (en) * | 2021-02-10 | 2022-09-20 | 山东英信计算机技术有限公司 | AI platform resource switching method, system and medium |
CN113806097A (en) * | 2021-09-29 | 2021-12-17 | 杭州网易云音乐科技有限公司 | Data processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109298897A (en) | A kind of system and method that the task using resource group is distributed | |
US10423451B2 (en) | Opportunistically scheduling and adjusting time slices | |
JP6190389B2 (en) | Method and system for performing computations in a distributed computing environment | |
JP6132766B2 (en) | Controlled automatic recovery of data center services | |
US8549536B2 (en) | Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers | |
JP6599439B2 (en) | Equal sharing of system resources in workflow execution | |
Abd Latiff | A checkpointed league championship algorithm-based cloud scheduling scheme with secure fault tolerance responsiveness | |
US20120023209A1 (en) | Method and apparatus for scalable automated cluster control based on service level objectives to support applications requiring continuous availability | |
US8381016B2 (en) | Fault tolerance for map/reduce computing | |
KR20140101358A (en) | Increasing availability of stateful applications | |
CN110134505A (en) | A kind of distributed computing method of group system, system and medium | |
CN113886089B (en) | Task processing method, device, system, equipment and medium | |
CN103744734A (en) | Method, device and system for task operation processing | |
Kurazumi et al. | Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce | |
Gokhroo et al. | Detecting and mitigating faults in cloud computing environment | |
US10733554B2 (en) | Information processing apparatus and method for managing connections | |
CN111190691A (en) | Automatic migration method, system, device and storage medium suitable for virtual machine | |
CN110727508A (en) | Task scheduling system and scheduling method | |
CN111831424B (en) | Task processing method, system and device | |
GB2514585A (en) | Task scheduler | |
US20220413941A1 (en) | Computing clusters | |
US10474544B1 (en) | Distributed monitoring agents for cluster execution of jobs | |
US20180341519A1 (en) | Node-local-unscheduler for scheduling remediation | |
EP3811227B1 (en) | Methods, devices and systems for non-disruptive upgrades to a distributed coordination engine in a distributed computing environment | |
US10324758B1 (en) | Read load task throttling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190201 |