CN106933662A - Distributed system and its dispatching method and dispatching device - Google Patents
- Publication number: CN106933662A
- Application number: CN201710126701.3A
- Authority: CN (China)
- Prior art keywords: node, service node, task, distributed system, service
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
Abstract
The invention discloses a distributed system, together with a dispatching method and a dispatching device for it. The distributed system includes a scheduling node, which issues tasks to multiple service nodes, and the multiple service nodes, which run the tasks. Each service node may periodically send current resource occupancy information and current task queue information to the scheduling node. Based on this information, the scheduling node may judge whether the service node is available and may suspend sending new tasks to any service node judged unavailable. The scheduling node can thus schedule resources among the available service nodes and avoid distributing new tasks to unavailable service nodes, which prevents new tasks from being left pending.
Description
Technical field
The present invention relates to the field of distributed technology, and more particularly to a distributed system and its dispatching method and dispatching device.
Background
As data volumes continue to grow, an ordinary single-machine system can no longer handle big-data processing tasks. Distributed systems composed of many machines have therefore become the current mainstream solution. A core problem in a distributed system is resource scheduling. In Hadoop (a distributed system architecture developed by the Apache Software Foundation), for example, a central scheduling system assigns the working nodes responsible for executing specific processing tasks to the machines of the cluster.
The running environment of each machine in a distributed system is complex. How to schedule the machines so that resources are still scheduled normally when an individual machine fails is therefore a major problem currently facing distributed systems.
Summary of the invention
A main object of the present invention is to provide a distributed system and its dispatching method and dispatching device that keep resource scheduling normal when an individual machine in the distributed system fails.
According to one aspect of the invention, a distributed system is provided that includes a scheduling node, which issues tasks to multiple service nodes, and the multiple service nodes, which run the tasks, wherein: each service node periodically sends current resource occupancy information and current task queue information to the scheduling node; the scheduling node judges, according to the current resource occupancy information and the current task queue information, whether the service node is available; and the scheduling node suspends sending new tasks to any service node judged unavailable.
The scheduling node can thus schedule resources among the available service nodes and avoid distributing new tasks to unavailable service nodes, which prevents new tasks from being left pending.
Preferably, the scheduling node deletes a specific service node from the distributed system when it has received no information from that node for a first predetermined time. Waiting for a predetermined time before judging a node dead avoids the system jitter caused by brief node anomalies. In addition, combining the deletion of lost nodes with the deactivation of busy nodes makes it possible, while guaranteeing resource availability, to use superior resources preferentially and eliminate inferior ones, which effectively improves system efficiency and prevents performance problems.
Preferably, the scheduling node enables a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.
Enabling new service nodes thus ensures that the distributed system can provide enough resources to process tasks.
Preferably, the scheduling node stops judging service nodes to be unavailable while the number of unavailable service nodes is higher than a third threshold.
Thus, once the number of unavailable service nodes reaches this upper limit, no further service node is judged unavailable, which ensures that the distributed system can still provide enough resources to process tasks.
Preferably, the scheduling node judges a service node to be unavailable when its current resource occupancy information is higher than a fourth threshold and its current task queue information is higher than a fifth threshold.
Using both the current resource occupancy information and the current task queue information allows the unavailability of a service node to be judged more comprehensively and accurately. On the one hand, this relieves the pressure on that service node; on the other hand, new tasks can be assigned to other service nodes that are suited to executing them, so that the new tasks are executed quickly.
Preferably, the scheduling node sets an unavailable service node to available again in at least one of the following cases: the current resource occupancy information is not higher than the fourth threshold and/or the current task queue information is not higher than the fifth threshold; or the service node has been set as unavailable for a second predetermined time.
Thus, a service node judged unavailable can be put back into use in the system once it meets the condition.
According to another aspect of the invention, a dispatching method for a distributed system is also provided. The distributed system includes multiple service nodes for running tasks, and the method includes: periodically obtaining the current resource occupancy information and current task queue information of a service node; judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and suspending the sending of new tasks to any service node judged unavailable.
Preferably, the dispatching method may also include: deleting a specific service node from the distributed system when no information has been received from that node for a first predetermined time.
Preferably, the dispatching method may also include: enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.
Preferably, the dispatching method may also include: ceasing to judge service nodes to be unavailable while the number of unavailable service nodes is higher than a third threshold.
According to a further aspect of the invention, a dispatching device for a distributed system is also provided. The distributed system includes multiple service nodes for running tasks. The device issues tasks to the multiple service nodes and includes: an information acquisition unit for periodically obtaining the current resource occupancy information and current task queue information of a service node; an available node judging unit for judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and a scheduling adjustment unit for suspending the sending of new tasks to any service node judged unavailable.
Preferably, the dispatching device may also include: a node deletion unit for deleting a specific service node from the distributed system when no information has been received from that node for a first predetermined time.
Preferably, the dispatching device may also include: a new node enabling unit for enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.
The distributed system of the invention and its dispatching method and dispatching device use the current resource occupancy information and current task queue information sent by a service node to judge whether that service node is available and, when it is judged unavailable, can suspend the sending of new tasks to it. The scheduling node can thus schedule resources among the available service nodes and avoid distributing new tasks to unavailable service nodes, which prevents new tasks from being left pending.
Brief description of the drawings
The above and other objects, features and advantages of the disclosure will become more apparent from the more detailed description of exemplary embodiments of the disclosure given with reference to the accompanying drawings, in which identical reference numerals generally denote identical parts.
Fig. 1 shows a functional block diagram of a distributed system according to an embodiment of the invention.
Fig. 2 shows a schematic flowchart of the dispatching method of a distributed system according to an embodiment of the invention.
Fig. 3 shows a functional block diagram of the dispatching device of a distributed system according to an embodiment of the invention.
Fig. 4 shows a state transition diagram of a service node in a specific embodiment of the invention.
Detailed description
Preferred embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys its scope to those skilled in the art.
Embodiments of the invention are described in detail below with reference to Fig. 1 to Fig. 4. Fig. 1 shows a functional block diagram of a distributed system 100 according to an embodiment of the invention.
As shown in Fig. 1, the distributed system 100 includes a scheduling node 110 and one or more service nodes 120. The scheduling node 110 can distribute tasks among the multiple service nodes 120 by sending scheduling instructions.
The scheduling node 110 and the service nodes 120 can be deployed on servers. In a preferred embodiment, different service nodes 120 are deployed on different servers, and the scheduling node 110 is deployed either on a separate server of its own or on the same server as one of the service nodes 120. The lines in the drawing indicate that information is exchanged between the scheduling node 110 and the service nodes 120; each line may be a wired connection, a wireless connection, or any other form of connection capable of carrying information.
A service node 120 can periodically send information to the scheduling node 110. This information proves that the service node is alive, that is, normally connected to the system, and can therefore be called "heartbeat information". The information sent can also include information indicating the current usage of the service node. This can be called "health status" information and characterizes the health of the specific node. In one embodiment, the health status information is or includes current resource occupancy information and current task queue information.
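To make the content of such a report concrete, the following is a minimal sketch of a heartbeat message that also carries the two health figures. The patent does not specify a wire format; the class name `Heartbeat`, the field names, and the use of JSON are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Heartbeat:
    """One periodic report from a service node (all field names are hypothetical)."""
    node_id: str
    resource_occupancy: float  # assumed to be an iowait-style fraction in [0.0, 1.0]
    queue_length: int          # number of tasks the node has not yet completed
    sent_at: float             # sender timestamp, seconds since the epoch


def encode(hb: Heartbeat) -> str:
    """Serialize a heartbeat for transmission to the scheduling node."""
    return json.dumps(asdict(hb), sort_keys=True)


def decode(raw: str) -> Heartbeat:
    """Rebuild the heartbeat on the scheduling node side."""
    return Heartbeat(**json.loads(raw))


report = Heartbeat("node-1", resource_occupancy=0.35, queue_length=4, sent_at=1700000000.0)
received = decode(encode(report))
```

Sending liveness and health in one message means the arrival of the message itself serves as the heartbeat, while its payload feeds the availability judgment.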
The scheduling node 110 can judge whether a service node 120 is available according to the health status information received from it, such as the current resource occupancy information and the current task queue information. When the "health status" of a service node 120 is poor, for example when its current resource occupancy information and current task queue information are too high and indicate that the node is currently too busy, the scheduling node 110 can suspend sending new tasks to the service node 120 judged unavailable.
The specific processing performed by the scheduling node 110 is shown in Fig. 2, which is a schematic flowchart of the dispatching method of a distributed system according to an embodiment of the invention.
Referring to Fig. 2, method 200 starts at step S210, in which the health status information of a service node, such as its current resource occupancy information and current task queue information, is obtained periodically. The current resource occupancy information here can include information on the CPU and memory occupied by the tasks that the service node 120 is currently running. CPU and memory occupancy can be indicated, for example, by iowait status information. The current task queue information can include the number of tasks that the service node 120 has not yet completed.
The scheduling node 110 can judge whether the service node 120 is available according to the obtained health status information (step S220), and can suspend sending new tasks to a service node 120 judged unavailable (step S230).
Specifically, if the current resource occupancy information of a service node 120 is too high, the resources that the service node 120 can currently provide for running new tasks are limited; if its current task queue information is too high, the number of tasks it currently has to process is excessive and it is no longer suited to executing new tasks. The scheduling node 110 can therefore judge a service node 120 whose current resource occupancy information is higher than the fourth threshold and whose current task queue information is higher than the fifth threshold to be an unavailable service node, and suspend sending new tasks to it.
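The judgment just described can be sketched as a small predicate plus a filter over the latest reports. This is only an illustration under stated assumptions: the threshold values, the function names `is_unavailable` and `dispatch_targets`, and the shape of the `reports` mapping are hypothetical, and the patent leaves the actual values to be set according to circumstances.

```python
from typing import Dict, List, Tuple

RESOURCE_THRESHOLD = 0.8  # stands in for the "fourth threshold" (illustrative value)
QUEUE_THRESHOLD = 100     # stands in for the "fifth threshold" (illustrative value)


def is_unavailable(resource_occupancy: float, queue_length: int) -> bool:
    """Unavailable only when BOTH figures exceed their thresholds."""
    return resource_occupancy > RESOURCE_THRESHOLD and queue_length > QUEUE_THRESHOLD


def dispatch_targets(reports: Dict[str, Tuple[float, int]]) -> List[str]:
    """Return the nodes that may still receive new tasks, in a stable order.

    `reports` maps a node id to its latest (resource occupancy, queue length).
    """
    return sorted(
        node for node, (occupancy, queue) in reports.items()
        if not is_unavailable(occupancy, queue)
    )


latest = {"a": (0.9, 150), "b": (0.9, 10), "c": (0.2, 150)}
targets = dispatch_targets(latest)
```

Here only node `a` exceeds both thresholds, so new tasks would go to `b` and `c`. Requiring both conditions keeps a node in service when it is busy on one dimension only, for example a long queue of cheap tasks.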
It should be noted that the specific value of each threshold mentioned here and below can be set according to actual conditions, and that the ordinals "fourth" and "fifth" used here, like "first", "second" and "third" used below, serve only to distinguish the thresholds and should not be construed as limiting the invention in any sense.
Excessively high current resource occupancy on a service node may be caused by an aging hard disk on the machine hosting the node, or by the tasks the node is running occupying a large amount of resources. In the former case, the service node can be eliminated, or it can be repaired through manual intervention. In the latter case, the service node can continue to run the tasks in its current task queue, or it can be stopped and the tasks in its current task queue transferred to other service nodes or to newly enabled service nodes.
For a service node 120 judged unavailable, the scheduling node 110 can set it to available again once it satisfies certain conditions.
For example, the scheduling node 110 can reset a service node 120 previously judged unavailable to available when its current resource occupancy information is not higher than the fourth threshold and/or its current task queue information is not higher than the fifth threshold. The scheduling node 110 can also set a service node 120 to available again once it has been set as unavailable for the second predetermined time. The scheduling node 110 can issue new tasks to a service node 120 that has been set to available again. The thresholds and the predetermined time can be set flexibly according to the overall condition of the system.
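The two recovery conditions can be sketched as a single check. The function name, parameter names, and default values below are hypothetical, and the "and/or" in the text is read here as an inclusive "or", which is one possible interpretation rather than the only one.

```python
def should_restore(
    resource_occupancy: float,
    queue_length: int,
    unavailable_for: float,
    resource_threshold: float = 0.8,           # "fourth threshold" (illustrative value)
    queue_threshold: int = 100,                # "fifth threshold" (illustrative value)
    second_predetermined_time: float = 300.0,  # seconds (illustrative value)
) -> bool:
    """Decide whether an unavailable node may be set to available again."""
    # Condition 1: at least one load figure is no longer above its threshold.
    load_dropped = (
        resource_occupancy <= resource_threshold
        or queue_length <= queue_threshold
    )
    # Condition 2: the node has been unavailable for long enough.
    timed_out = unavailable_for >= second_predetermined_time
    return load_dropped or timed_out
```

Under this reading, the load condition is the exact complement of the rule that marks a node unavailable, while the timeout acts as a backstop for nodes whose reported load never drops.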
In a preferred embodiment of the invention, a service node 120 is set to available again once it has been set as unavailable for the second predetermined time. Thus, when multiple service nodes 120 in the distributed system 100 have been judged unavailable, each of them is automatically set back to an available service node after it has been unavailable for the second predetermined time. This gives the distributed system 100 a degree of self-healing capability and also provides time for manual intervention on the unavailable service nodes.
The above, with reference to Fig. 2, has described in detail how the scheduling node 110 in the distributed system 100 judges a service node 120 to be unavailable according to the current resource occupancy information and current task queue information sent by that service node 120, and how it sets an unavailable service node 120 to available again.
The machines hosting service nodes in a distributed system frequently suffer abnormal conditions such as network outages or machine faults. In the prior art, if the scheduling node periodically receives heartbeat information from a service node, it determines that the node is "alive". As soon as the scheduling node fails to receive information from a service node, it immediately judges the node "dead". This approach does not cope well with temporary system or network jitter. For example, when the network of a cluster suffers an extremely brief interruption, communication may recover immediately, yet the scheduling node 110, having received no "heartbeat" information from the service nodes of that cluster during one period, judges a large number of service nodes in the cluster dead and deletes them from the system, which affects the overall capability of the system. Although a node judged dead and deleted can later be reconnected to the system through a series of cold or hot operations or through manual intervention, this process is complicated and time-consuming and reduces the robustness and usability of the system.
To address this, in the distributed system 100 of the invention, the scheduling node 110 can add an "unknown" state to its judgment of the liveness of a service node 120. That is, when the heartbeat of a service node is lost, the node is not immediately judged "dead". Instead, the node is set to the "unknown" state and timing starts. If communication recovers within a predetermined time period, the node state is set back to "alive". If no information has been received from the node within the predetermined time period, the node can then be judged dead. This prevents a service node from being deleted by the system merely because a fault such as a network problem temporarily stops it from communicating with the scheduling node 110, which avoids brief cluster anomalies and reduces system jitter.
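The timing behaviour described above can be sketched as a small per-node tracker. This is an illustrative reading, not the patented implementation: the class and method names, the explicit `now` arguments, and the choice to start the timer at the moment the loss is detected are all assumptions.

```python
import enum
from typing import Optional


class Liveness(enum.Enum):
    ALIVE = "alive"
    UNKNOWN = "unknown"
    DEAD = "dead"


class LivenessTracker:
    """Tracks the heartbeat state of one service node on the scheduling node."""

    def __init__(self, now: float, heartbeat_period: float, dead_after: float) -> None:
        self.heartbeat_period = heartbeat_period  # expected interval between heartbeats
        self.dead_after = dead_after              # the "predetermined time period"
        self.last_seen: Optional[float] = None
        self.unknown_since: Optional[float] = now  # a new node starts out unknown
        self.state = Liveness.UNKNOWN

    def on_heartbeat(self, now: float) -> None:
        """A heartbeat arrived: the node is alive unless it was already deleted."""
        if self.state is Liveness.DEAD:
            return  # a deleted node has to be re-admitted by other means
        self.last_seen = now
        self.unknown_since = None
        self.state = Liveness.ALIVE

    def tick(self, now: float) -> Liveness:
        """Called periodically by the scheduler to age the state."""
        if self.state is Liveness.ALIVE and now - self.last_seen > self.heartbeat_period:
            self.state = Liveness.UNKNOWN  # heartbeat lost: start timing, do not delete yet
            self.unknown_since = now
        if self.state is Liveness.UNKNOWN and now - self.unknown_since >= self.dead_after:
            self.state = Liveness.DEAD     # no recovery within the predetermined period
        return self.state
```

Passing `now` explicitly rather than reading the clock inside the class keeps the sketch deterministic, so a brief outage followed by recovery can be replayed step by step.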
For a service node 120 judged dead, the scheduling node 110 can delete it from the distributed system 100 and no longer schedule resources from it. "Deletion" here mainly means deletion at the software level, that is, the system no longer receives information from the node. After a "dead" node resumes normal operation, it can be reconnected to the distributed system through a series of cold or hot operations or through manual intervention. The scheduling node 110 can thus maintain the service nodes 120 in the distributed system 100 in a way that keeps the service nodes in the system available while avoiding the system fluctuation caused by brief faults, so that service nodes 120 with problems are removed from the distributed system 100 in time and tasks can be scheduled among the normal service nodes 120.
In one embodiment, to ensure that the distributed system 100 can still provide enough resources to process tasks when service nodes are deleted and/or judged unavailable, the scheduling node 110 can enable one new service node each time it judges a service node unavailable or deletes a service node. The new service node enabled, here and below, may be a service node that was previously judged unavailable or deleted and has since been recovered.
In addition, the scheduling node 110 can enable a new service node when the number of unavailable service nodes reaches the first threshold, or when the number of deleted service nodes reaches the second threshold, or when both conditions are met at the same time. The first threshold and the second threshold can each be a natural number greater than or equal to 1.
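The three trigger variants can be expressed as one hedged sketch. The function name and the `require_both` flag are hypothetical; the flag simply selects between the "either condition" variants and the "both conditions at once" variant mentioned in the text.

```python
def should_enable_new_node(
    unavailable_count: int,
    deleted_count: int,
    first_threshold: int = 1,
    second_threshold: int = 1,
    require_both: bool = False,
) -> bool:
    """Decide whether the scheduler should bring a new service node into use."""
    if first_threshold < 1 or second_threshold < 1:
        # The text describes both thresholds as natural numbers >= 1.
        raise ValueError("thresholds must be natural numbers >= 1")
    hit_first = unavailable_count >= first_threshold
    hit_second = deleted_count >= second_threshold
    if require_both:
        return hit_first and hit_second
    return hit_first or hit_second
```

With both thresholds left at 1 and `require_both` false, this reduces to the one-for-one replacement policy of the preceding embodiment, in which every unavailable or deleted node triggers a new one.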
Further, the scheduling node 110 can stop judging service nodes to be unavailable while the number of unavailable service nodes is higher than the third threshold.
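A minimal sketch of this cap follows, assuming the scheduler keeps the ids of unavailable nodes in a set. The function name and the set-based bookkeeping are hypothetical, and the comparison follows the "higher than" wording literally.

```python
from typing import Set


def try_mark_unavailable(node_id: str, unavailable: Set[str], third_threshold: int) -> bool:
    """Mark a node unavailable unless the cap is in force.

    Returns True if the node is (now or already) marked unavailable, and
    False if the judgment was skipped because too many nodes are out.
    """
    if node_id in unavailable:
        return True
    if len(unavailable) > third_threshold:
        return False  # judging is paused: the node stays in service
    unavailable.add(node_id)
    return True


out_of_service: Set[str] = set()
results = [try_mark_unavailable(n, out_of_service, third_threshold=1) for n in ("a", "b", "c")]
```

In this run `a` and `b` are marked, and `c` is kept in service because the count already exceeds the threshold. The cap trades some per-node protection for guaranteed capacity: an overloaded node that keeps receiving tasks is preferable to a system with no node left to dispatch to.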
Fig. 3 shows a functional block diagram of the dispatching device of a distributed system according to an embodiment of the invention. The functional modules of the dispatching device 300 can be implemented by hardware, software, or a combination of hardware and software that realizes the principle of the invention. Those skilled in the art will understand that the functional modules described in Fig. 3 can be combined or divided into sub-modules to realize the principle of the invention. The description here therefore supports any possible combination, division, or further definition of the functional modules described.
The dispatching device 300 shown in Fig. 3 can be used to implement the dispatching method shown in Fig. 2. Only the functional modules that the dispatching device 300 can have, and the operations each module can perform, are described briefly below; for the details involved, refer to the description given above with reference to Fig. 1 and Fig. 2, which is not repeated here.
Referring to Fig. 3, the dispatching device 300 includes an information acquisition unit 310, an available node judging unit 320 and a scheduling adjustment unit 330.
The information acquisition unit 310 periodically obtains the current resource occupancy information and current task queue information of a service node. The available node judging unit 320 judges, according to the current resource occupancy information and the current task queue information, whether the service node is available. The scheduling adjustment unit 330 suspends the sending of new tasks to any service node judged unavailable.
Preferably, the available node judging unit 320 can judge a service node to be unavailable when its current resource occupancy information is higher than the fourth threshold and its current task queue information is higher than the fifth threshold. Further, the available node judging unit 320 can set an unavailable service node to available again when its current resource occupancy information is not higher than the fourth threshold and/or its current task queue information is not higher than the fifth threshold, or when the service node has been set as unavailable for the second predetermined time. In addition, the available node judging unit 320 can stop judging service nodes to be unavailable while the number of unavailable service nodes is higher than the third threshold.
As shown in Fig. 3, the dispatching device 300 can optionally also include a node deletion unit 350. The node deletion unit 350 deletes a specific service node from the distributed system when no information has been received from that node for the first predetermined time.
As shown in Fig. 3, the dispatching device 300 can optionally also include a new node enabling unit 370. The new node enabling unit 370 enables a new service node when the number of unavailable service nodes reaches the first threshold and/or the number of deleted service nodes reaches the second threshold.
The distributed system of the invention and its dispatching method and dispatching device have been described in detail above with reference to the accompanying drawings. A concrete application that uses the invention to schedule a distributed system is given below.
Application example
The distributed system in this embodiment can be used in application scenarios where the scheduled objects are service processes. In a recommendation engine, for example, a fixed scheduling node schedules the service nodes that provide the recommendation service in the distributed system.
A service node can periodically send the scheduling node a heartbeat packet containing iowait status information and task queue length information. According to whether it receives the heartbeat packets sent by a service node, the scheduling node can classify the heartbeat state of the service node as alive, unknown or dead. According to the iowait status information and task queue length information received from the service node, it can also classify the health status of the service node as invalid or valid. The meaning of each heartbeat state and health status is as follows:
alive: the scheduling node can receive the heartbeat packets of the service node;
unknown: the scheduling node has not received a heartbeat packet from the service node; the service node may be dead, or the heartbeat may have been lost briefly because of a fault such as a network problem;
dead: the scheduling node has received no heartbeat from the working node for a sustained period, which indicates that the node is dead or that a fault such as a network problem cannot be recovered in a short time; the node is no longer suited to providing the service;
invalid: a non-fatal factor makes the working node markedly less able to provide the service than other working nodes, for example a poor environment on the machine currently hosting it (insufficient memory, aging hard disk); the node is no longer suited to providing the service;
valid: the working node is healthy and can provide the service.
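Because the heartbeat state and the health status are independent dimensions, it can help to model them as two separate enumerations and combine them only when deciding whether a node is taken out of service. The sketch below is an assumption-laden illustration: the type and function names are hypothetical, and it retires only nodes that the definitions above describe as no longer suited to providing the service (dead or invalid). How tasks are routed to a node in the unknown state is not stated in the text, so such a node is deliberately not retired here.

```python
import enum


class HeartbeatState(enum.Enum):
    ALIVE = "alive"
    UNKNOWN = "unknown"
    DEAD = "dead"


class HealthState(enum.Enum):
    VALID = "valid"
    INVALID = "invalid"


def is_retired(heartbeat: HeartbeatState, health: HealthState) -> bool:
    """True when the node should no longer provide the service."""
    return heartbeat is HeartbeatState.DEAD or health is HealthState.INVALID


# Enumerate every combination of the two dimensions.
table = {
    (hb.value, hs.value): is_retired(hb, hs)
    for hb in HeartbeatState
    for hs in HealthState
}
```

Enumerating all six combinations this way makes it easy to check that the two dimensions never produce a combination the scheduler has no rule for.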
How a service node moves between these states and how the scheduling node performs scheduling are explained in detail below with reference to Fig. 4.
1) The client submits a service program deployment request. After receiving it, the scheduling node allocates service nodes (processes) on suitable machines, according to the resource requirements of the service, to process tasks. The initial state of a service node is unknown, and its health status is valid.
2) The service node reports a heartbeat. After receiving it, the scheduling node sets the state of the service node to alive and updates its health status.
3) If the service node does not report a heartbeat, the scheduling node sets its state to unknown and records the time at which the heartbeat was first lost.
4) If no heartbeat is received from the service node within the specified timeout, the state of the service node is set to dead and it no longer receives service requests. At the same time, a new service node is allocated on another machine to provide the service.
5) If the current state of the service node is alive but the iowait and task queue length reported in its heartbeat exceed the predetermined ranges, then, provided that the number of working nodes in the invalid state has not reached the upper limit, the scheduling node sets the health status of the service node to invalid and allocates a new service node on another machine to provide the service. The former service node is stopped and no longer provides the service. For a specified period of time, the machine hosting the former service node is not used for allocating service nodes of this service.
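The last rule in step 5, keeping a machine out of placement for a while after its node was marked invalid, can be sketched as a small pool with a cool-down. Everything here is hypothetical: the class name `MachinePool`, the first-fit choice of machine (the text only says "suitable" machines are chosen according to resource requirements), and the cool-down value.

```python
from typing import Dict, List, Optional


class MachinePool:
    """Chooses a machine for a replacement service node of one service."""

    def __init__(self, machines: List[str], cooldown: float) -> None:
        self.machines = list(machines)
        self.cooldown = cooldown                  # the "specified period" (illustrative)
        self.blocked_until: Dict[str, float] = {}

    def block(self, machine: str, now: float) -> None:
        """Exclude a machine from placement after its node was marked invalid."""
        self.blocked_until[machine] = now + self.cooldown

    def pick(self, now: float, exclude: Optional[str] = None) -> Optional[str]:
        """Return the first machine that is neither excluded nor cooling down."""
        for machine in self.machines:
            if machine == exclude:
                continue
            if self.blocked_until.get(machine, float("-inf")) > now:
                continue
            return machine
        return None  # no machine is currently eligible


pool = MachinePool(["m1", "m2", "m3"], cooldown=60.0)
pool.block("m1", now=0.0)               # the node on m1 was marked invalid at t=0
replacement = pool.pick(now=10.0)       # m1 is still cooling down
later = pool.pick(now=60.0)             # the cool-down has expired
```

The `exclude` argument covers step 4, where the replacement for a dead node must go to "another machine" even though that machine is not placed under a timed block.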
In summary, the scheduling node can base resource scheduling on two dimensions, heartbeat state and health status. While guaranteeing resource availability, it preferentially uses superior resources and eliminates inferior ones, which effectively prevents performance problems. In addition, an "unknown" state is added to the transition of a service node from the "alive" state to the "dead" state; it handles brief anomalies on the machine hosting the service node and reduces system jitter.
Additionally, the method according to the invention can also be implemented as a computer program that includes computer program code instructions for performing the steps defined in the above method of the invention. Alternatively, the method according to the invention can be implemented as a computer program product that includes a computer-readable medium storing a computer program for performing the functions defined in the above method of the invention. Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits and algorithm steps described in connection with the disclosure here can be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of systems and methods according to multiple embodiments of the invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks can occur in an order different from that marked in the drawings. For example, two consecutive blocks can in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and each combination of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Various embodiments of the invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (14)
1. A distributed system, comprising a scheduling node that issues tasks to a plurality of service nodes, and the plurality of service nodes for running the tasks, wherein:
the service nodes periodically send current resource occupancy information and current task queue information to the scheduling node;
the scheduling node judges, according to the current resource occupancy information and the current task queue information, whether a service node is available; and
the scheduling node suspends sending new tasks to a service node judged to be unavailable.
2. The distributed system as claimed in claim 1, wherein
the scheduling node deletes a specific service node from the distributed system when no information has been received from that specific service node for a first predetermined time.
3. The distributed system as claimed in claim 2, wherein the scheduling node enables a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.
4. The distributed system as claimed in claim 1, wherein
the scheduling node suspends judging service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.
5. The distributed system as claimed in claim 1, wherein
the scheduling node judges the service node to be unavailable when the current resource occupancy information is higher than a fourth threshold and the current task queue information is higher than a fifth threshold.
6. The distributed system as claimed in claim 5, wherein the scheduling node sets an unavailable service node to available again in at least one of the following cases:
the current resource occupancy information is not higher than the fourth threshold and/or the current task queue information is not higher than the fifth threshold; and
the service node has been set as unavailable for a second predetermined time.
7. A dispatching device of a distributed system, the distributed system including a plurality of service nodes for running tasks, the device being used to issue tasks to the plurality of service nodes and comprising:
an information acquisition unit for periodically obtaining current resource occupancy information and current task queue information of a service node;
an available-node judging unit for judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and
a dispatch adjustment unit for suspending the sending of new tasks to a service node judged to be unavailable.
8. The dispatching device as claimed in claim 7, further comprising:
a node deletion unit for deleting a specific service node from the distributed system when no information has been received from that specific service node for a first predetermined time.
9. The dispatching device as claimed in claim 8, further comprising:
a new-node enabling unit for enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.
10. The dispatching device as claimed in claim 7, wherein
the available-node judging unit suspends judging service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.
11. A dispatching method of a distributed system, the distributed system including a plurality of service nodes for running tasks, the method comprising:
periodically obtaining current resource occupancy information and current task queue information of a service node;
judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and
suspending the sending of new tasks to a service node judged to be unavailable.
12. The dispatching method as claimed in claim 11, further comprising:
deleting a specific service node from the distributed system when no information has been received from that specific service node for a first predetermined time.
13. The dispatching method as claimed in claim 12, further comprising:
enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.
14. The dispatching method as claimed in claim 11, further comprising:
suspending the judging of service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.
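The scheduling behaviour recited in the claims can be pictured with a minimal sketch such as the one below. It is only one possible reading, not the applicant's implementation: the class and method names (`Scheduler`, `report`, `sweep`, `can_dispatch`, `needs_new_node`) and every numeric value are hypothetical, since the claims name the thresholds and predetermined times without giving values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Thresholds:
    # All values are hypothetical placeholders for the claimed quantities.
    first_unavailable: int = 3     # "first threshold"
    second_deleted: int = 2        # "second threshold"
    third_pause_marking: int = 5   # "third threshold"
    fourth_resource: float = 0.9   # "fourth threshold" (resource occupancy)
    fifth_queue: int = 100         # "fifth threshold" (task queue length)
    first_time: float = 60.0       # "first predetermined time", seconds
    second_time: float = 300.0     # "second predetermined time", seconds


@dataclass
class Node:
    last_report: float
    unavailable_since: Optional[float] = None  # None means available


@dataclass
class Scheduler:
    t: Thresholds = field(default_factory=Thresholds)
    nodes: Dict[str, Node] = field(default_factory=dict)
    deleted: int = 0

    def unavailable_count(self) -> int:
        return sum(n.unavailable_since is not None for n in self.nodes.values())

    def report(self, node_id: str, resource: float, queue_len: int, now: float) -> None:
        """Handle one periodic report from a service node."""
        node = self.nodes.setdefault(node_id, Node(last_report=now))
        node.last_report = now
        overloaded = (resource > self.t.fourth_resource
                      and queue_len > self.t.fifth_queue)
        if node.unavailable_since is None:
            # Stop marking nodes unavailable once too many already are.
            if overloaded and self.unavailable_count() <= self.t.third_pause_marking:
                node.unavailable_since = now
        elif not overloaded or now - node.unavailable_since >= self.t.second_time:
            # Load dropped, or the node has been unavailable long enough.
            node.unavailable_since = None

    def sweep(self, now: float) -> None:
        """Delete nodes that have been silent for the first predetermined time."""
        silent = [k for k, n in self.nodes.items()
                  if now - n.last_report >= self.t.first_time]
        for node_id in silent:
            del self.nodes[node_id]
            self.deleted += 1

    def can_dispatch(self, node_id: str) -> bool:
        """New tasks go only to known nodes currently judged available."""
        node = self.nodes.get(node_id)
        return node is not None and node.unavailable_since is None

    def needs_new_node(self) -> bool:
        return (self.unavailable_count() >= self.t.first_unavailable
                or self.deleted >= self.t.second_deleted)
```

In this sketch a node becomes unavailable only when both metrics exceed their thresholds, and becomes available again as soon as either metric drops back or the second predetermined time elapses, which is one way to read the "and/or" wording of claim 6.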
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710126701.3A CN106933662A (en) | 2017-03-03 | 2017-03-03 | Distributed system and its dispatching method and dispatching device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710126701.3A CN106933662A (en) | 2017-03-03 | 2017-03-03 | Distributed system and its dispatching method and dispatching device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106933662A (en) | 2017-07-07 |
Family
ID=59423938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710126701.3A Pending CN106933662A (en) | 2017-03-03 | 2017-03-03 | Distributed system and its dispatching method and dispatching device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106933662A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102036188A (en) * | 2009-09-24 | 2011-04-27 | 中国移动通信集团公司 | Mail proxy method, equipment and system under multi-node system |
CN102096599A (en) * | 2009-12-14 | 2011-06-15 | 中国移动通信集团公司 | Multi-queue task scheduling method and related system and equipment |
CN103092712A (en) * | 2011-11-04 | 2013-05-08 | 阿里巴巴集团控股有限公司 | Method and device for recovering interrupt tasks |
CN104838374A (en) * | 2012-12-06 | 2015-08-12 | 英派尔科技开发有限公司 | Decentralizing a HADOOP cluster |
US20160378560A1 (en) * | 2014-02-28 | 2016-12-29 | Pivotal Software, Inc. | Executing a foreign program on a parallel computing system |
CN106095572A (en) * | 2016-06-08 | 2016-11-09 | 东方网力科技股份有限公司 | Dispatching system and method for big data processing |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108076155A (en) * | 2017-12-22 | 2018-05-25 | 聚好看科技股份有限公司 | Method, apparatus, system and server for cross-machine-room traffic scheduling |
CN109298897A (en) * | 2018-06-29 | 2019-02-01 | 杭州数澜科技有限公司 | System and method for task distribution using resource groups |
CN108900379B (en) * | 2018-07-09 | 2020-12-29 | 阿里巴巴(中国)有限公司 | Distributed network service scheduling method, device, computing equipment and storage medium |
CN108900379A (en) * | 2018-07-09 | 2018-11-27 | 广东神马搜索科技有限公司 | Distributed network service scheduling method, device, computing equipment and storage medium |
CN111352709A (en) * | 2018-12-20 | 2020-06-30 | 顺丰科技有限公司 | Task scheduling method and device in distributed system |
CN110597608A (en) * | 2019-09-12 | 2019-12-20 | 阿里巴巴集团控股有限公司 | Task processing method and device, distributed system and storage medium |
CN110597608B (en) * | 2019-09-12 | 2023-08-22 | 创新先进技术有限公司 | Task processing method and device, distributed system and storage medium |
CN111176783A (en) * | 2019-11-20 | 2020-05-19 | 航天信息股份有限公司 | High-availability method and device for container treatment platform and electronic equipment |
CN113076188A (en) * | 2020-01-03 | 2021-07-06 | 阿里巴巴集团控股有限公司 | Scheduling method and device of distributed system |
CN113076188B (en) * | 2020-01-03 | 2024-05-14 | 阿里巴巴集团控股有限公司 | Scheduling method and device of distributed system |
CN111698132A (en) * | 2020-06-12 | 2020-09-22 | 北京字节跳动网络技术有限公司 | Method, apparatus, device and medium for controlling heartbeat events in a cluster |
CN112035721A (en) * | 2020-07-22 | 2020-12-04 | 大箴(杭州)科技有限公司 | Crawler cluster monitoring method and device, storage medium and computer equipment |
CN112328372A (en) * | 2020-11-27 | 2021-02-05 | 新华智云科技有限公司 | Kubernetes node self-healing method and system |
CN112804334A (en) * | 2021-01-15 | 2021-05-14 | 京东方科技集团股份有限公司 | Method and device for distributing and acquiring tasks, storage medium and electronic equipment |
CN112988361A (en) * | 2021-05-13 | 2021-06-18 | 神威超算(北京)科技有限公司 | Cluster task allocation method and device and computer readable medium |
CN112988361B (en) * | 2021-05-13 | 2021-08-20 | 中诚华隆计算机技术有限公司 | Cluster task allocation method and device and computer readable medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106933662A (en) | Distributed system and its dispatching method and dispatching device | |
CN112162865B (en) | Scheduling method and device of server and server | |
CN107688496B (en) | Task distributed processing method and device, storage medium and server | |
CN110622478B (en) | Method and device for data synchronous processing | |
CN107291546B (en) | Resource scheduling method and device | |
US20210334135A1 (en) | Computing node job assignment using multiple schedulers | |
CN109672627A (en) | Service processing method, platform, equipment and storage medium based on cluster server | |
CN103412786B (en) | High performance server architecture system and data processing method thereof | |
CN109246229A (en) | Method and apparatus for distributing resource acquisition requests | |
CN108134814B (en) | Service data processing method and device | |
WO2013104217A1 (en) | Cloud infrastructure based management system and method for performing maintenance and deployment for application system | |
CN111338773A (en) | Distributed timed task scheduling method, scheduling system and server cluster | |
CN109669762A (en) | Cloud computing resources management method, device, equipment and computer readable storage medium | |
CN110677274A (en) | Event-based cloud network service scheduling method and device | |
CN112650575B (en) | Resource scheduling method, device and cloud service system | |
CN114356557B (en) | Cluster capacity expansion method and device | |
CN109584105B (en) | Service response method and system | |
JP2004038516A (en) | Work processing system, operation management method and program for performing operation management | |
CN106164888A (en) | Ordering schemes for network and storage I/O requests for minimizing workload idle time and inter-workload interference | |
CN114579296A (en) | Server idle calculation scheduling method and device and electronic equipment | |
CN109992392A (en) | Computing resource configuration method, device and resource server | |
CN107612731A (en) | Trusted network slice generation and trusted recovery system based on software definition | |
US10205630B2 (en) | Fault tolerance method for distributed stream processing system | |
CN106302241A (en) | Online message queue scheduling method and device | |
CN106302621B (en) | Message notification method and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | TA01 | Transfer of patent application right | Effective date of registration: 20200810; Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province; Applicant after: Alibaba (China) Co.,Ltd.; Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping square B radio tower 13 layer self unit 01; Applicant before: Guangdong Shenma Search Technology Co.,Ltd. |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170707 |