CN106933662A

CN106933662A - Distributed system and its dispatching method and dispatching device

Info

Publication number: CN106933662A
Application number: CN201710126701.3A
Authority: CN
Inventors: 刘东辉; 褚建辉; 卢申朋; 王新栋
Original assignee: Guangdong Shenma Search Technology Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2017-03-03
Filing date: 2017-03-03
Publication date: 2017-07-07

Abstract

The invention discloses a distributed system together with a dispatching method and a dispatching device for it. The distributed system includes a scheduling node that issues tasks to multiple service nodes, and the multiple service nodes that run the tasks. Each service node periodically sends current resource occupancy information and current task queue information to the scheduling node. Based on this information, the scheduling node judges whether the service node is available, and it can suspend sending new tasks to a service node judged to be unavailable. The scheduling node can thus schedule resources among the available service nodes and avoid distributing new tasks to unavailable service nodes, so that new tasks are not left pending.

Description

Distributed system and its dispatching method and dispatching device

Technical field

The present invention relates to the field of distributed technology, and more particularly to a distributed system and its dispatching method and dispatching device.

Background

As data volumes keep growing, an ordinary single-machine system can no longer handle big-data processing tasks. Distributed systems built from many machines have therefore become the mainstream solution. In a distributed system, resource scheduling is a core problem. In HADOOP (a distributed system architecture developed by the Apache Software Foundation), for example, a central scheduling system assigns the worker nodes responsible for executing specific processing tasks to the individual machines of the cluster.

The running environment of each machine in a distributed system is complex. How to schedule the machines so that resources continue to be scheduled normally when a single machine in the system fails is therefore a major problem facing distributed systems today.

Summary of the invention

The main object of the present invention is to provide a distributed system and its dispatching method and dispatching device that keep resource scheduling working normally when a single machine in the distributed system fails.

According to one aspect of the invention, a distributed system is provided, including a scheduling node that issues tasks to multiple service nodes and the multiple service nodes that run the tasks, wherein each service node periodically sends current resource occupancy information and current task queue information to the scheduling node; the scheduling node judges, according to the current resource occupancy information and the current task queue information, whether the service node is available; and the scheduling node suspends sending new tasks to a service node judged to be unavailable.

The scheduling node can thus schedule resources among the available service nodes and avoid distributing new tasks to unavailable service nodes, so that new tasks are not left pending.

Preferably, the scheduling node deletes a particular service node from the distributed system when it has received no information from that service node for a first predetermined time. Waiting for a predetermined time before declaring a node dead avoids the system jitter caused by brief node anomalies. In addition, combining the deletion of nodes that have lost contact with the deactivation of busy nodes makes it possible, while resource availability is guaranteed, to prefer superior resources and eliminate inferior ones, which effectively improves system efficiency and prevents performance problems.

Preferably, the scheduling node enables a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.

Enabling new service nodes thus ensures that the distributed system can provide enough resources to process tasks.

Preferably, the scheduling node suspends judging service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.

Thus, once the number of unavailable service nodes reaches this upper limit, no further service nodes are judged unavailable, which ensures that the distributed system can still provide enough resources to process tasks.

Preferably, the scheduling node judges a service node to be unavailable when its current resource occupancy information is higher than a fourth threshold and its current task queue information is higher than a fifth threshold.

Using both the current resource occupancy information and the current task queue information allows the unavailability of a service node to be judged more comprehensively and accurately. On the one hand this relieves the pressure on the service node; on the other hand new tasks can be distributed to other service nodes that are suited to run them, so that new tasks are executed quickly.

Preferably, the scheduling node sets an unavailable service node back to available in at least one of the following cases: the current resource occupancy information is not higher than the fourth threshold and/or the current task queue information is not higher than the fifth threshold; and the service node has been set as unavailable for a second predetermined time.

Thus, a service node that was judged unavailable can be put back into service in the system once it meets the conditions.

According to another aspect of the present invention, a dispatching method for a distributed system is also provided, the distributed system including multiple service nodes for running tasks. The method includes: periodically obtaining the current resource occupancy information and current task queue information of a service node; judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and suspending sending new tasks to a service node judged to be unavailable.

Preferably, the dispatching method may further include: deleting a particular service node from the distributed system when no information has been received from that service node for a first predetermined time.

Preferably, the dispatching method may further include: enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.

Preferably, the dispatching method may further include: suspending judging service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.

According to a further aspect of the invention, a dispatching device for a distributed system is also provided, the distributed system including multiple service nodes for running tasks. The device issues tasks to the multiple service nodes and includes: an information acquisition unit for periodically obtaining the current resource occupancy information and current task queue information of a service node; an available node judging unit for judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and a dispatching adjustment unit for suspending sending new tasks to a service node judged to be unavailable.

Preferably, the dispatching device may further include: a node deletion unit for deleting a particular service node from the distributed system when no information has been received from that service node for a first predetermined time.

Preferably, the dispatching device may further include: a new node enabling unit for enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.

The distributed system of the invention and its dispatching method and dispatching device use the current resource occupancy information and current task queue information sent by a service node to judge whether that service node is available and, when it is judged unavailable, can suspend sending new tasks to it. The scheduling node can thus schedule resources among the available service nodes and avoid distributing new tasks to unavailable service nodes, so that new tasks are not left pending.

Brief description of the drawings

The above and other objects, features, and advantages of the disclosure will become more apparent from the following more detailed description of exemplary embodiments of the disclosure, taken in conjunction with the accompanying drawings. In the exemplary embodiments of the disclosure, the same reference numerals generally denote the same components.

Fig. 1 shows a functional block diagram of a distributed system according to an embodiment of the invention.

Fig. 2 shows a schematic flowchart of a dispatching method of a distributed system according to an embodiment of the invention.

Fig. 3 shows a functional block diagram of a dispatching device of a distributed system according to an embodiment of the invention.

Fig. 4 shows a state transition diagram of a service node in a specific embodiment of the invention.

Detailed description

Preferred embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.

Embodiments of the invention are described in detail below with reference to Figs. 1 to 4. Fig. 1 shows a functional block diagram of a distributed system 100 according to an embodiment of the invention.

As shown in Fig. 1, the distributed system 100 includes a scheduling node 110 and one or more service nodes 120. The scheduling node 110 can distribute tasks among the multiple service nodes 120 by sending dispatch instructions.

The scheduling node 110 and the service nodes 120 can be deployed on servers. In a preferred embodiment, different service nodes 120 are deployed on different servers, and the scheduling node 110 is deployed either on a separate server of its own or on the same server as one of the service nodes 120. The lines in the drawing indicate that information is exchanged between the scheduling node 110 and the service nodes 120; each line may be a wired connection, a wireless connection, or any other form of connection capable of carrying information.

A service node 120 can periodically send information to the scheduling node 110. This information shows that the service node is alive, that is, normally connected to the system, and can therefore be called "heartbeat information". The transmitted information can also include information indicating the current usage of the service node; this can be called "health status" information, since it characterizes the health of the particular node. In one embodiment, the health status information is or includes current resource occupancy information and current task queue information.
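As a rough illustration of what such a periodic message might carry, the following sketch bundles the heartbeat and the health status into one record. The class name `Heartbeat`, the helper `build_heartbeat`, and every field name are assumptions made for this example; the patent does not specify a message format.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat record sent by a service node."""
    node_id: str
    resource_occupancy: float  # assumed to be a fraction in [0, 1], e.g. iowait
    queue_length: int          # number of tasks the node has not finished yet
    sent_at: float             # send time in seconds


def build_heartbeat(node_id: str, resource_occupancy: float,
                    queue_length: int, now: Optional[float] = None) -> Heartbeat:
    """Build one heartbeat; `now` can be injected to make tests deterministic."""
    timestamp = time.time() if now is None else now
    return Heartbeat(node_id, resource_occupancy, queue_length, timestamp)
```

The mere arrival of such a record serves as the liveness signal, while its two payload fields carry the health status.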

The scheduling node 110 can judge whether a service node 120 is available according to the health status information received from it, such as the current resource occupancy information and the current task queue information. When the "health status" of a service node 120 is poor, for example when its current resource occupancy information and current task queue information are too high and indicate that the node is currently too busy, the scheduling node 110 can suspend sending new tasks to the service node 120 judged to be unavailable.

The processing performed by the scheduling node 110 is illustrated in Fig. 2, which shows a schematic flowchart of a dispatching method of a distributed system according to an embodiment of the invention.

Referring to Fig. 2, the method 200 starts at step S210, in which the health status information of a service node, for example its current resource occupancy information and current task queue information, is obtained periodically. The current resource occupancy information described here can include information on the CPU and memory occupied by the tasks the service node 120 is currently running; it may be, for example, iowait status information indicating CPU and memory usage. The current task queue information can include the number of tasks that the service node 120 has not yet completed.

The scheduling node 110 can judge whether the service node 120 is available according to the obtained health status information (step S220), and can suspend sending new tasks to a service node 120 judged to be unavailable (step S230).

Specifically, when the current resource occupancy information of a service node 120 is too high, the resources it can currently offer for running new tasks are limited; when its current task queue information is too high, the number of tasks it still has to process is excessive and it is no longer suited to take on new tasks. The scheduling node 110 can therefore judge a service node 120 whose current resource occupancy information is higher than the fourth threshold and whose current task queue information is higher than the fifth threshold to be an unavailable service node, and suspend sending new tasks to it.
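The availability judgment described above can be sketched as a single predicate. This is a minimal illustration, not the patented implementation: the function name `is_overloaded` and the parameter names are assumptions, and the threshold values in the usage note are arbitrary.

```python
def is_overloaded(resource_occupancy: float, queue_length: int,
                  fourth_threshold: float, fifth_threshold: int) -> bool:
    """Return True when a service node should be judged unavailable.

    Both conditions must hold: resource occupancy strictly above the
    fourth threshold AND queue length strictly above the fifth threshold.
    """
    return (resource_occupancy > fourth_threshold
            and queue_length > fifth_threshold)
```

With a fourth threshold of 0.8 and a fifth threshold of 20, a node reporting occupancy 0.9 and 50 queued tasks would be judged unavailable, whereas a node reporting occupancy 0.9 and only 5 queued tasks would remain available.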

It should be noted that the concrete value of each threshold mentioned here and below can be set according to actual conditions, and that the terms "fourth" and "fifth" used here, like "first", "second", and "third" used below, serve only to distinguish the thresholds and should not be construed as limiting the invention in any sense.

An excessively high current resource occupancy of a service node may be caused by an aging hard disk in the machine hosting the service node, or by the tasks the service node runs occupying a large amount of resources. In the former case, the service node can be retired or repaired through manual intervention. In the latter case, the service node can continue to run the tasks in its current task queue; alternatively, the service node can be stopped and the tasks in its current task queue transferred to other service nodes or to newly enabled service nodes.

For a service node 120 judged to be unavailable, the scheduling node 110 can set it back to an available service node once it satisfies the conditions for availability.

For example, the scheduling node 110 can set a service node 120 previously judged unavailable back to available when its current resource occupancy information is not higher than the fourth threshold and/or its current task queue information is not higher than the fifth threshold. The scheduling node 110 can also set a service node 120 back to available once it has been set as unavailable for the second predetermined time. The scheduling node 110 can issue new tasks to a service node 120 that has been set back to available. The above thresholds and the predetermined time can be set flexibly according to the overall condition of the system.

In a preferred embodiment of the present invention, a service node 120 is set back to available once it has been set as unavailable for the second predetermined time. Thus, when multiple service nodes 120 in the distributed system 100 are judged unavailable, each of them automatically becomes an available service node again after the second predetermined time. This gives the distributed system 100 a degree of self-repair capability and also leaves time for manual intervention on the unavailable service nodes.
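The two recovery conditions can be sketched as follows. The text leaves "and/or" open for the load condition; this sketch assumes the "or" reading, which is the logical negation of the unavailability condition. The function name `should_restore` and all parameter names are hypothetical.

```python
def should_restore(resource_occupancy: float, queue_length: int,
                   unavailable_since: float, now: float,
                   fourth_threshold: float, fifth_threshold: int,
                   second_predetermined_time: float) -> bool:
    """Decide whether an unavailable node may be set back to available."""
    # Condition 1 (assumed "or" reading): load dropped to or below a threshold.
    load_recovered = (resource_occupancy <= fourth_threshold
                      or queue_length <= fifth_threshold)
    # Condition 2: the node has been unavailable for the second predetermined time.
    waited_long_enough = now - unavailable_since >= second_predetermined_time
    return load_recovered or waited_long_enough
```

The time-based condition is what lets the system recover without an operator: even a node that keeps reporting high load is offered tasks again after the waiting period, at which point the availability judgment is simply applied to it afresh.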

The above has described in detail, with reference to Fig. 2, how the scheduling node 110 in the distributed system 100 uses the current resource occupancy information and current task queue information sent by a service node 120 to judge the service node 120 unavailable, and how it sets an unavailable service node 120 back to an available service node.

The machines hosting the service nodes of a distributed system frequently experience abnormal conditions such as network disconnection or machine failure. In the prior art, a scheduling node periodically receives heartbeat information from a service node and on that basis determines that the node is "alive". As soon as the scheduling node fails to receive information from a service node, it judges the node "dead", which makes it hard to ride out temporary system or network jitter. For example, when the network of a cluster is interrupted for a very short time, communication may recover immediately, yet the scheduling node 110, having received no "heartbeat" information from the cluster's service nodes within one period, judges a large number of service nodes in that cluster dead and deletes them from the system, which harms the overall capacity of the system. Although a node judged dead and deleted can later be reconnected to the system through a series of cold or hot operations or through manual intervention, that process is complex and time-consuming and reduces the robustness and usability of the system.

To address this, in the distributed system 100 of the invention the scheduling node 110 can add an "unknown" state to its judgment of the liveness of a service node 120. That is, when the heartbeat of a service node is lost, the node is not immediately judged "dead"; instead it is set to the "unknown" state and a timer is started. If communication recovers within a predetermined period, the node state is set back to "alive". If no information from the node is received within the predetermined period, the node is then judged dead. This prevents a service node from being deleted from the system merely because a network fault or similar failure temporarily stops it communicating with the scheduling node 110, which shields the system from brief cluster anomalies and reduces system jitter.
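The three-state liveness judgment can be sketched as a small tracker driven by timestamps. This is an illustrative sketch under stated assumptions: the names `Liveness`, `LivenessTracker`, `heartbeat_period`, and `first_predetermined_time` are invented here, a heartbeat counts as lost once the silence exceeds one heartbeat period, and a node that was deleted (or never registered) is simply reported as dead.

```python
from enum import Enum


class Liveness(Enum):
    ALIVE = "alive"
    UNKNOWN = "unknown"
    DEAD = "dead"


class LivenessTracker:
    """Hypothetical tracker that waits before declaring a silent node dead."""

    def __init__(self, heartbeat_period: float, first_predetermined_time: float):
        self.heartbeat_period = heartbeat_period
        self.first_predetermined_time = first_predetermined_time
        self.last_seen = {}  # node_id -> time of the most recent heartbeat

    def on_heartbeat(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now

    def state(self, node_id: str, now: float) -> Liveness:
        if node_id not in self.last_seen:
            return Liveness.DEAD  # deleted or never registered in this sketch
        silence = now - self.last_seen[node_id]
        if silence <= self.heartbeat_period:
            return Liveness.ALIVE
        # The heartbeat is lost: stay "unknown" until the grace period expires.
        if silence < self.heartbeat_period + self.first_predetermined_time:
            return Liveness.UNKNOWN
        return Liveness.DEAD

    def sweep(self, now: float) -> list:
        """Delete every node judged dead and return the deleted ids."""
        dead = [n for n in self.last_seen if self.state(n, now) is Liveness.DEAD]
        for node_id in dead:
            del self.last_seen[node_id]
        return dead
```

A node that resumes sending heartbeats while still "unknown" returns to "alive" without ever being removed, which is the jitter-avoidance behaviour the paragraph describes. Re-admission of deleted nodes is deliberately left out of the sketch.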

The scheduling node 110 can delete a service node 120 judged dead from the distributed system 100 and no longer schedule resources from that node. "Deletion" here refers mainly to deletion in software, meaning that the system no longer accepts information from the node. Once a "dead" node has resumed normal operation, it can be reconnected to the distributed system through a series of cold or hot operations or through manual intervention. In this way the scheduling node 110 maintains the service nodes 120 of the distributed system 100: it keeps service nodes available and avoids system fluctuations caused by transient faults, while removing faulty service nodes 120 from the distributed system 100 in time, so that tasks can be scheduled among the normally functioning service nodes 120.

In one embodiment, to ensure that the distributed system 100 can still provide enough resources to process tasks when service nodes are deleted and/or judged unavailable, the scheduling node 110 can enable one new service node each time it judges a service node unavailable or deletes a service node. A new service node enabled here or below may be a service node that was previously judged unavailable or deleted and has since recovered.

In addition, the scheduling node 110 can enable a new service node when the number of unavailable service nodes reaches the first threshold, or when the number of deleted service nodes reaches the second threshold, or when both conditions are met at the same time. The first threshold and the second threshold may each be a natural number greater than or equal to 1.
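The three variants of this trigger (either count alone, or both together) can be sketched with one function. The name `should_enable_new_node` and the `require_both` switch are assumptions introduced for the example.

```python
def should_enable_new_node(unavailable_count: int, deleted_count: int,
                           first_threshold: int, second_threshold: int,
                           require_both: bool = False) -> bool:
    """Decide whether the scheduling node should enable a new service node.

    By default either condition is enough; with require_both=True the
    unavailable count and the deleted count must both reach their thresholds.
    """
    unavailable_reached = unavailable_count >= first_threshold
    deleted_reached = deleted_count >= second_threshold
    if require_both:
        return unavailable_reached and deleted_reached
    return unavailable_reached or deleted_reached
```

Setting both thresholds to 1 and leaving `require_both` at its default reproduces the simplest embodiment, in which every unavailable or deleted node triggers a replacement.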

Further, the scheduling node 110 can suspend judging service nodes to be unavailable when the number of unavailable service nodes is higher than the third threshold.
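One way to picture this cap is as a guard wrapped around the overload judgment. In the sketch below, `judge_unavailable` and its parameters are hypothetical names; `overloaded` stands for the result of whatever load check the scheduling node applies.

```python
def judge_unavailable(overloaded: bool, unavailable_count: int,
                      third_threshold: int) -> bool:
    """Return True if the node should be marked unavailable.

    While the number of already-unavailable nodes is above the third
    threshold, no further node is marked unavailable, however busy it is.
    """
    if unavailable_count > third_threshold:
        return False  # judging is suspended to preserve serving capacity
    return overloaded
```

The guard trades per-node protection for cluster-level capacity: an overloaded node keeps receiving tasks rather than the system running out of nodes to schedule onto.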

Fig. 3 shows a functional block diagram of a dispatching device of a distributed system according to an embodiment of the invention. The functional modules of the dispatching device 300 can be realized by hardware, software, or a combination of hardware and software that implements the principles of the invention. Those skilled in the art will understand that the functional modules described in Fig. 3 can be combined or divided into submodules to implement those principles. The description herein therefore supports any possible combination, division, or further definition of the functional modules described herein.

The dispatching device 300 shown in Fig. 3 can be used to implement the dispatching method shown in Fig. 2. Only the functional modules that the dispatching device 300 may have, and the operations each module can perform, are described briefly below; for the details involved, refer to the description given above in connection with Fig. 1 and Fig. 2, which is not repeated here.

Referring to Fig. 3, the dispatching device 300 includes an information acquisition unit 310, an available node judging unit 320, and a dispatching adjustment unit 330.

The information acquisition unit 310 periodically obtains the current resource occupancy information and current task queue information of a service node. The available node judging unit 320 judges, according to the current resource occupancy information and the current task queue information, whether the service node is available. The dispatching adjustment unit 330 suspends sending new tasks to a service node judged to be unavailable.

Preferably, the available node judging unit 320 can judge a service node to be unavailable when its current resource occupancy information is higher than the fourth threshold and its current task queue information is higher than the fifth threshold. Further, the available node judging unit 320 can set an unavailable service node back to available when its current resource occupancy information is not higher than the fourth threshold and/or its current task queue information is not higher than the fifth threshold, or when the service node has been set as unavailable for the second predetermined time. In addition, the available node judging unit 320 can suspend judging service nodes to be unavailable when the number of unavailable service nodes is higher than the third threshold.

As shown in Fig. 3, the dispatching device 300 can optionally also include a node deletion unit 350. The node deletion unit 350 deletes a particular service node from the distributed system when no information has been received from that service node for the first predetermined time.

As shown in Fig. 3, the dispatching device 300 can optionally also include a new node enabling unit 370. The new node enabling unit 370 enables a new service node when the number of unavailable service nodes reaches the first threshold and/or the number of deleted service nodes reaches the second threshold.

The distributed system of the invention and its dispatching method and dispatching device have been described in detail above with reference to the accompanying drawings. A concrete application that uses the invention to schedule a distributed system is given below.

Application example

The distributed system in this embodiment can be used in application scenarios where the scheduled objects are service processes. In a recommendation engine, for example, the scheduling node in the distributed system schedules the service nodes that provide the recommendation service.

A service node can periodically send the scheduling node a heartbeat packet containing iowait status information and task queue length information. Depending on whether it receives the heartbeat packets sent by a service node, the scheduling node classifies the heartbeat state of the service node as alive, unknown, or dead. Based on the iowait status information and task queue length information received from the service node, it also classifies the health status of the service node as invalid or valid. The heartbeat states and health statuses have the following meanings:

Alive: the scheduling node can receive the heartbeat packets of the service node.

Unknown: the scheduling node has not received the heartbeat packet of the service node; the service node may be dead, or the heartbeat may have been lost briefly because of a network fault or similar failure.

Dead: the scheduling node has received no heartbeat from the service node for a sustained period, which indicates that the node is dead or that a network fault or similar failure cannot be resolved in a short time; the node is no longer suitable for providing service.

Invalid: because of non-fatal factors, the service node's ability to provide service is markedly weaker than that of other service nodes, for example because the machine currently hosting it is in poor condition (insufficient memory, an aging hard disk); the node is no longer suitable for providing service.

Valid: the service node is healthy and can provide service.

How a service node moves between these states, and how the scheduling node schedules accordingly, is explained in detail below with reference to Fig. 4.

1) A client submits a request to deploy a service program. After receiving it, the scheduling node allocates service nodes (processes) on suitable machines, according to the resource requirements of the service, to process tasks. The initial heartbeat state of a service node is unknown and its health status is valid.

2) The service node reports a heartbeat. After receiving the heartbeat, the scheduling node sets the state of the service node to alive and updates the health status of the service node.

3) If the service node does not report a heartbeat, the scheduling node sets the state of the service node to unknown and records the time at which heartbeats started to be lost.

4) If no heartbeat from the service node is received within the specified timeout, the state of the service node is set to dead and the service node no longer accepts service requests. At the same time, a new service node is allocated on another machine to provide the service.

5) If the current state of the service node is alive, but the iowait and task queue length reported in its heartbeat exceed the predetermined range, and the number of service nodes currently in the invalid state has not reached the upper limit, the scheduling node sets the health status of the service node to invalid and at the same time allocates a new service node on another machine to provide the service. The original service node is stopped and no longer provides service. For a specified period, the machine hosting the original service node is no longer used for allocating service nodes of this service.
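Steps 1) to 5) can be put together in a compact sketch of the scheduling node's bookkeeping. It is an illustration under several assumptions: the names `NodeRecord`, `Scheduler`, `timeout`, `iowait_limit`, `queue_limit`, and `invalid_cap` are invented, "exceed the predetermined range" is read as both values exceeding their limits, allocating a replacement node is reduced to incrementing a counter, and the temporary exclusion of the original machine in step 5) is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeRecord:
    heartbeat_state: str = "unknown"     # step 1: initial heartbeat state
    health_state: str = "valid"          # step 1: initial health status
    lost_since: Optional[float] = None   # when heartbeats started to be missed


class Scheduler:
    """Hypothetical bookkeeping for the state transitions of Fig. 4."""

    def __init__(self, timeout: float, iowait_limit: float,
                 queue_limit: int, invalid_cap: int):
        self.timeout = timeout
        self.iowait_limit = iowait_limit
        self.queue_limit = queue_limit
        self.invalid_cap = invalid_cap
        self.nodes = {}
        self.replacements = 0  # stands in for "allocate a new service node"

    def deploy(self, node_id: str) -> None:
        self.nodes[node_id] = NodeRecord()

    def on_heartbeat(self, node_id: str, iowait: float, queue_length: int) -> None:
        node = self.nodes[node_id]
        if node.heartbeat_state == "dead":
            return  # a dead node must be re-admitted first (not modelled here)
        node.heartbeat_state = "alive"   # step 2
        node.lost_since = None
        overloaded = iowait > self.iowait_limit and queue_length > self.queue_limit
        invalid_count = sum(n.health_state == "invalid" for n in self.nodes.values())
        if overloaded and node.health_state == "valid" and invalid_count < self.invalid_cap:
            node.health_state = "invalid"  # step 5
            self.replacements += 1

    def on_missed_heartbeat(self, node_id: str, now: float) -> None:
        node = self.nodes[node_id]
        if node.heartbeat_state == "dead":
            return
        if node.lost_since is None:
            node.heartbeat_state = "unknown"  # step 3
            node.lost_since = now
        elif now - node.lost_since >= self.timeout:
            node.heartbeat_state = "dead"     # step 4
            self.replacements += 1

    def can_serve(self, node_id: str) -> bool:
        node = self.nodes[node_id]
        return node.heartbeat_state == "alive" and node.health_state == "valid"
```

Keeping the heartbeat state and the health status as two independent fields mirrors the two dimensions the scheduling node relies on: a node is scheduled only when it is both alive and valid.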

In summary, the scheduling node can base resource scheduling on two dimensions, heartbeat state and health status. While resource availability is guaranteed, it prefers superior resources and eliminates inferior ones, which effectively prevents performance problems. Moreover, the "unknown" state added between the "alive" and "dead" states in the state transitions of a service node can absorb brief anomalies of the machine hosting the service node and reduce system jitter.

Furthermore, the method according to the invention may also be implemented as a computer program comprising computer program code instructions for performing the steps defined in the above method of the invention. Alternatively, the method according to the invention may be implemented as a computer program product comprising a computer-readable medium on which is stored a computer program for performing the functions defined in the above method of the invention. Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.

The flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of systems and methods according to multiple embodiments of the invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Various embodiments of the invention have been described above. The description is exemplary rather than exhaustive, and it is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technology available in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A distributed system, including a scheduling node that issues tasks to multiple service nodes and the multiple service nodes for running the tasks, wherein,

the service node periodically sends current resource occupancy information and current task queue information to the scheduling node;

the scheduling node judges, according to the current resource occupancy information and the current task queue information, whether the service node is available; and

the scheduling node suspends sending new tasks to a service node judged to be unavailable.

2. The distributed system as claimed in claim 1, wherein,

the scheduling node deletes a particular service node from the distributed system when no information has been received from that service node for a first predetermined time.

3. The distributed system as claimed in claim 2, wherein the scheduling node enables a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.

4. The distributed system as claimed in claim 1, wherein,

the scheduling node suspends judging service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.

5. The distributed system as claimed in claim 1, wherein,

the scheduling node judges the service node to be unavailable when the current resource occupancy information is higher than a fourth threshold and the current task queue information is higher than a fifth threshold.

6. The distributed system as claimed in claim 5, wherein the scheduling node sets an unavailable service node back to available in at least one of the following cases:

the current resource occupancy information is not higher than the fourth threshold and/or the current task queue information is not higher than the fifth threshold; and

the service node has been set as unavailable for a second predetermined time.

7. A dispatching device of a distributed system, the distributed system including multiple service nodes for running tasks, the device being used to issue tasks to the multiple service nodes and including:

an information acquisition unit for periodically obtaining current resource occupancy information and current task queue information of a service node;

an available node judging unit for judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and

a dispatching adjustment unit for suspending sending new tasks to a service node judged to be unavailable.

8. The dispatching device as claimed in claim 7, further including:

a node deletion unit for deleting a particular service node from the distributed system when no information has been received from that service node for a first predetermined time.

9. The dispatching device as claimed in claim 8, further including:

a new node enabling unit for enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.

10. The dispatching device as claimed in claim 7, wherein,

the available node judging unit suspends judging service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.

11. A dispatching method of a distributed system, the distributed system including multiple service nodes for running tasks, the method including:

periodically obtaining current resource occupancy information and current task queue information of a service node;

judging, according to the current resource occupancy information and the current task queue information, whether the service node is available; and

suspending sending new tasks to a service node judged to be unavailable.

12. The dispatching method as claimed in claim 11, further including:

deleting a particular service node from the distributed system when no information has been received from that service node for a first predetermined time.

13. The dispatching method as claimed in claim 12, further including:

enabling a new service node when the number of unavailable service nodes reaches a first threshold and/or the number of deleted service nodes reaches a second threshold.

14. The dispatching method as claimed in claim 11, further including:

suspending judging service nodes to be unavailable when the number of unavailable service nodes is higher than a third threshold.