CN107038069A - DLMS scheduling method with dynamic label matching under the Hadoop platform - Google Patents


Info

Publication number
CN107038069A
CN107038069A (application CN201710181055.0A / CN201710181055A)
Authority
CN
China
Prior art keywords
node
label
cpu
data
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710181055.0A
Other languages
Chinese (zh)
Other versions
CN107038069B (en)
Inventor
毛韦
竹翠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201710181055.0A priority Critical patent/CN107038069B/en
Publication of CN107038069A publication Critical patent/CN107038069A/en
Application granted granted Critical
Publication of CN107038069B publication Critical patent/CN107038069B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a DLMS scheduling method with dynamic label matching under the Hadoop platform, belonging to the field of computer software. To address the problems of large performance differences between Hadoop cluster nodes, random resource allocation, and excessive execution times, the invention proposes a scheduler that dynamically matches node performance labels (hereinafter "node labels") against job classification labels (hereinafter "job labels"). Nodes are first classified and assigned original node labels; each node then measures its own performance indicators to generate a dynamic node label. Jobs are classified from partial job information to generate job labels, and the resource scheduler allocates node resources to jobs carrying the corresponding label. Test results show that, relative to the schedulers shipped with YARN, the job execution time is considerably shortened.

Description

DLMS scheduling method with dynamic label matching under the Hadoop platform
Technical field
The invention belongs to the field of computer software and relates to the design and implementation of a DLMS scheduling method based on dynamic label matching under the Hadoop platform.
Background art
In early Hadoop versions, resource scheduling management and the MapReduce framework were integrated in a single module, so the code was poorly decoupled, could not be extended well, and did not support multiple computing frameworks. The Hadoop open-source community designed and implemented a new-generation Hadoop system with a completely new architecture, Hadoop 2.0, which extracts resource scheduling into a new resource scheduling framework: the new-generation Hadoop system YARN. It is well known that, in a given environment, a suitable scheduling algorithm can satisfy user job requests while effectively improving the overall performance of the Hadoop job platform and the utilization of system resources. YARN ships with three schedulers by default: the FIFO scheduler, the Fair Scheduler, and the Capacity Scheduler. Hadoop uses the FIFO scheduler by default; its first-in-first-out strategy is simple and easy to implement, but it penalizes short jobs and supports neither shared clusters nor multi-user management. The fair scheduling algorithm proposed by Facebook considers the differing resource demands of different users and jobs and lets users share cluster resources fairly, but its job resource configuration strategy is inflexible, easily wastes resources, and does not support job preemption. The capacity scheduling algorithm proposed by Yahoo supports multiple queues shared by multiple users and configures computing capacity flexibly, but it does not support job preemption and easily falls into local optima.
In real enterprise production, as the volume of enterprise data grows, new nodes are added to the cluster every year, so the performance of cluster nodes differs significantly; such heterogeneous clusters are very common in enterprise production environments. If a computation-heavy machine-learning task is assigned to a node with poor CPU performance, the overall execution time of the job clearly suffers. The three resource schedulers shipped with Hadoop do not solve this problem well. The present invention proposes DLMS, a resource scheduling method that dynamically matches node performance labels against job classification labels: machines with comparatively good CPU performance are given a CPU label, machines with comparatively good disk I/O performance are given an IO label, and machines that are average at both are given a common label. According to its classification, a job likewise receives a CPU, IO, or common label and enters the corresponding label queue; the scheduler then allocates, as far as possible, the resources of nodes with a given label to jobs with the same label. This reduces job run time, improves resource utilization, and raises overall system efficiency.
Summary of the invention
The scheduling method proposed by the present invention first classifies cluster nodes and assigns them corresponding labels. Before sending a heartbeat, each NodeManager performs self-detection and dynamically adjusts its original label. A machine-learning classification algorithm classifies jobs and assigns them corresponding labels; job order is determined dynamically from attributes such as the user-set job priority and the job waiting time; and the resources of nodes with a given label are allocated to the jobs in the queue with the same label.
The scheduling method proposed by the invention mainly comprises the following modules:
(1) Initial classification of cluster nodes and dynamic node labels
Cluster nodes are first classified according to the CPU and disk I/O performance of each node. Every node in the cluster runs a task of a specified type in isolation, and the time the node takes to run that kind of task is recorded. By comparing the time a node takes to run a single task with the average run time over all nodes in the cluster, nodes are divided into CPU-type nodes, disk-I/O-type nodes, and common nodes.
While the cluster is running, if some of the jobs running on a node overload it, the node's label can be demoted, directly to the common label. Suppose a node's initial label is the CPU label and the node is running CPU-type tasks: although the node still has some unused resources, its CPU performance advantage in this environment is already lost. To avoid this situation, a dynamic label method is adopted: when the NodeManager sends its heartbeat to the ResourceManager, it dynamically measures the CPU and IO utilization of the node machine, and if a utilization exceeds its threshold, the node's label is changed to the common label. The check is performed before every heartbeat, which realizes dynamic node labels. The threshold can be configured in a configuration file; if the user does not configure it, the system default is used.
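The heartbeat-time relabeling check above can be sketched as follows. The function name, the label strings, and the 0.9 default thresholds are illustrative assumptions, not values fixed by the method.

```python
# Sketch of the dynamic relabeling check run before each NodeManager
# heartbeat: an overloaded CPU- or IO-labelled node is demoted to the
# common label ("com") for this heartbeat. Thresholds default to assumed
# system values and would normally come from a configuration file.
DEFAULT_CPU_THRESHOLD = 0.9
DEFAULT_IO_THRESHOLD = 0.9

def dynamic_label(original_label, cpu_util, io_util,
                  cpu_threshold=DEFAULT_CPU_THRESHOLD,
                  io_threshold=DEFAULT_IO_THRESHOLD):
    """Return the label to report in this heartbeat."""
    if original_label == "cpu" and cpu_util > cpu_threshold:
        return "com"   # CPU advantage is lost while overloaded
    if original_label == "io" and io_util > io_threshold:
        return "com"   # IO advantage is lost while overloaded
    return original_label
```

Because the check runs before every heartbeat, a demoted node automatically regains its original label once its utilization drops back below the threshold.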
(2) Collection and return of Map execution information
Hadoop jobs generally consist of a Map phase and a Reduce phase. A large job typically has hundreds of maps or more, and most of a job's time is spent in the Map-phase computation; yet every Map executes the same logic. The run-time information of the first map process of a job can therefore be collected; the NodeManager delivers this information to the scheduler when it sends its heartbeat to the ResourceManager, and the scheduler classifies the job according to the returned information.
In enterprise production environments, jobs with identical logic are run every day, i.e. the user already knows which label a job belongs to. The user can set a job-type label for the job on the command line or in code; the scheduler checks for it when scheduling, and if the user has labeled the job, the job classification step is skipped and the job is scheduled directly.
(3) Multi-priority queues
To meet the demands of different users and prevent small jobs from being starved, a job priority scheme is used. Five new queues are created in the scheduler: the original queue, the wait priority queue, the CPU priority queue, the IO priority queue, and the common priority queue. A job submitted by a user first enters the original queue, where some of its maps are run and their run-time information is collected; the job then enters the wait priority queue, where it waits for the Map information to be returned and for classification; finally, according to its classification label, it enters the queue with the corresponding label.
(4) Job classification
The data must be preprocessed before classification. Data preprocessing refers to processing the data in advance; data preprocessing techniques were developed to improve the quality of data mining. There are several methods: data cleaning, data integration, data transformation, and data reduction. Applying these techniques before mining greatly improves the quality of the mined patterns and reduces the time the actual mining requires. The preprocessing here mainly concerns data normalization: each variable is linearly transformed onto a new scale so that its minimum becomes 0 and its maximum becomes 1, which guarantees that all variable values are at most 1.
For job classification, the naive Bayes classifier was chosen for its simplicity, wide use, and good classification results. If the user has already added the job's type on the command line or in the task code, this step is skipped and the job enters the corresponding queue directly to wait for resources.
(5) Data locality
Hadoop follows the principle that "moving computation is cheaper than moving data": moving the computation to the node that holds the data costs less, and performs better, than moving the data to a compute node. For data locality, the present invention adopts a delay degradation scheduling strategy.
The beneficial effects are:
1. The present invention proposes a dynamic-label-matching scheduling method for heterogeneous cluster environments. Nodes and jobs are classified; job priority is computed by combining the job's own attributes with those of the submitting user; when allocating resources, resources and nodes of the same type are matched; and, considering the relationship between a node's performance and the amount of work it is currently running, node labels are adjusted dynamically by self-detection. The performance of the algorithm is finally analyzed by experiment.
2. For the data locality problem, the present invention proposes a delay degradation algorithm. Degradation is divided into three levels: the current node, a node in the same rack, and a random node. By lowering the locality level only within a bounded delay time, data locality is improved.
3. Using the dynamic label method, the present invention first runs jobs of different types in advance and classifies nodes by comparing each node's run time with the average over all cluster nodes; then, according to the load imposed by the tasks running on each cluster node, the node self-detects its performance and generates a corresponding new label.
4. The present invention proposes classifying jobs: because the Map parts of a MapReduce job all execute the same processing logic, jobs can be classified from the partial information obtained by running part of the job in advance.
Brief description of the drawings
Fig. 1: overall job scheduling framework flow chart;
Fig. 2: scheduling algorithm flow chart;
Fig. 3: comparison of the total run time of three kinds of jobs under different scheduling algorithms;
Fig. 4: Container distribution under DLMS for a 500M data volume;
Fig. 5: Container distribution under DLMS for a 1G data volume;
Fig. 6: Container distribution under DLMS for a 1.5G data volume;
Fig. 7: comparison of job-group run time under different scheduling algorithms;
Embodiment
To make the purpose, technical scheme, and features of the present invention clearer, the invention is further explained below in conjunction with specific embodiments and with reference to the accompanying drawings. The YARN scheduling framework is shown in Fig. 1.
Each step is explained as follows:
(1) The user submits an application program to YARN, including the user program and the command to start the ApplicationMaster.
(2) The ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster.
(3) After registering with the ResourceManager, the ApplicationMaster applies for resources for each task and monitors their running status until the run ends.
(4) Before sending its heartbeat, the NodeManager performs self-detection to generate its dynamic node label, and reports its resources to the ResourceManager.
(5) Tasks are classified into different label queues and sorted by priority while waiting for resource allocation.
(6) The ApplicationMaster applies for and obtains resources from the ResourceManager through the RPC protocol.
(7) According to the node labels and resources reported by the NodeManagers, the scheduler allocates a node's resources to jobs in the queue with the corresponding label.
(8) After the ApplicationMaster has obtained resources, it communicates with the corresponding NodeManager and asks it to start the task.
(9) The NodeManager sets up the running environment for the task (environment variables, JAR packages, binary programs, etc.), writes the task start command into a script, and starts the task by running the script.
(10) Each task reports its state and progress to the ApplicationMaster through an RPC protocol, so that a failed task can be restarted.
(11) After the application program finishes running, the ApplicationMaster deregisters from the ResourceManager and shuts itself down.
First, the physical nodes of the cluster are initially classified; the classification procedure is as follows:
(1) Let the set of cluster machine nodes be N = {N_i | i ∈ [1, n]}, where n is the total number of nodes, i is a positive integer from 1 to n, and N_i denotes the i-th physical machine in the cluster.
(2) A CPU-type, an IO-type, and a common job with identical task amounts are run on every node and the execution times are recorded. T_cpu(i) denotes the time taken to run the CPU job on node N_i; T_io(i) denotes the time taken to run the IO job on node N_i; T_com(i) denotes the time taken to run the common job on node N_i.
(3) The cluster average time for each kind of job is computed as Avg_j = (1/n) · Σ_{i=1..n} T_j(i), where j ∈ {cpu, io, com} denotes the type of job. The difference between each node's time and the average time is then computed for each kind of job: if T_cpu(i) < Avg_cpu, the node is given the original CPU-type label; if T_cpu(i) > Avg_cpu, the node is given the original common label; and likewise for the other job types. Afterwards a node may carry several candidate labels; the label that saves the most time is chosen as the node's final label.
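The initial classification procedure above can be sketched as follows: each node is tagged with every label whose benchmark run time beats the cluster average, and the label with the largest saving wins, defaulting to the common label. All function and variable names are illustrative.

```python
# Minimal sketch of the initial node classification: times maps each
# node to its measured run times for the three benchmark job kinds.
def classify_nodes(times):
    """times: {node: {'cpu': t, 'io': t, 'com': t}} -> {node: label}"""
    kinds = ("cpu", "io", "com")
    n = len(times)
    # cluster average time Avg_j for each job kind j
    avg = {k: sum(t[k] for t in times.values()) / n for k in kinds}
    labels = {}
    for node, t in times.items():
        # candidate labels: kinds where the node beats the cluster average
        savings = {k: avg[k] - t[k] for k in kinds if t[k] < avg[k]}
        # keep the label that saves the most time; otherwise 'com'
        labels[node] = max(savings, key=savings.get) if savings else "com"
    return labels
```

A node that beats the average on both the CPU and the IO benchmark thus still receives only one original label, the one with the larger saving, matching the rule stated above.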
Let the Map run-time information be M; it contains the following information to be collected: M = {MIn, MOut, Rate, Acpu, Mcpu, Zcpu, MRate}, where MIn denotes the map input data amount, MOut the map output data amount, Rate the ratio of input to output data amount, Acpu the average CPU utilization, Mcpu the median CPU utilization, Zcpu the average number of samples in which CPU utilization exceeds 90%, and MRate the memory usage. These data later serve as the characteristic attributes for job classification. During experiments it was found that the average CPU utilization alone does not reflect the features of a job well: CPU-type jobs exceed 90% CPU utilization relatively often, while other kinds of jobs do so relatively rarely, so this count is also added to the information returned by the map.
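The information set M above can be written as a small record type. The dataclass form and the `from_counters` helper are illustrative assumptions; only the field meanings come from the definition of M.

```python
# Feature record returned after a job's first map, mirroring the set
# M = {MIn, MOut, Rate, Acpu, Mcpu, Zcpu, MRate} defined above.
from dataclasses import dataclass

@dataclass
class MapInfo:
    m_in: int      # MIn:  map input data amount
    m_out: int     # MOut: map output data amount
    rate: float    # Rate: input amount / output amount
    a_cpu: float   # Acpu: average CPU utilization
    m_cpu: float   # Mcpu: median CPU utilization
    z_cpu: float   # Zcpu: mean count of samples with CPU usage > 90%
    m_rate: float  # MRate: memory usage

    @staticmethod
    def from_counters(m_in, m_out, a_cpu, m_cpu, z_cpu, m_rate):
        # derive Rate from the raw counters; guard against empty output
        rate = m_in / m_out if m_out else 0.0
        return MapInfo(m_in, m_out, rate, a_cpu, m_cpu, z_cpu, m_rate)
```

One record per job is enough, since every map of a job executes the same logic.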
For queue priority, a user-defined two-layer weighting scheme is adopted. The weight of the job's size attribute is worthNum, where the attribute falls into three classes, num ∈ {long, mid, short}; the weight of the job's owner attribute is worthUser, where the attribute is divided into two grades, user ∈ {root, others}; the weight of the job's urgency is worthEmergence, where the attribute is divided into three grades, priority ∈ {highPriority, midPriority, lowPriority}; and the weight of the job's waiting time is worthWait, where the waiting time is computed as waitTime = nowTime - submitTime. Each attribute is assigned its corresponding weight, the priority number of every task is computed, and the tasks are then sorted within their queues. The four task-attribute weights sum to 100%; the specific formulas are as follows.
worthNum + worthUser + worthEmergence + worthWait = 100%
The final weight is computed as:
finalWorth = worthNum·num + worthUser·user + worthEmergence·priority + worthWait·waitTime
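A minimal sketch of the finalWorth formula, assuming the attribute grades (num, user, priority) have already been mapped to numeric scores and using example weight values; the patent leaves both the grade encodings and the weights user-configurable.

```python
# Two-layer weighted priority: the four weights must sum to 100%.
def final_worth(num, user, priority, wait_time,
                worth_num=0.2, worth_user=0.2,
                worth_emergence=0.3, worth_wait=0.3):
    """Compute a job's priority number from its four attribute scores."""
    total = worth_num + worth_user + worth_emergence + worth_wait
    assert abs(total - 1.0) < 1e-9, "attribute weights must sum to 100%"
    return (worth_num * num + worth_user * user
            + worth_emergence * priority + worth_wait * wait_time)
```

Jobs within each label queue would then be sorted by this value in descending order.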
For job classification, the naive Bayes classifier is used; the specific classification steps are as follows:
(1) For a job with attribute values V1, V2, …, Vn, compute the conditional probability that the job is a CPU-type, IO-type, or common job:
P(job = lab_cpu | V1, V2, …, Vn)
P(job = lab_io | V1, V2, …, Vn)
P(job = lab_com | V1, V2, …, Vn)
where job ∈ {cpu, io, com} denotes the job classification label and the Vi are the attribute features of the job.
(2) By the Bayes formula P(B | A) = P(AB)/P(A):
P(job | V1, V2, …, Vn) = P(V1, V2, …, Vn | job) · P(job) / P(V1, V2, …, Vn)
Assuming the Vi are mutually independent, the independence assumption gives
P(V1, V2, …, Vn | job) = P(V1 | job) · P(V2 | job) · … · P(Vn | job)
(3) In the actual computation, P(V1, V2, …, Vn) is the same for every job class and can be neglected, so finally
P(job = lab_cpu | V1, V2, …, Vn) ∝ P(job = lab_cpu) · Π_{i=1..n} P(Vi | job = lab_cpu)
and similarly for lab_io and lab_com.
The job is judged to be a CPU-type, IO-type, or common job according to which probability value is largest.
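The derivation above reduces to scoring each label by prior × product of per-feature likelihoods and taking the argmax. The sketch below assumes the likelihood tables are already estimated (a real implementation would fit them from the returned Map statistics); the tiny fallback probability for unseen values is an illustrative smoothing choice.

```python
# Naive Bayes job classification: score(c) = P(c) * prod_i P(Vi | c),
# with the shared denominator P(V1..Vn) dropped as in step (3) above.
def classify_job(features, priors, likelihoods):
    """features: {name: value}; priors: {label: P(label)};
    likelihoods: {label: {name: {value: P(value | label)}}}."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for name, value in features.items():
            # unseen feature values get a small smoothing probability
            score *= likelihoods[label][name].get(value, 1e-9)
        scores[label] = score
    # the label with the largest posterior score wins
    return max(scores, key=scores.get)
```

With discretized features (e.g. Zcpu bucketed into "high"/"low"), this directly reproduces the decision rule stated above.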
For locality, a delay degradation scheduling strategy is adopted here. The concrete mechanism of the strategy is as follows:
A delay-time attribute is added to each job. Let T_i be the current delay count of the i-th job, i ∈ [1, n], where n is the number of nodes in the cluster; T_local denotes the local-node delay threshold and T_rack the rack-node delay threshold. When the scheduler allocates a resource to a job whose execution node and data input node are not the same node, T_i is incremented by 1, indicating that the job's scheduling has been delayed once, and the resource is allocated to another suitable job instead. Once T_i > T_local, the job's required locality is lowered to rack locality, and any node in the same rack may now allocate resources to the job; once T_i > T_rack, the required locality is lowered to a random node. T_local and T_rack are configured by the user in a configuration file according to the cluster situation. With this delay scheduling strategy, good locality can be obtained within a bounded delay.
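The delay degradation rule above can be sketched as follows: each declined non-local offer increments the job's delay counter, and the required locality level relaxes from node-local to rack-local to any node as the counter crosses T_local and T_rack. Function names and the example threshold values are illustrative assumptions.

```python
# Delay degradation: map a job's delay counter to its locality level.
def locality_level(delay_count, t_local, t_rack):
    if delay_count <= t_local:
        return "node-local"
    if delay_count <= t_rack:
        return "rack-local"
    return "any"

def offer(job, node_has_data, node_in_rack, t_local=3, t_rack=6):
    """Return True if the job should accept this node's resources;
    otherwise record one more delayed scheduling round."""
    level = locality_level(job["delay"], t_local, t_rack)
    if level == "node-local" and not node_has_data:
        job["delay"] += 1   # keep waiting for a data-local node
        return False
    if level == "rack-local" and not (node_has_data or node_in_rack):
        job["delay"] += 1   # keep waiting for a same-rack node
        return False
    return True
```

In a real deployment t_local and t_rack would be read from the configuration file rather than defaulted.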
The basic idea of the DLMS scheduling method is to run part of a job in advance, classify the job according to the information it returns, and then allocate the resources of labeled nodes to the tasks in the queue with the corresponding label. The basic procedure is:
Step 1: When a node reports its resources to the resource manager by heartbeat, if the original queue is not empty, the jobs in the original queue are traversed; a job whose type label was specified on the command line or in the program is placed into the corresponding label priority queue, and the original queue removes that job.
Step 2: If the original queue is not empty, the resources of this node are allocated to the original queue; the job enters the wait queue to wait for the next round of resource allocation, the original queue removes that job, and this round of allocation ends.
Step 3: If the wait priority queue is not empty, the jobs in the wait priority queue are classified and sorted into the corresponding label priority queues.
Step 4: If the job classification queue corresponding to the node's performance label is not empty, the resources of this node are allocated to that queue, and this round of allocation ends.
Step 5: A resource-visit counter is set and checked; if it exceeds the number of nodes in the cluster, the node's resources are allocated to the queues in the priority order CPU, IO, common, wait, and this round of scheduling ends. This step prevents situations such as the following: the CPU queue holds too many jobs and exhausts the CPU-type node resources, while nodes with other labels still have resources but the jobs cannot be allocated any.
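Under stated assumptions (in-memory queue lists and a per-label visit counter as the starvation guard), the five steps above can be sketched as one scheduling round; all names are illustrative, and step 3's classification is abbreviated to a pre-assigned label.

```python
# One DLMS scheduling round for a single node heartbeat.
FALLBACK_ORDER = ["cpu", "io", "com", "wait"]  # step-5 priority order

def schedule_round(node_label, queues, visit_counts, cluster_size):
    """queues: {'original': [...], 'wait': [...], 'cpu': [...],
    'io': [...], 'com': [...]}; returns (queue_name, job) served."""
    # step 1: jobs with a user-supplied label skip classification
    for job in list(queues["original"]):
        if job.get("label"):
            queues["original"].remove(job)
            queues[job["label"]].append(job)
    # step 2: an unclassified job gets this node to run its sample maps
    if queues["original"]:
        job = queues["original"].pop(0)
        queues["wait"].append(job)          # awaits Map info and labeling
        return ("original", job)
    # steps 3-4: serve the queue matching this node's label
    if queues[node_label]:
        return (node_label, queues[node_label].pop(0))
    # step 5: starvation guard - after too many empty visits,
    # fall back to any non-empty queue in the fixed priority order
    visit_counts[node_label] = visit_counts.get(node_label, 0) + 1
    if visit_counts[node_label] > cluster_size:
        for name in FALLBACK_ORDER:
            if queues[name]:
                return (name, queues[name].pop(0))
    return (None, None)
```

The fallback in step 5 is what keeps, e.g., an overfull CPU queue from starving while IO- or common-labeled nodes sit idle.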
The flow chart of the algorithm is shown in Fig. 2.
Experimental environment
This section verifies the actual effect of the DLMS scheduler proposed herein by experiment. The experimental environment is a fully distributed Hadoop cluster built from 5 PCs; the uniform node configuration is operating system Ubuntu 12.04.1, JDK 1.6, Hadoop 2.5.1, 2G of memory, and a 50G hard disk. The NameNode has 2 CPU cores; dataNode1 has 2 CPU cores, dataNode2 has 4, dataNode3 has 2, and dataNode4 has 4.
Experimental results and explanation
First, a wordCount job (IO-type) and a kmeans job (CPU-type), each with a data volume of 128M, were prepared and run 6 times on each of the 4 nodes, and the job run times were recorded. In Table 1, s denotes the unit seconds, avg denotes the average time the node takes to run the task of the corresponding label, and allAvg denotes the total average time over all nodes for the task of the corresponding label; rate is computed as rate = (avg - allAvg) / allAvg.
A negative sign indicates that the node's average time is lower than the total average time; a positive sign indicates that it is higher.
As can be seen from Table 1, dataNode1 saves time on both tasks; we take the job type with the largest saving, CPU, as the machine's original label. dataNode2 receives the IO label, and dataNode3 and dataNode4 are common machines.
Table 1: initial classification experiment
Experimental results and analysis
Several jobs whose types are easy to distinguish were used. WordCount needs to read large amounts of data and write intermediate data in its Map phase, and neither its Map phase nor its Reduce phase performs arithmetic computation, so this kind of job is characterized as IO-type. Kmeans computes large numbers of point-to-point distances in both the Map and Reduce phases and writes little intermediate data, so this kind of job is characterized as CPU-type. TopK neither writes large amounts of data to disk in the Reduce phase nor performs heavy computation, involving only simple comparisons, so it is considered an intermediate (common) task.
Verification used two groups of experiments. In the first group, the scheduler was set to FIFO, and WordCount, Kmeans, and Topk jobs were each run 3 times at data volumes of 500M, 1G, and 1.5G, with the average of the 3 runs recorded as each job's final time; the scheduler was then switched to the Capacity and DLMS schedulers and the same experimental procedure repeated. The experiment recorded the distribution over the cluster of the Containers of every kind of job under the DLMS scheduler. A Container is the unit in which cluster resources are divided, so this records how each job's splits were distributed across the cluster; in YARN, each Map and Reduce process is represented by one Container, and the proportion of Containers allocated to a node shows the share of the job's task amount that the node executed. In Fig. 3, the abscissa is the job data volume and the ordinate is the total time for running the 3 kinds of jobs (WordCount, Kmeans, Topk) together. As the data volume increases, the DLMS scheduler saves about 10%-20% of the time compared with the other schedulers.
This is because DLMS allocates the resources of nodes with a given label to jobs with the corresponding label. The maps and reduces of a job run on nodes in the form of Containers, and Figs. 4 to 6 show the Container counts for jobs of different data volumes under the DLMS scheduler. According to the initial classification of the previous section, Node1 is the CPU-type label node, Node2 and Node3 are common label nodes, and Node4 is the IO label node; WordCount is an IO-type job, Topk a common job, and Kmeans a CPU-type job. The figures show the distribution pattern: relatively many of the WordCount job's Containers are allocated on Node4, relatively many of Topk's on the common nodes Node2 and Node3, and relatively many of Kmeans's on Node1. These distributions of the different jobs' Containers over the cluster nodes show that the DLMS scheduler raises the probability that the resources of a labeled node are allocated to jobs with the corresponding label.
In the second group of experiments, 5 jobs were prepared as one job group: WordCount jobs with 128M and 500M data volumes, Kmeans jobs with 128M and 500M data volumes, and a Topk job with a 500M data volume. The 5 jobs were submitted simultaneously, simulating continuous operation in clusters with different schedulers, and the total time for the job group to finish was recorded; the job group was run 3 times under each scheduler. The concrete results are shown in Fig. 7: the DLMS scheduler proposed herein clearly saves time compared with Hadoop's built-in schedulers running the same job group, saving about 20% of the time compared with Hadoop's built-in FIFO scheduler and about 10% of the run time compared with the Capacity scheduler.

Claims (2)

  1. Dynamic labels match DLMS dispatching methods under 1.Hadoop platforms, it is characterised in that:
    Clustered node original classification and its dynamic cataloging label;
    Clustered node is classified firstly the need of preliminary classification is carried out according to the CPU of node and disk I/O performance;It is each in cluster Node is required for the task of one specified type of isolated operation and records the time that the node runs such operation, according to node Node is divided into CPU type sections by the magnitude relationship of all node run time average values in the time of operation individual task and cluster Point, disk I/O type node, plain edition node;
    , can be to this node if a node operation Partial Jobs cause load excessive during clustered node is run Label carry out degradation processing, be directly downgraded to ordinary node;One node initial labels is operation in CPU type labels, node CPU type tasks, although this node also has part resource to be not used, but now environment interior joint cpu performance advantage has lost, To avoid such case from occurring, dynamic labels method is taken, heartbeat is sent to ResourceManager in NodeManager When the dynamic detection node machine CPU and IO utilization rates, if it exceeds the threshold, this node label just is sticked into common mark Label, are required for being detected once, are achieved in node dynamic labels when sending heartbeat every time;This threshold value can be in configuration file In voluntarily configure, if do not configure can reference system default value by user;
    (1) acquisition and passback of Map execution informations
    Hadoop jobs generally consist of a Map stage and a Reduce stage, and a large job commonly has up to hundreds of map tasks, so most of a job's time is spent in the Map-stage computation; every Map task, however, executes the same logic, so the runtime information of the first map task of a job can be collected. The NodeManager delivers this information to the scheduler when sending its heartbeat to the ResourceManager, and the scheduler classifies the job according to the information returned.
    In an enterprise production environment, jobs with the same content and logic are run every day, which means the user knows which label a job should carry. The user can set a job-type label for the job on the command line or in code; the scheduler checks for this label during scheduling, and if the user has labelled the job, the classification step is skipped and the job is scheduled directly.
    (2) Multi-priority queues
    To meet the demands of different users and prevent small jobs from "starving", a job-priority scheme is adopted. Five queues are created in the scheduler: the original queue, the waiting priority queue, the CPU priority queue, the IO priority queue and the ordinary priority queue. A job submitted by a user first enters the original queue, where part of its map tasks are run and the runtime information of those map tasks is collected; the job then enters the waiting priority queue to wait for the Map runtime information to be returned and the job to be classified; finally, according to its classification label, the job enters the queue with the corresponding label.
    (3) Job classification
    Data must be pre-processed before classification. Data preprocessing refers to processing performed on the data at an early stage; data-preprocessing techniques were developed to improve the quality of data mining. They comprise several methods: data cleaning, data integration, data transformation and data reduction. Applying these techniques before data mining greatly improves the quality of the mined patterns and reduces the time required for the actual mining. Here preprocessing mainly concerns data normalisation: every variable is linearly transformed onto a new scale such that, after the transformation, the minimum value of the variable is 0 and the maximum value is 1, which guarantees that all variable values are less than or equal to 1.
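The normalisation described above (minimum mapped to 0, maximum mapped to 1) is ordinary min-max scaling. A minimal sketch, in which the function name and the handling of a constant-valued variable are assumptions:

```python
def min_max_normalize(values):
    """Linearly transform a sequence of numbers onto a new scale so
    that the minimum becomes 0 and the maximum becomes 1; every
    transformed value is therefore <= 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant variable: map all to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```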
    For job classification, the Naive Bayes classifier was selected because it is simple, widely applicable and gives good classification results. If the user has already added the job's type label on the command line or in the task code, this step can be skipped and the job enters the corresponding queue directly to wait for resource allocation.
    (4) Data locality
    Hadoop follows the principle that "moving computation is cheaper than moving data": moving the computation to the node where the data is placed costs less and performs better than moving the data to a compute node. For data locality, this invention adopts a delay degradation scheduling strategy.
  2. The dynamic label matching DLMS scheduling method under the Hadoop platform according to claim 1, characterised in that:
    (1) The user submits an application program to YARN, including the user program and the command to start the ApplicationMaster;
    (2) The ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, requesting it to start the application's ApplicationMaster;
    (3) After registering with the ResourceManager, the ApplicationMaster applies for resources for each task and monitors their running state until the run ends;
    (4) Before sending a heartbeat, the NodeManager performs a self-detection that generates its dynamic node label, and reports its resources to the ResourceManager;
    (5) Tasks are classified into the different label queues and sorted by priority while waiting for resource allocation;
    (6) The ApplicationMaster applies for and obtains resources from the ResourceManager through the RPC protocol;
    (7) According to the node labels and resources reported by the NodeManager, the scheduler allocates the resources of a node to the jobs of the queue with the corresponding label;
    (8) After obtaining resources, the ApplicationMaster communicates with the corresponding NodeManager, requesting it to start the tasks;
    (9) The NodeManager sets up the running environment for the task (environment variables, JAR packages, binary programs, etc.), writes the task start command into a script, and starts the task by running that script;
    (10) Each task reports its state and progress to the ApplicationMaster through an RPC protocol, so that a failed task can be restarted;
    (11) After the application program finishes running, the ApplicationMaster deregisters from the ResourceManager and shuts itself down;
    The cluster's physical nodes are first given a preliminary classification; the classification procedure is as follows:
    (1) Let the set of cluster nodes be N = {N_i | i ∈ [1, n]}, where n is the total number of nodes, i is a positive integer from 1 to n, and N_i denotes the i-th physical machine in the cluster;
    (2) CPU-type, IO-type and ordinary jobs of identical task size are executed on every node and the execution times are recorded; T_cpu(i) denotes the time spent executing the CPU job on node N_i, T_io(i) the time spent executing the IO job on node N_i, and T_com(i) the time spent executing the ordinary job on node N_i;
    (3) The cluster average time of each kind of job is computed; the formula is Avg_j = (1/n) · Σ_{i=1}^{n} T_j(i), where j ∈ {cpu, io, com} denotes the job type. For each node, the difference between its time and the cluster average under each job type is computed: if T_cpu(i) < Avg_cpu, the node is given the CPU-type original label; if T_cpu(i) > Avg_cpu, the node is given the ordinary original label; the IO type is handled analogously. A node may thereby end up with several labels, in which case the label with the greatest time saving is selected as the node's final label;
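Under the stated definitions, the preliminary labelling of nodes can be sketched as below. The function and label names are illustrative assumptions; the tie-break among multiple candidate labels follows the "greatest time saving" rule from the text, and the ordinary-job times are omitted since they do not influence the label here.

```python
def classify_nodes(t_cpu, t_io):
    """Assign each node an original label from its benchmark times.

    t_cpu[i] and t_io[i] are node i's execution times for the
    CPU-type and IO-type benchmark job.  A node beating the cluster
    average for a specialised job type is a candidate for that
    label; among several candidates the label with the greatest time
    saving wins, and a node beating no average stays "com".
    """
    n = len(t_cpu)
    avg = {"cpu": sum(t_cpu) / n, "io": sum(t_io) / n}
    times = {"cpu": t_cpu, "io": t_io}
    labels = []
    for i in range(n):
        savings = {lab: avg[lab] - times[lab][i]
                   for lab in ("cpu", "io") if times[lab][i] < avg[lab]}
        labels.append(max(savings, key=savings.get) if savings else "com")
    return labels
```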
    Let the Map runtime information be M; it contains the following items to be collected: M = {MIn, MOut, Rate, Acpu, Mcpu, Zcpu, MRate}, where MIn denotes the map input data volume, MOut the map output data volume, Rate the ratio of input data volume to output data volume, Acpu the average CPU utilisation, Mcpu the median CPU utilisation, Zcpu the mean of the CPU utilisation samples above 90%, and MRate the memory usage; these data serve as the feature attributes for the subsequent job classification;
    For queue priority, a user-defined two-layer weight design is adopted. The weight of the job-size attribute is worthNum, the attribute falling into three classes num ∈ {long, mid, short}; the weight of the job-owner attribute is worthUser, the attribute being divided into two grades user ∈ {root, others}; the weight of the job-urgency attribute is worthEmergence, the attribute being divided into three grades priority ∈ {highPriority, midPriority, lowPriority}; the weight of the job waiting time is worthWait, the waiting time being computed as waitTime = nowTime − submitTime. Each attribute is assigned its corresponding weight, the priority number of each task is computed, and the jobs are then sorted within their queue. The four task-attribute weights sum to 100%; the specific formula is as follows:
    worthNum + worthUser + worthEmergence + worthWait = 100%;
    The final weight is computed by the formula:
    finalWorth = worthNum · num + worthUser · user + worthEmergence · priority + worthWait · waitTime
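The finalWorth computation can be sketched as follows. The attribute values are assumed to have already been mapped to numeric scores, and the default weights (which must sum to 100%) are illustrative, since the design leaves them user-defined.

```python
def final_worth(num, user, priority, wait_time,
                w_num=0.3, w_user=0.2, w_emergence=0.3, w_wait=0.2):
    """Weighted sum of the four job attributes: size, owner, urgency
    and waiting time.  The four weights must sum to 1, mirroring
    worthNum + worthUser + worthEmergence + worthWait = 100%.
    """
    if abs(w_num + w_user + w_emergence + w_wait - 1.0) > 1e-9:
        raise ValueError("attribute weights must sum to 100%")
    return (w_num * num + w_user * user
            + w_emergence * priority + w_wait * wait_time)
```

Jobs within a label queue would then be sorted by this score in descending order.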
    For job classification, the Naive Bayes classifier is used; the specific classification steps are as follows:
    (1) Compute, for a job with given feature values, the conditional probability that it is a CPU-type, IO-type or ordinary job:
    P(job = lab_cpu | V_1, V_2, …, V_n)
    P(job = lab_io | V_1, V_2, …, V_n)
    P(job = lab_com | V_1, V_2, …, V_n)
    where job ∈ {cpu, io, com} denotes the job class label and the V_i are the feature attributes of the job;
    (2) By Bayes' formula P(B|A) = P(AB)/P(A):
    P(job | V_1, V_2, …, V_n) = P(V_1, V_2, …, V_n | job) · P(job) / P(V_1, V_2, …, V_n);
    Assuming the V_i are mutually independent, the independence assumption gives
    P(V_1, V_2, …, V_n | job) = Π_{i=1}^{n} P(V_i | job);
    (3) In the actual computation, P(V_1, V_2, …, V_n) is the same for every class and can be ignored, so finally
    P(job = lab_cpu | V_1, …, V_n) ∝ P(job = lab_cpu) · Π_{i=1}^{n} P(V_i | job = lab_cpu);
    and similarly for lab_io and lab_com;
    The job is classified as a CPU-type, IO-type or ordinary job according to whichever of the three probability values is largest;
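Steps (1)–(3) can be sketched with a tiny discrete Naive Bayes classifier. The Laplace smoothing and the training-data layout are assumptions added to make the sketch runnable, not details taken from the method.

```python
def naive_bayes_classify(sample, training):
    """Return the label (e.g. "cpu", "io", "com") maximising
    P(job = label) * prod_i P(V_i = v_i | job = label), with the
    features V_i treated as conditionally independent given the label.

    `training` maps each label to a list of feature tuples; the
    features are assumed to be already discretised.  The evidence
    term P(V_1, ..., V_n) is omitted, as it is the same for every
    label and does not change the argmax.
    """
    total = sum(len(rows) for rows in training.values())
    best_label, best_score = None, -1.0
    for label, rows in training.items():
        score = len(rows) / total                   # prior P(label)
        for i, v in enumerate(sample):
            match = sum(1 for r in rows if r[i] == v)
            score *= (match + 1) / (len(rows) + 2)  # Laplace smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```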
    For locality, a delay degradation scheduling strategy is adopted; the concrete idea of the strategy is as follows:
    A delay-time attribute is added to each job. Let T_i be the current delay count of the i-th job, i ∈ [1, n], where n is the number of nodes in the cluster; let T_local denote the local-node delay threshold and T_rack the rack-node delay threshold. When the scheduler allocates a resource to a job and the job's execution node is not the node holding its input data, T_i is incremented by 1, indicating that the job has been delayed once, and the resource is allocated to another suitable job. When T_i > T_local, the job's locality requirement is relaxed to rack locality, so that any node in the same rack may be allocated to the job; when T_i > T_rack, the locality requirement is relaxed to any node. T_local and T_rack are configured by the user in a configuration file according to the cluster situation. With this delay strategy, good locality can be obtained within a bounded delay time;
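The delay degradation strategy can be sketched as a per-offer decision. The function shape and the numeric defaults (standing in for the user-configured T_local and T_rack) are assumptions.

```python
def delay_schedule(delay_count, data_local, rack_local,
                   t_local=5, t_rack=10):
    """Decide whether a job may take an offered node's resource.

    The job first insists on data-local nodes; once its delay count
    T_i exceeds t_local it also accepts rack-local nodes, and once
    it exceeds t_rack it accepts any node.  Returns the pair
    (accept, new_delay_count); a skipped offer increments the count.
    """
    if data_local:
        return True, 0                    # locality achieved
    if delay_count > t_rack:
        return True, delay_count          # degraded to any node
    if delay_count > t_local and rack_local:
        return True, delay_count          # degraded to rack locality
    return False, delay_count + 1         # delay once more
```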
    The basic idea of the DLMS scheduling method is to run part of a job in advance, classify the job according to the information it returns, and then allocate the resources of nodes whose label matches the job's queue to the job's tasks. The basic procedure:
    Step 1: when a node reports its resources to the ResourceManager through a heartbeat, if the original queue is not empty, the jobs in the original queue are traversed; any job whose type label was specified on the command line or in the program is moved into the corresponding label priority queue and removed from the original queue;
    Step 2: if the original queue is still not empty, the resources of this node are allocated to the original queue; the job enters the waiting queue to await the next resource allocation, the original queue removes this job, and this round of allocation ends;
    Step 3: if the waiting priority queue is not empty, the jobs in the waiting priority queue are sorted into the corresponding label priority queues;
    Step 4: if the job queue corresponding to the node's performance label is not empty, the resources of this node are allocated to that queue, and this round of allocation ends;
    Step 5: a resource-visit counter is maintained and checked; if it exceeds the number of nodes in the cluster, the resources of the node are allocated to the queues in the priority order CPU, IO, ordinary, waiting, and this round of scheduling ends. This step prevents situations such as the following: the CPU queue holds too many jobs, so the CPU-type node resources are exhausted while nodes with other labels still have resources, yet the jobs cannot be allocated any resource.
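One allocation round from Steps 1–5 can be sketched as below. The queue layout is assumed, and the starvation guard is simplified to a fixed fallback order rather than the full resource-visit counter described above.

```python
def assign_node_resource(node_label, queues):
    """One simplified DLMS allocation round for a node heartbeat.

    `queues` maps queue names ("original", "wait", "cpu", "io",
    "com") to lists of job ids.  Jobs in the original queue get the
    resource first so their probe map tasks can run (the job then
    moves to the waiting queue); otherwise the node serves the queue
    matching its own label, falling back to cpu -> io -> com so that
    no queue with pending jobs is starved while resources exist.
    """
    if queues["original"]:
        job = queues["original"].pop(0)
        queues["wait"].append(job)        # probe maps run, job waits
        return job
    if node_label in queues and queues[node_label]:
        return queues[node_label].pop(0)
    for name in ("cpu", "io", "com"):     # starvation guard
        if queues[name]:
            return queues[name].pop(0)
    return None
```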
CN201710181055.0A 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform Expired - Fee Related CN107038069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710181055.0A CN107038069B (en) 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform

Publications (2)

Publication Number Publication Date
CN107038069A true CN107038069A (en) 2017-08-11
CN107038069B CN107038069B (en) 2020-05-08

Family

ID=59534217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710181055.0A Expired - Fee Related CN107038069B (en) 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform

Country Status (1)

Country Link
CN (1) CN107038069B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013015942A1 (en) * 2011-07-28 2013-01-31 Yahoo! Inc. Method and system for distributed application stack deployment
CN104915407A (en) * 2015-06-03 2015-09-16 华中科技大学 Resource scheduling method under Hadoop-based multi-job environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐鹏 (Xu Peng): "Research on Optimization of Job Scheduling Algorithms for Cloud Computing Platforms", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766150A (en) * 2017-09-20 2018-03-06 电子科技大学 A kind of job scheduling algorithm based on hadoop
CN108052443A (en) * 2017-10-30 2018-05-18 北京奇虎科技有限公司 A kind of test assignment dispatching method, device, server and storage medium
CN107832153B (en) * 2017-11-14 2020-12-29 北京科技大学 Hadoop cluster resource self-adaptive allocation method
CN107832153A (en) * 2017-11-14 2018-03-23 北京科技大学 A kind of Hadoop cluster resources self-adapting distribution method
CN107832134A (en) * 2017-11-24 2018-03-23 平安科技(深圳)有限公司 multi-task processing method, application server and storage medium
CN107832134B (en) * 2017-11-24 2021-07-20 平安科技(深圳)有限公司 Multitasking method, application server and storage medium
CN108509280A (en) * 2018-04-23 2018-09-07 南京大学 A kind of Distributed Calculation cluster locality dispatching method based on push model
CN108509280B (en) * 2018-04-23 2022-05-31 南京大学 Distributed computing cluster locality scheduling method based on push model
CN110532085B (en) * 2018-05-23 2022-11-04 阿里巴巴集团控股有限公司 Scheduling method and scheduling server
CN110532085A (en) * 2018-05-23 2019-12-03 阿里巴巴集团控股有限公司 A kind of dispatching method and dispatch server
CN108959580A (en) * 2018-07-06 2018-12-07 深圳市彬讯科技有限公司 A kind of optimization method and system of label data
WO2020034646A1 (en) * 2018-08-17 2020-02-20 华为技术有限公司 Resource scheduling method and device
WO2020119117A1 (en) * 2018-12-14 2020-06-18 平安医疗健康管理股份有限公司 Distributed computing method, apparatus and system, device and readable storage medium
CN111930493A (en) * 2019-05-13 2020-11-13 中国移动通信集团湖北有限公司 NodeManager state management method and device in cluster and computing equipment
CN111930493B (en) * 2019-05-13 2023-08-01 中国移动通信集团湖北有限公司 NodeManager state management method and device in cluster and computing equipment
CN110278257A (en) * 2019-06-13 2019-09-24 中信银行股份有限公司 A kind of method of mobilism configuration distributed type assemblies node label
CN111124765A (en) * 2019-12-06 2020-05-08 中盈优创资讯科技有限公司 Big data cluster task scheduling method and system based on node labels
CN112039709A (en) * 2020-09-02 2020-12-04 北京首都在线科技股份有限公司 Resource scheduling method, device, equipment and computer readable storage medium
CN112039709B (en) * 2020-09-02 2022-01-25 北京首都在线科技股份有限公司 Resource scheduling method, device, equipment and computer readable storage medium
CN112445925A (en) * 2020-11-24 2021-03-05 浙江大华技术股份有限公司 Clustering archiving method, device, equipment and computer storage medium
CN112445925B (en) * 2020-11-24 2022-08-26 浙江大华技术股份有限公司 Clustering archiving method, device, equipment and computer storage medium
CN113590294A (en) * 2021-07-30 2021-11-02 北京睿芯高通量科技有限公司 Self-adaptive and rule-guided distributed scheduling method
CN113590294B (en) * 2021-07-30 2023-11-17 北京睿芯高通量科技有限公司 Self-adaptive and rule-guided distributed scheduling method
WO2023051233A1 (en) * 2021-09-30 2023-04-06 华为技术有限公司 Task scheduling method, device, apparatus and medium
WO2023056618A1 (en) * 2021-10-09 2023-04-13 国云科技股份有限公司 Cross-cloud platform resource scheduling method and apparatus, terminal device, and storage medium
CN114064294B (en) * 2021-11-29 2022-10-04 郑州轻工业大学 Dynamic resource allocation method and system in mobile edge computing environment
CN114064294A (en) * 2021-11-29 2022-02-18 郑州轻工业大学 Dynamic resource allocation method and system in mobile edge computing environment
CN114840343A (en) * 2022-05-16 2022-08-02 江苏安超云软件有限公司 Task scheduling method and system based on distributed system
CN117056061A (en) * 2023-10-13 2023-11-14 浙江远算科技有限公司 Cross-supercomputer task scheduling method and system based on container distribution mechanism
CN117056061B (en) * 2023-10-13 2024-01-09 浙江远算科技有限公司 Cross-supercomputer task scheduling method and system based on container distribution mechanism

Also Published As

Publication number Publication date
CN107038069B (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN107038069A (en) Dynamic labels match DLMS dispatching methods under Hadoop platform
US9542223B2 (en) Scheduling jobs in a cluster by constructing multiple subclusters based on entry and exit rules
CN104298550B (en) A kind of dynamic dispatching method towards Hadoop
CN110166282A (en) Resource allocation methods, device, computer equipment and storage medium
CN104317658A (en) MapReduce based load self-adaptive task scheduling method
US8375228B2 (en) Multiple-node system power utilization management
CN109408229A (en) A kind of dispatching method and device
CN104881322A (en) Method and device for dispatching cluster resource based on packing model
CN116560860B (en) Real-time optimization adjustment method for resource priority based on machine learning
CN112068959A (en) Self-adaptive task scheduling method and system and retrieval method comprising method
US8180823B2 (en) Method of routing messages to multiple consumers
CN111144701B (en) ETL job scheduling resource classification evaluation method under distributed environment
CN113127176A (en) Multi-role task allocation method and system for working platform
CN103268261A (en) Hierarchical computing resource management method suitable for large-scale high-performance computer
CN115665157B (en) Balanced scheduling method and system based on application resource types
Garg et al. Optimal virtual machine scheduling in virtualized cloud environment using VIKOR method
CN110084507A (en) The scientific workflow method for optimizing scheduling of perception is classified under cloud computing environment
CN116755872A (en) TOPSIS-based containerized streaming media service dynamic loading system and method
CN115391047A (en) Resource scheduling method and device
Thamsen et al. Hugo: a cluster scheduler that efficiently learns to select complementary data-parallel jobs
CN113553353A (en) Scheduling system for distributed data mining workflow
CN110427217B (en) Content-based publish-subscribe system matching algorithm lightweight parallel method and system
Seethalakshmi et al. Job scheduling in big data-a survey
CN112579324A (en) Commodity summary statistical method based on cost model
CN110532071A (en) A kind of more application schedules system and method based on GPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200508