CN107038069A - DLMS scheduling method with dynamic label matching under the Hadoop platform - Google Patents


Info

Publication number
CN107038069A
CN107038069A (application CN201710181055.0A / CN201710181055A)
Authority
CN
China
Prior art keywords
node
label
cpu
data
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710181055.0A
Other languages
Chinese (zh)
Other versions
CN107038069B (en)
Inventor
毛韦
竹翠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201710181055.0A priority Critical patent/CN107038069B/en
Publication of CN107038069A publication Critical patent/CN107038069A/en
Application granted granted Critical
Publication of CN107038069B publication Critical patent/CN107038069B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a DLMS scheduling method with dynamic label matching under the Hadoop platform, belonging to the field of computer software. To address the problems of large performance differences between Hadoop cluster nodes, random resource allocation, and excessive execution times, the invention proposes a scheduler that dynamically matches node performance labels (hereinafter "node labels") against job classification labels (hereinafter "job labels"). Nodes are first classified and assigned original node labels; each node then measures its own performance indicators to generate a dynamic node label. Jobs are classified from partial job information to generate job labels, and the resource scheduler allocates node resources to jobs carrying the corresponding label. Test results show that, relative to the schedulers shipped with YARN, the job execution time is considerably shortened.

Description

DLMS scheduling method with dynamic label matching under the Hadoop platform
Technical field
The invention belongs to the field of computer software and relates to the design and implementation of a DLMS scheduling method based on dynamic label matching under the Hadoop platform.
Background art
In early Hadoop versions, resource scheduling management and the MapReduce framework were integrated in a single module, so the code was poorly decoupled, could not be extended well, and did not support multiple computing frameworks. The Hadoop open-source community designed and implemented a new-generation Hadoop system with a completely new architecture, Hadoop 2.0, which extracts resource scheduling into a new resource scheduling framework: the new-generation Hadoop system YARN. It is well known that, in a given environment, a suitable scheduling algorithm can satisfy user job requests while effectively improving the overall performance of the Hadoop job platform and the utilization of system resources. YARN ships with three schedulers by default: the FIFO scheduler, the Fair Scheduler, and the Capacity Scheduler. Hadoop uses the FIFO scheduler by default; its first-in-first-out strategy is simple and easy to implement, but it penalizes short jobs and supports neither shared clusters nor multi-user management. The fair scheduling algorithm proposed by Facebook considers the differing resource demands of different users and jobs and lets users share cluster resources fairly, but its job resource configuration strategy is inflexible, easily wastes resources, and does not support job preemption. The capacity scheduling algorithm proposed by Yahoo supports multiple queues shared by multiple users and configures computing capacity flexibly, but it does not support job preemption and easily falls into local optima.
In real enterprise production, as the volume of enterprise data grows, new nodes are added to the cluster every year, so the performance of cluster nodes differs significantly; such heterogeneous clusters are very common in enterprise production environments. If a computation-heavy machine-learning task is assigned to a node with poor CPU performance, the overall execution time of the job clearly suffers. The three resource schedulers shipped with Hadoop do not solve this problem well. The present invention proposes DLMS, a resource scheduling method that dynamically matches node performance labels against job classification labels: machines with comparatively good CPU performance are given a CPU label, machines with comparatively good disk I/O performance are given an IO label, and machines that are average at both are given a common label. According to its classification, a job likewise receives a CPU, IO, or common label and enters the corresponding label queue; the scheduler then allocates, as far as possible, the resources of nodes with a given label to jobs with the same label. This reduces job run time, improves resource utilization, and raises overall system efficiency.
Summary of the invention
The scheduling method proposed by the present invention first classifies cluster nodes and assigns them corresponding labels. Before sending a heartbeat, each NodeManager performs self-detection and dynamically adjusts its original label. A machine-learning classification algorithm classifies jobs and assigns them corresponding labels; job order is determined dynamically from attributes such as the user-set job priority and the job waiting time; and the resources of nodes with a given label are allocated to the jobs in the queue with the same label.
The scheduling method proposed by the invention mainly comprises the following modules:
(1) Initial classification of cluster nodes and dynamic node labels
Cluster nodes are first classified according to the CPU and disk I/O performance of each node. Every node in the cluster runs a task of a specified type in isolation, and the time the node takes to run that kind of task is recorded. By comparing the time a node takes to run a single task with the average run time over all nodes in the cluster, nodes are divided into CPU-type nodes, disk-I/O-type nodes, and common nodes.
While the cluster is running, if some of the jobs running on a node overload it, the node's label can be demoted, directly to the common label. Suppose a node's initial label is the CPU label and the node is running CPU-type tasks: although the node still has some unused resources, its CPU performance advantage in this environment is already lost. To avoid this situation, a dynamic label method is adopted: when the NodeManager sends its heartbeat to the ResourceManager, it dynamically measures the CPU and IO utilization of the node machine, and if a utilization exceeds its threshold, the node's label is changed to the common label. The check is performed before every heartbeat, which realizes dynamic node labels. The threshold can be configured in a configuration file; if the user does not configure it, the system default is used.
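The heartbeat-time relabeling check above can be sketched as follows. The function name, the label strings, and the 0.9 default thresholds are illustrative assumptions, not values fixed by the method.

```python
# Sketch of the dynamic relabeling check run before each NodeManager
# heartbeat: an overloaded CPU- or IO-labelled node is demoted to the
# common label ("com") for this heartbeat. Thresholds default to assumed
# system values and would normally come from a configuration file.
DEFAULT_CPU_THRESHOLD = 0.9
DEFAULT_IO_THRESHOLD = 0.9

def dynamic_label(original_label, cpu_util, io_util,
                  cpu_threshold=DEFAULT_CPU_THRESHOLD,
                  io_threshold=DEFAULT_IO_THRESHOLD):
    """Return the label to report in this heartbeat."""
    if original_label == "cpu" and cpu_util > cpu_threshold:
        return "com"   # CPU advantage is lost while overloaded
    if original_label == "io" and io_util > io_threshold:
        return "com"   # IO advantage is lost while overloaded
    return original_label
```

Because the check runs before every heartbeat, a demoted node automatically regains its original label once its utilization drops back below the threshold.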
(2) Collection and return of Map execution information
Hadoop jobs generally consist of a Map phase and a Reduce phase. A large job typically has hundreds of maps or more, and most of a job's time is spent in the Map-phase computation; yet every Map executes the same logic. The run-time information of the first map process of a job can therefore be collected; the NodeManager delivers this information to the scheduler when it sends its heartbeat to the ResourceManager, and the scheduler classifies the job according to the returned information.
In enterprise production environments, jobs with identical logic are run every day, i.e. the user already knows which label a job belongs to. The user can set a job-type label for the job on the command line or in code; the scheduler checks for it when scheduling, and if the user has labeled the job, the job classification step is skipped and the job is scheduled directly.
(3) Multi-priority queues
To meet the demands of different users and prevent small jobs from being starved, a job priority scheme is used. Five new queues are created in the scheduler: the original queue, the wait priority queue, the CPU priority queue, the IO priority queue, and the common priority queue. A job submitted by a user first enters the original queue, where some of its maps are run and their run-time information is collected; the job then enters the wait priority queue, where it waits for the Map information to be returned and for classification; finally, according to its classification label, it enters the queue with the corresponding label.
(4) Job classification
The data must be preprocessed before classification. Data preprocessing refers to processing the data in advance; data preprocessing techniques were developed to improve the quality of data mining. There are several methods: data cleaning, data integration, data transformation, and data reduction. Applying these techniques before mining greatly improves the quality of the mined patterns and reduces the time the actual mining requires. The preprocessing here mainly concerns data normalization: each variable is linearly transformed onto a new scale so that its minimum becomes 0 and its maximum becomes 1, which guarantees that all variable values are at most 1.
For job classification, the naive Bayes classifier was chosen for its simplicity, wide use, and good classification results. If the user has already added the job's type on the command line or in the task code, this step is skipped and the job enters the corresponding queue directly to wait for resources.
(5) Data locality
Hadoop follows the principle that "moving computation is cheaper than moving data": moving the computation to the node that holds the data costs less, and performs better, than moving the data to a compute node. For data locality, the present invention adopts a delay degradation scheduling strategy.
The beneficial effects are:
1. The present invention proposes a dynamic-label-matching scheduling method for heterogeneous cluster environments. Nodes and jobs are classified; job priority is computed by combining the job's own attributes with those of the submitting user; when allocating resources, resources and nodes of the same type are matched; and, considering the relationship between a node's performance and the amount of work it is currently running, node labels are adjusted dynamically by self-detection. The performance of the algorithm is finally analyzed by experiment.
2. For the data locality problem, the present invention proposes a delay degradation algorithm. Degradation is divided into three levels: the current node, a node in the same rack, and a random node. By lowering the locality level only within a bounded delay time, data locality is improved.
3. Using the dynamic label method, the present invention first runs jobs of different types in advance and classifies nodes by comparing each node's run time with the average over all cluster nodes; then, according to the load imposed by the tasks running on each cluster node, the node self-detects its performance and generates a corresponding new label.
4. The present invention proposes classifying jobs: because the Map parts of a MapReduce job all execute the same processing logic, jobs can be classified from the partial information obtained by running part of the job in advance.
Brief description of the drawings
Fig. 1: overall job scheduling framework flow chart;
Fig. 2: scheduling algorithm flow chart;
Fig. 3: comparison of the total run time of three kinds of jobs under different scheduling algorithms;
Fig. 4: Container distribution under DLMS for a 500M data volume;
Fig. 5: Container distribution under DLMS for a 1G data volume;
Fig. 6: Container distribution under DLMS for a 1.5G data volume;
Fig. 7: comparison of job-group run time under different scheduling algorithms;
Embodiment
To make the purpose, technical scheme, and features of the present invention clearer, the invention is further explained below in conjunction with specific embodiments and with reference to the accompanying drawings. The YARN scheduling framework is shown in Fig. 1.
Each step is explained as follows:
(1) The user submits an application program to YARN, including the user program and the command to start the ApplicationMaster.
(2) The ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster.
(3) After registering with the ResourceManager, the ApplicationMaster applies for resources for each task and monitors their running status until the run ends.
(4) Before sending its heartbeat, the NodeManager performs self-detection to generate its dynamic node label, and reports its resources to the ResourceManager.
(5) Tasks are classified into different label queues and sorted by priority while waiting for resource allocation.
(6) The ApplicationMaster applies for and obtains resources from the ResourceManager through the RPC protocol.
(7) According to the node labels and resources reported by the NodeManagers, the scheduler allocates a node's resources to jobs in the queue with the corresponding label.
(8) After the ApplicationMaster has obtained resources, it communicates with the corresponding NodeManager and asks it to start the task.
(9) The NodeManager sets up the running environment for the task (environment variables, JAR packages, binary programs, etc.), writes the task start command into a script, and starts the task by running the script.
(10) Each task reports its state and progress to the ApplicationMaster through an RPC protocol, so that a failed task can be restarted.
(11) After the application program finishes running, the ApplicationMaster deregisters from the ResourceManager and shuts itself down.
First, the physical nodes of the cluster are initially classified; the classification procedure is as follows:
(1) Let the set of cluster machine nodes be N = {N_i | i ∈ [1, n]}, where n is the total number of nodes, i is a positive integer from 1 to n, and N_i denotes the i-th physical machine in the cluster.
(2) A CPU-type, an IO-type, and a common job with identical task amounts are run on every node and the execution times are recorded. T_cpu(i) denotes the time taken to run the CPU job on node N_i; T_io(i) denotes the time taken to run the IO job on node N_i; T_com(i) denotes the time taken to run the common job on node N_i.
(3) The cluster average time for each kind of job is computed as Avg_j = (1/n) · Σ_{i=1..n} T_j(i), where j ∈ {cpu, io, com} denotes the type of job. The difference between each node's time and the average time is then computed for each kind of job: if T_cpu(i) < Avg_cpu, the node is given the original CPU-type label; if T_cpu(i) > Avg_cpu, the node is given the original common label; and likewise for the other job types. Afterwards a node may carry several candidate labels; the label that saves the most time is chosen as the node's final label.
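The initial classification procedure above can be sketched as follows: each node is tagged with every label whose benchmark run time beats the cluster average, and the label with the largest saving wins, defaulting to the common label. All function and variable names are illustrative.

```python
# Minimal sketch of the initial node classification: times maps each
# node to its measured run times for the three benchmark job kinds.
def classify_nodes(times):
    """times: {node: {'cpu': t, 'io': t, 'com': t}} -> {node: label}"""
    kinds = ("cpu", "io", "com")
    n = len(times)
    # cluster average time Avg_j for each job kind j
    avg = {k: sum(t[k] for t in times.values()) / n for k in kinds}
    labels = {}
    for node, t in times.items():
        # candidate labels: kinds where the node beats the cluster average
        savings = {k: avg[k] - t[k] for k in kinds if t[k] < avg[k]}
        # keep the label that saves the most time; otherwise 'com'
        labels[node] = max(savings, key=savings.get) if savings else "com"
    return labels
```

A node that beats the average on both the CPU and the IO benchmark thus still receives only one original label, the one with the larger saving, matching the rule stated above.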
Let the Map run-time information be M; it contains the following information to be collected: M = {MIn, MOut, Rate, Acpu, Mcpu, Zcpu, MRate}, where MIn denotes the map input data amount, MOut the map output data amount, Rate the ratio of input to output data amount, Acpu the average CPU utilization, Mcpu the median CPU utilization, Zcpu the average number of samples in which CPU utilization exceeds 90%, and MRate the memory usage. These data later serve as the characteristic attributes for job classification. During experiments it was found that the average CPU utilization alone does not reflect the features of a job well: CPU-type jobs exceed 90% CPU utilization relatively often, while other kinds of jobs do so relatively rarely, so this count is also added to the information returned by the map.
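The information set M above can be written as a small record type. The dataclass form and the `from_counters` helper are illustrative assumptions; only the field meanings come from the definition of M.

```python
# Feature record returned after a job's first map, mirroring the set
# M = {MIn, MOut, Rate, Acpu, Mcpu, Zcpu, MRate} defined above.
from dataclasses import dataclass

@dataclass
class MapInfo:
    m_in: int      # MIn:  map input data amount
    m_out: int     # MOut: map output data amount
    rate: float    # Rate: input amount / output amount
    a_cpu: float   # Acpu: average CPU utilization
    m_cpu: float   # Mcpu: median CPU utilization
    z_cpu: float   # Zcpu: mean count of samples with CPU usage > 90%
    m_rate: float  # MRate: memory usage

    @staticmethod
    def from_counters(m_in, m_out, a_cpu, m_cpu, z_cpu, m_rate):
        # derive Rate from the raw counters; guard against empty output
        rate = m_in / m_out if m_out else 0.0
        return MapInfo(m_in, m_out, rate, a_cpu, m_cpu, z_cpu, m_rate)
```

One record per job is enough, since every map of a job executes the same logic.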
For queue priority, a user-defined two-layer weighting scheme is adopted. The weight of the job's size attribute is worthNum, where the attribute falls into three classes, num ∈ {long, mid, short}; the weight of the job's owner attribute is worthUser, where the attribute is divided into two grades, user ∈ {root, others}; the weight of the job's urgency is worthEmergence, where the attribute is divided into three grades, priority ∈ {highPriority, midPriority, lowPriority}; and the weight of the job's waiting time is worthWait, where the waiting time is computed as waitTime = nowTime - submitTime. Each attribute is assigned its corresponding weight, the priority number of every task is computed, and the tasks are then sorted within their queues. The four task-attribute weights sum to 100%; the specific formulas are as follows.
worthNum + worthUser + worthEmergence + worthWait = 100%
The final weight is computed as:
finalWorth = worthNum·num + worthUser·user + worthEmergence·priority + worthWait·waitTime
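A minimal sketch of the finalWorth formula, assuming the attribute grades (num, user, priority) have already been mapped to numeric scores and using example weight values; the patent leaves both the grade encodings and the weights user-configurable.

```python
# Two-layer weighted priority: the four weights must sum to 100%.
def final_worth(num, user, priority, wait_time,
                worth_num=0.2, worth_user=0.2,
                worth_emergence=0.3, worth_wait=0.3):
    """Compute a job's priority number from its four attribute scores."""
    total = worth_num + worth_user + worth_emergence + worth_wait
    assert abs(total - 1.0) < 1e-9, "attribute weights must sum to 100%"
    return (worth_num * num + worth_user * user
            + worth_emergence * priority + worth_wait * wait_time)
```

Jobs within each label queue would then be sorted by this value in descending order.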
For job classification, the naive Bayes classifier is used; the specific classification steps are as follows:
(1) For a job with attribute values V1, V2, …, Vn, compute the conditional probability that the job is a CPU-type, IO-type, or common job:
P(job = lab_cpu | V1, V2, …, Vn)
P(job = lab_io | V1, V2, …, Vn)
P(job = lab_com | V1, V2, …, Vn)
where job ∈ {cpu, io, com} denotes the job classification label and the Vi are the attribute features of the job.
(2) By the Bayes formula P(B | A) = P(AB)/P(A):
P(job | V1, V2, …, Vn) = P(V1, V2, …, Vn | job) · P(job) / P(V1, V2, …, Vn)
Assuming the Vi are mutually independent, the independence assumption gives
P(V1, V2, …, Vn | job) = P(V1 | job) · P(V2 | job) · … · P(Vn | job)
(3) In the actual computation, P(V1, V2, …, Vn) is the same for every job class and can be neglected, so finally
P(job = lab_cpu | V1, V2, …, Vn) ∝ P(job = lab_cpu) · Π_{i=1..n} P(Vi | job = lab_cpu)
and similarly for lab_io and lab_com.
The job is judged to be a CPU-type, IO-type, or common job according to which probability value is largest.
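The derivation above reduces to scoring each label by prior × product of per-feature likelihoods and taking the argmax. The sketch below assumes the likelihood tables are already estimated (a real implementation would fit them from the returned Map statistics); the tiny fallback probability for unseen values is an illustrative smoothing choice.

```python
# Naive Bayes job classification: score(c) = P(c) * prod_i P(Vi | c),
# with the shared denominator P(V1..Vn) dropped as in step (3) above.
def classify_job(features, priors, likelihoods):
    """features: {name: value}; priors: {label: P(label)};
    likelihoods: {label: {name: {value: P(value | label)}}}."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for name, value in features.items():
            # unseen feature values get a small smoothing probability
            score *= likelihoods[label][name].get(value, 1e-9)
        scores[label] = score
    # the label with the largest posterior score wins
    return max(scores, key=scores.get)
```

With discretized features (e.g. Zcpu bucketed into "high"/"low"), this directly reproduces the decision rule stated above.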
For locality, a delay degradation scheduling strategy is adopted here. The concrete mechanism of the strategy is as follows:
A delay-time attribute is added to each job. Let T_i be the current delay count of the i-th job, i ∈ [1, n], where n is the number of nodes in the cluster; T_local denotes the local-node delay threshold and T_rack the rack-node delay threshold. When the scheduler allocates a resource to a job whose execution node and data input node are not the same node, T_i is incremented by 1, indicating that the job's scheduling has been delayed once, and the resource is allocated to another suitable job instead. Once T_i > T_local, the job's required locality is lowered to rack locality, and any node in the same rack may now allocate resources to the job; once T_i > T_rack, the required locality is lowered to a random node. T_local and T_rack are configured by the user in a configuration file according to the cluster situation. With this delay scheduling strategy, good locality can be obtained within a bounded delay.
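The delay degradation rule above can be sketched as follows: each declined non-local offer increments the job's delay counter, and the required locality level relaxes from node-local to rack-local to any node as the counter crosses T_local and T_rack. Function names and the example threshold values are illustrative assumptions.

```python
# Delay degradation: map a job's delay counter to its locality level.
def locality_level(delay_count, t_local, t_rack):
    if delay_count <= t_local:
        return "node-local"
    if delay_count <= t_rack:
        return "rack-local"
    return "any"

def offer(job, node_has_data, node_in_rack, t_local=3, t_rack=6):
    """Return True if the job should accept this node's resources;
    otherwise record one more delayed scheduling round."""
    level = locality_level(job["delay"], t_local, t_rack)
    if level == "node-local" and not node_has_data:
        job["delay"] += 1   # keep waiting for a data-local node
        return False
    if level == "rack-local" and not (node_has_data or node_in_rack):
        job["delay"] += 1   # keep waiting for a same-rack node
        return False
    return True
```

In a real deployment t_local and t_rack would be read from the configuration file rather than defaulted.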
The basic idea of the DLMS scheduling method is to run part of a job in advance, classify the job according to the information it returns, and then allocate the resources of labeled nodes to the tasks in the queue with the corresponding label. The basic procedure is:
Step 1: When a node reports its resources to the resource manager by heartbeat, if the original queue is not empty, the jobs in the original queue are traversed; a job whose type label was specified on the command line or in the program is placed into the corresponding label priority queue, and the original queue removes that job.
Step 2: If the original queue is not empty, the resources of this node are allocated to the original queue; the job enters the wait queue to wait for the next round of resource allocation, the original queue removes that job, and this round of allocation ends.
Step 3: If the wait priority queue is not empty, the jobs in the wait priority queue are classified and sorted into the corresponding label priority queues.
Step 4: If the job classification queue corresponding to the node's performance label is not empty, the resources of this node are allocated to that queue, and this round of allocation ends.
Step 5: A resource-visit counter is set and checked; if it exceeds the number of nodes in the cluster, the node's resources are allocated to the queues in the priority order CPU, IO, common, wait, and this round of scheduling ends. This step prevents situations such as the following: the CPU queue holds too many jobs and exhausts the CPU-type node resources, while nodes with other labels still have resources but the jobs cannot be allocated any.
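Under stated assumptions (in-memory queue lists and a per-label visit counter as the starvation guard), the five steps above can be sketched as one scheduling round; all names are illustrative, and step 3's classification is abbreviated to a pre-assigned label.

```python
# One DLMS scheduling round for a single node heartbeat.
FALLBACK_ORDER = ["cpu", "io", "com", "wait"]  # step-5 priority order

def schedule_round(node_label, queues, visit_counts, cluster_size):
    """queues: {'original': [...], 'wait': [...], 'cpu': [...],
    'io': [...], 'com': [...]}; returns (queue_name, job) served."""
    # step 1: jobs with a user-supplied label skip classification
    for job in list(queues["original"]):
        if job.get("label"):
            queues["original"].remove(job)
            queues[job["label"]].append(job)
    # step 2: an unclassified job gets this node to run its sample maps
    if queues["original"]:
        job = queues["original"].pop(0)
        queues["wait"].append(job)          # awaits Map info and labeling
        return ("original", job)
    # steps 3-4: serve the queue matching this node's label
    if queues[node_label]:
        return (node_label, queues[node_label].pop(0))
    # step 5: starvation guard - after too many empty visits,
    # fall back to any non-empty queue in the fixed priority order
    visit_counts[node_label] = visit_counts.get(node_label, 0) + 1
    if visit_counts[node_label] > cluster_size:
        for name in FALLBACK_ORDER:
            if queues[name]:
                return (name, queues[name].pop(0))
    return (None, None)
```

The fallback in step 5 is what keeps, e.g., an overfull CPU queue from starving while IO- or common-labeled nodes sit idle.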
The flow chart of the algorithm is shown in Fig. 2.
Experimental environment
This section verifies the actual effect of the DLMS scheduler proposed herein by experiment. The experimental environment is a fully distributed Hadoop cluster built from 5 PCs; the uniform node configuration is operating system Ubuntu 12.04.1, JDK 1.6, Hadoop 2.5.1, 2G of memory, and a 50G hard disk. The NameNode has 2 CPU cores; dataNode1 has 2 CPU cores, dataNode2 has 4, dataNode3 has 2, and dataNode4 has 4.
Experimental results and explanation
First, a wordCount job (IO-type) and a kmeans job (CPU-type), each with a data volume of 128M, were prepared and run 6 times on each of the 4 nodes, and the job run times were recorded. In Table 1, s denotes the unit seconds, avg denotes the average time the node takes to run the task of the corresponding label, and allAvg denotes the total average time over all nodes for the task of the corresponding label; rate is computed as rate = (avg - allAvg) / allAvg.
A negative sign indicates that the node's average time is lower than the total average time; a positive sign indicates that it is higher.
As can be seen from Table 1, dataNode1 saves time on both tasks; we take the job type with the largest saving, CPU, as the machine's original label. dataNode2 receives the IO label, and dataNode3 and dataNode4 are common machines.
Table 1: initial classification experiment
Experimental results and analysis
Several jobs whose types are easy to distinguish were used. WordCount needs to read large amounts of data and write intermediate data in its Map phase, and neither its Map phase nor its Reduce phase performs arithmetic computation, so this kind of job is characterized as IO-type. Kmeans computes large numbers of point-to-point distances in both the Map and Reduce phases and writes little intermediate data, so this kind of job is characterized as CPU-type. TopK neither writes large amounts of data to disk in the Reduce phase nor performs heavy computation, involving only simple comparisons, so it is considered an intermediate (common) task.
Verification used two groups of experiments. In the first group, the scheduler was set to FIFO, and WordCount, Kmeans, and Topk jobs were each run 3 times at data volumes of 500M, 1G, and 1.5G, with the average of the 3 runs recorded as each job's final time; the scheduler was then switched to the Capacity and DLMS schedulers and the same experimental procedure repeated. The experiment recorded the distribution over the cluster of the Containers of every kind of job under the DLMS scheduler. A Container is the unit in which cluster resources are divided, so this records how each job's splits were distributed across the cluster; in YARN, each Map and Reduce process is represented by one Container, and the proportion of Containers allocated to a node shows the share of the job's task amount that the node executed. In Fig. 3, the abscissa is the job data volume and the ordinate is the total time for running the 3 kinds of jobs (WordCount, Kmeans, Topk) together. As the data volume increases, the DLMS scheduler saves about 10%-20% of the time compared with the other schedulers.
This is because DLMS allocates the resources of nodes with a given label to jobs with the corresponding label. The maps and reduces of a job run on nodes in the form of Containers, and Figs. 4 to 6 show the Container counts for jobs of different data volumes under the DLMS scheduler. According to the initial classification of the previous section, Node1 is the CPU-type label node, Node2 and Node3 are common label nodes, and Node4 is the IO label node; WordCount is an IO-type job, Topk a common job, and Kmeans a CPU-type job. The figures show the distribution pattern: relatively many of the WordCount job's Containers are allocated on Node4, relatively many of Topk's on the common nodes Node2 and Node3, and relatively many of Kmeans's on Node1. These distributions of the different jobs' Containers over the cluster nodes show that the DLMS scheduler raises the probability that the resources of a labeled node are allocated to jobs with the corresponding label.
In the second group of experiments, 5 jobs were prepared as one job group: WordCount jobs with 128M and 500M data volumes, Kmeans jobs with 128M and 500M data volumes, and a Topk job with a 500M data volume. The 5 jobs were submitted simultaneously, simulating continuous operation in clusters with different schedulers, and the total time for the job group to finish was recorded; the job group was run 3 times under each scheduler. The concrete results are shown in Fig. 7: the DLMS scheduler proposed herein clearly saves time compared with Hadoop's built-in schedulers running the same job group, saving about 20% of the time compared with Hadoop's built-in FIFO scheduler and about 10% of the run time compared with the Capacity scheduler.

Claims (2)

  1. Dynamic labels match DLMS dispatching methods under 1.Hadoop platforms, it is characterised in that:
    Clustered node original classification and its dynamic cataloging label;
    Clustered node is classified firstly the need of preliminary classification is carried out according to the CPU of node and disk I/O performance;It is each in cluster Node is required for the task of one specified type of isolated operation and records the time that the node runs such operation, according to node Node is divided into CPU type sections by the magnitude relationship of all node run time average values in the time of operation individual task and cluster Point, disk I/O type node, plain edition node;
    , can be to this node if a node operation Partial Jobs cause load excessive during clustered node is run Label carry out degradation processing, be directly downgraded to ordinary node;One node initial labels is operation in CPU type labels, node CPU type tasks, although this node also has part resource to be not used, but now environment interior joint cpu performance advantage has lost, To avoid such case from occurring, dynamic labels method is taken, heartbeat is sent to ResourceManager in NodeManager When the dynamic detection node machine CPU and IO utilization rates, if it exceeds the threshold, this node label just is sticked into common mark Label, are required for being detected once, are achieved in node dynamic labels when sending heartbeat every time;This threshold value can be in configuration file In voluntarily configure, if do not configure can reference system default value by user;
    (1) acquisition and passback of Map execution informations
    Hadoop jobs generally consist of a Map stage and a Reduce stage, and a large job commonly has up to hundreds of map tasks, so most of a job's time is spent in the Map-stage computation; every Map task, however, executes the same logic, so the runtime information of the first map task of a job can be collected. The NodeManager delivers this information to the scheduler when sending its heartbeat to the ResourceManager, and the scheduler classifies the job according to the information returned.
    In an enterprise production environment, jobs with the same content and logic are run every day, which means the user knows which label a job should carry. The user can set a job-type label for the job on the command line or in code; the scheduler checks for this label during scheduling, and if the user has labelled the job, the classification step is skipped and the job is scheduled directly.
    (2) Multi-priority queues
    To meet the demands of different users and prevent small jobs from "starving", a job-priority scheme is adopted. Five queues are created in the scheduler: the original queue, the waiting priority queue, the CPU priority queue, the IO priority queue and the ordinary priority queue. A job submitted by a user first enters the original queue, where part of its map tasks are run and the runtime information of those map tasks is collected; the job then enters the waiting priority queue to wait for the Map runtime information to be returned and the job to be classified; finally, according to its classification label, the job enters the queue with the corresponding label.
    (3) Job classification
    Data must be pre-processed before classification. Data preprocessing refers to processing performed on the data at an early stage; data-preprocessing techniques were developed to improve the quality of data mining. They comprise several methods: data cleaning, data integration, data transformation and data reduction. Applying these techniques before data mining greatly improves the quality of the mined patterns and reduces the time required for the actual mining. Here preprocessing mainly concerns data normalisation: every variable is linearly transformed onto a new scale such that, after the transformation, the minimum value of the variable is 0 and the maximum value is 1, which guarantees that all variable values are less than or equal to 1.
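The normalisation described above (minimum mapped to 0, maximum mapped to 1) is ordinary min-max scaling. A minimal sketch, in which the function name and the handling of a constant-valued variable are assumptions:

```python
def min_max_normalize(values):
    """Linearly transform a sequence of numbers onto a new scale so
    that the minimum becomes 0 and the maximum becomes 1; every
    transformed value is therefore <= 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant variable: map all to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```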
    For job classification, the Naive Bayes classifier was selected because it is simple, widely applicable and gives good classification results. If the user has already added the job's type label on the command line or in the task code, this step can be skipped and the job enters the corresponding queue directly to wait for resource allocation.
    (4) Data locality
    Hadoop follows the principle that "moving computation is cheaper than moving data": moving the computation to the node where the data is placed costs less and performs better than moving the data to a compute node. For data locality, this invention adopts a delay degradation scheduling strategy.
  2. The dynamic label matching DLMS scheduling method under the Hadoop platform according to claim 1, characterised in that:
    (1) The user submits an application program to YARN, including the user program and the command to start the ApplicationMaster;
    (2) The ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, requesting it to start the application's ApplicationMaster;
    (3) After registering with the ResourceManager, the ApplicationMaster applies for resources for each task and monitors their running state until the run ends;
    (4) Before sending a heartbeat, the NodeManager performs a self-detection that generates its dynamic node label, and reports its resources to the ResourceManager;
    (5) Tasks are classified into the different label queues and sorted by priority while waiting for resource allocation;
    (6) The ApplicationMaster applies for and obtains resources from the ResourceManager through the RPC protocol;
    (7) According to the node labels and resources reported by the NodeManager, the scheduler allocates the resources of a node to the jobs of the queue with the corresponding label;
    (8) After obtaining resources, the ApplicationMaster communicates with the corresponding NodeManager, requesting it to start the tasks;
    (9) The NodeManager sets up the running environment for the task (environment variables, JAR packages, binary programs, etc.), writes the task start command into a script, and starts the task by running that script;
    (10) Each task reports its state and progress to the ApplicationMaster through an RPC protocol, so that a failed task can be restarted;
    (11) After the application program finishes running, the ApplicationMaster deregisters from the ResourceManager and shuts itself down;
    The cluster's physical nodes are first given a preliminary classification; the classification procedure is as follows:
    (1) Let the set of cluster nodes be N = {N_i | i ∈ [1, n]}, where n is the total number of nodes, i is a positive integer from 1 to n, and N_i denotes the i-th physical machine in the cluster;
    (2) CPU-type, IO-type and ordinary jobs of identical task size are executed on every node and the execution times are recorded; T_cpu(i) denotes the time spent executing the CPU job on node N_i, T_io(i) the time spent executing the IO job on node N_i, and T_com(i) the time spent executing the ordinary job on node N_i;
    (3) The cluster average time of each kind of job is computed; the formula is Avg_j = (1/n) · Σ_{i=1}^{n} T_j(i), where j ∈ {cpu, io, com} denotes the job type. For each node, the difference between its time and the cluster average under each job type is computed: if T_cpu(i) < Avg_cpu, the node is given the CPU-type original label; if T_cpu(i) > Avg_cpu, the node is given the ordinary original label; the IO type is handled analogously. A node may thereby end up with several labels, in which case the label with the greatest time saving is selected as the node's final label;
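Under the stated definitions, the preliminary labelling of nodes can be sketched as below. The function and label names are illustrative assumptions; the tie-break among multiple candidate labels follows the "greatest time saving" rule from the text, and the ordinary-job times are omitted since they do not influence the label here.

```python
def classify_nodes(t_cpu, t_io):
    """Assign each node an original label from its benchmark times.

    t_cpu[i] and t_io[i] are node i's execution times for the
    CPU-type and IO-type benchmark job.  A node beating the cluster
    average for a specialised job type is a candidate for that
    label; among several candidates the label with the greatest time
    saving wins, and a node beating no average stays "com".
    """
    n = len(t_cpu)
    avg = {"cpu": sum(t_cpu) / n, "io": sum(t_io) / n}
    times = {"cpu": t_cpu, "io": t_io}
    labels = []
    for i in range(n):
        savings = {lab: avg[lab] - times[lab][i]
                   for lab in ("cpu", "io") if times[lab][i] < avg[lab]}
        labels.append(max(savings, key=savings.get) if savings else "com")
    return labels
```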
    Let the Map runtime information be M; it contains the following items to be collected: M = {MIn, MOut, Rate, Acpu, Mcpu, Zcpu, MRate}, where MIn denotes the map input data volume, MOut the map output data volume, Rate the ratio of input data volume to output data volume, Acpu the average CPU utilisation, Mcpu the median CPU utilisation, Zcpu the mean of the CPU utilisation samples above 90%, and MRate the memory usage; these data serve as the feature attributes for the subsequent job classification;
    For queue priority, a user-defined two-layer weight design is adopted. The weight of the job-size attribute is worthNum, the attribute falling into three classes num ∈ {long, mid, short}; the weight of the job-owner attribute is worthUser, the attribute being divided into two grades user ∈ {root, others}; the weight of the job-urgency attribute is worthEmergence, the attribute being divided into three grades priority ∈ {highPriority, midPriority, lowPriority}; the weight of the job waiting time is worthWait, the waiting time being computed as waitTime = nowTime − submitTime. Each attribute is assigned its corresponding weight, the priority number of each task is computed, and the jobs are then sorted within their queue. The four task-attribute weights sum to 100%; the specific formula is as follows:
    worthNum + worthUser + worthEmergence + worthWait = 100%;
    The final weight is computed by the formula:
    finalWorth = worthNum · num + worthUser · user + worthEmergence · priority + worthWait · waitTime
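The finalWorth computation can be sketched as follows. The attribute values are assumed to have already been mapped to numeric scores, and the default weights (which must sum to 100%) are illustrative, since the design leaves them user-defined.

```python
def final_worth(num, user, priority, wait_time,
                w_num=0.3, w_user=0.2, w_emergence=0.3, w_wait=0.2):
    """Weighted sum of the four job attributes: size, owner, urgency
    and waiting time.  The four weights must sum to 1, mirroring
    worthNum + worthUser + worthEmergence + worthWait = 100%.
    """
    if abs(w_num + w_user + w_emergence + w_wait - 1.0) > 1e-9:
        raise ValueError("attribute weights must sum to 100%")
    return (w_num * num + w_user * user
            + w_emergence * priority + w_wait * wait_time)
```

Jobs within a label queue would then be sorted by this score in descending order.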
    For job classification, the Naive Bayes classifier is used; the specific classification steps are as follows:
    (1) Compute, for a job with given feature values, the conditional probability that it is a CPU-type, IO-type or ordinary job:
    P(job = lab_cpu | V_1, V_2, …, V_n)
    P(job = lab_io | V_1, V_2, …, V_n)
    P(job = lab_com | V_1, V_2, …, V_n)
    where job ∈ {cpu, io, com} denotes the job class label and the V_i are the feature attributes of the job;
    (2) By Bayes' formula P(B|A) = P(AB)/P(A):
    P(job | V_1, V_2, …, V_n) = P(V_1, V_2, …, V_n | job) · P(job) / P(V_1, V_2, …, V_n);
    Assuming the V_i are mutually independent, the independence assumption gives
    P(V_1, V_2, …, V_n | job) = Π_{i=1}^{n} P(V_i | job);
    (3) In the actual computation, P(V_1, V_2, …, V_n) is the same for every class and can be ignored, so finally
    P(job = lab_cpu | V_1, …, V_n) ∝ P(job = lab_cpu) · Π_{i=1}^{n} P(V_i | job = lab_cpu);
    and similarly for lab_io and lab_com;
    The job is classified as a CPU-type, IO-type or ordinary job according to whichever of the three probability values is largest;
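Steps (1)–(3) can be sketched with a tiny discrete Naive Bayes classifier. The Laplace smoothing and the training-data layout are assumptions added to make the sketch runnable, not details taken from the method.

```python
def naive_bayes_classify(sample, training):
    """Return the label (e.g. "cpu", "io", "com") maximising
    P(job = label) * prod_i P(V_i = v_i | job = label), with the
    features V_i treated as conditionally independent given the label.

    `training` maps each label to a list of feature tuples; the
    features are assumed to be already discretised.  The evidence
    term P(V_1, ..., V_n) is omitted, as it is the same for every
    label and does not change the argmax.
    """
    total = sum(len(rows) for rows in training.values())
    best_label, best_score = None, -1.0
    for label, rows in training.items():
        score = len(rows) / total                   # prior P(label)
        for i, v in enumerate(sample):
            match = sum(1 for r in rows if r[i] == v)
            score *= (match + 1) / (len(rows) + 2)  # Laplace smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```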
    For locality, a delay degradation scheduling strategy is adopted; the concrete idea of the strategy is as follows:
    A delay-time attribute is added to each job. Let T_i be the current delay count of the i-th job, i ∈ [1, n], where n is the number of nodes in the cluster; let T_local denote the local-node delay threshold and T_rack the rack-node delay threshold. When the scheduler allocates a resource to a job and the job's execution node is not the node holding its input data, T_i is incremented by 1, indicating that the job has been delayed once, and the resource is allocated to another suitable job. When T_i > T_local, the job's locality requirement is relaxed to rack locality, so that any node in the same rack may be allocated to the job; when T_i > T_rack, the locality requirement is relaxed to any node. T_local and T_rack are configured by the user in a configuration file according to the cluster situation. With this delay strategy, good locality can be obtained within a bounded delay time;
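The delay degradation strategy can be sketched as a per-offer decision. The function shape and the numeric defaults (standing in for the user-configured T_local and T_rack) are assumptions.

```python
def delay_schedule(delay_count, data_local, rack_local,
                   t_local=5, t_rack=10):
    """Decide whether a job may take an offered node's resource.

    The job first insists on data-local nodes; once its delay count
    T_i exceeds t_local it also accepts rack-local nodes, and once
    it exceeds t_rack it accepts any node.  Returns the pair
    (accept, new_delay_count); a skipped offer increments the count.
    """
    if data_local:
        return True, 0                    # locality achieved
    if delay_count > t_rack:
        return True, delay_count          # degraded to any node
    if delay_count > t_local and rack_local:
        return True, delay_count          # degraded to rack locality
    return False, delay_count + 1         # delay once more
```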
    The basic idea of the DLMS scheduling method is to run part of a job in advance, classify the job according to the information it returns, and then allocate the resources of nodes whose label matches the job's queue to the job's tasks. The basic procedure:
    Step 1: when a node reports its resources to the ResourceManager through a heartbeat, if the original queue is not empty, the jobs in the original queue are traversed; any job whose type label was specified on the command line or in the program is moved into the corresponding label priority queue and removed from the original queue;
    Step 2: if the original queue is still not empty, the resources of this node are allocated to the original queue; the job enters the waiting queue to await the next resource allocation, the original queue removes this job, and this round of allocation ends;
    Step 3: if the waiting priority queue is not empty, the jobs in the waiting priority queue are sorted into the corresponding label priority queues;
    Step 4: if the job queue corresponding to the node's performance label is not empty, the resources of this node are allocated to that queue, and this round of allocation ends;
    Step 5: a resource-visit counter is maintained and checked; if it exceeds the number of nodes in the cluster, the resources of the node are allocated to the queues in the priority order CPU, IO, ordinary, waiting, and this round of scheduling ends. This step prevents situations such as the following: the CPU queue holds too many jobs, so the CPU-type node resources are exhausted while nodes with other labels still have resources, yet the jobs cannot be allocated any resource.
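One allocation round from Steps 1–5 can be sketched as below. The queue layout is assumed, and the starvation guard is simplified to a fixed fallback order rather than the full resource-visit counter described above.

```python
def assign_node_resource(node_label, queues):
    """One simplified DLMS allocation round for a node heartbeat.

    `queues` maps queue names ("original", "wait", "cpu", "io",
    "com") to lists of job ids.  Jobs in the original queue get the
    resource first so their probe map tasks can run (the job then
    moves to the waiting queue); otherwise the node serves the queue
    matching its own label, falling back to cpu -> io -> com so that
    no queue with pending jobs is starved while resources exist.
    """
    if queues["original"]:
        job = queues["original"].pop(0)
        queues["wait"].append(job)        # probe maps run, job waits
        return job
    if node_label in queues and queues[node_label]:
        return queues[node_label].pop(0)
    for name in ("cpu", "io", "com"):     # starvation guard
        if queues[name]:
            return queues[name].pop(0)
    return None
```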
CN201710181055.0A 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform Expired - Fee Related CN107038069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710181055.0A CN107038069B (en) 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform

Publications (2)

Publication Number Publication Date
CN107038069A true CN107038069A (en) 2017-08-11
CN107038069B CN107038069B (en) 2020-05-08

Family

ID=59534217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710181055.0A Expired - Fee Related CN107038069B (en) 2017-03-24 2017-03-24 Dynamic label matching DLMS scheduling method under Hadoop platform

Country Status (1)

Country Link
CN (1) CN107038069B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013015942A1 (en) * 2011-07-28 2013-01-31 Yahoo! Inc. Method and system for distributed application stack deployment
CN104915407A (en) * 2015-06-03 2015-09-16 华中科技大学 Resource scheduling method under Hadoop-based multi-job environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐鹏 (Xu Peng): "Research on Optimization of Job Scheduling Algorithms for Cloud Computing Platforms", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766150A (en) * 2017-09-20 2018-03-06 电子科技大学 A kind of job scheduling algorithm based on hadoop
CN108052443A (en) * 2017-10-30 2018-05-18 北京奇虎科技有限公司 A kind of test assignment dispatching method, device, server and storage medium
CN107832153B (en) * 2017-11-14 2020-12-29 北京科技大学 Hadoop cluster resource self-adaptive allocation method
CN107832153A (en) * 2017-11-14 2018-03-23 北京科技大学 A kind of Hadoop cluster resources self-adapting distribution method
CN107832134A (en) * 2017-11-24 2018-03-23 平安科技(深圳)有限公司 multi-task processing method, application server and storage medium
CN107832134B (en) * 2017-11-24 2021-07-20 平安科技(深圳)有限公司 Multitasking method, application server and storage medium
CN108509280A (en) * 2018-04-23 2018-09-07 南京大学 A kind of Distributed Calculation cluster locality dispatching method based on push model
CN108509280B (en) * 2018-04-23 2022-05-31 南京大学 Distributed computing cluster locality scheduling method based on push model
CN110532085B (en) * 2018-05-23 2022-11-04 阿里巴巴集团控股有限公司 Scheduling method and scheduling server
CN110532085A (en) * 2018-05-23 2019-12-03 阿里巴巴集团控股有限公司 A kind of dispatching method and dispatch server
CN108959580A (en) * 2018-07-06 2018-12-07 深圳市彬讯科技有限公司 A kind of optimization method and system of label data
WO2020034646A1 (en) * 2018-08-17 2020-02-20 华为技术有限公司 Resource scheduling method and device
WO2020119117A1 (en) * 2018-12-14 2020-06-18 平安医疗健康管理股份有限公司 Distributed computing method, apparatus and system, device and readable storage medium
CN111930493A (en) * 2019-05-13 2020-11-13 中国移动通信集团湖北有限公司 NodeManager state management method and device in cluster and computing equipment
CN111930493B (en) * 2019-05-13 2023-08-01 中国移动通信集团湖北有限公司 NodeManager state management method and device in cluster and computing equipment
CN110278257A (en) * 2019-06-13 2019-09-24 中信银行股份有限公司 A kind of method of mobilism configuration distributed type assemblies node label
CN111124765A (en) * 2019-12-06 2020-05-08 中盈优创资讯科技有限公司 Big data cluster task scheduling method and system based on node labels
CN112039709A (en) * 2020-09-02 2020-12-04 北京首都在线科技股份有限公司 Resource scheduling method, device, equipment and computer readable storage medium
CN112039709B (en) * 2020-09-02 2022-01-25 北京首都在线科技股份有限公司 Resource scheduling method, device, equipment and computer readable storage medium
CN112445925A (en) * 2020-11-24 2021-03-05 浙江大华技术股份有限公司 Clustering archiving method, device, equipment and computer storage medium
CN112445925B (en) * 2020-11-24 2022-08-26 浙江大华技术股份有限公司 Clustering archiving method, device, equipment and computer storage medium
CN113590294A (en) * 2021-07-30 2021-11-02 北京睿芯高通量科技有限公司 Self-adaptive and rule-guided distributed scheduling method
CN113590294B (en) * 2021-07-30 2023-11-17 北京睿芯高通量科技有限公司 Self-adaptive and rule-guided distributed scheduling method
WO2023051233A1 (en) * 2021-09-30 2023-04-06 华为技术有限公司 Task scheduling method, device, apparatus and medium
WO2023056618A1 (en) * 2021-10-09 2023-04-13 国云科技股份有限公司 Cross-cloud platform resource scheduling method and apparatus, terminal device, and storage medium
CN114064294B (en) * 2021-11-29 2022-10-04 郑州轻工业大学 Dynamic resource allocation method and system in mobile edge computing environment
CN114064294A (en) * 2021-11-29 2022-02-18 郑州轻工业大学 Dynamic resource allocation method and system in mobile edge computing environment
CN114840343A (en) * 2022-05-16 2022-08-02 江苏安超云软件有限公司 Task scheduling method and system based on distributed system
CN117056061A (en) * 2023-10-13 2023-11-14 浙江远算科技有限公司 Cross-supercomputer task scheduling method and system based on container distribution mechanism
CN117056061B (en) * 2023-10-13 2024-01-09 浙江远算科技有限公司 Cross-supercomputer task scheduling method and system based on container distribution mechanism

Also Published As

Publication number Publication date
CN107038069B (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN107038069A (en) Dynamic labels match DLMS dispatching methods under Hadoop platform
US9542223B2 (en) Scheduling jobs in a cluster by constructing multiple subclusters based on entry and exit rules
CN104298550B (en) A kind of dynamic dispatching method towards Hadoop
CN110166282A (en) Resource allocation methods, device, computer equipment and storage medium
CN104317658A (en) MapReduce based load self-adaptive task scheduling method
US8375228B2 (en) Multiple-node system power utilization management
CN109408229A (en) A kind of dispatching method and device
CN104881322A (en) Method and device for dispatching cluster resource based on packing model
CN116560860B (en) Real-time optimization adjustment method for resource priority based on machine learning
CN112068959A (en) Self-adaptive task scheduling method and system and retrieval method comprising method
US8180823B2 (en) Method of routing messages to multiple consumers
CN111144701B (en) ETL job scheduling resource classification evaluation method under distributed environment
CN113127176A (en) Multi-role task allocation method and system for working platform
CN103268261A (en) Hierarchical computing resource management method suitable for large-scale high-performance computer
CN115665157B (en) Balanced scheduling method and system based on application resource types
Garg et al. Optimal virtual machine scheduling in virtualized cloud environment using VIKOR method
CN110084507A (en) The scientific workflow method for optimizing scheduling of perception is classified under cloud computing environment
CN116755872A (en) TOPSIS-based containerized streaming media service dynamic loading system and method
CN115391047A (en) Resource scheduling method and device
Thamsen et al. Hugo: a cluster scheduler that efficiently learns to select complementary data-parallel jobs
CN113553353A (en) Scheduling system for distributed data mining workflow
CN110427217B (en) Content-based publish-subscribe system matching algorithm lightweight parallel method and system
Seethalakshmi et al. Job scheduling in big data-a survey
CN112579324A (en) Commodity summary statistical method based on cost model
CN110532071A (en) A kind of more application schedules system and method based on GPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200508