CN110109753A - Resource scheduling method and system based on a multi-dimensional constrained genetic algorithm - Google Patents
- Publication number
- CN110109753A (application CN201910340000.9A)
- Authority
- CN
- China
- Prior art keywords
- node
- chromosome
- task
- job
- fitness
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Abstract
The invention belongs to the technical field of data processing and discloses a resource scheduling method and system based on a multi-dimensional constrained genetic algorithm. After the prediction model, task matrix, and node matrix are initialized, a dual fitness function is constructed; a selection-copy operator, a crossover operator, and a mutation operator are defined; and after multiple iterations the globally optimal resource allocation is obtained. To find a better resource allocation scheme, the invention proposes a Hadoop resource scheduling algorithm based on a multi-dimensional constrained genetic algorithm and implements a Hadoop resource scheduler with it. The algorithm effectively improves cluster resource allocation efficiency, shortening the overall execution time of cluster tasks by about 20%.
Description
Technical field
The invention belongs to the technical field of data processing and relates in particular to a resource scheduling method and system based on a multi-dimensional constrained genetic algorithm, specifically a Hadoop resource scheduling method and system based on a multi-dimensional constrained genetic algorithm.
Background art
At present, the closest prior art is as follows:
Resource scheduling is a combinatorial optimization problem whose ultimate goal is to assign every task in the cluster to the most suitable node so that overall cluster performance is optimal. Hadoop YARN provides three built-in resource scheduling algorithms and implements the corresponding resource schedulers: the FIFO, Capacity, and Fair schedulers. However, as application scenarios keep expanding (for example iterative, interactive, and real-time computing), these schedulers cannot adequately meet users' needs for reasonable resource allocation and shorter task execution times.
In summary, the problems of the prior art are:
(1) Resource allocation efficiency is low, and the overall execution time of cluster tasks is long.
(2) Across different application scenarios, existing schedulers cannot adequately meet users' needs for reasonable resource allocation and shorter task execution times.
The difficulty of solving the above technical problems:
Resource scheduling is a combinatorial optimization problem. The difficulty lies in assigning every task in the cluster to the most suitable node while taking both cluster load and task execution time into account, so that overall cluster performance is optimal. In addition, applying the multi-dimensional constrained genetic algorithm to Hadoop resource scheduling requires many experiments to find the optimal parameter settings for the algorithm.
The significance of solving the above technical problems:
Studying existing resource scheduling schemes and designing a better resource scheduling algorithm promotes the development of big data and cloud computing technology, and is important for improving overall system resource utilization and the overall performance of the Hadoop platform.
Summary of the invention
In view of the problems of the prior art, the invention provides a resource scheduling method and system based on a multi-dimensional constrained genetic algorithm.
The invention is realized as a Hadoop resource scheduling method comprising:
After the prediction model, task matrix, and node matrix are initialized, constructing a dual fitness function;
Defining a selection-copy operator, a crossover operator, and a mutation operator;
After multiple iterations, obtaining the globally optimal resource allocation.
Further, the Hadoop resource scheduling method includes:
Step 1: after a user submits jobs to the cluster, initialize the prediction model, construct the task matrix and node matrix, and save the information and encoding results to a file;
Step 2, initial population construction: randomly generate initial feasible-solution chromosomes from the task matrix and node matrix; the population size is denoted Scale;
Step 3, fitness calculation: compute the fitness value of each chromosome in the population with the fitness function;
Step 4, termination check: before entering the next round of iterative evolution, check whether the termination condition is met, that is, whether the iteration limit has been reached; if so, select the chromosome with the highest fitness in the current population as the optimal solution, otherwise start a new iteration;
Step 5, fitness probability calculation: from the fitness values, compute the probability that each chromosome is selected in the next round of evolution, then generate new chromosomes through the selection-copy, crossover, and mutation operations;
Step 6: with copy ratio cp, the copy operator copies the Scale*cp chromosomes with the highest fitness in the population into the next iteration;
Step 7: repeatedly run the selection operator to pick two chromosomes as parents and apply the crossover operation to generate the remaining Scale*(1-cp) chromosomes;
Step 8: apply the mutation operation to the Scale*(1-cp) generated chromosomes, and pass the mutated chromosomes into the next iteration;
Step 9: return to step 3 for a new round of iterative evolution.
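The nine steps above can be sketched as a single loop. This is a minimal illustration rather than the patented scheduler: the function name `evolve`, the toy fitness, and the helper operators are all hypothetical, and the chromosome is held as a one-dimensional list of node indices for brevity.

```python
import random

def evolve(fitness, random_chromosome, crossover, mutate,
           scale=20, cp=0.2, max_iter=50, seed=0):
    """Sketch of steps 2-9: elitist copy, roulette selection, crossover, mutation."""
    rng = random.Random(seed)
    # Step 2: initial population of `scale` random feasible chromosomes.
    population = [random_chromosome(rng) for _ in range(scale)]
    for _ in range(max_iter):                      # Step 4: iteration limit
        # Step 3: fitness of every chromosome.
        scores = [fitness(c) for c in population]
        # Step 6: copy the best scale*cp chromosomes unchanged.
        copy_num = int(scale * cp)
        ranked = sorted(zip(scores, range(scale)), reverse=True)
        next_gen = [population[i] for _, i in ranked[:copy_num]]
        # Steps 5, 7, 8: roulette-select parents, cross over, mutate.
        while len(next_gen) < scale:
            p1, p2 = rng.choices(population, weights=scores, k=2)
            next_gen.append(mutate(crossover(p1, p2, rng), rng))
        population = next_gen                      # Step 9: next round
    return max(population, key=fitness)

# Hypothetical toy problem: 6 tasks, 3 nodes, reward balanced assignments.
def random_chromosome(rng, n_tasks=6, n_nodes=3):
    return [rng.randrange(n_nodes) for _ in range(n_tasks)]

def toy_fitness(c, n_nodes=3):
    return 1.0 / (1 + max(c.count(j) for j in range(n_nodes)))

def one_point(p1, p2, rng):
    flag = rng.randrange(1, len(p1))
    return p1[:flag] + p2[flag:]

def flip(c, rng, rate=0.1, n_nodes=3):
    return [rng.randrange(n_nodes) if rng.random() < rate else g for g in c]

best = evolve(toy_fitness, random_chromosome, one_point, flip)
```

The loop stops only on the iteration limit, matching the termination condition in step 4; a convergence test would be an extension not described in the text.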
Further, in step 1 the user submits jobs to the cluster, and the cluster environment model is denoted G = {NodeSet, JobSet}, where NodeSet = {node_1, node_2, node_3, ..., node_n} is the set of node resources and JobSet = {Job_1, Job_2, Job_3, ..., Job_n} is the set of jobs. Each Job_i = {task_1, task_2, task_3, ..., task_n} (0 ≤ i < n), where the tasks include map tasks and reduce tasks. The tasks in JobSet are assigned to nodes in NodeSet for execution, so that the jobs as a whole run to completion;
The method for initializing the prediction model includes:
For Map and Reduce tasks, a TS(map/reduce) model is constructed. The model uses the following data format:
<FileSize, SplitSize, SplitNum, MapTime, ReduceTime>
where FileSize is the size of the current job, SplitSize is the job's split size, SplitNum is the job's number of splits, and MapTime and ReduceTime are the execution times of the job's Map stage and Reduce stage respectively. The execution time of a new task is then predicted by evaluating the attribute information of historical tasks;
The constructed TS(map/reduce) model is stored in the ResourceManager, and each NodeManager periodically passes its node attribute information to the ResourceManager through the heartbeat mechanism.
Further, in step 1,
The task matrix is initialized as follows:
The job set is written JobSet = {Job_1, Job_2, Job_3, ..., Job_n}, where each Job_i = {JobSize, SplitSize, SplitNum} (0 ≤ i < n); JobSize is the job size, SplitSize is the size of each split, and SplitNum is the number of splits. The node matrix is initialized as follows: the node set is NodeSet = {node_1, node_2, node_3, ..., node_n}, where node_i = {cpuSpeed_i, AllR_i, UsedR_i, Cnum_i, Load_i} (0 ≤ i < n); cpuSpeed_i is the CPU floating-point capability of node i, AllR_i is the total resources of the node's server, UsedR_i is the resources of the node's server already in use, and Cnum_i is the number of CPU cores of the node's server.
Further, in step 2 the chromosome matrix is generated as follows: the chromosome matrix is an n × t matrix whose n rows are indexed by task number and whose t columns are indexed by node number, denoted chromosomeMatrix = (chromosomeMatrix[i][j])n×t;
Each matrix element chromosomeMatrix[i][j] ∈ {0, 1} (0 ≤ i < n, 0 ≤ j < t). chromosomeMatrix[i][j] = 1 means that this chromosome assigns task i to node j for execution, and a value of 0 means that this chromosome does not assign the task to that node. Each task is assigned to exactly one node, so every row of the matrix must contain exactly one 1.
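A minimal sketch of generating one random feasible chromosome matrix and checking the one-node-per-task constraint. The function names are hypothetical and not taken from the patent.

```python
import random

def random_chromosome_matrix(n_tasks, n_nodes, rng):
    """Return an n_tasks x n_nodes 0/1 matrix with exactly one 1 per row."""
    matrix = [[0] * n_nodes for _ in range(n_tasks)]
    for i in range(n_tasks):
        matrix[i][rng.randrange(n_nodes)] = 1   # task i runs on one random node
    return matrix

def is_feasible(matrix):
    """Check the constraint: every task is assigned to exactly one node."""
    return all(sum(row) == 1 and set(row) <= {0, 1} for row in matrix)

chromosome = random_chromosome_matrix(5, 3, random.Random(42))
```

Building rows this way means every generated chromosome is feasible by construction, so no repair step is needed after initialization.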
Further, step 3 uses a dual fitness function based on optimal time span and on load balancing, so that during evolution the population moves toward the shortest task execution time while keeping the load on the cluster nodes balanced;
The fitness function based on time span is as follows. For a single job the execution time is T_job = T_map + T_reduce, where T_job is the execution time of one job. The cluster runs multiple jobs at the same time, and the job that finishes last determines the optimal time span of the allocation scheme encoded by the chromosome;
The fitness function based on optimal time span, F_time(c), therefore takes the largest T_job over the N jobs, where F_time(c) is the optimal time span of the c-th chromosome in the population, N is the number of jobs, and ChromosomeScale is the population size. The optimal time spans of all chromosomes in one iteration form the set:
AllF_time = {F_time(1), F_time(2), F_time(3), ..., F_time(c)}
where AllF_time is the set of time spans of all chromosomes in the current iteration, the index is the chromosome number, and each element is the time span of that chromosome.
Further, the execution times of Map tasks and Reduce tasks are calculated as follows:
A) The execution time of each Map task of a job is determined by its processing time and its data transfer time, where T_map(i, j, k) is the time for node j to execute the k-th split of job i, Split(i, k) (0 ≤ k < splitNum) is the size of the k-th split of job i, and cpuSpeed_j is the CPU speed of node j. If the split size differs from the file block size, the task may need file blocks transferred over the network to assemble one split; Block(i, k) is the size of the data block that the split's task must fetch from another node, and node(i, j) is the network transfer speed between the node storing task i and the executing node j. A large job is divided into several splits, with one Map task per split, and the Map tasks run in parallel, so the execution time of the whole Map stage is that of the last Map task to finish: for a job Job_i, T_map = Max(T_map(i, j, k));
B) For Reduce tasks, the execution time is predicted from the historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> held by the TS(map/reduce) model;
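The Map-stage calculation can be illustrated as follows. The formula itself is not reproduced in this text, so the sketch assumes the simplest reading of the definitions: processing time Split(i, k) / cpuSpeed_j plus transfer time Block(i, k) / node(i, j). The function and parameter names are hypothetical, and the units are arbitrary.

```python
def map_task_time(split_size, cpu_speed, remote_block_size=0.0, net_speed=None):
    """Processing time plus transfer time for one split (assumed additive)."""
    processing = split_size / cpu_speed
    # No transfer cost when the data is already local to the executing node.
    transfer = remote_block_size / net_speed if remote_block_size else 0.0
    return processing + transfer

def map_stage_time(task_times):
    """Splits run in parallel, so the stage ends with the slowest Map task."""
    return max(task_times)

times = [
    # Local split: 128 / 64 = 2.0
    map_task_time(128, cpu_speed=64),
    # Slower node plus a remote block: 128 / 32 + 50 / 100 = 4.5
    map_task_time(128, cpu_speed=32, remote_block_size=50, net_speed=100),
]
stage = map_stage_time(times)   # 4.5
```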
The fitness function based on load balancing is as follows:
In resource allocation, strategies that balance the load across nodes give higher cluster resource utilization. The invention designs a fitness function based on load balancing. After the population is initialized, the number of tasks in the task set JobSet is JobSet.length and the number of nodes in the node set NodeSet is NodeSet.length, so the average number of tasks assigned per node is:
AvgTask = JobSet.length / NodeSet.length
The standard deviation measures how far a group of values is dispersed around its mean; the smaller the standard deviation, the closer the values are to the mean and the more balanced the cluster load. The load-balancing fitness F_load(c) is the standard deviation of the number of tasks assigned to each node under chromosome c, that is, the chromosome's load-balancing fitness value. TaskNum(c, j) is the number of tasks assigned to the j-th node by the c-th chromosome, N is the total number of nodes, and AvgTask is the average number of tasks per node in the chromosome's allocation scheme. The load-balancing fitness values of all chromosomes in one iteration form the set:
AllF_load = {F_load(1), F_load(2), F_load(3), ..., F_load(i)}
where AllF_load is the set of load-balancing fitness values of all chromosomes in the current iteration, the index is the chromosome number, and each element is the load-balancing fitness value of that chromosome;
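A short sketch of the load-balancing measure, assuming the population form of the standard deviation (dividing by the node count N). The chromosome is held as a one-dimensional list in which `assignment[task]` is a node index; `load_fitness` is a hypothetical name.

```python
import math

def load_fitness(assignment, n_nodes):
    """Population standard deviation of the per-node task counts."""
    counts = [0] * n_nodes
    for node in assignment:          # assignment[task] = node index
        counts[node] += 1
    avg_task = len(assignment) / n_nodes
    return math.sqrt(sum((c - avg_task) ** 2 for c in counts) / n_nodes)

balanced = load_fitness([0, 1, 2, 0, 1, 2], n_nodes=3)   # counts 2, 2, 2
skewed = load_fitness([0, 0, 0, 0, 1, 2], n_nodes=3)     # counts 4, 1, 1
```

The balanced assignment scores 0, and the skewed one scores the square root of 2, so a smaller value indicates a more even load.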
Normalization: both fitness values are normalized with a standardization formula, giving the normalized time-span fitness F_time(k)* and the normalized load-balancing fitness F_load(k)*.
For a chromosome, F_time(k)* represents the execution time: the larger the value, the longer the execution time and the smaller the fitness should be. Likewise, F_load(k)* represents how far the number of tasks assigned to each node is dispersed around the average: the larger the value, the less balanced the cluster load and the smaller the fitness should be. For each chromosome, the fitness function based on both optimal time span and load balancing combines these two normalized values.
Further, prediction from the historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> proceeds as follows:
Step 1: let the job to be predicted be NewJob. First find in the TS(map/reduce) model the set of jobs whose size (FileSize) is close to that of NewJob, JobSet_1 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 2: within JobSet_1, find the set of jobs whose split size (SplitSize) and split count (SplitNum) match, JobSet_2 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 3: within JobSet_2, find the set of jobs whose Map task execution time is close to that of NewJob, JobSet_3 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 4: finally, calculate the Reduce-stage execution time of NewJob from the following quantities: T_reduce is the overall Reduce-stage execution time of the current job, T_map is the overall Map-stage execution time of the current job, AvgT_map is the average Map-stage execution time of all jobs in JobSet_3, and AvgT_reduce is the average Reduce-stage execution time of all jobs in JobSet_3. Because node load changes constantly, the same task on the same node takes different times at different moments, so a load adjustment parameter ω is added to balance task completion times under different loads; ω is the ratio of the current load to the historical average load.
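The three filtering passes and the final estimate might look like the sketch below. The formula for step 4 is not reproduced in this text, so the sketch assumes T_reduce = ω · T_map · AvgT_reduce / AvgT_map, which is one reading consistent with the symbol definitions. The relative tolerance `tol` used for "close to" is also an assumption, as are all names.

```python
from dataclasses import dataclass

@dataclass
class TSRecord:
    file_size: float
    split_size: float
    split_num: int
    map_time: float
    reduce_time: float

def predict_reduce_time(history, file_size, split_size, split_num,
                        map_time, omega=1.0, tol=0.2):
    """Filter history in three passes, then scale by the Reduce/Map ratio."""
    def near(a, b):
        return abs(a - b) <= tol * b          # assumed notion of "close to"

    set1 = [r for r in history if near(r.file_size, file_size)]
    set2 = [r for r in set1
            if r.split_size == split_size and r.split_num == split_num]
    set3 = [r for r in set2 if near(r.map_time, map_time)]
    if not set3:
        return None                           # no comparable history
    avg_map = sum(r.map_time for r in set3) / len(set3)
    avg_reduce = sum(r.reduce_time for r in set3) / len(set3)
    return omega * map_time * avg_reduce / avg_map   # assumed formula

history = [
    TSRecord(1000, 128, 8, 40.0, 20.0),
    TSRecord(1050, 128, 8, 44.0, 22.0),
    TSRecord(5000, 128, 40, 200.0, 90.0),     # filtered out: size too different
]
predicted = predict_reduce_time(history, 1024, 128, 8, map_time=42.0)   # 21.0
```

Returning `None` when no history matches leaves the fallback policy to the caller; the text does not say what the scheduler does in that case.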
Further, in step 5 the fitness probability matrix is calculated, where F_prob(i) is the probability that chromosome i is selected in the next iteration. The fitness probabilities are calculated after each iteration ends and before the next begins, and the values are mapped to a one-dimensional matrix with the following structure:
SelectionProbability = {F_prob(1), F_prob(2), F_prob(3), ..., F_prob(i)}
where SelectionProbability is the set of fitness probabilities of all chromosomes from the previous iteration, the index is the chromosome number, and each value is the fitness probability of that chromosome, that is, its probability of being selected in the next iteration. The entries of SelectionProbability must sum to 1.
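A sketch of the probability vector and one roulette-wheel draw. It assumes fitness-proportional selection, F_prob(i) = F(i) / ΣF, which is the usual roulette scheme; the patent's own formula is not reproduced in this text, and the function names are hypothetical.

```python
import random

def selection_probabilities(fitness_values):
    """Fitness-proportional probabilities; they sum to 1."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]

def roulette_pick(probabilities, rng):
    """Return the index of the chromosome hit by one spin of the wheel."""
    spin, cumulative = rng.random(), 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if spin < cumulative:
            return index
    return len(probabilities) - 1     # guard against rounding error

probs = selection_probabilities([1.0, 3.0, 4.0])   # [0.125, 0.375, 0.5]
picked = roulette_pick(probs, random.Random(0))
```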
The selection-copy operator satisfies CopyNum = Scale × cp and CrossoverNum = Scale × (1 − cp), where CrossoverNum is the number of chromosomes chosen by roulette for the crossover operation, CopyNum is the number of chromosomes copied directly, and cp is the copy ratio;
Before the crossover operator is applied, the two-dimensional matrix is first decoded into one-dimensional form:
chromosomeMatrix = [2, 3, 1, 4, 5, 7, 2, ..., 9];
where chromosomeMatrix is one chromosome, the index is the task number, and the element value is the node number; for example, chromosomeMatrix[1] = 3 means that task 1 is assigned to node 3. Parent chromosomes are crossed by a random complementary method: the selection operator first picks two parent chromosomes with high fitness, and both are cut at the same random position, recorded as flag. The father chromosome contributes chromosomeMatrix[0, flag] and the mother chromosome contributes chromosomeMatrix[flag, end]; the two segments are then joined to form the child chromosome;
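A minimal sketch of the random complementary crossover on the one-dimensional form. The mother chromosome's values are made up for illustration, and the cut point is restricted so that each parent contributes at least one gene, which the text does not state.

```python
import random

def crossover(father, mother, rng):
    """Join father[:flag] with mother[flag:] at a random cut point."""
    flag = rng.randrange(1, len(father))   # keep at least one gene from each
    return father[:flag] + mother[flag:], flag

father = [2, 3, 1, 4, 5, 7, 2, 9]
mother = [1, 1, 6, 6, 8, 8, 3, 3]          # hypothetical second parent
child, flag = crossover(father, mother, random.Random(1))
```

Because each position still holds exactly one node number, the child remains a feasible assignment without any repair.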
The mutation operator uses an adaptive mutation probability, where P_var(k) is the mutation probability of chromosome k, F_Adapt(max) is the maximum fitness value in the population, F_Adapt(avg) is the average fitness value in the population, F_Adapt(k) is the fitness value of chromosome k, and λ_min and λ_max are mutation factors that set the lower and upper bounds of the mutation rate.
Another object of the invention is to provide a Hadoop resource scheduling system that implements the Hadoop resource scheduling method.
In summary, the advantages and positive effects of the invention are:
To find a better resource allocation scheme, the invention proposes a Hadoop resource scheduling algorithm based on a multi-dimensional constrained genetic algorithm (Hadoop Resource Scheduler Based on Multi-dimensional Constrained Genetic Algorithm, MCGA) and implements a Hadoop resource scheduler with it.
The invention first initializes the prediction model, task matrix, and node matrix and constructs the dual fitness function, then defines the selection-copy, crossover, and mutation operators, and finally finds the globally optimal resource allocation scheme after multiple iterations. The data in Table 3 and the experimental results in Fig. 13, Fig. 14, and Fig. 15 show that the algorithm effectively improves cluster resource allocation efficiency, shortening the overall execution time of cluster tasks by about 20%.
Brief description of the drawings
Fig. 1 is a flowchart of the Hadoop resource scheduling method based on a multi-dimensional constrained genetic algorithm provided by an embodiment of the invention.
Fig. 2 is a flowchart of TS prediction model data acquisition provided by an embodiment of the invention.
Fig. 3 is a schematic diagram of the chromosome crossover operation provided by an embodiment of the invention.
Fig. 4 is a schematic diagram of the chromosome mutation operation provided by an embodiment of the invention.
Fig. 5 is a topology diagram of the cluster nodes provided by an embodiment of the invention.
Fig. 6 is the chart for ant colony algorithm experiment No. 2.1 provided by an embodiment of the invention.
Fig. 7 is the chart for ant colony algorithm experiment No. 4.1 provided by an embodiment of the invention.
Fig. 8 is the chart for genetic algorithm experiment No. 1.1 provided by an embodiment of the invention.
Fig. 9 is the chart for genetic algorithm experiment No. 2.1 provided by an embodiment of the invention.
Fig. 10 is the chart for genetic algorithm experiment No. 3.1 provided by an embodiment of the invention.
Fig. 11 is the chart for genetic algorithm experiment No. 4.1 provided by an embodiment of the invention.
Fig. 12 is the chart for genetic algorithm experiment No. 5.1 provided by an embodiment of the invention.
Fig. 13 is the chart for genetic algorithm experiment No. 6.1 provided by an embodiment of the invention.
Fig. 14 is a chart of the average task completion time of three schedulers on four task sets provided by an embodiment of the invention.
Fig. 15 is a chart of the variation over 20 runs of the second job set provided by an embodiment of the invention.
Fig. 16 is a chart of the variation over 20 runs of the third job set provided by an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. The specific embodiments described here only illustrate the invention and are not intended to limit it.
In the prior art, resource allocation efficiency is low and the overall execution time of cluster tasks is long. Across different application scenarios, existing schedulers cannot adequately meet users' needs for reasonable resource allocation and shorter task execution times.
To solve these problems, the invention is described in detail below with reference to a concrete scheme.
As shown in Fig. 1, the Hadoop resource scheduling method based on a multi-dimensional constrained genetic algorithm provided by an embodiment of the invention comprises:
1) After a user submits jobs to the cluster, initialize the prediction model, construct the task matrix and node matrix, and save the information and encoding results to a file;
2) Initial population construction: randomly generate a group of feasible solutions from the task matrix and node matrix as the initial chromosomes; the population size is denoted Scale;
3) Fitness calculation: compute the fitness value of each chromosome in the population with the fitness function;
4) Termination check: before entering the next round of iterative evolution, check whether the termination condition is met, that is, whether the iteration limit has been reached; if so, select the chromosome with the highest fitness in the current population as the optimal solution, otherwise start a new iteration;
5) Fitness probability calculation: from the fitness values, compute the probability that each chromosome is selected in the next round of evolution, then generate new chromosomes through the selection-copy, crossover, and mutation operations;
6) With copy ratio cp, the copy operator copies the Scale*cp chromosomes with the highest fitness in the population into the next iteration;
7) Repeatedly run the selection operator to pick two chromosomes as parents and apply the crossover operation to generate the remaining Scale*(1-cp) chromosomes;
8) Apply the mutation operation to the Scale*(1-cp) generated chromosomes, and pass the mutated chromosomes into the next iteration;
9) This completes one round of iterative evolution; return to step 3) for a new round.
The invention is further described below with reference to the parameter settings of the Hadoop resource scheduling algorithm based on a multi-dimensional constrained genetic algorithm.
The cluster environment model is denoted G = {NodeSet, JobSet}, where NodeSet = {node_1, node_2, node_3, ..., node_n} is the set of node resources and JobSet = {Job_1, Job_2, Job_3, ..., Job_n} is the set of jobs. Each Job_i = {task_1, task_2, task_3, ..., task_n} (0 ≤ i < n), where the tasks include both map tasks and reduce tasks. The scheduling algorithm ultimately assigns the tasks in JobSet to nodes in NodeSet for execution so that the overall job completion time is as short as possible.
The genetic algorithm parameters and the Hadoop resource scheduling model parameters are described further below.
1) Initializing the prediction model:
For Map and Reduce tasks, a TS(map/reduce) model is constructed. The model uses the following data format:
<FileSize, SplitSize, SplitNum, MapTime, ReduceTime>
where FileSize is the size of the current job, SplitSize is the job's split size, SplitNum is the job's number of splits, and MapTime and ReduceTime are the execution times of the job's Map stage and Reduce stage respectively. The execution time of a new task is then predicted by evaluating the attribute information of historical tasks.
The constructed TS(map/reduce) model is stored in the ResourceManager for use when the scheduling algorithm starts, and each NodeManager periodically passes its node attribute information to the ResourceManager through the heartbeat mechanism.
The data transfer model of the prediction model is shown in Fig. 2, the TS prediction model data acquisition flowchart.
2) Task matrix:
The job set is written JobSet = {Job_1, Job_2, Job_3, ..., Job_n}, where each Job_i = {JobSize, SplitSize, SplitNum} (0 ≤ i < n); JobSize is the job size, SplitSize is the size of each split, and SplitNum is the number of splits.
3) Node matrix:
The node set is NodeSet = {node_1, node_2, node_3, ..., node_n}, where node_i = {cpuSpeed_i, AllR_i, UsedR_i, Cnum_i, Load_i} (0 ≤ i < n); cpuSpeed_i is the CPU floating-point capability of node i, AllR_i is the total resources of the node's server, UsedR_i is the resources of the node's server already in use, and Cnum_i is the number of CPU cores of the node's server.
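The task and node records can be pictured as simple typed structures. This is only an illustrative layout: the field names mirror the symbols in the text, the sample values are invented, and `free_resources` is a hypothetical helper that the text does not define.

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_size: float      # JobSize
    split_size: float    # SplitSize
    split_num: int       # SplitNum

@dataclass
class Node:
    cpu_speed: float     # cpuSpeed, CPU floating-point capability
    all_r: float         # AllR, total resources
    used_r: float        # UsedR, resources in use
    c_num: int           # Cnum, CPU cores
    load: float          # Load (listed in the text without further definition)

    def free_resources(self):
        """Hypothetical helper: resources still available on the node."""
        return self.all_r - self.used_r

# Invented sample values, in arbitrary units.
job_set = [Job(1024, 128, 8), Job(512, 128, 4)]
node_set = [Node(2.4, 16, 4, 8, 0.25), Node(3.0, 32, 8, 16, 0.25)]
```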
4) Chromosome matrix:
Each evolution of the population produces several chromosomes. Every chromosome is a feasible solution of the current problem; a feasible solution contains multiple elements, and each element is called a gene of the chromosome. The chromosome matrix is an n × t matrix whose n rows are indexed by task number and whose t columns are indexed by node number, denoted chromosomeMatrix = (chromosomeMatrix[i][j])n×t.
Each matrix element chromosomeMatrix[i][j] ∈ {0, 1} (0 ≤ i < n, 0 ≤ j < t). chromosomeMatrix[i][j] = 1 means that this chromosome assigns task i to node j for execution, and a value of 0 means that this chromosome does not assign the task to that node. Each task can be assigned to only one node, so every row of the matrix must contain exactly one 1.
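Since each row holds a single 1, the matrix can be converted to and from a compact one-dimensional list in which the index is the task number and the value is the node number. A minimal sketch with hypothetical function names:

```python
def decode(matrix):
    """2-D 0/1 matrix -> 1-D list where result[task] is the node index."""
    return [row.index(1) for row in matrix]

def encode(assignment, n_nodes):
    """1-D assignment list -> 2-D 0/1 chromosome matrix."""
    return [[1 if j == node else 0 for j in range(n_nodes)]
            for node in assignment]

matrix = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
assignment = decode(matrix)          # [1, 0, 2, 1]
```

The one-dimensional form is the more convenient one for cutting and joining chromosomes, while the matrix form makes the one-node-per-task constraint easy to check.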
5) Dual fitness function:
The fitness function controls the direction of population evolution. The present invention adopts a dual fitness function based on the optimal time span and on load balancing, so that during evolution the population moves toward the shortest task execution time while keeping the load of the cluster nodes balanced.
In embodiments of the present invention, the dual fitness function specifically comprises a fitness function based on time span and a fitness function based on load balancing.
In embodiments of the present invention, the fitness function based on time span is as follows:
For a single job, the completion time is determined jointly by the completion times of its Map and Reduce tasks. For the cluster as a whole, each chromosome is one resource allocation scheme, and the optimal time span of that scheme is determined by the execution time of the job that finishes last. The execution time of a job is calculated as:
$T_{job} = T_{map} + T_{reduce}$ (formula 2)
In embodiments of the present invention, the execution times of Map tasks and Reduce tasks are calculated as follows:
A) For a Map task, the present invention takes into account that the node running a task and the node storing its file may differ within the cluster. The Map running time is therefore determined by the task processing time and the resource transmission time. The processing time depends mainly on the CPU capability of the node running the task; the transmission time depends on the network transfer speed between the node storing the task's data and the node running the task. The execution time of each Map task of a job is therefore calculated as follows:
$$T_{map}(i, j, k) = \frac{Split(i, k)}{cpuSpeed_j} + \frac{Block(i, k)}{node(i, j)}$$ (formula 3)
where $T_{map}(i, j, k)$ is the time taken when the k-th split of job i is assigned to node j for execution, $Split(i, k)$ (0 ≤ k < SplitNum) is the size of the k-th split of job i, and $cpuSpeed_j$ is the CPU speed of node j. If the split size differs from the file block size, the task may need to fetch file blocks over the network to assemble a split; $Block(i, k)$ is the amount of data that the split task must transfer from other nodes, and $node(i, j)$ is the network transfer speed between the node storing the data of task i and the executing node j. A large job may be divided into several splits, with one Map task per split, all executed in parallel. The Map task that finishes last therefore determines the execution time of the whole Map phase, so for a job $job_i$, $T_{map} = \max(T_{map}(i, j, k))$.
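A minimal Python sketch of the Map-time calculation follows. It assumes that processing time and transfer time simply add, as the description of the two components suggests; the function names and the tuple layout are hypothetical.

```python
def map_task_time(split_size, cpu_speed, block_size, net_speed):
    """Processing time plus transfer time for one split (assumed additive form)."""
    return split_size / cpu_speed + block_size / net_speed

def map_phase_time(splits):
    """Map tasks run in parallel, so the phase lasts as long as the slowest split.

    `splits` is a list of (split_size, cpu_speed, block_size, net_speed) tuples,
    one per split of the job.
    """
    return max(map_task_time(*s) for s in splits)
```

For instance, a split processed locally (no remote data) on a fast node and a split that must fetch data on a slower node give different times, and only the larger one counts toward the Map phase.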
B) For a Reduce task, the execution time is predicted from the historical records <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> held in the TS(map/reduce) model. The specific steps are as follows:
Step 1: Let the job to be predicted be NewJob. First, search the TS(map/reduce) model for the set of jobs whose size (FileSize) is close to that of NewJob, $JobSet_1 = \{job_1, job_2, job_3, \ldots, job_k\}$;
Step 2: Within $JobSet_1$, find the set of jobs whose split size (SplitSize) and split count (SplitNum) match those of NewJob, $JobSet_2 = \{job_1, job_2, job_3, \ldots, job_k\}$;
Step 3: Within $JobSet_2$, find the set of jobs whose Map execution time is close to that of NewJob, $JobSet_3 = \{job_1, job_2, job_3, \ldots, job_k\}$;
Step 4: Finally, calculate the Reduce-phase execution time of NewJob according to the following formula:
where $T_{reduce}$ is the overall Reduce-phase execution time of the current job, $T_{map}$ is the overall Map-phase execution time of the current job, $AvgT_{map}$ is the average Map-phase execution time of all jobs in $JobSet_3$, and $AvgT_{reduce}$ is the average Reduce-phase execution time of all jobs in $JobSet_3$. Because node load changes continuously, the same task on the same node can take different times at different moments, so a load adjustment parameter ω is added to balance task completion times under different loads; ω is the ratio of the load at the current moment to the historical average load.
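The four steps above can be sketched in Python as follows. The final expression, $T_{reduce} = \omega \cdot T_{map} \cdot AvgT_{reduce} / AvgT_{map}$, is an assumption that is consistent with the symbol definitions but is not confirmed by the text. The tolerances `size_tol` and `map_tol` that define "close" are hypothetical parameters, as are the function name and the dictionary layout.

```python
from statistics import mean

def predict_reduce_time(new_job, history, t_map, omega, size_tol=0.1, map_tol=0.1):
    """Filter the history in three steps, then scale the Reduce average.

    new_job and each history record are dicts with the keys FileSize, SplitSize,
    SplitNum; history records also carry MapTime and ReduceTime.
    """
    # Step 1: jobs whose size is close to the new job's size.
    set1 = [h for h in history
            if abs(h["FileSize"] - new_job["FileSize"]) <= size_tol * new_job["FileSize"]]
    # Step 2: jobs with the same split size and split count.
    set2 = [h for h in set1
            if h["SplitSize"] == new_job["SplitSize"]
            and h["SplitNum"] == new_job["SplitNum"]]
    # Step 3: jobs whose Map time is close to the new job's Map time.
    set3 = [h for h in set2 if abs(h["MapTime"] - t_map) <= map_tol * t_map]
    if not set3:
        return None  # no comparable history; the text does not cover this case
    avg_map = mean(h["MapTime"] for h in set3)
    avg_reduce = mean(h["ReduceTime"] for h in set3)
    # Step 4 (assumed form): scale the historical Reduce average by the Map ratio and omega.
    return omega * t_map * avg_reduce / avg_map
```

Returning `None` when no comparable job exists is a design choice of this sketch; a scheduler would need some fallback estimate in that case.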
According to (formula 3) and (formula 4), (formula 2) can be converted into the following calculation model:
$T_{job}$ is the execution time of one job. The cluster runs multiple jobs at the same time, and the job that finishes last determines the optimal time span of the chromosome's allocation scheme.
The fitness function based on the optimal time span is therefore expressed as:
$$F_{time}(c) = \max_{1 \le i \le N} T_{job}(i), \quad 1 \le c \le ChromosomeScale$$
where $F_{time}(c)$ is the optimal time span of the c-th chromosome in the population, N is the number of jobs, and ChromosomeScale is the population size. The optimal time spans of all chromosomes in one iteration are expressed as:
$AllF_{time} = \{F_{time}(1), F_{time}(2), F_{time}(3), \ldots, F_{time}(c)\}$
where $AllF_{time}$ is the set of time spans of all chromosomes in the current iteration; the index identifies the chromosome and the element value is that chromosome's time span.
In embodiments of the present invention, the fitness function based on load balancing is as follows:
In resource allocation, the more evenly an allocation strategy loads the nodes, the higher the cluster resource utilization. The present invention designs a fitness function based on load balancing. After population initialization, let the number of tasks in the task set JobSet be JobSet.length and the number of nodes in the node set NodeSet be NodeSet.length; the average number of tasks assigned per node is then:
$$AvgTask = \frac{JobSet.length}{NodeSet.length}$$
The present invention uses the standard deviation to measure how far a group of data is dispersed from its mean: the smaller the standard deviation, the closer the values are to the mean and the more balanced the cluster load. The load-balancing fitness function is therefore expressed as:
$$F_{load}(c) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(TaskNum(c, j) - AvgTask\right)^2}$$
where $F_{load}(c)$ is the standard deviation of the number of tasks assigned to each node by this chromosome, that is, its load-balancing fitness value; $TaskNum(c, j)$ is the number of tasks assigned to the j-th node by the c-th chromosome; N is the total number of nodes; and AvgTask is the average number of tasks assigned per node in this chromosome's allocation scheme. The load-balancing fitness values of all chromosomes in one iteration are expressed as:
$AllF_{load} = \{F_{load}(1), F_{load}(2), F_{load}(3), \ldots, F_{load}(i)\}$
where $AllF_{load}$ is the set of load-balancing fitness values of all chromosomes in the current iteration; the index identifies the chromosome and the element value is that chromosome's load-balancing fitness value.
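As an illustration, the load-balancing measure can be computed from the one-dimensional form of a chromosome. The sketch below assumes the population standard deviation (division by the number of nodes) and a hypothetical function name.

```python
from math import sqrt

def load_fitness(assignment, num_nodes):
    """Population standard deviation of per-node task counts.

    `assignment` is the 1-D chromosome: assignment[task] = node index.
    """
    counts = [0] * num_nodes
    for node in assignment:
        counts[node] += 1
    avg_task = len(assignment) / num_nodes
    return sqrt(sum((c - avg_task) ** 2 for c in counts) / num_nodes)
```

A perfectly even allocation yields 0, and the value grows as tasks pile up on fewer nodes, so a smaller value means a better-balanced scheme.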
C) Normalization:
The optimal-time-span fitness function and the load-balancing fitness function are different evaluation indices with different dimensions and units. To eliminate the effect of dimension between the indices, the data must be standardized. The present invention adopts min-max (deviation) standardization, using the following standardization formula:
$$x^* = \frac{x - \min}{\max - \min}$$
After normalization, the time-span fitness is expressed as:
$$F_{time}(k)^* = \frac{F_{time}(k) - \min(AllF_{time})}{\max(AllF_{time}) - \min(AllF_{time})}$$
After normalization, the load-balancing fitness is expressed as:
$$F_{load}(k)^* = \frac{F_{load}(k) - \min(AllF_{load})}{\max(AllF_{load}) - \min(AllF_{load})}$$
For a chromosome, $F_{time}(k)^*$ represents the execution time: the larger the value, the longer the execution time and the smaller the fitness should be. Likewise, $F_{load}(k)^*$ represents the dispersion of per-node task counts relative to the average: the larger the value, the more unbalanced the cluster load and the smaller the fitness should be. For each chromosome, the fitness function based on both the optimal time span and load balancing is therefore expressed as:
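The normalization step can be sketched in Python as below. Min-max scaling is what deviation standardization normally denotes. The way the two normalized values are combined is not recoverable from this text, so `combined_fitness` uses a purely hypothetical form (one minus a weighted sum) that only respects the stated requirement that larger normalized values give smaller fitness; the weights `w_time` and `w_load` are likewise assumptions.

```python
def min_max(values):
    """Min-max normalization to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_fitness(all_f_time, all_f_load, w_time=0.5, w_load=0.5):
    """Hypothetical combination: 1 minus a weighted sum of the normalized costs."""
    t_norm = min_max(all_f_time)
    l_norm = min_max(all_f_load)
    return [1.0 - (w_time * t + w_load * l) for t, l in zip(t_norm, l_norm)]
```

With this particular form, a chromosome that is worst on both indices gets fitness 0 and would never be picked by roulette selection; whether that is desirable depends on the combination actually used.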
6) Fitness probability matrix:
The fitness probability matrix gives, for every chromosome, the probability that it is selected in the next iteration, calculated from its fitness; the larger the fitness, the larger the selection probability. It is calculated as follows:
$$F_{prob}(i) = \frac{F_{Adapt}(i)}{\sum_{k=1}^{ChromosomeScale} F_{Adapt}(k)}$$
where $F_{prob}(i)$ is the probability that chromosome i is selected in the next iteration. The fitness probabilities are calculated after each iteration ends and before the next one begins, and the values are mapped to a one-dimensional matrix with the following structure:
$SelectionProbability = \{F_{prob}(1), F_{prob}(2), F_{prob}(3), \ldots, F_{prob}(i)\}$
where SelectionProbability is the set of fitness probabilities of all chromosomes in the previous iteration; the index identifies the chromosome, and the value is that chromosome's fitness probability, that is, the probability that it is chosen in the next iteration. SelectionProbability must therefore satisfy
$$\sum_{i} F_{prob}(i) = 1$$
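A minimal Python sketch of the probability calculation and a single roulette-wheel draw follows. It assumes fitness-proportional probabilities and a strictly positive total fitness; the function names are hypothetical.

```python
import random

def selection_probabilities(fitness):
    """Each chromosome's share of the total fitness; the shares sum to 1."""
    total = sum(fitness)
    return [f / total for f in fitness]

def roulette_select(probabilities, rng=random):
    """Spin the wheel once and return the index of the selected chromosome."""
    spin = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if spin < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point round-off
```

Walking the cumulative sum gives each chromosome a slice of the wheel proportional to its probability, so fitter chromosomes are chosen more often without excluding weaker ones entirely.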
7) Selection-copy operator:
After the chromosome fitness values have been calculated, the algorithm enters the iterative loop. First, the selection operator picks two chromosomes with high fitness to enter the crossover operation. The selection operator in the present invention uses roulette wheel selection (RWS), with each individual's selection probability taken from the fitness probability matrix. To ensure that excellent chromosomes are passed on to the next generation, and to prevent crossover and mutation from destroying them and causing the iteration to degenerate, the present invention adds a copy operator to the selection operator: each iteration sets a copy ratio, and the chromosomes with the highest fitness in the previous generation are copied unchanged into the new generation. The copy ratio keeps the population evolving in the right direction and thus ensures the stability of the algorithm.
The selection-copy operator must satisfy the following formula:
$$CrossoverNum + CopyNum = ChromosomeScale, \quad CopyNum = cp \times ChromosomeScale$$
where CrossoverNum is the number of chromosomes chosen by roulette wheel for crossover, CopyNum is the number of chromosomes copied directly, and cp is the copy ratio. The copy ratio should not be set too large, because the algorithm then does not converge easily. Experiments show that cp = 0.2 works well.
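The split between copied and crossed-over chromosomes can be sketched as below. Rounding the copy count to the nearest integer is an assumption of this sketch, and both function names are hypothetical.

```python
def split_generation(scale, cp):
    """Return (copy_num, crossover_num) so that the two counts add up to scale."""
    copy_num = round(scale * cp)
    return copy_num, scale - copy_num

def elite_indices(fitness, copy_num):
    """Indices of the copy_num fittest chromosomes, best first."""
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    return ranked[:copy_num]
```

With a population of 100 and cp = 0.2, 20 chromosomes are carried over unchanged and the remaining 80 are produced by crossover.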
8) Crossover operator:
Crossover is the main way the population generates new individuals. The present invention encodes chromosomes as two-dimensional matrices. For crossover, the two-dimensional matrix is first decoded into a one-dimensional form:
chromosomeMatrix = [2, 3, 1, 4, 5, 7, 2, ..., 9];
where chromosomeMatrix represents one chromosome, the index is the task number, and the element value is the node number; for example, chromosomeMatrix[1] = 3 means that task 1 is assigned to node 3. The present invention crosses the parent chromosomes by a random complementary method: the selection operator first selects two parent chromosomes with high fitness, and a random cut position common to both is recorded as flag. The segment chromosomeMatrix[0, flag] is taken from the father chromosome and the segment chromosomeMatrix[flag, end] from the mother chromosome, and the two segments are recombined to form the child chromosome. The crossover process is shown in Fig. 3 (chromosome crossover operation).
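A minimal Python sketch of this crossover on the one-dimensional form is given below. It assumes half-open slices (the father contributes genes before `flag`, the mother contributes genes from `flag` onward) and chromosomes with at least two genes; the function name is hypothetical.

```python
import random

def crossover(father, mother, rng=random):
    """Single-point crossover on 1-D chromosomes of equal length."""
    if len(father) != len(mother):
        raise ValueError("parents must have the same number of genes")
    flag = rng.randrange(1, len(father))  # cut strictly inside the chromosome
    return father[:flag] + mother[flag:], flag
```

Because every gene is a node number for one task, the child is automatically a feasible allocation: each task still maps to exactly one node.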
9) Mutation operator:
The present invention uses an adaptive mutation probability, in which the mutation probability is adjusted linearly according to where the chromosome's fitness lies between the population average and maximum, so that the algorithm converges faster to the global optimum as it approaches the optimal solution. The adaptive mutation probability is calculated as follows:
where $P_{var}(k)$ is the mutation probability of chromosome k, $F_{Adapt}(max)$ is the maximum fitness value of the chromosomes in the population, $F_{Adapt}(avg)$ is their average fitness value, and $F_{Adapt}(k)$ is the fitness value of chromosome k. $\lambda_{min}$ and $\lambda_{max}$ are mutation factors that set the lower and upper bounds of the mutation rate ($\lambda_{min}, \lambda_{max} \in (0, 1)$). The formula is explained as follows:
1) If a chromosome's fitness is high, at or above the average, its mutation probability should be reduced to keep its good genes from being destroyed. That is, when $F_{Adapt}(k) \ge F_{Adapt}(avg)$, the mutation probability is computed dynamically by formula (15), so that the higher the fitness, the lower the mutation rate. Repeated experiments show that $\lambda_{min} = 0.005$ works well.
2) If a chromosome's fitness is low, below the average, it is given a larger mutation probability to strengthen the population's global search ability. When $F_{Adapt}(k) < F_{Adapt}(avg)$, it is given the maximum mutation probability $\lambda_{max}$. Repeated experiments show that $\lambda_{max} = 0.05$ works well.
In the mutation operation, contiguous gene positions are mutated in order to reduce the amount of computation: the starting mutation position is determined first, then the number of genes to mutate is calculated, and the mutation is carried out.
The mutation process is as follows:
1) Mutation decision: generate a random number $P_{rand}(k)$ in [0, 1] and compare it with the chromosome's mutation probability; if $P_{rand}(k) < P_{var}(k)$, perform mutation on this chromosome.
2) Mutation count: to keep a chromosome from mutating so many genes that the algorithm does not converge easily, the number of mutated genes must satisfy the following constraint:
$0 < VarNum \le P_{var}(k) \times chromosomeMatrix.length$
where VarNum is the number of consecutive genes allowed to mutate and chromosomeMatrix.length is the number of genes in the chromosome. A random integer within the range permitted by the VarNum constraint is used as the number of genes to mutate.
3) Mutation position: generate a random number $P_{index}$ within the chromosome length chromosomeMatrix.length as the mutation position; the mutation segment is the mutation position followed by the mutation count. If the segment runs past the end of the chromosome, the remaining genes are mutated starting from the first gene.
4) Mutation: once the mutation segment is determined, the gene values are changed at random to strengthen the population's global search ability. For example, suppose the cluster has 100 tasks and 10 nodes, chromosome k satisfies $P_{rand}(k) < P_{var}(k)$, and VarNum = 3 and $P_{index}$ = 5 have been calculated; then three genes are mutated starting from index 5 of the chromosome.
The mutation process is shown in Fig. 4 (chromosome mutation operation).
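The mutation steps can be sketched in Python as follows. The exact adaptive formula is not recoverable from this text, so `mutation_probability` assumes a common linear form that equals $\lambda_{max}$ at the average fitness and falls to $\lambda_{min}$ at the maximum fitness; treating a population with identical fitness values as the $\lambda_{max}$ case is also an assumption. The function names are hypothetical, and node numbers are taken to start at 0.

```python
import random

def mutation_probability(f_k, f_avg, f_max, lam_min=0.005, lam_max=0.05):
    """Assumed linear form: lam_max at the average fitness, lam_min at the maximum."""
    if f_k < f_avg or f_max == f_avg:
        return lam_max
    return lam_max - (lam_max - lam_min) * (f_k - f_avg) / (f_max - f_avg)

def mutate(genes, p_var, num_nodes, rng=random):
    """Mutate one contiguous, wrap-around segment of a 1-D chromosome in place.

    Returns the list of mutated positions (empty if no mutation happened).
    """
    if rng.random() >= p_var:            # 1) mutation decision
        return []
    limit = int(p_var * len(genes))      # 2) 0 < VarNum <= P_var * length
    if limit < 1:
        return []
    var_num = rng.randint(1, limit)
    p_index = rng.randrange(len(genes))  # 3) start position
    positions = [(p_index + offset) % len(genes) for offset in range(var_num)]
    for pos in positions:                # 4) assign random node numbers
        genes[pos] = rng.randrange(num_nodes)
    return positions
```

The modulo in the position list implements the wrap-around rule: a segment that runs past the last gene continues from the first one. With 100 genes and a mutation probability of 0.05, at most 5 consecutive genes can change, which matches the VarNum = 3 example.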
The invention is further described below with reference to experiments.
The present invention verifies the efficiency and effectiveness of the algorithm with a two-part experiment.
The first part uses simulation to verify the algorithm's advantage in running efficiency over the ant colony optimization algorithm, and to obtain the optimal parameter settings of the algorithm;
The second part builds a Hadoop cluster environment and uses the HiBench benchmark tool to verify the advantage of the MCGA scheduler of the present invention in overall task completion time over AntScheduler, a resource scheduler based on the ant colony algorithm, and the Capacity scheduler shipped with Hadoop, demonstrating the correctness and effectiveness of the algorithm.
First part:
1) Experimental environment
A) Environment for evaluating algorithm running efficiency
The simulation program is written in JavaScript, runs on the Chrome V8 engine, and uses ECharts to generate charts.
B) Environment for evaluating algorithm effectiveness
The effectiveness and correctness of the algorithm must be verified on a Hadoop cluster. The experimental environment is a Hadoop cluster built from 5 servers, each with 2 cores and 6 GB of memory, for a total of 10 cores and 30 GB of memory in the cluster. It comprises one NameNode, one ResourceManager, 4 DataNodes, and 4 NodeManagers. The cluster topology is shown in Fig. 5 (cluster node topology).
2) Evaluation of algorithm running efficiency
To keep the test environment stable, both algorithms use the same task and node settings in the simulation: 100 tasks of fixed size and 10 nodes of fixed capability. The running efficiency of the algorithms is compared under different conditions by repeatedly adjusting the algorithm parameters, finally yielding the optimal parameter settings.
A) The parameter settings and experimental results for the ant colony algorithm are recorded as follows:
Table 1: Ant colony algorithm experiment parameters
Experiments No. 2.1 and No. 4.1, which have a shorter task completion time and a shorter algorithm execution time respectively, are shown in Fig. 6 (ant colony algorithm experiment No. 2.1) and Fig. 7 (ant colony algorithm experiment No. 4.1).
B) The parameter settings and experimental results for the genetic algorithm of the present invention are recorded as follows:
Table 2: Genetic algorithm experiment parameters
The experimental results under different parameter settings are shown in Figs. 8 to 13.
The experimental results for the AntScheduler algorithm and the MCGA algorithm are analyzed as follows:
AntScheduler algorithm: the experiments show that the algorithm converges well but easily mistakes a local optimum for the global optimum, and its execution time is relatively long. Its stability is also poor: even for the same set of tasks and the same set of nodes, the overall task execution time of the final allocation scheme varies.
MCGA algorithm: the experiments show that the MCGA algorithm has a shorter running time and higher execution efficiency. It is also more stable than the AntScheduler algorithm: for the same set of tasks and nodes, the overall task execution time of the final allocation scheme is roughly the same. The experimental data show that the more iterations and the more chromosomes, the more likely the global optimum is found; the larger the copy ratio, the harder it is to converge and to reach the global optimum. The data also show that experiments No. 1.1 to 1.4 have a clear advantage over the other experiments in algorithm execution time, convergence, and task execution time. The optimal MCGA parameters are therefore 100 iterations, a population size (number of chromosomes) of 100, and a copy ratio of 0.2.
It can be seen that the MCGA algorithm of the present invention meets the requirements for running efficiency.
3) Evaluation of algorithm effectiveness
The task execution times of the MCGA scheduler of the present invention, the ant-colony-based resource scheduler AntScheduler, and the default Hadoop Capacity scheduler are compared below on four groups of job sets. To avoid data error, each task execution time is the average of 5 runs, as shown in the following table.
Table 3: Task completion times under different job sets
The corresponding bar chart is shown in Fig. 14 (average task completion time of the three schedulers on the four job sets).
According to the table above, the advantage of the algorithm is not obvious for small job sets, because a small job set is divided into few map and reduce tasks and cluster resources are relatively plentiful. For large job sets, there are relatively many map and reduce tasks, competition for resources intensifies, and the performance advantage of the algorithm emerges.
Second part:
To verify the stability of the algorithm, the MCGA, AntScheduler, and Capacity schedulers each run the second and third job sets 20 times, and the variation in task execution time is observed.
Figs. 15 and 16 show that, for the same job set, the task execution times of the Capacity and AntScheduler schedulers fluctuate widely. For the Capacity scheduler, runs 6, 11, 14, and 17 on the second job set and runs 4, 8, 11, 13, and 18 on the third job set are outliers: at these points its execution time is relatively long and differs sharply from the neighboring points. This is caused by its first-come, first-served resource allocation: if tasks are assigned to poorly performing nodes, the cluster load becomes unbalanced and the overall task execution time varies considerably. The MCGA scheduler of the present invention allocates resources with an intelligent optimization algorithm, tuning dynamically and allocating intelligently according to the job set and the node resources, so its overall task completion time fluctuates less and its performance is more stable.
The experimental analysis above shows that the present invention is robust: for both large and small job sets, it outperforms the resource scheduling algorithm based on the ant colony algorithm and the default Hadoop YARN scheduling algorithm, and it is an effective resource allocation method.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A Hadoop resource scheduling method, characterized in that the Hadoop resource scheduling method comprises:
after a user submits a job to the cluster, initializing the prediction model, the task matrix, and the node matrix, and constructing the dual fitness function;
formulating the selection-copy operator, the crossover operator, and the mutation operator;
obtaining the globally optimal resource allocation after multiple iterations.
2. The Hadoop resource scheduling method of claim 1, characterized in that the Hadoop resource scheduling method comprises:
Step 1: after a user submits a job to the cluster, initialize the prediction model, construct the task matrix and the node matrix, and save the information and the encoding result to a file;
Step 2: construct the initial population: randomly generate the initial feasible-solution chromosomes from the task matrix and the node matrix, the population size being denoted Scale;
Step 3: fitness calculation: calculate the fitness value of each chromosome in the population with the fitness function;
Step 4: termination check: before entering the next iteration, check whether the termination condition is met, that is, whether the iteration limit has been reached; if so, select the chromosome with the highest fitness in the current population as the optimal solution; otherwise enter a new iteration;
Step 5: fitness probability calculation: calculate from the fitness values the probability that each chromosome is selected in the next evolution, then enter the selection-copy, crossover, and mutation operations to generate new chromosomes;
Step 6: in the copy operator, copy the Scale*cp chromosomes with the highest fitness in the population of size Scale into the next iteration according to the copy ratio cp;
Step 7: execute the selection operator in a loop to select two chromosomes as parents and enter the crossover operation, generating the remaining Scale*(1-cp) chromosomes;
Step 8: perform the mutation operation on the generated Scale*(1-cp) chromosomes, and pass the mutated chromosomes into the next iteration;
Step 9: return to step 3 for a new round of iterative evolution.
3. The Hadoop resource scheduling method of claim 2, characterized in that, in step 1, when a user submits a job to the cluster, the cluster environment model is denoted G = {NodeSet, JobSet}, where $NodeSet = \{node_1, node_2, node_3, \ldots, node_n\}$ is the set of node resources; $JobSet = \{job_1, job_2, job_3, \ldots, job_n\}$ is the job set, and each $job_i = \{task_1, task_2, task_3, \ldots, task_n\}$ (0 ≤ i < n), where the tasks comprise map tasks and reduce tasks; the tasks in JobSet are assigned to nodes in NodeSet for execution, so that the whole job runs;
the method of initializing the prediction model comprises:
constructing a TS(map/reduce) model for Map and Reduce tasks, the model using the following data format:
<FileSize, SplitSize, SplitNum, MapTime, ReduceTime>
where FileSize is the size of the job, SplitSize is the split size of the job, SplitNum is the number of splits of the job, and MapTime and ReduceTime are the execution times of the job's Map and Reduce phases respectively; the execution time of a new task is then predicted by evaluating the attribute information of historical tasks;
the constructed TS(map/reduce) model is stored in the ResourceManager, and each NodeManager periodically passes node attribute information to the ResourceManager through the heartbeat mechanism.
4. The Hadoop resource scheduling method of claim 2, characterized in that, in step 1,
the method of initializing the task matrix comprises:
the job set is denoted $JobSet = \{job_1, job_2, job_3, \ldots, job_n\}$, each $job_i = \{JobSize, SplitSize, SplitNum\}$ (0 ≤ i < n), where JobSize is the job size, SplitSize is the size of each split, and SplitNum is the number of splits; the method of initializing the node matrix comprises: the node set is denoted $NodeSet = \{node_1, node_2, node_3, \ldots, node_n\}$, each $node_i = \{cpuSpeed_i, AllR_i, UsedR_i, Cnum_i, Load_i\}$ (0 ≤ i < n), where $cpuSpeed_i$ is the CPU floating-point capability of node i, $AllR_i$ is the total resources of the node server, $UsedR_i$ is the resources already in use on the node server, and $Cnum_i$ is the number of CPU cores of the node server.
5. The Hadoop resource scheduling method of claim 2, characterized in that, in step 2, the method of generating the chromosome matrix comprises: the chromosome matrix is an n × t matrix whose n rows index the tasks and whose t columns index the nodes, denoted $chromosomeMatrix = (chromosomeMatrix[i][j])_{n \times t}$;
where each matrix element $chromosomeMatrix[i][j] \in \{0, 1\}$ (0 ≤ i < n, 0 ≤ j < t); $chromosomeMatrix[i][j] = 1$ means that this chromosome assigns task i to node j for execution, and a value of 0 means that this chromosome does not assign the task to that node; each task is assigned to only one node for execution, so the following condition is satisfied:
$$\sum_{j=0}^{t-1} chromosomeMatrix[i][j] = 1, \quad 0 \le i < n$$
6. The Hadoop resource scheduling method of claim 2, characterized in that, in step 3, a dual fitness function based on the optimal time span and on load balancing is used, so that during evolution the population moves toward the shortest task execution time while keeping the load of the cluster nodes balanced;
the fitness function based on time span comprises: for the execution time of a job,
$T_{job}$ is the execution time of one job; the cluster runs multiple jobs at the same time, and the job that finishes last determines the optimal time span of the chromosome's allocation scheme;
the fitness function based on the optimal time span is expressed as:
$$F_{time}(c) = \max_{1 \le i \le N} T_{job}(i), \quad 1 \le c \le ChromosomeScale$$
where $F_{time}(c)$ is the optimal time span of the c-th chromosome in the population, N is the number of jobs, and ChromosomeScale is the population size; the optimal time spans of all chromosomes in one iteration are expressed as:
$AllF_{time} = \{F_{time}(1), F_{time}(2), F_{time}(3), \ldots, F_{time}(c)\}$
where $AllF_{time}$ is the set of time spans of all chromosomes in the current iteration; the index identifies the chromosome and the element value is that chromosome's time span.
7. The Hadoop resource scheduling method of claim 6, characterized in that the execution times of Map tasks and Reduce tasks are calculated as follows:
A) the execution time of each Map task of a job is calculated as:
$$T_{map}(i, j, k) = \frac{Split(i, k)}{cpuSpeed_j} + \frac{Block(i, k)}{node(i, j)}$$
where $T_{map}(i, j, k)$ is the time taken when the k-th split of job i is assigned to node j for execution, $Split(i, k)$ (0 ≤ k < SplitNum) is the size of the k-th split of job i, and $cpuSpeed_j$ is the CPU speed of node j; if the split size differs from the file block size, the task may need to fetch file blocks over the network to assemble a split; $Block(i, k)$ is the amount of data that the split task must transfer from other nodes, and $node(i, j)$ is the network transfer speed between the node storing the data of task i and the executing node j; a large job is divided into several splits, with one Map task per split, all executed in parallel, and the Map task that finishes last determines the execution time of the whole Map phase, so for a job $job_i$, $T_{map} = \max(T_{map}(i, j, k))$;
B) for a Reduce task, the execution time is predicted from the historical records <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> held in the TS(map/reduce) model;
the fitness function based on load balancing comprises:
in resource allocation, the more evenly an allocation strategy loads the nodes, the higher the cluster resource utilization; for the load-balancing fitness function, after population initialization, let the number of tasks in the task set JobSet be JobSet.length and the number of nodes in the node set NodeSet be NodeSet.length; the average number of tasks assigned per node is then:
$$AvgTask = \frac{JobSet.length}{NodeSet.length}$$
the standard deviation measures how far a group of data is dispersed from its mean: the smaller the standard deviation, the closer the values are to the mean and the more balanced the cluster load; the load-balancing fitness function is expressed as:
$$F_{load}(c) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(TaskNum(c, j) - AvgTask\right)^2}$$
where $F_{load}(c)$ is the standard deviation of the number of tasks assigned to each node by this chromosome, that is, its load-balancing fitness value; $TaskNum(c, j)$ is the number of tasks assigned to the j-th node by the c-th chromosome; N is the total number of nodes; and AvgTask is the average number of tasks assigned per node in this chromosome's allocation scheme; the load-balancing fitness values of all chromosomes in one iteration are expressed as:
$AllF_{load} = \{F_{load}(1), F_{load}(2), F_{load}(3), \ldots, F_{load}(i)\}$
where $AllF_{load}$ is the set of load-balancing fitness values of all chromosomes in the current iteration; the index identifies the chromosome and the element value is that chromosome's load-balancing fitness value;
normalization: the following standardization formula is used:
$$x^* = \frac{x - \min}{\max - \min}$$
after normalization, the time-span fitness is expressed as:
$$F_{time}(k)^* = \frac{F_{time}(k) - \min(AllF_{time})}{\max(AllF_{time}) - \min(AllF_{time})}$$
after normalization, the load-balancing fitness is expressed as:
$$F_{load}(k)^* = \frac{F_{load}(k) - \min(AllF_{load})}{\max(AllF_{load}) - \min(AllF_{load})}$$
for a chromosome, $F_{time}(k)^*$ represents the execution time: the larger the value, the longer the execution time and the smaller the fitness should be; likewise, $F_{load}(k)^*$ represents the dispersion of per-node task counts relative to the average: the larger the value, the more unbalanced the cluster load and the smaller the fitness should be; for each chromosome, the fitness function based on both the optimal time span and load balancing is expressed as:
8. The Hadoop resource regulating method as claimed in claim 7, characterized in that
the method of predicting from the constructed historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime>
comprises:
Step 1: assume the job to be predicted is NewJob; first, in the $TS_{(map/reduce)}$ model, find the set of jobs whose
size (FileSize) is close to that of the current job NewJob, $JobSet_1 = \{Job_1, Job_2, Job_3, \ldots, Job_k\}$;
Step 2: then, in $JobSet_1$, find the set of jobs with the same fragment size (SplitSize) and fragment count (SplitNum),
$JobSet_2 = \{Job_1, Job_2, Job_3, \ldots, Job_k\}$;
Step 3: then, in $JobSet_2$, find the set of jobs whose Map task execution time is close to that of the current job NewJob,
$JobSet_3 = \{Job_1, Job_2, Job_3, \ldots, Job_k\}$;
Step 4: finally, calculate the execution time of the Reduce-stage tasks of the current job NewJob according to the following formula:
where $T_{reduce}$ denotes the overall execution time of the Reduce stage of the current job, $T_{map}$ denotes the overall
execution time of the Map stage of the current job, $AvgT_{map}$ denotes the average Map-stage execution time of all jobs in $JobSet_3$, and $AvgT_{reduce}$ denotes
the average Reduce-stage execution time of all jobs in $JobSet_3$. Because node load changes constantly, even the
same task on the same node has different execution times at different moments, so a load adjustment parameter ω is added
to balance task completion times under different loads; ω is the ratio of the current load to the historical average load.
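The four prediction steps can be sketched in Python as a chain of filters followed by a ratio estimate. Several details below are assumptions rather than statements from the claim: "close" is modelled as a relative tolerance `tol`, and the final estimate is assumed to take the form T_reduce = ω · T_map · AvgT_reduce / AvgT_map. The names `JobRecord`, `close`, and `predict_reduce_time` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    """One historical record <FileSize, SplitSize, SplitNum, MapTime, ReduceTime>."""
    file_size: float
    split_size: int
    split_num: int
    map_time: float
    reduce_time: float


def close(a, b, tol):
    """True if a is within a relative tolerance of b (assumed similarity test)."""
    return abs(a - b) <= tol * max(abs(b), 1e-12)


def predict_reduce_time(history, file_size, split_size, split_num,
                        map_time, omega, tol=0.1):
    """Predict the Reduce-stage time of a new job from similar past jobs."""
    # Step 1: jobs with a similar input size.
    job_set1 = [j for j in history if close(j.file_size, file_size, tol)]
    # Step 2: jobs with the same fragment size and fragment count.
    job_set2 = [j for j in job_set1
                if j.split_size == split_size and j.split_num == split_num]
    # Step 3: jobs with a similar Map-stage execution time.
    job_set3 = [j for j in job_set2 if close(j.map_time, map_time, tol)]
    if not job_set3:
        return None  # no comparable history available
    avg_map = mean(j.map_time for j in job_set3)
    avg_reduce = mean(j.reduce_time for j in job_set3)
    # Step 4 (assumed form): scale by the historical reduce/map ratio and omega.
    return omega * map_time * avg_reduce / avg_map


history = [
    JobRecord(1000.0, 128, 8, 100.0, 50.0),
    JobRecord(1050.0, 128, 8, 110.0, 55.0),
    JobRecord(5000.0, 128, 40, 400.0, 300.0),
]
predicted = predict_reduce_time(history, 1020.0, 128, 8, 105.0, omega=1.0)
```

With ω = 1 the estimate reduces to the historical Reduce/Map ratio applied to the new job's Map time; a value of ω above 1 would lengthen the estimate when the current load exceeds the historical average.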
9. The Hadoop resource regulating method as claimed in claim 2, characterized in that, in step 5, the fitness probability matrix
is calculated as follows:
where $F_{prob}(i)$ denotes the probability that chromosome i is selected in the next iteration round. The fitness probability is calculated after each iteration round
ends and before the next round starts, and its values are mapped to a one-dimensional matrix with the following structure:
$$SelectionProbability = \{F_{prob}(1), F_{prob}(2), F_{prob}(3), \ldots, F_{prob}(i)\}$$
where SelectionProbability denotes the set of fitness probabilities of all chromosomes in the previous iteration round; the index
is the chromosome number, and each value is the fitness probability of that chromosome, i.e. its probability of being selected in the next iteration round;
SelectionProbability must satisfy the following condition:
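A minimal Python sketch of roulette-wheel selection over such a probability vector follows. It assumes that each selection probability is the chromosome's fitness divided by the total fitness of the population, so the probabilities sum to 1, and that all fitness values are positive. The function names are hypothetical.

```python
import random


def selection_probability(fitness):
    """Map fitness values to selection probabilities (assumed proportional)."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(probabilities, rng):
    """Return the index of one chromosome chosen by roulette-wheel selection."""
    r = rng.random()            # uniform in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    # Guard against floating-point rounding leaving r just above the total.
    return len(probabilities) - 1


probs = selection_probability([1.0, 3.0, 4.0])
chosen = roulette_select(probs, random.Random(0))
```

Passing in a seeded `random.Random` instance keeps the selection reproducible, which is convenient when comparing scheduling runs.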
The selection-copy operator satisfies the following formula:
where CrossoverNum denotes the number of chromosomes chosen by roulette-wheel selection to undergo crossover,
CopyNum denotes the number of chromosomes copied directly, and cp is the copy ratio;
When the crossover operator performs crossover, the two-dimensional matrix is first decoded into one-dimensional form:
chromosomeMatrix = [2, 3, 1, 4, 5, 7, 2, ..., 9];
where chromosomeMatrix denotes one chromosome, the index denotes the task number, and the element value denotes the node number; for example,
chromosomeMatrix[1] = 3 means that task 1 is assigned to node 3 for execution. A random complementary method is used to cross the parent
chromosomes: two parent chromosomes with high fitness are first selected by the selection operator, and the same random cut position, recorded as flag, is chosen in both;
the father chromosome contributes the segment chromosomeMatrix[0, flag] and the mother chromosome contributes the segment chromosomeMatrix[flag,
end], and the two segments are then recombined to form the child chromosome;
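The single-point complementary crossover described above can be sketched as follows. The sketch assumes Python half-open slices for the two segments (the source does not say whether position flag itself belongs to the father or the mother segment), and the function name `crossover` is hypothetical.

```python
import random


def crossover(father, mother, rng):
    """Build a child from the head of the father and the tail of the mother.

    Both parents are one-dimensional chromosomes (index = task number,
    value = node number) of equal length, with at least two genes.
    """
    if len(father) != len(mother):
        raise ValueError("parent chromosomes must have the same length")
    # Pick a cut position strictly inside the chromosome so that each
    # parent contributes at least one gene.
    flag = rng.randrange(1, len(father))
    child = father[:flag] + mother[flag:]
    return child, flag


father = [2, 3, 1, 4, 5]
mother = [1, 1, 2, 2, 3]
child, flag = crossover(father, mother, random.Random(1))
```

Because every position of the child is taken from exactly one parent, each task still has exactly one assigned node, so the child is always a valid allocation plan.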
The adaptive mutation probability of the mutation operator is calculated as follows:
where $P_{var}(k)$ denotes the mutation probability of chromosome k, $F_{Adapt}(max)$ denotes the maximum fitness value among the chromosomes of the population, $F_{Adapt}(avg)$
denotes the average fitness value of the chromosomes of the population, $F_{Adapt}(k)$ denotes the fitness value of chromosome k, and $\lambda_{min}$ and $\lambda_{max}$ are the mutation
factors that control the lower and upper bounds of the mutation rate.
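One common way to build an adaptive mutation probability from these quantities is sketched below. The specific formula is an assumption, not the one claimed: chromosomes below the population average mutate at the upper bound λmax, and the probability falls linearly to λmin as fitness rises from the average to the maximum. The function name `mutation_probability` is hypothetical.

```python
def mutation_probability(f_k, f_max, f_avg, lam_min, lam_max):
    """Adaptive mutation probability for one chromosome (assumed form).

    f_k     fitness of chromosome k
    f_max   maximum fitness in the population
    f_avg   average fitness in the population
    lam_min lower bound of the mutation rate
    lam_max upper bound of the mutation rate
    """
    if f_k < f_avg or f_max == f_avg:
        # Weak chromosomes (or a fully converged population) mutate at
        # the highest allowed rate to restore diversity.
        return lam_max
    # Strong chromosomes mutate less: linear interpolation between bounds.
    return lam_max - (lam_max - lam_min) * (f_k - f_avg) / (f_max - f_avg)


p_best = mutation_probability(1.0, 1.0, 0.5, lam_min=0.01, lam_max=0.1)
p_mid = mutation_probability(0.75, 1.0, 0.5, lam_min=0.01, lam_max=0.1)
p_weak = mutation_probability(0.2, 1.0, 0.5, lam_min=0.01, lam_max=0.1)
```

Under this assumed form the result always lies between λmin and λmax, so the two mutation factors act as hard bounds on the mutation rate.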
10. A Hadoop resource scheduling system for implementing the Hadoop resource regulating method as claimed in claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910340000.9A CN110109753A (en) | 2019-04-25 | 2019-04-25 | Resource regulating method and system based on various dimensions constraint genetic algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110109753A (en) | 2019-08-09 |
Family
ID=67486753
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910340000.9A Pending CN110109753A (en) | 2019-04-25 | 2019-04-25 | Resource regulating method and system based on various dimensions constraint genetic algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110109753A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737126A (en) * | 2012-06-19 | 2012-10-17 | 合肥工业大学 | Classification rule mining method under cloud computing environment |
US20130117752A1 (en) * | 2011-11-07 | 2013-05-09 | Sap Ag | Heuristics-based scheduling for data analytics |
CN103106253A (en) * | 2013-01-16 | 2013-05-15 | 西安交通大学 | Data balance method based on genetic algorithm in MapReduce calculation module |
CN103902375A (en) * | 2014-04-11 | 2014-07-02 | 北京工业大学 | Cloud task scheduling method based on improved genetic algorithm |
CN105550033A (en) * | 2015-11-17 | 2016-05-04 | 北京交通大学 | Genetic-tabu hybrid algorithm based resource scheduling policy method in private cloud environment |
CN106383746A (en) * | 2016-08-30 | 2017-02-08 | 北京航空航天大学 | Configuration parameter determination method and apparatus of big data processing system |
CN106936892A (en) * | 2017-01-09 | 2017-07-07 | 北京邮电大学 | A kind of self-organizing cloud multi-to-multi computation migration method and system |
US20170220944A1 (en) * | 2016-01-29 | 2017-08-03 | Peter P. Nghiem | Best trade-off point on an elbow curve for optimal resource provisioning and performance efficiency |
CN107172166A (en) * | 2017-05-27 | 2017-09-15 | 电子科技大学 | The cloud and mist computing system serviced towards industrial intelligentization |
CN107273209A (en) * | 2017-06-09 | 2017-10-20 | 北京工业大学 | The Hadoop method for scheduling task of improved adaptive GA-IAGA is clustered based on minimum spanning tree |
CN107273197A (en) * | 2017-06-14 | 2017-10-20 | 北京工业大学 | Hadoop method for scheduling task based on the improved spectral clustering genetic algorithm of orthogonal experiment |
CN108881432A (en) * | 2018-06-15 | 2018-11-23 | 广东省城乡规划设计研究院 | Cloud computing cluster load dispatching method based on GA algorithm |
Non-Patent Citations (3)
Title |
---|
ZHEN TANG et al.: "IO dependent SSD cache allocation for elastic Hadoop applications", SCIENCE CHINA (INFORMATION SCIENCES) * |
Jia Ruiyu et al.: "Parallel genetic k-means clustering algorithm based on the MapReduce model", Computer Engineering and Design * |
Chen Yaojie et al.: "Research on a Task Planning Method for Multi-Ship Cooperative Driving", JOURNAL OF SHANGHAI JIAOTONG UNIVERSITY (SCIENCE) * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781003B (en) * | 2019-10-24 | 2023-04-07 | 重庆邮电大学 | Load balancing method for particle swarm fusion variation control |
CN110781003A (en) * | 2019-10-24 | 2020-02-11 | 重庆邮电大学 | Load balancing method for particle swarm fusion variation control |
CN112990515A (en) * | 2019-12-02 | 2021-06-18 | 中船重工信息科技有限公司 | Workshop resource scheduling method based on heuristic optimization algorithm |
CN111325498A (en) * | 2020-01-21 | 2020-06-23 | 北京邮电大学 | User route generation method and device for VRPSPD, electronic equipment and storage medium |
CN111325498B (en) * | 2020-01-21 | 2023-04-18 | 北京邮电大学 | User route generation method and device for VRPSPD, electronic equipment and storage medium |
CN111400050A (en) * | 2020-03-30 | 2020-07-10 | 绿盟科技集团股份有限公司 | Method and device for allocating resources to execute tasks |
CN111400050B (en) * | 2020-03-30 | 2023-09-19 | 绿盟科技集团股份有限公司 | Method and device for allocating resources to execute tasks |
CN112486651A (en) * | 2020-11-30 | 2021-03-12 | 中国电子科技集团公司第十五研究所 | Cloud test platform task scheduling method based on improved genetic algorithm |
CN112561434A (en) * | 2020-12-18 | 2021-03-26 | 上海交通大学宁波人工智能研究院 | Joint scheduling method and auxiliary scheduling system for traditional container terminal |
CN112667405A (en) * | 2021-01-05 | 2021-04-16 | 田宇 | Information processing method, device, equipment and storage medium |
CN113139710B (en) * | 2021-01-05 | 2022-03-08 | 中国电子科技集团公司第二十九研究所 | Multi-resource parallel task advanced plan scheduling method based on genetic algorithm |
CN113139710A (en) * | 2021-01-05 | 2021-07-20 | 中国电子科技集团公司第二十九研究所 | Multi-resource parallel task advanced plan scheduling method based on genetic algorithm |
CN113127167A (en) * | 2021-03-18 | 2021-07-16 | 国家卫星气象中心(国家空间天气监测预警中心) | Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm |
CN113127167B (en) * | 2021-03-18 | 2023-11-03 | 国家卫星气象中心(国家空间天气监测预警中心) | Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm |
CN113568746A (en) * | 2021-07-27 | 2021-10-29 | 北京达佳互联信息技术有限公司 | Load balancing method and device, electronic equipment and storage medium |
CN113641471B (en) * | 2021-07-30 | 2024-02-02 | 平安科技(深圳)有限公司 | Soft load scheduling method, device, equipment and medium based on genetic algorithm model |
CN113641471A (en) * | 2021-07-30 | 2021-11-12 | 平安科技(深圳)有限公司 | Soft load scheduling method, device, equipment and medium based on genetic algorithm model |
CN113727450A (en) * | 2021-08-13 | 2021-11-30 | 中国科学院计算技术研究所 | Network slice wireless resource allocation method based on resource isolation and reuse |
CN113727450B (en) * | 2021-08-13 | 2024-03-08 | 中国科学院计算技术研究所 | Network slice wireless resource allocation method based on resource isolation and multiplexing |
CN114356564A (en) * | 2021-12-29 | 2022-04-15 | 四川大学 | System for integrating service resources |
CN116089823A (en) * | 2023-03-29 | 2023-05-09 | 成都信息工程大学 | Intelligent community visual real-time supervision method based on big data |
CN117272838B (en) * | 2023-11-17 | 2024-02-02 | 恒海云技术集团有限公司 | Government affair big data platform data acquisition optimization method |
CN117272838A (en) * | 2023-11-17 | 2023-12-22 | 恒海云技术集团有限公司 | Government affair big data platform data acquisition optimization method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190809 |