CN110109753A - Resource scheduling method and system based on a multi-dimensional constrained genetic algorithm

Info

Publication number
CN110109753A
Authority
CN
China
Prior art keywords
node
chromosome
task
job
fitness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910340000.9A
Other languages
Chinese (zh)
Inventor
张路桥
滕彩峰
李飞
王娟
韩斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN201910340000.9A
Publication of CN110109753A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of data processing and discloses a resource scheduling method and system based on a multi-dimensional constrained genetic algorithm. After the prediction model, the task matrix, and the node matrix are initialized, a dual fitness function is constructed; a selection-copy operator, a crossover operator, and a mutation operator are then defined; after multiple iterations, the globally optimal resource allocation is obtained. To find a better resource allocation scheme, the invention proposes a Hadoop resource scheduling algorithm based on a multi-dimensional constrained genetic algorithm and implements a Hadoop resource scheduler with it. The algorithm effectively improves the resource allocation efficiency of the cluster and shortens the overall execution time of cluster tasks by about 20%.

Description

Resource scheduling method and system based on a multi-dimensional constrained genetic algorithm
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a resource scheduling method and system based on a multi-dimensional constrained genetic algorithm; specifically, a Hadoop resource scheduling method and system based on a multi-dimensional constrained genetic algorithm.
Background
The closest prior art is currently as follows.
Resource scheduling is a combinatorial optimization problem whose ultimate goal is to assign every task in the cluster to the most suitable node so that overall cluster performance is optimal. Hadoop YARN provides three built-in resource scheduling algorithms and implements the corresponding resource schedulers: the FIFO, Capacity, and Fair schedulers. However, as application scenarios keep expanding (for example iterative, interactive, and real-time computing), these schedulers do not satisfy users' need to allocate resources reasonably and reduce task execution time.
In summary, the problems of the prior art are:
(1) Resource allocation efficiency is low, and the overall execution time of cluster tasks is long.
(2) Across different application scenarios, existing schedulers do not satisfy users' need to allocate resources reasonably and reduce task execution time.
Difficulty of solving the above technical problems:
Resource scheduling is a combinatorial optimization problem. The difficulty lies in assigning all tasks in the cluster to the most suitable nodes while taking both cluster load and task execution time into account, so that overall cluster performance is optimal. In addition, when a multi-dimensional constrained genetic algorithm is applied to Hadoop resource scheduling, many experiments are needed to find the optimal parameter settings of the algorithm.
Significance of solving the above technical problems:
Studying existing resource scheduling schemes and designing a better resource scheduling algorithm promotes the development of big data and cloud computing technology, and is of great significance for improving overall system resource utilization and the overall performance of the Hadoop platform.
Summary of the invention
In view of the problems of the prior art, the present invention provides a resource scheduling method and system based on a multi-dimensional constrained genetic algorithm.
The invention is realized as a Hadoop resource scheduling method comprising:
initializing the prediction model, the task matrix, and the node matrix, and then constructing a dual fitness function;
defining a selection-copy operator, a crossover operator, and a mutation operator;
performing multiple iterations to obtain the globally optimal resource allocation.
Further, the Hadoop resource scheduling method comprises:
Step 1: after a user submits jobs to the cluster, initialize the prediction model, construct the task matrix and the node matrix, and save the information and the encoding result to a file;
Step 2: initial population construction: randomly generate the initial feasible-solution chromosomes from the task matrix and the node matrix; the population size is denoted Scale;
Step 3: fitness calculation: compute the fitness value of each chromosome in the population with the fitness function;
Step 4: termination check: before entering the next round of iterative evolution, check whether the termination condition is met, that is, whether the iteration limit has been reached; if so, select the chromosome with the highest fitness in the current population as the optimal solution; otherwise, start a new iteration;
Step 5: fitness probability calculation: from the fitness values, compute the probability that each chromosome is selected in the next round of evolution, then proceed to the selection-copy, crossover, and mutation operations to generate new chromosomes;
Step 6: in the copy operator, according to the copy ratio cp, copy the Scale*cp chromosomes with the highest fitness in the population of size Scale into the next iteration;
Step 7: repeatedly run the selection operator to pick two chromosomes as parent chromosomes, and apply the crossover operation to generate the remaining Scale*(1-cp) chromosomes;
Step 8: apply the mutation operation to the Scale*(1-cp) generated chromosomes, and pass the mutated chromosomes into the next iteration;
Step 9: return to step 3 for a new round of iterative evolution.
Further, in step 1, the user submits jobs to the cluster, and the cluster environment model is denoted G = {NodeSet, JobSet}, where NodeSet = {node_1, node_2, node_3, ..., node_n} is the set of node resources and JobSet = {Job_1, Job_2, Job_3, ..., Job_n} is the set of jobs. Each Job_i = {task_1, task_2, task_3, ..., task_n} (0 ≤ i < n), where the tasks include map tasks and reduce tasks. The tasks in JobSet are assigned to nodes in NodeSet for execution, and the jobs are run as a whole;
The method for initializing the prediction model comprises:
For Map and Reduce tasks, a TS(map/reduce) model is constructed, which uses the following data format:
<FileSize, SplitSize, SplitNum, MapTime, ReduceTime>
where FileSize denotes the size of the current job, SplitSize the split size of the job, SplitNum the number of splits, and MapTime and ReduceTime the execution times of the job's Map phase and Reduce phase, respectively; the execution time of a new task is then predicted by evaluating the attribute information of historical tasks;
The constructed TS(map/reduce) model is stored in the ResourceManager, and each NodeManager periodically sends its node attribute information to the ResourceManager through the heartbeat mechanism.
Further, in step 1,
the task matrix is initialized as follows:
The job set is expressed as JobSet = {Job_1, Job_2, Job_3, ..., Job_n}, where each Job_i = {JobSize, SplitSize, SplitNum} (0 ≤ i < n); JobSize denotes the job size, SplitSize the size of each split, and SplitNum the number of splits.
The node matrix is initialized as follows: the node set is NodeSet = {node_1, node_2, node_3, ..., node_n}, where node_i = {cpuSpeed_i, AllR_i, UsedR_i, Cnum_i, Load_i} (0 ≤ i < n); cpuSpeed_i denotes the CPU floating-point capability of node i, AllR_i the total resources of that node server, UsedR_i the resources in use on that node server, and Cnum_i the number of CPU cores of that node server.
Further, in step 2, the chromosome matrix is generated as follows: the chromosome matrix is an n × t matrix whose n rows correspond to task numbers and whose t columns correspond to node numbers, denoted chromosomeMatrix = (chromosomeMatrix[i][j])_{n×t},
where each element chromosomeMatrix[i][j] ∈ {0, 1} (0 ≤ i < n, 0 ≤ j < t). chromosomeMatrix[i][j] = 1 means that this chromosome assigns task i to node j for execution, and a value of 0 means that this chromosome does not assign the task to that node. Each task is assigned to exactly one node, so the following condition must hold for every task i: \sum_{j=0}^{t-1} chromosomeMatrix[i][j] = 1.
Further, in step 3, a dual fitness function based on the optimal time span and on load balancing is used, so that the population evolves toward the shortest task execution time while keeping the load of the cluster nodes balanced;
The fitness function based on time span comprises: the execution time of a Job is
T_job = T_map + T_reduce
T_job is the execution time of one job. The cluster runs multiple jobs at the same time, and the job that finishes last determines the optimal time span of the chromosome's allocation scheme;
The fitness function based on the optimal time span, from the definitions that follow, has the form:
F_time(c) = \max_{1 \le i \le N} T_job(i), 1 \le c \le ChromosomeScale
where F_time(c) denotes the optimal time span of the c-th chromosome in the population, N the number of jobs, and ChromosomeScale the population size. The set of optimal time spans of all chromosomes in one iteration is expressed as:
AllF_time = {F_time(1), F_time(2), F_time(3), ..., F_time(c)}
where AllF_time is the set of time spans of all chromosomes in the current iteration; the index is the chromosome number, and the element value is that chromosome's time span.
Further, the execution times of Map tasks and Reduce tasks are calculated as follows:
A) The execution time of each Map task of a job, from the definitions that follow, has the form:
T_map(i, j, k) = Split(i, k) / cpuSpeed_j + Block(i, k) / node(i, j)
where T_map(i, j, k) is the time for node j to execute the k-th split of job i, Split(i, k) (0 ≤ k < splitNum) is the size of the k-th split of job i, and cpuSpeed_j is the CPU speed of node j. If the split size differs from the file block size, the task may need to fetch file blocks over the network to assemble one split; Block(i, k) is the size of the data blocks that the split task must transfer from other nodes, and node(i, j) is the network transfer speed between the node storing task i and the executing node j. A large job is divided into several splits, each split corresponds to one Map task, and the Map tasks run in parallel; the Map task that finishes last determines the execution time of the whole Map phase, so for a Job_i, T_map = Max(T_map(i, j, k));
B) For Reduce tasks, the execution time is predicted from the historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> recorded by the TS(map/reduce) model;
The fitness function based on load balancing comprises:
In resource allocation, an allocation strategy that balances node load more evenly yields higher cluster resource utilization. The invention designs a fitness function based on load balancing. After population initialization, the number of tasks in the task set JobSet is written JobSet.length and the number of nodes in the node set NodeSet is written NodeSet.length, so the average number of tasks assigned per node is:
AvgTask = JobSet.length / NodeSet.length
The standard deviation measures how far a set of values is dispersed from its mean: the smaller the standard deviation, the closer the values are to the mean and the more balanced the cluster load. The load-balancing fitness function is expressed as:
F_load(c) = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} (TaskNum(c, j) - AvgTask)^2 }
where F_load(c) is the standard deviation of the number of tasks assigned to each node by this chromosome, that is, its load-balancing fitness value; TaskNum(c, j) is the number of tasks assigned to the j-th node by the c-th chromosome; N is the total number of nodes; and AvgTask is the average number of tasks per node in this chromosome's allocation scheme. The set of load-balancing fitness values of all chromosomes in one iteration is expressed as:
AllF_load = {F_load(1), F_load(2), F_load(3), ..., F_load(i)}
where AllF_load is the set of load-balancing fitness values of all chromosomes in the current iteration; the index is the chromosome number, and the element value is that chromosome's load-balancing fitness value;
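The load-balancing measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes the one-dimensional encoding in which position i holds the node number of task i, and it assumes the population standard deviation (dividing by the number of nodes N). The function name load_fitness is hypothetical.

```python
import math

def load_fitness(assignment, n_nodes):
    """Standard deviation of per-node task counts (lower means more balanced).

    assignment[i] is the node number (0-based) that task i is assigned to.
    Dividing by n_nodes (population standard deviation) is an assumption.
    """
    counts = [0] * n_nodes
    for node in assignment:
        counts[node] += 1
    avg_task = len(assignment) / n_nodes  # AvgTask = JobSet.length / NodeSet.length
    variance = sum((c - avg_task) ** 2 for c in counts) / n_nodes
    return math.sqrt(variance)
```

A perfectly even assignment yields 0, and the value grows as tasks pile up on a few nodes, which is why a smaller value should map to a higher overall fitness.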
Normalization: the following standardization formula is used: [formula not reproduced in this text]
After normalization, the time-span fitness function is expressed as: [formula not reproduced in this text]
After normalization, the load-balancing fitness function is expressed as: [formula not reproduced in this text]
For a chromosome, F_time(k)* represents the execution time: the larger the value, the longer the execution time, and the smaller the fitness should be. Likewise, F_load(k)* represents the dispersion of the per-node task counts from the average: the larger the value, the more unbalanced the cluster load, and the smaller the fitness should be. For each chromosome, the fitness function based on both the optimal time span and load balancing is expressed as: [formula not reproduced in this text]
Further, the prediction based on the recorded historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> comprises:
Step 1: let the job to be predicted be NewJob; first, in the TS(map/reduce) model, find the set of jobs whose size (FileSize) is close to that of NewJob: JobSet_1 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 2: then, in JobSet_1, find the set of jobs whose split size (SplitSize) and number of splits (SplitNum) are consistent: JobSet_2 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 3: then, in JobSet_2, find the set of jobs whose Map task execution time is close to that of NewJob: JobSet_3 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 4: finally, compute the execution time of the Reduce phase of NewJob with the following formula: [formula not reproduced in this text]
where T_reduce is the overall execution time of the Reduce phase of the current Job, T_map is the overall execution time of its Map phase, AvgT_map is the average Map-phase execution time of all jobs in JobSet_3, and AvgT_reduce is the average Reduce-phase execution time of all jobs in JobSet_3. Because node load changes continuously, the same task can take different times on the same node at different moments, so a load adjustment parameter ω is added to balance task completion time under different loads; ω is the ratio of the current load to the historical average load.
Further, in step 5, the fitness probability matrix is calculated as follows: [formula not reproduced in this text]
where F_prob(i) is the probability that chromosome i is selected in the next iteration. The fitness probabilities are computed after each iteration ends and before the next one starts, and the values are mapped to a one-dimensional matrix with the following structure:
SelectionProbability = {F_prob(1), F_prob(2), F_prob(3), ..., F_prob(i)}
where SelectionProbability is the set of fitness probabilities of all chromosomes from the previous iteration; the index is the chromosome number, and each value is that chromosome's fitness probability, that is, its probability of being selected in the next iteration. SelectionProbability must satisfy \sum_i F_prob(i) = 1.
The selection-copy operator satisfies:
CopyNum = Scale × cp, CrossoverNum = Scale × (1 - cp)
where CrossoverNum is the number of chromosomes chosen by roulette-wheel selection for the crossover operation, CopyNum is the number of chromosomes copied directly, and cp is the copy ratio;
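As an illustration of the selection-copy bookkeeping, the sketch below splits a population of size Scale into copied and crossed-over chromosomes and performs one roulette-wheel draw. Fitness-proportional selection probabilities are an assumption here (the text only states that the probabilities sum to 1 and that selection is by roulette); the function names are hypothetical, and fitness values are assumed to be non-negative with a positive sum.

```python
import random

def split_counts(scale, cp):
    """Return (CopyNum, CrossoverNum) for population size `scale` and copy ratio `cp`."""
    copy_num = int(scale * cp)
    return copy_num, scale - copy_num

def selection_probabilities(fitness_values):
    """Assumed fitness-proportional probabilities; they sum to 1 by construction."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]

def roulette_pick(probabilities, rng):
    """Return the index of the chromosome hit by one spin of the roulette wheel."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point round-off
```

Rounding CopyNum down and giving the remainder to CrossoverNum keeps the population size constant even when Scale*cp is not an integer.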
When the crossover operator performs crossover, the two-dimensional matrix is first decoded into one-dimensional form:
chromosomeMatrix = [2, 3, 1, 4, 5, 7, 2, ..., 9];
where chromosomeMatrix denotes one chromosome, the index is the task number, and the element value is the node number; for example, chromosomeMatrix[1] = 3 means that task 1 is assigned to node 3. Parent chromosomes are crossed by random complementary crossover: the selection operator first picks two parent chromosomes with high fitness, a cut position common to both is chosen at random and recorded as flag, the segment chromosomeMatrix[0, flag] is taken from the father chromosome and the segment chromosomeMatrix[flag, end] from the mother chromosome, and the two segments are joined to form the child chromosome;
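The decoding and crossover just described can be sketched as follows. This is an illustrative reading, not the patented code: it assumes 0-based node numbers, treats the father segment as half-open (positions 0 up to but not including flag), and requires chromosomes with at least two genes. The names decode and crossover are hypothetical.

```python
import random

def decode(matrix):
    """Turn an n x t 0/1 chromosome matrix into a list: index = task, value = node."""
    return [row.index(1) for row in matrix]

def crossover(father, mother, rng):
    """Single cut-point crossover: father's head joined with mother's tail."""
    flag = rng.randrange(1, len(father))  # cut position shared by both parents
    return father[:flag] + mother[flag:]
```

Because every position of the child still holds exactly one node number, the child automatically satisfies the one-node-per-task constraint and needs no repair step.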
The adaptive mutation rate of the mutation operator is calculated as follows: [formula not reproduced in this text]
where P_var(k) is the mutation probability of chromosome k, F_Adapt(max) is the maximum fitness value among the chromosomes of the population, F_Adapt(avg) is the average fitness value of the population, F_Adapt(k) is the fitness value of chromosome k, and λ_min and λ_max are mutation factors that set the lower and upper bounds of the mutation rate.
Another object of the present invention is to provide a Hadoop resource scheduling system that implements the Hadoop resource scheduling method.
In summary, the advantages and positive effects of the present invention are:
To find a better resource allocation scheme, a Hadoop resource scheduling algorithm based on a multi-dimensional constrained genetic algorithm (Hadoop Resource Scheduler Based on Multi-dimensional Constrained Genetic Algorithm, MCGA) is proposed, and a Hadoop resource scheduler is implemented with this algorithm.
The invention first initializes the prediction model, the task matrix, and the node matrix and constructs a dual fitness function, then defines the selection-copy, crossover, and mutation operators, and finally finds the globally optimal resource allocation scheme after multiple iterations. The data in Table 3 and the experimental results in Fig. 13, Fig. 14, and Fig. 15 show that the algorithm effectively improves the resource allocation efficiency of the cluster and shortens the overall execution time of cluster tasks by about 20%.
Description of the drawings
Fig. 1 is a flowchart of the Hadoop resource scheduling method based on a multi-dimensional constrained genetic algorithm provided by an embodiment of the present invention.
Fig. 2 is a data acquisition flowchart of the TS prediction model provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the chromosome crossover operation provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the chromosome mutation operation provided by an embodiment of the present invention.
Fig. 5 is a topology diagram of the cluster nodes provided by an embodiment of the present invention.
Fig. 6 is a diagram of ant colony algorithm experiment No. 2.1 provided by an embodiment of the present invention.
Fig. 7 is a diagram of ant colony algorithm experiment No. 4.1 provided by an embodiment of the present invention.
Fig. 8 is a diagram of genetic algorithm experiment No. 1.1 provided by an embodiment of the present invention.
Fig. 9 is a diagram of genetic algorithm experiment No. 2.1 provided by an embodiment of the present invention.
Fig. 10 is a diagram of genetic algorithm experiment No. 3.1 provided by an embodiment of the present invention.
Fig. 11 is a diagram of genetic algorithm experiment No. 4.1 provided by an embodiment of the present invention.
Fig. 12 is a diagram of genetic algorithm experiment No. 5.1 provided by an embodiment of the present invention.
Fig. 13 is a diagram of genetic algorithm experiment No. 6.1 provided by an embodiment of the present invention.
Fig. 14 is a diagram of the average task completion time of the three schedulers under four groups of task sets provided by an embodiment of the present invention.
Fig. 15 is a diagram of the variation over 20 runs of the second job set provided by an embodiment of the present invention.
Fig. 16 is a diagram of the variation over 20 runs of the third job set provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
In the prior art, resource allocation efficiency is low and the overall execution time of cluster tasks is long. Across different application scenarios, existing schedulers do not satisfy users' need to allocate resources reasonably and reduce task execution time.
To solve these problems, the invention is described in detail below with reference to a concrete scheme.
As shown in Fig. 1, the Hadoop resource scheduling method based on a multi-dimensional constrained genetic algorithm provided by an embodiment of the present invention comprises:
1) After a user submits jobs to the cluster, initialize the prediction model, construct the task matrix and the node matrix, and save the information and the encoding result to a file;
2) Initial population construction: randomly generate a group of feasible solutions from the task matrix and the node matrix as the initial chromosomes; the population size is denoted Scale;
3) Fitness calculation: compute the fitness value of each chromosome in the population with the fitness function;
4) Termination check: before entering the next round of iterative evolution, check whether the termination condition is met, that is, whether the iteration limit has been reached; if so, select the chromosome with the highest fitness in the current population as the optimal solution; otherwise, start a new iteration;
5) Fitness probability calculation: from the fitness values, compute the probability that each chromosome is selected in the next round of evolution, then proceed to the selection-copy, crossover, and mutation operations to generate new chromosomes;
6) In the copy operator, according to the copy ratio cp, copy the Scale*cp chromosomes with the highest fitness in the population of size Scale into the next iteration;
7) Repeatedly run the selection operator to pick two chromosomes as parent chromosomes, and apply the crossover operation to generate the remaining Scale*(1-cp) chromosomes;
8) Apply the mutation operation to the Scale*(1-cp) generated chromosomes, and pass the mutated chromosomes into the next iteration;
9) This completes one round of iterative evolution; return to step 3) for a new round.
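The iteration in steps 3) to 9) can be sketched as a generic loop. This is a minimal sketch under stated assumptions rather than the scheduler itself: the fitness, crossover, and mutate callables are supplied by the caller, fitness values are assumed non-negative with higher meaning better, parents are drawn with fitness-proportional probability, and the only termination condition is the iteration limit. The function name evolve and its parameters are hypothetical.

```python
import random

def evolve(population, fitness, crossover, mutate, cp=0.2, max_iter=50, rng=None):
    """Run selection-copy, crossover, and mutation for `max_iter` rounds."""
    rng = rng or random.Random(0)
    scale = len(population)
    copy_num = int(scale * cp)  # chromosomes copied unchanged each round
    for _ in range(max_iter):
        scores = [fitness(c) for c in population]               # step 3
        order = sorted(range(scale), key=lambda i: scores[i], reverse=True)
        next_gen = [population[i] for i in order[:copy_num]]    # step 6
        total = sum(scores)
        weights = [s / total for s in scores] if total > 0 else None  # step 5
        while len(next_gen) < scale:                            # steps 7 and 8
            father, mother = rng.choices(population, weights=weights, k=2)
            next_gen.append(mutate(crossover(father, mother, rng), rng))
        population = next_gen                                   # step 9
    return max(population, key=fitness)                         # step 4
```

Because the best chromosomes are copied unchanged whenever copy_num is at least 1, the best fitness in the population never decreases from one round to the next.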
The invention is further described below with reference to the parameter settings of the Hadoop resource scheduling algorithm based on a multi-dimensional constrained genetic algorithm.
The cluster environment model is denoted G = {NodeSet, JobSet}, where NodeSet = {node_1, node_2, node_3, ..., node_n} is the set of node resources and JobSet = {Job_1, Job_2, Job_3, ..., Job_n} is the set of jobs. Each Job_i = {task_1, task_2, task_3, ..., task_n} (0 ≤ i < n), where the tasks include both map tasks and reduce tasks. The scheduling algorithm ultimately assigns the tasks in JobSet to nodes in NodeSet for execution so that the overall completion time of the jobs is as short as possible.
The genetic algorithm parameters and the Hadoop resource scheduling model parameters are described further below.
1) Initializing the prediction model:
For Map and Reduce tasks, a TS(map/reduce) model is constructed, which uses the following data format:
<FileSize, SplitSize, SplitNum, MapTime, ReduceTime>
where FileSize denotes the size of the current job, SplitSize the split size of the job, SplitNum the number of splits, and MapTime and ReduceTime the execution times of the job's Map phase and Reduce phase, respectively. The execution time of a new task is then predicted by evaluating the attribute information of historical tasks.
The constructed TS(map/reduce) model is stored in the ResourceManager for use when the scheduling algorithm starts, and each NodeManager periodically sends its node attribute information to the ResourceManager through the heartbeat mechanism.
The data transfer model of the prediction model is shown in Fig. 2, the TS prediction model data acquisition diagram.
2) Task matrix:
The job set is expressed as JobSet = {Job_1, Job_2, Job_3, ..., Job_n}, where each Job_i = {JobSize, SplitSize, SplitNum} (0 ≤ i < n); JobSize denotes the job size, SplitSize the size of each split, and SplitNum the number of splits.
3) Node matrix:
The node set is NodeSet = {node_1, node_2, node_3, ..., node_n}, where node_i = {cpuSpeed_i, AllR_i, UsedR_i, Cnum_i, Load_i} (0 ≤ i < n); cpuSpeed_i denotes the CPU floating-point capability of node i, AllR_i the total resources of that node server, UsedR_i the resources in use on that node server, and Cnum_i the number of CPU cores of that node server.
4) Chromosome matrix:
Each round of evolution produces several chromosomes. Every chromosome is a feasible solution of the problem; a feasible solution contains multiple elements, and each element is called a gene of the chromosome. The chromosome matrix is an n × t matrix whose n rows correspond to task numbers and whose t columns correspond to node numbers, denoted chromosomeMatrix = (chromosomeMatrix[i][j])_{n×t},
where each element chromosomeMatrix[i][j] ∈ {0, 1} (0 ≤ i < n, 0 ≤ j < t). chromosomeMatrix[i][j] = 1 means that this chromosome assigns task i to node j for execution, and a value of 0 means that this chromosome does not assign the task to that node. Each task can be assigned to only one node, so the following condition must hold for every task i: \sum_{j=0}^{t-1} chromosomeMatrix[i][j] = 1.
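A short Python sketch of how an initial feasible chromosome could be generated and checked against the one-node-per-task condition. Choosing the node uniformly at random is an assumption (the text only says the initial chromosomes are generated randomly), and the function names are hypothetical.

```python
import random

def random_chromosome(n_tasks, n_nodes, rng):
    """Build an n_tasks x n_nodes 0/1 matrix with exactly one 1 in every row."""
    matrix = [[0] * n_nodes for _ in range(n_tasks)]
    for task in range(n_tasks):
        matrix[task][rng.randrange(n_nodes)] = 1  # assign the task to one node
    return matrix

def is_feasible(matrix):
    """Check that every entry is 0 or 1 and that each row sums to 1."""
    return all(set(row) <= {0, 1} and sum(row) == 1 for row in matrix)
```

Calling random_chromosome Scale times gives an initial population in which every member already satisfies the constraint, so no repair is needed before the first fitness evaluation.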
5) Dual fitness function:
The fitness function controls the direction in which the population evolves. The invention uses a dual fitness function based on the optimal time span and on load balancing, so that during evolution the population moves toward the shortest task execution time while keeping the load of the cluster nodes balanced.
In an embodiment of the present invention, the dual fitness function specifically comprises a fitness function based on time span and a fitness function based on load balancing.
In an embodiment of the present invention, the fitness function based on time span comprises:
For a Job, the completion time is jointly determined by the completion times of its Map tasks and Reduce tasks. For the whole cluster, a chromosome is one resource allocation scheme, and the optimal task time span is determined by the execution time of the Job that finishes last in that scheme. For one Job, the execution time is calculated as follows:
T_job = T_map + T_reduce (Formula 2)
In an embodiment of the present invention, the execution times of Map tasks and Reduce tasks are calculated as follows:
A) For Map tasks, the invention takes into account that the node running a task and the node storing its file may differ within the cluster. The Map running time is therefore determined by the task processing time and the resource transfer time. The task processing time depends mainly on the CPU computing capability of the node running the task, and the resource transfer time depends on the network transfer speed between the node storing the task data and the node running the task. The execution time of each Map task of a job, from the definitions that follow, therefore has the form:
T_map(i, j, k) = Split(i, k) / cpuSpeed_j + Block(i, k) / node(i, j) (Formula 3)
where T_map(i, j, k) is the time for node j to execute the k-th split of job i, Split(i, k) (0 ≤ k < splitNum) is the size of the k-th split of job i, and cpuSpeed_j is the CPU speed of node j. If the split size differs from the file block size, the task may need to fetch file blocks over the network to assemble one split; Block(i, k) is the size of the data blocks that the split task must transfer from other nodes, and node(i, j) is the network transfer speed between the node storing task i and the executing node j. A large job may be divided into several splits, each split corresponds to one Map task, and the Map tasks run in parallel, so the Map task that finishes last determines the execution time of the whole Map phase. For a Job_i, therefore, T_map = Max(T_map(i, j, k)).
B) For a Reduce task, the execution time is predicted from the historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> recorded in the TS(map/reduce) model. The specific steps are as follows:
Step 1: Let the job to be predicted be NewJob. First, in the TS(map/reduce) model, find the set of jobs whose size (FileSize) is close to that of NewJob: JobSet_1 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 2: Then, in JobSet_1, find the set of jobs whose fragment size (SplitSize) and fragment count (SplitNum) match those of NewJob: JobSet_2 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 3: Then, in JobSet_2, find the set of jobs whose Map task execution time is close to that of NewJob: JobSet_3 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 4: Finally, calculate the Reduce-stage execution time of NewJob according to the following formula:
where T_reduce is the overall execution time of the Reduce stage of the current Job, T_map is the overall execution time of the Map stage of the current Job, AvgT_map is the average Map-stage execution time of all jobs in JobSet_3, and AvgT_reduce is the average Reduce-stage execution time of all jobs in JobSet_3. Because node load changes continually, the same task on the same node can take different amounts of time at different moments, so a load adjustment parameter ω is added to balance task completion times under different loads; ω is the ratio of the current load to the historical average load.
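The Map and Reduce timing model of items A) and B) can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the dictionary layout, and the `tol` similarity tolerance are hypothetical; the Map time is assumed to be fragment size over CPU speed plus remote block size over network speed; and the Reduce time is assumed to scale the historical average Reduce time by the ratio of Map times and by ω.

```python
from statistics import mean


def map_task_time(split_size, cpu_speed, remote_block=0.0, net_speed=1.0):
    """Assumed form: processing time plus transfer time for one Map task."""
    return split_size / cpu_speed + remote_block / net_speed


def map_stage_time(task_times):
    """Map tasks run in parallel, so the slowest task sets the stage time."""
    return max(task_times)


def predict_reduce_time(history, new_job, t_map, omega=1.0, tol=0.2):
    """Steps 1-4 over records shaped like the TS(map/reduce) tuples."""
    def close(a, b):
        # Hypothetical similarity test: within a relative tolerance of b.
        return abs(a - b) <= tol * b

    # Step 1: jobs whose FileSize is close to the new job's.
    set1 = [h for h in history if close(h["FileSize"], new_job["FileSize"])]
    # Step 2: jobs with the same SplitSize and SplitNum.
    set2 = [h for h in set1
            if h["SplitSize"] == new_job["SplitSize"]
            and h["SplitNum"] == new_job["SplitNum"]]
    # Step 3: jobs whose Map time is close to the new job's Map time.
    set3 = [h for h in set2 if close(h["MapTime"], t_map)]
    if not set3:
        return None  # no comparable history; this case is an added guard
    avg_map = mean(h["MapTime"] for h in set3)
    avg_reduce = mean(h["ReduceTime"] for h in set3)
    # Step 4 (assumed form): scale by the Map-time ratio and the load factor.
    return omega * (t_map / avg_map) * avg_reduce


history = [
    {"FileSize": 1000, "SplitSize": 128, "SplitNum": 8, "MapTime": 10.0, "ReduceTime": 20.0},
    {"FileSize": 1100, "SplitSize": 128, "SplitNum": 8, "MapTime": 10.0, "ReduceTime": 40.0},
    {"FileSize": 5000, "SplitSize": 128, "SplitNum": 40, "MapTime": 50.0, "ReduceTime": 90.0},
]
new_job = {"FileSize": 1000, "SplitSize": 128, "SplitNum": 8}
t_map = map_stage_time([map_task_time(128, 64), map_task_time(128, 16, 32, 16)])
print(t_map)                                         # 10.0
print(predict_reduce_time(history, new_job, t_map))  # 30.0
```

Under these assumptions the job time of formula 2 is simply the sum of the two results; raising `omega` above 1 models a node that is currently busier than its historical average.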
According to (formula 3) and (formula 4), (formula 2) can be converted into the following computation model:
T_job is the execution time of one job. The cluster runs multiple jobs at the same time, and the execution time of the job that finishes last is the optimal time span of this chromosome's allocation scheme.
The fitness function based on the optimal time span is therefore expressed as:
F_time(c) = Max(T_job(i)), taken over the N jobs
where F_time(c) is the optimal time span of the c-th chromosome in the population, N is the number of jobs, and chromosomeScale is the population size. The optimal time spans of all chromosomes in one iteration are expressed as:
AllF_time = {F_time(1), F_time(2), F_time(3), ..., F_time(c)}
where AllF_time is the set of time spans of all chromosomes in the current iteration; the index is the chromosome number and the element value is that chromosome's time span.
In embodiments of the present invention, the fitness function based on load balancing is as follows:
During resource allocation, allocation strategies that balance node load more evenly yield higher cluster resource utilization. The present invention therefore designs a fitness function based on load balancing. After population initialization, the number of tasks in the task set JobSet is denoted JobSet.length and the number of nodes in the node set NodeSet is denoted NodeSet.length, so the average number of tasks assigned per node is:
AvgTask = JobSet.length / NodeSet.length
The present invention uses the standard deviation to measure how far a set of data is dispersed from its mean: the smaller the standard deviation, the closer the values are to the mean and the more balanced the cluster load. The load-balancing fitness function is therefore expressed as:
F_load(c) = sqrt((1/N) × Σ_j (TaskNum(c, j) − AvgTask)^2)
where F_load(c) is the standard deviation of the number of tasks assigned to each node by this chromosome, i.e. its load-balancing fitness value; TaskNum(c, j) is the number of tasks assigned to the j-th node by the c-th chromosome; N is the total number of nodes; and AvgTask is the average number of tasks per node in this chromosome's allocation scheme. The load-balancing fitness values of all chromosomes in one iteration are expressed as:
AllF_load = {F_load(1), F_load(2), F_load(3), ..., F_load(i)}
where AllF_load is the set of load-balancing fitness values of all chromosomes in the current iteration; the index is the chromosome number and the element value is that chromosome's load-balancing fitness value.
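As a minimal sketch of the load-balancing fitness described above (Python; the function name and the 0-based node indices are illustrative assumptions, and the population form of the standard deviation, dividing by N, is assumed):

```python
from math import sqrt


def load_fitness(chromosome, node_count):
    """Standard deviation of per-node task counts for one allocation.

    chromosome[t] is the (0-based, by assumption) node that task t runs on.
    """
    counts = [0] * node_count
    for node in chromosome:
        counts[node] += 1
    avg_task = len(chromosome) / node_count
    return sqrt(sum((c - avg_task) ** 2 for c in counts) / node_count)


balanced = [0, 1, 2, 0, 1, 2]   # two tasks on each of three nodes
skewed = [0, 0, 0, 0, 1, 2]     # four tasks piled onto node 0
print(load_fitness(balanced, 3))  # 0.0
print(load_fitness(skewed, 3))    # larger value: less balanced
```

A lower value means a more even spread of tasks, which is why this quantity has to be inverted in some way before it can serve as a fitness to maximize.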
C) Normalization:
The fitness function based on optimal time span and the fitness function based on load balancing are different evaluation indices, with different dimensions and units. To eliminate the effect of dimension between the indices, the data must be standardized. The present invention adopts the deviation (min-max) standardization method, applying the following standardization formula to each set:
x* = (x − min) / (max − min)
After normalization, the fitness function based on time span is expressed as:
F_time(k)* = (F_time(k) − Min(AllF_time)) / (Max(AllF_time) − Min(AllF_time))
After normalization, the fitness function based on load balancing is expressed as:
F_load(k)* = (F_load(k) − Min(AllF_load)) / (Max(AllF_load) − Min(AllF_load))
For a chromosome, F_time(k)* represents the execution time: the larger the value, the longer the execution time and the smaller the fitness should be. Likewise, F_load(k)* represents the dispersion of the per-node task counts relative to the average task count: the larger the value, the more unbalanced the cluster load and the smaller the fitness should be. The fitness function of each chromosome, combining optimal time span and load balancing, is therefore expressed as:
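The normalization step can be sketched in Python as follows. The `min_max` helper follows the deviation standardization named above. The exact way the two normalized terms are combined is not given in the surrounding text, so the equally weighted sum in `combined_fitness` (and its `w_time` and `w_load` parameters) is purely a hypothetical choice that respects the stated requirement that larger normalized values yield smaller fitness.

```python
def min_max(values):
    """Deviation (min-max) standardization of a list onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All chromosomes score the same; map them all to 0 (added guard).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_fitness(all_f_time, all_f_load, w_time=0.5, w_load=0.5):
    """Hypothetical combination: weighted sum of (1 - normalized cost)."""
    t_norm = min_max(all_f_time)
    l_norm = min_max(all_f_load)
    return [w_time * (1 - t) + w_load * (1 - l)
            for t, l in zip(t_norm, l_norm)]


print(min_max([10.0, 20.0, 30.0]))                             # [0.0, 0.5, 1.0]
print(combined_fitness([10.0, 20.0, 30.0], [0.0, 1.0, 2.0]))   # [1.0, 0.5, 0.0]
```

With this particular combination the worst chromosome on both criteria receives a fitness of 0 and would never be drawn by fitness-proportional selection; a different combination rule would change that behavior.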
6) Fitness probability matrix:
The fitness probability matrix gives, for each chromosome, the probability of being selected in the next iteration, calculated from its fitness; the greater the fitness, the greater the selection probability. It is calculated as follows:
F_prob(i) = F_Adapt(i) / Σ_j F_Adapt(j)
where F_prob(i) is the probability that chromosome i is selected in the next iteration. The fitness probabilities are calculated after each iteration ends and before the next one begins, and the values are mapped to a one-dimensional matrix with the following structure:
SelectionProbability = {F_prob(1), F_prob(2), F_prob(3), ..., F_prob(i)}
where SelectionProbability is the set of fitness probabilities of all chromosomes from the previous iteration; the index is the chromosome number and each value is the fitness probability of that chromosome, i.e. its probability of being selected in the next iteration. SelectionProbability must therefore satisfy the condition that its elements sum to 1.
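A short Python sketch of the fitness probability matrix, assuming plain fitness-proportional probabilities (the function name is illustrative, and the uniform fallback for an all-zero population is an added guard, not something stated in the text):

```python
def selection_probability(fitness):
    """Map fitness values to selection probabilities that sum to 1."""
    total = sum(fitness)
    if total == 0:
        # Degenerate case: treat every chromosome as equally likely.
        return [1 / len(fitness)] * len(fitness)
    return [f / total for f in fitness]


probs = selection_probability([1.0, 0.5, 0.5])
print(probs)       # [0.5, 0.25, 0.25]
print(sum(probs))  # 1.0
```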
7) Selection-duplication operator:
After the chromosome fitness values have been calculated, the algorithm enters the iteration loop. First, the selection operator picks two chromosomes with high fitness for the crossover operation. The selection operator of the present invention uses roulette wheel selection (RWS), with each individual's selection probability taken from the fitness probability matrix. To ensure that excellent chromosomes are passed on to the next generation, and to prevent crossover and mutation from destroying them and causing degenerate iterations, the present invention adds a duplication operator to the selection operator: a reproduction ratio is set for each iteration, and the chromosomes with the highest fitness in the previous generation are copied intact into the new generation. The reproduction ratio keeps the population evolving in a favorable direction and ensures the stability of the algorithm.
The selection-duplication operator must satisfy the following formulas:
CrossoverNum + CopyNum = chromosomeScale
CopyNum = chromosomeScale × cp
where CrossoverNum is the number of chromosomes chosen by roulette for the crossover operation, CopyNum is the number of chromosomes copied directly, and cp is the reproduction ratio. The reproduction ratio should not be set too large; with too large a ratio the algorithm does not converge easily. Experiments show that cp = 0.2 works well.
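The selection-duplication step might be sketched in Python as below. The function names and the seeded `random.Random` generator are illustrative assumptions; rounding CopyNum to the nearest integer is also an assumption, since the text does not say how a fractional count is handled.

```python
import random


def roulette_pick(population, probs, rng):
    """Roulette wheel selection: pick one individual with probability probs[i]."""
    r = rng.random()
    cumulative = 0.0
    for individual, p in zip(population, probs):
        cumulative += p
        if r < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


def split_generation(population, fitness, cp=0.2):
    """Return the elite copies and the number of children crossover must supply."""
    copy_num = round(len(population) * cp)
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    elites = [list(population[i]) for i in ranked[:copy_num]]
    return elites, len(population) - copy_num


population = [[i] for i in range(10)]
fitness = [float(i) for i in range(10)]
elites, remaining = split_generation(population, fitness, cp=0.2)
print(elites, remaining)  # [[9], [8]] 8
```

Copying the elites as new lists keeps later in-place mutation of the children from altering the preserved chromosomes.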
8) Crossover operator:
Crossover is the main way in which the population generates new individuals. The present invention encodes chromosomes as two-dimensional matrices. For crossover, the two-dimensional matrix is first decoded into one-dimensional form:
chromosomeMatrix = [2, 3, 1, 4, 5, 7, 2, ..., 9];
where chromosomeMatrix represents one chromosome; the index is the task number and the element value is the node number. For example, chromosomeMatrix[1] = 3 means that task 1 is assigned to node 3 for execution. The present invention crosses the parent chromosomes by a random complementary method: the selection operator first picks two parent chromosomes with high fitness; a cut position, the same for both parents, is chosen at random and its index is recorded as flag; the segment chromosomeMatrix[0, flag] is taken from the father chromosome and the segment chromosomeMatrix[flag, end] from the mother chromosome; and the two segments are recombined to form the child chromosome. The crossover process is shown in Fig. 3, the chromosome crossover operation schematic.
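A minimal Python sketch of this random complementary crossover on the one-dimensional encoding follows. The function name is illustrative, and treating the two segments as half-open slices (father up to but excluding flag, mother from flag onward) is an assumption, since the text does not say whether position flag itself comes from the father or the mother.

```python
import random


def crossover(father, mother, rng):
    """Single cut point shared by both parents; the child takes the head
    of the father and the tail of the mother."""
    flag = rng.randrange(1, len(father))  # 1..len-1 so both parents contribute
    return father[:flag] + mother[flag:]


father = [2, 3, 1, 4, 5]
mother = [7, 7, 7, 7, 7]
child = crossover(father, mother, random.Random(1))
print(child)
```

Because every gene is still a valid node number for its task, the child is always a feasible allocation and needs no repair step.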
9) Mutation operator:
The present invention uses an adaptive mutation probability, whereby the mutation probability is adjusted linearly with chromosome fitness between the population mean and the population maximum, so that the algorithm converges faster toward the global optimum as it approaches the optimal solution. The adaptive mutation probability is calculated as follows:
where P_var(k) is the mutation probability of chromosome k, F_Adapt(max) is the maximum fitness value of the chromosomes in the population, F_Adapt(avg) is the average fitness value of the chromosomes in the population, F_Adapt(k) is the fitness value of chromosome k, and λ_min and λ_max are mutation factors that set the lower and upper bounds of the mutation rate (λ_min, λ_max ∈ (0, 1)). The formula is explained as follows:
1) If the fitness of a chromosome is high, at or above the average, its mutation probability should be reduced to prevent the good genes on it from being destroyed. That is, when F_Adapt(k) ≥ F_Adapt(avg), the mutation probability is calculated dynamically by the method in formula (15), so that the higher the fitness, the lower the mutation rate. Repeated experiments show that the mutation factor λ_min = 0.005 works well.
2) If the fitness of a chromosome is low, below the average, the chromosome is given a larger mutation probability to strengthen the global search ability of the population. That is, when F_Adapt(k) < F_Adapt(avg), the maximum mutation probability λ_max is assigned. Repeated experiments show that the mutation factor λ_max = 0.05 works well.
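The two cases above can be sketched in Python. The linear interpolation used for the first case is an assumption consistent with the description (λ_max at the population average, falling linearly to λ_min at the population maximum); it is not a transcription of formula (15). The function name and the guard for a population whose maximum equals its average are also illustrative.

```python
def mutation_probability(f_k, f_avg, f_max, lam_min=0.005, lam_max=0.05):
    """Adaptive mutation probability for one chromosome."""
    if f_k < f_avg or f_max == f_avg:
        # Below-average chromosome (or a uniform population): mutate the most.
        return lam_max
    # At or above average: interpolate linearly from lam_max down to lam_min.
    return lam_max - (lam_max - lam_min) * (f_k - f_avg) / (f_max - f_avg)


print(mutation_probability(0.2, 0.5, 1.0))  # 0.05
print(mutation_probability(0.5, 0.5, 1.0))  # 0.05
print(mutation_probability(1.0, 0.5, 1.0))  # close to 0.005
```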
In the mutation operation, a contiguous gene-position mutation method is used to reduce the amount of computation. The mutation position is determined first, then the number of genes to mutate is calculated, and the mutation is carried out.
The mutation operation proceeds as follows:
1) Mutation decision: generate a random number P_rand(k) in [0, 1] and compare it with the mutation probability of the chromosome; if P_rand(k) < P_var(k), perform the mutation operation on this chromosome.
2) Mutation count: to prevent too many genes of one chromosome from mutating, which would make the algorithm hard to converge, the number of mutated genes must satisfy the following constraint:
0<VarNum≤Pvar(k)×chromosomeMatrix.length
where VarNum is the number of consecutive genes allowed to mutate and chromosomeMatrix.length is the number of genes in the chromosome. An integer generated at random within the range permitted by the VarNum constraint is used as the number of genes to mutate.
3) Mutation position: generate a random number P_index within the gene length chromosomeMatrix.length to indicate the mutation position; counting the mutation count onward from that position determines the mutation segment. If the segment runs past the end of the chromosome, the remaining genes are mutated starting from the 1st gene.
4) Perform mutation: once the mutation segment is determined, the gene values are changed at random, which enhances the global optimization ability of the population. For example, suppose the cluster has 100 tasks and 10 nodes, and that for chromosome k, P_rand(k) < P_var(k), VarNum = 3, and P_index = 5; then three consecutive genes are mutated starting from index 5 of the chromosome.
The mutation process is shown in Fig. 4, the chromosome mutation operation schematic.
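The four mutation steps can be sketched in Python as follows. The function name, the 0-based node numbers, and the seeded generator are illustrative assumptions. One further assumption is labeled in the code: when P_var(k) × length is below 1, no integer satisfies the VarNum constraint, and the sketch then allows a single gene to mutate.

```python
import random


def mutate(chromosome, p_var, node_count, rng):
    """Mutate one contiguous run of genes in place, wrapping past the end."""
    if rng.random() >= p_var:                 # 1) mutation decision
        return chromosome
    length = len(chromosome)
    # Assumption: allow at least one gene when p_var * length < 1.
    max_num = max(1, int(p_var * length))
    var_num = rng.randint(1, max_num)         # 2) mutation count
    p_index = rng.randrange(length)           # 3) mutation position
    for offset in range(var_num):             # 4) assign a random node per gene
        chromosome[(p_index + offset) % length] = rng.randrange(node_count)
    return chromosome


chrom = [0] * 100                             # 100 tasks, all on node 0
mutate(chrom, 0.05, 10, random.Random(3))     # at most 5 genes may change
print(sum(1 for g in chrom if g != 0))
```

The modulo index implements the wrap-around rule of step 3, so a segment that starts near the end of the chromosome continues from its first gene.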
The invention is further described below with reference to experiments.
The present invention verifies the operating efficiency and effectiveness of the algorithm through two parts of experiments.
The first part uses simulation experiments to verify the algorithm's advantage in operating efficiency over the ant colony intelligent optimization algorithm, and obtains the optimal parameter settings of the algorithm from those experiments;
The second part builds a Hadoop cluster environment and uses the HiBench performance benchmark tool to verify the advantage in overall task completion time of the MCGA scheduler described in the present invention over AntScheduler, a resource scheduler implemented with the ant colony algorithm, and over the Capacity scheduler shipped with Hadoop, thereby demonstrating the correctness and effectiveness of the algorithm.
First part:
1) Experimental environment
A) Environment for evaluating algorithm operating efficiency
The simulation program is implemented in JavaScript, runs on the Chrome V8 engine, and uses ECharts to generate visual charts.
B) Environment for evaluating algorithm effectiveness
To verify the effectiveness and correctness of the algorithm, it must be tested in a Hadoop cluster. The experimental environment is a Hadoop cluster built from 5 servers, each with 2 cores and 6 GB of memory, for a cluster total of 10 cores and 30 GB of memory. It comprises one NameNode, one ResourceManager, 4 DataNodes, and 4 NodeManagers. The cluster topology is shown in Fig. 5, the cluster node topology diagram.
2) Algorithm operating efficiency evaluation
To keep the test environment stable, both algorithms use the same task and node settings in the simulation experiments: 100 tasks of fixed size and 10 nodes of fixed execution capability. The operating efficiency of the algorithms under different conditions is compared by repeatedly adjusting the algorithm parameters, and the optimal parameter settings are finally obtained.
A) The parameter settings and experimental results for the ant colony algorithm are recorded as follows:
Table 1: Ant colony algorithm experiment parameter record
Experiments No. 2.1 and No. 4.1, which have the shorter task completion time and the shorter algorithm execution time respectively, are excerpted below; the results are shown in Fig. 6 (ant colony algorithm experiment No. 2.1) and Fig. 7 (ant colony algorithm experiment No. 4.1).
B) The parameter settings and experimental results for the genetic algorithm described in the present invention are recorded as follows:
Table 2: Genetic algorithm experiment parameter record
The experimental results under different parameter settings are excerpted in Fig. 8 to Fig. 13.
The experimental results for the AntScheduler algorithm and the MCGA algorithm are analyzed as follows:
AntScheduler algorithm: the experiments show that the algorithm converges well but easily mistakes a local optimum for the global optimum, and its execution time is relatively long. Its stability is also poor: even for the same set of tasks and the same set of nodes, the overall task execution times of the final allocation schemes differ.
MCGA algorithm: the experiments show that the MCGA algorithm has a shorter running time and higher execution efficiency. It is also more stable than the AntScheduler algorithm: with the same set of tasks and the same set of nodes, the overall task execution times of the final allocation schemes are roughly the same. The experimental data show that the more iterations and the more chromosomes there are, the more likely the global optimum is to be found, and that the larger the reproduction ratio, the harder it is to converge and to obtain the global optimum. The data also show that experiments No. 1.1 to 1.4 have a clear advantage over the other experiments in algorithm execution time, convergence, and task execution time. The optimal parameter settings for the MCGA algorithm are therefore 100 iterations, a population size (number of chromosomes) of 100, and a reproduction ratio of 0.2.
It can be seen that the MCGA algorithm of the present invention meets the requirements for operating efficiency.
3) Algorithm effectiveness evaluation
The task execution times of the MCGA scheduler of the present invention, the ant colony algorithm-based resource scheduler AntScheduler, and Hadoop's default Capacity scheduler are compared and analyzed below for four groups of job sets. To avoid data error, each task execution time is the average of 5 runs, as shown in the following table.
Table 3: Comparison of task completion times under different job sets
The results are plotted as a histogram in Fig. 14, the average task completion time of the three schedulers under the four job sets.
According to the table above, the advantage of the algorithm is not obvious for small job sets, because a small job set is divided into few map and reduce tasks and cluster resources are relatively plentiful. For large job sets, the number of map and reduce tasks is relatively large, resource contention gradually intensifies, and the performance advantage of the algorithm emerges.
Second part:
To verify the stability of the algorithm, the MCGA, AntScheduler, and Capacity schedulers were each used to run the second and third job sets 20 times, and the variation in task execution time was observed.
As Fig. 15 and Fig. 16 show, the task execution times of the Capacity and AntScheduler schedulers fluctuate widely for the same job set. For the Capacity scheduler, runs 6, 11, 14, and 17 on the second job set and runs 4, 8, 11, 13, and 18 on the third job set are outliers: at these points its execution time is relatively long and differs sharply from that of the neighboring points. This results from its first-come, first-served resource allocation: if tasks are assigned to poorly performing nodes and the cluster load becomes unbalanced, the overall task execution time varies considerably. The MCGA scheduler of the present invention, by contrast, allocates resources with an intelligent optimization algorithm, tuning dynamically and allocating intelligently according to the job set and the state of node resources, so its overall task completion time fluctuates less and its performance is more stable.
From the above experimental analysis it can be concluded that the present invention is robust: whether handling large or small job sets, its performance is better than that of the resource scheduling algorithm implemented with the ant colony algorithm and of the default Hadoop YARN scheduling algorithm, and it is an effective resource allocation method.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A Hadoop resource regulating method, characterized in that the Hadoop resource regulating method comprises:
after a user submits a job to the cluster, initializing the prediction model, the task matrix, and the node matrix, and then constructing a double fitness function;
formulating a selection-duplication operator, a crossover operator, and a mutation operator;
after multiple iterations, obtaining the globally optimal resource allocation scheme.
2. The Hadoop resource regulating method as claimed in claim 1, characterized in that the Hadoop resource regulating method comprises:
Step 1: after a user submits a job to the cluster, initialize the prediction model, construct the task matrix and the node matrix, and save the information and the encoding result to a file;
Step 2: build the initial population: generate the initial feasible-solution chromosomes at random according to the task matrix and the node matrix; the population size is denoted Scale;
Step 3: fitness calculation: calculate the fitness value of each chromosome in the population with the fitness function;
Step 4: termination check: before entering the next round of iterative evolution, first determine whether the termination condition is met, i.e. whether the iteration limit has been reached; if so, select the chromosome with the highest fitness in the current population as the optimal solution; otherwise, enter a new round of iteration;
Step 5: fitness probability calculation: calculate from the fitness values the probability that each chromosome is selected in the next evolution, then generate new chromosomes through the selection-duplication, crossover, and mutation operations;
Step 6: in the duplication operator, copy the Scale*cp chromosomes with the highest fitness in the population of size Scale into the next iteration according to the reproduction ratio cp;
Step 7: execute the selection operator in a loop, each time selecting two chromosomes as parent chromosomes for the crossover operation, to generate the remaining Scale*(1-cp) chromosomes;
Step 8: perform the mutation operation on the Scale*(1-cp) chromosomes generated, and pass the mutated chromosomes into the next iteration;
Step 9: return to step 3 for a new round of iterative evolution.
3. The Hadoop resource regulating method as claimed in claim 2, characterized in that in step 1, when the user submits a job to the cluster, the cluster environment model is denoted G = {NodeSet, JobSet}, where NodeSet = {node_1, node_2, node_3, ..., node_n} is the set of node resources and JobSet = {job_1, job_2, job_3, ..., job_n} is the set of jobs, each job_i = {task_1, task_2, task_3, ..., task_n} (0 ≤ i < n), where the tasks include map tasks and reduce tasks; the tasks in JobSet are assigned to nodes in NodeSet for execution so as to run the Job as a whole;
the method of initializing the prediction model comprises:
constructing a TS(map/reduce) model for Map and Reduce tasks, the model using the following data format:
<FileSize, SplitSize, SplitNum, MapTime, ReduceTime>
where FileSize is the size of the current job, SplitSize is the fragment size of the job, SplitNum is the number of fragments of the job, and MapTime and ReduceTime are the execution times of the job's Map stage and Reduce stage respectively; the execution time of a new task is then predicted by evaluating the attribute information of historical tasks;
the constructed TS(map/reduce) model is stored in the ResourceManager, and each NodeManager periodically passes node attribute information to the ResourceManager through the heartbeat communication mechanism.
4. The Hadoop resource regulating method as claimed in claim 2, characterized in that, in step 1,
the method of initializing the task matrix comprises:
representing the job set as JobSet = {job_1, job_2, job_3, ..., job_n}, with each job_i = {JobSize, SplitSize, SplitNum} (0 ≤ i < n), where JobSize is the job size, SplitSize is the size of each fragment, and SplitNum is the number of fragments; the method of initializing the node matrix comprises: for the node set NodeSet = {node_1, node_2, node_3, ..., node_n}, node_i = {cpuSpeed_i, AllR_i, UsedR_i, Cnum_i, Load_i} (0 ≤ i < n), where cpuSpeed_i is the CPU floating-point computing capability of node i, AllR_i is the total resources of the node server, UsedR_i is the resources the node server has used, and Cnum_i is the number of CPU cores of the node server.
5. The Hadoop resource regulating method as claimed in claim 2, characterized in that in step 2, the method of generating the chromosome matrix comprises: the chromosome matrix is an n × t matrix whose n rows correspond to task numbers and whose t columns correspond to node numbers, denoted chromosomeMatrix = (chromosomeMatrix[i][j])_{n×t},
where the matrix element chromosomeMatrix[i][j] ∈ {0, 1} (0 ≤ i < n, 0 ≤ j < t); chromosomeMatrix[i][j] = 1 means that this chromosome assigns task i to node j for execution, and an element value of 0 means that this chromosome does not assign the task to that node; each task is assigned to exactly one node, so the matrix also satisfies the condition that each row sums to 1, i.e. Σ_j chromosomeMatrix[i][j] = 1 for every task i.
6. The Hadoop resource regulating method as claimed in claim 2, characterized in that in step 3, a double fitness function based on optimal time span and on load balancing is used, so that the population evolves toward the shortest task execution time while keeping the load of the cluster nodes balanced;
the fitness function based on time span comprises: for the execution time of a Job,
T_job is the execution time of one job; the cluster runs multiple jobs at the same time, and the execution time of the job that finishes last is the optimal time span of this chromosome's allocation scheme;
the fitness function based on optimal time span is expressed as:
where F_time(c) is the optimal time span of the c-th chromosome in the population, N is the number of jobs, and chromosomeScale is the population size; the optimal time spans of all chromosomes in one iteration are expressed as:
AllF_time = {F_time(1), F_time(2), F_time(3), ..., F_time(c)}
where AllF_time is the set of time spans of all chromosomes in the current iteration; the index is the chromosome number and the element value is that chromosome's time span.
7. The Hadoop resource regulating method as claimed in claim 6, characterized in that the execution times of Map tasks and Reduce tasks are calculated as follows:
A) the execution time of each Map task of a Job is calculated as follows:
where T_map(i, j, k) is the time for node j to execute the k-th fragment of job i, Split(i, k) (0 ≤ k < splitNum) is the size of the k-th fragment of job i, and cpuSpeed_j is the CPU computing speed of node j; if the fragment size differs from the file block size, the task may need file blocks transferred over the network to assemble one fragment; Block(i, k) is the size of the data block that this fragment task must fetch from other nodes, and node(i, j) is the network transfer speed between the storage node of task i and the execution node j; a large Job is divided into several fragments, with one Map task per fragment, executed in parallel; the Map task that finishes last sets the execution time of the whole Map stage, so for a job Job_i, T_map = Max(T_map(i, j, k));
B) for a Reduce task, the execution time is predicted from the historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> recorded in the TS(map/reduce) model;
the fitness function based on load balancing comprises:
during resource allocation, allocation strategies that balance node load more evenly yield higher cluster resource utilization; for the fitness function based on load balancing, after population initialization the number of tasks in the task set JobSet is denoted JobSet.length and the number of nodes in the node set NodeSet is denoted NodeSet.length, so the average number of tasks assigned per node is:
AvgTask = JobSet.length / NodeSet.length
the standard deviation is used to measure how far a set of data is dispersed from its mean: the smaller the standard deviation, the closer the values are to the mean and the more balanced the cluster load; the load-balancing fitness function is expressed as:
F_load(c) = sqrt((1/N) × Σ_j (TaskNum(c, j) − AvgTask)^2)
where F_load(c) is the standard deviation of the number of tasks assigned to each node by this chromosome, i.e. its load-balancing fitness value; TaskNum(c, j) is the number of tasks assigned to the j-th node by the c-th chromosome; N is the total number of nodes; and AvgTask is the average number of tasks per node in this chromosome's allocation scheme; the load-balancing fitness values of all chromosomes in one iteration are expressed as:
AllF_load = {F_load(1), F_load(2), F_load(3), ..., F_load(i)}
where AllF_load is the set of load-balancing fitness values of all chromosomes in the current iteration; the index is the chromosome number and the element value is that chromosome's load-balancing fitness value;
normalization: the following standardization formula is applied to each set:
x* = (x − min) / (max − min)
after normalization, the fitness function based on time span is expressed as:
F_time(k)* = (F_time(k) − Min(AllF_time)) / (Max(AllF_time) − Min(AllF_time))
after normalization, the fitness function based on load balancing is expressed as:
F_load(k)* = (F_load(k) − Min(AllF_load)) / (Max(AllF_load) − Min(AllF_load))
for a chromosome, F_time(k)* represents the execution time: the larger the value, the longer the execution time and the smaller the fitness should be; likewise, F_load(k)* represents the dispersion of the per-node task counts relative to the average task count: the larger the value, the more unbalanced the cluster load and the smaller the fitness should be; the fitness function of each chromosome, combining optimal time span and load balancing, is expressed as:
8. The Hadoop resource regulating method as claimed in claim 7, characterized in that
the method of predicting from the constructed historical information <FileSize, SplitSize, SplitNum, MapTime, ReduceTime> comprises:
Step 1: let the job to be predicted be NewJob; first, in the TS(map/reduce) model, find the set of jobs whose size (FileSize) is close to that of NewJob: JobSet_1 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 2: then, in JobSet_1, find the set of jobs whose fragment size (SplitSize) and fragment count (SplitNum) match those of NewJob: JobSet_2 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 3: then, in JobSet_2, find the set of jobs whose Map task execution time is close to that of NewJob: JobSet_3 = {Job_1, Job_2, Job_3, ..., Job_k};
Step 4: finally, calculate the Reduce-stage execution time of NewJob according to the following formula:
where T_reduce is the overall execution time of the Reduce stage of the current Job, T_map is the overall execution time of the Map stage of the current Job, AvgT_map is the average Map-stage execution time of all jobs in JobSet_3, and AvgT_reduce is the average Reduce-stage execution time of all jobs in JobSet_3; because node load changes continually, the same task on the same node can take different amounts of time at different moments, so a load adjustment parameter ω is added to balance task completion times under different loads; ω is the ratio of the current load to the historical average load.
9. Hadoop resource regulating method as claimed in claim 2, which is characterized in that in step 5, fitness probability matrix: Calculation is as follows:
Wherein Fprob(i) probability that chromosome i is selected in next round iteration is indicated;Fitness probability is in each round iteration Terminate, new round iteration calculates before starting, and value is mapped to one-dimensional matrix, and structure is as follows:
SelectionProbability={ Fprob(1), Fprob(2), Fprob(3) ..., Fprob(i)}
Wherein SelectionProbability indicates the set of all chromosome fitness probability in last round of iteration, under set Mark indicates chromosome numbers, and matrix intermediate value indicates the corresponding fitness probability of the chromosome, the selected probability of next round iteration; SelectionProbability should be met
Selection-duplication operator meets following formula:
Wherein CrossoverNum indicates to choose the chromosome quantitative for carrying out crossover operation by roulette mode, CopyNum indicates the chromosome quantitative directly replicated, and cp is reproduction ratio;
When the crossover operator performs the crossover operation, the two-dimensional matrix is first decoded into one-dimensional form:
chromosomeMatrix = [2, 3, 1, 4, 5, 7, 2, ..., 9];
where chromosomeMatrix denotes one chromosome, the subscript denotes the task number, and the element value denotes the node number; for example, chromosomeMatrix[1] = 3 means that task 1 is assigned to node 3 for execution; parent chromosomes are crossed by a random complementary method: two parent chromosomes with high fitness are first selected by the selection operator, the same cut position is chosen at random in both and its subscript is recorded as flag, the segment chromosomeMatrix[0, flag] is taken from the father chromosome and the segment chromosomeMatrix[flag, End] from the mother chromosome, and the two segments are then combined to form the child chromosome;
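The random complementary crossover described above amounts to a single-point crossover on the one-dimensional task-to-node list. A minimal Python sketch follows; the function name is hypothetical, and the text does not say whether the cut position itself belongs to the father or the mother segment, so half-open slices are assumed here.

```python
import random
from typing import List, Tuple


def complementary_crossover(father: List[int], mother: List[int],
                            rng: random.Random) -> Tuple[List[int], int]:
    """Build a child from father[0:flag] and mother[flag:End].

    Index = task number, value = node number, as in chromosomeMatrix.
    ASSUMPTION: flag is drawn from 1..len-1 so both parents contribute.
    """
    if len(father) != len(mother):
        raise ValueError("parents must encode the same number of tasks")
    if len(father) < 2:
        raise ValueError("need at least two tasks to cut")
    flag = rng.randrange(1, len(father))
    child = father[:flag] + mother[flag:]
    return child, flag


if __name__ == "__main__":
    rng = random.Random(42)
    father = [2, 3, 1, 4, 5, 7, 2, 9]
    mother = [1, 1, 6, 8, 3, 2, 4, 5]
    child, flag = complementary_crossover(father, mother, rng)
    print(flag, child)
```

Because every position of the child is copied from one of the parents at the same index, each task still maps to exactly one node and the child needs no repair step.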
The adaptive mutation probability of the mutation operator is calculated as follows:
where P_var(k) denotes the mutation probability of chromosome k, F_Adapt(max) denotes the maximum fitness value of the chromosomes in the population, F_Adapt(avg) denotes the average fitness value of the chromosomes in the population, F_Adapt(k) denotes the fitness value of chromosome k, and λ_min and λ_max are mutation factors that control the lower and upper bounds of the mutation rate.
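The adaptive mutation formula is not reproduced in this text. The Python sketch below therefore assumes one common linear form used in adaptive genetic algorithms: chromosomes at or above the average fitness get a rate that falls from λ_max toward λ_min as their fitness approaches the population maximum, and below-average chromosomes get λ_max. The function names and the gene-reassignment rule in mutate are hypothetical.

```python
import random
from typing import List


def mutation_probability(f_k: float, f_max: float, f_avg: float,
                         lam_min: float, lam_max: float) -> float:
    """Adaptive mutation rate P_var(k), bounded by [lam_min, lam_max].

    ASSUMPTION: linear interpolation between lam_max (at average fitness)
    and lam_min (at maximum fitness); lam_max for below-average chromosomes.
    """
    if f_k < f_avg or f_max == f_avg:
        return lam_max
    return lam_max - (lam_max - lam_min) * (f_k - f_avg) / (f_max - f_avg)


def mutate(chromosome: List[int], node_count: int, p: float,
           rng: random.Random) -> List[int]:
    """Reassign each task to a random node (numbered 1..node_count) with probability p."""
    return [rng.randrange(1, node_count + 1) if rng.random() < p else gene
            for gene in chromosome]


if __name__ == "__main__":
    rng = random.Random(7)
    p = mutation_probability(f_k=8.0, f_max=10.0, f_avg=5.0,
                             lam_min=0.01, lam_max=0.1)
    print(p, mutate([2, 3, 1, 4, 5, 7, 2, 9], node_count=9, p=p, rng=rng))
```

Under this assumed form the fittest chromosomes are disturbed least, while weak ones are mutated at the upper bound, which is the usual motivation for making the rate depend on F_Adapt(k).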
10. A Hadoop resource scheduling system for implementing the Hadoop resource regulating method of claim 1.
CN201910340000.9A 2019-04-25 2019-04-25 Resource regulating method and system based on various dimensions constraint genetic algorithm Pending CN110109753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910340000.9A CN110109753A (en) 2019-04-25 2019-04-25 Resource regulating method and system based on various dimensions constraint genetic algorithm

Publications (1)

Publication Number Publication Date
CN110109753A (en) 2019-08-09

Family

ID=67486753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910340000.9A Pending CN110109753A (en) 2019-04-25 2019-04-25 Resource regulating method and system based on various dimensions constraint genetic algorithm

Country Status (1)

Country Link
CN (1) CN110109753A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781003A (en) * 2019-10-24 2020-02-11 重庆邮电大学 Load balancing method for particle swarm fusion variation control
CN111325498A (en) * 2020-01-21 2020-06-23 北京邮电大学 User route generation method and device for VRPSPD, electronic equipment and storage medium
CN111400050A (en) * 2020-03-30 2020-07-10 绿盟科技集团股份有限公司 Method and device for allocating resources to execute tasks
CN112486651A (en) * 2020-11-30 2021-03-12 中国电子科技集团公司第十五研究所 Cloud test platform task scheduling method based on improved genetic algorithm
CN112561434A (en) * 2020-12-18 2021-03-26 上海交通大学宁波人工智能研究院 Joint scheduling method and auxiliary scheduling system for traditional container terminal
CN112667405A (en) * 2021-01-05 2021-04-16 田宇 Information processing method, device, equipment and storage medium
CN112990515A (en) * 2019-12-02 2021-06-18 中船重工信息科技有限公司 Workshop resource scheduling method based on heuristic optimization algorithm
CN113127167A (en) * 2021-03-18 2021-07-16 国家卫星气象中心(国家空间天气监测预警中心) Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm
CN113139710A (en) * 2021-01-05 2021-07-20 中国电子科技集团公司第二十九研究所 Multi-resource parallel task advanced plan scheduling method based on genetic algorithm
CN113568746A (en) * 2021-07-27 2021-10-29 北京达佳互联信息技术有限公司 Load balancing method and device, electronic equipment and storage medium
CN113641471A (en) * 2021-07-30 2021-11-12 平安科技(深圳)有限公司 Soft load scheduling method, device, equipment and medium based on genetic algorithm model
CN113727450A (en) * 2021-08-13 2021-11-30 中国科学院计算技术研究所 Network slice wireless resource allocation method based on resource isolation and reuse
CN114356564A (en) * 2021-12-29 2022-04-15 四川大学 System for integrating service resources
CN116089823A (en) * 2023-03-29 2023-05-09 成都信息工程大学 Intelligent community visual real-time supervision method based on big data
CN117272838A (en) * 2023-11-17 2023-12-22 恒海云技术集团有限公司 Government affair big data platform data acquisition optimization method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737126A (en) * 2012-06-19 2012-10-17 合肥工业大学 Classification rule mining method under cloud computing environment
US20130117752A1 (en) * 2011-11-07 2013-05-09 Sap Ag Heuristics-based scheduling for data analytics
CN103106253A (en) * 2013-01-16 2013-05-15 西安交通大学 Data balance method based on genetic algorithm in MapReduce calculation module
CN103902375A (en) * 2014-04-11 2014-07-02 北京工业大学 Cloud task scheduling method based on improved genetic algorithm
CN105550033A (en) * 2015-11-17 2016-05-04 北京交通大学 Genetic-tabu hybrid algorithm based resource scheduling policy method in private cloud environment
CN106383746A (en) * 2016-08-30 2017-02-08 北京航空航天大学 Configuration parameter determination method and apparatus of big data processing system
CN106936892A (en) * 2017-01-09 2017-07-07 北京邮电大学 A kind of self-organizing cloud multi-to-multi computation migration method and system
US20170220944A1 (en) * 2016-01-29 2017-08-03 Peter P. Nghiem Best trade-off point on an elbow curve for optimal resource provisioning and performance efficiency
CN107172166A (en) * 2017-05-27 2017-09-15 电子科技大学 The cloud and mist computing system serviced towards industrial intelligentization
CN107273209A (en) * 2017-06-09 2017-10-20 北京工业大学 The Hadoop method for scheduling task of improved adaptive GA-IAGA is clustered based on minimum spanning tree
CN107273197A (en) * 2017-06-14 2017-10-20 北京工业大学 Hadoop method for scheduling task based on the improved spectral clustering genetic algorithm of orthogonal experiment
CN108881432A (en) * 2018-06-15 2018-11-23 广东省城乡规划设计研究院 Cloud computing cluster load dispatching method based on GA algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhen Tang et al.: "IO dependent SSD cache allocation for elastic Hadoop applications", Science China (Information Sciences) *
Jia Ruiyu et al.: "Parallel genetic k-means clustering algorithm based on the MapReduce model", Computer Engineering and Design *
Chen Yaojie et al.: "Research on a Task Planning Method for Multi-Ship Cooperative Driving", Journal of Shanghai Jiaotong University (Science) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781003B (en) * 2019-10-24 2023-04-07 重庆邮电大学 Load balancing method for particle swarm fusion variation control
CN110781003A (en) * 2019-10-24 2020-02-11 重庆邮电大学 Load balancing method for particle swarm fusion variation control
CN112990515A (en) * 2019-12-02 2021-06-18 中船重工信息科技有限公司 Workshop resource scheduling method based on heuristic optimization algorithm
CN111325498A (en) * 2020-01-21 2020-06-23 北京邮电大学 User route generation method and device for VRPSPD, electronic equipment and storage medium
CN111325498B (en) * 2020-01-21 2023-04-18 北京邮电大学 User route generation method and device for VRPSPD, electronic equipment and storage medium
CN111400050A (en) * 2020-03-30 2020-07-10 绿盟科技集团股份有限公司 Method and device for allocating resources to execute tasks
CN111400050B (en) * 2020-03-30 2023-09-19 绿盟科技集团股份有限公司 Method and device for allocating resources to execute tasks
CN112486651A (en) * 2020-11-30 2021-03-12 中国电子科技集团公司第十五研究所 Cloud test platform task scheduling method based on improved genetic algorithm
CN112561434A (en) * 2020-12-18 2021-03-26 上海交通大学宁波人工智能研究院 Joint scheduling method and auxiliary scheduling system for traditional container terminal
CN112667405A (en) * 2021-01-05 2021-04-16 田宇 Information processing method, device, equipment and storage medium
CN113139710B (en) * 2021-01-05 2022-03-08 中国电子科技集团公司第二十九研究所 Multi-resource parallel task advanced plan scheduling method based on genetic algorithm
CN113139710A (en) * 2021-01-05 2021-07-20 中国电子科技集团公司第二十九研究所 Multi-resource parallel task advanced plan scheduling method based on genetic algorithm
CN113127167A (en) * 2021-03-18 2021-07-16 国家卫星气象中心(国家空间天气监测预警中心) Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm
CN113127167B (en) * 2021-03-18 2023-11-03 国家卫星气象中心(国家空间天气监测预警中心) Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm
CN113568746A (en) * 2021-07-27 2021-10-29 北京达佳互联信息技术有限公司 Load balancing method and device, electronic equipment and storage medium
CN113641471B (en) * 2021-07-30 2024-02-02 平安科技(深圳)有限公司 Soft load scheduling method, device, equipment and medium based on genetic algorithm model
CN113641471A (en) * 2021-07-30 2021-11-12 平安科技(深圳)有限公司 Soft load scheduling method, device, equipment and medium based on genetic algorithm model
CN113727450A (en) * 2021-08-13 2021-11-30 中国科学院计算技术研究所 Network slice wireless resource allocation method based on resource isolation and reuse
CN113727450B (en) * 2021-08-13 2024-03-08 中国科学院计算技术研究所 Network slice wireless resource allocation method based on resource isolation and multiplexing
CN114356564A (en) * 2021-12-29 2022-04-15 四川大学 System for integrating service resources
CN116089823A (en) * 2023-03-29 2023-05-09 成都信息工程大学 Intelligent community visual real-time supervision method based on big data
CN117272838B (en) * 2023-11-17 2024-02-02 恒海云技术集团有限公司 Government affair big data platform data acquisition optimization method
CN117272838A (en) * 2023-11-17 2023-12-22 恒海云技术集团有限公司 Government affair big data platform data acquisition optimization method

Similar Documents

Publication Publication Date Title
CN110109753A (en) Resource regulating method and system based on various dimensions constraint genetic algorithm
Xiao et al. Artificial bee colony algorithm based on adaptive neighborhood search and Gaussian perturbation
Guo Task scheduling based on ant colony optimization in cloud environment
CN109800071A (en) A kind of cloud computing method for scheduling task based on improved adaptive GA-IAGA
Zhao et al. QoS-aware web service selection with negative selection algorithm
CN106055395A (en) Method for constraining workflow scheduling in cloud environment based on ant colony optimization algorithm through deadline
CN110515735A (en) A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm
CN110389819A (en) A kind of dispatching method and system of computation-intensive batch processing task
CN104333569A (en) Cloud task scheduling algorithm based on user satisfaction
CN107609130A (en) A kind of method and server for selecting data query engine
CN113191828B (en) User electricity price value grade label construction method, device, equipment and medium
CN113821318B (en) Internet of things cross-domain subtask combination collaborative computing method and system
Abdullah et al. Integrated MOPSO algorithms for task scheduling in cloud computing
CN109165081A (en) Web application adaptive resource allocation method based on machine learning
Wang et al. Dominance rule and opposition-based particle swarm optimization for two-stage assembly scheduling with time cumulated learning effect
CN107306207A (en) Calculated and multiple target intensified learning service combining method with reference to Skyline
CN115085202A (en) Power grid multi-region intelligent power collaborative optimization method, device, equipment and medium
CN112231117A (en) Cloud robot service selection method and system based on dynamic vector hybrid genetic algorithm
CN111047040A (en) Web service combination method based on IFPA algorithm
CN107329826A (en) A kind of heuristic fusion resource dynamic dispatching algorithm based on Cloudsim platforms
CN112486651B (en) Cloud test platform task scheduling method based on improved genetic algorithm
CN116014764B (en) Distributed energy storage optimization processing method and device
Dong et al. Optimization of service scheduling in computing force network
CN107180286A (en) Manufacturing service supply chain optimization method and system based on modified pollen algorithm
Wu et al. A genetic-ant-colony hybrid algorithm for task scheduling in cloud system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190809
