CN107589985A - A two-stage job scheduling method and system for big data platforms
- Publication number: CN107589985A
- Application number: CN201710590748.5A
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention relates to a two-stage job scheduling method and system for big data platforms. Jobs submitted by users form a set of jobs to be scheduled. Stage one is maximum-revenue scheduling based on the latest start time of each job: platform resources are pre-allocated according to job deadlines, and the pre-allocation result is adjusted and scheduled according to job revenue, yielding a resource pre-allocation job scheduling result queue that maximizes the service provider's revenue. Stage two is job scheduling based on maximum platform resource utilization: according to the platform's resource usage, the resource pre-allocation job scheduling result queue is fine-tuned to obtain the final scheduling result queue, so that platform resource utilization is highest while platform revenue remains maximal. Experimental results show that the invention not only maximizes platform revenue but also improves platform resource utilization and the overall performance of the platform.
Description
Technical field
The invention belongs to the technical field of big data computing, and in particular relates to a two-stage job scheduling method and system for big data platforms.
Background
In recent years, with the rapid development of cloud computing and Internet technology, data volumes have grown explosively and continuously, and the big data era has quietly arrived. Traditional data processing techniques and tools cannot meet the data processing requirements of the new era, and big data platforms have emerged in response. A big data platform supports multiple computing frameworks and can serve multiple users at the same time. Because multiple users share the platform's resources, however, the platform provider faces an urgent problem: how to schedule the jobs of multiple users efficiently so that platform resources are fully used, the SLA requirements of most users are met, and the provider's own revenue is maximized.
At present, many researchers have studied the job scheduling problem in big data platforms in depth and proposed many solutions. "Optimizing Completion Time and Resource Provisioning of Pig Programs" by Zhang Z et al. proposes, for Pig jobs, a resource allocation performance optimization model based on deadline estimation. The model eliminates the uncertainty in the execution of concurrent jobs in Pig programs, but it does not consider platform revenue. Liu et al. propose a priority-based parallel job scheduling method that uses virtualization to divide the computing capacity of each node into a foreground virtual machine layer (with higher CPU priority) and a background virtual machine layer (with lower CPU priority). Through this two-level division the method balances platform load, makes full use of the platform's CPU resources, improves job execution efficiency, and shortens job response time, but it does not consider platform revenue either. Existing research has thus studied job scheduling under various constraints and backgrounds in depth and achieved a series of results, but none of these methods considers platform revenue, let alone maximizing platform revenue and platform resource utilization at the same time.
A big data platform can serve multiple users at the same time. For the platform service provider, scheduling these users' jobs sensibly not only meets the needs of multiple users simultaneously and increases the provider's revenue, but also improves platform utilization. The user job execution process is shown in Fig. 1.
As Fig. 1 shows, the platform holds six jobs from three users, two per user. After users submit jobs, the platform service provider schedules the jobs according to the platform's resources, the SLA agreements signed with the users, and so on, and generates an execution queue, as shown in Fig. 1. After a user's job has been executed, the service provider obtains the corresponding revenue. Ideally the platform has enough resources to meet the needs of all users, and the service provider's revenue is then also maximal. In reality, platform resources are limited and most likely cannot meet the needs of all users, so the platform service provider faces the following problems:
(1) Given the resources available in the platform and the users' SLA requirements, how should the jobs of multiple users be scheduled to maximize the provider's revenue?
(2) On the basis of (1), assuming that a job execution queue maximizing the service provider's revenue has been generated, how should the queue be adjusted so that revenue remains maximal while platform resource utilization is further improved?
In summary, an effective solution is still lacking for simultaneously maximizing platform revenue and platform resource utilization in big data platforms.
Summary of the invention
To solve the above problems, the present invention provides a two-stage job scheduling method for big data platforms. The scheduling method jointly considers constraints such as job deadlines, maximum platform revenue, and maximum resource utilization. Using a two-stage job scheduling method based on revenue and resources, it not only meets the job deadline requirements of big data platform users but also achieves the highest platform resource utilization while maximizing platform revenue.
To achieve the above object, the present invention adopts the following technical solution:
A two-stage job scheduling method for big data platforms, comprising the following steps:
(1) Form the jobs submitted by users into a set of jobs to be scheduled, and perform maximum-revenue scheduling based on job latest start times: pre-allocate platform resources according to job deadlines, then adjust and schedule the pre-allocation result according to job revenue, obtaining a resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
(2) Perform job scheduling based on maximum platform resource utilization: according to the platform's resource usage, fine-tune the resource pre-allocation job scheduling result queue from step (1) to obtain the final scheduling result queue, so that platform resource utilization is highest while platform revenue remains maximal.
Further, the maximum-revenue scheduling based on job latest start times in step (1) comprises the following specific steps:
(1-1) Calculate the initial latest start time of each job in the set of jobs to be scheduled, and pre-allocate platform resources according to the initial latest start times;
(1-2) According to the pre-allocation, total the computing resources required in each time period to obtain the resource pre-allocation result P_R;
(1-3) Determine whether P_R contains an overloaded time period; if so, go to step (1-4); if not, go to step (1-5);
(1-4) Adjust the initial latest start times of the jobs in the overloaded time period of P_R, update P_R according to the adjustment, and return to step (1-3);
(1-5) Output the resource pre-allocation result P_R that maximizes the service provider's revenue.
Further, in step (1-1), pre-allocating platform resources according to the initial latest start times comprises:
Let each job start executing at its initial latest start time, and obtain the number of resources required in each time period when every job completes exactly at its deadline;
The initial latest start time of a job is defined as follows: for any job in the set of jobs to be scheduled, it is the start time at which the job completes exactly at its deadline when no other job competes with it for resources;
In the present invention, when the deadlines of several jobs are close together, the limited computing resources of the platform may cause jobs to contend for resources, so that a job starting at its initial latest start time cannot complete before its deadline. The initial latest start times of jobs in periods of resource contention therefore need to be adjusted. The invention pre-allocates resources to all jobs according to their initial latest start times, which effectively identifies the time periods in which resource contention occurs.
Further, in step (1-3), determining whether P_R contains an overloaded time period means determining whether the total number of computing resources required by the jobs running in a given time period exceeds the total number of computing resources of the big data platform; if it does, the period is an overloaded time period; otherwise it is a normal-load time period.
The total number of computing resources of the big data platform is the number of all standard Containers in the platform.
Further, step (1-4) comprises the following specific steps:
(1-4-1) Select the last overloaded time period and obtain the job set formed by all jobs executing within it;
(1-4-2) Select a minimal proper subset of the job set such that moving the initial latest start times of all jobs in the subset earlier returns the period to the normal-load state; the collection of all minimal proper subsets satisfying this condition forms the feasible adjustment strategy set of the period;
(1-4-3) Evaluate each feasible adjustment strategy in the feasible adjustment strategy set with the evaluation function, and select the optimal adjustment strategy;
(1-4-4) Adjust the initial latest start times of the jobs in the optimal adjustment strategy, and update P_R according to the adjusted latest start times.
Further, the evaluation function in step (1-4-3) is defined in terms of the following quantities (the formula itself and the symbols for the first two quantities are not reproduced in this text): the sum of the revenue evaluation values of all jobs in the adjustment strategy; the sum of the revenue evaluation values of all jobs in the current time period; lastsize, the percentage of the platform's total computing resources that remains free in the current time period after adjustment by the strategy; and Sp, the revenue evaluation value of a job, Sp = |a - b|, where a is the revenue obtained when the job is completed on time and b is the revenue obtained when it is not.
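The quantities that feed the evaluation function can be sketched in Python as follows. This is only an illustration: the way the patent combines these quantities is not available here, so the sketch stops at computing them. The function names, the `(a, b)` tuple layout, and the `load_after` parameter are assumptions introduced for the example.

```python
def sp(a, b):
    """Revenue evaluation value of one job: Sp = |a - b|."""
    return abs(a - b)


def evaluation_inputs(strategy_jobs, period_jobs, load_after, ts):
    """Compute the three ingredients named in the text.

    strategy_jobs, period_jobs: lists of (a, b) revenue pairs (assumed layout).
    load_after: containers still used in the period after the adjustment (assumed).
    ts: total number of computing resources of the platform.
    """
    strategy_sp = sum(sp(a, b) for a, b in strategy_jobs)
    period_sp = sum(sp(a, b) for a, b in period_jobs)
    # lastsize: share of the platform's resources left free in the period.
    lastsize = (ts - load_after) / ts
    return strategy_sp, period_sp, lastsize
```

Because b may be negative when the provider must compensate the user, Sp grows with the penalty for lateness as well as with the on-time revenue.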
Further, the job scheduling based on maximum platform resource utilization in step (2) comprises the following specific steps:
(2-1) Initialize the variables, the time T, and the final scheduling result queue;
(2-2) Find the set of jobs that can be executed at time T without conflicting with the resource pre-allocation result P_R;
(2-3) Determine whether this set is empty; if it is, set T to the start time of the next time period in P_R; if not, select from the set the job that minimizes the resource waste rate, obtain its optimal start time, and update T;
(2-4) Repeat steps (2-2) and (2-3) until every job has been assigned an optimal start time, giving the final scheduling result;
(2-5) Output the final scheduling result.
Further, the resource waste rate in step (2-3) is the ratio of the resources that cannot be reused after scheduling to the current computing resources;
The current computing resources are the sum of the computing resources in use and the idle computing resources.
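The waste-rate definition and the selection rule of step (2-3) can be illustrated with a short Python sketch. The function names and the `(unusable, used, idle)` tuple layout are assumptions made for the example; how the non-reusable resources are counted for a candidate job is not specified here and is taken as an input.

```python
def resource_waste_rate(unusable, used, idle):
    """Ratio of resources that cannot be reused after scheduling
    to the current computing resources (used + idle)."""
    current = used + idle
    if current <= 0:
        raise ValueError("current computing resources must be positive")
    return unusable / current


def pick_job(candidates):
    """Return the id of the candidate job with the smallest waste rate.

    candidates: dict mapping job id -> (unusable, used, idle), an assumed layout.
    """
    return min(candidates, key=lambda j: resource_waste_rate(*candidates[j]))
```

For example, with 6 containers in use and 4 idle, a job that leaves 1 container unusable is preferred over one that leaves 2 unusable.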
To make full use of platform resources and increase the service provider's revenue in a big data platform, the present invention also provides a two-stage job scheduling system for big data platforms, based on the two-stage job scheduling method described above.
To achieve the above object, the present invention adopts the following technical solution:
A two-stage job scheduling system for big data platforms, the system comprising:
A first-stage scheduling module, which forms the jobs submitted by users into a set of jobs to be scheduled and, based on the latest start time constraint of each job and the service provider's overall maximum revenue, applies maximum-revenue scheduling based on job latest start times to perform a first adjustment and scheduling of the jobs in the set, obtaining a resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
and
A second-stage scheduling module, which, according to the platform's resource usage, fine-tunes the resource pre-allocation job scheduling result queue of the first-stage scheduling module to obtain the final scheduling result queue. The final scheduling result queue maximizes platform resource utilization while keeping platform revenue maximal, so that platform resources are fully used, and each job in the queue carries its optimal start time.
Further, the set of jobs to be scheduled is defined as follows: within a certain time period the platform service provider receives a batch of jobs and negotiates and signs job-related SLA agreements with the users; these jobs form the set of jobs to be scheduled J, expressed as J = {j_1, j_2, …, j_n}, where n is the number of jobs in J;
Any job j_i in J is expressed as j_i = (ms, rs, mt, rt, dl, bf(t)), where ms is the number of Map tasks of the job; rs is the number of Reduce tasks of the job; mt is the average execution time of the job's Map tasks; rt is the average execution time of the job's Reduce tasks; dl is the deadline constraint of the job; and bf(t) is the revenue function of the job.
Further, the first-stage scheduling module includes a first job scheduler and a first resource scheduler, and the second-stage scheduling module includes a second job scheduler and a second resource scheduler.
Beneficial effects of the invention:
Based on the MapReduce computing framework, the present invention proposes a two-stage job scheduling method and system for jobs with deadline constraints on big data platforms. The first stage is a maximum-revenue scheduling method based on job latest start times: it calculates and adjusts the latest start time of each job according to the job's deadline constraint and revenue information, and pre-allocates resources according to the adjustment, ensuring that high-revenue jobs complete before their deadlines so that total platform revenue is maximal. The second stage, while keeping platform revenue maximal, applies a job scheduling method based on maximum platform resource utilization to improve platform resource utilization. Experimental results show that the proposed two-stage job scheduling method not only maximizes platform revenue but also improves platform resource utilization and the overall performance of the platform.
Brief description of the drawings
Fig. 1 is a schematic diagram of multi-user job execution in a big data platform;
Fig. 2 is a flow chart of the method of the present invention;
Fig. 3 is a flow chart of the maximum-revenue scheduling based on job latest start times of the present invention;
Fig. 4 is a flow chart of the job scheduling based on maximum platform resource utilization of the present invention;
Fig. 5 is a structural diagram of the system of the present invention;
Fig. 6 shows the relationship between resource utilization and average job size;
Fig. 7 shows the relationship between job completion rate and average job size;
Fig. 8 shows the relationship between total revenue and average job size;
Fig. 9 shows the effect of job set size on resource utilization;
Fig. 10 shows the effect of job set size on job completion rate;
Fig. 11 shows the effect of job set size on revenue;
Fig. 12 shows the effect of the total number of computing resources on resource utilization;
Fig. 13 shows the effect of the total number of computing resources on job completion rate;
Fig. 14 shows the effect of the total number of computing resources on revenue;
Fig. 15 shows the effect of job urgency on resource utilization.
Detailed description of the embodiments:
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the application belongs.
It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular form is intended to include the plural form as well. It should also be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Where no conflict arises, the embodiments in the application and the features in the embodiments may be combined with one another. The invention is further described below with reference to the drawings and embodiments.
Embodiment 1:
As described in the background, the prior art cannot effectively solve the problem of simultaneously maximizing platform revenue and platform resource utilization, so a two-stage job scheduling method for big data platforms is provided. The scheduling method jointly considers constraints such as job deadlines, maximum platform revenue, and maximum resource utilization. Using a two-stage job scheduling method based on revenue and resources, it not only meets the job deadline requirements of big data platform users but also achieves the highest platform resource utilization while maximizing platform revenue.
To achieve the above object, the present invention adopts the following technical solution:
As shown in Fig. 2, a two-stage job scheduling method for big data platforms comprises the following steps:
(1) Form the jobs submitted by users into a set of jobs to be scheduled, and perform maximum-revenue scheduling based on job latest start times: pre-allocate platform resources according to job deadlines, then adjust and schedule the pre-allocation result according to job revenue, obtaining a resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
(2) Perform job scheduling based on maximum platform resource utilization: according to the platform's resource usage, fine-tune the resource pre-allocation job scheduling result queue from step (1) to obtain the final scheduling result queue, so that platform resource utilization is highest while platform revenue remains maximal.
Stage one: maximum-revenue scheduling based on job latest start times
First, the jobs submitted by users are formed into a set of jobs to be scheduled, defined as follows:
Within a certain time period the platform service provider receives a batch of jobs and negotiates and signs job-related SLA agreements with the users; these jobs form the set of jobs to be scheduled J, expressed as J = {j_1, j_2, …, j_n}, where n is the number of jobs in J;
The invention addresses job scheduling on a homogeneous cluster in a big data platform based on the MapReduce computing framework. It is therefore assumed that the hardware configuration of every node is roughly the same and that processing performance and stability are consistent, so that any task has the same running time regardless of the node on which it runs. For any job, the invention assumes that the numbers of Map tasks and Reduce tasks are known, along with the average execution times of the Map and Reduce tasks. The invention also does not consider data skew within a job: the processing time of each Map task (or each Reduce task) of a job is assumed to be the same.
For each job in J, the invention is concerned with the job's execution time, resource usage, deadline, revenue, and similar information. Any job j_i in J is expressed as j_i = (ms, rs, mt, rt, dl, bf(t)), where ms is the number of Map tasks of the job; rs is the number of Reduce tasks of the job; mt is the average execution time of the job's Map tasks; rt is the average execution time of the job's Reduce tasks; dl is the deadline constraint of the job; and bf(t) is the revenue function of the job.
The revenue function bf(t) of a job is a piecewise function of the job's actual completion time j_i.end:
bf(t) = a, if j_i.end ≤ j_i.dl; bf(t) = b, if j_i.end > j_i.dl
where a and b are the revenue obtained when the job is completed by its deadline and the revenue obtained when it is not, respectively (without loss of generality, when failing to complete before the deadline requires compensating the user, the revenue value b is negative).
The deadlines in the invention are all soft deadlines: failing to complete a job before its deadline does not cause serious consequences, and execution of the job is not abandoned.
As shown in Fig. 3, the maximum-revenue scheduling based on job latest start times in the first stage, step (1), comprises the following specific steps:
(1-1) Calculate the initial latest start time j_i.T_ols of each job j_i in the set of jobs to be scheduled J = {j_1, j_2, …, j_n}, and pre-allocate platform resources according to the initial latest start times j_i.T_ols;
(1-2) According to the pre-allocation, total the computing resources required by the jobs running in each time period to obtain the resource pre-allocation result P_R;
(1-3) Determine whether P_R contains an overloaded time period; if so, go to step (1-4); if not, go to step (1-5);
(1-4) Adjust the initial latest start times of the jobs in the overloaded time period of P_R, update P_R according to the adjustment, and return to step (1-3);
(1-5) Output the resource pre-allocation result P_R that maximizes the service provider's revenue.
In this embodiment, pre-allocating platform resources according to the initial latest start times in step (1-1) comprises:
Let each job start executing at its initial latest start time, and obtain the number of resources required in each time period when every job completes exactly at its deadline;
The initial latest start time j_i.T_ols of a job is defined as follows: for any job j_i in the set of jobs to be scheduled J = {j_1, j_2, …, j_n}, it is the start time at which j_i completes exactly at its deadline when no other job competes with it for resources. With the deadline of j_i denoted j_i.dl, the initial latest start time is computed by a formula that is not reproduced in this text.
In the present invention, for any job j_i without resource contention, the job must start no later than its initial latest start time j_i.T_ols to be guaranteed to complete before its deadline; a job that starts exactly at j_i.T_ols completes exactly at its deadline. When the deadlines of several jobs are close together, the limited computing resources of the platform may cause jobs to contend for resources, so that job j_i starting at j_i.T_ols cannot complete before its deadline. The initial latest start times j_i.T_ols of jobs in periods of resource contention therefore need to be adjusted. The invention pre-allocates resources to all jobs j_i according to their initial latest start times, that is, it lets each job j_i start executing at j_i.T_ols. This gives the total computing resources required in each time period when every job completes exactly at its deadline, and effectively identifies the time periods in which resource contention occurs.
In this embodiment, the resource pre-allocation result P_R in step (1-2) is:
P_R = {(P_1, R_1), (P_2, R_2), …, (P_t, R_t)}, where P_i is a time period and R_i is the total number of computing resources required by the jobs running in that period;
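One way to build a P_R-like structure is sketched below in Python. It is a simplified illustration under a strong assumption: each job is reduced to a `(start, end, demand)` triple with a constant resource demand over its whole run, whereas a real MapReduce job may need different numbers of containers in its Map and Reduce phases. The function name and triple layout are hypothetical.

```python
def build_profile(intervals):
    """Return [((p_start, p_end), total_demand), ...] sorted by time.

    intervals: list of (start, end, demand) triples, one per job,
    where the job occupies `demand` containers over [start, end).
    """
    # Every job start or end is a boundary between two time periods.
    points = sorted({t for s, e, _ in intervals for t in (s, e)})
    profile = []
    for lo, hi in zip(points, points[1:]):
        # Sum the demand of all jobs that run during [lo, hi).
        total = sum(d for s, e, d in intervals if s < hi and e > lo)
        if total > 0:
            profile.append(((lo, hi), total))
    return profile
```

With two jobs running over [0, 4) with 3 containers and [2, 6) with 5 containers, the profile has three periods with demands 3, 8, and 5.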
In this embodiment, based on the total number of computing resources required by the jobs running in a time period, the invention defines the states that each time period in the resource pre-allocation result P_R may be in:
Normal-load state: in P_R, for a time period P, if the total number of computing resources R required by the jobs running in P is less than the total number of computing resources TS of the big data platform, the period is said to be in the normal-load state, and a period in this state is called a normal time period.
Overload state: in P_R, for a time period P, if the total number of computing resources R required by the jobs running in P is greater than the total number of computing resources TS of the big data platform, the period is said to be in the overload state, and a period in this state is called an overloaded time period.
In an overloaded time period, the computing resources of the big data platform cannot meet the demands of all jobs, jobs contend for resources, and some jobs are ultimately completed late.
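The state definitions translate into a simple check over P_R, sketched here in Python. The function names are hypothetical, P_R is assumed to be a list of `((start, end), demand)` pairs, and a period whose demand equals TS is treated as normal load, matching the rule that a period is overloaded only when demand exceeds TS.

```python
def overloaded_periods(p_r, ts):
    """Return the periods of P_R whose total demand exceeds TS."""
    return [period for period, demand in p_r if demand > ts]


def last_overloaded(p_r, ts):
    """Return the overloaded period with the latest start time, or None."""
    over = overloaded_periods(p_r, ts)
    return max(over, key=lambda p: p[0]) if over else None
```

Selecting the latest overloaded period first corresponds to step (1-4-1) of the method, which starts the adjustment from the last overloaded time period.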
In this embodiment, in step (1-3), determining whether P_R contains an overloaded time period means determining whether the total number of computing resources required by the jobs running in a given time period exceeds the total number of computing resources TS of the big data platform; if it does, the period is an overloaded time period; otherwise it is a normal-load time period.
The total number of computing resources of the big data platform (Total Number of Computing Resources, TS) is the number of all standard Containers in the platform.
For example, in a big data platform with 1 master node and N_dn slave nodes, where each slave node is configured with a c-core CPU and m GB of memory and each Container is sized at c_1 CPU cores and m_1 GB of memory, the total number of computing resources of the platform is:
TS = min(c/c_1, m/m_1) * N_dn
where min(c/c_1, m/m_1) is the maximum number of Containers that each compute node can hold.
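The TS formula is easy to check with a few numbers. The Python sketch below assumes integer inputs and uses floor division, on the assumption that a node can only host whole Containers; the function and parameter names are hypothetical.

```python
def total_containers(c, m, c1, m1, n_dn):
    """TS = min(c / c1, m / m1) * N_dn, counting whole Containers only.

    c, m: CPU cores and memory (GB) of each slave node.
    c1, m1: CPU cores and memory (GB) of one Container.
    n_dn: number of slave nodes.
    """
    per_node = min(c // c1, m // m1)  # the scarcer resource limits the count
    return per_node * n_dn
```

For 10 slave nodes with 16 cores and 64 GB each, and Containers of 2 cores and 4 GB, CPU is the limiting resource: each node holds 8 Containers and TS is 80.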
For an overloaded time period, an optimal adjustment strategy must be found to adjust the period while ensuring maximum platform revenue:
In this embodiment, step (1-4) comprises the following specific steps:
(1-4-1) Select the last overloaded time period and obtain the job set Ju formed by all jobs executing within it;
(1-4-2) Select a minimal proper subset Js ⊂ Ju such that moving the initial latest start times of all jobs in Js earlier returns the period to the normal-load state; the collection of all minimal proper subsets Js satisfying this condition forms the feasible adjustment strategy set of the period, CL = {Js_1, Js_2, …, Js_m};
(1-4-3) Evaluate each feasible adjustment strategy in CL = {Js_1, Js_2, …, Js_m} with the evaluation function, and select the optimal adjustment strategy;
(1-4-4) Adjust the initial latest start times of the jobs in the optimal adjustment strategy, and update P_R according to the adjusted latest start times.
To find the optimal adjustment strategy, the feasible adjustment strategy set is first defined as follows:
Feasible adjustment strategy set: in the deadline-based resource pre-allocation result P_R, let Ju be the set of jobs running in some overloaded period. Select a minimal proper subset Js ⊂ Ju and advance the latest start time of every job in Js so that the total computing resources occupied by the jobs running in that period no longer exceed the platform's total computing resources TS (that is, the period changes from the overloaded state to the normal-load state). Such a minimal proper subset Js is called a feasible adjustment strategy. The set of all Js that satisfy this condition, CL = {Js1, Js2, …, Jsm}, is called the feasible adjustment strategy set of the period.
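The definition above can be illustrated with a small enumeration sketch. It is only one possible reading, under two assumptions that the text does not state: "minimal" is taken to mean minimal with respect to set inclusion, and moving a job earlier is taken to remove its whole resource demand from the overloaded period. The function name and the input format are hypothetical.

```python
from itertools import combinations


def feasible_strategies(demand, ts):
    """Enumerate the feasible adjustment strategy set CL for one period.

    demand: dict mapping job name -> Containers the job uses in the period (Ju).
    ts:     total computing resources TS of the platform.
    Returns the minimal proper subsets Js of Ju whose removal from the
    period brings the remaining demand down to at most ts.
    """
    jobs = sorted(demand)
    total = sum(demand.values())
    found = []
    # Proper subsets only: sizes 1 .. len(jobs) - 1, smallest first.
    for size in range(1, len(jobs)):
        for combo in combinations(jobs, size):
            js = frozenset(combo)
            # Skip sets that contain an already feasible smaller set.
            if any(prev < js for prev in found):
                continue
            if total - sum(demand[j] for j in js) <= ts:
                found.append(js)
    return found


# Three jobs need 9 Containers in a period, but only 6 are available.
cl = feasible_strategies({"a": 3, "b": 2, "c": 4}, ts=6)
print(cl)
```

In this example moving job a or job c alone is enough, so the set {b, c} is not reported because it contains the smaller feasible set {c}.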
For each job set Jsi in the feasible adjustment strategy set CL = {Js1, Js2, …, Jsm}, returning the period from the overloaded state to the normal-load state requires advancing the latest start time of every job in Jsi to:
where ji ∈ Jsi and Tcs is the start time of the overloaded period.
A period in the overloaded state usually has more than one feasible adjustment strategy, and any strategy in the feasible adjustment strategy set can be used to adjust the overloaded period. To ensure maximum platform revenue, every strategy in the set must be evaluated, and the optimal one is selected according to the evaluation result. The evaluation mainly considers the following two aspects:
(1) Profit evaluation value.
To handle an overloaded period, the present invention advances the latest start time of a job to ji.Tls. When ji.Tls < 0, the job can no longer be guaranteed to complete before its deadline from the current point in time. In that case some jobs must be discarded, and jobs with a large profit evaluation value are executed first so that the total revenue of the platform is maximized. An adjustment strategy should therefore first ensure that jobs with a large profit evaluation value complete before their deadlines; that is, within the feasible adjustment strategy set CL = {Js1, Js2, …, Jsm}, the optimal adjustment strategy should adjust the job set whose total profit evaluation value is smallest. For any job, the profit evaluation value Sp is the difference between the revenue a obtained when the job completes on time and the revenue b obtained when it does not:
Sp = |a - b|.
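A short sketch of this first aspect follows. It covers only the profit evaluation value, not the complete evaluation function that is defined later and that also accounts for adjustment cost. The function names and the sample revenue figures are hypothetical.

```python
def profit_evaluation(on_time_revenue, late_revenue):
    """Sp = |a - b| for a single job."""
    return abs(on_time_revenue - late_revenue)


def strategy_profit(strategy, sp):
    """Total profit evaluation value of the jobs in one adjustment strategy."""
    return sum(sp[job] for job in strategy)


# Hypothetical revenues: (a, b) = (on time, not on time).
sp = {
    "a": profit_evaluation(10, -2),
    "b": profit_evaluation(5, 0),
    "c": profit_evaluation(8, -1),
}
candidates = [{"a"}, {"c"}]
# Considering revenue alone, the strategy with the smallest total Sp is preferred.
best = min(candidates, key=lambda js: strategy_profit(js, sp))
print(best)  # {'c'}
```

Here job a has Sp = 12 and job c has Sp = 9, so on revenue alone the strategy that moves job c is preferred, which leaves the more valuable job a untouched.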
(2) Adjustment cost.
When adjusting jobs, the present invention considers not only the profit evaluation value but also the effect that the amount of resources occupied by the adjusted jobs has on the adjustment, namely the adjustment cost. Consider the following situation: an overloaded period contains two jobs ja and jb, with ja.Sp > jb.Sp. From the revenue standpoint alone, the latest start time of jb should be advanced so that at least ja can complete. However, ja needs few computing resources while jb needs many. Advancing the latest start time of jb pushes the preceding period into the overloaded state, so the preceding periods must be adjusted repeatedly, which may eventually drive the Tls of jb or of other jobs below 0 and cause them to be discarded. If the latest start time of ja is advanced instead, ja occupies so few resources that all jobs can complete before their deadlines.
Taking both factors into account, the present invention proposes an adjustment strategy evaluation function whose optimization target is maximum platform revenue. The adjustment strategy with the lowest score is the optimal adjustment strategy. The evaluation function is as follows:
The evaluation function uses the sum of the profit evaluation values of all jobs in the adjustment strategy, the sum of the profit evaluation values of all jobs within the current period, and lastsize, which is the percentage of the platform's total computing resources that remains free in the current period after adjustment by the strategy.
This process requires calculating the total revenue. Each job in the job set J = {j1, j2, …, jn} that the service provider receives on the big data platform has its own deadline constraint ji.dl and a time-dependent revenue function ji.bf(t). A job yields different revenue depending on whether its actual finish time ji.end falls before or after ji.dl. Over all jobs, the total revenue function is:
Based on the above, the present invention proposes the maximum-revenue scheduling algorithm based on job latest start time (Algorithm 1), as follows.
In Algorithm 1, lines 1-3 perform step (1-1): they compute the initial latest start time ji.Tols of each job ji in the to-be-scheduled job set J = {j1, j2, …, jn} and pre-allocate platform resources according to ji.Tols.
Line 4 performs step (1-2): based on the pre-allocation, it totals the computing resources needed by the jobs running in each period and obtains the resource pre-allocation result P_R.
Lines 5-23 perform steps (1-3) and (1-4): they check whether P_R contains an overloaded period and, if so, adjust the initial latest start times of the jobs in that period.
The adjustment in step (1-4) proceeds as follows. Lines 6-7 perform step (1-4-1): line 6 selects the last overloaded period, and line 7 obtains the job set Ju executing within it. Line 8 performs step (1-4-2): from Ju, all minimal proper subsets Js that satisfy the condition are collected into the feasible adjustment strategy set CL = {Js1, Js2, …, Jsm} of the period. Lines 9-16 perform step (1-4-3): each feasible adjustment strategy is scored with the evaluation function and the optimal one is selected. Lines 17-23 perform step (1-4-4): the latest start times of the jobs in the optimal strategy are adjusted, and P_R is updated accordingly.
Lines 6-23 are repeated until P_R contains no overloaded period.
Line 24 performs step (1-5) and returns P_R as the result.
Algorithm 1 has two outer loops and a nested inner loop, so its time complexity is n + n*(n + n), i.e. 2n² + n.
Second stage: job scheduling based on maximum platform resource utilization
As shown in Fig. 4, the job scheduling of step (2), based on maximum platform resource utilization, comprises the following steps:
(2-1) Initialize the variables, including the time T and the final scheduling result queue;
(2-2) Find the set of jobs that can be executed at time T without conflicting with the resource pre-allocation result P_R;
(2-3) Determine whether this set of non-conflicting jobs is empty. If it is empty, set T to the start time of the next period in P_R; if it is not empty, select from the set the job that minimizes the resource waste rate, obtain its optimal start time, and update T;
(2-4) Repeat steps (2-2) and (2-3) until every job has been assigned an optimal start time, which yields the final scheduling result;
(2-5) Output the final scheduling result.
In this embodiment, the resource waste rate in step (2-3) is the ratio of the resources that cannot be reused after scheduling to the currently available computing resources;
the currently available computing resources are the sum of the computing resources in use and the idle computing resources.
The first-stage maximum-revenue scheduling algorithm based on job latest start time yields the resource pre-allocation result P_R that maximizes platform revenue. Executing jobs according to this result guarantees that high-revenue jobs complete before their deadlines, but it does not guarantee the highest platform resource utilization. The second stage of the two-stage job scheduling method therefore proposes a secondary adjustment scheduling algorithm based on maximum platform resource utilization, which maximizes platform resource utilization while preserving maximum platform revenue.
In the second-stage scheduling algorithm, the present invention uses the pre-allocation result obtained by the first-stage algorithm to ensure that all jobs complete on time, and also considers the resource utilization of the platform after scheduling. To maximize platform resource utilization, the resource waste rate Wrr is used to evaluate computing resource utilization. Wrr is the ratio of the resources Wr that cannot be reused after scheduling to the currently available computing resources Ar (the sum of the computing resources in use and the idle computing resources), that is:
Wrr = Wr / Ar
The smaller the resource waste rate Wrr, the higher the resource utilization of the current platform.
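The ratio can be sketched as below. How the non-reusable resources Wr of a candidate job are determined is not specified at this point in the text, so the sketch simply takes them as given inputs; the function names and sample numbers are hypothetical.

```python
def waste_rate(wasted, in_use, idle):
    """Wrr = Wr / Ar, where Ar = resources in use + idle resources."""
    available = in_use + idle
    return wasted / available


def pick_job(candidate_waste, in_use, idle):
    """Return the candidate job whose scheduling gives the smallest Wrr.

    candidate_waste: dict mapping job name -> Wr if that job is scheduled now
    (assumed to be supplied by the caller).
    """
    return min(
        candidate_waste,
        key=lambda job: waste_rate(candidate_waste[job], in_use, idle),
    )


candidates = {"j1": 6, "j2": 2, "j3": 4}
print(pick_job(candidates, in_use=50, idle=30))  # j2
print(waste_rate(2, in_use=50, idle=30))         # 0.025
```

Because Ar is the same for every candidate at a given moment, choosing the smallest Wrr is equivalent here to choosing the smallest Wr; the ratio becomes informative when rates are compared across different moments.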
Based on the above, the job scheduling algorithm based on maximum platform resource utilization (Algorithm 2) is proposed as follows:
In Algorithm 2:
Lines 1-2 perform step (2-1) and initialize the variables;
Lines 4-10 perform step (2-2) and find the set Ej of jobs that can be executed at time T without conflicting with the pre-allocated resources;
Lines 11-20 perform step (2-3); line 11 determines whether the set of jobs that do not conflict with the resource pre-allocation result P_R is empty;
Lines 12-18: if Ej is not empty, select from Ej the job that minimizes the resource waste rate and set its optimal start time to the current time T;
Line 20: if Ej is empty, set T to the start time of the next period in P_R;
Lines 3-20 are repeated until every job has been assigned an optimal start time;
Line 21 performs step (2-5) and returns the result.
Algorithm 2 contains two nested loops, so its time complexity is n².
After the secondary adjustment scheduling, a start time has been determined for every job in the to-be-scheduled job set. Because the secondary adjustment raises resource utilization, the actual start and completion times of jobs move earlier, and many jobs that were discarded in the pre-allocation algorithm for lack of resources get a new chance to execute. The revenue of the cluster may therefore increase further after the secondary adjustment scheduling.
Embodiment 2:
To make full use of platform resources in a big data platform and raise the revenue of the service provider, the present invention provides a two-stage job scheduling system for big data platforms. The scheduling system is based on the two-stage job scheduling method for big data platforms of Embodiment 1.
To achieve this, the present invention adopts the following technical solution:
As shown in Fig. 5,
a two-stage job scheduling system for big data platforms comprises:
a first-stage scheduling module, which forms the jobs submitted by users into a to-be-scheduled job set in which each job has its own SLA information and, based on the latest start time constraint of each job and the overall maximum revenue of the service provider, performs the first adjustment scheduling on the jobs in the set using maximum-revenue scheduling based on job latest start time, obtaining the resource pre-allocation job scheduling result queue that maximizes the revenue of the service provider;
and
a second-stage scheduling module, which, according to the resource usage of the platform and while preserving the maximum revenue of the service provider, performs a secondary adjustment scheduling aimed at improving platform resource utilization: it fine-tunes the resource pre-allocation job scheduling result queue of the first-stage scheduling module to obtain the final scheduling result queue. The final scheduling result queue achieves the highest platform resource utilization while preserving maximum platform revenue, so that platform resources are fully used, and each job in the final scheduling result queue carries its optimal start time.
In this embodiment, the first-stage scheduling module comprises a first job scheduler and a first resource scheduler, and the second-stage scheduling module comprises a second job scheduler and a second resource scheduler. Job scheduling is carried out through the job scheduler: after the job scheduler generates an optimal job execution queue, it starts each job at that job's optimal start time. Once a job has started, the resource scheduler allocates resources to it for execution. As shown in the lower part of Fig. 5, a job can obtain resources from different nodes and may run on different nodes.
Embodiment 3:
In Embodiment 3, the two-stage job scheduling method and system for big data platforms of Embodiments 1 and 2 are compared in overall performance with the EDF algorithm and the native FIFO scheduling algorithm of Hadoop, examining the effect of average job size, job set size, total platform resources, and job urgency on the algorithms.
Platform configuration:
In this embodiment, the present invention, Comparative Example 1, and Comparative Example 2 are tested on a big data platform based on the MapReduce computing framework.
The platform contains 1 master node and 20 slave nodes with identical configurations. Each node has an Intel(R) Core(TM) i5-2400 3.10 GHz CPU, 8 GB of memory, and a 1 TB hard disk, and runs Red Hat Enterprise Linux 6.2; the Hadoop version is 2.7.1. Computing resources are measured in Containers. Each Container has 1 core and 2 GB of memory, so each node hosts 4 Containers and the whole platform has 80 Containers.
Compared algorithms:
To verify the effectiveness of the two-stage job scheduling method and system for big data platforms of Embodiments 1 and 2, the experiments compare the proposed two-stage scheduling algorithm, TPS (two-phase schedule), with the EDF algorithm and the native FIFO scheduling algorithm of Hadoop in overall performance.
EDF is a classic scheduling algorithm for jobs with deadline constraints. It determines the job execution order by deadline and executes jobs with earlier deadlines first. To avoid lengthening job execution times through contention for resources among jobs, each comparative example in this embodiment starts a job only when the remaining platform resources are no less than the resources the job requires.
In the FIFO scheduling algorithm, jobs execute in order of priority (or of submission time), and jobs cannot execute in parallel. In these tests, job priority is determined by the revenue ratio (revenue divided by required resources) so that the algorithm can be compared with the one proposed here.
Test jobs and data:
The classic MapReduce benchmark jobs Sort and Grep are used as input jobs for the tests.
Evaluation metrics:
This embodiment evaluates the algorithms with three metrics: platform resource utilization, job completion rate, and platform revenue. They are calculated as follows:
Platform resource utilization = resources used by job execution / total platform resources.
Job completion rate = number of jobs completed on time / total number of jobs to be scheduled.
Platform revenue = revenue of jobs completed on time - compensation paid for jobs not completed on time.
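The three metrics translate directly into code. The sketch below is illustrative only; the function names and the sample figures are hypothetical and are not taken from the experiments.

```python
def resource_utilization(used, total):
    """Resources used by job execution / total platform resources."""
    return used / total


def completion_rate(on_time_jobs, total_jobs):
    """Jobs completed on time / total jobs to be scheduled."""
    return on_time_jobs / total_jobs


def platform_revenue(on_time_revenues, compensations):
    """Revenue of on-time jobs minus compensation for jobs not completed on time."""
    return sum(on_time_revenues) - sum(compensations)


print(resource_utilization(used=60, total=80))          # 0.75
print(completion_rate(on_time_jobs=24, total_jobs=30))  # 0.8
print(platform_revenue([10, 8, 5], [2, 1]))             # 20
```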
Experimental design:
The experiments analyse separately how average job size, job set size, total platform resources, and job urgency affect the algorithms:
Experiment (1) tests the effect of average job size on the algorithms. The number of data blocks of a job is used as the measure of job size. The total number of jobs is fixed at 30, and performance is measured for average data block counts of 20, 40, 80, 100, and 200.
Experiment (2) tests the effect of job set size on the algorithms. The average job size is fixed at 40, and performance is measured for job sets of 10, 20, 30, and 40 jobs.
Experiment (3) tests the effect of total platform resources on the algorithms. The number of Containers is used as the measure of total platform resources. The number of jobs and the average job size are fixed at 20 and 40 respectively, and performance is measured with 4, 8, 16, 32, 48, 64, and 80 Containers.
Experiment (4) tests the effect of job urgency on the algorithms. The ratio of the time remaining until a job's deadline to the job's execution length is used as the measure of urgency; the larger the ratio, the less urgent the job. The number of jobs and the average job size are fixed at 20 and 40 respectively, and performance is measured for average urgency values of 3, 4, 5, 6, and 7.
Because the scheduling algorithm of the present invention has relatively high complexity, scheduling itself takes a certain amount of time, whereas FIFO and EDF scheduling take almost none. The execution time of the algorithm itself must therefore be taken into account when the algorithms are compared.
Analysis of experimental results:
(1) Effect of average job size on the scheduling algorithms
As shown in Figs. 6-8, experiment (1) is run first to test the effect of average job size on algorithm performance.
As shown in Fig. 6, experiment (1) tests the effect of average job size on resource utilization. When jobs are small, FIFO scheduling has very low resource utilization because jobs execute serially, whereas TPS and EDF achieve higher utilization because jobs run in parallel; EDF considers only deadlines, so its utilization is lower than that of TPS. As job size grows, the utilization of FIFO increases, while that of TPS and EDF decreases because larger jobs cause more resource fragmentation during scheduling. Finally, once job size reaches a certain level, platform resources can no longer support parallel jobs, the jobs scheduled by all three algorithms execute serially, and the three algorithms reach roughly the same resource utilization.
As shown in Fig. 7, experiment (1) also tests the relationship between job completion rate and average job size. Because the deadlines are unchanged and the computing capacity of the system in a fixed period is constant, the job completion rate of all three algorithms drops markedly as jobs grow. EDF considers only deadlines, whereas TPS also takes platform revenue into account, so the completion rate of EDF is slightly higher than that of TPS. FIFO considers only revenue, so its completion rate is the lowest.
As shown in Fig. 8, experiment (1) also tests the effect of average job size on total revenue. FIFO has the lowest revenue because of its low resource utilization. Although the job completion rate of EDF is slightly higher than that of TPS, TPS takes job revenue into account, so its revenue is slightly higher than that of EDF. As job size grows, the falling completion rate lowers the revenue of all three algorithms, but the revenue of TPS remains higher than that of the other two throughout.
(2) Effect of job set size on the scheduling algorithms
As shown in Figs. 9-11, experiment (2) tests the effect of job set size on scheduling algorithm performance.
As shown in Fig. 9, experiment (2) tests the effect of job set size on resource utilization. The computing resource utilization of the three algorithms is affected little by job set size and does not change significantly as the job set grows. TPS has the highest utilization of the three algorithms, EDF is slightly lower than TPS, and FIFO is the lowest.
As shown in Fig. 10, experiment (2) also tests the effect of job set size on job completion rate. Because computing capacity is limited, the completion rate of all three algorithms falls as the job set grows. The reason is the same as for the effect of job size on completion rate and total revenue and is not repeated here.
As shown in Fig. 11, experiment (2) also tests the effect of job set size on revenue. As the job set grows, the revenue of all three algorithms first rises and then falls. While the number of jobs is within the computing capacity of the platform, adding jobs increases the number of jobs the platform can complete, so more revenue is obtained. Once the number of jobs exceeds the computing capacity of the platform, adding jobs increases the number of jobs not completed on time; because these jobs yield negative revenue, total revenue falls. Of the three algorithms, FIFO has the lowest completion rate and revenue because of its low resource utilization. EDF and TPS have high resource utilization, and while the number of jobs is within the computing capacity of the platform, their completion rates and revenues are roughly the same. As the number of jobs continues to grow, EDF considers only job deadlines whereas TPS completes high-revenue jobs first, so although the completion rate of EDF is slightly higher than that of TPS, the revenue of TPS is far higher than that of EDF.
(3) Effect of computing resource quantity on the scheduling algorithms
As shown in Figs. 12-14, experiment (3) tests the effect of computing resource quantity on scheduling algorithm performance.
As shown in Fig. 12, experiment (3) tests the effect of the number of computing resources on resource utilization. When computing resources are scarce, the resource utilization of the three algorithms is roughly the same and falls as the number of computing resources grows. However, once the number of computing resources grows beyond a certain point, the utilization of FIFO continues to fall as resources increase, while the utilization of EDF and TPS rises, with TPS rising more than EDF. The reason is that when computing resources are scarce, the jobs scheduled by all three algorithms execute essentially serially, so their utilization is similar and falls as resources increase. When the number of Containers grows beyond the average job size, jobs scheduled by EDF and TPS can execute in parallel, so utilization rises. Because TPS takes platform resource utilization into account during job scheduling while EDF schedules purely by deadline, the resource utilization of TPS is higher than that of EDF.
As shown in Fig. 13, experiment (3) also tests the effect of the number of computing resources on job completion rate; as shown in Fig. 14, it also tests the effect on revenue. As computing resources increase, the completion rate and revenue of TPS and EDF both rise, and the revenue of TPS is far higher than that of EDF. For FIFO, completion rate and revenue rise with resources while platform resources are scarce; but once platform resources exceed the average job size, jobs still execute serially and a large amount of resources is wasted, so completion rate and revenue stop rising with resources and settle at a fixed value.
(4) Effect of job urgency on the scheduling algorithms
As shown in Fig. 15, experiment (4) tests the effect of job urgency on the three algorithms. As urgency decreases while the number of jobs and the other settings stay unchanged, jobs have more time in which to execute, so the completion rate and revenue of all three algorithms necessarily rise. This experiment therefore tests only the effect of job urgency on resource utilization. The results in the figure show that the resource utilization of FIFO and EDF is essentially unaffected by job urgency, whereas that of TPS rises as urgency decreases. The reason is that FIFO and EDF consider only job revenue or the order of job deadlines, whereas TPS also considers resource utilization once the revenue requirement is met. When job urgency decreases, the algorithm has more freedom in placing jobs to maximize resource utilization, so it can better adjust job start times and achieve the highest utilization.
Beneficial effects of the present invention:
The two-stage job scheduling method and system for big data platforms of the present invention, based on the MapReduce computing framework, are proposed for jobs with deadline constraints. The first stage proposes a maximum-revenue scheduling method based on job latest start time. This method calculates and adjusts the latest start time of each job according to the deadline constraint and revenue information of the job, and pre-allocates resources according to the adjustment result, ensuring that high-revenue jobs complete before their deadlines so that total platform revenue is maximized. The second stage, while preserving maximum platform revenue, proposes a job scheduling method based on maximum platform resource utilization to improve platform resource utilization. Experimental results show that the proposed two-stage job scheduling method not only maximizes platform revenue but also improves platform resource utilization, improving the overall performance of the platform.
The foregoing describes only preferred embodiments of the present application and does not limit the application. Those skilled in the art may make various modifications and variations to the application. Any modification, equivalent substitution, or improvement made within the spirit and principles of the application shall fall within its scope of protection.
Claims (10)
1. A two-stage job scheduling method for big data platforms, characterized in that the method comprises the following steps:
(1) forming the jobs submitted by users into a to-be-scheduled job set, and performing maximum-revenue scheduling based on job latest start time: pre-allocating platform resources according to the deadlines of the jobs, and adjusting and scheduling the pre-allocation result according to the revenue of the jobs, to obtain the resource pre-allocation job scheduling result queue that maximizes the revenue of the service provider;
(2) performing job scheduling based on maximum platform resource utilization: fine-tuning the resource pre-allocation job scheduling result queue of step (1) according to the resource usage of the platform to obtain the final scheduling result queue, so that platform resource utilization is highest while maximum platform revenue is preserved.
2. The two-stage job scheduling method for big data platforms according to claim 1, characterized in that the maximum-revenue scheduling based on job latest start time of step (1) comprises the following steps:
(1-1) calculating the initial latest start time of each job in the to-be-scheduled job set, and pre-allocating platform resources according to the initial latest start times;
(1-2) totalling, according to the pre-allocation, the computing resources needed in each period to obtain the resource pre-allocation result P_R;
(1-3) determining whether the resource pre-allocation result P_R contains an overloaded period; if so, proceeding to step (1-4); if not, proceeding to step (1-5);
(1-4) adjusting the initial latest start times of the jobs in the overloaded period of the resource pre-allocation result P_R, updating P_R according to the adjustment result, and returning to step (1-3);
(1-5) outputting the resource pre-allocation result P_R that maximizes the revenue of the service provider.
3. The two-stage job scheduling method for big data platforms according to claim 2, characterized in that, in step (1-1), pre-allocating platform resources according to the initial latest start times comprises:
letting each job start executing at its initial latest start time, and obtaining the resources needed in each period when every job completes exactly at its deadline;
wherein the initial latest start time of a job is: for any job in the to-be-scheduled job set, the start time at which the job, with no other job competing for resources, completes exactly at its deadline.
4. The two-stage job scheduling method for a big data platform according to claim 2, characterized in that, in step (1-3), judging whether an overloaded time period exists in the resource pre-allocation result P_R means judging whether the total number of computing resources required by the jobs running in a given time period exceeds the total computing resources of the big data platform; if it does, the time period is an overloaded period; otherwise, the time period is a normal-load period;
the total computing resources of the big data platform being the number of all standard Containers in the big data platform.
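The overload test can be illustrated with a short sketch. The function name `overloaded_periods` and the dictionary representation of P_R (period index to required containers) are assumptions made for the example.

```python
def overloaded_periods(usage, total_containers):
    """Return, in ascending order, the periods whose demand is strictly
    greater than the number of standard containers on the platform."""
    return sorted(t for t, need in usage.items() if need > total_containers)


p_r = {2: 2, 3: 5, 4: 5}
print(overloaded_periods(p_r, total_containers=4))
```

A period whose demand equals the platform total counts as normal load, because the claim requires the demand to exceed the total.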
5. The two-stage job scheduling method for a big data platform according to claim 2, characterized in that step (1-4) comprises the following specific steps:
(1-4-1) selecting the last overloaded time period, and collecting all jobs executed within that period to form a job set;
(1-4-2) selecting a minimal proper subset of the job set, a minimal proper subset being one for which advancing the initial latest start times of all its jobs into normal-load periods brings the selected period to a normal-load state; the collection of all minimal proper subsets that satisfy this condition forms the feasible adjustment strategy set of the period;
(1-4-3) computing the evaluation value of each feasible adjustment strategy in the feasible adjustment strategy set according to an evaluation function, and selecting the optimal adjustment strategy;
(1-4-4) adjusting the initial latest start times of the jobs in the optimal adjustment strategy, and updating the resource pre-allocation result P_R according to the adjusted initial latest start times.
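Step (1-4-2) can be illustrated by enumerating minimal proper subsets by brute force. This sketch assumes that advancing a job removes its whole container demand from the overloaded period, and it does not check that the earlier periods can absorb the moved jobs; the function name and the input layout are hypothetical.

```python
from itertools import combinations


def minimal_feasible_subsets(demands, capacity):
    """demands maps job name -> containers used in the overloaded period.
    Returns every minimal proper subset whose removal brings the period
    down to at most `capacity` containers."""
    excess = sum(demands.values()) - capacity
    if excess <= 0:
        return []  # the period is not overloaded, nothing to adjust
    names = sorted(demands)
    found = []
    # Sizes are visited in ascending order, so any smaller feasible subset
    # is already in `found` when a larger candidate is examined.
    for size in range(1, len(names)):  # proper subsets only
        for combo in combinations(names, size):
            candidate = frozenset(combo)
            if any(prev < candidate for prev in found):
                continue  # contains a smaller feasible subset: not minimal
            if sum(demands[n] for n in combo) >= excess:
                found.append(candidate)
    return found


print(minimal_feasible_subsets({"a": 3, "b": 2, "c": 2}, capacity=4))
```

The enumeration is exponential in the number of jobs in the period, which is acceptable only for small job sets.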
6. The two-stage job scheduling method for a big data platform according to claim 5, characterized in that the evaluation function in step (1-4-3) is:
$$Js_i.pf = \frac{\sum_{m \in Js_i} j_m.Sp}{\sum_{m \in Ju} j_m.Sp} \times lastsize$$
where $\sum_{m \in Js_i} j_m.Sp$ is the sum of the revenue evaluation values of all jobs in the adjustment strategy, $\sum_{m \in Ju} j_m.Sp$ is the sum of the revenue evaluation values of all jobs in the current time period, lastsize is the percentage of the platform's total computing resources that remains free in the current time period after adjustment by the strategy, and Sp is the revenue evaluation value of a job, Sp = |a - b|, where a is the revenue obtained when the job is completed on time and b is the revenue obtained when the job is not completed on time.
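The evaluation function can be computed directly from its definition. In this sketch the function names are hypothetical and lastsize is passed as a fraction between 0 and 1 rather than as a percentage; the claim does not state here whether the optimal strategy is the one with the largest or the smallest value, so the sketch only computes the value.

```python
def profit_evaluation(on_time_revenue, late_revenue):
    """Sp = |a - b| for a single job."""
    return abs(on_time_revenue - late_revenue)


def strategy_score(strategy_sp, period_sp, lastsize):
    """Js_i.pf: share of the period's total Sp carried by the strategy's
    jobs, multiplied by the remaining-resource fraction lastsize."""
    total = sum(period_sp)
    if total == 0:
        raise ValueError("the period has no revenue evaluation value")
    return sum(strategy_sp) / total * lastsize


sp_values = [profit_evaluation(a, b) for a, b in [(10, 8), (7, 4), (9, 4)]]
print(strategy_score(sp_values[:1], sp_values, lastsize=0.25))
```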
7. The two-stage job scheduling method for a big data platform according to claim 1, characterized in that the job scheduling based on maximum platform resource utilization in step (2) comprises the following specific steps:
(2-1) initializing the variables: the time T and the final scheduling result queue;
(2-2) finding the set of jobs that can be executed at time T without conflicting with the resource pre-allocation result P_R;
(2-3) judging whether the set of jobs that do not conflict with the resource pre-allocation result P_R is empty; if it is empty, setting T to the start time of the next time period in P_R; if it is not empty, selecting the job in the set with the lowest resource waste rate, obtaining its optimal start time, and updating T;
(2-4) repeating steps (2-2) and (2-3) until an optimal start time has been set for every job, thereby obtaining the final scheduling result;
(2-5) outputting the final scheduling result.
8. The two-stage job scheduling method for a big data platform according to claim 7, characterized in that the resource waste rate in step (2-3) is the ratio of the resources that cannot be reused after scheduling to the current computing resources;
the current computing resources being the sum of the computing resources in use and the idle computing resources.
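The waste-rate ratio and the selection in step (2-3) can be sketched as below. How the amount of non-reusable resources is determined for a candidate job is not specified in this claim, so it is taken as a given input; the function names are hypothetical.

```python
def waste_rate(non_reusable, used, idle):
    """Ratio of resources that cannot be reused after scheduling to the
    current computing resources (in use plus idle)."""
    current = used + idle
    if current <= 0:
        raise ValueError("current computing resources must be positive")
    return non_reusable / current


def pick_least_wasteful(non_reusable_by_job, used, idle):
    """Return the candidate job with the lowest resource waste rate."""
    return min(
        non_reusable_by_job,
        key=lambda name: waste_rate(non_reusable_by_job[name], used, idle),
    )


candidates = {"a": 3, "b": 1, "c": 2}  # job -> non-reusable containers
print(pick_least_wasteful(candidates, used=6, idle=4))
```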
9. A two-stage job scheduling system for a big data platform, the scheduling system being based on the two-stage job scheduling method for a big data platform according to any one of claims 1-8, characterized in that the system comprises:
a first-stage scheduling module, configured to form the jobs submitted by users into a set of jobs to be scheduled and, based on the latest-start-time constraint of each job and the overall maximum revenue of the service provider, to perform a first scheduling adjustment on the jobs in the set using the maximum-revenue scheduling based on the latest start time of jobs, obtaining the resource pre-allocation job scheduling result queue that maximizes the revenue of the service provider;
and
a second-stage scheduling module, configured to fine-tune the resource pre-allocation job scheduling result queue of the first-stage scheduling module according to the resource usage of the platform to obtain the final scheduling result queue, the final scheduling result queue maximizing the resource utilization of the platform on the premise that the platform's revenue is maximized, so that the resources of the platform are fully used, each job in the final scheduling result queue carrying an optimal start time.
10. The two-stage job scheduling system for a big data platform according to claim 9, characterized in that the set of jobs to be scheduled is defined as follows: the platform service provider receives a batch of jobs within a certain time period and negotiates and signs job-related SLA agreements with the users; these jobs form the set of jobs to be scheduled J, expressed as J = {j1, j2, …, jn}, where n is the number of jobs in J;
any job ji in the set of jobs to be scheduled J is expressed as ji = (ms, rs, mt, rt, dl, bf(t)), where ms is the number of Map tasks of the job; rs is the number of Reduce tasks of the job; mt is the average execution time of the job's Map tasks; rt is the average execution time of the job's Reduce tasks; dl is the deadline constraint of the job; and bf(t) is the revenue function of the job;
the first-stage scheduling module comprises a first job scheduler and a first resource scheduler, and the second-stage scheduling module comprises a second job scheduler and a second resource scheduler.
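The job tuple ji = (ms, rs, mt, rt, dl, bf(t)) maps naturally onto a small record type. The sketch below is illustrative: the class name `MapReduceJob` and the helper `step_benefit` are hypothetical, and the step-shaped revenue function (one value when the job finishes by the deadline, another otherwise) is an assumption, since the claim leaves the form of bf(t) open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MapReduceJob:
    ms: int                       # number of Map tasks
    rs: int                       # number of Reduce tasks
    mt: float                     # average execution time of a Map task
    rt: float                     # average execution time of a Reduce task
    dl: float                     # deadline constraint
    bf: Callable[[float], float]  # revenue as a function of completion time


def step_benefit(deadline, on_time, late):
    """Assumed revenue function: `on_time` if finished by the deadline,
    `late` otherwise."""
    def bf(t):
        return on_time if t <= deadline else late
    return bf


job = MapReduceJob(ms=4, rs=2, mt=30.0, rt=60.0, dl=600.0,
                   bf=step_benefit(600.0, on_time=10.0, late=4.0))
print(job.bf(590.0), job.bf(610.0))
```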
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710590748.5A CN107589985B (en) | 2017-07-19 | 2017-07-19 | Two-stage job scheduling method and system for big data platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107589985A (en) | 2018-01-16 |
CN107589985B (en) | 2020-04-24 |
Family
ID=61041646
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201710590748.5A | Active | CN107589985B (en) | 2017-07-19 | 2017-07-19 | Two-stage job scheduling method and system for big data platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107589985B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112328383A (en) * | 2020-11-19 | 2021-02-05 | 湖南智慧畅行交通科技有限公司 | Priority-based job concurrency control and scheduling algorithm |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077438A (en) * | 2012-12-27 | 2013-05-01 | 深圳先进技术研究院 | Control method and system for scheduling multiple robots |
US20130111453A1 (en) * | 2011-10-31 | 2013-05-02 | Oracle International Corporation | Throughput-aware software pipelining for highly multi-threaded systems |
CN104317650A (en) * | 2014-10-10 | 2015-01-28 | 北京工业大学 | Map/Reduce type mass data processing platform-orientated job scheduling method |
CN104731662A (en) * | 2015-03-26 | 2015-06-24 | 华中科技大学 | Variable parallel work resource allocation method |
CN104778079A (en) * | 2014-01-10 | 2015-07-15 | 国际商业机器公司 | Method and device used for dispatching and execution and distributed system |
CN105159769A (en) * | 2015-09-11 | 2015-12-16 | 国电南瑞科技股份有限公司 | Distributed job scheduling method suitable for heterogeneous computational capability cluster |
CN105718316A (en) * | 2014-12-01 | 2016-06-29 | 中国移动通信集团公司 | Job scheduling method and apparatus |
CN105721565A (en) * | 2016-01-29 | 2016-06-29 | 南京邮电大学 | Game based cloud computation resource allocation method and system |
CN105740051A (en) * | 2016-01-27 | 2016-07-06 | 北京工业大学 | Cloud computing resource scheduling realization method based on improved genetic algorithm |
CN105808334A (en) * | 2016-03-04 | 2016-07-27 | 山东大学 | MapReduce short job optimization system and method based on resource reuse |
US20160218838A1 (en) * | 2013-07-25 | 2016-07-28 | Sony Corporation | Method, base station and terminal for dynamic uplink configuration in wireless communication system |
CN106293893A (en) * | 2015-06-26 | 2017-01-04 | 阿里巴巴集团控股有限公司 | job scheduling method, device and distributed system |
Non-Patent Citations (3)
Title |
---|
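J HU et al.: "An Ant Colony Optimization for Grid Task Scheduling with Multiple QoS Dimensions", 《2009 EIGHTH INTERNATIONAL CONFERENCE ON GRID AND COOPERATIVE COMPUTING》 * |
王习特 et al.: "Research on the maximum-revenue problem in MapReduce clusters" (MapReduce集群中最大收益问题的研究), 《计算机学报》 (Chinese Journal of Computers) * |
陈晓旭 et al.: "A large-scale resource scheduling method based on minimum-cost maximum-flow" (基于最小费用最大流的大规模资源调度方法), 《软件学报》 (Journal of Software) * |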
Also Published As
Publication number | Publication date |
---|---|
CN107589985B (en) | 2020-04-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||