CN107589985A - Two-stage job scheduling method and system for big data platforms - Google Patents

Two-stage job scheduling method and system for big data platforms

Info

Publication number
CN107589985A
Authority
CN
China
Prior art keywords
platform
resource
resources
job
allocation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710590748.5A
Other languages
Chinese (zh)
Other versions
CN107589985B (en
Inventor
史玉良
胡静
李庆忠
张世栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN201710590748.5A
Publication of CN107589985A
Application granted
Publication of CN107589985B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a two-stage job scheduling method and system for big data platforms. Jobs submitted by users form a set of jobs to be scheduled. Stage one is maximum-revenue scheduling based on job latest start times: platform resources are pre-allocated according to the jobs' deadlines, and the pre-allocation result is adjusted and scheduled according to the jobs' revenue, yielding a resource pre-allocation job scheduling result queue that maximizes the service provider's revenue. Stage two is job scheduling based on maximum platform resource utilization: according to the platform's resource usage, the pre-allocation job scheduling result queue is fine-tuned to obtain the final scheduling result queue, so that platform resource utilization is highest while platform revenue remains maximal. Experimental results indicate that the invention both maximizes platform revenue and improves platform resource utilization, improving the platform's overall performance.

Description

Two-stage job scheduling method and system for big data platforms
Technical field
The invention belongs to the technical field of big data computing, and in particular relates to a two-stage job scheduling method and system for big data platforms.
Background
In recent years, with the rapid development of cloud computing and Internet technology, data volumes have grown explosively and continuously, and the big data era has quietly arrived. Traditional data processing techniques and tools can no longer meet the data processing requirements of this new era, so big data platforms have emerged. A big data platform supports multiple computing frameworks and can serve multiple users at the same time. However, because multiple users share the platform's resources, the platform provider faces an urgent problem: how to schedule the jobs of multiple users efficiently so that platform resources are fully used, the SLA requirements of most users are met, and the provider's own revenue is maximized.
Many researchers have studied job scheduling in big data platforms in depth and proposed many solutions. Zhang Z et al., in "Optimizing Completion Time and Resource Provisioning of Pig Programs", proposed a resource provisioning performance optimization model for Pig jobs based on deadline estimation. The model removes the uncertainty in the execution of concurrent jobs in Pig programs, but it does not consider platform revenue. Liu et al. proposed a priority-based parallel job scheduling method that uses virtualization to divide the computing capacity of each node into a foreground virtual machine layer (with higher CPU priority) and a background virtual machine layer (with lower CPU priority). Through this two-layer division the method balances the platform load, makes fuller use of the platform's CPU resources, improves job execution efficiency, and shortens job response time, but it likewise does not consider platform revenue. Existing research has thus examined job scheduling under various constraints and settings and achieved a series of results, but none of these methods considers platform revenue, let alone maximizing platform revenue and platform resource utilization at the same time.
A big data platform can serve multiple users simultaneously. For the platform service provider, scheduling these users' jobs sensibly not only satisfies the needs of multiple users at once and increases the provider's revenue, but also improves platform utilization. The execution process of user jobs is shown in Fig. 1.
As Fig. 1 shows, the platform holds six jobs from three users, two jobs per user. After users submit jobs, the platform service provider schedules the jobs according to the platform's resources, the SLAs signed with the users, and similar factors, and generates an execution queue, as shown in Fig. 1. After a user's job has been executed, the service provider obtains the corresponding revenue. Ideally the platform has enough resources to meet all users' needs, in which case the service provider's revenue is also maximal. In practice, however, platform resources are limited and very likely cannot meet all users' needs, so the platform service provider faces the following problems:
(1) given the resources available in the platform and the users' SLA requirements, how to schedule the jobs of multiple users so that its own revenue is maximized;
(2) building on (1), assuming a job execution queue that maximizes the service provider's revenue has been generated, how to adjust that queue so that revenue stays maximal while the platform's resource utilization is further improved.
In summary, there is still no effective solution to the problem of maximizing platform revenue and platform resource utilization at the same time in a big data platform.
Summary of the invention
To solve the above problems, the invention provides a two-stage job scheduling method for big data platforms. The method jointly considers constraints such as job deadlines, maximum platform revenue, and maximum resource utilization. Using two-stage job scheduling based on revenue and resources, it not only meets the job deadline requirements of big data platform users but also achieves the highest platform resource utilization while maximizing platform revenue.
To achieve the above objective, the invention adopts the following technical solution:
A two-stage job scheduling method for big data platforms, comprising the following steps:
(1) the jobs submitted by users form a set of jobs to be scheduled, and maximum-revenue scheduling based on job latest start times is performed: platform resources are pre-allocated according to the jobs' deadlines, and the pre-allocation result is adjusted and scheduled according to the jobs' revenue, yielding the resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
(2) job scheduling based on maximum platform resource utilization is performed: according to the platform's resource usage, the resource pre-allocation job scheduling result queue from step (1) is fine-tuned to obtain the final scheduling result queue, so that platform resource utilization is highest while platform revenue remains maximal.
Further, the maximum-revenue scheduling based on job latest start times in step (1) comprises the following steps:
(1-1) compute the initial latest start time of each job in the set of jobs to be scheduled, and pre-allocate platform resources according to the initial latest start times;
(1-2) from the pre-allocation, count the total computing resources needed in each time period to obtain the resource pre-allocation result P_R;
(1-3) determine whether P_R contains an overloaded period; if so, go to step (1-4); if not, go to step (1-5);
(1-4) adjust the initial latest start times of the jobs in the overloaded period of P_R, update P_R according to the adjustment, and return to step (1-3);
(1-5) output the resource pre-allocation result P_R that maximizes the service provider's revenue.
Further, in step (1-1), pre-allocating platform resources according to the initial latest start times proceeds as follows:
Each job is assumed to start executing at its initial latest start time, which gives the number of resources needed in each time period when every job finishes exactly at its deadline;
The initial latest start time of a job is defined as follows: for any job in the set of jobs to be scheduled, it is the start time at which the job finishes exactly at its deadline when no other job competes with it for resources;
In the invention, when the deadlines of several jobs are close together, jobs may contend for resources because the platform's computing resources are limited, and a job that starts at its initial latest start time may then fail to finish before its deadline. The initial latest start times of jobs in periods of resource contention therefore need to be adjusted. The invention pre-allocates resources to all jobs according to their initial latest start times, which effectively identifies the periods in which resource contention occurs.
Further, in step (1-3), determining whether P_R contains an overloaded period means determining whether the total computing resources needed by the jobs running in a given time period exceeds the total computing resources of the big data platform; if it does, the period is an overloaded period; otherwise it is a normal-load period.
The total computing resources of the big data platform is the number of all standard Containers in the big data platform.
Further, step (1-4) comprises the following steps:
(1-4-1) select the last overloaded period and obtain the job set formed by all jobs executing within that period;
(1-4-2) select a minimal proper subset of the job set such that advancing the initial latest start times of all jobs in the subset returns the period to the normal-load state; the collection of all minimal proper subsets that satisfy this condition is the feasible adjustment strategy set of the period;
(1-4-3) use the evaluation function to score each feasible adjustment strategy in the feasible adjustment strategy set, and select the optimal adjustment strategy;
(1-4-4) adjust the initial latest start times of the jobs in the optimal adjustment strategy, and update P_R according to the adjusted initial latest start times.
Further, the evaluation function in step (1-4-3) is:
where one term is the sum of the revenue evaluation values of all jobs in the adjustment strategy, and another is the sum of the revenue evaluation values of all jobs in the current period; lastsize is the percentage of the platform's total computing resources that remains free in the current period after adjustment by the strategy; Sp is the revenue evaluation value of a job, Sp = |a - b|, where a is the revenue obtained when the job is completed on time and b is the revenue obtained when it is not.
Further, the job scheduling based on maximum platform resource utilization in step (2) comprises the following steps:
(2-1) initialize the variables, the time T, and the final scheduling result queue;
(2-2) find the set of jobs that can execute at time T without conflicting with the resource pre-allocation result P_R;
(2-3) determine whether the set of jobs that do not conflict with P_R is empty; if it is empty, set T to the start time of the next period in P_R; if it is not empty, select from the set the job that minimizes the resource waste rate, obtain its optimal start time, and update T;
(2-4) repeat steps (2-2) and (2-3) until every job has been given an optimal start time, which yields the final scheduling result;
(2-5) output the final scheduling result.
Further, the resource waste rate in step (2-3) is the ratio of the resources that cannot be reused after scheduling to the current computing resources;
The current computing resources are the sum of the computing resources in use and the idle computing resources.
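The job selection in step (2-3) can be sketched as follows. This is a minimal illustration, assuming the amount of resources each candidate job would leave unusable is already known; the function names and the way that amount is supplied are hypothetical and are not taken from the patent text.

```python
def waste_rate(unusable: float, used: float, idle: float) -> float:
    """Resource waste rate: non-reusable resources / current computing resources.

    Current computing resources = resources in use + idle resources.
    """
    current = used + idle
    return unusable / current


def pick_job(candidates: dict, used: float, idle: float) -> str:
    """Pick the candidate job with the smallest waste rate.

    candidates maps a job name to the resources that would be left
    unusable if that job were scheduled now (assumed to be given).
    """
    return min(candidates, key=lambda name: waste_rate(candidates[name], used, idle))


# Hypothetical numbers: 6 containers in use, 4 idle.
print(waste_rate(2, 6, 4))
print(pick_job({"j1": 3, "j2": 1}, used=6, idle=4))
```

Because every candidate is divided by the same denominator at a given time T, the choice reduces to the job that leaves the fewest unusable resources; the ratio matters when waste rates from different times are compared.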
To make full use of platform resources and increase the service provider's revenue in a big data platform, the invention also provides a two-stage job scheduling system for big data platforms, based on the two-stage job scheduling method described above.
To achieve the above objective, the invention adopts the following technical solution:
A two-stage job scheduling system for big data platforms, comprising:
A first-stage scheduling module, which forms the jobs submitted by users into the set of jobs to be scheduled and, based on each job's latest start time constraint and the service provider's overall maximum revenue, applies maximum-revenue scheduling based on job latest start times to make a first scheduling adjustment of the jobs in the set, yielding the resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
and
A second-stage scheduling module, which, according to the platform's resource usage, fine-tunes the resource pre-allocation job scheduling result queue from the first-stage scheduling module to obtain the final scheduling result queue. The final scheduling result queue makes platform resource utilization highest while platform revenue remains maximal, so that platform resources are fully used, and every job in the final scheduling result queue carries an optimal start time.
Further, the set of jobs to be scheduled is defined as follows: within a certain period the platform service provider receives a batch of jobs and negotiates and signs job-related SLAs with the users; these jobs form the set of jobs to be scheduled J, written J = {j1, j2, …, jn}, where n is the number of jobs in J;
Any job ji in J is written ji = (ms, rs, mt, rt, dl, bf(t)), where ms is the number of Map tasks of the job; rs is the number of Reduce tasks of the job; mt is the average execution time of the job's Map tasks; rt is the average execution time of the job's Reduce tasks; dl is the job's deadline constraint; and bf(t) is the job's revenue function.
Further, the first-stage scheduling module includes a first job scheduler and a first resource scheduler, and the second-stage scheduling module includes a second job scheduler and a second resource scheduler.
Beneficial effects of the invention:
The two-stage job scheduling method and system for big data platforms of the invention is based on the MapReduce computing framework and targets jobs with deadline constraints. The first stage is a maximum-revenue scheduling method based on job latest start times: it computes and adjusts the latest start time of each job from the job's deadline constraint and revenue information and pre-allocates resources according to the adjustment, ensuring that high-revenue jobs finish before their deadlines so that total platform revenue is maximal. The second stage, on the premise that platform revenue stays maximal, applies a job scheduling method based on maximum platform resource utilization to raise platform resource utilization. Experimental results indicate that the proposed two-stage job scheduling method both maximizes platform revenue and improves platform resource utilization, improving the platform's overall performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of multi-user job execution in a big data platform;
Fig. 2 is a flow chart of the method of the invention;
Fig. 3 is a flow chart of the maximum-revenue scheduling based on job latest start times;
Fig. 4 is a flow chart of the job scheduling based on maximum platform resource utilization;
Fig. 5 is a structural diagram of the system of the invention;
Fig. 6 shows resource utilization as a function of average job size;
Fig. 7 shows job completion rate as a function of average job size;
Fig. 8 shows total revenue as a function of average job size;
Fig. 9 shows the effect of job set size on resource utilization;
Fig. 10 shows the effect of job set size on job completion rate;
Fig. 11 shows the effect of job set size on revenue;
Fig. 12 shows the effect of total computing resources on resource utilization;
Fig. 13 shows the effect of total computing resources on job completion rate;
Fig. 14 shows the effect of total computing resources on revenue;
Fig. 15 shows the effect of job urgency on resource utilization.
Detailed description of the embodiments:
It should be noted that the following detailed description is exemplary and is intended to further explain the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the application belongs.
It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another. The invention is further described below with reference to the drawings and embodiments.
Embodiment 1:
As described in the background, the prior art cannot effectively maximize platform revenue and platform resource utilization at the same time. To address this, a two-stage job scheduling method for big data platforms is provided. The method jointly considers constraints such as job deadlines, maximum platform revenue, and maximum resource utilization. Using two-stage job scheduling based on revenue and resources, it not only meets the job deadline requirements of big data platform users but also achieves the highest platform resource utilization while maximizing platform revenue.
To achieve the above objective, the invention adopts the following technical solution:
As shown in Fig. 2,
A two-stage job scheduling method for big data platforms, comprising the following steps:
(1) the jobs submitted by users form a set of jobs to be scheduled, and maximum-revenue scheduling based on job latest start times is performed: platform resources are pre-allocated according to the jobs' deadlines, and the pre-allocation result is adjusted and scheduled according to the jobs' revenue, yielding the resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
(2) job scheduling based on maximum platform resource utilization is performed: according to the platform's resource usage, the resource pre-allocation job scheduling result queue from step (1) is fine-tuned to obtain the final scheduling result queue, so that platform resource utilization is highest while platform revenue remains maximal.
Stage one: maximum-revenue scheduling based on job latest start times
First, the jobs submitted by users form the set of jobs to be scheduled, defined as follows:
Within a certain period the platform service provider receives a batch of jobs and negotiates and signs job-related SLAs with the users; these jobs form the set of jobs to be scheduled J, written J = {j1, j2, …, jn}, where n is the number of jobs in J;
The invention addresses job scheduling in a big data platform based on the MapReduce computing framework on a homogeneous cluster. It therefore assumes that the hardware configuration of every node is roughly the same and that processing performance and stability are consistent, so that any task has the same run time no matter which node it runs on. For any job, the invention assumes that the numbers of Map tasks and Reduce tasks and the average execution times of Map tasks and Reduce tasks are known. The invention also does not consider data skew within a job: all Map tasks of a job (and likewise all Reduce tasks) are taken to have the same processing time.
For each job in J, the invention is concerned with the job's execution time, the number of resources it occupies, its deadline, its revenue, and similar information. Any job ji in J is written ji = (ms, rs, mt, rt, dl, bf(t)), where ms is the number of Map tasks of the job; rs is the number of Reduce tasks of the job; mt is the average execution time of the job's Map tasks; rt is the average execution time of the job's Reduce tasks; dl is the job's deadline constraint; and bf(t) is the job's revenue function.
The revenue function bf(t) of a job is a piecewise function of the job's actual completion time ji.end:
bf(t) = a, if ji.end ≤ ji.dl
bf(t) = b, if ji.end > ji.dl
where a and b are, respectively, the revenue obtained when the job is completed within its deadline and the revenue obtained when it is not (without loss of generality, b is negative when failing to finish before the deadline requires compensating the user).
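The job tuple and its revenue function can be sketched as a small data structure. This is a minimal illustration, not the patent's implementation: the class name `Job`, storing a and b as fields, and treating completion exactly at the deadline as on time are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    ms: int      # number of Map tasks
    rs: int      # number of Reduce tasks
    mt: float    # average execution time of a Map task
    rt: float    # average execution time of a Reduce task
    dl: float    # deadline
    a: float     # revenue when the job finishes by its deadline
    b: float     # revenue when it finishes late (negative if compensation is owed)

    def bf(self, end: float) -> float:
        """Revenue as a step function of the actual completion time.

        Assumption: finishing exactly at the deadline counts as on time.
        """
        return self.a if end <= self.dl else self.b


# Hypothetical job: 8 Map tasks, 2 Reduce tasks, deadline at t = 600.
job = Job(ms=8, rs=2, mt=30.0, rt=60.0, dl=600.0, a=100.0, b=-20.0)
print(job.bf(540.0))   # finished before the deadline
print(job.bf(660.0))   # finished after the deadline
```

Because the deadlines are soft, a late job still runs to completion; only the revenue drops from a to b.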
All job deadlines in the invention are soft deadlines: a job that cannot finish before its deadline does not cause serious consequences, and its execution is not abandoned.
As shown in Fig. 3, the stage-one maximum-revenue scheduling based on job latest start times in step (1) comprises the following steps:
(1-1) compute the initial latest start time ji.Tols of each job ji in the set of jobs to be scheduled J = {j1, j2, …, jn}, and pre-allocate platform resources according to the initial latest start times ji.Tols;
(1-2) from the pre-allocation, count the total computing resources needed by the jobs running in each time period to obtain the resource pre-allocation result P_R;
(1-3) determine whether P_R contains an overloaded period; if so, go to step (1-4); if not, go to step (1-5);
(1-4) adjust the initial latest start times of the jobs in the overloaded period of P_R, update P_R according to the adjustment, and return to step (1-3);
(1-5) output the resource pre-allocation result P_R that maximizes the service provider's revenue.
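The control flow of steps (1-3) to (1-5) can be sketched as a loop. This is only an illustration of the loop structure: P_R is simplified to a list of per-slot demands, and `shift_excess_earlier` is a toy stand-in for step (1-4) that moves surplus demand into the preceding slot, whereas the patent advances whole jobs chosen by a revenue evaluation. All names here are hypothetical.

```python
def shift_excess_earlier(load, idx, ts):
    """Toy stand-in for step (1-4): move the excess of slot idx into slot idx - 1."""
    if idx == 0:
        raise ValueError("no earlier slot can absorb the excess")
    excess = load[idx] - ts
    adjusted = list(load)
    adjusted[idx] -= excess
    adjusted[idx - 1] += excess
    return adjusted


def phase_one(load, ts, adjust=shift_excess_earlier):
    """Repeat: find overloaded slots, fix the last one, until none remain."""
    while True:
        overloaded = [i for i, demand in enumerate(load) if demand > ts]  # step (1-3)
        if not overloaded:
            return load                                                   # step (1-5)
        load = adjust(load, overloaded[-1], ts)                           # step (1-4)


# Hypothetical demand per slot with TS = 5 containers.
print(phase_one([2, 3, 7, 6], ts=5))
```

Handling the last overloaded slot first matches step (1-4-1): work pushed earlier may overload an earlier slot, which is then picked up by the next pass of the loop.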
In this embodiment, pre-allocating platform resources according to the initial latest start times in step (1-1) proceeds as follows:
Each job is assumed to start executing at its initial latest start time, which gives the number of resources needed in each time period when every job finishes exactly at its deadline;
The initial latest start time ji.Tols of a job is defined as follows: for any job ji in the set of jobs to be scheduled J = {j1, j2, …, jn}, it is the start time at which ji finishes exactly at its deadline when no other job competes for resources. If the deadline of job ji is ji.dl, the initial latest start time of the job is computed as follows:
In the invention, for any job ji, when there is no resource contention the job must start no later than its initial latest start time ji.Tols to be sure of finishing before its deadline; a job that starts exactly at ji.Tols finishes exactly at its deadline. When the deadlines of several jobs are close together, jobs may contend for resources because the platform's computing resources are limited, and a job ji that starts at ji.Tols may then fail to finish before its deadline. The initial latest start times ji.Tols of jobs in periods of resource contention therefore need to be adjusted. The invention pre-allocates resources to all jobs ji according to their initial latest start times ji.Tols, that is, it lets each job ji start at ji.Tols. This gives the total computing resources needed in each period when every job finishes exactly at its deadline, and thereby effectively identifies the periods in which resource contention occurs.
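The pre-allocation described above can be sketched as follows. The formula for ji.Tols is not reproduced in this text, so the sketch makes a loud assumption: without contention, all Map tasks of a job run in parallel in one wave (occupying ms Containers for mt), followed by all Reduce tasks in one wave (occupying rs Containers for rt), so that Tols = dl - (mt + rt). The function names and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def latest_start(dl, mt, rt):
    """Assumed formula: one Map wave then one Reduce wave, finishing exactly at dl."""
    return dl - (mt + rt)


def preallocate(jobs):
    """Build P_R as a list of ((start, end), demand) from jobs started at Tols.

    jobs: list of dicts with keys ms, rs, mt, rt, dl.
    """
    delta = defaultdict(int)  # change in Container demand at each time point
    for j in jobs:
        start = latest_start(j["dl"], j["mt"], j["rt"])
        map_end = start + j["mt"]
        delta[start] += j["ms"]              # Map phase begins
        delta[map_end] += j["rs"] - j["ms"]  # Map phase ends, Reduce phase begins
        delta[j["dl"]] -= j["rs"]            # Reduce phase ends at the deadline
    times = sorted(delta)
    p_r, demand = [], 0
    for t0, t1 in zip(times, times[1:]):
        demand += delta[t0]
        if demand > 0:
            p_r.append(((t0, t1), demand))
    return p_r


jobs = [
    {"ms": 4, "rs": 2, "mt": 10, "rt": 20, "dl": 100},
    {"ms": 3, "rs": 1, "mt": 20, "rt": 10, "dl": 90},
]
for period, demand in preallocate(jobs):
    print(period, demand)
```

With these two hypothetical jobs the demand peaks at 7 Containers in the period (70, 80), where the Map phases overlap; comparing each period's demand with the platform total then reveals where contention would occur.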
In this embodiment, the resource pre-allocation result P_R in step (1-2) is:
P_R = {(P1, R1), (P2, R2), …, (Pt, Rt)}, where Pi is a time period and Ri is the total computing resources needed by the jobs running in that period;
In this embodiment, based on the total computing resources needed by the jobs running in a period, the invention defines the states that each period in P_R may be in:
Normal-load state: in the resource pre-allocation result P_R, if for some period P the total computing resources R needed by the jobs running in P does not exceed the total computing resources TS of the big data platform, the period is said to be in the normal-load state, and a period in the normal-load state is called a normal-load period.
Overloaded state: in the resource pre-allocation result P_R, if for some period P the total computing resources R needed by the jobs running in P exceeds the total computing resources TS of the big data platform, the period is said to be in the overloaded state, and a period in the overloaded state is called an overloaded period.
In an overloaded period, the computing resources of the big data platform cannot meet the demands of all jobs; jobs contend for resources, which ultimately delays job completion.
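The two states can be checked directly on P_R. The sketch below is a minimal illustration, assuming P_R is held as a list of ((start, end), demand) pairs; the function names are hypothetical.

```python
def overloaded_periods(p_r, ts):
    """Return the (period, demand) pairs of P_R whose demand exceeds TS."""
    return [(period, demand) for period, demand in p_r if demand > ts]


def last_overloaded_period(p_r, ts):
    """Return the overloaded period with the latest start time, or None."""
    over = overloaded_periods(p_r, ts)
    return max(over, key=lambda item: item[0][0]) if over else None


# Hypothetical pre-allocation result and a platform with TS = 5 Containers.
p_r = [((60, 70), 3), ((70, 80), 7), ((80, 90), 6), ((90, 100), 2)]
print(overloaded_periods(p_r, ts=5))
print(last_overloaded_period(p_r, ts=5))
```

A period whose demand equals TS is treated as normal-load here, since only demand strictly greater than TS leaves some job without resources.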
In this embodiment, in step (1-3), determining whether P_R contains an overloaded period means determining whether the total computing resources needed by the jobs running in a given period exceeds the total computing resources TS of the big data platform; if it does, the period is an overloaded period; otherwise it is a normal-load period.
The total computing resources of the big data platform (Total Number of Computing Resources, TS) is the number of all standard Containers in the big data platform.
For example, in a big data platform with 1 master node and Ndn slave nodes, where each slave node is configured with a c-core CPU and m GB of memory and each Container is sized at c1 CPU cores and m1 GB of memory, the total computing resources of the platform is:
TS = min(c/c1, m/m1) × Ndn
where min(c/c1, m/m1) is the maximum number of Containers that each compute node can hold.
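As a worked example of the TS formula, the following sketch computes the Container count for a hypothetical cluster. Using integer (floor) division is an assumption, made because a node cannot host a fractional Container; the function and parameter names are illustrative only.

```python
def total_containers(c: int, m: int, c1: int, m1: int, n_dn: int) -> int:
    """TS = min(c / c1, m / m1) * Ndn, with floor division per node.

    c, m   : CPU cores and memory (GB) of each slave node
    c1, m1 : CPU cores and memory (GB) of one standard Container
    n_dn   : number of slave nodes
    """
    per_node = min(c // c1, m // m1)  # the scarcer resource limits the node
    return per_node * n_dn


# Hypothetical cluster: 10 slave nodes, each with 16 cores and 64 GB,
# and Containers of 1 core and 2 GB.
print(total_containers(c=16, m=64, c1=1, m1=2, n_dn=10))
```

In this example CPU is the limiting resource: each node could hold 32 Containers by memory but only 16 by cores, so TS is 160.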
For an overloaded period, an optimal adjustment strategy must be found that adjusts the period while keeping platform revenue maximal:
In this embodiment, step (1-4) comprises the following steps:
(1-4-1) select the last overloaded period and obtain the job set Ju formed by all jobs executing within that period;
(1-4-2) select a minimal proper subset Js ⊂ Ju of the job set Ju such that advancing the initial latest start times of all jobs in Js returns the period to the normal-load state; the collection of all minimal proper subsets Js that satisfy this condition is the feasible adjustment strategy set CL = {Js1, Js2, …, Jsm} of the period;
(1-4-3) use the evaluation function to score each feasible adjustment strategy in the feasible adjustment strategy set CL = {Js1, Js2, …, Jsm}, and select the optimal adjustment strategy;
(1-4-4) adjust the initial latest start times of the jobs in the optimal adjustment strategy, and update P_R according to the adjusted initial latest start times.
To find the optimal adjustment strategy, the feasible adjustment strategy set is first defined as follows.
Feasible adjustment strategy set: in the deadline-based resource pre-allocation result P_R, let Ju be the set of jobs running in some overload period. Select a minimal proper subset $Js \subset Ju$ such that, once the latest start times of all jobs in Js are advanced, the total computing resources occupied by the jobs running in that period no longer exceed the platform total TS (that is, the period changes from an overload state to a normal-load state). Such a minimal proper subset Js is called a feasible adjustment strategy. The set of all Js that satisfy this condition, $CL = \{Js_1, Js_2, \dots, Js_m\}$, is called the feasible adjustment strategy set of the period.
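The definition above can be illustrated with a brute-force enumeration. The sketch below makes two assumptions that the text does not state: advancing a job removes its whole demand from the overloaded period, and "minimal" means that no proper subset of Js is itself feasible. The function name and the input format are hypothetical.

```python
from itertools import combinations


def feasible_strategies(demands, ts):
    """Enumerate the feasible adjustment strategy set CL for one overloaded period.

    demands: dict mapping job id -> Containers the job occupies in the period.
    ts: total number of Containers on the platform.
    Assumption: moving a job earlier frees all of its Containers in this period.
    """
    total = sum(demands.values())
    jobs = sorted(demands)
    found = []
    # Sizes 1 .. len-1 only, so every candidate is a proper subset of Ju.
    for size in range(1, len(jobs)):
        for subset in combinations(jobs, size):
            candidate = frozenset(subset)
            # Skip supersets of an already feasible subset: they are not minimal.
            if any(smaller <= candidate for smaller in found):
                continue
            if total - sum(demands[j] for j in candidate) <= ts:
                found.append(candidate)
    return found


cl = feasible_strategies({"a": 3, "b": 2, "c": 4}, ts=5)
print(cl)
```

Exhaustive enumeration is exponential in the number of jobs in the period, so it only suits small job sets; the source does not say how CL is generated in practice.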
For each job set $Js_i$ in the feasible adjustment strategy set $CL = \{Js_1, Js_2, \dots, Js_m\}$, bringing the period from an overload state to a normal-load state requires advancing the latest start times of all jobs in $Js_i$ to:
where $j_i \in Js_i$ and $T_{cs}$ is the start time of the overload period.
A period in an overload state has more than one feasible adjustment strategy, and any strategy in the feasible adjustment strategy set can be used to adjust the overload period. To ensure that the revenue of the big data platform is maximized, all feasible adjustment strategies in the set must be evaluated for revenue, and the optimal strategy is selected according to the evaluation results. The evaluation mainly considers the following two aspects:
(1) Revenue evaluation value.
To handle an overload period, the invention advances the latest start time of a job to $j_i.T_{ls}$. When $j_i.T_{ls} < 0$, the job can no longer be guaranteed to complete before its deadline from the current point in time; some jobs must then be discarded, and jobs with large revenue evaluation values are executed first so that the platform's total revenue is maximized. When adjusting by an adjustment strategy, jobs with large revenue evaluation values should therefore be guaranteed first to complete before their deadlines. In other words, within the feasible adjustment strategy set $CL = \{Js_1, Js_2, \dots, Js_m\}$, the optimal strategy should adjust the job set whose total revenue evaluation value is smallest. For any job, the revenue evaluation value Sp is the difference between the revenue a obtained when the job completes on time and the revenue b obtained when it does not:
$Sp = |a - b|$.
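The revenue evaluation value and its total over a candidate strategy can be computed as in the following sketch. The function names and the sample revenues are illustrative assumptions; only the formula Sp = |a - b| comes from the text.

```python
def revenue_evaluation(on_time_revenue, late_revenue):
    """Sp = |a - b|: gap between the on-time revenue a and the late revenue b."""
    return abs(on_time_revenue - late_revenue)


def strategy_total_sp(strategy, sp):
    """Sum of Sp over the jobs in one adjustment strategy (a set of job ids)."""
    return sum(sp[job] for job in strategy)


# Hypothetical jobs: ja pays 100 on time and -20 (a penalty) when late; jb pays 40 or 0.
sp = {"ja": revenue_evaluation(100, -20), "jb": revenue_evaluation(40, 0)}
print(sp, strategy_total_sp({"jb"}, sp))
```

On revenue alone, the strategy that moves jb (total Sp of 40) would be preferred over one that moves ja (total Sp of 120).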
(2) Adjustment cost.
When adjusting jobs, the invention considers not only the revenue evaluation value but also the effect that the amount of resources occupied by the adjusted jobs has on the adjustment, that is, the adjustment cost. Consider the following situation: a period in an overload state contains two jobs $j_a$ and $j_b$ with $j_a.Sp > j_b.Sp$. From the revenue perspective alone, the latest start time of $j_b$ should be advanced so that at least $j_a$ can complete. However, suppose $j_a$ needs few computing resources and $j_b$ needs many. Advancing the latest start time of $j_b$ then pushes the preceding period into an overload state, so the preceding periods must be adjusted repeatedly, which can eventually drive the $T_{ls}$ of $j_b$ or of other jobs below 0 so that they are discarded. If the latest start time of $j_a$ is advanced instead, all jobs can complete before their deadlines because $j_a$ occupies few resources.
Taking both factors into account, the invention proposes an adjustment strategy evaluation function whose optimization objective is to maximize platform revenue. The strategy with the lowest score is the optimal adjustment strategy. The evaluation function is as follows:
where the quantities involved are the sum of the revenue evaluation values of all jobs in the adjustment strategy, the sum of the revenue evaluation values of all jobs in the current period, and lastsize, the percentage of the platform's total computing resources that remains free in the current period after adjusting by the strategy.
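The three quantities named above can be computed as shown below. Because the evaluation formula itself is not reproduced in this text, the sketch takes the scoring function as a parameter; `example_score` is purely an assumed placeholder and should not be read as the patent's formula. All names are hypothetical.

```python
def strategy_terms(strategy, sp, period_jobs, free_after, ts):
    """Return (strategy Sp sum, period Sp sum, lastsize) for one strategy.

    free_after: Containers left idle in the period after applying the strategy.
    lastsize is expressed here as a fraction of TS.
    """
    strategy_sp = sum(sp[j] for j in strategy)
    period_sp = sum(sp[j] for j in period_jobs)
    lastsize = free_after / ts
    return strategy_sp, period_sp, lastsize


def pick_best(candidates, score):
    """candidates: list of (strategy, terms). The lowest score wins, as in the text."""
    return min(candidates, key=lambda item: score(*item[1]))[0]


def example_score(strategy_sp, period_sp, lastsize):
    # ASSUMPTION: an illustrative combination only, not the source's evaluation function.
    return strategy_sp / period_sp + lastsize


sp = {"a": 50, "b": 10, "c": 30}
period = ["a", "b", "c"]
candidates = [
    (frozenset({"c"}), strategy_terms({"c"}, sp, period, free_after=1, ts=10)),
    (frozenset({"a", "b"}), strategy_terms({"a", "b"}, sp, period, free_after=2, ts=10)),
]
print(pick_best(candidates, example_score))
```

Passing the score in as a function keeps the selection step (lowest score wins) separate from the unknown formula, so the real evaluation function can be substituted without touching the rest.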
This process requires computing the total revenue. Each job in the job set $J = \{j_1, j_2, \dots, j_n\}$ received by the service provider on the big data platform has its own deadline constraint $j_i.dl$ and a time-dependent revenue function $j_i.bf(t)$, and a job yields different revenue depending on whether its actual finish time $j_i.end$ falls before or after $j_i.dl$. Over all jobs, the total revenue function is:
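One natural reading of the description is that the total revenue sums each job's revenue function evaluated at its actual finish time. The sketch below follows that reading; the `Job` class, the step-shaped revenue function, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    name: str
    dl: float                      # deadline j.dl
    end: float                     # actual finish time j.end
    bf: Callable[[float], float]   # revenue as a function of finish time


def step_revenue(dl, on_time, late):
    """Assumed revenue shape: `on_time` if finished by the deadline, else `late`."""
    return lambda t: on_time if t <= dl else late


def total_revenue(jobs):
    """Sum of j.bf(j.end) over all jobs (assumed reading of the total revenue)."""
    return sum(job.bf(job.end) for job in jobs)


jobs = [
    Job("j1", dl=10, end=8, bf=step_revenue(10, 100, -20)),   # finished on time
    Job("j2", dl=5, end=7, bf=step_revenue(5, 60, -10)),      # finished late
]
print(total_revenue(jobs))
```

Here j1 contributes 100 and j2 contributes -10, so the total is 90.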
As described above, the invention proposes the maximum-revenue scheduling algorithm based on job latest start times, shown below.
In Algorithm 1, lines 1-3 perform step (1-1): they compute the initial latest start time $j_i.T_{ols}$ of each job $j_i$ in the set of jobs to be scheduled $J = \{j_1, j_2, \dots, j_n\}$ and pre-allocate the platform's resources according to $j_i.T_{ols}$.
Line 4 performs step (1-2): from the pre-allocation, it counts the total computing resources required by the jobs running in each period to obtain the resource pre-allocation result P_R.
Lines 5-23 perform steps (1-3) and (1-4): they determine whether P_R contains an overload period and, if so, adjust the initial latest start times of the jobs in that period.
Within the adjustment of step (1-4), lines 6-7 perform step (1-4-1): line 6 selects the last overload period, and line 7 obtains the job set Ju executed in that period. Line 8 performs step (1-4-2), forming from Ju the feasible adjustment strategy set $CL = \{Js_1, Js_2, \dots, Js_m\}$ of all minimal proper subsets Js that satisfy the condition. Lines 9-16 perform step (1-4-3), computing the evaluation value of each feasible adjustment strategy with the evaluation function and selecting the optimal strategy. Lines 17-23 perform step (1-4-4): once the optimal strategy is selected, the latest start times of its jobs are adjusted and P_R is updated accordingly.
Lines 6-23 repeat until P_R contains no overload period.
Line 24 performs step (1-5), returning P_R as the result.
Algorithm 1 has two loops at the outer level, one of which contains an inner loop, so its time complexity is n + n*(n+n), i.e. $2n^2 + n$, which is $O(n^2)$.
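Steps (1-1) to (1-3) of the walkthrough above can be sketched on a simplified model. The sketch assumes integer time slots, a known execution length per job, an initial latest start time equal to deadline minus execution length, and latest start times that are not negative; all function names are hypothetical and the adjustment loop of step (1-4) is not included.

```python
def latest_start(deadline, duration):
    """Initial latest start time: the job finishes exactly at its deadline."""
    return deadline - duration


def preallocate(jobs):
    """Build P_R as the Containers required in each time slot.

    jobs: list of (deadline, duration, containers), in integer time slots.
    Every job is placed at its initial latest start time (steps 1-1 and 1-2).
    """
    horizon = max(deadline for deadline, _, _ in jobs)
    p_r = [0] * horizon
    for deadline, duration, containers in jobs:
        for slot in range(latest_start(deadline, duration), deadline):
            p_r[slot] += containers
    return p_r


def last_overloaded_slot(p_r, ts):
    """Index of the last slot whose demand exceeds TS, or None (step 1-3)."""
    for slot in range(len(p_r) - 1, -1, -1):
        if p_r[slot] > ts:
            return slot
    return None


p_r = preallocate([(4, 2, 3), (4, 3, 2), (2, 2, 2)])
print(p_r, last_overloaded_slot(p_r, ts=4))
```

For these three sample jobs the per-slot demand is [2, 4, 5, 5], so with TS = 4 the last overloaded slot is slot 3, which is where step (1-4) would start adjusting.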
Second stage: job scheduling based on maximum platform resource utilization
As shown in Fig. 4, the job scheduling based on maximum platform resource utilization in step (2) proceeds as follows:
(2-1) Initialize the variables, including the time T and the final scheduling result queue;
(2-2) Find the set of jobs that can execute at time T without conflicting with the resource pre-allocation result P_R;
(2-3) Determine whether the set of jobs that do not conflict with P_R is empty. If it is empty, set T to the start time of the next period in P_R; if it is not empty, select from it the job that minimizes the resource waste rate, record its optimal start time, and update T;
(2-4) Repeat steps (2-2) and (2-3) until every job has been assigned an optimal start time, which yields the final scheduling result;
(2-5) Output the final scheduling result.
In this embodiment, the resource waste rate in step (2-3) is the ratio of the resources that cannot be reused after scheduling to the current computing resources;
the current computing resources are the sum of the computing resources in use and the idle computing resources.
The maximum-revenue scheduling algorithm of the first stage yields the resource pre-allocation result P_R that maximizes platform revenue. Executing jobs according to this result guarantees that high-revenue jobs all complete before their deadlines, but it does not guarantee the highest platform resource utilization. The second stage of the two-stage job scheduling method therefore proposes a secondary adjustment scheduling algorithm based on maximum platform resource utilization, which maximizes resource utilization on the premise that platform revenue remains maximal.
In the second-stage scheduling algorithm, the invention respects the resource pre-allocation result obtained by the first-stage algorithm, so that all jobs still complete on time, and additionally considers the platform's resource utilization after scheduling. To maximize platform resource utilization, the invention uses the resource waste rate $W_{rr}$ to assess utilization. $W_{rr}$ is the ratio of the resources $W_r$ that cannot be reused after scheduling to the current computing resources $A_r$ (the sum of the computing resources in use and the idle computing resources), that is: $W_{rr} = W_r / A_r$.
The smaller the resource waste rate $W_{rr}$, the higher the current resource utilization of the platform.
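The waste-rate criterion can be sketched as follows. The text does not spell out how the non-reusable resources W_r are measured for a candidate job, so the sketch takes them as given inputs; the function names and sample numbers are assumptions.

```python
def waste_rate(non_reusable, in_use, idle):
    """W_rr = W_r / A_r, with A_r = resources in use + idle resources."""
    current = in_use + idle
    return non_reusable / current


def pick_min_waste(candidates, in_use, idle):
    """Choose the candidate job with the smallest waste rate.

    candidates: dict mapping job id -> W_r if that job were started now
    (how W_r is obtained is outside this sketch).
    """
    return min(candidates, key=lambda job: waste_rate(candidates[job], in_use, idle))


print(waste_rate(2, in_use=6, idle=4))
print(pick_min_waste({"a": 3, "b": 1}, in_use=6, idle=4))
```

Since A_r is the same for every candidate at a given moment, minimizing W_rr here is equivalent to minimizing W_r; the ratio matters when waste rates are compared across different moments.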
As described above, the job scheduling algorithm based on maximum platform resource utilization is as follows:
In Algorithm 2:
Lines 1-2 perform step (2-1), initializing the variables;
Lines 4-10 perform step (2-2), finding the set $E_j$ of jobs that can execute at time T without conflicting with the pre-allocated resources;
Lines 11-20 perform step (2-3); line 11 checks whether the set of jobs that do not conflict with the pre-allocation result P_R is empty;
Lines 12-18: if $E_j$ is not empty, select from $E_j$ the job that minimizes the resource waste rate and set its optimal start time to the current time T;
Line 20: if $E_j$ is empty, set T to the start time of the next period in P_R;
Lines 3-20 repeat until every job has been assigned an optimal start time;
Line 21 performs step (2-5), returning the result.
Algorithm 2 contains two loops, so its time complexity is $n^2$.
After the secondary adjustment scheduling, the invention has determined a start time for every job in the set of jobs to be scheduled. Because resource utilization rises after the secondary adjustment, the actual start and completion times of jobs move earlier, and many jobs that the pre-allocation algorithm discarded for lack of resources regain a chance to execute. The revenue of the cluster therefore has an opportunity to increase again after the secondary adjustment scheduling.
Embodiment 2:
To make full use of platform resources and increase the service provider's revenue in a big data platform, the invention provides a two-stage job scheduling system for big data platforms. The system is based on the two-stage job scheduling method for big data platforms of Embodiment 1.
To achieve this, the invention adopts the following technical solution:
As shown in Fig. 5,
a two-stage job scheduling system for big data platforms comprises:
A first-stage scheduling module, configured to form the jobs submitted by users into a set of jobs to be scheduled, in which each job carries its own SLA information. Based on the latest-start-time constraint of each job and the overall maximum revenue of the service provider, the module applies maximum-revenue scheduling based on job latest start times to perform the first adjustment scheduling of the jobs in the set, obtaining the resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
and
A second-stage scheduling module, configured to perform a secondary adjustment scheduling aimed at improving platform resource utilization, according to the platform's resource usage and on the premise that the service provider's revenue remains maximal. It fine-tunes the resource pre-allocation job scheduling result queue of the first-stage scheduling module to obtain the final scheduling result queue. The final queue maximizes platform resource utilization while preserving maximum platform revenue, so that platform resources are fully used, and each job in the final queue carries an optimal start time.
In this embodiment, the first-stage scheduling module includes a first job scheduler and a first resource scheduler, and the second-stage scheduling module includes a second job scheduler and a second resource scheduler. All job scheduling is carried out with the help of the job schedulers: once a job scheduler has generated an optimal job execution queue, jobs are started according to their optimal start times. After a job starts, the resource scheduler allocates resources to it for execution. As shown in the lower part of Fig. 5, a job can obtain resources from different nodes and may run on different nodes.
Embodiment 3:
Embodiment 3 compares the overall performance of the two-stage job scheduling method and system of Embodiments 1 and 2 with the EDF algorithm and Hadoop's native FIFO scheduling algorithm, examining the influence of average job size, job set scale, total platform resources, and job urgency on the algorithms.
Platform configuration:
In this embodiment, the invention, comparative example 1, and comparative example 2 are tested on a big data platform based on the MapReduce computing framework.
The platform contains 1 master node and 20 identically configured slave nodes. Each node is configured with an Intel(R) Core(TM) i5-2400 3.10 GHz CPU, 8 GB of memory, a 1 TB hard disk, and Red Hat Enterprise Linux 6.2; the Hadoop version is 2.7.1. Computing resources are counted in Containers, each sized at 1 core and 2 GB of memory, so each node holds 4 Containers and the whole platform has 80 Containers.
Comparison algorithms:
To verify the effectiveness of the two-stage job scheduling method and system of Embodiments 1 and 2, the experiments compare the overall performance of the two-stage scheduling algorithm proposed by the invention, TPS (two-phase schedule), with the EDF algorithm and Hadoop's native FIFO scheduling algorithm.
EDF is a classic scheduling algorithm for jobs with deadline constraints. It determines the execution order by deadline, running jobs with earlier deadlines first. To avoid execution times being prolonged by jobs contending for resources, each comparative example in this embodiment starts a job only when the platform's remaining resources are no less than the resources the job requires.
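The EDF baseline with the admission rule described above can be sketched as follows. Whether a job that does not fit blocks the jobs behind it is not stated in the text; the sketch assumes it does (strict deadline order). The dictionary keys and function names are hypothetical.

```python
def edf_order(jobs):
    """Sort jobs by deadline, earliest first."""
    return sorted(jobs, key=lambda job: job["deadline"])


def startable(queue, free_containers):
    """Start jobs from the head of the EDF queue while resources suffice.

    A job starts only if the remaining Containers cover its requirement.
    ASSUMPTION: the first job that does not fit blocks the ones behind it.
    Returns the names of the started jobs and the Containers still free.
    """
    started = []
    for job in queue:
        if job["containers"] > free_containers:
            break
        free_containers -= job["containers"]
        started.append(job["name"])
    return started, free_containers


jobs = [
    {"name": "a", "deadline": 30, "containers": 3},
    {"name": "b", "deadline": 10, "containers": 4},
    {"name": "c", "deadline": 20, "containers": 5},
]
print(startable(edf_order(jobs), free_containers=8))
```

With 8 free Containers only job b starts: job c is next by deadline but needs 5 of the remaining 4, so it waits, and job a waits behind it.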
In the FIFO scheduling algorithm, jobs execute in order of priority (or of submission time), and jobs cannot run in parallel. In the tests here, priority is determined by the revenue ratio (the ratio of a job's revenue to its resource demand) so that FIFO can be compared with the proposed algorithm.
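The FIFO baseline as configured for these tests can be sketched like this. The sketch assumes that a higher revenue ratio means a higher priority and that each job's execution length is known in advance; the tuple layout and function names are illustrative.

```python
def revenue_ratio(revenue, containers):
    """Revenue obtained per Container requested."""
    return revenue / containers


def fifo_serial_plan(jobs):
    """Serial execution plan ordered by revenue ratio, highest first (assumed).

    jobs: list of (name, revenue, containers, duration).
    Returns a list of (name, start, end); jobs never overlap.
    """
    ordered = sorted(jobs, key=lambda j: revenue_ratio(j[1], j[2]), reverse=True)
    plan, clock = [], 0
    for name, _revenue, _containers, duration in ordered:
        plan.append((name, clock, clock + duration))
        clock += duration
    return plan


print(fifo_serial_plan([("a", 100, 10, 5), ("b", 90, 3, 2), ("c", 40, 2, 4)]))
```

The revenue ratios are 10, 30, and 20 for a, b, and c, so the plan runs b, then c, then a, one after another.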
Test jobs and data:
The classic MapReduce test jobs Sort and Grep are used as the input jobs for the experiments.
Evaluation metrics:
This embodiment evaluates the algorithms with three metrics: platform resource utilization, job completion rate, and platform revenue. They are calculated as follows:
Platform resource utilization = resources used by job execution / total platform resources.
Job completion rate = number of jobs completed on time / total number of jobs to be scheduled.
Platform revenue = revenue from jobs completed on time - compensation paid for jobs not completed on time.
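The three metrics translate directly into code. The function names and sample figures below are illustrative assumptions; the formulas follow the definitions above.

```python
def resource_utilization(used_resources, total_resources):
    """Resources used by job execution divided by total platform resources."""
    return used_resources / total_resources


def completion_rate(on_time_jobs, total_jobs):
    """Jobs completed on time divided by all jobs to be scheduled."""
    return on_time_jobs / total_jobs


def platform_revenue(on_time_revenues, compensations):
    """Revenue of on-time jobs minus compensation paid for late jobs."""
    return sum(on_time_revenues) - sum(compensations)


print(resource_utilization(60, 80))
print(completion_rate(18, 20))
print(platform_revenue([100, 50], [30]))
```

For the sample figures, utilization is 0.75, the completion rate is 0.9, and the platform revenue is 120.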
Experimental design:
The experiments consider the influence of average job size, job set scale, total platform resources, and job urgency on the algorithms, and analyze each in turn:
Experiment (1) tests the influence of average job size on the algorithms. The number of data blocks of a job is used as the measure of job size. The total number of jobs is fixed at 30, and algorithm performance is tested for average block counts of 20, 40, 80, 100, and 200.
Experiment (2) tests the influence of job set scale on the algorithms. The average job size is fixed at 40, and algorithm performance is tested for job sets of 10, 20, 30, and 40 jobs.
Experiment (3) tests the influence of total platform resources on the algorithms. The number of Containers is used as the measure of total platform resources. The number of jobs and the average job size are fixed at 20 and 40 respectively, and algorithm performance is tested for 4, 8, 16, 32, 48, 64, and 80 Containers.
Experiment (4) tests the influence of job urgency on the algorithms. The ratio of the time remaining until a job's deadline to the job's execution length is used as the measure of urgency; the larger the ratio, the less urgent the job. The number of jobs and the average job size are fixed at 20 and 40 respectively, and algorithm performance is tested for average urgency values of 3, 4, 5, 6, and 7.
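The urgency measure of experiment (4) can be written as a small helper, together with its inverse for generating a deadline that matches a target ratio. The function names, the time unit, and the idea of deriving deadlines from a target ratio are assumptions for illustration.

```python
def urgency_ratio(now, deadline, exec_length):
    """Time remaining until the deadline divided by the job's execution length.

    A larger ratio means a less urgent job.
    """
    return (deadline - now) / exec_length


def deadline_for_ratio(now, exec_length, ratio):
    """Deadline that gives a job the requested urgency ratio (hypothetical helper)."""
    return now + ratio * exec_length


print(urgency_ratio(0, 60, 20))
print([deadline_for_ratio(0, 20, r) for r in (3, 4, 5, 6, 7)])
```

A job that takes 20 time units and is due in 60 has a ratio of 3, the most urgent setting used in experiment (4).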
Because the scheduling algorithm of the invention is relatively complex, scheduling itself consumes a certain amount of time, whereas FIFO and EDF scheduling take almost none. The execution time of the scheduling algorithm must therefore be taken into account when the algorithms are compared.
Analysis of experimental results:
(1) Influence of average job size on the scheduling algorithms
As shown in Figs. 6-8, this embodiment first runs experiment (1) to test the influence of average job size on algorithm performance.
As shown in Fig. 6, experiment (1) tests the influence of average job size on resource utilization. The results show that when jobs are small, FIFO scheduling has very low resource utilization because jobs execute serially, while TPS and EDF have higher utilization because jobs run in parallel; since EDF considers only deadlines, its utilization is lower than that of TPS. As job size grows, the utilization of FIFO scheduling rises, whereas for TPS and EDF the larger jobs produce more resource fragmentation during scheduling, so their utilization falls somewhat. Finally, once job size reaches a certain level, platform resources can no longer support parallel jobs, the jobs scheduled by all three algorithms execute serially, and the three utilizations become roughly equal.
As shown in Fig. 7, experiment (1) also tests the relation between job completion rate and average job size. Because deadlines are unchanged and the system's computing capacity within a fixed period is fixed, the completion rate of all three algorithms drops markedly as jobs grow. EDF considers only deadlines while TPS must also account for platform revenue, so the completion rate of EDF is slightly higher than that of TPS. FIFO considers only revenue, so its completion rate is the lowest.
As shown in Fig. 8, experiment (1) also tests the influence of average job size on total revenue. FIFO obtains the lowest revenue because of its low resource utilization. Although the completion rate of EDF is slightly higher than that of TPS, TPS takes job revenue into account, so its revenue is slightly higher than that of EDF. As job size grows, the revenue of all three algorithms falls because completion rates fall, and the revenue of TPS remains higher than that of the other two algorithms throughout.
(2) Influence of job set scale on the scheduling algorithms
As shown in Figs. 9-11, this embodiment runs experiment (2) to test the influence of job set scale on scheduling algorithm performance.
As shown in Fig. 9, experiment (2) tests the influence of job set scale on resource utilization. The results show that the computing resource utilization of the three algorithms is little affected by job set scale and does not change significantly as the scale increases. TPS has the highest utilization of the three algorithms, EDF is slightly below TPS, and FIFO is the lowest.
As shown in Fig. 10, experiment (2) also tests the influence of job set scale on job completion rate. Because computing capacity is limited, the completion rate of all three algorithms falls as job set scale grows. The reasons are the same as for the influence of job size on completion rate and total revenue and are not repeated here.
As shown in Fig. 11, experiment (2) also tests the influence of job set scale on revenue. The results show that as the scale grows, the revenue of all three algorithms first rises and then falls. While the number of jobs is within the platform's computing capacity, more jobs mean more jobs that the platform can complete, and hence more revenue. Once the number of jobs exceeds the platform's computing capacity, additional jobs increase the number of jobs not completed on time, and because these uncompleted jobs yield negative revenue, total revenue falls. Of the three algorithms, FIFO has the lowest completion rate and revenue because of its low resource utilization. EDF and TPS have high resource utilization, and when the number of jobs is within the platform's computing capacity their completion rates and revenues are roughly equal. As the number of jobs continues to grow, EDF considers only job deadlines while TPS completes high-revenue jobs first, so although the completion rate of EDF is slightly higher than that of TPS, the revenue of TPS is far higher than that of EDF.
(3) Influence of the number of computing resources on the scheduling algorithms
As shown in Figs. 12-14, this embodiment runs experiment (3) to test the influence of the number of computing resources on scheduling algorithm performance.
As shown in Fig. 12, experiment (3) tests the influence of the number of computing resources on resource utilization. The results show that when computing resources are scarce, the resource utilization of the three algorithms is roughly the same and falls as the number of computing resources increases. Beyond a certain number of computing resources, however, the utilization of FIFO keeps falling as resources increase, while the utilization of EDF and TPS rises, with TPS rising more than EDF. The reason is as follows. When computing resources are few, the jobs scheduled by all three algorithms execute essentially serially, so their utilization is similar and falls as resources increase. Once the number of Containers exceeds the average job size, jobs scheduled by EDF and TPS can run in parallel, so their utilization rises. Because TPS takes platform resource utilization into account when scheduling while EDF schedules purely by deadline, the utilization of TPS is higher than that of EDF.
As shown in Fig. 13, experiment (3) also tests the influence of the number of computing resources on job completion rate; as shown in Fig. 14, it also tests the influence on revenue. The results show that as computing resources increase, the completion rate and revenue of both TPS and EDF rise, and the revenue of TPS is far higher than that of EDF. For FIFO, completion rate and revenue rise with resources while platform resources are scarce, but once platform resources exceed the average job size, serial execution wastes a large share of the resources, so completion rate and revenue stop rising and settle at a fixed value.
(4) Influence of job urgency on the scheduling algorithms
As shown in Fig. 15, this embodiment runs experiment (4) to test the influence of job urgency on the three algorithms. As urgency decreases while the number of jobs and the other parameters stay the same, jobs have more time in which to execute, so the completion rate and revenue of all three algorithms necessarily rise; this experiment therefore tests only the influence of job urgency on resource utilization. The results in the figure show that the resource utilization of FIFO and EDF is essentially unaffected by job urgency, while that of TPS rises as urgency decreases. The reason is that FIFO and EDF consider only job revenue or the order of job deadlines, whereas TPS also considers resource utilization on the premise of satisfying revenue. When job urgency is lower, the algorithm has more freedom in maximizing resource utilization and can better adjust job start times so that utilization is maximized.
Beneficial effects of the invention:
The two-stage job scheduling method and system for big data platforms of the invention are based on the MapReduce computing framework and are proposed for jobs with deadline constraints. The first stage proposes a maximum-revenue scheduling method based on job latest start times: it computes and adjusts the latest start time of each job from the job's deadline constraint and revenue information, and pre-allocates resources according to the adjusted result, so that high-revenue jobs complete before their deadlines and total platform revenue is maximized. The second stage, on the premise that platform revenue remains maximal, proposes a job scheduling method based on maximum platform resource utilization to improve utilization. Experimental results show that the proposed two-stage job scheduling method not only maximizes platform revenue but also increases platform resource utilization, improving the overall performance of the platform.
The above are only preferred embodiments of the application and do not limit it; for those skilled in the art, the application may have various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the application shall fall within its scope of protection.

Claims (10)

1. A two-stage job scheduling method for big data platforms, characterized in that the method comprises the following steps:
(1) forming the jobs submitted by users into a set of jobs to be scheduled and performing maximum-revenue scheduling based on job latest start times: pre-allocating the platform's resources according to job deadlines, and adjusting and scheduling the pre-allocation result by comparing job revenues, to obtain the resource pre-allocation job scheduling result queue that maximizes the service provider's revenue;
(2) performing job scheduling based on maximum platform resource utilization: according to the platform's resource usage, fine-tuning the resource pre-allocation job scheduling result queue of step (1) to obtain the final scheduling result queue, so that platform resource utilization is maximized on the premise that platform revenue is maximal.
2. The two-stage job scheduling method for big data platforms according to claim 1, characterized in that the maximum-revenue scheduling based on job latest start times of step (1) comprises:
(1-1) computing the initial latest start time of each job in the set of jobs to be scheduled, and pre-allocating the platform's resources according to the initial latest start times;
(1-2) counting, from the pre-allocation, the total computing resources required in each period to obtain the resource pre-allocation result P_R;
(1-3) determining whether P_R contains an overload period; if so, proceeding to step (1-4); if not, proceeding to step (1-5);
(1-4) adjusting the initial latest start times of the jobs in the overload period of P_R, updating P_R according to the adjustment, and returning to step (1-3);
(1-5) outputting the resource pre-allocation result P_R that maximizes the service provider's revenue.
3. The two-stage job scheduling method for a big data platform as claimed in claim 2, characterized in that, in step (1-1), pre-allocating the resources of the platform according to the initial latest start times specifically comprises:
letting each job start executing at its initial latest start time, and obtaining the number of resources needed in each period when every job completes exactly at its deadline;
the initial latest start time of a job being: for any job in the to-be-scheduled job set, the start time at which the job completes exactly at its deadline when no other job competes with it for resources.
4. The two-stage job scheduling method for a big data platform as claimed in claim 2, characterized in that, in step (1-3), determining whether an overloaded period exists in the resource pre-allocation result P_R means determining whether the total number of computing resources needed by the jobs running in a given period exceeds the total computing resources of the big data platform; if it does, the period is an overloaded period; otherwise, the period is a normal-load period;
the total computing resources of the big data platform being the number of all standard Containers in the big data platform.
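The pre-allocation of claim 3 and the overload check of claim 4 can be illustrated with a minimal Python sketch. It assumes that time is divided into unit-length periods and that each job is described by a hypothetical tuple `(deadline, duration, containers)`; the function names, the tuple layout, and the sample numbers are illustrative assumptions and do not come from the patent.

```python
from collections import Counter

def preallocate(jobs):
    """Build P_R: containers needed in each period when every job
    starts at its initial latest start time (deadline - duration)."""
    demand = Counter()
    for deadline, duration, containers in jobs.values():
        latest_start = deadline - duration  # finishes exactly at the deadline
        for period in range(latest_start, deadline):
            demand[period] += containers
    return dict(demand)

def overloaded_periods(p_r, total_containers):
    """Return, in time order, the periods whose demand exceeds the
    total number of containers on the platform."""
    return sorted(t for t, need in p_r.items() if need > total_containers)

# Hypothetical jobs: name -> (deadline, duration, containers)
jobs = {"j1": (4, 2, 3), "j2": (5, 3, 4), "j3": (5, 1, 2)}
p_r = preallocate(jobs)
overloaded = overloaded_periods(p_r, total_containers=6)
```

With these sample values, periods 2 and 3 each need 7 containers and are flagged as overloaded, while period 4 needs exactly 6 and remains a normal-load period.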
5. The two-stage job scheduling method for a big data platform as claimed in claim 2, characterized in that step (1-4) specifically comprises:
(1-4-1) selecting the last overloaded period, and obtaining all jobs executed within that period to form a job set;
(1-4-2) selecting a minimal proper subset of the job set, a minimal proper subset being one for which moving the initial latest start times of all of its jobs earlier, into a normal-load period, returns the selected period to a normal-load state; the collection of all minimal proper subsets satisfying this condition forms the feasible adjustment strategy set of the period;
(1-4-3) computing the evaluation value of each feasible adjustment strategy in the feasible adjustment strategy set according to an evaluation function, and selecting the optimal adjustment strategy;
(1-4-4) adjusting the initial latest start times of the jobs in the optimal adjustment strategy, and updating the resource pre-allocation result P_R according to the adjusted initial latest start times of those jobs.
6. The two-stage job scheduling method for a big data platform as claimed in claim 5, characterized in that the evaluation function in step (1-4-3) is:
$$Js_i.pf = \frac{\sum_{m \in Js_i} j_m.Sp}{\sum_{m \in Ju} j_m.Sp} \times lastsize$$
wherein $\sum_{m \in Js_i} j_m.Sp$ is the sum of the revenue evaluation values of all jobs in the adjustment strategy; $\sum_{m \in Ju} j_m.Sp$ is the sum of the revenue evaluation values of all jobs in the current period; lastsize is the percentage of the platform's total computing resources that remains free in the current period after adjustment by the strategy; Sp is the revenue evaluation value of a job, Sp = |a - b|, where a is the revenue obtained when the job is completed on time and b is the revenue obtained when the job is not completed on time.
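The evaluation function of claim 6 can be computed as in the following sketch. The function and parameter names (`revenue_score`, `strategy_score`, `free_after`, `total`) and the sample revenues are hypothetical, and the sketch only evaluates the formula; it does not assume how the optimal strategy is then chosen from the scores.

```python
def revenue_score(on_time_revenue, late_revenue):
    """Sp = |a - b|: revenue evaluation value of one job."""
    return abs(on_time_revenue - late_revenue)

def strategy_score(strategy_jobs, period_jobs, sp, free_after, total):
    """Js_i.pf = (sum of Sp over the strategy's jobs)
               / (sum of Sp over all jobs in the period) * lastsize,
    where lastsize is the share of platform resources left free
    in the period after the strategy is applied."""
    lastsize = free_after / total
    strategy_sum = sum(sp[j] for j in strategy_jobs)
    period_sum = sum(sp[j] for j in period_jobs)
    return strategy_sum / period_sum * lastsize

# Hypothetical revenue evaluation values for the jobs in one overloaded period
sp = {
    "j1": revenue_score(10, 4),  # 6
    "j2": revenue_score(8, 6),   # 2
    "j3": revenue_score(5, 1),   # 4
}
score = strategy_score({"j1"}, sp.keys(), sp, free_after=2, total=8)
```

Here the strategy that moves only `j1` scores (6 / 12) × (2 / 8) = 0.125.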
7. The two-stage job scheduling method for a big data platform as claimed in claim 1, characterized in that the job scheduling based on maximum platform resource utilization in step (2) specifically comprises:
(2-1) initializing the time variable T and the final scheduling result queue;
(2-2) finding the set of jobs that can be executed at time T without conflicting with the resource pre-allocation result P_R;
(2-3) determining whether the set of jobs that do not conflict with the resource pre-allocation result P_R is empty; if it is empty, setting T to the start time of the next period in the resource pre-allocation result P_R; if it is not empty, selecting the job in the set that yields the lowest resource waste rate, obtaining its optimal start time, and updating T;
(2-4) repeating steps (2-2) and (2-3) until an optimal start time has been set for every job, to obtain the final scheduling result;
(2-5) outputting the final scheduling result.
8. The two-stage job scheduling method for a big data platform as claimed in claim 7, characterized in that the resource waste rate in step (2-3) is the ratio of the resources that cannot be reused after scheduling to the current computing resources;
the current computing resources being the sum of the computing resources in use and the idle computing resources.
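The ratio defined in claim 8, and its use in step (2-3) to pick the job with the lowest waste rate, can be sketched as follows. How the number of non-reusable resources is derived for a candidate job is not specified in the claims, so the sketch takes it as a given input; all names and numbers are illustrative assumptions.

```python
def resource_waste_rate(unusable, used, idle):
    """Ratio of resources that cannot be reused after scheduling
    to the current computing resources (used + idle)."""
    current = used + idle
    if current <= 0:
        raise ValueError("platform has no computing resources")
    return unusable / current

# Hypothetical candidates: job -> containers left non-reusable if it is scheduled now
candidates = {"j1": 2, "j2": 1}
rates = {job: resource_waste_rate(u, used=6, idle=4) for job, u in candidates.items()}
best = min(rates, key=rates.get)  # job with the lowest resource waste rate
```

With 10 current containers, `j1` wastes 20% and `j2` wastes 10%, so `j2` is selected.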
9. A two-stage job scheduling system for a big data platform, the scheduling system being based on the two-stage job scheduling method for a big data platform as claimed in any one of claims 1-8, characterized in that the system comprises:
a first-stage scheduling module, configured to form the jobs submitted by users into a to-be-scheduled job set and, based on the latest start time constraint of each job and the overall maximum revenue of the service provider, to perform a first adjustment scheduling of the jobs in the to-be-scheduled job set using the maximum-revenue scheduling based on the latest start times of the jobs, to obtain a resource pre-allocation job scheduling result queue that maximizes the revenue of the service provider;
and
a second-stage scheduling module, configured to fine-tune the resource pre-allocation job scheduling result queue of the first-stage scheduling module according to the resource usage of the platform to obtain a final scheduling result queue, the final scheduling result queue maximizing the resource utilization of the platform while preserving the maximum revenue of the platform, so that the resources of the platform are fully used; each job in the final scheduling result queue carries an optimal start time.
10. The two-stage job scheduling system for a big data platform as claimed in claim 9, characterized in that the to-be-scheduled job set is defined as follows: the platform service provider receives a batch of jobs within a given period and negotiates and signs job-related SLA agreements with the users; the collection of these jobs forms the to-be-scheduled job set J, expressed as J = {j1, j2, …, jn}, where n is the number of jobs in J;
any job ji in the to-be-scheduled job set J is expressed as ji = (ms, rs, mt, rt, dl, bf(t)), where ms is the number of Map tasks of the job; rs is the number of Reduce tasks of the job; mt is the average execution time of the job's Map tasks; rt is the average execution time of the job's Reduce tasks; dl is the deadline constraint of the job; and bf(t) is the revenue function of the job;
the first-stage scheduling module includes a first job scheduler and a first resource scheduler, and the second-stage scheduling module includes a second job scheduler and a second resource scheduler.
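The job tuple ji = (ms, rs, mt, rt, dl, bf(t)) of claim 10 maps naturally onto a small record type. The following Python sketch is one possible representation; the class name `Job`, the time units, and the step-shaped revenue function are assumptions made for illustration, since the claims do not fix the form of bf(t).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Job:
    ms: int                       # number of Map tasks
    rs: int                       # number of Reduce tasks
    mt: float                     # average execution time of a Map task
    rt: float                     # average execution time of a Reduce task
    dl: float                     # deadline constraint
    bf: Callable[[float], float]  # revenue as a function of completion time t

# Hypothetical job: full revenue if finished by the deadline, reduced revenue otherwise
job = Job(ms=4, rs=2, mt=30.0, rt=60.0, dl=300.0,
          bf=lambda t: 10.0 if t <= 300.0 else 4.0)

on_time = job.bf(250.0)  # completion before the deadline
late = job.bf(350.0)     # completion after the deadline
```

Under this assumed revenue function, the job earns 10.0 when completed on time and 4.0 when late, which gives a revenue evaluation value Sp = |10.0 - 4.0| = 6.0 in the sense of claim 6.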
CN201710590748.5A 2017-07-19 2017-07-19 Two-stage job scheduling method and system for big data platform Active CN107589985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710590748.5A CN107589985B (en) 2017-07-19 2017-07-19 Two-stage job scheduling method and system for big data platform

Publications (2)

Publication Number Publication Date
CN107589985A (en) 2018-01-16
CN107589985B (en) 2020-04-24

Family

ID=61041646

Country Status (1)

Country Link
CN (1) CN107589985B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328383A (en) * 2020-11-19 2021-02-05 湖南智慧畅行交通科技有限公司 Priority-based job concurrency control and scheduling algorithm

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077438A (en) * 2012-12-27 2013-05-01 深圳先进技术研究院 Control method and system for scheduling multiple robots
US20130111453A1 (en) * 2011-10-31 2013-05-02 Oracle International Corporation Throughput-aware software pipelining for highly multi-threaded systems
CN104317650A (en) * 2014-10-10 2015-01-28 北京工业大学 Map/Reduce type mass data processing platform-orientated job scheduling method
CN104731662A (en) * 2015-03-26 2015-06-24 华中科技大学 Variable parallel work resource allocation method
CN104778079A (en) * 2014-01-10 2015-07-15 国际商业机器公司 Method and device used for dispatching and execution and distributed system
CN105159769A (en) * 2015-09-11 2015-12-16 国电南瑞科技股份有限公司 Distributed job scheduling method suitable for heterogeneous computational capability cluster
CN105718316A (en) * 2014-12-01 2016-06-29 中国移动通信集团公司 Job scheduling method and apparatus
CN105721565A (en) * 2016-01-29 2016-06-29 南京邮电大学 Game based cloud computation resource allocation method and system
CN105740051A (en) * 2016-01-27 2016-07-06 北京工业大学 Cloud computing resource scheduling realization method based on improved genetic algorithm
CN105808334A (en) * 2016-03-04 2016-07-27 山东大学 MapReduce short job optimization system and method based on resource reuse
US20160218838A1 (en) * 2013-07-25 2016-07-28 Sony Corporation Method, base station and terminal for dynamic uplink configuration in wireless communication system
CN106293893A (en) * 2015-06-26 2017-01-04 阿里巴巴集团控股有限公司 job scheduling method, device and distributed system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
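J. Hu et al.: "An Ant Colony Optimization for Grid Task Scheduling with Multiple QoS Dimensions", 2009 Eighth International Conference on Grid and Cooperative Computing *
王习特 et al.: "MapReduce集群中最大收益问题的研究" [Research on the maximum revenue problem in MapReduce clusters], 《计算机学报》 *
陈晓旭 et al.: "基于最小费用最大流的大规模资源调度方法" [A large-scale resource scheduling method based on minimum-cost maximum flow], 《软件学报》 *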

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant