CN108985367A - Computing engine selection method and multi-computing-engine platform based on the method - Google Patents

Computing engine selection method and multi-computing-engine platform based on the method

Info

Publication number
CN108985367A
CN108985367A (application CN201810734031.8A)
Authority
CN
China
Prior art keywords
task
computing engines
execution time
task execution
data
Prior art date
Legal status
Pending
Application number
CN201810734031.8A
Other languages
Chinese (zh)
Inventor
杜凡
杜一凡
陈昭
刁博宇
徐勇军
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201810734031.8A
Publication of CN108985367A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 — Complex mathematical operations
    • G06F17/18 — Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 — Administration; Management
    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Strategic Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a computing engine selection method and a multi-computing-engine platform based on the method. The method comprises: inputting the task feature data corresponding to a task to be computed into a task execution time prediction model for each of a plurality of computing engines, to obtain a predicted task execution time of the task on each computing engine, wherein each task execution time prediction model is obtained by training on a training sample set comprising multiple items of task feature data and the corresponding task execution times; and selecting, according to the predicted task execution times, the computing engine that will execute the task from the plurality of computing engines. The method of the invention can automatically select the most efficient computing engine and thereby reduce task execution time.

Description

Computing engine selection method and multi-computing-engine platform based on the method
Technical field
The present invention relates to the field of information technology, and in particular to a computing engine selection method and a multi-computing-engine platform based on the method.
Background technique
As the country develops large amounts of new equipment in directions such as sea, air, space, and deep sea, equipment testing becomes ever more important. For example, over ten thousand wind tunnel tests were carried out during the development of the J-10 fighter, yielding millions of aerodynamic data points, and the processing and analysis of these data became an important foundation of the J-10's successful development. Equipment testing comprises two processes, "test" and "evaluation": testing is a way of obtaining data, and the various data are then analysed, processed, and compared to support decision making. At present, experimental data processing still relies mainly on expert experience and computer-assisted processing, which no longer meets current needs. Moreover, because experimental data processing must handle data of different scales, mixtures of structured and unstructured processing, and combinations of real-time and offline processing, a single engine cannot cope with all experimental processing demands.
There are currently three approaches to this problem. The first is to manage multiple engines manually: computing engines are deployed separately, managed by hand, and assigned computing tasks by hand. This requires substantial manpower and is inefficient, and if the system is not kept fully loaded it causes an enormous waste of resources. The second is a "super" engine supporting all computing demands: a specially deployed engine supporting every processing mode could meet all experimental data processing needs, but this approach is currently immature and far from large-scale use. The third is a compromise between the first two: a computing platform supporting multiple computing engines, which on the one hand can use the various mature computing engine technologies available today, and on the other hand manages computing engines and computing tasks with automated methods, improving resource utilization and task execution efficiency. In short, of the three approaches, manually managing multiple engines is inefficient, a "super" engine cannot meet urgent demands for the moment, and a platform with multiple computing engines is the current solution that balances efficiency and feasibility.
However, a multi-computing-engine platform must solve the problems of engine compatibility, unified management of computing tasks, and extension to future engines; it therefore needs to select the task execution engine automatically to improve platform efficiency. Existing platforms that support multiple computing engines do not solve all of these problems. For example, Twitter's SummingBird uses the Lambda architecture to integrate a distributed batch engine (Hadoop) and a distributed stream engine (Storm), and can merge batch and stream results when serving a request, but it provides no convenient engine management mechanism and no isolation of engine runtime environments. Apache Ambari, implemented on the Web, supports provisioning, management, and monitoring of the Apache Hadoop ecosystem and offers custom interfaces for adding single-machine or distributed engines, but it provides no unified computing-task management, guarantees compatibility only for particular engines, and requires manual selection of the computing engine that executes a task. Google Kubernetes, built on Docker, can run computing engines as containers — single-machine or distributed engines as needed — and provides container deployment, scheduling, and cluster scale-out, but it has no task management mechanism and likewise requires manual engine selection.
The prior art therefore needs to be improved, to provide a multi-computing-engine platform and a method for automatically selecting computing engines on such a platform.
Summary of the invention
The object of the present invention is to overcome the above defects of the prior art by providing a computing engine selection method and a multi-computing-engine platform based on the method.
According to a first aspect of the invention, a computing engine selection method is provided. The method comprises the following steps:
Step 1: input the task feature data corresponding to a task to be computed into the task execution time prediction model of each of a plurality of computing engines, to obtain a predicted task execution time of the task on each computing engine, where each task execution time prediction model is obtained by training on a training sample set comprising multiple items of task feature data and the corresponding task execution times;
Step 2: select, according to the predicted task execution times, the computing engine that will execute the task from the plurality of computing engines.
In one embodiment, the task feature data includes at least one of algorithm type, algorithm parameters, data type, data volume, and data storage location.
In one embodiment, the training sample set of a computing engine is constructed by the following steps:
Step 31: collect multiple items of task description data describing task information;
Step 32: execute the task corresponding to each item of task description data on the computing engine, to obtain the task execution time corresponding to each item;
Step 33: extract from each item of task description data the features that influence task execution time to form the task feature data, and combine it with the obtained task execution times to construct the training sample set of the computing engine.
In one embodiment, the task execution time prediction model of a computing engine is obtained by the following steps:
Step 41: based on the training sample set of the computing engine, establish a linear regression model with the task feature data as the independent variables and the task execution time as the dependent variable, expressed as:
$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}, \quad i = 1, 2, \dots, n$
where $x_{i1}$ to $x_{ip}$ denote the task features contained in the computing engine's training sample set, i indexes the sample data items in the set, n is the number of sample data items in the set, $\beta_0$ is the bias to be optimized, and $\beta_1$ to $\beta_p$ are the weights to be optimized;
Step 42: solve for the optimal weights and bias of the linear regression model by the least squares method;
Step 43: express the linear regression model with the obtained optimal weights and bias, yielding the task execution time prediction model of the computing engine.
In one embodiment, step 2 comprises the following sub-steps:
Step 51: select the computing engine with the shortest predicted execution time; or
Step 52: when the remaining resources of the engine with the shortest predicted execution time cannot support the task to be computed, select engines in ascending order of predicted execution time until one can support it.
According to a second aspect of the invention, a multi-computing-engine platform is provided. The platform comprises:
a computing task management module, for managing the processing flow of computing tasks and generating computing task information;
an engine management module, for selecting a computing engine according to the computing task information from the computing task management module, using the computing engine selection method of the invention;
a task execution module, for executing computing tasks and outputting task execution times.
In one embodiment, the multi-computing-engine platform of the invention further comprises:
a container management module, for invoking the task execution module to execute computing tasks;
a user interaction module, for receiving user operation instructions and information;
a debugging task management module, for executing user debugging tasks and outputting debugging information.
In one embodiment, when the computing engines included in the platform change, the engine management module activates the task execution time prediction model of the newly added computing engine and sets the task execution time prediction model of the replaced computing engine to an inactive state.
Compared with the prior art, the invention has the following advantages: the computing engine selection method uses machine learning to build task execution time prediction models for multiple computing engines and, combining the models' predictions with the resource situation, automatically selects the most efficient engine, which can substantially reduce task execution time and improve the efficiency of experimental data processing; the multi-computing-engine platform based on the selection method provides a task management mechanism and supports engine changes, improving flexibility and providing good support for extension to future engines.
Detailed description of the invention
The following drawings give only a schematic description and explanation of the invention and are not intended to limit its scope, in which:
Fig. 1 shows a flowchart of the computing engine selection method according to an embodiment of the invention;
Fig. 2 shows a schematic block diagram of the multi-computing-engine platform according to an embodiment of the invention.
Specific embodiment
To make the purpose, technical solution, design method, and advantages of the invention clearer, the invention is described in more detail below through specific embodiments with reference to the drawings. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
According to one embodiment of the invention, a computing engine selection method for a multi-computing-engine platform is provided. In short, the method comprises: collecting the task execution data of multiple computing engines to construct training sample sets; training task execution time prediction models on the constructed training sample sets by machine learning; and using the trained task execution time prediction models to predict the task execution time on each computing engine and then select a suitable engine. Specifically, referring to Fig. 1, the computing engine selection method of the invention comprises the following steps:
Step S110: collect the task execution data of multiple computing engines to construct training sample sets.
In this step, run-time data of the computing engines on computing tasks under various conditions are collected, covering as many task types as possible, to construct comprehensive training sample sets.
According to one embodiment of the invention, constructing a training sample set comprises the following sub-steps:
Step S111: prepare test data.
Collect the algorithms to be tested and prepare appropriate test data for each algorithm. For example, the test data volume can be determined from the data volume the training platform usually needs to process, or an upper limit on the total test data volume per algorithm can be set according to the actual situation.
Step S112: prepare task description data.
Task description data describe the information of an executed task, for example the algorithm to run and the algorithm's parameters. A computing engine can execute a specific task according to its task description data.
In one embodiment, the task description data is defined as a six-tuple Task_Info = <Task_ID, Algorithm_ID, Algorithm_Args, Data_Type, Data_Size, Data_Path>, where Task_ID is the task serial number, an integer; Algorithm_ID is the algorithm serial number, an integer that maps to a specific algorithm (for example FP-Growth, K-Means, PageRank, or the Pearson correlation coefficient algorithm); Algorithm_Args are the algorithm parameters, a JSON-encoded string — for K-Means, for example, the parameters may include the cluster count k, the initialization mode initMode, and the maximum number of iterations maxItr; Data_Type is the data type, a numeric discrete label; Data_Size is the data volume, an integer in bytes; and Data_Path is the data storage location, a numeric discrete label whose values are local file system or distributed file system.
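The six-tuple above can be sketched as a small data structure. The Python class below is illustrative only: the field names follow the patent's tuple, but the class name, types, and example values are assumptions.

```python
import json
from dataclasses import dataclass

# A sketch of the Task_Info six-tuple described above; class and
# field types are illustrative, not taken from the patent itself.
@dataclass
class TaskInfo:
    task_id: int          # Task_ID: integer task serial number
    algorithm_id: int     # Algorithm_ID: maps to a concrete algorithm (e.g. K-Means)
    algorithm_args: str   # Algorithm_Args: JSON-encoded parameter string
    data_type: int        # Data_Type: numeric label for a discrete data type
    data_size: int        # Data_Size: input size in bytes
    data_path: str        # Data_Path: "local" or "distributed" storage

# Example: a K-Means task with the parameters named in the text.
kmeans_args = json.dumps({"k": 8, "initMode": "random", "maxItr": 100})
task = TaskInfo(1, 2, kmeans_args, 0, 1_048_576, "distributed")
print(task.algorithm_id, json.loads(task.algorithm_args)["k"])  # → 2 8
```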
Note that the defined task description data may include any other content that influences task execution time; besides the algorithm serial number and algorithm parameters above, it may also include, for example, the task execution priority. The contents of the algorithm parameters also differ from algorithm to algorithm.
Step S113: execute computing tasks according to the task description data and obtain the task execution time data of each computing engine.
For each computing engine, execute the prepared computing tasks using their description data Task_Info, collect the execution time Run_Time of each computing task, and form the two-tuple <Task_Info, Run_Time> from each task description and its execution time, obtaining the task execution data.
Step S114: clean the task execution data.
The purpose of data cleaning is to reject possibly erroneous or incomplete data. For example, all task execution data can be analysed statistically to obtain the standard deviation of the task execution times; if a task's execution time deviates from the average execution time by more than 3 standard deviations, it is labelled as abnormal data and rejected. Records with missing attributes — a missing task execution time or missing task description information — are also rejected.
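The cleaning rule just described — reject records with missing attributes, then reject execution times more than three standard deviations from the mean — can be sketched as follows; the data values are invented for illustration.

```python
import statistics

# Illustrative cleaning pass over collected run times: one missing
# value and one extreme outlier among ten normal measurements.
run_times = [10.0] * 10 + [1000.0, None]

# Reject records with missing attributes.
complete = [t for t in run_times if t is not None]

# Reject times more than 3 standard deviations from the mean.
mean = statistics.mean(complete)
std = statistics.pstdev(complete)
clean = [t for t in complete if abs(t - mean) <= 3 * std]

print(len(run_times), len(complete), len(clean))  # → 12 11 10
```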
Step S115: feature engineering.
The purpose of feature engineering is to select from the task description data the features that significantly influence task execution time, to construct the training sample set. A training sample of the invention comprises task feature data and the corresponding task execution time, the task feature data containing the multiple task features that influence execution time.
According to one embodiment of the invention, the Task_ID in the task description data Task_Info has no influence on task execution time, so this feature can be dropped, while the algorithm serial number (i.e. algorithm type), algorithm parameters, data storage location, data type, and data volume in the task description data do influence execution time and are therefore retained as the task features of the training sample set.
When constructing the final training sample set, encoding a discrete task feature as a serial number would impose an ordering on an unordered discrete quantity and introduce spurious information. Therefore, according to one embodiment, discrete data is encoded with one-hot encoding. For example, for an item of task description data whose discrete features include the data storage location, with values "local file system" and "distributed file system", one-hot encoding turns the feature into two independent features, "storage location: local" and "storage location: distributed". When the original value is "local file system", the "storage location: local" feature is 1 and the "storage location: distributed" feature is 0; when the original value is "distributed file system", "storage location: local" is 0 and "storage location: distributed" is 1. Similarly, the algorithm serial number Algorithm_ID can be one-hot encoded: with four algorithms, that discrete feature becomes four independent features.
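The one-hot scheme described above can be sketched in a few lines; the helper name and the category lists are illustrative.

```python
# Minimal one-hot encoding of an unordered discrete feature: the value
# becomes a 0/1 vector with a single 1 at its category's position.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

# Data_Path: two categories become two independent 0/1 features.
paths = ["local file system", "distributed file system"]
print(one_hot("local file system", paths))        # → [1, 0]
print(one_hot("distributed file system", paths))  # → [0, 1]

# Algorithm_ID works the same way: four algorithms, four features.
algorithms = ["FP-Growth", "K-Means", "PageRank", "Pearson"]
print(one_hot("K-Means", algorithms))             # → [0, 1, 0, 0]
```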
For clarity, Table 1 below illustrates an example of a constructed training sample set.
Table 1: training sample set example
Table 1 illustrates the training sample set of computing engine 1. Note that which task features the task feature data contains depends on what actually influences task execution time during experimental data processing: it may include at least one of algorithm type, algorithm parameters, data storage location, data type, and data volume, and other task features may be added. Also, when one-hot encoding is used to encode a discrete feature of the task feature data, that feature becomes multiple independent features; the invention is, however, not limited to one-hot encoding for discrete features.
Step S120: train the task execution time prediction models on the constructed training sample sets.
In this step, machine learning methods are applied to the training sample sets to obtain the task execution time prediction model corresponding to each computing engine.
For example, linear regression, gradient boosted regression trees (GBRT), XGBoost, or other machine learning models can be used for training.
In one embodiment, a linear regression model is trained, with the task feature data in the training sample set as the independent variables and the task execution time as the dependent variable. For example, the linear regression model may be expressed as:
$Y = e + a_1 X_1 + a_2 X_2 + a_3 X_3 + a_4 X_4 + a_5 X_5 + a_6 X_6 + a_7 X_7$ (1)
where $X_1$ is the data volume, $X_2$ and $X_3$ are the one-hot-encoded data storage location features, $X_4$ to $X_7$ are the one-hot-encoded algorithm serial number (Algorithm_ID) features, $Y$ is the task execution time, $a_1$ to $a_7$ are the weights to be optimized, and $e$ is the bias to be optimized.
In general, the p-variable linear regression model that can be established may be expressed as:
$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}, \quad i = 1, 2, \dots, n$ (2)
where p is the number of features in the task feature data of the training sample set, i indexes the data items in the set, n is the number of data items in the set, $\beta_0$ is the bias to be optimized, $\beta_1$ to $\beta_p$ are the weights to be optimized, and $x_{i1}$ to $x_{ip}$ are the task features in the training sample set.
During training, the least squares method can be used to obtain the optimal weights and bias. The objective of least squares is to minimize the sum of squared errors, i.e.:
$Q(\beta_0, \beta_1, \dots, \beta_p) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip} \right)^2$ (3)
Then, taking the partial derivative of $Q$ with respect to each parameter and setting it to zero:
$\frac{\partial Q}{\partial \beta_j} = 0, \quad j = 0, 1, \dots, p$ (4)
yields the system of normal equations (with $x_{i0} = 1$):
$\sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip} \right) x_{ij} = 0, \quad j = 0, 1, \dots, p$ (5)
written in matrix form as:
$X'X\beta = X'Y$ (6)
which gives the solution for the parameters (weights and bias):
$\beta = (X'X)^{-1}X'Y$ (7)
In this step, training yields the optimal weights and bias; the model expressed with these optimal weights and bias is the task execution time prediction model. In this way, the task execution time prediction model of each computing engine can be obtained.
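The least squares fit can be sketched in a few lines. The example below fits a one-feature version of the model (run time as a linear function of data volume) using the closed-form solution; the data points are invented for illustration and this is a sketch of the technique, not the patent's implementation.

```python
# Ordinary least squares for a one-variable model y = b0 + b1*x,
# i.e. the normal-equations solution specialised to p = 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # data volume (illustrative units)
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # measured run time (s), made up

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
     / sum((x - x_mean) ** 2 for x in xs)   # optimal weight
b0 = y_mean - b1 * x_mean                   # optimal bias

def predict(x):
    """The trained prediction model: predicted run time for volume x."""
    return b0 + b1 * x

print(round(b1, 3), round(b0, 3))  # → 1.99 0.09
```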
Step S130: predict the task execution time on each computing engine using the trained task execution time prediction models, and then select a suitable computing engine.
When a new computing task needs to be executed, task feature data is first generated from the computing task's attributes; the feature data is then input into the task execution time prediction model of each computing engine in turn, obtaining the predicted execution time on each engine; finally, combining the system resource situation and the computing task's demands, the most suitable computing engine is selected to execute the task. According to one embodiment of the invention, this comprises the following sub-steps:
Step S131: generate the task feature data.
Generating the feature data of the task to be computed is similar to generating task feature data when constructing the training sample set; for example, one-hot encoding can be applied to its discrete features, and the converted task feature data is finally checked against the input format of the task execution time prediction model.
Step S132: obtain the task execution time predictions.
The feature data of the computing task is input into the task execution time prediction model Model_i of each computing engine, obtaining the predicted execution time P_Time_i of the task on the i-th engine, and thereby the predictions for all engines.
Step S133: select the computing engine that will execute the task according to the prediction results.
A suitable engine is selected to execute the computing task according to each engine's predicted time combined with its resource usage. For example, select the engine with the shortest time among all predictions and judge whether its remaining resources can support running the task; if they cannot, choose the engine with the next shortest predicted time and judge its resources, and so on, until the task's demands are met; that computing engine is then selected to run the computing task.
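The selection rule just described can be sketched as follows; the engine names, predicted times, and the simple memory-based resource check are all invented for illustration.

```python
# Try engines in ascending order of predicted run time P_Time_i and
# pick the first whose remaining resources can host the task.
predictions = {"Hadoop": 120.0, "Storm": 45.0, "Spark": 60.0}  # seconds
free_memory = {"Hadoop": 64, "Storm": 2, "Spark": 32}          # GB remaining
required_memory = 8                                            # task demand, GB

def select_engine(predictions, free_memory, required_memory):
    for engine in sorted(predictions, key=predictions.get):
        if free_memory[engine] >= required_memory:
            return engine
    return None  # no engine can currently host the task

# Storm is fastest but lacks memory, so Spark is chosen.
print(select_engine(predictions, free_memory, required_memory))  # → Spark
```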
According to an embodiment of the invention, a multi-computing-engine platform is provided, which uses the computing engine selection method of the invention and can be applied to experimental data processing. As shown in Fig. 2, the multi-computing-engine platform of this embodiment comprises a user interaction module 210, a debugging task management module 220, a computing task management module 230, an engine management module 240, a container management module 250, a task execution module 260, an algorithm management module 270 with algorithm library 271, and a task information management module 280 with task information library 281.
The user interaction module 210 handles information exchange with the user. Specifically, Flask (a Web micro-framework written in Python) can be used as the back end of the user interaction module 210, with Bootstrap and jQuery used to build the Web pages, while the back end can expose interfaces in the RESTful software architecture style for secondary development. In practice the flow is: collect user operation instructions and input through the Web interface; convert user operations into network exchange messages in JSON (JavaScript Object Notation) format; call the routing interfaces of the user interaction module's back end via Ajax (asynchronous JavaScript and XML); the back end responds to the routing request, completes the specific function, and returns the processing result; the Web interface receives the result and responds to the user's operation.
The debugging task management module 220 pushes back-end algorithm output and debugging information to the Web interface in real time. For example, WebSocket can be used as the front-end/back-end communication protocol. The flow is: the user writes the algorithm and its debugging data in the Web interface; the user submits a debugging task; the back end executes the user's debugging task and returns the execution result and debugging information; the user confirms whether the algorithm is written correctly, returning to the first step if further debugging is needed; once the algorithm is confirmed to work, the user adds it to the algorithm library 271.
The computing task management module 230 manages the processing flow of computing tasks. For example, a directed acyclic graph can be used to orchestrate the experimental data processing flow, and the directed acyclic graph can be visualized in the Web interface so the user can check and confirm the flow information. Specifically, the steps of the computing task management module 230 are: the user adds computing tasks according to the experimental data flow; the user confirms or modifies the added computing tasks according to the visualized layout; once confirmed, the module can request the engine management module 240 for the task execution engine information; the container management module 250 is then called to execute the computing tasks.
The engine management module 240 selects computing engines and manages their state. Computing engines can be packaged with the open-source application container engine Docker, becoming individual containers, for resource management. The functions of the engine management module 240 include: calling the computing engine selection algorithm of the invention for each item of task information and selecting the optimal computing engine; judging whether the selected engine is active and activating it if it is in an inactive state; and returning the finally selected task execution computing engine information to the computing task management module 230.
The container management module 250 manages the container facilities. For example, docker-python can be used to manage the containers, each of which consists of one specific application together with the necessary dependency libraries. The functions of the container management module 250 include container addition, container start, container stop and container query, and it calls the task execution module 260 to complete the execution of calculating tasks.
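A minimal in-memory sketch of the add/start/stop/query interface described above (the class and method names are illustrative assumptions; a real implementation would delegate these operations to the docker SDK for Python rather than track states in a dictionary):

```python
class ContainerManager:
    """Illustrative sketch of the container-management interface.

    Each entry stands in for one container holding a specific application
    and its dependency libraries; a real module would call the docker SDK.
    """

    def __init__(self):
        self._containers = {}  # container name -> {"image": ..., "state": ...}

    def add(self, name, image):
        # Register a container in the "created" state.
        self._containers[name] = {"image": image, "state": "created"}

    def start(self, name):
        self._containers[name]["state"] = "running"

    def stop(self, name):
        self._containers[name]["state"] = "exited"

    def query(self, name):
        # Return the container record, or None if it does not exist.
        return self._containers.get(name)


mgr = ContainerManager()
mgr.add("engine-spark", "spark:3.3")   # hypothetical engine container
mgr.start("engine-spark")
print(mgr.query("engine-spark")["state"])  # running
```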
The task execution module 260 uniformly encapsulates calculating tasks with container technology, providing algorithm invocation, run monitoring and data collection. The functions of the task execution module 260 include: checking the integrity of the task data; determining whether the algorithm needs third-party libraries, and installing them if so; checking whether the format of the algorithm's input data matches the algorithm's requirements, and converting the input data format according to those requirements if not; executing the algorithm and recording its execution time; and collecting the algorithm's execution results and updating the calculating task information.
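The "execute the algorithm and record its execution time" step can be sketched as follows (a simplified illustration; `run_and_time` is a hypothetical helper, not the module's actual API):

```python
import time

def run_and_time(algorithm, data):
    """Execute an algorithm on its input data and record the wall-clock
    execution time, as the task execution module does for each task."""
    start = time.perf_counter()
    result = algorithm(data)
    elapsed = time.perf_counter() - start
    # The record would be collected and used to update the task information.
    return {"result": result, "execution_time_s": elapsed}

record = run_and_time(sorted, [3, 1, 2])
print(record["result"])  # [1, 2, 3]
```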
The algorithm management module 270 and algorithm library 271 manage the algorithms implemented on the multi-computing-engine platform. For example, the distributed-document database MongoDB together with HDFS can be used to provide distributed algorithm-library support, with algorithm addition, deletion and query functions.
The task information management module 280 and task information library 281 manage task information. For example, the relational database management system MySQL can provide the storage and management of task information, with functions such as task-information addition, deletion and query.
It should be noted that data-processing workflows change quickly. In order to save the resources of the multi-computing-engine platform and improve computational efficiency when processing actual test data, the computing engines used by the platform should change along with the experimental data-processing tasks. When the computing engines change, the states of the task-execution-time prediction models obtained according to the present invention need to be adjusted dynamically according to the change, so that the platform adapts to the changed engines. For example, when the changed engine is a newly added computing engine, a task-execution-time prediction model is trained for that engine on a training sample set. As another example, when a trained engine is replaced, the task-execution-time prediction model of the new computing engine is activated while the prediction model of the replaced engine is set to the inactive state.
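The adaptive handling of engine changes described above can be sketched as follows (the function names and the placeholder training routine are illustrative assumptions, not the platform's actual code):

```python
# Registry of task-execution-time prediction models, one per engine.
models = {}  # engine name -> {"model": ..., "active": bool}

def train_model(samples):
    # Placeholder for fitting a task-execution-time prediction model
    # on a training sample set (e.g. the linear regression of the claims).
    return {"trained_on": len(samples)}

def on_engine_change(new_engine, replaced_engine=None, samples=()):
    if new_engine not in models:
        # Newly added engine: train its prediction model, then activate it.
        models[new_engine] = {"model": train_model(samples), "active": True}
    else:
        # Already-trained engine: just activate its existing model.
        models[new_engine]["active"] = True
    if replaced_engine in models:
        # Deactivate the replaced engine's model instead of discarding it.
        models[replaced_engine]["active"] = False

on_engine_change("flink", samples=[1, 2, 3])
on_engine_change("spark", replaced_engine="flink", samples=[1])
print(models["flink"]["active"], models["spark"]["active"])  # False True
```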
In conclusion more computing engines platforms provided by the invention are easy-to-use, can towards experimental data processing demand, It supports the online layout of more calculating tasks, while efficiency optimization can be automatically selected based on computing engines selection method of the invention Computing engines support on-line Algorithm debugging, support computing engines automatic packaging and switching etc..Specifically, more calculating of the invention The beneficial effect of engine platform is mainly reflected in: using container technique encapsulation computing engines and calculating task, start and stop speed is fast, money Source consumption is low, can achieve the purpose being environmentally isolated between engine and task with resource constraint;It is called using general calculating task Process has unified the difference between different task, convenient to be managed collectively to calculating task;The online editing of algorithm is supported to survey Examination, can attempt to check algorithm debugging result, algorithm editorial efficiency can be greatly improved;The characteristics of towards experimental data processing, Calculating task is provided and visualizes layout, it is possible to reduce user's layout mistake improves experimental data processing flexibility;Task based access control is held Machine learning method can be used in more computing engines selection algorithms of row time prediction, the automatic operation rule for excavating computing engines Rule, for the highest computing engines of specific calculation task combination resource situation efficiency of selection, holds to substantially reduce calculating task The row time improves experimental data processing efficiency;It supports computing engines change, can be improved system flexibility, while to future Engine extension provides good support.
It should be noted that, although the steps are described above in a particular order, this does not mean that the steps must be executed in that particular order. In fact, some of these steps can be executed concurrently, or even in a changed order, as long as the required functions can be realized.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to realize aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store the instructions used by an instruction execution device. The computer-readable storage medium may include, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical coding device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the above.
Various embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes are obvious to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein are chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies on the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A computing engine selection method, comprising the following steps:
Step 1: inputting the task characteristic data corresponding to a task to be computed into the task-execution-time prediction model of each of a plurality of computing engines, to obtain a task-execution-time prediction result of the task to be computed on each computing engine, wherein each task-execution-time prediction model is obtained by training based on a training sample set, the training sample set comprising a plurality of task characteristic data items and the corresponding task execution times;
Step 2: selecting, from the plurality of computing engines according to the task-execution-time prediction results, the computing engine that will execute the task to be computed.
2. The method according to claim 1, wherein the task characteristic data include at least one of algorithm type, algorithm parameters, data type, data volume and data storage location.
3. The method according to claim 1, wherein the training sample set of a computing engine is constructed by the following steps:
Step 31: collecting a plurality of task description data items that describe task information;
Step 32: executing the task corresponding to each task description data item with the computing engine, obtaining the task execution time corresponding to each task description data item;
Step 33: extracting from each task description data item the features that influence the task execution time to form the task characteristic data, and combining these with the obtained task execution times to construct the training sample set of the computing engine.
4. The method according to claim 1, wherein the task-execution-time prediction model of a computing engine is obtained by executing the following steps:
Step 41: based on the training sample set of the computing engine, establishing a linear regression model with the task characteristic data as the independent variables and the task execution time as the dependent variable, expressed as:
y_i = β_0 + β_1·x_i1 + … + β_p·x_ip,  i = 1, 2, …, n
wherein x_i1 to x_ip denote the task features contained in the training sample set of the computing engine, i is the index of a sample data item in the training sample set of the computing engine, n is the number of sample data items in the training sample set of the computing engine, β_0 is the bias to be optimized, and β_1 to β_p are the weights to be optimized;
Step 42: solving for the optimal weights and bias of the linear regression model with the least squares method;
Step 43: expressing the linear regression model with the obtained optimal weights and bias, obtaining the task-execution-time prediction model of the computing engine.
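For a single task feature (p = 1), the least-squares solution of Step 42 has a closed form. A minimal sketch with synthetic data (the feature values and execution times below are made up for illustration and are not from the patent):

```python
# Synthetic training samples: one task feature (e.g. data volume) and the
# measured task execution time on one hypothetical computing engine.
x = [1.0, 2.0, 3.0, 4.0]   # task feature values
y = [2.1, 4.0, 6.1, 7.9]   # task execution times

# Closed-form least-squares estimates for y = beta0 + beta1 * x.
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
beta1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
beta0 = y_mean - beta1 * x_mean

def predict(feature):
    """Task-execution-time prediction model of Step 43."""
    return beta0 + beta1 * feature

print(round(beta1, 3), round(beta0, 3))  # 1.95 0.15
```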
5. The method according to any one of claims 1 to 4, wherein step 2 comprises the following sub-steps:
Step 51: selecting the computing engine with the shortest predicted execution time; or
Step 52: when the remaining resources of the computing engine with the shortest predicted execution time cannot support the task to be computed, selecting computing engines in increasing order of predicted execution time according to the task-execution-time prediction results.
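The selection strategy of sub-steps 51 and 52 can be sketched as follows (the engine names, predicted times and the memory-based resource check are illustrative assumptions, not prescribed by the claims):

```python
# Predicted execution times (sub-step 51) and remaining resources for each
# hypothetical engine; both would come from the prediction models and the
# engine management module in practice.
predicted = {"spark": 12.0, "flink": 9.5, "mapreduce": 20.0}  # seconds
free_mem = {"spark": 8.0, "flink": 1.0, "mapreduce": 16.0}    # GB

def select_engine(required_mem):
    # Try engines in increasing order of predicted execution time,
    # skipping any whose remaining resources cannot support the task
    # (sub-step 52's fallback).
    for engine in sorted(predicted, key=predicted.get):
        if free_mem[engine] >= required_mem:
            return engine
    return None  # no engine can support the task

print(select_engine(4.0))  # flink is fastest but lacks memory -> spark
```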
6. A multi-computing-engine platform, characterized by comprising:
a calculating task management module, for managing the processing flow of calculating tasks and generating calculating task information;
an engine management module, for selecting a computing engine according to the calculating task information from the calculating task management module, using the method of any one of claims 1 to 5;
a task execution module, for executing calculating tasks and outputting the task execution time.
7. The multi-computing-engine platform according to claim 6, characterized by further comprising:
a container management module, for calling the task execution module to execute calculating tasks;
a user interaction module, for receiving user operation instructions and information;
a debugging task management module, for executing user debugging tasks and outputting debugging messages.
8. The platform according to claim 7, characterized in that, when the computing engines comprised in the platform change, the engine management module activates the task-execution-time prediction model of the new computing engine and sets the task-execution-time prediction model of the replaced computing engine to the inactive state.
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
10. A computer device, comprising a memory and a processor, the memory storing a computer program runnable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 5.
CN201810734031.8A 2018-07-06 2018-07-06 Computing engines selection method and more computing engines platforms based on this method Pending CN108985367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810734031.8A CN108985367A (en) 2018-07-06 2018-07-06 Computing engines selection method and more computing engines platforms based on this method


Publications (1)

Publication Number Publication Date
CN108985367A true CN108985367A (en) 2018-12-11

Family

ID=64536300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810734031.8A Pending CN108985367A (en) 2018-07-06 2018-07-06 Computing engines selection method and more computing engines platforms based on this method

Country Status (1)

Country Link
CN (1) CN108985367A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109806590A (en) * 2019-02-21 2019-05-28 腾讯科技(深圳)有限公司 Object control method and apparatus, storage medium and electronic device
CN110362611A (en) * 2019-07-12 2019-10-22 拉卡拉支付股份有限公司 A kind of data base query method, device, electronic equipment and storage medium
CN110727697A (en) * 2019-08-29 2020-01-24 北京奇艺世纪科技有限公司 Data processing method and device, storage medium and electronic device
WO2020125182A1 (en) * 2018-12-19 2020-06-25 Oppo广东移动通信有限公司 Algorithm processing method and apparatus, and storage medium and terminal device
CN111401560A (en) * 2020-03-24 2020-07-10 北京觉非科技有限公司 Inference task processing method, device and storage medium
CN111723112A (en) * 2020-06-11 2020-09-29 咪咕文化科技有限公司 Data task execution method and device, electronic equipment and storage medium
CN112558938A (en) * 2020-12-16 2021-03-26 中国科学院空天信息创新研究院 Machine learning workflow scheduling method and system based on directed acyclic graph
CN113139205A (en) * 2021-04-06 2021-07-20 华控清交信息科技(北京)有限公司 Secure computing method, general computing engine, device for secure computing and secure computing system
WO2023272853A1 (en) * 2021-06-29 2023-01-05 未鲲(上海)科技服务有限公司 Ai-based sql engine calling method and apparatus, and device and medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6728869B1 (en) * 2000-04-21 2004-04-27 Ati International Srl Method and apparatus for memory latency avoidance in a processing system
CN102736896A (en) * 2011-03-29 2012-10-17 国际商业机器公司 Run-ahead approximated computations
CN104834561A (en) * 2015-04-29 2015-08-12 华为技术有限公司 Data processing method and device
CN105404611A (en) * 2015-11-09 2016-03-16 南京大学 Matrix model based multi-calculation-engine automatic selection method
CN105900127A (en) * 2013-09-11 2016-08-24 芝加哥期权交易所 System and method for determining a tradable value
CN106649503A (en) * 2016-10-11 2017-05-10 北京集奥聚合科技有限公司 Query method and system based on sql
CN106649119A (en) * 2016-12-28 2017-05-10 深圳市华傲数据技术有限公司 Stream computing engine testing method and device
CN107077385A (en) * 2014-09-10 2017-08-18 亚马逊技术公司 Calculated examples start the time
US9823968B1 (en) * 2015-08-21 2017-11-21 Datadirect Networks, Inc. Data storage system employing a variable redundancy distributed RAID controller with embedded RAID logic and method for data migration between high-performance computing architectures and data storage devices using the same


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VIKTOR FARCIC: "《微服务运维实战》", 30 June 2018 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020125182A1 (en) * 2018-12-19 2020-06-25 Oppo广东移动通信有限公司 Algorithm processing method and apparatus, and storage medium and terminal device
WO2020168877A1 (en) * 2019-02-21 2020-08-27 腾讯科技(深圳)有限公司 Object control method and apparatus, storage medium and electronic apparatus
CN109806590A (en) * 2019-02-21 2019-05-28 腾讯科技(深圳)有限公司 Object control method and apparatus, storage medium and electronic device
US11938400B2 (en) 2019-02-21 2024-03-26 Tencent Technology (Shenzhen) Company Limited Object control method and apparatus, storage medium, and electronic apparatus
KR20210064373A (en) * 2019-02-21 2021-06-02 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 Object control method and apparatus, storage medium and electronic device
KR102549758B1 (en) * 2019-02-21 2023-06-29 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 Object control method and device, storage medium and electronic device
CN110362611A (en) * 2019-07-12 2019-10-22 拉卡拉支付股份有限公司 A kind of data base query method, device, electronic equipment and storage medium
CN110727697B (en) * 2019-08-29 2022-07-12 北京奇艺世纪科技有限公司 Data processing method and device, storage medium and electronic device
CN110727697A (en) * 2019-08-29 2020-01-24 北京奇艺世纪科技有限公司 Data processing method and device, storage medium and electronic device
CN111401560A (en) * 2020-03-24 2020-07-10 北京觉非科技有限公司 Inference task processing method, device and storage medium
CN111723112A (en) * 2020-06-11 2020-09-29 咪咕文化科技有限公司 Data task execution method and device, electronic equipment and storage medium
CN112558938B (en) * 2020-12-16 2021-11-09 中国科学院空天信息创新研究院 Machine learning workflow scheduling method and system based on directed acyclic graph
CN112558938A (en) * 2020-12-16 2021-03-26 中国科学院空天信息创新研究院 Machine learning workflow scheduling method and system based on directed acyclic graph
CN113139205A (en) * 2021-04-06 2021-07-20 华控清交信息科技(北京)有限公司 Secure computing method, general computing engine, device for secure computing and secure computing system
WO2023272853A1 (en) * 2021-06-29 2023-01-05 未鲲(上海)科技服务有限公司 Ai-based sql engine calling method and apparatus, and device and medium

Similar Documents

Publication Publication Date Title
CN108985367A (en) Computing engines selection method and more computing engines platforms based on this method
US11790161B2 (en) Machine learning selection and/or application of a data model defined in a spreadsheet
US11080435B2 (en) System architecture with visual modeling tool for designing and deploying complex models to distributed computing clusters
CN112199086B (en) Automatic programming control system, method, device, electronic equipment and storage medium
CN110192210A (en) Building and processing are used for the calculating figure of dynamic, structured machine learning model
CN108037919A (en) A kind of visualization big data workflow configuration method and system based on WEB
CN107678790A (en) Flow calculation methodologies, apparatus and system
CN103064670B (en) Innovation platform data managing method based on position net and system
CN110941467A (en) Data processing method, device and system
US20170139685A1 (en) Visual software modeling method to construct software views based on a software meta view
CN109448100A (en) Threedimensional model format conversion method, system, computer equipment and storage medium
CN116127899B (en) Chip design system, method, electronic device, and storage medium
CN117992078B (en) Automatic deployment method for reasoning acceleration service based on TensorRT-LLM model
CN105940636A (en) Technologies for cloud data center analytics
US20240020556A1 (en) Information processing method and apparatus, server, and user device
CN112130812B (en) Analysis model construction method and system based on data stream mixed arrangement
CN116227565A (en) Compiling optimization system and neural network accelerator with variable precision
CN115794106A (en) Method and system for analyzing configuration of binary protocol data of rail transit
CN106575241A (en) Mobile and remote runtime integration
CN109902085A (en) A kind of configuration storage organization optimization method and system
CN116911757A (en) Service realization method, device and storage medium
US11775264B2 Efficient deployment of machine learning and deep learning model's pipeline for serving service level agreement
Mileff Design and development of a web-based graph editor and simulator application
US11909626B1 (en) Identifying switchable elements to isolate a location from sources
CN118245238B (en) Heterogeneous computing power equipment selection method and device apparatus, medium, product, and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20181211