CN108985367A - Computing engine selection method and multi-engine computing platform based on the method
- Publication number: CN108985367A
- Application number: CN201810734031.8A
- Authority
- CN
- China
- Prior art keywords
- task
- computing engines
- execution time
- task execution
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Abstract
The present invention provides a computing engine selection method and a multi-engine computing platform based on the method. The method comprises: inputting task feature data corresponding to a task to be computed into a task execution time prediction model for each of a plurality of computing engines, thereby obtaining a predicted execution time for the task on each computing engine, wherein each task execution time prediction model is obtained by training on a training sample set comprising a plurality of task feature records and their corresponding task execution times; and selecting, from the plurality of computing engines according to the predicted execution times, the computing engine that will execute the task. The method of the invention can automatically select an efficient computing engine and thereby reduce task execution time.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a computing engine selection method and a multi-engine computing platform based on the method.
Background art
With the nation's development of large amounts of new equipment in domains such as sea, air, space, and deep sea, equipment testing has become increasingly important. For example, during the development of the J-10 fighter, some ten thousand wind tunnel tests were carried out, yielding millions of aerodynamic data points, and the processing and analysis of these data became an important foundation of the J-10's successful development. Equipment testing comprises two processes, "test" and "evaluation": the former is a means of obtaining data, after which the various data are analyzed, processed, and compared to support decision making. At present, test data processing still relies mainly on expert experience and computer-aided processing, which no longer satisfies current needs; moreover, because test data processing must handle data volumes of different scales, mixtures of structured and unstructured processing, and combinations of real-time and offline processing, a single engine cannot cope with the processing demands of all kinds of tests. Three approaches to this problem currently exist. The first is manual management of multiple engines: the computing engines are deployed separately, and engines and computing tasks are managed by hand. This requires substantial manpower, is inefficient, and causes a huge waste of resources whenever the system is not kept fully loaded. The second is a "super" engine supporting every kind of computing demand: a specially deployed engine supporting all processing modes, which could satisfy all test data processing needs; at present, however, this approach is immature and far from large-scale use. The third is a compromise between the former two: a single computing platform supporting multiple engines, which on the one hand can employ the various mature computing engine technologies available today, and on the other hand manages engines and computing tasks automatically, improving resource utilization and task execution efficiency. In short, among these three approaches, manual management of multiple engines is inefficient and a "super" engine cannot meet pressing needs in the near term, so a single platform with multiple computing engines is the solution that currently balances efficiency and feasibility.
However, a multi-engine platform must solve the problems of engine compatibility, unified management of computing tasks, and extension to future engines, and should therefore be able to select the task execution engine automatically in order to improve platform efficiency. No existing platform supporting multiple computing engines solves all of these problems. For example, Twitter's SummingBird uses the Lambda architecture to integrate a distributed batch engine (Hadoop) and a distributed stream engine (Storm), and can merge batch and stream results when serving a request, but it has no convenient engine management mechanism and provides no isolation of engine runtime environments. Apache Ambari is Web-based and supports provisioning, management, and monitoring of the Apache Hadoop ecosystem, with custom interfaces for adding various single-machine or distributed engines, but it provides no unified computing task management, guarantees compatibility only for particular engines, and requires manual selection of the engine that executes a task. Google Kubernetes, built on Docker, can run computing engines as containers, supports single-machine and distributed engines on demand, and provides container deployment, scheduling, and inter-node cluster scaling, but it has no task management mechanism and likewise requires manual engine selection.
Improvements over the prior art are therefore needed, namely a multi-engine platform and a method for automatically selecting computing engines on such a platform.
Summary of the invention
The object of the present invention is to overcome the above defects of the prior art by providing a computing engine selection method and a multi-engine computing platform based on the method.
According to the first aspect of the invention, a computing engine selection method is provided. The method comprises the following steps:
Step 1: input the task feature data corresponding to a task to be computed into the task execution time prediction model of each of a plurality of computing engines, obtaining a predicted execution time for the task on each computing engine, wherein each task execution time prediction model is obtained by training on a training sample set comprising a plurality of task feature records and their corresponding task execution times;
Step 2: select, according to the predicted execution times, the computing engine that will execute the task from among the plurality of computing engines.
In one embodiment, the task feature data includes at least one of algorithm type, algorithm parameters, data type, data volume, and data storage location.
In one embodiment, the training sample set for a computing engine is constructed by the following steps:
Step 31: collect a plurality of task description records, each describing the information of a task;
Step 32: execute the task corresponding to each task description record on the computing engine, obtaining the task execution time corresponding to each record;
Step 33: extract from each task description record the features that influence task execution time to form the task feature data, and combine them with the obtained execution times to construct the engine's training sample set.
In one embodiment, the task execution time prediction model for a computing engine is obtained by the following steps:
Step 41: based on the engine's training sample set, with the task feature data as independent variables and the task execution time as the dependent variable, establish a linear regression model, expressed as:

y_i = β_0 + β_1·x_{i1} + … + β_p·x_{ip},  i = 1, 2, …, n

where x_{i1} to x_{ip} denote the task features contained in the engine's training sample set, i indexes the sample records in the set, n is the number of sample records, β_0 is the bias to be optimized, and β_1 to β_p are the weights to be optimized;
Step 42: solve for the optimal weights and bias of the linear regression model by the method of least squares;
Step 43: express the linear regression model with the obtained optimal weights and bias, yielding the engine's task execution time prediction model.
In one embodiment, step 2 comprises the following sub-steps:
Step 51: select the computing engine with the shortest predicted execution time; or
Step 52: when the remaining resources of the engine with the shortest predicted execution time cannot support the task to be computed, select engines in order of increasing predicted execution time according to the prediction results.
According to the second aspect of the invention, a multi-engine computing platform is provided. The platform comprises:
A computing task management module: for managing the processing flow of computing tasks and generating computing task information;
An engine management module: for selecting a computing engine, according to the computing task information from the computing task management module, using the computing engine selection method of the invention;
A task execution module: for executing computing tasks and outputting task execution times.
In one embodiment, the multi-engine platform of the invention further comprises:
A container management module: for calling the task execution module to execute computing tasks;
A user interaction module: for receiving user operation instructions and information;
A debugging task management module: for executing user debugging tasks and outputting debugging information.
In one embodiment, when the computing engines included in the platform change, the engine management module activates the task execution time prediction model of the newly added computing engine and sets the task execution time prediction model of the replaced computing engine to an unactivated state.
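The activation bookkeeping of this embodiment can be sketched as follows. This is a minimal illustration, not the invention's specification: the class, method names, and engine names are assumptions.

```python
# Hypothetical sketch: when the platform's engines change, the new engine's
# prediction model is activated and the replaced engine's model is set to
# the unactivated state. All names are illustrative assumptions.
class ModelRegistry:
    def __init__(self):
        self.active = {}  # engine name -> activation state of its model

    def register(self, engine):
        # A newly added engine's model starts unactivated (it must first be
        # trained on a training sample set).
        self.active.setdefault(engine, False)

    def swap(self, old_engine, new_engine):
        # Activate the new engine's model, deactivate the replaced one's.
        self.register(new_engine)
        self.active[new_engine] = True
        if old_engine in self.active:
            self.active[old_engine] = False

registry = ModelRegistry()
registry.register("storm")
registry.active["storm"] = True
registry.swap("storm", "flink")
print(registry.active)  # {'storm': False, 'flink': True}
```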
Compared with the prior art, the advantages of the present invention are as follows: the provided computing engine selection method uses machine learning to construct task execution time prediction models for multiple computing engines and, based on the models' predictions combined with the resource situation, automatically selects the most efficient engine, which can substantially reduce task execution time and improve the efficiency of test data processing; the provided multi-engine platform based on this selection method offers a task management mechanism, supports changes of computing engines, improves flexibility, and provides good support for the extension of future engines.
Brief description of the drawings
The following drawings give only a schematic description and explanation of the present invention and are not intended to limit its scope, in which:
Fig. 1 shows the flow chart of a computing engine selection method according to an embodiment of the invention;
Fig. 2 shows the schematic block diagram of a multi-engine computing platform according to an embodiment of the invention.
Detailed description of embodiments
In order to make the objects, technical solutions, design methods, and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments in conjunction with the accompanying drawings. It should be appreciated that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.
According to one embodiment of the present invention, a computing engine selection method for a multi-engine platform is provided. In short, the method comprises: collecting the task execution data of multiple computing engines to construct training sample sets; training task execution time prediction models on the constructed sample sets by machine learning; and using the trained models to predict the task execution time of each computing engine and thereby select a suitable engine. Specifically, referring to Fig. 1, the computing engine selection method of the invention comprises the following steps:
Step S110: collect the task execution data of multiple computing engines to construct training sample sets.
In this step, runtime data are collected for each computing engine facing computing tasks under different conditions, covering each class of task as far as possible, so as to construct comprehensive training sample sets.
According to one embodiment of the present invention, the process of constructing a training sample set comprises the following sub-steps:
Step S111: prepare test data.
Collect the algorithms to be tested and prepare appropriate test data for each algorithm. For example, the amount of test data can be determined from the data volume the training platform typically needs to process, or an upper limit on the total data volume per algorithm can be set according to the actual situation.
Step S112: prepare task description data.
Task description data describe the information of the task to be executed, for example the algorithm to execute and the algorithm's parameters. A computing engine can execute a specific task according to its task description data.
In one embodiment, a task description record is defined as a six-tuple Task_Info, comprising <Task_ID, Algorithm_ID, Algorithm_Args, Data_Type, Data_Size, Data_Path>, wherein Task_ID denotes the task serial number, an integer; Algorithm_ID denotes the algorithm serial number, an integer that can be mapped to a specific algorithm (the algorithms include, for example, the FP-Growth algorithm, the K-Means algorithm, the PageRank algorithm, and the Pearson correlation coefficient algorithm); Algorithm_Args denotes the algorithm parameters, a JSON-encoded string (taking K-Means as an example, the parameters may include the number of clusters k, the initialization mode initMode, and the maximum number of iterations maxItr); Data_Type denotes the data type, a numerically labeled discrete value; Data_Size denotes the data volume, an integer in bytes; and Data_Path denotes the data storage location, a numerically labeled discrete value whose values include local file system and distributed file system.
It should be noted that the task description data so defined may include any other content that influences task execution time; for example, besides the algorithm serial number and algorithm parameters above, they may also include the task execution priority. In addition, the contents of the algorithm parameters differ between algorithms.
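As an illustration, the six-tuple above can be represented as a small record type. This is a sketch only: the Python field names and the concrete example values below are assumptions for illustration, not part of the invention.

```python
import json
from dataclasses import dataclass

# Hypothetical representation of the six-tuple
# <Task_ID, Algorithm_ID, Algorithm_Args, Data_Type, Data_Size, Data_Path>.
@dataclass
class TaskInfo:
    task_id: int          # task serial number (integer)
    algorithm_id: int     # algorithm serial number, maps to a concrete algorithm
    algorithm_args: str   # JSON-encoded parameter string
    data_type: int        # numeric label for a discrete data-type value
    data_size: int        # data volume in bytes
    data_path: str        # "local file system" or "distributed file system"

# Example: a K-Means task with k=5 clusters (all values fabricated).
kmeans_task = TaskInfo(
    task_id=1,
    algorithm_id=2,
    algorithm_args=json.dumps({"k": 5, "initMode": "random", "maxItr": 100}),
    data_type=0,
    data_size=1_048_576,
    data_path="local file system",
)
print(kmeans_task.algorithm_args)
```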
Step S113: execute computing tasks according to the task description data to obtain task execution time data for the multiple computing engines.
For each computing engine, execute computing tasks using the prepared task description records Task_Info, collect the execution time Run_Time of each computing task, and form the two-tuple <Task_Info, Run_Time> from each description record and its execution time, yielding the task execution data.
Step S114: perform data cleansing on the task execution data.
The purpose of data cleansing is to reject possibly erroneous or incomplete records. For example, a statistical analysis of all task execution data yields the standard deviation of task execution time; if a task's execution time deviates from the mean execution time by more than 3 standard deviations, it is labeled as abnormal data and rejected. In addition, records with missing attribute columns, i.e. a missing task execution time or missing task description information, are also rejected.
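The two cleansing rules above, the 3-standard-deviation cut and rejection of records with missing attributes, can be sketched as follows. The record layout and all sample values are assumptions for illustration.

```python
from statistics import mean, stdev

def clean(records):
    """records: list of (task_info, run_time) pairs; run_time may be None."""
    # Rule 1: reject records with a missing execution time.
    complete = [(info, t) for info, t in records if t is not None]
    # Rule 2: reject records more than 3 standard deviations from the mean.
    times = [t for _, t in complete]
    mu, sigma = mean(times), stdev(times)
    return [(info, t) for info, t in complete if abs(t - mu) <= 3 * sigma]

# 20 ordinary samples around 10 s, one missing value, one gross outlier.
records = [(f"task{i}", t) for i, t in enumerate([9.0, 9.5, 10.0, 10.5, 11.0] * 4)]
records.append(("task_missing", None))   # missing execution time -> rejected
records.append(("task_outlier", 500.0))  # gross outlier -> rejected by the 3-sigma rule
cleaned = clean(records)
print(len(cleaned))  # 20
```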
Step S115: feature engineering.
The purpose of feature engineering is to select from the task description data the features that significantly affect task execution time, so as to construct the training sample set. A training sample set of the invention comprises task feature data and the corresponding task execution times, the task feature data containing the multiple task features that influence execution time.
According to one embodiment of the present invention, the Task_ID in the task description record Task_Info has no influence on execution time and can therefore be discarded, whereas the algorithm serial number (i.e. algorithm type), algorithm parameters, data storage location, data type, and data volume all affect execution time and are therefore retained as the task features of the training sample set.
When constructing final training sample set, for the task feature of discretization, if by the way of serial number code,
Then the sequence of serial number influences whether unordered discrete magnitude, causes additional information input.Therefore, an implementation according to the present invention
The mode of one-hot coding (one-hot coding) can be used to encode discrete data for example.For example, for a task description
Data, discrete features include data storage position, and the value of the discrete features has " local file system " and " distributed field system
System ".After being encoded using one-hot, which becomes two independent features: " data storage position-local " and " data
Storage position-distribution ".When former data storage position feature value is " local file system ", " data storage position-sheet
Ground " feature takes 1, and " data storage position-distribution " feature value is 0;When former data storage position feature value is " distributed
When file system ", " data storage position-local " feature value is 0, and " data storage position-distribution " spy value is 1.Class
As, algorithm serial number Algorithm_ID can also be encoded using one-hot, i.e., when including four kinds of algorithms, the discrete features
Become four independent features.
For the sake of clarity, the following table 1 illustrates the example of the training sample set of building.
Table 1: training sample set example
Table 1 illustrates the training sample set of computing engines 1, it should be noted that task characteristic therein is according to reality
Influence situation when the experiment process of border to task execution time, may include algorithm types, algorithm parameter, data storage position,
In data type, data volume at least one of or can also increase other task features.In addition, when being compiled using one-hot
Code is when encoding the discrete features of task characteristic, which will become multiple independent features, but this hair
It is bright to be not limited to encode discrete features using one-hot coding mode.
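The one-hot expansion described above can be sketched as follows; the helper function and the category lists are illustrative assumptions.

```python
# Expand one discrete value into a vector of independent 0/1 features,
# one position per category, as described for Data_Path and Algorithm_ID.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

PATHS = ["local file system", "distributed file system"]
print(one_hot("local file system", PATHS))        # [1, 0]
print(one_hot("distributed file system", PATHS))  # [0, 1]

# Algorithm_ID with four algorithms expands to four independent features:
ALGOS = [1, 2, 3, 4]
print(one_hot(3, ALGOS))                          # [0, 0, 1, 0]
```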
Step S120: train the task execution time prediction models on the constructed training sample sets.
In this step, machine learning is applied to the training sample sets to obtain the task execution time prediction model of each computing engine.
For example, machine learning models such as linear regression, gradient boosted regression trees (GBRT), or XGBoost can be used for training.
In one embodiment, a linear regression model is trained, with the task feature data in the training sample set as the independent variables and the task execution time as the dependent variable. For example, the linear regression model may be expressed as:
Y = e + a1·X1 + a2·X2 + a3·X3 + a4·X4 + a5·X5 + a6·X6 + a7·X7    (1)

where X1 represents the data volume, X2 and X3 represent the data storage location features after one-hot encoding, X4 to X7 represent the algorithm serial number Algorithm_ID features after one-hot encoding, Y represents the task execution time, a1 to a7 are the weights to be optimized, and e is the bias to be optimized.
In general, the p-variable linear regression model can be expressed as:

y_i = β_0 + β_1·x_{i1} + … + β_p·x_{ip},  i = 1, 2, …, n    (2)

where p is the number of features in the task feature data of the training sample set, i indexes the records in the set, n is the number of records in the set, β_0 is the bias to be optimized, β_1 to β_p are the weights to be optimized, and x_{i1} to x_{ip} are the task features of the i-th record.
In training, the optimal weights and bias can be obtained by the method of least squares, whose objective is to minimize the sum of squared errors:

Q(β) = Σ_{i=1}^{n} (y_i - β_0 - β_1·x_{i1} - … - β_p·x_{ip})²    (3)

Then, taking the partial derivative of Q with respect to each parameter and setting it to zero,

∂Q/∂β_j = 0,  j = 0, 1, …, p    (4)

yields the system of normal equations (5), which can be written in matrix form as:

X′Xβ = X′Y    (6)

whence the solution for the parameters (comprising the weights and the bias) is obtained:

β̂ = (X′X)⁻¹X′Y    (7)

In this step, the optimal weights and bias are obtained by training; the model expressed with these optimal weights and bias is the task execution time prediction model, and in this way the task execution time prediction model of each computing engine can be obtained.
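The least-squares fit of equations (2), (6), and (7) can be sketched with NumPy as follows. The toy data and single-feature layout are fabricated for illustration, under the assumption that NumPy is available.

```python
import numpy as np

# Fit y = beta_0 + beta_1*x_1 + ... + beta_p*x_p by least squares; lstsq
# solves the normal equations X'X beta = X'Y of Eq. (6)-(7) numerically.
def fit(X, y):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta                                     # [beta_0, beta_1, ..., beta_p]

def predict(beta, x):
    return beta[0] + x @ beta[1:]

# Toy training set: execution time = 2 + 3 * data volume, exactly linear.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([5.0, 8.0, 11.0, 14.0])
beta = fit(X, y)
print(predict(beta, np.array([5.0])))  # approximately 17.0
```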
Step S130: use the trained task execution time prediction models to predict the task execution time on each computing engine, and then select a suitable engine.
When a new computing task must be executed, task feature data are first generated from the task's attributes and then input in turn into the task execution time prediction model of each computing engine, yielding a predicted execution time for each engine; finally, the most suitable engine is selected to execute the task in light of the system resource situation and the task's demands. According to one embodiment of the present invention, this comprises the following sub-steps:
Step S131: generate the task feature data of the computing task.
Generating the task feature data of the task to be computed is similar to generating task feature data when constructing the training sample set; for example, one-hot coding can be used for the discrete features, and finally it is verified that the converted task feature data conform to the input of the task execution time prediction models.
Step S132: obtain the task execution time predictions.
The feature data of the task to be predicted are input to the task execution time prediction model Model_i of each computing engine, yielding the predicted execution time P_Time_i of the task on the i-th engine; the predictions of all engines are thereby obtained.
Step S133: select the computing engine that will execute the task according to the prediction results.
A suitable engine is selected to execute the computing task according to each engine's predicted execution time combined with its resource usage. For example, select the engine with the shortest predicted time among all predictions and judge whether its remaining resources can support running the task; if they cannot, choose the engine with the next shortest predicted time and judge its resources, and so on until the task's demands are met, whereupon that engine is selected to run the task.
According to an embodiment of the present invention, a multi-engine computing platform is provided which employs the computing engine selection method of the invention and can be applied to test data processing. Referring to Fig. 2, the multi-engine platform of this embodiment comprises a user interaction module 210, a debugging task management module 220, a computing task management module 230, an engine management module 240, a container management module 250, a task execution module 260, an algorithm management module 270 with algorithm library 271, and a task information management module 280 with task information library 281.
The user interaction module 210 handles the exchange of information with the user. Specifically, Flask (a Web micro-framework written in Python) can be used as the back end of the user interaction module 210, with Bootstrap and jQuery used to build the Web pages, while the back end can expose interfaces in the RESTful software architecture style for secondary development. In practice, the concrete flow includes: collecting user operation instructions and input information through the Web interface; converting user operations into network exchange messages in JSON (JavaScript Object Notation) format; calling the routing interfaces of the back end of the user interaction module 210 via Ajax (asynchronous JavaScript and XML); having the back end respond to routing requests, complete the specific function, and return the processing result; and having the Web interface receive the result and respond to the user's operation.
The debugging task management module 220 pushes back-end algorithm output and debugging information to the Web interface in real time. For example, WebSocket can be used as the front-to-back-end communication protocol; the concrete flow includes: the user writes the algorithm and its debugging data in the Web interface; the user submits a debugging task; the back end executes the user's debugging task and returns the execution result and debugging information; the user confirms how the algorithm was written, returning to the first step if further debugging is needed; once the algorithm is confirmed to work, the user adds it to the algorithm library 271.
The computing task management module 230 manages the processing flow of computing tasks. For example, a directed acyclic graph can be used to orchestrate the test data processing flow, and the graph can be visualized in the Web interface so that the user can check and confirm the flow information. Specifically, the steps of the computing task management module 230 include: the user adds computing tasks according to the test data flow; the user confirms or modifies the added tasks according to the visualized layout; once everything is confirmed, the module requests the engine management module 240 to obtain the task execution engine information; and the container management module 250 is called to execute the computing task.
The engine management module 240 selects computing engines and manages their state. The computing engines can be packaged with the open-source application container engine Docker into individual containers for resource management. The functions of the engine management module 240 include: calling the computing engine selection algorithm of the invention on each piece of task information to select the optimal computing engine; judging whether the selected engine is active, and activating it if it is in the unactivated state; and returning the finally chosen task execution engine information to the computing task management module 230.
The container management module 250 manages container functions; for example, docker-python can be used to manage the containers, each container being composed of one specific application and its necessary dependency libraries. The functions of the container management module 250 include container addition, container start, container stop, container query, and the like, as well as calling the task execution module 260 to complete the execution of computing tasks.
The task execution module 260 uniformly encapsulates calculating tasks using container technology, and provides algorithm invocation, operation monitoring, and data collection functions. The functions of the task execution module 260 include: checking task data integrity; judging whether the algorithm requires third-party libraries, and installing them if needed; judging whether the algorithm input data format is consistent with the algorithm requirements, and converting the input data format according to the algorithm requirements if it is not; executing the algorithm and recording the algorithm execution time; and collecting the algorithm execution results and updating the calculating task information.
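The per-task flow just listed (integrity check, format conversion, timed execution, result collection) can be sketched as follows; `run_task` and its parameters are invented for illustration, and the dependency-installation step is omitted for brevity.

```python
import time

def run_task(data, algorithm, required_type=list, convert=None):
    # 1. Check task data integrity (here: simply non-empty).
    if not data:
        raise ValueError("task data is incomplete")
    # 2. Convert the input data format if it does not match the
    #    algorithm's requirements and a converter is available.
    if not isinstance(data, required_type) and convert is not None:
        data = convert(data)
    # 3. Execute the algorithm and measure its execution time.
    start = time.perf_counter()
    result = algorithm(data)
    elapsed = time.perf_counter() - start
    # 4. Collect the result and the measured time for updating task info.
    return {"result": result, "execution_time": elapsed}
```

The measured `execution_time` values are exactly what the training sample sets of the prediction models are built from.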
The algorithm management module 270 and the algorithm library 271 are used for managing the algorithms implemented by the multi-computing-engine platform; for example, the distributed document storage database MongoDB and HDFS can be used to provide distributed algorithm library support, with algorithm addition, deletion, and query functions.
The task information management module 280 and the task information library 281 are used for managing task information; for example, the relational database management system MySQL can be used to provide storage and management of task information, with functions such as task information addition, deletion, and query.
It should be noted that data processing flow methods change rapidly. In order to save the resources of the multi-computing-engine platform and improve computational efficiency during actual test data processing, the computing engines used by the platform should change with the experimental data processing tasks. When a computing engine changes, the states of the task execution time prediction models obtained according to the present invention need to be adjusted dynamically according to the engine change, so as to adapt to it. For example, when the changed computing engine is a newly added one, a task execution time prediction model is obtained for that engine by training on a training sample set. As another example, when a trained engine is replaced, the task execution time prediction model of the new computing engine is activated, while the task execution time prediction model of the replaced engine is set to an unactivated state.
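The engine-change bookkeeping just described can be sketched as a small registry; all class and method names below are invented for illustration, not the patent's implementation.

```python
class PredictionModelRegistry:
    """Hypothetical bookkeeping for engine changes: train a model for a
    newly added engine; on replacement, activate the new engine's model
    and set the replaced engine's model to the unactivated state."""

    def __init__(self):
        self.models = {}     # engine name -> trained prediction model
        self.active = set()  # engines whose models are currently activated

    def add_engine(self, name, train_fn, samples):
        # Newly added engine: obtain its model by training on the sample set.
        self.models[name] = train_fn(samples)
        self.active.add(name)

    def replace_engine(self, old_name, new_name, train_fn, samples):
        self.add_engine(new_name, train_fn, samples)
        # The replaced engine's model is kept but set to unactivated.
        self.active.discard(old_name)
```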
In summary, the multi-computing-engine platform provided by the present invention is easy to use, is oriented to experimental data processing requirements, supports online orchestration of multiple calculating tasks, can automatically select the most efficient computing engine based on the computing engine selection method of the present invention, supports online algorithm debugging, and supports automatic packaging and switching of computing engines. Specifically, the beneficial effects of the multi-computing-engine platform of the present invention are mainly reflected in the following. Container technology is used to encapsulate computing engines and calculating tasks, giving fast start and stop, low resource consumption, and achieving environment isolation and resource constraints between engines and tasks. A general calculating task invocation process unifies the differences between different tasks, making unified management of calculating tasks convenient. Online editing and testing of algorithms is supported, so that debugging results can be inspected immediately, which can greatly improve algorithm editing efficiency. Oriented to the characteristics of experimental data processing, visual orchestration of calculating tasks is provided, which can reduce user orchestration errors and improve the flexibility of experimental data processing. The multi-computing-engine selection algorithm based on task execution time prediction can use machine learning methods to automatically mine the operating rules of the computing engines and select the most efficient computing engine for a specific combination of calculating task and resource situation, thereby substantially reducing calculating task execution time and improving experimental data processing efficiency. Computing engine changes are supported, which improves system flexibility and provides good support for future engine extensions.
It should be noted that, although the steps are described above in a particular order, this does not mean that the steps must be executed in that particular order. In fact, some of these steps can be executed concurrently, or even in a different order, as long as the required functions can be realized.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may include, for example, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the above.
Various embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technological improvements over the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A computing engine selection method, comprising the following steps:
Step 1: inputting task characteristic data corresponding to a task to be calculated into a task execution time prediction model of each of a plurality of computing engines, to obtain a task execution time prediction result of the task to be calculated on each computing engine, wherein the task execution time prediction model is obtained through training based on a training sample set, and the training sample set includes a plurality of items of task characteristic data and corresponding task execution times;
Step 2: selecting, from the plurality of computing engines, a computing engine for executing the task to be calculated according to the task execution time prediction results.
2. The method according to claim 1, wherein the task characteristic data includes at least one of an algorithm type, algorithm parameters, a data type, a data volume, and a data storage location.
3. The method according to claim 1, wherein the training sample set of a computing engine is constructed through the following steps:
Step 31: collecting a plurality of items of task description data for describing task information;
Step 32: executing, by the computing engine, the task corresponding to each item of task description data, to obtain the task execution time corresponding to each item of task description data;
Step 33: extracting, from each item of task description data, the features that influence the task execution time to form task characteristic data, and constructing the training sample set of the computing engine in combination with the obtained task execution times.
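Steps 31 to 33 above can be sketched as follows; all function names (`build_training_set`, `run_on_engine`, `extract_features`) are invented for illustration, not taken from the patent.

```python
import time

def build_training_set(task_descriptions, run_on_engine, extract_features):
    """Pair each task's extracted features with its measured execution time."""
    samples = []
    for desc in task_descriptions:
        start = time.perf_counter()
        run_on_engine(desc)                      # step 32: execute the task
        exec_time = time.perf_counter() - start  # ... and measure its time
        features = extract_features(desc)        # step 33: feature extraction
        samples.append((features, exec_time))
    return samples
```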
4. The method according to claim 1, wherein the task execution time prediction model of a computing engine is obtained by executing the following steps:
Step 41: based on the training sample set of the computing engine, establishing a linear regression model with the task characteristic data as the independent variables and the task execution time as the dependent variable, expressed as:
y_i = β_0 + β_1·x_i1 + … + β_p·x_ip,  i = 1, 2, …, n
wherein x_i1 to x_ip denote the task features of the i-th sample included in the training sample set of the computing engine, i is the index of a sample data item in the training sample set, n is the number of sample data items in the training sample set of the computing engine, β_0 is the bias to be optimized, and β_1 to β_p are the weight values to be optimized;
Step 42: solving for the optimized weight values and bias of the linear regression model using the least squares method;
Step 43: expressing the linear regression model with the obtained optimized weight values and bias, to obtain the task execution time prediction model of the computing engine.
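Steps 41 to 43 amount to an ordinary least squares fit of y_i = β_0 + β_1·x_i1 + … + β_p·x_ip. A minimal sketch using NumPy follows; the function names and the test data are invented for illustration.

```python
import numpy as np

def fit_execution_time_model(X, y):
    # Prepend a column of ones so the bias beta_0 is estimated jointly
    # with the weights beta_1..beta_p (step 42: least squares solve).
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta  # [beta_0, beta_1, ..., beta_p]

def predict_time(beta, x):
    # Step 43: the fitted model used as the execution time predictor.
    return beta[0] + float(np.dot(beta[1:], x))
```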
5. The method according to any one of claims 1 to 4, wherein step 2 comprises the following sub-steps:
Step 51: selecting the computing engine with the shortest predicted execution time; or
Step 52: in the case where the remaining resources of the computing engine with the shortest predicted execution time cannot support the task to be calculated, preferentially selecting, in order according to the task execution time prediction results, a computing engine with a short predicted time.
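The fallback of step 52 can be sketched as a single ordered scan; the function name and the resource units are made up for illustration.

```python
def select_by_prediction(predicted_times, remaining_resources, required):
    # Walk the engines in order of predicted execution time and take the
    # first whose remaining resources can support the task to be calculated.
    for engine in sorted(predicted_times, key=predicted_times.get):
        if remaining_resources[engine] >= required:
            return engine
    return None  # no engine can currently support the task
```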
6. A multi-computing-engine platform, characterized by comprising:
a calculating task management module, configured to manage the processing flow of calculating tasks and generate calculating task information;
an engine management module, configured to select a computing engine according to the calculating task information from the calculating task management module using the method according to any one of claims 1 to 5; and
a task execution module, configured to execute calculating tasks and output task execution times.
7. The multi-computing-engine platform according to claim 6, characterized by further comprising:
a container management module, configured to call the task execution module to execute calculating tasks;
a user interaction module, configured to receive user operation instructions and information; and
a debugging task management module, configured to execute user debugging tasks and output debugging information.
8. The platform according to claim 7, characterized in that, when a computing engine included in the platform changes, the engine management module activates the task execution time prediction model of the new computing engine, and at the same time sets the task execution time prediction model of the replaced computing engine to an unactivated state.
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
10. A computer device, comprising a memory and a processor, wherein a computer program executable on the processor is stored on the memory, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810734031.8A CN108985367A (en) | 2018-07-06 | 2018-07-06 | Computing engines selection method and more computing engines platforms based on this method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108985367A true CN108985367A (en) | 2018-12-11 |
Family
ID=64536300
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810734031.8A Pending CN108985367A (en) | 2018-07-06 | 2018-07-06 | Computing engines selection method and more computing engines platforms based on this method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108985367A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6728869B1 (en) * | 2000-04-21 | 2004-04-27 | Ati International Srl | Method and apparatus for memory latency avoidance in a processing system |
CN102736896A (en) * | 2011-03-29 | 2012-10-17 | 国际商业机器公司 | Run-ahead approximated computations |
CN104834561A (en) * | 2015-04-29 | 2015-08-12 | 华为技术有限公司 | Data processing method and device |
CN105404611A (en) * | 2015-11-09 | 2016-03-16 | 南京大学 | Matrix model based multi-calculation-engine automatic selection method |
CN105900127A (en) * | 2013-09-11 | 2016-08-24 | 芝加哥期权交易所 | System and method for determining a tradable value |
CN106649503A (en) * | 2016-10-11 | 2017-05-10 | 北京集奥聚合科技有限公司 | Query method and system based on sql |
CN106649119A (en) * | 2016-12-28 | 2017-05-10 | 深圳市华傲数据技术有限公司 | Stream computing engine testing method and device |
CN107077385A (en) * | 2014-09-10 | 2017-08-18 | 亚马逊技术公司 | Calculated examples start the time |
US9823968B1 (en) * | 2015-08-21 | 2017-11-21 | Datadirect Networks, Inc. | Data storage system employing a variable redundancy distributed RAID controller with embedded RAID logic and method for data migration between high-performance computing architectures and data storage devices using the same |
Non-Patent Citations (1)
Title |
---|
VIKTOR FARCIC: "《微服务运维实战》", 30 June 2018 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020125182A1 (en) * | 2018-12-19 | 2020-06-25 | Oppo广东移动通信有限公司 | Algorithm processing method and apparatus, and storage medium and terminal device |
WO2020168877A1 (en) * | 2019-02-21 | 2020-08-27 | 腾讯科技(深圳)有限公司 | Object control method and apparatus, storage medium and electronic apparatus |
CN109806590A (en) * | 2019-02-21 | 2019-05-28 | 腾讯科技(深圳)有限公司 | Object control method and apparatus, storage medium and electronic device |
US11938400B2 (en) | 2019-02-21 | 2024-03-26 | Tencent Technology (Shenzhen) Company Limited | Object control method and apparatus, storage medium, and electronic apparatus |
KR20210064373A (en) * | 2019-02-21 | 2021-06-02 | 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 | Object control method and apparatus, storage medium and electronic device |
KR102549758B1 (en) * | 2019-02-21 | 2023-06-29 | 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 | Object control method and device, storage medium and electronic device |
CN110362611A (en) * | 2019-07-12 | 2019-10-22 | 拉卡拉支付股份有限公司 | A kind of data base query method, device, electronic equipment and storage medium |
CN110727697B (en) * | 2019-08-29 | 2022-07-12 | 北京奇艺世纪科技有限公司 | Data processing method and device, storage medium and electronic device |
CN110727697A (en) * | 2019-08-29 | 2020-01-24 | 北京奇艺世纪科技有限公司 | Data processing method and device, storage medium and electronic device |
CN111401560A (en) * | 2020-03-24 | 2020-07-10 | 北京觉非科技有限公司 | Inference task processing method, device and storage medium |
CN111723112A (en) * | 2020-06-11 | 2020-09-29 | 咪咕文化科技有限公司 | Data task execution method and device, electronic equipment and storage medium |
CN112558938B (en) * | 2020-12-16 | 2021-11-09 | 中国科学院空天信息创新研究院 | Machine learning workflow scheduling method and system based on directed acyclic graph |
CN112558938A (en) * | 2020-12-16 | 2021-03-26 | 中国科学院空天信息创新研究院 | Machine learning workflow scheduling method and system based on directed acyclic graph |
CN113139205A (en) * | 2021-04-06 | 2021-07-20 | 华控清交信息科技(北京)有限公司 | Secure computing method, general computing engine, device for secure computing and secure computing system |
WO2023272853A1 (en) * | 2021-06-29 | 2023-01-05 | 未鲲(上海)科技服务有限公司 | Ai-based sql engine calling method and apparatus, and device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108985367A (en) | Computing engines selection method and more computing engines platforms based on this method | |
US11790161B2 (en) | Machine learning selection and/or application of a data model defined in a spreadsheet | |
US11080435B2 (en) | System architecture with visual modeling tool for designing and deploying complex models to distributed computing clusters | |
CN112199086B (en) | Automatic programming control system, method, device, electronic equipment and storage medium | |
CN110192210A (en) | Building and processing are used for the calculating figure of dynamic, structured machine learning model | |
CN108037919A (en) | A kind of visualization big data workflow configuration method and system based on WEB | |
CN107678790A (en) | Flow calculation methodologies, apparatus and system | |
CN103064670B (en) | Innovation platform data managing method based on position net and system | |
CN110941467A (en) | Data processing method, device and system | |
US20170139685A1 (en) | Visual software modeling method to construct software views based on a software meta view | |
CN109448100A (en) | Threedimensional model format conversion method, system, computer equipment and storage medium | |
CN116127899B (en) | Chip design system, method, electronic device, and storage medium | |
CN117992078B (en) | Automatic deployment method for reasoning acceleration service based on TensorRT-LLM model | |
CN105940636A (en) | Technologies for cloud data center analytics | |
US20240020556A1 (en) | Information processing method and apparatus, server, and user device | |
CN112130812B (en) | Analysis model construction method and system based on data stream mixed arrangement | |
CN116227565A (en) | Compiling optimization system and neural network accelerator with variable precision | |
CN115794106A (en) | Method and system for analyzing configuration of binary protocol data of rail transit | |
CN106575241A (en) | Mobile and remote runtime integration | |
CN109902085A (en) | A kind of configuration storage organization optimization method and system | |
CN116911757A (en) | Service realization method, device and storage medium | |
US11775264B2 (en) | Efficient deployment of machine learning and deep learning model's pipeline for serving service level agreement | |
Mileff | Design and development of a web-based graph editor and simulator application | |
US11909626B1 (en) | Identifying switchable elements to isolate a location from sources | |
CN118245238B (en) | Heterogeneous computing power equipment selection method and device apparatus, medium, product, and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181211 |