CN103886203B - Automatic modeling system and method based on index prediction - Google Patents


Info

Publication number
CN103886203B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201410109141.7A
Other languages
Chinese (zh)
Other versions
CN103886203A (en)
Inventor
李攀登
Current Assignee
Bull Information Systems (beijing) Co Ltd
Original Assignee
Bull Information Systems (beijing) Co Ltd
Application filed by Bull Information Systems (beijing) Co Ltd
Priority to CN201410109141.7A
Publication of CN103886203A
Application granted
Publication of CN103886203B
Expired - Fee Related (current legal status)

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an automatic modeling system based on index prediction. The system comprises a data loading and storage module, a core algorithm module, a model evaluation/integration module, an enterprise application module, and a configuration module. The data loading and storage module loads data and stores the result data produced after the downstream processes complete. The core algorithm module has an algorithm library. It runs the scripts of the algorithm families in that library and obtains the optimal parameters within each family. The model evaluation/integration module uses the optimal parameters from the core algorithm module to derive either the optimal algorithm or an integrated algorithm. The enterprise application module runs that optimal or integrated algorithm, standardizes the results, and outputs the result data. The configuration module controls and drives the operation of the data loading and storage module, the core algorithm module, the model evaluation/integration module, and the enterprise application module. The invention further provides an automatic modeling and deployment method based on index prediction.

Description

Automatic modeling system and method based on index prediction
Technical field:
The present invention relates to applications of data mining technology, and in particular to an automatic modeling system and method based on index prediction, that is, forecasting of business indices such as KPIs.
Background art:
In every industry, business volume keeps growing, markets are saturating, competition is intensifying, and market coverage keeps expanding. Enterprises therefore need a comprehensive understanding of how their own markets are changing. As competition sharpens and markets change faster, they also need forward-looking analytical capability. Because the factors that influence a business are increasingly complex, judging abnormal market development quantitatively has become a difficult problem. Market indices must therefore be analyzed comprehensively, forecast, and monitored so that abnormal development can be flagged early. Data mining already provides the theoretical methods for this, the most common being time series analysis, and database technology provides the foundation for running such methods in production.
Industries have already built up some practice in index prediction modeling and its application. Many enterprises employ modelers who build prediction models and put them into production, and the industry has accumulated experience in developing and deploying such models. The prevailing development and deployment process has several steps. First, the data is exported from the database to a local machine. A modeler then trains a model with a third-party modeling tool according to the business requirements and tunes it manually, over and over, to obtain model parameters or rules. Those parameters or rules are converted into SQL and solidified in the production environment. Finally, the model output is used to manually analyze the causes of market anomalies or the impact of a newly released tariff on the market.
This way of developing and deploying prediction models reasonably satisfies common time-series business requirements. However, both the development and deployment approach and the downstream business handling have major drawbacks. First, when there are many indices, hundreds or even tens of thousands, a modeler must separately and manually extract the data, train the model, develop the solidification scripts, test, and deploy for each one. This consumes a great deal of labor. Second, the current data distribution may no longer match the previously trained parameters, or the business requirements may change, for example from one-period-ahead to multi-period-ahead forecasts. In either case the modeler must redevelop every index's modeling process from scratch, which wastes human resources enormously. In addition, business problems flagged by the forecasts must be judged manually, which inevitably introduces too much subjective judgment.
Summary of the invention:
To solve the above technical problems, the present invention provides an automatic modeling system based on index prediction, comprising the following modules. A data loading and storage module loads data and stores the result data produced after the downstream processes complete. A core algorithm module has an algorithm library; it runs the script of each algorithm family in the library and obtains the optimal parameters within each family. A model evaluation/integration module uses the optimal parameters obtained by the core algorithm module to obtain an optimal algorithm or an integrated algorithm. An enterprise application module runs the optimal or integrated algorithm obtained by the model evaluation/integration module, standardizes the results, and outputs the result data. A configuration module controls and drives the operation of the data loading and storage module, the core algorithm module, the model evaluation/integration module, and the enterprise application module.
Preferably, the data loading and storage module applies a first preprocessing to the loaded data. The core algorithm module then applies a second preprocessing, sample preparation, model training, and testing to the data that has undergone the first preprocessing. It outputs the model training parameters, residuals, prediction results, and a configuration file.
Preferably, the first preprocessing includes serialization and multi-index merging, and the core algorithm module stores the optimal parameters obtained for each algorithm family in the configuration file.
Preferably, the model evaluation/integration module evaluates the optimal parameters obtained by the core algorithm module. Based on the evaluation results, it either obtains the optimal algorithm or integrates the corresponding algorithms into the integrated algorithm.
Preferably, the configuration module includes a data loading configuration unit, a model evaluation configuration unit, an enterprise application configuration unit, and a main function configuration unit. The main function configuration unit drives the data loading configuration unit, the model evaluation configuration unit, the enterprise application configuration unit, and the core algorithm module, and thereby drives the whole flow. The data loading configuration unit drives the data loading and storage module, the model evaluation configuration unit drives the model evaluation/integration module, and the enterprise application configuration unit drives the enterprise application module.
Preferably, the system further includes an expandable resource module with an expandable resource library. The expandable resource module runs the scripts of the different algorithm families in that library and obtains the optimal parameters within those families.
Preferably, when the configuration module cannot find a configuration for the core algorithm module, it drives the operation of the expandable resource module.
Preferably, the model evaluation/integration module evaluates the optimal parameters obtained by the expandable resource module. Based on the evaluation results, it either obtains the optimal algorithm or integrates the corresponding algorithms into the integrated algorithm.
Preferably, the configuration module has an enterprise application configuration unit and a main function configuration unit;
when the main function configuration unit cannot find a configuration for the core algorithm module, it drives the enterprise application configuration unit, which in turn drives the operation of the expandable resource module.
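Below is one plausible reading of this fallback, sketched in Python: the core algorithm library is searched first, and the expandable resource library is used only if no core configuration is found. The library contents, the three toy forecasting functions, and the function names are hypothetical illustrations, not the patent's algorithm library.

```python
from typing import Callable, Dict, Sequence, Tuple

# A toy "algorithm" maps a history to a one-step-ahead forecast.
Algorithm = Callable[[Sequence[float]], float]


def naive_last(series: Sequence[float]) -> float:
    return float(series[-1])


def naive_mean(series: Sequence[float]) -> float:
    return sum(series) / len(series)


def drift(series: Sequence[float]) -> float:
    # Last value plus the average historical change per period.
    return series[-1] + (series[-1] - series[0]) / (len(series) - 1)


# Hypothetical contents of the core and expandable libraries.
CORE_LIBRARY: Dict[str, Algorithm] = {"naive_last": naive_last, "naive_mean": naive_mean}
EXTENDED_LIBRARY: Dict[str, Algorithm] = {"drift": drift}


def resolve_algorithm(
    name: str,
    core: Dict[str, Algorithm] = CORE_LIBRARY,
    extended: Dict[str, Algorithm] = EXTENDED_LIBRARY,
) -> Tuple[str, Algorithm]:
    """Search the core library first, then fall back to the expandable library."""
    if name in core:
        return "core", core[name]
    if name in extended:
        return "extended", extended[name]
    raise KeyError(f"algorithm {name!r} is not configured in either library")


source, algo = resolve_algorithm("drift")
print(source, algo([1, 2, 3, 4]))  # extended 5.0
```

Keeping the two libraries as separate registries means new algorithm families can be added to the expandable side without touching the core configuration.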
Preferably, the system further includes a display module for displaying the result data.
Preferably, when the configuration module finds a configuration for the display module, it drives the display module to display the result data.
Preferably, the configuration module has an enterprise application configuration unit and a main function configuration unit;
when the main function configuration unit finds a configuration for the display module, it drives the enterprise application configuration unit, which in turn drives the operation of the display module.
Preferably, the model evaluation/integration module first evaluates whether the optimal algorithm meets the requirements. If it does, the enterprise application module runs the optimal algorithm, standardizes the results, and outputs the result data. If it does not, the model evaluation/integration module integrates the corresponding algorithms according to the evaluation results to obtain the integrated algorithm. The enterprise application module then runs the integrated algorithm, standardizes the results, and outputs the result data.
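The evaluate-then-integrate decision can be sketched as follows. This is a minimal sketch under two assumptions. First, "meets the requirements" is taken to mean that the optimal algorithm's average validation error is at most a configured threshold. Second, integration is shown as an inverse-error weighted average of the candidate forecasts. The patent names bagging, vote, and boosting as integration options, and this weighting is a simple stand-in for them, not one of them. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def choose_or_integrate(
    errors: Dict[str, float],
    forecasts: Dict[str, List[float]],
    max_error: float,
) -> Tuple[str, List[float]]:
    """Return (chosen model name, forecast path).

    errors: average validation error of each candidate algorithm.
    forecasts: forecast path of each candidate, all of the same length.
    max_error: assumed acceptance threshold for the optimal algorithm.
    """
    best = min(errors, key=errors.get)
    if errors[best] <= max_error:
        # The optimal algorithm meets the requirement: use it as-is.
        return best, list(forecasts[best])
    # Otherwise integrate: weight each candidate by the inverse of its error.
    weights = {name: 1.0 / max(err, 1e-12) for name, err in errors.items()}
    total = sum(weights.values())
    horizon = len(forecasts[best])
    combined = [
        sum(weights[name] * forecasts[name][h] for name in weights) / total
        for h in range(horizon)
    ]
    return "integrated", combined


errors = {"ses": 1.0, "arima": 3.0}
forecasts = {"ses": [10.0, 10.0], "arima": [14.0, 14.0]}
print(choose_or_integrate(errors, forecasts, max_error=2.0))  # ('ses', [10.0, 10.0])
print(choose_or_integrate(errors, forecasts, max_error=0.5))
```

With the tighter threshold, the second call falls through to integration and weights the lower-error candidate three times as heavily as the other.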
In another aspect, the present invention provides an automatic modeling method based on index prediction, comprising the following steps. A data loading step loads the data required by the downstream flow. A core algorithm running step runs the script of each algorithm family in the algorithm library and obtains the optimal parameters within each family. A model evaluation/integration step uses the optimal parameters obtained in the core algorithm running step to obtain an optimal algorithm or an integrated algorithm. An enterprise application step runs the optimal or integrated algorithm obtained in the model evaluation/integration step, standardizes the results, and outputs the result data. A control step controls and drives the data loading step, the core algorithm running step, the model evaluation/integration step, and the enterprise application step.
Preferably, in the data loading step, a first preprocessing is applied to the loaded data. In the core algorithm running step, a second preprocessing, sample preparation, model training, and testing are applied to the data that has undergone the first preprocessing, and the model training parameters, residuals, prediction results, and a configuration file are output.
Preferably, the first preprocessing includes serialization and multi-index merging, and in the core algorithm running step the optimal parameters obtained for each algorithm family are stored in the configuration file.
Preferably, in the model evaluation/integration step, the optimal parameters obtained in the core algorithm running step are evaluated. Based on the evaluation results, either the optimal algorithm is obtained or the corresponding algorithms are integrated into the integrated algorithm.
Preferably, the method further includes a storage step, which stores the result data obtained in the enterprise application step.
Preferably, the method further includes an extended resource step, which runs the scripts of the different algorithm families in an expandable resource library and obtains the optimal parameters within those families.
Preferably, when the control step cannot find a configuration for the core algorithm running step, it drives the extended resource step.
Preferably, in the model evaluation/integration step, the optimal parameters obtained in the extended resource step are evaluated. Based on the evaluation results, either the optimal algorithm is obtained or the corresponding algorithms are integrated into the integrated algorithm.
Preferably, the method further includes a display step, which displays the result data.
Preferably, when the control step finds a configuration for the display step, it drives the display step to display the result data.
Preferably, in the model evaluation/integration step, it is first evaluated whether the optimal algorithm meets the requirements. If it does, the optimal algorithm is run in the enterprise application step, the results are standardized, and the result data is output. If it does not, the corresponding algorithms are integrated in the model evaluation/integration step according to the evaluation results to obtain the integrated algorithm. The integrated algorithm is then run in the enterprise application step, the results are standardized, and the result data is output.
The present invention starts from actual enterprise applications and innovates on the underlying technology. It packages and standardizes classical forecasting techniques with automation and intelligence in mind, opening a broad path from manual modeling and deployment to automated modeling and deployment. The invention is especially suited to applications that build many models, such as developing and running hundreds or thousands of index prediction models in real time. It helps an enterprise build an accurate, timely, and comprehensive prediction and monitoring platform. It also provides reliable means for applications such as enterprise strategic management, business control, and data quality control. Implementing the present invention yields the following beneficial effects:
1. The whole process (data loading, model development, model selection, model deployment, and so on) is packaged into a configurable, automated flow, which greatly reduces manual intervention and improves efficiency. The models also relearn on a configured cycle. This removes the traditional drawback that fixed training parameters stop fitting once the data distribution shifts. Because the invention builds an extensible, shareable model library, algorithms can be chosen more flexibly than in the traditional approach. This lowers the technical barrier caused by gaps in modelers' knowledge.
2. The present invention has low system configuration requirements and a complete method. It is highly extensible and has a high degree of automation and self-learning. Its biggest difference from the traditional approach lies in model selection. The method first obtains the optimal model within each model family and then searches the whole model library for the optimal model. Through the configuration module, it can also combine models into an integrated model that acts as a strong learner. In modeling mode, efficiency, and accuracy it far surpasses the traditional manual approach.
3. The service application library of the present invention can be packaged for strategic management, business control, data quality management, and similar fields, which saves a great deal of manpower in use. In data quality monitoring it greatly improves the correct recognition rate. In business control it greatly improves the ability to identify services and locate business problems. It also offers a tariff simulation preview function that can help the enterprise reduce losses.
Brief description of the drawings:
Fig. 1 is a structural block diagram of the automatic modeling system according to an embodiment of the present invention;
Fig. 2 is a flowchart of automatic modeling according to an embodiment of the present invention;
Fig. 3 is a flowchart of the operation of the model evaluation/integration module according to an embodiment of the present invention.
Detailed description of the invention:
The automatic modeling system based on index prediction according to the embodiment of the present invention can be implemented in the R language. The system covers several flows: building, managing, and sharing the model library; data loading and storage; centralized and decomposed development of the model library; optimal algorithm search; re-integration of decomposed algorithms; knowledge transfer; automatic deployment of model results; enterprise application of models; and application display. It packages these flows into separate executable modules. A configuration module automates the whole modeling and application flow and connects the extension modules.
The automatic modeling system in this embodiment mainly concerns a method and system in which business requirements automatically trigger an engine for automatic model development and deployment. The modeling engine comprises multiple intelligent processing modules, such as configuration, data loading, self-learning, and transfer learning, which together meet business requirements. In this embodiment, configured search settings and a model evaluation mechanism judge the current model development and deployment flow and trigger the corresponding scripts. The embodiment is described in detail below with reference to the drawings.
Fig. 1 shows a structural block diagram of the automatic modeling system based on index prediction according to an embodiment of the present invention. As shown in Fig. 1, the system includes a configuration module 1 and a module packaging part 2. The configuration module 1 includes a data loading configuration unit 11, a main function configuration unit 12, a model evaluation configuration unit 13, and an enterprise application configuration unit 14. The module packaging part 2 includes an R-based modeling-packaging core and basic module 21, a display module 22, an expandable resource module 23, and an enterprise application module 24. The core and basic module 21 contains a data loading and storage module 211, a core algorithm module 212, and a model evaluation/integration module 213.
The configuration module 1 drives the other six major modules: the data loading and storage module 211, the core algorithm module 212, the model evaluation/integration module 213, the display module 22, the expandable resource module 23, and the enterprise application module 24. It configures their parameters, packages the flow, and drives automatic execution, acting as the control center of the automatic modeling system. Its units are the data loading configuration unit 11, the main function configuration unit 12, the model evaluation configuration unit 13, and the enterprise application configuration unit 14. They control and drive the six modules as follows:
1) The data loading and storage module 211 packages data reading as parameterized UDFs (user-defined functions), covering different data sources and data formats, reading modes (batch or single), and so on. The actual parameter values are entered in the data loading configuration unit 11 of the configuration module 1. The corresponding script in the main function configuration unit 12 reads these values from the data loading configuration unit 11 and uses them to drive and control data loading;
2) The model evaluation/integration module 213 packages its core algorithms as parameterized UDFs. Relevant parameters or text are entered in the model evaluation configuration unit 13 of the configuration module 1. The main function configuration unit 12 reads the configuration file of the model evaluation configuration unit 13 and passes it to the model evaluation UDF, thereby driving and controlling it;
3) The remaining modules likewise develop their core algorithms within their own modules. Their configuration files, however, are generated in the enterprise application configuration unit 14, and they are still scheduled and driven by the main function configuration unit 12.
The configuration files of the configuration module 1 are stored in a repository at the bottom layer of the configuration module 1 (not shown).
Specifically, the configuration module 1 configures and drives the different modules. Its configuration files store each major module's driving information, such as initialization parameters, service selection, and data selection. Different parameters and other driving information can be configured for different applications. The configuration module 1 configures and stores the parameters required to run the automatic modeling system and drives the operation of its modules.
The data loading configuration unit 11 of the configuration module 1 configures the data type, data source, data interval, and scheduling. For each data type in Table 1, the data loading and storage module 211 must package a corresponding reading method. That method is then configured and driven from the data loading configuration unit 11 of the configuration module 1 according to the service application.
Table 1:
For example, the historical data of existing KPI indices is kept in an enterprise database. In this embodiment, the data loading and storage module 211 can package the retrieval of an index's history from the enterprise database via RODBC. The retrieved data is standardized into the data structure the model needs, and the data source selection, data table name, data extraction period, and so on are parameterized. The data loading configuration unit 11 of the configuration module 1 then passes the actual parameters, chosen according to actual business requirements, to the corresponding positions in the functions packaged in the data loading and storage module 211. The main function configuration unit 12 connects and drives these functions uniformly.
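A parameterized loader of this kind might look like the Python sketch below. Python's built-in `sqlite3` module stands in for the R RODBC connection to the enterprise database. The table layout (`kpi_history` with `index_name`, `period`, `value`), the function name, and the configuration keys are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def load_index_history(
    conn: sqlite3.Connection,
    table: str,
    index_name: str,
    start: str,
    end: str,
) -> List[Tuple[str, float]]:
    """Load one index's history between two periods (inclusive), ordered by period.

    Values are bound as SQL parameters. The table name cannot be bound,
    so it is validated as a plain identifier instead.
    """
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    sql = (
        f"SELECT period, value FROM {table} "
        "WHERE index_name = ? AND period BETWEEN ? AND ? ORDER BY period"
    )
    # Standardize rows into (period, float) pairs, the structure assumed for the model.
    return [(period, float(value)) for period, value in conn.execute(sql, (index_name, start, end))]


# In-memory database standing in for the enterprise database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kpi_history (index_name TEXT, period TEXT, value REAL)")
conn.executemany(
    "INSERT INTO kpi_history VALUES (?, ?, ?)",
    [("KPI_A", "2013-01", 100), ("KPI_A", "2013-02", 110),
     ("KPI_A", "2013-03", 120), ("KPI_B", "2013-01", 5)],
)
# Values like these would come from the data loading configuration.
config = {"table": "kpi_history", "index_name": "KPI_A", "start": "2013-01", "end": "2013-02"}
print(load_index_history(conn, **config))  # [('2013-01', 100.0), ('2013-02', 110.0)]
```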
The main function configuration unit 12 of the configuration module 1 drives the whole flow. It covers the modules in the module packaging part 2: the data loading and storage module 211, the core algorithm module 212, the model evaluation/integration module 213, the display module 22, the expandable resource module 23, the enterprise application module 24, and so on. It packages their operating logic and search methods into one overall parameterized UDF. Different algorithm types, such as classification and time series, can form different main function UDFs, defined according to the actual business. When the main function configuration unit 12 receives actual parameters, it drives the data loading configuration unit 11, the model evaluation configuration unit 13, the enterprise application configuration unit 14, and the other units of the configuration module 1. Through those units it drives each major module of the module packaging part 2.
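The following Python sketch shows how a main function can drive the whole flow from configuration. It assumes each module is a function that receives its own configuration section plus the shared pipeline state, and that the configuration lists the order of the steps. The module stubs, the section names (`data_load`, `core_algorithm`, `model_eval`, `enterprise_app`), and the `flow` key are hypothetical placeholders for the patent's R UDFs.

```python
from statistics import mean
from typing import Any, Callable, Dict

# A module receives its own config section and the shared state, and returns new state.
Module = Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]]


def data_load(section: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the data loading and storage module.
    return {**state, "data": list(section["values"])}


def core_algorithm(section: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the core algorithm module: two naive forecasting "families".
    data = state["data"]
    return {**state, "candidates": {"mean": mean(data), "last": data[-1]}}


def model_eval(section: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for model evaluation: select the candidate named in the config.
    return {**state, "chosen": state["candidates"][section["prefer"]]}


def enterprise_app(section: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the enterprise application module: standardize the output.
    return {**state, "result": round(float(state["chosen"]), section["digits"])}


MODULES: Dict[str, Module] = {
    "data_load": data_load,
    "core_algorithm": core_algorithm,
    "model_eval": model_eval,
    "enterprise_app": enterprise_app,
}


def main_function(config: Dict[str, Any], modules: Dict[str, Module] = MODULES) -> Dict[str, Any]:
    """Drive the whole flow: run each step listed in config["flow"] in order."""
    state: Dict[str, Any] = {}
    for step in config["flow"]:
        # Each step gets only its own section; missing sections default to {}.
        state = modules[step](config.get(step, {}), state)
    return state


config = {
    "flow": ["data_load", "core_algorithm", "model_eval", "enterprise_app"],
    "data_load": {"values": [1, 2, 3, 4]},
    "model_eval": {"prefer": "mean"},
    "enterprise_app": {"digits": 2},
}
print(main_function(config)["result"])  # 2.5
```

Changing the configuration alone, for example `"prefer": "last"`, changes the behavior without touching any module code. This mirrors the separation between the configuration module and the packaged modules.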
The model evaluation configuration unit 13 of the configuration module 1 drives the model evaluation/integration module 213 of the module packaging part 2. For example, it can pass customizable actual parameters such as bagging, vote, or boosting to the model evaluation/integration module 213 to drive the model evaluation and integration process.
The enterprise application configuration unit 14 of the configuration module 1 drives the enterprise application module 24 and triggers the main function configuration unit 12. It is the trigger entry point of the whole automatic modeling system. It defines the system's actual parameters and how those parameters are input and output.
The module packaging part 2 is the center of the core modules, such as model development and enterprise application, and it processes the models that enterprise applications require. Under the control of the configuration module 1, it can be driven in a customized way according to the current requirements. In other words, different business applications can selectively drive the corresponding modules. Its standard operating relationships and logic are:
1) First, the data is loaded into the core algorithm module 212;
2) The core algorithm module 212 performs preprocessing, sample preparation, model training, testing, and other operations on the loaded data. It outputs the model training parameters, residuals, prediction results, configuration file, and so on;
3) The model evaluation/integration module 213 filters, processes, and integrates the results of the core algorithm module 212;
4) The results output by the model evaluation/integration module 213 are supplied to the interface of the enterprise application module 24;
5) The enterprise application configuration unit 14 of the configuration module 1 selectively drives the display module 22 and the expandable resource module 23.
The modeling-packaging core and basic module 21 handles the input, processing, and output of the system's kernel models. It is packaged as the data loading and storage module 211, the core algorithm module 212, and the model evaluation/integration module 213, which yields three corresponding parameterized UDFs. Their flow linkage and invocation are completed in the main function configuration unit 12 of the configuration module 1.
The data loading and storage module 211 loads data and stores the result data produced after the downstream flow completes. Both the core algorithm computation and the applications of the automatic modeling system are based on this data. The specific processing is as follows:
The data loading and storage module 211 packages parameterized reading scripts for different types of data sources, and the corresponding loading settings are configured in the configuration module 1. After loading, the source data is preprocessed, including serialization and multi-index merging. The conditions and interface types required by the current demand are set in the configuration module 1, and the data is converted accordingly. The other modules then process the preprocessed data, and the processed data is finally fed back to the data loading and storage module 211 for storage.
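The two preprocessing operations named here, serialization and multi-index merging, might look like the Python sketch below. It rests on three assumptions: source records arrive as (index name, period, value) triples, "serialization" means grouping them into one period-ordered series per index, and merging aligns several indices on the union of their periods, leaving gaps as `None`. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, float]  # (index name, period, value) as read from the source


def serialize(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Group raw records into one period->value mapping per index."""
    series: Dict[str, Dict[str, float]] = {}
    for name, period, value in records:
        series.setdefault(name, {})[period] = float(value)
    return series


def merge_indices(
    series: Dict[str, Dict[str, float]], names: List[str]
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Align several indices on the union of their periods; gaps become None."""
    periods = sorted(set().union(*(series[n].keys() for n in names)))
    return periods, {n: [series[n].get(p) for p in periods] for n in names}


raw = [("A", "2013-02", 2), ("A", "2013-01", 1), ("B", "2013-02", 20)]
periods, merged = merge_indices(serialize(raw), ["A", "B"])
print(periods)  # ['2013-01', '2013-02']
print(merged)   # {'A': [1.0, 2.0], 'B': [None, 20.0]}
```

Leaving gaps explicit rather than filling them keeps the imputation choice open for the later processing stages.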
Automatic batch loading can be achieved through the data loading configuration unit 11 of the configuration module 1.
The core algorithm module 212 is preferably an SRC core algorithm module and is the main computing center of the whole system. It consists of an R-packaged family of index prediction algorithms with an intelligent processing flow, and it produces the data that enterprise applications require. The specific processing is as follows:
1) Structural breaks are identified automatically in the preprocessed data from the data loading and storage module 211. Assume the sample size is n and the change point is k. Given the known historical sample and the sample distribution parameters, the posterior probability of a change is first computed at each candidate change point, traversing all candidates:
Formula 1
Formula 2
Formula 3
where (Formula 4). The parameters of the posterior probability and the change point k are then approximated using the Gibbs sampling method;
2) The sequence of length n-k obtained in the previous step is taken as the full sample, and the following algorithm automatically generates CV (cross-validation) samples as input for model training. Take positive integers f and Q with n > f*Q. For q = 1, ..., Q, the first n-q*f observations form a subsample used as training input, and the remaining q*f observations form the validation sample. Proceeding in this way yields Q CV samples as input for model training;
3) With the CV samples as input, each algorithm in the algorithm families is designed and encapsulated as a base learner. The selectable algorithm families include the exponential smoothing family, the ARIMA family, the cointegration family, the ARCH family, the state-space family and the nonparametric family, among others. For each sample, the optimal parameters within each algorithm family are output according to indicators such as AIC and SBC, LR, RMSE, MAE and MAPE. These parameters are stored in a configuration file, to be called during model evaluation, integration and prediction deployment.
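Steps 1) to 3) can be sketched in Python as follows. Formulas 1 to 4 are not reproduced here, so step 1 substitutes an assumed model: a single shift in the mean of Gaussian data with known variance, normal priors on the two segment means, and a uniform prior on k, sampled with a standard Gibbs sampler. Step 3 shows only one family, simple exponential smoothing, tuned over a hypothetical grid of smoothing constants by average RMSE. The ARIMA, cointegration, ARCH, state-space and nonparametric families, as well as the AIC/SBC and LR criteria, are omitted. All function names are illustrative.

```python
import math
import random


# Step 1: single change point by Gibbs sampling (assumed Gaussian mean-shift model).
def gibbs_change_point(y, sigma2=1.0, tau2=100.0, iters=500, burn_in=100, seed=0):
    """y[:k] ~ N(mu1, sigma2), y[k:] ~ N(mu2, sigma2); mu1, mu2 ~ N(0, tau2);
    k uniform on 1..n-1. Returns the posterior mode of k and the sampled posterior."""
    rng = random.Random(seed)
    n = len(y)
    csum = [0.0]
    for v in y:
        csum.append(csum[-1] + v)
    k = max(1, n // 4)  # deliberately poor starting point
    counts = [0] * n
    for it in range(iters):
        # Conjugate normal draw of each segment mean given k.
        mus = []
        for m, s in ((k, csum[k]), (n - k, csum[n] - csum[k])):
            prec = m / sigma2 + 1.0 / tau2
            mus.append(rng.gauss((s / sigma2) / prec, math.sqrt(1.0 / prec)))
        mu1, mu2 = mus
        # Draw k from its discrete full conditional given mu1 and mu2.
        a, b = [0.0], [0.0]
        for v in y:
            a.append(a[-1] + (v - mu1) ** 2)
            b.append(b[-1] + (v - mu2) ** 2)
        logp = [-(a[j] + b[n] - b[j]) / (2 * sigma2) for j in range(1, n)]
        top = max(logp)
        k = rng.choices(range(1, n), weights=[math.exp(lp - top) for lp in logp])[0]
        if it >= burn_in:
            counts[k] += 1
    total = sum(counts)
    posterior = {j: c / total for j, c in enumerate(counts) if c}
    return max(posterior, key=posterior.get), posterior


# Step 2: Q CV samples; the first n - q*f points train, the last q*f validate.
def make_cv_samples(series, f, Q):
    n = len(series)
    if f <= 0 or Q <= 0 or n <= f * Q:
        raise ValueError("require positive f and Q with n > f*Q")
    return [(series[:n - q * f], series[n - q * f:]) for q in range(1, Q + 1)]


# Step 3: error indicators and one base-learner family.
def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def mape(actual, pred):
    # Undefined if any actual value is zero.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def ses_forecast(train, horizon, alpha):
    """Simple exponential smoothing: flat forecast at the final smoothed level."""
    level = train[0]
    for v in train[1:]:
        level = alpha * v + (1 - alpha) * level
    return [level] * horizon


def best_ses_alpha(cv_samples, grid=(0.1, 0.3, 0.5, 0.7, 0.9), metric=rmse):
    """Return the grid value with the lowest metric averaged over the CV samples."""
    def score(alpha):
        errs = [metric(test, ses_forecast(train, len(test), alpha)) for train, test in cv_samples]
        return sum(errs) / len(errs)
    return min(grid, key=score)
```

With a series `y`, `k, _ = gibbs_change_point(y)` gives the change point. `y[k:]` is then the length n-k sample used in step 2, and `best_ses_alpha(make_cv_samples(y[k:], f, Q))` returns the tuned parameter that would be written to the configuration file.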
The model evaluation/integration module 213 evaluates and integrates the results produced in the core algorithm module 212 and the expandable resource module 23. The algorithm is implemented as follows:
1) Calculate the average prediction error that each base learner in the core algorithm module produces over the Q samples:
(Formula 5);
2) Calculate the average prediction error over the full sample: (Formula 6);
3) Select as the optimal base learner the one that minimizes this average prediction error;
4) Model integration: the model integration process is driven by the actual configuration in the model evaluation configuration unit 13 of the configuration module 1. The corresponding integration algorithms, such as bagging, voting and boosting, are encapsulated on top of the base learners in the model evaluation/integration module 213.
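Steps 1) to 4) can be sketched as below under stated assumptions. Formulas 5 and 6 are not reproduced. The sketch therefore assumes that each base learner has one prediction error per CV sample and that the full-sample error is their arithmetic mean. For step 4, it substitutes a simple inverse-error weighted average of forecasts for the bagging, voting and boosting methods named above. The function names are hypothetical.

```python
def select_base_learner(errors):
    """errors: learner name -> list of Q per-sample prediction errors.
    Returns the learner with the smallest mean error and all the means."""
    means = {name: sum(e) / len(e) for name, e in errors.items()}
    return min(means, key=means.get), means


def combine_forecasts(forecasts, mean_errors, eps=1e-12):
    """Weighted average of forecasts, with weights proportional to 1/error."""
    weights = {name: 1.0 / max(mean_errors[name], eps) for name in forecasts}
    total = sum(weights.values())
    horizon = len(next(iter(forecasts.values())))
    return [sum(weights[n] * forecasts[n][h] for n in forecasts) / total
            for h in range(horizon)]
```

Inverse-error weighting lets more accurate learners dominate the combined forecast while still using every learner. It is only one of many possible integration rules.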
The display module 22 and the expandable resource module 23 perform display and extension, respectively, under the drive of the configuration module 1. The display module 22 visualizes model training results and prediction results. It is driven by the enterprise application configuration unit 14 of the configuration module 1 and takes as input the model results produced in the core algorithm module 212, such as model estimates and prediction results. Visualization scripts are packaged in a standard way for each type of model result, and the order in which they are generated is set in the main function configuration unit 12.
The expandable resource module 23 is mainly used for auxiliary program interfaces outside the main algorithm classes. For example, the core algorithm of a text classification model is a classification algorithm, but its processing flow requires steps such as word segmentation and vectorization of the data. The main programs for word segmentation and vectorization can then be developed in the expandable module and linked into the flow in the main function configuration unit 12 of the configuration module 1, ultimately producing an intelligent process and application.
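The following minimal sketch shows how auxiliary routines such as word segmentation and vectorization might be registered in the expandable resource module and chained in a configured order. The registry, the step names and the regex tokenizer are assumptions made for illustration. Chinese text would in practice require a dedicated word segmenter.

```python
import re
from collections import Counter

REGISTRY = {}  # hypothetical stand-in for the expandable resource library


def register(name):
    """Decorator that adds an auxiliary routine to the registry under `name`."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap


@register("segment")
def segment(docs):
    # Lower-case word tokenizer; not a Chinese word segmenter.
    return [re.findall(r"\w+", d.lower()) for d in docs]


@register("vectorize")
def vectorize(token_lists):
    # Bag-of-words counts over a sorted vocabulary.
    vocab = sorted({t for toks in token_lists for t in toks})
    counts = [Counter(toks) for toks in token_lists]
    return vocab, [[c[w] for w in vocab] for c in counts]


def run_flow(step_names, data):
    """Apply registered steps in the order listed by the configuration."""
    for name in step_names:
        data = REGISTRY[name](data)
    return data
```

The vectors returned by `run_flow(["segment", "vectorize"], docs)` could then be passed to a classification algorithm from the core library.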
The enterprise application module 24 processes and applies the output results of the core algorithm module 212. For example, application modules driven by enterprise demand are encapsulated in the enterprise application module 24, which turns the knowledge accumulated by the other modules into intelligent solutions to enterprise problems. The specific processing of the enterprise application module 24 is as follows:
In the enterprise application module 24, different data demand interfaces and processing modes are defined for different applications. For example, a data quality management application needs to output anomalies or abnormality alarms for a given indicator within a given future period. The alarm conditions and the forecast future values must be processed and output by the R-add-encapsulated core algorithm module 212 and by the expandable resource module 23. Before this, the early-warning requirements in the enterprise application configuration unit 14 must first be converted into statistical and mining algorithms and placed in the R-add-encapsulated core algorithm library or the expandable resource library. The statistical thresholds and the input/output standards must also be configured in the main function configuration unit 12 of the configuration module 1. Then, when the data quality management program in the enterprise application configuration unit 14 is driven, the automatic modeling system automatically runs the main function configuration unit 12 configured in the configuration module 1, which drives the core algorithm module 212 and the expandable resource module 23. The output data results are automatically fed back to the data loading and storage module 211. Finally, the main function configuration unit 12 drives the display module 22 and the enterprise application module 24, which display the results on the front end and feed the data with quality defects back to the responsible business personnel.
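The alarm condition in the data quality management example can be sketched as a comparison between observed values and the forecast values produced upstream. The relative-deviation rule and the `rel_threshold` parameter are assumptions. They stand in for the statistical threshold configured in the main function configuration unit 12.

```python
def quality_alerts(indicator, observed, forecast, rel_threshold):
    """Return one alert per period whose observed value deviates from the
    forecast by more than rel_threshold (0.2 means 20%)."""
    alerts = []
    for period, (obs, pred) in enumerate(zip(observed, forecast)):
        if pred == 0:
            # No relative scale: any non-zero observation counts as a deviation.
            deviation = 0.0 if obs == 0 else float("inf")
        else:
            deviation = abs(obs - pred) / abs(pred)
        if deviation > rel_threshold:
            alerts.append({"indicator": indicator, "period": period,
                           "observed": obs, "forecast": pred,
                           "deviation": deviation})
    return alerts
```

Each alert dictionary carries the information that would be forwarded to the responsible business personnel.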
Fig. 2 is a flowchart of the automatic modeling process according to an embodiment of the present invention. The automatic modeling process of this embodiment is described below with reference to Fig. 2.
First, the automatic modeling system is started. Based on the actual parameters received by the enterprise application configuration unit 14, it generates information such as the demand, the data source and the business problem type, and triggers the main function configuration unit 12 (step S1). After the main function configuration unit 12 runs, it drives the data loading configuration unit 11, which in turn drives the data loading and storage module 211 to load or store the relevant data. These data include local data resources, database data resources, and results fed back by the modules and the enterprise application module 24 (step S2). Specifically, once the data loading and storage module 211 is driven, it automatically loads the data sources specified in the configuration module 1. It can also load multiple groups of data simultaneously for developing different types of models, ready for subsequent processing and use. The formats and sources of the loaded data are shown in Table 1 above.
After the data loading and storage module 211 has loaded the relevant data, the main function configuration unit 12 searches the configuration text in the configuration module 1 and determines whether a configuration for the core algorithm module 212 exists (step S3). If it exists (step S3: yes), the main function configuration unit 12 drives the core algorithm module 212. According to the relevant configuration in the configuration module 1, the core algorithm module 212 runs the script of each algorithm family in its algorithm library (preferably the SRC library) and outputs indicator data for each family, such as prediction indicators, statistical indicators and weight indicators. Based on these indicator data, it outputs the optimal parameters in each algorithm family (step S4).
If no configuration for the core algorithm module 212 exists (step S3: no), the current application is a newly added application. The main function configuration unit 12 in the configuration module 1 automatically identifies the relevant configuration of the expandable resource module 23 and drives the enterprise application configuration unit 14, which in turn drives the expandable resource module 23. According to the configuration identified by the main function configuration unit 12, the expandable resource module 23 obtains different algorithm families from the expandable resource library and runs their scripts. It outputs indicator data for these families, such as prediction indicators, statistical indicators and weight indicators, and, based on these indicator data, outputs the optimal parameters in each of these families (step S5).
After the core algorithm module 212 or the expandable resource module 23 outputs the above parameter data, the main function configuration unit 12 drives the model evaluation configuration unit 13, which in turn drives the model evaluation/integration module 213 (step S6). In step S6, the model evaluation/integration module 213 searches for the evaluation functions corresponding to the different algorithms or applications in the algorithm library of the core algorithm module 212 or in the expandable resource library of the expandable resource module 23. It then outputs the optimal algorithm or the integrated algorithm. For example, Table 2 below illustrates the results obtained by the model evaluation/integration module 213 for different algorithms or applications in the algorithm library of the core algorithm module 212. Table 3 below illustrates its results for different algorithms or applications in the expandable resource library of the expandable resource module 23.
Table 2:
Table 3:
ID | Anomaly identified by R-add | Anomaly probability | Identification type
3767193 | Sudden surge in subscriber churn (leaving the network) | 0.85 | Identified in advance
4571653 | Product unsubscriptions beyond the normal range | 0.92 | Post-event warning
After the model evaluation/integration module 213 outputs the optimal algorithm or the integrated algorithm, the main function configuration unit 12 searches for it (step S7). If it exists (step S7: yes), the main function configuration unit 12 drives the enterprise application configuration unit 14, which in turn drives the enterprise application module 24. The enterprise application module 24 takes the optimal or integrated algorithm that the model evaluation/integration module 213 obtained from the different algorithms or applications in the algorithm library of the core algorithm module 212 or in the expandable resource library of the expandable resource module 23. It integrates this algorithm and outputs the related application feedback (result data) after standardization (step S8). For example, corresponding feedback results are output for application functions such as quality early warning in data quality management, performance evaluation in the strategy module, and advance anomaly identification in business control. If no optimal or integrated algorithm exists (step S7: no), the system waits until the model evaluation/integration module 213 outputs one.
After the enterprise application module 24 runs, the main function configuration unit 12 automatically searches the repository of the configuration module 1 and determines whether a configuration for the display module 22 exists (step S9). If it exists, i.e., display is required (step S9: yes), the main function configuration unit 12 drives the enterprise application configuration unit 14, which in turn drives the display module 22. The display module 22 outputs the result data as displayed results, SMS reminders and related data reports, and feeds the result data back to the data loading and storage module 211 for storage (step S10). For example, if a future trend needs to be predicted, the display module 22 generates a predicted trend chart, automatically saves it to the display library, and outputs it. If no configuration for the display module 22 exists, the application does not require display (step S9: no). An example is anomaly early warning, where the warning information only needs to be sent to the relevant business personnel. In that case, the main function configuration unit 12 feeds the result data back to the data loading and storage module 211 for storage, pending subsequent processing.
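The control flow of steps S1 to S10 can be summarized in a short Python sketch. The configuration keys and module names are hypothetical stand-ins for the configuration units and modules described above. Returning None at step S7 represents the waiting state; the sketch does not implement an actual wait.

```python
def run_pipeline(config, modules, log):
    """Control flow of steps S1-S10 (hypothetical keys and module names)."""
    log.append("S1")                                    # system started, demand parsed
    data = modules["load"](config["data_sources"])      # S2: load data
    log.append("S2")
    if "core" in config:                                # S3: core algorithm config present?
        params = modules["core"](data, config["core"])  # S4
        log.append("S4")
    else:                                               # newly added application
        params = modules["extensible"](data, config.get("extensible", {}))  # S5
        log.append("S5")
    model = modules["evaluate"](params)                 # S6: evaluate / integrate
    log.append("S6")
    if model is None:                                   # S7: no algorithm yet
        return None
    result = modules["apply"](model, data)              # S8: standardized result data
    log.append("S8")
    if "display" in config:                             # S9: display configured?
        modules["display"](result)                      # S10: charts, SMS, reports
        log.append("S10")
    modules["store"](result)                            # feed back for storage
    return result
```

Passing the modules in as callables keeps the control logic separate from the algorithms, mirroring the role of the main function configuration unit 12.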
Fig. 3 is an operational flowchart of the model evaluation/integration module according to an embodiment of the present invention. The processing in the model evaluation/integration module of this embodiment is described below with reference to Fig. 3.
First, the main function configuration unit 12 drives the model evaluation configuration unit 13 to read the relevant configuration of the model evaluation/integration module 213 from the configuration module 1, and the corresponding model evaluation criterion is selected (step S61). The results of processing the data source with the optimal model of each model family output by the core algorithm module 212 are obtained (step S62). Using the model evaluation criterion, the optimal model of the entire model library is selected (step S63), and it is determined whether this optimal model meets the demand (step S64). If it does (step S64: yes), the optimal model is output (step S65). If it does not (step S64: no), the several models closest to the demand are integrated using the method configured in the configuration module 1, such as bagging, voting or boosting, and an integrated model that meets the demand is output (step S66).
It should be understood that the specific embodiments and examples described above are merely illustrative of the present invention and are not intended to limit its scope. After reading the present invention, any equivalent modifications made by those skilled in the art fall within the scope defined by the claims of this application.
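Steps S63 to S66 can be sketched as follows. The sketch rests on three assumptions: each model family's best model has a single error score (lower is better), the "demand" is a maximum acceptable error, and integration is an equal-weight average of the `top_m` models closest to the demand. Unlike step S66, the sketch does not re-check that the integrated model meets the demand. All names are illustrative.

```python
def evaluate_and_integrate(family_scores, family_forecasts, target, top_m=2):
    """family_scores: best-model error per family (lower is better).
    family_forecasts: forecast list from each family's best model.
    target: hypothetical maximum acceptable error (the demand)."""
    ranked = sorted(family_scores, key=family_scores.get)
    best = ranked[0]                                     # S63: best of whole library
    if family_scores[best] <= target:                    # S64: meets demand?
        return "single", best, family_forecasts[best]    # S65
    chosen = ranked[:top_m]                              # S66: closest models
    horizon = len(family_forecasts[best])
    combined = [sum(family_forecasts[n][h] for n in chosen) / len(chosen)
                for h in range(horizon)]
    return "integrated", chosen, combined
```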

Claims (24)

1. An automatic modeling system based on index prediction, comprising:
a data loading and storage module, configured to load the data required by subsequent processes;
a core algorithm module having an algorithm library, the core algorithm module running the script of each algorithm family in the algorithm library to obtain the optimal parameters in each algorithm family;
a model evaluation/integration module, which obtains an optimal algorithm or an integrated algorithm according to the optimal parameters obtained in the core algorithm module;
an enterprise application module, which runs the optimal algorithm or integrated algorithm obtained by the model evaluation/integration module, standardizes the running result, and then outputs result data; and
a configuration module, configured to control and drive the operation of the data loading and storage module, the core algorithm module, the model evaluation/integration module and the enterprise application module.
2. The automatic modeling system according to claim 1, characterized in that:
the data loading and storage module performs a first preprocessing on the loaded data;
the core algorithm module performs a second preprocessing, sample preparation, and model training and testing on the data that have undergone the first preprocessing, and outputs model training parameters, residuals, prediction results and a configuration file; and
the data loading and storage module can further store the result data produced by the enterprise application module.
3. The automatic modeling system according to claim 2, characterized in that:
the first preprocessing includes serialization and multi-indicator merging; and
the core algorithm module stores the obtained optimal parameters in each algorithm family in the configuration file.
4. The automatic modeling system according to claim 3, characterized in that:
the model evaluation/integration module evaluates the optimal parameters obtained by the core algorithm module and obtains the optimal algorithm according to the evaluation result, or integrates the corresponding algorithms according to the evaluation result to obtain the integrated algorithm.
5. The automatic modeling system according to claim 1, characterized in that:
the configuration module includes a data loading configuration unit, a model evaluation configuration unit, an enterprise application configuration unit and a main function configuration unit; and
the main function configuration unit can drive the data loading configuration unit, the model evaluation configuration unit, the enterprise application configuration unit and the core algorithm module, thereby driving the whole process, wherein the data loading configuration unit is used to drive the data loading and storage module, the model evaluation configuration unit is used to drive the model evaluation/integration module, and the enterprise application configuration unit is used to drive the enterprise application module.
6. The automatic modeling system according to claim 1, further comprising:
an expandable resource module having an expandable resource library, the expandable resource module running the scripts of different algorithm families in the expandable resource library to obtain the optimal parameters in the different algorithm families.
7. The automatic modeling system according to claim 6, characterized in that:
when the configuration module finds no configuration for the core algorithm module, it drives the operation of the expandable resource module.
8. The automatic modeling system according to claim 7, characterized in that:
the model evaluation/integration module evaluates the optimal parameters obtained by the expandable resource module and obtains the optimal algorithm according to the evaluation result, or integrates the corresponding algorithms according to the evaluation result to obtain the integrated algorithm.
9. The automatic modeling system according to claim 8, characterized in that:
the configuration module has an enterprise application configuration unit and a main function configuration unit; and
when the main function configuration unit finds no configuration for the core algorithm module, it drives the enterprise application configuration unit, and the enterprise application configuration unit drives the operation of the expandable resource module.
10. The automatic modeling system according to claim 1, further comprising:
a display module, configured to display the result data.
11. The automatic modeling system according to claim 10, characterized in that:
when the configuration module finds a configuration for the display module, it drives the display module to display the result data.
12. The automatic modeling system according to claim 11, characterized in that:
the configuration module has an enterprise application configuration unit and a main function configuration unit; and
when the main function configuration unit finds a configuration for the display module, it drives the enterprise application configuration unit, and the enterprise application configuration unit drives the operation of the display module.
13. The automatic modeling system according to any one of claims 1 to 12, characterized in that:
the model evaluation/integration module first evaluates whether the optimal algorithm meets the demand. If it does, the enterprise application module runs the optimal algorithm, standardizes the running result, and outputs the result data. If it does not, the model evaluation/integration module integrates the corresponding algorithms according to the evaluation result to obtain the integrated algorithm, and the enterprise application module then runs the integrated algorithm, standardizes the running result, and outputs the result data.
14. An automatic modeling method based on index prediction, comprising:
a data loading step of loading the data required by subsequent processes;
a core algorithm running step of running the script of each algorithm family in an algorithm library to obtain the optimal parameters in each algorithm family;
a model evaluation/integration step of obtaining an optimal algorithm or an integrated algorithm according to the optimal parameters obtained in the core algorithm running step;
an enterprise application step of running the optimal algorithm or integrated algorithm obtained in the model evaluation/integration step, standardizing the running result, and then outputting result data; and
a control step of controlling and driving the data loading step, the core algorithm running step, the model evaluation/integration step and the enterprise application step.
15. The automatic modeling method according to claim 14, characterized in that:
in the data loading step, a first preprocessing is performed on the loaded data; and
in the core algorithm running step, a second preprocessing, sample preparation, and model training and testing are performed on the data that have undergone the first preprocessing, and model training parameters, residuals, prediction results and a configuration file are output.
16. The automatic modeling method according to claim 15, characterized in that:
the first preprocessing includes serialization and multi-indicator merging; and
in the core algorithm running step, the obtained optimal parameters in each algorithm family are stored in the configuration file.
17. The automatic modeling method according to claim 16, characterized in that:
in the model evaluation/integration step, the optimal parameters obtained in the core algorithm running step are evaluated, and the optimal algorithm is obtained according to the evaluation result, or the corresponding algorithms are integrated according to the evaluation result to obtain the integrated algorithm.
18. The automatic modeling method according to claim 14, further comprising:
a storing step of storing the result data obtained in the enterprise application step.
19. The automatic modeling method according to claim 14, further comprising:
an extended resource step of running the scripts of different algorithm families in an expandable resource library to obtain the optimal parameters in the different algorithm families.
20. The automatic modeling method according to claim 19, characterized in that:
when no configuration for the core algorithm running step is found in the control step, the extended resource step is driven to run.
21. The automatic modeling method according to claim 20, characterized in that:
in the model evaluation/integration step, the optimal parameters obtained in the extended resource step are evaluated, and the optimal algorithm is obtained according to the evaluation result, or the corresponding algorithms are integrated according to the evaluation result to obtain the integrated algorithm.
22. The automatic modeling method according to claim 14, further comprising:
a display step of displaying the result data.
23. The automatic modeling method according to claim 22, characterized in that:
when a configuration for the display step is found in the control step, the display step is driven to display the result data.
24. The automatic modeling method according to any one of claims 14 to 23, characterized in that:
in the model evaluation/integration step, it is first evaluated whether the optimal algorithm meets the demand. If it does, the optimal algorithm is run in the enterprise application step, and the result data are output after the running result is standardized. If it does not, the corresponding algorithms are integrated according to the evaluation result in the model evaluation/integration step to obtain the integrated algorithm. The integrated algorithm is then run in the enterprise application step, and the result data are output after the running result is standardized.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410109141.7A CN103886203B (en) 2014-03-24 2014-03-24 Automatic modeling system and method based on index prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410109141.7A CN103886203B (en) 2014-03-24 2014-03-24 Automatic modeling system and method based on index prediction

Publications (2)

Publication Number Publication Date
CN103886203A CN103886203A (en) 2014-06-25
CN103886203B true CN103886203B (en) 2017-01-11

Family

ID=50955093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410109141.7A Expired - Fee Related CN103886203B (en) 2014-03-24 2014-03-24 Automatic modeling system and method based on index prediction

Country Status (1)

Country Link
CN (1) CN103886203B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239630B (en) * 2014-09-10 2017-03-15 中国运载火箭技术研究院 A kind of emulation dispatch system of supportive test design
CN104778254B (en) * 2015-04-20 2018-03-27 北京蓝色光标品牌管理顾问股份有限公司 A kind of distributed system and mask method of non-parametric topic automatic marking
CN107025509B (en) * 2016-02-01 2021-06-18 腾讯科技(深圳)有限公司 Decision making system and method based on business model
CN108229686B (en) * 2016-12-14 2022-07-05 阿里巴巴集团控股有限公司 Model training and predicting method and device, electronic equipment and machine learning platform
CN107169356B (en) * 2017-05-03 2020-08-18 上海上讯信息技术股份有限公司 Statistical analysis method and device
CN107766424B (en) * 2017-09-13 2020-09-15 深圳市宇数科技有限公司 Data exploration management method and system, electronic equipment and storage medium
CN107844634B (en) * 2017-09-30 2021-05-25 平安科技(深圳)有限公司 Modeling method of multivariate general model platform, electronic equipment and computer readable storage medium
CN107958268A (en) * 2017-11-22 2018-04-24 用友金融信息技术股份有限公司 The training method and device of a kind of data model
CN108133294B (en) * 2018-01-10 2020-12-04 阳光财产保险股份有限公司 Prediction method and device based on information sharing
CN108389631A (en) * 2018-02-07 2018-08-10 平安科技(深圳)有限公司 Varicella morbidity method for early warning, server and computer readable storage medium
CN110909066B (en) * 2019-12-06 2021-03-16 中科院计算技术研究所大数据研究院 Streaming data processing method based on SparkSQL and RestAPI
CN113590686B (en) * 2021-07-29 2023-11-10 深圳博沃智慧科技有限公司 Processing method, device and equipment for ecological environment data index

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101763105A (en) * 2010-01-07 2010-06-30 冶金自动化研究设计院 Self-adaptation selectable constrained gas optimizing dispatching system and method for steel enterprises
CN202159334U (en) * 2011-03-14 2012-03-07 李盼池 Polymer flooding development index predicting system
CN103020448A (en) * 2012-12-11 2013-04-03 南京航空航天大学 Method and system for predicting instantaneous value of airport noise based on time series analysis


Non-Patent Citations (1)

Title
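Data mining applications of R software; Chen Rongxin; Journal of Chongqing Technology and Business University (Natural Science Edition); 2011-12-31; Vol. 28, No. 6; pp. 602-607 *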

Also Published As

Publication number Publication date
CN103886203A (en) 2014-06-25


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170111

Termination date: 20180324