CN100568243C - Method and system for data mining and modeling - Google Patents

Method and system for data mining and modeling

Info

Publication number
CN100568243C
Authority
CN
China
Prior art keywords
data
modeling
rule
module
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2007101495069A
Other languages
Chinese (zh)
Other versions
CN101110089A (en)
Inventor
劳玮
闫延涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CNB2007101495069A
Publication of CN101110089A
Application granted
Publication of CN100568243C
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data mining method comprising: A. setting a data extraction rule in advance and, according to that rule, extracting modeling data and scoring data from a data source; B. selecting an algorithm and building a model from the modeling data; C. scoring the scoring data with the model thus built; D. outputting the scoring result. The invention also discloses a modeling method, a modeling system, and a data mining system. By setting and executing data extraction rules, and by setting and executing a data modeling flow, the invention enables modeling of dynamically changing data.

Description

Method and system for data mining and modeling
Technical field
The present invention relates to data mining technology, and in particular to a method and system for data mining and modeling.
Background
Knowledge discovery in databases (KDD) is the product of combining several disciplines, including artificial intelligence, machine learning, and database technology. It is an advanced process of extracting credible, novel, useful, and human-understandable patterns from large volumes of data. A pattern here is knowledge, that is, the regularities, relationships, or rules hidden behind the data.
Fig. 1 shows the KDD process in the prior art. As shown in Fig. 1, the KDD process mainly comprises five steps: data selection, data preprocessing, data transformation, data mining, and result interpretation/knowledge evaluation. Data mining (DM) is an important step of KDD, used to discover the relationships and rules present in the transformed data. Hereinafter, "data mining" refers to the entire KDD analysis process.
Fig. 2 is a flowchart of an operation-based data mining method in the prior art. As shown in Fig. 2, the method comprises the following steps:
Step 210: Process the modeling data and manually use a mining tool to build data mining models.
This step comprises: collecting and organizing the historical data relevant to the modeling target and selecting from it a fixed target data table, for example one table of a database, as the modeling data, and converting it into the format required for data mining; selecting a mining algorithm and modeling the chosen modeling data to obtain a model; and repeating the algorithm selection on the same modeling data to obtain another model.
Step 220: Process the evaluation data. This step may be performed in parallel with step 210, or before or after it.
This step comprises: collecting the historical data relevant to the prediction problem, selecting evaluation data from it, and converting it into the format required for data mining.
Step 221 and step 211: Manually use the mining tool to evaluate the models, obtain an evaluation report, and determine the optimal model from the report.
This step comprises: using the evaluation data prepared in step 220 to evaluate the models built in step 210, that is, using the built models to predict on the historical data and taking the model whose predictions are closest to the corresponding results in the historical data as the optimal model.
Step 230: Process the scoring data. This step may be performed in parallel with steps 210, 220, and 221, or before or after them.
This step comprises: collecting the data relevant to the prediction problem and converting it into the format required for data mining.
Step 231: Manually use the mining tool to perform scoring.
This step comprises: manually using the mining tool to apply the model built in step 211 to the scoring data prepared in step 230 and obtain predictions, such as the future trend of the data.
For example, in a customer churn model the scoring result reflects how likely a customer is to churn. It is usually expressed as a value between 0 and 1, and the closer the value is to 1, the more likely the customer is to churn. If the scoring in this step yields a prediction of 0.8 for a batch of customers or for a single customer, the churn probability of that batch or that customer can be understood to be 80%.
Step 232: Manually use the mining tool to export the predictions.
This step comprises: importing the predictions computed in step 231 from the data mining tool into the database.
Step 233: Analyze the predictions in the database so that different measures can be taken for data with different characteristics.
For example, if the customer churn model gives a churn probability of 80% for a batch of customers, meaning they are fairly likely to churn, the operator can take retention measures for that batch so that those customers continue to bring the operator profit.
If data mining must be performed on multiple data sources, the above steps are repeated.
As the above description shows, the prior art cannot model dynamically changing data: each modeling run can only obtain fixed data from one fixed data source. When the data source used for modeling, or the data tables within it, change, every modeling run requires manually reselecting the required data.
Summary of the invention
In view of this, an embodiment of the invention provides a data mining method that realizes modeling and data mining of dynamically changing data. The method comprises:
A. setting a data extraction rule in advance and, according to that rule, extracting modeling data and scoring data from a data source; B. selecting an algorithm, modeling the modeling data, evaluating the models built according to a preset model evaluation rule, and determining the optimal model; C. scoring the scoring data with the model thus built; D. outputting the scoring result.
An embodiment of the invention also provides a modeling method that realizes modeling of dynamically changing data. The method comprises: extracting modeling data from a data source according to a preset data extraction rule; selecting an algorithm and modeling the modeling data; and evaluating the models built according to a preset model evaluation rule to determine the optimal model.
An embodiment of the invention also provides a data mining system that realizes modeling and data mining of dynamically changing data. The system comprises a data acquisition module, a modeling module, a result application module, and a presentation module.
The data acquisition module is used to store the configured data extraction rule and to extract modeling data and scoring data from a data source according to that rule.
The modeling module comprises: an algorithm selection module, used to select an algorithm; a model building module, used to model the data from the data acquisition module with the selected algorithm; and an evaluation module, used to store the preset model evaluation rule, evaluate the models built according to that rule, and determine the optimal model.
The result application module is used to score the scoring data with the model.
The presentation module is used to output the scoring result.
An embodiment of the invention also provides a modeling system that realizes modeling of dynamically changing data. The system comprises a data acquisition module and a modeling module. The data acquisition module is used to store the configured data extraction rule and to extract modeling data from a data source according to that rule. The modeling module comprises: an algorithm selection module, used to select an algorithm; a model building module, used to model the data from the data acquisition module with the selected algorithm; and an evaluation module, used to store the preset model evaluation rule, evaluate the models built according to that rule, and determine the optimal model.
Compared with the prior art, the technical solution provided by the embodiments of the invention executes a preset data extraction rule to extract modeling data and scoring data from a data source, models the extracted modeling data with the selected algorithm, and scores the extracted scoring data with the model built. Because the data extraction rule can be set flexibly, dynamically changing data can be modeled.
Description of the drawings
Fig. 1 shows the KDD process in the prior art;
Fig. 2 is a flowchart of an operation-based data mining method in the prior art;
Fig. 3 is a schematic diagram of the workflow used for data mining in an embodiment of the invention;
Fig. 4 is a flowchart of the data modeling method in an embodiment of the invention;
Fig. 5 is a flowchart of the method for applying data modeling results in an embodiment of the invention;
Fig. 6 is a structural diagram of the data mining system in an embodiment of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and specific embodiments.
The data mining method in an embodiment of the invention sets a data extraction rule in advance; extracts modeling data and scoring data from a data source according to that rule; selects an algorithm and models the modeling data; scores the scoring data with the model built; and outputs the scoring result. Because the data extraction rule selects whatever data currently satisfies its conditions, the extraction of modeling data is unaffected by changes to the data source or to the data tables within it, which realizes modeling of dynamically changing data.
The method further makes data mining run automatically by setting up a workflow and controlling that workflow.
A workflow is a business process that runs automatically, in whole or in part: the documents, information, or tasks of each business process are acted on according to control rules and passed between business processes.
Fig. 3 is a schematic diagram of the workflow used for data mining in an embodiment of the invention. As shown in Fig. 3, the workflow set up for data mining comprises a data acquisition flow, a modeling flow, a result application flow, and a presentation flow.
The data acquisition flow extracts modeling data and scoring data from the data source according to the preset data extraction rule, and may also analyze and preprocess the target data. The modeling flow selects an algorithm and models the modeling data. The result application flow uses the model built by the modeling flow to score the scoring data obtained by the data acquisition flow. The presentation flow outputs the scoring result.
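The four flows described above can be sketched as a chain of plain functions. This is only an illustrative sketch, not the patented implementation: the names `acquire`, `build_model`, and `run_workflow`, the `churned` and `id` fields, and the placeholder "algorithm" (predicting the observed churn frequency) are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Callable[[Record], bool]
Model = Callable[[Record], float]


def acquire(source: List[Record], modeling_rule: Rule,
            scoring_rule: Rule) -> Tuple[List[Record], List[Record]]:
    """Data acquisition flow: apply the preset extraction rules to the source."""
    modeling = [r for r in source if modeling_rule(r)]
    scoring = [r for r in source if scoring_rule(r)]
    return modeling, scoring


def build_model(modeling: List[Record]) -> Model:
    """Modeling flow (placeholder algorithm): predict the observed churn frequency."""
    rate = sum(r["churned"] for r in modeling) / len(modeling)
    return lambda record: rate


def run_workflow(source: List[Record], modeling_rule: Rule,
                 scoring_rule: Rule) -> List[Tuple[float, float]]:
    modeling, scoring = acquire(source, modeling_rule, scoring_rule)
    model = build_model(modeling)                     # modeling flow
    scored = [(r["id"], model(r)) for r in scoring]   # result application flow
    return scored                                     # presentation flow: output


source = [
    {"id": 1, "churned": 1},
    {"id": 2, "churned": 0},
    {"id": 3},  # outcome unknown, so this record is only scored
]
result = run_workflow(source,
                      lambda r: "churned" in r,
                      lambda r: "churned" not in r)
```

Because the rules are evaluated each time the workflow runs, rerunning `run_workflow` on a changed `source` needs no manual reselection of data, which is the point the text makes about dynamically changing data.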
If only modeling is needed, the workflow set up for modeling in an embodiment of the invention comprises only the data acquisition flow and the modeling flow.
When a workflow is used for data mining, the data extraction rule of the workflow can be set, during data acquisition, to extract data multiple times or to extract from multiple data sources at once, which solves the problem of modeling dynamically changing data. In addition, once the workflow has been set up, modeling runs automatically when the workflow is started, with no manual intervention. This shortens the response time of each modeling run, improves modeling efficiency, and makes data mining run automatically.
The data mining method in an embodiment of the invention is described in detail below.
Fig. 4 is a flowchart of the data modeling method in an embodiment of the invention. As shown in Fig. 4, the method comprises the following steps:
The data acquisition flow comprises steps 400 and 401.
Step 400: Obtain modeling data according to the preset data extraction rule.
In an embodiment of the invention, modeling data is obtained from the data source according to the data extraction rule set through a rule engine.
A rule engine is a component embedded in an application. Its task is to test the data objects currently submitted to the engine against the business rules loaded in it, activate the business rules that match the current data state, and, according to the execution logic declared by those rules, trigger the corresponding operation in the application, for example the logic that extracts data.
Setting the data extraction rule in this step may include setting the extraction condition, for example the phone charges of users aged between 20 and 30. It may further include setting extraction from a single data source or from multiple data sources, for example extracting data that satisfies a condition from database 1, or from several data sources. It may also include setting multiple extractions, for example three extractions, the first from database 1, the second from database 2, and so on.
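As a rough illustration of such a rule, the sketch below bundles a condition with an ordered list of data sources. The `ExtractionRule` class, the `extract` function, and the in-memory dictionaries standing in for databases are hypothetical; the patent does not specify how the rule engine represents rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class ExtractionRule:
    condition: Callable[[Record], bool]   # which records qualify
    sources: List[str]                    # data sources to read, one pass per entry


def extract(rule: ExtractionRule,
            data_sources: Dict[str, List[Record]]) -> List[Record]:
    rows: List[Record] = []
    for name in rule.sources:             # repeated extraction over several sources
        rows.extend(r for r in data_sources[name] if rule.condition(r))
    return rows


# In-memory stand-ins for database 1 and database 2.
data_sources = {
    "database1": [
        {"user": "u1", "age": 25, "charges": 80.0},
        {"user": "u2", "age": 41, "charges": 30.0},
    ],
    "database2": [
        {"user": "u3", "age": 29, "charges": 55.0},
    ],
}

# Condition from the text: phone charges of users aged between 20 and 30.
rule = ExtractionRule(condition=lambda r: 20 <= r["age"] <= 30,
                      sources=["database1", "database2"])
charges = [r["charges"] for r in extract(rule, data_sources)]
```

Under this sketch, a new data source is handled by adding its name to `rule.sources`; the extraction code itself does not change.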
In this flow, the data extraction rule set through the rule engine includes an acquisition rule for modeling data and an acquisition rule for scoring data; modeling data is obtained by executing the former and scoring data by executing the latter. Alternatively, after data has been extracted by the data extraction rule, the extracted data may be labeled as modeling data or scoring data. A correspondence rule may further be set between modeling data and scoring data so that the two are matched to each other.
Step 401: Data preprocessing.
In this step the data obtained through the rule engine is preprocessed, including data analysis, outlier handling, null-value handling, data extraction, and data conversion, so that the data becomes suitable for modeling. For example: deleting data that falls outside a certain range, filling null values with qualified values, extracting the data of a certain field from all the data, and converting the data into a form suitable for modeling or normalizing it.
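A minimal sketch of these preprocessing operations on a single numeric field follows. The function name `preprocess`, its parameters, and the choice of min-max scaling as the normalization are assumptions made for illustration; the patent does not fix a particular method or order of operations.

```python
from typing import List, Optional


def preprocess(values: List[Optional[float]], low: float, high: float,
               fill: float) -> List[float]:
    """Fill nulls, drop out-of-range values, then scale the rest to [0, 1]."""
    filled = [fill if v is None else v for v in values]   # null-value handling
    kept = [v for v in filled if low <= v <= high]        # outlier handling
    if not kept:
        return []
    lo, hi = min(kept), max(kept)
    if hi == lo:                                          # avoid division by zero
        return [0.0 for _ in kept]
    return [(v - lo) / (hi - lo) for v in kept]           # min-max normalization


cleaned = preprocess([10.0, None, 30.0, 500.0], low=0.0, high=100.0, fill=20.0)
```

Here the missing value is replaced by 20.0, the value 500.0 is dropped as out of range, and the remaining values 10, 20, and 30 are scaled to the interval from 0 to 1.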
The modeling flow comprises steps 402 to 405, of which steps 404 and 405 form the model output flow.
Step 402: Select a modeling algorithm and build the model.
In this step, the data acquisition flow of the workflow passes the acquired data to the modeling flow, which selects an algorithm according to a preset algorithm acquisition rule. The algorithm acquisition rule is set according to the modeling purpose: for a predictive model it may be set to select a decision tree algorithm, a logistic regression algorithm, or a neural network algorithm; for an association analysis model it must be set to select an association algorithm, and the logistic regression algorithm cannot be selected. Finally, the acquired and preprocessed modeling data is modeled with the selected algorithm.
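One simple way to picture an algorithm acquisition rule is a lookup from modeling purpose to permitted algorithms. The table `ALGORITHM_RULES`, the purpose keys, and the algorithm name strings below are illustrative assumptions, not identifiers from the patent.

```python
from typing import Dict, List

# Hypothetical rule table: modeling purpose -> algorithms that may be selected.
ALGORITHM_RULES: Dict[str, List[str]] = {
    "prediction": ["decision_tree", "logistic_regression", "neural_network"],
    "association": ["association_rules"],
}


def select_algorithms(purpose: str) -> List[str]:
    """Return the candidate algorithms allowed for the given modeling purpose."""
    if purpose not in ALGORITHM_RULES:
        raise ValueError(f"no algorithm rule configured for purpose: {purpose}")
    return list(ALGORITHM_RULES[purpose])


candidates = select_algorithms("prediction")
```

Keeping the rule as data rather than code means the permitted algorithms can be changed when the workflow is configured, without touching the modeling flow itself.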
Step 403: Model evaluation.
In particular, this step may set evaluation indices according to the selected algorithm, such as the F value, the z value, and the squared error, set evaluation rules based on these indices, and execute the evaluation rules to assess model quality and determine the optimal model.
In this step an evaluation index is set, for example the hit rate: for data whose outcomes are already known, the results computed by the model are compared with the known outcomes, and the hit rate is the proportion of computed results that are correct. The models are then evaluated against this index, that is, the quality of several models is compared on the configured index to determine the optimal model.
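The hit-rate comparison can be sketched as follows. The function names `hit_rate` and `select_best`, the `churned` label field, and the two toy models are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]
Model = Callable[[Record], int]


def hit_rate(model: Model, records: List[Record],
             label_key: str = "churned") -> float:
    """Proportion of records for which the model output equals the known outcome."""
    hits = sum(1 for r in records if model(r) == r[label_key])
    return hits / len(records)


def select_best(models: Dict[str, Model],
                records: List[Record]) -> Tuple[str, Dict[str, float]]:
    """Evaluate every model on the same records and pick the highest hit rate."""
    rates = {name: hit_rate(m, records) for name, m in models.items()}
    best = max(rates, key=rates.get)
    return best, rates


history = [
    {"calls": 1, "churned": 1},
    {"calls": 9, "churned": 0},
    {"calls": 2, "churned": 1},
    {"calls": 8, "churned": 0},
]
models = {
    "always_stay": lambda r: 0,
    "low_calls": lambda r: 1 if r["calls"] < 5 else 0,
}
best, rates = select_best(models, history)
```

On this toy history the `always_stay` model is right half the time and `low_calls` is right every time, so `low_calls` is selected.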
Step 404: Output the evaluation report.
Step 405: Output the optimal rules.
The analysis report of the optimal model is selected as the output that completes the modeling process.
At this point the workflow setup is complete and modeling of dynamic data has been realized. Moreover, to model multiple groups of data, one only needs to start the workflow, so data modeling runs automatically.
Fig. 5 is a flowchart of the method for applying data modeling results in an embodiment of the invention. As shown in Fig. 5, the method comprises the following steps:
Data acquisition comprises steps 500 and 501.
Step 500: Obtain the scoring data.
In this step, scoring data that satisfies the configured rule is obtained through the rule set in the rule engine. By configuring the rule engine, data can be obtained from multiple data sources or from a dynamically changing data source.
Step 501: Preprocess the scoring data.
This step preprocesses the scoring data, including data analysis, outlier handling, null-value handling, data extraction, and data conversion, so that the data can be scored directly.
The result application flow comprises steps 502 and 503.
Step 502: Score the data and obtain the scoring result.
In this step, the optimal model selected in the modeling process is used to score the corresponding scoring data, yielding the scoring result for that data.
Specifically, this step comprises: setting a scoring rule for the model so that the optimal model corresponds to the modeling data and the scoring data; executing the scoring rule; scoring the scoring data with the optimal model; and obtaining the scoring result, which is stored in the scoring data as an indicator.
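A minimal sketch of such a scoring rule is shown below, assuming that each optimal model and its scoring data share a group key and that the score is written back as a new field. The function `score_groups`, the field name `churn_score`, and the constant-valued stand-in models are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Model = Callable[[Row], float]


def score_groups(models: Dict[str, Model],
                 scoring_groups: Dict[str, List[Row]],
                 field: str = "churn_score") -> Dict[str, List[Row]]:
    """Score each group with its own model and store the score on every row."""
    scored: Dict[str, List[Row]] = {}
    for key, rows in scoring_groups.items():
        model = models[key]   # scoring rule: the group key links model and data
        scored[key] = [{**row, field: model(row)} for row in rows]
    return scored


# Stand-in optimal models, one per data group.
models = {"source1": lambda row: 0.8, "source2": lambda row: 0.2}
scoring_groups = {
    "source1": [{"user_id": "u1"}],
    "source2": [{"user_id": "u2"}, {"user_id": "u3"}],
}
scored = score_groups(models, scoring_groups)
```

The original rows are left unmodified; each scored row is a copy with the extra field, so the scoring data and its result travel together to the presentation step.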
Step 503: Set the output content of the scoring result according to the application's needs, including an analysis of the scoring result or output of part of the scoring result.
This step follows the usage requirements of the model. For example, to retain users aged 20 to 30 whose value is high and whose churn rate is above 0.8, a rule of age 20 to 30, churn rate > 0.8, and value high is set in the presentation part of the model, and the presented result is the users who satisfy the rule. If the scoring result table contains only basic user information and churn rate information, a rule must be set to join it, via the basic user information, with another table that contains basic user information together with user age and value information, and to output the information that satisfies the rule. Other content can also be output, for example the various settings used in the modeling process, the model evaluation report, and the modeling result.
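The presentation rule in this example, including the join between the scoring result table and a second table holding age and value, might look like the sketch below. The function `present`, the field names, and the use of `user_id` as the join key are assumptions introduced for illustration.

```python
from typing import Dict, List


def present(scores: List[Dict[str, object]],
            profiles: List[Dict[str, object]]) -> List[str]:
    """Join scores with profiles on user_id and keep users matching the rule."""
    by_id = {p["user_id"]: p for p in profiles}
    selected = []
    for s in scores:
        p = by_id.get(s["user_id"])
        if p is None:              # no profile row to join with
            continue
        if 20 <= p["age"] <= 30 and p["value"] == "high" and s["churn_score"] > 0.8:
            selected.append(s["user_id"])
    return selected


scores = [
    {"user_id": "u1", "churn_score": 0.9},
    {"user_id": "u2", "churn_score": 0.9},
    {"user_id": "u3", "churn_score": 0.5},
    {"user_id": "u4", "churn_score": 0.95},
]
profiles = [
    {"user_id": "u1", "age": 25, "value": "high"},
    {"user_id": "u2", "age": 45, "value": "high"},
    {"user_id": "u3", "age": 22, "value": "high"},
]
to_retain = present(scores, profiles)
```

Only `u1` satisfies all three conditions: `u2` is outside the age range, `u3` has a churn score below the threshold, and `u4` has no profile row to join with.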
If required, step 504 is then performed: output the model evaluation report.
If multiple data sources need to be scored, it suffices to start the workflow configured according to the above flow and score the multiple data sources; automatic scoring of the data is achieved through the configured workflow.
As described above, modeling data is obtained from the data source by setting a data extraction rule, which realizes modeling of dynamically changing data. Furthermore, because the data mining workflow is set up in advance, automatic modeling and automatic scoring of multiple data sources require only that the configured workflow be started, with no manual intervention. This lowers the barrier to using data mining, makes data mining run automatically, and shortens its response time.
The modeling method in an embodiment of the invention is described in detail below with a practical application example.
Suppose a province has 18 cities. The 5 cities with the highest customer churn rate are modeled separately, the other cities are modeled together, and the customers in each city who may churn after 3 months are predicted. This comprises the following steps:
First, based on the project objective of predicting the customers in each city who may churn after 3 months, determine the indicators and data the project needs. This part is handled in the database; no data processing is done in the mining tool, and the result serves as the source for data acquisition.
Set the modeling data acquisition flow in the workflow: set the data extraction rule through the rule engine to extract the data of the 5 cities with the highest churn rate (A, B, C, D, and G) and model each separately, obtaining modeling data and scoring data for each city. The cities other than these 5 are handled together as city x. The churn (lose) field in the data of city x is set as the target field for scoring.
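The grouping in this example (the highest-churn cities each on their own, all remaining cities pooled as x) can be sketched as follows. The function `partition_by_city`, the `city` and `churned` field names, and the way the churn rate is computed from historical records are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]


def partition_by_city(records: List[Record],
                      top_n: int = 5) -> Dict[str, List[Record]]:
    """Keep the top_n highest-churn cities separate; pool the rest under 'x'."""
    totals = defaultdict(lambda: [0, 0])      # city -> [churned count, total count]
    for r in records:
        totals[r["city"]][0] += r["churned"]
        totals[r["city"]][1] += 1
    rates = {city: churned / n for city, (churned, n) in totals.items()}
    top = set(sorted(rates, key=rates.get, reverse=True)[:top_n])

    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[r["city"] if r["city"] in top else "x"].append(r)
    return dict(groups)


records = [
    {"city": "A", "churned": 1}, {"city": "A", "churned": 1},
    {"city": "B", "churned": 1}, {"city": "B", "churned": 0},
    {"city": "C", "churned": 0},
]
groups = partition_by_city(records, top_n=1)
```

With `top_n=1` on this toy data, city A (churn rate 1.0) keeps its own group and cities B and C are pooled under `x`; the example in the text corresponds to `top_n=5` over 18 cities.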
Set the data preprocessing rule in the workflow: from the acquired modeling data, draw a 5% random sample for modeling, and preprocess it by removing null values, handling outliers (taking a 5% confidence limit), and applying 0-1 standardization, yielding high-quality data that can be modeled directly.
Set the modeling flow of the workflow: for the preprocessed data, select the logistic regression algorithm or the decision tree algorithm and set the algorithm parameters (for example, use the default values); model the processed modeling data with the configured algorithm to obtain the churn rules;
Set model evaluation in the modeling flow of the workflow: set the index used to compare models (for example, the hit rate); select the optimal model from the model results according to the configured index, for example, the model with the highest hit rate is determined to be the optimal model, and that model is taken as the rule for the data;
Set the result application flow in the workflow: use each city's modeling result to score the scoring data of the corresponding city. For example, use the optimal model built for city A to predict that city's customer churn rate after three months; use the optimal model built for city x to predict city x's customer churn rate after three months.
Set the presentation flow in the workflow: for example, generate a modeling report for city A and use it to review the whole modeling process for city A and the evaluation of the model result. According to application needs, set the model output content, including the model evaluation report, the analysis of the model scoring result, the output fields of the model application data, and so on. For example, output the user IDs whose predicted churn rate is above 0.5 and whose bill is above 50 yuan.
Finally, control the workflow to model the other cities automatically, score the users of the other cities automatically, predict churn, and output the user IDs that satisfy the configured conditions.
Fig. 6 is a structural diagram of the data mining system in an embodiment of the invention. As shown in Fig. 6, the system comprises, within the workflow, a data acquisition module, a modeling module, a result application module, and a presentation module.
The data acquisition module is used to store the configured data extraction rule and to extract modeling data and scoring data from the data source according to that rule. The modeling module is used to select an algorithm and build a model from the modeling data obtained by the data acquisition module. The result application module is used to score the scoring data obtained by the data acquisition module with the model built. The presentation module is used to output the scoring result.
If only a model needs to be built, the modeling system in an embodiment of the invention comprises the data acquisition module and the modeling module.
Specifically, the data acquisition module comprises a rule engine module, an extraction module, and an extraction rule setting module.
The rule engine module is used to set the data extraction rule. The extraction module is used to extract modeling data and scoring data from the data source according to the data extraction rule set by the rule engine. The extraction rule setting module is used to set the extraction condition, and to set multiple extractions or extraction from a single data source or multiple data sources.
The data acquisition module may further comprise a preprocessing module, used to preprocess the data extracted by the extraction module.
The modeling module comprises an algorithm selection module, a model building module, and an evaluation module. The algorithm selection module is used to select an algorithm. The model building module is used to model the data from the data acquisition module with the algorithm selected by the algorithm selection module. The evaluation module is used to store the preset model evaluation rule, evaluate the models built according to that rule, and determine the optimal model.
As can be seen from the above, the technical solution provided by the embodiments of the invention executes a preset data extraction rule to extract modeling data and scoring data from a data source, models the extracted modeling data with the selected algorithm, and scores the extracted scoring data with the model built. Because the data extraction rule can be set flexibly, dynamically changing data can be modeled.
The technical solution of the embodiments further sets up a data acquisition flow, a modeling flow, a result application flow, and a presentation flow in a workflow, models the modeling data obtained, scores the scoring data obtained, and models and scores multiple groups of data by controlling the configured workflow. This realizes automatic modeling and scoring of data, shortens the response time of modeling, and makes modeling and data mining run automatically.
Moreover, because data mining is performed by controlling the workflow, anyone able to start the workflow can carry out data mining. No dedicated personnel training for data mining is needed, which saves cost.
The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (14)

1, a kind of data digging method is characterized in that, this method comprises:
A, set in advance the data pick-up rule; According to described data pick-up rule, from data source, extract modeling data and score data;
B, selection algorithm carry out modeling to described modeling data, and according to the model evaluation rule that sets in advance, the model of setting up are assessed, and determine optimum model;
C, utilize the model of described foundation, described score data is marked;
D, output appraisal result.
2, data digging method as claimed in claim 1 is characterized in that, this method further comprises:
Set in advance and comprise that data obtain flow process, modeling flow process, application flow and represent the workflow of flow process as a result;
Described steps A is obtained in the flow process in described data and is carried out; Described step B carries out in described modeling flow process; Described step C carries out in described application flow as a result; Described step D carried out in described representing in the flow process.
3, data digging method as claimed in claim 1 or 2 is characterized in that, described steps A further comprises: described modeling data and score data are carried out pre-service.
4, data digging method as claimed in claim 1 or 2 is characterized in that, the described data pick-up rule that is provided with comprises: the condition of data pick-up is set, and repeatedly extracted data is set, or extract a plurality of data from single or multiple data sources.
5, a kind of modeling method is characterized in that, this method comprises:
A ', the default data pick-up rule of basis extract modeling data from data source;
B ', selection algorithm carry out modeling to described modeling data;
The model evaluation rule that C ', basis set in advance is assessed the model of setting up, and determines optimum model.
6, modeling method as claimed in claim 5 is characterized in that, this method further comprises:
Set in advance and comprise that data obtain the workflow of flow process and modeling flow process;
Described steps A ' obtain in the flow process in described data and to carry out; Described step B ' carries out in described modeling flow process.
7. The modeling method as claimed in claim 5 or 6, characterized in that step A' further comprises: preprocessing the extracted modeling data.
8. The modeling method as claimed in claim 5 or 6, characterized in that setting the data extraction rule comprises: setting the conditions for data extraction, and setting repeated data extraction, or extracting multiple pieces of data from a single data source or from multiple data sources.
9. A data mining system, characterized in that the system comprises a data acquisition module, a modeling module, a result application module, and a presentation module, wherein:
the data acquisition module is configured to store the set data extraction rule and to extract modeling data and scoring data from a data source according to the data extraction rule;
the modeling module comprises an algorithm selection module, a model building module, and an evaluation module, wherein:
the algorithm selection module is configured to select an algorithm;
the model building module is configured to model the data from the data acquisition module according to the selected algorithm;
the evaluation module is configured to store a preset model evaluation rule, to evaluate the established models according to the model evaluation rule, and to determine the optimal model;
the result application module is configured to score the scoring data using the model;
the presentation module is configured to output the scoring result.
10. The data mining system as claimed in claim 9, characterized in that the data acquisition module comprises:
a rule engine module, configured to set the data extraction rule;
an extraction module, configured to extract modeling data and scoring data from a data source according to the data extraction rule of the rule engine module;
and an extraction rule setting module, configured to set the conditions for data extraction, and to set repeated data extraction or data extraction from a single data source or from multiple data sources.
11. The data mining system as claimed in claim 9 or 10, characterized in that the data acquisition module further comprises: a preprocessing module, configured to preprocess the modeling data.
12. A modeling system, characterized in that the system comprises a data acquisition module and a modeling module, wherein:
the data acquisition module is configured to store the set data extraction rule and to extract modeling data from a data source according to the rule;
the modeling module comprises an algorithm selection module, a model building module, and an evaluation module, wherein:
the algorithm selection module is configured to select an algorithm;
the model building module is configured to model the data from the data acquisition module according to the selected algorithm;
the evaluation module is configured to store a preset model evaluation rule, to evaluate the established models according to the model evaluation rule, and to determine the optimal model.
13. The modeling system as claimed in claim 12, characterized in that the data acquisition module comprises:
a rule engine module, configured to set the data extraction rule;
an extraction module, configured to extract modeling data from a data source according to the data extraction rule;
and an extraction rule setting module, configured to set the conditions for data extraction, and to set repeated data extraction or data extraction from a single data source or from multiple data sources.
14. The modeling system as claimed in claim 13, characterized in that the data acquisition module further comprises: a preprocessing module, configured to preprocess the modeling data extracted by the extraction module.
CNB2007101495069A 2007-09-04 2007-09-04 The method and system of a kind of data mining and modeling Expired - Fee Related CN100568243C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101495069A CN100568243C (en) 2007-09-04 2007-09-04 The method and system of a kind of data mining and modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101495069A CN100568243C (en) 2007-09-04 2007-09-04 The method and system of a kind of data mining and modeling

Publications (2)

Publication Number Publication Date
CN101110089A CN101110089A (en) 2008-01-23
CN100568243C true CN100568243C (en) 2009-12-09

Family

ID=39042157

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101495069A Expired - Fee Related CN100568243C (en) 2007-09-04 2007-09-04 The method and system of a kind of data mining and modeling

Country Status (1)

Country Link
CN (1) CN100568243C (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8977720B2 (en) * 2010-12-04 2015-03-10 Zhaobing Meng Internet based hosted system and computer readable medium for modeling analysis
CN102693317B (en) * 2012-05-29 2014-11-05 华为软件技术有限公司 Method and device for data mining process generating
CN103336993B (en) * 2013-03-22 2016-06-08 无锡必通科技有限公司 Double; two automatizatioies aid decision-making system
CN104346372B (en) * 2013-07-31 2018-03-27 国际商业机器公司 Method and apparatus for assessment prediction model
CN104484409A (en) * 2014-12-16 2015-04-01 芜湖乐锐思信息咨询有限公司 Data mining method for big data processing
CN105045764A (en) * 2015-08-11 2015-11-11 精硕世纪科技(北京)有限公司 Method and system for acquiring input parameters of model cluster
CN106874306B (en) * 2015-12-14 2020-10-09 公安部户政管理研究中心 Method for evaluating key performance index of population information portrait comparison system
CN105718535B (en) * 2016-01-15 2018-03-02 深圳大学 A kind of online methods of marking and its system
CN107038167A (en) * 2016-02-03 2017-08-11 普华诚信信息技术有限公司 Big data excavating analysis system and its analysis method based on model evaluation
CN106202163A (en) * 2016-06-24 2016-12-07 中国环境科学研究院 Tongjiang lake ecological monitoring information management and early warning system
CN106778976A (en) * 2016-12-28 2017-05-31 芜湖乐锐思信息咨询有限公司 Road information management system based on the treatment of multi-form big data
CN106933956B (en) * 2017-01-22 2020-12-01 深圳市华成峰科技有限公司 Data mining method and device
CN107248030A (en) * 2017-05-26 2017-10-13 谢首鹏 A kind of bond Risk Forecast Method and system based on machine learning algorithm
CN107832429A (en) * 2017-11-14 2018-03-23 广州供电局有限公司 audit data processing method and system
CN107798137B (en) * 2017-11-23 2018-12-18 霍尔果斯智融未来信息科技有限公司 A kind of multi-source heterogeneous data fusion architecture system based on additive models
CN108615560A (en) * 2018-03-19 2018-10-02 安徽锐欧赛智能科技有限公司 A kind of clinical medical data analysis method based on data mining
CN110362303B (en) * 2019-07-15 2020-08-25 深圳市宇数科技有限公司 Data exploration method and system
WO2021056275A1 (en) * 2019-09-25 2021-04-01 Accenture Global Solutions Limited Optimizing generation of forecast
CN111581771A (en) * 2020-03-30 2020-08-25 无锡融合大数据创新中心有限公司 Stamping workpiece cracking prediction platform based on artificial intelligence technology
CN111724028A (en) * 2020-05-08 2020-09-29 中海创科技(福建)集团有限公司 Machine equipment operation analysis and mining system based on big data technology
CN114996331B (en) * 2022-06-10 2023-01-20 北京柏睿数据技术股份有限公司 Data mining control method and system
CN116362379A (en) * 2023-02-27 2023-06-30 上海交通大学 Nuclear reactor operation parameter prediction method based on six-dimensional index

Also Published As

Publication number Publication date
CN101110089A (en) 2008-01-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091209

Termination date: 20160904