CN106202310A - A method for establishing an automatic feedback system for data mining - Google Patents
A method for establishing an automatic feedback system for data mining
- Publication number
- CN106202310A (application CN201610512308.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- data mining
- parameter
- algorithm
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/217—Database tuning
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Fuzzy Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method for establishing an automatic feedback system for data mining, addressing the need for continual manual feedback and tuning in current data mining practice. The system established by the method comprises a data segmentation module, a result evaluation module, and a parameter adjustment module. The three modules work together to form a feedback loop that automatically adjusts and optimizes the parameters of a data mining algorithm, reducing labor cost compared with conventional data mining. An efficient automatic parameter adjustment algorithm adjusts the algorithm parameters accurately, improves the efficiency of parameter adjustment, and reduces the computation wasted in automated tuning by traversing an excessively large range of parameter values.
Description
Technical field
The present invention relates to the field of data mining, and in particular to a method for establishing an automatic feedback system for data mining.
Background
With the rapid development of big data technology, data mining is being applied ever more widely; universities, research institutions, government, and technology companies all make extensive use of it.
A complete data mining process typically includes data preprocessing, execution of the data mining algorithm, and reporting of the results. The most critical step is executing the data mining algorithm and obtaining its result, and this step usually requires substantial manual intervention and feedback: an expert inspects the output of the algorithm using an empirical model, readjusts the algorithm parameters according to that output, and re-runs the algorithm to obtain a new result, repeating until the result is satisfactory. This process consumes considerable labor, time, and effort. Although most data mining algorithms iterate and converge on their own, unsatisfactory results caused by the choice of initial parameters or by local optima encountered during computation cannot be resolved by the algorithm's own iteration.
Summary of the invention
The object of the present invention is to solve the problem that current data mining requires continual manual feedback and tuning, by providing a method for establishing an automatic feedback system for data mining that is efficient, practicable, and automatic.
The method of the present invention establishes a data mining automatic feedback system comprising a data segmentation module, a result evaluation module, and a parameter adjustment module. The data segmentation module divides the data into training data and evaluation data. The result evaluation module evaluates how satisfactory the data mining result is and feeds its evaluation back to the parameter adjustment module. The parameter adjustment module adjusts the parameters of the data mining algorithm according to the evaluation from the result evaluation module.
The steps of the method of the present invention are as follows:
Step 1: The source data to be mined is randomly divided, in a given proportion, into training data and test data. The training data is used to train the data mining algorithm model, and the test data is used to evaluate the accuracy of the data mining model. The division is repeated for each execution of the process, each time with a different random seed, so that the chance outcome of a single random division does not bias the evaluation of the algorithm's result;
Step 2: If the output of the data mining algorithm is a model, the independent variables of the test data produced by the division in Step 1 are used as input, and the algorithm model produced by training the data mining algorithm performs data mining on them. The original results in the test data of Step 1 are compared with the output produced by the algorithm model, the matching degree between the two is computed, and network performance metrics such as MSE and RMSE are computed from the matched data to give an accuracy evaluation of the algorithm model;
If the output of the data mining algorithm is result data, the data mining result produced from the training data is compared with the test data, the matching degree between the two is computed, network performance metrics such as MSE and RMSE are computed from the matched data, and the matching degree and network performance metrics are fed back to the parameter adjustment module;
Step 3: Based on the test results and accuracy evaluation of the data mining algorithm model from Step 2, that is, the feedback from the result evaluation module, the automatic parameter adjustment algorithm adjusts the data mining parameters;
Step 4: The data mining algorithm model with adjusted parameters is taken as the new algorithm model and Step 1 is executed again, until the test result of the data mining algorithm model meets the requirement;
The automatic parameter adjustment algorithm of Step 3 divides the parameters into scalar parameters and vector parameters. During tuning, scalar parameters are adjusted first; if adjusting the scalar parameters still does not meet the requirement, each vector parameter is adjusted step by step, at a granularity that goes from coarse to fine;
Further, a scalar parameter is a parameter that can take only a finite number of values; for example, a similarity distance measure can only be one of a limited set such as Euclidean distance, Minkowski distance, or Manhattan distance;
Further, a vector parameter is a parameter that can be set to any floating-point number within a certain range, such as the smoothing parameter of the naive Bayes classification algorithm;
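The MSE of Step 2 is a network performance function, the mean squared error of the network. In the standard definition, for $n$ test samples with actual values $y_i$ and predicted values $\hat{y}_i$, it is computed as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
The RMSE of Step 2 is a network performance function, the root-mean-square error of the network, computed as:
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$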
In the data mining automatic feedback system established by the method of the present invention, the data segmentation module, result evaluation module, and parameter adjustment module work together to form a feedback loop that automatically adjusts and optimizes the parameters of the data mining algorithm, reducing labor cost compared with conventional data mining. The data segmentation module divides the data into training data and test data, so that verification of the data mining result rests on evidence. The result evaluation module evaluates the output of the data mining algorithm and gives feedback on its effectiveness, putting parameter adjustment on a more systematic footing. The parameter adjustment module adjusts the parameters of the data mining algorithm automatically, reducing the labor spent on applying expert empirical models. The efficient automatic parameter adjustment algorithm adjusts the algorithm parameters accurately, improves the efficiency of parameter adjustment, and reduces the waste caused in automated tuning by traversing an excessively large range of parameter values.
Brief description of the drawings
Fig. 1 is a workflow diagram of the data mining automatic feedback system in an embodiment of the present invention;
Here, 1 is the data segmentation module, 2 is the result evaluation module, and 3 is the parameter adjustment module.
Fig. 2 is a workflow diagram of the data segmentation module in an embodiment of the present invention;
Fig. 3 is a workflow diagram of the result evaluation module in an embodiment of the present invention, for the case where the data mining algorithm outputs a model;
Fig. 4 is a workflow diagram of the result evaluation module in an embodiment of the present invention, for the case where the data mining algorithm outputs result data;
Fig. 5 is a workflow diagram of the parameter adjustment module in an embodiment of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawings and an embodiment.
Embodiment 1
The embodiment described with reference to the drawings is exemplary; it serves only to explain the present invention and is not to be construed as limiting it.
The workflow of the data mining automatic feedback system of this embodiment is shown in Fig. 1. The specific steps are as follows:
Step 1: Data segmentation
As shown in Fig. 2, the source data to be mined is randomly divided, in a given proportion, into training data and test data. The training data is used to train the data mining algorithm model, and the test data is used to evaluate the accuracy of the data mining model. The division is repeated for each execution of the process, each time with a different random seed, so that the chance outcome of a single random division does not bias the evaluation of the algorithm's result.
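The repeated random division of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `split_data` and `repeated_splits`, the 80/20 default ratio, and the use of consecutive integers as seeds are all hypothetical choices.

```python
import random


def split_data(records, train_ratio=0.8, seed=None):
    """Randomly divide records into (training, test) lists by the given ratio."""
    rng = random.Random(seed)          # independent generator, so the seed fully determines the split
    shuffled = list(records)           # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def repeated_splits(records, train_ratio=0.8, rounds=3, base_seed=0):
    """Yield one split per round, each drawn with a different seed (assumed scheme)."""
    for i in range(rounds):
        yield split_data(records, train_ratio, seed=base_seed + i)
```

Calling `repeated_splits(data, rounds=5)` would give five independent training/test divisions of the same source data, so that no single division dominates the evaluation.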
Step 2: Training and evaluating the data mining algorithm
As shown in Fig. 3, if the output of the data mining algorithm is a model, the independent variables of the test data produced by the division in Step 1 are used as input, and the algorithm model produced by training the data mining algorithm performs data mining on them. The original results in the test data of Step 1 are compared with the output produced by the algorithm model, the matching degree between the two is computed, and network performance metrics such as MSE and RMSE are computed from the matched data to give an accuracy evaluation of the algorithm model.
As shown in Fig. 4, if the output of the data mining algorithm is result data, the data mining result produced from the training data is compared with the test data, the matching degree between the two is computed, network performance metrics such as MSE and RMSE are computed from the matched data, and the matching degree and network performance metrics are fed back to the parameter adjustment module.
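The MSE of Step 2 is a network performance function, the mean squared error of the network. In the standard definition, for $n$ test samples with actual values $y_i$ and predicted values $\hat{y}_i$, it is computed as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
The RMSE of Step 2 is a network performance function, the root-mean-square error of the network, computed as:
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$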
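The evaluation metrics of Step 2 can be computed as in the sketch below. MSE and RMSE follow their standard definitions. The text does not define "matching degree", so `matching_degree` here assumes one plausible reading (the fraction of outputs that agree with the test data within a tolerance); that function and its `tolerance` parameter are hypothetical.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def matching_degree(actual, predicted, tolerance=0.0):
    """Assumed definition: share of predictions within `tolerance` of the actual value."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)
```

Because RMSE is expressed in the same units as the data, it is often easier to compare against a requirement threshold than MSE.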
Step 3: Adjusting the data mining algorithm parameters to optimize the data mining algorithm
As shown in Fig. 5, the automatic parameter adjustment algorithm adjusts the data mining parameters according to the feedback from the result evaluation module. The automatic parameter adjustment algorithm divides the parameters into scalar parameters and vector parameters. During tuning, scalar parameters are adjusted first; if adjusting the scalar parameters still does not meet the requirement, each vector parameter is adjusted step by step, at a granularity that goes from coarse to fine. A scalar parameter is a parameter that can take only a finite number of values; for example, a similarity distance measure can only be one of a limited set such as Euclidean distance, Minkowski distance, or Manhattan distance. A vector parameter is a parameter that can be set to any floating-point number within a certain range, such as the smoothing parameter of the naive Bayes classification algorithm.
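The two-phase adjustment described in Step 3 can be sketched as below. The text states only the ordering (scalar parameters first, then vector parameters from coarse to fine); the concrete search used here, an exhaustive pass over each finite choice list followed by a grid that is repeatedly narrowed around the best value found, is an assumed interpretation. The names `tune`, `evaluate`, `levels`, and `steps` are hypothetical.

```python
def tune(evaluate, scalar_choices, vector_ranges, target, levels=3, steps=5):
    """Return (params, error). evaluate(params) must return an error; lower is better.

    scalar_choices: {name: [finite list of allowed values]}
    vector_ranges:  {name: (low, high)} for floating-point parameters
    """
    # Starting point: first choice for each scalar, midpoint for each vector parameter.
    params = {name: choices[0] for name, choices in scalar_choices.items()}
    params.update({name: (lo + hi) / 2 for name, (lo, hi) in vector_ranges.items()})
    best = evaluate(params)

    # Phase 1: adjust scalar (finite-valued) parameters first.
    for name, choices in scalar_choices.items():
        for choice in choices:
            trial = dict(params, **{name: choice})
            err = evaluate(trial)
            if err < best:
                params, best = trial, err
    if best <= target:
        return params, best

    # Phase 2: adjust each vector parameter on a grid that shrinks at every level.
    for name, (lo, hi) in vector_ranges.items():
        for _ in range(levels):
            width = (hi - lo) / (steps - 1)
            for k in range(steps):
                trial = dict(params, **{name: lo + k * width})
                err = evaluate(trial)
                if err < best:
                    params, best = trial, err
            # Narrow the range to one grid cell on either side of the current best value.
            center = params[name]
            lo, hi = max(lo, center - width), min(hi, center + width)
        if best <= target:
            break
    return params, best
```

Handling the finite-valued parameters first keeps the expensive continuous search confined to the most promising discrete configuration, which matches the stated goal of avoiding an excessive traversal of parameter values.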
As shown in Fig. 1, the data mining automatic feedback system comprises three modules: data segmentation module 1, result evaluation module 2, and parameter adjustment module 3.
The result evaluation module 2 and parameter adjustment module 3 form a feedback loop with the data mining algorithm; until a satisfactory data mining result is obtained, positive-feedback optimization proceeds continuously within the loop. The computed result of the data mining algorithm, or the model it produces, is input to the result evaluation module; the output of the result evaluation module is input to the parameter adjustment module; and the result of the parameter adjustment is applied back to the data mining algorithm, forming a closed-loop computing system.
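The closed loop formed by the three modules can be sketched as a single driver function. This is an illustrative skeleton only: `feedback_loop` and its callable arguments `train`, `evaluate`, and `adjust` are hypothetical stand-ins for the data mining algorithm, the result evaluation module, and the parameter adjustment module, and the stopping rule (error at or below `target`, or `max_rounds` reached) is an assumption.

```python
import random


def feedback_loop(records, train, evaluate, adjust, params,
                  target, max_rounds=20, train_ratio=0.8):
    """Split, train, evaluate, and adjust until the error meets the target."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    history = []
    for round_no in range(max_rounds):
        # Data segmentation: a fresh random division each round (seed = round number).
        shuffled = list(records)
        random.Random(round_no).shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        training, test = shuffled[:cut], shuffled[cut:]

        model = train(training, params)      # data mining algorithm
        error = evaluate(model, test)        # result evaluation
        history.append((dict(params), error))
        if error <= target:
            break
        params = adjust(params, error)       # parameter adjustment, fed back into training
    return params, error, history


# Toy usage: the "model" simply applies a slope parameter; the data follow y = 2x.
records = [(x, 2.0 * x) for x in range(1, 11)]


def train(training, params):
    slope = params["slope"]                  # toy training ignores the data
    return lambda x: slope * x


def evaluate(model, test):
    return sum((y - model(x)) ** 2 for x, y in test) / len(test)   # MSE on test data


def adjust(params, error):
    return {"slope": params["slope"] + 0.5}  # fixed-step adjustment, for illustration only


final_params, final_error, history = feedback_loop(
    records, train, evaluate, adjust, {"slope": 0.0}, target=1e-9)
```

Passing the three stages in as callables keeps the loop itself independent of any particular mining algorithm, which mirrors the modular structure shown in Fig. 1.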
In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Schematic uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the invention is defined by the claims and their equivalents.
Claims (1)
1. A method for establishing an automatic feedback system for data mining, characterized in that the steps of the method are as follows:
Step 1: The source data to be mined is randomly divided, in a given proportion, into training data and test data. The training data is used to train the data mining algorithm model, and the test data is used to evaluate the accuracy of the data mining model. The division is repeated for each execution of the process, each time with a different random seed, so that the chance outcome of a single random division does not bias the evaluation of the algorithm's result;
Step 2: If the output of the data mining algorithm is a model, the independent variables of the test data produced by the division in Step 1 are used as input, and the algorithm model produced by training the data mining algorithm performs data mining on them. The original results in the test data of Step 1 are compared with the output produced by the algorithm model, the matching degree between the two is computed, and network performance metrics such as MSE and RMSE are computed from the matched data to give an accuracy evaluation of the algorithm model;
If the output of the data mining algorithm is result data, the data mining result produced from the training data is compared with the test data, the matching degree between the two is computed, network performance metrics such as MSE and RMSE are computed from the matched data, and the matching degree and network performance metrics are fed back to the parameter adjustment module;
Step 3: Based on the test results and accuracy evaluation of the data mining algorithm model from Step 2, that is, the feedback from the result evaluation module, the automatic parameter adjustment algorithm adjusts the data mining parameters;
Step 4: The data mining algorithm model with adjusted parameters is taken as the new algorithm model and Step 1 is executed again, until the test result of the data mining algorithm model meets the requirement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610512308.3A CN106202310A (en) | 2016-07-01 | 2016-07-01 | A method for establishing an automatic feedback system for data mining |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610512308.3A CN106202310A (en) | 2016-07-01 | 2016-07-01 | A method for establishing an automatic feedback system for data mining |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106202310A true CN106202310A (en) | 2016-12-07 |
Family
ID=57463044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610512308.3A Pending CN106202310A (en) | 2016-07-01 | 2016-07-01 | A method for establishing an automatic feedback system for data mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106202310A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764808A (en) * | 2018-03-29 | 2018-11-06 | 北京九章云极科技有限公司 | Data Analysis Services system and its on-time model dispositions method |
CN109299178A (en) * | 2018-09-30 | 2019-02-01 | 北京九章云极科技有限公司 | A kind of application method and data analysis system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110099136A1 (en) * | 2009-10-23 | 2011-04-28 | Gm Global Technology Operations, Inc. | Method and system for concurrent event forecasting |
CN103559303A (en) * | 2013-11-15 | 2014-02-05 | 南京大学 | Evaluation and selection method for data mining algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20161207 |