CN106202310A - Method for establishing an automatic feedback system for data mining - Google Patents

Method for establishing an automatic feedback system for data mining

Info

Publication number
CN106202310A
Authority
CN
China
Prior art keywords
data
data mining
parameter
algorithm
result
Legal status
Pending
Application number
CN201610512308.3A
Other languages
Chinese (zh)
Inventor
张学睿
张帆
魏敏
王国胤
Current Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date 2016-07-01
Filing date 2016-07-01
Publication date 2016-12-07
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN201610512308.3A
Publication of CN106202310A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method for establishing an automatic feedback system for data mining. It addresses a problem in current data mining practice: tuning requires continual manual feedback. The system built by this method has three modules: a data segmentation module, a result evaluation module and a parameter adjustment module. Working together, they form a feedback loop that automatically adjusts and optimizes the parameters of the data mining algorithm. As a result, data mining takes less human effort than before. An efficient automatic parameter adjustment algorithm tunes the algorithm parameters automatically and accurately. This improves tuning efficiency and reduces the waste caused by traversing an excessively large range of parameter values during automated tuning.

Description

Method for establishing an automatic feedback system for data mining
Technical field
The present invention relates to the field of data mining, and in particular to a method for establishing an automatic feedback system for data mining.
Background
With the rapid development of big data technology, data mining is used more and more widely. Universities, research and development institutions, government bodies and technology companies all make extensive use of it.
A complete data mining process usually includes data preprocessing, execution of the data mining algorithm, and reporting of the results. The most critical step is executing the data mining algorithm to obtain the mining results, and this step generally requires a great deal of manual intervention and feedback. Typically, experts inspect the algorithm's output using their experience, readjust the algorithm parameters based on that output, and rerun the algorithm to obtain new results. They repeat this cycle until the data mining produces a satisfactory result. The process consumes considerable human effort and time. Most data mining algorithms iterate until they converge. However, poor results caused by the initial parameters, or by local optima reached during computation, cannot be fixed by the algorithm's own iterations.
Summary of the invention
The object of the invention is to remove the need for continual manual feedback and tuning in current data mining practice. To this end, it provides an efficient and practicable method for establishing an automatic feedback system for data mining.
In the method of the invention, the automatic feedback system for data mining comprises a data segmentation module, a result evaluation module and a parameter adjustment module:
- The data segmentation module divides the data into training data and evaluation data.
- The result evaluation module evaluates how satisfactory the data mining results are, and feeds the evaluation back to the parameter adjustment module.
- The parameter adjustment module adjusts the parameters of the data mining algorithm according to the evaluation from the result evaluation module.
The method of the invention comprises the following steps:
Step 1: Randomly divide the source data to be mined into training data and test data in a given proportion. The training data are used to train the data mining algorithm model, and the test data are used to evaluate the model's accuracy. The data are re-split on every execution of the process, each time with a different random seed. This prevents the chance outcome of a single random split from distorting the evaluation of the algorithm's results.
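A minimal Python sketch of Step 1, assuming the source data fit in memory as a list of records. The function name `split_data`, the 70/30 ratio and the use of the pass counter as the seed are illustrative choices; the patent does not specify them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_data(records: Sequence[T], train_ratio: float, seed: int) -> Tuple[List[T], List[T]]:
    """Randomly split records into (training, test) lists in the given proportion."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    rng = random.Random(seed)  # private generator, so each split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Every pass of the feedback loop re-splits with a new seed, so no single
# lucky or unlucky partition decides how the algorithm is judged.
data = list(range(10))
for run in range(3):
    train, test = split_data(data, train_ratio=0.7, seed=run)
    print(run, len(train), len(test))
```

A dedicated `random.Random` instance is used instead of the global generator. This lets the seed for each pass be recorded and replayed without affecting other code.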
Step 2: If the output of the data mining algorithm is a model, take the independent variables of the test data produced in Step 1 as input to the algorithm model trained by the data mining algorithm. Compare the model's output with the original values recorded in the test data, and compute the degree of matching between them. From the matched data, compute network performance indexes such as MSE and RMSE to obtain an accuracy evaluation of the algorithm model.
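The patent does not define the "degree of matching". The Python sketch below assumes a hypothetical rule: a prediction matches when it lies within a tolerance of the recorded value. The model is any callable that maps one row of independent variables to a prediction. The names `evaluate_model`, `toy_model` and `tolerance` are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple


def evaluate_model(
    predict: Callable[[Sequence[float]], float],
    test_inputs: Sequence[Sequence[float]],
    test_targets: Sequence[float],
    tolerance: float = 0.5,
) -> Tuple[float, float]:
    """Return (matching_degree, rmse) of a trained model on the test split."""
    if len(test_inputs) != len(test_targets) or not test_targets:
        raise ValueError("need equally sized, non-empty test inputs and targets")
    # Run the model on the independent variables of the test data.
    predictions = [predict(x) for x in test_inputs]
    errors = [y - p for y, p in zip(test_targets, predictions)]
    # Hypothetical matching rule: a prediction "matches" if within `tolerance`.
    matching_degree = sum(abs(e) <= tolerance for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return matching_degree, rmse


def toy_model(x: Sequence[float]) -> float:
    """Stand-in for a trained model: y = 2 * x0 + 1."""
    return 2 * x[0] + 1


inputs = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 9.0]  # the last record is off by 2 from the model
print(evaluate_model(toy_model, inputs, targets))  # (0.75, 1.0)
```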
If the output of the data mining algorithm is result data, compare the data mining results produced from the training data with the test data, and compute the degree of matching between them. From the matched data, compute network performance indexes such as MSE and RMSE. Then feed the matching degree and the performance indexes back to the parameter adjustment module.
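The patent says neither what "result data" look like nor how matching is measured. The Python sketch below takes one hypothetical reading. The miner produces frequent item pairs from the training data. A mined pattern counts as matched if it is still frequent in the test data. RMSE is then taken over the support differences of the matched patterns. All function names and the support threshold are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Transaction = Set[str]
Pattern = FrozenSet[str]


def support(pattern: Pattern, transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `pattern`."""
    return sum(pattern <= t for t in transactions) / len(transactions)


def mine_pairs(transactions: List[Transaction], min_support: float) -> Dict[Pattern, float]:
    """Toy result-producing miner: frequent item pairs with their support."""
    items = sorted(set().union(*transactions))
    found: Dict[Pattern, float] = {}
    for pair in combinations(items, 2):
        s = support(frozenset(pair), transactions)
        if s >= min_support:
            found[frozenset(pair)] = s
    return found


def compare_with_test(
    mined: Dict[Pattern, float], test: List[Transaction], min_support: float
) -> Dict[str, float]:
    """Feedback for the parameter adjustment module: matching degree and RMSE."""
    test_support = {p: support(p, test) for p in mined}
    # A mined pattern "matches" if it is still frequent in the test data.
    matched = [p for p, s in test_support.items() if s >= min_support]
    matching_degree = len(matched) / len(mined) if mined else 0.0
    diffs = [mined[p] - test_support[p] for p in matched]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else float("nan")
    return {"matching_degree": matching_degree, "rmse": rmse}


train = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
test = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
feedback = compare_with_test(mine_pairs(train, 0.5), test, 0.5)
print(feedback)  # matching_degree 1/3 (only {a, b} survives), rmse 0.25
```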
Step 3: Adjust the data mining parameters with the automatic parameter adjustment algorithm. The adjustment is based on the test results and accuracy evaluation of the data mining algorithm model from Step 2, as fed back by the result evaluation module.
Step 4: Take the data mining algorithm model with the adjusted parameters as the new algorithm model and return to Step 1. Repeat until the test results of the data mining algorithm model meet the requirements.
The automatic parameter adjustment algorithm of Step 3 works as follows. Parameters are divided into scalar parameters and vector parameters. During tuning, scalar parameters are adjusted first. If adjusting the scalar parameters still does not meet the requirements, each vector parameter is then adjusted in turn, moving from coarse to fine granularity.
Here, a scalar parameter is a parameter that takes one of a finite number of values. For example, the similarity distance method can only be one of a few options, such as Euclidean distance, Minkowski distance or Manhattan distance.
A vector parameter is a parameter that can take any floating-point value within a certain range. An example is the smoothing parameter of the naive Bayes classification algorithm.
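The two-phase strategy can be sketched in Python as follows. Phase 1 exhaustively tries the finite-valued ("scalar") parameters. Phase 2 refines each continuous ("vector") parameter on a grid that shrinks around the best value every round. The score function, grid size, number of rounds and the toy scoring rule are assumptions for illustration, not values from the patent.

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple

Params = Dict[str, Any]


def tune(
    score: Callable[[Params], float],          # higher is better, e.g. matching degree
    scalar: Dict[str, Sequence[Any]],          # finite option lists ("scalar" parameters)
    vector: Dict[str, Tuple[float, float]],    # (low, high) ranges ("vector" parameters)
    target: float,
    points: int = 5,
    rounds: int = 4,
) -> Tuple[Params, float]:
    """Adjust scalar parameters first, then refine vector parameters coarse-to-fine."""
    best: Params = {name: (lo + hi) / 2 for name, (lo, hi) in vector.items()}
    best_score = float("-inf")

    # Phase 1: every combination of the finite-valued parameters.
    names = list(scalar)
    for combo in product(*(scalar[n] for n in names)):
        candidate = {**best, **dict(zip(names, combo))}
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    if best_score >= target:
        return best, best_score

    # Phase 2: one vector parameter at a time, shrinking the grid each round.
    for name, (lo, hi) in vector.items():
        for _ in range(rounds):
            step = (hi - lo) / (points - 1)
            for i in range(points):
                candidate = {**best, name: lo + i * step}
                s = score(candidate)
                if s > best_score:
                    best, best_score = candidate, s
            # Zoom in on the neighbourhood of the current best value.
            lo, hi = max(lo, best[name] - step), min(hi, best[name] + step)
        if best_score >= target:
            break
    return best, best_score


def toy_score(p: Params) -> float:
    # Hypothetical score: Euclidean distance is best, smoothing peaks at 0.3.
    penalty = {"euclidean": 0.0, "manhattan": 0.1, "minkowski": 0.2}[p["distance"]]
    return -penalty - (p["smoothing"] - 0.3) ** 2


best, best_score = tune(
    toy_score,
    scalar={"distance": ["euclidean", "manhattan", "minkowski"]},
    vector={"smoothing": (0.0, 1.0)},
    target=-1e-3,
)
print(best, best_score)
```

The grid is narrowed rather than fixed at a fine resolution, so only `points * rounds` evaluations are spent per continuous parameter. This reflects the patent's aim of avoiding an excessively large traversal of parameter values.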
MSE in Step 2 is a network performance function that gives the mean squared error of the network. It is computed as:
$$\mathrm{MSE} = \frac{\sum_{i=1}^{r} (n_i - 1)\, s_i^2}{N - r}$$
RMSE in Step 2 is a network performance function that gives the root-mean-square error of the network. It is computed as:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(X_{\mathrm{obs},i} - X_{\mathrm{model},i}\right)^2}{n}}$$
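Both formulas translate directly into Python. The patent does not define the MSE symbols. The sketch below assumes the standard one-way ANOVA reading: r groups, group i has n_i values with sample variance s_i^2, and N is the total number of values. For RMSE, X_obs and X_model are the observed and model-predicted values. The function names are illustrative.

```python
import math
from typing import Sequence


def pooled_mse(groups: Sequence[Sequence[float]]) -> float:
    """MSE = sum_i (n_i - 1) * s_i^2 / (N - r) for r non-empty groups, N values in total."""
    if any(len(g) == 0 for g in groups):
        raise ValueError("groups must be non-empty")
    r = len(groups)
    n_total = sum(len(g) for g in groups)
    if n_total <= r:
        raise ValueError("need more observations than groups")
    # (n_i - 1) * s_i^2 equals the sum of squared deviations from the group mean.
    within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        within += sum((x - mean) ** 2 for x in g)
    return within / (n_total - r)


def rmse(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """RMSE = sqrt(sum_i (X_obs,i - X_model,i)^2 / n)."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    return math.sqrt(squared / len(observed))


print(pooled_mse([[1.0, 2.0, 3.0], [4.0, 6.0]]))         # 4/3, about 1.333
print(rmse([2.0, 4.0, 6.0, 8.0], [1.0, 5.0, 5.0, 9.0]))  # 1.0
```

The sum of squared deviations is used instead of calling `statistics.variance`. That function raises an error for single-value groups, whereas such a group should simply contribute zero to the numerator.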
The automatic feedback system for data mining established by this method has three modules: the data segmentation module, the result evaluation module and the parameter adjustment module. Together they form a feedback loop that automatically adjusts and optimizes the parameters of the data mining algorithm, so data mining takes less human effort than before. Each module contributes as follows:
- The data segmentation module divides the data into training data and test data, so verification of the mining results rests on evidence.
- The result evaluation module evaluates the results of the data mining algorithm and feeds back on their quality, which puts parameter adjustment on a more principled footing.
- The parameter adjustment module tunes the parameters of the data mining algorithm automatically. This reduces the human effort that would otherwise go into setting parameters from expert experience.
The efficient automatic parameter adjustment algorithm tunes the algorithm parameters automatically and accurately. It improves tuning efficiency and reduces the waste caused by traversing an excessively large range of parameter values during automated tuning.
Description of the drawings
Fig. 1 is a flowchart of the automatic feedback system for data mining in an embodiment of the invention. In it, 1 is the data segmentation module, 2 is the result evaluation module and 3 is the parameter adjustment module.
Fig. 2 is a workflow diagram of the data segmentation module in an embodiment of the invention;
Fig. 3 is a workflow diagram of the result evaluation module in an embodiment of the invention, for an algorithm whose output is a model;
Fig. 4 is a workflow diagram of the result evaluation module in an embodiment of the invention, for an algorithm whose output is result data;
Fig. 5 is a workflow diagram of the parameter adjustment module in an embodiment of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and an embodiment.
Embodiment 1
The embodiment described with reference to the drawings is exemplary. It serves only to explain the invention and is not to be construed as limiting it.
The workflow of the automatic feedback system for data mining in this embodiment is shown in Fig. 1. The specific steps are as follows:
Step 1: Data splitting
As shown in Fig. 2, the source data to be mined are randomly divided into training data and test data in a given proportion. The training data are used to train the data mining algorithm model, and the test data are used to evaluate the model's accuracy. The data are re-split on every execution of the process, each time with a different random seed. This prevents the chance outcome of a single random split from distorting the evaluation of the algorithm's results.
Step 2: Train and evaluate the data mining algorithm
As shown in Fig. 3, if the output of the data mining algorithm is a model, the independent variables of the test data produced in Step 1 are used as input to the algorithm model trained by the data mining algorithm. The model's output is compared with the original values recorded in the test data, and the degree of matching between them is computed. From the matched data, network performance indexes such as MSE and RMSE are computed to obtain an accuracy evaluation of the algorithm model.
As shown in Fig. 4, if the output of the data mining algorithm is result data, the data mining results produced from the training data are compared with the test data, and the degree of matching between them is computed. From the matched data, network performance indexes such as MSE and RMSE are computed. The matching degree and the performance indexes are then fed back to the parameter adjustment module.
MSE in Step 2 is a network performance function that gives the mean squared error of the network. It is computed as:
$$\mathrm{MSE} = \frac{\sum_{i=1}^{r} (n_i - 1)\, s_i^2}{N - r}$$
RMSE in Step 2 is a network performance function that gives the root-mean-square error of the network. It is computed as:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(X_{\mathrm{obs},i} - X_{\mathrm{model},i}\right)^2}{n}}$$
Step 3: Adjust the parameters to optimize the data mining algorithm
As shown in Fig. 5, the data mining parameters are adjusted with the automatic parameter adjustment algorithm, based on the feedback from the result evaluation module. The algorithm divides parameters into scalar parameters and vector parameters. During tuning, scalar parameters are adjusted first. If adjusting the scalar parameters still does not meet the requirements, each vector parameter is then adjusted in turn, moving from coarse to fine granularity.
A scalar parameter is a parameter that takes one of a finite number of values. For example, the similarity distance method can only be one of a few options, such as Euclidean distance, Minkowski distance or Manhattan distance. A vector parameter is a parameter that can take any floating-point value within a certain range, such as the smoothing parameter of the naive Bayes classification algorithm.
As shown in Fig. 1, the automatic feedback system for data mining comprises three modules: data segmentation module 1, result evaluation module 2 and parameter adjustment module 3.
Result evaluation module 2 and parameter adjustment module 3 form a feedback loop with the data mining algorithm. Feedback optimization is performed repeatedly within this loop until satisfactory data mining results are obtained. The loop works as follows:
- The computed results of the data mining algorithm, or the model it produces, are input to the result evaluation module.
- The output of the result evaluation module is input to the parameter adjustment module.
- The adjusted parameters are applied back to the data mining algorithm.
Together these steps form a closed-loop computing system.
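The closed loop can be sketched end to end in Python. Everything here is a hypothetical stand-in:
- k-nearest-neighbour regression plays the role of the data mining algorithm.
- RMSE on the test split is the only feedback signal.
- Module 3 simply steps through a candidate list of k instead of implementing the scalar-then-vector strategy.

The function names mirror modules 1 to 3 of Fig. 1 but are illustrative only.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (independent variable x, recorded value y)


def segment(data: List[Sample], ratio: float, seed: int) -> Tuple[List[Sample], List[Sample]]:
    """Module 1: random train/test split, re-seeded on every pass."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def mine(train: List[Sample], k: int) -> Callable[[float], float]:
    """Stand-in mining algorithm: k-nearest-neighbour regression with parameter k."""
    def predict(x: float) -> float:
        nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def evaluate(model: Callable[[float], float], test: List[Sample]) -> float:
    """Module 2: RMSE of the model on the held-out test data."""
    return math.sqrt(sum((y - model(x)) ** 2 for x, y in test) / len(test))


def adjust(k: int, options: Sequence[int]) -> Optional[int]:
    """Module 3 (deliberately simple): move to the next untried value of k."""
    i = list(options).index(k)
    return options[i + 1] if i + 1 < len(options) else None


def feedback_loop(
    data: List[Sample], options: Sequence[int], target: float, max_passes: int = 20
) -> Tuple[int, List[Tuple[int, float]]]:
    """Split, mine, evaluate and adjust until the RMSE target is met or options run out."""
    k: Optional[int] = options[0]
    history: List[Tuple[int, float]] = []
    for p in range(max_passes):
        if k is None:
            break
        train, test = segment(data, 0.7, seed=p)
        score = evaluate(mine(train, k), test)
        history.append((k, score))
        if score <= target:  # requirement met: leave the loop
            return k, history
        k = adjust(k, options)
    # Target never reached: fall back to the best value seen.
    best_k = min(history, key=lambda h: h[1])[0]
    return best_k, history


data = [(i / 10, math.sin(i / 10)) for i in range(60)]
k, history = feedback_loop(data, options=[15, 9, 5, 3, 1], target=0.02)
print(k, history)
```

Each pass draws a fresh split, so the scores in `history` come from different partitions. This is the point of re-seeding in Step 1, but it also means consecutive scores are not strictly comparable.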
In this specification, terms such as "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" carry a specific meaning. They indicate that the features, structures, materials or characteristics described with that embodiment or example are included in at least one embodiment or example of the invention. Such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although an embodiment of the invention has been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to it without departing from the principle and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.

Claims (1)

1. A method for establishing an automatic feedback system for data mining, characterized in that the method comprises the following steps:
Step 1: randomly dividing the source data to be mined into training data and test data in a given proportion, wherein the training data are used to train the data mining algorithm model and the test data are used to evaluate the model's accuracy, and wherein the data are re-split on every execution of the process, each time with a different random seed, so that the chance outcome of a single random split does not distort the evaluation of the algorithm's results;
Step 2: if the output of the data mining algorithm is a model, using the independent variables of the test data produced in Step 1 as input to the algorithm model trained by the data mining algorithm, comparing the model's output with the original values recorded in the test data, computing the degree of matching between them, and computing network performance indexes such as MSE and RMSE from the matched data to obtain an accuracy evaluation of the algorithm model;
if the output of the data mining algorithm is result data, comparing the data mining results produced from the training data with the test data, computing the degree of matching between them, computing network performance indexes such as MSE and RMSE from the matched data, and feeding the matching degree and the performance indexes back to the parameter adjustment module;
Step 3: adjusting the data mining parameters with the automatic parameter adjustment algorithm, based on the test results and accuracy evaluation of the data mining algorithm model from Step 2 as fed back by the result evaluation module;
Step 4: taking the data mining algorithm model with the adjusted parameters as the new algorithm model and returning to Step 1, until the test results of the data mining algorithm model meet the requirements.
CN201610512308.3A 2016-07-01 2016-07-01 Method for establishing an automatic feedback system for data mining Pending CN106202310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610512308.3A CN106202310A (en) 2016-07-01 2016-07-01 Method for establishing an automatic feedback system for data mining

Publications (1)

Publication Number Publication Date
CN106202310A true CN106202310A (en) 2016-12-07

Family

ID=57463044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610512308.3A Pending CN106202310A (en) 2016-07-01 2016-07-01 Method for establishing an automatic feedback system for data mining

Country Status (1)

Country Link
CN (1) CN106202310A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764808A (en) * 2018-03-29 2018-11-06 北京九章云极科技有限公司 Data analysis service system and online model deployment method therefor
CN109299178A (en) * 2018-09-30 2019-02-01 北京九章云极科技有限公司 Application method and data analysis system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110099136A1 (en) * 2009-10-23 2011-04-28 Gm Global Technology Operations, Inc. Method and system for concurrent event forecasting
CN103559303A (en) * 2013-11-15 2014-02-05 南京大学 Evaluation and selection method for data mining algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161207