CN109146080A - The method of model realization framework based on supervision class machine learning algorithm - Google Patents
- Publication number
- CN109146080A (application CN201811072255.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- model
- sample
- machine learning
- learning algorithm
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention relates to a method for a model implementation architecture based on supervised machine learning algorithms, comprising the following steps. Step 1: overall design of the model data framework, which mainly serves to define the model input data explicitly. Step 2: data preprocessing design, which mainly applies further processing to the generated model input matrix. Step 3: sample control design, which mainly concerns the sample data and label data used in supervised machine learning. Step 4: model training design, which mainly establishes an algorithm library, takes the training data processed in step 2 as input, and then calls the algorithms in the library to produce the corresponding machine learning models. Step 5: model evaluation design, in which the test data are fed into each trained model to compute prediction results, and the differences between the target item in the test data and the prediction results are compared. By modeling the supervised machine learning workflow in this way, the invention implements the overall architecture and helps simplify later operation.
Description
Technical field
The present invention relates to a method for a model implementation architecture based on supervised machine learning algorithms.
Background
In the current technological environment, machine learning is one of the most popular and most exciting fields. Machine learning has given people reliable spam filters, convenient text and speech recognition, dependable web search engines, and skilled chess-playing programs, and safe and efficient self-driving cars are expected to follow. Machine learning has unquestionably become a popular field, but it is easy to lose sight of the essentials among the details; the field requires continual innovation and continual learning.
Summary of the invention
To solve the above technical problem, the object of the present invention is to provide a method for a model implementation architecture based on supervised machine learning algorithms.
To achieve the above object, the present invention adopts the following technical solution:
A method for a model implementation architecture based on supervised machine learning algorithms, comprising the following steps:
Step 1: overall design of the model data framework, which mainly serves to define the model input data explicitly;
Step 2: data preprocessing design, which mainly applies further processing to the generated model input matrix;
Step 3: sample control design, which mainly concerns the sample data and label data used in supervised machine learning;
Step 4: model training design, which mainly establishes an algorithm library, takes the training data processed in step 2 as input, and then calls the algorithms in the library to produce the corresponding machine learning models;
Step 5: model evaluation design, in which the test data are fed into each trained model to compute prediction results, and the differences between the target item in the test data and the prediction results are compared.
Further, in the method for a model implementation architecture based on supervised machine learning algorithms, the model input data in step 1 are divided into a target item and feature items. The target item is the object that the model needs to predict and is determined by the business requirements; the feature items form the multi-dimensional matrix used for model training, and each dimension of the feature items has some influence on predicting the target item.
Further, in the method for a model implementation architecture based on supervised machine learning algorithms, the processing in step 2 comprises the following steps:
1. Delete duplicate row records (data samples), and delete any feature column whose proportion of missing values exceeds 50%;
2. Apply basic transformations to the relevant feature columns;
3. Discretize certain continuous or categorical-text feature columns by designing the corresponding dummy variables;
4. Handle outliers: for data points in a column that deviate excessively, either delete them directly or reassign their values;
5. Combine multiple feature columns according to specific logic to compute new feature columns;
6. Split the data row-wise according to a certain rule into training data and test data.
Further, in the method for a model implementation architecture based on supervised machine learning algorithms, the basic transformations include LOG, EXP, and SQRT transformations.
Still further, in the method for a model implementation architecture based on supervised machine learning algorithms, a correction column named "weight" or "offset" needs to be added to the sample data in step 3, assigned according to the following rule:
Samples with label 1 are assigned the weight p1/r1;
Samples with label 0 are assigned the weight (1-p1)/(1-r1);
where p1 is the proportion of samples with label 1 in the original full sample data, and r1 is the proportion of samples with label 1 in the sample data after sampling adjustment.
Still further, in the method for a model implementation architecture based on supervised machine learning algorithms, the algorithm library in step 4 is the algorithm packages in R, the Scipy algorithm library in Python, or the MLlib algorithm library in Spark.
Still further, in the method for a model implementation architecture based on supervised machine learning algorithms, reference quantities are provided for comparing the target item and the prediction results in step 5, namely the mean squared error and the classification accuracy, where
MSE denotes the mean squared error, calculated as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
where N is the number of test samples, $y_i$ is the target item in the test data, and $\hat{y}_i$ is the model's predicted value.
The classification accuracy is calculated as:
$$\mathrm{ACCURACY} = \frac{p + q}{N}$$
where N is the number of test samples, p is the number of samples for which the model predicts 1 and the actual target item is also 1, and q is the number of samples for which the model predicts 0 and the actual target item is also 0.
Still further, in the method for a model implementation architecture based on supervised machine learning algorithms, the certain rule is to split the data into training and test data at a ratio of 7:3, i.e., 70% of the data samples are used to train the model and 30% of the data samples are used for testing.
With the above solution, the present invention has at least the following advantages:
1. The invention covers the complete workflow of supervised machine learning algorithms; it is general and reproducible, and can be used for machine learning tasks in any field.
2. The invention treats data preprocessing thoroughly, providing reliable input for building machine learning models.
3. The invention is applicable to all kinds of machine learning frameworks and machine learning models.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawing.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawing used in the embodiments is briefly introduced below. It should be understood that the following drawing shows only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from it without creative effort.
Fig. 1 is a structural schematic diagram of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing and examples. The following examples serve to illustrate the present invention and are not intended to limit its scope.
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawing. The described embodiments are evidently only some, not all, of the embodiments of the invention. The components of the embodiments, as generally described and illustrated in the drawing, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawing is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment
As shown in Fig. 1, the method for a model implementation architecture based on supervised machine learning algorithms comprises the following steps:
Step 1: overall design of the model data framework, which mainly serves to define the model input data explicitly;
The model input data are divided into a target item and feature items. The target item is the object that the model needs to predict and is determined by the business requirements; the feature items form the multi-dimensional matrix used for model training, and each dimension of the feature items has some influence on predicting the target item.
The main task of the overall model data framework is therefore to determine the target item and the feature item data through specific logical definitions, according to the business requirements of the actual project and the overall data available, and to merge them into one complete model input matrix.
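As a minimal sketch of Step 1, assuming a hypothetical churn-prediction project, the target item and the feature items can be defined separately and then merged into a single input matrix. The field names (`churned`, `monthly_fee`, `tenure_months`), the sample keys, and the function `build_input_matrix` are illustrative assumptions and do not come from the patent.

```python
# Hypothetical raw records keyed by customer; all names and values are assumptions.
targets = {"c1": 1, "c2": 0, "c3": 0}  # target item: churned (1) or not (0)
features = {
    "c1": {"monthly_fee": 30.0, "tenure_months": 2},
    "c2": {"monthly_fee": 55.0, "tenure_months": 40},
    "c3": {"monthly_fee": 20.0, "tenure_months": 13},
}

FEATURE_COLUMNS = ["monthly_fee", "tenure_months"]
TARGET_COLUMN = "churned"


def build_input_matrix(targets, features):
    """Merge target and feature items into one row per sample.

    Only samples that have both a target and a feature record are kept.
    """
    rows = []
    for key in sorted(targets.keys() & features.keys()):
        row = {col: features[key][col] for col in FEATURE_COLUMNS}
        row[TARGET_COLUMN] = targets[key]
        rows.append(row)
    return rows


matrix = build_input_matrix(targets, features)
for row in matrix:
    print(row)
```

Each row of `matrix` then carries every feature dimension alongside the target, which is the shape the later preprocessing and training steps expect.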
Step 2: data preprocessing design, which mainly applies further processing to the generated model input matrix;
The processing in step 2 comprises the following steps:
1. Delete duplicate row records (data samples), and delete any feature column whose proportion of missing values exceeds 50%;
2. Apply basic transformations to the relevant feature columns, for example LOG, EXP, and SQRT transformations;
3. Discretize certain continuous or categorical-text feature columns by designing the corresponding dummy variables;
4. Handle outliers: for data points in a column that deviate excessively, either delete them directly or reassign their values;
5. Combine multiple feature columns according to specific logic to compute new feature columns;
6. Split the data row-wise according to a certain rule into training data and test data.
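Items 1 to 5 of this list can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the column names, the clipping bounds used for outliers, and the derived feature are hypothetical, and outliers are reassigned (clipped) here rather than deleted, which is one of the two options the text allows.

```python
import math

# Hypothetical samples; None marks a missing value. Column names are assumptions.
rows = [
    {"income": 1000.0, "city": "A", "age": 30, "note": None},
    {"income": 1000.0, "city": "A", "age": 30, "note": None},   # duplicate row
    {"income": 4000.0, "city": "B", "age": 45, "note": None},
    {"income": 250000.0, "city": "A", "age": 38, "note": "x"},  # extreme income
    {"income": 2500.0, "city": "B", "age": 52, "note": None},
]


def drop_duplicates(rows):
    """Item 1: keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_sparse_columns(rows, max_missing=0.5):
    """Item 1: drop columns whose share of missing values exceeds max_missing."""
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing]
    return [{c: r[c] for c in keep} for r in rows]


def clip_outliers(rows, column, lower, upper):
    """Item 4: reassign values outside [lower, upper] to the nearest bound."""
    return [{**r, column: min(max(r[column], lower), upper)} for r in rows]


def add_dummies(rows, column):
    """Item 3: replace a categorical column with 0/1 dummy columns."""
    levels = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for level in levels:
            new[f"{column}_{level}"] = 1 if r[column] == level else 0
        out.append(new)
    return out


clean = drop_sparse_columns(drop_duplicates(rows))
clean = clip_outliers(clean, "income", lower=0.0, upper=10000.0)
clean = add_dummies(clean, "city")
for r in clean:
    r["log_income"] = math.log(r["income"])        # item 2: LOG transformation
    r["income_per_age"] = r["income"] / r["age"]   # item 5: derived feature
```

In this toy data the `note` column is missing in three of the four deduplicated rows, so it is removed by the 50% rule.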
The certain rule is to split the data into training and test data at a ratio of 7:3, i.e., 70% of the data samples are used to train the model and 30% of the data samples are used for testing.
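A 7:3 row-wise split could look like the sketch below. The patent does not say how rows are assigned to each part; shuffling with a fixed seed is an assumption made here for reproducibility, and `train_test_split` is a hypothetical helper name.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them row-wise into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(10))  # stand-in for ten preprocessed samples
train, test = train_test_split(samples)
print(len(train), len(test))
```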
Step 3: sample control design, which mainly concerns the sample data and label data used in supervised machine learning;
In actual projects, 1-0 sample imbalance often occurs: in most cases the number of samples with label 1 is far smaller than the number of samples with label 0. A sample control process is therefore needed, namely replicating the samples with label 1 or randomly sampling the samples with label 0, so that the number of samples with label 1 and the number of samples with label 0 end up on the same order of magnitude.
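Both sample control options can be sketched as follows. The helper names, the `label` key, and the target ratios (1:1 for undersampling, ten copies for oversampling) are assumptions for illustration; the text only requires that the two classes end up on the same order of magnitude.

```python
import random


def undersample_majority(rows, label_key="label", seed=0):
    """Randomly sample label-0 rows down to the number of label-1 rows."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    rng = random.Random(seed)
    kept = rng.sample(negatives, k=min(len(negatives), len(positives)))
    return positives + kept


def oversample_minority(rows, label_key="label", copies=2):
    """Replicate every label-1 row so that it appears `copies` times in total."""
    positives = [r for r in rows if r[label_key] == 1]
    return rows + positives * (copies - 1)


# Hypothetical imbalanced data: 5 positives and 95 negatives.
data = [{"id": i, "label": 1 if i < 5 else 0} for i in range(100)]
balanced = undersample_majority(data)
boosted = oversample_minority(data, copies=10)
print(len(balanced), len(boosted))
```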
A correction column named "weight" or "offset" needs to be added to the sample data in step 3, assigned according to the following rule:
Samples with label 1 are assigned the weight p1/r1;
Samples with label 0 are assigned the weight (1-p1)/(1-r1);
where p1 is the proportion of samples with label 1 in the original full sample data, and r1 is the proportion of samples with label 1 in the sample data after sampling adjustment.
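The effect of this rule can be checked with a small sketch: weighting the resampled data by p1/r1 and (1-p1)/(1-r1) restores the original class proportion p1. The figures below (5% positives originally, 50% after resampling) and the function name `correction_weights` are hypothetical.

```python
def correction_weights(p1, r1):
    """Return (weight for label 1, weight for label 0) per the assignment rule."""
    if not (0 < r1 < 1):
        raise ValueError("r1 must lie strictly between 0 and 1")
    return p1 / r1, (1 - p1) / (1 - r1)


# Hypothetical figures: 5% positives originally, 50% after resampling.
w1, w0 = correction_weights(p1=0.05, r1=0.5)

resampled = [{"label": 1}] * 5 + [{"label": 0}] * 5
weighted = [{**row, "weight": w1 if row["label"] == 1 else w0} for row in resampled]

# The weighted share of label-1 samples recovers the original proportion p1.
total = sum(row["weight"] for row in weighted)
share = sum(row["weight"] for row in weighted if row["label"] == 1) / total
print(w1, w0, share)
```

This follows because the weighted count of positives is proportional to r1 · (p1/r1) = p1 and that of negatives to (1-r1) · (1-p1)/(1-r1) = 1-p1.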
Step 4: model training design, which mainly establishes an algorithm library, takes the training data processed in step 2 as input, and then calls the algorithms in the library to produce the corresponding machine learning models;
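One way to picture the algorithm library is as a registry that maps algorithm names to training functions. The sketch below uses two toy regressors written in plain Python so that it stays self-contained; in practice the entries would be algorithms from a real library. The registry name `ALGORITHM_LIBRARY` and both training functions are assumptions.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


# The "algorithm library": a hypothetical name -> training function registry.
ALGORITHM_LIBRARY = {"mean_baseline": fit_mean, "linear_regression": fit_linear}

train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

# Calling every algorithm in the library yields one trained model per algorithm.
models = {name: fit(train_x, train_y) for name, fit in ALGORITHM_LIBRARY.items()}
print({name: model(5.0) for name, model in models.items()})
```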
For step 4, the following points need attention during modeling:
1. Where there is no theoretical proof that a particular algorithm is optimal, every algorithm in the library that suits the input data needs to be traversed to build a model;
2. The types of key parameters differ from algorithm to algorithm; the main parameters of each algorithm need to be configured specifically so that the model's fit is as good as possible;
3. To prevent model overfitting, K-fold cross-validation is needed to tune and fit the parameters to be estimated;
4. After modeling is complete, model diagnostics are needed. For regression algorithms, for example, check whether the model's R2 is large: the closer it is to 1, the better the fit. Also check whether the normality and randomness tests of the residuals pass, and whether there is evident multicollinearity among the dimensions. For classification problems, check whether the AUC value under the plotted ROC curve is large: the closer it is to 1, the better the fit.
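The K-fold cross-validation of item 3 and two of the diagnostics of item 4 (R2 and AUC) can be sketched without any external library as below. The one-feature least-squares model stands in for whichever algorithm is being tuned, the data are made up, and the residual normality and multicollinearity checks are not covered. All function names are assumptions.

```python
from statistics import mean


def fit_linear(xs, ys):
    """One-feature least squares, standing in for any library algorithm."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + b * (x - mx)


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_mse(xs, ys, fit, k=3):
    """Average held-out mean squared error over k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(tr_x, tr_y)
        scores.append(mean([(ys[i] - model(xs[i])) ** 2 for i in fold]))
    return mean(scores)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - my) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]  # made-up values, roughly y = 2x + 1
cv_mse = cross_validated_mse(xs, ys, fit_linear, k=3)
model = fit_linear(xs, ys)
r2 = r_squared(ys, [model(x) for x in xs])
roc_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(cv_mse, r2, roc_auc)
```

When several parameter settings are compared, the setting with the lowest cross-validated error would normally be preferred over the one with the best fit on the training data alone.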
The algorithm library in step 4 is the algorithm packages in R, the Scipy algorithm library in Python, or the MLlib algorithm library in Spark.
Step 5: model evaluation design, in which the test data are fed into each trained model to compute prediction results, and the differences between the target item in the test data and the prediction results are compared.
Reference quantities are provided for comparing the target item and the prediction results in step 5, namely the mean squared error and the classification accuracy, where
MSE denotes the mean squared error, calculated as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
where N is the number of test samples, $y_i$ is the target item in the test data, and $\hat{y}_i$ is the model's predicted value.
The classification accuracy is calculated as:
$$\mathrm{ACCURACY} = \frac{p + q}{N}$$
where N is the number of test samples, p is the number of samples for which the model predicts 1 and the actual target item is also 1, and q is the number of samples for which the model predicts 0 and the actual target item is also 0.
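Both reference quantities are straightforward to compute. The sketch below assumes the standard definitions, with MSE as the average squared difference between target and prediction over the N test samples, and accuracy as (p + q) / N; the function names and test values are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """MSE: average of (y_i - y_hat_i)^2 over the N test samples."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def classification_accuracy(y_true, y_pred):
    """ACCURACY = (p + q) / N, with p = correct 1s and q = correct 0s."""
    p = sum(1 for y, yhat in zip(y_true, y_pred) if y == 1 and yhat == 1)
    q = sum(1 for y, yhat in zip(y_true, y_pred) if y == 0 and yhat == 0)
    return (p + q) / len(y_true)


# Made-up test targets and predictions.
mse = mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
acc = classification_accuracy([1, 0, 1, 0, 1], [1, 0, 0, 0, 1])
print(mse, acc)
```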
In addition, some actual projects may have custom model prediction performance metrics. Taking the MSE value, the ACCURACY value, and any custom model evaluation metrics into account together, the model with the smallest possible MSE value and the largest possible ACCURACY value is chosen as the final model.
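The final selection can be sketched as picking the best model per metric, where lower is better for MSE and higher is better for accuracy. The model names and metric values below are invented purely for illustration, and `select_model` is a hypothetical helper; a project with custom metrics would add them in the same way.

```python
def select_model(results, metric, higher_is_better):
    """Return the name of the model with the best value of `metric`."""
    chooser = max if higher_is_better else min
    return chooser(results, key=lambda name: results[name][metric])


# Invented evaluation results on the test data.
regression_results = {"linear": {"mse": 0.42}, "tree": {"mse": 0.31}}
classification_results = {"logistic": {"accuracy": 0.88}, "forest": {"accuracy": 0.91}}

best_regressor = select_model(regression_results, "mse", higher_is_better=False)
best_classifier = select_model(classification_results, "accuracy", higher_is_better=True)
print(best_regressor, best_classifier)
```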
The present invention has at least the following advantages:
1. The invention covers the complete workflow of supervised machine learning algorithms; it is general and reproducible, and can be used for machine learning tasks in any field.
2. The invention treats data preprocessing thoroughly, providing reliable input for building machine learning models.
3. The invention is applicable to all kinds of machine learning frameworks and machine learning models.
The above is only a preferred embodiment of the present invention and is not intended to limit it. It should be noted that those of ordinary skill in the art can make further improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (8)
1. A method for a model implementation architecture based on supervised machine learning algorithms, characterized by comprising the following steps:
Step 1: overall design of the model data framework, which mainly serves to define the model input data explicitly;
Step 2: data preprocessing design, which mainly applies further processing to the generated model input matrix;
Step 3: sample control design, which mainly concerns the sample data and label data used in supervised machine learning;
Step 4: model training design, which mainly establishes an algorithm library, takes the training data processed in step 2 as input, and then calls the algorithms in the library to produce the corresponding machine learning models;
Step 5: model evaluation design, in which the test data are fed into each trained model to compute prediction results, and the differences between the target item in the test data and the prediction results are compared.
2. The method for a model implementation architecture based on supervised machine learning algorithms according to claim 1, characterized in that: the model input data in step 1 are divided into a target item and feature items, wherein the target item is the object that the model needs to predict and is determined by the business requirements; the feature items form the multi-dimensional matrix used for model training, and each dimension of the feature items has some influence on predicting the target item.
3. The method for a model implementation architecture based on supervised machine learning algorithms according to claim 1, characterized in that: the processing in step 2 comprises the following steps:
1. Delete duplicate row records (data samples), and delete any feature column whose proportion of missing values exceeds 50%;
2. Apply basic transformations to the relevant feature columns;
3. Discretize certain continuous or categorical-text feature columns by designing the corresponding dummy variables;
4. Handle outliers: for data points in a column that deviate excessively, either delete them directly or reassign their values;
5. Combine multiple feature columns according to specific logic to compute new feature columns;
6. Split the data row-wise according to a certain rule into training data and test data.
4. The method for a model implementation architecture based on supervised machine learning algorithms according to claim 3, characterized in that: the basic transformations include LOG, EXP, and SQRT transformations.
5. The method for a model implementation architecture based on supervised machine learning algorithms according to claim 1, characterized in that: a correction column named "weight" or "offset" needs to be added to the sample data in step 3, assigned according to the following rule:
Samples with label 1 are assigned the weight p1/r1;
Samples with label 0 are assigned the weight (1-p1)/(1-r1);
where p1 is the proportion of samples with label 1 in the original full sample data, and r1 is the proportion of samples with label 1 in the sample data after sampling adjustment.
6. The method for a model implementation architecture based on supervised machine learning algorithms according to claim 1, characterized in that: the algorithm library in step 4 is the algorithm packages in R, the Scipy algorithm library in Python, or the MLlib algorithm library in Spark.
7. The method for a model implementation architecture based on supervised machine learning algorithms according to claim 1, characterized in that: reference quantities are provided for comparing the target item and the prediction results in step 5, namely the mean squared error and the classification accuracy, where
MSE denotes the mean squared error, calculated as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
where N is the number of test samples, $y_i$ is the target item in the test data, and $\hat{y}_i$ is the model's predicted value.
The classification accuracy is calculated as:
$$\mathrm{ACCURACY} = \frac{p + q}{N}$$
where N is the number of test samples, p is the number of samples for which the model predicts 1 and the actual target item is also 1, and q is the number of samples for which the model predicts 0 and the actual target item is also 0.
8. The method for a model implementation architecture based on supervised machine learning algorithms according to claim 3, characterized in that: the certain rule is to split the data into training and test data at a ratio of 7:3, i.e., 70% of the data samples are used to train the model and 30% of the data samples are used for testing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811072255.3A CN109146080A (en) | 2018-09-14 | 2018-09-14 | The method of model realization framework based on supervision class machine learning algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811072255.3A CN109146080A (en) | 2018-09-14 | 2018-09-14 | The method of model realization framework based on supervision class machine learning algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109146080A true CN109146080A (en) | 2019-01-04 |
Family ID: 64825268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811072255.3A Pending CN109146080A (en) | 2018-09-14 | 2018-09-14 | The method of model realization framework based on supervision class machine learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109146080A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111008347A (en) * | 2019-11-25 | 2020-04-14 | 杭州安恒信息技术股份有限公司 | Website identification method, device and system and computer readable storage medium |
CN111047049A (en) * | 2019-12-05 | 2020-04-21 | 北京小米移动软件有限公司 | Method, apparatus and medium for processing multimedia data based on machine learning model |
CN113869342A (en) * | 2020-06-30 | 2021-12-31 | 微软技术许可有限责任公司 | Mark offset detection and adjustment in predictive modeling |
CN114254588A (en) * | 2021-12-16 | 2022-03-29 | 马上消费金融股份有限公司 | Data tag processing method and device |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111008347A (en) * | 2019-11-25 | 2020-04-14 | 杭州安恒信息技术股份有限公司 | Website identification method, device and system and computer readable storage medium |
CN111047049A (en) * | 2019-12-05 | 2020-04-21 | 北京小米移动软件有限公司 | Method, apparatus and medium for processing multimedia data based on machine learning model |
CN111047049B (en) * | 2019-12-05 | 2023-08-11 | 北京小米移动软件有限公司 | Method, device and medium for processing multimedia data based on machine learning model |
CN113869342A (en) * | 2020-06-30 | 2021-12-31 | 微软技术许可有限责任公司 | Mark offset detection and adjustment in predictive modeling |
CN114254588A (en) * | 2021-12-16 | 2022-03-29 | 马上消费金融股份有限公司 | Data tag processing method and device |
CN114254588B (en) * | 2021-12-16 | 2023-10-13 | 马上消费金融股份有限公司 | Data tag processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109146080A (en) | The method of model realization framework based on supervision class machine learning algorithm | |
Li et al. | Random search and reproducibility for neural architecture search | |
CN103729678B (en) | A kind of based on navy detection method and the system of improving DBN model | |
CN107688825B (en) | Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method | |
Effendy et al. | Handling imbalanced data in customer churn prediction using combined sampling and weighted random forest | |
CN106503689A (en) | Neutral net local discharge signal mode identification method based on particle cluster algorithm | |
CN111063194A (en) | Traffic flow prediction method | |
Putra et al. | Estimation of parameters in the SIR epidemic model using particle swarm optimization | |
CN112578089B (en) | Air pollutant concentration prediction method based on improved TCN | |
CN108062566A (en) | A kind of intelligent integrated flexible measurement method based on the potential feature extraction of multinuclear | |
Özsoy et al. | Estimating the parameters of nonlinear regression models through particle swarm optimization | |
Kavitha et al. | Real time credit card fraud detection on huge imbalanced data using meta-classifiers | |
Pourchot et al. | Importance mixing: Improving sample reuse in evolutionary policy search methods | |
CN111753751A (en) | Fan fault intelligent diagnosis method for improving firework algorithm | |
Regazzoni et al. | A physics-informed multi-fidelity approach for the estimation of differential equations parameters in low-data or large-noise regimes | |
CN113657452A (en) | Tobacco leaf quality grade classification prediction method based on principal component analysis and super learning | |
CN116029221B (en) | Power equipment fault diagnosis method, device, equipment and medium | |
Namazi et al. | Surrogate assisted optimisation for travelling thief problems | |
CN116258899A (en) | Corn ear classification method based on custom light convolutional neural network | |
CN114861364A (en) | Intelligent sensing and suction regulation and control method for air inlet flow field of air-breathing engine | |
CN109697511A (en) | Data reasoning method, apparatus and computer equipment | |
Ye | Linear conic programming | |
JP2021012600A (en) | Method for diagnosis, method for learning, learning device, and program | |
Reena et al. | Software defect prediction system–decision tree algorithm with two level data pre-processing | |
Tan | Using supervised attribute selection for unsupervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |