CN108877947A - Deep sample learning method based on iterative mean clustering - Google Patents
Deep sample learning method based on iterative mean clustering
- Publication number
- CN108877947A (application CN201810558766.XA)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- training
- iteration
- mean cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Pathology (AREA)
- Probability & Statistics with Applications (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a deep sample learning method based on iterative mean clustering, comprising the following steps. S1: select training data and process it with N iterations of the mean clustering algorithm to obtain N+1 layers of training sample subsets, N ≥ 1. S2: perform regression training independently on each layer's training sample subset to obtain N+1 regressors. S3: select validation data and feed it into the N+1 regressors to obtain N+1 validation results. S4: determine the optimal weight (w0, w1, ..., wN) of each regressor by a weighted-fusion mechanism. S5: obtain test data, and use the N+1 regressors with their optimal weights to obtain the final prediction result. The effect is that the learning samples are turned into different training sample sets by repeated iterative mean clustering and then trained and learned separately; for the same sample size, this effectively increases the learning ability of the model and improves the accuracy of classification or prediction.
Description
Technical field
The present invention relates to artificial intelligence technology, and in particular to a deep sample learning method based on iterative mean clustering.
Background technique
With the development of artificial intelligence technology, sample learning methods have become varied, and the quality of the sample learning method strongly affects the accuracy of subsequent classification and regression.
Most intelligent algorithms in the prior art learn and train on a single sample data set. On the one hand, because the number of directly obtainable learning samples is limited, enhancing the performance of the classifier or regressor only by increasing the number of iterations has limited effect. On the other hand, the authenticity of the existing learning samples can also seriously affect the performance of the trained model: if all learning samples are treated alike, it is difficult to prevent spurious samples from degrading model performance.
To avoid the influence of spurious samples, online learning mechanisms have also been proposed. For example, Chinese patent 201010166225.6 discloses an adaptive cascade classifier training method based on online learning: a cascade classifier is first trained with a small number of samples and then used for target detection in images; because the training samples are few, the initial detection performance of the classifier is poor. Online learning samples are then extracted automatically by tracking, and the adaptive cascade classifier algorithm performs online learning on the initial cascade classifier, gradually improving the precision with which the classifier detects targets in images. Because the new samples for online learning are obtained and labeled automatically through tracking, the intelligence of the classifier training process is improved and the workload of manually labeling sample classes is greatly reduced.
However, this online learning mechanism requires new learning samples to be extracted continually, which increases the complexity of the algorithm; moreover, improving the algorithm's performance takes a relatively long time, and its initial performance is relatively poor.
Summary of the invention
To solve the above problems, the present invention provides a deep sample learning method based on iterative mean clustering. In the learning process of a classifier or regressor, the original samples are divided into multiple layers by iterative mean clustering, a separate classifier or regressor is trained on each layer, and each is then verified on a validation data set to obtain the weight of each regressor. This ensures that the characteristics in the sample data are learned as fully as possible and improves the accuracy of recognition or classification of the scheme.
To achieve the above object, the specific technical solution of the present invention is as follows:
A deep sample learning method based on iterative mean clustering, the key of which is that it proceeds by the following steps:
S1: select training data, and process it with N iterations of the mean clustering algorithm to obtain N+1 layers of training sample subsets, N ≥ 1;
S2: perform regression training independently on each layer's training sample subset to obtain N+1 regressors;
S3: select validation data; first compute the Euclidean-distance similarity between each validation sample and each layer's sample space, so as to map the validation sample to the most similar sample in that layer's sample space, and feed these samples into the N+1 regressors to obtain N+1 validation results;
S4: determine the optimal weight (w0, w1, ..., wN) of each regressor by a weighted-fusion mechanism;
S5: obtain test data; first compute the Euclidean-distance similarity between each test sample and each layer's sample space, so as to map the test sample to the most similar sample in that layer's sample space, then feed these samples into the N+1 regressors from step S2, with the optimal weight of each regressor from step S4, to obtain the final prediction result.
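The S1–S5 pipeline above can be sketched as follows. This is a minimal illustration, not the patented implementation: plain least-squares regressors stand in for support vector regressors, the layers are produced by simple noise perturbation rather than the full iterative mean clustering, and the fusion weights are uniform instead of being tuned on a validation set. All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# S1: toy training data -- 60 samples, 3 features, a noisy linear target.
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

def fit_regressor(X, y):
    """Least-squares fit with a bias column (stand-in for SVR)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ coef

# S1/S2: each "layer" here is the original data plus zero-mean noise
# (a stand-in for the clustering-derived layers); one regressor per layer.
layers = [X] + [X + rng.normal(scale=0.05, size=X.shape) for _ in range(2)]
regressors = [fit_regressor(Xl, y) for Xl in layers]

# S4: fusion weights -- assumed non-negative and summing to 1 (uniform here;
# the patent tunes them on a validation set).
w = np.full(3, 1.0 / 3.0)

# S5: the fused prediction is the weighted sum of the per-layer predictions.
X_test = rng.normal(size=(5, 3))
preds = np.stack([predict(c, X_test) for c in regressors])  # shape (3, 5)
fused = w @ preds                                           # shape (5,)
print(fused.shape)
```

The key design point is that each layer contributes an independent prediction and the fusion step reduces them to one output via the weight vector.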
Further, the constraint condition for determining the optimal weights (w0, w1, ..., wN) is:
Optionally, the iterative mean clustering algorithm uses K-means clustering.
Optionally, the regressor model uses a support vector regression model, and the kernel function uses a linear kernel function or a radial basis kernel function.
Optionally, the test data is medical data of the object to be measured, the training data and validation data are selected from public databases such as UCI, each sample includes multiple features, and the prediction result is a label value (an integer or floating-point number) of the object to be measured.
Optionally, the test data is medical data of the object to be measured, the training data and validation data come from the diabetes data or heart disease data in public databases such as UCI, each sample includes multiple features, and the prediction result is the age value of the object to be measured.
Optionally, the mean absolute error MAE is used to evaluate the performance of the prediction algorithm, specifically MAE = (1/M) Σ_{j=1..M} |a_j − a′_j|, where M is the number of test samples, a_j is the actual value of the j-th test sample, and a′_j is the predicted value of the j-th test sample.
The remarkable effect of the invention is as follows:
This method turns the learning samples into different training sample data sets by repeated iterative mean clustering and then trains and learns on each separately. For the same sample size, this layered training and learning effectively increases the learning ability of the model and improves the accuracy of classification or regression.
Detailed description of the invention
Fig. 1 is the deep sample learning model proposed by the present invention;
Fig. 2 is the iterative mean clustering model in Fig. 1;
Fig. 3 is the age-prediction result in the specific embodiment.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to the drawings and embodiments. The following embodiments are only used to clearly illustrate the technical solution of the present invention; they serve as examples and do not limit the protection scope of the invention.
It should be noted that, unless otherwise indicated, technical or scientific terms used in this application have the ordinary meaning understood by one of ordinary skill in the art to which the invention belongs.
This embodiment is described in detail with age prediction as the goal, selecting part of the samples from two data sets in the UCI database (http://archive.ics.uci.edu/): a diabetes data set, abbreviated MD (Mellitus Data Set), and a heart disease data set, abbreviated HD (Heart Disease Data Set). The heart disease data set contains 137 normal samples, each with 14 features; the diabetes data set contains 268 normal samples, each with 8 features. Detailed information on the two data sets is given in Table 1.
Table 1. Basic information of the data sets

Data set | Samples | Age range (years) | Mean age (years) | Age std. dev. (years)
---|---|---|---|---
HD | 137 | 34–71 | 52.71 | 9.14
DM | 268 | 21–66 | 29.94 | 10.51
Each type of data sample was randomly divided into a training set, a validation set, and a test set 100 times, yielding 100 sample groups. In this trial, the computer operating system was 64-bit Windows 10 with 8 GB of memory, and the experiment platform was MATLAB 2016a. For ease of subsequent analysis and explanation, the algorithm proposed in this embodiment is referred to as PAEM and the traditional algorithm as TAEM. The proposed method can be combined with different regression models, feature selection algorithms, instance optimization algorithms, and evaluation criteria to yield various other specific algorithms. This embodiment uses a support vector regression model as the regressor, with a linear kernel function and default parameters.
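As a minimal sketch of the two kernel choices mentioned above (linear and radial basis), the following shows the kernel functions themselves; the `gamma` parameter is an assumed free parameter, not a value specified in the patent.

```python
import numpy as np

def linear_kernel(x, z):
    """Linear kernel: the ordinary dot product."""
    return float(np.dot(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis (Gaussian) kernel: exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.sum(diff ** 2)))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(linear_kernel(x, z))  # 0.0 (orthogonal vectors)
print(rbf_kernel(x, x))     # 1.0 (zero distance)
```

The linear kernel keeps the regressor linear in the input features, while the RBF kernel measures similarity by distance, letting the regressor fit nonlinear relations.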
The specific steps, which can be seen from Fig. 1, are as follows (note: in the figure, the validation set and test set are the results after combination with the deep sample space):
S1: select training data, and process it with 2 iterations of the mean clustering algorithm to obtain 3 layers of training sample subsets;
S2: perform regression training independently on each layer's training sample subset to obtain 3 regressors;
S3: select validation data; first compute the Euclidean-distance similarity between each validation sample and each layer's sample space, so as to map the validation sample to the most similar sample in that layer's sample space, then feed these samples into the 3 regressors to obtain 3 validation results;
S4: determine the optimal weight (w0, w1, w2) of each regressor by a weighted-fusion mechanism;
S5: obtain test data; first compute the Euclidean-distance similarity between each test sample and each layer's sample space, so as to map the test sample to the most similar sample in that layer's sample space, then feed these samples into the 3 regressors from step S2, with the optimal weight of each regressor from step S4, to obtain the final prediction result.
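The Euclidean-distance matching used in steps S3 and S5 can be sketched as follows. This is a hypothetical helper that replaces a sample by the most similar sample in a layer's sample space; the data are made up for illustration.

```python
import numpy as np

def map_to_layer(sample, layer):
    """Return the layer sample closest to `sample` in Euclidean distance."""
    distances = np.linalg.norm(layer - sample, axis=1)
    return layer[int(np.argmin(distances))]

# A toy layer of three 2-D samples and one incoming test sample.
layer = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
test_sample = np.array([0.9, 1.2])
print(map_to_layer(test_sample, layer))  # [1. 1.]
```

Each regressor thus receives an input drawn from the same sample space it was trained on, rather than the raw test sample.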
Specifically, the clustering process of the iterative mean clustering algorithm in step S1 is similar to K-means clustering: as shown in Fig. 2, the center of each class is found by minimizing the distances between the data points and their nearest centers.
The core idea of iterative mean clustering is to minimize the sum of the Euclidean distances from all samples to the centers of the classes they belong to, converging by way of iteration.
Given training samples {x^(1), x^(2), ..., x^(m)}, the specific steps of the K-means clustering algorithm are as follows:
1: choose K cluster center points μ_1, μ_2, ..., μ_K;
2: compute the class of each sample x^(i) as that of its nearest center, c^(i) = argmin_{1≤j≤K} ||x^(i) − μ_j||²;
3: update the center of every class, replacing μ_j with μ′_j, the mean of the samples currently assigned to class j;
4: repeat steps 2 and 3 until the μ_j no longer change (convergence);
5: fine-tune the result of each clustering pass by adding zero-mean, normally distributed random noise, thereby obtaining the next sample set (sample space).
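Steps 1–5 above can be sketched as follows. This is an illustrative reading of the algorithm, assuming standard K-means for steps 1–4 and a small zero-mean Gaussian jitter for step 5; the cluster count and noise scale are made-up values, not ones from the patent.

```python
import numpy as np

def kmeans(X, k, rng, iters=100):
    """Standard K-means (steps 1-4): assign to nearest center, re-average."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1
    for _ in range(iters):
        # Step 2: assign each sample to its nearest center.
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2),
            axis=1)
        # Step 3: move each center to the mean of its assigned samples.
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        # Step 4: stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(42)
# Two well-separated toy blobs of 20 points each.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
centers, labels = kmeans(X, k=2, rng=rng)

# Step 5: zero-mean Gaussian jitter on the clustered representation
# produces the next sample space.
next_layer = centers[labels] + rng.normal(scale=0.05, size=X.shape)
print(centers.shape, next_layer.shape)
```

The jitter in step 5 is what distinguishes the iterative variant: each pass yields a slightly perturbed sample space from which the next layer is built.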
In the figure, Y0 is the original training set, and the iterative mean clustering algorithm yields the other two layers of samples, Y1 and Y2. Three regressors are obtained from the sample set of each layer; on the validation set, the corresponding results (r0, r1, r2) are obtained, and the optimal weights w_op = (w0, w1, w2) can be obtained by formula (3).
The constraint condition for determining the optimal weights (w0, w1, w2) is:
After the regressor models have been trained, the predicted ages a = (a0, a1, a2) of the individual layers' regressors are obtained on the test set, and the final age a_f = w_op^T a is obtained by fusing with the weights (w0, w1, w2).
To measure algorithm performance, the mean absolute error MAE is used to evaluate the prediction algorithm, specifically MAE = (1/M) Σ_{j=1..M} |a_j − a′_j|, where M is the number of test samples, a_j is the actual value of the j-th test sample, and a′_j is the predicted value of the j-th test sample. Meanwhile, the number of times the age-detection mechanism of the present invention outperforms the traditional age-detection mechanism is recorded as Score.
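The MAE evaluation can be computed as in this small example; the ages and predictions are invented for illustration and are not values from Table 2.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |a_j - a'_j| over all test samples."""
    assert len(actual) == len(predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

ages_true = [52, 61, 45, 38]          # hypothetical actual ages
ages_pred = [50.0, 63.0, 43.0, 40.0]  # hypothetical predicted ages
print(mae(ages_true, ages_pred))      # 2.0
```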
Details are given in Table 2, where mean denotes the average value and std the standard deviation.
As can be seen from Table 2, for both data sets the mean and standard deviation of the MAE obtained by age detection with the proposed method are smaller than those of the traditional method, showing that the ages predicted by the proposed age-prediction mechanism are more accurate than those of the traditional mechanism. Meanwhile, the larger Score value illustrates the superiority of the method from another perspective.
Table 2. Age-prediction results on the two data sets
Fig. 3 shows the histogram of Table 2, mainly the difference between the ages predicted by this method and the corresponding P values. As can be seen from Fig. 3, the MAE of the ages predicted by the proposed mechanism is smaller for both data sets, and the P values obtained by hypothesis testing are all less than 0.05, showing that the improvement in the MAE of the ages predicted by PAEM is statistically significant.
Finally, it should be noted that the foregoing describes preferred embodiments of the present invention. Under the enlightenment of the invention, and without departing from its purpose and the scope of the claims, those skilled in the art can make many similar variations, all of which fall within the protection scope of the present invention.
Claims (7)
1. A deep sample learning method based on iterative mean clustering, characterized by the following steps:
S1: selecting training data, and processing it with N iterations of the mean clustering algorithm to obtain N+1 layers of training sample subsets, N ≥ 1;
S2: performing regression training independently on each layer's training sample subset to obtain N+1 regressors;
S3: selecting validation data, first computing the Euclidean-distance similarity between each validation sample and each layer's sample space so as to map the validation sample to the most similar sample in that layer's sample space, and feeding these samples into the N+1 regressors to obtain N+1 validation results;
S4: determining the optimal weight (w0, w1, ..., wN) of each regressor by a weighted-fusion mechanism;
S5: obtaining test data, first computing the Euclidean-distance similarity between each test sample and each layer's sample space so as to map the test sample to the most similar sample in that layer's sample space, and then feeding these samples into the N+1 regressors obtained in step S2, with the optimal weight of each regressor obtained in step S4, to obtain the final prediction result.
2. The deep sample learning method based on iterative mean clustering according to claim 1, characterized in that the constraint condition for determining the optimal weights (w0, w1, ..., wN) is:
3. The deep sample learning method based on iterative mean clustering according to claim 1, characterized in that the search principle for the cluster centers of the iterative mean clustering algorithm is the same as in K-means clustering, except that at each iteration the original samples are the cluster centers obtained from the previous clustering.
4. The deep sample learning method based on iterative mean clustering according to claim 1, characterized in that the regressor model uses a support vector regression model, and the kernel function uses a linear kernel function or a radial basis kernel function.
5. The deep sample learning method based on iterative mean clustering according to claim 1, characterized in that the test data is medical data of the object to be measured, the training data and validation data are selected from public databases such as UCI, each sample includes multiple features, and the prediction result is the label of the object to be measured.
6. The deep sample learning method based on iterative mean clustering according to claim 1, characterized in that the test data is medical data of the object to be measured, the training data and validation data come from the diabetes data or heart disease data in public databases such as UCI, each sample includes multiple features, and the prediction result is the age value of the object to be measured.
7. The deep sample learning method based on iterative mean clustering according to any one of claims 1 to 6, characterized in that the mean absolute error MAE is used to evaluate the performance of the prediction algorithm, specifically MAE = (1/M) Σ_{j=1..M} |a_j − a′_j|, where M is the number of test samples, a_j is the actual value of the j-th test sample, and a′_j is the predicted value of the j-th test sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810558766.XA CN108877947B (en) | 2018-06-01 | 2018-06-01 | Depth sample learning method based on iterative mean clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108877947A true CN108877947A (en) | 2018-11-23 |
CN108877947B CN108877947B (en) | 2021-10-15 |
Family
ID=64336272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810558766.XA Active CN108877947B (en) | 2018-06-01 | 2018-06-01 | Depth sample learning method based on iterative mean clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108877947B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020046198A1 (en) * | 2000-06-19 | 2002-04-18 | Ben Hitt | Heuristic method of classification |
CN101944122A (en) * | 2010-09-17 | 2011-01-12 | 浙江工商大学 | Incremental learning-fused support vector machine multi-class classification method |
CN105938116A (en) * | 2016-06-20 | 2016-09-14 | 吉林大学 | Gas sensor array concentration detection method based on fuzzy division and model integration |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222762A (en) * | 2019-06-04 | 2019-09-10 | 恒安嘉新(北京)科技股份公司 | Object prediction method, apparatus, equipment and medium |
CN111914995A (en) * | 2020-06-18 | 2020-11-10 | 北京百度网讯科技有限公司 | Regularized linear regression generation method and device, electronic equipment and storage medium |
CN113393932A (en) * | 2021-07-06 | 2021-09-14 | 重庆大学 | Parkinson's disease voice sample segment multi-type reconstruction transformation method |
CN113393932B (en) * | 2021-07-06 | 2022-11-25 | 重庆大学 | Parkinson's disease voice sample segment multi-type reconstruction transformation method |
CN114300116A (en) * | 2021-11-10 | 2022-04-08 | 安徽大学 | Robust disease detection method based on online classification algorithm |
CN114300116B (en) * | 2021-11-10 | 2023-11-28 | 安徽大学 | Robust syndrome detection method based on online classification algorithm |
CN115570228A (en) * | 2022-11-22 | 2023-01-06 | 苏芯物联技术(南京)有限公司 | Intelligent feedback control method and system for welding pipeline gas supply |
Also Published As
Publication number | Publication date |
---|---|
CN108877947B (en) | 2021-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
CN108877947A (en) | Depth sample learning method based on iteration mean cluster | |
CN111126482B (en) | Remote sensing image automatic classification method based on multi-classifier cascade model | |
Wang et al. | Relaxed multiple-instance SVM with application to object discovery | |
CN106682696B (en) | The more example detection networks and its training method refined based on online example classification device | |
CN104484681B (en) | Hyperspectral Remote Sensing Imagery Classification method based on spatial information and integrated study | |
CN110940523B (en) | Unsupervised domain adaptive fault diagnosis method | |
CN113408605A (en) | Hyperspectral image semi-supervised classification method based on small sample learning | |
CN110543906B (en) | Automatic skin recognition method based on Mask R-CNN model | |
CN103714148B (en) | SAR image search method based on sparse coding classification | |
CN104615894A (en) | Traditional Chinese medicine diagnosis method and system based on k-nearest neighbor labeled specific weight characteristics | |
CN110210625A (en) | Modeling method, device, computer equipment and storage medium based on transfer learning | |
CN110363230A (en) | Stacking integrated sewage handling failure diagnostic method based on weighting base classifier | |
CN106250913B (en) | A kind of combining classifiers licence plate recognition method based on local canonical correlation analysis | |
Iqbal et al. | Mitochondrial organelle movement classification (fission and fusion) via convolutional neural network approach | |
CN109933619A (en) | A kind of semisupervised classification prediction technique | |
CN104978569A (en) | Sparse representation based incremental face recognition method | |
CN107016377A (en) | Recognition of face optimization method based on SGASEN algorithms | |
Zhong et al. | Fuzzy nonlinear proximal support vector machine for land extraction based on remote sensing image | |
Lin et al. | A fusion-based convolutional fuzzy neural network for lung cancer classification | |
Soni et al. | RFSVM: A novel classification technique for breast cancer diagnosis | |
CN105894032A (en) | Method of extracting effective features based on sample properties | |
CN117195027A (en) | Cluster weighted clustering integration method based on member selection | |
CN109191452B (en) | Peritoneal transfer automatic marking method for abdominal cavity CT image based on active learning | |
CN111144453A (en) | Method and equipment for constructing multi-model fusion calculation model and method and equipment for identifying website data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |