CN108877947A - Depth sample learning method based on iterative mean clustering - Google Patents

Info

Publication number
CN108877947A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810558766.XA
Other languages
Chinese (zh)
Other versions
CN108877947B (en)
Inventor
李勇明
郑源林
王品
颜芳
张�成
李新科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN201810558766.XA
Publication of CN108877947A
Application granted
Publication of CN108877947B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a depth sample learning method based on iterative mean clustering, comprising the following steps. S1: select training data and apply an iterative mean clustering algorithm N times to obtain N+1 layers of training sample subsets, N ≥ 1. S2: train a regressor independently on each layer of training sample subsets, obtaining N+1 regressors. S3: select validation data and feed it to each of the N+1 regressors, obtaining N+1 validation results. S4: determine the optimal weight of each regressor, (w_0, w_1, …, w_N), using a weighted fusion mechanism. S5: obtain test data and compute the final prediction from the N+1 regressors and their optimal weights. The effect is that repeated iterative mean clustering turns the learning samples into several different training sample sets, each of which is trained on separately; for the same number of samples, this improves the learning ability of the model and the accuracy of classification or prediction.

Description

Depth sample learning method based on iterative mean clustering
Technical field
The present invention relates to artificial intelligence technology, and in particular to a depth sample learning method based on iterative mean clustering.
Background
With the development of artificial intelligence, sample learning methods have become increasingly varied, and the quality of the sample learning method strongly affects the accuracy of subsequent classification and regression.
Most prior-art intelligent algorithms learn and train on a single sample data set. On the one hand, because the number of learning samples that can be obtained directly is limited, the performance of a classifier or regressor can only be strengthened by increasing the number of training iterations, and the gain is limited. On the other hand, the authenticity of the existing learning samples also strongly affects the performance of the trained model: if all learning samples are treated identically, it is difficult to keep pseudo-samples from degrading model performance.
To avoid the influence of pseudo-samples, online learning mechanisms have also been proposed. For example, Chinese patent 201010166225.6 discloses an adaptive cascade classifier training method based on online learning. An initial cascade classifier is first trained on a small number of samples and then used for object detection in images; because there are few training samples, its initial detection performance is poor. Online learning samples are then extracted automatically by tracking, and an adaptive cascade classifier algorithm performs online learning on the initial cascade classifier, gradually improving its detection accuracy in images. Because the new samples are obtained and labeled automatically by tracking, the classifier training process becomes more automated and the workload of manually labeling sample classes is greatly reduced.
However, an online learning mechanism must extract new learning samples step by step, which increases algorithm complexity; performance improves only over a relatively long process, and initial performance is relatively poor.
Summary of the invention
To solve the above problems, the present invention provides a depth sample learning method based on iterative mean clustering. During the learning of a classifier or regressor, iterative mean clustering divides the original samples into multiple layers, and a separate classifier or regressor is trained on each layer. Each is then evaluated on a validation data set to obtain the weight of each regressor, so that the characteristics of the sample data are learned and exploited as fully as possible and the accuracy of the model in recognition or classification improves.
To achieve the above object, the specific technical solution of the present invention is as follows:
A depth sample learning method based on iterative mean clustering, characterized by the following steps:
S1: Select training data and apply the iterative mean clustering algorithm N times to obtain N+1 layers of training sample subsets, N ≥ 1;
S2: Train a regressor independently on each layer of training sample subsets, obtaining N+1 regressors;
S3: Select validation data. First compute the Euclidean-distance similarity between each validation sample and the sample space of each layer, and convert the validation sample into the most similar sample in that layer's sample space; then feed these samples to the N+1 regressors to obtain N+1 validation results;
S4: Determine the optimal weight of each regressor, (w_0, w_1, …, w_N), using a weighted fusion mechanism;
S5: Obtain test data. First compute the Euclidean-distance similarity between each test sample and the sample space of each layer, and convert the test sample into the most similar sample in that layer's sample space; then feed these samples to the N+1 regressors obtained in step S2 and combine their outputs with the optimal weights obtained in step S4 to obtain the final prediction.
Further, the constraint applied when determining the optimal weights (w_0, w_1, …, w_N) is:
Optionally, the iterative mean clustering algorithm uses K-means clustering.
Optionally, the regressor is a support vector regression model whose kernel function is a linear kernel or a radial basis function kernel.
Optionally, the test data is medical data of the subject under test, the training data and validation data are selected from public databases such as UCI, each sample comprises multiple features, and the prediction result is the label value (an integer or floating-point number) of the subject under test.
Optionally, the test data is medical data of the subject under test, the training data and validation data are selected from the diabetes data or heart disease data in public databases such as UCI, each sample comprises multiple features, and the prediction result is the age of the subject under test.
Optionally, the performance of the prediction algorithm is evaluated by the mean absolute error (MAE), specifically:
$$\mathrm{MAE} = \frac{1}{M}\sum_{j=1}^{M}\left|a_j - a'_j\right|$$
where M is the number of samples in the test data, a_j is the actual value of the j-th test sample, and a'_j is the predicted value of the j-th test sample.
The notable effect of the invention is:
The method obtains different training sample data sets from the learning samples through repeated iterative mean clustering and then trains on each of them separately. For the same number of samples, this layer-by-layer training improves the learning ability of the model and the accuracy of classification or regression.
Brief description of the drawings
Fig. 1 shows the depth sample learning model proposed by the present invention;
Fig. 2 shows the iterative mean clustering model of Fig. 1;
Fig. 3 shows the age prediction results of the specific embodiment.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and do not limit the scope of protection of the present invention.
Unless otherwise indicated, technical or scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the invention belongs.
This embodiment is described in detail using age prediction as the task. Part of the samples of two data sets from the UCI database (http://archive.ics.uci.edu/) are used: the diabetes data set, abbreviated MD (Mellitus Data Set), and the heart disease data set, abbreviated HD (Heart Disease Data Set). The heart disease data set contains 137 normal samples with 14 features each; the diabetes data set contains 268 normal samples with 8 features each. Details of the two data sets are given in Table 1.
Table 1. Basic information of the data sets
Data set   Number   Age range (years)   Mean age (years)   Age standard deviation
HD         137      34~71               52.71              9.14
DM         268      21~66               29.94              10.51
For each data set, the samples are randomly divided into a training set, a validation set, and a test set; this is repeated 100 times, giving 100 groups of samples. In this experiment, the computer ran 64-bit Windows 10 with 8 GB of memory, and the experimental platform was MATLAB 2016a. For convenience in the analysis below, the algorithm proposed in this embodiment is referred to as PAEM and the traditional algorithm as TAEM. The proposed method can be combined with different regression models, feature selection algorithms, instance optimization algorithms, and evaluation criteria, yielding various other specific algorithms. This embodiment uses a support vector regression model as the regressor, with a linear kernel and default parameters.
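The repeated random partition can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in the experiment; the function name `random_splits` and the 60/20/20 split ratio are assumptions, since the text does not state the proportions.

```python
import random

def random_splits(samples, n_repeats=100, train_frac=0.6, val_frac=0.2, seed=0):
    """Return n_repeats (train, val, test) partitions of samples.

    The fractions are illustrative assumptions; the text gives no ratios.
    """
    rng = random.Random(seed)
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    groups = []
    for _ in range(n_repeats):
        shuffled = samples[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        val = shuffled[n_train:n_train + n_val]
        test = shuffled[n_train + n_val:]
        groups.append((train, val, test))
    return groups

# 137 placeholder sample indices, matching the HD sample count in Table 1.
groups = random_splits(list(range(137)))
```

Each of the 100 groups is a disjoint partition of the same samples, so the later comparison between PAEM and TAEM can be made group by group.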
As can be seen from Fig. 1, the specific steps are as follows (note: the validation set and test set in the figure are the result after being combined with the depth sample space):
S1: Select training data and apply the iterative mean clustering algorithm twice to obtain 3 layers of training sample subsets;
S2: Train a regressor independently on each layer of training sample subsets, obtaining 3 regressors;
S3: Select validation data. First compute the Euclidean-distance similarity between each validation sample and the sample space of each layer, and convert the validation sample into the most similar sample in that layer's sample space; then feed these samples to the 3 regressors to obtain 3 validation results;
S4: Determine the optimal weight of each regressor, (w_0, w_1, w_2), using a weighted fusion mechanism;
S5: Obtain test data. First compute the Euclidean-distance similarity between each test sample and the sample space of each layer, and convert the test sample into the most similar sample in that layer's sample space; then feed these samples to the 3 regressors obtained in step S2 and combine their outputs with the optimal weights obtained in step S4 to obtain the final prediction.
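The conversion described in steps S3 and S5 amounts to a nearest-neighbour lookup by Euclidean distance in each layer's sample space. A minimal sketch, assuming samples are plain feature tuples; the helper names `euclidean` and `nearest_sample` and the toy values are hypothetical and not from the patent:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_sample(x, layer_samples):
    """Return the sample in layer_samples that is closest to x."""
    return min(layer_samples, key=lambda s: euclidean(x, s))

# Toy sample spaces for three layers (illustrative values only).
layers = [
    [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 5.0)],  # Y0
    [(0.5, 0.5), (4.5, 4.5)],                          # Y1
    [(2.5, 2.5)],                                      # Y2
]
x = (4.2, 3.9)  # one validation or test sample
converted = [nearest_sample(x, layer) for layer in layers]
```

Here `converted` holds one stand-in sample per layer, and each would be passed to the regressor trained on that layer.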
Specifically, the clustering process of the iterative mean clustering algorithm in step S1 is similar to K-means clustering. As shown in Fig. 2, the center of each class is found by minimizing the distance between the data points and their nearest center.
The core idea of iterative mean clustering is to minimize the sum of the Euclidean distances from all samples to the center of the class they belong to, converging by iteration.
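Written out, and assuming the usual K-means convention of squared Euclidean distances (the text says only "sum of Euclidean distances"), the quantity being minimized can be expressed as below. The symbols J and c^{(i)} are notation introduced here for illustration: c^{(i)} is the index of the class assigned to sample x^{(i)}, and \mu_j is the center of class j.

```latex
J(c, \mu) = \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^{2}
```

Alternating between reassigning samples and recomputing centers never increases J, which is why the iteration converges.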
Given training samples {x^(1), x^(2), ..., x^(m)}, the K-means clustering algorithm proceeds as follows:
1: Choose K cluster centers, μ_1, μ_2, ..., μ_K;
2: Compute the class c^(i) of each sample x^(i) according to the following formula (1 ≤ j ≤ K):
$$c^{(i)} = \arg\min_{j} \left\lVert x^{(i)} - \mu_j \right\rVert^{2}$$
3: Update the center of each class according to the following formula, replacing μ_j with μ'_j:
$$\mu'_j = \frac{\sum_{i=1}^{m} 1\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{c^{(i)} = j\}}$$
4: Repeat steps 2 and 3 until μ_j no longer changes (convergence);
5: After each clustering pass, fine-tune the result by adding zero-mean normally distributed random noise, giving the next sample set (sample space).
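The five steps above, applied repeatedly to build the layers, can be sketched as follows. This is a pure-Python illustration under several assumptions that the text does not fix: the number of clusters per layer (`ks`), the noise standard deviation (`noise_std`), and the choice that each new layer consists of the noisy cluster centers of the previous layer. How regression labels are carried from one layer to the next is not described in the text and is not handled here. The names `kmeans` and `build_layers` are hypothetical.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, rng, max_iter=100):
    """Plain K-means (steps 1 to 4); returns the k converged centers."""
    centers = rng.sample(points, k)                       # step 1: initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                                  # step 2: assign to nearest center
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = []
        for j, members in enumerate(clusters):            # step 3: recompute centers
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])            # keep the center of an empty cluster
        if new_centers == centers:                        # step 4: stop when centers are stable
            break
        centers = new_centers
    return centers

def build_layers(y0, ks, noise_std=0.01, seed=0):
    """Return [Y0, Y1, ..., YN], where N = len(ks)."""
    rng = random.Random(seed)
    layers = [list(y0)]
    for k in ks:
        centers = kmeans(layers[-1], k, rng)
        # step 5: fine-tune with zero-mean Gaussian noise to get the next sample set
        noisy = [tuple(c + rng.gauss(0.0, noise_std) for c in center) for center in centers]
        layers.append(noisy)
    return layers

y0 = [(float(i), float(i % 3)) for i in range(12)]        # toy training samples
layers = build_layers(y0, ks=[6, 3])                      # N = 2 gives three layers
```

With `ks=[6, 3]` the sketch mirrors the N = 2 case of this embodiment: one original layer plus two clustered layers of decreasing size.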
In the figure, Y_0 is the original training set; the iterative mean clustering algorithm yields the other two layers of samples, Y_1 and Y_2. Three regressors are obtained from the sample sets of the respective layers. Based on the validation set, the corresponding results (r_0, r_1, r_2) are obtained, and the optimal weights w_op = (w_0, w_1, w_2) can be obtained from formula (3).
The constraint applied when determining the optimal weights (w_0, w_1, w_2) is:
After the regressor models have been trained, the predicted ages a = (a_0, a_1, a_2) of the regressors of each layer are obtained on the test set, and the final age is obtained by fusing them with the weights (w_0, w_1, w_2): a_f = w_op^T a.
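The fusion a_f = w_op^T a and one possible way to pick the weights can be sketched as follows. Formula (3) and the constraint are not reproduced in this text, so the sketch makes two explicit assumptions: the weights are non-negative and sum to 1, and they are chosen by a coarse grid search that minimizes the validation MAE. The names `fuse` and `search_weights` and all numbers are hypothetical.

```python
from itertools import product

def fuse(weights, preds):
    """Weighted fusion a_f = w^T a for one sample."""
    return sum(w * a for w, a in zip(weights, preds))

def search_weights(val_preds, val_truth, step=0.1):
    """Grid-search weights that minimize the validation MAE.

    val_preds[i] holds the per-layer results (r_0, ..., r_N) for sample i.
    Assumed constraint (not from the source): w_i >= 0 and sum(w_i) == 1.
    """
    n_layers = len(val_preds[0])
    ticks = round(1 / step)
    best_w, best_err = None, float("inf")
    for combo in product(range(ticks + 1), repeat=n_layers):
        if sum(combo) != ticks:                  # enforce sum(w_i) == 1
            continue
        w = tuple(c / ticks for c in combo)
        err = sum(abs(fuse(w, p) - t) for p, t in zip(val_preds, val_truth)) / len(val_truth)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Toy validation results from three regressors (illustrative values only).
val_truth = [52.0, 32.0, 43.0]
val_preds = [(50.0, 54.0, 62.0), (30.0, 34.0, 22.0), (41.0, 45.0, 53.0)]
w_op, val_mae = search_weights(val_preds, val_truth)

a = (60.0, 64.0, 70.0)          # per-layer predicted ages for one test sample
a_f = fuse(w_op, a)             # final fused age
```

In this toy case the first two regressors err by equal and opposite amounts while the third is erratic, so the search settles on equal weight for the first two and none for the third.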
To measure algorithm performance, the mean absolute error (MAE) is used to evaluate the prediction algorithm, specifically:
$$\mathrm{MAE} = \frac{1}{M}\sum_{j=1}^{M}\left|a_j - a'_j\right|$$
where M is the number of samples in the test data, a_j is the actual value of the j-th test sample, and a'_j is the predicted value of the j-th test sample. In addition, the number of times the age prediction mechanism of the present invention outperforms the traditional age prediction mechanism is recorded as Score.
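Both measures are simple to compute. A minimal sketch follows; the function names are hypothetical, and counting only strictly lower MAE as a win is an assumption, since the text does not say how ties are treated.

```python
def mae(actual, predicted):
    """Mean absolute error over the M test samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def score(mae_proposed, mae_traditional):
    """Number of runs in which the proposed method has strictly lower MAE."""
    return sum(1 for p, t in zip(mae_proposed, mae_traditional) if p < t)

# Toy values for illustration only.
example_mae = mae([50.0, 40.0, 30.0], [52.0, 39.0, 30.0])      # (2 + 1 + 0) / 3
example_score = score([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])        # only the first run wins
```

In the experiment described here, `score` would be applied to the 100 per-group MAE values of PAEM and TAEM.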
Details are given in Table 2, where mean denotes the average value and std denotes the standard deviation.
Table 2 shows that, for both data sets, the mean and standard deviation of the MAE obtained with the proposed method are smaller than those of the traditional method, indicating that the proposed age prediction mechanism predicts age more accurately than the traditional one. The larger Score value also illustrates the advantage of the method from another angle.
Table 2. Age prediction results for the two data sets
Fig. 3 shows the histogram for Table 2. It mainly shows the differences in the predicted ages obtained by this method and the P values.
Fig. 3 shows that the MAE of the ages predicted by the proposed mechanism is smaller on both data sets, and the P values from hypothesis testing are all below 0.05, indicating that the improvement in the MAE of the ages predicted by PAEM is statistically significant.
Finally, it should be noted that the above describes the preferred embodiment of the present invention. Those of ordinary skill in the art, guided by the present invention and without departing from its purpose and claims, may devise many similar variants, and such variations fall within the scope of protection of the present invention.

Claims (7)

1. A depth sample learning method based on iterative mean clustering, characterized by the following steps:
S1: Select training data and apply the iterative mean clustering algorithm N times to obtain N+1 layers of training sample subsets, N ≥ 1;
S2: Train a regressor independently on each layer of training sample subsets, obtaining N+1 regressors;
S3: Select validation data. First compute the Euclidean-distance similarity between each validation sample and the sample space of each layer, and convert the validation sample into the most similar sample in that layer's sample space; then feed these samples to the N+1 regressors to obtain N+1 validation results;
S4: Determine the optimal weight of each regressor, (w_0, w_1, …, w_N), using a weighted fusion mechanism;
S5: Obtain test data. First compute the Euclidean-distance similarity between each test sample and the sample space of each layer, and convert the test sample into the most similar sample in that layer's sample space; then feed these samples to the N+1 regressors obtained in step S2 and combine their outputs with the optimal weights obtained in step S4 to obtain the final prediction.
2. The depth sample learning method based on iterative mean clustering according to claim 1, characterized in that the constraint applied when determining the optimal weights (w_0, w_1, …, w_N) is:
3. The depth sample learning method based on iterative mean clustering according to claim 1, characterized in that the cluster centers in the iterative mean clustering algorithm are found in the same way as in K-means clustering, but at each iteration the original samples are the cluster centers produced by the previous clustering.
4. The depth sample learning method based on iterative mean clustering according to claim 1, characterized in that the regressor is a support vector regression model whose kernel function is a linear kernel or a radial basis function kernel.
5. The depth sample learning method based on iterative mean clustering according to claim 1, characterized in that the test data is medical data of the subject under test, the training data and validation data are selected from public databases such as UCI, each sample comprises multiple features, and the prediction result is the label of the subject under test.
6. The depth sample learning method based on iterative mean clustering according to claim 1, characterized in that the test data is medical data of the subject under test, the training data and validation data are selected from the diabetes data or heart disease data in public databases such as UCI, each sample comprises multiple features, and the prediction result is the age of the subject under test.
7. The depth sample learning method based on iterative mean clustering according to any one of claims 1 to 6, characterized in that the performance of the prediction algorithm is evaluated by the mean absolute error (MAE), specifically:
$$\mathrm{MAE} = \frac{1}{M}\sum_{j=1}^{M}\left|a_j - a'_j\right|$$
where M is the number of samples in the test data, a_j is the actual value of the j-th test sample, and a'_j is the predicted value of the j-th test sample.
CN201810558766.XA 2018-06-01 2018-06-01 Depth sample learning method based on iterative mean clustering Active CN108877947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810558766.XA CN108877947B (en) 2018-06-01 2018-06-01 Depth sample learning method based on iterative mean clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810558766.XA CN108877947B (en) 2018-06-01 2018-06-01 Depth sample learning method based on iterative mean clustering

Publications (2)

Publication Number Publication Date
CN108877947A true CN108877947A (en) 2018-11-23
CN108877947B CN108877947B (en) 2021-10-15

Family

ID=64336272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810558766.XA Active CN108877947B (en) 2018-06-01 2018-06-01 Depth sample learning method based on iterative mean clustering

Country Status (1)

Country Link
CN (1) CN108877947B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020046198A1 (en) * 2000-06-19 2002-04-18 Ben Hitt Heuristic method of classification
CN101944122A (en) * 2010-09-17 2011-01-12 浙江工商大学 Incremental learning-fused support vector machine multi-class classification method
CN105938116A (en) * 2016-06-20 2016-09-14 吉林大学 Gas sensor array concentration detection method based on fuzzy division and model integration

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222762A (en) * 2019-06-04 2019-09-10 恒安嘉新(北京)科技股份公司 Object prediction method, apparatus, equipment and medium
CN111914995A (en) * 2020-06-18 2020-11-10 北京百度网讯科技有限公司 Regularized linear regression generation method and device, electronic equipment and storage medium
CN113393932A (en) * 2021-07-06 2021-09-14 重庆大学 Parkinson's disease voice sample segment multi-type reconstruction transformation method
CN113393932B (en) * 2021-07-06 2022-11-25 重庆大学 Parkinson's disease voice sample segment multi-type reconstruction transformation method
CN114300116A (en) * 2021-11-10 2022-04-08 安徽大学 Robust disease detection method based on online classification algorithm
CN114300116B (en) * 2021-11-10 2023-11-28 安徽大学 Robust syndrome detection method based on online classification algorithm
CN115570228A (en) * 2022-11-22 2023-01-06 苏芯物联技术(南京)有限公司 Intelligent feedback control method and system for welding pipeline gas supply

Also Published As

Publication number Publication date
CN108877947B (en) 2021-10-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant