CN106156789A - Effective feature sample identification technique for enhancing classifier generalization performance - Google Patents
- Publication number
- CN106156789A
- Authority
- CN
- China
- Prior art keywords
- feature
- sigma
- cluster
- clustering
- classifier
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps: 1) establishing a classifier generalization performance evaluation index; 2) constructing a fuzzy clustering criterion; 3) clustering the feature sample set; 4) defining the intra-class average distance and the inter-class average distance; 5) establishing an initial cluster selection criterion; 6) performing secondary clustering identification on the feature sample set. The beneficial effects of the invention are that the method is soundly designed and simple to use, effectively removes noise and outliers, and achieves a high feature sample identification rate.
Description
Technical field
Based on signal processing theory, the present invention proposes an effective feature sample identification method built on data clustering analysis. Cluster analysis is used to partition the feature data automatically and to reject outliers or noise in it, thereby purifying the feature data; on this basis, the generalization performance of a support vector machine classifier is improved. The method lays a foundation for solving the accurate pattern recognition and classification problems that arise in mechanical fault diagnosis.
Background
Support vector machine (SVM) methods, grounded in statistical learning theory, have clear advantages in pattern classification and have been applied successfully to fault diagnosis. In theory, the optimal separating surface of an SVM is determined by the support vectors located at the class boundaries. Outliers or noise points near the class boundaries are often mixed in with the effective samples, so the computed separating surface is not optimal, which degrades the generalization performance of the classifier [1,2].
In practical diagnosis applications, external interference during signal detection and internal noise in the system surrounding the object being diagnosed can both introduce noise into the raw observation data; abnormal or faulty sensors, abnormal fluctuations in system power or motion, or merely a change in operating conditions can also produce anomalous observations (outliers). If the noise or outliers present in the raw data are not handled properly, they enter the feature space along with feature extraction and become noise points or outliers that deviate markedly from the features of their class as a whole. Many other factors can also harm fault diagnosis, for example redundancy in the sensed observations caused by scattering and reverberation as vibration propagates through the mechanical structure, and an excessively high feature dimension chosen in the feature extraction step. Information redundancy makes subsequent feature extraction difficult and further amplifies the negative effect of noise or outliers; an excessively high feature dimension makes the statistical properties of the samples harder to estimate, which reduces the generalization ability of the classifier [3]. The feature data must therefore first be given the necessary purification before effective diagnosis is possible.
Summary of the invention
The purpose of the invention is to solve the problems above by providing an effective feature sample identification technique for enhancing classifier generalization performance.
The technical solution that achieves this purpose is an effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps:
1) establishing a classifier generalization performance evaluation index;
2) constructing a fuzzy clustering criterion;
3) clustering the feature sample set;
4) defining the intra-class average distance and the inter-class average distance;
5) establishing an initial cluster selection criterion;
6) performing secondary clustering identification on the feature sample set.
The classifier generalization performance evaluation index is established as:

$$R(w) \le R_{\mathrm{emp}}(w) + \Phi(h/l)$$

$$h \le \min\left(\left[r^2 a^2\right],\, n\right) + 1$$

where $\Phi(\cdot)$ is the confidence risk function, $h$ is the VC dimension of the classification function, and $l$ is the number of training samples. The true risk $R(w)$ is therefore bounded by the sum of two parts, the empirical risk $R_{\mathrm{emp}}(w)$ and the confidence risk $\Phi(\cdot)$. $[\cdot]$ denotes taking the integer part, and $r$ is the radius of the minimal hypersphere enclosing all points mapped into the high-dimensional space; as in the standard structural-risk bound, $a$ is an upper bound on the norm $\lVert w \rVert$ and $n$ is the dimension of that space.
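As a numerical illustration, the sketch below evaluates the two inequalities. It is a minimal sketch under stated assumptions, not the patent's procedure: the patent does not give the form of $\Phi$, so the code assumes Vapnik's usual confidence term $\Phi = \sqrt{(h(\ln(2l/h) + 1) - \ln(\eta/4))/l}$, which holds with probability $1 - \eta$. The function names and the parameter `eta` are hypothetical.

```python
import math


def vc_dimension_bound(r: float, a: float, n: int) -> int:
    """Upper bound h <= min([r^2 a^2], n) + 1 on the VC dimension.

    r: radius of the hypersphere enclosing the mapped samples
    a: upper bound on the weight-vector norm ||w||
    n: dimension of the (mapped) feature space
    """
    return min(math.floor(r * r * a * a), n) + 1


def vapnik_confidence(h: int, l: int, eta: float = 0.05) -> float:
    """Confidence term Phi(h/l) in Vapnik's usual form (assumed, not from the patent)."""
    if not 0 < h < l:
        raise ValueError("the bound is only meaningful for 0 < h < l")
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)


def guaranteed_risk(empirical_risk: float, h: int, l: int, eta: float = 0.05) -> float:
    """Right-hand side of R(w) <= R_emp(w) + Phi(h/l)."""
    return empirical_risk + vapnik_confidence(h, l, eta)


if __name__ == "__main__":
    h = vc_dimension_bound(r=2.0, a=1.5, n=10)  # [9.0] = 9, so h = 10
    print(h)
    for l in (100, 1000):
        # More training samples shrink the confidence term for the same h.
        print(l, round(guaranteed_risk(0.05, h, l), 3))
```

Because the bound on $h$ grows with $r^2$, shrinking the enclosing hypersphere, for example by discarding outliers that lie far from the bulk of the data, lowers the bound on $h$ and with it the confidence term; this is one way to read the motivation for purifying the feature samples.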
The fuzzy clustering criterion is constructed as:

$$J_{\mathrm{FCM}}(U, V) = \sum_{i=1}^{c} \sum_{k} \left(u_{ik}\right)^{m} \left(d_{ik}\right)^{2}$$

where $d_{ik} = \lVert x_k - v_i \rVert$ is the distance between sample $x_k$ and cluster center $v_i$, generally measured with the Euclidean distance, and $k$ runs over all samples. $m$ is the fuzzy weighting exponent, usually $m = 2$. $J_{\mathrm{FCM}}(U, V)$ is the sum, over all classes, of the squared distances from the samples to the cluster centers, each weighted by $(u_{ik})^m$, the $m$-th power of the membership of sample $x_k$ in class $i$.
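The criterion can be evaluated directly from its definition. The following minimal pure-Python sketch assumes samples and centers are lists of coordinates and stores the memberships as `U[i][k]` (cluster i, sample k); the function name `fcm_objective` is hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def fcm_objective(X: Sequence[Point], V: Sequence[Point],
                  U: Sequence[Sequence[float]], m: float = 2.0) -> float:
    """J_FCM(U, V) = sum_i sum_k (u_ik)^m * d_ik^2 with Euclidean d_ik."""
    total = 0.0
    for i, v in enumerate(V):
        for k, x in enumerate(X):
            d_ik = math.dist(x, v)  # Euclidean distance ||x_k - v_i||
            total += (U[i][k] ** m) * d_ik ** 2
    return total


if __name__ == "__main__":
    X = [[0.0, 0.0], [2.0, 0.0]]
    V = [[0.0, 0.0], [2.0, 0.0]]
    U_crisp = [[1.0, 0.0], [0.0, 1.0]]   # each sample fully in its own cluster
    U_fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # every membership equal to 0.5
    print(fcm_objective(X, V, U_crisp))  # 0.0
    print(fcm_objective(X, V, U_fuzzy))  # 0.25 * (0 + 4 + 4 + 0) = 2.0
```

The crisp assignment puts every sample on its own center and gives zero cost, while spreading membership evenly pays for every sample-to-center distance.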
The feature sample set is clustered as follows. Set the number of clusters $c$, the fuzzy weighting exponent $m$, and the initial membership matrix $U^{(0)}$, and set the iteration counter $l = 0$. For a given stopping threshold $\varepsilon > 0$, iterate until $\max_{i,k} \lvert u_{ik}^{(l)} - u_{ik}^{(l-1)} \rvert < \varepsilon$, at which point the algorithm terminates; otherwise set $l = l + 1$ and continue.
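The patent's update formulas are not reproduced in this text. The sketch below assumes they are the standard fuzzy c-means (Bezdek) updates, $v_i = \sum_k u_{ik}^m x_k / \sum_k u_{ik}^m$ and $u_{ik} = 1 / \sum_j (d_{ik}/d_{jk})^{2/(m-1)}$, combined with the stopping rule described above. The random initialization of $U^{(0)}$, the `max_iter` safeguard, and the function name are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fuzzy_c_means(X: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  eps: float = 1e-5, max_iter: int = 300,
                  seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Standard FCM iteration; returns memberships U[i][k] and centers V[i]."""
    n, dim = len(X), len(X[0])
    rng = random.Random(seed)
    # Initial membership matrix U0: random, each sample's memberships sum to 1.
    U: Matrix = [[0.0] * n for _ in range(c)]
    for k in range(n):
        w = [rng.random() for _ in range(c)]
        total = sum(w)
        for i in range(c):
            U[i][k] = w[i] / total
    V: Matrix = []
    for _ in range(max_iter):
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            V.append([sum(w[k] * X[k][d] for k in range(n)) / total
                      for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        U_new: Matrix = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dist = [math.dist(X[k], V[i]) for i in range(c)]
            hits = [i for i in range(c) if dist[i] == 0.0]
            if hits:  # sample sits exactly on a center: share membership there
                for i in hits:
                    U_new[i][k] = 1.0 / len(hits)
            else:
                for i in range(c):
                    U_new[i][k] = 1.0 / sum(
                        (dist[i] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        # Stopping rule: max_{i,k} |u_ik^(l) - u_ik^(l-1)| < eps.
        delta = max(abs(U_new[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = U_new
        if delta < eps:
            break
    return U, V
```

Hard cluster labels, if the later steps need them, can be taken as the cluster of maximum membership for each sample, e.g. `max(range(c), key=lambda i: U[i][k])`.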
The intra-class average distance and the inter-class average distance are defined by formulas in which the intra-class average is normalized by the number of pairwise combinations of the data samples in cluster $\{X_O\}$; $v_i$ and $v_j$ are the centers of the $i$-th cluster $\{X_i\}$ and the $j$-th cluster $\{X_j\}$, respectively; and $C_c^2$ is the number of pairwise combinations of the $c$ cluster indices.
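The defining formulas are not reproduced in this text. Reading the description literally, the intra-class average distance averages the distances over all sample pairs of one cluster, and the inter-class average distance averages the distances over all $C_c^2$ pairs of cluster centers; the sketch below assumes exactly that, with Euclidean distances. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence

Point = Sequence[float]


def intra_class_avg_distance(cluster: Sequence[Point]) -> float:
    """Mean distance over all C(n, 2) pairs of samples in one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a cluster with fewer than two samples has no pairs
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def inter_class_avg_distance(centers: Sequence[Point]) -> float:
    """Mean distance over all C(c, 2) pairs of cluster centers."""
    pairs = list(combinations(centers, 2))
    if not pairs:
        raise ValueError("need at least two cluster centers")
    return sum(math.dist(v_i, v_j) for v_i, v_j in pairs) / len(pairs)


if __name__ == "__main__":
    print(intra_class_avg_distance([[0, 0], [3, 4], [0, 0]]))  # (5 + 0 + 5) / 3
    print(inter_class_avg_distance([[0, 0], [3, 4], [6, 8]]))  # (5 + 10 + 5) / 3
```

Returning 0.0 for a single-sample cluster is a choice made in this sketch, not something the patent specifies.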
The initial cluster selection criterion is established by a formula in which $\{X_f\}$ is the initial effective cluster among the $c$ clusters (usually $c \ge 3$); it consists of effective feature samples and has size $n_f$ and center $v_f$. $\{X_n\}$ is the initial invalid cluster, which consists mainly of noise points or outliers and has size $n_n$ and center $v_n$.
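The selection formula itself is not reproduced here. Purely as a hypothetical illustration of the roles of $\{X_f\}$ and $\{X_n\}$, and not the patent's criterion, the sketch below assumes the effective cluster is the one holding the most samples and the invalid cluster is, among the remaining clusters, the one whose center lies farthest on average from the other centers, on the reasoning that noise points and outliers tend to be few and far from the bulk of the data. The function name and the rule are both assumptions.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def select_initial_clusters(labels: Sequence[int],
                            centers: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (f_idx, n_idx): indices of the effective and invalid clusters.

    HYPOTHETICAL rule for illustration only; requires at least two clusters.
    """
    c = len(centers)
    if c < 2:
        raise ValueError("need at least two clusters")
    sizes = Counter(labels)
    # Effective cluster {X_f}: the most populated cluster.
    f_idx = max(range(c), key=lambda i: sizes.get(i, 0))

    def mean_center_distance(i: int) -> float:
        # Average distance from center i to every other center.
        return sum(math.dist(centers[i], centers[j])
                   for j in range(c) if j != i) / (c - 1)

    # Invalid cluster {X_n}: among the rest, the most isolated center.
    n_idx = max((i for i in range(c) if i != f_idx), key=mean_center_distance)
    return f_idx, n_idx
```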
The secondary clustering identification of the feature sample set is computed by formulas in which $\{X_s\}$ is a combined subset, of size $n_s$ and center $v_s$, extracted from $\{X_d\}$ so as to satisfy the minimization criterion. After the effective samples are extracted, the remaining data samples in $\{X_d\}$ form the subset $\{X_t\}$, which is merged into the invalid cluster $\{X_n\}$ composed of noise points (outliers). Formula (18) performs the secondary partitioning of the invalid cluster, where $x_i$ is a data sample in the invalid cluster $\{X_n\}$ formed after the main partitioning, and $X_{\mathrm{near}}$ is the data sample in the effective cluster $\{X_f\}$ nearest to the center $v_n$ of the invalid cluster $\{X_n\}$.
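Formula (18) is not reproduced in this text either. The sketch below finds $X_{\mathrm{near}}$ exactly as described, as the sample of $\{X_f\}$ nearest to $v_n$, and then applies a hypothetical reassignment rule that is not the patent's formula: a sample $x_i$ of $\{X_n\}$ is returned to the effective set when it lies closer to $X_{\mathrm{near}}$ than to $v_n$, and is otherwise kept as noise. The function names are also hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_effective_sample(X_f: Sequence[Point], v_n: Point) -> Point:
    """X_near: the sample of the effective cluster closest to the invalid center v_n."""
    return min(X_f, key=lambda x: math.dist(x, v_n))


def secondary_partition(X_f: Sequence[Point], X_n: Sequence[Point],
                        v_n: Point) -> Tuple[List[Point], List[Point]]:
    """Split {X_n} into (recovered, noise) using a HYPOTHETICAL distance rule."""
    x_near = nearest_effective_sample(X_f, v_n)
    recovered: List[Point] = []
    noise: List[Point] = []
    for x_i in X_n:
        if math.dist(x_i, x_near) < math.dist(x_i, v_n):
            recovered.append(x_i)  # closer to the valid data than to the noise center
        else:
            noise.append(x_i)      # stays in the invalid cluster
    return recovered, noise
```

In a full pipeline the recovered samples would presumably be merged back into $\{X_f\}$ before the SVM is trained; that step is likewise an assumption.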
Description of the drawings
Fig. 1 is a schematic flowchart of the effective feature sample identification technique for enhancing classifier generalization performance according to the present invention;
Fig. 2 is a schematic diagram of SVM classifier generalization performance evaluation;
Fig. 3 shows the hypersphere domain description in feature space;
Fig. 4 shows the identification results for effective feature samples in the normal condition;
Fig. 5 shows the identification results for effective feature samples under gear tooth damage;
Fig. 6 shows the identification results for effective feature samples under support loosening.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings. Fig. 1 is a schematic flowchart of the effective feature sample identification technique for enhancing classifier generalization performance according to the present invention. An effective feature sample identification method based on removing noise (outliers) is used to purify and preprocess the feature samples in the feature space.
This technical solution describes the feature sample purification preprocessing using, as an example, the identification of effective feature samples for three gearbox patterns: the normal condition, gear tooth damage, and support loosening. The basic principle is as follows. Based on the structural risk minimization (SRM) principle of statistical learning theory, and in order to maximize the generalization performance of the SVM classifier, the feature samples of each fault-mode class are purified twice according to a hierarchical processing principle, yielding the effective feature samples used for classifier design. The principle of SVM classifier generalization performance evaluation is shown in Fig. 2, namely:
$$R(w) \le R_{\mathrm{emp}}(w) + \Phi(h/l)$$

$$h \le \min\left(\left[r^2 a^2\right],\, n\right) + 1$$

where $\Phi(\cdot)$ is the confidence risk function, $h$ is the VC dimension of the classification function, and $l$ is the number of training samples. The true risk $R(w)$ is therefore bounded by the sum of two parts, the empirical risk $R_{\mathrm{emp}}(w)$ and the confidence risk $\Phi(\cdot)$. $[\cdot]$ denotes taking the integer part, and $r$ is the radius of the minimal hypersphere enclosing all points mapped into the high-dimensional space. The hypersphere domain description in feature space is shown in Fig. 3.
Embodiment 1
Identification of effective feature samples for the normal condition
A clustering criterion is established for the normal-condition feature sample set, and two clustering passes are performed in succession. The effective feature sample identification results are shown in Fig. 4.
Formula establishing the clustering criterion:
Clustering formula for the normal-condition feature sample set:
Definitions of the intra-class and inter-class average distances for the normal condition:
Formula establishing the initial cluster selection criterion for the normal-condition feature samples:
Secondary clustering identification formula for the normal-condition feature sample set:
Embodiment 2
Identification of effective feature samples under gear tooth damage
A clustering criterion is established for the gear-tooth-damage feature sample set, and two clustering passes are performed in succession. The effective feature sample identification results are shown in Fig. 5.
Formula establishing the clustering criterion:
Clustering formula for the gear-tooth-damage feature sample set:
Definitions of the intra-class and inter-class average distances for gear tooth damage:
Formula establishing the initial cluster selection criterion for the gear-tooth-damage feature samples:
Secondary clustering identification formula for the gear-tooth-damage feature sample set:
Embodiment 3
Identification of effective feature samples under support loosening
A clustering criterion is established for the support-loosening feature sample set, and two clustering passes are performed in succession. The effective feature sample identification results are shown in Fig. 6.
Formula establishing the clustering criterion:
Clustering formula for the support-loosening feature sample set:
Definitions of the intra-class and inter-class average distances for support loosening:
Formula establishing the initial cluster selection criterion for the support-loosening feature samples:
Secondary clustering identification formula for the support-loosening feature sample set:
References
[1] Du, Liu Sanyang, Qi Xiaogang. A fuzzy support vector machine with a new membership function. Journal of System Simulation, 2009, 21(7): 1901-1903.
[2] Ding Shifei, Qi Bingjuan, Tan Hongyan. A survey of support vector machine theory and algorithms. Journal of University of Electronic Science and Technology of China, 2011, 40(1): 2-10.
[3] Zhang Youhu. Analysis and research of feature selection algorithms in text classification. Master's thesis. Hefei: University of Science and Technology of China, 2010.
The technical solution above merely represents a preferred embodiment of the present invention. Variations that those skilled in the art may make to parts of it all embody the principle of the present invention and fall within its scope of protection.
Claims (7)
1. An effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps:
1) establishing a classifier generalization performance evaluation index;
2) constructing a fuzzy clustering criterion;
3) clustering the feature sample set;
4) defining the intra-class average distance and the inter-class average distance;
5) establishing an initial cluster selection criterion;
6) performing secondary clustering identification on the feature sample set.
2. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the classifier generalization performance evaluation index is established as:

$$R(w) \le R_{\mathrm{emp}}(w) + \Phi(h/l)$$

$$h \le \min\left(\left[r^2 a^2\right],\, n\right) + 1$$

where $\Phi(\cdot)$ is the confidence risk function, $h$ is the VC dimension of the classification function, and $l$ is the number of training samples. The true risk $R(w)$ is therefore bounded by the sum of two parts, the empirical risk $R_{\mathrm{emp}}(w)$ and the confidence risk $\Phi(\cdot)$. $[\cdot]$ denotes taking the integer part, and $r$ is the radius of the minimal hypersphere enclosing all points mapped into the high-dimensional space.
3. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the fuzzy clustering criterion is constructed as:

$$J_{\mathrm{FCM}}(U, V) = \sum_{i=1}^{c} \sum_{k} \left(u_{ik}\right)^{m} \left(d_{ik}\right)^{2}$$

where $d_{ik} = \lVert x_k - v_i \rVert$ is the distance between sample $x_k$ and cluster center $v_i$, generally measured with the Euclidean distance, and $k$ runs over all samples; $m$ is the fuzzy weighting exponent, usually $m = 2$; and $J_{\mathrm{FCM}}(U, V)$ is the sum, over all classes, of the squared distances from the samples to the cluster centers, each weighted by $(u_{ik})^m$, the $m$-th power of the membership of sample $x_k$ in class $i$.
4. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the feature sample set is clustered as follows: set the number of clusters $c$, the fuzzy weighting exponent $m$, and the initial membership matrix $U^{(0)}$, and set the iteration counter $l = 0$; for a given stopping threshold $\varepsilon > 0$, iterate until $\max_{i,k} \lvert u_{ik}^{(l)} - u_{ik}^{(l-1)} \rvert < \varepsilon$, at which point the algorithm terminates; otherwise set $l = l + 1$ and continue.
5. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the intra-class average distance and the inter-class average distance are defined by formulas in which the intra-class average is normalized by the number of pairwise combinations of the data samples in cluster $\{X_O\}$; $v_i$ and $v_j$ are the centers of the $i$-th cluster $\{X_i\}$ and the $j$-th cluster $\{X_j\}$, respectively; and $C_c^2$ is the number of pairwise combinations of the $c$ cluster indices.
6. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the initial cluster selection criterion is established by a formula in which $\{X_f\}$ is the initial effective cluster among the $c$ clusters (usually $c \ge 3$), consisting of effective feature samples, with size $n_f$ and center $v_f$, and $\{X_n\}$ is the initial invalid cluster, consisting mainly of noise points or outliers, with size $n_n$ and center $v_n$.
7. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the secondary clustering identification of the feature sample set is computed by formulas in which $\{X_s\}$ is a combined subset, of size $n_s$ and center $v_s$, extracted from $\{X_d\}$ so as to satisfy the minimization criterion; after the effective samples are extracted, the remaining data samples in $\{X_d\}$ form the subset $\{X_t\}$, which is merged into the invalid cluster $\{X_n\}$ composed of noise points (outliers); formula (18) performs the secondary partitioning of the invalid cluster, where $x_i$ is a data sample in the invalid cluster $\{X_n\}$ formed after the main partitioning, and $X_{\mathrm{near}}$ is the data sample in the effective cluster $\{X_f\}$ nearest to the center $v_n$ of the invalid cluster $\{X_n\}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610303447.5A (CN106156789A) | 2016-05-09 | 2016-05-09 | Effective feature sample identification technique for enhancing classifier generalization performance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106156789A | 2016-11-23 |
Family
ID=57352810
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610303447.5A (CN106156789A), Pending | Effective feature sample identification technique for enhancing classifier generalization performance | 2016-05-09 | 2016-05-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106156789A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109239585A * | 2018-09-06 | 2019-01-18 | Nanjing University of Science and Technology | A fault diagnosis method based on an improved optimal wavelet packet |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101980202A * | 2010-11-04 | 2011-02-23 | Xidian University | Semi-supervised classification method for imbalanced data |
CN104794482A * | 2015-03-24 | 2015-07-22 | Jiangnan University | Inter-class maximization clustering algorithm based on improved kernel fuzzy C-means |
CN105447520A * | 2015-11-23 | 2016-03-30 | Yancheng Institute of Technology | Sample classification method based on weighted PTSVM (projection twin support vector machine) |
Non-Patent Citations (1)
Title |
---|
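Jiao Weidong et al.: "An integrally improved fault diagnosis method based on support vector machines", Chinese Journal of Scientific Instrument (仪器仪表学报) * |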
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109582003B (en) | Bearing fault diagnosis method based on pseudo label semi-supervised kernel local Fisher discriminant analysis | |
Qin et al. | The optimized deep belief networks with improved logistic sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines | |
CN102944418B (en) | Wind turbine generator group blade fault diagnosis method | |
Bezdek | Numerical taxonomy with fuzzy sets | |
Liu et al. | Study on SVM compared with the other text classification methods | |
CN111524606A (en) | Tumor data statistical method based on random forest algorithm | |
CN109102005A (en) | Small sample deep learning method based on shallow Model knowledge migration | |
CN107590506A (en) | A kind of complex device method for diagnosing faults of feature based processing | |
CN105487526A (en) | FastRVM (fast relevance vector machine) wastewater treatment fault diagnosis method | |
CN110009030B (en) | Sewage treatment fault diagnosis method based on stacking meta-learning strategy | |
Biswal et al. | Classification of power quality data using decision tree and chemotactic differential evolution based fuzzy clustering | |
CN104794368A (en) | Rolling bearing fault classifying method based on FOA-MKSVM (fruit fly optimization algorithm-multiple kernel support vector machine) | |
CN106599913A (en) | Cluster-based multi-label imbalance biomedical data classification method | |
CN110737976B (en) | Mechanical equipment health assessment method based on multidimensional information fusion | |
US20240133391A1 (en) | Prediction method for stall and surging of axial-flow compressor based on deep autoregressive network | |
CN113327632B (en) | Unsupervised abnormal sound detection method and device based on dictionary learning | |
CN111709299A (en) | Underwater sound target identification method based on weighting support vector machine | |
CN115048988B (en) | Unbalanced data set classification fusion method based on Gaussian mixture model | |
CN111753891A (en) | Rolling bearing fault diagnosis method based on unsupervised feature learning | |
CN109325553B (en) | Wind power gear box fault detection method, system, equipment and medium | |
CN110288028A (en) | ECG detecting method, system, equipment and computer readable storage medium | |
CN109976308A (en) | A kind of extracting method of the fault signature based on Laplce's score value and AP cluster | |
CN107194207A (en) | Protein ligands binding site estimation method based on granularity support vector machine ensembles | |
CN111611867A (en) | Rolling bearing intelligent fault diagnosis method based on multi-classification fuzzy correlation vector machine | |
CN107527064A (en) | A kind of application of manifold learning in fault diagnosis data extraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20161123 |