CN106156789A - Effective feature sample identification technique for enhancing classifier generalization performance - Google Patents

Info

Publication number
CN106156789A
Authority
CN
China
Prior art keywords
feature
sigma
cluster
clustering
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610303447.5A
Other languages
Chinese (zh)
Inventor
焦卫东
杨志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU
Priority to CN201610303447.5A
Publication of CN106156789A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps: 1) establishing an evaluation index of classifier generalization performance; 2) constructing a fuzzy clustering criterion; 3) primary clustering partition of the feature sample set; 4) defining the intra-class average distance and the inter-class average distance; 5) establishing the initial cluster selection criterion; 6) secondary clustering recognition of the feature sample set. The beneficial effects of the invention are that the method is reasonably designed, simple to use, effectively removes noise points or outliers, and achieves a high feature sample recognition rate.

Description

Effective feature sample identification technique for enhancing classifier generalization performance
Technical field
Based on signal processing theory, the present invention proposes an effective feature sample identification method built on data clustering analysis. It uses the automatic pattern partitioning of cluster analysis to reject outliers or noise points in the feature data, thereby purifying the feature data, and on this basis improves the generalization performance of support vector machine classifiers. The method lays a foundation for solving the accurate pattern recognition and classification problems that arise in mechanical fault diagnosis.
Background art
The support vector machine (SVM) method, based on statistical learning theory, has considerable advantages in pattern classification and has been applied successfully to fault diagnosis. In theory, the optimal separating surface of an SVM is determined by the support vectors located at the class edges. Outliers or noise points near the class edges are often mixed with the effective samples, so the computed separating surface is not optimal, which degrades the generalization performance of the classifier [1,2].
In practical diagnosis applications, external interference around the object under diagnosis and internal noise of the acquisition system may introduce noise into the raw observation data during signal detection. Sensor anomalies or faults, abnormal fluctuations in system power or motion, or merely a change of operating condition may also produce abnormal observed outliers. If the noise or outliers in the raw data are not handled properly, they enter the feature space through feature extraction and form noise points or outliers that deviate markedly from the overall class features. Many other factors also harm fault diagnosis, for example the redundancy of sensor observation information caused by scattering and reverberation as vibration propagates through the mechanical structure, and an excessively high feature dimension selected in the feature extraction step. Information redundancy makes subsequent feature extraction difficult and further amplifies the negative effect of noise or outliers. Too high a feature dimension makes the statistical properties of the samples harder to estimate, which reduces the generalization ability of the classifier [3]. The feature data must therefore first be purified before effective diagnosis can be achieved.
Summary of the invention
The purpose of the invention is to solve the above problems by developing an effective feature sample identification technique for enhancing classifier generalization performance.
The technical solution that achieves this purpose is an effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps:
1) establishing an evaluation index of classifier generalization performance;
2) constructing a fuzzy clustering criterion;
3) primary clustering partition of the feature sample set;
4) defining the intra-class average distance and the inter-class average distance;
5) establishing the initial cluster selection criterion;
6) secondary clustering recognition of the feature sample set.
The evaluation index of classifier generalization performance is established by:
$$R(w) = R_{emp}(w) + \Phi(h/l),$$
$$h \le \min([r^2 a^2], n) + 1.$$
Here $\Phi(\cdot)$ is the confidence risk function, $h$ is the VC dimension of the classification function, and $l$ is the number of training samples. The true risk $R(w)$ thus consists of two parts, the empirical risk $R_{emp}(w)$ and the confidence risk $\Phi(\cdot)$. $[\cdot]$ denotes the integer part. $r$ is the radius of the smallest hypersphere containing all the mapped points in the high-dimensional space.
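The bound on the VC dimension can be evaluated directly. The following is a minimal Python sketch, not the patent's own implementation. The function name is hypothetical, and the reading of `a` as a bound on the weight-vector norm and `n` as the feature-space dimension follows the usual statement of this bound in statistical learning theory; the patent text does not define either symbol, so treat that reading as an assumption.

```python
import math


def vc_dimension_bound(r: float, a: float, n: int) -> int:
    """Evaluate min([r^2 a^2], n) + 1, where [.] is the integer part.

    r: radius of the smallest hypersphere enclosing the mapped samples
    a: assumed bound on the weight-vector norm (not defined in the source)
    n: assumed dimension of the feature space (not defined in the source)
    """
    return min(math.floor(r * r * a * a), n) + 1


# A smaller enclosing radius gives a smaller bound on h.
print(vc_dimension_bound(2.0, 1.5, 100))
print(vc_dimension_bound(1.0, 1.5, 100))
```

Under this reading, samples that inflate the enclosing hypersphere raise the bound on h, which is one way to see why removing outliers before training may help generalization.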
The fuzzy clustering criterion is constructed as:
$$\min J_{FCM}(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} (d_{ik})^{2}.$$
Here $d_{ik} = \|x_k - v_i\|$ is the distance between sample $x_k$ and cluster center $v_i$, generally measured by the Euclidean distance. $m$ is the fuzzy weighting exponent, usually $m = 2$. $J_{FCM}(U, V)$ is the weighted sum of squared distances from the samples of every class to the cluster centers, where the weight is the $m$-th power of the membership degree of sample $x_k$ in the $i$-th class.
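As an illustration of this criterion, the sketch below computes the membership-weighted sum of squared Euclidean distances for given samples, centers, and memberships. It is a plain-Python sketch under assumed data layouts (nested lists, with `memberships[i][k]` holding the membership of sample k in cluster i); the function and argument names are hypothetical and not taken from the patent.

```python
import math


def fcm_objective(samples, centers, memberships, m=2.0):
    """Return sum over k and i of (u_ik)^m * (d_ik)^2.

    samples:     list of n points x_k (sequences of floats)
    centers:     list of c cluster centers v_i
    memberships: c-by-n nested list, memberships[i][k] = u_ik
    m:           fuzzy weighting exponent (the text suggests m = 2)
    """
    total = 0.0
    for i, v in enumerate(centers):
        for k, x in enumerate(samples):
            d_ik = math.dist(x, v)  # Euclidean distance ||x_k - v_i||
            total += (memberships[i][k] ** m) * d_ik ** 2
    return total


samples = [[0.0, 0.0], [2.0, 0.0]]
centers = [[0.0, 0.0], [2.0, 0.0]]
crisp = [[1.0, 0.0], [0.0, 1.0]]   # each sample fully in its own cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # memberships split evenly
print(fcm_objective(samples, centers, crisp))
print(fcm_objective(samples, centers, fuzzy))
```

In this toy case the crisp assignment gives a lower objective than the evenly split one, which is the direction the minimization pushes.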
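The primary clustering partition of the feature sample set is computed by:
$$v_i^{l} = \sum_{k=1}^{n} (u_{ik}^{l})^{m} x_k \Big/ \sum_{k=1}^{n} (u_{ik}^{l})^{m}, \quad i = 1, \ldots, c,$$
$$u_{ik}^{l+1} = 1 \Big/ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}}, \quad \forall i, \forall k.$$
First set the number of clusters $c$, the fuzzy weighting exponent $m$, and the initial membership matrix $U^0$, with iteration step $l = 0$. For a given stopping threshold $\varepsilon > 0$, iterate until $\max\{|u_{ik}^{l} - u_{ik}^{l-1}|\} < \varepsilon$, at which point the algorithm terminates; otherwise set $l = l + 1$ and continue.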
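The iteration described here is the fuzzy c-means procedure: alternate a center update and a membership update until the largest membership change falls below the threshold. The sketch below is a plain-Python illustration using the update formulas stated in the claims; the function name, the `max_iter` safeguard, and the handling of a sample that coincides with a center are assumptions added for the example.

```python
import math


def fuzzy_c_means(samples, U0, m=2.0, eps=1e-5, max_iter=300):
    """Alternate center and membership updates until memberships settle.

    samples: list of n points; U0: c-by-n initial membership matrix.
    Returns (centers, U); the centers are those computed in the last pass.
    """
    n, c, dim = len(samples), len(U0), len(samples[0])
    U = [row[:] for row in U0]
    for _ in range(max_iter):  # max_iter is a safeguard, not in the source
        # Center update: v_i = sum_k (u_ik)^m x_k / sum_k (u_ik)^m
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[k] * samples[k][j] for k in range(n)) / total
                 for j in range(dim)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U_new = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [math.dist(samples[k], centers[i]) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zero:  # sample sits exactly on a center (assumed handling)
                    U_new[i][k] = 1.0 / len(zero) if i in zero else 0.0
                else:
                    U_new[i][k] = 1.0 / sum(
                        (d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                    )
        change = max(
            abs(U_new[i][k] - U[i][k]) for i in range(c) for k in range(n)
        )
        U = U_new
        if change < eps:  # stopping rule: max |u^l - u^(l-1)| < eps
            break
    return centers, U


samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
U0 = [[0.9, 0.8, 0.2, 0.1], [0.1, 0.2, 0.8, 0.9]]
centers, U = fuzzy_c_means(samples, U0)
print(centers)
```

For a crisp partition after convergence, each sample can be assigned to the cluster in which its membership is largest.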
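The intra-class average distance and the inter-class average distance are defined as:
$$\delta_{inner} = \sum_{i=1}^{n_O - 1} \sum_{j=i+1}^{n_O} \|x_i - x_j\| \Big/ C_{n_O}^{2}, \qquad \delta_{inter} = \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \|v_i - v_j\| \Big/ C_{c}^{2}.$$
Here $C_{n_O}^{2}$ is the number of pairwise combinations of the data samples in cluster $\{X_O\}$. $v_i$ and $v_j$ are the centers of the $i$-th cluster $\{X_i\}$ and the $j$-th cluster $\{X_j\}$, respectively. $C_{c}^{2}$ is the number of pairwise combinations of the $c$ cluster indices.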
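Both quantities are averages of Euclidean distances over all unordered pairs: pairs of samples inside one cluster for the intra-class distance, and pairs of cluster centers for the inter-class distance. A minimal Python sketch follows; the function names are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all C(n, 2) unordered pairs."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def delta_inner(cluster):
    """Intra-class average distance of the samples in one cluster."""
    return mean_pairwise_distance(cluster)


def delta_inter(centers):
    """Inter-class average distance between the cluster centers."""
    return mean_pairwise_distance(centers)


print(delta_inner([[0.0, 0.0], [3.0, 4.0]]))
print(delta_inter([[0.0, 0.0], [0.0, 2.0], [0.0, 4.0]]))
```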
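The initial cluster selection criterion is established as:
$$\{X_f\} \Leftarrow \{X_O\}, \quad \text{s.t.} \ \max_{O} \left[ n_O C_{n_O}^{2} \Big/ \sum_{i=1}^{n_O - 1} \sum_{j=i+1}^{n_O} \|x_i - x_j\| \right].$$
$$\{X_n\} \Leftarrow \{X_O\}, \quad \text{s.t.} \ \min_{O} \left[ n_O C_{n_O}^{2} \Big/ \sum_{i=1}^{n_O - 1} \sum_{j=i+1}^{n_O} \|x_i - x_j\| \right].$$
Here $\{X_f\}$ is the initial effective cluster among the $c$ clusters (usually $c \ge 3$); it consists of effective feature samples, with capacity $n_f$ and center $v_f$. $\{X_n\}$ is the initial invalid cluster; it consists mainly of noise points or outliers, with capacity $n_n$ and center $v_n$.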
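Read against the claims, the selection score of a cluster is its capacity divided by its intra-class average distance, so a large, tight cluster scores high and a small, scattered one scores low. The sketch below picks the highest-scoring cluster as the initial effective cluster and the lowest-scoring one as the initial invalid cluster. Function names are hypothetical, and the treatment of a cluster whose samples all coincide is an assumption.

```python
import math
from itertools import combinations


def selection_score(cluster):
    """n_O * C(n_O, 2) / (sum of pairwise distances) for one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        raise ValueError("a cluster needs at least two samples")
    total = sum(math.dist(p, q) for p, q in pairs)
    if total == 0.0:
        return math.inf  # all samples coincide (assumed handling)
    return len(cluster) * len(pairs) / total


def select_initial_clusters(clusters):
    """Return (index of effective cluster X_f, index of invalid cluster X_n)."""
    scores = [selection_score(c) for c in clusters]
    idx = range(len(clusters))
    return max(idx, key=scores.__getitem__), min(idx, key=scores.__getitem__)


clusters = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # large and tight
    [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]],              # medium
    [[20.0, 0.0], [30.0, 0.0]],                        # small and scattered
]
print(select_initial_clusters(clusters))
```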
The secondary clustering recognition of the feature sample set is computed by:
$$\text{if } \|v_d - v_f\| < \delta_{inter}, \ \text{then } \{X_d\} \Rightarrow \{X_f\}.$$
$$\text{else } \{X_s\} \Rightarrow \{X_f\}, \ \text{and } \{X_t\} \Rightarrow \{X_n\},$$
$$\text{s.t.} \ \min_{s} \left[ \sum_{i=1}^{n_s - 1} \sum_{j=i+1}^{n_s} \|x_i - x_j\| \, \|v_s - v_f\| \Big/ C_{n_s}^{2} \right].$$
Here $\{X_s\}$ is a combined subset of capacity $n_s$ and center $v_s$ extracted from $\{X_d\}$ that satisfies the minimization criterion. After the effective samples are extracted, the remaining data samples in $\{X_d\}$ form the subset $\{X_t\}$, which is merged into the invalid cluster $\{X_n\}$ made up of noise points (outliers). Formula (18) performs the secondary partition of the invalid cluster, where $x_i$ is a data sample in the invalid cluster $\{X_n\}$ formed after the primary partition, and $X_{near}$ is the data sample in the effective cluster $\{X_f\}$ nearest to the center $v_n$ of the invalid cluster $\{X_n\}$.
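The branch logic of this step can be sketched as follows, following the rule stated in the claims: a cluster whose center lies within the inter-class average distance of the effective center is merged whole, and otherwise a subset minimizing the product of its intra-class average distance and its center's distance to the effective center is extracted. This is only an illustration under stated assumptions. The patent does not say how the subset is searched or how its size is chosen, so the sketch takes the subset size `n_s` as a hypothetical input, uses the arithmetic mean as the subset center, and searches exhaustively, which is practical only for small clusters.

```python
import math
from itertools import combinations


def centroid(points):
    """Arithmetic mean of the points (assumed definition of a subset center)."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def mean_pairwise_distance(points):
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def secondary_recognition(X_d, v_d, v_f, delta_inter, n_s):
    """Return (samples moved to X_f, samples moved to X_n).

    n_s (>= 2) is a hypothetical fixed subset size; brute-force search.
    """
    if math.dist(v_d, v_f) < delta_inter:
        return list(X_d), []  # whole cluster joins the effective cluster

    def cost(idx):
        subset = [X_d[i] for i in idx]
        return mean_pairwise_distance(subset) * math.dist(centroid(subset), v_f)

    best = min(combinations(range(len(X_d)), n_s), key=cost)
    X_s = [X_d[i] for i in best]
    X_t = [X_d[i] for i in range(len(X_d)) if i not in best]
    return X_s, X_t


X_d = [[5.0, 0.0], [5.0, 1.0], [9.0, 0.0], [20.0, 20.0]]
print(secondary_recognition(X_d, centroid(X_d), [0.0, 0.0], 3.0, 2))
```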
Brief description of the drawings
Fig. 1 is a schematic flow chart of the effective feature sample identification technique for enhancing classifier generalization performance according to the present invention.
Fig. 2 is a schematic diagram of SVM classifier generalization performance evaluation.
Fig. 3 is a description of the hypersphere domain in feature space.
Fig. 4 shows the identification result of effective feature samples for the normal condition.
Fig. 5 shows the identification result of effective feature samples for gear tooth damage.
Fig. 6 shows the identification result of effective feature samples for support looseness.
Detailed description of the invention
The invention is described in detail below with reference to the drawings. Fig. 1 is a schematic flow chart of the effective feature sample identification technique for enhancing classifier generalization performance according to the present invention. An effective feature sample identification method based on removing noise points (outliers) is used to purify the feature samples in the feature space as a preprocessing step.
This technical solution takes the identification of effective feature samples for three gearbox patterns (normal condition, gear tooth damage, and support looseness) as an example to describe the feature sample purification preprocessing. The basic principle is as follows: based on the structural risk minimization (SRM) principle of statistical learning theory, and in order to maximize the generalization performance of the SVM classifier, the feature samples of multiple fault pattern classes are purified twice following a hierarchical processing principle, yielding effective feature samples for classifier design. The principle of SVM classifier generalization performance evaluation is shown in Fig. 2, namely
$$R(w) = R_{emp}(w) + \Phi(h/l),$$
$$h \le \min([r^2 a^2], n) + 1.$$
Here $\Phi(\cdot)$ is the confidence risk function, $h$ is the VC dimension of the classification function, and $l$ is the number of training samples. The true risk $R(w)$ thus consists of two parts, the empirical risk $R_{emp}(w)$ and the confidence risk $\Phi(\cdot)$. $[\cdot]$ denotes the integer part. $r$ is the radius of the smallest hypersphere containing all the mapped points in the high-dimensional space. The hypersphere domain in feature space is described in Fig. 3.
Embodiment 1
Identification of effective feature samples for the normal condition
The clustering criterion is established for the normal-condition feature sample set, and two successive clustering partitions are then carried out. The identification result of effective feature samples is shown in Fig. 4.
Clustering criterion formula:
Primary clustering partition formula for the normal-condition feature sample set:
Definition of the intra-class and inter-class average distances for the normal condition:
Initial cluster selection criterion for the normal-condition feature samples:
Secondary clustering recognition formula for the normal-condition feature sample set:
Embodiment 2
Identification of effective feature samples for gear tooth damage
The clustering criterion is established for the gear-tooth-damage feature sample set, and two successive clustering partitions are then carried out. The identification result of effective feature samples is shown in Fig. 5.
Clustering criterion formula:
Primary clustering partition formula for the gear-tooth-damage feature sample set:
Definition of the intra-class and inter-class average distances for gear tooth damage:
Initial cluster selection criterion for the gear-tooth-damage feature samples:
Secondary clustering recognition formula for the gear-tooth-damage feature sample set:
Embodiment 3
Identification of effective feature samples for support looseness
The clustering criterion is established for the support-looseness feature sample set, and two successive clustering partitions are then carried out. The identification result of effective feature samples is shown in Fig. 6.
Clustering criterion formula:
Primary clustering partition formula for the support-looseness feature sample set:
Definition of the intra-class and inter-class average distances for support looseness:
Initial cluster selection criterion for the support-looseness feature samples:
Secondary clustering recognition formula for the support-looseness feature sample set:
References
[1] Du, Liu Sanyang, Qi Xiaogang. A fuzzy support vector machine with a new membership function. Journal of System Simulation, 2009, 21(7): 1901-1903.
[2] Ding Shifei, Qi Bingjuan, Tan Hongyan. A survey of support vector machine theory and algorithms. Journal of University of Electronic Science and Technology of China, 2011, 40(1): 2-10.
[3] young tiger is opened. Analysis and research of feature selection algorithms in text classification. Hefei: University of Science and Technology of China, master's thesis, 2010.
The above technical solution represents only a preferred embodiment of the technical solution of the invention. Variations that those skilled in the art may make to parts of it all embody the principle of the invention and fall within its scope of protection.

Claims (7)

1. An effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps:
1) establishing an evaluation index of classifier generalization performance;
2) constructing a fuzzy clustering criterion;
3) primary clustering partition of the feature sample set;
4) defining the intra-class average distance and the inter-class average distance;
5) establishing the initial cluster selection criterion;
6) secondary clustering recognition of the feature sample set.
2. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the evaluation index of classifier generalization performance is established by:
$$R(w) = R_{emp}(w) + \Phi(h/l),$$
$$h \le \min([r^2 a^2], n) + 1.$$
Here $\Phi(\cdot)$ is the confidence risk function, $h$ is the VC dimension of the classification function, and $l$ is the number of training samples. The true risk $R(w)$ thus consists of two parts, the empirical risk $R_{emp}(w)$ and the confidence risk $\Phi(\cdot)$. $[\cdot]$ denotes the integer part. $r$ is the radius of the smallest hypersphere containing all the mapped points in the high-dimensional space.
3. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the fuzzy clustering criterion is constructed as:
$$\min J_{FCM}(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} (d_{ik})^{2}.$$
Here $d_{ik} = \|x_k - v_i\|$ is the distance between sample $x_k$ and cluster center $v_i$, generally measured by the Euclidean distance. $m$ is the fuzzy weighting exponent, usually $m = 2$. $J_{FCM}(U, V)$ is the weighted sum of squared distances from the samples of every class to the cluster centers, where the weight is the $m$-th power of the membership degree of sample $x_k$ in the $i$-th class.
4. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the primary clustering partition of the feature sample set is computed by:
$$v_i^{l} = \sum_{k=1}^{n} (u_{ik}^{l})^{m} x_k \Big/ \sum_{k=1}^{n} (u_{ik}^{l})^{m}, \quad i = 1, \ldots, c,$$
$$u_{ik}^{l+1} = 1 \Big/ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}}, \quad \forall i, \forall k.$$
First set the number of clusters $c$, the fuzzy weighting exponent $m$, and the initial membership matrix $U^0$, with iteration step $l = 0$. For a given stopping threshold $\varepsilon > 0$, iterate until $\max\{|u_{ik}^{l} - u_{ik}^{l-1}|\} < \varepsilon$, at which point the algorithm terminates; otherwise set $l = l + 1$ and continue.
5. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the intra-class average distance and the inter-class average distance are defined as:
$$\delta_{inner} = \sum_{i=1}^{n_O - 1} \sum_{j=i+1}^{n_O} \|x_i - x_j\| \Big/ C_{n_O}^{2}, \qquad \delta_{inter} = \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \|v_i - v_j\| \Big/ C_{c}^{2}.$$
Here $C_{n_O}^{2}$ is the number of pairwise combinations of the data samples in cluster $\{X_O\}$. $v_i$ and $v_j$ are the centers of the $i$-th cluster $\{X_i\}$ and the $j$-th cluster $\{X_j\}$, respectively. $C_{c}^{2}$ is the number of pairwise combinations of the $c$ cluster indices.
6. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the initial cluster selection criterion is established as:
$$\{X_f\} \Leftarrow \{X_O\}, \quad \text{s.t.} \ \max_{O} \left[ n_O C_{n_O}^{2} \Big/ \sum_{i=1}^{n_O - 1} \sum_{j=i+1}^{n_O} \|x_i - x_j\| \right].$$
$$\{X_n\} \Leftarrow \{X_O\}, \quad \text{s.t.} \ \min_{O} \left[ n_O C_{n_O}^{2} \Big/ \sum_{i=1}^{n_O - 1} \sum_{j=i+1}^{n_O} \|x_i - x_j\| \right].$$
Here $\{X_f\}$ is the initial effective cluster among the $c$ clusters (usually $c \ge 3$); it consists of effective feature samples, with capacity $n_f$ and center $v_f$. $\{X_n\}$ is the initial invalid cluster; it consists mainly of noise points or outliers, with capacity $n_n$ and center $v_n$.
7. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the secondary clustering recognition of the feature sample set is computed by:
$$\text{if } \|v_d - v_f\| < \delta_{inter}, \ \text{then } \{X_d\} \Rightarrow \{X_f\}.$$
$$\text{else } \{X_s\} \Rightarrow \{X_f\}, \ \text{and } \{X_t\} \Rightarrow \{X_n\},$$
$$\text{s.t.} \ \min_{s} \left[ \sum_{i=1}^{n_s - 1} \sum_{j=i+1}^{n_s} \|x_i - x_j\| \, \|v_s - v_f\| \Big/ C_{n_s}^{2} \right].$$
Here $\{X_s\}$ is a combined subset of capacity $n_s$ and center $v_s$ extracted from $\{X_d\}$ that satisfies the minimization criterion. After the effective samples are extracted, the remaining data samples in $\{X_d\}$ form the subset $\{X_t\}$, which is merged into the invalid cluster $\{X_n\}$ made up of noise points (outliers). Formula (18) performs the secondary partition of the invalid cluster, where $x_i$ is a data sample in the invalid cluster $\{X_n\}$ formed after the primary partition, and $X_{near}$ is the data sample in the effective cluster $\{X_f\}$ nearest to the center $v_n$ of the invalid cluster $\{X_n\}$.
CN201610303447.5A 2016-05-09 2016-05-09 Effective feature sample identification technique for enhancing classifier generalization performance Pending CN106156789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610303447.5A CN106156789A (en) 2016-05-09 2016-05-09 Effective feature sample identification technique for enhancing classifier generalization performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610303447.5A CN106156789A (en) 2016-05-09 2016-05-09 Effective feature sample identification technique for enhancing classifier generalization performance

Publications (1)

Publication Number Publication Date
CN106156789A (en) 2016-11-23

Family

ID=57352810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610303447.5A Pending CN106156789A (en) 2016-05-09 2016-05-09 Effective feature sample identification technique for enhancing classifier generalization performance

Country Status (1)

Country Link
CN (1) CN106156789A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109239585A (en) * 2018-09-06 2019-01-18 南京理工大学 A kind of method for diagnosing faults based on the preferred wavelet packet of improvement

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980202A (en) * 2010-11-04 2011-02-23 西安电子科技大学 Semi-supervised classification method of unbalance data
CN104794482A (en) * 2015-03-24 2015-07-22 江南大学 Inter-class maximization clustering algorithm based on improved kernel fuzzy C mean value
CN105447520A (en) * 2015-11-23 2016-03-30 盐城工学院 Sample classification method based on weighted PTSVM (projection twin support vector machine)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
焦卫东 等: "整体改进的基于支持向量机的故障诊断方法", 《仪器仪表学报》 *

Similar Documents

Publication Publication Date Title
CN109582003B (en) Bearing fault diagnosis method based on pseudo label semi-supervised kernel local Fisher discriminant analysis
Qin et al. The optimized deep belief networks with improved logistic sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines
CN102944418B (en) Wind turbine generator group blade fault diagnosis method
Bezdek Numerical taxonomy with fuzzy sets
Liu et al. Study on SVM compared with the other text classification methods
CN111524606A (en) Tumor data statistical method based on random forest algorithm
CN109102005A (en) Small sample deep learning method based on shallow Model knowledge migration
CN107590506A (en) A kind of complex device method for diagnosing faults of feature based processing
CN105487526A (en) FastRVM (fast relevance vector machine) wastewater treatment fault diagnosis method
CN110009030B (en) Sewage treatment fault diagnosis method based on stacking meta-learning strategy
Biswal et al. Classification of power quality data using decision tree and chemotactic differential evolution based fuzzy clustering
CN104794368A (en) Rolling bearing fault classifying method based on FOA-MKSVM (fruit fly optimization algorithm-multiple kernel support vector machine)
CN106599913A (en) Cluster-based multi-label imbalance biomedical data classification method
CN110737976B (en) Mechanical equipment health assessment method based on multidimensional information fusion
US20240133391A1 (en) Prediction method for stall and surging of axial-flow compressor based on deep autoregressive network
CN113327632B (en) Unsupervised abnormal sound detection method and device based on dictionary learning
CN111709299A (en) Underwater sound target identification method based on weighting support vector machine
CN115048988B (en) Unbalanced data set classification fusion method based on Gaussian mixture model
CN111753891A (en) Rolling bearing fault diagnosis method based on unsupervised feature learning
CN109325553B (en) Wind power gear box fault detection method, system, equipment and medium
CN110288028A (en) ECG detecting method, system, equipment and computer readable storage medium
CN109976308A (en) A kind of extracting method of the fault signature based on Laplce's score value and AP cluster
CN107194207A (en) Protein ligands binding site estimation method based on granularity support vector machine ensembles
CN111611867A (en) Rolling bearing intelligent fault diagnosis method based on multi-classification fuzzy correlation vector machine
CN107527064A (en) A kind of application of manifold learning in fault diagnosis data extraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161123

WD01 Invention patent application deemed withdrawn after publication