CN106156789A - Effective feature sample identification technique for enhancing classifier generalization performance - Google Patents

Effective feature sample identification technique for enhancing classifier generalization performance

Info

Publication number
CN106156789A
CN106156789A (application CN201610303447.5A)
Authority
CN
China
Prior art keywords
sigma
cluster
feature sample
formula
clustering
Prior art date
Legal status
Pending
Application number
CN201610303447.5A
Other languages
Chinese (zh)
Inventor
焦卫东
杨志强
Current Assignee
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU
Priority to CN201610303447.5A
Publication of CN106156789A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/23: Clustering techniques


Abstract

The invention discloses an effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps: 1) establishment of the classifier generalization performance evaluation index; 2) construction of the fuzzy clustering criterion; 3) primary clustering partition of the feature sample set; 4) definition of the intra-class and inter-class average distances; 5) establishment of the initial cluster preference criterion; 6) secondary clustering identification of the feature sample set. The beneficial effects of the invention are a reasonable design, simple use, effective removal of noise points and outliers, and a high recognition rate of valid feature samples.

Description

Effective Feature Sample Identification Technique for Enhancing Classifier Generalization Performance

Technical Field

Building on signal processing theory, the present invention proposes an effective feature sample identification method based on data cluster analysis. It exploits the automatic pattern-partitioning property of cluster analysis to remove outliers and noise points from the feature data, thereby purifying the feature data and, on that basis, improving the generalization performance of the support vector machine classifier. The method lays a foundation for solving the accurate pattern recognition and classification problems that arise in mechanical fault diagnosis.

Background Art

The support vector machine (SVM), grounded in statistical learning theory, offers clear advantages in pattern classification and has been successfully applied to fault diagnosis. In theory, the optimal separating surface of an SVM is determined by the support vectors lying at the class margins, yet outliers (wild values) and noise points near the class margins are often mixed with valid samples. The resulting separating surface is therefore not optimal, which degrades the generalization performance of the classifier [1,2].

In practical fault diagnosis applications, external interference around the object under diagnosis and internal noise of the acquisition system may introduce noise into the raw observation data; abnormal or faulty sensors, abnormal fluctuations of force or motion in the system, or merely a change in operating conditions can also produce abnormal observation outliers. If these noise values or outliers in the raw data are not handled properly, they enter the feature space along with feature extraction and form noise points or outliers that clearly deviate from the overall class characteristics. In addition, many other factors negatively affect fault diagnosis, such as the redundancy of sensed observation information caused by the scattering and reverberation of vibration as it propagates through the mechanical structure, and an excessively high feature dimension chosen during feature extraction. Information redundancy makes subsequent feature extraction more difficult and further amplifies the negative effect of noise and outliers; an excessively high feature dimension makes the estimation of sample statistics more difficult, thereby reducing the generalization ability of the classifier [3]. The feature data must therefore first be purified before effective diagnosis is possible.

Summary of the Invention

The purpose of the present invention is to solve the above problems by developing an effective feature sample identification technique for enhancing classifier generalization performance.

To achieve the above purpose, the technical solution of the present invention is an effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps:

1) establishment of the classifier generalization performance evaluation index;

2) construction of the fuzzy clustering criterion;

3) primary clustering partition of the feature sample set;

4) definition of the intra-class and inter-class average distances;

5) establishment of the initial cluster preference criterion;

6) secondary clustering identification of the feature sample set.

The classifier generalization performance evaluation index is established as:

$$R(w)=R_{\mathrm{emp}}(w)+\Phi(h/l),$$

$$h\le\min\!\left(\left[r^{2}a^{2}\right],\,n\right)+1.$$

where Φ(·) is the confidence risk function, h is the VC dimension of the classification function, and l is the number of training samples. The true risk R(w) consists of two parts, the empirical risk R_emp(w) and the confidence risk Φ(·). [·] denotes taking the integer part. r is the radius of the smallest hypersphere enclosing all mapped points in the high-dimensional feature space, a bounds the norm of the separating hyperplane's weight vector, and n is the dimension of that space.
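As an illustration of this bound, the following Python sketch estimates the enclosing hypersphere radius and evaluates h ≤ min([r²a²], n) + 1. The radius approximation (largest distance from the sample mean) and the parameter names are assumptions made for this sketch, not part of the patent text.

```python
import numpy as np

def vc_dimension_bound(X, a, n_dim):
    """Upper bound h <= min([r^2 * a^2], n) + 1 on the VC dimension.

    Illustrative sketch: the enclosing-hypersphere radius r is approximated by
    the largest distance from the sample mean, and `a` is a user-supplied bound
    on the weight-vector norm; neither choice is prescribed by the patent text.
    """
    center = X.mean(axis=0)
    r = float(np.max(np.linalg.norm(X - center, axis=1)))  # approximate radius
    h = min(int(np.floor(r ** 2 * a ** 2)), n_dim) + 1     # [.] = integer part
    return r, h

# toy usage
X = np.random.default_rng(0).normal(size=(50, 4))
print(vc_dimension_bound(X, a=1.0, n_dim=X.shape[1]))
```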

The fuzzy clustering criterion is constructed as:

$$\min J_{\mathrm{FCM}}(U,V)=\sum_{k=1}^{n}\sum_{i=1}^{c}\left(u_{ik}\right)^{m}\left(d_{ik}\right)^{2}.$$

where d_ik = ||x_k - v_i|| is the distance between sample x_k and cluster center v_i, usually measured with the Euclidean metric, and m is the fuzzy weighting exponent, typically m = 2. J_FCM(U, V) is the sum of squared distances from the samples of all classes to the cluster centers, weighted by the m-th power of the membership degree of sample x_k in class i.
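A minimal Python sketch of this objective, assuming samples in the rows of X, centers in the rows of V, and a membership matrix U with one row per cluster (the naming conventions are illustrative, not taken from the patent):

```python
import numpy as np

def fcm_objective(X, V, U, m=2.0):
    """J_FCM(U, V) = sum_k sum_i (u_ik)^m * d_ik^2 with Euclidean d_ik.

    X: (n, p) feature samples, V: (c, p) cluster centers,
    U: (c, n) membership degrees whose columns sum to 1.
    """
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # (c, n) squared distances
    return float(((U ** m) * d2).sum())
```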

The primary clustering partition of the feature sample set is computed as:

$$v_{i}^{(l)}=\sum_{k=1}^{n}\left(u_{ik}^{(l)}\right)^{m}x_{k}\Big/\sum_{k=1}^{n}\left(u_{ik}^{(l)}\right)^{m},\quad i=1,\ldots,c,$$

$$u_{ik}^{(l+1)}=1\Big/\sum_{j=1}^{c}\left(\frac{d_{ik}}{d_{jk}}\right)^{\frac{2}{m-1}},\quad\forall i,\ \forall k.$$

where the number of clusters c, the fuzzy weighting exponent m, and the initial membership matrix U_0 are set in advance, with iteration counter l = 0. For a given stopping threshold ε > 0, the iteration is repeated until max{|u_ik^(l) - u_ik^(l-1)|} < ε, at which point the algorithm terminates; otherwise l = l + 1 and the algorithm continues.
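The iteration can be sketched in Python as follows; the random initialization of U_0 and the max_iter safeguard are implementation choices for this sketch, not requirements stated in the patent.

```python
import numpy as np

def fcm_partition(X, c=3, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Primary fuzzy c-means partition following the update equations above."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)                  # columns of U_0 sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)     # v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                          # guard against zero distances
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)                # u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1))
        if np.max(np.abs(U_new - U)) < eps:            # stop when max |u^(l) - u^(l-1)| < eps
            return V, U_new
        U = U_new
    return V, U
```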

The intra-class average distance and the inter-class average distance are defined as:

$$\delta_{\mathrm{inner}}=\sum_{i=1}^{n_{O}-1}\sum_{j=i+1}^{n_{O}}\left\|x_{i}-x_{j}\right\|\Big/C_{n_{O}}^{2},\qquad\delta_{\mathrm{inter}}=\sum_{i=1}^{c-1}\sum_{j=i+1}^{c}\left\|v_{i}-v_{j}\right\|\Big/C_{c}^{2}.$$

where C_{n_O}^2 is the number of pairwise combinations of the data samples in cluster {X_O}; v_i and v_j are the centers of the i-th cluster {X_i} and the j-th cluster {X_j}, respectively; and C_c^2 is the number of pairwise combinations of the c cluster indices.
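A direct Python sketch of the two definitions, assuming every cluster contains at least two samples (the function names are illustrative):

```python
import numpy as np
from itertools import combinations

def intra_class_average_distance(cluster):
    """delta_inner: average pairwise distance over the C(n_O, 2) sample pairs of one cluster."""
    pairs = list(combinations(range(len(cluster)), 2))
    return sum(np.linalg.norm(cluster[i] - cluster[j]) for i, j in pairs) / len(pairs)

def inter_class_average_distance(centers):
    """delta_inter: average pairwise distance over the C(c, 2) pairs of cluster centers."""
    pairs = list(combinations(range(len(centers)), 2))
    return sum(np.linalg.norm(centers[i] - centers[j]) for i, j in pairs) / len(pairs)
```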

The initial cluster preference criterion is established as:

$$\{X_{f}\}\Leftarrow\{X_{O}\},\ \ \text{s.t.}\ \max_{O}\left[n_{O}\,C_{n_{O}}^{2}\Big/\sum_{i=1}^{n_{O}-1}\sum_{j=i+1}^{n_{O}}\left\|x_{i}-x_{j}\right\|\right],$$

$$\{X_{n}\}\Leftarrow\{X_{O}\},\ \ \text{s.t.}\ \min_{O}\left[n_{O}\,C_{n_{O}}^{2}\Big/\sum_{i=1}^{n_{O}-1}\sum_{j=i+1}^{n_{O}}\left\|x_{i}-x_{j}\right\|\right].$$

where {X_f} is the initial valid cluster, of size n_f and center v_f, formed by the valid feature samples contained in the c clusters (usually c ≥ 3), and {X_n} is the initial invalid cluster, of size n_n and center v_n, formed mainly by noise points or outliers. Since δ_inner is the summed pairwise distance divided by C_{n_O}^2, the ratio being maximized (or minimized) equals n_O/δ_inner, so the preferred valid cluster is the one that is both largest and most compact.
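A Python sketch of the preference rule, scoring each cluster by n_O · C(n_O, 2) divided by its summed pairwise distances and taking the argmax as the valid cluster and the argmin as the invalid one. It assumes every cluster contains at least two distinct samples; the function and variable names are illustrative.

```python
import numpy as np
from itertools import combinations

def select_initial_clusters(clusters):
    """Return the indices of the initial valid cluster {X_f} and invalid cluster {X_n}.

    `clusters` is a list of (n_O, p) arrays produced by the primary partition.
    """
    def score(C):
        n_o = len(C)
        pair_sum = sum(np.linalg.norm(C[i] - C[j])
                       for i, j in combinations(range(n_o), 2))
        return n_o * (n_o * (n_o - 1) // 2) / pair_sum   # size-weighted inverse of delta_inner
    scores = [score(C) for C in clusters]
    return int(np.argmax(scores)), int(np.argmin(scores))  # ({X_f}, {X_n})
```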

The secondary clustering identification of the feature sample set is computed as:

$$\text{if}\ \left\|v_{d}-v_{f}\right\|<\delta_{\mathrm{inter}},\ \text{then}\ \{X_{d}\}\Rightarrow\{X_{f}\};$$

$$\text{else}\ \{X_{s}\}\Rightarrow\{X_{f}\}\ \text{and}\ \{X_{t}\}\Rightarrow\{X_{n}\},\ \ \text{s.t.}\ \min_{s}\left[\sum_{i=1}^{n_{s}-1}\sum_{j=i+1}^{n_{s}}\left\|x_{i}-x_{j}\right\|\cdot\left\|v_{s}-v_{f}\right\|\Big/C_{n_{s}}^{2}\right].$$

where {X_s} is a combined subset of size n_s and center v_s drawn from {X_d} that satisfies the minimization criterion. After the valid samples have been extracted, the data samples remaining in {X_d} form the subset {X_t}, which is merged into the invalid cluster {X_n} of noise points and outliers. Equation (18) then performs the secondary partition of the invalid cluster, where x_i is a data sample of the invalid cluster {X_n} formed by the main partition and X_near is the data sample of the valid cluster {X_f} closest to the center v_n of the invalid cluster {X_n}.
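The decision rule can be sketched as follows; the exhaustive subset search shown here is only one illustrative way to satisfy the minimization condition and is practical only for small clusters, so it should be read as an assumption of this sketch rather than the method prescribed by the patent.

```python
import numpy as np
from itertools import combinations

def secondary_assignment(X_d, v_f, delta_inter):
    """Secondary clustering decision for one undecided cluster {X_d}.

    If its center lies within delta_inter of the valid center v_f, the whole
    cluster is accepted into {X_f}; otherwise the subset {X_s} minimising
    (summed pairwise distances in X_s) * ||v_s - v_f|| / C(n_s, 2) is accepted
    and the remainder {X_t} is sent to the invalid cluster.
    """
    v_d = X_d.mean(axis=0)
    if np.linalg.norm(v_d - v_f) < delta_inter:
        return X_d, X_d[:0]                              # everything accepted
    best_score, best_idx = np.inf, None
    for n_s in range(2, len(X_d) + 1):
        for idx in combinations(range(len(X_d)), n_s):
            S = X_d[list(idx)]
            pair_sum = sum(np.linalg.norm(S[i] - S[j])
                           for i, j in combinations(range(n_s), 2))
            score = pair_sum * np.linalg.norm(S.mean(axis=0) - v_f) / (n_s * (n_s - 1) / 2)
            if score < best_score:
                best_score, best_idx = score, set(idx)
    accepted = np.array([i in best_idx for i in range(len(X_d))])
    return X_d[accepted], X_d[~accepted]                 # ({X_s}, {X_t})
```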

Brief Description of the Drawings

Figure 1 is a schematic flow chart of the effective feature sample identification technique for enhancing classifier generalization performance according to the present invention.

Figure 2 is a schematic diagram of the SVM classifier generalization performance evaluation principle.

Figure 3 shows the hypersphere domain description in the feature space.

Figure 4 shows the identification results for valid feature samples of the normal state.

Figure 5 shows the identification results for valid feature samples of gear tooth damage.

Figure 6 shows the identification results for valid feature samples of machine base looseness.

Detailed Description

The present invention is described in detail below with reference to the accompanying drawings. Figure 1 is a schematic flow chart of the effective feature sample identification technique for enhancing classifier generalization performance according to the present invention. A valid feature sample identification method based on the removal of noise points and outliers is used to purify and pre-process the feature samples in the feature space.

This technical solution takes the identification of valid feature samples for three gearbox modes (normal state, gear tooth damage, and machine base looseness) as an example to explain the feature sample purification pre-processing. Its basic principle is as follows: based on the structural risk minimization (SRM) principle of statistical learning theory, the generalization performance of the SVM classifier is maximized by applying two purification passes to the feature samples of the several fault-mode classes according to a hierarchical processing principle, yielding valid feature samples for classifier design. The SVM classifier generalization performance evaluation principle is shown in Figure 2, namely

$$R(w)=R_{\mathrm{emp}}(w)+\Phi(h/l),$$

$$h\le\min\!\left(\left[r^{2}a^{2}\right],\,n\right)+1.$$

where Φ(·) is the confidence risk function, h is the VC dimension of the classification function, and l is the number of training samples. The true risk R(w) consists of two parts, the empirical risk R_emp(w) and the confidence risk Φ(·). [·] denotes taking the integer part. r is the radius of the smallest hypersphere enclosing all mapped points in the high-dimensional feature space. The hypersphere domain description in the feature space is shown in Figure 3.

Example 1

Identification of valid feature samples for the normal state

The clustering criterion is established for the normal-state feature sample set and two successive clustering partitions are performed; the identification results for the valid feature samples are shown in Figure 4.

The clustering criterion is established as given by the fuzzy clustering criterion above.

The primary clustering partition of the normal-state feature sample set is performed with the iterative update formulas given above.

The intra-class and inter-class average distances for the normal state are defined as above.

The initial cluster preference criterion for the normal-state feature samples is established as above.

The secondary clustering identification of the normal-state feature sample set is computed as above.

Example 2

Identification of valid feature samples for gear tooth damage

The clustering criterion is established for the gear tooth damage feature sample set and two successive clustering partitions are performed; the identification results for the valid feature samples are shown in Figure 5.

The clustering criterion is established as given by the fuzzy clustering criterion above.

The primary clustering partition of the gear tooth damage feature sample set is performed with the iterative update formulas given above.

The intra-class and inter-class average distances for gear tooth damage are defined as above.

The initial cluster preference criterion for the gear tooth damage feature samples is established as above.

The secondary clustering identification of the gear tooth damage feature sample set is computed as above.

Example 3

Identification of valid feature samples for machine base looseness

The clustering criterion is established for the machine base looseness feature sample set and two successive clustering partitions are performed; the identification results for the valid feature samples are shown in Figure 6.

The clustering criterion is established as given by the fuzzy clustering criterion above.

The primary clustering partition of the machine base looseness feature sample set is performed with the iterative update formulas given above.

The intra-class and inter-class average distances for machine base looseness are defined as above.

The initial cluster preference criterion for the machine base looseness feature samples is established as above.

The secondary clustering identification of the machine base looseness feature sample set is computed as above.

References

[1] Du Zhe, Liu Sanyang, Qi Xiaogang. A fuzzy support vector machine with a new membership function. Journal of System Simulation, 2009, 21(7): 1901-1903.

[2] Ding Shifei, Qi Bingjuan, Tan Hongyan. A review of support vector machine theory and algorithms. Journal of University of Electronic Science and Technology of China, 2011, 40(1): 2-10.

[3] Zhang Biao. Analysis and research on feature selection algorithms in text classification. Master's thesis, University of Science and Technology of China, Hefei, 2010.

The above technical solution merely reflects a preferred embodiment of the present invention. Modifications that those skilled in the art may make to certain parts of it embody the principles of the present invention and fall within its scope of protection.

Claims (7)

1. An effective feature sample identification technique for enhancing classifier generalization performance, characterized in that the method comprises the following steps: 1) establishment of the classifier generalization performance evaluation index; 2) construction of the fuzzy clustering criterion; 3) primary clustering partition of the feature sample set; 4) definition of the intra-class and inter-class average distances; 5) establishment of the initial cluster preference criterion; 6) secondary clustering identification of the feature sample set.

2. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the classifier generalization performance evaluation index is established as

$$R(w)=R_{\mathrm{emp}}(w)+\Phi(h/l),\qquad h\le\min\!\left(\left[r^{2}a^{2}\right],\,n\right)+1,$$

where Φ(·) is the confidence risk function, h is the VC dimension of the classification function, and l is the number of training samples. The true risk R(w) consists of two parts, the empirical risk R_emp(w) and the confidence risk Φ(·); [·] denotes taking the integer part; r is the radius of the smallest hypersphere enclosing all mapped points in the high-dimensional space.

3. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the fuzzy clustering criterion is constructed as

$$\min J_{\mathrm{FCM}}(U,V)=\sum_{k=1}^{n}\sum_{i=1}^{c}\left(u_{ik}\right)^{m}\left(d_{ik}\right)^{2},$$

where d_ik = ||x_k - v_i|| is the distance between sample x_k and cluster center v_i, usually the Euclidean distance; m is the fuzzy weighting exponent, typically m = 2; and J_FCM(U, V) is the sum of squared distances from the samples of all classes to the cluster centers, weighted by the m-th power of the membership degree of sample x_k in class i.

4. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the primary clustering partition of the feature sample set is computed as

$$v_{i}^{(l)}=\sum_{k=1}^{n}\left(u_{ik}^{(l)}\right)^{m}x_{k}\Big/\sum_{k=1}^{n}\left(u_{ik}^{(l)}\right)^{m},\quad i=1,\ldots,c,\qquad u_{ik}^{(l+1)}=1\Big/\sum_{j=1}^{c}\left(\frac{d_{ik}}{d_{jk}}\right)^{\frac{2}{m-1}},\quad\forall i,\ \forall k,$$

where the number of clusters c, the fuzzy weighting exponent m, and the initial membership matrix U_0 are set in advance, with iteration counter l = 0; for a given stopping threshold ε > 0, the iteration is repeated until max{|u_ik^(l) - u_ik^(l-1)|} < ε, at which point the algorithm terminates; otherwise l = l + 1 and the algorithm continues.

5. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the intra-class and inter-class average distances are defined as

$$\delta_{\mathrm{inner}}=\sum_{i=1}^{n_{O}-1}\sum_{j=i+1}^{n_{O}}\left\|x_{i}-x_{j}\right\|\Big/C_{n_{O}}^{2},\qquad\delta_{\mathrm{inter}}=\sum_{i=1}^{c-1}\sum_{j=i+1}^{c}\left\|v_{i}-v_{j}\right\|\Big/C_{c}^{2},$$

where C_{n_O}^2 is the number of pairwise combinations of the data samples in cluster {X_O}; v_i and v_j are the centers of the i-th cluster {X_i} and the j-th cluster {X_j}, respectively; and C_c^2 is the number of pairwise combinations of the c cluster indices.

6. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the initial cluster preference criterion is established as

$$\{X_{f}\}\Leftarrow\{X_{O}\},\ \ \text{s.t.}\ \max_{O}\left[n_{O}\,C_{n_{O}}^{2}\Big/\sum_{i=1}^{n_{O}-1}\sum_{j=i+1}^{n_{O}}\left\|x_{i}-x_{j}\right\|\right];\qquad\{X_{n}\}\Leftarrow\{X_{O}\},\ \ \text{s.t.}\ \min_{O}\left[n_{O}\,C_{n_{O}}^{2}\Big/\sum_{i=1}^{n_{O}-1}\sum_{j=i+1}^{n_{O}}\left\|x_{i}-x_{j}\right\|\right],$$

where {X_f} is the initial valid cluster, of size n_f and center v_f, formed by the valid feature samples contained in the c clusters (usually c ≥ 3), and {X_n} is the initial invalid cluster, of size n_n and center v_n, formed mainly by noise points or outliers.

7. The effective feature sample identification technique for enhancing classifier generalization performance according to claim 1, characterized in that the secondary clustering identification of the feature sample set is computed as

$$\text{if}\ \left\|v_{d}-v_{f}\right\|<\delta_{\mathrm{inter}},\ \text{then}\ \{X_{d}\}\Rightarrow\{X_{f}\};\ \text{else}\ \{X_{s}\}\Rightarrow\{X_{f}\}\ \text{and}\ \{X_{t}\}\Rightarrow\{X_{n}\},\ \ \text{s.t.}\ \min_{s}\left[\sum_{i=1}^{n_{s}-1}\sum_{j=i+1}^{n_{s}}\left\|x_{i}-x_{j}\right\|\cdot\left\|v_{s}-v_{f}\right\|\Big/C_{n_{s}}^{2}\right],$$

where {X_s} is a combined subset of size n_s and center v_s drawn from {X_d} that satisfies the minimization criterion; after the valid samples have been extracted, the data samples remaining in {X_d} form the subset {X_t}, which is merged into the invalid cluster {X_n} of noise points and outliers. Equation (18) then performs the secondary partition of the invalid cluster, where x_i is a data sample of the invalid cluster {X_n} formed by the main partition and X_near is the data sample of the valid cluster {X_f} closest to the center v_n of the invalid cluster {X_n}.
Application CN201610303447.5A was filed on 2016-05-09 with a priority date of 2016-05-09 under the title "Effective feature sample identification technique for enhancing classifier generalization performance"; it was published as CN106156789A on 2016-11-23 and its legal status is Pending. Family ID: 57352810; country: CN (China).

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109239585A * 2018-09-06 2019-01-18 南京理工大学 Fault diagnosis method based on improved preferred wavelet packets


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980202A (en) * 2010-11-04 2011-02-23 西安电子科技大学 Semi-supervised classification methods for imbalanced data
CN104794482A (en) * 2015-03-24 2015-07-22 江南大学 Inter-class maximization clustering algorithm based on improved kernel fuzzy C mean value
CN105447520A (en) * 2015-11-23 2016-03-30 盐城工学院 Sample classification method based on weighted PTSVM (projection twin support vector machine)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
焦卫东 et al.: "An overall improved fault diagnosis method based on support vector machines", 《仪器仪表学报》 (Chinese Journal of Scientific Instrument) *


Similar Documents

Publication Publication Date Title
Zhang et al. A new interpretable learning method for fault diagnosis of rolling bearings
Zhou et al. Dynamic graph-based feature learning with few edges considering noisy samples for rotating machinery fault diagnosis
Shao et al. A deep learning approach for fault diagnosis of induction motors in manufacturing
Oh et al. Scalable and unsupervised feature engineering using vibration-imaging and deep learning for rotor system diagnosis
CN110132598B (en) A Fault Noise Diagnosis Algorithm for Rolling Bearings of Rotating Equipment
CN109582003B (en) Bearing fault diagnosis method based on pseudo label semi-supervised kernel local Fisher discriminant analysis
CN106682688B (en) Bearing fault diagnosis method based on particle swarm optimization with stacked noise reduction self-encoding network
Wang et al. Attention-aware temporal–spatial graph neural network with multi-sensor information fusion for fault diagnosis
CN107316057B (en) Nuclear power plant fault diagnosis method
Gong et al. Implementation of machine learning for fault classification on vehicle power transmission system
CN107677472A (en) The bearing state noise diagnostics algorithm that network-oriented Variable Selection merges with Characteristic Entropy
CN105134619B (en) A kind of fault diagnosis based on wavelet energy, manifold dimension-reducing and dynamic time warping and health evaluating method
CN104502103A (en) Bearing fault diagnosis method based on fuzzy support vector machine
Lu et al. Feature extraction using adaptive multiwavelets and synthetic detection index for rotor fault diagnosis of rotating machinery
Li et al. Maximum margin Riemannian manifold-based hyperdisk for fault diagnosis of roller bearing with multi-channel fusion covariance matrix
Xu et al. A method combining refined composite multiscale fuzzy entropy with PSO-SVM for roller bearing fault diagnosis
CN105678343A (en) Adaptive-weighted-group-sparse-representation-based diagnosis method for noise abnormity of hydroelectric generating set
Zhang et al. A bearing fault diagnosis method based on multiscale dispersion entropy and GG clustering
Xu et al. Automatic roller bearings fault diagnosis using DSAE in deep learning and CFS algorithm
CN116451022A (en) Adaptive bearing fault diagnosis method based on depth discrimination reactance domain
CN111611867A (en) Intelligent Fault Diagnosis Method of Rolling Bearing Based on Multi-class Fuzzy Correlation Vector Machine
CN117056849A (en) Unsupervised method and system for monitoring abnormal state of complex mechanical equipment
Lu et al. A zero-shot intelligent fault diagnosis system based on EEMD
CN111428772B (en) Photovoltaic system depth anomaly detection method based on k-nearest neighbor adaptive voting
CN106156789A (en) Towards the validity feature sample identification techniques strengthening grader popularization performance

Legal Events

C06: Publication
PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 2016-11-23)