CN107784325A

CN107784325A - Spiral fault diagnosis model based on the fusion of data-driven increment

Info

Publication number: CN107784325A
Application number: CN201710983351.2A
Authority: CN
Inventors: 刘晶; 安雅程; 季海鹏; 刘彦凯
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2017-10-20
Filing date: 2017-10-20
Publication date: 2018-03-09
Anticipated expiration: 2037-10-20
Also published as: CN107784325B

Abstract

The invention discloses a kind of spiral fault diagnosis model based on the fusion of data-driven increment, comprise the steps：Gathered data point is simultaneously divided into normal sample and fault sample；Random sampling obtains the non-equilibrium sample of different slopes and is divided into four groups；The sample of relative equilibrium is obtained using the method for resampling based on division neighbour；DAE extraction fault signatures are inputted, increment merges feature mode when newly-increased data be present, then inputs SVM and carries out fault diagnosis；Selection had not only possessed information content but also typical example, and carries out Dynamic Comprehensive Evaluation to validity feature and example；Effective example set is merged with newly-increased data, re-starts incremental learning process.The model obtains being beneficial to the equilibrium criterion for accurately identifying fault type in the case where taking into full account sample noise and distribution characteristics, dynamic evaluation is carried out by selected characteristic and example and effective information is retained and handed on by increment merging, it is achieved thereby that the incremental learning and classification diagnosis of equipment fault rapidly and efficiently.

Description

Spiral fault diagnosis model based on the fusion of data-driven increment

Technical field

The present invention relates to bearing apparatus fault diagnosis technology field, more particularly to it is a kind of based on the fusion of data-driven increment Spiral fault diagnosis model.

Background technology

It is relatively serious that smart machine is applied to key areas, its failure effect such as industry, aviation, national defence more.In recent years, intelligence It can manufacture and be developing progressively as the core content of industry 4.0 as an important research field.Meanwhile with industrial Internet of Things Development, large scale equipment continue to bring out out magnanimity service data, are quickly and efficiently analyzed and carried by service data in process of production Fault message is taken, and efficient diagnosis and prediction are carried out to fault type, turns into the study hotspot in intelligence manufacture field.

With information technology and the depth integration of intellectual technology, the service data in equipment production process, which obtains, to be become more Easily so that the machine learning method based on data-driven is widely used in fault diagnosis field, including neutral net, branch Vector machine, Bayes and decision tree etc. are held, all achieves preferable achievement in research.[such as Yuan Pu is improved excellent based on heredity for article Change electric network failure diagnosis [J] Power System and its Automation journals of BP neural network, 2017,29 (1):118-122.] propose A kind of improved genetic optimization BP neural network that is based on carries out fault diagnosis to power network；Article [Lee win etc. be based on fuzzy clustering and Transformer fault diagnosis [J] electrotechnics journals of complete binary tree SVMs, 201631 (4):64-70.] propose base Obtained in the fault diagnosis model of fuzzy clustering and complete binary tree SVMs in diagnosing fault of power transformer higher Accuracy rate of diagnosis；[such as Xiao Hongjun is controlled article towards dynamic variation Bayes's hybrid cytokine fault diagnosis [J] of sewage disposal Theoretical and application, 201633 (11):1519-1526.] it is a kind of based on change decibel with reference to being proposed on the basis of dynamic data characteristic The dynamic fault diagnosis method of this hybrid cytokine of leaf diagnoses to the sensor fault in biochemical processing procedure of sewage；Article [multi-connected machine refrigerant charge fault diagnosis [J] Central China University of Science and Technology journal (natural sciences of the such as Wang Jiangyu based on PCA-DT Version), 201644 (7):1-4.] it is used for the refrigerant charge of multi-connected machine based on principal component analysis-decision tree (PCA-DT) algorithm Fault detection and diagnosis.But in face of the newly-increased status data of magnanimity, traditional machine learning method need by newly-increased data with Legacy data merges the whole model of re -training, adds computation complexity and memory space.The more importantly state of equipment It can change over time and change with attribute, the potential information of newly-increased data has to equipment current state and following operation trend Prior value.Therefore, how effectively using the existing model of newly-increased data correction that exponentially type increases in real time, turn into current Urgent problem.Incremental learning is i.e. while most of learnt to knowledge is preserved, constantly from new data middle school Practise new knowledge.At present, existing a variety of incremental learning models, article [gearbox of wind turbine of the such as Tang Mingzhu based on ICSVM Fault diagnosis [J] computer engineering and application, 2016,52 (14):232-236.] it is to be based on increment cost sensitivity supporting vector The gearbox of wind turbine method for diagnosing faults of machine, establish the cost-sensitive supporting vector that target is turned to misclassification Least-cost Machine fault diagnosis model；Article [application [J] vibration, survey of the such as Yin Gang adaptive sets into extreme learning machine in fault diagnosis Examination and diagnosis, 2013,33 (5):897-901.] propose that adaptive set carries out fault diagnosis, on-line tuning collection into extreme learning machine The voting stake of sub-network and network inputs weights and node bias into output；[such as Hu Yinhui is extensive for article Method for diagnosing faults [J] computer applications of InfiniBand network self studies, 2015,35 (11):3092-3096.] drawing On the basis of having entered Feature Selection strategy and incremental learning strategy, it is proposed that it is a kind of towards extensive IB networks incremental learning therefore Hinder diagnostic method IL_Bayes, diagnosed under 2 real network environment of the Milky Way.These incremental learning models are in existing document In all serve mitigate amount of calculation, improve accuracy rate, effectively save time cost effect.But in fault diagnosis field, increment The data flow of generation has the characteristics that magnanimity, non-stationary, strong noise, has High relevancy between fault increment, and with original number According to strong causality, if not being acted upon that diagnosis effect will be had a strong impact on.The above factor hinders conventional delta Learning method is further applied fault diagnosis field.

The content of the invention

It is an object of the invention to provide a kind of spiral fault diagnosis model based on the fusion of data-driven increment, to increase Measure data to promote as main line, handled for the data block that increment in each spiralization cycle is collected using non-equilibrium data, be special Sign extraction and classification, the selection of effective example, feature and example dynamic evaluation Four processes are handled successively, and will currently be chosen Evaluating characteristic and example be transferred in next spiralization cycle and merged, to retain the previously effective information that had learnt, from And form the fault diagnosis model of sped structure.The model is based on compared with conventional delta learning method by adding to carry The resampling technique of division neighbour causes fault data to reach relative equilibrium, will not only possess information content but also typical example Selection is transmitted to retain whole effective informations to greatest extent, and forgets weight to the feature of extraction and the example of selection using dynamic Overall merit is carried out, the brand-new spiral fault diagnosis model with increment information Dynamic link library is formd, efficiently solves and set The problem of characteristic such as standby fault data magnanimity, non-equilibrium, strong noise, unstable, strong causalnexus is brought, realizes fault data Non-equilibrium incremental learning and validity feature and example dynamic evaluation and transmission, reached precise and high efficiency identification equipment operation The effect of increment fault message in data.

The technical solution used in the present invention is：

A kind of spiral fault diagnosis model based on the fusion of data-driven increment, comprises the steps：

Step 1：Spark erosion technique is used deep groove ball bearing respectively on bearing to inner ring, outer ring and rolling element The Single Point of Faliure of 3 fault levels is arranged, selects motor drive terminal vibrating sensor collection normal condition (N), inner ring failure (IRF), the vibration signal under 4 kinds of states of outer ring failure (ORF) and rolling element failure (BF), obtains multiple data points, use is small Ripple bag decomposes extraction vibration signal characteristics, and it is divided into normal sample and fault sample by data generic；

Step 2：Five samples of different slopes are randomly selected to normal sample in step 1 and fault sample, and often The data point of normal sample is more than the data point in fault sample in individual sample, and described five samples are four non-equilibrium training Sample and a test sample；

Step 3：Each training sample equivalent described in step 2 is divided into four groups, one of which is as training mould Type, first add in model and be trained, after obtaining new model, then by second group of addition, after obtaining new model, add the Three groups, after obtaining new model, the 4th group is added, to verify the incremental learning ability of model；

Step 4：Use the method for resampling based on division neighbour to be respectively divided into the non-equilibrium sample in step 2 to make an uproar Sound point, boundary point and point of safes simultaneously carry out most class lack samplings and minority class over-sampling, and merging obtains the data sample of relative equilibrium This；

Step 5：Event is carried out using the equilibrium criterion sample of gained in step 4 as the input data of denoising autocoder Hindering feature extraction, being calculated when newly-increased data be present by Pattern similarity, Land use models increment carries out feature with merging method Then the increment of pattern is inputted and bearing fault pattern classification diagnosis is carried out in SVM classifier, and finely tuned using BP algorithm with merging Relevant parameter in whole disaggregated model network, finally gives spiral fault diagnosis model.

Step 6：When there is newly-increased data increment to add model, using effective example system of selection, from the data of step 1 To not only possess information content in sample but also typical example is selected and, form effective example set；

Step 7：Dynamic comprehensive is carried out to the effective example selected in the validity feature and step 6 that extract in step 5 to comment Valency, the dynamic forgetting weight that importance changes over time can be represented by giving it respectively, obtained dynamic weighting validity feature and moved State weights effective example set；

Step 8：The effective example set of dynamic weighting in step 7 is merged with newly-increased data sample, re-starts increment Habit process, so as to realize the incremental learning of bearing non-balance failure data and reliability classification diagnosis.

Based on above-mentioned steps, the model effectively realizes the non-equilibrium processing of equipment fault data, that is, utilizes the base proposed More several classes of samples and minority class sample are closed according to Reverse K Nearest Neighbors and k nearest neighbor respectively in the method for resampling of division neighbour Reason division, and take different lack sampling and over-sampling operations to take into full account noise data and sample point to obtain equilibrium criterion The situation of cloth feature.Also achieve increment extraction and the reliability classification of equipment fault feature simultaneously, i.e., it is new by DAE model extractions Increase feature mode, merged based on similarity with original model selection increment in input grader and adjust model parameter.Then carry Go out to select effective case combination of most worthy into next newly-increased data block, forget what weight overall merit was chosen according to dynamic Feature and example, the effective information previously learnt can either be preserved, the information for making to include in newly-increased data again is answered Some attention.

As preferable, the method for resampling based on division neighbour in step 4 comprises the steps：

(1) more several classes of lack sampling methods based on Reverse K Nearest Neighbors

For more several classes of samples in non-equilibrium data, by calculating the reverse of each more several classes of samples in non-equilibrium data K nearest neighbor, noise spot, boundary point and the class of point of safes three are divided into, take different operation strategies to carry out more several classes of owe respectively Sampling.Specific division rule is as follows：

(1) noise spot：That is the Reverse K Nearest Neighbors of certain more several classes of sample point are minority class sample, then it represents that the point is in few Among several classes of samples surround, belong to noise data；

(2) boundary point：I.e. in the Reverse K Nearest Neighbors of certain more several classes of sample point there is minority class sample in existing more several classes of samples again This, and the sample point data amount check which kind of no matter belongs to is more, all shows that the point is in the decision boundary of division data category Near；

(3) point of safes：That is the Reverse K Nearest Neighbors of certain more several classes of sample are more several classes of samples, then it represents that the point is in majority Class sample interior and remote decision boundary, influence smaller on classification；

It is as follows for the corresponding sampling policy of different pieces of information point：

(1) for noise spot, to avoid it from producing negative influence to subsequent classification effect, deletion strategy is taken；

(2) for boundary point, in the case where lack sampling number allows, all boundary points are retained with aid decision.When deficient When number of samples is counted out less than border, more several classes of numbers are carried out more than the data point of minority class in prioritizing selection Reverse K Nearest Neighbors Retain, because wherein containing more security information, can effectively prevent the influence of potential noise data；

(3) for point of safes, lack sampling operation is carried out, retains the data point of relatively close decision boundary, i.e., according to safety Point and the distance of minority class center of a sample arrange from big to small, preferentially delete distant point of safes.

(2) the minority class oversampler method based on k nearest neighbor：

For the minority class sample in non-equilibrium data, noise spot, boundary point and safety are divided into by calculating its k nearest neighbor Three classes of point, and boundary point is further subdivided into dangerous spot and half point of safes, take different operation strategies to be synthesized respectively Minority class over-sampling.Specific division rule is as follows：

(1) noise spot：That is the k nearest neighbor of certain minority class sample point is more several classes of samples, then it represents that the point is in more several classes of Among sample surrounds, belong to noise data；

(2) dangerous spot：I.e. in the k nearest neighbor of certain minority class sample point there is minority class sample in existing more several classes of samples again, and More several classes of number of samples are more than minority class sample, then show the point be in division data category decision boundary nearby and closer to It is more several classes of；

(3) half point of safes：I.e. in the k nearest neighbor of certain minority class sample point there is minority class sample in existing more several classes of samples again, And minority class number of samples is more than or equal to more several classes of samples, then show the point be in division data category decision boundary nearby and Closer to minority class；

(4) point of safes：That is the k nearest neighbor of certain minority class sample is minority class sample, then it represents that the point is in minority class sample This inside and remote decision boundary, influence smaller on classification.

Synthesis minority class over-sampling is carried out using Different Strategies for different pieces of information point, SMOTE passes through near with it in sample Stochastic linear interpolation selects a little to increase sample size as the mode of new samples on adjacent line.To each minority class sample X, its k arest neighbors is searched for, if sampling multiplying power is n, then n sample is randomly choosed in its k nearest samples, is designated as y₁, y₂..., y_n, in minority class sample x and y_i(i=1,2 ..., n) between stochastic linear interpolation as the following formula, construct new minority class Sample r_i：

r_i=x+rand (0,1) * (y_i-x) (4)

Wherein rand (0,1) represents a random number in (0,1) section.Then adopted according to the corresponding of different types of data point Sample strategy is as follows：

(2) for dangerous spot, to prevent more several classes of samples near minority class from influenceing generation during SMOTE over-samplings New samples, reduce more several classes of negative influences to categorised decision, take and first delete more several classes of samples therein, then synthesized The operation of minority class over-sampling；

(3) for half point of safes, because it is near decision boundary, and closer to more several classes of, illustrate that it more has weight Act on, therefore select more several classes of neighbouring samples to utilize SMOTE over-samplings, until reaching defined amount；

(4) for point of safes, over-sampling is directly carried out using SMOTE.

As preferable, to the number of samples of each classified types in the method for resampling based on division neighbour Provided, the quantity N of more several classes of lack sampling reductions should be made_dThe quantity N increased equal to minority class over-sampling_i, i.e.,

Wherein, N_majFor the quantity of more several classes of samples, N_minFor the quantity of minority class sample.

For every a kind of data point of more several classes of sample lack samplings, first to N therein_{n_maj}Individual noise spot is deleted, Then lack sampling number N_underIt should be N_d-N_{n_maj}.As point of safes number N_{s_maj}≥N_underWhen, deleted from more several classes of sample point of safes Except the N farthest apart from minority class center_underIndividual data point；As point of safes number N_{s_maj}＜ N_underWhen, deleting whole point of safes Minority class number in Reverse K Nearest Neighbors is selected to be more than more several classes of data points afterwards, preferential deletion is wherein nearest apart from minority class center Data point, until total number of deleting reaches N_under。

For every a kind of data point of minority class sample over-sampling, noise spot therein is deleted first, then summed up Into new samples number N_overIt should be noise number N in minority class_{n_min}+N_i.It is suitably few to the point of safes that is relatively distant from decision boundary Amount synthesis, and synthesis number is suitably increased to the dangerous spot near decision boundary and half point of safes.Newly closed in each division classification Into sample data is determined by average more several classes of accountings in every class data point k nearest neighbor, calculating dangerous spot, half point of safes respectively first K nearest neighbor in more several classes of average accounting R_dan、R_half

Wherein, N_{dan_maj}、N_{half_maj}, be respectively each dangerous spot, numbers more several classes of in half point of safes k nearest neighbor, K is rule Fixed k nearest neighbor number, N_dan、N_halfDangerous spot, the number of half point of safes respectively in minority class.For point of safes, R is set_safe For a relatively small value

Wherein, N_safeFor the number of point of safes in minority class, and increase R with its number_safeIt is gradually reduced.Then consider sample Distribution constructs the new samples quantity N that each type of minority class sample needs to synthesize in proportion_{over_dan}、N_{over_half}、N_{over_safe}For

Comprise the steps as preferable, described denoising autocoder fault signature extraction：

To sample x according to q_DDistribution adds random noise, it is become noisy sampleI.e.

In formula, q_DNoise is hidden at random for binomial.Then by optimizing following object function J_DAEComplete DAE training.

In formula, L (x, z) is input sample x and exports the reconstructed error between y, f_θ() is volume of the input layer to hidden layer Code function, θ be coding network parameter sets, g_θ′() is decoding functions of the hidden layer to output layer, and θ ' is the ginseng of decoding network Manifold is closed.

Calculate and comprise the steps as preferable, described Pattern similarity：

The KL divergence computation schema similarity Sim (P, Q) of object for being difficult to differentiate using can effectively distinguish geometric distance

D in formula_kl(P | | Q) is the KL divergences of two different faults feature modes of P and Q

Wherein, P (i) and i-th of value in Q (i) representative feature patterns P and Q, the otherness between P and Q is smaller, KL divergences Value it is smaller, then Similarity value is bigger.

Because KL divergences calculate its different degree based on the relative entropy between distribution, rather than by distance metric, therefore The object that geometric distance is difficult to differentiate can be effectively distinguished, better than other method for measuring similarity based on distance, can accurately be weighed Measure the similarity degree between fault mode.

It is with combination principle as preferable, described pattern increment：

Newly-increased feature pattern and the KL similarities of original feature mode are calculated, chooses maximum Sim (P, Q)_maxAs the spy The Pattern similarity value of sign, and judge newly-increased according to following principle or merge this feature pattern：

Being represented using α makes the minimum similarity degree threshold value that similarity is significant between two contrast characteristics, its value be α= minSim(F_i, F_j), represent in existing feature mode feature F two-by-two_iWith F_jBetween similarity minimum value.Use β representative features It is general it is similar with it is highly similar between threshold limit value, its value is β=maxSim (F_i, F_j), represent two in existing feature mode Two feature F_iWith F_jBetween similarity maximum.It can be seen that similarity threshold α ＜ β, and with feature mode increment with merging, threshold Value α and β dynamic changes.

(1) if β ＜ Sim (P, Q)_max, then it represents that newly-increased feature is similar to the feature height in original pattern, and it can be made Merged with the mediation nonlinear weight method of average；

(2) if α ＜ Sim (P, Q)_max＜ β, then it represents that newly-increased feature and existing feature are different, it is impossible to use and existing spy The mode that sign merges replaces newly-increased feature, so this feature is incrementally added into original pattern；

(3) if Sim (P, Q)_max＜ α, then it represents that the similarity of newly-increased feature and existing feature is less than minimum threshold, represents The newly-increased feature is insignificant noise jamming value, and processing is given up to it.

It is as preferable, described effective example selection：

By information content and it is representative whether two dimensions valuable to example makes measurements, select both to have and abundant believe Effective example that breath amount and can enough represents most of data is transferred to next increment process, ensures fault correlation and data cause and effect The continuity of property.

For information content, measured using the influence degree to classifying quality.Take out an example in order every time And calculate the error E (y obtained when training grader using remainder data_r|x_r, δ), then it is placed back into and takes out next example, by This selects k-th maximum of data of error in classification of sening as an envoy to

E(y_k|x_k, δ) and=max { E (y_r|x_r, δ) }

Wherein, δ be classifier training model in parameter sets, y_rAnd y_kTo input x_rAnd x_kWhen output result.For Error function E (the y of remainder data_r|x_r, δ), calculated using false segmentation rate.Therefore, the information content for defining any example x is

Wherein, n is data sample size, and n-1 is to remove the data sample size after example x, n_eFor the number of classification error According to number of samples.Selection makes information content e_δ(x) example more than threshold value Δ e is put into example set I as the example for most having information content, Δ e values are false segmentation rate when being classified using all data samples.

For representativeness, data sample is clustered using k-means algorithms to obtain each cluster centre, selects each cluster Immediate vicinity is put into example set R, Δ d values away from example of its Euclidean distance less than threshold value Δ d as most representative example The 1/2 of distance between two cluster centres closest in each cluster centre.

To select while having information content and representational example, effective example set U=I ∩ R are obtained.

It is as preferable, described feature and example dynamic evaluation：

(1) feature dynamic forgets weighting

According to newly-increased pattern increment and combination principle, its dynamic backoff weight is calculated, to weigh this feature pattern at any time Between change it is important sexually revise degree, by weight compensate in the form of have the function that strengthen effective model and reduce failure mode. Because the similarity of newly-increased feature and existing feature can reflect the newly-increased pattern to the important of "current" model to a certain extent Property, so using following formula normalized to Pattern similarity value：

Wherein, minSim (P, Q) represents the minimum value in all Pattern similarities, and maxSim (P, Q) represents all patterns Maximum in similarity.

To principle 1) in take the similar feature mode of the height of union operation, raise its weight with strengthen and remember effectively Pattern.Its changeable weight computational methods is as follows：

W_i+1=W_i+Sim(P,Q)_norm

Wherein, i+1 time original feature P is merged the weight that is assigned during operation for ith merge weight with it is original Feature P is added with newly-increased feature Q similarity, i=0, W when merging for the first time₀For initial weight, it is arranged in original feature mode Maximum similarity maxSim (P, Q) between feature two-by-two.

To principle 2) in take newly-increased operation to add the feature of original pattern, assign initial weight for it, computational methods are such as Under：

W₀=Sim (P, Q)_norm

For principle 3) in the newly-increased pattern given up do not consider, but exist in original pattern similar to all newly-increased features Degree is respectively less than the situation of threshold alpha, and its weight is slowly reduced using following formula, until when being less than threshold alpha, shows that this feature has turned into nothing Feature is imitated, it need to be deleted from present mode and be allowed to thoroughly forget.

W_i+1=W_i- Sim (P, Q)_norm

Wherein, weight and its phase when the weight for the feature P that i+1 time does not occur in newly-increased pattern does not occur for ith Made the difference like degree, i=0, W when feature P does not occur for the first time₀For initial weight, feature two-by-two is likewise provided as in original feature mode Between maximum similarity maxSim (P, Q).

(2) example dynamic forgets weighting

It is considered as information content e_δ(x) example Assessment of Important weight is used as, information content is calculated by error in classification, typically For a relatively small numerical value, therefore weights of importance W_importIt is calculated as

W_import=1+e_δ(x)

Due to equipment state with the time without changing of stopping pregnancy, the example more early collected there may be what is gradually failed Change procedure, and more important informations may be included with larger in newly-increased example, therefore given for multiple selected example Give the forgetting weight W of one dynamic attenuation_forget。

Wherein, m is example selected number in incremental process.Example dynamic forgetting weight W, which can be obtained, is

Finally the effective example set of dynamic weighting is merged with newly-increased data sample, re-starts the incremental learning in next cycle Process, complete the spiral course of the fault diagnosis model based on the fusion of data-driven increment.

The present invention's has the advantages that：

The present invention proposes the sped structure model based on the fusion of data-driven increment, and non-equilibrium number is used to incremental data Handled according to four processing, feature extraction and classification, the selection of effective example, feature and example dynamic evaluation complete procedures.Should Model obtains being beneficial to the equilibrium criterion for accurately identifying fault type in the case where taking into full account sample noise and distribution characteristics, leads to Cross selected characteristic and example enters Mobile state forgetting evaluation and effective information is retained and handed on by increment merging, it is achieved thereby that The incremental learning and dynamic evaluation of equipment fault rapidly and efficiently.By analysis of experiments, increasing based on data-driven for proposition is demonstrated The validity of the sped structure model of fusion is measured, bearing failure diagnosis efficiency is averagely reached 98.24%, compared to without non-equilibrium The method of data handling procedure averagely improves 7.59%, is averagely improved compared to other shallow-layers and without increment depth learning method 3.80%, incremental learning and the reliability classification diagnosis of bearing non-balance failure data can be realized.

Brief description of the drawings

The present invention is further detailed explanation with reference to the accompanying drawings and detailed description.

Fig. 1 is sped structure model treatment procedure chart of the present invention based on the fusion of data-driven increment

Fig. 2 is the spiral incremental learning procedure chart of the present invention；

Fig. 3 is feature of present invention extraction and disaggregated model network structure；

Fig. 4 is sped structure model flow figure of the present invention based on the fusion of data-driven increment；

Fig. 5 is the K values and classification performance G-mean value graphs of a relation of four training sets

Fig. 6 is the hiding number of plies and reconstructed error graph of a relation of four training sets；

Fig. 7 is concealed nodes number and reconstructed error graph of a relation；

Fig. 8 is the slope and classification performance G-mean value graphs of a relation of four training sets；

Fig. 9 is the more several classes of and minority class accuracy rate of diagnosis comparison diagram of four training sets；

Figure 10 is the 30 experiment training error rates and test error rate comparison diagram of four training sets；

Figure 11 is the training time comparison diagram that five kinds of models diagnose to four training set increments.

Embodiment

Below in conjunction with the accompanying drawing in the embodiment of the present invention, the technical scheme in the embodiment of the present invention is carried out clear, complete Site preparation describes, it is clear that described embodiment is only part of the embodiment of the present invention, rather than whole embodiments.It is based on Embodiment in the present invention, those of ordinary skill in the art are obtained every other under the premise of creative work is not made Embodiment, belong to the scope of protection of the invention.

A kind of sped structure model based on the fusion of data-driven increment, comprises the steps：

Step 1：Deep groove ball bearing is used in edm process respectively on bearing to inner ring, outer ring and rolling Body arranges the Single Point of Faliure of 3 fault levels, selects motor drive terminal vibrating sensor collection normal condition (N), inner ring event Hinder the vibration signal under 4 kinds of (IRF), outer ring failure (ORF) and rolling element failure (BF) states, sample frequency 12kHz, amount to 1,341,856 data point, to described 1,341,856 data point, vibration signal characteristics are extracted using WAVELET PACKET DECOMPOSITION, will It is divided into normal sample and fault sample by data generic；

Step 4：Use the method for resampling based on division neighbour respectively will be non-equilibrium to the non-equilibrium sample in step 2 Sample is divided into noise spot, boundary point and point of safes and carries out more several classes of lack samplings and minority class over-sampling with Different Strategies, closes And obtain the data sample of relative equilibrium；

Wherein,

More several classes of lack sampling methods are：For more several classes of samples in non-equilibrium data, by calculating in non-equilibrium data The Reverse K Nearest Neighbors of each more several classes of samples, are divided into noise spot, boundary point and the class of point of safes three, take respectively different Operation strategy carries out more several classes of lack samplings, and specific division rule is as follows：

(3) point of safes：That is the Reverse K Nearest Neighbors of certain more several classes of sample are more several classes of samples, then it represents that the point is in majority Class sample interior and remote decision boundary, influence smaller on classification.

Minority class oversampler method is：For the minority class sample in non-equilibrium data, it is divided into by calculating its k nearest neighbor Noise spot, boundary point and the class of point of safes three, and boundary point is further subdivided into dangerous spot and half point of safes, difference is taken respectively Operation strategy carry out synthesizing minority class over-sampling, specific division rule is as follows：

r_i=x+rand (0,1) * (y_i-x) (4)

(4) for point of safes, over-sampling is directly carried out using SMOTE.

In addition, the number of samples algorithm for each classified types in the method for resampling based on division neighbour is, The quantity N of more several classes of lack sampling reductions should be made_dThe quantity N increased equal to minority class over-sampling_i, i.e.,

Step 5：It is special that failure is carried out using the equilibrium criterion sample for adding random noise as the input of denoising autocoder Sign extraction, calculated when newly-increased data be present by Pattern similarity, Land use models increment carries out feature mode with combination principle Increment with merging, then input and bearing fault pattern classification diagnosis carried out in SVM classifier, and whole using BP algorithm fine setting Relevant parameter in disaggregated model network；

Wherein, denoising autocoder fault signature extracting method is：

Wherein, Pattern similarity is calculated as：

In addition, pattern increment is with combination principle：

If 1) β ＜ Sim (P, Q)_max, then it represents that newly-increased feature is similar to the feature height in original pattern, can be used for The nonlinear weight method of average that reconciles merges；

If 2) α ＜ Sim (P, Q)_max＜ β, then it represents that newly-increased feature and existing feature are different, it is impossible to use and existing spy The mode that sign merges replaces newly-increased feature, so this feature is incrementally added into original pattern；

If 3) Sim (P, Q)_max＜ α, then it represents that the similarity of newly-increased feature and existing feature is less than minimum threshold, and representing should Newly-increased feature is insignificant noise jamming value, and processing is given up to it.

Step 6：When there is newly-increased data increment to add model, using effective example system of selection, from legacy data sample It is middle will not only possess information content but also typical example select come, form effective example set；

E(y_k|x_k, δ) and=max { E (y_r|x_r, δ) }

For representativeness, data sample is clustered using k-means algorithms to obtain each cluster centre, selects each cluster Immediate vicinity is put into example set R, Δ d values away from example of its Euclidean distance less than threshold value Δ d as most representative example The 1/2 of distance between two cluster centres closest in each cluster centre.To select while having information content and representative The example of property, obtains effective example set U=I ∩ R.

Step 7：Effective example of validity feature and selection to extraction carries out Dynamic Comprehensive Evaluation, gives its energy respectively Enough represent the dynamic that importance changes over time and forget weight；

1) feature dynamic forgets weighting

To pattern increment and combination principle 1) in take the similar feature mode of the height of union operation, raise its weight with Strengthen and remember effective model.Its changeable weight computational methods is as follows：

W_i+1=W_i+ Sim (P, Q)_norm

To pattern increment and combination principle 2) in take newly-increased operation to add the feature of original pattern, for the initial power of its imparting Value, computational methods are as follows：

W₀=Sim (P, Q)_norm

For pattern increment and combination principle 3) in the newly-increased pattern given up do not consider, but exist in original pattern and institute There is the situation that newly-increased feature similarity is respectively less than threshold alpha, its weight is slowly reduced using following formula, until when being less than threshold alpha, show This feature has turned into invalid feature, it need to be deleted from present mode and be allowed to thoroughly forget.

W_i+1=W_i- Sim (P, Q)_norm

2) example dynamic forgets weighting

W_import=1+e_δ(x)

Due to equipment state with the time without changing of stopping pregnancy, the example more early collected there may be what is gradually failed Change procedure, and more important informations may be included with larger in newly-increased example, therefore given for multiple selected example Give the forgetting weight W of one dynamic attenuation_forget

Step 8：The effective example set of dynamic weighting is merged with newly-increased data sample, re-starts incremental learning process, from And realize incremental learning and the reliability classification diagnosis of bearing non-balance failure data.

1st, data describe

Using spark erosion technique respectively bearing designation be 6205-2RS JEM SKF deep groove ball bearings on to inner ring, Outer ring and rolling element arrange the Single Point of Faliure of 3 fault levels, select the normal shape of motor drive terminal vibrating sensor collection Vibration signal under this 4 kinds of states of state (N), inner ring failure (IRF), outer ring failure (ORF) and rolling element failure (BF), sampling frequency Rate is 12kHz, altogether 1,341,856 data points, and uses 1,341,856 data point the energy of each frequency range of WAVELET PACKET DECOMPOSITION Value, it is different classes of to distinguish to extract suitable parameter attribute.Analyze data sample is understood, between normal data and fault data Energy imbalance be present, normal data is more several classes of samples, and remaining fault data is minority class sample.For experiment, set forth herein mould The non-equilibrium data disposal ability of type, random sampling 80,40,20,10 failure samples merge shape with 100 normal samples respectively Into 4 different training samples, 20 failure samples and 50 normal sample composition test samples, wherein each sample includes 2048 Individual data point.For the incremental learning ability of test model, above-mentioned each training sample equivalent is divided into four groups, one of which is used In sped structure model of the training based on the fusion of data-driven increment, remaining three components are increased added to existing model three times Amount study.Specific bearing state data sample description is as shown in table 1.Emulation experiment is in 764 system Intel- of Windows Completed under I5CPU computer R3.2.5 platforms.

The bearing state data of table 1 describe

2nd, model parameter and structure

1) the K values in non-equilibrium data processing

In the method for resampling based on division neighbour that sped structure model based on the fusion of data-driven increment uses, The number K that neighbour chooses is extremely important during non-equilibrium data is handled, and affects the specific dividing condition of every a kind of data, Therefore it is determined using the mode of experiment.By calculating different G-mean values corresponding to different value of K in resampling methods, come true Its fixed optimal value.Wherein G-mean indexs can take into account minority class and more several classes of discriminations simultaneously, effectively reflect classification Device is to the performance of non-equilibrium data height, its calculation

Wherein, TP represents the minority class sample size correctly classified, and TN represents the more several classes of sample sizes correctly classified, FP Represent that mistake is divided into more several classes of quantity of minority class, FN represents that mistake is divided into more several classes of minority class quantity.K span is scheduled Discussed between 1~10, the result of the test in four different training sets is as shown in figure 5, G-mean values show to classify more greatly Performance is better, considers G-mean values with that after amount of calculation, can obtain optimal k values and be chosen for respectively：7、8、5、5.

2) disaggregated model structure

Determination for hiding number of layers, sets other model parameters identical, for hiding number of layers from the increase of 1-10 layers Reconstructed error of the test model in four different training sets respectively, experimental result as shown in fig. 6, training set 1, training set 3, The amplitude of variation of model reconstruction error reaches steady after the hiding number of plies increases to 3 layers in training set 4, illustrates to continue to increase hidden Hide the number of plies and be not only added significantly to amount of calculation on the contrary without reduction model reconstruction error.For training set 2, although reconstruct misses Difference gradually reduces in the increase with the hiding number of plies always, but fall is little, and in view of subsequently for different tendencies The contrast of rate training set, mutually isostructural disaggregated model should be selected, therefore choose and hide number of layers for 3 layers, build sorter network Model.

Determined for the number of hidden nodes purpose, set other model parameters identical, it is 3 layers to hide number of layers, sets 3 layers of section Point number is in by testing the ratio 1 obtained:2:3 increases, Fig. 7 is shown increases model reconstruction with the nodes of first hidden layer The change of error, remaining each the number of hidden nodes can be calculated with it.As seen from Figure 7, training set 2, training set 3, training set 4 exist Starting stage model reconstruction error drastically declines with the increase of the number of hidden nodes, but after the number of hidden nodes is more than 18, mould Type reconstructed error is in slow ascendant trend.For training set 1, model reconstruction error when the number of hidden nodes is more than 54 drastically under Drop, and reach relatively steady when 90.Because the model reconstruction error of remaining training set increased from 18 to 90 in the number of hidden nodes Ascensional range is smaller in journey, and to ensure subsequently to carry out the training set of different incline ratios pair using mutually isostructural disaggregated model Than, therefore it is respectively 90,180,270 to choose hidden node number, builds sorter network model.

3rd, interpretation of result

1) non-equilibrium data processing experiment

Based on section determines above model parameter and structure, for the non-equilibrium training set of four kinds of different slopes, divide Cai Yong not be comprising spiral fault diagnosis model of the non-equilibrium processing procedure based on the fusion of data-driven increment and not comprising non- The spiral fault diagnosis model (None) merged based on data-driven increment of Balance Treatment process and use owe to adopt at random Sample (R-U), random over-sampling (R-O), the disaggregated model of synthesis minority class over-sampling (SMOTE) carry out comparative training, and used Test set distinguishes test model diagnosis effect, records the accuracy rate and G-mean values and averaged of 10 experiments, contrast knot Fruit is as shown in table 2.

The non-balance failure diagnostic result of table 2 compares

Can be obtained by table 2, herein carry based on data-driven increment fusion spiral fault diagnosis model accuracy rate with It is superior to not carry out the model of non-equilibrium data processing in terms of G-mean, and to the non-equilibrium training set of four different slopes Other only single non-equilibrium data processing methods for carrying out lack sampling or over-sampling process are superior to the classifying quality of test set. Fig. 8 be above-mentioned five kinds of models classification performance under different slopes situation of change, it can be seen that set forth herein based on Method for resampling based on division neighbour in the spiral fault diagnosis model of data-driven increment fusion is compared with other method model Compare, the higher G-mean values that can be taken in the case of different pieces of information slope, and data skew degree and be not apparent from shadow Ring the classification performance of institute's extracting method, it is seen that the model of proposition has carried out good processing to non-equilibrium data.Fig. 9 is above-mentioned five kinds Model contrasts in four different training sets to the classification accuracy of more several classes of samples and minority class sample, it can be seen that carries mould Type has not only reached very high accuracy rate, additionally it is possible to higher discrimination is obtained for a small number of failure classes.Although and other method Overall accuracy rate is higher, but minority class sample identification rate is relatively low.

2) incremental learning is tested

4 different training sets are respectively divided into four groups, one of which is used to train based on the fusion of data-driven increment Spiral fault diagnosis model, remaining three components carry out incremental learning added to existing model three times.Use respectively set forth herein Same BP, SVM, AE, DAE method of spiral fault diagnosis model based on the fusion of data-driven increment carry out incremental learning pair Than, and test sample test model diagnosis effect is used, accuracy rate, the G-mean values of 10 experiments are recorded to every group of incremental data It is as shown in table 3 with run time and averaged, the training average value and test value comparing result of four groups of incremental datas of calculating.

The fault diagnosis result of table 3 compares

It can be obtained by table 3, carry the spiral fault diagnosis model based on the fusion of data-driven increment herein in accuracy rate, G- Mean is better than other four kinds of algorithms substantially with run time aspect for the training set of four difference non-equilibrium slopes.From mould From the point of view of in terms of type diagnosis effect, put forward the accuracy rate of model herein and G-mean values are kept in the data set of different slopes Higher level, hence it is evident that better than AE the and DAE algorithms for being not added with incremental learning, and although shallow-layer BP algorithm and SVM algorithm take Obtained high training effect, but BP algorithm test result is poor and training the time required to it is longer, SVM algorithm is to different inclinations The test effect fluctuation of rate data is larger.It can be seen that the model of proposition is due to effectively have chosen effective feature mode in incremental data With data instance be weighted merging and transmit, it is contemplated that its change over time it is important sexually revise degree, hence in so that model Fault diagnosis effect has a certain degree of raising.Figure 10 is to be entered respectively using above-mentioned five kinds of models for four different training sets 30 training of row and test gained error in classification rate comparing result, it can be seen that carry IMH models equal table in training and test It is now steady, the classification performance more stable than other models can be obtained.In terms of the model running time, except the training time Because comprising the time overhead needed for construction depth model, therefore more than outside SVM algorithm, the model of proposition is in training time and survey Other algorithms are superior on the examination time, this is due to that other four kinds of algorithms need the existing mould of re -training when facing incremental data Type adds run time, therefore shows to carry the spiral fault diagnosis model based on the fusion of data-driven increment for mitigating Model amount of calculation saves time cost and serves certain effect.

Training time situation during for adding incremental data each time is analyzed, and Figure 11 is shown for 4 differences The curve map of training time each time when four groups of its point is sequentially added training pattern by training set respectively, equally using BP algorithm, SVM algorithm and AE algorithms without incremental learning, DAE algorithms and the model method that proposes are contrasted.It can be seen that carry based on data Training time of the spiral fault diagnosis model of driving increment fusion during data increment does not increase significantly, and protects Hold within 0-2 minutes, hence it is evident that less than other three kinds of algorithms in addition to SVM.Two kinds of deep layer algorithms of AE and DAE are advised due to data The increase of mould causes to sharp rise the time required to tectonic model, although and SVM algorithm carries out the training time needed for incremental learning Seldom, but training and test effect are easily influenceed to produce larger fluctuation by data set slope.

As can be seen here, herein carry based on data-driven increment fusion spiral fault diagnosis model and without incremental learning The method of process is compared in the case where considering model accuracy, G-mean indexs and the aspect of run time three with more bright Aobvious advantage.Carried fault diagnosis model can pass through non-equilibrium event of the method for resampling based on division neighbour to being widely present Barrier data are effectively treated, and the validity feature in newly-increased data and example are forgotten into weighting evaluation increment by dynamic and merged Handing on, both remained the effective information previously learnt, the information for making to include in newly-increased data again has obtained due attention, Effectively increase the performance of bearing apparatus failure modes diagnosis.

4th, conclusion

For solve magnanimity in equipment running process, non-equilibrium, strong noise, strong causalnexus equipment operating data bring Various problems, this paper presents the sped structure model merged based on data-driven increment, to incremental data using non-equilibrium Four data processing, feature extraction and classification, the selection of effective example, feature and example dynamic evaluation complete procedures are handled. The model obtains being beneficial to the equilibrium criterion for accurately identifying fault type in the case where taking into full account sample noise and distribution characteristics, Mobile state forgetting evaluation is entered by selected characteristic and example and effective information is retained and handed on by increment merging, so as to realize The incremental learning and dynamic evaluation of equipment fault rapidly and efficiently.By analysis of experiments, demonstrate proposition based on data-driven The validity of the sped structure model of increment fusion, makes bearing failure diagnosis efficiency averagely reach 98.24%, compared to without non-flat The method of weighing apparatus data handling procedure averagely improves 7.59%, is averagely improved compared to other shallow-layers and without increment depth learning method 3.80%, incremental learning and the reliability classification diagnosis of bearing non-balance failure data can be realized.

Operation principle is as follows：Carried out successively using following institute's extracting method in its spiralization cycle for each incremental data block Processing, taking into full account sample distribution spy using the lack sampling based on Reverse K Nearest Neighbors and based on the oversampling technique of k nearest neighbor first In the case of sign, reduce more several classes of number of samples and increase minority class number of samples respectively, and exclude noise data, with obtain compared with For the data block of balance；Followed by the fault signature pattern in DAE model extraction data blocks and classification diagnosis is carried out, to newly-increased Pattern is merged based on similarity with original model selection, to realize the incremental learning of fault signature；Then when in the presence of newly-increased data When being incrementally added model, on the basis of considering to have strong causalnexus between fault increment and legacy data, selection both had Representative example as effective example, is incorporated into next newly-increased data block and carries out next round information content after evaluation again Processing, to realize the incremental learning of fault data；Finally chosen according to two kinds of measurement Dynamic Comprehensive Evaluations of importance and time variation Feature and example, the effective information previously learnt can either be preserved, the information for making to include in newly-increased data again obtains Due attention, to meet the actual conditions of equipment running process.

1st, the sped structure model structure based on the fusion of data-driven increment

Concrete processing procedure of the present invention as shown in figure 1, complete spiral incremental learning process as shown in Fig. 2 by from The complete procedure for pre-processing dynamic evaluation solves the newly-increased data continued to bring out in processing equipment operation.Wherein in feature mode Extraction and sorting phase, the denoising autocoder being connected by stacked multilayer forms depth network structure, as shown in figure 3, simultaneously In output layer of the top layer addition with classification feature of depth network structure, Supervised classification is carried out using SVM as grader, For error using BP algorithm fine setting each layer parameter of depth network structure, further optimizing whole network makes to reach global optimum.

2nd, algorithm implements

Sped structure model step based on the fusion of data-driven increment is described as follows, and flow chart is as shown in Figure 4.

(1) deep groove ball bearing is arranged on bearing to inner ring, outer ring and rolling element respectively using spark erosion technique The Single Point of Faliures of 3 fault levels, select motor drive terminal vibrating sensor collection normal condition (N), inner ring failure (IRF), the vibration signal under 4 kinds of states of outer ring failure (ORF) and rolling element failure (BF), sample frequency 12kHz, altogether 1, 341,856 data points, and use WAVELET PACKET DECOMPOSITION extraction vibration signal characteristics；

(2) bearing apparatus fault data is pre-processed, by data generic be divided into normal more several classes of samples and Failure minority class sample, the four non-equilibrium training samples and a test sample of different slopes are obtained by random sampling, To verify the non-equilibrium data disposal ability of model.Above-mentioned each training sample equivalent is divided into four groups, one of which is used for Training pattern, remaining three components are three times added to existing model, to verify the incremental learning ability of model；

(3) using based on division neighbour method for resampling respectively by non-equilibrium sample be divided into noise spot, boundary point and Point of safes simultaneously carries out more several classes of lack samplings and minority class over-sampling with Different Strategies, and merging obtains the data sample of relative equilibrium；

(4) fault signature extraction is carried out using the equilibrium criterion sample for adding random noise as DAE input, it is new when existing Calculated when increasing data by Pattern similarity, Land use models increment is with the increment of combination principle progress feature mode with merging, so Input afterwards and bearing fault pattern classification diagnosis is carried out in SVM classifier, and finely tuned using BP algorithm in whole disaggregated model network Relevant parameter；

(5), will from legacy data sample using effective example system of selection when there is newly-increased data increment to add model Not only possessed information content but also typical example is selected and, and formed effective example set；

(6) validity feature to extraction and effective example of selection carry out Dynamic Comprehensive Evaluation, and giving it respectively being capable of generation The dynamic that table importance changes over time forgets weight；

(7) the effective example set of dynamic weighting is merged with newly-increased data sample, re-starts incremental learning process, so as to real The incremental learning of existing bearing non-balance failure data and reliability classification diagnosis.

Claims

1. a kind of spiral fault diagnosis model based on the fusion of data-driven increment, it is characterised in that comprise the steps：

Step 1：Deep groove ball bearing is arranged on bearing to inner ring, outer ring and rolling element respectively using spark erosion technique The Single Point of Faliures of 3 fault levels, select motor drive terminal vibrating sensor collection normal condition (N), inner ring failure (IRF), the vibration signal under 4 kinds of states of outer ring failure (ORF) and rolling element failure (BF), obtains multiple data points, use is small Ripple bag decomposes extraction vibration signal characteristics, and it is divided into normal sample and fault sample by data generic；

Step 2：Five samples of different slopes, and each sample are randomly selected to normal sample in step 1 and fault sample Data point of normal sample is more than the data point in fault sample in this, and described five samples are four non-equilibrium training samples With a test sample；

Step 3：Each training sample equivalent described in step 2 is divided into four groups, one of which is first as training pattern Add in model and be trained, after obtaining new model, then by second group of addition, after obtaining new model, add the 3rd group, After obtaining new model, the 4th group is added, to verify the incremental learning ability of model；

Step 4：The method for resampling based on division neighbour is used to be respectively divided into noise in the non-equilibrium sample in step 2 Point, boundary point and point of safes simultaneously carry out most class lack samplings and minority class over-sampling, and merging obtains the data sample of relative equilibrium；

Step 5：It is special that failure is carried out using the equilibrium criterion sample of gained in step 4 as the input data of denoising autocoder Sign extraction, obtains validity feature, is calculated when newly-increased data be present by Pattern similarity, Land use models increment and combination principle The increment of feature mode is carried out with merging, then inputs and bearing fault pattern classification diagnosis is carried out in SVM classifier, and use BP Algorithm finely tunes the relevant parameter in whole disaggregated model network, finally gives spiral fault diagnosis model.

2. a kind of spiral fault diagnosis model based on the fusion of data-driven increment according to claim 1, its feature It is, in addition to

Step 6：When there is newly-increased data increment to add model, using effective example system of selection, from the data sample of step 1 It is middle will not only possess information content but also typical example select come, form effective example set；

Step 7：Dynamic Comprehensive Evaluation is carried out to the effective example selected in the validity feature and step 6 that are extracted in step 5, The dynamic forgetting weight that importance changes over time can be represented by giving it respectively, obtain dynamic weighting validity feature and dynamic adds Weigh effective example set；

Step 8：The effective example set of dynamic weighting in step 7 is merged with newly-increased data sample, re-starts incremental learning mistake Journey, so as to realize the incremental learning of bearing non-balance failure data and reliability classification diagnosis.

3. the spiral fault diagnosis model according to claim 1 based on the fusion of data-driven increment, it is characterised in that Method for resampling based on division neighbour in the step 4 comprises the steps：

(1) more several classes of lack sampling methods based on Reverse K Nearest Neighbors：

It is near by the reverse K for calculating each more several classes of samples in non-equilibrium data for more several classes of samples in non-equilibrium data Neighbour, noise spot, boundary point and the class of point of safes three are divided into, take different operation strategy progress is more several classes of to owe to adopt respectively Sample；Specific division rule is as follows：

Noise spot：That is the Reverse K Nearest Neighbors of certain more several classes of sample point are minority class sample；

Boundary point：I.e. in the Reverse K Nearest Neighbors of certain more several classes of sample point there is minority class sample in existing more several classes of samples again, show this Near decision boundary of the point in division data category；

Point of safes：That is the Reverse K Nearest Neighbors of certain more several classes of sample are more several classes of samples；

For noise spot, to avoid it from producing negative influence to subsequent classification effect, deletion strategy is taken；

For boundary point, in the case where lack sampling number allows, retain all boundary points with aid decision；When lack sampling number When being counted out less than border, more several classes of numbers are retained more than the data point of minority class in prioritizing selection Reverse K Nearest Neighbors；

For point of safes, arrange, preferentially delete distant from big to small according to point of safes and the distance of minority class center of a sample Point of safes；

(2) the minority class oversampler method based on k nearest neighbor：

For the minority class sample in non-equilibrium data, noise spot, boundary point and point of safes three are divided into by calculating its k nearest neighbor Class, and boundary point is further subdivided into dangerous spot and half point of safes, take different operation strategies to carry out synthesis respectively a small number of Class over-sampling, specific division rule are as follows：

Noise spot：That is the k nearest neighbor of certain minority class sample point is more several classes of samples；

Dangerous spot：I.e. in the k nearest neighbor of certain minority class sample point there is minority class sample, and more several classes of samples in existing more several classes of samples again This number is more than minority class sample, shows that the point is in decision boundary nearby and closer to more several classes of；

Half point of safes：I.e. in the k nearest neighbor of certain minority class sample point there is minority class sample, and minority class in existing more several classes of samples again Number of samples is more than or equal to more several classes of samples, shows that the point is in decision boundary nearby and closer to minority class；

Point of safes：That is the k nearest neighbor of certain minority class sample is minority class sample；

It is as follows for the corresponding synthesis minority class over-sampling strategy of different types of data point：

For dangerous spot, take and first delete more several classes of samples therein, then synthesize the operation of minority class over-sampling；

For half point of safes, the more several classes of neighbouring samples of selection utilize SMOTE over-samplings, until reaching defined amount；

For point of safes, over-sampling is directly carried out using SMOTE.

4. the spiral fault diagnosis model according to claim 3 based on the fusion of data-driven increment, it is characterised in that： The number of samples of each classified types is defined as in the described method for resampling based on division neighbour, described is more The quantity N of several classes of lack sampling reductions_dThe quantity N increased equal to minority class over-sampling_i, i.e.,

Wherein, N_majFor the quantity of more several classes of samples, N_minFor the quantity of minority class sample；

For every a kind of data point of more several classes of sample lack samplings, first to N therein_{n_maj}Individual noise spot is deleted, then is owed Number of samples N_underIt should be N_d-N_{n_maj}, as point of safes number N_{s_maj}≥N_underWhen, from more several classes of sample point of safes delete away from The N farthest from minority class center_underIndividual data point；As point of safes number N_{s_maj}＜ N_underWhen, selected after whole point of safes are deleted Select minority class number in Reverse K Nearest Neighbors and be more than more several classes of data points, it is preferential to delete number wherein nearest apart from minority class center Strong point, until total number of deleting reaches N_under,

For every a kind of data point of minority class sample over-sampling, noise spot therein is deleted first, then summed up into new Number of samples N_overIt should be noise number N in minority class_{n_min}+N_i；The point of safes suitably small amount for being relatively distant from decision boundary is closed Into, and synthesis number is suitably increased to the dangerous spot near decision boundary and half point of safes；Newly synthesized in each division classification Sample data calculates dangerous spot, the K of half point of safes respectively first by average more several classes of accountings determine in every class data point k nearest neighbor More several classes of average accounting R in neighbour_dan、R_half

<mrow> <msub> <mi>R</mi> <mrow> <mi>d</mi> <mi>a</mi> <mi>n</mi> </mrow> </msub> <mo>=</mo> <mfrac> <mrow> <msub> <mi>&Sigma;N</mi> <mrow> <mi>d</mi> <mi>a</mi> <mi>n</mi> <mo>_</mo> <mi>m</mi> <mi>a</mi> <mi>j</mi> </mrow> </msub> </mrow> <mrow> <mi>K</mi> <mo>*</mo> <msub> <mi>N</mi> <mrow> <mi>d</mi> <mi>a</mi> <mi>n</mi> </mrow> </msub> </mrow> </mfrac> </mrow>

<mrow> <msub> <mi>R</mi> <mrow> <mi>h</mi> <mi>a</mi> <mi>l</mi> <mi>f</mi> </mrow> </msub> <mo>=</mo> <mfrac> <mrow> <msub> <mi>&Sigma;N</mi> <mrow> <mi>h</mi> <mi>a</mi> <mi>l</mi> <mi>f</mi> <mo>_</mo> <mi>m</mi> <mi>a</mi> <mi>j</mi> </mrow> </msub> </mrow> <mrow> <mi>K</mi> <mo>*</mo> <msub> <mi>N</mi> <mrow> <mi>h</mi> <mi>a</mi> <mi>l</mi> <mi>f</mi> </mrow> </msub> </mrow> </mfrac> </mrow>

Wherein, N_{dan_maj}、N_{half_maj}, be respectively each dangerous spot, numbers more several classes of in half point of safes k nearest neighbor, K is defined K nearest neighbor number, N_dan、N_halfDangerous spot, the number of half point of safes respectively in minority class, for point of safes, set R_safeFor one Individual relatively small value,

Wherein, N_safeFor the number of point of safes in minority class, and increase R with its number_safeIt is gradually reduced, then considers sample distribution The new samples quantity N that each type of minority class sample needs to synthesize is constructed in proportion_{over_dan}、N_{over_half}、N_{over_safe}For

5. the spiral fault diagnosis model according to claim 1 based on the fusion of data-driven increment, it is characterised in that The extraction of denoising autocoder fault signature comprises the steps in the step 5：

To sample x according to q_DDistribution adds random noise, it is become noisy sampleI.e.In formula, q_DFor binomial with Machine hides noise, then by optimizing following object function J_DAEDAE training is completed,

<mrow> <msub> <mi>J</mi> <mrow> <mi>D</mi> <mi>A</mi> <mi>E</mi> </mrow> </msub> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi>min</mi> </mrow> <mrow> <mi>&theta;</mi> <mo>,</mo> <msup> <mi>&theta;</mi> <mo>&prime;</mo> </msup> </mrow> </munder> <mi>L</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>z</mi> <mo>)</mo> </mrow> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi>min</mi> </mrow> <mrow> <mi>&theta;</mi> <mo>,</mo> <msup> <mi>&theta;</mi> <mo>&prime;</mo> </msup> </mrow> </munder> <mi>L</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <msub> <mi>g</mi> <msup> <mi>&theta;</mi> <mo>&prime;</mo> </msup> </msub> <mo>(</mo> <mrow> <msub> <mi>f</mi> <mi>&theta;</mi> </msub> <mrow> <mo>(</mo> <mover> <mi>x</mi> <mo>^</mo> </mover> <mo>)</mo> </mrow> </mrow> <mo>)</mo> <mo>)</mo> </mrow> </mrow>

In formula, L (x, z) is input sample x and exports the reconstructed error between y, and f θ () are coding letter of the input layer to hidden layer Number, θ be coding network parameter sets, g_θ′() is decoding functions of the hidden layer to output layer, and θ ' is the parameter set of decoding network Close.

6. the spiral fault diagnosis model according to claim 1 based on the fusion of data-driven increment, it is characterised in that Pattern similarity is calculated and comprised the steps in the step 5：

<mrow> <msub> <mi>D</mi> <mrow> <mi>k</mi> <mi>l</mi> </mrow> </msub> <mrow> <mo>(</mo> <mi>P</mi> <mo>|</mo> <mo>|</mo> <mi>Q</mi> <mo>)</mo> </mrow> <mo>=</mo> <munder> <mo>&Sigma;</mo> <mi>i</mi> </munder> <mi>P</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> <msub> <mi>log</mi> <mn>2</mn> </msub> <mfrac> <mrow> <mi>P</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> <mrow> <mi>Q</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mrow>

Wherein, P (i) and i-th of value in Q (i) representative feature patterns P and Q, the otherness between P and Q is smaller, the value of KL divergences Smaller, then Similarity value is bigger；

Newly-increased feature pattern and the similarity of original feature mode are calculated, chooses maximum Sim (P, Q)_maxMould as this feature Formula Similarity value.

7. the spiral fault diagnosis model according to claim 1 based on the fusion of data-driven increment, it is characterised in that Pattern increment is with combination principle in the step 5：

Being represented using α makes the minimum similarity degree threshold value that similarity is significant between two contrast characteristics, β representative features it is general it is similar with Threshold limit value between highly similar, it is seen then that similarity threshold α ＜ β, and become with the increment of feature mode with merging α and β dynamics Change；

(1) if β ＜ Sim (P, Q)_max, then it represents that newly-increased feature is similar to the feature height in original pattern, can be used for adjusting Merged with the nonlinear weight method of average；

(2) if α ＜ Sim (P, Q)_max＜ β, then it represents that newly-increased feature and existing feature are different, it is impossible to which use is closed with existing feature And mode replace newly-increased feature, so this feature is incrementally added into original pattern；

(3) if Sim (P, Q)_max＜ α, then it represents that the similarity of newly-increased feature and existing feature is less than minimum threshold, and it is new to represent this Increasing is characterized as insignificant noise jamming value, and processing is given up to it.

8. the spiral fault diagnosis model according to claim 2 based on the fusion of data-driven increment, it is characterised in that The selection of effective example is in the step 6：

By information content and it is representative whether two dimensions valuable to example makes measurement, selecting both has abundant information amount Effective example that and can enough represents most of data is transferred to next increment process；

For information content, measured using the influence degree to classifying quality, take out an example in order every time and count Calculate the error E (y obtained when training grader using remainder data_r|x_r, δ), then it is placed back into and takes out next example, thus selects Select k-th maximum of data of error in classification of sening as an envoy to

E(y_k|x_k, δ) and=max { E (y_r|x_r, δ) }

Wherein, δ be classifier training model in parameter sets, y_rAnd y_kTo input x_rAnd x_kWhen output result, for remaining Error function E (the y of data_r|x_r, δ), calculated using false segmentation rate, therefore, the information content for defining any example x is

<mrow> <msub> <mi>e</mi> <mi>&delta;</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>E</mi> <mrow> <mo>(</mo> <msub> <mi>y</mi> <mi>r</mi> </msub> <mo>|</mo> <msub> <mi>x</mi> <mi>r</mi> </msub> <mo>,</mo> <mi>&delta;</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <msub> <mi>n</mi> <mi>e</mi> </msub> <mrow> <mi>n</mi> <mo>-</mo> <mn>1</mn> </mrow> </mfrac> </mrow>

Wherein, n is data sample size, and n-1 is to remove the data sample size after example x, n_eFor the data sample of classification error Number, selection make information content e_δ(x) example more than threshold value Δ e is put into example set I as the example for most having information content, and Δ e takes It is worth false segmentation rate during to be classified using all data samples；

For representativeness, data sample is clustered using k-means algorithms to obtain each cluster centre, selects each cluster centre Example set R nearby is put into as most representative example away from example of its Euclidean distance less than threshold value Δ d, Δ d values are each In cluster centre between two closest cluster centres distance 1/2, to select while having information content and representational Example, obtain effective example set U=I ∩ R.

9. the spiral fault diagnosis model according to claim 7 based on the fusion of data-driven increment, it is characterised in that Feature and example dynamic evaluation are in the step 7：

(1) feature dynamic forgets weighting

To Pattern similarity value normalized：

Wherein, minSim (P, Q) represents the minimum value in all Pattern similarities, and maxSim (P, Q) represents that all patterns are similar Maximum in degree,

To taking the similar feature mode of the height of union operation in principle (1), its weight is raised to strengthen and remember effective mould Formula, its changeable weight computational methods are as follows：

W_i+1=W_i+ Sim (P, Q)_norm

Wherein, it is ith merging weight and original feature P that i+1 time, which merges the weight assigned during operation to original feature P, It is added with newly-increased feature Q similarity, i=0, W when merging for the first time₀For initial weight, it is arranged in original feature mode two-by-two Maximum similarity maxSim (P, Q) between feature；

To taking newly-increased operation to add the feature of original pattern in principle (2), initial weight is assigned for it, computational methods are as follows：

W₀=Sim (P, Q)_norm

Do not consider for the newly-increased pattern given up in principle (3), but exist and all newly-increased feature similarities in original pattern The respectively less than situation of threshold alpha, its weight is slowly reduced using following formula, until when being less than threshold alpha, it is invalid to show that this feature has turned into Feature, it need to be deleted from present mode and be allowed to thoroughly forget,

W_i+1=W_i- Sim (P, Q)_norm

Wherein, weight and its similarity when the weight for the feature P that i+1 time does not occur in newly-increased pattern does not occur for ith Make the difference, i=0, W when feature P does not occur for the first time₀For initial weight, it is likewise provided as in original feature mode between feature two-by-two Maximum similarity maxSim (P, Q)；

(2) example dynamic forgets weighting

Using information content e_δ(x) it is used as example Assessment of Important weight W_import, a generally relatively small numerical value：

W_import=1+e_δ(x)

The forgetting weight W of one dynamic attenuation is given for multiple selected example_forget

Wherein, m is example selected number in incremental process, can obtain example dynamic forgetting weight W and be

<mrow> <mi>W</mi> <mo>=</mo> <msub> <mi>W</mi> <mrow> <mi>i</mi> <mi>m</mi> <mi>p</mi> <mi>o</mi> <mi>r</mi> <mi>t</mi> </mrow> </msub> <mo>*</mo> <msub> <mi>W</mi> <mrow> <mi>f</mi> <mi>o</mi> <mi>r</mi> <mi>g</mi> <mi>e</mi> <mi>t</mi> </mrow> </msub> <mo>=</mo> <mfrac> <mrow> <mn>1</mn> <mo>+</mo> <msub> <mi>e</mi> <mi>&delta;</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> </mrow> <msqrt> <mi>m</mi> </msqrt> </mfrac> <mo>.</mo> </mrow>