CN112380932B - Vibration signal characteristic value selection method and elevator health state evaluation or fault diagnosis method - Google Patents

Info

Publication number
CN112380932B
Authority
CN
China
Prior art keywords
characteristic value
characteristic values
frequency domain
value
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011202576.8A
Other languages
Chinese (zh)
Other versions
CN112380932A (en)
Inventor
俞英杰
郑斌
周俊帆
马骧越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mitsubishi Elevator Co Ltd
Original Assignee
Shanghai Mitsubishi Elevator Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mitsubishi Elevator Co Ltd
Priority to CN202011202576.8A
Publication of CN112380932A
Application granted
Publication of CN112380932B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B66 HOISTING; LIFTING; HAULING
    • B66B ELEVATORS; ESCALATORS OR MOVING WALKWAYS
    • B66B5/00 Applications of checking, fault-correcting, or safety devices in elevators
    • B66B5/0006 Monitoring devices or performance analysers
    • B66B5/0018 Devices monitoring the operating condition of the elevator system
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B66 HOISTING; LIFTING; HAULING
    • B66B ELEVATORS; ESCALATORS OR MOVING WALKWAYS
    • B66B5/00 Applications of checking, fault-correcting, or safety devices in elevators
    • B66B5/0006 Monitoring devices or performance analysers
    • B66B5/0037 Performance analysers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01H MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H1/00 Measuring characteristics of vibrations in solids by using direct conduction to the detector
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01M TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M13/00 Testing of machine parts
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01M TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M13/00 Testing of machine parts
    • G01M13/04 Bearings
    • G01M13/045 Acoustic or vibration analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a vibration signal characteristic value selection method. Variance analysis is used to analyze the relevance between the characteristic values of m × n groups of vibration signals and the label values of the vibration signal state labels, and a first coarse screening deletes the characteristic values with low relevance to the label values. A comprehensive distance correlation coefficient is obtained for each characteristic value by calculating its distance correlation coefficients with the other characteristic values; in this characteristic value correlation analysis, a second coarse screening deletes the characteristic values with high comprehensive distance correlation coefficients. The characteristic value matrices retained by the second coarse screening are combined, and recursive feature elimination is performed on the combined matrix to obtain the optimal characteristic value combination. The method can effectively screen the characteristic values of vibration signal data, obtain the optimal characteristic value combination for building the vibration model of the operating state of mechanical equipment, effectively reduce the dimension of the model, save computing resources, and improve the accuracy of the model.

Description

Vibration signal characteristic value selection method and elevator health state evaluation or fault diagnosis method
Technical Field
The invention relates to the technical field of machine learning and data mining, in particular to a vibration signal characteristic value selection method and an elevator health state evaluation or fault diagnosis method.
Background
In recent years, with the rapid development of sensor technology, small, low-cost vibration sensors mounted on a machine have been used to acquire vibration signals while the machine operates, as shown in fig. 1. From a large number of sensor-acquired vibration signals covering the normal, abnormal and fault states of mechanical equipment, characteristic values of the vibration signals are extracted through data cleaning and signal processing, and a mechanical equipment operating state diagnosis model is established by training on the data with machine learning techniques. Using this diagnosis model together with big data analysis techniques, vibration signals acquired in real time can be analyzed to realize functions such as intelligent monitoring of the mechanical equipment state and intelligent diagnosis of abnormal states.
Because mechanical equipment has different fault types, the characteristic values of its vibration signals also differ significantly. Taking an elevator as an example, a bearing fault in the elevator gear train shows pronounced frequency-domain characteristics, whereas an abnormal elevator guide rail joint typically appears as a characteristic time-domain impact. When the vibration signal characteristic values of mechanical equipment are extracted, they can generally be divided into three categories: time-domain characteristic values, frequency-domain characteristic values, and time-frequency-domain characteristic values, each of which contains several different characteristic values. Taking the time domain as an example, the time-domain characteristic values obtained by processing the vibration signal include the peak-to-peak value, root-mean-square value, average value, kurtosis, skewness, and the like.
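The time-domain characteristic values named above could be computed along the following lines. This is a minimal sketch assuming NumPy; the function name `time_domain_features` and the sample signal are hypothetical, and the kurtosis shown is the non-excess form, a choice the source does not specify.

```python
import numpy as np

def time_domain_features(signal):
    """Return common time-domain characteristic values of a 1-D vibration signal."""
    x = np.asarray(signal, dtype=float)
    mean = x.mean()
    centered = x - mean
    variance = np.mean(centered ** 2)  # population variance
    return {
        "peak_to_peak": x.max() - x.min(),
        "rms": np.sqrt(np.mean(x ** 2)),
        "mean": mean,
        # Non-excess kurtosis (assumption): a Gaussian signal gives about 3.
        "kurtosis": np.mean(centered ** 4) / variance ** 2,
        "skewness": np.mean(centered ** 3) / variance ** 1.5,
    }

# Hypothetical four-sample signal, used only to exercise the function.
features = time_domain_features([0.0, 1.0, 0.0, -1.0])
```

Frequency-domain and time-frequency-domain characteristic values would need their own extraction routines, which the source does not detail.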
The more characteristic values are selected for the mechanical equipment operating state diagnosis model, the larger the dimension of the model and the scale of computation become, and the required computing resources increase accordingly. At the same time, a model with too many dimensions is overly complex and prone to overfitting, which reduces the accuracy of the diagnosis model.
When too few characteristic values are selected for the mechanical equipment operating state diagnosis model, underfitting easily occurs: the diagnosis model produced from the training data fits poorly, which also impairs its accuracy.
For these reasons, before training on the data and establishing the mechanical equipment operating state diagnosis model, the characteristic values of the data must be screened effectively, so that the optimal characteristic value combination for the operating state vibration model is obtained with as few vibration signal characteristic values as possible.
To obtain the optimal characteristic value combination, recursive feature elimination is usually used for characteristic value selection. Its drawbacks are that, when there are too many characteristic values, the computation is very large and the whole recursive process takes too long, which lowers the efficiency of feature selection; in addition, because the training and test data sets are selected randomly, the data distribution may cause effective characteristic values to be deleted during the recursion, which affects the accuracy of the mechanical equipment operating state diagnosis model.
Disclosure of Invention
The technical problem to be solved by the invention is to screen the characteristic values of vibration signal data effectively, so as to obtain, with as few vibration signal characteristic values as possible, the optimal characteristic value combination for building the vibration model of the operating state of mechanical equipment, thereby effectively reducing the dimension of the model, saving computing resources, and improving the accuracy of the model.
In order to solve the above technical problem, the method for selecting a vibration signal characteristic value provided by the present invention comprises the following steps:
the mechanical equipment running state comprises a normal state and m-1 abnormal states, wherein m is an integer greater than 1;
preprocessing the original vibration signals so that each operating state of the mechanical equipment has n groups of vibration signal data, and deleting redundant original vibration signals, wherein n is an integer greater than 1;
secondly, performing signal processing on each group of preprocessed vibration signal data and extracting b characteristic values, wherein b is an integer greater than 2; the m × n groups of preprocessed vibration signal data thus yield m × n × b characteristic values;
thirdly, constructing a characteristic value matrix X and a mechanical equipment state label vector Y from the m × n × b characteristic values;
the characteristic value matrix X has m × n rows and b columns;
the mechanical equipment state label vector Y has m × n rows and 1 column;
performing variance analysis between each column of characteristic values x in the characteristic value matrix X and the mechanical equipment state label vector Y, and calculating the f value of each characteristic value x with respect to the label vector Y; the smaller the f value, the lower the relevance between the corresponding characteristic value x and the label value;
according to the calculated f values, deleting the characteristic values with low relevance to the label value as a first coarse screening of the characteristic values, obtaining s kinds of characteristic values, wherein s is a positive integer and b > s;
the f value $f_h$ of the h-th characteristic value is:
$$f_h = \frac{S_{hA}/(m-1)}{S_{hE}/(N-m)}$$
$$S_{hA} = \sum_{i=1}^{m} n\,(\bar{x}_{hi} - \bar{x}_{h})^2, \qquad S_{hE} = \sum_{i=1}^{m}\sum_{j=1}^{n} (x_{hij} - \bar{x}_{hi})^2$$
wherein $S_{hA}$ is the inter-group dispersion of characteristic value h, $S_{hE}$ is the intra-group dispersion of characteristic value h, $\bar{x}_{hi}$ is the average value of the i-th group of data under the h-th characteristic value, $\bar{x}_{h}$ is the average of all data in the h-th column of characteristic values, $x_{hij}$ is the j-th vibration signal data of the i-th group of data under the h-th characteristic value, h is an integer with 1 ≤ h ≤ b, and N = n × m.
Classifying the s characteristic values after the first coarse screening according to a time domain, a frequency domain and a time-frequency domain to obtain p time domain characteristic values, q frequency domain characteristic values and r time-frequency domain characteristic values, wherein p + q + r = s, and p, q and r are positive integers;
performing a second coarse screening on the classified s characteristic values and retaining a kinds of characteristic values, wherein a is a positive integer and b > s > a;
if p is less than or equal to 2, directly taking p as the number of the time domain characteristic values subjected to the secondary coarse screening; if p is larger than 2, calculating the comprehensive distance correlation coefficient of each time domain characteristic value to other time domain characteristic values in the time domain characteristic values, carrying out time domain characteristic value correlation analysis, and then carrying out secondary coarse screening on the time domain characteristic values;
if q is less than or equal to 2, directly taking q as the frequency domain characteristic value number subjected to secondary coarse screening; if q is larger than 2, calculating a comprehensive distance correlation coefficient of each frequency domain characteristic value to other frequency domain characteristic values in the frequency domain characteristic values, carrying out frequency domain characteristic value correlation analysis, and carrying out secondary coarse screening on the frequency domain characteristic values;
if r is less than or equal to 2, directly taking r as the number of the time-frequency domain characteristic values subjected to the secondary coarse screening; if r is larger than 2, calculating a comprehensive distance correlation coefficient of each time-frequency domain characteristic value to other time-frequency domain characteristic values in the time-frequency domain characteristic values, performing time-frequency domain characteristic value correlation analysis, and performing secondary coarse screening on the time-frequency domain characteristic values;
sixthly, carrying out characteristic value matrix combination on the a types of characteristic values reserved by the second coarse screening;
and seventhly, carrying out recursive characteristic elimination based on the combined characteristic value matrix to obtain the optimal characteristic value combination.
Preferably, in step three, deleting the characteristic values with low relevance to the label value means deleting, according to a set proportion, those of the b characteristic values that have the smallest f values.
Preferably, the set ratio is 25%.
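The variance-analysis screening can be sketched as follows. This is an illustrative sketch assuming NumPy and the standard one-way ANOVA F statistic; the function names `anova_f_values` and `first_coarse_screen` and the sample data are hypothetical, and the 25% proportion is the preferred value stated above.

```python
import numpy as np

def anova_f_values(X, y):
    """One-way ANOVA f value of every column of X against the state labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    N, m = X.shape[0], labels.size
    overall_mean = X.mean(axis=0)
    s_between = np.zeros(X.shape[1])  # inter-group dispersion
    s_within = np.zeros(X.shape[1])   # intra-group dispersion
    for label in labels:
        group = X[y == label]
        group_mean = group.mean(axis=0)
        s_between += group.shape[0] * (group_mean - overall_mean) ** 2
        s_within += ((group - group_mean) ** 2).sum(axis=0)
    return (s_between / (m - 1)) / (s_within / (N - m))

def first_coarse_screen(X, y, drop_ratio=0.25):
    """Return (sorted indices of kept columns, f values)."""
    f = anova_f_values(X, y)
    n_drop = int(np.floor(drop_ratio * f.size))
    order = np.argsort(f, kind="stable")  # ascending f value
    return np.sort(order[n_drop:]), f

# Hypothetical data: m = 2 states, n = 3 groups per state, b = 4 characteristic values.
X = np.array([[1.0, 5.0, 2.0, 1.0],
              [1.1, 4.0, 2.5, 2.0],
              [0.9, 6.0, 1.5, 3.0],
              [3.0, 5.5, 3.0, 1.0],
              [3.1, 4.5, 3.5, 2.0],
              [2.9, 6.5, 2.5, 3.0]])
Y = np.array([0, 0, 0, 1, 1, 1])
kept, f = first_coarse_screen(X, Y)  # column 3 has equal group means, so its f is 0
```

Column 3 is removed here because its group means coincide, which is exactly the case of a characteristic value that carries no information about the state label.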
Preferably, the comprehensive distance correlation coefficient of each time-domain characteristic value is obtained by calculating the sum of the distance correlation coefficients between that time-domain characteristic value and the other p-1 time-domain characteristic values and dividing the sum by p-1;
the comprehensive distance correlation coefficient of each frequency-domain characteristic value is obtained by calculating the sum of the distance correlation coefficients between that frequency-domain characteristic value and the other q-1 frequency-domain characteristic values and dividing the sum by q-1;
the comprehensive distance correlation coefficient of each time-frequency-domain characteristic value is obtained by calculating the sum of the distance correlation coefficients between that time-frequency-domain characteristic value and the other r-1 time-frequency-domain characteristic values and dividing the sum by r-1;
$$SdCr(A_x) = \frac{1}{y-1} \sum_{w=1,\, w \neq x}^{y} dCor(A_x, A_w)$$
wherein $SdCr(A_x)$ is the comprehensive distance correlation coefficient of the x-th characteristic value within one class of characteristic values, y is the total number of characteristic values in that class, $A_w$ is the w-th characteristic value of the class, $A_x$ is the x-th characteristic value of the class, and $dCor(A_x, A_w)$ is the distance correlation coefficient of $A_x$ and $A_w$.
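The comprehensive distance correlation coefficient can be illustrated with a small NumPy sketch. It assumes the usual sample distance correlation computed from double-centered pairwise distance matrices, which the source does not spell out; the function names and the sample matrix `F` are hypothetical.

```python
import numpy as np

def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences of a 1-D sample."""
    v = np.asarray(v, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_correlation(u, v):
    """Sample distance correlation dCor(u, v), a value in [0, 1]."""
    A, B = _centered_distances(u), _centered_distances(v)
    dcov2 = (A * B).mean()
    dvar2 = (A * A).mean() * (B * B).mean()
    if dvar2 <= 0:  # at least one input is constant
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))

def comprehensive_dcor(F):
    """SdCr of every column of F; the columns are the characteristic values of one class."""
    F = np.asarray(F, dtype=float)
    y = F.shape[1]  # number of characteristic values in the class (needs y >= 2)
    sdcr = np.zeros(y)
    for x in range(y):
        total = sum(distance_correlation(F[:, x], F[:, w])
                    for w in range(y) if w != x)
        sdcr[x] = total / (y - 1)
    return sdcr

# Hypothetical class with three characteristic values; column 1 is a linear copy of column 0.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
F = np.column_stack([a, 2.0 * a + 1.0, [2.0, -1.0, 0.0, 3.0, -2.0]])
sdcr = comprehensive_dcor(F)
```

The two linearly related columns receive the highest coefficients, so they are the first candidates for deletion in the second coarse screening.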
Preferably, in step five, the correlation analysis of the time-domain, frequency-domain and time-frequency-domain characteristic values and the second coarse screening are performed as follows: according to the calculated comprehensive distance correlation coefficients, the characteristic values with the larger comprehensive distance correlation coefficients within the time-domain, frequency-domain and time-frequency-domain classes are deleted according to a set proportion, giving the time-domain, frequency-domain and time-frequency-domain characteristic values that remain after the second coarse screening.
Preferably, in step five, the correlation analysis of the time-domain, frequency-domain and time-frequency-domain characteristic values and the second coarse screening are performed as follows: a comprehensive distance correlation coefficient threshold is set for each of the time-domain, frequency-domain and time-frequency-domain characteristic values, and, according to the calculated comprehensive distance correlation coefficients, the characteristic values whose comprehensive distance correlation coefficient exceeds the preset threshold of their class are deleted, giving the time-domain, frequency-domain and time-frequency-domain characteristic values that remain after the second coarse screening.
Preferably, in the fifth step, the correlation analysis of the time domain, the frequency domain and the time-frequency domain characteristic values is performed, and the characteristic values are coarsely screened for the second time:
if the number of the obtained time domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the time domain characteristic values before characteristic value correlation analysis as screening characteristic values of the time domain characteristic values; if the number of the obtained time domain characteristic values is more than 2, taking the time domain characteristic values obtained after the characteristic value correlation analysis as screening characteristic values of the time domain characteristic values;
if the number of the obtained frequency domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the frequency domain characteristic values before characteristic value correlation analysis as screening characteristic values of the frequency domain characteristic values; if the number of the obtained frequency domain characteristic values is more than 2, taking the frequency domain characteristic values obtained after the characteristic value correlation analysis as screening characteristic values of the frequency domain characteristic values;
if the number of the obtained time-frequency domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the time-frequency domain characteristic values before the characteristic value correlation analysis as screening characteristic values of the time-frequency domain characteristic values; and if the number of the obtained time-frequency domain characteristic values is more than 2, taking the time-frequency domain characteristic values obtained after the characteristic value correlation analysis as the screening characteristic values of the time-frequency domain characteristic values.
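The screening rule for one class of characteristic values (by proportion or by threshold, with the fallback that keeps two values) might look like the sketch below. The function name `second_coarse_screen`, its parameters, and the sample coefficients are hypothetical; treating a result of exactly two characteristic values as acceptable is an assumption, since the source only addresses fewer or more than two.

```python
import numpy as np

def second_coarse_screen(sdcr, drop_ratio=0.0, threshold=None, min_keep=2):
    """Return sorted indices of one class's characteristic values kept after screening."""
    sdcr = np.asarray(sdcr, dtype=float)
    count = sdcr.size
    if count <= min_keep:  # p, q or r not larger than 2: keep the class unchanged
        return np.arange(count)
    order = np.argsort(sdcr, kind="stable")  # ascending comprehensive coefficient
    if threshold is not None:
        kept = np.flatnonzero(sdcr <= threshold)  # delete values above the threshold
    else:
        n_drop = int(np.floor(drop_ratio * count))
        kept = np.sort(order[: count - n_drop])   # delete the largest share
    if kept.size < min_keep:  # fallback: keep the two least-correlated values
        kept = np.sort(order[:min_keep])
    return kept

# Hypothetical comprehensive distance correlation coefficients of four characteristic values.
sdcr = np.array([0.9, 0.2, 0.5, 0.8])
by_ratio = second_coarse_screen(sdcr, drop_ratio=0.25)
by_threshold = second_coarse_screen(sdcr, threshold=0.3)
small_class = second_coarse_screen(np.array([0.9, 0.1]), threshold=0.0)
```

With the threshold 0.3 only one value would survive, so the fallback restores the two values with the smallest coefficients.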
Preferably, in step seven, performing recursive feature elimination based on the combined characteristic value matrix to obtain the optimal characteristic value combination means:
in each recursion round, the vibration signals are again randomly divided by stratified k-fold division, the performance of the classification model is cross-validated, and the evaluation index value and the characteristic value importance ranking of the round are calculated;
after each recursion round, the characteristic value with the smallest importance is deleted and the next round of recursive analysis is performed, until only one characteristic value remains and no more can be deleted;
the recursion round in which the evaluation index value is largest and, among such rounds, the number of characteristic values is smallest is selected, and all the characteristic values of that round form the optimal characteristic value combination.
Preferably, in step seven, performing recursive feature elimination based on the combined characteristic value matrix to obtain the optimal characteristic value combination means:
before recursive feature elimination begins, the vibration signals are randomly divided by stratified k-fold division;
in each recursion round, the performance of the classification model is cross-validated using the already divided k folds of data, and the evaluation index value and the characteristic value importance ranking of the round are calculated;
after each recursion round, the characteristic value with the smallest importance is deleted and the next round of recursive analysis is performed, until only one characteristic value remains and no more can be deleted;
the recursion round in which the evaluation index value is largest and, among such rounds, the number of characteristic values is smallest is selected, and all the characteristic values of that round form the optimal characteristic value combination.
Preferably, the stratified k-fold division means randomly dividing all N groups of vibration data into k folds such that, within each fold, the amount of vibration data is the same for every state label class.
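A stratified k-fold division of this kind can be sketched with NumPy as follows. The function name `stratified_k_fold` and the `seed` parameter are hypothetical, and the per-class counts are exactly equal only when n is divisible by k, which is assumed in the example.

```python
import numpy as np

def stratified_k_fold(y, k, seed=0):
    """Return a fold index in 0..k-1 for each sample, balanced within every label."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(y.size, dtype=int)
    for label in np.unique(y):
        # Shuffle the samples of this state, then deal them out to the folds in turn.
        idx = rng.permutation(np.flatnonzero(y == label))
        fold_of[idx] = np.arange(idx.size) % k
    return fold_of

# Hypothetical labels: m = 3 states with n = 10 groups each, divided into k = 5 folds.
y = np.repeat([0, 1, 2], 10)
folds = stratified_k_fold(y, 5)
```

Each fold then holds two groups of every state, so no fold over-represents a single operating state.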
Preferably, the cross validation uses k-1 of the k folds as training set data to train the classification model and calculate the importance values of the characteristic values, and uses the remaining fold as test set data to evaluate the classification model and calculate its evaluation index value;
a different fold is used as the test set each time, with the other folds as the training set, and the process is repeated k times to obtain the evaluation index values and characteristic value importance values of k rounds.
Preferably, the average of the k evaluation index values calculated in the k rounds is used as the evaluation index value of the recursion round.
Preferably, the evaluation index value is the accuracy, precision or F1 score of the model.
Preferably, the average of the k importance rankings of the characteristic values obtained in the k rounds is used as the importance ranking of the characteristic values for the recursion round.
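The cross-validation step, with its averaged evaluation index value and averaged importance ranking, can be sketched as below. To stay self-contained the sketch swaps in a nearest-centroid classifier as a stand-in for the random forest or gradient boosting decision tree named in the source; `centroid_model`, `cross_validate`, the centroid-spread importance, and the sample data are all hypothetical.

```python
import numpy as np

def centroid_model(X_tr, y_tr, X_te, y_te):
    """Stand-in classifier (not the source's tree models): nearest class centroid."""
    labels = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in labels])
    dist = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    accuracy = np.mean(labels[dist.argmin(axis=1)] == y_te)
    importance = centroids.var(axis=0)  # spread of class centroids per characteristic value
    return accuracy, importance

def cross_validate(X, y, fold_of, model=centroid_model):
    """Return (mean score, mean importance rank per column); rank 0 is most important."""
    scores, ranks = [], []
    for f in np.unique(fold_of):
        test = fold_of == f  # this fold is the test set, the others the training set
        score, imp = model(X[~test], y[~test], X[test], y[test])
        scores.append(score)
        ranks.append(np.argsort(np.argsort(-imp, kind="stable"), kind="stable"))
    return float(np.mean(scores)), np.mean(ranks, axis=0)

# Hypothetical data: column 0 separates the two states, column 1 does not.
X = np.array([[0.0, 1.0], [0.1, 2.0], [0.2, 1.0], [0.3, 2.0],
              [5.0, 1.0], [5.1, 2.0], [5.2, 1.0], [5.3, 2.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
fold_of = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # k = 2 stratified folds
score, mean_rank = cross_validate(X, y, fold_of)
```

Accuracy is used as the evaluation index value here; precision or F1 score could be substituted inside the model function.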
Preferably, the classification model adopts a random forest or a gradient boosting decision tree.
Preferably, when the gradient boosting decision tree classification model is trained on the training set data, the importance corresponding to each characteristic value is calculated;
for the g-th characteristic value, the importance is:
$$\hat{J}_g^2 = \frac{1}{M}\sum_{m=1}^{M} \hat{J}_g^2(T_m)$$
wherein M represents the number of trees and $\hat{J}_g^2(T_m)$ indicates the importance of the characteristic value in the m-th tree (m here indexes the trees),
$$\hat{J}_g^2(T) = \sum_{t=1}^{L-1} \hat{i}_t^2\, \mathbb{1}(v_t = g)$$
wherein L is the number of leaf nodes of the tree, L-1 is the number of non-leaf nodes, $v_t$ is the characteristic value associated with node t, and $\hat{i}_t^2$ is the reduction in squared loss after node t is split;
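The importance definition above amounts to summing, per tree, the squared-loss reductions of the non-leaf nodes that split on a given characteristic value, then averaging over the trees. A small worked sketch in plain Python follows; the tree contents and function names are hypothetical, and a real gradient boosting library would supply the node data.

```python
def tree_importance(split_nodes, g):
    """Importance of characteristic value g in one tree: sum of its split gains."""
    return sum(gain for feature, gain in split_nodes if feature == g)

def ensemble_importance(trees, g):
    """Importance of characteristic value g: average of the per-tree importances."""
    return sum(tree_importance(tree, g) for tree in trees) / len(trees)

# Two hypothetical trees; each tuple is (characteristic value index at the node,
# reduction in squared loss from splitting that node) for one non-leaf node.
trees = [
    [(0, 4.0), (1, 1.0), (0, 0.5)],
    [(1, 2.0), (2, 0.5)],
]
importances = [ensemble_importance(trees, g) for g in range(3)]
```

In this example characteristic value 2 has the smallest importance and would be the one deleted at the end of the recursion round.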
after the first round of recursion is finished, deleting the characteristic value with the smallest importance $\hat{J}_g^2$, dividing the training set data and the test set data again, performing a second round of recursive analysis, calculating the importance of the characteristic values, and obtaining the evaluation index value Score_2 of that round;
by analogy, after each round of recursion is finished, deleting the characteristic value with the smallest importance and performing the next round of recursive analysis, until only one characteristic value remains and no more can be deleted;
obtaining the evaluation index values Score_1, Score_2, ..., Score_a of all recursion rounds;
and selecting the characteristic value combination corresponding to the round with the largest evaluation index value as the optimal characteristic value combination.
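The recursion and the final selection can be sketched in plain Python as follows. The `evaluate` callback stands for one round of cross-validation and is an assumption of this sketch, as are the function names, the characteristic value names, and the toy scores; ties in the evaluation index value are broken in favor of the round with the fewest characteristic values, as described for step seven.

```python
def recursive_feature_elimination(features, evaluate):
    """evaluate(features) -> (score, {feature: importance}); returns (best subset, history)."""
    remaining = list(features)
    history = []  # one (score, feature list) entry per recursion round
    while True:
        score, importance = evaluate(remaining)
        history.append((score, list(remaining)))
        if len(remaining) == 1:  # nothing left to delete
            break
        weakest = min(remaining, key=lambda name: importance[name])
        remaining.remove(weakest)
    best_score = max(score for score, _ in history)
    best = min((feats for score, feats in history if score == best_score), key=len)
    return best, history

def toy_evaluate(feats):
    """Hypothetical stand-in for cross-validation: only rms and kurtosis are informative."""
    importance = {"rms": 0.5, "kurtosis": 0.3, "peak": 0.15, "mean": 0.05}
    score = 0.6 + 0.15 * len({"rms", "kurtosis"} & set(feats))
    return score, importance

best, history = recursive_feature_elimination(
    ["rms", "kurtosis", "peak", "mean"], toy_evaluate)
```

In a real run `evaluate` would either re-divide the stratified folds each round or reuse one division made before the recursion starts, matching the two variants described for step seven.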
The invention also provides a method for elevator health state evaluation or fault diagnosis that uses the characteristic value combination determined by the vibration signal characteristic value selection method: firstly, the characteristic values of the collected vibration signals are extracted according to the optimal characteristic value combination;
then, the elevator state corresponding to each group of vibration signals is labeled, model training is performed with a gradient boosting tree model or a support vector machine classification method, and a health state evaluation and fault diagnosis model is established;
then, vibration data are collected in real time from the elevator to be evaluated, the vibration signal characteristic values are extracted from the collected vibration data, and health state evaluation and fault diagnosis are performed on the elevator to be evaluated based on the health state evaluation and fault diagnosis model.
Preferably, a vibration sensor mounted on the elevator is triggered when the elevator runs and collects vibration signals during operation;
for the acquired vibration signals, the zero drift of the sensor is removed through data segmentation;
and then, vibration signal characteristic value selection is performed on the vibration signals to obtain the optimal characteristic value combination.
The vibration signal characteristic value selection method starts from the practical situation that the amounts of vibration signal data available for model training are unevenly distributed across states. The original vibration signal data are first preprocessed: they are divided into a normal state and m-1 abnormal states according to the operating state of the mechanical equipment, and, to prevent an excessive amount of data in one state class from influencing the subsequent characteristic value selection, redundant data are deleted so that each operating state class has n groups of vibration signal data; the vibration signal data used for extracting characteristic values therefore comprise m × n groups. Variance analysis is used to analyze the relevance between the characteristic values of the m × n groups of vibration signals and the label values of the vibration signal state labels, and a first coarse screening deletes the characteristic values with low relevance to the label values, reducing the number of vibration characteristic values. A comprehensive distance correlation coefficient is then obtained by calculating the distance correlation coefficient between each characteristic value and the other characteristic values; in this characteristic value correlation analysis, a second coarse screening deletes the characteristic values with high comprehensive distance correlation coefficients, so that the retained characteristic values are more independent of one another and far fewer than before the second coarse screening, which speeds up recursive feature elimination. The a kinds of characteristic values retained by the second coarse screening are combined into a characteristic value matrix, and recursive feature elimination is performed on the combined matrix to obtain the optimal characteristic value combination. The method can thus effectively screen the characteristic values of vibration signal data, obtain the optimal characteristic value combination for building the vibration model of the operating state of mechanical equipment with as few vibration signal characteristic values as possible, effectively reduce the dimension of the model, save computing resources, and improve the accuracy of the model.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the present invention are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a vibration signal on a mechanical device during operation of the mechanical device as collected by a vibration sensor;
FIG. 2 is a flow chart of an embodiment of a vibration signal characteristic value selection method according to the present invention;
FIG. 3 is a flow chart of a first recursive feature elimination performed by an embodiment of the vibration signal feature value selection method of the present invention;
FIG. 4 is a diagram showing the relationship between the number of characteristic values obtained by GBDT classification and the first recursive characteristic elimination and the accuracy evaluation index of the classification model;
FIG. 5 is a flow chart of a second recursive feature elimination performed by an embodiment of the vibration signal feature value selection method of the present invention;
FIG. 6 is a diagram of the relationship between the number of characteristic values obtained by SVC classification and the second recursive feature elimination and the classification model accuracy evaluation index value.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
As shown in fig. 2, the method for selecting the vibration signal characteristic value includes the following steps:
firstly, the operating state of the mechanical equipment comprises a normal state and m-1 abnormal states, wherein m is an integer greater than 1;
the original vibration signals are preprocessed so that each operating state of the mechanical equipment has n groups of vibration signal data, and redundant original vibration signals are deleted, wherein n is an integer greater than 1;
secondly, signal processing is performed on each group of preprocessed vibration signal data and b characteristic values are extracted, wherein b is an integer greater than 2; extraction from the m × n groups of preprocessed vibration signal data yields m × n × b characteristic values;
thirdly, a characteristic value matrix X and a mechanical equipment state label vector Y are constructed from the m × n × b characteristic values;
the characteristic value matrix X has m × n rows and b columns;
the mechanical equipment state label vector Y has m × n rows and 1 column;
analysis of variance is performed between each column x of the characteristic value matrix X and the mechanical equipment state label vector Y, and the f value of each characteristic value x with respect to the label vector Y is calculated; the smaller the f value, the lower the relevance between the corresponding characteristic value x and the label value;
according to the calculated f values, the characteristic values with low relevance to the label value are deleted in a first coarse screening, leaving s kinds of characteristic values, wherein s is a positive integer and b > s;
the f value $f_h$ of characteristic value h is:
$$f_h = \frac{S_{hA}/(m-1)}{S_{hE}/(N-m)}$$
$$S_{hA} = \sum_{i=1}^{m} n\,(\bar{x}_{hi} - \bar{x}_{h})^2, \qquad S_{hE} = \sum_{i=1}^{m}\sum_{j=1}^{n} (x_{hij} - \bar{x}_{hi})^2$$
wherein $S_{hA}$ is the inter-group dispersion of characteristic value h, $S_{hE}$ is the intra-group dispersion of characteristic value h, $\bar{x}_{hi}$ represents the average value of the i-th group of data under the h-th characteristic value, $\bar{x}_{h}$ represents the average value of all data under the h-th characteristic value, $x_{hij}$ represents the j-th vibration signal data of the i-th group of data under the h-th characteristic value, h is an integer with 1 ≤ h ≤ b, and N = n × m.
fourthly, the s kinds of characteristic values remaining after the first coarse screening are classified into the time domain, the frequency domain and the time-frequency domain, giving p time domain characteristic values, q frequency domain characteristic values and r time-frequency domain characteristic values, wherein p + q + r = s and p, q and r are positive integers;
fifthly, a second coarse screening is performed on the classified s kinds of characteristic values, retaining a kinds of characteristic values, wherein a is a positive integer and b > s > a;
if p is less than or equal to 2, the p time domain characteristic values are taken directly as the time domain characteristic values after the second coarse screening; if p is greater than 2, the comprehensive distance correlation coefficient of each time domain characteristic value with respect to the other time domain characteristic values is calculated, time domain characteristic value correlation analysis is performed, and the second coarse screening is then applied to the time domain characteristic values;
if q is less than or equal to 2, the q frequency domain characteristic values are taken directly as the frequency domain characteristic values after the second coarse screening; if q is greater than 2, the comprehensive distance correlation coefficient of each frequency domain characteristic value with respect to the other frequency domain characteristic values is calculated, frequency domain characteristic value correlation analysis is performed, and the second coarse screening is then applied to the frequency domain characteristic values;
if r is less than or equal to 2, the r time-frequency domain characteristic values are taken directly as the time-frequency domain characteristic values after the second coarse screening; if r is greater than 2, the comprehensive distance correlation coefficient of each time-frequency domain characteristic value with respect to the other time-frequency domain characteristic values is calculated, time-frequency domain characteristic value correlation analysis is performed, and the second coarse screening is then applied to the time-frequency domain characteristic values;
sixthly, the a kinds of characteristic values retained by the second coarse screening are combined into a characteristic value matrix;
and seventhly, recursive feature elimination is performed on the combined characteristic value matrix to obtain the optimal characteristic value combination.
Preferably, in step three, the characteristic values with low relevance to the label value are deleted by removing, according to a set ratio (for example, 25%), the portion of the b characteristic values with the smallest f values.
In the third step, constructing a characteristic value matrix X and a mechanical equipment state label vector Y from the m multiplied by n multiplied by b characteristic values, as shown in Table 1; the eigenvalue matrix X is m multiplied by n rows and b columns; the mechanical equipment state label vector Y is m multiplied by n rows and 1 column;
TABLE 1 vibration eigenvalues and tag values of preprocessed vibration signals
Figure GDA0003817262690000091
According to the state label classification of the mechanical equipment, the m × n values of characteristic value 1 extracted from the preprocessed vibration signal data can be divided into m groups: the first group of data of characteristic value 1 is denoted $X_{11}$ and comprises $x_{111}, x_{112}, \dots, x_{11n}$; the second group is denoted $X_{12}$ and comprises $x_{121}, x_{122}, \dots, x_{12n}$; ...; the m-th group is denoted $X_{1m}$ and comprises $x_{1m1}, x_{1m2}, \dots, x_{1mn}$.
The calculation of the f value is described below, taking characteristic value 1 as an example.
First, the inter-group dispersion of characteristic value 1 is calculated:
$$S_{1A} = \sum_{i=1}^{m} n\,(\bar{x}_{1i} - \bar{x}_{1})^2$$
where $\bar{x}_{1i} = \frac{1}{n}\sum_{j=1}^{n} x_{1ij}$ represents the average value of the i-th group of data under characteristic value 1, and $\bar{x}_{1} = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n} x_{1ij}$ represents the average of all data under characteristic value 1;
then the intra-group dispersion of characteristic value 1 is calculated:
$$S_{1E} = \sum_{i=1}^{m}\sum_{j=1}^{n} (x_{1ij} - \bar{x}_{1i})^2$$
finally, the f value $f_1$ of characteristic value 1 is determined according to the formula
$$f_1 = \frac{S_{1A}/(m-1)}{S_{1E}/(N-m)}$$
where N = n × m.
By analogy, the inter-group dispersion of characteristic value h is:
$$S_{hA} = \sum_{i=1}^{m} n\,(\bar{x}_{hi} - \bar{x}_{h})^2$$
the intra-group dispersion of characteristic value h is:
$$S_{hE} = \sum_{i=1}^{m}\sum_{j=1}^{n} (x_{hij} - \bar{x}_{hi})^2$$
and the f value $f_h$ of characteristic value h is:
$$f_h = \frac{S_{hA}/(m-1)}{S_{hE}/(N-m)}$$
where $\bar{x}_{hi}$ represents the average value of the i-th group of data under characteristic value h, and $\bar{x}_{h}$ represents the average of all data under characteristic value h.
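The f value calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `f_value` and the list-of-lists input layout (one inner list per state label) are assumptions made for the example, and the formula is the standard one-way analysis of variance F statistic built from the inter-group and intra-group dispersions.

```python
from statistics import fmean


def f_value(groups):
    """One-way ANOVA f value for a single characteristic value.

    groups: list of m lists, one per state label, each holding the n
    values of this characteristic value (hypothetical input layout).
    """
    m = len(groups)
    values = [x for g in groups for x in g]
    N = len(values)
    grand_mean = fmean(values)
    # Inter-group dispersion: squared distance of each group mean from
    # the overall mean, weighted by the group size.
    s_a = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Intra-group dispersion: squared distance of each value from its
    # own group mean. A value of zero would make the f value undefined.
    s_e = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (s_a / (m - 1)) / (s_e / (N - m))


overlapping = f_value([[1, 2, 3], [2, 3, 4]])
separated = f_value([[1, 2, 3], [11, 12, 13]])
print(overlapping, separated)
```

Groups whose means are far apart relative to their internal spread give a large f value, which is why a small f value indicates low relevance to the state label.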
Classifying the s characteristic values after the first coarse screening according to a time domain, a frequency domain and a time-frequency domain to obtain p time domain characteristic values, q frequency domain characteristic values and r time-frequency domain characteristic values, wherein p + q + r = s, as shown in table 2.
TABLE 2 eigenvalue classification after analysis of variance, deletion of low f eigenvalues
Figure GDA0003817262690000101
The vibration signal characteristic value selection method of the first embodiment addresses the practical situation in which the vibration signal data available for model training are unevenly distributed across states. The original vibration signal data are first preprocessed: the data are divided into a normal state and m-1 abnormal states according to the operating state of the mechanical equipment and, so that an excessive amount of data in one state classification does not distort the subsequent characteristic value selection, redundant data are deleted so that each operating state classification has n groups of vibration signal data. The vibration signal data used for extracting characteristic values therefore comprise m × n groups. Analysis of variance is then performed on the characteristic values of the m × n groups of vibration signals to analyze the relevance between each characteristic value and the label value of the vibration signal state label, and the characteristic values with low relevance to the label value are deleted in a first coarse screening, which reduces the number of vibration characteristic values. Next, the distance correlation coefficient between each characteristic value and the other characteristic values is calculated to obtain a comprehensive distance correlation coefficient, characteristic value correlation analysis is performed, and the characteristic values with a high comprehensive distance correlation coefficient are deleted in a second coarse screening, so that the characteristic values that are more independent of one another are retained. The number of characteristic values after the second coarse screening is far smaller than before it, which speeds up the recursive feature elimination. Finally, the a kinds of characteristic values retained by the second coarse screening are combined into a characteristic value matrix, and recursive feature elimination is performed on the combined matrix to obtain the optimal characteristic value combination. The method of the first embodiment thus screens the characteristic values of the vibration signal data effectively and obtains, with as few vibration signal characteristic values as possible, the optimal characteristic value combination for building a vibration model of the operating state of the mechanical equipment, which reduces the dimension of the model, saves computing resources and improves the accuracy of the model.
Example two
In the fifth step, the comprehensive distance correlation coefficient of each time domain characteristic value is obtained by calculating the sum of the distance correlation coefficients between that time domain characteristic value and the other p-1 time domain characteristic values and dividing the sum by p-1;
the comprehensive distance correlation coefficient of each frequency domain characteristic value is obtained by calculating the sum of the distance correlation coefficients of the frequency domain characteristic value and other q-1 frequency domain characteristic values and dividing the sum by q-1;
the comprehensive distance correlation coefficient of each time-frequency domain characteristic value is obtained by calculating the sum of the distance correlation coefficients of the time-frequency domain characteristic value and other r-1 time-frequency domain characteristic values and dividing the sum by r-1;
$$SdCr(A_x) = \frac{1}{y-1}\sum_{w=1,\, w\neq x}^{y} dCor(A_x, A_w)$$
where $SdCr(A_x)$ is the comprehensive distance correlation coefficient of the x-th characteristic value among the characteristic values of the same type, y is the total number of characteristic values of that type, $A_w$ is the w-th characteristic value of that type, $A_x$ is the x-th characteristic value of that type, and $dCor(A_x, A_w)$ is the distance correlation coefficient of $A_x$ and $A_w$.
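As a small illustration of this averaging, the sketch below computes the comprehensive coefficient of every characteristic value in one domain from a matrix of pairwise distance correlation coefficients. The function name `comprehensive_dcor` and the nested-list matrix layout are assumptions for the example, and the numbers are made up.

```python
def comprehensive_dcor(dcor):
    """Average each row of a pairwise distance correlation matrix,
    leaving out the diagonal entry.

    dcor: symmetric y-by-y nested list for one domain (hypothetical
    layout), where dcor[x][w] is the coefficient of values x and w.
    """
    y = len(dcor)
    return [
        sum(dcor[x][w] for w in range(y) if w != x) / (y - 1)
        for x in range(y)
    ]


# Illustrative (made-up) pairwise coefficients for three values.
pairwise = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.7],
    [0.5, 0.7, 1.0],
]
print(comprehensive_dcor(pairwise))
```

A characteristic value with a high comprehensive coefficient is strongly related to the others in its domain and is therefore a candidate for deletion.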
Example three
In the fifth step, the correlation analysis of the time domain, frequency domain and time-frequency domain characteristic values and the second coarse screening of the characteristic values are performed as follows: according to the calculated comprehensive distance correlation coefficients of the time domain, frequency domain and time-frequency domain characteristic values, the portion of characteristic values with the larger comprehensive distance correlation coefficients in each domain is deleted according to a set proportion, giving the time domain, frequency domain and time-frequency domain characteristic values after the second coarse screening.
Example four
In the fifth step, the correlation analysis of the time domain, frequency domain and time-frequency domain characteristic values and the second coarse screening of the characteristic values are performed as follows: a comprehensive distance correlation coefficient threshold is set for the time domain characteristic values, for the frequency domain characteristic values and for the time-frequency domain characteristic values; according to the calculated comprehensive distance correlation coefficients of the time domain, frequency domain and time-frequency domain characteristic values, the characteristic values whose comprehensive distance correlation coefficient is greater than the corresponding preset threshold are deleted, giving the time domain, frequency domain and time-frequency domain characteristic values after the second coarse screening.
Example five
In the vibration signal characteristic value selection method based on the third embodiment or the fourth embodiment, in the fifth step, correlation analysis of time domain, frequency domain and time-frequency domain characteristic values is performed, and a second coarse screening is performed on the characteristic values:
if the number of the obtained time domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the time domain characteristic values before characteristic value correlation analysis as screening characteristic values of the time domain characteristic values; if the number of the obtained time domain characteristic values is more than 2, taking the time domain characteristic values obtained after the characteristic value correlation analysis as screening characteristic values of the time domain characteristic values;
if the number of the obtained frequency domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the frequency domain characteristic values before characteristic value correlation analysis as screening characteristic values of the frequency domain characteristic values; if the number of the obtained frequency domain characteristic values is more than 2, the frequency domain characteristic values obtained after the characteristic value correlation analysis are used as screening characteristic values of the frequency domain characteristic values;
if the number of the obtained time-frequency domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the time-frequency domain characteristic values before the characteristic value correlation analysis as screening characteristic values of the time-frequency domain characteristic values; and if the number of the obtained time-frequency domain characteristic values is more than 2, taking the time-frequency domain characteristic values obtained after the characteristic value correlation analysis as the screening characteristic values of the time-frequency domain characteristic values.
Example six
In the vibration signal eigenvalue selection method based on the first embodiment, as shown in fig. 3, in step seven, recursive eigenvalue elimination is performed based on the combined eigenvalue matrix to obtain an optimal eigenvalue combination, which means that:
in each recursion round, a new random stratified k-fold division of the vibration signals is performed, the performance of the classification model is cross-validated, and the evaluation index value and the characteristic value importance ranking of that round are calculated;
after each recursion round, the characteristic value with the smallest importance is deleted and the next round of recursive analysis is performed, until only one characteristic value remains and no further deletion is possible;
the optimal characteristic value combination consists of all the characteristic values in the recursion round for which the evaluation index value is the largest and the number of characteristic values is the smallest.
Example seven
In the vibration signal eigenvalue selection method based on the first embodiment, as shown in fig. 5, in step seven, recursive eigenvalue elimination is performed based on the combined eigenvalue matrix to obtain an optimal eigenvalue combination, which means that:
before the recursive feature elimination begins, a random stratified k-fold division of the vibration signals is performed once;
in each recursion round, the performance of the classification model is cross-validated using the k folds already divided, and the evaluation index value and the characteristic value importance ranking of that round are calculated;
after each recursion round, the characteristic value with the smallest importance is deleted and the next round of recursive analysis is performed, until only one characteristic value remains and no further deletion is possible;
the optimal characteristic value combination consists of all the characteristic values in the recursion round for which the evaluation index value is the largest and the number of characteristic values is the smallest.
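The elimination loop shared by the sixth and seventh embodiments can be sketched as follows. The function `recursive_elimination` and the `evaluate` callback are hypothetical names introduced for illustration: the callback stands in for the cross-validated classification model and returns an evaluation index value plus an importance per characteristic value. Whether it draws new folds on every call (sixth embodiment) or reuses folds fixed beforehand (seventh embodiment) is left to the callback. The toy scorer and its weights are made up.

```python
def recursive_elimination(features, evaluate):
    """Drop the least important characteristic value each round and
    return the best combination together with the round history."""
    feats = list(features)
    history = []
    while True:
        score, importance = evaluate(feats)
        history.append((score, list(feats)))
        if len(feats) == 1:
            break  # only one characteristic value left, stop deleting
        weakest = min(feats, key=lambda name: importance[name])
        feats.remove(weakest)
    # Largest evaluation index value wins; ties go to the round with
    # the fewest characteristic values.
    _, best = max(history, key=lambda r: (r[0], -len(r[1])))
    return best, history


# Toy stand-in for the model: made-up importances, and a score that
# is simply the summed importance of the values still present.
WEIGHTS = {"e2": 0.5, "e5": 0.3, "e7": 0.0, "e10": 0.0}


def toy_evaluate(feats):
    return sum(WEIGHTS[name] for name in feats), WEIGHTS


best, history = recursive_elimination(WEIGHTS, toy_evaluate)
print(best, len(history))
```

In the toy run the score stays the same while the two uninformative values are removed and only falls once an informative one is dropped, so the two-value round is selected.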
Example eight
Based on the vibration signal characteristic value selection method of the sixth or seventh embodiment, the stratified k-fold division randomly divides all N groups of vibration data into k folds such that, within each fold, the amount of vibration data is the same for every state label classification.
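A stratified division of this kind can be sketched with the standard library alone. The function name `stratified_k_fold`, the `seed` parameter and the round-robin dealing are assumptions chosen for the illustration; the per-fold class counts come out exactly equal only when each class size is a multiple of k, as with 30 groups per state and k = 3.

```python
import random


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds, dealing the shuffled indices
    of each state label round-robin so every fold gets an equal share."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# 30 normal groups (label 0) and 30 abnormal groups (label 1).
labels = [0] * 30 + [1] * 30
folds = stratified_k_fold(labels, k=3)
print([len(fold) for fold in folds])
```

Calling the function with a different `seed` in each recursion round corresponds to re-dividing the data every round, while calling it once corresponds to fixing the folds in advance.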
Example nine
Based on the vibration signal characteristic value selection method of the sixth or seventh embodiment, the cross validation is to use k-1 folds in data divided into k folds as training set data for training a classification model and calculate the importance value of the characteristic value; taking the rest 1 fold as test set data, evaluating a classification model, and calculating a classification model evaluation index value;
and changing the data of one fold as a test set each time, taking other folds as a training set, and repeating the processes for k times to obtain evaluation index values and feature value importance values of k rounds.
Preferably, an average value of k evaluation values calculated in the k rounds is used as an evaluation index value of the recursion round.
Preferably, the evaluation index value may be a model evaluation metric such as accuracy or F1 score.
Preferably, the average ranking result of the importance of k sets of eigenvalues obtained by k rounds of calculation is used as the importance ranking of the eigenvalues of the recursion round.
Preferably, the classification model may employ a random forest, a Gradient Boosting Decision Tree (GBDT), or the like.
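The per-round aggregation described in this embodiment can be sketched as below. The function name `aggregate_round` and the input layout are hypothetical. Reading "average ranking result" as the mean of the per-fold rank positions is an assumption, since the text does not spell out how the k rankings are combined.

```python
from statistics import fmean


def aggregate_round(fold_scores, fold_importances):
    """Combine the k folds of one recursion round.

    fold_scores: k evaluation index values, one per test fold.
    fold_importances: k dicts mapping a characteristic value name to
    its importance in that fold (hypothetical layout).
    Returns the mean score and the names ordered from most to least
    important by mean rank (assumed interpretation).
    """
    k = len(fold_importances)
    names = list(fold_importances[0])
    mean_rank = {name: 0.0 for name in names}
    for importance in fold_importances:
        ordered = sorted(names, key=lambda n: importance[n], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            mean_rank[name] += rank / k
    ranking = sorted(names, key=lambda n: mean_rank[n])
    return fmean(fold_scores), ranking


score, ranking = aggregate_round(
    [0.9, 1.0, 0.8],
    [
        {"a": 3.0, "b": 2.0, "c": 1.0},
        {"a": 3.0, "b": 1.0, "c": 2.0},
        {"a": 2.0, "b": 3.0, "c": 1.0},
    ],
)
print(score, ranking)
```

The last name in `ranking` is the characteristic value that would be deleted before the next recursion round.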
Example ten
Based on the vibration signal characteristic value selection method of the ninth embodiment, when gradient boosting decision tree classification model training is performed on training set data, the importance degree corresponding to each characteristic value is calculated;
for the g-th characteristic value, the importance is:
$$\hat{J}_g^2 = \frac{1}{M}\sum_{m=1}^{M} \hat{J}_g^2(T_m)$$
where M represents the number of trees and $\hat{J}_g^2(T_m)$ indicates the importance of the characteristic value in the m-th tree,
$$\hat{J}_g^2(T_m) = \sum_{t=1}^{L-1} \hat{i}_t^2\,\mathbb{1}(v_t = g)$$
where L is the number of leaf nodes of the tree, L-1 is the number of non-leaf nodes, $v_t$ is the characteristic value associated with node t, and $\hat{i}_t^2$ is the reduction of the squared loss after node t is split;
after the first recursion round ends, the characteristic value with the smallest importance $\hat{J}_g^2$ is deleted, the training set data and the test set data are divided again, the above process is repeated in a second round of recursive analysis, the importance of the characteristic values is calculated, and the evaluation index value Score_2 of the second round of recursive analysis is obtained;
by analogy, after each round of recursion is finished, deleting the characteristic value with the minimum importance, and performing next round of recursion analysis until only one characteristic value is left and cannot be deleted;
at this point, the evaluation index values Score_1, Score_2, ..., Score_a for all recursion rounds have been obtained;
and selecting the characteristic value combination corresponding to the round with the maximum evaluation index value, namely the optimized optimal characteristic value combination.
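The importance calculation of this embodiment sums, for each tree, the squared-loss reductions of the non-leaf nodes that split on a given characteristic value and then averages over the trees. The sketch below assumes a simplified, hypothetical tree representation (a list of `(split feature, loss reduction)` pairs per tree) and made-up numbers; it does not read a trained model.

```python
def gbdt_importance(trees, g):
    """Importance of characteristic value g over M trees.

    trees: list of M trees, each given as a list of
    (split_feature, squared_loss_reduction) pairs, one pair per
    non-leaf node (hypothetical representation).
    """
    per_tree = [
        sum(gain for feature, gain in tree if feature == g)
        for tree in trees
    ]
    return sum(per_tree) / len(trees)


# Two made-up trees with three and two non-leaf nodes.
trees = [
    [("e2", 4.0), ("e5", 1.0), ("e2", 2.0)],
    [("e5", 3.0), ("e7", 1.0)],
]
importances = {g: gbdt_importance(trees, g) for g in ("e2", "e5", "e7")}
print(importances)
```

The characteristic value with the smallest result, here `e7`, is the one that would be deleted at the end of the round.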
Example eleven
A method for elevator health state evaluation or fault diagnosis using the characteristic value combination determined by the vibration signal characteristic value selection method comprises the following steps: first, characteristic values are extracted from the acquired vibration signals according to the optimal characteristic value combination;
then, the elevator state corresponding to each group of vibration signals is labeled, model training is performed using a gradient boosting tree model or a support vector machine classification method, and a health state evaluation and fault diagnosis model is established;
then, vibration data are acquired in real time from the elevator to be evaluated, the vibration signal characteristic values are extracted from the acquired vibration data, and health state evaluation and fault diagnosis are performed on the elevator to be evaluated based on the health state evaluation and fault diagnosis model.
Preferably, when the elevator runs, a vibration sensor installed on the elevator is triggered and collects the vibration signals of the running elevator;
for the acquired vibration signals, removing the zero drift of the sensor through data segmentation;
and then, selecting the vibration signal characteristic value of the vibration signal to obtain the optimal characteristic value combination.
Example twelve
Based on the vibration signal characteristic value selection method of the first embodiment, data preprocessing is performed on the vibration signal data acquired by a vibration sensor mounted on the car of the mechanical equipment. The vibration data involved cover a normal state and an abnormal state; redundant data are deleted so that each state classification has 30 groups of vibration signal data. The vibration signal data used for vibration signal characteristic value selection in this embodiment therefore comprise 60 groups. For convenience of description, the amount of vibration signal data and the number of abnormal states are reduced in this embodiment; in actual use, sufficient vibration signal data should be ensured so that the characteristic value selection is not made unreasonable by too small a data amount.
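The balancing part of this preprocessing can be sketched as follows. The function name `balance_states` and the dictionary layout are hypothetical, and keeping the first n groups of each state is an assumption made for the example: the text only says that redundant data are deleted, not which groups are removed.

```python
def balance_states(groups_by_state, n):
    """Keep exactly n groups of vibration data per operating state.

    groups_by_state: dict mapping a state label to its list of signal
    groups (hypothetical layout). The first n groups are kept, which
    is an assumed choice.
    """
    balanced = {}
    for state, groups in groups_by_state.items():
        if len(groups) < n:
            raise ValueError(f"state {state!r} has fewer than {n} groups")
        balanced[state] = groups[:n]
    return balanced


# Placeholder data: integers stand in for recorded signal groups.
raw = {"normal": list(range(45)), "abnormal": list(range(30))}
balanced = balance_states(raw, n=30)
print({state: len(groups) for state, groups in balanced.items()})
```

With two states and n = 30 this leaves the 60 groups used in the rest of the embodiment.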
Characteristic values are extracted from each group of preprocessed vibration signal data; 16 vibration characteristic values are extracted in total, comprising 6 time domain characteristic values, 5 frequency domain characteristic values and 5 time-frequency domain characteristic values, as shown in Tables 3-1 and 3-2.
Analysis of variance is performed between each column of characteristic value vectors in the characteristic value matrix and the label vector, the f value of each column is calculated, and the characteristic values are coarsely screened.
First, the inter-group dispersion of the h-th column of characteristic values is calculated:
$$S_{hA} = \sum_{i=1}^{m} n\,(\bar{x}_{hi} - \bar{x}_{h})^2$$
the intra-group dispersion of the h-th column of characteristic values is:
$$S_{hE} = \sum_{i=1}^{m}\sum_{j=1}^{n} (x_{hij} - \bar{x}_{hi})^2$$
and the f value $f_h$ of the h-th column of characteristic values is:
$$f_h = \frac{S_{hA}/(m-1)}{S_{hE}/(N-m)}$$
where $\bar{x}_{hi}$ is the average of the i-th group of data in column h and $\bar{x}_{h}$ is the average of all data in column h; h = 1, 2, ..., 16 indexes the 16 characteristic values; N = 60 is the total amount of vibration signal data; m = 2 indicates two state labels in total; and n = 30 indicates 30 groups of vibration signal data under each state classification. The f values of the characteristic values calculated from the above equations are shown in Table 4.
According to the calculated f value result, 25% of all the feature values with smaller f values are deleted, and the remaining feature values are classified according to time domain, frequency domain, and time-frequency domain, as shown in table 5.
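The 25% deletion can be reproduced from the f values listed in Table 4. In this sketch the function name `drop_lowest_f` and the rule of truncating the number of deletions to an integer are assumptions. The fifth entry of Table 4 is read as e5, also an assumption: the table prints "e4" twice, while Table 6 lists e5.

```python
def drop_lowest_f(f_values, ratio=0.25):
    """Remove the given share of characteristic values with the
    smallest f values, keeping the original column order."""
    n_drop = int(len(f_values) * ratio)  # truncation is an assumed rule
    dropped = set(sorted(f_values, key=f_values.get)[:n_drop])
    return [name for name in f_values if name not in dropped]


# f values as listed in Table 4 (fifth label assumed to be e5).
table4 = {
    "e1": 756.8, "e2": 204.7, "e3": 1105.9, "e4": 1999.4,
    "e5": 481.3, "e6": 6129.5, "e7": 49.2, "e8": 95.5,
    "e9": 24.3, "e10": 125.9, "e11": 30.5, "e12": 145.9,
    "e13": 17.5, "e14": 68.1, "e15": 5.5, "e16": 111.9,
}
kept = drop_lowest_f(table4)
print(kept)
```

Four of the 16 values (e9, e11, e13 and e15) are removed, leaving the 12 characteristic values that appear in Tables 6 to 8.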
TABLE 3-1 vibration characteristic value data extracted after mechanical equipment vibration signal preprocessing
Figure GDA0003817262690000144
Figure GDA0003817262690000151
Figure GDA0003817262690000161
TABLE 3-2 vibration characteristic value data extracted after mechanical equipment vibration signal preprocessing
Figure GDA0003817262690000162
Figure GDA0003817262690000171
Figure GDA0003817262690000181
TABLE 4 f-number calculated after analysis of variance
Characteristic value e1 e2 e3 e4 e5 e6 e7 e8
f value 756.8 204.7 1105.9 1999.4 481.3 6129.5 49.2 95.5
Characteristic value e9 e10 e11 e12 e13 e14 e15 e16
f value 24.3 125.9 30.5 145.9 17.5 68.1 5.5 111.9
TABLE 5 eigenvalue classification after elimination of Low f eigenvalues by analysis of variance
Figure GDA0003817262690000182
Because the numbers of time domain, frequency domain and time-frequency domain characteristic values are all greater than 2, the comprehensive distance correlation coefficient of each characteristic value is calculated within the time domain, frequency domain and time-frequency domain characteristic values respectively, characteristic value correlation analysis is performed, and the second coarse screening is applied to the characteristic values.
The calculation of the comprehensive distance correlation coefficient is described below, taking the time domain characteristic value e1 in column 1 as an example.
First, the L2 norm between each pair of data $x_{11}, x_{12}, \dots, x_{1N}$ under the time domain characteristic value in column 1 is calculated:
$$\mathit{a1}_{u,v} = \lVert x_{1u} - x_{1v} \rVert,$$
where u, v = 1, 2, ..., N. Taking $\mathit{a1}_{u,v}$ as matrix elements, an N × N L2 norm matrix a1 is constructed, where u denotes the row and v denotes the column.
Then, centering is performed to obtain the center distance matrix A1. Each element $\mathit{A1}_{u,v}$ of A1 is obtained from the formula:
$$\mathit{A1}_{u,v} = \mathit{a1}_{u,v} - \overline{\mathit{a1}}_{u\cdot} - \overline{\mathit{a1}}_{\cdot v} + \overline{\mathit{a1}}_{\cdot\cdot}$$
where $\mathit{a1}_{u,v}$ represents the element in the u-th row and v-th column of the symmetric L2 norm matrix a1, $\overline{\mathit{a1}}_{u\cdot}$ represents the average value of the u-th row, $\overline{\mathit{a1}}_{\cdot v}$ represents the average value of the v-th column, and $\overline{\mathit{a1}}_{\cdot\cdot}$ represents the average of all elements in the matrix a1.
Similarly, the L2 norm matrix aw under the time domain characteristic value w and the center distance matrix Aw after centering can be obtained.
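Then the distance covariance between the time domain characteristic value in column 1 and the time domain characteristic value w is calculated:
$$\mathrm{dCov}^2(A1, Aw) = \frac{1}{N^2}\sum_{u=1}^{N}\sum_{v=1}^{N} \mathit{A1}_{u,v}\,\mathit{Aw}_{u,v}$$
the distance variance of the time domain characteristic value in column 1:
$$\mathrm{dVar}^2(A1) = \frac{1}{N^2}\sum_{u=1}^{N}\sum_{v=1}^{N} \mathit{A1}_{u,v}^2$$
the distance variance of the time domain characteristic value w:
$$\mathrm{dVar}^2(Aw) = \frac{1}{N^2}\sum_{u=1}^{N}\sum_{v=1}^{N} \mathit{Aw}_{u,v}^2$$
the distance correlation coefficient of the time domain characteristic value in column 1 and the time domain characteristic value w can therefore be calculated:
$$\mathrm{dCor}(A1, Aw) = \frac{\mathrm{dCov}(A1, Aw)}{\sqrt{\mathrm{dVar}(A1)\,\mathrm{dVar}(Aw)}}$$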
Finally, the comprehensive distance correlation coefficient of the time domain characteristic value in column 1 with respect to the other time domain characteristic values is calculated:
$$SdCr(A1) = \frac{1}{p-1}\sum_{w=2}^{p} \mathrm{dCor}(A1, Aw)$$
the comprehensive distance correlation coefficients of the remaining columns of time domain characteristic values, up to the p-th column, are then obtained in turn in the same way:
$$SdCr(Ap) = \frac{1}{p-1}\sum_{w=1}^{p-1} \mathrm{dCor}(Ap, Aw)$$
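The distance correlation steps above (pairwise distances, centering, distance covariance and variance, then the coefficient) can be sketched in pure Python for two columns of scalar characteristic values. The function names are hypothetical, and the code follows the usual sample definition of distance correlation, which is assumed to be what the formulas describe; for scalar data the L2 norm reduces to an absolute difference.

```python
from math import sqrt


def centered_distances(x):
    """Pairwise distance matrix of x with row, column and overall
    means removed (the center distance matrix)."""
    n = len(x)
    a = [[abs(x[u] - x[v]) for v in range(n)] for u in range(n)]
    row = [sum(a[u]) / n for u in range(n)]
    col = [sum(a[u][v] for u in range(n)) / n for v in range(n)]
    grand = sum(row) / n
    return [
        [a[u][v] - row[u] - col[v] + grand for v in range(n)]
        for u in range(n)
    ]


def dcov2(A, B):
    """Squared sample distance covariance of two centered matrices."""
    n = len(A)
    return sum(A[u][v] * B[u][v] for u in range(n) for v in range(n)) / n ** 2


def distance_correlation(x, w):
    """Distance correlation coefficient of two equally long columns."""
    A, B = centered_distances(x), centered_distances(w)
    denom = sqrt(dcov2(A, A) * dcov2(B, B))
    if denom == 0:
        return 0.0  # a constant column carries no dependence information
    return sqrt(max(dcov2(A, B), 0.0) / denom)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
linear = distance_correlation(x, [2 * v for v in x])
curved = distance_correlation(x, [(v - 3) ** 2 for v in x])
print(linear, curved)
```

A column that is an exact linear function of another gives a coefficient of 1, so one of the two is redundant for the second coarse screening.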
The comprehensive distance correlation coefficients calculated for the characteristic values in Table 5 are shown in Table 6 for the time domain characteristic values, in Table 7 for the frequency domain characteristic values and in Table 8 for the time-frequency domain characteristic values:
TABLE 6 time domain eigenvalue integrated distance correlation coefficient
e1 e2 e3 e4 e5 e6
0.951 0.867 0.955 0.955 0.936 0.961
TABLE 7 frequency domain eigenvalue integrated distance correlation coefficient
e7 e8 e10
0.860 0.876 0.747
TABLE 8 time-frequency domain eigenvalue integrated distance correlation coefficient
e12 e14 e16
0.693 0.637 0.700
The threshold of the comprehensive distance correlation coefficient is set to 0.8 for the time domain characteristic values, 0.8 for the frequency domain characteristic values and 0.8 for the time-frequency domain characteristic values. The characteristic values whose comprehensive distance correlation coefficient is greater than the preset threshold are deleted from the time domain, frequency domain and time-frequency domain characteristic values.
After this deletion, fewer than 2 time domain characteristic values remain, so the 2 time domain characteristic values with the smallest comprehensive distance correlation coefficient, e2 and e5, are retained as the screened time domain characteristic values; fewer than 2 frequency domain characteristic values remain, so the 2 frequency domain characteristic values with the smallest comprehensive distance correlation coefficient, e7 and e10, are retained as the screened frequency domain characteristic values; more than 2 time-frequency domain characteristic values remain, so the characteristic values left after deleting those above the preset threshold, e12, e14 and e16, are used as the screened time-frequency domain characteristic values.
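The threshold screening with its fallback can be checked against the coefficients in Tables 6 to 8. The function name `screen_domain` and its parameters are hypothetical. The text states the rule for fewer than 2 and for more than 2 surviving values; treating exactly 2 survivors as "keep them" is an assumption.

```python
def screen_domain(sdcr, threshold=0.8, min_keep=2):
    """Delete values whose comprehensive coefficient exceeds the
    threshold; if fewer than min_keep survive, keep instead the
    min_keep values with the smallest coefficients."""
    kept = [name for name, coeff in sdcr.items() if coeff <= threshold]
    if len(kept) < min_keep:
        kept = sorted(sdcr, key=sdcr.get)[:min_keep]
    # Report the survivors in their original column order.
    return [name for name in sdcr if name in kept]


# Comprehensive coefficients as listed in Tables 6, 7 and 8.
time_domain = {"e1": 0.951, "e2": 0.867, "e3": 0.955,
               "e4": 0.955, "e5": 0.936, "e6": 0.961}
frequency_domain = {"e7": 0.860, "e8": 0.876, "e10": 0.747}
time_frequency_domain = {"e12": 0.693, "e14": 0.637, "e16": 0.700}

selected = (
    screen_domain(time_domain)
    + screen_domain(frequency_domain)
    + screen_domain(time_frequency_domain)
)
print(selected)
```

The result is the set of 7 characteristic values that goes into the combined matrix for recursive feature elimination.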
(I) Based on the 7 characteristic values remaining after the second coarse screening, the vibration signal data are classified using a gradient boosting decision tree (GBDT) model and recursive feature elimination is performed.
When recursive eigenvalue elimination is performed, all 60 groups of vibration data are randomly divided into 3 folds, and the vibration data amount under different state label classification under each fold is the same. Of these, 2 folds were used as training set data and the remaining 1 fold was used as test set data.
For training set data, adopting the 7 eigenvalues subjected to analysis and screening of variance analysis and eigenvalue correlation analysis to train a GBDT classification model; for the data of the test set, 7 eigenvalues after being subjected to analysis of variance and analysis and screening of eigenvalue correlation are used for evaluating the classification model obtained by training.
And changing data of one fold as a test set every time, taking other folds as a training set, repeating the process for 3 times, performing cross validation, and ensuring that all data are used as the training set and the test set to participate in model training, wherein the average value of the accuracy obtained through 3 times of calculation is used as an evaluation index value of the first round of recursion.
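The stratified 3-fold division and the fold rotation can be sketched in plain Python. The helper names, the `fit_and_score` callback, and the choice of 4 states with 15 groups each (which gives 60 groups) are assumptions made for illustration; the callback stands in for training and testing a classifier.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds with equal counts per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this label round-robin to the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validate(folds, fit_and_score):
    """Use each fold once as the test set and average the k scores."""
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)


# Assumed layout: 4 state labels with 15 groups of vibration data each.
labels = [state for state in range(4) for _ in range(15)]
folds = stratified_folds(labels, k=3)
print([len(fold) for fold in folds])  # fold sizes
```

Equal per-label counts in every fold require the number of groups per label to be divisible by k; otherwise the round-robin dealing leaves the folds off by at most one sample per label.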
When the GBDT classification model is trained on the training set data, the importance of each characteristic value is calculated. For the g-th characteristic value, the importance is:
$$\hat{J}_g^2 = \frac{1}{M}\sum_{m=1}^{M}\hat{J}_g^2(T_m)$$
where M is the number of trees and $\hat{J}_g^2(T_m)$ is the importance of the characteristic value in the m-th tree,
$$\hat{J}_g^2(T) = \sum_{t=1}^{L-1}\hat{i}_t^2\,\mathbb{1}(v_t = g)$$
where L is the number of leaf nodes of the tree, L-1 is the number of non-leaf nodes, $v_t$ is the feature associated with node t, and $\hat{i}_t^2$ is the reduction in squared loss after node t is split.
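The quantities defined above (M trees, L-1 non-leaf nodes per tree, the feature used at each node, and the squared-loss reduction of each split) are enough to compute an importance value. The sketch below assumes a hypothetical data layout in which each tree is a list of `(feature_index, loss_reduction)` pairs, one per non-leaf node; the numbers are made up.

```python
def tree_importance(tree, g):
    """Sum the squared-loss reductions of the splits that use feature g."""
    return sum(gain for feature, gain in tree if feature == g)


def ensemble_importance(trees, g):
    """Average the per-tree importance of feature g over all M trees."""
    return sum(tree_importance(tree, g) for tree in trees) / len(trees)


# Two hypothetical trees; each pair is (feature index, loss reduction).
trees = [
    [(0, 4.0), (1, 1.0), (0, 2.0)],
    [(1, 3.0), (2, 0.5)],
]
for g in range(3):
    print(g, ensemble_importance(trees, g))
```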
After the first round of recursion is finished, the characteristic value with the smallest importance is deleted, the training set data and the test set data are divided again, a second round of recursive analysis is performed, the importance of the characteristic values is calculated, and the evaluation index value of that round is obtained.
Similarly, after each round of recursion the characteristic value with the smallest importance is deleted and the next round of recursive analysis is performed, until only one characteristic value remains and no further deletion is possible. At that point, the evaluation index values of all recursion rounds have been obtained.
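The round-by-round deletion can be written as a short loop. In this sketch the `evaluate` callback is a hypothetical stand-in for cross-validated training: it receives the current characteristic values and returns a score together with an importance per characteristic value. The toy callback and its numbers are illustrative only.

```python
def recursive_feature_elimination(features, evaluate):
    """Run elimination rounds until a single characteristic value is left.

    evaluate(features) must return (score, {feature: importance}).
    Returns one (score, features) record per recursion round.
    """
    remaining = list(features)
    history = []
    while True:
        score, importance = evaluate(remaining)
        history.append((score, list(remaining)))
        if len(remaining) == 1:
            break  # nothing more can be deleted
        # Delete the characteristic value with the smallest importance.
        weakest = min(remaining, key=lambda f: importance[f])
        remaining.remove(weakest)
    return history


# Toy evaluator with fixed, made-up importances and a made-up score.
FIXED_IMPORTANCE = {"a": 0.5, "b": 0.1, "c": 0.3}


def toy_evaluate(features):
    return 1.0 / len(features), {f: FIXED_IMPORTANCE[f] for f in features}


for score, feats in recursive_feature_elimination(["a", "b", "c"], toy_evaluate):
    print(round(score, 3), feats)
```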
The optimal characteristic value combination consists of all the characteristic values of the recursion round in which the accuracy evaluation index value is largest and, among such rounds, the number of characteristic values is smallest.
Table 9 lists the accuracy evaluation index value and the least important characteristic value obtained in each round of recursion; the resulting optimal characteristic value combination is e5.
Table 9. Recursive feature elimination results

| Recursion round | Accuracy | Characteristic values | Least important characteristic value |
|---|---|---|---|
| 1 | 0.833 | e2, e5, e7, e10, e12, e14, e16 | e12 |
| 2 | 0.833 | e2, e5, e7, e10, e14, e16 | e14 |
| 3 | 0.833 | e2, e5, e7, e10, e16 | e2 |
| 4 | 0.933 | e5, e7, e10, e16 | e16 |
| 5 | 0.817 | e5, e7, e10 | e10 |
| 6 | 0.867 | e5, e7 | e7 |
| 7 | 0.983 | e5 | - |
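The selection rule (largest accuracy, then fewest characteristic values) can be checked against the Table 9 figures with a few lines of Python. The function name `best_combination` is an illustrative choice; the accuracy values and characteristic value sets are copied from the table.

```python
def best_combination(rounds):
    """Pick the round with the highest accuracy; break ties by fewer features."""
    return max(rounds, key=lambda r: (r[0], -len(r[1])))[1]


# (accuracy, characteristic values) for each recursion round of Table 9.
table_9 = [
    (0.833, ["e2", "e5", "e7", "e10", "e12", "e14", "e16"]),
    (0.833, ["e2", "e5", "e7", "e10", "e14", "e16"]),
    (0.833, ["e2", "e5", "e7", "e10", "e16"]),
    (0.933, ["e5", "e7", "e10", "e16"]),
    (0.817, ["e5", "e7", "e10"]),
    (0.867, ["e5", "e7"]),
    (0.983, ["e5"]),
]
print(best_combination(table_9))
```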
(II) Based on the 7 characteristic values remaining after the second coarse screening, recursive feature elimination is performed on the vibration signal data using support vector classification (SVC).
When recursive feature elimination is performed, all 60 groups of vibration data are divided into 3 folds such that each fold contains the same amount of vibration data for each state label. Of these, 2 folds are used as training set data and the remaining 1 fold is used as test set data.
In each subsequent round of recursion, the same 3-fold division is used for SVC classification model training and classification model test evaluation. A different fold is used as the test set each time, the process is repeated 3 times, and cross-validation yields the evaluation index value and the characteristic value importances of each round of recursion.
After each round of recursion, the characteristic value with the smallest importance is deleted and the next round of recursive analysis is performed, until only one characteristic value remains and no further deletion is possible.
The optimal characteristic value combination consists of all the characteristic values of the recursion round in which the accuracy evaluation index value is largest and, among such rounds, the number of characteristic values is smallest.
Table 10 lists the accuracy evaluation index value and the least important characteristic value obtained in each round of recursion; the resulting optimal characteristic value combination is e5.
Table 10. Recursive feature elimination results

| Recursion round | Accuracy | Characteristic values | Least important characteristic value |
|---|---|---|---|
| 1 | 0.850 | e2, e5, e7, e10, e12, e14, e16 | e2 |
| 2 | 0.850 | e5, e7, e10, e12, e14, e16 | e7 |
| 3 | 0.850 | e5, e10, e12, e14, e16 | e12 |
| 4 | 0.850 | e5, e10, e14, e16 | e14 |
| 5 | 0.850 | e5, e10, e16 | e16 |
| 6 | 0.850 | e5, e10 | e10 |
| 7 | 1.0 | e5 | - |
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (18)

1. A vibration signal characteristic value selection method is characterized by comprising the following steps:
the mechanical equipment running state comprises a normal state and m-1 abnormal states, wherein m is an integer greater than 1;
firstly, preprocessing the original vibration signals so that each mechanical equipment running state has n groups of vibration signal data, and deleting redundant original vibration signals, wherein n is an integer greater than 1;
secondly, performing signal processing on each group of preprocessed vibration signal data and extracting b characteristic values, wherein b is an integer greater than 2; the m × n groups of preprocessed vibration signal data yield m × n × b characteristic values;
thirdly, constructing a characteristic value matrix X and a mechanical equipment state label vector Y from the m × n × b characteristic values;
the characteristic value matrix X has m × n rows and b columns;
the mechanical equipment state label vector Y has m × n rows and 1 column;
performing analysis of variance between each column of characteristic values x in the characteristic value matrix X and the mechanical equipment state label vector Y, and calculating the f value of the characteristic value x with respect to the mechanical equipment state label vector Y; the smaller the f value, the lower the relevance between the corresponding characteristic value x and the label value;
according to the calculated f values, deleting the characteristic values with low relevance to the label value, thereby performing a first coarse screening of the characteristic values to obtain s characteristic values, wherein s is a positive integer and b is greater than s;
the f value $f_h$ of characteristic value h is:
$$f_h = \frac{S_{hA}/(m-1)}{S_{hE}/(N-m)}$$
$$S_{hA} = \sum_{i=1}^{m} n\,(\bar{x}_{hi} - \bar{x}_{h})^2, \qquad S_{hE} = \sum_{i=1}^{m}\sum_{j=1}^{n}(x_{hij} - \bar{x}_{hi})^2$$
wherein $S_{hA}$ is the inter-group dispersion of characteristic value h, $S_{hE}$ is the intra-group dispersion of characteristic value h, $\bar{x}_{hi}$ is the mean of the i-th group of data under the h-th characteristic value, $\bar{x}_{h}$ is the mean of all data under the h-th characteristic value, $x_{hij}$ is the j-th vibration signal data of the i-th group of data under the h-th characteristic value, h is an integer from 1 to b, and N = n × m;
fourthly, classifying the s characteristic values remaining after the first coarse screening into the time domain, the frequency domain and the time-frequency domain to obtain p time domain characteristic values, q frequency domain characteristic values and r time-frequency domain characteristic values, wherein p + q + r = s, and p, q and r are positive integers;
fifthly, performing a second coarse screening on the classified s characteristic values and retaining a characteristic values, wherein a is a positive integer and b > s > a;
if p is less than or equal to 2, directly taking p as the number of the time domain characteristic values subjected to the secondary coarse screening; if p is greater than 2, calculating a comprehensive distance correlation coefficient of each time domain characteristic value to other time domain characteristic values in the time domain characteristic values, carrying out time domain characteristic value correlation analysis, and then carrying out secondary coarse screening on the time domain characteristic values;
if q is less than or equal to 2, directly taking q as the frequency domain characteristic value number subjected to secondary coarse screening; if q is greater than 2, calculating a comprehensive distance correlation coefficient of each frequency domain characteristic value to other frequency domain characteristic values in the frequency domain characteristic values, carrying out frequency domain characteristic value correlation analysis, and carrying out secondary coarse screening on the frequency domain characteristic values;
if r is less than or equal to 2, directly taking r as the number of the time-frequency domain characteristic values subjected to the secondary coarse screening; if r is greater than 2, calculating a comprehensive distance correlation coefficient of each time-frequency domain characteristic value to other time-frequency domain characteristic values in the time-frequency domain characteristic values, performing time-frequency domain characteristic value correlation analysis, and performing secondary coarse screening on the time-frequency domain characteristic values;
sixthly, carrying out characteristic value matrix combination on the a types of characteristic values reserved by the second coarse screening;
and seventhly, carrying out recursive characteristic elimination based on the combined characteristic value matrix to obtain the optimal characteristic value combination.
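For the analysis of variance in step three of claim 1, a minimal sketch of the f value for one characteristic value is given here. It assumes the usual one-way ANOVA normalisation, with the inter-group dispersion divided by m - 1 and the intra-group dispersion divided by N - m; the function name and the sample numbers are hypothetical.

```python
def f_value(groups):
    """One-way ANOVA f value for one characteristic value.

    groups holds one list per running state, each containing that
    state's samples of the characteristic value.
    """
    m = len(groups)
    total = sum(len(g) for g in groups)  # N
    grand_mean = sum(sum(g) for g in groups) / total
    means = [sum(g) / len(g) for g in groups]
    # Inter-group dispersion S_hA and intra-group dispersion S_hE.
    s_a = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    s_e = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    # Raises ZeroDivisionError if every group is constant (s_e == 0).
    return (s_a / (m - 1)) / (s_e / (total - m))


# Three hypothetical states with three samples each.
print(f_value([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]))
```

A characteristic value whose group means are well separated relative to the spread inside each group gets a large f value, which matches the statement that a small f value indicates low relevance to the label.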
2. The vibration signal characteristic value selection method according to claim 1,
in step three, deleting the characteristic values with low relevance to the label value means deleting, according to a set proportion, the portion of the b characteristic values that have the smaller f values.
3. The vibration signal characteristic value selection method according to claim 2,
the set proportion is 25%.
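The proportional deletion of claims 2 and 3 can be sketched as follows. Rounding the number of deleted characteristic values down is an assumption, as are the function name and the f values used in the example.

```python
import math


def first_coarse_screen(f_values, drop_ratio=0.25):
    """Delete the drop_ratio share of characteristic values with the smallest f."""
    n_drop = math.floor(len(f_values) * drop_ratio)  # assumed rounding rule
    ranked = sorted(f_values, key=f_values.get)      # ascending f value
    dropped = set(ranked[:n_drop])
    # Keep the original order of the surviving characteristic values.
    return [name for name in f_values if name not in dropped]


# Eight hypothetical characteristic values; 25% means two are deleted.
f_values = {"x1": 12.0, "x2": 0.4, "x3": 7.5, "x4": 30.1,
            "x5": 1.1, "x6": 9.9, "x7": 15.2, "x8": 3.3}
print(first_coarse_screen(f_values))
```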
4. The vibration signal characteristic value selection method according to claim 1,
the comprehensive distance correlation coefficient of each time domain characteristic value is obtained by calculating the sum of the distance correlation coefficients of the time domain characteristic value and other p-1 time domain characteristic values and dividing the sum by p-1;
the comprehensive distance correlation coefficient of each frequency domain characteristic value is obtained by calculating the sum of the distance correlation coefficients of the frequency domain characteristic value and other q-1 frequency domain characteristic values and dividing the sum by q-1;
the comprehensive distance correlation coefficient of each time-frequency domain characteristic value is obtained by calculating the sum of the distance correlation coefficients of the time-frequency domain characteristic value and other r-1 time-frequency domain characteristic values and dividing the sum by r-1;
$$\mathrm{SdCr}(A_x) = \frac{1}{y-1}\sum_{w=1,\,w\neq x}^{y}\mathrm{dCor}(A_x, A_w)$$
wherein SdCr(Ax) is the comprehensive distance correlation coefficient of the x-th characteristic value of a given class, y is the total number of characteristic values of that class, Aw is the w-th characteristic value of that class, Ax is the x-th characteristic value of that class, and dCor(Ax, Aw) is the distance correlation coefficient of Ax and Aw.
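A plain-Python sketch of the comprehensive distance correlation coefficient of claim 4 is shown below. It assumes the sample distance correlation built from double-centred pairwise distance matrices; the claim does not say which estimator is used, and the function names and data are hypothetical.

```python
import math


def _centered_distances(v):
    """Double-centre the matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(row) / n for row in d]  # equals column mean (symmetric)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation dCor(x, y), a value in [0, 1]."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant series carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def comprehensive_dcor(features, x):
    """Mean distance correlation between feature x and every other feature."""
    others = [w for w in features if w != x]
    return sum(distance_correlation(features[x], features[w])
               for w in others) / len(others)


# Three hypothetical characteristic values of the same class.
features = {
    "A1": [1.0, 2.0, 4.0, 7.0, 11.0],
    "A2": [3.0, 5.0, 9.0, 15.0, 23.0],   # exactly 2 * A1 + 1
    "A3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(comprehensive_dcor(features, "A1"))
```

A characteristic value that is strongly related to the others in its class gets a coefficient close to 1, which is why values with large coefficients are treated as redundant in the second coarse screening.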
5. The vibration signal characteristic value selection method according to claim 1,
in step five, performing the time domain, frequency domain and time-frequency domain characteristic value correlation analysis and the second coarse screening of the characteristic values means: according to the calculated comprehensive distance correlation coefficients of the time domain, frequency domain and time-frequency domain characteristic values, deleting, according to a set proportion, the portion of characteristic values with the larger comprehensive distance correlation coefficients in each of the time domain, the frequency domain and the time-frequency domain, to obtain the time domain, frequency domain and time-frequency domain characteristic values after the second coarse screening by characteristic value correlation analysis.
6. The vibration signal characteristic value selection method according to claim 1,
in step five, performing the time domain, frequency domain and time-frequency domain characteristic value correlation analysis and the second coarse screening of the characteristic values means: setting a time domain characteristic value comprehensive distance correlation coefficient threshold, a frequency domain characteristic value comprehensive distance correlation coefficient threshold and a time-frequency domain characteristic value comprehensive distance correlation coefficient threshold, and, according to the calculated comprehensive distance correlation coefficients of the time domain, frequency domain and time-frequency domain characteristic values, deleting the characteristic values whose comprehensive distance correlation coefficient is greater than the corresponding preset threshold, to obtain the time domain, frequency domain and time-frequency domain characteristic values after the second coarse screening by characteristic value correlation analysis.
7. The vibration signal characteristic value selection method according to claim 5 or 6,
and step five, performing correlation analysis on the time domain, the frequency domain and the time-frequency domain characteristic values, and performing secondary coarse screening on the characteristic values:
if the number of the obtained time domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the time domain characteristic values before characteristic value correlation analysis as screening characteristic values of the time domain characteristic values; if the number of the obtained time domain characteristic values is more than 2, taking the time domain characteristic values obtained after the characteristic value correlation analysis as screening characteristic values of the time domain characteristic values;
if the number of the obtained frequency domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the frequency domain characteristic values before the characteristic value correlation analysis as screening characteristic values of the frequency domain characteristic values; if the number of the obtained frequency domain characteristic values is more than 2, taking the frequency domain characteristic values obtained after the characteristic value correlation analysis as screening characteristic values of the frequency domain characteristic values;
if the number of the obtained time-frequency domain characteristic values is less than 2, keeping 2 characteristic values with the minimum comprehensive distance correlation coefficient in the time-frequency domain characteristic values before the characteristic value correlation analysis as screening characteristic values of the time-frequency domain characteristic values; and if the number of the obtained time-frequency domain characteristic values is more than 2, taking the time-frequency domain characteristic values obtained after the characteristic value correlation analysis as the screening characteristic values of the time-frequency domain characteristic values.
8. The vibration signal characteristic value selection method according to claim 1,
in step seven, performing recursive feature elimination based on the combined characteristic value matrix to obtain the optimal characteristic value combination means:
in each round of recursion, randomly performing a new stratified k-fold division of the vibration signals, cross-validating the performance of the classification model, and calculating the evaluation index value and the characteristic value importance ranking of that round;
after each round of recursion, deleting the characteristic value with the smallest importance and performing the next round of recursive analysis, until only one characteristic value remains and no further deletion is possible;
selecting, as the optimal characteristic value combination, all the characteristic values of the recursion round in which the evaluation index value is largest and the number of characteristic values is smallest.
9. The vibration signal characteristic value selection method according to claim 1,
in step seven, performing recursive feature elimination based on the combined characteristic value matrix to obtain the optimal characteristic value combination means:
before recursive feature elimination begins, randomly performing a stratified k-fold division of the vibration signals;
in each round of recursion, cross-validating the performance of the classification model using the already divided k-fold data, and calculating the evaluation index value and the characteristic value importance ranking of that round;
after each round of recursion, deleting the characteristic value with the smallest importance and performing the next round of recursive analysis, until only one characteristic value remains and no further deletion is possible;
selecting, as the optimal characteristic value combination, all the characteristic values of the recursion round in which the evaluation index value is largest and the number of characteristic values is smallest.
10. The vibration signal characteristic value selection method according to claim 8 or 9,
the stratified k-fold division randomly divides all N groups of vibration data into k folds such that each fold contains the same amount of vibration data for each state label.
11. The vibration signal characteristic value selection method according to claim 8 or 9,
the cross validation is to take k-1 folds in the data divided into k folds as training set data for training a classification model and calculate an importance value of a characteristic value; taking the rest 1 fold as test set data, evaluating a classification model, and calculating a classification model evaluation index value;
and changing the data of one fold as a test set each time, taking other folds as a training set, and repeating the processes for k times to obtain evaluation index values and feature value importance values of k rounds.
12. The vibration signal characteristic value selection method according to claim 11,
and taking the average value of k evaluation values obtained by k rounds of calculation as an evaluation index value of the recursion round.
13. The vibration signal characteristic value selection method according to claim 11,
the evaluation index value is the accuracy, precision or F1 score of the model.
14. The vibration signal characteristic value selection method according to claim 11,
the average of the k characteristic value importance rankings obtained in the k rounds of calculation is taken as the characteristic value importance ranking of that round of recursion.
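One possible reading of claim 14 is that each fold produces a ranking and the rankings are averaged position by position. The sketch below implements that reading (mean rank, with rank 1 as the most important); averaging the raw importance values instead would be an equally plausible interpretation. The function name and the numbers are hypothetical.

```python
def average_importance_ranking(fold_importances):
    """Order features by their mean rank over the k folds (best first)."""
    features = list(fold_importances[0])
    rank_sum = {f: 0 for f in features}
    for importance in fold_importances:
        ordered = sorted(features, key=lambda f: importance[f], reverse=True)
        for rank, f in enumerate(ordered, start=1):
            rank_sum[f] += rank
    k = len(fold_importances)
    return sorted(features, key=lambda f: rank_sum[f] / k)


# Importances from three hypothetical folds.
folds = [
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.4, "b": 0.5, "c": 0.1},
    {"a": 0.6, "b": 0.3, "c": 0.1},
]
ranking = average_importance_ranking(folds)
print(ranking)      # most important first
print(ranking[-1])  # candidate for deletion in this round
```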
15. The vibration signal characteristic value selection method according to claim 11,
the classification model adopts a random forest or gradient boosting decision tree.
16. The vibration signal characteristic value selection method according to claim 11,
when the gradient boosting decision tree classification model is trained on the training set data, the importance of each characteristic value is calculated;
for the g-th characteristic value, the importance is:
$$\hat{J}_g^2 = \frac{1}{M}\sum_{m=1}^{M}\hat{J}_g^2(T_m)$$
wherein M is the number of trees and $\hat{J}_g^2(T_m)$ is the importance of the characteristic value in the m-th tree,
$$\hat{J}_g^2(T) = \sum_{t=1}^{L-1}\hat{i}_t^2\,\mathbb{1}(v_t = g)$$
wherein L is the number of leaf nodes of the tree, L-1 is the number of non-leaf nodes, $v_t$ is the feature associated with node t, and $\hat{i}_t^2$ is the reduction in squared loss after node t is split;
after the first round of recursion is finished, the characteristic value with the smallest importance $\hat{J}_g^2$ is deleted, the training set data and the test set data are divided again, and a second round of recursive analysis is performed, in which the importance of the characteristic values is calculated and the evaluation index value Score_2 of that round is obtained;
by analogy, after each round of recursion is finished, deleting the characteristic value with the minimum importance, and performing next round of recursion analysis until only one characteristic value is left and cannot be deleted;
obtaining the evaluation index values Score_1, Score_2, ..., Score_a of all recursion rounds;
and selecting the characteristic value combination corresponding to the round with the maximum evaluation index value, namely the optimized optimal characteristic value combination.
17. A method for evaluating the health status or diagnosing a fault of an elevator by using a combination of characteristic values determined by the vibration signal characteristic value selection method according to claim 1,
firstly, extracting the characteristic value of the acquired vibration signal through the optimal characteristic value combination;
then, labeling the elevator state corresponding to each group of vibration signals, performing model training using a gradient boosting tree model or a support vector machine classification method, and establishing a health state evaluation and fault diagnosis model;
and then, real-time vibration data acquisition is carried out on the elevator to be evaluated, the vibration signal characteristic value is extracted according to the acquired vibration data, and health state evaluation and fault diagnosis are carried out on the elevator to be evaluated on the basis of the health state evaluation and fault diagnosis model.
18. The method of elevator health assessment or fault diagnosis according to claim 17,
when the elevator runs, a vibration sensor arranged on the elevator triggers work and collects vibration signals when the elevator runs;
for the acquired vibration signals, removing the zero drift of the sensor through data segmentation;
and then, selecting the vibration signal characteristic value of the vibration signal to obtain the optimal characteristic value combination.
CN202011202576.8A 2020-11-02 2020-11-02 Vibration signal characteristic value selection method and elevator health state evaluation or fault diagnosis method Active CN112380932B (en)

Publications (2)

Publication Number Publication Date
CN112380932A CN112380932A (en) 2021-02-19
CN112380932B true CN112380932B (en) 2022-10-14

Family

ID=74577108


Country Status (1)

Country Link
CN (1) CN112380932B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant