CN111626418B - Compressor fault classification method based on balanced binary tree integrated pruning strategy - Google Patents

Compressor fault classification method based on balanced binary tree integrated pruning strategy Download PDF

Info

Publication number
CN111626418B
Authority
CN
China
Prior art keywords
binary tree
precision
balanced binary
pool
integration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010458446.4A
Other languages
Chinese (zh)
Other versions
CN111626418A (en)
Inventor
邓晓衡
蔚永
黑聪
刘梦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202010458446.4A priority Critical patent/CN111626418B/en
Publication of CN111626418A publication Critical patent/CN111626418A/en
Application granted granted Critical
Publication of CN111626418B publication Critical patent/CN111626418B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an ensemble pruning strategy based on a balanced binary tree, comprising the following steps: S1, initializing a base classifier ensemble pool: splitting a large data set into a plurality of sub data sets, and training and testing a base classifier on each sub data set to form an initial complete classifier pool; S2, constructing a balanced binary tree to form the final sub-ensemble; and S3, predicting and classifying new data samples with the retained optimal sub-ensemble. The invention addresses the technical problems that overfitting arises easily, that base classifiers with excessively high or excessively low test precision are difficult to remove, and that generalization performance suffers.

Description

Compressor fault classification method based on balanced binary tree integrated pruning strategy
Technical Field
The invention relates to the technical field of ensemble learning, in particular to a compressor fault classification method based on a balanced binary tree ensemble pruning strategy.
Background
Ensemble learning overcomes many of the problems a single classifier faces when training on and learning from massive data. However, because ensemble learning completes a prediction or classification task by combining many single classifiers into an ensemble pool, it places high demands on computer hardware resources. The usual remedy is an ensemble pruning strategy: reduce the number of single classifiers used as far as possible while ensuring that the final prediction or classification precision of the ensemble does not decrease, and ideally improves.
Existing ensemble pruning strategies fall into several categories. Clustering-based pruning treats the test precision of each base classifier in the ensemble pool as a data point, performs a clustering task, and selects the subset of base classifiers corresponding to the cluster containing most data points as the final ensemble. Optimization-based pruning formulates the test results of all base classifiers in the pool as an optimization problem and searches for the optimal sub-ensemble. Reinforcement-learning-based pruning searches for the optimal sub-ensemble through repeated trial and error guided by a reinforcement algorithm. Sequence-based pruning obtains the sub-ensemble by ranking the precision of all base classifiers.
Traditional sequence-based ensemble pruning, however, can lead to overfitting. The present strategy improves on it: the properties of a balanced binary tree are used to eliminate the base classifiers in the ensemble pool whose test precision is either too high or too low, and the remaining base classifiers, which have better generalization performance, are retained as the final sub-ensemble.
Disclosure of Invention
The invention provides a compressor fault classification method based on a balanced binary tree ensemble pruning strategy, and aims to solve the problems identified in the background art: overfitting arises easily, base classifiers with excessively high or excessively low test precision are difficult to remove, and generalization performance suffers.
In order to achieve the above object, an embodiment of the present invention provides a compressor fault classification method based on a balanced binary tree ensemble pruning strategy, comprising the following steps:
S1, initializing the base classifier ensemble pool: splitting the large data set into a plurality of sub data sets, and training and testing a base classifier on each sub data set to form the initial complete classifier pool;
S2, constructing a balanced binary tree to form the final sub-ensemble: building a balanced binary tree from the precisions of the base classifiers in the pool, where each node of the tree represents the training precision of one base classifier in the ensemble pool; eliminating some leaf nodes of the lower left and lower right branches of the tree by means of a boundary pruning function, and retaining the nodes of the middle trunk to form the final sub-ensemble;
and S3, predicting and classifying new data samples with the retained optimal sub-ensemble.
In S1, an artificial neural network (ANN) is used as the base classifier to complete the training and testing work, yielding the initial base classifier pool and the training precision of each base classifier.
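As an illustration only, the sketch below shows how such a base classifier pool could be initialized. The patent specifies only that an artificial neural network is used, so the choice of scikit-learn's MLPClassifier, the network size, the train/test split, and the function names are all assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def init_classifier_pool(sub_datasets):
    """Train one ANN per sub data set and record its held-out precision (step S1)."""
    pool = []
    for X, y in sub_datasets:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
        clf.fit(X_tr, y_tr)
        precision = clf.score(X_te, y_te)   # accuracy on the held-out split, used as the node value
        pool.append((clf, precision))
    return pool
```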
In S2, the precision of the base classifier represented by the root node ranks in the middle of the precisions of all base classifiers in the ensemble pool; the base classifiers represented by the leaf nodes of the lower left branch rank at the bottom, and the base classifiers represented by the leaf nodes of the lower right branch rank at the top.
Also in S2, the numbers of leaf nodes on the left and right branches of the root node are counted, pruning thresholds are set, and nodes are eliminated according to those thresholds.
The averages of the node values on the left branch and the right branch of the balanced binary tree are used as the left and right pruning thresholds, respectively:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i.
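As a minimal sketch of how these thresholds could be computed, the snippet below assumes that the left branch of the balanced tree holds the below-median precision values and the right branch the above-median values, which is how a binary search tree keyed on precision arranges them; the function name is illustrative.

```python
from statistics import mean, median

def pruning_thresholds(precisions):
    """Branch-wise averages of the node values, used as the left/right pruning thresholds."""
    root = median(precisions)                    # root node holds the middle-ranked precision
    left = [p for p in precisions if p < root]   # lower-left branch: low-precision classifiers
    right = [p for p in precisions if p > root]  # lower-right branch: high-precision classifiers
    return mean(left), mean(right)
```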
In S1, the large data set comprises a data set collected under normal conditions and data sets collected under abnormal conditions.
The ratio of the number of samples under normal conditions to the number of samples under abnormal conditions ranges from 100:1 to 1000:1.
The segmentation work in step S1 is specifically: the data set under normal conditions is split into normal-condition sub data sets, and each normal-condition sub data set is merged with the data sets of the other, abnormal conditions to form data sets covering all operating conditions.
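A sketch of this segmentation, assuming the normal-condition data are split into k equal parts with NumPy and every abnormal-condition data set is reused in each merged sub data set; the function and parameter names are illustrative, not taken from the patent.

```python
import numpy as np

def build_sub_datasets(X_normal, y_normal, abnormal_sets, k=50):
    """Split the normal-condition data into k parts and merge each part with all
    abnormal-condition data sets, so every sub data set covers all operating conditions."""
    sub_datasets = []
    for X_part, y_part in zip(np.array_split(X_normal, k), np.array_split(y_normal, k)):
        Xs, ys = [X_part], [y_part]
        for X_ab, y_ab in abnormal_sets:     # abnormal data are reused in every sub data set
            Xs.append(X_ab)
            ys.append(y_ab)
        sub_datasets.append((np.concatenate(Xs), np.concatenate(ys)))
    return sub_datasets
```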
The scheme of the invention has the following beneficial effects:
the method has the advantages that the balanced binary tree is used for removing the partial base classifiers with too poor generalization capability and too poor precision in the integrated pool, in the removing process, the integral base classification pool is sequentially constructed, the base classifiers with too good training precision can be deleted conveniently, the partial base classifiers are often classifiers with too poor generalization capability, and on the other hand, the classifiers with poor training precision are deleted, so that the integral integrated pool is ensured to have higher precision finally. Therefore, the invention considers the whole integrated pool, and further realizes the elimination work of the bad base classifier.
Another advantage is that the ordered nature of the balanced binary tree is exploited: every base classifier in the ensemble pool is inserted into the tree according to its precision, so the constructed tree is inherently sorted. Trimming the lower left and lower right branches of the tree then removes the classifiers in the pool whose precision is too high or too low. Removing the base classifiers with excessively high precision avoids overfitting, while removing those with excessively low precision improves the overall ensemble precision, so that the retained subset achieves the best generalization performance. In addition, reducing the scale of the ensemble pool reduces the algorithm model's dependence on computer hardware resources as far as possible.
Drawings
FIG. 1 is a flow chart of a compressor fault classification method based on a balanced binary tree integrated pruning strategy of the present invention;
FIG. 2 shows the test accuracy of the final sub-ensemble obtained under each of four different pruning strategies used to eliminate the poor base classifiers from the base classifier pool.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
To address the existing problems, the invention provides a compressor fault classification method based on a balanced binary tree ensemble pruning strategy, which comprises the following steps:
s1, initializing a base classifier integration pool: segmenting the large data set to form a plurality of sub data sets, and training and testing each sub data set to form an initial complete classifier pool; the big data comprises a data set under a normal condition and a data set under an abnormal condition. The number ratio of the data sets in the normal case to the data sets in the abnormal case ranges from 100: 1-1000: 1.
The segmentation work is specifically: the data set under normal conditions is split into normal-condition sub data sets, and each normal-condition sub data set is merged with the data sets of the other, abnormal conditions to form data sets covering all operating conditions.
The training and testing work specifically uses an artificial neural network (ANN) as the base classifier, yielding the initial base classifier pool and the training precision of each base classifier.
S2, constructing a balanced binary tree to form the final sub-ensemble: a balanced binary tree is built from the precisions of the base classifiers in the pool, each node of the tree representing the training precision of one base classifier in the ensemble pool. The precision of the base classifier at the root node ranks in the middle of all precisions in the pool; the base classifiers at the leaf nodes of the lower left branch rank at the bottom, and those at the leaf nodes of the lower right branch rank at the top. The numbers of leaf nodes on the left and right branches of the root are counted, pruning thresholds are set, and nodes are eliminated accordingly: through a boundary pruning function, some leaf nodes of the lower left and lower right branches are removed, and the nodes of the middle trunk are retained to form the final sub-ensemble. The averages of the node values on the left and right branches of the tree serve as the left and right pruning thresholds, respectively:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i.
And S3, predicting and classifying new data samples with the retained optimal sub-ensemble.
The scheme has been applied to fault diagnosis of natural gas compressors with good results. The selected compressors carry multiple built-in sensors, each acquiring data every 3 seconds; a data set with a sample size in the tens of millions over a given period was selected, since a large data set suits the proposed model well.
The first step: data are collected under the normal operating condition of the compressor and under 4 common fault conditions, and the data volume under each operating condition is counted. Each compressor yields 5 data sets in total: Data01 under the normal condition, Data02 under the low-inlet-pressure fault, Data03 under the exhaust-valve-leakage fault, Data04 under the high-recovery-tank-pressure fault, and Data05 under the rotating-shaft-wobble fault. The sample-count ratio of the data sets is Data01 : Data02 : Data03 : Data04 : Data05 = 603 : 3 : 7 : 5 : 2.
The second step: the normal-condition data set is split evenly into 50 parts, giving 50 normal-condition sub data sets, and each of them is merged with the data sets of the other 4 abnormal conditions to form 50 data sets covering the 5 operating conditions.
The third step: an ANN is trained on each of the 50 sub data sets obtained in the second step, and the prediction precision of the ANN model on each sub data set is recorded.
The fourth step: a balanced binary tree containing 50 nodes is constructed from the precision values of the ANN models obtained in the third step, each node of the tree representing one ANN model.
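A minimal sketch of such a construction, assuming the balanced tree is built by recursively placing the median of the precision-sorted models at the root (the patent does not prescribe a particular balancing scheme, so an AVL or red-black tree would serve equally well); the class and function names are illustrative.

```python
class Node:
    def __init__(self, precision, model):
        self.precision = precision   # training precision of one ANN base classifier
        self.model = model
        self.left = None             # subtree of lower-precision classifiers
        self.right = None            # subtree of higher-precision classifiers

def build_balanced_tree(pool):
    """Build a balanced binary search tree from (model, precision) pairs."""
    items = sorted(pool, key=lambda mp: mp[1])          # sort by precision
    def build(lo, hi):
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        model, precision = items[mid]
        node = Node(precision, model)
        node.left = build(lo, mid - 1)
        node.right = build(mid + 1, hi)
        return node
    return build(0, len(items) - 1)
```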
The fifth step: pruning thresholds are set, with the averages of the node values on the left and right branches of the balanced binary tree serving as the left and right pruning thresholds, respectively. The nodes of the left branch are then traversed and compared with the left pruning threshold, and any node whose value is below that threshold is deleted from the tree; the nodes of the right branch are likewise traversed and compared with the right pruning threshold, and any node whose value exceeds that threshold is deleted from the tree.
The left branch pruning threshold and the right branch pruning threshold are respectively as follows:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i.
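A sketch of the pruning pass for this step, reusing the Node structure from the construction sketch above; returning the survivors as a flat list of models is an implementation convenience, not something the patent specifies.

```python
def prune(root, left_threshold, right_threshold):
    """Drop left-branch nodes below the left threshold and right-branch nodes above
    the right threshold; keep the root and the remaining middle-trunk classifiers."""
    if root is None:
        return []
    kept = [root.model]

    def collect(node, keep_fn):
        if node is None:
            return
        if keep_fn(node.precision):
            kept.append(node.model)
        collect(node.left, keep_fn)
        collect(node.right, keep_fn)

    collect(root.left, lambda p: p >= left_threshold)    # discard too-low precision classifiers
    collect(root.right, lambda p: p <= right_threshold)  # discard too-high precision classifiers
    return kept
```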
The sixth step: pruning yields a smaller balanced binary tree, i.e. a smaller-scale ensemble learning model.
The seventh step: a new data set is collected from the compressor and fed into the ensemble model pruned by the balanced binary tree for fault classification and diagnosis.
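As an illustration of this final step, the sketch below classifies new compressor samples by a simple majority vote over the retained sub-ensemble; the patent does not state the combination rule, so plain voting, and the assumption that each retained base classifier exposes a predict method, are illustrative choices.

```python
import numpy as np

def classify(sub_ensemble, X_new):
    """Majority vote of the retained base classifiers on new compressor samples."""
    votes = np.stack([clf.predict(X_new) for clf in sub_ensemble])   # shape: (n_models, n_samples)
    predictions = []
    for column in votes.T:                                           # one column per sample
        labels, counts = np.unique(column, return_counts=True)
        predictions.append(labels[np.argmax(counts)])                # most-voted fault class
    return np.array(predictions)
```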
Through the imbalance handling of the data set and the ensemble pruning, the model classifies more accurately while saving a large amount of computer hardware resources, which gives the method strong application value in industrial production.
The compressor fault classification method based on the balanced binary tree integrated pruning strategy has the following technical advantages:
the method has the advantages that the balanced binary tree is used for removing the partial base classifiers with too poor generalization capability and too poor precision in the integrated pool, in the removing process, the integral base classification pool is sequentially constructed, the base classifiers with too good training precision can be deleted conveniently, the partial base classifiers are often classifiers with too poor generalization capability, and on the other hand, the classifiers with poor training precision are deleted, so that the integral integrated pool is ensured to have higher precision finally. Therefore, the invention considers the whole integrated pool, and further realizes the elimination work of the bad base classifier.
Another advantage is that the ordered nature of the balanced binary tree is exploited: every base classifier in the ensemble pool is inserted into the tree according to its precision, so the constructed tree is inherently sorted. Trimming the lower left and lower right branches of the tree then removes the classifiers in the pool whose precision is too high or too low. Removing the base classifiers with excessively high precision avoids overfitting, while removing those with excessively low precision improves the overall ensemble precision, so that the retained subset achieves the best generalization performance. In addition, reducing the scale of the ensemble pool reduces the algorithm model's dependence on computer hardware resources as far as possible.
As shown in FIG. 2, four different pruning strategies were used to eliminate the poor base classifiers from the base classifier pool, and the test precision of the final sub-ensemble obtained under each strategy was recorded. The four pruning strategies are based on fuzzy clustering, on an optimization problem, on a sequence, and on the balanced binary tree, respectively. Compared with the other pruning strategies, the compressor fault classification method based on the balanced binary tree ensemble pruning strategy achieves better precision and therefore has better practical value.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A compressor fault classification method based on a balanced binary tree integrated pruning strategy is characterized by comprising the following steps:
S1, initializing a base classifier ensemble pool: collecting data under the normal operating condition of a compressor and under multiple fault conditions to obtain a large data set; splitting the large data set into a plurality of sub data sets; for each sub data set, completing training and testing with an artificial neural network (ANN) as the base classifier, and obtaining the initial base classifier pool and the training precision of each base classifier, thereby forming the initial complete classifier pool;
S2, constructing a balanced binary tree to form the final sub-ensemble: constructing a balanced binary tree from the precisions of the base classifiers in the base classifier pool, each node of the tree representing the training precision of one base classifier in the ensemble pool; counting the numbers of leaf nodes on the left and right branches of the root node, setting pruning thresholds through a boundary pruning function, removing some leaf nodes of the lower left and lower right branches of the tree according to the thresholds, and retaining the nodes of the middle trunk to form the final sub-ensemble, wherein the precision of the base classifier at the root node ranks in the middle of all base classifier precisions in the ensemble pool, the base classifiers at the leaf nodes of the lower left branch rank at the bottom, and the base classifiers at the leaf nodes of the lower right branch rank at the top, and wherein the averages of the node values on the left and right branches of the balanced binary tree are used as the left and right pruning thresholds, respectively:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i;
and S3, collecting a new data set from the compressor, and predicting and classifying the new data samples with the retained optimal sub-ensemble to obtain the fault classification result of the compressor.
2. The compressor fault classification method based on the balanced binary tree integrated pruning strategy according to claim 1, wherein in step S1 the large data set comprises a data set under normal conditions and data sets under abnormal conditions.
3. The compressor fault classification method based on the balanced binary tree integrated pruning strategy according to claim 2, wherein the ratio of the number of samples in the normal-condition data set to the number of samples in the abnormal-condition data sets ranges from 100:1 to 1000:1.
4. The compressor fault classification method based on the balanced binary tree integrated pruning strategy according to claim 2, wherein the splitting in step S1 is specifically: splitting the normal-condition data set into normal-condition sub data sets, and merging each normal-condition sub data set with the data sets of the other, abnormal conditions to form data sets covering all operating conditions.
CN202010458446.4A 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy Active CN111626418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010458446.4A CN111626418B (en) 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010458446.4A CN111626418B (en) 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy

Publications (2)

Publication Number Publication Date
CN111626418A CN111626418A (en) 2020-09-04
CN111626418B true CN111626418B (en) 2022-04-22

Family

ID=72273131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010458446.4A Active CN111626418B (en) 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy

Country Status (1)

Country Link
CN (1) CN111626418B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033617A (en) * 2015-03-16 2016-10-19 广州四三九九信息科技有限公司 Method for performing game picture intelligent compression by combining with visualization tool
CN109033632A (en) * 2018-07-26 2018-12-18 北京航空航天大学 A kind of trend forecasting method based on depth quantum nerve network
CN110569867A (en) * 2019-07-15 2019-12-13 山东电工电气集团有限公司 Decision tree algorithm-based power transmission line fault reason distinguishing method, medium and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050072A1 (en) * 2003-09-03 2005-03-03 Lucent Technologies, Inc. Highly parallel tree search architecture for multi-user detection
US8457441B2 (en) * 2008-06-25 2013-06-04 Microsoft Corporation Fast approximate spatial representations for informal retrieval

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033617A (en) * 2015-03-16 2016-10-19 广州四三九九信息科技有限公司 Method for performing game picture intelligent compression by combining with visualization tool
CN109033632A (en) * 2018-07-26 2018-12-18 北京航空航天大学 A kind of trend forecasting method based on depth quantum nerve network
CN110569867A (en) * 2019-07-15 2019-12-13 山东电工电气集团有限公司 Decision tree algorithm-based power transmission line fault reason distinguishing method, medium and equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An imbalanced data classification method based on automatic clustering under-sampling; Xiaoheng Deng et al.; 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC); IEEE; 2016-12-11; 1-8 *
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning; Rajeev Rastogi et al.; Data Mining and Knowledge Discovery; 2000-10-31; 315-344 *
Research on fault diagnosis methods based on data mining and information fusion; Sun Weixiang; Wanfang Data; 2007-08-14; full text *

Also Published As

Publication number Publication date
CN111626418A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN110851645B (en) Image retrieval method based on similarity maintenance under deep metric learning
CN112308158A (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN111898689B (en) Image classification method based on neural network architecture search
CN106203377B (en) A kind of coal dust image-recognizing method
CN110263230B (en) Data cleaning method and device based on density clustering
CN111680706A (en) Double-channel output contour detection method based on coding and decoding structure
CN110020712B (en) Optimized particle swarm BP network prediction method and system based on clustering
CN107577605A (en) A kind of feature clustering system of selection of software-oriented failure prediction
CN108280236A (en) A kind of random forest visualization data analysing method based on LargeVis
CN108416373A (en) A kind of unbalanced data categorizing system based on regularization Fisher threshold value selection strategies
CN111950630A (en) Small sample industrial product defect classification method based on two-stage transfer learning
CN112308825B (en) SqueezeNet-based crop leaf disease identification method
CN110866134A (en) Image retrieval-oriented distribution consistency keeping metric learning method
CN110826624A (en) Time series classification method based on deep reinforcement learning
CN111062425A (en) Unbalanced data set processing method based on C-K-SMOTE algorithm
CN111343147A (en) Network attack detection device and method based on deep learning
CN111833310A (en) Surface defect classification method based on neural network architecture search
CN115021679A (en) Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
CN112070136A (en) Method for classifying unbalanced data based on boost decision tree and improved SMOTE
CN114049305A (en) Distribution line pin defect detection method based on improved ALI and fast-RCNN
CN114441173B (en) Rolling bearing fault diagnosis method based on improved depth residual error shrinkage network
CN112365139A (en) Crowd danger degree analysis method under graph convolution neural network
CN111984790B (en) Entity relation extraction method
CN111626418B (en) Compressor fault classification method based on balanced binary tree integrated pruning strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant