CN111626418B - Compressor fault classification method based on balanced binary tree integrated pruning strategy - Google Patents

Compressor fault classification method based on balanced binary tree integrated pruning strategy Download PDF

Info

Publication number
CN111626418B
Authority
CN
China
Prior art keywords
binary tree
precision
balanced binary
pool
integration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010458446.4A
Other languages
Chinese (zh)
Other versions
CN111626418A (en)
Inventor
邓晓衡
蔚永
黑聪
刘梦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202010458446.4A priority Critical patent/CN111626418B/en
Publication of CN111626418A publication Critical patent/CN111626418A/en
Application granted granted Critical
Publication of CN111626418B publication Critical patent/CN111626418B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an ensemble pruning strategy based on a balanced binary tree, comprising the following steps: S1, initializing a base classifier ensemble pool: splitting a large data set into a plurality of sub data sets, and training and testing a base classifier on each sub data set to form an initial complete classifier pool; S2, constructing a balanced binary tree to form the final sub-ensemble; and S3, predicting and classifying new data samples with the retained optimal sub-ensemble. The invention addresses the technical problems that overfitting arises easily, that base classifiers with excessively high or excessively low test precision are difficult to remove, and that generalization performance suffers.

Description

Compressor fault classification method based on balanced binary tree integrated pruning strategy
Technical Field
The invention relates to the technical field of ensemble learning, in particular to a compressor fault classification method based on a balanced binary tree ensemble pruning strategy.
Background
Ensemble learning overcomes many of the problems a single classifier faces when training on and learning from massive data. However, because ensemble learning completes a prediction or classification task by combining many single classifiers into an ensemble pool, it places high demands on computer hardware resources. The usual remedy is an ensemble pruning strategy: reduce the number of single classifiers used as far as possible while ensuring that the final prediction or classification precision of the ensemble does not decrease, and ideally improves.
Existing ensemble pruning strategies fall into several categories. Clustering-based pruning treats the test precision of each base classifier in the ensemble pool as a data point, performs a clustering task, and selects the subset of base classifiers corresponding to the cluster containing most data points as the final ensemble. Optimization-based pruning formulates the test results of all base classifiers in the pool as an optimization problem and searches for the optimal sub-ensemble. Reinforcement-learning-based pruning searches for the optimal sub-ensemble through repeated trial and error guided by a reinforcement algorithm. Sequence-based pruning obtains the sub-ensemble by ranking the precision of all base classifiers.
Traditional sequence-based ensemble pruning, however, can lead to overfitting. The present strategy improves on it: the properties of a balanced binary tree are used to eliminate the base classifiers in the ensemble pool whose test precision is either too high or too low, and the remaining base classifiers, which have better generalization performance, are retained as the final sub-ensemble.
Disclosure of Invention
The invention provides a compressor fault classification method based on a balanced binary tree ensemble pruning strategy, and aims to solve the problems identified in the background art: overfitting arises easily, base classifiers with excessively high or excessively low test precision are difficult to remove, and generalization performance suffers.
In order to achieve the above object, an embodiment of the present invention provides a compressor fault classification method based on a balanced binary tree ensemble pruning strategy, comprising the following steps:
S1, initializing the base classifier ensemble pool: splitting the large data set into a plurality of sub data sets, and training and testing a base classifier on each sub data set to form the initial complete classifier pool;
S2, constructing a balanced binary tree to form the final sub-ensemble: building a balanced binary tree from the precisions of the base classifiers in the pool, where each node of the tree represents the training precision of one base classifier in the ensemble pool; eliminating some leaf nodes of the lower left and lower right branches of the tree by means of a boundary pruning function, and retaining the nodes of the middle trunk to form the final sub-ensemble;
and S3, predicting and classifying new data samples with the retained optimal sub-ensemble.
In S1, an artificial neural network (ANN) is used as the base classifier to complete the training and testing work, yielding the initial base classifier pool and the training precision of each base classifier.
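As an illustration only, the sketch below shows how such a base classifier pool could be initialized. The patent specifies only that an artificial neural network is used, so the choice of scikit-learn's MLPClassifier, the network size, the train/test split, and the function names are all assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def init_classifier_pool(sub_datasets):
    """Train one ANN per sub data set and record its held-out precision (step S1)."""
    pool = []
    for X, y in sub_datasets:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
        clf.fit(X_tr, y_tr)
        precision = clf.score(X_te, y_te)   # accuracy on the held-out split, used as the node value
        pool.append((clf, precision))
    return pool
```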
In S2, the precision of the base classifier represented by the root node ranks in the middle of the precisions of all base classifiers in the ensemble pool; the base classifiers represented by the leaf nodes of the lower left branch rank at the bottom, and the base classifiers represented by the leaf nodes of the lower right branch rank at the top.
Also in S2, the numbers of leaf nodes on the left and right branches of the root node are counted, pruning thresholds are set, and nodes are eliminated according to those thresholds.
The averages of the node values on the left branch and the right branch of the balanced binary tree are used as the left and right pruning thresholds, respectively:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i.
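As a minimal sketch of how these thresholds could be computed, the snippet below assumes that the left branch of the balanced tree holds the below-median precision values and the right branch the above-median values, which is how a binary search tree keyed on precision arranges them; the function name is illustrative.

```python
from statistics import mean, median

def pruning_thresholds(precisions):
    """Branch-wise averages of the node values, used as the left/right pruning thresholds."""
    root = median(precisions)                    # root node holds the middle-ranked precision
    left = [p for p in precisions if p < root]   # lower-left branch: low-precision classifiers
    right = [p for p in precisions if p > root]  # lower-right branch: high-precision classifiers
    return mean(left), mean(right)
```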
In S1, the large data set comprises a data set collected under normal conditions and data sets collected under abnormal conditions.
The ratio of the number of samples under normal conditions to the number of samples under abnormal conditions ranges from 100:1 to 1000:1.
The segmentation work in step S1 is specifically: the data set under normal conditions is split into normal-condition sub data sets, and each normal-condition sub data set is merged with the data sets of the other, abnormal conditions to form data sets covering all operating conditions.
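A sketch of this segmentation, assuming the normal-condition data are split into k equal parts with NumPy and every abnormal-condition data set is reused in each merged sub data set; the function and parameter names are illustrative, not taken from the patent.

```python
import numpy as np

def build_sub_datasets(X_normal, y_normal, abnormal_sets, k=50):
    """Split the normal-condition data into k parts and merge each part with all
    abnormal-condition data sets, so every sub data set covers all operating conditions."""
    sub_datasets = []
    for X_part, y_part in zip(np.array_split(X_normal, k), np.array_split(y_normal, k)):
        Xs, ys = [X_part], [y_part]
        for X_ab, y_ab in abnormal_sets:     # abnormal data are reused in every sub data set
            Xs.append(X_ab)
            ys.append(y_ab)
        sub_datasets.append((np.concatenate(Xs), np.concatenate(ys)))
    return sub_datasets
```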
The scheme of the invention has the following beneficial effects:
the method has the advantages that the balanced binary tree is used for removing the partial base classifiers with too poor generalization capability and too poor precision in the integrated pool, in the removing process, the integral base classification pool is sequentially constructed, the base classifiers with too good training precision can be deleted conveniently, the partial base classifiers are often classifiers with too poor generalization capability, and on the other hand, the classifiers with poor training precision are deleted, so that the integral integrated pool is ensured to have higher precision finally. Therefore, the invention considers the whole integrated pool, and further realizes the elimination work of the bad base classifier.
Another advantage is that the ordered nature of the balanced binary tree is exploited: every base classifier in the ensemble pool is inserted into the tree according to its precision, so the constructed tree is inherently sorted. Trimming the lower left and lower right branches of the tree then removes the classifiers in the pool whose precision is too high or too low. Removing the base classifiers with excessively high precision avoids overfitting, while removing those with excessively low precision improves the overall ensemble precision, so that the retained subset achieves the best generalization performance. In addition, reducing the scale of the ensemble pool reduces the algorithm model's dependence on computer hardware resources as far as possible.
Drawings
FIG. 1 is a flow chart of a compressor fault classification method based on a balanced binary tree integrated pruning strategy of the present invention;
FIG. 2 shows the test accuracy of the final sub-ensemble obtained under each of four different pruning strategies used to eliminate the poor base classifiers from the base classifier pool.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
To address the existing problems, the invention provides a compressor fault classification method based on a balanced binary tree ensemble pruning strategy, which comprises the following steps:
s1, initializing a base classifier integration pool: segmenting the large data set to form a plurality of sub data sets, and training and testing each sub data set to form an initial complete classifier pool; the big data comprises a data set under a normal condition and a data set under an abnormal condition. The number ratio of the data sets in the normal case to the data sets in the abnormal case ranges from 100: 1-1000: 1.
The segmentation work is specifically: the data set under normal conditions is split into normal-condition sub data sets, and each normal-condition sub data set is merged with the data sets of the other, abnormal conditions to form data sets covering all operating conditions.
The training and testing work specifically uses an artificial neural network (ANN) as the base classifier, yielding the initial base classifier pool and the training precision of each base classifier.
S2, constructing a balanced binary tree to form the final sub-ensemble: a balanced binary tree is built from the precisions of the base classifiers in the pool, each node of the tree representing the training precision of one base classifier in the ensemble pool. The precision of the base classifier at the root node ranks in the middle of all precisions in the pool; the base classifiers at the leaf nodes of the lower left branch rank at the bottom, and those at the leaf nodes of the lower right branch rank at the top. The numbers of leaf nodes on the left and right branches of the root are counted, pruning thresholds are set, and nodes are eliminated accordingly: through a boundary pruning function, some leaf nodes of the lower left and lower right branches are removed, and the nodes of the middle trunk are retained to form the final sub-ensemble. The averages of the node values on the left and right branches of the tree serve as the left and right pruning thresholds, respectively:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i.
And S3, predicting and classifying new data samples with the retained optimal sub-ensemble.
The scheme has been applied to fault diagnosis of natural gas compressors with good results. The selected compressors carry multiple built-in sensors, each acquiring data every 3 seconds; a data set with a sample size in the tens of millions over a given period was selected, since a large data set suits the proposed model well.
The first step: data are collected under the normal operating condition of the compressor and under 4 common fault conditions, and the data volume under each operating condition is counted. Each compressor yields 5 data sets in total: Data01 under the normal condition, Data02 under the low-inlet-pressure fault, Data03 under the exhaust-valve-leakage fault, Data04 under the high-recovery-tank-pressure fault, and Data05 under the rotating-shaft-wobble fault. The sample-count ratio of the data sets is Data01 : Data02 : Data03 : Data04 : Data05 = 603 : 3 : 7 : 5 : 2.
The second step: the normal-condition data set is split evenly into 50 parts, giving 50 normal-condition sub data sets, and each of them is merged with the data sets of the other 4 abnormal conditions to form 50 data sets covering the 5 operating conditions.
The third step: an ANN is trained on each of the 50 sub data sets obtained in the second step, and the prediction precision of the ANN model on each sub data set is recorded.
The fourth step: a balanced binary tree containing 50 nodes is constructed from the precision values of the ANN models obtained in the third step, each node of the tree representing one ANN model.
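A minimal sketch of such a construction, assuming the balanced tree is built by recursively placing the median of the precision-sorted models at the root (the patent does not prescribe a particular balancing scheme, so an AVL or red-black tree would serve equally well); the class and function names are illustrative.

```python
class Node:
    def __init__(self, precision, model):
        self.precision = precision   # training precision of one ANN base classifier
        self.model = model
        self.left = None             # subtree of lower-precision classifiers
        self.right = None            # subtree of higher-precision classifiers

def build_balanced_tree(pool):
    """Build a balanced binary search tree from (model, precision) pairs."""
    items = sorted(pool, key=lambda mp: mp[1])          # sort by precision
    def build(lo, hi):
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        model, precision = items[mid]
        node = Node(precision, model)
        node.left = build(lo, mid - 1)
        node.right = build(mid + 1, hi)
        return node
    return build(0, len(items) - 1)
```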
The fifth step: pruning thresholds are set, with the averages of the node values on the left and right branches of the balanced binary tree serving as the left and right pruning thresholds, respectively. The nodes of the left branch are then traversed and compared with the left pruning threshold, and any node whose value is below that threshold is deleted from the tree; the nodes of the right branch are likewise traversed and compared with the right pruning threshold, and any node whose value exceeds that threshold is deleted from the tree.
The left branch pruning threshold and the right branch pruning threshold are respectively as follows:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i.
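A sketch of the pruning pass for this step, reusing the Node structure from the construction sketch above; returning the survivors as a flat list of models is an implementation convenience, not something the patent specifies.

```python
def prune(root, left_threshold, right_threshold):
    """Drop left-branch nodes below the left threshold and right-branch nodes above
    the right threshold; keep the root and the remaining middle-trunk classifiers."""
    if root is None:
        return []
    kept = [root.model]

    def collect(node, keep_fn):
        if node is None:
            return
        if keep_fn(node.precision):
            kept.append(node.model)
        collect(node.left, keep_fn)
        collect(node.right, keep_fn)

    collect(root.left, lambda p: p >= left_threshold)    # discard too-low precision classifiers
    collect(root.right, lambda p: p <= right_threshold)  # discard too-high precision classifiers
    return kept
```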
The sixth step: pruning yields a smaller balanced binary tree, i.e. a smaller-scale ensemble learning model.
The seventh step: a new data set is collected from the compressor and fed into the ensemble model pruned by the balanced binary tree for fault classification and diagnosis.
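As an illustration of this final step, the sketch below classifies new compressor samples by a simple majority vote over the retained sub-ensemble; the patent does not state the combination rule, so plain voting, and the assumption that each retained base classifier exposes a predict method, are illustrative choices.

```python
import numpy as np

def classify(sub_ensemble, X_new):
    """Majority vote of the retained base classifiers on new compressor samples."""
    votes = np.stack([clf.predict(X_new) for clf in sub_ensemble])   # shape: (n_models, n_samples)
    predictions = []
    for column in votes.T:                                           # one column per sample
        labels, counts = np.unique(column, return_counts=True)
        predictions.append(labels[np.argmax(counts)])                # most-voted fault class
    return np.array(predictions)
```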
Through the imbalance handling of the data set and the ensemble pruning, the model classifies more accurately while saving a large amount of computer hardware resources, which gives the method strong application value in industrial production.
The compressor fault classification method based on the balanced binary tree integrated pruning strategy has the following technical advantages:
the method has the advantages that the balanced binary tree is used for removing the partial base classifiers with too poor generalization capability and too poor precision in the integrated pool, in the removing process, the integral base classification pool is sequentially constructed, the base classifiers with too good training precision can be deleted conveniently, the partial base classifiers are often classifiers with too poor generalization capability, and on the other hand, the classifiers with poor training precision are deleted, so that the integral integrated pool is ensured to have higher precision finally. Therefore, the invention considers the whole integrated pool, and further realizes the elimination work of the bad base classifier.
Another advantage is that the ordered nature of the balanced binary tree is exploited: every base classifier in the ensemble pool is inserted into the tree according to its precision, so the constructed tree is inherently sorted. Trimming the lower left and lower right branches of the tree then removes the classifiers in the pool whose precision is too high or too low. Removing the base classifiers with excessively high precision avoids overfitting, while removing those with excessively low precision improves the overall ensemble precision, so that the retained subset achieves the best generalization performance. In addition, reducing the scale of the ensemble pool reduces the algorithm model's dependence on computer hardware resources as far as possible.
As shown in FIG. 2, four different pruning strategies were used to eliminate the poor base classifiers from the base classifier pool, and the test precision of the final sub-ensemble obtained under each strategy was recorded. The four pruning strategies are based on fuzzy clustering, on an optimization problem, on a sequence, and on the balanced binary tree, respectively. Compared with the other pruning strategies, the compressor fault classification method based on the balanced binary tree ensemble pruning strategy achieves better precision and therefore has better practical value.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A compressor fault classification method based on a balanced binary tree integrated pruning strategy is characterized by comprising the following steps:
S1, initializing a base classifier ensemble pool: collecting data under the normal operating condition of a compressor and under multiple fault conditions to obtain a large data set; splitting the large data set into a plurality of sub data sets; for each sub data set, completing training and testing with an artificial neural network (ANN) as the base classifier, and obtaining the initial base classifier pool and the training precision of each base classifier, thereby forming the initial complete classifier pool;
S2, constructing a balanced binary tree to form the final sub-ensemble: constructing a balanced binary tree from the precisions of the base classifiers in the base classifier pool, each node of the tree representing the training precision of one base classifier in the ensemble pool; counting the numbers of leaf nodes on the left and right branches of the root node, setting pruning thresholds through a boundary pruning function, removing some leaf nodes of the lower left and lower right branches of the tree according to the thresholds, and retaining the nodes of the middle trunk to form the final sub-ensemble, wherein the precision of the base classifier at the root node ranks in the middle of all base classifier precisions in the ensemble pool, the base classifiers at the leaf nodes of the lower left branch rank at the bottom, and the base classifiers at the leaf nodes of the lower right branch rank at the top, and wherein the averages of the node values on the left and right branches of the balanced binary tree are used as the left and right pruning thresholds, respectively:
θ_left = (1/|L|) · Σ_{i∈L} p_i,   θ_right = (1/|R|) · Σ_{j∈R} p_j,

where L and R denote the sets of nodes on the left and right branches of the root and p_i is the precision value stored at node i;
and S3, collecting a new data set from the compressor, and predicting and classifying the new data samples with the retained optimal sub-ensemble to obtain the fault classification result of the compressor.
2. The compressor fault classification method based on the balanced binary tree integrated pruning strategy according to claim 1, wherein in step S1 the large data set comprises a data set under normal conditions and data sets under abnormal conditions.
3. The compressor fault classification method based on the balanced binary tree integrated pruning strategy according to claim 2, wherein the ratio of the number of samples in the normal-condition data set to the number of samples in the abnormal-condition data sets ranges from 100:1 to 1000:1.
4. The compressor fault classification method based on the balanced binary tree integrated pruning strategy according to claim 2, wherein the splitting in step S1 is specifically: splitting the normal-condition data set into normal-condition sub data sets, and merging each normal-condition sub data set with the data sets of the other, abnormal conditions to form data sets covering all operating conditions.
CN202010458446.4A 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy Active CN111626418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010458446.4A CN111626418B (en) 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010458446.4A CN111626418B (en) 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy

Publications (2)

Publication Number Publication Date
CN111626418A CN111626418A (en) 2020-09-04
CN111626418B true CN111626418B (en) 2022-04-22

Family

ID=72273131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010458446.4A Active CN111626418B (en) 2020-05-27 2020-05-27 Compressor fault classification method based on balanced binary tree integrated pruning strategy

Country Status (1)

Country Link
CN (1) CN111626418B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033617A (en) * 2015-03-16 2016-10-19 广州四三九九信息科技有限公司 Method for performing game picture intelligent compression by combining with visualization tool
CN109033632A (en) * 2018-07-26 2018-12-18 北京航空航天大学 A kind of trend forecasting method based on depth quantum nerve network
CN110569867A (en) * 2019-07-15 2019-12-13 山东电工电气集团有限公司 Decision tree algorithm-based power transmission line fault reason distinguishing method, medium and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050072A1 (en) * 2003-09-03 2005-03-03 Lucent Technologies, Inc. Highly parallel tree search architecture for multi-user detection
US8457441B2 (en) * 2008-06-25 2013-06-04 Microsoft Corporation Fast approximate spatial representations for informal retrieval

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033617A (en) * 2015-03-16 2016-10-19 广州四三九九信息科技有限公司 Method for performing game picture intelligent compression by combining with visualization tool
CN109033632A (en) * 2018-07-26 2018-12-18 北京航空航天大学 A kind of trend forecasting method based on depth quantum nerve network
CN110569867A (en) * 2019-07-15 2019-12-13 山东电工电气集团有限公司 Decision tree algorithm-based power transmission line fault reason distinguishing method, medium and equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An imbalanced data classification method based on automatic clustering under-sampling; Xiaoheng Deng et al.; 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC); IEEE; 2016-12-11; 1-8 *
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning; Rajeev Rastogi et al.; Data Mining and Knowledge Discovery; 2000-10-31; 315-344 *
Research on fault diagnosis methods based on data mining and information fusion; Sun Weixiang; Wanfang Data; 2007-08-14; full text *

Also Published As

Publication number Publication date
CN111626418A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN110851645B (en) Image retrieval method based on similarity maintenance under deep metric learning
CN112308158A (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN111898689B (en) Image classification method based on neural network architecture search
CN106203377B (en) A kind of coal dust image-recognizing method
CN110263230B (en) Data cleaning method and device based on density clustering
CN111680706A (en) Double-channel output contour detection method based on coding and decoding structure
CN110020712B (en) Optimized particle swarm BP network prediction method and system based on clustering
CN107577605A (en) A kind of feature clustering system of selection of software-oriented failure prediction
CN108280236A (en) A kind of random forest visualization data analysing method based on LargeVis
CN108416373A (en) A kind of unbalanced data categorizing system based on regularization Fisher threshold value selection strategies
CN111950630A (en) Small sample industrial product defect classification method based on two-stage transfer learning
CN112308825B (en) SqueezeNet-based crop leaf disease identification method
CN110866134A (en) Image retrieval-oriented distribution consistency keeping metric learning method
CN110826624A (en) Time series classification method based on deep reinforcement learning
CN111062425A (en) Unbalanced data set processing method based on C-K-SMOTE algorithm
CN111343147A (en) Network attack detection device and method based on deep learning
CN111833310A (en) Surface defect classification method based on neural network architecture search
CN115021679A (en) Photovoltaic equipment fault detection method based on multi-dimensional outlier detection
CN112070136A (en) Method for classifying unbalanced data based on boost decision tree and improved SMOTE
CN114049305A (en) Distribution line pin defect detection method based on improved ALI and fast-RCNN
CN114441173B (en) Rolling bearing fault diagnosis method based on improved depth residual error shrinkage network
CN112365139A (en) Crowd danger degree analysis method under graph convolution neural network
CN111984790B (en) Entity relation extraction method
CN111626418B (en) Compressor fault classification method based on balanced binary tree integrated pruning strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant