CN114722915A - Fault diagnosis method and system based on ADASYN algorithm and random forest algorithm
- Publication number
- CN114722915A (application number CN202210267057.2A)
- Authority
- CN
- China
- Prior art keywords
- fault
- data set
- random forest
- algorithm
- wavelet packet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R31/00—Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
- G01R31/28—Testing of electronic circuits, e.g. by signal tracer
- G01R31/2801—Testing of printed circuits, backplanes, motherboards, hybrid circuits or carriers for multichip packages [MCP]
- G01R31/281—Specific types of tests or tests for a specific type of fault, e.g. thermal mapping, shorts testing
- G01R31/2812—Checking for open circuits or shorts, e.g. solder bridges; Testing conductivity, resistivity or impedance
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R31/00—Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
- G01R31/28—Testing of electronic circuits, e.g. by signal tracer
- G01R31/2832—Specific tests of electronic circuits not provided for elsewhere
- G01R31/2836—Fault-finding or characterising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/259—Fusion by voting
Abstract
The invention provides a fault diagnosis method based on an ADASYN algorithm and a random forest algorithm, and belongs to the technical field of fault diagnosis. The method comprises the following steps: collecting fault current of an internal circuit of the intelligent household appliance, and recording the fault type of the fault current; carrying out multi-layer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current into a data set; preprocessing the data set through an ADASYN algorithm to obtain a preprocessed data set; constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set; and diagnosing the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model.
Description
Technical Field
The invention relates to the technical field of fault diagnosis, in particular to a fault diagnosis method and system based on an ADASYN algorithm and a random forest algorithm.
Background
With the continuous development of science and information technology, the adoption of intelligent household appliances has increased markedly. Unlike traditional household appliances, most intelligent household appliances have complex internal structures, and faults in their internal circuits, which involve digital signal control, are difficult to locate by traditional methods.
In the prior art, Fourier analysis is used to diagnose circuit faults: time-domain information is first converted to the frequency domain by the Fourier transform, and frequency-domain fault features are then selected to perform the diagnosis; however, this method is only suitable for open-circuit fault analysis. The prior art also provides a circuit fault diagnosis method that combines an optimized wavelet packet with an extreme learning machine, which classifies and identifies the faults; this method has the advantage of a short training time, but its results are still biased by unbalanced sample distributions.
Disclosure of Invention
In view of this, the invention provides a fault diagnosis method and system based on an ADASYN algorithm and a random forest algorithm, which diagnose internal circuit faults of intelligent household appliances by combining the ADASYN algorithm with the random forest algorithm, are suitable for circuit fault analysis in three states (closed circuit, open circuit and short circuit), balance the amount of sample data across fault types, and can diagnose internal circuit faults of intelligent household appliances with higher diagnostic precision.
The technical scheme adopted by the embodiment of the invention for solving the technical problem is as follows:
In one aspect, the invention provides a fault diagnosis method based on an ADASYN algorithm and a random forest algorithm, which comprises the following steps:
step S1, collecting fault current of an internal circuit of the intelligent household appliance, and recording the fault type of the fault current;
step S2, carrying out multi-layer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current into a data set;
step S3, preprocessing the data set through an ADASYN algorithm to obtain a preprocessed data set;
step S4, constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set;
and step S5, diagnosing the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model.
Preferably, step S1 of collecting the fault current of the internal circuit of the intelligent household appliance and recording the fault type of the fault current comprises:
establishing a circuit model of the internal circuit of the intelligent household appliance by using PSCAD simulation software;
setting the fault type and operating the circuit model;
and collecting and recording the fault current of the circuit model in the fault type state.
Preferably, step S2 of performing multi-layer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current into a data set comprises:
step 21, constructing at least one wavelet packet decomposition layer, wherein each layer of the wavelet packet decomposition is expressed recursively as:
$$d_{j+1,2n}(t)=\sum_{k}h(k-2t)\,d_{j,n}(k),\qquad d_{j+1,2n+1}(t)=\sum_{k}g(k-2t)\,d_{j,n}(k)$$
wherein $d_{j,n}$ is the wavelet packet of the nth node in the jth wavelet packet decomposition layer, $j\in[1,J]$, $n\in[0,2^{j}-1]$, J is the total number of layers, k is a translation variable, $k\in(-\infty,+\infty)$, t is a time variable, the function h(k-2t) is the output value of the low-pass filter, and the function g(k-2t) is the output value of the high-pass filter;
step 22, using the wavelet packet decomposition layers to perform multi-layer wavelet packet decomposition on the fault current to obtain the node energy $E_{j,n}$:
$$E_{j,n}=\int\left|W_{(j,n)}(t)\right|^{2}dt=\sum_{p=1}^{m}\left|d_{j,n}(p)\right|^{2}$$
wherein $E_{j,n}$ is the node energy of the nth node in the jth wavelet packet decomposition layer, $W_{(j,n)}$ is the node signal of the nth node in the jth wavelet packet decomposition layer, $d_{j,n}(p)$ is the pth coefficient of the node signal $W_{(j,n)}$ after decomposition, $p\in[1,m]$, and m is the frequency band length of the nth node in the jth wavelet packet decomposition layer;
step 23, establishing a hierarchical energy feature vector $E_j$ from the node energies $E_{j,n}$:
$$E_j=\left[E_{j,0},E_{j,1},\ldots,E_{j,2^{j}-1}\right]$$
wherein $E_j$ is the hierarchical energy feature vector of the jth wavelet packet decomposition layer;
step 25, recording the normalized feature vector X and the corresponding fault type into the data set.
Preferably, step S3 of preprocessing the data set through the ADASYN algorithm to obtain the preprocessed data set comprises:
step S31, calculating, from the data set, the number $G_s$ of samples to be generated for fault type s:
$$G_s=(m_{s\text{-}max}-m_s)\times\beta$$
wherein $m_{s\text{-}max}$ is the number of samples of the fault type s-max, which has the largest total number of samples in the data set, $m_s$ is the number of samples of a fault type s other than s-max, $s\in[1,S-1]$, S is the number of fault types in the data set, $\beta$ is a sample factor, and $\beta\in[0,1]$;
step S32, randomly selecting an ith sample belonging to fault type s, and calculating the proportion $r_i$ of samples belonging to fault type s-max among the K neighboring samples of the ith sample:
$$r_i=\frac{\Delta_i}{K}$$
wherein $\Delta_i$ is the number of samples belonging to fault type s-max among the K neighboring samples of the ith sample, $i\in[1,m_s]$;
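step S34, calculating the number $g_s$ of new samples finally required for fault type s, the calculation formula being:
$$g_s=\hat{r}_i\times G_s,\qquad \hat{r}_i=\frac{r_i}{\sum_{i=1}^{m_s}r_i}$$
wherein $\hat{r}_i$ is the proportion $r_i$ normalized over the samples of fault type s;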
step S35, calculating $g_s$ new samples $s_i$ belonging to fault type s and adding them to the data set to obtain the preprocessed data set, wherein a new sample $s_i$ is expressed as:
$$s_i=x_i+(x_{zi}-x_i)\times\lambda$$
wherein $x_{zi}$ is one of the K neighboring samples of the ith sample $x_i$, the sample $x_{zi}$ does not belong to fault type s-max, and $\lambda$ is a random number.
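To make step S31 concrete, the following Python sketch evaluates the number of samples to generate for the class counts quoted in the embodiment (1500 short-circuit, 1200 disconnection, 1400 series arc and 5000 leakage current groups). It is illustrative only: the embodiment applies ADASYN to the randomly drawn training set, whose per-class counts are not given, and the function name `samples_to_generate` and the dictionary keys are assumptions.

```python
# Class counts quoted in the embodiment (whole data set, before the train/test split).
counts = {"short_circuit": 1500, "disconnection": 1200,
          "series_arc": 1400, "leakage_current": 5000}


def samples_to_generate(counts, beta=1.0):
    """G_s = (m_{s-max} - m_s) * beta for every fault type other than s-max."""
    majority = max(counts, key=counts.get)  # fault type s-max
    m_max = counts[majority]
    return {s: round((m_max - m) * beta) for s, m in counts.items() if s != majority}


print(samples_to_generate(counts, beta=1.0))
# {'short_circuit': 3500, 'disconnection': 3800, 'series_arc': 3600}
```

With `beta=1.0` every minority fault type is topped up to the size of the largest class; a smaller sample factor generates proportionally fewer samples.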
Preferably, step S4 of constructing a random forest fault diagnosis model through the random forest algorithm based on the preprocessed data set comprises:
step S41, resampling the preprocessed data set by a Bootstrap method to obtain a training data set;
step S42, constructing a root node, selecting $a$ fault types as decision tree features, and generating a decision tree based on all the normalized feature vectors corresponding to the $a$ fault types in the training data set;
step S43, repeating step S41 and step S42 to obtain C decision trees;
step S44, randomly extracting a normalized feature vector from the preprocessed data set as a test sample;
step S45, performing fault type voting on the test sample through the C decision trees, and verifying whether the fault type with the most votes is consistent with the actual fault type of the test sample, wherein the voting result R is expressed as:
$$R=\arg\max_{y}\sum_{c=1}^{C}I\left[r_c(x)=y\right]$$
wherein x is the training data set, y is the actual fault type of the test sample, $r_c(x)$ is the cth decision tree model, $c\in[1,C]$, $I[\cdot]$ is the indicator function, and $\arg\max(\cdot)$ is the maximum function;
step S46, if they are not consistent, adjusting the parameters of the C decision trees and then repeating step S45; if they are consistent, the construction of the random forest fault diagnosis model is complete.
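The Bootstrap resampling of step S41 can be sketched as follows. This is a minimal illustration under the assumption that every resampled data set has the same size as the original; the function name `bootstrap_datasets`, the `seed` parameter and the toy records are hypothetical.

```python
import random


def bootstrap_datasets(dataset, n_sets, seed=0):
    """Draw n_sets bootstrap replicates, each sampled with replacement."""
    rng = random.Random(seed)
    size = len(dataset)
    return [[dataset[rng.randrange(size)] for _ in range(size)]
            for _ in range(n_sets)]


# Toy records of the form (feature vector id, fault type).
data = [("x%d" % i, "F1") for i in range(10)]
replicas = bootstrap_datasets(data, n_sets=3)
```

Because the draws are made with replacement, a replicate usually repeats some records and omits others, which is what makes the individual decision trees differ from one another.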
Preferably, the step S5 of diagnosing the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model includes:
calculating a normalized feature vector of the fault current to be diagnosed;
based on the normalized feature vector of the fault current to be diagnosed, performing fault type voting through the random forest fault diagnosis model and counting voting results;
and selecting the fault type with the most votes as the diagnosis result of the fault current to be diagnosed.
Preferably, the fault types include short circuit faults, disconnection faults, series arc faults, and leakage current faults.
The invention also provides a fault diagnosis system based on the ADASYN algorithm and the random forest algorithm, which comprises:
the fault current acquisition module is used for acquiring fault current of an internal circuit of the intelligent household appliance and recording the fault type of the fault current;
the data set construction module is used for carrying out multilayer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current to a data set;
the data set preprocessing module is used for preprocessing the data set through an ADASYN algorithm to obtain a preprocessed data set;
the fault diagnosis module is used for constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set; and the fault type of the fault current to be diagnosed is diagnosed according to the preprocessed data set and the random forest fault diagnosis model.
Preferably, the data set construction module comprises:
the wavelet packet decomposition layer construction submodule is used for constructing not less than one wavelet packet decomposition layer;
the node energy calculation submodule is used for carrying out multi-layer wavelet packet decomposition on the fault current by utilizing the wavelet packet decomposition layer to obtain node energy of each wavelet packet;
the data set construction submodule is used for establishing a level energy characteristic vector according to the node energy of each wavelet packet and performing normalization calculation to obtain a normalization characteristic vector; recording the normalized feature vectors and the corresponding fault types to the data set.
Preferably, the fault diagnosis module comprises:
the resampling submodule is used for resampling the preprocessed data set by a Bootstrap method to obtain a training data set;
the decision tree generation submodule is used for constructing a root node, selecting $a$ fault types as decision tree features, and generating a decision tree based on all the normalized feature vectors corresponding to the $a$ fault types in the training data set;
the verification submodule is used for randomly extracting a normalized feature vector from the preprocessed data set as a test sample, voting on the fault type of the test sample through the C decision trees, and verifying whether the fault type with the most votes is consistent with the actual fault type of the test sample; if they are consistent, the construction of the random forest fault diagnosis model is complete.
According to the above technical scheme, the fault diagnosis method based on the ADASYN algorithm and the random forest algorithm provided by the embodiment of the invention collects the fault current of the internal circuit of the intelligent household appliance and records the fault type of the fault current; performs multi-layer wavelet packet decomposition on the fault current, calculates a normalized feature vector of the fault current, and records the normalized feature vector and the corresponding fault type into a data set; preprocesses the data set through the ADASYN algorithm to obtain a preprocessed data set; constructs a random forest fault diagnosis model through the random forest algorithm based on the preprocessed data set; and finally diagnoses the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model. The method is suitable for analyzing circuit faults in three states (closed circuit, open circuit and short circuit), balances the amount of sample data across fault types, and can diagnose internal circuit faults of intelligent household appliances with higher diagnostic precision.
Drawings
FIG. 1 is a flow chart of the steps of the fault diagnosis method based on the ADASYN algorithm and the random forest algorithm according to the present invention;
FIG. 2(a) is a short-circuit fault current waveform diagram;
FIG. 2(b) is a disconnection fault current waveform diagram;
FIG. 2(c) is a series arc fault current waveform diagram;
FIG. 2(d) is a leakage current fault current waveform diagram;
FIG. 3 is a schematic diagram of the three-layer wavelet packet decomposition provided in this embodiment;
FIG. 4 is a schematic structural diagram of a fault diagnosis system based on the ADASYN algorithm and the random forest algorithm according to the present invention;
FIG. 5(a) shows wavelet packet decomposition waveforms for different failure conditions at node 1;
FIG. 5(b) shows wavelet packet decomposition waveforms for different failure conditions at node 32;
FIG. 6 is a graph comparing the accuracy of different algorithms for various fault types in an embodiment of the present invention;
FIG. 7 is a graph comparing recall rates for various types of faults for different algorithms in an embodiment of the present invention;
FIG. 8 is a graph comparing F1 values for various fault types for different algorithms in an embodiment of the present invention.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: to those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined in the appended claims, and everything created using the inventive concept is protected.
As shown in FIG. 1, an embodiment of the present invention provides a fault diagnosis method based on the ADASYN algorithm and the random forest algorithm, including the following steps:
step S1, collecting the fault current of the internal circuit of the intelligent household appliance, and recording the fault type of the fault current;
step S2, carrying out multi-layer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current into a data set;
step S3, preprocessing the data set through an ADASYN algorithm to obtain a preprocessed data set;
step S4, constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set;
and step S5, diagnosing the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model.
Preferably, step S1 is specifically:
step S11, establishing a circuit model of the internal circuit of the intelligent household appliance by using PSCAD simulation software;
step S12, setting fault type and operating circuit model;
and step S13, acquiring and recording the fault current of the circuit model in the fault type state.
In practice, PSCAD simulation software is used to construct a model of the internal circuit of the intelligent household appliance, with a supply voltage of 220 V (effective value), a total simulation time of 5 s and a measurement frequency of 1 kHz. The model is used to simulate faults, and for each fault the current signal of the faulty branch is collected; the fault current waveforms of the faulty branch are shown in FIG. 2.
When collecting data, the fault types may include a short circuit fault F1, a disconnection fault F2, a series arc fault F3, a leakage current fault F4, and the like.
The specific process of performing multi-layer wavelet packet decomposition on the fault current and calculating the normalized feature vector of the fault current in step S2 includes:
step 21, constructing at least one wavelet packet decomposition layer, wherein each layer of the wavelet packet decomposition is expressed recursively as:
$$d_{j+1,2n}(t)=\sum_{k}h(k-2t)\,d_{j,n}(k),\qquad d_{j+1,2n+1}(t)=\sum_{k}g(k-2t)\,d_{j,n}(k)$$
wherein $d_{j,n}$ is the wavelet packet of the nth node in the jth wavelet packet decomposition layer, $j\in[1,J]$, $n\in[0,2^{j}-1]$, J is the total number of layers, k is the translation variable, $k\in(-\infty,+\infty)$, t is the time variable, the function h(k-2t) is the low-pass filter output, and the function g(k-2t) is the high-pass filter output; $d_{j+1,2n}$ is the wavelet packet of the 2nth node in the (j+1)th wavelet packet decomposition layer, and $d_{j+1,2n+1}$ is the wavelet packet of the (2n+1)th node in the (j+1)th wavelet packet decomposition layer.
In practice, too many wavelet packet decomposition layers increase the computational complexity, reduce diagnosis efficiency and easily cause signal distortion; 3 to 6 wavelet packet decomposition layers are more suitable for maintaining efficiency. In the embodiment of the present invention, J = 3 wavelet packet decomposition layers are taken as an example, and a schematic diagram of the three-layer wavelet packet decomposition is shown in FIG. 3, where W is the original time-domain signal and $W_{(j,n)}$ (j = 0, 1, 2, 3; n = 0, 1, 2, 3, 4, 5, 6, 7) is the node signal of the nth node in the jth wavelet packet decomposition layer.
step 22, using the wavelet packet decomposition layers to perform multi-layer wavelet packet decomposition on the fault current to obtain the node energy $E_{j,n}$:
$$E_{j,n}=\int\left|W_{(j,n)}(t)\right|^{2}dt=\sum_{p=1}^{m}\left|d_{j,n}(p)\right|^{2}$$
wherein $E_{j,n}$ is the node energy of the nth node in the jth wavelet packet decomposition layer, $W_{(j,n)}$ is the node signal of the nth node in the jth wavelet packet decomposition layer, $d_{j,n}(p)$ is the pth coefficient of the node signal $W_{(j,n)}$ after decomposition, $p\in[1,m]$, and m is the frequency band length of the nth node in the jth wavelet packet decomposition layer;
step 23, establishing a hierarchical energy feature vector $E_j$ from the node energies $E_{j,n}$:
$$E_j=\left[E_{j,0},E_{j,1},\ldots,E_{j,2^{j}-1}\right]$$
wherein $E_j$ is the hierarchical energy feature vector of the jth wavelet packet decomposition layer, and $E_{j,2^{j}-1}$ is the node energy on node $2^{j}-1$ of the jth wavelet packet decomposition layer;
step 25, recording the normalized feature vector X and the corresponding fault type into the data set.
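As a quick worked check of the feature dimension (a note added for illustration, following from the node index range of layer j, which runs from 0 to $2^{j}-1$): layer j contains $2^{j}$ nodes, so a J-layer decomposition yields $2^{J}$ node energies in its last layer.

$$\dim E_J=2^{J},\qquad J=3\ \Rightarrow\ 8,\qquad J=5\ \Rightarrow\ 32$$

The second value matches the 32-dimensional fault feature vectors that the embodiment obtains from a 5-layer wavelet packet decomposition.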
In practice, the original current signal has a high dimensionality. To reduce training time and resource consumption, features are extracted from it: wavelet packet decomposition is used to extract wavelet packet energy features as the feature vector, converting the original high-dimensional current signal into a low-dimensional feature vector, which reduces the amount of calculation and shortens the training time. The energy on each node of the last layer is extracted as the feature vector, which reveals more detail of the original signal and facilitates subsequent feature extraction.
By extracting the wavelet packet energy features as feature vectors, the original high-dimensional current signal is converted into a low-dimensional feature vector, which avoids the problem of high-order harmonics that may be present in the time domain, improves the stability of fault feature extraction, and reduces the amount of calculation.
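The feature extraction of steps 21 to 23 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the Haar filter pair, the sum-to-one normalization and the names `split`, `wavelet_packet_energies` and `normalize` are assumptions, since the text names neither the wavelet basis nor the normalization formula. The test signal is a hypothetical 50 Hz sine measured at 1 kHz.

```python
import math

# Haar analysis filters (an assumption: the source does not name the wavelet basis).
H = (1 / math.sqrt(2), 1 / math.sqrt(2))   # low-pass h
G = (1 / math.sqrt(2), -1 / math.sqrt(2))  # high-pass g


def split(signal, taps):
    """Filter and downsample by 2: out[t] = sum_k taps[k - 2t] * signal[k]."""
    return [taps[0] * signal[2 * t] + taps[1] * signal[2 * t + 1]
            for t in range(len(signal) // 2)]


def wavelet_packet_energies(signal, levels):
    """Return the node energies of the last decomposition layer (2**levels values)."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            next_nodes.append(split(node, H))  # node 2n of the next layer
            next_nodes.append(split(node, G))  # node 2n+1 of the next layer
        nodes = next_nodes
    # Node energy: sum of squared coefficients of each node.
    return [sum(c * c for c in node) for node in nodes]


def normalize(energies):
    """Scale energies to sum to 1 (one common choice, assumed here)."""
    total = sum(energies)
    return [e / total for e in energies] if total else list(energies)


# Hypothetical test signal: 64 samples of a 50 Hz sine at 1 kHz.
current = [math.sin(2 * math.pi * 50 * i / 1000) for i in range(64)]
features = normalize(wavelet_packet_energies(current, levels=3))
```

With `levels=3` the sketch returns 8 features, one per node of the last layer. Because the Haar pair is orthonormal, the node energies add up to the energy of the input signal; the signal length is assumed to be divisible by `2**levels`.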
Step S3 oversamples the training set formed by the feature vectors of the minority-class samples by using the ADASYN algorithm, so as to solve the problem of data imbalance. The specific steps include:
step S31, calculating, from the data set, the number $G_s$ of samples to be generated for fault type s:
$$G_s=(m_{s\text{-}max}-m_s)\times\beta\qquad(5)$$
wherein $m_{s\text{-}max}$ is the number of samples of the fault type s-max, which has the largest total number of samples in the data set, $m_s$ is the number of samples of a fault type s other than s-max, $s\in[1,S-1]$, and S is the number of fault types in the data set; $\beta$ is the sample factor, $\beta\in[0,1]$. In practice, when $\beta=1$, the number of samples of each other fault type after synthesis equals that of fault type s-max.
In the embodiment of the invention, 9100 groups of data are collected in total: 1500 groups of short-circuit faults, 1200 groups of disconnection faults, 1400 groups of series arc faults and 5000 groups of leakage current faults. 6370 groups are randomly selected as training samples and the remaining 2730 groups are used as test samples; after 5-layer wavelet packet decomposition, a 32-dimensional fault feature vector is obtained from each group of samples.
Step S32, randomly selecting an ith sample belonging to fault type s, and calculating the proportion $r_i$ of samples belonging to fault type s-max among the K neighboring samples of the ith sample:
$$r_i=\frac{\Delta_i}{K}$$
wherein $\Delta_i$ is the number of samples belonging to fault type s-max among the K neighboring samples of the ith sample, $i\in[1,m_s]$;
In practice, the data proportion is normalized, and the calculation formula can be expressed as:
$$\hat{r}_i=\frac{r_i}{\sum_{i=1}^{m_s}r_i}$$
The number of new samples to be generated is then calculated for each minority-class sample using the normalized proportion $\hat{r}_i$.
Step S34, calculating the number $g_s$ of new samples finally required for fault type s, the calculation formula being:
$$g_s=\hat{r}_i\times G_s$$
step S35, calculating $g_s$ new samples $s_i$ belonging to fault type s and adding them to the data set to obtain the preprocessed data set, wherein a new sample $s_i$ is expressed as:
$$s_i=x_i+(x_{zi}-x_i)\times\lambda\qquad(8)$$
wherein $x_{zi}$ is one of the K neighboring samples of the ith sample $x_i$, the sample $x_{zi}$ does not belong to fault type s-max, and $\lambda$ is a random number, $\lambda\in[0,1]$.
In practice, after the three fault types with fewer samples are expanded by the ADASYN algorithm, 3546 groups of series arc fault data, 3511 groups of short-circuit fault data, 3536 groups of disconnection fault data and 3522 groups of leakage current fault feature data are obtained and serve as the final training set.
The ADASYN algorithm oversamples the training set formed by the feature vectors of the minority-class samples, which addresses the data imbalance in the data set and can improve the diagnostic precision of the fault diagnosis model.
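Steps S32 to S35 can be sketched for a single minority fault type as follows. This is a simplified illustration under stated assumptions: neighbors are found by brute-force Euclidean distance, every minority sample is visited rather than drawn at random, the per-sample count is the normalized proportion times the total to generate (as in the standard ADASYN formulation), and the names `adasyn_for_class`, `g_total`, `k` and `seed` are hypothetical.

```python
import math
import random


def adasyn_for_class(samples, labels, minority, majority, g_total, k=5, seed=0):
    """Generate about g_total synthetic samples of class `minority`."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]

    def neighbours(i):
        # K nearest neighbours of sample i in the whole data set.
        others = [j for j in range(len(samples)) if j != i]
        others.sort(key=lambda j: math.dist(samples[i], samples[j]))
        return others[:k]

    knn = {i: neighbours(i) for i in minority_idx}
    # r_i = Delta_i / K: share of majority-class (s-max) samples among the neighbours.
    r = {i: sum(labels[j] == majority for j in knn[i]) / k for i in minority_idx}
    total = sum(r.values())
    if total == 0:
        return []  # no minority sample borders the majority class

    synthetic = []
    for i in minority_idx:
        g_i = round(r[i] / total * g_total)  # normalized proportion times G_s
        # Candidate neighbours x_zi: those that do not belong to s-max.
        candidates = [j for j in knn[i] if labels[j] != majority]
        for _ in range(g_i if candidates else 0):
            x_i, x_z = samples[i], samples[rng.choice(candidates)]
            lam = rng.random()  # random number lambda in [0, 1)
            synthetic.append([a + (b - a) * lam for a, b in zip(x_i, x_z)])
    return synthetic


# Toy two-class data: three minority points "A" and five majority points "B".
points = [[0.0, 0.0], [0.1, 0.0], [0.5, 0.5],
          [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [0.6, 0.6], [0.9, 0.9]]
classes = ["A", "A", "A", "B", "B", "B", "B", "B"]
new_points = adasyn_for_class(points, classes, "A", "B", g_total=4, k=3)
```

Minority samples that sit closer to the majority class receive a larger share of the synthetic samples, which is the adaptive part of the algorithm. Each synthetic point lies on the line segment between a minority sample and one of its non-majority neighbors.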
Step S4 constructs a random forest fault diagnosis model through the random forest algorithm based on the preprocessed data set. In practice, a random forest is a randomly constructed ensemble of decision trees. Each decision tree places all samples of its training set in the root node and then selects the optimal feature to split the data set; if a subset is classified basically correctly, a leaf node is constructed, otherwise a new optimal feature is selected and the splitting continues. The number of decision trees in the random forest is set to 300.
Step S41, resampling the preprocessed data set by the Bootstrap method to obtain a training data set: samples are drawn at random with replacement, the sampling is repeated K times, and the new data sets are recorded as $N_k$, k = 1, 2, …, K.
Step S42, constructing a root node, selecting $a$ fault types as decision tree features, and generating a decision tree $T_c$ based on all normalized feature vectors corresponding to those fault types in the training data set;
Step S43, repeating step S41 and step S42, letting each decision tree $T_c$ grow as far as possible without pruning, and generating a decision tree for each of the K resampled data sets, to obtain C decision trees;
step S44, randomly extracting normalized feature vectors from the preprocessed data set as test samples;
step S45, fault type voting is carried out on the test sample through C decision trees, and whether the fault type with the most voting times is consistent with the actual fault type of the test sample is verified, wherein the expression of the voting result R is as follows:
wherein x is the training data set, y is the actual fault type of the test sample, $r_c(x)$ is the decision tree model, $c \in [1, C]$, $I[\cdot]$ is the indicator function, and $\arg\max(\cdot)$ is the maximum function;
step S46, if they are not consistent, adjusting the parameters of the C decision trees and then repeating step S45; if they are consistent, the construction of the random forest fault diagnosis model is complete.
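Steps S41 to S45 can be illustrated with a short sketch. The voting expression is not reproduced in this section; the code assumes the usual majority vote, i.e. the fault type y that maximizes $\sum_{c=1}^{C} I[r_c(x) = y]$. The helper names (`bootstrap`, `fit_stand_in_tree`, `vote`) and the one-feature data are hypothetical, and the "tree" is a deliberately trivial nearest-class-mean rule standing in for a real unpruned decision tree.

```python
import random
from collections import Counter

def bootstrap(dataset, rng):
    """Draw len(dataset) samples with replacement: one resampled set N_k."""
    return [rng.choice(dataset) for _ in range(len(dataset))]

def fit_stand_in_tree(samples):
    """Hypothetical stand-in for a decision tree T_c: nearest class mean on feature 0."""
    grouped = {}
    for features, fault in samples:
        grouped.setdefault(fault, []).append(features[0])
    means = {fault: sum(v) / len(v) for fault, v in grouped.items()}
    return lambda x: min(means, key=lambda fault: abs(x[0] - means[fault]))

def vote(trees, x):
    """Return the fault type y that maximizes sum_c I[r_c(x) == y]."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

rng = random.Random(0)
# Made-up one-feature data set: 10 samples each of two fault types.
data = [((0.10 + 0.02 * i,), "F1") for i in range(10)] + \
       [((0.80 + 0.02 * i,), "F3") for i in range(10)]

C = 300  # number of trees, matching the value given in the text
trees = [fit_stand_in_tree(bootstrap(data, rng)) for _ in range(C)]

# Step S45: compare the majority vote with the actual fault type of a test sample.
test_x, test_y = (0.15,), "F1"
consistent = vote(trees, test_x) == test_y
print(consistent)
```

In a real system each stand-in would be replaced by a decision tree trained on its own bootstrap sample; the resampling and voting logic stays the same.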
Step S5, diagnosing the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model, is implemented by the following steps:
step S61, calculating a normalized feature vector of the fault current to be diagnosed;
step S62, based on the normalized feature vector of the fault current to be diagnosed, the fault type voting is carried out through a random forest fault diagnosis model, and the voting result is counted;
and step S63, selecting the fault type with the most voting times as the diagnosis result of the fault current to be diagnosed.
As shown in fig. 4, in another aspect, the present invention provides a fault diagnosis system based on ADASYN algorithm and random forest algorithm, which can be used to implement the method shown in fig. 1, including:
the fault current acquisition module is used for acquiring the fault current of the internal circuit of the intelligent household appliance and recording the fault type of the fault current;
the data set construction module is used for carrying out multilayer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current and recording the normalized feature vector and the fault type corresponding to the fault current to the data set;
the data set preprocessing module is used for preprocessing the data set through an ADASYN algorithm to obtain a preprocessed data set;
the fault diagnosis module is used for constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set; and the fault type of the fault current to be diagnosed is diagnosed according to the preprocessed data set and the random forest fault diagnosis model.
Specifically, the data set building module comprises:
the wavelet packet decomposition layer construction submodule is used for constructing not less than one wavelet packet decomposition layer;
the node energy calculation submodule is used for carrying out multi-layer wavelet packet decomposition on the fault current by utilizing the wavelet packet decomposition layer to obtain the node energy of each wavelet packet;
the data set construction submodule is used for establishing a level energy feature vector according to the node energy of each wavelet packet and carrying out normalization calculation to obtain a normalized feature vector; and recording the normalized feature vectors and the corresponding fault types to a data set.
The fault diagnosis module includes:
the resampling submodule is used for resampling the preprocessed data set by a Bootstrap method to obtain a training data set;
the decision tree generation submodule is used for constructing a root node, selecting a fault types as decision tree characteristics, and generating a decision tree based on all normalized feature vectors corresponding to the fault types a in the training data set;
the verification submodule is used for randomly extracting the normalized feature vector from the preprocessed data set to serve as a test sample, performing fault type voting on the test sample through the C decision trees, and verifying whether the fault type with the most votes is consistent with the actual fault type of the test sample; if they are consistent, the construction of the random forest fault diagnosis model is complete.
In practice, in the embodiment of the present invention, five-layer wavelet packet decomposition is performed on the fault current data for 4 different fault conditions. The wavelet packet decomposition waveforms of the 1st node and the 32nd node under the different fault conditions are shown in fig. 5(a) and fig. 5(b), where F1 is a short-circuit fault, F2 is a disconnection fault, F3 is a series arc fault, and F4 is a leakage current fault. As can be seen from the figures, the decomposition waveforms of the different faults differ greatly at both low and high frequencies. The energy of each node of the last layer after five-layer wavelet packet decomposition is therefore used as the feature vector, so that the different fault conditions can be better distinguished.
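A minimal sketch of this feature extraction is given below. Several points are assumptions rather than details taken from the patent: the Haar filter pair is used because no wavelet basis is named here, the node energy is taken as the sum of squared node coefficients, the normalization divides by the total energy, and the function name `wavelet_packet_energy_features` is hypothetical. Nodes are returned in filter-bank (natural) order, not in frequency order.

```python
import numpy as np

def wavelet_packet_energy_features(signal, levels=5):
    """Normalized energies of the 2**levels last-layer nodes (Haar filters assumed)."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        children = []
        for w in nodes:
            if w.size % 2:                       # pad odd-length nodes with the last value
                w = np.append(w, w[-1])
            even, odd = w[0::2], w[1::2]
            children.append((even + odd) / np.sqrt(2.0))   # low-pass branch (h)
            children.append((even - odd) / np.sqrt(2.0))   # high-pass branch (g)
        nodes = children
    # Assumed node energy: sum over p of d_{j,n}(p)**2
    energy = np.array([np.sum(d ** 2) for d in nodes])
    total = energy.sum()
    return energy / total if total > 0 else energy

# Made-up test current: a low-frequency component plus a weaker high-frequency one.
t = np.arange(256) / 256.0
current = np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 60 * t)
X = wavelet_packet_energy_features(current, levels=5)
print(X.shape)  # (32,)
```

Five levels give 32 last-layer nodes, which matches the 1st to 32nd nodes referred to in the text.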
In practice, to make the experimental results more convincing, the random forest fault diagnosis model is compared with an SVM and a decision tree on the test results for the different fault conditions, all trained on the data set amplified by the ADASYN algorithm. Precision (P), recall (R) and the F1 value (F1-score) are used as evaluation indexes, and the test results are shown in figs. 6, 7 and 8. As can be seen from fig. 6, with the ADASYN-amplified training set, the diagnosis precision of the random forest is slightly lower than that of the SVM and the decision tree for the disconnection fault (F2), and higher than or equal to theirs for the remaining faults. As shown in figs. 7 and 8, the recall and F1 value of the random forest for all four faults are clearly better than those of the SVM and the decision tree.
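For reference, the three evaluation indexes can be computed per fault type as sketched below. The function name `per_class_scores` and the label lists are made up for illustration and are not the patent's test data.

```python
def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} computed one-vs-rest for each label."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores

# Hypothetical predictions for six test samples.
y_true = ["F1", "F1", "F2", "F2", "F3", "F4"]
y_pred = ["F1", "F2", "F2", "F2", "F3", "F3"]
scores = per_class_scores(y_true, y_pred, ["F1", "F2", "F3", "F4"])
for fault, (p, r, f1) in scores.items():
    print(fault, round(p, 3), round(r, 3), round(f1, 3))
```

Precision penalizes false alarms for a fault type, recall penalizes missed faults, and the F1 value is their harmonic mean, which is why it is reported alongside the other two when classes are imbalanced.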
In this embodiment, to further verify the effectiveness of the method provided by the present invention, the random forest, SVM and decision tree algorithms are compared on the original data set and on the data set amplified by the ADASYN algorithm, respectively; the test results are shown in table 1.
As can be seen from table 1, whether the model is an SVM, a decision tree or a random forest, the test performance after the training set is expanded by the ADASYN algorithm is clearly better than with the original unbalanced training set. Every index of the ADASYN-random forest is clearly superior to those of the other algorithms.
TABLE 1 comparison of test results
For the diagnosis of internal circuit faults of intelligent household appliances, the embodiment of the invention builds a simulation model of the internal circuit and applies a random forest algorithm to diagnose the faults. Wavelet packet decomposition is used to extract the energy features of the fault signal, which helps address the higher harmonics that may be present in the time domain and improves the stability of fault feature extraction. The ADASYN algorithm is used to expand the training set, which addresses the possibly unbalanced distribution of the samples acquired under actual conditions. The experimental results show that the ADASYN-random forest fault diagnosis model achieves higher diagnosis precision and has practical significance and theoretical research value for circuit fault diagnosis.
The method can diagnose the fault of the intelligent household appliance through a convenient and effective algorithm, and can save the after-sale cost while improving the product competitiveness of the intelligent household appliance manufacturer. For the family user, use experience can be improved, risks can be checked in time, and potential safety hazards are reduced.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (10)
1. A fault diagnosis method based on an ADASYN algorithm and a random forest algorithm is characterized by comprising the following steps:
step S1, collecting fault current of an internal circuit of the intelligent household appliance, and recording the fault type of the fault current;
step S2, carrying out multi-layer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current into a data set;
step S3, preprocessing the data set through an ADASYN algorithm to obtain a preprocessed data set;
step S4, constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set;
and step S5, diagnosing the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model.
2. The fault diagnosis method based on the ADASYN algorithm and the random forest algorithm according to claim 1, wherein the step S1 of collecting the fault current of the internal circuit of the intelligent household appliance and recording the fault type of the fault current comprises:
step S11, establishing a circuit model of the internal circuit of the intelligent household appliance by using PSCAD simulation software;
step S12, setting the fault type and running the circuit model;
and step S13, collecting and recording the fault current of the circuit model in the fault type state.
3. The fault diagnosis method based on the ADASYN algorithm and the random forest algorithm according to claim 2, wherein the step S2 of carrying out multi-layer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current into a data set comprises:
step S21, constructing at least one wavelet packet decomposition layer, wherein each layer of wavelet packet decomposition is expressed recursively as:
wherein the recursively defined term refers to the wavelet packet of the n-th node in the j-th wavelet packet decomposition layer, $j \in [1, J]$, $n \in [0, 2^j - 1]$, J is the total number of layers, k is a translation variable, $k \in (-\infty, +\infty)$, t is a time variable, the function h(k-2t) is the output value of the low-pass filter, and the function g(k-2t) is the output value of the high-pass filter;
step S22, carrying out multi-layer wavelet packet decomposition on the fault current by using the wavelet packet decomposition layers to obtain the node energy $E_{j,n}$:
wherein $E_{j,n}$ is the node energy of the n-th node in the j-th wavelet packet decomposition layer, $W_{(j,n)}$ is the node signal of the n-th node in the j-th wavelet packet decomposition layer, $d_{j,n}(p)$ is the p-th coefficient of the node signal $W_{(j,n)}$ after decomposition, $p \in [1, m]$, and m is the frequency band length of the n-th node in the j-th wavelet packet decomposition layer;
step S23, establishing a hierarchical energy feature vector $E_j$ according to the node energy $E_{j,n}$:
wherein $E_j$ is the hierarchical energy feature vector of the j-th wavelet packet decomposition layer;
step S24, calculating the normalized feature vector X according to the hierarchical energy feature vector $E_j$:
step S25, recording the normalized feature vector X and the corresponding fault type into the data set.
4. The fault diagnosis method based on the ADASYN algorithm and the random forest algorithm according to claim 3, wherein the step S3 of preprocessing the data set through the ADASYN algorithm to obtain the preprocessed data set comprises:
step S31, calculating, according to the data set, the number of samples $G_s$ that need to be generated for the fault type s:
$G_s = (m_{s\text{-}max} - m_s) \times \beta$
wherein $m_{s\text{-}max}$ is the number of samples of the fault type s-max, which has the largest total number of samples in the data set; $m_s$ is the number of samples of a fault type s other than the fault type s-max in the data set, $s \in [1, S-1]$; S is the number of fault types in the data set; and $\beta$ is a sample factor, $\beta \in [0, 1]$;
step S32, randomly selecting an i-th sample belonging to the fault type s, and calculating the proportion $r_i$ of samples of the fault type s-max among the K neighboring samples of the i-th sample:
wherein $\Delta_i$ is the number of samples belonging to the fault type s-max among the K neighboring samples of the i-th sample, $i \in [1, m_s]$;
step S34, calculating the final number $g_s$ of new samples to be added for the fault type s, the calculation formula being as follows:
step S35, calculating $g_s$ new samples $s_i$ belonging to the fault type s and adding them to the data set to obtain the preprocessed data set, wherein a new sample $s_i$ is expressed as:
$s_i = x_i + (x_{zi} - x_i) \times \lambda$
wherein $x_{zi}$ is one of the K neighboring samples of the i-th sample $x_i$, the sample $x_{zi}$ does not belong to the fault type s-max, and $\lambda$ is a random number.
5. The fault diagnosis method based on the ADASYN algorithm and the random forest algorithm according to claim 1, wherein the step S4 of constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set comprises:
step S41, resampling the preprocessed data set by a Bootstrap method to obtain a training data set;
step S42, a root node is constructed, a fault types are selected as decision tree features, and a decision tree is generated based on all the normalized feature vectors corresponding to the fault types a in the training data set;
step S43, repeating step S41 and step S42 to obtain C decision trees;
step S44, randomly extracting the normalized feature vector from the preprocessed data set as a test sample;
step S45, performing fault type voting on the test sample through the C decision trees, and verifying whether the fault type with the most votes is consistent with the actual fault type of the test sample, where the expression of the voting result R is:
wherein x is the training data set, y is the actual fault type of the test sample, $r_c(x)$ is the decision tree model, $c \in [1, C]$, $I[\cdot]$ is the indicator function, and $\arg\max(\cdot)$ is the maximum function;
step S46, if they are not consistent, adjusting the parameters of the C decision trees and then repeating step S45; if they are consistent, the construction of the random forest fault diagnosis model is complete.
6. The fault diagnosis method based on the ADASYN algorithm and the random forest algorithm according to claim 5, wherein the step S5 of diagnosing the fault type of the fault current to be diagnosed according to the preprocessed data set and the random forest fault diagnosis model comprises:
step S61, calculating the normalized feature vector of the fault current to be diagnosed;
step S62, based on the normalized feature vector of the fault current to be diagnosed, voting the fault types through the random forest fault diagnosis model and counting the voting results;
and step S63, selecting the fault type with the largest voting times as the diagnosis result of the fault current to be diagnosed.
7. The fault diagnosis method based on the ADASYN algorithm and the random forest algorithm according to claim 1, wherein the fault types include a short-circuit fault, a disconnection fault, a series arc fault, and a leakage current fault.
8. A fault diagnosis system based on an ADASYN algorithm and a random forest algorithm, characterized by comprising:
the fault current acquisition module is used for acquiring fault current of an internal circuit of the intelligent household appliance and recording the fault type of the fault current;
the data set construction module is used for carrying out multilayer wavelet packet decomposition on the fault current, calculating a normalized feature vector of the fault current, and recording the normalized feature vector and the fault type corresponding to the fault current to a data set;
the data set preprocessing module is used for preprocessing the data set through an ADASYN algorithm to obtain a preprocessed data set;
the fault diagnosis module is used for constructing a random forest fault diagnosis model through a random forest algorithm based on the preprocessed data set; and the fault type of the fault current to be diagnosed is diagnosed according to the preprocessed data set and the random forest fault diagnosis model.
9. The ADASYN algorithm and random forest algorithm based fault diagnosis system of claim 8, wherein the data set construction module comprises:
the wavelet packet decomposition layer construction submodule is used for constructing not less than one wavelet packet decomposition layer;
the node energy calculation submodule is used for carrying out multi-layer wavelet packet decomposition on the fault current by utilizing the wavelet packet decomposition layer to obtain node energy of each wavelet packet;
the data set construction submodule is used for establishing a level energy feature vector according to the node energy of each wavelet packet and carrying out normalization calculation to obtain a normalized feature vector; and recording the normalized feature vectors and the corresponding fault types to the data set.
10. The ADASYN algorithm and random forest algorithm based fault diagnosis system of claim 8, wherein the fault diagnosis module comprises:
the resampling submodule is used for resampling the preprocessed data set by a Bootstrap method to obtain a training data set;
the decision tree generation submodule is used for constructing a root node, selecting a fault types as decision tree characteristics, and generating a decision tree based on all the normalized characteristic vectors corresponding to the fault types a in the training data set;
the verification submodule is used for randomly extracting the normalized feature vector from the preprocessed data set to serve as a test sample, performing fault type voting on the test sample through the C decision trees, and verifying whether the fault type with the most votes is consistent with the actual fault type of the test sample; if they are consistent, the construction of the random forest fault diagnosis model is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210267057.2A CN114722915B (en) | 2022-03-16 | 2022-03-16 | Fault diagnosis method and system based on ADASYN algorithm and random forest algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114722915A (en) | 2022-07-08 |
CN114722915B (en) | 2024-07-23 |
Family
ID=82237028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210267057.2A Active CN114722915B (en) | 2022-03-16 | 2022-03-16 | Fault diagnosis method and system based on ADASYN algorithm and random forest algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114722915B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909977A (en) * | 2019-10-12 | 2020-03-24 | 郑州电力高等专科学校 | Power grid fault diagnosis method based on ADASYN-DHSD-ET |
US20210293881A1 (en) * | 2017-11-09 | 2021-09-23 | Hefei University Of Technology | Vector-valued regularized kernel function approximation based fault diagnosis method for analog circuit |
Non-Patent Citations (2)
Title |
---|
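殷涛; 苏盛; 刘爱国; 舒一飞; 薛阳; 杨艺宁: "Electricity theft detection for high-loss lines based on forward stepwise regression", 《电网技术》 (Power System Technology), 3 November 2021 (2021-11-03) * |
袁帅; 张慧丽; 王晓燕; 王涵; 赵波: "Application of imbalanced learning in power equipment fault diagnosis", 信息与电脑(理论版) (Information and Computer, Theory Edition), no. 09, 15 May 2019 (2019-05-15) * |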
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118585917A (en) * | 2024-08-07 | 2024-09-03 | 北京中联太信科技有限公司 | Electromagnetic valve fault online diagnosis method based on time-frequency domain characteristic analysis |
CN118585917B (en) * | 2024-08-07 | 2024-10-11 | 北京中联太信科技有限公司 | Electromagnetic valve fault online diagnosis method based on time-frequency domain characteristic analysis |
Also Published As
Publication number | Publication date |
---|---|
CN114722915B (en) | 2024-07-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||