CN114077846A - RF-LSTM-based fault current multi-domain identification method and storage medium



Publication number
CN114077846A
Authority
CN
China
Legal status
Granted
Application number
CN202111198766.1A
Other languages
Chinese (zh)
Other versions
CN114077846B (en)
Inventor
丁津津
刘辉
谢民
郑国强
汪伟
张倩
徐斌
邵庆祝
孙辉
于洋
张峰
俞斌
汪勋婷
张骏
赵文广
Current Assignee
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
Anhui University
Original Assignee
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
Anhui University
Application filed by Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd and Anhui University
Priority to CN202111198766.1A
Publication of CN114077846A
Application granted
Publication of CN114077846B
Legal status: Active

Classifications

    • G06F2218/12 Classification; Matching (aspects of pattern recognition specially adapted for signal processing)
    • G01R31/088 Aspects of digital computing (locating faults in cables, transmission lines, or networks)
    • G01R31/12 Testing dielectric strength or breakdown voltage; Testing or monitoring effectiveness or level of insulation, e.g. of a cable or of an apparatus, for example using partial discharge measurements; Electrostatic testing
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/24323 Tree-organised classifiers
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods (neural networks)
    • G06F2218/06 Denoising by applying a scale-space analysis, e.g. using wavelet analysis
    • G06F2218/08 Feature extraction (aspects of pattern recognition specially adapted for signal processing)

Abstract

The invention relates to an RF-LSTM-based fault current multi-domain identification method and a storage medium. The method comprises the following steps: acquiring an original current signal from an arc fault platform; processing the original current signal by kernel principal component analysis to extract the third principal component, and then extracting time-domain, frequency-domain and energy-domain features from the third principal component signal; next, using the unbiased prediction importance estimates of a random forest to select the highly correlated features under the corresponding load condition; and finally, using the selected features as the input to an LSTM for learning and training, thereby achieving multi-domain identification of fault arcs. The method reduces the amount of computation and improves detection speed and accuracy; the results show that the method can accurately identify arc faults.

Description

RF-LSTM-based fault current multi-domain identification method and storage medium
Technical Field
The invention relates to the technical field of arc detection methods, and in particular to an RF-LSTM-based fault current multi-domain identification method and a storage medium.
Background
As the structure of the power distribution network becomes more complex, the number and variety of electrical products are increasing greatly, and more new energy sources are being connected to the grid. These conditions pose a potential risk to electrical safety. Arc faults are one such serious threat. Because the current of a series arc fault is usually below the tripping threshold of a circuit breaker, traditional protection devices have difficulty detecting series arc faults in time. If a series arc fault is not found and cleared promptly, the faulty electrical contact can generate enough heat to ignite combustible materials, and the resulting fire can spread quickly along wires and cables, seriously threatening the safe operation of the power system. Experimental research on series arc faults and on methods for detecting them is therefore of great significance. At present, arc faults are detected mainly along two lines. The first analyzes the physical characteristics of the arc by monitoring the arc light, noise, temperature change and similar characteristics of the fault arc; this enables accurate location and recognition of intense arcs, but it is only suitable for arc faults occurring at specific locations, such as inside medium- or low-voltage switchgear.
The second performs characteristic analysis on electrical quantities and uses various data mining algorithms to extract fault features and detect arc faults. Examples from the prior art include the following. An AlexNet convolutional neural network has been constructed into which the current signal is input, so that the network autonomously mines the features hidden in the current data to identify series fault arcs. The acquired signals have been decomposed by wavelets, and the mean and standard deviation of the energy of each level of detail signal fed into a BP neural network, forming a wavelet neural network that identifies test samples from different loads. The time-domain waveform and its two-dimensional grayscale conversion have been used to represent the time-domain variation of arc faults, with a convolutional neural network then extracting the grayscale-conversion features. Wavelet packet transformation has been used to extract the characteristic frequency bands of the arc and reconstruct the characteristic band signals, whose fourth-order cumulants give the Gaussian-mutation information of the current signal, yielding a fault arc identification method suitable for branch arcs and mixed loads. Three information entropies have been fused by a three-dimensional entropy distance method, and several typical arc fault types classified in the resulting three-dimensional entropy space. Recurrent neural networks have been used for series arc fault diagnosis and line selection. The time-domain and frequency-domain characteristics of the current signal have been adopted as fault features. The principal components of the current signal have been calculated by KPCA, with their kurtosis and skewness used as arc fault signatures, and an FA-SVM has been used for arc fault identification. Finally, arc detection has been carried out by extracting the time-domain, frequency-domain and wavelet energy entropy features of the current signal as the input to a DNN.
Disclosure of Invention
The invention provides an RF-LSTM-based fault current multi-domain identification method and a storage medium that can solve the above technical problems.
In order to achieve the purpose, the invention adopts the following technical scheme:
an RF-LSTM-based fault current multi-domain identification method comprises the following steps:
first, constructing a series arc fault platform and, taking into account the harmonic components and load noise of present-day power grids, carrying out simulation experiments under various load conditions to obtain an original current signal;
processing the original current signal by kernel principal component analysis to extract the third principal component, and then extracting time-domain, frequency-domain and energy-domain features from the third principal component signal;
second, using the unbiased prediction importance estimates of a random forest to select the highly correlated features under the corresponding load condition;
and finally, using the selected features as the input to the LSTM for learning and training, thereby achieving multi-domain identification of the fault arc.
Further, the processing of the original current signal comprises the following steps:
calculating the test data under different load conditions with the KPCA algorithm; specifically, KPCA uses a mapping function $\Phi$ to map the samples into a high-dimensional feature space F and performs PCA in F, i.e., the KPCA algorithm yields a linear representation of the test sample $x_{new}$ in the resulting subspace, which is the vector after dimensionality reduction;
wherein the KPCA algorithm comprises the following steps:
1) remove the mean to centre the data;
2) calculate the kernel matrix K using a kernel function;
3) calculate the eigenvalues and eigenvectors of the kernel matrix;
4) arrange the eigenvectors as the rows of a matrix in descending order of their eigenvalues, and take the first k rows to form a matrix P;
5) project the centred kernel matrix onto P to obtain the dimension-reduced data.
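The five steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not say how a one-dimensional current record is arranged into a data matrix, so the hypothetical helper `frame_signal` stacks sliding windows of the signal as rows; the window width (29 samples), the kernel width `sigma=2.0` and the synthetic 50 Hz test current are also illustrative choices. Centring the kernel matrix in feature space is the standard KPCA step that makes step 1 hold in F; the RBF kernel is the one the embodiment names.

```python
import numpy as np

def frame_signal(x, width):
    """Hypothetical helper: stack sliding windows of `width` samples (hop = 1) as rows."""
    n = len(x) - width + 1
    return np.stack([x[i:i + width] for i in range(n)])

def rbf_kernel_matrix(X, sigma):
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca_scores(X, k=3, sigma=1.0):
    m = X.shape[0]
    Xc = X - X.mean(axis=0)                    # 1) remove the mean
    K = rbf_kernel_matrix(Xc, sigma)           # 2) kernel matrix
    J = np.full((m, m), 1.0 / m)
    Kc = K - J @ K - K @ J + J @ K @ J         # centre K in feature space
    vals, vecs = np.linalg.eigh(Kc)            # 3) eigenvalues/eigenvectors (ascending)
    order = np.argsort(vals)[::-1][:k]         # 4) largest eigenvalues first, keep k
    vals, vecs = vals[order], vecs[:, order]
    alphas = vecs / np.sqrt(vals)              # scale so each axis w_i has unit length
    return Kc @ alphas                         # 5) projected (dimension-reduced) data

# Illustrative use: third principal component of a synthetic, slightly noisy 50 Hz current
fs = 5800.0                                    # sampling rate of the DAQ board in Section 1.1
t = np.arange(600) / fs
rng = np.random.default_rng(0)
current = np.sin(2 * np.pi * 50 * t) + 0.05 * rng.standard_normal(t.size)
scores = kpca_scores(frame_signal(current, width=29), k=3, sigma=2.0)
third_pc = scores[:, 2]                        # the "third principal component signal"
```

The returned score columns are mutually orthogonal and each has squared norm equal to its eigenvalue, which is a quick sanity check on any KPCA implementation.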
Further, the time-domain features include the following:
(1) Skewness: a normal distribution has a skewness of 0; a right-skewed (positively skewed) distribution has a skewness greater than 0; and a left-skewed (negatively skewed) distribution has a skewness less than 0. It is calculated as follows:
$$S = \frac{k_3}{k_2^{3/2}}$$
where $k_2$ and $k_3$ are the second-order and third-order central moments, respectively;
(2) Kurtosis, also called the kurtosis coefficient, is a number characterizing the height of the peak of the probability density curve at the mean:
$$K = \frac{1}{n}\sum_{i=1}^{n}\frac{(x_i-\bar{x})^4}{D^2}$$
where n is the sample size, D is the variance, $x_i$ is the i-th measured value, and $\bar{x}$ is the arithmetic mean;
(3) The crest factor $C_{arc}$ is a time-domain feature describing changes in the current waveform and is calculated by formula (15), where N is the number of samples and $x_i$ is the amplitude of each current sampling point. The crest factor represents how extreme the peaks in the waveform are; the maximum is taken over the absolute values of the sampling points, which avoids any effect of the positive/negative asymmetry of some load currents on subsequent analysis:
$$C_{arc} = \frac{\max_i |x_i|}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}x_i^2}} \tag{15}$$
(4) The envelope of the time-domain waveform fluctuates far less in the normal state than in the fault state, where it follows a complex trajectory with large fluctuations. The standard deviation V of the envelope sequences is used as a quantitative index of the envelope fluctuation of the time-domain waveform of signal x:
$$V = \max(\sigma_1, \sigma_2) \tag{16}$$
where $\sigma_1$ is the standard deviation of the sequence of upper-envelope maximum points $T_i$ (i = 1, 2, …, n) and $\sigma_2$ is the standard deviation of the sequence of lower-envelope minimum points $B_i$ (i = 1, 2, …, n) on the time-domain waveform of x, calculated as follows:
$$\sigma_1 = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(T_i-\bar{T})^2} \tag{17}$$
$$\sigma_2 = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(B_i-\bar{B})^2} \tag{18}$$
(5) The Spearman rank correlation coefficient $\rho_s$ is a nonparametric index of the correlation between two variables. A normal test current waveform with the same circuit topology is arbitrarily selected as the reference waveform, and the Spearman correlation coefficient of each other test waveform with respect to it is calculated by equation (19):
$$\rho_s = 1 - \frac{6\sum_{i=1}^{n}d_i^2}{n(n^2-1)} \tag{19}$$
where x is the test current waveform, y is the selected reference current waveform, $d_i$ is the difference between the ranks of $x_i$ and $y_i$, and n is the number of samples.
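The five time-domain features can be sketched in NumPy as below. This is an illustrative sketch, not the patent's code: it assumes population (1/n) moments, detects envelope points as simple local maxima and minima of the sampled waveform, and uses the rank-difference form of the Spearman coefficient, which is exact only when there are no tied values.

```python
import numpy as np

def skewness(x):
    d = x - x.mean()
    k2, k3 = np.mean(d ** 2), np.mean(d ** 3)   # second/third central moments
    return k3 / k2 ** 1.5

def kurtosis(x):
    d = x - x.mean()
    D = np.mean(d ** 2)                          # variance (population form)
    return np.mean(d ** 4) / D ** 2

def crest_factor(x):
    # Peak of |x| (robust to asymmetric half-waves) over the RMS value, eq. (15)
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

def envelope_fluctuation(x):
    # Upper-envelope points T_i = local maxima, lower-envelope points B_i = local minima
    mid = x[1:-1]
    T = mid[(mid > x[:-2]) & (mid >= x[2:])]
    B = mid[(mid < x[:-2]) & (mid <= x[2:])]
    return max(np.std(T), np.std(B))             # V = max(sigma1, sigma2), eq. (16)

def spearman(x, y):
    # Rank-difference form of eq. (19); assumes no tied values
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    d = (rx - ry).astype(float)
    n = len(x)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

For an ideal sinusoid these give skewness 0, kurtosis 1.5, crest factor √2 and V ≈ 0, so departures from those values are what the features pick up when an arc distorts the waveform.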
Further, the frequency-domain features include:
(1) the variance of the wavelet coefficients;
(2) the wavelet transform;
(3) normalized harmonics: the harmonics caused by the arc and those caused by power changes under different load conditions are complex and diverse, so features are extracted with the discrete Fourier transform and the harmonics are normalized as H(n):
$$H(n) = \frac{A_n}{A_1} \tag{20}$$
where $A_n$ is the amplitude of the n-th harmonic, $A_1$ is the amplitude of the fundamental component, and H(n) is the resulting harmonic feature.
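Equation (20) can be sketched with a DFT as below. The sketch assumes the 50 Hz supply and 5.8 kHz sampling rate of the test platform in Section 1.1, a record containing a whole number of fundamental cycles (so every harmonic falls exactly on a DFT bin), and an illustrative upper limit of the 7th harmonic; none of these choices is prescribed by the text.

```python
import numpy as np

def harmonic_ratios(x, fs=5800.0, f0=50.0, n_max=7):
    """H(n) = A_n / A_1 for n = 1..n_max, read from the DFT bins nearest n * f0."""
    N = len(x)
    amp = np.abs(np.fft.rfft(x)) * 2.0 / N       # single-sided amplitude spectrum
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    A = np.array([amp[np.argmin(np.abs(freqs - n * f0))] for n in range(1, n_max + 1)])
    return A / A[0]                              # normalize by the fundamental

# 10 cycles at 116 samples per cycle: fundamental of 10 A plus a 2 A fifth harmonic
t = np.arange(1160) / 5800.0
x = 10 * np.sin(2 * np.pi * 50 * t) + 2 * np.sin(2 * np.pi * 250 * t)
H = harmonic_ratios(x)                           # H[4] is H(5), about 0.2 here
```

If the record does not contain whole cycles, spectral leakage spreads each harmonic over neighbouring bins, and a window or interpolation would be needed.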
Further, the energy-domain features include:
(1) wavelet energy entropy: the ratio of the energy of each frequency band to the total energy is calculated according to equation (21) and used as the wavelet packet energy feature of the current signal:
$$H_n = -\sum_{m}\frac{E(i,m)}{E}\ln\frac{E(i,m)}{E}, \qquad E = \sum_{m}E(i,m) \tag{21}$$
where E(i, m) is the wavelet packet energy of the current signal x(t) at decomposition level i and scale m, and $H_n$ is the wavelet packet entropy;
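A self-contained sketch of equation (21) follows. The patent does not name its mother wavelet, so this swaps in the orthonormal Haar wavelet packet, written out by hand to avoid a wavelet-library dependency; the decomposition level of 3 is likewise an assumption. Because the Haar filters are orthonormal, the band energies sum to the signal energy, so each ratio E(i, m)/E is a true energy share.

```python
import numpy as np

def haar_packet_energies(x, level):
    """Energy of each terminal node of a Haar wavelet packet tree.
    len(x) must be divisible by 2**level."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nxt = []
        for c in nodes:
            even, odd = c[0::2], c[1::2]
            nxt.append((even + odd) / np.sqrt(2.0))   # low-pass (approximation) branch
            nxt.append((even - odd) / np.sqrt(2.0))   # high-pass (detail) branch
        nodes = nxt
    return np.array([np.sum(c ** 2) for c in nodes])

def wavelet_packet_entropy(x, level=3):
    E = haar_packet_energies(x, level)
    p = E / E.sum()              # share of each band in the total energy
    p = p[p > 0]                 # 0 * ln 0 is taken as 0
    return -np.sum(p * np.log(p))
```

The entropy ranges from 0 (all energy in one band) to ln(2^level) (energy spread evenly), so a broadband arc disturbance tends to push it upward.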
(2) Power spectral entropy: the power spectrum describes the signal energy, which is conserved when a time-domain signal is converted into a frequency-domain signal by the Fourier transform:
$$s_i = \frac{1}{N}\lvert y_i\rvert^2, \qquad q_i = \frac{s_i}{\sum_{i}s_i}, \qquad H_p = -\sum_{i}q_i\ln q_i \tag{22}$$
where $s_i$ is the power spectral value at the corresponding frequency, $y_i$ is the Fourier transform of the time-domain signal, N is the signal length, $q_i$ is the share of the i-th power spectral value in the overall power spectrum, and $H_p$ is the power spectral entropy;
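Equation (22) translates almost directly into NumPy. This sketch assumes a one-sided spectrum (`np.fft.rfft`) for a real-valued current; using the full two-sided FFT would simply mirror the bins.

```python
import numpy as np

def power_spectral_entropy(x):
    y = np.fft.rfft(x)                  # one-sided spectrum of a real signal
    s = np.abs(y) ** 2 / len(x)         # power spectral values s_i
    q = s / s.sum()                     # share q_i of each bin in the total power
    q = q[q > 0]                        # 0 * ln 0 is taken as 0
    return -np.sum(q * np.log(q))
```

A clean sinusoid concentrates its power in one bin and gives an entropy near 0, while broadband noise spreads power across all bins and gives a value close to ln of the number of bins.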
(3) Sample entropy (SampEn) measures the complexity of a sequence and is calculated as follows.
Divide the N sampling points of one period into a set of m-dimensional vectors:
$$I_m(n) = \{i(n), i(n+1), \ldots, i(n+m-1)\}, \qquad 1 \le n \le N-m+1 \tag{23}$$
The distance d between the vectors $I_m(n)$ and $I_m(t)$ is defined as:
$$d = \max_{k=0,1,\ldots,m-1}\lvert i(n+k)-i(t+k)\rvert \tag{24}$$
Count the number of t (t ≠ n) for which the distance between $I_m(n)$ and $I_m(t)$ is less than or equal to r, denote it $A_n$, and compute:
$$A_n^m(r) = \frac{A_n}{N-m} \tag{25}$$
$A^m(r)$ is defined as:
$$A^m(r) = \frac{1}{N-m+1}\sum_{n=1}^{N-m+1}A_n^m(r) \tag{26}$$
Replace m by m + 1 and repeat equations (25) and (26) to obtain:
$$A^{m+1}(r) = \frac{1}{N-m}\sum_{n=1}^{N-m}A_n^{m+1}(r) \tag{27}$$
$H_s$ is the sample entropy:
$$H_s = -\ln\frac{A^{m+1}(r)}{A^m(r)} \tag{28}$$
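The procedure can be sketched as below. The sketch follows the common Richman–Moorman convention, which uses the same N − m template start points for both dimensions and counts matching pairs; its normalization differs slightly from equations (25) to (27), but the ratio in (28) is essentially the same. The defaults m = 2 and r = 0.2 × standard deviation are conventional choices assumed here, not values given in the text.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * np.std(x)        # conventional tolerance (assumption)

    def matching_pairs(length):
        # Templates of the given length starting at the same N - m points
        templ = np.stack([x[i:i + length] for i in range(N - m)])
        count = 0
        for i in range(len(templ) - 1):
            # Chebyshev distance of eq. (24) to every later template (self-matches excluded)
            d = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            count += np.sum(d <= r)
        return count

    B = matching_pairs(m)          # matches at dimension m
    A = matching_pairs(m + 1)      # matches at dimension m + 1
    return -np.log(A / B)          # eq. (28); infinite if no (m+1)-matches exist
```

A regular waveform keeps most of its m-point matches when extended to m + 1 points, giving a value near 0, whereas an irregular (for example arcing) current loses many of them and scores higher.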
further, using the unbiased prediction importance estimates of a random forest to select the highly correlated features under the corresponding load condition includes random-forest-based feature selection, as follows:
assume that there are C classes and that the probability that a sample point belongs to class c is $p_c$; the Gini value is then:
$$\mathrm{Gini}(p) = \sum_{c=1}^{C}p_c(1-p_c) = 1-\sum_{c=1}^{C}p_c^2 \tag{29}$$
If the current sample set I is divided according to feature A into normal currents ($I_N$) and fault arc currents ($I_F$), the Gini index of set I under feature A is defined as:
$$\mathrm{Gini}(I,A) = \frac{\lvert I_N\rvert}{\lvert I\rvert}\mathrm{Gini}(I_N) + \frac{\lvert I_F\rvert}{\lvert I\rvert}\mathrm{Gini}(I_F) \tag{30}$$
Gini(I, A) represents the uncertainty of data set I after it is partitioned by feature A; the larger the Gini index, the greater the uncertainty of the sample set.
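Equations (29) and (30) can be sketched as follows, together with the threshold search a single decision tree in the forest would perform on one feature. The label encoding (0 for normal current, 1 for fault arc current) and the candidate-threshold scheme are illustrative assumptions; the sketch does not implement the forest's unbiased importance estimation itself.

```python
import numpy as np

def gini(labels):
    """Eq. (29): 1 - sum_c p_c^2 for the class labels in one set."""
    if len(labels) == 0:
        return 0.0                      # an empty side contributes nothing
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_split(feature, labels, threshold):
    """Eq. (30): weighted Gini after splitting on feature <= threshold."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(feature, labels):
    """Return the threshold giving the lowest Gini(I, A) and that Gini value."""
    candidates = np.unique(feature)
    scores = [gini_split(feature, labels, t) for t in candidates]
    i = int(np.argmin(scores))
    return candidates[i], scores[i]
```

A feature whose best split drives Gini(I, A) close to 0 separates normal and arc currents well, which is why impurity reduction is a natural basis for ranking features.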
Further, using the unbiased prediction importance estimates of a random forest to select the highly correlated features under the corresponding load condition also includes feature identification based on long short-term memory (LSTM); specifically, a structure of forget gate, input gate and output gate decides which information is forgotten and which is remembered.
The forget gate is:
$$f_t = \sigma\left(W_f\cdot[h_{t-1}, x_t] + b_f\right) \tag{31}$$
The input $X = [X_1, \ldots, X_T]$ at time t is combined with the previous hidden-layer output $h_{t-1}$, mapped by the matrix $W_f$ to the same size as the hidden layer, added to the bias $b_f$, and squashed to values between 0 and 1 by the sigmoid function σ;
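the input gate is:
$$i_t = \sigma\left(W_i\cdot[h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C\cdot[h_{t-1}, x_t] + b_C\right), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{32}$$
In the output gate, producing the output passed to the next LSTM cell also takes two steps: 1. the combined data at the current moment (the previous hidden state together with the current input) pass through a sigmoid gate, $o_t = \sigma\left(W_o\cdot[h_{t-1}, x_t] + b_o\right)$; and 2. the cell state is adjusted by the tanh function and multiplied by this gate, $h_t = o_t \odot \tanh(C_t)$.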
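One LSTM time step, following the standard gate equations (31) and (32) and the output-gate description above, can be sketched as below. Stacking the four gates' weights into a single matrix `W` (forget, input, candidate, output, in that order) is an implementation convenience assumed here, not something the text specifies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4H, H + D) and maps [h_prev, x_t] to the
    stacked pre-activations of the forget, input, candidate and output gates."""
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])              # forget gate, eq. (31)
    i = sigmoid(z[H:2 * H])          # input gate
    g = np.tanh(z[2 * H:3 * H])      # candidate cell state
    o = sigmoid(z[3 * H:4 * H])      # output gate
    c = f * c_prev + i * g           # new cell state, eq. (32)
    h = o * np.tanh(c)               # new hidden state passed to the next cell
    return h, c
```

With all weights and biases at zero every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step, which is a handy check of the wiring.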
Further, the selection criteria for the highly correlated features are that H(5), the normalized fifth-harmonic amplitude, is the most important feature for arc faults with linear loads, and the power spectral entropy is the most important feature for arc faults with nonlinear loads.
Further, the learning and training step comprises:
after feature selection is complete, inputting the feature set into the LSTM for training. The feature data set is loaded into a training set X, a cell array containing 266 sequences of different lengths, each with 9 features corresponding to cepstral coefficients; Y is the categorical vector of labels 1, 2, …, 4. Each entry of the training set is a matrix with 9 rows, one per feature, and a varying number of columns, one per time step;
defining the LSTM network architecture: the input size is set to 9, the number of features of the input data; an LSTM layer with 100 hidden units is specified that outputs the last element of the sequence; finally, the four classes are specified by a fully connected layer of size 4, followed by a softmax layer and a classification layer;
specifying the training options: the solver is set to "adam" and "GradientThreshold" to 1; the mini-batch size is set to 20 and the maximum number of epochs to 50.
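The forward pass of the architecture described above (9 inputs, an LSTM layer of 100 hidden units returning its last output, a fully connected layer of size 4, then softmax) can be sketched in NumPy as follows. The weights are random and untrained, and the initialization scale is an arbitrary assumption; training with Adam, gradient clipping at 1, mini-batches of 20 and 50 epochs is not shown.

```python
import numpy as np

NUM_FEATURES, NUM_HIDDEN, NUM_CLASSES = 9, 100, 4     # sizes stated in the text

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, scale=0.1):
    # Untrained random weights; the scale is an illustrative assumption
    return {
        "W": scale * rng.standard_normal((4 * NUM_HIDDEN, NUM_HIDDEN + NUM_FEATURES)),
        "b": np.zeros(4 * NUM_HIDDEN),
        "Wfc": scale * rng.standard_normal((NUM_CLASSES, NUM_HIDDEN)),
        "bfc": np.zeros(NUM_CLASSES),
    }

def classify(seq, p):
    """seq: (9, T) matrix, one row per feature and one column per time step."""
    H = NUM_HIDDEN
    h, c = np.zeros(H), np.zeros(H)
    for x_t in seq.T:                                  # run the LSTM over all time steps
        z = p["W"] @ np.concatenate([h, x_t]) + p["b"]
        f, i = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = p["Wfc"] @ h + p["bfc"]                   # fully connected layer on the last output
    e = np.exp(logits - logits.max())                  # numerically stable softmax
    return e / e.sum()                                 # probabilities for labels 1..4

rng = np.random.default_rng(0)
params = init_params(rng)
probs = classify(rng.standard_normal((NUM_FEATURES, 37)), params)
label = int(np.argmax(probs)) + 1                      # predicted label in 1..4
```

Because only the last hidden state feeds the classifier, sequences of any length map to a single 4-way prediction, which is what lets the training set mix sequences of different lengths.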
In another aspect, the present invention also discloses a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to perform the steps of the method as described above.
According to the above technical scheme, series arc faults in low-voltage distribution systems can cause fires; moreover, because of power quality problems and the large number of nonlinear loads such as power electronic equipment, circuits can contain many complex harmonic components, so that the waveforms of electrical quantities under some load conditions resemble those under series arc faults. For various load conditions, the invention provides a multi-domain arc detection method based on random forest and long short-term memory (RF-LSTM), and a series of arc simulation experiments is carried out in the laboratory. First, kernel principal component analysis (KPCA) is performed on the original current signal to extract the third principal component; time-domain, frequency-domain and energy-domain features are then extracted from the third principal component signal; the unbiased prediction importance estimates of a random forest are used to select the features most strongly correlated with fault arcs under linear and nonlinear loads; and the selected features are used as the input to the LSTM for learning and training. The results show that the RF-LSTM arc detection method achieves higher accuracy;
the RF-LSTM-based fault current multi-domain identification method first constructs a series arc fault platform and carries out simulation experiments under various load conditions, addressing present-day grids with large harmonic content and severe load-noise interference. The original current signal is processed by kernel principal component analysis (KPCA), and the principal component is then used as the object of time-domain, frequency-domain and energy-domain feature extraction. Next, the unbiased prediction importance estimates of a random forest are used to select the highly correlated features under the corresponding load condition. Finally, the selected features are used as the training and test sets of the LSTM. The method reduces the amount of computation and improves detection speed and accuracy, and the results show that it can accurately identify arc faults.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is an experimental schematic of the present invention;
FIG. 3 shows current waveforms of a single-phase circuit in the normal and arc fault states;
FIG. 4 shows preprocessed current waveforms;
FIG. 5 shows the three gates in an LSTM neural unit;
FIG. 6 shows multiple feature plots of the preprocessed normal and arc fault currents;
FIG. 7 is a multi-domain feature line graph.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
As shown in fig. 1, the RF-LSTM-based fault current multi-domain identification method according to this embodiment includes:
acquiring an original current signal from an arc fault platform;
processing the original current signal by kernel principal component analysis to extract the third principal component, and then extracting time-domain, frequency-domain and energy-domain features from the third principal component signal;
second, using the unbiased prediction importance estimates of a random forest to select the highly correlated features under the corresponding load condition;
and finally, using the selected features as the input to the LSTM for learning and training, thereby achieving multi-domain identification of the fault arc.
In this embodiment, a series arc fault platform is constructed, and simulation experiments under various load conditions are carried out, taking into account the harmonic components and load noise of present-day power grids, to obtain the original current signal;
the following is a detailed description:
1 Platform structure and arc fault experiments
1.1 Experimental facility
Because arc faults are random, they are difficult to capture or reproduce, so a dedicated test platform must be used to generate fault arcs with different loads. The arc fault test platform was designed according to the standard, as shown in FIG. 2. The arc generator produces a real arc when the cable is loose. The input is a 220 V, 50 Hz power supply, and a current transformer (CT) measures the load current. Instead of an oscilloscope, a purpose-designed hardware board is used for data acquisition (DAQ). The board is based on a single-chip microcontroller, the original sampling rate of the acquired data is 5.8 kHz, and the recorded data are collected on the board and transmitted to a PC through a USB interface.
1.2 Arc fault test
To make the experiments more realistic, various indoor load types and combinations were selected, as shown in Table 1. The electric kettle and the electric baking pan are linear loads; the other appliances are nonlinear loads.
Table 1 Types and power parameters of the selected loads
[Table 1 image not reproduced]
1.3 Experimental analysis
The current waveforms of the single-phase circuit in the normal and arc fault states are shown in FIG. 3, which gives standardized single-phase current waveforms for (a) the electric kettle, (b) the LED lamp, (c) the hair dryer (cold air), (d) the electric baking pan, (e) the hair dryer (cold air) + electric baking pan, (f) the vacuum cleaner, and (g) the LED lamp + humidifier. The original single-phase current waveforms were recorded during the fault arc experiments; in each panel, the waveform to the right of the dotted line is the current after the fault arc occurs. Because the normal current waveform under some load types or load combinations resembles the fault current waveform of other loads, the original current waveform cannot be used directly to judge an arc fault, and the acquired waveforms must therefore be preprocessed.
2 Preprocessing of arc fault signals
2.1 KPCA algorithm
The KPCA algorithm extends PCA and can better extract the features of nonlinear and non-stationary signals. KPCA is commonly used for dimensionality reduction; by computing the eigenvectors of the kernel matrix, uncorrelated components in the measured signals can be separated.
KPCA uses a mapping function $\Phi$ to map the samples into a high-dimensional feature space F and performs PCA in F. Mapping the data into a high-dimensional space gives them better separability in that space. The principle of the KPCA algorithm is as follows. Assume that the data set is a matrix X with m rows and n columns, and map it into the high-dimensional space as $\Phi(X)$.
PCA is then applied to reduce the dimension of the high-dimensional space:
$$\Phi(X)\Phi(X)^T w_i = \lambda_i w_i \tag{1}$$
where $w_i$ (i = 1, 2, …, n) are the eigenvectors in the high-dimensional space and $\lambda_i$ (i = 1, …, n) are the corresponding eigenvalues.
Each eigenvector is expressed as a linear combination of the mapped samples:
$$w_i = \sum_{j}\alpha_j\Phi(x_j) = \Phi(X)\alpha \tag{2}$$
Substituting (2) into (1) gives:
$$\Phi(X)\Phi(X)^T\Phi(X)\alpha = \lambda_i\Phi(X)\alpha \tag{3}$$
Both sides of the equation are then left-multiplied by $\Phi(X)^T$:
$$\Phi(X)^T\Phi(X)\Phi(X)^T\Phi(X)\alpha = \lambda_i\Phi(X)^T\Phi(X)\alpha \tag{4}$$
The purpose is to form two factors of $\Phi(X)^T\Phi(X)$ and replace each with the kernel matrix K, which is symmetric. Several kernel functions are available:
(1) Linear kernel function
The inner product of two vectors plus a constant. It can only solve linearly separable problems, and using it in KPCA reduces KPCA to the original PCA algorithm:
$$k(x,y) = x^Ty + c \tag{5}$$
where the parameter c is adjustable;
(2) Polynomial kernel function
Slightly more complex than the linear kernel; the added exponent d allows it to handle nonlinear problems. Here a must be greater than 0 and c greater than or equal to 0. The polynomial kernel is well suited to problems in which all training data have been normalized:
$$k(x,y) = (a\,x^Ty + c)^d \tag{6}$$
(3) Gaussian radial basis kernel function
The Gaussian kernel is the typical radial basis function (RBF) kernel. It involves the Euclidean distance (2-norm) between two vectors, and its single adjustable parameter σ controls the range of action of the function:
$$k(x,y) = \exp\left(-\frac{\lVert x-y\rVert^2}{2\sigma^2}\right) \tag{7}$$
(4) Exponential kernel function
$$k(x,y) = \exp\left(-\frac{\lVert x-y\rVert}{2\sigma^2}\right) \tag{8}$$
(5) Laplace kernel function
$$k(x,y) = \exp\left(-\frac{\lVert x-y\rVert}{\sigma}\right) \tag{9}$$
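The five kernels can be written as plain NumPy functions, plus a helper that builds the kernel matrix K from the rows of a data matrix. The exponential and Laplace forms used here are the common textbook definitions; the default parameter values are arbitrary illustrations.

```python
import numpy as np

def linear_kernel(x, y, c=0.0):
    return x @ y + c                                          # eq. (5)

def polynomial_kernel(x, y, a=1.0, c=1.0, d=2):
    return (a * (x @ y) + c) ** d                             # eq. (6), a > 0, c >= 0

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))  # eq. (7)

def exponential_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) / (2 * sigma ** 2)) # eq. (8)

def laplace_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) / sigma)             # eq. (9)

def kernel_matrix(X, kernel, **params):
    """K[i, j] = kernel(X[i], X[j], **params); symmetric by construction."""
    m = X.shape[0]
    return np.array([[kernel(X[i], X[j], **params) for j in range(m)] for i in range(m)])
```

The double loop is written for clarity; for long records a vectorized distance computation, as in the RBF helper of the KPCA sketch, is much faster.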
Equation (4) then becomes:
$$K^2\alpha = \lambda_i K\alpha \tag{10}$$
Removing K from both sides gives a formula very similar to that of PCA:
$$K\alpha = \lambda_i\alpha \tag{11}$$
That is, the eigenvectors corresponding to the d largest eigenvalues of K are found; since K is symmetric, these eigenvectors are mutually orthogonal. This yields a set of basis vectors $w_i$ (i = 1, …, d) that spans a subspace of the high-dimensional space, and the goal is to obtain the linear representation of a test sample $x_{new}$ in this subspace, i.e., its vector after dimensionality reduction:
$$w_i^T\Phi(x_{new}) = \sum_{j}\alpha_{i,j}\,k(x_j, x_{new}) \tag{12}$$
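A quick numerical check of the derivation: with the linear kernel, solving Kα = λα (11) and projecting with (12) should reproduce ordinary PCA scores up to the sign of each component, which is the sense in which linear-kernel KPCA reduces to PCA. In this sketch the rows of the centred data are the samples, so `Xc.T` plays the role of Φ(X); the random data and the choice of 2 components are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)

# KPCA route: eigenvectors alpha of the linear kernel matrix, eq. (11)
K = Xc @ Xc.T
lam, alpha = np.linalg.eigh(K)
order = np.argsort(lam)[::-1][:2]
lam, alpha = lam[order], alpha[:, order]
alpha = alpha / np.sqrt(lam)             # makes each w_i = Phi(X) alpha_i unit length
scores_kpca = K @ alpha                  # eq. (12) evaluated at the training samples

# Ordinary PCA route: eigenvectors of Phi(X) Phi(X)^T, eq. (1)
evals, W = np.linalg.eigh(Xc.T @ Xc)
W = W[:, np.argsort(evals)[::-1][:2]]
scores_pca = Xc @ W

# Eigenvectors are defined only up to sign, so align each component before comparing
signs = np.sign(np.sum(scores_kpca * scores_pca, axis=0))
```

The normalization by the square root of λ matters: without it the KPCA axes are not unit vectors in F and the projections come out scaled by √λ.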
The KPCA algorithm is computed as follows:
5) Remove the mean to center the data.
6) Calculate the kernel matrix K using the kernel function.
7) Calculate the eigenvalues and eigenvectors of the kernel matrix.
8) Arrange the eigenvectors as the rows of a matrix, from top to bottom in descending order of their eigenvalues, and take the first k rows to form the matrix P.
9) The data after dimensionality reduction are obtained from P.
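Steps 5) to 9) can be sketched compactly in NumPy with the Gaussian kernel used by this method. Here the centering of step 5) is applied to the kernel matrix, which is the feature-space equivalent of removing the mean, and each eigenvector is rescaled so that the corresponding feature-space axis w_i has unit length. The Gaussian form, the value of sigma, and the arrangement of the current signal into a sample matrix X (one row per sample) are assumptions, not details given in the text.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)); this Gaussian form is an assumption
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca(X, n_components=3, sigma=1.0):
    """Return an (n_samples, n_components) array of kernel principal components."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    K = rbf_kernel_matrix(X, sigma)                        # step 6
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n     # step 5, in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)                  # step 7 (Kc is symmetric)
    order = np.argsort(eigvals)[::-1][:n_components]       # step 8: largest first
    lambdas, alphas = eigvals[order], eigvecs[:, order]
    alphas = alphas / np.sqrt(np.maximum(lambdas, 1e-12))  # unit-norm axes w_i
    return Kc @ alphas                                     # step 9: projected data
```

With X built from the current signal, the third principal component used in this method would be `kpca(X)[:, 2]`. Centering the kernel matrix rather than the raw data is what makes the feature-space mean zero, since the mapping into F is never computed explicitly.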
2.2 Preprocessing results
The preprocessing results are shown in FIG. 4. The preprocessed current waveforms in FIG. 4 are:
(a) principal components of the electric kettle in normal and arc-fault states; (b) principal components of the LED lamp in normal and arc-fault states; (c) principal components of the electric baking pan in normal and arc-fault states; (d) principal components of the electric baking pan in normal and arc-fault states; (e) principal components of the blower in normal and arc-fault states; (f) principal components of the vacuum cleaner in normal and arc-fault states; (g) principal components of the LED lamp combined with the humidifier in normal and arc-fault states.
The KPCA algorithm, with a radial basis kernel function, is therefore applied to the test data under different load conditions. FIG. 4 shows that the third principal component curve changes significantly before and after an arc fault occurs: after the fault, the curve has a higher peak value and a larger fluctuation range. The trend of the third principal component curve can therefore be used to identify an arc fault.
3 Feature extraction for arc faults
An arc is essentially a high-temperature plasma that moves rapidly under the influence of external magnetic fields, and even of the magnetic field generated by the arc itself, forming very complex shapes. It therefore perturbs the current in the circuit. The present invention extracts multi-domain features of the current data in the time domain, the frequency domain and the energy domain.
3.1 Time-domain features
(1) Skewness measures the direction and degree of skew of a statistical data distribution; it is a numerical characteristic of the distribution's asymmetry, defined as the third standardized moment of the sample. A normal distribution has a skewness of 0, a right-skewed (positive-skew) distribution has a skewness > 0, and a left-skewed (negative-skew) distribution has a skewness < 0.
Figure BDA0003304097760000111
where k_2 and k_3 denote the second- and third-order central moments, respectively.
(2) Kurtosis, also called the kurtosis coefficient, is a numerical characteristic of the peak of the probability density curve at the mean. Intuitively, kurtosis reflects how sharp the peak is.
Figure BDA0003304097760000112
where n is the sample size, D is the variance, x_i is the i-th measured value, and x̄ is the arithmetic mean.
(3) The crest factor (C_arc) is also selected as a time-domain feature to describe changes in the current waveform. It is calculated by equation (15), where N is the number of samples and x_i is the amplitude of each current sample point. The crest factor represents how extreme the peaks in the waveform are. The maximum is searched over the absolute values of the current samples, which prevents the asymmetry between the positive and negative half-waves of some load currents from affecting the subsequent analysis:
Figure BDA0003304097760000121
(4) In the normal state, the envelope of the signal's time-domain waveform fluctuates far less than in the fault state; the fault state corresponds to a complex trajectory with large envelope fluctuation. The standard deviation V of the envelope sequences is taken as the quantitative index of the envelope fluctuation of the time-domain waveform of signal x, namely
V = max(σ_1, σ_2)   (16)
where σ_1 is the standard deviation of the upper-envelope maximum-point sequence T_i (i = 1, 2, …, n) of the time-domain waveform of signal x, and σ_2 is the standard deviation of the lower-envelope minimum-point sequence B_i (i = 1, 2, …, n). They are calculated as follows:
Figure BDA0003304097760000122
Figure BDA0003304097760000123
(5) The Spearman rank correlation coefficient (ρ_s) is a nonparametric index that measures the correlation between two variables by assessing how well their relationship can be described by a monotonic function. If the data contain no repeated values, ρ_s is +1 or −1 when the two variables are perfectly monotonically related. ρ_s = 0 means that y shows no tendency to increase or decrease as x increases. The closer x and y are to a perfectly monotonic relationship, the larger the absolute value of ρ_s. A normal test current waveform with the same circuit topology is arbitrarily selected as the reference waveform, and the Spearman correlation coefficient of each other test waveform with respect to it is obtained, as shown in equation (19).
Figure BDA0003304097760000124
Where x is the test current waveform and y is the selected reference current waveform.
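The five time-domain features can be sketched with NumPy as below. The formulas for these features are not reproduced in this text, so each function follows the standard definition matching the prose: skewness as the third standardized moment, kurtosis as the fourth central moment over the squared population variance, the crest factor as the peak absolute value over the RMS value, and the closed-form Spearman coefficient, which assumes no tied values (with ties, ranks are normally averaged and the Pearson correlation of the ranks is used instead). Detecting the envelope points T_i and B_i as simple local extrema is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def skewness(x):
    dev = np.asarray(x, float) - np.mean(x)
    k2, k3 = np.mean(dev ** 2), np.mean(dev ** 3)   # 2nd and 3rd central moments
    return k3 / k2 ** 1.5                           # third standardized moment

def kurtosis(x):
    dev = np.asarray(x, float) - np.mean(x)
    D = np.mean(dev ** 2)                           # population variance
    return np.mean(dev ** 4) / D ** 2               # about 3 for a normal distribution

def crest_factor(x):
    x = np.asarray(x, float)
    rms = np.sqrt(np.mean(x ** 2))
    return np.max(np.abs(x)) / rms                  # |x| guards against half-wave asymmetry

def envelope_fluctuation(x):
    x = np.asarray(x, float)
    mid = x[1:-1]
    T = mid[(mid > x[:-2]) & (mid >= x[2:])]        # local maxima: upper envelope points
    B = mid[(mid < x[:-2]) & (mid <= x[2:])]        # local minima: lower envelope points
    sigma1 = np.std(T) if T.size else 0.0
    sigma2 = np.std(B) if B.size else 0.0
    return max(sigma1, sigma2)                      # V = max(sigma1, sigma2), Eq. (16)

def spearman_rho(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    rx = np.argsort(np.argsort(x))                  # ranks, valid only without ties
    ry = np.argsort(np.argsort(y))
    d = (rx - ry).astype(float)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

In use, `spearman_rho(test_waveform, reference_waveform)` would compare a test current against the chosen normal reference waveform of the same circuit topology; both names are hypothetical.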
3.2 Frequency-domain features
(1) The variance of the wavelet coefficients can reflect the severity of the arc: the stronger the arc, the more pronounced the signal jumps, the larger the corresponding pulses in the wavelet coefficients, and the larger their variance.
(2) The wavelet transform inherits and develops the localization idea of the short-time Fourier transform; its main characteristic is that the transform can fully bring out particular aspects of a problem. The mean absolute deviation (MAD) is a robust measure of the variability of a univariate dataset. It handles outliers better than the standard deviation and greatly reduces their influence. The ratio of the maximum absolute value (MA) to the MAD can characterize the curve.
(3) Harmonics are one of the important features of an arc, and the harmonics caused by power variations under different load conditions are complex and diverse. Feature extraction is performed with the discrete Fourier transform (DFT), and the normalized harmonic H(n) is calculated as follows:
Figure BDA0003304097760000131
where A_n is the amplitude of the n-th harmonic, A_1 is the amplitude of the fundamental component, and H(n) is the resulting harmonic feature.
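A NumPy sketch of the three frequency-domain features follows. The text does not state the mother wavelet, the decomposition depth, which signal the MA/MAD ratio is applied to, or the fundamental frequency; the hand-written Haar wavelet (so that no wavelet library is needed), three levels, applying MA/MAD to wavelet coefficients, and f0 = 50 Hz are all assumptions, and the function names are hypothetical. H(n) is taken as A_n / A_1, consistent with the definitions of A_n and A_1 above.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    x = np.asarray(x, float)
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)

def wavelet_detail_variances(x, levels=3):
    # Feature (1): variance of the detail coefficients at each decomposition level
    approx, variances = np.asarray(x, float), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        variances.append(float(np.var(detail)))
    return variances

def ma_to_mad_ratio(coeffs):
    # Feature (2): maximum absolute value (MA) over mean absolute deviation (MAD)
    c = np.asarray(coeffs, float)
    return np.max(np.abs(c)) / np.mean(np.abs(c - c.mean()))

def normalized_harmonics(x, fs, f0=50.0, max_order=9):
    # Feature (3): H(n) = A_n / A_1 from the single-sided DFT amplitude spectrum
    x = np.asarray(x, float)
    amps = np.abs(np.fft.rfft(x)) * 2.0 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    def amp(order):
        return amps[np.argmin(np.abs(freqs - order * f0))]  # nearest DFT bin
    a1 = amp(1)
    return {order: amp(order) / a1 for order in range(1, max_order + 1)}
```

For H(n) to be read from exact DFT bins, the analysis window should contain an integer number of fundamental cycles; otherwise spectral leakage spreads each harmonic over neighboring bins.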
3.3 Energy-domain features
During arcing, the arc emits a large amount of light and heat. Because energy is conserved, this energy comes from the current in the line, so the energy of the current must also be analyzed. Information entropy is used to describe the complexity and uncertainty of the system.
(1) Compared with wavelet analysis, wavelet packet analysis offers higher accuracy in signal analysis and better resolution of the high-frequency part of the signal; the wavelet energy entropy theory is therefore introduced. Power spectral entropy has the advantage of describing the signal energy over the frequency spectrum, and better captures the change in current-signal energy from the normal period to the arc-fault period. Sample entropy is used for its advantages in processing complex sequence samples.
Wavelet energy entropy features are extracted from the current signal: according to equation (21), the ratio of the energy in each frequency band to the total energy is calculated as the wavelet packet energy feature of the current signal.
Figure BDA0003304097760000132
where E(i, m) is the wavelet energy of the current signal x(t) at decomposition level i and scale m, and H_n is the wavelet packet entropy.
(2) The power spectrum describes the signal energy, following the principle of energy conservation, when a time-domain signal is converted into the frequency domain by the Fourier transform:
Figure BDA0003304097760000133
where s_i is the power spectral value of the spectrum sequence y_i at the corresponding frequency, y_i is the Fourier transform of the time-domain signal, q_i is the proportion of the i-th power spectral value in the total power spectrum, and H_p is the power spectral entropy.
(3) Sample entropy (SampEn) measures the complexity of a sequence. Compared with approximate entropy (ApEn), SampEn has two advantages: its calculation is independent of the data length, and it has good consistency, i.e. parameter changes affect the sample entropy in the same way. The lower the sample entropy, the higher the self-similarity of the sequence; the higher the sample entropy, the more complex the sequence. SampEn is calculated as follows:
The N sampling points of one period are arranged into a set of m-dimensional vectors:
I_m(n) = {i(n), i(n+1), …, i(n+m−1)},   1 ≤ n ≤ N−m+1   (23)
The distance d between the vectors I_m(n) and I_m(t) is defined as:
d = max_{k=0,1,…,m−1} |i(n+k) − i(t+k)|   (24)
Count the number of t for which the distance between I_m(n) and I_m(t) is less than or equal to r, denoted A_n:
Figure BDA0003304097760000141
A^m(r) is defined as:
Figure BDA0003304097760000142
Let the dimension be m + 1 and repeat equations (25) and (26):
Figure BDA0003304097760000143
The sample entropy H_s is:
Figure BDA0003304097760000144
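The three energy-domain features can be sketched as follows. The wavelet-packet tree uses a hand-written Haar filter and three levels (eight bands), and sample entropy uses the common defaults m = 2 and r = 0.2 times the standard deviation; these choices, the natural logarithm, and the function names are assumptions rather than values given in the text. Because an entropy does not depend on the order of its terms, the wavelet-packet nodes are not reordered into frequency order.

```python
import numpy as np

def _entropy(p):
    p = p[p > 0]                                   # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log(p)))

def wavelet_packet_energy_entropy(x, level=3):
    # (1) Full Haar wavelet-packet tree: every node is split, giving 2**level bands
    nodes = [np.asarray(x, float)]
    for _ in range(level):
        nodes = [part for node in nodes
                 for part in ((node[0::2] + node[1::2]) / np.sqrt(2.0),
                              (node[0::2] - node[1::2]) / np.sqrt(2.0))]
    energy = np.array([np.sum(node ** 2) for node in nodes])
    return _entropy(energy / energy.sum())        # H_n over band-energy ratios

def power_spectral_entropy(x):
    # (2) s_i = |y_i|^2 / N, q_i = s_i / sum(s), H_p = -sum q_i ln q_i
    y = np.fft.rfft(np.asarray(x, float))
    s = np.abs(y) ** 2 / len(x)
    return _entropy(s / s.sum())

def sample_entropy(x, m=2, r=None):
    # (3) SampEn = -ln(A / B): B counts matching pairs of length m, A of length m + 1
    x = np.asarray(x, float)
    N = x.size
    r = 0.2 * np.std(x) if r is None else r
    def matches(dim):
        # The same N - m templates are used for both lengths; self-matches are excluded
        tpl = np.array([x[i:i + dim] for i in range(N - m)])
        return sum(int(np.sum(np.max(np.abs(tpl[i + 1:] - tpl[i]), axis=1) <= r))
                   for i in range(len(tpl) - 1))
    return float(-np.log(matches(m + 1) / matches(m)))
```

As the text describes, a regular (self-similar) waveform gives a low sample entropy and an irregular one a higher value. If no pair matches at length m + 1, the ratio is zero and the logarithm diverges, so very short or very noisy windows may need a larger r.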
4 Feature selection and recognition
Feature extraction from the preprocessed current signals yields a variety of time-domain, frequency-domain and energy-domain features. However, the characteristics of fault arcs differ across the many load types and load combinations, so screening the features can effectively improve the accuracy of fault arc detection.
4.1 Feature selection based on random forest
Random forest is an ensemble learning algorithm that combines multiple trees; its basic unit is the decision tree. Here the random forest is built from CART classification trees. CART finds the optimal split point by minimizing the Gini value (or the sample variance) after splitting, and divides the node into two parts; for classification it uses the Gini value as the splitting criterion. The purpose of splitting is to make the data purer, so that the output of the decision tree is closer to the true value. CART measures node purity with the Gini value: the less pure the node, the worse the classification.
Assuming there are C classes and the probability that a sample point belongs to class i is p_i, the Gini value is calculated as:
Figure BDA0003304097760000145
If the current sample set I is divided according to feature A into normal currents (I_N) and fault arc currents (I_F), the Gini index of set I under feature A is defined as:
Figure BDA0003304097760000151
Gini(I, A) represents the uncertainty of data set I after it is partitioned by feature A. The larger the Gini index, the greater the uncertainty of the sample set.
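A small sketch of the Gini value and the Gini index Gini(I, A) follows. The text does not spell out how feature A partitions the set, so the sketch assumes a threshold split into two subsets, as at a CART node; the names are hypothetical. When a CART tree in the forest is grown, the split chosen at each node is the candidate with the smallest Gini(I, A).

```python
import numpy as np

def gini(labels):
    # Gini(I) = 1 - sum_i p_i^2, where p_i is the share of class i in the set
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def gini_index(labels, feature, threshold):
    # Gini(I, A): size-weighted Gini of the two subsets produced by splitting on A
    labels = np.asarray(labels)
    feature = np.asarray(feature, float)
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = labels.size
    return (left.size / n) * gini(left) + (right.size / n) * gini(right)
```

For labels [0, 0, 1, 1] (normal vs. fault arc), a threshold that separates the two classes perfectly gives Gini(I, A) = 0, the purest possible split.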
4.2 Feature recognition based on long short-term memory
Long short-term memory (LSTM) is a special type of recurrent neural network (RNN) that can learn long-term dependencies. Its main purpose is to solve the vanishing- and exploding-gradient problems that arise when training on long sequences; in short, LSTM performs better than an ordinary RNN on longer sequences. To handle forgetting and memorizing information, it adopts a structure of a forget gate, an input gate and an output gate, which solves this problem well. FIG. 5 shows the three gates in the LSTM neural unit.
The forget gate is computed as:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)   (31)
The input x_t of the sequence X = [x_1, …, x_T] at time t is combined with the previous hidden-layer output h_{t−1} and adjusted by the weight matrix W_f to the same size as the hidden layer at time t. The bias b_f is then added, and the sigmoid function maps the result to between 0 and 1. The input gate is computed as:
Figure BDA0003304097760000152
The input gate computes an importance factor for the input and a candidate input adjusted by the tanh function; their product is then added to C_{t−1}, so that the memory is continuously updated. Although some of the original memory may be added and updated again, this does not affect the final result.
In the output gate, producing the output for the next LSTM cell also takes two steps: 1. the cell state is adjusted by the tanh function; 2. the result is multiplied by the gate value computed from the previous hidden state and the current input.
The forget, input and output gates all work the same way: each is a value between 0 and 1 that is multiplied by a set of data to express the importance of that data. The value is obtained from h_{t−1} and x_t through the sigmoid function. Although the gates sit at different positions, they all measure whether the current data needs to be passed on.
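The gate computations described above can be sketched as a single LSTM time step in NumPy. The stacked weight layout (forget, input, candidate, output) and the function names are assumptions for illustration. Equation (31) corresponds to f_t below; the input-gate formula is not reproduced in this text and is assumed to take the standard form of a sigmoid gate i_t times a tanh candidate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4*H, H + D); b has shape (4*H,)."""
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b   # all four pre-activations at once
    f_t = sigmoid(z[0:H])            # forget gate, Eq. (31)
    i_t = sigmoid(z[H:2 * H])        # input gate: importance factor of the input
    g_t = np.tanh(z[2 * H:3 * H])    # tanh-adjusted candidate input
    o_t = sigmoid(z[3 * H:4 * H])    # output gate
    c_t = f_t * c_prev + i_t * g_t   # keep part of the old memory, add the new input
    h_t = o_t * np.tanh(c_t)         # tanh(cell state) scaled by the output gate
    return h_t, c_t
```

With all weights and biases at zero, every gate equals 0.5 and the candidate is 0, so the cell state is simply halved at each step; this makes the role of each gate easy to check by hand.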
5 Arc fault detection
5.1 Results of multi-domain feature extraction
FIG. 6 shows the multi-domain features of the normal and arc-fault preprocessed currents:
(a) multi-domain features of the normal and arc-fault preprocessed currents of the electric kettle; (b) the LED lamp; (c) the hair dryer on cold air; (d) the electric baking pan; (e) the hair dryer on cold air combined with the electric baking pan; (f) the vacuum cleaner; (g) the LED lamp combined with the humidifier.
The multi-domain features follow different trends under normal and fault-arc conditions for different load types and load combinations. For example, the time-domain and frequency-domain features of the electric kettle change markedly, whereas its energy-domain wavelet energy entropy changes little, and the correlation coefficient of the vacuum cleaner barely differs between normal and fault-arc conditions. The features must therefore be screened for each load type and load combination in order to identify fault arcs accurately.
5.2 Results of feature selection
Each feature changes differently for different loads, so feature screening must be targeted to each load condition. As Table 2 shows, the multi-domain features of different load types and load combinations rank differently in the random-forest importance scores. Using unnecessary features as inputs for load-state recognition would increase the training cost of the model and degrade its accuracy.
TABLE 2 Top nine features under different load combinations
Figure BDA0003304097760000161
As Table 2 shows, H(n)_5 (the normalized fifth harmonic) is the most important feature for arc faults with linear loads, and the power spectral entropy is the most important feature for arc faults with nonlinear loads. The correlation coefficient ranks low for linear loads because their sinusoidal waveform weakens the waveform asymmetry to some extent; with nonlinear loads, the large number of harmonics in the line aggravates the waveform asymmetry, and the correlation coefficient ranks high.
5.3 Arc detection using long short-term memory
After feature selection is completed, the feature set is input into the LSTM for training. The feature data set is loaded into the training set X, a cell array containing 266 sequences of different lengths with 9 features corresponding to cepstral coefficients, and Y is the categorical vector of labels 1, 2, …, 4. Each entry of the training set is a matrix with 9 rows (one per feature) and a varying number of columns (one per time step). FIG. 7 shows the multi-domain feature curves.
An LSTM network architecture is then defined. The input size is specified as 9 (the number of features of the input data). An LSTM layer with 100 hidden units is specified, outputting the last element of the sequence. Finally, four classes are specified by a fully connected layer of size 4, followed by a softmax layer and a classification layer.
The training options are then specified: the solver is "adam", "GradientThreshold" is 1, the mini-batch size is 20 and the maximum number of epochs is 50. Because the mini-batches are small and the sequences short, the CPU is better suited to training.
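The architecture above (9 inputs, an LSTM layer of 100 hidden units that outputs only the last time step, a fully connected layer of size 4, and a softmax) can be sketched as a NumPy forward pass for a single variable-length sequence. This is an illustrative sketch with random weights, not the trained model: the Adam solver, gradient threshold of 1, mini-batch size of 20 and 50 epochs describe training, which is not implemented here. The parameter names, initialization and gate layout are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_features=9, n_hidden=100, n_classes=4, seed=0):
    # Hypothetical random initialization; a trained model would replace these values
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (4 * n_hidden, n_hidden + n_features)),
        "b": np.zeros(4 * n_hidden),
        "W_fc": rng.normal(0.0, 0.1, (n_classes, n_hidden)),
        "b_fc": np.zeros(n_classes),
    }

def classify_sequence(seq, params):
    """seq has shape (9, T): one row per feature, one column per time step."""
    W, b = params["W"], params["b"]
    H = W.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    for x_t in seq.T:                                  # walk through the time steps
        z = W @ np.concatenate([h, x_t]) + b
        f, i, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[3 * H:])
        g = np.tanh(z[2 * H:3 * H])
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = params["W_fc"] @ h + params["b_fc"]       # size-4 fully connected layer, last step only
    e = np.exp(logits - logits.max())                  # numerically stable softmax
    probs = e / e.sum()
    return int(np.argmax(probs)) + 1, probs            # predicted label in 1..4
```

Because the hidden state is carried column by column and only the last output feeds the classifier, sequences of different lengths (as in the 266-sequence training set) need no padding in this single-sequence form.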
TABLE 3 Identification of arc faults
Figure BDA0003304097760000171
In summary, the present invention proposes an arc detection method based on multi-domain features and LSTM. First, a simple circuit topology that produces arc faults is built according to the standard. Kernel principal component analysis (KPCA) is then applied to the fault arc current signal, and the third principal component is extracted. Features of the KPCA-preprocessed signal are extracted in the time domain, frequency domain and energy domain. A random forest estimates the unbiased predictor importance under linear and nonlinear load conditions to obtain the corresponding feature combinations. This greatly reduces the complexity of the LSTM model and improves the speed and accuracy of arc detection.
An embodiment of the application also provides an electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus,
a memory for storing a computer program;
a processor, configured to implement the above fault arc multi-domain identification method when executing a program stored in a memory, the method including:
processing the original current signal, performing kernel principal component analysis to extract a third principal component, and then performing time domain, frequency domain and energy domain feature extraction on the third principal component signal;
secondly, carrying out unbiased predictor importance estimation with a random forest to select highly correlated features under the corresponding load condition;
and finally, using the screened features as the feature input of the LSTM for learning and training to realize multi-domain identification of the fault arc.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a random access memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, a computer-readable storage medium is also provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned fault arc multi-domain identification method.
In a further embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of fault arc multi-domain identification of the above-described embodiment.
It is understood that the electronic device and the storage medium provided by the embodiment of the present invention correspond to the method provided by the embodiment of the present invention, and the explanation, the example and the beneficial effects of the related contents can refer to the corresponding parts in the method.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant points, refer to the corresponding description of the method embodiment.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An RF-LSTM-based fault current multi-domain identification method, characterized by comprising the following steps:
acquiring an original current signal from an arc fault platform;
processing the original current signal, performing kernel principal component analysis to extract a third principal component, and then performing time domain, frequency domain and energy domain feature extraction on the third principal component signal;
secondly, carrying out unbiased predictor importance estimation with a random forest to select highly correlated features under the corresponding load condition;
and finally, using the screened features as the feature input of the LSTM for learning and training to realize multi-domain identification of the fault arc.
2. The RF-LSTM based fault current multi-domain identification method of claim 1, wherein: the processing of the raw current signal comprises the steps of:
processing test data under different load conditions with the KPCA algorithm; specifically, the KPCA algorithm uses a mapping function
Figure FDA0003304097750000011
to map the samples into a high-dimensional feature space F and performs PCA in F, i.e. it obtains the linear representation of the test sample x_new in the subspace, namely the vector after dimensionality reduction;
wherein, the calculation process of the KPCA algorithm comprises the following steps:
1) removing the mean to center the data;
2) calculating the kernel matrix K using a kernel function;
3) calculating the eigenvalues and eigenvectors of the kernel matrix;
4) arranging the eigenvectors as the rows of a matrix, from top to bottom in descending order of the corresponding eigenvalues, and taking the first k rows to form a matrix P;
5) obtaining the data after dimensionality reduction from P.
3. The RF-LSTM based fault current multi-domain identification method of claim 1, wherein:
the time domain features include the following:
(1) skewness: a normal distribution has a skewness of 0; a right-skewed distribution, also called a positive-skew distribution, has a skewness greater than 0; a left-skewed distribution, also called a negative-skew distribution, has a skewness less than 0; skewness is calculated as follows:
Figure FDA0003304097750000012
where k_2 and k_3 respectively represent the second-order and third-order central moments;
(2) kurtosis, also called the kurtosis coefficient, is the numerical characteristic of the peak of the probability density distribution curve at the mean, calculated as follows:
Figure FDA0003304097750000013
where n is the sample size, D is the variance, x_i is the i-th measured value, and x̄ is the arithmetic mean;
(3) the crest factor C_arc is a time-domain feature describing the change of the current waveform, calculated by formula (15), where N is the number of samples and x_i is the amplitude of each current sample point; the crest factor represents how extreme the peaks in the waveform are, and the maximum is searched over the absolute values of the current samples, which prevents the asymmetry between the positive and negative half-waves of some load currents from affecting the subsequent analysis:
Figure FDA0003304097750000021
(4) the envelope of the signal's time-domain waveform fluctuates far less in the normal state than in the fault state, which corresponds to a complex trajectory with large envelope fluctuation; the standard deviation V of the envelope sequences is used as the quantitative index of the envelope fluctuation of the time-domain waveform of signal x, namely
V = max(σ_1, σ_2)   (16)
where σ_1 is the standard deviation of the upper-envelope maximum-point sequence T_i (i = 1, 2, …, n) of the time-domain waveform of signal x, and σ_2 is the standard deviation of the lower-envelope minimum-point sequence B_i (i = 1, 2, …, n), calculated as follows:
Figure FDA0003304097750000022
Figure FDA0003304097750000023
(5) the Spearman rank correlation coefficient ρ_s is a nonparametric index measuring the correlation between two variables; a normal test current waveform with the same circuit topology is arbitrarily selected as the reference waveform, and the Spearman correlation coefficients of the other test waveforms are obtained, as shown in equation (19).
Figure FDA0003304097750000024
Where x is the test current waveform and y is the selected reference current waveform.
4. The RF-LSTM based fault current multi-domain identification method of claim 1, wherein: the frequency domain features include:
(1) a wavelet coefficient variance;
(2) performing wavelet transformation;
(3) harmonics: the harmonics caused by the arc and by power variations under different load conditions are complex and diverse; feature extraction is performed using the discrete Fourier transform, and the normalized harmonic H(n) is calculated as follows:
Figure FDA0003304097750000025
where A_n is the amplitude of the n-th harmonic, A_1 is the amplitude of the fundamental component, and H(n) is the resulting harmonic feature.
5. The RF-LSTM based fault current multi-domain identification method of claim 1, wherein the energy domain characterization comprises:
(1) wavelet energy entropy features are extracted from the current signal, and the ratio of the energy of each frequency band to the total energy is calculated according to equation (21) as the wavelet packet energy feature of the current signal,
Figure FDA0003304097750000031
where E(i, m) is the wavelet energy of the current signal x(t) at decomposition level i and scale m, and H_n is the wavelet packet entropy;
(2) the power spectrum describes the signal energy, following the principle of energy conservation, when a time-domain signal is converted into the frequency domain by the Fourier transform:
Figure FDA0003304097750000032
where s_i is the power spectral value of the spectrum sequence y_i at the corresponding frequency, y_i is the Fourier transform of the time-domain signal, q_i is the proportion of the i-th power spectral value in the total power spectrum, and H_p is the power spectral entropy;
(3) sample entropy SampEn is used to measure the complexity of a sequence, and the calculation steps of SampEn are as follows:
arranging the N sampling points of one period into a set of m-dimensional vectors,
I_m(n) = {i(n), i(n+1), …, i(n+m−1)},   1 ≤ n ≤ N−m+1   (23)
the distance between the vectors I_m(n) and I_m(t) being defined as d;
d = max_{k=0,1,…,m−1} |i(n+k) − i(t+k)|   (24)
counting the number of t for which the distance between I_m(n) and I_m(t) is less than or equal to r, denoted A_n:
Figure FDA0003304097750000033
A^m(r) is defined as:
Figure FDA0003304097750000034
letting the dimension be m + 1 and repeating equations (25) and (26):
Figure FDA0003304097750000035
H_s is the sample entropy:
Figure FDA0003304097750000036
6. The RF-LSTM-based fault current multi-domain identification method of claim 1, wherein unbiased predictor importance estimation with a random forest is used to select highly correlated features under the corresponding load condition, comprising random-forest-based feature selection with the following steps:
assuming there are C classes and the probability that a sample point belongs to class i is p_i, the Gini value is calculated as:
Figure FDA0003304097750000041
If the current sample set $I$ is divided according to feature $A$ into normal currents ($I_N$) and fault arc currents ($I_F$), then the Gini index of set $I$ under feature $A$ is defined as:
$\mathrm{Gini}(I, A) = \frac{|I_N|}{|I|}\,\mathrm{Gini}(I_N) + \frac{|I_F|}{|I|}\,\mathrm{Gini}(I_F)$
$\mathrm{Gini}(I, A)$ represents the uncertainty of data set $I$ after it is partitioned by feature $A$; the larger the Gini index, the greater the uncertainty of the sample set.
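The Gini computations of claim 6 can be illustrated with the following sketch. It assumes that feature A divides I into $I_N$ and $I_F$ by a simple threshold and searches for the threshold with the smallest $\mathrm{Gini}(I, A)$, as a single decision-tree node would; the function names and the threshold rule are hypothetical. Impurity-based random-forest importance sums such Gini decreases over all splits on a feature; the unbiased predictive importance estimate named in the claim may be computed differently (for example by permuting the feature), which this sketch does not cover.

```python
from collections import Counter


def gini(labels):
    """Gini value 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(labels, feature_values, threshold):
    """Gini(I, A) when feature A splits I at `threshold` (hypothetical split rule)."""
    left = [y for y, v in zip(labels, feature_values) if v <= threshold]
    right = [y for y, v in zip(labels, feature_values) if v > threshold]
    n = len(labels)
    # Weighted sum of the subset impurities, |I_N|/|I| * Gini(I_N) + |I_F|/|I| * Gini(I_F)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(labels, feature_values):
    """Threshold on feature A with the smallest Gini(I, A)."""
    candidates = sorted(set(feature_values))
    return min(
        ((t, gini_split(labels, feature_values, t)) for t in candidates),
        key=lambda pair: pair[1],
    )
```

For example, with labels `["N", "N", "F", "F"]` and feature values `[0.1, 0.2, 0.8, 0.9]`, `best_split` returns the threshold 0.2 with $\mathrm{Gini}(I, A) = 0$, down from $\mathrm{Gini}(I) = 0.5$ for the unsplit set.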
7. The RF-LSTM-based fault current multi-domain identification method as claimed in claim 6, wherein random forests are used for unbiased predictive importance estimation to select highly correlated features under the corresponding load conditions, the method further comprising feature identification based on long short-term memory (LSTM), which uses the structure of a forget gate, an input gate and an output gate to handle the information to be forgotten and the information to be memorized;
wherein the forget gate is computed as:
$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (31)$
where the input $x_t$ at time $t$, taken from the input sequence $X = [X_1, \ldots, X_T]$, is combined with the previous hidden-layer state $h_{t-1}$, mapped by the weight matrix $W_f$ to the size of the hidden layer, offset by the bias $b_f$, and squashed into the range 0 to 1 by the sigmoid function $\sigma$;
The input gate is computed as:
[Formula image not reproduced]
In the output gate, producing the output passed to the next LSTM cell likewise takes two steps: adjustment by a tanh function, and multiplication of the hidden state data from the previous time step with the combined input data at the current time.
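One time step of the gate structure in claim 7 is sketched below in plain Python. Only the forget gate (31) is written out in the claim, so the input-gate, candidate-state and output-gate equations used here are the standard LSTM formulation and should be read as assumptions; the helper names and the dictionary layout of `params` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function sigma(z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps "f", "i", "c", "o" to (W, b); each W has shape
    hidden x (hidden + input) and acts on the concatenation [h_{t-1}, x_t].
    """
    z = list(h_prev) + list(x_t)  # [h_{t-1}, x_t]

    def gate(name, activation):
        w, b = params[name]
        return [activation(s + bj) for s, bj in zip(matvec(w, z), b)]

    f_t = gate("f", sigmoid)      # forget gate, equation (31)
    i_t = gate("i", sigmoid)      # input gate
    c_hat = gate("c", math.tanh)  # candidate cell state
    o_t = gate("o", sigmoid)      # output gate
    # Forget part of the old state, memorize part of the new candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Output: tanh-adjusted cell state scaled by the output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run_lstm(sequence, params, hidden_size):
    """Feed a sequence of feature vectors and return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return h
```

With the forget, input and output gates biased fully open and a candidate state of $\tanh(x_t)$, feeding `[[0.5], [0.5]]` accumulates $2\tanh(0.5)$ in the cell state and returns $\tanh(2\tanh(0.5))$ as the last hidden state.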
8. The RF-LSTM-based fault current multi-domain identification method of claim 7, wherein the selection criterion for the highly correlated features is that H(n)5 is the most important feature for linear-load arc faults and the power spectral entropy is the most important feature for non-linear-load arc faults.
9. The RF-LSTM-based fault current multi-domain identification method of claim 1, wherein the learning and training step comprises:
after feature selection is complete, the feature set is input into the LSTM for training; the feature data set is loaded as a training set X, which is a cell array containing 266 sequences of different lengths with 9 features corresponding to cepstral coefficients; y is the categorical vector of labels 1, 2, …, 4; each entry of the training set is a matrix with 9 rows, one per feature, and a varying number of columns, one per time step;
defining the LSTM network architecture: the input size is specified as 9, i.e. the number of features of the input data; an LSTM layer with 100 hidden units is specified that outputs the last element of the sequence; finally, four classes are specified by including a fully connected layer of size 4, followed by a softmax layer and a classification layer;
specifying the training options: the solver is set to "adam" and "GradientThreshold" to 1; the mini-batch size is set to 20 and the maximum number of epochs to 50.
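The classification stage of claim 9 (a fully connected layer of size 4, a softmax layer, and a gradient threshold of 1) can be illustrated with the plain-Python sketch below. It starts from the last hidden state output by the LSTM layer and assumes an L2-norm rescaling rule for the gradient threshold; the helper names are hypothetical, and the Adam solver, mini-batching and the training loop are not shown.

```python
import math


def fully_connected(h, weights, bias):
    """Affine layer: one output score per class."""
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_last, weights, bias, labels=(1, 2, 3, 4)):
    """Map the last LSTM hidden state to one of the labels 1..4."""
    probs = softmax(fully_connected(h_last, weights, bias))
    best = max(range(len(probs)), key=lambda k: probs[k])
    return labels[best], probs


def clip_gradient(grad, threshold=1.0):
    """Rescale a gradient so its L2 norm is at most `threshold` (assumed rule)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    return [g * threshold / norm for g in grad]
```

With the claim's 100-unit LSTM layer, `weights` would be a 4 × 100 matrix and `bias` a vector of length 4; `classify` returns the predicted label together with the softmax probabilities.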
10. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 9.
CN202111198766.1A 2021-10-14 Fault current multi-domain identification method based on RF-LSTM and storage medium Active CN114077846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111198766.1A CN114077846B (en) 2021-10-14 Fault current multi-domain identification method based on RF-LSTM and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111198766.1A CN114077846B (en) 2021-10-14 Fault current multi-domain identification method based on RF-LSTM and storage medium

Publications (2)

Publication Number Publication Date
CN114077846A 2022-02-22
CN114077846B 2024-06-04

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905835A (en) * 2022-11-15 2023-04-04 国网四川省电力公司电力科学研究院 Low-voltage alternating current arc fault diagnosis method fusing multidimensional characteristics
CN116186018A (en) * 2023-04-25 2023-05-30 国网冀北电力有限公司 Power data identification and analysis method based on safety control
CN117648634A (en) * 2024-01-30 2024-03-05 合肥工业大学 Method and system for predicting performance of connecting hardware fitting of power distribution network based on time domain and frequency domain information

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110082640A (en) * 2019-05-16 2019-08-02 国网安徽省电力有限公司 Distribution network single-phase ground fault identification method based on long short-term memory network
WO2020015277A1 (en) * 2018-07-20 2020-01-23 国电南瑞科技股份有限公司 Arc light fault identifying device and method based on panoramic information
CN111610416A (en) * 2020-05-25 2020-09-01 南京航空航天大学 Series arc fault intelligent circuit breaker
CN111935762A (en) * 2020-07-27 2020-11-13 国网安徽省电力有限公司 EWT and CNN-based distribution network fault diagnosis method and system under 5G carrier network
CN113092955A (en) * 2021-03-10 2021-07-09 浙江华消科技有限公司 Arc fault detection method, device, equipment and storage medium

Non-Patent Citations (1)

Title
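吴晓欣; 何怡刚; et al.: "Bi-LSTM transformer DGA fault diagnosis method considering complex time-series correlation characteristics", 电力自动化设备 (Electric Power Automation Equipment), no. 08, 10 August 2020 (2020-08-10) *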

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant