Disclosure of Invention
In view of the defects in the prior art, the invention provides an intelligent diagnosis method, system, terminal and medium for power equipment fault sounds based on MFCC and an attention mechanism.
According to one aspect of the invention, there is provided an intelligent diagnosis method for power equipment fault sounds, comprising the following steps:
establishing a power equipment fault audio sample database, and dividing a training set and a test set;
respectively carrying out preprocessing operation on the audio samples of the training set and the testing set;
extracting n-dimensional Mel cepstrum coefficients from each preprocessed frame of audio signal as a feature vector of one frame;
taking adjacent m frames of audio signals as a group of samples, and optimizing the feature vectors of the group of samples by using an attention mechanism to form optimized feature vectors;
inputting the optimized feature vector into an audio recognition model for judgment, and finishing training and testing the audio recognition model;
and inputting the audio to be identified into the audio recognition model, and identifying and outputting the corresponding type of power equipment fault sound.
Preferably, establishing the power equipment fault audio sample database and dividing it into a training set and a test set comprises:
collecting audio recordings of common power equipment under different working conditions and defects, and labelling each recording with its type to form a complete power equipment fault audio sample database;
for each type of audio sample, randomly extracting a given proportion of the samples as a training set for training the model, and using the remaining samples as a test set for verifying the effectiveness of the model;
randomly shuffling all audio samples in the training set and the test set together with their corresponding labels.
Preferably, the preprocessing operation comprises: pre-emphasis, de-muting, framing, and windowing.
Preferably, the preprocessing operation further satisfies any one or more of the following:
- the pre-emphasis coefficient is 0.97;
- the de-muting threshold is 40% of the average energy;
- the framing comprises: dividing the audio sample into 25 ms frames with a frame shift of 10 ms;
- the windowing comprises: windowing each frame of the audio signal with a Hamming window.
Preferably, the value of n is 13-20.
Preferably, the value of m is 10-50.
Preferably, the method for constructing the audio recognition model comprises:
adopting a deep neural network and arranging a Dropout layer after each fully connected layer of the deep neural network to obtain the audio recognition model.
Preferably, in training the audio recognition model, the optimized feature vector formed from the MFCC and the attention mechanism is taken as the input of the audio recognition model, and the parameters are updated iteratively through forward propagation and error back-propagation so that the deep neural network learns, finally yielding a weight model that generalizes to recognizing and classifying audio under the different operating states and defects of the power equipment.
Preferably, the activation function used in network training is the tanh function.
According to another aspect of the present invention, there is provided an intelligent diagnosis system for power equipment fault sounds, comprising:
the database construction module is used for establishing a power equipment fault audio sample database and dividing a training set and a test set;
the data preprocessing module is used for respectively preprocessing the audio samples of the training set and the testing set;
a feature vector extraction module which extracts n-dimensional mel cepstrum coefficients from each frame of preprocessed audio signal as a feature vector of one frame;
the feature optimization module takes the adjacent m frames of audio signals as a group of samples, optimizes feature vectors of the group of samples by using an attention mechanism and forms optimized feature vectors;
the audio recognition model module is used for constructing an audio recognition model, inputting the optimized feature vector into the audio recognition model for judgment, and training and testing the audio recognition model; and for inputting the audio to be identified into the audio recognition model, and identifying and outputting the corresponding type of power equipment fault sound.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform any of the methods described above.
According to a fourth aspect of the invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operable to perform the method of any of the above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has the following beneficial effects:
The intelligent diagnosis method, system, terminal and medium for power equipment fault sounds provided by the invention apply deep learning to power equipment fault sound recognition and can effectively establish an acoustics-based power equipment fault diagnosis model; compared with a traditional shallow classifier, deep learning offers advantages in feature self-learning, end-to-end modeling and the like that traditional artificial intelligence algorithms cannot match.
The intelligent diagnosis method, system, terminal and medium for power equipment fault sounds provided by the invention introduce an attention mechanism to improve the MFCC so that it is better suited to representing power equipment sounds: the MFCC vectors of m adjacent frames (for example, 50 frames) in a sample are considered together, and the information that is most important for the current task is selected and processed, which effectively improves the recognition performance.
The intelligent diagnosis method, system, terminal and medium for power equipment fault sounds provided by the invention build a suitable deep neural network structure in which a Dropout layer is arranged after each fully connected layer, which avoids the problem of model overfitting.
Detailed Description
The following examples illustrate the invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are given. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.
Fig. 1 is a flowchart of an intelligent diagnosis method for power equipment fault sounds according to an embodiment of the present invention.
As shown in fig. 1, the intelligent diagnosis method for power equipment fault sounds provided by this embodiment may include the following steps:
S100, establishing a power equipment fault audio sample database, and dividing it into a training set and a test set;
S200, respectively carrying out a preprocessing operation on the audio samples of the training set and the test set;
S300, extracting n-dimensional Mel cepstrum coefficients from each preprocessed frame of the audio signal as the feature vector of that frame;
S400, taking m adjacent frames of the audio signal as a group of samples, and optimizing the feature vectors of the group of samples by using an attention mechanism to form optimized feature vectors;
S500, inputting the optimized feature vectors into an audio recognition model for judgment, and completing the training and testing of the audio recognition model;
S600, inputting the audio to be identified into the audio recognition model, and identifying and outputting the corresponding type of power equipment fault sound.
In S100 of this embodiment, as a preferred embodiment, establishing the power equipment fault audio sample database and dividing it into a training set and a test set may include the following steps:
S101, collecting audio recordings of common power equipment under different working conditions and defects, and labelling each recording with its type to form a complete power equipment fault audio sample database;
S102, for each type of audio sample, randomly extracting a given proportion of the samples as a training set for training the model, and using the remaining samples as a test set for verifying the effectiveness of the model;
S103, randomly shuffling all the audio samples in the training set and the test set together with their corresponding labels.
In a specific application example, 80% can be randomly drawn as a training set, and the remaining 20% can be drawn as a testing set.
In S200 of this embodiment, as a preferred embodiment, the preprocessing operation may include: pre-emphasis, de-mute, framing, and windowing.
In a specific application example, in the pre-emphasis process, the pre-emphasis coefficient may be 0.97.
In one embodiment, the de-muting threshold may be 40% of the average energy in de-muting.
In a specific application example, in the framing process, the framing may include: the audio samples are sliced into 25ms segments with a frame shift of 10 ms.
In a specific application example, in the windowing process, the windowing may include: windowing each frame of audio signal with a hamming window.
In S300 of this embodiment, n may be 13 to 20 as a preferred embodiment.
In S400 of this embodiment, m may be 10 to 50 as a preferred embodiment.
In S500 of this embodiment, as a preferred embodiment, the method for constructing the audio recognition model may include the following step:
adopting a deep neural network and arranging a Dropout layer after each fully connected layer of the deep neural network to obtain the audio recognition model.
In S500 of this embodiment, as a preferred embodiment, the training of the audio recognition model may include the following step:
taking the optimized feature vector formed from the MFCC and the attention mechanism as the input of the audio recognition model, and updating the parameters iteratively through forward propagation and error back-propagation so that the deep neural network learns, finally yielding a weight model that generalizes to recognizing and classifying audio under the different operating states and defects of the power equipment.
In S500 of this embodiment, as a preferred embodiment, the activation function used in network training is the tanh function.
Fig. 2 is a flowchart of an intelligent diagnosis method for power equipment fault sounds according to a preferred embodiment of the present invention.
As shown in fig. 2, the intelligent diagnosis method for power equipment fault sounds provided by the preferred embodiment may include the following steps:
Step 1, establishing an audio sample database of common power equipment faults, and then dividing it into a training set and a test set in proportion.
In this preferred embodiment, step 1 specifically includes the following steps:
Step 1.1, collecting audio recordings of power equipment such as transformers and switches under different working conditions and defects, and labelling the recordings to form a complete power equipment fault audio sample database.
Step 1.2, randomly extracting 80% of the samples of each type as a training set to train the model, and using the remaining 20% as test samples to verify the effectiveness of the model. To ensure effective learning, all samples and their corresponding labels are randomly shuffled and input into the network in shuffled order.
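As a non-limiting illustration, the per-type 80%/20% division and the random shuffling of step 1.2 could be sketched as follows. The function name `split_dataset`, the `(clip, label)` tuple layout and the example labels are hypothetical and are not taken from the embodiment.

```python
import random
from collections import defaultdict

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Split (clip, label) pairs per label, then shuffle each set.

    `samples` is a list of (clip, label) tuples; this layout is an
    assumption made only for the sketch.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)

    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)                      # random draw within one type
        cut = int(round(train_ratio * len(items)))
        train.extend(items[:cut])
        test.extend(items[cut:])

    rng.shuffle(train)                          # shuffled order for the network
    rng.shuffle(test)
    return train, test

# Hypothetical database: 10 clips of each of two types.
samples = [(f"clip_{i}", "normal" if i < 10 else "discharge") for i in range(20)]
train, test = split_dataset(samples)
```

Splitting within each type before shuffling keeps the 80/20 proportion for every fault type rather than only on average.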
Step 2, respectively carrying out pre-emphasis, de-muting, framing and windowing on the audio samples of the training set and the test set.
In this preferred embodiment, step 2 specifically includes the following steps:
Step 2.1, pre-emphasis: pre-emphasis is realized by passing the signal through a first-order finite impulse response (FIR) high-pass filter whose transfer function is:
$$H(z) = 1 - \alpha z^{-1} \tag{1}$$
where α is the pre-emphasis coefficient, taken as 0.97.
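In the time domain, the filter H(z) = 1 - αz^(-1) corresponds to y(n) = x(n) - αx(n-1). A minimal sketch with NumPy follows; the function name `pre_emphasis` and the choice to pass the first sample through unchanged are assumptions of the sketch.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """Apply y(n) = x(n) - alpha * x(n-1); the first sample is kept as is."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

# A constant (zero-frequency) signal is strongly attenuated after the first sample.
y = pre_emphasis([1.0, 1.0, 1.0, 1.0])
```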
Step 2.2, de-muting: a sound signal is judged to be silent according to whether its short-time energy reaches a given threshold. The average energy is generally used as the measure, and the average energy of the sound signal is calculated as:
$$E = \frac{1}{L} \sum_{n=1}^{L} x^{2}(n) \tag{2}$$
where L is the number of sampling points;
x(n) is the value of the n-th sampling point.
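One possible reading of the de-muting step is sketched below: the signal is cut into short segments, and a segment is discarded when its average energy is below 40% of the average energy of the whole signal. The segment length of 400 samples, the function names and the decision to compare against the whole-signal average are assumptions of the sketch, not details given in the embodiment.

```python
import numpy as np

def average_energy(x):
    """Mean of the squared sample values over L sampling points."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))

def remove_silence(x, seg_len=400, ratio=0.4):
    """Keep only segments whose average energy reaches ratio * overall average."""
    x = np.asarray(x, dtype=float)
    threshold = ratio * average_energy(x)
    n_seg = len(x) // seg_len        # trailing samples shorter than seg_len are dropped
    kept = []
    for i in range(n_seg):
        seg = x[i * seg_len:(i + 1) * seg_len]
        if average_energy(seg) >= threshold:
            kept.append(seg)
    return np.concatenate(kept) if kept else np.empty(0)

# Silence, then a burst, then silence: only the burst survives.
signal = np.concatenate([np.zeros(400), np.ones(400), np.zeros(400)])
voiced = remove_silence(signal)
```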
Step 2.3, framing: a number of sampling points are grouped into one observation unit, called a frame, which covers a time of about a seconds. An overlap region is set between two adjacent frames; it contains a number of sampling points and covers a time of about b seconds.
Further, each frame contains 400 sampling points and covers 0.025 s; the overlap region contains 160 sampling points and covers 0.01 s.
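A minimal framing sketch follows. It assumes a sampling rate of 16 kHz (implied by 400 sampling points covering 0.025 s) and advances by 160 sampling points per frame, corresponding to the 10 ms frame shift stated for the framing operation; both values are exposed as parameters so that a different overlap can be set. The function name `frame_signal` is hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Cut a 1-D signal into overlapping frames of frame_len samples."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples i*hop ... i*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

# One second of audio at the assumed 16 kHz rate.
frames = frame_signal(np.arange(16000))
```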
Step 2.4, windowing: each frame of the audio signal is windowed with a Hamming window. The Hamming window function is expressed as:
$$w(n) = (1 - \alpha) - \alpha \cos\left(\frac{2\pi n}{N - 1}\right), \quad 0 \le n \le N - 1 \tag{3}$$
where N is the frame length and α is the window function parameter, generally taken as 0.46.
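For illustration, the Hamming window in its common form w(n) = (1 - α) - α·cos(2πn/(N - 1)) can be generated and applied to the frames as below. The function name `hamming_window` is hypothetical; with α = 0.46 the result coincides with `numpy.hamming`.

```python
import numpy as np

def hamming_window(N, alpha=0.46):
    """w(n) = (1 - alpha) - alpha * cos(2*pi*n / (N - 1)), n = 0 .. N-1."""
    n = np.arange(N)
    return (1.0 - alpha) - alpha * np.cos(2.0 * np.pi * n / (N - 1))

w = hamming_window(400)

# Windowing multiplies every frame element-wise by w (broadcast over rows).
frames = np.ones((3, 400))
windowed = frames * w
```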
Step 3, extracting a 20-dimensional Mel cepstrum coefficient vector from each frame of the preprocessed signal to serve as the feature vector of that frame.
In this preferred embodiment, step 3 specifically includes the following steps:
Step 3.1, a fast Fourier transform is applied to each framed and windowed signal to obtain the spectrum of each frame, and the power spectrum of the sound signal is obtained by taking the squared modulus of the spectrum. Let the DFT of the sound signal be X_a(k):
$$X_a(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le k \le N - 1 \tag{4}$$
where k denotes the k-th frequency of the Fourier transform, x(n) denotes the input sound signal, and N denotes the number of points of the Fourier transform.
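The power spectrum of each windowed frame can be sketched with NumPy's real FFT as follows. The FFT size of 512 points (frames are zero-padded from 400 samples) and the function name `power_spectrum` are assumptions of the sketch; only the non-negative frequencies are kept, which is sufficient for a real-valued signal.

```python
import numpy as np

def power_spectrum(frames, n_fft=512):
    """Return |X_a(k)|^2 for k = 0 .. n_fft/2, one row per frame."""
    spec = np.fft.rfft(frames, n=n_fft, axis=-1)   # zero-pads each frame to n_fft
    return np.abs(spec) ** 2

# A pure tone placed exactly on bin 32 of a 512-point transform.
n = np.arange(400)
tone = np.cos(2.0 * np.pi * 32 * n / 512)[None, :]
P = power_spectrum(tone)
```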
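Step 3.2, the power spectrum is passed through a bank of Mel-scale triangular filters, where the frequency response H_m(k) of the m-th triangular filter is:
$$H_m(k) = \begin{cases} 0, & k < f(m-1) \\ \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\ 0, & k > f(m+1) \end{cases} \tag{5}$$
where f(·) denotes the center frequency of a filter,
m is the filter index, 1 ≤ m ≤ M, and M is the number of filters.
Step 3.3, the logarithmic energy s(m) output by each filter is calculated as:
$$s(m) = \ln\left(\sum_{k=0}^{N-1} |X_a(k)|^{2} H_m(k)\right), \quad 1 \le m \le M \tag{6}$$
Step 3.4, the MFCC coefficients C(n) are obtained through the discrete cosine transform (DCT):
$$C(n) = \sum_{m=1}^{M} s(m) \cos\left(\frac{\pi n (m - 0.5)}{M}\right) \tag{7}$$
where n is the index of the MFCC coefficient.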
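Steps 3.2 to 3.4 can be sketched as below, under stated assumptions: plain triangular filters with unit peak, the common Mel mapping mel = 2595·log10(1 + f/700), M = 26 filters, a 512-point FFT and a 16 kHz sampling rate. None of these values is given in the embodiment, and the function names are hypothetical; only the 20 output coefficients follow step 3.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000):
    """Triangular filters H_m(k) with centres f(m) equally spaced in Mel."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    f = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = f[m - 1], f[m], f[m + 1]
        for k in range(left, centre):              # rising edge
            bank[m - 1, k] = (k - left) / (centre - left)
        bank[m - 1, centre] = 1.0                  # peak at the centre bin
        for k in range(centre + 1, right + 1):     # falling edge
            bank[m - 1, k] = (right - k) / (right - centre)
    return bank

def mfcc(power_frames, bank, n_mfcc=20):
    """Log filter-bank energies s(m) followed by the DCT giving C(n)."""
    energies = power_frames @ bank.T               # (n_frames, M)
    s = np.log(np.maximum(energies, 1e-10))        # floor avoids log(0)
    M = bank.shape[0]
    n = np.arange(1, n_mfcc + 1)[:, None]
    m = np.arange(1, M + 1)[None, :]
    dct_basis = np.cos(np.pi * n * (m - 0.5) / M)  # (n_mfcc, M)
    return s @ dct_basis.T                         # (n_frames, n_mfcc)

bank = mel_filterbank()
feats = mfcc(np.ones((3, 257)), bank)              # flat spectrum as dummy input
```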
Step 4, taking 50 adjacent frames of the audio signal as a group of samples, optimizing the feature vectors of the group of samples by using an attention mechanism, and constructing new feature vectors.
In this preferred embodiment, step 4 specifically includes the following steps:
Step 4.1, input signals: let X = {x_1, x_2, ..., x_n} denote the n input signals.
Step 4.2, attention distribution calculation: for ease of understanding, assume key_i = value_i = x_i; the attention distribution is then:
$$\alpha_i = \mathrm{softmax}(s(key_i, q)) = \mathrm{softmax}(s(x_i, q)) \tag{8}$$
where α_i is the weight of the i-th input and q is the query vector;
softmax maps its inputs to values between 0 and 1 and normalizes them so that they sum to 1;
s(key_i, q) is the attention scoring function, for which the dot-product model is adopted here, i.e.
$$s(x_i, q) = x_i^{T} q \tag{9}$$
Step 4.3, the inputs are weighted and averaged to obtain the attention vector:
$$att(q, X) = \sum_{i=1}^{n} \alpha_i x_i \tag{10}$$
where α_i is the weight of the i-th input.
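A minimal sketch of the dot-product attention of step 4 over one group of adjacent frames is given below. The embodiment does not state how the query q is obtained; using the mean MFCC vector of the group as q is purely an assumption of the sketch, and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def attention_pool(X, q):
    """X: (m, d) MFCC vectors of m adjacent frames; q: (d,) query vector.

    Returns the attention vector sum_i alpha_i * x_i and the weights alpha.
    """
    scores = X @ q              # dot-product scores, one per frame
    alpha = softmax(scores)     # attention distribution
    return alpha @ X, alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))       # 50 frames of 20-dimensional MFCCs
q = X.mean(axis=0)                      # assumed query, not from the embodiment
att, alpha = attention_pool(X, q)

# With a zero query every frame gets the same weight, so the result is the mean.
att_uniform, alpha_uniform = attention_pool(X, np.zeros(20))
```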
Step 5, inputting the optimized feature vector into an audio recognition model constructed from a deep neural network for judgment, and completing the training and testing of the audio recognition model.
In the preferred embodiment, step 5 specifically includes the following steps:
Step 5.1, a deep neural network architecture is built, in which the activation function used in network training is the tanh function:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{11}$$
Step 5.2, a Dropout layer is arranged after each fully connected layer to randomly discard some hidden neurons in the network while keeping the numbers of input and output neurons unchanged.
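One fully connected layer with tanh activation followed by a Dropout layer can be sketched in NumPy as below. The drop rate of 0.5, the layer sizes and the use of inverted dropout (rescaling the kept activations during training) are assumptions of the sketch; the embodiment does not specify them.

```python
import numpy as np

def dense_tanh_dropout(x, W, b, drop_rate=0.5, training=True, rng=None):
    """Fully connected layer with tanh, then dropout on the hidden units."""
    h = np.tanh(x @ W + b)
    if not training or drop_rate == 0.0:
        return h                                # dropout is disabled at test time
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(h.shape) >= drop_rate     # True for units that are kept
    return h * mask / (1.0 - drop_rate)         # inverted dropout rescaling

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 20))                # 4 optimized feature vectors
W = rng.standard_normal((20, 8)) * 0.1          # hypothetical layer sizes
b = np.zeros(8)
h_train = dense_tanh_dropout(x, W, b, training=True, rng=rng)
h_eval = dense_tanh_dropout(x, W, b, training=False)
```

Only hidden activations are masked, so the layer's input and output dimensions are unchanged, in line with step 5.2.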
Step 5.3, after the network structure and parameters have been trained and tuned to their optimum, the preprocessing and feature extraction steps are repeated on the test set samples, the feature vectors are input into the network to obtain the probability of each fault sound, and the fault with the highest probability is taken as the recognition result of the audio sample.
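The final decision of step 5.3 can be illustrated as follows, assuming that the network's last layer produces one score per fault type and that a softmax converts the scores into probabilities. The softmax output layer and the class names are assumptions used only for the example.

```python
import numpy as np

def predict(logits, class_names):
    """Convert output scores to probabilities and pick the most probable class."""
    z = np.asarray(logits, dtype=float)
    z = z - np.max(z)                       # for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return class_names[int(np.argmax(probs))], probs

# Hypothetical fault types and network outputs for one audio sample.
class_names = ["normal", "partial discharge", "loose core"]
label, probs = predict([0.2, 2.5, -1.0], class_names)
```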
Step 6, inputting the audio to be identified into the audio recognition model, and identifying and outputting the corresponding type of power equipment fault sound.
The intelligent diagnosis method for the power equipment fault sound provided by the embodiment of the invention can effectively solve the problem of adaptability of the Mel cepstrum coefficient to the power equipment sound, and can obtain better recognition effect when recognizing the power equipment fault sound.
Fig. 3 is a schematic structural diagram of an intelligent diagnosis system for power equipment fault sounds according to an embodiment of the present invention.
As shown in fig. 3, the intelligent diagnosis system for power equipment fault sounds provided by this embodiment may include the following modules:
the database construction module is used for establishing a power equipment fault audio sample database and dividing a training set and a test set;
the data preprocessing module is used for respectively preprocessing the audio samples of the training set and the testing set;
a feature vector extraction module which extracts n-dimensional mel cepstrum coefficients from each frame of preprocessed audio signal as a feature vector of one frame;
the feature optimization module takes the adjacent m frames of audio signals as a group of samples, optimizes feature vectors of the group of samples by using an attention mechanism and forms optimized feature vectors;
the audio recognition model module is used for constructing an audio recognition model, inputting the optimized feature vector into the audio recognition model for judgment, and training and testing the audio recognition model; and for inputting the audio to be identified into the audio recognition model, and identifying and outputting the corresponding type of power equipment fault sound.
A third embodiment of the present invention provides a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to execute the method according to any one of the above embodiments of the present invention when executing the program.
Optionally, the memory is used for storing a program. The memory may include volatile memory, such as random access memory (RAM), for example static random access memory (SRAM) or double data rate synchronous dynamic random access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., applications or functional modules that implement the above methods), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner. The computer programs, computer instructions, data, and the like described above may be invoked by the processor.
The processor is used for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments. Reference may be made in particular to the description of the preceding method embodiments.
The processor and the memory may be separate structures or may be integrated into a single structure. When the processor and the memory are separate structures, they may be coupled by a bus.
A fourth embodiment of the invention provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any of the above-mentioned embodiments of the invention.
According to the intelligent diagnosis method, the system, the terminal and the medium for the power equipment fault sound provided by the embodiment of the invention, firstly, a common power equipment fault audio sample database is established; then preprocessing operations such as pre-emphasis, de-mute, framing, windowing and the like are carried out on the audio samples; then extracting n-dimensional (such as 20-dimensional) Mel Cepstral Coefficients (MFCC) from each frame signal obtained after preprocessing as a feature vector of the frame signal; then, taking the adjacent m frames (such as 50 frames) as a group of samples, and optimizing by using an attention mechanism to form a new feature vector of the sample; and finally, inputting the optimized feature vector into a built deep neural network for judgment, and identifying fault sounds of various types of power equipment. The embodiment of the invention can effectively solve the problem of adaptability of the Mel cepstrum coefficient to the sound of the power equipment, and the method can obtain better identification effect when identifying the fault sound of the power equipment.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may implement the composition of the system by referring to the technical solution of the method, that is, the embodiment in the method may be understood as a preferred example for constructing the system, and will not be described herein again.
Those skilled in the art will appreciate that, in addition to implementing the system and its devices provided by the present invention purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices implement the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and its devices provided by the present invention can be regarded as a hardware component, and the devices included therein for realizing various functions can also be regarded as structures within the hardware component; the devices for realizing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.