CN117497003A - Transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax - Google Patents

Transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax

Info

Publication number
CN117497003A
Authority
CN
China
Prior art keywords
voiceprint
frame
mel
formula
transformer
Prior art date
Legal status
Pending
Application number
CN202311622037.3A
Other languages
Chinese (zh)
Inventor
汪兆冉
黄文礼
杨建旭
张可
吴国元
韩俊宝
晏雨晴
侯仕杰
程晗
Current Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co., Ltd.
Original Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Anhui Nanrui Jiyuan Power Grid Technology Co., Ltd.
Priority to CN202311622037.3A
Publication of CN117497003A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L17/04: Speaker identification or verification techniques; training, enrolment or model building
    • G10L17/18: Speaker identification or verification techniques; artificial neural networks; connectionist approaches
    • G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/30: Speech or voice analysis techniques characterised by the analysis technique, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to a transformer voiceprint recognition method based on a channel attention mechanism and AH-Softmax, which comprises the following steps: acquiring audio data of any power transformer in real time, and obtaining the binary-format voiceprint data and the sampling frequency of the audio data; preprocessing the binary-format voiceprint data; extracting features from the voiceprint data in the training set; improving the VGG19 model to obtain an improved VGG19 model; training the improved VGG19 model with the voiceprint feature training set, inputting the audio data of the power transformer to be identified into the trained VGG19 model, and identifying the voiceprint type of the power transformer. The improved VGG19 model discriminates the features of each channel more effectively, which strengthens its ability to distinguish transformer voiceprints; it also emphasizes the information content of transformer voiceprint samples and can dynamically distinguish samples carrying different amounts of information, improving the recognition capability and accuracy of the improved VGG19 model on transformer voiceprint signals.

Description

Transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax
Technical Field
The invention relates to the technical field of power equipment, in particular to a transformer voiceprint recognition method based on a channel attention mechanism and AH-Softmax.
Background
The power transformer is one of the most critical devices in the power system, undertaking demanding tasks such as voltage conversion and electric energy transmission, so ensuring its normal and stable operation is of great significance to the safe operation of the whole power system. When different faults occur inside a transformer, the sound signals it emits contain very rich state information and can reflect its operating condition to a large extent. With the rapid development of artificial intelligence, fault diagnosis of the mechanical state of transformer equipment by means of voiceprint recognition technology has become a new research hotspot.
Algorithms such as the BP neural network, the SVM support vector machine, the CNN model and the LSTM model cannot meet the speed and accuracy requirements of modern power systems, and under the challenge of multiple factors the shortcomings of these traditional recognition methods become even more pronounced. Specifically, transformer voiceprints are mainly divided into normal and abnormal voiceprint signals: the normal signals include the normal operating sound alone and the normal operating sound mixed with human voices, bird calls, rain and similar ambient sounds, while the abnormal signals include breakdown, magnetic-bias operating conditions, short-circuit impact, partial discharge and other conditions; the number of transformer voiceprint types is therefore large. At present the traditional recognition methods mainly suffer from low speed and low accuracy when recognizing transformer voiceprints. On the one hand, conventional recognition models usually downsample while extracting features, so the number of channels increases sharply as the height and width of the image shrink; among these many channels the information carried varies widely, with some channels strongly correlated with the transformer voiceprint, some weakly correlated, and some hardly correlated at all. The weakly correlated and uncorrelated channels reduce the accuracy of transformer voiceprint recognition and lengthen the recognition time. On the other hand, the classification function of a traditional recognition model usually assumes pure, noise-free samples, but in practical applications it is difficult to remove external noise during sample collection; these noisy samples have a negative influence on training and reduce the classification accuracy of the recognition model on transformer voiceprints.
Disclosure of Invention
In order to solve the problems of low recognition speed and low recognition accuracy of the voiceprint of the transformer, the invention aims to provide a method for recognizing the voiceprint of the transformer based on a channel attention mechanism and AH-Softmax, which can rapidly and accurately recognize different types of voiceprint signals of the transformer.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a transformer voiceprint recognition method based on a channel attention mechanism and AH-Softmax, the method comprising the sequential steps of:
(1) Utilizing r voiceprint acquisition sensors to acquire audio data of any power transformer in real time, wherein each acquired audio data corresponds to a file address; acquiring voiceprint data in a binary format of audio data and sampling frequency;
(2) Preprocessing voiceprint data in a binary format to obtain preprocessed voiceprint data omega (n), forming a data set, and dividing the data set into a training set and a verification set according to a proportion;
(3) Characteristic extraction is carried out on voiceprint data in the training set by utilizing a Mel Frequency Cepstrum Coefficient (MFCC) to obtain voiceprint characteristics and form a voiceprint characteristic training set;
(4) Improving the VGG19 model to obtain an improved VGG19 model;
(5) Training the improved VGG19 model by utilizing the voiceprint feature training set to obtain a trained VGG19 model, and verifying the trained VGG19 model through the verification set;
(6) Inputting the audio data of the power transformer to be identified into the trained VGG19 model, and identifying the voiceprint type of the power transformer.
The step (2) specifically comprises the following steps:
(2a) Denoising: removing a mute part and a noise part of voiceprint data in a binary format by using an endpoint detection method based on the audio amplitude to obtain a residual part;
(2b) Pre-emphasis: the loss of the residual part is compensated by pre-emphasis; the input voiceprint data in binary format is denoted S(n), and the audio after passing through the first-order FIR filter is given by formula (1):
In formula (1), a is a constant, taken as 0.96;
(2c) Framing: dividing the audio into N sections of voice signals with fixed sizes by utilizing a framing method, wherein each section of voice signal is a frame, and the frame length frame takes 25ms; the frame division adopts an overlapping segmentation method, the overlapping part of the previous frame and the next frame is frame shift, and the ratio m of the frame shift to the frame length is 0.5; framing a speech signal of length N as shown in equation (2):
the data is divided into n frames, each frame f n The position of (2) is [ m ] frame (n-1), m ] frame (n-1) +frame]If the last frame is at the end (frame + (n-1) +frame)>N, filling the excess part with 0;
(2d) Windowing: each frame is brought into a window function, the window function selects a Hamming window, and the expression of the Hamming window is shown in a formula (3):
wherein R is M (n) is voiceprint data of the nth frame, and ω (n) is voiceprint data after preprocessing.
The step (3) specifically comprises the following steps:
(3a) The preprocessed voiceprint data ω(n) is decomposed into two sub-signals using the fast Fourier transform (FFT): an even-sample-point signal and an odd-sample-point signal, so that the discrete Fourier transform (DFT) of ω(n) is equivalent to the sum of two terms of length N/2, one over the even sample points and one over the odd sample points. The computational cost is as follows: for each k the DFT performs N multiplications, i.e. N² multiplication operations in total; for each k it performs N−1 additions, i.e. N(N−1) addition operations in total; the FFT requires only N(log₂N − 1) multiplication operations and N·log₂N addition operations;
(3b) The following operations are performed with a mel filter bank:
(3b1) The lowest frequency of the voiceprint data processed in step (3a) is determined to be 0 Hz and the highest frequency to be f_s; the number M of Mel filters is 23;
(3b2) Converting the lowest frequency and the highest frequency into respective mel scales low_mel and high_mel respectively;
(3b3) The distance d_mel between the center mel frequencies of two adjacent Mel filters is calculated as shown in formula (4):
where high_mel is the mel scale of the highest frequency, low_mel is the mel scale of the lowest frequency, and M is the number of Mel filters;
(3c) The spectrum of the speech signal after passing through the Mel filter bank is obtained using the logarithm operation, as shown in formula (6):
where H(k) is the high-frequency spectrum function and E(k) is the low-frequency spectrum function;
considering only the amplitude, as shown in formula (7):
taking the logarithm of both sides, as shown in formula (8):
and then taking the inverse Fourier transform of both sides to obtain the cepstrum, as shown in formula (9):
(3d) The length of the speech signal is doubled to 2N using the DCT; to make the enlarged signal symmetric about 0, the whole extended signal is shifted to the right by 0.5 units, and the final DCT transform is expressed as formula (10):
where N is the length of the speech signal, the transform is taken over the xth spectral values, the normalization coefficient takes one value for u = 0 and another value otherwise, as given in formula (10), and u is the generalized frequency;
(3e) The dynamic and static characteristics of the spectrum after the DCT are combined to improve the recognition performance of the system; the calculation formula of the spectral difference parameter is:
where d_t denotes the tth first-order difference, i.e. the voiceprint feature, C_t denotes the tth cepstral coefficient, Q denotes the order of the cepstral coefficients, and K denotes the time difference of the first derivative.
The step (4) specifically refers to:
(4a) Adding a channel attention mechanism module into the VGG19 model:
the VGG19 model consists of 2 convolution layers of 64x3x3, 2 convolution layers of 128x3x3, 5 maximum pooling layers of 2x2, 8 convolution layers of 512x3x3, 3 full connection layers and a classification function Softmax; the channel attention mechanism module consists of a global pooling layer of 1x1x512, a full-connection layer of 1x1x64, an activation function layer of 1x1x64, a full-connection layer of 1x1x512 and a logistic regression layer of 1x1x 512; the channel attention mechanism module is added between the last 512x3x3 convolution layer and the last 2x2 max pooling layer of the VGG19 model;
(4b) The original classification function Softmax in the VGG19 model is replaced by a new classification function AH-Softmax, shown in formula (16):
where d(·) is the sample weight indicator function, p_l is the probability of sample class l, p_j is the probability of sample class j, θ_j,l is the angle between w_j and sample x, θ_l,l is the angle between w_l and sample x, s = ||w_j||·||x||, f(m, θ_l,l) = cos(m, θ_l,l), L_j = d(p_j) − 1, w_j is the optimization angle of sample class j, m is the margin of the boundary loss function, w_l is the optimization angle of sample class l, and J is the total number of sample classes;
h(t, θ_j,l, L_j) ≥ 1 is a re-weighting function for emphasizing the weights of different power transformer voiceprint samples; it takes the following two forms, shown in formula (17) and formula (18):
h(t, θ_j,l, L_j) = exp(s·t·L_j)    (17)
h(t, θ_j,l, L_j) = exp(s·t·(cos(θ_j,l) + 1)·L_j)    (18)
where exp(s·t·L_j) is the fixed weight function and exp(s·t·(cos(θ_j,l) + 1)·L_j) is the adaptive weight function.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, a channel attention mechanism module is added to the original VGG19 model to obtain an improved VGG19 model; the improved model discriminates the features of each channel more effectively, learns the relations among channels and their importance during training, and thus distinguishes transformer voiceprints better. Second, the Softmax classification function of the original VGG19 model is replaced by the new classification function AH-Softmax, which uses the weight indicator function distribution as a clue to estimate transformer voiceprint sample labels, emphasizes the information content of transformer voiceprint samples, dynamically distinguishes samples carrying different amounts of information, explicitly emphasizes the informative vectors in the transformer voiceprint samples, and at the same time exploits the discriminability among different transformer voiceprint categories to guide discriminative feature learning. Through these two improvements, the recognition capability and accuracy of the improved VGG19 model for transformer voiceprint signals are increased.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a network structure diagram of the improved VGG19 model.
Detailed Description
As shown in fig. 1, a transformer voiceprint recognition method based on a channel attention mechanism and AH-Softmax, the method comprises the following sequential steps:
(1) Utilizing r voiceprint acquisition sensors to acquire audio data of any power transformer in real time, wherein each acquired audio data corresponds to a file address; acquiring voiceprint data in a binary format of audio data and sampling frequency;
(2) Preprocessing voiceprint data in a binary format to obtain preprocessed voiceprint data omega (n), forming a data set, and dividing the data set into a training set and a verification set according to a proportion;
(3) Characteristic extraction is carried out on voiceprint data in the training set by utilizing a Mel Frequency Cepstrum Coefficient (MFCC) to obtain voiceprint characteristics and form a voiceprint characteristic training set;
(4) Improving the VGG19 model to obtain an improved VGG19 model;
(5) Training the improved VGG19 model by utilizing the voiceprint feature training set to obtain a trained VGG19 model, and verifying the trained VGG19 model through the verification set;
(6) Inputting the audio data of the power transformer to be identified into the trained VGG19 model, and identifying the voiceprint type of the power transformer.
The step (2) specifically comprises the following steps:
(2a) Denoising: removing a mute part and a noise part of voiceprint data in a binary format by using an endpoint detection method based on the audio amplitude to obtain a residual part;
(2b) Pre-emphasis: the loss of the residual part is compensated by pre-emphasis; the input voiceprint data in binary format is denoted S(n), and the audio after passing through the first-order FIR filter is given by formula (1):
In formula (1), a is a constant, taken as 0.96;
the pre-emphasis is used for adding a zero point to counteract the high-end spectrum amplitude drop caused by glottal pulse, so that the signal spectrum is flattened and the resonance amplitudes are close, only the influence in the sound channel is left in the voice, and the extracted characteristics are more in accordance with the model of the meta-sound channel. A first-order FIR filter is inserted, high frequency is improved, meanwhile, a low frequency part is attenuated, when some fundamental frequencies are assigned to be larger, interference of the fundamental frequencies to formant detection is reduced through pre-emphasis, and meanwhile, the dynamic range of a frequency spectrum is reduced.
(2c) Framing: dividing the audio into N sections of voice signals with fixed sizes by utilizing a framing method, wherein each section of voice signal is a frame, and the frame length frame takes 25ms; the frame division adopts an overlapping segmentation method, the overlapping part of the previous frame and the next frame is frame shift, and the ratio m of the frame shift to the frame length is 0.5; framing a speech signal of length N as shown in equation (2):
the data is divided into n frames, each frame f n The position of (2) is [ m ] frame (n-1), m ] frame (n-1) +frame]If the last frame is at the end (frame + (n-1) +frame)>N, filling the excess part with 0;
(2d) Windowing: each frame is brought into a window function, the window function selects a Hamming window, and the expression of the Hamming window is shown in a formula (3):
wherein R is M (n) is voiceprint data of the nth frame, and ω (n) is voiceprint data after preprocessing.
Windowing achieves a smooth transition between frames and maintains continuity, i.e. it removes the signal discontinuities that truncation may cause at both ends of each frame. Because truncation causes energy leakage in the frequency domain, a window function is required to reduce its effects; each frame is therefore multiplied by a window function, and the Hamming window is selected.
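A minimal sketch of the framing and Hamming windowing of steps (2c) and (2d), assuming a 25 ms frame length, a frame-shift-to-frame-length ratio of 0.5 and zero padding of the last frame as described above; the function name and the stand-in signal are illustrative.

    import numpy as np

    def frame_and_window(y: np.ndarray, fs: int, frame_ms: float = 25.0,
                         shift_ratio: float = 0.5) -> np.ndarray:
        """Split the signal into overlapping frames and apply a Hamming window.

        frame_ms    -- frame length in milliseconds (25 ms in the text)
        shift_ratio -- frame shift / frame length (m = 0.5 in the text)
        """
        frame_len = int(round(fs * frame_ms / 1000.0))        # samples per frame
        hop = int(round(frame_len * shift_ratio))             # frame shift in samples
        n_frames = int(np.ceil(max(len(y) - frame_len, 0) / hop)) + 1

        # Zero-pad so the last frame is complete, as described in the text.
        padded = np.zeros((n_frames - 1) * hop + frame_len)
        padded[:len(y)] = y

        window = np.hamming(frame_len)                         # Hamming window
        frames = np.stack([padded[i * hop: i * hop + frame_len] * window
                           for i in range(n_frames)])
        return frames  # shape: (n_frames, frame_len)

    # Usage with one second of stand-in audio at 16 kHz.
    fs = 16000
    y = np.random.randn(fs)
    frames = frame_and_window(y, fs)
    print(frames.shape)  # (79, 400)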
The step (3) specifically comprises the following steps:
(3a) The preprocessed voiceprint data ω(n) is decomposed into two sub-signals using the fast Fourier transform (FFT): an even-sample-point signal and an odd-sample-point signal, so that the discrete Fourier transform (DFT) of ω(n) is equivalent to the sum of two terms of length N/2, one over the even sample points and one over the odd sample points. The computational cost is as follows: for each k the DFT performs N multiplications, i.e. N² multiplication operations in total; for each k it performs N−1 additions, i.e. N(N−1) addition operations in total; the FFT requires only N(log₂N − 1) multiplication operations and N·log₂N addition operations;
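The even/odd decomposition described in step (3a) can be checked numerically: the sketch below computes an N-point DFT from the two N/2-point DFTs of the even- and odd-indexed samples (the decimation-in-time identity underlying the FFT). It uses numpy's FFT for the half-length transforms and is only an illustration of the decomposition, not the patent's own implementation.

    import numpy as np

    def dft_via_even_odd(x: np.ndarray) -> np.ndarray:
        """N-point DFT built from the two N/2-point DFTs of the even- and
        odd-indexed samples (decimation-in-time split used by the FFT)."""
        N = len(x)
        assert N % 2 == 0, "length must be even for a single split"
        E = np.fft.fft(x[0::2])                    # DFT of the even sample points
        O = np.fft.fft(x[1::2])                    # DFT of the odd sample points
        k = np.arange(N // 2)
        twiddle = np.exp(-2j * np.pi * k / N)      # twiddle factors W_N^k
        return np.concatenate([E + twiddle * O,    # X[k],        k = 0..N/2-1
                               E - twiddle * O])   # X[k + N/2]

    x = np.random.randn(512)
    assert np.allclose(dft_via_even_odd(x), np.fft.fft(x))  # identical results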
(3b) The following operations are performed with a mel filter bank:
(3b1) The lowest frequency of the voiceprint data processed in step (3a) is determined to be 0 Hz and the highest frequency to be f_s; the number M of Mel filters is 23;
(3b2) Converting the lowest frequency and the highest frequency into respective mel scales low_mel and high_mel respectively;
(3b3) The distance d_mel between the center mel frequencies of two adjacent Mel filters is calculated as shown in formula (4):
where high_mel is the mel scale of the highest frequency, low_mel is the mel scale of the lowest frequency, and M is the number of Mel filters;
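A short sketch of the mel-scale conversion and filter spacing of step (3b). Since formula (4) is not reproduced above, this assumes the common conventions mel(f) = 2595·log10(1 + f/700) and that M triangular filters need M + 2 equally spaced mel points, so adjacent centers lie (high_mel − low_mel)/(M + 1) apart; both conventions are assumptions, and the names are illustrative.

    import numpy as np

    def hz_to_mel(f_hz: float) -> float:
        """Common (HTK-style) Hz-to-mel conversion: 2595 * log10(1 + f/700)."""
        return 2595.0 * np.log10(1.0 + f_hz / 700.0)

    def mel_filter_centers(f_low: float, f_high: float, n_filters: int = 23) -> np.ndarray:
        """Center mel frequencies of n_filters triangular mel filters.

        With M filters, M + 2 equally spaced mel points are needed (edges plus
        centers), so adjacent centers are (high_mel - low_mel) / (M + 1) apart.
        """
        low_mel, high_mel = hz_to_mel(f_low), hz_to_mel(f_high)
        d_mel = (high_mel - low_mel) / (n_filters + 1)
        return low_mel + d_mel * np.arange(1, n_filters + 1)

    centers = mel_filter_centers(0.0, 8000.0, n_filters=23)
    print(len(centers), centers[:3])  # 23 center mel frequencies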
(3c) The spectrum of the speech signal after passing through the Mel filter bank is obtained using the logarithm operation, as shown in formula (6):
where H(k) is the high-frequency spectrum function and E(k) is the low-frequency spectrum function;
considering only the amplitude, as shown in formula (7):
taking the logarithm of both sides, as shown in formula (8):
and then taking the inverse Fourier transform of both sides to obtain the cepstrum, as shown in formula (9):
(3d) The length of the speech signal is doubled to 2N using the DCT; to make the enlarged signal symmetric about 0, the whole extended signal is shifted to the right by 0.5 units, and the final DCT transform is expressed as formula (10):
where N is the length of the speech signal, the transform is taken over the xth spectral values, the normalization coefficient takes one value for u = 0 and another value otherwise, as given in formula (10), and u is the generalized frequency;
(3e) The dynamic and static characteristics of the spectrum after the DCT are combined to improve the recognition performance of the system; the calculation formula of the spectral difference parameter is:
where d_t denotes the tth first-order difference, i.e. the voiceprint feature, C_t denotes the tth cepstral coefficient, Q denotes the order of the cepstral coefficients, and K denotes the time difference of the first derivative.
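A sketch of the remaining feature-extraction stages of step (3): the magnitude spectrum through a mel filter bank, the logarithm, an orthonormal DCT-II, and the first-order difference of step (3e). The filter-bank matrix, the number of cepstral coefficients and K = 2 are illustrative assumptions, and the stand-in inputs only demonstrate the array shapes.

    import numpy as np
    from scipy.fft import dct

    def log_mel_to_mfcc(frames: np.ndarray, mel_fb: np.ndarray, n_ceps: int = 13) -> np.ndarray:
        """Power spectrum -> mel filter bank -> log -> DCT-II (orthonormal).

        frames : windowed time-domain frames, shape (n_frames, frame_len)
        mel_fb : mel filter bank matrix, shape (n_mels, n_fft // 2 + 1)
        """
        n_fft = frames.shape[1]
        power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2    # amplitude only
        mel_energy = power @ mel_fb.T                                 # apply filter bank
        log_mel = np.log(mel_energy + 1e-10)                          # logarithm
        return dct(log_mel, type=2, norm='ortho', axis=1)[:, :n_ceps] # cepstral coefficients

    def delta(ceps: np.ndarray, K: int = 2) -> np.ndarray:
        """First-order difference d_t = sum_k k*(C_{t+k} - C_{t-k}) / (2*sum_k k^2),
        the dynamic feature combined with the static cepstrum in step (3e)."""
        padded = np.pad(ceps, ((K, K), (0, 0)), mode='edge')
        num = sum(k * (padded[K + k:len(ceps) + K + k] - padded[K - k:len(ceps) + K - k])
                  for k in range(1, K + 1))
        return num / (2 * sum(k * k for k in range(1, K + 1)))

    frames = np.random.randn(79, 400)                    # stand-in windowed frames
    mel_fb = np.abs(np.random.randn(23, 400 // 2 + 1))   # stand-in 23-filter bank
    mfcc = log_mel_to_mfcc(frames, mel_fb)
    d1 = delta(mfcc)
    print(mfcc.shape, d1.shape)                          # (79, 13) (79, 13)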
The step (4) specifically refers to:
(4a) Adding a channel attention mechanism module into the VGG19 model:
As shown in fig. 2, the VGG19 model consists of 2 convolution layers of 64x3x3, 2 convolution layers of 128x3x3, 5 max pooling layers of 2x2, 8 convolution layers of 512x3x3, 3 fully connected layers and a Softmax classification function; the channel attention mechanism module consists of a 1x1x512 global pooling layer, a 1x1x64 fully connected layer, a 1x1x64 activation function layer, a 1x1x512 fully connected layer and a 1x1x512 logistic regression layer; the channel attention mechanism module is inserted between the last 512x3x3 convolution layer and the last 2x2 max pooling layer of the VGG19 model;
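A minimal PyTorch sketch of a channel attention module with the layer sizes listed above (1x1x512 global pooling, 1x1x64 fully connected layer, activation, 1x1x512 fully connected layer, and a sigmoid as the logistic regression layer), in the style of a squeeze-and-excitation block; the class name is illustrative and the surrounding VGG19 layers are not reproduced here.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Channel attention: global pooling (1x1x512) -> FC (1x1x64) -> activation
        -> FC (1x1x512) -> sigmoid, re-weighting the 512 channels.

        Intended to sit between the last 512x3x3 convolution layer and the
        last 2x2 max pooling layer of the VGG19 backbone described above.
        """
        def __init__(self, channels: int = 512, reduced: int = 64):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)             # global pooling -> 1x1xC
            self.fc1 = nn.Linear(channels, reduced)         # 1x1x64 fully connected
            self.act = nn.ReLU(inplace=True)                # activation layer
            self.fc2 = nn.Linear(reduced, channels)         # 1x1x512 fully connected
            self.gate = nn.Sigmoid()                        # logistic regression layer

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            w = self.pool(x).view(b, c)                     # squeeze: (B, C)
            w = self.gate(self.fc2(self.act(self.fc1(w))))  # excitation: channel weights
            return x * w.view(b, c, 1, 1)                   # re-weight each channel

    feat = torch.randn(2, 512, 14, 14)                      # hypothetical conv feature map
    out = ChannelAttention()(feat)
    print(out.shape)                                        # torch.Size([2, 512, 14, 14])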
(4b) The original classification function Softmax in the VGG19 model is replaced by a new classification function AH-Softmax, shown in formula (16):
where d(·) is the sample weight indicator function, p_l is the probability of sample class l, p_j is the probability of sample class j, θ_j,l is the angle between w_j and sample x, θ_l,l is the angle between w_l and sample x, s = ||w_j||·||x||, f(m, θ_l,l) = cos(m, θ_l,l), L_j = d(p_j) − 1, w_j is the optimization angle of sample class j, m is the margin of the boundary loss function, w_l is the optimization angle of sample class l, and J is the total number of sample classes;
h(t, θ_j,l, L_j) ≥ 1 is a re-weighting function for emphasizing the weights of different power transformer voiceprint samples; it takes the following two forms, shown in formula (17) and formula (18):
h(t, θ_j,l, L_j) = exp(s·t·L_j)    (17)
h(t, θ_j,l, L_j) = exp(s·t·(cos(θ_j,l) + 1)·L_j)    (18)
where exp(s·t·L_j) is the fixed weight function and exp(s·t·(cos(θ_j,l) + 1)·L_j) is the adaptive weight function.
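Formula (16) itself is not reproduced above, so the sketch below only implements the two re-weighting functions of formulas (17) and (18); the meanings of s, t, θ_j,l and L_j follow the definitions given above, and the demo values are purely illustrative.

    import torch

    def fixed_reweight(s: float, t: float, L_j: torch.Tensor) -> torch.Tensor:
        """Fixed re-weighting function of formula (17): h = exp(s * t * L_j)."""
        return torch.exp(s * t * L_j)

    def adaptive_reweight(s: float, t: float, theta_jl: torch.Tensor,
                          L_j: torch.Tensor) -> torch.Tensor:
        """Adaptive re-weighting function of formula (18):
        h = exp(s * t * (cos(theta_jl) + 1) * L_j)."""
        return torch.exp(s * t * (torch.cos(theta_jl) + 1.0) * L_j)

    # Hypothetical values: s plays the role of ||w_j||*||x||, t a modulation
    # parameter, theta_jl the angle between w_j and the sample x, L_j = d(p_j) - 1.
    s, t = 30.0, 0.2
    theta_jl = torch.tensor([0.3, 1.2, 2.0])
    L_j = torch.tensor([0.0, 1.0, 1.0])
    print(fixed_reweight(s, t, L_j))
    print(adaptive_reweight(s, t, theta_jl, L_j))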
In summary, the channel attention mechanism module is added to the original VGG19 model to obtain an improved VGG19 model; the improved model discriminates the features of each channel more effectively, learns the relations among channels and their importance during training, and thus distinguishes transformer voiceprints better. The Softmax classification function of the original VGG19 model is replaced by the new classification function AH-Softmax, which uses the weight indicator function distribution as a clue to estimate transformer voiceprint sample labels, emphasizes the information content of transformer voiceprint samples, dynamically distinguishes samples carrying different amounts of information, explicitly emphasizes the informative vectors in the transformer voiceprint samples, and at the same time exploits the discriminability among different transformer voiceprint categories to guide discriminative feature learning. Through these two improvements, the recognition capability and accuracy of the improved VGG19 model for transformer voiceprint signals can be increased.

Claims (4)

1. A transformer voiceprint recognition method based on a channel attention mechanism and AH-Softmax, characterized in that the method comprises the following steps in sequence:
(1) Utilizing r voiceprint acquisition sensors to acquire audio data of any power transformer in real time, wherein each acquired audio data corresponds to a file address; acquiring voiceprint data in a binary format of audio data and sampling frequency;
(2) Preprocessing voiceprint data in a binary format to obtain preprocessed voiceprint data omega (n), forming a data set, and dividing the data set into a training set and a verification set according to a proportion;
(3) Characteristic extraction is carried out on voiceprint data in the training set by utilizing a Mel Frequency Cepstrum Coefficient (MFCC) to obtain voiceprint characteristics and form a voiceprint characteristic training set;
(4) Improving the VGG19 model to obtain an improved VGG19 model;
(5) Training the improved VGG19 model by utilizing the voiceprint feature training set to obtain a trained VGG19 model, and verifying the trained VGG19 model through the verification set;
(6) Inputting the audio data of the power transformer to be identified into the trained VGG19 model, and identifying the voiceprint type of the power transformer.
2. The transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax of claim 1, wherein: the step (2) specifically comprises the following steps:
(2a) Denoising: removing a mute part and a noise part of voiceprint data in a binary format by using an endpoint detection method based on the audio amplitude to obtain a residual part;
(2b) Pre-emphasis: the loss of the residual part is compensated by pre-emphasis; the input voiceprint data in binary format is denoted S(n), and the audio after passing through the first-order FIR filter is given by formula (1):
In formula (1), a is a constant, taken as 0.96;
(2c) Framing: dividing the audio into N sections of voice signals with fixed sizes by utilizing a framing method, wherein each section of voice signal is a frame, and the frame length frame takes 25ms; the frame division adopts an overlapping segmentation method, the overlapping part of the previous frame and the next frame is frame shift, and the ratio m of the frame shift to the frame length is 0.5; framing a speech signal of length N as shown in equation (2):
the data is divided into n frames, each frame f n The position of (2) is [ m ] frame (n-1), m ] frame (n-1) +frame]If the last frame is at the end (frame + (n-1) +frame)>N, filling the excess part with 0;
(2d) Windowing: each frame is brought into a window function, the window function selects a Hamming window, and the expression of the Hamming window is shown in a formula (3):
wherein R is M (n) is voiceprint data of the nth frame, and ω (n) is voiceprint data after preprocessing.
3. The transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax of claim 1, wherein: the step (3) specifically comprises the following steps:
(3a) The preprocessed voiceprint data ω(n) is decomposed into two sub-signals using the fast Fourier transform (FFT): an even-sample-point signal and an odd-sample-point signal, so that the discrete Fourier transform (DFT) of ω(n) is equivalent to the sum of two terms of length N/2, one over the even sample points and one over the odd sample points. The computational cost is as follows: for each k the DFT performs N multiplications, i.e. N² multiplication operations in total; for each k it performs N−1 additions, i.e. N(N−1) addition operations in total; the FFT requires only N(log₂N − 1) multiplication operations and N·log₂N addition operations;
(3b) The following operations are performed with a mel filter bank:
(3b1) The lowest frequency of the voiceprint data processed in step (3a) is determined to be 0 Hz and the highest frequency to be f_s; the number M of Mel filters is 23;
(3b2) Converting the lowest frequency and the highest frequency into respective mel scales low_mel and high_mel respectively;
(3b3) The distance d_mel between the center mel frequencies of two adjacent Mel filters is calculated as shown in formula (4):
where high_mel is the mel scale of the highest frequency, low_mel is the mel scale of the lowest frequency, and M is the number of Mel filters;
(3c) The spectrum of the speech signal after passing through the Mel filter bank is obtained using the logarithm operation, as shown in formula (6):
where H(k) is the high-frequency spectrum function and E(k) is the low-frequency spectrum function;
considering only the amplitude, as shown in formula (7):
taking the logarithm of both sides, as shown in formula (8):
and then taking the inverse Fourier transform of both sides to obtain the cepstrum, as shown in formula (9):
(3d) The length of the speech signal is doubled to 2N using the DCT; to make the enlarged signal symmetric about 0, the whole extended signal is shifted to the right by 0.5 units, and the final DCT transform is expressed as formula (10):
where N is the length of the speech signal, the transform is taken over the xth spectral values, the normalization coefficient takes one value for u = 0 and another value otherwise, as given in formula (10), and u is the generalized frequency;
(3e) The dynamic and static characteristics of the spectrum after the DCT are combined to improve the recognition performance of the system; the calculation formula of the spectral difference parameter is:
where d_t denotes the tth first-order difference, i.e. the voiceprint feature, C_t denotes the tth cepstral coefficient, Q denotes the order of the cepstral coefficients, and K denotes the time difference of the first derivative.
4. The transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax of claim 1, wherein: the step (4) specifically refers to:
(4a) Adding a channel attention mechanism module into the VGG19 model:
the VGG19 model consists of 2 convolution layers of 64x3x3, 2 convolution layers of 128x3x3, 5 max pooling layers of 2x2, 8 convolution layers of 512x3x3, 3 fully connected layers and a Softmax classification function; the channel attention mechanism module consists of a 1x1x512 global pooling layer, a 1x1x64 fully connected layer, a 1x1x64 activation function layer, a 1x1x512 fully connected layer and a 1x1x512 logistic regression layer; the channel attention mechanism module is inserted between the last 512x3x3 convolution layer and the last 2x2 max pooling layer of the VGG19 model;
(4b) The original classification function Softmax in the VGG19 model is replaced by a new classification function AH-Softmax, shown in formula (16):
where d(·) is the sample weight indicator function, p_l is the probability of sample class l, p_j is the probability of sample class j, θ_j,l is the angle between w_j and sample x, θ_l,l is the angle between w_l and sample x, s = ||w_j||·||x||, f(m, θ_l,l) = cos(m, θ_l,l), L_j = d(p_j) − 1, w_j is the optimization angle of sample class j, m is the margin of the boundary loss function, w_l is the optimization angle of sample class l, and J is the total number of sample classes;
h(t, θ_j,l, L_j) ≥ 1 is a re-weighting function for emphasizing the weights of different power transformer voiceprint samples; it takes the following two forms, shown in formula (17) and formula (18):
h(t, θ_j,l, L_j) = exp(s·t·L_j)    (17)
h(t, θ_j,l, L_j) = exp(s·t·(cos(θ_j,l) + 1)·L_j)    (18)
where exp(s·t·L_j) is the fixed weight function and exp(s·t·(cos(θ_j,l) + 1)·L_j) is the adaptive weight function.
CN202311622037.3A 2023-11-30 2023-11-30 Transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax Pending CN117497003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311622037.3A CN117497003A (en) 2023-11-30 2023-11-30 Transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311622037.3A CN117497003A (en) 2023-11-30 2023-11-30 Transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax

Publications (1)

Publication Number Publication Date
CN117497003A true CN117497003A (en) 2024-02-02

Family

ID=89684949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311622037.3A Pending CN117497003A (en) 2023-11-30 2023-11-30 Transformer voiceprint recognition method based on channel attention mechanism and AH-Softmax

Country Status (1)

Country Link
CN (1) CN117497003A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination