CN112432905B - Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil


Info

Publication number: CN112432905B
Application number: CN202110115606.XA
Authority: CN (China)
Prior art keywords: photoacoustic, spectrum, cepstrum coefficient, neural network, convolutional neural network
Legal status: Active (granted)
Other versions: CN112432905A (Chinese)
Inventors: 李俊逸, 童斌, 黄杰, 王乔珍, 胡艳波
Original and current assignee: Hubei Infotech Co., Ltd.
Legal events: application filed by Hubei Infotech Co., Ltd.; publication of CN112432905A; application granted; publication of CN112432905B

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00: Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/17: Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N 21/1702: Systems with opto-acoustic detection, e.g. for gases or analysing solids
    • G01N 2021/1704: Systems with opto-acoustic detection in gases
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03: Techniques characterised by the type of extracted parameters
    • G10L 25/24: Techniques where the extracted parameters are the cepstrum
    • G10L 25/27: Techniques characterised by the analysis technique
    • G10L 25/30: Analysis using neural networks
    • G10L 25/48: Techniques specially adapted for particular use
    • G10L 25/51: Techniques for comparison or discrimination

Abstract

The invention relates to a voiceprint recognition method and device based on the photoacoustic spectrum of characteristic gases in transformer oil. The method comprises the following steps: acquiring photoacoustic spectra of a target characteristic gas in transformer oil under different excitation light sources, working environments and concentrations to obtain a spectrum data set; acquiring transformer noise in different working environments to obtain a noise data set; extracting features such as the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum; mapping and fusing the features of each photoacoustic spectrum into a multi-dimensional feature vector; training a SqueezeNet convolutional neural network with the multi-dimensional feature vectors and the noise data set; and using the trained SqueezeNet convolutional neural network to identify the photoacoustic spectrum of the target characteristic gas. By applying a deep-learning model to photoacoustic spectrum recognition, the invention improves recognition efficiency, accuracy and stability.

Description

Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil
Technical Field
The invention belongs to the field of measurement and deep learning of power equipment, and particularly relates to a voiceprint recognition method and device based on photoacoustic spectroscopy of characteristic gas in transformer oil.
Background
Gas photoacoustic spectroscopy is a novel detection technology that quantitatively analyses gas concentration by detecting the absorption of laser photon energy by gas molecules; it belongs to the class of gas analysis methods that measure absorption. Compared with detection methods that measure optical radiation energy directly, it adds a step that converts the absorbed heat into an acoustic signal. Applying photoacoustic spectroscopy to on-line monitoring of the gas content in transformer oil offers higher detection sensitivity and a lower sample-gas demand, which greatly reduces the oil-gas separation time and shortens the measurement period.
The latest national standard, Analysis and Judgment of Dissolved Gases in Transformer Oil, issued by the State quality supervision administration, attributes gas generation in transformer insulating oil to the decomposition of the insulating oil and solid insulating materials under electrical and thermal stress. Low-energy discharge faults break the weakest C-H bonds and re-form hydrogen; ethylene forms at higher temperatures than methane and ethane; and large amounts of acetylene are generated in the path of an arc discharge.
The standard defines the "characteristic gases valuable for judging internal faults of oil-filled electrical equipment" as hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4), acetylene (C2H2), carbon monoxide (CO) and carbon dioxide (CO2), and notes that oxygen (O2) and nitrogen (N2) can serve as auxiliary judgment indices. On-line monitoring of the 8 fault gases including oxygen (O2) is thus required to meet the Chinese national standard, and further monitoring of nitrogen (N2) is a new direction of development worldwide.
Because a fault gas usually exhibits more than one excited spectral peak after absorbing light energy, existing approaches either isolate the light sources or control the frequency of each light source separately, which increases the complexity, cost and identification time of the identification equipment and reduces its stability.
Disclosure of Invention
In order to reduce the complexity, cost and identification time of single-characteristic-gas identification equipment for transformers and to improve identification accuracy and stability, a first aspect of the invention provides a voiceprint recognition method based on the photoacoustic spectrum of characteristic gases in transformer oil, comprising the following steps: acquiring photoacoustic spectra of a target characteristic gas in transformer oil under different excitation light sources, working environments and concentrations to obtain a spectrum data set; acquiring transformer noise in different working environments to obtain a noise data set; framing each photoacoustic spectrum in the spectrum data set with a Hamming window, adjacent windows overlapping by 512 samples, and extracting the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum; mapping and fusing the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector for each spectrum; constructing a sample data set with the multi-dimensional feature vectors as positive samples and the noise in the noise data set as negative samples; training the SqueezeNet convolutional neural network with the sample data set until the error falls below a threshold and stabilizes, obtaining the trained SqueezeNet convolutional neural network; and inputting the photoacoustic spectrum to be identified into the trained SqueezeNet convolutional neural network to identify whether it contains a photoacoustic signal of the target characteristic gas.
In some embodiments of the invention, the mel-frequency cepstral coefficients comprise first order differential mel-frequency cepstral coefficients and second order differential mel-frequency cepstral coefficients; the Gabor tensor cepstrum coefficients include first order differential Gabor tensor cepstrum coefficients and second order differential Gabor tensor cepstrum coefficients.
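The first- and second-order differential cepstral coefficients mentioned above are conventionally computed as a regression difference over neighbouring frames. A minimal sketch (not taken from the patent; the regression width and the 100-frame, 24-coefficient matrix are illustrative assumptions):

```python
import numpy as np

def delta(features, width=2):
    """First-order regression delta of a (frames, coeffs) feature matrix:
    d_t = sum_{k=1..width} k * (c_{t+k} - c_{t-k}) / (2 * sum k^2),
    with edge padding at the boundaries."""
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    n = len(features)
    return sum(
        k * (padded[width + k : width + k + n] - padded[width - k : width - k + n])
        for k in range(1, width + 1)
    ) / denom

# Hypothetical 24 cepstral coefficients per frame, 100 frames
mfcc = np.random.default_rng(0).normal(size=(100, 24))
d1 = delta(mfcc)   # first-order differential coefficients
d2 = delta(d1)     # second-order differential coefficients
```

The same routine applies unchanged to the Gabor tensor cepstral coefficients.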
In some embodiments of the present invention, mapping and fusing the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum into a multi-dimensional vector comprises the following steps: mapping the extracted Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum into one 24-dimensional vector each, obtaining two 24-dimensional vectors, and mapping the peak, spectral kurtosis and instantaneous frequency bandwidth of each photoacoustic spectrum into one 1-dimensional vector each, obtaining three 1-dimensional vectors; and fusing the two 24-dimensional vectors and the three 1-dimensional vectors into one 51-dimensional vector, which is the multi-dimensional feature vector.
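The 24 + 24 + 1 + 1 + 1 = 51 fusion described above amounts to a concatenation of the per-spectrum feature vectors. A sketch in which every feature value is a placeholder rather than measured data:

```python
import numpy as np

rng = np.random.default_rng(1)
mfcc_vec = rng.normal(size=24)    # 24-dim Mel-frequency cepstral coefficient vector
gtcc_vec = rng.normal(size=24)    # 24-dim Gabor tensor cepstral coefficient vector
peak     = np.array([0.82])       # 1-dim: spectral peak
kurt     = np.array([1.3])        # 1-dim: spectral (frequency) kurtosis
inst_bw  = np.array([47.5])       # 1-dim: instantaneous frequency bandwidth

# Fuse into the single 51-dimensional feature vector fed to the network
feature_vector = np.concatenate([mfcc_vec, gtcc_vec, peak, kurt, inst_bw])
```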
In some embodiments of the invention, the SqueezeNet convolutional neural network comprises a first convolutional neural network and a second convolutional neural network; the first convolutional neural network comprises a convolutional layer, a pooling layer, an LSTM layer, a Fire module, a global average pooling layer, an output layer and a softmax layer, and the softmax layer of the first convolutional neural network and the softmax layer of the second convolutional neural network are fused through basic probability distribution.
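The patent does not spell out the basic-probability-distribution fusion of the two softmax layers; one common reading is Dempster's rule of combination applied to the two softmax outputs, which for singleton classes with no uncertainty mass reduces to a normalized element-wise product. A hedged sketch under that assumption:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dempster_fuse(p1, p2):
    """Combine two probability vectors over the same singleton classes with
    Dempster's rule; the conflict mass is the probability assigned to
    disagreeing class pairs, and normalizing by (1 - conflict) is
    equivalent to dividing by the sum of the element-wise products."""
    joint = p1 * p2
    conflict = 1.0 - joint.sum()
    return joint / (1.0 - conflict)

p_a = softmax(np.array([2.0, 0.5]))   # first network's softmax output (illustrative logits)
p_b = softmax(np.array([1.2, 0.8]))   # second network's softmax output
fused = dempster_fuse(p_a, p_b)
```

When both networks favour the same class, the fused distribution is more confident than either input, which is the intended effect of the fusion layer.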
In some embodiments of the invention, the second convolutional neural network is a deep convolutional neural network.
In some embodiments of the invention, the target characteristic gas is one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen, or nitrogen.
A second aspect of the invention provides a voiceprint recognition device for photoacoustic spectroscopy of characteristic gases in transformer oil, comprising an acquisition module, an extraction module, a fusion module, a training module and a recognition module. The acquisition module is used for acquiring photoacoustic spectra of a target characteristic gas in transformer oil under different excitation light sources, working environments and concentrations to obtain a spectrum data set, and for acquiring transformer noise in different working environments to obtain a noise data set; the extraction module is used for framing each photoacoustic spectrum in the spectrum data set with a Hamming window, adjacent windows overlapping by 512 samples, and extracting the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum; the fusion module is used for mapping and fusing the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector for each spectrum; the training module is used for constructing a sample data set with the multi-dimensional feature vectors as positive samples and the noise in the noise data set as negative samples, and for training the SqueezeNet convolutional neural network with the sample data set until the error falls below a threshold and stabilizes, obtaining the trained SqueezeNet convolutional neural network; and the recognition module is used for inputting the photoacoustic spectrum to be identified into the trained SqueezeNet convolutional neural network to identify whether it contains a photoacoustic signal of the target characteristic gas.
In some embodiments of the present invention, the extraction module includes a first extraction module and a second extraction module. The first extraction module is configured to extract the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of the photoacoustic spectrum; the second extraction module is configured to extract the first-order differential Mel-frequency cepstral coefficients, second-order differential Mel-frequency cepstral coefficients, first-order differential Gabor tensor cepstral coefficients and second-order differential Gabor tensor cepstral coefficients of the photoacoustic spectrum.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; and a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the method for identifying a voiceprint based on photoacoustic spectroscopy of a characteristic gas in transformer oil provided by the first aspect of the present invention.
In a fourth aspect of the present invention, a computer readable medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for voiceprint recognition based on photoacoustic spectroscopy of characteristic gases in transformer oil provided by the first aspect of the present invention.
The invention has the beneficial effects that:
1. In photoacoustic technology, the frequency of the acoustic wave excited by light absorption is determined by the modulation frequency of the light source, and its intensity is related only to the volume fraction of the characteristic gas that absorbs the narrow-band spectrum. Therefore, by establishing the quantitative relation between gas volume fraction and acoustic-wave intensity, the volume fraction of each gas in the gas cell can be accurately measured. Treating the acoustic components of the photoacoustic spectrum as voiceprints and gas attributes such as concentration as labels, a deep-learning voiceprint recognition model can be applied flexibly and conveniently to the qualitative or quantitative detection of characteristic gases under different noise environments and light-source conditions, reducing the time and computation needed to analyse the photoacoustic spectrum and improving recognition efficiency, accuracy and stability;
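The quantitative relation between volume fraction and acoustic-wave intensity mentioned above can, assuming it is linear to first order, be established with an ordinary least-squares calibration. A sketch with synthetic, illustrative calibration points (not measurement data from the patent):

```python
import numpy as np

# Hypothetical calibration: photoacoustic amplitude (arbitrary units)
# measured at known volume fractions (ppm) of one characteristic gas
ppm       = np.array([0.0, 10.0, 25.0, 50.0, 100.0])
amplitude = np.array([0.02, 0.51, 1.24, 2.49, 4.97])

# Least-squares line: amplitude ~ k * ppm + b
k, b = np.polyfit(ppm, amplitude, deg=1)

def concentration(measured_amplitude):
    """Invert the linear calibration to estimate the volume fraction."""
    return (measured_amplitude - b) / k

est = concentration(2.0)  # roughly 40 ppm for these synthetic points
```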
2. A single acoustic feature often fails to fully represent the sound signal of the target characteristic gas. By fusing multiple complementary features (peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients) so that they complement one another, the accuracy and effectiveness of sound-signal identification can be improved to a certain extent;
3. Adding a certain amount of environmental noise during training improves the generalization and robustness of the deep-learning model and reduces over-fitting, under-fitting, or even failure to converge;
4. As a lightweight deep-learning model, the SqueezeNet convolutional neural network has fewer parameters, lower storage overhead and a smaller computational load than traditional convolutional neural networks (CNN, DNN, VGG, etc.); it can even be deployed on an FPGA and integrated with the acquisition module or extraction module of the present application.
Drawings
FIG. 1 is a schematic flow diagram of voiceprint identification based on photoacoustic spectroscopy of characteristic gases in transformer oil in some embodiments of the present invention;
FIG. 2 is a schematic diagram of a SqueezeNet convolutional neural network in some embodiments of the present invention;
FIG. 3 is a schematic diagram of the Fire module structure of a first convolutional neural network in some embodiments of the present invention;
FIG. 4 is a schematic diagram of the structure of a voiceprint recognition apparatus for photoacoustic sonography of a characteristic gas in transformer oil in some embodiments of the present invention;
FIG. 5 is a basic block diagram of an electronic device in some embodiments of the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the invention provides a voiceprint recognition method based on the photoacoustic spectrum of characteristic gases in transformer oil, comprising the following steps: S101, acquiring photoacoustic spectra of a target characteristic gas in transformer oil under different excitation light sources, working environments and concentrations to obtain a spectrum data set, and acquiring transformer noise in different working environments to obtain a noise data set; S102, framing each photoacoustic spectrum in the spectrum data set with a Hamming window, adjacent windows overlapping by 512 samples, and extracting the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum; S103, mapping and fusing the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel Frequency Cepstral Coefficients (MFCC) and Gabor Tensor Cepstral Coefficients (GTCC) of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector for each spectrum; S104, constructing a sample data set with the multi-dimensional feature vectors as positive samples and the noise in the noise data set as negative samples, and training the SqueezeNet convolutional neural network with the sample data set until the error falls below a threshold and stabilizes, obtaining the trained SqueezeNet convolutional neural network; and S105, inputting the photoacoustic spectrum to be identified into the trained SqueezeNet convolutional neural network to identify whether it contains a photoacoustic signal of the target characteristic gas.
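The Hamming-window framing of step S102 can be sketched as follows. The patent fixes only the 512-sample overlap; the frame length of 1024 and the synthetic test tone are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, frame_len=1024, overlap=512):
    """Split a 1-D signal into Hamming-windowed frames whose adjacent
    windows overlap by `overlap` samples; trailing samples that do not
    fill a frame are dropped."""
    hop = frame_len - overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([
        x[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])

# Synthetic 440 Hz tone at 16 kHz standing in for a photoacoustic signal
signal = np.sin(2 * np.pi * 440 * np.arange(8192) / 16000)
frames = frame_signal(signal)
# 8192 samples with hop 512 give 1 + (8192 - 1024) // 512 = 15 frames
```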
To further improve the recognition accuracy of the photoacoustic sound spectrum, in some embodiments of the present invention, the mel-frequency cepstrum coefficients include first-order differential mel-frequency cepstrum coefficients and second-order differential mel-frequency cepstrum coefficients, and the Gabor tensor cepstrum coefficients include first-order differential Gabor tensor cepstrum coefficients and second-order differential Gabor tensor cepstrum coefficients.
The gammatone (GT) filter has an impulse response described by a gamma-distribution envelope modulating a tone, and exhibits a sharp frequency-selection characteristic. Because the GT filter matches the filtering characteristics of the basilar membrane of the human cochlea, it provides an accurate approximation of the perceptual frequency response. Its impulse response is calculated by the following formula:
g(t) = A · t^(n-1) · exp(-2πBt) · cos(2πf_c·t + φ), t ≥ 0
where t is the time variable, A the amplitude factor, n the filter order, B the duration (decay rate) of the impulse response, f_c the center frequency of the filter, and φ the filter phase. Spectral kurtosis is a fourth-order statistical feature that measures the flatness of data around its mean: a kurtosis of 0 indicates a Gaussian distribution; a kurtosis below 0 indicates a flat distribution; a kurtosis above 0 indicates a sharp peak.
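Under the definitions above, the gammatone impulse response and an excess-kurtosis measure (0 for Gaussian, below 0 for flat, above 0 for peaked) can be sketched as follows; the sampling rate and filter parameters are illustrative, not values from the patent:

```python
import numpy as np

def gammatone_ir(t, A=1.0, n=4, B=125.0, fc=1000.0, phi=0.0):
    """g(t) = A * t^(n-1) * exp(-2*pi*B*t) * cos(2*pi*fc*t + phi), t >= 0."""
    return A * t ** (n - 1) * np.exp(-2 * np.pi * B * t) * np.cos(2 * np.pi * fc * t + phi)

fs = 16000
t = np.arange(0, 0.05, 1 / fs)   # 50 ms of impulse response
g = gammatone_ir(t)

def excess_kurtosis(x):
    """Fourth-order statistic with the Gaussian baseline of 3 subtracted,
    so 0 corresponds to a Gaussian distribution."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0
```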
In step S103 of some embodiments of the present invention, mapping and fusing the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum into a multi-dimensional vector comprises the following steps: mapping the extracted Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum into one 24-dimensional vector each, obtaining two 24-dimensional vectors; mapping the peak, spectral kurtosis and instantaneous frequency bandwidth of each photoacoustic spectrum into one 1-dimensional vector each, obtaining three 1-dimensional vectors; and fusing the two 24-dimensional vectors and the three 1-dimensional vectors into one 51-dimensional vector, which is the multi-dimensional feature vector.
Optionally, in the feature-extraction process, besides the Hamming window, one or more of a rectangular window, a triangular window, a Hanning window, a Blackman window and a Kaiser window may be used for framing when extracting the features of the sound signal.
Referring to fig. 2, in some embodiments of the invention, the SqueezeNet convolutional neural network comprises a first convolutional neural network and a second convolutional neural network, whose softmax layers are fused through a basic probability distribution. The first convolutional neural network further comprises an input layer (51,), a convolutional layer, a pooling layer, an LSTM layer, a Fire module (comprising first, second and third Fire module layers), a global average pooling layer and an output layer. The second convolutional neural network further comprises an input layer (51,), a first fully connected layer and a second fully connected layer. Preferably, the second convolutional neural network is a deep neural network (DNN).
It will be appreciated that 51 of the (51,) input layers of the first and second convolutional neural networks represents the dimensionality of the multi-dimensional feature vector, with other omitted parameters given by the operating environment of the actual model. For example, (51 × 32 × 3) represents 3 sets of samples per batch, each set of samples including 32 samples of 51-dimensional vectors.
Referring to fig. 3, each Fire module layer contains two convolutional layers, a squeeze layer and an expand layer, each followed by a ReLU (Rectified Linear Unit) activation layer. The squeeze layer consists of S 1×1 convolution kernels; the expand layer comprises E1 1×1 convolution kernels and E3 3×3 convolution kernels; and the kernel counts satisfy S < E1 + E3, where S is the number of convolution kernels in the squeeze layer and E1, E3 are the numbers of 1×1 and 3×3 kernels in the expand layer, generally with E1 = E3.
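The squeeze/expand structure can be sketched in plain NumPy to make the channel arithmetic concrete; the kernel counts, input shape and random weights below are illustrative stand-ins for a trained model:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oi,ihw->ohw", w, x)

def conv3x3_same(x, w):
    """3x3 'same' convolution: x is (C_in, H, W), w is (C_out, C_in, 3, 3)."""
    c_in, h, wid = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wid))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oi,ihw->ohw", w[:, :, dy, dx],
                             xp[:, dy : dy + h, dx : dx + wid])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def fire_module(x, s1, e1, e3, rng):
    """Fire module: squeeze with S 1x1 kernels, then expand with E1 1x1 and
    E3 3x3 kernels, concatenating the two expand outputs along channels.
    The kernel counts must satisfy S < E1 + E3."""
    assert s1 < e1 + e3
    c_in = x.shape[0]
    squeezed = relu(conv1x1(x, rng.normal(size=(s1, c_in)) * 0.1))
    exp1 = relu(conv1x1(squeezed, rng.normal(size=(e1, s1)) * 0.1))
    exp3 = relu(conv3x3_same(squeezed, rng.normal(size=(e3, s1, 3, 3)) * 0.1))
    return np.concatenate([exp1, exp3], axis=0)  # E1 + E3 output channels

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6, 6))              # 8-channel input feature map
y = fire_module(x, s1=4, e1=8, e3=8, rng=rng)
```

The squeeze step reduces 8 channels to 4 before the expensive 3×3 convolution, which is the parameter-saving idea behind SqueezeNet.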
Optionally, the number of Fire module layers and the number of fully connected layers can be adjusted according to the actual situation.
It can be understood that acetylene, ethylene, methane, ethane, trace water, hydrogen, carbon monoxide and carbon dioxide are the gases commonly dissolved in transformer oil. Each gas molecule has its own absorption peaks, and the absorption peaks of different gases differ from each other to some extent; in some regions, however, the peaks overlap, and gas analysis using light in those bands is prone to cross-interference between gases. Conventional methods must therefore select absorption peaks that do not overlap, are as strong as possible, and are separated from the absorption of water, so as to avoid such cross-effects. The recognition model of the invention breaks through this limitation; the target characteristic gas may be any one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen or nitrogen.
Referring to fig. 4, a second aspect of the present invention provides a voiceprint recognition apparatus 1 for photoacoustic spectroscopy in transformer oil, which includes an acquisition module 11, an extraction module 12, a fusion module 13, a training module 14, and a recognition module 15.
The acquisition module 11 is configured to acquire photoacoustic spectra of a single target characteristic gas in a transformer under different excitation light sources, concentrations and noise environments to obtain a spectrum data set, and to acquire transformer noise in different working environments to obtain a noise data set; the extraction module 12 is configured to frame each photoacoustic spectrum in the spectrum data set with a Hamming window, adjacent windows overlapping by 512 samples, and to extract the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum; the fusion module 13 is configured to map and fuse the peak, spectral kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstral coefficients and Gabor tensor cepstral coefficients of each photoacoustic spectrum into a multi-dimensional vector, obtaining a multi-dimensional feature vector for each spectrum; the training module 14 is configured to construct a sample data set with the multi-dimensional feature vectors as positive samples and the noise in the noise data set as negative samples, and to train the SqueezeNet convolutional neural network with the sample data set until the error falls below a threshold and stabilizes, obtaining the trained SqueezeNet convolutional neural network; the recognition module 15 is configured to input the photoacoustic spectrum to be identified into the trained SqueezeNet convolutional neural network and identify whether it contains a photoacoustic signal of the target characteristic gas.
Furthermore, recognition results such as the concentration and the light source of the target characteristic gas are obtained from the recognized photoacoustic signal, as determined by the sample labels used in model training (the labels carry information such as concentration and light source).
In some embodiments of the present invention, the extraction module 12 includes a first extraction module and a second extraction module, wherein the first extraction module is configured to extract a peak of the photoacoustic sound spectrum, a frequency kurtosis, an instantaneous frequency bandwidth, a mel-frequency cepstrum coefficient, and a Gabor tensor cepstrum coefficient; the second extraction module is used for extracting a first-order difference Mel frequency cepstrum coefficient, a second-order difference Mel frequency cepstrum coefficient, a first-order difference Gabor tensor cepstrum coefficient and a second-order difference Gabor tensor cepstrum coefficient of the photoacoustic sound spectrum.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; and a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the method for identifying a voiceprint based on photoacoustic spectroscopy of a characteristic gas in transformer oil provided by the first aspect of the present invention.
Referring to fig. 5, an electronic device 500 may include a processing device (e.g., a central processing unit, graphics processor, etc.) 501 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), speakers, and vibrators; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates an electronic device 500 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or, as needed, multiple devices.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In embodiments of the present disclosure, by contrast, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to perform the voiceprint recognition method described above.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A voiceprint recognition method based on a photoacoustic spectrum of a characteristic gas in transformer oil, characterized by comprising the following steps:
acquiring photoacoustic spectra of a target characteristic gas in transformer oil under different excitation light sources, different working environments, and different concentrations to obtain a spectrum data set, and acquiring noise of a transformer in different working environments to obtain a noise data set;
framing each photoacoustic spectrum in the spectrum data set with a Hamming window, wherein the overlap between adjacent windows is 512, and extracting the peak, frequency kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstrum coefficient, and Gabor tensor cepstrum coefficient of each photoacoustic spectrum;
mapping and fusing the peak, frequency kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstrum coefficient, and Gabor tensor cepstrum coefficient of each photoacoustic spectrum into a multidimensional vector to obtain a multidimensional feature vector of each photoacoustic spectrum;
taking the multidimensional feature vectors and the noise in the noise data set as positive and negative samples, respectively, to construct a sample data set, and training a SqueezeNet convolutional neural network on the sample data set until the error falls below a threshold and stabilizes, to obtain the trained SqueezeNet convolutional neural network; and
inputting the photoacoustic spectrum to be identified into the trained SqueezeNet convolutional neural network, and identifying whether the photoacoustic spectrum contains a photoacoustic signal of the target characteristic gas.
2. The voiceprint recognition method based on a photoacoustic spectrum of a characteristic gas in transformer oil according to claim 1, wherein the Mel-frequency cepstrum coefficients comprise a first-order difference Mel-frequency cepstrum coefficient and a second-order difference Mel-frequency cepstrum coefficient;
the Gabor tensor cepstrum coefficients comprise a first-order difference Gabor tensor cepstrum coefficient and a second-order difference Gabor tensor cepstrum coefficient.
3. The voiceprint recognition method based on a photoacoustic spectrum of a characteristic gas in transformer oil according to claim 1 or 2, wherein mapping and fusing the peak, frequency kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstrum coefficient, and Gabor tensor cepstrum coefficient of each photoacoustic spectrum into a multidimensional vector to obtain a multidimensional feature vector of each photoacoustic spectrum comprises:
mapping the extracted Mel-frequency cepstrum coefficient and Gabor tensor cepstrum coefficient of each photoacoustic spectrum into one 24-dimensional vector each to obtain two 24-dimensional vectors, and mapping the peak, frequency kurtosis, and instantaneous frequency bandwidth of each photoacoustic spectrum into one 1-dimensional vector each to obtain three 1-dimensional vectors; and
fusing the two 24-dimensional vectors and the three 1-dimensional vectors into one 51-dimensional vector, the 51-dimensional vector being the multidimensional feature vector.
4. The voiceprint recognition method based on a photoacoustic spectrum of a characteristic gas in transformer oil according to claim 1, wherein the SqueezeNet convolutional neural network comprises a first convolutional neural network and a second convolutional neural network;
the first convolutional neural network comprises a convolutional layer, a pooling layer, an LSTM layer, a Fire module, a global average pooling layer, an output layer, and a softmax layer, and the softmax layer of the first convolutional neural network and the softmax layer of the second convolutional neural network are fused through basic probability distribution.
5. The method for identifying the voiceprint based on the photoacoustic spectroscopy of the characteristic gas in the transformer oil according to claim 4, wherein the second convolutional neural network is a deep convolutional neural network.
6. The method for identifying the voiceprint based on the photoacoustic spectroscopy of the characteristic gas in the transformer oil according to claim 1, wherein the target characteristic gas is one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen or nitrogen.
7. A voiceprint recognition device based on a photoacoustic spectrum of a characteristic gas in transformer oil, characterized by comprising an acquisition module, an extraction module, a fusion module, a training module, and an identification module;
the acquisition module is configured to acquire photoacoustic spectra of a target characteristic gas in transformer oil under different excitation light sources, different concentrations, and different noise environments to obtain a spectrum data set, and to acquire noise of a transformer in different working environments to obtain a noise data set;
the extraction module is configured to frame each photoacoustic spectrum in the spectrum data set with a Hamming window, wherein the overlap between adjacent windows is 512, and to extract the peak, frequency kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstrum coefficient, and Gabor tensor cepstrum coefficient of each photoacoustic spectrum;
the fusion module is configured to map and fuse the peak, frequency kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstrum coefficient, and Gabor tensor cepstrum coefficient of each photoacoustic spectrum into a multidimensional vector to obtain a multidimensional feature vector of each photoacoustic spectrum;
the training module is configured to take the multidimensional feature vectors and the noise in the noise data set as positive and negative samples, respectively, to construct a sample data set, and to train a SqueezeNet convolutional neural network on the sample data set until the error falls below a threshold and stabilizes, to obtain the trained SqueezeNet convolutional neural network;
the identification module is configured to input the photoacoustic spectrum to be identified into the trained SqueezeNet convolutional neural network and to identify whether the photoacoustic spectrum contains a photoacoustic signal of the target characteristic gas.
8. The voiceprint recognition device based on a photoacoustic spectrum of a characteristic gas in transformer oil according to claim 7, wherein the extraction module comprises a first extraction module and a second extraction module;
the first extraction module is configured to extract the peak, frequency kurtosis, instantaneous frequency bandwidth, Mel-frequency cepstrum coefficient, and Gabor tensor cepstrum coefficient of the photoacoustic spectrum;
the second extraction module is configured to extract the first-order difference Mel-frequency cepstrum coefficient, second-order difference Mel-frequency cepstrum coefficient, first-order difference Gabor tensor cepstrum coefficient, and second-order difference Gabor tensor cepstrum coefficient of the photoacoustic spectrum.
9. An electronic device, comprising: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the voiceprint recognition method based on a photoacoustic spectrum of a characteristic gas in transformer oil according to any one of claims 1 to 6.
10. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the voiceprint recognition method based on a photoacoustic spectrum of a characteristic gas in transformer oil according to any one of claims 1 to 6.
CN202110115606.XA 2021-01-28 2021-01-28 Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil Active CN112432905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110115606.XA CN112432905B (en) 2021-01-28 2021-01-28 Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110115606.XA CN112432905B (en) 2021-01-28 2021-01-28 Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil

Publications (2)

Publication Number Publication Date
CN112432905A CN112432905A (en) 2021-03-02
CN112432905B true CN112432905B (en) 2021-04-20

Family

ID=74697373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110115606.XA Active CN112432905B (en) 2021-01-28 2021-01-28 Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil

Country Status (1)

Country Link
CN (1) CN112432905B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112595672B (en) * 2021-03-03 2021-05-14 湖北鑫英泰系统技术股份有限公司 Mixed gas photoacoustic spectrum identification method and device based on deep learning
CN113111944B (en) * 2021-04-13 2022-05-31 湖北鑫英泰系统技术股份有限公司 Photoacoustic spectrum identification method and device based on deep learning and gas photoacoustic effect
CN113252571A (en) * 2021-04-23 2021-08-13 陕西省石油化工研究设计院 Method for detecting gas by using spectrometry

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102884413A (en) * 2010-03-02 2013-01-16 利康股份有限公司 Method and apparatus for the photo-acoustic identification and quantification of analyte species in a gaseous or liquid medium
CN206074431U (en) * 2016-07-19 2017-04-05 安徽华电六安电厂有限公司 A kind of optoacoustic spectroscopy principle transformer online monitoring system based on elimination cross influence
CN109800700A (en) * 2019-01-15 2019-05-24 哈尔滨工程大学 A kind of underwater sound signal target classification identification method based on deep learning
WO2019217507A1 (en) * 2018-05-11 2019-11-14 Carrier Corporation Photoacoustic detection system
CN111341319A (en) * 2018-12-19 2020-06-26 中国科学院声学研究所 Audio scene recognition method and system based on local texture features
CN112304869A (en) * 2019-07-26 2021-02-02 英飞凌科技股份有限公司 Gas sensing device for sensing gas in gas mixture and method for operating the same

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120118042A1 (en) * 2010-06-10 2012-05-17 Gillis Keith A Photoacoustic Spectrometer with Calculable Cell Constant for Quantitative Absorption Measurements of Pure Gases, Gaseous Mixtures, and Aerosols

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102884413A (en) * 2010-03-02 2013-01-16 利康股份有限公司 Method and apparatus for the photo-acoustic identification and quantification of analyte species in a gaseous or liquid medium
CN206074431U (en) * 2016-07-19 2017-04-05 安徽华电六安电厂有限公司 A kind of optoacoustic spectroscopy principle transformer online monitoring system based on elimination cross influence
WO2019217507A1 (en) * 2018-05-11 2019-11-14 Carrier Corporation Photoacoustic detection system
CN111341319A (en) * 2018-12-19 2020-06-26 中国科学院声学研究所 Audio scene recognition method and system based on local texture features
CN109800700A (en) * 2019-01-15 2019-05-24 哈尔滨工程大学 A kind of underwater sound signal target classification identification method based on deep learning
CN112304869A (en) * 2019-07-26 2021-02-02 英飞凌科技股份有限公司 Gas sensing device for sensing gas in gas mixture and method for operating the same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Identification and concentration measurements of atmospheric pollutants through a neural network analysis of photothermal signatures; K. Boccara et al.; Journal de Physique IV; 1994-12-31; pp. C7-107 to C7-110 *
Study on the effectiveness of different speech features for sound classification; Wang Huapeng et al.; Journal of Criminal Investigation Police University of China; 2020-12-31; abstract, pp. 125-128, and Tables 1 and 5 *

Also Published As

Publication number Publication date
CN112432905A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN112432905B (en) Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil
WO2021174757A1 (en) Method and apparatus for recognizing emotion in voice, electronic device and computer-readable storage medium
US10878823B2 (en) Voiceprint recognition method, device, terminal apparatus and storage medium
Andersen et al. Nonintrusive speech intelligibility prediction using convolutional neural networks
Wu et al. Automatic speech emotion recognition using modulation spectral features
US20210271826A1 (en) Speech translation method, electronic device and computer-readable storage medium
CN112595672B (en) Mixed gas photoacoustic spectrum identification method and device based on deep learning
CN112504971B (en) Photoacoustic spectrum identification method and device for characteristic gas in transformer oil
Faundez-Zanuy et al. Nonlinear speech processing: overview and applications
CN113111944B (en) Photoacoustic spectrum identification method and device based on deep learning and gas photoacoustic effect
Chaki Pattern analysis based acoustic signal processing: a survey of the state-of-art
CN115273904A (en) Angry emotion recognition method and device based on multi-feature fusion
CN112504970B (en) Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning
Joy et al. Deep Scattering Power Spectrum Features for Robust Speech Recognition.
Yuen et al. Asdf: A differential testing framework for automatic speech recognition systems
CN114302301B (en) Frequency response correction method and related product
Hu et al. A lightweight multi-sensory field-based dual-feature fusion residual network for bird song recognition
Liu et al. A novel feature extraction strategy for multi-stream robust emotion identification
CN108962389A (en) Method and system for indicating risk
Slaney et al. Pitch-gesture modeling using subband autocorrelation change detection.
CN112908343B (en) Acquisition method and system for bird species number based on cepstrum spectrogram
Wang et al. Time-domain adaptive attention network for single-channel speech separation
Korvel et al. Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically
CN112420022B (en) Noise extraction method, device, equipment and storage medium
Vuddagiri et al. Study of robust language identification techniques for future smart cities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and device for identifying voiceprint based on photoacoustic spectrum of characteristic gas in transformer oil

Effective date of registration: 20220610

Granted publication date: 20210420

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: HUBEI INFOTECH CO.,LTD.

Registration number: Y2022420000153

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230922

Granted publication date: 20210420

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: HUBEI INFOTECH CO.,LTD.

Registration number: Y2022420000153