Disclosure of Invention
In order to improve the accuracy, flexibility and stability of identifying the audio signal of a gas photoacoustic spectrum, a first aspect of the invention provides a deep learning-based gas photoacoustic spectrum enhanced voiceprint identification method, which comprises the following steps: acquiring a plurality of photoacoustic spectra of a target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquiring noise of the photoacoustic cell in different sound environments to obtain a noise data set; mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set; framing each photoacoustic spectrum in the photoacoustic spectrum data set, and extracting the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum; mapping and fusing the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum; taking the multi-dimensional feature vectors, the noise in the enhanced noise data set, and the concentration of the target characteristic gas as positive samples, negative samples and labels, respectively, to construct a sample data set; training a recurrent neural network with the sample data set until the error is below a threshold and tends to be stable, obtaining a trained recurrent neural network; and inputting the photoacoustic spectrum to be identified into the trained recurrent neural network, and identifying the concentration of the target characteristic gas corresponding to that photoacoustic spectrum.
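For illustration only, the following is a minimal sketch of the training stop criterion described above (training until the error is below a threshold and tends to be stable), assuming a PyTorch model and data loader; the loss function, learning rate, threshold and patience values are assumptions of this sketch, not part of the disclosure.

```python
# Minimal sketch of the stop criterion: train until the epoch error is
# below a threshold for several consecutive epochs (i.e., low and stable).
# Model, loader and all hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


def train_until_stable(model, loader, threshold=1e-3, patience=5, max_epochs=200):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()  # assumed: regression of gas concentration
    stable_epochs = 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for features, concentration in loader:
            optimizer.zero_grad()
            loss = criterion(model(features), concentration)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= len(loader)
        # count consecutive epochs with error below the threshold
        stable_epochs = stable_epochs + 1 if epoch_loss < threshold else 0
        if stable_epochs >= patience:  # error below threshold and stable
            break
    return model
```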
In some embodiments of the invention, the mixing of the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set comprises the following step: performing time stretching, pitch shifting, random matching, cyclic shifting or mixing on the noise signals in the noise data set and the audio signals of the photoacoustic spectra to obtain the enhanced noise data set.
Further, a generative adversarial network and the noise data set are used to improve the realism of the enhanced noise data set. Preferably, the generative adversarial network comprises a generation network and a discrimination network; the generation network is used for generating reconstructed noise samples from the noise samples in the noise data set, and the discrimination network is used for evaluating the similarity between the reconstructed noise samples and the noise samples.
In some embodiments of the invention, the recurrent neural network comprises a plurality of deep convolutional neural network elements.
The target characteristic gas is one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen or nitrogen.
In a second aspect, the invention provides a gas photoacoustic spectrum enhanced voiceprint recognition device based on deep learning, which comprises an acquisition module, an extraction module, a fusion module, a training module and a recognition module, wherein the acquisition module is used for acquiring a plurality of photoacoustic spectra of the target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquiring noise of the photoacoustic cell in different sound environments to obtain a noise data set; and mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set. The extraction module is used for framing each photoacoustic spectrum in the photoacoustic spectrum data set and extracting the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum. The fusion module is used for mapping and fusing the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum. The training module is used for taking the multi-dimensional feature vectors, the noise in the enhanced noise data set, and the concentration of the target characteristic gas as positive samples, negative samples and labels, respectively, to construct a sample data set, and for training a recurrent neural network with the sample data set until the error is below a threshold and tends to be stable, obtaining a trained recurrent neural network. The recognition module is used for inputting the photoacoustic spectrum to be identified into the trained recurrent neural network and identifying the concentration of the target characteristic gas corresponding to that photoacoustic spectrum.
Further, the acquisition module comprises a first acquisition module, a second acquisition module and a mixing module, wherein the first acquisition module is used for acquiring a plurality of photoacoustic spectra of the target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; the second acquisition module is used for acquiring the noise of the photoacoustic cell in different sound environments to obtain a noise data set; and the mixing module is configured to mix the audio signals of the photoacoustic spectra with the noise data set to obtain an enhanced noise data set.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for gas photoacoustic spectroscopy-enhanced voiceprint recognition based on deep learning provided by the first aspect of the present invention.
In a fourth aspect of the present invention, a computer readable medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the deep learning based gas photoacoustic spectroscopy enhanced voiceprint recognition method provided by the first aspect of the present invention.
The invention has the beneficial effects that:
1. because the noise of the photoacoustic cell is collected in different sound environments at the construction stage of the sample data set, the adaptability of the identification model is improved; the model can be flexibly used to identify photoacoustic spectra in different scenes, and the dependence on a specific photoacoustic cell structure is reduced without loss of precision;
2. the diversity and realism of the reconstructed noise samples are further improved through the generative adversarial network, which improves the robustness of the model, so that the gas represented by a photoacoustic spectrum, its concentration and other information can be identified more accurately;
3. in order to better learn the acoustic characteristics of the gas photoacoustic spectrum, the recurrent neural network comprises deep convolutional neural network layers; this layered model expresses different features more distinctly and achieves higher accuracy than a model formed by a single recurrent neural network layer.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, in a first aspect of the present invention, there is provided a gas photoacoustic spectrum enhanced voiceprint recognition method based on deep learning, comprising the following steps: S101, acquiring a plurality of photoacoustic spectra of a target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquiring noise of the photoacoustic cell in different sound environments to obtain a noise data set; and mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set; S102, framing each photoacoustic spectrum in the photoacoustic spectrum data set, and extracting the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum; S103, mapping and fusing the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum; S104, taking the multi-dimensional feature vectors, the noise in the enhanced noise data set, and the concentration of the target characteristic gas as positive samples, negative samples and labels, respectively, to construct a sample data set, and training a recurrent neural network (RNN) with the sample data set until the error is below a threshold and tends to be stable, obtaining a trained recurrent neural network; and S105, inputting the photoacoustic spectrum to be identified into the trained recurrent neural network, and identifying the concentration of the target characteristic gas corresponding to that photoacoustic spectrum.
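By way of illustration only, a minimal sketch of steps S102 and S103 follows, assuming the librosa and scipy libraries are available; the frame sizes, the number of MFCCs, and the simple per-frame concatenation used for the "mapping and fusing" step are assumptions of this sketch, not a prescription of the disclosure.

```python
# Hedged sketch of S102-S103: framing, per-frame extraction of MFCCs,
# spectral peak, spectral skewness and spectral kurtosis, then fusion of
# all features of a frame into one multi-dimensional vector.
import numpy as np
import librosa
from scipy.stats import skew, kurtosis


def spectrum_features(signal, sr, frame_length=1024, hop_length=512, n_mfcc=13):
    # Mel-frequency cepstral coefficients, one column per frame
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_length, hop_length=hop_length)
    # Magnitude spectrum of each frame (same framing as the MFCCs)
    mag = np.abs(librosa.stft(signal, n_fft=frame_length, hop_length=hop_length))
    peak = mag.max(axis=0)             # spectral peak of each frame
    spec_skew = skew(mag, axis=0)      # spectral skewness of each frame
    spec_kurt = kurtosis(mag, axis=0)  # spectral kurtosis of each frame
    # Fuse all features of each frame into one multi-dimensional vector
    return np.vstack([mfcc, peak, spec_skew, spec_kurt]).T  # (frames, n_mfcc + 3)
```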
In some embodiments of the invention, the mixing of the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set comprises the following step: performing time stretching (Time-Stretch), pitch shifting (Pitch-Shift), random matching, cyclic shifting and mixing on the noise signals in the noise data set and the audio signals of the photoacoustic spectra to obtain the enhanced noise data set. It will be appreciated that this data augmentation is analogous to image augmentation; for example, after a deformation such as time stretching, the signal boundary must be stretched correspondingly.
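A minimal sketch of this augmentation step is given below, assuming librosa; the stretch and shift ranges and the mixing weight are illustrative assumptions, and the "random matching" is realized here as a random crop of the noise to the length of the audio signal.

```python
# Hedged sketch of the mixing/augmentation step: time stretch, pitch shift,
# cyclic shift and weighted mixing of a photoacoustic audio signal with a
# recorded photoacoustic-cell noise signal. All ranges are assumptions.
import numpy as np
import librosa


def augment_with_noise(audio, noise, sr, rng=np.random.default_rng()):
    # Time-Stretch: the boundary stretches with the signal, as in image resizing
    noise = librosa.effects.time_stretch(noise, rate=rng.uniform(0.8, 1.2))
    # Pitch-Shift by a random (fractional) number of semitones
    noise = librosa.effects.pitch_shift(noise, sr=sr, n_steps=rng.uniform(-2, 2))
    # Cyclic shift of the noise signal
    noise = np.roll(noise, rng.integers(0, len(noise)))
    # Random matching: pad/crop the noise to the audio length
    if len(noise) < len(audio):
        noise = np.pad(noise, (0, len(audio) - len(noise)))
    start = rng.integers(0, len(noise) - len(audio) + 1)
    noise = noise[start:start + len(audio)]
    # Weighted mixing of audio and noise
    alpha = rng.uniform(0.05, 0.3)
    return (1 - alpha) * audio + alpha * noise
```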
Further, a Generative Adversarial Network (GAN) and the noise data set are used to improve the realism of the enhanced noise data set. Preferably, the generative adversarial network comprises a generation network and a discrimination network; the generation network is used for generating reconstructed noise samples from the noise samples in the noise data set, and the discrimination network is used for evaluating the similarity between the reconstructed noise samples and the noise samples.
Specifically, the loss function of the generative adversarial network is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim P_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

wherein V(D, G) represents the loss function of the generative adversarial network, D represents the discrimination network, and G represents the generation network; $\mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))]$ represents the loss term of the generation network, $\mathbb{E}_{x \sim P_{data}(x)}[\log D(x)]$ represents the loss term of the discrimination network, and $P_{data}(\cdot)$ represents the sample distribution function; x represents a real noise sample, and z represents a non-real noise sample (including the non-real noise samples obtained by mixing the audio signals of the photoacoustic spectra with the noise signals in the noise data set in the above embodiment); G(z) represents a reconstructed noise sample generated by the G network, and D(x) represents the probability that a noise sample is real; D(G(z)) represents the evaluated similarity between the reconstructed noise sample and a real noise sample, i.e., the probability of being judged real. The similarity can be evaluated by the image similarity of the sound spectra, the image similarity of the corresponding frequency spectra, the cross entropy, the KL divergence, the JS divergence, or the goodness of fit of the waveforms.
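A minimal sketch of this minimax loss in its usual binary-cross-entropy form follows; it assumes PyTorch and a discrimination network D ending in a sigmoid, and is illustrative only: the disclosure does not prescribe a particular implementation.

```python
# Hedged sketch of the GAN loss V(D, G) above, in BCE form.
# D is assumed to output a probability in [0, 1] (sigmoid output).
import torch
import torch.nn.functional as F


def discriminator_loss(D, x_real, x_fake):
    # max_D: log D(x) + log(1 - D(G(z))), written as BCE against 1s and 0s
    pred_real = D(x_real)           # D(x): probability a real sample is real
    pred_fake = D(x_fake.detach())  # D(G(z)): probability a fake is real
    return (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
            F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))


def generator_loss(D, x_fake):
    # min_G log(1 - D(G(z))); in practice the non-saturating -log D(G(z))
    pred_fake = D(x_fake)
    return F.binary_cross_entropy(pred_fake, torch.ones_like(pred_fake))
```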
Referring to FIG. 2, in some embodiments of the invention, the recurrent neural network includes a plurality of deep convolutional neural network units. From bottom to top, the bottommost layer is the input layer, the middle layers (layers 1 to L) are hidden layers, and the top layer is the output layer; this hierarchical model expresses different features more distinctly than a model formed by a single recurrent layer, while retaining the recurrent neural network's advantage in processing sequence data.
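A hedged sketch of such a hierarchical network follows, assuming PyTorch; the layer counts and sizes are illustrative assumptions, and only the overall structure (stacked convolutional hidden layers feeding a recurrent layer and an output layer) mirrors FIG. 2.

```python
# Hedged sketch of FIG. 2: deep convolutional units as hidden layers 1..L,
# a recurrent layer on top of them, and a linear output layer. Sizes are
# illustrative assumptions only.
import torch
import torch.nn as nn


class ConvRecurrentNet(nn.Module):
    def __init__(self, n_features, hidden=64, num_conv_layers=3, n_outputs=1):
        super().__init__()
        convs, channels = [], n_features
        for _ in range(num_conv_layers):  # hidden layers 1..L
            convs += [nn.Conv1d(channels, hidden, kernel_size=3, padding=1),
                      nn.ReLU()]
            channels = hidden
        self.convs = nn.Sequential(*convs)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_outputs)  # e.g. gas concentration

    def forward(self, x):  # x: (batch, frames, n_features)
        h = self.convs(x.transpose(1, 2))    # convolve along the time axis
        h, _ = self.rnn(h.transpose(1, 2))   # recurrent layer over frames
        return self.out(h[:, -1])            # last time step -> output layer
```

The nn.LSTM layer can be swapped for nn.GRU without changing the rest of the sketch, matching the option described next.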
Optionally, the recurrent neural network is an LSTM (Long Short-Term Memory) neural network or a GRU (Gated Recurrent Unit) neural network.
Optionally, the identification method is applied to the identification of characteristic gases in transformer oil, in which case the target characteristic gas is one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen or nitrogen.
Referring to fig. 3, in a second aspect of the present invention, there is provided a gas photoacoustic spectrum enhanced voiceprint recognition apparatus 1 based on deep learning, including an obtaining module 11, an extracting module 12, a fusing module 13, a training module 14, and a recognition module 15, where the obtaining module 11 is configured to obtain a plurality of photoacoustic spectra of a target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; obtain noise of the photoacoustic cell in different sound environments to obtain a noise data set; and mix the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set. The extracting module 12 is configured to frame each photoacoustic spectrum in the photoacoustic spectrum data set and extract the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness, and spectral kurtosis of each photoacoustic spectrum. The fusing module 13 is configured to map and fuse the Mel-frequency cepstrum coefficients, spectral peak, spectral skewness, and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum. The training module 14 is configured to take the multi-dimensional feature vectors, the noise in the enhanced noise data set, and the concentration of the target characteristic gas as positive samples, negative samples, and labels, respectively, to construct a sample data set, and to train a recurrent neural network with the sample data set until the error is below a threshold and tends to be stable, obtaining a trained recurrent neural network. The recognition module 15 is configured to input the photoacoustic spectrum to be identified into the trained recurrent neural network and identify the concentration of the target characteristic gas corresponding to that photoacoustic spectrum.
Further, the obtaining module 11 includes a first obtaining module, a second obtaining module, and a mixing module, where the first obtaining module is configured to obtain a plurality of photoacoustic spectra of the target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; the second obtaining module is configured to obtain the noise of the photoacoustic cell in different sound environments to obtain a noise data set; and the mixing module is configured to mix the audio signals of the photoacoustic spectra with the noise data set to obtain an enhanced noise data set.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for gas photoacoustic spectroscopy-enhanced voiceprint recognition based on deep learning provided by the first aspect of the present invention.
Referring to fig. 4, an electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following devices may be connected to the I/O interface 505 in general: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 4 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to perform the deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method provided by the first aspect of the present invention.
computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, Python, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.