CN108335702A

CN108335702A - An audio noise reduction method based on a deep neural network

Info

Publication number: CN108335702A
Application number: CN201810101400.XA
Authority: CN
Inventors: 余春艳; 齐子铭; 管发乾; 张栋
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2018-02-01
Filing date: 2018-02-01
Publication date: 2018-07-27

Abstract

The present invention relates to an audio noise reduction method based on a deep neural network (DNN). The method has two stages, training and testing. In the training stage, the training data for the DNN model consist of noisy audio and clean audio; because the log-spectral domain better matches the human auditory system, the log-power spectrum of the input audio is extracted as the feature. In the testing stage, the log-power spectrum of the timbre-converted singing voice is fed into the DNN model obtained in the training stage, and the output of the model is the log-power spectrum of the noise-reduced audio. Because human auditory perception is insensitive to the phase of audio, the phase information is computed directly from the timbre-converted singing voice. Finally, the log-power spectrum output by the DNN model is combined with the phase information to reconstruct the noise-reduced audio. The method can reduce noise in audio, in particular in speech and singing audio after timbre conversion.

Description

An audio noise reduction method based on a deep neural network

Technical field

The present invention relates to audio noise reduction methods in the field of singing, and in particular to an audio noise reduction method based on a deep neural network.

Background art

Real-world speech and singing audio signals are rarely clean; they carry noise of many kinds. The purpose of audio noise reduction is to remove as much of the noise in the audio signal as possible, making the timbre-converted singing voice cleaner and thereby improving the quality, clarity, and intelligibility of the audio.

Traditional audio noise reduction methods mainly include Bayesian estimation based on statistical models, subspace algorithms, and spectral subtraction. These algorithms all make strong assumptions about the characteristics of the noise. Spectral subtraction has the lowest computational complexity, requiring only a forward and an inverse Fourier transform; however, when the signal-to-noise ratio of the audio signal is low, spectral subtraction severely damages the intelligibility of the audio.

Most traditional unsupervised noise reduction algorithms are proposed and implemented on the basis of the additive nature of background noise, or of certain statistical properties relating audio and noise, which makes their range of application very narrow. Given the complexity of noise interference, a deep neural network, being a nonlinear model, is considered for modeling the mapping between noisy audio and clean audio and for achieving noise reduction of audio after timbre conversion.

Therefore, on the basis of the above analysis, this patent uses a deep neural network to train an audio noise reduction model with stronger generalization ability to perform noise reduction on audio.

Summary of the invention

The purpose of the present invention is to provide an audio noise reduction method based on a deep neural network that can reduce noise in audio, in particular in speech and singing audio after timbre conversion.

To achieve the above object, the technical solution of the present invention is an audio noise reduction method based on a deep neural network, comprising the following steps:

Step S1: Preprocess the data to obtain noisy audio data;

Step S2: Train a DNN audio noise reduction model; the resulting model performs the mapping from the log-power spectrum of the timbre-converted singing voice to the log-power spectrum of clean audio;

Step S3: Perform noise reduction on the timbre-converted singing voice, that is, combine the log-power spectrum output by the DNN audio noise reduction model trained in step S2 with the phase information to reconstruct the noise-reduced audio.

In an embodiment of the present invention, step S1 is implemented as follows: the TIMIT dataset is used as the clean audio data, and multiple types of noise at different signal-to-noise ratio levels are added to the clean audio to generate the noisy audio data.

In an embodiment of the present invention, the signal-to-noise ratio levels include 20 dB, 15 dB, and 10 dB.

In an embodiment of the present invention, the noise types include additive white Gaussian noise, Babble, Restaurant, Street, Car, and Exhibition.

In an embodiment of the present invention, step S2 specifically comprises the following steps:

Step S21: Pre-train the stacked RBMs on the log-power spectrum of the noisy audio, using unsupervised greedy layer-wise training and updating the RBM parameters with the CD algorithm;

Step S22: Train the whole DNN audio noise reduction model with the stochastic gradient descent algorithm. The parameters of the RBM part of the model are initialized with the parameters trained in step S21, and the parameters of the output layer are initialized randomly. The loss function of the model is the minimum mean squared error between the log-power spectrum of the clean audio and the noise-reduced log-power spectrum output by the model, computed as follows:

where E denotes the mean squared error; the two spectral terms denote, respectively, the log-power spectrum of the n-th sample after noise reduction and the log-power spectrum of the clean audio; N denotes the total number of samples; D denotes the dimension of the log-power spectrum; and (W^l, b^l) denotes the weights and biases of layer l. The weights W and biases b are updated as follows:

where λ denotes the learning rate.
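The loss and update formulas themselves do not appear in this text. A form consistent with the definitions given above can be sketched as follows. The symbols \hat{X}_n^d and X_n^d (the d-th component of the noise-reduced and clean log-power spectra of sample n) and the 1/N normalization are assumed notation, not taken from the original.

```latex
E = \frac{1}{N} \sum_{n=1}^{N} \sum_{d=1}^{D}
    \left( \hat{X}_n^d(W, b) - X_n^d \right)^2

(W^l, b^l) \leftarrow (W^l, b^l) - \lambda \, \frac{\partial E}{\partial (W^l, b^l)}
```

Under this reading, each layer's parameters move against the gradient of the squared error, scaled by the learning rate λ.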

In an embodiment of the present invention, the DNN audio noise reduction model in step S22 is structured as follows:

The first-layer RBM is a Gaussian-Bernoulli restricted Boltzmann machine with one visible layer and one hidden layer, with 2048 nodes and a Sigmoid activation function;

The second-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine, with 2048 nodes and a Sigmoid activation function;

The third-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine, with 2048 nodes and a Sigmoid activation function;

The fourth layer is the output layer, with 257 nodes and a linear activation function.
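The layer structure above can be illustrated with a minimal NumPy forward pass. This is a sketch under assumptions: the input dimension of 257 (one frame of log-power spectrum) is not stated in the text, and the function names `init_dnn` and `forward` and the small random initialization are hypothetical stand-ins for the pre-trained RBM parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_dnn(sizes, rng):
    """Create (weight, bias) pairs for consecutive layer sizes (random init for illustration)."""
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Three sigmoid hidden layers followed by a linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = sigmoid(h @ W + b)
    W, b = params[-1]
    return h @ W + b  # linear activation at the output

# 2048-node hidden layers and a 257-node output as described above;
# the 257-dimensional input is an assumption.
LAYER_SIZES = [257, 2048, 2048, 2048, 257]
rng = np.random.default_rng(0)
params = init_dnn(LAYER_SIZES, rng)
batch = rng.standard_normal((4, 257))  # stand-in for 4 frames of noisy log-power spectrum
output = forward(params, batch)
```

Each row of `output` then plays the role of one frame of estimated clean log-power spectrum.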

Compared with the prior art, the present invention has the following beneficial effect: the method uses a deep neural network to train an audio noise reduction model with stronger generalization ability, which performs noise reduction on audio.

Description of the drawings

Fig. 1 is a schematic flow block diagram of the method of the present invention.

Fig. 2 is a structural diagram of the deep neural network used in an embodiment of the present invention.

Detailed description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings.

As shown in Fig. 1, the audio noise reduction method based on a deep neural network of the present invention comprises the following steps:

Step S1: Preprocess the data.

Step S2: Train a DNN audio noise reduction model; the resulting DNN model performs the mapping from the log-power spectrum of the timbre-converted singing voice to the log-power spectrum of clean audio.

Step S3: Perform noise reduction on the timbre-converted singing voice: combine the log-power spectrum output by the DNN model trained in S2 with the phase information to reconstruct the noise-reduced audio.
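The feature extraction and reconstruction in step S3 can be sketched with NumPy as follows. The frame length of 512 samples (which yields 257 frequency bins), the 50% overlap, the Hann window, and the function names are all assumptions for illustration; the text does not specify them.

```python
import numpy as np

FRAME = 512  # assumed frame length; 512 // 2 + 1 = 257 frequency bins
HOP = 256    # assumed 50% overlap

def stft(signal):
    """Short-time Fourier transform with a periodic Hann window."""
    window = np.hanning(FRAME + 1)[:-1]
    n_frames = 1 + (len(signal) - FRAME) // HOP
    frames = np.stack([signal[i * HOP: i * HOP + FRAME] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def log_power_and_phase(signal):
    """Return the log-power spectrum (the DNN feature) and the phase of each frame."""
    spec = stft(signal)
    lps = np.log(np.abs(spec) ** 2 + 1e-12)  # small floor avoids log(0)
    return lps, np.angle(spec)

def reconstruct(lps, phase, length):
    """Combine a log-power spectrum with phase and overlap-add back to a waveform."""
    magnitude = np.sqrt(np.exp(lps))
    frames = np.fft.irfft(magnitude * np.exp(1j * phase), n=FRAME, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP: i * HOP + FRAME] += frame
    return out

# Round trip on a stand-in signal; in the described method, `lps` would be
# replaced by the DNN output while `phase` still comes from the noisy input.
noisy = np.random.default_rng(0).standard_normal(16000)
lps, phase = log_power_and_phase(noisy)
restored = reconstruct(lps, phase, len(noisy))
```

With these window and hop choices the overlapping windows sum to one, so the round trip recovers the interior of the signal when the spectrum is left unchanged.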

Further, step S1 is carried out as follows: the TIMIT dataset is used as the clean audio data, and multiple types of noise at different signal-to-noise ratio levels are added to the clean audio to generate the noisy audio data. The signal-to-noise ratio levels include 20 dB, 15 dB, 10 dB, and so on; the noise types include additive white Gaussian noise, Babble, Restaurant, Street, Car, Exhibition, and so on.
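Mixing noise into clean audio at a chosen signal-to-noise ratio can be sketched as below. The helper names `mix_at_snr` and `measured_snr_db` are hypothetical, and the sine wave and random noise merely stand in for a TIMIT utterance and a recorded noise clip.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` and add it to `clean` so the mixture has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile or trim the noise so it covers the whole clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that clean_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def measured_snr_db(clean, noisy):
    """SNR of a mixture, given the clean reference."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in for a clean utterance
white = rng.standard_normal(4000)                           # stand-in for white Gaussian noise
noisy_by_snr = {snr: mix_at_snr(clean, white, snr) for snr in (20, 15, 10)}
```

Repeating this over every noise type and SNR level yields paired noisy and clean examples for training.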

Further, step S2 specifically comprises the following steps:

Step S21: Pre-train the stacked RBMs on the log-power spectrum of the noisy audio, using unsupervised greedy layer-wise training and updating the RBM parameters with the contrastive divergence (CD) algorithm.
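One contrastive divergence update can be sketched as follows for a Bernoulli-Bernoulli RBM. This is a minimal CD-1 illustration, not the patent's training procedure: the single Gibbs step, the function name `cd1_update`, and the toy layer sizes are assumptions, and the Gaussian-Bernoulli first layer would need a different visible-unit model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr, rng):
    """One CD-1 step for a Bernoulli-Bernoulli RBM; returns updated copies of the parameters."""
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(np.float64)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    # Difference between data-driven and reconstruction-driven statistics.
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * np.mean(v0 - v1_prob, axis=0)
    b_hid = b_hid + lr * np.mean(h0_prob - h1_prob, axis=0)
    return W, b_vis, b_hid

# Toy usage: 6 visible units, 4 hidden units, 16 binary training vectors.
rng = np.random.default_rng(0)
data = (rng.random((16, 6)) < 0.5).astype(np.float64)
W0 = rng.standard_normal((6, 4)) * 0.01
W1, bv1, bh1 = cd1_update(W0, np.zeros(6), np.zeros(4), data, 0.1, rng)
```

In greedy layer-wise training, the hidden activations of a trained RBM would serve as the input data for the next RBM in the stack.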

Step S22: Train the whole DNN noise reduction model with the stochastic gradient descent (SGD) algorithm. The parameters of the RBM part of the DNN model are initialized with the parameters obtained by pre-training in the previous step, and the parameters of the output layer are initialized randomly. The loss function of the model is the minimum mean squared error (MMSE) between the log-power spectrum of the clean audio and the noise-reduced log-power spectrum output by the model, computed as follows:

where λ denotes the learning rate.
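The role of the learning rate λ can be illustrated with a single gradient step on a linear output layer. This is a sketch under assumptions: the loss is taken as the squared error summed over spectral bins and averaged over samples, and the names `mse_loss` and `sgd_step` and the toy dimensions are hypothetical.

```python
import numpy as np

def mse_loss(pred, target):
    """Squared error summed over the D spectral bins, averaged over the N samples."""
    return np.mean(np.sum((pred - target) ** 2, axis=1))

def sgd_step(W, b, x, target, lr):
    """One gradient descent step for a linear layer pred = x @ W + b."""
    pred = x @ W + b
    n = x.shape[0]
    grad_out = 2.0 * (pred - target) / n  # derivative of the loss w.r.t. pred
    grad_W = x.T @ grad_out
    grad_b = grad_out.sum(axis=0)
    # Move against the gradient, scaled by the learning rate.
    return W - lr * grad_W, b - lr * grad_b

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))       # stand-in for hidden activations
target = rng.standard_normal((8, 3))  # stand-in for clean log-power spectra
W, b = np.zeros((5, 3)), np.zeros(3)
loss_before = mse_loss(x @ W + b, target)
W, b = sgd_step(W, b, x, target, lr=1e-3)
loss_after = mse_loss(x @ W + b, target)
```

A small learning rate gives a small but reliable decrease in the loss per step; in the full model the same rule would be applied to every layer through backpropagation.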

In the present embodiment, the DNN noise reduction model of step S22 is as shown in the figure and is structured as follows:

The first-layer RBM is a Gaussian-Bernoulli restricted Boltzmann machine (Gaussian-Bernoulli RBM, GBRBM) with one visible layer and one hidden layer, with 2048 nodes and a Sigmoid activation function;

The second-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine (Bernoulli-Bernoulli RBM, BBRBM), with 2048 nodes and a Sigmoid activation function;

The third-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine (Bernoulli-Bernoulli RBM, BBRBM), with 2048 nodes and a Sigmoid activation function;

The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention whose resulting function does not go beyond the scope of that technical solution falls within the scope of protection of the present invention.

Claims

1. An audio noise reduction method based on a deep neural network, characterized by comprising the following steps:

Step S1: Preprocess the data to obtain noisy audio data;

Step S2: Train a DNN audio noise reduction model; the resulting model performs the mapping from the log-power spectrum of the timbre-converted singing voice to the log-power spectrum of clean audio;

Step S3: Perform noise reduction on the timbre-converted singing voice, that is, combine the log-power spectrum output by the DNN audio noise reduction model trained in step S2 with the phase information to reconstruct the noise-reduced audio.

2. The audio noise reduction method based on a deep neural network according to claim 1, characterized in that step S1 is implemented as follows: the TIMIT dataset is used as the clean audio data, and multiple types of noise at different signal-to-noise ratio levels are added to the clean audio to generate the noisy audio data.

3. The audio noise reduction method based on a deep neural network according to claim 2, characterized in that the signal-to-noise ratio levels include 20 dB, 15 dB, and 10 dB.

4. The audio noise reduction method based on a deep neural network according to claim 2, characterized in that the noise types include additive white Gaussian noise, Babble, Restaurant, Street, Car, and Exhibition.

5. The audio noise reduction method based on a deep neural network according to claim 1, characterized in that step S2 specifically comprises the following steps:

Step S21: Pre-train the stacked RBMs on the log-power spectrum of the noisy audio, using unsupervised greedy layer-wise training and updating the RBM parameters with the CD algorithm;

Step S22: Train the whole DNN audio noise reduction model with the stochastic gradient descent algorithm; the parameters of the RBM part of the model are initialized with the parameters trained in step S21, and the parameters of the output layer are initialized randomly; the loss function of the model is the minimum mean squared error between the log-power spectrum of the clean audio and the noise-reduced log-power spectrum output by the model, computed as follows:

where E denotes the mean squared error; the two spectral terms denote, respectively, the log-power spectrum of the n-th sample after noise reduction and the log-power spectrum of the clean audio; N denotes the total number of samples; D denotes the dimension of the log-power spectrum; and (W^l, b^l) denotes the weights and biases of layer l; the weights W and biases b are updated as follows:

where λ denotes the learning rate.

6. The audio noise reduction method based on a deep neural network according to claim 5, characterized in that the DNN audio noise reduction model in step S22 is structured as follows:

The first-layer RBM is a Gaussian-Bernoulli restricted Boltzmann machine with one visible layer and one hidden layer, with 2048 nodes and a Sigmoid activation function;