CN108335702A - Audio noise reduction method based on deep neural network - Google Patents

Audio noise reduction method based on deep neural network

Info

Publication number
CN108335702A
Authority
CN
China
Prior art keywords
audio
power spectrum
dnn
log power
noise reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810101400.XA
Other languages
Chinese (zh)
Inventor
余春艳
齐子铭
管发乾
张栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201810101400.XA
Publication of CN108335702A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Abstract

The present invention relates to an audio noise reduction method based on a deep neural network (DNN). The method has two stages, training and testing. In the training stage, the training data for the DNN model consist of noisy audio paired with clean audio; because the log-spectral domain better matches the human auditory system, the log power spectrum of the input audio is extracted as the feature. In the testing stage, the log power spectrum of the timbre-converted singing voice is fed into the DNN model obtained in the training stage, and the model output is the log power spectrum of the denoised audio. Because human auditory perception is insensitive to the phase of audio, the phase information is computed directly from the timbre-converted singing voice. Finally, the log power spectrum output by the DNN model is combined with the phase information to reconstruct the denoised audio. The method can denoise audio, in particular speech and singing audio after timbre conversion.

Description

Audio noise reduction method based on deep neural network
Technical field
The present invention relates to audio noise reduction in the field of singing, and in particular to an audio noise reduction method based on a deep neural network.
Background art
Real-world speech and singing audio signals are rarely clean; they usually carry noise of various kinds. The purpose of audio noise reduction is to remove as much of the noise in the audio signal as possible, making the timbre-converted singing voice cleaner and thereby improving the quality, clarity, and intelligibility of the audio.
Traditional audio noise reduction methods mainly include Bayesian estimation based on statistical models, subspace algorithms, and spectral subtraction. These algorithms all make strong assumptions about the characteristics of the noise. Spectral subtraction has the lowest computational complexity, requiring only a forward and an inverse Fourier transform; however, when the signal-to-noise ratio of the audio signal is low, spectral subtraction severely damages the intelligibility of the audio.
Most traditional unsupervised noise reduction algorithms are proposed and implemented on the basis of the additive nature of background noise, or of certain statistical properties relating audio and noise, which makes their range of application very narrow. Given the complexity of noise interference, a deep neural network, as a nonlinear model, is used to model the mapping between noisy audio and clean audio and to denoise audio after timbre conversion.
Based on the above analysis, this patent therefore uses a deep neural network to train an audio noise reduction model with stronger generalization ability to denoise audio.
Summary of the invention
The purpose of the present invention is to provide an audio noise reduction method based on a deep neural network that can denoise audio, in particular speech and singing audio after timbre conversion.
To achieve the above purpose, the technical solution of the present invention is an audio noise reduction method based on a deep neural network, comprising the following steps:
Step S1: Preprocess the data to obtain noisy audio data;
Step S2: Train a DNN audio noise reduction model; the resulting model maps the log power spectrum of the timbre-converted singing voice to the log power spectrum of clean audio;
Step S3: Denoise the timbre-converted singing voice, that is, combine the log power spectrum output by the DNN audio noise reduction model trained in step S2 with the phase information to reconstruct the denoised audio.
In an embodiment of the present invention, step S1 is implemented as follows: the TIMIT dataset is used as the clean audio data, and multiple noises of different types are added to the clean audio at different signal-to-noise ratio (SNR) levels to generate the noisy audio data.
In an embodiment of the present invention, the SNR levels include 20 dB, 15 dB, and 10 dB.
In an embodiment of the present invention, the noise types include additive white Gaussian noise, Babble, Restaurant, Street, Car, and Exhibition.
In an embodiment of the present invention, step S2 specifically comprises the following steps:
Step S21: Pre-train the stacked RBMs (restricted Boltzmann machines) with the log power spectrum of the noisy audio, using unsupervised, greedy layer-wise training in which the RBM parameters are updated with the CD (contrastive divergence) algorithm;
Step S22: Train the whole DNN audio noise reduction model with the stochastic gradient descent algorithm. The parameters of the RBM part of the model are initialized with the parameters trained in step S21, and the parameters of the output layer are randomly initialized. The loss function of the model is the minimum mean squared error between the log power spectrum of the clean audio and the denoised log power spectrum output by the model, calculated as follows:
$$E = \frac{1}{N}\sum_{n=1}^{N}\sum_{d=1}^{D}\left(\hat{X}_n^d(W, b) - X_n^d\right)^2$$
where E denotes the mean squared error; $\hat{X}_n^d(W, b)$ and $X_n^d$ denote the d-th component of the denoised log power spectrum and of the clean-audio log power spectrum of the n-th sample, respectively; N denotes the total number of samples; D denotes the size of the log power spectrum; and $(W^l, b^l)$ denote the weights and biases of layer l. The weights W and biases b are updated as follows:
$$(W^l, b^l) \leftarrow (W^l, b^l) - \lambda \frac{\partial E}{\partial (W^l, b^l)}$$
where λ denotes the learning rate.
In an embodiment of the present invention, in step S22 the DNN audio noise reduction model is structured as follows:
The first-layer RBM is a Gaussian-Bernoulli restricted Boltzmann machine with one visible layer and one hidden layer; it has 2048 nodes and a sigmoid activation function;
The second-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine with 2048 nodes and a sigmoid activation function;
The third-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine with 2048 nodes and a sigmoid activation function;
The fourth layer is the output layer, with 257 nodes and a linear activation function.
Compared with the prior art, the present invention has the following beneficial effect: the method uses a deep neural network to train an audio noise reduction model with stronger generalization ability to denoise audio.
Brief description of the drawings
Fig. 1 is a schematic block diagram of the method flow of the present invention.
Fig. 2 is a structural diagram of the deep neural network used in an embodiment of the present invention.
Detailed description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the audio noise reduction method based on a deep neural network of the present invention comprises the following steps:
Step S1: Preprocess the data.
Step S2: Train a DNN audio noise reduction model; the resulting DNN model maps the log power spectrum of the timbre-converted singing voice to the log power spectrum of clean audio.
Step S3: Denoise the timbre-converted singing voice by combining the log power spectrum output by the DNN model trained in S2 with the phase information to reconstruct the denoised audio.
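The steps above rely on two signal-processing operations that the text names but does not spell out: extracting a log power spectrum, and rebuilding a waveform from a log power spectrum together with the phase of the noisy input. The following is a minimal NumPy sketch of both, not the patent's implementation. The 512-point FFT (chosen because it yields 257 frequency bins, matching the output size stated for the network), the Hann window, the 50% overlap, and all function names are assumptions.

```python
import numpy as np

N_FFT = 512   # assumption: 512-point FFT gives 512 // 2 + 1 = 257 bins
HOP = 256     # assumption: 50% frame overlap

def stft(x):
    """Short-time Fourier transform with a periodic Hann window."""
    window = np.hanning(N_FFT + 1)[:-1]
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)          # shape (n_frames, 257)

def log_power_and_phase(x, eps=1e-10):
    """Return the log power spectrum (the DNN feature) and the phase."""
    spec = stft(x)
    return np.log(np.abs(spec) ** 2 + eps), np.angle(spec)

def reconstruct(log_power, phase, length):
    """Combine a log power spectrum with a phase and overlap-add the frames."""
    magnitude = np.sqrt(np.exp(log_power))
    frames = np.fft.irfft(magnitude * np.exp(1j * phase), n=N_FFT, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + N_FFT] += frame
    return out

rng = np.random.default_rng(0)
noisy = rng.standard_normal(4096)               # stand-in for a timbre-converted recording
log_power, phase = log_power_and_phase(noisy)
# A trained DNN would map log_power to a denoised estimate here;
# the identity mapping is used as a placeholder.
denoised_log_power = log_power
restored = reconstruct(denoised_log_power, phase, len(noisy))
```

With the identity placeholder, the interior of `restored` matches the input, which is a convenient sanity check of the analysis and synthesis path before a real model is plugged in.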
Further, step S1 is carried out as follows: the TIMIT dataset is used as the clean audio data, and multiple noises are added to the clean audio at different signal-to-noise ratio (SNR) levels to generate the noisy audio data. The SNR levels include 20 dB, 15 dB, 10 dB, and so on, and the noise types include additive white Gaussian noise, Babble, Restaurant, Street, Car, Exhibition, and so on.
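Mixing noise into clean audio at a chosen SNR comes down to scaling the noise before adding it. The sketch below shows one common way to do this, assuming the SNR is defined as the ratio of average signal power to average noise power over the whole utterance; the function name `mix_at_snr` and the synthetic signals are illustrative only and do not come from the patent.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    # Loop or trim the noise so it covers the whole clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain g such that clean_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)   # stand-in for a TIMIT utterance
white = rng.standard_normal(8000)                             # additive white Gaussian noise
noisy_by_snr = {snr: mix_at_snr(clean, white, snr) for snr in (20, 15, 10)}
```

The same function would apply to recorded noise types such as Babble or Car by passing the corresponding noise waveform in place of `white`.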
Further, step S2 specifically comprises the following steps:
Step S21: Pre-train the stacked RBMs with the log power spectrum of the noisy audio, using unsupervised, greedy layer-wise training in which the RBM parameters are updated with the contrastive divergence (CD) algorithm.
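To make the contrastive divergence update concrete, here is a minimal sketch of one CD-1 step for a Bernoulli-Bernoulli RBM. It is an illustration under assumptions rather than the patent's training code: the single Gibbs step (CD-1), the learning rate, the toy binary data, and the function names are all chosen for the example. A Gaussian-Bernoulli first layer would differ in the visible reconstruction, which is typically linear instead of sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update for a Bernoulli-Bernoulli RBM on a mini-batch v0 of shape (batch, n_visible)."""
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + b)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step down to the visible layer and up again.
    v1_prob = sigmoid(h0 @ W.T + a)
    h1_prob = sigmoid(v1_prob @ W + b)
    batch = v0.shape[0]
    # Gradient estimate: data statistics minus reconstruction statistics.
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    a += lr * np.mean(v0 - v1_prob, axis=0)
    b += lr * np.mean(h0_prob - h1_prob, axis=0)
    return np.mean((v0 - v1_prob) ** 2)   # reconstruction error, for monitoring only

rng = np.random.default_rng(0)
pattern_a = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
pattern_b = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
data = np.vstack([pattern_a] * 15 + [pattern_b] * 5)   # toy binary data, not audio features
W = 0.01 * rng.standard_normal((6, 4))                  # 6 visible units, 4 hidden units
a = np.zeros(6)                                         # visible biases
b = np.zeros(4)                                         # hidden biases
errors = [cd1_step(data, W, a, b, lr=0.1, rng=rng) for _ in range(300)]
```

In greedy layer-wise pre-training, the hidden probabilities of a trained RBM would serve as the input data for the next RBM in the stack.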
Step S22: Train the whole DNN noise reduction model with the stochastic gradient descent (SGD) algorithm. The parameters of the RBM part of the DNN model are initialized with the parameters obtained by the pre-training in the previous step, and the parameters of the output layer are randomly initialized. The loss function of the model is the minimum mean squared error (MMSE) between the log power spectrum of the clean audio and the denoised log power spectrum output by the model, calculated as follows:
$$E = \frac{1}{N}\sum_{n=1}^{N}\sum_{d=1}^{D}\left(\hat{X}_n^d(W, b) - X_n^d\right)^2$$
where E denotes the mean squared error; $\hat{X}_n^d(W, b)$ and $X_n^d$ denote the d-th component of the denoised log power spectrum and of the clean-audio log power spectrum of the n-th sample, respectively; N denotes the total number of samples; D denotes the size of the log power spectrum; and $(W^l, b^l)$ denote the weights and biases of layer l. The weights W and biases b are updated as follows:
$$(W^l, b^l) \leftarrow (W^l, b^l) - \lambda \frac{\partial E}{\partial (W^l, b^l)}$$
where λ denotes the learning rate.
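As a worked illustration of the loss and the gradient-descent update, the sketch below applies them to a single linear output layer. It assumes the loss is the squared error summed over the D spectral bins and averaged over the N samples; the toy data, layer sizes, learning rate, and function names are assumptions made for the example, and a full implementation would backpropagate the same error through the sigmoid hidden layers as well.

```python
import numpy as np

def mse_loss(pred, target):
    """E = (1/N) * sum_n sum_d (pred[n, d] - target[n, d])^2."""
    return np.sum((pred - target) ** 2) / pred.shape[0]

def sgd_step_linear(W, b, x, target, lr):
    """One gradient step on a linear layer y = x @ W + b under the loss above."""
    n = x.shape[0]
    err = x @ W + b - target              # shape (N, D)
    grad_W = 2.0 * x.T @ err / n          # dE/dW
    grad_b = 2.0 * err.sum(axis=0) / n    # dE/db
    W -= lr * grad_W                      # in-place update: W <- W - lr * dE/dW
    b -= lr * grad_b

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8))                  # toy activations feeding the output layer
target = x @ rng.standard_normal((8, 4)) + 0.5    # toy "clean" targets with D = 4
W = np.zeros((8, 4))
b = np.zeros(4)
losses = []
for _ in range(200):
    losses.append(mse_loss(x @ W + b, target))
    sgd_step_linear(W, b, x, target, lr=0.05)
```

Each call here uses the whole toy batch; in stochastic gradient descent proper, `x` and `target` would be a different mini-batch on every step.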
In this embodiment, as shown in Fig. 2, the DNN noise reduction model in step S22 is structured as follows:
The first-layer RBM is a Gaussian-Bernoulli restricted Boltzmann machine (Gaussian-Bernoulli RBM, GBRBM) with one visible layer and one hidden layer; it has 2048 nodes and a sigmoid activation function;
The second-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine (Bernoulli-Bernoulli RBM, BBRBM) with 2048 nodes and a sigmoid activation function;
The third-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine (Bernoulli-Bernoulli RBM, BBRBM) with 2048 nodes and a sigmoid activation function;
The fourth layer is the output layer, with 257 nodes and a linear activation function.
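The layer structure above can be read as a feed-forward network with three sigmoid hidden layers of 2048 units and a linear output layer of 257 units. The sketch below shows the forward pass under assumptions: the input size of 257 (a single frame of log power spectrum) is not stated in the text, the weights are randomly initialized placeholders instead of pre-trained RBM parameters, and the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_dnn(rng, n_input=257, n_hidden=2048, n_output=257):
    """Random placeholder weights; the method described would copy RBM weights into the hidden layers."""
    sizes = [n_input, n_hidden, n_hidden, n_hidden, n_output]
    return [
        (0.01 * rng.standard_normal((m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """Three sigmoid hidden layers followed by a linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = sigmoid(h @ W + b)
    W_out, b_out = params[-1]
    return h @ W_out + b_out

rng = np.random.default_rng(0)
params = init_dnn(rng)
frames = rng.standard_normal((4, 257))     # four stand-in frames of log power spectrum
estimate = forward(params, frames)          # one 257-bin estimate per input frame
n_parameters = sum(W.size + b.size for W, b in params)
```

With a 257-dimensional input, this configuration has roughly 9.4 million parameters, most of them in the two 2048 by 2048 hidden-to-hidden weight matrices.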
The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention whose resulting function and effect do not go beyond the scope of that technical solution falls within the scope of protection of the present invention.

Claims (6)

1. An audio noise reduction method based on a deep neural network, characterized by comprising the following steps:
Step S1: Preprocess the data to obtain noisy audio data;
Step S2: Train a DNN audio noise reduction model; the resulting model maps the log power spectrum of the timbre-converted singing voice to the log power spectrum of clean audio;
Step S3: Denoise the timbre-converted singing voice, that is, combine the log power spectrum output by the DNN audio noise reduction model trained in step S2 with the phase information to reconstruct the denoised audio.
2. The audio noise reduction method based on a deep neural network according to claim 1, characterized in that step S1 is implemented as follows: the TIMIT dataset is used as the clean audio data, and multiple noises of different types are added to the clean audio at different signal-to-noise ratio (SNR) levels to generate the noisy audio data.
3. The audio noise reduction method based on a deep neural network according to claim 2, characterized in that the SNR levels include 20 dB, 15 dB, and 10 dB.
4. The audio noise reduction method based on a deep neural network according to claim 2, characterized in that the noise types include additive white Gaussian noise, Babble, Restaurant, Street, Car, and Exhibition.
5. The audio noise reduction method based on a deep neural network according to claim 1, characterized in that step S2 specifically comprises the following steps:
Step S21: Pre-train the stacked RBMs (restricted Boltzmann machines) with the log power spectrum of the noisy audio, using unsupervised, greedy layer-wise training in which the RBM parameters are updated with the CD (contrastive divergence) algorithm;
Step S22: Train the whole DNN audio noise reduction model with the stochastic gradient descent algorithm; the parameters of the RBM part of the model are initialized with the parameters trained in step S21, and the parameters of the output layer are randomly initialized; the loss function of the model is the minimum mean squared error between the log power spectrum of the clean audio and the denoised log power spectrum output by the model, calculated as follows:
$$E = \frac{1}{N}\sum_{n=1}^{N}\sum_{d=1}^{D}\left(\hat{X}_n^d(W, b) - X_n^d\right)^2$$
where E denotes the mean squared error; $\hat{X}_n^d(W, b)$ and $X_n^d$ denote the d-th component of the denoised log power spectrum and of the clean-audio log power spectrum of the n-th sample, respectively; N denotes the total number of samples; D denotes the size of the log power spectrum; and $(W^l, b^l)$ denote the weights and biases of layer l; the weights W and biases b are updated as follows:
$$(W^l, b^l) \leftarrow (W^l, b^l) - \lambda \frac{\partial E}{\partial (W^l, b^l)}$$
where λ denotes the learning rate.
6. The audio noise reduction method based on a deep neural network according to claim 5, characterized in that in step S22 the DNN audio noise reduction model is structured as follows:
The first-layer RBM is a Gaussian-Bernoulli restricted Boltzmann machine with one visible layer and one hidden layer; it has 2048 nodes and a sigmoid activation function;
The second-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine with 2048 nodes and a sigmoid activation function;
The third-layer RBM is a Bernoulli-Bernoulli restricted Boltzmann machine with 2048 nodes and a sigmoid activation function;
The fourth layer is the output layer, with 257 nodes and a linear activation function.
CN201810101400.XA 2018-02-01 2018-02-01 Audio noise reduction method based on deep neural network Pending CN108335702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810101400.XA CN108335702A (en) 2018-02-01 2018-02-01 Audio noise reduction method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810101400.XA CN108335702A (en) 2018-02-01 2018-02-01 Audio noise reduction method based on deep neural network

Publications (1)

Publication Number Publication Date
CN108335702A (en) 2018-07-27

Family

ID=62927933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810101400.XA Pending CN108335702A (en) 2018-02-01 2018-02-01 Audio noise reduction method based on deep neural network

Country Status (1)

Country Link
CN (1) CN108335702A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036412A (en) * 2018-09-17 2018-12-18 苏州奇梦者网络科技有限公司 voice awakening method and system
CN109147817A (en) * 2018-08-29 2019-01-04 昆明理工大学 A kind of denoising audio feature extraction algorithm based on the limited Boltzmann machine that makes a variation
CN109378010A (en) * 2018-10-29 2019-02-22 珠海格力电器股份有限公司 Training method, the speech de-noising method and device of neural network model
CN109378013A (en) * 2018-11-19 2019-02-22 南瑞集团有限公司 A kind of voice de-noising method
CN111292768A (en) * 2020-02-07 2020-06-16 腾讯科技(深圳)有限公司 Method and device for hiding lost packet, storage medium and computer equipment
CN111341332A (en) * 2020-02-28 2020-06-26 重庆邮电大学 Speech feature enhancement post-filtering method based on deep neural network
CN111554321A (en) * 2020-04-20 2020-08-18 北京达佳互联信息技术有限公司 Noise reduction model training method and device, electronic equipment and storage medium
CN112202778A (en) * 2020-09-30 2021-01-08 联想(北京)有限公司 Information processing method and device and electronic equipment
EP3913904A1 (en) * 2019-06-10 2021-11-24 Google LLC Training a model for speech and noise energy estimation
CN115659150A (en) * 2022-12-23 2023-01-31 中国船级社 Signal processing method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966517A (en) * 2015-06-02 2015-10-07 华为技术有限公司 Voice frequency signal enhancement method and device
CN105023580A (en) * 2015-06-25 2015-11-04 中国人民解放军理工大学 Unsupervised noise estimation and speech enhancement method based on separable deep automatic encoding technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONG XU等: ""A Regression Approach to Speech Enhancement Based on Deep Neural Networks"", 《IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147817A (en) * 2018-08-29 2019-01-04 昆明理工大学 A kind of denoising audio feature extraction algorithm based on the limited Boltzmann machine that makes a variation
CN109036412A (en) * 2018-09-17 2018-12-18 苏州奇梦者网络科技有限公司 voice awakening method and system
CN109378010A (en) * 2018-10-29 2019-02-22 珠海格力电器股份有限公司 Training method, the speech de-noising method and device of neural network model
CN109378013B (en) * 2018-11-19 2023-02-03 南瑞集团有限公司 Voice noise reduction method
CN109378013A (en) * 2018-11-19 2019-02-22 南瑞集团有限公司 A kind of voice de-noising method
EP3913904A1 (en) * 2019-06-10 2021-11-24 Google LLC Training a model for speech and noise energy estimation
CN111292768A (en) * 2020-02-07 2020-06-16 腾讯科技(深圳)有限公司 Method and device for hiding lost packet, storage medium and computer equipment
CN111292768B (en) * 2020-02-07 2023-06-02 腾讯科技(深圳)有限公司 Method, device, storage medium and computer equipment for hiding packet loss
CN111341332A (en) * 2020-02-28 2020-06-26 重庆邮电大学 Speech feature enhancement post-filtering method based on deep neural network
CN111554321A (en) * 2020-04-20 2020-08-18 北京达佳互联信息技术有限公司 Noise reduction model training method and device, electronic equipment and storage medium
CN111554321B (en) * 2020-04-20 2023-12-05 北京达佳互联信息技术有限公司 Noise reduction model training method and device, electronic equipment and storage medium
CN112202778A (en) * 2020-09-30 2021-01-08 联想(北京)有限公司 Information processing method and device and electronic equipment
CN115659150A (en) * 2022-12-23 2023-01-31 中国船级社 Signal processing method, device and equipment

Similar Documents

Publication Publication Date Title
CN108335702A (en) Audio noise reduction method based on deep neural network
CN109859767B (en) Environment self-adaptive neural network noise reduction method, system and storage medium for digital hearing aid
CN105611477B (en) The voice enhancement algorithm that depth and range neutral net are combined in digital deaf-aid
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN105023580B (en) Unsupervised noise estimation based on separable depth automatic coding and sound enhancement method
CN108922518A (en) voice data amplification method and system
CN108615533B (en) High-performance voice enhancement method based on deep learning
CN112151059A (en) Microphone array-oriented channel attention weighted speech enhancement method
CN111292762A (en) Single-channel voice separation method based on deep learning
Parveen et al. Speech enhancement with missing data techniques using recurrent neural networks
Yuliani et al. Speech enhancement using deep learning methods: A review
CN107967920A (en) A kind of improved own coding neutral net voice enhancement algorithm
CN112992121B (en) Voice enhancement method based on attention residual error learning
Wu et al. Increasing compactness of deep learning based speech enhancement models with parameter pruning and quantization techniques
CN113744749B (en) Speech enhancement method and system based on psychoacoustic domain weighting loss function
Roy et al. DeepLPC-MHANet: Multi-head self-attention for augmented Kalman filter-based speech enhancement
Shi et al. End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network.
CN110136746B (en) Method for identifying mobile phone source in additive noise environment based on fusion features
Jia et al. A deep learning-based time-domain approach for non-intrusive speech quality assessment
CN114360571A (en) Reference-based speech enhancement method
Dash et al. Mitigating information interruptions by COVID-19 face masks: a three-stage speech enhancement scheme
Gandhiraj et al. Auditory-based wavelet packet filterbank for speech recognition using neural network
Sunny et al. Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam
Kashani et al. Speech Enhancement via Deep Spectrum Image Translation Network
Wang et al. Robust speech recognition from ratio masks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180727