CN109767781A - Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning - Google Patents

Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning Download PDF

Info

Publication number
CN109767781A
CN109767781A (application number CN201910167788.8A)
Authority
CN
China
Prior art keywords
speech
noise
signal
separating method
spectral density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910167788.8A
Other languages
Chinese (zh)
Inventor
张啟权
王明江
陆云
韩宇菲
张禄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201910167788.8A priority Critical patent/CN109767781A/en
Publication of CN109767781A publication Critical patent/CN109767781A/en
Priority to PCT/CN2019/117076 priority patent/WO2020177372A1/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0264: Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L21/0272: Voice signal separating
    • G10L21/0308: Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Complex Calculations (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention provides a speech separation method, system and storage medium based on a super-Gaussian prior speech model and deep learning. The speech separation method uses estimates of the clean-speech power spectral density and the noise power spectral density to compute the a-priori signal-to-noise ratio in the gain function; substituting the a-priori SNR into the gain function yields the gain value, which is multiplied with the noisy speech spectrum to obtain an estimate of the clean-speech amplitude spectrum, after which the clean speech signal is recovered by the overlap-add technique. The beneficial effects of the present invention are: by combining a traditional statistical model with deep learning, the method not only effectively suppresses non-stationary noise signals, but also alleviates the heavy dependence of deep learning on training data and its weak generalization ability. This combination makes the enhancement performance of the method highly robust under a wide variety of noise environments and signal-to-noise conditions.

Description

Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning
Technical field
The present invention relates to the field of speech processing technology, and more particularly to a speech separation method, system and storage medium based on a super-Gaussian prior speech model and deep learning.
Background technique
Since speech signals are usually corrupted by interference noise from the surroundings, applications such as automatic speech recognition, human-machine dialogue and hearing aids face great challenges. The performance of existing traditional speech enhancement techniques degrades considerably for non-stationary noise and at low signal-to-noise ratios. Although recently emerged speech enhancement techniques based on deep learning can suppress non-stationary noise well, the performance of such algorithms is highly dependent on the training data, and they perform poorly on data that was not seen during training.
Summary of the invention
The present invention provides a speech separation method based on a super-Gaussian prior speech model and deep learning, comprising the following steps:
Step 1: receiving a noisy speech signal;
Step 2: modeling the Fourier transform coefficients of the clean speech signal and the noise signal with a super-Gaussian statistical model and a Gaussian statistical model, respectively; based on these statistical assumptions, estimating the amplitude spectrum of the clean speech signal using the minimum mean-square error criterion to obtain an amplitude-spectrum estimate;
Step 3: estimating the clean-speech power spectral density using a deep neural network;
Step 4: tracking the noise power spectral density with a statistical model based on the minimum mean-square error criterion, the noise power spectral density being obtained by recursively averaging minimum mean-square error estimates of the current noise periodogram;
Step 5: using the clean-speech power spectral density estimate obtained in Step 3 and the noise power spectral density estimate obtained in Step 4 to compute the a-priori signal-to-noise ratio in the gain function; substituting the a-priori SNR into the gain function to obtain the gain value; multiplying the gain value with the noisy speech spectrum to obtain an estimate of the clean-speech amplitude spectrum; and recovering the clean speech signal using the overlap-add technique.
As a further improvement of the present invention, in Step 2, the parameter values selected for the super-Gaussian speech signal model are μ=0.2 and β=0.001.
As a further improvement of the present invention, in Step 3, the deep neural network architecture has two hidden layers, the activation function is the rectified linear unit, and the output layer uses the softmax function.
As a further improvement of the present invention, the number of nodes in each of the first and second hidden layers is 512, and the training data set used is the TIMIT speech database.
As a further improvement of the present invention, in Step 3, in order to train the deep neural network, the speech data must first be pre-processed: clean speech is mixed with noise signals of multiple types at signal-to-noise ratios of 0, 5, 10 and 15 dB to obtain noisy speech signals; the input features of the deep neural network are 13 Mel-frequency cepstral coefficients together with their first- and second-order difference coefficients.
As a further improvement of the present invention, in Step 3, each noisy speech signal is divided into frames and a 39-dimensional feature vector is extracted per frame, comprising 13 Mel-frequency cepstral coefficients and their first- and second-order difference coefficients; furthermore, in order to exploit inter-frame context, the current frame and the three frames on each side are used, 7 frames in total, so the input feature dimension of the input layer is 273.
As a further improvement of the present invention, in Step 3, the cost function used by the deep neural network is the cross-entropy loss function; the output layer uses softmax to output the probability that the current frame belongs to each phoneme, and the clean-speech power spectral density is estimated as a weighted average using the phoneme probabilities and their corresponding power spectra.
The present invention also provides a speech separation system based on a super-Gaussian prior speech model and deep learning, characterized by comprising: a memory, a processor, and a computer program stored on the memory, the computer program being configured to implement the steps of the method of the present invention when called by the processor.
The present invention also provides a computer-readable storage medium storing a computer program, the computer program being configured to implement the steps of the method of the present invention when called by a processor.
The beneficial effects of the present invention are: by combining a traditional statistical model with deep learning, the present invention not only effectively suppresses non-stationary noise signals, but also alleviates the problems of deep learning being highly dependent on training data and having weak generalization ability. This combination makes the enhancement performance of the method highly robust under a wide variety of noise environments and signal-to-noise conditions.
Brief description of the drawings
Fig. 1 is a diagram of the deep neural network architecture of the present invention.
Specific embodiment
The invention discloses a speech separation method based on a super-Gaussian prior speech model and deep learning, which not only suppresses non-stationary noise well, but also exhibits good generalization performance on data that was not seen during training.
The present invention achieves a robust speech enhancement method mainly by combining a traditional statistical model with deep learning. The entire method comprises four main parts: a speech gain function based on a super-Gaussian speech model assumption, estimation of the power spectrum of the clean speech signal using a neural network, estimation of the noise power spectrum, and computation of the a-priori signal-to-noise ratio and the gain function.
First, the signal model is introduced. We consider an additive signal model, y(n) = x(n) + d(n), where y(n) is the noisy speech signal, and x(n) and d(n) represent the clean speech signal and the noise signal, respectively. Applying the short-time Fourier transform yields the time-frequency-domain relationship Y(l, k) = X(l, k) + D(l, k), where l and k denote the frame index and the frequency-bin index, respectively. The Fourier transform coefficients of the speech and noise signals are assumed to obey super-Gaussian and Gaussian distributions, respectively.
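The additive time-frequency relationship above follows from the linearity of the short-time Fourier transform. A minimal sketch, using a naive DFT in place of an optimized FFT and randomly generated frame values for illustration:

```python
import cmath
import random

def dft(frame):
    # naive discrete Fourier transform of one frame (illustration only)
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

random.seed(1)
N = 16
x = [random.uniform(-1.0, 1.0) for _ in range(N)]    # clean speech frame x(n)
d = [random.uniform(-0.3, 0.3) for _ in range(N)]    # noise frame d(n)
y = [xn + dn for xn, dn in zip(x, d)]                # noisy frame y(n) = x(n) + d(n)

X, D, Y = dft(x), dft(d), dft(y)
# by linearity of the DFT, Y(l, k) = X(l, k) + D(l, k) in every frequency bin k
```

In practice each frame would also be windowed before the transform; the additive relationship is unaffected because windowing is likewise linear.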
The speech separation method of the present invention based on a super-Gaussian prior speech model and deep learning comprises the following steps:
Step 1: receiving a noisy speech signal;
Step 2: modeling the Fourier transform coefficients of the clean speech signal and the noise signal with a super-Gaussian statistical model and a Gaussian statistical model, respectively; based on these statistical assumptions, the amplitude spectrum of the clean speech signal is estimated using the minimum mean-square error criterion, the amplitude-spectrum estimate being given by formula (1):
Here, ξ = λ_x/λ_d denotes the a-priori signal-to-noise ratio, where λ_x = E[|X(l, k)|²] and λ_d = E[|D(l, k)|²] are the clean-speech power spectral density and the noise power spectral density, respectively. Furthermore, ζ = γξ/(μ+ξ), where γ = |Y(l, k)|²/λ_d denotes the a-posteriori signal-to-noise ratio, and M(·, ·; ·) denotes the confluent hypergeometric function. The parameter values we select for the super-Gaussian speech signal model are μ = 0.2 and β = 0.001.
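The quantities ξ, γ and ζ defined above are direct per-bin ratios once the two power spectral densities are available. A minimal sketch with hypothetical per-bin values (all numbers below are invented for illustration; μ = 0.2 is the value chosen in this patent):

```python
# hypothetical values for one time-frequency bin (l, k)
lam_x = 2.0    # clean-speech PSD estimate  λ_x = E[|X(l,k)|²]
lam_d = 0.5    # noise PSD estimate         λ_d = E[|D(l,k)|²]
Y2    = 3.2    # noisy periodogram          |Y(l,k)|²
mu    = 0.2    # super-Gaussian shape parameter chosen in the patent

xi    = lam_x / lam_d             # a-priori SNR    ξ = λ_x / λ_d
gamma = Y2 / lam_d                # a-posteriori SNR  γ = |Y(l,k)|² / λ_d
zeta  = gamma * xi / (mu + xi)    # argument ζ = γξ/(μ+ξ) of the gain function
```

These three scalars are all the gain function of formula (1) needs per bin, which is why the method reduces to estimating λ_x (Step 3) and λ_d (Step 4).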
Step 3: estimating clean speech power spectral density using deep neural network;
It can be seen from formula (1) that the gain function depends on the a-priori signal-to-noise ratio, and the computation of the a-priori SNR in turn depends on the clean-speech power spectral density and the noise power spectral density. Step 3 therefore mainly estimates the clean-speech power spectral density, which the present invention does using a deep neural network. The deep neural network architecture used by the present invention is shown in Fig. 1.
The neural network architecture used has two hidden layers; the activation function is the rectified linear unit (ReLU), and the output layer uses the softmax function. The number of nodes in each of the first and second hidden layers is 512. The training data set used is the TIMIT speech database.
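The forward pass of such a network can be sketched in pure Python. The weights below are random placeholders (in the patent's setup they would come from training on TIMIT); the input dimension 273 and the output dimension 61 follow the feature and phoneme-class counts given in this description:

```python
import math
import random

random.seed(0)

def dense(x, w, b):
    # fully connected layer: w has shape (n_out x n_in), b has length n_out
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, t) for t in v]

def softmax(v):
    m = max(v)                          # subtract max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]

def random_layer(n_out, n_in):
    # placeholder weights; real weights would come from training
    return ([[random.gauss(0.0, 0.05) for _ in range(n_in)]
             for _ in range(n_out)], [0.0] * n_out)

V, H, Q = 273, 512, 61          # input features, hidden width, phoneme classes
w1, b1 = random_layer(H, V)
w2, b2 = random_layer(H, H)
w3, b3 = random_layer(Q, H)

z = [random.random() for _ in range(V)]     # one stacked feature vector Z_l
h = relu(dense(z, w1, b1))                  # first hidden layer (ReLU)
h = relu(dense(h, w2, b2))                  # second hidden layer (ReLU)
p = softmax(dense(h, w3, b3))               # output: P(q | Z_l) over Q classes
```

The softmax output is a valid probability distribution over the Q phoneme classes, which is what the weighted-average PSD estimate in the following text consumes.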
To train the neural network, the speech data must first be pre-processed: we mix clean speech with noise signals of many types at signal-to-noise ratios of 0, 5, 10 and 15 dB to obtain noisy speech signals. The input features of the neural network are 13 Mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC) together with their first- and second-order difference coefficients. We therefore divide each noisy speech signal into frames and extract a 39-dimensional feature vector per frame, comprising 13 MFCCs and their first- and second-order difference coefficients. Furthermore, in order to exploit inter-frame context, we use the current frame together with the three frames on each side, 7 frames in total, so the input feature dimension of the input layer is 273 (39 multiplied by 7), expressed as
Zl=[z1,l,z2,l,…,zV,l] (3)
Wherein V=273.
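The 273-dimensional input vector Z_l is obtained by concatenating the 39-dimensional per-frame features over a 7-frame context window. A minimal sketch with toy feature values; edge frames are handled by repeating the first/last frame, which is one common convention the patent does not specify:

```python
def stack_context(features, ctx=3):
    # features: list of per-frame vectors; stack current frame +/- ctx frames
    L = len(features)
    stacked = []
    for l in range(L):
        window = []
        for m in range(l - ctx, l + ctx + 1):
            m = min(max(m, 0), L - 1)      # assumed edge handling: repeat
            window.extend(features[m])
        stacked.append(window)
    return stacked

# toy sequence: 10 frames, each a 39-dimensional vector filled with its index
feats = [[float(l)] * 39 for l in range(10)]
Z = stack_context(feats)            # each Z[l] has 39 * 7 = 273 dimensions
```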
The cost function used by the neural network is the cross-entropy loss function. The training target is to identify which phoneme the current frame belongs to, so the output layer uses softmax to output the probability that the current frame belongs to each phoneme, expressed as P(q | Z_l), with targets represented as one-hot vectors. All phonemes in the TIMIT data set, including silence and non-speech states, are divided into Q = 61 classes, q ∈ {1, 2, 3, …, Q}.
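With a one-hot target, the cross-entropy loss for one frame reduces to the negative log-probability assigned to the true phoneme class. A minimal sketch (toy probabilities; Q reduced to 3 for readability):

```python
import math

def cross_entropy(p, one_hot):
    # general form: -sum_q t_q * log p_q; for a one-hot target t this is
    # simply -log p[q*] for the true class q*
    return -sum(t * math.log(pq) for t, pq in zip(one_hot, p) if t > 0)

p = [0.1, 0.7, 0.2]          # softmax output over Q = 3 toy phoneme classes
target = [0, 1, 0]           # one-hot vector: the true class is the second one
loss = cross_entropy(p, target)
```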
Finally, the clean-speech power spectral density is estimated as a weighted average using the phoneme probabilities and their corresponding power spectra. To train the neural network, we select the Adam optimization algorithm for the gradient computation.
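The weighted average in this step combines the network's posterior over phonemes with a mean power spectrum per phoneme. A minimal sketch; the per-phoneme spectra S below are invented toy values, whereas in practice they would be derived from the training corpus:

```python
# toy setup: Q = 4 phoneme classes, K = 3 frequency bins
probs = [0.1, 0.6, 0.2, 0.1]          # softmax output P(q | Z_l), sums to 1
S = [                                 # hypothetical mean power spectra per phoneme
    [1.0, 2.0, 0.5],
    [4.0, 1.0, 0.2],
    [2.0, 2.0, 2.0],
    [0.5, 0.5, 0.5],
]

K = len(S[0])
# clean-speech PSD estimate per bin: lambda_x(k) = sum_q P(q | Z_l) * S_q(k)
lam_x = [sum(p * S[q][k] for q, p in enumerate(probs)) for k in range(K)]
```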
Step 4: Through Step 3 we have obtained the estimate of the clean-speech power spectral density. It can be seen from formula (1) that computing the a-priori signal-to-noise ratio requires both the clean-speech power spectral density and the noise power spectral density. In Step 4 we therefore track the noise power spectral density with a traditional statistical model based on the minimum mean-squared error (Minimum Mean Squared Error, MMSE) criterion. The noise power spectral density is obtained by recursively averaging minimum mean-square error estimates of the current noise periodogram.
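The recursive averaging of Step 4 can be illustrated with a first-order smoothing recursion. This is a simplified sketch: the smoothing constant α and the periodogram values are assumed, and in the patent's method the update term is the MMSE estimate of the noise periodogram rather than the raw |Y|² used here:

```python
alpha = 0.9          # assumed smoothing constant
lam_d = 1.0          # initial noise PSD estimate for one frequency bin

# toy noisy periodograms |Y(l,k)|^2 over successive frames
periodograms = [1.2, 0.8, 1.1, 0.9, 1.0]

for p in periodograms:
    # simplified recursion: lambda_d(l) = alpha * lambda_d(l-1)
    #                                     + (1 - alpha) * update_term(l)
    # here the MMSE update term is replaced by the raw periodogram
    lam_d = alpha * lam_d + (1 - alpha) * p
```

The small (1 - α) weight makes the estimate track slow changes in the noise floor while staying insensitive to short bursts of speech energy.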
Step 5: Through Steps 3 and 4 we have obtained the estimates of the clean-speech power spectral density and the noise power spectral density. Using these estimates, we can compute the key variable in the gain function, the a-priori signal-to-noise ratio. Substituting it into the gain function gives the gain value, which is multiplied with the noisy speech spectrum to obtain the estimate of the clean-speech amplitude spectrum. The clean speech signal is then recovered using the overlap-add technique, a common technique for restoring the frequency domain to the time domain.
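The final time-domain reconstruction can be sketched with a windowed analysis / overlap-add pair. The frame length, hop and constant test signal below are illustrative; a periodic Hann window at 50% overlap satisfies the constant-overlap-add property, so interior samples are reconstructed exactly:

```python
import math

def hann(N):
    # periodic Hann window; at hop N/2 neighbouring windows sum to 1 (COLA)
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]

def analyse(x, N, hop):
    # split x into windowed frames (the spectral gain would be applied
    # to each frame's spectrum between analysis and overlap-add)
    w = hann(N)
    return [[x[i + n] * w[n] for n in range(N)]
            for i in range(0, len(x) - N + 1, hop)]

def overlap_add(frames, N, hop):
    out = [0.0] * (hop * (len(frames) - 1) + N)
    for l, fr in enumerate(frames):
        for n, v in enumerate(fr):
            out[l * hop + n] += v
    return out

N, hop = 8, 4
x = [1.0] * 64                       # toy time-domain signal
y = overlap_add(analyse(x, N, hop), N, hop)
# away from the two edges, y reproduces x exactly
```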
The invention also discloses a speech separation system based on a super-Gaussian prior speech model and deep learning, comprising: a memory, a processor, and a computer program stored on the memory, the computer program being configured to implement the steps of the method of the present invention when called by the processor.
The invention also discloses a computer-readable storage medium storing a computer program, the computer program being configured to implement the steps of the method of the present invention when called by a processor.
Beneficial effects of the present invention are as follows:
1. The present invention adopts a super-Gaussian distribution model that better matches the statistical properties of speech Fourier coefficients, so that the estimated gain function is more accurate.
2. Deep learning is used to empower speech signal processing. The powerful modeling ability of deep learning is used to learn the mapping between noisy speech and the clean speech signal, which can effectively suppress highly non-stationary noise signals.
3. The combination of a traditional statistical model and deep learning not only effectively suppresses non-stationary noise signals, but also alleviates the problems of deep learning being highly dependent on training data and having weak generalization ability. This combination makes the enhancement performance of the method highly robust under a wide variety of noise environments and signal-to-noise conditions.
The above content is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, a number of simple deductions or substitutions may be made without departing from the concept of the present invention, all of which shall be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A speech separation method based on a super-Gaussian prior speech model and deep learning, characterized by comprising the following steps:
Step 1: receiving a noisy speech signal;
Step 2: modeling the Fourier transform coefficients of the clean speech signal and the noise signal with a super-Gaussian statistical model and a Gaussian statistical model, respectively; based on these statistical assumptions, estimating the amplitude spectrum of the clean speech signal using the minimum mean-square error criterion to obtain an amplitude-spectrum estimate;
Step 3: estimating the clean-speech power spectral density using a deep neural network;
Step 4: tracking the noise power spectral density with a statistical model based on the minimum mean-square error criterion, the noise power spectral density being obtained by recursively averaging minimum mean-square error estimates of the current noise periodogram;
Step 5: using the clean-speech power spectral density estimate obtained in Step 3 and the noise power spectral density estimate obtained in Step 4 to compute the a-priori signal-to-noise ratio in the gain function; substituting the a-priori SNR into the gain function to obtain the gain value; multiplying the gain value with the noisy speech spectrum to obtain an estimate of the clean-speech amplitude spectrum; and recovering the clean speech signal using the overlap-add technique.
2. The speech separation method according to claim 1, characterized in that in Step 2 the amplitude-spectrum estimate is obtained as follows:
wherein ξ = λ_x/λ_d denotes the a-priori signal-to-noise ratio, λ_x = E[|X(l, k)|²] and λ_d = E[|D(l, k)|²] being the clean-speech power spectral density and the noise power spectral density, respectively; furthermore, ζ = γξ/(μ+ξ), where γ = |Y(l, k)|²/λ_d denotes the a-posteriori signal-to-noise ratio, and M(·, ·; ·) denotes the confluent hypergeometric function.
3. The speech separation method according to claim 2, characterized in that in Step 2 the parameter values selected for the super-Gaussian speech signal model are μ = 0.2 and β = 0.001.
4. The speech separation method according to claim 1, characterized in that in Step 3 the deep neural network architecture has two hidden layers, the activation function is the rectified linear unit, and the output layer uses the softmax function.
5. The speech separation method according to claim 4, characterized in that the number of nodes in each of the first and second hidden layers is 512, and the training data set used is the TIMIT speech database.
6. The speech separation method according to claim 1, characterized in that in Step 3, in order to train the deep neural network, the speech data must first be pre-processed: clean speech is mixed with noise signals of multiple types at signal-to-noise ratios of 0, 5, 10 and 15 dB to obtain noisy speech signals; the input features of the deep neural network are 13 Mel-frequency cepstral coefficients together with their first- and second-order difference coefficients.
7. The speech separation method according to claim 6, characterized in that in Step 3 each noisy speech signal is divided into frames and a 39-dimensional feature vector is extracted per frame, comprising 13 Mel-frequency cepstral coefficients and their first- and second-order difference coefficients; furthermore, in order to exploit inter-frame context, the current frame and the three frames on each side are used, 7 frames in total, so the input feature dimension of the input layer is 273.
8. The speech separation method according to claim 6, characterized in that in Step 3 the cost function used by the deep neural network is the cross-entropy loss function, the output layer uses softmax to output the probability that the current frame belongs to each phoneme, and the clean-speech power spectral density is estimated as a weighted average using the phoneme probabilities and their corresponding power spectra.
9. A speech separation system based on a super-Gaussian prior speech model and deep learning, characterized by comprising: a memory, a processor, and a computer program stored on the memory, the computer program being configured to implement the steps of the method of any one of claims 1-8 when called by the processor.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program being configured to implement the steps of the method of any one of claims 1-8 when called by a processor.
CN201910167788.8A 2019-03-06 2019-03-06 Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning Pending CN109767781A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910167788.8A CN109767781A (en) 2019-03-06 2019-03-06 Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning
PCT/CN2019/117076 WO2020177372A1 (en) 2019-03-06 2019-11-11 Voice separation method and system based on super-gaussian prior voice module and deep learning, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910167788.8A CN109767781A (en) 2019-03-06 2019-03-06 Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning

Publications (1)

Publication Number Publication Date
CN109767781A true CN109767781A (en) 2019-05-17

Family

ID=66457658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910167788.8A Pending CN109767781A (en) 2019-03-06 2019-03-06 Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning

Country Status (2)

Country Link
CN (1) CN109767781A (en)
WO (1) WO2020177372A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144347A (en) * 2019-12-30 2020-05-12 腾讯科技(深圳)有限公司 Data processing method, device, platform and storage medium
WO2020177372A1 (en) * 2019-03-06 2020-09-10 哈尔滨工业大学(深圳) Voice separation method and system based on super-gaussian prior voice module and deep learning, and storage medium
CN112289337A (en) * 2020-11-03 2021-01-29 北京声加科技有限公司 Method and device for filtering residual noise after machine learning voice enhancement
CN112653979A (en) * 2020-12-29 2021-04-13 苏州思必驰信息科技有限公司 Adaptive dereverberation method and device
WO2021208287A1 (en) * 2020-04-14 2021-10-21 深圳壹账通智能科技有限公司 Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium
WO2022161277A1 (en) * 2021-01-29 2022-08-04 北京沃东天骏信息技术有限公司 Speech enhancement method, model training method, and related device
CN116580723A (en) * 2023-07-13 2023-08-11 合肥星本本网络科技有限公司 Voice detection method and system in strong noise environment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001003125B1 (en) * 1999-07-02 2001-02-08 Conexant Systems Inc Bi-directional pitch enhancement in speech coding systems
CN105632512A (en) * 2016-01-14 2016-06-01 华南理工大学 Dual-sensor voice enhancement method based on statistics model and device
CN107610712A (en) * 2017-10-18 2018-01-19 会听声学科技(北京)有限公司 The improved MMSE of combination and spectrum-subtraction a kind of sound enhancement method
CN108074582A (en) * 2016-11-10 2018-05-25 电信科学技术研究院 A kind of noise suppressed signal-noise ratio estimation method and user terminal

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2341299A (en) * 1998-09-04 2000-03-08 Motorola Ltd Suppressing noise in a speech communications unit
CN101685638B (en) * 2008-09-25 2011-12-21 华为技术有限公司 Method and device for enhancing voice signals
CN104103278A (en) * 2013-04-02 2014-10-15 北京千橡网景科技发展有限公司 Real time voice denoising method and device
CN103903631B (en) * 2014-03-28 2017-10-03 哈尔滨工程大学 Voice signal blind separating method based on Variable Step Size Natural Gradient Algorithm
US9564144B2 (en) * 2014-07-24 2017-02-07 Conexant Systems, Inc. System and method for multichannel on-line unsupervised bayesian spectral filtering of real-world acoustic noise
CN107731242B (en) * 2017-09-26 2020-09-04 桂林电子科技大学 Gain function speech enhancement method for generalized maximum posterior spectral amplitude estimation
CN109767781A (en) * 2019-03-06 2019-05-17 哈尔滨工业大学(深圳) Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Robert Rehr et al.: "On the Importance of Super-Gaussian Speech Priors", IEEE *
Timo Gerkmann et al.: "Noise Power Estimation Based on the Probability of Speech Presence", IEEE *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020177372A1 (en) * 2019-03-06 2020-09-10 哈尔滨工业大学(深圳) Voice separation method and system based on super-gaussian prior voice module and deep learning, and storage medium
CN111144347A (en) * 2019-12-30 2020-05-12 腾讯科技(深圳)有限公司 Data processing method, device, platform and storage medium
WO2021208287A1 (en) * 2020-04-14 2021-10-21 深圳壹账通智能科技有限公司 Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium
CN112289337A (en) * 2020-11-03 2021-01-29 北京声加科技有限公司 Method and device for filtering residual noise after machine learning voice enhancement
CN112289337B (en) * 2020-11-03 2023-09-01 北京声加科技有限公司 Method and device for filtering residual noise after machine learning voice enhancement
CN112653979A (en) * 2020-12-29 2021-04-13 苏州思必驰信息科技有限公司 Adaptive dereverberation method and device
WO2022161277A1 (en) * 2021-01-29 2022-08-04 北京沃东天骏信息技术有限公司 Speech enhancement method, model training method, and related device
CN116580723A (en) * 2023-07-13 2023-08-11 合肥星本本网络科技有限公司 Voice detection method and system in strong noise environment
CN116580723B (en) * 2023-07-13 2023-09-08 合肥星本本网络科技有限公司 Voice detection method and system in strong noise environment

Also Published As

Publication number Publication date
WO2020177372A1 (en) 2020-09-10

Similar Documents

Publication Publication Date Title
CN109767781A (en) Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning
CN106486131B (en) A kind of method and device of speech de-noising
CN109841206A (en) A kind of echo cancel method based on deep learning
CN102324232A (en) Method for recognizing sound-groove and system based on gauss hybrid models
CN109949821B (en) Method for removing reverberation of far-field voice by using U-NET structure of CNN
CN102693724A (en) Noise classification method of Gaussian Mixture Model based on neural network
CN109887489A (en) Speech dereverberation method based on the depth characteristic for generating confrontation network
CN109346084A (en) Method for distinguishing speek person based on depth storehouse autoencoder network
Astudillo et al. An uncertainty propagation approach to robust ASR using the ETSI advanced front-end
CN112017682A (en) Single-channel voice simultaneous noise reduction and reverberation removal system
CN106373559A (en) Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting
CN103021405A (en) Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
CN108257606A (en) A kind of robust speech personal identification method based on the combination of self-adaptive parallel model
González et al. MMSE-based missing-feature reconstruction with temporal modeling for robust speech recognition
Roy et al. DeepLPC-MHANet: Multi-head self-attention for augmented Kalman filter-based speech enhancement
Jensen et al. Minimum mean-square error estimation of mel-frequency cepstral features–a theoretically consistent approach
Deligne et al. Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)
Wang et al. Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm
Nathwani et al. An extended experimental investigation of DNN uncertainty propagation for noise robust ASR
Xu et al. Vector taylor series based joint uncertainty decoding.
Chen Noise reduction of bird calls based on a combination of spectral subtraction, Wiener filtering, and Kalman filtering
Fingscheidt et al. Data-driven speech enhancement
KR20170087211A (en) Feature compensation system and method for recognizing voice
Chang et al. Multiple statistical models for soft decision in noisy speech enhancement
Li et al. Enhanced speech based jointly statistical probability distribution function for voice activity detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190517