CN109767781A - Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning - Google Patents
Info
- Publication number
- CN109767781A CN109767781A CN201910167788.8A CN201910167788A CN109767781A CN 109767781 A CN109767781 A CN 109767781A CN 201910167788 A CN201910167788 A CN 201910167788A CN 109767781 A CN109767781 A CN 109767781A
- Authority
- CN
- China
- Prior art keywords
- speech
- noise
- signal
- separating method
- spectral density
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 29
- 238000013135 deep learning Methods 0.000 title claims abstract description 12
- 230000006870 function Effects 0.000 claims abstract description 34
- 238000001228 spectrum Methods 0.000 claims abstract description 29
- 230000003595 spectral effect Effects 0.000 claims abstract description 22
- 238000013179 statistical model Methods 0.000 claims abstract description 13
- 238000012549 training Methods 0.000 claims abstract description 8
- 238000013528 artificial neural network Methods 0.000 claims description 18
- 238000004590 computer program Methods 0.000 claims description 9
- 238000012545 processing Methods 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000000926 separation method Methods 0.000 claims description 3
- 230000004913 activation Effects 0.000 claims description 2
- 230000009466 transformation Effects 0.000 claims 1
- 230000002708 enhancing effect Effects 0.000 abstract description 4
- 230000009286 beneficial effect Effects 0.000 abstract description 3
- 238000004422 calculation algorithm Methods 0.000 description 2
- 239000000654 additive Substances 0.000 description 1
- 230000000996 additive effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0308—Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Complex Calculations (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The present invention provides a speech separation method, system and storage medium based on a super-Gaussian prior speech model and deep learning. The speech separation method uses the clean-speech power spectral density estimate and the noise power spectral density estimate to compute the a priori signal-to-noise ratio (SNR) in the gain function; substituting the a priori SNR into the gain function yields the gain value, the gain value is multiplied with the noisy speech spectrum to obtain an estimate of the clean-speech amplitude spectrum, and the clean speech signal is recovered using the overlap-add technique. The beneficial effects of the present invention are: by combining a traditional statistical model with deep learning, the invention not only effectively suppresses non-stationary noise signals, but also alleviates deep learning's heavy dependence on training data and its weak generalization ability. This combination makes the enhancement performance of the method highly robust across a wide range of noise environments and SNR conditions.
Description
Technical field
The present invention relates to the field of speech processing, and more particularly to a speech separation method, system and storage medium based on a super-Gaussian prior speech model and deep learning.
Background technique
Speech signals are usually polluted by interference noise from the surroundings, which poses great challenges for applications such as automatic speech recognition, human-machine dialogue and hearing aids. The performance of existing traditional speech enhancement techniques degrades heavily under non-stationary noise and at low signal-to-noise ratios. Although the recently emerged speech enhancement techniques based on deep learning can suppress non-stationary noise well, the performance of such algorithms is highly dependent on the training data, and they perform poorly on data they have not been trained on.
Summary of the invention
The present invention provides a speech separation method based on a super-Gaussian prior speech model and deep learning, comprising the following steps:
Step 1: receive a noisy speech signal;
Step 2: model the Fourier transform coefficients of the clean speech signal and of the noise signal with a super-Gaussian statistical model and a Gaussian statistical model, respectively; based on these statistical assumptions, estimate the amplitude spectrum of the clean speech signal under the minimum mean-square error criterion, obtaining the amplitude-spectrum estimate;
Step 3: estimate the clean-speech power spectral density using a deep neural network;
Step 4: track the noise power spectral density with a statistical model based on the minimum mean-square error criterion; the noise power spectral density is obtained by recursively averaging the minimum mean-square error estimate of the current noise spectrum;
Step 5: use the clean-speech power spectral density estimate from step 3 and the noise power spectral density estimate from step 4 to compute the a priori signal-to-noise ratio in the gain function; substitute the a priori SNR into the gain function to obtain the gain value; multiply the gain value with the noisy speech spectrum to obtain the estimate of the clean-speech amplitude spectrum; and recover the clean speech signal using the overlap-add technique.
As a further improvement of the present invention, in step 2, the parameter values selected for the super-Gaussian speech signal model are μ = 0.2 and β = 0.001.
As a further improvement of the present invention, in step 3, the deep neural network architecture has two hidden layers, the activation function is the rectified linear unit (ReLU), and the output layer uses the softmax function.
As a further improvement of the present invention, the first and second hidden layers each have 512 nodes, and the training dataset used is the TIMIT speech database.
As a further improvement of the present invention, in step 3, in order to train the deep neural network, the speech data is first preprocessed: clean speech is mixed with noise signals of multiple types at signal-to-noise ratios of 0, 5, 10 and 15 dB to obtain noisy speech signals; the input features of the deep neural network are 13-dimensional Mel-frequency cepstral coefficients together with their first- and second-order difference coefficients.
As a further improvement of the present invention, in step 3, each noisy speech signal is divided into frames and a 39-dimensional feature vector is extracted per frame, comprising the 13-dimensional Mel-frequency cepstral coefficients and their first- and second-order difference coefficients; furthermore, to exploit inter-frame context, the current frame is used together with the three frames on each side, 7 frames in total, so the input feature dimension of the input layer is 273.
As a further improvement of the present invention, in step 3, the cost function used by the deep neural network is the cross-entropy loss; the output layer uses softmax to output the probability that the current frame belongs to each phoneme, and the clean-speech power spectral density is estimated as the average of the per-phoneme power spectra weighted by these phoneme probabilities.
The present invention also provides a speech separation system based on a super-Gaussian prior speech model and deep learning, characterized by comprising: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the method of the present invention when invoked by the processor.
The present invention also provides a computer-readable storage medium storing a computer program, the computer program being configured to implement the steps of the method of the present invention when invoked by a processor.
The beneficial effects of the present invention are: by combining a traditional statistical model with deep learning, the invention not only effectively suppresses non-stationary noise signals, but also alleviates deep learning's heavy dependence on training data and its weak generalization ability. This combination makes the enhancement performance of the method highly robust across a wide range of noise environments and SNR conditions.
Detailed description of the invention
Fig. 1 is a diagram of the deep neural network architecture of the present invention.
Specific embodiment
The invention discloses a speech separation method based on a super-Gaussian prior speech model and deep learning that not only suppresses non-stationary noise very well, but also generalizes well to data it has not been trained on.
The present invention achieves robust speech enhancement by combining a traditional statistical model with deep learning. The method comprises four main parts: a speech gain function based on a super-Gaussian speech model assumption, estimation of the clean-speech power spectrum with a neural network, estimation of the noise power spectrum, and computation of the a priori SNR and of the gain function.
First, the signal model. We consider the additive signal model y(n) = x(n) + d(n), where y(n) is the noisy speech signal, and x(n) and d(n) are the clean speech signal and the noise signal, respectively. Applying the short-time Fourier transform yields the time-frequency relationship Y(l, k) = X(l, k) + D(l, k), where l and k index the frame and the frequency bin, respectively. The Fourier transform coefficients of the speech and of the noise signal are assumed to follow a super-Gaussian and a Gaussian distribution, respectively.
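The additive model and its time-frequency counterpart can be illustrated with SciPy (a toy sketch only, not part of the patent disclosure; the sinusoid stands in for speech):

```python
import numpy as np
from scipy.signal import stft

fs = 16000                                          # assumed sampling rate
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)    # stand-in "clean speech"
d = 0.1 * rng.standard_normal(fs)                   # stand-in noise
y = x + d                                           # y(n) = x(n) + d(n)

# Short-time Fourier transform: l indexes frames, k indexes frequency bins.
f, t, Y = stft(y, fs=fs, nperseg=512)
_, _, X = stft(x, fs=fs, nperseg=512)
_, _, D = stft(d, fs=fs, nperseg=512)

# The STFT is linear, so the additive model carries over: Y(l,k) = X(l,k) + D(l,k).
assert np.allclose(Y, X + D)
```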
The speech separation method of the present invention based on a super-Gaussian prior speech model and deep learning comprises the following steps:
Step 1: receive a noisy speech signal;
Step 2: model the Fourier transform coefficients of the clean speech signal and of the noise signal with a super-Gaussian statistical model and a Gaussian statistical model, respectively; based on these statistical assumptions, estimate the amplitude spectrum of the clean speech signal under the minimum mean-square error criterion, obtaining the amplitude-spectrum estimate given by equation (1).
Here, ξ = λx/λd denotes the a priori signal-to-noise ratio, where λx = E[|X(l, k)|²] and λd = E[|D(l, k)|²] are the clean-speech power spectral density and the noise power spectral density, respectively. Furthermore ζ = γξ/(μ + ξ), where γ = |Y(l, k)|²/λd(l, k) is the a posteriori signal-to-noise ratio, and M(·, ·; ·) denotes the confluent hypergeometric function. For the super-Gaussian speech signal model we select the parameter values μ = 0.2 and β = 0.001.
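As an illustration only (not part of the patent disclosure), the quantities entering the gain function — the a priori SNR ξ, the a posteriori SNR γ and ζ — can be computed as follows; the function name `snr_quantities` and the toy inputs are hypothetical:

```python
import numpy as np

def snr_quantities(Y, lambda_x, lambda_d, mu=0.2):
    """Quantities entering the step-2 gain function.

    Y        : noisy STFT coefficients Y(l, k)
    lambda_x : clean-speech PSD estimate (from the DNN of step 3)
    lambda_d : noise PSD estimate (from the tracker of step 4)
    mu       : shape parameter of the super-Gaussian speech prior
    """
    xi = lambda_x / lambda_d              # a priori SNR:     xi = λx / λd
    gamma = np.abs(Y) ** 2 / lambda_d     # a posteriori SNR: γ = |Y|² / λd
    zeta = gamma * xi / (mu + xi)         # ζ = γξ / (μ + ξ)
    return xi, gamma, zeta

# Toy example: |Y|² = 4, λx = λd = 1 gives ξ = 1, γ = 4, ζ = 4/1.2.
xi, gamma, zeta = snr_quantities(np.array([2.0]), np.array([1.0]), np.array([1.0]))
```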
Step 3: estimate the clean-speech power spectral density using a deep neural network.
As can be seen from equation (1), the gain function depends on the computation of the a priori SNR, which in turn requires the clean-speech power spectral density and the noise power spectral density. Step 3 therefore estimates the clean-speech power spectral density, which the present invention does with a deep neural network. The deep neural network architecture used by the present invention is shown in Fig. 1.
The network has two hidden layers, the activation function is the rectified linear unit (ReLU), and the output layer uses the softmax function. The first and second hidden layers each have 512 nodes. The training dataset used is the TIMIT speech database.
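For illustration, a minimal NumPy forward pass matching the stated dimensions (two 512-unit ReLU hidden layers, a softmax output over Q = 61 phoneme classes, 273-dimensional input) might look as follows; the random weights are placeholders, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(a):
    return np.maximum(a, 0.0)

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical randomly initialised weights for a 273 -> 512 -> 512 -> 61 network.
W1, b1 = rng.standard_normal((273, 512)) * 0.01, np.zeros(512)
W2, b2 = rng.standard_normal((512, 512)) * 0.01, np.zeros(512)
W3, b3 = rng.standard_normal((512, 61)) * 0.01, np.zeros(61)

def forward(z):
    h1 = relu(z @ W1 + b1)           # first hidden layer, 512 ReLU units
    h2 = relu(h1 @ W2 + b2)          # second hidden layer, 512 ReLU units
    return softmax(h2 @ W3 + b3)     # P(q | Z_l) over the Q = 61 phoneme classes

probs = forward(rng.standard_normal((4, 273)))   # batch of 4 spliced frames
```

Training with the cross-entropy loss and the Adam optimizer, as the description states, would be layered on top of such a forward pass.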
To train the neural network, the speech data must first be preprocessed: we mix clean speech with noise signals of many types at signal-to-noise ratios of 0, 5, 10 and 15 dB to obtain noisy speech signals. The input features of the neural network are 13-dimensional Mel-frequency cepstral coefficients (MFCC) together with their first- and second-order difference coefficients. Each noisy speech signal is therefore divided into frames and a 39-dimensional feature vector is extracted per frame, comprising the 13-dimensional MFCC and their first- and second-order differences. Furthermore, to exploit inter-frame context, we use the current frame together with the three frames on each side, 7 frames in total, so the input feature dimension of the input layer is 273 (39 × 7); it is expressed as
Zl = [z1,l, z2,l, …, zV,l]   (3)
where V = 273.
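The 7-frame context splicing described above (39-dimensional features, current frame plus three frames on each side, giving 273 dimensions) can be sketched as follows; `splice_context` is a hypothetical helper, and edge-padding at utterance boundaries is an assumption, since the patent does not specify boundary handling:

```python
import numpy as np

def splice_context(feats, context=3):
    """Stack each frame with `context` frames on either side (edge-padded).

    feats: (n_frames, 39) per-frame features (13 MFCC + deltas + delta-deltas).
    Returns an (n_frames, 39 * (2*context + 1)) array of spliced features.
    """
    n = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

frames = np.arange(10 * 39, dtype=float).reshape(10, 39)   # toy 39-dim features
spliced = splice_context(frames)     # 7-frame window -> 273-dimensional input
```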
The cost function of the neural network is the cross-entropy loss. The training target is to identify which phoneme the current frame belongs to, so the output layer uses softmax to output the probability that the current frame belongs to each phoneme, expressed as P(q | Zl) and encoded with one-hot target vectors. All phonemes of the TIMIT dataset, including the silence and non-speech states, are divided into Q = 61 classes, q ∈ {1, 2, 3, …, Q}.
Finally, the clean-speech power spectral density is estimated as the average of the per-phoneme power spectra weighted by the phoneme probabilities. To train the neural network, we select the Adam optimization algorithm for the gradient computation.
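A sketch of the posterior-weighted averaging step, under the assumption that per-phoneme power spectra have been collected beforehand (the patent does not specify how; the data below are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, K = 61, 257            # phoneme classes, frequency bins (512-point FFT assumed)

# Hypothetical per-phoneme clean-speech power spectra, e.g. averaged over
# the training data for each phoneme class.
phoneme_psd = rng.uniform(0.1, 1.0, size=(Q, K))

# Phoneme posteriors P(q | Z_l) for one frame, as produced by the softmax layer.
posteriors = rng.dirichlet(np.ones(Q))

# Posterior-weighted average over phoneme classes gives the PSD estimate.
lambda_x = posteriors @ phoneme_psd   # (K,) clean-speech PSD estimate
```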
Step 4: step 3 provides the estimate of the clean-speech power spectral density. As equation (1) shows, computing the a priori SNR requires both the clean-speech power spectral density and the noise power spectral density. In step 4 we therefore track the noise power spectral density with a traditional statistical model based on the minimum mean-squared error (MMSE) criterion: the noise power spectral density is obtained by recursively averaging the MMSE estimate of the current noise spectrum.
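The recursive averaging of step 4 can be sketched as a first-order smoother; the helper name and the smoothing constant `alpha` are assumptions not taken from the patent, and the MMSE periodogram estimate itself is taken as given:

```python
import numpy as np

def update_noise_psd(lambda_d_prev, noise_periodogram_mmse, alpha=0.8):
    """One recursive-averaging step of the noise PSD tracker.

    `noise_periodogram_mmse` stands for the MMSE estimate of the current
    noise periodogram E[|D(l,k)|^2 | Y(l,k)]; how that estimate is formed
    (e.g. via a speech-presence probability) is outside this sketch.
    """
    return alpha * lambda_d_prev + (1.0 - alpha) * noise_periodogram_mmse

lambda_d = np.ones(257)              # previous noise PSD estimate
current = np.full(257, 0.5)          # stand-in MMSE periodogram estimate
lambda_d = update_noise_psd(lambda_d, current)
```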
Step 5: steps 3 and 4 provide the estimates of the clean-speech and noise power spectral densities. From these estimates we compute the key variable of the gain function, the a priori SNR. Substituting it into the gain function gives the gain value; multiplying the gain value with the noisy speech spectrum yields the estimate of the clean-speech amplitude spectrum. The clean speech signal is then recovered with the overlap-add technique, the standard technique for reconstructing a time-domain signal from the frequency domain.
The invention also discloses a speech separation system based on a super-Gaussian prior speech model and deep learning, comprising: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the method of the present invention when invoked by the processor.
The invention also discloses a computer-readable storage medium storing a computer program, the computer program being configured to implement the steps of the method of the present invention when invoked by a processor.
The beneficial effects of the present invention are as follows:
1. The invention adopts a super-Gaussian distribution model that better matches the statistical properties of speech Fourier coefficients, making the estimated gain function more accurate.
2. Speech signal processing is empowered by deep learning: the powerful modelling capacity of deep learning is used to learn the mapping between noisy speech and the clean speech signal, which effectively suppresses highly non-stationary noise signals.
3. By combining a traditional statistical model with deep learning, the invention not only effectively suppresses non-stationary noise signals, but also alleviates deep learning's heavy dependence on training data and its weak generalization ability. This combination makes the enhancement performance of the method highly robust across a wide range of noise environments and SNR conditions.
The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, but the specific implementation of the invention is not limited to these descriptions. For a person of ordinary skill in the art to which the present invention belongs, a number of simple deductions or substitutions may be made without departing from the inventive concept, and all such variants shall be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A speech separation method based on a super-Gaussian prior speech model and deep learning, characterized by comprising the following steps:
Step 1: receiving a noisy speech signal;
Step 2: modelling the Fourier transform coefficients of the clean speech signal and of the noise signal with a super-Gaussian statistical model and a Gaussian statistical model, respectively; based on these statistical assumptions, estimating the amplitude spectrum of the clean speech signal under the minimum mean-square error criterion, obtaining the amplitude-spectrum estimate;
Step 3: estimating the clean-speech power spectral density using a deep neural network;
Step 4: tracking the noise power spectral density with a statistical model based on the minimum mean-square error criterion, the noise power spectral density being obtained by recursively averaging the minimum mean-square error estimate of the current noise spectrum;
Step 5: using the clean-speech power spectral density estimate obtained in step 3 and the noise power spectral density estimate obtained in step 4 to compute the a priori signal-to-noise ratio in the gain function; substituting the a priori signal-to-noise ratio into the gain function to obtain the gain value; multiplying the gain value with the noisy speech spectrum to obtain the estimate of the clean-speech amplitude spectrum; and recovering the clean speech signal using the overlap-add technique.
2. The speech separation method according to claim 1, characterized in that in step 2 the amplitude-spectrum estimate is given by equation (1), wherein ξ = λx/λd denotes the a priori signal-to-noise ratio, λx = E[|X(l, k)|²] and λd = E[|D(l, k)|²] being the clean-speech power spectral density and the noise power spectral density, respectively; furthermore ζ = γξ/(μ + ξ), where γ = |Y(l, k)|²/λd(l, k) is the a posteriori signal-to-noise ratio, and M(·, ·; ·) denotes the confluent hypergeometric function.
3. The speech separation method according to claim 2, characterized in that in step 2, the parameter values selected for the super-Gaussian speech signal model are μ = 0.2 and β = 0.001.
4. The speech separation method according to claim 1, characterized in that in step 3, the deep neural network architecture has two hidden layers, the activation function is the rectified linear unit, and the output layer uses the softmax function.
5. The speech separation method according to claim 4, characterized in that the first and second hidden layers each have 512 nodes, and the training dataset used is the TIMIT speech database.
6. The speech separation method according to claim 1, characterized in that in step 3, in order to train the deep neural network, the speech data is first preprocessed: clean speech is mixed with noise signals of multiple types at signal-to-noise ratios of 0, 5, 10 and 15 dB to obtain noisy speech signals; the input features of the deep neural network are 13-dimensional Mel-frequency cepstral coefficients together with their first- and second-order difference coefficients.
7. The speech separation method according to claim 6, characterized in that in step 3, each noisy speech signal is divided into frames and a 39-dimensional feature vector is extracted per frame, comprising the 13-dimensional Mel-frequency cepstral coefficients and their first- and second-order difference coefficients; furthermore, to exploit inter-frame context, the current frame is used together with the three frames on each side, 7 frames in total, so the input feature dimension of the input layer is 273.
8. The speech separation method according to claim 6, characterized in that in step 3, the cost function used by the deep neural network is the cross-entropy loss, the output layer uses softmax to output the probability that the current frame belongs to each phoneme, and the clean-speech power spectral density is estimated as the average of the per-phoneme power spectra weighted by these phoneme probabilities.
9. A speech separation system based on a super-Gaussian prior speech model and deep learning, characterized by comprising: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the method of any one of claims 1-8 when invoked by the processor.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program being configured to implement the steps of the method of any one of claims 1-8 when invoked by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910167788.8A CN109767781A (en) | 2019-03-06 | 2019-03-06 | Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning |
PCT/CN2019/117076 WO2020177372A1 (en) | 2019-03-06 | 2019-11-11 | Voice separation method and system based on super-gaussian prior voice module and deep learning, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910167788.8A CN109767781A (en) | 2019-03-06 | 2019-03-06 | Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109767781A true CN109767781A (en) | 2019-05-17 |
Family
ID=66457658
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910167788.8A Pending CN109767781A (en) | 2019-03-06 | 2019-03-06 | Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109767781A (en) |
WO (1) | WO2020177372A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144347A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Data processing method, device, platform and storage medium |
WO2020177372A1 (en) * | 2019-03-06 | 2020-09-10 | 哈尔滨工业大学(深圳) | Voice separation method and system based on super-gaussian prior voice module and deep learning, and storage medium |
CN112289337A (en) * | 2020-11-03 | 2021-01-29 | 北京声加科技有限公司 | Method and device for filtering residual noise after machine learning voice enhancement |
CN112653979A (en) * | 2020-12-29 | 2021-04-13 | 苏州思必驰信息科技有限公司 | Adaptive dereverberation method and device |
WO2021208287A1 (en) * | 2020-04-14 | 2021-10-21 | 深圳壹账通智能科技有限公司 | Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium |
WO2022161277A1 (en) * | 2021-01-29 | 2022-08-04 | 北京沃东天骏信息技术有限公司 | Speech enhancement method, model training method, and related device |
CN116580723A (en) * | 2023-07-13 | 2023-08-11 | 合肥星本本网络科技有限公司 | Voice detection method and system in strong noise environment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001003125B1 (en) * | 1999-07-02 | 2001-02-08 | Conexant Systems Inc | Bi-directional pitch enhancement in speech coding systems |
CN105632512A (en) * | 2016-01-14 | 2016-06-01 | 华南理工大学 | Dual-sensor voice enhancement method based on statistics model and device |
CN107610712A (en) * | 2017-10-18 | 2018-01-19 | 会听声学科技(北京)有限公司 | The improved MMSE of combination and spectrum-subtraction a kind of sound enhancement method |
CN108074582A (en) * | 2016-11-10 | 2018-05-25 | 电信科学技术研究院 | A kind of noise suppressed signal-noise ratio estimation method and user terminal |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2341299A (en) * | 1998-09-04 | 2000-03-08 | Motorola Ltd | Suppressing noise in a speech communications unit |
CN101685638B (en) * | 2008-09-25 | 2011-12-21 | 华为技术有限公司 | Method and device for enhancing voice signals |
CN104103278A (en) * | 2013-04-02 | 2014-10-15 | 北京千橡网景科技发展有限公司 | Real time voice denoising method and device |
CN103903631B (en) * | 2014-03-28 | 2017-10-03 | 哈尔滨工程大学 | Voice signal blind separating method based on Variable Step Size Natural Gradient Algorithm |
US9564144B2 (en) * | 2014-07-24 | 2017-02-07 | Conexant Systems, Inc. | System and method for multichannel on-line unsupervised bayesian spectral filtering of real-world acoustic noise |
CN107731242B (en) * | 2017-09-26 | 2020-09-04 | 桂林电子科技大学 | Gain function speech enhancement method for generalized maximum posterior spectral amplitude estimation |
CN109767781A (en) * | 2019-03-06 | 2019-05-17 | 哈尔滨工业大学(深圳) | Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001003125B1 (en) * | 1999-07-02 | 2001-02-08 | Conexant Systems Inc | Bi-directional pitch enhancement in speech coding systems |
CN105632512A (en) * | 2016-01-14 | 2016-06-01 | 华南理工大学 | Dual-sensor voice enhancement method based on statistics model and device |
CN108074582A (en) * | 2016-11-10 | 2018-05-25 | 电信科学技术研究院 | A kind of noise suppressed signal-noise ratio estimation method and user terminal |
CN107610712A (en) * | 2017-10-18 | 2018-01-19 | 会听声学科技(北京)有限公司 | The improved MMSE of combination and spectrum-subtraction a kind of sound enhancement method |
Non-Patent Citations (2)
Title |
---|
ROBERT REHR ET AL: "On the Importance of Super-Gaussian Speech Priors", IEEE *
TIMO GERKMANN ET AL: "Noise Power Estimation Based on the Probability of Speech Presence", IEEE *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020177372A1 (en) * | 2019-03-06 | 2020-09-10 | 哈尔滨工业大学(深圳) | Voice separation method and system based on super-gaussian prior voice module and deep learning, and storage medium |
CN111144347A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Data processing method, device, platform and storage medium |
WO2021208287A1 (en) * | 2020-04-14 | 2021-10-21 | 深圳壹账通智能科技有限公司 | Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium |
CN112289337A (en) * | 2020-11-03 | 2021-01-29 | 北京声加科技有限公司 | Method and device for filtering residual noise after machine learning voice enhancement |
CN112289337B (en) * | 2020-11-03 | 2023-09-01 | 北京声加科技有限公司 | Method and device for filtering residual noise after machine learning voice enhancement |
CN112653979A (en) * | 2020-12-29 | 2021-04-13 | 苏州思必驰信息科技有限公司 | Adaptive dereverberation method and device |
WO2022161277A1 (en) * | 2021-01-29 | 2022-08-04 | 北京沃东天骏信息技术有限公司 | Speech enhancement method, model training method, and related device |
CN116580723A (en) * | 2023-07-13 | 2023-08-11 | 合肥星本本网络科技有限公司 | Voice detection method and system in strong noise environment |
CN116580723B (en) * | 2023-07-13 | 2023-09-08 | 合肥星本本网络科技有限公司 | Voice detection method and system in strong noise environment |
Also Published As
Publication number | Publication date |
---|---|
WO2020177372A1 (en) | 2020-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109767781A (en) | Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning | |
CN106486131B (en) | A kind of method and device of speech de-noising | |
CN109841206A (en) | A kind of echo cancel method based on deep learning | |
CN102324232A (en) | Method for recognizing sound-groove and system based on gauss hybrid models | |
CN109949821B (en) | Method for removing reverberation of far-field voice by using U-NET structure of CNN | |
CN102693724A (en) | Noise classification method of Gaussian Mixture Model based on neural network | |
CN109887489A (en) | Speech dereverberation method based on the depth characteristic for generating confrontation network | |
CN109346084A (en) | Method for distinguishing speek person based on depth storehouse autoencoder network | |
Astudillo et al. | An uncertainty propagation approach to robust ASR using the ETSI advanced front-end | |
CN112017682A (en) | Single-channel voice simultaneous noise reduction and reverberation removal system | |
CN106373559A (en) | Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting | |
CN103021405A (en) | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter | |
CN108257606A (en) | A kind of robust speech personal identification method based on the combination of self-adaptive parallel model | |
González et al. | MMSE-based missing-feature reconstruction with temporal modeling for robust speech recognition | |
Roy et al. | DeepLPC-MHANet: Multi-head self-attention for augmented Kalman filter-based speech enhancement | |
Jensen et al. | Minimum mean-square error estimation of mel-frequency cepstral features–a theoretically consistent approach | |
Deligne et al. | Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization) | |
Wang et al. | Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm | |
Nathwani et al. | An extended experimental investigation of DNN uncertainty propagation for noise robust ASR | |
Xu et al. | Vector taylor series based joint uncertainty decoding. | |
Chen | Noise reduction of bird calls based on a combination of spectral subtraction, Wiener filtering, and Kalman filtering | |
Fingscheidt et al. | Data-driven speech enhancement | |
KR20170087211A (en) | Feature compensation system and method for recognizing voice | |
Chang et al. | Multiple statistical models for soft decision in noisy speech enhancement | |
Li et al. | Enhanced speech based jointly statistical probability distribution function for voice activity detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190517 |
RJ01 | Rejection of invention patent application after publication |