CN109448751B - Binaural speech enhancement method based on deep learning - Google Patents

Info

Publication number
CN109448751B
Authority
CN
China
Prior art keywords
channel
complex
frequency domain
speech
target
Prior art date
Legal status
Active
Application number
CN201811646317.7A
Other languages
Chinese (zh)
Other versions
CN109448751A (en)
Inventor
李军锋
孙兴伟
夏日升
颜永红
Current Assignee
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN201811646317.7A priority Critical patent/CN109448751B/en
Publication of CN109448751A publication Critical patent/CN109448751A/en
Application granted granted Critical
Publication of CN109448751B publication Critical patent/CN109448751B/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Abstract

The invention discloses a binaural speech enhancement method based on deep learning, which comprises the following steps: the left-channel and right-channel noisy speech signals, each containing a target speech signal to be enhanced, are processed to obtain left- and right-channel frequency-domain signals, whose magnitudes are combined into a single-channel complex feature. The frequency-domain signal of each channel and the theoretical value of the corresponding target frequency-domain signal are used to compute the ideal complex mask of the target speech for that channel, and the two masks are combined into a target speech single-channel complex masking theoretical value. A complex feedforward neural network is trained on the single-channel complex features and these theoretical mask values to obtain a binaural speech enhancement model. The target speech single-channel complex masking estimate output by the model is applied to the left- and right-channel noisy speech signals to obtain left- and right-channel frequency-domain signal estimates, from which the corresponding target speech time-domain signals are finally obtained. The method suppresses noise interference while preserving the spatial information of the target sound source, and fully exploits the generalization capability of deep neural networks to achieve binaural speech enhancement.

Description

Binaural speech enhancement method based on deep learning
Technical Field
The invention relates to the technical field of speech enhancement, in particular to a binaural speech enhancement method based on deep learning.
Background
At present, speech enhancement technology mainly removes background noise and directional interference from speech signals to improve speech quality and intelligibility, and thereby achieves better performance in speech recognition and human listening. In enhancement techniques with single-channel output, background noise can be suppressed by exploiting the different characteristics of speech and noise in the time-frequency domain of a single-channel input, while directional noise can be removed more effectively by exploiting the spatial information of the target speech and the interfering signals in a multi-channel input. In binaural hearing, the human auditory system improves speech comprehension by using the difference in spatial information between the target and the interfering signals in two-channel speech, and can localize the target sound source from its spatial information. Most traditional speech enhancement methods with two-channel output, however, consider only interference removal, apply no special processing to the spatial information of the target speech, and suppress non-stationary noise poorly.
Disclosure of Invention
The invention aims to remedy the above defects in the prior art.
In order to achieve the aim, the invention discloses a binaural speech enhancement method based on deep learning, which comprises the following steps:
respectively performing framing, windowing and Fourier transformation on the noisy speech signal of the left channel and the noisy speech signal of the right channel to obtain a noisy speech frequency domain signal of the left channel and a noisy speech frequency domain signal of the right channel; the left channel noisy speech signal comprises a left channel target speech signal to be enhanced, and the right channel noisy speech signal comprises a right channel target speech signal to be enhanced;
combining the amplitudes of the left channel voice frequency domain signal with noise and the right channel voice frequency domain signal with noise to obtain single-channel complex characteristics;
calculating by using the left channel noisy speech frequency domain signal and the left channel target speech frequency domain signal theoretical value to obtain a left channel target speech ideal complex mask; calculating by using the theoretical values of the right channel noisy speech frequency domain signal and the right channel target speech frequency domain signal to obtain an ideal complex masking of the right channel target speech;
combining the left channel target voice ideal complex masking and the right channel target voice ideal complex masking to form a target voice single-channel complex masking theoretical value;
training a complex feedforward neural network by using a single-channel complex feature and a target voice single-channel complex masking theoretical value to obtain a binaural voice enhancement model;
the single-channel complex feature is used as the input of a binaural voice enhancement model, a target voice single-channel complex masking estimated value is output, and a left-channel noisy voice frequency domain signal and a right-channel noisy voice frequency domain signal are respectively enhanced by the target voice single-channel complex masking estimated value to obtain a left-channel target voice frequency domain signal estimated value and a right-channel target voice frequency domain signal estimated value;
and respectively carrying out inverse Fourier transform on the estimated value of the left channel target voice frequency domain signal and the estimated value of the right channel target voice frequency domain signal to obtain a left channel target voice time domain signal and a right channel target voice time domain signal.
Preferably, the framing, windowing and Fourier transforming of the left-channel and right-channel noisy speech signals are performed as follows:
the left-channel and right-channel noisy speech signals are each framed, with 1024 sampling points per frame; a frame shorter than 1024 points is first zero-padded to 1024 points. Each frame is then windowed using a Hamming window, and finally each frame is Fourier transformed.
Preferably, the single-channel complex feature is X_C = |X_L| + j|X_R|, where j is the imaginary unit, |X_L| is the magnitude of the left-channel noisy speech frequency-domain signal, and |X_R| is the magnitude of the right-channel noisy speech frequency-domain signal.
Preferably, the left channel target speech ideal complex masking is:
M_L = (X_{L,r} S_{L,r} + X_{L,i} S_{L,i}) / (X_{L,r}^2 + X_{L,i}^2) + j (X_{L,r} S_{L,i} - X_{L,i} S_{L,r}) / (X_{L,r}^2 + X_{L,i}^2)
where j is the imaginary unit, X_L is the complex left-channel noisy speech frequency-domain signal, S_L is the complex theoretical value of the left-channel target speech frequency-domain signal, and the subscripts r and i denote the real and imaginary parts;
preferably, the ideal complex masking of the right channel target speech is:
M_R = (X_{R,r} S_{R,r} + X_{R,i} S_{R,i}) / (X_{R,r}^2 + X_{R,i}^2) + j (X_{R,r} S_{R,i} - X_{R,i} S_{R,r}) / (X_{R,r}^2 + X_{R,i}^2)
where j is the imaginary unit, X_R is the complex right-channel noisy speech frequency-domain signal, S_R is the complex theoretical value of the right-channel target speech frequency-domain signal, and the subscripts r and i denote the real and imaginary parts.
Preferably, the target speech single-channel complex masking theoretical value is M_C = M_L + jM_R, where j is the imaginary unit, M_L is the left-channel target speech ideal complex mask, and M_R is the right-channel target speech ideal complex mask.
Preferably, the complex feedforward neural network is trained using the single-channel complex feature and the target speech single-channel complex masking theoretical value to obtain the binaural speech enhancement model as follows:
the complex feedforward neural network is a fully-connected neural network with 4 layers, and each layer in the network has 1024 hidden-layer complex nodes. The activation function of each neuron uses a linear modification unit and acts on the real part and imaginary part of the complex number node, respectively, with the expression f (x) max (0, x).
The single-channel complex feature is expanded with preceding and following frames to obtain a single-channel complex expanded feature, which is used as the input of the complex feedforward neural network; the network outputs a target speech single-channel complex masking estimate. The target speech single-channel complex masking theoretical value serves as the training target, and the mean squared error between the estimate and the theoretical value is iteratively reduced.
Preferably, the single-channel complex masking estimate is M_C' = M_L' + jM_R', where j is the imaginary unit, M_L' is the estimate of the left-channel target speech ideal complex mask, and M_R' is the estimate of the right-channel target speech ideal complex mask.
Preferably, the left-channel target speech frequency-domain signal estimate is X_L' = M_L' * X_L, where M_L' is the estimate of the left-channel target speech ideal complex mask and X_L is the left-channel noisy speech frequency-domain signal;
preferably, the right channel target speech frequency domain signal estimated value X'R=M′R*XRWherein M isR' estimation value, X, of ideal complex masking of target speech of right channelRAnd the right channel is a voice frequency domain signal with noise.
The invention has the following advantages: a single-channel complex mask is constructed from the ideal complex masks of the left and right channels and estimated by a complex feedforward neural network, so that the two channels are processed jointly and the spatial information of the target sound source is preserved while noise interference is suppressed. By including sufficiently many noise types and source directions in the training data, the generalization capability of the deep neural network can be fully exploited, improving the robustness of the model and achieving binaural speech enhancement.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a binaural speech enhancement method based on deep learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of the binaural speech enhancement method based on deep learning. As shown in Fig. 1, the method includes the following steps:
step S101: and respectively performing framing, windowing and Fourier transformation on the voice signal with noise of the left channel and the voice signal with noise of the right channel to obtain a voice frequency domain signal with noise of the left channel and a voice frequency domain signal with noise of the right channel.
The left-channel noisy speech signal comprises a left-channel target speech signal to be enhanced, and the right-channel noisy speech signal comprises a right-channel target speech signal to be enhanced.
In a specific embodiment, the left-channel and right-channel noisy speech signals are each framed, with 1024 sampling points per frame; a frame shorter than 1024 points is zero-padded to 1024 points. Each frame is then windowed with a Hamming window, and finally each frame is Fourier transformed to obtain the left-channel and right-channel noisy speech frequency-domain signals.
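The analysis stage of step S101 can be sketched in a few lines of numpy. This is an illustrative sketch, not the patent's own code: the 1024-point Hamming-windowed frames and the zero-padding of a short final frame follow the embodiment above, while the 512-sample hop (50% overlap) is an assumption, since the text does not state the frame shift.

```python
import numpy as np

def stft_1024(signal, frame_len=1024, hop=512):
    """Frame, window (Hamming) and Fourier transform one channel.

    Frames of 1024 samples follow the embodiment; the final frame is
    zero-padded to 1024 points when the signal runs short.  The 512-sample
    hop is an assumed value (the text does not specify the frame shift).
    """
    n_frames = max(1, int(np.ceil((len(signal) - frame_len) / hop)) + 1)
    padded = np.zeros(hop * (n_frames - 1) + frame_len)
    padded[:len(signal)] = signal          # implicit zero padding at the end
    window = np.hamming(frame_len)
    frames = np.stack([padded[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)     # one complex spectrum per frame

# Each channel is processed independently, e.g. for the left channel:
x_left = np.random.randn(4000)             # placeholder noisy signal
X_L = stft_1024(x_left)                    # shape (7, 513) for 4000 samples
```

The same routine is applied to the right channel to obtain X_R.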
Step S102: and combining the amplitudes of the left channel voice frequency domain signal with noise and the right channel voice frequency domain signal with noise to obtain single-channel complex characteristics.
Specifically, the single-channel complex feature is X_C = |X_L| + j|X_R|, where j is the imaginary unit, |X_L| is the magnitude of the left-channel noisy speech frequency-domain signal, and |X_R| is the magnitude of the right-channel noisy speech frequency-domain signal.
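As a minimal sketch of step S102, the feature construction is a single line: the left-channel magnitude spectrum becomes the real part and the right-channel magnitude spectrum the imaginary part of one complex-valued feature. The toy spectra below are illustrative values only.

```python
import numpy as np

def single_channel_complex_feature(X_L, X_R):
    """X_C = |X_L| + j|X_R|: pack both channel magnitudes into one
    complex-valued feature so a single network input covers both ears."""
    return np.abs(X_L) + 1j * np.abs(X_R)

# Toy spectra for illustration:
X_L = np.array([3 + 4j, 1 + 0j])   # |X_L| = [5, 1]
X_R = np.array([0 + 5j, 0 + 2j])   # |X_R| = [5, 2]
X_C = single_channel_complex_feature(X_L, X_R)   # [5+5j, 1+2j]
```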
Step S103: calculating by using the left channel noisy speech frequency domain signal and the left channel target speech frequency domain signal theoretical value to obtain a left channel target speech ideal complex mask; and calculating to obtain the ideal complex masking of the right channel target voice by using the theoretical value of the right channel noisy voice frequency domain signal and the right channel target voice frequency domain signal.
Specifically, the ideal complex masking of the left channel target speech is:
M_L = (X_{L,r} S_{L,r} + X_{L,i} S_{L,i}) / (X_{L,r}^2 + X_{L,i}^2) + j (X_{L,r} S_{L,i} - X_{L,i} S_{L,r}) / (X_{L,r}^2 + X_{L,i}^2)
where j is the imaginary unit, X_L is the complex left-channel noisy speech frequency-domain signal, S_L is the complex theoretical value of the left-channel target speech frequency-domain signal, and the subscripts r and i denote the real and imaginary parts.
The ideal complex masking of the right channel target speech is:
M_R = (X_{R,r} S_{R,r} + X_{R,i} S_{R,i}) / (X_{R,r}^2 + X_{R,i}^2) + j (X_{R,r} S_{R,i} - X_{R,i} S_{R,r}) / (X_{R,r}^2 + X_{R,i}^2)
where j is the imaginary unit, X_R is the complex right-channel noisy speech frequency-domain signal, S_R is the complex theoretical value of the right-channel target speech frequency-domain signal, and the subscripts r and i denote the real and imaginary parts.
Step S104: and combining the left channel target voice ideal complex masking and the right channel target voice ideal complex masking to form a target voice single-channel complex masking theoretical value.
Specifically, the target speech single-channel complex masking theoretical value is M_C = M_L + jM_R, where j is the imaginary unit, M_L is the left-channel target speech ideal complex mask, and M_R is the right-channel target speech ideal complex mask.
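Steps S103 and S104 can be sketched as follows. The per-channel ideal complex mask is the mask whose complex product with the noisy spectrum recovers the clean spectrum (the complex ratio S/X, written out with real and imaginary parts as described above); the two complex masks are then combined as M_C = M_L + jM_R. The one-bin toy spectra are illustrative values, not data from the patent.

```python
import numpy as np

def ideal_complex_mask(X, S):
    """Per-channel ideal complex mask: the complex ratio S / X, expanded
    into real/imaginary parts so that M * X == S exactly."""
    denom = X.real**2 + X.imag**2
    m_r = (X.real * S.real + X.imag * S.imag) / denom
    m_i = (X.real * S.imag - X.imag * S.real) / denom
    return m_r + 1j * m_i

# Toy one-bin spectra: the left target is half the noisy signal,
# and the right channel is already clean.
X_L, S_L = np.array([2 + 2j]), np.array([1 + 1j])
X_R, S_R = np.array([1 - 1j]), np.array([1 - 1j])

M_L = ideal_complex_mask(X_L, S_L)   # 0.5
M_R = ideal_complex_mask(X_R, S_R)   # 1.0
M_C = M_L + 1j * M_R                 # combined single-channel mask
```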
Step S105: and training the complex feedforward neural network by using the single-channel complex feature and the target voice single-channel complex masking theoretical value to obtain a binaural voice enhancement model.
In one embodiment, the complex feedforward neural network is a 4-layer fully-connected neural network with 1024 complex-valued hidden nodes per layer. The activation function of each neuron is a rectified linear unit (ReLU), f(x) = max(0, x), applied separately to the real part and the imaginary part of each complex node.
The single-channel complex feature is expanded with preceding and following frames to obtain a single-channel complex expanded feature, which is used as the input of the complex feedforward neural network; the network outputs a target speech single-channel complex masking estimate. The target speech single-channel complex masking theoretical value serves as the training target, and the mean squared error between the estimate and the theoretical value is iteratively reduced.
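A forward pass through the described network can be sketched in numpy as below. The four fully connected layers and the per-part ReLU follow the embodiment; everything else is an assumption for illustration: the +/-2-frame context expansion width, the reading of "4 layers of 1024 complex nodes" as three 1024-node hidden layers plus a mask-sized linear output, and the random (untrained) complex weights used purely for shape checking. Training would iteratively minimise the mean squared error between the network output and the theoretical mask M_C.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_relu(z):
    """f(x) = max(0, x) applied separately to real and imaginary parts."""
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

def expand_context(frames, context=2):
    """Concatenate each frame with its +/-`context` neighbours
    (edge-padded); the context width of 2 is an assumed value."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[k:k + len(frames)]
                      for k in range(2 * context + 1)])

def forward(X_C, weights):
    h = expand_context(X_C)
    for W in weights[:-1]:
        h = complex_relu(h @ W)        # hidden layers: per-part ReLU
    return h @ weights[-1]             # linear output: estimated mask M_C'

n_bins = 513                           # bins of a 1024-point real FFT
sizes = [5 * n_bins, 1024, 1024, 1024, n_bins]
weights = [0.01 * (rng.standard_normal((a, b))
                   + 1j * rng.standard_normal((a, b)))
           for a, b in zip(sizes[:-1], sizes[1:])]

X_C = rng.standard_normal((7, n_bins)) + 1j * rng.standard_normal((7, n_bins))
M_C_est = forward(X_C, weights)        # one complex mask per frame and bin
```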
Step S106: and taking the single-channel complex feature as the input of a binaural voice enhancement model, outputting a target voice single-channel complex masking estimated value, and enhancing the left-channel noisy voice frequency domain signal and the right-channel noisy voice frequency domain signal respectively by using the target voice single-channel complex masking estimated value to obtain a left-channel target voice frequency domain signal estimated value and a right-channel target voice frequency domain signal estimated value.
Specifically, the single-channel complex masking estimate is M_C' = M_L' + jM_R', where j is the imaginary unit, M_L' is the estimate of the left-channel target speech ideal complex mask, and M_R' is the estimate of the right-channel target speech ideal complex mask.
The left-channel target speech frequency-domain signal estimate is X_L' = M_L' * X_L, where M_L' is the estimate of the left-channel target speech ideal complex mask and X_L is the left-channel noisy speech frequency-domain signal.
The right-channel target speech frequency-domain signal estimate is X_R' = M_R' * X_R, where M_R' is the estimate of the right-channel target speech ideal complex mask and X_R is the right-channel noisy speech frequency-domain signal.
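Step S106 can be sketched as below. One caveat: combining two complex per-channel masks into a single complex number M_C' is only uniquely invertible if the per-channel masks are real-valued, so this sketch makes the simplifying (assumed) interpretation M_L' = Re(M_C') and M_R' = Im(M_C'); the toy values are illustrative only.

```python
import numpy as np

def apply_masks(M_C_est, X_L, X_R):
    """Split the estimated single-channel mask into per-channel masks and
    enhance each noisy spectrum by element-wise multiplication.
    Treating the split masks as real-valued is a simplifying assumption."""
    M_L_est = M_C_est.real
    M_R_est = M_C_est.imag
    return M_L_est * X_L, M_R_est * X_R

# Toy values: left mask 0.5, right mask 1.0 (packed as 0.5 + 1.0j).
M_C_est = np.array([0.5 + 1.0j])
X_L = np.array([2 + 2j])
X_R = np.array([1 - 1j])
S_L_est, S_R_est = apply_masks(M_C_est, X_L, X_R)   # (1+1j,), (1-1j,)
```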
Step S107: inverse Fourier transform is performed on the left-channel and right-channel target speech frequency-domain signal estimates, respectively, to obtain the left-channel and right-channel target speech time-domain signals.
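The synthesis of step S107 is the inverse of the analysis in step S101: inverse FFT per frame, synthesis windowing and overlap-add. As before, the 1024-point Hamming window matches the embodiment, while the 512-sample hop and the squared-window normalisation of the overlap-add are assumptions of this sketch.

```python
import numpy as np

def istft_1024(spectra, frame_len=1024, hop=512):
    """Inverse FFT each frame, window again, overlap-add, and normalise
    by the accumulated squared window (hop size is an assumed value)."""
    frames = np.fft.irfft(spectra, n=frame_len, axis=1)
    window = np.hamming(frame_len)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    norm = np.zeros_like(out)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + frame_len] += frame * window
        norm[k * hop:k * hop + frame_len] += window**2
    return out / np.maximum(norm, 1e-8)

# Round trip on a test tone: Hamming analysis windowing followed by this
# synthesis reconstructs the signal.
sig = np.sin(np.linspace(0, 20 * np.pi, 2048))
window = np.hamming(1024)
frames = np.stack([sig[k * 512:k * 512 + 1024] * window for k in range(3)])
recon = istft_1024(np.fft.rfft(frames, axis=1))
```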
The invention provides a binaural speech enhancement method based on deep learning that constructs a single-channel complex mask from the ideal complex masks of the left and right channels and estimates it with a complex feedforward neural network, so that the two channels are processed jointly and the spatial information of the target sound source is preserved while noise interference is suppressed. By including sufficiently many noise types and source directions in the training data, the generalization capability of the deep neural network can be fully exploited, improving the robustness of the model and achieving binaural speech enhancement.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A binaural speech enhancement method based on deep learning, characterized in that it comprises the steps of:
respectively performing framing, windowing and Fourier transformation on the noisy speech signal of the left channel and the noisy speech signal of the right channel to obtain a noisy speech frequency domain signal of the left channel and a noisy speech frequency domain signal of the right channel; the left channel noisy speech signal comprises a left channel target speech signal to be enhanced, and the right channel noisy speech signal comprises a right channel target speech signal to be enhanced;
combining the amplitudes of the left channel voice frequency domain signal with noise and the right channel voice frequency domain signal with noise to obtain single-channel complex characteristics;
calculating by using the left channel noisy speech frequency domain signal and a left channel target speech frequency domain signal theoretical value to obtain a left channel target speech ideal complex mask; calculating by using the right channel noisy speech frequency domain signal and a right channel target speech frequency domain signal theoretical value to obtain a right channel target speech ideal complex mask;
combining the left channel target voice ideal complex masking and the right channel target voice ideal complex masking to form a target voice single-channel complex masking theoretical value;
training a complex feedforward neural network by using the single-channel complex feature and the target voice single-channel complex masking theoretical value to obtain a binaural voice enhancement model;
taking the single-channel complex feature as the input of the binaural voice enhancement model, outputting a target voice single-channel complex masking estimated value, and respectively enhancing a left-channel noisy voice frequency domain signal and a right-channel noisy voice frequency domain signal by using the target voice single-channel complex masking estimated value to obtain a left-channel target voice frequency domain signal estimated value and a right-channel target voice frequency domain signal estimated value;
and respectively carrying out inverse Fourier transform on the estimated value of the left channel target voice frequency domain signal and the estimated value of the right channel target voice frequency domain signal to obtain a left channel target voice time domain signal and a right channel target voice time domain signal.
2. The method according to claim 1, wherein the steps of framing, windowing and Fourier transforming the left and right channel noisy speech signals are performed as follows:
the left-channel and right-channel noisy speech signals are each framed, with 1024 sampling points per frame, a frame shorter than 1024 points being first zero-padded to 1024 points; each frame is then windowed using a Hamming window; and finally each frame is Fourier transformed.
3. The method of claim 1, wherein the single-channel complex feature is:
X_C = |X_L| + j|X_R|
where j is the imaginary unit, |X_L| is the magnitude of the left-channel noisy speech frequency-domain signal, and |X_R| is the magnitude of the right-channel noisy speech frequency-domain signal.
4. The method of claim 1,
the ideal complex masking of the left channel target speech is:
M_L = (X_{L,r} S_{L,r} + X_{L,i} S_{L,i}) / (X_{L,r}^2 + X_{L,i}^2) + j (X_{L,r} S_{L,i} - X_{L,i} S_{L,r}) / (X_{L,r}^2 + X_{L,i}^2)
where j is the imaginary unit, X_L is the complex left-channel noisy speech frequency-domain signal, S_L is the complex theoretical value of the left-channel target speech frequency-domain signal, and the subscripts r and i denote the real and imaginary parts;
the ideal complex masking of the right channel target speech is:
M_R = (X_{R,r} S_{R,r} + X_{R,i} S_{R,i}) / (X_{R,r}^2 + X_{R,i}^2) + j (X_{R,r} S_{R,i} - X_{R,i} S_{R,r}) / (X_{R,r}^2 + X_{R,i}^2)
where j is the imaginary unit, X_R is the complex right-channel noisy speech frequency-domain signal, S_R is the complex theoretical value of the right-channel target speech frequency-domain signal, and the subscripts r and i denote the real and imaginary parts.
5. The method according to claim 1 or claim 4, wherein the target speech single-channel complex masking theoretical value is:
M_C = M_L + jM_R
where j is the imaginary unit, M_L is the left-channel target speech ideal complex mask, and M_R is the right-channel target speech ideal complex mask.
6. The method according to claim 1, wherein the complex feedforward neural network is trained using the single-channel complex feature and the target speech single-channel complex masking theoretical value to obtain the binaural speech enhancement model as follows:
the complex feedforward neural network is a fully-connected neural network with 4 layers, and each layer in the network is provided with 1024 hidden layer complex nodes; the activation function of each neuron uses a linear modification unit and acts on the real part and the imaginary part of a complex number node respectively, and the expression is f (x) max (0, x);
and performing front-back frame expansion on the single-channel complex feature to obtain a single-channel complex expansion feature, using the single-channel complex expansion feature as the input of the complex feedforward neural network, outputting a target voice single-channel complex masking estimation value, using a target voice single-channel complex masking theoretical value as a training target of the complex feedforward neural network, and continuously reducing the mean square error between the target voice single-channel complex masking estimation value and the target voice single-channel complex masking theoretical value through iteration.
7. The method of claim 1, wherein the single-channel complex masking estimate is:
M_C' = M_L' + jM_R'
where j is the imaginary unit, M_L' is the estimate of the left-channel target speech ideal complex mask, and M_R' is the estimate of the right-channel target speech ideal complex mask.
8. The method according to claim 1 or 7,
the left channel target voice frequency domain signal estimation value is as follows:
X_L' = M_L' * X_L
where M_L' is the estimate of the left-channel target speech ideal complex mask and X_L is the left-channel noisy speech frequency-domain signal;
the right channel target voice frequency domain signal estimation value is as follows:
X_R' = M_R' * X_R
where M_R' is the estimate of the right-channel target speech ideal complex mask and X_R is the right-channel noisy speech frequency-domain signal.
CN201811646317.7A 2018-12-29 2018-12-29 Binaural speech enhancement method based on deep learning Active CN109448751B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811646317.7A CN109448751B (en) 2018-12-29 2018-12-29 Binaural speech enhancement method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811646317.7A CN109448751B (en) 2018-12-29 2018-12-29 Binaural speech enhancement method based on deep learning

Publications (2)

Publication Number Publication Date
CN109448751A CN109448751A (en) 2019-03-08
CN109448751B true CN109448751B (en) 2021-03-23

Family

ID=65540255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811646317.7A Active CN109448751B (en) 2018-12-29 2018-12-29 Binaural speech enhancement method based on deep learning

Country Status (1)

Country Link
CN (1) CN109448751B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110739002B (en) * 2019-10-16 2022-02-22 中山大学 Complex domain speech enhancement method, system and medium based on generation countermeasure network
CN111239686B (en) * 2020-02-18 2021-12-21 中国科学院声学研究所 Dual-channel sound source positioning method based on deep learning
CN111681646A (en) * 2020-07-17 2020-09-18 成都三零凯天通信实业有限公司 Universal scene Chinese Putonghua speech recognition method of end-to-end architecture
CN113129918B (en) * 2021-04-15 2022-05-03 浙江大学 Voice dereverberation method combining beam forming and deep complex U-Net network
CN113921027B (en) * 2021-12-14 2022-04-29 北京清微智能信息技术有限公司 Speech enhancement method and device based on spatial features and electronic equipment
CN114566189B (en) * 2022-04-28 2022-10-04 之江实验室 Speech emotion recognition method and system based on three-dimensional depth feature fusion

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102157156B (en) * 2011-03-21 2012-10-10 清华大学 Single-channel voice enhancement method and system
US9881631B2 (en) * 2014-10-21 2018-01-30 Mitsubishi Electric Research Laboratories, Inc. Method for enhancing audio signal using phase information
CN107845389B (en) * 2017-12-21 2020-07-17 北京工业大学 Speech enhancement method based on multi-resolution auditory cepstrum coefficient and deep convolutional neural network
CN108564963B (en) * 2018-04-23 2019-10-18 百度在线网络技术(北京)有限公司 Method and apparatus for enhancing voice

Also Published As

Publication number Publication date
CN109448751A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109448751B (en) Binaural speech enhancement method based on deep learning
CN109584903B (en) Multi-user voice separation method based on deep learning
US8958572B1 (en) Adaptive noise cancellation for multi-microphone systems
US9681246B2 (en) Bionic hearing headset
CN111081267B (en) Multi-channel far-field speech enhancement method
KR20180069879A (en) Globally Optimized Least Squares Post Filtering for Voice Enhancement
Mosayyebpour et al. Single-microphone LP residual skewness-based inverse filtering of the room impulse response
CN108986832A (en) Ears speech dereverberation method and device based on voice probability of occurrence and consistency
Wang et al. Mask weighted STFT ratios for relative transfer function estimation and its application to robust ASR
Li et al. Multichannel online dereverberation based on spectral magnitude inverse filtering
CN115424627A (en) Voice enhancement hybrid processing method based on convolution cycle network and WPE algorithm
Aroudi et al. Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding
Shi et al. Robust digit recognition using phase-dependent time-frequency masking
Zohourian et al. GSC-based binaural speaker separation preserving spatial cues
Fischer et al. Robust constrained MFMVDR filters for single-channel speech enhancement based on spherical uncertainty set
CN110858485B (en) Voice enhancement method, device, equipment and storage medium
Nasu et al. Cross-channel spectral subtraction for meeting speech recognition
Ji et al. Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment.
Masuyama et al. Causal distortionless response beamforming by alternating direction method of multipliers
Ayrapetian et al. Asynchronous acoustic echo cancellation over wireless channels
Prasad et al. Two microphone technique to improve the speech intelligibility under noisy environment
Tan et al. Kronecker Product Based Linear Prediction Kalman Filter for Dereverberation and Noise Reduction
Miyabe et al. Barge-in-and noise-free spoken dialogue interface based on sound field control and semi-blind source separation
Giri et al. A novel target speaker dependent postfiltering approach for multichannel speech enhancement
Mosayyebpour et al. Time delay estimation via minimum-phase and all-pass component processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant