CN109841206A - Echo cancellation method based on deep learning - Google Patents

Echo cancellation method based on deep learning

Info

Publication number
CN109841206A
CN109841206A
Authority
CN
China
Prior art keywords
acoustic feature
signal
echo
long short-term memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811013935.8A
Other languages
Chinese (zh)
Other versions
CN109841206B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Elephant Acoustical (Shenzhen) Technology Co., Ltd.
Original Assignee
Elephant Acoustical (Shenzhen) Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elephant Acoustical (Shenzhen) Technology Co., Ltd.
Priority to CN201811013935.8A (granted as CN109841206B)
Publication of CN109841206A
Priority to PCT/CN2019/090528 (WO2020042706A1)
Application granted
Publication of CN109841206B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
      • G10K 11/175: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general, using interference effects; masking sound
      • G06N 3/02, G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
      • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
      • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
      • G10L 21/0208: Noise filtering
      • G10L 21/0272: Voice signal separating
      • G10L 21/028: Voice signal separating using properties of sound source
      • G10L 21/0308: Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • H: ELECTRICITY
      • H04M 9/08: Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes, for one or both directions of traffic

Abstract

The disclosure provides an echo cancellation method based on deep learning, together with a corresponding apparatus, electronic device and storage medium, belonging to the field of computer technology. The method includes: extracting acoustic features from a received microphone signal, the microphone signal including a near-end signal and a far-end signal; iteratively processing the acoustic features in a pre-trained recurrent neural network model with long short-term memory to compute a ratio mask for the acoustic features; masking the acoustic features with the ratio mask; and synthesizing the masked acoustic features with the phase of the microphone signal to obtain the near-end signal after echo cancellation. The echo cancellation method and apparatus based on deep learning can perform echo cancellation in the presence of background noise, double-talk and nonlinear distortion, which greatly improves the effectiveness and the range of applicable scenarios of echo cancellation. No post-filter is required, which simplifies the electronic device and reduces its cost.

Description

Echo cancellation method based on deep learning
Technical field
This disclosure relates to the field of computer application technology, and in particular to an echo cancellation method based on deep learning, as well as a corresponding apparatus, electronic device and storage medium.
Background art
In a communication system, when the loudspeaker and the microphone are acoustically coupled, the microphone picks up the signal emitted by the loudspeaker together with its reverberation, thereby generating an echo. Video conferencing, hands-free telephony and mobile communication, for example, all suffer from echo.
Echo cancellation faces many challenges, such as double-talk, background noise and nonlinear distortion. First, double-talk, in which the speakers at both ends talk at the same time, is a typical conversational mode in communication systems. However, the near-end speech signal severely affects the convergence of adaptive algorithms and may even cause them to diverge. In addition, the signal received at the microphone contains not only the echo and the near-end speech signal but also background noise. Traditionally, echo cancellation adaptively estimates the acoustic impulse response between the loudspeaker and the microphone with a finite impulse response (FIR) filter to cancel the echo, and then suppresses the background noise and the residual echo remaining after echo cancellation with a post-filter.
The ultimate goal of AEC (acoustic echo cancellation) is to completely remove the far-end signal so that only the near-end signal is transmitted. However, traditional echo cancellation methods model the echo path as a linear system; owing to the nonlinearity of components such as the power amplifier and the loudspeaker, the far-end signal may undergo nonlinear distortion in practice, which severely degrades the echo cancellation performance.
Summary of the invention
In order to solve the technical problems in the related art that echo cancellation performs poorly and requires a post-filter, the disclosure provides an echo cancellation method based on deep learning, together with an apparatus, an electronic device and a storage medium.
In a first aspect, an echo cancellation method based on deep learning is provided, comprising:
extracting acoustic features from a received microphone signal, the microphone signal including a near-end signal and a far-end signal;
iteratively processing the acoustic features in a pre-trained recurrent neural network model with long short-term memory to compute a ratio mask for the acoustic features;
masking the acoustic features with the ratio mask;
synthesizing the masked acoustic features with the phase of the microphone signal to obtain the near-end signal after echo cancellation.
Optionally, the step of extracting acoustic features from the received microphone signal includes:
dividing the received microphone signal into time frames according to a preset period, the microphone signal including a near-end signal and a far-end signal;
extracting a spectral magnitude vector from each time frame;
normalizing the spectral magnitude vector to form the acoustic features.
Optionally, the step of normalizing the spectral magnitude vector to form the acoustic features includes:
merging the spectral magnitude vectors of the current time frame and of past time frames and normalizing them to form the acoustic features.
Optionally, the method of constructing the pre-trained recurrent neural network model with long short-term memory includes:
determining the speech used for training as the near-end and far-end (reference) signals;
collecting the far-end signal and the near-end signal from said speech, serving as the far end and the near end respectively, and building a speech training set from them, wherein the far-end signal is the echo signal, and the near-end signal and the echo signal form the microphone signal;
training on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory.
Optionally, the step of training on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory includes:
extracting the acoustic features of the microphone signal and of the far-end (echo) signal respectively;
estimating, from the acoustic features of the microphone signal and of the far-end signal, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
Optionally, the step of training on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory may also include:
performing linear echo cancellation on the microphone signal with a traditional AEC algorithm;
extracting the acoustic features of the far-end signal and of the linear AEC output obtained by performing linear echo cancellation with the traditional AEC algorithm, respectively;
estimating, from the acoustic features of the far-end signal and of the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
Optionally, the method may also include:
extracting the acoustic features of the far-end signal, the microphone signal and the linear AEC output respectively;
estimating, from the acoustic features of the far-end signal, the microphone signal and the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
In a second aspect, an echo cancellation apparatus based on deep learning is provided, comprising:
an acoustic feature extraction module for extracting acoustic features from a received input signal, the input signal including the microphone signal and the far-end signal;
a ratio mask computation module for iteratively processing the acoustic features in the pre-trained recurrent neural network model with long short-term memory to compute the ratio mask for the acoustic features;
a masking module for masking the acoustic features with the ratio mask;
a speech synthesis module for synthesizing the masked acoustic features with the phase of the microphone signal to obtain the near-end signal after echo cancellation.
Optionally, the ideal ratio mask is used as the training target of the recurrent neural network model with long short-term memory.
In a third aspect, an electronic device is provided, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method described in the first aspect.
In a fourth aspect, a computer-readable storage medium is provided for storing a program which, when executed, causes an electronic device to perform the method described in the first aspect.
The technical solutions provided by the embodiments of the disclosure may bring the following beneficial effects:
When echo cancellation is performed, acoustic features are extracted from the received microphone signal, the acoustic features are iteratively processed in the pre-trained recurrent neural network model with long short-term memory to compute the ratio mask for the acoustic features, and the acoustic features are then masked with the ratio mask. The masked acoustic features are synthesized with the phase of the microphone signal to accomplish echo cancellation. Because the scheme employs a pre-trained recurrent neural network model with long short-term memory, echo cancellation can be achieved in the presence of background noise, double-talk and nonlinear distortion, which greatly improves the effectiveness and the range of applicable scenarios of echo cancellation; moreover, no post-filter is required, which simplifies the electronic device and reduces its cost.
It should be understood that the above general description and the following detailed description are merely illustrative and do not limit the scope of the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of an echo cancellation method based on deep learning according to an exemplary embodiment.
Fig. 2 is a flowchart of a specific implementation of step S110 in the echo cancellation method based on deep learning of the embodiment corresponding to Fig. 1.
Fig. 3 is a flowchart of a specific implementation of the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 1.
Fig. 4 is a schematic flow diagram of echo cancellation according to an exemplary embodiment.
Fig. 5 is a flowchart of a specific implementation of step S123 in the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 3.
Fig. 6 is a flowchart of another specific implementation of step S123 in the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 3.
Fig. 7 is a flowchart of yet another specific implementation of step S123 in the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 3.
Fig. 8 shows, according to an exemplary embodiment, spectrograms of the microphone signal captured with a smartphone (a), the far-end (reference) signal (b), the linear echo cancellation output of a traditional AEC algorithm (c), and the output signal of LSTM3 (d).
Fig. 9 is a block diagram of an echo cancellation apparatus based on deep learning according to an exemplary embodiment.
Fig. 10 is a block diagram of the acoustic feature extraction module 110 in the echo cancellation apparatus based on deep learning shown in the embodiment corresponding to Fig. 9.
Fig. 11 is a block diagram of the ratio mask computation module 120 shown in the embodiment corresponding to Fig. 9.
Fig. 12 is a block diagram of the model construction submodule 123 shown in the embodiment corresponding to Fig. 11.
Fig. 13 is another block diagram of the model construction submodule 123 shown in the embodiment corresponding to Fig. 11.
Fig. 14 is yet another block diagram of the model construction submodule 123 shown in the embodiment corresponding to Fig. 11.
Detailed description of the embodiments
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
Fig. 1 is a flowchart of an echo cancellation method based on deep learning according to an exemplary embodiment. The echo cancellation method based on deep learning can be used in electronic devices such as smartphones and computers. As shown in Fig. 1, the echo cancellation method based on deep learning may include steps S110, S120, S130 and S140.
Step S110: extracting acoustic features from the received microphone signal, the microphone signal including the near-end signal and the far-end signal (i.e., the echo signal).
The microphone signal is the speech signal received when echo cancellation is performed. A recording device such as a microphone captures the near-end signal and the echo signal, i.e., the microphone signal includes the near-end signal and the far-end signal (the echo signal).
When the electronic device performs echo cancellation, it may receive the speech signal captured by a recording device such as a microphone, it may receive a speech signal sent by another electronic device, or it may receive a speech signal in some other way; these are not described one by one here.
For example, in a video conference a recording device such as a microphone captures the speech signal; the signal captured by the recording device includes not only the near-end signal in the room where the microphone is located but also the far-end signal transmitted from the far end and played through the loudspeaker.
Optionally, the recording device such as a microphone captures the input signal at a sampling rate of 16 kHz.
Acoustic features are data features that can characterize a speech signal.
When acoustic features are extracted from the received speech signal, the short-time Fourier transform (STFT) may be applied to the speech signal to extract the acoustic features, a wavelet transform may be applied to the speech signal to extract the acoustic features, or the acoustic features may be extracted from the received speech signal in some other form.
Optionally, as shown in Fig. 2, step S110 may include steps S111, S112 and S113.
Step S111: dividing the received microphone signal into time frames according to a preset period.
The preset period is a pre-set time interval according to which the speech signal is divided into multiple time frames.
Optionally, the received microphone signal is divided into time frames according to the preset period, with an overlap of half the preset period between every two adjacent time frames.
In a specific illustrative embodiment, the received speech signal is divided into time frames of 20 milliseconds each, with an overlap of 10 milliseconds between every two adjacent time frames. A 320-point STFT is then applied to each time frame of the input signal, which produces 161 frequency bins.
Step S112: extracting a spectral magnitude vector from each time frame.
Step S113: normalizing the spectral magnitude vector to form the acoustic features.
In an exemplary embodiment, the STFT is applied to each time frame to extract a spectral magnitude vector, and each spectral magnitude vector forms the acoustic features after normalization.
Optionally, several consecutive frames centered on the current time frame are concatenated into a larger vector to form the acoustic features, so as to improve the echo cancellation performance.
For example, when the spectral magnitude vector is normalized, the spectral magnitude vectors of the current time frame and of past time frames are merged and normalized to form the acoustic features. Specifically, the previous 5 frames and the current time frame are spliced into one unified feature vector as the input of the invention. The number of past time frames may also be smaller than 5, which improves the real-time performance of the application.
Therefore, when acoustic features are extracted from the speech signal, the speech signal is divided into time frames according to the preset period; by setting a reasonable period, the acoustic features extracted from each time frame provide the input for the echo cancellation process, and by selectively merging the spectral magnitude vector of the current time frame with those of past time frames, the echo cancellation performance can be improved.
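To make the framing and feature-extraction steps concrete, the following is a minimal NumPy sketch using the parameters given above (16 kHz input, 20 ms frames with 10 ms overlap, a 320-point STFT yielding 161 bins, and splicing with 5 past frames). The Hamming window, the mean-variance normalization and the function names are illustrative assumptions, not details stated in the source.

```python
import numpy as np

def stft_magnitude(x, frame_len=320, hop=160, n_fft=320):
    """Split a 16 kHz signal (len(x) >= frame_len assumed) into 20 ms frames
    with 10 ms overlap and return the 161-bin magnitude of every frame."""
    window = np.hamming(frame_len)                        # window choice is an assumption
    n_frames = (len(x) - frame_len) // hop + 1
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))   # shape: (n_frames, 161)

def acoustic_features(mag, n_past=5, eps=1e-8):
    """Normalize the magnitude spectra and splice each frame with its
    5 previous frames into one feature vector, as described above."""
    norm = (mag - mag.mean(axis=0)) / (mag.std(axis=0) + eps)  # normalization is an assumption
    padded = np.vstack([np.zeros((n_past, mag.shape[1])), norm])
    # column block i holds frame t - (n_past - i); the last block is the current frame
    return np.hstack([padded[i:i + len(mag)] for i in range(n_past + 1)])
```

With these settings each feature vector has 161 × 6 = 966 dimensions; using fewer past frames shrinks the vector accordingly, matching the real-time remark above.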
Step S120: iteratively processing the acoustic features in the pre-trained recurrent neural network model with long short-term memory to compute the ratio mask for the acoustic features.
The ratio mask characterizes the relationship between the input signal and the near-end signal, and represents a trade-off between suppressing the echo and preserving the near-end signal.
Ideally, after the input signal is masked with the ratio mask, the echo in the input signal is cancelled and the near-end signal is recovered.
The recurrent neural network (RNN) with long short-term memory (LSTM) (hereinafter the "recurrent neural network with long short-term memory" is abbreviated as "LSTM") is trained in advance.
The acoustic features obtained in step S110 are fed as input to the LSTM model and iteratively processed in the LSTM model to compute the ratio mask for these acoustic features.
In this step, the ideal ratio mask (IRM) is used as the target of the iterative computation. The IRM of each T-F (time-frequency) unit of the spectrogram can be stated with the following equation:
IRM(t, f) = S_STFT(t, f) / Y_STFT(t, f)
where S_STFT(t, f) and Y_STFT(t, f) are the magnitudes of the near-end signal and of the microphone signal at that time-frequency unit, respectively.
During supervised training the ideal ratio mask is predicted, and the acoustic features are then masked with the ratio mask to obtain the near-end signal after echo cancellation.
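A minimal sketch of how the training target could be computed from a time-aligned near-end / microphone magnitude pair, assuming the ratio-of-magnitudes definition reconstructed above; clipping the mask to [0, 1] to match the sigmoid output range is an additional assumption, not stated in the source.

```python
import numpy as np

def ideal_ratio_mask(near_end_mag, mic_mag, eps=1e-8):
    """IRM(t, f) = S(t, f) / Y(t, f), computed per time-frequency unit.
    The clip to [0, 1] is an assumption made to match the sigmoid output."""
    return np.clip(near_end_mag / (mic_mag + eps), 0.0, 1.0)
```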
Step S130: masking the acoustic features with the ratio mask.
Step S140: synthesizing the masked acoustic features with the phase of the microphone signal to obtain the near-end signal after echo cancellation.
After training is complete, during inference or operation, the trained LSTM model is used directly to suppress the echo and the background noise. Specifically, an input waveform is processed with the trained LSTM model to generate an estimated ratio mask, and the acoustic features are then weighted (masked) with this ratio mask to generate the near-end signal with the echo removed.
In an exemplary embodiment, the masked spectral magnitude vector, together with the phase of the microphone signal, is fed to an inverse Fourier transform to output the corresponding time-domain near-end signal.
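The masking and resynthesis step can be sketched as follows: the estimated mask weights the microphone magnitude spectrum, the microphone phase is reused, and overlap-add of the inverse FFT frames returns the time-domain near-end estimate. The overlap-add gain correction mirrors the Hamming window assumed in the earlier sketch and is illustrative, not taken from the source.

```python
import numpy as np

def resynthesize(mask, mic_stft, frame_len=320, hop=160):
    """Apply the estimated ratio mask to the microphone magnitude, keep the
    microphone phase, and overlap-add the inverse FFT frames (20 ms / 10 ms)."""
    masked = mask * np.abs(mic_stft) * np.exp(1j * np.angle(mic_stft))
    frames = np.fft.irfft(masked, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out / 1.08  # approximate constant-overlap-add gain of a Hamming window at 50% overlap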
With the method described above, when echo cancellation is performed, acoustic features are extracted from the received input signal, the acoustic features are iteratively processed in the pre-trained recurrent neural network model with long short-term memory to compute the ratio mask for the acoustic features, and the acoustic features are masked with the ratio mask. The masked acoustic features are then synthesized with the phase of the microphone signal to accomplish echo cancellation. Because a pre-trained recurrent neural network model with long short-term memory is used, echo cancellation can be achieved in the presence of background noise, double-talk and nonlinear distortion, which greatly improves the effectiveness and the range of applicable scenarios of echo cancellation; moreover, no post-filter is required, which simplifies the electronic device and reduces its cost.
Fig. 3 is a flowchart of a specific implementation of the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 1. As shown in Fig. 3, the method for constructing the recurrent neural network model with long short-term memory may include steps S121, S122 and S123.
Step S121: determining the speech used for training as the near-end and far-end (reference) signals.
There are many ways to choose the training speech: specific speech may be chosen in a predetermined manner, or the training speech may be chosen at random.
In order to achieve echo cancellation that is not restricted to the training speech, training is performed with a variety of male and female voices.
In an exemplary embodiment, a preset number of utterances is randomly selected from the TIMIT data set (The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, an acoustic-phonetic continuous speech corpus built jointly by Texas Instruments, the Massachusetts Institute of Technology and SRI International).
The TIMIT data set is sampled at 16 kHz and contains 6300 sentences in total: 630 speakers from eight major dialect regions of the United States each read 10 given sentences, and all sentences have been manually segmented and labelled at the phone level. 70% of the speakers are male, and most speakers are adult white males.
Step S122: collecting the speech as the near-end and far-end reference signals and building the speech training set from them.
The echo signal is obtained from the far-end signal either by actual recording through a microphone or by artificial synthesis. The speech training set consists of the near-end signal, the far-end reference signal and the microphone signal, where the microphone signal is the near-end signal mixed with the echo signal.
Optionally, 100 pairs of speakers are randomly selected from the 630 speakers of the TIMIT data set as near-end and far-end speech (40 male-female pairs, 30 male-male pairs and 30 female-female pairs). The 10 utterances of each speaker are recorded at a 16 kHz sampling rate. Seven utterances of each pair are used to generate multiple microphone signals, each microphone signal being a mixture of a randomly selected near-end utterance and the echo of a randomly selected far-end utterance. The remaining 3 utterances are used to generate 300 test microphone signals. The entire training set amounts to roughly 50 hours. To further improve generalization to unseen speakers, another 10 pairs of speakers (4 male-female, 3 male-male and 3 female-female) are randomly selected from the remaining 430 speakers in the TIMIT data set to generate 100 test mixtures of untrained speech. The echo signals are recorded with a smartphone in a 2.7 × 3 × 4.5 meter room, and the recorded echo signals are then added to the near-end signals to form the microphone signals.
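A sketch of how one microphone-signal training example could be assembled from a near-end utterance and a recorded echo of a far-end utterance. The source only states that the recorded echo is added to the near-end signal; the signal-to-echo ratio parameter and its default value are assumptions introduced for illustration.

```python
import numpy as np

def make_microphone_signal(near_end, recorded_echo, ser_db=0.0):
    """Mix a near-end utterance with a recorded echo at a chosen
    signal-to-echo ratio (SER); the SER control is an assumption."""
    n = min(len(near_end), len(recorded_echo))
    s, e = near_end[:n], recorded_echo[:n]
    gain = np.sqrt(np.sum(s ** 2) / (np.sum(e ** 2) * 10 ** (ser_db / 10) + 1e-8))
    return s + gain * e, s  # microphone signal and its near-end training target
```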
Step S123: training on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory.
LSTM is a type of temporal recurrent neural network, first published in 1997. Owing to its unique design, LSTM is well suited to processing and predicting important events separated by very long intervals or delays in a time series.
LSTM usually performs better than other temporal recurrent neural networks and hidden Markov models (HMM), for example in unsegmented handwriting recognition. In 2009, an artificial neural network model built with LSTM won the ICDAR handwriting recognition competition. LSTM is also commonly used for automatic speech recognition, and in 2013 it achieved a record error rate of 17.7% on the TIMIT natural speech database. As a nonlinear model, LSTM can serve as a complex nonlinear unit for building larger deep neural networks.
LSTM is a specific type of RNN that can effectively capture long-term context. Compared with a traditional RNN, LSTM alleviates the vanishing and exploding gradient problems that arise over time during training. The memory cell of an LSTM block has three gates: an input gate, a forget gate and an output gate. The input gate controls how much of the current information is added to the memory cell, the forget gate controls how much of the previous information is retained, and the output gate controls whether information is output. Specifically, LSTM can be described mathematically as follows.
i_t = σ(W_ix x_t + W_ih h_{t-1} + b_i)
f_t = σ(W_fx x_t + W_fh h_{t-1} + b_f)
o_t = σ(W_ox x_t + W_oh h_{t-1} + b_o)
z_t = g(W_zx x_t + W_zh h_{t-1} + b_z)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ z_t
h_t = o_t ⊙ g(c_t)
where i_t, f_t and o_t are the outputs of the input gate, the forget gate and the output gate respectively; x_t and h_t denote the input feature and the hidden activation at time t; z_t and c_t denote the block input and the memory cell; σ denotes the sigmoid function, i.e. σ(x) = 1/(1 + e^{-x}); g denotes the hyperbolic tangent function, i.e. g(x) = (e^x - e^{-x})/(e^x + e^{-x}); b_i, b_f, b_o and b_z are the biases of the input gate, the forget gate, the output gate and the block input respectively; and ⊙ denotes element-wise multiplication. The input gate and the forget gate are computed from the previous activation and the current input, and perform a context-dependent update of the memory cell.
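A minimal PyTorch sketch of a mask-estimation network consistent with this description: two LSTM layers of 512 units (the layer count and width stated for the experiments below) followed by a sigmoid output of 161 mask values per frame. The framework, the input dimensionality (161 bins × 6 spliced frames) and the exact wiring are assumptions for illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class MaskLSTM(nn.Module):
    """Two-layer LSTM mask estimator: spliced magnitude features in,
    sigmoid ratio mask (161 bins per frame) out."""
    def __init__(self, in_dim=161 * 6, hidden=512, out_dim=161):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, feats):               # feats: (batch, frames, in_dim)
        h, _ = self.lstm(feats)
        return torch.sigmoid(self.out(h))   # mask in [0, 1], shape (batch, frames, 161)
```

The sigmoid output layer matches the bounded ratio-mask target described above; the per-frame output width matches the 161 STFT bins.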
Fig. 4 is a schematic flow diagram of echo cancellation according to an exemplary embodiment. As shown in Fig. 4, the input is the received input signal and the output is the near-end signal after echo cancellation; "1" in the figure marks the steps involved only during training, "2" marks the steps of the prediction (inference) stage, and "3" marks the steps shared by training and prediction. As a supervised learning method, the invention uses the ideal ratio mask (IRM) as the training target. The IRM is obtained by comparing the STFT of the microphone signal with the STFT of its corresponding near-end signal. In the training stage, the RNN with LSTM estimates the IRM for each input signal (including the microphone signal and the far-end signal), and the MSE (mean square error) between the estimate and the IRM is then computed. The MSE over the entire training set is minimized through repeated epochs of iteration, with each training sample used exactly once per epoch. After training is complete, during inference or operation, the trained LSTM is used directly to suppress the echo and the background noise. Specifically, the trained LSTM processes the input signal and computes the ratio mask, the computed ratio mask is then applied to the input signal, and the result is finally resynthesized to obtain the near-end signal after echo cancellation.
The output at the top of the network passes through a sigmoid function to produce the predicted ratio mask (see Fig. 4), which is then compared with the IRM; the comparison yields the MSE error used to adjust the LSTM weights.
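The training procedure just described (epoch-wise minimization of the MSE between the estimated mask and the IRM, each sample used once per epoch) could look like the following sketch; the optimizer, learning rate, epoch count and batching are assumptions not specified in the source.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=30, lr=1e-3):
    """Minimize the MSE between the predicted mask and the IRM over
    repeated epochs; each training sample is used once per epoch."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)   # optimizer choice is an assumption
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for feats, irm in loader:     # feats: spliced input features, irm: target mask
            opt.zero_grad()
            loss = loss_fn(model(feats), irm)
            loss.backward()
            opt.step()
    return model
```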
Optionally, Fig. 5 is a flowchart of a specific implementation of step S123 in the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 3. As shown in Fig. 5, step S123 may include steps S1231 and S1232.
Step S1231: extracting the acoustic features of the microphone signal and of the far-end signal respectively.
Step S1232: estimating, from the acoustic features of the microphone signal and of the far-end signal, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
Optionally, Fig. 6 is a flowchart of another specific implementation of step S123 in the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 3. As shown in Fig. 6, step S123 may include steps S1233, S1234 and S1235.
Step S1233: performing linear echo cancellation on the microphone signal with a traditional AEC algorithm.
The microphone signal is pre-processed with a traditional linear AEC echo cancellation algorithm, and the AEC output is used as an input signal of the LSTM, which is then used to construct the recurrent neural network model with long short-term memory.
Step S1234: extracting the acoustic features of the far-end signal and of the linear AEC output respectively.
Step S1235: estimating, from the acoustic features of the far-end signal and of the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
Optionally, Fig. 7 is a flowchart of yet another specific implementation of step S123 in the method for constructing the recurrent neural network model with long short-term memory shown in the embodiment corresponding to Fig. 3. As shown in Fig. 7, in addition to steps S1233, S1234 and S1235, step S123 may also include steps S1236 and S1237.
Step S1236: extracting the acoustic features of the far-end signal, the microphone signal and the linear AEC output respectively.
Step S1237: estimating, from the acoustic features of the far-end signal, the microphone signal and the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
The model that, through steps S1231 and S1232, takes the microphone signal and the far-end signal as input signals and estimates the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory is referred to as LSTM1.
Through steps S1233, S1234 and S1235, the microphone signal is first processed with a traditional AEC algorithm to obtain the AEC output; the linear AEC output and the far-end signal are then used as input signals, the ideal ratio mask for echo cancellation is estimated with the recurrent neural network with long short-term memory, and the resulting recurrent neural network model with long short-term memory is referred to as LSTM2.
Through steps S1233, S1236 and S1237, the far-end signal, the microphone signal and the linear AEC output are used as input signals, the ideal ratio mask for echo cancellation is estimated with the recurrent neural network with long short-term memory, and the resulting recurrent neural network model with long short-term memory is referred to as LSTM3.
Compared with LSTM1, LSTM3 uses the output of the traditional AEC algorithm as additional features, further improving the echo cancellation performed on the received input signal.
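The three variants differ only in which signals contribute input features. For LSTM3, the per-frame features of the far-end signal, the microphone signal and the linear AEC output are concatenated before being fed to the network, as in the following sketch; feature extraction is assumed to be as in the earlier sketch, and the concatenation order is an assumption.

```python
import numpy as np

def lstm3_features(far_mag, mic_mag, aec_mag):
    """Frame-wise concatenation of far-end, microphone and linear-AEC
    magnitude features: the input used by the LSTM3 variant."""
    n = min(len(far_mag), len(mic_mag), len(aec_mag))
    return np.concatenate([far_mag[:n], mic_mag[:n], aec_mag[:n]], axis=1)
```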
Table 1 reports the results of three performance metrics, STOI (short-time objective intelligibility), PESQ (perceptual evaluation of speech quality) and ERLE (echo return loss enhancement), when echo cancellation is performed with the LSTM1, LSTM2 and LSTM3 models. All three models used here have two hidden layers with 512 units per layer. "None" is the result for the unprocessed signal; "Ideal" is the result of the ideal ratio mask and can be regarded as an upper bound on the achievable result.
Table 1: STOI, PESQ and ERLE results of the tested AEC systems
As shown in Table 1, compared with the traditional AEC algorithm, the three models LSTM1, LSTM2 and LSTM3 all achieve better echo cancellation. Combining the traditional AEC algorithm with deep learning further improves system performance, and LSTM3 improves STOI significantly more than LSTM2 does.
To further illustrate the linear AEC results, Fig. 8 shows spectrograms of the microphone signal recorded with a smartphone and of the near-end signal according to an exemplary embodiment. Fig. 8(a) shows the spectrogram of the microphone signal; Fig. 8(b) shows the spectrogram of the corresponding near-end signal; Fig. 8(c) and Fig. 8(d) compare the spectra obtained after echo cancellation with the conventional linear AEC algorithm and with the LSTM3 model, where Fig. 8(c) shows the spectrogram of the linear AEC output and Fig. 8(d) shows the spectrogram of the near-end signal obtained after echo cancellation with LSTM3. It can be seen that the output after echo cancellation with LSTM3 is very similar to the clean near-end signal, which shows that the proposed method can preserve the near-end signal well while suppressing echo with nonlinear distortion as well as background noise.
With the method described above, echo cancellation performance can be effectively improved when the constructed recurrent neural network model with long short-term memory is used to perform echo cancellation on the input signal.
The following are apparatus embodiments of the disclosure, which can be used to carry out the above embodiments of the echo cancellation method based on deep learning. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the echo cancellation method based on deep learning of the disclosure.
Fig. 9 is a block diagram of an echo cancellation apparatus based on deep learning according to an exemplary embodiment. The apparatus includes, but is not limited to: an acoustic feature extraction module 110, a ratio mask computation module 120, a masking module 130 and a speech synthesis module 140.
The acoustic feature extraction module 110 extracts acoustic features from the received input signal, the input signal including the microphone signal and the far-end signal;
the ratio mask computation module 120 iteratively processes the acoustic features in the pre-trained recurrent neural network model with long short-term memory to compute the ratio mask for the acoustic features;
the masking module 130 masks the acoustic features with the ratio mask;
the speech synthesis module 140 synthesizes the masked acoustic features with the phase of the microphone signal to obtain the near-end signal after echo cancellation.
The implementation and effect of the functions of each module in the above apparatus are described in detail in the corresponding steps of the above echo cancellation method based on deep learning, and are not repeated here.
Optionally, as shown in Fig. 10, the acoustic feature extraction module 110 of Fig. 9 includes, but is not limited to: a time frame division unit 111, a spectral magnitude vector extraction unit 112 and an acoustic feature forming unit 113.
The time frame division unit 111 divides the received microphone signal into time frames according to the preset period;
the spectral magnitude vector extraction unit 112 extracts the spectral magnitude vector from each time frame;
the acoustic feature forming unit 113 normalizes the spectral magnitude vector to form the acoustic features.
Optionally, the time frame division unit 111 of Fig. 10 includes, but is not limited to, a time frame division subunit.
The time frame division subunit divides the received microphone signal into time frames according to the preset period, with an overlap of half the preset period between every two adjacent time frames.
Optionally, the acoustic feature forming unit 113 of Fig. 10 includes, but is not limited to, a multi-frame normalization subunit.
The multi-frame normalization subunit merges the spectral magnitude vectors of the current time frame and of past time frames and normalizes them to form the acoustic features.
Optionally, as shown in Fig. 11, the ratio mask computation module 120 of Fig. 9 further includes, but is not limited to: a speech determination submodule 121, a speech training set construction submodule 122 and a model construction submodule 123.
The speech determination submodule 121 determines the speech used for training as the near-end and far-end (reference) signals;
the speech training set construction submodule 122 collects the far-end signal and the near-end signal from the speech, serving as the far end and the near end, and builds the speech training set from them, wherein the far-end signal is the echo signal, and the near-end signal and the echo signal form the microphone signal;
the model construction submodule 123 trains on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory.
Optionally, as shown in Fig. 12, the model construction submodule 123 of Fig. 11 further includes, but is not limited to: a first acoustic feature unit 1231 and a first model construction unit 1232.
The first acoustic feature unit 1231 extracts the acoustic features of the microphone signal and of the far-end signal respectively;
the first model construction unit 1232 estimates, from the acoustic features of the microphone signal and of the far-end signal, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
Optionally, as shown in Fig. 13, the model construction submodule 123 of Fig. 11 may also include, but is not limited to: a linear AEC processing unit 1233, a second acoustic feature unit 1234 and a second model construction unit 1235.
The linear AEC processing unit 1233 processes the microphone signal with a traditional AEC algorithm;
the second acoustic feature unit 1234 extracts the acoustic features of the far-end signal and of the linear AEC output respectively;
the second model construction unit 1235 estimates, from the acoustic features of the far-end signal and of the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
Optionally, as shown in Fig. 14, the model construction submodule 123 of Fig. 11 may also include, but is not limited to: a third acoustic feature unit 1236 and a third model construction unit 1237.
The third acoustic feature unit 1236 extracts the acoustic features of the far-end signal, the microphone signal and the linear AEC output respectively;
the third model construction unit 1237 estimates, from the acoustic features of the far-end signal, the microphone signal and the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
Optionally, the invention also provides an electronic device that performs all or part of the steps of the echo cancellation method based on deep learning shown in any of the above exemplary embodiments. The electronic device includes:
a processor; and
a memory communicatively connected to the processor; wherein
the memory stores readable instructions which, when executed by the processor, implement the method of any of the above exemplary embodiments.
The specific manner in which the processor of the terminal performs operations in this embodiment has been described in detail in the embodiments of the echo cancellation method based on deep learning, and is not explained in detail here.
In an exemplary embodiment, a storage medium is also provided; the storage medium is a computer-readable storage medium, for example a transitory or non-transitory computer-readable storage medium containing instructions.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (10)

1. An echo cancellation method based on deep learning, characterized in that the method comprises:
extracting acoustic features from a received microphone signal, the microphone signal including a near-end signal and a far-end signal;
iteratively processing the acoustic features in a pre-trained recurrent neural network model with long short-term memory to compute a ratio mask for the acoustic features;
masking the acoustic features with the ratio mask;
synthesizing the masked acoustic features with the phase of the microphone signal to obtain the near-end signal after echo cancellation.
2. The method according to claim 1, characterized in that the step of extracting acoustic features from the received microphone signal comprises:
dividing the received microphone signal into time frames according to a preset period, the microphone signal including a near-end signal and a far-end signal;
extracting a spectral magnitude vector from each time frame;
normalizing the spectral magnitude vector to form the acoustic features.
3. The method according to claim 2, characterized in that the step of normalizing the spectral magnitude vector to form the acoustic features comprises:
merging the spectral magnitude vectors of the current time frame and of past time frames and normalizing them to form the acoustic features.
4. The method according to claim 1, characterized in that the method of constructing the pre-trained recurrent neural network model with long short-term memory comprises:
determining the speech used for training as the near-end and far-end (reference) signals;
collecting the far-end signal and the near-end signal from said speech, serving as the far end and the near end respectively, and building a speech training set from them, wherein the far-end signal is the echo signal, and the near-end signal and the echo signal form the microphone signal;
training on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory.
5. The method according to claim 4, characterized in that the step of training on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory comprises:
extracting the acoustic features of the microphone signal and of the far-end (echo) signal respectively;
estimating, from the acoustic features of the microphone signal and of the far-end signal, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
6. The method according to claim 4, characterized in that the step of training on the speech training set with the recurrent neural network with long short-term memory to construct the recurrent neural network model with long short-term memory may also comprise:
performing linear echo cancellation on the microphone signal with a traditional AEC algorithm;
extracting the acoustic features of the far-end signal and of the linear AEC output obtained by performing linear echo cancellation with the traditional AEC algorithm, respectively;
estimating, from the acoustic features of the far-end signal and of the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
7. The method according to claim 6, characterized in that the method may also comprise:
extracting the acoustic features of the far-end signal, the microphone signal and the linear AEC output respectively;
estimating, from the acoustic features of the far-end signal, the microphone signal and the linear AEC output, the ideal ratio mask for echo cancellation with the recurrent neural network with long short-term memory, thereby constructing the recurrent neural network model with long short-term memory.
8. An echo cancellation apparatus based on deep learning, characterized in that the apparatus comprises:
an acoustic feature extraction module for extracting acoustic features from a received input signal, the input signal including a microphone signal and a far-end signal;
a ratio mask computation module for iteratively processing the acoustic features in a pre-trained recurrent neural network model with long short-term memory to compute a ratio mask for the acoustic features;
a masking module for masking the acoustic features with the ratio mask;
a speech synthesis module for synthesizing the masked acoustic features with the phase of the microphone signal to obtain the near-end signal after echo cancellation.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a program, characterized in that the program, when executed, causes an electronic device to perform the method according to any one of claims 1 to 7.
CN201811013935.8A 2018-08-31 2018-08-31 Echo cancellation method based on deep learning Active CN109841206B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811013935.8A CN109841206B (en) 2018-08-31 2018-08-31 Echo cancellation method based on deep learning
PCT/CN2019/090528 WO2020042706A1 (en) 2018-08-31 2019-06-10 Deep learning-based acoustic echo cancellation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811013935.8A CN109841206B (en) 2018-08-31 2018-08-31 Echo cancellation method based on deep learning

Publications (2)

Publication Number Publication Date
CN109841206A true CN109841206A (en) 2019-06-04
CN109841206B CN109841206B (en) 2022-08-05

Family

ID=66883031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811013935.8A Active CN109841206B (en) 2018-08-31 2018-08-31 Echo cancellation method based on deep learning

Country Status (2)

Country Link
CN (1) CN109841206B (en)
WO (1) WO2020042706A1 (en)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136737A (en) * 2019-06-18 2019-08-16 北京拙河科技有限公司 A kind of voice de-noising method and device
CN110473516A (en) * 2019-09-19 2019-11-19 百度在线网络技术(北京)有限公司 Phoneme synthesizing method, device and electronic equipment
CN110660406A (en) * 2019-09-30 2020-01-07 大象声科(深圳)科技有限公司 Real-time voice noise reduction method of double-microphone mobile phone in close-range conversation scene
WO2020042706A1 (en) * 2018-08-31 2020-03-05 大象声科(深圳)科技有限公司 Deep learning-based acoustic echo cancellation method
CN110944089A (en) * 2019-11-04 2020-03-31 中移(杭州)信息技术有限公司 Double-talk detection method and electronic equipment
CN110956976A (en) * 2019-12-17 2020-04-03 苏州科达科技股份有限公司 Echo cancellation method, device, equipment and readable storage medium
CN111161752A (en) * 2019-12-31 2020-05-15 歌尔股份有限公司 Echo cancellation method and device
CN111292759A (en) * 2020-05-11 2020-06-16 上海亮牛半导体科技有限公司 Stereo echo cancellation method and system based on neural network
CN111343410A (en) * 2020-02-14 2020-06-26 北京字节跳动网络技术有限公司 Mute prompt method and device, electronic equipment and storage medium
CN111353258A (en) * 2020-02-10 2020-06-30 厦门快商通科技股份有限公司 Echo suppression method based on coding and decoding neural network, audio device and equipment
CN111370016A (en) * 2020-03-20 2020-07-03 北京声智科技有限公司 Echo cancellation method and electronic equipment
CN111654572A (en) * 2020-05-27 2020-09-11 维沃移动通信有限公司 Audio processing method and device, electronic equipment and storage medium
CN111768796A (en) * 2020-07-14 2020-10-13 中国科学院声学研究所 Acoustic echo cancellation and dereverberation method and device
CN111816177A (en) * 2020-07-03 2020-10-23 北京声智科技有限公司 Voice interruption control method and device for elevator and elevator
CN111883154A (en) * 2020-07-17 2020-11-03 海尔优家智能科技(北京)有限公司 Echo cancellation method and apparatus, computer-readable storage medium, and electronic apparatus
CN111951819A (en) * 2020-08-20 2020-11-17 北京字节跳动网络技术有限公司 Echo cancellation method, device and storage medium
CN112055284A (en) * 2019-06-05 2020-12-08 北京地平线机器人技术研发有限公司 Echo cancellation method, neural network training method, apparatus, medium, and device
CN112203180A (en) * 2020-09-24 2021-01-08 安徽文香信息技术有限公司 Smart classroom loudspeaker headset self-adaptive volume adjusting system and method
CN112259112A (en) * 2020-09-28 2021-01-22 上海声瀚信息科技有限公司 Echo cancellation method combining voiceprint recognition and deep learning
CN112420073A (en) * 2020-10-12 2021-02-26 北京百度网讯科技有限公司 Voice signal processing method, device, electronic equipment and storage medium
CN112466318A (en) * 2020-10-27 2021-03-09 北京百度网讯科技有限公司 Voice processing method and device and voice processing model generation method and device
CN112489668A (en) * 2020-11-04 2021-03-12 北京百度网讯科技有限公司 Dereverberation method, dereverberation device, electronic equipment and storage medium
CN112634933A (en) * 2021-03-10 2021-04-09 北京世纪好未来教育科技有限公司 Echo cancellation method and device, electronic equipment and readable storage medium
CN112786068A (en) * 2021-01-12 2021-05-11 普联国际有限公司 Audio source separation method and device and storage medium
CN113012709A (en) * 2019-12-20 2021-06-22 北京声智科技有限公司 Echo cancellation method and device
CN113053400A (en) * 2019-12-27 2021-06-29 武汉Tcl集团工业研究院有限公司 Training method of audio signal noise reduction model, audio signal noise reduction method and device
CN113077812A (en) * 2021-03-19 2021-07-06 北京声智科技有限公司 Speech signal generation model training method, echo cancellation method, device and equipment
CN113179354A (en) * 2021-04-26 2021-07-27 北京有竹居网络技术有限公司 Sound signal processing method and device and electronic equipment
CN113192527A (en) * 2021-04-28 2021-07-30 北京达佳互联信息技术有限公司 Method, apparatus, electronic device and storage medium for cancelling echo
CN113257267A (en) * 2021-05-31 2021-08-13 北京达佳互联信息技术有限公司 Method for training interference signal elimination model and method and equipment for eliminating interference signal
CN113707166A (en) * 2021-04-07 2021-11-26 腾讯科技(深圳)有限公司 Voice signal processing method, apparatus, computer device and storage medium
CN114173259A (en) * 2021-12-28 2022-03-11 思必驰科技股份有限公司 Echo cancellation method and system
WO2022077305A1 (en) * 2020-10-15 2022-04-21 Beijing Didi Infinity Technology And Development Co., Ltd. Method and system for acoustic echo cancellation
CN115762552A (en) * 2023-01-10 2023-03-07 阿里巴巴达摩院(杭州)科技有限公司 Method for training echo cancellation model, echo cancellation method and corresponding device
CN116386655A (en) * 2023-06-05 2023-07-04 深圳比特微电子科技有限公司 Echo cancellation model building method and device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883155B (en) * 2020-07-17 2023-10-27 海尔优家智能科技(北京)有限公司 Echo cancellation method, device and storage medium
CN112750449B (en) * 2020-09-14 2024-02-20 腾讯科技(深圳)有限公司 Echo cancellation method, device, terminal, server and storage medium
CN113096679A (en) * 2021-04-02 2021-07-09 北京字节跳动网络技术有限公司 Audio data processing method and device
CN113744748A (en) * 2021-08-06 2021-12-03 浙江大华技术股份有限公司 Network model training method, echo cancellation method and device
CN116778970B (en) * 2023-08-25 2023-11-24 长春市鸣玺科技有限公司 Voice detection model training method in strong noise environment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719969A (en) * 2009-11-26 2010-06-02 美商威睿电通公司 Method and system for double-talk detection, and method and system for echo cancellation
US8189766B1 (en) * 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
US20140328490A1 (en) * 2013-05-03 2014-11-06 Qualcomm Incorporated Multi-channel echo cancellation and noise suppression
CN104157293A (en) * 2014-08-28 2014-11-19 福建师范大学福清分校 Signal processing method for enhancing target voice signal pickup in sound environment
CN104581516A (en) * 2013-10-15 2015-04-29 清华大学 Dual-microphone noise reduction method and device for medical acoustic signals
CN107452389A (en) * 2017-07-20 2017-12-08 大象声科(深圳)科技有限公司 General monaural real-time noise reduction method
US20180040333A1 (en) * 2016-08-03 2018-02-08 Apple Inc. System and method for performing speech enhancement using a deep neural network-based signal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9672821B2 (en) * 2015-06-05 2017-06-06 Apple Inc. Robust speech recognition in the presence of echo and noise using multiple signals for discrimination
CN105225672B (en) * 2015-08-21 2019-02-22 胡旻波 System and method for dual-microphone directional noise suppression fusing fundamental frequency information
CN106373583B (en) * 2016-09-28 2019-05-21 北京大学 Multi-audio-object encoding and decoding method based on the ideal soft-threshold mask (IRM)
CN107845389B (en) * 2017-12-21 2020-07-17 北京工业大学 Speech enhancement method based on multi-resolution auditory cepstrum coefficient and deep convolutional neural network
CN109841206B (en) * 2018-08-31 2022-08-05 大象声科(深圳)科技有限公司 Echo cancellation method based on deep learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8189766B1 (en) * 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
CN101719969A (en) * 2009-11-26 2010-06-02 美商威睿电通公司 Method and system for double-talk detection, and method and system for echo cancellation
US20140328490A1 (en) * 2013-05-03 2014-11-06 Qualcomm Incorporated Multi-channel echo cancellation and noise suppression
CN104581516A (en) * 2013-10-15 2015-04-29 清华大学 Dual-microphone noise reduction method and device for medical acoustic signals
CN104157293A (en) * 2014-08-28 2014-11-19 福建师范大学福清分校 Signal processing method for enhancing target voice signal pickup in sound environment
US20180040333A1 (en) * 2016-08-03 2018-02-08 Apple Inc. System and method for performing speech enhancement using a deep neural network-based signal
CN107452389A (en) * 2017-07-20 2017-12-08 大象声科(深圳)科技有限公司 General monaural real-time noise reduction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨飞然 et al.: "A stereo echo cancellation algorithm based on soft decision", 《电声技术》 (Audio Engineering) *

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020042706A1 (en) * 2018-08-31 2020-03-05 大象声科(深圳)科技有限公司 Deep learning-based acoustic echo cancellation method
CN112055284A (en) * 2019-06-05 2020-12-08 北京地平线机器人技术研发有限公司 Echo cancellation method, neural network training method, apparatus, medium, and device
CN110136737A (en) * 2019-06-18 2019-08-16 北京拙河科技有限公司 Voice denoising method and device
CN110473516A (en) * 2019-09-19 2019-11-19 百度在线网络技术(北京)有限公司 Speech synthesis method and device, and electronic equipment
CN110473516B (en) * 2019-09-19 2020-11-27 百度在线网络技术(北京)有限公司 Voice synthesis method and device and electronic equipment
US11417314B2 (en) 2019-09-19 2022-08-16 Baidu Online Network Technology (Beijing) Co., Ltd. Speech synthesis method, speech synthesis device, and electronic apparatus
CN110660406A (en) * 2019-09-30 2020-01-07 大象声科(深圳)科技有限公司 Real-time speech noise reduction method for a dual-microphone mobile phone in close-talk scenarios
CN110944089A (en) * 2019-11-04 2020-03-31 中移(杭州)信息技术有限公司 Double-talk detection method and electronic equipment
CN110956976B (en) * 2019-12-17 2022-09-09 苏州科达科技股份有限公司 Echo cancellation method, device and equipment and readable storage medium
CN110956976A (en) * 2019-12-17 2020-04-03 苏州科达科技股份有限公司 Echo cancellation method, device, equipment and readable storage medium
CN113012709A (en) * 2019-12-20 2021-06-22 北京声智科技有限公司 Echo cancellation method and device
CN113053400B (en) * 2019-12-27 2024-06-07 武汉Tcl集团工业研究院有限公司 Training method of audio signal noise reduction model, audio signal noise reduction method and equipment
CN113053400A (en) * 2019-12-27 2021-06-29 武汉Tcl集团工业研究院有限公司 Training method of audio signal noise reduction model, audio signal noise reduction method and device
CN111161752A (en) * 2019-12-31 2020-05-15 歌尔股份有限公司 Echo cancellation method and device
CN111161752B (en) * 2019-12-31 2022-10-14 歌尔股份有限公司 Echo cancellation method and device
CN111353258A (en) * 2020-02-10 2020-06-30 厦门快商通科技股份有限公司 Echo suppression method based on an encoder-decoder neural network, audio device and equipment
CN111343410A (en) * 2020-02-14 2020-06-26 北京字节跳动网络技术有限公司 Mute prompt method and device, electronic equipment and storage medium
CN111370016B (en) * 2020-03-20 2023-11-10 北京声智科技有限公司 Echo cancellation method and electronic equipment
CN111370016A (en) * 2020-03-20 2020-07-03 北京声智科技有限公司 Echo cancellation method and electronic equipment
CN111292759B (en) * 2020-05-11 2020-07-31 上海亮牛半导体科技有限公司 Stereo echo cancellation method and system based on neural network
CN111292759A (en) * 2020-05-11 2020-06-16 上海亮牛半导体科技有限公司 Stereo echo cancellation method and system based on neural network
CN111654572A (en) * 2020-05-27 2020-09-11 维沃移动通信有限公司 Audio processing method and device, electronic equipment and storage medium
CN111816177A (en) * 2020-07-03 2020-10-23 北京声智科技有限公司 Voice interruption control method and device for elevator and elevator
CN111768796A (en) * 2020-07-14 2020-10-13 中国科学院声学研究所 Acoustic echo cancellation and dereverberation method and device
CN111768796B (en) * 2020-07-14 2024-05-03 中国科学院声学研究所 Acoustic echo cancellation and dereverberation method and device
CN111883154B (en) * 2020-07-17 2023-11-28 海尔优家智能科技(北京)有限公司 Echo cancellation method and device, computer-readable storage medium, and electronic device
CN111883154A (en) * 2020-07-17 2020-11-03 海尔优家智能科技(北京)有限公司 Echo cancellation method and apparatus, computer-readable storage medium, and electronic apparatus
CN111951819A (en) * 2020-08-20 2020-11-17 北京字节跳动网络技术有限公司 Echo cancellation method, device and storage medium
CN111951819B (en) * 2020-08-20 2024-04-09 北京字节跳动网络技术有限公司 Echo cancellation method, device and storage medium
CN112203180A (en) * 2020-09-24 2021-01-08 安徽文香信息技术有限公司 Smart classroom loudspeaker headset self-adaptive volume adjusting system and method
CN112259112A (en) * 2020-09-28 2021-01-22 上海声瀚信息科技有限公司 Echo cancellation method combining voiceprint recognition and deep learning
CN112420073B (en) * 2020-10-12 2024-04-16 北京百度网讯科技有限公司 Voice signal processing method, device, electronic equipment and storage medium
CN112420073A (en) * 2020-10-12 2021-02-26 北京百度网讯科技有限公司 Voice signal processing method, device, electronic equipment and storage medium
WO2022077305A1 (en) * 2020-10-15 2022-04-21 Beijing Didi Infinity Technology And Development Co., Ltd. Method and system for acoustic echo cancellation
CN115668366A (en) * 2020-10-15 2023-01-31 北京嘀嘀无限科技发展有限公司 Acoustic echo cancellation method and system
KR20210116372A (en) * 2020-10-27 2021-09-27 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Voice processing method and device and voice processing model generation method and device
CN112466318A (en) * 2020-10-27 2021-03-09 北京百度网讯科技有限公司 Voice processing method and device and voice processing model generation method and device
CN112466318B (en) * 2020-10-27 2024-01-19 北京百度网讯科技有限公司 Speech processing method and device and speech processing model generation method and device
KR102577513B1 (en) * 2020-10-27 2023-09-12 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Voice processing method and device and voice processing model generation method and device
CN112489668B (en) * 2020-11-04 2024-02-02 北京百度网讯科技有限公司 Dereverberation method, device, electronic equipment and storage medium
CN112489668A (en) * 2020-11-04 2021-03-12 北京百度网讯科技有限公司 Dereverberation method, dereverberation device, electronic equipment and storage medium
CN112786068B (en) * 2021-01-12 2024-01-16 普联国际有限公司 Audio sound source separation method, device and storage medium
CN112786068A (en) * 2021-01-12 2021-05-11 普联国际有限公司 Audio source separation method and device and storage medium
CN112634933A (en) * 2021-03-10 2021-04-09 北京世纪好未来教育科技有限公司 Echo cancellation method and device, electronic equipment and readable storage medium
CN112634933B (en) * 2021-03-10 2021-06-22 北京世纪好未来教育科技有限公司 Echo cancellation method and device, electronic equipment and readable storage medium
CN113077812A (en) * 2021-03-19 2021-07-06 北京声智科技有限公司 Speech signal generation model training method, echo cancellation method, device and equipment
CN113077812B (en) * 2021-03-19 2024-07-23 北京声智科技有限公司 Voice signal generation model training method, echo cancellation method, device and equipment
CN113707166A (en) * 2021-04-07 2021-11-26 腾讯科技(深圳)有限公司 Voice signal processing method, apparatus, computer device and storage medium
CN113707166B (en) * 2021-04-07 2024-06-07 腾讯科技(深圳)有限公司 Voice signal processing method, device, computer equipment and storage medium
CN113179354A (en) * 2021-04-26 2021-07-27 北京有竹居网络技术有限公司 Sound signal processing method and device and electronic equipment
CN113179354B (en) * 2021-04-26 2023-10-10 北京有竹居网络技术有限公司 Sound signal processing method and device and electronic equipment
WO2022227932A1 (en) * 2021-04-26 2022-11-03 北京有竹居网络技术有限公司 Method and apparatus for processing sound signals, and electronic device
CN113192527B (en) * 2021-04-28 2024-03-19 北京达佳互联信息技术有限公司 Method, apparatus, electronic device and storage medium for canceling echo
CN113192527A (en) * 2021-04-28 2021-07-30 北京达佳互联信息技术有限公司 Method, apparatus, electronic device and storage medium for cancelling echo
CN113257267A (en) * 2021-05-31 2021-08-13 北京达佳互联信息技术有限公司 Method for training interference signal elimination model and method and equipment for eliminating interference signal
CN114173259B (en) * 2021-12-28 2024-03-26 思必驰科技股份有限公司 Echo cancellation method and system
CN114173259A (en) * 2021-12-28 2022-03-11 思必驰科技股份有限公司 Echo cancellation method and system
CN115762552A (en) * 2023-01-10 2023-03-07 阿里巴巴达摩院(杭州)科技有限公司 Method for training echo cancellation model, echo cancellation method and corresponding device
CN116386655B (en) * 2023-06-05 2023-09-08 深圳比特微电子科技有限公司 Echo cancellation model building method and device
CN116386655A (en) * 2023-06-05 2023-07-04 深圳比特微电子科技有限公司 Echo cancellation model building method and device

Also Published As

Publication number Publication date
WO2020042706A1 (en) 2020-03-05
CN109841206B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN109841206A (en) A kind of echo cancel method based on deep learning
CN111756942B (en) Communication device and method for performing echo cancellation and computer readable medium
Luo et al. Real-time single-channel dereverberation and separation with time-domain audio separation network.
Mammone et al. Robust speaker recognition: A feature-based approach
KR100908121B1 (en) Speech feature vector conversion method and apparatus
CN104157293B (en) Signal processing method for enhancing target speech signal pickup in an acoustic environment
CN109841226A (en) Single-channel real-time noise reduction method based on a convolutional recurrent neural network
Zhao et al. Late reverberation suppression using recurrent neural networks with long short-term memory
CN112735456B (en) Speech enhancement method based on DNN-CLSTM network
Pandey et al. Self-attending RNN for speech enhancement to improve cross-corpus generalization
CN109887489B (en) Speech dereverberation method based on deep features of a generative adversarial network
Xiao et al. The NTU-ADSC systems for reverberation challenge 2014
Wan et al. Networks for speech enhancement
Kolossa et al. Independent component analysis and time-frequency masking for speech recognition in multitalker conditions
CN109979476A (en) Method and device for speech dereverberation
CN112037809A (en) Residual echo suppression method based on multi-feature flow structure deep neural network
CN110660406A (en) Real-time speech noise reduction method for a dual-microphone mobile phone in close-talk scenarios
CN111986679A (en) Speaker verification method, system and storage medium for complex acoustic environments
CN110998723B (en) Signal processing device using neural network, signal processing method, and recording medium
Lv et al. A permutation algorithm based on dynamic time warping in speech frequency-domain blind source separation
Geng et al. End-to-end speech enhancement based on discrete cosine transform
Kothapally et al. Skipconvgan: Monaural speech dereverberation using generative adversarial networks via complex time-frequency masking
Nathwani et al. An extended experimental investigation of DNN uncertainty propagation for noise robust ASR
Peer et al. Reverberation matching for speaker recognition
Blouet et al. Evaluation of several strategies for single sensor speech/music separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40008141

Country of ref document: HK

CB02 Change of applicant information

Address after: 533, podium building 12, Shenzhen Bay science and technology ecological park, No.18, South Keji Road, high tech community, Yuehai street, Nanshan District, Shenzhen, Guangdong 518000

Applicant after: ELEVOC TECHNOLOGY Co.,Ltd.

Address before: 2206, phase I, International Students Pioneer Building, 29 Gaoxin South Ring Road, Yuehai street, Nanshan District, Shenzhen, Guangdong 518000

Applicant before: ELEVOC TECHNOLOGY Co.,Ltd.

GR01 Patent grant