WO2021114779A1 - Echo cancellation method, apparatus, and system employing double-talk detection - Google Patents
Echo cancellation method, apparatus, and system employing double-talk detection
- Publication number
- WO2021114779A1 (application PCT/CN2020/114168)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- input sound
- utterance
- sound signal
- double
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
- H04M9/082—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Definitions
- This application relates to the field of voice communication, and in particular to an echo cancellation method, device, and system based on double-talk detection.
- Acoustic echo arises from the coupling between the speaker and the microphone of a terminal, so that the microphone signal contains not only the useful voice signal but also echo. If the microphone signal is not processed, the echo is transmitted together with the near-end voice signal to the far-end speaker for playback, and the far-end talker hears a delayed copy of their own voice, which is uncomfortable and degrades the call. When the echo is loud, the call may not be able to proceed normally. Effective measures must therefore be taken to suppress the echo and eliminate its impact in order to improve the quality of voice communication.
- Echo cancellation has been an engineering problem in need of a solution ever since Bell invented the telephone.
- Communication methods and application scenarios have become increasingly diverse, and communication terminals increasingly compact, which strengthens the coupling between speaker and microphone and makes the echo path more complex and variable. This poses a great challenge for acoustic echo cancellation in voice communication systems.
- Acoustic echo generally arises in hands-free communication systems and is caused by the propagation of sound waves. It is usually divided into two kinds: direct echo and indirect echo.
- Direct echo is sound played by the speaker that enters the microphone along the direct path, without any reflection. This echo has the shortest delay, and its level depends on factors such as the voice energy of the far-end talker, the distance and angle between the speaker and the microphone, the playback volume of the speaker, and the pickup sensitivity of the microphone.
- Indirect echo is the set of echoes produced when sound played by the speaker enters the microphone after one or more reflections along different paths. It is characterized by a long delay, large delay jitter, and an echo level that depends strongly on the environment.
- An adaptive acoustic echo canceller (AEC) is usually used to cancel the echo.
- The basic principle of the AEC can be summarized as adaptively estimating the echo and subtracting the estimated echo from the signal picked up by the microphone.
- With an AEC, the callers are not disturbed by each other's echo; in a hands-free phone, the AEC keeps the echo to a minimum.
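The estimate-and-subtract principle can be illustrated with a normalized least-mean-squares (NLMS) filter. This is a minimal sketch, not the filter used in this application: the choice of NLMS, the function name `nlms_echo_canceller`, and the parameters `taps`, `mu`, and `eps` are all assumptions made for illustration.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the residual mic(n) - estimated_echo(n) for every sample.

    far_end: samples sent to the local speaker (the echo source).
    mic:     samples picked up by the local microphone.
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                     # subtract the estimated echo
        norm = sum(h * h for h in history) + eps  # input power normalization
        step = mu * e / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual


# Hypothetical test signal: the microphone hears only an attenuated copy of the
# far-end signal delayed by two samples (no near-end speech).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * x for x in far_end[:-2]]
residual = nlms_echo_canceller(far_end, mic)
tail_peak = max(abs(e) for e in residual[-100:])
```

With echo only at the microphone, the residual shrinks toward zero as the filter converges. Adding near-end speech to `mic` would disturb the weight update, which is the situation the following paragraphs address.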
- The echo cancellation performance of the AEC can meet current needs; however, when there is significant near-end sound, the performance of an AEC based on any of the existing adaptive filtering algorithms deteriorates, and convergence of the adaptive filtering algorithm cannot even be guaranteed.
- A double-talk detector (DTD) is therefore used; a typical application of the DTD is to freeze AEC updates during double-talk periods to prevent the adaptive filtering algorithm from diverging.
- Existing double-talk detection algorithms include energy-based algorithms, algorithms based on signal correlation characteristics, and algorithms based on spectral characteristics.
- These double-talk detection algorithms all rely on the choice of a fixed threshold: the utterance state is decided by comparing a computed statistic with the threshold.
- A fixed-threshold method cannot accurately detect the double-talk state. This not only harms the robustness of echo cancellation but also causes severe speech clipping in subsequent processing, so that the sound transmitted to the far-end user is intermittent.
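As an illustration of an energy-based detector with a fixed threshold, the sketch below follows the classic Geigel detector, which flags double-talk when the microphone magnitude exceeds a fixed fraction of the recent far-end peak. The Geigel detector is not named in this application; it is used here only as a well-known example, and the `window` and `threshold` values are assumptions.

```python
def geigel_double_talk(far_end, mic, window=4, threshold=0.5):
    """Flag sample n as double-talk when |mic(n)| > threshold * recent far-end peak."""
    flags = []
    for n, d in enumerate(mic):
        recent = far_end[max(0, n - window + 1): n + 1]
        peak = max(abs(x) for x in recent)
        flags.append(abs(d) > threshold * peak)
    return flags


# Echo at gain 0.3 with near-end speech added at samples 2 and 4.
flags_quiet_echo = geigel_double_talk([1.0] * 6, [0.3, 0.3, 0.9, 0.3, 0.9, 0.3])

# Echo only, but at gain 0.8 (strong speaker-to-microphone coupling).
flags_loud_echo = geigel_double_talk([1.0] * 6, [0.8] * 6)
```

In the first case only the samples with near-end speech are flagged. In the second case every sample is flagged although nobody is talking at the near end, which shows how a fixed threshold misfires once the echo level changes.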
- In hands-free communication equipment, the main influencing factor is the signal-to-return ratio (that is, the signal-to-echo ratio) of the signal received by the microphone, defined as the amplitude (power) ratio of the near-end voice received by the microphone to the echo signal received from the speaker.
- The signal-to-return ratio at the microphone is usually low during hands-free calls, and it changes with the distance between the microphone and the near-end talker, the volume of the near-end talker, and the level of the echo. As a result, traditional double-talk detection algorithms based on a fixed threshold often fail, and it is difficult to balance duplex performance and echo removal in hands-free calls.
- The echo cancellation technology in the prior art cannot accurately filter out echo interference under double-talk conditions, especially in hands-free calls and conference calls, so call quality is easily degraded.
- The technical problem solved by this application is how to better eliminate echo and improve the duplex call experience of a hands-free voice communication terminal.
- Embodiments of the present application provide an echo cancellation method, device, and system based on double-talk detection. The method may include: acquiring an input sound signal from a sound collection device; performing adaptive filtering on the input sound signal to obtain a near-end speech estimation signal; determining the current utterance state according to the near-end speech estimation signal; obtaining a preset mapping relationship between utterance states and processing modes, and obtaining from it the processing mode corresponding to the current utterance state; processing the near-end speech estimation signal according to that processing mode; and outputting the processed near-end speech estimation signal to obtain an output signal.
- Determining the current utterance state according to the near-end speech estimation signal includes: calculating, from the input sound signal and the near-end speech estimation signal, the average value of the double-ended utterance state statistic of the current frame in the input sound signal; obtaining the double-talk decision threshold corresponding to the current frame, the threshold being obtained according to the signal-to-return ratio of the input sound signal and the near-end interference signal; and determining the current utterance state according to the magnitude relationship between that average value and the double-talk decision threshold.
- Calculating the average value of the double-ended utterance state statistic of the current frame in the input sound signal according to the input sound signal and the near-end speech estimation signal includes computing it by a formula (the formula itself is not reproduced in this text) whose terms are: the average value of the double-ended utterance state statistic of the current frame; the power of the near-end speech estimation signal at the n-th sample point of the k-th frame; the power of the input sound signal at the n-th sample point of the k-th frame; and an operator that takes the average of the values in brackets.
- Obtaining the double-talk decision threshold corresponding to the current frame includes: estimating the signal-to-return ratio of the input sound signal in real time to obtain the average signal-to-return ratio of the current frame in the input sound signal; obtaining multiple preset thresholds of the signal-to-return ratio and constructing multiple signal-to-return ratio intervals from them; and determining the interval to which the average signal-to-return ratio of the current frame belongs and taking the double-talk decision threshold corresponding to that interval as the double-talk decision threshold of the current frame.
- Estimating the signal-to-return ratio of the input sound signal in real time to obtain the average signal-to-return ratio of the current frame includes: acquiring a near-end interference signal, which is the sound signal generated by the sounding device at the same end as the sound collection device; and calculating the average signal-to-return ratio of the current frame by a formula (not reproduced in this text) in which the estimated average signal-to-return ratio of the k-th frame is expressed in dB, $P_m(k, n)$ is the power of the input sound signal at the n-th sample point of the k-th frame, $P_x(k, n)$ is the power of the near-end interference signal at the n-th sample point of the k-th frame, and mean() is the average of the values in brackets.
- Acquiring multiple preset thresholds of the signal-to-return ratio and constructing multiple signal-to-return ratio intervals from them includes: ordering the acquired thresholds and using each pair of adjacent thresholds as the boundary values of one signal-to-return ratio interval, thereby obtaining multiple signal-to-return ratio intervals.
- The utterance state includes two states, far-end-only utterance and not-only-far-end utterance, and the preset mapping relationship between utterance states and processing modes includes: when the utterance state is far-end-only utterance, setting the near-end speech estimation signal to zero or suppressing it until it is inaudible; when the utterance state is not-only-far-end utterance, retaining the near-end speech estimation signal.
- The not-only-far-end utterance state covers two states: near-end-only utterance and double-ended utterance (double-talk).
- Performing adaptive filtering on the input sound signal to obtain the near-end speech estimation signal includes: performing linear filtering and non-linear filtering on the input sound signal to obtain the near-end speech estimation signal.
- An embodiment of the present application also provides an echo cancellation device based on double-talk detection.
- The device includes: an input sound signal acquisition module for acquiring an input sound signal from a sound collection device; a filtering module for performing adaptive filtering on the input sound signal to obtain the near-end speech estimation signal; a current utterance state determination module for determining the current utterance state according to the near-end speech estimation signal; a processing mode acquisition module for obtaining the preset mapping relationship between utterance states and processing modes and obtaining from it the processing mode corresponding to the current utterance state; a near-end processing module for processing the near-end speech estimation signal according to the processing mode; and an output module for outputting the processed near-end speech estimation signal to obtain an output signal.
- An embodiment of the present application also provides an echo cancellation system based on double-talk detection, including a sound collection device, a same-end sounding device, and an echo cancellation device, where the echo cancellation device executes the steps of any one of the methods described above.
- An embodiment of the present application provides an echo cancellation method based on double-talk detection.
- The method includes: acquiring an input sound signal from a sound collection device; adaptively filtering the input sound signal to obtain a near-end speech estimation signal; determining the current utterance state according to the near-end speech estimation signal; obtaining the preset mapping relationship between utterance states and processing modes, and obtaining from it the processing mode corresponding to the current utterance state; processing the near-end speech estimation signal according to that processing mode; and outputting the processed near-end speech estimation signal to obtain an output signal.
- In existing communication schemes, the input sound signal in a voice call such as a telephone call is transmitted to the peer device directly or after adaptive echo cancellation only. The technical solution of this method differs: it applies different processing modes according to the utterance state corresponding to the input sound signal and accurately filters out the echo in the input sound signal by exploiting the characteristics of double-talk. Call quality can be significantly improved, especially in call systems that are strongly affected by double-talk interference, such as hands-free calls and voice conferences.
- The utterance state is judged in real time for each frame of the input sound signal, so that the processing mode applied to the near-end speech estimation signal is updated in real time; echo can thus be removed from the input sound signal accurately and completely, and the stability of the call is ensured.
- The signal-to-return ratio of the input sound signal, with the near-end interference signal as the echo source, is calculated in real time by sampling, and different double-talk decision thresholds are set according to how strongly the near-end interference signal affects the input sound signal. The current utterance state can thus be determined more accurately, which improves the accuracy of echo cancellation for the input sound signal.
- Two utterance states are defined, and a processing mode is specified for each, which basically meets the requirements of real-time echo cancellation in common voice calls.
- The adaptive filtering of the input sound signal comprises two operations, linear filtering and non-linear filtering, which further suppresses the echo in the input sound signal.
- The echo cancellation system based on double-talk detection provided by the embodiments of the present application can detect in real time the acoustic echo generated during communication and eliminate it according to the detection result, which improves the echo cancellation effect when the voice communication terminal is in hands-free mode and thereby improves call quality.
- The echo cancellation method, device, and system based on double-talk detection provided in the embodiments of the present application can distinguish in real time between far-end-only utterance on the one hand and near-end-only utterance or double-ended utterance on the other.
- When only the far end is talking, the time-domain output is set to zero or suppressed until it is inaudible, so that echo is eliminated to the greatest extent while duplex call performance is preserved, improving echo cancellation and duplex performance at the same time.
- The purpose is to improve the duplex call experience of the hands-free voice communication terminal.
- FIG. 1 is a schematic flowchart of an echo cancellation method based on double-talk detection according to an embodiment of the present application.
- FIG. 2 is a schematic diagram of the application of an echo cancellation method based on double-talk detection according to an embodiment of the present application.
- FIG. 3 is a schematic flowchart of step S103 in FIG. 1 in an embodiment of the present application.
- FIG. 4 is a schematic flowchart of step S302 in FIG. 3 in an embodiment of the present application.
- FIG. 5 is a schematic diagram of signal-to-return ratio intervals according to an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of an echo cancellation device based on double-talk detection according to an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of an echo cancellation system based on double-talk detection according to an embodiment of the present application.
- As described above, the echo cancellation technology in the prior art cannot accurately filter out echo interference under double-talk conditions, especially in hands-free calls and conference calls, so call quality is easily degraded.
- An embodiment of the present application provides an echo cancellation method based on double-talk detection.
- The method includes: acquiring an input sound signal from a sound collection device; performing adaptive filtering on the input sound signal to obtain a near-end speech estimation signal; determining the current utterance state according to the near-end speech estimation signal; obtaining the preset mapping relationship between utterance states and processing modes, and obtaining from it the processing mode corresponding to the current utterance state; processing the near-end speech estimation signal according to the processing mode; and outputting the processed near-end speech estimation signal to obtain an output signal.
- The solution described in this embodiment filters out the interference signal under double-talk conditions and significantly improves call quality.
- FIG. 1 is a schematic flowchart of an echo cancellation method based on double-talk detection; the method may specifically include the following steps:
- S101 Obtain an input sound signal from a sound collection device.
- The input sound signal is the sound signal collected by the sound collection device.
- The sound collection device may be a microphone or another device; for a telephone or telephone-like call, it is the sound collection device built into a terminal such as a mobile phone, a landline telephone, or a computer.
- A terminal such as a telephone collects the local sound in real time through the sound collection device and transmits it to the other end of the call over the communication line.
- In this method, after the local sound collection device collects the input sound signal, the signal is not transmitted directly to the other end of the call. Instead, echo is removed from the input sound signal through steps S102 to S106 below to improve the quality of the voice call.
- S102 Perform adaptive filtering on the input sound signal to obtain a near-end speech estimation signal.
- After the input sound signal is acquired from the sound collection device, it is filtered to remove the echo signal generated at the local end that interferes with the normal call, yielding the near-end speech estimation signal.
- The adaptive filtering may use an adaptive echo canceller (AEC) to filter the input sound signal and obtain the near-end speech estimation signal.
- S103 Determine the current utterance state according to the near-end speech estimation signal.
- The utterance state may be one of several states, such as far-end-only utterance, double-ended utterance, and near-end-only utterance.
- Each utterance state corresponds to a processing mode for the obtained near-end speech estimation signal, which can be set as needed.
- The utterance states are not limited to the examples mentioned above.
- Determining the current utterance state means determining, in real time, the utterance state that corresponds to the near-end speech estimation signal just obtained.
- The utterance state can be determined from attributes of the speech signal such as its waveform and channel.
- S104 Obtain a preset mapping relationship between the utterance state and the processing mode, and obtain the processing mode corresponding to the current utterance state according to the mapping relationship.
- A processing mode is the way the near-end speech estimation signal is handled in a given utterance state, for example setting the near-end speech estimation signal to zero, retaining it in full, or retaining part of it.
- The mapping relationship between utterance states and processing modes can be set in advance. Once the current utterance state is determined, the corresponding processing mode is obtained automatically from the mapping relationship.
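The preset mapping can be pictured as a lookup table from utterance state to a processing function. The sketch below uses the two-state case described in this application (far-end-only utterance is zeroed, anything else is retained); the names `UtteranceState`, `PROCESSING_MODES`, and `process_frame` are hypothetical.

```python
from enum import Enum


class UtteranceState(Enum):
    FAR_END_ONLY = "far_end_only"
    NOT_ONLY_FAR_END = "not_only_far_end"   # near-end only, or double-talk


def zero_out(frame):
    """Replace the frame with silence (residual echo is discarded)."""
    return [0.0] * len(frame)


def retain(frame):
    """Keep the near-end speech estimate unchanged."""
    return list(frame)


# Preset mapping relationship between utterance state and processing mode.
PROCESSING_MODES = {
    UtteranceState.FAR_END_ONLY: zero_out,
    UtteranceState.NOT_ONLY_FAR_END: retain,
}


def process_frame(state, near_end_estimate):
    """Look up the processing mode for the state and apply it to one frame."""
    return PROCESSING_MODES[state](near_end_estimate)
```

For example, `process_frame(UtteranceState.FAR_END_ONLY, [0.2, -0.1])` yields a silent frame, while the same frame passes through unchanged in the other state. Supporting a further mode, such as partial retention, would only require another table entry.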
- S105 Process the near-end speech estimation signal according to the processing mode.
- Once the processing mode corresponding to the current utterance state has been obtained, the near-end speech estimation signal is processed according to it.
- S106 Output the processed near-end speech estimation signal to obtain an output signal.
- The processed near-end speech estimation signal correctly reflects the call content of the local end, and this output signal can be transmitted to the other end of the call over the communication link.
- In this embodiment, the input sound signal in a voice call such as a telephone call is handled differently from existing communication schemes, which transmit it to the peer device directly or after adaptive echo cancellation only.
- Different processing modes can be set for the different utterance states of the input sound signal, and interference or echo in the input sound signal can be filtered out accurately by exploiting the characteristics of double-talk.
- Call quality can thus be significantly improved.
- FIG. 2 is a schematic diagram of the application of an echo cancellation method based on double-talk detection. In the application scenario shown in FIG. 2, the call parties include a far-end device 200 and a near-end device 210, where the far-end device 200 includes a far-end microphone 201 and a far-end speaker 202, and the near-end device 210 includes a near-end speaker 203 and a near-end microphone 204.
- The far-end microphone 201 sends the downlink signal S1 to the near-end speaker 203.
- The direct echo S2 is the sound signal that is emitted by the near-end speaker 203 and picked up directly by the near-end microphone 204.
- The indirect echo S3 is the sound signal that is emitted by the near-end speaker 203, reflected by the environment, and picked up indirectly by the near-end microphone 204.
- While the echoes (direct echo S2 and indirect echo S3) are being picked up, a person (not shown) speaks into the near-end microphone 204 (marked "voice" in the figure); the near-end microphone 204 picks up the voice and generates an uplink signal S4, which is sent to the far-end speaker 202 to be played out.
- The echo cancellation method based on double-talk detection in FIG. 1 can be applied on the near-end microphone 204 side in FIG. 2: after the near-end microphone 204 obtains the input sound signal (that is, the sound signal obtained from the voice in FIG. 2) and before that signal is sent to the far-end device 200, the input sound signal is first processed by the echo cancellation method in FIG. 1.
- Step S103 in FIG. 1, determining the current utterance state according to the near-end speech estimation signal, may specifically include steps S301 to S303 in FIG. 3.
- S301 Calculate the average value of the double-ended utterance state statistics of the current frame in the input sound signal according to the input sound signal and the near-end speech estimation signal.
- The double-ended utterance state statistic of the current frame takes the current frame in the input sound signal as the reference point: the input sound signal and the near-end speech estimation signal before the reference point are each sampled, and the two signals are compared to compute a statistic that reflects the current utterance state of the input sound signal.
- The average value is the mean of the double-ended utterance state statistic over several sampling points.
- The average value of the double-ended utterance state statistic of the current frame may be obtained by feeding the input sound signal and the near-end speech estimation signal into a double-talk detector.
- S302 Acquire a double-talk decision threshold corresponding to the current frame, where the double-talk decision threshold is obtained according to the signal-to-return ratio of the input sound signal and the near-end interference signal.
- The signal-to-return ratio of the input sound signal is the energy ratio of the signal to the echo in the input sound signal and can be obtained by calculation from the input sound signal.
- The near-end interference signal is the interference that sound produced by the sounding device at the same end as the sound collection device causes at the microphone; it can be obtained from that sounding device.
- The sounding device may be, for example, the speaker that accompanies the local microphone in telephone communication.
- The double-talk decision threshold is the threshold used to determine which utterance state the average value of the double-ended utterance state statistic of the current frame corresponds to. Multiple utterance-state thresholds are set for the average value of the statistic; these are the double-talk decision thresholds.
- The double-talk decision threshold is set based on two factors: the signal-to-return ratio of the input sound signal and the near-end interference signal.
- S303 Determine the current utterance state according to the magnitude relationship between the average value of the double-ended utterance state statistic of the current frame and the double-talk decision threshold.
- The average value of the double-ended utterance state statistic of the current frame in the input sound signal is compared with the double-talk decision threshold obtained in step S302 to determine which utterance state's threshold interval the average value falls in, and thereby the current utterance state.
- The utterance state is judged in real time for each frame of the input sound signal, so that the processing mode applied to the near-end speech estimation signal is updated in real time; echo can thus be removed from the input sound signal accurately and completely, ensuring the stability of the call.
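Steps S301 and S303 can be sketched as follows. The formula for the statistic is not available in this text, so the sketch assumes one plausible form: the per-sample ratio of near-end speech estimate power to input sound signal power, averaged over the frame. The comparison direction (a small ratio means far-end-only utterance) and all names and values are likewise assumptions.

```python
def average_statistic(p_near_est, p_mic, eps=1e-12):
    """Assumed statistic: mean over the frame of P_e(k, n) / P_m(k, n)."""
    ratios = [pe / (pm + eps) for pe, pm in zip(p_near_est, p_mic)]
    return sum(ratios) / len(ratios)


def decide_state(avg_stat, threshold):
    """Assumed rule: if little power survives the echo canceller, only the far end is talking."""
    return "far_end_only" if avg_stat < threshold else "not_only_far_end"


# Hypothetical per-sample powers for two frames of four samples each.
p_mic = [1.0, 1.0, 1.0, 1.0]
echo_only_frame = average_statistic([0.01, 0.02, 0.01, 0.02], p_mic)
near_speech_frame = average_statistic([0.8, 0.9, 0.7, 0.8], p_mic)

state_a = decide_state(echo_only_frame, threshold=0.2)
state_b = decide_state(near_speech_frame, threshold=0.2)
```

Under these assumptions the first frame is classified as far-end-only and the second as not-only-far-end. The threshold passed to `decide_state` is the per-frame double-talk decision threshold obtained in step S302.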
- In step S301, the average value of the double-ended utterance state statistic of the current frame in the input sound signal is calculated by a formula (not reproduced in this text).
- Step S302 in FIG. 3, obtaining the double-talk decision threshold corresponding to the current frame, where the threshold is obtained according to the signal-to-return ratio of the input sound signal and the near-end interference signal, may include steps S401 to S403 in FIG. 4, where:
- S401 Estimate the signal-to-return ratio of the input sound signal in real time to obtain an average signal-to-return ratio of the current frame in the input sound signal.
- S402 Acquire multiple preset thresholds of the response ratio, and construct multiple intervals of the ratio of the response ratio according to the multiple thresholds.
- the preset multiple thresholds are values obtained through experience or extreme technical personnel.
- the boundary values of multiple thresholds can be generated based on multiple thresholds to define multiple thresholds.
- the corresponding dual-speaking judgment threshold is set for each response ratio interval.
- S403 Determine a signal-to-return ratio interval to which the average return ratio of the current frame belongs, and obtain a dual-speaking judgment threshold corresponding to the signal-to-return ratio interval as the dual-speaking judgment threshold of the current frame.
- The signal-to-echo ratio (SER) of the input sound signal, with the near-end interference signal as the echo source, is calculated in real time by sampling.
- Setting a different double-talk judgment threshold for each SER interval allows the current utterance state to be determined more accurately and improves the accuracy of echo cancellation for the input sound signal.
- The average SER of the current frame in step S401 in FIG. 4 is calculated as follows:
- The near-end interference signal is a sound signal generated by the sounding device at the same end as the sound collection device.
- P_m(k, n) represents the power of the input sound signal at the n-th sample point of the k-th frame;
- P_x(k, n) represents the power of the near-end interference signal at the n-th sample point of the k-th frame;
- mean() denotes the average of the values in the brackets.
- P_m(k, n) and P_x(k, n) are the per-sample power values obtained by sampling the input sound signal and the near-end interference signal, respectively, frame by frame.
- The sampling process is: n sample points are acquired from the input sound signal and from the near-end interference signal, and the signal frame containing each sample point is the k-th frame. Here n and k are running counters.
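The SER formula is not reproduced in this text; the claims state only that the result is in dB and is built from P_m(k, n), P_x(k, n) and mean(). The sketch below therefore assumes, for illustration, the form 10·log10(mean(P_m) / mean(P_x)) over one frame. The function name, the exact form and the `eps` guard are assumptions.

```python
import math
from statistics import fmean


def average_ser_db(mic_input, interference, eps=1e-12):
    """ASSUMED form: 10*log10(mean(P_m(k, n)) / mean(P_x(k, n))) for frame k.

    mic_input:    samples of the input sound signal in the frame
    interference: samples of the near-end interference signal in the frame
    eps:          hypothetical guard against log(0) and division by zero
    """
    if not mic_input or len(mic_input) != len(interference):
        raise ValueError("frames must be non-empty and of equal length")
    p_m = [v * v for v in mic_input]      # per-sample power P_m(k, n)
    p_x = [v * v for v in interference]   # per-sample power P_x(k, n)
    return 10.0 * math.log10((fmean(p_m) + eps) / (fmean(p_x) + eps))
```

For instance, a microphone frame with ten times the amplitude of the interference frame gives roughly 20 dB under this assumed form.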
- Step S402 in FIG. 4, acquiring multiple preset signal-to-echo ratio (SER) thresholds and constructing multiple SER intervals according to them, may include: using each pair of adjacent SER thresholds as the boundary values of one SER interval, thereby obtaining multiple SER intervals.
- The preset SER thresholds are SER_thr_1, SER_thr_2, SER_thr_3, ..., SER_thr_k, and the SER intervals are formed by these thresholds.
- The SER intervals can be denoted SER interval 501, SER interval 502, ..., SER interval 50k, where k is a variable and 50k denotes the k-th SER interval; k+1 SER thresholds yield k SER intervals.
- A corresponding double-talk judgment threshold is set for each SER interval, namely the double-talk judgment thresholds m1, m2, ..., mk in FIG. 5.
- The double-talk judgment threshold corresponding to the SER interval in which the average SER of the current frame falls is then obtained, that is, step S403.
- The SER intervals are constructed automatically, using the preset SER thresholds as the interval boundary values.
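Steps S402 and S403 can be sketched as an interval lookup. In the minimal example below, adjacent sorted SER thresholds bound each interval and one double-talk judgment threshold is stored per interval. The function names, the half-open `[low, high)` convention and the clamping of out-of-range SER values to the first or last interval are assumptions; the text does not specify them.

```python
from bisect import bisect_right


def build_ser_intervals(ser_thresholds):
    """Pair adjacent thresholds: k+1 thresholds give k intervals."""
    t = sorted(ser_thresholds)
    return list(zip(t[:-1], t[1:]))


def select_double_talk_threshold(avg_ser_db, ser_thresholds, dt_thresholds):
    """Return the double-talk threshold of the interval containing avg_ser_db.

    dt_thresholds[i] belongs to the i-th interval (m1, m2, ..., mk).
    """
    t = sorted(ser_thresholds)
    if len(dt_thresholds) != len(t) - 1:
        raise ValueError("need exactly one double-talk threshold per interval")
    idx = bisect_right(t, avg_ser_db) - 1            # interval [t[idx], t[idx+1])
    idx = min(max(idx, 0), len(dt_thresholds) - 1)   # assumed: clamp out-of-range
    return dt_thresholds[idx]
```

A binary search keeps the per-frame lookup cheap even with many intervals, which suits a decision that is repeated for every frame.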
- The utterance state includes two states, only far-end utterance and not only far-end utterance, and the preset mapping relationship between utterance state and processing mode includes: when the utterance state is only far-end utterance, the near-end speech estimation signal is zeroed or suppressed to inaudibility; when the utterance state is not only far-end utterance, the near-end speech estimation signal is retained.
- Two utterance states can be set: only far-end utterance and not only far-end utterance.
- When the current utterance state is determined, based on the near-end speech estimation signal, to be only far-end utterance, the near-end speech estimation signal needs to be zeroed or suppressed to inaudibility; that is, the near-end speech estimation signal is filtered out and a mute signal is transmitted to the peer device of the call as the local end's transmission signal.
- When the current utterance state is determined, based on the near-end speech estimation signal, to be not only far-end utterance, the near-end speech estimation signal needs to be retained and is transmitted to the peer device of the call as the local end's transmission signal.
- Defining two utterance states and specifying the processing mode for each basically meets the requirements of real-time echo cancellation in common voice calls.
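The preset mapping between utterance state and processing mode can be pictured as a small lookup table. The sketch below is a minimal illustration; the enum, the function names and the choice of zeroing (rather than suppressing to inaudibility) are hypothetical.

```python
from enum import Enum


class UtteranceState(Enum):
    FAR_END_ONLY = "only far-end utterance"
    NOT_FAR_END_ONLY = "not only far-end utterance"


def zero_signal(frame):
    """Replace the near-end speech estimate with silence."""
    return [0.0] * len(frame)


def retain_signal(frame):
    """Pass the near-end speech estimate through unchanged."""
    return list(frame)


# Preset mapping: utterance state -> processing mode.
PROCESSING_BY_STATE = {
    UtteranceState.FAR_END_ONLY: zero_signal,
    UtteranceState.NOT_FAR_END_ONLY: retain_signal,
}


def process_near_end_estimate(state, frame):
    """Look up the processing mode for the state and apply it to the frame."""
    return PROCESSING_BY_STATE[state](frame)
```

Keeping the mapping in a table means a further state, such as a separate entry for only near-end utterance, could be added without touching the lookup logic.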
- The not only far-end utterance state includes two states: only near-end utterance and double-ended utterance.
- Only near-end utterance means that the sound collection device collects only the local end's transmission signal and no near-end interference signal; double-ended utterance means that the sound collection device collects both the local end's transmission signal and the near-end interference signal.
- The processing mode can be further specified for these two states. For example, for only near-end utterance, no processing is done and the voice signal is transmitted directly to the peer end, and so on.
- Step S102 in FIG. 1, performing adaptive filtering on the input sound signal to obtain a near-end speech estimation signal, may specifically include two filtering operations: linear filtering and non-linear filtering.
- The input sound signal is first linearly filtered by a filter such as an AEC (acoustic echo cancellation) filter to eliminate part of the echo.
- After linear filtering, the signal still contains linear residual echo and non-linear echo.
- When there is near-end utterance, it also contains near-end speech.
- The sound signal containing residual echo is then passed through non-linear processing (NLP) filtering to achieve further echo suppression.
- Because the adaptive filtering includes both linear filtering and non-linear filtering, the echo in the input sound signal is suppressed further.
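The text does not say which linear or non-linear filters are used. As an illustration of the two-stage idea only, the sketch below swaps in two common textbook techniques: a normalized LMS (NLMS) adaptive filter for the linear stage and a center clipper for the non-linear stage. All names and parameter values (`taps`, `mu`, `threshold`) are assumptions.

```python
def nlms_linear_stage(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """NLMS echo canceller: subtract an adaptive estimate of the echo of
    `ref` (loudspeaker signal) from `mic` and return the residual."""
    w = [0.0] * taps      # adaptive filter weights (echo path estimate)
    buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # estimated echo
        e = d - y                                    # residual after cancellation
        norm = sum(xi * xi for xi in buf) + eps
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, buf)]
        residual.append(e)
    return residual


def center_clip(signal, threshold):
    """Non-linear stage: zero out samples whose magnitude is below threshold."""
    return [v if abs(v) >= threshold else 0.0 for v in signal]


def near_end_estimate(mic, ref, clip_threshold=0.01):
    """Linear stage followed by the non-linear stage."""
    return center_clip(nlms_linear_stage(mic, ref), clip_threshold)
```

In this sketch the linear stage removes the part of the echo that a finite impulse response can model, and the clipper removes low-level residue that remains.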
- The embodiment of the application also provides an echo cancellation device based on double-ended utterance detection; see FIG. 6.
- The device may include an input sound signal acquisition module 601, a filtering module 602, an utterance state determination module 603, a processing mode acquisition module 604, a near-end processing module 605 and an output module 606, where:
- the input sound signal acquisition module 601 is used to acquire the input sound signal from the sound collection device.
- the filtering module 602 is configured to perform adaptive filtering on the input sound signal to obtain a near-end speech estimation signal.
- the utterance state determination module 603 is configured to determine the current utterance state according to the near-end speech estimation signal.
- the processing mode obtaining module 604 is configured to obtain a preset mapping relationship between the utterance state and the processing mode, and obtain the processing mode corresponding to the current utterance state according to the mapping relationship.
- the near-end processing module 605 is configured to process the near-end speech estimation signal according to the processing mode.
- the output module 606 is configured to output the processed near-end speech estimation signal to obtain an output signal.
- the utterance state determination module 603 may include:
- a real-time utterance state acquisition unit configured to calculate the average value of the double-ended utterance state statistics of the current frame in the input sound signal according to the input sound signal and the near-end speech estimation signal;
- a threshold obtaining unit configured to obtain a double-talk judgment threshold corresponding to the current frame, the double-talk judgment threshold being obtained according to the signal-to-echo ratio (SER) of the input sound signal and the near-end interference signal;
- an utterance state determination unit configured to determine the current utterance state according to the magnitude relationship between the average value of the double-ended utterance state statistic of the current frame and the double-talk judgment threshold.
- The threshold obtaining unit includes:
- a current SER obtaining subunit, configured to estimate the SER of the input sound signal in real time to obtain the average SER of the current frame in the input sound signal;
- an SER interval construction subunit, configured to obtain multiple preset SER thresholds and construct multiple SER intervals according to them;
- a threshold judging subunit, configured to determine the SER interval to which the average SER of the current frame belongs, and take the double-talk judgment threshold corresponding to that interval as the double-talk judgment threshold of the current frame.
- The SER interval construction subunit is further configured to use each pair of adjacent SER thresholds as the boundary values of one SER interval, thereby obtaining multiple SER intervals.
- the filtering module 602 in FIG. 6 is further configured to perform linear filtering and non-linear filtering on the input sound signal to obtain the near-end speech estimation signal.
- The embodiment of the present application also provides an echo cancellation system based on double-ended utterance detection, including a sound collection device, a same-end sounding device, and an echo cancellation device.
- The echo cancellation device performs the steps of the echo cancellation method based on double-ended utterance detection shown in FIGS. 1 to 5.
- Figure 7 is a schematic diagram of an echo cancellation system based on double-ended voice detection; the system includes a sound collection device 701, an echo cancellation device 702, and a same-end voice device 703.
- the sound collection device 701 may be a microphone in telephone communication for collecting the input sound signal A1.
- The same-end sounding device 703 may be a loudspeaker at the same end as the microphone in telephone communication. The sound it generates may interfere with the input sound signal A1, so that sound is treated as the interference sound signal A6.
- The echo cancellation device 702 is a device for implementing the echo cancellation method based on double-ended utterance detection of FIGS. 1 to 5 in this application.
- The function of the echo cancellation device can be realized by physical or logic circuits, software programming, and the like.
- The echo cancellation device 702 may include a linear AEC filter 7021, an NLP filter 7022, a double-ended utterance detector 7023, a signal-to-echo ratio (SER) estimator 7024, a threshold determiner 7025 and a processor 7026.
- The echo cancellation device 702 processes the sound signals received from the sound collection device 701 and the same-end sounding device 703 during communication as follows:
- The input sound signal A1 is linearly filtered by the linear AEC filter 7021 to obtain the linearly filtered sound signal A2; the NLP filter 7022 then applies non-linear filtering to A2 to obtain the near-end speech estimation signal A3, which is used as one input signal of the double-ended utterance detector 7023.
- The input sound signal A1 is used directly as the other input signal of the double-ended utterance detector 7023.
- The linear AEC filter 7021 uses the interference sound signal A6 as its filtering reference to linearly filter the input sound signal A1.
- The input sound signal A1 is also input to the SER estimator 7024, which calculates the average SER A4 of the current frame of the input sound signal in real time and passes it to the threshold determiner 7025. Based on the multiple SER intervals constructed from the preset SER thresholds, the threshold determiner 7025 determines the double-talk judgment threshold A5 corresponding to the average SER A4 of the current frame and sends A5 to the double-ended utterance detector 7023 as the basis for judging the current utterance state.
- The signal-to-echo ratio (SER) estimator 7024 samples the input sound signal A1 and the interference sound signal A6, and calculates the average SER A4 of the current frame according to the following formula:
- The double-ended utterance detector 7023 acquires the first input signal (i.e., the near-end speech estimation signal A3), the second input signal (i.e., the input sound signal A1) and the double-talk judgment threshold A5, and determines the current utterance state A7 in real time based on this information.
- The current utterance state is obtained based on the average value of the double-ended utterance state statistic of the current frame.
- The double-ended utterance detector 7023 samples the near-end speech estimation signal A3 and the input sound signal A1, and calculates the average value of the double-ended utterance state statistic of the current frame according to the following formula:
- the double-ended utterance detector 7023 sends the obtained current utterance state A7 to the processor 7026, and the processor 7026 processes the near-end voice estimation signal A3 according to the current utterance state A7.
- the processing method is: when the utterance state is only the far-end utterance, the near-end speech estimation signal A3 is zeroed or suppressed to inaudible; when the utterance state is not only the far-end utterance, the near-end speech estimation signal A3 is retained.
- the processed near-end speech estimation signal is output to obtain an output signal A8, and the output signal A8 can be transmitted to the device of the communication opposite end via the communication link.
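The signal flow A1 to A8 through components 7023 to 7026 can be summarized per frame as in the sketch below. It is an illustration under stated assumptions, not the patented implementation: the SER and statistic formulas are not reproduced in this text, so a dB ratio of mean powers and a plain ratio of mean powers are assumed, and a statistic below the threshold is assumed to mean only far-end utterance. The function name and `eps` are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import fmean


def process_frame(a1, a3, a6, ser_thresholds, dt_thresholds, eps=1e-12):
    """a1: input sound frame, a3: near-end speech estimate frame,
    a6: interference (loudspeaker) frame. Returns (state A7, output A8)."""
    p_m = fmean(v * v for v in a1)
    p_s = fmean(v * v for v in a3)
    p_x = fmean(v * v for v in a6)

    # SER estimator 7024 (assumed form, in dB) -> average SER A4
    a4 = 10.0 * math.log10((p_m + eps) / (p_x + eps))

    # Threshold determiner 7025: pick the threshold A5 of the interval holding A4
    bounds = sorted(ser_thresholds)
    idx = min(max(bisect_right(bounds, a4) - 1, 0), len(dt_thresholds) - 1)
    a5 = dt_thresholds[idx]

    # Double-ended utterance detector 7023 (assumed statistic and direction)
    statistic = p_s / (p_m + eps)
    far_end_only = statistic < a5

    # Processor 7026: zero the estimate or retain it
    a8 = [0.0] * len(a3) if far_end_only else list(a3)
    a7 = "only far-end utterance" if far_end_only else "not only far-end utterance"
    return a7, a8
```

With a microphone frame that is almost entirely echo, the estimate A3 carries little power, the statistic falls below A5, and the output frame is silence; when A3 keeps most of the microphone power, it is passed through.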
- The above echo cancellation system based on double-ended utterance detection detects, in real time, the acoustic echo generated during communication and cancels it according to the detection result, which improves echo cancellation, and therefore call quality, when the voice communication terminal is in hands-free mode.
- The echo cancellation method, device and system based on double-ended utterance detection provided in the embodiments of the present application can distinguish in real time between only far-end utterance on the one hand and only near-end utterance or double-ended utterance on the other.
- When only the far end is speaking, the time-domain output is zeroed or suppressed to inaudibility, so that echo is eliminated to the greatest extent while duplex call performance is preserved, improving echo cancellation and duplex performance at the same time.
- The purpose is to improve the duplex call experience of hands-free voice communication terminals.
Landscapes
- Engineering & Computer Science
- Signal Processing
- Computational Linguistics
- Quality & Reliability
- Health & Medical Sciences
- Audiology, Speech & Language Pathology
- Human Computer Interaction
- Physics & Mathematics
- Acoustics & Sound
- Multimedia
- Telephone Function
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo
Claims (11)
- An echo cancellation method based on double-ended utterance detection, characterized in that the method includes: obtaining an input sound signal from a sound collection device; performing adaptive filtering on the input sound signal to obtain a near-end speech estimation signal; determining the current utterance state according to the near-end speech estimation signal; acquiring a preset mapping relationship between utterance states and processing modes, and acquiring the processing mode corresponding to the current utterance state according to the mapping relationship; processing the near-end speech estimation signal according to the processing mode; and outputting the processed near-end speech estimation signal to obtain an output signal.
- The method according to claim 1, wherein determining the current utterance state according to the near-end speech estimation signal includes: calculating the average value of the double-ended utterance state statistic of the current frame in the input sound signal according to the input sound signal and the near-end speech estimation signal; acquiring a double-talk judgment threshold corresponding to the current frame, the double-talk judgment threshold being obtained according to the signal-to-echo ratio (SER) of the input sound signal and the near-end interference signal; and determining the current utterance state according to the magnitude relationship between the average value of the double-ended utterance state statistic of the current frame and the double-talk judgment threshold.
- The method according to claim 2, wherein calculating the average value of the double-ended utterance state statistic of the current frame in the input sound signal according to the input sound signal and the near-end speech estimation signal includes: calculating the average value of the double-ended utterance state statistic of the current frame according to the following formula: where the result is the average value of the double-ended utterance state statistic of the current frame, P_S(k, n) represents the power of the near-end speech estimation signal at the n-th sample point of the k-th frame, P_m(k, n) represents the power of the input sound signal at the n-th sample point of the k-th frame, and mean() denotes the average of the values in the brackets.
- The method according to claim 2, wherein acquiring the double-talk judgment threshold corresponding to the current frame, the double-talk judgment threshold being obtained according to the signal-to-echo ratio (SER) of the input sound signal and the near-end interference signal, includes: estimating the SER of the input sound signal in real time to obtain the average SER of the current frame in the input sound signal; acquiring multiple preset SER thresholds and constructing multiple SER intervals according to the SER thresholds; and determining the SER interval to which the average SER of the current frame belongs, and taking the double-talk judgment threshold corresponding to that SER interval as the double-talk judgment threshold of the current frame.
- The method according to claim 4, wherein estimating the signal-to-echo ratio (SER) of the input sound signal in real time to obtain the average SER of the current frame in the input sound signal includes: acquiring a near-end interference signal, the near-end interference signal being a sound signal generated by a sounding device at the same end as the sound collection device; and calculating the average SER of the current frame in the input sound signal according to the following formula: where the result is the estimated average SER of the k-th frame, in dB, P_m(k, n) represents the power of the input sound signal at the n-th sample point of the k-th frame, P_x(k, n) represents the power of the near-end interference signal at the n-th sample point of the k-th frame, and mean() denotes the average of the values in the brackets.
- The method according to claim 4, wherein acquiring multiple preset signal-to-echo ratio (SER) thresholds and constructing multiple SER intervals according to the SER thresholds includes: using each pair of adjacent SER thresholds among the acquired SER thresholds as the boundary values of one SER interval, thereby obtaining multiple SER intervals.
- The method according to claim 1, wherein the utterance state includes two states, only far-end utterance and not only far-end utterance, and the preset mapping relationship between utterance states and processing modes includes: when the utterance state is only far-end utterance, zeroing the near-end speech estimation signal or suppressing it to inaudibility; and when the utterance state is not only far-end utterance, retaining the near-end speech estimation signal.
- The method according to claim 7, wherein the not only far-end utterance state includes two states: only near-end utterance and double-ended utterance.
- The method according to claim 1, wherein performing adaptive filtering on the input sound signal to obtain a near-end speech estimation signal includes: performing linear filtering and non-linear filtering on the input sound signal to obtain the near-end speech estimation signal.
- An echo cancellation device based on double-ended utterance detection, characterized in that the device includes: an input sound signal acquisition module, configured to acquire an input sound signal from a sound collection device; a filtering module, configured to perform adaptive filtering on the input sound signal to obtain a near-end speech estimation signal; an utterance state determination module, configured to determine the current utterance state according to the near-end speech estimation signal; a processing mode acquisition module, configured to acquire a preset mapping relationship between utterance states and processing modes, and acquire the processing mode corresponding to the current utterance state according to the mapping relationship; a near-end processing module, configured to process the near-end speech estimation signal according to the processing mode; and an output module, configured to output the processed near-end speech estimation signal to obtain an output signal.
- An echo cancellation system based on double-ended utterance detection, including a sound collection device, a same-end sounding device and an echo cancellation device, characterized in that the echo cancellation device executes the steps of the method according to any one of claims 1 to 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911284296.3A CN110995951B (en) | 2019-12-13 | 2019-12-13 | Echo cancellation method, device and system based on double-end sounding detection |
CN201911284296.3 | 2019-12-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021114779A1 true WO2021114779A1 (en) | 2021-06-17 |
Family
ID=70093348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/114168 WO2021114779A1 (en) | 2019-12-13 | 2020-09-09 | Echo cancellation method, apparatus, and system employing double-talk detection |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110995951B (en) |
WO (1) | WO2021114779A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110995951B (en) * | 2019-12-13 | 2021-09-03 | 展讯通信(上海)有限公司 | Echo cancellation method, device and system based on double-end sounding detection |
CN111556210B (en) * | 2020-04-23 | 2021-10-22 | 深圳市未艾智能有限公司 | Call voice processing method and device, terminal equipment and storage medium |
CN113225442B (en) * | 2021-04-16 | 2022-09-02 | 杭州网易智企科技有限公司 | Method and device for eliminating echo |
CN113241085B (en) * | 2021-04-29 | 2022-07-22 | 北京梧桐车联科技有限责任公司 | Echo cancellation method, device, equipment and readable storage medium |
CN113808609A (en) * | 2021-09-18 | 2021-12-17 | 展讯通信(上海)有限公司 | Echo detection method and device, computer readable storage medium and terminal equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101179294A (en) * | 2006-11-09 | 2008-05-14 | 爱普拉斯通信技术(北京)有限公司 | Self-adaptive echo eliminator and echo eliminating method thereof |
CN102655558A (en) * | 2012-05-21 | 2012-09-05 | 宁波工程学院 | Double-end pronouncing robust structure and acoustic echo cancellation method |
US20120281603A1 (en) * | 2011-05-06 | 2012-11-08 | Futurewei Technologies, Inc. | Transmit Phase Control for the Echo Cancel Based Full Duplex Transmission System |
CN107635082A (en) * | 2016-07-18 | 2018-01-26 | 深圳市有信网络技术有限公司 | A kind of both-end sounding end detecting system |
CN110995951A (en) * | 2019-12-13 | 2020-04-10 | 展讯通信(上海)有限公司 | Echo cancellation method, device and system based on double-end sounding detection |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3922997B2 (en) * | 2002-10-30 | 2007-05-30 | 沖電気工業株式会社 | Echo canceller |
CN1925346A (en) * | 2006-09-05 | 2007-03-07 | 华为技术有限公司 | Detecting method for double speaking state in echo wave counteract |
JP4411309B2 (en) * | 2006-09-21 | 2010-02-10 | Okiセミコンダクタ株式会社 | Double talk detection method |
US8345860B1 (en) * | 2008-05-09 | 2013-01-01 | Hellosoft India PVT. Ltd | Method and system for detection of onset of near-end signal in an echo cancellation system |
CN102160296B (en) * | 2009-01-20 | 2014-01-22 | 华为技术有限公司 | Method and apparatus for detecting double talk |
CN101719969B (en) * | 2009-11-26 | 2013-10-02 | 美商威睿电通公司 | Method and system for judging double-end conversation and method and system for eliminating echo |
CN102377453B (en) * | 2010-08-06 | 2014-02-26 | 联芯科技有限公司 | Method and device for controlling updating of self-adaptive filter and echo canceller |
CN101917527B (en) * | 2010-09-02 | 2013-07-03 | 杭州华三通信技术有限公司 | Method and device of echo elimination |
CN102065190B (en) * | 2010-12-31 | 2013-08-28 | 杭州华三通信技术有限公司 | Method and device for eliminating echo |
CN103179296B (en) * | 2011-12-26 | 2017-02-15 | 中兴通讯股份有限公司 | Echo canceller and echo cancellation method |
US9100466B2 (en) * | 2013-05-13 | 2015-08-04 | Intel IP Corporation | Method for processing an audio signal and audio receiving circuit |
CN104519212B (en) * | 2013-09-27 | 2017-06-20 | 华为技术有限公司 | A kind of method and device for eliminating echo |
US20160171988A1 (en) * | 2014-12-15 | 2016-06-16 | Wire Swiss Gmbh | Delay estimation for echo cancellation using ultrasonic markers |
CN106533500B (en) * | 2016-11-25 | 2019-11-12 | 上海伟世通汽车电子系统有限公司 | A method of optimization Echo Canceller convergence property |
CN108134863B (en) * | 2017-12-26 | 2020-06-19 | 中山大学花都产业科技研究院 | Improved double-end detection device and detection method based on double statistics |
CN108540680B (en) * | 2018-02-02 | 2021-03-02 | 广州视源电子科技股份有限公司 | Switching method and device of speaking state and conversation system |
CN108696648B (en) * | 2018-05-16 | 2021-08-24 | 上海小度技术有限公司 | Method, device, equipment and storage medium for processing short-time voice signal |
CN109547655A (en) * | 2018-12-30 | 2019-03-29 | 广东大仓机器人科技有限公司 | A kind of method of the echo cancellation process of voice-over-net call |
CN110138990A (en) * | 2019-05-14 | 2019-08-16 | 浙江工业大学 | A method of eliminating mobile device voip phone echo |
CN110335618B (en) * | 2019-06-06 | 2021-07-30 | 福建星网智慧软件有限公司 | Method for improving nonlinear echo suppression and computer equipment |
- 2019-12-13: CN201911284296.3A, patent CN110995951B (en), status: Active
- 2020-09-09: PCT/CN2020/114168, WO2021114779A1 (en), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110995951B (en) | 2021-09-03 |
CN110995951A (en) | 2020-04-10 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20899723; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20899723; Country of ref document: EP; Kind code of ref document: A1 |
122 | EP: PCT application non-entry in European phase | Ref document number: 20899723; Country of ref document: EP; Kind code of ref document: A1 |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.01.2023) |
122 | EP: PCT application non-entry in European phase | Ref document number: 20899723; Country of ref document: EP; Kind code of ref document: A1 |