CN111742541B - Acoustic echo cancellation method, acoustic echo cancellation device and storage medium

Info

Publication number: CN111742541B
Application number: CN201780097325.8A
Authority: CN
Inventors: 范泛; 弗拉迪斯拉夫·伊戈列维奇·瓦西里耶夫; 德米特里·弗拉基米罗维奇·萨拉纳
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2017-12-08
Filing date: 2017-12-08
Publication date: 2021-11-30
Anticipated expiration: 2037-12-08
Also published as: CN111742541A; WO2019112467A1

Abstract

Embodiments of the application provide an acoustic echo cancellation method and device for improving the detection accuracy of acoustic echo cancellation and reducing residual echo. The method comprises the following steps: a first terminal picks up a pickup signal containing an echo signal generated by the first terminal playing a reference signal, wherein the reference signal is a voice signal received by the first terminal from a second terminal; the first terminal adaptively filters the reference signal and the pickup signal using a Kalman filter and a variable-step normalized least mean square (NLMS) filter to obtain a first residual signal and a second residual signal, respectively; the first terminal performs hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal; the first terminal performs residual echo estimation according to an estimated echo signal to obtain an estimated residual echo signal; and the first terminal performs residual echo suppression on the target residual signal according to the estimated residual echo signal to output an echo-suppressed signal. The embodiments provide a hybrid adaptive filter based on Kalman filtering and variable-step NLMS filtering, together with a multi-pass nonlinear processing (NLP) method and an implicit single-talk/double-talk determination method, so that the residual echo suppression gain can be calculated more accurately without delay estimation. The solution can significantly improve the robustness of the acoustic echo cancellation algorithm and achieve a smooth full-duplex call.

Description

Acoustic echo cancellation method, acoustic echo cancellation device and storage medium

Technical Field

The present application relates to the field of communications technologies, and in particular, to an acoustic echo cancellation method and apparatus.

Background

Acoustic echoes are generated during the acoustic feedback process, wherein after the near-end speaker plays the far-end transmitted sound, the near-end microphone again picks up the sound and transmits the sound to the far-end. The generation of the acoustic echo mainly comprises the following paths:

(1) echo transmission within the calling device: the microphone directly receives the sound signal played by the loudspeaker;

(2) direct sound transmission outside the calling device: the microphone receives a direct sound signal played by the speaker through an acoustic transmission path outside the device;

(3) single/multiple reflection sound transmission outside the calling device: the microphone receives the single/multiple reflected sound signal played by the speaker through an acoustic transmission path outside the device.

Acoustic echo mainly arises in calling applications where a loudspeaker is the sound-emitting device, for example handheld or hands-free calls on cell phones, calls on notebook computers, car phones, or video conferencing. If the echo is not processed, it combines with the delay introduced by the communication network (GSM/WCDMA/VoLTE, VoIP), seriously degrades the subjective call experience, and may even produce howling.

With the large-scale use of voice communication systems, acoustic echo cancellation (AEC), the process of removing acoustic echoes, has become an indispensable algorithm module of voice enhancement systems. Common AEC algorithms typically include two parts: adaptive filtering (AF) and nonlinear processing (NLP). In conventional acoustic echo cancellation techniques, the adaptive filtering part mainly uses the normalized least mean square (NLMS) method, and the NLP part mainly uses a gain calculation method based on a coherence function. However, to calculate the coherence function accurately, the delay between the reference signal and the pickup signal needs to be estimated. In addition, to ensure a smooth double-talk effect, double-talk detection is also required.

In the prior art, the adaptive filter converges slowly, the NLP gain calculation is inaccurate, and delay estimation and double-talk detection are easily affected by various factors, so the acoustic echo cancellation performance falls short of the optimum.

Disclosure of Invention

The embodiment of the application provides an acoustic echo cancellation method and device, which are used for improving the detection accuracy of acoustic echo cancellation and reducing echo residual errors.

In order to solve the above problem, the embodiments of the present application provide the following technical means.

In one aspect, an embodiment of the present application provides an acoustic echo cancellation method for a first terminal, where the first terminal performs voice communication with a second terminal, and the method includes:

the first terminal picks up a signal containing an echo signal generated by the first terminal playing a reference signal, wherein the reference signal is a voice signal received by the first terminal from the second terminal;

the first terminal adaptively filters the reference signal and the pickup signal using a Kalman filter and a variable-step normalized least mean square (NLMS) filter to obtain a first residual signal and a second residual signal, respectively;

the first terminal performs mixed filtering processing on the first residual signal and the second residual signal to obtain a target residual signal;

the first terminal carries out residual echo estimation according to the estimated echo signal so as to obtain an estimated residual echo signal;

and the first terminal performs residual echo suppression on the target residual signal according to the estimated residual echo signal so as to output an echo suppressed signal.

In a first implementation manner of the first aspect of the present application, the performing, by the first terminal, hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal includes:

the first terminal determines the energy of the first residual signal and the energy of the second residual signal at a plurality of frequency points respectively;

the first terminal selects a residual signal with smaller energy between the first residual signal and the second residual signal at each of the plurality of frequency points to obtain the target residual signal.

In a second implementation manner of the first aspect of the present application, the performing, by the first terminal, residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal includes:

the first terminal jointly determines the estimated echo signal by using the Kalman filter and the variable step size NLMS filter;

the first terminal determines a first echo power spectrum of the estimated echo signal;

the first terminal performs harmonic generation processing on the estimated echo signal to obtain a power spectrum after harmonic generation;

the first terminal carries out frequency spectrum splicing on the power spectrum after the harmonic wave is generated and the first echo power spectrum to obtain a second echo power spectrum;

the first terminal carries out smoothing processing on the second echo power spectrum to obtain a third echo power spectrum;

the first terminal selects an echo power spectrum with larger frequency point energy between the third echo power spectrum and the second echo power spectrum at each of the plurality of frequency points to obtain the estimated residual echo signal.

In a third implementation manner of the first aspect of the present application, the performing, by the first terminal, residual echo suppression on the target residual signal according to the estimated residual echo signal to output an echo-suppressed signal includes:

the first terminal determines the energy of the target residual signal and the energy of the estimated residual echo signal;

the first terminal performs initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal to obtain a signal after the initial residual echo suppression;

the first terminal performs pitch signal detection on the signal subjected to the initial residual echo suppression;

when the pitch signal is detected in the initial residual echo suppressed signal, the first terminal performs harmonic enhancement on the initial residual echo suppressed signal to obtain a harmonic enhanced signal;

the first terminal performs secondary residual echo suppression on the harmonic enhanced signal through secondary gain calculation to obtain a signal subjected to secondary residual echo suppression;

the first terminal performs cepstrum smoothing on the signal subjected to the secondary residual echo suppression to obtain a signal subjected to cepstrum smoothing;

and the first terminal performs final residual echo suppression on the signal subjected to the cepstrum smoothing through final gain calculation to obtain a signal subjected to echo suppression.

In a fourth implementation manner of the first aspect of the present application, after the first terminal performs pitch signal detection on the initial residual echo suppressed signal, the method further includes:

when no pitch signal is detected in the initial residual echo suppressed signal, the first terminal performs cepstrum smoothing on the initial residual echo suppressed signal to obtain a cepstrum smoothed signal.

In a fifth implementation manner of the first aspect of the present application, the performing, by the first terminal, initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal to obtain an initial residual echo suppressed signal includes:

the first terminal determines a prior signal echo ratio according to the target residual signal and the estimated residual echo signal;

and the first terminal carries out the initial gain calculation according to the prior signal echo ratio.

In a sixth implementation manner of the first aspect of the present application, after the performing, by the first terminal, residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal, the method further includes:

the first terminal generates scene identification information according to the reference signal, the picked-up signal and the target residual signal, wherein the scene identification information comprises at least one of the amplitude of echo reverberation, the distortion degree of acoustic equipment and the change of an acoustic path;

and the first terminal dynamically adjusts the estimated residual echo signal according to the scene identification information.

In a second aspect, an embodiment of the present application provides an acoustic echo cancellation device for a first terminal, where the first terminal performs voice communication with a second terminal, and the device includes:

a signal obtaining module, configured to pick up a signal including an echo signal generated by the first terminal playing a reference signal, where the reference signal is a voice signal received by the first terminal from the second terminal;

an adaptive filtering module for adaptively filtering the reference signal and the pickup signal using a kalman filter and a variable step Normalized Least Mean Square (NLMS) filter to obtain a first residual signal and a second residual signal, respectively;

a hybrid filtering module, configured to perform hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal;

the residual echo estimation module is used for carrying out residual echo estimation according to the estimated echo signal so as to obtain an estimated residual echo signal;

and the residual echo suppression module is used for performing residual echo suppression on the target residual signal according to the estimated residual echo signal so as to output a signal after echo suppression.

In a first implementation form of the second aspect of the present application, the hybrid filtering module further includes:

a first energy determination module, configured to determine an energy of the first residual signal and an energy of the second residual signal at a plurality of frequency points, respectively;

a first signal selecting module, configured to select a residual signal with smaller energy between the first residual signal and the second residual signal at each of the plurality of frequency points to obtain the target residual signal.

In a second implementation manner of the second aspect of the present application, the residual echo estimation module further includes:

an estimated echo signal determination module for jointly determining the estimated echo signal using the kalman filter and the variable step size NLMS filter;

a power spectrum determination module for determining a first echo power spectrum of the estimated echo signal;

the harmonic generation processing module is used for carrying out harmonic generation processing on the estimated echo signal so as to obtain a power spectrum after harmonic generation;

the frequency spectrum splicing module is used for carrying out frequency spectrum splicing on the power spectrum after the harmonic wave is generated and the first echo power spectrum to obtain a second echo power spectrum;

the smoothing module is used for smoothing the second echo power spectrum to obtain a third echo power spectrum;

a second signal selection module, configured to select, at each of the plurality of frequency points, an echo power spectrum with a larger energy at a frequency point between the third echo power spectrum and the second echo power spectrum, so as to obtain the estimated residual echo signal.

In a third implementation form of the second aspect of the present application, the residual echo suppression module further includes:

a second energy determination module for determining an energy of the target residual signal and an energy of the estimated residual echo signal;

an initial residual echo suppression module, configured to perform initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal, so as to obtain a signal after initial residual echo suppression;

a pitch signal detection module, configured to perform pitch signal detection on the signal after the initial residual echo suppression;

a harmonic enhancement module, configured to: when the pitch signal is detected in the initial residual echo suppressed signal, perform harmonic enhancement on the initial residual echo suppressed signal to obtain a harmonic enhanced signal;

a secondary residual echo suppression module, configured to perform secondary residual echo suppression on the harmonic enhanced signal through secondary gain calculation to obtain a signal subjected to secondary residual echo suppression;

a cepstrum smoothing module, configured to perform cepstrum smoothing on the signal after the secondary residual echo suppression to obtain a cepstrum smoothed signal;

and the final residual echo suppression module is used for performing final residual echo suppression on the smoothed signal through final gain calculation so as to obtain a signal subjected to echo suppression.

In a fourth implementation form of the second aspect of the application,

the cepstrum smoothing module is further configured to: when the pitch signal is not detected in the initial residual echo suppressed signal, perform cepstrum smoothing on the initial residual echo suppressed signal to obtain a cepstrum smoothed signal.

In a fifth implementation manner of the second aspect of the present application, the initial residual echo suppression module further includes:

a signal echo ratio determining module, configured to determine a prior signal echo ratio according to the target residual signal and the estimated residual echo signal;

and an initial gain calculation module, configured to perform the initial gain calculation according to the prior signal echo ratio.

In a sixth implementation manner of the second aspect of the present application, the first terminal further includes:

a scene recognition module for generating scene recognition information from the reference signal, the pickup signal, and the target residual signal, wherein the scene recognition information includes at least one of an amplitude of echo reverberation, a distortion degree of an acoustic device, and a variation of an acoustic path;

the residual echo estimation module is further configured to: after the scene recognition module generates the scene recognition information, dynamically adjust the estimated residual echo signal according to the scene recognition information.

In a third aspect, the present application provides a computer-readable storage medium containing instructions that, when executed on a computer, cause the computer to perform the method described in the first aspect or in any implementation manner of the first aspect.

The application provides a hybrid adaptive filter based on Kalman filtering and variable-step NLMS filtering, together with a multi-pass NLP method and an implicit single-talk/double-talk determination method, so that the residual echo suppression gain can be calculated more accurately without delay estimation. The solution can significantly improve the robustness of the acoustic echo cancellation algorithm and achieve a smooth full-duplex call.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the drawings required for describing the embodiments or the prior art. The drawings in the following description show only some embodiments of the application, and a person skilled in the art can derive other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an acoustic echo cancellation system applying an acoustic echo cancellation method according to an embodiment of the present application;

FIG. 2 is a flow chart of an acoustic echo cancellation method provided in accordance with an embodiment of the present application;

FIG. 3 is a flowchart of an acoustic echo cancellation method in a scene according to an embodiment of the present application;

FIG. 4 is a flowchart of residual echo suppression gain calculation in a scenario provided in accordance with an embodiment of the present application;

FIG. 5 is a flow chart of an acoustic echo cancellation method in another scenario provided in accordance with an embodiment of the present application;

FIG. 6 is a flow chart of an acoustic echo cancellation method in another scenario provided in accordance with an embodiment of the present application;

FIG. 7 is a flow chart of an acoustic echo cancellation method in another scenario provided in accordance with an embodiment of the present application;

FIG. 8 is a flow chart of an acoustic echo cancellation method in another scenario provided in accordance with an embodiment of the present application;

FIG. 9A is a schematic diagram of a first terminal provided in accordance with an embodiment of the present application;

FIG. 9B is a schematic diagram of an adaptive filtering module provided in accordance with an embodiment of the present application;

FIG. 9C is a schematic diagram of a hybrid filtering module provided in accordance with an embodiment of the present application;

FIG. 9D is a schematic diagram of a residual echo estimation module provided in accordance with an embodiment of the present application;

FIG. 9E is a schematic diagram of a residual echo suppression module provided in accordance with an embodiment of the present application;

FIG. 9F is a schematic diagram of an initial echo suppression module provided in accordance with an embodiment of the present application;

FIG. 9G is a schematic diagram of another first terminal provided in accordance with an embodiment of the present application;

FIG. 10 is a schematic diagram of an acoustic echo cancellation device applying an acoustic echo cancellation method according to an embodiment of the present application.

For the purpose of illustration, the drawings depict aspects of the example embodiments. Variations, alternative configurations, alternative components, and modifications may be made to the example embodiments.

Detailed Description of the Invention

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

FIG. 1 is a schematic diagram of an acoustic echo cancellation system applying an acoustic echo cancellation method according to an embodiment of the present application. Embodiments of the acoustic echo cancellation method described in the present application apply to dual-ended voice devices that communicate via a microphone and a speaker, for example, an acoustic echo cancellation system that applies the acoustic echo cancellation method. As shown in FIG. 1, the acoustic echo cancellation system comprises a first terminal 10 and a second terminal 11. In the embodiment of the present application, the first terminal 10 performs voice communication with the second terminal 11, for example, through a wireless network. In addition, the first terminal 10 may also perform voice communication with a plurality of terminals. The voice communication method between the first terminal 10 and the plurality of terminals is similar to that between the first terminal 10 and the second terminal 11; see the descriptions of the scenarios in the following embodiments of the present application.

FIG. 2 is a flowchart of an acoustic echo cancellation method according to an embodiment of the present application. As shown in FIG. 2, the method includes the following steps.

S201: the first terminal picks up a signal containing an echo signal generated by the first terminal playing a reference signal, wherein the reference signal is a voice signal received by the first terminal from the second terminal.

In this embodiment, the second terminal 11, as the far end, transmits a voice signal to the first terminal 10, as the near end, and the first terminal 10 receives the voice signal. The voice signal received by the first terminal 10 is defined as the reference signal, which is used as a reference in the acoustic echo cancellation of the following embodiments. The first terminal 10 is equipped with a speaker and a microphone: the speaker plays the voice signal received by the first terminal 10, and the microphone picks up the voice signal played by the speaker. The signal picked up by the microphone of the first terminal 10 is defined as the pickup signal, which may consist of, for example, an actual echo signal and/or near-end speech and/or background noise. If the actual echo signal were sent back to the second terminal 11, the user of the second terminal 11 would hear their own voice; the actual echo signal should therefore be suppressed.

S202: the first terminal performs adaptive filtering on the reference signal and the pickup signal by using a Kalman filter and a variable step Normalized Least Mean Square (NLMS) filter simultaneously to obtain a first residual signal and a second residual signal, respectively.

In this embodiment, the first terminal uses a Kalman filter and a variable-step NLMS filter simultaneously in the frequency domain to obtain robust adaptive filtering. The Kalman filter is also referred to as a Kalman adaptive filter. The Kalman filter converges quickly, and its convergence performance during double-talk is equivalent to that during single-talk, but its absolute filtering performance is weak. In contrast, the variable-step NLMS filter has strong absolute filtering performance once it has fully converged, but it converges slowly and its frequency points tend to diverge. Combining the two filters compensates for the drawbacks of each. Specifically, the Kalman filter keeps its convergence performance during double-talk because it changes the step size directly from the covariance matrix of the filtering factor error, which amounts to a fully automatic step-size change rather than a manually designed one; this makes up for the weakness of the conventional variable-step NLMS filter. Meanwhile, the hybrid adaptive filter based on the variable-step NLMS filter and the Kalman filter makes the frequency points diverge less and strengthens the filtering performance.

The main processes of the kalman filter and the variable-step NLMS filter in the hybrid adaptive filter are shown below, respectively.

The step of the first terminal performing adaptive filtering on the reference signal and the pickup signal by using a kalman filter and a variable step Normalized Least Mean Square (NLMS) filter at the same time in S202 to obtain a first residual signal and a second residual signal respectively specifically includes the following steps.

S10: The first terminal adaptively filters the reference signal and the pickup signal using the Kalman filtering factor updated in the previous frame, and outputs the first residual signal.

In this embodiment, the first residual signal may be composed of, for example, a residual echo signal of the Kalman filter and/or near-end speech and/or background noise. The filter operates on a plurality of frames of the reference signal and one frame of the pickup signal. Assuming N frames and M frequency bins, the Kalman filter has N × M filtering factors, where N and M are positive integers.
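As a hedged illustration of how N × M filtering factors could act on the signals, one common multi-frame frequency-domain structure forms the estimated echo at frequency bin m by summing over the N most recent reference frames. This structure is an assumption; the passage above does not spell out the filtering formula. The symbols $D_k(m)$ for the pickup signal and $\hat{D}_k(m)$ for the estimated echo are introduced here for illustration only, while $H$, $X$ and $Y$ denote the reference signal, the Kalman filtering factor and the residual signal:

$$\hat{D}_k(m) = \sum_{n=0}^{N-1} H_{k-n}(m)\, X^{(n)}_{k-1|k-1}(m), \qquad Y_k(m) = D_k(m) - \hat{D}_k(m), \qquad m = 1, \dots, M$$

Under this assumption, $H_{k-n}(m)$ is the reference signal n frames back, $X^{(n)}_{k-1|k-1}(m)$ is the filtering factor applied to it, and $Y_k(m)$ is the first residual signal at bin m.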

S11: The first terminal calculates the covariance matrix of the residual signal of the Kalman filter using the covariance matrix of the filtering factor error.

In this embodiment, the residual signal of the Kalman filter is the difference between the actual echo signal in the pickup signal and the echo signal estimated by the Kalman filter. The covariance matrix of the residual signal of the Kalman filter is calculated by:

$$S_k = H_k P_{k-1|k-1} H_k^T + R_k$$

where $S_k$ is the covariance matrix of the residual signal of the Kalman filter, $P_{k-1|k-1}$ is the covariance matrix of the filtering factor error, $H_k$ is the reference signal, $R_k$ is the noise signal, and $k$ is the index of the current frame.

S12: The first terminal calculates the Kalman gain using the covariance matrix of the residual signal of the Kalman filter.

In this embodiment, the Kalman gain is calculated by the following equation:

$$K_k = P_{k-1|k-1} H_k^T S_k^{-1}$$

where $K_k$ is the Kalman gain, $S_k$ is the covariance matrix of the residual signal of the Kalman filter, $P_{k-1|k-1}$ is the covariance matrix of the filtering factor error, and $H_k$ is the reference signal.

S13: The first terminal updates the Kalman filtering factor using the Kalman gain.

In this embodiment, the Kalman filtering factor is calculated by the following equation:

$$X_{k|k} = X_{k-1|k-1} + K_k Y_k$$

where $X_{k|k}$ is the updated Kalman filtering factor, $X_{k-1|k-1}$ is the Kalman filtering factor before the update, $K_k$ is the Kalman gain, and $Y_k$ is the filtered residual signal.

S14: The first terminal updates the covariance matrix of the Kalman filtering factor error for the next frame.

In this embodiment, the updated covariance matrix of the Kalman filtering factor error is calculated by the following equation:

$$P_{k|k} = (I - K_k H_k) P_{k-1|k-1} + Q_k$$

where $P_{k|k}$ is the updated covariance matrix of the Kalman filtering factor error, $P_{k-1|k-1}$ is the covariance matrix of the Kalman filtering factor error before the update, $K_k$ is the Kalman gain, $H_k$ is the reference signal, and $Q_k$ is the expected variance of $K_k Y_k$.
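Steps S10 to S14 can be sketched for a single frequency bin as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: it assumes one scalar complex filtering factor per bin (N = 1), uses the complex conjugate in place of the transpose, and treats the noise term `r` and the term `q` as fixed constants chosen for illustration. The function names `kalman_bin_update` and `run_kalman_demo` and all demo values are hypothetical.

```python
def kalman_bin_update(x_prev, p_prev, h, d, r, q):
    """One Kalman update for one frequency bin (scalar, single-tap sketch).

    x_prev: filtering factor from the previous frame (complex)
    p_prev: variance of the filtering factor error from the previous frame
    h: reference signal at this bin; d: pickup signal at this bin
    r, q: assumed constant noise terms (illustrative values only)
    """
    h = complex(h)
    y = d - h * x_prev                          # S10: residual using the previous factor
    s = (h * p_prev * h.conjugate()).real + r   # S11: S = H P H^T + R
    k = p_prev * h.conjugate() / s              # S12: K = P H^T S^-1
    x_new = x_prev + k * y                      # S13: X = X + K Y
    p_new = ((1 - k * h) * p_prev).real + q     # S14: P = (I - K H) P + Q
    return x_new, p_new, y


def run_kalman_demo(frames=20):
    """Track a fixed, noise-free echo path; returns (estimate, true_path)."""
    true_path = 0.5 + 0.2j                          # hypothetical echo path gain
    references = [1 + 0j, 0.5 + 0.5j, -0.3 + 0.8j]  # hypothetical reference bins
    x, p = 0j, 1.0
    for i in range(frames):
        h = references[i % len(references)]
        d = h * true_path                           # pickup contains echo only
        x, p, _ = kalman_bin_update(x, p, h, d, r=1e-3, q=1e-6)
    return x, true_path


if __name__ == "__main__":
    estimate, target = run_kalman_demo()
    print(abs(estimate - target))  # distance between estimate and true path
```

In this sketch the effective step size is the gain `k`, which shrinks automatically as the error variance `p` shrinks, matching the description of the Kalman filter as a fully automatic step-size change.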

S20: The first terminal adaptively filters the reference signal and the pickup signal according to a first variable-step NLMS filtering factor to obtain the second residual signal.

In this embodiment, the first terminal adaptively filters the reference signal and the pickup signal according to the variable-step NLMS filtering factor updated in the previous frame to obtain the second residual signal. The second residual signal may consist of, for example, a residual echo signal of the variable-step NLMS filter and/or near-end speech and/or background noise. Generally speaking, a speech frame lasts 10 to 30 milliseconds; for example, for a signal sampled at 8 kHz, a 10 ms frame contains 80 samples.
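The frame-length arithmetic can be checked with a short calculation. The helper name `samples_per_frame` is hypothetical and used only for illustration.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


print(samples_per_frame(8000, 10))  # 80
print(samples_per_frame(8000, 30))  # 240
```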

S21: The first terminal determines the smoothed energy of the reference signal and the smoothed energy of the second residual signal, determines the low-speed smoothed energy of the second residual signal, determines the frequency points at which the smoothed energy of the second residual signal is greater than its low-speed smoothed energy, and applies a frequency point constraint to the determined frequency points to generate a third residual signal.

In this embodiment, the first terminal calculates the smoothed energy of the reference signal and the smoothed energy of the second residual signal, where the smoothing coefficient may be, for example, 0.9. The first terminal also calculates the low-speed smoothed energy of the second residual signal, where the smoothing coefficient may be, for example, 0.98, and determines the frequency points at which the smoothed energy of the second residual signal is greater than its low-speed smoothed energy. The first terminal then applies a frequency point constraint to the determined frequency points to generate a third residual signal.
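The two smoothed energies and the comparison between them can be sketched as follows. The sketch assumes first-order recursive smoothing, which is a common form but is not stated in the text; only the coefficients 0.9 and 0.98 come from the description. The frequency point constraint itself is not specified in this passage, so the sketch stops at flagging the frequency points. The names `smooth` and `track_residual_energy` are hypothetical.

```python
FAST_COEFF = 0.9    # example smoothing coefficient from the description
SLOW_COEFF = 0.98   # example low-speed smoothing coefficient from the description


def smooth(prev_energy, new_energy, coeff):
    """First-order recursive smoothing (assumed form)."""
    return coeff * prev_energy + (1.0 - coeff) * new_energy


def track_residual_energy(fast, slow, residual_bins):
    """Update both smoothed energies per bin and flag bins where fast > slow."""
    new_fast, new_slow, flags = [], [], []
    for f, s, e in zip(fast, slow, residual_bins):
        energy = abs(e) ** 2          # instantaneous energy of this bin
        f = smooth(f, energy, FAST_COEFF)
        s = smooth(s, energy, SLOW_COEFF)
        new_fast.append(f)
        new_slow.append(s)
        flags.append(f > s)           # residual energy is rising at this bin
    return new_fast, new_slow, flags


if __name__ == "__main__":
    fast, slow = [0.0, 0.0], [0.0, 0.0]
    fast, slow, flags = track_residual_energy(fast, slow, [1 + 0j, 0j])
    print(flags)  # [True, False]
```

Because the fast average reacts more quickly than the slow one, a bin is flagged when its residual energy has recently increased.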

S22: The first terminal adjusts the filtering step size according to the third residual signal and a preset threshold to obtain an adjusted filtering step size.

In this embodiment, the first terminal adjusts the filtering step size using the third residual signal and a preset threshold to obtain the adjusted filtering step size, where the preset threshold may be, for example, 2e^-6.

S23: the first terminal determines a second variable step size NLMS filtering factor according to the smooth energy of the reference signal, the smooth energy of the second residual signal and the adjusted filtering step size; and updating the variable step size NLMS filter according to the second variable step size NLMS filter factor.

In this embodiment, the first terminal calculates a new variable step size NLMS filtering factor according to the smoothed energy of the reference signal, the smoothed energy of the second residual signal, and the adjusted filtering step size, and updates the variable step size NLMS filter according to the new filtering factor.
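The text does not give the update formula, so the following is only a textbook per-frequency-point NLMS step, shown to illustrate how a step size and a reference energy normalisation enter a coefficient update. It is not the application's variable step size rule: the single complex tap per frequency point, the names `nlms_update` and `ref_energy`, and the omission of the residual energy term are all assumptions.

```python
import numpy as np


def nlms_update(w, x, d, mu, ref_energy, eps=1e-10):
    """One textbook NLMS step per frequency point (one complex tap each).

    w: current filter coefficients, x: reference spectrum, d: pickup spectrum,
    mu: step size, ref_energy: (smoothed) reference energy used for normalisation.
    """
    e = d - w * x                                   # residual with the current coefficients
    w_new = w + mu * np.conj(x) * e / (ref_energy + eps)
    return w_new, e


# Identify a fixed echo path h from noise-free data.
rng = np.random.default_rng(0)
h = np.array([0.5 + 0.2j, -0.3j])
w = np.zeros(2, dtype=complex)
for _ in range(60):
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    w, e = nlms_update(w, x, h * x, 0.5, np.abs(x) ** 2)
```

In a variable step size scheme, `mu` would be recomputed every frame (S22) before this update is applied.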

S203: the first terminal performs hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal.

In this embodiment, after the first terminal performs adaptive filtering using a hybrid adaptive filter based on the Kalman filter and the variable-step NLMS filter, the first terminal may determine a target residual signal of the hybrid adaptive filter according to the first residual signal and the second residual signal. The target residual signal may consist of, for example, an actual residual echo signal of the hybrid adaptive filter and/or near-end speech and/or background noise. The target residual signal may represent the difference between the pickup signal and an estimated echo signal, where the estimated echo signal is estimated by the hybrid adaptive filter combining the Kalman filter and the variable-step NLMS filter.

The step of performing, by the first terminal, hybrid filtering processing on the first residual signal and the second residual signal in S203 to obtain a target residual signal specifically includes the following steps.

S30: the first terminal calculates the energy of the first residual signal and the energy of the second residual signal at a plurality of frequency points, respectively.

S31: the first terminal selects a residual signal with smaller energy between the first residual signal and the second residual signal at each of the plurality of frequency points to obtain the target residual signal.
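Steps S30 and S31 can be sketched as a per-frequency-point selection. The function name `select_min_energy` and the tie-breaking choice (keep the first residual when the energies are equal) are assumptions.

```python
import numpy as np


def select_min_energy(first_residual, second_residual):
    """At each frequency point keep the residual with the smaller energy."""
    e1 = np.asarray(first_residual, dtype=complex)
    e2 = np.asarray(second_residual, dtype=complex)
    return np.where(np.abs(e1) ** 2 <= np.abs(e2) ** 2, e1, e2)


kalman_residual = np.array([1 + 0j, 3j, 0.5])
nlms_residual = np.array([2 + 0j, 1 + 0j, -0.5])
target_residual = select_min_energy(kalman_residual, nlms_residual)
```

A frequency point where one filter has diverged therefore falls back to the output of the other filter.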

S204: the first terminal performs residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal.

In this embodiment, when the first terminal obtains the target residual signal, the first terminal may further obtain an estimated echo signal estimated by the hybrid adaptive filter combining the Kalman filter and the variable-step NLMS filter. Based on the obtained estimated echo signal, the first terminal estimates the actual residual echo signal contained in the target residual signal; the result of this estimation is defined as the estimated residual echo signal.

In S204, the step of performing residual echo estimation by the first terminal according to the estimated echo signal to obtain an estimated residual echo signal specifically includes the following steps.

S40: the first terminal jointly determines the estimated echo signal using the Kalman filter and the variable step size NLMS filter.

In this embodiment, the estimated echo signal may be obtained by calculating the difference between the pickup signal and the target residual signal. The estimated echo signal may also be obtained by combining a first estimated echo signal estimated by the Kalman filter and a second estimated echo signal estimated by the variable-step NLMS filter.

S41: the first terminal determines a first echo power spectrum of the estimated echo signal.

S42: the first terminal performs harmonic generation processing on the estimated echo signal to obtain a power spectrum after harmonic generation.

S43: the first terminal performs spectrum splicing on the power spectrum after harmonic generation and the first echo power spectrum to obtain a second echo power spectrum.

S44: the first terminal smooths the second echo power spectrum to obtain a third echo power spectrum.

S45: the first terminal selects an echo power spectrum with larger frequency point energy between the third echo power spectrum and the second echo power spectrum at each of the plurality of frequency points to obtain the estimated residual echo signal.

In this embodiment, the first terminal executes S41 and S42, which may be performed in either order. To perform harmonic generation processing on the estimated echo signal, the first terminal may, for example, calculate the energy of the estimated echo signal and apply an inverse Fourier transform. The absolute value of the real part of the result is then taken and transformed back to the frequency domain, and the real part of that transform is taken to complete the harmonic generation. Harmonic generation can compensate, to a certain extent, for the spectral inconsistency between the reference signal and the pickup signal caused by nonlinear distortion, so that nonlinear echo is suppressed. Next, the power spectrum after harmonic generation is spliced with the first echo power spectrum to obtain a second echo power spectrum. Finally, the second echo power spectrum is smoothed to obtain a third echo power spectrum, and at each of the plurality of frequency points the larger of the third echo power spectrum and the second echo power spectrum is used as the power spectrum of the estimated residual echo signal. Through smoothing, the echo tail of a reverberant scene can be estimated in the energy spectrum of the pickup signal, and this type of reverberant echo can be suppressed by the multiple nonlinear suppression stages.
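Steps S41 to S45 can be sketched as follows. Several details are not specified in the text and are assumptions here: the spectrum is one-sided (as produced by `numpy.fft.rfft`), splicing keeps the first echo power spectrum below a hypothetical `split_bin` and the harmonic spectrum above it, smoothing is a first-order recursion over frames with a hypothetical coefficient, and negative values after harmonic generation are clipped to zero.

```python
import numpy as np


def harmonic_generation(echo_spec):
    """Energy -> inverse FFT -> absolute value -> FFT -> real part."""
    power = np.abs(echo_spec) ** 2
    n = 2 * (len(echo_spec) - 1)             # length of the underlying real frame
    time_like = np.fft.irfft(power, n=n)     # real-valued by construction
    rectified = np.abs(time_like)            # the nonlinearity that creates harmonics
    harmonic = np.fft.rfft(rectified).real
    return np.maximum(harmonic, 0.0)         # assumed safeguard: keep the spectrum non-negative


def estimate_residual_echo_psd(echo_spec, prev_third, split_bin, coef=0.9):
    """Return (estimated residual echo power spectrum, third echo power spectrum)."""
    first = np.abs(echo_spec) ** 2                                       # S41
    harmonic = harmonic_generation(echo_spec)                            # S42
    second = np.concatenate([first[:split_bin], harmonic[split_bin:]])   # S43 (assumed splice)
    third = coef * prev_third + (1.0 - coef) * second                    # S44 (assumed recursion)
    return np.maximum(second, third), third                              # S45


rng = np.random.default_rng(1)
spec = np.fft.rfft(rng.standard_normal(64))
est, third = estimate_residual_echo_psd(spec, np.zeros(33), split_bin=16)
# A silent following frame still yields a non-zero estimate: the smoothed tail.
est_tail, _ = estimate_residual_echo_psd(np.zeros(33, dtype=complex), third, split_bin=16)
```

Taking the per-point maximum in S45 lets the instantaneous spectrum dominate while echo is present and lets the smoothed spectrum cover the reverberant tail afterwards.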

S205: the first terminal performs residual echo suppression on the target residual signal according to the estimated residual echo signal to output an echo suppressed signal.

In this embodiment, after the first terminal estimates the residual echo signal, the first terminal performs residual echo suppression on the target residual signal according to the estimated residual echo signal, and outputs an echo-suppressed signal through gain calculation. The residual echo suppression may be performed one or more times, with one gain calculation for each suppression, as explained below. The gain calculation may be performed by, for example, a maximum a posteriori (MAP) method, whose input parameters are an a priori signal-to-echo ratio, an a posteriori signal-to-echo ratio, a gain adjustment constant, and a suppression strength limiting parameter. The gain calculation may also be performed by other methods, such as a Wiener filter.
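As an illustration of the Wiener alternative mentioned above (not the MAP method), a gain can be computed per frequency point from the a priori signal-to-echo ratio. The gain floor used here is an assumed stand-in for a suppression strength limit, and the name `wiener_gain` and the default value 0.05 are hypothetical.

```python
import numpy as np


def wiener_gain(prior_ser, gain_floor=0.05):
    """Wiener-style gain xi / (1 + xi), limited from below by gain_floor."""
    xi = np.maximum(np.asarray(prior_ser, dtype=float), 0.0)
    return np.maximum(xi / (1.0 + xi), gain_floor)


gains = wiener_gain([0.0, 1.0, 9.0])
```

The suppressed spectrum is then the target residual spectrum multiplied point by point by these gains.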

In S205, the step of performing residual echo suppression on the target residual signal by the first terminal according to the estimated residual echo signal to output an echo-suppressed signal includes the following specific steps.

S50: the first terminal determines an energy of the target residual signal and an energy of the estimated residual echo signal.

S51: the first terminal performs initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal to obtain a signal after the initial residual echo suppression.

In this embodiment, the first terminal attempts to cancel the actual residual echo signal of the hybrid adaptive filter in the target residual signal using the estimated residual echo signal.

S52: the first terminal performs pitch signal detection on the signal after the initial residual echo suppression.

In this embodiment, a pitch signal detection method is used to determine whether a pitch component exists in the signal after the initial residual echo suppression. This step is in effect an implicit single-talk/double-talk detection method: if a pitch is detected, near-end speech is present.

S53: when the pitch signal is detected in the initial residual echo suppressed signal, the first terminal performs harmonic enhancement on the initial residual echo suppressed signal to obtain a harmonic enhanced signal.

In this embodiment, the pitch signal detection may be performed by comparing the energy of the signal after the initial residual echo suppression with a preset pitch signal detection threshold. If the energy is greater than the preset pitch signal detection threshold, the pitch signal is detected. If the energy is smaller than the threshold, the pitch signal is not detected. The pitch signal detection in effect serves as an implicit single-talk/double-talk detection method.

S54: the first terminal performs secondary residual echo suppression on the harmonic enhanced signal through secondary gain calculation to obtain a signal after the secondary residual echo suppression.

In this embodiment, harmonic enhancement is performed during double-talk, so that damaged near-end speech can be partially restored, thereby improving the subjective listening quality during double-talk.

S55: the first terminal performs cepstrum smoothing on the signal after the secondary residual echo suppression to obtain a cepstrum smoothed signal.

In this embodiment, the cepstrum smoothing process may further suppress residual echoes in the target residual signal. The cepstrum smoothing of the signal after the suppression of the secondary residual echo is actually a smoothing of the prior signal echo ratio.

S56: the first terminal performs final residual echo suppression on the smoothed signal through final gain calculation to obtain an echo suppressed signal.

In this embodiment, steps S50 to S56 show the process of multiple nonlinear suppression, so that the residual echo suppression gain can be calculated more accurately without delay estimation. Steps S50 to S56 also describe the implicit single-talk/double-talk determination method. The embodiment of the application therefore significantly improves the robustness of the acoustic echo cancellation algorithm and achieves a smooth full-duplex call.
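The text states that the cepstrum smoothing in S55 is in effect a smoothing of the a priori signal-to-echo ratio, but gives no formula. The sketch below uses one common form, recursive smoothing over frames in the cepstral domain with lighter smoothing at low quefrencies; this form and every parameter name and default value are assumptions, not the application's method.

```python
import numpy as np


def cepstral_smooth(prior_ser, prev_cepstrum, low_quefrency_bins=8,
                    beta_low=0.2, beta_high=0.8, eps=1e-12):
    """Smooth a one-sided a priori signal-to-echo ratio in the cepstral domain."""
    log_ser = np.log(np.maximum(prior_ser, eps))
    n = 2 * (len(prior_ser) - 1)
    cepstrum = np.fft.irfft(log_ser, n=n)            # real cepstrum of the log ratio
    beta = np.full(n, beta_high)                     # heavy smoothing by default
    beta[:low_quefrency_bins] = beta_low             # light smoothing for the envelope
    beta[n - low_quefrency_bins + 1:] = beta_low     # mirrored (symmetric) part
    smoothed_cepstrum = beta * prev_cepstrum + (1.0 - beta) * cepstrum
    smoothed_ser = np.exp(np.fft.rfft(smoothed_cepstrum).real)
    return smoothed_ser, smoothed_cepstrum


# In steady state (previous cepstrum equals the current one) the ratio is unchanged.
ser = np.full(17, 2.0)
steady_state = np.fft.irfft(np.log(ser), n=32)
smoothed, state = cepstral_smooth(ser, steady_state)
```

The returned cepstrum is carried over as `prev_cepstrum` for the next frame.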

In this embodiment, after the first terminal detects the pitch signal of the signal after the initial residual echo suppression, the method includes the following steps.

S60: when the pitch signal is not detected in the initial residual echo suppressed signal, the first terminal performs cepstrum smoothing on the initial residual echo suppressed signal to obtain a cepstrum smoothed signal.

In this embodiment, the cepstrum smoothing performed on the signal after the initial residual echo suppression is in effect a smoothing of the a priori signal-to-echo ratio. The pitch signal detection may be performed by comparing the energy of the signal after the initial residual echo suppression with a preset pitch signal detection threshold. If the energy is greater than the preset pitch signal detection threshold, the pitch signal is detected. If the energy is smaller than the threshold, the pitch signal is not detected. The pitch signal detection in effect serves as an implicit single-talk/double-talk detection method.

S61: the first terminal performs final residual echo suppression on the cepstrum smoothed signal through final gain calculation to obtain an echo suppressed signal.

In this embodiment, the cepstrum smoothing process may further suppress residual echoes in the target residual signal.

For S52, if the pitch signal is detected, execution of S53 is triggered; if the pitch signal is not detected, execution of S60 is triggered. When S60 is triggered, cepstrum smoothing is performed directly on the signal after the initial residual echo suppression to obtain a cepstrum smoothed signal. Finally, the final gain calculation is performed on the smoothed signal to carry out the final residual echo suppression and obtain the echo suppressed signal.
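The branching between S53 and S60 can be sketched as plain control flow. The processing stages are passed in as callables because their internals are described elsewhere; the function names, the use of a single `gain` callable for all three gain stages, and the returned step trace are illustrative assumptions.

```python
import numpy as np


def energy_pitch_detect(spectrum, threshold):
    """Energy-threshold pitch decision as described in the text."""
    return float(np.sum(np.abs(spectrum) ** 2)) > threshold


def multi_stage_suppression(target, echo_est, gain, detect_pitch, enhance, smooth):
    """Run the suppression stages and return (output, list of executed steps)."""
    path = ["S51"]
    signal = gain(target, echo_est)                  # initial suppression
    if detect_pitch(signal):                         # S52: pitch found, double-talk
        signal = gain(enhance(signal), echo_est)     # S53 then S54
        path += ["S53", "S54", "S55", "S56"]
    else:                                            # S52: no pitch, single-talk
        path += ["S60", "S61"]
    signal = smooth(signal)                          # cepstrum smoothing (S55 or S60)
    return gain(signal, echo_est), path              # final suppression (S56 or S61)


# Toy stand-ins for the real stages, only to exercise both branches.
halve = lambda s, e: 0.5 * s
detect = lambda s: energy_pitch_detect(s, threshold=1.0)
boost = lambda s: s + 1.0
identity = lambda s: s

out_dt, path_dt = multi_stage_suppression(np.array([4.0]), None, halve, detect, boost, identity)
out_st, path_st = multi_stage_suppression(np.array([1.0]), None, halve, detect, boost, identity)
```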

In S51, the step in which the first terminal performs initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal, to obtain a signal after the initial residual echo suppression, specifically includes the following steps.

S71: the first terminal determines an a priori signal-to-echo ratio according to the target residual signal and the estimated residual echo signal.

S72: the first terminal performs the initial gain calculation according to the a priori signal-to-echo ratio.

In this embodiment, after the first terminal determines the energy of the target residual signal and the energy of the estimated residual echo signal, the first terminal calculates an a priori signal-to-echo ratio using a decision-directed (DD) method. The a priori signal-to-echo ratio is the ratio of the signal energy to the echo energy; expressed in dB, it is ten times the base-10 logarithm of that ratio. For example, based on the target residual signal and the estimated residual echo signal, the first terminal may estimate the near-end speech, or a combination of near-end speech and background noise, so that the a priori signal-to-echo ratio can be determined from the estimated residual echo signal together with the estimated near-end speech or the estimated near-end speech and background noise. The concept of the signal-to-echo ratio is analogous to that of the signal-to-noise ratio. After the a priori signal-to-echo ratio is calculated, the initial gain is calculated and applied using a Wiener filter so as to perform initial residual echo suppression on the target residual signal.
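The decision-directed estimate can be sketched in its standard form, which combines the previous frame's suppressed-signal power with the current a posteriori ratio. The text names the method but not its parameters, so the weighting factor `alpha`, its default value, and the function names are assumptions.

```python
import numpy as np


def decision_directed_ser(prev_output_power, residual_power, echo_power,
                          alpha=0.98, eps=1e-12):
    """Return (a priori, a posteriori) signal-to-echo ratios per frequency point."""
    posterior = residual_power / (echo_power + eps)
    prior = (alpha * prev_output_power / (echo_power + eps)
             + (1.0 - alpha) * np.maximum(posterior - 1.0, 0.0))
    return prior, posterior


def to_db(ratio, eps=1e-12):
    """Energy ratio in dB: ten times the base-10 logarithm."""
    return 10.0 * np.log10(np.maximum(ratio, eps))


prior, posterior = decision_directed_ser(
    np.array([4.0]), np.array([5.0]), np.array([1.0]), alpha=0.5)
prior_db = to_db(np.array([100.0]))
```

The resulting a priori ratio is what a gain rule such as the Wiener filter takes as input in S72.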

In this embodiment, after the first terminal performs residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal, the following steps are included.

S81: the first terminal generates scene identification information according to the reference signal, the picked-up signal and the target residual signal, wherein the scene identification information comprises at least one of the amplitude of echo reverberation, the distortion degree of an acoustic device and the change of an acoustic path.

In this embodiment, the echo reverberation is mainly the tail of the echo direct sound. The echo direct sound is an element contained in the pickup signal. In contrast, the tail is an element not contained in the pickup signal; it is purely the reverberation produced by reflections in the near-end acoustic environment. The degree of distortion is used to judge the difference between the curve of the echo amplitude-frequency response and the curve of the pickup signal. If there is no distortion, the two curves are parallel to each other. The change of the acoustic path refers to a change in the acoustic path from the pickup signal to the microphone caused by blocking of the microphone, blocking of the speaker, or sound reflection.

S82: the first terminal dynamically adjusts the estimated residual echo signal according to the scene identification information.

In this embodiment, a scene recognition processing method is added. Scene recognition is used to determine at least one of the magnitude of the echo reverberation, the degree of distortion of the acoustic device, and the change of the acoustic path from the reference signal, the pickup signal, and the output of the hybrid filter. The estimated residual echo signal is dynamically adjusted according to this information, so that smooth double-talk is ensured in an ideal scene and acoustic echo cancellation is ensured in a severe scene.

The application provides a hybrid adaptive filter based on Kalman filtering and variable step size NLMS filtering to provide robust adaptive filtering capability. The convergence performance of the Kalman adaptive filter in double-talk is equivalent to that in single-talk, which makes up for the defects of a conventional variable step size NLMS filter. By combining the output results of the variable step size NLMS filter and the Kalman filter, the advantages of both filters can be exploited, the number of divergent frequency points is reduced, and the filtering capability is stronger. In addition, the application provides an echo estimation method based on harmonic generation and echo smoothing, which is used to suppress nonlinear echo and the echo reverberation tail. Harmonic generation can compensate, to a certain extent, for the spectral inconsistency between the reference signal and the pickup signal caused by nonlinear distortion, so that nonlinear echo can be suppressed. Furthermore, the echo tail of a reverberant scene may be estimated in the energy spectrum of the pickup signal by smoothing, and this type of reverberant echo may be suppressed by multiple nonlinear suppression.

In addition, the application provides a method for multiple nonlinear suppression, which comprises a double-talk harmonic enhancement method and a cepstrum smoothing gain calculation method based on pitch signal detection, so that near-end speech is well protected during double-talk and less echo remains after suppression. The pitch signal detection performed on the residual signal is in effect an implicit single-talk/double-talk determination process. Harmonic enhancement is performed during double-talk, so that damaged near-end speech can be partially restored to improve the subjective listening quality during double-talk. Cepstrum smoothing may further suppress residual echo.

Fig. 3 is a flowchart of an acoustic echo cancellation method in a scene according to an embodiment of the present application. As shown in fig. 3, the method includes:

S301: the first terminal acquires a reference signal and a pickup signal, wherein the reference signal is a voice signal transmitted from the second terminal to the first terminal, and the pickup signal is acquired by the first terminal through voice pickup of the voice signal played from the first terminal.

S302: the first terminal performs adaptive filtering on the reference signal and the pickup signal by using a Kalman filter and a variable step Normalized Least Mean Square (NLMS) filter simultaneously to obtain a first residual signal and a second residual signal, respectively.

S303: the first terminal performs hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal.

In S304, the first terminal performs residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal, where S304 specifically includes:

the first terminal determines the estimated echo signal jointly estimated by the Kalman filter and the variable step size NLMS filter; the first terminal calculates a first echo power spectrum of the estimated echo signal; the first terminal performs harmonic generation processing on the estimated echo signal to obtain a power spectrum after harmonic generation; the first terminal carries out frequency spectrum splicing on the power spectrum after the harmonic wave is generated and the first echo power spectrum to obtain a second echo power spectrum; the first terminal carries out smoothing processing on the second echo power spectrum to obtain a third echo power spectrum; the first terminal selects an echo power spectrum with larger frequency point energy between the third echo power spectrum and the second echo power spectrum at each of a plurality of frequency points to obtain the estimated residual echo signal.

S305: the first terminal performs residual echo suppression on the target residual signal multiple times according to the estimated residual echo signal to output an echo suppressed signal.

In this embodiment, the above acoustic echo cancellation method combines the Kalman adaptive filter and the variable step size NLMS filter to achieve robust adaptive filtering capability. Specifically, the convergence performance of the Kalman adaptive filter in double-talk is equivalent to that in single-talk, which makes up for the defects of the conventional variable step size NLMS filter. By combining the output results of the variable step size NLMS filter and the Kalman filter, the advantages of both filters can be exploited, the number of divergent frequency points is reduced, and the filtering capability is enhanced. In addition, the above acoustic echo cancellation method can compensate, to some extent, for the spectral inconsistency between the reference signal and the pickup signal caused by nonlinear distortion through harmonic generation, thereby suppressing the nonlinear echo.

Fig. 4 is a flowchart of residual echo suppression gain calculation in a scene according to an embodiment of the present application. As shown in fig. 4, the residual echo suppression gain calculation includes:

S401: the first terminal determines an energy of the target residual signal and an energy of the estimated residual echo signal.

S402: the first terminal performs initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal to obtain a signal after the initial residual echo suppression, wherein the initial gain calculation is performed according to an a priori signal-to-echo ratio, and the a priori signal-to-echo ratio is determined according to the target residual signal and the estimated residual echo signal.

S403: the first terminal performs pitch signal detection on the signal after the initial residual echo suppression, wherein the pitch signal detection may be performed by comparing the energy of the signal after the initial residual echo suppression with a preset pitch signal detection threshold. If the energy is greater than the preset pitch signal detection threshold, the pitch signal is detected. If the energy is smaller than the threshold, the pitch signal is not detected.

S404: when the pitch signal is detected in the initial residual echo suppressed signal, the first terminal performs harmonic enhancement on the initial residual echo suppressed signal.

S405: the first terminal performs secondary residual echo suppression on the harmonic enhanced signal through secondary gain calculation to obtain a signal after the secondary residual echo suppression.

S406: when no pitch signal is detected in the signal after the secondary residual echo suppression, the first terminal performs cepstrum smoothing on the signal after the secondary residual echo suppression to obtain a cepstrum smoothed signal.

S407: the first terminal performs final residual echo suppression on the smoothed signal through final gain calculation to obtain an echo suppressed signal.

In this embodiment, the multiple nonlinear suppression may include a double-talk harmonic enhancement method based on pitch signal detection and a cepstrum smoothing gain calculation method, so that near-end speech is well protected during double-talk and less echo remains after suppression. The pitch signal detection performed on the residual signal is in effect an implicit single-talk/double-talk determination process. Harmonic enhancement is performed during double-talk, so that damaged near-end speech can be partially restored to improve the subjective listening quality during double-talk. The cepstrum smoothing may further suppress residual echo.

Fig. 5 is a flowchart of an acoustic echo cancellation method in another scenario provided according to an embodiment of the present application. The embodiment shown in fig. 5 differs from the embodiments shown in fig. 3 and 4 mainly in that a scene recognition module is added. The scene recognition module may identify whether the echo is generated in an ideal scene or in a severe scene. The first terminal determines the magnitude of the echo reverberation, the degree of distortion of the acoustic device, and the change of the acoustic path according to the reference signal, the pickup signal, and the output of the hybrid filter. The estimated residual echo value is dynamically adjusted according to this information, so that smooth double-talk is ensured in an ideal scene and acoustic echo cancellation is ensured in a severe scene.

Fig. 6 is a flowchart of an acoustic echo cancellation method in another scenario provided according to an embodiment of the present application. The main difference between the embodiment shown in fig. 6 and the embodiments shown in fig. 3 and 4 is that the echo suppression processing is performed with only a single gain calculation and application. This embodiment is the simplest processing in the application, is mainly used to reduce computational overhead, and is suitable for scenes with relatively fixed echoes, such as an earphone mode or a handheld mode of a mobile phone.

Fig. 7 is a flowchart of an acoustic echo cancellation method in another scenario provided according to an embodiment of the present application. The main difference between the embodiment shown in fig. 7 and the embodiments shown in fig. 3 and 4 is that the echo suppression processing is performed with only a single gain calculation and application, and a scene recognition module is added. This embodiment is suitable for scenes with small distortion of the acoustic device and large variation of the echo path.

Fig. 8 is a flowchart of an acoustic echo cancellation method in another scenario provided according to an embodiment of the present application. The main difference between the embodiment shown in fig. 8 and the embodiments shown in fig. 3 and 4 is the addition of an additional Kalman filter after the hybrid filtering process. This embodiment is suitable for scenes that require front-end enhancement for speech recognition.

In the technical scheme provided by the embodiment of the application, the hybrid adaptive filtering has extremely high stability, high convergence speed and accurate residual echo estimation. Thus, the present application may be used for devices and applications that require acoustic echo cancellation, for example, products and applications such as notebook computers, tablet telephony, video conferencing systems, voice recognition, and front-end enhancements.

Fig. 9A is a schematic diagram of a first terminal provided according to an embodiment of the present application. As shown in fig. 9A, the first terminal is specifically a first terminal 900, where the first terminal 900 performs voice communication with a second terminal, and includes: a signal acquisition module 901, an adaptive filtering module 902, a hybrid filtering module 903, a residual echo estimation module 904, and a residual echo suppression module 905.

The signal obtaining module 901 is configured to pick up a signal including an echo signal generated by the first terminal playing a reference signal, where the reference signal is a voice signal received by the first terminal from the second terminal.

The adaptive filtering module 902 is configured to adaptively filter the reference signal and the pickup signal using a Kalman filter and a variable step Normalized Least Mean Square (NLMS) filter to obtain a first residual signal and a second residual signal, respectively.

The hybrid filtering module 903 is configured to perform hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal.

The residual echo estimation module 904 is configured to perform residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal.

The residual echo suppression module 905 is configured to perform residual echo suppression on the target residual signal according to the estimated residual echo signal, so as to output an echo-suppressed signal.

Fig. 9B is a schematic diagram of an adaptive filtering module according to an embodiment of the present application. As shown in fig. 9B, the adaptive filtering module 902 further includes: a variable step size NLMS filtering module 9021, a residual signal generating module 9022, a step size adjusting module 9023, and a coefficient updating module 9024.

The variable step NLMS filtering module 9021 performs adaptive filtering on the reference signal and the pickup signal according to a first variable step NLMS filtering factor to obtain the second residual signal.

The residual signal generating module 9022 is configured to determine a smoothed energy of the reference signal, a smoothed energy of the second residual signal, and a low-speed smoothed energy of the second residual signal; determine the frequency points at which the smoothed energy of the second residual signal is greater than its low-speed smoothed energy; and apply a frequency point constraint to the determined frequency points to generate a third residual signal.

The step size adjusting module 9023 is configured to adjust the filtering step size according to the third residual signal and a preset threshold to obtain an adjusted filtering step size.

The coefficient updating module 9024 is configured to determine a second variable step size NLMS filtering factor according to the smoothed energy of the reference signal, the smoothed energy of the second residual signal, and the adjusted filtering step size; and update the variable step size NLMS filter according to the second variable step size NLMS filtering factor.

Fig. 9C is a schematic diagram of a hybrid filtering module provided according to an embodiment of the present application. As shown in fig. 9C, the hybrid filter module 903 further includes: a first energy determination module 9031 and a first signal selection module 9032.

The first energy determining module 9031 is configured to determine the energy of the first residual signal and the energy of the second residual signal at a plurality of frequency points, respectively.

The first signal selecting module 9032 is configured to select, at each of the plurality of frequency points, a residual signal with smaller energy between the first residual signal and the second residual signal to obtain the target residual signal.

Fig. 9D is a schematic diagram of a residual echo estimation module provided according to an embodiment of the present application. As shown in fig. 9D, the residual echo estimation module 904 further includes: an estimated echo signal determination module 9041, a power spectrum calculation module 9042, a harmonic generation processing module 9043, a frequency spectrum splicing module 9044, a smoothing processing module 9045, and a second signal selection module 9046.

The estimated echo signal determination module 9041 is configured to jointly determine the estimated echo signal using the Kalman filter and the variable step size NLMS filter.

The power spectrum determination module 9042 is configured to determine a first echo power spectrum of the estimated echo signal.

The harmonic generation processing module 9043 is configured to perform harmonic generation processing on the estimated echo signal to obtain a power spectrum after harmonic generation.

The frequency spectrum splicing module 9044 is configured to perform frequency spectrum splicing on the power spectrum after the harmonic generation and the first echo power spectrum to obtain a second echo power spectrum.

The smoothing module 9045 is configured to perform smoothing processing on the second echo power spectrum to obtain a third echo power spectrum.

The second signal selecting module 9046 is configured to select, at each of the plurality of frequency points, an echo power spectrum with a larger energy at a frequency point between the third echo power spectrum and the second echo power spectrum, so as to obtain the estimated residual echo signal.
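The chain formed by modules 9042 and 9044 to 9046 can be sketched as below. This is a hedged illustration only: the harmonic generation step is taken as a given input, the splice is assumed to take low frequency points from the first echo power spectrum and the remaining points from the harmonic-generated power spectrum, and the smoothing is assumed to be first-order recursive averaging across frames. The names `splice`, `smooth`, `estimate_residual_echo`, and the parameters `crossover` and `alpha` are hypothetical.

```python
def power_spectrum(spectrum):
    """Power at each frequency point of a complex spectrum."""
    return [abs(value) ** 2 for value in spectrum]


def splice(first_power, harmonic_power, crossover):
    """Assumed splice: points below `crossover` come from the first echo power
    spectrum, points from `crossover` upward from the harmonic-generated one."""
    return first_power[:crossover] + harmonic_power[crossover:]


def smooth(previous_third, second_power, alpha=0.8):
    """Assumed smoothing: recursive average with the previous frame's result."""
    return [alpha * p + (1.0 - alpha) * s
            for p, s in zip(previous_third, second_power)]


def estimate_residual_echo(first_power, harmonic_power, previous_third,
                           crossover, alpha=0.8):
    """Return (estimated residual echo power, third echo power spectrum)."""
    second = splice(first_power, harmonic_power, crossover)
    third = smooth(previous_third, second, alpha)
    # Per frequency point, keep the larger of the smoothed and unsmoothed values.
    estimate = [max(t, s) for t, s in zip(third, second)]
    return estimate, third
```

Under these assumptions, taking the per-point maximum lets the estimate rise immediately when the echo power increases and decay gradually afterwards.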

Fig. 9E is a schematic diagram of a residual echo suppression module provided according to an embodiment of the present application. As shown in fig. 9E, the residual echo suppression module 905 further includes: a second energy determination module 9051, an initial residual echo suppression module 9052, a pitch signal detection module 9053, a harmonic enhancement module 9054, a secondary residual echo suppression module 9055, a cepstrum smoothing module 9056, and a final residual echo suppression module 9057.

The second energy determining module 9051 is configured to determine an energy of the target residual signal and an energy of the estimated residual echo signal.

The initial residual echo suppression module 9052 is configured to perform initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal, so as to obtain a signal after initial residual echo suppression.

The pitch signal detection module 9053 is configured to perform pitch signal detection on the signal after the initial residual echo suppression.

The harmonic enhancement module 9054 is configured to, when the pitch signal is detected in the signal after the initial residual echo suppression, perform harmonic enhancement on the signal after the initial residual echo suppression to obtain a harmonic-enhanced signal.

The secondary residual echo suppression module 9055 is configured to perform secondary residual echo suppression on the harmonic-enhanced signal through secondary gain calculation to obtain a signal after the secondary residual echo suppression.

The cepstrum smoothing module 9056 is configured to perform cepstrum smoothing on the signal after the secondary residual echo suppression, so as to obtain a cepstrum smoothed signal. When no pitch signal is detected in the signal after the initial residual echo suppression, the cepstrum smoothing module 9056 is further configured to perform cepstrum smoothing on the signal after the initial residual echo suppression to obtain a cepstrum smoothed signal.

The final residual echo suppression module 9057 is configured to perform final residual echo suppression on the cepstrum smoothed signal through final gain calculation to obtain an echo-suppressed signal. This applies whether the cepstrum smoothed signal was derived from the signal after the secondary residual echo suppression or, when no pitch signal is detected, from the signal after the initial residual echo suppression.
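The order in which modules 9052 to 9057 operate, including the branch taken when no pitch signal is detected, can be sketched as control flow. The stage implementations are deliberately left as placeholder callables, because this passage does not specify them; the names `SuppressionStages`, `suppress_residual_echo`, and `trace_path` are hypothetical, and it is assumed that the secondary and final stages obtain any further inputs they need on their own.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[complex]


@dataclass
class SuppressionStages:
    """Placeholder callables standing in for modules 9052 to 9057."""
    initial_suppress: Callable[[Signal, List[float]], Signal]
    detect_pitch: Callable[[Signal], bool]
    enhance_harmonics: Callable[[Signal], Signal]
    secondary_suppress: Callable[[Signal], Signal]
    cepstrum_smooth: Callable[[Signal], Signal]
    final_suppress: Callable[[Signal], Signal]


def suppress_residual_echo(target, echo_estimate, stages):
    """Run the multi-stage suppression flow on one frame."""
    initial = stages.initial_suppress(target, echo_estimate)
    if stages.detect_pitch(initial):
        enhanced = stages.enhance_harmonics(initial)
        secondary = stages.secondary_suppress(enhanced)
        smoothed = stages.cepstrum_smooth(secondary)
    else:
        # No pitch signal: smooth the initially suppressed signal directly.
        smoothed = stages.cepstrum_smooth(initial)
    return stages.final_suppress(smoothed)


def trace_path(pitch_present):
    """Run the flow with pass-through stages and return the order they ran in."""
    order = []

    def stage(name):
        def run(signal, *_):
            order.append(name)
            return signal
        return run

    def detect(signal):
        order.append("pitch_detection")
        return pitch_present

    stages = SuppressionStages(
        initial_suppress=stage("initial"),
        detect_pitch=detect,
        enhance_harmonics=stage("harmonic_enhancement"),
        secondary_suppress=stage("secondary"),
        cepstrum_smooth=stage("cepstrum_smoothing"),
        final_suppress=stage("final"),
    )
    suppress_residual_echo([1 + 0j], [0.1], stages)
    return order
```

Calling `trace_path(True)` exercises the harmonic enhancement branch, and `trace_path(False)` the branch that skips directly to cepstrum smoothing.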

Fig. 9F is a schematic diagram of an initial residual echo suppression module provided in accordance with an embodiment of the present application. As shown in fig. 9F, the initial residual echo suppression module 9052 includes: a signal echo ratio determination module 90521 and an initial gain calculation module 90522.

The signal echo ratio determination module 90521 is configured to determine an a priori signal echo ratio according to the target residual signal and the estimated residual echo signal.

The initial gain calculation module 90522 is configured to perform the initial gain calculation according to the a priori signal echo ratio.
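One way to picture modules 90521 and 90522 is sketched below. The formulas are assumptions, since this passage does not state them: the a priori signal echo ratio is taken as a recursively smoothed version of max(posterior ratio - 1, 0), and the initial gain is a Wiener-type gain ratio / (1 + ratio). The names `a_priori_ratio`, `initial_gain`, `apply_gain`, and the constants `beta`, `eps`, and `floor` are hypothetical.

```python
def a_priori_ratio(target_power, echo_power, previous_ratio, beta=0.5, eps=1e-12):
    """Assumed a priori signal echo ratio per frequency point.

    target_power: power of the target residual signal
    echo_power: power of the estimated residual echo signal
    previous_ratio: a priori ratio from the previous frame
    """
    ratios = []
    for e, r, prev in zip(target_power, echo_power, previous_ratio):
        posterior = e / (r + eps)                  # a posteriori ratio
        instantaneous = max(posterior - 1.0, 0.0)  # clamp negative estimates
        ratios.append(beta * prev + (1.0 - beta) * instantaneous)
    return ratios


def initial_gain(ratios, floor=0.0):
    """Wiener-type gain ratio / (1 + ratio), limited below by an assumed floor."""
    return [max(x / (1.0 + x), floor) for x in ratios]


def apply_gain(spectrum, gains):
    """Scale each frequency point of the target residual by its gain."""
    return [g * v for g, v in zip(gains, spectrum)]
```

With target power `[4.0, 1.0]`, estimated residual echo power `[1.0, 2.0]`, and zero previous ratios, the first frequency point is mostly retained while the second, where the echo estimate exceeds the residual, is fully suppressed under these assumptions.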

Fig. 9G is a schematic diagram of another first terminal provided in an embodiment of the present application. As shown in fig. 9G, the first terminal 900 further includes a scene recognition module 906.

The scene recognition module 906 is configured to generate scene recognition information according to the reference signal, the picked-up signal, and the target residual signal, wherein the scene recognition information includes at least one of an amplitude of echo reverberation, a distortion degree of an acoustic device, and a variation of an acoustic path.

The residual echo estimation module 904 is further configured to dynamically adjust the estimated residual echo signal according to the scene recognition information after the scene recognition module 906 generates the scene recognition information.
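As a purely illustrative sketch of the dynamic adjustment, the estimated residual echo power could be scaled up when the scene recognition information indicates a harder scene. Everything below is hypothetical: the passage does not say how the adjustment is computed, so the `SceneInfo` fields (normalized to the range 0 to 1), the use of the worst indicator, and the `max_boost` constant are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class SceneInfo:
    """Hypothetical scene indicators, each normalized to the range 0 to 1."""
    reverberation: float = 0.0   # amplitude of echo reverberation
    distortion: float = 0.0      # distortion degree of the acoustic device
    path_change: float = 0.0     # variation of the acoustic path


def adjust_echo_estimate(echo_power, scene, max_boost=4.0):
    """Scale the estimated residual echo power by an assumed severity factor."""
    severity = max(scene.reverberation, scene.distortion, scene.path_change)
    severity = min(max(severity, 0.0), 1.0)        # clamp to the 0..1 range
    scale = 1.0 + (max_boost - 1.0) * severity     # 1.0 in a benign scene
    return [scale * p for p in echo_power]
```

In a benign scene the estimate is returned unchanged, and in the most severe scene it is multiplied by `max_boost`, which would make the subsequent suppression more aggressive.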

The acoustic echo cancellation device comprises various components, such as a processor 10300, a memory 10100, and a transceiver 10200, connected via a bus 10400. The memory 10100 can store data 10110 and instructions 10120. The processor 10300 can implement the disclosed methods by executing the instructions 10120 and using the data 10110. The transceiver 10200 comprises a transmitter 10210 and a receiver 10220, so that the acoustic echo cancellation device can transmit and receive signals.

The steps of the methods described herein may be embodied directly in hardware, in software executed by a processor, or in a combination of the two; the software may reside in a computer-readable storage medium. The essence of the technical solution, the part of it that contributes over the prior art, or all or part of the solution may thus be embodied in a software product. The computer software product may be stored in a computer-readable storage medium and contains instructions for instructing a computer device (e.g., a personal computer, a server, or a network device) to perform all or part of the steps of the method specified in any one of the embodiments of the invention. Examples of the computer-readable storage medium include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The previous description of the specific embodiments is provided to enable any person skilled in the art to make or use the present invention. However, various modifications of the embodiments of the invention which are within the general principles of the invention are also within the scope of the invention.

Claims

1. An acoustic echo cancellation method for a first terminal, wherein the first terminal is in voice communication with a second terminal, comprising:

(201) the first terminal picks up a signal containing an echo signal generated by the first terminal playing a reference signal, wherein the reference signal is a voice signal received by the first terminal from the second terminal;

(202) the first terminal adaptively filters the reference signal and the pickup signal using a Kalman filter and a variable step size normalized least mean square (NLMS) filter to obtain a first residual signal and a second residual signal, respectively;

(203) the first terminal performs hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal;

(204) the first terminal carries out residual echo estimation according to the estimated echo signal so as to obtain an estimated residual echo signal;

(205) the first terminal performs residual echo suppression on the target residual signal according to the estimated residual echo signal to output an echo suppressed signal;

wherein step (203), in which the first terminal performs hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal, comprises:

(30) the first terminal determines the energy of the first residual signal and the energy of the second residual signal at a plurality of frequency points respectively;

(31) the first terminal selects a residual signal with smaller energy between the first residual signal and the second residual signal at each of the plurality of frequency points to obtain the target residual signal.

2. The method of claim 1, wherein step (204), in which the first terminal performs residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal, comprises:

(40) the first terminal jointly determines the estimated echo signal by using the Kalman filter and the variable step size NLMS filter;

(41) the first terminal determines a first echo power spectrum of the estimated echo signal;

(42) the first terminal performs harmonic generation processing on the estimated echo signal to obtain a power spectrum after harmonic generation;

(43) the first terminal carries out frequency spectrum splicing on the power spectrum after the harmonic wave is generated and the first echo power spectrum to obtain a second echo power spectrum;

(44) the first terminal carries out smoothing processing on the second echo power spectrum to obtain a third echo power spectrum;

(45) the first terminal selects an echo power spectrum with larger frequency point energy between the third echo power spectrum and the second echo power spectrum at each of a plurality of frequency points to obtain the estimated residual echo signal.

3. The method of claim 1, wherein step (205), in which the first terminal performs residual echo suppression on the target residual signal according to the estimated residual echo signal to output an echo suppressed signal, comprises:

(50) the first terminal determines the energy of the target residual signal and the energy of the estimated residual echo signal;

(51) the first terminal performs initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal to obtain a signal after the initial residual echo suppression;

(52) the first terminal performs pitch signal detection on the signal after the initial residual echo suppression;

(53) when the pitch signal is detected in the initial residual echo suppressed signal, the first terminal performs harmonic enhancement on the initial residual echo suppressed signal to obtain a harmonic enhanced signal;

(54) the first terminal performs secondary residual echo suppression on the harmonic enhanced signal through secondary gain calculation to obtain a signal subjected to secondary residual echo suppression;

(55) the first terminal performs cepstrum smoothing on the signal subjected to the secondary residual echo suppression to obtain a signal subjected to cepstrum smoothing;

(56) the first terminal performs final residual echo suppression on the cepstrum smoothed signal through final gain calculation to obtain an echo suppressed signal.

4. The method according to claim 3, further comprising, after step (52), in which the first terminal performs pitch signal detection on the signal after the initial residual echo suppression:

(60) when no pitch signal is detected in the initial residual echo suppressed signal, the first terminal performs cepstrum smoothing on the initial residual echo suppressed signal to obtain a cepstrum smoothed signal;

(61) the first terminal performs final residual echo suppression on the cepstrum smoothed signal through final gain calculation to obtain an echo suppressed signal.

5. The method according to claim 3, wherein step (51), in which the first terminal performs initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal to obtain a signal after the initial residual echo suppression, comprises:

(71) the first terminal determines an a priori signal echo ratio according to the target residual signal and the estimated residual echo signal;

(72) the first terminal performs the initial gain calculation according to the a priori signal echo ratio.

6. The method of claim 1, further comprising, after step (204), in which the first terminal performs residual echo estimation according to the estimated echo signal to obtain an estimated residual echo signal:

(81) the first terminal generates scene identification information according to the reference signal, the picked-up signal and the target residual signal, wherein the scene identification information comprises at least one of the amplitude of echo reverberation, the distortion degree of acoustic equipment and the change of an acoustic path;

(82) the first terminal dynamically adjusts the estimated residual echo signal according to the scene identification information.

7. An acoustic echo cancellation device for a first terminal, wherein the first terminal is in voice communication with a second terminal, comprising:

a signal obtaining module (901) configured to pick up a signal including an echo signal generated by the first terminal playing a reference signal, where the reference signal is a voice signal received by the first terminal from the second terminal;

an adaptive filtering module (902) for adaptively filtering the reference signal and the pickup signal using a kalman filter and a variable step Normalized Least Mean Square (NLMS) filter to obtain a first residual signal and a second residual signal, respectively;

a hybrid filtering module (903) for performing hybrid filtering processing on the first residual signal and the second residual signal to obtain a target residual signal;

a residual echo estimation module (904) for performing a residual echo estimation from the estimated echo signal to obtain an estimated residual echo signal;

a residual echo suppression module (905) for performing residual echo suppression on the target residual signal according to the estimated residual echo signal to output an echo suppressed signal;

wherein the hybrid filtering module (903) further comprises:

a first energy determination module (9031) configured to determine an energy of the first residual signal and an energy of the second residual signal at a plurality of frequency points, respectively;

a first signal selection module (9032) configured to select, at each of the plurality of frequency points, a residual signal with a smaller energy between the first residual signal and the second residual signal to obtain the target residual signal.

8. The apparatus of claim 7, wherein the residual echo estimation module (904) further comprises:

an estimated echo signal determination module (9041) for determining the estimated echo signal using the kalman filter and the variable-step NLMS filter together;

a power spectrum determination module (9042) for determining a first echo power spectrum of the estimated echo signal;

a harmonic generation processing module (9043) configured to perform harmonic generation processing on the estimated echo signal to obtain a power spectrum after harmonic generation;

a frequency spectrum splicing module (9044) for performing frequency spectrum splicing on the power spectrum after the harmonic generation and the first echo power spectrum to obtain a second echo power spectrum;

a smoothing module (9045) configured to perform smoothing processing on the second echo power spectrum to obtain a third echo power spectrum;

a second signal selection module (9046) configured to select, at each of a plurality of frequency points, an echo power spectrum with a larger energy at the frequency point between the third echo power spectrum and the second echo power spectrum, to obtain the estimated residual echo signal.

9. The apparatus of claim 7, wherein the residual echo suppression module (905) further comprises:

a second energy determination module (9051) for determining an energy of the target residual signal and an energy of the estimated residual echo signal;

an initial residual echo suppression module (9052) configured to perform initial residual echo suppression on the target residual signal through initial gain calculation according to the estimated residual echo signal, so as to obtain a signal after initial residual echo suppression;

a pitch signal detection module (9053) configured to perform pitch signal detection on the signal after the initial residual echo suppression;

a harmonic enhancement module (9054) configured to, when the pitch signal is detected in the signal after the initial residual echo suppression, perform harmonic enhancement on that signal to obtain a harmonic-enhanced signal;

a secondary residual echo suppression module (9055) configured to perform secondary residual echo suppression on the harmonic-enhanced signal through secondary gain calculation to obtain a signal after the secondary residual echo suppression;

a cepstrum smoothing module (9056) configured to perform cepstrum smoothing on the signal subjected to the secondary residual echo suppression to obtain a cepstrum smoothed signal;

a final residual echo suppression module (9057) configured to perform final residual echo suppression on the cepstrum smoothed signal through final gain calculation to obtain an echo suppressed signal.

10. The apparatus of claim 9,

wherein the cepstrum smoothing module (9056) is further configured to, when the pitch signal is not detected in the signal after the initial residual echo suppression, perform cepstrum smoothing on that signal to obtain a cepstrum smoothed signal.

11. The apparatus of claim 9, wherein the initial residual echo suppression module (9052) further comprises:

a signal echo ratio determination module (90521) for determining an a priori signal echo ratio from the target residual signal and the estimated residual echo signal;

an initial gain calculation module (90522) for performing the initial gain calculation based on the a priori signal echo ratio.

12. The apparatus of claim 7, wherein the first terminal further comprises:

a scene identification module (906) for generating scene identification information from the reference signal, the pickup signal, and the target residual signal, wherein the scene identification information includes at least one of an amplitude of echo reverberation, a degree of distortion of an acoustic device, and a variation of an acoustic path;

the residual echo estimation module (904) is further configured to dynamically adjust the estimated residual echo signal according to the scene identification information after the scene identification module (906) generates the scene identification information.

13. A computer-readable storage medium containing instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 6.