CN111246037B

CN111246037B - Echo cancellation method, device, terminal equipment and medium

Info

Publication number: CN111246037B
Application number: CN202010183666.0A
Authority: CN
Inventors: 吴威麒; 肖波; 许一峰; 陈满砚
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2020-03-16
Filing date: 2020-03-16
Publication date: 2021-11-16
Anticipated expiration: 2040-03-16
Also published as: CN111246037A

Abstract

The disclosure discloses an echo cancellation method, an echo cancellation device, a terminal device and a medium. The method comprises the following steps: acquiring a far-end signal; processing the far-end signal by a step-length variable adaptive filter to obtain an echo signal, wherein the step-length variable adaptive filter is an adaptive filter with a variable learning factor step length when processing each frame of the far-end signal; determining a residual spectrum signal according to a microphone signal and the echo signal; and carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation. By using the method, the generation of the echo leakage phenomenon is effectively avoided by the step-length variable self-adaptive filter. Furthermore, echo is effectively eliminated based on nonlinear processing.

Description

Echo cancellation method, device, terminal equipment and medium

Technical Field

The present disclosure relates to the field of communications technologies, and in particular, to an echo cancellation method, apparatus, terminal device, and medium.

Background

An adaptive filter is a filter that uses an adaptive algorithm to change its parameters and structure in response to changes in the environment. In general, the structure of the adaptive filter is not changed, while its coefficients are time-varying and are updated by the adaptive algorithm; that is, the coefficients adapt automatically and continuously to a given signal in order to obtain a desired response. The most important feature of the adaptive filter is that it can operate effectively in unknown environments and can track the time-varying characteristics of the input signal.

Generally, the learning factor of a conventional linear adaptive filter cannot be adjusted quickly when the echo path changes or a double-talk state occurs, and its convergence is relatively slow, so echo leakage often occurs.

Disclosure of Invention

The present disclosure provides an echo cancellation method, apparatus, terminal device and medium, which effectively avoid the problem of echo leakage.

In a first aspect, an embodiment of the present disclosure provides an echo cancellation method, including:

acquiring a far-end signal;

processing the far-end signal by a step-length variable adaptive filter to obtain an echo signal, wherein the step-length variable adaptive filter is an adaptive filter with a variable learning factor step length when processing each frame of the far-end signal;

determining a residual spectrum signal according to a microphone signal and the echo signal;

and carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation.

In a second aspect, an embodiment of the present disclosure further provides an echo cancellation device, including:

the acquisition module is used for acquiring a far-end signal;

the first processing module is used for processing the far-end signal by a step-length variable adaptive filter to obtain an echo signal, wherein the step-length variable adaptive filter is an adaptive filter with a variable learning factor step length when processing each frame of the far-end signal;

a determining module for determining a residual spectrum signal according to a microphone signal and the echo signal;

and the second processing module is used for carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation.

In a third aspect, an embodiment of the present disclosure further provides a terminal device, including:

one or more processing devices;

storage means for storing one or more programs;

the one or more programs are executed by the one or more processing devices, so that the one or more processing devices implement the echo cancellation method provided by the embodiment of the disclosure.

In a fourth aspect, the disclosed embodiments also provide a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processing device, implements the echo cancellation method provided by the disclosed embodiments.

The embodiment of the disclosure provides an echo cancellation method, an echo cancellation device, a terminal device and a medium, wherein a far-end signal is obtained firstly; secondly, processing the far-end signal by a step-length variable adaptive filter to obtain an echo signal, wherein the step-length variable adaptive filter is an adaptive filter with a variable learning factor step length when processing each frame of the far-end signal; then determining a residual spectrum signal according to the microphone signal and the echo signal; and finally, carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation. By using the method, the generation of the echo leakage phenomenon is effectively avoided by the step-length variable self-adaptive filter. Furthermore, echo is effectively eliminated based on nonlinear processing.

Drawings

Fig. 1 is a schematic flowchart of an echo cancellation method according to a first embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of an echo cancellation method according to a second embodiment of the present disclosure;

Fig. 2a is a schematic structural diagram of an echo cancellation method according to the second embodiment of the present disclosure;

Fig. 2b is a schematic diagram of a far-end signal according to the second embodiment of the present disclosure;

Fig. 2c is a schematic diagram of a near-end signal according to the second embodiment of the present disclosure;

Fig. 2d is a schematic diagram of an output signal according to the second embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of an echo cancellation device according to a third embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".

It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

In the following embodiments, optional features and examples are provided in each embodiment, and various features described in the embodiments may be combined to form a plurality of alternatives, and each numbered embodiment should not be regarded as only one technical solution. Furthermore, the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

Embodiment One

Fig. 1 is a schematic flowchart of an echo cancellation method according to a first embodiment of the present disclosure. The method is applicable to solving the problem of echo leakage. The method may be performed by an echo cancellation device, which may be implemented in software and/or hardware and is generally integrated on a terminal device. In this embodiment, the terminal device includes but is not limited to mobile phones, computers, personal digital assistants, and the like.

The echo cancellation method disclosed by the present disclosure may be an echo processing method based on a software algorithm level, the echo cancellation method may be packaged as an application program in a terminal device, and the echo cancellation method may be used to solve a technical problem of echo leakage in the terminal device, and may also solve a technical problem of echo leakage in a communication process of other terminal devices.

As shown in fig. 1, a method for echo cancellation according to a first embodiment of the present disclosure includes the following steps:

S110: acquiring a far-end signal.

The far-end signal may be a signal collected by a far-end microphone. The manner in which the far-end signal is acquired is not limited herein. After the far-end signal is acquired, the far-end signal can be used for estimating an echo signal so as to perform echo cancellation processing on a microphone signal acquired by a near-end microphone.

For example, when speaker A speaks locally, the voice is pre-processed, encoded, packetized and sent to speaker B. When B's terminal plays the voice through its loudspeaker, the voice of speaker A is recorded again, encoded, packetized and sent back to speaker A. Speaker A then hears his or her own echo, which seriously interferes with the conversation. The echo cancellation method may be integrated on the terminal device used by speaker A. The microphone on that terminal device may be a near-end microphone. The microphone on the terminal device used by speaker B may be a far-end microphone.

S120: processing the far-end signal by a step-size variable adaptive filter to obtain an echo signal.

The step-size variable adaptive filter is an adaptive filter whose learning-factor step size can vary for each frame of the far-end signal that it processes. The echo signal is determined with this filter in order to solve the echo-leakage problem of the conventional linear adaptive filter.

The step-size variable adaptive filter may be, for example, a linear filter with a variable learning-factor step size, or a Kalman filter with a variable learning-factor step size. The echo signal can be regarded as an estimate of the echo derived from the far-end signal.

After the far-end signal is acquired, this step processes it with the step-size variable adaptive filter to obtain the echo signal. Specifically, the filter may be divided into at least two sub-filter blocks, and the far-end signal is processed in the frequency domain.

Specifically, the processing of the far-end signal by the step-size variable adaptive filter to obtain the echo signal includes:

determining a voice signal of each sub-filter block according to the far-end signal;

carrying out Fourier transform on each voice signal to obtain a corresponding frequency domain signal;

and multiplying the frequency domain signal of each sub-filter block by its filter coefficient, accumulating the products over all blocks, and applying an inverse Fourier transform to obtain the echo signal.

The step-size variable adaptive filter may first determine the speech signal corresponding to each sub-filter when processing the far-end signal.

Illustratively, the n-th frame of speech for the p-th sub-filter block is represented as:

$$x_p(n) = [x(nR - pL - M + 1), \ldots, x(nR - pL)]^T$$

where $p = 0, 1, 2, \ldots, P-1$; $L$ is the length of each sub-filter block; $R$ is the speech frame shift; $M$ is the number of sample points by which adjacent frames overlap; $n$, $L$, $R$ and $M$ are positive numbers; $P$ is an integer; and $x(n)$ denotes the far-end signal.

The frequency domain signal corresponding to the speech signal can be represented as $X_p(n,k) = \mathrm{FFT}(x_p(n))$, where $X_p(n,k)$ is the frequency domain signal at the k-th frequency point of the n-th frame.
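The indexing of the speech block can be illustrated with a minimal Python sketch. The helper name `sub_block`, the zero-based list indexing, and the zero fill for samples before the start of the signal are assumptions made for illustration and are not stated in the disclosure.

```python
def sub_block(x, n, p, R, L, M):
    """Return x_p(n) = [x(nR - pL - M + 1), ..., x(nR - pL)] as a list.

    x is a zero-based list of far-end samples. Indices that fall outside
    the signal are filled with 0.0 (an assumption for the first frames).
    """
    first = n * R - p * L - M + 1
    last = n * R - p * L
    return [x[i] if 0 <= i < len(x) else 0.0 for i in range(first, last + 1)]
```

With R = L = M = 4, frame n = 5 of block p = 1 covers sample indices 13 to 16, so each block holds M samples and older blocks (larger p) reach further back in the far-end signal.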

After the frequency domain signal of each sub-filter is determined, each sub-filter may be multiplied by the corresponding filter coefficient and then accumulated to perform inverse fourier transform to obtain an echo signal. For example, the last L elements of the signal obtained after the inverse fourier transform are used as echo signals.

The echo signal may be:

$$\hat{Y}(n,k) = \sum_{p=0}^{P-1} X_p(n,k)\,W_p(n,k)$$

$$\hat{y}(n) = \text{the last } L \text{ elements of } \mathrm{IFFT}(\hat{Y}(n,k))$$

where $\hat{y}(n)$ is the echo signal, $\hat{Y}(n,k)$ is its frequency domain signal, $W_p(n,k)$ is the filter coefficient of the p-th sub-filter block, and $X_p(n,k)$ is the frequency domain signal of the speech signal corresponding to the p-th sub-filter block.
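The multiply, accumulate and inverse-transform step described above can be sketched in Python as follows. This is only an illustrative sketch: the function names `dft` and `estimate_echo` are hypothetical, and a naive O(K^2) transform from the standard library stands in for a real FFT so that the example stays self-contained.

```python
import cmath


def dft(v, inverse=False):
    """Naive discrete Fourier transform; stands in for FFT / IFFT."""
    K = len(v)
    sign = 1 if inverse else -1
    out = []
    for k in range(K):
        acc = 0j
        for t in range(K):
            acc += v[t] * cmath.exp(sign * 2j * cmath.pi * k * t / K)
        out.append(acc / K if inverse else acc)
    return out


def estimate_echo(X_blocks, W_blocks, L):
    """Sum X_p(n,k) * W_p(n,k) over the blocks p, apply the inverse
    transform, and keep the last L time-domain samples as the echo.

    Returns (echo_samples, echo_spectrum).
    """
    K = len(X_blocks[0])
    Y_hat = [sum(X[k] * W[k] for X, W in zip(X_blocks, W_blocks))
             for k in range(K)]
    y_full = dft(Y_hat, inverse=True)
    return [s.real for s in y_full[-L:]], Y_hat
```

Keeping only the last L output samples matches the statement in the text that the last L elements of the inverse transform are used as the echo signal.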

S130: determining a residual spectrum signal according to the microphone signal and the echo signal.

The microphone signal may be considered to be the signal picked up by the near-end microphone. The microphone signal may be the sum of the near-end speech signal, the local noise signal and the actual echo signal. The residual spectrum signal can be regarded as a fourier transformed residual signal. The residual signal may be considered as the microphone signal after the echo signal is removed.

After the echo signal is determined, this step takes the frequency-domain microphone signal from which the echo signal has been removed as the residual spectrum signal.

S140: carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation.

After the residual spectrum signal is determined, the residual spectrum signal may be subjected to nonlinear processing in this step to obtain an output signal from which the echo is removed. The output signal may be transmitted to a remote end.

The non-linear processing may be post-non-linear filtering to suppress non-linear noise in the residual spectrum signal. For example, the residual spectrum signal may be subjected to a residual echo noise reduction process and/or a learning factor-based nonlinear process to obtain an output signal.

The non-linear processing based on the learning factor may be to determine a non-linear factor based on the learning factor, so as to perform non-linear processing on the residual spectrum signal based on the non-linear factor.

In a method for echo cancellation provided in an embodiment of the present disclosure, a far-end signal is first obtained; secondly, processing the far-end signal by a step-length variable adaptive filter to obtain an echo signal, wherein the step-length variable adaptive filter is an adaptive filter with a variable learning factor step length when processing each frame of the far-end signal; then determining a residual spectrum signal according to the microphone signal and the echo signal; and finally, carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation. By using the method, the generation of the echo leakage phenomenon is effectively avoided by the step-length variable self-adaptive filter. Furthermore, echo is effectively eliminated based on nonlinear processing.

On the basis of the above-described embodiment, a modified embodiment of the above-described embodiment is proposed, and it is to be noted herein that, in order to make the description brief, only the differences from the above-described embodiment are described in the modified embodiment.

In one embodiment, the learning factor of the step-size variable adaptive filter is proportional to the filter coefficients. For example, the learning factor is changed based on the filter coefficients so that the filter weights are adjusted adaptively and dynamically, which makes the filter both fast and stable. The filter coefficients may be proportional (for example, directly proportional) to the scale factor and thus to the learning factor.

In one embodiment, the step-size variable adaptive filter is a linear filter with a step-size variable learning factor, and the step-size variable adaptive filter comprises at least two sub-filter blocks.

For example, assume that the linear filter has a total length of order N and is divided into P sub-filter blocks, each of length L, so that L = N/P. Correspondingly, the speech frame shift is R, adjacent frames overlap by M sample points, and the frame length is R + M; this is simplified here to L = M = R. P is a positive integer greater than or equal to 2. N, L, R and M are positive numbers, which are not limited herein and can be set by those skilled in the art according to the actual situation. The step size of the learning factor of the linear filter is variable.
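The block arithmetic can be made concrete with a short Python sketch under the simplification L = M = R. The function name `partition_sizes` and the requirement that P divide N evenly are assumptions added for illustration.

```python
def partition_sizes(N, P):
    """Split a length-N filter into P sub-filter blocks with L = M = R."""
    if P < 2:
        raise ValueError("P must be at least 2")
    if N % P != 0:
        # Assumption for this sketch: blocks have equal integer length.
        raise ValueError("P must divide N evenly")
    L = N // P          # length of each sub-filter block, L = N / P
    R = M = L           # simplification used in the text
    return {"L": L, "R": R, "M": M, "frame_length": R + M}
```

For instance, a filter of total length 512 split into 4 blocks gives blocks of length 128 and a frame length of 256 samples.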

In one embodiment, a learning factor of the step-size variable adaptive filter is determined based on the echo signal, the residual spectrum signal, and filter coefficients.

The filter coefficients may be the filter coefficients of each sub-filter block.

Illustratively, the learning factor of the step-size variable adaptive filter for each frame of the far-end signal is determined by a formula that is not reproduced in this text. In that formula, $\mu(n,k)$ is the learning factor of the step-size variable adaptive filter when processing the k-th frequency point of the n-th frame of the far-end signal. The remaining terms are the leakage factor, the filter coefficients $W_p(n,k)$ of the p-th sub-filter block, the frequency domain signal of the echo signal (whose inverse Fourier transform yields the echo signal), and the residual spectrum signal.

The leakage factor may be determined based on a frequency domain signal of the echo signal and a residual spectral signal. When determining the leakage factor corresponding to the far-end signal of the current frame, the determination may be based on the leakage factor corresponding to the far-end signal of the previous frame.

In one embodiment, the filter coefficients of each sub-filter block in the step-size variable adaptive filter when processing the far-end signal of the current frame are determined according to the corresponding speech signal when processing the far-end signal of the previous frame and the filter coefficients, the learning factor and the residual spectrum signal when processing the far-end signal of the previous frame.

The filter coefficients may be different for each sub-filter when processing the far-end signal for different frames.

In one embodiment, the filter coefficients of each sub-filter may be determined by the following equation:

$$W_p(n+1,k) = W_p(n,k) + \mu(n,k)\,\mathrm{conj}(X_p(n,k))\,E(n,k)$$

where $W_p(n+1,k)$ is the filter coefficient with which the p-th sub-filter block processes the current frame of the far-end signal, i.e., the k-th frequency point of frame n+1; $W_p(n,k)$ is the filter coefficient with which the p-th sub-filter block processed the previous frame of the far-end signal, i.e., the k-th frequency point of frame n; $\mu(n,k)$ is the learning factor for the previous frame of the far-end signal; $X_p(n,k)$ is the speech signal corresponding to the previous frame for the p-th sub-filter block, i.e., at the k-th frequency point of frame n; $E(n,k)$ is the residual spectrum signal corresponding to the previous frame, i.e., at the k-th frequency point of frame n; and $\mathrm{conj}(\cdot)$ denotes the complex conjugate.
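The coefficient update can be sketched in a few lines of Python. The function name `update_coefficients` and the list-of-lists layout (one list of complex values per sub-filter block) are assumptions chosen for illustration; the arithmetic itself follows the update equation term by term.

```python
def update_coefficients(W_blocks, X_blocks, E, mu):
    """Apply W_p(n+1,k) = W_p(n,k) + mu(n,k) * conj(X_p(n,k)) * E(n,k).

    W_blocks, X_blocks: one list of per-bin complex values per block p.
    E, mu: per-bin residual spectrum and learning factor for frame n.
    Returns the new coefficients without modifying the inputs.
    """
    return [
        [W[k] + mu[k] * X[k].conjugate() * E[k] for k in range(len(E))]
        for W, X in zip(W_blocks, X_blocks)
    ]
```

Because the learning factor is indexed by frequency point k, each bin can take a different step size within the same frame, which is what makes the step size variable.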

In one embodiment, the accompanying formula (not reproduced in this text) contains the term 0, which denotes a zero vector of M rows and 1 column.

Embodiment Two

Fig. 2 is a schematic flow chart of an echo cancellation method according to a second embodiment of the present disclosure, which is embodied based on the above embodiments. In this embodiment, determining a residual spectrum signal according to a microphone signal and the echo signal specifically includes:

subtracting the echo signal from the microphone signal to obtain a residual signal;

and carrying out Fourier transform on the residual signal to obtain a residual spectrum signal.

For details not described in this embodiment, please refer to the above embodiments.

As shown in Fig. 2, the echo cancellation method provided in the second embodiment of the present disclosure includes the following steps:

S210: acquiring a far-end signal.

S220: processing the far-end signal by a step-size variable adaptive filter to obtain an echo signal.

S230: subtracting the echo signal from the microphone signal to obtain a residual signal.

When determining the residual spectrum signal, this embodiment may first determine the residual signal. This step may take the difference between the microphone signal and the echo signal as the residual signal. For example, the residual signal is represented as:

$$e(n) = d(n) - \hat{y}(n)$$

where $e(n)$ is the residual signal, $d(n)$ is the microphone signal, and $\hat{y}(n)$ is the echo signal.

S240: carrying out Fourier transform on the residual signal to obtain a residual spectrum signal.

The residual spectrum signal can be expressed as:

$$E(n,k) = \mathrm{FFT}\left(\begin{bmatrix} 0 \\ e(n) \end{bmatrix}\right)$$

where $E(n,k)$ is the residual spectrum signal and 0 is a zero vector of M rows and 1 column. This step may perform a Fourier transform on the vector formed by the residual signal and the zero vector of M rows and 1 column to obtain the residual spectrum signal.
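Steps S230 and S240 can be sketched together in Python. The names `dft` and `residual_spectrum` are hypothetical, a naive transform stands in for the FFT, and placing the M zeros ahead of the residual samples is an assumption, since the text only says that the residual signal and the zero vector are combined into one vector.

```python
import cmath


def dft(v):
    """Naive discrete Fourier transform; stands in for an FFT."""
    K = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / K) for t in range(K))
            for k in range(K)]


def residual_spectrum(d, y_hat, M):
    """Return (e, E): the residual signal and its spectrum.

    e(n) = d(n) - y_hat(n), then the transform is taken over a vector
    made of M zeros followed by e(n) (zero placement is assumed).
    """
    e = [di - yi for di, yi in zip(d, y_hat)]
    padded = [0.0] * M + e
    return e, dft(padded)
```

The zero padding brings the residual block up to the transform length used for the far-end blocks, so the residual spectrum can be combined bin by bin with the far-end spectra in the coefficient update.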

S250: carrying out nonlinear processing on the residual spectrum signal to obtain an output signal.

The following describes an exemplary echo cancellation method provided by the present disclosure:

In order to solve the technical problem of echo leakage, the echo cancellation method of the present disclosure proposes a sub-band-based equal scale factor that adaptively and dynamically adjusts the filter weights, so that the filter is both fast and stable. In addition, a nonlinear suppression factor based on the learning factor is proposed; this suppression factor can accurately suppress echo components while protecting near-end speech components. Finally, a round of residual echo noise reduction achieves the purpose of completely eliminating the echo.

Fig. 2a is a schematic structural diagram of an echo cancellation method provided in the second embodiment of the present disclosure. Referring to Fig. 2a, the echo cancellation method of the present disclosure includes two parts: a step-size variable linear filter, which reduces damage to near-end speech; and nonlinear processing based on the learning factor together with residual echo noise reduction, which completely eliminates the residual echo.

The far-end signal $x(n)$ passes through the variable-step adaptive filter $h(n)$ (namely the step-size variable adaptive filter) to yield the estimated echo $\hat{y}(n)$.

The microphone signal is $d(n) = y(n) + s(n) + v(n)$, where $s(n)$ is the near-end speech, $v(n)$ is the local noise, and $y(n)$ is the actual echo. The residual signal is $e(n) = d(n) - \hat{y}(n)$.

The residual signal passes through a nonlinear processing module to obtain an output signal $out(n)$, where the nonlinear processing module comprises a learning factor-based nonlinear processing module and a residual echo noise reduction processing module.

In this disclosure, the adaptive filter coefficients are represented as $W_p(n,k)$, where $p = 0, 1, 2, \ldots, P-1$.

The echo frequency domain and time domain signals estimated by the adaptive filtering are represented as:

$$\hat{Y}(n,k) = \sum_{p=0}^{P-1} X_p(n,k)\,W_p(n,k)$$

$$\hat{y}(n) = \text{the last } L \text{ elements of } \mathrm{IFFT}(\hat{Y}(n,k))$$

The residual signal is:

$$e(n) = d(n) - \hat{y}(n)$$

An FFT of the residual signal gives the residual spectrum, namely the residual spectrum signal:

$$E(n,k) = \mathrm{FFT}\left(\begin{bmatrix} 0 \\ e(n) \end{bmatrix}\right)$$

where 0 is a zero vector of M rows and 1 column.

Within the same sub-band, the P sub-filter blocks converge stably and consistently, whereas different sub-bands converge differently. On this basis, a sub-band-based equal scale factor is proposed: across different sub-bands, filter coefficients with larger weights are enhanced and filter coefficients with smaller weights are reduced, which accelerates convergence. At the same time, the proportion is determined from the combined behavior of the P sub-filter blocks in each sub-band, which eliminates unstable jitter between different blocks of the same sub-band and reduces filter divergence.

The specific calculation method is as follows.

The equal scale factor is calculated by a formula that is not reproduced in this text.

The variable-step learning rate factor (i.e., the learning factor) is likewise given by a formula that is not reproduced in this text.

The equal scale factor is proportional to the filter coefficient, and the learning factor is proportional to the filter coefficient.

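The leakage factor is calculated from the following recursively smoothed quantities (the formula for the leakage factor itself is not reproduced in this text):

$$S_{EY}(n,k) = \alpha(n)\,S_{EY}(n-1,k) + (1-\alpha(n))\,S_E(n,k)\,\mathrm{conj}(S_Y(n,k))$$

$$S_{YY}(n,k) = \alpha(n)\,S_{YY}(n-1,k) + (1-\alpha(n))\,S_Y(n,k)\,\mathrm{conj}(S_Y(n,k))$$

The estimated power spectral density of the echo signal is approximated by a formula that is not reproduced in this text. In these formulas, $\mathrm{real}(\cdot)$ takes the real part and $\mathrm{conj}(\cdot)$ denotes the complex conjugate.

The power spectral density of the linear residual is approximately expressed as:

$$S_E(n,k) = \alpha(n)\,S_E(n-1,k) + (1-\alpha(n))\,\mathrm{real}(E(n,k)\,\mathrm{conj}(E(n,k)))$$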
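The recursions above are first-order recursive averages. A minimal Python sketch of the residual power spectral density update follows; the function name `smooth_psd` and passing the smoothing weight as a plain float are assumptions made for illustration, since the text does not say how the weight is chosen for each frame.

```python
def smooth_psd(prev, spectrum, alpha):
    """Per-bin update S(n,k) = alpha * S(n-1,k)
    + (1 - alpha) * real(Z(n,k) * conj(Z(n,k))).

    prev: smoothed values from frame n-1.
    spectrum: complex spectrum Z(n,k) of the current frame.
    alpha: smoothing weight in [0, 1] (supplied by the caller).
    """
    return [alpha * s + (1.0 - alpha) * (z * z.conjugate()).real
            for s, z in zip(prev, spectrum)]
```

A weight close to 1 makes the estimate change slowly from frame to frame, while a smaller weight lets it follow the current frame more closely.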

The filter coefficient weights are updated as follows:

$$W_p(n+1,k) = W_p(n,k) + \mu(n,k)\,\mathrm{conj}(X_p(n,k))\,E(n,k)$$

In the accompanying formula (not reproduced in this text), 0 denotes a zero vector, for example a zero vector of M rows and 1 column.

Fig. 2b is a schematic diagram of a far-end signal according to the second embodiment of the disclosure; Fig. 2c is a schematic diagram of a near-end signal according to the second embodiment of the disclosure; Fig. 2d is a schematic diagram of an output signal according to the second embodiment of the disclosure. Referring to Figs. 2b-2d, the output signal obtained by echo cancellation processing of the far-end signal and the near-end signal is effectively noise-free.

The echo cancellation method provided in the second embodiment of the present disclosure embodies the operation of determining the residual spectrum signal. By using the method, the technical problem of echo leakage can be effectively solved.

In one embodiment, the performing nonlinear processing on the residual spectrum signal to obtain an output signal specifically includes:

and carrying out nonlinear processing on the residual spectrum signal based on the learning factor to obtain an output signal.

The larger the learning factor is, the larger the estimated echo intensity is, that is, the higher the probability of the echo occurring at the frequency point is, and the nonlinear processing factor based on the learning factor can effectively distinguish the echo frequency point region from the near-end voice frequency point region, so that the echo frequency point can be further inhibited in a targeted manner, and especially in a double-talk state, near-end voice can be effectively protected.

In one embodiment, the subjecting the residual spectrum signal to a non-linear processing based on the learning factor to obtain an output signal includes:

determining a nonlinear factor according to the learning factor, the number of sub-filter blocks included in the step-size variable adaptive filter and a corresponding voice signal when the step-size variable adaptive filter processes the far-end signal;

determining a product of the non-linearity factor and the residual spectral signal as an output signal.

When the residual spectrum signal is subjected to the non-linear processing based on the learning factor, the output signal may be obtained by multiplying the residual spectrum signal by the non-linear factor.

The voice signal corresponding to the processing of the far-end signal by the step-size variable adaptive filter can be regarded as the voice signal corresponding to the processing of the far-end signal by each sub-filter included in the step-size variable adaptive filter.

The corresponding speech signal when the step size variable adaptive filter processes the far-end signal can be used to determine the energy of the far-end signal in all sub-filter blocks.

In one embodiment, the non-linearity factor is determined by a formula (not reproduced in this text) that uses:

$$X_p(n,k) = \mathrm{FFT}(x_p(n))$$

where $H(n,k)$ is the non-linear factor, $P$ is the number of sub-filter blocks included in the step-size variable adaptive filter, $x_p(n)$ is the n-th frame speech signal of the p-th sub-filter block, and $T$ denotes the transpose operation.

In one embodiment, the method further specifically includes:

and carrying out residual echo noise reduction processing on the output signal to obtain a noise-reduced output signal.

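The residual echo $Res(n,k)$ is estimated first (the estimation formula is not reproduced in this text).

Noise is then estimated by minimum tracking as follows. The smoothed spectrum of the microphone signal is represented as:

$$D(n,k) = \mathrm{FFT}(d(n))$$

$$D_{\mathrm{smooth}}(n,k) = 0.85\,D_{\mathrm{smooth}}(n-1,k) + 0.15\,|D(n,k)|^2$$

$$N(n,k) = \min\left(D_{\mathrm{smooth}}(n,k),\ D_{\mathrm{smooth}}(n-1,k),\ \ldots,\ D_{\mathrm{smooth}}(n-\mathrm{win\_size},k)\right)$$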

Here win_size may be equal to the number of time windows, which can be set according to the actual situation and is not limited herein. For example, with win_size = 80 and a single time window of 10 ms, the minimum is tracked over 800 ms.
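The minimum-tracking recursion can be sketched in Python as follows. The class name `MinimumTracker`, the zero initial value of the smoothed spectrum, and the bounded deque used as the sliding window are illustrative assumptions; only the 0.85 and 0.15 smoothing weights and the window of win_size previous frames come from the text.

```python
from collections import deque


class MinimumTracker:
    """Tracks N(n,k): the per-bin minimum of the smoothed microphone
    power over the current frame and the previous win_size frames."""

    def __init__(self, num_bins, win_size=80):
        self.smooth = [0.0] * num_bins            # assumed initial state
        self.history = deque(maxlen=win_size + 1)

    def update(self, D):
        """D is the complex microphone spectrum of the current frame."""
        # D_smooth(n,k) = 0.85 * D_smooth(n-1,k) + 0.15 * |D(n,k)|^2
        self.smooth = [0.85 * s + 0.15 * abs(d) ** 2
                       for s, d in zip(self.smooth, D)]
        self.history.append(self.smooth)
        # Per-bin minimum over the frames currently in the window.
        return [min(frame[k] for frame in self.history)
                for k in range(len(D))]
```

Because speech and echo are intermittent while background noise is persistent, the minimum over a window of several hundred milliseconds tends to settle near the noise floor.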

The residual echo is treated as noise, so the total noise estimate is:

$$Total(n,k) = \min\left(Res(n,k) + N(n,k),\ |E(n,k)|^2\right)$$
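The total noise estimate is a per-bin minimum, which can be sketched in Python as follows. The function name `total_noise` and the list-based inputs are assumptions for illustration; the residual echo estimate is taken as an input because its formula is not given in the text.

```python
def total_noise(res, noise, E):
    """Total(n,k) = min(Res(n,k) + N(n,k), |E(n,k)|^2) for every bin k.

    res: residual echo power estimate per bin.
    noise: minimum-tracked noise power per bin.
    E: complex residual spectrum per bin.
    """
    return [min(r + n, abs(e) ** 2) for r, n, e in zip(res, noise, E)]
```

The cap at the residual power keeps the noise estimate from exceeding the power actually present in the bin.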

The Wiener filter estimator is computed as follows. The a posteriori signal-to-noise ratio, the a priori signal-to-noise ratio estimated by the decision-directed (DD) algorithm, and the final Wiener filter factor $Wiener(n,k)$ are each given by formulas that are not reproduced in this text. The Wiener filter factor is applied as:

$$E2(n,k) = E1(n,k)\,Wiener(n,k)$$

where $E1(n,k)$ is the output signal after nonlinear processing.

The sample output of the n-th frame is then obtained as:

$$out(n) = \mathrm{IFFT}(E2(n,k))$$
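Because the signal-to-noise ratio formulas are not available in this text, the sketch below substitutes the common textbook form of a decision-directed Wiener gain; it should not be read as the disclosure's exact method. The function name `wiener_postfilter`, the smoothing constant `beta = 0.98`, the floor `eps`, and the use of the previous frame's output power as the decision-directed term are all assumptions.

```python
def wiener_postfilter(E1, total, prev_out_power, beta=0.98, eps=1e-12):
    """Apply a per-bin Wiener gain to E1 and return (E2, gains).

    E1: complex spectrum after nonlinear processing.
    total: total noise power estimate per bin.
    prev_out_power: |E2|^2 of the previous frame per bin.
    """
    E2, gains = [], []
    for e, t, p in zip(E1, total, prev_out_power):
        t = max(t, eps)                         # avoid division by zero
        post = abs(e) ** 2 / t                  # a posteriori SNR
        # Decision-directed a priori SNR (textbook form, assumed here).
        prior = beta * p / t + (1.0 - beta) * max(post - 1.0, 0.0)
        gain = prior / (1.0 + prior)            # Wiener gain in [0, 1)
        gains.append(gain)
        E2.append(e * gain)
    return E2, gains
```

In this form the gain approaches 1 in bins where the signal dominates the noise estimate and approaches 0 in bins dominated by residual echo and noise; the time-domain frame would then be obtained by an inverse transform of the returned spectrum.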

Embodiment Three

Fig. 3 is a schematic structural diagram of an echo cancellation device according to a third embodiment of the present disclosure. The device is applicable to solving the problem of echo leakage, may be implemented in software and/or hardware, and is generally integrated on a terminal device.

As shown in fig. 3, the apparatus includes: an acquisition module 31, a first processing module 32, a determination module 33 and a second processing module 34;

the acquiring module 31 is configured to acquire a far-end signal;

a first processing module 32, configured to process the far-end signal through a step-size variable adaptive filter to obtain an echo signal, where the step-size variable adaptive filter is an adaptive filter with a variable learning factor step size when processing each frame of the far-end signal;

a determining module 33, configured to determine a residual spectrum signal according to a microphone signal and the echo signal;

and a second processing module 34, configured to perform nonlinear processing on the residual spectrum signal to obtain an output signal, so as to complete echo cancellation.

In this embodiment, the apparatus first acquires the far-end signal through the acquisition module 31. The first processing module 32 then processes the far-end signal with a step-size variable adaptive filter to obtain an echo signal, where the step-size variable adaptive filter is an adaptive filter whose learning-factor step size is variable when processing each frame of the far-end signal. Next, the determining module 33 determines a residual spectrum signal according to the microphone signal and the echo signal. Finally, the second processing module 34 performs nonlinear processing on the residual spectrum signal to obtain an output signal, so as to complete echo cancellation.
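The data flow between the modules can be sketched for a single frame as follows. This is a sketch under stated assumptions, not the implementation of the disclosure: the function name `process_frame` is hypothetical, the echo estimate is assumed to be the sum over sub-filter blocks of W_p(n, k)X_p(n, k), the residual spectrum is assumed to be the microphone spectrum minus the echo estimate, and the nonlinear processing is represented by a caller-supplied per-bin gain because its formula is not reproduced here.

```python
def process_frame(mic_spectrum, far_end_blocks, weights, nonlinear_gain):
    """Run one frame through the three processing stages.

    mic_spectrum   -- microphone spectrum, one complex value per bin
    far_end_blocks -- far_end_blocks[p][k], far-end spectrum of sub-filter block p
    weights        -- weights[p][k], filter coefficients of sub-filter block p
    nonlinear_gain -- hypothetical callable (k, residual) -> real gain
    """
    num_bins = len(mic_spectrum)

    # First processing module: echo estimate (assumed sum over sub-filter blocks).
    echo = [
        sum(w_p[k] * x_p[k] for w_p, x_p in zip(weights, far_end_blocks))
        for k in range(num_bins)
    ]

    # Determining module: residual spectrum (assumed microphone minus echo).
    residual = [d - y for d, y in zip(mic_spectrum, echo)]

    # Second processing module: nonlinear processing as a per-bin gain.
    output = [e * nonlinear_gain(k, e) for k, e in enumerate(residual)]
    return echo, residual, output


echo, residual, output = process_frame(
    mic_spectrum=[1 + 1j, 2 + 0j],
    far_end_blocks=[[1 + 0j, 1 + 0j]],
    weights=[[0.5 + 0j, 1 + 0j]],
    nonlinear_gain=lambda k, e: 0.5,
)
```

Acquisition of the far-end signal (acquisition module 31) is represented here only by the `far_end_blocks` argument.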

This embodiment provides an echo cancellation device that effectively avoids echo leakage by means of the step-size variable adaptive filter. Furthermore, echo is effectively eliminated based on nonlinear processing.

Further, a learning factor of the step-size variable adaptive filter is determined based on the echo signal, the residual spectrum signal, and filter coefficients.

Further, the learning factor of the step-size variable adaptive filter is proportional to the filter coefficient.

Further, the step-size variable adaptive filter is a linear filter with a variable learning factor step size, and the step-size variable adaptive filter comprises at least two sub-filter blocks.

Further, the filter coefficient when each sub-filter block in the step-size variable adaptive filter processes the far-end signal of the current frame is determined according to the corresponding speech signal when processing the far-end signal of the previous frame and the filter coefficient, the learning factor and the residual spectrum signal when processing the far-end signal of the previous frame.

Further, the filter coefficients of each sub-filter block are determined by the following formula:

W_p(n+1，k)＝W_p(n，k)+μ(n，k)conj(X_p(n，k))E(n，k)；

wherein W_p(n+1, k) is the filter coefficient when the p-th sub-filter block processes the far-end signal at frequency point k of frame n+1, W_p(n, k) is the filter coefficient when the p-th sub-filter block processes the far-end signal at frequency point k of frame n, μ(n, k) is the learning factor, E(n, k) is the residual spectrum signal corresponding to the far-end signal at frequency point k of frame n, X_p(n, k) is the speech signal of the p-th sub-filter block corresponding to the far-end signal at frequency point k of frame n, and conj(.) represents the conjugate operation of the matrix.
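The coefficient update above can be sketched per sub-filter block and per frequency point as follows. The function name `update_sub_filters` and the nested-list layout are assumptions for illustration; the learning factor μ(n, k) is taken as already computed, since its formula is not reproduced here.

```python
def update_sub_filters(weights, far_end_blocks, residual, mu):
    """Apply W_p(n+1, k) = W_p(n, k) + mu(n, k) * conj(X_p(n, k)) * E(n, k).

    weights        -- weights[p][k], current coefficients W_p(n, k)
    far_end_blocks -- far_end_blocks[p][k], spectra X_p(n, k)
    residual       -- residual[k], residual spectrum E(n, k)
    mu             -- mu[k], learning factor mu(n, k) (computed elsewhere)
    """
    return [
        [
            w + m * x.conjugate() * e
            for w, x, e, m in zip(w_p, x_p, residual, mu)
        ]
        for w_p, x_p in zip(weights, far_end_blocks)
    ]


new_weights = update_sub_filters(
    weights=[[0j, 0j], [1 + 0j, 0j]],
    far_end_blocks=[[1 + 1j, 2 + 0j], [0 + 1j, 1 + 0j]],
    residual=[1 + 0j, 0 + 1j],
    mu=[0.5, 0.1],
)
```

All sub-filter blocks share the same residual spectrum and learning factor at a given frequency point; only the far-end spectrum X_p(n, k) differs between blocks.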

Further, the determining module 33 is specifically configured to:

Further, the second processing module 34 is specifically configured to:

Further, the second processing module 34 performs a non-linear processing on the residual spectrum signal based on the learning factor to obtain an output signal, including:

Further, the second processing module 34 determines the non-linearity factor by the following formula:

X_p(n，k)＝FFT(x_p(n))；

wherein H(n, k) is the non-linear factor, P is the number of sub-filter blocks included in the step-size variable adaptive filter, and x_p(n) is the nth frame speech signal of the p-th sub-filter block.

Further, the apparatus further comprises:

a noise reduction module, configured to perform residual echo noise reduction processing on the output signal to obtain a noise-reduced output signal.

The echo cancellation device can execute the echo cancellation method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method.

Example four

Fig. 4 is a schematic structural diagram of a terminal device 400 according to a fourth embodiment of the present disclosure, suitable for implementing an embodiment of the present disclosure. The terminal device 400 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), and a vehicle-mounted terminal (e.g., a car navigation terminal), and fixed terminals such as a desktop computer. The terminal device 400 shown in fig. 4 is only an example and should not limit the functions or the scope of use of the embodiments of the present disclosure.

As shown in fig. 4, the terminal device 400 may include one or more processing means (e.g., a central processing unit, a graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage means 408 into a Random Access Memory (RAM) 403. The one or more processing means 401 implement the methods provided by the present disclosure. The RAM 403 also stores various programs and data necessary for the operation of the terminal device 400. The processing means 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 408 including, for example, magnetic tape, hard disk, etc., and used for storing one or more programs; and a communication device 409. The communication device 409 may allow the terminal device 400 to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 illustrates a terminal device 400 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or from the storage device 408, or from the ROM 402. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 401.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer-readable medium may be contained in the terminal device 400; or may exist separately without being assembled into the terminal device 400.

The computer-readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device 400 to:

acquire a far-end signal;

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Example 1 provides, in accordance with one or more embodiments of the present disclosure, an echo cancellation method, including:

acquiring a far-end signal;

Example 2 provides, in accordance with one or more embodiments of the present disclosure, the method of example 1, wherein

a learning factor of the step-size variable adaptive filter is determined based on the echo signal, the residual spectrum signal, and filter coefficients.

Example 3 provides, in accordance with one or more embodiments of the present disclosure, the method of example 1, wherein

the learning factor of the step-size variable adaptive filter is proportional to the filter coefficient.

Example 4 provides, in accordance with one or more embodiments of the present disclosure, the method of example 1, wherein

the step-size variable adaptive filter is a linear filter with a variable learning factor step size, and comprises at least two sub-filter blocks.

Example 5 provides, in accordance with one or more embodiments of the present disclosure, the method of example 4, wherein

the filter coefficient when each sub-filter block in the step-size variable adaptive filter processes the far-end signal of the current frame is determined according to the corresponding speech signal when processing the far-end signal of the previous frame and the filter coefficient, the learning factor and the residual spectrum signal when processing the far-end signal of the previous frame.

Example 6 provides, in accordance with one or more embodiments of the present disclosure, the method of example 5, wherein

the filter coefficients for each sub-filter block are determined by the following formula:

W_p(n+1，k)＝W_p(n，k)+μ(n，k)conj(X_p(n，k))E(n，k)；

Example 7 provides, in accordance with one or more embodiments of the present disclosure, the method of example 1, wherein the determining a residual spectrum signal according to the microphone signal and the echo signal includes:

Example 8 provides, in accordance with one or more embodiments of the present disclosure, the method of example 1, wherein the performing nonlinear processing on the residual spectrum signal to obtain an output signal includes:

Example 9 provides, in accordance with one or more embodiments of the present disclosure, the method of example 8, wherein the performing nonlinear processing on the residual spectrum signal based on the learning factor to obtain an output signal includes:

Example 10 provides, in accordance with one or more embodiments of the present disclosure, the method of example 9, wherein the non-linear factor is determined by the following formula:

X_p(n，k)＝FFT(x_p(n))；

Example 11 provides, in accordance with one or more embodiments of the present disclosure, the method of example 8, further including:

Example 12 provides, in accordance with one or more embodiments of the present disclosure, an echo cancellation device, including:

the acquisition module is used for acquiring a far-end signal;

Example 13 provides, in accordance with one or more embodiments of the present disclosure, a terminal device, comprising:

one or more processing devices;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the method of any of examples 1-11.

Example 14 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the method of any of examples 1-11, in accordance with one or more embodiments of the present disclosure.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by interchanging the above features with features having similar functions disclosed in (but not limited to) this disclosure.

Claims

1. An echo cancellation method, comprising:

acquiring a far-end signal;

carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation;

the performing nonlinear processing on the residual spectrum signal to obtain an output signal includes:

carrying out nonlinear processing on the residual spectrum signal based on the learning factor to obtain an output signal;

the step-size-variable adaptive filter is a linear filter with a variable learning factor step size, and comprises at least two sub-filter blocks;

W_p(n+1，k)＝W_p(n，k)+μ(n，k)conj(X_p(n，k))E(n，k)；

2. The method of claim 1, wherein a learning factor of the step-size variable adaptive filter is determined based on the echo signal, the residual spectrum signal, and filter coefficients.

3. The method of claim 1, wherein the learning factor of the step-size variable adaptive filter is proportional to the filter coefficients.

4. The method of claim 1, wherein the filter coefficients of each sub-filter block in the step-size variable adaptive filter when processing the far-end signal of the current frame are determined according to the corresponding speech signal when processing the far-end signal of the previous frame and the filter coefficients, the learning factor and the residual spectrum signal when processing the far-end signal of the previous frame.

5. The method of claim 1, wherein determining a residual spectral signal from the microphone signal and the echo signal comprises:

6. The method of claim 1, wherein the subjecting the residual spectrum signal to a non-linear processing based on the learning factor to obtain an output signal comprises:

7. The method of claim 6, wherein the non-linearity factor is determined by the formula:

X_p(n，k)＝FFT(x_p(n))；

wherein H(n, k) is the non-linear factor, P is the number of sub-filter blocks included in the step-size variable adaptive filter, and x_p(n) is the nth frame speech signal of the p-th sub-filter block;

μ(n, k) is the learning factor, X_p(n, k) is the speech signal of the p-th sub-filter block corresponding to the far-end signal at frequency point k of frame n, and real(.) represents taking the real part.

8. The method of claim 1, further comprising:

9. An echo cancellation device, comprising:

the acquisition module is used for acquiring a far-end signal;

the second processing module is used for carrying out nonlinear processing on the residual spectrum signal to obtain an output signal so as to complete echo cancellation;

the second processing module is further used for carrying out nonlinear processing on the residual spectrum signal based on the learning factor to obtain an output signal;

W_p(n+1，k)＝W_p(n，k)+μ(n，k)conj(X_p(n，k))E(n，k)；

10. A terminal device, comprising:

one or more processing devices;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the echo cancellation method of any one of claims 1-8.

11. A computer-readable medium, on which a computer program is stored, which, when being executed by processing means, carries out the echo cancellation method according to any one of claims 1-8.