CN116013345A - Echo cancellation method and electronic equipment

Info

Publication number: CN116013345A
Application number: CN202211672640.8A
Authority: CN
Inventors: 毛亚朋; 黄景标; 方瑞东; 林聚财; 殷俊; 刘克柱; 薛晗
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2023-04-25

Abstract

The invention discloses an echo cancellation method and an electronic device, which cancel echo more effectively in the steady state and avoid large-scale echo leakage when the echo path changes abruptly. The method comprises: acquiring an audio signal, wherein the audio signal comprises a near-end signal and a far-end signal; filtering the far-end signal with a first filter and with a second filter to obtain the echo information corresponding to each filter, wherein the first filter cancels echo better than the second filter when the echo path is stable, and the second filter converges faster than the first filter when the echo path changes; adjusting the echo suppression aggressiveness according to the echo information corresponding to each filter, and determining a first residual signal from the echo information corresponding to the first filter and the near-end signal; and suppressing the first residual signal with the adjusted echo suppression aggressiveness to obtain the echo-cancelled audio signal.

Description

Echo cancellation method and electronic equipment

Technical Field

The present invention relates to the field of audio signal processing technologies, and in particular, to a method for echo cancellation and an electronic device.

Background

In instant messaging applications, two or more parties need to communicate with each other in real time. Where the requirements are high, an external loudspeaker is usually used to play sound, so echo is inevitably generated: after one party speaks, the sound is played through the other party's loudspeaker, picked up by the other party's microphone (Mic), and transmitted back to the speaker. If the echo is not processed, communication quality and user experience suffer; in more serious cases an oscillation forms and howling is generated.

Acoustic echo cancellation is a processing method that prevents the far-end sound from returning by cancelling or removing the far-end audio signal picked up by the local microphone. After the Mic collects the sound, the sound played by the local loudspeaker is removed from the collected data, so that the recorded sound contains only the sound uttered by the local user.

However, when the echo path changes abruptly, the performance of currently used echo cancellation algorithms deteriorates rapidly compared with the steady state of the echo path, and large-scale echo leakage occurs.

Disclosure of Invention

The invention provides an echo cancellation method and an electronic device, which use the different characteristics of a dual filter in the steady state and during convergence to cancel echo more effectively in the steady state and to suppress the echo quickly when the path changes abruptly, thereby avoiding large-scale echo leakage.

In a first aspect, an embodiment of the present invention provides a method for echo cancellation, where the method includes:

acquiring an audio signal, wherein the audio signal comprises a near-end signal and a far-end signal, and the near-end signal and the far-end signal are distinguished based on the propagation mode of the audio signal;

filtering the far-end signal by using a first filter and a second filter to obtain respectively corresponding echo information, wherein the capacity of the first filter for eliminating echo when an echo path is stable is higher than that of the second filter, and the convergence speed of the second filter when the echo path is changed is higher than that of the first filter;

adjusting the echo suppression aggressiveness according to the echo information corresponding to each filter, and determining a first residual signal by utilizing the echo information corresponding to the first filter and the near-end signal;

and suppressing the first residual signal by using the adjusted echo suppression aggressiveness to obtain an audio signal after echo cancellation.

In this embodiment, the second filter is used for tracking the echo path; when a change of the echo path is detected, the echo suppression aggressiveness is adjusted and an aggressive echo suppression measure is adopted, so that large-scale echo leakage is reduced.

In a second aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is configured to store a program executable by the processor, and the processor is configured to read the program in the memory and execute the following steps:

As an alternative embodiment, when the echo information includes an echo path, the processor is specifically configured to perform:

determining the similarity between the echo paths corresponding to the respective filters;

and adjusting the echo suppression aggressiveness according to the similarity, wherein the echo suppression aggressiveness increases with the increase of the similarity.

As an alternative embodiment, when the echo information comprises an echo signal, the processor is specifically configured to perform:

determining a second residual signal according to the echo information corresponding to the second filter and the near-end signal;

determining an energy ratio according to the power of the first residual signal and the power of the second residual signal;

and adjusting the echo suppression aggressiveness according to the energy ratio, wherein the echo suppression aggressiveness increases with the increase of the energy ratio.

As an alternative embodiment, the processor is specifically configured to perform:

determining the state of the first filter according to the echo information corresponding to each filter, and adjusting the echo suppression aggressiveness according to the state of the first filter.

As an alternative embodiment, the processor is specifically further configured to perform:

adjusting the convergence speed of the first filter according to the echo information corresponding to each filter;

and filtering the far-end signal by using the adjusted first filter to obtain corresponding new echo information, and continuing to adjust the echo suppression aggressiveness according to the new echo information and the echo information corresponding to the second filter.

and adjusting the convergence speed of the first filter according to the similarity, wherein the convergence speed is reduced along with the increase of the similarity.

and adjusting the convergence speed of the first filter according to the energy ratio, wherein the convergence speed increases with the increase of the energy ratio.

Adjusting a noise covariance matrix of the first filter according to the echo information corresponding to each filter, wherein the noise covariance matrix is used for representing the deviation degree of echo path estimation of the first filter;

and adjusting the convergence speed of the first filter according to the noise covariance matrix.

and determining the convergence state of the first filter according to the echo information corresponding to each filter, and adjusting the convergence speed of the first filter according to the convergence state of the first filter.

the steady state error of the first filter is less than the steady state error of the second filter and/or the convergence speed of the second filter is greater than the convergence speed of the first filter.

In a third aspect, an embodiment of the present invention further provides an apparatus for echo cancellation, where the apparatus includes:

the audio acquisition module is used for acquiring an audio signal, wherein the audio signal comprises a near-end signal and a far-end signal, and the near-end signal and the far-end signal are distinguished based on the propagation mode of the audio signal;

The double-filtering module is used for respectively filtering the far-end signals by using a first filter and a second filter to obtain respectively corresponding echo information, wherein the capacity of the first filter for eliminating echo when an echo path is stable is higher than that of the second filter, and the convergence speed of the second filter when the echo path is changed is higher than that of the first filter;

the residual calculation module is used for adjusting the echo suppression aggressiveness according to the echo information corresponding to each filter, and determining a first residual signal by utilizing the echo information corresponding to the first filter and the near-end signal;

and the echo suppression module is used for suppressing the first residual signal by utilizing the adjusted echo suppression aggressiveness to obtain an audio signal after echo cancellation.

As an optional implementation manner, when the echo information includes an echo path, the calculating residual module is specifically configured to:

As an alternative embodiment, when the echo information includes an echo signal, the calculating residual module is specifically configured to:

As an alternative embodiment, the calculation residual module is specifically configured to:

As an optional implementation manner, the convergence speed adjusting module is specifically configured to:

As an optional implementation manner, when the echo information includes an echo path, the convergence speed adjustment module is specifically configured to:

As an optional implementation manner, when the echo information includes an echo signal, the convergence speed adjustment module is specifically configured to:

As an optional implementation manner, the convergence speed adjustment module is specifically configured to:

As an alternative embodiment, the steady state error of the first filter is smaller than the steady state error of the second filter, and/or the convergence speed of the second filter is greater than the convergence speed of the first filter.

In a fourth aspect, embodiments of the present invention also provide a computer storage medium having stored thereon a computer program for carrying out the steps of the method of the first aspect described above when executed by a processor.

These and other aspects of the present application will be more readily apparent from the following description of the embodiments.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it will be apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of an application scenario of echo cancellation according to an embodiment of the present invention;

Fig. 2 is a flowchart of an implementation of a method for echo cancellation according to an embodiment of the present invention;

Fig. 3 is a flowchart of an implementation method of echo cancellation according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of an apparatus for echo cancellation according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the embodiment of the invention, the term "and/or" describes an association between associated objects and means that three relations can exist. For example, A and/or B can mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

The application scenario described in the embodiment of the present invention is intended to describe the technical solution of the embodiment more clearly and does not limit it. A person of ordinary skill in the art will appreciate that, as new application scenarios appear, the technical solution provided by the embodiment of the present invention is equally applicable to similar technical problems. In the description of the present invention, unless otherwise indicated, "a plurality" means two or more.

In a video conference, people usually talk through a hands-free or video conference terminal, and the microphone picks up the sound played by the loudspeaker together with the near-end speech. As shown in Fig. 1, this embodiment provides a schematic diagram of an application scenario of echo cancellation: the microphone collects the user's voice and, at the same time, the far-end sound played by the loudspeaker, so acoustic echo cancellation is required. Echo cancellation performance is an important index of a voice interaction system and greatly affects the communication experience between a user and a device or between users. Conventional acoustic echo cancellation typically includes two major blocks, linear echo cancellation and echo post-processing, where linear echo cancellation typically uses a normalized least mean square algorithm (Normalized Least Mean Square, NLMS), an affine projection algorithm (Affine Projection, AP), a recursive least squares algorithm (Recursive Least Squares, RLS), a Kalman filter algorithm, or the like to obtain a linear residual signal. When the echo path changes abruptly, the adaptive algorithm must converge again to estimate the current real echo path, and during convergence the linear echo cancellation effect deteriorates rapidly compared with the steady state, producing large-scale echo leakage. Thus, although currently used echo cancellation algorithms can remove echo components from the collected signal to a certain extent, their cancellation effect deteriorates rapidly under an abrupt echo path change.

Echo path change detection (Echo-path Change Detector, ECD) is a significant element in improving the robustness of echo cancellation algorithms. The echo cancellation method provided by this embodiment exploits the different characteristics of the two filters in the steady state and during convergence: from the echo information corresponding to each filter it detects whether the echo path in the current environment has changed abruptly, so that, when it has, aggressive echo suppression measures are adopted to suppress the echo quickly and large-scale echo leakage is avoided.

As shown in fig. 2, the specific implementation flow of the method for echo cancellation according to this embodiment is as follows:

step 200, acquiring an audio signal, wherein the audio signal comprises a near-end signal and a far-end signal, and the near-end signal and the far-end signal are distinguished based on the propagation mode of the audio signal;

In practice, the propagation mode may be propagation through a playback device such as a speaker/loudspeaker, or propagation without a playback device, such as propagation through air. The far-end signal includes the voice signal played through the speaker/loudspeaker, and the near-end signal includes voice signals in the environment other than the one played by the speaker/loudspeaker, such as user voice and ambient noise. The near-end signal includes but is not limited to the echo-containing voice signal collected by the audio collection device, and the far-end signal includes but is not limited to a voice signal played from the far end. For example, the near-end voice signal may be understood as the user voice picked up by the microphone, and the far-end voice signal as the sound that is transmitted to the near-end device through a network or the like and played through the speaker/loudspeaker of the near-end device.

Step 201, filtering the far-end signal by using a first filter and a second filter to obtain respective corresponding echo information, wherein the capacity of the first filter for eliminating echo when an echo path is stable is higher than that of the second filter, and the convergence rate of the second filter when the echo path is changed is higher than that of the first filter;

in some embodiments, before the filtering process is performed on the far-end signal by using the first filter and the second filter, the obtained audio signal may be converted from time domain to frequency domain, and the specific conversion process is as follows, taking the microphone to collect the audio signal as an example:

First, the near-end signal acquired by the microphone, as required for echo cancellation, is denoted $d(n)$, and the far-end signal transmitted from the far end and played through the loudspeaker is denoted $x(n)$. Second, framing and a short-time Fourier transform are applied to the near-end signal to obtain its frequency-domain near-end signal; similarly, framing and a short-time Fourier transform are applied to the far-end signal to obtain its frequency-domain far-end signal.

The framing in this embodiment may use overlapping segmentation, where the offset between consecutive frames is the frame shift and the frame shift is half of the frame length. A short-time Fourier transform converts the near-end signal $d(n)$ and the far-end signal $x(n)$ of the current frame of the audio signal acquired by the microphone, together with those of the previous frame, from the time domain to the frequency domain, giving the frequency-domain signal of the current frame. The frequency-domain near-end signal of the converted current frame is denoted $D(k)$, and the frequency-domain far-end signal is denoted $X(k)$.
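The framing and transform step can be sketched as follows. This is a minimal, dependency-free illustration: the Hann window, the naive DFT, the 16-sample frame length and the test tones are assumptions made here, not details from the embodiment, and a practical implementation would use an FFT library.

```python
import cmath
import math

def frame_signal(signal, frame_len):
    """Split a signal into frames whose frame shift is half the frame length."""
    hop = frame_len // 2
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hann(frame_len):
    """Periodic Hann window (assumed; the text does not name a window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
            for n in range(frame_len)]

def dft(frame):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n_fft = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft))
            for k in range(n_fft)]

def stft(signal, frame_len):
    """Return one spectrum per frame: result[l][k] for frame l and bin k."""
    window = hann(frame_len)
    return [dft([s * w for s, w in zip(frame, window)])
            for frame in frame_signal(signal, frame_len)]

# d(n): near-end (microphone) signal, x(n): far-end (loudspeaker) signal.
d = [math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
x = [math.cos(2 * math.pi * 3 * n / 16) for n in range(64)]
D = stft(d, frame_len=16)  # D[l][k], frequency-domain near-end signal
X = stft(x, frame_len=16)  # X[l][k], frequency-domain far-end signal
```

Each frame shares half of its samples with the previous frame, which corresponds to transforming the current frame together with the previous one.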

In implementation, the first filter and the second filter are used for respectively filtering the frequency domain far-end signal of the far-end signal to obtain the corresponding echo information.

In some embodiments, the echo information in this embodiment includes, but is not limited to, at least one of an echo path and an echo signal, where the echo signal may be obtained by calculation of the echo path and a frequency domain far-end signal, and specifically may be determined by the following formula:

$$Y_{\text{shadow}}(k) = W_{\text{shadow}}(k, l)\,X(k) \tag{1}$$

where $W_{\text{shadow}}(k, l)$ represents the echo path (frequency-domain signal), $Y_{\text{shadow}}(k)$ represents the echo signal (frequency-domain signal), and $X(k)$ represents the frequency-domain far-end signal; $k$ denotes the frequency bin and $l$ denotes the time frame number.

This embodiment uses a dual filter whose two filters have different characteristics. Both the first filter and the second filter can perform linear filtering on the far-end signal, but the first filter outputs the final linear filtering result, whereas the second filter performs foreground prediction and tracks rapid changes of the echo path in time. Using these different characteristics, the echo is cancelled well in the steady state and is suppressed quickly when the path changes abruptly, so that large-scale echo leakage is avoided.

In some embodiments, the steady state error of the first filter is less than the steady state error of the second filter, and/or the convergence speed of the second filter is greater than the convergence speed of the first filter.

It should be noted that, the convergence speed of the filter is generally contradictory to the steady-state error, when the filter has a faster convergence speed, a larger steady-state error is generally brought, whereas when the filter has a lower steady-state error, the convergence speed is generally slower; the present embodiment uses the faster convergence speed of the second filter to track the echo path quickly, and uses the lower steady state error of the first filter as the final linear output.
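The trade-off can be seen in a small system-identification experiment. The sketch below is an illustration only: it runs the same NLMS update with two step sizes on a hypothetical 8-tap echo path. The step sizes, the path coefficients and the noise level are assumed values; the embodiment does not prescribe them, and its first filter may be a different algorithm altogether.

```python
import random

def nlms_misalignment(step, w_true, n_samples, noise_std, seed=0):
    """Run NLMS and return the squared coefficient error after every sample."""
    rng = random.Random(seed)
    taps = len(w_true)
    w_hat = [0.0] * taps
    x_buf = [0.0] * taps
    history = []
    for _ in range(n_samples):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]        # newest far-end sample first
        echo = sum(w * x for w, x in zip(w_true, x_buf))
        d = echo + rng.gauss(0.0, noise_std)               # microphone = echo + noise
        e = d - sum(w * x for w, x in zip(w_hat, x_buf))   # error before the update
        norm = sum(x * x for x in x_buf) + 1e-6            # regularised input energy
        w_hat = [w + step * e * x / norm for w, x in zip(w_hat, x_buf)]
        history.append(sum((a - b) ** 2 for a, b in zip(w_true, w_hat)))
    return history

# Hypothetical echo path and settings, chosen only for the demonstration.
w_true = [0.8, -0.5, 0.3, 0.2, -0.1, 0.05, 0.02, -0.01]
fast = nlms_misalignment(step=1.0, w_true=w_true, n_samples=4000, noise_std=0.1)
slow = nlms_misalignment(step=0.05, w_true=w_true, n_samples=4000, noise_std=0.1)

early_fast, early_slow = fast[49], slow[49]   # error after 50 samples
late_fast = sum(fast[-500:]) / 500            # average error near the end
late_slow = sum(slow[-500:]) / 500
```

With these settings the large-step filter is expected to be much closer to the true path after 50 samples, while the small-step filter ends with the lower average error, which mirrors the roles of the second and first filter respectively.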

In some embodiments, the filtering algorithm adopted by the first filter and/or the second filter includes, but is not limited to, at least one of the LMS (Least Mean Square) algorithm, the NLMS (Normalized LMS) algorithm, the RLS (Recursive Least Squares) algorithm, and the Kalman filter algorithm.

Step 202, adjusting the echo suppression aggressiveness according to the echo information corresponding to each filter, and determining a first residual signal by using the echo information corresponding to the first filter and the near-end signal;

In some embodiments, when the echo information includes an echo path, the echo suppression aggressiveness may be adjusted according to the similarity of the echo paths, specifically:

determining the similarity between the echo paths corresponding to the respective filters; and adjusting the echo suppression aggressiveness according to the similarity, wherein the echo suppression aggressiveness increases with the increase of the similarity.

In practice, since the main function of the second filter in this embodiment is to track rapid echo path changes in time, a filtering algorithm with faster convergence, such as the NLMS algorithm, is adopted, and the frequency-domain form of the echo path estimated by the second filter is denoted $W_{\text{shadow}}(k, l)$.

Since the first filter in this embodiment provides the final linear output, its function is to cancel as much echo as possible, so a filter with a smaller steady-state error, such as a Kalman filter, is used; after convergence the first filter has a lower steady-state error and removes the echo better. Taking a Kalman filter as the first filter as an example, the frequency-domain form of the echo path estimated by the first filter is denoted $W_{\text{main}}(k, l)$.

Optionally, the similarity between echo paths is determined by the following formula:

$$\text{similarity}(k) = \text{Distance}\big(W_{\text{main}}(k, l),\ W_{\text{shadow}}(k, l)\big) \tag{2}$$

where $\text{similarity}(k)$ represents the similarity, $\text{Distance}(\cdot)$ represents computing the distance between the echo paths, $W_{\text{shadow}}(k, l)$ represents the echo path corresponding to the second filter, and $W_{\text{main}}(k, l)$ represents the echo path corresponding to the first filter. The distance in this embodiment may be calculated by cosine similarity, cepstrum distance, KL divergence, or the like, which is not limited in this embodiment.
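Cosine similarity, one of the options listed above, could be computed along the following lines. The sketch is an assumption in several respects: it treats each echo path as a vector of complex coefficients (for example, collected over several frames or filter partitions), takes the magnitude of the normalised inner product so the result lies in [0, 1], and also returns the complementary distance-style score. The function names are hypothetical.

```python
import math

def cosine_similarity(w_main, w_shadow, eps=1e-12):
    """Magnitude of the normalised inner product of two complex vectors."""
    inner = sum(a * b.conjugate() for a, b in zip(w_main, w_shadow))
    norm_main = math.sqrt(sum(abs(a) ** 2 for a in w_main))
    norm_shadow = math.sqrt(sum(abs(b) ** 2 for b in w_shadow))
    return abs(inner) / (norm_main * norm_shadow + eps)

def path_distance(w_main, w_shadow):
    """Distance-style score: near 0 for matching paths, 1 for orthogonal ones."""
    return 1.0 - cosine_similarity(w_main, w_shadow)

# Hypothetical path estimates from the first (main) and second (shadow) filter.
w_main = [0.9 + 0.1j, -0.4 + 0.2j, 0.1 - 0.3j]
w_shadow_same = list(w_main)
w_shadow_changed = [0.1 - 0.8j, 0.5 + 0.5j, -0.6 + 0.1j]

score_same = cosine_similarity(w_main, w_shadow_same)
score_changed = cosine_similarity(w_main, w_shadow_changed)
```

Which of the two scores is compared against the thresholds depends on whether the implementation reads formula (2) as a similarity or as a distance.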

In some embodiments, the state of the first filter is determined according to the similarity between the respective echo paths, and the echo suppression aggressiveness is adjusted according to the state of the first filter. The similarity takes values in [0, 1]: the closer it is to 0, the greater the difference between $W_{\text{shadow}}(k, l)$ and $W_{\text{main}}(k, l)$; the closer it is to 1, the more alike $W_{\text{shadow}}(k, l)$ and $W_{\text{main}}(k, l)$ are and the smaller the difference. The greater the similarity, the better the convergence state of the first filter; the smaller the similarity, the more divergent the state of the first filter. The better the convergence state of the first filter, the smaller the echo suppression aggressiveness; the more divergent the first filter, the greater the echo suppression aggressiveness, where the echo suppression aggressiveness varies within a certain range.

In implementation, the relationship between the state of the first filter and the similarity in the present embodiment is as follows:

where $L_1$ may be 0. When the similarity approaches 0, the first filter is in the converged state; when the similarity is greater than 0 and less than $L_2$, the first filter is in the under-filtered state; when the similarity is greater than $L_2$, the first filter is in the divergent state.

Optionally, the echo suppression aggressiveness is adjusted according to the state of the first filter by:

where $\gamma$ represents the echo suppression aggressiveness and takes values in [0, 1]; the higher the value of $\gamma$, the stronger the echo suppression and, correspondingly, the more serious the damage to near-end speech. The echo suppression aggressiveness is adjusted according to the state of the first filter, the three states corresponding to grade_1, grade_2 and grade_3 respectively, where normally grade_1 < grade_2 < grade_3.
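The threshold description above can be sketched as a small lookup. The thresholds `l1` and `l2`, the three grade values, the state names and the handling of a score exactly equal to `l2` are all assumptions chosen for illustration; the text only requires that the grades lie in [0, 1] and normally increase from grade_1 to grade_3.

```python
# Hypothetical grades; only their ordering and the [0, 1] range come from the text.
GRADE_1, GRADE_2, GRADE_3 = 0.2, 0.5, 0.9

def first_filter_state(score, l1=0.05, l2=0.5):
    """Classify the first filter from the path score of formula (2).

    Follows the threshold description: a score near 0 means converged,
    a score between the thresholds means under-filtered, and a score
    above l2 means divergent (a score equal to l2 is treated as divergent).
    """
    if score <= l1:
        return "converged"
    if score < l2:
        return "under_filtered"
    return "divergent"

def aggressiveness_for_state(state):
    """Map the state of the first filter to gamma."""
    return {"converged": GRADE_1,
            "under_filtered": GRADE_2,
            "divergent": GRADE_3}[state]

gamma = aggressiveness_for_state(first_filter_state(0.3))
```

A diverging first filter therefore selects the largest gamma, accepting more near-end speech damage in exchange for less echo leakage.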

In some embodiments, when the echo information includes an echo signal, the echo suppression aggressiveness may be adjusted according to an energy ratio of the residual signals, specifically:

step 1) determining a second residual signal according to echo information corresponding to the second filter and the near-end signal;

in implementation, the second residual signal is determined according to the difference value between the near-end signal and the echo signal corresponding to the second filter, and is specifically determined by the following formula:

$$e_{\text{shadow}}(n) = d(n) - y_{\text{shadow}}(n) \tag{3}$$

where $Y_{\text{shadow}}(k)$ represents the echo signal corresponding to the second filter, $y_{\text{shadow}}(n)$ represents the time-domain form of $Y_{\text{shadow}}(k)$, $d(n)$ represents the acquired near-end signal (time-domain signal), and $e_{\text{shadow}}(n)$ represents the second residual signal.

Step 2) determining an energy ratio according to the power of the first residual signal and the power of the second residual signal;

In implementation, the first residual signal is determined from the difference between the near-end signal and the echo signal corresponding to the first filter, specifically by the following formulas:

$$Y_{\text{main}}(k) = W_{\text{main}}(k, l)\,X(k) \tag{4}$$

$$e_{\text{main}}(n) = d(n) - y_{\text{main}}(n) \tag{5}$$

In formula (4), $W_{\text{main}}(k, l)$ represents the echo path (frequency-domain signal) estimated by the first filter, $X(k)$ represents the frequency-domain far-end signal, and $Y_{\text{main}}(k)$ represents the echo signal (frequency-domain signal) corresponding to the first filter; $k$ denotes the frequency bin and $l$ denotes the time frame number.

In formula (5), $y_{\text{main}}(n)$ represents the time-domain form of $Y_{\text{main}}(k)$, $d(n)$ represents the acquired near-end signal (time-domain signal), and $e_{\text{main}}(n)$ represents the first residual signal (time-domain signal).
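Formulas (1) and (3) to (5) share one pattern: multiply the far-end spectrum by an echo path, return to the time domain, and subtract from the near-end signal. A minimal single-frame sketch follows. It is an illustration under simplifying assumptions: there is no windowing or overlap handling, the per-bin product amounts to circular convolution, the naive DFT stands in for an FFT, and the flat echo paths and test signals are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive forward DFT (illustration only)."""
    n_fft = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft))
            for k in range(n_fft)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n_fft = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                 for k in range(n_fft)) / n_fft).real
            for n in range(n_fft)]

def residual(d_frame, x_frame, w_path):
    """Return e(n) = d(n) - y(n), where Y(k) = W(k, l) X(k)."""
    X = dft(x_frame)
    Y = [w_k * x_k for w_k, x_k in zip(w_path, X)]        # formulas (1) and (4)
    y = idft(Y)                                           # time-domain echo estimate
    return [d_n - y_n for d_n, y_n in zip(d_frame, y)]    # formulas (3) and (5)

# Hypothetical frame: the true echo is 0.5 * x(n), plus some near-end speech.
x = [math.sin(0.7 * n) for n in range(16)]
speech = [0.1 * math.cos(1.3 * n) for n in range(16)]
d = [0.5 * x_n + s_n for x_n, s_n in zip(x, speech)]

w_main = [0.5] * 16     # first filter: matches the true path in this example
w_shadow = [0.4] * 16   # second filter: slightly off in this example
e_main = residual(d, x, w_main)
e_shadow = residual(d, x, w_shadow)
```

In this constructed example the first residual equals the near-end speech, while the second residual still contains 0.1 times the far-end signal.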

Optionally, the energy ratio is determined according to the ratio of the power of the first residual signal and the power of the second residual signal.

Step 3) adjusting the echo suppression aggressiveness according to the energy ratio, wherein the echo suppression aggressiveness increases with the increase of the energy ratio within a certain range.

Optionally, the echo suppression aggressiveness may be adjusted adaptively based on the energy ratio. For example, several energy-ratio intervals may be set, each corresponding to one echo suppression aggressiveness; when the calculated energy ratio falls in an interval, the aggressiveness corresponding to that interval is taken as the adjustment target, and the current echo suppression aggressiveness is adjusted to that target. Alternatively, a curve relating the energy ratio to the echo suppression aggressiveness may be set, for example a linear or nonlinear relationship, provided that the aggressiveness increases with the energy ratio. How the echo suppression aggressiveness is adjusted based on the energy ratio can be chosen according to actual requirements and is not limited in this embodiment.

When the energy ratio exceeds a threshold, which indicates that the first filter is in a divergent state, the first filter may be reset.

In some embodiments, the state of the first filter is determined according to the echo information corresponding to each filter, and the echo suppression aggressiveness is adjusted according to the state of the first filter.

Optionally, the state of the first filter is determined according to the energy ratio, and the echo suppression aggressiveness is adjusted according to the state of the first filter. A smaller energy ratio indicates a better convergence state of the first filter, with its echo path closer to the real echo path; the better the convergence state of the first filter, the smaller the echo suppression aggressiveness, which varies within a certain range.
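Both options described above, a table of intervals and a monotonic curve, can be sketched in a few lines. The interval edges, the levels and the clamped linear mapping are hypothetical values picked for illustration; the text only requires that the result grows with the energy ratio.

```python
import bisect

def gamma_from_intervals(ratio,
                         edges=(0.5, 1.0, 2.0),
                         levels=(0.1, 0.3, 0.6, 0.9)):
    """Pick the level of the interval that contains the energy ratio.

    edges split the ratio axis into len(edges) + 1 intervals; a ratio equal
    to an edge falls into the interval above that edge.
    """
    return levels[bisect.bisect_right(edges, ratio)]

def gamma_linear(ratio, lo=0.5, hi=2.0, gamma_min=0.1, gamma_max=0.9):
    """Linear mapping of the ratio onto [gamma_min, gamma_max], clamped outside [lo, hi]."""
    t = min(max((ratio - lo) / (hi - lo), 0.0), 1.0)
    return gamma_min + t * (gamma_max - gamma_min)

target_from_table = gamma_from_intervals(1.4)
target_from_curve = gamma_linear(1.4)
```

The table gives stepwise targets that are easy to tune per interval, whereas the curve avoids jumps in the target as the ratio drifts across an edge.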

In practice, the power of the first residual signal is calculated by the following formula:

where $\beta$ is a preset value representing the smoothing factor, $e_{\text{main}}(i)$ represents the first residual signal, and $P_{\text{main}}(n)$ represents the power of the first residual signal.

where $\beta$ is a preset value representing the smoothing factor, $e_{\text{shadow}}(i)$ represents the second residual signal, and $P_{\text{shadow}}(n)$ represents the power of the second residual signal.

In practice, $P_{\text{main}}/P_{\text{shadow}}$ represents the energy ratio; when the first filter is in the steady state, $P_{\text{main}}/P_{\text{shadow}}$ is typically less than 1. The state of the first filter may be determined from the energy ratio by:

where $T_1$ represents a value greater than 1 and close to 1, $T_2 > 1$ and $T_2 > T_1$. When the energy ratio is close to 1, the first filter is in the converged state; when the energy ratio is greater than 1 and less than $T_2$, the first filter is in the under-filtered state; when the energy ratio is greater than or equal to $T_2$, the first filter is in the divergent state.
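The power formulas themselves are not reproduced in this text, so the following is only a plausible sketch: it assumes first-order recursive smoothing with factor beta over the mean squared residual of each frame and forms the ratio of the first to the second residual power. The function names, the default beta and the sample frames are hypothetical.

```python
def smoothed_power(prev_power, residual_frame, beta=0.9):
    """Assumed recursive estimate: beta * previous + (1 - beta) * frame power."""
    frame_power = sum(e * e for e in residual_frame) / len(residual_frame)
    return beta * prev_power + (1.0 - beta) * frame_power

def energy_ratio(p_main, p_shadow, eps=1e-12):
    """Ratio of the first residual power to the second residual power."""
    return p_main / (p_shadow + eps)

# Hypothetical residual frames from the first (main) and second (shadow) filter.
p_main, p_shadow = 0.0, 0.0
for e_main_frame, e_shadow_frame in [([0.1, -0.1, 0.1, -0.1], [0.2, -0.2, 0.2, -0.2]),
                                     ([0.1, 0.1, -0.1, -0.1], [0.2, 0.2, -0.2, -0.2])]:
    p_main = smoothed_power(p_main, e_main_frame)
    p_shadow = smoothed_power(p_shadow, e_shadow_frame)

ratio = energy_ratio(p_main, p_shadow)
```

Here the first residual has a quarter of the power of the second, so the ratio settles near 0.25.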
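The threshold test on the energy ratio, together with the optional reset of a divergent first filter, might look as follows. The values of `t1`, `t2` and `reset_threshold`, the state names, and the decision to treat every ratio up to `t1` as converged are assumptions for illustration.

```python
def state_from_energy_ratio(ratio, t1=1.05, t2=2.0):
    """Classify the first filter from P_main / P_shadow.

    t1 is slightly above 1 and t2 is larger than t1, as described;
    the numeric defaults are hypothetical.
    """
    if ratio >= t2:
        return "divergent"
    if ratio > t1:
        return "under_filtered"
    return "converged"

def should_reset_first_filter(ratio, reset_threshold=4.0):
    """Return True when the ratio exceeds an assumed reset threshold."""
    return ratio > reset_threshold

states = [state_from_energy_ratio(r) for r in (0.3, 1.5, 2.0, 6.0)]
reset_flags = [should_reset_first_filter(r) for r in (0.3, 1.5, 2.0, 6.0)]
```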

When the environment is stable, the first filter converges after a sufficient time, and generally enters a converging state, and at this time, the energy of the first residual signal corresponding to the first filter is smaller than the energy of the second residual signal corresponding to the second filter. When the environment has slight changes such as walking, dragging a desk and a chair, the filter usually enters an under-filtering state, and the energy of a first residual signal corresponding to the first filter is slightly smaller than or equal to that of a second residual signal corresponding to the second filter. When the acquisition equipment is dragged, the conference room is opened and closed, and the horn is shielded, the first filter is likely to enter a divergent state.

The thresholds $T_1$ and $T_2$ can be set according to actual use. In general, $P_{main}(n) < P_{shadow}(n)$ when the first filter is in the converged state or in part of the under-filtered state.

After the state of the first filter is determined according to the energy ratio, the echo suppression aggressiveness can be adjusted according to the state of the first filter, specifically as follows:

In some embodiments, when the echo suppression aggressiveness is adjusted according to the respective echo information, the state of the first filter may first be determined according to the respective echo information, and the echo suppression aggressiveness is then adjusted according to the state of the first filter. The more converged the first filter, the smaller the echo suppression aggressiveness; the more divergent the first filter, the greater the echo suppression aggressiveness.

Step 203, suppressing the first residual signal by using the adjusted echo suppression aggressiveness to obtain an audio signal after echo cancellation.

In some embodiments, while adjusting the echo suppression aggressiveness according to the respective echo information, this embodiment may also adjust the convergence speed of the first filter according to the respective echo information; the far-end signal is then filtered using the adjusted first filter to obtain new echo information, the echo suppression aggressiveness is further adjusted according to the new echo information and the echo information corresponding to the second filter, and the first residual signal is suppressed using the further adjusted echo suppression aggressiveness to obtain an audio signal after echo cancellation.

It should be noted that echo cancellation in this embodiment is performed in real time on each received audio signal frame. While the echo in the current audio signal frame is suppressed using the adjusted echo suppression aggressiveness, the next audio signal frame is filtered using the first filter with its adjusted convergence speed; the residual signal between the resulting echo information and the near-end signal is then suppressed using the readjusted echo suppression aggressiveness, so as to obtain the next audio signal frame after echo cancellation.

In this embodiment, dual filters with different characteristics are used. Both the first filter and the second filter can perform linear filtering on the far-end signal, but the first filter outputs the final linear filtering result, while the second filter performs foreground prediction and promptly tracks rapid changes of the echo path. By exploiting the different characteristics of the two filters, echo is cancelled better in a steady state, and when the path changes abruptly the first filter can converge quickly and suppress the echo quickly, so that large-area echo leakage is avoided.

It should be noted that the convergence speed of a filter generally conflicts with its steady-state error: a faster convergence speed usually brings a larger steady-state error, whereas a lower steady-state error usually comes with a slower convergence speed. In this embodiment, when an abrupt path change is detected, the echo is suppressed quickly by adjusting the convergence speed of the first filter, so as to avoid large-area echo leakage.

In some embodiments, when the echo information includes an echo path, the convergence speed of the first filter is adjusted according to the similarity of the echo paths, specifically:

the similarity between the respective echo paths is determined; and the convergence speed of the first filter is adjusted according to the similarity, wherein the convergence speed decreases as the similarity increases.

In practice, taking a Kalman filter as an example of the first filter, the frequency-domain form of the echo path estimated by the first filter is denoted $W_{main}(k, l)$. Taking the NLMS algorithm with a larger step size as an example of the second filter, the frequency-domain form of the echo path estimated by the second filter is denoted $W_{shadow}(k, l)$.

In practice, $W_{main}(k, l)$ can be iteratively updated according to the first residual signal and the planned step size, with the following update rule:

$W_{main}(k, l+1) = A\,W^{+}_{main}(k, l)$    equation (7);

where $A$ is a preset value; when the echo path is assumed to change slowly, $A$ is generally set to a value close to 1. $W^{+}_{main}(k, l)$ represents an intermediate variable used to calculate the echo path of the far-end signal of the current audio signal frame.

$W^{+}_{main}(k, l) = W_{main}(k, l) + K(k)\,E_{main}(k)$    equation (8);

wherein $W_{main}(k, l)$ represents the frequency-domain form of the echo path estimated by the first filter, $E_{main}(k)$ represents the frequency-domain form of the first residual signal, and $K(k)$ represents the Kalman gain, which can be understood as a form of step size in the echo path estimation process.

$P(k, l+1) = A^{2}\,P^{+}(k, l) + \psi_{\Delta\Delta}(k)$    equation (9);

wherein $P(k, l+1)$ represents a noise covariance matrix, $P^{+}(k, l)$ represents an intermediate variable for calculating the noise covariance matrix, $A$ is a preset value, and $\psi_{\Delta\Delta}(k)$ represents a process noise covariance matrix, which can be understood as the degree of deviation between the echo path currently estimated by the first filter during the estimation process and the real echo path. It has a relatively large effect on the whole algorithm and can be calculated by a variety of estimation methods, which this embodiment does not unduly limit. One such estimation method can be expressed as $\psi_{\Delta\Delta}(k) = (1 - A^{2})\,E[W_{main}(k, l)\,W_{main}^{H}(k, l)]$.

Wherein $I_M$ represents the identity matrix, $C(k)$ represents a fixed coefficient matrix, and the remaining matrix term represents the adjusted noise covariance matrix; $K(k)$ represents the Kalman gain, $k$ represents the frequency bin, and $l$ represents the time frame number.

Wherein $P(k, l)$ represents the noise covariance matrix of the first filter and $\alpha$ represents the adjustment parameter. The value of $\alpha$ directly affects the echo path tracking capability and the steady-state error, so it can be monitored and controlled in real time, allowing the filter to converge quickly when the path changes while keeping the steady-state error as low as possible.

Wherein $K(k)$ represents the Kalman gain, $P(k, l)$ represents the noise covariance matrix of the first filter, $C(k)$ represents a fixed coefficient matrix, and $\psi_{ss}(k)$ represents the observation noise in Kalman filtering, which can be expressed as $\psi_{ss}(k) = E[E_{main}(k)\,E_{main}^{H}(k)]$, where $E_{main}(k)$ represents the frequency-domain form of the first residual signal. $B$ represents the number of blocks into which the frequency domain is divided, and $b$ represents the block index.
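To make the roles of equations (7) to (9) more concrete, the following Python sketch runs a single-bin, scalar version of the main-filter update. Several pieces are assumptions of this sketch and not the formulas of this embodiment: the scalar Kalman gain with $C(k) = X(k)$, the form of the intermediate covariance, the scaling of $P(k, l)$ by $\alpha$, a fixed `psi_ss`, and the replacement of the expectation in $\psi_{\Delta\Delta}(k)$ by an instantaneous value. The function name and all numeric values are hypothetical.

```python
import cmath


def kalman_bin_update(w, p, x, d, a=0.999, alpha=1.0, psi_ss=1e-4):
    """Update one frequency bin once; returns (w_next, p_next, residual)."""
    residual = d - w * x                        # E_main(k): near-end bin minus echo estimate
    p_adj = alpha * p                           # assumed adjustment: alpha scales P(k, l)
    # Assumed scalar Kalman gain with C(k) = X(k); the source's gain formula is not reproduced.
    gain = p_adj * x.conjugate() / (p_adj * abs(x) ** 2 + psi_ss)
    w_plus = w + gain * residual                # equation (8)
    p_plus = (1.0 - (gain * x).real) * p_adj    # assumed form of the intermediate covariance
    psi_dd = (1.0 - a ** 2) * abs(w) ** 2       # instantaneous stand-in for the expectation
    w_next = a * w_plus                         # equation (7)
    p_next = a ** 2 * p_plus + psi_dd           # equation (9)
    return w_next, p_next, residual


true_path = 0.6 - 0.3j                          # hypothetical echo path for one bin
w, p = 0j, 1.0
for frame in range(200):
    x = cmath.exp(1j * 0.37 * frame)            # unit-magnitude far-end bin
    d = true_path * x                           # echo only, no near-end speech
    w, p, residual = kalman_bin_update(w, p, x, d)
```

In this toy setting the estimate `w` settles close to `true_path`; a larger `alpha` inflates the covariance and hence the gain, which is the lever the embodiment uses to speed up convergence.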

In practice, $W_{shadow}(k, l)$ can be iteratively updated according to the second residual signal and the planned step size, with the following update rule:

wherein $X(k)$ represents the frequency-domain far-end signal and $\mu(k)$ represents the planned step size; $\mu$ and $\Delta$ are preset fixed values, where $\mu$ is the step size and may be set to 0.5, and $\Delta$ is a variable that prevents the denominator from being 0 and may be set to a small value, for example 1e-10.

$W_{shadow}(k, l+1) = W_{shadow}(k, l) + \mu(k)\,X^{H}(k)\,E(k)$    equation (14);

wherein $X(k)$ represents the frequency-domain far-end signal, $k$ represents the frequency bin, $l$ represents the time frame number, $\mu(k)$ represents the planned step size, and $E(k)$ represents the frequency-domain form of the second residual signal $e_{shadow}(n)$; $W_{shadow}(k, l)$ represents the echo path obtained by the second filter filtering the far-end signal of the $l$-th frame, and $W_{shadow}(k, l+1)$ represents the echo path obtained by the second filter filtering the far-end signal of the $(l+1)$-th frame.
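The shadow-filter update of equation (14) can be sketched for a single frequency bin as follows. The normalization $\mu(k) = \mu / (|X(k)|^2 + \Delta)$ is an assumption of this sketch, chosen as the usual NLMS form consistent with $\Delta$ guarding the denominator; the function name and the test signal are hypothetical.

```python
import cmath


def nlms_bin_update(w, x, d, mu=0.5, delta=1e-10):
    """Update one frequency bin of the shadow filter; returns (w_next, residual)."""
    residual = d - w * x                          # E(k): second residual in the frequency domain
    step = mu / (abs(x) ** 2 + delta)             # assumed normalization giving mu(k)
    w_next = w + step * x.conjugate() * residual  # equation (14)
    return w_next, residual


true_path = -0.4 + 0.55j                          # hypothetical echo path for one bin
w_shadow = 0j
for frame in range(40):
    # Far-end bin with varying magnitude and phase (made-up excitation).
    x = cmath.rect(1.0 + 0.1 * (frame % 5), 0.9 * frame)
    d = true_path * x                             # echo only, no near-end speech
    w_shadow, residual = nlms_bin_update(w_shadow, x, d)
```

With $\mu = 0.5$ and no near-end signal, the estimation error in this toy case roughly halves on every frame, which illustrates why the large-step second filter tracks path changes quickly.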

Optionally, the similarity (k) between echo paths is determined by the above formula (2).

In some embodiments, the state of the first filter is determined according to the similarity between the respective echo paths, and the convergence speed of the first filter is adjusted according to the state of the first filter.

In some embodiments, said adjusting the convergence speed of the first filter according to the respective echo information comprises:

adjusting a noise covariance matrix of the first filter according to the echo information corresponding to each filter, wherein the noise covariance matrix is used for representing the deviation degree of echo path estimation of the first filter; and determining the convergence speed of the first filter according to the adjusted noise covariance matrix.

In implementation, the state of the first filter is first determined according to the similarity between the respective echo paths, the noise covariance matrix of the first filter is adjusted according to the state of the first filter, and the convergence speed of the first filter is determined according to the adjusted noise covariance matrix. The greater the similarity, the more converged the first filter; the smaller the similarity, the more divergent the first filter. A greater similarity indicates a more stable current path, in which case the echo cancellation capability in a steady state can be improved by reducing the convergence speed.

Step 1) determining the state of the first filter according to the similarity between the respective echo paths.

The relationship between the state of the first filter and the similarity in the present embodiment is as follows:

Step 2) adjusting a noise covariance matrix of the first filter according to the state of the first filter.

Optionally, the noise covariance matrix of the first filter is adjusted according to the state of the first filter by:

wherein the adjustment parameter $\alpha$ is determined according to the state of the first filter, and the noise covariance matrix $P(k)$ of the first filter is adjusted by using the adjustment parameter $\alpha$, i.e. the adjusted noise covariance matrix is obtained according to formula (11).

Step 3) adjusting the convergence speed of the first filter according to the noise covariance matrix.

The adjustment parameter $\alpha$ is used to correct the magnitude of the noise covariance matrix in the first filter. When the filter is in the converged state, $T_3$ takes a smaller value to ensure a lower steady-state error; when the filter needs to converge, the echo path is tracked quickly by increasing the value of $\alpha$. In general, $T_3 < T_4 < T_5$ can be set.
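A minimal Python sketch of steps 2) and 3) is given below. It assumes that the three states map to $\alpha = T_3$, $T_4$, $T_5$ respectively and that the adjustment simply scales each per-bin covariance value by $\alpha$; both the mapping and the numeric values are hypothetical and are not taken from the formulas of this embodiment.

```python
# Hypothetical values chosen only to satisfy T3 < T4 < T5.
ALPHA_BY_STATE = {
    "converged": 1.0,       # T3: small alpha keeps the steady-state error low
    "under_filtered": 2.0,  # T4
    "divergent": 8.0,       # T5: large alpha speeds up echo path tracking
}


def adjust_noise_covariance(p, state, alpha_by_state=ALPHA_BY_STATE):
    """Scale the per-bin noise covariance values in `p` by the alpha for `state`."""
    alpha = alpha_by_state[state]
    return [alpha * value for value in p]


adjusted = adjust_noise_covariance([0.5, 0.25], "divergent")
```

A larger adjusted covariance leads to a larger Kalman gain on the next update, which is how the adjustment translates into a faster convergence speed.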

In some embodiments, when the echo information includes an echo signal, the convergence speed of the first filter may be adjusted according to an energy ratio of a residual signal, specifically:

In practice, the second residual signal $e_{shadow}(n)$ may be determined by the above equation (3), and the first residual signal $e_{main}(n)$ may be determined according to equation (5); details are not repeated here.

In some embodiments, the energy ratio may also be determined by the absolute value of the difference between the power of the first residual signal and the power of the second residual signal. The energy ratio in this embodiment characterizes the energies of the residual signals, and this embodiment does not unduly limit how the energy ratio is determined from the residual signals.

Step 3) adjusting the convergence speed of the first filter according to the energy ratio, wherein the convergence speed increases as the energy ratio increases.

Optionally, the convergence speed may be adaptively adjusted based on the energy ratio. For example, a plurality of energy ratio intervals may be set, each corresponding to one convergence speed; when the calculated energy ratio falls in a certain interval, the convergence speed corresponding to that interval is taken as the adjustment target, and the convergence speed of the current first filter is adjusted to that target. Alternatively, a curve relating the energy ratio to the convergence speed may be set, for example a linear or nonlinear relationship, provided that the convergence speed increases as the energy ratio increases. How the convergence speed is adjusted based on the energy ratio can be chosen according to actual requirements, which this embodiment does not unduly limit.
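Both options described above, the interval lookup and the curve, can be sketched in Python as follows. The interval edges, the speed values, and the function names are hypothetical; the only property taken from the text is that the convergence speed does not decrease as the energy ratio increases.

```python
from bisect import bisect_right

INTERVAL_EDGES = [1.0, 2.0]        # hypothetical boundaries between energy-ratio intervals
INTERVAL_SPEEDS = [0.2, 0.5, 1.0]  # hypothetical target speed per interval, non-decreasing


def speed_from_intervals(energy_ratio):
    """Pick the convergence speed of the interval that contains `energy_ratio`."""
    return INTERVAL_SPEEDS[bisect_right(INTERVAL_EDGES, energy_ratio)]


def speed_from_linear_curve(energy_ratio, low=0.5, high=2.0, min_speed=0.2, max_speed=1.0):
    """Map the energy ratio to a speed along a clipped straight line."""
    clipped = min(max(energy_ratio, low), high)
    fraction = (clipped - low) / (high - low)
    return min_speed + fraction * (max_speed - min_speed)
```

The interval version changes the target speed in discrete jumps, whereas the linear curve varies it smoothly between the two clipping points.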

In some embodiments, a convergence state of the first filter is determined according to the respective echo information, and a convergence speed of the first filter is adjusted according to the convergence state of the first filter.

Optionally, the state of the first filter is determined according to the energy ratio, the noise covariance matrix of the first filter is adjusted according to the state of the first filter, and the convergence speed of the first filter is determined according to the adjusted noise covariance matrix of the first filter. The smaller the energy ratio, the more converged the filter; the larger the energy ratio, the more divergent the filter. The more converged the filter, the smaller the adjustment parameter $\alpha$; the more divergent the filter, the larger the adjustment parameter $\alpha$.

In practice, the power of the first residual signal may be calculated according to the above formula (6), and the power of the second residual signal may be calculated according to the above formula (7).

wherein $T_1$ represents a value greater than 1 and close to 1, $T_2 > 1$, and $T_2 > T_1$. When the energy ratio is close to 1, the first filter is in the converged state; when the energy ratio is greater than 1 and less than $T_2$, the first filter is in the under-filtered state; when the energy ratio is greater than or equal to $T_2$, the first filter is in the divergent state.

After determining the state of the filter according to the energy ratio, the noise covariance matrix of the first filter may be adjusted according to the state of the first filter:

The convergence speed of the first filter is determined according to the adjusted noise covariance matrix.

In this embodiment, a dual-filter architecture is adopted. A first filter with a variable step size serves as the main filter and has good echo suppression capability in a steady state; a second filter with a large step size serves as the auxiliary filter and has a faster convergence speed. The main filter and the auxiliary filter behave differently when the echo environment is stable and when it changes, so the similarity of the estimated echo paths or the energy difference of the residual signals can be used as a feature to judge the current state of the first filter. When a path change is detected, the first filter needs to converge as soon as possible: increasing the noise covariance matrix correction coefficient $\alpha$ in the first filter lets it converge rapidly, and increasing the echo suppression aggressiveness avoids large-area residual echo. When a steady state is detected, the adjustment parameter $\alpha$ and the echo suppression aggressiveness are reduced, so as to ensure the degree of echo cancellation and the quality of near-end speech.

As shown in fig. 3, this embodiment further provides a specific implementation method of echo cancellation, where an implementation flow of the method is as follows:

Step 300, obtaining a plurality of audio signal frames, wherein each audio signal frame comprises a near-end signal and a far-end signal;

Step 301, respectively performing filtering processing on the far-end signal contained in the current audio signal frame by using a first filter and a second filter to obtain respective corresponding echo information;

Step 302, adjusting the echo suppression aggressiveness and the convergence speed of the first filter according to the respective echo information;

Optionally, the state of the first filter is determined according to the respective echo information, and the echo suppression aggressiveness and the convergence speed of the first filter are adjusted according to the state of the first filter. In implementation, the state of the first filter may be determined according to the similarity between the respective echo paths, or according to the energy ratio of the residual signals; the detailed determination procedures are described in the above steps and are not repeated here.

The correspondence between the state of the first filter and the echo suppression aggressiveness and the convergence speed of the first filter is as follows:

Wherein $\gamma$ represents the echo suppression aggressiveness, with $\gamma$ in the range $[0, 1]$; the higher the value of $\gamma$, the stronger the suppression of echo and, correspondingly, the more severe the damage to near-end speech. The adjustment parameter $\alpha$ is used to correct the magnitude of the noise covariance matrix in the first filter. When the filter is in the converged state, $T_3$ takes a smaller value to ensure a lower steady-state error; when the filter needs to converge, the echo path is tracked quickly by increasing the value of $\alpha$. The noise covariance matrix $P(k)$ of the first filter is adjusted by the adjustment parameter $\alpha$, i.e. the adjusted noise covariance matrix is obtained according to the above formula (11).
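The correspondence between filter state, $\gamma$, and $\alpha$ might be held in a small lookup table, as in the Python sketch below. All numeric values and names are hypothetical placeholders; the only constraints taken from the text are that $\gamma$ lies in $[0, 1]$ and that both $\gamma$ and $\alpha$ grow as the first filter moves from converged to divergent.

```python
from typing import NamedTuple


class Controls(NamedTuple):
    gamma: float  # echo suppression aggressiveness, in [0, 1]
    alpha: float  # adjustment parameter for the noise covariance matrix


# Hypothetical table; real values would be tuned for the target device.
CONTROLS_BY_STATE = {
    "converged": Controls(gamma=0.2, alpha=1.0),
    "under_filtered": Controls(gamma=0.5, alpha=2.0),
    "divergent": Controls(gamma=0.9, alpha=8.0),
}


def controls_for(state):
    """Return the (gamma, alpha) pair for a filter state, checking the gamma range."""
    controls = CONTROLS_BY_STATE[state]
    if not 0.0 <= controls.gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return controls
```

Keeping both parameters in one table makes it explicit that a single state decision drives the suppression strength and the convergence speed together.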

Step 303, determining a first residual signal by using the echo information corresponding to the first filter and the near-end signal contained in the current audio signal frame;

Step 304, suppressing the first residual signal by using the adjusted echo suppression aggressiveness to obtain the current audio signal frame after echo cancellation;

Step 305, respectively performing filtering processing on the far-end signal contained in the next audio signal frame by using the first filter with the adjusted convergence speed and the second filter to obtain respective corresponding echo information;

Step 306, adjusting the echo suppression aggressiveness and the convergence speed of the first filter again according to the respective echo information;

Step 307, determining a first residual signal by using the echo information corresponding to the first filter and the near-end signal contained in the next audio signal frame;

Step 308, suppressing the first residual signal by using the readjusted echo suppression aggressiveness to obtain the next audio signal frame after echo cancellation.
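The per-frame flow of steps 300 to 308 can be sketched as a loop in Python. Everything concrete here is a stand-in for illustration: `GainFilter` models the echo path as a single gain, the state estimator and control lookup are passed in as callables, and suppression is assumed to scale the residual by $(1 - \gamma)$, which this embodiment does not specify.

```python
class GainFilter:
    """Stand-in filter that models the echo path as a single gain (illustration only)."""

    def __init__(self, gain):
        self.gain = gain
        self.alpha = 1.0

    def filter(self, far_end):
        return [self.gain * sample for sample in far_end]

    def set_alpha(self, alpha):
        self.alpha = alpha


def cancel_echo(frames, main_filter, shadow_filter, estimate_state, controls_for, suppress):
    """Process (near_end, far_end) frames one by one and return the cleaned frames."""
    output = []
    for near_end, far_end in frames:                             # steps 300 and 305
        echo_main = main_filter.filter(far_end)                  # step 301
        echo_shadow = shadow_filter.filter(far_end)
        state = estimate_state(echo_main, echo_shadow, near_end)
        gamma, alpha = controls_for(state)                       # steps 302 and 306
        main_filter.set_alpha(alpha)                             # takes effect on the next frame
        residual = [n - e for n, e in zip(near_end, echo_main)]  # steps 303 and 307
        output.append(suppress(residual, gamma))                 # steps 304 and 308
    return output


frames = [([0.5, 0.5], [1.0, 1.0]), ([0.25, 0.25], [0.5, 0.5])]
main = GainFilter(0.5)
out = cancel_echo(
    frames, main, GainFilter(0.4),
    estimate_state=lambda echo_main, echo_shadow, near_end: "converged",
    controls_for=lambda state: (0.2, 1.0),
    suppress=lambda residual, gamma: [(1.0 - gamma) * r for r in residual],
)
```

Because the stand-in main filter matches the made-up echo path exactly, the residual and therefore the output are zero in this example; with a real adaptive filter the residual would carry near-end speech plus leftover echo.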

This embodiment adopts a dual-filter structure: the first filter can reach a higher degree of echo suppression in a steady state, and the second filter can rapidly track changes of the echo path. By combining the two filters, the state of the current first filter is judged from the different behavior of the two filters in the steady state and when the echo path changes, and the convergence speed of the first filter and the echo suppression aggressiveness are adjusted according to that state, so as to achieve rapid convergence and avoid large-area echo leakage.

Based on the same inventive concept, the embodiment of the present invention further provides an electronic device, and because the electronic device is the electronic device in the method in the embodiment of the present invention, and the principle of the electronic device for solving the problem is similar to that of the method, implementation of the electronic device may refer to implementation of the method, and repeated descriptions are omitted.

As shown in fig. 4, the electronic device comprises a processor 400 and a memory 401, the memory 401 is used for storing a program executable by the processor 400, and the processor 400 is used for reading the program in the memory 401 and executing the following steps:

As an alternative embodiment, when the echo information includes an echo path, the processor 400 is specifically configured to perform:

As an alternative embodiment, when the echo information comprises an echo signal, the processor 400 is specifically configured to perform:

As an alternative embodiment, the processor 400 is specifically configured to perform:

As an alternative embodiment, the processor 400 is specifically further configured to perform:

Based on the same inventive concept, the embodiment of the present invention further provides an echo cancellation device, and since the device is the device in the method in the embodiment of the present invention, and the principle of the device for solving the problem is similar to that of the method, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.

As shown in fig. 5, the apparatus includes:

an audio acquisition module 500, configured to acquire an audio signal, where the audio signal includes a near-end signal and a far-end signal, and the near-end signal and the far-end signal are distinguished based on a propagation manner of the audio signal;

the dual-filtering module 501 is configured to perform filtering processing on the far-end signal by using a first filter and a second filter, so as to obtain respective corresponding echo information, where the capability of the first filter to cancel echo when an echo path is stable is higher than that of the second filter, and the convergence speed of the second filter when the echo path is changed is higher than that of the first filter;

the residual calculation module 502 is configured to adjust the echo suppression aggressiveness according to the respective echo information, and determine a first residual signal by using the echo information corresponding to the first filter and the near-end signal;

and an echo suppression module 503, configured to suppress the first residual signal by using the adjusted echo suppression aggressiveness, so as to obtain an audio signal after echo cancellation.

As an alternative embodiment, when the echo information includes an echo path, the residual calculation module 502 is specifically configured to:

As an alternative embodiment, when the echo information includes an echo signal, the residual calculation module 502 is specifically configured to:

As an alternative embodiment, the residual calculation module 502 is specifically configured to:

Based on the same inventive concept, embodiments of the present disclosure provide a computer storage medium, the computer storage medium including computer program code which, when run on a computer, causes the computer to perform the echo cancellation method of any of the foregoing embodiments. Since the principle by which the computer storage medium solves the problem is similar to that of the echo cancellation method, implementation of the computer storage medium may refer to implementation of the method, and repeated descriptions are omitted.

In a specific implementation, the computer storage medium may include various media that can store program code, such as a USB flash drive (Universal Serial Bus flash drive), a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.

Based on the same inventive concept, the disclosed embodiments also provide a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the echo cancellation method of any of the foregoing embodiments. Since the principle by which the computer program product solves the problem is similar to that of the echo cancellation method, implementation of the computer program product may refer to implementation of the method, and repeated descriptions are omitted.

The computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method of echo cancellation, the method comprising:

2. The method of claim 1, wherein when the echo information includes an echo path, the adjusting the echo suppression aggressiveness according to the respective corresponding echo information comprises:

3. The method of claim 1, wherein when the echo information comprises echo signals, the adjusting the echo suppression aggressiveness according to the respective corresponding echo information comprises:

4. A method according to any one of claims 1 to 3, wherein said adjusting the echo suppression aggressiveness according to the respective echo information comprises:

5. A method according to any one of claims 1 to 3, wherein the method further comprises:

6. The method of claim 5, wherein when the echo information includes an echo path, the adjusting the convergence speed of the first filter according to the respective echo information comprises:

7. The method of claim 5, wherein when the echo information comprises echo signals, the adjusting the convergence speed of the first filter according to the respective echo information comprises:

8. The method of claim 5, wherein said adjusting the convergence speed of the first filter based on the respective echo information comprises:

9. The method of claim 5, wherein said adjusting the convergence speed of the first filter based on the respective echo information comprises:

10. The method according to claim 1, wherein the steady state error of the first filter is smaller than the steady state error of the second filter and/or the convergence speed of the second filter is greater than the convergence speed of the first filter.

11. An electronic device comprising a processor and a memory for storing a program executable by the processor, the processor being arranged to read the program in the memory and to perform the steps of the method according to any one of claims 1 to 10.

12. A computer storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method according to any of claims 1 to 10.