CN102065190A

CN102065190A - Method and device for eliminating echo

Info

Publication number: CN102065190A
Application number: CN2010106181360A
Authority: CN
Inventors: 封伶刚
Original assignee: Hangzhou H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2010-12-31
Filing date: 2010-12-31
Publication date: 2011-05-18
Anticipated expiration: 2030-12-31
Also published as: CN102065190B

Abstract

The invention discloses a method and a device for eliminating echo. The method comprises the steps of: determining the state of each subband adaptive filter according to the cross-correlation coefficient between the near-end input signal and the echo estimation signal of each subband; when a subband adaptive filter is in a single-ended speaking state, replacing the residual signal of the subband adaptive filter with comfort noise of the subband, and outputting the replaced signal; and when a subband adaptive filter is in a double-ended speaking state, outputting the residual signal of the subband adaptive filter. The invention makes full use of the characteristics of subband signal processing, so that residual echo is suppressed more effectively. In addition, the suppression of residual echo during double-ended speaking and the protection of the local speaker's voice are enhanced, and the overall effect and fluency of the system are improved.

Description

Echo cancellation method and device

Technical Field

The present invention relates to the field of communications technologies, and in particular, to an echo cancellation method and an echo cancellation device.

Background

In a voice communication system, after a far-end input signal reaches a local signal receiving device (e.g., a telephone), it is played through the loudspeaker of the device, propagates through the room, and reaches the microphone of the device; in the process, an echo is often generated by sound reflection in the loudspeaker enclosure and the room. To cancel the echo, echo cancellation techniques are required to estimate the echo signal and remove it from the speech signal. Echo cancellation has long been a very challenging task, mainly for the following reasons:

(1) Acoustic echoes enter the microphone directly or after one or more reflections, superimposed on one another, which results in a long echo tail and a long impulse response of the corresponding echo channel, typically several hundred milliseconds.

(2) The spectrum of a speech signal is non-flat, and the behavior of conventional adaptive algorithms depends on the statistical properties of the input signal; the large eigenvalue spread of the autocorrelation matrix of the speech signal slows down the convergence of adaptive algorithms such as NLMS (Normalized Least Mean Square).

(3) The characteristics of the acoustic echo channel are non-stationary: the impulse response of the acoustic echo changes greatly when the speaker or other people or objects in the room move, and these fast changes require AEC (Acoustic Echo Cancellation) to converge as quickly as possible and to have good tracking capability.

(4) In an actual system, owing to the nonlinearity of the audio acquisition and playback equipment, the nonlinear echo that this equipment generates cannot be eliminated by an adaptive filter; owing to environmental noise and the like, the coefficients of the adaptive filter may not perfectly match the actual room impulse response after convergence, which leaves a residual echo that is not eliminated; to handle the nonlinear echo and the linear residual echo, a nonlinear post-processing module needs to be added after the adaptive filter, so that the residual echo is suppressed and the overall effect of the echo cancellation system is improved.

Fig. 1 is a schematic diagram of an echo cancellation post-processing algorithm, in which symbols are described as follows:

x: a far-end input signal;

y: the actual echo signal formed when x passes through the room;

v: local speaker's voice and background noise;

d: a near-end input signal of an echo canceller;

ŷ: the estimated echo obtained through the adaptive filter operation;

e: filtering the output residual signal;

e_post: an output signal after a post-processing algorithm;

h: an actual room impulse response;

ĥ: the adaptive filter coefficients, i.e. the estimate of h.

As shown in fig. 1, the post-processing algorithm builds on the adaptive filter and adds a VAD (Voice Activity Detector) module and a CNG (Comfort Noise Generation) module. The VAD module detects whether the signal received by the near-end microphone contains speech; in this system the microphone signal comprises the echo of the far-end signal after it passes through the room, the voice of the near-end speaker, near-end background noise, and the like. When VAD detection determines that nobody is speaking at either the far end or the near end, that is, when the microphone signal contains only background noise, the spectral characteristics of the background noise (i.e., the LPC (Linear Prediction Coefficient) values and the energy gain) are estimated. Since background noise generally changes slowly, the spectral characteristics estimated by the VAD are periodically updated to the CNG module. The CNG module generates a segment of white-noise excitation and, using the background noise spectral characteristics provided by the VAD, passes it through a prediction error filter composed of the LPC coefficients and the energy gain to generate comfort noise. When the echo cancellation system detects a single-ended speaking state through the double-talk detection module, the NLP (Non-Linear Post-processing) module replaces the residual echo signal output by the adaptive filter with the comfort noise generated by the CNG module, so that the far end does not hear residual echo that has not been completely cancelled. When both ends are talking (i.e., the near end is also talking), the NLP module delivers the adaptive filter output, which contains the near-end speaker's voice, directly to the far end.
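The comfort noise generation step described above can be illustrated with a short sketch. It is not taken from the patent: the function name `generate_comfort_noise`, the Gaussian excitation, and the all-pole sign convention s[n] = gain * w[n] + sum_k lpc[k] * s[n-1-k] are all assumptions chosen for illustration.

```python
import random


def generate_comfort_noise(lpc, gain, num_samples, seed=None):
    """Shape white noise with an all-pole filter built from LPC coefficients.

    Assumed convention (hypothetical, not from the source):
        s[n] = gain * w[n] + sum_k lpc[k] * s[n - 1 - k]
    where w[n] is unit-variance white Gaussian noise.
    """
    rng = random.Random(seed)
    out = []
    for n in range(num_samples):
        excitation = rng.gauss(0.0, 1.0)  # white-noise excitation
        sample = gain * excitation        # apply the energy gain
        for k, a in enumerate(lpc):       # feedback taps shape the spectrum
            if n - 1 - k >= 0:
                sample += a * out[n - 1 - k]
        out.append(sample)
    return out


# Example: one 160-sample frame shaped by a single hypothetical coefficient.
frame = generate_comfort_noise([0.5], 0.1, 160, seed=1)
```

In a real system the coefficients and gain would come from the background-noise estimate made while neither end is speaking; the values used here are placeholders.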

At present, adaptive filter algorithms are either full-band based or sub-band based. Compared with a full-band based post-processing algorithm, a sub-band based post-processing algorithm has specific advantages, which follow from these limitations of the full-band approach:

(1) the sub-band based adaptive filtering algorithm is widely adopted because of its high convergence rate, low computational complexity and the like, whereas a full-band based post-processing algorithm is difficult to extend to sub-bands;

(2) during double-ended speaking, the signal output by the adaptive filter contains both the voice of the near-end speaker and the residual echo signal; a full-band based post-processing algorithm cannot process this residual echo, so the far end can hear it;

(3) in some practical environments, serious device nonlinearity or serious environmental noise prevents the adaptive filter from converging well, so the residual echo is obvious; when near-end speech with an amplitude comparable to that of the echo occurs, the nonlinear processing algorithm has difficulty distinguishing single-ended from double-ended speaking, and a misjudgment causes either poor suppression of residual echo during single-ended speaking or serious speech clipping during double-ended speaking.

The subband-based adaptive filtering algorithm is widely used in the field of echo cancellation because of its high convergence rate and low computational complexity. However, owing to the nonlinear echo and the linear residual echo, a nonlinear post-processing algorithm is required to further process the signal output by the subband adaptive filter and suppress the residual echo. Meanwhile, when only the far end is speaking, comfort noise that matches the spectrum of the near-end background noise is inserted after the residual echo is suppressed, which alleviates the interruption of background noise caused by over-suppression of residual echo during single-ended speaking.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

in a traditional system, the output signals of the subband adaptive filters are first synthesized into a full-band signal and then processed by a full-band based post-processing algorithm; such a post-processing algorithm does not make full use of the advantages of subbands to further improve the post-processing effect.

Disclosure of Invention

The invention aims to provide an echo cancellation method and a device thereof, which are used for realizing echo cancellation based on sub-bands, and therefore, the invention adopts the following technical scheme:

an echo cancellation method, comprising the steps of:

determining the state of each sub-band adaptive filter according to the cross-correlation coefficient of the near-end input signal and the echo estimation signal of each sub-band;

when the sub-band adaptive filter is in a single-end speaking state, replacing a residual signal of the sub-band adaptive filter by comfort noise of the sub-band and then outputting a replaced signal;

when the sub-band adaptive filter is in a double-talk state, a residual signal of the sub-band adaptive filter is output.

In the above method, determining the state of the subband adaptive filter according to the cross-correlation coefficient between the near-end input signal of the subband and the echo estimation signal includes:

when the cross correlation coefficient of the sub-band is larger than or equal to a first threshold value, determining that the sub-band adaptive filter is in a single-ended speaking state;

when the cross correlation coefficient of the sub-band is smaller than or equal to a second threshold value, determining that the sub-band adaptive filter is in a double-talk state;

for the sub-band with the cross correlation coefficient between the second threshold value and the first threshold value, determining the state of the sub-band adaptive filter according to the number or the occupied proportion of the sub-band adaptive filter in the specified state or the average value of the cross correlation coefficient of each sub-band adaptive filter;

wherein 0 < the second threshold < the first threshold < 1.

In the method, for the subband with the cross-correlation coefficient between the second threshold and the first threshold, determining the state of the subband adaptive filter according to the number of the subband adaptive filters in the specified state, specifically:

if the number of the sub-band adaptive filters in the single-ended speaking state exceeds a set threshold, the adaptive filters of the sub-bands with the cross-correlation coefficients between the second threshold and the first threshold are in the single-ended speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; or,

if the number of the sub-band adaptive filters in the double-end speaking state exceeds a set threshold, the adaptive filters of all sub-bands with the cross correlation coefficients between a second threshold and a first threshold are in the double-end speaking state; otherwise, the adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in the single-end speaking state.

In the method, for the sub-band with the cross-correlation coefficient between the second threshold and the first threshold, the state of the sub-band adaptive filter is determined according to the proportion of the sub-band adaptive filter in the specified state, specifically:

if the proportion of the number of the sub-band adaptive filters in the single-ended speaking state exceeds a set threshold, the adaptive filters of the sub-bands with the cross-correlation coefficients between a second threshold and a first threshold are in the single-ended speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; or,

if the proportion of the number of the sub-band adaptive filters in the double-end speaking state exceeds a set threshold, the adaptive filters of the sub-bands with the cross correlation coefficients between a second threshold and a first threshold are in the double-end speaking state; otherwise, the adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in the single-end speaking state.

In the method, for the sub-band with the cross-correlation coefficient between the second threshold and the first threshold, the state of the sub-band adaptive filter is determined according to the average value of the cross-correlation coefficient of each sub-band adaptive filter, which specifically includes:

if the average value of the cross correlation coefficients of all the sub-bands is larger than a third threshold value, the self-adaptive filter of all the sub-bands with the cross correlation coefficients between the second threshold value and the first threshold value is in a single-ended speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is in a double-end speaking state;

wherein 0 < the second threshold < the third threshold < the first threshold < 1.

if the weighted average value of the cross-correlation coefficient of each sub-band is greater than a third threshold value, the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold value and the first threshold value is in a single-ended speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is in a double-end speaking state;

In the above method, the weight value corresponding to a sub-band whose cross-correlation coefficient is greater than the first threshold and the weight value corresponding to a sub-band whose cross-correlation coefficient is less than the second threshold are greater than the weight value of a sub-band whose cross-correlation coefficient is between the second threshold and the first threshold.

In the above method, the weight values of the sub-bands are:

[formula not reproduced in this text]

wherein N is the number of sub-bands, and the remaining term is the energy of the sub-band near-end input signal of the echo canceller.
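The mean-based rule for sub-bands whose coefficient lies between the second and first thresholds can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `resolve_by_mean`, the string labels and the optional `weights` argument are hypothetical, and since the weight formula is not available in this text, any weights must be supplied by the caller.

```python
def resolve_by_mean(etas, t1, t2, t3, weights=None):
    """Classify each sub-band as 'single' or 'double' talk.

    etas    : cross-correlation coefficient of each sub-band
    t1, t2  : first and second thresholds
    t3      : third threshold, compared against the (weighted) mean
    weights : optional per-sub-band weights (hypothetical; uniform if None)
    """
    if not 0.0 < t2 < t3 < t1 < 1.0:
        raise ValueError("thresholds must satisfy 0 < t2 < t3 < t1 < 1")
    if weights is None:
        weights = [1.0] * len(etas)
    # (Weighted) mean of the coefficients over all sub-bands.
    mean = sum(w * e for w, e in zip(weights, etas)) / sum(weights)
    # State assigned to every sub-band that falls between t2 and t1.
    fallback = "single" if mean > t3 else "double"
    states = []
    for eta in etas:
        if eta >= t1:
            states.append("single")
        elif eta <= t2:
            states.append("double")
        else:
            states.append(fallback)
    return states


# Example: the mean is about 0.6, above t3 = 0.5, so the 0.5 sub-band is 'single'.
example_states = resolve_by_mean([0.9, 0.8, 0.5, 0.2], t1=0.7, t2=0.3, t3=0.5)
```

Passing larger weights for the sub-bands that are already clearly classified gives them more influence on the mean, which is the effect the weighting rule above describes.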

An echo cancellation device, comprising:

the state determining module is used for determining the state of each sub-band self-adaptive filter according to the cross-correlation coefficient of the near-end input signal and the echo estimation signal of each sub-band;

the output module is used for replacing the residual signal of the subband self-adaptive filter with the comfortable noise of the subband when the subband self-adaptive filter is in a single-ended speaking state and then outputting the replaced signal; when the sub-band adaptive filter is in a double-talk state, a residual signal of the sub-band adaptive filter is output.

In the above apparatus, the state determining module is specifically configured to determine that the subband adaptive filter is in a single-ended speaking state when the cross-correlation coefficient of the subband is greater than or equal to a first threshold; when the cross correlation coefficient of the sub-band is smaller than or equal to a second threshold value, determining that the sub-band adaptive filter is in a double-talk state; for the sub-band with the cross correlation coefficient between the second threshold value and the first threshold value, determining the state of the sub-band adaptive filter according to the number or the occupied proportion of the sub-band adaptive filter in the specified state or the average value of the cross correlation coefficient of each sub-band adaptive filter; wherein 0 < the second threshold < the first threshold < 1.

In the above apparatus, the state determining module is specifically configured to, when determining the state of the subband adaptive filter according to the number of subband adaptive filters in a specified state for subbands with cross-correlation coefficients between a second threshold and a first threshold, determine that the adaptive filter of each subband with cross-correlation coefficients between the second threshold and the first threshold is in a single-ended speaking state if the number of subband adaptive filters in the single-ended speaking state exceeds a set threshold; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; or if the number of the sub-band adaptive filters in the double-end speaking state exceeds a set threshold, judging that the adaptive filters of all sub-bands with the cross correlation coefficients between a second threshold and a first threshold are in the double-end speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is judged to be in a single-end speaking state.

In the above apparatus, the state determining module is specifically configured to, when determining the state of the subband adaptive filter according to the proportion of the subband adaptive filter in the specified state for the subband having the cross-correlation coefficient between the second threshold and the first threshold, determine that the adaptive filter of each subband having the cross-correlation coefficient between the second threshold and the first threshold is in the single-ended speaking state if the proportion of the number of the subband adaptive filters in the single-ended speaking state exceeds a set threshold; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; or if the proportion of the number of the sub-band adaptive filters in the double-end speaking state exceeds a set threshold, judging that the adaptive filters of the sub-bands with the cross correlation coefficients between the second threshold and the first threshold are in the double-end speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is judged to be in a single-end speaking state.

In the above apparatus, the state determining module is specifically configured to, when determining the state of the subband adaptive filter according to the average value of the cross-correlation coefficient of each subband adaptive filter for subbands having cross-correlation coefficients between the second threshold and the first threshold, determine that the adaptive filter of each subband having cross-correlation coefficients between the second threshold and the first threshold is in a single-ended speaking state if the average value of the cross-correlation coefficient of each subband is greater than a third threshold; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; wherein 0 < the second threshold < the third threshold < the first threshold < 1.

In the above apparatus, the state determining module is specifically configured to, when determining the state of the subband adaptive filter according to an average value of the cross-correlation coefficients of the subband adaptive filters, for a subband having a cross-correlation coefficient between a second threshold and a first threshold, determine that the adaptive filter of each subband having a cross-correlation coefficient between the second threshold and the first threshold is in a single-ended speaking state if a weighted average value of the cross-correlation coefficients of each subband is greater than a third threshold; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; wherein 0 < the second threshold < the third threshold < the first threshold < 1.

In the above apparatus, among the weight values used by the state determining module, the weight value corresponding to a sub-band whose cross-correlation coefficient is greater than the first threshold and the weight value corresponding to a sub-band whose cross-correlation coefficient is less than the second threshold are greater than the weight value of a sub-band whose cross-correlation coefficient is between the second threshold and the first threshold.

The beneficial technical effects of the invention comprise:

the state of the adaptive filter of each sub-band is determined according to the cross-correlation coefficient of the near-end input signal and the echo estimation signal of each sub-band, and different processing methods are used according to different states, namely, the echo cancellation processing is carried out only in a single-end speaking state, so that the characteristic of sub-band signal processing is fully utilized, the residual echo is more effectively inhibited, the inhibition of the residual echo in double-end speaking and the protection of the voice of a local speaker are enhanced, and the overall effect and the fluency of the system are improved.

Drawings

FIG. 1 is a diagram illustrating a prior art echo cancellation post-processing algorithm;

fig. 2 is a schematic diagram of an echo cancellation process according to an embodiment of the present invention;

fig. 3 is a schematic diagram of another echo cancellation process according to an embodiment of the present invention;

fig. 4 is a schematic diagram of another echo cancellation process according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of an echo cancellation device according to an embodiment of the present invention.

Detailed Description

In an echo cancellation system, the cross-correlation coefficient η between the near-end input signal d and the estimated echo signal ŷ is commonly used to represent the degree of convergence of the adaptive filter: when η is close to 1, the adaptive filter is considered to have converged well, i.e., the estimated echo ŷ closely approaches the input signal d; when η is close to 0, the filter is considered either to have converged poorly or to be in a double-talk state.

η may be expressed using the following formula:

[formula (1) is not reproduced in this text]

wherein d(n) represents the near-end input signal of the echo canceller, ŷ(n) represents the estimated echo resulting from the adaptive filter operation, and the two remaining terms represent the energy of d(n) and the energy of ŷ(n), respectively. These two energies can be calculated as follows:

[formulas (2) and (3) are not reproduced in this text]
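For orientation only, one common form of a normalized cross-correlation coefficient that is consistent with the quantities named above is shown below. It is an assumption, not a reproduction of the patent's formulas: the expectation operator E[·] and the symbols σ_d² and σ_ŷ² are introduced here for illustration.

```latex
\eta = \frac{\left| E\left[ d(n)\,\hat{y}(n) \right] \right|}
            {\sqrt{\sigma_d^2 \, \sigma_{\hat{y}}^2}},
\qquad
\sigma_d^2 = E\left[ d^2(n) \right],
\qquad
\sigma_{\hat{y}}^2 = E\left[ \hat{y}^2(n) \right]
```

Under this form the Cauchy-Schwarz inequality gives 0 ≤ η ≤ 1, with η = 1 when ŷ(n) is proportional to d(n), which matches the interpretation of η given above.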

The embodiment of the invention uses the cross-correlation coefficient η between the near-end input signal d and the estimated echo signal ŷ to judge the state of the filter during non-linear post-processing, and extends this idea to sub-bands, i.e., η_i (the subscript i denotes the i-th subband) is used to determine the state of the i-th subband adaptive filter:

[formula (4) is not reproduced in this text]

wherein N is the number of subbands; d_i(n) represents the sub-band near-end input signal of the echo canceller; ŷ_i(n) represents the estimated echo obtained by the operation of the subband adaptive filter; the subband signals d_i(n) and ŷ_i(n) may be complex-valued; the superscript * denotes the conjugation operation; and the two remaining terms represent the energy of d_i(n) and the energy of ŷ_i(n), respectively. These two sub-band signal energies are calculated as follows:

[formulas (5) and (6) are not reproduced in this text]
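A per-sub-band coefficient of this kind can be sketched as follows. The sketch assumes a frame-based estimate in which expectations are replaced by sums over one frame of complex sub-band samples; the function name `subband_cross_correlation` and that averaging choice are illustrative assumptions, not the patent's formulas.

```python
import math


def subband_cross_correlation(d_i, y_hat_i):
    """Normalized cross-correlation magnitude of two complex sub-band frames.

    d_i     : samples of the sub-band near-end input signal
    y_hat_i : samples of the sub-band echo estimate
    Returns a value in [0, 1]; returns 0.0 if either frame has zero energy.
    """
    if len(d_i) != len(y_hat_i) or not d_i:
        raise ValueError("frames must be non-empty and of equal length")
    # Cross term uses the conjugate of the echo estimate.
    cross = sum(d * y.conjugate() for d, y in zip(d_i, y_hat_i))
    energy_d = sum(abs(d) ** 2 for d in d_i)
    energy_y = sum(abs(y) ** 2 for y in y_hat_i)
    if energy_d == 0.0 or energy_y == 0.0:
        return 0.0
    return abs(cross) / math.sqrt(energy_d * energy_y)


# Example: an echo estimate that is a scaled copy of the input gives a value near 1.
d_frame = [1 + 1j, 2 - 1j, -0.5j]
eta_scaled = subband_cross_correlation(d_frame, [0.5 * s for s in d_frame])
```

A recursive (exponentially smoothed) estimate could be used instead of per-frame sums; which of the two the patent uses cannot be determined from this text.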

In the embodiment of the invention, when the state of a subband adaptive filter is judged to be single-ended speaking (i.e., when the correlation coefficient η_i is close to 1, expressed as η_i → 1), the residual signal output by the i-th subband adaptive filter is mainly residual echo and needs to be further suppressed, for example by replacing the residual signal with comfort noise generated for that subband; when the state of the subband adaptive filter is judged to be double-ended speaking (i.e., when the correlation coefficient η_i is close to 0, expressed as η_i → 0), the residual signal output by the subband adaptive filter may be output directly without processing.

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Referring to fig. 2, a schematic diagram of a subband-based echo cancellation process according to an embodiment of the present invention is shown, where the process may include:

Step 201, calculating the cross-correlation coefficient η_i between the near-end input signal and the echo estimation signal of each sub-band.

Specifically, the cross-correlation coefficient between the near-end input signal and the echo estimation signal of each sub-band can be calculated according to formula (4), formula (5) and formula (6).

Step 202, determining the state of each sub-band adaptive filter according to the cross-correlation coefficient η_i of each sub-band. If the sub-band adaptive filter is in the single-ended speaking state, go to step 203; if the sub-band adaptive filter is in the double-ended speaking state, go to step 204.

Wherein, if the cross-correlation coefficient of the sub-band satisfies η_i ≥ T (where T is a set threshold and 0 < T < 1), the subband adaptive filter is considered to be in a single-ended speaking state; if η_i < T, the subband adaptive filter is considered to be in a double-ended speaking state. The value of T may be determined according to factors such as the degree of system nonlinearity, the strength of the ambient background noise, and the performance of the subband adaptive filter; for example, T = 0.5.

Step 203, for the sub-band adaptive filter in the single-ended speaking state, the residual signal is processed and then output to suppress or eliminate the echo.

In this step, echo can be eliminated in various ways; the embodiment of the present invention preferably replaces the residual signal of the subband adaptive filter with the comfort noise of that subband and then outputs the replaced signal.

And step 204, directly outputting the residual signal of the sub-band adaptive filter for the sub-band adaptive filter in the double-talk state.

As can be seen from the above process, sub-bands at the same time instant may be judged to be in different states and are processed accordingly: if η_i ≥ T, the sub-band is judged to be in single-ended speaking and its residual echo is replaced with the comfort noise of the corresponding sub-band; if η_i < T, the sub-band is judged to be in double-ended speaking and the residual signal output by the sub-band adaptive filter is output directly. In this way residual echo is suppressed in a targeted manner, and speech clipping during double-ended speaking is avoided to a certain extent.
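The single-threshold flow of fig. 2 can be sketched as follows. The function name `post_process_subbands` and the one-value-per-sub-band representation are illustrative assumptions; a real system would operate on frames of samples per sub-band.

```python
def post_process_subbands(residuals, comfort_noise, etas, threshold=0.5):
    """Apply the per-sub-band decision of steps 202 to 204.

    residuals     : residual signal of each sub-band adaptive filter
    comfort_noise : comfort noise generated for each sub-band
    etas          : cross-correlation coefficient of each sub-band
    threshold     : T, with 0 < T < 1 (0.5 is the example value in the text)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must satisfy 0 < T < 1")
    if not (len(residuals) == len(comfort_noise) == len(etas)):
        raise ValueError("all inputs must have one entry per sub-band")
    output = []
    for e_i, cn_i, eta_i in zip(residuals, comfort_noise, etas):
        if eta_i >= threshold:
            output.append(cn_i)  # single-ended speaking: replace residual echo
        else:
            output.append(e_i)   # double-ended speaking: pass residual through
    return output


# Example: only the third sub-band (eta = 0.2) keeps its residual signal.
processed = post_process_subbands([1 + 1j, 2 + 0j, 3j], [0.1, 0.2, 0.3], [0.9, 0.5, 0.2])
```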

When the system nonlinearity is severe or the environmental background noise is relatively strong, the cross-correlation coefficient η_i of a sub-band may fall in a critical range around the threshold, e.g., η_i → 0.5 ± 0.2; in this case, if the judgment relies only on the relation between the cross-correlation coefficient η_i of each sub-band and the threshold T, it is difficult to obtain an accurate state, which may affect the echo cancellation effect.

To solve this problem, the embodiment of the present invention further improves the scheme shown in fig. 2 by integrating the information of all sub-bands: when a set number of sub-bands can be definitely determined as single-ended speaking (i.e., η_i → 1), or when the proportion of sub-bands that can be definitely determined as single-ended speaking reaches a set proportion, the probability that the other, critical sub-bands (i.e., η_i → 0.5 ± 0.2) are in single-ended speaking is relatively high; similarly, when a set number of sub-bands can be definitely determined as double-ended speaking, the probability that the other sub-bands in the critical state are in double-ended speaking is relatively high. The improved process is described in detail below with reference to fig. 3.

Referring to fig. 3, a schematic diagram of another subband-based echo cancellation process provided in the embodiment of the present invention is shown, where the process may include:

Step 301, calculating the cross-correlation coefficient η_i between the near-end input signal and the echo estimation signal of each sub-band.

Steps 302 to 304, determining the state of each sub-band adaptive filter according to the cross-correlation coefficient η_i of each sub-band. If the sub-band adaptive filter is in the single-ended speaking state, go to step 305; if the sub-band adaptive filter is in the double-ended speaking state, go to step 306.

Wherein, if the cross-correlation coefficient of the sub-band satisfies η_i ≥ T₁ (where T₁ is a set first threshold), the subband adaptive filter is considered to be in a single-ended speaking state; if η_i ≤ T₂ (where T₂ is a set second threshold), the subband adaptive filter is considered to be in a double-ended speaking state; if T₂ < η_i < T₁, it is necessary to further determine whether the subband filter is in a single-ended or double-ended speaking state. For a subband adaptive filter whose cross-correlation coefficient η_i is in the range (T₂, T₁), whether it is in the single-ended or the double-ended speaking state can be determined according to the number or proportion of subband adaptive filters in the single-ended speaking state or in the double-ended speaking state. Here 0 < T₂ < T₁ < 1; for example, when the system nonlinearity is severe or the environmental background noise is strong and the cross-correlation coefficient η_i of each subband adaptive filter tends to 0.5 ± 0.2, T₁ = 0.7 and T₂ = 0.3 can be set.

Specifically, for a subband adaptive filter whose cross-correlation coefficient η_i lies in the range (T₂, T₁), the state can be determined in any of the following ways (fig. 3 shows only one specific implementation):

Method one: if the number of subbands whose cross-correlation coefficient satisfies η_i ≥ T₁ exceeds a set threshold (n > th_n as shown in fig. 3, where n is the number of subbands with η_i ≥ T₁ and th_n is the set threshold), the subband adaptive filters whose cross-correlation coefficient η_i lies in the range (T₂, T₁) are considered to be in the single-ended speaking state; otherwise, they are considered to be in the double-talk state.

Method two: if the proportion of subbands with η_i ≥ T₁ among all subbands exceeds a set threshold (for example, exceeds 50%), the subband adaptive filters whose cross-correlation coefficient η_i lies in the range (T₂, T₁) are considered to be in the single-ended speaking state; otherwise, they are considered to be in the double-talk state.

Method three: if the number of subbands with η_i ≤ T₂ exceeds a set threshold, the subband adaptive filters whose cross-correlation coefficient η_i lies in the range (T₂, T₁) are considered to be in the double-talk state; otherwise, they are considered to be in the single-ended speaking state.

Method four: if the proportion of subbands with η_i ≤ T₂ among all subbands exceeds a set threshold (for example, exceeds 50%), the subband adaptive filters whose cross-correlation coefficient η_i lies in the range (T₂, T₁) are considered to be in the double-talk state; otherwise, they are considered to be in the single-ended speaking state.
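The four count-based and proportion-based rules can be illustrated with one small function. This is a sketch under stated assumptions: the function name, the string labels `"single"` and `"double"`, and the parameters `min_fraction` and `vote_on` are hypothetical, and the default thresholds are illustrative. A count threshold th_n is expressed here as the equivalent fraction th_n / N, since n > th_n holds exactly when n / N > th_n / N.

```python
def resolve_critical_by_vote(etas, t1=0.8, t2=0.2,
                             min_fraction=0.5, vote_on="single"):
    """Return one state label ('single' or 'double') per subband.

    vote_on="single": critical subbands follow the subbands with
    eta >= t1 (the rules that count single-ended subbands).
    vote_on="double": critical subbands follow the subbands with
    eta <= t2 (the rules that count double-talk subbands).
    """
    n_total = len(etas)
    if n_total == 0:
        return []
    n_single = sum(1 for eta in etas if eta >= t1)
    n_double = sum(1 for eta in etas if eta <= t2)

    if vote_on == "single":
        critical = "single" if n_single / n_total > min_fraction else "double"
    elif vote_on == "double":
        critical = "double" if n_double / n_total > min_fraction else "single"
    else:
        raise ValueError("vote_on must be 'single' or 'double'")

    states = []
    for eta in etas:
        if eta >= t1:
            states.append("single")
        elif eta <= t2:
            states.append("double")
        else:
            states.append(critical)  # undecided subband takes the voted state
    return states
```

For instance, with coefficients `[0.9, 0.85, 0.95, 0.5, 0.1]` three of five subbands are clearly single-ended, so the undecided subband at 0.5 is also treated as single-ended.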

Step 305, for the subband adaptive filter in the single-ended speaking state, the residual signal is processed and output to suppress or eliminate the echo. The specific implementation manner can be the same as step 203 in fig. 2.

And step 306, directly outputting the residual signal of the subband adaptive filter for the subband adaptive filter in the double-talk state.
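The two output branches of steps 305 and 306 can be sketched as below. The text only states that the residual is replaced by comfort noise of the subband in the single-ended case. How that noise is produced is not shown here, so the zero-mean Gaussian generator and the `noise_rms` parameter are placeholders assumed for illustration, as are the function name and the string state labels.

```python
import random


def subband_output(residual, state, noise_rms, rng=None):
    """Select the output samples of one subband.

    residual  : residual samples of the subband adaptive filter
    state     : 'single' or 'double' (hypothetical labels)
    noise_rms : assumed background-noise level for the comfort noise
    rng       : optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    if state == "double":
        # Double talk: pass the residual through unchanged so that
        # near-end speech is not clipped.
        return list(residual)
    if state == "single":
        # Single-ended: discard the residual echo and emit comfort noise
        # of the same length (Gaussian noise is an assumption).
        return [rng.gauss(0.0, noise_rms) for _ in residual]
    raise ValueError("state must be 'single' or 'double'")
```

A real system would shape the comfort noise to match the estimated background noise of each subband; the sketch only shows where the replacement happens.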

An alternative to the flow shown in fig. 3 is shown in fig. 4. It differs from the flow of fig. 3 in that, for a subband adaptive filter whose cross-correlation coefficient η_i lies in the range (T₂, T₁), the state is determined according to the mean of the cross-correlation coefficients η_i of all subbands, such as the arithmetic mean or a mean derived by another algorithm. As shown in fig. 4, the process may include:

Step 401, calculating the cross-correlation coefficient η_i between the near-end input signal and the echo estimation signal of each subband.

Steps 402-404, determining the state of each subband adaptive filter according to the cross-correlation coefficient η_i of each subband. If the subband adaptive filter is in the single-ended speaking state, go to step 405; if the subband adaptive filter is in the double-talk state, go to step 406.

If the cross-correlation coefficient of a subband satisfies η_i ≥ T₁, the subband adaptive filter is considered to be in the single-ended speaking state; if η_i ≤ T₂, the subband adaptive filter is considered to be in the double-talk state; if T₂ < η_i < T₁, it must be further determined whether the subband adaptive filter is in the single-ended speaking state or the double-talk state. For a subband adaptive filter whose cross-correlation coefficient η_i lies in the range (T₂, T₁), the state can be determined according to the mean of the cross-correlation coefficients η_i of all subbands.

Specifically, if the arithmetic mean γ of the cross-correlation coefficients η_i of all subbands satisfies γ ≥ T₃ (where T₃ is a set third threshold), the subband adaptive filters whose cross-correlation coefficient η_i lies in the range (T₂, T₁) are in the single-ended speaking state; if γ < T₃, these subband adaptive filters are in the double-talk state. Here 0 < T₂ < T₃ < T₁ < 1; for example, T₁ = 0.8, T₂ = 0.2 and T₃ = 0.5 can be set. The arithmetic mean γ of the cross-correlation coefficients η_i of all subbands can be expressed as:

$$\gamma = \frac{1}{N}\sum_{i=1}^{N}\eta_i \qquad (7)$$

where N is the number of subbands.

Step 405, for the subband adaptive filter in the single-ended speaking state, the residual signal is processed and then output to suppress or eliminate echo. The specific implementation manner can be the same as step 103 in fig. 1.

In step 406, for the subband adaptive filter in the double-talk state, the residual signal of the subband adaptive filter is directly output.
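The mean-based decision of the fig. 4 flow can be sketched as follows. The function name and the string labels are hypothetical; the default thresholds are the example values T₁ = 0.8, T₂ = 0.2, T₃ = 0.5 mentioned in the text.

```python
def resolve_critical_by_mean(etas, t1=0.8, t2=0.2, t3=0.5):
    """Return one state label ('single' or 'double') per subband.

    Subbands with t2 < eta < t1 take the state implied by the
    arithmetic mean gamma of all coefficients: single-ended when
    gamma >= t3, double talk otherwise.
    """
    if not etas:
        return []
    if not 0.0 < t2 < t3 < t1 < 1.0:
        raise ValueError("thresholds must satisfy 0 < T2 < T3 < T1 < 1")

    gamma = sum(etas) / len(etas)  # arithmetic mean over all N subbands
    critical = "single" if gamma >= t3 else "double"

    states = []
    for eta in etas:
        if eta >= t1:
            states.append("single")
        elif eta <= t2:
            states.append("double")
        else:
            states.append(critical)
    return states
```

With coefficients `[0.9, 0.9, 0.5, 0.1]` the mean is 0.6, so the undecided subband at 0.5 is treated as single-ended; with `[0.9, 0.4, 0.1, 0.1]` the mean is 0.375, so the undecided subband at 0.4 is treated as double talk.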

In the flow shown in fig. 4, directly using the average of the correlation coefficients of all subbands to determine the state of a subband adaptive filter whose cross-correlation coefficient η_i lies in the range (T₂, T₁) may be relatively coarse. For example, when a subband contains only background noise, or an echo or near-end speech signal with very little energy, the correlation coefficient η_i of that subband still takes part in the average; when such low-energy subbands make up a large proportion, the final average γ may be adversely affected.

To solve this problem, another embodiment of the present invention further improves the flow shown in fig. 4 by introducing an energy factor when determining the state of a subband adaptive filter whose cross-correlation coefficient η_i lies in the range (T₂, T₁). Specifically, the average γ is obtained by energy-weighted averaging of the correlation coefficients of the subbands, so that subbands with relatively large energy and a clearly defined state (e.g., η_i ≥ T₁ or η_i ≤ T₂) are given larger weights and the critical-state subbands are finally assigned to a more reasonable state.

The modified procedure is substantially the same as the procedure shown in fig. 4, except that equation (7) is replaced with equation (8) below to calculate the mean value:

where N is the number of subbands, and the energy weight can be expressed as:

The energy term in formula (9) can be calculated by formula (2).
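As an illustration of energy-weighted averaging, the sketch below assumes that each weight is the subband's energy divided by the total energy of all subbands, and that γ is the weighted sum of the coefficients. This normalization is an assumption made for the example; the exact forms of formulas (8) and (9) are not reproduced in this text, and the function name is hypothetical.

```python
def energy_weighted_mean(etas, energies):
    """Energy-weighted mean of the cross-correlation coefficients.

    etas     : cross-correlation coefficient of each subband
    energies : non-negative energy of each subband (assumed to be
               available from an earlier step of the canceller)
    Assumed weighting: w_i = energies[i] / sum(energies).
    """
    if len(etas) != len(energies):
        raise ValueError("etas and energies must have the same length")
    total = sum(energies)
    if total <= 0:
        raise ValueError("total energy must be positive")
    return sum((energy / total) * eta for eta, energy in zip(etas, energies))
```

With coefficients `[0.9, 0.3, 0.3, 0.3]` and energies `[10, 0.1, 0.1, 0.1]`, the plain arithmetic mean is 0.45, while the weighted mean is dominated by the strong first subband and exceeds 0.5, which shows how low-energy subbands lose influence on the decision.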

As can be seen from the above flows, the embodiment of the present invention determines the state of each subband separately at a given time and performs the corresponding subsequent processing for each: a subband with η_i ≥ T₁ is judged to be in the single-ended speaking state, and its residual echo is replaced by the comfort noise of that subband; a subband with η_i ≤ T₂ is judged to be in the double-talk state, and the residual signal output by its subband adaptive filter is output directly. In particular, subbands in the critical state (i.e., subbands whose cross-correlation coefficient η_i lies in the range (T₂, T₁)) are assigned to a more reasonable state by means of the weighted mean, which achieves better suppression of residual echo and avoids clipping during double talk.

Based on the same technical concept, the embodiment of the present invention further provides an echo cancellation device that can be applied to the above-mentioned process provided by the embodiment of the present invention.

As shown in fig. 5, the echo canceling device may include:

a state determining module 501, configured to determine a state of each subband adaptive filter according to a cross-correlation coefficient of a near-end input signal and an echo estimation signal of each subband;

an output module 502, configured to output a replaced signal after replacing a residual signal of the subband adaptive filter with comfort noise of the subband when the subband adaptive filter is in a single-ended speaking state; when the sub-band adaptive filter is in a double-talk state, a residual signal of the sub-band adaptive filter is output.

The state determining module 501 determines that the subband adaptive filter is in the single-ended speaking state when the cross-correlation coefficient of the subband is greater than or equal to the first threshold, and determines that the subband adaptive filter is in the double-talk state when the cross-correlation coefficient of the subband is less than or equal to the second threshold. For a subband whose cross-correlation coefficient lies between the second threshold and the first threshold, it determines the state of the subband adaptive filter according to the number or proportion of subband adaptive filters in a specified state, or according to the average of the cross-correlation coefficients of the subbands; wherein 0 < the second threshold < the first threshold < 1.

Specifically, when the state of a subband whose cross-correlation coefficient lies between the second threshold and the first threshold is determined according to the number of subband adaptive filters in a specified state, the state determining module 501 proceeds as follows: if the number of subband adaptive filters in the single-ended speaking state exceeds a set threshold, it determines that the adaptive filter of each subband whose cross-correlation coefficient lies between the second threshold and the first threshold is in the single-ended speaking state, and otherwise determines that these adaptive filters are in the double-talk state; or, if the number of subband adaptive filters in the double-talk state exceeds a set threshold, it determines that these adaptive filters are in the double-talk state, and otherwise determines that they are in the single-ended speaking state.

Specifically, when the state of a subband whose cross-correlation coefficient lies between the second threshold and the first threshold is determined according to the proportion of subband adaptive filters in a specified state, the state determining module 501 proceeds as follows: if the proportion of subband adaptive filters in the single-ended speaking state exceeds a set threshold, it determines that the adaptive filter of each subband whose cross-correlation coefficient lies between the second threshold and the first threshold is in the single-ended speaking state, and otherwise determines that these adaptive filters are in the double-talk state; or, if the proportion of subband adaptive filters in the double-talk state exceeds a set threshold, it determines that these adaptive filters are in the double-talk state, and otherwise determines that they are in the single-ended speaking state.

Specifically, when the state of a subband whose cross-correlation coefficient lies between the second threshold and the first threshold is determined according to the average of the cross-correlation coefficients of the subbands, the state determining module 501 proceeds as follows: if the average of the cross-correlation coefficients of the subbands is greater than the third threshold, it determines that the adaptive filter of each subband whose cross-correlation coefficient lies between the second threshold and the first threshold is in the single-ended speaking state; otherwise, it determines that these adaptive filters are in the double-talk state; wherein 0 < the second threshold < the third threshold < the first threshold < 1.

Alternatively, when the state of a subband whose cross-correlation coefficient lies between the second threshold and the first threshold is determined according to the average of the cross-correlation coefficients of the subbands, the state determining module 501 may use a weighted average: if the weighted average of the cross-correlation coefficients of the subbands is greater than the third threshold, it determines that the adaptive filter of each subband whose cross-correlation coefficient lies between the second threshold and the first threshold is in the single-ended speaking state; otherwise, it determines that these adaptive filters are in the double-talk state; wherein 0 < the second threshold < the third threshold < the first threshold < 1.

Specifically, the weight values used by the state determining module 501 for subbands whose cross-correlation coefficient is greater than the first threshold and for subbands whose cross-correlation coefficient is less than the second threshold are greater than the weight value for subbands whose cross-correlation coefficient lies between the second threshold and the first threshold.

In summary, the embodiment of the present invention is simple and easy to implement in a real-time system, and makes full use of the characteristic of sub-band signal processing, so as to effectively suppress the residual echo, and enhance the suppression of the residual echo during dual-end speech and the protection of the voice of the speaker at the local end, so as to improve the overall effect and fluency of the system.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for enabling a terminal device (which may be a mobile phone, a personal computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims

1. An echo cancellation method, comprising the steps of:

2. The method of claim 1, wherein determining the state of the subband adaptive filter based on cross-correlation coefficients of the near-end input signal and the echo estimate signal for the subband comprises:

wherein 0 < the second threshold < the first threshold < 1.

3. The method as claimed in claim 2, wherein for the sub-band having the cross-correlation coefficient between the second threshold and the first threshold, the state of the sub-band adaptive filter is determined according to the number of sub-band adaptive filters in the specified state, specifically:

4. The method as claimed in claim 2, wherein for the sub-band having the cross-correlation coefficient between the second threshold and the first threshold, the state of the sub-band adaptive filter is determined according to the proportion of the sub-band adaptive filter in the specified state, specifically:

5. The method as claimed in claim 2, wherein for the sub-band having the cross-correlation coefficient between the second threshold and the first threshold, the state of the sub-band adaptive filter is determined according to the average value of the cross-correlation coefficient of each sub-band adaptive filter, specifically:

6. The method as claimed in claim 2, wherein for the sub-band having the cross-correlation coefficient between the second threshold and the first threshold, the state of the sub-band adaptive filter is determined according to the average value of the cross-correlation coefficient of each sub-band adaptive filter, specifically:

7. The method of claim 6, wherein the weight values corresponding to subbands having cross-correlation coefficients greater than a first threshold and subbands having cross-correlation coefficients less than a second threshold are greater than the weight values corresponding to subbands having cross-correlation coefficients between the second threshold and the first threshold.

8. The method of claim 6, wherein the weight values for the subbands are:

wherein N is the number of sub-bands,

is the energy of the sub-band near-end input signal of the echo canceller.

9. An echo cancellation device, comprising:

10. The apparatus of claim 9, wherein the state determination module is specifically configured to determine that the subband adaptive filter is in a single-ended speaking state when the cross-correlation coefficient of the subband is greater than or equal to a first threshold; when the cross correlation coefficient of the sub-band is smaller than or equal to a second threshold value, determining that the sub-band adaptive filter is in a double-talk state; for the sub-band with the cross correlation coefficient between the second threshold value and the first threshold value, determining the state of the sub-band adaptive filter according to the number or the occupied proportion of the sub-band adaptive filter in the specified state or the average value of the cross correlation coefficient of each sub-band adaptive filter; wherein 0 < the second threshold < the first threshold < 1.

11. The apparatus according to claim 10, wherein the state determining module is specifically configured to, when determining the state of the subband adaptive filter according to the number of subband adaptive filters in a specified state for subbands having cross-correlation coefficients between a second threshold and a first threshold, determine that the adaptive filter of each subband having cross-correlation coefficients between the second threshold and the first threshold is in a single-ended speaking state if the number of subband adaptive filters in the single-ended speaking state exceeds a set threshold; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; or if the number of the sub-band adaptive filters in the double-end speaking state exceeds a set threshold, judging that the adaptive filters of all sub-bands with the cross correlation coefficients between a second threshold and a first threshold are in the double-end speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is judged to be in a single-end speaking state.

12. The apparatus according to claim 10, wherein the state determining module is specifically configured to, for a subband having a cross-correlation coefficient between a second threshold and a first threshold, determine a state of the subband adaptive filter according to a ratio of the subband adaptive filters in a specified state, and if the ratio of the number of the subband adaptive filters in a single-ended speaking state exceeds a set threshold, determine that the adaptive filter of each subband having the cross-correlation coefficient between the second threshold and the first threshold is in the single-ended speaking state; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; or if the proportion of the number of the sub-band adaptive filters in the double-end speaking state exceeds a set threshold, judging that the adaptive filters of the sub-bands with the cross correlation coefficients between the second threshold and the first threshold are in the double-end speaking state; otherwise, the adaptive filter of each sub-band with the cross correlation coefficient between the second threshold and the first threshold is judged to be in a single-end speaking state.

13. The apparatus according to claim 10, wherein the state determining module is specifically configured to, for a sub-band having a cross-correlation coefficient between a second threshold and a first threshold, determine a state of the sub-band adaptive filter according to an average value of the cross-correlation coefficient of each sub-band adaptive filter, and if the average value of the cross-correlation coefficient of each sub-band is greater than a third threshold, determine that the adaptive filter of each sub-band having the cross-correlation coefficient between the second threshold and the first threshold is in a single-ended speaking state; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; wherein 0 < the second threshold < the third threshold < the first threshold < 1.

14. The apparatus according to claim 10, wherein the state determining module is specifically configured to, for a sub-band having a cross-correlation coefficient between a second threshold and a first threshold, determine a state of the sub-band adaptive filter according to an average value of the cross-correlation coefficient of each sub-band adaptive filter, and if a weighted average value of the cross-correlation coefficient of each sub-band is greater than a third threshold, determine that the adaptive filter of each sub-band having the cross-correlation coefficient between the second threshold and the first threshold is in a single-ended speaking state; otherwise, judging that the self-adaptive filter of each sub-band with the cross-correlation coefficient between the second threshold and the first threshold is in a double-end speaking state; wherein 0 < the second threshold < the third threshold < the first threshold < 1.

15. The apparatus of claim 14, wherein the state determination module uses a weight value for subbands having cross-correlation coefficients greater than a first threshold and a weight value for subbands having cross-correlation coefficients less than a second threshold that are greater than the weight value for subbands having cross-correlation coefficients between the second threshold and the first threshold.