
CN112201266B - Echo suppression method and device

Info

Publication number: CN112201266B
Application number: CN202010885166.1A
Authority: CN
Inventors: 付聪; 雷欣; 李志飞
Original assignee: Go Out And Ask Suzhou Information Technology Co ltd
Current assignee: Go Out And Ask Suzhou Information Technology Co ltd
Priority date: 2020-08-28
Filing date: 2020-08-28
Publication date: 2023-06-13
Anticipated expiration: 2040-08-28
Also published as: CN112201266A

Abstract

The application discloses an echo suppression method and device. The method comprises the following steps: collecting a far-end sound signal; determining the stationary noise N(f, t); calculating the echo residual energy P_residual(f, t) of the far-end sound signal; and calculating, with a linear echo cancellation algorithm, the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal. When P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), the value of f is determined as a first frequency domain value and the value of t as a first time domain value. Gaussian white noise G(f, t) with variance N(f, t) is constructed, and at the time-frequency point where f is the first frequency domain value and t is the first time domain value, P_residual(f, t) is replaced with G(f, t).

Description

Echo suppression method and device

Technical Field

The present application relates to the field of audio, and in particular, to an echo suppression method.

Background

Many electronic devices now have both a microphone and a loudspeaker. When both are active at the same time, the microphone easily picks up the sound played by the loudspeaker, which produces echo. Echo cancellation must then be applied to the microphone recording to improve the signal-to-noise ratio of the near-end sound signal so that the output sound is clear. Echo cancellation generally requires real-time system identification of the echo propagation path: a linear FIR filter H(Z) is constructed, and the echo-cancelled signal E(Z) is calculated as E(Z) = MIC(Z) - H(Z) × SPK(Z), where MIC(Z) is the near-end sound signal collected by the microphone and SPK(Z) is the far-end sound signal played by the loudspeaker. This process may be referred to as linear echo cancellation.
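To make the linear echo cancellation formula concrete, the sketch below implements E = MIC - H × SPK in the time domain with a normalized LMS (NLMS) update, one common LMS-family adaptive filter. It is an illustration under assumptions: the function name, filter length `taps`, step size `mu` and the simulated echo path are all hypothetical, and a practical canceller would typically run per subband or in the frequency domain.

```python
import random


def nlms_echo_canceller(spk, mic, taps=8, mu=0.5, eps=1e-8):
    """Time-domain NLMS sketch of linear echo cancellation.

    spk: far-end samples played by the loudspeaker (the SPK(Z) input).
    mic: microphone samples (the MIC(Z) input).
    Returns (e, w): the echo-cancelled signal E and the learned FIR taps H.
    """
    w = [0.0] * taps        # adaptive estimate of the echo path H(Z)
    x_buf = [0.0] * taps    # most recent far-end samples, newest first
    e = []
    for x_n, d_n in zip(spk, mic):
        x_buf = [x_n] + x_buf[:-1]
        y_hat = sum(wi * xi for wi, xi in zip(w, x_buf))   # H(Z) * SPK(Z)
        err = d_n - y_hat                                  # E = MIC - H * SPK
        norm = sum(xi * xi for xi in x_buf) + eps          # input power for normalization
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, x_buf)]
        e.append(err)
    return e, w


# Simulated example: a short hypothetical echo path and a white-noise far end.
rng = random.Random(0)
true_h = [0.6, -0.3, 0.1, 0.05]
spk = [rng.gauss(0.0, 1.0) for _ in range(4000)]
mic = [sum(h * spk[n - k] for k, h in enumerate(true_h) if n - k >= 0)
       for n in range(len(spk))]
e, w = nlms_echo_canceller(spk, mic)
```

With noise-free, purely linear simulated data the taps converge to the simulated path and the error falls toward zero; real recordings, with nonlinear distortion and limited filter length, leave the residual echo that the rest of this method targets.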

In practice, however, the situation is much more complex. Low-cost loudspeaker drivers, power amplifier circuits that saturate at excessive volume, and the lack of a well-designed acoustic structure can all introduce substantial harmonic distortion; an excessively high playback volume may also cause clipping, which likewise introduces harmonic distortion. A linear filter cannot fully model these nonlinear factors and therefore cannot cancel the echo completely. Moreover, even without nonlinear distortion, a filter of limited length has limited modelling capacity, and the system identification algorithm is usually a performance compromise imposed by memory and computation limits, so the linear system cannot be fitted exactly. Consequently, some residual echo energy usually remains after linear echo cancellation.

The current mainstream method for removing residual echo energy computes the coherence among MIC(Z), E(Z) and SPK(Z). However, this method has high memory and computing requirements that many embedded devices cannot satisfy.

Disclosure of Invention

In view of this, embodiments of the present invention provide an echo suppression method and apparatus that can effectively suppress residual echo energy, place extremely low demands on memory and computing resources, and can be applied to any electronic device.

To achieve the above object, in a first aspect, the present invention provides an echo suppression method, the method comprising:

collecting a far-end sound signal;

determining the stationary noise N(f, t), where f denotes frequency and t denotes time;

calculating the echo residual energy P_residual(f, t) of the far-end sound signal;

calculating, with a linear echo cancellation algorithm, the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal;

when the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), determining the value of f as a first frequency domain value and the value of t as a first time domain value, where P_residual_thres is a preset energy threshold of the echo residual signal and P_voice_thres is a preset near-end sound energy;

constructing Gaussian white noise G(f, t) with variance N(f, t);

replacing P_residual(f, t) with G(f, t) at the time-frequency point where f is the first frequency domain value and t is the first time domain value.

Preferably, calculating the echo residual energy P_residual(f, t) of the far-end sound signal comprises calculating P_residual(f, t) with the following formula:

$$P_{\mathrm{residual}}(f,t) = C_{\mathrm{error}}(f)\,P_{\mathrm{ref}}(f,t) + C_{\mathrm{harmonic2}}(f)\,P_{\mathrm{ref}}(f/2,t) + C_{\mathrm{harmonic3}}(f)\,P_{\mathrm{ref}}(f/3,t) + C_{\mathrm{harmonic4}}(f)\,P_{\mathrm{ref}}(f/4,t)$$

where P_ref(f, t) is the energy of the far-end sound signal at the time-frequency point with time t and frequency f; P_ref(f/2, t), P_ref(f/3, t) and P_ref(f/4, t) are the energies of the far-end sound signal at the time-frequency points with time t and frequencies f/2, f/3 and f/4 respectively; and C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) are preset first, second, third and fourth coefficients.

Preferably, C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) are determined as follows: while a loudspeaker of the device plays an exponential sweep signal at the maximum playback volume, a far-end sound reference signal is collected, and the coefficients are calculated from the following relations:

$$P'_{\mathrm{error}}(f) = C_{\mathrm{error}}(f)\,P'_{\mathrm{ref}}(f)$$

$$P'_{\mathrm{harmonic}}(f/2) = C_{\mathrm{harmonic2}}(f)\,P'_{\mathrm{ref}}(f/2)$$

$$P'_{\mathrm{harmonic}}(f/3) = C_{\mathrm{harmonic3}}(f)\,P'_{\mathrm{ref}}(f/3)$$

$$P'_{\mathrm{harmonic}}(f/4) = C_{\mathrm{harmonic4}}(f)\,P'_{\mathrm{ref}}(f/4)$$

where P'_ref(f) is the energy of the far-end sound reference signal at frequency f; P'_error(f) is the residual echo energy of the near-end signal at frequency f after linear echo cancellation is performed on the far-end sound reference signal; P'_harmonic(f/2) is the residual echo energy at frequency f produced by the second harmonic that the f/2 component of the far-end sound reference signal generates along the echo path; P'_harmonic(f/3) is the residual echo energy at frequency f produced by the third harmonic of the f/3 component; and P'_harmonic(f/4) is the residual echo energy at frequency f produced by the fourth harmonic of the f/4 component.

Preferably, the linear echo cancellation algorithm includes least mean squares (LMS), the affine projection algorithm (APA) or recursive least squares (RLS).

Preferably, determining the stationary noise N(f, t) comprises determining it with a stationary noise estimation algorithm, the stationary noise estimation algorithm including minimum statistics, minima controlled recursive averaging (MCRA) and improved minima controlled recursive averaging (IMCRA).

To achieve the above object, in a second aspect, the present invention provides an echo suppression apparatus comprising:

an acquisition unit configured to acquire a far-end sound signal;

a first determination unit configured to determine the stationary noise N(f, t), where f denotes frequency and t denotes time;

a first calculation unit configured to calculate the echo residual energy P_residual(f, t) of the far-end sound signal;

a second calculation unit configured to calculate, with a linear echo cancellation algorithm, the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal;

a second determination unit configured to determine, when the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), the value of f as a first frequency domain value and the value of t as a first time domain value, where P_residual_thres is a preset energy threshold of the echo residual signal and P_voice_thres is a preset near-end sound energy;

a construction unit configured to construct Gaussian white noise G(f, t) with variance N(f, t);

a cancellation unit configured to replace P_residual(f, t) with G(f, t) at the time-frequency point where f is the first frequency domain value and t is the first time domain value.

Preferably, the first calculation unit is specifically configured to calculate P_residual(f, t) using the following formula:

$$P_{\mathrm{residual}}(f,t) = C_{\mathrm{error}}(f)\,P_{\mathrm{ref}}(f,t) + C_{\mathrm{harmonic2}}(f)\,P_{\mathrm{ref}}(f/2,t) + C_{\mathrm{harmonic3}}(f)\,P_{\mathrm{ref}}(f/3,t) + C_{\mathrm{harmonic4}}(f)\,P_{\mathrm{ref}}(f/4,t)$$

where the preset coefficients C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) satisfy the following calibration relations:

$$P'_{\mathrm{error}}(f) = C_{\mathrm{error}}(f)\,P'_{\mathrm{ref}}(f)$$

$$P'_{\mathrm{harmonic}}(f/2) = C_{\mathrm{harmonic2}}(f)\,P'_{\mathrm{ref}}(f/2)$$

$$P'_{\mathrm{harmonic}}(f/3) = C_{\mathrm{harmonic3}}(f)\,P'_{\mathrm{ref}}(f/3)$$

$$P'_{\mathrm{harmonic}}(f/4) = C_{\mathrm{harmonic4}}(f)\,P'_{\mathrm{ref}}(f/4)$$

where P'_ref(f) is the energy of the far-end sound reference signal at frequency f; P'_error(f) is the residual echo energy of the near-end signal at frequency f after linear echo cancellation is performed on the far-end sound reference signal; P'_harmonic(f/2) is the residual echo energy at frequency f produced by the second harmonic that the f/2 component of the far-end sound reference signal generates along the echo path; P'_harmonic(f/3) is the residual echo energy at frequency f produced by the third harmonic of the f/3 component; and P'_harmonic(f/4) is the residual echo energy at frequency f produced by the fourth harmonic of the f/4 component.

Preferably, the first determination unit is specifically configured to determine the stationary noise N(f, t) with a stationary noise estimation algorithm, the stationary noise estimation algorithm including minimum statistics, minima controlled recursive averaging (MCRA) and improved minima controlled recursive averaging (IMCRA).

In order to achieve the above object, in a third aspect, the present invention provides a computer-readable storage medium storing a computer program for executing the echo suppression method described in the first aspect.

In order to achieve the above object, in a fourth aspect, the present invention provides an electronic apparatus comprising: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the echo suppression method described in the first aspect.

With the echo suppression method and device provided by this scheme, a far-end sound signal is collected; the stationary noise N(f, t) is determined; the echo residual energy P_residual(f, t) of the far-end sound signal is calculated; and the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal is calculated with a linear echo cancellation algorithm. When the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), the value of f is determined as a first frequency domain value and the value of t as a first time domain value. Gaussian white noise G(f, t) with variance N(f, t) is constructed, and at the time-frequency point where f is the first frequency domain value and t is the first time domain value, i.e. at an echo residual time-frequency point, P_residual(f, t) is replaced with G(f, t). The echo is thereby suppressed effectively, and because the computation places extremely low demands on memory and computing resources, the scheme can be applied to any electronic device.

Drawings

The foregoing and other objects, features and advantages of the present application will become more apparent from the following more detailed description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application, are incorporated in and constitute a part of this specification, serve to explain the application, and do not constitute a limitation on the application. In the drawings, like reference numerals generally refer to like parts or steps.

Fig. 1 is a flowchart of an echo suppression method according to an exemplary embodiment of the present application;

Fig. 2 is a spectrogram provided by an exemplary embodiment of the present application;

Fig. 3 is a block diagram of an echo suppression device according to an exemplary embodiment of the present application;

Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment of the present application.

Detailed Description

Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.

Fig. 1 is a flowchart of an echo suppression method according to an exemplary embodiment of the present application. The method is applied to an echo suppression device which is configurable in an electronic apparatus having both a microphone and a loudspeaker. The method comprises the following steps:

Step 101, collect the far-end sound signal.

In one example, many electronic devices have both a microphone and a loudspeaker. When both are active at the same time, the microphone of the electronic device picks up the sound that the device plays through its own loudspeaker. This sound signal, played by the device's own loudspeaker and picked up by its microphone, is called the far-end sound signal.

Step 102, determine the stationary noise N(f, t).

Here f denotes frequency and t denotes time. Stationary noise is a random signal whose time-domain samples form a zero-mean, wide-sense stationary random process, so the energy of the stationary noise N(f, t) can be regarded as time-invariant.

In one example, the stationary noise N(f, t) may be determined with a stationary noise estimation algorithm, including but not limited to minimum statistics, minima controlled recursive averaging (MCRA) and improved minima controlled recursive averaging (IMCRA).
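Minimum statistics is the simplest of these estimators to sketch. The Python below is a heavily simplified, illustrative version, not the published algorithm: it recursively smooths each bin's power and takes the minimum of the smoothed power over a sliding window of frames as N(f, t). The class name, smoothing factor `alpha`, window length and `bias` factor are assumptions rather than values from this document; the full method also uses adaptive smoothing and a derived bias correction, both omitted here.

```python
from collections import deque


class MinimumStatisticsTracker:
    """Heavily simplified minimum-statistics noise tracker (a sketch).

    Per frequency bin: smoothed power S(f, t) = alpha * S(f, t-1) + (1 - alpha) * |X(f, t)|^2,
    and N(f, t) = bias * min of S(f, .) over the last `window` frames.
    All constants are illustrative assumptions.
    """

    def __init__(self, alpha=0.9, window=50, bias=1.0):
        self.alpha = alpha
        self.bias = bias
        self.smoothed = None
        self.history = deque(maxlen=window)   # recent smoothed frames

    def update(self, power):
        """power: list of |X(f, t)|^2 for one frame; returns N(f, t) per bin."""
        if self.smoothed is None:
            self.smoothed = list(power)
        else:
            self.smoothed = [self.alpha * s + (1.0 - self.alpha) * p
                             for s, p in zip(self.smoothed, power)]
        self.history.append(self.smoothed)
        return [self.bias * min(frame[k] for frame in self.history)
                for k in range(len(power))]
```

Because the estimate is a windowed minimum, a short burst of speech or echo raises the smoothed power but not N(f, t), which is what lets the noise floor track only the stationary component.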

Step 103, calculate the echo residual energy P_residual(f, t) of the far-end sound signal.

In one example, P_residual(f, t) can be calculated with the following formula:

$$P_{\mathrm{residual}}(f,t) = C_{\mathrm{error}}(f)\,P_{\mathrm{ref}}(f,t) + C_{\mathrm{harmonic2}}(f)\,P_{\mathrm{ref}}(f/2,t) + C_{\mathrm{harmonic3}}(f)\,P_{\mathrm{ref}}(f/3,t) + C_{\mathrm{harmonic4}}(f)\,P_{\mathrm{ref}}(f/4,t)$$

where P_ref(f, t) is the energy of the far-end sound signal at the time-frequency point with time t and frequency f; P_ref(f/2, t), P_ref(f/3, t) and P_ref(f/4, t) are the energies of the far-end sound signal at the time-frequency points with time t and frequencies f/2, f/3 and f/4 respectively; and C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) are preset first, second, third and fourth coefficients.

The echo residue remaining after linear echo cancellation is caused both by limitations of the linear echo cancellation algorithm itself (hereinafter AEC) and by nonlinear distortion in the system. The limitations of the algorithm are:

the room response is very long while the AEC filter length is limited, so the room response cannot be fitted completely;

changes in the room response, signal interruptions and the like force the AEC filter to re-converge, and AEC performance drops noticeably during re-convergence;

some AEC algorithms (e.g., those based on subband decomposition) are themselves nonlinear to some degree and have an upper performance limit even after full convergence.

The nonlinear distortion is the combined result of the nonlinearities of the devices on the echo feedback path, including:

overload of the loudspeaker power amplifier;

nonlinearity between the loudspeaker input voltage and the cone excursion;

nonlinearity introduced by the acoustic structure between the loudspeaker and the microphone.

Echo residue introduced by the limitations of the algorithm is frequency-invariant, i.e. it stays at the same frequency as the far-end component that causes it, whereas echo residue introduced by nonlinear distortion appears in harmonic form.

Here P_residual(f) is the residual echo energy at frequency f after AEC. P_error(f) is the residual echo energy at frequency f that the f component of the far-end sound signal leaves after AEC because of the limitations of the algorithm. P_harmonic(f/2) is the residual echo energy at frequency f produced by the second harmonic that nonlinear distortion on the echo feedback path generates from the f/2 component of the far-end sound signal; likewise, P_harmonic(f/3) and P_harmonic(f/4) are the residual echo energies at frequency f produced by the third harmonic of the f/3 component and the fourth harmonic of the f/4 component. As the spectrogram of Fig. 2 shows, the echo residual energy P_residual(f) at 2000 Hz is composed of P_error(f), P_harmonic(f/2), P_harmonic(f/3), P_harmonic(f/4), P_harmonic(f/5) and so on. P_harmonic(f/5) is already small, so it and the higher-order harmonic components can be ignored, which gives:

$$P_{\mathrm{residual}}(f) = P_{\mathrm{error}}(f) + P_{\mathrm{harmonic}}(f/2) + P_{\mathrm{harmonic}}(f/3) + P_{\mathrm{harmonic}}(f/4)$$

That is, the echo residual energy equals the fundamental-frequency residual energy P_error(f) that the linear echo cancellation algorithm fails to remove, plus the superposed energies of the harmonics generated by nonlinear distortion. Because the energy of higher harmonics is negligible, harmonics are superposed only up to the fourth order.
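Applied to the 2000 Hz example of Fig. 2, the truncated sum draws on far-end content at the following frequencies (a worked instance of the decomposition above; the individual energies are not given in the document):

```latex
P_{\mathrm{residual}}(2000\,\mathrm{Hz}) = P_{\mathrm{error}}(2000\,\mathrm{Hz})
  + P_{\mathrm{harmonic}}(1000\,\mathrm{Hz})
  + P_{\mathrm{harmonic}}\!\left(\tfrac{2000}{3}\,\mathrm{Hz}\right)
  + P_{\mathrm{harmonic}}(500\,\mathrm{Hz})
```

Here 2000/3 Hz is about 666.7 Hz, and the neglected fifth-order term would originate from the 400 Hz component.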

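Further, since

$$P_{\mathrm{error}}(f) = C_{\mathrm{error}}(f)\,P_{\mathrm{ref}}(f)$$

$$P_{\mathrm{harmonic}}(f/2) = C_{\mathrm{harmonic2}}(f)\,P_{\mathrm{ref}}(f/2)$$

$$P_{\mathrm{harmonic}}(f/3) = C_{\mathrm{harmonic3}}(f)\,P_{\mathrm{ref}}(f/3)$$

$$P_{\mathrm{harmonic}}(f/4) = C_{\mathrm{harmonic4}}(f)\,P_{\mathrm{ref}}(f/4)$$

it follows that

$$P_{\mathrm{residual}}(f,t) = C_{\mathrm{error}}(f)\,P_{\mathrm{ref}}(f,t) + C_{\mathrm{harmonic2}}(f)\,P_{\mathrm{ref}}(f/2,t) + C_{\mathrm{harmonic3}}(f)\,P_{\mathrm{ref}}(f/3,t) + C_{\mathrm{harmonic4}}(f)\,P_{\mathrm{ref}}(f/4,t)$$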
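As an illustration of how P_residual(f, t) might be evaluated per frame, the Python sketch below works on one STFT frame of far-end bin energies. Mapping the frequency f/h to the nearest bin index k/h is an assumption (interpolating between bins is another option), and the function names and list-based layout are hypothetical.

```python
def source_bin(k, order):
    """Nearest STFT bin to frequency f/order, where bin k corresponds to f.

    Assumes uniformly spaced bins starting at 0 Hz, so f/order maps to k/order.
    """
    return int(k / order + 0.5)


def residual_energy(p_ref, c_error, c_h2, c_h3, c_h4):
    """Estimate P_residual(f, t) for one frame.

    p_ref: far-end energy P_ref(f, t) per bin for the current frame t.
    c_*:   preset per-bin coefficients C_error(f), C_harmonic2..4(f).
    """
    out = []
    for k in range(len(p_ref)):
        out.append(c_error[k] * p_ref[k]                    # linear residue at f
                   + c_h2[k] * p_ref[source_bin(k, 2)]      # 2nd harmonic of f/2
                   + c_h3[k] * p_ref[source_bin(k, 3)]      # 3rd harmonic of f/3
                   + c_h4[k] * p_ref[source_bin(k, 4)])     # 4th harmonic of f/4
    return out
```

The cost is four multiply-adds per bin per frame, which is consistent with the low memory and computation requirements the document emphasizes.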

Further, the preset coefficients C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) in the above formula are determined as follows.

While the loudspeaker of the electronic device plays an exponential sweep signal at the maximum playback volume, the far-end sound reference signal is collected. The far-end sound reference signal is the maximum-volume sound that the electronic device plays through its own loudspeaker and picks up with its own microphone.
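The document does not specify the sweep parameters. The sketch below generates an exponential (logarithmic) sine sweep with the standard phase law whose instantaneous frequency rises exponentially from f1 to f2; the start and end frequencies, duration, sample rate and function name are assumptions. A well-known property of such sweeps is that, after deconvolution with the matching inverse filter, harmonic distortion products separate in time from the linear response, which would make it convenient for measuring the harmonic terms separately.

```python
import math


def exponential_sweep(f1, f2, duration, fs, amplitude=1.0):
    """Exponential sine sweep from f1 to f2 Hz over `duration` seconds (a sketch).

    Phase: 2*pi*f1*T/R * (exp(R*t/T) - 1) with R = ln(f2/f1), so the
    instantaneous frequency is f1 * exp(R*t/T): f1 at t=0 and f2 at t=T.
    `amplitude` stands in for playback at the device's maximum volume.
    """
    n = int(round(duration * fs))
    rate = math.log(f2 / f1)
    k = 2.0 * math.pi * f1 * duration / rate
    return [amplitude * math.sin(k * (math.exp(rate * i / fs / duration) - 1.0))
            for i in range(n)]


sweep = exponential_sweep(20.0, 20000.0, 1.0, 48000)   # hypothetical parameters
```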

Then C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) are calculated from the following relations:

$$P'_{\mathrm{error}}(f) = C_{\mathrm{error}}(f)\,P'_{\mathrm{ref}}(f)$$

$$P'_{\mathrm{harmonic}}(f/2) = C_{\mathrm{harmonic2}}(f)\,P'_{\mathrm{ref}}(f/2)$$

$$P'_{\mathrm{harmonic}}(f/3) = C_{\mathrm{harmonic3}}(f)\,P'_{\mathrm{ref}}(f/3)$$

$$P'_{\mathrm{harmonic}}(f/4) = C_{\mathrm{harmonic4}}(f)\,P'_{\mathrm{ref}}(f/4)$$

where P'_ref(f) is the energy of the far-end sound reference signal at frequency f; P'_error(f) is the residual echo energy of the near-end signal at frequency f after linear echo cancellation is performed on the far-end sound reference signal; P'_harmonic(f/2) is the residual echo energy at frequency f produced by the second harmonic that the f/2 component of the far-end sound reference signal generates along the echo path; P'_harmonic(f/3) is the residual echo energy at frequency f produced by the third harmonic of the f/3 component; and P'_harmonic(f/4) is the residual echo energy at frequency f produced by the fourth harmonic of the f/4 component.

Once the values of C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) have been determined, they are configured in the echo suppression device and used to calculate the echo residual energy P_residual(f, t) of the far-end sound signal.
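A minimal sketch of the calibration step, assuming the calibration energies have already been measured per bin and that the linear residual and each harmonic contribution at bin f have been separated beforehand (the separation itself is not shown). Each coefficient is simply the ratio in the corresponding relation; the `eps` guard for empty bins, the nearest-bin mapping of f/h and all names are assumptions.

```python
def source_bin(k, order):
    """Nearest bin to f/order for bin k (same assumption as at run time)."""
    return int(k / order + 0.5)


def calibrate_coefficients(p_ref, p_error, p_harm, eps=1e-12):
    """Solve the calibration relations for the preset coefficients.

    p_ref:   P'_ref(f), far-end reference energy per bin (sweep at max volume).
    p_error: P'_error(f), residual energy at f left by the linear AEC.
    p_harm:  dict {2: [...], 3: [...], 4: [...]}; p_harm[h][k] is P'_harmonic(f/h),
             the energy observed at bin k produced by the h-th harmonic.
    Returns (c_error, c_harm) with c_harm[h] the per-bin C_harmonic_h(f).
    """
    n = len(p_ref)
    # C_error(f) = P'_error(f) / P'_ref(f)
    c_error = [p_error[k] / max(p_ref[k], eps) for k in range(n)]
    c_harm = {}
    for h, p_h in p_harm.items():
        # C_harmonic_h(f) = P'_harmonic(f/h) / P'_ref(f/h)
        c_harm[h] = [p_h[k] / max(p_ref[source_bin(k, h)], eps) for k in range(n)]
    return c_error, c_harm
```

Since the coefficients are computed once offline and stored, the run-time cost of the method is unaffected by how elaborate the calibration is.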

Step 104, calculate, with a linear echo cancellation algorithm, the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal.

In one example, the linear echo cancellation algorithm includes, but is not limited to, least mean squares (LMS), the affine projection algorithm (APA) or recursive least squares (RLS).

Step 105, when the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), determine the value of f as a first frequency domain value and the value of t as a first time domain value.

Here P_residual_thres, the energy threshold of the echo residual signal, is a preset empirical value, e.g. 1e-2f; P_voice_thres, the preset near-end sound energy, is likewise an empirical value, e.g. 1e-1f.

It will be appreciated that if P_residual(f, t) > P_residual_thres, the echo residue is large and needs to be suppressed; otherwise it is small and suppression can be skipped. If P_aec(f, t) - P_residual(f, t) < P_voice_thres, the near-end speech energy is small, so the echo residue can be suppressed; otherwise the near-end speech energy is large, suppressing the echo residue would distort the near-end speech, and suppression can be skipped. Finally, to improve the robustness of the algorithm, echo residual suppression is performed only when the energy of the time-frequency point after linear echo cancellation exceeds the stationary noise, i.e. P_aec(f, t) > N(f, t); otherwise suppression can be skipped. Accordingly, a time-frequency point is regarded as an echo residual time-frequency point to be suppressed only when the values of f and t satisfy all three conditions P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t).
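The three-condition test translates directly into code. This Python sketch evaluates it per time-frequency point, using the example empirical thresholds from the text (1e-2 and 1e-1) as defaults; the function names and the per-frame list layout are illustrative assumptions.

```python
def is_echo_residual_point(p_residual, p_aec, noise,
                           residual_thres=1e-2, voice_thres=1e-1):
    """Return True when a time-frequency point should be suppressed.

    Defaults are the example empirical values given in the text.
    """
    strong_residual = p_residual > residual_thres       # echo residue is large
    weak_near_end = p_aec - p_residual < voice_thres    # little near-end speech
    above_noise = p_aec > noise                         # not just stationary noise
    return strong_residual and weak_near_end and above_noise


def residual_mask(p_residual, p_aec, noise, **kw):
    """Apply the three-condition test bin by bin for one frame."""
    return [is_echo_residual_point(r, a, n, **kw)
            for r, a, n in zip(p_residual, p_aec, noise)]
```

For example, a bin with P_residual = 0.05, P_aec = 0.1 and N = 0.01 is flagged, while the same bin with P_aec = 1.0 is left alone because the large remainder indicates near-end speech.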

Step 106, construct Gaussian white noise G(f, t) with variance N(f, t).

Step 107, at the time-frequency point where f is the first frequency domain value and t is the first time domain value, replace P_residual(f, t) with G(f, t).

Based on step 105, when the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), the time-frequency point is regarded as an echo residual time-frequency point, and replacing P_residual(f, t) with G(f, t) at that point effectively suppresses the echo.
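The document does not say how G(f, t) is realized numerically. One plausible reading, sketched below purely as an assumption, is to overwrite the complex STFT coefficient of each flagged bin with circular complex Gaussian noise whose expected power E|G|^2 equals N(f, t), so that suppressed points blend into the background noise floor instead of dropping to silence. Function and parameter names are hypothetical.

```python
import math
import random


def replace_with_noise(spectrum, mask, noise, rng=None):
    """Replace flagged STFT bins with Gaussian white noise of variance N(f, t).

    spectrum: complex STFT coefficients of the AEC output for one frame.
    mask:     booleans from the three-condition test.
    noise:    stationary noise estimate N(f, t) per bin (a power/variance).
    Assumption: G is circular complex Gaussian with E|G|^2 = N(f, t),
    so the real and imaginary parts each have variance N / 2.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for x, flagged, n in zip(spectrum, mask, noise):
        if flagged:
            sigma = math.sqrt(n / 2.0)
            out.append(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
        else:
            out.append(x)   # unflagged bins pass through unchanged
    return out
```

Passing a seeded `random.Random` makes the output reproducible for testing; in a real device a cheap fixed-point noise generator would likely replace the Python RNG.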

With the echo suppression method provided in this embodiment, a far-end sound signal is collected; the stationary noise N(f, t) is determined; the echo residual energy P_residual(f, t) of the far-end sound signal is calculated; and the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal is calculated with a linear echo cancellation algorithm. When the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), the value of f is determined as a first frequency domain value and the value of t as a first time domain value. Gaussian white noise G(f, t) with variance N(f, t) is constructed, and at the time-frequency point where f is the first frequency domain value and t is the first time domain value, i.e. at an echo residual time-frequency point, P_residual(f, t) is replaced with G(f, t). The echo is thereby suppressed effectively, and because the computation places extremely low demands on memory and computing resources, the method can be applied to any electronic device. In addition, the method keeps the sound undistorted during near-end single-talk and keeps distortion very small during double-talk, improving the user experience.

Fig. 3 is a block diagram of an echo suppression device according to an exemplary embodiment of the present application. As shown in fig. 3, an echo suppression device according to an embodiment of the present application includes:

An acquisition unit 201, configured to acquire a far-end sound signal.

A first determining unit 202, configured to determine the stationary noise N(f, t), where f denotes frequency and t denotes time.

A first calculating unit 203, configured to calculate the echo residual energy P_residual(f, t) of the far-end sound signal.

A second calculating unit 204, configured to calculate, with a linear echo cancellation algorithm, the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal.

A second determining unit 205, configured to determine, when the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), the value of f as a first frequency domain value and the value of t as a first time domain value, where P_residual_thres is a preset energy threshold of the echo residual signal and P_voice_thres is a preset near-end sound energy.

A construction unit 206, configured to construct Gaussian white noise G(f, t) with variance N(f, t).

A cancellation unit 207, configured to replace P_residual(f, t) with G(f, t) at the time-frequency point where f is the first frequency domain value and t is the first time domain value.

Preferably, the first calculating unit 203 is specifically configured to calculate P_residual(f, t) using the following formula:

$$P_{\mathrm{residual}}(f,t) = C_{\mathrm{error}}(f)\,P_{\mathrm{ref}}(f,t) + C_{\mathrm{harmonic2}}(f)\,P_{\mathrm{ref}}(f/2,t) + C_{\mathrm{harmonic3}}(f)\,P_{\mathrm{ref}}(f/3,t) + C_{\mathrm{harmonic4}}(f)\,P_{\mathrm{ref}}(f/4,t)$$

where P_ref(f, t) is the energy of the far-end sound signal at the time-frequency point with time t and frequency f; P_ref(f/2, t), P_ref(f/3, t) and P_ref(f/4, t) are the energies of the far-end sound signal at the time-frequency points with time t and frequencies f/2, f/3 and f/4 respectively; and C_error(f), C_harmonic2(f), C_harmonic3(f) and C_harmonic4(f) are preset first, second, third and fourth coefficients, determined from the following calibration relations (the primed quantities are measured on the far-end sound reference signal, as described for the method):

$$P'_{\mathrm{error}}(f) = C_{\mathrm{error}}(f)\,P'_{\mathrm{ref}}(f)$$

$$P'_{\mathrm{harmonic}}(f/2) = C_{\mathrm{harmonic2}}(f)\,P'_{\mathrm{ref}}(f/2)$$

$$P'_{\mathrm{harmonic}}(f/3) = C_{\mathrm{harmonic3}}(f)\,P'_{\mathrm{ref}}(f/3)$$

$$P'_{\mathrm{harmonic}}(f/4) = C_{\mathrm{harmonic4}}(f)\,P'_{\mathrm{ref}}(f/4)$$

Preferably, the first determining unit 202 is specifically configured to determine the stationary noise N(f, t) with a stationary noise estimation algorithm, the stationary noise estimation algorithm including minimum statistics, minima controlled recursive averaging (MCRA) and improved minima controlled recursive averaging (IMCRA).

With the echo suppression device provided in this embodiment, a far-end sound signal is collected; the stationary noise N(f, t) is determined; the echo residual energy P_residual(f, t) of the far-end sound signal is calculated; and the signal P_aec(f, t) obtained after linear echo cancellation of the far-end sound signal is calculated with a linear echo cancellation algorithm. When the values of f and t satisfy P_residual(f, t) > P_residual_thres, P_aec(f, t) - P_residual(f, t) < P_voice_thres and P_aec(f, t) > N(f, t), the value of f is determined as a first frequency domain value and the value of t as a first time domain value. Gaussian white noise G(f, t) with variance N(f, t) is constructed, and at the time-frequency point where f is the first frequency domain value and t is the first time domain value, i.e. at an echo residual time-frequency point, P_residual(f, t) is replaced with G(f, t). The echo is thereby suppressed effectively, and because the computation places extremely low demands on memory and computing resources, the device can be applied to any electronic device. In addition, the device keeps the sound undistorted during near-end single-talk and keeps distortion very small during double-talk, improving the user experience.

Next, an electronic device 11 according to an embodiment of the present application is described with reference to fig. 4. As shown in fig. 4, the electronic device 11 includes one or more processors 111 and a memory 112.

The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 11 to perform desired functions.

Memory 112 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 111 to implement the echo suppression methods of the various embodiments of the present application described above and/or other desired functions. Various contents such as an input signal, a signal component and a noise component may also be stored in the computer-readable storage medium.

In one example, the electronic device 11 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).

The input device 113 may include, for example, a keyboard, a mouse, and the like.

The output device 114 may output various information to the outside, including the determined distance information, direction information, and the like. The output device 114 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.

Of course, only some of the components of the electronic device 11 relevant to the present application are shown in fig. 4 for simplicity, components such as buses, input/output interfaces, and the like being omitted. In addition, the electronic device 11 may include any other suitable components depending on the particular application.

Exemplary computer program product and computer-readable storage medium

In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in an echo suppression method according to various embodiments of the present application described in the "exemplary methods" section of the present specification.

The computer program product may contain program code for performing the operations of embodiments of the present application, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

Furthermore, embodiments of the present application may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform steps in an echo suppression method according to various embodiments of the present application described in the above-mentioned "exemplary methods" section of the present specification.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The basic principles of the present application have been described above in connection with specific embodiments. It should be noted, however, that the advantages, benefits and effects mentioned in the present application are merely examples and not limitations, and they should not be regarded as necessarily possessed by every embodiment of the present application. Furthermore, the specific details disclosed above serve only the purposes of illustration and ease of understanding and are not limiting; the application is not restricted to being implemented with those specific details.

The block diagrams of the components, apparatuses, devices and systems referred to in this application are only illustrative examples and are not intended to require or imply that they must be connected, arranged or configured in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these components, apparatuses, devices and systems may be connected, arranged or configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended terms that mean "including but not limited to" and may be used interchangeably with that phrase. The words "or" and "and" as used herein refer to, and may be used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and may be used interchangeably with, the phrase "such as, but not limited to".

It should also be noted that, in the apparatus, devices and methods of the present application, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present application.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims

1. An echo suppression method, the method comprising:

collecting a far-end sound signal;

determining stationary noise $N(f,t)$, and calculating the echo residual energy $P_{residual}(f,t)$ of the far-end sound signal;

calculating, using a linear echo cancellation algorithm, the signal $P_{aec}(f,t)$ obtained after linear echo cancellation of the far-end sound signal;

when the values of $f$ and $t$ satisfy $P_{residual}(f,t) > P_{residual\_thres}$ and $P_{aec}(f,t) - P_{residual}(f,t)$ $N(f,t)$, determining the value of $f$ as a first frequency domain value and the value of $t$ as a first time domain value, wherein $P_{residual\_thres}$ is a preset energy threshold of the residual echo signal, and $P_{voice\_thres}$ is a preset near-end voice energy;

constructing Gaussian white noise $G(f,t)$ with variance $N(f,t)$;

replacing $P_{residual}(f,t)$ with $G(f,t)$ at each time-frequency point where $f$ is a first frequency domain value and $t$ is a first time domain value.
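As a non-authoritative illustration of the method of claim 1, the Python/NumPy sketch below applies the replacement step to the short-time spectrum of the linear echo cancellation output. The function name `suppress_residual_echo` and its parameter names are hypothetical. Because the second condition of claim 1 is incomplete in this text, the sketch assumes it takes the form $P_{aec}(f,t) - P_{residual}(f,t) - N(f,t) < P_{voice\_thres}$ (little near-end voice energy remains). It also assumes that the complex noise $G(f,t)$ is drawn so that $E|G(f,t)|^2 = N(f,t)$.

```python
import numpy as np


def suppress_residual_echo(aec_spec, p_residual, noise_psd,
                           residual_thres, voice_thres, rng=None):
    """Replace echo-dominated time-frequency bins with comfort noise.

    aec_spec   : complex STFT of the linear-AEC output, shape (F, T)
    p_residual : residual echo energy estimate P_residual(f, t), shape (F, T)
    noise_psd  : stationary noise estimate N(f, t), shape (F, T)
    Returns the modified spectrum and the boolean mask of replaced bins.
    """
    rng = np.random.default_rng() if rng is None else rng
    aec_spec = np.asarray(aec_spec, dtype=complex)
    p_residual = np.asarray(p_residual, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)

    p_aec = np.abs(aec_spec) ** 2                       # P_aec(f, t)
    # Condition from claim 1: residual echo energy above its preset threshold.
    echo_strong = p_residual > residual_thres
    # ASSUMED form of the second condition (incomplete in the source text):
    # the near-end voice energy left after removing echo and noise is small.
    voice_weak = (p_aec - p_residual - noise_psd) < voice_thres
    mask = echo_strong & voice_weak                     # bins (f, t) to replace

    # Complex Gaussian white noise G(f, t) with E|G|^2 = N(f, t).
    scale = np.sqrt(noise_psd / 2.0)
    g = scale * (rng.standard_normal(aec_spec.shape)
                 + 1j * rng.standard_normal(aec_spec.shape))

    out = aec_spec.copy()
    out[mask] = g[mask]
    return out, mask
```

Filling echo-dominated bins with noise at the stationary-noise level, rather than zeroing them, is the usual comfort-noise motivation: the output background stays continuous instead of showing audible gaps.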

2. The method according to claim 1, wherein calculating the echo residual energy $P_{residual}(f,t)$ corresponding to the far-end sound signal comprises:

calculating $P_{residual}(f,t)$ using the following formula:

[Formula for $P_{residual}(f,t)$ not reproduced.]

3. The method according to claim 2, wherein $C_{error}(f)$, $C_{harmonic2}(f)$, $C_{harmonic3}(f)$ and $C_{harmonic4}(f)$ are determined as follows:

collecting a far-end sound reference signal while a loudspeaker of the device plays an exponential sweep signal at maximum playback volume;

calculating $C_{error}(f)$, $C_{harmonic2}(f)$, $C_{harmonic3}(f)$ and $C_{harmonic4}(f)$ using the following formulas:

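[Formulas for $C_{error}(f)$, $C_{harmonic2}(f)$, $C_{harmonic3}(f)$ and $C_{harmonic4}(f)$ not reproduced.]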

wherein $P_{ref}(f)$ is the energy of the far-end sound reference signal at frequency point $f$; $P_{error}(f)$ is the residual echo energy at frequency point $f$ after linear echo cancellation is applied to the far-end sound reference signal; $P_{harmonic}(f/2)$ is the residual echo energy at frequency point $f$ produced when the component of the far-end sound reference signal at frequency point $f/2$ generates a second harmonic along the echo path; $P_{harmonic}(f/3)$ is the residual echo energy at frequency point $f$ produced when the component at frequency point $f/3$ generates a third harmonic along the echo path; and $P_{harmonic}(f/4)$ is the residual echo energy at frequency point $f$ produced when the component at frequency point $f/4$ generates a fourth harmonic along the echo path.
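Claim 3 calibrates the coefficients with an exponential sweep. The sketch below generates one common form of exponential sine sweep (Farina's formulation); the function names and the start and end frequencies used in the example are illustrative assumptions, not values taken from the application. The instantaneous frequency of this sweep grows as $f_1 (f_2/f_1)^{t/T}$, and a $k$-th harmonic of the component at $f/k$ lands on frequency point $f$, which is the energy that $P_{harmonic}(f/k)$ describes.

```python
import numpy as np


def exponential_sine_sweep(f1, f2, duration, fs, amplitude=1.0):
    """Exponential sine sweep from f1 to f2 Hz over `duration` seconds.

    Phase follows Farina's form 2*pi*f1*L*(exp(t/L) - 1) with
    L = duration / ln(f2/f1), so the instantaneous frequency is f1*exp(t/L).
    """
    if not (0.0 < f1 < f2 <= fs / 2.0):
        raise ValueError("require 0 < f1 < f2 <= fs/2")
    t = np.arange(int(round(duration * fs))) / fs
    rate = duration / np.log(f2 / f1)          # L: time for frequency to grow by e
    phase = 2.0 * np.pi * f1 * rate * (np.exp(t / rate) - 1.0)
    return amplitude * np.sin(phase), t


def instantaneous_frequency(t, f1, f2, duration):
    """Instantaneous frequency (Hz) of the sweep at time t."""
    return f1 * (f2 / f1) ** (t / duration)
```

For example, `exponential_sine_sweep(20.0, 20000.0, 1.0, 48000)` yields a one-second sweep sampled at 48 kHz. After deconvolution, the harmonic distortion products of an exponential sweep are known to separate in time from the linear response, which is one reason such sweeps are used for harmonic measurements.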

4. The method according to claim 1, wherein the linear echo cancellation algorithm comprises: a least mean squares (LMS) algorithm, an affine projection algorithm, or a recursive least squares (RLS) algorithm.
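Claim 4 lists adaptive filters of the LMS, affine projection and RLS families. As one hedged example, the sketch below implements the normalized LMS (NLMS) variant in the time domain. It computes $e[n] = mic[n] - \hat{h}^{T} x[n]$, the sample-domain counterpart of $E(Z) = MIC(Z) - H(Z) \times SPK(Z)$. The function name, tap count and step size are illustrative assumptions.

```python
import numpy as np


def nlms_echo_canceller(mic, spk, taps=256, mu=0.5, eps=1e-8):
    """Time-domain NLMS echo canceller.

    mic : near-end microphone samples MIC
    spk : far-end loudspeaker samples SPK
    Returns the echo-cancelled signal e and the final FIR estimate h of H(z).
    """
    mic = np.asarray(mic, dtype=float)
    spk = np.asarray(spk, dtype=float)
    h = np.zeros(taps)          # adaptive FIR estimate of the echo path
    x_buf = np.zeros(taps)      # most recent far-end samples, newest first
    e = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = spk[n]
        y_hat = h @ x_buf                       # estimated echo sample
        e[n] = mic[n] - y_hat                   # echo-cancelled output sample
        # Normalized LMS update: step scaled by the input energy in the buffer.
        h += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
    return e, h
```

Normalizing the step by the buffer energy makes the adaptation rate largely independent of the loudspeaker level, which is why NLMS is often preferred over plain LMS in echo cancellers.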

5. The method according to claim 1, wherein determining the stationary noise $N(f,t)$ comprises:

determining the stationary noise $N(f,t)$ using a stationary noise estimation algorithm, the stationary noise estimation algorithm comprising: minimum statistics, minima controlled recursive averaging (MCRA), or improved minima controlled recursive averaging (IMCRA).
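Claim 5 lists minimum statistics, MCRA and IMCRA as noise estimators. The sketch below is a deliberately simplified minimum-statistics estimator, not Martin's complete algorithm (which adapts the smoothing factor and derives the bias correction analytically). It recursively smooths the periodogram, takes the minimum over a sliding window of frames, and scales that minimum by a fixed bias factor. The function name and the values of `alpha`, `window` and `bias` are assumptions for illustration.

```python
import numpy as np


def min_stats_noise(power, alpha=0.85, window=96, bias=1.5):
    """Simplified minimum-statistics estimate of the noise PSD N(f, t).

    power : periodogram |Y(f, t)|^2, shape (F, T)
    Returns an array of the same shape.
    """
    power = np.asarray(power, dtype=float)
    _, num_frames = power.shape
    smoothed = np.empty_like(power)
    noise = np.empty_like(power)
    s = power[:, 0].copy()
    for t in range(num_frames):
        # First-order recursive smoothing of the periodogram.
        s = alpha * s + (1.0 - alpha) * power[:, t]
        smoothed[:, t] = s
        # Minimum of the smoothed power over the last `window` frames,
        # scaled by a fixed (hypothetical) bias-compensation factor.
        start = max(0, t - window + 1)
        noise[:, t] = bias * smoothed[:, start:t + 1].min(axis=1)
    return noise
```

Because speech bursts raise the smoothed power only temporarily, the windowed minimum tends to follow the noise floor; the window must be longer than typical speech activity for this to hold.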

6. An echo suppression device, said device comprising:

the acquisition unit is used for acquiring far-end sound signals;

a first calculation unit for calculating the echo residual energy $P_{residual}(f,t)$ of the far-end sound signal;

a second calculation unit for calculating, using a linear echo cancellation algorithm, the signal $P_{aec}(f,t)$ obtained after linear echo cancellation of the far-end sound signal;

a second determining unit for determining, when the values of $f$ and $t$ satisfy $P_{residual}(f,t) > P_{residual\_thres}$ and $P_{aec}(f,t) - P_{residual}(f,t)$ $N(f,t)$, the value of $f$ as a first frequency domain value and the value of $t$ as a first time domain value, wherein $P_{residual\_thres}$ is a preset energy threshold of the residual echo signal, and $P_{voice\_thres}$ is a preset near-end voice energy;

a cancellation unit for replacing $P_{residual}(f,t)$ with $G(f,t)$ at each time-frequency point where $f$ is a first frequency domain value and $t$ is a first time domain value.

7. The apparatus according to claim 6, wherein the first calculation unit is specifically configured to:

calculate $P_{residual}(f,t)$ using the following formula:

[Formula for $P_{residual}(f,t)$ not reproduced.]

8. The apparatus according to claim 7, wherein $C_{error}(f)$, $C_{harmonic2}(f)$, $C_{harmonic3}(f)$ and $C_{harmonic4}(f)$ are determined as follows:

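[Formulas for $C_{error}(f)$, $C_{harmonic2}(f)$, $C_{harmonic3}(f)$ and $C_{harmonic4}(f)$ not reproduced.]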

9. The apparatus according to claim 6, wherein the linear echo cancellation algorithm comprises: a least mean squares (LMS) algorithm, an affine projection algorithm, or a recursive least squares (RLS) algorithm.

10. The apparatus according to claim 6, wherein the first determining unit is specifically configured to:

11. A computer-readable storage medium storing a computer program for executing the echo suppression method according to any one of claims 1 to 5.

12. An electronic device, comprising:

a processor;

a memory for storing instructions executable by the processor;

the processor being configured to read the executable instructions from the memory and execute them to implement the echo suppression method according to any one of claims 1 to 5.