CN112837697A

CN112837697A - Echo suppression method and device

Info

Publication number: CN112837697A
Application number: CN202110192795.0A
Authority: CN
Inventors: 秦亚光; 夏龙
Original assignee: Beijing Ape Power Future Technology Co Ltd
Current assignee: Beijing Ape Power Future Technology Co Ltd
Priority date: 2021-02-20
Filing date: 2021-02-20
Publication date: 2021-05-25
Anticipated expiration: 2041-02-20
Also published as: CN112837697B

Abstract

The application relates to an echo suppression method and device. The echo suppression method comprises the following steps: performing linear processing on a near-end signal acquired by audio input equipment to obtain an error signal; obtaining an overall residual signal included in the error signal, wherein the overall residual signal includes at least one of a background noise signal, an early residual echo signal, a late residual echo signal, and a late reverberation signal; determining a noise suppression gain corresponding to the overall residual signal; and applying the noise suppression gain to the error signal to obtain a target signal. The echo suppression method provided by the application can further suppress the residual echo in the audio, and improves the user experience.

Description

Echo suppression method and device

Technical Field

The present application relates to the field of audio signal processing technologies, and in particular, to an echo suppression method and apparatus, a computing device, and a computer-readable storage medium.

Background

During a live online class, the user's voice is captured by a microphone, and sound is played to the user through a loudspeaker. However, the sound played by the loudspeaker is picked up again by the microphone as acoustic echo. This is especially pronounced when the user is in a strongly reverberant environment, and it degrades the user's experience. Although the prior art already provides echo cancellation methods and apparatuses, the processed sound still contains residual echo that is difficult to remove completely.

Therefore, how to further suppress the residual echo is an urgent technical problem to be solved by those skilled in the art.

Disclosure of Invention

In view of the above, the present application provides an echo suppression method and apparatus, a computing device and a computer-readable storage medium, so as to solve the technical defects in the prior art.

A first aspect of the embodiments of the present application provides an echo suppression method, including:

performing linear processing on a near-end signal acquired by audio input equipment to obtain an error signal;

obtaining an overall residual signal included in the error signal, wherein the overall residual signal includes at least one of a background noise signal, an early residual echo signal, a late residual echo signal, and a late reverberation signal;

determining a noise suppression gain corresponding to the overall residual signal;

and applying the noise suppression gain to the error signal to obtain a target signal.

Optionally, for the echo suppression method, performing linear processing on a near-end signal collected by an audio input device to obtain an error signal, includes:

performing linear adaptive filtering processing on a near-end signal acquired by audio input equipment to obtain an estimated echo signal;

removing the estimated echo signal from the near-end signal to obtain the error signal.

Optionally, for the echo suppression method, the overall residual signal includes a background noise signal, and obtaining the overall residual signal included in the error signal includes:

and based on a preset energy threshold value, taking each frame of audio signal with energy smaller than the energy threshold value in the error signal as a background noise signal.

Optionally, for the echo suppression method, the obtaining the overall residual signal included in the error signal includes:

according to the echo estimation energy and the error signal energy, a leakage parameter algorithm is adopted to obtain a leakage parameter corresponding to the early residual echo;

and obtaining the early residual echo signal according to the error signal and the leakage parameter.

obtaining reverberation time according to the volume of the sound production place;

applying the reverberation time to a spectral variance algorithm of late residual echoes to obtain the late residual echo signal.

obtaining a corresponding posterior signal-to-noise ratio and a corresponding prior signal-to-noise ratio according to the error signal;

calculating by adopting a minimum mean square error algorithm according to the posterior signal-to-noise ratio and the prior signal-to-noise ratio to obtain a first parameter;

and obtaining the late reverberation signal by adopting a late reverberation signal spectrum variance algorithm according to the first parameter.

Optionally, for the echo suppression method, determining a noise suppression gain corresponding to the overall residual signal includes:

and calculating to obtain corresponding noise suppression gain through the whole residual signal by adopting a minimum mean square error algorithm.

A second aspect of the embodiments of the present application provides an echo suppressing apparatus, including:

the linear processing module is configured to perform linear processing on the near-end signal collected by the audio input device to obtain an error signal;

a residual noise acquisition module configured to obtain an overall residual signal included in the error signal, wherein the overall residual signal includes a background noise signal, an early residual echo signal, a late residual echo signal, and a late reverberation signal;

a suppression gain determination module configured to determine a noise suppression gain corresponding to the overall residual signal;

an action module configured to act the noise suppression gain on the error signal to obtain a target signal.

A third aspect of the embodiments of the present application provides a computing device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the foregoing echo suppression method when executing the computer instructions.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, which stores computer instructions, and is characterized in that the computer instructions, when executed by a processor, implement the steps of the foregoing echo suppression method.

In the echo suppression method and device provided by the application, the near-end signal collected by the audio input device is first linearly processed, which removes the dominant echo and yields an error signal that serves as the basis for further echo suppression. The overall residual signal contained in the error signal is then obtained, the corresponding noise suppression gain is determined, and that gain is applied to the error signal to obtain the target signal. Residual echo is thereby suppressed on top of the linearly processed error signal, which improves the user experience.

Drawings

Fig. 1 is a schematic flowchart illustrating an echo suppression method according to an embodiment of the present application;

fig. 2 is an overall processing flow diagram of an echo suppression method according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of an echo suppression device according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

First, the terms used in one or more embodiments of the present application are explained.

Minimum mean square error algorithm: the MMSE (Minimum Mean Squared Error) algorithm replaces the exact value of the gradient vector with an estimate of it.

Smoothing coefficient: for determining the level of smoothing of the data and the speed of response to the difference between the predicted and actual results.

Audio signal (Audio): an information carrier in the form of regular sound waves of speech, music, or sound effects whose frequency and amplitude vary.

Near-end signal: refers to an audio signal collected by an audio collecting device such as a microphone.

Far-end signal: refers to the audio signal obtained at the audio playback device through a data line.

Adaptive filter: a digital filter that automatically adjusts its performance according to the input signal in order to carry out digital signal processing.

Gain (Gain): the degree to which the current, voltage or power is increased for a component, circuit, device or system. Typically specified in decibel (dB) numbers.

Echo signal: the signal picked up by the audio input device after the audio output by the audio output device has been reflected one or more times in the external environment.

Reverberation: when a sound wave propagates indoors, it is reflected by obstacles such as walls, ceilings, and floors, and part of it is absorbed at each reflection. When the sound source stops emitting, the sound wave disappears only after being reflected and absorbed many times indoors, so for a period of time after the source stops, several reflected sound waves remain mixed together. This phenomenon is called reverberation, and that period is called the reverberation time.

Late reverberation: in this application, the reverberation that arrives more than 60 ms after the sound is emitted.

The spectral variance algorithm: used to obtain the spectral variance of an audio signal. The spectral variance represents the square of the energy of the audio signal, and the energy of the audio signal can be obtained by taking the square root of the spectral variance. The calculation is generally carried out using the following formula,

wherein |E_r(l,k)|² represents the energy of the audio signal; in practical applications, different subscripts may be used to represent the energy of different audio signals.

FFT: Fast Fourier Transform.

AEC: Automatic Echo Cancellation.

In the present application, an echo suppression method and apparatus, a computing device and a computer readable storage medium are provided, which are described in detail in the following embodiments one by one.

The present embodiment provides an echo suppression method. Fig. 1 shows a schematic flowchart of the echo suppression method provided in an embodiment of the present application; the method includes steps S101 to S104.

S101, performing linear processing on a near-end signal collected by audio input equipment to obtain an error signal.

In this method, the near-end signal collected by the audio device is first linearly processed to obtain an error signal, denoted e(n). The error signal e(n) is then post-processed to obtain a noise suppression gain Gain. Finally, the noise suppression gain is applied to the error signal e(n) to obtain the target signal Target, from which echo has been further removed.

The near-end signal is the audio signal captured by the audio input device. An audio signal is an information carrier in the form of regular sound waves of speech, music, or sound effects whose frequency and amplitude vary. The far-end signal is the audio signal obtained at the audio playback device through a data line. The far-end signal is fed into an adaptive filter for linear processing to obtain an estimated echo signal, and the estimated echo signal is subtracted from the near-end signal to obtain the error signal e(n). After these steps, part of the echo has been removed from e(n), but the part that linear processing cannot remove is still present and must be further suppressed by post-processing.

In the present embodiment, the audio input device includes an audio acquisition device, such as a microphone, for acquiring audio signal data.

In a practical application scenario, the audio signal acquired through the audio input device usually includes an echo signal, a reverberation signal, a noise signal, and the like. The echo signal is the signal picked up by the audio input device after the audio output by the audio output device has been reflected one or more times in the external environment; the audio output device includes a loudspeaker and other devices used for playing audio data. The reverberation signal is the signal picked up by the audio input device from sound waves that persist and keep being reflected in a space after the sound source has stopped emitting. Reverberation differs from echo: reverberation within a certain time range makes the user's voice sound more natural and therefore benefits sound quality, whereas reverberation beyond that range degrades it. Echo harms sound quality and should be eliminated in practical applications. The noise signal is any audio collected by the audio input device other than the user's voice, the echo signal, and the reverberation signal, such as the sound of a fan, footsteps on the floor, or other people's conversation.

Optionally, in the echo suppression method provided by the present application, performing linear processing on a near-end signal collected by an audio input device to obtain an error signal includes:

performing linear adaptive filtering processing on the near-end signal acquired by the audio input device to obtain an estimated echo signal;

removing the estimated echo signal from the near-end signal to obtain the error signal.

Specifically, the near-end signal is subjected to linear adaptive filtering by an adaptive filter, a digital filter that automatically adjusts its performance according to the input signal. The adaptive filter works as follows: the weight vector of the filter is adjusted by an adaptive filtering algorithm so that the estimated echo path approximates the real echo path, which yields the estimated echo signal. The estimated echo signal is then removed from the near-end signal to achieve echo cancellation. The present application does not limit the specific algorithm used for linear adaptive filtering; a person skilled in the art can select one according to the actual situation.

In the echo suppression method provided by the application, linear adaptive filtering removes the estimated echo signal from the near-end signal collected by the audio input device, yielding the error signal and providing the basis for subsequent residual echo processing.
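As an illustration of the linear processing step, the following is a minimal sketch that assumes a normalized least-mean-squares (NLMS) update. The application does not prescribe a particular adaptive algorithm, so NLMS, the function name `nlms_error_signal`, and the parameters `taps`, `mu`, and `eps` are all assumptions chosen for the example. The synthetic echo path is likewise hypothetical.

```python
import random


def nlms_error_signal(far_end, near_end, taps=4, mu=0.5, eps=1e-8):
    """Return (error signal e(n), final filter weights) for an NLMS echo canceller."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # remove the estimated echo from the near-end sample
        power = sum(h * h for h in history) + eps
        # Normalized update: step size scaled by the far-end signal power.
        weights = [w + mu * e * h / power for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights


# Synthetic check: a hypothetical two-tap echo path, with no local speech.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
near = [0.6 * far[n] + (0.3 * far[n - 1] if n > 0 else 0.0) for n in range(len(far))]
errors, weights = nlms_error_signal(far, near)
```

In this idealized, noise-free setting the error signal decays towards zero once the filter has converged. In a real room the echo path is longer and nonlinear, which is why a residual remains for the post-processing described next.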

S102, obtaining an overall residual signal contained in the error signal, wherein the overall residual signal comprises at least one of a background noise signal, an early residual echo signal, a late residual echo signal and a late reverberation signal.

Although linear processing yields an error signal e(n) from which part of the echo has been removed, technical limitations mean that, in practice, signals such as echo, background noise, and reverberation still remain in e(n) and degrade the quality of the user's sound.

In the echo suppression method provided by the present application, the error signal e(n) is post-processed to obtain and remove the overall residual signal E_Total(l,k) contained in it, so that the echo that linear processing cannot remove is eliminated and the echo suppression effect is improved. The overall residual signal E_Total(l,k) comprises a background noise signal V(l,k), an early residual echo signal E_res(l,k), a late residual echo signal E_r(l,k), and a late reverberation signal Z_r(l,k). All four are noise signals that interfere with the user's voice and need to be suppressed.

In practical applications, background noise such as footsteps, whistling, and other people's conversation may be present while the user speaks into the audio input device. Such noise degrades the user's audio and needs to be removed. The echo suppression method provided by the application uses an energy-based decision to obtain the noise signal.

Optionally, in the echo suppression method provided by the present application, the obtaining the overall residual signal included in the error signal includes:

and based on a preset power spectrum energy threshold, taking each frame of audio signal with the power spectrum energy smaller than the energy threshold in the error signal as a background noise signal.

The background noise signal, denoted V(l,k), refers to audio other than the user's voice signal that interferes with it. For example, when a user is speaking, the sound produced is the user's voice signal, while the environment may contain sounds that interfere with it, such as other people's footsteps or keyboard typing; these interfering sounds are the background noise signal.

The power spectrum energy threshold is used to classify each frame of the error signal e(n) whose power spectrum energy is below the threshold as the background noise signal V(l,k). The power spectrum energy value used as the power spectrum energy threshold can be obtained by the following method.

Specifically, in the echo suppression method provided by the present application, the power spectrum energy value of each frame of the audio signal in the error signal can be calculated by the following method:

POW_Y(l)＝sum(Y(l)[k])；

where l denotes the frame number of the audio signal, with l ≥ 1; POW_Y(l) denotes the power spectrum energy value of the l-th frame of the audio signal; Y(l)[k] denotes the power spectrum energy value of the k-th frequency point of the l-th frame; and k indexes the frequency points of the audio signal. In general, if the number of FFT points is 256, k ranges over [0, 255], and POW_Y(l) is the sum of the power spectrum energy values over all frequency points of the l-th frame.

The sampling frequency is the number of times a sample of the acoustic amplitude is taken per second when the analog acoustic waveform is digitized. In the prior art, in order to ensure that sound is not distorted, the sampling frequency in a live broadcast scene is generally about 16 kHz. Commonly used audio sampling frequencies are 8kHz, 16kHz, 22.05kHz, 37.8kHz, 44.1kHz, 48kHz, etc.

Once the power spectrum energy threshold has been obtained through the above steps, each frame of the audio signal whose power spectrum energy is below the threshold is taken, in practical application, as the background noise signal V(l,k).
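The energy decision described above can be sketched as follows. This is an illustrative example only: the helper names, the naive DFT used in place of an FFT, the four-sample frames, and the threshold value 0.1 are assumptions made to keep the sketch short and self-contained.

```python
import cmath


def power_spectrum(frame):
    """Naive DFT power spectrum |X(k)|^2 of one real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        bin_value = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                        for i, x in enumerate(frame))
        spectrum.append(abs(bin_value) ** 2)
    return spectrum


def frame_energy(frame):
    """POW_Y(l): sum of the power spectrum over all frequency points of one frame."""
    return sum(power_spectrum(frame))


def background_noise_frames(frames, energy_threshold):
    """Return the 1-based frame numbers l whose energy is below the threshold."""
    return [l for l, frame in enumerate(frames, start=1)
            if frame_energy(frame) < energy_threshold]


frames = [
    [0.01, -0.01, 0.01, -0.01],  # quiet frame
    [1.0, -1.0, 1.0, -1.0],      # loud frame
    [0.02, 0.0, -0.02, 0.0],     # quiet frame
]
noise_indices = background_noise_frames(frames, energy_threshold=0.1)
```

Here the first and third frames fall below the assumed threshold and would be treated as background noise, while the second frame would not.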

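Optionally, the obtained error signal still includes a residual echo signal after linear processing, and the echo suppression method provided by the present application makes a rough estimate of the residual echo in the error signal. Specifically:

the overall residual signal includes an early residual echo signal, and obtaining the overall residual signal included in the error signal includes:

obtaining, according to the echo estimation energy and the error signal energy, a leakage parameter corresponding to the early residual echo by using a leakage parameter algorithm;

and obtaining the early residual echo signal according to the error signal and the leakage parameter.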

Specifically, in the echo suppression method provided by the present application, the leakage parameter is calculated by the following formula (1):

where R_EY is the cross-correlation value of the estimated echo and the error signal, R_YY is the autocorrelation value of the estimated echo, and the result of formula (1) is the estimate of the leakage parameter.

Specifically, formula (1) means that the cross-correlation value R_EY of the estimated echo and the error signal is compared with the autocorrelation value R_YY of the estimated echo to obtain the leakage parameter; the larger the leakage parameter, the larger the early residual echo signal remaining after Automatic Echo Cancellation (AEC) processing is judged to be.

The cross-correlation value R_EY is calculated by the following formula (2), and the autocorrelation value R_YY is calculated by the following formula (3):

R_EY(l,k)＝(1-β(l))R_EY(l-1,k)+β(l)P_Y(l,k)P_E(l,k) (2)

R_YY(l,k)＝(1-β(l))R_YY(l-1,k)+β(l)(P_Y(l,k))² (3)

In formula (2) and formula (3), P_Y(l,k) is the echo estimation energy, calculated by the following formula (4); β(l) is a first smoothing coefficient, which may take a value between 0 and 1; and P_E(l,k) is the error signal energy, calculated by the following formula (5):

P_Y(l,k)＝(1-γ)P_Y(l-1,k)+γ|Y(l,k)|² (4)

P_E(l,k)＝(1-γ)P_E(l-1,k)+γ|E(l,k)|² (5)

In formula (4) and formula (5), γ is a second smoothing coefficient, which may take a value between 0 and 1; l denotes the l-th frame and k denotes the k-th frequency point. |E(l,k)|² denotes the spectral energy of the k-th frequency point of the l-th frame of the error signal, and similarly |Y(l,k)|² denotes the spectral energy of the k-th frequency point of the l-th frame of the estimated echo.

The calculation formula for obtaining the early residual echo signal by the above formula (1) to formula (5) is the following formula (6):

E_res(l,k)＝η*(Y(l,k)-D(l,k))² (6)

where E_res(l,k) denotes the early residual echo signal, Y(l,k) denotes the near-end signal, D(l,k) denotes the estimated echo signal after Automatic Echo Cancellation (AEC) processing, and η denotes the leakage parameter, which is obtained by formula (1) above.
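The leakage estimate and the early residual echo can be sketched as follows. This is a sketch under stated assumptions rather than the application's exact procedure: formula (1) is taken to be the per-frequency-point ratio R_EY / R_YY, formulas (4) and (5) are taken to be first-order recursive smoothing of the spectral energies, and the square in formula (6) is read as a squared magnitude. The function names and default coefficients are hypothetical.

```python
def smooth(previous, current, coeff):
    """First-order recursive smoothing: (1 - coeff) * previous + coeff * current."""
    return (1.0 - coeff) * previous + coeff * current


def leakage_per_bin(echo_spectra, error_spectra, beta=0.1, gamma=0.3):
    """Track P_Y, P_E, R_EY and R_YY per frequency point; return R_EY / R_YY."""
    bins = len(echo_spectra[0])
    p_y = [0.0] * bins
    p_e = [0.0] * bins
    r_ey = [0.0] * bins
    r_yy = [0.0] * bins
    for echo_frame, error_frame in zip(echo_spectra, error_spectra):
        for k in range(bins):
            p_y[k] = smooth(p_y[k], abs(echo_frame[k]) ** 2, gamma)   # assumed form of (4)
            p_e[k] = smooth(p_e[k], abs(error_frame[k]) ** 2, gamma)  # assumed form of (5)
            r_ey[k] = smooth(r_ey[k], p_y[k] * p_e[k], beta)          # formula (2)
            r_yy[k] = smooth(r_yy[k], p_y[k] ** 2, beta)              # formula (3)
    # Assumed form of (1): ratio of cross-correlation to autocorrelation.
    return [r_ey[k] / r_yy[k] if r_yy[k] > 0.0 else 0.0 for k in range(bins)]


def early_residual_echo(leakage, near_end_bin, echo_estimate_bin):
    """Formula (6), reading the square as a squared magnitude (assumption)."""
    return leakage * abs(near_end_bin - echo_estimate_bin) ** 2


# Ten identical frames, two frequency points: the error carries 1/4 of the echo energy.
echo_spectra = [[2.0 + 0.0j, 4.0 + 0.0j]] * 10
error_spectra = [[1.0 + 0.0j, 2.0 + 0.0j]] * 10
eta = leakage_per_bin(echo_spectra, error_spectra)
```

With these synthetic inputs the ratio settles at 0.25 for both frequency points, matching the intuition that a larger leakage parameter indicates more residual echo after linear processing.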

The reverberation time (RT60) is the time, in seconds, taken for the sound field to decay by 60 dB, and is generally written T60 in formulas. The larger the RT60, the longer the sound generated at the sound production site takes to die away. In practical applications, the value at the mid frequency of 1000 Hz is commonly quoted, so the mid-frequency empty-room reverberation time is generally used to characterize the reverberation of a room. For example, saying that the reverberation time of a cinema is 0.8 s means that its mid-frequency empty-room reverberation time is 0.8 s. The reverberation time RT60 of the sound production site is determined by the volume of the site.
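The application does not state how the reverberation time is derived from the room volume. One common approximation, used here purely as an assumed stand-in, is Sabine's formula RT60 = 0.161 · V / A, where V is the room volume in cubic metres and A is the total absorption in square-metre sabins. The function name and the example room are hypothetical.

```python
def sabine_rt60(volume_m3, absorption_m2):
    """Sabine's estimate of reverberation time in seconds (metric units)."""
    if volume_m3 <= 0.0 or absorption_m2 <= 0.0:
        raise ValueError("volume and absorption must be positive")
    return 0.161 * volume_m3 / absorption_m2


# Hypothetical room: 100 cubic metres with 20 square-metre sabins of absorption.
rt60 = sabine_rt60(100.0, 20.0)
```

For a fixed amount of absorption, a larger volume gives a longer estimated reverberation time, which is consistent with the statement that RT60 depends on the volume of the sound production site.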

Specifically, the reverberation spectral variance of the late residual echo is given by the following formula (7):

where the result of formula (7) is the estimate of the reverberation spectral variance of the late residual echo, N_e denotes the late reverberation obtained from the reverberation time, R denotes the number of effective FFT points between two adjacent frames, and f_s denotes the sampling rate. The audio is sampled at the preset sampling rate to obtain a digital signal, the digital signal is processed by FFT, and an FFT over N sampling points yields an N-point FFT result. Note that the spectral variance of the late residual echo represents the square of the late residual echo, so the late residual echo is obtained by taking the square root of the spectral variance given by formula (7).

Here ρ(k) denotes the attenuation rate of the k-th frequency point of the audio signal and is calculated by the following formula (8):

where T60(k) denotes the reverberation time and k denotes the frequency point.

In formula (7), the estimate of the initial energy of the late residual echo at the k-th frequency point of the l-th frame is calculated by the following formula (9):

In formula (9), α(k) denotes the attenuation factor of the k-th frequency point; e denotes Euler's number; N_W denotes the length of the analysis window; R denotes the number of effective FFT points between two adjacent frames; N_e denotes the late reverberation; f_s denotes the sampling rate; and ρ(k) denotes the attenuation rate of the k-th frequency point of the audio signal. In j(lR), j denotes the imaginary unit, l denotes the l-th frame, and R denotes the number of effective FFT points between two adjacent frames. The expression inside the absolute-value signs is the frequency-domain representation of the estimated late reverberation part of the impulse response, i.e. an estimate of the spectral energy of the late impulse response.

For example, assume that the actual impulse response of the current room lasts 128 ms. At a sampling rate of 16 kHz, the impulse response is then 2048 points long; if the late reverberation N_e is taken as 512, points 513-1024 represent the late reverberation part of the impulse response.
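The roles of the attenuation rate ρ(k) and the attenuation factor α(k) can be illustrated with a short sketch. Because formula (8) and the expression for α(k) are not reproduced in the text, the sketch assumes the widely used exponential decay model, in which ρ = 3·ln(10)/T60 and α = exp(-2·ρ·R/f_s); both relations, the function names, and the example values are assumptions.

```python
import math


def decay_rate(t60_seconds):
    """Assumed form of (8): rho = 3 ln(10) / T60, so energy drops 60 dB after T60."""
    return 3.0 * math.log(10.0) / t60_seconds


def attenuation_factor(rho, hop_size, sample_rate):
    """Assumed form of alpha(k): energy decay across one frame hop of R samples."""
    return math.exp(-2.0 * rho * hop_size / sample_rate)


rho = decay_rate(0.5)                         # T60 = 0.5 s
alpha = attenuation_factor(rho, 128, 16000)   # R = 128 samples, f_s = 16 kHz
frames_in_t60 = 0.5 * 16000 / 128             # number of frame hops spanning T60
energy_after_t60 = alpha ** frames_in_t60     # energy ratio after T60 seconds
```

Under these assumptions, applying α once per frame hop for the duration of T60 reduces the energy by a factor of 10^-6, that is, by 60 dB, which matches the definition of the reverberation time.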

The resulting formula for the spectral variance of the late residual echo is the following formula (10):

where the result of formula (10) is the estimate of the spectral variance of the late residual echo; η_x, with 0 ≤ η_x ≤ 1, denotes a third smoothing coefficient; the formula also contains the spectral variance of the estimated echo signal in the previous frame; |X(l,k)|² denotes the spectral energy of the estimated echo signal; l denotes the l-th frame of the estimated echo signal; and k denotes the k-th frequency point. Note that the spectral variance of the late residual echo is an estimate of the square of the late residual echo signal, so taking its square root yields the late residual echo signal E_r(l,k).

Specifically, in the echo suppression method provided by the present application, the overall residual signal includes a late reverberation signal.

Reverberation arises because sound waves propagating indoors are reflected by obstacles such as walls, ceilings, and floors, and part of the sound is absorbed at each reflection. When the sound source stops emitting, the sound wave disappears only after being reflected and absorbed many times indoors, so for a period of time after the source stops, several reflected sound waves remain mixed together. This phenomenon is called reverberation, and that period is called the reverberation time. Because early reverberation improves sound quality, late reverberation in the present application refers to the reverberation arriving more than 60 ms after the sound is emitted, and it needs to be suppressed.

In particular, the posterior signal-to-noise ratio γ_SP(l,k) is calculated by the following formula (11), and the prior signal-to-noise ratio ξ_SP(l,k) is calculated by the following formula (12):

where, in formula (11), |E(l,k)|² denotes the spectral energy of the near-end signal in the frequency domain, and the other term denotes the sum of the spectral variances of the late residual echo and the noise, i.e. the total noise energy.

In formula (12), λ_z(l,k) denotes the ideal signal, i.e. the desired near-end signal after noise, reverberation, and residual echo have been removed.
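The two signal-to-noise ratios can be illustrated with a small sketch. Since formulas (11) and (12) are not reproduced in the text, the sketch assumes the conventional definitions: the posterior ratio divides the observed spectral energy by the total noise energy, and the prior ratio divides the desired-signal variance by the same noise energy. The Wiener-type gain ξ/(1+ξ) is shown only as a simple, assumed stand-in for a minimum mean square error gain; it is not claimed to be the application's formula. All function names and the `floor` parameter are hypothetical.

```python
def posterior_snr(error_energy, noise_variance, floor=1e-12):
    """Observed spectral energy |E(l,k)|^2 divided by the total noise energy."""
    return error_energy / max(noise_variance, floor)


def prior_snr(desired_variance, noise_variance, floor=1e-12):
    """Variance of the desired signal divided by the total noise energy."""
    return desired_variance / max(noise_variance, floor)


def wiener_gain(xi):
    """Wiener-type gain xi / (1 + xi), an assumed stand-in for an MMSE gain."""
    return xi / (1.0 + xi)


# One frequency point: observed energy 4, total noise energy 1, desired variance 3.
gamma_sp = posterior_snr(4.0, 1.0)
xi_sp = prior_snr(3.0, 1.0)
gain = wiener_gain(xi_sp)
```

The gain approaches 1 when the desired signal dominates and approaches 0 when the noise dominates, which is the qualitative behaviour expected of a suppression gain derived from these ratios.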

With the posterior and prior signal-to-noise ratios obtained from the above formulas, a first parameter G_SP(l,k) is obtained by a minimum mean square error algorithm, as shown in the following formula (13):

The variance of the reverberated speech signal Z(l,k) is calculated as shown in formula (14):

where the result of formula (14) is the estimate of the variance of the reverberated speech signal Z(l,k); η_z, with 0 ≤ η_z ≤ 1, denotes a fourth smoothing coefficient; and E(l,k) denotes the frequency-domain representation of the error signal e(n).

Note that the reverberated speech signal Z(l,k) is an audio signal containing both the user speech to be retained and the noise signal to be removed. It consists of two parts, Z_e(l,k) and Z_r(l,k): Z_e(l,k) denotes the early speech signal, which is the user's speech that needs to be preserved, and Z_r(l,k) denotes the late reverberation signal, which is the part of the reverberated speech signal that needs to be eliminated.

Finally, the corresponding late reverberation signal is obtained through the late reverberation signal spectrum variance calculation of the following formula (16), wherein N is_rThe method is used for representing the time distance of arrival of the sound direct wave, R is used for representing the effective length of a frame of data, and the specific formula (15) and the formula (16) are as follows:

wherein α (k) is calculated as follows:

wherein e is Euler's number, ρ(k) represents the attenuation rate of the k-th frequency bin of the audio signal, R represents the number of FFT points, f_s represents the sampling rate, and α(k) represents the attenuation factor of the k-th frequency bin. In addition, formula (15) involves an estimate of the spectral variance of the reverberant speech signal Z(l, k), and formula (16) involves an estimate of the spectral variance of the late reverberation signal; taking the square root of the latter estimate yields the late reverberation signal.
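Formulas (15) and (16) and the expression for α(k) are not reproduced in this text. The sketch below is therefore only an assumption about their general shape, based on the symbols described: it takes α(k) = e^(-2ρ(k)R/f_s), and it models the late reverberation variance as the reverberant speech variance from N_r frames earlier scaled by α(k)^N_r. The function names, the relation between ρ and a reverberation time, and all numeric values are hypothetical.

```python
import math


def decay_factor(rho_k, frame_len, sample_rate):
    """ASSUMED form of the attenuation factor: alpha(k) = e^(-2*rho(k)*R/f_s)."""
    return math.exp(-2.0 * rho_k * frame_len / sample_rate)


def late_reverb_magnitude(reverb_var_history, alpha_k, n_r):
    """Late reverberation magnitude for one frequency bin.

    ASSUMPTION: the late variance equals the reverberant speech variance
    from n_r frames earlier, scaled by alpha(k) ** n_r.
    reverb_var_history[-1] is the current frame.
    """
    if len(reverb_var_history) <= n_r:
        return 0.0  # not enough history yet
    delayed = reverb_var_history[-1 - n_r]
    late_var = (alpha_k ** n_r) * delayed
    return math.sqrt(late_var)  # square root of the variance estimate


# Hypothetical numbers: a 0.5 s reverberation time gives rho = 3*ln(10)/0.5.
rho = 3.0 * math.log(10.0) / 0.5
alpha = decay_factor(rho, frame_len=256, sample_rate=16000)
history = [4.0, 3.0, 2.0, 1.0]  # variance of Z(l, k) over four frames
magnitude = late_reverb_magnitude(history, alpha, n_r=2)
```

Under these assumptions a larger attenuation rate ρ(k) gives a smaller α(k), so less of the earlier speech energy is attributed to late reverberation.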

In the echo suppression method provided by the application, the late reverberation signal is estimated in addition to the residual echo signals, so that the suppression effect on the residual echo is further improved.

Through the foregoing, the present application obtains a background noise signal, an early residual echo signal, a late residual echo signal, and a late reverberation signal, and sums the background noise signal, the early residual echo signal, the late residual echo signal, and the late reverberation signal to obtain the overall residual signal E_Total(l, k). Specifically, the error signal is obtained according to equation (17-1), and the overall residual signal is obtained according to equation (17-2):

E(l,k) = Z_e(l,k) + Z_r(l,k) + E_res(l,k) + E_r(l,k) + V(l,k) (17-1)

E_Total(l,k) = Z_r(l,k) + E_res(l,k) + E_r(l,k) + V(l,k) (17-2)

In formula (17-1), E(l, k) represents the error signal output after linear processing, Z_e(l, k) represents the early speech signal, Z_r(l, k) represents the late reverberation signal, E_res(l, k) represents the early residual echo signal, E_r(l, k) represents the late residual echo signal, and V(l, k) represents the background noise signal. In practical applications, the audio signal to be preserved is the early speech signal Z_e(l, k), and the audio signals to be suppressed are the late reverberation signal Z_r(l, k), the early residual echo signal E_res(l, k), the late residual echo signal E_r(l, k), and the background noise signal V(l, k).
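The relationship between equations (17-1) and (17-2) can be checked numerically: subtracting the overall residual signal from the error signal leaves only the early speech signal Z_e(l, k). The following minimal sketch uses made-up two-bin spectra; the function name `overall_residual` and all sample values are hypothetical.

```python
def overall_residual(z_r, e_res, e_r, v):
    """E_Total(l, k) = Z_r + E_res + E_r + V per bin, as in equation (17-2)."""
    return [a + b + c + d for a, b, c, d in zip(z_r, e_res, e_r, v)]


# Made-up spectra for one frame with two frequency bins.
z_e = [1.0 + 0.5j, 0.8 - 0.2j]      # early speech (to be preserved)
z_r = [0.1 + 0.0j, 0.05 + 0.02j]    # late reverberation
e_res = [0.2 - 0.1j, 0.0 + 0.1j]    # early residual echo
e_r = [0.05 + 0.05j, 0.01 + 0.0j]   # late residual echo
v = [0.02 + 0.0j, 0.02 - 0.01j]     # background noise

e_total = overall_residual(z_r, e_res, e_r, v)
# Equation (17-1): the error signal is the early speech plus the residual.
error = [s + t for s, t in zip(z_e, e_total)]
```

In practice the individual components are not observed directly, which is why the method estimates each of them before forming the sum.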

S103: determining a noise suppression gain corresponding to the overall residual signal.

The decibel (dB) is the unit of amplifier gain. The ratio of an amplifier's output to its input is its amplification factor, expressed as a multiple, for example a 10-times amplifier or a 100-times amplifier. When expressed in decibels, the amplification factor is called gain. In the present application, the noise suppression gain (Gain) is the attenuation coefficient applied to the noise, with 0 < Gain < 1.
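As a small worked example of the relation between a multiple and decibels, the sketch below converts a linear amplitude ratio to dB using 20·log10. It assumes the ratio is an amplitude ratio (for a power ratio the factor would be 10); the function name is hypothetical.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to decibels (20 * log10)."""
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    return 20.0 * math.log10(ratio)


amplifier_db = amplitude_ratio_to_db(10.0)    # a 10-times amplifier
suppression_db = amplitude_ratio_to_db(0.5)   # a suppression gain of 0.5
```

Any suppression gain in the range 0 < Gain < 1 maps to a negative decibel value, that is, to attenuation.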

Optionally, in the echo suppression method provided by the present application, determining a noise suppression gain corresponding to the overall residual signal includes:

Specifically, the noise suppression gain is obtained by applying a minimum mean square error algorithm to the overall residual signal.

In practical applications, G is generally used to represent the noise suppression Gain. The specific calculation process of the noise suppression Gain is shown in the following formula (18):

In equation (18), ξ_SP(l, k) represents the prior signal-to-noise ratio, which can be obtained according to equation (12); exp denotes the exponential function, and the integral is the exponential integral. The variable v_k can be obtained by the following equation (19):

In formulas (18) and (19), v_k is an intermediate variable without actual physical significance, used only to link equation (18) with equation (19); γ_SP(l, k) in equation (19) is the posterior signal-to-noise ratio, which can be obtained according to equation (11).
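Equations (18) and (19) are not reproduced in this text. Their description (a prior SNR, a posterior SNR, an exponential, an exponential integral, and an intermediate variable v_k) matches the general shape of the well-known log-spectral amplitude MMSE gain, so the sketch below assumes that form: v = ξ/(1+ξ)·γ and G = ξ/(1+ξ)·exp(½·E1(v)), where E1 is the exponential integral. This is an assumption, not the application's confirmed formula. The function names, the pure-Python evaluation of E1, and the cap at 1 (added to stay within the stated gain range) are all hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x, depth=80):
    """Exponential integral E1(x) = integral from x to infinity of e^-t / t."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: -gamma - ln(x) - sum_{n>=1} (-x)^n / (n * n!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for n in range(1, 40):
            term *= -x / n          # term == (-x)^n / n!
            total -= term / n
        return total
    # Continued fraction, evaluated from the innermost level outwards.
    f = 0.0
    for n in range(depth, 0, -1):
        f = n / (1.0 + n / (x + f))
    return math.exp(-x) / (x + f)


def lsa_gain(xi, gamma):
    """ASSUMED log-spectral amplitude MMSE gain for one frequency bin.

    xi is the prior SNR, gamma is the posterior SNR.
    """
    v = max(xi / (1.0 + xi) * gamma, 1e-10)   # intermediate variable v_k
    gain = xi / (1.0 + xi) * math.exp(0.5 * exp1(v))
    return min(gain, 1.0)                     # hypothetical cap at 1


low_snr_gain = lsa_gain(0.1, 1.1)
high_snr_gain = lsa_gain(10.0, 11.0)
```

With these assumptions a bin dominated by the residual (small ξ) receives a small gain, while a bin dominated by speech (large ξ) receives a gain close to 1.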

S104: applying the noise suppression gain to the error signal to obtain a target signal.

As can be seen from the foregoing, in the echo suppression method provided in the present application, linear processing yields an error signal from which part of the echo signal has been cancelled, but a residual echo signal that has not been completely cancelled remains in the error signal. The present application therefore obtains the overall residual signal present in the error signal and determines the noise suppression gain corresponding to it. Fig. 2 shows the overall processing flow of the echo suppression method, where, in fig. 2, a is the near-end signal, e(n) is the error signal, Total is the overall residual signal, Gain is the noise suppression gain, and Target is the target signal. In practical applications, the near-end signal a may be obtained by an audio acquisition device; an adaptive filter performs linear processing on the near-end signal a to obtain the error signal e(n); post-processing of e(n) yields the overall residual signal Total; Total is processed by a minimum mean square error algorithm to obtain the noise suppression gain Gain; and Gain is applied to e(n) to obtain the target signal Target, which is the user speech signal with the various noises removed, that is: Gain × e(n) = Target.

In the echo suppression method provided by the application, the obtained noise suppression gain is applied to the error signal to obtain a target signal in which the residual echo is further suppressed, which improves the user experience.

Specifically, the noise suppression gain may be applied to the error signal by multiplying the error signal by the obtained noise suppression gain. Since the noise suppression gain is smaller than 1, the multiplication reduces the overall residual signal contained in the error signal, and the residual echo is thereby further suppressed.
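The multiplication described above can be sketched per frequency bin as follows. The sketch assumes that one real-valued gain is available for each bin of the error spectrum; the function name `apply_gain` and the sample values are hypothetical.

```python
def apply_gain(error_bins, gains):
    """Multiply each bin of the error spectrum by its suppression gain."""
    if len(error_bins) != len(gains):
        raise ValueError("one gain per frequency bin is required")
    return [g * e for g, e in zip(gains, error_bins)]


error = [1.0 + 1.0j, 0.2 - 0.4j, 0.0 + 0.5j]   # E(l, k) for one frame
gains = [0.9, 0.25, 0.1]                        # 0 < Gain < 1 for every bin
target = apply_gain(error, gains)
```

Because each gain is real and positive, the phase of every bin is unchanged and only its magnitude is reduced.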

Through steps S101 to S104, the present application provides an echo suppression method. First, linear processing is performed on the near-end signal acquired by the microphone to obtain an error signal; this removes the obvious part of the echo signal, yields a coarsely processed error signal, and provides a basis for the subsequent further echo suppression. Next, the overall residual signal contained in the error signal is obtained and the corresponding noise suppression gain is determined, which captures the residual noise remaining in the error signal. Finally, the noise suppression gain is applied to the error signal to obtain the target signal, so that the residual echo is suppressed on top of the linearly processed error signal and the user experience is improved.

This embodiment provides an echo suppression device. Fig. 3 shows a schematic structural diagram of the echo suppression device provided in the present application, which includes the following modules:

a linear processing module 301 configured to perform linear processing on a near-end signal collected by an audio input device to obtain an error signal;

a residual noise obtaining module 302 configured to obtain an overall residual signal included in the error signal, wherein the overall residual signal includes at least one of a background noise signal, an early residual echo signal, a late residual echo signal, and a late reverberation signal;

a suppression gain determination module 303 configured to determine a noise suppression gain corresponding to the overall residual signal;

an action module 304 configured to apply the noise suppression gain to the error signal to obtain a target signal.
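The way the four modules hand data to one another can be sketched as a simple composition. The class below is a hypothetical container, not the application's implementation: the stage functions passed in are stand-ins (a fixed halving for module 301, a constant residual power for module 302, and a plain power-subtraction gain for module 303, which is not the minimum mean square error algorithm of the application).

```python
from dataclasses import dataclass
from typing import Callable, List

Spectrum = List[complex]


@dataclass
class EchoSuppressionDevice:
    """Hypothetical container mirroring modules 301 to 304."""

    linear_processing: Callable[[Spectrum], Spectrum]                    # 301
    residual_obtaining: Callable[[Spectrum], List[float]]                # 302
    gain_determination: Callable[[Spectrum, List[float]], List[float]]   # 303

    def process(self, near_end: Spectrum) -> Spectrum:
        error = self.linear_processing(near_end)
        residual = self.residual_obtaining(error)
        gains = self.gain_determination(error, residual)
        # Module 304: apply the gain to the error signal.
        return [g * e for g, e in zip(gains, error)]


def subtraction_gain(error, residual_power):
    """Stand-in gain: 1 - residual power / error power, floored at 0."""
    gains = []
    for e, r in zip(error, residual_power):
        power = abs(e) ** 2
        gains.append(max(0.0, 1.0 - r / power) if power > 0.0 else 0.0)
    return gains


device = EchoSuppressionDevice(
    linear_processing=lambda x: [0.5 * s for s in x],   # stand-in for 301
    residual_obtaining=lambda e: [0.1] * len(e),        # stand-in for 302
    gain_determination=subtraction_gain,                # stand-in for 303
)
target = device.process([2.0 + 0j, 0.4 + 0j])
```

Keeping each module behind a callable makes it possible to swap one stage, for example the gain rule, without touching the others.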

Optionally, the linear processing module 301 is further configured to:

Optionally, the overall residual signal comprises a background noise signal;

optionally, the residual noise obtaining module 302 is further configured to:

Optionally, the overall residual signal comprises an early residual echo signal;

optionally, the residual noise obtaining module 302 is further configured to:

Optionally, the overall residual signal comprises a late residual echo signal;

optionally, the residual noise obtaining module 302 is further configured to:

Optionally, the overall residual signal comprises a late reverberation signal;

optionally, the residual noise obtaining module 302 is further configured to:

Optionally, the suppression gain determination module 303 is further configured to:

The application provides an echo suppression device. First, linear processing is performed on the near-end signal collected by the audio input device to obtain an error signal; this removes the obvious part of the echo signal, yields a coarsely processed error signal, and provides a foundation for the subsequent further echo suppression. Next, the overall residual signal contained in the error signal is obtained and the corresponding noise suppression gain is determined, which captures the residual noise signal remaining in the error signal. Finally, the noise suppression gain is applied to the error signal to obtain the target signal, so that the residual echo is suppressed on top of the linearly processed error signal and the user experience is improved.

The above is a schematic scheme of an echo suppression device of the present embodiment. It should be noted that the technical solution of the apparatus belongs to the same concept as the technical solution of the echo suppression method, and details of the technical solution of the echo suppression apparatus, which are not described in detail, can be referred to the description of the technical solution of the echo suppression method. The specific contents included in the echo suppression method have been provided in the foregoing embodiments, and are not described herein again.

An embodiment of the present application provides a computing device 400. Fig. 4 is a block diagram illustrating the structure of the computing device 400 according to an embodiment of the present specification. The components of the computing device 400 include, but are not limited to, a memory 410 and a processor 420. The processor 420 is coupled to the memory 410 via a bus 430, and a database 450 is used to store data.

Computing device 400 also includes access device 440, access device 440 enabling computing device 400 to communicate via one or more networks 460. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 440 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 400, as well as other components not shown in FIG. 4, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 4 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 400 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 400 may also be a mobile or stationary server.

The processor 420 may execute the steps of the echo suppression method provided by the foregoing embodiments. The specific steps are not described again in this embodiment.

An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the foregoing echo suppression method.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the echo suppression method described above, and for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the echo suppression method described above.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims

1. An echo suppression method, comprising:

2. The echo suppression method according to claim 1, wherein the linear processing of the near-end signal collected by the audio input device to obtain the error signal comprises:

3. The method of echo suppression according to claim 1, wherein said overall residual signal comprises a background noise signal, and obtaining an overall residual signal comprised by said error signal comprises:

4. The echo suppression method according to claim 1, wherein the overall residual signal comprises an early residual echo signal, and obtaining the overall residual signal comprised by the error signal comprises:

5. The echo suppression method according to claim 1, wherein the overall residual signal comprises a late residual echo signal, and obtaining the overall residual signal contained in the error signal comprises:

6. The echo suppression method of claim 1, wherein the overall residual signal comprises a late reverberation signal, and obtaining the overall residual signal contained in the error signal comprises:

7. The method of echo suppression according to claim 1, wherein determining a noise suppression gain for the overall residual signal comprises:

8. An echo suppression device, comprising:

9. A computing device, comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method of:

the steps of the echo suppression method according to any one of claims 1 to 7.

10. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the echo suppression method according to any one of claims 1 to 7.