CN111225317A - Echo cancellation method

Info

Publication number: CN111225317A
Application number: CN202010053652.7A
Authority: CN
Inventors: 王前慧; 王平; 邓小红; 李俊潇
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2020-06-02
Anticipated expiration: 2040-01-17
Also published as: CN111225317B

Abstract

The invention relates to the technical field of voice processing and discloses an echo cancellation method that addresses the failure of conventional echo cancellation to achieve a satisfactory result in scenes with a strong far-end signal. The method comprises the following steps: a. collecting a far-end signal and a near-end signal; b. judging, from corresponding parameters of the far-end signal and the near-end signal, whether the far-end signal is larger than the speaker's voice in the near-end signal; c. when the far-end signal is larger than the speaker's voice in the near-end signal, performing automatic gain processing on the near-end signal; d. performing decorrelation processing on the near-end signal; e. performing pre-emphasis and DC removal on the near-end signal and the far-end signal; f. obtaining an echo estimate of the far-end signal through a double filter, removing the echo from the near-end signal, and adjusting the update of the double-filter coefficients according to the results of the previous and the current filtering; g. removing residual echo by using the correlation among the far-end signal, the near-end signal and the output signal of the filter to obtain the final output signal.

Description

Echo cancellation method

Technical Field

The invention relates to the technical field of voice processing, in particular to an echo cancellation method.

Background

With the advent of the artificial intelligence era, voice technology has become an important interface for human-computer interaction. In particular, with the continuous development of Internet of Things technology, people want to control intelligent devices by voice over longer distances and in more complex environments. Traditional near-field voice interaction can no longer meet these requirements, and microphone array technology has become the core of far-field interaction.

For today's complex application scenes, a series of key technologies that effectively improve the speech recognition rate have been developed on the basis of microphone arrays, mainly: speech enhancement, sound source localization, reverberation cancellation, echo cancellation, noise suppression, etc.

Echo cancellation mainly uses techniques such as adaptive signal processing to cancel the interference of background sound. The basic principle is shown in fig. 1: the signal collected by the microphone contains the voice of the near-end speaker together with the echo of the far-end signal played by the loudspeaker. An adaptive filter outputs an estimate of the echo signal, echo cancellation is performed by subtracting this estimate from the signal collected by the microphone, and the output is fed back to the adaptive filter for coefficient updating, which improves the accuracy of the echo estimate. Existing echo cancellation algorithms work well when the loudspeaker signal is not strong. When the loudspeaker is loud, however, the loudspeaker signal masks the desired speech in the signal collected by the microphone, the speech is hard to make out by ear, and the echo cancellation effect is unsatisfactory.
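The feedback loop described above can be illustrated with a short sketch. It uses a normalized LMS (NLMS) update, which is one common adaptive filter and is an assumption here, since the text does not name a specific algorithm. The function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are hypothetical.

```python
import numpy as np

def nlms_echo_cancel(far, mic, taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `far` from `mic`.

    NLMS is an assumed choice of adaptive filter; the source only says
    "adaptive filter". Returns the echo-cancelled output and the final
    filter coefficients.
    """
    far = np.asarray(far, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w = np.zeros(taps)      # adaptive filter coefficients
    buf = np.zeros(taps)    # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        echo_est = w @ buf              # echo signal estimate
        e = mic[n] - echo_est           # microphone signal minus echo estimate
        # The output is fed back to update the coefficients (normalized step).
        w += mu * e * buf / (buf @ buf + eps)
        out[n] = e
    return out, w
```

With a noiseless simulated echo path the residual decays towards zero; with a loud loudspeaker and simultaneous near-end speech the update is disturbed, which is the situation the invention targets.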

Disclosure of Invention

The technical problem to be solved by the invention is as follows: an echo cancellation method is provided to solve the problem that the echo cancellation technology in the traditional technology cannot obtain an ideal echo cancellation effect in a scene with a strong far-end signal.

The technical scheme adopted by the invention for solving the technical problems is as follows:

an echo cancellation method, comprising the steps of:

a. collecting a far-end signal and a near-end signal;

b. judging whether the far-end signal is larger than the speaker voice in the near-end signal or not according to the corresponding parameters of the far-end signal and the near-end signal;

c. when the far-end signal is larger than the voice of the speaker in the near-end signal, performing automatic gain processing on the near-end signal;

d. performing decorrelation processing on the near-end signal;

e. performing pre-emphasis and DC removal on the near-end signal and the far-end signal;

f. obtaining an echo estimate of the far-end signal through a double filter, then removing the echo from the near-end signal, and adjusting the update of the double-filter coefficients according to the results of the previous and the current filtering;

g. removing residual echo by using the correlation among the far-end signal, the near-end signal and the output signal of the filter to obtain the final output signal.

As a further optimization, in step b, the determining whether the far-end signal is greater than the speaker voice in the near-end signal according to the corresponding parameters of the far-end signal and the near-end signal specifically includes:

respectively calculating the energy and power spectrum parameters of the far-end signal and the near-end signal, and calculating the cross correlation between the far-end signal and the near-end signal through the power spectrum parameters;

and when the energy ratio of the far-end signal to the near-end signal is greater than a preset energy ratio threshold value and the cross correlation between the far-end signal and the near-end signal is greater than a set threshold value, judging that the far-end signal is greater than the voice of the speaker in the near-end signal.

As a further optimization, the preset energy ratio threshold is 0.5, and the set threshold of the cross correlation is 0.9.

The invention has the beneficial effects that:

Energy detection and cross-correlation detection of the near-end and far-end signals are added before conventional echo cancellation. When the far-end signal is judged to be stronger than the speaker's voice in the near-end signal, automatic gain processing and decorrelation processing are applied to the near-end signal; this raises the level of the useful speaker voice and removes part of the far-end interference, which facilitates echo cancellation and yields a cleaner speaker voice signal. In addition, a correlation-based residual echo removal step is added after echo cancellation, so that a relatively pure voice signal is finally obtained and a better echo cancellation effect is achieved.

Drawings

FIG. 1 is a schematic diagram of echo cancellation;

FIG. 2 is a flow chart of an echo cancellation method according to the present invention.

Detailed Description

The invention aims to provide an echo cancellation method that solves the problem that conventional echo cancellation cannot achieve a satisfactory result when the far-end signal is larger than the speaker's voice in the near-end signal. The core idea is as follows. First, a far-end signal (the signal played by the loudspeaker) and a near-end signal (the desired signal plus the echo signal) are acquired, and energy detection and power-spectrum correlation detection are performed on them; the results serve as the discrimination condition. When the far-end signal is judged to be stronger than the speaker's voice, automatic gain is applied to the near-end signal. Meanwhile, when the far-end signal is strong, the correlation between the near-end signal and the far-end signal is large, so the near-end signal must first be decorrelated. Then an echo estimate is obtained with the adaptive filter and the echo signal is removed, while the update of the filter coefficients is adjusted according to the results of the previous and the current filtering. Finally, residual echo is removed by using the correlation among the far-end signal, the near-end signal and the output signal of the filter to obtain the final output signal.

In a specific implementation, as shown in fig. 2, the echo cancellation method in the present invention includes the following steps:

(1) acquiring a far-end signal and a near-end signal:

in this step, the far-end signal collected by the microphone array is a sound signal played by a loudspeaker, and the collected near-end signal comprises a required voice signal and an echo signal;

(2) judging whether the far-end signal is larger than the speaker voice signal in the near-end signal according to the corresponding parameters of the far-end signal and the near-end signal:

In this step, parameters such as the energy and the power spectrum of the far-end signal and the near-end signal are calculated as the basis for the discrimination.

The energy is obtained by squaring the time-domain samples of the far-end signal and the near-end signal; the spectra are obtained by windowing the two signals and applying a fast Fourier transform.

$$W_x = |x(n)|^2 \tag{1}$$

$$W_d = |d(n)|^2 \tag{2}$$

$$x_f = \mathrm{fft}(x(n) \cdot \mathrm{win}) \tag{3}$$

$$d_f = \mathrm{fft}(d(n) \cdot \mathrm{win}) \tag{4}$$

In the formulas, x(n) and d(n) denote the near-end and far-end signals, W_x and W_d denote the time-domain energies of the near-end and far-end signals, x_f and d_f denote the near-end and far-end spectra, and win denotes the Hanning window function.
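Formulas (1) to (4) can be sketched in NumPy as follows. This is an illustration under stated assumptions: the energy is summed over one frame (the formulas give only the per-sample square), and `np.fft.rfft` is used in place of a full FFT because the input is real. The function name `frame_energy_and_spectrum` is hypothetical.

```python
import numpy as np

def frame_energy_and_spectrum(frame):
    """Return (time-domain energy, windowed spectrum) of one signal frame."""
    frame = np.asarray(frame, dtype=float)
    # Formulas (1)/(2): squared magnitude, summed over the frame (assumption).
    energy = float(np.sum(np.abs(frame) ** 2))
    # Formulas (3)/(4): Hanning window followed by an FFT.
    win = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * win)   # one-sided spectrum of a real frame
    return energy, spectrum
```

The same function would be applied to the near-end frame x(n) and to the far-end frame d(n) to obtain W_x, x_f and W_d, d_f respectively.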

A preferred way of comparing the far-end signal with the speaker's voice in the near-end signal is as follows:

Energy detection: the energies of the far-end signal and the near-end signal are calculated, and their ratio is used as a reference value. The ratio of the far-end energy to the near-end energy is close to 1 when the far-end signal is large and the speaker's voice is small, and close to 0 when the far-end signal is small and the speaker's voice is large.

k_W denotes the ratio of the time-domain energies.
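As a rough illustration of the energy detection, the ratio k_W can be computed per frame. The exact formula for k_W is not reproduced in the text, so the form used below (far-end frame energy divided by near-end frame energy) is an assumption based on the description; `energy_ratio` and `eps` are hypothetical names.

```python
import numpy as np

def energy_ratio(d_frame, x_frame, eps=1e-12):
    """Assumed form of k_W: far-end frame energy over near-end frame energy."""
    d = np.asarray(d_frame, dtype=float)   # far-end frame d(n)
    x = np.asarray(x_frame, dtype=float)   # near-end frame x(n)
    W_d = np.sum(np.abs(d) ** 2)
    W_x = np.sum(np.abs(x) ** 2)
    return float(W_d / (W_x + eps))        # eps guards against a silent frame
```

For example, `energy_ratio([1, 1], [2, 2])` is approximately 0.25, since the far-end energy is 2 and the near-end energy is 8.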

Correlation detection: the energy criterion is coarse and susceptible to interference, so the correlation between the far-end signal and the near-end signal must also be checked. The power spectra of the far-end and near-end signals are smoothed, and their cross-power spectrum is then computed, from which the correlation between the two signals is obtained. The larger the correlation, the larger the far-end component contained in the near-end signal and the weaker the speaker's voice, and the more the near-end signal needs to be enhanced and decorrelated.

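$$S_d = \mathrm{gama} \cdot S_d + (1 - \mathrm{gama})\, d_f \cdot d'_f \tag{6}$$

$$S_x = \mathrm{gama} \cdot S_x + (1 - \mathrm{gama})\, x_f \cdot x'_f \tag{7}$$

$$S_{xd} = \mathrm{gama} \cdot S_{xd} + (1 - \mathrm{gama})\, x_f \cdot d'_f \tag{8}$$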

S_x and S_d denote the smoothed near-end and far-end power spectra, S_xd denotes the smoothed cross-power spectrum of the near-end and far-end signals, and gama is the smoothing coefficient of the recursion. Cxd denotes the cross-correlation and CMxd the mean value of the cross-correlation; rang denotes the frequency points within the frequency range 300 Hz to 1.8 kHz, and N denotes the total number of frequency points.
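The recursions (6) to (8) can be sketched as below. Two points are assumptions, because the text does not spell them out: the prime is read as complex conjugation, and the cross-correlation Cxd is taken to be the magnitude-squared coherence |S_xd|^2 / (S_x · S_d), averaged over the bins between 300 Hz and 1.8 kHz to give CMxd. The class name `SpectralCorrelation` and the default `gama=0.9` are hypothetical.

```python
import numpy as np

class SpectralCorrelation:
    """Recursively smoothed spectra and an assumed band-averaged coherence."""

    def __init__(self, frame_len, sample_rate, gama=0.9, band=(300.0, 1800.0)):
        n_bins = frame_len // 2 + 1
        self.gama = gama
        self.win = np.hanning(frame_len)
        self.S_x = np.zeros(n_bins)
        self.S_d = np.zeros(n_bins)
        self.S_xd = np.zeros(n_bins, dtype=complex)
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
        self.in_band = (freqs >= band[0]) & (freqs <= band[1])

    def update(self, x_frame, d_frame):
        """Process one near-end/far-end frame pair and return CMxd."""
        x_f = np.fft.rfft(np.asarray(x_frame, dtype=float) * self.win)
        d_f = np.fft.rfft(np.asarray(d_frame, dtype=float) * self.win)
        g = self.gama
        # Formulas (6)-(8), with the prime interpreted as the complex conjugate.
        self.S_d = g * self.S_d + (1 - g) * (d_f * np.conj(d_f)).real
        self.S_x = g * self.S_x + (1 - g) * (x_f * np.conj(x_f)).real
        self.S_xd = g * self.S_xd + (1 - g) * x_f * np.conj(d_f)
        # Assumed definition of Cxd: magnitude-squared coherence per bin.
        C_xd = np.abs(self.S_xd) ** 2 / (self.S_x * self.S_d + 1e-12)
        return float(np.mean(C_xd[self.in_band]))
```

Under this definition the value approaches 1 when the near-end frame is essentially a scaled copy of the far-end frame and stays small when the two are unrelated. A single frame always gives a value near 1, so the estimate is only meaningful after several frames have been smoothed.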

When the energy ratio k_W of the far-end signal to the near-end signal is greater than 0.5 and the mean cross-correlation CMxd between the far-end signal and the near-end signal is greater than 0.9, the far-end signal is judged to be greater than the voice signal of the speaker.

It should be noted that, if the far-end signal is not greater than the voice signal of the speaker, the existing echo processing scheme is directly adopted.
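The discrimination rule and the resulting branch can be written down in a few lines. The thresholds 0.5 and 0.9 come from the text; the function names and the string labels returned by `choose_pipeline` are hypothetical and only illustrate the control flow.

```python
def far_end_dominates(k_W, CM_xd, energy_threshold=0.5, corr_threshold=0.9):
    """True when both the energy ratio and the mean cross-correlation exceed
    their thresholds (0.5 and 0.9 in the text)."""
    return (k_W > energy_threshold) and (CM_xd > corr_threshold)

def choose_pipeline(k_W, CM_xd):
    """Pick the processing branch for the current frame (illustrative labels)."""
    if far_end_dominates(k_W, CM_xd):
        # Automatic gain and decorrelation before echo cancellation.
        return "enhanced"
    # Otherwise the existing echo processing scheme is used directly.
    return "conventional"
```

Both conditions must hold: a high energy ratio with a low correlation, or the reverse, falls through to the conventional branch.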

(3) When the far-end signal is greater than the near-end speaker voice signal, automatic gain is applied to the near-end signal. This raises the level of the speaker's voice, so that when the echo estimate is subtracted from the near-end signal during echo cancellation, the speaker voice contained in the near-end signal can be extracted more effectively.

(4) Meanwhile, when the far-end signal is larger than the speaker's voice in the near-end signal, decorrelation processing is performed on the near-end signal. This removes part of the far-end interference and makes it easier to eliminate the echo signal during echo cancellation.

(5) After the near-end signal is subjected to enhancement and decorrelation processing, the near-end signal and the far-end signal are subjected to pre-emphasis and direct-current removal processing.

(6) And obtaining an echo estimation value of the far-end signal through a double filter, and then removing the echo of the near-end signal. Meanwhile, the updating of the filter coefficient is adjusted through the results of the previous filtering and the current filtering.

(7) And calculating the cross correlation between the far-end signal and the output signal of the filter again, and removing residual echo by using the cross correlation between the far-end signal and the near-end signal and the cross correlation between the far-end signal and the output signal of the filter to obtain a final output signal.
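Step (6) does not detail the structure of the double filter. One possible reading, sketched below purely as an assumption, is the classic two-path arrangement: a background filter adapts continuously, a foreground filter produces the output, and coefficients are copied between them depending on which filter gave the smaller error over the last block. The NLMS update, the block length and the comparison factors 0.5 and 2.0 are all hypothetical choices.

```python
import numpy as np

def two_path_echo_cancel(far, mic, taps=32, mu=0.5, block=64, eps=1e-8):
    """Two-path (foreground/background) echo canceller sketch.

    Assumed interpretation of the "double filter"; not taken from the source.
    """
    far = np.asarray(far, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w_fg = np.zeros(taps)   # foreground filter: produces the output
    w_bg = np.zeros(taps)   # background filter: adapts on every sample
    buf = np.zeros(taps)    # recent far-end samples, newest first
    out = np.zeros(len(mic))
    err_fg = err_bg = 0.0   # error energies accumulated over the current block
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        e_fg = mic[n] - w_fg @ buf
        e_bg = mic[n] - w_bg @ buf
        w_bg += mu * e_bg * buf / (buf @ buf + eps)   # NLMS update (assumed)
        out[n] = e_fg
        err_fg += e_fg ** 2
        err_bg += e_bg ** 2
        if (n + 1) % block == 0:
            if err_bg < 0.5 * err_fg:
                w_fg = w_bg.copy()    # background clearly better: promote it
            elif err_bg > 2.0 * err_fg:
                w_bg = w_fg.copy()    # background diverged: restore it
            err_fg = err_bg = 0.0
    return out, w_fg
```

The design intent of such a scheme is that a misadapted update, for instance during simultaneous near-end speech, never reaches the output path unless it actually lowers the error.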

Compared with the traditional Speex echo cancellation technology, the invention adds energy detection and cross-correlation detection of the near-end and far-end signals before echo cancellation. When the far-end signal is judged to be stronger than the speaker voice signal in the near-end signal, automatic gain processing and decorrelation processing of the near-end signal raise the level of the useful speaker voice and remove part of the far-end interference, which benefits echo cancellation and yields a cleaner speaker voice signal. In addition, a correlation-based residual echo removal step is added after echo cancellation, so that a relatively pure voice signal is finally obtained and a better echo cancellation effect is achieved.
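The pre-processing operations named in the description (automatic gain, DC removal and pre-emphasis) are not specified in detail. The helpers below are a minimal sketch under stated assumptions: the gain normalizes the frame RMS towards a target with an upper limit, DC removal subtracts the mean, and pre-emphasis is the usual first-order difference y(n) = x(n) - a·x(n-1). All function names and parameter values (`target_rms`, `max_gain`, `coeff=0.97`) are hypothetical.

```python
import numpy as np

def apply_gain(frame, target_rms=0.1, max_gain=10.0, eps=1e-12):
    """Scale a frame towards a target RMS level, limited to max_gain (assumed AGC)."""
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))
    gain = min(max_gain, target_rms / (rms + eps))
    return gain * frame

def remove_dc(signal):
    """Remove the DC component by subtracting the mean (assumed method)."""
    signal = np.asarray(signal, dtype=float)
    return signal - np.mean(signal)

def pre_emphasis(signal, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""
    signal = np.asarray(signal, dtype=float)
    out = signal.copy()
    out[1:] -= coeff * signal[:-1]
    return out
```

In the flow of the method, the gain would be applied only to the near-end signal when the far-end signal dominates, whereas DC removal and pre-emphasis would be applied to both the near-end and the far-end signal.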

Claims

1. An echo cancellation method, comprising the steps of:

a. collecting a far-end signal and a near-end signal;

d. performing decorrelation processing on the near-end signal;

2. The echo cancellation method of claim 1,

in step b, the determining whether the far-end signal is greater than the speaker voice in the near-end signal according to the corresponding parameters of the far-end signal and the near-end signal specifically includes:

3. The echo cancellation method of claim 2,

the preset energy ratio threshold is 0.5, and the set threshold of the cross correlation is 0.9.