CN111883155A

CN111883155A - Echo cancellation method, device and storage medium

Info

Publication number: CN111883155A
Application number: CN202010700907.4A
Authority: CN
Inventors: 马路; 赵培; 苏腾荣
Original assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Current assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date: 2020-07-17
Filing date: 2020-07-17
Publication date: 2020-11-03
Anticipated expiration: 2040-07-17
Also published as: CN111883155B

Abstract

The invention provides an echo cancellation method, an echo cancellation device, and a storage medium. The method comprises: obtaining a predicted echo signal from a far-end signal through a nonlinear filter, wherein the nonlinear filter is constructed based on a neural network and, while the nonlinear filter performs forward calculation on the far-end signal based on the neural network, the weighted summation result of each node in the forward calculation is subjected to nonlinear processing based on a nonlinear function; and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal. The invention addresses the poor nonlinear echo suppression and high processing complexity of schemes that implement echo cancellation by combining linear filtering with nonlinear processing, improves the estimation accuracy of nonlinear echo, and thereby improves the echo cancellation effect.

Description

Echo cancellation method, device and storage medium

Technical Field

The invention relates to the field of artificial intelligence, in particular to an echo cancellation method, an echo cancellation device and a storage medium.

Background

Voice signal processing is currently a key technology in the field of human-machine interaction. The echo cancellation algorithm cancels the device's own played-back voice signal as received by the device's microphone. It is a key algorithm for overall voice signal processing and voice enhancement, and has an extremely important effect on back-end speech recognition.

FIG. 1 is a schematic diagram of echo cancellation. As shown in FIG. 1, the echo cancellation method in the open-source tool WebRTC (Web Real-Time Communication) uses an adaptive filter to estimate the echo and thereby cancel the linear echo, and then uses nonlinear processing to suppress the residual nonlinear echo. This method cancels linear echo well, but nonlinear echo and delay estimation errors leave residual echo. Although nonlinear processing can suppress the residual echo to a certain extent, the degree of suppression is limited and some residual echo remains, especially for echo in complex environments and nonlinear echo introduced by the device loudspeaker. This degrades the final echo cancellation effect and, in turn, the performance of the whole sound signal processing chain. In addition, the nonlinear processing in the conventional echo cancellation method has high computational complexity, taking half of the computation time of the whole echo cancellation algorithm.

Disclosure of Invention

Embodiments of the present invention provide an echo cancellation method, apparatus, and storage medium, to at least solve the problems of poor nonlinear echo suppression effect and high processing complexity in a scheme of implementing echo cancellation by combining linear filtering with nonlinear processing.

According to an embodiment of the present invention, there is provided an echo cancellation method including: obtaining a predicted echo signal through a nonlinear filter based on a far-end signal, wherein the nonlinear filter is constructed based on a neural network and, while the nonlinear filter performs forward calculation on the far-end signal based on the neural network, the weighted summation result of each node in the forward calculation is subjected to nonlinear processing based on a nonlinear function; and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In at least one example embodiment, the method further comprises: determining an error signal from the predicted echo signal and a desired input signal; and adjusting the weight coefficient of the neural network according to the error signal.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and adjusting the weight coefficients of the neural network according to the error signal includes: calculating the adjusted weight coefficient according to the error signal e(k):

$$w_{ij}^{(l)}(k+1) = w_{ij}^{(l)}(k) + \mu\,\Delta w_{ij}^{(l)}(k)$$

where e(k) = d(k) - o(k), o(k) is the predicted echo signal, d(k) is the desired input signal, $w_{ij}^{(l)}(k)$ represents the weight coefficient from the i-th node of layer l-1 to the j-th node of layer l in the neural network at time k, $w_{ij}^{(l)}(k+1)$ represents the weight coefficient from the i-th node of layer l-1 to the j-th node of layer l in the neural network at time k+1, μ is the adjustment step size, and $\Delta w_{ij}^{(l)}(k)$ represents the change of the weight coefficient, which is obtained by taking the partial derivative of the error signal e(k) with respect to the weight coefficient $w_{ij}^{(l)}(k)$.

In at least one exemplary embodiment, before determining an error signal from the predicted echo signal and the desired input signal and adjusting the weight coefficients of the neural network according to the error signal, the method further comprises: performing double-end detection on the near-end signal and the far-end signal respectively to determine whether sound exists at the near end and the far end respectively; and, in the case where the near end has no sound and the far end has sound, proceeding to the step of determining an error signal from the predicted echo signal and the desired input signal and adjusting the weight coefficients of the neural network according to the error signal, wherein the desired input signal comprises: the near-end signal input by the microphone in the case where the near end has no sound and the far end has sound.

In at least one exemplary embodiment, before obtaining a predicted echo signal through a non-linear filter based on a far-end signal, and performing echo cancellation on a near-end signal input by a microphone according to the predicted echo signal, the method further includes: performing double-end detection on the near-end signal and the far-end signal respectively to determine whether sound exists at the near end and the far end respectively; and under the condition that the far end has sound, obtaining a predicted echo signal through a nonlinear filter based on a far end signal, and performing echo cancellation on a near end signal input by the microphone according to the predicted echo signal.

In at least one exemplary embodiment, double-ended detection of the near-end signal and the far-end signal, respectively, to determine whether there is sound at the near-end and the far-end, respectively, comprises: respectively acquiring a first energy value of the near-end signal and a second energy value of the far-end signal; determining that the near end has no sound if the first energy value is lower than a first sound determination threshold, and determining that the near end has sound if the first energy value is not lower than a first sound determination threshold; determining that the far-end has no sound if the second energy value is lower than a second sound determination threshold, and determining that the far-end has sound if the second energy value is not lower than a second sound determination threshold.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and obtaining the predicted echo signal through the nonlinear filter based on the far-end signal includes: taking the far-end signal as the input signal of the nodes of the input layer of the neural network and performing forward calculation on the far-end signal, wherein, in the forward calculation, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the nodes of the previous layer to the node of the current layer to obtain a predicted value, and the predicted value is subjected to nonlinear processing to obtain the output value of the node of the current layer, until the output value of the node of the output layer is obtained and used as the predicted echo signal.

According to another embodiment of the present invention, there is provided an echo canceling device including: the nonlinear filter is used for obtaining a predicted echo signal based on a far-end signal, wherein the nonlinear filter is constructed based on a neural network, and in the process of carrying out forward calculation on the far-end signal based on the neural network, the nonlinear filter carries out nonlinear processing on a weighted summation result of each node in the forward calculation based on a nonlinear function; and the echo cancellation module is used for carrying out echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In at least one example embodiment, the apparatus further comprises: an error determination module for determining an error signal based on the predicted echo signal and a desired input signal and inputting the error signal to the nonlinear filter; the nonlinear filter is further used for adjusting the weight coefficients of the neural network according to the error signal.

In at least one exemplary embodiment, the apparatus further comprises a double ended detection module for: performing double-end detection on the near-end signal and the far-end signal respectively to determine whether sound exists at the near end and the far end respectively; turning on a function of the nonlinear filter to adjust weight coefficients of the neural network according to the error signal in a case where the near end has no sound and the far end has sound, wherein the desired input signal includes: the near-end signal input by the microphone in the case where there is no sound at the near-end and sound at the far-end; and/or, under the condition that the far end has sound, starting the function of performing echo cancellation on the near-end signal input by the microphone by the echo cancellation module according to the predicted echo signal.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and the nonlinear filter obtains the predicted echo signal based on the far-end signal by: taking the far-end signal as the input signal of the nodes of the input layer of the neural network and performing forward calculation on the far-end signal, wherein, in the forward calculation, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the nodes of the previous layer to the node of the current layer to obtain a predicted value, and the predicted value is subjected to nonlinear processing to obtain the output value of the node of the current layer, until the output value of the node of the output layer is obtained and used as the predicted echo signal.

According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.

According to the invention, a far-end signal is input into a nonlinear filter constructed based on a neural network to obtain a predicted echo signal, and echo cancellation is carried out on a near-end signal input by a microphone according to the predicted echo signal. In the embodiment of the invention, in the process of performing forward calculation on the far-end signal by the nonlinear filter based on the neural network, the weighted summation result of each node in the forward calculation is subjected to nonlinear processing based on the nonlinear function, so that the nonlinear filter can replace the combination of the traditional linear adaptive filter and a nonlinear processing module, the independent nonlinear processing of nonlinear residual echo cancellation is avoided, the estimation precision of nonlinear echo is improved, and the echo cancellation effect is further improved, thereby solving the problems of poor nonlinear echo suppression effect and high processing complexity in the scheme of realizing echo cancellation by combining linear filtering and nonlinear processing.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of echo cancellation;

FIG. 2 is a flowchart of an echo cancellation method according to embodiment 1 of the present invention;

FIG. 3 is a block diagram of the echo cancellation device according to embodiment 2 of the present invention;

FIG. 4 is a block diagram showing an exemplary configuration of an echo cancellation device according to embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of the echo cancellation algorithm based on nonlinear adaptive filtering according to embodiment 4 of the present invention;

FIG. 6 is a flowchart of a nonlinear adaptive filtering algorithm based on a BP neural network according to embodiment 4 of the present invention;

FIG. 7 is a flowchart of coefficient updating of the BP neural network-based nonlinear filter according to embodiment 4 of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Example 1

In this embodiment, an echo cancellation method is provided, and fig. 2 is a flowchart of an echo cancellation method according to embodiment 1 of the present invention, as shown in fig. 2, the flowchart includes the following steps:

Step S202: obtaining a predicted echo signal through a nonlinear filter based on a far-end signal (for example, the predicted echo signal may be obtained by estimating the echo signal through the nonlinear filter based on the far-end signal), where the nonlinear filter is constructed based on a neural network and, while performing forward calculation on the far-end signal based on the neural network, the nonlinear filter performs nonlinear processing on the weighted summation result of each node in the forward calculation based on a nonlinear function;

Step S204: performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In Acoustic Echo Cancellation (AEC), the far-end signal is the voice signal of a person at the far end (for example, a remote participant in a voice conference, located at another conference site), which is generally transmitted to the local conference site over a communication line. The near-end signal is the signal acquired by the near-end microphone. It is the superposition of two components: the voice of a participant speaking in the local conference room, and the far-end voice after it has been played through the near-end loudspeaker (the loudspeaker of the local conference room) and convolved with the echo path of the local room. In step S202, the nonlinear filter estimates the echo signal that the far-end signal produces after being emitted by the local loudspeaker and passing through the echo path of the local room, so that the estimated (predicted) echo signal can be removed from the near-end signal collected by the local microphone, thereby cancelling the echo. The echo produced when the far-end signal is emitted by the local loudspeaker, reflected by complex and changeable wall surfaces, and picked up by the near-end microphone is an indirect echo, which is a nonlinear signal. Because the nonlinear filter in this embodiment applies nonlinear processing to the weighted summation result of each node, it can adequately model the echo path corresponding to the nonlinear echo and estimate the echo more accurately, which makes it better suited to cancelling nonlinear echo.
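As a minimal sketch of the cancellation step, assuming that "removing" the predicted echo means a sample-wise subtraction from the microphone frame, the operation might look as follows. The function name `cancel_echo` and the toy sample values are hypothetical and are not taken from the source.

```python
import numpy as np

def cancel_echo(near_end: np.ndarray, predicted_echo: np.ndarray) -> np.ndarray:
    """Subtract the predicted echo from the microphone (near-end) frame."""
    return near_end - predicted_echo

# Toy frame: local speech plus an echo of the far-end signal (made-up values).
speech = np.array([0.10, -0.20, 0.05])
echo = np.array([0.30, 0.30, -0.10])
near_end = speech + echo

# With a perfect echo estimate, only the local speech remains.
residual = cancel_echo(near_end, predicted_echo=echo)
```

In practice the predicted echo is only an estimate, so the residual contains the local speech plus whatever part of the echo the filter failed to predict.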

According to the scheme, a far-end signal is input into a nonlinear filter constructed based on a neural network to obtain a predicted echo signal, and echo cancellation is carried out on a near-end signal input by a microphone according to the predicted echo signal. In the embodiment of the invention, in the process of performing forward calculation on the far-end signal by the nonlinear filter based on the neural network, the weighted summation result of each node in the forward calculation is subjected to nonlinear processing based on the nonlinear function, so that the nonlinear filter can replace the combination of the traditional linear adaptive filter and a nonlinear processing module, the independent nonlinear processing of nonlinear residual echo cancellation is avoided, the estimation precision of nonlinear echo is improved, and the echo cancellation effect is further improved, thereby solving the problems of poor nonlinear echo suppression effect and high processing complexity in the scheme of realizing echo cancellation by combining linear filtering and nonlinear processing.

To implement an adaptive nonlinear filter, the weight parameters of the neural network may be adjusted based on the error between the predicted echo signal and the near-end signal. Thus, in at least one exemplary embodiment, the method may further comprise: determining an error signal from the predicted echo signal and a desired input signal; and adjusting the weight coefficients of the neural network according to the error signal. As an exemplary embodiment, adjusting the weight coefficients of the neural network according to the error signal may be implemented by back-propagating the error signal through the neural network and sequentially adjusting the weight coefficients between the nodes of adjacent layers of the neural network.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and determining an error signal from the predicted echo signal and the desired input signal includes: calculating the error signal e(k) by the formula e(k) = d(k) - o(k), where o(k) is the predicted echo signal and d(k) is the desired input signal. As an example, o(k) may be calculated with a neural network that has 10 input nodes, 1 hidden layer with 3 nodes, and 1 output node.

f₁ and f₂ respectively represent the nonlinear activation functions used by the hidden layer and the output layer, which may be the same or different, and g(k) represents the amplitude gain calculated at the k-th update. If the neural network uses a nonlinear activation function such as Sigmoid, its output is generally a fraction between -1 and 1 (between 0 and 1 for Sigmoid itself), whereas the echo values to be predicted mostly exceed this range. The output therefore needs to be multiplied by a data gain to extend it to an equivalent data range. The data gain can be obtained by taking the maximum value of the data over the input nodes, that is: $g(k) = \max_i x_i(k)$.
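For the example network with 10 input nodes, 3 hidden nodes, and 1 output node, the forward computation described here might be written out as below. This is a sketch under assumptions: the weight symbols $w_{ij}^{(2)}$ (input node i to hidden node j) and $w_{j1}^{(3)}$ (hidden node j to the single output node) are notation chosen for illustration, with the superscript denoting the layer index, and bias terms are assumed absent because the text does not mention them.

```latex
o(k) = g(k)\, f_2\!\left( \sum_{j=1}^{3} w_{j1}^{(3)}(k)\,
       f_1\!\left( \sum_{i=1}^{10} w_{ij}^{(2)}(k)\, x_i(k) \right) \right),
\qquad
g(k) = \max_{i} x_i(k),
\qquad
e(k) = d(k) - o(k)
```

With a Sigmoid output activation, the value inside the gain lies in (0, 1), so o(k) lies in (0, g(k)) whenever g(k) is positive.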

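The weight coefficients of the neural network are then adjusted according to the error signal e(k), where the change Δw of each weight coefficient is obtained by taking the partial derivative of the error signal e(k) with respect to that weight coefficient.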

Therefore, the weight change of the output layer is:

where $d_k$ represents the desired output value of output node k, $o_k$ is the predicted value of output node k, and g is the data gain described above. For the 3-layer network shown in this example, $\Delta w_{j,k}$ represents the weight change from intermediate node j to output-layer node k, and the number of output nodes K is 1.

The weight change of the intermediate hidden layer is:

where $o_i$ represents the output of input node i of the previous layer, $o_j$ represents the output of node j of the current layer, and K represents the total number of output nodes. The output in this embodiment is the predicted echo signal, so K is 1. For the 3-layer network of this embodiment, $\Delta w_{i,j}$ represents the weight change from input node i to intermediate node j.
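One common way to obtain such weight changes is the standard back-propagation delta rule. The sketch below applies it to a 10-3-1 Sigmoid network, taking each weight change as the negative gradient of 0.5·e(k)² and treating the gain g(k) as a constant with respect to the weights. These choices, the absence of bias terms, and all names and sample values (`weight_changes`, `w_hidden`, `w_out`, the desired value, the step size) are assumptions for illustration and may differ from the formulas in the original filing.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, w_out, gain):
    """Forward pass: returns hidden outputs, pre-gain output, and o(k)."""
    hidden = sigmoid(x @ w_hidden)   # o_j, shape (3,)
    y = sigmoid(hidden @ w_out)      # output-node value before the gain, shape (1,)
    return hidden, y, gain * y       # o(k) = g(k) * y

def weight_changes(x, d, w_hidden, w_out, gain):
    """Delta-rule weight changes (negative gradient of 0.5 * e(k)**2)."""
    hidden, y, o = forward(x, w_hidden, w_out, gain)
    e = d - o                                        # e(k) = d(k) - o(k)
    delta_out = gain * e * y * (1.0 - y)             # output-node error term, shape (1,)
    dw_out = np.outer(hidden, delta_out)             # change for w_{j,k}, shape (3, 1)
    delta_hidden = (w_out @ delta_out) * hidden * (1.0 - hidden)  # shape (3,)
    dw_hidden = np.outer(x, delta_hidden)            # change for w_{i,j}, shape (10, 3)
    return dw_hidden, dw_out, e

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=10)        # one frame of far-end samples (made up)
gain = float(np.max(x))                   # g(k) = max_i x_i(k)
d = np.array([0.9 * gain])                # made-up desired value within reach of o(k)
w_hidden = rng.normal(scale=0.5, size=(10, 3))
w_out = rng.normal(scale=0.5, size=(3, 1))
mu = 0.5                                  # adjustment step size (assumed value)

errors = []
for _ in range(200):
    dw_hidden, dw_out, e = weight_changes(x, d, w_hidden, w_out, gain)
    w_hidden += mu * dw_hidden            # w(k+1) = w(k) + mu * dw
    w_out += mu * dw_out
    errors.append(abs(float(e[0])))
```

Repeating the update on the same frame is only a way to show the error magnitude shrinking; a real filter would apply one update per incoming frame.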

In the present embodiment, it is possible to determine whether the near end and the far end have sound through double-end detection, and control that the adjustment of the weight coefficient is performed only in the case where the near end is silent and the far end has sound, because the desired input signal in this case excludes the influence of the near end user sound, the adjustment of the weight coefficient based on this is more accurate. In at least one exemplary embodiment, prior to determining an error signal from the predicted echo signal and the desired input signal, and adjusting the weight coefficients of the neural network based on the error signal, the method may further comprise:

performing double-end detection on the near-end signal and the far-end signal respectively to determine whether sound exists at the near end and the far end respectively;

in the case where the near end has no sound and the far end has sound, proceeding to the step of determining an error signal from the predicted echo signal and the desired input signal and adjusting the weight coefficients of the neural network according to the error signal, wherein the desired input signal comprises: the near-end signal input by the microphone in the case where the near end has no sound and the far end has sound.

In the embodiment, it is possible to determine whether the near end and the far end have sound through double-end detection, and control to perform echo cancellation only in the case that the far end has sound, because echo cancellation is most necessary in this case. In at least one exemplary embodiment, before obtaining a predicted echo signal through a non-linear filter based on the far-end signal and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal, the method may further include:

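performing double-end detection on the near-end signal and the far-end signal respectively to determine whether sound exists at the near end and the far end respectively; and, in the case where the far end has sound, obtaining a predicted echo signal through a nonlinear filter based on the far-end signal, and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.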

The process of performing double-end detection on the near-end signal and the far-end signal respectively to determine whether there is sound at the near-end and the far-end respectively can be realized in various ways, for example, by judging through an energy value, or by judging through a correlation operation, and the like. In at least one exemplary embodiment, double-ended detection of the near-end signal and the far-end signal, respectively, to determine whether the near-end and the far-end have sounds, respectively, may include:

respectively acquiring a first energy value of the near-end signal and a second energy value of the far-end signal;

determining that the near end has no sound if the first energy value is lower than a first sound determination threshold, and determining that the near end has sound if the first energy value is not lower than a first sound determination threshold;

determining that the far-end has no sound if the second energy value is lower than a second sound determination threshold, and determining that the far-end has sound if the second energy value is not lower than a second sound determination threshold.
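The energy-based decision above can be sketched as follows. The text does not define the energy value or the thresholds, so the mean squared amplitude of a frame and the threshold values used here are assumptions, and the names `frame_energy` and `double_end_detect` are hypothetical.

```python
import numpy as np

def frame_energy(frame: np.ndarray) -> float:
    """Mean squared amplitude of one frame (assumed energy measure)."""
    return float(np.mean(np.square(frame)))

def double_end_detect(near, far, near_threshold, far_threshold):
    """Return (near_has_sound, far_has_sound) using energy thresholds.

    An end has sound when its energy is not lower than its threshold.
    """
    near_has_sound = frame_energy(near) >= near_threshold
    far_has_sound = frame_energy(far) >= far_threshold
    return near_has_sound, far_has_sound

# Made-up frames: a nearly silent near end and an active far end.
quiet = np.full(160, 0.001)   # energy 1e-6
loud = np.full(160, 0.1)      # energy 1e-2
result = double_end_detect(quiet, loud, near_threshold=1e-4, far_threshold=1e-4)
```

Because the microphone signal also contains the echo of the far-end signal, suitable thresholds depend on the device and room and would need to be chosen accordingly.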

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and step S202 may include the operations of:

taking the far-end signal as the input signal of the nodes of the input layer of the neural network and performing forward calculation on the far-end signal, wherein, in the forward calculation, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the nodes of the previous layer to the node of the current layer to obtain a predicted value, and the predicted value is subjected to nonlinear processing to obtain the output value of the node of the current layer, until the output value of the node of the output layer is obtained and used as the predicted echo signal.
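The layer-by-layer forward calculation can be sketched as a loop over weight matrices. This is an illustration under assumptions: the Sigmoid function is used as the nonlinear function for every layer, bias terms are omitted, and the names `forward` and `weights` and the 10-3-1 layer sizes are chosen for the example.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def forward(far_end_frame: np.ndarray, weights: list) -> np.ndarray:
    """Propagate a far-end frame through the network layer by layer.

    weights[n][i, j] is the coefficient from node i of one layer
    to node j of the next layer.
    """
    output = far_end_frame
    for w in weights:
        weighted_sum = output @ w        # weighted summation for each node
        output = sigmoid(weighted_sum)   # nonlinear processing for each node
    return output                        # output-layer value(s)

# 10 input nodes, 3 hidden nodes, 1 output node, with all-zero weights.
weights = [np.zeros((10, 3)), np.zeros((3, 1))]
predicted = forward(np.ones(10), weights)
```

With all-zero weights every weighted sum is 0, so each node outputs sigmoid(0) = 0.5, which makes the example easy to check by hand.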

In the above process, in which the output values of the nodes of the previous layer are weighted and summed layer by layer according to the weight coefficients from the nodes of the previous layer to the node of the current layer to obtain the predicted value, the calculation of the output value of a node of each layer may be represented as:

$$o_j^{(l)}(k) = f\left(\sum_{i=1}^{N} w_{ij}^{(l)}(k)\, o_i^{(l-1)}(k)\right)$$

where k denotes the k-th iteration, $o_j^{(l)}(k)$ and $o_i^{(l-1)}(k)$ respectively represent the output values of node j of layer l and node i of the previous layer l-1, $w_{ij}^{(l)}(k)$ represents the weight coefficient from node i of layer l-1 to node j of layer l, N represents the number of nodes in layer l-1, and f(x) represents a nonlinear function with argument x. An exemplary nonlinear activation function is the Sigmoid function, i.e.:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although the former is the preferable implementation in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

In this embodiment, an echo cancellation device is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.

Fig. 3 is a block diagram showing a configuration of an echo canceling device according to embodiment 2 of the present invention, and as shown in fig. 3, the echo canceling device includes:

the nonlinear filter 32 is configured to obtain a predicted echo signal based on a far-end signal, where the nonlinear filter 32 is constructed based on a neural network, and in a process of performing forward calculation on the far-end signal based on the neural network, the nonlinear filter 32 performs nonlinear processing on a weighted summation result of each node in the forward calculation based on a nonlinear function;

and the echo cancellation module 34 is configured to perform echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

To implement an adaptive nonlinear filter, the weight parameters of the neural network may be adjusted based on the error between the predicted echo signal and the near-end signal. Therefore, as shown in an exemplary block diagram of an echo cancellation device according to embodiment 2 of the present invention in fig. 4, the device may further include:

an error determination module 42 for determining an error signal based on the predicted echo signal and a desired input signal and inputting the error signal to the nonlinear filter 32;

the nonlinear filter 32 is used for adjusting the weight coefficient of the neural network according to the error signal.

As shown in an exemplary block diagram of an echo cancellation device according to embodiment 2 of the present invention in fig. 4, in at least one exemplary embodiment, the device may further include a double-end detection module 44 configured to: performing double-end detection on the near-end signal and the far-end signal respectively to determine whether sound exists at the near end and the far end respectively; in the case where there is no sound at the near end and sound at the far end, the function of the nonlinear filter 32 to adjust the weight coefficients of the neural network according to the error signal is turned on, wherein the desired input signal comprises: the near-end signal input by the microphone in the case where there is no sound at the near-end and sound at the far-end; and/or, in the case that there is sound at the far end, the echo cancellation module 34 is turned on to perform echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

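In at least one exemplary embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and the nonlinear filter 32 obtains the predicted echo signal based on the far-end signal by: taking the far-end signal as the input signal of the nodes of the input layer of the neural network and performing forward calculation on the far-end signal, wherein, in the forward calculation, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the nodes of the previous layer to the node of the current layer to obtain a predicted value, and the predicted value is subjected to nonlinear processing to obtain the output value of the node of the current layer, until the output value of the node of the output layer is obtained and used as the predicted echo signal.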

It should be noted that the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.

Example 3

Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.

Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

Step S1: obtaining a predicted echo signal through a nonlinear filter based on a far-end signal, wherein the nonlinear filter is constructed based on a neural network and, while performing forward calculation on the far-end signal based on the neural network, the nonlinear filter performs nonlinear processing on the weighted summation result of each node in the forward calculation based on a nonlinear function;

and step S2, performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Example 4

Because the traditional echo cancellation method based on adaptive filtering uses linear filtering, it can cancel only linear echo. Nonlinear processing can suppress nonlinear echo to some extent, but the degree of suppression is limited; in particular, echo in a complex environment and nonlinear echo introduced by the device loudspeaker leave a large amount of residual echo. The nonlinear processing used to eliminate this residual echo also has high time complexity, which seriously degrades echo cancellation performance and, in turn, the performance of the overall sound signal processing and speech enhancement.

In order to improve the estimation accuracy of the nonlinear echo and thereby improve the echo cancellation effect, this embodiment provides an echo cancellation method based on BP (back propagation) neural network nonlinear adaptive filtering. The method uses a BP neural network to construct a nonlinear adaptive filter that estimates the echo signal received by the microphone, replacing the original two modules of linear adaptive filtering and nonlinear processing. Meanwhile, double-end detection is used to detect voice at both ends: echo cancellation is carried out only when the far end has voice, and echo estimation is carried out only when the near end has no voice, so that near-end voice does not affect the echo estimate. Owing to the nonlinear characteristic of the neural network, the method has stronger environment modeling capability and can predict nonlinear echo well, thereby improving the performance of echo cancellation. It should be noted that the BP neural network is described only as an example; this description should not be construed as limiting the type of neural network, and the method is applicable to various types of neural networks.

Fig. 5 is a schematic diagram illustrating the principle of an echo cancellation algorithm based on nonlinear adaptive filtering according to embodiment 4 of the present invention. As shown in fig. 5, the neural-network-based algorithm mainly involves two modules: double-end detection (DTD, commonly called double-talk detection) and a nonlinear adaptive filter.

Double-end detection examines the far-end and near-end signals: echo cancellation is carried out only when the far end has sound, and echo estimation is carried out only when the near end has no sound, so that near-end sound does not affect the echo estimate. A typical double-end detection method is to calculate the energy of the far-end and near-end signals separately.
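The energy-based detection described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the frame-based processing, and the threshold values are all assumptions. In practice the microphone signal also contains the echo itself, so the near-end threshold would have to sit above the expected echo level.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def double_end_detect(far_frame, near_frame, far_thresh=1e-4, near_thresh=1e-4):
    """Return (far_active, near_active) from per-frame energy thresholds.

    The thresholds are hypothetical tuning values, not taken from the source.
    """
    return (frame_energy(far_frame) > far_thresh,
            frame_energy(near_frame) > near_thresh)


def control_flags(far_active, near_active):
    """Map the activity flags to the two switches described in the text."""
    cancel_echo = far_active                        # cancel only when the far end has sound
    adapt_filter = far_active and not near_active   # estimate only when the near end is silent
    return cancel_echo, adapt_filter
```

With an active far end and a silent near end, both switches are on; once near-end sound appears, cancellation continues but the filter stops adapting.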

The nonlinear adaptive filter mainly estimates the echo signal that is reflected by the environment and received by the device's own microphone.

Fig. 6 is a structure diagram of a nonlinear adaptive filter based on a BP neural network according to embodiment 4 of the present invention, and the main steps of the echo cancellation algorithm flow based on the filter structure are as follows:

S601, extracting an input signal: the input samples are stored in sequence in delay units (register units), and the length of the filter is set according to the application scenario. Here, a BP network is assumed in which the input layer has N = 10 nodes, there is 1 hidden layer with 3 nodes, and the output layer has 1 node. It should be noted that the present embodiment is described only by taking this setting as an example, and it should not be understood that the scheme is applicable only to a neural network with this setting.
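The delay unit of step S601 can be sketched with a fixed-length queue. The helper names below are hypothetical, and placing the newest sample at index 0 is an assumption made for illustration.

```python
from collections import deque

N = 10  # input-layer size used in this embodiment


def make_delay_line(length=N):
    """Delay line pre-filled with zeros; the newest sample sits at index 0."""
    return deque([0.0] * length, maxlen=length)


def push_sample(delay_line, sample):
    """Shift in one far-end sample and return the current input vector."""
    delay_line.appendleft(sample)  # the oldest sample drops off the right end
    return list(delay_line)
```

Each call to `push_sample` yields the vector that feeds the input-layer nodes for one iteration.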

S602, forward calculation: the data in the filter registers are multiplied in turn by the corresponding tap coefficients, whose initial values may be set to random decimals between -1 and 1, or may all be set to 1. Each hidden-layer node computes a predicted value, which is passed through a nonlinear unit for nonlinear processing; the results of the different hidden-layer nodes are then combined through the weight network to obtain the final output.

The computation of the output value of each node in each layer can be expressed as:

$$x_j^{l}(k) = f\left(\sum_{i=1}^{N} w_{ij}^{l}(k)\,x_i^{l-1}(k)\right)$$

where $k$ denotes the $k$-th iteration, $x_j^{l}(k)$ denotes the output value of node $j$ in layer $l$, $w_{ij}^{l}(k)$ denotes the weight coefficient between node $i$ of layer $l-1$ and node $j$ of layer $l$, $N$ denotes the number of nodes in layer $l-1$, and $f(x)$ denotes a nonlinear function of the argument $x$ that performs nonlinear processing on the result of the weighted summation. An exemplary nonlinear activation function is the Sigmoid function, namely:

$$f(x) = \frac{1}{1 + e^{-x}}$$

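Assuming that the number of input nodes is 10, the number of intermediate nodes is 3, and the number of output nodes is 1, the final output prediction value obtained after forward propagation through the 3 layers can be expressed as:

$$o(k) = x_1^{3}(k) = f\left(\sum_{j=1}^{3} w_{j1}^{3}(k)\, f\left(\sum_{i=1}^{10} w_{ij}^{2}(k)\,x_i^{1}(k)\right)\right)$$

where $x_i^{1}(k)$ denotes the value of input-layer node $i$, $w_{ij}^{2}(k)$ denotes the weight coefficient from input-layer node $i$ to node $j$ of the second (hidden) layer, and $w_{j1}^{3}(k)$ denotes the weight coefficient from node $j$ of the second layer to node 1 of the third (output) layer. The number of output nodes is 1 because only one predicted output is considered.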
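The forward calculation for the 10-3-1 network of this embodiment can be sketched in plain Python. This is an illustrative sketch only: the function names, the list-of-lists weight layout, and the fixed random seed are assumptions, and the Sigmoid function is used for both layers.

```python
import math
import random


def sigmoid(x):
    """Exemplary nonlinear activation function named in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in=10, n_hidden=3, seed=0):
    """Tap coefficients initialised to random decimals between -1 and 1."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                for _ in range(n_hidden)]           # w_hidden[j][i]: input i -> hidden j
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden j -> output
    return w_hidden, w_out


def forward(x, w_hidden, w_out):
    """Weighted summation followed by the nonlinear function at every node."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    o = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, o
```

The hidden-layer outputs are returned together with the network output because the backward calculation needs both.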

S603, error calculation: the error value is obtained by subtracting the predicted signal from the desired input signal d(k). The desired input signal d(k) is controlled by double-end detection, and the error is calculated when there is a signal at the far end and no signal at the near end. The error can be expressed as:

$$e(k) = d(k) - g(k)\,o(k) = d(k) - g(k)\, f_2\left(\sum_{j=1}^{3} w_{j1}^{3}(k)\, f_1\left(\sum_{i=1}^{10} w_{ij}^{2}(k)\,x_i^{1}(k)\right)\right)$$

where $e(k)$ represents the difference between the desired signal $d(k)$ and the predicted signal $o(k)$ (that is, the output value of the neural network, the predicted value calculated by the filter network) after scaling by $g(k)$; $x_i^{1}(k)$ is the value of input node $i$; $w_{ij}^{2}(k)$ and $w_{j1}^{3}(k)$ are the weight coefficients of the hidden layer and the output layer; $f_1$ and $f_2$ respectively represent the nonlinear activation functions used by the hidden layer and the output layer, which may be the same or different; and $g(k)$ represents the amplitude gain calculated at the $k$-th update. If the neural network uses a nonlinear activation function such as Sigmoid, its output is confined to a small bounded range (between 0 and 1 for Sigmoid), whereas the echo values to be predicted mostly exceed this range. The output therefore needs to be multiplied by a data gain to expand its range, and this gain can be obtained as the maximum value of the data in the input nodes, that is: $g(k) = \max_i x_i(k)$.
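The gain and error calculation can be sketched as below. It is an assumption here that the error takes the form e(k) = d(k) - g(k)·o(k), with the gain applied to the network output before subtraction; the function names are hypothetical.

```python
def amplitude_gain(x):
    """g(k): maximum value of the data in the input nodes, as stated in the text."""
    return max(x)


def error_signal(d, o, x):
    """Return (e, g), assuming e(k) = d(k) - g(k) * o(k).

    d: desired input signal (near-end microphone sample)
    o: network output before scaling
    x: current input vector (far-end samples in the delay line)
    """
    g = amplitude_gain(x)
    return d - g * o, g
```

For example, with d = 1.0, o = 0.5 and a largest input sample of 2.0, the scaled prediction equals the desired signal and the error is zero.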

S604, backward calculation: the calculated error is back-propagated, and each weight coefficient of the filter is adjusted in turn. The filter coefficients are updated as:

$$w_{ij}^{l}(k+1) = w_{ij}^{l}(k) + \mu\,\Delta w_{ij}^{l}(k)$$

where $w_{ij}^{l}(k)$ denotes the weight coefficient from node $i$ of layer $l-1$ to node $j$ of layer $l$ at time $k$, $\Delta w_{ij}^{l}(k)$ denotes the change in that weight coefficient, and $\mu$ is the adjustment step size. The change $\Delta w_{ij}^{l}(k)$ is obtained by taking the partial derivative of the error signal $e(k)$ with respect to the weight coefficient $w_{ij}^{l}(k)$. Therefore, the weight change of the output layer is:

The weight change of the intermediate hidden layer is:
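The two weight changes can be written out with the usual BP chain rule. The following is a sketch under assumptions that the text does not state: the cost is $J(k) = \tfrac{1}{2}e^{2}(k)$, the error is $e(k) = d(k) - g(k)\,o(k)$, the change is $\Delta w = -\partial J/\partial w$, and the gain $g(k)$ does not depend on the weights. The symbols $net_j^{2}$ and $net_1^{3}$ (the weighted sums entering hidden node $j$ and the output node) are introduced here for illustration only.

```latex
\begin{aligned}
net_j^{2}(k) &= \sum_{i=1}^{10} w_{ij}^{2}(k)\,x_i^{1}(k), \qquad x_j^{2}(k) = f_1\big(net_j^{2}(k)\big),\\
net_1^{3}(k) &= \sum_{j=1}^{3} w_{j1}^{3}(k)\,x_j^{2}(k), \qquad o(k) = f_2\big(net_1^{3}(k)\big),\\
\Delta w_{j1}^{3}(k) &= -\frac{\partial J}{\partial w_{j1}^{3}}
  = e(k)\,g(k)\,f_2'\big(net_1^{3}(k)\big)\,x_j^{2}(k),\\
\Delta w_{ij}^{2}(k) &= -\frac{\partial J}{\partial w_{ij}^{2}}
  = e(k)\,g(k)\,f_2'\big(net_1^{3}(k)\big)\,w_{j1}^{3}(k)\,f_1'\big(net_j^{2}(k)\big)\,x_i^{1}(k).
\end{aligned}
```

For the Sigmoid function, $f'(x) = f(x)\,\big(1 - f(x)\big)$, so both derivatives can be computed from node outputs that are already available after the forward calculation.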

The process of updating the coefficients of the nonlinear filter based on the BP neural network in the echo cancellation algorithm according to the present embodiment is shown in fig. 7.
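One coefficient update of the kind shown in fig. 7 can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes Sigmoid units in both layers, a squared-error cost e(k)²/2, and an error of the form e(k) = d(k) - g(k)·o(k); the function name `update_weights` and the weight layout are hypothetical.

```python
def update_weights(w_hidden, w_out, x, hidden, o, e, g, mu=0.1):
    """One in-place BP update of a single-hidden-layer, single-output network.

    w_hidden[j][i]: weight from input node i to hidden node j
    w_out[j]:       weight from hidden node j to the single output node
    x, hidden, o:   input vector, hidden outputs and network output
                    produced by the forward calculation
    e, g:           error signal and amplitude gain for this iteration
    mu:             adjustment step size
    """
    # Output-node term: e * g * f2'(net), with f2' = o * (1 - o) for Sigmoid.
    delta_out = e * g * o * (1.0 - o)
    for j, h in enumerate(hidden):
        # Hidden-node term, computed with the output weight before it changes.
        delta_hid = delta_out * w_out[j] * h * (1.0 - h)
        w_out[j] += mu * delta_out * h
        for i, xi in enumerate(x):
            w_hidden[j][i] += mu * delta_hid * xi
```

Following the double-end detection rule, such a function would be called only when the far end has sound and the near end is silent; otherwise the weights are left untouched.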

The echo cancellation method based on the BP neural network nonlinear filtering of the embodiment estimates the nonlinear echo by using the nonlinear characteristic of the BP neural network, replaces the traditional echo cancellation method based on the linear adaptive filter and the nonlinear processing, and has the following advantages:

Better echo cancellation performance: because the method uses a neural network to implement a nonlinear adaptive filter in place of the conventional linear adaptive filter, the nonlinear fitting capability of the neural network can be used to estimate the nonlinear echo, which further improves echo cancellation performance;

Simple algorithm structure: the method uses a nonlinear filter based on the BP neural network to estimate the echo, replacing the linear adaptive filter and the nonlinear processing module required by conventional echo cancellation. The calculation structure is therefore simpler, each module has a clear function and physical meaning, and the method is easy to implement.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An echo cancellation method, comprising:

obtaining a predicted echo signal through a nonlinear filter based on a far-end signal, wherein the nonlinear filter is constructed based on a neural network, and in the process that the nonlinear filter carries out forward calculation on the far-end signal based on the neural network, nonlinear processing is carried out on a weighted summation result of each node in the forward calculation based on a nonlinear function;

and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

2. The method of claim 1, further comprising:

determining an error signal from the predicted echo signal and a desired input signal;

and adjusting the weight coefficient of the neural network according to the error signal.

3. The method of claim 2, wherein the neural network comprises an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, wherein adjusting the weight coefficients of the neural network based on the error signal comprises:

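obtaining the adjusted weight coefficient from the partial derivative of the error signal e(k) with respect to the weight coefficient $w_{ij}^{l}(k)$, where $w_{ij}^{l}(k)$ denotes the weight coefficient from node $i$ of layer $l-1$ to node $j$ of layer $l$ at time $k$.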

4. The method of claim 2, further comprising, prior to determining an error signal based on the predicted echo signal and a desired input signal, and adjusting weighting coefficients of the neural network based on the error signal:

5. The method of claim 1, further comprising, before obtaining a predicted echo signal based on the far-end signal by a non-linear filter and performing echo cancellation on a near-end signal input by the microphone according to the predicted echo signal:

6. The method of claim 4 or 5, wherein double-ended detection of the near-end signal and the far-end signal, respectively, to determine whether there is sound at the near-end and far-end, respectively, comprises:

7. The method of any one of claims 1-5, wherein the neural network comprises an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and wherein deriving the predicted echo signal based on the far-end signal with a non-linear filter comprises:

8. An echo cancellation device, comprising:

the nonlinear filter is used for obtaining a predicted echo signal based on a far-end signal, wherein the nonlinear filter is constructed based on a neural network, and in the process of carrying out forward calculation on the far-end signal based on the neural network, the nonlinear filter carries out nonlinear processing on a weighted summation result of each node in the forward calculation based on a nonlinear function;

and the echo cancellation module is used for carrying out echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

9. The apparatus of claim 8, further comprising:

an error determination module for determining an error signal based on the predicted echo signal and a desired input signal and inputting the error signal to the nonlinear filter;

the nonlinear filter is used for adjusting the weight coefficient of the neural network according to the error signal.

10. The apparatus of claim 9, further comprising a double end detection module to:

turning on a function of the nonlinear filter to adjust weight coefficients of the neural network according to the error signal in a case where the near end has no sound and the far end has sound, wherein the desired input signal includes: the near-end signal input by the microphone in the case where there is no sound at the near-end and sound at the far-end; and/or, under the condition that the far end has sound, starting the function of performing echo cancellation on the near-end signal input by the microphone by the echo cancellation module according to the predicted echo signal.

11. The apparatus according to any of claims 8-10, wherein the neural network comprises an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, the nonlinear filter deriving a predicted echo signal based on a far-end signal by:

12. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 7 when executed.