CN113903351A

CN113903351A - Echo cancellation method, device, equipment and storage medium

Info

Publication number: CN113903351A
Application number: CN202111171723.4A
Authority: CN
Inventors: 向伟; 陈建哲; 张腾飞
Original assignee: Baidu Online Network Technology Beijing Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd
Priority date: 2019-03-18
Filing date: 2019-03-18
Publication date: 2022-01-07
Also published as: CN110265048A; CN110265048B

Abstract

The disclosure provides an echo cancellation method, apparatus, device and storage medium. The method comprises: a computing device estimates the time delay between a reference signal played by a second voice interaction apparatus and the collected echo signal corresponding to the reference signal, wherein the second voice interaction apparatus is the voice interaction apparatus currently used by the computing device; and the computing device cancels the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay. The present disclosure improves the echo cancellation effect.

Description

Echo cancellation method, device, equipment and storage medium

The present application is a divisional application of the application with application number 201910205707.9, filed on March 18, 2019, and entitled "Echo cancellation method, device, equipment and storage medium".

Technical Field

The present disclosure relates to the field of signal processing, and in particular, to a method, an apparatus, a device, and a storage medium for echo cancellation.

Background

Currently, in speech recognition, echoes in a collected speech signal can be cancelled through echo cancellation processing, such as an Acoustic Echo Cancellation (AEC) algorithm.

In the prior art, echo cancellation processing cancels the echo signal contained in a speech signal collected by a microphone according to the time delay between a played reference signal and the echo signal, collected by the microphone, that corresponds to the reference signal. This recovers the speech uttered by the speaker and avoids the echo that arises when the echo signal is superimposed on that speech. In general, the time delay used in echo cancellation processing is a default time delay; that is, the echo signal contained in the speech signal collected by the microphone is cancelled based on the default time delay.

However, in the prior art, a default time delay is used in the echo cancellation processing, so that the echo cancellation effect is poor.

Disclosure of Invention

The embodiment of the disclosure provides an echo cancellation method, an echo cancellation device, an echo cancellation apparatus, and a storage medium, which are used to solve the problem in the prior art that an echo cancellation effect is poor due to the use of a default time delay in echo cancellation processing.

In a first aspect, an embodiment of the present disclosure provides an echo cancellation method, including:

when a voice interaction apparatus used by a computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus, the computing device estimates the time delay between a reference signal played by the second voice interaction apparatus and the collected echo signal corresponding to the reference signal;

and the computing equipment eliminates the echo signal in the original signal acquired by the second voice interaction device according to the estimated time delay.

In a possible implementation, if a connection object of the computing device changes, the voice interaction apparatus used by the computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus.

In one possible implementation, if the computing device is changed from being connected with a target device to not being connected with the target device, the voice interaction apparatus used by the computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus, the target device includes the first voice interaction apparatus, and the computing device includes the second voice interaction apparatus;

or, if the computing device is changed from being not connected with the target device to being connected with the target device, the voice interaction apparatus used by the computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus, the computing device includes the first voice interaction apparatus, and the target device includes the second voice interaction apparatus.

In one possible implementation, the target device is a vehicle.

In a possible implementation, if the computing device is changed from being connected with a first target device to being connected with a second target device, the voice interaction apparatus used by the computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus, the first target device includes the first voice interaction apparatus, and the second target device includes the second voice interaction apparatus.

In one possible implementation, the estimating, by the computing device, a time delay between a reference signal played by the second voice interaction apparatus and a collected echo signal corresponding to the reference signal includes:

the computing device determines, according to a plurality of first time points and a plurality of second time points in one-to-one correspondence with the first time points, the time difference between each first time point and its corresponding second time point, to obtain a plurality of time differences, wherein a first time point is a time point at which the second voice interaction apparatus plays the reference signal, and the corresponding second time point is the time point at which the second voice interaction apparatus collects the echo signal corresponding to the reference signal played at that first time point;

the computing device determines a time delay of the reference signal and the echo signal according to the plurality of time differences.

In one possible implementation, the computing device determining, from the plurality of time differences, a time delay of the reference signal and the echo signal includes:

and the computing equipment determines the time delay of the reference signal and the echo signal according to the time differences and a preset estimation algorithm.

In one possible implementation, the preset estimation algorithm is a least mean square (LMS) algorithm.
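As a rough illustration of the estimation described above, the following Python sketch computes the time difference for each pair of first and second time points and then fits a single delay value with a scalar LMS update. The function names, the step size `mu`, the number of passes, and the constant-input form of LMS are assumptions made for illustration; the disclosure names the LMS algorithm but does not specify how it is applied to the time differences.

```python
from typing import List, Sequence


def time_differences(play_times: Sequence[float],
                     capture_times: Sequence[float]) -> List[float]:
    """Pair each playback time with its capture time and return the differences."""
    if len(play_times) != len(capture_times):
        raise ValueError("first and second time points must correspond one-to-one")
    return [c - p for p, c in zip(play_times, capture_times)]


def lms_delay(diffs: Sequence[float], mu: float = 0.1, passes: int = 50) -> float:
    """Estimate one delay from noisy time differences with a scalar LMS update.

    With a constant input of 1, the LMS rule w += mu * error * input reduces to
    w += mu * (d - w), which pulls w toward the observed differences.
    """
    if not diffs:
        raise ValueError("at least one time difference is required")
    w = diffs[0]  # initial guess: the first observation (an assumed choice)
    for _ in range(passes):
        for d in diffs:
            w += mu * (d - w)
    return w


# Hypothetical timestamps in seconds: playback times and matching capture times.
plays = [0.00, 1.00, 2.00, 3.00]
captures = [0.12, 1.11, 2.13, 3.12]
estimated_delay = lms_delay(time_differences(plays, captures))
```

Because each update is a weighted average of the current estimate and one observation, the result always lies between the smallest and largest time difference, which damps the effect of a single noisy measurement.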

In a possible implementation, the eliminating, by the computing device, the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay includes:

the computing equipment judges whether the time delay is within a preset time delay range or not;

if the time delay is within the time delay range, eliminating the echo signal in the original signal collected by the second voice interaction device according to the time delay;

and if the time delay is not within the time delay range, eliminating the echo signal in the original signal collected by the second voice interaction apparatus according to a time delay that is within the time delay range.
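The range check above can be sketched as follows. The disclosure does not say which in-range value is used when the estimate falls outside the preset range; clamping to the nearest bound is only one possible reading, and the function name and millisecond units are hypothetical.

```python
def select_delay(estimated_ms: float, min_ms: float, max_ms: float) -> float:
    """Return the estimate if it lies in [min_ms, max_ms], else the nearest bound."""
    if min_ms > max_ms:
        raise ValueError("min_ms must not exceed max_ms")
    if min_ms <= estimated_ms <= max_ms:
        return estimated_ms
    # Out of range: fall back to a delay inside the preset range.
    # Clamping is an assumption; a fixed in-range default would also fit the text.
    return min_ms if estimated_ms < min_ms else max_ms
```

For example, with a preset range of 50 ms to 300 ms, an estimate of 120 ms is used unchanged, while an estimate of 500 ms is replaced by 300 ms.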

In one possible implementation, the canceling, by the computing device, the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay includes:

the computing device cancels the echo signal in the original signal collected by the second voice interaction apparatus by using an Acoustic Echo Cancellation (AEC) algorithm according to the estimated time delay.

In a possible implementation, after the computing device cancels the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay, the method further includes:

carrying out voice recognition on the voice signal obtained after the elimination to obtain a voice recognition result;

and performing subsequent processing according to the voice recognition result.

In one possible implementation, the subsequent processing includes a wake-up processing and/or an output processing.

In a second aspect, an embodiment of the present disclosure provides an echo cancellation apparatus applied to a computing device, including:

the estimation module is used for estimating the time delay between a reference signal played by the second voice interaction device and an acquired echo signal corresponding to the reference signal when the voice interaction device used by the computing equipment is changed from a first voice interaction device to a second voice interaction device;

and the elimination module is used for eliminating the echo signal in the original signal acquired by the second voice interaction device according to the estimated time delay.

In one possible implementation, the estimation module is specifically configured to:

determining, according to a plurality of first time points and a plurality of second time points in one-to-one correspondence with the first time points, the time difference between each first time point and its corresponding second time point, to obtain a plurality of time differences, wherein a first time point is a time point at which the second voice interaction apparatus plays the reference signal, and the corresponding second time point is the time point at which the second voice interaction apparatus collects the echo signal corresponding to the reference signal played at that first time point;

and determining the time delay of the reference signal and the echo signal according to the plurality of time differences.

In a possible implementation, the estimating module is configured to determine, according to the plurality of time differences, a time delay between the reference signal and the echo signal, and specifically includes:

and determining the time delay of the reference signal and the echo signal according to the time differences and a preset estimation algorithm.

In one possible implementation, the cancellation module is specifically configured to:

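judging whether the time delay is within a preset time delay range;

if the time delay is within the time delay range, eliminating the echo signal in the original signal collected by the second voice interaction apparatus according to the time delay;

and if the time delay is not within the time delay range, eliminating the echo signal in the original signal collected by the second voice interaction apparatus according to a time delay that is within the time delay range.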

In a possible implementation, the canceling module cancels the echo signal in the original signal collected by the second voice interaction apparatus according to the time delay, specifically including:

and according to the estimated time delay, eliminating the echo signal in the original signal collected by the second voice interaction apparatus by using an Acoustic Echo Cancellation (AEC) algorithm.

In one possible implementation, the apparatus further comprises: a response module;

the response module is configured to: carrying out voice recognition on the voice signal obtained after the elimination to obtain a voice recognition result; and performing subsequent processing according to the voice recognition result.

In a third aspect, an embodiment of the present disclosure provides an echo cancellation device, including:

a processor and a memory for storing computer instructions; the processor executes the computer instructions to perform the method of any of the first aspects described above.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing instructions that, when executed by a processor of an echo cancellation device, enable the echo cancellation device to perform the method of any one of the above first aspects.

In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of any of the first aspects.

According to the echo cancellation method, apparatus, device and storage medium provided in the embodiments of the present disclosure, when the voice interaction apparatus used by a computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus, the computing device estimates the time delay between a reference signal played by the second voice interaction apparatus and the collected echo signal corresponding to the reference signal, and cancels the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay. In this way, when the voice interaction apparatus used by the computing device changes, the time delay of the changed voice interaction apparatus can be estimated in time, and the echo signal in the original signal collected by the changed voice interaction apparatus is cancelled based on the estimated time delay. This avoids the poor echo cancellation effect caused by using a default time delay. It also avoids the poor echo cancellation effect caused by an inaccurate time delay when, after the voice interaction apparatus used by the computing device changes, the time delay of the voice interaction apparatus used before the change is applied to cancel the echo signal in the original signal collected by the changed voice interaction apparatus. The echo cancellation effect is thereby improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art according to the drawings.

Fig. 1 is a schematic view of a first application scenario of an echo cancellation method according to an embodiment of the present disclosure;

fig. 2 is a schematic view of a second application scenario of the echo cancellation method according to the embodiment of the present disclosure;

fig. 3 is a schematic view of a third application scenario of the echo cancellation method according to the embodiment of the present disclosure;

fig. 4 is a schematic flowchart of a first echo cancellation method according to an embodiment of the present disclosure;

fig. 5 is a schematic flowchart of a second echo cancellation method according to an embodiment of the present disclosure;

fig. 6 is a schematic flowchart of a third echo cancellation method according to an embodiment of the present disclosure;

fig. 7 is a schematic structural diagram of a first echo cancellation device according to an embodiment of the present disclosure;

fig. 8 is a schematic structural diagram of a second echo cancellation device according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments obtained based on the embodiments in the disclosure belong to the protection scope of the disclosure.

Fig. 1 is a schematic view of a first application scenario of the echo cancellation method according to the embodiment of the present disclosure. As shown in fig. 1, the application scenario may include a computing device 11, and the computing device 11 may include at least two voice interaction apparatuses, for example, a voice interaction apparatus a and a voice interaction apparatus b in fig. 1. The computing device 11 may use the voice interaction apparatus a or the voice interaction apparatus b to perform voice interaction with the user. Specifically, the computing device 11 may use the voice interaction apparatus a of the computing device 11 to collect voice, and use the voice interaction apparatus a of the computing device 11 to play voice, such as playing music or navigation audio; alternatively, the computing device 11 may use the voice interaction apparatus b of the computing device 11 to collect voice and use the voice interaction apparatus b of the computing device 11 to play voice.

Fig. 2 is a schematic view of a second application scenario of the echo cancellation method according to the embodiment of the present disclosure. As shown in fig. 2, the application scenario may include a computing device 11 and a first target device 12, where the computing device 11 may include at least one voice interaction apparatus and the first target device 12 may include at least one voice interaction apparatus. For example, in fig. 2, the computing device 11 includes a voice interaction apparatus a, and the first target device 12 includes a voice interaction apparatus b. The computing device 11 may use the voice interaction apparatus a of the computing device 11 or the voice interaction apparatus b of the first target device 12 to perform voice interaction with the user. Specifically, the computing device 11 may use the voice interaction apparatus a of the computing device 11 to collect voice, and use the voice interaction apparatus a of the computing device 11 to play voice, such as playing music or navigation audio; alternatively, the computing device 11 may use the voice interaction apparatus b of the first target device 12 to collect voice, and use the voice interaction apparatus b of the first target device 12 to play voice.

Fig. 3 is a schematic view of a third application scenario of the echo cancellation method according to the embodiment of the present disclosure. As shown in fig. 3, the application scenario may include a computing device 11, a first target device 12, and a second target device 13, where the first target device 12 may include at least one voice interaction apparatus and the second target device 13 may include at least one voice interaction apparatus. For example, in fig. 3, the first target device 12 includes a voice interaction apparatus a, and the second target device 13 includes a voice interaction apparatus b. The computing device 11 may use the voice interaction apparatus a of the first target device 12 or the voice interaction apparatus b of the second target device 13 to perform voice interaction with the user. Specifically, the computing device 11 may use the voice interaction apparatus a of the first target device 12 to collect voice, and use the voice interaction apparatus a of the first target device 12 to play voice, such as playing music or navigation audio; alternatively, the computing device 11 may use the voice interaction apparatus b of the second target device 13 to collect voice, and use the voice interaction apparatus b of the second target device 13 to play voice.

It is understood that the above three application scenarios may be combined. For example, one application scenario may include the computing device 11, the first target device 12, and the second target device 13, where the computing device 11 may include at least two voice interaction apparatuses, and the first target device 12 and the second target device 13 may each include one voice interaction apparatus. In this case, the computing device 11 may use one voice interaction apparatus of the computing device 11 to collect voice and use that same apparatus to play voice; alternatively, the computing device 11 may use another voice interaction apparatus of the computing device 11 to collect voice and use that other apparatus to play voice; alternatively, the computing device 11 may use the voice interaction apparatus of the first target device 12 to collect voice and to play voice; alternatively, the computing device 11 may use the voice interaction apparatus of the second target device 13 to collect voice and to play voice.

It should be noted that the voice interaction device in the embodiment of the present disclosure may be any entity device capable of collecting voice and playing the voice.

It should be noted that the computing device 11 may specifically be a device that can play and collect voice through a voice interaction apparatus and that has a certain computing capability (e.g., for estimating a time delay). The present disclosure does not limit the specific type of the computing device, which may be, for example, a cell phone, a tablet, a wearable device, or the like.

It should be noted that the present disclosure does not limit the manner in which the computing device is connected to the target devices, or to their voice interaction apparatuses, in fig. 2 and fig. 3.

Fig. 4 is a schematic flowchart of a first embodiment of an echo cancellation method according to an embodiment of the present disclosure. The method of this embodiment may be performed by a computing device, as shown in fig. 4, and the method of this embodiment may include:

Step 401, when the voice interaction apparatus used by a computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus, the computing device estimates the time delay between a reference signal played by the second voice interaction apparatus and the collected echo signal corresponding to the reference signal.

In this step, the first voice interaction device may be understood as the voice interaction device a, and the second voice interaction device may be understood as the voice interaction device b; alternatively, the first voice interaction device may be understood as the voice interaction device b, and the second voice interaction device may be understood as the voice interaction device a. The voice interaction device used by the computing equipment can be understood as a voice interaction device used by the computing equipment for playing and collecting voice, and a user can perform voice interaction with the computing equipment through the voice interaction device.

For the application scenario shown in fig. 1, the voice interaction means used by the computing device is changed from the first voice interaction means to the second voice interaction means, for example, the voice interaction means used by the computing device 11 can be changed from the voice interaction means a of the computing device 11 to the voice interaction means b of the computing device 11. At this time, the voice interaction device a of the computing apparatus 11 may be understood as a first voice interaction device, and the voice interaction device b of the computing apparatus 11 may be understood as a second voice interaction device.

For the application scenario shown in fig. 2, the voice interaction means used by the computing device is changed from the first voice interaction means to the second voice interaction means, for example, the voice interaction means used by the computing device 11 is changed from the voice interaction means a of the computing device 11 to the voice interaction means b of the first target device 12. At this time, the voice interaction apparatus a of the computing device 11 may be understood as a first voice interaction apparatus, and the voice interaction apparatus b of the first target device 12 may be understood as a second voice interaction apparatus.

For the application scenario shown in fig. 3, the voice interaction means used by the computing device is changed from the first voice interaction means to the second voice interaction means, for example, the voice interaction means used by the computing device 11 is changed from the voice interaction means a of the first target device 12 to the voice interaction means b of the second target device 13. At this time, the voice interaction apparatus a of the first target device 12 may be understood as a first voice interaction apparatus, and the voice interaction apparatus b of the second target device 13 may be understood as a second voice interaction apparatus.

The voice signal played by the computing device using the voice interaction apparatus may be referred to as a reference signal, and the voice signal collected by the computing device using the voice interaction apparatus may be referred to as an original signal. It is understood that after the reference signal is played by the computing device, the played sound may be collected by the voice interaction apparatus; that is, the collected original signal may include the echo signal corresponding to the reference signal played by the computing device.

Because different voice interaction apparatuses have different hardware structures, the time delay between the reference signal played by the computing device and the collected echo signal corresponding to the reference signal may differ from one voice interaction apparatus to another. Here, by estimating the time delay between the reference signal played by the second voice interaction apparatus and the collected echo signal corresponding to the reference signal at the moment the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus, the time delay that applies after the change can be obtained in time.

It will be appreciated that during the playing of the reference signal by the computing device, the original signal collected may also include the user's speech signal when the user speaks.

It should be noted that, the disclosure may not be limited to a specific manner in which the computing device estimates a time delay between a reference signal played by the second voice interaction apparatus and an echo signal corresponding to the collected reference signal.

It should be noted that, for a specific manner in which the computing device determines that the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus, the embodiment of the present disclosure may not be limited, for example, the computing device may monitor the used voice interaction apparatus to determine whether the used voice interaction apparatus is changed, that is, whether the used voice interaction apparatus is changed from the first voice interaction apparatus to the second voice interaction apparatus.
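One way to picture the monitoring mentioned above is a small tracker that remembers which voice interaction apparatus is in use and invokes a callback when it changes. This is only a sketch: the class name, the string device identifiers, and the callback signature are hypothetical, and the disclosure leaves the detection mechanism open.

```python
from typing import Callable, Optional


class DeviceChangeMonitor:
    """Tracks the voice interaction apparatus in use and reacts when it changes."""

    def __init__(self, on_change: Callable[[str, str], None]) -> None:
        self._current: Optional[str] = None
        self._on_change = on_change

    def observe(self, device_id: str) -> bool:
        """Record the apparatus currently in use; return True if it changed."""
        previous = self._current
        self._current = device_id
        if previous is not None and previous != device_id:
            # previous plays the role of the first apparatus, device_id the second.
            self._on_change(previous, device_id)
            return True
        return False
```

In such a design, the callback would be the natural place to trigger a fresh time delay estimate for the second voice interaction apparatus.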

Step 402, the computing device eliminates the echo signal in the original signal collected by the second voice interaction device according to the estimated time delay.

In this step, the embodiment of the present disclosure does not limit the specific manner of canceling the echo signal in the original signal collected by the second voice interaction apparatus according to the time delay estimated in step 401. For example, the reference signal may be shifted by the estimated time delay, and the echo signal in the original signal collected by the second voice interaction apparatus may then be cancelled according to the collected original signal and the shifted reference signal.
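The shift-then-cancel example above might look like the following Python sketch. It delays the reference signal by the estimated number of samples and then subtracts an adaptively filtered copy from the original signal using a normalized LMS (NLMS) filter. NLMS is a common building block of AEC but is a swapped-in choice here, not something the disclosure specifies; the function names, tap count, and step size are likewise assumptions.

```python
from typing import List, Sequence


def shift_reference(reference: Sequence[float], delay_samples: int) -> List[float]:
    """Delay the reference by delay_samples, padding the start with zeros."""
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    n = len(reference)
    return ([0.0] * delay_samples + list(reference))[:n]


def cancel_echo(original: Sequence[float], reference: Sequence[float],
                delay_samples: int, taps: int = 4, mu: float = 0.5,
                eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively filtered, delay-aligned reference from the original."""
    if len(original) != len(reference):
        raise ValueError("original and reference must have the same length")
    aligned = shift_reference(reference, delay_samples)
    weights = [0.0] * taps
    output: List[float] = []
    for n, sample in enumerate(original):
        # Most recent `taps` aligned reference samples, newest first.
        window = [aligned[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(w * x for w, x in zip(weights, window))
        error = sample - echo_estimate  # echo-cancelled output sample
        power = sum(x * x for x in window) + eps
        # NLMS update: step along the input window, normalized by its power.
        weights = [w + mu * error * x / power for w, x in zip(weights, window)]
        output.append(error)
    return output
```

Aligning the reference first lets a short filter cover the echo path; with a wrong delay, the same filter would have to span the whole misalignment, which is one way to see why an inaccurate delay degrades cancellation.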

Here, in step 401, the time delay between the reference signal played by the second voice interaction apparatus and the collected echo signal corresponding to the reference signal is estimated when the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus. In step 402, this time delay can therefore be used to cancel the echo signal in the original signal collected by the second voice interaction apparatus. This avoids the poor echo cancellation effect caused by an inaccurate time delay, which would arise if, after the change from the first voice interaction apparatus to the second voice interaction apparatus, the time delay measured for the first voice interaction apparatus were used to cancel the echo signal in the original signal collected by the second voice interaction apparatus.

In the echo cancellation method provided in this embodiment, when the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus, the computing device estimates the time delay between the reference signal played by the second voice interaction apparatus and the collected echo signal corresponding to the reference signal, and cancels the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay. In this way, when the voice interaction apparatus used by the computing device changes, the time delay of the changed voice interaction apparatus can be estimated in time, and the echo signal in the original signal collected by the changed voice interaction apparatus is cancelled based on the estimated time delay. This avoids the poor echo cancellation effect caused by using a default time delay. It also avoids the poor echo cancellation effect caused by an inaccurate time delay when the time delay of the voice interaction apparatus used before the change (i.e., the first voice interaction apparatus) is applied to cancel the echo signal in the original signal collected by the voice interaction apparatus used after the change (i.e., the second voice interaction apparatus). The echo cancellation effect is thereby improved.

Fig. 5 is a schematic flowchart of a second echo cancellation method according to an embodiment of the present disclosure. On the basis of the embodiment shown in fig. 4, this embodiment mainly describes an optional implementation in which, when the voice interaction apparatus changes, the computing device estimates the time delay between the reference signal played by the second voice interaction apparatus and the collected echo signal corresponding to the reference signal.

Step 501, determining whether a connection object of a computing device changes.

In this step, if the connection object of the computing device changes, it may indicate that the voice interaction apparatus changes, that is, the voice interaction apparatus used by the computing device changes from the first voice interaction apparatus to the second voice interaction apparatus. If the connection object of the computing device is not changed, it may indicate that the voice interaction apparatus is not changed, that is, the voice interaction apparatus used by the computing device is not changed from the first voice interaction apparatus to the second voice interaction apparatus.

The first voice interaction device can be understood as a voice interaction device used before the voice interaction device used by the computing equipment is changed. The second voice interaction device can be understood as a voice interaction device used by the computing equipment after being changed.

Optionally, the change of the connection object of the computing device may specifically be a change between two states: the computing device being connected with a target device, and the computing device not being connected with the target device.

Specifically, if the computing device is changed from being connected with a target device to being unconnected with the target device, a voice interaction apparatus used by the computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus, the target device includes the first voice interaction apparatus, and the computing device includes the second voice interaction apparatus; or, if the computing device is changed from being not connected with the target device to being connected with the target device, the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus, the computing device includes the first voice interaction apparatus, and the target device includes the second voice interaction apparatus.

For example, as shown in fig. 2, when the computing device 11 is connected to the first target device 12, the computing device 11 may perform voice interaction with the user using the voice interaction apparatus b of the first target device 12; when the computing device 11 is not connected to the first target device 12, the computing device 11 may perform voice interaction with the user using the voice interaction apparatus a of the computing device 11. Therefore, a change in the connection state between the computing device 11 and the first target device 12 may indicate that the voice interaction apparatus used by the computing device changes from the first voice interaction apparatus to the second voice interaction apparatus. Specifically, when the computing device 11 changes from being connected to the first target device 12 to not being connected to the first target device 12, the voice interaction apparatus b may be regarded as the first voice interaction apparatus, and the voice interaction apparatus a may be regarded as the second voice interaction apparatus; when the computing device 11 changes from not being connected to the first target device 12 to being connected to the first target device 12, the voice interaction apparatus a may be regarded as the first voice interaction apparatus, and the voice interaction apparatus b may be regarded as the second voice interaction apparatus.

It should be noted that the target device may specifically be a device with which the computing device 11 can establish a connection and part of whose hardware the computing device 11 can control, where the part of the hardware includes a voice interaction apparatus. For example, the target device may be a vehicle; in this case, the computing device may be a computing device that supports a specific function, namely the function of establishing a connection with the target device and controlling part of the hardware of the target device.

Alternatively, the change of the connection object of the computing device may specifically be a change between two states: the computing device being connected to one target device, and the computing device being connected to another target device. Specifically, if the computing device changes from being connected with a first target device to being connected with a second target device, the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus, where the first target device includes the first voice interaction apparatus, and the second target device includes the second voice interaction apparatus.

For example, as shown in fig. 3, when the computing device 11 is connected to the first target device 12, the computing device 11 may perform voice interaction with the user using the voice interaction apparatus a of the first target device 12; when the computing device 11 is connected to the second target device 13, the computing device 11 may perform voice interaction with the user using the voice interaction apparatus b of the second target device 13. Therefore, a change in which of the first target device 12 and the second target device 13 the computing device 11 is connected to may indicate that the voice interaction apparatus used by the computing device changes from the first voice interaction apparatus to the second voice interaction apparatus. Specifically, when the computing device 11 changes from being connected to the first target device 12 to being connected to the second target device 13, the voice interaction apparatus a may be regarded as the first voice interaction apparatus, and the voice interaction apparatus b may be regarded as the second voice interaction apparatus; when the computing device 11 changes from being connected to the second target device 13 to being connected to the first target device 12, the voice interaction apparatus b may be regarded as the first voice interaction apparatus, and the voice interaction apparatus a may be regarded as the second voice interaction apparatus.

If the connection object of the computing device changes, step 502 is executed; if the connection object of the computing device does not change, the process ends.
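The decision in step 501 can be illustrated with a minimal Python sketch. It assumes the connection object is represented by an optional identifier string (None when no target device is connected); the names LOCAL, active_apparatus and detect_change are hypothetical and are not taken from the disclosure.

```python
from typing import Optional, Tuple

# Hypothetical label for the voice interaction apparatus built into the computing device.
LOCAL = "computing_device"


def active_apparatus(connected_target: Optional[str]) -> str:
    """Return which device's voice interaction apparatus is in use.

    Assumption: when a target device is connected, its apparatus is used;
    otherwise the computing device falls back to its own apparatus.
    """
    return connected_target if connected_target is not None else LOCAL


def detect_change(prev_target: Optional[str],
                  curr_target: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (first, second) apparatus owners if the connection object changed.

    Returns None when the connection object is unchanged, in which case
    the process ends and no new time delay is estimated.
    """
    if prev_target == curr_target:
        return None
    return active_apparatus(prev_target), active_apparatus(curr_target)
```

For instance, detect_change("vehicle_12", None) would describe the fig. 2 case in which the computing device disconnects from the target device and falls back to its own apparatus.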

Step 502, the computing device estimates a time delay between a reference signal played by a second voice interaction device and an echo signal corresponding to the acquired reference signal.

In this step, the second voice interaction apparatus may be understood as a voice interaction apparatus currently used by the computing device. Optionally, the time delay may be determined by:

Step A, the computing device determines, according to a plurality of first time points and a plurality of second time points in one-to-one correspondence with the plurality of first time points, a time difference between each first time point and its corresponding second time point, to obtain a plurality of time differences, wherein the first time point is a time point at which the second voice interaction apparatus plays the reference signal, and the second time point is a time point at which the second voice interaction apparatus acquires the echo signal corresponding to the reference signal played at the corresponding first time point.

Here, in order to avoid the problem that the determined time delay is inaccurate due to inaccuracy of a single time difference, optionally, a plurality of time differences may be obtained according to the plurality of first time points and the plurality of second time points. For example, the computing device may record a time point 1 (which may be understood as a first time point) at which the speech signal x is played (which may be understood as a reference signal), collect an original signal, and record a time point 2 at which the original signal is collected, where if the speech signal x is included in the original signal, the time point 2 is a second time point corresponding to the time point 1, and further, may obtain a time difference between the time point 2 and the time point 1. For another example, when playing the voice signal y (which may be understood as a reference signal), the computing device may record a time point 3 (which may be understood as a first time point) at which the voice signal y is played, collect an original signal, and record a time point 4 at which the original signal is collected, where if the original signal includes the voice signal y, the time point 4 is a second time point corresponding to the time point 3, and further, may obtain a time difference between the time point 4 and the time point 3.
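Step A can be sketched in Python as follows. This is only an illustration under the assumption that the first and second time points are already paired one to one and expressed in the same unit; the function name time_differences is hypothetical.

```python
from typing import List, Sequence


def time_differences(first_points: Sequence[float],
                     second_points: Sequence[float]) -> List[float]:
    """Compute one time difference per (play time, capture time) pair.

    first_points[i] is the time the reference signal was played and
    second_points[i] is the time the corresponding echo was captured.
    """
    if len(first_points) != len(second_points):
        raise ValueError("time points must correspond one to one")
    return [t2 - t1 for t1, t2 in zip(first_points, second_points)]
```

With time point 1 and time point 3 as first_points and time point 2 and time point 4 as second_points, the result holds the two time differences described above.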

It should be noted that the present disclosure does not limit the specific manner of determining that the acquired original signal includes the reference signal.

Step B, the computing device determines the time delay between the reference signal and the echo signal according to the plurality of time differences.

Specifically, the time delay between the reference signal and the echo signal may be obtained by performing a mathematical calculation on the plurality of time differences; for example, the time delay may be obtained by averaging the plurality of time differences. Optionally, an estimation algorithm may be used when obtaining the time delay from the time differences. Further optionally, step B may specifically include: the computing device determines the time delay between the reference signal and the echo signal according to the plurality of time differences and a preset estimation algorithm.
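The averaging option mentioned for step B might look like the following sketch. The function name delay_by_average is hypothetical, and the empty-input check is an added assumption rather than something the disclosure specifies.

```python
from typing import Sequence


def delay_by_average(diffs: Sequence[float]) -> float:
    """Estimate the time delay as the arithmetic mean of the time differences."""
    if not diffs:
        raise ValueError("at least one time difference is required")
    return sum(diffs) / len(diffs)
```

Averaging several time differences reduces the influence of any single inaccurate measurement, which is the motivation given for collecting a plurality of time differences.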

Illustratively, the preset estimation algorithm is a Least-Mean-Square (LMS) algorithm. Here, because the preset estimation algorithm is an LMS algorithm, the time delay is determined from the plurality of time differences in a machine learning manner, which improves the accuracy of the time delay determination.
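The disclosure does not state how the LMS algorithm is applied to the time differences. One possible reading, shown here purely as an assumption, treats each time difference as an observation of a constant delay and applies a single-weight LMS update with a constant input of 1. The names delay_by_lms, step_size and initial are hypothetical.

```python
from typing import Iterable


def delay_by_lms(diffs: Iterable[float],
                 step_size: float = 0.5,
                 initial: float = 0.0) -> float:
    """Single-weight LMS estimate of a constant delay.

    Model (assumed): observed_diff = weight * 1 + noise, so the weight is
    the delay estimate. Each update moves the estimate toward the newest
    observation by step_size times the prediction error.
    """
    estimate = initial
    for observed in diffs:
        error = observed - estimate   # prediction error for input x = 1
        estimate += step_size * error  # LMS update: w += mu * e * x
    return estimate
```

A smaller step_size smooths out noisy time differences at the cost of slower convergence; the value used here is illustrative only.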

Step 503, the computing device eliminates the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay.

In this step, optionally, the echo signal in the acquired original signal may be cancelled by using an AEC algorithm. Specifically, step 503 may include: the computing device uses an Acoustic Echo Cancellation (AEC) algorithm to cancel the echo signal in the original signal acquired by the second voice interaction apparatus according to the estimated time delay.
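The disclosure does not fix a particular AEC algorithm. As one common illustration, the sketch below aligns the reference signal by the estimated time delay and then subtracts an adaptively filtered copy of it using a normalized LMS (NLMS) filter. Expressing the delay in samples, the filter length taps and the name cancel_echo are all assumptions made for this example.

```python
from typing import List, Sequence


def cancel_echo(mic: Sequence[float],
                ref: Sequence[float],
                delay_samples: int,
                taps: int = 4,
                step_size: float = 0.5,
                eps: float = 1e-8) -> List[float]:
    """Remove the echo of ref from mic using a delay-aligned NLMS filter.

    mic: original signal collected by the microphone.
    ref: reference signal that was played back.
    delay_samples: estimated time delay, converted to samples (assumed).
    Returns the residual signal after echo cancellation.
    """
    weights = [0.0] * taps
    residual: List[float] = []
    for n in range(len(mic)):
        # Reference samples shifted by the estimated delay; zero outside range.
        x = []
        for k in range(taps):
            idx = n - delay_samples - k
            x.append(ref[idx] if 0 <= idx < len(ref) else 0.0)
        echo_estimate = sum(w * xi for w, xi in zip(weights, x))
        error = mic[n] - echo_estimate
        # Normalized update keeps adaptation stable for varying signal levels.
        norm = sum(xi * xi for xi in x) + eps
        weights = [w + step_size * error * xi / norm
                   for w, xi in zip(weights, x)]
        residual.append(error)
    return residual
```

Because the filter has only a few taps, it can model the echo only if delay_samples is close to the true delay, which illustrates why an accurate time delay estimate matters for the cancellation effect.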

Considering that, once an AEC algorithm is determined, the delay range to which it is applicable is fixed, and in order to avoid a poor echo cancellation effect caused by the determined time delay falling outside that range, optionally, step 503 may specifically include: the computing device determines whether the time delay is within a preset time delay range; if the time delay is within the time delay range, the echo signal in the original signal collected by the second voice interaction apparatus is cancelled according to the time delay; and if the time delay is not within the time delay range, the echo signal in the original signal collected by the second voice interaction apparatus is cancelled according to a time delay within the time delay range.
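The range check can be sketched as below. The disclosure only says that a time delay within the range is used when the estimate falls outside it; choosing the nearest bound of the range is an assumption of this example, and the names delay_for_cancellation, low and high are hypothetical.

```python
def delay_for_cancellation(estimated: float, low: float, high: float) -> float:
    """Return the time delay to pass to the AEC algorithm.

    If the estimate lies inside [low, high] it is used unchanged;
    otherwise the nearest bound is used (an assumed fallback policy).
    """
    if low > high:
        raise ValueError("invalid time delay range")
    if low <= estimated <= high:
        return estimated
    return min(max(estimated, low), high)
```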

In the echo cancellation method provided in this embodiment, it is determined whether a connection object of the computing device changes; if the connection object changes, the computing device estimates a time delay between a reference signal played by the second voice interaction apparatus and the acquired echo signal corresponding to the reference signal, and cancels the echo signal in the acquired original signal according to the estimated time delay. In this way, a change in the connection object of the computing device is used to indicate that the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus.

Fig. 6 is a schematic flowchart of a third echo cancellation method according to an embodiment of the present disclosure. On the basis of the foregoing embodiments, the present embodiment mainly describes an alternative implementation manner after performing echo cancellation. As shown in fig. 6, the method of this embodiment may include:

Step 601, when the voice interaction apparatus used by the computing device is changed from the first voice interaction apparatus to the second voice interaction apparatus, the computing device estimates the time delay between the reference signal played by the second voice interaction apparatus and the acquired echo signal corresponding to the reference signal.

It should be noted that step 601 is similar to step 401, and is not described herein again.

Step 602, the computing device eliminates the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay.

It should be noted that step 602 is similar to step 402, and is not described herein again.

Step 603, performing voice recognition on the voice signal obtained after the elimination to obtain a voice recognition result.

In this step, the speech recognition result may be, for example, "power on", "weather", or the like. The present disclosure is not limited to a specific embodiment of performing speech recognition on a speech signal obtained after cancellation.

Since the echo cancellation effect can be improved in step 601 and step 602, the accuracy of the speech signal on which the speech recognition is performed in step 603 is higher, so that the accuracy of the speech recognition result can be improved.

Step 604, performing subsequent processing according to the voice recognition result.

In this step, after the voice recognition result is obtained, certain processing may be performed based on the voice recognition result. Here, the present disclosure may not be limited as to the type of processing, and the subsequent processing may include, for example, a wake-up processing and/or an output processing.

For the wake-up processing, for example, it may be determined whether the voice recognition result is the same as a preset wake-up instruction, and if the voice recognition result is the same as the preset wake-up instruction, the application program of the computing device corresponding to the preset wake-up instruction is woken up. For the output processing, for example, the voice recognition result may be output in a text box of an input interface.
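The two kinds of subsequent processing can be illustrated with the following sketch. It assumes that wake-up instructions are exact strings mapped to callables that wake the corresponding application, and that the text box is modelled as a list of strings; handle_recognition and its parameters are hypothetical names. The sketch performs either wake-up or output, although the disclosure also allows both.

```python
from typing import Callable, Dict, List


def handle_recognition(result: str,
                       wake_instructions: Dict[str, Callable[[], None]],
                       text_box: List[str]) -> bool:
    """Dispatch a voice recognition result.

    Returns True if the result matched a preset wake-up instruction and
    the corresponding application was woken up; otherwise the result is
    written to the text box and False is returned.
    """
    wake_app = wake_instructions.get(result)
    if wake_app is not None:
        wake_app()
        return True
    text_box.append(result)
    return False
```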

In the echo cancellation method provided by this embodiment, when the voice interaction device used by the computing device is changed from the first voice interaction device to the second voice interaction device, the computing device estimates a time delay between the reference signal played by the second voice interaction device and the echo signal corresponding to the acquired reference signal, and according to the estimated time delay, the computing device cancels the echo signal in the acquired original signal, performs voice recognition on the voice signal obtained after cancellation to obtain a voice recognition result, and performs subsequent processing according to the voice recognition result.

Fig. 7 is a schematic structural diagram of a first embodiment of an echo cancellation device according to the present disclosure, where the device provided in this embodiment may be applied to the foregoing method embodiment to implement the function of a computing device thereof. As shown in fig. 7, the apparatus of the present embodiment may include: an estimation module 701 and a cancellation module 702.

The estimation module 701 is configured to estimate a time delay between a reference signal played by a second voice interaction apparatus and an acquired echo signal corresponding to the reference signal when a voice interaction apparatus used by the computing device is changed from a first voice interaction apparatus to the second voice interaction apparatus;

a cancellation module 702, configured to cancel, according to the estimated time delay, the echo signal in the original signal collected by the second voice interaction apparatus.

In a possible implementation, if a connection object of the computing device changes, a voice interaction apparatus used by the computing device is changed from a first voice interaction apparatus to a second voice interaction apparatus.

In one possible implementation, the target device is a vehicle.

In one possible implementation, the estimation module 701 is specifically configured to:

In a possible implementation, the estimation module 701 being configured to determine, according to the plurality of time differences, the time delay between the reference signal and the echo signal specifically includes:

In one possible implementation, the cancellation module 702 is specifically configured to:

determine whether the time delay is within a preset time delay range;

In a possible implementation, the cancellation module 702 cancelling the echo signal in the original signal collected by the second voice interaction apparatus according to the time delay specifically includes:

In one possible implementation, the apparatus further comprises: a response module 703;

the response module 703 is configured to: perform voice recognition on the voice signal obtained after the cancellation to obtain a voice recognition result; and perform subsequent processing according to the voice recognition result.

The apparatus of this embodiment may be configured to implement the technical solutions of the embodiments shown in the foregoing methods, and the implementation principles and technical effects are similar, which are not described herein again.

Fig. 8 is a schematic structural diagram of a second echo cancellation device according to an embodiment of the present disclosure, and as shown in fig. 8, the device may include: a processor 801 and a memory 802 for storing computer instructions.

Wherein, the processor 801 executes the computer instructions to execute the following method:

In one possible implementation, the target device is a vehicle.

In one possible implementation, the method for estimating, by the computing device, a time delay between a played reference signal and a collected echo signal corresponding to the reference signal includes:

and the computing equipment adopts an Acoustic Echo Cancellation (AEC) algorithm to cancel the echo signal in the original signal acquired by the second voice interaction device according to the time delay obtained by estimation.

In a possible implementation, after the computing device cancels the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay, the method further includes:

and performing subsequent processing according to the voice recognition result.

The embodiments of the present disclosure also provide a computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an echo cancellation device, enable the echo cancellation device to perform an echo cancellation method, the method comprising:

when a voice interaction device used by a computing device is changed from a first voice interaction device to a second voice interaction device, the computing device estimates the time delay between a reference signal played by the second voice interaction device and an echo signal corresponding to the acquired reference signal, wherein the voice interaction device is used for voice interaction between a user and the computing device;

In one possible implementation, the target device is a vehicle.

and performing subsequent processing according to the voice recognition result.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims

1. An echo cancellation method, comprising:

a computing device estimates a time delay between a reference signal played by a second voice interaction apparatus and an acquired echo signal corresponding to the reference signal, wherein the second voice interaction apparatus is the voice interaction apparatus currently used by the computing device;

2. The method of claim 1, wherein the computing device estimating a time delay between a reference signal played by a second voice interaction device and the acquired echo signal corresponding to the reference signal comprises:

3. The method of claim 2, wherein the computing device determining the time delay of the reference signal and the echo signal from the plurality of time differences comprises:

4. The method according to any one of claims 1 to 3, wherein the computing device cancels the echo signal in the original signal collected by the second voice interaction apparatus according to the estimated time delay, and the method includes:

5. The method of any one of claims 1 to 4, wherein before the computing device estimates the time delay between the reference signal played by the second voice interaction apparatus and the acquired echo signal corresponding to the reference signal, the method further comprises:

the computing device determines the second voice interaction device as the currently used voice interaction device.

6. The method of claim 5, wherein the computing device determining that the currently used voice interaction apparatus is the second voice interaction apparatus comprises at least one of:

if the computing device plays and collects voice through the second voice interaction apparatus, the computing device determines the second voice interaction apparatus as the currently used voice interaction apparatus; or,

if the computing device changes from being connected with a target device to not being connected with the target device, the computing device determines the second voice interaction apparatus as the currently used voice interaction apparatus, wherein the target device comprises a first voice interaction apparatus, and the computing device comprises the second voice interaction apparatus; or,

if the computing device changes from not being connected with the target device to being connected with the target device, the computing device determines the second voice interaction apparatus as the currently used voice interaction apparatus, wherein the computing device comprises a first voice interaction apparatus, and the target device comprises the second voice interaction apparatus; or,

if the computing device changes from being connected with a first target device to being connected with a second target device, the computing device determines the second voice interaction apparatus as the currently used voice interaction apparatus, wherein the first target device comprises the first voice interaction apparatus, and the second target device comprises the second voice interaction apparatus.

7. An echo cancellation device, comprising:

the estimation module is used for estimating the time delay between a reference signal played by a second voice interaction device and an acquired echo signal corresponding to the reference signal, wherein the second voice interaction device is a voice interaction device currently used by the computing equipment;

8. The apparatus of claim 7, wherein the estimation module is specifically configured to:

9. The apparatus of claim 8, wherein the estimation module is specifically configured to:

10. The apparatus of any one of claims 7 to 9, wherein the cancellation module is specifically configured to:

judging whether the time delay is within a preset time delay range or not;

11. The apparatus of any one of claims 7 to 10, wherein the estimation module is further configured to:

and determining the second voice interaction device as the currently used voice interaction device.

12. The apparatus of claim 11, wherein the estimation module is specifically configured to perform at least one of:

if the computing device plays and collects voice through the second voice interaction apparatus, determining the second voice interaction apparatus as the currently used voice interaction apparatus; or,

if the computing device changes from being connected with a target device to not being connected with the target device, determining the second voice interaction apparatus as the currently used voice interaction apparatus, wherein the target device comprises a first voice interaction apparatus, and the computing device comprises the second voice interaction apparatus; or,

if the computing device changes from not being connected with the target device to being connected with the target device, determining the second voice interaction apparatus as the currently used voice interaction apparatus, wherein the computing device comprises a first voice interaction apparatus, and the target device comprises the second voice interaction apparatus; or,

if the computing device changes from being connected with a first target device to being connected with a second target device, determining the second voice interaction apparatus as the currently used voice interaction apparatus, wherein the first target device comprises the first voice interaction apparatus, and the second target device comprises the second voice interaction apparatus.

13. An echo cancellation device, comprising:

a processor and a memory for storing computer instructions; the processor executes the computer instructions to perform the method of any of claims 1-6.

14. A computer-readable storage medium having instructions that, when executed by a processor of an echo cancellation device, enable the echo cancellation device to perform the method of any of claims 1-6.

15. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 6.