CN117278677A

CN117278677A - Echo cancellation method, apparatus, device and computer readable storage medium

Info

Publication number: CN117278677A
Application number: CN202310982457.6A
Authority: CN
Inventors: 刘兵兵; 吴劼; 侯天峰; 吕和强; 毛婷婷; 王敦聪; 刘哲; 何冲; 袁斌
Original assignee: Goertek Techology Co Ltd
Current assignee: Goertek Techology Co Ltd
Priority date: 2023-08-04
Filing date: 2023-08-04
Publication date: 2023-12-22

Abstract

The invention relates to the technical field of noise reduction, and in particular to an echo cancellation method, apparatus, device and computer readable storage medium. The echo cancellation method is applied to a call device on which a bone voiceprint sensor is arranged, and comprises the following steps: picking up a bone conduction sound signal through the bone voiceprint sensor, and estimating, based on the bone conduction sound signal, an estimated voice signal picked up by a microphone on the call device; picking up a microphone sound signal through the microphone, and subtracting the estimated voice signal from the microphone sound signal to obtain an estimated noise signal; acquiring a far-end sound signal, and estimating, based on the far-end sound signal, an estimated echo signal corresponding to the far-end sound signal; and superposing the estimated noise signal and the estimated echo signal to obtain a reference signal for echo cancellation, performing echo cancellation on the microphone sound signal using the reference signal to obtain an echo cancellation signal, and taking the echo cancellation signal as the call sound signal. The invention thereby improves the call effect.

Description

Echo cancellation method, apparatus, device and computer readable storage medium

Technical Field

The present invention relates to the field of noise reduction technologies, and in particular, to an echo cancellation method, apparatus, device, and computer readable storage medium.

Background

With the development of science and technology, audio and video calls have become an important means of remote communication. During an audio or video call, the call sound signal from the far-end call terminal is transmitted to the loudspeaker of the near-end call terminal and played. The played sound is reflected in the environment, and the reflected sound is picked up by the microphone of the near-end call terminal and transmitted back to the far-end call terminal together with the near-end user's call sound signal. As a result, the far-end user hears an echo of his or her own voice during the call.

Conventionally, the echo signal in the sound signal picked up by the microphone is cancelled using an estimated echo signal. However, because the microphone and the loudspeaker on a call device are relatively close to each other, the microphone picks up not only the echo signal but also the signal played directly by the loudspeaker. Unwanted sound therefore remains in the microphone sound signal after processing by the echo cancellation algorithm, which lowers the audio quality of the call sound signal and degrades the call effect.

Disclosure of Invention

The invention mainly aims to provide an echo cancellation method, an echo cancellation device, echo cancellation equipment and a computer readable storage medium, aiming to improve the echo cancellation effect and the conversation effect.

In order to achieve the above object, the present invention provides an echo cancellation method applied to a call device, on which a bone voiceprint sensor is disposed, the echo cancellation method comprising the steps of:

picking up a bone conduction sound signal through the bone voiceprint sensor, and estimating, based on the bone conduction sound signal, an estimated voice signal picked up by a microphone on the call device;

picking up a microphone sound signal by the microphone and subtracting the estimated voice signal from the microphone sound signal to obtain an estimated noise signal;

acquiring a far-end sound signal, and estimating an estimated echo signal corresponding to the far-end sound signal based on the far-end sound signal;

and superposing the estimated noise signal and the estimated echo signal to obtain a reference signal for echo cancellation, performing echo cancellation on the microphone sound signal using the reference signal to obtain an echo cancellation signal, and taking the echo cancellation signal as a call sound signal.

Optionally, the step of estimating an estimated voice signal picked up by a microphone on the call device based on the bone conduction sound signal includes:

estimating a first frequency band voice signal picked up by the microphone on the call device based on the bone conduction sound signal, wherein the first frequency band is the frequency band corresponding to the first frequency band voice signal and is consistent with the frequency band of the bone conduction sound signal;

estimating a second frequency band voice signal picked up by the microphone based on the first frequency band voice signal and a preset equal-loudness curve, wherein the second frequency band corresponding to the second frequency band voice signal is the part of the pickup frequency band of the microphone outside the first frequency band;

and combining the first frequency band voice signal and the second frequency band voice signal to obtain an estimated voice signal.

Optionally, the step of estimating an estimated echo signal corresponding to the far-end sound signal based on the far-end sound signal includes:

estimating a first frequency band echo signal corresponding to the far-end sound signal in the first frequency band based on the far-end sound signal;

determining a loudness difference between the first frequency band echo signal and the first frequency band speech signal;

estimating a second frequency band echo signal corresponding to the far-end sound signal in the second frequency band based on the loudness difference value and the second frequency band voice signal;

and combining the first frequency band echo signal and the second frequency band echo signal to obtain an estimated echo signal.
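The loudness-difference step above can be illustrated with a minimal sketch. It assumes, purely for illustration, that loudness is approximated by the mean-square level in dB and that the second frequency band echo is obtained by applying the resulting gain to the second frequency band voice signal; the function names `level_db` and `estimate_second_band_echo` are hypothetical and do not come from the source.

```python
import numpy as np

def level_db(x, eps=1e-12):
    """Mean-square level in dB, used here as a simple stand-in for loudness (assumption)."""
    x = np.asarray(x, dtype=float)
    return 10.0 * np.log10(np.mean(np.square(x)) + eps)

def estimate_second_band_echo(echo_band1, voice_band1, voice_band2):
    """Apply the first-band echo/voice level difference to the second-band voice signal."""
    # Loudness difference between first-band echo and first-band voice, in dB
    diff_db = level_db(echo_band1) - level_db(voice_band1)
    # Convert the dB difference to a linear amplitude gain
    gain = 10.0 ** (diff_db / 20.0)
    return gain * np.asarray(voice_band2, dtype=float), diff_db
```

A perceptual loudness model could replace `level_db` without changing the structure of the sketch.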

Optionally, the step of superposing the estimated noise signal and the estimated echo signal to obtain an echo cancellation reference signal includes:

performing weighted summation on the estimated noise signal and the estimated echo signal using a preset weight group to obtain an echo cancellation reference signal.

Optionally, after the step of performing echo cancellation on the microphone sound signal by using the reference signal to obtain an echo cancellation signal, the method further includes:

updating to obtain a voice signal estimation coefficient of the next moment based on the estimated noise signal, the bone conduction voice signal and the voice signal estimation coefficient of the current moment;

and updating the echo signal estimation coefficient at the next moment based on the echo cancellation signal, the far-end sound signal and the echo signal estimation coefficient at the current moment.

and superposing the bone conduction sound signal and the echo cancellation signal to obtain a near-end sound signal, and taking the near-end sound signal as a call sound signal.

Optionally, the step of superimposing the bone conduction sound signal and the echo cancellation signal to obtain a near-end sound signal includes:

dividing the bone conduction sound signal into at least one bone conduction subband signal according to a preset bandwidth, and dividing the echo cancellation signal into at least one echo cancellation subband signal according to the preset bandwidth;

determining a first sub-band signal from the bone conduction sub-band signals, and determining a second sub-band signal from the echo cancellation sub-band signals, wherein the signal energy of the first sub-band signal is greater than that of the echo cancellation sub-band signal in the same frequency band, and the signal energy of the second sub-band signal is greater than that of the bone conduction sub-band signal in the same frequency band;

and combining the first sub-band signal and the second sub-band signal to obtain a near-end sound signal.

In order to achieve the above object, the present invention further provides an echo cancellation device, where the echo cancellation device is disposed on a call device, and a bone voiceprint sensor is disposed on the call device, and the echo cancellation device includes:

a voice estimation module for picking up bone conduction sound signals through the bone voiceprint sensor and estimating estimated voice signals picked up by a microphone on the call device based on the bone conduction sound signals;

a noise estimation module for picking up a microphone sound signal through the microphone and subtracting the estimated voice signal from the microphone sound signal to obtain an estimated noise signal;

an echo estimation module for acquiring a far-end sound signal and estimating an estimated echo signal corresponding to the far-end sound signal based on the far-end sound signal;

and an echo cancellation module for superposing the estimated noise signal and the estimated echo signal to obtain a reference signal for echo cancellation, performing echo cancellation on the microphone sound signal using the reference signal to obtain an echo cancellation signal, and taking the echo cancellation signal as a call sound signal.

To achieve the above object, the present invention also provides an echo cancellation device including: a memory, a processor, and an echo cancellation program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the echo cancellation method as described above.

In addition, in order to achieve the above object, the present invention also proposes a computer-readable storage medium having stored thereon an echo cancellation program which, when executed by a processor, implements the steps of the echo cancellation method as described above.

In the invention, the bone conduction sound signal is picked up by the bone voiceprint sensor, and the estimated voice signal picked up by the microphone on the call device is estimated based on the bone conduction sound signal.

A microphone sound signal is picked up by the microphone, and the estimated voice signal is subtracted from the microphone sound signal to obtain an estimated noise signal, which at least comprises the loudspeaker-played sound signal picked up by the microphone and the echo signal in the environment. A far-end sound signal is acquired, and an estimated echo signal corresponding to the far-end sound signal is estimated based on the far-end sound signal. The estimated noise signal and the estimated echo signal are superposed to obtain a reference signal for echo cancellation, echo cancellation is performed on the microphone sound signal using the reference signal to obtain an echo cancellation signal, and the echo cancellation signal is taken as the call sound signal.

In the invention, the reference signal for echo cancellation includes both the echo signal of the far-end sound signal and the loudspeaker-played sound signal picked up by the microphone. Compared with a reference signal obtained only by estimation from the far-end sound signal, this suppresses the non-voice components of the microphone sound signal as far as possible and raises the signal-to-noise ratio of the echo-cancelled sound signal, thereby improving the signal-to-noise ratio of the call sound signal, the echo cancellation effect and the call effect.

Drawings

FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of a first embodiment of the echo cancellation method according to the present invention;

FIG. 3 is a flowchart of an echo cancellation method according to an embodiment of the present invention;

FIG. 4 is a schematic functional block diagram of an echo cancellation device according to a preferred embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to fig. 1, fig. 1 is a schematic device structure of a hardware running environment according to an embodiment of the present invention.

It should be noted that the echo cancellation device of the embodiment of the present invention may be a call device that provides a call function, for example an earphone, smart glasses, a head-mounted display device, a smart phone or a personal computer, or may be a device that establishes a communication connection with the call device, for example a server, which is not limited herein.

As shown in FIG. 1, the echo cancellation device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001.

It will be appreciated by those skilled in the art that the device structure shown in fig. 1 does not constitute a limitation of the echo cancellation device, and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.

As shown in fig. 1, an operating system, a network communication module, a user interface module, and an echo cancellation program may be included in a memory 1005, which is a type of computer storage medium. An operating system is a program that manages and controls the hardware and software resources of the device, supporting the execution of echo cancellation programs, as well as other software or programs. In the device shown in fig. 1, the user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used for establishing communication connection with a server; and the processor 1001 may be configured to call an echo cancellation program stored in the memory 1005 and perform the following operations:

Further, the step of estimating an estimated speech signal picked up by a microphone on the telephony device based on the bone conduction sound signal includes:

Further, the step of estimating an estimated echo signal corresponding to the far-end sound signal based on the far-end sound signal includes:

Further, the step of superposing the estimated noise signal and the estimated echo signal to obtain an echo cancellation reference signal includes:

Further, after the step of performing echo cancellation on the microphone sound signal by using the reference signal to obtain an echo cancellation signal, the processor 1001 may be further configured to invoke an echo cancellation program stored in the memory 1005 to perform the following operations:

Further, the step of superimposing the bone conduction sound signal and the echo cancellation signal to obtain a near-end sound signal includes:

Further, before the step of dividing the input audio signal into a high frequency signal and a low frequency signal according to a preset frequency, the processor 1001 may be further configured to invoke an echo cancellation program stored in the memory 1005 to perform the following operations:

Based on the above-described structure, various embodiments of an echo cancellation method are presented.

Referring to fig. 2, fig. 2 is a flowchart of a first embodiment of the echo cancellation method according to the present invention.

This embodiment of the present invention provides an embodiment of the echo cancellation method. It should be noted that although a logical sequence is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order. In this embodiment, the executing body of the echo cancellation method may be a call device, such as an earphone, smart glasses, a head-mounted display device, a smart phone or a personal computer, or a device that establishes a communication connection with the call device, such as a server, which is not limited in this embodiment; the executing body is omitted below for convenience of description. In this embodiment, the echo cancellation method is applied to a call device on which a bone voiceprint sensor is disposed, and the echo cancellation method includes:

Step S10, picking up bone conduction sound signals through the bone voiceprint sensor, and estimating estimated voice signals picked up by a microphone on the communication equipment based on the bone conduction sound signals;

In this embodiment, the users at the two ends of a call are referred to as the near-end call user and the far-end call user respectively; the terminal device of the near-end call user is referred to as the near-end call terminal, and that of the far-end call user as the far-end call terminal. When the two users are in a call, echo cancellation needs to be performed on the users' call sound signals to improve the call effect. In this embodiment, in addition to estimating an echo signal from the far-end sound signal, the bone conduction sound signal is used to estimate the voice signal and obtain an estimated voice signal; an estimated noise signal in the sound signal picked up by the microphone is determined from the estimated voice signal; echo cancellation is performed based on the estimated echo signal and the estimated noise signal; and the echo-cancelled sound signal is used as the call sound signal, which raises the signal-to-noise ratio of the call sound signal and thereby improves the call effect.

Specifically, in the present embodiment, a bone conduction sound signal is picked up by the bone voiceprint sensor provided on the call device. The bone conduction sound signal may or may not include the voice of the near-end call user. Therefore, in a possible embodiment, indices such as the signal energy and frequency of the bone conduction sound signal may be detected, and the operation of estimating the voice signal is performed when it is determined that the bone conduction sound signal includes the sound of the user speaking; if the bone conduction sound signal does not include the sound of the user speaking, the estimated echo signal of the far-end sound signal is used directly as the reference signal for echo cancellation.
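As a minimal sketch of the detection mentioned above, the check below uses only frame energy and a caller-supplied threshold. The function name `bone_signal_has_speech` and the parameter `energy_threshold` are hypothetical; the source does not specify the detection rule or any threshold value.

```python
import numpy as np

def bone_signal_has_speech(bone_frame, energy_threshold):
    """Return True when the mean-square energy of the frame exceeds the threshold.

    Assumption: energy alone is used; the source also mentions frequency
    and other indices, which are not modelled here.
    """
    frame = np.asarray(bone_frame, dtype=float)
    energy = float(np.mean(np.square(frame)))
    return energy > energy_threshold
```

In practice the threshold would have to be tuned to the sensor, since bone voiceprint sensors differ in sensitivity.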

The bone conduction sound signal may further include an interference sound signal, caused by the bone voiceprint sensor picking up vibrations other than those of the near-end call user speaking, for example vibration generated by an impact on the call device or by severe shaking of the call device itself. Therefore, in a possible embodiment, the bone conduction sound signal may be processed before being transmitted to the call device of the far-end call user, and the processed sound signal is used as the call sound signal of the near-end call user. This filters out the interference sound signal generated by the interfering vibration, improves the purity of the sound signal heard by the far-end call user, and improves the user's call experience.

After the bone conduction sound signal is obtained, the voice signal picked up by the microphone on the call device can be estimated based on the bone conduction sound signal; the result is hereinafter referred to as the estimated voice signal for distinction. In a possible implementation, the bone conduction sound signal is passed through a filter to obtain the estimated voice signal; in another possible implementation, the attenuation in air of the vibration signal corresponding to the bone conduction sound signal is calculated, and the estimated voice signal is determined based on the attenuation and the bone conduction sound signal. Other feasible methods may also be adopted, which are not limited herein.
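The filter-based variant could look like the following sketch, which assumes a causal FIR filter whose coefficients are already known. The name `estimate_voice` and the coefficient values used in the usage lines are hypothetical placeholders, not values from the source.

```python
import numpy as np

def estimate_voice(bone, coeffs):
    """Filter the bone conduction signal with FIR coefficients to approximate
    the voice component the microphone would pick up (illustrative only)."""
    bone = np.asarray(bone, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # Full convolution truncated to the input length keeps the filter causal
    return np.convolve(bone, coeffs, mode="full")[: len(bone)]

# Usage with placeholder data: a unit impulse returns the filter coefficients
bone = np.array([1.0, 0.0, 0.0, 0.0])
coeffs = np.array([0.5, 0.25])  # hypothetical voice signal estimation coefficients
estimated_voice = estimate_voice(bone, coeffs)
```

The coefficients here are fixed; an adaptive scheme would update them over time.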

Step S20, picking up a microphone sound signal through the microphone, and subtracting the estimated voice signal from the microphone sound signal to obtain an estimated noise signal;

In the present embodiment, the call sound signal is determined on the basis of the sound signal picked up by the microphone, which is hereinafter referred to as the microphone sound signal for distinction.

The microphone sound signal picked up by the microphone is the ambient sound signal of the environment in which the microphone is located. In a feasible implementation, when the call device is a smart phone, the microphone sound signal is the ambient sound signal of the external environment; in another feasible implementation, when the call device is an earphone device and the microphone is an in-ear microphone, the microphone sound signal is the ambient sound signal of the in-ear environment.

When the near-end user speaks, the microphone sound signal includes at least the voice signal of the near-end user, the far-end sound signal transmitted by the far-end call terminal and played by the loudspeaker, and the echo signal of the played far-end sound signal. For the call sound signal, the far-end sound signal played by the loudspeaker and its echo signal are noise signals to be eliminated. Since these noise signals cannot be picked up by the microphone on their own, this embodiment subtracts the estimated voice signal from the microphone sound signal to obtain a noise signal, hereinafter referred to as the estimated noise signal for distinction. In a specific embodiment, the estimated noise signal may be obtained by spectral subtraction.
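A single-frame magnitude spectral subtraction can be sketched as follows. This is one common form of spectral subtraction, assumed here for illustration; the source does not state which variant is used, and the name `spectral_subtract` and the `floor` parameter are hypothetical.

```python
import numpy as np

def spectral_subtract(mic_frame, voice_frame, floor=0.0):
    """Estimate the noise in one frame by subtracting the estimated voice
    magnitude spectrum from the microphone magnitude spectrum."""
    mic_frame = np.asarray(mic_frame, dtype=float)
    voice_frame = np.asarray(voice_frame, dtype=float)
    mic_spec = np.fft.rfft(mic_frame)
    voice_spec = np.fft.rfft(voice_frame)
    # Subtract magnitudes and clamp so no bin goes below the floor
    noise_mag = np.maximum(np.abs(mic_spec) - np.abs(voice_spec), floor)
    # Reuse the microphone phase, as magnitude subtraction leaves phase undefined
    noise_spec = noise_mag * np.exp(1j * np.angle(mic_spec))
    return np.fft.irfft(noise_spec, n=len(mic_frame))
```

A real-time version would apply this per overlapping windowed frame and overlap-add the results.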

Step S30, obtaining a far-end sound signal, and estimating an estimated echo signal corresponding to the far-end sound signal based on the far-end sound signal;

In this embodiment, echo estimation is performed on the far-end sound signal to obtain the estimated echo signal generated by playing the far-end sound signal. The echo estimation may be performed by a filter.

The echo component of the far-end sound signal contained in the estimated noise signal, which is derived from the microphone sound signal, may be inaccurate under the influence of interference signals in the environment where the microphone is located. To prevent this from degrading the echo cancellation effect, this embodiment performs echo estimation on the far-end sound signal to obtain an estimated echo signal of higher accuracy, and performs echo cancellation based on both the estimated echo signal and the estimated noise signal, so as to improve the signal-to-noise ratio of the call sound signal.
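One common way to realise filter-based echo estimation is a normalized least-mean-squares (NLMS) adaptive filter, sketched below. NLMS is an assumption made for illustration; the source only says that a filter may be used. The names `nlms_echo_estimate`, `taps`, `mu` and `eps` are hypothetical, and the error term here is simply the microphone signal minus the echo estimate rather than the full echo cancellation signal of the method.

```python
import numpy as np

def nlms_echo_estimate(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Estimate the echo of far_end contained in mic with an NLMS filter."""
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w = np.zeros(taps)          # echo signal estimation coefficients
    x_buf = np.zeros(taps)      # most recent far-end samples, newest first
    echo_est = np.zeros(len(far_end))
    residual = np.zeros(len(far_end))
    for n in range(len(far_end)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_est[n] = w @ x_buf
        residual[n] = mic[n] - echo_est[n]
        # Next coefficients from the current coefficients, the far-end
        # samples and the residual, normalized by the input power
        w = w + mu * residual[n] * x_buf / (x_buf @ x_buf + eps)
    return echo_est, residual, w
```

With a noise-free microphone signal and a short echo path, the coefficients converge to the echo path itself, which is a convenient sanity check.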

Step S40, the estimated noise signal and the estimated echo signal are overlapped to obtain an echo-cancelled reference signal, the echo cancellation signal is obtained by carrying out echo cancellation on the microphone sound signal through the reference signal, and the echo cancellation signal is used as a call sound signal.

In this embodiment, the estimated noise signal and the estimated echo signal are superimposed, and the superimposed sound signal is used as the reference signal for echo cancellation. In a possible implementation, the estimated noise signal and the estimated echo signal may be superimposed by weighted summation; in another possible implementation, the estimated noise signal and the estimated echo signal may each be divided into subband signals, the signal energy of each estimated noise subband signal and each estimated echo subband signal may be calculated, the estimated noise subband signal and the estimated echo subband signal in the same frequency band may be compared, and the subband signal with the larger signal energy is used as the reference signal of that frequency band.
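The weighted-summation variant reduces to a few lines. In the sketch below, the function name `build_reference` and the default weights are hypothetical; the source does not give values for the preset weight group.

```python
import numpy as np

def build_reference(noise_est, echo_est, weights=(0.5, 0.5)):
    """Weighted sum of the estimated noise and estimated echo signals.

    The default weights are placeholders, not values from the source.
    """
    w_noise, w_echo = weights
    noise_est = np.asarray(noise_est, dtype=float)
    echo_est = np.asarray(echo_est, dtype=float)
    return w_noise * noise_est + w_echo * echo_est
```

Both inputs must cover the same samples, so any delay between the two estimates would need to be compensated before summation.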

After the reference signal is obtained, echo cancellation is carried out on the microphone sound signal through the reference signal to obtain an echo cancellation signal, and the echo cancellation signal is used as a call sound signal. In a specific embodiment, before the call sound signal is transmitted to the far-end call end, signal processing may be performed on the call sound signal; before the far-end call terminal plays the call sound signal, the far-end call user can perform signal processing on the call sound signal according to actual requirements, and the signal processing can include frequency response adjustment, gain adjustment, dynamic range compression and the like, which is not limited herein.

Further, in a possible embodiment, after step S40, the method further includes:

Step S50, superposing the bone conduction sound signal and the echo cancellation signal to obtain a near-end sound signal, and taking the near-end sound signal as a call sound signal.

In some call devices, the position of the microphone may cause the voice component of the sound signal picked up by the microphone to have low signal energy and be unclear. For example, when the call device is an earphone device and the microphone is an in-ear microphone, occlusion by the earphone causes the voice signal picked up by the in-ear microphone inside the user's ear to have low signal energy and be unclear. In this case, if the echo-cancelled microphone sound signal is used directly as the call sound signal, the far-end call user may not hear a clear voice signal even though the signal-to-noise ratio of the call sound signal is high, which affects the user's call experience.

In this embodiment, the near-end sound signal is obtained by superimposing the bone conduction sound signal and the echo cancellation signal, and the near-end sound signal is used as the call sound signal.

Further, in a possible embodiment, step S50 includes:

Step S501, dividing the bone conduction sound signal into at least one bone conduction subband signal according to a preset bandwidth, and dividing the echo cancellation signal into at least one echo cancellation subband signal according to the preset bandwidth;

In this embodiment, the bone conduction sound signal is divided into at least one bone conduction subband signal according to a preset bandwidth, and the echo cancellation signal is divided into at least one echo cancellation subband signal according to the same preset bandwidth.

Step S502, determining a first sub-band signal from the bone conduction sub-band signals, and determining a second sub-band signal from the echo cancellation sub-band signals, wherein the signal energy of the first sub-band signal is greater than that of the echo cancellation sub-band signal in the same frequency band, and the signal energy of the second sub-band signal is greater than that of the bone conduction sub-band signal in the same frequency band;

In this embodiment, the signal energy of each bone conduction subband signal and each echo cancellation subband signal is calculated. The bone conduction sub-band signal and the echo cancellation sub-band signal in the same frequency band are compared, and the sub-band signal with the larger signal energy is used as the near-end sound signal of that frequency band.

Specifically, a first sub-band signal is determined from the bone conduction sub-band signals, wherein the signal energy of the first sub-band signal is greater than that of the echo cancellation sub-band signal in the same frequency band. A second sub-band signal is determined from the echo cancellation sub-band signals, wherein the signal energy of the second sub-band signal is greater than that of the bone conduction sub-band signal in the same frequency band.

Step S503, combining the first sub-band signal and the second sub-band signal to obtain a near-end sound signal.

The first sub-band signal and the second sub-band signal are combined to obtain a near-end sound signal. The embodiment ensures that the signal energy of each frequency band of the near-end sound signal is large and the signal is clear, thereby improving the conversation effect.
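Steps S501 to S503 can be sketched in the frequency domain as follows. The sketch assumes that sub-bands are groups of adjacent FFT bins and that ties go to the echo cancellation signal; neither detail is specified in the source, and the names `merge_by_subband_energy` and `bins_per_band` are hypothetical.

```python
import numpy as np

def merge_by_subband_energy(bone, aec, bins_per_band=4):
    """Per sub-band, keep whichever of the two signals has more energy."""
    bone = np.asarray(bone, dtype=float)
    aec = np.asarray(aec, dtype=float)
    n = len(bone)
    bone_spec = np.fft.rfft(bone)
    aec_spec = np.fft.rfft(aec)
    merged = np.empty_like(bone_spec)
    # S501: split both spectra into sub-bands of equal width
    for start in range(0, len(bone_spec), bins_per_band):
        band = slice(start, start + bins_per_band)
        # S502: compare the signal energy within the same frequency band
        bone_energy = np.sum(np.abs(bone_spec[band]) ** 2)
        aec_energy = np.sum(np.abs(aec_spec[band]) ** 2)
        merged[band] = bone_spec[band] if bone_energy > aec_energy else aec_spec[band]
    # S503: combine the selected sub-bands back into one time-domain signal
    return np.fft.irfft(merged, n=n)
```

Hard switching between sources per band can produce audible discontinuities at band edges, which is one reason a smoothing or dynamic range control stage may follow.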

Further, in a possible implementation, the echo cancellation signal and the bone conduction sound signal may also be processed by adopting a weighted summation manner, which is not limited herein.

Further, in a possible implementation manner, in order to avoid the existence of too abrupt subband sound signals in the near-end sound signal, the dynamic range of the near-end sound signal may be controlled, so that the change of the near-end sound signal is gentle, and the conversation experience of the user is improved.

In this embodiment, the bone conduction sound signal is picked up by the bone voiceprint sensor, and the estimated voice signal picked up by the microphone on the call device is estimated based on the bone conduction sound signal. Compared with the sound signal picked up by the microphone, the bone conduction sound signal has a high signal-to-noise ratio, so estimating the voice component of the microphone sound signal from the bone conduction sound signal yields an estimated voice signal with a higher signal-to-noise ratio.

In this embodiment, the reference signal for echo cancellation includes both the echo signal of the far-end sound signal and the loudspeaker-played sound signal picked up by the microphone. Compared with obtaining the reference signal only by estimation from the far-end sound signal, this suppresses the non-voice components of the microphone sound signal as far as possible and raises the signal-to-noise ratio of the echo-cancelled sound signal, thereby improving the signal-to-noise ratio of the call sound signal, the echo cancellation effect and the call effect.

Further, based on the first embodiment, a second embodiment of the echo cancellation method of the present invention is provided, and in this embodiment, the step S10 includes:

Step S101, estimating a first frequency band voice signal picked up by a microphone on the call device based on the bone conduction sound signal, wherein the first frequency band is the frequency band corresponding to the first frequency band voice signal and is consistent with the frequency band of the bone conduction sound signal;

In this embodiment, since the pickup frequency band of the bone voiceprint sensor differs from the pickup frequency band of the microphone, the voice signal picked up by the microphone is estimated band by band to improve the accuracy of the voice signal estimation.

Specifically, a first frequency band voice signal picked up by the microphone on the call device is estimated based on the bone conduction sound signal. The frequency band corresponding to the first frequency band voice signal is hereinafter referred to as the first frequency band, and it coincides with the frequency band of the bone conduction sound signal. The signal estimation may be performed by a filter, which is not described in detail herein.

Step S102, estimating a second frequency band voice signal picked up by the microphone based on the first frequency band voice signal and a preset equal-loudness curve, wherein the second frequency band is a frequency band except the first frequency band in a pickup frequency band of the microphone, and the second frequency band is a frequency band corresponding to the second frequency band voice signal;

in this embodiment, a frequency band other than the first frequency band in the pickup frequency band of the microphone is referred to as a second frequency band, and the estimated voice signal of the second frequency band picked up by the microphone is referred to as a second frequency band voice signal.

Specifically, in this embodiment, the second frequency band voice signal is estimated by combining the equal-loudness curves of the human ear with the first frequency band voice signal. An equal-loudness curve (equal-loudness contour) of the human ear relates sound pressure level to frequency for sounds that the human ear perceives as equally loud within the audible frequency range; that is, it indicates the sound pressure level that a sound signal at each frequency must reach for the listener to perceive the same loudness. The specific process of estimating the second frequency band voice signal may be: determining the signal loudness of the first frequency band voice signal (hereinafter referred to as the target signal loudness), and selecting, from the equal-loudness curves of the human ear, the curve whose loudness equals the target signal loudness as the preset equal-loudness curve; and adjusting, according to the preset equal-loudness curve, the sound pressure level of each frequency point of the second-frequency-band portion of the microphone sound signal, the resulting signal being the second frequency band voice signal.

Step S103, combining the first frequency band voice signal and the second frequency band voice signal to obtain an estimated voice signal.
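By way of illustration only, steps S101 to S103 might be sketched in Python as follows. The sketch assumes NumPy, a per-frequency-point multiplicative filter for the first frequency band, and a hypothetical array `contour_db` holding the level (in dB relative to unit amplitude) that the preset equal-loudness curve prescribes at each frequency point of the second frequency band. The function name, the parameter names, and the choice to keep the microphone phase in the second frequency band are assumptions and are not taken from this description.

```python
import numpy as np

def estimate_voice_spectrum(bone_spec, mic_spec, coeff, contour_db, n_first):
    """Assemble an estimated voice spectrum for one frame (illustrative only).

    bone_spec  : complex spectrum of the bone conduction sound signal
    mic_spec   : complex spectrum of the microphone sound signal
    coeff      : per-bin voice signal estimation coefficients (first band used)
    contour_db : hypothetical per-bin target levels for the second band, in dB
    n_first    : number of frequency points in the first frequency band
    """
    bone_spec = np.asarray(bone_spec, dtype=complex)
    mic_spec = np.asarray(mic_spec, dtype=complex)
    coeff = np.asarray(coeff, dtype=complex)

    # S101: first band estimated from the bone conduction signal by a filter.
    first_band = coeff[:n_first] * bone_spec[:n_first]

    # S102: second band takes the level prescribed by the preset curve
    # (assumption: the microphone phase is kept unchanged).
    target_mag = 10.0 ** (np.asarray(contour_db, dtype=float) / 20.0)
    mic_phase = np.exp(1j * np.angle(mic_spec[n_first:]))
    second_band = target_mag * mic_phase

    # S103: combine both bands into the estimated voice signal.
    return np.concatenate([first_band, second_band])

est = estimate_voice_spectrum(
    bone_spec=[1 + 0j, 2 + 0j],
    mic_spec=[3 + 0j, 3 + 0j, 2j, -4 + 0j],
    coeff=[0.5, 0.5, 0, 0],
    contour_db=[0.0, 20.0],
    n_first=2,
)
```

How the levels in `contour_db` are derived from the target signal loudness is left open here, since this description does not fix a particular set of equal-loudness data.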

Further, in a possible embodiment, step S30 includes:

step S301, estimating a first frequency band echo signal corresponding to the first frequency band of the far-end sound signal based on the far-end sound signal;

In this embodiment, the first frequency band echo signal corresponding to the far-end sound signal in the first frequency band is estimated based on the far-end sound signal. Specifically, the signal estimation may be performed by a filter, which is not described in detail herein.

Step S302, determining a loudness difference between the first frequency band echo signal and the first frequency band voice signal;

A loudness difference between the first frequency band echo signal and the first frequency band voice signal is determined. In a possible embodiment, the specific procedure may be: determining a loudness value of the first frequency band echo signal, which may be either the maximum loudness or the average loudness of the first frequency band echo signal, as set according to actual requirements; similarly determining a loudness value of the first frequency band voice signal; and taking the difference between the two loudness values as the loudness difference. In another possible implementation, the loudness difference between the first frequency band echo signal and the first frequency band voice signal may be calculated at each frequency point, and the maximum, the average, or another statistic of these per-frequency-point differences is taken as the loudness difference, which is not limited herein.

Step S303, estimating a second frequency band echo signal corresponding to the second frequency band of the far-end sound signal based on the loudness difference value and the second frequency band speech signal;

A second frequency band echo signal corresponding to the far-end sound signal in the second frequency band is estimated based on the loudness difference and the second frequency band voice signal. The specific steps may be: adjusting the loudness of the second-frequency-band portion of the far-end sound signal so that, at each frequency point, its loudness differs from that of the second frequency band voice signal by the loudness difference, thereby obtaining the second frequency band echo signal.

Step S304, combining the first frequency band echo signal and the second frequency band echo signal to obtain an estimated echo signal.

The first frequency band echo signal and the second frequency band echo signal are combined to obtain the estimated echo signal. In this embodiment, the echo signal is estimated band by band (segmented estimation), which improves the accuracy of the echo signal estimate.
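As a non-limiting sketch of steps S302 to S304, the Python code below computes a loudness difference in the first frequency band and applies it in the second frequency band. It assumes NumPy and uses the per-frequency-point level in dB (20·log10 of the magnitude) as a simple stand-in for loudness; this description does not specify the loudness measure. All function names, the `mode` labels, and the choice to keep the far-end phase are hypothetical.

```python
import numpy as np

def level_db(spec, floor=1e-12):
    """Per-bin level in dB, used here only as a stand-in for loudness."""
    return 20.0 * np.log10(np.maximum(np.abs(spec), floor))

def loudness_difference(echo_first, voice_first, mode="mean"):
    """S302: loudness difference between first-band echo and first-band voice."""
    echo_db, voice_db = level_db(echo_first), level_db(voice_first)
    if mode == "max":       # difference of the band maxima
        return float(echo_db.max() - voice_db.max())
    if mode == "mean":      # difference of the band averages
        return float(echo_db.mean() - voice_db.mean())
    if mode == "bin_mean":  # average of the per-bin differences
        return float(np.mean(echo_db - voice_db))
    raise ValueError(f"unknown mode: {mode}")

def second_band_echo(far_second, voice_second, diff_db):
    """S303: set each far-end bin diff_db above the second-band voice level."""
    target_db = level_db(voice_second) + diff_db
    far_phase = np.exp(1j * np.angle(far_second))  # assumption: keep phase
    return 10.0 ** (target_db / 20.0) * far_phase

echo_first = np.array([1.0, 10.0], dtype=complex)
voice_first = np.array([1.0, 1.0], dtype=complex)
diff = loudness_difference(echo_first, voice_first, mode="max")

echo_second = second_band_echo(
    far_second=np.array([1j, -2.0]),
    voice_second=np.array([1.0, 10.0], dtype=complex),
    diff_db=diff,
)

# S304: combine both bands into the estimated echo signal.
estimated_echo = np.concatenate([echo_first, echo_second])
```

Choosing `mode="max"` or `mode="mean"` corresponds to the first procedure described above, and `mode="bin_mean"` to the per-frequency-point alternative.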

Further, in a possible embodiment, step S40 includes:

step S401, carrying out weighted summation processing on the estimated noise signal and the estimated echo signal through a preset weight group to obtain an echo cancellation reference signal.

In this embodiment, the reference signal for echo cancellation is obtained by performing weighted summation on the estimated noise signal and the estimated echo signal using a preset weight group. Compared with superposing the signals based on signal energy, superposing the noise signal and the echo signal by weighted summation reduces the amount of computation in the echo cancellation process, improves echo cancellation efficiency, and reduces the delay of the call sound signal, thereby improving the user's call experience.
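The weighted summation of step S401 might look like the following Python sketch. It assumes NumPy and a weight group of the form (θ, 1−θ); which weight applies to the noise signal and which to the echo signal is an assumption made here for illustration, as is the function name.

```python
import numpy as np

def echo_reference(noise_est, echo_est, theta):
    """Weighted sum of the estimated noise and estimated echo (illustrative).

    Assumption: theta weights the noise estimate and (1 - theta) the echo
    estimate; the description only states that a preset weight group is used.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    noise_est = np.asarray(noise_est, dtype=complex)
    echo_est = np.asarray(echo_est, dtype=complex)
    return theta * noise_est + (1.0 - theta) * echo_est

reference = echo_reference([2.0, 4.0], [0.0, 2.0], theta=0.25)
```

Each frequency point costs two multiplications and one addition, which is consistent with the low computational load that this embodiment attributes to weighted summation.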

Further, in a possible embodiment, after step S40, the method further includes:

step S60, updating and obtaining a voice signal estimation coefficient of the next moment based on the estimated noise signal, the bone conduction voice signal and the voice signal estimation coefficient of the current moment;

in this embodiment, signal estimation is performed by the filter, and after each estimation, the filter coefficient of the filter is updated to ensure the accuracy of the estimation result.

In this embodiment, the filter used to obtain the estimated voice signal is referred to as the voice estimation filter, and its coefficients are referred to as voice signal estimation coefficients. Specifically, the voice signal estimation coefficient at the next moment is obtained by updating based on the estimated noise signal, the bone conduction sound signal, and the voice signal estimation coefficient at the current moment. The specific calculation may be: twice the preset coefficient, the estimated voice signal, and the bone conduction sound signal are multiplied together, and the resulting product is added to the voice signal estimation coefficient at the current moment to obtain the voice signal estimation coefficient at the next moment. The update formula may be as follows:

$$W^{\mathrm{update}}(w) = W(w) + 2\mu\,E(w)\,X(w)$$

where $W^{\mathrm{update}}(w)$ is the voice signal estimation coefficient at the next moment, $W(w)$ is the voice signal estimation coefficient at the current moment, $\mu$ is a preset coefficient, $E(w)$ is the estimated voice signal, and $X(w)$ is the bone conduction sound signal.
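The update above can be transcribed per frequency point as in the following Python sketch, which assumes NumPy. The function name and the optional `conjugate_input` flag are hypothetical: the formula as written multiplies by X(w) directly, whereas textbook complex LMS updates usually multiply by the complex conjugate of the input, so the flag is offered only as a possible variant and is not part of this description.

```python
import numpy as np

def update_coefficients(w, e, x, mu, conjugate_input=False):
    """Per-bin update W_update = W + 2*mu*E*X, transcribed from the formula.

    w  : coefficients at the current moment
    e  : E(w), as defined alongside the formula
    x  : X(w), the bone conduction sound signal
    mu : preset coefficient (step size)
    conjugate_input : hypothetical option to use conj(X) instead of X
    """
    w, e, x = (np.asarray(a, dtype=complex) for a in (w, e, x))
    x_term = np.conj(x) if conjugate_input else x
    return w + 2.0 * mu * e * x_term

w_next = update_coefficients(w=[0.5, 0.5], e=[1.0, 2.0], x=[2.0, 1j], mu=0.1)
```

A smaller `mu` makes the coefficients change more slowly from frame to frame, while a larger one follows changes faster at the cost of stability; suitable values are not given in this description.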

Further, in a possible implementation in which the voice signal is estimated band by band, the bone conduction sound signal and the estimated voice signal of the first frequency band (i.e., the first frequency band voice signal) may be used to update the voice signal estimation coefficient for the next moment, so as to improve the accuracy of the voice signal estimation coefficient.

Step S70, updating to obtain the echo signal estimation coefficient of the next moment based on the echo cancellation signal, the far-end sound signal and the echo signal estimation coefficient of the current moment.

In the present embodiment, a filter for obtaining an estimated echo signal is referred to as an echo estimation filter, and a coefficient of the echo estimation filter is referred to as an echo signal estimation coefficient.

The echo signal estimation coefficient at the next moment is obtained by updating based on the echo cancellation signal, the far-end sound signal, and the echo signal estimation coefficient at the current moment. The specific calculation may be: the preset coefficient, the echo cancellation signal, and the far-end sound signal are multiplied together, and the resulting product is added to the echo signal estimation coefficient at the current moment to obtain the echo signal estimation coefficient at the next moment. The update formula may be as follows:

Wherein the updated coefficient is the echo signal estimation coefficient at the next moment, the coefficient being updated is the echo signal estimation coefficient at the current moment, $\varepsilon$ is a preset coefficient, $Neare(w)$ is the echo cancellation signal, and $M(w)$ is the far-end sound signal.

Further, in a possible implementation in which the echo signal is estimated band by band, the far-end sound signal in the first frequency band and the estimated echo signal of the first frequency band (i.e., the first frequency band echo signal) may be used to update the echo signal estimation coefficient for the next moment, so as to improve the accuracy of the echo signal estimation coefficient.

In this embodiment, a first frequency band voice signal picked up by a microphone on a call device is estimated based on a bone conduction voice signal, where the first frequency band is consistent with a frequency band of the bone conduction voice signal, and the first frequency band is a frequency band corresponding to the first frequency band voice signal; estimating a second frequency band voice signal picked up by the microphone based on the first frequency band voice signal and a preset equal-loudness curve, wherein the second frequency band corresponding to the second frequency band voice signal is a frequency band except the first frequency band in a pickup frequency band of the microphone; and combining the first frequency band voice signal and the second frequency band voice signal to obtain an estimated voice signal. In the embodiment, the estimated voice signal is determined by adopting a segmentation estimation mode, so that the accuracy of the estimated voice signal is improved.

Further, referring to Fig. 3, in this embodiment, during echo cancellation the sound signal picked up by the microphone is divided into frames, and each frame is transformed by FFT (fast Fourier transform) to obtain the microphone sound signal. For example, with 15 ms of sampled data per frame at a sampling rate of 16000 Hz, each frame of the microphone sound signal contains 240 sampling points in total and is converted to the frequency domain with a window length of 256. In this embodiment, a bone voiceprint sensor with a pickup frequency band of 0 Hz to 500 Hz/1000 Hz (namely, the first frequency band) is adopted, and its signal undergoes the same time-frequency conversion to obtain the complex frequency-domain signal X(w). In this embodiment, the specific procedure for obtaining the estimated voice signal may be:
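The framing and time-frequency conversion described above might be sketched in Python as follows, assuming NumPy. The sketch takes non-overlapping 240-sample frames and zero-pads each to 256 points before a real FFT; the zero-padding, the absence of overlap and of a tapering window, and all names are assumptions, since this description only gives the frame duration, the sampling rate, and the window length.

```python
import numpy as np

SAMPLE_RATE = 16000                          # Hz, from the example above
FRAME_MS = 15                                # frame duration in milliseconds
FFT_SIZE = 256                               # window length from the example
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # 240 sampling points per frame

def frame_spectra(signal):
    """Split a signal into 240-sample frames and FFT each one (illustrative).

    Assumption: frames do not overlap and are zero-padded to FFT_SIZE.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // FRAME_LEN
    frames = signal[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    return np.fft.rfft(frames, n=FFT_SIZE, axis=1)

def first_band_bins(max_hz):
    """Number of FFT bins whose centre frequency is at most max_hz."""
    return max_hz * FFT_SIZE // SAMPLE_RATE + 1

t = np.arange(2 * FRAME_LEN) / SAMPLE_RATE
spectra = frame_spectra(np.sin(2 * np.pi * 500.0 * t))
```

With these values the bin spacing is 16000 / 256 = 62.5 Hz, so a first frequency band of 0 Hz to 1000 Hz covers bins 0 to 16 and a band of 0 Hz to 500 Hz covers bins 0 to 8.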

estimating a first frequency band voice signal picked up by a microphone on the communication equipment based on the bone conduction voice signal, wherein the first frequency band is consistent with the frequency band of the bone conduction voice signal, and is a frequency band corresponding to the first frequency band voice signal; estimating a second frequency band voice signal picked up by the microphone based on the first frequency band voice signal and a preset equal-loudness curve, wherein the second frequency band corresponding to the second frequency band voice signal is a frequency band except the first frequency band in a pickup frequency band of the microphone; and combining the first frequency band voice signal and the second frequency band voice signal to obtain an estimated voice signal. In this embodiment, the formula for calculating the first frequency band speech signal and updating the speech signal estimation coefficient may be:

Wherein $W_{0\sim1000\,\mathrm{Hz}}(w)$ is the voice signal estimation coefficient at the current moment, $Y_{0\sim1000\,\mathrm{Hz}}(w)$ is the estimated noise signal of 0 Hz to 500 Hz/1000 Hz, $D_{0\sim1000\,\mathrm{Hz}}(w)$ is the microphone sound signal of 0 Hz to 500 Hz/1000 Hz, $E_{0\sim1000\,\mathrm{Hz}}(w)$ is the estimated voice signal of 0 Hz to 500 Hz/1000 Hz, and $\mu$ is a preset coefficient.

In this embodiment, the specific process of determining the estimated echo signal may be:

estimating a first frequency band echo signal corresponding to the far-end sound signal in a first frequency band based on the far-end sound signal; determining a loudness difference between the first frequency band echo signal and the first frequency band speech signal; estimating a second frequency band echo signal corresponding to the far-end sound signal in the second frequency band based on the loudness difference value and the second frequency band voice signal; and combining the first frequency band echo signal and the second frequency band echo signal to obtain an estimated echo signal. And carrying out weighted summation processing on the estimated noise signal and the estimated echo signal through a preset weight group to obtain an echo cancellation reference signal.

The formula for calculating the echo signal of the first frequency band, the echo cancellation signal and updating the echo signal estimation coefficient can be as follows:

Wherein the echo estimation coefficient is taken at the current moment, $M_{0\sim1000\,\mathrm{Hz}}(w)$ is the far-end sound signal, $N_{0\sim1000\,\mathrm{Hz}}(w)$ is the estimated echo signal of 0 Hz to 500 Hz/1000 Hz, the reference signal for echo cancellation is likewise that of 0 Hz to 500 Hz/1000 Hz, $P_{0\sim1000\,\mathrm{Hz}}(w)$ is the microphone sound signal of 0 Hz to 500 Hz/1000 Hz, $Neare_{0\sim1000\,\mathrm{Hz}}(w)$ is the echo cancellation signal of 0 Hz to 500 Hz/1000 Hz, $\varepsilon$ is a preset coefficient, and $\theta$ and $1-\theta$ form the preset weight group.
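To show how the quantities defined above might fit together for one frame of the first frequency band, a hedged Python sketch follows, assuming NumPy. Several points are assumptions rather than statements of this description: filtering is modelled as a per-frequency-point multiplication, echo cancellation is modelled as subtracting the reference signal from the microphone sound signal, θ is applied to the noise estimate, the voice coefficient update reuses the formula given earlier for $W^{\mathrm{update}}(w)$, and the symbol `h` for the echo estimation coefficient is hypothetical.

```python
import numpy as np

def process_first_band(x, d, m, w, h, mu, eps, theta):
    """One frame of first-band processing (illustrative sketch only).

    x: bone conduction signal X    d: microphone sound signal (D, also P)
    m: far-end sound signal M      w: voice signal estimation coefficient W
    h: echo estimation coefficient (hypothetical symbol)
    mu, eps: preset coefficients   theta: preset weight
    """
    x, d, m, w, h = (np.asarray(a, dtype=complex) for a in (x, d, m, w, h))

    e = w * x                             # E: estimated voice signal
    y = d - e                             # Y: estimated noise signal
    n = h * m                             # N: estimated echo signal
    ref = theta * y + (1.0 - theta) * n   # reference signal (weighted sum)
    neare = d - ref                       # Neare: assumed to be a subtraction

    w_next = w + 2.0 * mu * e * x         # voice coefficient update
    h_next = h + eps * neare * m          # echo coefficient update
    return neare, w_next, h_next

neare, w_next, h_next = process_first_band(
    x=[1.0], d=[4.0], m=[2.0], w=[0.5], h=[1.0], mu=0.1, eps=0.1, theta=0.5
)
```

In a running system the returned coefficients would be fed back as `w` and `h` for the next frame, which is how the coefficient updates described for steps S60 and S70 take effect.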

In addition, an embodiment of the present invention further provides an echo cancellation device, referring to fig. 4, where the echo cancellation device is disposed on a call device, and a bone voiceprint sensor is disposed on the call device, and the echo cancellation device includes:

a voice estimation module 10 for picking up bone conduction sound signals by the bone voiceprint sensor and estimating estimated voice signals picked up by a microphone on the telephony device based on the bone conduction sound signals;

a noise estimation module 20, configured to pick up a microphone sound signal by the microphone, and subtract the estimated voice signal from the microphone sound signal to obtain an estimated noise signal;

an echo estimation module 30, configured to obtain a far-end sound signal, and estimate an estimated echo signal corresponding to the far-end sound signal based on the far-end sound signal;

the echo cancellation module 40 is configured to superimpose the estimated noise signal and the estimated echo signal to obtain an echo cancelled reference signal, perform echo cancellation on the microphone sound signal by using the reference signal to obtain an echo cancelled signal, and use the echo cancelled signal as a call sound signal.

Further, the speech estimation module 10 is further configured to:

Further, the echo cancellation device further comprises a coefficient updating module for:

Further, the echo cancellation device further comprises a superposition module, configured to:

Further, the superposition module is further configured to:

The embodiments of the echo cancellation device of the present invention may refer to the embodiments of the echo cancellation method of the present invention, and will not be described herein.

In addition, an embodiment of the present invention further provides a computer readable storage medium on which an echo cancellation program is stored, wherein the echo cancellation program, when executed by a processor, implements the steps of the echo cancellation method described above.

Embodiments of the echo cancellation device and the computer readable storage medium of the present invention may refer to embodiments of the echo cancellation method of the present invention, and will not be described herein.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The foregoing description covers only preferred embodiments of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process derived from the contents disclosed herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims

1. An echo cancellation method, wherein the echo cancellation method is applied to a call device, and a bone voiceprint sensor is arranged on the call device, and the echo cancellation method comprises the following steps:

2. The echo cancellation method of claim 1, wherein the step of estimating an estimated speech signal picked up by a microphone on the telephony device based on the bone conduction sound signal comprises:

3. The echo cancellation method of claim 2, wherein said step of estimating an estimated echo signal generated by playing said far-end sound signal based on said far-end sound signal comprises:

4. The echo cancellation method of claim 1, wherein the step of superimposing the estimated noise signal and the estimated echo signal to obtain an echo cancellation reference signal comprises:

5. The method of echo cancellation according to claim 1, wherein after said step of echo canceling said microphone sound signal with said reference signal to obtain an echo cancelled signal, further comprising:

6. The echo cancellation method according to any one of claims 1 to 5, wherein after the step of echo cancelling the microphone sound signal by the reference signal to obtain an echo cancelled signal, further comprises:

7. The method of echo cancellation according to claim 6, wherein said step of superimposing said bone conduction sound signal and said echo cancellation signal to obtain a near-end sound signal comprises:

8. An echo cancellation device, wherein the echo cancellation device is disposed on a call device, and a bone voiceprint sensor is disposed on the call device, the echo cancellation device comprises:

9. An echo cancellation device, characterized in that the echo cancellation device comprises: memory, a processor and an echo cancellation program stored on the memory and executable on the processor, which echo cancellation program when executed by the processor implements the steps of the echo cancellation method according to any one of claims 1 to 7.

10. A computer-readable storage medium, on which an echo cancellation program is stored, which when executed by a processor implements the steps of the echo cancellation method according to any one of claims 1 to 7.