CN111968660A - Echo cancellation device and method, electronic device, and storage medium - Google Patents

Info

Publication number
CN111968660A
Authority
CN
China
Prior art keywords
signal
sound
echo
voice
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910419525.1A
Other languages
Chinese (zh)
Inventor
程光伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd filed Critical Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910419525.1A priority Critical patent/CN111968660A/en
Publication of CN111968660A publication Critical patent/CN111968660A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
        • G10L15/00 Speech recognition
            • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
        • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
                • G10L21/0208 Noise filtering
                    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
                    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
                        • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
                            • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
                        • G10L21/0232 Processing in the frequency domain

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed are an echo cancellation device and method, an electronic device, and a storage medium. The device includes an echo cancellation module, a first sound collection module, and a second sound collection module, wherein the first sound collection module is arranged at a first position of a voice interaction device and the second sound collection module is arranged at a second position of the voice interaction device. The first sound collection module is used for collecting a first voice signal that includes a first target sound signal; the second sound collection module is used for collecting a second voice signal that includes a second target sound signal; and the echo cancellation module is used for determining a target sound signal from the first target sound signal and the second target sound signal based on the first voice signal and the second voice signal. The two sound collection modules, arranged at different positions, acquire sound signals of different intensities; one of the sound signals is used as a monitoring signal to eliminate the echo signal in the other, thereby obtaining the target sound signal and achieving simple and efficient echo cancellation.

Description

Echo cancellation device and method, electronic device, and storage medium
Technical Field
The present disclosure relates to voice technologies, and in particular, to an echo cancellation apparatus and method, an electronic device, and a storage medium.
Background
As AI technology matures in intelligent voice interaction applications, voice-based human-computer interaction is used more and more widely; typical applications include smart speakers, smart household appliances, smart televisions, and in-vehicle intelligent interaction. To keep voice interaction efficient, voice front-end processing is indispensable, and one of its key goals is echo cancellation, that is, eliminating the influence of the sound played by the device itself on voice interaction.
In the prior art, nonlinear processing can achieve echo suppression whose side effects are acceptable in the communication field, but nonlinear processing severely damages the voice signal and seriously affects voice wake-up and speech recognition.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. Embodiments of the present disclosure provide an echo cancellation device and method, an electronic device, and a storage medium.
According to an aspect of the embodiments of the present disclosure, there is provided an echo cancellation device, including: an echo cancellation module, a first sound collection module, and a second sound collection module, wherein the first sound collection module is arranged at a first position of a voice interaction device, and the second sound collection module is arranged at a second position of the voice interaction device;
the first sound collection module is used for collecting a first voice signal, and the first voice signal comprises a first target sound signal;
the second sound collection module is used for collecting a second voice signal, and the second voice signal comprises a second target sound signal;
the echo cancellation module is configured to determine a target sound signal from the first target sound signal and the second target sound signal based on the first voice signal and the second voice signal.
According to another aspect of the embodiments of the present disclosure, there is provided an echo cancellation method, including:
collecting a first voice signal through a first sound collection module arranged at a first position, wherein the first voice signal comprises a first target sound signal;
collecting a second voice signal through a second sound collection module arranged at a second position, wherein the second voice signal comprises a second target sound signal;
determining a target sound signal from the first target sound signal and the second target sound signal based on the first voice signal and the second voice signal.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including: at least one loudspeaker and the echo cancellation device of the above embodiments;
the loudspeaker is used for playing an echo signal;
the echo cancellation device is used for receiving a voice signal, canceling the echo signal in the voice signal and obtaining a target sound signal; the voice signal comprises the target sound signal and an echo signal played by the loudspeaker.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the echo cancellation method according to the above embodiment.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the echo cancellation method of the above embodiments.
The echo cancellation device and method, electronic device, and storage medium provided by the foregoing embodiments of the present disclosure include: an echo cancellation module, a first sound collection module, and a second sound collection module, wherein the first sound collection module is arranged at a first position of a voice interaction device, and the second sound collection module is arranged at a second position of the voice interaction device; the first sound collection module is used for collecting a first voice signal, and the first voice signal comprises a first target sound signal; the second sound collection module is used for collecting a second voice signal, and the second voice signal comprises a second target sound signal; the echo cancellation module is configured to determine a target sound signal from the first target sound signal and the second target sound signal based on the first voice signal and the second voice signal. Sound signals of different intensities are obtained through the two sound collection modules (the first sound collection module and the second sound collection module) arranged at different positions; one of the sound signals is used as a monitoring signal to eliminate the echo signal in the other sound signal, so that the target sound signal is obtained and simple and efficient echo cancellation is achieved.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a diagram of an application scenario of an echo cancellation device according to an embodiment of the present disclosure.
Fig. 2 is a functional block diagram of the embodiment provided in fig. 1.
Fig. 3 is a schematic structural diagram of an echo cancellation device according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram of an echo cancellation device according to another exemplary embodiment of the present disclosure.
Fig. 5 is a schematic structural diagram of an echo cancellation module in the echo cancellation device provided in fig. 3 according to the present disclosure.
Fig. 6 is a flowchart illustrating an echo cancellation method according to an exemplary embodiment of the disclosure.
Fig. 7 is a flowchart illustrating an echo cancellation method according to another exemplary embodiment of the present disclosure.
Fig. 8 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely some, not all, of the embodiments of the present disclosure, and that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another; they are not intended to imply any particular technical meaning, nor any necessary logical order between the elements.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that, for convenience of description, the sizes of the respective portions shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, and servers, which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
In the process of implementing the present disclosure, the inventors found that, because of device size limitations, the distance between the speaker and the collecting microphone is usually short; meanwhile, to reduce manufacturing cost, the speaker used often cannot meet the low-distortion requirement of general AEC (acoustic echo cancellation) algorithms, which degrades the noise reduction performance of those algorithms.
At least the following problem exists: when a prior-art nonlinear algorithm is used to suppress echo, the nonlinear processing severely damages the voice signal and seriously affects voice wake-up or speech recognition.
Exemplary System
Taking a smart speaker device as an example, fig. 1 is an application scenario diagram of an echo cancellation device provided in the embodiment of the present disclosure. As shown in fig. 1, an echo signal monitoring microphone is disposed near the sound outlet of the speaker of the device, and the reference signal is collected through a sound propagation path L1, while a near-end signal collecting microphone is disposed at a relatively distant position from the sound outlet, and a near-end desired signal is collected through a sound propagation path L2.
Fig. 2 is a functional block diagram of the embodiment provided in fig. 1. A source echo signal is played through the loudspeaker inside the device. The monitoring microphone picks it up through echo path 1 (corresponding to the sound propagation path L1 in fig. 1) and outputs a signal ref (including a target sound signal and an echo signal), which serves as the reference signal for adaptive filtering. Besides the voice signal S1, the monitoring microphone may also pick up a background noise n1; because this embodiment is mainly concerned with eliminating the source echo signal played by the loudspeaker, the background noise n1 can be ignored. The near-end signal collecting microphone picks up the source echo signal through echo path 2 (corresponding to the sound propagation path L2 in fig. 1) and outputs a signal d (including a target sound signal and an echo signal whose intensities differ from those in the signal ref). Besides the voice signal S2, the near-end signal collecting microphone may also pick up a background noise n2, which can likewise be ignored in this embodiment. The echo canceller processes the signal ref and the signal d with an adaptive filtering algorithm to obtain a residual signal e; the residual signal e can be used as the target sound signal, thereby achieving echo cancellation, and the target sound signal can be used for communication, voice recognition, voiceprint recognition, and the like. The monitoring microphone and the near-end signal collecting microphone provided by this embodiment can also be used for functions such as noise reduction when the loudspeaker is not working (is not emitting a source echo signal). Echo cancellation and dual-microphone noise reduction can thus be achieved with only two microphones, which reduces hardware cost.
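The adaptive filtering step can be illustrated with a minimal sketch. The text above does not name a specific adaptive algorithm, so the normalized LMS (NLMS) update used here is an assumption, as are the function name `nlms_cancel`, the tap count, the step size, and the toy echo paths. For simplicity the toy signals contain only echo, whereas in the description the signal ref also carries target sound.

```python
import random


def nlms_cancel(ref, d, taps=8, mu=0.5, eps=1e-8):
    """Return the residual e[n] = d[n] - y[n], where y[n] is the echo
    estimated from the reference signal by an NLMS-adapted FIR filter.

    NLMS is an illustrative choice, not the algorithm named by the source.
    """
    w = [0.0] * taps      # filter coefficients (hypothetical length)
    buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for r, x in zip(ref, d):
        buf = [r] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # echo estimate
        err = x - y                                   # residual sample
        norm = sum(b * b for b in buf) + eps          # input power
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
        residual.append(err)
    return residual


# Toy data: the near-end microphone signal d is a filtered copy of ref.
random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * ref[n] + (0.2 * ref[n - 1] if n > 0 else 0.0)
     for n in range(2000)]
e = nlms_cancel(ref, d)
tail_power = sum(v * v for v in e[-200:]) / 200
print(f"residual power over the last 200 samples: {tail_power:.3e}")
```

In this noiseless toy case the residual power decays toward zero once the filter has converged; with real microphone signals the residual would instead contain the target sound.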
Exemplary devices
Fig. 3 is a schematic structural diagram of an echo cancellation device according to an exemplary embodiment of the present disclosure. This embodiment can be applied to voice interaction equipment such as a smart speaker and, as shown in fig. 3, includes:
an echo cancellation module 31, and a first sound collection module 32 located at a first location and a second sound collection module 33 located at a second location.
Wherein the first and second positions may be different.
The first sound collecting module 32 is configured to collect a first voice signal, where the first voice signal includes a first target sound signal.
The first sound collection module 32 may be a microphone or the like, for example, a near-end collection microphone in fig. 1.
The second sound collection module 33 is configured to collect a second voice signal, where the second voice signal includes a second target sound signal.
The second sound collection module 33 may be a microphone, for example, the monitoring microphone in fig. 1. Optionally, the first target sound signal differs from the second target sound signal only in intensity.
The echo cancellation module 31 is configured to determine a target sound signal from the first target sound signal and the second target sound signal based on the first speech signal and the second speech signal.
The echo cancellation device provided by the embodiment of the present disclosure obtains sound signals with different intensities through two sound collection modules (a first sound collection module and a second sound collection module) disposed at different positions, and uses one of the sound signals as a monitoring signal to cancel an echo signal in another sound signal, thereby obtaining a target sound signal.
Fig. 4 is a schematic structural diagram of an echo cancellation device according to another exemplary embodiment of the present disclosure. As shown in fig. 4, on the basis of the embodiment shown in fig. 3, the apparatus of this embodiment further includes:
a sound playing module 44, configured to play the source echo signal.
In an alternative example, the sound playing module 44 is the speaker shown in fig. 1; when external sound is collected as the target sound signal, the sound signal played through the speaker is taken as the source echo signal of this embodiment.
In this embodiment, the first sound collecting module 32 collects the source echo signal through the first path to obtain a first voice signal composed of the first echo signal and the first target sound signal.
Wherein the first echo signal is obtained based on the source echo signal; the first path is a first sound propagation path between the sound playing module 44 and the first sound collecting module. Due to the nature of the propagation of sound, this first sound propagation path may be the primary propagation path for the source echo signal to propagate to the first sound collection module 32.
In an alternative example, the first path is the L2 path shown in fig. 1. In this example, to ensure that the L2 path is the main propagation path, sound insulation processing may be performed inside the smart speaker device (for example, adding a partition board at the rear end of the speaker inside the cavity), so that the source echo signal does not propagate directly through the inside of the smart speaker device to the first sound collection module 32.
The second sound collection module 33 collects the source echo signal through the second path to obtain a second voice signal composed of the second echo signal and the second target sound signal.
The second echo signal is obtained based on the source echo signal; the second path is a second sound propagation path between the sound playing module and the second sound collecting module. The second sound propagation path may be the primary propagation path for the source echo signal to propagate to the second sound acquisition module 33.
In an alternative example, the second path is the L1 path shown in fig. 1. In this example, to ensure that the L1 path is the main propagation path, sound insulation treatment may be performed inside the smart speaker device (for example, using a high-density partition board as an inner layer of the cavity), so that the source echo signal does not propagate directly through the inside of the smart speaker device to the second sound collection module 33.
Because the source echo signal and the target sound signal are collected simultaneously over different paths, the first sound collection module and the second sound collection module respectively obtain a first voice signal and a second voice signal that are both composed of the source echo signal and the target sound signal but differ in intensity. Since both voice signals contain the target sound signal and an echo signal, one voice signal (for example, the second voice signal) can be used as a monitoring signal to eliminate the echo signal in the other voice signal (for example, the first voice signal), thereby obtaining the target sound signal.
In an alternative example, the frequency responses of the monitoring microphone (corresponding to the second sound collection module in the above embodiment) and the near-end signal collecting microphone (corresponding to the first sound collection module in the above embodiment) are kept as consistent as possible. The more consistent the frequency responses of the two microphones, the better the performance of the adaptive filtering.
Regarding the placement of the two microphones (the monitoring microphone and the near-end signal collecting microphone): the closer the monitoring microphone is to the sound outlet, the better, and the farther the near-end signal collecting microphone is from the sound outlet, the better. At the same time, the device must be sealed well enough to achieve a sufficient sound insulation effect.
Optionally, the first path distance is greater than twice the second path distance.
In the application example shown in fig. 1, the distance from the speaker (corresponding to the sound playing module) to the monitoring microphone is L1 (corresponding to the second path), and the distance from the speaker to the near-end signal collecting microphone is L2 (corresponding to the first path). The larger the energy difference between the first echo signal and the second echo signal, the easier it is to eliminate the echo signal in the voice signal and obtain the target sound signal. The energy difference between the source echo signal arriving at the monitoring microphone and at the near-end signal collecting microphone is 20 × log10(L2/L1) dB. Limiting the distance of the first path to more than twice the distance of the second path (L2 > 2 × L1) increases the energy difference between the first echo signal and the second echo signal through the difference in propagation distance, so the echo cancellation effect can be improved.
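As a quick numeric check of the formula above, the following sketch evaluates 20 × log10(L2/L1) and the equivalent linear energy ratio (L2/L1)^2. The function names and the example distances are hypothetical; only the formula comes from the text.

```python
import math


def echo_level_difference_db(l1, l2):
    """Energy difference in dB between the echo at the monitoring
    microphone (distance l1) and at the near-end microphone (distance l2)."""
    if l1 <= 0 or l2 <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(l2 / l1)


def echo_energy_ratio(l1, l2):
    """The same difference expressed as a linear energy ratio, (l2 / l1) ** 2."""
    if l1 <= 0 or l2 <= 0:
        raise ValueError("distances must be positive")
    return (l2 / l1) ** 2


# Hypothetical distances in arbitrary but identical units.
print(round(echo_level_difference_db(1.0, 2.0), 2))  # boundary case L2 = 2 * L1
print(echo_energy_ratio(1.0, 2.0))
```

At the boundary L2 = 2 × L1 the difference is about 6 dB (an energy ratio of 4), so the condition L2 > 2 × L1 corresponds to an echo energy difference of more than roughly 6 dB under this formula.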
Fig. 5 is a schematic structural diagram of an echo cancellation module in the echo cancellation device provided in fig. 3 according to the present disclosure. In this embodiment, the echo cancellation module 31 includes:
a calculating unit 311, configured to determine the first echo signal based on the second voice signal.
An adaptive filter 312 for determining the target sound signal based on the first speech signal, the second speech signal and the first echo signal obtained by the calculation unit.
In this embodiment, referring to the schematic diagram shown in fig. 2, the second voice signal includes a second echo signal and a second target sound signal. The second echo signal may be determined by combining the source echo signal played by the speaker with the second voice signal. The energy difference between the first echo signal and the second echo signal is determined by the relationship between echo path 1 and echo path 2, so the first echo signal may be determined once the second echo signal and the energy difference are known. The adaptive filter may then filter the first voice signal by combining the second voice signal and the first echo signal, and the first target sound signal is determined as the target sound signal.
Optionally, the adaptive filter 312 is configured to filter the first voice signal based on the second voice signal, filter the first echo signal in the first voice signal, and obtain the first target sound signal as the target sound signal.
The echo canceller includes a function of calculating an echo signal based on the first voice signal and the second voice signal, and filtering the echo signal from the first voice signal through an adaptive filter.
The echo cancellation module 31 in the embodiment shown in fig. 5 further includes:
an updating unit 313, configured to determine whether to update the parameter of the adaptive filter based on an energy difference between the first speech signal and the second speech signal in the same frequency-domain subband.
Referring to the schematic diagram of fig. 2, by transforming the signal ref and the signal d to the frequency domain, the filter can be updated dynamically by comparing the energies of the two signals in the same subband.
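The per-subband comparison can be sketched as follows. The text does not specify the transform, the frame length, or how subbands are formed, so the plain DFT, the grouping of adjacent bins into bands, and the names `subband_energies`, `subband_energy_ratios`, and `bins_per_band` are all assumptions made for illustration.

```python
import cmath
import math


def subband_energies(frame, bins_per_band):
    """Energy per subband of one real-valued frame.

    A direct DFT over the non-negative frequencies is used for clarity;
    a real system would use an FFT or a filter bank.
    """
    n = len(frame)
    half = n // 2 + 1
    power = []
    for k in range(half):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power.append(abs(coeff) ** 2)
    return [sum(power[i:i + bins_per_band])
            for i in range(0, half, bins_per_band)]


def subband_energy_ratios(ref_frame, d_frame, bins_per_band, eps=1e-12):
    """REF_i / D_i for every subband i (eps avoids division by zero)."""
    ref_e = subband_energies(ref_frame, bins_per_band)
    d_e = subband_energies(d_frame, bins_per_band)
    return [r / (x + eps) for r, x in zip(ref_e, d_e)]


# Toy frames: ref is twice as loud as d, so the energy ratio in the
# occupied subband is about 4.
d_frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
ref_frame = [2.0 * v for v in d_frame]
print(subband_energy_ratios(ref_frame, d_frame, bins_per_band=3)[0])
```

Only the subband that actually contains signal gives a meaningful ratio here; empty subbands yield ratios near zero because of the `eps` guard.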
In an alternative example, the updating unit 313 is configured to update the parameter of the adaptive filter in response to that the energy ratio of the first speech signal and the second speech signal in the same frequency-domain subband reaches a first preset condition.
An adaptive filter is a filter that uses an adaptive algorithm to change its parameters and structure in response to changes in the environment. In general, the structure of the adaptive filter does not change, while its parameters are time-varying coefficients updated by the adaptive algorithm. That is, the parameters are adapted automatically and continuously to a given signal to obtain a desired response.
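To make the idea of time-varying coefficients concrete, here is a minimal time-domain sketch using the normalized least-mean-squares (NLMS) rule. NLMS is a common adaptive algorithm chosen here for illustration; the source does not say which algorithm the adaptive filter uses. The function name `nlms_filter`, the tap count, and the step size are assumptions.

```python
import random


def nlms_filter(ref, d, num_taps=4, step=0.5, eps=1e-8, adapt=True):
    """Return (residual signal, final weights) for a time-domain NLMS canceller.

    ref: reference samples (e.g. the monitoring microphone)
    d:   primary samples that contain the echo to be removed
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference sample first
    errors = []
    for x, desired in zip(ref, d):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = desired - estimate  # residual after removing the echo estimate
        if adapt:
            # Normalizing by the reference power keeps the update stable.
            norm = sum(h * h for h in history) + eps
            weights = [w + step * err * h / norm for w, h in zip(weights, history)]
        errors.append(err)
    return errors, weights


# Hypothetical echo-only case: d is the reference attenuated by 0.5.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(500)]
d = [0.5 * x for x in ref]
errors, weights = nlms_filter(ref, d)
```

With `adapt=False` the same function filters with frozen coefficients, which corresponds to continuing to filter while updating is stopped.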
Optionally, the filter update and filtering operations are as follows:
If REF_i / D_i > k · (L2² / L1²), the adaptive filter is updated while it filters the signal, where REF_i is the energy of the i-th frequency-domain subband of the signal ref (the second speech signal), D_i is the energy of the i-th frequency-domain subband of the signal d (the first speech signal), and k is a preset threshold coefficient, a positive number smaller than 1 but close to 1. In this case the speaker is in a working state, that is, the speaker is playing the source echo signal, and the filter update is performed.
If Th < REF_i / D_i < k · (L2² / L1²), the volume played by the speaker can be considered small; filter updating is stopped but filtering continues. Th is a preset value larger than 1. The speaker may still be operating in this case.
If REF_i / D_i < Th, both filter updating and filtering are stopped. In this case, optionally, blind source separation noise reduction or sound source localization may be performed using the two microphone signals (the monitoring microphone and the near-end signal acquisition microphone).
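The three cases above can be sketched as a small decision function, reading the upper threshold as k · L2² / L1². The names `Mode` and `select_mode` and the default values `k=0.9` and `th=1.5` are hypothetical. The text uses strict inequalities and does not say what happens when the ratio equals a threshold; assigning those cases to the lower branch is an assumption.

```python
from enum import Enum


class Mode(Enum):
    UPDATE_AND_FILTER = "update_and_filter"  # speaker clearly active
    FILTER_ONLY = "filter_only"              # speaker quiet: freeze coefficients
    BYPASS = "bypass"                        # neither update nor filter


def select_mode(ref_energy, d_energy, l1, l2, k=0.9, th=1.5, eps=1e-12):
    """Choose the per-subband operating mode from the ratio REF_i / D_i.

    l1 and l2 stand for the distances L1 and L2 named in the text.
    """
    ratio = ref_energy / (d_energy + eps)
    upper = k * (l2 * l2) / (l1 * l1)
    if ratio > upper:
        return Mode.UPDATE_AND_FILTER
    if ratio > th:
        return Mode.FILTER_ONLY
    return Mode.BYPASS


# Hypothetical geometry with L2 = 2 * L1, so the upper threshold is 3.6.
loud = select_mode(3.9, 1.0, l1=1.0, l2=2.0)
quiet = select_mode(2.0, 1.0, l1=1.0, l2=2.0)
silent = select_mode(1.0, 1.0, l1=1.0, l2=2.0)
```

In the `BYPASS` case a caller could hand the two microphone signals to blind source separation or sound source localization, as the text suggests.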
The larger the ratio of L2 to L1, the more loudspeaker signal the signal ref contains relative to the signal d, the more accurate the filter estimate, and the better the filtering performance.
When L2 and L1 are fixed, a larger REF_i / D_i means that the signal contains more reference signal components; the update step size of the filter can then be increased to speed up tracking. When REF_i / D_i is greater than k · (L2² / L1²), a smaller ratio means a larger relative proportion of near-end signal (second sound signal) components; the update step size of the filter is then reduced to lower the risk of filter divergence.
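One way to realize this step-size behaviour is a linear mapping from the energy ratio to a step size. The sketch below is only an assumption about the shape of that mapping: the source says the step grows with the ratio but gives no formula. The name `step_size`, the bounds `mu_min` and `mu_max`, and the use of L2² / L1² as the largest expected ratio are all hypothetical.

```python
def step_size(ratio, upper, max_ratio, mu_min=0.05, mu_max=0.5):
    """Map REF_i / D_i to a filter step size; 0.0 means "do not update".

    upper:     update threshold, k * L2**2 / L1**2
    max_ratio: ratio expected when only the loudspeaker is heard (assumed L2**2 / L1**2)
    """
    if max_ratio <= upper:
        raise ValueError("max_ratio must exceed the update threshold")
    if ratio <= upper:
        return 0.0
    # Fraction of the way from the threshold to the loudspeaker-only ratio.
    position = min((ratio - upper) / (max_ratio - upper), 1.0)
    return mu_min + (mu_max - mu_min) * position


# Hypothetical thresholds for L2 = 2 * L1 and k = 0.9.
small = step_size(3.7, upper=3.6, max_ratio=4.0)
large = step_size(3.9, upper=3.6, max_ratio=4.0)
```

A ratio just above the threshold gives a step near `mu_min`, which limits the damage when near-end speech is present, while a ratio near `max_ratio` gives the fastest tracking.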
In an alternative example, an apparatus provided by an embodiment of the present disclosure includes a cavity;
the sound playing module 44 is disposed inside the cavity, and the first sound collection module 32 and the second sound collection module 33 are disposed outside the cavity.
In this embodiment, the monitoring microphone is placed outside the device cavity. This placement is highly robust to the unstable sound pressure inside the cavity caused by speaker vibration and to the noise introduced when air passes through a small sound hole while the speaker operates at high power. Because the source echo signal is transmitted to the outside through the sound outlet, placing the first sound collection module 32 and the second sound collection module 33 outside the cavity shields them from the vibration of the speaker inside the cavity regardless of how conditions inside the cavity change, which improves the stability of the energy ratio of the first echo signal to the second echo signal.
Because the mismatch between the two microphones is generally smaller than the distortion of the loudspeaker, the noise reduction can reach nearly 40 dB when only the loudspeaker is playing, which exceeds the noise reduction obtained with a reference signal collected by hardware.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including: at least one loudspeaker and the echo cancellation device provided in any of the above embodiments;
the loudspeaker is used for playing the echo signal;
and the echo cancellation device is used for receiving the voice signal and canceling the echo signal in the voice signal to obtain a target sound signal; the voice signal includes the target sound signal and the echo signal played by the loudspeaker.
The electronic device provided by this embodiment may be a smart speaker device as shown in fig. 1 or another electronic device that needs to cancel the echo emitted by its speakers. Fig. 1 only shows the case of one speaker. When there are multiple speakers, the electronic device provided by this embodiment can still obtain a good effect without any change (such as adding a microphone); that is, it is compatible with stereo and multi-channel audio devices. If the echo cancellation effect with multiple speakers is to be improved further, reference microphones can be added. Optionally, after the addition, the number of reference microphones corresponds to the number of speakers; for example, two speakers correspond to two reference microphones.
Exemplary method
Fig. 6 is a flowchart illustrating an echo cancellation method according to an exemplary embodiment of the disclosure. As shown in fig. 6, the present embodiment may be applied to a voice interaction device such as a smart speaker, and the method of the present embodiment includes:
Step 601, collecting a first voice signal through a first sound collection module arranged at a first position.
The first voice signal comprises a first target sound signal.
Step 602, collecting a second voice signal through a second sound collection module arranged at a second position.
The second voice signal comprises a second target sound signal.
Step 603, determining a target sound signal from the first target sound signal and the second target sound signal based on the first speech signal and the second speech signal.
According to the echo cancellation method provided by the embodiment of the disclosure, the two sound collection modules (the first sound collection module and the second sound collection module) arranged at different positions are used for obtaining sound signals with different intensities, one of the sound signals is used as a monitoring signal, an echo signal in the other sound signal is cancelled, and then a target sound signal is obtained, so that simple and efficient echo cancellation is realized.
Fig. 7 is a flowchart illustrating an echo cancellation method according to another exemplary embodiment of the present disclosure. On the basis of the embodiment provided by fig. 6, the embodiment of the method further includes:
step 701, playing a source echo signal.
In this embodiment, step 601 includes: collecting the source echo signal through a first path to obtain a first voice signal formed by a first echo signal and a first target sound signal.
Wherein the first echo signal is obtained based on the source echo signal; the first path is a first sound propagation path between the sound playing module and the first sound collecting module.
Step 602 includes: collecting the source echo signal through a second path to obtain a second voice signal formed by a second echo signal and the second target sound signal.
Wherein the second echo signal is obtained based on the source echo signal; the second path is a second sound propagation path between the sound playing module and the second sound collecting module.
Optionally, the first path distance is greater than twice the second path distance.
In this case, step 603 includes:
Step 6031, determining a first echo signal based on the second voice signal.
Step 6032, determining, by using the adaptive filter, the target sound signal based on the first voice signal, the second voice signal and the first echo signal.
Optionally, step 6032 comprises: filtering the first voice signal based on the second voice signal to filter out the first echo signal in the first voice signal, and obtaining the first target sound signal as the target sound signal.
Optionally, step 603 further includes:
Step 6033, determining whether to update the parameters of the adaptive filter based on the energy difference of the first voice signal and the second voice signal in the same frequency-domain subband; when it is determined that the parameters are to be updated, step 6034 is performed; otherwise, step 6032 is performed.
Step 6034, updating the parameters of the adaptive filter, and performing step 6032 with the adaptive filter whose parameters have been updated.
Optionally, step 6033 comprises: updating the parameters of the adaptive filter in response to the energy ratio of the first voice signal and the second voice signal in the same frequency-domain subband meeting a first preset condition.
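The ordering of steps 6033, 6034 and 6032 can be sketched as a per-frame control flow. This is a schematic only: `process_frame` and the two callbacks are hypothetical names, the "first preset condition" is assumed to be a simple threshold on the energy ratio, and the callbacks stand in for a real adaptive filter.

```python
def process_frame(ref_energy, d_energy, threshold, update_filter, apply_filter):
    """Run step 6033, then optionally step 6034, then step 6032 for one frame."""
    ratio = ref_energy / max(d_energy, 1e-12)
    if ratio > threshold:    # step 6033: assumed form of the first preset condition
        update_filter()      # step 6034: refresh the filter parameters
    return apply_filter()    # step 6032: filter with the current parameters


# Stand-in callbacks that only record the order in which they are called.
calls = []


def update_filter():
    calls.append("update")


def apply_filter():
    calls.append("filter")
    return "filtered frame"


first = process_frame(4.0, 1.0, 3.6, update_filter, apply_filter)   # update, then filter
second = process_frame(2.0, 1.0, 3.6, update_filter, apply_filter)  # filter only
```

Passing the filter operations as callbacks keeps the control flow independent of whichever adaptive algorithm is used.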
Any of the echo cancellation methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: terminal equipment, a server and the like. Alternatively, any of the echo cancellation methods provided by the embodiments of the present disclosure may be executed by a processor, for example, the processor may execute any of the echo cancellation methods mentioned in the embodiments of the present disclosure by calling a corresponding instruction stored in a memory.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 8. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.
FIG. 8 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 8, the electronic device 80 includes one or more processors 81 and memory 82.
The processor 81 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 80 to perform desired functions.
Memory 82 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 81 to implement the echo cancellation methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 80 may further include: an input device 83 and an output device 84, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input device 83 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input means 83 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
The input device 83 may also include, for example, a keyboard, a mouse, and the like.
The output device 84 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 84 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 80 relevant to the present disclosure are shown in fig. 8, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 80 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the echo cancellation method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
Program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the echo cancellation method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. However, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, configurations, etc. must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. An echo cancellation device, comprising: the device comprises an echo cancellation module, a first sound acquisition module positioned at a first position and a second sound acquisition module positioned at a second position;
the first sound collection module is used for collecting a first voice signal, and the first voice signal comprises a first target sound signal;
the second sound acquisition module is used for acquiring a second voice signal, wherein the second voice signal comprises a second target sound signal;
the echo cancellation module is configured to determine a target sound signal from the first target sound signal and the second target sound signal based on the first speech signal and the second speech signal.
2. The apparatus of claim 1, wherein the apparatus further comprises:
the sound playing module is used for playing the source echo signal;
the first sound acquisition module acquires the source echo signal through a first path to obtain a first voice signal formed by a first echo signal and the first target sound signal, wherein the first echo signal is obtained based on the source echo signal; the first path is a first sound propagation path between the sound playing module and the first sound collecting module;
the second sound acquisition module acquires the source echo signal through a second path to obtain a second voice signal formed by a second echo signal and the first target sound signal, wherein the second echo signal is obtained based on the source echo signal; the second path is a second sound propagation path between the sound playing module and the second sound collecting module.
3. The apparatus of claim 2, wherein the first path distance is greater than twice the second path distance.
4. The apparatus of claim 2, wherein the echo cancellation module comprises:
a calculation unit for determining the first echo signal based on the second speech signal;
an adaptive filter for determining the target sound signal based on the first voice signal, the second voice signal and the first echo signal obtained by the calculating unit.
5. The apparatus of claim 4, wherein the adaptive filter is configured to filter the first speech signal based on the second speech signal, filter a first echo signal in the first speech signal, and obtain the first target sound signal as the target sound signal.
6. The apparatus of claim 5, wherein the echo cancellation module further comprises:
an updating unit, configured to determine whether to update the parameter of the adaptive filter based on an energy difference between the first speech signal and the second speech signal in the same frequency domain subband.
7. The apparatus of claim 6, wherein the updating unit is configured to update the parameters of the adaptive filter in response to an energy ratio of the first speech signal and the second speech signal in a same frequency-domain subband reaching a first preset condition.
8. The device of any of claims 2-7, wherein the device comprises a cavity;
the sound playing module is arranged inside the cavity, and the first sound collecting module and the second sound collecting module are arranged outside the cavity.
9. An echo cancellation method, comprising:
acquiring a first voice signal through a first sound collection module arranged at a first position, wherein the first voice signal comprises a first target sound signal;
acquiring a second voice signal through a second sound collection module arranged at a second position, wherein the second voice signal comprises a second target sound signal;
determining a target sound signal from the first target sound signal and the second target sound signal based on the first speech signal and the second speech signal.
10. An electronic device, comprising: at least one loudspeaker and the echo canceling device of any one of the preceding claims 1 to 8;
the loudspeaker is used for playing an echo signal;
the echo cancellation device is used for receiving a voice signal, canceling the echo signal in the voice signal and obtaining a target sound signal; the voice signal comprises the target sound signal and an echo signal played by the loudspeaker.
11. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the echo cancellation method of claim 9.
12. A computer-readable storage medium, which stores a computer program for executing the echo cancellation method of claim 9.
CN201910419525.1A 2019-05-20 2019-05-20 Echo cancellation device and method, electronic device, and storage medium Pending CN111968660A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910419525.1A CN111968660A (en) 2019-05-20 2019-05-20 Echo cancellation device and method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN111968660A true CN111968660A (en) 2020-11-20

Family

ID=73357945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910419525.1A Pending CN111968660A (en) 2019-05-20 2019-05-20 Echo cancellation device and method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN111968660A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101179294A (en) * 2006-11-09 2008-05-14 爱普拉斯通信技术(北京)有限公司 Self-adaptive echo eliminator and echo eliminating method thereof
CN101820302A (en) * 2009-02-27 2010-09-01 比亚迪股份有限公司 Device and method for canceling echo
CN102377453A (en) * 2010-08-06 2012-03-14 联芯科技有限公司 Method and device for controlling updating of self-adaptive filter and echo canceller
CN103259563A (en) * 2012-02-16 2013-08-21 联芯科技有限公司 Self-adapting filter divergence detection method and echo cancellation system
CN104519212A (en) * 2013-09-27 2015-04-15 华为技术有限公司 An echo cancellation method and apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362843A (en) * 2021-06-30 2021-09-07 北京小米移动软件有限公司 Audio signal processing method and device
CN113362843B (en) * 2021-06-30 2023-02-17 北京小米移动软件有限公司 Audio signal processing method and device
CN113555030A (en) * 2021-07-29 2021-10-26 杭州萤石软件有限公司 Audio signal processing method, device and equipment
CN113555030B (en) * 2021-07-29 2024-05-31 杭州萤石软件有限公司 Audio signal processing method, device and equipment
WO2023077980A1 (en) * 2021-11-04 2023-05-11 深圳Tcl新技术有限公司 Sound effect adjusting method and apparatus, storage medium, and electronic device
CN115881151A (en) * 2023-01-04 2023-03-31 广州市森锐科技股份有限公司 Bidirectional pickup denoising method, device, equipment and medium based on high-speed shooting instrument

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination