WO2020191512A1 - Echo cancellation device, echo cancellation method, signal processing chip, and electronic device - Google Patents

Echo cancellation device, echo cancellation method, signal processing chip, and electronic device

Info

Publication number
WO2020191512A1
WO2020191512A1 PCT/CN2019/079174 CN2019079174W WO2020191512A1 WO 2020191512 A1 WO2020191512 A1 WO 2020191512A1 CN 2019079174 W CN2019079174 W CN 2019079174W WO 2020191512 A1 WO2020191512 A1 WO 2020191512A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice signal
digital voice
update
double
detection module
Prior art date
Application number
PCT/CN2019/079174
Other languages
English (en)
French (fr)
Inventor
韩文凯
王鑫山
李国梁
郭红敬
朱虎
Original Assignee
深圳市汇顶科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司
Priority to CN201980000673.8A (CN111989934B)
Priority to PCT/CN2019/079174 (WO2020191512A1)
Publication of WO2020191512A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/02 Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback

Definitions

  • The embodiments of the present application relate to the field of signal processing technology, and in particular to an echo cancellation device, an echo cancellation method, a signal processing chip, and an electronic device.
  • Echo cancellation is currently a major problem in the industry. In terms of how echo is generated, besides environmental causes, such as the loudspeaker sound being fed back to the microphone in a hands-free communication system, echo also includes the echo introduced by network transmission delay. It further includes the indirect echo produced after the far-end sound undergoes one or more reflections. In terms of the factors that affect echo cancellation, the result depends not only on the external environment of the terminal equipment of the communication system, but also on the performance of the host running the communication system and on network conditions. The external environment may specifically include the relative distance and direction between the microphone and the speaker, the relative distance and direction between the speaker and the speaker, the size of the room, the wall material of the room, and so on.
  • AEC Acoustic Echo Cancellation
  • The echo cancellation algorithm uses an adaptive filter to model the echo path and continuously adjusts the filter coefficients through the adaptive algorithm so that the filter's impulse response approaches the true echo path. The far-end speech signal is then passed through the filter to obtain the estimated echo signal, which is subtracted from the microphone input signal to cancel the echo.
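  • The adaptive-filter idea described above can be illustrated with a minimal time-domain sketch. It assumes a normalized LMS (NLMS) update, which the text does not specify; the function name `nlms_echo_cancel`, the tap count, the step size `mu`, and the three-tap `true_path` are all hypothetical values chosen for the demonstration.

```python
import random

def nlms_echo_cancel(far, mic, taps=8, mu=0.5, eps=1e-8):
    """Return (error signal, final coefficients) for a simple NLMS canceller.

    far: far-end samples x(n); mic: microphone samples y(n).
    NLMS is an assumption of this sketch, not the method of the source text.
    """
    w = [0.0] * taps      # adaptive filter coefficients (echo-path estimate)
    buf = [0.0] * taps    # most recent far-end samples, newest first
    err = []
    for x, y in zip(far, mic):
        buf = [x] + buf[:-1]
        echo_hat = sum(wi * xi for wi, xi in zip(w, buf))  # estimated echo
        e = y - echo_hat                                   # error signal e(n)
        norm = sum(xi * xi for xi in buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        err.append(e)
    return err, w

random.seed(0)
true_path = [0.6, -0.3, 0.1]  # hypothetical echo path, used only for the demo
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Microphone signal containing echo only (no near-end speaker).
mic = [sum(h * far[n - k] for k, h in enumerate(true_path) if n - k >= 0)
       for n in range(len(far))]
err, w = nlms_echo_cancel(far, mic)
```

  • With echo only and no near-end speech, the coefficients should approach `true_path` and the residual `err` should decay toward zero; near-end speech would disturb this adaptation, which is what motivates double-talk detection.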
  • DTD Double Talk Detection
  • A so-called double-talk situation means that the signal collected by the microphone includes both the echo caused by the far-end voice signal and the voice signal of the near-end speaker.
  • The adaptive filtering algorithm inevitably generates misadjustment noise during echo cancellation. Reducing the update step size used to adjust the filter coefficients reduces this misadjustment noise and improves the convergence accuracy of the algorithm.
  • However, reducing the filter coefficient update step size also reduces the convergence speed and tracking speed of the algorithm.
  • A fixed-step adaptive filtering algorithm therefore faces contradictory requirements on its adjustment factor with respect to convergence speed, tracking speed, and convergence accuracy.
  • In addition, the selected decision threshold is usually fixed, which may cause missed detections and false alarms, thereby reducing the accuracy of double-talk detection (DTD).
  • one of the technical problems solved by the embodiments of the present invention is to provide an echo cancellation device, an echo cancellation method, a signal processing chip and an electronic device to overcome the above-mentioned defects in the prior art.
  • An embodiment of the present application provides an echo cancellation device, which includes:
  • a voice endpoint detection module, used to detect whether there is an actual echo digital voice signal in the near-end digital voice signal;
  • the double-talk detection module is configured to determine whether to activate according to the detection result of the voice endpoint detection module, and detect the double-talk probability after activation to control the update of the filter coefficient;
  • the adaptive filter is used to generate an estimated echo digital voice signal according to the filter coefficient and the far-end digital voice signal, so as to eliminate the actual echo digital voice signal in the near-end digital voice signal.
  • the voice endpoint detection module is further configured to compare the energy of the near-end digital voice signal and the far-end digital voice signal with a preset energy threshold, so as to detect whether the actual echo digital voice signal exists in the near-end digital voice signal.
  • the double-talk detection module is further configured not to be activated when the detection result of the voice endpoint detection module indicates that the actual echo digital voice signal does not exist in the near-end digital voice signal, so that the filter coefficient is updated according to the historical step size; or the double-talk detection module is further configured to be activated, so as to update the filter coefficients, when the detection result of the voice endpoint detection module indicates that the actual echo digital voice signal exists in the near-end digital voice signal.
  • the double-talk detection module is further configured to determine whether to activate according to the detection result of the voice endpoint detection module and, after activation, to detect the double-talk probability from the estimated energies of the estimated echo digital voice signal and of the estimated digital voice signal of the near-end speaker, and to control the update of the filter coefficient.
  • the double-talk detection module is further configured to determine whether to activate according to the detection result of the voice endpoint detection module and, after activation, to smooth the estimated energy and detect the double-talk probability according to the smoothed estimated energy, so as to control the update of the filter coefficient.
  • the double-talk detection module is further configured to determine whether to activate according to the detection result of the voice endpoint detection module and, after activation, to determine, from the estimated energies of the estimated echo digital voice signal and of the estimated digital voice signal of the near-end speaker, the ratio between the probabilities for the case in which the digital voice signal of the near-end speaker does not exist and the case in which it exists.
  • the double-talk detection module is further configured to determine whether to activate according to the detection result of the voice endpoint detection module and, after activation, to smooth the estimated energy, to determine from the smoothed estimated energy the ratio between the probabilities for the case in which the digital voice signal of the near-end speaker does not exist and the case in which it exists, and thereby to detect the double-talk probability and control the update of the filter coefficient.
  • the ratio of the probability of the near-end digital voice signal when the digital voice signal of the near-end speaker does not exist to its probability when the digital voice signal of the near-end speaker exists is inversely proportional to the double-talk probability.
  • the double-talk detection module is further configured to determine a step-size update factor according to the double-talk probability, so as to determine the update step size of the filter coefficients according to the step-size update factor and in turn update the filter coefficients.
  • the double-talk probability and the step update factor have a non-linear relationship.
  • the change trend of the double-talk probability is opposite to the change trend of the step-size update factor.
  • if the near-end digital voice signal contains both the digital voice signal of the near-end speaker and the actual echo digital voice signal, the double-talk probability is 1 and the step-size update factor is 0, so the update step size of the filter coefficient is reduced to slow down or stop the update of the filter coefficient; if the near-end digital voice signal contains no digital voice signal of the near-end speaker but only the actual echo digital voice signal, the double-talk probability is 0 and the step-size update factor is a non-zero value, so the update step size of the filter coefficient is increased to speed up the update of the filter coefficient.
  • the double-talk detection module is further configured to determine the step-size update factor according to the double-talk probability, to determine the update step size of the filter coefficient according to the step-size update factor, and then to update the filter coefficient according to the update step size and the update gradient.
  • the double-talk detection module is further configured to determine the step-size update factor according to the double-talk probability, to determine the update step size of the filter coefficient according to the step-size update factor and the step-size smoothing amount, and then to update the filter coefficient according to the update step size and the update gradient.
  • the update step size and the step size update factor have a linear relationship.
  • the filter coefficient has a linear relationship with the update step size.
  • it further includes: an addition module, configured to subtract the estimated echo digital voice signal from the near-end digital voice signal to obtain an error digital voice signal, so as to eliminate the actual echo digital voice signal in the near-end digital voice signal.
  • the embodiment of the present application also provides an echo cancellation method, which includes:
  • the voice endpoint detection module detects whether there is an actual echo digital voice signal in the near-end digital voice signal;
  • the double-talk detection module determines whether to activate according to the detection result of the voice endpoint detection module, and controls the update of the filter coefficient by detecting the double-talk probability after activation;
  • the adaptive filter generates an estimated echo digital voice signal according to the filter coefficient and the far-end digital voice signal, so as to eliminate the actual echo digital voice signal in the near-end digital voice signal.
  • An embodiment of the present application also provides a signal processing chip, which includes the echo cancellation device described in any embodiment of the present application.
  • An embodiment of the present application also provides an electronic device, which includes the signal processing chip described in any embodiment of the present application.
  • The voice endpoint detection module detects whether there is an actual echo digital voice signal in the near-end digital voice signal; the double-talk detection module determines whether to activate according to the detection result of the voice endpoint detection module and, after activation, controls the update of the filter coefficient through the detected double-talk probability; the adaptive filter generates an estimated echo digital voice signal according to the filter coefficient and the far-end digital voice signal, so as to eliminate the actual echo digital voice signal in the near-end digital voice signal.
  • A signal derived from the echo digital voice signal estimated by the adaptive filter, such as the error digital voice signal output by the adder, is fed back to the double-talk detection module to control its detection of the double-talk probability, so that the double-talk detection module and the adaptive filter constrain each other. This resolves the contradiction between steady-state offset and convergence speed in the adaptive filtering algorithm and improves both the detection accuracy of the double-talk detection module and the convergence performance of the filter.
  • FIG. 1 is a schematic structural diagram of a signal processing chip using an echo cancellation device in Embodiment 1 of the application;
  • FIG. 2 is a schematic diagram of the working flow of the signal processing chip in the second embodiment of the application.
  • FIG. 1 is a schematic structural diagram of a signal processing chip applying an echo cancellation device in Embodiment 1 of the application. As shown in FIG. 1, the signal processing chip includes an echo cancellation device 100, which specifically includes a voice endpoint detection module 106, a double-talk detection module 108, and an adaptive filter 110.
  • the echo cancellation device may also include a voice collection module 102, a voice playback module 104, and an addition module 112.
  • the voice collection module 102 is communicatively connected to the voice endpoint detection module 106, the double-talk detection module 108, and the addition module 112; the voice playback module 104 is communicatively connected to the voice endpoint detection module 106 and the adaptive filter 110; the voice endpoint detection module 106 is communicatively connected to the double-talk detection module 108; and the double-talk detection module 108 is communicatively connected to the adaptive filter 110 and the addition module 112.
  • the voice collection module 102 is used to collect the near-end analog voice signal y(t) and generate the near-end digital voice signal y(n); in this embodiment, the voice collection module may specifically be a microphone.
  • the near-end analog voice signal y(t) may include the voice signal s(t) of the near-end speaker, and may also include the echo analog voice signal d(t) caused by the voice playback module 104 playing the far-end analog voice signal.
  • the far-end analog voice signal x(t) and the near-end analog voice signal y(t) are distinguished by how far the respective party's voice source is from the signal processing chip.
  • the voice playing module 104 is used to play the received remote analog voice signal x(t); in this embodiment, the voice playing module 104 may be specifically a speaker.
  • the voice endpoint detection module 106 is used to detect whether there is an actual echo digital voice signal d(n) in the near-end digital voice signal y(n); in this embodiment, the voice endpoint detection module 106 can also be called a voice endpoint detector (Voice Activity Detector, VAD for short).
  • VAD Voice Activity Detector
  • the double-talk detection module 108 is used to determine whether to start according to the detection result of the voice endpoint detection module and, after starting, to control the update of the filter coefficient through the detected double-talk probability; in this embodiment, the double-talk detection module 108 may also be referred to as a double-talk detector (DTD).
  • in this embodiment, the update step size of the filter coefficients is adjusted by the double-talk detection module 108. If the double-talk detection module 108 is to be made more lightweight, an update module can instead be added between the double-talk detection module 108 and the adaptive filter 110, dedicated to updating the filter coefficients or determining their update step size.
  • the adaptive filter 110 is used to generate an estimated echo digital voice signal according to the filter coefficients and the far-end digital voice signal x(n), so as to eliminate the actual echo digital voice signal d(n) in the near-end digital voice signal y(n).
  • the adaptive filter 110 is, for example, a multi-delay block frequency domain adaptive filter.
  • the addition module 112 is configured to subtract the estimated echo digital voice signal from the near-end digital voice signal y(n) to obtain the error digital speech signal e(n), so as to eliminate the actual echo digital speech signal d(n) in the near-end digital speech signal y(n).
  • the addition module 112 may specifically be an adder.
  • the more accurate the estimated echo digital speech signal, that is, the closer it is to the actual echo digital speech signal d(n), the higher the speech intelligibility.
  • the voice endpoint detection module 106 is further configured to compare the energies of the near-end digital voice signal y(n) and the far-end digital voice signal x(n) with their respective preset energy thresholds, so as to detect whether there is an actual echo digital voice signal d(n) in the near-end digital voice signal y(n). If the energies of the near-end digital voice signal y(n) and the far-end digital voice signal x(n) are both greater than the corresponding preset energy thresholds, it is determined that the actual echo digital speech signal d(n) exists.
  • when the detection result of the voice endpoint detection module 106 indicates that there is no actual echo digital voice signal d(n) in the near-end digital voice signal y(n), the double-talk detection module 108 does not start, so that the filter coefficient is updated according to the historical step size; when the detection result of the voice endpoint detection module indicates that there is an actual echo digital voice signal d(n) in the near-end digital voice signal, the double-talk detection module 108 starts, so as to determine the update step size of the filter coefficients.
  • if the near-end digital voice signal y(n) contains both the digital voice signal s(n) of the near-end speaker and the echo digital voice signal d(n), that is, in a double-talk state, the update step size of the filter coefficients is reduced to slow down or stop the update of the filter coefficients; if the near-end digital voice signal y(n) contains no digital voice signal s(n) of the near-end speaker and only the echo digital voice signal d(n), that is, in a single-talk state, the update step size of the filter coefficient is increased to speed up the update of the filter coefficient.
  • the adaptive filter is further configured to generate an estimated echo digital voice signal according to the filter coefficient and the remote digital voice signal x(n)
  • Fig. 2 is a schematic diagram of the working flow of the signal processing chip in the second embodiment of the application; corresponding to Fig. 1 above, it includes:
  • the voice playing module plays the received remote analog voice signal x(t);
  • the echo analog voice signal d(t) included in the near-end analog voice signal y(t) is specifically caused by the far-end analog voice signal x(t). Therefore, for the voice collection module 102, the input near-end analog voice signal y(t) may include the speaker's analog voice signal s(t) and the echo analog voice signal d(t). It should be noted here that if there is a remote analog voice signal x(t), it will be played, otherwise, it will not be played.
  • the voice collection module collects the near-end analog voice signal y(t) to generate the near-end digital voice signal y(n);
  • the voice endpoint detection module detects whether there is an actual echo digital voice signal d(n) in the near-end digital voice signal y(n);
  • if the voice endpoint detection module 106 is a voice endpoint detector (VAD), it can examine the far-end digital voice signal x(n) and the near-end digital voice signal y(n) using the short-term energy method, the time-domain average zero-crossing rate method, or the short-term correlation method, and then judge whether there is an actual echo digital voice signal d(n).
  • if the short-term energy method is adopted, the voice endpoint detection module 106 can detect the energies of the far-end digital voice signal x(n) and the near-end digital voice signal y(n) and compare them with separately set energy thresholds.
  • if the energies of the far-end digital voice signal x(n) and the near-end digital voice signal y(n) are both greater than the corresponding energy thresholds, the near-end digital voice signal y(n) contains an echo digital voice signal d(n). Because the echo digital voice signal d(n) is generated from the far-end digital voice signal x(n), it can be understood that whenever there is a far-end digital voice signal x(n), there is an echo digital voice signal d(n).
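  • The short-term energy decision described above can be sketched as follows. This is only an illustration: the function names and the threshold values are hypothetical, and the mean-square definition of frame energy is an assumption.

```python
def short_term_energy(frame):
    """Mean-square energy of one frame of samples (assumed definition)."""
    return sum(v * v for v in frame) / len(frame)

def echo_present(near_frame, far_frame, near_thresh=1e-4, far_thresh=1e-4):
    """Declare an echo present only if both the near-end frame y(n) and the
    far-end frame x(n) exceed their own (hypothetical) energy thresholds."""
    return (short_term_energy(near_frame) > near_thresh
            and short_term_energy(far_frame) > far_thresh)

# Far end active and microphone picking something up: echo assumed present.
active = echo_present([0.1, -0.1, 0.1, -0.1], [0.2, -0.2, 0.2, -0.2])
# Far end silent: no echo can have been generated.
silent = echo_present([0.1, -0.1, 0.1, -0.1], [0.0, 0.0, 0.0, 0.0])
```

  • In practice the two thresholds would be tuned separately, since the loudspeaker and microphone paths generally have different gains.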
  • the double-talk detection module determines whether to start according to the detection result of the voice endpoint detection module, and detects the double-talk probability after the start to determine the update step of the filter coefficient;
  • the double-talk detection module does not start when the detection result indicates that there is no actual echo digital voice signal d(n) in the near-end digital voice signal y(n), so that the filter coefficients are updated according to the historical step size; for example, when echo cancellation is performed frame by frame, the filter coefficients can be updated according to the update step size used for the previous frame of the near-end digital voice signal; or,
  • the double-talk detection module 108 is activated when the detection result indicates that there is an actual echo digital voice signal d(n) in the near-end digital voice signal y(n), so as to control the determination of the filter coefficient update step size according to the detected double-talk probability.
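  • The gating between the two cases above can be sketched as a small selection function. The names `select_update_step`, `previous_step`, and `dtd_step` are hypothetical; the double-talk detector is represented by a callable so that it is invoked only when the voice endpoint detector reports an echo.

```python
def select_update_step(echo_detected, previous_step, dtd_step):
    """Pick the step size used to update the filter coefficients for a frame.

    echo_detected: VAD decision (True if an echo is present in y(n)).
    previous_step: the "historical" step size from the previous frame.
    dtd_step:      zero-argument callable standing in for the double-talk
                   detector; it runs only when the VAD reports an echo.
    """
    if not echo_detected:
        return previous_step  # detector stays idle, reuse the historical step
    return dtd_step()         # detector activated, it decides the step size

calls = []

def fake_detector():
    # Stand-in detector for the demo; records that it was activated.
    calls.append("activated")
    return 0.1

idle_step = select_update_step(False, 0.3, fake_detector)
calls_after_idle = list(calls)
active_step = select_update_step(True, 0.3, fake_detector)
```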
  • in the double-talk case, the update step size of the filter coefficient is reduced so that the update of the filter coefficient is slowed down, or the update is stopped altogether.
  • the update is slowed down or even stopped mainly because the presence of the near-end speaker's digital speech signal s(n) would cause the filter coefficients to diverge, making it impossible to generate an accurate estimated echo digital speech signal and thereby impairing echo cancellation; if the near-end digital voice signal y(n) contains no near-end speaker's digital voice signal s(n) but only the actual echo digital voice signal d(n), the update step size is increased to update the filter coefficients.
  • for how the update step size is determined, refer to the following description, which takes the probability model as an example.
  • a separate update step size determination module, dedicated to calculating the update step size, can be added between the double-talk detection module and the adaptive filter; alternatively, the double-talk detection module can calculate the update step size.
  • the adaptive filter generates an estimated echo digital voice signal according to the filter coefficient and the remote digital voice signal x(n);
  • the adaptive filter 110 is a multi-delay block frequency-domain adaptive filter, that is, it includes several block adaptive filters (for example, D adaptive filter blocks), so as to achieve shorter block delay, faster convergence, and smaller storage requirements.
  • the addition module subtracts the estimated echo digital voice signal from the near-end digital voice signal y(n) to obtain an error digital voice signal e(n), so as to eliminate the actual echo digital speech signal d(n) in the near-end digital voice signal y(n).
  • the following takes the determination of the update step size by a statistical probability model as an example to illustrate how the double-talk detection module 108 and the adaptive filter 110 constrain each other.
  • the near-end digital voice signal, the far-end digital voice signal, and the echo digital voice signal are processed frame by frame, with frames divided according to the number M of frequency points of the adaptive filter; that is, every M data points of the echo digital speech signal d(n), the digital speech signal s(n) of the near-end speaker, and the near-end digital speech signal y(n) are recorded as one frame.
  • Apply the above echo cancellation scheme to each frame of the echo digital voice signal d(n), the digital voice signal s(n) of the near-end speaker, and the near-end digital voice signal y(n).
  • H0 and H1 denote, respectively, the hypothesis that the digital voice signal s(n) of the near-end speaker is absent and only the actual echo digital voice signal d(n) is present, and the hypothesis that the digital voice signal s(n) of the near-end speaker and the actual echo digital speech signal d(n) are both present.
  • D(i) = [D(i,1), D(i,2), ..., D(i,M)], S(i) = [S(i,1), S(i,2), ..., S(i,M)], and Y(i) = [Y(i,1), Y(i,2), ..., Y(i,M)] represent the frequency-domain signals, at the 1st to M-th frequency points, of the i-th frame signal of the actual echo digital voice signal d(n), the digital voice signal s(n) of the near-end speaker, and the near-end digital voice signal y(n), respectively.
  • the value of M is generally equal to the order of the adaptive filter.
  • X(i) = [X(i,1), X(i,2), ..., X(i,M)] represents the frequency-domain signals of the i-th frame signal of the far-end digital voice signal x(n) at the 1st to M-th frequency points.
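  • As an illustration of the framing and of the per-frame frequency-domain vectors such as X(i) and Y(i), the sketch below splits a signal into non-overlapping frames of M points and applies a plain M-point DFT. The plain DFT is an assumption made for clarity; a multi-delay block frequency-domain filter would typically use overlapped blocks and an FFT. The helper names are hypothetical.

```python
import cmath

def split_frames(signal, m):
    """Split a signal into consecutive frames of m samples each.
    A trailing partial frame is dropped."""
    return [signal[i:i + m] for i in range(0, len(signal) - m + 1, m)]

def dft(frame):
    """Naive M-point DFT of one frame; element k is the value at the
    (k+1)-th frequency point."""
    m = len(frame)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / m)
                for n, v in enumerate(frame))
            for k in range(m)]

samples = [1.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 9.0]
frames = split_frames(samples, 4)    # two full frames, last sample dropped
spectra = [dft(f) for f in frames]   # one frequency-domain vector per frame
```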
  • separate energy terms, indexed by frame i and frequency point k, are defined for the i-th frame signal of the digital speech signal s(n) of the near-end speaker and of the actual echo digital speech signal d(n), respectively.
  • p(Y(i,k)|H0) represents the probability that the i-th frame signal of the near-end digital speech signal y(n) has a frequency-domain signal at the k-th frequency point, k = 1, 2, ..., M, when the near-end speaker's digital speech signal s(n) is absent; p(Y(i,k)|H1) represents the probability that the i-th frame signal of the near-end digital speech signal y(n) has a frequency-domain signal at the k-th frequency point when the near-end speaker's digital speech signal s(n) is present.
  • p(H1|Y(i)) represents the probability that the near-end speaker's digital voice signal s(n) is detected to be present given that the i-th frame signal of the near-end digital voice signal y(n) has frequency-domain signals at the 1st to M-th frequency points; this probability is the double-talk probability.
  • the likelihood ratio of Y(i,k) is the ratio of the probability that the i-th frame signal of the near-end digital speech signal y(n) has a frequency-domain signal at the k-th frequency point when the near-end speaker's digital speech signal s(n) is absent to the corresponding probability when s(n) is present; it is also called the likelihood ratio of the i-th frame signal of the near-end digital speech signal y(n) having a frequency-domain signal at the k-th frequency point.
  • the actual energies of the frequency-domain signals of the i-th frame signal of the near-end speaker's digital voice signal s(n) and of the actual echo digital voice signal d(n) at the k-th frequency point are difficult to obtain in real scenarios, so in this embodiment they are estimated by the following formula (5).
  • the estimated energy of the near-end speaker's digital speech signal at the k-th frequency point is obtained from the energy of the frequency-domain signal E(i,k) of the error digital speech signal e(n), and the estimated energy of the echo digital speech signal at the k-th frequency point is obtained from the estimated echo digital speech signal.
  • specifically, the actual energy of the i-th frame signal of the near-end speaker's digital speech signal s(n) at the k-th frequency point is replaced by the energy of the frequency-domain signal of the (i-1)-th frame signal of the error digital speech signal e(n) at the k-th frequency point, and the actual energy of the i-th frame signal of the echo digital speech signal d(n) at the k-th frequency point is replaced by the estimated energy of the frequency-domain signal of the (i-1)-th frame signal of the estimated echo digital speech signal at the k-th frequency point.
  • two smoothing parameters are used, one for the energy estimate of the near-end speaker's digital speech signal and one for the energy estimate of the echo digital speech signal, so that the estimated energies are smoothed and the likelihood ratio is determined from the smoothed estimated energies in order to detect the double-talk probability and then control the update of the filter coefficients.
  • the purpose of smoothing is to prevent sudden changes in the energy value between two consecutive frames of the near-end speaker's digital speech signal and between two consecutive frames of the echo digital speech signal; the values of the smoothing parameters are set empirically according to the application scenario.
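  • Formula (5) itself is not reproduced in this text, so the sketch below only illustrates one common way to smooth per-frequency-point energy estimates: a first-order recursive average. That recursive form, the parameter names `beta_s` and `beta_d`, and their values are assumptions. Following the description above, the near-end speaker's energy is estimated from the previous frame's error spectrum and the echo energy from the previous frame's estimated echo spectrum.

```python
def smooth_energy(prev_estimate, new_energy, beta):
    """First-order recursive smoothing (assumed form); beta in [0, 1)."""
    return beta * prev_estimate + (1.0 - beta) * new_energy

def update_energy_estimates(est_s, est_d, err_spec, echo_spec,
                            beta_s=0.9, beta_d=0.9):
    """Update per-frequency-point energy estimates.

    est_s, est_d: previous estimates for the near-end speaker and the echo.
    err_spec:     previous frame's error spectrum E(i-1, k).
    echo_spec:    previous frame's estimated echo spectrum.
    beta_s, beta_d: hypothetical smoothing parameters.
    """
    new_s = [smooth_energy(p, abs(e) ** 2, beta_s)
             for p, e in zip(est_s, err_spec)]
    new_d = [smooth_energy(p, abs(d) ** 2, beta_d)
             for p, d in zip(est_d, echo_spec)]
    return new_s, new_d

new_s, new_d = update_energy_estimates([1.0], [0.0], [0j], [2 + 0j])
```

  • A larger smoothing parameter gives a steadier estimate that reacts more slowly to genuine changes, which is consistent with the text's remark that the values are chosen empirically per application.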
  • the likelihood ratio may also be determined according to the estimated energy to detect the double-talk probability and control the update of the filter coefficient.
  • the estimated echo digital voice signal can be obtained through the adaptive filter, and the error digital voice signal e(n) is obtained through the addition module. Therefore, the double-talk detection module can obtain, by formula (5), the estimated energies of the estimated digital voice signal of the near-end speaker and of the estimated echo digital voice signal.
  • the i-th frame signal has a frequency at the k-th frequency point.
  • the value of) is not large, so p(H 1
  • Y(i)) 0 is obtained.
  • Y(i)) is usually between 1 and 0, and it will only be 1 or 0 under theoretical conditions.
  • Y(i)) actually represents the size of the probability of a double-ended call.
  • Y(i)) is directly related to the above-mentioned likelihood ratio ⁇ k (Y(i,k)), or is also called the likelihood ratio ⁇ k (Y(i,k))
  • the magnitude of will affect p(H 1
  • the energy of the signal d(n) is constantly changing. Therefore, it is equivalent to the difference between p(H 1
  • the adaptive filter 110 adopts a multi-delay block frequency domain adaptive filter
  • one of the filter blocks such as the m-th block (m is the above D) adaptive filter
  • the corresponding filter coefficient W(i+1, m) is calculated according to the following formula (6).
  • ⁇ (i,m) is the update gradient of filter coefficients during signal processing of the i+1th frame of the near-end speech digital speech signal y(n), which is detailed in the relevant literature on the multi-delay block frequency domain adaptive filtering algorithm The calculation method of is not repeated here.
  • the update step size of the filter coefficient is reduced to slow down the update of the filter coefficient, or stop the update of the filter coefficient;
  • Y(i))
  • the step size update factor I is a non-zero value to increase the update step size of the filter coefficient according to the above formula (6) to update the filter coefficient. It can be seen that the change trend of the double-ended call probability is opposite to the change trend of the step update factor. In the actual process, p(H 1
  • the step smoothing amount ⁇ (i) is introduced, that is, in the i+1 frame signal (also known as the near-end speech digital speech signal y(n))
  • the update step size ⁇ (i) of the i-th frame signal of the near-end voice digital voice signal y(n) is also referred to.
  • the double-talk detection module is further configured to determine the step update factor according to the double-talk probability, so as to determine the filter coefficient according to the step update factor and the step smoothing amount The update step size of, and then update the filter coefficients according to the update step size and the update gradient.
  • the electronic devices in the embodiments of this application exist in various forms, including but not limited to:
  • Mobile communication equipment This type of equipment is characterized by mobile communication functions, and its main goal is to provide voice and data communications.
  • Such terminals include: smart phones (such as iPhone), multimedia phones, functional phones, and low-end phones.
  • Ultra-mobile personal computer equipment This type of equipment belongs to the category of personal computers, has calculation and processing functions, and generally also has mobile Internet features.
  • Such terminals include: PDA, MID and UMPC devices, such as iPad.
  • Portable entertainment equipment This type of equipment can display and play multimedia content.
  • Such devices include: audio, video players (such as iPod), handheld game consoles, e-books, as well as smart toys and portable car navigation devices.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or Any combination of these devices.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • a computer-usable storage media including but not limited to disk storage, CD-ROM, optical storage, etc.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, so as to execute on the computer or other programmable equipment.
  • the instructions provide steps for implementing functions specified in a flow or multiple flows in the flowchart and/or a block or multiple blocks in the block diagram.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • computer-usable storage media including but not limited to disk storage, CD-ROM, optical storage, etc.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific transactions or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, remote processing devices connected through a communication network execute transactions.
  • program modules can be located in local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

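An echo cancellation apparatus, an echo cancellation method, a signal processing chip, and an electronic device. The echo cancellation apparatus includes: a voice activity detection module (106) for detecting whether an actual echo digital speech signal is present in a near-end digital speech signal; a double-talk detection module (108) that decides whether to start according to the detection result of the voice activity detection module and, once started, detects a double-talk probability to control the update of filter coefficients; and an adaptive filter (110) that generates an estimated echo digital speech signal from the filter coefficients and a far-end digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal. In the echo cancellation apparatus, the echo digital speech signal estimated by the adaptive filter and, for example, the error digital speech signal output by the adder are fed back to control how the double-talk detection module detects the double-talk probability, so that the double-talk detection module and the adaptive filter constrain each other. This resolves the conflict between steady-state misadjustment and convergence speed in adaptive filtering algorithms and improves both the detection accuracy of the double-talk detection module and the convergence performance of the filter.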

Description

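Echo cancellation apparatus, echo cancellation method, signal processing chip, and electronic device
Technical Field
The embodiments of this application relate to the field of signal processing, and in particular to an echo cancellation apparatus, an echo cancellation method, a signal processing chip, and an electronic device.
Background
Echo cancellation remains a major difficulty in the industry. In terms of how echo arises, besides echo caused by the environment (for example, in a hands-free communication system the loudspeaker's sound is fed back into the microphone), there is also echo caused by network transmission delay, as well as indirect echo produced after the far-end sound undergoes one or more reflections. In terms of the factors that affect echo cancellation, it depends not only on the external environment of the communication system's terminal device but also closely on the performance of the host running the communication system and on network conditions. The external environment may specifically include: the relative distance and orientation between the microphone and the loudspeaker, the relative distance and orientation between loudspeakers, the size of the room, the material of the room's walls, and so on.
Echo degrades speech intelligibility, so acoustic echo cancellation (AEC) is used to improve the quality of voice communication. An AEC algorithm uses an adaptive filter to model the echo path and continually adjusts the filter coefficients through an adaptive algorithm so that the filter's impulse response approaches the real echo path. The far-end speech signal is then passed through the filter to obtain an estimated echo signal, which is subtracted from the microphone input signal to cancel the echo.
However, the presence of the near-end speaker's speech signal causes the filter coefficients to diverge, which degrades echo cancellation. Prior-art echo cancellation algorithms therefore require double-talk detection (DTD). Double talk means that the signal captured by the microphone contains both the echo caused by the far-end speech signal and the near-end speaker's speech signal. Because an adaptive filtering algorithm inevitably produces misadjustment noise during echo cancellation, reducing the update step size of the filter coefficients reduces this noise and improves the algorithm's convergence accuracy. But a smaller update step size also lowers the algorithm's convergence speed and tracking speed. In existing fixed-step-size adaptive filtering algorithms, the requirements that convergence speed, tracking speed, and convergence accuracy place on the adjustment factor therefore conflict with one another. In addition, existing energy-based or correlation-based DTD algorithms usually use a fixed decision threshold, which leads to missed detections and false alarms and thus lowers DTD accuracy.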
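Summary
In view of this, one of the technical problems addressed by the embodiments of the present invention is to provide an echo cancellation apparatus, an echo cancellation method, a signal processing chip, and an electronic device that overcome the above defects of the prior art.
An embodiment of this application provides an echo cancellation apparatus, including:
a voice activity detection module, configured to detect whether an actual echo digital speech signal is present in a near-end digital speech signal;
a double-talk detection module, configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to detect a double-talk probability to control the update of filter coefficients;
an adaptive filter, configured to generate an estimated echo digital speech signal from the filter coefficients and a far-end digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal.
Optionally, in any embodiment of this application, the voice activity detection module is further configured to compare the energies of the near-end digital speech signal and the far-end digital speech signal with preset energy thresholds, so as to detect whether the actual echo digital speech signal is present in the near-end digital speech signal.
Optionally, in any embodiment of this application, the double-talk detection module is further configured not to start when the detection result of the voice activity detection module indicates that the actual echo digital speech signal is absent from the near-end digital speech signal, so that the filter coefficients are updated with the historical step size; or, the double-talk detection module is further configured to start when the detection result of the voice activity detection module indicates that the actual echo digital speech signal is present in the near-end digital speech signal, in order to update the filter coefficients.
Optionally, in any embodiment of this application, the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to detect the double-talk probability from the estimated energies of the estimated echo digital speech signal and of the estimated digital speech signal of the near-end speaker, and thereby control the update of the filter coefficients.
Optionally, in any embodiment of this application, the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to smooth the estimated energies and detect the double-talk probability from the smoothed estimated energies, and thereby control the update of the filter coefficients.
Optionally, in any embodiment of this application, the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to determine, from the estimated energies of the estimated echo digital speech signal and of the estimated digital speech signal of the near-end speaker, the ratio of the probability that the near-end digital speech signal is present when the digital speech signal of the near-end speaker is absent to the probability that it is present when the digital speech signal of the near-end speaker is present, so as to detect the double-talk probability and thereby control the update of the filter coefficients.
Optionally, in any embodiment of this application, the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to smooth the estimated energies and determine, from the smoothed estimated energies, the ratio of the probability that the near-end digital speech signal is present when the digital speech signal of the near-end speaker is absent to the probability that it is present when the digital speech signal of the near-end speaker is present, so as to detect the double-talk probability and thereby control the update of the filter coefficients.
Optionally, in any embodiment of this application, the ratio of the probability that the near-end digital speech signal is present when the digital speech signal of the near-end speaker is absent to the probability that it is present when the digital speech signal of the near-end speaker is present is inversely related to the double-talk probability.
Optionally, in any embodiment of this application, the double-talk detection module is further configured to determine a step-size update factor from the double-talk probability, so as to determine the update step size of the filter coefficients from the step-size update factor and thereby update the filter coefficients.
Optionally, in any embodiment of this application, the double-talk probability and the step-size update factor are nonlinearly related.
Optionally, in any embodiment of this application, the double-talk probability and the step-size update factor change in opposite directions.
Optionally, in any embodiment of this application, if the near-end digital speech signal contains both the digital speech signal of the near-end speaker and the actual echo digital speech signal, the double-talk probability is 1 and the step-size update factor is 0, and the update step size of the filter coefficients is reduced to slow down or stop the update of the filter coefficients; if the near-end digital speech signal contains only the actual echo digital speech signal and not the digital speech signal of the near-end speaker, the double-talk probability is 0 and the step-size update factor is non-zero, and the update step size of the filter coefficients is increased to speed up the update of the filter coefficients.
Optionally, in any embodiment of this application, the double-talk detection module is further configured to determine the step-size update factor from the double-talk probability, so as to determine the update step size of the filter coefficients from the step-size update factor, and then update the filter coefficients according to the update step size and an update gradient.
Optionally, in any embodiment of this application, the double-talk detection module is further configured to determine the step-size update factor from the double-talk probability, so as to determine the update step size of the filter coefficients from the step-size update factor and a step-size smoothing term, and then update the filter coefficients according to the update step size and the update gradient.
Optionally, in any embodiment of this application, the update step size is linearly related to the step-size update factor.
Optionally, in any embodiment of this application, the filter coefficients are linearly related to the update step size.
Optionally, in any embodiment of this application, the apparatus further includes an addition module, configured to subtract the estimated echo digital speech signal from the near-end digital speech signal to obtain an error digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal.
An embodiment of this application further provides an echo cancellation method, including:
detecting, by a voice activity detection module, whether an actual echo digital speech signal is present in a near-end digital speech signal;
deciding, by a double-talk detection module, whether to start according to the detection result of the voice activity detection module and, once started, detecting a double-talk probability to control the update of filter coefficients;
generating, by an adaptive filter, an estimated echo digital speech signal from the filter coefficients and a far-end digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal.
An embodiment of this application further provides a signal processing chip that includes the echo cancellation apparatus of any embodiment of this application.
An embodiment of this application further provides an electronic device that includes the signal processing chip of any embodiment of this application.
In the embodiments of this application, the voice activity detection module detects whether an actual echo digital speech signal is present in the near-end digital speech signal; the double-talk detection module decides whether to start according to the detection result of the voice activity detection module and, once started, controls the update of the filter coefficients through the detected double-talk probability; and the adaptive filter generates an estimated echo digital speech signal from the filter coefficients and the far-end digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal. In the echo cancellation apparatus, the echo digital speech signal estimated by the adaptive filter and, for example, the error digital speech signal output by the adder are thus fed back to control how the double-talk detection module detects the double-talk probability, so that the double-talk detection module and the adaptive filter constrain each other. This resolves the conflict between steady-state misadjustment and convergence speed in adaptive filtering algorithms and improves both the detection accuracy of the double-talk detection module and the convergence performance of the filter.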
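Brief Description of the Drawings
Some specific embodiments of this application are described in detail below, by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar components or parts. Those skilled in the art will understand that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic structural diagram of a signal processing chip to which the echo cancellation apparatus is applied, in Embodiment 1 of this application;
FIG. 2 is a schematic flowchart of the operation of the signal processing chip in Embodiment 2 of this application.
Detailed Description
Implementing any technical solution of the embodiments of the present invention does not necessarily achieve all of the above advantages at the same time.
The specific implementation of the embodiments of the present invention is further described below with reference to the accompanying drawings.
In the embodiments of this application, the voice activity detection module detects whether an actual echo digital speech signal is present in the near-end digital speech signal; the double-talk detection module decides whether to start according to the detection result of the voice activity detection module and, once started, controls the update of the filter coefficients through the detected double-talk probability; and the adaptive filter generates an estimated echo digital speech signal from the filter coefficients and the far-end digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal. In the echo cancellation apparatus, the echo digital speech signal estimated by the adaptive filter and, for example, the error digital speech signal output by the adder are thus fed back to control how the double-talk detection module detects the double-talk probability, so that the double-talk detection module and the adaptive filter constrain each other. This resolves the conflict between steady-state misadjustment and convergence speed in adaptive filtering algorithms and improves both the detection accuracy of the double-talk detection module and the convergence performance of the filter.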
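FIG. 1 is a schematic structural diagram of a signal processing chip to which the echo cancellation apparatus is applied, in Embodiment 1 of this application. As shown in FIG. 1, the signal processing chip includes an echo cancellation apparatus 100, which specifically includes a voice activity detection module 106, a double-talk detection module 108, and an adaptive filter 110. The echo cancellation apparatus may further include a speech acquisition module 102, a speech playback module 104, and an addition module 112. The speech acquisition module 102 is communicatively connected to the voice activity detection module 106, the double-talk detection module 108, and the addition module 112; the speech playback module 104 is communicatively connected to the voice activity detection module 106 and the adaptive filter 110; the voice activity detection module 106 is communicatively connected to the double-talk detection module 108; and the double-talk detection module 108 is communicatively connected to the adaptive filter 110 and the addition module 112.
The speech acquisition module 102 is configured to capture the near-end analog speech signal y(t) and generate from it the near-end digital speech signal y(n). In this embodiment, the speech acquisition module may specifically be a microphone. The near-end analog speech signal y(t) it captures may include the near-end speaker's speech signal s(t) and may also include the echo analog speech signal d(t) caused by the speech playback module 104 playing the far-end analog speech signal. Note that the far-end analog speech signal x(t) and the near-end analog speech signal y(t) are distinguished by whether the speech of the respective party to the call originates far from or near to the signal processing chip.
The speech playback module 104 is configured to play the received far-end analog speech signal x(t). In this embodiment, the speech playback module 104 may specifically be a loudspeaker.
The voice activity detection module 106 is configured to detect whether the actual echo digital speech signal d(n) is present in the near-end digital speech signal y(n). In this embodiment, the voice activity detection module 106 may also be called a voice activity detector (VAD).
The double-talk detection module 108 is configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to control the update of the filter coefficients through the detected double-talk probability. In this embodiment, the double-talk detection module 108 may also be called a double-talk detector (DTD). Note that in this embodiment the update step size of the filter coefficients is adjusted by the double-talk detection module 108. To keep the double-talk detection module 108 lightweight, however, another embodiment may add a separate update module between the double-talk detection module 108 and the adaptive filter 110, dedicated to updating the filter coefficients or determining their update step size.
The adaptive filter 110 is configured to generate an estimated echo digital speech signal $\hat{d}(n)$ from the filter coefficients and the far-end digital speech signal x(n), so as to cancel the actual echo digital speech signal d(n) in the near-end digital speech signal y(n). In this embodiment, the adaptive filter 110 is, for example, a multi-delay block frequency-domain adaptive filter.
The addition module 112 is configured to subtract the estimated echo digital speech signal $\hat{d}(n)$ from the near-end digital speech signal y(n) to obtain the error digital speech signal e(n), thereby cancelling the actual echo digital speech signal d(n) in the near-end digital speech signal y(n). In this embodiment, the addition module 112 may specifically be an adder. The more accurate the estimated echo digital speech signal $\hat{d}(n)$ is, that is, the closer it is to the actual echo digital speech signal d(n), the higher the speech intelligibility.
Further, in this embodiment, the voice activity detection module 106 is further configured to compare the energies of the near-end digital speech signal y(n) and the far-end digital speech signal x(n) with their respective preset energy thresholds, so as to detect whether the actual echo digital speech signal d(n) is present in y(n). If the energies of y(n) and x(n) both exceed the corresponding preset energy thresholds, the actual echo digital speech signal d(n) is determined to be present in y(n).
Further, in this embodiment, when the detection result of the voice activity detection module 106 indicates that the actual echo digital speech signal d(n) is absent from the near-end digital speech signal y(n), the double-talk detection module 108 does not start, and the filter coefficients are updated with the historical step size; when the detection result indicates that d(n) is present in the near-end digital speech signal, the double-talk detection module 108 starts in order to determine the update step size of the filter coefficients.
Further, in this embodiment, if the near-end digital speech signal y(n) contains both the near-end speaker's digital speech signal s(n) and the echo digital speech signal d(n), that is, the double-talk state, the update step size of the filter coefficients is reduced to slow down or stop the update of the filter coefficients; if the near-end digital speech signal y(n) contains only the echo digital speech signal d(n) and not the near-end speaker's digital speech signal s(n), that is, the single-talk state, the update step size is increased to speed up the update of the filter coefficients. How the update step size is determined is described in the embodiments below; see formulas (1)-(6).
In this embodiment, the adaptive filter is further configured to generate the estimated echo digital speech signal $\hat{d}(n)$ from the filter coefficients and the far-end digital speech signal x(n).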
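The working principle of the signal processing chip is illustrated below with reference to an embodiment of the echo cancellation method.
FIG. 2 is a schematic flowchart of the operation of the signal processing chip in Embodiment 2 of this application. Corresponding to FIG. 1, it includes:
S202. The speech playback module plays the received far-end analog speech signal x(t);
In this embodiment, the echo analog speech signal d(t) contained in the near-end analog speech signal y(t) is caused by the far-end analog speech signal x(t). For the speech acquisition module 102, the input near-end analog speech signal y(t) may therefore include the speaker's analog speech signal s(t) and the echo analog speech signal d(t). Note that the far-end analog speech signal x(t) is played if it exists; otherwise nothing is played.
S204. The speech acquisition module captures the near-end analog speech signal y(t) to generate the near-end digital speech signal y(n);
S206. The voice activity detection module detects whether the actual echo digital speech signal d(n) is present in the near-end digital speech signal y(n);
In this embodiment, as described above, if the voice activity detection module 106 is a voice activity detector (VAD), it can examine the far-end digital speech signal x(n) and the near-end digital speech signal y(n) using, for example, the short-time energy method, the time-domain average zero-crossing rate method, or the short-time correlation method, and thereby decide whether the actual echo digital speech signal d(n) is present. With the short-time energy method, the voice activity detection module 106 measures the energies of x(n) and y(n) and compares each with its preset energy threshold. If both energies exceed their thresholds, the near-end digital speech signal y(n) contains the echo digital speech signal d(n). Because d(n) is produced by x(n), this can be understood as follows: whenever the far-end digital speech signal x(n) is present, the echo digital speech signal d(n) is present as well.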
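The short-time energy check described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the function names `frame_energy` and `echo_present`, the use of mean squared amplitude as the energy measure, and the threshold values are all assumptions.

```python
import numpy as np

def frame_energy(frame):
    """Short-time energy of one frame, taken here as the mean squared amplitude (an assumption)."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean(frame ** 2))

def echo_present(near_frame, far_frame, near_threshold, far_threshold):
    """Report an echo only when BOTH y(n) and x(n) exceed their own preset thresholds."""
    near_active = frame_energy(near_frame) > near_threshold
    far_active = frame_energy(far_frame) > far_threshold
    return near_active and far_active

# Hypothetical usage: a silent far end means no echo is reported.
loud = np.full(160, 0.5)
silent = np.zeros(160)
print(echo_present(loud, loud, 0.01, 0.01), echo_present(loud, silent, 0.01, 0.01))
```

Using a separate threshold per signal mirrors the text, which compares each energy with its own preset threshold.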
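S208. The double-talk detection module decides whether to start according to the detection result of the voice activity detection module and, once started, detects the double-talk probability to determine the update step size of the filter coefficients;
In this embodiment, as described above, the double-talk detection module does not start when the detection result indicates that the actual echo digital speech signal d(n) is absent from the near-end digital speech signal y(n), so the filter coefficients are updated with the historical step size. For example, when echo cancellation is performed frame by frame, the filter coefficients can be updated with the update step size used for the previous frame of the near-end digital speech signal. Alternatively, the double-talk detection module 108 starts when the detection result indicates that d(n) is present in y(n), and the detected double-talk probability then governs how the update step size of the filter coefficients is determined.
Specifically, in this embodiment, if the near-end digital speech signal y(n) contains both the near-end speaker's digital speech signal s(n) and the actual echo digital speech signal d(n), the update step size of the filter coefficients is reduced so that the coefficients update more slowly, or the update is stopped outright. The update is slowed or stopped mainly because the presence of s(n) would cause the filter coefficients to diverge, so that an accurate estimated echo digital speech signal $\hat{d}(n)$ could not be generated and echo cancellation would become less effective. If y(n) contains only the actual echo digital speech signal d(n) and not s(n), the update step size is increased to update the filter coefficients. For the calculation of the update step size, see the description below, which uses a probability model as an example.
As noted above, a separate update-step-size determination module may be added between the double-talk detection module and the adaptive filter specifically to calculate the update step size, or the double-talk detection module itself may calculate it.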
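The gating in S208 can be pictured with the short sketch below. It assumes a hypothetical function `choose_step_size` that receives the double-talk detector and the probability-to-step mapping as callables; none of these names come from this application.

```python
def choose_step_size(echo_detected, previous_step, detect_double_talk, step_from_probability):
    """Return the update step size for the current frame.

    echo_detected         -- result of the voice activity detection stage
    previous_step         -- step size used for the previous frame (the "historical" step)
    detect_double_talk    -- callable returning the double-talk probability (hypothetical)
    step_from_probability -- callable mapping that probability to a step size (hypothetical)
    """
    if not echo_detected:
        # The double-talk detector is not started: keep the historical step size.
        return previous_step
    probability = detect_double_talk()
    return step_from_probability(probability)
```

Passing the detector as a callable makes it explicit that it is never run when no echo is detected, which is the saving the text attributes to this arrangement.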
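S210. The adaptive filter generates the estimated echo digital speech signal from the filter coefficients and the far-end digital speech signal x(n);
In this embodiment, as described above, the adaptive filter 110 is a multi-delay block frequency-domain adaptive filter, that is, it comprises several adaptive filter blocks (D blocks, for example), which gives a shorter block delay, faster convergence, and smaller storage requirements.
S212. The addition module subtracts the estimated echo digital speech signal from the near-end digital speech signal y(n) to obtain the error digital speech signal e(n), thereby cancelling the actual echo digital speech signal d(n) in the near-end digital speech signal y(n).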
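Steps S210 and S212 can be illustrated with a toy echo canceller. The filter of this application is a multi-delay block frequency-domain adaptive filter; the sketch below deliberately swaps in a much simpler time-domain NLMS filter with a fixed step size, purely to show the "estimate the echo, subtract it, adapt the coefficients" loop. The function name, tap count, and step size are assumptions.

```python
import numpy as np

def nlms_echo_canceller(x, y, num_taps=8, mu=0.5, eps=1e-8):
    """Time-domain NLMS sketch: returns the error signal e(n) and the final coefficients."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(num_taps)        # filter coefficients modelling the echo path
    buf = np.zeros(num_taps)      # most recent far-end samples, newest first
    e = np.zeros(len(y))
    for n in range(len(y)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        d_hat = w @ buf           # estimated echo sample
        e[n] = y[n] - d_hat       # adder output: near-end signal minus estimated echo
        w = w + mu * e[n] * buf / (buf @ buf + eps)   # normalized LMS update
    return e, w

# Hypothetical single-talk test: y(n) is only the far-end signal passed through a short echo path.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
echo_path = np.array([0.5, -0.3, 0.2])
y = np.convolve(x, echo_path)[: len(x)]
e, w = nlms_echo_canceller(x, y)
```

With no near-end speech in `y`, the residual `e` decays toward zero and the first three coefficients approach the assumed echo path, which is the single-talk behaviour the fast-update branch relies on.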
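The following takes the determination of the update step size with a statistical probability model as an example to illustrate how the double-talk detection module 108 and the adaptive filter 110 constrain each other. When the above echo cancellation scheme is applied, the near-end digital speech signal, the far-end digital speech signal, and the echo digital speech signal are processed frame by frame. Frames are divided according to the number M of frequency bins of the adaptive filter: every M data points of the echo digital speech signal d(n), the near-end speaker's digital speech signal s(n), and the near-end digital speech signal y(n) are taken as one frame, and the echo cancellation scheme is applied to each frame of d(n), s(n), and y(n).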
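The framing step can be sketched as below. This is a simplified illustration under stated assumptions: non-overlapping frames of M samples, a plain M-point FFT per frame, and trailing samples that do not fill a frame being dropped. A real multi-delay block filter typically uses overlapping blocks, so the helper name `frames_to_spectra` and its behaviour are hypothetical.

```python
import numpy as np

def frames_to_spectra(signal, M):
    """Split a signal into frames of M samples and return one M-bin spectrum per frame."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // M                  # incomplete trailing frame is dropped
    frames = signal[: n_frames * M].reshape(n_frames, M)
    # Row i holds the frequency-domain signal of frame i at bins 1..M (0-based in code).
    return np.fft.fft(frames, axis=1)

spectra = frames_to_spectra(np.arange(10.0), M=4)   # 2 frames of 4 samples each
```

Applying the same helper to y(n), x(n), and e(n) yields per-frame, per-bin quantities of the kind written Y(i,k), X(i,k), and E(i,k) in the text.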
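As described above, the case in which the echo digital speech signal d(n) is present can be divided into:
(1) the near-end speaker's digital speech signal s(n) is absent and only the actual echo digital speech signal d(n) is present, which is referred to as single talk;
(2) both the near-end speaker's digital speech signal s(n) and the actual echo digital speech signal d(n) are present, which is referred to as double talk.
Accordingly, $H_0$ denotes the case in which s(n) is absent and only d(n) is present, and $H_1$ denotes the case in which s(n) and d(n) are both present:
$$H_0:\ Y(i) = D(i), \qquad H_1:\ Y(i) = D(i) + S(i) \tag{1}$$
In formula (1), $D(i)=[D(i,1),D(i,2),\ldots,D(i,M)]$, $S(i)=[S(i,1),S(i,2),\ldots,S(i,M)]$, and $Y(i)=[Y(i,1),Y(i,2),\ldots,Y(i,M)]$ denote the frequency-domain signals at frequency bins 1 to M of the i-th frame of the actual echo digital speech signal d(n), the near-end speaker's digital speech signal s(n), and the near-end digital speech signal y(n), respectively. The value of M generally equals the order of the adaptive filter. Likewise, $X(i)=[X(i,1),X(i,2),\ldots,X(i,M)]$ denotes the frequency-domain signals at frequency bins 1 to M of the i-th frame of the far-end digital speech signal x(n).
In this embodiment, it is assumed that the near-end speaker's digital speech signal s(n) and the far-end digital speech signal x(n) follow zero-mean Gaussian distributions, and that the actual echo digital speech signal d(n) and the near-end speaker's digital speech signal s(n) are uncorrelated. The relationship shown in formula (2) then holds:
[Formula (2): image PCTCN2019079174-appb-000007 in the original publication]
where $\sigma_s(i,k)$ and $\sigma_d(i,k)$ denote the actual energies at the k-th frequency bin of the i-th frame of s(n) and d(n), respectively; $p(Y(i,k)\mid H_0)$ is the probability that the i-th frame of the near-end digital speech signal y(n) has the frequency-domain signal Y(i,k) at the k-th bin when s(n) is absent, with $k=1,2,\ldots,M$; and $p(Y(i,k)\mid H_1)$ is the corresponding probability when s(n) is present.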
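Formula (2) survives in this text only as an image reference. Under the stated assumptions (zero-mean Gaussian signals, d(n) and s(n) uncorrelated), the two conditional densities would take the standard complex-Gaussian form below. This is a reconstruction under those assumptions, not a transcription of the published formula:

```latex
\begin{aligned}
p\big(Y(i,k)\mid H_0\big) &= \frac{1}{\pi\,\sigma_d(i,k)}
  \exp\!\left(-\frac{|Y(i,k)|^2}{\sigma_d(i,k)}\right),\\[4pt]
p\big(Y(i,k)\mid H_1\big) &= \frac{1}{\pi\,\big(\sigma_d(i,k)+\sigma_s(i,k)\big)}
  \exp\!\left(-\frac{|Y(i,k)|^2}{\sigma_d(i,k)+\sigma_s(i,k)}\right).
\end{aligned}
```

The variance under $H_1$ is the sum $\sigma_d(i,k)+\sigma_s(i,k)$ because the energies of uncorrelated zero-mean signals add.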
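According to Bayes' rule, the relationship shown in formula (3) holds:
[Formula (3): image PCTCN2019079174-appb-000008 in the original publication]
where $q=p(H_0)/p(H_1)$ is the ratio, within one frame of the near-end digital speech signal y(n), of the time during which the near-end speaker is silent to the time during which the speaker is talking, that is, the ratio of the probability that the near-end speaker's digital speech signal is absent from a frame to the probability that it is present. $p(H_1\mid Y(i))$ is the probability that the near-end speaker's digital speech signal s(n) can be detected, given the frequency-domain signals of the i-th frame of y(n) at frequency bins 1 to M; this is the double-talk probability.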
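For orientation, applying Bayes' rule to the two hypotheses with the prior ratio $q=p(H_0)/p(H_1)$ gives the generic identity below. It is offered as a worked step consistent with the surrounding definitions; the exact layout of the published formula (3) is not recoverable from this text.

```latex
p\big(H_1\mid Y(i)\big)
  = \frac{p\big(Y(i)\mid H_1\big)\,p(H_1)}
         {p\big(Y(i)\mid H_0\big)\,p(H_0) + p\big(Y(i)\mid H_1\big)\,p(H_1)}
  = \frac{1}{1 + q\,\dfrac{p\big(Y(i)\mid H_0\big)}{p\big(Y(i)\mid H_1\big)}}
```

In this form, $q \to 0$ drives the probability to 1 and $q \to \infty$ drives it to 0, which matches the limiting cases discussed later in the description.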
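Rearranging formula (3) gives:
[Formula (4): image PCTCN2019079174-appb-000009 in the original publication]
where $\Lambda_k(Y(i,k))$ is the ratio of the probability that the i-th frame of the near-end digital speech signal y(n) has a frequency-domain signal at the k-th bin when the near-end speaker's digital speech signal s(n) is absent to the same probability when s(n) is present (also called the likelihood ratio of the i-th frame of y(n) having a frequency-domain signal at the k-th bin). Referring again to formula (3), this likelihood ratio is inversely related to the double-talk probability. $\sigma_s(i,k)$ and $\sigma_d(i,k)$, the actual energies at the k-th bin of the i-th frame of s(n) and d(n), are difficult to obtain in real scenarios, so in this embodiment they are estimated through formula (5) below.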
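As a worked illustration of what such a likelihood ratio looks like, suppose the two conditional densities are zero-mean complex Gaussians with variances $\sigma_d(i,k)$ under $H_0$ and $\sigma_d(i,k)+\sigma_s(i,k)$ under $H_1$, and suppose the M frequency bins are treated as independent. Both suppositions are assumptions made here for the sketch, and the result is not a transcription of the published formula (4):

```latex
\begin{aligned}
\Lambda_k\big(Y(i,k)\big)
  &= \frac{p\big(Y(i,k)\mid H_0\big)}{p\big(Y(i,k)\mid H_1\big)}
   = \frac{\sigma_d(i,k)+\sigma_s(i,k)}{\sigma_d(i,k)}
     \exp\!\left(-\frac{|Y(i,k)|^2\,\sigma_s(i,k)}
                       {\sigma_d(i,k)\,\big(\sigma_d(i,k)+\sigma_s(i,k)\big)}\right),\\[4pt]
p\big(H_1\mid Y(i)\big)
  &= \frac{1}{1 + q\,\prod_{k=1}^{M}\Lambda_k\big(Y(i,k)\big)}.
\end{aligned}
```

A larger likelihood ratio therefore lowers $p(H_1\mid Y(i))$, consistent with the inverse relationship stated in the text.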
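In addition, assuming that the adaptive filter estimates the echo accurately enough, the estimated echo $\hat{d}(n)$ it generates approximates $d(n)$, so during double talk the adder 112 outputs the error digital speech signal $e(n)=s(n)$. The near-end speaker's digital speech signal can therefore be estimated through e(n), that is, $\hat{S}(i,k)=E(i,k)$, where $\hat{S}(i,k)$ denotes the frequency-domain signal at the k-th bin of the i-th frame of the estimated near-end speaker's digital speech signal and $E(i,k)$ denotes the frequency-domain signal at the k-th bin of the i-th frame of the error digital speech signal e(n). In other words, the energy of $E(i,k)$ yields the estimated energy at the k-th bin of the (i+1)-th frame of the estimated near-end speaker's digital speech signal, and the estimated energy of the frequency-domain signal at the k-th bin of the i-th frame of the estimated echo digital speech signal $\hat{d}(n)$ yields the estimated energy at the k-th bin of the (i+1)-th frame of the actual echo digital speech signal d(n). That is, in formula (4), the actual energy at the k-th bin of the i-th frame of s(n) is replaced by the energy of the frequency-domain signal at the k-th bin of the (i-1)-th frame of e(n), and the actual energy at the k-th bin of the i-th frame of d(n) is replaced by the estimated energy of the frequency-domain signal at the k-th bin of the (i-1)-th frame of the estimated echo digital speech signal $\hat{d}(n)$.
[Formula (5): images PCTCN2019079174-appb-000017 to PCTCN2019079174-appb-000019 in the original publication]
where $\lambda_s$ and $\lambda_d$ (= 0.91) are the energy-estimation smoothing parameters for the near-end speaker's digital speech signal and for the echo digital speech signal, respectively. The estimated energies are smoothed, the likelihood ratio is determined from the smoothed estimated energies, and the double-talk probability detected from it then controls the update of the filter coefficients. The purpose is to prevent abrupt changes in energy between two consecutive frames of the near-end speaker's digital speech signal or between two consecutive frames of the echo digital speech signal. The values of the smoothing parameters are set empirically for each application scenario.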
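The smoothing described for formula (5) can be sketched as a first-order recursive average. The exact published formula is only available as an image, so the recursion below, the function name `smooth_energy`, and the choice to smooth the squared magnitude of the previous frame's spectrum are assumptions that follow the verbal description.

```python
import numpy as np

def smooth_energy(prev_estimate, prev_spectrum, lam):
    """Assumed recursion: sigma_hat(i,k) = lam * sigma_hat(i-1,k) + (1 - lam) * |Z(i-1,k)|^2.

    prev_estimate -- smoothed energy estimate of the previous frame, one value per bin
    prev_spectrum -- previous frame's spectrum: E(i-1,k) for the near-end speaker,
                     or the spectrum of the estimated echo for the echo energy
    lam           -- smoothing parameter in [0, 1); the text quotes 0.91
    """
    prev_estimate = np.asarray(prev_estimate, dtype=float)
    return lam * prev_estimate + (1.0 - lam) * np.abs(prev_spectrum) ** 2

# Hypothetical two-bin example with lam = 0.5 so the arithmetic is easy to follow.
out = smooth_energy(np.array([1.0, 0.0]), np.array([2.0 + 0.0j, 3.0 + 4.0j]), 0.5)
```

A value of `lam` close to 1 weights the history heavily, which is what suppresses frame-to-frame jumps in the estimated energies.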
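Note, however, that in other embodiments the likelihood ratio may also be determined directly from the unsmoothed estimated energies to detect the double-talk probability and control the update of the filter coefficients.
As formulas (1)-(5) show, the adaptive filter provides the estimated echo digital speech signal $\hat{d}(n)$ and the addition module provides the error digital speech signal e(n). The double-talk detection module can therefore use formula (5) to obtain the estimated energies at the k-th bin of the i-th frame of the estimated near-end speaker's digital speech signal and of the estimated echo digital speech signal. Substituting these into formula (4) gives the likelihood ratio of the i-th frame of the near-end digital speech signal y(n) having a frequency-domain signal at the k-th bin. Substituting that into formula (3) gives the probability that the near-end speaker's digital speech signal s(n) can be detected given the frequency-domain signals of the i-th frame of y(n) at bins 1 to M, that is, the double-talk probability $p(H_1\mid Y(i))$.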
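The chain "estimated energies, then likelihood ratio, then double-talk probability" can be sketched as follows. The sketch assumes zero-mean complex-Gaussian densities per bin, independence across bins, and a log-domain evaluation for numerical stability; the function name `double_talk_probability` and the default `q = 1.0` are hypothetical choices, not values from this application.

```python
import numpy as np

def double_talk_probability(Y, sigma_s, sigma_d, q=1.0, eps=1e-12):
    """p(H1 | Y(i)) from per-bin spectra and energy estimates, under Gaussian assumptions."""
    power = np.abs(np.asarray(Y)) ** 2
    sigma_s = np.maximum(np.asarray(sigma_s, dtype=float), 0.0)
    sigma_d = np.maximum(np.asarray(sigma_d, dtype=float), eps)   # avoid division by zero
    total = sigma_d + sigma_s
    # log of the per-bin likelihood ratio p(Y|H0) / p(Y|H1)
    log_lr = np.log(total / sigma_d) - power * sigma_s / (sigma_d * total)
    log_odds_h0 = np.log(q) + np.sum(log_lr)
    # p(H1|Y) = 1 / (1 + q * prod(Lambda_k)); the clip keeps exp() within float range
    return float(1.0 / (1.0 + np.exp(np.clip(log_odds_h0, -700.0, 700.0))))

# Hypothetical frames: strong near-end energy versus a frame that the echo alone explains.
p_double = double_talk_probability([10.0, 10.0], [100.0, 100.0], [1.0, 1.0])
p_single = double_talk_probability([0.1, 0.1], [100.0, 100.0], [1.0, 1.0])
```

Here `p_double` comes out close to 1 and `p_single` close to 0, so the probability moves opposite to the likelihood ratio, as the text states.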
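It should be noted, however, that formulas (1)-(5) are only one specific example of how to determine the double-talk probability, the step-size update factor, and the update step size. Inspired by the above idea, those skilled in the art may adopt other equivalent alternatives.
Referring to formula (3): in the theoretical case, if the double-talk detection module detects that the i-th frame of the near-end digital speech signal y(n) contains both the actual echo digital speech signal d(n) and the near-end speaker's digital speech signal s(n), then $q=p(H_0)/p(H_1)$ is 0, which gives $p(H_1\mid Y(i))=1$. Conversely, when the double-talk detection module (108) detects that y(n) contains only the actual echo digital speech signal d(n), $q=p(H_0)/p(H_1)$ is infinite, which gives $p(H_1\mid Y(i))=0$. In practice, however, $p(H_1\mid Y(i))$ usually lies between 0 and 1 and equals exactly 1 or 0 only in the theoretical case.
It follows that the value of $p(H_1\mid Y(i))$ represents the double-talk probability. Its value is directly tied to the likelihood ratio $\Lambda_k(Y(i,k))$; in other words, the likelihood ratio affects $p(H_1\mid Y(i))$. The likelihood ratio in turn changes continually with the energies of the near-end speaker's digital speech signal s(n) and the actual echo digital speech signal d(n), so $p(H_1\mid Y(i))$ is tied to the energies of s(n) and d(n).
Therefore, when the adaptive filter 110 is a multi-delay block frequency-domain adaptive filter, for one of its filter blocks, say the m-th block (m is at most D above), the filter coefficients $W(i+1,m)$ for the (i+1)-th frame of the near-end digital speech signal y(n) are calculated according to formula (6):
$$W(i+1,m)=W(i,m)+\mu(i)\,\Phi(i,m)$$
[Formula (6), remaining part: image PCTCN2019079174-appb-000021 in the original publication]
where $\Phi(i,m)$ is the update gradient of the filter coefficients when processing the (i+1)-th frame of the near-end digital speech signal y(n); its calculation is described in detail in the literature on multi-delay block frequency-domain adaptive filtering and is not repeated here. $\gamma$ (= 0.993) is the smoothing coefficient of the update step size $\mu$ of the filter coefficients. The step-size update factor of $\mu$ is, as the formula shows, controlled by $p(H_1\mid Y(i))$; $\alpha$ and $\beta$ are positive constants, where $\alpha$ controls the convergence speed of the step-size factor and $\beta$ controls the upper limit of the filter update step size. Both are set according to the actual application scenario. Formula (6) shows that the double-talk probability $p(H_1\mid Y(i))$ and the step-size update factor are nonlinearly related.
Formula (6) also shows that the update step size is linearly related to the step-size update factor and that the filter coefficients are linearly related to the update step size.
As described above, $p(H_1\mid Y(i))=1$ corresponds to a step-size update factor of 0; according to formula (6), the update step size of the filter coefficients then decreases, which slows down or stops the update of the filter coefficients. When $p(H_1\mid Y(i))=0$, the step-size update factor is non-zero, and according to formula (6) the update step size of the filter coefficients increases in order to update the filter coefficients. The double-talk probability and the step-size update factor thus change in opposite directions. In practice $p(H_1\mid Y(i))$ mostly lies between 0 and 1, so the step-size update factor computed in real time controls the update step size and hence the update of the filter coefficients. In effect, per formula (6), the step-size update factor is determined from the double-talk probability, the update step size of the filter coefficients is determined from the step-size update factor, and the filter coefficients are then updated according to the update step size and the update gradient. To prevent abrupt changes in the update step size, formula (6) introduces the step-size smoothing term $\gamma\mu(i)$: when the update step size is determined for the (i+1)-th frame (the current frame) of the near-end digital speech signal y(n), the update step size $\mu(i)$ of the i-th frame is also taken into account. More generally, the double-talk detection module is further configured to determine the step-size update factor from the double-talk probability, determine the update step size of the filter coefficients from the step-size update factor and the step-size smoothing term, and then update the filter coefficients according to the update step size and the update gradient.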
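The step-size control can be sketched as below. Only the coefficient update W(i+1,m) = W(i,m) + μ(i)Φ(i,m), the smoothing term γμ(i), and γ = 0.993 come from the text. The mapping `beta * (1 - p) ** alpha`, the recursion `gamma * mu + (1 - gamma) * factor`, and the default values of `alpha` and `beta` are hypothetical stand-ins chosen only to satisfy the stated properties: the factor is 0 when the double-talk probability is 1, non-zero when it is 0, nonlinear in the probability, and bounded by `beta`.

```python
import numpy as np

def step_update_factor(p_double_talk, alpha=2.0, beta=0.5):
    """Hypothetical nonlinear mapping from double-talk probability to step-size update factor."""
    return beta * (1.0 - p_double_talk) ** alpha

def next_step_size(mu_prev, p_double_talk, gamma=0.993, alpha=2.0, beta=0.5):
    """Assumed recursion: smoothing term gamma * mu(i) plus a share of the update factor."""
    return gamma * mu_prev + (1.0 - gamma) * step_update_factor(p_double_talk, alpha, beta)

def update_coefficients(W, mu, gradient):
    """W(i+1, m) = W(i, m) + mu(i) * Phi(i, m), as written in formula (6)."""
    return np.asarray(W, dtype=float) + mu * np.asarray(gradient, dtype=float)

# Hypothetical usage: certain double talk shrinks the step, single talk enlarges it.
mu_double = next_step_size(0.1, 1.0)
mu_single = next_step_size(0.1, 0.0)
```

Because the recursion is linear in the factor and the coefficient update is linear in the step size, the sketch also reproduces the two linear relationships stated in the text.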
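In summary, per formulas (3)-(6), $p(H_1\mid Y(i))$ is strongly correlated with the estimated echo digital speech signal $\hat{d}(n)$ output by the adaptive filter, while the update of the adaptive filter's coefficients is in turn controlled by the double-talk detection module. The double-talk detection module and the adaptive filter thus constrain each other, which improves the adaptive filter's convergence speed, tracking speed, and convergence accuracy and yields good echo cancellation. Moreover, because the likelihood ratio $\Lambda_k(Y(i,k))$ is tied to the signal energies and changes continually, $p(H_1\mid Y(i))$ changes continually as well. This avoids the missed detections and false alarms caused by the fixed decision threshold used in prior-art double-talk detection, and so further safeguards the accuracy of double-talk detection.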
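The electronic device of the embodiments of this application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability, and their main purpose is to provide voice and data communication. Such terminals include smartphones (for example, iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing capabilities, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example iPad.
(3) Portable entertainment devices: these can display and play multimedia content. They include audio and video players (for example, iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic apparatuses with data interaction capability.
Specific embodiments of the subject matter have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions recited in the claims may be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
For convenience of description, the above apparatus is described in terms of functionally separate units. When this application is implemented, the functions of the units may of course be implemented in one or more pieces of software and/or hardware.
Those skilled in the art will understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. The present invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
It should also be noted that the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art will understand that the embodiments of this application may be provided as a method, a system, or a computer program product. This application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific transactions or implement specific abstract data types. This application may also be practiced in distributed computing environments, in which transactions are executed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The above are merely embodiments of this application and are not intended to limit it. Those skilled in the art may make various modifications and changes to this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of the claims of this application.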

Claims (20)

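  1. An echo cancellation apparatus, characterized by comprising:
    a voice activity detection module, configured to detect whether an actual echo digital speech signal is present in a near-end digital speech signal;
    a double-talk detection module, configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to detect a double-talk probability to control the update of filter coefficients;
    an adaptive filter, configured to generate an estimated echo digital speech signal from the filter coefficients and a far-end digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal.
  2. The apparatus according to claim 1, characterized in that the voice activity detection module is further configured to compare the energies of the near-end digital speech signal and the far-end digital speech signal with preset energy thresholds, so as to detect whether the actual echo digital speech signal is present in the near-end digital speech signal.
  3. The apparatus according to claim 1 or 2, characterized in that the double-talk detection module is further configured not to start when the detection result of the voice activity detection module indicates that the actual echo digital speech signal is absent from the near-end digital speech signal, so that the filter coefficients are updated with the historical step size; or, the double-talk detection module is further configured to start when the detection result of the voice activity detection module indicates that the actual echo digital speech signal is present in the near-end digital speech signal, in order to update the filter coefficients.
  4. The apparatus according to claim 1, characterized in that the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to detect the double-talk probability from the estimated energies of the estimated echo digital speech signal and of the estimated digital speech signal of the near-end speaker, and thereby control the update of the filter coefficients.
  5. The apparatus according to claim 4, characterized in that the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to smooth the estimated energies and detect the double-talk probability from the smoothed estimated energies, and thereby control the update of the filter coefficients.
  6. The apparatus according to claim 4, characterized in that the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to determine, from the estimated energies of the estimated echo digital speech signal and of the estimated digital speech signal of the near-end speaker, the ratio of the probability that the near-end digital speech signal is present when the digital speech signal of the near-end speaker is absent to the probability that it is present when the digital speech signal of the near-end speaker is present, so as to detect the double-talk probability and thereby control the update of the filter coefficients.
  7. The apparatus according to claim 5, characterized in that the double-talk detection module is further configured to decide whether to start according to the detection result of the voice activity detection module and, once started, to smooth the estimated energies and determine, from the smoothed estimated energies, the ratio of the probability that the near-end digital speech signal is present when the digital speech signal of the near-end speaker is absent to the probability that it is present when the digital speech signal of the near-end speaker is present, so as to detect the double-talk probability and thereby control the update of the filter coefficients.
  8. The apparatus according to claim 6 or 7, characterized in that the ratio of the probability that the near-end digital speech signal is present when the digital speech signal of the near-end speaker is absent to the probability that it is present when the digital speech signal of the near-end speaker is present is inversely related to the double-talk probability.
  9. The apparatus according to any one of claims 1-8, characterized in that the double-talk detection module is further configured to determine a step-size update factor from the double-talk probability, so as to determine the update step size of the filter coefficients from the step-size update factor and thereby update the filter coefficients.
  10. The apparatus according to claim 9, characterized in that the double-talk probability and the step-size update factor are nonlinearly related.
  11. The apparatus according to claim 10, characterized in that the double-talk probability and the step-size update factor change in opposite directions.
  12. The apparatus according to claim 11, characterized in that, if the near-end digital speech signal contains both the digital speech signal of the near-end speaker and the actual echo digital speech signal, the double-talk probability is 1 and the step-size update factor is 0, and the update step size of the filter coefficients is reduced to slow down or stop the update of the filter coefficients; if the near-end digital speech signal contains only the actual echo digital speech signal and not the digital speech signal of the near-end speaker, the double-talk probability is 0 and the step-size update factor is non-zero, and the update step size of the filter coefficients is increased to speed up the update of the filter coefficients.
  13. The apparatus according to claim 9, characterized in that the double-talk detection module is further configured to determine the step-size update factor from the double-talk probability, so as to determine the update step size of the filter coefficients from the step-size update factor, and then update the filter coefficients according to the update step size and an update gradient.
  14. The apparatus according to claim 13, characterized in that the double-talk detection module is further configured to determine the step-size update factor from the double-talk probability, so as to determine the update step size of the filter coefficients from the step-size update factor and a step-size smoothing term, and then update the filter coefficients according to the update step size and the update gradient.
  15. The apparatus according to any one of claims 9-14, characterized in that the update step size is linearly related to the step-size update factor.
  16. The apparatus according to any one of claims 9-15, characterized in that the updated filter coefficients are linearly related to the update step size.
  17. The apparatus according to any one of claims 1-16, characterized by further comprising an addition module, configured to subtract the estimated echo digital speech signal from the near-end digital speech signal to obtain an error digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal.
  18. An echo cancellation method, characterized by comprising:
    detecting, by a voice activity detection module, whether an actual echo digital speech signal is present in a near-end digital speech signal;
    deciding, by a double-talk detection module, whether to start according to the detection result of the voice activity detection module and, once started, detecting a double-talk probability to control the update of filter coefficients;
    generating, by an adaptive filter, an estimated echo digital speech signal from the filter coefficients and a far-end digital speech signal, so as to cancel the actual echo digital speech signal in the near-end digital speech signal.
  19. A signal processing chip, characterized by comprising the apparatus according to any one of claims 1-17.
  20. An electronic device, characterized by comprising the signal processing chip according to claim 19.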
PCT/CN2019/079174 2019-03-22 2019-03-22 Echo cancellation apparatus, echo cancellation method, signal processing chip, and electronic device WO2020191512A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
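CN201980000673.8A CN111989934B (zh) 2019-03-22 2019-03-22 Echo cancellation apparatus, echo cancellation method, signal processing chip, and electronic device
PCT/CN2019/079174 WO2020191512A1 (zh) 2019-03-22 2019-03-22 Echo cancellation apparatus, echo cancellation method, signal processing chip, and electronic device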

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/079174 WO2020191512A1 (zh) 2019-03-22 2019-03-22 Echo cancellation apparatus, echo cancellation method, signal processing chip, and electronic device

Publications (1)

Publication Number Publication Date
WO2020191512A1 true WO2020191512A1 (zh) 2020-10-01

Family

ID=72610367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/079174 WO2020191512A1 (zh) 2019-03-22 2019-03-22 Echo cancellation apparatus, echo cancellation method, signal processing chip, and electronic device

Country Status (2)

Country Link
CN (1) CN111989934B (zh)
WO (1) WO2020191512A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
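CN113345457B (zh) * 2021-06-01 2022-06-17 广西大学 Acoustic echo cancellation adaptive filter and filtering method based on Bayesian theory
CN114650340B (zh) * 2022-04-21 2024-07-02 深圳市中科蓝讯科技股份有限公司 Echo cancellation method, apparatus, and electronic device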

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1822709A (zh) * 2006-03-24 2006-08-23 北京中星微电子有限公司 Microphone echo cancellation system
US9767828B1 (en) * 2012-06-27 2017-09-19 Amazon Technologies, Inc. Acoustic echo cancellation using visual cues
CN109348072A (zh) * 2018-08-30 2019-02-15 湖北工业大学 Double-talk detection method applied to an echo cancellation system

Also Published As

Publication number Publication date
CN111989934A (zh) 2020-11-24
CN111989934B (zh) 2022-03-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19922183

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19922183

Country of ref document: EP

Kind code of ref document: A1