WO2020232659A1 - Double-talk detection method, double-talk detection device, and echo cancellation system - Google Patents

Double-talk detection method, double-talk detection device, and echo cancellation system

Info

Publication number
WO2020232659A1
WO2020232659A1 PCT/CN2019/087907 CN2019087907W WO2020232659A1 WO 2020232659 A1 WO2020232659 A1 WO 2020232659A1 CN 2019087907 W CN2019087907 W CN 2019087907W WO 2020232659 A1 WO2020232659 A1 WO 2020232659A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice signal
digital voice
end digital
double
far
Prior art date
Application number
PCT/CN2019/087907
Other languages
English (en)
French (fr)
Inventor
韩文凯
李国梁
王鑫山
郭红敬
朱虎
Original Assignee
深圳市汇顶科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司
Priority to EP19929818.3A (EP3796629B1)
Priority to CN201980000966.6A (CN112292844B)
Priority to PCT/CN2019/087907 (WO2020232659A1)
Priority to US17/031,862 (US11349525B2)
Publication of WO2020232659A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B3/00 Line transmission systems
    • H04B3/02 Details
    • H04B3/32 Reducing cross-talk, e.g. by compensating
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • The embodiments of the present application relate to the technical field of signal processing, and in particular to a double-talk detection method, a double-talk detection device, and an echo cancellation system.
  • Echo cancellation is currently a major problem in the industry. In terms of how echo arises, besides environmental causes, such as the sound of the speaker feeding back into the microphone in a hands-free communication system, there is also the echo introduced by network transmission delay, as well as the indirect echo produced when the far-end sound is reflected one or more times. In terms of the factors that influence echo cancellation, it depends not only on the external environment of the terminal equipment of the communication system, but also on the performance of the host running the communication system and on network conditions. The external environment may specifically include: the relative distance and direction between the microphone and the speaker, the relative distance and direction between the speaker and the speaker, the size of the room, the wall material of the room, and so on.
  • AEC Acoustic Echo Cancellation
  • The echo cancellation algorithm uses an adaptive filter to model the echo path and continuously adjusts the filter coefficients through an adaptive algorithm so that the filter's impulse response approaches the true echo path. The far-end speech signal is then passed through the filter to obtain the estimated echo signal, and the estimated echo signal is subtracted from the microphone input signal to cancel the echo.
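  • The adaptive-filter idea described above can be sketched as follows. This is a minimal illustration using a time-domain NLMS filter, which is an assumption made for brevity and not the multi-delay block frequency-domain filter used in the embodiments; the echo path, tap count, and step size `mu` are hypothetical values.

```python
import random

def nlms_echo_cancel(x, y, taps=8, mu=0.5, eps=1e-8):
    """Cancel the echo of far-end x in microphone signal y.

    Returns (error signal, final filter coefficients).
    NLMS is a stand-in here; the patent's filter is frequency-domain.
    """
    w = [0.0] * taps      # adaptive coefficients (echo-path estimate)
    buf = [0.0] * taps    # most recent far-end samples, newest first
    e = []
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]
        d_hat = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        en = yn - d_hat                                 # microphone minus estimate
        norm = sum(b * b for b in buf) + eps            # input power normalization
        w = [wi + mu * en * bi / norm for wi, bi in zip(w, buf)]
        e.append(en)
    return e, w

random.seed(0)
echo_path = [0.6, -0.3, 0.1]                      # hypothetical true echo path
x = [random.uniform(-1, 1) for _ in range(2000)]  # far-end signal
# Microphone signal containing only echo (no near-end speaker).
y = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
     for n in range(len(x))]
e, w = nlms_echo_cancel(x, y)
```

  • With no near-end speech in `y`, the coefficients `w` approach the echo path and the error `e` decays toward zero; when near-end speech is present, the same update can diverge, which is why the update is gated by double-talk detection.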
  • DTD Double Talk Detection
  • One of the technical problems solved by the embodiments of the present application is to provide a double-talk detection method, a double-talk detection device, and an echo cancellation system to overcome the above-mentioned defects in the prior art.
  • An embodiment of the present application provides a double-talk detection method, which includes: determining, according to the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, whether the digital voice signal of the near-end speaker is also present in the near-end digital voice signal.
  • An embodiment of the present application provides a double-talk detection device, which includes a double-talk detector used to determine, according to the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, whether the digital voice signal of the near-end speaker is also present in the near-end digital voice signal.
  • An embodiment of the present application provides an echo cancellation system, which includes a double-talk detection device and an adaptive filter.
  • The double-talk detection device includes a double-talk detector, which is used to determine, according to the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, whether the digital voice signal of the near-end speaker is also present in the near-end digital voice signal.
  • In the embodiments of the present application, whether the digital voice signal of the near-end speaker is also present in the near-end digital voice signal is determined from the energy ratio between the far-end digital voice signal and the near-end digital voice signal together with the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal. Because the energy ratio and the frequency coherence value are used at the same time, missed detections and false detections are avoided and the accuracy of double-talk detection is improved. When applied to echo cancellation, the echo in the near-end voice signal is eliminated completely, which improves the communication experience of both parties on the call.
  • FIG. 1 is a schematic structural diagram of the echo cancellation system in Embodiment 1 of this application;
  • FIG. 2 is a schematic diagram of the echo cancellation workflow of the echo cancellation system in Embodiment 2 of this application;
  • FIG. 3 is a schematic flowchart of the double-talk detection method in Embodiment 3 of this application;
  • FIG. 4 is a schematic flowchart of the double-talk detection performed by the double-talk detection device in Embodiment 4 of this application.
  • The double-talk detection module depends only on the near-end digital voice signal y(n) and the far-end digital voice signal x(n), and is independent of the adaptive filter. The double-talk detection solution of the embodiments of the present application is therefore not limited to echo cancellation and can also be applied to other scenarios.
  • FIG. 1 is a schematic structural diagram of the echo cancellation system in Embodiment 1 of this application. As shown in FIG. 1, the echo cancellation system 100 specifically includes a voice endpoint detection module 106, a double-talk detector 108, and an adaptive filter 110.
  • The echo cancellation system may also include a voice collection module 102, a voice playback module 104, and an addition module 112.
  • The voice collection module 102 is communicatively connected to the voice endpoint detection module 106, the double-talk detector 108, and the addition module 112, respectively.
  • The voice playback module 104 is communicatively connected to the voice endpoint detection module 106 and the adaptive filter 110, respectively; the voice endpoint detection module 106 is communicatively connected to the double-talk detector 108; and the double-talk detector 108 is communicatively connected to the adaptive filter 110 and the addition module 112, respectively.
  • The voice collection module 102 is used to collect the near-end analog voice signal y(t) to generate the near-end digital voice signal y(n). In this embodiment, the voice collection module may specifically be a microphone. The near-end analog voice signal y(t) may include the voice signal s(t) of the near-end speaker, and may also include the echo analog voice signal d(t) caused by the voice playback module 104 playing the far-end analog voice signal.
  • The voice playback module 104 is used to play the far-end analog voice signal corresponding to the received far-end digital voice signal x(n). In this embodiment, the voice playback module 104 may specifically be a speaker.
  • The voice endpoint detection module 106 is used to detect whether an echo digital voice signal d(n) is present in the near-end digital voice signal y(n). In this embodiment, the voice endpoint detection module 106 may also be called a voice activity detector (VAD).
  • VAD Voice Activity Detector
  • If the echo digital voice signal d(n) is present in the near-end digital voice signal y(n), the double-talk detector 108 is activated to determine, according to the energy ratio between the far-end digital voice signal x(n) and the near-end digital voice signal y(n) and the frequency coherence value between the near-end digital voice signal y(n) and the far-end digital voice signal x(n), whether the digital voice signal s(n) of the near-end speaker is also present in y(n), so as to control the update of the filter coefficients.
  • The adaptive filter 110 is used to generate an estimated echo digital voice signal according to the filter coefficients and the far-end digital voice signal x(n), so as to eliminate the echo digital voice signal d(n) present in the near-end digital voice signal y(n).
  • The adaptive filter 110 is, for example, a multi-delay block frequency-domain adaptive filter.
  • The addition module 112 is configured to subtract the estimated echo digital voice signal from the near-end digital voice signal y(n) to obtain the error digital voice signal e(n), thereby eliminating the echo digital voice signal d(n) present in the near-end digital voice signal y(n).
  • The addition module 112 may specifically be an adder.
  • The more accurate the estimated echo digital voice signal is, that is, the closer it is to the actual echo digital voice signal d(n), the higher the speech intelligibility.
  • The voice endpoint detection module 106 is configured to detect, based on the powers of the far-end digital voice signal x(n) and the near-end digital voice signal y(n), whether an echo digital voice signal d(n) is present in the near-end digital voice signal y(n). For example, if the powers of y(n) and x(n) are both greater than their corresponding preset thresholds, it is determined that the echo digital voice signal d(n) is present in y(n).
  • For the double-talk detector 108, if no echo digital voice signal d(n) is present in the near-end digital voice signal y(n), the double-talk detector 108 is not activated, and the filter coefficients are updated with the historical step size.
  • The double-talk detection module depends only on the near-end digital voice signal y(n) and the far-end digital voice signal x(n), and is independent of the estimated echo digital voice signal and the error digital voice signal e(n). The double-talk detection module is therefore decoupled from the adaptive filter.
  • FIG. 2 is a schematic diagram of the echo cancellation workflow of the echo cancellation system in Embodiment 2 of this application. Corresponding to FIG. 1, it includes:
  • The voice playback module plays the received far-end analog voice signal x(t);
  • The echo analog voice signal d(t) contained in the near-end analog voice signal y(t) is caused by the far-end analog voice signal x(t). For the voice collection module 102, the input near-end analog voice signal y(t) may therefore include the speaker's analog voice signal s(t) and the echo analog voice signal d(t). Note that the far-end analog voice signal x(t) is played only when it exists.
  • The voice collection module collects the near-end analog voice signal y(t) to generate the near-end digital voice signal y(n);
  • The voice endpoint detection module detects whether an echo digital voice signal d(n) is present in the near-end digital voice signal y(n);
  • If the voice endpoint detection module 106 is a voice activity detector (VAD), it can examine the far-end digital voice signal x(n) and the near-end digital voice signal y(n) using the short-time energy method, the time-domain average zero-crossing rate method, or the short-time correlation method, and then judge whether an echo digital voice signal d(n) is present. If the short-time energy method is adopted, the voice endpoint detection module can detect, based on the powers of x(n) and y(n), whether d(n) is present in the near-end digital voice signal y(n).
  • For example, if the powers of the near-end digital voice signal y(n) and the far-end digital voice signal x(n) are both greater than their corresponding preset thresholds, it is determined that the echo digital voice signal d(n) is present in y(n). Because d(n) is generated by x(n), this can be understood as: whenever a far-end digital voice signal x(n) is present, an echo digital voice signal d(n) is present.
  • For details, refer to formula (1).
  • In formula (1), VAD represents the output signal of the voice endpoint detection module. An output value of 1 means that the echo digital voice signal d(n) is present in the near-end digital voice signal y(n); in all other cases the output value is 0, which means that no echo digital voice signal d(n) is present in y(n).
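  • The threshold decision described above can be sketched as follows. This is only an illustration of the short-time energy variant; the function names and the threshold values `thr_x` and `thr_y` are hypothetical, since the source gives no concrete thresholds.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame (short-time energy)."""
    return sum(v * v for v in frame) / len(frame)

def vad(x_frame, y_frame, thr_x=1e-4, thr_y=1e-4):
    """Return 1 if both far-end and near-end powers exceed their thresholds, else 0.

    thr_x and thr_y are assumed placeholder thresholds.
    """
    if frame_power(x_frame) > thr_x and frame_power(y_frame) > thr_y:
        return 1
    return 0

# Far-end active and microphone active: echo is assumed present.
active = vad([0.1, -0.2, 0.15, -0.1], [0.05, -0.08, 0.06, -0.04])
# Far-end silent: no echo can be present.
silent = vad([0.0, 0.0, 0.0, 0.0], [0.05, -0.08, 0.06, -0.04])
```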
  • If the echo digital voice signal d(n) is present in the near-end digital voice signal y(n), the double-talk detector is activated to determine whether the digital voice signal s(n) of the near-end speaker is also present in y(n), so as to control the update of the filter coefficients;
  • The double-talk detector is further used to determine, according to the energy ratio between the far-end digital voice signal x(n) and the near-end digital voice signal y(n) and the frequency coherence value between the near-end digital voice signal y(n) and the far-end digital voice signal x(n), whether the digital voice signal s(n) of the near-end speaker is also present in y(n).
  • If no echo digital voice signal d(n) is present in the near-end digital voice signal y(n), the double-talk detector is not activated, and the filter coefficients are updated with the historical step size. For example, when echo cancellation is performed frame by frame, the filter coefficients can be updated using the update step size of the previous frame of the near-end digital voice signal. If d(n) is present in y(n), the double-talk detector is activated to determine whether the digital voice signal s(n) of the near-end speaker is also present in y(n).
  • If the near-end digital voice signal y(n) contains both the near-end speaker's digital voice signal s(n) and the echo digital voice signal d(n), the update step size of the filter coefficients is reduced so that the coefficients update slowly, or the update is stopped altogether. The update is slowed or stopped mainly because the presence of the near-end speaker's digital voice signal s(n) would cause the filter coefficients to diverge, making it impossible to generate an accurate estimated echo digital voice signal and thereby reducing the effectiveness of echo cancellation. If s(n) is absent from y(n) and only the echo digital voice signal d(n) is present, the update step size is increased to update the filter coefficients.
  • A separate update step size determination module can be added between the double-talk detector and the adaptive filter specifically to calculate the update step size; alternatively, the double-talk detector itself can calculate the update step size.
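  • The step-size control logic described above can be sketched as a small decision function. The function name and the concrete `slow` and `fast` values are hypothetical; the source only states that the step is reduced (or the update stopped) during double talk and increased when only echo is present.

```python
def update_step(vad_flag, double_talk, prev_step, slow=0.0, fast=0.5):
    """Choose the filter update step size for the current frame.

    vad_flag:    1 if an echo signal is present in the near-end signal
    double_talk: True if near-end speech is also present
    prev_step:   step size used for the previous frame (historical step)
    slow, fast:  assumed placeholder values; slow=0.0 freezes the update
    """
    if not vad_flag:
        return prev_step  # detector not activated: keep the historical step
    if double_talk:
        return slow       # near-end speech present: slow down or stop
    return fast           # echo only: adapt with a larger step

step_no_echo = update_step(0, False, 0.2)
step_double_talk = update_step(1, True, 0.2)
step_echo_only = update_step(1, False, 0.2)
```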
  • The adaptive filter generates an estimated echo digital voice signal according to the filter coefficients and the far-end digital voice signal x(n);
  • The adaptive filter 110 is a multi-delay block frequency-domain adaptive filter, that is, it comprises several block adaptive filters (for example, D adaptive filter blocks), which yields shorter block delay, faster convergence, and smaller storage requirements.
  • The addition module subtracts the estimated echo digital voice signal from the near-end digital voice signal y(n) to obtain an error digital voice signal e(n), thereby eliminating the echo digital voice signal d(n) present in the near-end digital voice signal y(n).
  • The near-end digital voice signal, the far-end digital voice signal, and the echo digital voice signal are processed in units of frames, with frames divided according to the number of frequency points M of the adaptive filter: every M data points in the echo digital voice signal d(n), the near-end speaker's digital voice signal s(n), and the near-end digital voice signal y(n) are recorded as one frame.
  • The echo cancellation scheme is applied to each frame of the echo digital voice signal d(n), the digital voice signal s(n) of the near-end speaker, and the near-end digital voice signal y(n).
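  • Frame division as described above can be sketched as follows. The helper name is hypothetical, and dropping an incomplete trailing frame is an assumption; the source does not say how a partial final frame is handled.

```python
def split_frames(signal, m):
    """Split a sample sequence into consecutive frames of m samples each.

    An incomplete trailing frame is dropped (an assumption).
    Frame i in the text (1-based) corresponds to frames[i - 1] here,
    covering samples (i-1)*m through i*m - 1.
    """
    return [signal[k:k + m] for k in range(0, len(signal) - m + 1, m)]

frames = split_frames(list(range(10)), 4)
```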
  • FIG. 3 is a schematic flowchart of the double-talk detection method in Embodiment 3 of this application. As shown in FIG. 3, it includes:
  • Step S218A is specifically: acquiring the i-th frame of the far-end digital voice signal x(n);
  • Step S218B is specifically: acquiring the i-th frame of the near-end digital voice signal y(n);
  • An analog-to-digital converter performs analog-to-digital conversion on the far-end analog voice signal x(t) to obtain the far-end digital voice signal x(n), and on the near-end analog voice signal y(t) to obtain the near-end digital voice signal y(n); both are sent directly to the double-talk detection device.
  • As described above, every M data points in the echo digital voice signal d(n), the digital voice signal s(n) of the near-end speaker, and the near-end digital voice signal y(n) are recorded as one frame.
  • Double-talk detection essentially processes each frame of the near-end speaker's digital voice signal s(n) and the near-end digital voice signal y(n); in other words, the double-talk detection scheme is applied in units of frames.
  • Therefore, in steps S218A and S218B, if the current processing targets the i-th frame of the far-end digital voice signal x(n) and the i-th frame of the near-end digital voice signal y(n), those frames are obtained from the analog-to-digital converter and the double-talk detection scheme is applied to them; that is, the subsequent steps finally determine whether the digital voice signal of the near-end speaker is also present in the i-th frame of y(n).
  • Step S228A is specifically: performing time-domain overlap storage on the i-th frame and the (i-1)-th frame of the acquired far-end digital voice signal x(n);
  • Step S228B is specifically: performing time-domain overlap storage on the i-th frame and the (i-1)-th frame of the acquired near-end digital voice signal y(n);
  • Power can also be calculated in the frequency domain. Doing so gives the distribution of signal power over frequency, that is, the relationship between signal power and frequency.
  • Therefore, in steps S228A and S228B, the i-th frame and the (i-1)-th frame of the acquired far-end digital voice signal x(n), and likewise of the near-end digital voice signal y(n), are overlap-stored in the time domain, and in the subsequent steps S238A and S238B they are converted to the frequency domain to obtain the corresponding frequency-domain signals.
  • Step S238A is specifically: the frequency domain conversion module converts the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n) into the corresponding frequency-domain signal;
  • Step S238B is specifically: the frequency domain conversion module converts the overlap-stored i-th frame and (i-1)-th frame of the far-end digital voice signal x(n) into the corresponding frequency-domain signal;
  • Specifically, the i-th frame and the (i-1)-th frame of the near-end digital voice signal y(n) that are overlap-stored in the time domain are converted into the corresponding frequency-domain signal by discrete Fourier transform, and the i-th frame and the (i-1)-th frame of the far-end digital voice signal x(n) that are overlap-stored in the time domain are converted in the same way.
  • The time-domain to frequency-domain conversion performed by the frequency domain conversion module is shown in formula (2).
  • where i is greater than or equal to 1;
  • the i-th frame of the far-end digital voice signal x(n) is denoted $\mathbf{x}(i)$, with $\mathbf{x}(i) = [x((i-1)M), \ldots, x(iM-1)]^T$;
  • the (i-1)-th frame of the far-end digital voice signal x(n) is denoted $\mathbf{x}(i-1)$, with $\mathbf{x}(i-1) = [x((i-2)M), \ldots, x((i-1)M-1)]^T$;
  • the i-th frame of the near-end digital voice signal y(n) is denoted $\mathbf{y}(i)$, with $\mathbf{y}(i) = [y((i-1)M), \ldots, y(iM-1)]^T$;
  • the (i-1)-th frame of the near-end digital voice signal y(n) is denoted $\mathbf{y}(i-1)$, with $\mathbf{y}(i-1) = [y((i-2)M), \ldots, y((i-1)M-1)]^T$;
  • $F$ is the discrete Fourier transform matrix, of dimension $2M \times 2M$;
  • $X(i)$ is the frequency-domain signal corresponding to the i-th frame and the (i-1)-th frame of the far-end digital voice signal x(n) that are overlap-stored in the time domain;
  • $Y(i)$ is the frequency-domain signal corresponding to the i-th frame and the (i-1)-th frame of the near-end digital voice signal y(n) that are overlap-stored in the time domain.
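  • The overlap storage and frequency-domain conversion can be sketched as follows. This assumes the 2M-sample block is formed by placing the (i-1)-th frame before the i-th frame and then applying a 2M-point DFT; that ordering is the common convention but is an assumption here, because formula (2) itself is not reproduced in this text. The function names are hypothetical, and the direct DFT is used only to keep the example dependency-free.

```python
import cmath

def dft(v):
    """Direct discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def overlap_spectrum(prev_frame, cur_frame):
    """2M-point spectrum of the overlap-stored block [frame i-1, frame i]."""
    return dft(list(prev_frame) + list(cur_frame))

# M = 2: a unit impulse at the first sample has a flat spectrum of ones.
X = overlap_spectrum([1.0, 0.0], [0.0, 0.0])
```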
  • Step S248A is specifically: the power calculation module calculates the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the far-end digital voice signal x(n). As those skilled in the art will appreciate, in the context of the overall technical solution this power can be obtained from the power spectrum, or the power corresponding to each frequency point can be obtained by table lookup.
  • Step S248B is specifically: the power calculation module calculates the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n);
  • S248C: Calculate the associated power between the far-end digital voice signal and the near-end digital voice signal
  • Step S248C is specifically: the power calculation module calculates the associated power between the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the far-end digital voice signal x(n) and the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n);
  • Steps S248A, S248B, and S248C are executed by the power calculation module; the detailed power calculation in these steps is given in formula (3).
  • In formula (3), i is greater than or equal to 2, and the following quantities appear:
  • the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the far-end digital voice signal x(n);
  • the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n);
  • the associated power between these two frequency-domain signals;
  • the power of the frequency-domain signal corresponding to the overlap-stored (i-1)-th frame and (i-2)-th frame;
  • the associated power between the frequency-domain signals corresponding to the overlap-stored (i-1)-th frame and (i-2)-th frame of the far-end digital voice signal x(n) and of the near-end digital voice signal y(n);
  • a smoothing parameter, which prevents the power and associated power of the far-end digital voice signal x(n) and the near-end digital voice signal y(n) from changing abruptly between two consecutive frames. Its value is set empirically according to the application scenario.
  • The smoothing parameters set respectively for the two powers and the associated power can be called the first smoothing parameter, the second smoothing parameter, and the third smoothing parameter. If the three values are identical, they are collectively called the smoothing parameter.
  • Specifically, the power corresponding to the near-end digital voice signal y(n) is calculated from the frequency-domain signal corresponding to y(n) and the first smoothing parameter, and the power corresponding to the far-end digital voice signal x(n) is calculated from the frequency-domain signal corresponding to x(n) and the second smoothing parameter.
  • The associated power between the near-end digital voice signal y(n) and the far-end digital voice signal x(n) is determined from the near-end digital voice signal y(n), the far-end digital voice signal x(n), and the third smoothing parameter.
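  • One common way to realize such smoothed powers is first-order recursive averaging per frequency point. The form below is a sketch under that assumption and is not the reproduced formula (3); the symbols $S_x$, $S_y$, $S_{xy}$ and $\lambda_1$, $\lambda_2$, $\lambda_3$ (first, second, and third smoothing parameters, each between 0 and 1) are introduced here for illustration only, and the placement of the complex conjugate is a convention choice.

```latex
\begin{aligned}
S_y(i)    &= \lambda_1\, S_y(i-1)    + (1-\lambda_1)\,\lvert Y(i)\rvert^{2},\\
S_x(i)    &= \lambda_2\, S_x(i-1)    + (1-\lambda_2)\,\lvert X(i)\rvert^{2},\\
S_{xy}(i) &= \lambda_3\, S_{xy}(i-1) + (1-\lambda_3)\, X(i)\,Y^{*}(i).
\end{aligned}
```

  • Each recursion combines the value from the previous overlap-stored block (frames i-1 and i-2) with the current one (frames i and i-1), which matches the requirement that i be at least 2; a smoothing parameter closer to 1 gives slower, smoother tracking.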
  • Step S258 is specifically: the voice endpoint detection module detects whether an echo digital voice signal d(n) is present in the near-end digital voice signal y(n), according to the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the far-end digital voice signal x(n) and the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n);
  • Specifically, these two powers are each compared with the corresponding power threshold. If the power for the far-end digital voice signal x(n) and the power for the near-end digital voice signal y(n) are each greater than the corresponding power threshold, it is determined that the echo digital voice signal d(n) is present in the near-end digital voice signal y(n).
  • Step S268 is specifically: the double-talk detector is activated to determine whether the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n).
  • An exemplary description of step S268 is given with reference to FIG. 4 below.
  • FIG. 4 is a schematic flowchart of the double-talk detection performed by the double-talk detection device in Embodiment 4 of this application. As shown in FIG. 4, it includes:
  • Step S268A' is specifically: the energy ratio calculation module in the double-talk detector determines the energy ratio according to the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the far-end digital voice signal x(n) and the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n);
  • The energy ratio is calculated according to formula (4), which includes a control factor that prevents the denominator from being 0. The value of the control factor is relatively small.
  • The energy ratio is directly proportional to the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the far-end digital voice signal x(n), and inversely proportional to the power of the frequency-domain signal corresponding to the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n).
  • The energy ratio calculation module is specifically used to determine the energy ratio.
  • The processing for determining the energy ratio can, however, also be integrated into other modules, so a separate energy ratio calculation module is not required.
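  • A ratio with the stated properties (proportional to the far-end power, inversely proportional to the near-end power, with a small control factor guarding the denominator) can be sketched per frequency point as follows. The function name, the name `delta`, and its value are hypothetical, and the exact form of formula (4) may differ.

```python
def energy_ratio(s_x, s_y, delta=1e-10):
    """Per-frequency-point ratio of far-end power to near-end power.

    s_x, s_y: lists of far-end and near-end powers, one value per frequency point
    delta:    small assumed control factor that keeps the denominator nonzero
    """
    return [px / (py + delta) for px, py in zip(s_x, s_y)]

rho = energy_ratio([4.0, 1.0, 0.0], [2.0, 0.0, 0.0])
```

  • Intuitively, when only echo is present the near-end power is explained by the far-end power, whereas near-end speech adds power to the denominator and lowers the ratio.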
  • Step S268B' is specifically: the energy ratio calculation module in the double-talk detector determines the preset frequency band according to the frequency of the digital voice signal s(n) of the near-end speaker;
  • Step S268C' is specifically: the energy ratio calculation module determines, within the preset frequency band, the energy ratios between the frequency-domain signals corresponding to the overlap-stored i-th frame and (i-1)-th frame of the near-end digital voice signal y(n) and of the far-end digital voice signal x(n);
  • The preset frequency band is set according to the frequency of the digital voice signal s(n) of the near-end speaker and is used to select from the energy ratios calculated in step S268A'.
  • The frequency range of human digital voice signals is usually 300 Hz to 3400 Hz. Formula (5) is therefore used to filter the energy ratios.
  • f L represents the lower limit of the frequency
  • f H represents the upper limit of the frequency
  • f s is the sampling frequency of each digital voice signal.
  • n 0 represents the lower limit of the frequency of the near-end speaker's digital speech signal s(n) corresponding to the frequency point in the frequency domain
  • n 1 represents the near-end speaker's digital speech
  • the upper limit of the frequency of the signal s(n) corresponds to the frequency point in the frequency domain.
  • the frequency points corresponding to the upper and lower limits of the above frequency are determined in the frequency domain.
  • the near-end digital voice signal y(n) and the i-th frame signal of the far-end digital voice signal x(n) respectively correspond to multiple frequency domain signals, and then multiple energy ratios are correspondingly obtained.
  • the upper and lower limits of the frequency are
  • the respective corresponding frequency points in the frequency domain define the positions of the energy ratios filtered according to the frequency among the multiple energy ratios.
  • the frequency range of the human digital voice signal is usually 300Hz ⁇ 3400Hz
  • 300HZ is the lower limit of the frequency.
  • the lower limit of the frequency has a corresponding frequency point n 0 in the frequency domain, and it is determined that 3400Hz is the frequency.
  • Upper limit the upper limit of the frequency has a corresponding frequency point n 1 in the frequency domain.
  • Step S268D' is specifically: the energy ratio calculation module performs quantile processing on the energy ratios between the frequency-domain signals corresponding to the near-end digital voice signal y(n) and the far-end digital voice signal x(n) within the preset frequency band;
  • To prevent missed and false detections caused by weak voice signals at the start and near the end of a call, quantile processing is performed: the energy ratios selected by frequency are sorted from small to large, and a subset of them is then selected for double-talk detection.
  • Specifically, quartile processing can be used: the first quantile is 0.25, the second is 0.5, the third is 0.75, and the fourth is 1. The energy ratios between the second quantile (0.5) and the third quantile (0.75) are selected as the effective energy ratios, also called the decision quantities after quantile processing; the calculation is shown in formula (6).
  • In formula (6), 0.5 and 0.75 denote the second and third quantiles, n_L is the frequency bin corresponding to the lower limit after quantile processing, and n_H is the frequency bin corresponding to the upper limit after quantile processing. These two bins delimit the subset of energy ratios retained after quantile processing.
  • The sorting in the quartile processing above may also be from small to large.
  • The energy ratios between the first quantile and the third quantile can also be selected. Which two or more quantiles to select between can be determined flexibly according to the needs of the application scenario.
  • Step S268E' is specifically: the energy ratio calculation module determines the average value of the energy ratios after quantile processing;
  • To improve the accuracy of double-talk detection, the average value of the energy ratios after quantile processing is calculated in step S268E', as detailed in formula (7).
  • n_H - n_L + 1 is the number of frequency bins covered by the energy ratios after quantile processing.
  • Step S268A" is specifically: the coherence calculation module in the double-talk detector determines the frequency coherence value between the near-end digital voice signal y(n) and the far-end digital voice signal x(n) from the power of each signal and the associated (cross) power between them;
  • The frequency coherence value in step S268A" is calculated according to formula (8).
  • The frequency coherence value is directly proportional to the associated power between the frequency-domain signals corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n) and the far-end digital voice signal x(n), and inversely proportional to the respective powers of those frequency-domain signals.
  • In other embodiments, if only a rough estimate is needed, the double-talk detector determines the frequency coherence value between the near-end digital voice signal y(n) and the far-end digital voice signal x(n) from the respective powers of the two signals.
  • Step S268B" is specifically: the double-talk detector determines the preset frequency band according to the frequency of the near-end speaker's digital voice signal s(n);
  • Step S268C" is specifically: determining, within the preset frequency band, the frequency coherence values between the frequency-domain signals corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n) and the far-end digital voice signal x(n);
  • Step S268D" is specifically: performing quantile processing on the frequency coherence values between the frequency-domain signals corresponding to the near-end digital voice signal y(n) and the far-end digital voice signal x(n) within the preset frequency band;
  • As with the energy ratio, the purpose of quantile processing is to further avoid missed and false detections. Quartile processing is again used, giving the frequency bin n_L for the lower limit and the frequency bin n_H for the upper limit after quantile processing; these two bins delimit the subset of frequency coherence values retained.
  • Step S268E" is specifically: determining the average value of the frequency coherence values after quantile processing, calculated as shown in formula (9). For the meaning of the other parameters, refer to the description above.
  • Step S268F is specifically: judging whether the near-end speaker's digital voice signal s(n) is present in the i-th frame of the near-end digital voice signal y(n), according to the result of comparing the average energy ratio after quantile processing with the energy threshold and the result of comparing the average frequency coherence value after quantile processing with the coherence threshold;
  • If the average energy ratio after quantile processing is greater than the energy threshold, or if it is less than the energy threshold and the average frequency coherence value after quantile processing is less than the coherence threshold, step S268G is executed; otherwise, S268H is executed.
  • Formula (10) is used to determine whether the near-end speaker's digital voice signal is present.
  • The energy threshold is denoted ρ_T and the coherence threshold is denoted c_T.
  • DTD is a variable representing the judgment result. If the average energy ratio after quantile processing is greater than the energy threshold, or if it is less than the energy threshold and the average frequency coherence value after quantile processing is less than the coherence threshold, the value of DTD is 1; in all other cases, the value of DTD is 0.
  • S268G: double talk
  • Step S268G is specifically: the double-talk detector determines that the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n);
  • If the DTD value is 1, it is determined that the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n), that is, the double-talk state.
  • S268H: single talk
  • Step S268H is specifically: the double-talk detector determines that the near-end speaker's digital voice signal s(n) is not present in the near-end digital voice signal y(n);
  • If the DTD value is 0, it is determined that the near-end speaker's digital voice signal s(n) is not present in the near-end digital voice signal y(n), that is, the single-talk state.
  • The determination result output in step S268I is either the double-talk state or the single-talk state. The update of the filter coefficients is controlled according to the result, so that the adaptive filter generates the estimated echo digital voice signal from the filter coefficients and the far-end digital voice signal x(n), to cancel the echo digital voice signal d(n) present in the near-end digital voice signal y(n).
  • In this embodiment the preset frequency band is determined according to the frequency of the near-end speaker's digital voice signal s(n), but this is not the only option. Depending on the application scenario of the double-talk detection scheme, the preset frequency band can be chosen to suit the scenario, and the energy ratios and frequency coherence values are then selected accordingly.
  • Quantile processing is used here to further select energy ratios and frequency coherence values from those already selected by the preset frequency band, but this is not the only option either; another selection method can be used according to the needs of the scenario.
  • The specific method of judging whether the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n) is likewise not the only option. Depending on the application scenario of the double-talk detection scheme, a more precise judgment method can be adopted.
  • The embodiments above are described for the case in which the energy ratio between the far-end and near-end digital voice signals and the frequency coherence value between the near-end and far-end digital voice signals need to be determined. For those of ordinary skill in the art, if the energy ratio and the frequency coherence value are already available when the scheme is executed, the step of determining them can be omitted. When they do need to be determined, the embodiments above derive both from the powers of the far-end and near-end digital voice signals.
  • In the examples, the power is obtained from the power spectrum. This is merely an example; alternatively, the power can be obtained by table lookup.
  • Mobile communication devices: these devices are characterized by mobile communication functions, and their main goal is to provide voice and data communication.
  • Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
  • Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access.
  • Such terminals include PDA, MID, and UMPC devices, such as the iPad.
  • Portable entertainment devices: these devices can display and play multimedia content.
  • Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • A typical implementation device is a computer.
  • The computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
  • the embodiments of the present application can be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device; the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific transactions or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, remote processing devices connected through a communication network execute transactions.
  • program modules can be located in local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephone Function (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

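A double-talk detection method, a double-talk detection apparatus, and an echo cancellation system. The double-talk detection method includes: judging whether a near-end speaker's digital voice signal is also present in a near-end digital voice signal, according to the energy ratio between a far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal. The method avoids missed and false detections and improves the accuracy of double-talk detection; when applied to echo cancellation, it removes the echo in the near-end voice signal more thoroughly and improves the communication experience of both parties.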

Description

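Double-talk detection method, double-talk detection apparatus, and echo cancellation system
Technical Field
The embodiments of this application relate to the field of signal processing, and in particular to a double-talk detection method, a double-talk detection apparatus, and an echo cancellation system.
Background
Echo cancellation remains a major challenge in the industry. In terms of how echo arises, besides echo caused by the environment (for example, in a hands-free communication system, sound from the loudspeaker feeds back into the microphone), there is echo introduced by network transmission delay, and indirect echo produced after the far-end sound undergoes one or more reflections. In terms of the factors that affect echo cancellation, it depends not only on the external environment of the communication terminal but also on the performance of the host running the communication system and on network conditions. The external environment includes the relative distance and direction between the microphone and the loudspeaker, the relative distance and direction between loudspeakers, the size of the room, the wall material, and so on.
Echo degrades speech intelligibility, so acoustic echo cancellation (AEC) is used to improve voice communication quality. An AEC algorithm uses an adaptive filter to model the echo path and continually adjusts the filter coefficients through an adaptive algorithm so that the filter's impulse response approximates the real echo path. The estimated echo signal is obtained from the far-end voice signal and the filter, and is then subtracted from the microphone input signal to cancel the echo.
However, the presence of the near-end speaker's voice signal causes the filter coefficients to diverge, which degrades echo cancellation. Prior-art echo cancellation algorithms therefore require double-talk detection (DTD). Double talk means that the signal captured by the microphone contains both the echo caused by the far-end voice signal and the near-end speaker's voice signal.
Existing double-talk detection schemes commonly suffer from missed and false detections, so detection accuracy is low. A missed detection causes the adaptive filter coefficients to diverge; a false detection, in which single talk (ST) is treated as double talk (DT), slows or even stops the update of the adaptive filter coefficients. Either way, the voice signal output by the echo cancellation algorithm contains large residual echo, which harms the communication experience of both parties.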
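Summary
In view of this, one of the technical problems addressed by the embodiments of this application is to provide a double-talk detection method, a double-talk detection apparatus, and an echo cancellation system that overcome the above defects of the prior art.
An embodiment of this application provides a double-talk detection method, including: judging whether a near-end speaker's digital voice signal is also present in the near-end digital voice signal, according to the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal.
An embodiment of this application provides a double-talk detection apparatus, including a double-talk detector configured to judge whether a near-end speaker's digital voice signal is also present in the near-end digital voice signal, according to the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal.
An embodiment of this application provides an echo cancellation system, including a double-talk detection apparatus and an adaptive filter. The double-talk detection apparatus includes a double-talk detector configured to judge whether a near-end speaker's digital voice signal is also present in the near-end digital voice signal, according to the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal.
In the embodiments of this application, whether the near-end speaker's digital voice signal is also present in the near-end digital voice signal is judged according to the energy ratio between the far-end and near-end digital voice signals and the frequency coherence value between the near-end and far-end digital voice signals. Because the judgment is based on both the energy ratio and the frequency coherence value, missed and false detections are avoided and the accuracy of double-talk detection is improved; when applied to echo cancellation, the echo in the near-end voice signal is removed more thoroughly, improving the communication experience of both parties.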
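Brief Description of the Drawings
Some specific embodiments of this application are described in detail below, by way of example and not limitation, with reference to the drawings. The same reference numerals in the drawings denote the same or similar components or parts. Those skilled in the art should understand that the drawings are not necessarily drawn to scale. In the drawings:
Figure 1 is a schematic structural diagram of the echo cancellation system in Embodiment 1 of this application;
Figure 2 is a schematic flowchart of echo cancellation by the echo cancellation system in Embodiment 2 of this application;
Figure 3 is a schematic flowchart of the double-talk detection method in Embodiment 3 of this application;
Figure 4 is a schematic flowchart of double-talk detection by the double-talk detection apparatus in Embodiment 4 of this application.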
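Detailed Description
Implementing any technical solution of the embodiments of this application does not necessarily require achieving all of the above advantages at the same time.
The specific implementation of the embodiments of this application is further described below with reference to the drawings.
In the embodiments of this application, whether the near-end speaker's digital voice signal is also present in the near-end digital voice signal is judged according to the energy ratio between the far-end and near-end digital voice signals and the frequency coherence value between the near-end and far-end digital voice signals. Because the judgment is based on both the energy ratio and the frequency coherence value, missed and false detections are avoided and the accuracy of double-talk detection is improved; when applied to echo cancellation, the echo in the near-end voice signal is removed more thoroughly, improving the communication experience of both parties. In addition, the double-talk detection module is related only to the near-end digital voice signal y(n) and the far-end digital voice signal x(n), and is independent of the adaptive filter. The double-talk detection scheme of the embodiments is therefore not limited to echo cancellation and can also be applied in other scenarios.
Figure 1 is a schematic structural diagram of the echo cancellation system in Embodiment 1 of this application. As shown in Figure 1, the echo cancellation system 100 includes a voice activity detection module 106, a double-talk detector 108, and an adaptive filter 110. The system may further include a voice acquisition module 102, a voice playback module 104, and an addition module 112. The voice acquisition module 102 is communicatively connected to the voice activity detection module 106, the double-talk detector 108, and the addition module 112; the voice playback module 104 is communicatively connected to the voice activity detection module 106 and the adaptive filter 110; the voice activity detection module 106 is communicatively connected to the double-talk detector 108; and the double-talk detector 108 is communicatively connected to the adaptive filter 110 and the addition module 112.
The voice acquisition module 102 collects the near-end analog voice signal y(t) to generate the near-end digital voice signal y(n). In this embodiment, the voice acquisition module may be a microphone; the near-end analog voice signal y(t) it collects may include the near-end speaker's voice signal s(t), and may also include the echo analog voice signal d(t) caused by the voice playback module 104 playing the far-end analog voice signal.
The voice playback module 104 plays the far-end analog voice signal corresponding to the received far-end digital voice signal x(n). In this embodiment, the voice playback module 104 may be a loudspeaker.
The voice activity detection module 106 detects whether the echo digital voice signal d(n) is present in the near-end digital voice signal y(n). In this embodiment, the voice activity detection module 106 may also be called a voice activity detector (VAD).
If the echo digital voice signal d(n) is present in the near-end digital voice signal y(n), the double-talk detector is activated to judge, according to the energy ratio between the far-end digital voice signal x(n) and the near-end digital voice signal y(n) and the frequency coherence value between the near-end digital voice signal y(n) and the far-end digital voice signal x(n), whether the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n), so as to control the update of the filter coefficients;
The adaptive filter 110 generates, from the filter coefficients and the far-end digital voice signal x(n), the estimated echo digital voice signal
Figure PCTCN2019087907-appb-000001
to cancel the echo digital voice signal d(n) present in the near-end digital voice signal y(n). In this embodiment, the adaptive filter 110 is, for example, a multi-delay block frequency-domain adaptive filter.
The addition module 112 subtracts from the near-end digital voice signal y(n) the estimated echo digital voice signal
Figure PCTCN2019087907-appb-000002
to obtain the error digital voice signal e(n), thereby cancelling the echo digital voice signal d(n) present in the near-end digital voice signal y(n). In this embodiment, the addition module 112 may be an adder. The more accurate the estimated echo digital voice signal
Figure PCTCN2019087907-appb-000003
is, that is, the closer it is to the actual echo digital voice signal d(n), the higher the speech intelligibility.
Further, in this embodiment, the voice activity detection module 106 detects whether the echo digital voice signal d(n) is present in the near-end digital voice signal y(n) according to the powers of the far-end digital voice signal x(n) and the near-end digital voice signal y(n). For example, if the powers of both y(n) and x(n) exceed their respective preset thresholds, it is determined that d(n) is present in y(n).
Further, in this embodiment, if the echo digital voice signal d(n) is not actually present in the near-end digital voice signal y(n), the double-talk detector 108 is not activated, and the filter coefficients are updated with the historical step size.
In this embodiment, as Figure 1 shows, the double-talk detection module is related only to the near-end digital voice signal y(n) and the far-end digital voice signal x(n), and is independent of the estimated echo digital voice signal
Figure PCTCN2019087907-appb-000004
and the error digital voice signal e(n). The double-talk detection module is thus decoupled from the adaptive filter.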
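The working principle of echo cancellation is described below by way of example, with reference to an embodiment of the echo cancellation method.
Figure 2 is a schematic flowchart of echo cancellation by the echo cancellation system in Embodiment 2 of this application. Corresponding to Figure 1 above, it includes:
S202. The voice playback module plays the received far-end analog voice signal x(t);
In this embodiment, the echo analog voice signal d(t) contained in the near-end analog voice signal y(t) is caused by the far-end analog voice signal x(t). The near-end analog voice signal y(t) input to the voice acquisition module 102 may therefore include the speaker's analog voice signal s(t) and the echo analog voice signal d(t). Note that x(t) is played if it exists; otherwise nothing is played.
S204. The voice acquisition module collects the near-end analog voice signal y(t) to generate the near-end digital voice signal y(n);
S206. The voice activity detection module detects whether the echo digital voice signal d(n) is present in the near-end digital voice signal y(n);
In this embodiment, as described above, if the voice activity detection module 106 is a voice activity detector (VAD), it can examine the far-end digital voice signal x(n) and the near-end digital voice signal y(n) using, for example, the short-time energy method, the time-domain average zero-crossing rate method, or the short-time correlation method, and thereby judge whether the echo digital voice signal d(n) is present. If the short-time energy method is used, the module can detect whether d(n) is present in y(n) according to the powers of x(n) and y(n). For example, if the powers of both y(n) and x(n) exceed their respective preset thresholds, it is determined that d(n) is present in y(n). Because d(n) is produced by x(n), the presence of x(n) can be understood to imply the presence of d(n); see formula (1) below for a more direct statement.
Figure PCTCN2019087907-appb-000005
In formula (1), VAD denotes the output signal of the voice activity detection module. A value of 1 indicates that the echo digital voice signal d(n) is present in the near-end digital voice signal y(n); in all other cases the value is 0, indicating that d(n) is not actually present in y(n).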
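As a rough illustration of the short-time energy check behind formula (1), the sketch below computes a frame's mean power and sets the VAD flag only when both signals exceed their thresholds. The function names, the mean-square power definition, and the threshold values are assumptions made for illustration; the source states only the comparison rule.

```python
def frame_power(frame):
    """Mean-square power of one frame of samples (an assumed power definition)."""
    return sum(s * s for s in frame) / len(frame)


def vad_flag(far_frame, near_frame, far_threshold, near_threshold):
    """Return 1 when both powers exceed their thresholds (echo assumed present), else 0."""
    far_power = frame_power(far_frame)
    near_power = frame_power(near_frame)
    return 1 if far_power > far_threshold and near_power > near_threshold else 0
```

A silent far-end frame yields 0 regardless of the near-end power, which matches the statement that the echo exists only when the far-end signal does.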
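S208. If the echo digital voice signal d(n) is present in the near-end digital voice signal y(n), the double-talk detector is activated to judge whether the near-end speaker's digital voice signal s(n) is also present in y(n), so as to control the update of the filter coefficients;
In this embodiment, the double-talk detector judges whether the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n) according to the energy ratio between the far-end digital voice signal x(n) and the near-end digital voice signal y(n) and the frequency coherence value between y(n) and x(n).
In this embodiment, as described above, if the echo digital voice signal d(n) is not actually present in the near-end digital voice signal y(n), the double-talk detector is not activated and the filter coefficients are updated with the historical step size. For example, when echo cancellation is performed frame by frame, the filter coefficients can be updated with the update step size used for the previous frame of the near-end digital voice signal. If d(n) is present in y(n), the double-talk detector is activated to judge whether s(n) is also present in y(n).
Specifically, in this embodiment, if both the near-end speaker's digital voice signal s(n) and the echo digital voice signal d(n) are present in the near-end digital voice signal y(n), the update step size of the filter coefficients is reduced so that the update slows down, or the update is stopped altogether. The update is slowed or stopped because the presence of s(n) would cause the filter coefficients to diverge, so that an accurate estimated echo digital voice signal
Figure PCTCN2019087907-appb-000006
could not be generated, which would impair echo cancellation. If s(n) is not present in y(n) and only d(n) is present, the update step size is increased to update the filter coefficients.
Here, a separate update step size determination module may be added between the double-talk detector and the adaptive filter to calculate the update step size, or the double-talk detector itself may calculate it.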
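The step-size control described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the source says the step is reduced (or the update stopped) in double talk, increased in single talk, and left at its historical value when the detector is not activated, but it gives no numeric rule. The function name and the `shrink`, `grow`, and `max_step` constants are hypothetical.

```python
def next_step_size(vad, dtd, previous_step, shrink=0.1, grow=2.0, max_step=1.0):
    """Pick the filter update step size for the current frame.

    vad: 1 if echo was detected in the near-end frame, else 0.
    dtd: 1 for double talk, 0 for single talk (only meaningful when vad == 1).
    shrink, grow, max_step: hypothetical tuning constants, not from the source.
    """
    if vad == 0:
        # Detector not activated: keep the historical step size.
        return previous_step
    if dtd == 1:
        # Double talk: slow the update (shrink=0.0 would freeze it).
        return previous_step * shrink
    # Single talk: speed the update up, capped to keep the filter stable.
    return min(previous_step * grow, max_step)
```

Keeping this logic in its own function mirrors the option of a separate update step size determination module placed between the detector and the filter.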
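S210. The adaptive filter generates the estimated echo digital voice signal from the filter coefficients and the far-end digital voice signal x(n);
In this embodiment, as described above, the adaptive filter 110 is a multi-delay block frequency-domain adaptive filter, that is, it comprises several adaptive filter blocks (say D blocks), which gives shorter block delay, faster convergence, and smaller storage requirements.
S212. The addition module subtracts the estimated echo digital voice signal from the near-end digital voice signal y(n) to obtain the error digital voice signal e(n), thereby cancelling the echo digital voice signal d(n) present in y(n).
When the above echo cancellation scheme is applied, the near-end digital voice signal, the far-end digital voice signal, and the echo digital voice signal are processed frame by frame. Frames are divided with reference to the number of frequency bins M of the adaptive filter: every M data points of the echo digital voice signal d(n), the near-end speaker's digital voice signal s(n), and the near-end digital voice signal y(n) are taken as one frame, and the echo cancellation scheme is applied to each frame. Correspondingly, double-talk detection also operates on each frame of d(n), s(n), and y(n).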
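A minimal sketch of the framing described above, assuming each frame holds exactly M consecutive samples. The function name is hypothetical, and dropping an incomplete trailing frame is an assumption the source does not address.

```python
def split_frames(signal, frame_len):
    """Split a sample sequence into consecutive frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped (an assumption).
    """
    count = len(signal) // frame_len
    return [signal[k * frame_len:(k + 1) * frame_len] for k in range(count)]
```

With 1-based frame numbering as in the text, frame i is `split_frames(signal, M)[i - 1]`, covering samples (i-1)M through iM-1.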
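As described above, the case in which the echo digital voice signal d(n) is present can be divided into:
(1) The near-end speaker's digital voice signal s(n) is absent and only the echo digital voice signal d(n) is present; this case is called single talk;
(2) Both the near-end speaker's digital voice signal s(n) and the echo digital voice signal d(n) are present; this case is called double talk.
The following embodiments focus on how the double-talk detection apparatus performs double-talk detection.
Figure 3 is a schematic flowchart of the double-talk detection method in Embodiment 3 of this application. As shown in Figure 3, it includes:
S218A. Acquire the far-end digital voice signal;
In this embodiment, step S218A is specifically: acquire the i-th frame of the far-end digital voice signal x(n);
S218B. Acquire the near-end digital voice signal;
In this embodiment, step S218B is specifically: acquire the i-th frame of the near-end digital voice signal y(n);
In this embodiment, an analog-to-digital converter converts the far-end analog voice signal x(t) into the far-end digital voice signal x(n) and the near-end analog voice signal y(t) into the near-end digital voice signal y(n), and sends them directly to the double-talk detection apparatus.
As described above, if frames are divided with reference to the number of frequency bins M of the adaptive filter, so that every M data points of the echo digital voice signal d(n), the near-end speaker's digital voice signal s(n), and the near-end digital voice signal y(n) form one frame, double-talk detection also operates on each frame of s(n) and y(n); in other words, the double-talk detection scheme is applied frame by frame.
Therefore, in steps S218A and S218B, if the current processing targets the i-th frame of the far-end digital voice signal x(n) and the i-th frame of the near-end digital voice signal y(n), those frames are obtained from the analog-to-digital converter and the double-talk detection scheme is applied to them; the subsequent steps finally judge whether the near-end speaker's digital voice signal is also present in the i-th frame of y(n).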
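S228A. Overlap-store the far-end digital voice signal
In this embodiment, step S228A is specifically: overlap-store, in the time domain, the acquired i-th frame and (i-1)-th frame of the far-end digital voice signal x(n);
S228B. Overlap-store the near-end digital voice signal
In this embodiment, step S228B is specifically: overlap-store, in the time domain, the acquired i-th frame and (i-1)-th frame of the near-end digital voice signal y(n);
In this embodiment, when the echo cancellation scheme is applied in the frequency domain, particularly given that the subsequent adaptive filter is a frequency-domain adaptive filter, processing can be based on the frequency-domain signal of each frame of the far-end digital voice signal x(n) and of the near-end digital voice signal y(n). The power calculation in the subsequent steps can likewise be done in the frequency domain; computing the power gives the distribution of signal power over frequency, that is, how signal power varies with frequency. To reduce resource consumption, steps S228A and S228B overlap-store the i-th and (i-1)-th frames in the time domain, and steps S238A and S238B then convert them to the frequency domain to obtain the corresponding frequency-domain signals.
S238A. Frequency-domain conversion of the near-end digital voice signal
In this embodiment, step S238A is specifically: the frequency-domain conversion module converts the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n) into the corresponding frequency-domain signal;
S238B. Frequency-domain conversion of the far-end digital voice signal
In this embodiment, step S238B is specifically: the frequency-domain conversion module converts the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n) into the corresponding frequency-domain signal;
In this embodiment, a discrete Fourier transform converts the time-domain overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n), and likewise those of the far-end digital voice signal x(n), into the corresponding frequency-domain signals. The time-to-frequency conversion performed by the frequency-domain conversion module is shown in formula (2).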
Figure PCTCN2019087907-appb-000007
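In formula (2), i is greater than or equal to 1:
The i-th frame of the far-end digital voice signal x(n) is denoted x(i):
x(i) = [x((i-1)M) ... x(iM-1)]^T
The (i-1)-th frame of the far-end digital voice signal x(n) is denoted x(i-1):
x(i-1) = [x(((i-1)-1)M) ... x((i-1)M-1)]^T
The i-th frame of the near-end digital voice signal y(n) is denoted y(i):
y(i) = [y((i-1)M) ... y(iM-1)]^T
The (i-1)-th frame of the near-end digital voice signal y(n) is denoted y(i-1):
y(i-1) = [y(((i-1)-1)M) ... y((i-1)M-1)]^T
F is the discrete Fourier transform matrix, of dimension 2M×2M;
X(i) is the frequency-domain signal corresponding to the time-domain overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n);
Y(i) is the frequency-domain signal corresponding to the time-domain overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n).
As formula (2) shows, for the echo cancellation scenario, the i-th and (i-1)-th frames are overlap-stored in the time domain to enable fast convolution in the subsequent adaptive filtering, which reduces resource consumption.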
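To make the overlap-store transform concrete, the sketch below concatenates two consecutive M-sample frames into a 2M-sample block and applies a plain discrete Fourier transform, which is what multiplying by a 2M×2M DFT matrix F amounts to. Placing the previous frame first is an assumption (the formula image is not reproduced here), and the naive O(N²) DFT is used only to keep the example dependency-free; a real implementation would use an FFT. Function names are hypothetical.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform, equivalent to multiplying by the DFT matrix."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def overlap_spectrum(prev_frame, cur_frame):
    """Spectrum of the overlap-stored block; previous-frame-first order is assumed."""
    return dft(list(prev_frame) + list(cur_frame))
```

Two frames of length M give 2M frequency bins, which is why the later steps refer to per-bin powers and ratios.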
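S248A. Calculate the power of the far-end digital voice signal
In this embodiment, step S248A is specifically: the power calculation module calculates the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n). For those of ordinary skill in the art, viewed across the whole technical solution, the power here can be obtained from the power spectrum, or the power for each frequency bin can be obtained by table lookup.
S248B. Calculate the power of the near-end digital voice signal
In this embodiment, step S248B is specifically: the power calculation module calculates the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n);
S248C. Calculate the associated (cross) power between the far-end digital voice signal and the near-end digital voice signal;
In this embodiment, step S248C is specifically: the power calculation module calculates the associated power between the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n) and the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n);
In this embodiment, the power calculation module performs steps S248A, S248B, and S248C; the calculations are detailed in formula (3).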
Figure PCTCN2019087907-appb-000008
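In formula (3), i is greater than or equal to 2:
The power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n) is denoted
Figure PCTCN2019087907-appb-000009
The power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n) is denoted
Figure PCTCN2019087907-appb-000010
The associated power between the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n) and that of the near-end digital voice signal y(n) is denoted
Figure PCTCN2019087907-appb-000011
The power of the frequency-domain signal corresponding to the overlap-stored (i-1)-th and (i-2)-th frames of the far-end digital voice signal x(n) is denoted
Figure PCTCN2019087907-appb-000012
The power of the frequency-domain signal corresponding to the overlap-stored (i-1)-th and (i-2)-th frames of the near-end digital voice signal y(n) is denoted
Figure PCTCN2019087907-appb-000013
The associated power between the frequency-domain signal corresponding to the overlap-stored (i-1)-th and (i-2)-th frames of the far-end digital voice signal x(n) and that of the near-end digital voice signal y(n) is denoted
Figure PCTCN2019087907-appb-000014
λ is the smoothing parameter. Its value is chosen empirically according to the application scenario, to prevent abrupt changes in the associated power between two consecutive frames of the far-end digital voice signal x(n) and the near-end digital voice signal y(n). The smoothing parameters set for the power
Figure PCTCN2019087907-appb-000015
the power
Figure PCTCN2019087907-appb-000016
and the associated power
Figure PCTCN2019087907-appb-000017
may be called the first, second, and third smoothing parameters respectively; if the three values are identical, they are simply called the smoothing parameter.
Figure PCTCN2019087907-appb-000018
denotes the element-wise product, and "*" denotes the complex conjugate.
For example, when i = 2, formula (3) becomes:
Figure PCTCN2019087907-appb-000019
As formula (3) shows, the power of the near-end digital voice signal y(n) is calculated from the frequency-domain signal of y(n) and the first smoothing parameter, and the power of the far-end digital voice signal x(n) is calculated from the frequency-domain signal of x(n) and the second smoothing parameter. The associated power between y(n) and x(n) is determined from y(n), x(n), and the third smoothing parameter.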
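The recursive smoothing that formula (3) describes can be sketched as below. The exact formula is an image that is not reproduced here, so the common first-order form, new = λ·previous + (1−λ)·instantaneous, is an assumption, as are the function name, the single shared λ, and its default value. The instantaneous terms use the element-wise product with the complex conjugate, as the text indicates.

```python
def smooth_powers(prev_sx, prev_sy, prev_sxy, X, Y, lam=0.9):
    """Update per-bin auto powers and cross power with first-order smoothing.

    prev_sx, prev_sy: previous far-end / near-end powers (lists of floats).
    prev_sxy: previous cross power (list of complex values).
    X, Y: current far-end / near-end spectra (lists of complex values).
    lam: smoothing parameter; one shared value is assumed for all three terms.
    """
    sx = [lam * p + (1 - lam) * abs(x) ** 2 for p, x in zip(prev_sx, X)]
    sy = [lam * p + (1 - lam) * abs(y) ** 2 for p, y in zip(prev_sy, Y)]
    sxy = [
        lam * p + (1 - lam) * x * y.conjugate()
        for p, x, y in zip(prev_sxy, X, Y)
    ]
    return sx, sy, sxy
```

A larger λ weights history more heavily, which suppresses the abrupt frame-to-frame changes the text mentions at the cost of slower tracking.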
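S258. Echo detection
In this embodiment, step S258 is specifically: the voice activity detection module detects whether the echo digital voice signal d(n) is present in the near-end digital voice signal y(n), according to the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n) and the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n);
If so, S268 is executed; otherwise, the flow returns to steps S218A and S218B to acquire the (i+1)-th frame of the far-end digital voice signal x(n) and the (i+1)-th frame of the near-end digital voice signal y(n), and then overlap-stores the (i+1)-th and i-th frames of x(n) in S228A and the (i+1)-th and i-th frames of y(n) in S228B;
In this embodiment, as described above, if the energy method is used to detect whether the echo digital voice signal d(n) is present in the near-end digital voice signal y(n), the power of the frequency-domain signal for the overlap-stored i-th and (i-1)-th frames of x(n) and that of y(n) are each compared with the corresponding power threshold. If both exceed their thresholds, it is determined that d(n) is present in y(n).
S268. Double-talk detection
In this embodiment, step S268 is specifically: the double-talk detector is activated to judge whether the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n).
In this embodiment, an exemplary description of step S268 is given with Figure 4 below.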
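Figure 4 is a schematic flowchart of double-talk detection by the double-talk detection apparatus in Embodiment 4 of this application. As shown in Figure 4, it includes:
S268A'. Calculate the energy ratio
In this embodiment, step S268A' is specifically: the energy ratio calculation module in the double-talk detector determines the energy ratio from the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n) and the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n);
In this embodiment, the energy ratio is calculated with formula (4):
Figure PCTCN2019087907-appb-000020
In formula (4), the energy ratio is denoted
Figure PCTCN2019087907-appb-000021
δ is a control factor for the energy ratio calculation; it guards formula (4) against the case in which
Figure PCTCN2019087907-appb-000022
is 0, which would otherwise drive the ratio to infinity. In general, the value of δ is small relative to
Figure PCTCN2019087907-appb-000023
.
As the formula shows, the energy ratio is directly proportional to the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the far-end digital voice signal x(n), and inversely proportional to the power of the frequency-domain signal corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n).
In this embodiment, the energy ratio is determined by the energy ratio calculation module; in practice, this processing can also be integrated into another module, so a dedicated energy ratio calculation module need not be added.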
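A per-bin energy ratio consistent with the description of formula (4) might look like the following. Because the formula image is not reproduced, the exact form is an assumption: far-end power divided by near-end power plus the control factor δ, which matches the stated proportionality and the role of δ in guarding against a zero denominator. The function name and the default δ are hypothetical.

```python
def energy_ratio(far_power, near_power, delta=1e-10):
    """Per-bin ratio of far-end power to near-end power.

    delta is the small control factor that keeps the ratio finite when a
    near-end power bin is 0 (assumed form; the source formula is not shown).
    """
    return [px / (py + delta) for px, py in zip(far_power, near_power)]
```

Choosing δ small relative to typical power values keeps it from noticeably biasing the ratio in bins that carry real signal.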
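S268B'. Determine the preset frequency band
In this embodiment, step S268B' is specifically: the energy ratio calculation module in the double-talk detector determines the preset frequency band according to the frequency of the near-end speaker's digital voice signal s(n);
S268C'. Preliminary selection of energy ratios
In this embodiment, step S268C' is specifically: the energy ratio calculation module determines, within the preset frequency band, the energy ratios between the frequency-domain signals corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n) and the far-end digital voice signal x(n);
In this embodiment, to further avoid missed and false detections, the preset frequency band is set according to the frequency of the near-end speaker's digital voice signal s(n), and a subset of the energy ratios calculated in step S268A' is selected. For example, human voice signals usually lie in the range 300 Hz to 3400 Hz, so the energy ratios are selected with formula (5) below.
Figure PCTCN2019087907-appb-000024
In formula (5), f_L is the lower frequency limit, f_H is the upper frequency limit, and f_s is the sampling frequency of the digital voice signals. For the i-th frames of the near-end digital voice signal y(n) and the far-end digital voice signal x(n), n_0 is the frequency bin corresponding to the lower frequency limit of the near-end speaker's digital voice signal s(n), and n_1 is the frequency bin corresponding to its upper frequency limit. Note that the i-th frames of y(n) and x(n) each correspond to multiple frequency-domain values, so multiple energy ratios are obtained; the bins n_0 and n_1 define the positions, among those energy ratios, of the ones selected by frequency.
Specifically, if human voice signals usually lie in the range 300 Hz to 3400 Hz, then 300 Hz is the lower frequency limit and corresponds to bin n_0, and 3400 Hz is the upper frequency limit and corresponds to bin n_1.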
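The mapping from band edges in hertz to frequency bins can be sketched as follows. Formula (5) itself is not reproduced, so the conventional DFT mapping, bin = floor(f · N / f_s) with N the transform length (2M for the overlap-stored block), is an assumption, as are the function name and the example sampling rate.

```python
import math


def band_bins(f_low, f_high, sample_rate, fft_len):
    """Map band edges in Hz to DFT bin indices n_0 and n_1 (assumed floor rounding)."""
    n0 = math.floor(f_low * fft_len / sample_rate)
    n1 = math.floor(f_high * fft_len / sample_rate)
    return n0, n1


# Example: 300 Hz to 3400 Hz band, assumed 8 kHz sampling, 2M = 256 bins.
n0, n1 = band_bins(300, 3400, 8000, 256)
```

Only the energy ratios (and later the coherence values) at bins n_0 through n_1 are carried forward to the quantile step.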
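S268D'. Quantile processing of the energy ratios
In this embodiment, step S268D' is specifically: the energy ratio calculation module performs quantile processing on the energy ratios between the frequency-domain signals corresponding to the near-end digital voice signal y(n) and the far-end digital voice signal x(n) within the preset frequency band;
In this embodiment, to prevent missed and false detections caused by weak voice signals at the start and near the end of a call, quantile processing is performed: the energy ratios selected by frequency are sorted from small to large, and a subset of them is then selected for double-talk detection. Specifically, quartile processing can be used: the first quantile is 0.25, the second is 0.5, the third is 0.75, and the fourth is 1. The energy ratios between the second quantile (0.5) and the third quantile (0.75) are selected as the effective energy ratios, also called the decision quantities after quantile processing; the calculation is shown in formula (6).
Figure PCTCN2019087907-appb-000025
In formula (6), 0.5 and 0.75 denote the second and third quantiles, n_L is the frequency bin corresponding to the lower limit after quantile processing, and n_H is the frequency bin corresponding to the upper limit after quantile processing. These two bins delimit the subset of energy ratios retained after quantile processing, denoted
Figure PCTCN2019087907-appb-000026
Here, the sorting in the quartile processing above may also be from small to large. The energy ratios between the first quantile and the third quantile can also be selected. Which two or more quantiles to select between can be determined flexibly according to the needs of the application scenario.
In other embodiments, other specific quantile processing can also be used; details are not repeated here.
S268E'. Calculate the average value of the energy ratios;
In this embodiment, step S268E' is specifically: the energy ratio calculation module determines the average value of the energy ratios after quantile processing;
Further, in this embodiment, to improve the accuracy of double-talk detection, the average value of the energy ratios after quantile processing is calculated in step S268E', as detailed in formula (7) below.
Figure PCTCN2019087907-appb-000027
In formula (7), the average value of the energy ratios is denoted
Figure PCTCN2019087907-appb-000028
n_H - n_L + 1 is the number of frequency bins covered by the energy ratios after quantile processing. For the meaning of the other parameters, refer to the description above.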
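The quantile selection and averaging of formulas (6) and (7) can be sketched together. The source formulas are images that are not reproduced, so the index convention below (positions floor(q·N) in the ascending-sorted list) is an assumption, and the function name is hypothetical. The same helper would apply to the frequency coherence values.

```python
def quantile_mean(values, lower_q=0.5, upper_q=0.75):
    """Average the values lying between two quantiles of the sorted input.

    Index rounding is an assumed convention; the source does not spell it out.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)                          # small to large
    n = len(ordered)
    n_low = int(lower_q * n)                          # first retained position
    n_high = max(n_low, int(upper_q * n) - 1)         # last retained position
    kept = ordered[n_low:n_high + 1]
    return sum(kept) / len(kept)                      # divide by n_high - n_low + 1
```

Discarding the smallest half and the largest quarter makes the mean less sensitive to weak-signal bins and to isolated outliers, which is the stated motivation for the quantile step.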
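S268A". Calculate the frequency coherence value
In this embodiment, step S268A" is specifically: the coherence calculation module in the double-talk detector determines the frequency coherence value between the near-end digital voice signal y(n) and the far-end digital voice signal x(n) from the respective powers of y(n) and x(n) and the associated power between them;
In this embodiment, the frequency coherence value in step S268A" is calculated according to formula (8) below.
Figure PCTCN2019087907-appb-000029
In formula (8),
Figure PCTCN2019087907-appb-000030
denotes the element-wise product, and "*" denotes the complex conjugate.
As formula (8) shows, the frequency coherence value is directly proportional to the associated power between the frequency-domain signals corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n) and the far-end digital voice signal x(n), and inversely proportional to the respective powers of those frequency-domain signals.
Of course, in other embodiments, if only a rough estimate is needed, the double-talk detector determines the frequency coherence value between the near-end digital voice signal y(n) and the far-end digital voice signal x(n) from the respective powers of y(n) and x(n).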
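One form consistent with the description of formula (8) is the magnitude-squared coherence, sketched below. This is an assumption, since the formula image is not reproduced: the cross power multiplied by its complex conjugate, divided by the product of the two auto powers. The function name and the small `eps` guard against division by zero are hypothetical additions.

```python
def frequency_coherence(far_power, near_power, cross_power, eps=1e-12):
    """Per-bin magnitude-squared coherence (assumed form of formula (8)).

    far_power, near_power: smoothed auto powers (lists of floats).
    cross_power: smoothed cross power (list of complex values).
    eps: hypothetical guard against a zero denominator.
    """
    return [
        abs(pxy) ** 2 / (px * py + eps)
        for px, py, pxy in zip(far_power, near_power, cross_power)
    ]
```

With smoothed powers the value stays between 0 and 1: it is close to 1 when the near-end bin is essentially a filtered copy of the far-end bin (echo only) and drops when uncorrelated near-end speech is added.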
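S268B". Determine the preset frequency band
In this embodiment, step S268B" is specifically: the double-talk detector determines the preset frequency band according to the frequency of the near-end speaker's digital voice signal s(n);
S268C". Preliminary selection of frequency coherence values
In this embodiment, step S268C" is specifically: determining, within the preset frequency band, the frequency coherence values between the frequency-domain signals corresponding to the overlap-stored i-th and (i-1)-th frames of the near-end digital voice signal y(n) and the far-end digital voice signal x(n);
In this embodiment, for the frequency-related description, see the explanation of steps S268B' and S268C' above, which yields the frequency bins corresponding to the lower and upper frequency limits. Note that the overlap-stored i-th and (i-1)-th frames of y(n) and x(n) each correspond to multiple frequency-domain values, so multiple frequency coherence values are obtained; the bins for the upper and lower frequency limits define the positions, among those values, of the ones selected by frequency.
S268D". Quantile processing of the frequency coherence values
In this embodiment, step S268D" is specifically: performing quantile processing on the frequency coherence values between the frequency-domain signals corresponding to the near-end digital voice signal y(n) and the far-end digital voice signal x(n) within the preset frequency band;
Here, as in step S268D' above, the purpose of quantile processing is to further avoid missed and false detections, and quartile processing is again used. This gives the frequency bin n_L for the lower limit and the frequency bin n_H for the upper limit after quantile processing; these two bins delimit the subset of frequency coherence values retained after quantile processing, denoted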
Figure PCTCN2019087907-appb-000031
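S268E". Calculate the average value of the frequency coherence values;
In this embodiment, step S268E" is specifically: determining the average value of the frequency coherence values after quantile processing;
In this embodiment, the average value of the frequency coherence values after quantile processing is calculated as shown in formula (9).
Figure PCTCN2019087907-appb-000032
The average value of the frequency coherence values after quantile processing is denoted
Figure PCTCN2019087907-appb-000033
For the meaning of the other parameters, refer to the description above.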
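S268F. Double-talk detection
In this embodiment, step S268F is specifically: judging whether the near-end speaker's digital voice signal s(n) is present in the i-th frame of the near-end digital voice signal y(n), according to the result of comparing the average energy ratio after quantile processing with the energy threshold and the result of comparing the average frequency coherence value after quantile processing with the coherence threshold;
If the average energy ratio after quantile processing is greater than the energy threshold, or if it is less than the energy threshold and the average frequency coherence value after quantile processing is less than the coherence threshold, step S268G is executed; otherwise, S268H is executed.
In this embodiment, formula (10) below is used to judge whether the near-end speaker's digital voice signal is present.
Figure PCTCN2019087907-appb-000034
In formula (10), the energy threshold is denoted ρ_T, the coherence threshold is denoted c_T, and DTD is a variable representing the judgment result. If the average energy ratio after quantile processing is greater than the energy threshold, or if it is less than the energy threshold and the average frequency coherence value after quantile processing is less than the coherence threshold, the value of DTD is 1; in all other cases, the value of DTD is 0.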
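The decision rule stated for formula (10) translates directly into code. The sketch below follows the wording of the text; the function and parameter names are hypothetical, and threshold values would have to be tuned for the application.

```python
def dtd_decision(mean_ratio, mean_coherence, ratio_threshold, coherence_threshold):
    """Return 1 for double talk, 0 for single talk, per the rule stated for formula (10)."""
    if mean_ratio > ratio_threshold:
        return 1
    if mean_ratio < ratio_threshold and mean_coherence < coherence_threshold:
        return 1
    # All other cases, including mean_ratio exactly equal to the threshold.
    return 0
```

Using the coherence test as a second check when the energy test does not fire is what lets the combined rule catch double talk that an energy comparison alone would miss.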
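S268G. Double talk;
In this embodiment, step S268G is specifically: the double-talk detector determines that the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n);
As described above, if the value of DTD is 1, it is determined that s(n) is also present in y(n), that is, the double-talk state;
S268H. Single talk;
In this embodiment, step S268H is specifically: the double-talk detector determines that the near-end speaker's digital voice signal s(n) is not present in the near-end digital voice signal y(n);
As described above, if the value of DTD is 0, it is determined that s(n) is not present in y(n), that is, the single-talk state.
S268I. Output the determination result
In this embodiment, the determination result in step S268I is either the double-talk state or the single-talk state. The update of the filter coefficients is controlled according to the result, so that the adaptive filter generates, from the filter coefficients and the far-end digital voice signal x(n), the estimated echo digital voice signal
Figure PCTCN2019087907-appb-000035
to cancel the echo digital voice signal d(n) present in the near-end digital voice signal y(n).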
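Note that in this embodiment the preset frequency band is determined according to the frequency of the near-end speaker's digital voice signal s(n), but this is not the only option. Depending on the application scenario of the double-talk detection scheme, the preset frequency band can be chosen to suit the scenario, and the energy ratios and frequency coherence values are then selected accordingly.
In addition, this embodiment uses quantile processing to further select energy ratios and frequency coherence values from those already selected by the preset frequency band, but this is not the only option either. Depending on the application scenario, another method can be used for this further selection.
Furthermore, the specific method used in this embodiment to judge whether the near-end speaker's digital voice signal s(n) is also present in the near-end digital voice signal y(n) is not the only option. Depending on the application scenario, a more precise judgment method can be adopted.
The embodiments above are described mainly in terms of a frequency-domain implementation. Guided by them, however, those of ordinary skill in the art can also implement the scheme in the time domain without departing from the idea of this application.
In addition, the specific formulas given in the embodiments above are only examples and not the only option; those of ordinary skill in the art can modify them without departing from the idea of this application.
In addition, the embodiments above are described for the case in which the energy ratio between the far-end and near-end digital voice signals and the frequency coherence value between the near-end and far-end digital voice signals need to be determined. For those of ordinary skill in the art, if the energy ratio and the frequency coherence value are already available when the scheme is executed, the step of determining them can be omitted. When they do need to be determined, the embodiments above derive both from the powers of the far-end and near-end digital voice signals, with the power obtained from the power spectrum as an example. Viewed across the whole technical solution, obtaining the power from the power spectrum is merely an example; alternatively, the power can be obtained by table lookup.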
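The above technical solutions of the embodiments of this application can be used on various types of electronic devices, which exist in many forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, and their main goal is to provide voice and data communication. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Other electronic devices with data interaction functions.
Specific embodiments of the subject matter have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.
The systems, apparatuses, modules, or units described in the embodiments above may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
For convenience of description, the apparatus above is described in terms of separate functional units. Of course, when this application is implemented, the functions of the units may be realized in the same one or more pieces of software and/or hardware.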
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非 排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。
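The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.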

Claims (37)

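  1. A double-talk detection method, characterized by comprising:
    determining, according to an energy ratio between a far-end digital voice signal and a near-end digital voice signal, and a frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal.
  2. The double-talk detection method according to claim 1, characterized by further comprising: determining the energy ratio between the far-end digital voice signal and the near-end digital voice signal, and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal.
  3. The double-talk detection method according to claim 2, characterized in that determining the energy ratio between the far-end digital voice signal and the near-end digital voice signal, and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, comprises: determining, according to the powers corresponding to the far-end digital voice signal and the near-end digital voice signal, the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, respectively.
  4. The double-talk detection method according to claim 1, characterized by further comprising: detecting, according to the powers corresponding to the far-end digital voice signal and the near-end digital voice signal, whether an echo digital voice signal is present in the near-end digital voice signal, and if so, determining whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal.
  5. The double-talk detection method according to claim 1, characterized in that the energy ratio is directly proportional to the power corresponding to the far-end digital voice signal and inversely proportional to the power corresponding to the near-end digital voice signal.
  6. The double-talk detection method according to claim 3, characterized in that determining, according to the powers corresponding to the far-end digital voice signal and the near-end digital voice signal, the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal comprises: determining the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal according to the powers respectively corresponding to the near-end digital voice signal and the far-end digital voice signal, and an associated power between the near-end digital voice signal and the far-end digital voice signal.
  7. The double-talk detection method according to claim 6, characterized in that the frequency coherence value is directly proportional to the associated power between the near-end digital voice signal and the far-end digital voice signal, and inversely proportional to the powers respectively corresponding to the near-end digital voice signal and the far-end digital voice signal.
  8. The double-talk detection method according to any one of claims 1-7, characterized by further comprising: calculating the power corresponding to the near-end digital voice signal according to the frequency-domain signal corresponding to the near-end digital voice signal, and calculating the power corresponding to the far-end digital voice signal according to the frequency-domain signal corresponding to the far-end digital voice signal.
  9. The double-talk detection method according to claim 8, characterized in that calculating the power corresponding to the near-end digital voice signal according to the frequency-domain signal corresponding to the near-end digital voice signal comprises: calculating the power corresponding to the near-end digital voice signal according to the frequency-domain signal corresponding to the near-end digital voice signal and a first smoothing parameter; and calculating the power corresponding to the far-end digital voice signal according to the frequency-domain signal corresponding to the far-end digital voice signal comprises: calculating the power corresponding to the far-end digital voice signal according to the frequency-domain signal corresponding to the far-end digital voice signal and a second smoothing parameter.
  10. The double-talk detection method according to claim 6, characterized by further comprising: determining the associated power between the near-end digital voice signal and the far-end digital voice signal according to the near-end digital voice signal, the far-end digital voice signal, and a third smoothing parameter.
  11. The double-talk detection method according to any one of claims 1-10, characterized in that determining, according to the energy ratio between the far-end digital voice signal and the near-end digital voice signal, and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal comprises: determining, according to the energy ratio between the near-end digital voice signal and the far-end digital voice signal within a preset frequency band, and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal within the preset frequency band, whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal.
  12. The double-talk detection method according to claim 11, characterized by further comprising: determining the preset frequency band according to the frequency of the digital voice signal of the near-end speaker.
  13. The double-talk detection method according to claim 11 or 12, characterized by further comprising: performing quantile processing on the energy ratio and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal within the preset frequency band, so as to determine, according to the energy ratio after the quantile processing and the frequency coherence value after the quantile processing, whether the digital voice signal of the near-end speaker is present in the near-end digital voice signal.
  14. The double-talk detection method according to claim 13, characterized by further comprising: determining an average of the energy ratios after the quantile processing and an average of the frequency coherence values after the quantile processing, so as to determine, according to the average of the energy ratios after the quantile processing and the average of the frequency coherence values after the quantile processing, whether the digital voice signal of the near-end speaker is present in the near-end digital voice signal.
  15. The double-talk detection method according to claim 14, characterized in that determining, according to the average of the energy ratios after the quantile processing and the average of the frequency coherence values after the quantile processing, whether the digital voice signal of the near-end speaker is present in the near-end digital voice signal comprises: determining whether the digital voice signal of the near-end speaker is present in the near-end digital voice signal according to a result of comparing the average of the energy ratios after the quantile processing with an energy threshold, and a result of comparing the average of the frequency coherence values after the quantile processing with a coherence value threshold.
  16. The double-talk detection method according to claim 15, characterized in that if the average of the energy ratios after the quantile processing is less than the energy threshold and the average of the frequency coherence values after the quantile processing is less than the coherence value threshold, it is determined that the digital voice signal of the near-end speaker is present in the near-end digital voice signal; otherwise, it is determined that the digital voice signal of the near-end speaker is not present in the near-end digital voice signal.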
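  17. A double-talk detection apparatus, characterized by comprising: a double-talk detector, the double-talk detector being configured to determine, according to an energy ratio between a far-end digital voice signal and a near-end digital voice signal, and a frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal.
  18. The double-talk detection apparatus according to claim 17, characterized in that the double-talk detector is further configured to determine the energy ratio between the far-end digital voice signal and the near-end digital voice signal, and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal.
  19. The double-talk detection apparatus according to claim 18, characterized in that the double-talk detector is further configured to determine, according to the powers corresponding to the far-end digital voice signal and the near-end digital voice signal, the energy ratio between the far-end digital voice signal and the near-end digital voice signal and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, respectively.
  20. The double-talk detection apparatus according to claim 17, characterized by further comprising: a voice activity detection module configured to detect, according to the powers corresponding to the far-end digital voice signal and the near-end digital voice signal, whether an echo digital voice signal is present in the near-end digital voice signal; if so, the double-talk detector determines whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal.
  21. The double-talk detection apparatus according to claim 17, characterized in that the double-talk detector comprises: an energy ratio calculation module configured to determine the energy ratio according to the powers corresponding to the far-end digital voice signal and the near-end digital voice signal.
  22. The double-talk detection apparatus according to claim 17, characterized in that the energy ratio is directly proportional to the power corresponding to the far-end digital voice signal and inversely proportional to the power corresponding to the near-end digital voice signal.
  23. The double-talk detection apparatus according to claim 17, characterized in that the double-talk detector is further configured to determine the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal according to the powers respectively corresponding to the near-end digital voice signal and the far-end digital voice signal, and an associated power between the near-end digital voice signal and the far-end digital voice signal.
  24. The double-talk detection apparatus according to claim 23, characterized in that the double-talk detector is further configured to determine the associated power between the near-end digital voice signal and the far-end digital voice signal according to the near-end digital voice signal and the far-end digital voice signal.
  25. The double-talk detection apparatus according to claim 24, characterized in that the frequency coherence value is directly proportional to the associated power between the near-end digital voice signal and the far-end digital voice signal, and inversely proportional to the powers respectively corresponding to the near-end digital voice signal and the far-end digital voice signal.
  26. The double-talk detection apparatus according to any one of claims 17 to 25, characterized in that the double-talk detector comprises: a coherence calculation module, the coherence calculation module being further configured to determine the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal.
  27. The double-talk detection apparatus according to any one of claims 17-26, characterized by further comprising: a power calculation module configured to calculate the powers respectively corresponding to the near-end digital voice signal and the far-end digital voice signal.
  28. The double-talk detection apparatus according to claim 27, characterized in that the power calculation module is further configured to calculate the power corresponding to the near-end digital voice signal according to the frequency-domain signal corresponding to the near-end digital voice signal, and to calculate the power corresponding to the far-end digital voice signal according to the frequency-domain signal corresponding to the far-end digital voice signal.
  29. The double-talk detection apparatus according to claim 28, characterized in that the power calculation module is further configured to calculate the power corresponding to the near-end digital voice signal according to the frequency-domain signal corresponding to the near-end digital voice signal and a first smoothing parameter, and to calculate the power corresponding to the far-end digital voice signal according to the frequency-domain signal corresponding to the far-end digital voice signal and a second smoothing parameter.
  30. The double-talk detection apparatus according to claim 21, characterized by further comprising: a power calculation module configured to determine an associated power between the near-end digital voice signal and the far-end digital voice signal according to the near-end digital voice signal, the far-end digital voice signal, and a third smoothing parameter.
  31. The double-talk detection apparatus according to any one of claims 17-30, characterized in that the double-talk detector is further configured to determine, according to the energy ratio between the near-end digital voice signal and the far-end digital voice signal within a preset frequency band, and the frequency coherence value between the near-end digital voice signal and the far-end digital voice signal within the preset frequency band, whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal.
  32. The double-talk detection apparatus according to claim 31, characterized in that the double-talk detector is further configured to determine the preset frequency band according to the frequency of the digital voice signal of the near-end speaker.
  33. The double-talk detection apparatus according to claim 31 or 32, characterized in that the double-talk detector is further configured to perform quantile processing on the energy ratio and the frequency coherence value of the near-end digital voice signal and the far-end digital voice signal within the preset frequency band, so as to determine, according to the energy ratio after the quantile processing and the frequency coherence value after the quantile processing, whether the digital voice signal of the near-end speaker is present in the near-end digital voice signal.
  34. The double-talk detection apparatus according to claim 33, characterized in that the double-talk detector is further configured to determine an average of the energy ratios after the quantile processing and an average of the frequency coherence values after the quantile processing, so as to determine, according to the average of the energy ratios after the quantile processing and the average of the frequency coherence values after the quantile processing, whether the digital voice signal of the near-end speaker is present in the near-end digital voice signal.
  35. The double-talk detection apparatus according to claim 34, characterized in that the double-talk detector is further configured to determine whether the digital voice signal of the near-end speaker is present in the near-end digital voice signal according to a result of comparing the average of the energy ratios after the quantile processing with an energy threshold, and a result of comparing the average of the frequency coherence values after the quantile processing with a coherence value threshold.
  36. The double-talk detection apparatus according to claim 35, characterized in that if the average of the energy ratios after the quantile processing is less than the energy threshold and the average of the frequency coherence values after the quantile processing is less than the coherence value threshold, it is determined that the digital voice signal of the near-end speaker is present in the near-end digital voice signal; otherwise, it is determined that the digital voice signal of the near-end speaker is not present in the near-end digital voice signal.
  37. An echo cancellation system, characterized by comprising a double-talk detection apparatus and an adaptive filter, the double-talk detection apparatus comprising: a double-talk detector, the double-talk detector being configured to determine, according to an energy ratio between a far-end digital voice signal and a near-end digital voice signal, and a frequency coherence value between the near-end digital voice signal and the far-end digital voice signal, whether a digital voice signal of a near-end speaker is also present in the near-end digital voice signal.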
PCT/CN2019/087907 2019-05-22 2019-05-22 Double talk detection method, double talk detection device and echo cancellation system WO2020232659A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP19929818.3A EP3796629B1 (en) 2019-05-22 2019-05-22 Double talk detection method, double talk detection device and echo cancellation system
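CN201980000966.6A CN112292844B (zh) 2019-05-22 2019-05-22 Double talk detection method, double talk detection device and echo cancellation system
PCT/CN2019/087907 WO2020232659A1 (zh) 2019-05-22 2019-05-22 Double talk detection method, double talk detection device and echo cancellation system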
US17/031,862 US11349525B2 (en) 2019-05-22 2020-09-24 Double talk detection method, double talk detection apparatus and echo cancellation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087907 WO2020232659A1 (zh) 2019-05-22 2019-05-22 Double talk detection method, double talk detection device and echo cancellation system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/031,862 Continuation US11349525B2 (en) 2019-05-22 2020-09-24 Double talk detection method, double talk detection apparatus and echo cancellation system

Publications (1)

Publication Number Publication Date
WO2020232659A1 (zh) 2020-11-26

Family

ID=73459040

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087907 WO2020232659A1 (zh) 2019-05-22 2019-05-22 Double talk detection method, double talk detection device and echo cancellation system

Country Status (4)

Country Link
US (1) US11349525B2 (zh)
EP (1) EP3796629B1 (zh)
CN (1) CN112292844B (zh)
WO (1) WO2020232659A1 (zh)

Also Published As

Publication number Publication date
US20210013927A1 (en) 2021-01-14
EP3796629B1 (en) 2022-08-31
EP3796629A1 (en) 2021-03-24
CN112292844A (zh) 2021-01-29
EP3796629A4 (en) 2021-06-30
US11349525B2 (en) 2022-05-31
CN112292844B (zh) 2022-04-15

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019929818

Country of ref document: EP

Effective date: 20201214

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929818

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE