WO2020252629A1 - Residual echo detection method, residual echo detection device, voice processing chip, and electronic device - Google Patents

Residual echo detection method, residual echo detection device, voice processing chip, and electronic device

Info

Publication number
WO2020252629A1
Authority
WO
WIPO (PCT)
Prior art keywords
residual echo
factor
voice signal
signal
residual
Prior art date
Application number
PCT/CN2019/091565
Inventor
郭红敬
李国梁
王鑫山
韩文凯
朱虎
Original Assignee
深圳市汇顶科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司
Priority to CN201980001068.2A (patent CN110431624B)
Priority to PCT/CN2019/091565 (WO2020252629A1)
Publication of WO2020252629A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M9/00: Arrangements for interconnection not involving centralised switching
    • H04M9/08: Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082: Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic, using echo cancellers
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain
    • G10L21/0264: Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L2021/02082: Noise filtering the noise being echo, reverberation of the speech
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The embodiments of the present application relate to the field of voice technology, and in particular to a residual echo detection method, a residual echo detection device, a voice processing chip, and an electronic device.
  • AEC: acoustic echo cancellation.
  • Echo cancellation mainly focuses on acoustic echo.
  • Acoustic echo cancellation is mainly divided into two parts: linear echo cancellation and residual echo cancellation.
  • In linear echo cancellation, an adaptive filter approximates the real sound field as closely as possible in order to estimate the echo signal, and the estimated echo signal is subtracted from the voice signal actually collected by the microphone to cancel the echo. However, because of the limited order of the adaptive filter, data characteristics, loudspeaker nonlinearity, and other factors, the echo cannot be completely eliminated, and residual echo remains. Residual echo seriously degrades the voice quality of the call and the user experience, so residual echo cancellation processing is needed to eliminate it.
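The linear stage described above can be sketched in a few lines. This is a minimal time-domain NLMS (normalized least-mean-squares) canceller in plain Python, substituted here purely for illustration; it is not the filter structure of the application. The function name, the filter order `taps`, the step size `mu`, and the regularizer `eps` are hypothetical choices.

```python
def nlms_echo_cancel(x, y, taps=4, mu=0.5, eps=1e-8):
    """Return (error signal e, final weights w) for far-end x and microphone y.

    Illustrative NLMS sketch; taps, mu and eps are assumed values.
    """
    w = [0.0] * taps      # adaptive filter coefficients
    buf = [0.0] * taps    # most recent far-end samples, newest first
    e = []
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]
        d_hat = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        en = yn - d_hat                                  # error = microphone - estimated echo
        norm = sum(b * b for b in buf) + eps             # input energy, regularized
        w = [wi + mu * en * bi / norm for wi, bi in zip(w, buf)]
        e.append(en)
    return e, w
```

With a noise-free, perfectly linear echo path the error decays toward zero; in practice, the factors listed above (limited order, loudspeaker nonlinearity) leave a residual in `e`, which is what the rest of the description sets out to detect.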
  • The inventor found that eliminating residual echo presupposes that the residual echo is accurately detected: the more accurate the detection result, the more effectively the residual echo can be eliminated. A solution for residual echo detection is therefore urgently needed.
  • In view of this, the embodiments of the present application provide a residual echo detection method, a residual echo detection device, a voice processing chip, and an electronic device to at least solve the above-mentioned problems in the prior art.
  • An embodiment of the present application provides a residual echo detection method, which includes:
  • determining a residual echo detection factor according to the correlation power between the far-end voice signal and the near-end voice signal; and detecting, according to the residual echo detection factor, whether there is residual echo.
  • An embodiment of the present application provides a residual echo detection device, which includes:
  • a detection factor calculation unit, used to determine a residual echo detection factor according to the correlation power between the far-end voice signal and the near-end voice signal;
  • a residual echo detection unit, configured to detect whether there is residual echo according to the residual echo detection factor.
  • An embodiment of the present application provides a voice processing chip, which includes a residual echo detection device.
  • The residual echo detection device includes: a detection factor calculation unit for determining a residual echo detection factor according to the correlation power between the far-end voice signal and the near-end voice signal; and a residual echo detection unit for detecting whether there is residual echo according to the residual echo detection factor.
  • An embodiment of the present application provides an electronic device, which includes the voice processing chip described in any of the embodiments of the present application.
  • In the embodiments of the present application, the residual echo detection factor is determined according to the correlation power between the far-end voice signal and the near-end voice signal, and whether there is residual echo is detected according to the residual echo detection factor, thereby providing a detection scheme for residual echo.
  • FIG. 1 is a schematic structural diagram of an echo cancellation system to which the residual echo detection solution of the present application can be applied;
  • FIG. 2 is a schematic structural diagram of an echo cancellation system to which the residual echo detection device of the present application is applied;
  • FIG. 3 is a schematic flowchart of a residual echo detection method according to an embodiment of the application;
  • FIG. 4 is a schematic flowchart of another residual echo detection method according to an embodiment of the application;
  • FIG. 5 is a schematic flowchart of yet another residual echo detection method according to an embodiment of the application;
  • FIG. 6 is a schematic flowchart of a residual echo cancellation method in an embodiment of the application.
  • In the embodiments of the present application, the residual echo detection factor is determined according to the correlation power between the far-end voice signal and the near-end voice signal, and whether there is residual echo is detected according to the residual echo detection factor, thereby providing a detection scheme for residual echo.
  • FIG. 1 is a schematic structural diagram of an echo cancellation system to which the residual echo detection scheme of the present application can be applied. As shown in FIG. 1, the echo cancellation device specifically includes, among other things, a voice endpoint detection module 106, a double-talk detector 108, and an adaptive filter 110.
  • The echo cancellation device may also include a voice collection module 102, a voice playback module 104, and an addition module 112.
  • The voice collection module 102 is communicatively connected to the voice endpoint detection module 106, the double-talk detector 108, and the addition module 112, respectively.
  • The voice playback module 104 is communicatively connected to the voice endpoint detection module 106 and the adaptive filter 110, respectively; the voice endpoint detection module 106 is communicatively connected to the double-talk detector 108; and the double-talk detector 108 is communicatively connected to the adaptive filter 110 and the addition module 112, respectively.
  • The voice collection module 102 is used to collect the near-end analog voice signal y(t) to generate the near-end digital voice signal. In this embodiment, the voice collection module may specifically be a microphone, and the collected near-end analog voice signal y(t) may include the analog voice signal s(t) of the near-end speaker as well as the echo analog voice signal d(t) caused by the voice playback module 104 playing the far-end analog voice signal x(t).
  • The voice playback module 104 is used to play the received far-end analog voice signal x(t); in this embodiment, the voice playback module 104 may specifically be a speaker.
  • The voice endpoint detection module 106 is used to detect whether there is an echo analog voice signal d(t); in this embodiment, the voice endpoint detection module 106 may also be referred to as a Voice Activity Detector (VAD).
  • The double-talk detector 108 is used to detect whether the echo analog voice signal d(t) and the analog voice signal s(t) of the near-end speaker exist at the same time, that is, to distinguish a single-talk state from a double-talk state, in order to decide how the filter coefficients are updated.
  • The adaptive filter 110 is used to generate an estimated echo analog voice signal according to the filter coefficients and the far-end analog voice signal x(t), so as to eliminate the echo analog voice signal d(t) in the near-end analog voice signal y(t).
  • The adaptive filter 110 is, for example, a multi-delay block frequency domain adaptive filter.
  • The addition module 112 is configured to subtract the estimated echo analog voice signal from the near-end analog voice signal y(t) to obtain the error analog voice signal e(t), thereby eliminating the echo analog voice signal d(t) present in the near-end analog voice signal y(t).
  • The addition module 112 may specifically be an adder.
  • The more accurate the estimated echo analog voice signal, that is, the closer it is to the actual echo analog voice signal d(t), the higher the clarity of the voice.
  • FIG. 2 is a schematic structural diagram of an echo cancellation system to which the residual echo detection device of the present application is applied; as shown in FIG. 2, a residual echo detection device 114 and a residual echo cancellation device 118 can be added to the echo cancellation system of FIG. 1.
  • The residual echo detection device includes a detection factor calculation unit and a residual echo detection unit.
  • The detection factor calculation unit is used to determine the residual echo detection factor according to the correlation power between the far-end voice signal and the near-end voice signal; the residual echo detection unit is used to detect, according to the residual echo detection factor, whether there is residual echo; and the residual echo cancellation device can be used to eliminate the detected residual echo.
  • The residual echo detection factor can be calculated in the following two situations:
  • In the first situation, the detection factor calculation unit is further configured to determine the residual echo detection factor according to the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal. Further, the detection factor calculation unit is configured to take the ratio of the correlation power between the far-end voice signal and the error voice signal to the correlation power between the far-end voice signal and the near-end voice signal as the residual echo detection factor. In a specific implementation, the two correlation powers are calculated first, and their ratio is then used as the residual echo detection factor. When the residual echo detection unit detects whether there is residual echo, if the residual echo detection factor is greater than the residual echo detection factor threshold, it is determined that there is residual echo; otherwise, it is determined that there is no residual echo.
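The first situation reduces to a ratio followed by a threshold comparison, which the following sketch illustrates. The function names, the use of magnitudes for possibly complex correlation powers, the small `delta` guarding the denominator, and the threshold value 0.5 are all assumptions made for this example.

```python
def detection_factor(s_xe, s_xy, delta=1e-12):
    """Ratio of the far-end/error correlation power to the far-end/near-end one.

    Magnitudes are taken so that complex cross powers can be passed in (assumption).
    """
    return abs(s_xe) / (abs(s_xy) + delta)


def has_residual_echo_first(s_xe, s_xy, threshold=0.5):
    """First situation: residual echo is declared when the factor exceeds the threshold."""
    return detection_factor(s_xe, s_xy) > threshold
```

Intuitively, if the error signal still correlates strongly with the far-end signal, part of the echo has survived the linear stage.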
  • In the second situation, the detection factor calculation unit is further configured to determine the residual echo detection factor according to the correlation power between the far-end voice signal and the estimated echo voice signal and the correlation power between the far-end voice signal and the near-end voice signal. Further, the detection factor calculation unit may be configured to take the ratio of the correlation power between the far-end voice signal and the estimated echo voice signal to the correlation power between the far-end voice signal and the near-end voice signal as the residual echo detection factor. In a specific implementation, the two correlation powers are determined first, and their ratio is then used as the residual echo detection factor. When the residual echo detection unit detects whether there is residual echo, if the residual echo detection factor is less than the residual echo detection factor threshold, it is determined that there is residual echo; otherwise, it is determined that there is no residual echo.
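The second situation uses the same kind of ratio but inverts the comparison, as this sketch shows. The function name, the magnitude convention, `delta`, and the threshold value 0.4 are hypothetical.

```python
def has_residual_echo_second(s_xd_hat, s_xy, threshold=0.4, delta=1e-12):
    """Second situation: residual echo is declared when the factor is below the threshold.

    s_xd_hat: correlation power between far-end and estimated echo signals.
    s_xy:     correlation power between far-end and near-end signals.
    """
    factor = abs(s_xd_hat) / (abs(s_xy) + delta)
    return factor < threshold
```

Here a small factor means the estimated echo explains little of the far-end content found in the microphone signal, so much of the echo is presumably still uncancelled.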
  • Whichever of the above methods is used to calculate the residual echo detection factor, a residual echo detection factor threshold is used when judging whether there is residual echo. In theory, therefore, a residual echo detection factor threshold is determined separately for each method.
  • Whichever of the two situations is adopted to calculate the residual echo detection factor, the residual echo detection factor threshold can be set flexibly according to the required detection accuracy.
  • The residual echo detection factor determined from the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal is called the first residual echo detection factor, and the corresponding threshold is called the first residual echo detection factor threshold. Correspondingly, the residual echo detection factor determined from the correlation power between the far-end voice signal and the estimated echo voice signal and the correlation power between the far-end voice signal and the near-end voice signal is called the second residual echo detection factor, and the corresponding threshold is called the second residual echo detection factor threshold. Preferably, the second residual echo detection factor threshold is smaller than the first residual echo detection factor threshold.
  • The detection factor calculation unit is further configured to determine a first residual echo suppression factor according to the correlation power between the far-end voice signal and the error voice signal, the power of the estimated echo voice signal, and the residual echo detection factor, so that the residual echo is eliminated according to the first residual echo suppression factor.
  • Alternatively, the detection factor calculation unit is further configured to determine the first residual echo suppression factor according to the correlation power between the far-end voice signal and the estimated echo voice signal, the power of the error voice signal, and the residual echo detection factor, so that the residual echo is eliminated according to the first residual echo suppression factor.
  • A normalized correlation factor between signals can also be introduced and combined with the residual echo detection factor to perform the residual echo detection.
  • In the first situation, a correlation factor calculation unit may also be included, which is used to determine the product of the power of the far-end voice signal and the power of the error voice signal, and to calculate a normalized correlation factor from that product and the correlation power between the far-end voice signal and the error voice signal; the normalized correlation factor is combined with the residual echo detection factor to perform the residual echo detection.
  • In the second situation, a correlation factor calculation unit may also be included, which is used to determine the product of the power of the far-end voice signal and the power of the estimated echo voice signal, and to calculate a normalized correlation factor from that product and the correlation power between the far-end voice signal and the estimated echo voice signal; the normalized correlation factor is combined with the residual echo detection factor to perform the residual echo detection.
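The text states which quantities enter the normalized correlation factor (a product of two powers and one correlation power) but not the exact expression. The sketch below assumes a magnitude-squared-coherence form, |S|² / (P_x · P_other), which is one common normalization; the function name, `delta`, and the clipping to [0, 1] are likewise assumptions.

```python
def normalized_correlation(s_cross, p_x, p_other, delta=1e-12):
    """Assumed form: |S_cross|^2 / (P_x * P_other), clipped to [0, 1].

    s_cross: correlation power between the far-end signal and the other signal
             (error signal in the first situation, estimated echo in the second).
    p_x, p_other: the powers of the two signals.
    """
    value = abs(s_cross) ** 2 / (p_x * p_other + delta)
    return min(1.0, max(0.0, value))
```

A value near 1 indicates that the two signals are almost fully correlated at that frequency bin, and a value near 0 that they are essentially unrelated.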
  • The correlation factor calculation unit in the first situation can also be called the first correlation factor calculation unit.
  • The correlation factor calculation unit in the second situation can also be called the second correlation factor calculation unit.
  • The first correlation factor calculation unit and the second correlation factor calculation unit can also be multiplexed.
  • When the residual echo detection unit performs residual echo detection according to the combination of the normalized correlation factor and the residual echo detection factor, the detection may specifically be based on the result of comparing the normalized correlation factor with the normalized correlation factor threshold and the result of comparing the residual echo detection factor with the residual echo detection factor threshold.
  • The residual echo detection unit may be further configured to: in a single-talk state, generate a second residual echo suppression factor according to the normalized correlation factor, and take the minimum of the first residual echo suppression factor and the second residual echo suppression factor as the effective residual echo suppression factor, which is multiplied with the error voice signal to eliminate the residual echo.
  • Alternatively, the residual echo detection unit may be further configured to: in a single-talk state, generate a second residual echo suppression factor according to the normalized correlation factor, and take the maximum of the first residual echo suppression factor and the second residual echo suppression factor as the effective residual echo suppression factor, which is multiplied with the error voice signal to eliminate the residual echo.
  • The second residual echo suppression factor is the difference between a stable value and the normalized correlation factor corresponding to each situation.
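The single-talk selection described above can be sketched as follows. The stable value of 1.0, the clamping of the second factor to [0, 1], and the `take_min` switch between the two described variants are assumptions for illustration.

```python
def effective_suppression_single_talk(g1, rho, stable_value=1.0, take_min=True):
    """Combine the first suppression factor g1 with a second one derived from rho.

    rho is the normalized correlation factor; the second factor is
    stable_value - rho, as described in the text. stable_value = 1.0 is assumed.
    """
    g2 = stable_value - rho
    g2 = min(1.0, max(0.0, g2))   # assumption: keep the gain within [0, 1]
    return min(g1, g2) if take_min else max(g1, g2)
```

Taking the minimum applies the stronger of the two attenuations, while taking the maximum applies the gentler one.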
  • The residual echo detection unit may be further configured to: in a double-talk state, generate a third residual echo suppression factor based on the a priori estimated echo voice signal power and the near-end voice signal power, and take the maximum of the first residual echo suppression factor and the third residual echo suppression factor as the effective residual echo suppression factor to eliminate the residual echo; or, in a double-talk state, determine an effective residual echo suppression factor according to the normalized correlation factor and the first residual echo suppression factor, which is used to adjust the filter coefficients to eliminate the residual echo.
  • The residual echo detection device may further include a correction unit, which is used to reduce the effective residual echo suppression factor if the normalized correlation factor is greater than the upper limit of the normalized correlation factor threshold, or to increase the effective residual echo suppression factor if the normalized correlation factor is less than the lower limit of the normalized correlation factor threshold, thereby correcting the effective residual echo suppression factor.
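The correction rule above (reduce above the upper limit, increase below the lower limit) can be sketched as follows. The limits 0.2 and 0.8, the fixed adjustment `step`, and the clamping to [0, 1] are hypothetical; the text does not state by how much the factor is changed.

```python
def correct_suppression(g_eff, rho, lower=0.2, upper=0.8, step=0.1):
    """Nudge the effective suppression factor based on the normalized correlation rho."""
    if rho > upper:
        g_eff -= step     # strongly correlated: suppress more (smaller gain)
    elif rho < lower:
        g_eff += step     # weakly correlated: suppress less (larger gain)
    return min(1.0, max(0.0, g_eff))
```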
  • In a double-talk state, it is preferable to use the aforementioned correction unit to correct the effective residual echo suppression factor.
  • The above-mentioned correction unit can be used to correct the effective residual echo suppression factor, or the effective residual echo suppression factor may be left uncorrected.
  • A detection factor correction unit may also be included. According to a set validity threshold for the residual echo detection factor, valid and invalid residual echo detection factors are identified, and the invalid residual echo detection factors are corrected using the mean value of the valid residual echo detection factors.
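A minimal sketch of the detection factor correction just described. The validity rule used here (a factor is valid when it lies in [0, `valid_max`]) and the function name are assumptions; the text only says that a validity threshold separates valid from invalid factors.

```python
def correct_detection_factors(factors, valid_max=1.0):
    """Replace invalid detection factors with the mean of the valid ones."""
    def is_valid(f):
        return 0.0 <= f <= valid_max   # assumed validity rule

    valid = [f for f in factors if is_valid(f)]
    if not valid:
        return list(factors)           # nothing to average over; leave unchanged
    mean_valid = sum(valid) / len(valid)
    return [f if is_valid(f) else mean_valid for f in factors]
```

For example, applied across the frequency bins of one frame, an outlier bin is pulled back to the average of the well-behaved bins.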
  • The residual echo can be eliminated according to the following solutions.
  • Solution 1: The error voice signal after residual echo cancellation is obtained as the product of the residual echo suppression factor and the error voice signal; the executing entity may be a residual echo cancellation unit.
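Solution 1 amounts to a per-bin multiplication, sketched below. The function name and the representation of the error spectrum as a list of complex values with one gain per frequency bin are assumptions.

```python
def apply_suppression(error_spectrum, gains):
    """Multiply each frequency bin of the error signal by its suppression factor."""
    if len(error_spectrum) != len(gains):
        raise ValueError("one gain per frequency bin is required")
    return [g * e for g, e in zip(gains, error_spectrum)]
```

A gain of 1 leaves a bin untouched, and a gain of 0 removes it entirely.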
  • Solution 2: Adjust the filter coefficient update step size of the adaptive filter according to the residual echo detection factor, and adjust the filter coefficients according to that update step size to eliminate the residual echo. The executing entity of both adjustments may be a residual echo cancellation unit.
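The text does not specify how the step size depends on the detection factor. As a sketch only, the mapping below assumes the first-situation factor (larger means more residual echo) and interpolates linearly between two hypothetical bounds, `mu_min` and `mu_max`, so that the filter adapts faster when more residual echo is detected.

```python
def step_size_from_factor(factor, mu_min=0.05, mu_max=0.5):
    """Map a detection factor in [0, 1] to a filter update step size (assumed linear map)."""
    f = min(1.0, max(0.0, factor))        # clamp out-of-range factors
    return mu_min + (mu_max - mu_min) * f
```

The returned value would then be used as the step size in the coefficient update of the adaptive filter.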
  • Solution 1 or Solution 2 can be adopted on its own, or the two can be combined to eliminate residual echo more completely.
  • FIG. 3 is a schematic flowchart of a method for detecting residual echo according to an embodiment of the application; as shown in FIG. 3, it includes:
  • S301: Determine the correlation power between the far-end voice signal and the error voice signal, and determine the correlation power between the far-end voice signal and the near-end voice signal;
  • The method further includes: in step S301, the far-end voice time-domain signal, the error voice time-domain signal, and the near-end voice time-domain signal are each transformed into the frequency domain to obtain the far-end voice frequency-domain signal, the error voice frequency-domain signal, and the near-end voice frequency-domain signal; the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal are then determined in the frequency domain.
  • Specifically, when these correlation powers are determined in the frequency domain in step S301, the correlation power between the far-end voice frequency-domain signal and the error voice frequency-domain signal, and the correlation power between the far-end voice frequency-domain signal and the near-end voice frequency-domain signal, are determined frequency bin by frequency bin.
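The frequency-domain step can be illustrated with a naive DFT and an instantaneous (unsmoothed) cross power per frequency bin. The function names are hypothetical, a real implementation would use an FFT, and the conjugation convention X·E* is an assumption.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), for illustration only)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cross_power_per_bin(x_frame, e_frame):
    """Instantaneous cross power X(k) * conj(E(k)) for every frequency bin k."""
    X, E = dft(x_frame), dft(e_frame)
    return [xk * ek.conjugate() for xk, ek in zip(X, E)]
```

Calling `cross_power_per_bin` with the far-end frame and the near-end frame instead gives the corresponding per-bin quantity for the far-end/near-end pair.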
  • The far-end analog voice signal is denoted as x(t), the near-end analog voice signal as y(t), the estimated echo analog voice signal as [symbol missing], and the error analog voice signal as e(t).
  • Correspondingly, the far-end digital voice signal is denoted as x(n), the near-end digital voice signal as y(n), the estimated echo digital voice signal as d(n), and the error digital voice signal as e(n).
  • Applying a fast Fourier transform to the above digital voice signals yields the frequency-domain signal X = [X(1), X(2), ..., X(N)]^T of the i-th frame of the far-end digital voice signal, the frequency-domain signal Y = [Y(1), Y(2), ..., Y(N)]^T of the i-th frame of the near-end digital voice signal, the frequency-domain signal of the i-th frame of the estimated echo digital voice signal [vector missing], and the frequency-domain signal E = [E(1), E(2), ..., E(N)]^T of the i-th frame of the error digital voice signal, where N is the number of frequency bins of the adaptive filter.
  • The correlation power between the n-th frame of the far-end digital voice signal and the n-th frame of the error digital voice signal at the k-th frequency bin is denoted as S_xe(k, n).
  • The correlation power between the n-th frame of the far-end digital voice signal and the n-th frame of the near-end digital voice signal at the k-th frequency bin is denoted as S_xy(k, n).
  • The correlation power between the (n-1)-th frame of the far-end digital voice signal and the (n-1)-th frame of the error digital voice signal at the k-th frequency bin is denoted as S_xe(k, n-1).
  • The correlation power between the (n-1)-th frame of the far-end digital voice signal and the (n-1)-th frame of the near-end digital voice signal at the k-th frequency bin is denoted as S_xy(k, n-1).
  • X(k,n) indicates that the
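One common way to obtain frame-indexed correlation powers of this kind, consistent with both frame n and frame n-1 appearing in the definitions above, is first-order recursive smoothing across frames. The following is a sketch under that assumption; the smoothing constant β (with 0 < β < 1) and the use of the complex conjugate are not taken from this text.

```latex
\begin{aligned}
S_{xe}(k,n) &= \beta\, S_{xe}(k,n-1) + (1-\beta)\, X(k,n)\, E^{*}(k,n) \\
S_{xy}(k,n) &= \beta\, S_{xy}(k,n-1) + (1-\beta)\, X(k,n)\, Y^{*}(k,n)
\end{aligned}
```

A β close to 1 gives a slowly varying, low-variance estimate, while a smaller β tracks changes in the echo path more quickly.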
  • When the residual echo detection factor is determined according to the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal, the ratio of the former to the latter may specifically be used as the residual echo detection factor.
  • The residual echo detection factor is specifically calculated by the following formula (2).
  • Here, ⁇_xe(k, n) represents the residual echo detection factor, and ⁇ is the control factor that prevents the denominator of formula (2) from being zero; the value of ⁇ is less than S_xy(k, n).
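Based on the description above (a ratio of the two correlation powers, with a control factor guarding the denominator), formula (2) plausibly has the following shape. The symbol names ξ_xe and δ are assumed here for illustration only.

```latex
\xi_{xe}(k,n) = \frac{S_{xe}(k,n)}{S_{xy}(k,n) + \delta}
```

If the correlation powers are complex-valued, their magnitudes would presumably be taken before the factor is compared with a real-valued threshold.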
  • The residual echo detection factor is compared with a residual echo detection factor threshold; if the residual echo detection factor is greater than the threshold, there is more residual echo; otherwise, there is little or no residual echo.
  • FIG. 4 is a schematic flowchart of another residual echo detection method according to an embodiment of the application; as shown in FIG. 4, it includes:
  • S401: Determine the correlation power between the far-end voice signal and the estimated echo voice signal, and determine the correlation power between the far-end voice signal and the near-end voice signal;
  • S402: Determine a residual echo detection factor based on the correlation power between the far-end voice signal and the estimated echo voice signal and the correlation power between the far-end voice signal and the near-end voice signal, to detect whether there is residual echo.
  • In step S402, when the residual echo detection factor is determined according to these two correlation powers, the ratio of the correlation power between the far-end voice signal and the estimated echo voice signal to the correlation power between the far-end voice signal and the near-end voice signal is specifically used as the residual echo detection factor.
  • ⁇ is the control factor that prevents the denominator of formula (4) from being zero, and the value of ⁇ is less than S_xy(k, n).
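By analogy with the description of step S402, formula (4) plausibly has the following shape. The symbols ξ, δ, and the hatted subscript for the estimated echo are assumed names used only for this sketch.

```latex
\xi_{x\hat{d}}(k,n) = \frac{S_{x\hat{d}}(k,n)}{S_{xy}(k,n) + \delta}
```

Here S_{x\hat{d}}(k,n) stands for the correlation power between the far-end voice signal and the estimated echo voice signal at the k-th frequency bin of the n-th frame.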
  • The residual echo detection factor calculated according to formula (4) is compared with a residual echo detection factor threshold; if the residual echo detection factor is less than the threshold, there is more residual echo; otherwise, there is little or no residual echo.
  • For the magnitude relationship between the residual echo detection factor threshold used with the factor calculated by formula (4) and the threshold used with the factor calculated by formula (2), please refer to the relevant description in the embodiment of FIG. 1.
  • The error digital voice signal e(n) is obtained by subtracting the estimated echo digital voice signal d(n) from the near-end digital voice signal y(n). Consequently, the sum of the two residual echo detection factors calculated according to formula (2) and formula (4) is approximately 1, so in theory the two factors can be converted into each other. Therefore, if formula (4) is used for residual echo detection, the detection logic is opposite to that used with formula (2).
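The claim that the two factors sum to approximately 1 follows from the linearity of the correlation power. The derivation below is a sketch that assumes both factors are plain ratios with the same control factor δ in the denominator, and writes the estimated echo spectrum with a hat; these notational choices are assumptions.

```latex
\begin{aligned}
E(k,n) &= Y(k,n) - \hat{D}(k,n)
  \;\Longrightarrow\;
  S_{xe}(k,n) = S_{xy}(k,n) - S_{x\hat{d}}(k,n), \\
\xi_{xe}(k,n) + \xi_{x\hat{d}}(k,n)
  &= \frac{S_{xe}(k,n) + S_{x\hat{d}}(k,n)}{S_{xy}(k,n) + \delta}
   = \frac{S_{xy}(k,n)}{S_{xy}(k,n) + \delta} \approx 1
   \quad (\delta \ll S_{xy}(k,n)).
\end{aligned}
```

A threshold T applied as "greater than" to the first factor therefore corresponds, approximately, to a threshold 1 - T applied as "less than" to the second factor.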
  • the detection logic here can specifically refer to the residual echo detection factor, followed by When the corresponding residual echo detection factor thresholds are compared, whether the residual echo detection factor is greater than or less than the corresponding residual echo detection factor threshold can it be determined that there is more residual echo, or it indicates that there is less or no residual echo processing.
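The relationship between the two detection factors can be illustrated with a short numerical sketch. It assumes instantaneous per-bin cross-powers of the form conj(X)·Z and a magnitude ratio with a small control term `delta`. Formulas (2) and (4) are not reproduced in this text, so the function names, the threshold values, and the exact form of the ratio are illustrative assumptions, not the patent's definitions.

```python
import numpy as np

def cross_power(X, Z):
    """Instantaneous cross-power per frequency bin, conj(X) * Z (assumed form)."""
    return np.conj(X) * Z

def detection_factor(S_num, S_xy, delta=1e-12):
    """Magnitude ratio |S_num| / (|S_xy| + delta); delta keeps the denominator non-zero."""
    return np.abs(S_num) / (np.abs(S_xy) + delta)

rng = np.random.default_rng(0)
X = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # far-end spectrum
Y = (0.5 + 0.2j) * X   # near-end spectrum containing only echo
D = 0.9 * Y            # estimated echo, 10% under-estimated
E = Y - D              # error spectrum after linear cancellation

S_xy = cross_power(X, Y)
xi_xd = detection_factor(cross_power(X, D), S_xy)  # ratio in the style of formula (4)
xi_xe = detection_factor(cross_power(X, E), S_xy)  # ratio in the style of formula (2)

# Opposite comparison directions (both thresholds are hypothetical values)
residual_by_xd = xi_xd < 0.8   # formula (4) style: small factor means more residual echo
residual_by_xe = xi_xe > 0.2   # formula (2) style: large factor means more residual echo
```

Because E = Y − D, the two cross-powers add up to S_xy, which is why the two factors sum to roughly 1 and why the two comparisons point in opposite directions.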
  • Fig. 5 is a schematic flow chart of another residual echo detection method according to an embodiment of the application. As shown in Fig. 5, on the basis of the embodiment in Fig. 3, steps relating to a normalized correlation factor are added in order to further improve the accuracy of residual echo detection and to allow a simple quantitative analysis of the residual echo. The method includes:
  • S501. Determine the correlation power between the far-end voice signal and the error voice signal, and determine the correlation power between the far-end voice signal and the near-end voice signal;
  • S502. Determine a residual echo detection factor according to the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal;
  • Step S501 and step S502 are similar to the corresponding steps of the embodiment described in FIG. 3 above.
  • The power of the far-end voice signal and the power of the error voice signal are computed in the frequency domain. That is, the far-end analog voice signal and the error analog voice signal undergo analog-to-digital conversion to obtain the far-end digital voice signal and the error digital voice signal, which are then transformed to the frequency domain.
  • The power of the far-end voice signal and the power of the error voice signal can be determined with reference to the following formula (5).
  • S_xx(k,n) represents the power of the frequency-domain signal at the k-th frequency point of the n-th frame of the far-end digital voice signal
  • S_xx(k,n-1) represents the power of the frequency-domain signal at the k-th frequency point of the (n-1)-th frame of the far-end digital voice signal
  • S_ee(k,n) represents the power of the frequency-domain signal at the k-th frequency point of the n-th frame of the error digital speech signal
  • S_ee(k,n-1) represents the power of the frequency-domain signal at the k-th frequency point of the (n-1)-th frame of the error digital speech signal
  • X(k,n) represents the frequency-domain signal at the k-th frequency point of the n-th frame of the far-end digital voice signal
  • E(k,n) represents the frequency-domain signal at the k-th frequency point of the n-th frame of the error digital voice signal
  • the signal, X(k,n) represents the power of the corresponding frequency
  • The power of the estimated echo speech signal is calculated according to the following formula (5)':
  • S504. Determine a normalized correlation factor between the far-end voice signal and the error voice signal according to the power of the far-end voice signal and the power of the error voice signal;
  • The normalized correlation factor can be calculated in the frequency domain with reference to the following formula (6).
  • The product of the power of the far-end voice signal and the power of the estimated echo voice signal is determined, and the normalized correlation factor is calculated based on this product; specifically, it is calculated with reference to the following formula (6)':
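The power statistics and the normalized correlation factor can be sketched as follows. Formulas (5) and (6) are not reproduced in this text, so the sketch assumes a first-order recursive average for the powers (suggested by the S_xx(k,n-1) and X(k,n) terms listed above) and a coherence-style ratio |S_xe|² / (S_xx·S_ee). The smoothing constant `alpha`, the regularizer `delta`, and all function names are hypothetical.

```python
import numpy as np

def smooth(prev, new, alpha=0.9):
    """First-order recursive average (assumed form of the power recursion)."""
    return alpha * prev + (1.0 - alpha) * new

def normalized_correlation(frames_x, frames_e, alpha=0.9, delta=1e-12):
    """Assumed C_xe(k) = |S_xe|^2 / (S_xx * S_ee + delta) after the last frame."""
    n_bins = frames_x.shape[1]
    S_xx = np.zeros(n_bins)
    S_ee = np.zeros(n_bins)
    S_xe = np.zeros(n_bins, dtype=complex)
    for X, E in zip(frames_x, frames_e):
        S_xx = smooth(S_xx, np.abs(X) ** 2, alpha)   # far-end power
        S_ee = smooth(S_ee, np.abs(E) ** 2, alpha)   # error power
        S_xe = smooth(S_xe, np.conj(X) * E, alpha)   # far-end/error cross-power
    return np.abs(S_xe) ** 2 / (S_xx * S_ee + delta)

rng = np.random.default_rng(1)
shape = (200, 16)  # 200 frames, 16 frequency bins
X = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

C_echo = normalized_correlation(X, (0.3 - 0.1j) * X)  # error is pure residual echo
C_clean = normalized_correlation(X, noise)            # error unrelated to the far end
```

Under these assumptions the factor is close to 1 when the error signal is a filtered copy of the far-end signal and much smaller when the two are unrelated, which is the property the threshold comparison relies on.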
  • S505. Perform a quantitative analysis of the existing residual echo according to the normalized correlation factor and the residual echo detection factor.
  • The existing residual echo may be quantitatively analyzed according to the result of comparing the normalized correlation factor with the normalized correlation factor threshold and the result of comparing the residual echo detection factor with the residual echo detection factor threshold, so as to roughly estimate the amount of residual echo. Note that the amount of residual echo is only a relative concept and can be set flexibly according to the application scenario. If more residual echo is present, residual echo can be considered to exist; otherwise, it can be considered that there is no residual echo.
  • To further improve accuracy, when the residual echo is quantitatively analyzed in step S505 according to the normalized correlation factor and the residual echo detection factor, the call state (single-ended call state or double-ended call state) is first obtained from the double-talk detector. Then, depending on the call state, either the residual echo detection factor alone or the residual echo detection factor together with the normalized correlation factor is used for the quantitative analysis of the residual echo.
  • the residual echo detection factor is calculated with reference to the above formula (2).
  • The residual echo detection factor is compared directly with the residual echo detection factor threshold. If it is greater than the threshold, residual echo is present; otherwise, there is no residual echo. Here, the absence of residual echo means that there is little residual echo; ideally, there is none.
  • If the adaptive filter has a strong echo cancellation capability and cancels the echo thoroughly, the error frequency-domain signal E(k) is small and approaches 0, so the calculated ⁇_xe(k,n) is also small and falls below the residual echo detection factor threshold, indicating little or, in theory, no residual echo.
  • If the residual echo detection factor is calculated based on the above second situation and is smaller than the corresponding residual echo detection factor threshold, more residual echo is present; if it is greater than the corresponding threshold, there is little or, in theory, no residual echo.
  • The upper limit ⁇_up and the lower limit ⁇_low of the residual echo detection factor threshold, and the upper limit C_up and the lower limit C_low of the normalized correlation factor threshold, are set, and residual echo detection is performed accordingly.
  • The upper limit ⁇_up and the lower limit ⁇_low of the corresponding residual echo detection factor threshold, and the upper limit C_up and the lower limit C_low of the normalized correlation factor threshold, are used for residual echo detection, and the detection logic is the opposite of the above situation:
  • In each case, the values of the upper limit ⁇_up and the lower limit ⁇_low of the corresponding residual echo detection factor threshold, and of the upper limit C_up and the lower limit C_low of the normalized correlation factor threshold, can be set flexibly according to the required detection accuracy.
  • The upper limit ⁇_up and the lower limit ⁇_low of the residual echo detection factor threshold in the first case correspond, respectively, to the lower limit ⁇_low and the upper limit ⁇_up of the residual echo detection factor threshold in the second case. Likewise, the upper limit C_up and the lower limit C_low of the normalized correlation factor threshold in the first case correspond, respectively, to the lower limit C_low and the upper limit C_up of the normalized correlation factor threshold in the second case.
  • Figure 6 is a schematic flow chart of the residual echo cancellation method in an embodiment of the application. When any of the embodiments of Figures 3 to 5 above determines that there is residual echo that should be cancelled, the residual echo cancellation process, as shown in Figure 6, specifically includes the following steps:
  • S601. Determine whether the call is in a single-ended call state or a double-ended call state
  • The call state result comes from the double-talk detector.
  • The residual echo cancellation mechanism that is set may be: determining a first residual echo suppression factor.
  • The first residual echo suppression factor may be determined according to the correlation power between the far-end voice signal and the error voice signal, the power of the estimated echo voice signal, and the residual echo detection factor.
  • The first residual echo suppression factor can be used directly as the effective residual echo suppression factor to eliminate the residual echo.
  • A second residual echo suppression factor can also be generated according to the normalized correlation factor, and the smaller of the first and second residual echo suppression factors is selected as the effective residual echo suppression factor to eliminate the residual echo. The smaller the effective residual echo suppression factor, the greater the residual echo cancellation strength. In this way, the effective residual echo suppression factor is determined according to the normalized correlation factor and the first residual echo suppression factor.
  • In the single-ended call state, the first and second residual echo suppression factors can be calculated according to the following formulas (7) and (8), and the effective residual echo suppression factor can be calculated according to the following formula (9).
  • G(k,n) represents the first residual echo suppression factor
  • G_1(k,n) represents the second residual echo suppression factor
  • G'(k,n) represents the effective residual echo suppression factor
  • The second residual echo suppression factor is the difference between a stable value and the normalized correlation factor.
  • The stable value is theoretically equal to 1, but in practice, owing to various other influences, it may be greater than 1.
  • With reference to formula (7)', the first residual echo suppression factor is determined according to the correlation power between the far-end voice signal and the estimated echo voice signal, the power of the error voice signal, and the residual echo detection factor, so that the residual echo is eliminated according to the first residual echo suppression factor. In the single-ended call state, a second residual echo suppression factor is generated according to the normalized correlation factor with reference to formula (8)', and, with reference to formula (9)', the larger of the first and second residual echo suppression factors is taken as the effective residual echo suppression factor, which is multiplied by the error speech signal to eliminate the residual echo.
  • E(k,n) represents the frequency-domain signal at the k-th frequency point of the n-th frame of the error digital speech signal, and the result represents the frequency-domain signal at the k-th frequency point of the n-th frame of the error digital speech signal after residual echo cancellation. That is, the effective residual echo suppression factor is multiplied by the error speech signal to eliminate the residual echo.
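The single-ended suppression path described for formulas (8) to (10) can be sketched as below. The first suppression factor G(k,n) is taken as an input because formula (7) is not reproduced in this text. The clipping of the second factor to [0, 1], the default stable value of 1, and the function names are assumptions made for illustration.

```python
import numpy as np

def second_suppression_factor(C_xe, stable_value=1.0):
    """G_1 = stable value minus normalized correlation factor, clipped to [0, 1] (clipping assumed)."""
    return np.clip(stable_value - C_xe, 0.0, 1.0)

def suppress_single_talk(E, G_first, C_xe, stable_value=1.0):
    """Pick the smaller factor per bin and scale the error spectrum with it."""
    G_1 = second_suppression_factor(C_xe, stable_value)
    G_eff = np.minimum(G_first, G_1)   # smaller factor means stronger suppression
    return G_eff * E, G_eff

E = np.array([1.0 + 1.0j, 2.0 + 0.0j, 0.0 - 1.0j])  # error spectrum, 3 bins
G_first = np.array([0.8, 0.3, 0.5])                  # hypothetical first factors
C_xe = np.array([0.9, 0.1, 0.5])                     # normalized correlation factors

E_out, G_eff = suppress_single_talk(E, G_first, C_xe)
```

In the first bin the far-end and error signals are highly correlated, so the second factor dominates and the bin is attenuated strongly; in the second bin the first factor is the smaller one and is used instead.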
  • The filter coefficients of the adaptive filter are adjusted directly according to the residual echo detection factor to eliminate the residual echo. Specifically, the filter coefficient update step size of the adaptive filter is adjusted according to the residual echo detection factor, and the filter coefficients are then adjusted according to that update step size to eliminate the residual echo.
  • Adjusting the filter coefficient update step size of the adaptive filter according to the residual echo detection factor includes: determining the effective frequency points and selecting the effective residual echo detection factors according to the effective frequency points; calculating the product of the mean of the effective residual echo detection factors and the maximum update step size of the filter coefficients; taking the larger of this product and the minimum update step size of the filter coefficients as the effective update step size of the filter coefficients; and updating the filter coefficients according to the effective update step size to eliminate the residual echo, which here corresponds to the time domain.
  • The effective frequency points are selected according to the effective frequency range of human speech, 300-3400 Hz.
  • the effective update step size of the filter coefficient can be determined according to the following formula (11).
  • represents the product of the mean value of the effective residual echo detection factor and the maximum update step size of the filter coefficient
  • ⁇ max represents the maximum update step size of the filter coefficient
  • ⁇ min represents the minimum update step size of the filter coefficient
  • ⁇ ' represents the effective update step size of the filter coefficients
  • the maximum update step size in formula (11) can achieve faster update of the filter coefficients.
  • the filter coefficients can be updated in the time domain according to the following formula (12).
  • w(n) represents the filter coefficient for the n-th frame of the far-end digital voice signal
  • x(n) represents the n-th frame of the far-end digital voice signal
  • e(n) represents the n-th frame of the error digital speech signal
  • ‖x(n)‖² represents the energy of the n-th frame of the far-end digital voice signal.
  • the aforementioned filter coefficient update step size is determined in the time domain.
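A minimal sketch of the time-domain step-size control follows. Formulas (11) and (12) are not reproduced in this text, so the sketch assumes that the effective step size is the mean in-band detection factor times the maximum step size, bounded below by the minimum step size (as in the per-frequency-point description), and that the coefficient update has the usual normalized LMS form suggested by the symbols w(n), x(n), e(n), and ‖x(n)‖². The names `effective_bins`, `mu_max`, `mu_min`, and `delta` are hypothetical.

```python
import numpy as np

def effective_bins(n_fft, fs, f_lo=300.0, f_hi=3400.0):
    """Boolean mask of FFT bins that fall inside the 300-3400 Hz speech band."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return (freqs >= f_lo) & (freqs <= f_hi)

def effective_step_size(xi, mask, mu_max, mu_min):
    """Mean in-band detection factor times mu_max, never below mu_min."""
    return max(float(np.mean(xi[mask])) * mu_max, mu_min)

def nlms_update(w, x, e, mu, delta=1e-8):
    """Assumed NLMS form: w + mu * e * x / (||x||^2 + delta)."""
    return w + mu * e * x / (np.dot(x, x) + delta)

mask = effective_bins(n_fft=256, fs=8000)
# Large detection factors (much residual echo) give a large step size
mu_strong = effective_step_size(np.full(mask.size, 0.5), mask, mu_max=0.4, mu_min=0.05)
# Small detection factors fall back to the minimum step size
mu_weak = effective_step_size(np.full(mask.size, 0.01), mask, mu_max=0.4, mu_min=0.05)

w = nlms_update(np.zeros(4), np.array([1.0, 0.0, 0.0, 0.0]), e=1.0, mu=mu_strong)
```

A single step size for all coefficients keeps the computation small, at the cost of not distinguishing frequency points that contain more residual echo from those that contain less.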
  • The effective update step size of the filter coefficients can also be determined in the frequency domain; that is, the residual echo can be eliminated frequency point by frequency point. Specifically, the residual echo of the far-end speech time-domain signal is eliminated at each frequency point.
  • The filter coefficient update step size of the adaptive filter is adjusted per frequency point.
  • The step-size update includes: determining the effective frequency points and selecting the effective residual echo detection factors according to the effective frequency points; for an effective frequency point k, calculating the product of the effective residual echo detection factor ⁇(k,n) and the maximum filter coefficient update step size ⁇_max as the new update step size ⁇(k,n); and taking the larger of the new step size ⁇(k,n) and the minimum filter coefficient update step size ⁇_min as the effective update step size ⁇′(k,n) of the filter coefficients at that frequency point;
  • The filter coefficients are updated according to their effective update step sizes so as to eliminate, one effective frequency point at a time, the residual echo in the frequency-domain signal corresponding to the far-end speech time-domain signal; this residual echo therefore also corresponds to the frequency domain.
  • When the effective update step size of the filter coefficient at the k-th frequency point is obtained, it is determined from the residual echo detection factor ⁇(k,n) at the k-th frequency point and the selected step-size transformation function f(⁇(k,n)), so as to eliminate the residual echo in the frequency-domain signal of the far-end speech time-domain signal at the k-th frequency point.
  • the method of calculating the effective update step length of the filter coefficient in the formula (14) is simpler and the data calculation amount is smaller.
  • the following formula (13) can be referred to to determine the effective update step size for the filter coefficients generally applicable to each frequency point.
  • ⁇(k,n) represents the residual echo detection factor at the k-th frequency point for the n-th frame of the far-end digital voice signal
  • ⁇(k,n) represents the product of the effective residual echo detection factor at the k-th frequency point for the n-th frame of the far-end digital voice signal and the maximum update step size of the filter coefficients
  • ⁇′(k,n) represents the effective update step size of the filter coefficients at the k-th frequency point for the n-th frame of the far-end digital speech signal
  • f(⁇(k,n)) represents the step-size transformation function at the k-th frequency point for the n-th frame of the far-end digital speech signal; it is essentially a function of the residual echo detection factor. f(⁇(k,n)) can be positive or negative, so that the update step size can be decreased or increased as the application scenario requires.
  • ⁇_step is the adjustment amount of each step
  • ⁇′(k,n-1) represents the effective update step size of the filter coefficients at the k-th frequency point for the (n-1)-th frame of the far-end digital voice signal
  • In formula (14), the larger of the newly calculated step size ⁇(k,n) at the k-th frequency point and the minimum update step size ⁇_min of the filter coefficients is taken as the effective filter update step size.
  • the filter coefficients can be updated in the frequency domain according to the following formula (15).
  • w_{n+1}(k) represents the filter coefficient at the k-th frequency point for the (n+1)-th frame of the far-end digital voice signal
  • w_n(k) represents the filter coefficient at the k-th frequency point for the n-th frame of the far-end digital voice signal
  • ‖X(k,n)‖² represents the energy of the frequency-domain signal at the k-th frequency point of the n-th frame of the far-end digital speech signal
  • ⁇ represents the control factor
  • X*(k,n) represents the conjugate of the frequency-domain signal at the k-th frequency point of the n-th frame of the far-end digital speech signal
  • E(k,n) represents the frequency-domain signal at the k-th frequency point of the n-th frame of the error speech time-domain signal.
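The per-frequency-point variant can be sketched in the same way. Formulas (14) and (15) are not reproduced in this text; the sketch assumes a per-bin step size max(factor × maximum step, minimum step) and a normalized frequency-domain update built from the listed symbols (filter coefficient, conjugate of X(k,n), E(k,n), the energy of X(k,n), and a control factor). Setting the step size to zero outside the effective band is an assumption of this example, as are all names.

```python
import numpy as np

def per_bin_step_size(xi, mask, mu_max, mu_min):
    """Per-bin step size max(xi * mu_max, mu_min); bins outside the mask are not adapted (assumption)."""
    mu = np.maximum(xi * mu_max, mu_min)
    return np.where(mask, mu, 0.0)

def freq_domain_update(W, X, E, mu, delta=1e-8):
    """Assumed update: W + mu * conj(X) * E / (|X|^2 + delta), applied per bin."""
    return W + mu * np.conj(X) * E / (np.abs(X) ** 2 + delta)

rng = np.random.default_rng(2)
H = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # hypothetical echo path per bin
W = np.zeros(8, dtype=complex)                             # filter coefficients per bin
mask = np.ones(8, dtype=bool)                              # treat every bin as effective

for _ in range(60):
    X = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # far-end spectrum
    E = H * X - W * X                                         # error: echo minus estimate
    mu = per_bin_step_size(np.full(8, 0.5), mask, mu_max=1.0, mu_min=0.05)
    W = freq_domain_update(W, X, E, mu)
```

In this toy setting each update halves the coefficient error in every bin, so W converges to H; with real signals the per-bin detection factor would vary from frame to frame and steer the adaptation speed of each bin separately.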
  • The residual echo cancellation mechanism set for the double-ended call state is: generating a third residual echo suppression factor according to the power of the estimated echo voice signal and the power of the near-end voice signal, taking the larger of the first and third residual echo suppression factors as the effective residual echo suppression factor, and eliminating the residual echo with reference to the above formula (10).
  • The maximum value is used here because the effective residual echo suppression factor is multiplied by the error speech signal: the greater the effective residual echo suppression factor, the weaker the residual echo cancellation.
  • An a priori estimated echo speech signal can be obtained through testing, and its power can be calculated; the third residual echo suppression factor is then generated from this power together with the power of the near-end speech signal. Here, as described above, the a priori estimated echo voice signal power and the near-end voice signal power are calculated in the frequency domain.
  • the following formula (16) can be used to calculate the a priori estimated echo speech signal power and the near-end speech signal power, and the formula (17) can be referred to to calculate the third residual echo suppression factor.
  • The quantities in formula (16) are: the power of the frequency-domain signal at the k-th frequency point of the n-th frame of the a posteriori estimated echo digital speech signal; the corresponding power for the (n-1)-th frame; the frequency-domain signal at the k-th frequency point of the n-th frame of the a posteriori estimated echo digital speech signal; and the conjugate of that frequency-domain signal. These quantities are used for residual echo detection.
  • Because the power of the frequency-domain signal at the k-th frequency point of the a priori estimated echo digital speech signal is relatively accurate, the power of the (n-1)-th frame of the a priori estimated echo digital speech signal at the k-th frequency point is preferably substituted into the above formula (16).
  • G(k,n) represents the first residual echo suppression factor
  • G_2(k,n) represents the third residual echo suppression factor
  • G'(k,n) represents the effective residual echo suppression factor
  • If the normalized correlation factor is greater than or equal to the upper limit of the normalized correlation factor threshold, the effective residual echo suppression factor is reduced to correct it; or, if the normalized correlation factor is smaller than the lower limit of the normalized correlation factor threshold, the effective residual echo suppression factor is increased to correct it. For example, in a double-ended call state, C_xe(k,n) may be greater than or equal to the upper limit C_up, or C_xe(k,n) may be less than the lower limit C_low.
  • If C_xe(k,n) < C_low, the frequency-domain signal at the k-th frequency point of the n-th frame of the near-end speech signal contains little residual echo; in other words, the probability of it being the near-end speaker's speech signal is small.
  • Therefore, the effective residual echo suppression factor G'(k,n) of the frequency-domain signal at the k-th frequency point of the n-th frame of the near-end speech signal is slightly increased to realize the correction.
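The double-ended path, including the threshold-based correction, might look like the sketch below. The third suppression factor is taken as an input because formula (17) is not reproduced in this text. The size of the correction (`adjust`), the multiplicative way it is applied, and the final clipping to [0, 1] are assumptions chosen only to make the example concrete.

```python
import numpy as np

def suppress_double_talk(G_first, G_third, C_xe, C_up, C_low, adjust=0.1):
    """Take the larger factor per bin, then nudge it according to C_xe."""
    G_eff = np.maximum(G_first, G_third)
    # High correlation with the far end: reduce the factor (suppress more)
    G_eff = np.where(C_xe >= C_up, G_eff * (1.0 - adjust), G_eff)
    # Low correlation with the far end: increase the factor (suppress less)
    G_eff = np.where(C_xe < C_low, G_eff * (1.0 + adjust), G_eff)
    return np.clip(G_eff, 0.0, 1.0)

G_first = np.array([0.5, 0.2, 0.9])    # hypothetical first factors
G_third = np.array([0.4, 0.6, 0.95])   # hypothetical third factors
C_xe = np.array([0.9, 0.5, 0.1])       # normalized correlation factors

G_eff = suppress_double_talk(G_first, G_third, C_xe, C_up=0.8, C_low=0.2)
```

Only the first and third bins are corrected here; the second bin lies between the two limits and keeps the plain maximum of the two factors.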
  • Effective and invalid residual echo detection factors may also be distinguished according to a set validity threshold of the residual echo detection factor, and the invalid residual echo detection factors are corrected based on the mean of the effective residual echo detection factors.
  • The mean of the effective residual echo detection factors is used to replace each invalid residual echo detection factor.
  • Alternatively, no processing is performed on the invalid residual echo detection factors, so that the frequency points corresponding to them are simply ignored in the residual echo cancellation processing.
  • As mentioned above, whether a residual echo detection factor is invalid or effective can be judged from the sum of the residual echo detection factors calculated by formulas (2) and (4). In theory, the sum of these two detection factors is 1, but in actual products, owing to various other influences, the sum may be greater than 1 while remaining essentially at a stable value. Therefore, in practice, if the sum of the two residual echo detection factors calculated by formulas (2) and (4) is greater than the stable value, the residual echo detection factor at the corresponding frequency point is invalid; otherwise, the residual echo detection factor at the corresponding frequency point is effective. The effective residual echo detection factors calculated according to formula (2) are averaged, and the average replaces the invalid residual echo detection factors.
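The validity check can be sketched as follows, assuming the two detection factors are already available per frequency point. The stable value of 1.05 and the function name are hypothetical; the text only says that the stable value may be somewhat greater than 1.

```python
import numpy as np

def repair_detection_factors(xi_xe, xi_xd, stable_value=1.0):
    """Replace factors whose pair sum exceeds the stable value with the mean of the valid ones."""
    valid = (xi_xe + xi_xd) <= stable_value
    repaired = xi_xe.copy()
    if valid.any():
        repaired[~valid] = xi_xe[valid].mean()
    return repaired, valid

xi_xe = np.array([0.1, 0.3, 0.8])  # factors in the style of formula (2)
xi_xd = np.array([0.9, 0.7, 0.9])  # factors in the style of formula (4)

repaired, valid = repair_detection_factors(xi_xe, xi_xd, stable_value=1.05)
```

The third frequency point has a pair sum of 1.7, so its factor is treated as invalid and replaced with the mean of the two valid factors, 0.2.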
  • the above-mentioned residual echo detection device can be integrated on a voice processing chip.
  • An embodiment of the present application provides an electronic device, which includes the voice processing chip described in any embodiment of the present application.
  • The electronic devices in the embodiments of this application exist in various forms, including but not limited to:
  • Mobile communication equipment: this type of equipment is characterized by mobile communication functions, and its main goal is to provide voice and data communications.
  • Such terminals include smart phones (such as iPhone), multimedia phones, feature phones, and low-end phones.
  • Ultra-mobile personal computer equipment: this type of equipment belongs to the category of personal computers, has calculation and processing functions, and generally also has mobile Internet access.
  • Such terminals include PDA, MID, and UMPC devices, such as iPad.
  • Portable entertainment equipment: this type of equipment can display and play multimedia content.
  • Such devices include audio and video players (such as iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • Server: a device that provides computing services.
  • A server includes a processor 810, hard disk, memory, system bus, etc.
  • A server is similar in architecture to a general-purpose computer, but because it needs to provide highly reliable services, it has high requirements in terms of performance, reliability, security, scalability, and manageability.
  • the embodiments of the present application can be provided as a method, a system, or a computer program product. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device. The instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • the computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory in computer readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments, in which tasks are executed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

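A residual echo detection method, a residual echo detection device, a voice processing chip, and an electronic device. The residual echo detection method includes: determining a residual echo detection factor according to the correlation power between a far-end voice signal and a near-end voice signal; and detecting whether residual echo is present according to the residual echo detection factor, thereby providing a residual echo detection solution.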

Description

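Residual echo detection method, residual echo detection device, voice processing chip, and electronic device
Technical Field
The embodiments of this application relate to the field of voice technology, and in particular to a residual echo detection method, a residual echo detection device, a voice processing chip, and an electronic device.
Background
With the rapid development of communication technology, artificial intelligence, voice interaction, and related technologies, ever higher requirements are placed on communication quality, the user experience of wearable devices, and the reliability of voice interaction. In any application scenario that involves a voice call, echo is bound to exist. Echo therefore needs to be removed by acoustic echo cancellation (AEC) to improve voice quality and thus the user experience.
In most cases, echo cancellation targets acoustic echo. Acoustic echo cancellation consists of two parts: linear echo cancellation and residual echo cancellation. In linear echo cancellation, an adaptive filter estimates the echo path so as to approximate the real sound field as closely as possible; the echo signal is then estimated and subtracted from the voice signal actually captured by the microphone to cancel the echo. However, owing to factors such as the limited order of the adaptive filter, the characteristics of the data, and the nonlinear characteristics of the loudspeaker, the echo cannot be removed completely, and residual echo remains. Residual echo seriously degrades the voice quality of a call and the user experience, so it must be removed by residual echo cancellation processing.
In the course of implementing this application, the inventors found that residual echo cancellation presupposes accurate detection of the residual echo: the more accurate the detection result, the more effectively the residual echo can be removed. A solution for detecting residual echo is therefore urgently needed.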
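Summary
In view of this, embodiments of this application provide a residual echo detection method, a residual echo detection apparatus, a voice processing chip, and an electronic device, to solve at least the above problems in the prior art.
An embodiment of this application provides a residual echo detection method, including:
determining a residual echo detection factor according to the correlation power between a far-end voice signal and a near-end voice signal; and
detecting, according to the residual echo detection factor, whether residual echo exists.
An embodiment of this application provides a residual echo detection apparatus, including:
a detection factor calculation unit, configured to determine a residual echo detection factor according to the correlation power between a far-end voice signal and a near-end voice signal; and
a residual echo detection unit, configured to detect, according to the residual echo detection factor, whether residual echo exists.
An embodiment of this application provides a voice processing chip including a residual echo detection apparatus, where the residual echo detection apparatus includes: a detection factor calculation unit, configured to determine a residual echo detection factor according to the correlation power between a far-end voice signal and a near-end voice signal; and a residual echo detection unit, configured to detect, according to the residual echo detection factor, whether residual echo exists.
An embodiment of this application provides an electronic device including the voice processing chip of any embodiment of this application.
As can be seen from the above technical solutions, in the embodiments of this application a residual echo detection factor is determined according to the correlation power between the far-end voice signal and the near-end voice signal, and whether residual echo exists is detected according to that factor, thereby providing a scheme for detecting residual echo.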
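Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an echo cancellation system to which the residual echo detection scheme of this application can be applied;
FIG. 2 is a schematic structural diagram of an echo cancellation system to which the residual echo detection apparatus of this application is applied;
FIG. 3 is a schematic flowchart of a residual echo detection method according to an embodiment of this application;
FIG. 4 is a schematic flowchart of another residual echo detection method according to an embodiment of this application;
FIG. 5 is a schematic flowchart of yet another residual echo detection method according to an embodiment of this application;
FIG. 6 is a schematic flowchart of a residual echo cancellation method according to an embodiment of this application.
Detailed Description
To enable a person of ordinary skill in the art to better understand the technical solutions in the embodiments of this application, those solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. Other embodiments obtained by a person of ordinary skill in the art on the basis of the described embodiments shall fall within the scope of protection of the embodiments of this application.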
由以上技术方案可见,本申请实施例中,根据远端语音信号与近端语音信号之间的相关功率,确定残余回声检测因子;根据所述残余回声检测因子,检测是否存在残余回声,从而提供了一种残余回声的检测方案。
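FIG. 1 is a schematic structural diagram of an echo cancellation system to which the residual echo detection scheme of this application can be applied. As shown in FIG. 1, the echo cancellation apparatus includes a voice activity detection module 106, a double-talk detector 108, and an adaptive filter 110. It may further include a voice acquisition module 102, a voice playback module 104, and an adder module 112. The voice acquisition module 102 is communicatively connected to the voice activity detection module 106, the double-talk detector 108, and the adder module 112; the voice playback module 104 is communicatively connected to the voice activity detection module 106 and the adaptive filter 110; the voice activity detection module 106 is communicatively connected to the double-talk detector 108; and the double-talk detector 108 is communicatively connected to the adaptive filter 110 and the adder module 112.
The voice acquisition module 102 captures the near-end analog voice signal y(t) in order to generate the near-end digital voice signal. In this embodiment the voice acquisition module may be a microphone. The near-end analog voice signal y(t) it captures may include the analog voice signal s(t) of the near-end talker, and may also include the echo analog voice signal d(t) caused by the voice playback module 104 playing the far-end analog voice signal x(t).
The voice playback module 104 plays the received far-end analog voice signal x(t). In this embodiment it may be a loudspeaker.
The voice activity detection module 106 detects whether the echo analog voice signal d(t) is present. In this embodiment it may also be called a voice activity detector (VAD).
The double-talk detector 108 detects whether the echo analog voice signal d(t) and the near-end talker's analog voice signal s(t) are present at the same time, that is, it distinguishes the single-talk state from the double-talk state, in order to decide how the filter coefficients are updated.
The adaptive filter 110 generates, from the filter coefficients and the far-end analog voice signal x(t), an estimated echo analog voice signal $\hat{d}(t)$, which is used to cancel the echo analog voice signal d(t) present in the near-end analog voice signal y(t). In this embodiment the adaptive filter 110 is, for example, a multi-delay block frequency-domain adaptive filter.
The adder module 112 subtracts the estimated echo analog voice signal $\hat{d}(t)$ from the near-end analog voice signal y(t) to obtain the error analog voice signal e(t), thereby cancelling the echo analog voice signal d(t) present in y(t). In this embodiment the adder module 112 may be an adder. The more accurate the estimate $\hat{d}(t)$, that is, the closer it is to the actual echo analog voice signal d(t), the higher the intelligibility of the voice.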
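FIG. 2 is a schematic structural diagram of an echo cancellation system to which the residual echo detection apparatus of this application is applied. As shown in FIG. 2, a residual echo detection apparatus 114 and a residual echo cancellation apparatus 118 are added to the echo cancellation system of FIG. 1. The residual echo detection apparatus includes a detection factor calculation unit and a residual echo detection unit: the detection factor calculation unit determines a residual echo detection factor according to the correlation power between the far-end voice signal and the near-end voice signal, and the residual echo detection unit detects, according to that factor, whether residual echo exists. The residual echo cancellation apparatus can be used to cancel the detected residual echo.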
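The residual echo detection factor can be calculated in either of the following two cases.
Case 1: In one application scenario, the detection factor calculation unit determines the residual echo detection factor according to the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal. Specifically, it takes the ratio of the former to the latter as the residual echo detection factor. In a concrete implementation, the two correlation powers are calculated first, and their ratio is then used as the residual echo detection factor. When detecting, the residual echo detection unit determines that residual echo exists if the residual echo detection factor is greater than the residual echo detection factor threshold, and that it does not exist otherwise.
Case 2: In another application scenario, the detection factor calculation unit determines the residual echo detection factor according to the correlation power between the far-end voice signal and the estimated echo voice signal and the correlation power between the far-end voice signal and the near-end voice signal. Specifically, it takes the ratio of the former to the latter as the residual echo detection factor. In a concrete implementation, the two correlation powers are determined first, and their ratio is then used as the residual echo detection factor. When detecting, the residual echo detection unit determines that residual echo exists if the residual echo detection factor is less than the residual echo detection factor threshold, and that it does not exist otherwise.
Note that both ways of calculating the residual echo detection factor rely on a residual echo detection factor threshold when judging whether residual echo exists. When only one of the two cases is used, the threshold can be set flexibly according to the required detection accuracy. When both cases are used at the same time, the following names distinguish them. The factor determined from the far-end/error correlation power and the far-end/near-end correlation power is called the first residual echo detection factor, and its threshold the first residual echo detection factor threshold. The factor determined from the far-end/estimated-echo correlation power and the far-end/near-end correlation power is called the second residual echo detection factor, and its threshold the second residual echo detection factor threshold. Preferably, the second residual echo detection factor threshold is less than the first residual echo detection factor threshold.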
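For Case 1, the detection factor calculation unit further determines a first residual echo suppression factor according to the correlation power between the far-end voice signal and the error voice signal, the power of the estimated echo voice signal, and the residual echo detection factor, so that the residual echo is cancelled according to the first residual echo suppression factor.
For Case 2, the detection factor calculation unit further determines the first residual echo suppression factor according to the correlation power between the far-end voice signal and the estimated echo voice signal, the power of the error voice signal, and the residual echo detection factor, so that the residual echo is cancelled according to the first residual echo suppression factor.
Further, to improve detection accuracy, a normalized correlation factor between signals can be introduced and combined with the residual echo detection factor to detect the residual echo.
For Case 1, for example, a correlation factor calculation unit may also be included. It determines the product of the power of the far-end voice signal and the power of the error voice signal, and calculates a normalized correlation factor from that product and the correlation power between the far-end voice signal and the error voice signal. The normalized correlation factor is combined with the residual echo detection factor to detect the residual echo.
For Case 2, for example, a correlation factor calculation unit may also be included. It determines the product of the power of the far-end voice signal and the power of the estimated echo voice signal, and calculates a normalized correlation factor from that product and the correlation power between the far-end voice signal and the estimated echo voice signal. The normalized correlation factor is combined with the residual echo detection factor to detect the residual echo.
Note that, for ease of distinction, the correlation factor calculation unit for Case 1 may be called the first correlation factor calculation unit, and that for Case 2 the second correlation factor calculation unit. The two units may also be shared.
Further, when the residual echo detection unit detects residual echo using the combination of the normalized correlation factor and the residual echo detection factor, it may do so according to the result of comparing the normalized correlation factor with a normalized correlation factor threshold and the result of comparing the residual echo detection factor with the residual echo detection factor threshold.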
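Further, for Case 1, the residual echo detection unit may also be configured to: in the single-talk state, generate a second residual echo suppression factor from the normalized correlation factor, and take the minimum of the first and second residual echo suppression factors as the effective residual echo suppression factor, which is multiplied with the error voice signal to cancel the residual echo.
Further, for Case 2, the residual echo detection unit may also be configured to: in the single-talk state, generate a second residual echo suppression factor from the normalized correlation factor, and take the maximum of the first and second residual echo suppression factors as the effective residual echo suppression factor, which is multiplied with the error voice signal to cancel the residual echo.
In both cases, the second residual echo suppression factor is the difference between a stable value and the normalized correlation factor of the respective case.
Preferably, the residual echo detection unit may also be configured to: in the double-talk state, generate a third residual echo suppression factor from the a priori estimated echo voice signal power and the near-end voice signal power, and take the maximum of the first and third residual echo suppression factors as the effective residual echo suppression factor to cancel the residual echo; or, in the double-talk state, determine the effective residual echo suppression factor from the normalized correlation factor and the first residual echo suppression factor, for adjusting the filter coefficients to cancel the residual echo.
In embodiments that use an effective residual echo suppression factor, and to avoid damaging the voice, the residual echo detection apparatus may further include a correction unit. If the normalized correlation factor is greater than the upper limit of the normalized correlation factor threshold, the unit corrects the effective residual echo suppression factor by reducing it; if the normalized correlation factor is less than the lower limit of the threshold, the unit corrects the effective residual echo suppression factor by increasing it.
In the double-talk state, the correction unit is preferably used to correct the effective residual echo suppression factor. In the single-talk state, the correction may be applied or omitted. If possible damage to the voice in the double-talk state is not a concern, the correction may be omitted there as well.
Similarly, to avoid voice damage, especially in the double-talk state, a detection factor correction unit may be included. It classifies residual echo detection factors as valid or invalid according to a set validity threshold, and corrects the invalid factors using the mean of the valid ones.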
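Once residual echo has been detected by the above scheme, it can be cancelled according to the following schemes.
Scheme 1: Multiply the residual echo suppression factor by the error voice signal to obtain the error voice signal after residual echo cancellation. This may be performed by a residual echo cancellation unit.
Scheme 2: Adjust the filter coefficient update step size of the adaptive filter according to the residual echo detection factor, and adjust the filter coefficients according to that step size, to cancel the residual echo. These operations may also be performed by a residual echo cancellation unit.
Scheme 2 has three possible implementations:
2.1 Determine the product of the mean of the valid residual echo detection factors and the maximum filter coefficient update step size, and determine the effective filter coefficient update step size from the minimum filter coefficient update step size and that product.
2.2 Determine the product of a valid residual echo detection factor and the maximum filter coefficient update step size, and determine the effective filter coefficient update step size from the minimum filter coefficient update step size and that product.
2.3 Determine the effective filter coefficient update step size from the residual echo detection factor and a step-size transform function.
Note that Scheme 1 or Scheme 2 can be used alone, or the two can be combined to cancel the residual echo more thoroughly.
The following embodiments explain in turn how residual echo is detected and how it is cancelled once detected.
The embodiments below mainly describe detection based on the residual echo detection factor of Case 1. Because the detection logic based on the residual echo detection factor of Case 2 is the opposite of that of Case 1, Case 2 is described briefly alongside, so that a person of ordinary skill in the art can clearly understand the technical solutions of this application.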
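FIG. 3 is a schematic flowchart of a residual echo detection method according to an embodiment of this application. As shown in FIG. 3, the method includes:
S301: Determine the correlation power between the far-end voice signal and the error voice signal, and determine the correlation power between the far-end voice signal and the near-end voice signal.
In this embodiment, if the far-end, error, and near-end voice signals are time-domain signals, step S301 first transforms each of them to the frequency domain to obtain the far-end, error, and near-end frequency-domain voice signals, and then determines the two correlation powers in the frequency domain.
Further, in step S301 the correlation powers are determined per frequency bin: between the corresponding frequency-domain components of the far-end and error voice signals, and between the corresponding frequency-domain components of the far-end and near-end voice signals.
Specifically, in one application scenario, as described above, let the far-end analog voice signal be denoted x(t), the near-end analog voice signal y(t), the estimated echo analog voice signal $\hat{d}(t)$, and the error analog voice signal e(t). After analog-to-digital conversion, these yield the far-end digital voice signal x(n), the near-end digital voice signal y(n), the estimated echo digital voice signal $\hat{d}(n)$, and the error digital voice signal e(n).
A fast Fourier transform of these digital voice signals gives, for the i-th frame, the frequency-domain signal of the far-end digital voice signal $X=[X(1),X(2),\dots,X(N)]^{T}$, of the near-end digital voice signal $Y=[Y(1),Y(2),\dots,Y(N)]^{T}$, of the estimated echo digital voice signal $\hat{D}=[\hat{D}(1),\hat{D}(2),\dots,\hat{D}(N)]^{T}$, and of the error digital voice signal $E=[E(1),E(2),\dots,E(N)]^{T}$, where N is the number of frequency bins of the adaptive filter.
The correlation powers are calculated as shown in formula (1):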
Figure PCTCN2019091565-appb-000006
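In formula (1), $S_{xe}(k,n)$ is the correlation power between the frequency-domain components, at the k-th frequency bin, of the n-th frame of the far-end digital voice signal and the n-th frame of the error digital voice signal, and $S_{xy}(k,n)$ is the corresponding correlation power between the n-th frames of the far-end and near-end digital voice signals. $S_{xe}(k,n-1)$ and $S_{xy}(k,n-1)$ are the same quantities for frame n-1. $X(k,n)$, $Y(k,n)$, and $E(k,n)$ are the frequency-domain components at the k-th frequency bin of the n-th frame of the far-end, near-end, and error digital voice signals, respectively. $Y(k,n)^{*}$ and $E(k,n)^{*}$ are the conjugates of $Y(k,n)$ and $E(k,n)$. λ is a smoothing factor with 0<λ<1, and k=1, ..., N.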
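The recursion described for formula (1) can be sketched in a few lines of Python. This is a minimal sketch, assuming formula (1) is the usual first-order recursive average S(k,n) = λ·S(k,n-1) + (1-λ)·X(k,n)·conj(·); the exact weighting in the original formula image is not reproduced in this text, and the function name `smooth_cross_power` is hypothetical.

```python
def smooth_cross_power(prev, x_bins, other_bins, lam=0.9):
    """Per-bin recursive cross-power estimate (assumed form of formula (1)).

    prev       -- list of complex S(k, n-1), one entry per frequency bin
    x_bins     -- far-end spectrum X(k, n)
    other_bins -- E(k, n) to obtain S_xe, or Y(k, n) to obtain S_xy
    lam        -- smoothing factor, 0 < lam < 1
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return [
        lam * s + (1.0 - lam) * x * o.conjugate()
        for s, x, o in zip(prev, x_bins, other_bins)
    ]


# Two bins, first frame: the history starts at zero.
X = [1 + 1j, 2 + 0j]
E = [1 - 1j, 0 + 1j]
S_xe = smooth_cross_power([0j, 0j], X, E, lam=0.5)
```

Calling the same function with the near-end spectrum in place of `E` gives the far-end/near-end estimate, so one routine serves both quantities.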
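S302: Determine the residual echo detection factor according to the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal, in order to detect whether residual echo exists.
In this embodiment, step S302 may take the ratio of the far-end/error correlation power to the far-end/near-end correlation power as the residual echo detection factor.
Specifically, in one application scenario, the residual echo detection factor is calculated by formula (2) below.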
Figure PCTCN2019091565-appb-000007
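In formula (2), $\eta_{xe}(k,n)$ denotes the residual echo detection factor, and σ is a control factor that prevents the denominator of formula (2) from being zero; its value is smaller than $S_{xy}(k,n)$.
The residual echo detection factor is then compared with the residual echo detection factor threshold. If the factor is greater than the threshold, considerable residual echo is present; otherwise little or no residual echo is present.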
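The ratio and threshold test of Case 1 can be illustrated as follows. This is a sketch under stated assumptions: the formula image for (2) is not reproduced in this text, so the code takes magnitudes of the complex cross-powers and adds σ to the denominator; the function names and the threshold values are hypothetical.

```python
def detection_factor(s_xe, s_xy, sigma=1e-12):
    """Assumed form of formula (2): |S_xe| / (|S_xy| + sigma) for one bin."""
    return abs(s_xe) / (abs(s_xy) + sigma)


def has_residual_echo(eta, threshold):
    """Case 1 logic: a factor above the threshold signals residual echo."""
    return eta > threshold


# One bin: half of the far-end-correlated power survives in the error signal.
eta = detection_factor(0.3 + 0.4j, 1 + 0j)
flag_loose = has_residual_echo(eta, 0.3)   # hypothetical threshold
flag_strict = has_residual_echo(eta, 0.8)  # hypothetical threshold
```

For Case 2 the comparison is reversed, so a helper for that case would test `eta < threshold` instead.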
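FIG. 4 is a schematic flowchart of another residual echo detection method according to an embodiment of this application. As shown in FIG. 4, the method includes:
S401: Determine the correlation power between the far-end voice signal and the estimated echo voice signal, and determine the correlation power between the far-end voice signal and the near-end voice signal.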
Figure PCTCN2019091565-appb-000008
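Here $S_{x\hat{d}}(k,n)$ denotes the correlation power between the frequency-domain components, at the k-th frequency bin, of the n-th frame of the far-end digital voice signal and the n-th frame of the estimated echo digital voice signal; $S_{x\hat{d}}(k,n-1)$ denotes the same quantity for frame n-1; $X(k,n)$ denotes the frequency-domain component at the k-th frequency bin of the n-th frame of the far-end digital voice signal; $\hat{D}(k,n)$ denotes the frequency-domain component at the k-th frequency bin of the n-th frame of the estimated echo digital voice signal; and * denotes the conjugate.
For the correlation power between the far-end voice signal and the near-end voice signal, see formula (1) above.
S402: Determine the residual echo detection factor according to the correlation power between the far-end voice signal and the estimated echo voice signal and the correlation power between the far-end voice signal and the near-end voice signal, in order to detect whether residual echo exists.
In this embodiment, step S402 takes the ratio of the far-end/estimated-echo correlation power to the far-end/near-end correlation power as the residual echo detection factor.
In a specific application scenario, the residual echo detection factor is calculated by formula (4) below.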
Figure PCTCN2019091565-appb-000012
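In formula (4), $\eta_{x\hat{d}}(k,n)$ denotes the residual echo detection factor, and σ is a control factor that prevents the denominator of formula (4) from being zero; its value is smaller than $S_{xy}(k,n)$.
The residual echo detection factor calculated by formula (4) is then compared with the residual echo detection factor threshold. If the factor is less than the threshold, considerable residual echo is present; otherwise little or no residual echo is present.
For the relation between the threshold used with the factor from formula (4) and the threshold used with the factor from formula (2), see the description of the first and second residual echo detection factor thresholds given above with FIG. 2.
Note that, as described above, the error digital voice signal e(n) is obtained by subtracting the estimated echo digital voice signal $\hat{d}(n)$ from the near-end digital voice signal y(n). In theory, therefore, the two residual echo detection factors calculated by formulas (2) and (4) sum to approximately 1, and each can be converted into the other. Consequently, detection with formula (4) uses the opposite logic to detection with formula (2). Here "detection logic" refers to whether, once the factor has been obtained and compared with its threshold, the factor must be greater than or less than the threshold for the result to indicate considerable residual echo, or little or none.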
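The statement that the two detection factors sum to roughly 1 can be checked with a short derivation. The sketch below assumes that formulas (1) and (3) are first-order recursions with the same λ and the same initial value, that formulas (2) and (4) are plain ratios sharing the same σ, and it writes the estimated-echo spectrum as $\hat{D}(k,n)$; this notation and these forms are assumptions, since the formula images are not reproduced in this text.

```latex
\begin{aligned}
E(k,n) &= Y(k,n) - \hat{D}(k,n) \\
S_{xe}(k,n) &= \lambda\, S_{xe}(k,n-1) + (1-\lambda)\, X(k,n)\,\bigl(Y(k,n) - \hat{D}(k,n)\bigr)^{*} \\
            &= S_{xy}(k,n) - S_{x\hat{d}}(k,n) \quad \text{(by induction on } n\text{)} \\
\eta_{xe}(k,n) + \eta_{x\hat{d}}(k,n)
  &= \frac{S_{xe}(k,n) + S_{x\hat{d}}(k,n)}{S_{xy}(k,n) + \sigma}
   = \frac{S_{xy}(k,n)}{S_{xy}(k,n) + \sigma} \approx 1
\end{aligned}
```

Under these assumptions the sum falls short of 1 only by the small term contributed by σ, which is why a large value of one factor implies a small value of the other.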
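As described above, if considerable residual echo is judged to be present, residual echo is deemed to exist and the subsequent residual echo cancellation processing is performed. If little or no residual echo is present, residual echo is deemed not to exist and the subsequent cancellation processing is not needed.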
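FIG. 5 is a schematic flowchart of yet another residual echo detection method according to an embodiment of this application. As shown in FIG. 5, this embodiment builds on the FIG. 3 embodiment and, to further improve detection accuracy and allow a simple quantitative analysis of the residual echo, adds steps relating to a normalized correlation factor. The method includes:
S501: Determine the correlation power between the far-end voice signal and the error voice signal, and determine the correlation power between the far-end voice signal and the near-end voice signal.
S502: Determine the residual echo detection factor according to the correlation power between the far-end voice signal and the error voice signal and the correlation power between the far-end voice signal and the near-end voice signal.
In this embodiment, steps S501 and S502 are similar to the FIG. 3 embodiment.
S503: Determine the power of the far-end voice signal and the power of the error voice signal.
In this embodiment, as described above, these powers are computed in the frequency domain: the far-end and error analog voice signals are converted to digital signals and then transformed to the frequency domain.
Specifically, in one application scenario, the power of the far-end voice signal and the power of the error voice signal may be determined using formula (5) below.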
Figure PCTCN2019091565-appb-000014
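In formula (5), $S_{xx}(k,n)$ and $S_{xx}(k,n-1)$ denote the power of the frequency-domain component at the k-th frequency bin of the n-th and (n-1)-th frames of the far-end digital voice signal, respectively; $S_{ee}(k,n)$ and $S_{ee}(k,n-1)$ denote the power of the frequency-domain component at the k-th frequency bin of the n-th and (n-1)-th frames of the error digital voice signal, respectively; $X(k,n)$ and $E(k,n)$ denote the frequency-domain components at the k-th frequency bin of the n-th frame of the far-end and error digital voice signals; $X(k,n)^{*}$ and $E(k,n)^{*}$ denote the conjugates of $X(k,n)$ and $E(k,n)$; and λ is a smoothing factor with 0<λ<1.
In other embodiments, for Case 2, the power of the estimated echo voice signal is calculated by formula (5)' below: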
Figure PCTCN2019091565-appb-000015
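In formula (5)', $\hat{D}(k,n)$ denotes the frequency-domain component at the k-th frequency bin of the n-th frame of the estimated echo digital voice signal; $S_{\hat{d}\hat{d}}(k,n-1)$ denotes the power of the frequency-domain component at the k-th frequency bin of the (n-1)-th frame of the estimated echo digital voice signal; $\hat{D}(k,n)^{*}$ denotes the conjugate of $\hat{D}(k,n)$; and $S_{\hat{d}\hat{d}}(k,n)$ denotes the power of the frequency-domain component at the k-th frequency bin of the n-th frame of the estimated echo digital voice signal.
S504: Determine the normalized correlation factor of the far-end voice signal and the error voice signal according to the power of the far-end voice signal and the power of the error voice signal.
In this embodiment, in one application scenario, the normalized correlation factor is calculated as the ratio of the correlation power between the far-end voice signal and the error voice signal to the product of the power of the far-end digital voice signal and the power of the error digital voice signal. Specifically, it may be calculated in the frequency domain using formula (6) below.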
Figure PCTCN2019091565-appb-000021
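In formula (6), the normalized correlation factor $C_{xe}(k,n)$ is the ratio of the correlation power $S_{xe}(k,n)$, between the frequency-domain components at the k-th frequency bin of the n-th frames of the far-end and error digital voice signals, to the product of the far-end power $S_{xx}(k,n)$ and the error power $S_{ee}(k,n)$ at the same bin and frame. It expresses the normalized correlation between the frequency-domain components, at the k-th frequency bin of the n-th frame, of the far-end digital voice signal and the error digital voice signal.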
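A small numerical sketch may help fix ideas about the normalized correlation factor. The text describes a ratio of the cross-power to the product of the two auto-powers; the code below assumes the magnitude-squared form |S_xe|² / (S_xx·S_ee), which keeps the value dimensionless. That squared form, the guard term `eps`, and the function name are assumptions rather than the formula from the original image.

```python
def normalized_correlation(s_xe, s_xx, s_ee, eps=1e-12):
    """Assumed reading of formula (6) for one frequency bin.

    s_xe -- complex smoothed cross-power between far-end and error
    s_xx -- real smoothed far-end power
    s_ee -- real smoothed error power
    eps  -- small guard against division by zero (assumption)
    """
    return abs(s_xe) ** 2 / (s_xx * s_ee + eps)


# |3+4j|^2 = 25, and 5 * 10 = 50, so the factor is one half.
c_xe = normalized_correlation(3 + 4j, 5.0, 10.0)
```

A value near 1 means the error signal is still strongly correlated with the far-end signal; a value near 0 means it is dominated by components unrelated to the far end.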
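In other embodiments, for Case 2, by analogy with formula (6), the product of the power of the far-end voice signal and the power of the estimated echo voice signal is determined, and the normalized correlation factor is calculated from that product and the correlation power between the far-end voice signal and the estimated echo voice signal, as in formula (6)' below: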
Figure PCTCN2019091565-appb-000022
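For the parameters in formula (6)', see the other embodiments above.
S505: Perform a quantitative analysis of the residual echo according to the normalized correlation factor and the residual echo detection factor.
In this embodiment, the normalized correlation factor and the residual echo detection factor are combined to analyse the residual echo quantitatively. Specifically, the analysis may use the result of comparing the normalized correlation factor with the normalized correlation factor threshold together with the result of comparing the residual echo detection factor with the residual echo detection factor threshold, to estimate roughly how much residual echo is present. Note that the amount of residual echo is only a relative notion and can be set flexibly according to the application scenario: if there is considerable residual echo, residual echo is deemed to exist; otherwise it is deemed not to exist.
Further, to improve accuracy, step S505 first obtains the call state detected by the double-talk detector, that is, single-talk or double-talk. Depending on the call state, the quantitative analysis then uses either the residual echo detection factor alone or the residual echo detection factor together with the normalized correlation factor. Here the residual echo detection factor is calculated by formula (2) above.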
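In one application scenario, if the call is in the single-talk state, the residual echo detection factor is compared directly with the residual echo detection factor threshold. If it is greater than the threshold, residual echo exists; otherwise it does not. Here "does not exist" means that little residual echo is present, and in the ideal case none. Note that in the single-talk state, if the adaptive filter cancels echo poorly, $S_{xe}(k,n)$ and $S_{xy}(k,n)$ are close to each other, so the factor $\eta_{xe}(k,n)$ calculated by formula (2) is large and exceeds the threshold, indicating considerable residual echo. Conversely, if the adaptive filter cancels echo well and removes the echo fairly thoroughly, the error frequency-domain signal $E(k,n)$ is small and approaches 0, so the calculated $\eta_{xe}(k,n)$ is also small and falls below the threshold, indicating little residual echo, or in theory none.
In other embodiments, if the residual echo detection factor is calculated according to Case 2, a value below the corresponding threshold indicates considerable residual echo, and a value above it indicates little residual echo, or in theory none.
Alternatively, in the single-talk state the normalized correlation factor may also be combined into the quantitative analysis. In practice, however, when the accuracy requirement for residual echo detection is not high, it is preferable to use only the residual echo detection factor.
In another application scenario, for Case 1, if the call is in the double-talk state, the near-end talker's voice signal is present. To avoid damaging that voice signal during residual echo cancellation, the quantitative analysis preferably considers the normalized correlation factor in addition to the residual echo detection factor.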
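To this end, and to improve detection accuracy, an upper limit $\eta_{up}$ and a lower limit $\eta_{low}$ are set for the residual echo detection factor threshold, and an upper limit $C_{up}$ and a lower limit $C_{low}$ for the normalized correlation factor threshold. The quantitative analysis then proceeds as follows:
(1) If $\eta_{xe}(k,n)$ is greater than or equal to $\eta_{up}$ and $C_{xe}(k,n)$ is greater than or equal to $C_{up}$, there is considerable residual echo.
(2) If $\eta_{xe}(k,n)$ is greater than or equal to $\eta_{up}$ and $C_{xe}(k,n)$ is less than $C_{low}$, there is little residual echo; if $C_{xe}(k,n)$ lies between $C_{low}$ and $C_{up}$, there is a medium amount of residual echo.
(3) If $\eta_{xe}(k,n)$ lies between $\eta_{low}$ and $\eta_{up}$, part of the echo has been cancelled but residual echo remains, and the analysis continues using $C_{xe}(k,n)$: if $C_{xe}(k,n)$ is greater than or equal to $C_{up}$, there is considerable residual echo; if $C_{xe}(k,n)$ lies between $C_{low}$ and $C_{up}$, there is a medium amount of residual echo.
(4) If $\eta_{xe}(k,n)$ is less than $\eta_{low}$, or $C_{xe}(k,n)$ is less than $C_{low}$, there is little residual echo.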
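The four rules above collapse into a compact decision function. The sketch below is one way to encode them for Case 1 in the double-talk state; the function name, the string labels, and the threshold values in the usage lines are hypothetical.

```python
def classify_double_talk(eta, c, eta_low, eta_up, c_low, c_up):
    """Return 'more', 'medium' or 'less' residual echo for one bin (Case 1).

    eta -- residual echo detection factor eta_xe(k, n)
    c   -- normalized correlation factor C_xe(k, n)
    """
    # Rule (4); also covers rules (2) and (3) when c < c_low.
    if eta < eta_low or c < c_low:
        return "less"
    # Rules (1) and (3) with c >= c_up (eta is at least eta_low here).
    if c >= c_up:
        return "more"
    # Remaining region: c_low <= c < c_up in rules (2) and (3).
    return "medium"


# Hypothetical thresholds chosen only for illustration.
limits = dict(eta_low=0.2, eta_up=0.7, c_low=0.3, c_up=0.8)
strong = classify_double_talk(0.9, 0.9, **limits)
weak = classify_double_talk(0.9, 0.1, **limits)
middle = classify_double_talk(0.5, 0.5, **limits)
```

Note that `eta_up` does not change the label in this encoding, because rules (1) to (3) give the same outcome for a given `c` once `eta` is at least `eta_low`; it is kept in the signature to mirror the text.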
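In yet another application scenario, for Case 2, detection using the corresponding upper limit $\eta_{up}$ and lower limit $\eta_{low}$ of the residual echo detection factor threshold and the upper limit $C_{up}$ and lower limit $C_{low}$ of the normalized correlation factor threshold is the reverse of the above:
(1)' If $\eta_{x\hat{d}}(k,n)$ is less than $\eta_{low}$ and $C_{x\hat{d}}(k,n)$ is less than $C_{low}$, there is considerable residual echo.
(2)' If $\eta_{x\hat{d}}(k,n)$ is less than $\eta_{low}$ and $C_{x\hat{d}}(k,n)$ is greater than or equal to $C_{up}$, there is little residual echo; if $C_{x\hat{d}}(k,n)$ lies between $C_{low}$ and $C_{up}$, there is a medium amount of residual echo.
(3)' If $\eta_{x\hat{d}}(k,n)$ lies between $\eta_{low}$ and $\eta_{up}$, part of the echo has been cancelled but residual echo remains, and the analysis continues using $C_{x\hat{d}}(k,n)$: if $C_{x\hat{d}}(k,n)$ is less than $C_{low}$, there is considerable residual echo; if $C_{x\hat{d}}(k,n)$ lies between $C_{low}$ and $C_{up}$, there is a medium amount of residual echo.
(4)' If $\eta_{x\hat{d}}(k,n)$ is greater than or equal to $\eta_{up}$, or $C_{x\hat{d}}(k,n)$ is greater than or equal to $C_{up}$, there is little residual echo.
When Case 1 or Case 2 is used alone, the limits $\eta_{up}$ and $\eta_{low}$ of the residual echo detection factor threshold and the limits $C_{up}$ and $C_{low}$ of the normalized correlation factor threshold for that case are set flexibly according to the required detection accuracy. When Case 1 and Case 2 are used together, the upper limit $\eta_{up}$ and lower limit $\eta_{low}$ of Case 1 correspond to the lower limit $\eta_{low}$ and upper limit $\eta_{up}$ of Case 2, respectively, and the upper limit $C_{up}$ and lower limit $C_{low}$ of Case 1 correspond to the lower limit $C_{low}$ and upper limit $C_{up}$ of Case 2, respectively.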
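FIG. 6 is a schematic flowchart of a residual echo cancellation method according to an embodiment of this application. After it has been determined, with reference to any of the embodiments of FIG. 3 to FIG. 5, that residual echo to be cancelled exists, the cancellation process shown in FIG. 6 includes the following steps:
S601: Determine whether the call is in the single-talk state or the double-talk state.
S602: If the call is in the single-talk state, cancel the residual echo using the residual echo cancellation mechanism set for the single-talk state.
In this embodiment, the call state is provided by the double-talk detector.
In one application scenario, the mechanism used in step S602 may be to determine a first residual echo suppression factor. The first residual echo suppression factor may be determined according to the correlation power between the far-end voice signal and the error voice signal, the power of the estimated echo voice signal, and the residual echo detection factor.
Further, if the requirement on the degree of residual echo cancellation is not strict, the first residual echo suppression factor can be used directly as the effective residual echo suppression factor. If more thorough cancellation is desired, a second residual echo suppression factor can also be generated from the normalized correlation factor, and the minimum of the first and second residual echo suppression factors taken as the effective residual echo suppression factor. The smaller the effective residual echo suppression factor, the stronger the cancellation. In this way the effective residual echo suppression factor is determined from the normalized correlation factor and the first residual echo suppression factor.
Specifically, in Case 1 and in the single-talk state, the first and second residual echo suppression factors used for cancellation can be calculated by formulas (7) and (8) below, and the effective residual echo suppression factor by formula (9).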
Figure PCTCN2019091565-appb-000034
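$$G_1(k,n) = 1 - C_{xe}(k,n) \qquad (8)$$
$$G'(k,n) = \min\bigl(G(k,n),\, G_1(k,n)\bigr) \qquad (9)$$
In formula (7), $G(k,n)$ denotes the first residual echo suppression factor; in formula (8), $G_1(k,n)$ denotes the second residual echo suppression factor; in formula (9), $G'(k,n)$ denotes the effective residual echo suppression factor.
As formula (8) suggests, more generally the second residual echo suppression factor is the difference between a stable value and the normalized correlation factor. In theory this stable value equals 1, but in practice, owing to various other influences, it may be greater than 1.
Specifically, in Case 2, formula (7)' determines the first residual echo suppression factor according to the correlation power between the far-end voice signal and the estimated echo voice signal, the power of the error voice signal, and the residual echo detection factor, so that the residual echo is cancelled according to the first residual echo suppression factor. In the single-talk state, formula (8)' generates the second residual echo suppression factor from the normalized correlation factor, and formula (9)' takes the maximum of the first and second residual echo suppression factors as the effective residual echo suppression factor, which is multiplied with the error voice signal to cancel the residual echo.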
Figure PCTCN2019091565-appb-000035
Figure PCTCN2019091565-appb-000036
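$$G'(k,n) = \max\bigl(G(k,n),\, G_1(k,n)\bigr) \qquad (9)'$$
After the effective residual echo suppression factor has been obtained, the residual echo can be cancelled according to formula (10) below.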
Figure PCTCN2019091565-appb-000037
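In formula (10), $E(k,n)$ denotes the frequency-domain component at the k-th frequency bin of the n-th frame of the error digital voice signal, and the left-hand side of formula (10) denotes the corresponding component of the error digital voice signal after residual echo cancellation. That is, the effective residual echo suppression factor is multiplied with the error voice signal to cancel the residual echo.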
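The single-talk selection in formulas (8), (9), (9)' and the multiplication in formula (10) can be sketched together. The first suppression factor comes from formula (7) or (7)', whose images are not reproduced in this text, so the sketch takes it as an input; the function names and the `case` argument are hypothetical.

```python
def effective_gain_single_talk(g_first, c_norm, case=1, stable=1.0):
    """Effective suppression factor for one bin in the single-talk state.

    g_first -- first suppression factor G(k, n), supplied by the caller
    c_norm  -- normalized correlation factor of the chosen case
    case    -- 1 takes the minimum (formula (9)), 2 the maximum (formula (9)')
    stable  -- stable value in the second factor (1 in theory)
    """
    g_second = stable - c_norm  # formula (8): G_1 = stable value - C
    pick = min if case == 1 else max
    return pick(g_first, g_second)


def suppress(error_bins, gains):
    """Formula (10): multiply each error bin by its effective factor."""
    return [g * e for g, e in zip(gains, error_bins)]


g_case1 = effective_gain_single_talk(0.6, 0.7, case=1)
g_case2 = effective_gain_single_talk(0.6, 0.7, case=2)
cleaned = suppress([2 + 2j], [0.5])
```

Because the factor multiplies the error spectrum, a smaller value removes more of the signal, which is why Case 1 keeps the smaller of its two candidates when thorough cancellation is wanted.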
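In another application scenario of this embodiment, for the single-talk state, the filter coefficients of the adaptive filter are adjusted directly according to the residual echo detection factor to cancel the residual echo. Specifically, the filter coefficient update step size of the adaptive filter is adjusted according to the residual echo detection factor, and the filter coefficients are adjusted according to that step size.
Adjusting the filter coefficient update step size according to the residual echo detection factor includes: determining the valid frequency bins and selecting the valid residual echo detection factors accordingly; calculating the product of the mean of the valid residual echo detection factors and the maximum filter coefficient update step size; taking the maximum of that product and the minimum filter coefficient update step size as the effective filter coefficient update step size; and updating the filter coefficients according to the effective update step size to cancel the residual echo. The residual echo here is cancelled in the time domain.
In this embodiment, the valid frequency bins are selected according to the effective frequency range of human speech, 300 to 3400 Hz.
Specifically, the effective filter coefficient update step size can be determined according to formula (11) below.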
Figure PCTCN2019091565-appb-000039
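In formula (11), μ denotes the product of the mean of the valid residual echo detection factors and the maximum filter coefficient update step size; $\mu_{max}$ denotes the maximum filter coefficient update step size; $\mu_{min}$ denotes the minimum filter coefficient update step size; and μ' denotes the effective filter coefficient update step size. Taking the larger value in formula (11) allows the filter coefficients to be updated quickly.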
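The step-size rule around formula (11) can be illustrated as follows. The sketch assumes μ' = max(mean(valid η)·μ_max, μ_min), as the surrounding text describes, and selects valid bins by centre frequency in the 300 to 3400 Hz band; the function names, the FFT size, and the sample rate are hypothetical.

```python
def valid_bins(n_fft, sample_rate, f_lo=300.0, f_hi=3400.0):
    """Indices of FFT bins whose centre frequency lies in the speech band."""
    return [
        k for k in range(n_fft // 2 + 1)
        if f_lo <= k * sample_rate / n_fft <= f_hi
    ]


def effective_step(etas, bins, mu_max, mu_min):
    """Assumed form of formula (11): scaled mean factor, floored at mu_min."""
    valid = [etas[k] for k in bins]
    if not valid:
        return mu_min  # no usable bins: fall back to the minimum step
    mu = (sum(valid) / len(valid)) * mu_max
    return max(mu, mu_min)


# 16-point FFT at 8 kHz gives 500 Hz bin spacing, so bins 1..6 are valid.
bins = valid_bins(16, 8000)
etas = [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.9, 0.9]
step = effective_step(etas, bins, mu_max=0.5, mu_min=0.01)
```

With this rule a filter that has converged (small detection factors) adapts slowly, while a filter that still leaks echo (large factors) adapts at up to `mu_max`.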
Specifically, the filter coefficients can be updated in the time domain according to formula (12) below.
Figure PCTCN2019091565-appb-000040
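In formula (12), $w(n)$ denotes the filter coefficients for the n-th frame of the far-end digital voice signal; $x(n)$ denotes the n-th frame of the far-end digital voice signal; $e(n)$ denotes the n-th frame of the error digital voice signal; and $\|x(n)\|^{2}$ denotes the energy of the n-th frame of the far-end digital voice signal.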
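The quantities listed for formula (12) match the standard normalized LMS update, which can be sketched as below. This assumes formula (12) has the usual form w(n+1) = w(n) + μ'·e(n)·x(n)/‖x(n)‖²; the original formula image is not reproduced in this text, and the guard term `eps`, the per-sample treatment of `e`, and the function name are assumptions.

```python
def nlms_update(w, x, e, mu, eps=1e-12):
    """One normalized-LMS coefficient update (assumed form of formula (12)).

    w   -- current filter coefficients
    x   -- far-end samples aligned with the coefficients
    e   -- current error sample
    mu  -- effective update step size
    eps -- guard against a zero-energy input (assumption)
    """
    energy = sum(v * v for v in x)
    scale = mu * e / (energy + eps)
    return [wi + scale * xi for wi, xi in zip(w, x)]


w_next = nlms_update([0.0, 0.0], [1.0, 1.0], e=1.0, mu=0.5)
```

The effective step size from the detection factor enters only through `mu`, so the detection logic and the filter update stay decoupled.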
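The update step size above is determined in the time domain. Alternatively, the effective filter coefficient update step size can be determined in the frequency domain, so that residual echo is cancelled per frequency bin, that is, the residual echo present at each frequency bin of the far-end voice signal is cancelled. In the frequency domain, the update step size of the adaptive filter is adjusted per frequency bin according to the residual echo detection factor. For the n-th frame, the step-size update includes: determining the valid frequency bins and selecting the valid residual echo detection factors accordingly; for a valid frequency bin k, taking the product of the valid residual echo detection factor $\eta(k,n)$ and the maximum filter coefficient update step size $\mu_{max}$ as the new update step size $\mu(k,n)$; taking the maximum of the new step size $\mu(k,n)$ and the minimum filter coefficient update step size $\mu_{min}$ as the effective update step size $\mu'(k,n)$ for that bin; and updating the filter coefficients according to the effective update step size, so as to cancel, bin by bin, the residual echo present in the frequency-domain components of the far-end voice signal at the valid bins. The residual echo here is cancelled in the frequency domain. A concrete implementation for one application scenario is given by formula (13) below.
Alternatively, in the frequency domain, the effective update step size for the k-th frequency bin can be determined from the residual echo detection factor $\eta(k,n)$ of that bin and a selected step-size transform function $f(\eta(k,n))$, so as to cancel the residual echo present in the frequency-domain component of the far-end voice signal at the k-th bin. A concrete implementation for one application scenario is given by formula (14) below. Compared with formula (13), the calculation in formula (14) is simpler and requires less computation.
Specifically, the effective filter coefficient update step size applicable to every frequency bin can be determined according to formula (13) below.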
Figure PCTCN2019091565-appb-000041
The effective filter coefficient update step size for each frequency bin is adjusted according to formula (14).
Figure PCTCN2019091565-appb-000042
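In formulas (13) and (14), $\eta(k,n)$ denotes the residual echo detection factor at the k-th frequency bin for the n-th frame of the far-end digital voice signal; $\mu(k,n)$ denotes the product of the valid residual echo detection factor at the k-th frequency bin for the n-th frame and the maximum filter coefficient update step size; $\mu'(k,n)$ denotes the effective filter coefficient update step size at the k-th frequency bin for the n-th frame of the far-end digital voice signal; and $f(\eta(k,n))$ denotes the step-size transform function at the k-th frequency bin for the n-th frame. The transform function is in essence a function of the residual echo detection factor and is set flexibly according to the application scenario. Its value may be positive or negative, depending on whether the scenario requires the step size to increase or decrease when it is updated. $\mu_{step}$ is the increment by which the step size is adjusted each time, and $\mu'(k,n-1)$ denotes the effective filter coefficient update step size at the k-th frequency bin for frame n-1. In formula (14), the newly calculated step size for the k-th frequency bin is bounded by a minimum operation and then by a maximum operation against the minimum update step size $\mu_{min}$, and the result is used as the effective filter update step size.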
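The two per-bin step-size rules can be sketched side by side. The first function follows the description of formula (13) directly. The second is only one plausible reading of formula (14), assuming the previous step is nudged by f(η)·μ_step and then clamped to [μ_min, μ_max]; the original formula image is not reproduced in this text, and the function names and the example transform `f` are hypothetical.

```python
def step_from_factor(eta_k, mu_max, mu_min):
    """Formula (13) as described: eta * mu_max, floored at mu_min."""
    return max(eta_k * mu_max, mu_min)


def step_from_transform(prev_step, eta_k, f, mu_step, mu_max, mu_min):
    """Assumed reading of formula (14): incremental update with clamping."""
    candidate = prev_step + f(eta_k) * mu_step
    return max(min(candidate, mu_max), mu_min)


def f(eta_k):
    """Hypothetical transform: grow the step when much echo remains."""
    return 1.0 if eta_k > 0.5 else -1.0


s13 = step_from_factor(0.4, mu_max=0.5, mu_min=0.01)
s14_up = step_from_transform(0.2, 0.8, f, mu_step=0.05, mu_max=0.5, mu_min=0.01)
s14_down = step_from_transform(0.02, 0.1, f, mu_step=0.05, mu_max=0.5, mu_min=0.01)
```

The incremental rule needs only one addition and two comparisons per bin, which is consistent with the text's remark that formula (14) is cheaper than formula (13).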
Specifically, the filter coefficients can be updated in the frequency domain according to formula (15) below.
Figure PCTCN2019091565-appb-000043
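Here $w_{n+1}(k)$ denotes the filter coefficient for the frequency-domain component at the k-th frequency bin of frame n+1 of the far-end digital voice signal; $w_{n}(k)$ denotes the filter coefficient for the frequency-domain component at the k-th frequency bin of frame n; $\|X(k,n)\|^{2}$ denotes the energy of the frequency-domain component at the k-th frequency bin of the n-th frame of the far-end digital voice signal; δ denotes a control factor; $X^{*}(k,n)$ denotes the conjugate of the frequency-domain component at the k-th frequency bin of the n-th frame of the far-end digital voice signal; and $E(k,n)$ denotes the frequency-domain component at the k-th frequency bin of the n-th frame of the error voice signal.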
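The symbols defined for formula (15) correspond to a per-bin normalized update, sketched below. It assumes the form w_{n+1}(k) = w_n(k) + μ'(k,n)·X*(k,n)·E(k,n)/(‖X(k,n)‖² + δ); the original formula image is not reproduced in this text, and the function name is hypothetical.

```python
def update_bin(w_k, x_k, e_k, mu_k, delta=1e-6):
    """One frequency-bin coefficient update (assumed form of formula (15)).

    w_k   -- current complex coefficient for bin k
    x_k   -- far-end spectrum X(k, n)
    e_k   -- error spectrum E(k, n)
    mu_k  -- effective step size for bin k
    delta -- control factor keeping the denominator away from zero
    """
    return w_k + mu_k * x_k.conjugate() * e_k / (abs(x_k) ** 2 + delta)


w_new = update_bin(0j, 1 + 1j, 2 + 0j, mu_k=0.5, delta=0.0)
```

Each bin uses its own step size, so bins with little residual echo can be left almost unchanged while others continue to adapt.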
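S603: If the call is in the double-talk state, cancel the residual echo using the residual echo cancellation mechanism set for the double-talk state.
In one application scenario of this embodiment, the first residual echo suppression factor has already been determined as described for step S602. The mechanism set for the double-talk state is then: generate a third residual echo suppression factor from the estimated echo voice signal power and the near-end voice signal power, take the maximum of the first and third residual echo suppression factors as the effective residual echo suppression factor, and cancel the residual echo according to formula (10) above. As noted earlier, the smaller the effective residual echo suppression factor, the stronger the cancellation. In the double-talk state the near-end talker's voice is present, so the maximum is taken in order to remove as much residual echo as possible without damaging that voice.
In a specific application scenario, an a priori estimated echo voice signal can be obtained through testing and its power calculated; this power is then used together with the near-end voice signal power to generate the third residual echo suppression factor. As before, both powers are calculated in the frequency domain.
Specifically, the power of the a priori estimated echo voice signal and the near-end voice signal power can be calculated according to formula (16) below, and the third residual echo suppression factor according to formula (17).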
Figure PCTCN2019091565-appb-000044
Figure PCTCN2019091565-appb-000045
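In formulas (16) and (17), the four symbols defined for the a priori estimated echo digital voice signal denote, respectively: the power of its frequency-domain component at the k-th frequency bin for frame n; the power of its frequency-domain component at the k-th frequency bin for frame n-1; its frequency-domain component at the k-th frequency bin of frame n; and the conjugate of that component. $S_{yy}(k,n)$ and $S_{yy}(k,n-1)$ denote the power of the frequency-domain component at the k-th frequency bin of frames n and n-1 of the near-end digital voice signal, respectively; $Y(k,n)$ denotes the frequency-domain component at the k-th frequency bin of the n-th frame of the near-end digital voice signal, and $Y(k,n)^{*}$ its conjugate. $G_2(k,n)$ denotes the third residual echo suppression factor and β a control factor; the calculation of the third residual echo suppression factor is given by formula (17).
In practice, residual echo detection can also use the a posteriori estimated echo digital voice signal, that is, the power of its frequency-domain component at the k-th frequency bin for frames n and n-1, its frequency-domain component at the k-th frequency bin of frame n, and the conjugate of that component. However, because the power of the a priori estimated echo digital voice signal is comparatively accurate, it is preferable to substitute the a priori power for frame n-1 into formula (16).
The effective residual echo suppression factor, taken as the maximum of the first and third residual echo suppression factors, is calculated as shown in formula (18).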
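$$G'(k,n) = \max\bigl(G(k,n),\, G_2(k,n)\bigr) \qquad (18)$$
In formula (18), $G(k,n)$ denotes the first residual echo suppression factor, $G_2(k,n)$ the third residual echo suppression factor, and $G'(k,n)$ the effective residual echo suppression factor.
After the effective residual echo suppression factor for the double-talk state has been obtained, the residual echo is cancelled according to formula (10) above.
Preferably, if the normalized correlation factor is greater than the upper limit of the normalized correlation factor threshold, the effective residual echo suppression factor is corrected by reducing it; if the normalized correlation factor is less than the lower limit of the threshold, the effective residual echo suppression factor is corrected by increasing it. In the double-talk state, for example, $C_{xe}(k,n)$ may be greater than or equal to $C_{up}$, or less than $C_{low}$. If $C_{xe}(k,n)>C_{up}$, the frequency-domain component at the k-th frequency bin of the n-th frame of the near-end voice signal contains considerable residual echo; in other words, the probability that it is the near-end talker's voice is small. The effective residual echo suppression factor $G'(k,n)$ set for that component can therefore be reduced slightly. If $C_{xe}(k,n)<C_{low}$, that component contains little residual echo; in other words, the probability that it is the near-end talker's voice is large. To reduce voice loss while still removing the residual echo as thoroughly as possible, the effective residual echo suppression factor $G'(k,n)$ set for that component is increased slightly.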
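The double-talk selection of formula (18) and the correction just described can be sketched as follows. The third suppression factor comes from formula (17), whose image is not reproduced in this text, so it is taken as an input. The correction increment `delta`, the clamping to [0, 1], and the function names are assumptions made for illustration; the text only says the factor is reduced or increased slightly.

```python
def effective_gain_double_talk(g_first, g_third):
    """Formula (18): keep the larger (gentler) of the two factors."""
    return max(g_first, g_third)


def correct_gain(g_eff, c_norm, c_low, c_up, delta=0.1):
    """Nudge the effective factor according to the normalized correlation.

    c_norm > c_up  -- mostly residual echo: suppress a little harder
    c_norm < c_low -- mostly near-end speech: suppress a little less
    """
    if c_norm > c_up:
        g_eff -= delta
    elif c_norm < c_low:
        g_eff += delta
    return min(max(g_eff, 0.0), 1.0)  # clamp to a valid gain (assumption)


g = effective_gain_double_talk(0.4, 0.7)
g_echo = correct_gain(g, c_norm=0.9, c_low=0.3, c_up=0.8)
g_speech = correct_gain(0.95, c_norm=0.1, c_low=0.3, c_up=0.8)
g_same = correct_gain(0.5, c_norm=0.5, c_low=0.3, c_up=0.8)
```

Taking the maximum in double talk and the minimum in single talk reflects the same trade-off: stronger suppression is acceptable only when no near-end speech is at risk.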
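In the above embodiments, residual echo detection factors can also be classified as valid or invalid according to a set validity threshold, and the invalid factors corrected using the mean of the valid ones. In a concrete implementation, if a calculated residual echo detection factor is greater than the validity threshold, the factor for that frequency bin is invalid; otherwise it is valid. Invalid factors are preferably corrected, for example by replacing them with the mean of the valid residual echo detection factors. More simply, invalid factors can be left unprocessed, so that residual echo cancellation ignores the frequency bins whose detection factors are invalid.
In addition, validity can be judged from the sum of the residual echo detection factors calculated by formulas (2) and (4). In theory this sum is 1. In a real product, owing to various other influences, the sum may be greater than 1, but it generally settles at a stable value. In practice, therefore, if the sum of the two factors calculated by formulas (2) and (4) is greater than this stable value, the residual echo detection factor at that frequency bin is invalid; otherwise it is valid. The valid residual echo detection factors calculated by formula (2) are then averaged, and the mean replaces the invalid factors.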
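The validity test based on the sum of the two detection factors, followed by mean replacement, can be sketched as below. The function name is hypothetical, and the stable value of 1.0 is the theoretical figure from the text; a real product would substitute its own measured value.

```python
def repair_invalid_factors(eta_xe, eta_xd, stable=1.0):
    """Replace invalid Case 1 factors by the mean of the valid ones.

    eta_xe -- per-bin factors from formula (2)
    eta_xd -- per-bin factors from formula (4)
    stable -- stable value the two factors should not exceed in sum
    """
    valid = [a for a, b in zip(eta_xe, eta_xd) if a + b <= stable]
    if not valid:
        return list(eta_xe)  # nothing trustworthy to average: leave as is
    mean_valid = sum(valid) / len(valid)
    return [
        a if a + b <= stable else mean_valid
        for a, b in zip(eta_xe, eta_xd)
    ]


# The middle bin sums to 1.5 and is therefore treated as invalid.
fixed = repair_invalid_factors([0.2, 0.9, 0.4], [0.7, 0.6, 0.5])
```

The alternative mentioned in the text, skipping invalid bins altogether, would amount to filtering those indices out before the cancellation step rather than rewriting their factors.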
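In a concrete implementation, the residual echo detection apparatus described above can be integrated into a voice processing chip.
An embodiment of this application provides an electronic device including the voice processing chip of any embodiment of this application.
The electronic device of the embodiments of this application exists in many forms, including but not limited to:
(1) Mobile communication devices: these devices have mobile communication functions and mainly aim to provide voice and data communication. Such terminals include smartphones (for example iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (for example iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic apparatuses with data interaction functions.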
至此,已经对本主题的特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作可以按照不同的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序,以实现期望的结果。在某些实施方式中,多任务处理和并行处理可以是有利的。
为了描述的方便,描述以上装置时以功能分为各种单元分别描述。当然,在实施本申请时可以把各单元的功能在同一个或多个软件和/或硬件中实现。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算 机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括:但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、 只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。
本领域技术人员应明白,本申请的实施例可提供为方法、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定事务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本申请,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行事务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
以上所述仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (28)

  1. 一种残余回声检测方法,其特征在于,包括:
    根据远端语音信号与近端语音信号之间的相关功率,确定残余回声检测因子;
    根据所述残余回声检测因子,检测是否存在残余回声。
  2. 根据权利要求1所述的方法,其特征在于,所述根据远端语音信号与近端语音信号之间的相关功率,确定残余回声检测因子,包括:根据所述远端语音信号与估计的回声语音信号之间的相关功率以及所述远端语音信号与近端语音信号之间的所述相关功率,确定所述残余回声检测因子。
  3. 根据权利要求1所述的方法,其特征在于,所述根据远端语音信号与近端语音信号之间的相关功率,确定残余回声检测因子,包括:根据所述远端语音信号与误差语音信号之间的相关功率以及所述远端语音信号与近端语音信号之间的所述相关功率,确定残余回声检测因子。
  4. 根据权利要求2所述的方法,其特征在于,根据所述远端语音信号与估计的回声语音信号之间的所述相关功率以及所述远端语音信号与近端语音信号之间的所述相关功率,确定所述残余回声检测因子,包括:确定所述远端语音信号与估计的回声语音信号之间的所述相关功率与所述远端语音信号与近端语音信号之间的所述相关功率的比值为所述残余回声检测因子。
  5. 根据权利要求3所述的方法,其特征在于,所述根据远端语音信号与误差语音信号之间的所述相关功率以及所述远端语音信号与近端语音信号之间的所述相关功率,确定残余回声检测因子,包括:确定所述远端语音信号与所述误差语音信号之间的所述相关功率与所述远端语音信号与近端语音信号之间的所述相关功率的比值为所述残余回声检测因子。
  6. 根据权利要求2所述的方法,其特征在于,所述检测是否存在残余回声,包括:若所述残余回声检测因子小于残余回声检测因子门限,则判定存在残余回声,否则判定不存在残余回声。
  7. 根据权利要求3所述的方法,其特征在于,所述检测是否存在残余回声,包括:若所述残余回声检测因子大于残余回声检测因子门限,则判定存在残余 回声,否则判定不存在残余回声。
  8. 根据权利要求2所述的方法,其特征在于,还包括:确定所述远端语音信号的功率与估计的回声语音信号的功率的乘积,根据所述乘积,以及所述远端语音信号与所述估计的回声语音信号之间的相关功率,计算归一化相关因子,所述归一化相关因子与所述残余回声检测因子结合,以进行所述残余回声的检测。
  9. 根据权利要求3所述的方法,其特征在于,还包括:确定所述远端语音信号的功率与误差语音信号的功率的乘积,根据所述乘积,以及所述远端语音信号与所述误差语音信号之间的相关功率,计算归一化相关因子,所述归一化相关因子与所述残余回声检测因子结合,以进行所述残余回声的检测。
  10. 根据权利要求6或7所述的方法,其特征在于,所述归一化相关因子与所述残余回声检测因子结合,以进行残余回声的检测,包括:根据所述归一化相关因子与归一化相关因子门限的比对结果,以及所述残余回声检测因子与残余回声检测因子门限的比对结果,进行残余回声的检测。
  11. 根据权利要求10所述的方法,其特征在于,还包括:根据所述远端语音信号与误差语音信之间的相关功率、所述估计的回声语音信号的功率以及所述残余回声检测因子,确定所述第一残余回声抑制因子,以根据所述第一残余回声抑制因子消除所述残余回声。
  12. 根据权利要求10所述的方法,其特征在于,还包括:根据所述远端语音信号与估计的回声语音信之间的相关功率、所述误差语音信号的功率以及所述残余回声检测因子,确定所述第一残余回声抑制因子,以根据所述第一残余回声抑制因子消除所述残余回声。
  13. 根据权利要求12所述的方法,其特征在于,根据所述第一残余回声抑制因子消除所述残余回声,包括:若处于单端通话状态,则根据所述归一化相关因子生成第二残余回声抑制因子,取所述第一残余回声抑制因子和第二残余回声抑制因子中的最大值作为有效残余回声抑制因子,用于与所述误差语音信号进行乘积运算以消除所述残余回声。
  14. 根据权利要求11所述的方法,其特征在于,根据所述第一残余回声 抑制因子消除所述残余回声,包括:若处于单端通话状态,则根据所述归一化相关因子生成第二残余回声抑制因子,取所述第一残余回声抑制因子和第二残余回声抑制因子中的最小值作为有效残余回声抑制因子,用于与所述误差语音信号进行乘积运算以消除所述残余回声。
  15. 根据权利要求13或者14所述的方法,其特征在于,所述第二残余回声抑制因子为一稳定值与所述归一化因子的差值。
  16. 根据权利要求11所述的方法,其特征在于,根据所述第一残余回声抑制因子消除所述残余回声,包括:若处于双端通话状态,则根据先验的估计的回声语音信号功率以及近端语音信号功率生成第三残余回声抑制因子,取所述第一残余回声抑制因子和第三残余回声抑制因子中的最大值作为有效残余回声抑制因子,以消除所述残余回声。
  17. 根据权利要求11或12所述的方法,其特征在于,根据所述第一残余回声抑制因子消除所述残余回声,包括:根据所述归一化相关因子以及所述第一残余回声抑制因子确定有效残余回声抑制因子,以消除所述残余回声。
  18. 根据权利要求13-17中任一项所述的方法,其特征在于,还包括:如果所述归一化相关因子大于归一化相关因子门限的上限,则减小所述有效残余回声抑制因子以对所述有效残余回声抑制因子进行修正;或者,若所述归一化相关因子小于归一化相关因子门限的下限,则增大所述有效残余回声抑制因子以对所述有效残余回声抑制因子进行修正。
  19. 根据权利要求1-18中任一项所述的方法,其特征在于,还包括:根据设定的残余回声检测因子有效门限,确定有效和无效的所述残余回声检测因子,根据有效的所述残余回声检测因子的均值对无效的所述残余回声检测因子进行修正。
  20. 根据权利要求13-18中任一项所述的方法,其特征在于,还包括:所述有效残余回声抑制因子与所述误差语音信号进行乘积运算,以消除所述残余回声。
  21. 根据权利要求1-20中任一项所述的方法,其特征在于,还包括:根据所述残余回声检测因子,调整自适应滤波器的滤波系数,以消除所述残余回 声。
  22. 根据权利要求21所述的方法,其特征在于,根据所述残余回声检测因子,调整自适应滤波器的滤波系数,以消除所述残余回声,包括:根据所述残余回声检测因子,调整自适应滤波器的滤波系数更新步长,根据所述滤波器系数更新步长调整滤波器系数,以消除所述残余回声。
  23. 根据权利要求22所述的方法,其特征在于,根据所述残余回声检测因子,调整自适应滤波器的滤波系数更新步长,包括:确定有效的残余回声检测因子的均值以及滤波器系数最大更新步长的乘积,根据滤波器系数最小更新步长以及所述乘积,确定滤波器系数有效更新步长。
  24. 根据权利要求22所述的方法,其特征在于,根据所述残余回声检测因子,调整自适应滤波器的滤波系数更新步长,包括:确定有效的残余回声检测因子以及滤波器系数最大更新步长的乘积,根据滤波器系数最小更新步长以及所述乘积,确定滤波器系数有效更新步长。
  25. 根据权利要求22所述的方法,其特征在于,根据所述残余回声检测因子,调整自适应滤波器的滤波系数更新步长,包括:根据残余回声检测因子和步长变换函数,确定滤波器系数有效更新步长。
  26. 一种残余回声检测装置,其特征在于,包括:
    检测因子计算单元,用于根据远端语音信号与近端语音信号之间的相关功率,确定残余回声检测因子;
    残余回声检测单元,用于根据所述残余回声检测因子,检测是否存在残余回声。
  27. 一种语音处理芯片,其特征在于,包括:残余回声检测装置,所述残余回声检测装置包括:检测因子计算单元,用于根据远端语音信号与近端语音信号之间的相关功率,确定残余回声检测因子;残余回声检测单元,用于根据所述残余回声检测因子,检测是否存在残余回声。
  28. 一种电子设备,其特征在于,包括权利要求27所述的语音处理芯片。
PCT/CN2019/091565 2019-06-17 2019-06-17 残余回声检测方法、残余回声检测装置、语音处理芯片及电子设备 WO2020252629A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980001068.2A CN110431624B (zh) 2019-06-17 2019-06-17 残余回声检测方法、残余回声检测装置、语音处理芯片及电子设备
PCT/CN2019/091565 WO2020252629A1 (zh) 2019-06-17 2019-06-17 残余回声检测方法、残余回声检测装置、语音处理芯片及电子设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/091565 WO2020252629A1 (zh) 2019-06-17 2019-06-17 残余回声检测方法、残余回声检测装置、语音处理芯片及电子设备

Publications (1)

Publication Number Publication Date
WO2020252629A1 true WO2020252629A1 (zh) 2020-12-24

Family

ID=68419109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091565 WO2020252629A1 (zh) 2019-06-17 2019-06-17 残余回声检测方法、残余回声检测装置、语音处理芯片及电子设备

Country Status (2)

Country Link
CN (1) CN110431624B (zh)
WO (1) WO2020252629A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808609A (zh) * 2021-09-18 2021-12-17 展讯通信(上海)有限公司 回声检测方法及装置、计算机可读存储介质、终端设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148421B (zh) * 2019-06-10 2021-07-20 浙江大华技术股份有限公司 一种残余回声检测方法、终端和装置
CN110992975B (zh) * 2019-12-24 2022-07-12 大众问问(北京)信息科技有限公司 一种语音信号处理方法、装置及终端
CN113763975B (zh) * 2020-06-05 2023-08-29 大众问问(北京)信息科技有限公司 一种语音信号处理方法、装置及终端

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101778183A (zh) * 2009-01-13 2010-07-14 华为终端有限公司 一种残留回声抑制方法及设备
CN103905656A (zh) * 2012-12-27 2014-07-02 联芯科技有限公司 残留回声的检测方法及装置
CN107123430A (zh) * 2017-04-12 2017-09-01 广州视源电子科技股份有限公司 回声消除方法、装置、会议平板及计算机存储介质
CN107134281A (zh) * 2017-05-04 2017-09-05 重庆第二师范学院 一种自适应回声消除中自适应滤波器系数更新方法
US20180255183A1 (en) * 2016-02-22 2018-09-06 Tencent Technology (Shenzhen) Company Limited Echo cancellation method and apparatus, and computer storage medium
CN109087665A (zh) * 2018-07-06 2018-12-25 南京时保联信息科技有限公司 一种非线性回声抑制方法
CN109273019A (zh) * 2017-04-21 2019-01-25 豪威科技股份有限公司 用于回声抑制的双重通话检测的方法及回声抑制

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719969B (zh) * 2009-11-26 2013-10-02 美商威睿电通公司 判断双端对话的方法、系统以及消除回声的方法和系统
CN104050971A (zh) * 2013-03-15 2014-09-17 杜比实验室特许公司 声学回声减轻装置和方法、音频处理装置和语音通信终端
US10367948B2 (en) * 2017-01-13 2019-07-30 Shure Acquisition Holdings, Inc. Post-mixing acoustic echo cancellation systems and methods
CN108172233B (zh) * 2017-12-12 2019-08-13 天格科技(杭州)有限公司 基于远端估计信号和误差信号回归因子的回声消除方法

Also Published As

Publication number Publication date
CN110431624A (zh) 2019-11-08
CN110431624B (zh) 2023-04-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933632

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933632

Country of ref document: EP

Kind code of ref document: A1