WO2009056035A1 - Method and apparatus for judging dtx - Google Patents

Method and apparatus for judging dtx

Info

Publication number
WO2009056035A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
band
low
feature information
change amount
Prior art date
Application number
PCT/CN2008/072774
Other languages
English (en)
French (fr)
Inventor
Jinliang Dai
Eyal Shlomot
Deming Zhang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP08844412.0A (EP2202726B1)
Priority to AU2008318143A (AU2008318143B2)
Publication of WO2009056035A1
Priority to US12/763,573 (US9047877B2)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012Comfort noise or silence coding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/24Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • The present invention relates to the field of signal processing technologies, and in particular to a DTX (Discontinuous Transmission) decision method and apparatus.
  • Background
  • Speech coding technology compresses the transmission bandwidth of voice signals and increases the capacity of a communication system. Since only about 40% of a voice call actually contains speech, and the rest is silence or background noise, DTX/CNG (Comfort Noise Generation) technology has emerged to save further transmission bandwidth. This technology lets the encoder apply to background noise a coding algorithm different from the one used for speech, which reduces the average bit rate. Simply put, at the encoding end the background noise segment need not be encoded at full rate like a speech frame, nor does every noise frame need to be encoded; instead, only once every several frames, a frame carrying fewer coding parameters than a speech frame (a SID frame) is sent. At the decoding end, continuous background noise is recovered from the parameters of the received discontinuous background noise frames, without significantly affecting subjective listening quality.
  • In a speech codec, such a discontinuous background noise coded frame is usually called a SID (Silence Insertion Descriptor) frame.
  • A SID frame generally contains only spectral parameters and signal energy parameters. Unlike a speech coded frame, it carries no fixed-codebook or adaptive-codebook parameters, and SID frames are not transmitted continuously, which reduces the average bit rate.
  • During noise coding, the extracted noise parameters are generally examined to determine whether a SID frame needs to be transmitted. This process is referred to as the DTX (Discontinuous Transmission) decision; its output is "1" or "0", indicating that a SID frame does or does not need to be transmitted.
  • The result of the DTX decision also reflects whether the nature of the current noise has changed significantly.
  • G.729.1 is the latest generation of speech codec standard released by the ITU.
  • The most notable feature of this embedded speech codec standard is its layered coding: it provides narrowband-to-wideband audio quality at bit rates from 8 kb/s to 32 kb/s, and it allows the outer layers of the code stream to be discarded according to channel conditions during transmission, which gives it good channel adaptability.
  • The block diagram of the encoder system at each layer is shown in Figure 1.
  • The input is a 20 ms superframe; at a sampling rate of 16000 Hz, the frame length is 320 samples. The input signal is first split into two sub-bands by QMF filtering (H1(z), H2(z)). The low sub-band signal is preprocessed by a high-pass filter with a 50 Hz cut-off frequency, and the output signal is encoded by a narrowband embedded CELP encoder at 8 kb/s to 12 kb/s. The difference signal between the preprocessed low-band signal and the local synthesis signal of the CELP encoder at the 12 kb/s rate is perceptually weighted by the filter W_LB(z) and then transformed into the frequency domain by MDCT.
  • The weighting filter W_LB(z) includes gain compensation, which maintains spectral continuity between the filter output and the high sub-band input signal.
  • The high sub-band component is spectrally folded by multiplying it by (-1)^n, and the folded signal is preprocessed by a low-pass filter with a cut-off frequency of 3000 Hz; the filtered signal is encoded by a TDBWE encoder.
  • The high-band signal entering the TDAC encoding module must also first be transformed into the frequency domain by MDCT.
  • The two sets of MDCT coefficients are finally encoded by TDAC.
  • In addition, some parameters are transmitted by the FEC (frame erasure concealment) encoder to mitigate errors caused by frame loss during transmission.
  • The full-rate code stream produced by the G.729.1 encoder has 12 layers. The core layer has a rate of 8 kb/s and is a G.729 code stream; the low-band enhancement layer brings the rate to 12 kb/s and enhances the fixed-codebook coding of the core layer. Both the 8 kb/s and 12 kb/s layers correspond to narrowband signal components.
  • In AMR (Adaptive Multi-Rate), the speech codec standard of 3GPP (3rd Generation Partnership Project), the DTX strategy is as follows: at the end of the speech segment, a SID_FIRST frame carrying only 1 bit of valid data indicates the beginning of the noise segment; the first SID_UPDATE frame, which contains the specific noise information, is sent in the third frame after the SID_FIRST frame.
  • Thereafter, a SID_UPDATE frame is sent every 8 frames, at a fixed interval. Only SID_UPDATE frames contain the encoded comfort noise parameters.
  • Because AMR transmits SID frames at fixed intervals, it cannot adapt SID transmission to the actual characteristics of the noise; that is, there is no guarantee that a SID frame is transmitted when one is needed.
  • This has two drawbacks in an actual communication system. On the one hand, the noise characteristics may have changed significantly, but because no SID frame is transmitted, the decoding end cannot obtain the changed noise information in time. On the other hand, when a SID frame is transmitted, the noise characteristics may have remained stable for a relatively long period (more than 8 frames), so the SID frame is unnecessary and bandwidth is wasted.
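The fixed-interval behaviour described above can be illustrated with a minimal sketch. The function name `amr_sid_schedule`, the frame labels, and the convention that index 0 is the SID_FIRST frame are assumptions made for illustration only; this is not the AMR reference implementation.

```python
def amr_sid_schedule(num_noise_frames, first_update_offset=3, update_interval=8):
    """Label each frame of a noise segment under a fixed-interval SID scheme.

    Assumption: index 0 is the SID_FIRST frame, the first SID_UPDATE is sent
    `first_update_offset` frames later, and further SID_UPDATE frames follow
    every `update_interval` frames. All other frames carry no data.
    """
    labels = []
    for i in range(num_noise_frames):
        if i == 0:
            labels.append("SID_FIRST")
        elif i >= first_update_offset and (i - first_update_offset) % update_interval == 0:
            labels.append("SID_UPDATE")
        else:
            labels.append("NO_DATA")
    return labels


schedule = amr_sid_schedule(20)
update_positions = [i for i, label in enumerate(schedule) if label == "SID_UPDATE"]
```

The schedule depends only on the frame index, never on the noise itself, which is exactly the limitation the text points out.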
  • In another existing scheme, the DTX strategy at the encoding end adaptively determines whether to send a SID frame according to changes in the narrowband noise parameters; the interval between two consecutive SID frames is at least 20 milliseconds, and the maximum interval is not limited.
  • The disadvantage of this method is that only the energy and spectral parameters extracted from the narrowband signal guide the DTX decision, and the information in the wideband component is not used, so a comprehensive and appropriate DTX decision result may not be obtained in wideband speech applications.
  • Embodiments of the present invention provide a DTX decision method and apparatus, to implement banding and layering processing on a noise signal, and obtain a comprehensive and reasonable DTX decision result.
  • An embodiment of the present invention provides a DTX decision method, including the following steps:
  • making a DTX decision based on the change amount of the feature information of each band-split signal.
  • An embodiment of the present invention further provides a DTX decision apparatus, including:
  • a band dividing module, configured to obtain band-split signals from the input signal;
  • a feature information change amount obtaining module, configured to obtain the change amount of the feature information of each band-split signal produced by the band dividing module;
  • a decision module, configured to perform a DTX decision according to the change amount of the feature information of each band-split signal obtained by the feature information change amount obtaining module.
  • Band splitting and layered processing are used to give a comprehensive and reasonable DTX decision result in the noise coding stage, so that SID coding/CNG decoding can follow the actual noise characteristics more closely.
  • DRAWINGS
  • FIG. 1 is a block diagram of a G.729.1 layer encoder circuit in the prior art
  • FIG. 2 is a flowchart of a DTX decision method according to Embodiment 1 of the present invention
  • FIG. 3 is a schematic structural diagram of a DTX decision device according to Embodiment 5 of the present invention
  • FIG. 4 is a schematic structural diagram of the low-band feature information change amount acquisition sub-module of the DTX decision device in Embodiment 5 of the present invention
  • FIG. 5 is a schematic diagram of a usage scenario of a DTX decision apparatus according to Embodiment 5 of the present invention
  • FIG. 6 is a schematic diagram of another use scenario of the DTX decision apparatus in Embodiment 5 of the present invention.
  • In Embodiment 1 of the present invention, a DTX decision method is shown in FIG. 2 and includes: Step s101: the input signal is band-split.
  • When the input signal is a wideband signal, it can be divided into two sub-bands, a low band and a high band. When the input signal is an ultra-wideband signal, it can be divided in a single pass into low-band, high-band, and ultra-high-band signals, or first divided into an ultra-high-band signal and a wideband signal, after which the wideband signal is divided into low-band and high-band signals.
  • A low-band signal can be further divided into a low-band core layer signal and a low-band enhancement layer signal; a high-band signal can be further divided into a high-band core layer signal and a high-band enhancement layer signal.
  • This band splitting can be implemented with a QMF (Quadrature Mirror Filter) bank.
  • a narrowband signal refers to a signal with a frequency band of 0 to 4000 Hz
  • a wideband signal refers to a signal with a frequency band of 0 to 8000 Hz
  • an ultra-wideband signal refers to a signal with a frequency band of 0 to 16000 Hz.
  • Narrowband or lowband (broadband component) signals refer to signals from 0 to 4000 Hz
  • highband (wideband components) signals refer to signals from 4000 to 8000 Hz
  • ultrahighband (ultra-wideband components) signals refer to signals from 8000-16000 Hz.
  • the encoding algorithm enters the hangover phase.
  • During the hangover phase, the encoder still encodes the input signal with the speech frame coding algorithm; the main purpose of this phase is to estimate the characteristics of the noise and to initialize the subsequent noise coding algorithm.
  • After the hangover phase, noise coding starts, and the input signal is band-split.
  • Step s102: Obtain the feature information of each band-split signal and the change amount of the feature information.
  • For the low-band signal, the feature information includes energy information and spectral information, which can be obtained with a linear prediction analysis model.
  • For the high-band and ultra-high-band signals, the feature information includes time-domain envelope information and frequency-domain envelope information, which can be obtained with the TDBWE (Time Domain Bandwidth Extension) encoding algorithm.
  • From the feature information at the current time and at past times, the variation metric of the signal in each band can be obtained.
  • Step s103: Perform a DTX decision based on the change amount of the feature information of the band-split signals.
  • For a wideband signal, the low-band characteristic variation metric and the high-band characteristic variation metric are combined to give the DTX decision result for the whole wideband; for an ultra-wideband signal, the wideband characteristic variation metric and the ultra-high-band characteristic variation metric are combined to give the DTX decision result for the whole ultra-wideband.
  • The full-rate coding information of the input noise signal is divided into a low-band core layer, a low-band enhancement layer, a high-band core layer, a high-band enhancement layer, and an ultra-high-band layer, with the corresponding coding rate increasing in that order.
  • This noise layering can then be mapped to the actual coding rate.
  • When encoding only to the low-band core layer, the DTX decision computes only the change amount of the feature information corresponding to the low-band core layer. If the decision function value is greater than a certain threshold, a SID frame is sent; otherwise it is not.
  • When encoding to the low-band enhancement layer, the DTX decision can be made jointly from the feature information change amounts of the low-band core layer and the low-band enhancement layer. If the decision function value is greater than a certain threshold, a SID frame is sent; otherwise it is not.
  • When encoding to the high-band core layer, the integrated DTX decision uses the joint feature information change amount of the low-band component together with the feature information change amount corresponding to the high-band core layer. If the decision function value is greater than a certain threshold, a SID frame is sent; otherwise it is not.
  • When encoding to the high-band enhancement layer, the integrated DTX decision uses the joint feature information change amount of the low-band component together with the joint feature information change amount of the high-band component. If the decision function value is greater than a certain threshold, a SID frame is sent; otherwise it is not.
  • When encoding to the ultra-high-band layer, the DTX decision can be made from the joint feature information change amount of the full-band signal. If the decision function value is greater than a certain threshold, a SID frame is sent; otherwise it is not.
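The layer-dependent selection described above can be sketched as follows. This is only an illustration under stated assumptions: the layer names, the per-band change amounts passed in as a dictionary, the normalised weighted average used as the decision function, and the threshold of 1.0 are all hypothetical, and the low-band change amount is assumed to already combine the core and enhancement layers where applicable.

```python
# Hypothetical layer names, ordered by increasing coding rate.
LAYERS = ["low_core", "low_enh", "high_core", "high_enh", "ultra_high"]


def bands_for_layer(top_layer):
    """Return the bands whose change amounts take part in the decision."""
    index = LAYERS.index(top_layer)
    used = ["low"]
    if index >= LAYERS.index("high_core"):
        used.append("high")
    if index >= LAYERS.index("ultra_high"):
        used.append("ultra_high")
    return used


def dtx_decision(change, top_layer, weights, threshold=1.0):
    """Return 1 (send a SID frame) or 0 (do not send).

    Assumption: the decision function is a weighted average of the change
    amounts of the bands reached by the current coding rate.
    """
    used = bands_for_layer(top_layer)
    total_weight = sum(weights[b] for b in used)
    metric = sum(weights[b] * change[b] for b in used) / total_weight
    return 1 if metric > threshold else 0


change = {"low": 0.4, "high": 3.0, "ultra_high": 0.1}
weights = {"low": 0.5, "high": 0.3, "ultra_high": 0.2}
narrowband_result = dtx_decision(change, "low_core", weights)
wideband_result = dtx_decision(change, "high_core", weights)
```

In this example the low band alone looks stationary, so a narrowband-only decision would skip the SID frame, while the wideband decision picks up the change in the high band.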
  • When encoding to the high-band core layer or the high-band enhancement layer, Equation (1) is simplified to
  • In the second method, the change amounts of the individual bands are each used as a DTX decision criterion.
  • When every band-wise metric is less than 1, the DTX decision output is 0, indicating that no noise frame coding information needs to be transmitted.
  • When every band-wise metric is greater than 1, the DTX decision output is 1, indicating that noise frame coding information needs to be transmitted; when the band-wise results disagree, that is, some metrics are greater than 1 and others are less than 1, the weighted combination of formula (1) is used as the DTX decision criterion.
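A minimal sketch of this band-wise decision with a joint fallback is given below. The function name `bandwise_dtx_decision`, the weights, and the use of a plain weighted sum as the joint metric are assumptions for illustration; the exact form of formula (1) is not reproduced in the text.

```python
def bandwise_dtx_decision(metrics, weights, threshold=1.0):
    """Decide per band first; fall back to a weighted joint metric on disagreement.

    metrics: change metric of each band (for example low band, high band).
    weights: hypothetical weights of the joint metric, assumed to sum to 1.
    Returns 1 if a SID frame should be sent, otherwise 0.
    """
    votes = [1 if m > threshold else 0 for m in metrics]
    if all(v == votes[0] for v in votes):
        # All bands agree, so their common result is the DTX decision.
        return votes[0]
    # Bands disagree: use the weighted combination as the decision criterion.
    joint = sum(w * m for w, m in zip(weights, metrics))
    return 1 if joint > threshold else 0


all_stable = bandwise_dtx_decision([0.2, 0.5], [0.6, 0.4])
all_changed = bandwise_dtx_decision([1.5, 2.0], [0.6, 0.4])
mixed_send = bandwise_dtx_decision([0.5, 3.0], [0.6, 0.4])
mixed_skip = bandwise_dtx_decision([1.2, 0.2], [0.6, 0.4])
```

The joint metric is only computed when the per-band votes conflict, so the weighting affects the outcome only in ambiguous frames.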
  • The structure of the SID frame used in this embodiment is shown in Table 1 (bit allocation of the SID frame).
  • The system operates at a 16 kHz sampling rate with an input signal bandwidth of 8 kHz.
  • The full-rate SID frame contains three layers: a low-band core layer, a low-band enhancement layer, and a high-band core layer.
  • The coding parameters of the low-band core layer are basically the same as the SID frame coding parameters in Annex B of G.729: the energy parameter is quantized with 5 bits and the spectral parameter (LSF) with 10 bits. The low-band enhancement layer further quantizes the quantization error of the energy and spectral parameters of the low-band core layer; that is, second-stage quantization is applied to the energy and third-stage quantization to the spectrum, using 3 bits for the second-stage quantization of the energy and 6 bits for the third-stage quantization of the spectrum. The high-band core layer uses coding parameters similar to those of the TDBWE algorithm in G.729.1, but the 16-point time-domain envelope is simplified to a single time-domain energy gain before quantization.
  • The input signal is band-split into two sub-bands: the low band covers 0 to 4 kHz and the high band covers 4 kHz to 8 kHz.
  • The input signal, sampled at 16 kHz, is band-split with a QMF filter bank.
  • The low-pass filter H1(z) is a 64-tap symmetric FIR filter, and the high-pass filter H2(z) can be obtained from it by: $h_2(n) = (-1)^n h_1(n)$
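The two-band split can be sketched as follows. The prototype filter here is a generic Hamming-windowed sinc chosen for illustration, not the 64-tap filter of any standard, and the helper names (`lowpass_prototype`, `qmf_split`, `energy`) are hypothetical. The high-pass filter is derived from the low-pass one by alternating the sign of the coefficients, and both outputs are decimated by 2.

```python
import math


def lowpass_prototype(taps=64):
    """Hamming-windowed sinc with cut-off at one quarter of the sampling rate.

    Assumption: a stand-in for the real 64-tap symmetric FIR low-pass filter.
    """
    centre = (taps - 1) / 2.0
    coeffs = []
    for n in range(taps):
        x = n - centre
        ideal = 0.5 if x == 0 else math.sin(math.pi * x / 2.0) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        coeffs.append(ideal * window)
    return coeffs


def qmf_split(signal, h1):
    """Filter with h1 and with h2(n) = (-1)^n * h1(n), keeping every second sample."""
    h2 = [c if n % 2 == 0 else -c for n, c in enumerate(h1)]

    def filter_and_decimate(h):
        out = []
        for i in range(0, len(signal), 2):
            acc = 0.0
            for k, c in enumerate(h):
                if i - k >= 0:
                    acc += c * signal[i - k]
            out.append(acc)
        return out

    return filter_and_decimate(h1), filter_and_decimate(h2)


def energy(x):
    return sum(v * v for v in x)


FS = 16000  # sampling rate in Hz; one 20 ms frame is 320 samples
tone_1k = [math.sin(2.0 * math.pi * 1000 * n / FS) for n in range(320)]
tone_6k = [math.sin(2.0 * math.pi * 6000 * n / FS) for n in range(320)]
low_1k, high_1k = qmf_split(tone_1k, lowpass_prototype())
low_6k, high_6k = qmf_split(tone_6k, lowpass_prototype())
```

A 1 kHz tone ends up almost entirely in the low-band output and a 6 kHz tone in the high-band output. After decimation the high-band output is spectrally inverted, which this sketch does not undo.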
  • If the encoder only needs to encode to the low-band core layer or the low-band enhancement layer, the DTX decision only needs to be made for the low-band component.
  • The change amount of the low-band signal can be calculated with equation (8), and the DTX decision result can be obtained with equations (3) and (2).
  • The enhancement layer only further quantizes the parameters of the core layer. Therefore, if the coding rate reaches the low-band enhancement layer, the DTX decision process is basically the same as in equations (8) and (9), except that the energy and spectral parameters used are the quantized results of the enhancement layer; the decision process is not described again here.
  • If the encoder needs to encode to the high-band core layer, then in addition to the low-band change amount calculated according to equation (8), the change amount of the high-band (wideband) component is also calculated.
  • the wideband portion encodes the time domain envelope and the frequency domain envelope for the wideband signal component using a simplified TDBWE encoding algorithm.
  • the time domain envelope is calculated by equation (10):
  • the frequency domain envelope is calculated by equations (11), (12), (13), and (14).
  • a wide-band signal is windowed using a 128-tap Hanning window.
  • the window function expression is shown in equation (11):
  • the signal after windowing is:
  • The quantized time-domain envelope and frequency-domain envelope of the previous SID frame are buffered in memory, so the variation of the wideband component of the current frame relative to the previous SID frame can be calculated.
  • Once the narrowband variation and the wideband variation have been obtained separately, the combined narrowband and wideband variation can be obtained with equation (4).
  • With the decision rule shown in equation (2), it can be determined whether a SID frame needs to be encoded and transmitted for the current frame.
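The high-band feature extraction and the comparison with the previous SID frame can be sketched as follows. Equations (10) to (15) are not reproduced in the text, so everything specific here is an assumption: the log-energy form of the envelopes, the 12 equal-width sub-bands, the direct DFT, and the sum-of-absolute-differences change metric, as well as the function names.

```python
import cmath
import math


def time_envelope(frame):
    """Single time-domain energy gain for the frame (log2 domain, assumed form)."""
    mean_energy = sum(x * x for x in frame) / len(frame)
    return 0.5 * math.log2(mean_energy + 1e-12)


def freq_envelope(frame, window_len=128, num_bands=12):
    """Frequency-domain envelope from a Hanning-windowed DFT (assumed layout)."""
    segment = frame[-window_len:]
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (window_len - 1))
              for n in range(window_len)]
    windowed = [w * x for w, x in zip(window, segment)]
    half = window_len // 2
    power = []
    for k in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / window_len)
                  for n, x in enumerate(windowed))
        power.append(abs(acc) ** 2)
    envelope = []
    for j in range(num_bands):
        lo = j * half // num_bands
        hi = (j + 1) * half // num_bands
        envelope.append(0.5 * math.log2(sum(power[lo:hi]) + 1e-12))
    return envelope


def high_band_change(t_cur, f_cur, t_prev, f_prev):
    """Hypothetical change metric against the buffered previous SID frame."""
    freq_term = sum(abs(a - b) for a, b in zip(f_cur, f_prev)) / len(f_cur)
    return abs(t_cur - t_prev) + freq_term


frame = [math.sin(2.0 * math.pi * 0.1 * n) for n in range(160)]
louder = [2.0 * x for x in frame]
unchanged = high_band_change(time_envelope(frame), freq_envelope(frame),
                             time_envelope(frame), freq_envelope(frame))
changed = high_band_change(time_envelope(louder), freq_envelope(louder),
                           time_envelope(frame), freq_envelope(frame))
```

An identical frame gives a change of zero, while doubling the amplitude raises the time-domain term by about one unit in the log2 domain, so a threshold on this metric would trigger a new SID frame only when the noise level or spectral shape moves noticeably.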
  • The signal processed in this embodiment is sampled at 32 kHz, and the low-band, high-band, and ultra-high-band noise components are obtained by band splitting.
  • A tree structure can be used: one QMF stage divides the input into ultra-high-band and wideband signals, and a second QMF stage divides the wideband signal into low-band and high-band signals. Alternatively, a non-uniform sub-band filter bank can directly divide the input signal into low-band, high-band, and ultra-high-band components.
  • The tree-structured band splitter has better scalability.
  • The narrowband and wideband information obtained by band splitting can be fed to the system of Embodiment 2 for the wideband DTX decision, which finally yields the wideband noise characteristic variation metric J shown in formula (4). In this embodiment, the ultra-high-band noise characteristic variation metric and the wideband metric J are combined into the full-band noise characteristic variation metric, as shown in (16):
  • The ultra-high-band noise characteristic variation metric is described below.
  • The structure of the low-band and high-band parts of the SID frame used in this embodiment is shown in Table 1 and is not described again; the structure of the ultra-high-band part is shown in Table 2 (bit allocation of the ultra-high-band part of the SID frame).
  • The time-domain energy envelope of the ultra-high band is calculated by formula (19):
  • where N is 320 for 20 ms frame processing
  • and ys is the ultra-high-band signal.
  • The ultra-high-band frequency-domain envelope is calculated in a way similar to the high-band frequency-domain envelope; the difference is that the spectrum width is different, so the number of frequency-domain envelope points can also be different, as shown in equation (20):
  • Ys is the ultra-high-band spectrum, which can be calculated by FFT (Fast Fourier Transform) or by MDCT (Modified Discrete Cosine Transform). Equation (20) takes a spectrum width of 320 points as an example, and the frequency-domain envelope is calculated over the 280 frequency points from 8 kHz to 14 kHz.
  • The frequency-domain envelope can still be split into 3 sub-vectors for quantization.
  • The quantized ultra-high-band time-domain envelope and frequency-domain envelope of the previous SID frame are buffered in memory, and the variation of the ultra-high-band component of the current frame relative to the previous SID frame can be calculated by formula (21a) or (21b):
  • the full-band noise characteristic variation metric is then calculated using equation (16). By using the decision rule shown in equation (17), it can be determined whether the current frame needs to be encoded to transmit the SID frame.
  • The DTX decision flow in Embodiment 2 and Embodiment 3 above is the first DTX decision method described in step s103 of Embodiment 1.
  • The second DTX decision method described in step s103 of Embodiment 1 may also be used.
  • The specific decision process is similar to the processes described in Embodiment 2 and Embodiment 3 and is not described again.
  • the structure of the SID frame used in this embodiment is as shown in Table 3: Table 3: Bit allocation of the SID frame
  • the system operates at 16k sample rate with an input signal bandwidth of 8kHz.
  • the full-rate frame of the SID frame consists of three layers, a low-band core layer, a low-band enhancement layer, and a high-band core layer.
  • The coding parameters of the low-band core layer are basically the same as the SID frame coding parameters in Annex B of G.729: the energy parameter is quantized with 5 bits and the spectral parameter (LSF) with 10 bits. The low-band enhancement layer further quantizes the quantization error of the energy and spectral parameters of the low-band core layer; that is, second-stage quantization is applied to the energy and third-stage quantization to the spectrum.
  • The third-stage quantization of the spectrum uses 6 bits. The high-band core layer uses coding parameters similar to those of the TDBWE algorithm in G.729.1, but simplifies the 16-point time-domain envelope to a single time-domain energy gain, quantized with 6 bits; the frequency-domain envelope still has 12 points and is split into 3 vectors that are quantized using 14 bits.
  • The input signal is band-split into two sub-bands: the low band covers 0 to 4 kHz and the high band covers 4 kHz to 8 kHz.
  • The input signal, sampled at 16 kHz, is band-split with a QMF filter bank.
  • The low-pass filter H1(z) is a 64-tap symmetric FIR filter, and the high-pass filter H2(z) can be obtained from it by: $h_2(n) = (-1)^n h_1(n)$
  • the encoder only needs to encode to the low-band core layer or the low-band enhancement layer, then the DTX decision only needs to be done for the low-band component.
  • the wideband portion encodes the wide-band signal component with the time domain envelope and the frequency domain envelope using a simplified TDBWE encoding algorithm.
  • the time domain envelope is calculated by equation (27):
  • The frequency-domain envelope is calculated by equations (28), (29), (30), and (31). First, the wideband signal is windowed with a 128-tap Hanning window, whose window function is as shown in equation (11):
  • the signal after windowing is:
  • the short time frequency domain envelope is updated as follows:
  • The long-term time-domain envelope and frequency-domain envelope of the noise signal are also buffered in memory.
  • The long-term DTX decision metric of the wideband component of the current frame is given by equation (33):
  • the long time frequency domain envelope is updated as follows:
  • The second DTX decision method described in Embodiment 1 may also be used.
  • In that case, the changes in the characteristic parameters of the low-band component and the high-band component are judged jointly, and the results of the independent decisions are corrected accordingly.
  • The method provided in the foregoing embodiments makes comprehensive use of the noise characteristics across the speech codec bandwidth and uses band splitting and layered processing to give a comprehensive and reasonable DTX decision result in the noise coding stage, so that SID coding/CNG decoding follows the changes in the actual noise characteristics more closely.
  • Embodiment 5 of the present invention further provides a DTX decision device, as shown in FIG. 3, including:
  • The band dividing module 10 is configured to obtain band-split signals from the input signal; an input signal of a given sampling rate may be band-split with a QMF filter bank.
  • When the input signal is a narrowband signal, the band-split signal is a low-band signal, which further includes a low-band core layer signal, or a low-band core layer signal and a low-band enhancement layer signal.
  • When the input signal is a wideband signal, the band-split signals are a low-band signal and a high-band signal; the low-band signal further includes a low-band core layer signal and a low-band enhancement layer signal, and the high-band signal further includes a high-band core layer signal, or a high-band core layer signal and a high-band enhancement layer signal.
  • When the input signal is an ultra-wideband signal, the band-split signals are a low-band signal, a high-band signal, and an ultra-high-band signal; the low-band signal further includes a low-band core layer signal and a low-band enhancement layer signal, and the high-band signal further includes a high-band core layer signal and a high-band enhancement layer signal.
  • The decision module 30 is configured to perform a DTX decision according to the change amount of the feature information of each band-split signal acquired by the feature information change amount acquisition module 20.
  • the decision module 30 further includes:
  • The weighting decision sub-module 31 is configured to weight the feature information change amounts of the band-split signals acquired by the feature information change amount acquisition module 20 and to use the weighted result of this joint decision as the DTX decision criterion.
  • The band-wise decision sub-module 32 is configured to use the feature information change amount of each band-split signal acquired by the feature information change amount acquisition module 20 as the decision criterion for that band. When the decision results of the different bands are consistent, that result is used as the DTX decision result; when they are inconsistent, the weighting decision sub-module 31 is notified to perform a joint decision.
  • the structure of the feature information change amount acquisition module 20 is different depending on the processed signal.
  • When the input signal is a narrowband signal, the feature information change amount acquisition module 20 further includes: a low-band feature information change amount acquisition sub-module 21, configured to acquire the feature information change amount of the low-band signal.
  • Specifically, a linear prediction analysis model is used to obtain the feature information of the low-band signal, which includes the energy information and spectrum information of the low-band signal; the feature information change amount of the low-band signal is then obtained from the feature information of the low-band signal at the current time and at past times.
  • When the input signal is a wideband signal, the feature information change amount acquisition module 20 further includes: a low-band feature information change amount acquisition sub-module 21, configured to acquire the feature information change amount of the low-band signal; and a high-band feature information change amount acquisition sub-module 22, configured to acquire the feature information change amount of the high-band signal.
  • The TDBWE (time-domain bandwidth extension) coding algorithm is used to obtain the feature information of the high-band signal, which includes the time-domain envelope information and frequency-domain envelope information of the high-band signal.
  • The feature information change amount of the high-band signal is obtained from the feature information of the high-band signal at the current time and at past times.
  • When the input signal is an ultra-wideband signal, the feature information change amount acquisition module 20 further includes: a low-band feature information change amount acquisition sub-module 21, configured to acquire the feature information change amount of the low-band signal; a high-band feature information change amount acquisition sub-module 22, configured to acquire the feature information change amount of the high-band signal; and an ultra-high-band feature information change amount acquisition sub-module 23, configured to acquire the feature information change amount of the ultra-high-band signal.
  • TDBWE is used to obtain the feature information of the ultra-high-band signal, which includes the time-domain envelope information and frequency-domain envelope information of the ultra-high-band signal.
  • The feature information change amount of the ultra-high-band signal is obtained from the feature information of the ultra-high-band signal at the current time and at past times.
  • when the low-band signal includes a low-band core layer signal and a low-band enhancement layer signal, the structure of the low-band feature information variation acquisition sub-module 21 is as shown in FIG. 4, and further includes:
  • a low-band layering unit, configured to split the input low-band signal into a low-band core layer signal and a low-band enhancement layer signal, and to send them to the low-band core layer feature information variation acquisition unit and the low-band enhancement layer feature information variation acquisition unit respectively;
  • a low-band core layer feature information variation acquisition unit, configured to acquire the feature information variation of the low-band core layer signal;
  • a low-band enhancement layer feature information variation acquisition unit, configured to acquire the feature information variation of the low-band enhancement layer signal;
  • a low-band integration unit, configured to combine the feature information variation of the low-band core layer signal, acquired by the low-band core layer feature information variation acquisition unit, with the feature information variation of the low-band enhancement layer signal, acquired by the low-band enhancement layer feature information variation acquisition unit, into the low-band feature information variation;
  • a low-band control unit, configured to: when the low-band signal involves only the low-band core layer, use the output of the low-band core layer decision sub-module as the feature information variation of the low-band signal; when the sub-band signal reaches the low-band enhancement layer, use the output of the low-band integration unit as the feature information variation of the low-band signal.
  • a high-band layering unit, configured to split the input high-band signal into a high-band core layer signal and a high-band enhancement layer signal, and to send them to the high-band core layer feature information variation acquisition unit and the high-band enhancement layer feature information variation acquisition unit respectively;
  • a high-band core layer feature information variation acquisition unit, configured to acquire the feature information variation of the high-band core layer signal;
  • a high-band enhancement layer feature information variation acquisition unit, configured to acquire the feature information variation of the high-band enhancement layer signal;
  • a high-band integration unit, configured to combine the feature information variation of the high-band core layer signal, acquired by the high-band core layer feature information variation acquisition unit, with the feature information variation of the high-band enhancement layer signal, acquired by the high-band enhancement layer feature information variation acquisition unit, into the high-band feature information variation;
  • a high-band control unit, configured to: when the high-band signal involves only the high-band core layer, use the output of the high-band core layer decision sub-module as the feature information variation of the high-band signal; when the sub-band signal reaches the high-band enhancement layer, use the output of the high-band integration unit as the feature information variation of the high-band signal.
  • FIG. 5 shows one application scenario of the DTX decision device of FIG. 3.
  • the input signal is classified by the VAD as a speech frame or a silence frame (background noise frame); a speech frame follows the lower branch, where speech frame coding is performed and a speech frame bitstream is output; a silence frame (background noise frame) follows the upper branch, where noise coding is performed.
  • on this path, the DTX decision device provided in Embodiment 4 of the present invention determines whether the encoder codes and transmits the current noise frame.
  • FIG. 6 shows another application scenario of the DTX decision device of FIG. 3.
  • the input signal is classified by the VAD as a speech frame or a silence frame (background noise frame); a speech frame follows the lower branch, where speech frame coding is performed and a speech frame bitstream is output; a silence frame (background noise frame) follows the upper branch, where noise coding is performed.
  • on this path, the DTX decision device provided in Embodiment 4 of the present invention determines whether the encoder transmits the already coded noise frame data.
  • the noise characteristics across the whole speech codec bandwidth are used, and sub-band splitting and layered processing give a comprehensive and reasonable DTX decision result in the noise coding stage, so that SID coding/CNG decoding tracks the changes in the actual noise characteristics more closely.
  • the technical solution of the present invention may be embodied as a software product, which may be stored in a non-volatile storage medium (for example a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions that cause a computer device (for example a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Description

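A DTX Decision Method and Apparatus
Technical Field
The present invention relates to the field of signal processing, and in particular to a DTX (Discontinuous Transmission System) decision method and apparatus.
Background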
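Speech coding compresses the transmission bandwidth of speech signals and thereby increases the capacity of a communication system. Because only about 40% of a voice call actually contains speech, and the rest is silence or background noise, DTX/CNG (Comfort Noise Generation) was introduced to save further bandwidth. It lets the encoder use a different coding algorithm for background noise than for speech, which lowers the average bit rate. In short, when the encoder codes a background noise segment it does not code at full rate as it does for speech frames, and it does not code every noise frame. Instead, it sends a much smaller set of coding parameters (a SID frame) once every several frames. The decoder reconstructs continuous background noise from the parameters of these discontinuous noise frames without noticeably degrading subjective listening quality.
A discontinuously transmitted background noise frame is usually called a SID (Silence Insertion Descriptor) frame. A SID frame generally contains only spectral parameters and signal energy parameters. Unlike a speech frame, it carries no fixed codebook or adaptive codebook parameters, and SID frames are not transmitted continuously, so the average bit rate drops. During background noise coding, the extracted noise parameters are normally examined to decide whether a SID frame must be sent. This process is called the DTX (Discontinuous Transmission) decision. Its output is "1" or "0", meaning that a SID frame does or does not need to be sent. The DTX decision result also indicates whether the nature of the current noise has changed significantly.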
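G.729.1 is a new-generation speech codec standard recently released by the ITU. The defining feature of this embedded codec is layered coding: it provides narrowband to wideband audio quality at bit rates from 8 kb/s to 32 kb/s, and allows outer-layer bitstreams to be discarded during transmission according to channel conditions, which gives it good channel adaptability.
In G.729.1, scalability is achieved by structuring the bitstream as embedded layers. The core layer is coded with G.729, which makes G.729.1 a new embedded, layered, multi-rate speech codec. FIG. 1 shows the block diagram of the G.729.1 encoder across its layers. The input is a 20 ms superframe, which is 320 samples at a sampling rate of 16000 Hz. The input signal $s_{WB}(n)$ is first split into two sub-bands by QMF filtering ($H_1(z)$, $H_2(z)$). The lower sub-band signal $s_{LB}^{qmf}(n)$ is pre-processed by a high-pass filter with a 50 Hz cut-off, and the output $s_{LB}(n)$ is coded by a narrowband embedded CELP coder at 8 kb/s to 12 kb/s. The difference signal $d_{LB}(n)$ between $s_{LB}(n)$ and the local synthesis signal $\hat{s}_{enh}(n)$ of the CELP coder at 12 kb/s is passed through the perceptual weighting filter $W_{LB}(z)$, and the resulting signal $d_{LB}^{w}(n)$ is transformed to the frequency domain by MDCT. The weighting filter $W_{LB}(z)$ includes gain compensation to maintain spectral continuity between the filter output $d_{LB}^{w}(n)$ and the higher sub-band input signal $s_{HB}(n)$.
The higher sub-band component is multiplied by $(-1)^n$ to fold its spectrum. The folded signal $s_{HB}^{fold}(n)$ is pre-processed by a low-pass filter with a 3000 Hz cut-off, and the filtered signal $s_{HB}(n)$ is coded by the TDBWE coder. The signal $s_{HB}(n)$ that enters the TDAC coding module is likewise first transformed to the frequency domain by MDCT.
The two sets of MDCT coefficients, $D_{LB}^{w}(k)$ and $S_{HB}(k)$, are finally coded by TDAC. In addition, some parameters are transmitted by the FEC (frame erasure concealment) coder to reduce the errors caused by frame loss during transmission.
The full-rate bitstream produced by the G.729.1 encoder has 12 layers. The core layer runs at 8 kb/s and is a G.729 bitstream. The low-band enhancement layer runs at 12 kb/s and enhances the fixed codebook coding of the core layer. Both 12 kb/s and 8 kb/s correspond to the narrowband signal component. The 14 kb/s layer uses the TDBWE coder and corresponds to the wideband signal component. The layers from 16 kb/s to 32 kb/s are enhancement coding of the full-band signal.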
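In the AMR (Adaptive Multi-Rate) speech codec standard of 3GPP (the 3rd Generation Partnership Project), the DTX strategy works as follows. At the end of a speech segment, a SID_FIRST frame carrying only 1 bit of valid data marks the start of the noise segment. The first SID_UPDATE frame, which carries the actual noise information, is sent in the third frame after the SID_FIRST frame, and a SID_UPDATE frame is then sent at a fixed interval of once every 8 frames. Only SID_UPDATE frames contain coded comfort noise parameters.
The fixed-interval strategy in AMR cannot adapt SID transmission to the actual noise characteristics, so it cannot guarantee that SID frames are sent only when they are needed. In a real communication system this has two drawbacks. On the one hand, the noise characteristics may already have changed markedly, but because no SID frame has been sent, the decoder cannot obtain the changed noise information in time. On the other hand, by the time a SID frame is due, the noise characteristics may have been stable for a long period (more than 8 frames), so no SID frame is needed and bandwidth is wasted.
In the silence compression scheme defined for the ITU (International Telecommunication Union) speech coding standard G.729, the Conjugate-Structure Algebraic Code-Excited Linear Prediction vocoder, the encoder-side DTX strategy decides adaptively whether to send a SID frame according to changes in the narrowband noise parameters. The minimum interval between two consecutive SID frames is 20 ms, and there is no maximum. The drawback of this method is that it guides the DTX decision using only the energy and spectral parameters extracted from the narrowband signal and uses no information from the wideband component, so it may not give a comprehensive and appropriate DTX decision in wideband speech scenarios.
In addition, as wideband speech coders become more widely used and super-wideband technology develops, wideband vocoder standards with an embedded layered structure, such as G.729.1, have been published and are entering use. In such a layered wideband vocoder, neither the AMR DTX mechanism nor the ITU G.729 mechanism described above can make full use of the narrowband and wideband components of the noise. They may fail to give a DTX decision that fully reflects the actual nature of the noise, and therefore cannot realize the advantages of layered coding.
Summary of the Invention
Embodiments of the present invention provide a DTX decision method and apparatus that process the noise signal by sub-band and by layer to obtain a comprehensive and reasonable DTX decision result.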
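To achieve the above object, an embodiment of the present invention provides a DTX decision method that includes the following steps:
obtaining sub-band signals from the input signal;
obtaining the feature information variation of each sub-band signal;
making the DTX decision according to the feature information variation of each sub-band signal.
An embodiment of the present invention further provides a DTX decision apparatus, including:
a band-splitting module, configured to obtain sub-band signals from the input signal;
a feature information variation acquisition module, configured to obtain the feature information variation of each sub-band signal produced by the band-splitting module;
a decision module, configured to make the DTX decision according to the feature information variation of each sub-band signal obtained by the feature information variation acquisition module.
By using the noise characteristics across the whole speech codec bandwidth, and by processing the noise by sub-band and by layer, the noise coding stage yields a comprehensive and reasonable DTX decision result, so that SID coding/CNG decoding tracks the changes in the actual noise characteristics more closely.
Brief Description of the Drawings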
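FIG. 1 is a block diagram of the G.729.1 encoder across its layers in the prior art;
FIG. 2 is a flowchart of a DTX decision method in Embodiment 1 of the present invention;
FIG. 3 is a schematic structural diagram of a DTX decision apparatus in Embodiment 5 of the present invention;
FIG. 4 is a schematic structural diagram of the low-band feature information variation acquisition sub-module of the DTX decision apparatus in Embodiment 5 of the present invention;
FIG. 5 is a schematic diagram of a usage scenario of the DTX decision apparatus in Embodiment 5 of the present invention;
FIG. 6 is a schematic diagram of another usage scenario of the DTX decision apparatus in Embodiment 5 of the present invention.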
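Detailed Description
In Embodiment 1 of the present invention, a DTX decision method, as shown in FIG. 2, includes:
Step s101: Split the input signal into sub-bands.
In this step, when the input is a wideband signal, it can be split into two sub-bands, a low band and a high band. When the input is a super-wideband signal, it can be split in one pass into low-band, high-band and super-high-band signals, or it can first be split into a super-high-band signal and a wideband signal, after which the wideband signal is split into low-band and high-band signals. The low-band signal can be further divided into a low-band core layer signal and a low-band enhancement layer signal, and the high-band signal into a high-band core layer signal and a high-band enhancement layer signal. The split can be implemented with a QMF (Quadrature Mirror Filter) bank. The division criteria can be as follows: a narrowband signal occupies 0 to 4000 Hz, a wideband signal occupies 0 to 8000 Hz, and a super-wideband signal occupies 0 to 16000 Hz. A narrowband or low-band (wideband component) signal refers to 0 to 4000 Hz, a high-band (wideband component) signal refers to 4000 to 8000 Hz, and a super-high-band (super-wideband component) signal refers to 8000 to 16000 Hz.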
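The band definitions in step s101 can be written down as a small lookup. This is only an illustrative sketch: the dictionary names and the helper `sub_bands_for` are hypothetical, and the only facts taken from the text are the band edges and which sub-bands each input bandwidth yields.

```python
# Band edges in Hz, as stated in step s101.
SUB_BANDS = {
    "low": (0, 4000),
    "high": (4000, 8000),
    "super_high": (8000, 16000),
}

# Sub-bands produced for each kind of input signal (names are illustrative).
SPLIT = {
    "narrowband": ["low"],
    "wideband": ["low", "high"],
    "super_wideband": ["low", "high", "super_high"],
}


def sub_bands_for(bandwidth):
    """Return (name, (f_low, f_high)) for every sub-band of the given input."""
    return [(name, SUB_BANDS[name]) for name in SPLIT[bandwidth]]
```

For a wideband input, `sub_bands_for("wideband")` lists the low band and the high band with their edges.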
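Before this step, the method further includes the following. When the VAD (Voice Activity Detector) detects that the signal has changed from speech to noise, the coding algorithm enters the hangover stage. During hangover, the encoder still codes the input with the speech frame coding algorithm. Its main purpose is to estimate the noise characteristics and to initialize the subsequent noise coding algorithm. After the hangover stage ends, noise coding starts and the input signal is split into sub-bands.
Step s102: Obtain the feature information and the feature information variation of each sub-band signal.
Specifically, for the low-band signal, the feature information includes the energy information and spectral information of the low-band signal, which can be obtained with a linear prediction analysis model.
For the high-band and super-high-band signals, the feature information includes time-domain envelope information and frequency-domain envelope information, which can be obtained with the TDBWE (Time Domain Bandwidth Extension) coding algorithm.
Comparing the feature information obtained for a sub-band with the feature information obtained for that sub-band at a past time gives a measure of how much the signal in the sub-band has changed.
Step s103: Make the DTX decision according to the obtained feature information variations of the sub-band signals.
For a wideband signal, the low-band and high-band noise characteristic variation measures are combined to give the wideband DTX decision result. For a super-wideband signal, the wideband variation measure and the super-high-band variation measure are combined to give the DTX decision result for the whole super-wideband signal.
Assume that the full-rate coded information of the input noise signal is divided into a low-band core layer, a low-band enhancement layer, a high-band core layer, a high-band enhancement layer and a super-high-band layer, with increasing bit rates in that order. The noise layer structure can then be mapped to the actual coding rate.
If the actual coding involves only the low-band core layer, the DTX decision computes only the feature information variation of the low-band core layer. If the decision function value exceeds a given threshold, a SID frame is sent; otherwise it is not.
If the actual coding reaches the low-band enhancement layer, the DTX decision can be made jointly from the feature information variations of the low-band core layer and the low-band enhancement layer. If the decision function value exceeds a given threshold, a SID frame is sent; otherwise it is not.
If the actual coding reaches the high-band core layer, a combined DTX decision is made from the joint feature information variation of the low-band component and the feature information variation of the high-band core layer. If the decision function value exceeds a given threshold, a SID frame is sent; otherwise it is not.
If the actual coding reaches the high-band enhancement layer, a combined DTX decision is made from the joint feature information variation of the low-band component and the joint feature information variation of the wideband component. If the decision function value exceeds a given threshold, a SID frame is sent; otherwise it is not.
If the actual coding reaches the super-high band, the DTX decision can use the joint feature information variation of the full-band signal. If the decision function value exceeds a given threshold, a SID frame is sent; otherwise it is not.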
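The layer-dependent rule above amounts to "use every layer up to and including the highest one that is actually coded". A minimal sketch of that selection follows; the layer identifiers and both function names are hypothetical, and the threshold is left as a parameter because the text only says "a given threshold".

```python
# Layers in order of increasing bit rate, as listed in the text.
LAYERS = ["low_core", "low_enh", "high_core", "high_enh", "super_high"]


def layers_used(highest_coded_layer):
    """Layers whose feature variations feed the DTX decision."""
    idx = LAYERS.index(highest_coded_layer)
    return LAYERS[: idx + 1]


def should_send_sid(decision_value, threshold):
    """A SID frame is sent only when the decision value exceeds the threshold."""
    return decision_value > threshold
```

For example, `layers_used("high_core")` returns the two low-band layers plus the high-band core layer, which matches the case where coding reaches the high-band core layer.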
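Based on the above, the feature information variation of the full-band signal can be expressed by Equation (1):
$$J = \alpha J_1 + \beta J_2 + \gamma J_3 \qquad (1)$$
This equation gives the first DTX decision method. Here $\alpha + \beta + \gamma = 1$, and $J_1$, $J_2$ and $J_3$ are the computed feature information variations of the low band, the high band and the super-high band respectively. The DTX decision rule is given by Equation (2): when $J > 1$, the DTX decision output dtx_flag is 1, meaning that the coded noise frame information must be transmitted; otherwise dtx_flag is 0, meaning that it need not be transmitted:
$$dtx\_flag = \begin{cases} 1, & J > 1 \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
When coding only reaches the low-band core layer or the low-band enhancement layer, Equation (1) reduces to
$$J = J_1 \qquad (3)$$
When coding reaches the high-band core layer or the high-band enhancement layer, Equation (1) reduces to
$$J = \alpha J_1 + \beta J_2 \qquad (4)$$
where $\alpha + \beta = 1$.
Other DTX decision schemes can also be used, such as the following second DTX decision method.
Let $J_1$, $J_2$ and $J_3$ again denote the computed feature information variations of the low band, the high band and the super-high band:
When coding reaches the low-band core layer or the low-band enhancement layer, $J_1$ is used as the DTX decision criterion, as in Equation (3).
When coding reaches the high-band core layer or the high-band enhancement layer, $J_1$ and $J_2$ are used as the DTX decision criteria. When $J_1$ and $J_2$ are both less than 1, the DTX decision output dtx_flag is 0, meaning that the coded noise frame information need not be transmitted. When $J_1$ and $J_2$ are both greater than 1, dtx_flag is 1, meaning that it must be transmitted. When $J_1$ and $J_2$ are not both greater than 1 or both less than 1, $J = \alpha J_1 + \beta J_2$ from Equation (4) is used as the DTX decision criterion.
When coding reaches the super-high band, $J_1$, $J_2$ and $J_3$ are used as the DTX decision criteria. When $J_1$, $J_2$ and $J_3$ are all less than 1, dtx_flag is 0, meaning that the coded noise frame information need not be transmitted. When $J_1$, $J_2$ and $J_3$ are all greater than 1, dtx_flag is 1, meaning that it must be transmitted. When $J_1$, $J_2$ and $J_3$ are not all greater than 1 or all less than 1, $J = \alpha J_1 + \beta J_2 + \gamma J_3$ from Equation (1) is used as the DTX decision criterion.
Either of the two methods can be used to produce the DTX decision output.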
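The two decision methods can be sketched side by side. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the per-band variations and weights are passed in as plain lists (one entry per band that is actually coded), and the weights are required to sum to 1 as the text states.

```python
def weighted_decision(variations, weights, threshold=1.0):
    """First method: weighted sum of per-band variations compared with 1."""
    if len(variations) != len(weights):
        raise ValueError("one weight per band is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    j = sum(w * v for w, v in zip(weights, variations))
    return 1 if j > threshold else 0


def per_band_decision(variations, weights):
    """Second method: agree per band first, fall back to the weighted sum."""
    if all(v > 1.0 for v in variations):
        return 1  # every band says "send a SID frame"
    if all(v < 1.0 for v in variations):
        return 0  # every band says "do not send"
    # The bands disagree: use the weighted joint decision.
    return weighted_decision(variations, weights)
```

With a single low-band variation and weight `[1.0]`, both functions reduce to the low-band-only case $J = J_1$.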
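The implementation of the embodiments of the present invention is described further below with reference to specific application scenarios.
Embodiment 2 of the present invention illustrates an implementation of the DTX decision method, taking the DTX decision for an input wideband signal as an example.
The structure of the SID frame used in this embodiment is shown in Table 1:
Table 1: Bit allocation of the SID frame
[Table image imgf000010_0001; not reproduced in the text]
The system operates at a 16 kHz sampling rate with an input signal bandwidth of 8 kHz. The full-rate SID frame contains 3 layers: the low-band core layer, the low-band enhancement layer and the high-band core layer. The coding parameters of the low-band core layer are broadly similar to the SID frame coding parameters of G.729 Annex B: the energy parameter is quantized with 5 bits and the spectral parameter (LSF) with 10 bits. The low-band enhancement layer further quantizes the quantization errors of the energy and spectral parameters of the low-band core layer. That is, the energy gets a second quantization stage and the spectrum a third quantization stage, using 3 bits for the second-stage energy quantization and 6 bits for the third-stage spectral quantization. The high-band core layer uses coding parameters similar to those of the TDBWE algorithm in G.729.1, except that the 16-point time-domain envelope is reduced to a single time-domain energy gain quantized with 6 bits. The frequency-domain envelope still has 12 values, split into 3 vectors quantized with 14 bits in total.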
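The input signal is first split into two sub-bands, low and high. The low band covers 0 to 4 kHz and the high band covers 4 kHz to 8 kHz. Specifically, a QMF filter bank splits the input signal $s_{WB}(n)$, sampled at 16 kHz. The low-pass filter $H_1(z)$ is a symmetric 64-tap FIR filter, and the high-pass filter $H_2(z)$ can be derived from $H_1(z)$:
$$h_2(n) = (-1)^n h_1(n) \qquad (5)$$
The narrowband (low-band) component is then given by Equation (6):
$$y_l(n) = \sum_{j=0}^{31} h_1(j)\,\bigl[s_{WB}(n+1+j) + s_{WB}(n-j)\bigr] \qquad (6)$$
The wideband (high-band) component is given by Equation (7):
$$y_h(n) = \sum_{j=0}^{31} h_2(j)\,\bigl[s_{WB}(n+1+j) + s_{WB}(n-j)\bigr] \qquad (7)$$
LPC analysis is performed on the low-band component $y_l(n)$ to obtain the LPC coefficients $a_i$ ($i = 1 \ldots M$), where $M$ is the order of the LPC analysis, and the residual energy parameter $E$. The buffer holds the quantized LPC coefficients $a_{sid}(i)$ and the residual energy $E_{sid}^q$ of the previous SID frame.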
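The two-band split can be illustrated with a short NumPy sketch. It is an assumption-laden illustration rather than the codec's filter bank: the low-pass prototype here is a generic Hann-windowed half-band sinc, not the G.729.1 coefficients, and the function name `qmf_split` is hypothetical. What it does share with the text is the relation $h_2(n) = (-1)^n h_1(n)$ between the two filters, the 64-tap length, and decimation by 2 after filtering.

```python
import numpy as np


def qmf_split(x, taps=64):
    """Split x into decimated low-band and high-band signals."""
    n = np.arange(taps)
    # Illustrative half-band low-pass prototype (cut-off at a quarter of fs).
    h1 = 0.5 * np.sinc(0.5 * (n - (taps - 1) / 2)) * np.hanning(taps)
    # High-pass filter obtained by alternating the sign of the taps.
    h2 = ((-1.0) ** n) * h1
    low = np.convolve(x, h1)[::2]   # filter, then keep every second sample
    high = np.convolve(x, h2)[::2]
    return low, high
```

Feeding a 1 kHz tone sampled at 16 kHz puts most of the energy in the low output, while a 6 kHz tone ends up mostly in the high output.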
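If the encoder only needs to code up to the low-band core layer or the low-band enhancement layer, the DTX decision only has to be made for the low-band component.
The low-band variation $J_1$ is computed with Equation (8):
$$J_1 = w_1 \cdot \frac{\lvert E_t^q - E_{sid}^q \rvert}{thr1} + w_2 \cdot \frac{\sum_{i=0}^{M} R^{sid}(i)\, R^{t}(i)}{E_t^q \cdot thr2} \qquad (8)$$
where $w_1$ and $w_2$ are the weighting coefficients for the energy variation and the spectral variation, $E_t^q$ and $E_{sid}^q$ are the quantized energy parameters of the current frame and of the previous SID frame, $R^t(i)$ are the autocorrelation coefficients of the narrowband component of the current frame, and $thr1$ and $thr2$ are the thresholds for the variation of the energy parameter and of the spectral parameter. These thresholds reflect how sensitive the human ear is to changes in energy and spectrum. $M$ is the order of the linear prediction, and $R^{sid}(i)$ is computed from the quantized LPC coefficients of the previous SID frame using Equation (9):
[Equation (9): image imgf000012_0001; not reproduced in the text]
The variation of the low-band signal can thus be computed with Equation (8), and the DTX decision result is obtained with Equations (3) and (2).
In this embodiment the low-band core layer and the low-band enhancement layer use exactly the same parameters, and the enhancement layer merely quantizes the core-layer parameters further. So if the coding rate reaches the low-band enhancement layer, the DTX decision procedure is essentially the same as in Equations (8) and (9), except that the energy and spectral parameters used are the quantization results of the enhancement layer. The decision procedure is not repeated here.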
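If the encoder needs to code the high-band core layer, then in addition to computing $J_1$ according to Equation (8), the wideband (high-band) variation $J_2$ must also be computed. For the wideband part, a simplified TDBWE coding algorithm extracts and codes the time-domain envelope and frequency-domain envelope of the wideband signal component. The time-domain envelope is computed with Equation (10):
$$T_{env} = \frac{1}{2} \log_2 \left( \sum_{n=0}^{N-1} y_h^2(n) \right) \qquad (10)$$
where $N$ is the frame length; in G.729.1, $N = 160$.
The frequency-domain envelope is computed with Equations (11), (12), (13) and (14). First, the wideband signal is windowed with a 128-tap Hanning window, whose window function is given by Equation (11):
[Equation (11): image imgf000012_0002; not reproduced in the text]
The windowed signal is:
$$y_h^w(n) = y_h(n) \cdot w_F(n + 31), \quad n = -31, \ldots, 96 \qquad (12)$$
A 128-point FFT is applied to the windowed signal, implemented with a polyphase structure:
$$Y(k) = \mathrm{FFT}_{64}\bigl(y_h^w(n) + y_h^w(n + 64)\bigr), \quad k = 0, \ldots, 63;\; n = -31, \ldots, 32 \qquad (13)$$
The weighted frequency-domain envelope is then computed from the FFT coefficients:
[Equation (14): image imgf000013_0001; not reproduced in the text]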
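The folded 64-point FFT used above relies on a standard identity: adding the second half of a 128-sample frame onto the first half and taking a 64-point FFT yields exactly the even-indexed bins of the 128-point DFT. A short NumPy check of that identity follows. The function name is hypothetical, and array index 0 is assumed to correspond to $n = -31$ in the text's indexing, which only changes the phase of the bins, not their magnitudes.

```python
import numpy as np


def even_bins_via_folding(frame):
    """Even-indexed bins of a 128-point DFT from one 64-point FFT."""
    frame = np.asarray(frame, dtype=float)
    if frame.shape != (128,):
        raise ValueError("expected a 128-sample windowed frame")
    folded = frame[:64] + frame[64:]  # y(n) + y(n + 64)
    return np.fft.fft(folded)         # 64 complex coefficients
```

Halving the transform size this way is cheaper than a full 128-point FFT when only 64 spectral values are needed for the envelope.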
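The quantized time-domain envelope $T_{env}^q$ and frequency-domain envelope $F_{env}^q$ of the previous SID frame are buffered in memory. The variation $J_2$ of the wideband component of the current frame relative to the previous SID frame can then be computed with Equation (15a) or (15b):
[Equations (15a) and (15b): image imgf000013_0002; not reproduced in the text]
Once the narrowband variation $J_1$ and the wideband variation $J_2$ have been obtained, their joint variation can be found with Equation (4). The decision rule of Equation (2) then determines whether a SID frame must be coded and sent for the current frame.
Embodiment 3 of the present invention illustrates an implementation of the DTX decision method, taking the DTX decision for an input super-wideband signal as an example.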
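The signal processed in this embodiment is sampled at 32 kHz and is split into low-band, high-band and super-high-band noise components. The split can be implemented with a tree structure: one QMF stage separates the super-high band from the wideband signal, and a second QMF stage separates the wideband signal into low-band and high-band signals. Alternatively, a filter bank with sub-bands of unequal width can split the input directly into low-band, high-band and super-high-band components. The tree-structured splitter clearly scales better. The narrowband and wideband signals obtained from the split can be fed to the system of Embodiment 2 for the wideband DTX decision, which yields the wideband noise feature information variation measure $J$ of Equation (4). In this embodiment, the super-wideband noise feature information variation $J_s$ is combined with the wideband $J$ to give the full-band noise feature variation measure $J_a$, as shown in Equation (16):
$$J_a = \gamma \cdot J + (\ldots) \cdot J_s \qquad (16)$$
The DTX decision is made with the full-band noise feature variation measure $J_a$ and outputs the full-band DTX decision result dtx_flag, as shown in Equation (17):
[Equation (17): image imgf000014_0001; not reproduced in the text]
The super-high-band noise characteristic variation measure $J_s$ is described next. The low-band and high-band parts of the SID frame used in this embodiment have the structure shown in Table 1 and are not described again. The structure of the super-high-band part is shown in Table 2:
Table 2: Super-high-band bit allocation of the SID frame
[Table image imgf000014_0004; not reproduced in the text]
The time-domain energy envelope of the super-high band is computed with Equation (19):
[Equation (19): image imgf000014_0002; not reproduced in the text]
where $N$ is 320 for 20 ms frame processing and $y_s$ is the super-high-band signal. The frequency-domain envelope $Fenv_s(j)$ is computed in much the same way as the high-band frequency-domain envelope. The difference is that the spectral width is different, so the number of frequency-domain envelope points can also be different, as shown in Equation (20):
[Equation (20): image imgf000014_0003; not reproduced in the text]
where $Y_s$ is the super-high-band spectrum, which can be computed with an FFT (Fast Fourier Transform) or an MDCT (Modified Discrete Cosine Transform). Equation (20) takes a spectral width of 320 bins as an example and computes the frequency-domain envelope over 280 bins covering 8 kHz to 14 kHz. For convenient quantization, the frequency-domain envelope can again be split into 3 sub-vectors.
The quantized super-high-band time-domain envelope and frequency-domain envelope of the previous SID frame are buffered in memory. The variation of the super-high-band component of the current frame relative to the previous SID frame can then be computed with Equation (21a) or (21b):
[Equations (21a) and (21b): images imgf000015_0001 and imgf000015_0002; not reproduced in the text]
The full-band noise feature variation measure is then computed with Equation (16), and the decision rule of Equation (17) determines whether a SID frame must be coded and sent for the current frame.
The DTX decision procedures in Embodiments 2 and 3 above both use the first DTX decision method described in step s103 of Embodiment 1. Embodiments 2 and 3 can equally use the second DTX decision method described in step s103 of Embodiment 1. The decision procedure is similar to the one described in Embodiments 2 and 3 and is not repeated here.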
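Embodiment 4 of the present invention illustrates an implementation of the DTX decision method, taking the DTX decision for an input wideband signal as an example.
The structure of the SID frame used in this embodiment is shown in Table 3:
Table 3: Bit allocation of the SID frame
[First part of the table: image imgf000015_0003; not reproduced in the text]
Third-stage LSF quantization vector: 6 bits
High-band core layer, wideband component time-domain envelope: 6 bits
High-band core layer, wideband component frequency-domain envelope vector 1: 5 bits
High-band core layer, wideband component frequency-domain envelope vector 2: 5 bits
High-band core layer, wideband component frequency-domain envelope vector 3: 4 bits
The system operates at a 16 kHz sampling rate with an input signal bandwidth of 8 kHz. The full-rate SID frame contains 3 layers: the low-band core layer, the low-band enhancement layer and the high-band core layer. The coding parameters of the low-band core layer are broadly similar to the SID frame coding parameters of G.729 Annex B: the energy parameter is quantized with 5 bits and the spectral parameter (LSF) with 10 bits. The low-band enhancement layer further quantizes the quantization errors of the energy and spectral parameters of the low-band core layer. That is, the energy gets a second quantization stage and the spectrum a third quantization stage, using 3 bits for the second-stage energy quantization and 6 bits for the third-stage spectral quantization. The high-band core layer uses coding parameters similar to those of the TDBWE algorithm in G.729.1, except that the 16-point time-domain envelope is reduced to a single time-domain energy gain quantized with 6 bits. The frequency-domain envelope still has 12 values, split into 3 vectors quantized with 14 bits in total.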
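The input signal is first split into two sub-bands, low and high. The low band covers 0 to 4 kHz and the high band covers 4 kHz to 8 kHz. Specifically, a QMF filter bank splits the input signal $s_{WB}(n)$, sampled at 16 kHz. The low-pass filter $H_1(z)$ is a symmetric 64-tap FIR filter, and the high-pass filter $H_2(z)$ can be derived from $H_1(z)$:
$$h_2(n) = (-1)^n h_1(n) \qquad (22)$$
The narrowband (low-band) component is then given by Equation (23):
$$y_l(n) = \sum_{j=0}^{31} h_1(j)\,\bigl[s_{WB}(n+1+j) + s_{WB}(n-j)\bigr] \qquad (23)$$
The wideband (high-band) component is given by Equation (24):
$$y_h(n) = \sum_{j=0}^{31} h_2(j)\,\bigl[s_{WB}(n+1+j) + s_{WB}(n-j)\bigr] \qquad (24)$$
LPC analysis is performed on the low-band component $y_l(n)$ to obtain the LPC coefficients $a_i$ ($i = 1 \ldots M$), where $M$ is the order of the LPC analysis, and the residual energy parameter $E$. The buffer holds the quantized LPC coefficients $a_{sid}(i)$ and the residual energy $E_{sid}^q$ of the previous SID frame.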
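If the encoder only needs to code up to the low-band core layer or the low-band enhancement layer, the DTX decision only has to be made for the low-band component.
The DTX decision result for the low-band component is obtained with Equation (25):
$$dtx\_nb = \begin{cases} 1, & \lvert E_t^q - E_{sid}^q \rvert > thr1 \ \text{ or } \ \sum_{i=0}^{M} R^{sid}(i) \cdot R^{t}(i) > E_t^q \cdot thr2 \\ 0, & \text{otherwise} \end{cases} \qquad (25)$$
where $w_1$ and $w_2$ denote the weighting coefficients for the energy variation and the spectral variation, and $E_t^q$ and $E_{sid}^q$ are the quantized energy parameters of the current frame and of the previous SID frame. If the current coding rate covers only the low-band core layer, the core-layer quantization results are used. If the current coding rate reaches the low-band enhancement layer or higher, the enhancement-layer quantization results are used. $R^t(i)$ are the autocorrelation coefficients of the narrowband component of the current frame, and $thr1$ and $thr2$ are the thresholds for the variation of the energy parameter and of the spectral parameter. These thresholds reflect how sensitive the human ear is to changes in energy and spectrum. $M$ is the order of the linear prediction, and $R^{sid}(i)$ is computed from the quantized LPC coefficients of the previous SID frame using Equation (26):
[Equation (26): images imgf000017_0001 and imgf000017_0002; not reproduced in the text]
If the encoder needs to code the high-band core layer, a simplified TDBWE coding algorithm extracts and codes the time-domain envelope and frequency-domain envelope of the wideband signal component. The time-domain envelope is computed with Equation (27):
$$T_{env} = \frac{1}{2} \log_2 \left( \sum_{n=0}^{N-1} y_h^2(n) \right) \qquad (27)$$
where $N$ is the frame length; in G.729.1, $N = 160$.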
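The frequency-domain envelope is computed with Equations (28), (29), (30) and (31). First, the wideband signal is windowed with a 128-tap Hanning window, whose window function is given by Equation (28):
[Equation (28): image imgf000018_0001; not reproduced in the text]
The windowed signal is:
$$y_h^w(n) = y_h(n) \cdot w_F(n + 31), \quad n = -31, \ldots, 96 \qquad (29)$$
A 128-point FFT is applied to the windowed signal, implemented with a polyphase structure:
$$Y(k) = \mathrm{FFT}_{64}\bigl(y_h^w(n) + y_h^w(n + 64)\bigr), \quad k = 0, \ldots, 63;\; n = -31, \ldots, 32 \qquad (30)$$
The weighted frequency-domain envelope is then computed from the FFT coefficients:
[Equation (31): image imgf000018_0002; not reproduced in the text]
The short-term time-domain envelope $Tenv_{st}$ and frequency-domain envelope $Fenv_{st}(i)$ of the noise signal are buffered in memory. The short-term DTX decision for the wideband component of the current frame is given by Equation (32):
$$dtx\_wb_{st} = \begin{cases} 1, & \ldots \ \text{ or } \ \ldots\,\lvert Fenv(i) - Fenv_{st}(i) \rvert > thr4 \\ 0, & \text{otherwise} \end{cases} \qquad (32)$$
[The remaining terms of Equation (32) are in image imgf000018_0003; not reproduced in the text]
The short-term time-domain envelope is updated as follows:
$$Tenv_{st} = \rho \times Tenv_{st} + (1 - \rho) \times Tenv$$
The short-term frequency-domain envelope is updated as follows:
$$Fenv_{st}(i) = \rho \times Fenv_{st}(i) + (1 - \rho) \times Fenv(i)$$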
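Both short-term updates are first-order recursive (exponential) smoothing. A minimal sketch follows; the function names are hypothetical, and the default `rho=0.9` is only a placeholder because the text does not give a value for the smoothing factor.

```python
def smooth(previous, current, rho):
    """One step of first-order recursive smoothing."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * previous + (1.0 - rho) * current


def update_short_term(tenv_st, fenv_st, tenv, fenv, rho=0.9):
    """Update the buffered short-term time and frequency envelopes."""
    new_tenv = smooth(tenv_st, tenv, rho)
    new_fenv = [smooth(p, c, rho) for p, c in zip(fenv_st, fenv)]
    return new_tenv, new_fenv
```

A larger `rho` makes the buffered envelopes follow the current frame more slowly, so the short-term decision reacts to changes that persist rather than to single-frame fluctuations.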
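The long-term time-domain envelope $Tenv_{lt}$ and frequency-domain envelope $Fenv_{lt}(i)$ of the noise signal are also buffered in memory. The long-term DTX decision for the wideband component of the current frame is given by Equation (33):
$$dtx\_wb_{lt} = \begin{cases} 1, & \ldots \ \text{ or } \ \ldots\,\lvert Fenv(i) - Fenv_{lt}(i) \rvert > thr6 \\ 0, & \text{otherwise} \end{cases} \qquad (33)$$
[The remaining terms of Equation (33) are in image imgf000018_0004; not reproduced in the text]
After the short-term DTX decision and the long-term DTX decision for the wideband component have been obtained, the combined decision dtx_wb for the wideband component is obtained as follows:
[Equation: image imgf000018_0005; not reproduced in the text]
When $dtx\_wb = 1$, the long-term time-domain envelope is updated as follows:
$$Tenv_{lt} = \psi \times Tenv_{lt} + (1 - \psi) \times Tenv$$
The long-term frequency-domain envelope is updated as follows:
$$Fenv_{lt}(i) = \psi \times Fenv_{lt}(i) + (1 - \psi) \times Fenv(i)$$
If $dtx\_wb = dtx\_nb$, then $dtx\_flag = dtx\_wb = dtx\_nb$. Otherwise a combined decision is required, as follows:
First, the low-band variation $J_1$ is found with the method of Equation (8). Next, the high-band variation $J_2$ is found with the method of Equation (15a) or (15b). The joint low-band and high-band variation $J$ is then found with Equation (4). Finally, the decision rule of Equation (2) gives the final DTX decision result dtx_flag.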
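The final step of this embodiment can be sketched as a small function: when the independent narrowband and wideband decisions agree, that shared value is the result; when they disagree, the weighted joint variation is compared with 1. The function name and the default `alpha=0.5` are assumptions for illustration; the text only requires that the two weights sum to 1.

```python
def final_dtx_flag(dtx_nb, dtx_wb, j1, j2, alpha=0.5):
    """Combine the per-band decisions into the final dtx_flag."""
    if dtx_nb == dtx_wb:
        return dtx_nb                      # the two bands agree
    j = alpha * j1 + (1.0 - alpha) * j2    # joint variation, weights sum to 1
    return 1 if j > 1.0 else 0             # threshold rule on the joint value
```

Note that `j1` and `j2` are only needed on the disagreement path, so an implementation could defer computing them until the two flags differ.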
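This embodiment can also use the second DTX decision method described in Embodiment 1: the low band and the high band are first decided independently, and if the two independent results disagree, a joint decision based on the variations of the feature parameters of the low-band and high-band components corrects the independent results.
The method provided by the above embodiments uses the noise characteristics across the whole speech codec bandwidth. By processing the noise by sub-band and by layer, it yields a comprehensive and reasonable DTX decision result in the noise coding stage, so that SID coding/CNG decoding tracks the changes in the actual noise characteristics more closely.
Embodiment 5 of the present invention further provides a DTX decision apparatus which, as shown in FIG. 3, includes:
A band-splitting module 10, configured to obtain sub-band signals from the input signal. A QMF filter bank can be used to split the input signal at a given sampling rate. When the signal is a narrowband signal, the sub-band signal is a low-band signal, which further includes a low-band core layer signal, or a low-band core layer signal and a low-band enhancement layer signal. When the signal is a wideband signal, the sub-band signals are a low-band signal and a high-band signal. The low-band signal further includes a low-band core layer signal and a low-band enhancement layer signal, and the high-band signal further includes a high-band core layer signal, or a high-band core layer signal and a high-band enhancement layer signal. When the signal is a super-wideband signal, the sub-band signals are a low-band signal, a high-band signal and a super-high-band signal. The low-band signal further includes a low-band core layer signal and a low-band enhancement layer signal, and the high-band signal further includes a high-band core layer signal and a high-band enhancement layer signal.
A feature information variation acquisition module 20, configured to obtain the feature information variation of each sub-band signal produced by the band-splitting module.
A decision module 30, configured to make the DTX decision according to the feature information variation of each sub-band signal obtained by the feature information variation acquisition module 20. The decision module 30 further includes:
A weighted decision sub-module 31, configured to weight the feature information variation of each sub-band signal obtained by the feature information variation acquisition module 20 and to make a joint decision on the weighted result, which serves as the DTX decision criterion.
A per-band decision sub-module 32, configured to use the feature information variation of each sub-band signal obtained by the feature information variation acquisition module 20 as the decision criterion for that sub-band signal. When the decision results of the different sub-band signals agree, that result is used as the DTX decision criterion. When they disagree, the weighted decision sub-module is notified to make a joint decision.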
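Specifically, the structure of the feature information variation acquisition module 20 depends on the signal being processed.
For a low-band signal, the feature information variation acquisition module 20 further includes a low-band feature information variation acquisition sub-module 21, configured to obtain the feature information variation of the low-band signal. Specifically, a linear prediction analysis model is used to obtain the feature information of the low-band sub-band signal, which includes the energy information and spectral information of the low-band signal. The feature information variation of the low-band signal is obtained from the feature information of the low-band signal at the current time and at a past time.
For a wideband signal, the feature information variation acquisition module 20 further includes: a low-band feature information variation acquisition sub-module 21, configured to obtain the feature information variation of the low-band signal; and a high-band feature information variation acquisition sub-module 22, configured to obtain the feature information variation of the high-band signal. Specifically, the time-domain bandwidth extension (TDBWE) coding algorithm is used to obtain the feature information of the high-band signal, which includes the time-domain envelope information and frequency-domain envelope information of the high-band signal. The feature information variation of the high-band signal is obtained from the feature information of the high-band signal at the current time and at a past time.
For a super-wideband signal, the feature information variation acquisition module 20 further includes: a low-band feature information variation acquisition sub-module 21, configured to obtain the feature information variation of the low-band signal; a high-band feature information variation acquisition sub-module 22, configured to obtain the feature information variation of the high-band signal; and a super-high-band feature information variation acquisition sub-module 23, configured to obtain the feature information variation of the super-high-band signal. Specifically, the time-domain bandwidth extension (TDBWE) coding algorithm is used to obtain the feature information of the super-high-band signal, which includes the time-domain envelope information and frequency-domain envelope information of the super-high-band signal. The feature information variation of the super-high-band signal is obtained from the feature information of the super-high-band signal at the current time and at a past time.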
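Specifically, when the low-band signal further includes a low-band core layer signal and a low-band enhancement layer signal, the structure of the low-band feature information variation acquisition sub-module 21 is as shown in FIG. 4 and further includes:
a low-band layering unit, configured to split the input low-band signal into a low-band core layer signal and a low-band enhancement layer signal, and to send them to the low-band core layer feature information variation acquisition unit and the low-band enhancement layer feature information variation acquisition unit respectively;
a low-band core layer feature information variation acquisition unit, configured to obtain the feature information variation of the low-band core layer signal;
a low-band enhancement layer feature information variation acquisition unit, configured to obtain the feature information variation of the low-band enhancement layer signal;
a low-band integration unit, configured to combine the feature information variation of the low-band core layer signal obtained by the low-band core layer feature information variation acquisition unit with the feature information variation of the low-band enhancement layer signal obtained by the low-band enhancement layer feature information variation acquisition unit into the low-band feature information variation;
a low-band control unit, configured to use the output of the low-band core layer decision sub-module as the feature information variation of the low-band signal when the low-band signal involves only the low-band core layer, and to use the output of the low-band integration unit as the feature information variation of the low-band signal when the sub-band signal reaches the low-band enhancement layer.
Specifically, when the high-band signal further includes a high-band core layer signal and a high-band enhancement layer signal, the structure of the high-band feature information variation acquisition sub-module 22 is similar to that of the low-band feature information variation acquisition sub-module 21 shown in FIG. 4 and further includes:
a high-band layering unit, configured to split the input high-band signal into a high-band core layer signal and a high-band enhancement layer signal, and to send them to the high-band core layer feature information variation acquisition unit and the high-band enhancement layer feature information variation acquisition unit respectively;
a high-band core layer feature information variation acquisition unit, configured to obtain the feature information variation of the high-band core layer signal;
a high-band enhancement layer feature information variation acquisition unit, configured to obtain the feature information variation of the high-band enhancement layer signal;
a high-band integration unit, configured to combine the feature information variation of the high-band core layer signal obtained by the high-band core layer feature information variation acquisition unit with the feature information variation of the high-band enhancement layer signal obtained by the high-band enhancement layer feature information variation acquisition unit into the high-band feature information variation;
a high-band control unit, configured to use the output of the high-band core layer decision sub-module as the feature information variation of the high-band signal when the high-band signal involves only the high-band core layer, and to use the output of the high-band integration unit as the feature information variation of the high-band signal when the sub-band signal reaches the high-band enhancement layer.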
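FIG. 5 shows one application scenario of the DTX decision apparatus of FIG. 3. The input signal is classified by the VAD as a speech frame or a silence frame (background noise frame). A speech frame follows the lower branch, where speech frame coding is performed and a speech frame bitstream is output. A silence frame (background noise frame) follows the upper branch, where noise coding is performed. On this path, the DTX decision apparatus provided in Embodiment 4 of the present invention determines whether the encoder codes and transmits the current noise frame.
FIG. 6 shows another application scenario of the DTX decision apparatus of FIG. 3. The input signal is classified by the VAD as a speech frame or a silence frame (background noise frame). A speech frame follows the lower branch, where speech frame coding is performed and a speech frame bitstream is output. A silence frame (background noise frame) follows the upper branch, where noise coding is performed. On this path, the DTX decision apparatus provided in Embodiment 4 of the present invention determines whether the encoder transmits the already coded noise frame data.
The apparatus provided by the above embodiments uses the noise characteristics across the whole speech codec bandwidth. By processing the noise by sub-band and by layer, it yields a comprehensive and reasonable DTX decision result in the noise coding stage, so that SID coding/CNG decoding tracks the changes in the actual noise characteristics more closely.
From the description of the above embodiments, those skilled in the art will clearly understand that the present invention can be implemented in hardware, or in software together with the necessary general-purpose hardware platform. On that understanding, the technical solution of the present invention may be embodied as a software product, which may be stored in a non-volatile storage medium (for example a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions that cause a computer device (for example a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.
In short, the above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.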

Claims

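Claims
1. A DTX decision method, characterized by comprising:
obtaining sub-band signals from an input signal;
obtaining a feature information variation of each of the sub-band signals;
making a DTX decision according to the feature information variation of each of the sub-band signals.
2. The DTX decision method according to claim 1, characterized in that, before the obtaining of sub-band signals from the input signal, the method further comprises:
after detecting that the signal has changed from speech to noise, obtaining the characteristics of the noise and initializing the subsequent DTX decision.
3. The DTX decision method according to claim 1, characterized in that the signal is a narrowband signal and the sub-band signal is a low-band signal.
4. The DTX decision method according to claim 3, characterized in that
the low-band signal further comprises a low-band core layer signal; or
the low-band signal further comprises a low-band core layer signal and a low-band enhancement layer signal.
5. The DTX decision method according to claim 1, characterized in that the signal is a wideband signal and the sub-band signals are a low-band signal and a high-band signal.
6. The DTX decision method according to claim 5, characterized in that
the low-band signal further comprises a low-band core layer signal, or the low-band signal further comprises a low-band core layer signal and a low-band enhancement layer signal;
the high-band signal further comprises a high-band core layer signal, or the high-band signal further comprises a high-band core layer signal and a high-band enhancement layer signal.
7. The DTX decision method according to claim 1, characterized in that the signal is a super-wideband signal and the sub-band signals are a low-band signal, a high-band signal and a super-high-band signal.
8. The DTX decision method according to claim 7, characterized in that
the low-band signal further comprises a low-band core layer signal, or the low-band signal further comprises a low-band core layer signal and a low-band enhancement layer signal;
the high-band signal further comprises a high-band core layer signal, or the high-band signal further comprises a high-band core layer signal and a high-band enhancement layer signal.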
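9. The DTX decision method according to any one of claims 3 to 8, characterized in that, when the sub-band signal is a low-band signal, obtaining the feature information of the sub-band signal comprises:
using a linear prediction analysis model to obtain the feature information of the sub-band signal, the feature information comprising energy information and spectral information of the low-band signal.
10. The DTX decision method according to any one of claims 5 to 8, characterized in that, when the sub-band signal is a high-band signal or a super-wideband signal, obtaining the feature information of the sub-band signal comprises:
using the time-domain bandwidth extension (TDBWE) coding algorithm to obtain the feature information of the sub-band signal, the feature information comprising time-domain envelope information and frequency-domain envelope information of the high-band signal or the super-high-band signal.
11. The DTX decision method according to claim 10, characterized in that the frequency-domain envelope information is obtained by a fast Fourier transform (FFT) or a modified discrete cosine transform (MDCT).
12. The DTX decision method according to any one of claims 3 to 8, characterized in that making the DTX decision according to the feature information variation of each of the sub-band signals comprises:
making a joint decision on the feature information variation of each of the sub-band signals and using the joint decision result as the DTX decision criterion: if the result is greater than a specific threshold, determining that a SID frame needs to be sent; otherwise, determining that no SID frame needs to be sent.
13. The DTX decision method according to claim 12, characterized in that, when the signal is a narrowband signal, the joint decision comprises:
when the sub-band signal involves only the low-band core layer, using the feature information variation of the low-band core layer signal as the DTX decision criterion;
when the sub-band signal reaches the low-band enhancement layer, making a joint decision according to the feature information variations of the low-band core layer signal and the low-band enhancement layer signal as the DTX decision criterion.
14. The DTX decision method according to claim 12, characterized in that, when the signal is a wideband signal, the joint decision comprises:
when the sub-band signal reaches the high-band core layer, making a joint decision according to the joint feature information variation of the low-band signal and the feature information variation of the high-band core layer signal as the DTX decision criterion;
when the sub-band signal reaches the high-band enhancement layer, making a joint decision according to the joint feature information variation of the low-band signal and the joint feature information variation of the wideband signal as the DTX decision criterion.
15. The DTX decision method according to claim 12, characterized in that, when the signal is a super-wideband signal, the joint decision comprises:
making a joint decision according to the joint feature information variation of the low-band signal, the high-band signal and the super-high-band signal as the DTX decision criterion.
16. The DTX decision method according to claim 12, characterized in that making a joint decision on the feature information variation of each of the sub-band signals comprises:
weighting the feature information variation of each of the sub-band signals and making a joint decision on the weighted result as the DTX decision criterion; or
using the feature information variation of each of the sub-band signals as the decision criterion for the current sub-band signal; when the decision results of the different sub-band signals agree, using that decision result as the DTX decision criterion; when the decision results of the different sub-band signals disagree, weighting the feature information variation of each of the sub-band signals and making a joint decision on the weighted result as the DTX decision criterion.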
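17. A DTX decision apparatus, characterized by comprising:
a band-splitting module, configured to obtain sub-band signals from an input signal;
a feature information variation acquisition module, configured to obtain the feature information variation of each sub-band signal produced by the band-splitting module;
a decision module, configured to make a DTX decision according to the feature information variation of each sub-band signal obtained by the feature information variation acquisition module.
18. The DTX decision apparatus according to claim 17, characterized in that the signal is a narrowband signal and the sub-band signal is a low-band signal.
19. The DTX decision apparatus according to claim 18, characterized in that
the low-band signal further comprises a low-band core layer signal; or
the low-band signal further comprises a low-band core layer signal and a low-band enhancement layer signal.
20. The DTX decision apparatus according to claim 17, characterized in that the signal is a wideband signal and the sub-band signals are a low-band signal and a high-band signal.
21. The DTX decision apparatus according to claim 20, characterized in that
the low-band signal further comprises a low-band core layer signal, or the low-band signal further comprises a low-band core layer signal and a low-band enhancement layer signal;
the high-band signal further comprises a high-band core layer signal, or the high-band signal further comprises a high-band core layer signal and a high-band enhancement layer signal.
22. The DTX decision apparatus according to claim 17, characterized in that the signal is a super-wideband signal and the sub-band signals are a low-band signal, a high-band signal and a super-high-band signal.
23. The DTX decision apparatus according to claim 22, characterized in that
the low-band signal further comprises a low-band core layer signal, or the low-band signal further comprises a low-band core layer signal and a low-band enhancement layer signal;
the high-band signal further comprises a high-band core layer signal, or the high-band signal further comprises a high-band core layer signal and a high-band enhancement layer signal.
24. The DTX decision apparatus according to claim 17, characterized in that the feature information variation acquisition module further comprises:
a low-band feature information variation acquisition sub-module, configured to obtain the feature information variation of the low-band signal.
25. The DTX decision apparatus according to claim 17, characterized in that the feature information variation acquisition module further comprises:
a low-band feature information variation acquisition sub-module, configured to obtain the feature information variation of the low-band signal;
a high-band feature information variation acquisition sub-module, configured to obtain the feature information variation of the high-band signal.
26. The DTX decision apparatus according to claim 17, characterized in that the feature information variation acquisition module further comprises:
a low-band feature information variation acquisition sub-module, configured to obtain the feature information variation of the low-band signal;
a high-band feature information variation acquisition sub-module, configured to obtain the feature information variation of the high-band signal;
a super-high-band feature information variation acquisition sub-module, configured to obtain the feature information variation of the super-high-band signal.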
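27. The DTX decision apparatus according to any one of claims 24 to 26, characterized in that the low-band feature information variation acquisition sub-module further comprises:
a low-band layering unit, configured to split the input low-band signal into a low-band core layer signal and a low-band enhancement layer signal, and to send them to the low-band core layer feature information variation acquisition unit and the low-band enhancement layer feature information variation acquisition unit respectively;
a low-band core layer feature information variation acquisition unit, configured to obtain the feature information variation of the low-band core layer signal;
a low-band enhancement layer feature information variation acquisition unit, configured to obtain the feature information variation of the low-band enhancement layer signal;
a low-band integration unit, configured to combine the feature information variation of the low-band core layer signal obtained by the low-band core layer feature information variation acquisition unit with the feature information variation of the low-band enhancement layer signal obtained by the low-band enhancement layer feature information variation acquisition unit into the low-band feature information variation;
a low-band control unit, configured to use the output of the low-band core layer decision sub-module as the feature information variation of the low-band signal when the low-band signal involves only the low-band core layer, and to use the output of the low-band integration unit as the feature information variation of the low-band signal when the sub-band signal reaches the low-band enhancement layer.
28. The DTX decision apparatus according to claim 25 or 26, characterized in that the high-band feature information variation acquisition sub-module further comprises:
a high-band layering unit, configured to split the input high-band signal into a high-band core layer signal and a high-band enhancement layer signal, and to send them to the high-band core layer feature information variation acquisition unit and the high-band enhancement layer feature information variation acquisition unit respectively;
a high-band core layer feature information variation acquisition unit, configured to obtain the feature information variation of the high-band core layer signal;
a high-band enhancement layer feature information variation acquisition unit, configured to obtain the feature information variation of the high-band enhancement layer signal;
a high-band integration unit, configured to combine the feature information variation of the high-band core layer signal obtained by the high-band core layer feature information variation acquisition unit with the feature information variation of the high-band enhancement layer signal obtained by the high-band enhancement layer feature information variation acquisition unit into the high-band feature information variation;
a high-band control unit, configured to use the output of the high-band core layer decision sub-module as the feature information variation of the high-band signal when the high-band signal involves only the high-band core layer, and to use the output of the high-band integration unit as the feature information variation of the high-band signal when the sub-band signal reaches the high-band enhancement layer.
29. The DTX decision apparatus according to claim 17, characterized in that the decision module further comprises:
a weighted decision sub-module, configured to weight the feature information variation of each sub-band signal obtained by the feature information variation acquisition module and to make a joint decision on the weighted result as the DTX decision criterion.
30. The DTX decision apparatus according to claim 29, characterized in that the decision module further comprises:
a per-band decision sub-module, configured to use the feature information variation of each sub-band signal obtained by the feature information variation acquisition module as the decision criterion for that sub-band signal; when the decision results of the different sub-band signals agree, to use that decision result as the DTX decision criterion; when the decision results of the different sub-band signals disagree, to notify the weighted decision sub-module to make a joint decision.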
PCT/CN2008/072774 2007-11-02 2008-10-21 Method and apparatus for judging dtx WO2009056035A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP08844412.0A EP2202726B1 (en) 2007-11-02 2008-10-21 Method and apparatus for judging dtx
AU2008318143A AU2008318143B2 (en) 2007-11-02 2008-10-21 Method and apparatus for judging DTX
US12/763,573 US9047877B2 (en) 2007-11-02 2010-04-20 Method and device for an silence insertion descriptor frame decision based upon variations in sub-band characteristic information

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN200710166748.9 2007-11-02
CN200710166748 2007-11-02
CN200810084319.1 2008-03-18
CNB2008100843191A CN100555414C (zh) 2007-11-02 2008-03-18 一种dtx判决方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/763,573 Continuation US9047877B2 (en) 2007-11-02 2010-04-20 Method and device for an silence insertion descriptor frame decision based upon variations in sub-band characteristic information

Publications (1)

Publication Number Publication Date
WO2009056035A1 true WO2009056035A1 (en) 2009-05-07

Family

ID=40197558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/072774 WO2009056035A1 (en) 2007-11-02 2008-10-21 Method and apparatus for judging dtx

Country Status (5)

Country Link
US (1) US9047877B2 (zh)
EP (1) EP2202726B1 (zh)
CN (1) CN100555414C (zh)
AU (1) AU2008318143B2 (zh)
WO (1) WO2009056035A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074810A (zh) * 2016-02-17 2018-12-21 弗劳恩霍夫应用研究促进协会 用于多声道编码中的立体声填充的装置和方法

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246688B (zh) * 2007-02-14 2011-01-12 华为技术有限公司 一种对背景噪声信号进行编解码的方法、系统和装置
CN102315901B (zh) * 2010-07-02 2015-06-24 中兴通讯股份有限公司 不连续发射dtx的判决方法和装置
CN102903364B (zh) * 2011-07-29 2017-04-12 中兴通讯股份有限公司 一种进行语音自适应非连续传输的方法及装置
US20130155924A1 (en) * 2011-12-15 2013-06-20 Tellabs Operations, Inc. Coded-domain echo control
CN103187065B (zh) * 2011-12-30 2015-12-16 华为技术有限公司 音频数据的处理方法、装置和系统
CN105846948B (zh) * 2015-01-13 2020-04-28 中兴通讯股份有限公司 一种实现harq-ack检测的方法及装置
US10978096B2 (en) 2017-04-25 2021-04-13 Qualcomm Incorporated Optimized uplink operation for voice over long-term evolution (VoLte) and voice over new radio (VoNR) listen or silent periods
US10805191B2 (en) 2018-12-14 2020-10-13 At&T Intellectual Property I, L.P. Systems and methods for analyzing performance silence packets

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10190498A (ja) * 1996-11-15 1998-07-21 Nokia Mobile Phones Ltd 不連続伝送中に快適雑音を発生させる改善された方法
US20020101844A1 (en) * 2001-01-31 2002-08-01 Khaled El-Maleh Method and apparatus for interoperability between voice transmission systems during speech inactivity
US20020161573A1 (en) * 2000-02-29 2002-10-31 Koji Yoshida Speech coding/decoding appatus and method
CN1440602A (zh) * 2000-06-29 2003-09-03 高通股份有限公司 用于dtx帧检测的系统与方法
US20050075873A1 (en) * 2003-10-02 2005-04-07 Jari Makinen Speech codecs
WO2006084003A2 (en) * 2005-02-01 2006-08-10 Qualcomm Incorporated Method for discontinuous transmission and accurate reproduction of background noise information

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3090842B2 (ja) * 1994-04-28 2000-09-25 沖電気工業株式会社 ビタビ復号法に適応した送信装置
FI100840B (fi) * 1995-12-12 1998-02-27 Nokia Mobile Phones Ltd Kohinanvaimennin ja menetelmä taustakohinan vaimentamiseksi kohinaises ta puheesta sekä matkaviestin
SE507370C2 (sv) * 1996-09-13 1998-05-18 Ericsson Telefon Ab L M Metod och anordning för att alstra komfortbrus i linjärprediktiv talavkodare
US5960389A (en) * 1996-11-15 1999-09-28 Nokia Mobile Phones Limited Methods for generating comfort noise during discontinuous transmission
WO2001033814A1 (en) * 1999-11-03 2001-05-10 Tellabs Operations, Inc. Integrated voice processing system for packet networks
FI116643B (fi) * 1999-11-15 2006-01-13 Nokia Corp Kohinan vaimennus
US6691085B1 (en) * 2000-10-18 2004-02-10 Nokia Mobile Phones Ltd. Method and system for estimating artificial high band signal in speech codec using voice activity information
US6721712B1 (en) * 2002-01-24 2004-04-13 Mindspeed Technologies, Inc. Conversion scheme for use between DTX and non-DTX speech coding systems
US7889783B2 (en) * 2002-12-06 2011-02-15 Broadcom Corporation Multiple data rate communication system
US20050004793A1 (en) * 2003-07-03 2005-01-06 Pasi Ojala Signal adaptation for higher band coding in a codec utilizing band split coding
CN1617605A (zh) * 2003-11-12 2005-05-18 皇家飞利浦电子股份有限公司 一种在语音信道传输非语音数据的方法及装置
US20060149536A1 (en) * 2004-12-30 2006-07-06 Dunling Li SID frame update using SID prediction error
WO2006104555A2 (en) * 2005-03-24 2006-10-05 Mindspeed Technologies, Inc. Adaptive noise state update for a voice activity detector
US7693708B2 (en) * 2005-06-18 2010-04-06 Nokia Corporation System and method for adaptive transmission of comfort noise parameters during discontinuous speech transmission
US7610197B2 (en) * 2005-08-31 2009-10-27 Motorola, Inc. Method and apparatus for comfort noise generation in speech communication systems
WO2007091956A2 (en) * 2006-02-10 2007-08-16 Telefonaktiebolaget Lm Ericsson (Publ) A voice detector and a method for suppressing sub-bands in a voice detector
US8032370B2 (en) * 2006-05-09 2011-10-04 Nokia Corporation Method, apparatus, system and software product for adaptation of voice activity detection parameters based on the quality of the coding modes
JP4810335B2 (ja) * 2006-07-06 2011-11-09 株式会社東芝 広帯域オーディオ信号符号化装置および広帯域オーディオ信号復号装置
US8260609B2 (en) * 2006-07-31 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
US8725499B2 (en) * 2006-07-31 2014-05-13 Qualcomm Incorporated Systems, methods, and apparatus for signal change detection
US8032359B2 (en) * 2007-02-14 2011-10-04 Mindspeed Technologies, Inc. Embedded silence and background noise compression
US8982744B2 (en) * 2007-06-06 2015-03-17 Broadcom Corporation Method and system for a subband acoustic echo canceller with integrated voice activity detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENG, QIANG ET AL.: "The Influence of Low Bit Rate Speech Coders on Speech Recognition System", APPLICATION RESEARCH OF COMPUTERS, no. 9, September 2003 (2003-09-01), pages 22-25, 28, XP008132819 *
See also references of EP2202726A4 *
ZHOU, DEJUN: "Discontinuous Transmission in Speech Communication", COMMUNICATIONS TECHNOLOGY, no. 9, September 2001 (2001-09-01), pages 46-48 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074810A (zh) * 2016-02-17 2018-12-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for stereo filling in multichannel coding
US11727944B2 (en) 2016-02-17 2023-08-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for stereo filling in multichannel coding
CN109074810B (zh) * 2016-02-17 2023-08-18 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for stereo filling in multichannel coding

Also Published As

Publication number Publication date
EP2202726A1 (en) 2010-06-30
US9047877B2 (en) 2015-06-02
EP2202726A4 (en) 2013-01-23
CN100555414C (zh) 2009-10-28
EP2202726B1 (en) 2017-04-05
US20100268531A1 (en) 2010-10-21
AU2008318143B2 (en) 2011-12-01
CN101335001A (zh) 2008-12-31
AU2008318143A1 (en) 2009-05-07

Similar Documents

Publication Publication Date Title
US8473301B2 (en) Method and apparatus for audio decoding
WO2009056035A1 (en) Method and apparatus for judging dtx
JP5357055B2 (ja) Improved digital audio signal encoding/decoding method
CA2556797C (en) Methods and devices for low-frequency emphasis during audio compression based on acelp/tcx
CA2690433C (en) Method and device for sound activity detection and sound signal classification
JP5149198B2 (ja) Method and device for efficient frame erasure concealment in speech codecs
EP2224433B1 (en) An apparatus for processing an audio signal and method thereof
WO2009067883A1 (fr) Encoding/decoding method and device for background noise
JP6039678B2 (ja) Speech signal encoding method and decoding method, and apparatus using the same
US20070147518A1 (en) Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX
JP6775064B2 (ja) Improved frequency band extension in an audio signal decoder
US20140207445A1 (en) System and Method for Correcting for Lost Data in a Digital Audio Signal
WO2009117967A1 (zh) Encoding and decoding method and apparatus
EP2193348A1 (en) Method and device for efficient quantization of transform information in an embedded speech and audio codec
CN101335002A (zh) Method and apparatus for audio decoding
Kim et al. Temporal normalization techniques for transform-type speech coding and application to split-band wideband coders.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 08844412; Country of ref document: EP; Kind code of ref document: A1
WWE Wipo information: entry into national phase. Ref document number: 2008318143; Country of ref document: AU. Ref document number: 1396/KOLNP/2010; Country of ref document: IN
REEP Request for entry into the european phase. Ref document number: 2008844412; Country of ref document: EP
WWE Wipo information: entry into national phase. Ref document number: 2008844412; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2008318143; Country of ref document: AU; Date of ref document: 20081021; Kind code of ref document: A