WO2009067883A1 - Method and apparatus for encoding/decoding background noise (Procédé de codage/décodage et dispositif pour le bruit de fond)

Info

Publication number
WO2009067883A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
noise
coding
noise frame
band
Prior art date
Application number
PCT/CN2008/072939
Other languages
English (en)
Chinese (zh)
Inventor
Qi Zhang
Jinliang Dai
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2009067883A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • The present invention relates to the field of voice communication technologies, and in particular to a codec method and apparatus for background noise.

Background Art
  • FIG. 1 is a schematic diagram of a method of compressing background noise in a DTX manner in voice communication.
  • VAD Voice Activity Detection
  • SID: Silence Insertion Descriptor
  • The corresponding decoding process is as follows: the speech-frame code stream is decoded to reconstruct the speech signal, and the discontinuous transmission system reconstructs a continuous, comfortable background noise signal using a specific CNG (Comfort Noise Generation) algorithm based on the received discontinuous SID frame stream.
  • CNG: Comfort Noise Generation
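The DTX scheme of FIG. 1 can be sketched as follows. This is a minimal illustration only: the energy-based frame classification, the SID interval of 8 frames, and the energy-only SID summary are assumptions made for the sketch, not values fixed by the text.

```python
def summarize_noise(frame):
    """A SID frame carries only coarse parameters, e.g. a mean-square energy."""
    return sum(x * x for x in frame) / len(frame)

def dtx_encode(frames, vad, sid_interval=8):
    """Illustrative DTX transmission loop: speech frames are fully coded,
    noise frames produce only occasional SID frames (others send nothing)."""
    sent = []
    since_sid = sid_interval  # force a SID frame at the first noise frame
    for frame in frames:
        if vad(frame):                      # active speech frame
            sent.append(("SPEECH", frame))
            since_sid = sid_interval
        elif since_sid >= sid_interval:     # noise: send an occasional SID
            sent.append(("SID", summarize_noise(frame)))
            since_sid = 1
        else:                               # noise: transmit nothing (NO_DATA)
            since_sid += 1
    return sent
```

Only the SPEECH and SID entries are transmitted; the decoder fills the gaps between SID frames with CNG output.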
  • G.729.1 is the latest generation of speech codec standard released by ITU (International Telecommunication Union).
  • ITU International Telecommunication Union
  • The biggest feature of this embedded speech codec standard is its layered coding, which provides narrowband-to-wideband audio quality over a code rate range of 8 kb/s to 32 kb/s, allows outer code-stream layers to be discarded according to channel conditions during transmission, and thus has good channel adaptability.
  • a narrowband signal refers to a signal with a frequency band of 0 to 4000 Hz
  • a wideband signal refers to a signal with a frequency band of 0 to 8000 Hz
  • an ultrawideband signal refers to a signal with a frequency band of 0 to 16000 Hz.
  • the wideband signal can be decomposed into a low-band signal component and a high-band signal component.
  • the low-band signal (component) refers to a signal of 0 to 4000 Hz, and the low-band signal component can also be called a narrowband signal component.
  • The high-band signal (component) refers to the signal of 4000 to 8000 Hz, and the super-high-band signal (component) refers to the signal of 8000 to 16000 Hz.
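As a toy illustration of such a two-band decomposition, the sum and difference of consecutive sample pairs yield half-rate low-band and high-band components. This is the trivial two-tap (Haar) QMF pair, chosen only for readability; it is not the filter bank G.729.1 actually uses.

```python
def split_bands(x):
    """Trivial two-band QMF split: pairwise average -> low band (0 to Fs/4),
    pairwise difference -> spectrally folded high band (Fs/4 to Fs/2)."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return low, high
```

A slowly varying (low-frequency) signal lands almost entirely in the low band, while sample-to-sample alternation lands in the high band.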
  • hierarchical is achieved by constructing the code stream into an embedded hierarchical structure.
  • G.729.1 is a new type of embedded layered multi-rate speech codec whose core layer is coded using the G.729 standard.
  • the input is a 20ms superframe.
  • The input signal s(n) is first split by a QMF (Quadrature Mirror Filterbank) analysis filter pair (H1(z), H2(z)) into two sub-bands. The low sub-band signal is preprocessed by a high-pass filter with a cutoff frequency of 50 Hz, and the resulting signal sLB(n) is encoded by a narrowband embedded CELP (Code-Excited Linear Prediction) encoder at 8 kb/s to 12 kb/s. The difference signal dLB(n) between sLB(n) and the local synthesis of the CELP encoder at the 12 kb/s rate is passed through a perceptual weighting filter WLB(z), and the weighted signal is transformed into the frequency domain by MDCT (Modified Discrete Cosine Transform).
  • The weighting filter WLB(z) contains gain compensation to maintain spectral continuity between the weighted low-band difference signal and the high-band signal.
  • The high-band signal component is multiplied by (-1)^n to fold its spectrum.
  • The spectrally folded signal is preprocessed by a low-pass filter with a cutoff frequency of 3000 Hz.
  • The filtered signal sHB(n) is encoded by the TDBWE (Time-Domain BandWidth Extension) encoder.
  • For the TDAC (Time-Domain Aliasing Cancellation) coding module, the high-band signal is also first transformed into the frequency domain using MDCT.
  • The two sets of MDCT coefficients, DLB(k) and SHB(k), are finally encoded using TDAC.
  • Some parameters are additionally transmitted by the FEC (Frame Erasure Concealment) encoder to mitigate errors caused by frame loss during transmission.
  • FEC: Frame Erasure Concealment
  • Figure 2 is a block diagram of the G.729.1 layered encoder system, where the dotted-line portion is the QMF filter bank used for band splitting.
  • Figure 3 is a block diagram of the G.729.1 decoder system. The actual working mode of the decoder is determined by the number of code-stream layers received, which is equivalent to the received code rate.
  • The dotted-line portion is the QMF filter bank for synthesizing the sub-bands into a full-band signal. Depending on the code rate received by the decoder, the cases are as follows:
  • If only the first layer or the first two layers are received, the code stream is decoded by the embedded CELP decoder. The decoded signal sLB(n) is post-filtered, combined in the QMF synthesis filter bank, and high-pass filtered into a 16 kHz output signal, with the high-band component set to zero.
  • If the first three layers are received, then in addition to the CELP decoder decoding the low-band signal component, the TDBWE decoder decodes the high-band signal component sHB(n). For the MDCT transform, the frequency components above 3000 Hz in the high-band spectrum (corresponding to 7000 Hz and above at the 16 kHz sampling rate) are set to 0; after the inverse MDCT transform, overlap-add and spectral folding, the high band is synthesized with the low-band component decoded by the CELP decoder into a 16 kHz wideband signal in the QMF filter bank.
  • If all layers are received, the TDAC decoder is additionally used to decode the low-band weighted difference signal and the high-band enhancement signal, enhancing the full-band signal, and the 16 kHz wideband signal is finally synthesized in the QMF filter bank.
  • The code stream of G.729.1 has a hierarchical structure, which allows outer code-stream layers to be discarded from the outside in according to the transmission capability of the channel during transmission, thereby adapting to the channel condition.
  • A discontinuous transmission mode for noise frames has not been defined in the G.729.1 standard, which means that for the gap phases in voice communication the encoder still needs to encode according to the speech frame. This not only increases the computational burden of the encoder but also wastes the limited transmission bandwidth of the channel, so it is necessary to introduce a discontinuous transmission mode for noise.
  • In G.729 Annex B, if the current noise frame differs sufficiently from the past, the line spectral frequency parameters of the current frame are quantized; otherwise the line spectral frequencies corresponding to the average LPC parameters of the past 6 frames are quantized. It can be seen that this alternative exploits the stationary nature of background noise through discontinuous updates of the quantized frequency parameters.
  • Table 1 G.729 AnnexB SID frame bit allocation
  • At the decoder, the energy of each frame is calculated by smoothing the decoded frame energies, and the line spectral frequency parameters of the last SID frame are directly copied.
  • The above noise coding method is only suitable for encoding narrowband noise; it cannot handle wideband noise and lacks bandwidth scalability.
  • a DTX/CNG noise coding method represented by an AMR-WB is also known in the prior art.
  • AMR-WB operates on 16 kHz sampled, 20 ms frames. It performs variable-rate encoding for frames judged by VAD detection to be speech signals, and a fixed coding mode for input frames judged by VAD detection to be background noise, that is, outputting one 35-bit SID frame every 7 frames.
  • the SID coding parameters are mainly to encode the energy and spectral parameters of the background noise.
  • the energy parameter is the logarithmic domain energy of the current noise frame:
  • The spectral parameter in AMR-WB is represented by the ISF (Immittance Spectral Frequency) parameter.
  • the ISF parameter is a 16-dimensional vector that is transformed from a 16-order LPC (Linear Prediction Coding) coefficient.
  • LPC Linear Prediction Coding
  • The average frame energy is quantized with 6 bits, and for quantization of the spectral parameters the 16-dimensional ISF vector is divided into 5 sub-vectors using the split quantization technique, each quantized separately.
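Split vector quantization of this kind can be sketched as below. The sub-vector sizes and the tiny codebooks are illustrative only; AMR-WB's actual codebook tables are defined in the standard.

```python
def split_vq(vec, codebooks):
    """Split-VQ sketch: partition the vector into sub-vectors (one per
    codebook, each codebook a list of codewords) and pick the nearest
    codeword for each sub-vector independently."""
    indices, pos = [], 0
    for cb in codebooks:
        dim = len(cb[0])
        sub = vec[pos:pos + dim]
        # nearest codeword by squared Euclidean distance
        best = min(range(len(cb)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, cb[j])))
        indices.append(best)
        pos += dim
    return indices
```

Quantizing five small sub-vectors instead of one 16-dimensional vector keeps the codebook search and storage tractable at a small cost in coding efficiency.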
  • the SID frame length of the AMR-WB is 35 bits, and its bit allocation is as shown in Table 2:
  • Embodiments of the present invention provide a coding and decoding method and apparatus for background noise, which can perform coding with bandwidth scalability for background noise.
  • An embodiment of the present invention provides a coding and decoding method for background noise, including: when a received audio frame is a noise frame, selecting a noise frame that needs to be coded according to a transmission mode of the current noise frame; and hierarchically coding the noise frame that needs to be coded.
  • At the decoding end, the coding parameters of the noise frame are decoded according to a transmission mode of the current noise frame, and background noise reconstruction is performed according to the coding parameters.
  • An embodiment of the present invention further provides an encoder, including:
  • a selecting unit configured to: when the received audio frame is a noise frame, select a noise frame to be encoded according to a transmission mode of the current frame, and send the selection result to the coding unit; and
  • a coding unit configured to hierarchically encode the noise frames that the selecting unit indicates need to be encoded.
  • An embodiment of the present invention further provides a decoder, including:
  • a decoding unit configured to: when the received audio frame is a layered coded noise frame, decode the coding parameter of the noise frame according to a transmission mode of the current noise frame;
  • a reconstruction unit configured to perform background noise reconstruction according to the coding parameter of the noise frame sent by the decoding unit.
  • the invention also provides a codec system for background noise, comprising:
  • An encoder configured to: when the received audio frame is a noise frame, select a noise frame that needs to be coded according to a transmission mode of the current noise frame, and perform hierarchical coding on the noise frame that needs to be coded;
  • a decoder configured to: when the audio frame received from the encoder is a layered coded noise frame, decode the coding parameters of the noise frame according to a transmission mode of the current noise frame, and perform background noise reconstruction according to the coding parameters.
  • Embodiments of the present invention have the following advantages over the prior art:
  • The encoding end selects the noise frames to be encoded according to the transmission mode of the current noise frame and performs layered coding, enabling bandwidth-scalable coding of background noise frames;
  • the decoding end decodes the coding parameters of the layered coded noise frame according to its transmission mode and performs background noise reconstruction, achieving bandwidth scalability for background noise.
  • FIG. 1 is a schematic diagram of a method for compressing background noise in a DTX manner in the prior art
  • FIG. 2 is a schematic diagram of a G.729.1 encoder system in the prior art
  • FIG. 3 is a schematic diagram of a G.729.1 decoder system in the prior art
  • FIG. 4 is a schematic flowchart of a background noise encoding method according to Embodiment 1 of the present invention
  • FIG. 5 is a schematic flowchart of a background noise encoding method according to Embodiment 2 of the present invention
  • FIG. 6 is a schematic diagram of DTX noise encoding according to Embodiment 2 of the present invention
  • FIG. 7 is a schematic diagram of a TDBWE encoder system for background noise according to Embodiment 2 of the present invention.
  • FIG. 8 is a schematic diagram of an encoder system according to Embodiment 2 of the present invention.
  • FIG. 9 is a schematic diagram of a CNG noise decoding module of a decoding end according to Embodiment 2 of the present invention
  • FIG. 10 is a schematic diagram of a method for recovering a low-band signal component by using a reconstructed low-band coding parameter according to Embodiment 2 of the present invention
  • FIG. 11 is a schematic diagram of a method for recovering a high-band signal component by using a reconstructed high-band coding parameter according to Embodiment 2 of the present invention.
  • FIG. 12 is a schematic diagram of a decoder system according to Embodiment 2 of the present invention.
  • FIG. 13 is a schematic flow chart of a method for encoding background noise according to Embodiment 3 of the present invention.
  • FIG. 14 is a schematic diagram of a coding end system of a noise frame according to Embodiment 3 of the present invention
  • FIG. 15 is a schematic diagram of a decoding end system of a noise frame according to Embodiment 3 of the present invention
  • FIG. 16 is a schematic diagram of an encoder according to Embodiment 5 of the present invention
  • FIG. 17 is a schematic diagram of a decoder according to Embodiment 6 of the present invention.

Detailed Description
  • In Embodiment 1, a method for encoding and decoding background noise is shown in FIG. 4, with the following specific steps:
  • Step S401: At the encoding end, apply VAD detection to the input audio frame to determine the type of the current audio frame. If the current audio frame is a speech frame, the audio frame is encoded according to the speech frame coding algorithm; if the current frame is a noise frame and the previous frame is a speech frame (i.e., a switch from a speech frame to a noise frame is currently occurring), the flow proceeds to step S402.
  • Step S402: If a switch from a speech frame to a noise frame is currently occurring, the encoder may first enter a trailing phase.
  • The trailing phase is specifically: in the N frames after the switch from a speech frame to a noise frame occurs, the current noise frames are still encoded according to the speech frame encoding algorithm, but at a reduced coding rate.
  • Step S403: Select a noise frame to be encoded according to the transmission mode.
  • Two transmission modes can be used for coded transmission of the current frame: discontinuous transmission (DTX) mode and continuous transmission mode. In the discontinuous transmission mode, it is determined whether the current frame needs to be encoded: if so, the current frame is selected as a noise frame to be encoded, otherwise no processing is performed on the current frame. In the continuous transmission mode, the current frame is directly selected as a noise frame to be encoded, that is, all received noise frames are encoded.
  • Step S404: Perform narrowband core layer coding on the noise frame that needs to be encoded.
  • the low-band signal component of the noise frame that needs to be encoded is obtained, and the core layer parameter encoding is performed on the low-band signal component.
  • The method for obtaining the low-band signal component of a noise frame to be encoded includes: performing sub-band decomposition on the noise frame to be coded, dividing it into a low-band signal component and a high-band signal component; or performing high-pass filtering and down-sampling on the full-band noise signal to obtain the low-band signal component.
  • The method for performing narrowband core layer coding on the acquired low-band signal component comprises: performing linear prediction analysis on the low-band signal component of the noise frame to obtain linear prediction coefficients and the signal energy; converting the linear prediction coefficients into spectral parameters and vector-quantizing the spectral parameters to obtain quantized spectral parameters; quantizing the signal energy to obtain the frame energy; and using the quantized spectral parameters and frame energy as the narrowband core layer parameters of the noise frame.
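The linear prediction analysis in this step can be sketched as follows. This is a minimal sketch: the 10th-order analysis and the dB energy measure are illustrative choices, not values fixed by the text.

```python
import math

def lpc_and_energy(frame, order=10):
    """Autocorrelation method + Levinson-Durbin recursion: returns prediction
    coefficients a (x[n] ~ sum_k a[k]*x[n-1-k]) and the frame energy in dB."""
    n = len(frame)
    # autocorrelation lags 0..order
    r = [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(order + 1)]
    a, e = [], r[0] if r[0] > 0 else 1e-9
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = acc / e                                  # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(len(a))] + [k]
        e *= (1.0 - k * k)                           # prediction error update
    energy_db = 10.0 * math.log10(r[0] / n + 1e-12)  # log-domain frame energy
    return a, energy_db
```

In the codec the coefficients would then be converted to spectral parameters (LSF/ISF) and vector-quantized, as the step describes.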
  • Step S405: If enhancement layer coding is further required, extension layer coding is performed on the noise frame encoded by the narrowband core layer.
  • the noise frame is subjected to narrowband enhancement layer coding, that is, the quantization error of the spectral parameters in the narrowband core layer and the quantization error of the signal energy are quantized.
  • Wideband extension layer coding is performed on the noise frame, that is, extended parameter coding is performed on the high-band signal component of the noise frame.
  • the extension layer can be one layer or multiple layers.
  • the broadband extension layer includes a broadband core layer and a broadband enhancement layer.
  • The wideband extension layer coding of the noise frame specifically includes: acquiring the time-domain envelope and the frequency-domain envelope of the high-band signal component, subtracting the quantized time-domain envelope from each dimensional component of the frequency-domain envelope, splitting the resulting vector into a plurality of sub-vectors, and quantizing them separately to obtain the wideband extension layer coding parameters.
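The envelope computation described here can be sketched as follows. This is a hedged illustration: the sub-band count, the log base, and the naive DFT stand in for the codec's windowed FFT and its fixed envelope dimensions.

```python
import cmath
import math

def envelopes(high_band, n_env=4):
    """Sketch of the extension-layer parameters: a mean time-domain
    (log-energy) envelope plus a per-sub-band frequency-domain envelope,
    with the frequency envelope coded relative to the time envelope."""
    n = len(high_band)
    eps = 1e-9
    # mean time-domain envelope: half the log2 energy of the whole frame
    t_env = 0.5 * math.log2(sum(x * x for x in high_band) / n + eps)
    # naive DFT magnitudes (a real codec would use a windowed FFT)
    spec = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(high_band))) for k in range(n // 2)]
    # frequency-domain envelope: log energy per equal-width sub-band
    width = max(1, (n // 2) // n_env)
    f_env = [0.5 * math.log2(sum(m * m for m in spec[b * width:(b + 1) * width]) / width + eps)
             for b in range(n_env)]
    # transmit the time envelope plus the difference vector (split-VQ'd in practice)
    return t_env, [f - t_env for f in f_env]
```

For a pure tone the difference vector peaks in the sub-band containing the tone, which is exactly the shape information the extension layer needs to convey.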
  • Step S406 After the encoding is completed, the encoded noise frame is transmitted.
  • Step S407: At the decoding end, decode the coding parameters from the received code stream and determine the type of the current audio frame. If the current audio frame is a speech frame, decode it according to the speech frame decoding algorithm; otherwise, go to step S408.
  • Step S408 If the received audio frame is a noise frame, the coding parameters of the noise frame are decoded according to the transmission mode of the current noise frame.
  • In the discontinuous transmission mode, the coding parameters of the received noise frames are decoded; for noise frames that were not transmitted, the coding parameters of the current noise frame are reconstructed from previously received noise frames or from the coding parameters buffered in the trailing phase.
  • In the continuous transmission mode, the coding parameters are decoded for every received noise frame.
  • Step S409 Perform background noise reconstruction according to the decoded coding parameters.
  • The coefficients of the synthesis filter are calculated using the reconstructed spectral parameters; Gaussian random noise is used as the excitation and is filtered through the calculated synthesis filter, and the reconstructed energy parameter is used for time-domain shaping to reconstruct the background noise signal. Alternatively, the low-band coding parameters are CELP-decoded to obtain the decoded low-band signal component, which is up-sampled to a full-band signal and spectrally extended to reconstruct the background noise signal.
  • the TDBWE decoding algorithm may be used to reconstruct the background noise signal from the noise frame; or the background noise signal reconstructed from the noise frame by the TDAC decoding algorithm may be used.
  • The method for reconstructing the background noise signal from the noise frame using the TDBWE decoding algorithm is as follows: calculate the coefficients of the synthesis filter using the reconstructed spectral parameters, use Gaussian random noise as the excitation, perform synthesis filtering through the calculated synthesis filter, and use the reconstructed energy parameters for time-domain shaping to obtain the low-band signal component of the background noise signal. Using Gaussian random noise as the excitation source, the reconstructed high-band coding parameters are used for time-domain and frequency-domain shaping of the excitation source to reconstruct the high-band signal component of the background noise signal. QMF synthesis filtering is then performed on the reconstructed low-band and high-band signal components to obtain the background noise signal.
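The low-band part of this reconstruction (Gaussian excitation run through the synthesis filter, then shaped to the decoded energy) can be sketched as below. The frame length, filter order, and the mean-square gain convention are illustrative choices, not values fixed by the text.

```python
import math
import random

def cng_lowband(lpc, energy, n=160, seed=1234):
    """CNG low-band sketch: Gaussian noise excitation through the all-pole
    synthesis filter 1/A(z), then scaled so the frame's mean-square energy
    matches the decoded energy parameter."""
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(n)]
    out = []
    for i, exc in enumerate(excitation):
        # y[i] = e[i] + sum_k a[k] * y[i-1-k]   (all-pole synthesis)
        y = exc + sum(a * out[i - 1 - k] for k, a in enumerate(lpc) if i - 1 - k >= 0)
        out.append(y)
    # time-domain shaping: match the decoded frame energy
    cur = math.sqrt(sum(v * v for v in out) / n)
    gain = math.sqrt(energy) / cur if cur > 0 else 0.0
    return [gain * v for v in out]
```

Because only the spectral shape and energy are transmitted, any freshly drawn noise excitation produces a perceptually similar comfort noise frame.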
  • The method for reconstructing the background noise signal from the noise frame using the TDAC decoding algorithm is as follows: decode the low-band coding parameters with the CELP decoding algorithm to obtain the low-band signal component, up-sample the low-band signal component and perform spectral extension to obtain a full-band signal, perform inverse quantization and inverse MDCT transform on the reconstructed high-band coding parameters to obtain a residual signal, and combine it with the full-band signal to obtain a wideband background noise signal.
  • In Embodiment 2, the high-band signal component is encoded by the TDBWE encoding algorithm as an example; the background noise encoding and decoding method is shown in FIG. 5, with the following specific steps:
  • Step S501: At the encoding end, the input has a frame length of 20 ms and a sampling rate of 16000 Hz. Apply VAD detection to the input audio frame to determine the type of the current frame. If the current frame is a speech frame, go to step S502; if the current frame is a noise frame and the previous frame is a speech frame (i.e., a switch from a speech frame to a noise frame is currently occurring), go to step S503.
  • the frame structure of the full-rate speech frame used in this embodiment is as shown in Table 3.
  • TDBWE Layer 3 - Broadband Enhancement Layer
  • Step S502 If the current frame is a voice frame, the current frame is encoded according to a voice frame coding algorithm, and a coded stream of up to 32 kb/s can be encoded.
  • Step S503: If a switch from a speech frame to a noise frame is currently occurring, the trailing phase may first be entered.
  • The trailing phase lasts N frames; that is, in the N frames after the switch from a speech frame to a noise frame occurs, the current noise frames are still encoded according to the speech frame encoding algorithm, but at a reduced coding rate. For example, if the speech frames before the switch were encoded at 8 kb/s or 12 kb/s, the noise frames in the trailing phase are encoded at a correspondingly reduced rate.
  • The learning and training of the noise parameters can be completed at the same time; that is, the autocorrelation function of the low-band signal component, the low-band coding parameters and the high-band coding parameters buffered in the trailing phase are used to initialize the encoding of subsequent noise frames.
  • Two transmission modes can be used for coded transmission of the current frame: discontinuous transmission (DTX) mode and continuous transmission mode. If the current frame is encoded and transmitted in the discontinuous transmission mode, step S504 is performed; in the continuous transmission mode, all received noise frames are encoded, and steps S505 to S507 are performed directly.
  • Step S504 Determine whether the current noise frame needs to be encoded. If the current noise frame needs to be encoded, go to step S505, otherwise no processing is performed on the current frame.
  • The DTX policy may use specific criteria to determine whether the current frame needs to be encoded; that is, the distortion of the current noise frame's spectrum and energy relative to the long-term average spectrum and energy (i.e., the averages of the previously buffered coding parameters) is computed. If the distortion exceeds a certain threshold, the noise frame is encoded; otherwise no processing is performed on the current frame.
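A possible form of this decision rule is sketched below. The thresholds and distance measures are hypothetical; the text leaves the specific criteria open.

```python
def need_sid_update(cur_energy, cur_spec, avg_energy, avg_spec,
                    energy_thr=3.0, spec_thr=0.1):
    """Hypothetical DTX decision: trigger an SID update when the current
    noise frame's energy or spectral parameters drift too far from the
    long-term averages (thresholds invented for illustration)."""
    energy_dist = abs(cur_energy - avg_energy)          # e.g. in dB
    spec_dist = sum((a - b) ** 2 for a, b in zip(cur_spec, avg_spec))
    return energy_dist > energy_thr or spec_dist > spec_thr
```

Stationary noise thus produces long runs of untransmitted frames, while a change in the acoustic background promptly refreshes the decoder's CNG parameters.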
  • The implementation module for encoding the noise frame is shown in Figure 6.
  • Step S505 Perform narrowband core layer coding on the current noise frame.
  • The narrowband core layer parameter coding may use the CELP model: QMF band-splitting filtering is performed on the background noise frame that needs SID coding, dividing it into several sub-bands by frequency.
  • This embodiment takes the simplest case.
  • The background noise frame is divided into two sub-bands: a low-band signal component sLB(n) with a frequency range of 0 to 4000 Hz, and a high-band signal component sHB(n) with a frequency range of 4000 to 8000 Hz.
  • Step S506 If the extension layer parameter coding is needed, the extension layer parameter coding is performed on the noise frame encoded by the narrowband core layer.
  • In the narrowband enhancement layer, the quantization errors of the spectral parameters and the energy parameter of the narrowband core layer are further quantized. That is, if the spectral parameter before quantization is w and the quantized spectral parameter in the core layer is w', then the narrowband enhancement layer quantizes the residual w - w', and the quantization result is an index value into the enhancement layer's spectral quantization codebook. For the energy parameter, a similar method is used to quantize the residual E - E', thereby obtaining the noise frame encoded by the narrowband enhancement layer.
  • the noise frame encoded by the narrowband enhancement layer is subjected to extended parameter coding.
  • The high-band signal component is decomposed from the background noise frame, and the TDBWE encoding algorithm is used to perform extended parameter encoding on the high-band signal component, as shown in FIG. 7. That is, the time-domain envelope and the frequency-domain envelope of the high-band signal component are first calculated separately.
  • the calculation method of the time domain envelope is as shown in formula (1):
  • I is the number of time domain envelopes.
  • the calculation method of the frequency domain envelope is as follows: First, a high-band signal component is windowed using a 128-tap Hanning window. The window function is as shown in equation (2):
  • the high-band signal component after windowing is:
  • j is the number of frequency domain envelopes.
  • The embodiment of the present invention can also be applied to obtain a frequency-domain envelope for any band of the high band, and the number of frequency-domain envelopes can be any value greater than 0; it is thus not limited to the application in G.729.1. Because, for the encoding of background noise, the human ear cannot distinguish the time-domain envelope of background noise very finely, it does not need to be divided into 16 time-domain envelopes as for a speech frame; only the average time-domain envelope of the entire frame needs to be calculated, as shown in equation (6):
  • the obtained time domain envelope is quantized using a uniform quantizer with a length of 5 bits and a quantization step size of 3 dB.
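This uniform quantizer can be sketched as below. Only the 3 dB step and the 5-bit index come from the text; the zero-dB offset and the clipping convention are assumptions made for the sketch.

```python
def quantize_env(t_env_db, step=3.0, bits=5):
    """Uniform scalar quantizer sketch for the time-domain envelope:
    3 dB steps, 5-bit index (dB offset/range mapping is an assumption)."""
    index = round(t_env_db / step)
    return max(0, min((1 << bits) - 1, index))   # clip to [0, 31]

def dequantize_env(index, step=3.0):
    """Reconstruct the envelope value from the transmitted index."""
    return index * step
```

With a 3 dB step the quantization error is at most 1.5 dB inside the representable range, which is coarse but adequate for a noise energy envelope.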
  • The quantized time-domain envelope is denoted T'. The quantized time-domain envelope T' is then subtracted from each dimensional component of the J-dimensional frequency-domain envelope; the resulting vector is split into 3 sub-vectors, which are quantized separately. The quantized time-domain envelope and frequency-domain envelope are output through the multiplexer to obtain the noise frame encoded by the wideband extension layer.
  • Step S507 After the encoding is completed, the encoded noise frame is transmitted.
  • The encoder system of the embodiment of the present invention is as shown in FIG. 8.
  • Step S508: At the decoding end, decode the coding parameters from the received code stream and determine the type of the current frame. If the current frame is a speech frame, decode the audio frame according to the speech frame decoding algorithm; if the current frame is a noise frame, go to step S509.
  • The media gateway may discard some coded bits from the outer layers to the inner layers according to channel conditions to adapt to the channel transmission capability, so even if the encoder sends the full-rate stream, the decoder may be unable to receive it; the decoder can then only decode the actually received code stream at the corresponding rate.
  • Step S509 Reconstruct the coding parameters of the received noise frame, and reconstruct a background noise signal according to the coding parameters of the noise frame.
  • In the discontinuous transmission mode, the decoder receives SID frames only intermittently. The coding parameters are reconstructed for the received noise frames; for frames that were not transmitted, the coding parameters of the current frame are reconstructed from previously received noise frames or from the noise parameters learned in the trailing phase, and background noise reconstruction is then performed.
  • The decoding module in the discontinuous transmission mode is shown in FIG. 9.
  • the coding parameters are reconstructed for all received noise frames for background noise reconstruction.
  • If the received noise frame contains only the narrowband core layer, the coding parameters of the narrowband core layer are decoded: the synthesis filter is constructed using the reconstructed spectral parameters, Gaussian random noise is used as the excitation signal and filtered through the synthesis filter, and the filtered signal is then shaped using the decoded energy parameter E, thereby reconstructing the low-band signal component of the background noise, as shown in FIG. 10.
  • If the decoder is required to output a wideband signal, the high-band signal component is set to 0 and the wideband output is synthesized from the reconstructed low-band signal component using the QMF synthesis filter. If a wideband output is not required, the reconstructed low-band signal component can be output directly.
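The zero-high-band case can be illustrated with a toy sketch: a real QMF synthesis bank filters and combines both channels, but with the high band set to 0 the output reduces to 1:2 interpolation of the low band. The frame sizes are assumptions, not values from the patent:

```python
import numpy as np
from scipy.signal import resample_poly

def qmf_synthesize(low_band, high_band=None):
    """Toy stand-in for QMF synthesis: with the high band set to zero, the
    wideband output reduces to interpolating the low band to twice the rate."""
    if high_band is None:
        high_band = np.zeros_like(low_band)
    # A real QMF bank would filter and combine both channels; only the
    # zero-high-band case (plain 1:2 interpolation) is illustrated here.
    return resample_poly(low_band, up=2, down=1)

nb = np.random.randn(160)   # one 20 ms low-band frame at 8 kHz
wb = qmf_synthesize(nb)     # wideband output at 16 kHz
assert len(wb) == 320
```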
  • If the received noise frame also includes a narrowband enhancement layer: since the narrowband enhancement layer only improves the quantization precision of the core-layer spectral and energy parameters and adds no new parameters, the decoded spectral and energy parameters are used, and a reconstructed wideband or narrowband background noise signal is obtained by a decoding process similar to that for a stream containing only the narrowband core layer.
  • the decoder system of the embodiment of the present invention is as shown in FIG.
  • Taking the TDAC encoding algorithm for the high-band signal component as an example, a background noise encoding and decoding method is shown in FIG. 13, with the following specific steps. Step S1301 At the encoding end, apply VAD detection to the input audio frame and determine the type of the current frame. If the current frame is a voice frame, go to step S1302; if the current frame is a noise frame and the previous frame is a voice frame (i.e., a switch from voice frame to noise frame has just occurred), go to step S1303.
  • the frame structure of the full-rate noise frame used in this embodiment is as shown in Table 5: Table 5 Bit allocation of noise frames
  • Step S1302 If the current frame is a voice frame, encode it according to the voice-frame coding algorithm; a code stream of up to 32 kb/s can be produced.
  • Step S1303 If a switch from voice frame to noise frame has just occurred, first enter the trailing phase.
  • The trailing phase lasts N frames; that is, during the N frames after the switch from voice frame to noise frame, the current noise frame is still encoded with the voice-frame coding algorithm, but at a reduced coding rate (for example, relative to a pre-switch speech coding rate of 8 kb/s or 12 kb/s).
  • During the trailing phase, learning and training of the noise parameters can be completed at the same time: the autocorrelation function of the low-band signal component, the low-band coding parameters and the high-band coding parameters are buffered during the trailing phase and used to initialize the encoding of subsequent noise frames.
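The buffering of trailing-phase statistics might be sketched as follows; the autocorrelation order and the 7-frame trailing length are assumptions for illustration, not values from the patent:

```python
import numpy as np

def autocorr(frame, order):
    """Autocorrelation lags 0..order of one low-band frame."""
    r = np.correlate(frame, frame, mode="full")
    mid = len(frame) - 1
    return r[mid: mid + order + 1]

class HangoverBuffer:
    """Accumulates per-frame autocorrelations over the N-frame trailing
    phase so the averaged statistics can seed the first noise-frame coding."""
    def __init__(self, order=10):
        self.order = order
        self.acfs = []

    def push(self, frame):
        self.acfs.append(autocorr(frame, self.order))

    def averaged(self):
        return np.mean(self.acfs, axis=0)

buf = HangoverBuffer(order=10)
for _ in range(7):                  # assume a 7-frame trailing phase
    buf.push(np.random.randn(80))   # 10 ms low-band frames at 8 kHz
assert buf.averaged().shape == (11,)
```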
  • Two transmission modes can be used for coded transmission of the current frame: discontinuous transmission (DTX) mode and continuous transmission mode.
  • If the current frame is coded and transmitted in discontinuous transmission mode, step S1304 is performed; if continuous transmission mode is used, all noise frames are encoded and steps S1305 to S1307 are performed directly.
  • Step S1304 Determine whether the current noise frame needs to be encoded. If it does, go to step S1305; otherwise, no processing is performed on the current frame.
  • The method for determining whether the current frame needs to be encoded is the same as in step S504 of the second embodiment and is not repeated here.
  • Step S1305 Perform high-pass filtering and down-sampling on the full-band noise signal to obtain the low-band signal component of the noise frame.
  • The low-band signal component can be obtained either with the QMF filtering method of the second embodiment, or with high-pass filtering followed by down-sampling; the latter method is preferred here.
  • The noise signal x(n) can be high-pass filtered with a second-order elliptic high-pass filter to obtain the filtered noise signal y(n); the filter's transfer function is shown in formula (7).
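Since the coefficients of formula (7) are not reproduced here, the following sketch designs a generic second-order elliptic high-pass with placeholder cutoff/ripple values and then down-samples 2:1; it illustrates the processing order only, not the patent's actual filter:

```python
import numpy as np
from scipy.signal import ellip, lfilter

# Generic 2nd-order elliptic high-pass; cutoff and ripple values are
# placeholders, since the patent's formula (7) coefficients are not shown.
b, a = ellip(N=2, rp=0.5, rs=40, Wn=50.0 / 8000.0, btype="highpass")

def preprocess(x):
    """High-pass filter x(n) to get y(n), then down-sample 2:1."""
    y = lfilter(b, a, x)
    return y[::2]

x = np.random.randn(320)    # one full-band frame at 16 kHz
low = preprocess(x)         # low-band component at 8 kHz
assert len(low) == 160
```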
  • Step S1306 Pre-emphasize the low-band signal component of the noise frame, and then perform CELP coding to obtain the low-band coding parameters of the noise frame; the noise frame may contain only narrowband core-layer parameters, or both a narrowband core layer and a narrowband enhancement layer.
  • The spectral parameters and the frame energy are used as the narrowband core-layer parameters of the background noise.
  • Step S1307 Reconstruct the low-band signal component from the low-band coding parameters of the obtained noise frame.
  • A synthesis filter is constructed from the reconstructed spectral parameters; Gaussian random noise is used as the excitation signal and filtered by the synthesis filter, and the filter output is shaped with the decoded energy parameters to reconstruct the low-band signal component of the background noise.
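The excitation-filter-shape pipeline of step S1307 can be sketched as follows, assuming a toy first-order LPC model in place of the decoded spectral parameters:

```python
import numpy as np
from scipy.signal import lfilter

def reconstruct_noise(lpc, target_energy, n=160, seed=0):
    """Filter Gaussian noise through the 1/A(z) synthesis filter built from
    the spectral (LPC) parameters, then scale the output so its frame
    energy matches the decoded energy parameter."""
    excitation = np.random.default_rng(seed).standard_normal(n)
    shaped = lfilter([1.0], lpc, excitation)        # synthesis filtering
    gain = np.sqrt(target_energy / np.sum(shaped ** 2))
    return gain * shaped

lpc = np.array([1.0, -0.9])     # toy 1st-order model, not decoded values
frame = reconstruct_noise(lpc, target_energy=160.0)
assert np.isclose(np.sum(frame ** 2), 160.0)
```

Because the excitation is random, only its spectral envelope and energy are controlled, which is exactly what comfort-noise reconstruction requires.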
  • Step S1308 Upsample the reconstructed low-band signal component to the original sampling rate and perform spectrum expansion to obtain the reconstructed full-band signal.
  • Step S1309 Perform an MDCT transform on the residual between the original full-band signal and the reconstructed full-band signal, then quantize and encode the MDCT coefficients to obtain the high-band coding parameters of the noise frame and reconstruct the high-band signal component of the noise frame. The noise frame may contain only a wideband core layer, or both a wideband core layer and a wideband enhancement layer.
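A direct (matrix-form) MDCT of the residual followed by coarse uniform quantization might look like this; the sine window, block length and quantization step are illustrative assumptions, not the TDAC coder's actual choices:

```python
import numpy as np

def mdct(x):
    """Direct MDCT of a length-2N block (sine window), giving N coefficients."""
    N = len(x) // 2
    w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
    n = np.arange(2 * N)[None, :]
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return (w * x) @ basis.T

def quantize(coeffs, step=0.25):
    """Coarse uniform quantization of the MDCT coefficients."""
    return np.round(coeffs / step).astype(int)

residual = np.random.randn(320)    # original minus reconstructed full band
idx = quantize(mdct(residual))     # high-band coding parameters (indices)
assert idx.shape == (160,)
```

A production coder would window overlapping blocks and entropy-code the indices; the sketch shows only the transform-then-quantize step named in S1309.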
  • Step S1310 The low-band and high-band signal components are multiplexed to obtain the layered coded stream of the background noise, which is then transmitted.
  • the encoder system of the embodiment of the present invention is as shown in FIG.
  • Step S1311 At the decoding end, decode the coding parameters from the received code stream and determine the type of the current frame. If the current frame is a voice frame, decode the audio signal according to the voice-frame decoding algorithm; if the current frame is a noise frame, go to step S1312.
  • Depending on the transmission characteristics of the channel, the media gateway can discard the outer-layer coded bits of the noise frame when needed without affecting the decoding of the inner-layer bits.
  • The decoder decodes based on the code stream it actually receives. Step S1312 Reconstruct the coding parameters of the received noise frame, and reconstruct the background noise signal from those parameters.
  • CELP decoding is performed on the received noise frame to obtain the decoded low-band signal component, which is upsampled to the full-band rate and frequency-spread to obtain a reconstructed background noise signal.
  • Specifically, the low-band coding parameters of the received noise frame are decoded into the low-band signal component with the CELP decoding algorithm, and the low-band signal component is upsampled and frequency-spread into a full-band signal; the high-band coding parameters (i.e., the MDCT coefficients) of the received noise frame are inverse-quantized and inverse-MDCT-transformed to obtain the residual signal, which is added to the full-band signal reconstructed from the low-band component to obtain the final reconstructed full-band background noise.
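The decoder-side residual addition of step S1312 reduces to a dequantize-transform-add pattern. In this sketch the inverse MDCT is passed in as a callable (identity in the demo), since its windowing and overlap-add details are omitted here:

```python
import numpy as np

def dequantize(idx, step=0.25):
    """Inverse of a uniform quantizer with the given step size."""
    return idx * step

def decode_fullband(lowband_fullband, residual_idx, inverse_mdct):
    """Add the inverse-MDCT residual back onto the full-band signal
    reconstructed from the low-band CELP parameters."""
    residual = inverse_mdct(dequantize(residual_idx))
    return lowband_fullband + residual

base = np.ones(4)                              # stand-in reconstructed full band
out = decode_fullband(base, np.array([4, -4, 0, 8]), lambda c: c)
assert np.allclose(out, [2.0, 0.0, 1.0, 3.0])
```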
  • the block diagram of the decoder system of this embodiment is as shown in FIG.
  • In this way, the encoding end selects the noise frames that need to be encoded according to the transmission mode of the current noise frame and performs layered coding on them, enabling bandwidth scalability for background noise frames; the decoding end decodes the coding parameters of the noise frame according to the transmission mode of the received layered-coded noise frame and performs background noise reconstruction, achieving bandwidth scalability for background noise.
  • Embodiment 4 of the present invention provides a codec system, including:
  • the encoder 10 is configured to: when the received audio frame is a noise frame, select a noise frame to be encoded according to a transmission mode of the current noise frame, and perform hierarchical coding on the noise frame that needs to be coded.
  • the decoder 20 is configured to: when the audio frame received from the encoder is a layered coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame, and perform background noise reconstruction according to the coding parameter.
  • the fifth embodiment of the present invention provides an encoder, as shown in FIG. 16, including: a selecting unit 11 configured to: when the received audio frame is a noise frame, select a noise frame to be encoded according to a transmission mode of the current frame, The result of the selection is sent to the coding unit.
  • the encoding unit 12 is configured to perform layered coding on the noise frame that needs to be encoded according to the result of the sending by the selecting unit.
  • the encoder further includes: a determining unit 13 configured to determine the type of the currently received audio frame; when the audio frame is a noise frame and the previous frame is a voice frame, it sends the received noise frame to the voice frame coding unit within a specific frame time, and sends the received noise frame to the selection unit 11 after that specific frame time.
  • the voice frame coding unit 14 is configured to: after receiving the noise frame sent by the determining unit 13, encode the noise frame according to the voice coding algorithm, reduce the coding rate, and buffer the coding parameters of the received noise frame.
  • the coding unit 12 further includes: a low band coding sub-unit 121 for performing core layer coding on the low band signal component of the noise frame.
  • the high-band coding sub-unit 122 is configured to perform enhancement layer coding on the high-band signal component of the noise frame whose core layer has been encoded by the low-band coding sub-unit 121.
  • Embodiment 6 of the present invention provides a decoder as shown in FIG. 17, which includes:
  • the decoding unit 21 is configured to: when the received audio frame is a layered coded noise frame, decode the coding parameter of the noise frame according to the transmission mode of the current noise frame.
  • the reconstruction unit 22 is configured to perform background noise reconstruction according to the coding parameters of the noise frame transmitted by the decoding unit.
  • the reconstruction unit 22 further includes: a low-band sub-unit 221, configured to reconstruct the low-band signal component of the background noise signal from the low-band coding parameters output by the decoding unit when the received noise frame contains only the narrowband core layer, or both the narrowband core layer and the narrowband enhancement layer.
  • the high-band sub-unit 222 is configured to reconstruct a high-band signal component of the background noise signal by using a high-band coding parameter output by the decoding unit when the received noise frame further includes the broadband extension layer.
  • the synthesizing subunit 223 is configured to perform synthesis filtering on the low band signal component and the high band signal component to obtain a background noise signal.
  • In this embodiment, the encoding end selects the noise frames to be encoded according to the transmission mode of the current noise frame and performs layered coding, enabling bandwidth scalability for background noise frames; the decoding end decodes the coding parameters of the noise frame according to the transmission mode of the received layered-coded noise frame and performs background noise reconstruction, achieving bandwidth-scalable decoding of background noise.
  • The present invention can be implemented by hardware, or by software plus a necessary general-purpose hardware platform. The technical solution of the present invention can be embodied as a software product stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.), comprising a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the various embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an encoding/decoding method and device for background noise. The method comprises: when the received frame is a noise frame, selecting the noise frame to be encoded according to the transmission mode of the current noise frame; and performing layered coding on the noise frame to be encoded. The background noise frame is thus encoded in a bandwidth-scalable manner. Correspondingly, the decoding method achieves bandwidth-scalable decoding of the background noise.
PCT/CN2008/072939 2007-11-07 2008-11-04 Procédé de codage/décodage et dispositif pour le bruit de fond WO2009067883A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710169832.6 2007-11-07
CN 200710169832 CN101430880A (zh) 2007-11-07 2007-11-07 一种背景噪声的编解码方法和装置

Publications (1)

Publication Number Publication Date
WO2009067883A1 true WO2009067883A1 (fr) 2009-06-04

Family

ID=40646234

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/072939 WO2009067883A1 (fr) 2007-11-07 2008-11-04 Procédé de codage/décodage et dispositif pour le bruit de fond

Country Status (2)

Country Link
CN (1) CN101430880A (fr)
WO (1) WO2009067883A1 (fr)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101826331B1 (ko) 2010-09-15 2018-03-22 삼성전자주식회사 고주파수 대역폭 확장을 위한 부호화/복호화 장치 및 방법
ES2564504T3 (es) * 2010-12-29 2016-03-23 Samsung Electronics Co., Ltd Aparato de codificación y aparato de descodificación con una ampliación de ancho de banda
EP2709101B1 (fr) * 2012-09-13 2015-03-18 Nxp B.V. Système et procédé de traitement audio numérique
CN110010141B (zh) * 2013-02-22 2023-12-26 瑞典爱立信有限公司 用于音频编码中的dtx拖尾的方法和装置
CN105225668B (zh) 2013-05-30 2017-05-10 华为技术有限公司 信号编码方法及设备
PT3008726T (pt) 2013-06-10 2017-11-24 Fraunhofer Ges Forschung Aparelho e método de codificação, processamento e descodificação de envelope de sinal de áudio por modelação da representação de soma cumulativa empregando codificação e quantização de distribuição
EP2980790A1 (fr) 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Appareil et procédé de sélection de mode de génération de bruit de confort
EP3079151A1 (fr) * 2015-04-09 2016-10-12 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Codeur audio et procédé de codage d'un signal audio
CN112863539B (zh) * 2019-11-28 2024-04-16 科大讯飞股份有限公司 一种高采样率语音波形生成方法、装置、设备及存储介质
CN113066487A (zh) * 2019-12-16 2021-07-02 广东小天才科技有限公司 一种矫正口音的学习方法、系统、设备及存储介质
CN114006874B (zh) * 2020-07-14 2023-11-10 中国移动通信集团吉林有限公司 一种资源块调度方法、装置、存储介质和基站
CN112420065B (zh) * 2020-11-05 2024-01-05 北京中科思创云智能科技有限公司 音频降噪处理方法和装置及设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867815A (en) * 1994-09-29 1999-02-02 Yamaha Corporation Method and device for controlling the levels of voiced speech, unvoiced speech, and noise for transmission and reproduction
JPH11352999A (ja) * 1998-04-06 1999-12-24 Ricoh Co Ltd 音声圧縮符号化装置
CN1428953A (zh) * 2002-04-22 2003-07-09 西安大唐电信有限公司 一种多通道amr声码器的实现方法和设备
CN1922660A (zh) * 2004-02-24 2007-02-28 松下电器产业株式会社 通信装置和信号编码/解码方法


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RAGOT, S. et al., "ITU-T G.729.1: An 8-32 kbit/s Scalable Coder Interoperable with G.729 for Wideband Telephony and Voice over IP," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 15 April 2007. *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10115406B2 (en) 2013-06-10 2018-10-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V Apparatus and method for audio signal envelope encoding, processing, and decoding by splitting the audio signal envelope employing distribution quantization and coding
US11776551B2 (en) 2013-06-21 2023-10-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out in different domains during error concealment
US11869514B2 (en) 2013-06-21 2024-01-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out for switched audio coding systems during error concealment
CN117672247A (zh) * 2024-01-31 2024-03-08 中国电子科技集团公司第十五研究所 一种实时音频滤除窄带噪声的方法及系统
CN117672247B (zh) * 2024-01-31 2024-04-02 中国电子科技集团公司第十五研究所 一种实时音频滤除窄带噪声的方法及系统

Also Published As

Publication number Publication date
CN101430880A (zh) 2009-05-13

Similar Documents

Publication Publication Date Title
WO2009067883A1 (fr) Procédé de codage/décodage et dispositif pour le bruit de fond
AU2018217299B2 (en) Improving classification between time-domain coding and frequency domain coding
US8532983B2 (en) Adaptive frequency prediction for encoding or decoding an audio signal
KR101854297B1 (ko) 시간 도메인 여기 신호를 기초로 하는 오류 은닉을 사용하여 디코딩된 오디오 정보를 제공하기 위한 오디오 디코더 및 방법
US8473301B2 (en) Method and apparatus for audio decoding
CN108831501B (zh) 用于带宽扩展的高频编码/高频解码方法和设备
US8718804B2 (en) System and method for correcting for lost data in a digital audio signal
JP6039678B2 (ja) 音声信号符号化方法及び復号化方法とこれを利用する装置
WO2009117967A1 (fr) Procédés et dispositifs de codage et de décodage
WO2009109139A1 (fr) Procédé de codage et de décodage par extension de la très large bande, codeur et système d'extension de la très large bande
US9047877B2 (en) Method and device for an silence insertion descriptor frame decision based upon variations in sub-band characteristic information
EP3039676A1 (fr) Extension de bande passante adaptative et son appareil
WO2010028301A1 (fr) Contrôle de netteté d'harmoniques/bruits de spectre
KR20160079849A (ko) 시간 도메인 여기 신호를 변형하는 오류 은닉을 사용하여 디코딩된 오디오 정보를 제공하기 위한 오디오 디코더 및 방법
WO2010000179A1 (fr) Procédé, système et dispositif pour élargir une bande passante
WO2014044197A1 (fr) Classement audio basé sur la qualité perceptuelle pour des débits binaires faibles ou moyens

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08855069

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08855069

Country of ref document: EP

Kind code of ref document: A1