WO2013097764A1 - Method, apparatus, and system for processing audio data - Google Patents

Method, apparatus, and system for processing audio data

Info

Publication number
WO2013097764A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
sid
frame
band signal
energy
Prior art date
Application number
PCT/CN2012/087812
Other languages
English (en)
French (fr)
Inventor
王喆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to KR1020147020836A priority Critical patent/KR101693280B1/ko
Priority to RU2014131387/08A priority patent/RU2579926C1/ru
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to BR112014016153-4A priority patent/BR112014016153B1/pt
Priority to ES12861377.5T priority patent/ES2610783T3/es
Priority to MX2014007968A priority patent/MX338445B/es
Priority to CA2861916A priority patent/CA2861916C/en
Priority to EP12861377.5A priority patent/EP2793227B1/en
Priority to MYPI2014001949A priority patent/MY173976A/en
Priority to JP2014549344A priority patent/JP6072068B2/ja
Priority to KR1020167036611A priority patent/KR101770237B1/ko
Priority to AU2012361423A priority patent/AU2012361423B2/en
Priority to SG11201403686SA priority patent/SG11201403686SA/en
Publication of WO2013097764A1 publication Critical patent/WO2013097764A1/zh
Priority to US14/318,899 priority patent/US9406304B2/en
Priority to IN1436KON2014 priority patent/IN2014KN01436A/en
Priority to ZA2014/04996A priority patent/ZA201404996B/en
Priority to HK14113112.0A priority patent/HK1199543A1/zh
Priority to US15/188,518 priority patent/US9892738B2/en
Priority to US15/867,977 priority patent/US10529345B2/en
Priority to US16/697,822 priority patent/US11183197B2/en
Priority to US17/507,200 priority patent/US11727946B2/en
Priority to US18/344,445 priority patent/US20230352035A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/028Noise substitution, i.e. substituting non-tonal spectral components by noisy source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012Comfort noise or silence coding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/22Mode decision, i.e. based on audio signal content versus external parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26Pre-filtering or post-filtering
    • G10L19/265Pre-filtering, e.g. high frequency emphasis prior to encoding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method, device, and system for processing audio data.
  • voice, image, audio, and video transmissions have a wide range of application requirements, such as mobile phone calls, audio and video conferencing, broadcast television, and multimedia entertainment.
  • the voice is digitized and transmitted from one terminal to another through a voice communication network, where the terminal can be a mobile phone, a digital telephone terminal, or any other type of voice terminal, such as a VoIP telephone, an ISDN telephone, a computer, an electric book, or a cable communication telephone.
  • the audio signal is compressed at the transmitting end and transmitted to the receiving end, and the receiving end recovers the audio signal and plays it by decompressing.
  • DTX/CNG: Discontinuous Transmission System / Comfort Noise Generation
  • SID: Silence Insertion Descriptor
  • This continuously recovered background noise is not a faithful reproduction of the background noise at the encoding end; rather, it aims to minimize the loss of auditory quality so that the user feels more comfortable. The recovered background noise is called comfort noise (CN), and this method of recovering CN at the decoder is called comfort noise generation.
  • ITU-T G.718 is a relatively new standardized wideband codec that includes a broadband DTX/CNG system.
  • the system can transmit the SID according to a fixed interval, and can also adaptively adjust the transmission interval of the SID according to the estimated noise level.
  • the G.718 SID frame consists of 16 ISP parameters and excitation energy parameters.
  • the ISP (Immittance Spectral Pair) parameters characterize the spectral envelope of the noise over the entire wideband bandwidth, and the excitation energy is obtained through the analysis filter represented by this set of ISP parameters.
  • in the CNG state, G.718 estimates the LPC coefficients required for CNG from the ISP parameters obtained by decoding the SID, and estimates the excitation energy required for CNG from the excitation energy obtained by decoding the SID frame. The reconstructed CN is obtained by exciting the CNG synthesis filter with gain-adjusted white noise.
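  • The last step above (exciting a synthesis filter with gain-adjusted white noise) can be sketched as follows. This is a minimal illustration, not the G.718 procedure: the function name, the Gaussian noise source, and the mean-square definition of excitation energy are assumptions, and the ISP-to-LPC conversion is omitted, so the LPC coefficients are passed in directly.

```python
import math
import random


def synthesize_comfort_noise(lpc, excitation_energy, n_samples, seed=0):
    """Excite an all-pole synthesis filter 1/A(z) with gain-adjusted white noise.

    lpc: coefficients a[1..p] of A(z) = 1 + a1*z^-1 + ... + ap*z^-p (assumed form)
    excitation_energy: target mean-square energy of the excitation (assumed definition)
    Returns (excitation, comfort_noise).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Gain adjustment: scale the white noise to the target mean-square energy.
    noise_energy = sum(x * x for x in noise) / n_samples
    gain = math.sqrt(excitation_energy / noise_energy)
    excitation = [gain * x for x in noise]

    # All-pole synthesis: y[n] = e[n] - sum_k a[k] * y[n - k]
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return excitation, out
```

  • In this sketch a first-order filter such as `lpc=[-0.5]` gives the noise a mild low-pass tilt, while an empty coefficient list returns the scaled white noise unchanged.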
  • the embodiment of the present invention provides a method, device and system for processing audio data.
  • the technical solution is as follows:
  • a method for processing audio data comprising:
  • acquiring a noise frame of the audio signal, and decomposing the noise frame into a noise low band signal and a noise high band signal; encoding and transmitting the noise low band signal by using a first discontinuous transmission mechanism, and encoding and transmitting the noise high band signal by using a second discontinuous transmission mechanism, wherein the transmission policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism is different from the transmission policy of the second SID of the second discontinuous transmission mechanism, or the encoding strategy of the first SID of the first discontinuous transmission mechanism is different from the encoding strategy of the second SID of the second discontinuous transmission mechanism.
  • a method for processing audio data comprising:
  • obtaining, by a decoder, a silence insertion descriptor (SID) frame, and determining whether the SID includes a low band parameter and/or a high band parameter;
  • if the SID includes the low band parameter, decoding the SID to obtain a noise low band parameter, locally generating a noise high band parameter, and obtaining a first comfort noise (CN) frame according to the decoded noise low band parameter and the locally generated noise high band parameter;
  • if the SID includes the high band parameter, decoding the SID to obtain a noise high band parameter, locally generating a noise low band parameter, and obtaining a second CN frame according to the decoded noise high band parameter and the locally generated noise low band parameter;
  • if the SID includes the high band parameter and the low band parameter, decoding the SID to obtain a noise high band parameter and a noise low band parameter, and obtaining a third CN frame according to the decoded noise high band parameter and noise low band parameter.
  • an encoding apparatus for audio data comprising:
  • an acquiring module, configured to acquire a noise frame of the audio signal and decompose the noise frame into a noise low band signal and a noise high band signal;
  • a transmission module, configured to encode and transmit the noise low band signal by using a first discontinuous transmission mechanism, and to encode and transmit the noise high band signal by using a second discontinuous transmission mechanism, where the transmission policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism is different from the transmission policy of the second SID of the second discontinuous transmission mechanism, or the coding strategy of the first SID of the first discontinuous transmission mechanism is different from the coding strategy of the second SID of the second discontinuous transmission mechanism.
  • a decoding apparatus for audio data comprising:
  • an obtaining module, configured to obtain a silence insertion descriptor (SID) frame and determine whether the SID includes a low band parameter and/or a high band parameter;
  • a first decoding module, configured to: if the SID obtained by the obtaining module includes the low band parameter, decode the SID to obtain a noise low band parameter, locally generate a noise high band parameter, and obtain a first comfort noise (CN) frame according to the decoded noise low band parameter and the locally generated noise high band parameter;
  • a second decoding module, configured to: if the SID obtained by the obtaining module includes the high band parameter, decode the SID to obtain a noise high band parameter, locally generate a noise low band parameter, and obtain a second CN frame according to the decoded noise high band parameter and the locally generated noise low band parameter;
  • a third decoding module, configured to: if the SID obtained by the obtaining module includes the high band parameter and the low band parameter, decode the SID to obtain a noise high band parameter and a noise low band parameter, and obtain a third CN frame according to the decoded noise high band parameter and noise low band parameter.
  • a processing system for audio data comprising: encoding means for audio data as described above and decoding means for audio data as described above.
  • the current noise frame is decomposed into a noise low band signal and a noise high band signal; the noise low band signal is encoded and transmitted by the first discontinuous transmission mechanism, and the noise high band signal is encoded and transmitted by the second discontinuous transmission mechanism.
  • the decoder obtains a silence insertion descriptor (SID) frame, determines whether the SID includes a low band parameter and/or a high band parameter, and uses different noise decoding modes for the different determination results.
  • in this way, computational complexity and coding bits can be saved without reducing the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve the overall coding quality, thereby solving the coding and transmission problem of ultra-wideband signals.
  • FIG. 2 is a flowchart of a method for processing audio data provided in Embodiment 2 of the present invention.
  • FIG. 3 is a flowchart of a method for processing audio data provided in Embodiment 3 of the present invention.
  • FIG. 4 is a flowchart of a method for processing audio data provided in Embodiment 4 of the present invention.
  • FIG. 5 is a schematic diagram of an apparatus for encoding audio data according to Embodiment 6 of the present invention.
  • FIG. 6 is a schematic diagram of another encoding apparatus for audio data provided in Embodiment 6 of the present invention.
  • FIG. 7 is a schematic diagram of an apparatus for decoding audio data according to Embodiment 7 of the present invention.
  • FIG. 8 is a schematic diagram of another decoding apparatus for audio data provided in Embodiment 7 of the present invention.
  • FIG. 9 is a schematic diagram of a processing system for audio data provided in Embodiment 8 of the present invention.
  • Detailed description
  • this embodiment provides a method for processing audio data, where the method includes:
  • encoding and transmitting the noise low-band signal by using the first discontinuous transmission mechanism, and encoding and transmitting the noise high-band signal by using the second discontinuous transmission mechanism, where the sending policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism is different from the sending policy of the second SID of the second discontinuous transmission mechanism, or the coding strategy of the first SID of the first discontinuous transmission mechanism is different from the coding strategy of the second SID of the second discontinuous transmission mechanism.
  • the first SID includes a low band parameter of the noise frame
  • the second SID includes a low band parameter or a high band parameter of the noise frame.
  • encoding and transmitting the noise high-band signal by using the second discontinuous transmission mechanism includes: determining whether the noise high-band signal has a preset spectrum structure; if yes, and the sending condition in the sending policy of the second SID is satisfied, encoding the SID of the noise high-band signal by using the coding strategy of the second SID and transmitting it; if not, determining that the SID of the noise high-band signal does not need to be encoded and transmitted.
  • determining whether the noise high-band signal has a preset spectrum structure includes:
  • obtaining a spectrum of the noise high-band signal and dividing the spectrum into at least two sub-bands; if the average energy of any first sub-band is not less than the average energy of a second sub-band, where the second sub-band lies in a higher frequency band than the first sub-band, confirming that the noise high-band signal does not have the preset spectrum structure; otherwise, confirming that the noise high-band signal has the preset spectrum structure.
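  • The sub-band test above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the equal-width sub-band split, and the default of four sub-bands are hypothetical, and "any first sub-band" is read as "every lower sub-band compared with every higher sub-band", so that a spectrum whose energy never rises with frequency is treated as having no preset structure.

```python
def subband_average_energies(power_spectrum, n_subbands):
    """Split a power spectrum into equal-width sub-bands and average each one."""
    if n_subbands < 2 or len(power_spectrum) < n_subbands:
        raise ValueError("need at least two sub-bands and one bin per sub-band")
    size = len(power_spectrum) // n_subbands
    energies = []
    for b in range(n_subbands):
        start = b * size
        # The last sub-band absorbs any leftover bins.
        end = len(power_spectrum) if b == n_subbands - 1 else start + size
        band = power_spectrum[start:end]
        energies.append(sum(band) / len(band))
    return energies


def has_preset_spectral_structure(power_spectrum, n_subbands=4):
    """Return False when no lower sub-band is weaker than any higher sub-band."""
    energies = subband_average_energies(power_spectrum, n_subbands)
    n = len(energies)
    never_rises = all(
        energies[i] >= energies[j] for i in range(n) for j in range(i + 1, n)
    )
    return not never_rises
```

  • Under this reading, a smoothly decaying noise spectrum yields `False` (no high-band SID is needed on this criterion), while a spectrum with a bump in a higher sub-band yields `True`.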
  • encoding and transmitting the noise high-band signal by using the second discontinuous transmission mechanism includes: generating a deviation degree value according to a first ratio and a second ratio, where the first ratio is the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal, and the second ratio is the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when an SID containing the noise high-band parameter was last transmitted before the noise frame;
  • determining whether the deviation degree value reaches a preset threshold; if yes, encoding the SID of the noise high-band signal by using the coding strategy of the second SID and transmitting it; if not, determining that the noise high-band signal does not need to be encoded and transmitted.
  • the first ratio being the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal includes:
  • the first ratio is the ratio of the instantaneous energy of the noise high-band signal of the noise frame to the instantaneous energy of the noise low-band signal;
  • the second ratio being the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when an SID containing the noise high-band parameter was last transmitted before the noise frame includes:
  • the second ratio is the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the time when an SID containing the noise high-band parameter was last transmitted before the noise frame;
  • the first ratio being the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal includes:
  • the first ratio is the ratio of the weighted average energy of the noise high-band signal of the noise frame and its previous noise frames to the weighted average energy of the noise low-band signal of the noise frame and its previous noise frames;
  • the second ratio being the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when an SID containing the noise high-band parameter was last transmitted before the noise frame includes:
  • the second ratio is the ratio of the weighted average energy of the noise high-band signal to the weighted average energy of the noise low-band signal of the noise frame at the time when an SID containing the noise high-band parameter was last transmitted before the noise frame.
  • generating the deviation degree value according to the first ratio and the second ratio includes:
  • calculating the absolute value of the difference between the logarithm of the first ratio and the logarithm of the second ratio to obtain the deviation degree value.
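  • The deviation degree and the resulting send decision can be sketched as follows. The function and parameter names are hypothetical, the base-10 logarithm is an assumption (the text does not state a base), and "reaches the threshold" is read as greater than or equal to.

```python
import math


def deviation_degree(high_energy, low_energy, ref_high_energy, ref_low_energy):
    """|log(first ratio) - log(second ratio)|, using base 10 as an assumption.

    high_energy / low_energy: band energies of the current noise frame.
    ref_high_energy / ref_low_energy: band energies at the time the last SID
    carrying the high-band parameter was sent.
    """
    first_ratio = high_energy / low_energy
    second_ratio = ref_high_energy / ref_low_energy
    return abs(math.log10(first_ratio) - math.log10(second_ratio))


def should_send_high_band_sid(high_energy, low_energy,
                              ref_high_energy, ref_low_energy, threshold):
    """Send a new high-band SID only when the deviation reaches the threshold."""
    d = deviation_degree(high_energy, low_energy, ref_high_energy, ref_low_energy)
    return d >= threshold
```

  • Working in the log domain makes the test symmetric: a tenfold rise and a tenfold drop of the high-to-low energy ratio produce the same deviation value.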
  • encoding and transmitting the noise high-band signal by using the second discontinuous transmission mechanism includes: determining whether the spectral structure of the noise high-band signal of the noise frame, compared with the average spectral structure of the noise high-band signal before the noise frame, satisfies a preset condition; if yes, encoding the SID of the noise high-band signal of the noise frame by using the coding strategy of the second SID and transmitting it; if not, determining that the noise high-band signal of the noise frame does not need to be encoded and transmitted.
  • the average spectral structure of the noise high-band signal before the noise frame includes: a weighted average of the spectrum of the noise high-band signal before the noise frame.
  • the sending condition in the sending policy of the second SID of the second discontinuous transmission mechanism further includes: the first discontinuous transmission mechanism satisfies a sending condition of the first SID.
  • the beneficial effects of the method provided by this embodiment are as follows: the current noise frame of the audio signal is acquired and decomposed into a noise low-band signal and a noise high-band signal; the noise low-band signal is encoded and transmitted by the first discontinuous transmission mechanism, and the noise high-band signal is encoded and transmitted by the second discontinuous transmission mechanism. By processing the high-band signal and the low-band signal differently, computational complexity and coding bits can be saved without reducing the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve the overall coding quality, thereby solving the coding and transmission problem of ultra-wideband signals.
  • a method for processing audio data includes:
  • the decoder obtains a silence insertion descriptor (SID) frame, and determines whether the SID includes a low band parameter and/or a high band parameter.
  • if the SID includes the low band parameter, decoding the SID to obtain a noise low band parameter, locally generating a noise high band parameter, and obtaining a first comfort noise (CN) frame according to the decoded noise low band parameter and the locally generated noise high band parameter;
  • if the SID includes the high band parameter, decoding the SID to obtain a noise high band parameter, locally generating a noise low band parameter, and obtaining a second CN frame according to the decoded noise high band parameter and the locally generated noise low band parameter;
  • if the SID includes the high band parameter and the low band parameter, decoding the SID to obtain a noise high band parameter and a noise low band parameter, and obtaining a third CN frame according to the decoded noise high band parameter and noise low band parameter.
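  • The three decoding branches can be sketched as a single dispatch. The dictionary representation of a decoded SID, the function name, and the two callback parameters for local generation are hypothetical and stand in for the decoder's real data structures.

```python
def assemble_cn_parameters(sid, generate_low_locally, generate_high_locally):
    """Pick decoded or locally generated parameters for each band.

    sid: dict that may hold decoded "low" and/or "high" band parameters
         (a hypothetical stand-in for a parsed SID frame).
    Returns (frame_kind, low_params, high_params), where frame_kind is
    "first", "second", or "third" as in the text.
    """
    has_low = "low" in sid
    has_high = "high" in sid
    if not (has_low or has_high):
        raise ValueError("SID carries neither low-band nor high-band parameters")

    # A band missing from the SID is filled in locally at the decoder.
    low = sid["low"] if has_low else generate_low_locally()
    high = sid["high"] if has_high else generate_high_locally()

    if has_low and has_high:
        kind = "third"
    elif has_low:
        kind = "first"
    else:
        kind = "second"
    return kind, low, high
```

  • For example, an SID holding only low-band parameters yields a "first" CN frame whose high-band parameters come from the local generator.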
  • if the SID includes the low band parameter, before decoding the SID to obtain the noise low band parameter, locally generating the noise high band parameter, and obtaining the first comfort noise CN frame according to the decoded noise low band parameter and the locally generated noise high band parameter, the method further includes:
  • if the decoder is in a first comfort noise generation (CNG) state, the decoder enters a second CNG state.
  • if the SID includes the high band parameter and the low band parameter, before decoding the SID to obtain the noise high band parameter and the noise low band parameter, and obtaining the third CN frame according to the decoded noise high band parameter and noise low band parameter, the method further includes:
  • if the decoder is in the second CNG state, the decoder enters the first CNG state.
  • determining whether the SID includes a low band parameter and/or a high band parameter includes: if the number of bits of the SID is less than a preset first threshold, confirming that the SID includes a high band parameter; if the number of bits of the SID is greater than the preset first threshold and less than a preset second threshold, confirming that the SID includes a low band parameter; if the number of bits of the SID is greater than the preset second threshold and less than a preset third threshold, confirming that the SID includes a high band parameter and a low band parameter;
  • or, if the SID includes a first identifier, confirming that the SID includes a high band parameter; if the SID includes a second identifier, confirming that the SID includes a low band parameter; if the SID includes a third identifier, confirming that the SID includes a low band parameter and a high band parameter.
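  • The bit-count rule and the CNG state switching described above can be sketched together. The enum, the function names, and the state labels are hypothetical; the thresholds are passed in because the text gives no values; and leaving the state unchanged for a high-band-only SID is an assumption, since the text does not describe that case.

```python
from enum import Enum


class SidContent(Enum):
    HIGH_BAND = "high"
    LOW_BAND = "low"
    BOTH = "both"


def classify_sid_by_bits(n_bits, first_threshold, second_threshold, third_threshold):
    """Map the SID bit count to its content using the three preset thresholds."""
    if n_bits < first_threshold:
        return SidContent.HIGH_BAND
    if first_threshold < n_bits < second_threshold:
        return SidContent.LOW_BAND
    if second_threshold < n_bits < third_threshold:
        return SidContent.BOTH
    return None  # bit counts on or beyond a boundary are not covered by the text


def next_cng_state(state, content):
    """Switch between the "first" and "second" CNG states as the text describes."""
    if content is SidContent.LOW_BAND and state == "first":
        return "second"
    if content is SidContent.BOTH and state == "second":
        return "first"
    return state  # assumption: all other cases keep the current state
```

  • The shortest SID is the high-band-only one here because, as the embodiment notes later, the high band can often be described by its energy alone.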
  • locally generating the noise high band parameter includes:
  • obtaining the weighted average energy of the noise high band signal at the time corresponding to the SID includes:
  • calculating the ratio of the energy of the noise high band signal to the energy of the noise low band signal at the time when an SID containing the high band parameter was last received before the SID, to obtain a first ratio, includes:
  • updating, at a first rate, the energy of the high band signal of the locally buffered previous CN frame; otherwise, updating the energy of the high band signal of the locally buffered previous CN frame at a second rate, where the first rate is greater than the second rate.
  • obtaining the weighted average energy of the noise high band signal at the time corresponding to the SID includes:
  • obtaining the synthesis filter coefficient of the noise high band signal at the time corresponding to the SID includes:
  • immittance spectral frequency (ISF) coefficients, ISP coefficients, line spectral frequency (LSF) coefficients, or line spectral pair (LSP) coefficients distributed in the frequency range corresponding to the high-band signal;
  • randomizing the M coefficients, where the randomization causes each of the M coefficients to gradually approach its corresponding target value, the target value being a value within a preset range adjacent to the coefficient value; the target value of each of the M coefficients is changed every N frames, where M and N are natural numbers;
  • obtaining the synthesis filter coefficient of the noise high band signal at the time corresponding to the SID according to the randomized filter coefficients.
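  • The randomization of the M coefficients can be sketched as follows. The class name, the uniform draw of targets, the first-order approach step, and the reading of "preset range adjacent to the coefficient value" as a symmetric interval around each base coefficient are all assumptions made for illustration.

```python
import random


class CoefficientRandomizer:
    """Drift each coefficient toward a target that is redrawn every N frames."""

    def __init__(self, base_coeffs, max_offset, frames_per_target, step=0.1, seed=0):
        self.base = list(base_coeffs)          # the M starting coefficients
        self.current = list(base_coeffs)
        self.targets = list(base_coeffs)
        self.max_offset = max_offset           # half-width of the preset range
        self.frames_per_target = frames_per_target  # N in the text
        self.step = step                       # fraction of the gap closed per frame
        self.rng = random.Random(seed)
        self.frame_count = 0

    def next_frame(self):
        # Every N frames, draw a fresh target near each base coefficient.
        if self.frame_count % self.frames_per_target == 0:
            self.targets = [
                b + self.rng.uniform(-self.max_offset, self.max_offset)
                for b in self.base
            ]
        self.frame_count += 1
        # Move each coefficient a small step toward its current target.
        self.current = [
            c + self.step * (t - c) for c, t in zip(self.current, self.targets)
        ]
        return list(self.current)
```

  • Because each update is a convex combination of values inside the preset range, the coefficients in this sketch never leave that range, which keeps the slowly varying spectrum close to the original envelope. A real implementation would also need to keep the coefficients ordered so that the synthesis filter stays stable.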
  • obtaining the synthesis filter coefficient of the noise high band signal at the time corresponding to the SID includes:
  • obtaining the synthesis filter coefficient of the noise high band signal at the time corresponding to the SID according to the randomized filter coefficients.
  • before the first CN frame is obtained according to the decoded noise low band parameter and the locally generated noise high band parameter, the method further includes:
  • multiplying the noise high band signal of the L frames following the SID by a smoothing coefficient less than 1, to obtain a new weighted average energy of the locally generated noise high band signal;
  • obtaining the first CN frame according to the decoded noise low band parameter and the locally generated noise high band parameter includes:
  • the decoder obtains a silence insertion descriptor (SID) frame and determines whether the SID includes a low band parameter and/or a high band parameter. If the SID includes the low band parameter, the decoder decodes the SID to obtain a noise low band parameter, locally generates a noise high band parameter, and obtains a first comfort noise (CN) frame according to the decoded noise low band parameter and the locally generated noise high band parameter. If the SID includes the high band parameter, the decoder decodes the SID to obtain a noise high band parameter, locally generates a noise low band parameter, and obtains a second CN frame according to the decoded noise high band parameter and the locally generated noise low band parameter. If the SID includes the high band parameter and the low band parameter, the decoder decodes the SID to obtain a noise high band parameter and a noise low band parameter, and obtains a third CN frame according to the decoded noise high band parameter and noise low band parameter.
  • a method for processing audio data is provided.
  • the CNG noise spectrum of the low frequency band or of the high frequency band has usually lost its harmonic structure, so the effect of the CNG high-band signal on auditory perception comes primarily from its energy rather than its spectral structure. Therefore, when performing DTX transmission on an ultra-wideband signal, in many cases it is not necessary to transmit the high-band signal spectrum in the SID, and a high-band spectrum can be constructed locally at the decoding end by an appropriate method. A locally constructed high-band spectrum does not cause significant perceptual distortion. In this way, the computation and the bits needed to calculate and encode the high-band spectrum at the encoding end are saved.
  • the method for processing audio data in this embodiment involves: analysis and classification of the noise high-band spectrum, blind construction of the high-band signal spectrum by the decoder, estimation of the high-band signal energy by the decoder when the SID does not include a high-band energy parameter, switching of the decoder between different CNG modules, and so on.
  • the specific processing method of the audio data provided at the encoder end in this embodiment includes:
  • the encoder obtains a noise frame of the audio signal, and decomposes the noise frame into a noise low band signal and a noise high band signal.
  • The noise frame obtained by the encoder may be the current noise frame or a noise frame buffered at the encoder side; this embodiment does not specifically limit this.
  • an ultra-wideband input audio signal sampled at 32 kHz is taken as an example.
  • The encoder first divides the input audio signal into frames of 20 ms (640 samples at 32 kHz).
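  • As a quick check of the framing arithmetic above, the following sketch splits a 32 kHz signal into 20 ms frames of 640 samples. The function name `split_frames` and the decision to drop a trailing partial frame are illustrative assumptions; the embodiment does not describe how leftover samples are buffered.

```python
SAMPLE_RATE_HZ = 32_000  # ultra-wideband input, as in the embodiment
FRAME_MS = 20            # frame duration, as in the embodiment
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 640 samples per frame


def split_frames(samples, frame_len=FRAME_LEN):
    """Split a sample sequence into consecutive full frames.

    Assumption: a trailing partial frame is dropped; the source does not
    say how the encoder handles leftover samples.
    """
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]
```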
  • For the current frame (in this embodiment, the frame currently to be encoded), the encoder first applies high-pass filtering with a passband above 50 Hz.
  • The high-pass filtered current frame is decomposed by a QMF (Quadrature Mirror Filter) analysis filter into a low-band signal s_0 and a high-band signal s_1. The low-band signal s_0 is sampled at 16 kHz and represents the 0 to 8 kHz spectrum of the current frame; the high-band signal s_1 is also sampled at 16 kHz and represents the 8 to 16 kHz spectrum of the current frame.
  • When the VAD (Voice Activity Detector) indicates that the current frame is a foreground signal frame, i.e. a speech frame, the encoder performs speech coding on the current frame.
  • the encoding of the speech encoded frame by the encoder belongs to the prior art, and is not described in this embodiment.
  • When the VAD indicates that the current frame is a noise frame, the encoder enters the DTX working state.
  • the noise frame refers to both the background noise frame and the silence frame.
  • The DTX controller determines, according to the SID transmission policy, whether the low-band signal of the current frame is to be encoded into an SID and transmitted.
  • the SID transmission strategy of the low-band signal in this embodiment is similar to the prior art, and the present invention does not describe it in detail.
  • Determining whether the high-band signal of the current noise frame satisfies the preset encoding transmission condition includes: determining whether the noise high-band signal has a preset spectral structure; if it does, and the sending condition in the second SID transmission policy is satisfied, encoding the SID of the noise high-band signal with the second SID encoding strategy and transmitting it; if it does not, determining that the noise high-band signal does not need to be encoded.
  • Determining whether the noise high-band signal has a preset spectral structure comprises: obtaining the spectrum of the noise high-band signal and dividing it into at least two sub-bands; if the average energy of any first sub-band is not less than the average energy of a second sub-band located in a higher frequency band than the first sub-band, the noise high-band signal is confirmed to have no preset spectral structure; otherwise it has a preset spectral structure.
  • The encoder performs spectrum analysis on the high-band signal s_1 of the current noise frame to determine whether s_1 has a fairly obvious spectral structure, that is, the preset spectral structure.
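  • The sub-band test described above can be sketched as follows. This is a minimal illustration, assuming a power spectrum given as a list ordered from low to high frequency, equal-width sub-bands, and a reading of "any first sub-band" as "every lower/higher pair of sub-bands"; the function names and the default of four sub-bands are hypothetical.

```python
def subband_mean_energies(power_spectrum, n_subbands):
    """Average energy per sub-band, ordered from low to high frequency.

    Assumption: equal-width sub-bands; bins left over after the last full
    sub-band are ignored.
    """
    if n_subbands < 2:
        raise ValueError("the embodiment requires at least two sub-bands")
    size = len(power_spectrum) // n_subbands
    if size == 0:
        raise ValueError("spectrum too short for the requested sub-bands")
    return [sum(power_spectrum[b * size:(b + 1) * size]) / size
            for b in range(n_subbands)]


def has_preset_spectral_structure(power_spectrum, n_subbands=4):
    """True if some higher sub-band carries more average energy than a lower one."""
    e = subband_mean_energies(power_spectrum, n_subbands)
    # "No preset structure": every lower sub-band is at least as energetic
    # as every sub-band above it, i.e. the energy never rises with frequency.
    no_structure = all(e[i] >= e[j]
                       for i in range(len(e))
                       for j in range(i + 1, len(e)))
    return not no_structure
```

  • With this reading, a spectrum whose energy falls off (or stays flat) with frequency is treated as unstructured, so only the low-band SID would need to be sent for it.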
  • Determining whether the high-band signal of the current noise frame satisfies the preset encoding transmission condition includes: generating a deviation degree value from a first ratio and a second ratio, where the first ratio is the ratio of the energy of the noise high-band signal of the noise frame to the energy of its noise low-band signal, and the second ratio is the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when the most recent SID containing the noise high-band parameter was transmitted before the noise frame; and determining whether the deviation degree value reaches a preset threshold. If it does, the SID of the noise high-band signal is encoded with the second SID encoding strategy and transmitted; if it does not, it is determined that the noise high-band signal does not need to be encoded and transmitted.
  • The first ratio, i.e. the ratio of the energy of the noise high-band signal of the noise frame to the energy of its noise low-band signal, may be the ratio of the instantaneous energy of the noise high-band signal of the noise frame to the instantaneous energy of its noise low-band signal.
  • Correspondingly, the second ratio may be the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the time when the most recent SID containing the noise high-band parameter was transmitted before the noise frame.
  • Alternatively, the first ratio may be the ratio of the weighted average energy of the noise high-band signals of the noise frame and of the noise frames before it to the weighted average energy of the corresponding noise low-band signals.
  • Generating the deviation degree value from the first ratio and the second ratio comprises: calculating the logarithm of the first ratio and the logarithm of the second ratio separately; and taking the absolute value of the difference between the two logarithms as the deviation degree value.
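  • The deviation degree value can be written as D = |log(r1) - log(r2)|, where r1 and r2 are the first and second ratios. The sketch below illustrates this; the base-10 logarithm, the function names, and the use of ">=" for "reaches the threshold" are assumptions, since the embodiment fixes none of them.

```python
import math


def deviation_degree(e_high, e_low, e_high_ref, e_low_ref):
    """Absolute difference of the log high/low energy ratios.

    e_high, e_low: energies of the current noise frame.
    e_high_ref, e_low_ref: energies at the time the last SID carrying
    high-band parameters was sent. All energies must be positive.
    """
    first_ratio = e_high / e_low
    second_ratio = e_high_ref / e_low_ref
    # Assumption: base-10 logarithm; the source only says "logarithm".
    return abs(math.log10(first_ratio) - math.log10(second_ratio))


def should_send_highband_sid(deviation, threshold):
    """Hypothetical decision rule: send when the deviation reaches the threshold."""
    return deviation >= threshold
```

  • Because the value is a difference of logarithms, it depends only on how much the high-to-low energy ratio has changed, not on the absolute noise level.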
  • Determining whether the deviation degree value reaches a preset threshold may be implemented as follows: in the DTX working state, the encoder calculates the logarithmic energies e_1 and e_0 of the high-band signal s_1 and the low-band signal s_0 of the current frame, respectively.
  • The long-term moving average e_xa (x = 0, 1) is initialized to e_x of the current frame.
  • the long-term moving average is one of the weighted average calculations, which is not specifically limited in this embodiment.
  • Determining whether the deviation degree value reaches a preset threshold may be used as a second determination condition. In a specific implementation, it is enough to evaluate either the first determination condition or the second determination condition in order to confirm that the noise high-band signal needs to be encoded and transmitted; this embodiment does not specifically limit this.
  • The second determination condition is optional. Its purpose is to help the decoder estimate the noise high-band energy locally from the noise low-band energy and the noise high-to-low band energy ratio at the time the last SID containing the high-band parameter was received. Specifically, if the deviation degree value is not calculated at the encoder, the decoder may find, among the speech frames in a period before the current noise frame, the speech frame whose high-band signal energy is lowest, and estimate the current noise high-band energy locally from the high-band signal energy of that frame, for example by taking that energy as the current noise high-band signal energy, from which the weighted average of the energy of the noise high-band signal at the time corresponding to the SID is obtained.
  • the specific embodiment is not limited herein.
  • Transform isp_SID(i) into the ISF coefficients isf_SID(i), quantize isf_SID(i) to obtain a set of quantization indices idx_ISF, and encapsulate them in the SID.
  • Update the long-term moving average isp_a(i) of the decoded ISP coefficients with the buffered isp'(i), using the smoothing factor α = 0.9.
  • isp_a(i) is initialized to isp'(i) of the first SID.
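  • The update formula itself did not survive in this text, so the sketch below assumes the usual first-order recursion avg = α · avg + (1 - α) · new with the stated α = 0.9, and initialization from the first SID. The function name and the use of `None` for "not yet initialized" are hypothetical.

```python
ALPHA = 0.9  # smoothing factor stated in the embodiment


def update_moving_average(avg, new, alpha=ALPHA):
    """Update a long-term moving average of a coefficient vector.

    Assumption: standard first-order recursion; the exact formula of the
    embodiment is not recoverable from the source text.
    """
    if avg is None:
        # First SID: initialize the average to the decoded coefficients.
        return list(new)
    if len(avg) != len(new):
        raise ValueError("coefficient vectors must have the same length")
    return [alpha * a + (1.0 - alpha) * x for a, x in zip(avg, new)]
```

  • With α = 0.9 each new SID contributes only about one tenth of the average, so the filter shape changes slowly from one SID to the next.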
  • Transform isp_a(i) into the LPC coefficients lpc_a(i) to obtain the analysis filter A(z).
  • the cache is in this embodiment.
  • When the SID flag of the current noise frame is 1, the weighted average logarithmic energy e_SID is calculated from the M cached history frames, including the current noise frame: e_SID = … 1.5, where w_1(k) is a set of M-dimensional positive coefficients.
  • The quantization index idx_e is obtained by quantizing e_SID.
  • The coding and transmission strategy for the noise low-band signal is similar to the prior-art coding and transmission strategy for a noise wideband signal. It is only briefly introduced here, and the specific implementation process is not described in detail in this embodiment.
  • the noise high band signal of the current noise frame does not need to be encoded, and only the noise low band signal is encoded, which saves the calculation amount of the coding end and also saves the transmission bit.
  • The first discontinuous transmission mechanism encodes and transmits the noise low-band signal, and the second discontinuous transmission mechanism encodes and transmits the noise high-band signal.
  • the SID needs to encode the high-band parameter in addition to the encoding of the low-band parameter.
  • The coding of the low-band parameters is the same as the coding in step 303 and is not described again in this embodiment.
  • lsp_a(i) is quantized to obtain a set of quantization indices idx_LSP.
  • The long-term moving average e_1a of the high-band logarithmic energy at the encoder is quantized to obtain the quantization index idx_E.
  • The SID is then composed of idx_ISF, idx_e, idx_LSP and idx_E; in this embodiment, an SID composed of idx_ISF, idx_e, idx_LSP and idx_E is called a large SID.
  • the coding strategy for the noisy high-band signal is similar to the principle of the coding strategy for the low-band signal. This is only a brief introduction in this embodiment. The specific implementation process is not described in detail in this embodiment.
  • In this embodiment, when the encoding transmission condition of the noise high-band signal is satisfied, the noise high-band signal is always encoded and transmitted together with the noise low-band signal. Optionally, the two may also be encoded and transmitted at different times, so that there are three possible cases when an SID is transmitted: 1) only the low-band signal of the current noise frame is encoded and transmitted; 2) only the high-band signal of the current noise frame is encoded and transmitted; 3) the low-band and high-band signals of the current noise frame are encoded and transmitted at the same time. In the last case, the sending condition in the transmission policy of the second SID of the second discontinuous transmission mechanism further includes: the first discontinuous transmission mechanism satisfies the sending condition of the first SID.
  • the above three cases of transmitting the SID are not specifically limited in this embodiment.
  • Steps 302-304 specifically carry out the step of encoding and transmitting the noise low-band signal with a first discontinuous transmission mechanism and encoding and transmitting the noise high-band signal with a second discontinuous transmission mechanism, where the transmission policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism differs from the transmission policy of the second SID of the second discontinuous transmission mechanism, or the encoding strategy of the first SID differs from the encoding strategy of the second SID.
  • The beneficial effects of the method provided by the present invention are as follows: the current noise frame of the audio signal is acquired and decomposed into a noise low-band signal and a noise high-band signal; the noise low-band signal is encoded and transmitted with a first discontinuous transmission mechanism, and the noise high-band signal is encoded and transmitted with a second discontinuous transmission mechanism. By processing the high-band and low-band signals differently, computational complexity and coding bits are saved without reducing the subjective quality of the codec. The saved bits can be used to reduce the transmission bandwidth or to improve the overall coding quality, which solves the problem of encoding and transmitting ultra-wideband signals.
  • a method for processing audio data is provided.
  • the decoding end can determine, according to the received code stream, whether the current frame is a voice coded frame or a SID or a NO_DATA frame.
  • A NO_DATA frame is a frame during the noise period for which the encoder neither encodes nor transmits an SID.
  • the decoder may further determine, according to the number of bits of the SID, whether the SID includes a low band and/or a high band parameter.
  • The decoder may also determine whether the SID includes low-band and/or high-band parameters according to a specific identifier written into the SID, which requires additional identification bits when the SID is encoded. For example, a first identifier written into the SID indicates that the SID contains only the high-band parameter, a second identifier indicates that the SID contains only the low-band parameter, and a third identifier indicates that the SID contains both the high-band parameter and the low-band parameter.
  • If the current frame is a speech-encoded frame, the decoder performs the process of decoding the speech frame.
  • the specific processing is similar to the prior art, which is not described in detail in this embodiment.
  • the decoder selects a corresponding method to reconstruct the CN frame according to the specific working state of the CNG.
  • The CNG has two working states: the half-decoding CNG state, corresponding to small SID frames, is the first CNG state; the full-decoding CNG state, corresponding to large SID frames, is the second CNG state.
  • In the second CNG state, the decoder reconstructs the CN frame from the noise high-band and low-band parameters obtained by decoding the large SID frame.
  • In the first CNG state, the decoder reconstructs the CN frame from the noise low-band parameters obtained by decoding the small SID frame and the locally estimated noise high-band parameters.
  • The decoder acquires an SID. If the SID includes the high-band parameter and the low-band parameter, the decoder decodes the SID to obtain the noise high-band parameter and the noise low-band parameter, and obtains a third CN frame from the decoded noise high-band and low-band parameters.
  • The decoder first determines the type of the frame, so that a different decoding mode is used for each type. Specifically, if the number of bits of the SID is less than a preset first threshold, the SID is confirmed to include a high-band parameter; if the number of bits is greater than the first threshold and less than a preset second threshold, the SID is confirmed to include a low-band parameter; if the number of bits is greater than the second threshold and less than a preset third threshold, the SID is confirmed to include both a high-band parameter and a low-band parameter. Alternatively, if the SID includes a first identifier, it is confirmed to include a high-band parameter; if it includes a second identifier, it is confirmed to include a low-band parameter; and if it includes a third identifier, it is confirmed to include both a low-band parameter and a high-band parameter.
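  • The two classification rules described above (by bit count, or by an explicit identifier) can be sketched as follows. The enum, the function names, the identifier values 1 to 3 and the behaviour on a bit count equal to a threshold are all assumptions; the embodiment only gives the strict inequalities.

```python
from enum import Enum


class SidContent(Enum):
    HIGH_ONLY = "high"
    LOW_ONLY = "low"
    HIGH_AND_LOW = "high+low"


def classify_sid_by_bits(n_bits, t1, t2, t3):
    """Classify an SID from its size, using three preset thresholds t1 < t2 < t3."""
    if not (t1 < t2 < t3):
        raise ValueError("thresholds must satisfy t1 < t2 < t3")
    if n_bits < t1:
        return SidContent.HIGH_ONLY
    if t1 < n_bits < t2:
        return SidContent.LOW_ONLY
    if t2 < n_bits < t3:
        return SidContent.HIGH_AND_LOW
    # Assumption: sizes equal to a threshold or >= t3 are not covered
    # by the embodiment, so they are rejected here.
    raise ValueError(f"SID size {n_bits} is not covered by the thresholds")


# Hypothetical identifier values; the source only names a first, second
# and third identifier.
_IDENTIFIER_MAP = {
    1: SidContent.HIGH_ONLY,
    2: SidContent.LOW_ONLY,
    3: SidContent.HIGH_AND_LOW,
}


def classify_sid_by_identifier(identifier):
    """Classify an SID from an explicit identifier field."""
    return _IDENTIFIER_MAP[identifier]
```

  • The bit-count rule needs no extra bits in the SID but relies on the three SID types having distinct sizes; the identifier rule costs extra bits but does not depend on size.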
  • If the SID includes the high-band parameter and the low-band parameter, the decoder decodes the SID to obtain the noise high-band parameter and the noise low-band parameter, and obtains the third CN frame from the decoded noise high-band and low-band parameters.
  • The decoder decodes the SID to obtain the decoded low-band excitation logarithmic energy e_D, the low-band ISF coefficients isf_d(i), the high-band logarithmic energy E_D and the high-band LSP coefficients lsp_d(i).
  • Transform isf_d(i) into the ISP coefficients isp_d(i), and convert e_D and E_D into the energies e_d and E_d.
  • The decoder passes s'_0 and s'_1 through the QMF synthesis filter to obtain the final 32 kHz-sampled CN frame reconstructed by the decoder.
  • If the SID includes the low-band parameter, the decoder decodes the SID to obtain the noise low-band parameter, locally generates a noise high-band parameter, and obtains the first CN frame from the decoded noise low-band parameter and the locally generated noise high-band parameter.
  • the high band signal of the first CN frame is still obtained by the method of exciting the synthesis filter with white noise, except that the high band signal energy and the synthesis filter coefficient of the first CN frame are obtained by local estimation.
  • Locally generating the noise high-band parameter includes: obtaining the weighted average energy of the noise high-band signal and the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID; the noise high-band signal is then obtained from that weighted average energy and those synthesis filter coefficients.
  • In this embodiment, obtaining the weighted average energy of the noise high-band signal at the time corresponding to the SID includes: 1) obtaining the energy of the low-band signal of the first CN frame from the decoded noise low-band parameter; 2) calculating the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when an SID containing the high-band parameter was received before this SID, to obtain a first ratio; 3) obtaining the energy of the noise high-band signal at the time corresponding to the SID from the energy of the low-band signal of the first CN frame and the first ratio; 4) taking a weighted average of the energy of the noise high-band signal at the time corresponding to the SID and the energy of the high-band signal of the locally buffered CN frame, to obtain the weighted average energy of the noise high-band signal at the time corresponding to the SID. This weighted average energy is the high-band signal energy of the first CN frame.
  • Calculating the first ratio includes: calculating the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the time when an SID containing the high-band parameter was received before this SID; or calculating the ratio of the weighted average energy of the noise high-band signal to the weighted average energy of the noise low-band signal at that time.
  • The instantaneous energy is the energy obtained by decoding. When the energy of the noise high-band signal at the time corresponding to the SID is greater than the energy of the high-band signal of the previous, locally buffered CN frame, the buffered energy is updated at a first rate; otherwise it is updated at a second rate, the first rate being greater than the second rate.
  • The energy E_0 of the low-band signal s'_0 of the first CN frame is obtained from the decoded noise low-band parameter.
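  • The local estimate of the high-band energy can be sketched as follows. The scaling by the stored high-to-low ratio follows the steps above; the interpretation of "rate" as an interpolation step, the values 0.5 and 0.1, and the function names are assumptions, as the embodiment only requires the first rate to be greater than the second.

```python
def estimate_highband_energy(e_low_now, e_high_ref, e_low_ref):
    """Scale the current low-band energy by the stored high/low energy ratio.

    e_high_ref, e_low_ref: energies at the time the last SID carrying
    high-band parameters was received (e_low_ref must be positive).
    """
    first_ratio = e_high_ref / e_low_ref
    return e_low_now * first_ratio


def update_cached_energy(cached, estimate, first_rate=0.5, second_rate=0.1):
    """Move the buffered high-band energy toward the new estimate.

    Assumption: the first (larger) rate applies when the estimate exceeds
    the buffered value, the second (smaller) rate otherwise.
    """
    rate = first_rate if estimate > cached else second_rate
    return cached + rate * (estimate - cached)
```

  • For example, with a stored ratio of 0.5 and a decoded low-band energy of 4.0, the estimate is 2.0; a buffered value of 1.0 then moves halfway, to 1.5, under the assumed first rate.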
  • Optionally, obtaining the weighted average energy of the noise high-band signal at the time corresponding to the SID includes: selecting, among the speech frames within a preset time period before the SID, the speech frame whose high-band signal energy is lowest, and obtaining the weighted average energy of the noise high-band signal at the time corresponding to the SID from the high-band signal energy of that frame; or selecting, among the speech frames within a preset time period before the SID, N speech frames whose high-band signal energy is less than a preset threshold, and obtaining the weighted average energy of the noise high-band signal at the time corresponding to the SID from the weighted average energy of the high-band signals of those N speech frames. In both cases, the weighted average energy of the noise high-band signal at the time corresponding to the SID is the high-band signal energy of the first CN frame.
  • Obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID includes: distributing M immittance spectral frequency (ISF) coefficients, immittance spectral pair (ISP) coefficients, line spectral frequency (LSF) coefficients or line spectral pair (LSP) coefficients within the frequency range corresponding to the high-band signal; randomizing the M coefficients, such that each of the M coefficients gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changing every N frames, where N may be a variable; and obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID from the randomized coefficients.
  • RND in equation (15) is a 9-dimensional sequence of random numbers, each different from the others and all within [-1, 1].
  • The 10 in mod(cnt, 10), used in the calculation of R_t(i), can also be a variable N, such as:
  • N = 10 + 5 · RND, when mod(cnt, N_(-1)) = 0
  • where RND is a random number in the range [-1, 1]; this embodiment does not specifically limit this.
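  • The randomization can be sketched as follows: each coefficient drifts toward a target near its base value, and the targets are redrawn every N frames, with N itself redrawn as 10 + 5 · RND. The class name, the `spread` and `step` parameters and their values, and the rounding of N to whole frames are assumptions; equation (15) itself is not reproduced in this text. The sketch also does not enforce the ordering that LSF-type coefficients need for a stable filter.

```python
import random


class CoefficientRandomizer:
    """Let filter coefficients drift toward slowly changing random targets."""

    def __init__(self, base, spread=0.05, step=0.1, rng=None):
        self.base = list(base)      # nominal coefficient values
        self.current = list(base)   # values used for the current frame
        self.targets = list(base)
        self.spread = spread        # hypothetical half-width of the target range
        self.step = step            # hypothetical fraction moved per frame
        self.rng = rng or random.Random()
        self.cnt = 0                # frames since the targets were last redrawn
        self.period = 10            # N, redrawn together with the targets

    def next_frame(self):
        if self.cnt % self.period == 0:
            # New target for every coefficient, within base +/- spread.
            self.targets = [b + self.spread * self.rng.uniform(-1.0, 1.0)
                            for b in self.base]
            # N = 10 + 5 * RND with RND in [-1, 1], rounded to whole frames.
            self.period = max(1, round(10 + 5 * self.rng.uniform(-1.0, 1.0)))
            self.cnt = 0
        # Move each coefficient a fraction of the way toward its target.
        self.current = [c + self.step * (t - c)
                        for c, t in zip(self.current, self.targets)]
        self.cnt += 1
        return list(self.current)
```

  • Because every update is a convex combination of the current value and a target inside the allowed range, the coefficients never leave that range, while the varying N avoids an audible periodic pattern.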
  • The randomized coefficients are transformed into the LPC coefficients lpc(i), i = 0, 1, ..., 9.
  • Multiply lpc(i) by a set of 10-dimensional weighting coefficients W(i) = {0.6699, 0.5862, 0.5129, 0.4488, 0.3927, 0.3436, 0.3007, 0.2631, 0.2302, 0.2014} to obtain the weighted LPC coefficients lpc_w(i), i.e. the coefficients of the estimated synthesis filter 1/A(z).
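  • The weighting step is an element-wise product, sketched below with the ten weights listed above. The function name is hypothetical. As an observation that the source does not state, the listed weights agree to four decimals with 0.875 raised to the powers 3 through 12, which is the shape of a conventional bandwidth-expansion window.

```python
# Weighting coefficients W(i) as listed in the embodiment.
W = [0.6699, 0.5862, 0.5129, 0.4488, 0.3927,
     0.3436, 0.3007, 0.2631, 0.2302, 0.2014]


def weight_lpc(lpc):
    """Multiply each LPC coefficient lpc(i), i = 0..9, by W(i)."""
    if len(lpc) != len(W):
        raise ValueError("expected 10 LPC coefficients")
    return [w * a for w, a in zip(W, lpc)]
```

  • Since every weight is below 1 and the weights decrease with i, higher-order coefficients are attenuated more strongly, which flattens the spectral peaks of the estimated filter.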
  • Optionally, obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID includes: acquiring the M ISF, ISP, LSF or LSP coefficients of the locally buffered noise high-band signal; randomizing the M coefficients, such that each coefficient gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each coefficient changing every N frames; and obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID from the randomized coefficients.
  • This embodiment is not specifically limited.
  • Once s'_0 is obtained, the first CN frame, i.e. the final 32 kHz-sampled frame reconstructed by the decoder, is obtained through the QMF synthesis filter.
  • the locally generated noise high band parameter may also be optimized before the first CN frame is obtained according to the noise low band parameter obtained by the decoding and the locally generated noise high band parameter.
  • The specific optimization step includes: when the history frame adjacent to the SID is a speech-encoded frame, if the average energy of the high-band signal, or of part of the high-band signal, decoded from that speech-encoded frame is less than the average energy of the locally generated noise high-band signal, or of the corresponding part of it, the noise high-band signals of the L frames starting from the SID are multiplied by a smoothing coefficient less than 1, giving a new weighted average energy of the locally generated noise high-band signal. Correspondingly, obtaining the first CN frame from the decoded noise low-band parameter and the locally generated noise high-band parameter includes: obtaining the first CN frame from the decoded noise low-band parameter, the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID, and the new weighted average energy of the locally generated noise high-band signal.
  • The high-band signal energy of the current SID and of the frames that follow it (50 frames in this embodiment) needs to be smoothed.
  • The specific smoothing method is to multiply s'_1 of the current frame by the gain G_s to obtain the smoothed s'_1s.
  • This smoothing process only takes up to 50 frames, and if the period occurs - greater than E, the smoothing process is terminated.
  • - and may also only represent the energy of a part of the frame, which is not specifically limited in this embodiment.
  • s'_0 and s'_1 (or s'_1s) are passed through the QMF synthesis filter to obtain the final 32 kHz-sampled CN frame reconstructed by the decoder.
  • If the SID includes the high-band parameter, the decoder decodes the SID to obtain the noise high-band parameter, locally generates a noise low-band parameter, and obtains the second CN frame from the decoded noise high-band parameter and the locally generated noise low-band parameter.
  • The method for decoding the high-band parameter is the same as the method in step 401 and is not described again in this embodiment. The method for locally generating the low-band parameter is the same as the prior-art method for locally generating a wideband parameter, and its details are not described in this embodiment.
  • In this embodiment, the decoder acquires a silence insertion descriptor (SID) frame and determines whether the SID includes a low-band parameter and/or a high-band parameter. If the SID includes the low-band parameter, the decoder decodes the SID to obtain a noise low-band parameter, locally generates a noise high-band parameter, and obtains a first comfort noise (CN) frame from the decoded noise low-band parameter and the locally generated noise high-band parameter. If the SID includes the high-band parameter, the decoder decodes the SID to obtain a noise high-band parameter, locally generates a noise low-band parameter, and obtains a second CN frame from the decoded noise high-band parameter and the locally generated noise low-band parameter. If the SID includes both the high-band parameter and the low-band parameter, the decoder decodes the SID to obtain the noise high-band parameter and the noise low-band parameter, and obtains a third CN frame from them.
  • Computational complexity and coding bits can thus be saved without reducing the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve the overall coding quality, which solves the problem of encoding and transmitting ultra-wideband signals.
  • The locally generated noise high-band parameter may also be optimized, so that comfort noise with a better effect is obtained, which further improves the performance of the decoder.
  • a method for processing audio data is provided.
  • The encoder acquires a noise frame of the audio signal and decomposes the noise frame into a noise low-band signal and a noise high-band signal.
  • Determining whether the high-band signal of the noise frame satisfies the preset encoding transmission condition includes: determining whether the spectral structure of the noise high-band signal of the noise frame, compared with the average spectral structure of the noise high-band signals before the noise frame, satisfies a preset condition; if it does, the SID of the noise high-band signal of the noise frame is encoded with the second SID encoding strategy and transmitted; if it does not, the noise high-band signal of the noise frame does not need to be encoded and transmitted.
  • the average spectral structure of the noise high band signal before the noise frame includes: a weighted average of the spectrum of the noise high band signal before the noise frame.
  • The second determination condition may also be used to determine whether the noise high-band signal needs to be encoded and transmitted; this embodiment does not specifically limit this.
  • LSP, LSF, ISF and ISP coefficients are merely different representations, in different domains, of the synthesis filter coefficients; this embodiment does not specifically limit which is used. The moving average is updated with lsp(i), where lsp_a(i) is the long-term moving average of lsp(i).
  • Obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID includes: acquiring the M ISF, ISP, LSF or LSP coefficients of the locally buffered noise high-band signal; randomizing the M coefficients, such that each of the M coefficients gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changing every N frames; and obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID from the randomized coefficients.
  • the specific noise obtained at the time corresponding to the SID is high.
  • lsp'(i) is randomized in the same manner as in the fourth embodiment to obtain lsp_1(i).
  • lsp_1(i) is transformed into the LPC coefficients lpc_1(i), which are weighted by w(i) in the same manner as in the fourth embodiment to obtain the synthesis filter 1/A(z).
  • When the current frame is an SID, the obtained lsp_1(i) is not used to update the long-term moving average of the LSP coefficients of the high-band signal of the CN frame buffered by the decoder.
  • In this embodiment, when the encoder quantizes the long-term moving average e_1a of the logarithmic energy of the high-band signal, e_1a is first attenuated by a certain amount (i.e. a certain value is subtracted from it) and then quantized. The decoder therefore no longer needs to multiply s(i) by G2 or G4, as it does in Embodiment 4.
  • the other steps of the decoding end in this embodiment are similar to the steps in the foregoing embodiment, and are not described in detail in this embodiment.
  • The beneficial effects of the method provided by the present invention are as follows. At the encoder, the current noise frame of the audio signal is acquired and decomposed into a noise low-band signal and a noise high-band signal; the noise low-band signal is encoded and transmitted with a first discontinuous transmission mechanism, and the noise high-band signal is encoded and transmitted with a second discontinuous transmission mechanism. At the decoder, a silence insertion descriptor (SID) frame is acquired and it is determined whether the SID includes a low-band parameter and/or a high-band parameter. If the SID includes the low-band parameter, the SID is decoded to obtain a noise low-band parameter, a noise high-band parameter is generated locally, and a first comfort noise (CN) frame is obtained from the decoded noise low-band parameter and the locally generated noise high-band parameter. If the SID includes the high-band parameter, the SID is decoded to obtain a noise high-band parameter, a noise low-band parameter is generated locally, and a second CN frame is obtained from the decoded noise high-band parameter and the locally generated noise low-band parameter. If the SID includes both the high-band parameter and the low-band parameter, the SID is decoded to obtain the noise high-band parameter and the noise low-band parameter, and a third CN frame is obtained from them.
  • an apparatus for encoding audio data includes: an obtaining module 501, and a transmission module 502.
  • the obtaining module 501 is configured to acquire a noise frame of the audio signal, and decompose the noise frame into a noise low band signal and a noise high band signal;
  • the transmission module 502 is configured to encode and transmit the noise low-band signal using a first discontinuous transmission mechanism, and to encode and transmit the noise high-band signal using a second discontinuous transmission mechanism, where the sending policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism differs from the sending policy of the second SID of the second discontinuous transmission mechanism, or the encoding policy of the first SID of the first discontinuous transmission mechanism differs from the encoding policy of the second SID of the second discontinuous transmission mechanism.
  • the first SID includes a low band parameter of the noise frame
  • the second SID includes a low band parameter and/or a high band parameter of the noise frame.
  • the transmission module 502 includes:
  • the first transmission unit 502a is configured to determine whether the noise high-band signal has a preset spectral structure; if it does, and the sending condition in the second SID sending policy is satisfied, to encode the SID of the noise high-band signal using the second SID encoding policy and send it; if it does not, to determine that the noise high-band signal does not need to be encoded and transmitted.
  • the first transmission unit 502a includes:
  • a determining subunit configured to obtain the spectrum of the noise high-band signal and divide the spectrum into at least two sub-bands; if the average energy of every first sub-band is not less than the average energy of each second sub-band, where a second sub-band lies in a higher frequency band than the first sub-band, to confirm that the noise high-band signal does not have the preset spectral structure; otherwise, to confirm that the noise high-band signal has the preset spectral structure.
  • the transmission module 502 includes:
  • a second transmission unit 502b configured to generate a deviation value from a first ratio and a second ratio, where the first ratio is the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal, and the second ratio is the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when a SID containing a noise high-band parameter was most recently sent before the noise frame; and to determine whether the deviation value reaches a preset threshold; if it does, to encode the SID of the noise high-band signal using the second SID encoding policy and send it; if it does not, to determine that the noise high-band signal does not need to be encoded and transmitted.
  • the first ratio is a ratio of an energy of a noise highband signal of the noise frame to an energy of the noise lowband signal, and includes:
  • the first ratio is a ratio of an instantaneous energy of a noise high band signal of the noise frame to an instantaneous energy of the noise low band signal;
  • the second ratio is a ratio of the energy of the noise highband signal and the energy of the noise lowband signal at the time when the SID containing the noisy highband parameter is transmitted last time before the noise frame, and includes:
  • the second ratio is a ratio of the instantaneous energy of the noise high band signal and the instantaneous energy of the noise low band signal of the time corresponding to the last transmission of the SID including the noisy high band parameter before the noise frame;
  • the first ratio is a ratio of the energy of the noise high band signal of the noise frame to the energy of the noise low band signal, and includes:
  • the first ratio is a ratio of a weighted average energy of a noise highband signal of the noise frame and its previous noise frame to a weighted average energy of a noise lowband signal of the noise frame and its previous noise frame;
  • the second ratio is a ratio of the energy of the noise highband signal and the energy of the noise lowband signal at the time when the SID containing the noisy highband parameter is transmitted last time before the noise frame, and includes:
  • the second ratio is the ratio of the weighted average energy of the high-band signal to the weighted average energy of the low-band signal of the noise frame, and of the noise frames before it, at the time when a SID containing a noise high-band parameter was most recently sent before the noise frame.
  • the second transmission unit 502b includes:
  • a calculating subunit configured to calculate the logarithm of the first ratio and the logarithm of the second ratio, and to calculate the absolute value of the difference between the two logarithms to obtain the deviation value.
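The deviation value computed by the calculating subunit can be sketched as follows. This is a minimal illustration only: the function names, the base-10 logarithm, and the example threshold are assumptions, since the text above does not fix the logarithm base or the threshold value.

```python
import math


def deviation_value(high_energy, low_energy, ref_high_energy, ref_low_energy):
    """Absolute difference of the logarithms of the two high/low energy ratios.

    high_energy / low_energy: energies for the current noise frame.
    ref_high_energy / ref_low_energy: energies at the time the last SID
    carrying high-band parameters was sent. All energies must be > 0.
    The base-10 logarithm is an assumption of this sketch.
    """
    first_ratio = high_energy / low_energy
    second_ratio = ref_high_energy / ref_low_energy
    return abs(math.log10(first_ratio) - math.log10(second_ratio))


def should_send_high_band_sid(deviation, threshold):
    """Send a high-band SID only when the deviation reaches the threshold."""
    return deviation >= threshold
```

For example, if the high-to-low energy ratio has grown from 1 to 4 since the last high-band SID, the deviation is log10(4), about 0.6, which would trigger a new high-band SID for any (hypothetical) threshold at or below that value.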
  • the transmission module 502 includes:
  • a third transmission unit 502c configured to determine whether the spectral structure of the noise high-band signal of the noise frame, compared with the average spectral structure of the noise high-band signals before the noise frame, satisfies a preset condition; if it does, to encode the SID of the noise high-band signal of the noise frame using the second encoding policy and send it; if it does not, to determine that the noise high-band signal of the noise frame does not need to be encoded and transmitted.
  • the average spectral structure of the noise highband signal before the noise frame includes: a weighted average of the spectrum of the noise highband signal before the noise frame.
  • the sending condition in the sending policy of the second SID of the second discontinuous transmission mechanism in the embodiment further includes: the first discontinuous transmission mechanism satisfies a sending condition of the first SID.
  • the device provided by the present invention has the following beneficial effects: it acquires a current noise frame of the audio signal, decomposes it into a noise low-band signal and a noise high-band signal, encodes and transmits the noise low-band signal using the first discontinuous transmission mechanism, and encodes and transmits the noise high-band signal using the second discontinuous transmission mechanism. Processing the high-band and low-band signals differently in this way saves computational complexity and coding bits without degrading the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve overall coding quality, which solves the encoding and transmission problem caused by super-wideband.
  • Embodiment 7
  • the apparatus includes: an obtaining module 601, a first decoding module 602, a second decoding module 603, and a third decoding module 604.
  • the obtaining module 601 is configured to determine whether the received current mute insertion description frame SID includes a high band parameter or a low band parameter;
  • the first decoding module 602 is configured to: if the SID acquired by the obtaining module 601 includes the low-band parameter, decode the SID to obtain a noise low-band parameter, generate a noise high-band parameter locally, and obtain a first comfort noise (CN) frame from the decoded noise low-band parameter and the locally generated noise high-band parameter;
  • a second decoding module 603, configured to: if the SID acquired by the acquiring module 601 includes the highband parameter, decode the SID to obtain a noise highband parameter, and generate a noise lowband parameter locally, according to the decoding. a noise high band parameter and the locally generated noise low band parameter to obtain a second CN frame;
  • a third decoding module 604 configured to: if the SID acquired by the acquiring module 601 includes the highband parameter and the lowband parameter, decoding the SID to obtain a noise highband parameter and the noise lowband parameter, according to the The noise high band parameter and the noise low band parameter obtained by the decoding obtain the third CN frame.
  • the first decoding module 602 is further configured to: before decoding the SID to obtain the noise low-band parameter, generating the noise high-band parameter locally, and obtaining the first CN frame from them, if the decoder is in a first comfort noise generation (CNG) state, enter a second CNG state.
  • the third decoding module 604 is further configured to: before decoding the SID to obtain the noise high-band parameter and the noise low-band parameter and obtaining the third CN frame from them, if the decoder is in the second CNG state, enter the first CNG state.
  • the obtaining module 601 includes:
  • a first confirming unit configured to: if the number of bits of the SID is less than a preset first threshold, confirm that the SID includes a high-band parameter; if the number of bits of the SID is greater than the preset first threshold and less than a preset second threshold, confirm that the SID includes a low-band parameter; if the number of bits of the SID is greater than the preset second threshold and less than a preset third threshold, confirm that the SID includes a high-band parameter and a low-band parameter; or a second confirming unit configured to: if the SID includes a first identifier, confirm that the SID includes a high-band parameter; if the SID includes a second identifier, confirm that the SID includes a low-band parameter; if the SID includes a third identifier, confirm that the SID includes a low-band parameter and a high-band parameter.
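The bit-count rule used by the first confirming unit can be sketched as below. The function name, the string labels, and the threshold values in the usage note are hypothetical; the text does not say how a bit count exactly equal to a threshold is treated, so this sketch returns `None` in that case rather than guess.

```python
def classify_sid_by_size(num_bits, first_threshold, second_threshold, third_threshold):
    """Infer which parameters a SID carries from its size in bits.

    Assumes first_threshold < second_threshold < third_threshold.
    Returns "high", "low", "low+high", or None when the size matches
    none of the ranges described in the text.
    """
    if num_bits < first_threshold:
        return "high"
    if first_threshold < num_bits < second_threshold:
        return "low"
    if second_threshold < num_bits < third_threshold:
        return "low+high"
    return None
```

With illustrative thresholds of 16, 40, and 64 bits, a 10-bit SID would be treated as high-band only, a 35-bit SID as low-band only, and a 48-bit SID as carrying both.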
  • the first decoding module 602 includes:
  • a first acquiring unit configured to respectively obtain a weighted average energy of a noise high band signal and a synthesis filter coefficient of a noise high band signal at a time corresponding to the SID;
  • a second acquiring unit configured to obtain the noise high band signal according to the weighted average energy of the noise high band signal and the synthesis filter coefficient of the noise high band signal at the time corresponding to the obtained SID.
  • the first acquiring unit includes:
  • a first obtaining subunit configured to obtain the energy of the low-band signal of the first CN frame from the noise low-band parameter obtained by decoding;
  • a calculation subunit configured to calculate the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when a SID containing a high-band parameter was received before the SID, to obtain a first ratio;
  • a second acquiring subunit configured to obtain the energy of the noise high-band signal at the time corresponding to the SID from the energy of the low-band signal of the first CN frame and the first ratio;
  • a third acquiring subunit configured to take a weighted average of the energy of the noise high-band signal at the time corresponding to the SID and the energy of the high-band signal of the locally buffered CN frame, to obtain the weighted average energy of the noise high-band signal at the time corresponding to the SID, where this weighted average energy is the high-band signal energy of the first CN frame;
  • the calculation subunit is specifically configured to calculate the ratio of the weighted average of the energy of the noise high-band signal to the weighted average of the energy of the noise low-band signal at the time when a SID containing a high-band parameter was received before the SID, to obtain the first ratio.
  • when the energy of the noise high-band signal at the time corresponding to the SID is greater than the energy of the high-band signal of the locally buffered previous CN frame, the energy of the high-band signal of the locally buffered previous CN frame is updated at a first rate; otherwise it is updated at a second rate, the first rate being greater than the second rate.
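The energy estimate described by these subunits can be sketched as follows. This is an illustration under stated assumptions: energies are treated as linear (not logarithmic) values, the update is a first-order weighted average, and the rate values 0.5 and 0.1 are hypothetical placeholders that only respect the requirement that the first rate exceed the second.

```python
def estimate_high_band_energy(low_band_energy, first_ratio, buffered_energy,
                              first_rate=0.5, second_rate=0.1):
    """Estimate the CN high-band energy when the SID carries no high-band data.

    low_band_energy: low-band energy of the CN frame decoded from the SID.
    first_ratio: high/low energy ratio remembered from the last SID that
        carried high-band parameters.
    buffered_energy: high-band energy of the locally buffered previous CN frame.
    Returns the updated (weighted average) high-band energy.
    """
    # Instantaneous estimate: scale the decoded low-band energy by the ratio.
    instant = low_band_energy * first_ratio
    # Track increases quickly and decreases slowly (first_rate > second_rate).
    rate = first_rate if instant > buffered_energy else second_rate
    return (1.0 - rate) * buffered_energy + rate * instant
```

Calling this once per low-band-only SID and feeding the result back as `buffered_energy` gives a smoothed high-band energy track that follows the decoded low-band energy.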
  • the first acquiring unit includes:
  • a first selection subunit configured to select, from the speech frames within a preset time period before the SID, the high-band signal of the speech frame whose high-band signal energy is smallest, and to obtain the weighted average energy of the noise high-band signal at the time corresponding to the SID from the energy of that high-band signal, where the weighted average energy of the noise high-band signal at the time corresponding to the SID is the high-band signal energy of the first CN frame; or
  • a second selection subunit configured to select, from the speech frames within the preset time period before the SID, the high-band signals of the N speech frames whose high-band signal energy is less than a preset threshold, and to obtain the weighted average of the energy of the noise high-band signal at the time corresponding to the SID from the weighted average energy of the high-band signals of the N speech frames, where the weighted average energy of the noise high-band signal at the time corresponding to the SID is the high-band signal energy of the first CN frame.
  • the first acquiring unit includes:
  • a distribution subunit configured to distribute M immittance spectral frequency (ISF) coefficients, immittance spectral pair (ISP) coefficients, line spectral frequency (LSF) coefficients, or line spectral pair (LSP) coefficients within the frequency range corresponding to the high-band signal;
  • a first randomization processing sub-unit configured to perform randomization processing on the M coefficients, wherein the randomization is characterized by: causing each of the M coefficients to gradually reach a corresponding target value thereof Closely, the target value is a value within a preset range adjacent to the coefficient value; a target value of each of the M coefficients is changed every N frames, wherein the M and the N are both Natural number;
  • a fourth acquiring subunit configured to obtain a synthesis filter coefficient of the noise highband signal at the time corresponding to the SID according to the randomized processed filter coefficient.
  • the first acquiring unit includes:
  • a fifth obtaining subunit configured to obtain the M ISF coefficients or ISP coefficients or LSF coefficients or LSP coefficients of the locally buffered noise highband signal
  • a second randomization processing subunit configured to randomize the M coefficients, where the randomization moves each of the M coefficients gradually toward its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames; and a sixth acquiring subunit configured to obtain the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID from the randomized filter coefficients.
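The randomization described above (each coefficient drifting toward a target that is redrawn every N frames) can be sketched as follows. The class name, the uniform target distribution, the step size, and the choice to draw targets around the initial coefficient values are all assumptions of this sketch; it also does not enforce the ordering constraints that real LSF or ISF vectors must satisfy.

```python
import random


class CoefficientRandomizer:
    """Slowly drifts M spectral coefficients toward per-coefficient targets."""

    def __init__(self, coeffs, spread, period_n, step=0.1, seed=0):
        self.base = list(coeffs)      # reference values the targets stay close to
        self.current = list(coeffs)   # values returned frame by frame
        self.targets = list(coeffs)
        self.spread = spread          # half-width of the preset range (assumed)
        self.period_n = period_n      # targets are redrawn every N frames
        self.step = step              # fraction of the gap closed per frame (assumed)
        self.rng = random.Random(seed)
        self.frame = 0

    def next_frame(self):
        # Every N frames, pick a new target near each coefficient's base value.
        if self.frame % self.period_n == 0:
            self.targets = [b + self.rng.uniform(-self.spread, self.spread)
                            for b in self.base]
        self.frame += 1
        # Move each coefficient a small step toward its own target.
        self.current = [c + self.step * (t - c)
                        for c, t in zip(self.current, self.targets)]
        return list(self.current)
```

Each call to `next_frame()` would supply the coefficients for one CN frame; converting them to synthesis filter coefficients is outside the scope of this sketch.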
  • the device further includes:
  • the optimization module 605 is configured to: before the first decoding module 602 obtains the first CN frame, when the history frame adjacent to the SID is a speech-coded frame, if the average energy of the high-band signal or part of the high-band signal decoded from the speech-coded frame is less than the average energy of the locally generated noise high-band signal or part of the noise high-band signal, multiply the noise high-band signals of the subsequent L frames starting from the SID by a smoothing coefficient less than 1, to obtain a new weighted average of the energy of the locally generated noise high-band signal;
  • the first decoding module 602 is specifically configured to obtain a fourth CN frame from the noise low-band parameter obtained by decoding, the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID, and the new weighted average of the energy of the locally generated noise high-band signal.
  • the decoder acquires a silence insertion descriptor (SID) frame and determines whether the SID includes a low-band parameter and/or a high-band parameter. If the SID includes the low-band parameter, the decoder decodes the SID to obtain a noise low-band parameter, generates a noise high-band parameter locally, and obtains a first comfort noise (CN) frame from the decoded noise low-band parameter and the locally generated noise high-band parameter. If the SID includes the high-band parameter, the decoder decodes the SID to obtain a noise high-band parameter, generates a noise low-band parameter locally, and obtains a second CN frame from the decoded noise high-band parameter and the locally generated noise low-band parameter. If the SID includes both the high-band parameter and the low-band parameter, the decoder decodes the SID to obtain the noise high-band parameter and the noise low-band parameter, and obtains a third CN frame from them.
  • a processing system for audio data comprising: an encoding device 500 for audio data as described above and a decoding device 600 for audio data as described above.
  • the technical solution provided by the embodiments of the present invention has the following beneficial effects: the encoder acquires a current noise frame of the audio signal, decomposes it into a noise low-band signal and a noise high-band signal, encodes and transmits the noise low-band signal using the first discontinuous transmission mechanism, and encodes and transmits the noise high-band signal using the second discontinuous transmission mechanism. The decoder acquires a silence insertion descriptor (SID) frame and determines whether the SID includes a low-band parameter and/or a high-band parameter. If the SID includes the low-band parameter, the decoder decodes the SID to obtain a noise low-band parameter, generates a noise high-band parameter locally, and obtains a first comfort noise (CN) frame from the two. If the SID includes the high-band parameter, the decoder decodes the SID to obtain a noise high-band parameter, generates a noise low-band parameter locally, and obtains a second CN frame from the two. If the SID includes both parameters, the decoder decodes the SID to obtain them and obtains a third CN frame. Processing the high-band and low-band signals differently in this way saves computational complexity and coding bits without degrading the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve overall coding quality, which solves the encoding and transmission problem caused by super-wideband.
  • the device and the system provided by this embodiment may be based on the same concept as the method embodiments; their specific implementation is described in detail in the method embodiments and is not repeated here.
  • the method and apparatus for processing audio data in the above embodiments may be applied to an audio encoder or an audio decoder.
  • Audio codecs can be used in a wide variety of electronic devices, such as mobile phones, wireless devices, personal data assistants (PDAs), handheld or portable computers, GPS receivers/navigators, cameras, audio/video players, camcorders, video recorders, and surveillance equipment.
  • Such an electronic device typically includes an audio encoder or an audio decoder, which may be implemented directly by a digital circuit or chip such as a DSP (digital signal processor), or by a processor executing the procedure in software code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Noise Elimination (AREA)

Abstract

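The present invention discloses a method, apparatus, and system for processing audio data, and belongs to the field of communications technologies. The method includes: acquiring a noise frame of an audio signal and decomposing the current noise frame into a noise low-band signal and a noise high-band signal; encoding and transmitting the noise low-band signal using a first discontinuous transmission mechanism; and encoding and transmitting the noise high-band signal using a second discontinuous transmission mechanism. By processing the high-band and low-band signals differently, the invention saves computational complexity and coding bits without degrading the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve overall coding quality.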

Description

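Method, Apparatus, and System for Processing Audio Data
Technical Field
The present invention relates to the field of communications technologies, and in particular to a method, apparatus, and system for processing audio data.
Background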
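In digital communications, there is very broad demand for transmitting speech, images, audio, and video, for example in mobile phone calls, audio and video conferencing, broadcast television, and multimedia entertainment. Speech is digitized and carried over a voice communication network from one terminal to another. A terminal here may be a mobile phone, a digital telephone terminal, or any other type of voice terminal; examples of digital telephone terminals include VoIP phones, ISDN phones, computers, and cable communication phones. To reduce the resources an audio signal occupies during storage or transmission, the signal is compressed at the sending end before being transmitted to the receiving end, which restores it by decompression and plays it back.
In voice communication, only about 40% of the time contains speech; the rest is silence or background noise. To save transmission bandwidth and avoid spending it on silence or background noise segments, DTX/CNG (Discontinuous Transmission System/Comfort Noise Generation) technology emerged. Put simply, DTX/CNG does not encode noise frames continuously. Instead, following some policy, it encodes only once every several frames during noise or silence, usually at a bit rate much lower than that used for speech frames. Such a low-rate encoded noise frame is called a SID (Silence Insertion Descriptor) frame. The decoder reconstructs continuous background noise frames from the intermittently received SIDs. The reconstructed noise is not a faithful reproduction of the background noise at the encoder; rather, it aims to avoid introducing audible quality degradation so that it sounds comfortable to the user. This reconstructed background noise is called CN (Comfort Noise), and the method by which the decoder reconstructs it is called comfort noise generation.
In the prior art, ITU-T G.718 is a relatively recent standardized wideband codec that includes a wideband DTX/CNG system. The system can send SIDs at a fixed interval, or adaptively adjust the SID sending interval according to the estimated noise level. A G.718 SID frame consists of 16 ISP (Immittance Spectral Pair) parameters and an excitation energy parameter. The ISP parameters represent the spectral envelope of the noise over the entire wideband bandwidth, and the excitation energy is obtained from the analysis filter represented by these ISP parameters. At the decoder, in the CNG state, G.718 estimates the LPC coefficients required for CNG from the ISP parameters decoded from the SID, estimates the excitation energy required for CNG from the excitation energy parameter decoded from the SID frame, and excites the CNG synthesis filter with gain-adjusted white noise to obtain the reconstructed CN.
For the super-wideband spectral envelope, however, the bandwidth is extremely wide. If the prior art were extended to a super-wideband DTX/CNG system, the SID would have to encode the complete super-wideband spectral envelope, which would require more computation and bits to calculate and encode a dozen or so additional ISP parameters. Because the high-band part of noise (here, the frequency range above wideband) is usually perceptually insensitive, the computation and bits spent on it are poor value, which lowers the coding efficiency of the codec.
Summary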
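To solve the encoding and transmission problem caused by super-wideband, embodiments of the present invention provide a method, device, and system for processing audio data. The technical solutions are as follows:
In one aspect, a method for processing audio data is provided, the method including:
acquiring a noise frame of an audio signal and decomposing the noise frame into a noise low-band signal and a noise high-band signal; and
encoding and transmitting the noise low-band signal using a first discontinuous transmission mechanism, and encoding and transmitting the noise high-band signal using a second discontinuous transmission mechanism, where the sending policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism differs from the sending policy of the second SID of the second discontinuous transmission mechanism, or the encoding policy of the first SID of the first discontinuous transmission mechanism differs from the encoding policy of the second SID of the second discontinuous transmission mechanism.
In one aspect, a method for processing audio data is provided, the method including:
acquiring, by a decoder, a silence insertion descriptor (SID) frame, and determining whether the SID includes a low-band parameter and/or a high-band parameter;
if the SID includes the low-band parameter, decoding the SID to obtain a noise low-band parameter, generating a noise high-band parameter locally, and obtaining a first comfort noise (CN) frame from the decoded noise low-band parameter and the locally generated noise high-band parameter;
if the SID includes the high-band parameter, decoding the SID to obtain a noise high-band parameter, generating a noise low-band parameter locally, and obtaining a second CN frame from the decoded noise high-band parameter and the locally generated noise low-band parameter; and
if the SID includes the high-band parameter and the low-band parameter, decoding the SID to obtain the noise high-band parameter and the noise low-band parameter, and obtaining a third CN frame from the decoded noise high-band parameter and noise low-band parameter.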
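In another aspect, an apparatus for encoding audio data is provided, the apparatus including:
an obtaining module, configured to acquire a noise frame of an audio signal and decompose the noise frame into a noise low-band signal and a noise high-band signal; and
a transmission module, configured to encode and transmit the noise low-band signal using a first discontinuous transmission mechanism, and to encode and transmit the noise high-band signal using a second discontinuous transmission mechanism, where the sending policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism differs from the sending policy of the second SID of the second discontinuous transmission mechanism, or the encoding policy of the first SID of the first discontinuous transmission mechanism differs from the encoding policy of the second SID of the second discontinuous transmission mechanism.
In another aspect, an apparatus for decoding audio data is further provided, the apparatus including:
an obtaining module, configured to acquire a silence insertion descriptor (SID) frame and determine whether the SID includes a low-band parameter and/or a high-band parameter;
a first decoding module, configured to: if the SID acquired by the obtaining module includes the low-band parameter, decode the SID to obtain a noise low-band parameter, generate a noise high-band parameter locally, and obtain a first comfort noise (CN) frame from the decoded noise low-band parameter and the locally generated noise high-band parameter;
a second decoding module, configured to: if the SID acquired by the obtaining module includes the high-band parameter, decode the SID to obtain a noise high-band parameter, generate a noise low-band parameter locally, and obtain a second CN frame from the decoded noise high-band parameter and the locally generated noise low-band parameter; and
a third decoding module, configured to: if the SID acquired by the obtaining module includes the high-band parameter and the low-band parameter, decode the SID to obtain the noise high-band parameter and the noise low-band parameter, and obtain a third CN frame from the decoded noise high-band parameter and noise low-band parameter.
In another aspect, a system for processing audio data is further provided, the system including the apparatus for encoding audio data described above and the apparatus for decoding audio data described above.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects: the current noise frame is decomposed into a noise low-band signal and a noise high-band signal; the noise low-band signal is encoded and transmitted using the first discontinuous transmission mechanism, and the noise high-band signal is encoded and transmitted using the second discontinuous transmission mechanism; the decoder acquires a silence insertion descriptor (SID) frame, determines whether the SID includes a low-band parameter and/or a high-band parameter, and uses a different noise decoding method for each outcome. Handling noise encoding and decoding differently for the high-band and low-band signals saves computational complexity and coding bits without degrading the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve overall coding quality, which solves the encoding and transmission problem caused by super-wideband.
Brief Description of the Drawings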
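To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for processing audio data provided in Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a method for processing audio data provided in Embodiment 2 of the present invention;
FIG. 3 is a flowchart of a method for processing audio data provided in Embodiment 3 of the present invention;
FIG. 4 is a flowchart of a method for processing audio data provided in Embodiment 4 of the present invention;
FIG. 5 is a schematic diagram of an apparatus for encoding audio data provided in Embodiment 6 of the present invention;
FIG. 6 is a schematic diagram of another apparatus for encoding audio data provided in Embodiment 6 of the present invention;
FIG. 7 is a schematic diagram of an apparatus for decoding audio data provided in Embodiment 7 of the present invention;
FIG. 8 is a schematic diagram of another apparatus for decoding audio data provided in Embodiment 7 of the present invention;
FIG. 9 is a schematic diagram of a system for processing audio data provided in Embodiment 8 of the present invention.
Detailed Description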
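To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
Referring to FIG. 1, this embodiment provides a method for processing audio data, the method including:
101. Acquire a noise frame of an audio signal and decompose the noise frame into a noise low-band signal and a noise high-band signal.
102. Encode and transmit the noise low-band signal using a first discontinuous transmission mechanism, and encode and transmit the noise high-band signal using a second discontinuous transmission mechanism, where the sending policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism differs from the sending policy of the second SID of the second discontinuous transmission mechanism, or the encoding policy of the first SID of the first discontinuous transmission mechanism differs from the encoding policy of the second SID of the second discontinuous transmission mechanism.
In this embodiment, the first SID contains the low-band parameters of the noise frame, and the second SID contains the low-band parameters or the high-band parameters of the noise frame.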
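Optionally, in this embodiment, encoding and transmitting the noise high-band signal using the second discontinuous transmission mechanism includes:
determining whether the noise high-band signal has a preset spectral structure; if it does, and the sending condition in the second SID sending policy is satisfied, encoding the SID of the noise high-band signal using the second SID encoding policy and sending it; if it does not, determining that the noise high-band signal does not need to be encoded and transmitted.
Determining whether the noise high-band signal has a preset spectral structure includes:
obtaining the spectrum of the noise high-band signal and dividing the spectrum into at least two sub-bands; if the average energy of every first sub-band is not less than the average energy of each second sub-band, where a second sub-band lies in a higher frequency band than the first sub-band, confirming that the noise high-band signal does not have the preset spectral structure; otherwise, the noise high-band signal has the preset spectral structure.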
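Optionally, in this embodiment, encoding and transmitting the noise high-band signal using the second discontinuous transmission mechanism includes:
generating a deviation value from a first ratio and a second ratio, where the first ratio is the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal, and the second ratio is the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when a SID containing a noise high-band parameter was most recently sent before the noise frame; and
determining whether the deviation value reaches a preset threshold; if it does, encoding the SID of the noise high-band signal using the second SID encoding policy and sending it; if it does not, determining that the noise high-band signal does not need to be encoded and transmitted.
Optionally, the first ratio being the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal includes:
the first ratio is the ratio of the instantaneous energy of the noise high-band signal of the noise frame to the instantaneous energy of the noise low-band signal;
correspondingly, the second ratio being the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when a SID containing a noise high-band parameter was most recently sent before the noise frame includes:
the second ratio is the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the time when a SID containing a noise high-band parameter was most recently sent before the noise frame;
or, the first ratio being the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal includes:
the first ratio is the ratio of the weighted average energy of the noise high-band signals of the noise frame and the noise frames before it to the weighted average energy of the noise low-band signals of the noise frame and the noise frames before it;
correspondingly, the second ratio being the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when a SID containing a noise high-band parameter was most recently sent before the noise frame includes:
the second ratio is the ratio of the weighted average energy of the high-band signals to the weighted average energy of the low-band signals of the noise frame, and of the noise frames before it, at the time when a SID containing a noise high-band parameter was most recently sent before the noise frame.
In this embodiment, generating the deviation value from the first ratio and the second ratio includes:
calculating the logarithm of the first ratio and the logarithm of the second ratio; and
calculating the absolute value of the difference between the logarithm of the first ratio and the logarithm of the second ratio, to obtain the deviation value.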
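Optionally, in this embodiment, encoding and transmitting the noise high-band signal using the second discontinuous transmission mechanism includes:
determining whether the spectral structure of the noise high-band signal of the noise frame, compared with the average spectral structure of the noise high-band signals before the noise frame, satisfies a preset condition; if it does, encoding the SID of the noise high-band signal of the noise frame using the second encoding policy and sending it; if it does not, determining that the noise high-band signal of the noise frame does not need to be encoded and transmitted.
The average spectral structure of the noise high-band signals before the noise frame includes a weighted average of the spectra of the noise high-band signals before the noise frame.
In this embodiment, the sending condition in the sending policy of the second SID of the second discontinuous transmission mechanism further includes: the first discontinuous transmission mechanism satisfies the sending condition of the first SID.
The method embodiment provided by the present invention has the following beneficial effects: the current noise frame of the audio signal is acquired and decomposed into a noise low-band signal and a noise high-band signal; the noise low-band signal is encoded and transmitted using the first discontinuous transmission mechanism, and the noise high-band signal is encoded and transmitted using the second discontinuous transmission mechanism. Processing the high-band and low-band signals differently in this way saves computational complexity and coding bits without degrading the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve overall coding quality, which solves the encoding and transmission problem caused by super-wideband.
Embodiment 2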
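Referring to FIG. 2, this embodiment provides a method for processing audio data, the method including:
201. The decoder acquires a silence insertion descriptor (SID) frame and determines whether the SID includes a low-band parameter or a high-band parameter.
202. If the SID includes the low-band parameter, decode the SID to obtain a noise low-band parameter, generate a noise high-band parameter locally, and obtain a first comfort noise (CN) frame from the decoded noise low-band parameter and the locally generated noise high-band parameter.
203. If the SID includes the high-band parameter, decode the SID to obtain a noise high-band parameter, generate a noise low-band parameter locally, and obtain a second CN frame from the decoded noise high-band parameter and the locally generated noise low-band parameter.
204. If the SID includes the high-band parameter and the low-band parameter, decode the SID to obtain the noise high-band parameter and the noise low-band parameter, and obtain a third CN frame from the decoded noise high-band parameter and noise low-band parameter.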
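Optionally, in this embodiment, if the SID includes the low-band parameter, before decoding the SID to obtain the noise low-band parameter, generating the noise high-band parameter locally, and obtaining the first comfort noise (CN) frame from the decoded noise low-band parameter and the locally generated noise high-band parameter, the method further includes:
if the decoder is in a first comfort noise generation (CNG) state, the decoder enters a second CNG state.
Optionally, in this embodiment, if the SID includes the high-band parameter and the low-band parameter, before decoding the SID to obtain the noise high-band parameter and the noise low-band parameter and obtaining the third CN frame from them, the method further includes:
if the decoder is in the second CNG state, the decoder enters the first CNG state.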
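The two state transitions just described can be sketched as a small state function. The enum and function names are hypothetical, and leaving the state unchanged for a high-band-only SID is an assumption of this sketch, since the text above does not describe a transition for that case.

```python
from enum import Enum


class CngState(Enum):
    FIRST = 1   # state entered for SIDs carrying both low-band and high-band parameters
    SECOND = 2  # state entered for SIDs carrying low-band parameters only


def next_cng_state(state, has_low_band, has_high_band):
    """Return the decoder's CNG state after inspecting the incoming SID."""
    if has_low_band and not has_high_band and state is CngState.FIRST:
        return CngState.SECOND
    if has_low_band and has_high_band and state is CngState.SECOND:
        return CngState.FIRST
    # Assumption: any other combination leaves the state unchanged.
    return state
```

The decoder would call this before building the CN frame, so that the high band is either decoded from the SID or generated locally according to the resulting state.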
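Optionally, in this embodiment, determining whether the SID includes a low-band parameter and/or a high-band parameter includes:
if the number of bits of the SID is less than a preset first threshold, confirming that the SID includes a high-band parameter; if the number of bits of the SID is greater than the preset first threshold and less than a preset second threshold, confirming that the SID includes a low-band parameter; if the number of bits of the SID is greater than the preset second threshold and less than a preset third threshold, confirming that the SID includes a high-band parameter and a low-band parameter;
or, if the SID includes a first identifier, confirming that the SID includes a high-band parameter; if the SID includes a second identifier, confirming that the SID includes a low-band parameter; if the SID includes a third identifier, confirming that the SID includes a low-band parameter and a high-band parameter.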
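In this embodiment, generating the noise high-band parameter locally includes:
obtaining the weighted average energy of the noise high-band signal and the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID; and
obtaining the noise high-band signal from the obtained weighted average energy and synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID.
Optionally, in this embodiment, obtaining the weighted average energy of the noise high-band signal at the time corresponding to the SID includes:
obtaining the energy of the low-band signal of the first CN frame from the noise low-band parameter obtained by decoding;
calculating the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when a SID containing a high-band parameter was received before the SID, to obtain a first ratio;
obtaining the energy of the noise high-band signal at the time corresponding to the SID from the energy of the low-band signal of the first CN frame and the first ratio; and
taking a weighted average of the energy of the noise high-band signal at the time corresponding to the SID and the energy of the high-band signal of the locally buffered CN frame, to obtain the weighted average energy of the noise high-band signal at the time corresponding to the SID, where this weighted average energy is the high-band signal energy of the first CN frame.
Optionally, in this embodiment, calculating the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the time when a SID containing a high-band parameter was received before the SID, to obtain the first ratio, includes:
calculating the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the time when a SID containing a high-band parameter was received before the SID, to obtain the first ratio;
or, calculating the ratio of the weighted average of the energy of the noise high-band signal to the weighted average of the energy of the noise low-band signal at the time when a SID containing a high-band parameter was received before the SID, to obtain the first ratio.
When the energy of the noise high-band signal at the time corresponding to the SID is greater than the energy of the high-band signal of the locally buffered previous CN frame, the energy of the high-band signal of the locally buffered previous CN frame is updated at a first rate; otherwise it is updated at a second rate, the first rate being greater than the second rate.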
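Optionally, in this embodiment, obtaining the weighted average of the energy of the noise high-band signal at the time corresponding to the SID includes:
selecting, from the speech frames within a preset time period before the SID, the high-band signal of the speech frame whose high-band signal energy is smallest; and
obtaining the weighted average energy of the noise high-band signal at the time corresponding to the SID from the energy of that high-band signal, where the weighted average energy of the noise high-band signal at the time corresponding to the SID is the high-band signal energy of the first CN frame;
or, selecting, from the speech frames within the preset time period before the SID, the high-band signals of the N speech frames whose high-band signal energy is less than a preset threshold; and
obtaining the weighted average of the energy of the noise high-band signal at the time corresponding to the SID from the weighted average energy of the high-band signals of the N speech frames, where the weighted average energy of the noise high-band signal at the time corresponding to the SID is the high-band signal energy of the first CN frame.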
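Both alternatives above (the minimum-energy speech frame, or an average over the frames below a threshold) can be sketched as follows. The function names are hypothetical, and equal weights are an assumption used when no weights are given, because the text does not specify the weighting.

```python
def noise_energy_from_minimum(speech_high_band_energies):
    """First alternative: use the smallest high-band energy seen in the
    speech frames of the preset period before the SID."""
    return min(speech_high_band_energies)


def noise_energy_from_quiet_frames(speech_high_band_energies, threshold, weights=None):
    """Second alternative: weighted average over the frames whose high-band
    energy is below the threshold. Returns None if no frame qualifies."""
    quiet = [e for e in speech_high_band_energies if e < threshold]
    if not quiet:
        return None
    if weights is None:
        weights = [1.0] * len(quiet)  # assumption: equal weights
    if len(weights) != len(quiet):
        raise ValueError("one weight is required per selected frame")
    return sum(w * e for w, e in zip(weights, quiet)) / sum(weights)
```

Either result would then serve as the high-band energy of the first CN frame.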
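Optionally, in this embodiment, obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID includes:
distributing M ISF (Immittance Spectral Frequency) coefficients, ISP coefficients, LSF (Line Spectral Frequency) coefficients, or LSP (Line Spectral Pair) coefficients within the frequency range corresponding to the high-band signal;
randomizing the M coefficients, where the randomization moves each of the M coefficients gradually toward its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames, M and N both being natural numbers; and
obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID from the randomized filter coefficients.
Optionally, in this embodiment, obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID includes:
obtaining the M ISF, ISP, LSF, or LSP coefficients of the locally buffered noise high-band signal;
randomizing the M coefficients, where the randomization moves each of the M coefficients gradually toward its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames; and
obtaining the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID from the randomized filter coefficients.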
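Optionally, in this embodiment, before the first CN frame is obtained from the decoded noise low-band parameter and the locally generated noise high-band parameter, the method further includes:
when the history frame adjacent to the SID is a speech-coded frame, if the average energy of the high-band signal or part of the high-band signal decoded from the speech-coded frame is less than the average energy of the locally generated noise high-band signal or part of the noise high-band signal, multiplying the noise high-band signals of the subsequent L frames starting from the SID by a smoothing coefficient less than 1, to obtain a new weighted average of the energy of the locally generated noise high-band signal;
correspondingly, obtaining the first CN frame from the decoded noise low-band parameter and the locally generated noise high-band parameter includes:
obtaining a fourth CN frame from the decoded noise low-band parameter, the synthesis filter coefficients of the noise high-band signal at the time corresponding to the SID, and the new weighted average of the energy of the locally generated noise high-band signal.
The method embodiment provided by the present invention has the following beneficial effects: the decoder acquires a silence insertion descriptor (SID) frame and determines whether the SID includes a low-band parameter and/or a high-band parameter. If the SID includes the low-band parameter, the decoder decodes the SID to obtain a noise low-band parameter, generates a noise high-band parameter locally, and obtains a first comfort noise (CN) frame from the two. If the SID includes the high-band parameter, the decoder decodes the SID to obtain a noise high-band parameter, generates a noise low-band parameter locally, and obtains a second CN frame from the two. If the SID includes both the high-band parameter and the low-band parameter, the decoder decodes the SID to obtain them and obtains a third CN frame. Processing the high-band and low-band signals differently in this way saves computational complexity and coding bits without degrading the subjective quality of the codec, and the saved bits can be used to reduce the transmission bandwidth or to improve overall coding quality, which solves the encoding and transmission problem caused by super-wideband.
Embodiment 3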
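This embodiment provides a method for processing audio data. At the encoder, both the low-band and the high-band CNG noise spectra have usually lost their harmonic structure, so what matters perceptually in the CNG high-band signal is mainly its energy rather than its spectral structure. When performing DTX transmission of a super-wideband signal, it is therefore often unnecessary to transmit the high-band spectrum in the SID; the decoder can construct a high-band spectrum locally by a suitable method, and this locally constructed spectrum causes no obvious perceptual distortion. This saves the computation and bits that the encoder would otherwise spend on calculating and encoding the high-band spectrum. For some other noise signals, however, the high band may have a certain harmonic structure, and relying only on local construction at the decoder may degrade perceived quality when switching between CNG segments and speech segments; for such noise, the spectral parameters of the high-band signal need to be transmitted in the SID. A DTX/CNG system that balances efficiency and quality should therefore let the encoder adaptively choose, according to the high-band characteristics of the background noise, whether to encode high-band spectral parameters in the SID, and let the decoder reconstruct CNG frames using different decoding methods for different SID types. The method provided in this embodiment includes: analysis and classification of the noise high-band spectrum, blind construction of the high-band spectrum by the decoder, estimation of the high-band energy by the decoder when the SID does not contain a high-band energy parameter, and switching of the decoder between different CNG modules. Referring to FIG. 3, the method provided at the encoder in this embodiment includes: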
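301. The encoder acquires a noise frame of the audio signal and decomposes the noise frame into a noise low-band signal and a noise high-band signal.
In this embodiment, because encoders follow different encoding rules, the noise frame acquired by the encoder may be the current noise frame or a noise frame buffered at the encoder; this embodiment does not limit this. A super-wideband input audio signal sampled at 32 kHz is taken as an example. The encoder first divides the input audio signal into frames of 20 ms (640 samples). For the current frame (in this embodiment, the frame currently to be encoded), the encoder first applies a high-pass filter whose passband is frequencies above 50 Hz. The high-pass-filtered frame is then split by a QMF (Quadrature Mirror Filter) analysis filter into a low-band signal s_0 and a high-band signal s_1. The low-band signal s_0 is sampled at 16 kHz and represents the 0 to 8 kHz spectrum of the current frame; the high-band signal s_1 is also sampled at 16 kHz and represents the 8 to 16 kHz spectrum of the current frame. When the VAD (Voice Activity Detector) indicates that the current frame is a foreground signal frame, that is, a speech frame, the encoder performs speech coding on it; speech coding belongs to the prior art and is not described further in this embodiment. When the VAD indicates that the current frame is a noise frame, the encoder enters the DTX working state. In this embodiment, a noise frame means either a background noise frame or a silence frame.
In this embodiment, in the DTX working state, the DTX controller decides according to the SID sending policy whether to encode and send a SID for the low-band signal of the current frame. The low-band SID sending policy in this embodiment is as follows: 1) a SID is sent in the first noise frame after a speech-coded frame, and the send-SID flag is set to flagSID=1; 2) during a noise period, a SID frame is sent in the N-th frame after each SID frame, and flagSID=1 is set in that frame, where N is an integer greater than 1 supplied from outside the encoder; 3) no SID is sent in the remaining frames of the noise period, and flagSID=0 is set. The low-band SID sending policy in this embodiment is similar to the prior art and is not described in detail here.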
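The three-rule low-band SID sending policy above can be sketched as a function that maps per-frame VAD decisions to flagSID values. The function name and the boolean input representation (`True` for a noise frame, `False` for a speech frame) are assumptions of this sketch.

```python
def low_band_sid_flags(is_noise_frame, n):
    """Compute flagSID for each frame.

    is_noise_frame: iterable of booleans, True when the VAD marks a noise frame.
    n: interval N (> 1); a SID is sent in the N-th frame after each SID frame.
    """
    flags = []
    frames_since_sid = None  # None means no SID has been sent in this noise period
    for noise in is_noise_frame:
        if not noise:
            # Speech frames are speech-coded; the noise period (if any) ends.
            flags.append(0)
            frames_since_sid = None
            continue
        if frames_since_sid is None:
            # Rule 1: first noise frame after a speech-coded frame.
            flags.append(1)
            frames_since_sid = 0
            continue
        frames_since_sid += 1
        if frames_since_sid == n:
            # Rule 2: the N-th frame after the previous SID frame.
            flags.append(1)
            frames_since_sid = 0
        else:
            # Rule 3: all other noise frames carry no SID.
            flags.append(0)
    return flags
```

For one speech frame followed by seven noise frames with N = 3, the flags are 0, 1, 0, 0, 1, 0, 0, 1.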
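302. Determine whether the high-band signal of the current noise frame meets the preset encoding and transmission condition; if yes, perform step 304; otherwise perform step 303.
In this embodiment, determining whether the high-band signal of the current noise frame meets the preset encoding and transmission condition includes: determining whether the noise high-band signal has a preset spectral structure; if it does, and the sending condition of the second SID sending policy is met, encoding the SID of the noise high-band signal with the second SID encoding policy and sending it; if not, determining that the noise high-band signal does not need to be encoded and transmitted. Determining whether the noise high-band signal has the preset spectral structure includes: obtaining the spectrum of the noise high-band signal and dividing the spectrum into at least two subbands; if the average energy of every first subband is not less than the average energy of each second subband, where the second subband lies in a higher frequency band than the first subband, the noise high-band signal is determined not to have the preset spectral structure; otherwise the noise high-band signal has the preset spectral structure.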
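In the DTX working state, the encoder performs spectrum analysis on the high-band signal $s_1$ of the current noise frame to determine whether $s_1$ has a relatively obvious spectral structure, that is, the preset spectral structure. The specific method in this embodiment is: $s_1$ is downsampled to 12.8 kHz, and a 256-point FFT of the downsampled signal gives the spectrum $C(i)$, $i=0,\dots,127$. $C(i)$ is divided into 4 subbands of equal width, and the energy $E(i)$ of each subband (each subband being one of the first subbands mentioned above) is computed:
$$E(i)=\sum_{k=l(i)}^{h(i)} C(k), \quad i=0,\dots,3$$
where $l(i)$ and $h(i)$ are the lower and upper boundaries of subband $i$, with $l(i)=\{0, 32, 64, 96\}$ and $h(i)=\{31, 63, 95, 127\}$. The following condition is then checked:
$$E(i) \ge E(j), \quad \forall\, j > i \qquad (1)$$
where $E(j)$ is the second subband mentioned above. If condition (1) holds, that is, the energy of every first subband is not less than the energy of each second subband, the high-band signal is considered not to have an obvious spectral structure; otherwise it has one. If the high-band signal has an obvious spectral structure, the DTX policy is to send the high-band parameters. In this embodiment, if the send-high-band-parameters flag flaghb is not 1, flaghb=1 is set the next time flagSID=1; otherwise flaghb=0.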
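The subband test of condition (1) amounts to checking whether the subband energies are non-increasing with frequency. The sketch below assumes that the 128 spectrum values are already per-bin energies and that the band count divides the spectrum length evenly; the function names are illustrative only.

```python
def subband_energies(spectrum, n_bands=4):
    """Sum the spectrum bins over n_bands equal-width subbands."""
    width = len(spectrum) // n_bands
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]


def has_spectral_structure(energies):
    """Return True when condition (1) fails.

    Condition (1): E(i) >= E(j) for every pair with j > i, meaning the
    energy never rises towards higher subbands. When it holds, the high
    band is treated as having no obvious spectral structure.
    """
    n = len(energies)
    non_increasing = all(
        energies[i] >= energies[j] for i in range(n) for j in range(i + 1, n)
    )
    return not non_increasing
```

A smoothly decaying spectrum therefore leads to no high-band parameters being sent, while a spectrum with a bump in a higher subband is flagged as structured.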
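In this embodiment, when the SID sending condition is met, whether the high-band signal of the current noise frame needs to be encoded and transmitted can be decided from the spectral structure of that high-band signal. Determining whether the noise high-band signal has the preset spectral structure and whether the noise low-band signal meets the SID sending condition serves as the first judgment condition.
Optionally, in this embodiment, determining whether the high-band signal of the current noise frame meets the preset encoding and sending condition includes: generating a deviation value from a first ratio and a second ratio, where the first ratio is the ratio of the energy of the noise high-band signal of the noise frame to the energy of the noise low-band signal, and the second ratio is the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the moment corresponding to the most recent SID, sent before the noise frame, that contained noise high-band parameters; and determining whether the deviation value reaches a preset threshold; if yes, encoding the SID of the noise high-band signal with the second SID encoding policy and sending it; if not, determining that the noise high-band signal does not need to be encoded and transmitted.
The two ratios may optionally be defined in either of two ways. In the first, the first ratio is the ratio of the instantaneous energy of the noise high-band signal of the noise frame to the instantaneous energy of the noise low-band signal, and correspondingly the second ratio is the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the moment corresponding to the most recent SID, sent before the noise frame, that contained noise high-band parameters. In the second, the first ratio is the ratio of the weighted average energy of the noise high-band signals of the noise frame and the noise frames before it to the weighted average energy of the noise low-band signals of the noise frame and the noise frames before it, and correspondingly the second ratio is the ratio of the weighted average energy of the high-band signals to the weighted average energy of the low-band signals of the noise frame at the moment corresponding to that most recent SID and the noise frames before it.
Preferably, generating the deviation value from the first ratio and the second ratio includes: computing the logarithm of the first ratio and the logarithm of the second ratio, and taking the absolute value of the difference between the two logarithms as the deviation value.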
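Specifically, in this embodiment, whether the deviation value reaches the preset threshold can be determined as follows. In the DTX working state, the encoder computes the log energies $e_1$ and $e_0$ of the high-band and low-band signals $s_1$ and $s_0$ of the current frame:
$$e_x = 10\cdot\log_{10}\Big(\sum_{i=0}^{319} s_x(i)^2\Big), \quad x=0,1 \qquad (2)$$
and updates the long-term moving averages $e_{1a}$ and $e_{0a}$ of $e_1$ and $e_0$ at the encoder:
$$e_{xa} = e_{xa}^{(-1)} + \alpha\cdot \mathrm{sign}\big[e_x - e_{xa}^{(-1)}\big]\cdot \mathrm{MIN}\big[\,\big|e_x - e_{xa}^{(-1)}\big|,\ 3\,\big], \quad x=0,1 \qquad (3)$$
where $\mathrm{sign}[\cdot]$ is the sign function, $\mathrm{MIN}[\cdot]$ is the minimum function, $|\cdot|$ is the absolute value function, the form $X^{(-1)}$ denotes the value of $X$ in the previous frame, and $\alpha=0.1$ is the forgetting factor that determines the update speed; the previous frame here is the most recent SID containing high-band parameters sent before the current noise frame. The update magnitude of $e_{1a}$ and $e_{0a}$ is limited in this embodiment: if the energy change between $e_x$ of the current noise frame and $e_{xa}$ of the previous frame is greater than 3 dB, $e_{xa}$ of the current frame is updated according to 3 dB. When the encoder enters the DTX working state for the first time, $e_{xa}$ is initialized to $e_x$ of the current frame. The encoder then checks whether the high-to-low band energy ratio of the current noise frame (the first ratio) deviates to a certain extent from the high-to-low band energy ratio at the most recent SID containing high-band parameters (the second ratio), that is, whether the following condition holds:
$$\big|(e_{1a}-e_{0a}) - (\bar{e}_{1a}-\bar{e}_{0a})\big| > 4.5 \qquad (4)$$
where $\bar{e}_{1a}$ and $\bar{e}_{0a}$ are the high-band and low-band log energies at the most recent SID frame containing high-band parameters. If condition (4) holds, the noise high-band signal needs to be encoded and sent; in that case, if the send-high-band-parameters flag flaghb=0, flaghb is set to 1.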
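The second judgment condition (log energies, a step-limited moving average, and the 4.5 dB deviation test) can be sketched as follows. Several details are assumptions made for illustration: the clipping of the difference to 3 dB before the forgetting factor is applied, the small energy floor that avoids taking the logarithm of zero, and all function names.

```python
import math


def log_energy(frame, floor=1e-12):
    """Log energy in dB of one band of a frame (floor is an assumption)."""
    return 10.0 * math.log10(max(sum(x * x for x in frame), floor))


def update_average(prev_avg, current, alpha=0.1, max_step_db=3.0):
    """Move the long-term average towards the current value.

    The difference is clipped to +/- max_step_db before the forgetting
    factor alpha is applied (assumed placement of the clip).
    """
    diff = current - prev_avg
    clipped = math.copysign(min(abs(diff), max_step_db), diff)
    return prev_avg + alpha * clipped


def high_band_deviates(e1a, e0a, e1a_ref, e0a_ref, threshold_db=4.5):
    """Compare the current high/low log-energy gap with the reference gap
    stored when the last SID with high-band parameters was sent."""
    return abs((e1a - e0a) - (e1a_ref - e0a_ref)) > threshold_db
```

Because the energies are already logarithmic, the difference `e1a - e0a` is the logarithm of the high-to-low energy ratio, so the test compares the first and second ratios in the log domain.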
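In this embodiment, the long-term moving average is one kind of weighted average calculation; this is not specifically limited. Determining whether the deviation value reaches the preset threshold can serve as the second judgment condition. In a specific implementation, evaluating either the first judgment condition or the second judgment condition is sufficient to decide whether the noise high-band signal needs to be encoded and transmitted; this is not specifically limited.
The second judgment condition is optional. Its purpose is to help the decoder estimate the high-band noise energy locally from the noise low-band energy and the noise high-to-low band energy ratio at the most recently received SID containing high-band parameters. If the encoder does not compute the deviation value, the decoder can instead obtain, among the speech frames in a period before the current noise frame, the speech frame with the smallest high-band signal energy, and estimate the current high-band noise energy locally from the high-band signal energy of that frame, for example by using that energy directly as the current high-band noise energy. Alternatively, the decoder can select, among the speech frames in a preset period before the SID, N speech frames whose high-band signal energy is below a preset threshold, and obtain the weighted average energy of the noise high-band signal at the moment corresponding to the SID from the weighted average energy of the high-band signals of those N speech frames. This is not limited here.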
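303. Encode and transmit the noise low-band signal using the first discontinuous transmission mechanism.
Preferably, in this embodiment this includes the following. In the DTX working state, the encoder performs 16th-order linear prediction analysis on the low-band signal $s_0$ of the current noise frame to obtain 16 linear prediction coefficients $lpc(i)$, $i=0,1,\dots,15$. The LPC coefficients are converted to 16 ISP coefficients $isp(i)$, $i=0,1,\dots,15$, and the ISP coefficients are buffered. If an SID is encoded for the current frame, that is, flagSID=1, the median ISP coefficients are searched among the buffered ISP coefficients of the N history frames including the current frame. First, the distance $\delta$ from the ISP coefficients of each frame to the ISP coefficients of the other frames is computed:
$$\delta_k = \sum_{j\ne k}\ \sum_{i=0}^{15}\big(isp^{(k)}(i) - isp^{(j)}(i)\big)^2, \quad k = 0,-1,\dots,-N+1 \qquad (5)$$
Then the ISP coefficients of the frame with the smallest $\delta$ are selected as the ISP coefficients to be encoded, $isp_{SID}(i)$, $i=0,\dots,15$. $isp_{SID}(i)$ is converted to ISF coefficients $isf_{SID}(i)$, and $isf_{SID}(i)$ is quantized to obtain a set of quantization indices $idx_{ISF}$, which is packed into the SID. $idx_{ISF}$ is decoded locally to obtain the decoded ISF coefficients $isf'(i)$, $i=0,\dots,15$; $isf'(i)$ is converted to ISP coefficients $isp'(i)$, $i=0,\dots,15$, and $isp'(i)$ is buffered. For every noise frame, the buffered $isp'(i)$ is used to update the long-term moving average of the decoded ISP coefficients at the encoder:
$$isp_a(i) = \alpha\cdot isp_a^{(-1)}(i) + (1-\alpha)\cdot isp'(i), \quad i=0,1,\dots,15 \qquad (6)$$
Preferably $\alpha=0.9$, and $isp_a(i)$ is initialized to $isp'(i)$ of the first SID. $isp_a(i)$ is converted to LPC coefficients $lpc_a(i)$, which gives the analysis filter $A(Z)$. The low-band signal $s_0$ of each noise frame is filtered by $A(Z)$ to obtain the residual signal $r(i)$, $i=0,1,\dots,319$, and the log residual energy $e_r$ is computed from $r(i)$ over $i=0,1,\dots,319$ (equation (7); its right-hand side appears only as a figure in the source).
$e_r$ is buffered in this embodiment. When flagSID=1 for the current noise frame, the weighted average log energy $e_{SID}$ is computed from the buffered $e_r$ of the M history frames including the current noise frame, using weights $w_1(k)$, a set of M positive coefficients whose sum is less than 1 (the source formula, which also involves the constant 1.5, is not fully legible). $e_{SID}$ is quantized to obtain the quantization index $idx_e$.
In the DTX working state with flagSID=1, if flaghb=0, the SID frame encodes and sends only low-band parameters, that is, the SID frame consists of $idx_{ISF}$ and $idx_e$; for convenience it is called a small SID frame.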
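The median search over buffered ISP vectors in step 303 picks the frame whose coefficients are closest, in summed squared distance, to those of all other buffered frames. A minimal sketch, assuming the buffer is a plain list of equal-length coefficient lists (the function name is illustrative):

```python
def select_median_frame(frames):
    """Return the index of the frame with the smallest total squared
    distance to every other buffered frame."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    totals = [
        sum(squared_distance(f, g) for j, g in enumerate(frames) if j != k)
        for k, f in enumerate(frames)
    ]
    # min() returns the first index on ties.
    return min(range(len(frames)), key=totals.__getitem__)
```

Choosing the median frame rather than the mean keeps the transmitted coefficients equal to a set that was actually observed, which makes the choice robust to a single outlier frame.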
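In this embodiment, the encoding and transmission policy for the noise low-band signal is similar to the prior-art policy for noise wideband signals, so it is only outlined here and the implementation is not described in detail. When the noise high-band signal of the current noise frame does not need to be encoded, only the noise low-band signal is encoded, which saves computation at the encoder and also saves transmission bits.
304. Encode and transmit the noise low-band signal using the first discontinuous transmission mechanism, and encode and transmit the noise high-band signal using the second discontinuous transmission mechanism.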
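In this embodiment, if flaghb=1, the SID needs to encode high-band parameters in addition to low-band parameters. The noise low-band parameters are encoded in the same way as in step 303, which is not repeated here. Preferably, the high-band parameters are encoded as follows. Only in the DTX working state and with flaghb=1, the encoder performs 10th-order linear prediction analysis on the high-band signal $s_1$ of the current frame to obtain 10 linear prediction coefficients $lpc(i)$, $i=0,1,\dots,9$. $lpc(i)$ is weighted:
$$lpc_w(i) = w_2(i)\cdot lpc(i), \quad i=0,1,\dots,9 \qquad (8)$$
to obtain the weighted LPC coefficients $lpc_w(i)$, where $w_2(i)$ is a set of weighting coefficients, each less than or equal to 1. $lpc_w(i)$ is converted to 10 LSP coefficients $lsp_w(i)$, $i=0,1,\dots,9$, and the long-term moving average of $lsp_w(i)$ at the encoder is updated:
$$lsp_a(i) = \alpha\cdot lsp_a^{(-1)}(i) + (1-\alpha)\cdot lsp_w(i), \quad i=0,1,\dots,9 \qquad (9)$$
where, preferably, $\alpha=0.9$, and $lsp_a(i)$ is initialized to $lsp_w(i)$ of the current frame each time flaghb changes from 0 to 1. When the SID needs to contain high-band parameters, $lsp_a(i)$ is quantized to obtain a set of quantization indices $idx_{LSP}$. The long-term moving average $e_{1a}$ of the high-band log energy at the encoder is quantized to obtain the quantization index $idx_E$. The SID then consists of $idx_{ISF}$, $idx_e$, $idx_{LSP}$ and $idx_E$; in this embodiment an SID consisting of these four is called a large SID.
Optionally, $lsp_a(i)$ may also be updated continuously in the DTX working state, that is, regardless of whether flaghb is 1 or 0. When flaghb=0, $lsp_a(i)$ is updated in the same way as described above for flaghb=1, which is not repeated here.
In this embodiment, the encoding policy for the noise high-band signal is similar in principle to that for the noise low-band signal, so it is only outlined here and the implementation is not described in detail.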
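The difference between a small SID and a large SID in steps 303 and 304 is only which quantization indices are packed. The sketch below models the payload as a dictionary; the field names and the function `build_sid` are hypothetical and say nothing about the actual bit layout, which the text does not specify.

```python
def build_sid(flag_sid, flag_hb, idx_isf, idx_e, idx_lsp=None, idx_E=None):
    """Assemble the SID payload for one noise frame.

    Returns None when no SID is sent (flagSID=0), a small SID with only
    low-band indices when flaghb=0, and a large SID with low-band and
    high-band indices when flaghb=1.
    """
    if not flag_sid:
        return None
    sid = {"idx_isf": idx_isf, "idx_e": idx_e}
    if flag_hb:
        if idx_lsp is None or idx_E is None:
            raise ValueError("a large SID needs high-band indices")
        sid["idx_lsp"] = idx_lsp
        sid["idx_E"] = idx_E
    return sid
```

Since the large SID carries strictly more fields, a decoder can in principle tell the two apart from the payload size alone, which is the idea behind the bit-count test used on the decoder side.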
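In this embodiment, when the encoding and transmission condition of the noise high-band signal is met, the noise high-band signal is always encoded and transmitted together with the noise low-band signal. Optionally, however, the two need not be encoded and transmitted at the same time, so that there are three possible cases when an SID is sent: 1) only the low-band signal of the current noise frame is encoded and transmitted; 2) only the high-band signal of the current noise frame is encoded and transmitted; 3) both the low-band and high-band signals of the current noise frame are encoded and transmitted, in which case the sending condition in the second SID sending policy of the second discontinuous transmission mechanism further includes: the first discontinuous transmission mechanism meets the sending condition of the first SID. These three cases are not specifically limited in this embodiment.
In this embodiment, steps 302 to 304 are the specific steps of encoding and transmitting the noise low-band signal using the first discontinuous transmission mechanism and encoding and transmitting the noise high-band signal using the second discontinuous transmission mechanism, where the sending policy of the first silence insertion descriptor (SID) frame of the first discontinuous transmission mechanism differs from the sending policy of the second SID of the second discontinuous transmission mechanism, or the encoding policy of the first SID of the first discontinuous transmission mechanism differs from the encoding policy of the second SID of the second discontinuous transmission mechanism.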
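The beneficial effects of the method embodiment provided by the present invention are as follows. The current noise frame of the audio signal is obtained and decomposed into a noise low-band signal and a noise high-band signal; the noise low-band signal is encoded and transmitted using a first discontinuous transmission mechanism, and the noise high-band signal is encoded and transmitted using a second discontinuous transmission mechanism. By processing the high-band and low-band signals differently, computational complexity and encoding bits are saved without degrading the subjective quality of the codec; the saved bits can be used to reduce the transmission bandwidth or to improve the overall encoding quality, which addresses the problem of encoding and transmitting super-wideband signals.
Embodiment 4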
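This embodiment provides an audio data processing method. Corresponding to the encoder's processing of the noise signal, the decoder can determine from the received bitstream whether the current frame is a speech-encoded frame, an SID, or a NO_DATA frame. A NO_DATA frame is a frame during a noise period for which the encoder did not encode and send an SID. When the current frame is an SID, the decoder can further determine from the number of bits of the SID whether the SID contains low-band and/or high-band parameters. Optionally, the decoder may instead make this determination from a specific identifier inserted into the SID, which requires extra identifier bits to be added when the SID is encoded: a first identifier indicates that the SID contains only high-band parameters, a second identifier indicates that it contains only low-band parameters, and a third identifier indicates that it contains both high-band and low-band parameters. If the current frame is a speech-encoded frame, the decoder decodes it as a speech frame; the processing is similar to the prior art and is not described in detail. If the current frame is an SID or a NO_DATA frame, the decoder reconstructs the CN frame with the method that corresponds to the current CNG working state. In this embodiment CNG has two working states: the half-decoding CNG state (the first CNG state), which corresponds to small SID frames, and the full-decoding CNG state (the second CNG state), which corresponds to large SID frames. In the full-decoding CNG state, the decoder reconstructs the CN frame from the noise high-band and low-band parameters decoded from the large SID frame. In the half-decoding CNG state, the decoder reconstructs the CN frame from the noise low-band parameters decoded from the small SID frame and from locally estimated noise high-band parameters. When the current frame at the decoder is a large SID frame, if the CNG working state flag flagCNG=0 (half-decoding CNG state), flagCNG is set to 1 (full-decoding CNG state); otherwise the state is kept. Likewise, when the current frame at the decoder is a small SID frame, if flagCNG=1, flagCNG is set to 0; otherwise the state is kept. Referring to FIG. 4, the audio data processing method provided at the decoder in this embodiment includes: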
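401. The decoder obtains an SID; if the SID contains the high-band parameters and the low-band parameters, the decoder decodes the SID to obtain noise high-band parameters and noise low-band parameters, and obtains a third CN frame from the decoded noise high-band parameters and noise low-band parameters.
In this embodiment, after receiving an encoded frame sent by the encoder, the decoder first determines the type of the frame so that a decoding method matching the frame type can be used. Specifically, if the number of bits of the SID is less than a preset first threshold, the SID is determined to contain high-band parameters; if the number of bits of the SID is greater than the preset first threshold and less than a preset second threshold, the SID is determined to contain low-band parameters; if the number of bits of the SID is greater than the preset second threshold and less than a preset third threshold, the SID is determined to contain high-band parameters and low-band parameters. Alternatively, if the SID contains a first identifier, the SID is determined to contain high-band parameters; if the SID contains a second identifier, the SID is determined to contain low-band parameters; if the SID contains a third identifier, the SID is determined to contain low-band parameters and high-band parameters.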
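The bit-count classification and the CNG state flag update described above can be sketched together. The threshold values are not given in the text, so they are passed in as parameters; the handling of bit counts outside the stated ranges and of high-band-only SIDs (state left unchanged) is an assumption made for illustration, as are the function names.

```python
def classify_sid(n_bits, t1, t2, t3):
    """Classify an SID by its size using three preset thresholds."""
    if n_bits < t1:
        return frozenset({"high"})
    if t1 < n_bits < t2:
        return frozenset({"low"})
    if t2 < n_bits < t3:
        return frozenset({"low", "high"})
    return frozenset()  # case not covered by the description


def next_cng_state(flag_cng, content):
    """Update flagCNG: 1 = full-decoding state, 0 = half-decoding state.

    content is the set returned by classify_sid, or an empty set for a
    NO_DATA frame, which keeps the current state.
    """
    if content == {"low", "high"}:
        return 1  # large SID
    if content == {"low"}:
        return 0  # small SID
    return flag_cng
```

With hypothetical thresholds of 16, 48 and 96 bits, a 40-bit SID is treated as a small SID and an 80-bit SID as a large SID.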
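In this embodiment, if the SID contains the high-band parameters and the low-band parameters, the SID is decoded to obtain the noise high-band parameters and the noise low-band parameters, and the third CN frame is obtained from them. Specifically, the decoder decodes the SID to obtain the decoded low-band excitation log energy $e_D$, the low-band ISF coefficients $isf_d(i)$, the high-band log energy $E_D$ and the high-band LSP coefficients $lsp_d(i)$. $isf_d(i)$ is converted to ISP coefficients $isp_d(i)$, and $e_D$ and $E_D$ are converted to the energies $e_d$ and $E_d$, where $E_d = 10^{0.1\cdot E_D}$ and $e_d = 2^{e_D}$. $isp_d(i)$, $e_d$, $lsp_d(i)$ and $E_d$ are buffered.
When the decoder is in the CNG working state and flagCNG=1, regardless of whether the current frame is an SID or a NO_DATA frame, the buffered $isp_d(i)$, $e_d$, $lsp_d(i)$ and $E_d$ are used to update their respective long-term moving averages at the decoder:
$$isp_{CN}(i) = \alpha\cdot isp_{CN}^{(-1)}(i) + (1-\alpha)\cdot isp_d(i), \quad i=0,1,\dots,15$$
$$lsp_{CN}(i) = \beta\cdot lsp_{CN}^{(-1)}(i) + (1-\beta)\cdot lsp_d(i), \quad i=0,1,\dots,9 \qquad (10)$$
where $\alpha=0.9$ and $\beta=0.7$, and $e_{CN}$ and $E_{CN}$ denote the long-term moving averages of $e_d$ and $E_d$. $E_{CN}$ is buffered into the high-band energy buffer $E_{1old}$. A small random energy is added to $e_{CN}$ to obtain the excitation energy $e'_{CN}$ that is finally used to reconstruct the low-band noise signal, $e'_{CN} = (1 + c\cdot RND\cdot e_{CN})\cdot e_{CN}$, where $RND$ is a random number in the range [-32767, 32767] and $c$ is a small constant (partly illegible in the source, apparently 0.000011). A 320-point white noise sequence $exc_0(i)$, $i=0,1,\dots,319$, is generated and gain-adjusted using $e'_{CN}$ to obtain $exc'_0(i)$; that is, $exc_0(i)$ is multiplied by a gain coefficient $G_0$ such that the energy of $exc'_0(i)$ equals $e'_{CN}$. $isp_{CN}(i)$ is converted to LPC coefficients to obtain the synthesis filter $1/A_0(Z)$. The gain-adjusted excitation $exc'_0(i)$ excites the filter $1/A_0(Z)$ to obtain the low-band CN signal $s'_0$ reconstructed at the decoder, sampled at 16 kHz; the energy of $s'_0$ is computed and buffered into the low-band energy buffer $E_{0old}$.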
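The gain adjustment of the white-noise excitation follows directly from the stated requirement that the scaled excitation has a given energy. The sketch below assumes that "energy" means the sum of squared samples, so the gain is the square root of the ratio of target energy to current energy; the function name and the zero-energy guard are additions for illustration.

```python
import math


def scale_to_energy(excitation, target_energy):
    """Scale a sequence so that its sum of squares equals target_energy."""
    energy = sum(x * x for x in excitation)
    if energy == 0.0:
        return list(excitation)  # nothing to scale (guard, assumption)
    gain = math.sqrt(target_energy / energy)
    return [gain * x for x in excitation]
```

A usage sketch: drawing 320 uniform random samples and scaling them to a target energy of 5.0 gives an excitation whose sum of squares is 5.0 up to rounding error.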
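In this embodiment, the decoder processes the noise high-band signal in a way similar to the noise low-band signal. Another 320-point white noise sequence $exc_1(i)$, $i=0,1,\dots,319$, is generated; $lsp_{CN}(i)$ is converted to LPC coefficients to obtain the synthesis filter $1/A_1(Z)$; and $exc_1(i)$ excites the filter $1/A_1(Z)$ to obtain the high-band CN signal $\tilde{s}_1(i)$ without gain adjustment. $\tilde{s}_1(i)$ is multiplied by the gain coefficients $G_1$ and $G_2=0.8$ to obtain the high-band CN signal $s'_1$ reconstructed at the decoder, sampled at 16 kHz (the formula for $G_1$ appears only as a figure in the source). The purpose of $G_2$ is to apply a certain degree of energy suppression to the reconstructed noise signal. The decoder passes $s'_0$ and $s'_1$ through the QMF synthesis filter to obtain the final reconstructed CN frame sampled at 32 kHz, that is, the third CN frame.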
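402. If the SID contains the low-band parameters, decode the SID to obtain noise low-band parameters, generate noise high-band parameters locally, and obtain a first CN frame from the decoded noise low-band parameters and the locally generated noise high-band parameters.
In this embodiment, when the decoder is in the CNG working state and flagCNG=0, regardless of whether the current frame is an SID or a NO_DATA frame, the low-band CN signal $s'_0$ reconstructed at the decoder and sampled at 16 kHz is obtained by the same method as for flagCNG=1, that is, the method in step 401, which is not repeated here.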
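In this embodiment, the high-band signal of the first CN frame is still obtained by exciting a synthesis filter with white noise, except that the high-band signal energy and the synthesis filter coefficients of the first CN frame are estimated locally. Generating the noise high-band parameters locally includes: obtaining the weighted average energy of the noise high-band signal and the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID; and obtaining the noise high-band signal from that weighted average energy and those synthesis filter coefficients.
Preferably, obtaining the weighted average energy of the noise high-band signal at the moment corresponding to the SID includes: obtaining the energy of the low-band signal of the first CN frame from the decoded noise low-band parameters; computing the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the moment when an SID containing high-band parameters was received before the SID, which gives a first ratio; obtaining the energy of the noise high-band signal at the moment corresponding to the SID from the energy of the low-band signal of the first CN frame and the first ratio; and taking a weighted average of the energy of the noise high-band signal at the moment corresponding to the SID and the locally buffered energy of the high-band signal of the CN frame, which gives the weighted average energy of the noise high-band signal at the moment corresponding to the SID. This weighted average energy is the high-band signal energy of the first CN frame. Optionally, the first ratio is computed either as the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the moment when an SID containing high-band parameters was received before the SID, or as the ratio of the weighted average of the noise high-band energy to the weighted average of the noise low-band energy at that moment. The instantaneous energy is the energy obtained by decoding. When the energy of the noise high-band signal at the moment corresponding to the SID is greater than the locally buffered energy of the high-band signal of the previous CN frame, the locally buffered energy is updated at a first rate; otherwise it is updated at a second rate, the first rate being greater than the second rate.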
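Specifically, in this embodiment the weighted average energy of the noise high-band signal at the moment corresponding to the SID can be obtained as follows.
The energy $E_0$ of the low-band signal $s'_0$ of the first CN frame is obtained from the decoded noise low-band parameters. From the high-band and low-band energies $E_{1old}$ and $E_{0old}$ of the CN frames buffered in the previous full-decoding CNG state, together with $E_0$, the energy $\tilde{E}_1$ of the noise high-band signal at the moment corresponding to the SID is estimated, where $\tilde{E}_1 = \dfrac{E_{1old}}{E_{0old}}\cdot E_0$. $\tilde{E}_1$ is used to update the long-term moving average $E_{CN}$ of the high-band CN signal energy at the decoder, $E_{CN} = \lambda\cdot E_{CN}^{(-1)} + (1-\lambda)\cdot \tilde{E}_1$, where the coefficient $\lambda$ is variable: $\lambda=0.98$ when $\tilde{E}_1 > E_{CN}^{(-1)}$, and $\lambda=0.9$ otherwise. $\lambda=0.98$ is the first rate and $\lambda=0.9$ is the second rate.
If the encoder does not compute the deviation value, obtaining the weighted average of the energy of the noise high-band signal at the moment corresponding to the SID optionally includes: selecting the high-band signal of the speech frame with the smallest high-band signal energy among the speech frames in a preset period before the SID, and obtaining the weighted average energy of the noise high-band signal at the moment corresponding to the SID from the energy of that high-band signal; or selecting the high-band signals of N speech frames whose high-band signal energy is below a preset threshold among the speech frames in a preset period before the SID, and obtaining the weighted average of the energy of the noise high-band signal at the moment corresponding to the SID from the weighted average energy of the high-band signals of those N speech frames. The weighted average energy of the noise high-band signal at the moment corresponding to the SID is the high-band signal energy of the first CN frame.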
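The local high-band energy estimate in the half-decoding state can be sketched in two steps: scale the current low-band energy by the buffered high-to-low ratio, then smooth with a coefficient that depends on whether the estimate rises or falls. The function names are illustrative, and the comparison against the previous smoothed value is an assumption based on the surrounding description.

```python
def estimate_high_band_energy(e0, e1_old, e0_old):
    """Scale the current low-band energy by the high/low energy ratio
    buffered during the previous full-decoding CNG state."""
    return e0 * e1_old / e0_old


def smooth_high_band_energy(e_cn_prev, e1_estimate):
    """Update the long-term average of the high-band CN energy.

    The smoothing coefficient is 0.98 when the new estimate exceeds the
    previous average and 0.9 otherwise, as stated in the text.
    """
    lam = 0.98 if e1_estimate > e_cn_prev else 0.9
    return lam * e_cn_prev + (1.0 - lam) * e1_estimate
```

With these coefficients the average follows increases in the estimate more slowly than decreases, so a sudden jump in estimated high-band energy is reflected only gradually in the comfort noise.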
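In this embodiment, preferably, obtaining the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID includes: distributing M immittance spectral frequency (ISF) coefficients, immittance spectral pair (ISP) coefficients, line spectral frequency (LSF) coefficients or line spectral pair (LSP) coefficients within the frequency range corresponding to the high-band signal; randomizing the M coefficients, where the randomization is characterized in that each of the M coefficients gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames, where N may be variable; and obtaining the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID from the randomized filter coefficients.
Specifically, in this embodiment the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID can be obtained as follows.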
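Nine ISF coefficients $isf_{ext}(i)$, $i=0,1,\dots,8$, are distributed uniformly in the band from the low-band ISF coefficient $isf_d(14)$ to 16 kHz:
$$isf_{ext}(i) = isf_d(14) + 0.1\cdot(i+1)\cdot\big(16000 - isf_d(14)\big), \quad i=0,1,\dots,8 \qquad (11)$$
$isf_{ext}(i)$ is converted to the 0 to 8 kHz band to give $isf'_{ext}(i)$, $i=0,1,\dots,8$ (equation (12); its right-hand side is illegible in the source). $isf'_{ext}(i)$ is randomized with a set of 9 randomization factors $R(i)$, $i=0,1,\dots,8$, to give the randomized ISF coefficients $isf_r(i)$:
$$isf_r(i) = R(i)\cdot\big(isf'_{ext}(1) - isf'_{ext}(0)\big) + isf'_{ext}(i), \quad i=0,1,\dots,8 \qquad (13)$$
where $R(i)$ is given by
$$R(i) = \alpha\cdot R^{(-1)}(i) + (1-\alpha)\cdot R_t(i), \quad i=0,1,\dots,8 \qquad (14)$$
with $\alpha=0.8$. $R_t(i)$ is called the target random factor. According to equation (15), $R_t(i)$ takes a new value derived from $RND(i)$ when $\mathrm{mod}(cnt,10)=0$ and keeps its previous value $R_t^{(-1)}(i)$ when $\mathrm{mod}(cnt,10)\ne 0$, for $i=0,1,\dots,8$ (the first branch appears only as a figure in the source). In equation (15), $RND$ is a 9-dimensional random number sequence whose elements all differ and all lie in the range [-1, 1]. $cnt$ is a frame counter that is incremented by one for every SID or NO_DATA frame in the CNG working state with flagCNG=0, and $\mathrm{mod}(cnt,10)$ denotes $cnt$ modulo 10. In another embodiment, the 10 in $\mathrm{mod}(cnt,10)$ may also be variable when $R_t(i)$ is computed, for example:
$$R_t(i) = \begin{cases} 1 + 0.1\cdot RND(i), & \mathrm{mod}(cnt, N) = 0 \\ R_t^{(-1)}(i), & \mathrm{mod}(cnt, N) \ne 0 \end{cases} \quad i=0,1,\dots,8$$
$$N = \begin{cases} 10 + 5\cdot RND, & \mathrm{mod}(cnt, N^{(-1)}) = 0 \\ N^{(-1)}, & \mathrm{mod}(cnt, N^{(-1)}) \ne 0 \end{cases} \qquad (16)$$
where $RND$ is a random number in the range [-1, 1]. This is not specifically limited in this embodiment.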
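The blind construction of high-band coefficients combines a uniform spread (equation (11)) with slowly drifting randomization factors (equation (14)). The sketch below covers only those two equations. It assumes, for illustration, that a fresh target factor is simply a uniform random value in [-1, 1] every 10th frame; the source's exact expression for the fresh target is not legible, and all names here are hypothetical.

```python
import random


def extend_isf(isf_d14, n=9, top_hz=16000.0):
    """Equation (11): spread n coefficients between isf_d(14) and 16 kHz."""
    return [isf_d14 + 0.1 * (i + 1) * (top_hz - isf_d14) for i in range(n)]


def update_random_factors(r_prev, rt_prev, cnt, rng, alpha=0.8, period=10):
    """Equation (14) with a periodically refreshed target.

    Every `period` frames each target factor is redrawn (assumed here to
    be uniform in [-1, 1]); otherwise the previous targets are kept. The
    factors R(i) then move a fraction (1 - alpha) towards the targets.
    """
    if cnt % period == 0:
        rt = [rng.uniform(-1.0, 1.0) for _ in r_prev]
    else:
        rt = list(rt_prev)
    r = [alpha * rp + (1.0 - alpha) * t for rp, t in zip(r_prev, rt)]
    return r, rt
```

Because each factor only moves 20% of the way to its target per frame, the synthesized high-band spectrum drifts gradually instead of jumping whenever the targets change.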
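In this embodiment, the low-band ISF coefficient $isf_d(15)$ is taken as the tenth coefficient and combined with the randomized ISF coefficients $isf_r(i)$, $i=0,1,\dots,8$, to form the ISF coefficients of a 10th-order filter, which are converted to LPC coefficients $\widetilde{lpc}_1(i)$, $i=0,1,\dots,9$. $\widetilde{lpc}_1(i)$ is multiplied by a set of 10 weighting coefficients $W(i)=\{0.6699, 0.5862, 0.5129, 0.4488, 0.3927, 0.3436, 0.3007, 0.2631, 0.2302, 0.2014\}$ to give the weighted LPC coefficients, which define the estimated synthesis filter $1/\tilde{A}_1(Z)$. A 320-point white noise sequence $exc_2(i)$, $i=0,1,\dots,319$, is generated, and $exc_2(i)$ excites the filter $1/\tilde{A}_1(Z)$ to obtain the high-band CN signal $\tilde{s}_1(i)$ without gain adjustment. $\tilde{s}_1(i)$ is multiplied by the gain coefficients $G_3$ and $G_4=0.6$ to obtain the high-band CN signal $s'_1$ reconstructed at the decoder, sampled at 16 kHz (the formula for $G_3$ appears only as a figure in the source).
If the current frame is an SID, $\widetilde{lpc}_1(i)$ needs to be converted to LSP coefficients $\widetilde{lsp}_1(i)$, and $\widetilde{lsp}_1(i)$ is used to update the long-term moving average of the LSP coefficients of the high-band signal of the CN frames buffered at the decoder:
$$lsp_{CN}(i) = \beta\cdot lsp_{CN}^{(-1)}(i) + (1-\beta)\cdot \widetilde{lsp}_1(i), \quad i=0,1,\dots,9 \qquad (17)$$
where $\beta=0.7$.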
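As a numerical observation, the ten weighting coefficients W(i) listed above match, to four decimal places, the geometric sequence $0.875^{\,i+3}$. The source does not state how the weights were derived, so the generating rule below is offered only as a compact way to reproduce the table, not as the inventors' definition.

```python
# Weighting coefficients as listed in the description.
W = [0.6699, 0.5862, 0.5129, 0.4488, 0.3927,
     0.3436, 0.3007, 0.2631, 0.2302, 0.2014]


def geometric_weights(n=10, base=0.875, first_exponent=3):
    """Generate base ** (i + first_exponent) for i = 0..n-1."""
    return [base ** (i + first_exponent) for i in range(n)]


def apply_weights(lpc, weights):
    """Multiply each LPC coefficient by its weight."""
    return [w * c for w, c in zip(weights, lpc)]


# Largest difference between the listed and generated values.
max_error = max(abs(a - b) for a, b in zip(W, geometric_weights()))
```

Weighting LPC coefficients by a decaying geometric sequence generally flattens the spectral peaks of the resulting synthesis filter, which is consistent with the aim of producing a smooth, noise-like high band.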
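In this embodiment, optionally, obtaining the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID includes: obtaining the M locally buffered ISF, ISP, LSF or LSP coefficients of the noise high-band signal; randomizing the M coefficients, where the randomization is characterized in that each of the M coefficients gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames; and obtaining the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID from the randomized filter coefficients. This is not specifically limited in this embodiment.
In this embodiment, after the low-band parameters and the high-band parameters are obtained, $s'_0$ and $s'_1$ are passed through the QMF synthesis filter to obtain the final reconstructed first CN frame sampled at 32 kHz.
Further, optionally, before the first CN frame is obtained from the decoded noise low-band parameters and the locally generated noise high-band parameters, the locally generated noise high-band parameters may be optimized so as to obtain better comfort noise. The optimization includes: when the history frame adjacent to the SID is a speech-encoded frame, if the average energy of the high-band signal, or of part of the high-band signal, decoded from the speech-encoded frame is less than the average energy of the locally generated noise high-band signal, or of part of it, multiplying the noise high-band signals of the subsequent L frames starting from the SID by a smoothing coefficient less than 1, which gives a new locally generated weighted average energy of the noise high-band signal. Correspondingly, obtaining the first CN frame from the decoded noise low-band parameters and the locally generated noise high-band parameters includes: obtaining a fourth CN frame from the decoded noise low-band parameters, the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID, and the new locally generated weighted average energy of the noise high-band signal.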
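In this embodiment, when the frame preceding the current SID is a speech-encoded frame and the high-band signal energy $E_{sp}$ of that speech-encoded frame is lower than the energy $E_{s1}$ of $s'_1$, the high-band signal energy of the current SID and of a number of subsequent frames (50 frames in this embodiment) needs to be smoothed. The smoothing multiplies $s'_1$ of the current frame by a gain $G_s$ to obtain the smoothed signal $s'_{1s}$. $G_s$ depends on the frame counter $cnt$, with the constants 0.02 and 50, and on the smoothed energy of the previous frame (the expression is not fully legible in the source). $cnt$ is a frame counter that is incremented by 1 per frame starting from the first CN frame after the speech-encoded frame, and $E_s^{(-1)}$ is the smoothed high-band signal energy of the previous frame, initialized to $E_{sp}$ when $cnt=1$. The smoothing lasts at most 50 frames and terminates early if $E_s^{(-1)}$ becomes greater than $E_{s1}$ during this period. Optionally, $E_{sp}$ and $E_{s1}$ may represent the energy of only part of a frame; this is not specifically limited. In this embodiment, $s'_0$ and $s'_1$ (or $s'_{1s}$) are passed through the QMF synthesis filter to obtain the final reconstructed CN frame sampled at 32 kHz.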
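403. If the SID contains the high-band parameters, decode the SID to obtain noise high-band parameters, generate noise low-band parameters locally, and obtain a second CN frame from the decoded noise high-band parameters and the locally generated noise low-band parameters.
In this embodiment, the high-band parameters are decoded in the same way as in step 401, and the low-band parameters are generated locally in the same way as wideband parameters are generated locally in the prior art; neither is repeated here.
The beneficial effects of the method embodiment provided by the present invention are as follows. The decoder obtains a silence insertion descriptor (SID) frame and determines whether the SID contains low-band parameters and/or high-band parameters. If the SID contains the low-band parameters, the decoder decodes the SID to obtain noise low-band parameters, generates noise high-band parameters locally, and obtains a first comfort noise (CN) frame from the decoded noise low-band parameters and the locally generated noise high-band parameters. If the SID contains the high-band parameters, the decoder decodes the SID to obtain noise high-band parameters, generates noise low-band parameters locally, and obtains a second CN frame from the decoded noise high-band parameters and the locally generated noise low-band parameters. If the SID contains both the high-band parameters and the low-band parameters, the decoder decodes the SID to obtain both and obtains a third CN frame from them. By processing the high-band and low-band signals differently, computational complexity and encoding bits are saved without degrading the subjective quality of the codec; the saved bits can be used to reduce the transmission bandwidth or to improve the overall encoding quality, which addresses the problem of encoding and transmitting super-wideband signals. In addition, before the first CN frame is obtained from the decoded noise low-band parameters and the locally generated noise high-band parameters, the locally generated noise high-band parameters can be optimized to obtain better comfort noise, which further improves the performance of the decoder.
Embodiment 5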
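This embodiment provides an audio data processing method. As in the method of Embodiment 2, the encoder obtains a noise frame of the audio signal and decomposes the noise frame into a noise low-band signal and a noise high-band signal. Optionally, however, determining whether the high-band signal of the noise frame meets the preset encoding and transmission condition includes: determining whether the spectral structure of the noise high-band signal of the noise frame, compared with the average spectral structure of the noise high-band signals before the noise frame, meets a preset condition; if yes, encoding the SID of the noise high-band signal of the noise frame with the second encoding policy and sending it; if not, the noise high-band signal of the noise frame does not need to be encoded and transmitted. The average spectral structure of the noise high-band signals before the noise frame includes the weighted average of the spectra of the noise high-band signals before the noise frame. In this embodiment, this comparison serves as the third judgment condition for deciding whether to encode and transmit the noise high-band signal.
Optionally, the second judgment condition may also be used to decide whether the noise high-band signal needs to be encoded and transmitted; this is not specifically limited.
In this embodiment, the DTX decision on whether to encode and send the high-band parameters, that is, the setting of flaghb, can be determined by the following conditions: 1) whether the third judgment condition is met: if yes, flaghb=0 is set, otherwise flaghb=1; 2) whether the second judgment condition is met: if not, flaghb=0 is set; if yes, flaghb=1.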
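In this embodiment, the third judgment condition can be implemented as follows. The encoder obtains the 10th-order LSP coefficients $lsp(i)$, $i=0,\dots,9$, of the noise high-band signal $s_1$ of the current noise frame. Optionally these may instead be LSF, ISF or ISP coefficients; LSP, LSF, ISF and ISP coefficients are only different representations in different domains, all of which represent the synthesis filter coefficients, and this is not specifically limited. $lsp(i)$ is used to update its moving average:
$$lsp_a(i) = \alpha\cdot lsp_a^{(-1)}(i) + (1-\alpha)\cdot lsp(i), \quad i=0,\dots,9 \qquad (18)$$
where $lsp_a(i)$ is the long-term moving average of $lsp(i)$. The spectral distortion between the current $lsp_a(i)$ and the $lsp_a(i)$ at the most recent SID frame containing high-band parameters is computed as
$$D_{lsp} = \sum_{i=0}^{9}\big(lsp_a(i) - \overline{lsp}_a(i)\big)^2$$
where $D_{lsp}$ is the spectral distortion and $\overline{lsp}_a(i)$ is $lsp_a(i)$ at the most recent SID frame containing high-band parameters. If $D_{lsp}$ is less than a certain threshold, flaghb=0 is set; otherwise flaghb=1.
In this embodiment, the encoder works in essentially the same way as in Embodiment 3 when it needs to encode low-band parameters and/or high-band parameters, which is not repeated here.
In this embodiment, when the decoder is in the CNG working state and flagCNG=0, the noise high-band signal needs to be generated locally. The weighted average energy of the noise high-band signal at the moment corresponding to the SID is obtained in the same way as in Embodiment 4, which is not repeated here. Preferably, however, obtaining the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID includes: obtaining the M locally buffered ISF, ISP, LSF or LSP coefficients of the noise high-band signal; randomizing the M coefficients, where the randomization is characterized in that each of the M coefficients gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames; and obtaining the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID from the randomized filter coefficients. Specifically, this can be implemented as follows. Let $lsp'(i)=lsp_{CN}(i)$, $i=0,\dots,9$, where $lsp_{CN}(i)$ is the long-term moving average of the LSP coefficients of the high-band signal of the CN frames buffered locally at the decoder. $lsp'(i)$ is randomized in the same way as in Embodiment 4 to give $lsp_1(i)$: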
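$$\begin{cases} lsp_1(0) = R(0)\cdot\big(1 - lsp'(0)\big) + lsp'(0) \\ lsp_1(i) = R(i)\cdot\big(lsp'(i) - lsp'(i-1)\big) + lsp'(i), & i=1,\dots,9 \end{cases} \qquad (19)$$
$lsp_1(i)$ is converted to LPC coefficients $lpc_1(i)$, which are weighted by $W(i)$ in the same way as in Embodiment 4 to obtain the synthesis filter $1/\tilde{A}_1(Z)$. A 320-point white noise sequence $exc_2(i)$, $i=0,1,\dots,319$, is generated, and $exc_2(i)$ excites the filter $1/\tilde{A}_1(Z)$ to obtain the high-band CN signal $\tilde{s}_1(i)$ without gain adjustment. $\tilde{s}_1(i)$ is multiplied by the gain coefficient $G_3$ (its formula is illegible in the source) to obtain the high-band signal $s'_1$ of the CN frame reconstructed at the decoder, sampled at 16 kHz. In this embodiment, $lsp_1(i)$ obtained in this way is not used to update the long-term moving average of the LSP coefficients of the high-band signal of the CN frames buffered at the decoder when the current frame is an SID.
In this embodiment, when the encoder encodes a large SID frame and quantizes the long-term moving average $e_{1a}$ of the high-band log energy at the encoder, $e_{1a}$ is first attenuated to a certain extent (that is, a certain value is subtracted from it) and then quantized. The decoder therefore no longer needs to multiply $\tilde{s}_1(i)$ by $G_2$ or $G_4$ of Embodiment 4. The other decoder steps in this embodiment are similar to those in the foregoing embodiments and are not repeated here.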
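The third judgment condition of this embodiment (moving average of the LSP coefficients, spectral distortion against the reference stored at the last SID with high-band parameters, then a threshold test) can be sketched as follows. The squared-difference form of the distortion, the threshold value and the function names are assumptions made for illustration; the text does not state the value of the forgetting factor or of the threshold for this embodiment.

```python
def update_lsp_average(lsp_avg_prev, lsp, alpha):
    """Equation (18): long-term moving average of the LSP coefficients."""
    return [alpha * a + (1.0 - alpha) * c for a, c in zip(lsp_avg_prev, lsp)]


def spectral_distortion(lsp_avg, lsp_avg_ref):
    """Sum of squared differences between the current average and the
    average stored when the last SID with high-band parameters was sent."""
    return sum((a - b) ** 2 for a, b in zip(lsp_avg, lsp_avg_ref))


def high_band_flag(lsp_avg, lsp_avg_ref, threshold):
    """Return flaghb: 0 when the distortion is below the threshold
    (high-band parameters not needed), 1 otherwise."""
    return 0 if spectral_distortion(lsp_avg, lsp_avg_ref) < threshold else 1
```

In use, the reference vector would be replaced by the current average each time a large SID is actually sent, so that the distortion always measures drift since the decoder last received high-band parameters.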
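The beneficial effects of the method embodiment provided by the present invention are as follows. The current noise frame of the audio signal is obtained and decomposed into a noise low-band signal and a noise high-band signal; the noise low-band signal is encoded and transmitted using a first discontinuous transmission mechanism, and the noise high-band signal is encoded and transmitted using a second discontinuous transmission mechanism. The decoder obtains a silence insertion descriptor (SID) frame and determines whether the SID contains low-band parameters and/or high-band parameters. If the SID contains the low-band parameters, the decoder decodes the SID to obtain noise low-band parameters, generates noise high-band parameters locally, and obtains a first comfort noise (CN) frame from the decoded noise low-band parameters and the locally generated noise high-band parameters. If the SID contains the high-band parameters, the decoder decodes the SID to obtain noise high-band parameters, generates noise low-band parameters locally, and obtains a second CN frame from the decoded noise high-band parameters and the locally generated noise low-band parameters. If the SID contains both the high-band parameters and the low-band parameters, the decoder decodes the SID to obtain both and obtains a third CN frame from them. By processing the high-band and low-band signals differently, computational complexity and encoding bits are saved without degrading the subjective quality of the codec; the saved bits can be used to reduce the transmission bandwidth or to improve the overall encoding quality, which addresses the problem of encoding and transmitting super-wideband signals.
Embodiment 6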
参见图 5, 本实施例中提供了一种音频数据的编码装置, 所述装置包括: 获取模块 501、 和传输模块 502。
获取模块 501, 用于获取音频信号的噪声帧, 并将所述噪声帧分解为噪声低带信号和噪 声高带信号;
传输模块 502, 用于以第一非连续传输机制编码传输所述噪声低带信号, 以第二非连续 传输机制编码传输所述噪声高带信号, 其中所述第一非连续传输机制的第一静音插入描述帧 SID的发送策略和所述第二非连续传输机制的第二 SID的发送策略不同, 或, 所述第一非连 续传输机制的第一 SID的编码策略和所述第二非连续传输机制的第二 SID的编码策略不同。
本实施例中, 所述第一 SID包含所述噪声帧的低带参数, 所述第二 SID包含所述噪声帧 的低带参数和 /或高带参数。
其中可选地, 参见图 6, 所述传输模块 502包括:
第一传输单元 502a, 用于判断所述噪声高带信号是否具有预设的频谱结构, 如果是, 且 满足所述第二 SID发送策略的中的发送条件, 则以所述第二 SID编码策略编码所述噪声高带 信号的 SID并发送; 如果否, 则确定不需要对所述噪声高带信号进行编码传输。
本实施例中, 所述第一传输单元 502a包括:
判断子单元, 用于获得所述噪声高带信号的频谱, 将所述频谱划分为至少两个子带, 如 果所述子带中任一第一子带的平均能量均不小于所述子带中第二子带的平均能量, 其中所述 第二子带所处的频带高于所述第一子带所处频带, 则确认所述噪声高带信号不具有预设的频 谱结构, 否则所述噪声高带信号具有预设的频谱结构。
参见图 6, 可选地, 所述传输模块 502包括:
第二传输单元 502b, 用于根据第一比值和第二比值生成偏离程度值, 其中所述第一比值 是所述噪声帧的噪声高带信号的能量与所述噪声低带信号的能量的比值, 所述第二比值是在 所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时刻的噪声高带信号的能 量和噪声低带信号的能量的比值; 判断所述偏离程度值是否达到预设的阈值, 如果是, 则以 所述第二 SID编码策略编码所述噪声高带信号的 SID并发送; 如果否, 则确定不需要对所述 噪声高带信号进行编码传输。
可选地, 所述第一比值是所述噪声帧的噪声高带信号的能量与所述噪声低带信号的能量 的比值, 包括: 所述第一比值是所述噪声帧的噪声高带信号的即时能量与所述噪声低带信号的即时能量 的比值;
相应地, 所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所 对应的时刻的噪声高带信号的能量和噪声低带信号的能量的比值, 包括:
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声高带信号的即时能量和噪声低带信号的即时能量的比值;
或, 所述第一比值是所述噪声帧的噪声高带信号的能量与所述噪声低带信号的能量的比 值, 包括:
所述第一比值是所述噪声帧及其之前的噪声帧的噪声高带信号的加权平均能量与所述噪 声帧及其之前的噪声帧的噪声低带信号的加权平均能量的比值;
相应地, 所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所 对应的时刻的噪声高带信号的能量和噪声低带信号的能量的比值, 包括:
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声帧及其之前的噪声帧的高带信号的加权平均能量和低带信号的加权平均能量的比 值。
可选地, 本实施例中, 所述第二传输单元 502b包括:
计算子单元, 用于分别计算第一比值的对数值和第二比值的对数值; 计算所述第一比值 的对数值和所述第二比值的对数值的差的绝对值, 得到所述偏离程度值。
参见图 6, 可选地本实施例中, 所述传输模块 502包括:
第三传输单元 502c, 用于判断所述噪声帧的噪声高带信号的频谱结构与在所述噪声帧之 前的噪声高带信号的平均频谱结构相比是否满足预设条件, 如果是, 则以所述第二编码策略 编码所述噪声帧的噪声高带信号的 SID并发送; 如果否, 则确定不需要对所述噪声帧的噪声 高带信号进行编码传输。
本实施例中, 可选地, 所述噪声帧之前的噪声高带信号的平均频谱结构包括: 在所述噪 声帧之前的噪声高带信号的频谱的加权平均。
可选地, 本实施例中所述第二非连续传输机制的第二 SID的发送策略中的发送条件还包 括: 所述第一非连续传输机制满足所述第一 SID的发送条件。
本发明提供的装置实施例的有益效果是: 获取音频信号的当前噪声帧, 并将所述当前噪 声帧分解为噪声低带信号和噪声高带信号, 以第一非连续传输机制编码传输所述噪声低带信 号, 以第二非连续传输机制编码传输所述噪声高带信号, 这样通过对高带信号和低带信号不 同的处理方式, 可以在不降低编解码器主观质量的前提下节省计算复杂度和编码比特, 节省 下的比特可达到降低传输带宽或用于提高整体编码质量的目的, 从而解决了由于超宽带的编 码传输问题。 实施例 7
参见图 7, 本实施例中提供了一种音频数据的解码装置, 所述装置包括: 获取模块 601、 第一解码模块 602、 第二解码模块 603和第三解码模块 604。
获取模块 601, 用于判断接收到的当前静音插入描述帧 SID是否包含有高带参数或低带 参数;
第一解码模块 602, 用于如果所述获取模块 601获取的 SID包含所述低带参数, 则解码 所述 SID, 得到噪声低带参数, 并在本地生成噪声高带参数, 根据所述解码得到的所述噪声 低带参数和所述本地生成的噪声高带参数得到第一舒适噪声 CN帧;
第二解码模块 603, 用于如果所述获取模块 601获取的 SID包含所述高带参数, 则解码 所述 SID得到噪声高带参数, 并在本地生成噪声低带参数, 根据所述解码得到的噪声高带参 数和所述本地生成的噪声低带参数得到第二 CN帧;
第三解码模块 604, 用于如果所述获取模块 601获取的 SID包含所述高带参数和所述低 带参数, 则解码所述 SID得到噪声高带参数和所述噪声低带参数, 根据所述解码得到的噪声 高带参数和噪声低带参数得到第三 CN帧。
可选地, 本实施例中, 第一解码模块 602还用于在解码所述 SID, 得到噪声低带参数, 并在本地生成噪声高带参数, 根据所述解码得到的所述噪声低带参数和所述本地生成的噪声 高带参数得到第一舒适噪声 CN帧之前, 如果所述解码器处于第一舒适噪声生成 CNG状态, 则 进入第二 CNG状态。
可选地, 本实施例中, 所述第三解码模块 604还用于解码所述 SID得到噪声高带参数和 所述噪声低带参数, 根据所述解码得到的噪声高带参数和噪声低带参数得到第三 CN帧之前, 如果所述解码器处于所述第二 CNG状态, 则进入第一 CNG状态。
其中, 可选地, 所述获取模块 601包括:
第一确认单元, 用于如果所述 SID的比特数小于预设的第一阈值, 则确认所述 SID包含 有高带参数; 如果所述 SID的比特数大于预设的第一阈值且小于预设的第二阈值, 则确认所 述 SID包含有低带参数;如果所述 SID的比特数大于预设的第二阈值且小于预设的第三阈值, 则确认所述 SID包含有高带参数和低带参数; 或, 第二确认单元, 用于如果所述 SID中包含第一标识符, 则确认所述 SID包含有高带 参数, 如果所述 SID中包含第二标识符, 则确认所述 SID包含有低带参数, 如果所述 SID中 包含第三标识符, 则确认所述 SID包含有低带参数和高带参数。
本实施例中, 所述第一解码模块 602包括:
第一获取单元, 用于分别获得所述 SID所对应的时刻的噪声高带信号的加权平均能量和 噪声高带信号的合成滤波器系数;
第二获取单元, 用于根据所述获得的所述 SID所对应的时刻的噪声高带信号的加权平均 能量和噪声高带信号的合成滤波器系数得到所述噪声高带信号。
可选地, 所述第一获取单元包括:
第一获取子单元,用于根据所述解码得到的噪声低带参数得到第一 CN帧的低带信号的能 计算子单元, 用于计算在所述 SID前面接收到包含有高带参数的 SID的时刻所对应的噪 声高带信号的能量和噪声低带信号的能量的比值得到第一比值;
第二获取子单元, 用于根据所述第一 CN帧的低带信号的能量和所述第一比值, 获得所述 SID的对应的时刻的噪声高带信号的能量;
第三获取子单元,用于将所述 SID对应的时刻的噪声高带信号的能量与本地缓存的 CN帧 的高带信号的能量做加权平均, 得到所述 SID对应的时刻的噪声高带信号的加权平均能量, 其中所述 SID对应的时刻的噪声高带信号的加权平均能量就是所述第一 CN帧的高带信号能 其中, 所述计算子单元具体用于:
计算在所述 SID前面接收到包含有高带参数的 SID的时刻所对应的噪声高带信号的即时 能量和噪声低带信号的即时能量的比值得到第一比值;
或, 计算在所述 SID前面接收到包含有高带参数的 SID的时刻所对应的噪声高带信号的 能量的加权平均和噪声低带信号的能量的加权平均的比值得到第一比值。
其中,当所述 SID对应的时刻的噪声高带信号的能量大于所述本地缓存的前一 CN帧的高 带信号的能量时, 则以第一速率更新所述本地缓存的前一 CN帧的高带信号的能量, 否则以第 二速率更新所述本地缓存的前一 CN帧的高带信号的能量, 所述第一速率大于所述第二速率。
可选地, 所述第一获取单元包括:
第一选取子单元, 用于选取所述 SID之前预设时间段内的语音帧中高带信号能量最小的 语音帧的高带信号; 根据所述语音帧中高带信号能量最小的语音帧的高带信号的能量获得所 述 SID所对应的时刻的噪声高带信号的加权平均能量, 其中所述 SID对应的时刻的噪声高带 信号的加权平均能量就是所述第一 CN帧的高带信号能量;
或, 第二选取子单元, 用于选取所述 SID之前预设时间段内的语音帧中高带信号能量小 于预设阈值的 N个语音帧的高带信号; 根据所述 N个语音帧的高带信号的加权平均能量获得 所述 SID所对应的时刻的噪声高带信号的能量的加权平均, 其中所述 SID对应的时刻的噪声 高带信号的加权平均能量就是所述第一 CN帧的高带信号能量。
可选地, 所述第一获取单元包括:
分布子单元, 用于在高带信号所对应的频率范围内分布 M个导抗谱频率 ISF系数或导抗 谱对 ISP系数或线谱频率 LSF系数或线谱对 LSP系数;
第一随机化处理子单元, 用于对所述 M个系数进行随机化处理, 其中所述随机化的特征 为: 使所述 M个系数中的每个系数向其各自对应的一个目标值逐渐靠拢, 所述目标值为与该 系数值相邻的预设范围内的值; 所述 M个系数中的每个系数的目标值每经过 N帧发生改变, 其中所述 M和所述 N均为自然数;
第四获取子单元, 用于根据所述随机化处理后的滤波器系数得到所述 SID所对应的时刻 的噪声高带信号的合成滤波器系数。
可选地, 所述第一获取单元包括:
第五获取子单元, 用于获取本地缓存的噪声高带信号的所述 M个 ISF系数或 ISP系数或 LSF系数或 LSP系数;
第二随机化处理子单元, 对所述 M个系数进行随机化处理, 其中所述随机化的特征为: 使所述 M个系数中的每个系数向其各自对应的一个目标值逐渐靠拢, 所述目标值为与该系数 值相邻的预设范围内的值; 所述 M个系数中的每个系数的目标值每经过所述 N帧发生改变; 第六获取子单元, 用于根据所述随机化处理后的滤波系数得到所述 SID所对应的时刻的 噪声高带信号的合成滤波器系数。
参见图 8, 可选地, 所述装置还包括:
优化模块 605, 用于所述第一解码模块 602得到第一 CN帧之前, 当与所述 SID相邻的历 史帧为语音编码帧时, 若所述语音编码帧解码出的高带信号或部分高带信号的平均能量小于 所述本地生成的噪声高带信号或部分噪声高带信号的平均能量时, 对从所述 SID开始的后续 L帧的噪声高带信号乘以小于 1 的平滑系数, 得到新的本地生成的噪声高带信号的能量的加 权平均;
相应地, 所述第一解码模块 602具体用于根据所述解码得到的噪声低带参数、 所述 SID 所对应的时刻的噪声高带信号的合成滤波器系数和所述新的本地生成的噪声高带信号的能量 的加权平均得到第四 CN帧。
本发明提供的装置实施例的有益效果是: 解码器获取静音插入描述帧 SID, 判断所述 SID 是否包含低带参数和 /或包含高带参数; 如果所述 SID包含所述低带参数, 则解码所述 SID, 得到噪声低带参数, 并在本地生成噪声高带参数, 根据所述解码得到的所述噪声低带参数和 所述本地生成的噪声高带参数得到第一舒适噪声 CN帧; 如果所述 SID包含所述高带参数, 则 解码所述 SID得到噪声高带参数, 并在本地生成噪声低带参数, 根据所述解码得到的噪声高 带参数和所述本地生成的噪声低带参数得到第二 CN帧;如果所述 SID包含所述高带参数和所 述低带参数, 则解码所述 SID得到噪声高带参数和所述噪声低带参数, 根据所述解码得到的 噪声高带参数和噪声低带参数得到第三 CN帧。这样通过对高带信号和低带信号不同的处理方 式, 可以在不降低编解码器主观质量的前提下节省计算复杂度和编码比特, 节省下的比特可 达到降低传输带宽或用于提高整体编码质量的目的,从而解决了由于超宽带的编码传输问题。 实施例 8
参见图 9, 本实施例中提供了一种音频数据的处理系统, 所述系统包括: 如上所述的音 频数据的编码装置 500和如上所述的音频数据的解码装置 600。
本发明实施例提供的技术方案带来的有益效果是: 获取音频信号的当前噪声帧, 并将所 述当前噪声帧分解为噪声低带信号和噪声高带信号, 以第一非连续传输机制编码传输所述噪 声低带信号, 以第二非连续传输机制编码传输所述噪声高带信号, 解码器获取静音插入描述 帧 SID, 判断所述 SID是否包含低带参数和 /或包含高带参数; 如果所述 SID包含所述低带参 数, 则解码所述 SID, 得到噪声低带参数, 并在本地生成噪声高带参数, 根据所述解码得到 的所述噪声低带参数和所述本地生成的噪声高带参数得到第一舒适噪声 CN帧; 如果所述 SID 包含所述高带参数, 则解码所述 SID得到噪声高带参数, 并在本地生成噪声低带参数, 根据 所述解码得到的噪声高带参数和所述本地生成的噪声低带参数得到第二 CN帧; 如果所述 SID 包含所述高带参数和所述低带参数,则解码所述 SID得到噪声高带参数和所述噪声低带参数, 根据所述解码得到的噪声高带参数和噪声低带参数得到第三 CN帧。这样通过对高带信号和低 带信号不同的处理方式, 可以在不降低编解码器主观质量的前提下节省计算复杂度和编码比 特, 节省下的比特可达到降低传输带宽或用于提高整体编码质量的目的, 从而解决了由于超 宽带的编码传输问题。 本实施例提供的装置和系统, 具体可以与方法实施例属于同一构思, 其具体实现过程详 见方法实施例, 这里不再赘述。 上述实施例中的音频数据的处理方法、 装置可以应用于音频编码器或音频解码器。 音频 编解码器可以广泛应用于各种电子设备中,例如:移动电话,无线装置,个人数据助理(PDA), 手持式或便携式计算机, GPS接收机 /导航器, 照相机, 音频 /视频播放器, 摄像机, 录像机, 监控设备等。 通常, 这类电子设备中包括音频编码器或音频解码器, 音频编码器或者解码器 可以直接由数字电路或芯片例如 DSP ( digital signal processor) 实现, 或者由软件代码 驱动处理器执行软件代码中的流程而实现。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成, 也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中, 上述提到的存储介质可以是只读存储器, 磁盘或光盘等。 以上所述仅为本发明的较佳实施例, 并不用以限制本发明, 凡在本发明的精神和原则之 内, 所作的任何修改、 等同替换、 改进等, 均应包含在本发明的保护范围之内。

Claims

权利 要求 书
1、 一种音频数据的处理方法, 其特征在于, 所述方法包括:
获取音频信号的噪声帧, 并将所述噪声帧分解为噪声低带信号和噪声高带信号; 以第一非连续传输机制编码传输所述噪声低带信号, 以第二非连续传输机制编码传输所 述噪声高带信号, 其中所述第一非连续传输机制的第一静音插入描述帧 SID的发送策略和所 述第二非连续传输机制的第二 SID的发送策略不同,或,所述第一非连续传输机制的第一 SID 的编码策略和所述第二非连续传输机制的第二 SID的编码策略不同。
2、根据权利要求 1所述的方法,其特征在于,所述第一 SID包含所述噪声帧的低带参数, 所述第二 SID包含所述噪声帧的低带参数或高带参数。
3、根据权利要求 1或 2所述的方法, 其特征在于, 所述以第二非连续传输机制编码传输 所述噪声高带信号, 包括:
判断所述噪声高带信号是否具有预设的频谱结构, 如果是, 且满足所述第二 SID发送策 略的发送条件, 则以所述第二 SID编码策略编码所述噪声高带信号的 SID并发送; 如果否, 则确定不需要对所述噪声高带信号进行编码传输。
4、根据权利要求 3所述的方法, 其特征在于, 所述判断所述噪声高带信号是否具有预设 的频谱结构包括:
获得所述噪声高带信号的频谱, 将所述频谱划分为至少两个子带, 如果所述子带中任一 第一子带的平均能量均不小于所述子带中第二子带的平均能量, 其中所述第二子带所处的频 带高于所述第一子带所处频带, 则确认所述噪声高带信号不具有预设的频谱结构, 否则所述 噪声高带信号具有预设的频谱结构。
5、根据权利要求 1或 2所述的方法, 其特征在于, 所述以第二非连续传输机制编码传输 所述噪声高带信号, 包括:
根据第一比值和第二比值生成偏离程度值, 其中所述第一比值是所述噪声帧的噪声高带 信号的能量与所述噪声低带信号的能量的比值, 所述第二比值是在所述噪声帧之前最近一次 发送包含有噪声高带参数的 SID所对应的时刻的噪声高带信号的能量和噪声低带信号的能量 的比值;
判断所述偏离程度值是否达到预设的阈值, 如果是, 则以所述第二 SID编码策略编码所 述噪声高带信号的 SID并发送; 如果否, 则确定不需要对所述噪声高带信号进行编码传输。
6、根据权利要求 5所述的方法, 其特征在于, 所述第一比值是所述噪声帧的噪声高带信 号的能量与所述噪声低带信号的能量的比值, 包括: 所述第一比值是所述噪声帧的噪声高带信号的即时能量与所述噪声低带信号的即时能量 的比值;
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声高带信号的能量和噪声低带信号的能量的比值, 包括:
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声高带信号的即时能量和噪声低带信号的即时能量的比值;
或, 所述第一比值是所述噪声帧的噪声高带信号的能量与所述噪声低带信号的能量的比 值, 包括:
所述第一比值是所述噪声帧及其之前的噪声帧的噪声高带信号的加权平均能量与所述噪 声帧及其之前的噪声帧的噪声低带信号的加权平均能量的比值;
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声高带信号的能量和噪声低带信号的能量的比值, 包括:
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声帧及其之前的噪声帧的高带信号的加权平均能量和低带信号的加权平均能量的比 值。
7、根据权利要求 5或 6所述的方法, 其特征在于, 所述根据第一比值和第二比值生成偏 离程度值, 包括:
分别计算第一比值的对数值和第二比值的对数值;
计算所述第一比值的对数值和所述第二比值的对数值的差的绝对值, 得到所述偏离程度 值。
8、根据权利要求 1或 2所述的方法, 其特征在于, 所述以第二非连续传输机制编码传输 所述噪声高带信号, 包括:
判断所述噪声帧的噪声高带信号的频谱结构与在所述噪声帧之前的噪声高带信号的平均 频谱结构相比是否满足预设条件, 如果是, 则以所述第二编码策略编码所述噪声帧的噪声高 带信号的 SID并发送; 如果否, 则确定不需要对所述噪声帧的噪声高带信号进行编码传输。
9、根据权利要求 8所述的方法, 其特征在于, 所述噪声帧之前的噪声高带信号的平均频 谱结构包括: 在所述噪声帧之前的噪声高带信号的频谱的加权平均。
10、 根据权利要求 3-8任一项所述的方法, 其特征在于, 所述第二非连续传输机制的第 二 SID的发送策略中的发送条件还包括: 所述第一非连续传输机制满足所述第一 SID的发送 条件。
11、 一种音频数据的处理方法, 其特征在于, 所述方法包括:
解码器获取静音插入描述帧 SID, 判断所述 SID是否包含低带参数或高带参数; 如果所述 SID包含所述低带参数, 则解码所述 SID, 得到噪声低带参数, 并在本地生成 噪声高带参数, 根据所述解码得到的所述噪声低带参数和所述本地生成的噪声高带参数得到 第一舒适噪声 CN帧;
如果所述 SID包含所述高带参数, 则解码所述 SID得到噪声高带参数, 并在本地生成噪 声低带参数, 根据所述解码得到的噪声高带参数和所述本地生成的噪声低带参数得到第二 CN 帧;
如果所述 SID包含所述高带参数和所述低带参数, 则解码所述 SID得到噪声高带参数和 所述噪声低带参数, 根据所述解码得到的噪声高带参数和噪声低带参数得到第三 CN帧。
12、 根据权利要求 11所述的方法, 其特征在于, 如果所述 SID包含所述低带参数, 则所 述解码所述 SID, 得到噪声低带参数, 并在本地生成噪声高带参数, 根据所述解码得到的所 述噪声低带参数和所述本地生成的噪声高带参数得到第一舒适噪声 CN帧之前, 还包括: 如果所述解码器处于第一舒适噪声生成 CNG状态, 则所述解码器进入第二 CNG状态。
13、根据权利要求 11所述的方法, 其特征在于, 如果所述 SID包含所述高带参数和所述 低带参数, 则所述解码所述 SID得到噪声高带参数和所述噪声低带参数, 根据所述解码得到 的噪声高带参数和噪声低带参数得到第三 CN帧之前, 还包括:
如果所述解码器处于所述第二 CNG状态, 则所述解码器进入第一 CNG状态。
14、 根据权利要求 11-13任一项所述的方法, 其特征在于, 所述判断所述 SID是否包含 低带参数和 /或包含高带参数包括:
如果所述 SID的比特数小于预设的第一阈值, 则确认所述 SID包含有高带参数; 如果所 述 SID的比特数大于预设的第一阈值且小于预设的第二阈值, 则确认所述 SID包含有低带参 数; 如果所述 SID的比特数大于预设的第二阈值且小于预设的第三阈值, 则确认所述 SID包 含有高带参数和低带参数;
或, 如果所述 SID中包含第一标识符, 则确认所述 SID包含有高带参数, 如果所述 SID 中包含第二标识符, 则确认所述 SID包含有低带参数, 如果所述 SID中包含第三标识符, 则 确认所述 SID包含有低带参数和高带参数。
15、 根据权利要求 11-14任一项所述的方法, 其特征在于, 所述在本地生成噪声高带参 数包括:
分别获得所述 SID所对应的时刻的噪声高带信号的加权平均能量和噪声高带信号的合成 滤波器系数;
根据所述获得的所述 SID所对应的时刻的噪声高带信号的加权平均能量和噪声高带信号 的合成滤波器系数得到所述噪声高带信号。
16、根据权利要求 15所述的方法, 其特征在于, 所述获得所述 SID所对应的时刻的噪声 高带信号的加权平均能量, 包括:
根据所述解码得到的噪声低带参数得到第一 CN帧的低带信号的能量;
计算在所述 SID前面接收到包含有高带参数的 SID的时刻所对应的噪声高带信号的能量 和噪声低带信号的能量的比值得到第一比值;
根据所述第一 CN帧的低带信号的能量和所述第一比值,获得所述 SID的对应的时刻的噪 声高带信号的能量;
将所述 SID对应的时刻的噪声高带信号的能量与本地缓存的 CN帧的高带信号的能量做加 权平均, 得到所述 SID对应的时刻的噪声高带信号的加权平均能量, 其中所述 SID对应的时 刻的噪声高带信号的加权平均能量就是所述第一 CN帧的高带信号能量。
17、根据权利要求 16所述的方法, 其特征在于, 所述计算在所述 SID前面接收到包含有 高带参数的 SID的时刻所对应的噪声高带信号的能量和噪声低带信号的能量的比值得到第一 比值, 包括:
计算在所述 SID前面接收到包含有高带参数的 SID的时刻所对应的噪声高带信号的即时 能量和噪声低带信号的即时能量的比值得到第一比值;
或, 计算在所述 SID前面接收到包含有高带参数的 SID的时刻所对应的噪声高带信号的 能量的加权平均和噪声低带信号的能量的加权平均的比值得到第一比值。
18、 根据权利要求 16或 17所述的方法, 其特征在于, 其中, 当所述 SID对应的时刻的 噪声高带信号的能量大于所述本地缓存的前一 CN帧的高带信号的能量时,则以第一速率更新 所述本地缓存的前一 CN帧的高带信号的能量, 否则以第二速率更新所述本地缓存的前一 CN 帧的高带信号的能量, 所述第一速率大于所述第二速率。
19、根据权利要求 15所述的方法, 其特征在于, 所述获得所述 SID所对应的时刻的噪声 高带信号的能量的加权平均, 包括:
选取所述 SID之前预设时间段内的语音帧中高带信号能量最小的语音帧的高带信号; 根据所述语音帧中高带信号能量最小的语音帧的高带信号的能量获得所述 SID所对应的 时刻的噪声高带信号的加权平均能量, 其中所述 SID对应的时刻的噪声高带信号的加权平均 能量就是所述第一 CN帧的高带信号能量; 或, 选取所述 SID之前预设时间段内的语音帧中高带信号能量小于预设阈值的 N个语音 帧的高带信号;
根据所述 N个语音帧的高带信号的加权平均能量获得所述 SID所对应的时刻的噪声高带 信号的能量的加权平均, 其中所述 SID对应的时刻的噪声高带信号的加权平均能量就是所述 第一 CN帧的高带信号能量。
20、 根据权利要求 15-19任一项所述的方法, 其特征在于, 所述获得所述 SID所对应的 时刻的噪声高带信号的合成滤波器系数, 包括:
在高带信号所对应的频率范围内分布 M个导抗谱频率 ISF系数或导抗谱对 ISP系数或线 谱频率 LSF系数或线谱对 LSP系数;
对所述 M个系数进行随机化处理, 其中所述随机化的特征为: 使所述 M个系数中的每个 系数向其各自对应的一个目标值逐渐靠拢,所述目标值为与该系数值相邻的预设范围内的值; 所述 M个系数中的每个系数的目标值每经过 N帧发生改变,其中所述 M和所述 N均为自然数; 根据所述随机化处理后的滤波器系数得到所述 SID所对应的时刻的噪声高带信号的合成 滤波器系数。
21、 根据权利要求 15-19任一项所述的方法, 其特征在于, 所述获得所述 SID所对应的 时刻的噪声高带信号的合成滤波器系数, 包括:
获取本地缓存的噪声高带信号的所述 M个 ISF系数或 IS系数 P或 LSF系数或 LSP系数; 对所述 M个系数进行随机化处理, 其中所述随机化的特征为: 使所述 M个系数中的每个 系数向其各自对应的一个目标值逐渐靠拢,所述目标值为与该系数值相邻的预设范围内的值; 所述 M个系数中的每个系数的目标值每经过所述 N帧发生改变;
根据所述随机化处理后的滤波系数得到所述 SID所对应的时刻的噪声高带信号的合成滤 波器系数。
22、 根据权利要求 15-21任一项所述的方法, 其特征在于, 所述根据所述解码得到的噪 声低带参数和所述本地生成的噪声高带参数得到第一 CN帧之前, 还包括:
当与所述 SID相邻的历史帧为语音编码帧时, 若所述语音编码帧解码出的高带信号或部 分高带信号的平均能量小于所述本地生成的噪声高带信号或部分噪声高带信号的平均能量 时, 对从所述 SID开始的后续 L帧的噪声高带信号乘以小于 1的平滑系数, 得到新的本地生 成的噪声高带信号的能量的加权平均;
所述根据所述解码得到的噪声低带参数和所述本地生成的噪声高带参数得到第一 CN帧, 包括: 根据所述解码得到的噪声低带参数、 所述 SID所对应的时刻的噪声高带信号的合成滤波 器系数和所述新的本地生成的噪声高带信号的能量的加权平均得到第四 CN帧。
23、 一种音频数据的编码装置, 其特征在于, 所述装置包括:
获取模块, 用于获取音频信号的噪声帧, 并将所述噪声帧分解为噪声低带信号和噪声高 带信号;
传输模块, 用于以第一非连续传输机制编码传输所述噪声低带信号, 以第二非连续传输 机制编码传输所述噪声高带信号, 其中所述第一非连续传输机制的第一静音插入描述帧 SID 的发送策略和所述第二非连续传输机制的第二 SID的发送策略不同, 或, 所述第一非连续传 输机制的第一 SID的编码策略和所述第二非连续传输机制的第二 SID的编码策略不同。
24、根据权利要求 23所述的装置, 其特征在于, 所述第一 SID包含所述噪声帧的低带参 数, 所述第二 SID包含所述噪声帧的低带参数或高带参数。
25、 根据权利要求 23或 24所述的装置, 其特征在于, 所述传输模块包括:
第一传输单元, 用于判断所述噪声高带信号是否具有预设的频谱结构, 如果是, 且满足 所述第二 SID发送策略的发送条件,则以所述第二 SID编码策略编码所述噪声高带信号的 SID 并发送; 如果否, 则确定不需要对所述噪声高带信号进行编码传输。
26、 根据权利要求 25所述的装置, 其特征在于, 所述第一传输单元包括:
判断子单元, 用于获得所述噪声高带信号的频谱, 将所述频谱划分为至少两个子带, 如 果所述子带中任一第一子带的平均能量均不小于所述子带中第二子带的平均能量, 其中所述 第二子带所处的频带高于所述第一子带所处频带, 则确认所述噪声高带信号不具有预设的频 谱结构, 否则所述噪声高带信号具有预设的频谱结构。
27、 根据权利要求 23或 24所述的装置, 其特征在于, 所述传输模块包括:
第二传输单元, 用于根据第一比值和第二比值生成偏离程度值, 其中所述第一比值是所 述噪声帧的噪声高带信号的能量与所述噪声低带信号的能量的比值, 所述第二比值是在所述 噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时刻的噪声高带信号的能量和 噪声低带信号的能量的比值; 判断所述偏离程度值是否达到预设的阈值, 如果是, 则以所述 第二 SID编码策略编码所述噪声高带信号的 SID并发送; 如果否, 则确定不需要对所述噪声 高带信号进行编码传输。
28、根据权利要求 27所述的装置, 其特征在于, 所述第一比值是所述噪声帧的噪声高带 信号的能量与所述噪声低带信号的能量的比值, 包括:
所述第一比值是所述噪声帧的噪声高带信号的即时能量与所述噪声低带信号的即时能量 的比值;
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声高带信号的能量和噪声低带信号的能量的比值, 包括:
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声高带信号的即时能量和噪声低带信号的即时能量的比值;
或, 所述第一比值是所述噪声帧的噪声高带信号的能量与所述噪声低带信号的能量的比 值, 包括:
所述第一比值是所述噪声帧及其之前的噪声帧的噪声高带信号的加权平均能量与所述噪 声帧及其之前的噪声帧的噪声低带信号的加权平均能量的比值;
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声高带信号的能量和噪声低带信号的能量的比值, 包括:
所述第二比值是在所述噪声帧之前最近一次发送包含有噪声高带参数的 SID所对应的时 刻的噪声帧及其之前的噪声帧的高带信号的加权平均能量和低带信号的加权平均能量的比 值。
29、 根据权利要求 27或 28所述的装置, 其特征在于, 所述第二传输单元包括: 计算子单元, 用于分别计算第一比值的对数值和第二比值的对数值; 计算所述第一比值 的对数值和所述第二比值的对数值的差的绝对值, 得到所述偏离程度值。
30、 根据权利要求 23或 24所述的装置, 其特征在于, 所述第一传输模块包括: 第三传输单元, 用于判断所述噪声帧的噪声高带信号的频谱结构与在所述噪声帧之前的 噪声高带信号的平均频谱结构相比是否满足预设条件, 如果是, 则以所述第二编码策略编码 所述噪声帧的噪声高带信号的 SID并发送; 如果否, 则确定不需要对所述噪声帧的噪声高带 信号进行编码传输。
31、根据权利要求 30所述的装置, 其特征在于, 所述噪声帧之前的噪声高带信号的平均 频谱结构包括: 在所述噪声帧之前的噪声高带信号的频谱的加权平均。
32、 根据权利要求 25-31任一项所述的装置, 其特征在于, 所述第二非连续传输机制的 第二 SID的发送策略中的发送条件还包括: 所述第一非连续传输机制满足所述第一 SID的发 送条件。
33、 一种音频数据的解码装置, 其特征在于, 所述装置包括:
获取模块, 用于获取静音插入描述帧 SID, 判断所述 SID是否包含低带参数或包含高带 参数; 第一解码模块, 用于如果所述获取模块获取的 SID包含所述低带参数, 则解码所述 SID, 得到噪声低带参数, 并在本地生成噪声高带参数, 根据所述解码得到的所述噪声低带参数和 所述本地生成的噪声高带参数得到第一舒适噪声 CN帧;
第二解码模块, 用于如果所述获取模块获取的 SID包含所述高带参数, 则解码所述 SID 得到噪声高带参数, 并在本地生成噪声低带参数, 根据所述解码得到的噪声高带参数和所述 本地生成的噪声低带参数得到第二 CN帧;
第三解码模块, 用于如果所述获取模块获取的 SID包含所述高带参数和所述低带参数, 则解码所述 SID得到噪声高带参数和所述噪声低带参数, 根据所述解码得到的噪声高带参数 和噪声低带参数得到第三 CN帧。
34、 根据权利要求 32 所述的装置, 其特征在于, 所述第一解码模块还用于在解码所述
SID, 得到噪声低带参数, 并在本地生成噪声高带参数, 根据所述解码得到的所述噪声低带参 数和所述本地生成的噪声高带参数得到第一舒适噪声 CN帧之前,如果所述解码器处于第一舒 适噪声生成 CNG状态, 则进入第二 CNG状态。
35、 根据权利要求 32所述的装置, 其特征在于, 所述第三解码模块还用于解码所述 SID 得到噪声高带参数和所述噪声低带参数, 根据所述解码得到的噪声高带参数和噪声低带参数 得到第三 CN帧之前, 如果所述解码器处于所述第二 CNG状态, 则进入第一 CNG状态。
36、 根据权利要求 33-35任一项所述的装置, 其特征在于, 所述获取模块包括: 第一确认单元, 用于如果所述 SID的比特数小于预设的第一阈值, 则确认所述 SID包含 有高带参数; 如果所述 SID的比特数大于预设的第一阈值且小于预设的第二阈值, 则确认所 述 SID包含有低带参数;如果所述 SID的比特数大于预设的第二阈值且小于预设的第三阈值, 则确认所述 SID包含有高带参数和低带参数;
或, 第二确认单元, 用于如果所述 SID中包含第一标识符, 则确认所述 SID包含有高带 参数, 如果所述 SID中包含第二标识符, 则确认所述 SID包含有低带参数, 如果所述 SID中 包含第三标识符, 则确认所述 SID包含有低带参数和高带参数。
37、 根据权利要求 33-36任一项所述的装置, 其特征在于, 所述第一解码模块包括: 第一获取单元, 用于分别获得所述 SID所对应的时刻的噪声高带信号的加权平均能量和 噪声高带信号的合成滤波器系数;
第二获取单元, 用于根据所述获得的所述 SID所对应的时刻的噪声高带信号的加权平均 能量和噪声高带信号的合成滤波器系数得到所述噪声高带信号。
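38. The apparatus according to claim 37, wherein the first acquiring unit comprises:
a first acquiring subunit, configured to obtain the energy of the low-band signal of the first CN frame according to the decoded noise low-band parameter;
a calculation subunit, configured to calculate the ratio of the energy of the noise high-band signal to the energy of the noise low-band signal at the moment corresponding to an SID containing a high-band parameter that was received before the SID, to obtain a first ratio;
a second acquiring subunit, configured to obtain the energy of the noise high-band signal at the moment corresponding to the SID according to the energy of the low-band signal of the first CN frame and the first ratio;
a third acquiring subunit, configured to compute a weighted average of the energy of the noise high-band signal at the moment corresponding to the SID and the energy of the high-band signal of a locally buffered CN frame, to obtain the weighted average energy of the noise high-band signal at the moment corresponding to the SID, wherein the weighted average energy of the noise high-band signal at the moment corresponding to the SID is the high-band signal energy of the first CN frame.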
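39. The apparatus according to claim 38, wherein the calculation subunit is specifically configured to:
calculate the ratio of the instantaneous energy of the noise high-band signal to the instantaneous energy of the noise low-band signal at the moment corresponding to an SID containing a high-band parameter that was received before the SID, to obtain the first ratio;
or, calculate the ratio of the weighted average of the energy of the noise high-band signal to the weighted average of the energy of the noise low-band signal at the moment corresponding to an SID containing a high-band parameter that was received before the SID, to obtain the first ratio.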
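40. The apparatus according to claim 38 or 39, wherein, when the energy of the noise high-band signal at the moment corresponding to the SID is greater than the energy of the high-band signal of the locally buffered previous CN frame, the energy of the high-band signal of the locally buffered previous CN frame is updated at a first rate; otherwise, the energy of the high-band signal of the locally buffered previous CN frame is updated at a second rate, the first rate being greater than the second rate.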
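The energy estimation of claims 38 and 40 can be sketched as follows. This is a sketch under stated assumptions: the high-band energy is taken as the product of the low-band energy and the stored ratio, the weighted average is a first-order recursive update, and the function name `update_high_band_energy` and the rate values 0.5 and 0.1 are hypothetical. The claims only require that the update be faster when the new estimate exceeds the buffered energy.

```python
def update_high_band_energy(
    cached_energy: float,
    low_band_energy: float,
    high_to_low_ratio: float,
    first_rate: float = 0.5,
    second_rate: float = 0.1,
) -> float:
    """Return the weighted average high-band energy for the current CN frame.

    cached_energy     -- high-band energy of the locally buffered previous CN frame
    low_band_energy   -- low-band energy decoded from the current SID
    high_to_low_ratio -- ratio stored when the last SID with high-band parameters arrived
    """
    if not 0.0 < second_rate < first_rate <= 1.0:
        raise ValueError("rates must satisfy 0 < second_rate < first_rate <= 1")
    # Assumption: the high-band estimate is the low-band energy scaled by the ratio.
    estimate = low_band_energy * high_to_low_ratio
    # Rising energy is followed quickly, falling energy slowly (claim 40).
    rate = first_rate if estimate > cached_energy else second_rate
    return (1.0 - rate) * cached_energy + rate * estimate
```

For example, with a buffered energy of 1.0, a low-band energy of 4.0 and a ratio of 0.5, the estimate 2.0 exceeds the buffer, so the faster rate applies.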
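41. The apparatus according to claim 37, wherein the first acquiring unit comprises:
a first selecting subunit, configured to select, from the speech frames within a preset time period before the SID, the high-band signal of the speech frame with the lowest high-band signal energy, and to obtain the weighted average energy of the noise high-band signal at the moment corresponding to the SID according to the energy of the high-band signal of that speech frame, wherein the weighted average energy of the noise high-band signal at the moment corresponding to the SID is the high-band signal energy of the first CN frame;
or, a second selecting subunit, configured to select, from the speech frames within a preset time period before the SID, the high-band signals of N speech frames whose high-band signal energy is less than a preset threshold, and to obtain the weighted average of the energy of the noise high-band signal at the moment corresponding to the SID according to the weighted average energy of the high-band signals of the N speech frames, wherein the weighted average energy of the noise high-band signal at the moment corresponding to the SID is the high-band signal energy of the first CN frame.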
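The two selection strategies of claim 41 can be sketched as below. This is a minimal sketch: the function names are hypothetical, the energies are assumed to be already computed per speech frame for the preset window, and equal weights are assumed for the average because the claim does not specify the weights.

```python
from typing import Optional, Sequence


def min_high_band_energy(speech_energies: Sequence[float]) -> float:
    """First option: use the lowest high-band energy among recent speech frames."""
    if not speech_energies:
        raise ValueError("at least one speech frame is required")
    return min(speech_energies)


def mean_high_band_energy_below(
    speech_energies: Sequence[float], threshold: float
) -> Optional[float]:
    """Second option: average the high-band energies that fall below a threshold.

    Returns None when no frame qualifies. Equal weights are an assumption.
    """
    selected = [e for e in speech_energies if e < threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)
```

Both options rely on the idea that the quietest high-band portions of recent speech frames approximate the background noise level.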
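42. The apparatus according to any one of claims 37 to 41, wherein the first acquiring unit comprises:
a distribution subunit, configured to distribute M immittance spectral frequency (ISF) coefficients, immittance spectral pair (ISP) coefficients, line spectral frequency (LSF) coefficients, or line spectral pair (LSP) coefficients within the frequency range corresponding to the high-band signal;
a first randomization subunit, configured to randomize the M coefficients, wherein the randomization is characterized in that each of the M coefficients gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames, M and N both being natural numbers;
a fourth acquiring subunit, configured to obtain the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID according to the randomized filter coefficients.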
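The randomization described in claim 42 can be sketched as follows. This is only a sketch: the class name `CoefficientRandomizer`, the uniform draw of target values, the per-frame step size, and the sample frequencies are assumptions. The claim requires only that each coefficient drift gradually toward a nearby target and that the targets change every N frames (the `period` argument here).

```python
import random
from typing import List, Sequence


class CoefficientRandomizer:
    """Let M spectral coefficients drift toward targets that are re-drawn every N frames."""

    def __init__(self, coeffs: Sequence[float], spread: float, period: int,
                 step: float = 0.2, seed: int = 0) -> None:
        if period < 1 or not 0.0 < step <= 1.0 or spread < 0.0:
            raise ValueError("invalid period, step, or spread")
        self.base = list(coeffs)      # initial coefficient positions
        self.current = list(coeffs)   # values used for the current frame
        self.targets = list(coeffs)   # per-coefficient target values
        self.spread = spread          # preset range around each coefficient
        self.period = period          # N: frames between target changes
        self.step = step              # fraction of the remaining gap closed per frame
        self.frame = 0
        self.rng = random.Random(seed)

    def next_frame(self) -> List[float]:
        # Draw new targets near each coefficient once every `period` frames.
        if self.frame % self.period == 0:
            self.targets = [b + self.rng.uniform(-self.spread, self.spread)
                            for b in self.base]
        # Move each coefficient a fraction of the way toward its target.
        self.current = [c + self.step * (t - c)
                        for c, t in zip(self.current, self.targets)]
        self.frame += 1
        return list(self.current)
```

Keeping `spread` below half the spacing between neighbouring coefficients preserves their ordering, which spectral-frequency representations such as ISF and LSF normally need for a stable synthesis filter.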
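43. The apparatus according to any one of claims 37 to 41, wherein the first acquiring unit comprises:
a fifth acquiring subunit, configured to obtain the M locally buffered ISF coefficients, ISP coefficients, LSF coefficients, or LSP coefficients of the noise high-band signal;
a second randomization subunit, configured to randomize the M coefficients, wherein the randomization is characterized in that each of the M coefficients gradually approaches its own target value, the target value being a value within a preset range adjacent to the coefficient value, and the target value of each of the M coefficients changes every N frames;
a sixth acquiring subunit, configured to obtain the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID according to the randomized filter coefficients.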
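44. The apparatus according to any one of claims 37 to 43, wherein the apparatus further comprises:
a seventh acquiring subunit, configured to: before the first decoding module obtains the first CN frame, when the history frame adjacent to the SID is a speech-encoded frame, and if the average energy of the high-band signal or of part of the high-band signal decoded from the speech-encoded frame is less than the average energy of the locally generated noise high-band signal or of part of the noise high-band signal, multiply the noise high-band signals of the subsequent L frames starting from the SID by a smoothing coefficient less than 1, to obtain a new weighted average of the energy of the locally generated noise high-band signal;
the first decoding module is specifically configured to obtain a fourth CN frame according to the decoded noise low-band parameter, the synthesis filter coefficients of the noise high-band signal at the moment corresponding to the SID, and the new weighted average of the energy of the locally generated noise high-band signal.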
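The smoothing step of claim 44 can be sketched as below. This is a sketch under assumptions: the function name `smooth_noise_onset`, the list-of-samples frame representation, and the default coefficient 0.5 are hypothetical. The claim requires only a coefficient smaller than 1, applied to L frames starting from the SID when the preceding speech frame's high band is quieter than the locally generated noise.

```python
from typing import List, Sequence


def smooth_noise_onset(
    noise_frames: Sequence[Sequence[float]],
    speech_high_band_energy: float,
    noise_high_band_energy: float,
    num_frames: int,
    coefficient: float = 0.5,
) -> List[List[float]]:
    """Attenuate the first `num_frames` noise high-band frames after speech.

    The attenuation is applied only when the decoded speech high band is
    quieter than the locally generated noise high band.
    """
    if not 0.0 < coefficient < 1.0:
        raise ValueError("the smoothing coefficient must lie in (0, 1)")
    if speech_high_band_energy >= noise_high_band_energy:
        return [list(frame) for frame in noise_frames]
    return [
        [sample * coefficient for sample in frame] if index < num_frames else list(frame)
        for index, frame in enumerate(noise_frames)
    ]
```

Attenuating the first few comfort noise frames avoids an audible jump in high-band level at the transition from speech to generated noise.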
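45. A system for processing audio data, wherein the system comprises: the apparatus for encoding audio data according to any one of claims 23 to 32 and the apparatus for decoding audio data according to any one of claims 33 to 44.
PCT/CN2012/087812 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data WO2013097764A1 (zh)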

Priority Applications (21)

Application Number Priority Date Filing Date Title
AU2012361423A AU2012361423B2 (en) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
KR1020167036611A KR101770237B1 (ko) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
BR112014016153-4A BR112014016153B1 (pt) 2011-12-30 2012-12-28 Method for an encoder to process audio data, method for processing an audio signal, encoder, and decoder
ES12861377.5T ES2610783T3 (es) 2011-12-30 2012-12-28 Method and apparatus for processing audio data
MX2014007968A MX338445B (es) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data.
CA2861916A CA2861916C (en) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
EP12861377.5A EP2793227B1 (en) 2011-12-30 2012-12-28 Audio data processing method and apparatus
MYPI2014001949A MY173976A (en) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
SG11201403686SA SG11201403686SA (en) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
KR1020147020836A KR101693280B1 (ko) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
RU2014131387/08A RU2579926C1 (ru) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
JP2014549344A JP6072068B2 (ja) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data
US14/318,899 US9406304B2 (en) 2011-12-30 2014-06-30 Method, apparatus, and system for processing audio data
IN1436KON2014 IN2014KN01436A (zh) 2011-12-30 2014-07-08
ZA2014/04996A ZA201404996B (en) 2011-12-30 2014-07-08 Method, apparatus , and system for processing audio data
HK14113112.0A HK1199543A1 (zh) 2011-12-30 2014-12-31 Method, apparatus, and system for processing audio data
US15/188,518 US9892738B2 (en) 2011-12-30 2016-06-21 Method, apparatus, and system for processing audio data
US15/867,977 US10529345B2 (en) 2011-12-30 2018-01-11 Method, apparatus, and system for processing audio data
US16/697,822 US11183197B2 (en) 2011-12-30 2019-11-27 Method, apparatus, and system for processing audio data
US17/507,200 US11727946B2 (en) 2011-12-30 2021-10-21 Method, apparatus, and system for processing audio data
US18/344,445 US20230352035A1 (en) 2011-12-30 2023-06-29 Method, Apparatus, and System for Processing Audio Data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110455836.7A CN103187065B (zh) 2011-12-30 2011-12-30 Method, apparatus, and system for processing audio data
CN201110455836.7 2011-12-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/318,899 Continuation US9406304B2 (en) 2011-12-30 2014-06-30 Method, apparatus, and system for processing audio data

Publications (1)

Publication Number Publication Date
WO2013097764A1 true WO2013097764A1 (zh) 2013-07-04

Family

ID=48678198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/087812 WO2013097764A1 (zh) 2011-12-30 2012-12-28 Method, apparatus, and system for processing audio data

Country Status (18)

Country Link
US (6) US9406304B2 (zh)
EP (1) EP2793227B1 (zh)
JP (2) JP6072068B2 (zh)
KR (2) KR101770237B1 (zh)
CN (1) CN103187065B (zh)
AU (1) AU2012361423B2 (zh)
BR (1) BR112014016153B1 (zh)
CA (3) CA3059322C (zh)
ES (1) ES2610783T3 (zh)
HK (1) HK1199543A1 (zh)
IN (1) IN2014KN01436A (zh)
MX (1) MX338445B (zh)
MY (1) MY173976A (zh)
PT (1) PT2793227T (zh)
RU (3) RU2617926C1 (zh)
SG (2) SG10201609338SA (zh)
WO (1) WO2013097764A1 (zh)
ZA (2) ZA201404996B (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103187065B (zh) * 2011-12-30 2015-12-16 华为技术有限公司 Method, apparatus, and system for processing audio data
CN106169297B (zh) * 2013-05-30 2019-04-19 华为技术有限公司 Signal encoding method and device
US9136763B2 (en) * 2013-06-18 2015-09-15 Intersil Americas LLC Audio frequency deadband system and method for switch mode regulators operating in discontinuous conduction mode
CN111710342B (zh) * 2014-03-31 2024-04-16 弗朗霍弗应用研究促进协会 Encoding device, decoding device, encoding method, decoding method, and program
US10163453B2 (en) * 2014-10-24 2018-12-25 Staton Techiya, Llc Robust voice activity detector system for use with an earphone
GB2532041B (en) * 2014-11-06 2019-05-29 Imagination Tech Ltd Comfort noise generation
CN105681512B (zh) * 2016-02-25 2019-02-01 Oppo广东移动通信有限公司 Method and apparatus for reducing power consumption of voice calls
CN105721656B (zh) * 2016-03-17 2018-10-12 北京小米移动软件有限公司 Background noise generation method and apparatus
ES2745018T3 (es) 2016-12-12 2020-02-27 Kyynel Oy Versatile channel selection method for a wireless network
US10504538B2 (en) * 2017-06-01 2019-12-10 Sorenson Ip Holdings, Llc Noise reduction by application of two thresholds in each frequency band in audio signals
US10540983B2 (en) * 2017-06-01 2020-01-21 Sorenson Ip Holdings, Llc Detecting and reducing feedback
GB2595891A (en) * 2020-06-10 2021-12-15 Nokia Technologies Oy Adapting multi-source inputs for constant rate encoding
CN113571072B (zh) * 2021-09-26 2021-12-14 腾讯科技(深圳)有限公司 Speech encoding method, apparatus, device, storage medium, and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
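CN101087319A (zh) * 2006-06-05 2007-12-12 华为技术有限公司 Method and apparatus for sending and receiving background noise, and silence compression system
CN101246688A (zh) * 2007-02-14 2008-08-20 华为技术有限公司 Method, system, and apparatus for encoding and decoding background noise signals
CN101320563A (zh) * 2007-06-05 2008-12-10 华为技术有限公司 Background noise encoding/decoding apparatus and method, and communication device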
US20110228946A1 (en) * 2010-03-22 2011-09-22 Dsp Group Ltd. Comfort noise generation method and system

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7103065B1 (en) * 1998-10-30 2006-09-05 Broadcom Corporation Data packet fragmentation in a cable modem system
US6424938B1 (en) 1998-11-23 2002-07-23 Telefonaktiebolaget L M Ericsson Complex signal activity detection for improved speech/noise classification of an audio signal
EP1133886B1 (en) * 1998-11-24 2008-03-12 Telefonaktiebolaget LM Ericsson (publ) Efficient in-band signaling for discontinuous transmission and configuration changes in adaptive multi-rate communications systems
US6549587B1 (en) * 1999-09-20 2003-04-15 Broadcom Corporation Voice and data exchange over a packet based network with timing recovery
US6782360B1 (en) * 1999-09-22 2004-08-24 Mindspeed Technologies, Inc. Gain quantization for a CELP speech coder
AU1359601A (en) * 1999-11-03 2001-05-14 Tellabs Operations, Inc. Integrated voice processing system for packet networks
FI116643B (fi) * 1999-11-15 2006-01-13 Nokia Corp Noise suppression
US7920697B2 (en) 1999-12-09 2011-04-05 Broadcom Corp. Interaction between echo canceller and packet voice processing
US6691085B1 (en) 2000-10-18 2004-02-10 Nokia Mobile Phones Ltd. Method and system for estimating artificial high band signal in speech codec using voice activity information
US6615169B1 (en) * 2000-10-18 2003-09-02 Nokia Corporation High frequency enhancement layer coding in wideband speech codec
US6691805B2 (en) 2001-08-27 2004-02-17 Halliburton Energy Services, Inc. Electrically conductive oil-based mud
US7319703B2 (en) * 2001-09-04 2008-01-15 Nokia Corporation Method and apparatus for reducing synchronization delay in packet-based voice terminals by resynchronizing during talk spurts
US20030093270A1 (en) * 2001-11-13 2003-05-15 Domer Steven M. Comfort noise including recorded noise
CA2392640A1 (en) * 2002-07-05 2004-01-05 Voiceage Corporation A method and device for efficient in-based dim-and-burst signaling and half-rate max operation in variable bit-rate wideband speech coding for cdma wireless systems
FR2859566B1 (fr) * 2003-09-05 2010-11-05 Eads Telecom Method for transmitting an information stream by insertion within a speech data stream, and parametric codec for implementing it
JP4572123B2 (ja) * 2005-02-28 2010-10-27 日本電気株式会社 Sound source supply device and sound source supply method
US7809559B2 (en) * 2006-07-24 2010-10-05 Motorola, Inc. Method and apparatus for removing from an audio signal periodic noise pulses representable as signals combined by convolution
US8725499B2 (en) * 2006-07-31 2014-05-13 Qualcomm Incorporated Systems, methods, and apparatus for signal change detection
US8260609B2 (en) 2006-07-31 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
JP2008139447A (ja) * 2006-11-30 2008-06-19 Mitsubishi Electric Corp Speech encoding device and speech decoding device
US8032359B2 (en) * 2007-02-14 2011-10-04 Mindspeed Technologies, Inc. Embedded silence and background noise compression
BRPI0818927A2 (pt) * 2007-11-02 2015-06-16 Huawei Tech Co Ltd Method and apparatus for audio decoding
CN100555414C (zh) * 2007-11-02 2009-10-28 华为技术有限公司 DTX decision method and apparatus
DE102008009718A1 (de) * 2008-02-19 2009-08-20 Siemens Enterprise Communications Gmbh & Co. Kg Method and means for encoding background noise information
DE102008009719A1 (de) 2008-02-19 2009-08-20 Siemens Enterprise Communications Gmbh & Co. Kg Method and means for encoding background noise information
CN101483495B (zh) * 2008-03-20 2012-02-15 华为技术有限公司 Background noise generation method and noise processing apparatus
CN101335000B (zh) 2008-03-26 2010-04-21 华为技术有限公司 Encoding method and apparatus
CN102792760B (zh) * 2010-02-25 2015-08-12 瑞典爱立信有限公司 Switching off DTX for music
JP2012215198A (ja) * 2011-03-31 2012-11-08 Showa Corp Rotating structure
CN103187065B (zh) * 2011-12-30 2015-12-16 华为技术有限公司 Method, apparatus, and system for processing audio data
JP6180544B2 (ja) * 2012-12-21 2017-08-16 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Generation of comfort noise with high spectro-temporal resolution in discontinuous transmission of audio signals

Also Published As

Publication number Publication date
US9406304B2 (en) 2016-08-02
KR20170002704A (ko) 2017-01-06
US20140316774A1 (en) 2014-10-23
KR101770237B1 (ko) 2017-08-22
KR20140109456A (ko) 2014-09-15
BR112014016153A2 (pt) 2017-06-13
MY173976A (en) 2020-03-02
SG11201403686SA (en) 2014-10-30
US11183197B2 (en) 2021-11-23
RU2641464C1 (ru) 2018-01-17
JP6072068B2 (ja) 2017-02-01
CA2861916A1 (en) 2013-07-04
ZA201600247B (en) 2016-03-30
US20220044692A1 (en) 2022-02-10
HK1199543A1 (zh) 2015-07-03
PT2793227T (pt) 2016-12-29
MX338445B (es) 2016-04-15
EP2793227B1 (en) 2016-10-26
EP2793227A1 (en) 2014-10-22
CA2861916C (en) 2019-11-19
IN2014KN01436A (zh) 2015-10-23
AU2012361423A1 (en) 2014-07-31
AU2012361423B2 (en) 2016-01-28
JP2017062512A (ja) 2017-03-30
BR112014016153B1 (pt) 2021-01-12
US20230352035A1 (en) 2023-11-02
SG10201609338SA (en) 2016-12-29
RU2579926C1 (ru) 2016-04-10
EP2793227A4 (en) 2015-03-18
CA3059322C (en) 2023-01-10
JP6462653B2 (ja) 2019-01-30
US10529345B2 (en) 2020-01-07
KR101693280B1 (ko) 2017-01-05
US20160300578A1 (en) 2016-10-13
CN103187065B (zh) 2015-12-16
US11727946B2 (en) 2023-08-15
JP2015507764A (ja) 2015-03-12
CA3059322A1 (en) 2013-07-04
CA3181066A1 (en) 2013-07-04
US20200098378A1 (en) 2020-03-26
MX2014007968A (es) 2015-01-26
CN103187065A (zh) 2013-07-03
BR112014016153A8 (pt) 2017-07-04
ES2610783T3 (es) 2017-05-03
ZA201404996B (en) 2016-06-29
US9892738B2 (en) 2018-02-13
RU2617926C1 (ru) 2017-04-28
US20180137869A1 (en) 2018-05-17

Similar Documents

Publication Publication Date Title
US11727946B2 (en) Method, apparatus, and system for processing audio data
RU2383943C2 (ru) Coding of audio signals
US6539355B1 (en) Signal band expanding method and apparatus and signal synthesis method and apparatus
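WO2009056027A1 (fr) Audio decoding method and device
WO2014117458A1 (zh) Method for predicting a high-band signal, and encoding/decoding device
WO2023197809A1 (zh) Encoding and decoding method for high-frequency audio signals, and related apparatus
JP2005241761A (ja) Communication device and signal encoding/decoding method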
EP2774148A1 (en) Bandwidth extension of audio signals
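JP6258522B2 (ja) Apparatus and method for switching coding technologies at a device
CN116137151A (zh) System and method for providing high-quality audio communication over low-bit-rate network connections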

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12861377

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2861916

Country of ref document: CA

Ref document number: 2014549344

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: MX/A/2014/007968

Country of ref document: MX

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2012861377

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012861377

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20147020836

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2014131387

Country of ref document: RU

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2012361423

Country of ref document: AU

Date of ref document: 20121228

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112014016153

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 112014016153

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20140627