WO2009115039A1 - Noise generating method and noise generating device - Google Patents

Noise generating method and noise generating device

Info

Publication number
WO2009115039A1
WO2009115039A1 PCT/CN2009/070856 CN2009070856W WO2009115039A1 WO 2009115039 A1 WO2009115039 A1 WO 2009115039A1 CN 2009070856 W CN2009070856 W CN 2009070856W WO 2009115039 A1 WO2009115039 A1 WO 2009115039A1
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
noise
energy
frame
energy attenuation
Prior art date
Application number
PCT/CN2009/070856
Other languages
English (en)
French (fr)
Inventor
代金良
张立斌
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP09722494.3A (EP2259040B1, en)
Priority to RU2010142929/08A (RU2469420C2, ru)
Publication of WO2009115039A1 (WO2009115039A1, zh)
Priority to US12/886,151 (US8370136B2, en)
Priority to US13/730,056 (US20130124196A1, en)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012: Comfort noise or silence coding

Definitions

  • the present invention relates to the field of communications, and in particular to a noise generating method and a noise generating device.
  • Background
  • in current data transmission systems, speech coding technology can compress the transmission bandwidth of the speech signal and increase the capacity of the communication system. Since only about 40% of the content in voice communication contains speech and the rest of the transmitted content is silence or background noise, Discontinuous Transmission (DTX, Discontinuous Transmission System) / Comfort Noise Generation (CNG, Comfortable Noise Generation) technology emerged to further save transmission bandwidth.
  • one DTX strategy in the prior art is to send a Silence Insertion Descriptor (SID) frame once every fixed interval of several frames. The CNG algorithm used with it linearly interpolates the parameters (including energy parameters and spectral parameters) decoded from two consecutively received SID frames to estimate the parameters required for comfort noise synthesis.
  • after the energy parameters and spectral parameters are reconstructed, the spectral parameters are used to compute the synthesis filter, and the energy parameters are used as the energy of the excitation signal.
  • once the excitation signal is calculated, it is filtered by the synthesis filter, and the output is the reconstructed comfort noise.
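  • The prior-art interpolation step described above can be sketched as follows. This is only an illustrative sketch: the dictionary layout with `energy` and `spectrum` keys, the function name, and the use of a frame index inside the SID interval are assumptions made for the example, not details given in the source.

```python
def interpolate_sid_params(prev_sid, curr_sid, frame_idx, interval):
    """Linearly interpolate energy and spectral parameters between two SID frames.

    prev_sid / curr_sid: dicts with "energy" (float) and "spectrum" (list of floats).
    frame_idx: position of the frame to synthesize, counted from prev_sid.
    interval: number of frames between the two SID frames (assumed > 0).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Interpolation weight, clamped to [0, 1].
    t = min(max(frame_idx / interval, 0.0), 1.0)
    energy = (1.0 - t) * prev_sid["energy"] + t * curr_sid["energy"]
    spectrum = [
        (1.0 - t) * a + t * b
        for a, b in zip(prev_sid["spectrum"], curr_sid["spectrum"])
    ]
    return {"energy": energy, "spectrum": spectrum}
```

  • The interpolated spectral parameters would then drive the synthesis filter and the interpolated energy would set the excitation energy, as the preceding paragraphs describe.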
  • in this scheme, a 3 dB attenuation is added to the energy parameter when it is quantized at the encoding end, so the comfort noise energy reconstructed by the CNG algorithm at the decoding end is lower than the actual value. During the background noise phase, even when the actual background noise energy is high, the generated comfort noise can still give the listener a relatively good subjective impression.
  • however, this 3 dB energy attenuation is fixed: all background noise in the noise phase is attenuated by the same amount. As a result, when switching from the speech phase to the noise phase (or from the noise phase to the speech phase), the background noise in the speech frames has relatively high energy while the comfort noise reconstructed in the noise phase has lower energy. The listener can clearly hear this energy discontinuity, which also degrades the subjective impression given by the reconstructed comfort noise.
  • Embodiments of the present invention provide a noise generating method and a noise generating apparatus, which can improve a user experience.
  • the noise generating method provided by the embodiments of the present invention includes: if the received data frame is a noise frame, calculating a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame; and attenuating the noise energy according to the energy attenuation parameter.
  • the noise generating device provided by the embodiments of the present invention includes: an energy attenuation parameter calculating unit, configured to calculate, when the received data frame is a noise frame, a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame; and an energy attenuation unit, configured to attenuate the noise energy according to the energy attenuation parameter.
  • the embodiments of the present invention have the following advantages:
  • in the embodiment of the present invention, when the received data frame is a noise frame, the corresponding energy attenuation parameter is calculated according to the noise frame and the previously received data frames, and the narrowband and/or highband noise energy is attenuated according to the energy attenuation parameter.
  • the embodiment of the present invention can therefore calculate the energy attenuation parameter from the relationship between the current noise frame and the previous data frames and attenuate the noise energy with it. The energy attenuation is adaptive and adjusts to the data frames, so the comfort noise obtained in this way is relatively smooth, which helps improve the user experience.
  • FIG. 1 is a schematic diagram of a voice codec system using DTX/CNG technology according to an embodiment of the present invention
  • FIG. 2 is a flowchart of an embodiment of a noise generation method according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a narrowband noise generation process according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a highband noise generation process in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an embodiment of a noise generating apparatus according to an embodiment of the present invention.
  • Embodiments of the present invention provide a noise generating method and a noise generating device for improving a user experience.
  • in the embodiment of the present invention, when the received data frame is a noise frame, the corresponding energy attenuation parameter is calculated according to the noise frame and the previously received data frames, and the narrowband and/or highband noise energy is attenuated according to the energy attenuation parameter.
  • the embodiment of the present invention can therefore calculate the energy attenuation parameter from the relationship between the current noise frame and the previous data frames and attenuate the noise energy with it. The energy attenuation is adaptive and adjusts to the data frames, so the comfort noise obtained in this way is relatively smooth, which helps improve the user experience.
  • the embodiment of the present invention also adopts DTX technology, which enables the encoder to encode the background noise signal with a coding algorithm and coding rate different from those used for the speech signal, thereby reducing the average bit rate.
  • with DTX/CNG technology, the encoding end does not need to encode a background noise segment at full rate as it does a speech frame, nor to transmit coded information for every frame. It only sends, once every several frames, coding parameters that are fewer than those of a speech frame, that is, a Silence Insertion Descriptor (SID) frame. The decoding end recovers the entire background noise (that is, the comfort noise) from the parameters of the received discontinuous background noise frames.
  • the coded noise frame that is sent to the decoder is usually called a SID frame.
  • a SID frame generally contains only spectral parameters and signal energy gain parameters, without fixed codebook or adaptive codebook related parameters, which reduces the average coding rate.
  • the specific application scenario in the embodiment of the present invention is shown in FIG. 1.
  • at the encoding end, voice activity detection (VAD, Voice Activity Detector) and DTX processing are applied to the voice, and the frames are then encoded by the voice encoder.
  • an embodiment of the noise generation method in an embodiment of the present invention includes:
  • the decoder decodes the parameters from the received code stream and obtains the type information of the current data frame.
  • the type information identifies whether the current data frame is a voice frame or a noise frame, so the decoder can determine from it which of the two the current data frame is.
  • 202. Determine whether the type information indicates that the data frame is a noise frame; if yes, perform step 204; if not, perform step 203;
  • the decoder may determine from the obtained type information whether the current data frame is a voice frame or a noise frame: if it is a voice frame, step 203 is performed; if it is a noise frame, step 204 is performed.
  • 203. Perform other processing procedures, and return to step 201;
  • if the decoder learns from the type information that the current data frame is a voice frame, the corresponding processing flow is performed. This may consist of updating the noise generation parameters; the parameters to be updated differ between the subsequent embodiments, and the update process is described in detail there.
  • after the noise generation parameters are updated, the flow returns to step 201 and continues to decode the code stream.
  • if the decoder learns from the type information that the current data frame is a noise frame, the corresponding energy attenuation parameter is calculated according to the previously received data frames and the current noise frame.
  • there are three specific calculation manners, which are described in detail in the following embodiments.
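  • The control flow of the steps above can be sketched as a simple loop. The callbacks `compute_fact`, `attenuate`, and `update_noise_params`, as well as the frame dictionaries with a `type` key, are hypothetical names introduced for this illustration; the source does not define such an interface.

```python
def run_decoder(frames, compute_fact, attenuate, update_noise_params):
    """Walk decoded frames and return the comfort noise produced for noise frames.

    frames: iterable of dicts with a "type" key ("speech" or "noise").
    compute_fact(frame, history): returns the energy attenuation parameter.
    attenuate(frame, fact): returns the attenuated comfort noise for the frame.
    update_noise_params(frame): processing applied to speech frames.
    """
    history = []  # data frames received before the current one
    output = []
    for frame in frames:                        # step 201: decode next frame
        if frame["type"] == "noise":            # step 202: check the frame type
            fact = compute_fact(frame, history)     # step 204: attenuation parameter
            output.append(attenuate(frame, fact))   # attenuate the noise energy
        else:
            update_noise_params(frame)              # step 203: other processing
        history.append(frame)
    return output
```

  • Passing the history of earlier frames to `compute_fact` mirrors the statement that the parameter depends on both the current noise frame and the data frames received before it.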
  • the attenuation of the noise energy includes attenuation of the highband noise energy and attenuation of the narrowband noise energy. In practical applications, only the highband noise energy may be attenuated, only the narrowband noise energy may be attenuated, or both may be attenuated at the same time. This embodiment and the subsequent embodiments take simultaneous attenuation of the highband and narrowband noise energy as an example.
  • the narrowband and the highband together form the broadband: the broadband refers to a bandwidth of 0 to 8000 Hz, the narrowband to 0 to 4000 Hz, and the highband to 4001 Hz to 8000 Hz.
  • this division is only one example. In practical applications, the narrowband and highband can also be divided according to specific needs.
  • the energy of the noise is divided into a narrowband signal component and a highband signal component, i.e., the comfort noise signal generated by the decoder includes a narrowband signal component and a highband signal component.
  • the specific attenuation process can be divided into two categories:
  • the flow of narrowband noise generation in this embodiment includes:
  • the attenuated narrowband signal component is calculated according to the attenuated narrowband core layer energy parameter.
  • the following is a specific example:
  • the received SID frame carries a narrowband core layer energy parameter, represented by G, and a narrowband core layer spectral parameter.
  • the narrowband energy parameter is attenuated according to the calculated energy attenuation parameter fact.
  • the narrowband spectral parameters are converted into synthesis filter coefficients. Gaussian random noise is used as the excitation signal, filtered by the synthesis filter, and then shaped with the attenuated energy parameter to generate the narrowband signal component of the background noise.
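  • A minimal sketch of this narrowband synthesis chain is given below. It assumes the spectral parameters have already been converted to all-pole (LPC-style) filter coefficients and that the energy parameter is a mean-square energy per sample; both are assumptions made for illustration, as are the function and parameter names.

```python
import math
import random


def synthesize_narrowband(lpc, target_energy, n, seed=0):
    """Generate n samples of noise: Gaussian excitation -> all-pole filter -> energy shaping.

    lpc: coefficients a_1..a_p of the assumed filter y[i] = x[i] - sum_k a_k * y[i-k].
    target_energy: desired mean-square energy of the output (already attenuated).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # All-pole synthesis filtering of the excitation.
    out = []
    for i, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if i - k >= 0:
                acc -= a * out[i - k]
        out.append(acc)

    # Scale the filtered signal so its mean-square energy matches the target.
    energy = sum(s * s for s in out) / n
    gain = math.sqrt(target_energy / energy) if energy > 0 else 0.0
    return [gain * s for s in out]
```

  • In this sketch the attenuation happens in the parameter domain: the caller passes an energy that has already been reduced by the energy attenuation parameter.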
  • the highband signal component may be calculated using the reconstructed narrowband coding parameters or the reconstructed narrowband signal component.
  • the highband noise generation process in this embodiment includes: acquiring the high-band core layer time domain envelope parameter and the high-band core layer frequency domain envelope parameter;
  • the attenuated high-band signal component is calculated according to the attenuated high-band core layer time domain envelope parameter and the attenuated high-band core layer frequency domain envelope parameter.
  • a specific example is again given:
  • the time domain envelope of the broadband core layer is represented by Te, the frequency domain envelope is represented by Fe, and the energy attenuation parameter is represented by fact.
  • the narrowband energy parameter is attenuated according to the calculated energy attenuation parameter fact.
  • narrowband parameters such as pitch delay, fixed codebook gain, and adaptive codebook gain are first estimated using the reconstructed narrowband coding parameters or the reconstructed narrowband signal component. The white noise generated by a random sequence generator is then appropriately shaped according to these estimated narrowband parameters to serve as the excitation source. Finally, the reconstructed wideband coding parameters Te and Fe are used for time domain shaping and frequency domain shaping of the excitation source, respectively, which generates the highband signal component of the background noise.
  • the decoder reconstructs the narrowband signal component and the highband signal component separately, then filters the two components with a synthesis filter bank to obtain the final broadband comfort noise.
  • the energy parameter of the narrowband core layer, the spectral parameter of the narrowband core layer, the time domain envelope parameter of the high-band core layer, and the frequency domain envelope parameter of the high-band core layer are obtained;
  • the narrowband signal component and the highband signal component are calculated from the narrowband core layer energy parameter, the narrowband core layer spectral parameter, the time domain envelope Te of the broadband core layer, and the frequency domain envelope Fe.
  • the obtained narrowband signal component and highband signal component are synthesized and filtered to obtain a broadband comfort noise signal, and the energy attenuation parameter fact is then used directly to attenuate the energy of the broadband comfort noise.
  • specifically, the product of the broadband comfort noise signal and the energy attenuation parameter may be taken as the attenuated broadband comfort noise signal.
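  • The time-domain attenuation just described amounts to a per-sample multiplication. The sketch below assumes that fact lies between 0 and 1 (a value of 1 meaning no attenuation); the function name and the range check are illustrative additions.

```python
def attenuate_signal(samples, fact):
    """Return the comfort noise samples multiplied by the energy attenuation parameter."""
    if not 0.0 <= fact <= 1.0:
        # Assumed valid range; the source only states that fact = 1 means no attenuation.
        raise ValueError("fact is assumed to lie in [0, 1]")
    return [fact * s for s in samples]
```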
  • the narrowband signal component and the highband signal component may be respectively attenuated and then combined, specifically as follows:
  • the attenuated narrowband signal component and the attenuated highband signal component are combined to obtain an attenuated wideband signal component.
  • the attenuation may be performed on the narrowband signal and the highband signal at the same time, or may be attenuated only for one of the signals, which is not limited herein.
  • the attenuation of the noise energy may be completed at the decoding end or at the encoding end.
  • the above describes noise energy attenuation at the decoding end. If the attenuation is completed at the encoding end,
  • the encoding end attenuates the noise energy in the manner of the above embodiment and sends the attenuated narrowband coding parameters and highband coding parameters to the decoding end.
  • the decoding end calculates the attenuated narrowband signal component and highband signal component from the attenuated narrowband and highband coding parameters, and combines the two components to obtain a broadband signal component.
  • the specific process may include:
  • the encoding end calculates the energy attenuation parameter and sends the data frame including the energy attenuation parameter to the decoding end; the decoding end then attenuates the noise energy according to the energy attenuation parameter in the received data frame to obtain a comfort noise signal.
  • alternatively, the encoding end attenuates the noise energy according to the calculated energy attenuation parameter and then sends the noise-energy-attenuated data frame to the decoding end; the decoding end generates a comfort noise signal according to the data frame.
  • in the first manner, the energy attenuation parameter is calculated according to the VAD switching frequency:
  • the specific process includes:
  • setting the trailing parameter: if the type information indicates that the data frame is a voice frame, the trailing parameter is set to a preset maximum trailing length; if the type information indicates that the data frame is a noise frame, the trailing parameter is decremented until it reaches a preset value;
  • the decoder decodes the parameters from the received code stream, determines the frame type information of the current frame, and detects whether VAD switching has occurred: if the previous frame is a voice frame and the current frame is a noise frame, or the previous frame is a noise frame and the current frame is a voice frame, VAD switching is considered to have occurred and the VAD switching counter is incremented by one.
  • in addition, an energy attenuation trailing counter (the trailing parameter) g_ho is maintained. Its maximum trailing length is MAX_G_HANGOVER, which can be set according to the actual situation and is not limited here.
  • each time a voice frame is detected, the trailing parameter is set to MAX_G_HANGOVER; in each noise frame, the trailing parameter is decremented by one until the preset value is reached.
  • the preset value can be determined according to the specific situation. This embodiment uses 0 as the preset value.
  • an observation window with a window length of MAX_WINDOW is set, in units of frames.
  • the window length can be set according to the actual situation.
  • a position counter records the position of the currently received data frame in the observation window. If the current frame reaches the end of the observation window, the VAD switching counter VadSw is smoothed over the long term to obtain a long-term average VAD switching frequency:
  • $\text{VadSwtLT} = (\text{VadSwtLT} + \text{VadSw}) / 2$; at the same time, the observation window is shifted back by MAX_WINDOW frames and VadSw is reset to 0.
  • the switching frequency within a certain period can be counted according to actual needs.
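  • The bookkeeping described above (VAD switching counter, trailing counter g_ho, and observation window) can be sketched as a small state tracker. The default values of `MAX_G_HANGOVER` and `MAX_WINDOW`, the initial value of g_ho, and the averaging by two in the long-term smoothing are assumptions for illustration; the source leaves the constants open and its smoothing formula is partly garbled.

```python
MAX_G_HANGOVER = 10  # assumed example value
MAX_WINDOW = 100     # assumed example value, in frames


class VadSwitchTracker:
    """Tracks VAD switching frequency and the energy attenuation trailing counter."""

    def __init__(self, max_hangover=MAX_G_HANGOVER, max_window=MAX_WINDOW, preset_value=0):
        self.max_hangover = max_hangover
        self.max_window = max_window
        self.preset_value = preset_value
        self.g_ho = max_hangover   # trailing parameter (initial value assumed)
        self.vad_sw = 0            # switches counted in the current window
        self.vad_sw_lt = 0.0       # long-term average switching frequency
        self.pos = 0               # position inside the observation window
        self.prev_is_noise = None  # type of the previous frame, if any

    def update(self, is_noise):
        # Count a switch whenever the frame type differs from the previous frame.
        if self.prev_is_noise is not None and is_noise != self.prev_is_noise:
            self.vad_sw += 1
        self.prev_is_noise = is_noise

        # Speech frames reset the trailing counter; noise frames count it down.
        if is_noise:
            self.g_ho = max(self.g_ho - 1, self.preset_value)
        else:
            self.g_ho = self.max_hangover

        # At the end of the window, smooth the counter and start a new window.
        self.pos += 1
        if self.pos >= self.max_window:
            self.vad_sw_lt = (self.vad_sw_lt + self.vad_sw) / 2
            self.vad_sw = 0
            self.pos = 0
```

  • Calling `update` once per decoded frame keeps `vad_sw_lt` and `g_ho` ready for the attenuation calculation on each noise frame.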
  • for a noise frame, the energy attenuation parameter is first calculated in order to attenuate the background noise energy reconstructed by the CNG. The attenuation can be performed in the parameter domain before synthesis filtering,
  • or in the time domain after synthesis filtering.
  • the formula for calculating the energy attenuation parameter is as follows:
  • the minimum value of fact is the preset attenuation coefficient, a constant that indicates the minimum attenuation degree.
  • the specific value can be set according to the actual situation.
  • two further constants indicate the weights of the switching frequency parameter and the trailing parameter in the energy attenuation parameter, that is, their influence on it. If the level of the background noise is relatively high, the weight of the trailing parameter can be set larger to increase its influence; if the background noise is very unstable, for example its energy is sometimes high and sometimes low, the weight of the switching frequency parameter can be set larger to increase its influence.
  • the above describes the calculation of the energy attenuation parameter in this manner. The formula is only a specific example: any formula may be used as long as the energy attenuation parameter is proportional to the sum of the switching frequency parameter and the trailing parameter, and inversely proportional to the sum of the switching frequency parameter and the preset maximum trailing length.
  • in this manner, if switching between different types of frames is frequent, a lower degree of attenuation is used;
  • on the other hand, if there is less switching between different types of frames, a higher degree of attenuation is used. The degree of attenuation is thus tied to the switching frequency between different types of frames, which improves the user experience.
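  • Since the exact formula is not preserved in this text, the following is only one possible form consistent with the stated proportionality, not the patent's formula. The weight names `alpha` and `beta`, the clamping to `fact_min`, and the function name are all assumptions.

```python
def attenuation_from_switching(vad_sw_lt, g_ho, max_hangover, fact_min,
                               alpha=1.0, beta=1.0):
    """One hypothetical energy attenuation parameter based on VAD switching.

    The ratio grows with (switching frequency + trailing counter) and shrinks with
    (switching frequency + maximum trailing length); fact_min is the lower bound.
    """
    numerator = alpha * vad_sw_lt + beta * g_ho
    denominator = alpha * vad_sw_lt + beta * max_hangover
    ratio = numerator / denominator if denominator > 0 else 1.0
    # Never attenuate more strongly than the preset attenuation coefficient allows.
    return max(fact_min, ratio)
```

  • With this form, frequent switching or a recently seen voice frame pushes the value towards 1 (little attenuation), while a long run of stable noise lets it fall towards `fact_min`.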
  • in the second manner, the energy attenuation parameter is calculated according to the SID frame interval:
  • the specific process includes: calculating an average interval parameter between the current noise frame and the noise frame most recently received before the current noise frame;
  • the energy attenuation parameter is inversely proportional to the average interval parameter.
  • before decoding a frame, the decoder first determines the type of the current frame according to the received parameters.
  • a long-term average record sid_dist_lt (the average interval parameter) of the SID frame interval is established. Each time a SID frame is received, the interval sid_dist_cur between this SID frame and the last received SID frame is used to update the long-term SID frame interval as shown below:
  • $\text{sid\_dist\_lt} = \delta \cdot \text{sid\_dist\_lt} + (1 - \delta) \cdot \text{sid\_dist\_cur}$
  • the long-term SID frame interval sid_dist_lt is set to 1.
  • the energy attenuation parameter can be calculated.
  • the specific formula is as follows:
  • when the average interval parameter is greater than the preset value K, the energy attenuation parameter is inversely proportional to the average interval parameter. If the average interval parameter is less than or equal to K, the energy attenuation parameter is 1, that is, no attenuation is performed.
  • K is a preset value that indicates the threshold of the SID frame interval. If the average interval between two SID frames is relatively large, the noise is relatively stable and can be attenuated. If the average interval between two SID frames is relatively small, the noise is not stable and is not attenuated. This avoids large differences in the user's subjective experience and thereby improves the user experience.
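  • The interval smoothing and the threshold rule can be sketched as below. The smoothing factor `delta`, its default, and the particular inverse relation `k / sid_dist_lt` with a floor at `fact_min` are assumptions; the source states only that the parameter is inversely proportional to the average interval above the threshold K and equals 1 otherwise.

```python
def update_sid_interval(sid_dist_lt, sid_dist_cur, delta=0.9):
    """Update the long-term average SID frame interval with the newest interval."""
    return delta * sid_dist_lt + (1.0 - delta) * sid_dist_cur


def attenuation_from_sid_interval(sid_dist_lt, k, fact_min):
    """One hypothetical energy attenuation parameter based on the SID frame interval."""
    if sid_dist_lt <= k:
        return 1.0  # noise considered unstable: no attenuation
    # Inversely proportional to the average interval, bounded below by fact_min.
    return max(fact_min, k / sid_dist_lt)
```

  • For example, with `k = 8.0` an average interval of 16 frames would halve the signal in this sketch, while any average at or below 8 frames leaves it untouched.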
  • in the third manner, the energy attenuation parameter is calculated according to both the VAD switching frequency and the SID frame interval:
  • the specific process includes:
  • the energy attenuation parameter is proportional to the sum of the switching frequency parameter and the trailing parameter, and inversely proportional to the sum of the switching frequency parameter, the preset maximum trailing length, and the average interval parameter.
  • the decoder decodes the parameters from the received code stream, determines the frame type information of the current frame, and detects whether VAD switching has occurred: if the previous frame is a voice frame and the current frame is a noise frame, or the previous frame is a noise frame and the current frame is a voice frame, VAD switching is considered to have occurred and the VAD switching counter is incremented by one.
  • in addition, an energy attenuation trailing counter (the trailing parameter) g_ho is maintained. Its maximum trailing length is MAX_G_HANGOVER, which can be set according to the actual situation and is not limited here.
  • each time a voice frame is detected, the trailing parameter is set to MAX_G_HANGOVER; in each noise frame, the trailing parameter is decremented by 1 until it reaches 0.
  • an observation window with a window length of MAX_WINDOW is set, in units of frames.
  • the window length can be set according to the actual situation.
  • a long-term average record sid_dist_lt of the SID frame interval is established. Each time a SID frame is received, the interval sid_dist_cur between this SID frame and the last received SID frame is used to update the long-term SID frame interval as shown below:
  • $\text{sid\_dist\_lt} = \delta \cdot \text{sid\_dist\_lt} + (1 - \delta) \cdot \text{sid\_dist\_cur}$
  • the long-term SID frame interval sid_dist_lt is set to 1.
  • the energy attenuation parameter can be calculated.
  • the specific formula is as follows:
  • when the average interval parameter is greater than the preset value K, the energy attenuation parameter is inversely proportional to the average interval parameter. If the average interval parameter is less than or equal to K, the energy attenuation parameter is 1, that is, no attenuation is performed. K is a preset value that indicates the threshold of the SID frame interval: if the average interval between two SID frames is relatively large, the noise is relatively stable and can be attenuated; if the average interval between two SID frames is relatively small, the noise is not stable and is not attenuated.
  • this manner combines the advantages of the two manners above: it uses both the switching frequency and the noise stability as the basis for attenuation, so it can further avoid large differences in the user's subjective experience and thereby improve the user experience.
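  • A hypothetical combination of the two criteria is sketched below. As before, the formula itself is not preserved in this text, so the weights `alpha`, `beta`, `gamma`, the clamping, and the function name are assumptions chosen only to match the stated proportionality and the threshold K.

```python
def combined_attenuation(vad_sw_lt, g_ho, max_hangover, sid_dist_lt, k, fact_min,
                         alpha=1.0, beta=1.0, gamma=1.0):
    """One hypothetical attenuation parameter using switching frequency and SID interval."""
    if sid_dist_lt <= k:
        return 1.0  # short SID intervals: noise treated as unstable, no attenuation
    numerator = alpha * vad_sw_lt + beta * g_ho
    denominator = alpha * vad_sw_lt + beta * max_hangover + gamma * sid_dist_lt
    if denominator <= 0:
        return 1.0
    # Keep the result between the preset attenuation coefficient and 1.
    return min(1.0, max(fact_min, numerator / denominator))
```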
  • the noise generating apparatus in the embodiment of the present invention includes:
  • the decoding unit 501 is configured to decode the received code stream to obtain an encoding parameter and type information of a current data frame.
  • a type checking unit 502 configured to determine whether the type information indicates that the data frame is a noise frame
  • an energy attenuation parameter calculating unit 503, configured to calculate, when the current frame is a noise frame, a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame;
  • An energy attenuation unit 504 is configured to attenuate narrowband and/or highband noise energy based on the energy attenuation parameter.
  • the energy attenuation parameter calculation unit 503 in this embodiment may further include one or all of the following units:
  • a switching frequency recording unit 5032, configured to determine whether the type of the data frame is consistent with the type of the data frame most recently received before it, and if not, to increment the switching frequency parameter;
  • a trailing counting unit 5034, configured to set the trailing parameter to a preset maximum trailing length when the type information indicates that the data frame is a voice frame, and to decrement the trailing parameter until a preset value is reached when the type information indicates that the data frame is a noise frame.
  • the energy attenuation parameter calculation unit 503 in this embodiment may further include:
  • a noise frame interval recording unit 5031, configured to record, according to the type information of the data frame determined by the decoding unit, the average interval parameter between the current noise frame and the noise frame most recently received before the current noise frame.
  • the energy attenuation parameter calculation unit 503 in this embodiment may further include:
  • the calculation executing unit 5033 is configured to calculate an energy attenuation parameter according to the switching frequency parameter and/or the average interval parameter.
  • the calculation executing unit 5033 in this embodiment may further include at least one of the following units:
  • a first calculating unit 50331, configured to calculate an energy attenuation parameter according to the switching frequency parameter, the trailing parameter, a preset attenuation coefficient, and the preset maximum trailing length; the energy attenuation parameter is proportional to the sum of the switching frequency parameter and the trailing parameter, and inversely proportional to the sum of the switching frequency parameter and the preset maximum trailing length.
  • a second calculating unit 50332, configured to calculate an average interval parameter between the current noise frame and the noise frame most recently received before the current noise frame, and to calculate an energy attenuation parameter according to the average interval parameter and a preset attenuation coefficient; the energy attenuation parameter is inversely proportional to the average interval parameter.
  • a third calculating unit 50333, configured to calculate an average interval parameter between the current noise frame and the noise frame most recently received before the current noise frame, and to calculate an energy attenuation parameter according to the switching frequency parameter, the trailing parameter, the average interval parameter, a preset attenuation coefficient, and the preset maximum trailing length; the energy attenuation parameter is proportional to the sum of the switching frequency parameter and the trailing parameter, and inversely proportional to the sum of the switching frequency parameter, the preset maximum trailing length, and the average interval parameter.
  • the decoding unit 501 and the type checking unit 502 are optional units, that is, these functions may not be completed in the noise generating device but are completed by other external devices.
  • the energy attenuation parameter calculation unit 503 can calculate the energy attenuation parameter according to the switching frequency, according to the noise frame interval, or according to both the switching frequency and the noise frame interval.
  • the specific calculation processes have been described in detail in the method embodiments above; the processes here are similar and are not described again.
  • in the embodiment of the present invention, when the received data frame is a noise frame, the corresponding energy attenuation parameter is calculated according to the noise frame and the previously received data frames, and the narrowband and/or highband noise energy is attenuated according to the energy attenuation parameter.
  • the embodiment of the present invention can therefore calculate the energy attenuation parameter from the relationship between the current noise frame and the previous data frames and attenuate the noise energy with it. The energy attenuation is adaptive and adjusts to the data frames, so the comfort noise obtained in this way is relatively smooth, which helps improve the user experience.
  • if the received data frame is a noise frame, calculating a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame;
  • the noise energy is attenuated according to the energy attenuation parameter to obtain a comfort noise signal.
  • the above-mentioned storage medium may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Noise Elimination (AREA)

Description

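A Noise Generation Method and Noise Generation Apparatus
This application claims priority to Chinese Patent Application No. 200810085175.1, filed with the Chinese Patent Office on March 20, 2008 and entitled "A Noise Generation Method and Noise Generation Apparatus", which is incorporated herein by reference in its entirety.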
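Technical Field
The present invention relates to the field of communications, and in particular to a noise generation method and a noise generation apparatus.
Background
In current data transmission systems, speech coding compresses the transmission bandwidth of speech signals and thereby increases the capacity of the communication system. Only about 40% of the content of a voice call actually contains speech; everything else that is transmitted is silence or background noise. To save further transmission bandwidth, discontinuous transmission (DTX, Discontinuous Transmission System) and comfort noise generation (CNG, Comfortable Noise Generation) techniques were developed.
One DTX strategy in the prior art sends a silence insertion descriptor (SID, Silence Insertion Descriptor) frame once every fixed number of frames. The accompanying CNG algorithm linearly interpolates the parameters (energy parameters and spectral parameters) decoded from two consecutively received SID frames to estimate the parameters needed for comfort noise synthesis.
After the energy parameter and the spectral parameters are reconstructed, the spectral parameters are used to compute the synthesis filter and the energy parameter sets the energy of the excitation signal. The computed excitation signal is passed through the synthesis filter, and the output is the reconstructed comfort noise.
In this scheme, a 3 dB attenuation is added to the energy parameter when it is quantized at the encoder, so the comfort noise reconstructed by the CNG algorithm at the decoder has lower energy than the actual value. During background-noise periods, the generated comfort noise therefore gives the listener a relatively good subjective impression even when the actual background noise energy is high.
However, this 3 dB attenuation is fixed: the same attenuation is applied to all background noise in the noise period. When a speech period switches to a noise period (or a noise period switches to a speech period), the background noise inside the speech frames has relatively high energy while the comfort noise reconstructed in the noise period has relatively low energy. The listener can clearly hear this energy discontinuity, which also degrades the subjective impression of the reconstructed comfort noise.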
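Summary
Embodiments of the present invention provide a noise generation method and a noise generation apparatus that can improve the user experience.
The noise generation method provided by the embodiments comprises: if a received data frame is a noise frame, calculating a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame; and attenuating the noise energy according to the energy attenuation parameter.
The noise generation apparatus provided by the embodiments comprises: an energy attenuation parameter calculation unit, configured to calculate, when a received data frame is a noise frame, a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame; and an energy attenuation unit, configured to attenuate the noise energy according to the energy attenuation parameter.
As the above technical solutions show, the embodiments of the present invention have the following advantages:
When a received data frame is a noise frame, the corresponding energy attenuation parameter is calculated according to that noise frame and the previously received data frames, and the narrowband and/or high-band noise energy is attenuated according to that parameter. Because the parameter is derived from the relationship between the current noise frame and the preceding data frames, the attenuation is adaptive and can be adjusted to the data frame conditions. The comfort noise obtained in this way is relatively smooth, which helps improve the user experience.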
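Brief Description of the Drawings
FIG. 1 is a schematic diagram of a speech codec system using DTX/CNG according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of the noise generation method;
FIG. 3 is a schematic flowchart of narrowband noise generation according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of high-band noise generation according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of the noise generation apparatus.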
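Detailed Description
Embodiments of the present invention provide a noise generation method and a noise generation apparatus for improving the user experience.
In the embodiments, when a received data frame is a noise frame, the corresponding energy attenuation parameter is calculated according to that noise frame and the previously received data frames, and the narrowband and/or high-band noise energy is attenuated according to that parameter. Because the parameter is derived from the relationship between the current noise frame and the preceding data frames, the attenuation is adaptive and can be adjusted to the data frame conditions. The comfort noise obtained in this way is relatively smooth, which helps improve the user experience.
The embodiments also use DTX, which lets the encoder encode the background noise signal with a coding algorithm and coding rate different from those used for the speech signal, lowering the average bit rate. Put simply, with DTX/CNG the encoder does not encode a background-noise segment at full rate as it does speech frames, nor does it transmit coding information for every frame. Instead, once every several frames it sends a silence insertion descriptor (SID) frame, which carries fewer coding parameters than a speech frame. The decoder then recovers the whole background-noise segment (the comfort noise) from the parameters of the discontinuous background-noise frames it receives. In contrast to a normal coded speech frame, the coded noise frame sent to the decoder is usually called a SID frame. A SID frame generally contains only spectral parameters and a signal energy gain parameter, with no fixed-codebook or adaptive-codebook parameters, which lowers the average coding rate.
The specific application scenario of the embodiments is shown in FIG. 1. The input speech passes through voice activity detection (VAD, Voice Activity Detector) and DTX processing. Speech frames are then encoded continuously at full rate by the speech encoder, and noise frames are encoded discontinuously at less than full rate by the noise encoder. Both are sent over the channel to the decoder, which decodes the parameters, performs speech decoding on the speech frames, generates comfort noise from the noise frames, and outputs the decoded speech together with the comfort noise.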
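Referring to FIG. 2, an embodiment of the noise generation method comprises:
201. Decode the received bitstream to obtain the type information of the current data frame.
The decoder decodes parameters from the received bitstream and obtains the type information of the current data frame. The type information identifies whether the current data frame is a speech frame or a noise frame, and the decoder uses it to make that decision.
202. Determine whether the type information indicates that the data frame is a noise frame. If so, perform step 204; otherwise perform step 203.
In this embodiment the decoder decides from the obtained type information whether the current data frame is a speech frame or a noise frame. For a speech frame it performs step 203; for a noise frame it performs step 204.
203. Perform other processing and return to step 201.
If the decoder learns from the type information that the current data frame is a speech frame, it performs the corresponding processing, which may be an update of the noise generation parameters. Different embodiments described later use different noise generation parameters, and the update procedure is described in detail with each of them.
After the noise generation parameters have been updated, the decoder returns to step 201 and continues decoding the bitstream.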
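The branch in steps 201 to 203 can be sketched as a small dispatch loop. This is only an illustrative sketch: `FrameType`, `process_stream`, and the two callbacks are hypothetical names that do not appear in the patent, and the frame types are assumed to have been decoded already.

```python
from enum import Enum


class FrameType(Enum):
    SPEECH = 0
    NOISE = 1


def process_stream(frame_types, on_speech, on_noise):
    """Walk the decoded frame types and branch as in steps 201-203."""
    for index, frame_type in enumerate(frame_types):
        if frame_type is FrameType.NOISE:
            # Step 204 onward: compute the attenuation parameter, attenuate.
            on_noise(index)
        else:
            # Step 203: update the noise-generation state, then continue.
            on_speech(index)


log = []
process_stream(
    [FrameType.SPEECH, FrameType.NOISE, FrameType.NOISE],
    on_speech=lambda i: log.append(("speech", i)),
    on_noise=lambda i: log.append(("noise", i)),
)
```

Keeping the two branches as callbacks mirrors the text: the speech branch only updates state, while the noise branch is where the attenuation parameter is computed.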
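204. Calculate the corresponding energy attenuation parameter according to the noise frame and the data frames received before it.
If the decoder learns from the type information that the current data frame is a noise frame, it calculates the corresponding energy attenuation parameter from the previously received data frames together with the current noise frame. There are three ways to do this, which are described in detail in later embodiments.
The structure of the noise frame is shown in the following table:
Table 1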
Figure imgf000006_0001
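205. Attenuate the noise energy according to the energy attenuation parameter to obtain a comfort noise signal.
In this embodiment, attenuating the noise energy covers both the high-band noise energy and the narrowband noise energy. In practice, the attenuation may be applied to the high-band noise energy only, to the narrowband noise energy only, or to both at once. This embodiment and those that follow use simultaneous attenuation of both as the example.
The narrowband and the high band together make up the wideband: wideband refers to the band from 0 to 8000 Hz, narrowband to 0 to 4000 Hz, and high band to 4001 Hz to 8000 Hz. This division is only one possibility; in practice the narrowband and the high band may be divided according to specific needs.
The noise energy is split between a narrowband signal component and a high-band signal component; that is, the comfort noise signal generated by the decoder comprises both components.
The attenuation can be carried out in one of two ways: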
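A. Energy attenuation in the parameter domain, before synthesis filtering:
Because the comfort noise consists of a narrowband signal component and a high-band signal component, the two are described separately. Referring to FIG. 3, narrowband noise generation in this embodiment comprises:
obtaining the narrowband core-layer energy parameter;
multiplying the narrowband core-layer energy parameter by the energy attenuation parameter to obtain the attenuated narrowband core-layer energy parameter;
calculating the attenuated narrowband signal component according to the attenuated narrowband core-layer energy parameter.
For ease of understanding, a specific example follows.
Assume first that the narrowband core-layer energy parameter of the received SID frame is denoted G_nb, and that the SID frame also carries the narrowband core-layer spectral parameter.
The narrowband energy parameter is then attenuated with the computed energy attenuation parameter fact:
the attenuated narrowband core-layer energy parameter is G'_nb = G_nb * fact, and the reconstructed narrowband coding parameters are G'_nb together with the spectral parameter.
The narrowband spectral parameter is converted into synthesis filter coefficients, Gaussian random noise is used as the excitation signal and passed through the synthesis filter, and the result is shaped to the energy G'_nb, which yields the narrowband signal component s_nb(n) of the background noise.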
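The narrowband path described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the codec's actual procedure: the function name `synthesize_narrowband` is hypothetical, the spectral parameter is assumed to have been converted already into all-pole coefficients `lpc`, and the energy parameter is treated as a target RMS value.

```python
import math
import random


def synthesize_narrowband(g_nb, fact, lpc, n_samples, seed=0):
    """Generate one frame of narrowband comfort noise (illustrative only)."""
    # Attenuate in the parameter domain, before any filtering.
    g_att = g_nb * fact

    # Gaussian random noise as the excitation signal.
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # All-pole synthesis filter 1/A(z), with A(z) = 1 + sum_k a_k z^-k.
    out = []
    for n in range(n_samples):
        acc = excitation[n]
        for k, a_k in enumerate(lpc, start=1):
            if n >= k:
                acc -= a_k * out[n - k]
        out.append(acc)

    # Energy shaping: scale the frame so that its RMS equals g_att
    # (interpreting the energy parameter as an RMS is an assumption).
    rms = math.sqrt(sum(x * x for x in out) / n_samples)
    scale = g_att / rms if rms > 0.0 else 0.0
    return [scale * x for x in out]


frame = synthesize_narrowband(g_nb=2.0, fact=0.5, lpc=[-0.9], n_samples=160)
```

Because the attenuation is applied to the energy parameter, the filter itself is untouched; only the final scaling changes with fact.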
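In this embodiment, the high-band signal component can be calculated using the reconstructed narrowband coding parameters or the reconstructed narrowband signal component. Referring to FIG. 4, high-band noise generation in this embodiment comprises:
obtaining the high-band core-layer time-domain envelope parameter and the high-band core-layer frequency-domain envelope parameter;
multiplying the high-band core-layer time-domain envelope parameter and the high-band core-layer frequency-domain envelope parameter each by the energy attenuation parameter to obtain the attenuated high-band core-layer time-domain envelope parameter and the attenuated high-band core-layer frequency-domain envelope parameter;
calculating the attenuated high-band signal component according to the attenuated time-domain and frequency-domain envelope parameters.
For ease of understanding, a specific example follows.
Assume first that the time-domain envelope of the wideband core layer is denoted Te, the frequency-domain envelope is denoted Fe, and the energy attenuation parameter is denoted fact.
The envelope parameters are then attenuated with the computed energy attenuation parameter fact:
the attenuated time-domain envelope is Te' = Te * fact, and the attenuated frequency-domain envelope is Fe' = Fe * fact.
As shown in FIG. 4, narrowband parameters such as the pitch delay, the fixed-codebook gain, and the adaptive-codebook gain are first estimated from the reconstructed narrowband coding parameters or the reconstructed narrowband signal component. The white noise produced by a random sequence generator is then shaped according to these estimated parameters and used as the excitation source. Finally, the excitation source is shaped in the time domain and in the frequency domain with the reconstructed wideband coding parameters Te' and Fe', which yields the high-band signal component s_hb(n) of the background noise.
Note that if the received bitstream contains both narrowband coding parameters and wideband coding parameters, the decoder reconstructs the narrowband signal component s_nb(n) and the high-band signal component s_hb(n) separately and then passes both through a synthesis filter bank to obtain the wideband comfort noise s_wb(n).
The above describes energy attenuation in the parameter domain. In practice, the attenuation can equally be applied to the filtering result after filtering.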
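B. Energy attenuation of the filtering result, after filtering:
In this approach, the procedure is:
obtaining the narrowband core-layer energy parameter, the narrowband core-layer spectral parameter, the high-band core-layer time-domain envelope parameter, and the high-band core-layer frequency-domain envelope parameter;
calculating the narrowband signal component according to the narrowband core-layer energy parameter and the narrowband core-layer spectral parameter;
calculating the high-band signal component according to the high-band core-layer time-domain envelope parameter and the high-band core-layer frequency-domain envelope parameter;
combining the narrowband signal component and the high-band signal component to obtain the wideband signal component;
attenuating the wideband signal component according to the energy attenuation parameter.
Specifically, the narrowband signal component s_nb(n) and the high-band signal component s_hb(n) are computed from the original, unattenuated SID-frame parameters: the narrowband core-layer energy parameter G_nb, the narrowband core-layer spectral parameter, and the wideband core-layer time-domain envelope Te and frequency-domain envelope Fe.
The two components are then passed through the synthesis filter bank to obtain the wideband comfort noise signal s_wb(n), and the energy attenuation parameter fact is applied directly to s_wb(n). For example, the product of the wideband comfort noise signal and the energy attenuation parameter can be taken as the attenuated wideband comfort noise signal.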
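The above describes attenuating the wideband comfort noise signal. In practice, the narrowband signal component and the high-band signal component can also be attenuated separately and then combined:
obtaining the narrowband core-layer energy parameter, the narrowband core-layer spectral parameter, the high-band core-layer time-domain envelope parameter, and the high-band core-layer frequency-domain envelope parameter;
calculating the narrowband signal component according to the narrowband core-layer energy parameter and the narrowband core-layer spectral parameter;
calculating the high-band signal component according to the high-band core-layer time-domain envelope parameter and the high-band core-layer frequency-domain envelope parameter;
attenuating the narrowband signal component and the high-band signal component separately according to the energy attenuation parameter to obtain the attenuated narrowband signal component and the attenuated high-band signal component;
combining the attenuated narrowband signal component and the attenuated high-band signal component to obtain the attenuated wideband signal component.
The procedure above attenuates both the high-band and the narrowband signal components before combining them. In practice, only one of the two components may be attenuated and then combined with the other to form the attenuated wideband comfort noise signal.
Note that in practice the attenuation may be applied to the narrowband signal and the high-band signal at the same time, or to only one of them; no limitation is imposed here.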
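The post-filter variants can be sketched as follows. The sample-wise sum stands in for the synthesis filter bank purely for illustration, which is an assumption; the function names are hypothetical as well.

```python
def combine_then_attenuate(s_nb, s_hb, fact):
    """Combine both bands first, then scale the wideband signal by fact."""
    # Stand-in for the synthesis filter bank: a plain sample-wise sum.
    s_wb = [a + b for a, b in zip(s_nb, s_hb)]
    return [fact * s for s in s_wb]


def attenuate_then_combine(s_nb, s_hb, fact_nb=1.0, fact_hb=1.0):
    """Scale each band on its own, then combine.

    Passing 1.0 for one factor leaves that band unattenuated, which covers
    the case where only one of the two components is attenuated.
    """
    return [fact_nb * a + fact_hb * b for a, b in zip(s_nb, s_hb)]


wideband = combine_then_attenuate([1.0, 2.0], [0.5, -1.0], 0.5)
narrow_only = attenuate_then_combine([1.0, 2.0], [0.5, -1.0], fact_nb=0.5)
```

With a linear combiner such as the sum used here, applying the same factor to both bands before combining gives the same result as attenuating the combined signal.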
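Note that in the embodiments of the present invention the noise energy may be attenuated either at the decoder or at the encoder. The embodiments above describe attenuation at the decoder. If the attenuation is performed at the encoder, the encoder attenuates the noise energy in the same way and sends the attenuated narrowband coding parameters and high-band coding parameters to the decoder. The decoder then calculates the attenuated narrowband signal component and high-band signal component from these parameters and combines the two to obtain the wideband signal component.
Note also that if the encoder performs the attenuation, the corresponding data frame must be sent to the decoder afterwards. The procedure may be as follows:
after calculating the energy attenuation parameter, the encoder sends the decoder a data frame containing the energy attenuation parameter;
the decoder then attenuates the noise energy according to the energy attenuation parameter in the received data frame to obtain the comfort noise signal.
Alternatively, the encoder attenuates the noise energy according to the calculated energy attenuation parameter and then sends the attenuated data frame to the decoder;
the decoder then generates the comfort noise signal according to that data frame.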
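The generation of the energy attenuation parameter in the embodiments of the present invention is described below.
In one embodiment, the energy attenuation parameter is calculated from the VAD switching frequency.
The procedure comprises:
determining whether the type of the data frame is the same as the type of the data frame most recently received before it, and if not, updating the statistics of the switching frequency parameter;
if the type information indicates that the data frame is a speech frame, setting the hangover parameter to the preset maximum hangover length; if the type information indicates that the data frame is a noise frame, decrementing the hangover parameter until it reaches a preset value.
Specifically, the decoder decodes parameters from the received bitstream, determines the frame type of the current frame, and checks whether a VAD switch has occurred: if the previous frame is a speech frame and the current frame is a noise frame, or the previous frame is a noise frame and the current frame is a speech frame, a VAD switch is deemed to have occurred and the VAD switch counter VadSw is incremented by 1. An energy attenuation hangover counter (the hangover parameter) g_ho is also maintained. It is set to the maximum hangover length MAX_G_HANGOVER on every speech frame; this maximum length can be chosen to suit the situation and is not limited here. On every noise frame the hangover parameter is decremented by 1 until it reaches a preset value, which can likewise be chosen case by case. This embodiment uses 0 as the preset value.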
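The two counters described above can be sketched as a small state holder. The class name `VadTracker`, the value 8 for the maximum hangover length, and the choice to start the hangover counter at its maximum are assumptions made for the example; the patent leaves these values open.

```python
MAX_G_HANGOVER = 8  # assumed value; the text leaves it configurable


class VadTracker:
    """Tracks the VAD switch counter and the hangover counter."""

    def __init__(self, max_hangover=MAX_G_HANGOVER, floor=0):
        self.max_hangover = max_hangover
        self.floor = floor            # preset value at which decrementing stops
        self.vad_sw = 0               # VAD switch counter (VadSw)
        self.g_ho = max_hangover      # hangover counter (g_ho), assumed start
        self.prev_is_noise = None     # no previous frame yet

    def update(self, is_noise):
        # A switch is any change of frame type relative to the previous frame.
        if self.prev_is_noise is not None and is_noise != self.prev_is_noise:
            self.vad_sw += 1
        if is_noise:
            # Noise frame: count down, but never below the preset floor.
            self.g_ho = max(self.floor, self.g_ho - 1)
        else:
            # Speech frame: reload the hangover counter.
            self.g_ho = self.max_hangover
        self.prev_is_noise = is_noise


tracker = VadTracker()
for frame_is_noise in [False, True, True, False, True]:
    tracker.update(frame_is_noise)
```

After this sequence the tracker has seen three type changes, and the hangover counter sits one below its maximum because the last frame was a single noise frame following speech.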
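To measure the switching frequency over a given period, a detection period is needed. One way is to use an observation window of length MAX_WINDOW, in frames; the window length can be chosen to suit the situation and is not limited here. A position counter records the position of the currently received data frame within the window. When the current frame reaches the end of the window, the VAD switch counter VadSw is smoothed over the long term to obtain the long-term average VAD switching frequency (the switching frequency parameter), VadSwtLT = (VadSwtLT + VadSw) / 2. The window is then shifted by MAX_WINDOW frames and VadSw is reset to 0. In this way the switching frequency over a given period can be measured as required.
If the current frame is a noise frame, the energy attenuation parameter must be calculated before CNG reconstructs the background noise, so that the energy of the reconstructed background noise can be attenuated. The attenuation can be performed in the parameter domain before synthesis filtering, or in the time domain on the output of the synthesis filter after synthesis filtering. The energy attenuation parameter is calculated as follows: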
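$$\mathrm{fact} = \alpha + (1-\alpha)\cdot\frac{\beta\cdot\mathrm{VadSwtLT} + \gamma\cdot\mathrm{g\_ho}}{\beta\cdot\mathrm{VadSwtLT} + \gamma\cdot\mathrm{MAX\_G\_HANGOVER}}$$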
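Here α is the minimum value of fact, that is, the preset attenuation coefficient. It is a constant that indicates the lowest value the attenuation parameter can take, and its value can be chosen to suit the situation.
β and γ are also constants. They are the weights of the switching frequency parameter and the hangover parameter, respectively, in the energy attenuation parameter, that is, how strongly each one influences it. If the background noise level is relatively high, γ can be made larger to increase the influence of the hangover parameter. If the background noise is very unstable, for example with energy that is sometimes high and sometimes low, β can be made larger to increase the influence of the switching frequency parameter.
The above describes how the energy attenuation parameter is calculated in this approach. The formula is only one specific example: any formula will do as long as the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter and inversely proportional to the sum of the switching frequency parameter and the preset maximum hangover length.
As this embodiment shows, if switches between frame types are frequent, VadSwtLT is relatively large. The hangover parameter is reset to the maximum hangover length on every speech frame and decremented by 1 only on noise frames, so with frequent switching and rapid alternation of speech and noise frames it stays only slightly below the preset maximum hangover length. The energy attenuation parameter given by the formula is therefore relatively large. As the attenuation procedure above shows, a larger energy attenuation parameter means less attenuation. So when switches between frame types are frequent, a lower degree of attenuation is used, and when they are rare, a higher degree of attenuation is used. The degree of attenuation thus depends on the switching frequency between frame types, which improves the user experience.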
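The calculation described above can be sketched as two small functions. This sketch reads the garbled source formula as fact = α + (1 − α)(β·VadSwtLT + γ·g_ho) / (β·VadSwtLT + γ·MAX_G_HANGOVER) and the long-term smoothing step as a two-term average; both readings are assumptions based on the surrounding description, and the function names are hypothetical.

```python
def smooth_switch_frequency(vad_sw_lt, vad_sw):
    """Long-term smoothing at the end of an observation window.

    Assumed reading of the smoothing step: the average of the previous
    long-term value and the switch count of the window that just ended.
    """
    return (vad_sw_lt + vad_sw) / 2


def attenuation_factor(vad_sw_lt, g_ho, alpha, beta, gamma, max_hangover):
    """Energy attenuation parameter from switching frequency and hangover.

    Requires gamma * max_hangover > 0 so that the denominator is positive.
    """
    numerator = beta * vad_sw_lt + gamma * g_ho
    denominator = beta * vad_sw_lt + gamma * max_hangover
    return alpha + (1 - alpha) * numerator / denominator


# Hangover just reloaded: no attenuation at all (fact is 1).
fact_fresh = attenuation_factor(2.0, 8, alpha=0.5, beta=1.0, gamma=1.0, max_hangover=8)
# Long noise run with no recent switches: fact falls to its minimum, alpha.
fact_stale = attenuation_factor(0.0, 0, alpha=0.5, beta=1.0, gamma=1.0, max_hangover=8)
```

The two calls bracket the range of the parameter: fact never exceeds 1 because g_ho never exceeds the maximum hangover length, and it never drops below α.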
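In another embodiment, the energy attenuation parameter is calculated from the SID frame interval.
The procedure comprises:
calculating the average interval parameter between the current noise frame and the noise frame most recently received before it;
calculating the energy attenuation parameter according to the average interval parameter and the preset attenuation coefficient;
the energy attenuation parameter being inversely proportional to the average interval parameter.
Specifically, before decoding a frame the decoder determines the type of the current frame (speech frame or noise frame) from the received parameters. It maintains a long-term average record of the SID frame interval (the average interval parameter), sid_dist_lt. Each time a SID frame is received, the long-term SID frame interval is updated with the interval sid_dist_cur between this SID frame and the previously received SID frame:
sid_dist_lt = δ * sid_dist_lt + (1 - δ) * sid_dist_cur
where δ is greater than or equal to 0 and less than or equal to 1, and sets the update rate of the long-term average SID frame interval. If a speech frame is received, the long-term SID frame interval sid_dist_lt is set to 1.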
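The long-term average of the SID frame interval described above is an exponential moving average, which can be sketched as follows. The function names and the example values of δ and the intervals are assumptions chosen for illustration.

```python
def update_sid_interval(sid_dist_lt, sid_dist_cur, delta):
    """Blend the latest SID-to-SID interval into the long-term average."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * sid_dist_lt + (1 - delta) * sid_dist_cur


def reset_sid_interval_on_speech():
    """A speech frame resets the long-term interval to 1."""
    return 1.0


sid_dist_lt = reset_sid_interval_on_speech()
# Two SID frames arrive, each 9 frames after the previous one.
sid_dist_lt = update_sid_interval(sid_dist_lt, 9.0, delta=0.5)
sid_dist_lt = update_sid_interval(sid_dist_lt, 9.0, delta=0.5)
```

A δ close to 1 makes the average react slowly to new intervals, while a δ close to 0 makes it follow the most recent interval almost directly.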
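Once the average interval parameter has been obtained, the energy attenuation parameter can be calculated with the following formula:
Figure imgf000012_0001
As the formula shows, when the average interval parameter is greater than the preset value K, the energy attenuation parameter is inversely proportional to the average interval parameter; when the average interval parameter is less than or equal to K, the energy attenuation parameter is 1, meaning no attenuation. K is a preset threshold for the SID frame interval. In other words, a large average interval between SID frames indicates that the noise is fairly stable, so it can be attenuated; a small average interval indicates that the noise is not very stable, so it is not attenuated. This avoids large swings in the listener's subjective experience and thereby improves the user experience.
The above describes how the energy attenuation parameter is calculated in this approach. The formula is only one specific example: any formula will do as long as the energy attenuation parameter is inversely proportional to the average interval parameter.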
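In a further embodiment, the energy attenuation parameter is calculated from both the VAD switching frequency and the SID frame interval.
The procedure comprises:
obtaining the switching frequency parameter and the hangover parameter;
calculating the average interval parameter between the current noise frame and the noise frame most recently received before it;
calculating the energy attenuation parameter according to the switching frequency parameter, the hangover parameter, the average interval parameter, the preset attenuation coefficient, and the preset maximum hangover length;
the energy attenuation parameter being directly proportional to the sum of the switching frequency parameter and the hangover parameter, and inversely proportional to the sum of the switching frequency parameter, the preset maximum hangover length, and the average interval parameter.
Specifically, the decoder decodes parameters from the received bitstream, determines the frame type of the current frame, and checks whether a VAD switch has occurred: if the previous frame is a speech frame and the current frame is a noise frame, or the previous frame is a noise frame and the current frame is a speech frame, a VAD switch is deemed to have occurred and the VAD switch counter VadSw is incremented by 1. An energy attenuation hangover counter (the hangover parameter) g_ho is also maintained. It is set to the maximum hangover length MAX_G_HANGOVER on every speech frame; this maximum length can be chosen to suit the situation and is not limited here. On every noise frame the hangover parameter is decremented by 1 until it reaches 0.
To measure the switching frequency over a given period, a detection period is needed. One way is to use an observation window of length MAX_WINDOW, in frames; the window length can be chosen to suit the situation and is not limited here. A position counter records the position of the currently received data frame within the window. When the current frame reaches the end of the window, the VAD switch counter VadSw is smoothed over the long term to obtain the long-term average VAD switching frequency (the switching frequency parameter), VadSwtLT = (VadSwtLT + VadSw) / 2. The window is then shifted by MAX_WINDOW frames and VadSw is reset to 0. In this way the switching frequency over a given period can be measured as required.
In addition, a long-term average record of the SID frame interval, sid_dist_lt, is maintained. Each time a SID frame is received, the long-term SID frame interval is updated with the interval sid_dist_cur between this SID frame and the previously received SID frame:
sid_dist_lt = δ * sid_dist_lt + (1 - δ) * sid_dist_cur
where δ is greater than or equal to 0 and less than or equal to 1, and sets the update rate of the long-term average SID frame interval. If a speech frame is received, the long-term SID frame interval sid_dist_lt is set to 1.
Once the average interval parameter and the switching frequency parameter have been obtained, the energy attenuation parameter can be calculated with the following formula: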
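$$\mathrm{fact} = \begin{cases} \alpha + (1-\alpha)\cdot\dfrac{\beta\cdot\mathrm{VadSwtLT} + \gamma\cdot\mathrm{g\_ho}}{\beta\cdot\mathrm{VadSwtLT} + \gamma\cdot\mathrm{MAX\_G\_HANGOVER} + \mathrm{sid\_dist\_lt}}, & \mathrm{sid\_dist\_lt} > K \\ 1, & \text{otherwise} \end{cases}$$
Likewise, when the average interval parameter is greater than the preset value K, the energy attenuation parameter is inversely proportional to the average interval parameter; when the average interval parameter is less than or equal to K, the energy attenuation parameter is 1, meaning no attenuation. K is a preset threshold for the SID frame interval. In other words, a large average interval between SID frames indicates that the noise is fairly stable, so it can be attenuated; a small average interval indicates that the noise is not very stable, so it is not attenuated. This approach combines the advantages of the two approaches above: it bases the attenuation both on the switching frequency and on the stability of the noise, which further avoids large swings in the listener's subjective experience and thereby improves the user experience.
The above describes how the energy attenuation parameter is calculated in this approach. The formula is only one specific example: any formula will do as long as the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter and inversely proportional to the switching frequency parameter, the preset maximum hangover length, and the average interval parameter.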
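The combined calculation can be sketched as below. The sketch assumes the source formula reads fact = α + (1 − α)(β·VadSwtLT + γ·g_ho) / (β·VadSwtLT + γ·MAX_G_HANGOVER + sid_dist_lt) when sid_dist_lt exceeds K, and 1 otherwise; that reading is reconstructed from the description and is an assumption, as are the function name and the example values.

```python
def combined_attenuation_factor(vad_sw_lt, g_ho, sid_dist_lt,
                                alpha, beta, gamma, max_hangover, k):
    """Attenuation parameter from switching frequency, hangover, SID interval."""
    # Short average SID interval: the noise is treated as unstable, so no
    # attenuation is applied.
    if sid_dist_lt <= k:
        return 1.0
    numerator = beta * vad_sw_lt + gamma * g_ho
    # sid_dist_lt > k here, so the denominator is positive whenever the
    # other terms are non-negative.
    denominator = beta * vad_sw_lt + gamma * max_hangover + sid_dist_lt
    return alpha + (1 - alpha) * numerator / denominator


unstable = combined_attenuation_factor(0.0, 8, 2.0, 0.5, 1.0, 1.0, 8, k=4.0)
stable = combined_attenuation_factor(0.0, 8, 8.0, 0.5, 1.0, 1.0, 8, k=4.0)
```

With the same counters, the call below the threshold K returns 1 (no attenuation), while the call above it returns a smaller value because the long SID interval enlarges the denominator.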
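An embodiment of the noise generation apparatus is described below. Referring to FIG. 5, the noise generation apparatus comprises:
a decoding unit 501, configured to decode the received bitstream to obtain the coding parameters and the type information of the current data frame;
a type checking unit 502, configured to determine whether the type information indicates that the data frame is a noise frame;
an energy attenuation parameter calculation unit 503, configured to calculate, when the current frame is a noise frame, the corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame;
an energy attenuation unit 504, configured to attenuate the narrowband and/or high-band noise energy according to the energy attenuation parameter.
The energy attenuation parameter calculation unit 503 in this embodiment may further include one or both of the following units:
a switching frequency recording unit 5032, configured to determine whether the type of the data frame is the same as the type of the data frame most recently received before it, and if not, to update the statistics of the switching frequency parameter;
a hangover counting unit 5034, configured to set the hangover parameter to the preset maximum hangover length when the type information indicates that the data frame is a speech frame, and to decrement the hangover parameter until it reaches a preset value when the type information indicates that the data frame is a noise frame.
The energy attenuation parameter calculation unit 503 in this embodiment may further include:
a noise frame interval recording unit 5031, configured to record, according to the type information of the data frame obtained by the decoding unit, the average interval parameter between the current noise frame and the noise frame most recently received before it.
The energy attenuation parameter calculation unit 503 in this embodiment may further include:
a calculation execution unit 5033, configured to calculate the energy attenuation parameter according to the switching frequency parameter and/or the average interval parameter.
The calculation execution unit 5033 in this embodiment may further include at least one of the following units:
a first calculating unit 50331, configured to calculate the energy attenuation parameter according to the switching frequency parameter, the hangover parameter, the preset attenuation coefficient, and the preset maximum hangover length; the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter, and inversely proportional to the sum of the switching frequency parameter and the preset maximum hangover length.
a second calculating unit 50332, configured to calculate the average interval parameter between the current noise frame and the noise frame most recently received before it, and to calculate the energy attenuation parameter according to the average interval parameter and the preset attenuation coefficient; the energy attenuation parameter is inversely proportional to the average interval parameter.
a third calculating unit 50333, configured to calculate the average interval parameter between the current noise frame and the noise frame most recently received before it, and to calculate the energy attenuation parameter according to the switching frequency parameter, the hangover parameter, the average interval parameter, the preset attenuation coefficient, and the preset maximum hangover length; the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter, and inversely proportional to the sum of the switching frequency parameter, the preset maximum hangover length, and the average interval parameter.
In the above embodiment, the decoding unit 501 and the type checking unit 502 are optional; that is, their functions need not be performed inside the noise generation apparatus and may be performed by other, external apparatus.
Note that the energy attenuation parameter calculation unit 503 can calculate the energy attenuation parameter according to the switching frequency, according to the noise frame interval, or according to both. The calculation has been described in detail in the method embodiments above; the procedure here is similar and is not repeated.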
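In the embodiments of the present invention, when a received data frame is a noise frame, the corresponding energy attenuation parameter is calculated according to that noise frame and the previously received data frames, and the narrowband and/or high-band noise energy is attenuated according to that parameter. Because the parameter is derived from the relationship between the current noise frame and the preceding data frames, the attenuation is adaptive and can be adjusted to the data frame conditions. The comfort noise obtained in this way is relatively smooth, which helps improve the user experience.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the following steps:
if a received data frame is a noise frame, calculating the corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame;
attenuating the noise energy according to the energy attenuation parameter to obtain a comfort noise signal.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The noise generation method and noise generation apparatus provided by the present invention have been described in detail above. A person of ordinary skill in the art may, following the ideas of the embodiments, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.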

Claims

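1. A noise generation method, comprising:
if a received data frame is a noise frame, calculating a corresponding energy attenuation parameter according to the noise frame and data frames received before the noise frame; and
attenuating noise energy according to the energy attenuation parameter.
2. The method according to claim 1, further comprising:
determining whether the type of the currently received data frame has changed relative to the type of the previously received data frame, and if so, updating the statistics of a switching frequency parameter.
3. The method according to claim 2, further comprising:
if the data frame is a speech frame, setting a hangover parameter to a preset maximum hangover length; and if the data frame is a noise frame, decrementing the hangover parameter until it reaches a preset value.
4. The method according to claim 2 or 3, wherein the step of calculating a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame comprises:
obtaining the switching frequency parameter and the hangover parameter; and
calculating the energy attenuation parameter according to the switching frequency parameter, the hangover parameter, a preset attenuation coefficient, and the preset maximum hangover length;
wherein the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter, and inversely proportional to the sum of the switching frequency parameter and the preset maximum hangover length.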
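5. The method according to claim 1, wherein the step of calculating a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame comprises:
calculating an average interval parameter between the noise frame and the previous noise frame received before the noise frame; and
calculating the energy attenuation parameter according to the average interval parameter and a preset attenuation coefficient;
wherein the energy attenuation parameter is inversely proportional to the average interval parameter.
6. The method according to claim 5, wherein before the step of calculating the energy attenuation parameter according to the average interval parameter and the preset attenuation coefficient, the method comprises:
determining whether the average interval parameter is greater than a preset attenuation threshold, and if so, triggering the step of calculating the energy attenuation parameter according to the average interval parameter and the preset attenuation coefficient.
7. The method according to claim 2 or 3, wherein the step of calculating a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame comprises:
obtaining the switching frequency parameter and the hangover parameter;
calculating an average interval parameter between the noise frame and the previous noise frame received before the noise frame; and
calculating the energy attenuation parameter according to the switching frequency parameter, the hangover parameter, the average interval parameter, a preset attenuation coefficient, and the preset maximum hangover length;
wherein the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter, and inversely proportional to the sum of the switching frequency parameter, the preset maximum hangover length, and the average interval parameter.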
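8. The method according to any one of claims 1, 2, 3, 5, and 6, wherein the step of attenuating noise energy according to the energy attenuation parameter comprises:
obtaining a narrowband core-layer energy parameter;
multiplying the narrowband core-layer energy parameter by the energy attenuation parameter to obtain an attenuated narrowband core-layer energy parameter; and
calculating an attenuated narrowband signal component according to the attenuated narrowband core-layer energy parameter.
9. The method according to any one of claims 1, 2, 3, 5, and 6, wherein the step of attenuating noise energy according to the energy attenuation parameter comprises:
obtaining a high-band core-layer time-domain envelope parameter and a high-band core-layer frequency-domain envelope parameter;
multiplying the high-band core-layer time-domain envelope parameter and the high-band core-layer frequency-domain envelope parameter each by the energy attenuation parameter to obtain an attenuated high-band core-layer time-domain envelope parameter and an attenuated high-band core-layer frequency-domain envelope parameter; and
calculating an attenuated high-band signal component according to the attenuated high-band core-layer time-domain envelope parameter and the attenuated high-band core-layer frequency-domain envelope parameter.
10. The method according to any one of claims 1, 2, 3, 5, and 6, wherein the step of attenuating noise energy according to the energy attenuation parameter comprises:
obtaining a narrowband core-layer energy parameter, a narrowband core-layer spectral parameter, a high-band core-layer time-domain envelope parameter, and a high-band core-layer frequency-domain envelope parameter;
calculating a narrowband signal component according to the narrowband core-layer energy parameter and the narrowband core-layer spectral parameter;
calculating a high-band signal component according to the high-band core-layer time-domain envelope parameter and the high-band core-layer frequency-domain envelope parameter;
combining the narrowband signal component and the high-band signal component to obtain a wideband signal component; and
attenuating the wideband signal component according to the energy attenuation parameter.
11. The method according to any one of claims 1, 2, 3, 5, and 6, wherein the step of attenuating noise energy according to the energy attenuation parameter comprises:
obtaining a narrowband core-layer energy parameter, a narrowband core-layer spectral parameter, a high-band core-layer time-domain envelope parameter, and a high-band core-layer frequency-domain envelope parameter;
calculating a narrowband signal component according to the narrowband core-layer energy parameter and the narrowband core-layer spectral parameter;
calculating a high-band signal component according to the high-band core-layer time-domain envelope parameter and the high-band core-layer frequency-domain envelope parameter;
attenuating the narrowband signal component and the high-band signal component separately according to the energy attenuation parameter to obtain an attenuated narrowband signal component and an attenuated high-band signal component; and
combining the attenuated narrowband signal component and the attenuated high-band signal component to obtain an attenuated wideband signal component.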
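12. The method according to claim 1, wherein after the step of calculating a corresponding energy attenuation parameter according to the noise frame and the data frames received before the noise frame, the method comprises:
sending a data frame containing the energy attenuation parameter to a decoder;
and wherein the step of attenuating noise energy according to the energy attenuation parameter comprises:
attenuating, by the decoder, the noise energy according to the energy attenuation parameter in the received data frame.
13. The method according to claim 1, wherein after the step of attenuating noise energy according to the energy attenuation parameter, the method comprises:
sending the data frame whose noise energy has been attenuated to a decoder; and
generating, by the decoder, a comfort noise signal according to the data frame.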
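14. A noise generation apparatus, comprising:
an energy attenuation parameter calculation unit, configured to calculate, when a received data frame is a noise frame, a corresponding energy attenuation parameter according to the noise frame and data frames received before the noise frame; and
an energy attenuation unit, configured to attenuate noise energy according to the energy attenuation parameter.
15. The noise generation apparatus according to claim 14, further comprising:
a decoding unit, configured to decode a received bitstream to obtain type information of the current data frame; and
a type checking unit, configured to determine whether the type information indicates that the data frame is a noise frame.
16. The noise generation apparatus according to claim 14, wherein the energy attenuation parameter calculation unit further comprises:
a switching frequency recording unit, configured to determine whether the type of the currently received data frame has changed relative to the type of the previously received data frame, and if so, to update the statistics of a switching frequency parameter; and
a hangover counting unit, configured to set a hangover parameter to a preset maximum hangover length when the type information indicates that the data frame is a speech frame, and to decrement the hangover parameter until it reaches a preset value when the type information indicates that the data frame is a noise frame.
17. The noise generation apparatus according to claim 15 or 16, wherein the energy attenuation parameter calculation unit further comprises:
a noise frame interval recording unit, configured to record, according to the type information of the data frame obtained by the decoding unit, an average interval parameter between the current noise frame and the previous noise frame received before the current noise frame.
18. The noise generation apparatus according to claim 17, wherein the energy attenuation parameter calculation unit further comprises:
a calculation execution unit, configured to calculate the energy attenuation parameter according to the switching frequency parameter and/or the average interval parameter.
19. The noise generation apparatus according to claim 18, wherein the calculation execution unit comprises:
a first calculating unit, configured to calculate the energy attenuation parameter according to the switching frequency parameter, the hangover parameter, a preset attenuation coefficient, and the preset maximum hangover length; the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter, and inversely proportional to the sum of the switching frequency parameter and the preset maximum hangover length.
20. The noise generation apparatus according to claim 18, wherein the calculation execution unit comprises:
a second calculating unit, configured to calculate the average interval parameter between the current noise frame and the previous noise frame received before the current noise frame, and to calculate the energy attenuation parameter according to the average interval parameter and a preset attenuation coefficient; the energy attenuation parameter is inversely proportional to the average interval parameter.
21. The noise generation apparatus according to claim 18, wherein the calculation execution unit comprises:
a third calculating unit, configured to calculate the average interval parameter between the current noise frame and the previous noise frame received before the current noise frame, and to calculate the energy attenuation parameter according to the switching frequency parameter, the hangover parameter, the average interval parameter, a preset attenuation coefficient, and the preset maximum hangover length; the energy attenuation parameter is directly proportional to the sum of the switching frequency parameter and the hangover parameter, and inversely proportional to the sum of the switching frequency parameter, the preset maximum hangover length, and the average interval parameter.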
PCT/CN2009/070856 2008-03-20 2009-03-18 一种噪声生成方法以及噪声生成装置 WO2009115039A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP09722494.3A EP2259040B1 (en) 2008-03-20 2009-03-18 Method and apparatus for noise generating
RU2010142929/08A RU2469420C2 (ru) 2008-03-20 2009-03-18 Способ и устройство для формирования шумов
US12/886,151 US8370136B2 (en) 2008-03-20 2010-09-20 Method and apparatus for generating noises
US13/730,056 US20130124196A1 (en) 2008-03-20 2012-12-28 Method and apparatus for generating noises

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2008100851751A CN101483042B (zh) 2008-03-20 2008-03-20 一种噪声生成方法以及噪声生成装置
CN200810085175.1 2008-03-20

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/886,151 Continuation US8370136B2 (en) 2008-03-20 2010-09-20 Method and apparatus for generating noises

Publications (1)

Publication Number Publication Date
WO2009115039A1 true WO2009115039A1 (zh) 2009-09-24

Family

ID=40880122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/070856 WO2009115039A1 (zh) 2008-03-20 2009-03-18 一种噪声生成方法以及噪声生成装置

Country Status (5)

Country Link
US (2) US8370136B2 (zh)
EP (1) EP2259040B1 (zh)
CN (1) CN101483042B (zh)
RU (1) RU2469420C2 (zh)
WO (1) WO2009115039A1 (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246688B (zh) * 2007-02-14 2011-01-12 华为技术有限公司 一种对背景噪声信号进行编解码的方法、系统和装置
EP2458586A1 (en) * 2010-11-24 2012-05-30 Koninklijke Philips Electronics N.V. System and method for producing an audio signal
CN103137133B (zh) * 2011-11-29 2017-06-06 南京中兴软件有限责任公司 非激活音信号参数估计方法及舒适噪声产生方法及系统
WO2013098885A1 (ja) * 2011-12-27 2013-07-04 三菱電機株式会社 音声信号復元装置および音声信号復元方法
CN104217723B (zh) * 2013-05-30 2016-11-09 华为技术有限公司 信号编码方法及设备
CN105336339B (zh) 2014-06-03 2019-05-03 华为技术有限公司 一种语音频信号的处理方法和装置
TWI591624B (zh) * 2014-11-12 2017-07-11 元鼎音訊股份有限公司 降低噪音之方法及其電腦程式產品及其電子裝置
US9812149B2 (en) * 2016-01-28 2017-11-07 Knowles Electronics, Llc Methods and systems for providing consistency in noise reduction during speech and non-speech periods
CN105721656B (zh) * 2016-03-17 2018-10-12 北京小米移动软件有限公司 背景噪声生成方法及装置
US11120795B2 (en) * 2018-08-24 2021-09-14 Dsp Group Ltd. Noise cancellation
CN109817241B (zh) * 2019-02-18 2021-06-01 腾讯音乐娱乐科技(深圳)有限公司 音频处理方法、装置及存储介质
CN110931035B (zh) * 2019-12-09 2023-10-10 广州酷狗计算机科技有限公司 音频处理方法、装置、设备及存储介质
CN113571072B (zh) * 2021-09-26 2021-12-14 腾讯科技(深圳)有限公司 一种语音编码方法、装置、设备、存储介质及产品

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007027291A1 (en) * 2005-08-31 2007-03-08 Motorola, Inc. Method and apparatus for comfort noise generation in speech communication systems
WO2007111645A2 (en) * 2006-03-20 2007-10-04 Mindspeed Technologies, Inc. Method and system for reducing effects of noise producing artifacts in a voice codec
CN101080766A (zh) * 2004-11-03 2007-11-28 声学技术公司 使用bark频带weiner滤波器和线性衰减的噪声降低和舒适噪声增益控制
CN101087319A (zh) * 2006-06-05 2007-12-12 华为技术有限公司 一种发送和接收背景噪声的方法和装置及静音压缩系统
CN101207665A (zh) * 2007-11-05 2008-06-25 华为技术有限公司 一种衰减因子的获取方法和获取装置
WO2008100385A2 (en) * 2007-02-14 2008-08-21 Mindspeed Technologies, Inc. Embedded silence and background noise compression

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5146473A (en) * 1989-08-14 1992-09-08 International Mobile Machines Corporation Subscriber unit for wireless digital subscriber communication system
FR2680924B1 (fr) 1991-09-03 1997-06-06 France Telecom Procede de filtrage adapte d'un signal transforme en sous-bandes, et dispositif de filtrage correspondant.
US5657422A (en) * 1994-01-28 1997-08-12 Lucent Technologies Inc. Voice activity detection driven noise remediator
ZA955605B (en) * 1994-07-13 1996-04-10 Qualcomm Inc System and method for simulating user interference received by subscriber units in a spread spectrum communication network
FR2739995B1 (fr) 1995-10-13 1997-12-12 Massaloux Dominique Procede et dispositif de creation d'un bruit de confort dans un systeme de transmission numerique de parole
US6563803B1 (en) * 1997-11-26 2003-05-13 Qualcomm Incorporated Acoustic echo canceller
US6549587B1 (en) * 1999-09-20 2003-04-15 Broadcom Corporation Voice and data exchange over a packet based network with timing recovery
CA2454296A1 (en) * 2003-12-29 2005-06-29 Nokia Corporation Method and device for speech enhancement in the presence of background noise

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080766A (zh) * 2004-11-03 2007-11-28 声学技术公司 使用bark频带weiner滤波器和线性衰减的噪声降低和舒适噪声增益控制
WO2007027291A1 (en) * 2005-08-31 2007-03-08 Motorola, Inc. Method and apparatus for comfort noise generation in speech communication systems
WO2007111645A2 (en) * 2006-03-20 2007-10-04 Mindspeed Technologies, Inc. Method and system for reducing effects of noise producing artifacts in a voice codec
CN101087319A (zh) * 2006-06-05 2007-12-12 华为技术有限公司 一种发送和接收背景噪声的方法和装置及静音压缩系统
WO2008100385A2 (en) * 2007-02-14 2008-08-21 Mindspeed Technologies, Inc. Embedded silence and background noise compression
CN101207665A (zh) * 2007-11-05 2008-06-25 华为技术有限公司 一种衰减因子的获取方法和获取装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2259040A4 *

Also Published As

Publication number Publication date
US20130124196A1 (en) 2013-05-16
EP2259040B1 (en) 2013-06-12
RU2469420C2 (ru) 2012-12-10
EP2259040A4 (en) 2011-06-29
RU2010142929A (ru) 2012-04-27
US8370136B2 (en) 2013-02-05
CN101483042B (zh) 2011-03-30
CN101483042A (zh) 2009-07-15
US20110015923A1 (en) 2011-01-20
EP2259040A1 (en) 2010-12-08

Similar Documents

Publication Publication Date Title
WO2009115039A1 (zh) 一种噪声生成方法以及噪声生成装置
US10269359B2 (en) Audio decoder and method for providing a decoded audio information using an error concealment based on a time domain excitation signal
JP6306177B2 (ja) 時間ドメイン励振信号を修正するエラーコンシールメントを用いて、復号化されたオーディオ情報を提供する、オーディオデコーダおよび復号化されたオーディオ情報を提供する方法
EP2438592B1 (en) Method, apparatus and computer program product for reconstructing an erased speech frame
US8296132B2 (en) Apparatus and method for comfort noise generation
KR101648290B1 (ko) 컴포트 노이즈의 생성
WO2014190641A1 (zh) 一种媒体数据的传输方法、装置和系统
JP2019512733A (ja) 適切に復号されたオーディオフレームの復号化表現の特性を使用する誤り隠蔽ユニット、オーディオデコーダ、および関連する方法およびコンピュータプログラム
JP5143949B2 (ja) 背景雑音生成方法および雑音処理装置
US20220108709A1 (en) Stereo Signal Encoding Method and Encoding Apparatus
KR101655913B1 (ko) 디지털 오디오 신호에서의 프리-에코 감쇠
JP5415460B2 (ja) 背景ノイズ情報を符号化する方法および手段
WO2013017018A1 (zh) 一种进行语音自适应非连续传输的方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09722494

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 6059/CHENP/2010

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 2009722494

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2010142929

Country of ref document: RU