WO2013017018A1 - Method and apparatus for performing voice adaptive discontinuous transmission - Google Patents


Info

Publication number
WO2013017018A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
spectral energy
insertion description
mute insertion
speech signal
Prior art date
Application number
PCT/CN2012/078878
Other languages
French (fr)
Chinese (zh)
Inventor
顾彩霞
袁浩
江东平
黎家力
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2013017018A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding

Definitions

  • the present invention relates to the field of digital signal processing, and in particular, to a method and apparatus for performing speech adaptive discontinuous transmission (DTX).
  • DTX speech adaptive discontinuous transmission
  • the sender uses a Voice Activity Detector (VAD) algorithm for signal detection and, when it detects an inactive (silence) segment of the call, encodes the important information of the signal at a lower code rate in that segment.
  • VAD Voice Activity Detector
  • that is, the signal is encoded into a Silence Insertion Descriptor (SID) frame, and SID frames are transmitted discontinuously.
  • SID Silence Insertion Descriptor
  • the decoder decodes the received SID frames by means of Comfort Noise Generation (CNG).
  • CNG Comfort Noise Generation
  • with the fixed-interval scheme, a SID frame is transmitted every fixed number of frames in the silence segment using parameters set in advance; for example, the 3GPP AMR and AMR-WB speech coding standards use this method, sending one SID frame every 8 frames.
  • the advantage of this method is that the calculation is simple and easy to implement, and the disadvantage is that the code rate cannot be automatically adjusted according to the signal characteristics.
  • when the sender detects a silence frame after a voice frame, it does not immediately enter the silence segment but applies a hangover (buffering) mechanism.
  • during the hangover phase, frames are still encoded as normal speech.
  • if a silence frame is still detected after the hangover phase, a SID_FIRST frame (i.e. the first SID frame) is sent at the first silence frame position after the silence segment, and a SID update (SID_UPDATE) frame is sent at the third silence frame position.
  • thereafter, a SID update frame is sent after every 7 frames, so that the SID parameters are updated at a fixed low code rate after the hangover phase.
  • in another implementation, when a silence frame is detected after N consecutive speech frames and N is less than 34, the hangover phase is skipped and the SID update frame is transmitted directly.
  • This method is computationally simple and can be implemented with only a counter; no additional parameter calculation is required, the code rate is controllable, and the algorithm is stable.
  • the disadvantage of this method is that the fixed interval fixes the code rate: the same code rate is used for every kind of noise and cannot be adjusted as the noise signal changes. For white noise, for example, the parameters are very stable, yet SID frames are still sent frequently, so the code rate is not effectively reduced. For a fast-changing noise signal, the change cannot be tracked in time, which delays the information and causes large distortion of the noise signal when the decoder restores it with CNG.
  • with the variable-interval scheme of mode 2, an algorithm evaluates the silence-segment signal in real time and decides, according to the real-time change of the signal, whether a SID frame needs to be transmitted.
  • the advantage of this method is flexibility: it adapts to real-time changes of the signal, saves as much bandwidth as possible, and allows the average code rate to be adjusted.
  • the disadvantage is that the calculation is relatively complicated.
  • in the related art, the variable-interval mode measures whether the signal has changed significantly by calculating parameters of the signal such as the LPC, and thereby decides whether an update is needed.
  • although this method tracks the signal adaptively, its computational complexity is high.
  • This method is based on linear prediction.
  • LPC linear predictive coding
  • a mathematical representation of the linear prediction coefficients is compared with the same parameter of the last transmitted SID frame stored in memory.
  • if the signal is considered to have changed, a SID update frame is sent; otherwise it is not sent.
  • Embodiments of the present invention provide a method and apparatus for performing speech adaptive discontinuous transmission that overcome two disadvantages of the related art: the fixed-interval method cannot flexibly track signal changes, and the variable-interval method requires calculating multiple parameters such as linear prediction.
  • these multi-parameter calculations lead to high computational complexity.
  • an embodiment of the present invention provides a method for performing voice adaptive discontinuous transmission, including:
  • when voice adaptive discontinuous transmission is performed, whether to send a mute insertion description frame is determined according to the spectrum information of the current speech signal frame and of the previous mute insertion description frame.
  • the spectrum information of the voice signal frame is either the spectrum information calculated from the frequency-domain signal of the frame, or the spectrum information calculated from that frequency-domain signal after it has been smoothed.
  • the step of determining whether to send the mute insertion description frame according to the spectrum information of the current speech signal frame and of the previous mute insertion description frame includes:
  • when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than a single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than a first preset limit, the mute insertion description frame is sent.
  • alternatively, the step of determining whether to send the mute insertion description frame according to the spectrum information of the current speech signal frame and of the previous mute insertion description frame includes:
  • when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than the first preset limit, it is further determined whether that gap is greater than a second preset limit; if so, two mute insertion description frames are sent consecutively, where the spectral energy gap corresponding to the second preset limit is greater than the gap corresponding to the first preset limit.
  • the gap between the spectral energy of the speech signal frame and the spectral energy of the previous mute insertion description frame being greater than a preset limit means either of the following:
  • the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous mute insertion description frame is greater than a ratio threshold corresponding to the preset limit, or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1;
  • the absolute value of the difference between the spectral energy of the speech signal frame and the spectral energy of the previous mute insertion description frame is greater than a difference threshold.
  • alternatively, the step of determining whether to send the mute insertion description frame according to the spectrum information of the current speech signal frame and of the previous mute insertion description frame includes:
  • when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, a spectral correlation value is calculated from the spectral energy of the speech signal frame and of the previous mute insertion description frame; when the calculated spectral correlation value is less than a spectral correlation threshold, the mute insertion description frame is sent.
  • an embodiment of the present invention further provides an apparatus for performing voice adaptive discontinuous transmission, including a mute insertion description frame processing unit and a mute insertion description frame storage unit;
  • the mute insertion description frame processing unit is configured to determine whether to send a mute insertion description frame according to the spectrum information of the current speech signal frame and of the previous mute insertion description frame;
  • the mute insertion description frame storage unit is configured to store the spectrum information of the mute insertion description frame after the mute insertion description frame processing unit transmits it.
  • the mute insertion description frame processing unit is further configured to smooth the frequency-domain signal of the speech signal frame and to calculate the spectrum information of the speech signal frame from the smoothed frequency-domain signal;
  • the mute insertion description frame storage unit is further configured to store the smoothed frequency-domain signal.
  • the mute insertion description frame processing unit is configured to decide whether to transmit a mute insertion description frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than the first preset limit, the mute insertion description frame is sent; or
  • when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than the first preset limit, the unit further determines whether that gap is greater than a second preset limit and, if so, sends two mute insertion description frames consecutively, where the spectral energy gap corresponding to the second preset limit is greater than the gap corresponding to the first preset limit;
  • the gap between the spectral energy of the speech signal frame and the spectral energy of the previous mute insertion description frame being greater than a preset limit means that the ratio of the two spectral energies is greater than a ratio threshold corresponding to the preset limit or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1; or that the absolute value of the difference between the two spectral energies is greater than a difference threshold.
  • alternatively, the mute insertion description frame processing unit is configured to decide whether to transmit a mute insertion description frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, a spectral correlation value is calculated from the spectral energy of the speech signal frame and of the previous mute insertion description frame, and when the calculated spectral correlation value is less than the spectral correlation threshold, the mute insertion description frame is sent.
  • FIG. 1 is a schematic structural diagram of an apparatus for performing voice adaptive discontinuous transmission
  • FIG. 2 is a schematic structural diagram of another apparatus for performing voice adaptive discontinuous transmission
  • FIG. 3 is a schematic flowchart of performing voice adaptive discontinuous transmission in Embodiment 2
  • FIG. 4 is a schematic flowchart of performing voice adaptive discontinuous transmission in Embodiment 3.
  • Preferred embodiments of the invention
  • the apparatus for performing voice adaptive discontinuous transmission includes a mute insertion description frame processing unit and a mute insertion description frame storage unit.
  • the mute insertion description frame processing unit is configured to determine whether to send the mute insertion description frame according to the spectrum information of the current speech signal frame and of the previous mute insertion description frame;
  • the mute insertion description frame storage unit is configured to store the spectrum information of the mute insertion description frame after the device transmits it.
  • the mute insertion description frame processing unit is configured to decide whether to send the mute insertion description frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than the first preset limit, the mute insertion description frame is sent.
  • the mute insertion description frame processing unit may be further configured to decide whether to transmit the mute insertion description frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than the first preset limit, the unit further determines whether that gap is greater than the second preset limit. If so, two mute insertion description frames are sent consecutively, where the spectral energy gap corresponding to the second preset limit is greater than the gap corresponding to the first preset limit.
  • the gap between the spectral energy of the speech signal frame and the spectral energy of the previous mute insertion description frame being greater than a preset limit means either of the following:
  • the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous mute insertion description frame is greater than a ratio threshold corresponding to the preset limit or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1; or
  • the absolute value of the difference between the spectral energy of the speech signal frame and the spectral energy of the previous mute insertion description frame is greater than the difference threshold.
  • alternatively, the mute insertion description frame processing unit is configured to decide whether to send the mute insertion description frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, a spectral correlation value is calculated from the spectral energy of the current speech signal frame and of the previous mute insertion description frame, and when the spectral correlation value is less than the spectral correlation threshold, the mute insertion description frame is sent.
  • the mute insertion description frame processing unit may also decide whether to transmit the mute insertion description frame from both the spectral energy gap between the two frames and the spectral correlation value.
  • the apparatus may further include a smoothing filtering unit, which is configured to smooth the frequency-domain signal of the voice signal and pass the result to the mute insertion description frame processing unit; the processing unit then performs the above processing on the smoothed frequency-domain signal, and the mute insertion description frame storage unit also stores the smoothed frequency-domain signal.
  • the method for performing voice adaptive discontinuous transmission includes: when voice adaptive discontinuous transmission is performed, determining whether to send a mute insertion description frame according to the spectrum information of the current voice signal frame and of the previous mute insertion description frame.
  • the spectrum information of the voice signal frame is either the spectrum information calculated from the frequency-domain signal of the frame, or the spectrum information calculated from that frequency-domain signal after it has been smoothed.
  • the purpose of smoothing is to compare spectral changes of the signal more accurately: it reduces the influence of spectral detail on the overall comparison, removes spectral spikes and burrs, and yields a smoother output spectrum with a more stable spectral envelope.
  • This spectral smoothing can be achieved with a smoothing filter. Take a 16 kHz sampling rate and a 20 ms frame length as an example: a fast Fourier transform (FFT) transforms the time-domain signal into the frequency domain to obtain the spectral parameters of the frame, with an FFT length of 320 points.
  • FFT fast Fourier transform
  • $H(z) = a_0 z^{-2} + a_1 z^{-1} + a_2 + a_3 z + a_4 z^{2}$
  • the coefficients $[a_0, a_1, a_2, a_3, a_4]$ are the smoothing coefficients, which can be [0.15, 0.15, 0.4, 0.15, 0.15]. After smoothing, the trend of the spectral lines is unchanged but instantaneous jumps are reduced, which makes changes in the spectral envelope of the signal easier to observe.
  • the above spectral smoothing includes, but is not limited to, the above-described manner of using a filter. During the use of the filter, different adjustment effects can also be achieved by adjusting the coefficients or orders of the filter.
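The smoothing step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the five coefficients are applied as a symmetric window along the frequency axis, and that bins beyond the edges reuse the nearest valid bin (the text does not say how boundaries are handled). The names `smooth_spectrum` and `FRAME_LEN` are hypothetical.

```python
def smooth_spectrum(spectrum, coeffs=(0.15, 0.15, 0.4, 0.15, 0.15)):
    """Apply a 5-tap smoothing filter along the frequency axis.

    Edge bins reuse the nearest valid bin. This boundary rule is an
    assumption; the source text does not specify one.
    """
    n = len(spectrum)
    half = len(coeffs) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the edges
            acc += c * spectrum[j]
        out.append(acc)
    return out


# 16 kHz sampling rate and 20 ms frames give 320 samples per frame,
# which matches the 320-point FFT length quoted in the text.
FRAME_LEN = 16000 * 20 // 1000
```

Because the example coefficients sum to 1, a flat spectrum passes through unchanged, while an isolated spike is spread over its neighbours and reduced to 0.4 of its original height.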
  • when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than a single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than a first preset limit, the mute insertion description frame is sent.
  • alternatively, when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous mute insertion description frame is greater than the first preset limit, it is further determined whether that gap is greater than a second preset limit. If so, two mute insertion description frames are sent consecutively, where the spectral energy gap corresponding to the second preset limit is greater than the gap corresponding to the first preset limit.
  • the gap between the spectral energy of the speech signal frame and the spectral energy of the previous mute insertion description frame being greater than a preset limit means either of the following:
  • the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous mute insertion description frame is greater than a ratio threshold corresponding to the preset limit or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1; or
  • the absolute value of the difference between the spectral energy of the speech signal frame and the spectral energy of the previous mute insertion description frame is greater than the difference threshold.
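The one-frame versus two-frame decision described above can be sketched as below, using the ratio form of the preset limit. The function names, the threshold values used in the usage note, and the handling of a zero-energy reference are illustrative assumptions rather than details from the source.

```python
def exceeds_limit(e_cur, e_last, ratio_thr):
    """True if the energy ratio is above ratio_thr or below 1 / ratio_thr."""
    if e_last <= 0.0:
        # Assumption: any energy after a zero-energy reference counts as a change.
        return e_cur > 0.0
    ratio = e_cur / e_last
    return ratio > ratio_thr or ratio < 1.0 / ratio_thr


def sid_frames_to_send(e_cur, e_last, frame_thr, first_thr, second_thr):
    """Return how many SID frames (0, 1 or 2) to send for the current frame."""
    # Energy gate: at least one of the two energies must exceed the threshold.
    if abs(e_cur) <= frame_thr and abs(e_last) <= frame_thr:
        return 0
    if not exceeds_limit(e_cur, e_last, first_thr):
        return 0
    # A gap beyond the (larger) second limit triggers two consecutive frames.
    return 2 if exceeds_limit(e_cur, e_last, second_thr) else 1
```

With hypothetical thresholds `frame_thr=1.0`, `first_thr=2.0` and `second_thr=4.0`, an energy change from 10 to 30 yields one SID frame, and a change from 10 to 50 yields two.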
  • Embodiment 2: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold, a spectral correlation value is calculated from the spectral energy of the current speech signal frame and of the previous mute insertion description frame, and when the spectral correlation value is less than the spectral correlation threshold, the mute insertion description frame is sent.
  • whether the mute insertion description frame is sent may also be determined from both the spectral energy gap between the two frames and the spectral correlation value.
  • the spectral correlation value is used for the decision.
  • After a SID frame is sent, the device stores the spectral energy information of that SID frame in the SID frame storage unit; that is, the storage unit always holds the spectral energy information of the most recently transmitted SID frame.
  • When determining whether to send a SID frame, it is first checked whether at least one of the absolute value of the spectral energy of the current speech signal frame and the absolute value of the spectral energy of the previous mute insertion description frame is greater than the single-frame energy threshold (THR1). If this condition is not satisfied, the signal is considered to remain at low energy and no SID frame needs to be transmitted. If the condition is satisfied, the correlation between the spectral energy of the current speech signal frame and the spectral energy of the previous mute insertion description frame is calculated according to the following formula:
  • S(i) represents the spectral energy of the current speech signal frame
  • S_last(i) represents the spectral energy of the previous SID frame of the current frame
  • N represents the spectral length, which is 320 in this embodiment.
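The correlation formula itself did not survive in this copy of the text, so the following is only a sketch under a stated assumption: it uses the standard normalized cross-correlation of S(i) and S_last(i), which may differ from the formula in the patent. The threshold name `corr_thr` and both function names are hypothetical; `thr1` corresponds to THR1 above.

```python
import math


def spectral_correlation(s, s_last):
    """Normalized cross-correlation of two equal-length spectra.

    This form is an assumption; the patent's own formula is not
    recoverable from the source text.
    """
    num = sum(a * b for a, b in zip(s, s_last))
    den = math.sqrt(sum(a * a for a in s) * sum(b * b for b in s_last))
    return num / den if den > 0.0 else 0.0


def should_send_sid(s, s_last, thr1, corr_thr):
    """Send a SID frame when the energy gate passes and the correlation is low."""
    e_cur, e_last = sum(s), sum(s_last)
    if abs(e_cur) <= thr1 and abs(e_last) <= thr1:
        return False  # signal stays at low energy: no SID frame needed
    return spectral_correlation(s, s_last) < corr_thr
```

Two spectra with the same shape give a correlation near 1 and no update, whereas spectra whose energy sits in different bins give a correlation near 0 and trigger a SID frame.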
  • the spectral energy ratio is used for the decision.
  • After a SID frame is sent, the device stores the spectral energy information of that SID frame in the SID frame storage unit; that is, the storage unit always holds the spectral energy information of the most recently transmitted SID frame.
  • the ratio of the spectral energy of the current speech signal frame to the spectral energy of the previous mute insertion description frame is calculated according to the following formula:
  • S(i) represents the spectral energy of the current speech signal frame
  • S_last(i) represents the spectral energy of the previous SID frame of the current frame
  • N represents the spectral length
  • if the ratio is greater than THR3 or less than 1/THR3, where THR3 is a real number greater than 1, the signal energy has changed significantly and a SID frame needs to be sent. Otherwise, no SID frame needs to be transmitted.
  • the spectral energy ratio is used for the first decision step.
  • After a SID frame is sent, the device stores the spectral energy information of that SID frame in the SID frame storage unit; that is, the storage unit always holds the spectral energy information of the most recently transmitted SID frame.
  • the ratio of the spectral energy of the current speech signal frame to the spectral energy of the previous mute insertion description frame is calculated according to the following formula:
  • S(i) represents the spectral energy of the current speech signal frame
  • S_last(i) represents the spectral energy of the previous SID frame of the current frame
  • N represents the spectral length
  • if the ratio is greater than THR3 or less than 1/THR3, where THR3 is a real number greater than 1, the signal energy has changed significantly and the next decision step is carried out. Otherwise, no SID frame needs to be sent.
  • the spectral energy difference is used for the decision.
  • After a SID frame is sent, the device stores the spectral energy information of that SID frame in the SID frame storage unit; that is, the storage unit always holds the spectral energy information of the most recently transmitted SID frame.
  • the difference between the spectral energy of the current speech signal frame and the spectral energy of the last mute insertion description frame is calculated according to the following formula:
  • $R_3 = \sum_{i} S(i) \cdot S(i) - \sum_{i} S_{last}(i) \cdot S_{last}(i)$
  • S(i) represents the spectral energy of the current speech signal frame
  • S_last(i) represents the spectral energy of the previous SID frame of the current frame
  • N represents the spectral length
  • if the absolute value of the difference R_3 is greater than the threshold THR5, the signal energy has changed significantly: a SID frame needs to be sent, and the information in the SID frame storage unit is updated at the same time.
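The difference-based decision, together with the update of the SID frame storage unit, can be sketched as a small stateful class. The sketch assumes that R_3 is the difference between the summed squared spectral values of the two frames, which is a reading of a damaged formula and may not match the patent exactly. The class and attribute names are hypothetical, and the THR1 energy gate is omitted for brevity.

```python
class SidDecider:
    """Keeps the spectrum of the last SID frame and decides on updates."""

    def __init__(self, thr5, initial_spectrum):
        self.thr5 = thr5
        # Plays the role of the SID frame storage unit.
        self.last_sid_spectrum = list(initial_spectrum)

    def process(self, spectrum):
        """Return True, and refresh the stored spectrum, if a SID frame is due."""
        # Assumed form of R_3: difference of summed squared spectral values.
        r3 = (sum(a * a for a in spectrum)
              - sum(b * b for b in self.last_sid_spectrum))
        if abs(r3) > self.thr5:
            self.last_sid_spectrum = list(spectrum)  # update the storage unit
            return True
        return False
```

Storing the spectrum only when a SID frame is actually sent means every later frame is compared with what the decoder last received, not with the immediately preceding frame.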
  • a hangover algorithm may be added to preserve the sound quality at the end of speech and to allow initialization of the CNG algorithm to complete. That is, when a silence frame is detected after consecutive speech frames, the discontinuous transmission mode is not entered directly; instead, the first few silence frames continue to be processed as speech frames, and only then is the discontinuous transmission mode entered. For example, when the first silence frame is detected after a speech frame, the first 7 silence frames continue to be processed in speech frame mode. Then, if the detected frame is still a silence frame, the SID_FIRST frame is transmitted, SID_UPDATE is transmitted in the third frame after SID_FIRST, and subsequent SID frames are sent according to the decision algorithm described above.
  • the hangover algorithm includes counting the consecutive speech frames.
  • when the condition on this count is met, the hangover phase is set up according to the hangover algorithm above; otherwise, SID_UPDATE is sent directly and the automatic detection state is entered, and the count of consecutive speech frames is cleared.
  • a maximum SID interval threshold may also be set.
  • when the interval since the last SID frame exceeds this threshold, a SID update is forced, which ensures the stability of the system and reduces the adverse effects of abnormal conditions such as SID frame loss.
  • a minimum SID interval threshold value may also be set.
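The two interval thresholds can be layered on top of any of the spectral decisions, as in the sketch below. The forced update at the maximum interval follows the text; suppressing updates that arrive sooner than the minimum interval is an assumption, since the text only says that such a threshold may be set. All names are hypothetical.

```python
def apply_interval_limits(wants_sid, frames_since_sid, min_interval, max_interval):
    """Combine the spectral decision with minimum and maximum SID intervals.

    wants_sid: result of the spectral decision for the current frame.
    frames_since_sid: number of frames elapsed since the last SID frame.
    """
    if frames_since_sid >= max_interval:
        return True   # forced update, e.g. to recover from a lost SID frame
    if frames_since_sid < min_interval:
        return False  # assumed behaviour: too soon after the last SID frame
    return wants_sid
```

Between the two limits the spectral decision is passed through unchanged, so the limits only affect the extreme cases.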
  • the solution can be used for real-time two-way communication, such as wireless, IP conferencing, television, and other areas of voice transmission, to effectively save bandwidth resources and improve network usage efficiency without substantially affecting sound quality.
  • the scheme has low computational complexity and tracks changes in the signal spectrum accurately: it follows fast-changing noise effectively, saves bandwidth effectively when the noise is stationary, and is independent of any specific speech or audio encoder, which makes it flexible and efficient.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

A method and an apparatus for performing voice adaptive discontinuous transmission. The method comprises: when voice adaptive discontinuous transmission is performed, determining whether to send a silence insertion descriptor frame according to spectrum information of the current voice signal frame and spectrum information of the previous silence insertion descriptor frame. This scheme may overcome the disadvantages of the related art, in which the fixed-interval manner cannot flexibly track signal changes and the variable-interval manner requires multi-parameter calculation, such as linear prediction, that results in high computational complexity. The scheme operates directly in the frequency domain and tracks signal changes well, thereby ensuring sound quality while maintaining a low average code rate.

Description

一种进行语音自适应非连续传输的方法及装置  Method and device for performing speech adaptive discontinuous transmission
技术领域 Technical field
本发明涉及数字信号处理领域, 尤其涉及一种进行语音自适应非连续传 输( Discontinuous Transmission, 简称 DTX ) 的方法及装置。  The present invention relates to the field of digital signal processing, and in particular, to a method and apparatus for performing speech adaptive discontinuous transmission (DTX).
背景技术 Background technique
在实际用户通信过程中, 一般情况下, 较少时间用于传送用户话音, 较 多时间用于传送非话音的背景音。 如果按照对语音信号的编码方式对通信过 程进行全程编码, 会造成很大的资源浪费。 相关技术中为了减少这种浪费, 发送端利用语音激活检测 ( Voice Activity Detector, 简称 VAD )算法进行信 号检测, 检测到通话中的静音(inactive )段时, 在静音段中使用较低码率对 信号的重要信息进行编码, 即将信号编码成为静音插入描述( Silence Insertion Descriptor, 简称 SID ) 帧, 并且釆用不连续方式发送 SID帧。 解码端根据接 收到的 SID帧以舒适噪声产生 ( Comfort Noise Generation , 简称 CNG ) 的方 式进行解码。 这样, 在对音质影响不大的基础上, 大大减少平均码率, 节省 资源, 这无疑对于有效地使用日益紧张的网络带宽资源具有积极地意义。 因 此, 在静音段釆用什么样的策略以及多大间隔来发送 SID帧, 也就决定了节 省带宽的多少。  In the actual user communication process, in general, less time is used to transmit the user's voice, and more time is used to transmit the non-voice background sound. If the communication process is fully encoded according to the encoding method of the voice signal, a great waste of resources is caused. In the related art, in order to reduce such waste, the sender uses the Voice Activity Detector (VAD) algorithm for signal detection, and when detecting the inactive segment of the call, the lower code rate pair is used in the silence segment. The important information of the signal is encoded, that is, the signal is coded into a Silence Insertion Descriptor (SID) frame, and the SID frame is transmitted in a discontinuous manner. The decoding end decodes according to the received SID frame in the form of Comfort Noise Generation (CNG). In this way, on the basis of little influence on the sound quality, the average code rate is greatly reduced, and resources are saved, which is undoubtedly positive for effectively using increasingly tight network bandwidth resources. Therefore, what kind of strategy and how much interval to use to send SID frames in the silent segment determines how much bandwidth is saved.
At present, SID frames are sent in voice-adaptive discontinuous transmission in two main ways: (1) at a fixed interval; (2) at a variable interval.
With the fixed-interval scheme, a preset parameter is used and one SID frame is sent every fixed number of frames during a silent segment. The 3GPP AMR and AMR-WB speech coding standards, for example, use this method and send one SID frame every 8 frames. The method is simple to compute and easy to implement, but it cannot adjust the bit rate automatically according to the characteristics of the signal.
In the SID frame transmission mechanism of Adaptive Multi-Rate (AMR), when the sender detects a silent frame after speech frames, it does not enter the silent segment immediately but applies a hangover mechanism. During the hangover period, frames are still encoded as normal speech. If silent frames are still detected after the hangover period, a SID_FIRST frame (the first SID frame) is sent at the first silent-frame position following the hangover period, a SID update (SID_UPDATE) frame is sent at the third silent-frame position, and thereafter one SID update frame is sent at a fixed interval of 7 frames, so that the SID parameters are refreshed at a fixed low bit rate after the hangover period. In another implementation, when a silent frame is detected after N consecutive speech frames and N is less than 34, the hangover period is skipped and SID update frames are sent directly. This method is simple to compute: it needs only a counter and no additional parameter calculation, the bit rate is controllable, and the algorithm is stable. Its drawback is that the fixed interval fixes the bit rate: the same rate is used for every kind of noise and cannot be adjusted as the noise signal changes. For white noise, for example, the parameters are very stable, yet SID frames are still sent frequently, so the bit rate is not reduced effectively. For rapidly changing noise, on the other hand, the signal changes cannot be tracked in time, and the delayed information causes heavy distortion of the noise signal when the decoder reconstructs it by CNG.
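The fixed schedule described above can be sketched with nothing more than a frame counter. This is a minimal illustration, not the AMR reference implementation: the function name and frame-type strings are hypothetical, index 0 is taken to be the first silent frame after the hangover period, and "an interval of 7 frames" is read as one update every 8th frame, in line with the "every 8 frames" figure given earlier.

```python
def fixed_interval_sid_schedule(num_silent_frames, update_period=8):
    """Return (frame_index, frame_type) pairs for one run of silent frames.

    Index 0 is assumed to be the first silent frame after the hangover
    period. The schedule depends only on the frame counter, never on the
    signal, which is exactly the limitation the text points out.
    """
    schedule = []
    for idx in range(num_silent_frames):
        if idx == 0:
            # First silent frame position: SID_FIRST.
            schedule.append((idx, "SID_FIRST"))
        elif idx >= 2 and (idx - 2) % update_period == 0:
            # Third silent frame position, then every update_period frames.
            schedule.append((idx, "SID_UPDATE"))
    return schedule


if __name__ == "__main__":
    for entry in fixed_interval_sid_schedule(20):
        print(entry)
```

Because the result is the same for stationary white noise and for rapidly changing noise, the sketch makes the trade-off concrete: the rate is predictable, but it cannot follow the signal.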
With the variable-interval scheme, an algorithm evaluates the signal of the silent segment in real time and decides, from the real-time changes in the signal, whether a SID frame needs to be sent. The method is flexible: it follows the signal as it changes, saves as much bandwidth as possible, and allows the average bit rate to be adjusted. Its drawback is that the computation is relatively complex.
The ITU-T G.729 speech coder uses variable-interval transmission. It computes parameters of the signal, such as LPC parameters, to measure whether the signal has changed significantly and thus whether an update is needed. Although this method tracks the signal adaptively, its computational complexity is high. The method is based on linear prediction. Linear predictive coding (LPC) is first applied to the signal to obtain the linear prediction parameters a and the residual energy E. A mathematical representation of these coefficients is then compared with the same parameters of the last transmitted SID frame held in memory. If the comparison result for either the LPC envelope or the energy exceeds a given threshold, the signal is considered to have changed and a SID update frame is sent; otherwise nothing is sent. Because the method works in the time domain, LPC analysis of the signal must be performed first, which is computationally expensive. How faithfully the LPC coefficients reflect the signal spectrum depends on the LPC order, and computational complexity grows with the LPC order. In addition, testing the residual energy or the LPC envelope separately makes it hard to reflect the change in the signal as a whole. For example, if the LPC describes the current frame inaccurately, the residual energy of the signal changes considerably as a direct result.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for voice-adaptive discontinuous transmission that overcome two drawbacks of the related art: the fixed-interval scheme cannot track signal changes flexibly, and the variable-interval scheme requires computing multiple parameters such as linear prediction, which makes it computationally complex.
To solve the above technical problem, an embodiment of the present invention provides a method for voice-adaptive discontinuous transmission, comprising:
during voice-adaptive discontinuous transmission, deciding whether to send a silence insertion descriptor (SID) frame according to the spectral information of the current speech signal frame and of the previous SID frame.
The spectral information of the speech signal frame is spectral information computed from the frequency-domain signal of the speech signal frame, or spectral information computed from that frequency-domain signal after it has been smoothed.
The step of deciding whether to send a SID frame according to the spectral information of the current speech signal frame and of the previous SID frame comprises:
sending a SID frame when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than a single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous SID frame is greater than a first preset limit.
The step of deciding whether to send a SID frame according to the spectral information of the current speech signal frame and of the previous SID frame may alternatively comprise:
when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous SID frame is greater than the first preset limit, further judging whether that gap is greater than a second preset limit and, if so, sending two SID frames in succession, wherein the spectral energy gap corresponding to the second preset limit is larger than the spectral energy gap corresponding to the first preset limit.
The gap between the spectral energy of the speech signal frame and the spectral energy of the previous SID frame being greater than a preset limit means that:
the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous SID frame is greater than the ratio threshold corresponding to the preset limit, or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1;
or
the absolute value of the difference between the spectral energy of the speech signal frame and the spectral energy of the previous SID frame is greater than a difference threshold.
The step of deciding whether to send a SID frame according to the spectral information of the current speech signal frame and of the previous SID frame may also comprise:
when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, computing the spectral correlation value between the spectral energy of the speech signal frame and that of the previous SID frame, and sending a SID frame when the computed spectral correlation value is less than a spectral correlation threshold.
To solve the above technical problem, an embodiment of the present invention further provides an apparatus for voice-adaptive discontinuous transmission, comprising a SID frame processing unit and a SID frame storage unit, wherein:
the SID frame processing unit is configured to decide whether to send a SID frame according to the spectral information of the current speech signal frame and of the previous SID frame;
the SID frame storage unit is configured to store the spectral information of a SID frame after the SID frame processing unit has sent that SID frame.
The SID frame processing unit is further configured to smooth the frequency-domain signal of the speech signal frame and to compute the spectral information of the speech signal frame from the smoothed frequency-domain signal;
the SID frame storage unit is further configured to store the smoothed frequency-domain signal.
The SID frame processing unit is configured to decide whether to send a SID frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous SID frame is greater than the first preset limit, it sends a SID frame; or, under the same two conditions, it further judges whether the gap is greater than the second preset limit and, if so, sends two SID frames in succession, wherein the spectral energy gap corresponding to the second preset limit is larger than the spectral energy gap corresponding to the first preset limit;
wherein the gap between the spectral energy of the speech signal frame and that of the previous SID frame being greater than a preset limit means that: the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous SID frame is greater than the ratio threshold corresponding to the preset limit or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1; or the absolute value of the difference between the two spectral energies is greater than a difference threshold.
Alternatively, the SID frame processing unit is configured to decide whether to send a SID frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, it computes the spectral correlation value between the spectral energy of the speech signal frame and that of the previous SID frame, and sends a SID frame when the computed spectral correlation value is less than the spectral correlation threshold.
This solution overcomes the drawbacks of the related art, in which the fixed-interval scheme cannot track signal changes flexibly and the variable-interval scheme requires computing multiple parameters such as linear prediction and is therefore computationally complex. The solution operates directly in the frequency domain, tracks signal changes well, and preserves sound quality while keeping the average bit rate low.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of an apparatus for voice-adaptive discontinuous transmission;
Fig. 2 is a schematic diagram of another structure of the apparatus for voice-adaptive discontinuous transmission;
Fig. 3 is a flowchart of voice-adaptive discontinuous transmission in specific embodiment 2;
Fig. 4 is a flowchart of voice-adaptive discontinuous transmission in specific embodiment 3.
Preferred embodiments of the invention
As shown in Fig. 1, the apparatus for voice-adaptive discontinuous transmission comprises a SID frame processing unit and a SID frame storage unit.
The SID frame processing unit is configured to decide whether to send a SID frame according to the spectral information of the current speech signal frame and of the previous SID frame.
The SID frame storage unit is configured to store the spectral information of a SID frame after the apparatus has sent that SID frame.
In implementation 1, the SID frame processing unit is configured to decide whether to send a SID frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous SID frame is greater than the first preset limit, it sends a SID frame.
The SID frame processing unit may also be configured to decide as follows: under the same two conditions, it further judges whether the gap between the spectral energy of the speech signal frame and that of the previous SID frame is greater than the second preset limit and, if so, sends two SID frames in succession, wherein the spectral energy gap corresponding to the second preset limit is larger than the spectral energy gap corresponding to the first preset limit.
Here, the gap between the spectral energy of the speech signal frame and that of the previous SID frame being greater than a preset limit means that:
the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous SID frame is greater than the ratio threshold corresponding to the preset limit or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1; or the absolute value of the difference between the two spectral energies is greater than a difference threshold.
In implementation 2, the SID frame processing unit is configured to decide whether to send a SID frame as follows: when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, it computes the spectral correlation value of the two from the spectral energy of the current speech signal frame and of the previous SID frame, and sends a SID frame when the spectral correlation value is less than the spectral correlation threshold.
In implementation 3, the SID frame processing unit is configured to decide whether to send a SID frame from both the spectral energy gap and the spectral correlation value of the two frames.
As shown in Fig. 2, the apparatus may further comprise a smoothing filter unit. The smoothing filter unit is configured to smooth the frequency-domain signal of the speech signal and pass the result to the SID frame processing unit. The SID frame processing unit then performs the processing described above on the smoothed frequency-domain signal, and the SID frame storage unit must also store the smoothed frequency-domain signal.
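One way to picture the smoothing filter unit is as a short symmetric kernel slid across the frequency bins of one frame. The sketch below is illustrative only: the function name is hypothetical, the default weights are the example coefficients [0.15, 0.15, 0.4, 0.15, 0.15] that this description gives for the smoothing filter, and clamping the index at the spectrum edges is an assumption, since the description does not say how edges are handled.

```python
def smooth_spectrum(spectrum, coeffs=(0.15, 0.15, 0.4, 0.15, 0.15)):
    """Smooth one frame's spectrum with a symmetric FIR kernel.

    spectrum: sequence of per-bin spectral values for one frame.
    coeffs:   odd-length kernel centred on the current bin.
    Edge handling (repeating the first/last bin) is an assumption.
    """
    half = len(coeffs) // 2
    n = len(spectrum)
    smoothed = []
    for k in range(n):
        acc = 0.0
        for j, c in enumerate(coeffs):
            # Neighbour index, clamped so it stays inside the spectrum.
            idx = min(max(k + j - half, 0), n - 1)
            acc += c * spectrum[idx]
        smoothed.append(acc)
    return smoothed


if __name__ == "__main__":
    spike = [0.0] * 11
    spike[5] = 1.0  # an isolated spectral spike
    print(smooth_spectrum(spike))
```

Because the weights sum to 1, a flat spectrum passes through unchanged while an isolated spike is spread over its neighbours, which is the behaviour the comparison of spectral envelopes relies on.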
The method for voice-adaptive discontinuous transmission comprises: during voice-adaptive discontinuous transmission, deciding whether to send a SID frame according to the spectral information of the current speech signal frame and of the previous SID frame.
The spectral information of the speech signal frame is spectral information computed from the frequency-domain signal of the speech signal frame, or spectral information computed from that frequency-domain signal after it has been smoothed.
Smoothing serves mainly to make the comparison of signal spectra more accurate: it reduces the influence of fine spectral detail on the overall comparison, removes spectral spikes and glitches, and yields a smoother output spectrum and a more stable spectral envelope. The smoothing can be implemented with a smoothing filter. Take 16 kHz sampling and a 20 ms frame length as an example. The time-domain signal is transformed to the frequency domain by a fast Fourier transform (FFT) to obtain the spectral parameters of the current frame; the FFT length is 320 points. The following smoothing filter can be used:
$$H(z) = a_0 z^{-2} + a_1 z^{-1} + a_2 + a_3 z + a_4 z^{2}$$
where the coefficients $[a_0, a_1, a_2, a_3, a_4]$ are the smoothing coefficients, which may take the values [0.15, 0.15, 0.4, 0.15, 0.15]. After smoothing, the trend of the spectral lines is unchanged but abrupt local variations are reduced, which makes changes in the spectral envelope of the signal easier to observe. Spectral smoothing includes, but is not limited to, the filter described above. Different effects can also be obtained by adjusting the coefficients or the order of the filter.
In implementation 1, when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous SID frame is greater than the first preset limit, a SID frame is sent.
Alternatively, when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, and the gap between the spectral energy of the speech signal frame and that of the previous SID frame is greater than the first preset limit, it is further judged whether that gap is greater than the second preset limit; if so, two SID frames are sent in succession, wherein the spectral energy gap corresponding to the second preset limit is larger than the spectral energy gap corresponding to the first preset limit.
Here, the gap between the spectral energy of the speech signal frame and that of the previous SID frame being greater than a preset limit means that: the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous SID frame is greater than the ratio threshold corresponding to the preset limit or less than the reciprocal of that ratio threshold, where the ratio threshold is a real number greater than 1; or the absolute value of the difference between the two spectral energies is greater than a difference threshold.
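The two-limit decision of implementation 1 can be sketched as a small function that returns how many SID frames to send. This is an illustration under stated assumptions: it uses the absolute-difference form of the gap, each frame's spectral energy is passed in as a single number, and the function name and threshold values are hypothetical, since the description does not fix concrete thresholds.

```python
def sid_frames_to_send(e_cur, e_last, frame_energy_threshold,
                       first_diff_threshold, second_diff_threshold):
    """Return 0, 1 or 2: the number of SID frames to send for this frame.

    e_cur:  spectral energy of the current speech signal frame.
    e_last: spectral energy of the previously sent SID frame.
    The gap is measured as an absolute difference (one of the two forms
    the text allows); all threshold values are illustrative assumptions.
    """
    if second_diff_threshold <= first_diff_threshold:
        raise ValueError("the second limit must correspond to a larger gap")

    # Gate: at least one of the two energies must exceed the
    # single-frame energy threshold, otherwise nothing is sent.
    if max(abs(e_cur), abs(e_last)) <= frame_energy_threshold:
        return 0

    gap = abs(e_cur - e_last)
    if gap <= first_diff_threshold:
        return 0  # change too small: keep the previous SID parameters
    if gap > second_diff_threshold:
        return 2  # large change: send two SID frames in succession
    return 1      # moderate change: send one SID frame


if __name__ == "__main__":
    print(sid_frames_to_send(200.0, 100.0, 10.0, 50.0, 200.0))
    print(sid_frames_to_send(400.0, 100.0, 10.0, 50.0, 200.0))
```

The ratio form of the gap test fits the same structure: only the computation of the gap and the two comparisons change.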
In implementation 2, when the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold, the spectral correlation value of the two is computed from the spectral energy of the current speech signal frame and of the previous SID frame, and a SID frame is sent when the spectral correlation value is less than the spectral correlation threshold.
In implementation 3, whether to send a SID frame may be decided from both the spectral energy gap and the spectral correlation value of the two frames.
Specific embodiments are described in detail below.
Specific embodiment 1
In this embodiment, the spectral correlation value is used for the decision.
Each time the apparatus sends a SID frame, it stores the spectral energy information of that SID frame in the SID frame storage unit; that is, the SID frame storage unit holds the spectral energy information of the most recently sent SID frame.
When deciding whether to send a SID frame, it is first judged whether at least one of the absolute value of the spectral energy of the current speech signal frame and the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold (THR1). If this condition is not met, the signal is considered to remain at low energy and no SID frame needs to be sent. If the condition is met, the correlation value between the spectral energy of the current speech signal frame and that of the previous SID frame is computed according to the following formula:
[Equation image imgf000011_0001 in the original publication: spectral correlation value computed from S(i) and S_last(i)]
where S(i) is the spectral energy of the current speech signal frame, S_last(i) is the spectral energy of the SID frame preceding the current frame, and N is the spectrum length, taken as 320 in this embodiment.
If the absolute value of the spectral correlation value given by the above formula is less than the spectral correlation threshold (THR2), it is determined that a SID frame needs to be sent, and the information in the SID frame storage unit is updated at the same time.
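The correlation-based decision can be sketched as follows. The exact correlation formula is an image in the original publication and is not reproduced in the text, so this sketch assumes a normalized cross-correlation of the two spectra; it also assumes that a frame's spectral energy is the sum of its bins. Both choices, the function names, and the threshold values are illustrative assumptions rather than the patent's definitions.

```python
import math


def spectral_correlation(s_cur, s_last):
    """Normalized cross-correlation of two spectra (assumed form).

    Returns a value in [-1, 1]; 0.0 is returned if either spectrum is
    all zeros, to avoid dividing by zero.
    """
    num = sum(a * b for a, b in zip(s_cur, s_last))
    den = math.sqrt(sum(a * a for a in s_cur) * sum(b * b for b in s_last))
    return num / den if den > 0.0 else 0.0


def should_send_sid_by_correlation(s_cur, s_last, thr1, thr2):
    """Decide whether to send a SID frame from spectral correlation.

    thr1: single-frame energy threshold (THR1 in the text).
    thr2: spectral correlation threshold (THR2 in the text).
    """
    # Frame energy taken as the sum over bins (an assumption).
    e_cur, e_last = sum(s_cur), sum(s_last)
    if max(abs(e_cur), abs(e_last)) <= thr1:
        return False  # signal stays at low energy: no SID frame needed
    # A low correlation means the spectral shape has changed.
    return abs(spectral_correlation(s_cur, s_last)) < thr2


if __name__ == "__main__":
    same = [1.0, 2.0, 3.0]
    print(should_send_sid_by_correlation(same, same, 0.5, 0.9))
```

When the function returns True, the caller would send the SID frame and overwrite the stored spectrum with the current one, mirroring the storage-unit update described above.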
Specific embodiment 2
In this embodiment, the ratio of spectral energies is used for the decision.
Each time the apparatus sends a SID frame, it stores the spectral energy information of that SID frame in the SID frame storage unit; that is, the SID frame storage unit holds the spectral energy information of the most recently sent SID frame.
As shown in Fig. 3, when deciding whether to send a SID frame, it is first judged whether at least one of the absolute value of the spectral energy of the current speech signal frame and the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold. If this condition is not met, the signal is considered to remain at low energy and no SID frame needs to be sent. If the condition is met, the ratio of the spectral energy of the current speech signal frame to that of the previous SID frame is computed according to the following formula:
[Equation image imgf000012_0001 in the original publication: spectral energy ratio R_2 computed from S(i) and S_last(i)]
where S(i) denotes the spectral energy of the current speech signal frame, S_last(i) denotes the spectral energy of the SID frame preceding the current frame, and N denotes the spectrum length.
If the ratio R2 is greater than the threshold THR3 or less than the reciprocal of THR3, where THR3 is a real number greater than 1, the signal energy has changed significantly and a SID frame needs to be sent. Otherwise, no SID frame needs to be sent.
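A minimal sketch of this ratio test follows. The formula for R2 is an image that is not reproduced in this text, so the sketch assumes, by analogy with the difference formula of Specific Embodiment 4, that R2 is the sum of S(i) * S(i) divided by the sum of S_last(i) * S_last(i). The function names, the parameter name `thr_energy`, and the handling of a zero stored energy are likewise assumptions.

```python
def frame_energy(spectrum):
    """Energy of one frame, taken as the sum of S(i) * S(i)."""
    return sum(s * s for s in spectrum)


def should_send_sid_by_ratio(cur, last, thr_energy, thr3):
    """Return True when the energy ratio R2 indicates a large change."""
    if thr3 <= 1.0:
        raise ValueError("THR3 must be a real number greater than 1")
    e_cur, e_last = frame_energy(cur), frame_energy(last)
    # Energy gate: both frames quiet means no SID frame is needed.
    if max(e_cur, e_last) <= thr_energy:
        return False
    # Assumption: a zero stored energy with a loud current frame is
    # treated as a large change rather than a division error.
    if e_last == 0.0:
        return True
    r2 = e_cur / e_last
    # Symmetric test: energy rose by more than THR3 or fell below 1/THR3.
    return r2 > thr3 or r2 < 1.0 / thr3
```

Testing against both THR3 and its reciprocal makes the decision symmetric, so a rise and a fall in energy of the same proportion are treated alike.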
Specific Embodiment 3
In this embodiment, the decision is based on the ratio of spectral energies.
Each time the apparatus sends a SID frame, it stores the spectral energy information of that SID frame in the SID frame storage unit. In other words, the silence insertion description (SID) frame storage unit holds the spectral energy information of the most recently sent SID frame.
As shown in FIG. 4, when deciding whether to send a SID frame, the apparatus first checks whether at least one of the absolute value of the spectral energy of the current speech signal frame and the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold. If this condition is not met, the signal is considered to remain at low energy and no SID frame needs to be sent. If the condition is met, the ratio of the spectral energy of the current speech signal frame to the spectral energy of the previous SID frame is calculated according to the following formula:
Figure imgf000012_0002
where S(i) denotes the spectral energy of the current speech signal frame, S_last(i) denotes the spectral energy of the SID frame preceding the current frame, and N denotes the spectrum length.
If the ratio R2 is greater than the threshold THR3 or less than the reciprocal of THR3, where THR3 is a real number greater than 1, the signal energy has changed significantly and the next check is performed. Otherwise, no SID frame needs to be sent.
In the next check, if the ratio R2 is greater than the threshold THR4 or less than the reciprocal of THR4 (THR4 is a real number greater than THR3), the signal energy has suddenly changed by a very large amount (for example, very high-energy noise suddenly appears during silence). In that case a continuous-update flag is set and two SID frames are forcibly sent in succession. If this condition is not met, only one SID frame needs to be sent.
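The two-level decision can be sketched as a function that returns how many SID frames to send. This is a sketch under stated assumptions: the frame energies are passed in already computed, the function name and the parameter name `thr_energy` are hypothetical, and a zero stored energy is treated as an infinitely large ratio.

```python
import math


def sid_frames_to_send(e_cur, e_last, thr_energy, thr3, thr4):
    """Return 0, 1 or 2: the number of SID frames to send for this frame.

    e_cur  -- spectral energy of the current speech signal frame
    e_last -- stored spectral energy of the most recently sent SID frame
    """
    if not (thr4 > thr3 > 1.0):
        raise ValueError("expected THR4 > THR3 > 1")
    # Energy gate: both frames quiet means nothing to send.
    if max(e_cur, e_last) <= thr_energy:
        return 0
    # Assumption: zero stored energy counts as an unbounded ratio.
    r2 = e_cur / e_last if e_last > 0.0 else math.inf
    # First level: has the energy changed significantly at all?
    if not (r2 > thr3 or r2 < 1.0 / thr3):
        return 0
    # Second level: a very large jump forces two consecutive SID frames.
    if r2 > thr4 or r2 < 1.0 / thr4:
        return 2
    return 1
```

For example, with THR3 = 2 and THR4 = 8, a ratio of 3 yields one SID frame and a ratio of 10 yields two.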
Specific Embodiment 4
In this embodiment, the decision is based on the difference in spectral energy.
Each time the apparatus sends a SID frame, it stores the spectral energy information of that SID frame in the SID frame storage unit. In other words, the silence insertion description (SID) frame storage unit holds the spectral energy information of the most recently sent SID frame.
When deciding whether to send a SID frame, the apparatus first checks whether at least one of the absolute value of the spectral energy of the current speech signal frame and the absolute value of the spectral energy of the previous SID frame is greater than the single-frame energy threshold. If this condition is not met, the signal is considered to remain at low energy and no SID frame needs to be sent. If the condition is met, the difference between the spectral energy of the current speech signal frame and the spectral energy of the previous SID frame is calculated according to the following formula:
$$R_3 = \sum_{i=0}^{N-1} S(i) \cdot S(i) - \sum_{i=0}^{N-1} S_{last}(i) \cdot S_{last}(i)$$
where S(i) denotes the spectral energy of the current speech signal frame, S_last(i) denotes the spectral energy of the SID frame preceding the current frame, and N denotes the spectrum length.
If the absolute value of the difference R3 is greater than the threshold THR5, the signal energy has changed significantly, so a SID frame needs to be sent and the information in the SID frame storage unit is updated at the same time.
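The difference test, together with the update of the stored SID spectrum, can be sketched as below. The function names and the parameter name `thr_energy` are hypothetical; R3 and THR5 follow the description above.

```python
def energy_difference(cur, last):
    """R3: sum of S(i) * S(i) minus sum of S_last(i) * S_last(i)."""
    return sum(s * s for s in cur) - sum(s * s for s in last)


def should_send_sid_by_difference(cur, last, thr_energy, thr5):
    """Return True when |R3| exceeds THR5 and the energy gate is passed."""
    e_cur = sum(s * s for s in cur)
    e_last = sum(s * s for s in last)
    # Energy gate: both frames quiet means no SID frame is needed.
    if max(e_cur, e_last) <= thr_energy:
        return False
    return abs(energy_difference(cur, last)) > thr5


def process_frame(cur, stored, thr_energy, thr5):
    """Return (send, new_stored); the store changes only when a SID is sent."""
    if should_send_sid_by_difference(cur, stored, thr_energy, thr5):
        return True, list(cur)
    return False, stored
```

Returning the new stored spectrum from `process_frame` keeps the rule explicit: the storage unit always holds the spectrum of the last SID frame that was actually sent, not of the last frame examined.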
In the above scheme and specific embodiments, a hangover algorithm may be added to preserve sound quality at the end of speech and to allow the comfort noise generation (CNG) algorithm to finish initializing. That is, when a silence frame is detected after consecutive speech frames, the encoder does not enter discontinuous transmission mode immediately. Instead, the first few silence frames continue to be processed as speech frames, and only afterwards does the encoder enter discontinuous transmission mode. For example, when the first silence frame is detected after speech frames, the first 7 silence frames continue to be processed as speech frames. If the frames detected after that are still silence frames, a SID_FIRST frame is sent, a SID_UPDATE is sent in the third frame after SID_FIRST, and from then on the decision algorithm described above determines whether to send a SID frame. The hangover algorithm also counts consecutive speech frames. When the first silence frame is detected, if the count of consecutive speech frames is greater than the configured hangover threshold (thr_hangover), the hangover period is set up as described above. Otherwise, SID_UPDATE is sent directly and the automatic detection state is entered. The count of consecutive speech frames is reset to zero at the same time.
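The hangover behaviour described above can be sketched as a small per-frame state machine. The class name, the action labels "NO_DATA" (nothing sent) and "DECIDE" (hand over to the SID decision algorithm), and the choice to reset the speech-frame counter on the first silence frame in both branches are assumptions made for illustration. The 7-frame hangover and the 3-frame offset come from the example in the text.

```python
HANGOVER_FRAMES = 7    # silence frames still coded as speech (from the example)
SID_UPDATE_OFFSET = 3  # SID_UPDATE is sent in the third frame after SID_FIRST


class HangoverDtx:
    """Per-frame hangover logic driven by a VAD decision."""

    def __init__(self, thr_hangover):
        self.thr_hangover = thr_hangover
        self.speech_run = 0       # consecutive speech frames seen so far
        self.silence_run = 0      # silence frames since the last speech frame
        self.use_hangover = False

    def step(self, is_speech):
        """Return the action for one frame."""
        if is_speech:
            self.speech_run += 1
            self.silence_run = 0
            return "SPEECH"
        self.silence_run += 1
        if self.silence_run == 1:
            # First silence frame: choose the branch, then clear the counter.
            self.use_hangover = self.speech_run > self.thr_hangover
            self.speech_run = 0
            if not self.use_hangover:
                return "SID_UPDATE"
        if not self.use_hangover:
            return "DECIDE"
        if self.silence_run <= HANGOVER_FRAMES:
            return "SPEECH"  # hangover: keep coding as speech
        first = HANGOVER_FRAMES + 1
        if self.silence_run == first:
            return "SID_FIRST"
        if self.silence_run < first + SID_UPDATE_OFFSET:
            return "NO_DATA"
        if self.silence_run == first + SID_UPDATE_OFFSET:
            return "SID_UPDATE"
        return "DECIDE"
```

A short speech burst (no longer than thr_hangover frames) skips the hangover entirely, which avoids spending seven full-rate frames after every brief noise that the VAD classifies as speech.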
In the above scheme and specific embodiments, a maximum SID interval threshold may also be set. When the decision is made for the current frame, if the interval between the current frame and the previous SID frame exceeds this maximum SID interval threshold, one SID frame is forcibly sent as an update. This keeps the system stable and reduces the adverse effects of abnormal conditions such as SID frame loss.
In the above scheme and specific embodiments, a minimum SID interval threshold may also be set. When the decision is made for the current frame, if the interval between the current frame and the previous SID frame is less than this minimum SID interval threshold, it is determined that no SID frame is sent and no update is made for the time being, which reduces frequent transmission of SID frames.
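The two interval limits can be layered on top of any of the spectral decisions as a simple wrapper. This is a sketch only: the function and parameter names are hypothetical, and it assumes the maximum interval is larger than the minimum interval so that the two rules never conflict.

```python
def apply_sid_interval_limits(wants_sid, frames_since_last_sid,
                              min_interval, max_interval):
    """Adjust a spectral SID decision using the interval thresholds.

    wants_sid             -- result of the spectral decision for this frame
    frames_since_last_sid -- frames elapsed since the last SID frame was sent
    """
    if min_interval >= max_interval:
        raise ValueError("expected min_interval < max_interval")
    # Too long since the last SID frame: force an update.
    if frames_since_last_sid > max_interval:
        return True
    # Too soon after the last SID frame: suppress the update for now.
    if frames_since_last_sid < min_interval:
        return False
    # Otherwise keep the spectral decision unchanged.
    return wants_sid
```

For example, with a minimum interval of 3 and a maximum of 50 frames, a spectral decision of True is suppressed 2 frames after the last SID frame, and a decision of False is overridden 60 frames after it.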
This scheme can be used for discontinuous transmission of speech in real-time two-way communication such as wireless communication and IP video conferencing. It effectively saves bandwidth and improves network utilization with essentially no impact on sound quality. The scheme has low computational complexity and tracks changes in the signal spectrum accurately. It tracks effectively when the noise changes quickly, saves bandwidth effectively when the noise is stationary, and does not depend on any specific speech or audio encoder, which makes it flexible and efficient.
It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
Of course, the invention may have various other embodiments. Corresponding changes and variations may be made without departing from the spirit and essence of the invention, but all such changes and variations shall fall within the scope of protection of the claims appended to the invention.
A person of ordinary skill in the art will understand that all or some of the steps of the above method may be carried out by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in hardware or as a software function module. The invention is not limited to any specific combination of hardware and software.
Industrial Applicability
This scheme overcomes two drawbacks of the related art: the fixed-interval approach cannot flexibly track signal changes, and the variable-interval approach requires computing multiple parameters such as linear prediction, which results in high computational complexity. The scheme operates directly in the frequency domain, tracks signal changes well, and preserves sound quality while maintaining a low average bit rate.

Claims

1. A method for performing speech adaptive discontinuous transmission, comprising:
during speech adaptive discontinuous transmission, deciding whether to send a silence insertion description (SID) frame according to the spectral information of the current speech signal frame and of the previous silence insertion description frame.
2. The method according to claim 1, wherein
the spectral information of the speech signal frame is spectral information calculated from the frequency-domain signal of the speech signal frame, or spectral information calculated from the smoothed frequency-domain signal obtained by smoothing the frequency-domain signal of the speech signal frame.
3. The method according to claim 2, wherein the step of deciding whether to send a silence insertion description frame according to the spectral information of the current speech signal frame and of the previous silence insertion description frame comprises:
when it is determined that the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous silence insertion description frame is greater than a single-frame energy threshold, and that the gap between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame is greater than a first preset limit, sending a silence insertion description frame.
4. The method according to claim 2, wherein the step of deciding whether to send a silence insertion description frame according to the spectral information of the current speech signal frame and of the previous silence insertion description frame comprises:
when it is determined that the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous silence insertion description frame is greater than a single-frame energy threshold, and that the gap between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame is greater than a first preset limit, further determining whether that gap is greater than a second preset limit, and if so, sending two silence insertion description frames in succession, wherein the spectral energy gap corresponding to the second preset limit is greater than the spectral energy gap corresponding to the first preset limit.
5. The method according to claim 3 or 4, wherein the gap between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame being greater than a preset limit means that:
the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous silence insertion description frame is greater than a ratio threshold corresponding to the preset limit, or less than the reciprocal of that ratio threshold, wherein the ratio threshold is a real number greater than 1;
or,
the absolute value of the difference between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame is greater than a difference threshold.
6. The method according to claim 2, wherein the step of deciding whether to send a silence insertion description frame according to the spectral information of the current speech signal frame and of the previous silence insertion description frame comprises:
when it is determined that the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous silence insertion description frame is greater than a single-frame energy threshold, calculating a spectral correlation value between the spectral energy of the speech signal frame and that of the previous silence insertion description frame, and, when the calculated spectral correlation value is determined to be less than a spectral correlation threshold, sending a silence insertion description frame.
7. An apparatus for performing speech adaptive discontinuous transmission, comprising a silence insertion description frame processing unit and a silence insertion description frame storage unit, wherein
the silence insertion description frame processing unit is configured to decide whether to send a silence insertion description frame according to the spectral information of the current speech signal frame and of the previous silence insertion description frame;
the silence insertion description frame storage unit is configured to store the spectral information of a silence insertion description frame after the silence insertion description frame processing unit has sent that frame.
8. The apparatus according to claim 7, wherein
the silence insertion description frame processing unit is further configured to smooth the frequency-domain signal of the speech signal frame and to calculate the spectral information of the speech signal frame from the smoothed frequency-domain signal;
the silence insertion description frame storage unit is further configured to store the smoothed frequency-domain signal.
9. The apparatus according to claim 8, wherein the silence insertion description frame processing unit is configured to decide whether to send a silence insertion description frame in the following manner:
when it is determined that the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous silence insertion description frame is greater than a single-frame energy threshold, and that the gap between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame is greater than a first preset limit, sending a silence insertion description frame; or, when it is determined that the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous silence insertion description frame is greater than the single-frame energy threshold, and that the gap between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame is greater than the first preset limit, further determining whether that gap is greater than a second preset limit, and if so, sending two silence insertion description frames in succession, wherein the spectral energy gap corresponding to the second preset limit is greater than the spectral energy gap corresponding to the first preset limit;
wherein the gap between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame being greater than a preset limit means that: the ratio of the spectral energy of the speech signal frame to the spectral energy of the previous silence insertion description frame is greater than a ratio threshold corresponding to the preset limit, or less than the reciprocal of that ratio threshold, wherein the ratio threshold is a real number greater than 1; or, the absolute value of the difference between the spectral energy of the speech signal frame and the spectral energy of the previous silence insertion description frame is greater than a difference threshold.
10. The apparatus according to claim 8, wherein the silence insertion description frame processing unit is configured to decide whether to send a silence insertion description frame in the following manner:
when it is determined that the absolute value of the spectral energy of the speech signal frame and/or the absolute value of the spectral energy of the previous silence insertion description frame is greater than a single-frame energy threshold, calculating a spectral correlation value between the spectral energy of the speech signal frame and that of the previous silence insertion description frame, and, when the calculated spectral correlation value is determined to be less than a spectral correlation threshold, sending a silence insertion description frame.
PCT/CN2012/078878 2011-07-29 2012-07-19 Method and apparatus for performing voice adaptive discontinuous transmission WO2013017018A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110216374.3A CN102903364B (en) 2011-07-29 2011-07-29 Method and device for adaptive discontinuous voice transmission
CN201110216374.3 2011-07-29

Publications (1)

Publication Number Publication Date
WO2013017018A1 true WO2013017018A1 (en) 2013-02-07

Family

ID=47575567

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/078878 WO2013017018A1 (en) 2011-07-29 2012-07-19 Method and apparatus for performing voice adaptive discontinuous transmission

Country Status (2)

Country Link
CN (1) CN102903364B (en)
WO (1) WO2013017018A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217723B (en) 2013-05-30 2016-11-09 华为技术有限公司 Coding method and equipment
EP2980790A1 (en) * 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for comfort noise generation mode selection
CN104378474A (en) * 2014-11-20 2015-02-25 惠州Tcl移动通信有限公司 Mobile terminal and method for lowering communication input noise
US9748929B1 (en) * 2016-10-24 2017-08-29 Analog Devices, Inc. Envelope-dependent order-varying filter control

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060149536A1 (en) * 2004-12-30 2006-07-06 Dunling Li SID frame update using SID prediction error
CN1964408A (en) * 2005-11-12 2007-05-16 鸿富锦精密工业(深圳)有限公司 A device and method for mute processing
CN101213591A (en) * 2005-06-18 2008-07-02 诺基亚公司 System and method for adaptive transmission of comfort noise parameters during discontinuous speech transmission
WO2008121035A1 (en) * 2007-03-29 2008-10-09 Telefonaktiebolaget Lm Ericsson (Publ) Method and speech encoder with length adjustment of dtx hangover period
CN101335001A (en) * 2007-11-02 2008-12-31 华为技术有限公司 DTX determination method and apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10805191B2 (en) 2018-12-14 2020-10-13 At&T Intellectual Property I, L.P. Systems and methods for analyzing performance silence packets
US11323343B2 (en) 2018-12-14 2022-05-03 At&T Intellectual Property I, L.P. Systems and methods for analyzing performance silence packets
US11729076B2 (en) 2018-12-14 2023-08-15 At&T Intellectual Property I, L.P. Systems and methods for analyzing performance silence packets

Also Published As

Publication number Publication date
CN102903364B (en) 2017-04-12
CN102903364A (en) 2013-01-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12820724

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12820724

Country of ref document: EP

Kind code of ref document: A1