WO2012163144A1 - Audio signal encoding method and apparatus - Google Patents

Audio signal encoding method and apparatus

Info

Publication number
WO2012163144A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
delay
encoding
high frequency
signal
Prior art date
Application number
PCT/CN2012/072792
Other languages
English (en)
French (fr)
Inventor
苗磊
刘泽新
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP12793206.9A priority Critical patent/EP2680260A4/en
Priority to KR1020137023033A priority patent/KR101427863B1/ko
Priority to JP2013555743A priority patent/JP2014508327A/ja
Publication of WO2012163144A1 publication Critical patent/WO2012163144A1/zh
Priority to US14/145,632 priority patent/US9251798B2/en
Priority to US15/011,824 priority patent/US9514762B2/en
Priority to US15/341,451 priority patent/US9779749B2/en

Classifications

    • G10L19/12: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G10L19/265: Pre-filtering, e.g. high frequency emphasis prior to encoding
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/0204: Coding or decoding of speech or audio signals using spectral analysis, e.g. transform vocoders or subband vocoders, using subband decomposition
    • G10L19/087: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC
    • G10L19/18: Vocoders using multiple modes
    • G10L21/038: Speech enhancement, e.g. noise reduction or echo cancellation, using band spreading techniques
    • G10L25/18: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band

Definitions

  • When the low frequency coding is time domain coding, the high frequency coding may be a time domain bandwidth extension or a frequency domain bandwidth extension; and when the low frequency coding is frequency domain coding, the high frequency coding may likewise be either a time domain bandwidth extension or a frequency domain bandwidth extension.
  • Step 101: dividing the audio signal into a high frequency audio signal and a low frequency audio signal.
  • This step illustrates several possibilities for encoding the high frequency audio signal: the first is to determine the encoding mode of the high frequency audio signal according to the encoding mode of the low frequency signal; the second is to determine it according to the characteristics of the audio signal; and the third is to determine it by referring to both the encoding mode of the low frequency signal and the characteristics of the audio signal.
  • The encoding mode of the low frequency audio signal may be time domain coding or frequency domain coding.
  • The audio signal may be a speech audio signal or a music audio signal.
  • The high frequency audio signal coding mode may be a time domain bandwidth extension mode or a frequency domain bandwidth extension mode.
  • For bandwidth extension of the high frequency audio signal, it is necessary to encode with reference to the encoding mode of the low frequency audio signal or the characteristics of the audio signal.
  • The selected bandwidth extension mode corresponds to the low frequency coding mode or to the characteristics of the audio signal, and belongs to the same coding domain.
  • The selected bandwidth extension mode corresponds to the low frequency coding mode: when the low frequency audio signal uses the time domain coding mode, the time domain bandwidth extension mode is selected to perform time domain coding on the high frequency audio signal; when the low frequency audio signal uses the frequency domain coding mode, the frequency domain bandwidth extension mode is selected to perform frequency domain coding on the high frequency audio signal. That is, the encoding mode of the high frequency audio signal and the low frequency encoding mode belong to the same coding domain (time domain coding or frequency domain coding).
  • The low frequency audio signal, for example the 0-6.4 kHz audio signal, may use time domain coding (TD coding) or frequency domain coding (FD coding), and the bandwidth extension of the high frequency audio signal, for example the 6.4-16/14 kHz audio signal, may be a time domain bandwidth extension (TD-BWE) or a frequency domain bandwidth extension (FD-BWE).
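  • By way of illustration only (the patent does not specify an implementation, and the function and mode names below are hypothetical), the first selection rule, where the high band BWE mode simply follows the domain of the low band core coder, can be sketched as:

```python
# Hypothetical sketch: the 6.4-16/14 kHz band's BWE mode follows the
# coding domain of the 0-6.4 kHz core coder. Names are illustrative.

def select_bwe_mode(low_band_mode: str) -> str:
    """Map the low-band coding mode ("TD" or "FD") to a BWE mode."""
    if low_band_mode == "TD":   # time-domain core, e.g. CELP
        return "TD-BWE"
    if low_band_mode == "FD":   # frequency-domain core, e.g. MDCT-based
        return "FD-BWE"
    raise ValueError(f"unknown low-band coding mode: {low_band_mode!r}")

print(select_bwe_mode("TD"))  # → TD-BWE
print(select_bwe_mode("FD"))  # → FD-BWE
```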
  • One way of selecting the bandwidth extension mode for encoding the high frequency audio signal is to proceed according to the low frequency encoding mode of the low frequency audio signal.
  • As the second bandwidth extension diagram of the audio signal encoding method of the embodiment, FIG. 5 shows that when the low frequency (0-6.4 kHz) audio signal uses time domain coding (TD coding), the high frequency (6.4-16/14 kHz) audio signal likewise uses the time domain coding of the time domain bandwidth extension (TD-BWE); and when the low frequency (0-6.4 kHz) audio signal uses frequency domain coding (FD coding), the high frequency (6.4-16/14 kHz) audio signal likewise uses the frequency domain coding of the frequency domain bandwidth extension (FD-BWE).
  • In this way, the encoding mode of the bandwidth extension of the high frequency audio signal is determined according to the encoding mode of the low frequency signal, avoiding bandwidth extension that ignores the encoding mode of the low frequency audio signal; this compensates for the limitation of bandwidth extension on the encoding quality of different audio signals, realizes adaptive encoding, and optimizes audio coding quality.
  • Another way of selecting the bandwidth extension mode to encode the high frequency audio signal is to proceed according to the characteristics of the audio signal or of the low frequency audio signal. For example, if the audio signal/low frequency audio signal is a speech audio signal, the high frequency audio signal is encoded using time domain coding; and if the audio signal/low frequency audio signal is a music audio signal, the high frequency audio signal is encoded using frequency domain coding.
  • The encoding of the bandwidth extension of the high frequency audio signal then refers only to the characteristics of the audio signal/low frequency audio signal, regardless of the encoding mode of the low frequency audio signal: when the low frequency audio signal uses time domain coding, the high frequency audio signal may use time domain coding or frequency domain coding; and when the low frequency audio signal uses frequency domain coding, the high frequency audio signal may likewise use frequency domain coding or time domain coding.
  • In this way, the encoding mode of the bandwidth extension of the high frequency audio signal is determined from the characteristics of the audio signal/low frequency audio signal, avoiding bandwidth extension that ignores those characteristics, compensating for the limitation of bandwidth extension on the encoding quality of different audio signals, and realizing adaptive coding that optimizes audio coding quality.
  • There is yet another way of selecting the bandwidth extension mode to encode the high frequency audio signal, according to both the encoding mode of the low frequency audio signal and the characteristics of the audio signal/low frequency audio signal.
  • When the low frequency audio signal uses the time domain coding mode and the audio signal/low frequency audio signal is a speech signal, the time domain bandwidth extension mode is selected to perform time domain coding on the high frequency audio signal; and when the low frequency audio signal uses the frequency domain coding mode, or the low frequency audio signal uses the time domain coding mode but the audio signal/low frequency audio signal is a music signal, the frequency domain bandwidth extension mode is selected to perform frequency domain coding on the high frequency audio signal.
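  • A minimal sketch of this combined rule (hypothetical names; it assumes a boolean speech/music decision is available, which the patent describes only as a signal characteristic):

```python
# Hypothetical sketch of the combined selection rule: TD-BWE is chosen
# only when the low band uses time-domain coding AND the signal is
# speech; in every other case FD-BWE is chosen.

def select_bwe_mode(low_band_mode: str, is_speech: bool) -> str:
    if low_band_mode == "TD" and is_speech:
        return "TD-BWE"
    return "FD-BWE"

print(select_bwe_mode("TD", True))   # speech over a TD core → TD-BWE
print(select_bwe_mode("TD", False))  # music over a TD core  → FD-BWE
print(select_bwe_mode("FD", False))  # music over an FD core → FD-BWE
```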
  • FIG. 6 is a third schematic diagram of bandwidth extension of an audio signal encoding method according to an embodiment of the present invention.
  • When a low frequency (0-6.4 kHz) audio signal uses time domain coding (TD coding), the high frequency (6.4-16/14 kHz) audio signal may use the frequency domain coding of the frequency domain bandwidth extension (FD-BWE) or the time domain coding of the time domain bandwidth extension (TD-BWE); and when a low frequency (0-6.4 kHz) audio signal uses frequency domain coding (FD coding), the high frequency (6.4-16/14 kHz) audio signal likewise uses the frequency domain coding of the frequency domain bandwidth extension (FD-BWE).
  • In this way, the encoding mode of the bandwidth extension of the high frequency audio signal is determined from both the encoding mode of the low frequency signal and the characteristics of the audio signal/low frequency audio signal, avoiding bandwidth extension that considers neither; this compensates for the limitation of bandwidth extension on the encoding quality of different audio signals, enabling adaptive coding and optimizing audio coding quality.
  • The encoding mode for the low frequency audio signal may be time domain coding or frequency domain coding, and there are likewise two bandwidth extension modes, time domain bandwidth extension and frequency domain bandwidth extension, which may be combined with different low frequency coding modes.
  • Time domain bandwidth extension and frequency domain bandwidth extension may have different delays, so delay alignment is required to achieve a uniform delay.
  • The delays of the time domain bandwidth extension and the frequency domain bandwidth extension are preferably the same.
  • The time domain bandwidth extension delay is fixed, while the frequency domain bandwidth extension delay is adjustable; the delay of the frequency domain bandwidth extension can therefore be adjusted to achieve a uniform delay.
  • Embodiments of the present invention can achieve zero delay of the bandwidth extension relative to decoding the low frequency signal, where zero delay is relative to the low frequency band because the asymmetric window itself introduces a time lag. Moreover, embodiments of the present invention can apply different windowing to the high frequency band signal; an asymmetric window is used here, such as the analysis window of ITU-T G.718 shown in FIG. 7. Any delay from zero delay relative to the decoded low frequency signal up to the delay of the high frequency window itself relative to the decoded low frequency signal can be achieved, as shown in FIG. 8.
  • FIG. 8 is a schematic diagram of windowing of different high frequency audio signals in the audio signal encoding method of the present invention.
  • As shown in FIG. 8, for frames such as frame (m-1), frame (m), and frame (m+1), the figure illustrates high delay windowing of the high frequency signal, low delay windowing of the high frequency signal, and zero delay windowing of the high frequency signal.
  • The delay windows of the high frequency signal here do not consider the delay of the window itself, but only the different windowing methods for the high frequency signal.
  • FIG. 9 is a schematic diagram of BWE based on a high delay window of the high frequency signal in the audio signal encoding method of the present invention. As shown in the figure, after the low frequency audio signal of the input frame is completely decoded, the decoded low frequency audio signal is used as the high frequency excitation signal, and the windowing of the high frequency audio signal of the input frame is determined based on the delay of decoding the low frequency audio signal of the input frame.
  • That is, the decoded low frequency audio signal requires an additional delay of D2 milliseconds to be aligned with the decoded high frequency audio signal, and the total delay of the output signal is D1+D2 milliseconds.
  • The same time-frequency transform processing is performed on the low frequency audio signal at the decoding end and on the high frequency audio signal at the encoding end; both are time-frequency transformed on the audio signal after the delay of D1 milliseconds, so the excitation signals are aligned.
  • FIG. 10 is a schematic diagram of a high-frequency signal zero delay window BWE in the audio signal encoding method of the present invention.
  • The encoding end directly windows the high frequency audio signal of the currently received frame, and the time-frequency transform processing at the decoding end uses the low frequency audio signal decoded from the current frame as the excitation signal.
  • Although the excitation signal may be misaligned, the effect of the misalignment can be neglected after the excitation signal is corrected.
  • the decoded low-band signal delay is D1 milliseconds
  • The encoding end performs no delay processing for the time-frequency transform of the high band signal; only the windowing transform of the high frequency signal generates a delay of D2 milliseconds.
  • the total delay of the high frequency band signal decoded at the decoding end is D2 milliseconds.
  • When D1 is equal to D2, the decoded low frequency audio signal can be aligned with the decoded high frequency audio signal without additional delay; but the high band excitation signal is predicted at the decoding end from the frequency domain signal obtained by time-frequency transforming the low frequency audio signal after the delay of D1 milliseconds, so the high frequency excitation signal and the low frequency excitation signal are not aligned and have a misalignment of D1 milliseconds.
  • The overall delay of the decoded signal relative to the encoding end signal is then D1 or D2 milliseconds.
  • When D1 is not equal to D2:
  • when D1 is smaller than D2, the overall delay of the decoded signal relative to the encoding end signal is D2 milliseconds, the misalignment between the high frequency excitation signal and the low frequency excitation signal is D1 milliseconds, and the decoded low frequency audio signal requires an additional delay of (D2-D1) milliseconds to be aligned with the decoded high frequency audio signal.
  • When D1 is greater than D2, the overall delay of the decoded signal relative to the encoding end signal is D1 milliseconds, the misalignment between the high frequency excitation signal and the low frequency excitation signal is D1 milliseconds, and the decoded high frequency audio signal requires an additional delay of (D1-D2) milliseconds to be aligned with the decoded low frequency audio signal.
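  • The D1/D2 alignment arithmetic above can be summarized in a small sketch (the function name is hypothetical; D1 is the decoded low band delay and D2 the high band windowing delay, both in milliseconds):

```python
def align_zero_delay_window(d1_ms: float, d2_ms: float):
    """Return (extra_low_ms, extra_high_ms, total_ms) for the
    zero-delay-window case: the band with the smaller delay is padded
    so both decoded bands line up; the total delay is max(D1, D2)."""
    total = max(d1_ms, d2_ms)
    return total - d1_ms, total - d2_ms, total

# D1 < D2: the low band is padded by D2-D1, total delay D2
print(align_zero_delay_window(10.0, 15.0))  # → (5.0, 0.0, 15.0)
# D1 > D2: the high band is padded by D1-D2, total delay D1
print(align_zero_delay_window(20.0, 15.0))  # → (0.0, 5.0, 20.0)
```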
  • For BWE based on a window between the zero delay window and the high delay window of the high frequency signal, the encoding end windows the high frequency audio signal of the currently received frame after a delay of D3 milliseconds, where D3 is between 0 and D1 milliseconds.
  • the time-frequency transform processing of the decoder uses the current frame decoded signal of the low-frequency audio signal as the excitation signal.
  • Although the excitation signal may have a certain misalignment, after correction of the excitation signal the influence of the misalignment can be neglected.
  • The decoded low frequency audio signal requires an additional delay of D3 milliseconds to be aligned with the decoded high frequency audio signal; but the high band excitation signal is predicted at the decoding end from the frequency domain signal obtained by time-frequency transforming the low frequency audio signal after the delay of D1 milliseconds, so the high frequency excitation signal and the low frequency excitation signal are not aligned and have a misalignment of (D1-D3) milliseconds.
  • the overall delay of the decoded signal relative to the encoder signal is (D2+D3) or (D1+D3) milliseconds.
  • When D1 is not equal to D2:
  • when D1 is smaller than D2, the overall delay of the decoded signal relative to the encoding end signal is (D2+D3) milliseconds, the misalignment between the high frequency excitation signal and the low frequency excitation signal is (D1-D3) milliseconds, and the decoded low frequency audio signal requires an additional delay of (D2+D3-D1) milliseconds to be aligned with the decoded high frequency audio signal.
  • The overall delay of the decoded signal relative to the encoding end signal is max(D1, D2+D3) milliseconds, and the misalignment between the high frequency excitation signal and the low frequency excitation signal is (D1-D3) milliseconds, where max(a, b) means taking the larger of a and b.
  • When max(D1, D2+D3) is (D2+D3), the decoded low frequency audio signal requires an additional delay of (D2+D3-D1) milliseconds to be aligned with the decoded high frequency audio signal; when max(D1, D2+D3) is D1, the overall delay of the decoded signal relative to the encoding end signal is D1 milliseconds, the misalignment between the high frequency excitation signal and the low frequency excitation signal is D2 milliseconds, and at this point the decoded low frequency audio signal can be aligned with the decoded high frequency audio signal without additional delay.
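  • For the intermediate window the same bookkeeping extends with the windowing delay D3. A sketch under the assumptions stated above (hypothetical function name; total delay max(D1, D2+D3), excitation misalignment D1-D3 before correction):

```python
def align_intermediate_window(d1_ms, d2_ms, d3_ms):
    """Delay bookkeeping for the intermediate-window case.

    The high band path delay is D2 + D3; the overall delay is
    max(D1, D2 + D3); whichever band is earlier is padded by the
    difference. The excitation misalignment before correction is
    D1 - D3."""
    high_path = d2_ms + d3_ms
    total = max(d1_ms, high_path)
    extra_low = total - d1_ms       # extra delay on the decoded low band
    extra_high = total - high_path  # extra delay on the decoded high band
    misalign = d1_ms - d3_ms
    return {"total": total, "extra_low": extra_low,
            "extra_high": extra_high, "misalign": misalign}

# D1 < D2 + D3: the low band needs (D2 + D3 - D1) extra, total D2 + D3
print(align_intermediate_window(10.0, 8.0, 5.0))
# D1 > D2 + D3: the high band needs (D1 - D2 - D3) extra, total D1
print(align_intermediate_window(20.0, 8.0, 5.0))
```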
  • In time domain bandwidth extension, embodiments of the present invention need to keep the state of the frequency domain bandwidth extension updated, because the next frame may use frequency domain bandwidth extension; likewise, in frequency domain bandwidth extension the state of the time domain bandwidth extension needs to be kept updated, because the next frame may use time domain bandwidth extension. Continuity of bandwidth extension switching is achieved in this way.
  • FIG. 11 is a schematic diagram of an audio signal processing apparatus according to an embodiment of the present invention.
  • the signal processing apparatus of the embodiment of the present invention specifically includes: a dividing unit 11, a low frequency signal encoding unit 12, and a high frequency signal encoding unit 13.
  • the dividing unit 11 is configured to divide the audio signal into a high frequency audio signal and a low frequency audio signal;
  • The low frequency signal encoding unit 12 is configured to encode the low frequency audio signal using a corresponding low frequency encoding mode according to the characteristics of the low frequency audio signal; the mode may be time domain coding or frequency domain coding. For example, for a speech audio signal, the low frequency speech signal is encoded with time domain coding, and for a music audio signal, the low frequency music signal is encoded with frequency domain coding, because generally speaking time domain coding works better for speech signals and frequency domain coding works better for music signals.
  • the high frequency signal encoding unit 13 is configured to encode the high frequency audio signal by selecting a bandwidth extension mode according to the low frequency encoding mode and/or the characteristics of the audio signal.
  • If the low frequency signal encoding unit 12 uses time domain coding, the high frequency signal encoding unit 13 selects the time domain bandwidth extension mode to perform time domain coding on the high frequency audio signal; and if the low frequency signal encoding unit 12 uses frequency domain coding, the high frequency signal encoding unit 13 selects the frequency domain bandwidth extension mode to perform frequency domain coding on the high frequency audio signal.
  • If the audio signal/low frequency audio signal is a speech audio signal, the high frequency signal encoding unit 13 encodes the high frequency speech signal using time domain coding; and if the audio signal/low frequency audio signal is a music audio signal, the high frequency signal encoding unit 13 encodes the high frequency music signal using frequency domain coding.
  • the encoding mode of the low frequency audio signal is not considered at this time.
  • When the low frequency signal encoding unit 12 uses the time domain coding mode for the low frequency audio signal and the audio signal/low frequency audio signal is a speech signal, the high frequency signal encoding unit 13 selects the time domain bandwidth extension mode to perform time domain coding on the high frequency audio signal; and when the low frequency signal encoding unit 12 uses the frequency domain coding mode for the low frequency audio signal, or the low frequency signal encoding unit 12 uses the time domain coding mode but the audio signal/low frequency audio signal is a music signal, the frequency domain bandwidth extension mode is selected to perform frequency domain coding on the high frequency audio signal.
  • FIG. 12 is a schematic diagram of another audio signal processing apparatus according to an embodiment of the present invention. As shown in the figure, the signal processing apparatus of the embodiment of the present invention further includes: a low frequency signal decoding unit 14.
  • The low frequency signal decoding unit 14 is configured to decode the low frequency audio signal; encoding and decoding of the low frequency audio signal generates a first delay D1.
  • The high frequency signal encoding unit 13 is configured to encode the high frequency audio signal after the first delay D1, the high frequency audio signal encoding generating a second delay D2;
  • the audio signal codec delay is then the sum of the first delay D1 and the second delay D2 (D1+D2).
  • Alternatively, the high frequency signal encoding unit 13 encodes the high frequency audio signal, the encoding generating the second delay D2. When the first delay D1 is less than or equal to the second delay D2, the low frequency signal encoding unit 12 delays the encoded low frequency audio signal by the difference between the second delay and the first delay (D2-D1), so that the audio signal codec delay is the second delay D2; when the first delay D1 is greater than the second delay D2, the encoded high frequency audio signal is delayed by the difference between the first delay and the second delay (D1-D2), so that the audio signal codec delay is the first delay D1.
  • The high frequency signal encoding unit 13 may also be configured to encode the high frequency audio signal after a third delay D3, the high frequency audio signal encoding generating a second delay D2;
  • when the first delay D1 is less than or equal to the second delay D2, the low frequency signal encoding unit 12 delays the encoded low frequency audio signal by the difference (D2+D3-D1), so that the audio signal codec delay is the sum of the second delay D2 and the third delay D3 (D2+D3); when the first delay is greater than the second delay, there are two possibilities: if the first delay D1 is greater than or equal to the sum of the second delay D2 and the third delay D3 (D2+D3), the high frequency signal encoding unit 13 delays the encoded high frequency audio signal by the difference (D1-D2-D3); and if the first delay D1 is smaller than that sum, the encoded low frequency audio signal is delayed by the difference (D2+D3-D1).
  • In summary, the audio signal encoding apparatus can determine the encoding mode of the bandwidth extension of the high frequency audio signal based on the encoding mode of the low frequency signal and/or the characteristics of the audio signal/low frequency signal, avoiding bandwidth extension that considers neither the encoding mode of the low frequency signal nor the characteristics of the audio signal/low frequency audio signal, and compensating for the limitation of bandwidth extension on the encoding quality of different audio signals, thereby achieving adaptive coding and optimizing audio coding quality.
  • RAM (random access memory)
  • ROM (read only memory)
  • EPROM (erasable programmable ROM)
  • EEPROM (electrically erasable programmable ROM)
  • registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Abstract

An audio signal encoding method and apparatus. The method includes: dividing an audio signal into a high frequency audio signal and a low frequency audio signal (101); encoding the low frequency audio signal using a corresponding low frequency encoding mode according to characteristics of the low frequency audio signal (102); and selecting a bandwidth extension mode to encode the high frequency audio signal according to the low frequency encoding mode and/or characteristics of the audio signal (103).

Description

Audio signal encoding method and apparatus
This application claims priority to Chinese Patent Application No. 201110297791.5, filed with the Chinese Patent Office on October 8, 2011 and entitled "Audio signal encoding method and apparatus", which is incorporated herein by reference in its entirety. TECHNICAL FIELD
The present invention relates to the field of communications, and in particular, to an audio signal encoding method and apparatus. BACKGROUND
In audio coding, because of bit rate constraints and the auditory characteristics of the human ear, the information of the low frequency band audio signal is encoded preferentially while the information of the high frequency band audio signal is discarded. However, with the development of network technology, network bandwidth constraints are becoming ever smaller, and as people demand ever higher sound quality, it is desirable to restore the information of the high frequency band audio signal by increasing the bandwidth of the signal, thereby improving the sound quality of the audio signal. This can be achieved by the bandwidth extension (BandWidth Extension, BWE) technique.
Bandwidth extension can expand the frequency range of an audio signal and improve signal quality. Common BWE techniques at present include: the time domain (Time Domain, TD) bandwidth extension algorithm in G.729.1, the spectral band replication (Spectral Band Replication, SBR) technique of the Moving Picture Experts Group (MPEG), and the frequency domain (Frequency Domain, FD) bandwidth extension algorithms in International Telecommunication Union (ITU-T) G.722B/G.711.1D.
FIG. 1 and FIG. 2 are schematic diagrams of bandwidth extension in the prior art: regardless of whether the coding of the low frequency (e.g. below 6.4 kHz) audio signal is time domain coding (TD coding) or frequency domain coding (FD coding), the bandwidth extension of the high frequency (e.g. 6.4-16/14 kHz) audio signal is always time domain bandwidth extension (TD-BWE) or always frequency domain bandwidth extension (FD-BWE).
Therefore, in the prior art, the encoding of the high frequency audio signal is only the time domain coding of a time domain bandwidth extension or only the frequency domain coding of a frequency domain bandwidth extension, without considering the encoding mode of the low frequency audio signal or the characteristics of the audio signal. SUMMARY The audio signal encoding method and apparatus of the embodiments of the present invention can achieve adaptive coding rather than a fixed coding mode.
An embodiment of the present invention provides an audio signal encoding method, the method including:
dividing an audio signal into a high frequency audio signal and a low frequency audio signal;
encoding the low frequency audio signal using a corresponding low frequency encoding mode according to characteristics of the low frequency audio signal; and
selecting a bandwidth extension mode to encode the high frequency audio signal according to the low frequency encoding mode and/or characteristics of the audio signal.
An embodiment of the present invention provides an audio signal encoding apparatus, the apparatus including:
a dividing unit, configured to divide an audio signal into a high frequency audio signal and a low frequency audio signal;
a low frequency signal encoding unit, configured to encode the low frequency audio signal using a corresponding low frequency encoding mode according to characteristics of the low frequency audio signal; and
a high frequency signal encoding unit, configured to select a bandwidth extension mode to encode the high frequency audio signal according to the low frequency encoding mode and/or characteristics of the audio signal.
The audio signal encoding method and apparatus of the embodiments of the present invention can determine the encoding mode of the bandwidth extension of the high frequency audio signal according to the encoding mode of the low frequency signal and/or the characteristics of the audio signal, avoiding bandwidth extension that ignores the encoding mode of the low frequency signal and the characteristics of the audio signal, thereby remedying bandwidth extension that is restricted to a single coding mode, achieving adaptive coding, and optimizing audio coding quality. BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a first schematic diagram of bandwidth extension in the prior art;
FIG. 2 is a second schematic diagram of bandwidth extension in the prior art;
FIG. 3 is a flowchart of an audio signal encoding method according to an embodiment of the present invention;
FIG. 4 is a first schematic diagram of bandwidth extension of an audio signal encoding method according to an embodiment of the present invention;
FIG. 5 is a second schematic diagram of bandwidth extension of an audio signal encoding method according to an embodiment of the present invention;
FIG. 6 is a third schematic diagram of bandwidth extension of an audio signal encoding method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the analysis window in ITU-T G.718;
FIG. 8 is a schematic diagram of windowing of different high frequency audio signals in the audio signal encoding method of the present invention;
FIG. 9 is a schematic diagram of BWE based on a high delay window of the high frequency signal in the audio signal encoding method of the present invention;
FIG. 10 is a schematic diagram of BWE based on a zero delay window of the high frequency signal in the audio signal encoding method of the present invention;
FIG. 11 is a schematic diagram of an audio signal processing apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another audio signal processing apparatus according to an embodiment of the present invention. DETAILED DESCRIPTION
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
Embodiments of the present invention can determine whether the band extension method is time domain bandwidth extension or frequency domain bandwidth extension according to the coding mode of the low frequency band audio signal and the characteristics of the audio signal.
Thus, when the low frequency coding is time domain coding, the high frequency coding may be time domain bandwidth extension or frequency domain bandwidth extension; and when the low frequency coding is frequency domain coding, the high frequency coding may likewise be time domain bandwidth extension or frequency domain bandwidth extension.
图 3为本发明实施例音频信号编码方法的流程图, 如图所示, 本发明实施 例音频信号编码方法具体包括如下步骤:
步骤 101 , 将音频信号分为高频音频信号和低频音频信号;
因为低频的音频信号需要直接编码,而高频的音频信号必须经过带宽扩展 来进行编码;
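步骤 101 的频带划分可以用如下基于 FFT 的示意代码说明 (仅为示例性草图, 采样率 32kHz、截止频率 6.4kHz 等参数为按上文示例选取的假设值, 并非本实施例规定的具体实现, 实际编解码器通常采用 QMF 等滤波器组):

```python
import numpy as np

def split_bands(signal, sample_rate=32000, cutoff=6400.0):
    """将音频信号划分为低频部分(< cutoff)和高频部分(>= cutoff)的示意实现。"""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # 分别保留截止频率以下和以上的频谱分量
    low_spec = np.where(freqs < cutoff, spectrum, 0)
    high_spec = np.where(freqs >= cutoff, spectrum, 0)
    low = np.fft.irfft(low_spec, n=len(signal))
    high = np.fft.irfft(high_spec, n=len(signal))
    return low, high
```

两个子带之和可以重构原信号, 低频部分直接编码, 高频部分交给带宽扩展处理。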
步骤 102, 根据低频音频信号的特征对所述低频音频信号利用相应的低频 编码方式编码;
对低频音频信号编码具有两种方式, 可以是时域编码或频域编码方式, 例如对于语音音频信号, 则利用时域编码对低频语音信号进行编码, 而对于音乐音频信号, 则利用频域编码对低频音乐信号进行编码; 因为通常来讲, 语音信号采用时域编码的效果比较好, 例如码激励线性预测 (Code Excited Linear Prediction, CELP), 而音乐信号采用频域编码的效果比较好, 例如使用改进离散余弦变换 (Modified Discrete Cosine Transform, MDCT) 或快速傅立叶变换 (Fast Fourier Transform, FFT) 等。
步骤 103 , 根据低频编码方式或音频信号的特征, 选择带宽扩展模式对高 频音频信号编码。
本步骤说明了对高频音频信号编码时的几种可能性: 一是根据低频信号的编码方式来决定高频音频信号的编码方式; 二是根据音频信号的特征来决定高频音频信号的编码方式; 三是同时参考低频信号的编码方式和音频信号的特征来决定高频音频信号的编码方式。
低频音频信号的编码方式可能是时域编码或者频域编码,而音频信号的特 征可以是语音音频信号或者音乐音频信号,高频音频信号编码方式可以是时域 带宽扩展模式或者频域带宽扩展模式,对于高频音频信号的带宽扩展需要参考 低频音频信号的编码方式或音频信号特征来编码。
根据所述低频编码方式或所述音频信号的特征,选择带宽扩展模式对所述 高频音频信号编码,选择的带宽扩展模式与低频编码方式或音频信号的特征对 应, 属于同一个域编码方式。
一个实施例中, 所选择的带宽扩展模式与低频编码方式对应: 当低频音频信号采用时域编码方式时, 选择时域带宽扩展模式对高频音频信号进行时域编码; 当低频音频信号采用频域编码方式时, 选择频域带宽扩展模式对高频音频信号进行频域编码。即: 高频音频信号的编码方式与低频编码方式属于同一个域编码方式 (时域编码或者频域编码)。
另一个实施例中,所选择的带宽扩展模式与音频信号特征适合的低频编码 方式对应: 当音频信号为语音信号时, 选择时域带宽扩展模式对高频音频信号 进行时域编码; 当音频信号为音乐信号时,选择频域带宽扩展模式对高频音频 信号进行频域编码。 即: 高频音频信号的编码方式与音频信号特征适合的低频 编码方式属于同一个域编码方式(时域编码或者频域编码)。
另一个实施例中, 综合考虑低频编码方式和音频信号的特征,选择带宽扩 展模式对高频音频信号编码: 当低频音频信号为时域编码方式, 且音频信号为 语音信号时,选择时域带宽扩展模式对高频音频信号进行时域编码; 否则选择 频域带宽扩展模式对高频音频信号进行频域编码。
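上述三个实施例的选择逻辑可以用如下示意代码归纳 (仅为按上文文字整理的草图, 函数名、参数名与取值均为假设, 实际实现中信号特征通常由语音/音乐分类器给出):

```python
def select_bwe_mode(low_mode, signal_type, strategy="both"):
    """根据低频编码方式和/或音频信号特征选择带宽扩展模式的示意实现。

    low_mode:    "TD" 或 "FD", 低频音频信号的编码方式
    signal_type: "speech" 或 "music", 音频信号的特征
    strategy:    "low_mode" / "signal_type" / "both", 对应上文三个实施例
    """
    if strategy == "low_mode":
        # 实施例一: 带宽扩展模式与低频编码方式属于同一个域
        return "TD-BWE" if low_mode == "TD" else "FD-BWE"
    if strategy == "signal_type":
        # 实施例二: 语音信号用时域带宽扩展, 音乐信号用频域带宽扩展
        return "TD-BWE" if signal_type == "speech" else "FD-BWE"
    # 实施例三: 仅当低频为时域编码且信号为语音时选时域带宽扩展
    if low_mode == "TD" and signal_type == "speech":
        return "TD-BWE"
    return "FD-BWE"
```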
参见图 4的本发明实施例音频信号编码方法的带宽扩展示意图之一所示, 低频音频信号,例如 0-6.4kHz的音频信号可能是时域编码 TD coding或者频域编 码 FD coding, 而高频音频信号, 例如 6.4-16/14kHz的音频信号的带宽扩展可能 是时域带宽扩展 TD-BWE或者频域带宽扩展 FD-BWE。
也就是说本发明实施例的音频信号编码方法中, 低频音频信号的编码方式与高频信号的带宽扩展之间没有一一对应的关系。例如, 如果低频音频信号是时域编码 TD coding, 其高频音频信号的带宽扩展既可能是时域带宽扩展 TD-BWE, 也可能是频域带宽扩展 FD-BWE; 而如果低频音频信号是频域编码 FD coding, 其高频音频信号的带宽扩展同样可能是时域带宽扩展 TD-BWE, 也可能是频域带宽扩展 FD-BWE。
具体的, 一种选择带宽扩展模式对高频音频信号编码的方式是根据低频音频信号的低频编码方式进行处理, 一并参见图 5的本发明实施例音频信号编码方法的带宽扩展示意图之二所示: 低频 (0-6.4kHz) 音频信号是时域编码 TD coding时, 高频 (6.4-16/14kHz) 音频信号同样也是时域带宽扩展 TD-BWE的时域编码; 低频 (0-6.4kHz) 音频信号是频域编码 FD coding时, 高频 (6.4-16/14kHz) 音频信号同样也是频域带宽扩展 FD-BWE的频域编码。
所以高频音频信号编码的方式与低频音频信号的编码方式是属于相同域的, 而不参考音频信号/低频音频信号的特征, 也就是说高频音频信号的编码是参照低频音频信号编码的方式进行处理的, 与音频信号/低频音频信号的特征无关。
因此, 根据低频信号的编码方式来确定高频音频信号带宽扩展的编码方 式,避免带宽扩展时不考虑低频音频信号的编码方式, 弥补带宽扩展对不同音 频信号编码质量的局限性, 实现自适应的编码, 优化音频编码质量。
另外一种选择带宽扩展模式对高频音频信号编码的方式,是根据音频信号 或者低频音频信号的特征来处理。 例如如果音频信号 /低频音频信号是语音音 频信号, 则利用时域编码对高频音频信号进行编码, 而如果音频信号 /低频音 频信号是音乐音频信号, 则利用频域编码对高频音频信号进行编码。
同时参见图 4所示, 高频音频信号带宽扩展的编码只参考音频信号/低频音频信号的特征, 而无论低频音频信号的编码方式如何。所以低频音频信号是时域编码时, 高频音频信号可能是时域编码也可能是频域编码; 而低频音频信号是频域编码时, 高频音频信号可能是频域编码也可能是时域编码。
因此, 根据音频信号/低频信号的特点来确定高频音频信号带宽扩展的编码方式, 避免带宽扩展时不考虑音频信号/低频音频信号的特点, 弥补带宽扩展对不同音频信号编码质量的局限性, 实现自适应的编码, 优化音频编码质量。

再有一种选择带宽扩展模式对高频音频信号编码的方式, 既要根据低频音频信号的编码方式, 也要根据音频信号/低频音频信号的特征。例如当低频音频信号为时域编码方式, 而且音频信号/低频音频信号为语音信号时, 选择时域带宽扩展模式对高频音频信号进行时域编码; 而当低频音频信号为频域编码方式, 或者低频音频信号为时域编码方式且音频信号/低频音频信号为音乐信号时, 选择频域带宽扩展模式对高频音频信号进行频域编码。
图 6为本发明实施例音频信号编码方法的带宽扩展示意图之三,如图所示, 当低频( 0-6.4kHz )音频信号为时域编码 TD coding时, 高频( 6.4-16/14kHz ) 音频信号可以是频域带宽扩展 FD-BWE的频域编码, 也可以是时域带宽扩展 TD-BWE的时域编码; 而当低频 (0-6.4kHz )音频信号为频域编码 FD coding 时, 高频(6.4-16/14kHz )音频信号同样是频域带宽扩展 FD-BWE的频域编码。
因此, 根据低频信号的编码模式和音频信号 /低频信号的特点来确定高频 音频信号带宽扩展的编码方式,避免带宽扩展时不考虑低频信号的编码模式和 音频信号 /低频音频信号的特点, 弥补带宽扩展对不同音频信号编码质量的局 限性, 实现自适应的编码, 优化音频编码质量。
本发明实施例音频信号的编码方法中对于低频音频信号的编码方式可以 是时域编码或者频域编码, 而带宽扩展方法也是两种, 时域带宽扩展和频域带 宽扩展, 可以对应不同的低频带编码方式。
时域带宽扩展和频域带宽扩展有可能延时不同, 所以需要延时对齐, 以达 到统一的延时。
假设所有低频音频信号编码延时相同,这样时域带宽扩展和频域带宽扩展 的延时最好也相同,通常时域带宽扩展的延时是固定的, 而频域带宽扩展的延 时是可调的, 所以可以通过调整频域带宽扩展的延时来实现延时统一。
本发明实施例可以实现相对于解码低频信号的零延时带宽扩展, 此处零延时是相对于低频带而言, 因为非对称窗本身是有延时的。而且本发明实施例可以对高频带信号进行不同的加窗, 此处采用的是非对称的窗, 如图 7所示的 ITU-T G.718中的分析窗。而且可以实现从相对于解码低频信号的零延时到相对于解码低频信号的高频窗自身延时之间的任一延时, 如图 8所示。
图 8为本发明音频信号编码方法的不同高频音频信号的加窗示意图, 如图所示, 对于不同帧 (frame), 例如对于 (m-1) 帧、(m) 帧和 (m+1) 帧, 可以实现高频信号高延时窗 (High delay windowing)、高频信号低延时窗 (Low delay windowing) 和高频信号零延时窗 (Zero delay windowing)。这里高频信号各延时窗并没有考虑窗本身的延时, 只是考虑不同的高频信号的加窗方式。
图 9为本发明音频信号编码方法中高频信号高延时窗的 BWE示意图, 如图 所示, 当输入帧的低频音频信号完全解码后, 用解码后的低频音频信号作为高 频激励信号,输入帧高频音频信号的加窗是根据输入帧低频音频信号解码的延 时来确定。
例如, 编解码的低频音频信号延时为 D1毫秒, 在编码端编码器 Encoder对高频音频信号进行时频变换时, 将延时 D1毫秒的高频音频信号进行时频变换, 而高频音频信号的加窗变换会产生 D2毫秒的延时, 所以在解码端解码器 Decoder解码的高频带信号的总延时为 D1+D2毫秒; 这样相对解码的低频音频信号, 高频音频信号有额外 D2毫秒延时, 即解码的低频音频信号需要额外延时 D2毫秒和解码的高频音频信号对齐, 输出信号总延时为 D1+D2毫秒。而在解码端, 因为高频激励信号需要从低频音频信号的预测中得到, 所以对解码端的低频音频信号和编码端的高频音频信号来说, 均做同样的时频变换处理; 由于编码端的高频音频信号和解码端的低频音频信号都是对延时 D1毫秒后的音频信号做时频变换, 因此激励信号是对齐的。
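高延时窗方案的延时关系可以用如下示意代码归纳 (仅为按上文公式整理的草图, 函数名与返回值组织均为假设):

```python
def high_delay_window_delays(d1, d2):
    """高延时窗方案的延时关系 (单位: 毫秒) 的示意计算。

    d1: 低频音频信号编解码延时; d2: 高频信号加窗变换产生的延时。
    返回 (输出信号总延时, 低频需额外延时, 高频需额外延时)。
    """
    total = d1 + d2   # 解码端高频带信号总延时为 D1+D2
    extra_low = d2    # 解码的低频信号需再延时 D2 才能与高频对齐
    extra_high = 0    # 高频支路无需额外延时
    return total, extra_low, extra_high
```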
图 10为本发明音频信号编码方法中高频信号零延时窗 BWE示意图, 如图 所示,是编码端对当前接收的帧的高频音频信号直接进行加窗, 解码端时频变 换处理用当前帧解码的低频音频信号作为激励信号,虽然激励信号可能会有一 定错位, 但是经过对激励信号进行修正, 错位的影响可以忽略不计。
例如, 解码的低频带信号延时为 D1毫秒, 而编码端对高频带信号做时频 变换时不做延时处理, 而只是由于高频信号加窗变换会产生 D2毫秒的延时, 所以在解码端解码的高频带信号的总延时为 D2毫秒。
当 D1等于 D2时, 解码的低频音频信号不需要额外延时即能和解码的高频 音频信号对齐; 但在解码端预测高频带激励信号是从对延时 D1毫秒后的低频 音频信号做时频变换得到的频域信号中得到的, 所以, 高频激励信号和低频激 励信号没有对齐, 具有 D1毫秒的错位。 解码信号相对于编码端信号总体延时 是 D1或者 D2。
当 D1不等于 D2时, 例如 D1小于 D2时, 解码信号相对于编码端信号总体延 时是 D2毫秒, 高频激励信号和低频激励信号之间的错位是 D1毫秒, 解码的低 频音频信号需要额外延时 (D2-D1)毫秒和解码的高频音频信号对齐。 如 D1大于 D2时, 这样解码信号相对于编码端信号总体延时是 D1毫秒, 高频激励信号和 低频激励信号之间的错位是 D1毫秒, 解码的高频音频信号需要额外延时 (D1-D2)毫秒和解码的低频音频信号对齐。
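零延时窗方案中上述两种情形 (D1 小于等于 D2 与 D1 大于 D2) 的延时与错位关系可以用如下示意代码归纳 (仅为按上文文字整理的草图, 函数名与返回值组织均为假设):

```python
def zero_delay_window_delays(d1, d2):
    """高频信号零延时窗方案的延时关系 (单位: 毫秒) 的示意计算。

    d1: 低频音频信号编解码延时; d2: 高频信号加窗变换产生的延时。
    返回 (总延时, 低频需额外延时, 高频需额外延时, 激励信号错位)。
    """
    if d1 <= d2:
        # 总延时为 D2, 低频需再延时 (D2-D1); D1==D2 时额外延时为 0
        return d2, d2 - d1, 0, d1
    # D1 > D2: 总延时为 D1, 高频需再延时 (D1-D2)
    return d1, 0, d1 - d2, d1
```

两种情形下高频激励信号与低频激励信号之间的错位均为 D1 毫秒, 如上文所述, 可通过对激励信号进行修正而忽略其影响。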
介于如上高频信号零延时窗和高延时窗之间的 BWE, 是编码端对当前接 收的帧的高频音频信号延时 D3毫秒后进行加窗, 该延时介于 0和 D1毫秒之间, 解码端时频变换处理用低频音频信号当前帧解码信号作为激励信号,虽然激励 信号可能会有一定错位,但是经过对激励信号进行修正,错位的影响可以忽略 不计。
当 D1等于 D2时, 解码的低频音频信号需要额外延时 D3毫秒和解码的高频 音频信号对齐; 但在解码端预测高频带激励信号是从对延时 D1毫秒后的低频 音频信号做时频变换得到的频域信号中得到的, 所以, 高频激励信号和低频激 励信号没有对齐, 具有 (D1-D3)毫秒的错位。 解码信号相对于编码端信号总体 延时是 (D2+D3)或者 (D1+D3)毫秒。
当 D1不等于 D2时, 例如 D1小于 D2时, 解码信号相对于编码端信号总体延 时是 (D2+D3)毫秒, 高频激励信号和低频激励信号之间的错位是 (D1-D3)毫秒, 解码的低频音频信号需要额外延时 (D2+D3-D1)毫秒和解码的高频音频信号对 齐。
如 D1大于 D2时, 这样解码信号相对于编码端信号总体延时是 max(D1, D2+D3)毫秒, 高频激励信号和低频激励信号之间的错位是 (D1-D3)毫秒, 其中 max(a, b)表示取 a和 b中较大的一个值。当 max(D1, D2+D3)=D2+D3时, 解码的低频音频信号需要额外延时 (D2+D3-D1)毫秒和解码的高频音频信号对齐; 当 max(D1, D2+D3)=D1时, 解码的高频音频信号需要额外延时 (D1-D2-D3)毫秒和解码的低频音频信号对齐。举一特例, 当 D3=(D1-D2)毫秒时, 解码信号相对于编码端信号总体延时是 D1毫秒, 高频激励信号和低频激励信号之间的错位是 D2毫秒, 此时解码的低频音频信号不需要额外延时即能和解码的高频音频信号对齐。
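上述介于零延时窗与高延时窗之间的中间延时窗情形 (高频先延时 D3 毫秒, 0≤D3≤D1) 的延时关系可以用如下示意代码归纳 (仅为按上文公式整理的草图, 函数名与返回值组织均为假设):

```python
def mid_delay_window_delays(d1, d2, d3):
    """中间延时窗方案的延时关系 (单位: 毫秒) 的示意计算。

    d1: 低频编解码延时; d2: 高频加窗变换延时; d3: 高频编码前附加延时。
    返回 (总延时, 低频需额外延时, 高频需额外延时, 激励信号错位)。
    """
    misalign = d1 - d3                 # 高低频激励信号错位为 (D1-D3)
    if d1 <= d2:
        # 总延时 (D2+D3), 低频需再延时 (D2+D3-D1)
        return d2 + d3, d2 + d3 - d1, 0, misalign
    total = max(d1, d2 + d3)           # D1 > D2: 总延时取较大者
    if total == d2 + d3:
        return total, d2 + d3 - d1, 0, misalign
    return total, 0, d1 - d2 - d3, misalign
```

例如取 D3=D1-D2 时, 函数给出总延时 D1、错位 D2、且低频无需额外延时, 与上文特例一致。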
所以,本发明实施例在时域带宽扩展中需要对频域带宽扩展的状态保持更 新, 因为下一帧有可能是频域带宽扩展, 同理在频域带宽扩展中需要对时域带 宽扩展的状态保持更新, 因为到下一帧有可能是时域带宽扩展, 由此通过这种 方法来实现带宽切换的连续性。
以上实施例是对于本发明音频信号编码方法的, 同样, 可以利用音频信号 处理装置来实现。图 11为本发明实施例音频信号处理装置的示意图,如图所示, 本发明实施例信号处理装置具体包括: 划分单元 11、低频信号编码单元 12和高 频信号编码单元 13。
划分单元 11用于将音频信号分为高频音频信号和低频音频信号; 低频信号编码单元 12用于根据低频音频信号的特征对所述低频音频信号利用相应的低频编码方式编码; 而编码方式可以是时域编码或频域编码方式, 例如对于语音音频信号, 利用时域编码对低频语音信号进行编码, 而对于音乐音频信号, 利用频域编码对低频音乐信号进行编码。因为通常来讲, 语音信号采用时域编码的效果比较好, 而音乐信号采用频域编码的效果比较好。
高频信号编码单元 13用于根据所述低频编码方式和 /或所述音频信号的特 征, 选择带宽扩展模式对所述高频音频信号编码。
具体的, 如果低频信号编码单元 12采用时域编码, 则高频信号编码单元 13选择时域带宽扩展模式对所述高频音频信号进行时域或频域编码; 而如果低频信号编码单元 12采用频域编码, 则高频信号编码单元 13选择频域带宽扩展模式对所述高频音频信号进行时域或频域编码。
另外, 如果音频信号/低频音频信号是语音音频信号, 则高频信号编码单元 13利用时域编码对高频语音信号进行编码; 而如果音频信号/低频音频信号是音乐音频信号, 则高频信号编码单元 13利用频域编码对高频音乐信号进行编码。此时不考虑低频音频信号的编码模式。
再有, 当低频信号编码单元 12对低频音频信号采用时域编码方式, 而且音频信号/低频音频信号为语音信号时, 高频信号编码单元 13选择时域带宽扩展模式对高频音频信号进行时域编码; 而当低频信号编码单元 12对低频音频信号采用频域编码方式, 或者低频信号编码单元 12对低频音频信号采用时域编码方式且音频信号/低频音频信号为音乐信号时, 选择频域带宽扩展模式对高频音频信号进行频域编码。
图 12为本发明实施例另一音频信号处理装置的示意图,如图所示, 本发明 实施例信号处理装置还具体包括: 低频信号解码单元 14。
低频信号解码单元 14用于对低频音频信号解码; 低频音频信号编解码产生第一延时 D1。
具体的,如果高频音频信号有延时窗时, 高频信号编码单元 13用于对高频 音频信号进行第一延时 D1后编码, 高频音频信号编码产生第二延时 D2; 使得 音频信号编解码延时是第一延时 D1和第二延时 D2之和( D1+ D2 )。
如果高频音频信号没有延时窗时, 高频信号编码单元 13用于对高频音频信号编码, 高频音频信号编码产生第二延时 D2; 当第一延时 D1小于等于第二延时 D2时, 低频信号编码单元 12对低频音频信号编码后延时第二延时 D2与第一延时 D1之差 (D2-D1), 使得音频信号编解码延时是第二延时 D2; 当第一延时 D1大于第二延时 D2时, 高频信号编码单元 13对高频音频信号编码后延时第一延时 D1与第二延时 D2之差 (D1-D2), 使得音频信号编解码延时是第一延时 D1。
如果高频音频信号为中间延时窗时, 高频信号编码单元 13用于对高频音频信号进行第三延时 D3后编码, 高频音频信号编码产生第二延时 D2; 当第一延时小于等于第二延时时, 低频信号编码单元 12对低频音频信号编码后延时第二延时 D2加第三延时 D3与第一延时 D1之差 (D2+D3-D1), 使得音频信号编解码延时是第二延时 D2和第三延时 D3之和 (D2+D3); 当第一延时大于第二延时时, 具有两种可能性: 如果第一延时 D1大于等于第二延时 D2和第三延时 D3之和 (D2+D3), 高频信号编码单元 13对高频音频信号编码后延时第一延时 D1与第二延时 D2、第三延时 D3之和的差 (D1-D2-D3); 如果第一延时 D1小于第二延时 D2和第三延时 D3之和 (D2+D3), 低频信号编码单元 12对低频音频信号编码后延时第二延时 D2加第三延时 D3与第一延时 D1之差 (D2+D3-D1); 使得音频信号编解码延时是第一延时 D1或第二延时 D2和第三延时 D3之和 (D2+D3)。
因此, 本发明实施例音频信号编码装置可以根据低频信号的编码模式和 / 或音频信号 /低频信号的特点来确定高频音频信号带宽扩展的编码方式, 避免 带宽扩展时不考虑低频信号的编码模式和音频信号 /低频音频信号的特点, 弥 补带宽扩展对不同音频信号编码质量的局限性, 实现自适应的编码,优化音频 编码质量。
专业人员应该还可以进一步意识到,结合本文中所公开的实施例描述的各 示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现, 为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地 描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决 于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用 来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范 围。
结合本文中所公开的实施例描述的方法或算法的步骤可以用硬件、处理器执行的软件模块, 或者二者的结合来实施。软件模块可以置于随机存取存储器 (RAM)、内存、只读存储器 (ROM)、电可编程 ROM、电可擦除可编程 ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
以上所述的具体实施方式,对本发明的目的、技术方案和有益效果进行了 进一步详细说明, 所应理解的是, 以上所述仅为本发明的具体实施方式而已, 并不用于限定本发明的保护范围, 凡在本发明的精神和原则之内, 所做的任何 修改、 等同替换、 改进等, 均应包含在本发明的保护范围之内。

Claims

权 利 要 求
1、 一种音频信号编码方法, 其特征在于, 所述方法包括:
将音频信号分为高频音频信号和低频音频信号;

根据低频音频信号的特征利用时域编码或频域编码方式对所述低频音频信号编码;
根据所述低频编码方式或所述音频信号的特征,选择带宽扩展模式对所述 高频音频信号编码。
2、 根据权利要求 1所述的音频信号编码方法, 其特征在于, 所述根据 所述低频编码方式,选择带宽扩展模式对所述高频音频信号编码具体为,如果 所述低频音频信号使用时域编码方式,则选择时域带宽扩展模式对所述高频音 频信号进行时域编码; 如果所述低频音频信号使用频域编码方式, 则选择频域 带宽扩展模式对所述高频音频信号进行频域编码。
3、 根据权利要求 1所述的音频信号编码方法, 其特征在于, 所述根据 所述音频信号的特征,选择带宽扩展模式对所述高频音频信号编码具体为, 所 述音频信号为语音信号时,选择时域带宽扩展模式对所述高频音频信号进行时 域编码; 所述音频信号为音乐信号时, 选择频域带宽扩展模式对所述高频音频 信号进行频域编码。
4、 根据权利要求 1所述的音频信号编码方法, 其特征在于, 所述根据 所述低频编码方式和所述音频信号的特征,选择带宽扩展模式对所述高频音频 信号编码具体为, 所述低频音频信号为时域编码方式,且所述音频信号为语音 信号, 则选择时域带宽扩展模式对所述高频音频信号进行时域编码; 否则选择 频域带宽扩展模式对所述高频音频信号进行频域编码。
5、 根据权利要求 1至 4所述的任一音频信号编码方法, 其特征在于, 还 包括:
对所述高频音频信号或低频音频信号做延时处理,使得高频音频信号和低 频音频信号在解码端的延时相同。
6、 根据权利要求 1至 5所述的任一音频信号编码方法, 其特征在于, 所述对所述高频音频信号编码具体为,对所述高频音频信号进行第一延时后编 码,使得所述音频信号编解码延时是第一延时和第二延时之和; 其中, 所述第 一延时为低频音频信号编解码产生的延时;所述第二延时为高频音频信号编码 产生的延时。
7、 根据权利要求 1至 5所述的任一音频信号编码方法, 其特征在于, 当第一延时小于等于第二延时时,对所述低频音频信号编码后延时第二延时与 第一延时之差, 使得音频信号编解码延时是第二延时; 当所述第一延时大于第 二延时时,对所述高频音频信号编码后延时第一延时与第二延时之差; 使得音 频信号编解码延时是第一延时; 其中, 所述第一延时为低频音频信号编解码产 生的延时; 所述第二延时为高频音频信号编码产生的延时。
8、 根据权利要求 1至 5所述的任一音频信号编码方法, 其特征在于, 所述对所述高频音频信号编码具体为,对所述高频音频信号进行第三延时后编 码;
当所述第一延时小于等于第二延时时,对所述低频音频信号编码后延时第 二延时和第三延时与第一延时之差, 使得音频信号编解码延时是第二延时和 第三延时之和; 当所述第一延时大于第二延时时,对所述高频音频信号编码后 延时第一延时与第二延时、第三延时和之差, 或者对所述低频音频信号编码后 延时第二延时加第三延时与第一延时之差, 使得音频信号编解码延时是第一 延时或第二延时和第三延时之和。
9、 一种音频信号编码装置, 其特征在于, 所述装置包括:
划分单元, 用于将音频信号分为高频音频信号和低频音频信号;

低频信号编码单元, 用于根据低频音频信号的特征利用时域编码或频域编码方式对所述低频音频信号编码;
高频信号编码单元, 用于根据所述低频编码方式和 /或所述音频信号的特 征, 选择带宽扩展模式对所述高频音频信号编码。
10、 根据权利要求 9所述的音频信号编码装置, 其特征在于, 所述高频 信号编码单元具体用于当所述低频音频信号使用时域编码方式,选择时域带宽 扩展模式对所述高频音频信号进行时域编码;当所述低频音频信号使用频域编 码方式, 选择频域带宽扩展模式对所述高频音频信号进行频域编码。
11、 根据权利要求 9所述的音频信号编码装置, 其特征在于, 所述音频 信号为语音信号时,所述高频信号编码单元具体用于选择时域带宽扩展模式对 所述高频音频信号进行时域编码; 所述音频信号为音乐信号时, 所述高频信号 编码单元具体用于选择频域带宽扩展模式对所述高频音频信号进行频域编码。
12、 根据权利要求 9所述的音频信号编码装置, 其特征在于, 所述低频 音频信号为时域编码方式,且所述音频信号为语音信号时, 所述高频信号编码 单元具体用于选择时域带宽扩展模式对所述高频音频信号进行时域编码,否则 选择频域带宽扩展模式对所述高频音频信号进行频域编码。
13、 根据权利要求 9至 12所述的任一音频信号编码装置, 其特征在于, 所述装置还包括:
低频信号解码单元, 用于对所述低频音频信号解码; 所述低频音频信号编 解码产生第一延时;
所述高频信号编码单元具体用于对所述高频音频信号进行第一延时后编 码, 所述高频音频信号编码产生第二延时; 使得音频信号编解码延时是第一延 时和第二延时之和。
14、 根据权利要求 9至 12所述的任一音频信号编码装置, 其特征在于, 当所述第一延时小于等于第二延时时,所述低频信号编码单元用于对所述 低频音频信号编码后延时第二延时与第一延时之差, 使得音频信号编解码延 时是第二延时; 当所述第一延时大于第二延时时, 所述高频信号编码单元用于 对所述高频音频信号编码后延时第一延时与第二延时之差;使得音频信号编解 码延时是第一延时; 其中, 所述第一延时为低频音频信号编解码产生的延时; 所述第二延时为高频音频信号编码产生的延时。
15、 根据权利要求 9至 12所述的任一音频信号编码装置, 其特征在于, 所述高频信号编码单元具体用于对所述高频音频信号进行第三延时后编 码;
当所述第一延时小于等于第二延时时,所述低频信号编码单元对所述低频 音频信号编码后延时第二延时和第三延时与第一延时之差, 使得音频信号编 解码延时是第二延时和第三延时之和; 当所述第一延时大于第二延时时, 所述 高频信号编码单元对所述高频音频信号编码后延时第一延时与第二延时、第三 延时和之差,或者所述低频信号编码单元对所述低频音频信号编码后延时第二 延时加第三延时与第一延时之差, 使得音频信号编解码延时是第一延时或第 二延时和第三延时之和; 其中, 所述第一延时为低频音频信号编解码产生的延 时; 所述第二延时为高频音频信号编码产生的延时。
PCT/CN2012/072792 2011-10-08 2012-03-22 音频信号编码方法和装置 WO2012163144A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
EP12793206.9A EP2680260A4 (en) 2011-10-08 2012-03-22 METHOD AND DEVICE FOR AUDIO SIGNAL ENCODING
KR1020137023033A KR101427863B1 (ko) 2011-10-08 2012-03-22 오디오 신호 코딩 방법 및 장치
JP2013555743A JP2014508327A (ja) 2011-10-08 2012-03-22 オーディオ信号符号化方法および装置
US14/145,632 US9251798B2 (en) 2011-10-08 2013-12-31 Adaptive audio signal coding
US15/011,824 US9514762B2 (en) 2011-10-08 2016-02-01 Audio signal coding method and apparatus
US15/341,451 US9779749B2 (en) 2011-10-08 2016-11-02 Audio signal coding method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110297791.5A CN103035248B (zh) 2011-10-08 2011-10-08 音频信号编码方法和装置
CN201110297791.5 2011-10-08

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/145,632 Continuation US9251798B2 (en) 2011-10-08 2013-12-31 Adaptive audio signal coding

Publications (1)

Publication Number Publication Date
WO2012163144A1 true WO2012163144A1 (zh) 2012-12-06

Family

ID=47258352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/072792 WO2012163144A1 (zh) 2011-10-08 2012-03-22 音频信号编码方法和装置

Country Status (6)

Country Link
US (3) US9251798B2 (zh)
EP (2) EP2680260A4 (zh)
JP (3) JP2014508327A (zh)
KR (1) KR101427863B1 (zh)
CN (1) CN103035248B (zh)
WO (1) WO2012163144A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015036348A1 (en) * 2013-09-12 2015-03-19 Dolby International Ab Time- alignment of qmf based processing data
CN112992167A (zh) * 2021-02-08 2021-06-18 歌尔科技有限公司 音频信号的处理方法、装置及电子设备
RU2772778C2 (ru) * 2013-09-12 2022-05-25 Долби Интернэшнл Аб Временное согласование данных обработки на основе квадратурного зеркального фильтра

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2762325T3 (es) * 2012-03-21 2020-05-22 Samsung Electronics Co Ltd Procedimiento y aparato de codificación/decodificación de frecuencia alta para extensión de ancho de banda
US9129600B2 (en) * 2012-09-26 2015-09-08 Google Technology Holdings LLC Method and apparatus for encoding an audio signal
FR3008533A1 (fr) 2013-07-12 2015-01-16 Orange Facteur d'echelle optimise pour l'extension de bande de frequence dans un decodeur de signaux audiofrequences
CN103413553B (zh) * 2013-08-20 2016-03-09 腾讯科技(深圳)有限公司 音频编码方法、音频解码方法、编码端、解码端和系统
CN104517611B (zh) * 2013-09-26 2016-05-25 华为技术有限公司 一种高频激励信号预测方法及装置
CN110619884B (zh) * 2014-03-14 2023-03-07 瑞典爱立信有限公司 音频编码方法和装置
CN104269173B (zh) * 2014-09-30 2018-03-13 武汉大学深圳研究院 切换模式的音频带宽扩展装置与方法
US10638227B2 (en) 2016-12-02 2020-04-28 Dirac Research Ab Processing of an audio input signal
US11032580B2 (en) 2017-12-18 2021-06-08 Dish Network L.L.C. Systems and methods for facilitating a personalized viewing experience
US10365885B1 (en) 2018-02-21 2019-07-30 Sling Media Pvt. Ltd. Systems and methods for composition of audio content from multi-object audio
WO2021258350A1 (zh) * 2020-06-24 2021-12-30 华为技术有限公司 一种音频信号处理方法和装置
CN112086102B (zh) * 2020-08-31 2024-04-16 腾讯音乐娱乐科技(深圳)有限公司 扩展音频频带的方法、装置、设备以及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040064311A1 (en) * 2002-10-01 2004-04-01 Deepen Sinha Efficient coding of high frequency signal information in a signal using a linear/non-linear prediction model based on a low pass baseband
CN1498396A (zh) * 2002-01-30 2004-05-19 ���µ�����ҵ��ʽ���� 音频编码与解码设备及其方法
US20050108009A1 (en) * 2003-11-13 2005-05-19 Mi-Suk Lee Apparatus for coding of variable bitrate wideband speech and audio signals, and a method thereof
CN1942928A (zh) * 2004-04-15 2007-04-04 诺基亚公司 音频信号编码
US20090107322A1 (en) * 2007-10-25 2009-04-30 Yamaha Corporation Band Extension Reproducing Apparatus
CN101572087A (zh) * 2008-04-30 2009-11-04 北京工业大学 嵌入式语音或音频信号编解码方法和装置
CN101896968A (zh) * 2007-11-06 2010-11-24 诺基亚公司 音频编码装置及其方法

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6134518A (en) * 1997-03-04 2000-10-17 International Business Machines Corporation Digital audio signal coding using a CELP coder and a transform coder
DE60117471T2 (de) * 2001-01-19 2006-09-21 Koninklijke Philips Electronics N.V. Breitband-signalübertragungssystem
JP4308229B2 (ja) * 2001-11-14 2009-08-05 パナソニック株式会社 符号化装置および復号化装置
DE602004027750D1 (de) * 2003-10-23 2010-07-29 Panasonic Corp Spektrum-codierungseinrichtung, spektrum-decodierungseinrichtung, übertragungseinrichtung für akustische signale, empfangseinrichtung für akustische signale und verfahren dafür
EP2752849B1 (en) 2004-11-05 2020-06-03 Panasonic Intellectual Property Management Co., Ltd. Encoder and encoding method
KR100707174B1 (ko) 2004-12-31 2007-04-13 삼성전자주식회사 광대역 음성 부호화 및 복호화 시스템에서 고대역 음성부호화 및 복호화 장치와 그 방법
US9043214B2 (en) * 2005-04-22 2015-05-26 Qualcomm Incorporated Systems, methods, and apparatus for gain factor attenuation
US8010352B2 (en) * 2006-06-21 2011-08-30 Samsung Electronics Co., Ltd. Method and apparatus for adaptively encoding and decoding high frequency band
KR101390188B1 (ko) * 2006-06-21 2014-04-30 삼성전자주식회사 적응적 고주파수영역 부호화 및 복호화 방법 및 장치
CN101140759B (zh) 2006-09-08 2010-05-12 华为技术有限公司 语音或音频信号的带宽扩展方法及系统
KR101373004B1 (ko) * 2007-10-30 2014-03-26 삼성전자주식회사 고주파수 신호 부호화 및 복호화 장치 및 방법
KR100970446B1 (ko) 2007-11-21 2010-07-16 한국전자통신연구원 주파수 확장을 위한 가변 잡음레벨 결정 장치 및 그 방법
JP5108960B2 (ja) * 2008-03-04 2012-12-26 エルジー エレクトロニクス インコーポレイティド オーディオ信号処理方法及び装置
KR20100006492A (ko) * 2008-07-09 2010-01-19 삼성전자주식회사 부호화 방식 결정 방법 및 장치
ES2372014T3 (es) * 2008-07-11 2012-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Aparato y método para calcular datos de ampliación de ancho de banda utilizando un encuadre controlado por pendiente espectral.
KR101261677B1 (ko) * 2008-07-14 2013-05-06 광운대학교 산학협력단 음성/음악 통합 신호의 부호화/복호화 장치
EP2239732A1 (en) * 2009-04-09 2010-10-13 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Apparatus and method for generating a synthesis audio signal and for encoding an audio signal
JP5754899B2 (ja) * 2009-10-07 2015-07-29 ソニー株式会社 復号装置および方法、並びにプログラム

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1498396A (zh) * 2002-01-30 2004-05-19 ���µ�����ҵ��ʽ���� 音频编码与解码设备及其方法
US20040064311A1 (en) * 2002-10-01 2004-04-01 Deepen Sinha Efficient coding of high frequency signal information in a signal using a linear/non-linear prediction model based on a low pass baseband
US20050108009A1 (en) * 2003-11-13 2005-05-19 Mi-Suk Lee Apparatus for coding of variable bitrate wideband speech and audio signals, and a method thereof
CN1942928A (zh) * 2004-04-15 2007-04-04 诺基亚公司 音频信号编码
US20090107322A1 (en) * 2007-10-25 2009-04-30 Yamaha Corporation Band Extension Reproducing Apparatus
CN101896968A (zh) * 2007-11-06 2010-11-24 诺基亚公司 音频编码装置及其方法
CN101572087A (zh) * 2008-04-30 2009-11-04 北京工业大学 嵌入式语音或音频信号编解码方法和装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015036348A1 (en) * 2013-09-12 2015-03-19 Dolby International Ab Time- alignment of qmf based processing data
CN105637584A (zh) * 2013-09-12 2016-06-01 杜比国际公司 基于qmf的处理数据的时间对齐
RU2665281C2 (ru) * 2013-09-12 2018-08-28 Долби Интернэшнл Аб Временное согласование данных обработки на основе квадратурного зеркального фильтра
US10510355B2 (en) 2013-09-12 2019-12-17 Dolby International Ab Time-alignment of QMF based processing data
CN105637584B (zh) * 2013-09-12 2020-03-03 杜比国际公司 基于qmf的处理数据的时间对齐
US10811023B2 (en) 2013-09-12 2020-10-20 Dolby International Ab Time-alignment of QMF based processing data
RU2772778C2 (ru) * 2013-09-12 2022-05-25 Долби Интернэшнл Аб Временное согласование данных обработки на основе квадратурного зеркального фильтра
CN112992167A (zh) * 2021-02-08 2021-06-18 歌尔科技有限公司 音频信号的处理方法、装置及电子设备

Also Published As

Publication number Publication date
US20160148622A1 (en) 2016-05-26
KR101427863B1 (ko) 2014-08-07
EP2680260A4 (en) 2014-09-03
US9514762B2 (en) 2016-12-06
US9779749B2 (en) 2017-10-03
CN103035248A (zh) 2013-04-10
CN103035248B (zh) 2015-01-21
JP2015172778A (ja) 2015-10-01
EP3239980A1 (en) 2017-11-01
JP2014508327A (ja) 2014-04-03
JP2017187790A (ja) 2017-10-12
KR20130126695A (ko) 2013-11-20
US9251798B2 (en) 2016-02-02
US20140114670A1 (en) 2014-04-24
EP2680260A1 (en) 2014-01-01
US20170053661A1 (en) 2017-02-23

Similar Documents

Publication Publication Date Title
WO2012163144A1 (zh) 音频信号编码方法和装置
CA3033225C (en) Multi-channel signal encoding method and encoder
JP5426680B2 (ja) 信号処理方法及び装置
KR101680953B1 (ko) 인지 오디오 코덱들에서의 고조파 신호들에 대한 위상 코히어런스 제어
JP2013242579A (ja) ピッチ調整コーディング及び非ピッチ調整コーディングを使用する信号符号化
WO2009076871A1 (zh) 带宽扩展中激励信号的生成及信号重建方法和装置
WO2013044826A1 (zh) 一种下混信号生成、还原的方法和装置
WO2015007114A1 (zh) 解码方法和解码装置
US20190147895A1 (en) Coding of multiple audio signals
JP6768824B2 (ja) マルチチャンネルコーディング
US20220059099A1 (en) Method and apparatus for controlling multichannel audio frame loss concealment
JP7178506B2 (ja) 位相ecu f0補間スプリットのための方法および関係するコントローラ
EP3577647B1 (en) Multi channel decoding
JP7420829B2 (ja) 予測コーディングにおける低コスト誤り回復のための方法および装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12793206

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20137023033

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2013555743

Country of ref document: JP

Kind code of ref document: A

REEP Request for entry into the european phase

Ref document number: 2012793206

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012793206

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE