KR20110081291A

KR20110081291A - Multi-resolution switched audio encoding/decoding scheme

Info

Publication number: KR20110081291A
Application number: KR1020117010644A
Authority: KR
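Inventors: Max Neuendorf; Stefan Bayer; Jérémie Lecomte; Guillaume Fuchs; Julien Robilliard; Nikolaus Rettelbach; Frederik Nagel; Ralf Geiger; Markus Multrus; Bernhard Grill; Philippe Gournay; Redwan Salami
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.; VoiceAge Corporation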
Priority date: 2008-10-08
Filing date: 2009-10-07
Publication date: 2011-07-13
Also published as: EP2345030A2; TWI419148B; CA2739736C; ZA201102537B; MX2011003824A; RU2011117699A; KR20130133917A; JP5555707B2; EP3640941A1; CN102177426A; AU2009301358A1; TWI520128B; CA2739736A1; BRPI0914056A2; WO2010040522A2; JP2012505423A; AU2009301358A8; KR20130069833A; TW201344679A; TW201142827A

Abstract

An audio encoder for encoding an audio signal comprises a first coding branch 400, which includes a first converter 410 for transforming the signal from the time domain into the frequency domain. The audio encoder further comprises a second coding branch 500, which includes a second time/frequency converter 523. In addition, a signal analyzer 300/525 for analyzing the audio signal is provided. On the one hand, the signal analyzer determines whether an audio portion is represented in the encoder output signal as the first encoded signal from the first coding branch or as the second encoded signal from the second coding branch. On the other hand, the signal analyzer determines the time/frequency resolution to be applied by the converters 410, 523 when generating the encoded signals. In addition to the first encoded signal and the second encoded signal, the output interface includes resolution information identifying the resolution used by the first time/frequency converter and the resolution used by the second time/frequency converter.

Description

MULTI-RESOLUTION SWITCHED AUDIO ENCODING/DECODING SCHEME

The present invention relates to audio coding and, in particular, to low-bit-rate audio coding schemes.

Frequency-domain coding schemes such as MP3 or AAC are known in the art. These frequency-domain encoders are based on a time-domain/frequency-domain transform, a subsequent quantization stage in which the quantization error is controlled using information from a perceptual module, and an encoding stage in which the quantized spectral coefficients and the corresponding side information are entropy-encoded using code tables.
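The quantization stage described above can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer, whose noise power is approximately step²/12, so choosing step = sqrt(12·T) for a per-band allowed noise power T (which a perceptual model would supply) keeps the quantization noise near that threshold. The band layout, the threshold values, and all function names are hypothetical; AAC itself uses scale factors and a non-uniform power-law quantizer, which this sketch does not reproduce.

```python
import math

def step_size_from_threshold(allowed_noise_power):
    # Uniform quantization noise is roughly step**2 / 12; solve for the step.
    return math.sqrt(12.0 * allowed_noise_power)

def quantize_bands(coeffs, band_edges, thresholds):
    """Quantize spectral coefficients band by band.

    coeffs:     spectral coefficients of one block
    band_edges: band boundaries, e.g. [0, 2, 4] for two bands of two bins
    thresholds: hypothetical allowed noise power per band (perceptual model output)
    Returns the integer indices (input to the entropy coder) and the per-band
    step sizes (which would be transmitted as side information).
    """
    indices, steps = [], []
    for b, t in enumerate(thresholds):
        step = step_size_from_threshold(t)
        steps.append(step)
        for k in range(band_edges[b], band_edges[b + 1]):
            indices.append(round(coeffs[k] / step))
    return indices, steps

def dequantize_bands(indices, band_edges, steps):
    # Decoder side: scale each integer index back by its band's step size.
    out = []
    for b, step in enumerate(steps):
        for k in range(band_edges[b], band_edges[b + 1]):
            out.append(indices[k] * step)
    return out

idx, steps = quantize_bands([0.4, 1.6, -2.2, 3.2], [0, 2, 4], [1 / 12, 4 / 12])
print(idx)  # [0, 2, -1, 2]
```

A band with a higher masking threshold gets a coarser step, so fewer distinct indices (and hence fewer entropy-coded bits) are needed there.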

On the other hand, there are encoders that are very well suited to speech processing, such as AMR-WB+ as described in 3GPP TS 26.290. Such speech coding schemes perform linear predictive filtering of the time-domain signal. The LP filter is derived from a linear prediction analysis of the input time-domain signal. The resulting LP filter coefficients are then quantized/coded and transmitted as side information. This process is known as Linear Prediction Coding (LPC). At the output of the filter, the prediction residual signal or prediction error signal, also known as the excitation signal, is encoded either with the analysis-by-synthesis stages of an ACELP encoder or, alternatively, with a transform encoder that uses a Fourier transform with overlap. The decision between ACELP coding and Transform Coded Excitation coding, also called TCX, is made using a closed-loop or an open-loop algorithm.
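The LPC analysis and the resulting excitation (prediction residual) can be sketched as follows. This assumes the autocorrelation method solved with the Levinson-Durbin recursion, a standard way to obtain LP coefficients that the text above does not prescribe, and the predictor convention x̂[n] = Σ a_k·x[n−k]. Analysis windowing, lag windowing, and coefficient quantization, which a real speech codec such as AMR-WB+ applies, are omitted.

```python
def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] from autocorrelation r.

    Returns (a, err) where x_hat[n] = sum(a[k-1] * x[n-k]) and err is the
    final prediction error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def lpc_residual(x, a):
    # Analysis (whitening) filter: e[n] = x[n] - x_hat[n]; this is the excitation.
    p = len(a)
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]

def lpc_synthesis(e, a):
    # Decoder-side synthesis filter: rebuilds x from the excitation and a.
    x = []
    for n in range(len(e)):
        pred = sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        x.append(e[n] + pred)
    return x
```

For a first-order decaying signal such as [1, 0.5, 0.25, 0.125] and a = [0.5], the residual is [1, 0, 0, 0]: the predictable part is removed and only the "new" information at the onset remains, which is what the ACELP or TCX stage then codes.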

Frequency-domain audio coding schemes such as High Efficiency AAC (HE-AAC), which combines AAC coding with the Spectral Band Replication (SBR) technique, can also be combined with a joint stereo or multi-channel coding tool known under the name "MPEG Surround".

On the other hand, speech encoders such as AMR-WB+ also have a high-frequency extension stage and stereo functionality.

Frequency-domain coding schemes are advantageous in that they deliver high quality for music signals at low bit rates. However, the quality of speech signals at low bit rates is problematic.

Speech coding schemes deliver high quality for speech signals even at low bit rates, but poor quality for other signals at low bit rates.

It is an object of the present invention to provide an improved encoding/decoding concept.

This object is achieved by an audio encoder according to claim 1, an audio encoding method according to claim 9, a decoder according to claim 10, a decoding method according to claim 19, an encoded signal according to claim 20, or a computer program according to claim 21.

The present invention is based on the finding that a hybrid or dual-mode switched encoding/decoding scheme is advantageous because the best coding algorithm can always be selected for a particular signal characteristic. In other words, the present invention does not look for a single signal coding algorithm that perfectly matches all signal characteristics. Such a scheme would always be a compromise, as can be seen from the large differences between state-of-the-art audio encoders on the one hand and speech encoders on the other hand. Instead, the present invention combines different coding algorithms, such as a speech coding algorithm on the one hand and an audio coding algorithm on the other hand, in a switched scheme, so that the best-suited coding algorithm is selected for each audio signal portion. It is a further feature of the present invention that both coding branches include a time/frequency converter, while one coding branch additionally provides a domain converter such as an LPC processor. This domain converter makes the second coding branch better suited than the first coding branch to certain signal characteristics. Nevertheless, it is also a feature of the present invention that the signal output by the domain processor is transformed into a spectral representation as well.

To obtain an optimal trade-off between quality on the one hand and bit rate on the other hand, i.e., a low bit rate for a given fixed quality or a high quality for a given fixed bit rate, both converters, i.e., the first converter in the first coding branch and the second converter in the second coding branch, are configured to apply multi-resolution transform coding. The resolution of each converter is set depending on the audio signal, in particular on the audio signal actually coded in the corresponding coding branch.

According to the invention, the time/frequency resolutions of the two converters can preferably be set independently of each other, so that each time/frequency transformer can be optimally matched to the time/frequency resolution requirements of its signal. The ratio between useful bits on the one hand and side information bits on the other hand, i.e., the bit efficiency, is higher for longer block sizes/window lengths. It is therefore preferred that both converters be biased toward longer window lengths, because the same amount of side information then covers a longer time portion of the audio signal than with shorter block sizes/window lengths/transform lengths. Preferably, the time/frequency resolution in the coding branches may also be influenced by other encoding/decoding tools located in these branches. Preferably, the second coding branch, which includes a domain converter such as an LPC processor, contains a further hybrid scheme, with an ACELP branch on the one hand and a TCX scheme on the other hand, the second converter being included in the TCX scheme. Preferably, the resolution of the time/frequency converter in the TCX branch is also influenced by the coding decision, which determines whether a portion of the signal in the second coding branch is processed in the TCX branch, which includes the second converter, or in the ACELP branch, which includes no time/frequency converter.
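The bit-efficiency argument above can be made concrete with a small calculation. Assume, hypothetically, that every transform block carries a fixed number of side-information bits regardless of its length, and that the transform advances by a hop of N new samples at sampling rate fs. The side-information cost is then side_bits · fs / N bits per second, so it grows as the blocks get shorter. Real codecs share some side information across grouped short blocks, which this simplified model ignores; the 40-bit and 24 kbit/s figures are illustrative only.

```python
def side_info_rate(side_bits_per_block, hop_samples, sample_rate):
    """Bits per second spent on side information, assuming fixed overhead per block."""
    blocks_per_second = sample_rate / hop_samples
    return side_bits_per_block * blocks_per_second

def bit_efficiency(total_rate, side_bits_per_block, hop_samples, sample_rate):
    """Fraction of the total bit rate left for useful (e.g. spectral) bits."""
    side = side_info_rate(side_bits_per_block, hop_samples, sample_rate)
    return (total_rate - side) / total_rate

# Hypothetical numbers: 40 side bits per block, 48 kHz, 24 kbit/s total.
for hop in (1024, 128):
    print(hop, side_info_rate(40, hop, 48000), bit_efficiency(24000, 40, hop, 48000))
# 1024 -> 1875 bit/s of side info (efficiency 0.921875)
# 128  -> 15000 bit/s of side info (efficiency 0.375)
```

Under these assumptions the long block leaves over 92 % of the bit budget for useful bits, while the short block leaves only 37.5 %, which is why both converters are biased toward long windows.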

Basically, neither the domain converter nor the second coding branch, in particular neither the first processing branch nor the second processing branch within the second coding branch, has to consist of speech-related elements such as an LPC analyzer in the domain converter, a TCX encoder in the second processing branch, and an ACELP encoder in the first processing branch. Other applications are also useful when signal characteristics of an audio signal other than speech on the one hand and music on the other hand are evaluated. Any domain converter and coding branch implementations can be used, and the best-matching algorithm can be found by an analysis-by-synthesis scheme: on the encoder side, all coding alternatives are performed for each portion of the audio signal, and the best result is selected by applying a target function to the encoding results. Side information that tells the decoder which encoding algorithm underlies a certain portion of the encoded audio signal is then attached to the encoded audio signal by the encoder output interface. The decoder therefore does not have to care about any decisions on the encoder side or about any signal characteristics, but simply selects its decoding branch based on the transmitted side information. Moreover, based on the encoded side information in the encoded signal, the decoder not only selects the correct decoding branch but also selects which time/frequency resolution to apply in the corresponding first decoding branch and the corresponding second decoding branch.
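The closed-loop (analysis-by-synthesis) selection described above can be sketched as a loop over all candidate encoders: each candidate encodes the portion, the result is decoded locally, and a target function scores it. As an assumption, the target function here is a Lagrangian cost (squared error plus a hypothetical weight `lam` times the bit count); the text only says that "a target function" is applied. The two toy quantizers stand in for real branch/resolution combinations such as AAC-style or TCX coding, which are not modeled.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# An encoder returns (bits used, locally decoded signal).
Encoder = Callable[[Sequence[float]], Tuple[int, List[float]]]

def squared_error(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))

def closed_loop_select(portion: Sequence[float],
                       candidates: Dict[str, Encoder],
                       lam: float) -> Tuple[str, float]:
    """Run every candidate, decode locally, and keep the one with the lowest cost."""
    best_label, best_cost = "", float("inf")
    for label, encode in candidates.items():
        bits, decoded = encode(portion)
        cost = squared_error(portion, decoded) + lam * bits   # hypothetical target function
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label, best_cost

def make_toy_encoder(step: float, bits_per_sample: int) -> Encoder:
    # Hypothetical stand-in for a coding branch: uniform quantization only.
    def encode(x: Sequence[float]) -> Tuple[int, List[float]]:
        decoded = [round(v / step) * step for v in x]
        return bits_per_sample * len(x), decoded
    return encode

candidates = {"coarse": make_toy_encoder(1.0, 2), "fine": make_toy_encoder(0.25, 4)}
portion = [0.3, -1.2, 2.6, 0.9]
print(closed_loop_select(portion, candidates, lam=0.01)[0])  # fine
print(closed_loop_select(portion, candidates, lam=0.1)[0])   # coarse
```

The winning label is exactly what the output interface would signal as side information, so the decoder can pick the matching branch without repeating the analysis.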

Thus, the present invention provides an encoding/decoding scheme that combines the advantages of all the different coding algorithms and avoids the disadvantages that arise when a signal portion is encoded by an algorithm that does not suit it. Furthermore, the present invention avoids the disadvantages that would arise if the different time/frequency resolution requirements of the different audio signal portions in the different coding branches were not taken into account. Instead, owing to the variable time/frequency resolution of the time/frequency converters in both branches, artifacts that would occur if the same time/frequency resolution were applied to both coding branches, or if only a fixed time/frequency resolution were available for a coding branch, are at least reduced or even completely avoided.

The second switch again decides between two processing branches, but in a domain different from that of the "outer" first branch. Again, one "inner" branch is mainly driven by a source model or by SNR calculations, and the other "inner" branch is driven by a sink model and/or a psychoacoustic model, i.e., by masking, or at least includes frequency/spectral-domain coding aspects. By way of example, one "inner" branch has a frequency-domain encoder/spectral converter, and the other branch has an encoder coding in the other domain, such as the LPC domain, this encoder being, for example, a CELP or ACELP quantizer/scaler that processes the input signal without a spectral conversion.

A further preferred embodiment is an audio encoder comprising a first, information-sink-oriented coding branch, such as a spectral-domain coding branch; a second, information-source- or SNR-oriented coding branch, such as an LPC-domain coding branch; and a switch for switching between the first coding branch and the second coding branch. The second coding branch comprises a converter into a specific domain different from the time domain, such as an LPC analysis stage generating an excitation signal. The second coding branch further comprises a specific-domain processing branch, such as an LPC-domain processing branch, a specific-spectral-domain processing branch, such as an LPC-spectral-domain processing branch, and an additional switch for switching between the specific-domain coding branch and the specific-spectral-domain coding branch.

A further embodiment of the invention comprises a first-domain decoding branch, such as a spectral-domain decoding branch; a second-domain decoding branch, such as an LPC-domain decoding branch, for decoding a signal such as an excitation signal in the second domain; and a third-domain decoding branch, such as an LPC-spectral decoding branch, for decoding a signal such as an excitation signal in a third domain such as the LPC spectral domain, the third domain being obtained by performing a frequency conversion from the second domain. A first switch is provided for the second-domain signal and the third-domain signal, and a second switch is provided for switching between the first-domain decoder and the decoder for the second domain or the third domain.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

The best coding algorithm can be selected for a specific signal characteristic, and an improved encoding/decoding concept is provided.

FIG. 1A is a block diagram of an encoding scheme in accordance with a first aspect of the present invention.
FIG. 1B is a block diagram of a decoding scheme in accordance with the first aspect of the present invention.
FIG. 1C is a block diagram of an encoding scheme in accordance with a further aspect of the present invention.
FIG. 2A is a block diagram of an encoding scheme in accordance with a second aspect of the present invention.
FIG. 2B is a schematic diagram of a decoding scheme in accordance with the second aspect of the present invention.
FIG. 2C is a block diagram of an encoding scheme in accordance with a further aspect of the present invention.
FIG. 3A is a block diagram of an encoding scheme in accordance with a further aspect of the present invention.
FIG. 3B is a block diagram of a decoding scheme in accordance with a further aspect of the present invention.
FIG. 3C is a schematic representation of an encoding apparatus/method with cascaded switches.
FIG. 3D is a schematic diagram of an apparatus or method for decoding in which cascaded combiners are used.
FIG. 3E illustrates a time-domain signal and the corresponding representation of the encoded signals, showing a short cross-fade region included in both encoded signals.
FIG. 4A is a block diagram with a switch positioned before the encoding branches.
FIG. 4B is a block diagram of an encoding scheme with the switch positioned after the encoding branches.
FIG. 5A shows the waveform of a time-domain speech segment as a quasi-periodic or impulse-like signal segment.
FIG. 5B shows the spectrum of the segment of FIG. 5A.
FIG. 5C shows a time-domain speech segment of unvoiced speech as an example of a noise-like segment.
FIG. 5D shows the spectrum of the time-domain waveform of FIG. 5C.
FIG. 6 is a block diagram of an analysis-by-synthesis CELP encoder.
FIGS. 7A to 7D show voiced/unvoiced excitation signals as examples of impulse-like signals.
FIG. 7E shows an encoder-side LPC stage providing short-term prediction information and the prediction error (excitation) signal.
FIG. 7F shows a further embodiment of an LPC device for generating a weighted signal.
FIG. 7G shows an implementation for transforming a weighted signal into an excitation signal by applying an inverse weighting operation and a subsequent excitation analysis, as required in the converter 537 of FIG. 2B.
FIG. 8 is a block diagram of a joint multi-channel algorithm in accordance with an embodiment of the present invention.
FIG. 9 shows a preferred embodiment of a bandwidth extension algorithm.
FIG. 10A shows details of the switch when performing an open-loop decision.
FIG. 10B illustrates the switch when performing a closed-loop decision.
FIG. 11A is a block diagram of an audio encoder in accordance with another embodiment of the present invention.
FIG. 11B is a block diagram of another embodiment of the inventive audio decoder.
FIG. 12A shows another embodiment of the inventive encoder.
FIG. 12B shows another embodiment of the inventive decoder.
FIG. 13A shows the interrelation between resolution and window/transform length.
FIG. 13B shows an overview of a set of transform windows for the first coding branch and of a transition from the first coding branch to the second coding branch.
FIG. 13C shows a plurality of different window sequences, including window sequences for the first coding branch and a sequence for a transition to the second branch.
FIG. 14A shows the framing of a preferred embodiment of the second coding branch.
FIG. 14B shows a short window applied in the second coding branch.
FIG. 14C shows a medium-sized window applied in the second coding branch.
FIG. 14D shows a long window applied by the second coding branch.
FIG. 14E shows an exemplary sequence of ACELP and TCX frames within a superframe.
FIG. 14F shows different transform lengths corresponding to different time/frequency resolutions in the second coding branch.
FIG. 14G shows the construction of a window using the definitions of FIG. 14F.

FIG. 11A shows an embodiment of an audio encoder for encoding an audio signal. The encoder comprises a first coding branch 400 for encoding the audio signal using a first coding algorithm to obtain a first encoded signal.

The audio encoder further comprises a second coding branch 500 for encoding the audio signal using a second coding algorithm to obtain a second encoded signal. The first coding algorithm is different from the second coding algorithm. Additionally, a first switch 200 is provided for switching between the first coding branch and the second coding branch so that, for a portion of the audio signal, either the first encoded signal or the second encoded signal is in the encoder output signal 801.

The audio encoder shown in FIG. 11A further comprises a signal analyzer 300/525 configured to analyze a portion of the audio signal in order to determine whether that portion is represented in the encoder output signal 801 as the first encoded signal or as the second encoded signal.

The signal analyzer 300/525 is further configured to variably determine the respective time/frequency resolution of the first converter 410 in the first coding branch 400 or of the second converter 523 in the second coding branch 500. This time/frequency resolution is applied when the first encoded signal or the second encoded signal representing the portion of the audio signal is generated.

The audio encoder further comprises an output interface 800 for generating the encoder output signal 801. The encoder output signal 801 comprises an encoded representation of the portion of the audio signal, together with information indicating whether this representation is the first encoded signal or the second encoded signal and indicating the time/frequency resolution used for decoding the first encoded signal and the second encoded signal.
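The side information written by the output interface 800 can be pictured as a small per-portion header. The sketch below assumes a purely hypothetical layout (one bit for the coding branch, two bits indexing a per-branch table of transform lengths); it is not the actual bitstream syntax of this patent or of any standard, and the table values are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical per-portion header layout (not a real bitstream syntax):
#   bit 2     : coding branch (0 = first branch 400, 1 = second branch 500)
#   bits 1..0 : index into that branch's table of transform lengths
RESOLUTION_TABLES = {
    0: [2048, 256],         # illustrative lengths for the first branch
    1: [256, 512, 1024],    # illustrative lengths for the second branch
}

@dataclass
class PortionHeader:
    branch: int
    resolution_index: int

    def pack(self) -> int:
        """Encoder side: combine branch and resolution into a 3-bit code."""
        if self.branch not in RESOLUTION_TABLES:
            raise ValueError("unknown branch")
        if not 0 <= self.resolution_index < len(RESOLUTION_TABLES[self.branch]):
            raise ValueError("resolution index out of range")
        return (self.branch << 2) | self.resolution_index

    @staticmethod
    def unpack(code: int) -> "PortionHeader":
        """Decoder side: recover branch and resolution without any signal analysis."""
        return PortionHeader(branch=(code >> 2) & 1, resolution_index=code & 0b11)

    def transform_length(self) -> int:
        return RESOLUTION_TABLES[self.branch][self.resolution_index]

code = PortionHeader(branch=1, resolution_index=2).pack()
header = PortionHeader.unpack(code)
print(code, header.branch, header.transform_length())  # 6 1 1024
```

The point of the design is that the decoder reads both decisions, branch and resolution, directly from the header, so it never has to repeat the encoder's analysis.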

The second coding branch preferably differs from the first coding branch in that it additionally comprises a domain converter for converting the audio signal from the domain in which the audio signal is processed in the first coding branch into a different domain. Preferably, the domain converter is an LPC processor 510, but it can be implemented in any other way, as long as the domain converter is different from the first converter 410 and the second converter 523.

The first converter 410 is preferably a time/frequency converter comprising a windower 410a and a transformer 410b. The windower 410a applies an analysis window to the input audio signal, and the transformer 410b converts the windowed signal into a spectral representation.
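A windower followed by a transformer can be sketched as a sine-windowed MDCT with 50 % overlap. This assumes the sine window and a direct O(N²) MDCT, both choices for illustration only; the text merely says the transformer is preferably an MDCT processor. The inverse uses a 2/N scaling so that, with the sine window satisfying w[n]² + w[n+N]² = 1, overlap-add cancels the time-domain aliasing and reconstructs the input exactly wherever two frames overlap.

```python
import math

def sine_window(two_n):
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(block):
    """2N windowed samples -> N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    return [sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]

def imdct(coeffs):
    """N coefficients -> 2N aliased samples (2/N scaling for sine-window TDAC)."""
    n_half = len(coeffs)
    return [2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                               for k in range(n_half))
            for n in range(2 * n_half)]

def analyze(signal, n_half):
    """Windower + transformer: frames of 2*n_half samples, hop n_half."""
    w = sine_window(2 * n_half)
    frames = []
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [w[i] * signal[start + i] for i in range(2 * n_half)]  # windower
        frames.append(mdct(block))                                      # transformer
    return frames

def synthesize(frames, n_half, length):
    """Inverse transform, synthesis window, and overlap-add."""
    w = sine_window(2 * n_half)
    out = [0.0] * length
    for f, coeffs in enumerate(frames):
        y = imdct(coeffs)
        for i in range(2 * n_half):
            out[f * n_half + i] += w[i] * y[i]
    return out

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(24)]
rec = synthesize(analyze(signal, 4), 4, len(signal))
```

Only samples covered by two overlapping frames (here indices 4 to 19) are reconstructed exactly; a practical encoder would pad the signal at both ends. A fast FFT-based MDCT would replace the direct sums in any real implementation.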

Similarly, the second converter 523 preferably comprises a windower 523a followed by a transformer 523b. The windower 523a receives the signal output by the domain converter 510 and outputs a windowed version of that signal. The result of applying one analysis window in the windower 523a is input into the transformer 523b to form a spectral representation. The transformer can be an FFT processor or, preferably, an MDCT processor implementing the corresponding algorithm in software, hardware, or a mixed hardware/software implementation. Alternatively, the transformer can be a filterbank implementation such as a QMF filterbank, which may be based on a real-valued or complex modulation of a prototype filter. Some filterbank implementations apply a window. Other filterbank implementations, however, do not need the windowing required by FFT- or MDCT-based transform algorithms. When a filterbank implementation is used, the filterbank is a variable-resolution filterbank, and the resolution control adjusts the frequency resolution of the filterbank and, additionally, its time resolution, or only the frequency resolution and not the time resolution. When the converter is implemented as an FFT, an MDCT, or any other corresponding transformer, however, the frequency resolution is coupled to the time resolution: the increase in frequency resolution obtained by a longer block length in time automatically corresponds to a lower time resolution, and vice versa.
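The coupling between time and frequency resolution for an MDCT-type transformer can be shown with a worked calculation. For an MDCT that produces M coefficients per hop of M samples at sampling rate fs, the bin spacing is fs/(2M) Hz and the time step is M/fs seconds, so their product is fixed at 1/2: doubling one halves the other. The 1024/128 coefficient counts used below correspond to the AAC long and short blocks and are illustrative only.

```python
def mdct_resolution(num_coeffs, sample_rate):
    """Return (bin spacing in Hz, time step in ms) of an MDCT with num_coeffs bins per hop."""
    freq_spacing_hz = sample_rate / (2.0 * num_coeffs)
    time_step_ms = 1000.0 * num_coeffs / sample_rate
    return freq_spacing_hz, time_step_ms

for m in (1024, 128):
    f, t = mdct_resolution(m, 48000)
    print(m, f, round(t, 3), f * t / 1000.0)
# 1024 -> 23.4375 Hz bins, 21.333 ms steps, product 0.5
# 128  -> 187.5 Hz bins,   2.667 ms steps, product 0.5
```

A variable-resolution filterbank, as mentioned above, is not bound by this fixed product in the same way, since its frequency resolution can be adjusted separately.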

Additionally, the first coding branch may comprise a quantizer/coder stage 421, and the second coding branch may also comprise one or more further coding tools 524.

Importantly, the signal analyzer is configured to generate resolution control signals for the first converter 410 and the second converter 523. Resolution is therefore controlled independently in both coding branches. The aim is a coding scheme that provides a low bit rate on the one hand and, on the other, the highest possible quality at that low bit rate. Longer window lengths or longer transform lengths are preferred to reach the low-bit-rate target. Where such long lengths cause artifacts because of their low time resolution, shorter window and transform lengths are applied instead, at the cost of lower frequency resolution. Preferably, the signal analyzer applies a statistical analysis, or any other analysis suited to the corresponding algorithm in each coding branch. Consider an implementation in which the first coding branch is a frequency-domain branch, such as an AAC-based encoder, and the second coding branch contains an LPC processor 510 as its domain converter. Here the signal analyzer performs a speech/music discrimination and controls switch 200 so that speech portions of the audio signal are fed into the second coding branch. Music portions are fed into the first coding branch 400, with switch 200 controlled accordingly as indicated by the switch control line. Alternatively, the switch may be located before the output interface 800, as discussed below with respect to Figs. 1c and 4b.

The signal analyzer may receive either the audio signal input into switch 200 or the audio signal output by switch 200. Its analysis serves two purposes. First, it routes the audio signal to the appropriate coding branch. Second, it determines the appropriate time/frequency resolution of each converter in that branch, such as the first converter 410 and the second converter 523. These converters are controlled through the resolution control lines connecting the signal analyzer to them.

Fig. 11B shows a preferred embodiment of an audio decoder that matches the audio encoder of Fig. 11A.

The audio decoder of Fig. 11B is configured to decode an encoded audio signal, such as the encoder output signal 801 produced by output interface 800 of Fig. 11A. The encoded signal comprises four elements:

- a first encoded audio signal, encoded according to a first coding algorithm;
- a second encoded audio signal, encoded according to a second coding algorithm that differs from the first;
- information indicating whether the first or the second coding algorithm is to be used for decoding the first and second encoded signals;
- time/frequency resolution information for the first and second encoded audio signals.

The audio decoder comprises a first decoding branch 431, 440 for decoding the first encoded signal based on the first coding algorithm. It further comprises a second decoding branch for decoding the second encoded signal using the second coding algorithm.

The first decoding branch comprises a first controllable converter 440 for converting from the spectral domain into the time domain. This controllable converter is controlled using the time/frequency resolution information for the first encoded signal, and produces a first decoded signal.

The second decoding branch comprises a second controllable converter 534 for converting from a spectral representation into a time representation. The second controllable converter 534 is controlled using the time/frequency resolution information 991 for the second encoded signal.

The decoder additionally comprises a controller 990, which controls the first converter 440 and the second converter 534 in accordance with the time/frequency resolution information 991.

Furthermore, the decoder comprises a domain converter. Using the second decoded signal, it generates a synthesis signal that cancels the domain conversion applied by domain converter 510 in the encoder of Fig. 11A.

Preferably, domain converter 540 is an LPC synthesis processor controlled by LPC filter information contained in the encoded signal. This LPC filter information is generated by LPC processor 510 of Fig. 11A and inserted into the encoder output signal as side information. Finally, the audio decoder comprises a combiner 600. It combines the first decoded signal output by the first converter 440 with the synthesis signal to obtain the final decoded audio signal 609.

In a preferred implementation, the first decoding branch additionally comprises an inverse quantizer/decoder stage 431. This stage reverses, or at least partly reverses, the operations performed by the corresponding encoder stage 421. Quantization itself is lossy and cannot be reversed. What the inverse quantizer does reverse is any non-uniformity of the quantization, such as that of a logarithmic or companding quantizer.

In the second coding branch, a corresponding stage 533 reverses the particular encoding operations applied by stage 524. Preferably, stage 524 uses uniform quantization. The corresponding stage 533 therefore needs no dedicated inverse-quantization step for undoing a non-uniform mapping.

Both the first converter 440 and the second converter 534 may comprise three parts:

- an inverse transformer stage 440a, 534a;
- a synthesis window stage 440b, 534b;
- a subsequently connected overlap/add stage 440c, 534c.

The overlap/add stage is needed when the converters, more specifically the transformer stages 440a, 534a, apply an aliasing-introducing transform such as the Modified Discrete Cosine Transform (MDCT). The overlap/add operation then performs Time Domain Aliasing Cancellation (TDAC). When a transform that introduces no aliasing is used, such as an inverse FFT, the overlap/add stage 440c is not needed. In such an implementation, a cross-fading operation may be used to avoid blocking artifacts.

Similarly, combiner 600 may be a switching combiner or a cross-fading combiner. Alternatively, when aliasing cancellation is exploited to avoid blocking artifacts, the combiner implements a transition windowing operation similar to the overlap/add stage within the branches themselves.

Fig. 1a shows an embodiment of the invention with two cascaded switches. A mono, stereo or multichannel signal is input into switch 200, which is controlled by decision stage 300. The decision stage receives as its input the signal fed into block 200. Alternatively, decision stage 300 may also receive side information that is included in the mono, stereo or multichannel signal, or at least associated with it. Such information exists, for example, when it was generated during the original production of the mono, stereo or multichannel signal.

Decision stage 300 actuates switch 200 to feed the signal either into the frequency encoding portion 400, shown in the upper branch of Fig. 1a, or into the LPC-domain encoding portion 500, shown in the lower branch of Fig. 1a. The key element of the frequency-domain coding branch is the spectral conversion block 410. It converts the output signal of the common preprocessing stage (described below) into the spectral domain. The spectral conversion block may comprise an MDCT algorithm, a QMF, an FFT algorithm, a wavelet analysis, or a filterbank such as a critically sampled filterbank with a certain number of channels. The subband signals in such a filterbank may be real-valued or complex-valued. The output of spectral conversion block 410 is encoded by a spectral audio encoder 421, which may include processing blocks as known from the AAC coding scheme.

Generally, the processing in branch 400 is based on a perception model, or information sink model. This branch therefore models the human auditory system receiving sound. The processing in branch 500, by contrast, generates a signal in the excitation, residual or LPC domain. Generally, the processing in branch 500 is based on a speech model, or information generation model. For speech signals, this is a model of the human speech/sound generation system that produces the sound. If sound from a different source is to be encoded, one that requires a different sound generation model, the processing in branch 500 may differ accordingly.

In the lower encoding branch 500, the key element is the LPC device 510. It outputs LPC information used to control the characteristics of an LPC filter, and this LPC information is transmitted to the decoder. The output signal of LPC stage 510 is an LPC-domain signal consisting of an excitation signal and/or a weighted signal.

The LPC device generally outputs an LPC-domain signal. This can be the excitation signal of Fig. 7e, the weighted signal of Fig. 7f, or any other signal generated by applying LPC filter coefficients to an audio signal. The LPC device can also determine these coefficients and quantize/encode them.

The decision in the decision stage can be signal-adaptive. The decision stage then performs a music/speech discrimination and controls switch 200 so that music signals are input into the upper branch 400 and speech signals into the lower branch 500. In one embodiment, the decision stage also writes its decision information into the output bitstream, so that a decoder can use it to perform the correct decoding operations.

Such a decoder is shown in Fig. 1b. After transmission, the signal output by spectral audio encoder 421 is input into spectral audio decoder 431. The output of spectral audio decoder 431 is input into time-domain converter 440. Similarly, the output of the LPC-domain encoding branch 500 of Fig. 1a is received on the decoder side. Elements 531, 533, 534 and 532 process it to obtain an LPC excitation signal. This LPC excitation signal is input into LPC synthesis stage 540. As a further input, stage 540 receives the LPC information generated by the corresponding LPC analysis stage 510. The output of time-domain converter 440 and/or the output of LPC synthesis stage 540 are input into switch 600. Switch 600 is controlled by a switch control signal. That signal was, for example, generated by decision stage 300, or provided externally, for instance by the creator of the original mono, stereo or multichannel signal. The output of switch 600 is a complete mono, stereo or multichannel signal.

The input signal into switch 200 and decision stage 300 may be a mono, stereo or multichannel signal, or generally any audio signal. The switch changes between the frequency encoding branch 400 and the LPC encoding branch 500 according to a decision. That decision may be derived from the input signal of switch 200, or from an external source such as the producer of the original audio signal underlying that input. The frequency encoding branch 400 comprises a spectral conversion stage 410 followed by a quantizing/coding stage 421. The quantizing/coding stage may include any functionality known from modern frequency-domain encoders such as the AAC encoder. Furthermore, the quantization in stage 421 may be controlled by a psychoacoustic module, which generates psychoacoustic information such as a psychoacoustic masking threshold over frequency. This information is input into stage 421.

In the LPC encoding branch, the switch output signal is processed by an LPC analysis stage 510, which generates LPC side information and an LPC-domain signal. According to the invention, the excitation encoder contains an additional switch. This switch routes the further processing of the LPC-domain signal either to a quantization/coding operation 522 in the LPC domain, or to a quantization/coding stage 524 that processes values in the LPC-spectral domain. For this purpose, a spectral converter 523 is provided at the input of quantization/coding stage 524. Switch 521 is controlled in an open-loop or a closed-loop fashion, depending on specific settings such as those described in the AMR-WB+ technical specification.

In the closed-loop control mode, the encoder additionally comprises three components:

- an inverse quantizer/coder 531 for the LPC-domain signal;
- an inverse quantizer/coder 533 for the LPC-spectral-domain signal;
- an inverse spectral converter 534 for the output of item 533.

Both the encoded and the re-decoded signals of the processing branches in the second encoding branch are input into switch control device 525. There, the two output signals are compared with each other and/or with a target function. Alternatively, a target function based on a comparison of the distortion in both signals is calculated. Either way, the signal with the lower distortion decides which position switch 521 takes. If both branches provide non-constant bit rates, the branch providing the lower bit rate may instead be selected, even when its signal-to-noise ratio (SNR) is lower than that of the other branch. Alternatively, the target function may take the SNR of each signal, the bit rate of each signal and/or additional criteria as input, in order to find the best decision for a specific goal. If, for example, the goal is the lowest possible bit rate, the target function depends heavily on the bit rates of the two signals output by elements 531 and 534. If the main goal is the best quality at a given bit rate, switch control 525 may, for example, discard each signal that exceeds the allowed bit rate. When both signals are within the allowed bit rate, it selects the signal with the better SNR, that is, the one with the smaller quantization/coding distortion.

The decoding scheme according to the invention is shown in Fig. 1b, as described above. Each of the three possible kinds of output signal has its own decoding/re-quantization stage 431, 531 or 533:

- stage 431 outputs a spectrum, which frequency/time converter 440 converts into the time domain;
- stage 531 outputs an LPC-domain signal;
- item 533 outputs an LPC spectrum.

An LPC-spectrum/LPC converter 534 ensures that both input signals into switch 532 are in the LPC domain. The output data of switch 532 is transformed back into the time domain by LPC synthesis stage 540. Stage 540 is controlled by the LPC information generated and transmitted on the encoder side. After block 540, both branches carry time-domain information. This information is switched according to a switch control signal to obtain the final audio signal, such as a mono, stereo or multichannel signal, depending on the signal input into the encoding scheme of Fig. 1a.

Fig. 1c shows a further embodiment with a different arrangement of switch 521, similar to the principle of Fig. 4b.

Fig. 2a shows a preferred encoding scheme according to a second aspect of the invention. A common preprocessing scheme connected to the input of switch 200 may comprise a surround/joint stereo block 101. This block outputs joint stereo parameters and a mono output signal, generated by downmixing an input signal with two or more channels. In general, the signal at the output of block 101 may also have more than one channel. Because of the downmixing, however, block 101 always outputs fewer channels than it receives.

The common preprocessing scheme may comprise a bandwidth extension stage 102, either instead of block 101 or in addition to it. In the embodiment of Fig. 2a, the output of block 101 is input into bandwidth extension block 102. Within the encoder of Fig. 2a, block 102 outputs a band-limited signal, such as a low-band or low-pass signal. Preferably, this signal is also downsampled, for example by a factor of two. For the high band of the signal input into block 102, bandwidth extension parameters are generated and forwarded to bitstream multiplexer 800. These include spectral envelope parameters, inverse filtering parameters and noise floor parameters, as known from the HE-AAC profile of MPEG-4.

Preferably, decision stage 300 receives the signal input into block 101 or block 102 in order to decide between, for example, a music mode and a speech mode. In music mode the upper encoding branch 400 is selected, and in speech mode the lower encoding branch 500. Preferably, the decision stage additionally controls joint stereo block 101 and/or bandwidth extension block 102, adapting their functionality to the specific signal. Thus, when the decision stage finds that a certain time portion of the input signal is in a first mode such as music mode, decision stage 300 controls specific features of block 101 and/or block 102. Likewise, when decision stage 300 finds that the signal is in speech mode, or generally in a second, LPC-domain mode, specific features of blocks 101 and 102 are controlled according to the decision stage output.

Preferably, the spectral conversion of coding branch 400 uses an MDCT operation, and more preferably a time-warped MDCT operation. Its strength, or generally its warping strength, can be controlled between zero and a high warping strength. At zero warping strength, the MDCT operation in block 411 is a plain MDCT as known in the art. The time-warping strength, together with time-warping side information, can be transmitted to bitstream multiplexer 800 as side information.

In the LPC encoding branch, the LPC-domain encoder may comprise an ACELP core 526. This core calculates a pitch gain, a pitch lag and/or codebook information such as a codebook index and gain. The TCX mode, as known from 3GPP TS 26.290, processes a perceptually weighted signal in the transform domain. The Fourier-transformed weighted signal is quantized using split multi-rate lattice quantization (algebraic VQ) with noise-factor quantization. The transform is computed over windows of 1024, 512 or 256 samples. The excitation signal is recovered by inverse filtering the quantized weighted signal through an inverse weighting filter.

In the first coding branch 400, the spectral converter preferably comprises a specifically adapted MDCT operation with particular window functions. It is followed by a quantization/entropy-coding stage. That stage may consist of a single vector quantization stage, but is preferably a combined scalar quantizer/entropy coder similar to the quantizer/coder of the frequency-domain coding branch, i.e. item 421 of Fig. 2a.

The second coding branch consists of the LPC block 510, followed by switch 521, followed in turn by ACELP block 526 or TCX block 527. ACELP is described in 3GPP TS 26.190, and TCX in 3GPP TS 26.290. Generally, ACELP block 526 receives an LPC excitation signal as calculated by a procedure such as the one shown in Fig. 7e. TCX block 527 receives a weighted signal as generated according to Fig. 7f.

In TCX, the transform is applied to the weighted signal, which is computed by filtering the input signal through an LPC-based weighting filter. The weighting filter used in the preferred embodiment of the invention is

$$W(z) = \frac{1 - A(z/\gamma)}{1 - \mu z^{-1}}$$

The weighted signal is therefore an LPC-domain signal, and its transform lies in the LPC-spectral domain. The signal processed by ACELP block 526 is the excitation signal, which differs from the signal processed by block 527, but both signals are in the LPC domain.

On the decoder side, shown in Fig. 2b, the inverse spectral transform in block 537 is followed by the inverse of the weighting filter, that is

$$W^{-1}(z) = \frac{1 - \mu z^{-1}}{1 - A(z/\gamma)}$$

The signal is then filtered through (1 - A(z)) to reach the LPC excitation domain. The conversion to the LPC domain in block 534 and the TCX^-1 block 537 therefore comprise an inverse transform followed by filtering through

$$\frac{1 - \mu z^{-1}}{1 - A(z/\gamma)}\,\bigl(1 - A(z)\bigr)$$

to convert from the weighted domain to the excitation domain.

Item 510 in Figs. 1a, 1c, 2a and 2c is drawn as a single block. It can nevertheless output different signals, as long as they are in the LPC domain. The actual mode of block 510, such as excitation-signal mode or weighted-signal mode, depends on the actual switch state. Alternatively, block 510 can have two parallel processing devices, one implemented similarly to Fig. 7e and the other as in Fig. 7f. The LPC domain at the output of block 510 can therefore be an LPC excitation signal, an LPC weighted signal, or any other LPC-domain signal.

In the second encoding branch (ACELP/TCX) of FIG. 2A or 2C, the signal is preferably pre-emphasized before encoding through the filter

$$1 - 0.68\,z^{-1}.$$

In the ACELP/TCX decoder of FIG. 2B, the synthesized signal is de-emphasized with the filter

$$\frac{1}{1 - 0.68\,z^{-1}}.$$

The pre-emphasis can be part of the LPC block 510, where the signal is pre-emphasized before LPC analysis and quantization. Similarly, the de-emphasis can be part of the LPC synthesis block LPC^-1 540.
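A first-order pre-emphasis filter 1 − βz^(−1) with β > 0 boosts high frequencies before LPC analysis, and the matching de-emphasis 1/(1 − βz^(−1)) undoes it exactly. The sketch below assumes β = 0.68; treat that value and the helper names as assumptions rather than a normative specification.

```python
def pre_emphasis(x, beta=0.68):
    """y[n] = x[n] - beta * x[n-1]  (filter 1 - beta z^-1); beta is an assumed value."""
    return [x[n] - beta * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def de_emphasis(y, beta=0.68):
    """x[n] = y[n] + beta * x[n-1]  (filter 1 / (1 - beta z^-1)), zero initial state."""
    out, prev = [], 0.0
    for v in y:
        prev = v + beta * prev
        out.append(prev)
    return out
```

A constant (DC) input of 1.0 becomes 1.0, 0.32, 0.32, ... after pre-emphasis, which shows the low-frequency attenuation; `de_emphasis` restores the original samples.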

FIG. 2C shows a further embodiment of the implementation of FIG. 2A, but with a different configuration of the switch 521, similar to the principle of FIG. 4B.

In a preferred embodiment, the first switch 200 (see FIG. 1A or 2A) is controlled through an open-loop decision (as in FIG. 4A), and the second switch is controlled through a closed-loop decision (as in FIG. 4B).

For example, FIG. 2C has the second switch placed after the ACELP and TCX branches, as in FIG. 4B. Then, in the first processing branch, the first LPC domain represents the LPC excitation, and in the second processing branch, the second LPC domain represents the LPC weighted signal. The first LPC domain signal is obtained by filtering through (1 − A(z)) to convert to the LPC residual domain, while the second LPC domain signal is obtained by filtering through the weighting filter

$$\frac{1 - A(z/\gamma)}{1 - \mu z^{-1}}$$

to convert to the LPC weighted domain.

FIG. 2B shows a decoding scheme corresponding to the encoding scheme of FIG. 2A. The bit stream generated by the bit stream multiplexer 800 of FIG. 2A is input into the bit stream demultiplexer 900. Depending on information derived from the bit stream, for example by the mode detection block 601, the decoder-side switch 600 is controlled to forward either the signal from the upper branch or the signal from the lower branch to the bandwidth extension block 701. The bandwidth extension block 701 receives side information from the bit stream demultiplexer 900 and, based on this side information and the output of the mode decision 601, reconstructs the high band from the low band output by the switch 600.

The full-band signal generated by block 701 is input into the joint stereo/surround processing stage 702, which reconstructs two stereo channels or several multichannel signals. In general, block 702 outputs more channels than are input into it. Depending on the application, the input into block 702 may comprise two channels, as in a stereo mode, or even more channels, as long as the output of this block has more channels than its input.

The switch 200 has been described as switching between the two branches so that only one branch receives a signal to process while the other branch does not. In an alternative embodiment, however, the switch may also be arranged after the encoders, for example after the audio encoder 421 and the excitation encoders 522, 523, 524, which means that both branches 400, 500 process the same signal in parallel. To avoid doubling the bit rate, however, only the signal output by one of the encoding branches 400, 500 is selected to be written into the output bit stream. The decision stage then operates so that the signal written into the bit stream minimizes a certain cost function, which can be the generated bit rate, the generated perceptual distortion, or a combined rate/distortion cost function. Therefore, in this mode as well as in the mode illustrated in the figures, the decision stage can also operate in a closed-loop mode to make sure that only the encoding branch output is finally written into the bit stream which has the lowest bit rate for a given perceptual distortion or the lowest perceptual distortion for a given bit rate. In the closed-loop mode, the feedback input can be derived from the outputs of the three quantizer/scaler blocks 421, 522 and 424 in FIG. 1A.
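A closed-loop decision of this kind can be sketched as: run both branches on the same portion, measure what each would cost, and keep only the winner. The sketch below shows both criteria named in the text, a combined rate/distortion cost J = D + λR and "lowest bit rate for a given distortion". The `BranchResult` type, the λ value and the distortion measure are hypothetical placeholders, not the patent's exact procedure.

```python
from dataclasses import dataclass


@dataclass
class BranchResult:
    name: str          # hypothetical label, e.g. "FD" (branch 400) or "LPC" (branch 500)
    bits: int          # bits the branch would write for this portion
    distortion: float  # any distortion measure, e.g. a perceptual or MSE value


def select_branch(results, lam):
    """Keep the result minimizing J = D + lam * R; only it goes into the bit stream."""
    return min(results, key=lambda r: r.distortion + lam * r.bits)


def select_min_rate(results, max_distortion):
    """Lowest bit rate among branches meeting a distortion target, or None if none does."""
    ok = [r for r in results if r.distortion <= max_distortion]
    return min(ok, key=lambda r: r.bits) if ok else None
```

With a branch spending 800 bits at distortion 2.0 and another spending 600 bits at distortion 3.0, λ = 0.01 favors the cheaper branch while λ = 0.001 favors the more accurate one, which shows how λ trades rate against distortion.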

In an implementation having two switches, i.e. the first switch 200 and the second switch 521, the time resolution of the first switch is preferably lower than the time resolution of the second switch. In other words, the blocks of the input signal into the first switch that can be switched by a switch operation are larger than the blocks switched by the second switch operating in the LPC domain. As an example, the frequency domain/LPC domain switch 200 may switch blocks with a length of 1024 samples, and the second switch 521 may switch blocks of 256 samples each.
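The two time resolutions can be pictured as a two-level decision per 1024-sample block. This is a simplified sketch that assumes every LPC-domain sub-block is exactly 256 samples long; the coder labels and the function name are hypothetical.

```python
SUPERFRAME = 1024  # block length switched by the first switch 200
SUBFRAME = 256     # block length switched by the second switch 521


def segment_frame(first_decision, second_decisions):
    """Return (start, length, coder) segments for one 1024-sample block.

    first_decision: "FD" (first branch) or "LPC" (second branch); labels are illustrative.
    second_decisions: one choice per 256-sample sub-block, e.g. "ACELP" or "TCX",
    consulted only when the first switch selects the LPC-domain branch.
    """
    if first_decision == "FD":
        return [(0, SUPERFRAME, "FD")]
    n_sub = SUPERFRAME // SUBFRAME
    if len(second_decisions) != n_sub:
        raise ValueError(f"expected {n_sub} sub-block decisions")
    return [(i * SUBFRAME, SUBFRAME, c) for i, c in enumerate(second_decisions)]
```

The first switch therefore makes one decision per 1024 samples, while the second switch can change coders up to four times within the same span.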

Although some of FIGS. 1A through 10B are illustrated as block diagrams of an apparatus, these figures are at the same time an illustration of a method, in which the block functionalities correspond to method steps.

FIG. 3A illustrates an audio encoder for generating an encoded audio signal at the outputs of the first encoding branch 400 and the second encoding branch 500. Furthermore, the encoded audio signal preferably includes pre-processing parameters from the common pre-processing stage and switch control information or other side information, as discussed in connection with the previous figures.

Preferably, the first encoding branch is operative to encode an audio intermediate signal 195 in accordance with a first coding algorithm, the first coding algorithm having an information sink model. The first encoding branch 400 generates the first encoder output signal, which is an encoded spectral information representation of the audio intermediate signal 195.

Furthermore, the second encoding branch 500 is adapted to encode the audio intermediate signal 195 in accordance with a second coding algorithm, the second coding algorithm having an information source model and generating, in the second encoder output signal, encoded parameters for the information source model representing the intermediate audio signal.

The audio encoder furthermore comprises a common pre-processing stage for pre-processing an audio input signal 99 to obtain the audio intermediate signal 195. Specifically, the common pre-processing stage is operative to process the audio input signal 99 so that the audio intermediate signal 195, i.e. the output of the common pre-processing algorithm, is a compressed version of the audio input signal.

A preferred audio encoding method for generating an encoded audio signal comprises: encoding (400) the audio intermediate signal 195 in accordance with a first coding algorithm, which has an information sink model and generates, in a first output signal, encoded spectral information representing the audio signal; encoding (500) the audio intermediate signal 195 in accordance with a second coding algorithm, which has an information source model and generates, in a second output signal, encoded parameters for the information source model representing the intermediate signal 195; and common pre-processing (100) of an audio input signal 99 to obtain the audio intermediate signal 195, wherein, in the common pre-processing step, the audio input signal 99 is processed so that the audio intermediate signal 195 is a compressed version of the audio input signal 99, and wherein the encoded audio signal includes, for a certain portion of the audio signal, either the first output signal or the second output signal. The method preferably further comprises either encoding a certain portion of the audio intermediate signal using the first coding algorithm or the second coding algorithm, or encoding the signal using both algorithms and outputting, in the encoded signal, either the result of the first coding algorithm or the result of the second coding algorithm.

In general, the audio encoding algorithm used in the first encoding branch 400 reflects and models the situation at an audio sink. The sink of audio information is normally the human ear, which can be modeled as a frequency analyzer. Therefore, the first encoding branch outputs encoded spectral information. Preferably, the first encoding branch furthermore includes a psychoacoustic model for additionally applying a psychoacoustic masking threshold. This psychoacoustic masking threshold is used when quantizing audio spectral values, where the quantization is preferably performed so that the quantization noise introduced by quantizing the spectral audio values is hidden below the psychoacoustic masking threshold.
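One common way to keep quantization noise below a masking threshold is to derive the quantizer step size from it. The sketch below assumes a plain uniform scalar quantizer whose error is modeled as uniformly distributed, so its noise power is Δ²/12; choosing Δ = √(12·T) puts the modeled noise power exactly at the band's masking threshold T. This is a deliberate simplification (practical transform coders use non-uniform quantizers with scale factors), and the function names are hypothetical.

```python
import math


def step_for_threshold(masking_threshold):
    """Step size whose modeled uniform-noise power Delta^2 / 12 equals the threshold."""
    return math.sqrt(12.0 * masking_threshold)


def quantize_band(values, masking_threshold):
    """Quantize one band's spectral values; returns (indices, reconstruction)."""
    step = step_for_threshold(masking_threshold)
    idx = [round(v / step) for v in values]  # integers that would be entropy coded
    return idx, [q * step for q in idx]
```

A band with a high masking threshold gets a coarse step and few distinct indices, which is where the bit savings of a perceptual coder come from; each reconstruction error stays within ±Δ/2.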

The second encoding branch represents an information source model, which reflects the generation of audio sound. Therefore, the information source model may include a speech model, which is reflected by an LPC analysis stage, i.e. by transforming a time domain signal into the LPC domain and subsequently processing the LPC residual signal, i.e. the excitation signal. Alternative sound source models, however, are models representing a certain instrument or any other sound generator, such as a specific sound source existing in the real world. A selection between different sound source models can be performed when several sound source models are available, for example based on an SNR calculation, i.e. on a calculation of which source model is best suited for encoding a certain time portion and/or frequency portion of an audio signal. Preferably, however, the switch between the encoding branches is performed in the time domain, i.e. a certain time portion is encoded using one model and a certain different time portion of the intermediate signal is encoded using the other encoding branch.
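An SNR-based selection between source models can be sketched as: synthesize the portion with each candidate model, measure the SNR against the original, and keep the best. The model names and the candidate signals below are hypothetical; only the SNR formula, 10·log10(signal energy / error energy), is standard.

```python
import math


def snr_db(reference, decoded):
    """SNR in dB of a decoded portion against the original portion."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if noise == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)


def pick_model(reference, candidates):
    """candidates: dict mapping a (hypothetical) model name to its decoded signal."""
    return max(candidates, key=lambda name: snr_db(reference, candidates[name]))
```

A candidate that reproduces a square wave at 90 % amplitude reaches 20 dB and wins over one at 50 % amplitude (about 6 dB).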

The information source model is represented by specific parameters. For the speech model, when a modern speech coder such as AMR-WB+ is considered, these parameters are LPC parameters and coded excitation parameters. AMR-WB+ comprises an ACELP encoder and a TCX encoder. In this case, the coded excitation parameters can be global gain, noise floor, and variable length codes.

FIG. 3B illustrates a decoder corresponding to the encoder illustrated in FIG. 3A. In general, FIG. 3B illustrates an audio decoder for decoding an encoded audio signal to obtain a decoded audio signal 799. The decoder includes a first decoding branch 450 for decoding a signal encoded in accordance with a first coding algorithm having an information sink model. The audio decoder furthermore includes a second decoding branch 550 for decoding an information signal encoded in accordance with a second coding algorithm having an information source model. The audio decoder furthermore includes a combiner 600 for combining the output signals from the first decoding branch 450 and the second decoding branch 550 to obtain a combined signal. The combined signal, illustrated in FIG. 3B as the decoded audio intermediate signal 699, is input into a common post-processing stage, which post-processes the decoded audio intermediate signal 699 output by the combiner 600 so that the output signal of the common post-processing stage is an expanded version of the combined signal. Thus, the decoded audio signal 799 has an enhanced information content compared to the decoded audio intermediate signal 699. This information expansion is provided by the common post-processing stage with the help of pre/post-processing parameters, which can either be transmitted from the encoder to the decoder or be derived from the decoded audio intermediate signal itself. Preferably, however, the pre/post-processing parameters are transmitted from the encoder to the decoder, since this allows an improved quality of the decoded audio signal.

FIG. 3C illustrates an audio encoder for encoding an audio input signal 195, which may be equal to the intermediate audio signal 195 of FIG. 3A, in accordance with a preferred embodiment of the present invention. The audio input signal 195 is present in a first domain, which can, for example, be the time domain but can also be any other domain such as a frequency domain, an LPC domain, or an LPC spectral domain. In general, the conversion from one domain to another is performed by a conversion algorithm such as any of the well-known time/frequency or frequency/time conversion algorithms.

An alternative conversion from the time domain, for example into the LPC domain, is the result of LPC-based filtering of a time domain signal, which yields an LPC residual signal or excitation signal. Any other filtering operation that produces a filtered signal affecting a substantial number of the signal samples before the conversion can be used as a conversion algorithm as the case may be. Therefore, weighting an audio signal with an LPC-based weighting filter is a further conversion, which generates a signal in the LPC domain. In a time/frequency transform, the modification of a single spectral value will affect all time domain values before the transform. Analogously, a modification of any time domain sample will have an impact on each frequency domain sample. Similarly, in an LPC domain situation, a modification of a sample of the excitation signal will, due to the length of the LPC filter, affect a substantial number of samples before the LPC filtering. Likewise, a modification of a sample before an LPC transformation will affect many samples obtained by this LPC transformation, due to the inherent memory effect of the LPC filter.
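The spreading effect described above can be demonstrated directly. The sketch uses a one-tap LPC synthesis filter 1/(1 − A(z)) with an assumed coefficient of 0.9, and an orthonormal DCT-II pair (chosen here as a stand-in for a generic time/frequency transform): changing one excitation sample alters every later output sample, and changing one DCT coefficient alters every time sample. The helper names are hypothetical.

```python
import math


def lpc_synthesis(excitation, lpc):
    """1/(1 - A(z)): y[n] = e[n] + sum_i lpc[i-1] * y[n-i] (zero initial state)."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + sum(c * y[n - i - 1] for i, c in enumerate(lpc) if n - i - 1 >= 0))
    return y


def idct(coeffs):
    """Inverse of the orthonormal DCT-II (i.e. an orthonormal DCT-III)."""
    N = len(coeffs)
    out = []
    for n in range(N):
        s = coeffs[0] / math.sqrt(N)
        s += sum(coeffs[k] * math.sqrt(2.0 / N) * math.cos(math.pi * k * (n + 0.5) / N)
                 for k in range(1, N))
        out.append(s)
    return out


def changed_positions(a, b, tol=1e-12):
    """Indices at which two equally long signals differ."""
    return [i for i, (u, v) in enumerate(zip(a, b)) if abs(u - v) > tol]
```

Perturbing excitation sample 5 of a 16-sample block changes outputs 5 through 15 (the recursive memory carries it forward), while perturbing DCT coefficient 1 of an 8-point block changes all 8 time samples.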

The audio encoder of FIG. 3C includes a first coding branch 400, which generates a first encoded signal. In a preferred embodiment, this first encoded signal may be in a fourth domain, namely the time-spectral domain, i.e. the domain obtained when a time domain signal is processed by a time/frequency conversion.

Therefore, the first coding branch 400 for encoding an audio signal uses a first coding algorithm to obtain the first encoded signal, where this first coding algorithm may or may not include a time/frequency conversion algorithm.

The audio encoder furthermore includes a second coding branch 500 for encoding the audio signal. The second coding branch 500 uses a second coding algorithm, different from the first coding algorithm, to obtain a second encoded signal.

The audio encoder furthermore includes a first switch 200 for switching between the first coding branch 400 and the second coding branch 500 so that, for a portion of the audio input signal, either the first encoded signal at the output of block 400 or the second encoded signal at the output of the second encoding branch is included in the encoder output signal. Thus, when the first encoded signal in the fourth domain is included in the encoder output signal for a certain portion of the audio input signal 195, the second encoded signal, which is either the first processed signal in the second domain or the second processed signal in the third domain, is not included in the encoder output signal. This makes sure that the encoder is bit-rate efficient. In embodiments, any time portions of the audio signal that are included in two different encoded signals are small compared to the frame length of a frame, as will be discussed in connection with FIG. 3E. These small portions are useful for a cross fade from one encoded signal to the other when a switch occurs, in order to reduce artifacts that might occur without any cross fade. Therefore, apart from the cross-fade region, each time domain block is represented by an encoded signal of only a single domain.

As illustrated in FIG. 3C, the second coding branch 500 comprises a converter 510 for converting the audio signal in the first domain, i.e. signal 195, into a second domain. Furthermore, the second coding branch 500 comprises a first processing branch 522 for processing the audio signal in the second domain to obtain a first processed signal, which is preferably also in the second domain, so that the first processing branch 522 does not perform a domain change.

The second coding branch 500 furthermore comprises a second processing branch 523, 524, which converts the audio signal in the second domain into a third domain, different from both the first domain and the second domain, and processes the audio signal in the third domain to obtain a second processed signal at the output of the second processing branch 523, 524.

Furthermore, the second coding branch comprises a second switch 521 for switching between the first processing branch 522 and the second processing branch 523, 524 so that, for a portion of the audio signal input into the second coding branch, either the first processed signal in the second domain or the second processed signal in the third domain is the second encoded signal.

FIG. 3D illustrates a corresponding decoder for decoding an encoded audio signal generated by the encoder of FIG. 3C. In general, each block of the first domain audio signal is represented by either a second domain signal, a third domain signal or a fourth domain encoded signal, apart from any cross-fade region, which is preferably short compared to the length of one frame in order to obtain a system that is as close as possible to the critical sampling limit. The encoded audio signal includes the first coded signal, a second coded signal in the second domain and a third coded signal in the third domain, where the first, second and third coded signals all relate to different time portions of the decoded audio signal, and where the second domain, the third domain and the first domain of the decoded audio signal are different from each other.

The decoder comprises a first decoding branch for decoding based on the first coding algorithm. The first decoding branch is illustrated at 431, 440 in FIG. 3D and preferably comprises a frequency/time converter. The first coded signal is preferably in the fourth domain and is converted into the first domain, which is the domain of the decoded output signal.

The decoder of FIG. 3D furthermore comprises a second decoding branch, which comprises several elements. The first of these is a first inverse processing branch 531 for inversely processing the second coded signal to obtain a first inverse processed signal in the second domain at the output of block 531. The second decoding branch furthermore comprises a second inverse processing branch 533, 534 for inversely processing the third coded signal to obtain a second inverse processed signal in the second domain, where this second inverse processing branch comprises a converter for converting from the third domain into the second domain.

The second decoding branch furthermore comprises a first combiner 532 for combining the first inverse processed signal and the second inverse processed signal to obtain a signal in the second domain, where this combined signal is, at a first time instant, only influenced by the first inverse processed signal and is, at a later time instant, only influenced by the second inverse processed signal.

Furthermore, the second decoding branch comprises a converter 540 for converting the combined signal into the first domain.

Finally, the decoder illustrated in FIG. 3D comprises a second combiner 600 for combining the decoded first signal from blocks 431, 440 and the output signal of the converter 540 to obtain a decoded output signal in the first domain. Again, the decoded output signal in the first domain is, at a first time instant, only influenced by the signal output by the converter 540 and is, at a later time instant, only influenced by the first decoded signal output by blocks 431, 440.

This situation is illustrated in FIG. 3E from an encoder perspective. The upper portion of FIG. 3E shows, in a schematic representation, a first domain audio signal such as a time domain audio signal, where the time index increases from left to right and item 3 can be considered as a stream of audio samples representing the signal 195 in FIG. 3C. FIG. 3E also shows frames 3a, 3b, 3c, 3d, which may be generated by switching between the first encoded signal, the first processed signal and the second processed signal, as illustrated at item 4 in FIG. 3E. The first encoded signal, the first processed signal and the second processed signal are all in different domains. To make sure that switching between the different domains does not result in an artifact on the decoder side, frames 3a and 3b of the time domain signal have an overlapping range, indicated as a cross-fade region, and such a cross-fade region also exists between frames 3b and 3c. However, no such cross-fade region exists between frames 3c and 3d, which means that frame 3d is represented by the same kind of processed signal, and hence in the same domain, as frame 3c, so that there is no domain change between frames 3c and 3d. Therefore, it is generally preferred not to provide a cross-fade region where there is no domain change, and to provide a cross-fade region, i.e. a portion of the audio signal that is encoded by two subsequent coded/processed signals, when there is a domain change, i.e. a switching action of either of the two switches. Preferably, cross fades are performed for the other domain changes.
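The rule "cross-fade only where the domain changes" and the cross fade itself can be sketched as follows. The patent text does not fix the fade shape here, so a linear ramp is assumed; the domain labels in the usage example are a hypothetical assignment in which frames 3c and 3d share a domain, as in FIG. 3E.

```python
def crossfade(tail, head):
    """Linearly fade from `tail` (old coder output) to `head` (new coder output).

    Both segments cover the same overlap region; the linear ramp is an assumption.
    """
    length = len(tail)
    if length != len(head):
        raise ValueError("overlap segments must have equal length")
    if length == 1:
        return [head[0]]
    return [(1 - i / (length - 1)) * t + (i / (length - 1)) * h
            for i, (t, h) in enumerate(zip(tail, head))]


def boundaries_needing_crossfade(frame_domains):
    """Indices i such that a cross-fade region is placed between frame i and frame i+1."""
    return [i for i in range(len(frame_domains) - 1)
            if frame_domains[i] != frame_domains[i + 1]]
```

With four frames labeled `["D4", "D2", "D3", "D3"]` (standing for frames 3a to 3d), cross fades are placed only at the first two boundaries; the 3c/3d boundary has no domain change and needs none.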

In embodiments in which the first encoded signal or the second processed signal is generated by MDCT processing with, for example, 50 percent overlap, each time-domain sample is included in two consecutive frames. Owing to the properties of the MDCT, however, this does not cause any overhead, because the MDCT is a critically sampled system. In this context, critically sampled means that the number of spectral values equals the number of time-domain values. The MDCT is advantageous because it provides the crossover effect without a dedicated crossover region. The transition from one MDCT block to the next therefore causes no overhead that would violate the critical-sampling requirement.
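
A minimal sketch of why 50 % overlap costs nothing with the MDCT: each block of 2N windowed samples yields only N coefficients, and overlap-adding the inverse transforms cancels the time-domain aliasing. The sketch assumes a sine window, which satisfies the Princen-Bradley condition. It uses the direct O(N^2) formulas rather than a fast algorithm, and the function names are illustrative.

```python
import math
import random


def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """2N windowed time samples -> N spectral values (direct O(N^2) formula)."""
    n_half = len(block) // 2
    return [sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n, x in enumerate(block))
            for k in range(n_half)]


def imdct(coeffs):
    """N spectral values -> 2N time samples that still contain time-domain aliasing."""
    n_half = len(coeffs)
    return [2.0 / n_half * sum(c * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                                for k, c in enumerate(coeffs))
            for n in range(2 * n_half)]


def mdct_roundtrip(signal, n_half):
    """50 % overlapped MDCT analysis, then IMDCT + overlap-add (TDAC).

    len(signal) is assumed to be a multiple of n_half.
    """
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = [padded[start + n] * window[n] for n in range(2 * n_half)]
        coeffs = mdct(block)                 # hop of N samples -> N new values
        for n, y in enumerate(imdct(coeffs)):
            out[start + n] += y * window[n]  # aliasing cancels in the overlap-add
    return out[n_half:n_half + len(signal)]


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(64)]
y = mdct_roundtrip(x, 16)
print(len(mdct([0.0] * 32)), max(abs(a - b) for a, b in zip(x, y)))
```

The first printed value shows that 32 input samples give 16 coefficients. The second is the reconstruction error, which is at floating-point round-off level.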

Preferably, the first coding algorithm in the first coding branch is based on an information sink model, and the second coding algorithm in the second coding branch is based on an information source or SNR model. An SNR model is not tied to a particular sound generation mechanism. Rather, it is one coding mode that can be selected from a plurality of coding modes, for example on the basis of a closed-loop decision. Thus, an SNR model need not be related to the physical constitution of the sound generator. It can be any available parameterized coding model, other than the information sink model, that is selected by a closed-loop decision comparing the SNR results of different models.

As shown in Fig. 3c, a controller 300, 525 is provided. The controller may include the functionality of decision stage 300 of Fig. 1a and, additionally, the functionality of switch control device 525 of Fig. 1a. In general, the controller controls the first switch and the second switch in a signal-adaptive way.

The controller analyzes one of the following with respect to a target function: the signal input into the first switch, the output of the first or second coding branch, or the signal obtained by encoding and decoding with the first and second coding branches. Alternatively or additionally, it analyzes, again with respect to a target function, the signal input into the second switch, the output of the first or second processing branch, or the signal obtained by processing and inverse processing with the first and second processing branches.
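
A closed-loop analysis with respect to a target function can be pictured as "encode, decode, compare". The sketch below assumes SNR as the target function. The two toy quantizers are hypothetical stand-ins for the coding or processing branches. A practical controller would also weigh the bit cost of each candidate, which is omitted here.

```python
import math


def snr_db(reference, decoded):
    """SNR of a decoded frame against the original, in dB."""
    signal = sum(v * v for v in reference)
    noise = sum((a - b) ** 2 for a, b in zip(reference, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def closed_loop_select(frame, candidates):
    """Run every candidate encode+decode path and keep the highest-SNR one."""
    best_name, best_snr = None, -math.inf
    for name, codec in candidates.items():
        score = snr_db(frame, codec(frame))
        if best_name is None or score > best_snr:
            best_name, best_snr = name, score
    return best_name, best_snr


# Hypothetical stand-ins for two coding paths: uniform quantizers of different step size.
candidates = {
    "coarse": lambda f: [round(v * 4) / 4 for v in f],
    "fine": lambda f: [round(v * 16) / 16 for v in f],
}
frame = [math.sin(0.3 * n) for n in range(32)]
print(closed_loop_select(frame, candidates)[0])   # fine
```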

In one embodiment, the first coding branch or the second coding branch includes an aliasing-introducing time/frequency conversion algorithm, such as an MDCT or MDST algorithm. This differs from a straightforward FFT transform, which does not introduce aliasing. Furthermore, one or both branches include a quantizer/entropy coder block. Specifically, only the second processing branch of the second coding branch includes the aliasing-introducing time/frequency converter. The first processing branch of the second coding branch includes a quantizer and/or entropy coder and does not introduce any aliasing.

The aliasing-introducing time/frequency converter preferably includes a windower for applying an analysis window and an MDCT transform algorithm. Specifically, the windower applies the window function to consecutive frames in an overlapping manner, so that each sample of the windowed signal occurs in at least two consecutive windowed frames.

In one embodiment, the first processing branch comprises an ACELP coder. The second processing branch comprises an MDCT spectral converter and a quantizer that quantizes the spectral components to obtain quantized spectral components. Each quantized spectral component is either zero or is defined by one quantizer index out of a plurality of different possible quantizer indices.
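
The sketch below illustrates what "zero or one of a plurality of quantizer indices" means for the MDCT spectrum. It assumes a plain uniform quantizer with a single step size. AAC-style coders instead use a non-uniform, power-law quantizer with band-wise scale factors.

```python
def quantize_spectrum(coeffs, step):
    """Map each spectral value to an integer quantizer index (0 for small values).

    A uniform quantizer with a single step size is assumed for simplicity.
    """
    return [int(round(c / step)) for c in coeffs]


def dequantize_spectrum(indices, step):
    """Decoder side: reconstruct spectral values from the indices."""
    return [i * step for i in indices]


idx = quantize_spectrum([0.04, -0.3, 1.26, 0.0, 0.11], 0.25)
print(idx)                             # [0, -1, 5, 0, 0]
print(dequantize_spectrum(idx, 0.25))  # [0.0, -0.25, 1.25, 0.0, 0.0]
```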

Furthermore, the first switch 200 preferably operates in an open-loop manner and the second switch in a closed-loop manner.

As described above, both coding branches encode the audio signal in a block-wise manner. The first switch or the second switch also switches block-wise, so that a switching action takes place, at the minimum, after a block of a predetermined number of signal samples. This predetermined number forms the frame length of the corresponding switch. Thus, the switching granule of the first switch may be, for example, a block of 2048 or 1024 samples. The frame length on which the first switch 200 bases its switching can be variable, but it is preferably fixed to such a fairly long period.

In contrast, the block length of the second switch 521, i.e., the granule at which the second switch 521 switches from one mode to the other, is substantially smaller than the block length of the first switch. Preferably, the two block lengths are selected so that the longer block length is an integer multiple of the shorter one. In a preferred embodiment, the block length of the first switch is 2048 or 1024 samples. The block length of the second switch is 1024, more preferably 512, even more preferably 256, and most preferably 128 samples. At the maximum, the second switch can therefore switch 16 times while the first switch switches only once. A preferred maximum block-length ratio, however, is 4:1.

In a further embodiment, the controller 300, 525 performs speech/music discrimination for the first switch in such a way that a decision for speech is favoured over a decision for music. In this embodiment, a speech decision is taken even when less than 50% of a frame for the first switch is speech and more than 50% of that frame is music.

Furthermore, the controller switches to the speech mode early, as soon as a fairly small portion of the first frame is speech. In particular, this happens when the speech portion of the first frame amounts to 50% of the length of the smaller second frame. Thus, the preferred speech-favouring switching decision switches over to speech even when, for example, only 6% or 12% of a block corresponding to the frame length of the first switch is speech.
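
One reading of the two paragraphs above is that the speech threshold equals half of the second switch's frame, counted inside the first switch's frame. The sketch encodes that reading, which is an assumption. It reproduces the quoted figures: 128 of 2048 samples is 6.25 %, and 256 of 2048 samples is 12.5 %. The speech-sample count would come from a speech/music classifier that is not shown here.

```python
def speech_favouring_decision(speech_samples, long_frame, short_frame, fraction_of_short=0.5):
    """Return 'speech' for a first-switch frame once the detected speech part
    reaches fraction_of_short * short_frame samples, else 'music'.

    speech_samples: number of samples of the long frame classified as speech
    by some (hypothetical) speech/music classifier.
    """
    if not 0 <= speech_samples <= long_frame:
        raise ValueError("speech_samples must lie within the frame")
    threshold = fraction_of_short * short_frame
    return "speech" if speech_samples >= threshold else "music"


def min_speech_share(long_frame, short_frame, fraction_of_short=0.5):
    """Smallest share of the long frame that already triggers 'speech'."""
    return fraction_of_short * short_frame / long_frame


print(min_speech_share(2048, 256), min_speech_share(2048, 512))   # 0.0625 0.125
print(speech_favouring_decision(200, 2048, 256))                  # speech
```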

This procedure makes full use of the bit-rate-saving capability of the first processing branch, which in one embodiment has a voiced speech core. At the same time, no quality is lost for the non-speech remainder of the large first frame, because the second processing branch includes a converter and is therefore also suited to audio signals containing non-speech signals. Preferably, the second processing branch includes an overlapping MDCT that is critically sampled. Owing to time-domain aliasing cancellation processing, such as overlap-add on the decoder side, this MDCT provides highly efficient, aliasing-free operation even at small window sizes.

Furthermore, a large block length is useful for the first encoding branch, which is preferably an AAC-like MDCT encoding branch. Non-speech signals are normally quite stationary, and a long transform window provides high frequency resolution and therefore high quality. In addition, a long window provides bit-rate efficiency owing to a psychoacoustically controlled quantization module, which can also be applied to the transform-based coding mode in the second processing branch of the second coding branch.

With respect to the decoder illustration of Fig. 3d, the transmitted signal preferably includes an explicit indicator as side information 4a, as shown in Fig. 3e. A bitstream parser, not shown in Fig. 3d, extracts this side information 4a. The parser uses it to forward the corresponding first encoded signal, first processed signal or second processed signal to the correct processor: the first decoding branch, the first inverse processing branch or the second inverse processing branch of Fig. 3d. Therefore, the encoded signal contains not only the encoded/processed signals but also side information relating to these signals. In other embodiments, however, there can be implicit signaling that allows the decoder-side bitstream parser to distinguish between the individual signals. With regard to Fig. 3e, it is noted that the first processed signal or the second processed signal is the output of the second coding branch and is therefore the second coded signal.

Preferably, the first decoding branch and/or the second inverse processing branch includes an inverse MDCT transform for converting from the spectral domain into the time domain. To this end, an overlap-adder is provided to perform the time-domain aliasing cancellation, which at the same time provides a cross-fade effect that avoids blocking artifacts.

Generally, the first decoding branch converts the signal encoded in the fourth domain into the first domain. The second inverse processing branch performs a conversion from the third domain into the second domain. The converter connected after the first combiner provides a conversion from the second domain into the first domain. As a result, in the embodiment of Fig. 3d, only first-domain signals, which represent the decoded output signal, are present at the output of combiner 600.

Figs. 4a and 4b illustrate two different embodiments, which differ in the position of the switch 200. In Fig. 4a, the switch 200 is positioned between the output of the common preprocessing stage 100 and the inputs of the two encoding branches 400, 500. In the Fig. 4a embodiment, the audio signal is input into a single encoding branch only. The other encoding branch, which is not connected to the output of the common preprocessing stage, does not operate and is therefore switched off or in a sleep mode. This embodiment is preferable in that the inactive encoding branch consumes neither power nor computational resources. This is particularly useful for battery-powered mobile applications, which have a general limitation on power consumption.

The Fig. 4b embodiment, on the other hand, may be preferable when power consumption is not an issue. In this embodiment, both encoding branches 400, 500 are active at all times. Only the output of the encoding branch selected for a certain time portion and/or a certain frequency portion is forwarded to the bitstream formatter, which may be implemented as a bitstream multiplexer 800. The output of the encoding branch selected by the decision stage 300 is entered into the output bitstream. The output of the other, non-selected encoding branch 400 is discarded, i.e., it is not entered into the output bitstream, which is the encoded audio signal.

Preferably, the second encoding/decoding scheme is an LPC-based coding algorithm. LPC-based speech coding distinguishes between two kinds of excitation signal segments or signal portions: quasi-periodic impulse-like ones and noise-like ones. This distinction is used in very-low-bit-rate LPC vocoders (2.4 kbps), as in Fig. 7b. In medium-rate CELP coders, however, the excitation is obtained as the sum of scaled vectors from an adaptive codebook and a fixed codebook.

Quasi-periodic impulse-like excitation signal segments, i.e., signal segments having a specific pitch, are coded with different mechanisms than noise-like excitation signals. Quasi-periodic impulse-like excitation signals are associated with voiced speech, whereas noise-like signals are associated with unvoiced speech.

By way of illustration, reference is made to Figs. 5a to 5d, in which quasi-periodic impulse-like signal segments or signal portions and noise-like signal segments or signal portions are discussed as examples. Voiced speech, shown in Fig. 5a in the time domain and in Fig. 5b in the frequency domain, is discussed as an example of a quasi-periodic impulse-like signal portion. An unvoiced speech segment, as an example of a noise-like signal portion, is discussed in connection with Figs. 5c and 5d. Speech can generally be classified as voiced, unvoiced or mixed. Figs. 5a to 5d show time- and frequency-domain plots for sampled voiced and unvoiced segments.

Unvoiced speech is random-like and broadband. Voiced speech, by contrast, is quasi-periodic in the time domain and harmonically structured in the frequency domain. The short-time spectrum of voiced speech is characterized by its fine harmonic and formant structure. The fine harmonic structure results from the quasi-periodicity of speech and can be attributed to the vibrating vocal cords. The formant structure (spectral envelope) is due to the interaction of the source and the vocal tract, which consists of the pharynx and the mouth cavity. The shape of the spectral envelope that "fits" the short-time spectrum of voiced speech is associated with the transfer characteristics of the vocal tract and with the spectral tilt (6 dB/octave) due to the glottal pulse.

The spectral envelope is characterized by a set of peaks called formants, which are the resonant modes of the vocal tract. For the average vocal tract, there are three to five formants below 5 kHz. The amplitudes and locations of the first three formants, which usually occur below 3 kHz, are very important both in speech synthesis and in perception. Higher formants are also important for wideband and unvoiced speech representations.

The properties of speech relate to the physical speech production system as follows. Voiced speech is produced by exciting the vocal tract with quasi-periodic glottal air pulses generated by the vibrating vocal cords. The frequency of the periodic pulses is referred to as the fundamental frequency or pitch. Unvoiced speech is produced by forcing air through a constriction in the vocal tract. Nasal sounds are due to the acoustic coupling of the nasal tract to the vocal tract. Plosive sounds are produced by abruptly releasing the air pressure built up behind a closure in the tract.
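
The periodic/noise-like distinction of Figs. 5a to 5d is often measured with a normalized autocorrelation at candidate pitch lags. A periodic (voiced) segment correlates strongly with itself one pitch period later; white noise does not. This is a common heuristic shown for illustration, not the classifier of this document. The 8 kHz sampling rate, the 60-400 Hz pitch range and the 0.5 threshold are assumptions.

```python
import math
import random


def max_normalized_autocorr(x, min_lag, max_lag):
    """Largest normalized autocorrelation over a range of candidate pitch lags."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[lag:], x[:-lag]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0:
            best = max(best, num / den)
    return best


def classify_segment(x, fs=8000, threshold=0.5):
    """'voiced' if the segment is strongly periodic at some pitch lag
    (roughly 60-400 Hz here), otherwise 'unvoiced'."""
    r = max_normalized_autocorr(x, fs // 400, fs // 60)
    return "voiced" if r > threshold else "unvoiced"


fs = 8000
# 100 Hz fundamental plus second harmonic: period of 80 samples at 8 kHz.
voiced = [math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
          for n in range(400)]
rng = random.Random(1)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(400)]   # white noise
print(classify_segment(voiced), classify_segment(unvoiced))   # voiced unvoiced
```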

Thus, a noise-like portion of the audio signal, as illustrated in Figs. 5c and 5d, shows neither an impulse-like time-domain structure nor a harmonic frequency-domain structure. In this it differs from the quasi-periodic impulse-like portion illustrated, for example, in Figs. 5a and 5b. As outlined later, however, the distinction between noise-like and quasi-periodic impulse-like portions can also be observed after LPC, in the excitation signal. LPC is a method that models the vocal tract and extracts the vocal-tract excitation from the signal.

Furthermore, quasi-periodic impulse-like portions and noise-like portions can alternate over time: one time portion of the audio signal is noisy and another is quasi-periodic, i.e., tonal. Alternatively or additionally, the characteristics of a signal can differ between frequency bands. The decision whether the audio signal is noisy or tonal can therefore also be made frequency-selectively, so that one frequency band is considered noisy and other bands are considered tonal. In this case, a given time portion of the audio signal may contain both tonal and noisy components.

Fig. 7a illustrates a linear model of a speech production system. This system assumes a two-stage excitation: an impulse train for voiced speech, as indicated in Fig. 7c, and random noise for unvoiced speech, as indicated in Fig. 7d. The vocal tract is modeled as an all-pole filter that processes the pulses of Fig. 7c or Fig. 7d generated by the glottal model 72. Hence, the system of Fig. 7a can be reduced to the all-pole filter model of Fig. 7b, which has a gain stage 77, a forward path 78, a feedback path 79 and an adding stage 80. The feedback path 79 contains a prediction filter 81. The whole source-model synthesis system illustrated in Fig. 7b can be represented in the z-domain as follows:

$$S(z) = \frac{g}{1 - A(z)}\, X(z)$$

Here, g denotes the gain, A(z) is the prediction filter determined by LP analysis, X(z) is the excitation signal, and S(z) is the synthesized speech output.
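
The source-filter model S(z) = g X(z) / (1 - A(z)) corresponds to the time-domain recursion s[n] = g x[n] + sum_k a_k s[n-k]. The sketch below drives that recursion with the two excitation types of Figs. 7c and 7d. Two things are assumptions for the example: the sign convention A(z) = sum_k a_k z^-k, under which the prediction is sum_k a_k s[n-k], and the filter coefficients.

```python
import random


def impulse_train(length, period):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def all_pole_synthesis(excitation, a, gain):
    """s[n] = g*x[n] + sum_{k=1..p} a_k * s[n-k], i.e. S(z) = g X(z) / (1 - A(z))
    with A(z) = sum_k a_k z^-k.  a = [a_1, ..., a_p]."""
    s = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * s[n - k]
        s.append(acc)
    return s


a = [1.3, -0.6]          # illustrative, stable 2nd-order "vocal tract"
voiced = all_pole_synthesis(impulse_train(160, 40), a, gain=0.5)
unvoiced = all_pole_synthesis(white_noise(160), a, gain=0.1)
print([round(v, 3) for v in voiced[:3]])   # [0.5, 0.65, 0.545]
```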

Figs. 7c and 7d give a graphical time-domain description of voiced and unvoiced speech synthesis using the linear source-system model. The system and excitation parameters in the above equation are unknown and must be determined from a finite set of speech samples. The coefficients of A(z) are obtained by linear prediction of the output signal and quantization of the filter coefficients. In a p-th order forward linear predictor, the present sample of the speech sequence is predicted from a linear combination of the p past samples. The predictor coefficients can be determined by well-known algorithms such as the Levinson-Durbin algorithm, or generally by an autocorrelation method or a reflection method.
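
The Levinson-Durbin recursion named above solves the normal equations of the autocorrelation method in O(p^2) operations and yields the reflection coefficients along the way. The sketch below assumes the predictor convention s_hat[n] = sum_k a_k s[n-k]. The function names are illustrative.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of a frame (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve sum_j a_j r[|i-j|] = r[i] (i = 1..p) recursively.

    Convention: s_hat[n] = sum_k a_k s[n-k].  Returns ([a_1..a_p], residual energy).
    """
    a = [0.0] * (order + 1)          # a[0] unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                    # silent or perfectly predictable frame
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # i-th reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# An exact first-order autocorrelation (r[k] = 0.9**k) needs only a_1 = 0.9.
coeffs, residual = levinson_durbin([1.0, 0.9, 0.81], 2)
print(coeffs, residual)   # a_1 close to 0.9, a_2 close to 0, residual close to 0.19
```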

Fig. 7e illustrates a more detailed implementation of the LPC analysis block 510 of Fig. 4a. The audio signal is input into a filter determination block, which determines the filter information A(z). This information is output as the short-term prediction information required by the decoder. The actual prediction filter 85 also requires the short-term prediction information. The current sample of the audio signal is input into a subtracter 86, which subtracts the predicted value for that sample, so that the prediction error signal for the sample is generated on line 84. Fig. 7c or 7d shows a very schematic example of a sequence of such prediction error signal samples. Therefore, Figs. 7c and 7d can be regarded as a kind of modified impulse-like signal.
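
The subtracter 86 and the prediction filter 85 together form the prediction-error (analysis) filter 1 - A(z). Its inverse, 1 / (1 - A(z)), recovers the signal. The sketch below uses illustrative coefficients. It synthesizes a signal from an impulse train and shows that the analysis filter returns exactly that impulse-like excitation, which is the behaviour sketched in Figs. 7c and 7d.

```python
def prediction_error(s, a):
    """Fig. 7e style analysis: e[n] = s[n] - sum_k a_k s[n-k] (filter 1 - A(z))."""
    e = []
    for n in range(len(s)):
        predicted = sum(ak * s[n - k] for k, ak in enumerate(a, start=1) if n >= k)
        e.append(s[n] - predicted)
    return e


def inverse_prediction(e, a):
    """Decoder-side inverse: s[n] = e[n] + sum_k a_k s[n-k] (filter 1 / (1 - A(z)))."""
    s = []
    for n in range(len(e)):
        predicted = sum(ak * s[n - k] for k, ak in enumerate(a, start=1) if n >= k)
        s.append(e[n] + predicted)
    return s


a = [1.3, -0.6]                                   # illustrative predictor
excitation = [1.0 if n % 40 == 0 else 0.0 for n in range(120)]
speech = inverse_prediction(excitation, a)        # smooth, resonant waveform
residual = prediction_error(speech, a)            # back to the impulse train
print(max(abs(r - x) for r, x in zip(residual, excitation)))   # close to 0 (round-off only)
```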

While Fig. 7e illustrates a preferred way to calculate the excitation signal, Fig. 7f illustrates a preferred way to calculate the weighted signal. In contrast to Fig. 7e, the filter 85 is different when γ differs from 1. A value of γ smaller than 1 is preferred. Furthermore, block 87 is present, and μ is preferably a number smaller than 1.
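
A common way to form a weighted signal, as in AMR-WB-style perceptual weighting, is to replace A(z) by A(z/γ), which scales each coefficient a_k by γ^k and broadens the formant peaks, followed by a one-pole filter. The sketch assumes that filter 85 becomes 1 - A(z/γ) and that block 87 is 1 / (1 - μ z^-1). These filter forms and the default values γ = 0.92 and μ = 0.68 are assumptions, not details given in the text. With γ = 1 and μ = 0, the chain reduces to the plain prediction-error filter of Fig. 7e.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_k -> a_k * gamma**k (broadens formant peaks)."""
    return [ak * gamma ** k for k, ak in enumerate(a, start=1)]


def weighted_signal(s, a, gamma=0.92, mu=0.68):
    """Hypothetical Fig. 7f chain: filter 85 as 1 - A(z/gamma), then
    block 87 assumed to be the one-pole filter 1 / (1 - mu z^-1)."""
    aw = bandwidth_expand(a, gamma)
    fir = [s[n] - sum(c * s[n - k] for k, c in enumerate(aw, start=1) if n >= k)
           for n in range(len(s))]
    out, prev = [], 0.0
    for v in fir:
        prev = v + mu * prev
        out.append(prev)
    return out


print(bandwidth_expand([1.3, -0.6], 0.5))              # [0.65, -0.15]
print(weighted_signal([1.0, 0.0, 0.0], [], 0.9, 0.5))  # [1.0, 0.5, 0.25]
```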

In general, the elements of Figs. 7e and 7f can be implemented as specified in 3GPP TS 26.190 or 3GPP TS 26.290.

Fig. 7g illustrates an inverse processing that can be applied on the decoder side, for example in element 537 of Fig. 2b. In particular, block 88 generates an unweighted signal from the weighted signal, and block 89 calculates an excitation from the unweighted signal. In general, all signals in Fig. 7g except the unweighted signal are in the LPC domain. The excitation signal and the weighted signal, however, are different signals within that same domain. Block 89 outputs an excitation signal, which can then be used together with the output of block 536. The common inverse LPC transform can then be performed in block 540 of Fig. 2b.

Subsequently, an analysis-by-synthesis CELP encoder is discussed in connection with Fig. 6 in order to illustrate the modifications applied to this algorithm. This CELP encoder is discussed in detail in "Speech Coding: A Tutorial Review", Andreas Spanias, Proceedings of the IEEE, Vol. 82, No. 10, October 1994, pages 1541-1582. The CELP encoder illustrated in Fig. 6 includes a long-term prediction component 60 and a short-term prediction component 62. Furthermore, a codebook indicated at 64 is used. A perceptual weighting filter W(z) is implemented at 66, and an error minimization controller is provided at 68. s(n) is the time-domain input signal.

After perceptual weighting, the weighted signal is input into a subtracter 69. The subtracter calculates the error between the weighted synthesis signal at the output of block 66 and the original weighted signal $s_w(n)$. Generally, the short-term prediction filter coefficients A(z) are calculated by an LP analysis stage and quantized into $\hat{A}(z)$, as indicated in Fig. 7e. The long-term prediction information $A_L(z)$ includes the long-term prediction gain g and the vector quantization index, i.e., the codebook references. It is calculated on the prediction error signal at the output of the LPC analysis stage, referred to as 10a in Fig. 7e. The LTP parameters are the pitch delay and gain. In CELP, this is usually implemented as an adaptive codebook containing the past excitation signal (not the residual). The adaptive codebook delay and gain are found by minimizing the mean-squared weighted error (closed-loop pitch search).

The CELP algorithm encodes the residual signal obtained after the short-term and long-term predictions using a codebook of, for example, Gaussian sequences. The ACELP algorithm, where the "A" stands for "algebraic", has a specific algebraically designed codebook.

A codebook may contain more or fewer vectors, each of a certain length in samples. A gain factor g scales the code vector, and the gained code is filtered by the long-term prediction synthesis filter and the short-term prediction synthesis filter. The "optimum" code vector is selected such that the perceptually weighted mean squared error at the output of subtractor 69 is minimized. The search process in CELP is carried out by analysis-by-synthesis optimization, as shown in FIG. 6.
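The analysis-by-synthesis search described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: the target is taken to be already perceptually weighted, only the short-term synthesis filter 1/A(z) is applied (no long-term predictor, no filter memory), and the optimal gain g is computed in closed form for each code vector. The function names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole LP synthesis 1/A(z), with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                y -= ai * out[n - i]   # feedback from past output samples
        out.append(y)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the code vector whose synthesized and
    optimally scaled version best matches the target in the squared-error sense."""
    best = None
    for idx, cv in enumerate(codebook):
        y = synthesize(cv, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # Optimal gain for this vector: <target, y> / <y, y>
        g = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - g * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (idx, g, err)
    return best
```

Because the gain is optimized per vector, the search compares only one error value per codebook entry.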

For specific cases where a frame is a mixture of unvoiced and voiced speech, or where speech occurs over music, TCX coding is more appropriate for coding the excitation in the LPC domain. TCX coding processes the weighted signal in the frequency domain without making any assumption about how the excitation is produced. TCX is therefore more generic than CELP coding and is not restricted to a voiced or unvoiced source model of the excitation. TCX is nevertheless still a source-oriented model that uses a linear predictive filter to model the formants of speech-like signals.

In AMR-WB+-style coding, the selection between the different TCX modes and ACELP takes place as known from the AMR-WB+ description. The TCX modes differ in the block-wise discrete Fourier transform they use, which differs in size from mode to mode, and the best mode can be selected by an analysis-by-synthesis approach or by a direct "feedforward" mode.

As discussed in connection with FIGS. 2A and 2B, the common preprocessing stage 100 preferably includes a joint multichannel (surround/joint stereo) stage 101 and, additionally, a bandwidth extension stage 102. Correspondingly, the decoder includes a bandwidth extension stage 701 and a joint multichannel stage 702 connected in series. Preferably, on the encoder side, the joint multichannel stage 101 is connected before the bandwidth extension stage 102, and on the decoder side, the bandwidth extension stage 701 is connected before the joint multichannel stage 702 with respect to the signal processing direction. Alternatively, however, the common preprocessing stage may include a joint multichannel stage without a subsequently connected bandwidth extension stage, or a bandwidth extension stage without a connected joint multichannel stage.

A preferred example of the joint multichannel stage on the encoder side (101a, 101b) and on the decoder side (702a, 702b) is illustrated in the context of FIG. 8. A number E of original input channels is input into the downmixer 101a so that the downmixer 101a generates a number K of transmitted channels, where K is greater than or equal to one and smaller than or equal to E.

Preferably, the E input channels are input into a joint multichannel parameter analyzer 101b, which generates parametric information. This parametric information is preferably entropy-encoded, for example by difference encoding followed by Huffman encoding or, alternatively, followed by arithmetic encoding. The encoded parametric information output by block 101b is transmitted to a parameter decoder 702b, which is part of item 720 in FIG. 2B. The parameter decoder 702b decodes the transmitted parametric information and forwards the decoded parametric information to the upmixer 702a. The upmixer 702a receives the K transmitted channels and generates a number L of output channels, where L is greater than or equal to K and smaller than or equal to E.

The parametric information may include inter-channel level differences, inter-channel time differences, inter-channel phase differences and/or inter-channel coherence measures, as known from the BCC technique or as known and described in detail in the MPEG Surround standard. The transmitted signal may consist of a single mono channel for ultra-low bit rate applications, or of a compatible stereo signal, i.e., two channels. Typically, the number E of input channels may be five or higher. Alternatively, the E input channels may be E audio objects, as known in the context of spatial audio object coding (SAOC).
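To make the downmix and inter-channel level difference (ICLD) parameters concrete, the following sketch reduces the scheme to E = 2 input channels and K = 1 transmitted channel for a single time/frequency tile. It is an assumption-laden illustration, not the MPEG Surround or BCC algorithm: only the ICLD is transmitted (no time, phase or coherence cues), and the upmix gains are chosen so that their squared sum is 2 and their energy ratio reproduces the ICLD. All function names are hypothetical.

```python
import math


def downmix(left, right):
    """Unweighted 2-to-1 downmix (E = 2 input channels, K = 1 transmitted channel)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def icld_db(left, right, eps=1e-12):
    """Inter-channel level difference of one time/frequency tile, in dB."""
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    return 10.0 * math.log10((e_l + eps) / (e_r + eps))


def upmix(mono, icld):
    """Recreate two channels whose energy ratio matches the transmitted ICLD.
    Gains satisfy gL^2 / gR^2 = 10^(ICLD/10) and gL^2 + gR^2 = 2."""
    c = 10.0 ** (icld / 10.0)
    g_l = math.sqrt(2.0 * c / (1.0 + c))
    g_r = math.sqrt(2.0 / (1.0 + c))
    return [g_l * m for m in mono], [g_r * m for m in mono]
```

For a right channel that is the left channel scaled by 0.5, the ICLD is about 6.02 dB, and the upmixed pair reproduces that level difference from the single downmix channel.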

In one implementation, the downmixer performs a weighted or unweighted addition of the original E input channels or of the E input audio objects. In the case of audio objects as input signals, the joint multichannel parameter analyzer 101b calculates audio object parameters, such as a correlation matrix between the audio objects, preferably for each time portion and, even more preferably, for each frequency band. To this end, the whole frequency range may be divided into at least 10, and preferably 32 or 64, frequency bands.
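A correlation matrix between audio objects, as mentioned above, can be computed per time/frequency tile as a normalized cross-correlation. The sketch below assumes that the samples of one tile (for example, the subband samples of one band in one time portion) are already available for every object; the function name and the normalization are illustrative choices.

```python
import math


def object_correlation_matrix(objects, eps=1e-12):
    """Normalized cross-correlation between audio objects for one time/frequency tile.
    objects: list of equal-length sample lists, one per audio object."""
    n = len(objects)
    energy = [sum(x * x for x in obj) for obj in objects]
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cross = sum(a * b for a, b in zip(objects[i], objects[j]))
            # eps guards against division by zero for silent objects
            mat[i][j] = cross / math.sqrt(energy[i] * energy[j] + eps)
    return mat
```

In a full implementation, this would be evaluated once per frequency band and time portion, giving one matrix per tile.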

FIG. 9 illustrates a preferred embodiment for implementing the bandwidth extension stage 102 of FIG. 2A and the corresponding bandwidth extension stage 701 of FIG. 2B. On the encoder side, the bandwidth extension block 102 preferably includes a low-pass filtering block 102b, a downsampler block, which follows the low-pass filter or is part of an inverse QMF operating on only half of the QMF bands, and a high-band analyzer 102a. The original audio signal input into the bandwidth extension block 102 is low-pass filtered to generate the low-band signal, which is then input into the encoding branches and/or the switch. The low-pass filter has a cutoff frequency that may lie in a range of 3 kHz to 10 kHz. Furthermore, the high-band analyzer of the bandwidth extension block 102 calculates bandwidth extension parameters such as spectral envelope parameter information, noise floor parameter information, inverse filtering parameter information, further parametric information relating to certain harmonic lines in the high band, and additional parameters as discussed in detail in the MPEG-4 standard in the chapter on spectral band replication (SBR).

On the decoder side, the bandwidth extension block 701 includes a patcher 701a, an adjuster 701b and a combiner 701c. The combiner 701c combines the decoded low-band signal with the reconstructed and adjusted high-band signal output by the adjuster 701b. The input into the adjuster 701b is provided by the patcher 701a, which is operated to derive the high-band signal from the low-band signal, for example by spectral band replication or, more generally, by bandwidth extension. The patching performed by the patcher 701a may be carried out in a harmonic or a non-harmonic way.
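The patcher/adjuster/combiner chain described above can be illustrated with a deliberately simplified sketch. It works on a list of real spectral values rather than on QMF subband samples, uses a plain non-harmonic copy-up patch, and omits inverse filtering, noise floor and sinusoid addition. The function names and the envelope format (one target energy per group of bins) are assumptions made for illustration, not the MPEG-4 SBR syntax.

```python
import math


def patch(low_band, n_high):
    """Non-harmonic copy-up patch: translate low-band bins into the high band,
    repeating the low band if the high band is wider."""
    return [low_band[k % len(low_band)] for k in range(n_high)]


def adjust(high_band, envelope, band_width):
    """Scale each group of band_width bins so that its energy equals the
    transmitted envelope energy for that group."""
    out = []
    for b, target in enumerate(envelope):
        seg = high_band[b * band_width:(b + 1) * band_width]
        energy = sum(x * x for x in seg)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        out.extend(gain * x for x in seg)
    return out


def combine(low_band, adjusted_high):
    """Combiner: decoded low band followed by the regenerated high band."""
    return list(low_band) + list(adjusted_high)
```

A harmonic patcher would replace `patch` with a transposition that preserves harmonic spacing; the adjuster and combiner stay the same.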

As shown in FIGS. 8 and 9, the illustrated blocks may, in a preferred embodiment, have a mode control input. This mode control input is derived from the output signal of the decision stage 300. In such a preferred embodiment, a characteristic of the corresponding block may be adapted to the decision stage output, i.e., to whether a decision for speech or a decision for music has been made for a certain time portion of the audio signal. Preferably, the mode control relates to only one or more of the functionalities of these blocks, but not to all of them. For example, the decision may influence only the patcher 701a and not the other blocks in FIG. 9, or may influence only the joint multichannel parameter analyzer 101b in FIG. 8 but not the other blocks in FIG. 8. This implementation is preferred because providing flexibility in the common preprocessing stage yields higher flexibility, higher quality and a lower bit-rate output signal. On the other hand, using the same algorithms in the common preprocessing stage for both kinds of signals allows an efficient encoding/decoding scheme to be implemented.

FIGS. 10A and 10B illustrate two different implementations of the decision stage 300. FIG. 10A shows an open-loop decision. Here, the signal analyzer 300a in the decision stage applies certain rules to decide whether a certain time portion or a certain frequency portion of the input signal has a characteristic that requires this signal portion to be encoded by the first encoding branch 400 or by the second encoding branch 500. To this end, the signal analyzer 300a may analyze the audio input signal into the common preprocessing stage, or the audio signal output by the common preprocessing stage, i.e., the audio intermediate signal, or an intermediate signal within the common preprocessing stage, such as the output of the downmixer, which may be a mono signal or a signal having K channels as shown in FIG. 8. On the output side, the signal analyzer 300a generates the switching decision for controlling the switch 200 on the encoder side and the corresponding switch 600 or combiner 600 on the decoder side.

Although the second switch 521 has not been discussed in detail, it is emphasized that it can be positioned in a similar way as the first switch 200 discussed in connection with FIGS. 4A and 4B. Thus, an alternative position of switch 521 in FIG. 3C is at the output of both processing branches 522, 523, 524, so that both processing branches operate in parallel and only the output of one processing branch is written into the bitstream by a bitstream former, which is not shown in FIG. 3C.

Furthermore, the second combiner 600 may have a specific cross-fading functionality as discussed in connection with FIG. 4C. Alternatively or additionally, the first combiner 532 may have the same cross-fading functionality. Both combiners may have the same or different cross-fading functionalities, or may have no cross-fading functionality at all, in which case both combiners switch without any additional cross-fading.
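A cross-fade between the outputs of the two branches over an overlap region can be sketched as below. This shows only a simple, non-critically sampled linear cross-fade, under the assumption that both branches have produced time-domain output for the same overlap region; the critically sampled variant based on MDCT time-domain aliasing cancellation is not shown. The function name is hypothetical.

```python
def cross_fade(prev_branch, next_branch):
    """Linear cross-fade over the overlap region between the output of the
    branch being left and the output of the branch being entered."""
    n = len(prev_branch)
    if n != len(next_branch):
        raise ValueError("overlap regions must have equal length")
    out = []
    for i in range(n):
        w = (i + 0.5) / n  # fade-in weight of the new branch, rising from ~0 to ~1
        out.append((1.0 - w) * prev_branch[i] + w * next_branch[i])
    return out
```

For example, fading from a constant 1.0 to a constant 0.0 over four samples yields 0.875, 0.625, 0.375, 0.125.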

As discussed above, both switches can be controlled by an open-loop decision or a closed-loop decision as discussed in connection with FIGS. 10A and 10B, and the controllers 300, 525 of FIG. 3 may have different or identical functionalities for the two switches.
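The closed-loop case can be illustrated as follows: the frame is encoded and locally decoded by every candidate branch, and the branch giving the best quality measure is kept. The sketch uses plain SNR as the quality measure and models each branch as a hypothetical callable that returns the locally decoded frame. A real encoder would more likely use a perceptually weighted measure. An open-loop decision, by contrast, derives the choice from signal features alone, without trial encoding.

```python
import math


def snr_db(ref, test, eps=1e-12):
    """Signal-to-noise ratio of a decoded frame against the original, in dB."""
    num = sum(x * x for x in ref)
    den = sum((x - y) ** 2 for x, y in zip(ref, test))
    return 10.0 * math.log10((num + eps) / (den + eps))


def closed_loop_decision(frame, branches):
    """Encode and locally decode the frame with every candidate branch and keep
    the one with the highest SNR.
    branches: dict mapping a branch name to a callable(frame) -> decoded frame."""
    scores = {name: snr_db(frame, codec(frame)) for name, codec in branches.items()}
    return max(scores, key=scores.get)
```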

Furthermore, a signal-adaptive time-warping functionality may exist not only in the first encoding branch or the first decoding branch, but also in the second processing branch of the second coding branch, on the encoder side as well as on the decoder side. Depending on the processed signal, both time-warping functionalities may use the same time-warping information, so that the same time warp is applied to the signals in the first domain and in the second domain. This reduces the processing load and may be useful in cases where subsequent blocks have similar time-warping characteristics. In alternative embodiments, however, it is preferred to have independent time-warp estimators for the first coding branch and for the second processing branch in the second coding branch.

The inventive encoded audio signal can be stored on a digital storage medium or transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium such as the Internet.

In another embodiment, the switch 200 of FIG. 1A or 2A switches between the two coding branches 400, 500. In further embodiments, there may be additional encoding branches, such as a third, a fourth or even more encoding branches. On the decoder side, the switch 600 of FIG. 1B or 2B switches between the two decoding branches 431, 440 and 531, 532, 533, 534, 540. In further embodiments, there may be additional decoding branches, such as a third, a fourth or even more decoding branches. Similarly, the other switches 521 or 532 may switch between more than two different coding algorithms when such additional encoding/decoding branches are provided.

FIG. 12A shows a preferred embodiment of an encoder implementation, and FIG. 12B shows a preferred embodiment of the corresponding decoder implementation. In addition to the elements discussed earlier with respect to the corresponding reference numerals, the embodiment of FIG. 12A shows a separate psychoacoustic module 1200 and, additionally, a preferred implementation of the further encoder tools illustrated at block 421 of FIG. 11A. These additional tools are the temporal noise shaping (TNS) tool 1201 and the mid/side (M/S) coding tool 1202. Furthermore, the elements 421 and 524 are illustrated at block 421/542 as a combined implementation of scaling, noise filling analysis, quantization and arithmetic coding of spectral values.

In the corresponding decoder implementation of FIG. 12B, additional elements are shown, namely the M/S decoding tool 1203 and the TNS decoder tool 1204. Furthermore, a bass postfilter, not shown in the previous figures, is indicated at 1205. The transition windowing block 532 corresponds to element 532 of FIG. 2B, which is illustrated as a switch but performs a kind of cross-fading that can be either oversampled or critically sampled. The latter is implemented as an MDCT operation in which the two time-aliased portions are overlapped and added. This critically sampled transition processing is preferably used where appropriate, since the overall bit rate can be reduced without any loss in quality. The additional transition windowing block 600 corresponds to the combiner 600 of FIG. 2B, which is again illustrated as a switch. This element, however, performs a kind of cross-fading, either critically sampled or non-critically sampled, in order to avoid blocking artifacts, and specifically switching artifacts, when one block has been processed in the first branch and another block has been processed in the second branch. While the cross-fading operation is to be understood as a "soft" switch between both branches, it may "degrade" to a hard switch when the processing in both branches is perfectly matched.

The concept of FIGS. 12A and 12B enables coding of signals with an arbitrary mix of speech and audio content, and performs comparably to or better than the best coding technology that might be tailored specifically to coding either speech or general audio content. The general structure of the encoder and decoder can be described as follows: there is a common pre-/postprocessing consisting of an MPEG Surround (MPEGS) functional unit, which handles stereo or multichannel processing, and an enhanced SBR (eSBR) unit, which handles the parametric representation of the higher audio frequencies of the input signal. Then there are two branches: one consists of a modified Advanced Audio Coding (AAC) tool path, and the other consists of a linear prediction coding (LP or LPC domain) based path, which in turn features either a frequency-domain representation or a time-domain representation of the LPC residual. All transmitted spectra for both AAC and LPC are represented in the MDCT domain and then quantized and arithmetically coded. The time-domain representation uses an ACELP excitation coding scheme. The basic structure is shown in FIG. 12A for the encoder and in FIG. 12B for the decoder. The data flow in these figures is from left to right and from top to bottom. The function of the decoder is to find the description of the quantized audio spectra or time-domain representation in the bitstream payload and to decode the quantized values and other reconstruction information.

In the case of transmitted spectral information, the decoder reconstructs the quantized spectra, processes the reconstructed spectra through whatever tools are active in the bitstream payload in order to arrive at the actual signal spectra as described by the input bitstream payload, and finally converts the frequency-domain spectra to the time domain. After the initial reconstruction and scaling of the spectra, optional tools may modify one or more of the spectra in order to provide more efficient coding.

In the case of a transmitted time-domain signal representation, the decoder reconstructs the quantized time signal and processes the reconstructed time signal through whatever tools are active in the bitstream payload in order to arrive at the actual time-domain signal as described by the input bitstream payload.

For each of the optional tools that operate on the signal data, the option to "pass through" is retained. In all cases where the processing is omitted, the spectra or time samples at the tool's input are passed directly through the tool without modification.

Where the bitstream changes its signal representation from the time domain to the frequency domain, or from the LP domain to the non-LP domain, or vice versa, the decoder facilitates the transition from one domain to the other by means of appropriate transition overlap-add windowing.

eSBR and MPEGS processing are applied in the same manner to both coding paths after the transition handling.

The input to the bitstream payload demultiplexer tool is the bitstream payload. The demultiplexer separates the bitstream payload into the parts for each tool and provides each tool with the bitstream payload information related to that tool.

The outputs from the bitstream payload demultiplexer tool are:

● Depending on the core coding type in the current frame, either:
    ● the quantized and noiselessly coded spectra, represented by
        ● scalefactor information
        ● arithmetically coded spectral lines
    ● or linear prediction (LP) parameters together with an excitation signal, represented by either
        ● quantized and arithmetically coded spectral lines (transform coded excitation, TCX), or
        ● ACELP-coded time-domain excitation
● Spectral noise filling information (optional)
● M/S decision information (optional)
● Temporal noise shaping (TNS) information (optional)
● Filterbank control information
● Time-warping (TW) control information (optional)
● Enhanced spectral band replication (eSBR) control information
● MPEG Surround (MPEGS) control information

The scalefactor noiseless decoding tool takes information from the bitstream payload, parses that information and decodes the Huffman/DPCM-coded scalefactors.

The input to the scalefactor noiseless decoding tool is:

● The scalefactor information for the noiselessly coded spectra

The output of the scalefactor noiseless decoding tool is:

● The decoded integer representation of the scalefactors
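The DPCM part of scalefactor decoding can be sketched as follows. The sketch assumes that the Huffman stage has already turned each codeword into a signed difference and that the first scalefactor is coded relative to a global gain value, as in AAC; both the function name and the input format are assumptions for illustration.

```python
def decode_scalefactors(global_gain, dpcm_diffs):
    """Undo DPCM coding: each scalefactor equals the previous one plus a decoded
    difference, starting from global_gain.
    dpcm_diffs: already Huffman-decoded signed differences (assumed input format)."""
    scalefactors = []
    prev = global_gain
    for d in dpcm_diffs:
        prev += d
        scalefactors.append(prev)
    return scalefactors
```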

The spectral noiseless decoding tool takes information from the bitstream payload demultiplexer, parses that information, decodes the arithmetically coded data and reconstructs the quantized spectra. The input to this noiseless decoding tool is:

● The noiselessly coded spectra

The output of this noiseless decoding tool is:

● The quantized values of the spectra

The inverse quantizer tool takes the quantized values of the spectra and converts the integer values into the unscaled, reconstructed spectra. This quantizer is a companding quantizer whose companding factor depends on the chosen core coding mode.

The input to the inverse quantizer tool is:

● The quantized values of the spectra

The output of the inverse quantizer tool is:

● The unscaled, inversely quantized spectra
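A companding inverse quantizer can be sketched by applying a power law to each quantized integer. The exponent 4/3 is the value used by the AAC inverse quantizer. It is exposed as a parameter here only because the text states that the companding factor depends on the core coding mode; the values used for other modes are not given in this section.

```python
import math


def inverse_quantize(q, exponent=4.0 / 3.0):
    """Companded inverse quantization: x = sign(q) * |q| ** exponent.
    The default exponent 4/3 follows AAC; other core coding modes may differ."""
    return [math.copysign(abs(v) ** exponent, v) if v else 0.0 for v in q]
```

For example, the quantized values 0, 1 and -8 map to 0.0, 1.0 and approximately -16.0.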

The noise filling tool is used to fill spectral gaps in the decoded spectra, which occur when spectral values are quantized to zero, for example because of a strong restriction on the bit demand in the encoder.

The inputs to the noise filling tool are:

● The unscaled, inversely quantized spectra

● Noise filling parameters

● The decoded integer representation of the scalefactors

The outputs of the noise filling tool are:

● The unscaled, inversely quantized spectral values for spectral lines that were previously quantized to zero

● The modified integer representation of the scalefactors
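The following is a simplified sketch of the noise filling step: spectral lines quantized to zero at or above a start line are replaced by values of a given noise level with a pseudo-random sign. The parameters `noise_level` and `start_line` are hypothetical stand-ins for the transmitted noise filling parameters, and the accompanying modification of the scalefactors is not modelled.

```python
import random


def noise_fill(spectrum, noise_level, start_line=0, seed=0):
    """Replace lines quantized to zero (at or above start_line) by +/- noise_level
    with pseudo-random sign; nonzero lines are left untouched."""
    rng = random.Random(seed)  # deterministic seed so that the output is reproducible
    out = list(spectrum)
    for k in range(start_line, len(out)):
        if out[k] == 0.0:
            out[k] = noise_level if rng.random() < 0.5 else -noise_level
    return out
```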

The rescaling tool converts the integer representation of the scalefactors into the actual values and multiplies the unscaled, inversely quantized spectra by the relevant scalefactors.

The input to the rescaling (scalefactor) tool is:

● The decoded integer representation of the scalefactors

The output from the rescaling (scalefactor) tool is:

● The scaled, inversely quantized spectra
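A sketch of the rescaling step, assuming the AAC convention in which a scalefactor sf applies a gain of 2^(0.25 (sf - 100)) to all lines of its scalefactor band. The band layout passed in `band_offsets` is a hypothetical example.

```python
SF_OFFSET = 100  # AAC convention: a scalefactor of 100 corresponds to unit gain


def rescale(spectrum, scalefactors, band_offsets):
    """Multiply each scalefactor band of the inversely quantized spectrum by
    2 ** (0.25 * (sf - SF_OFFSET)).
    Band b covers lines band_offsets[b] .. band_offsets[b + 1] - 1."""
    out = list(spectrum)
    for b, sf in enumerate(scalefactors):
        gain = 2.0 ** (0.25 * (sf - SF_OFFSET))
        for k in range(band_offsets[b], band_offsets[b + 1]):
            out[k] *= gain
    return out
```

Each step of 4 in the scalefactor therefore doubles the amplitude of its band, i.e., one scalefactor step corresponds to 1.5 dB.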

For an overview of the M/S tool, see ISO/IEC 14496-3, subpart 4.1.1.2.
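For reference, the decoder-side M/S reconstruction defined for AAC computes L = M + S and R = M - S per spectral line. The sketch below pairs it with the usual encoder-side transform M = (L + R)/2, S = (L - R)/2, so that the round trip is exact. Per-band M/S flags, which select where the transform applies, are omitted, and the function names are hypothetical.

```python
def lr_to_ms(left, right):
    """Encoder side: M = (L + R) / 2, S = (L - R) / 2 for each spectral line."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Decoder side: L = M + S, R = M - S for each spectral line."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```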

For an overview of the temporal noise shaping (TNS) tool, see ISO/IEC 14496-3, subpart 4.1.1.2.

The filterbank/block switching tool applies the inverse of the frequency mapping that was carried out in the encoder. An inverse modified discrete cosine transform (IMDCT) is used for the filterbank tool. The IMDCT can be configured to support 120, 128, 240, 256, 320, 480, 512, 576, 960, 1024 or 1152 spectral coefficients.

The inputs to the filterbank tool are:

● The (inversely quantized) spectra

● The filterbank control information

The output(s) of the filterbank tool is (are):

● The time-domain reconstructed audio signal(s)
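The IMDCT-based filterbank can be illustrated by a direct O(N^2) MDCT/IMDCT pair with a sine window and overlap-add, where N is the number of spectral coefficients per block (for example, one of the sizes listed above). The sketch assumes a sine window, which satisfies the Princen-Bradley condition, and scales the IMDCT by 2/N so that the windowed overlap-add cancels the time-domain aliasing. Block switching, other window shapes and fast algorithms are omitted, and the function names are hypothetical.

```python
import math


def mdct(x):
    """MDCT of 2N (already windowed) samples -> N coefficients."""
    n = len(x) // 2
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """IMDCT of N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]


def sine_window(length):
    """Sine window; satisfies w[i]^2 + w[i + length/2]^2 = 1."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct_roundtrip(signal, n):
    """Window, MDCT, IMDCT, window again and overlap-add with hop size n."""
    if len(signal) % n:
        raise ValueError("signal length must be a multiple of n")
    w = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        frame = [w[i] * padded[start + i] for i in range(2 * n)]
        y = imdct(mdct(frame))
        for i in range(2 * n):
            out[start + i] += w[i] * y[i]  # time-domain aliasing cancels here
    return out[n:n + len(signal)]
```

Each output sample is covered by two overlapping blocks; their aliasing terms cancel, so `mdct_roundtrip` returns the input up to floating-point rounding.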

시간-워핑된 필터뱅크/블록 스위칭 도구(Time-Warped Filterbank/Block Switching Tool)는 시간 워핑 모드가 가용 가능할 때 보통의 필터뱅크/블록 스위칭 도구를 대체한다. 상기 필터뱅크는 보통의 필터뱅크에서와 동일(IMDCT)하고, 추가로, 윈도우잉된 시간 영역 샘플들은, 시변 재샘플링(Time-varying Resampling)에 의해 워핑된 시간 영역으로부터 선형 시간 영역으로 매핑된다.
The Time-Warped Filterbank / Block Switching Tool replaces the normal filter bank / block switching tool when time warping mode is available. The filter bank is the same as in the normal filter bank (IMDCT) and further windowed time-domain samples are mapped from the warped time domain to the linear time domain by Time-varying resampling.

The inputs to the time-warped filterbank tool are:

● The inverse quantized spectra

● The filterbank control information

● The time-warping control information

The output(s) from the time-warped filterbank tool are:

● The linear time-domain reconstructed audio signal(s)

The enhanced SBR (eSBR) tool regenerates the highband of the audio signal. It is based on replication of the sequences of harmonics that were truncated during encoding. It adjusts the spectral envelope of the generated highband, applies inverse filtering, and adds noise and sinusoidal components in order to recreate the spectral characteristics of the original signal.

The inputs to the eSBR tool are:

● The quantized envelope data

● Miscellaneous control data

● A time-domain signal from the AAC core decoder

The output of the eSBR tool is either:

● A time-domain signal, or

● A QMF-domain representation of the signal, e.g., when the MPEG Surround tool is used

The MPEG Surround (MPEGS) tool produces multiple signals from one or more input signals by applying a sophisticated upmix procedure to the input signal(s), controlled by appropriate spatial parameters. In the USAC context, MPEGS is used for coding a multichannel signal by transmitting parametric side information alongside a transmitted downmixed signal.

The input to the MPEGS tool is:

● A downmixed time-domain signal, or

● A QMF-domain representation of a downmixed signal from the eSBR tool

The output of the MPEGS tool is:

● A multichannel time-domain signal

The signal classifier tool analyzes the original input signal and generates from it control information that triggers the selection of the different coding modes. The analysis of the input signal is implementation dependent and attempts to choose the optimal core coding mode for a given input signal frame. The output of the signal classifier can (optionally) also be used to influence the behavior of other tools, for example MPEG Surround, enhanced SBR, the time-warped filterbank and others.

The inputs to the signal classifier tool are:

● The original, unmodified input signal

● Additional implementation-dependent parameters

The output of the signal classifier tool is:

● A control signal to control the selection of the core codec (non-LP-filtered frequency-domain coding, LP-filtered frequency-domain coding, or LP-filtered time-domain coding)

In accordance with the present invention, the time/frequency resolution in block 410 of Fig. 12A and in converter 523 of Fig. 12A is controlled depending on the audio signal. The interrelation between window length, transform length, time resolution and frequency resolution is illustrated in Fig. 13A: for a long window length, the time resolution is low but the frequency resolution is high, while for a short window length, the time resolution is high but the frequency resolution is low.

In the first coding branch, which is preferably the AAC coding branch indicated by elements 410, 1201, 1202 and 4021 in Fig. 12A, different windows can be used, and the window shape is determined by a signal analyzer, which is preferably included in the signal classifier block 300 but can also be a separate module. The encoder selects one of the windows illustrated in Fig. 13B, which have different time/frequency resolutions. The time/frequency resolution of the first long window, the second window, the fourth window, the fifth window and the sixth window is equal to 2,048 sampling values for a transform length of 1,024. The short window shown in the third line of Fig. 13B has a time resolution of 256 sampling values, which corresponds to the window size. This corresponds to a transform length of 128.

Similarly, the last two windows have a window length equal to 2,304, which gives a better frequency resolution than the window in the first line but a lower time resolution. The transform length of the windows in the last two lines is equal to 1,152.

In the first coding branch, different window sequences can be constructed from the transform windows of Fig. 13B. In Fig. 13C only the short sequence consists of several windows, whereas the other "sequences" consist of a single window only; however, larger sequences consisting of more windows can also be constructed. According to Fig. 13B, for a smaller number of coefficients, i.e., 960 instead of 1,024, the time resolution is also lower than for a correspondingly higher number of coefficients such as 1,024.

Figs. 14A-14G illustrate different resolutions/window sizes in the second coding branch. In a preferred embodiment of the present invention, the second coding branch has a first processing branch, which is an ACELP time-domain coder 526, and a second processing branch comprising the filterbank 523. In this branch, a superframe of, for example, 2048 samples is subdivided into frames of 256 samples. Individual windows of 256 samples can be used separately, so that, when an MDCT with 50 percent overlap is applied, a sequence of four windows, each window covering two frames, can be applied. A high time resolution is then used, as illustrated in Fig. 14D. Alternatively, when the signal allows longer windows, a sequence as in Fig. 14C can be applied, in which a doubled window size of 1,024 samples is used for each window (medium window), so that one window covers four frames with an overlap of 50 percent.

Finally, when the signal is such that a long window can be used, the long window extends over 4,096 samples, again with a 50 percent overlap.

In the preferred embodiment with two branches, one of which is the ACELP encoder, the position of the ACELP frames, indicated by "A" in the superframe, can also determine the window size applied to two adjacent TCX frames, indicated by "T" in Fig. 14E. Basically, the aim is to use long windows whenever possible. Nevertheless, a short window has to be applied when a single T frame lies between two A frames. Medium windows can be applied when there are two adjacent T frames. However, when there are three adjacent T frames, a correspondingly larger window may not be efficient because of the additional complexity. Therefore, the third T frame can be processed by a short window even though it is not preceded by an A frame. When the whole superframe consists of T frames only, a long window can be applied.
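The window-selection rules above can be sketched as a small function. The string encoding ('A'/'T' per 256-sample frame), the function name and the labels "short", "medium" and "long" (meaning one TCX unit spanning one, two or four frames) are hypothetical; the sketch follows only the rules stated in the text and does not model any alignment constraint that an actual lpd_mode table may impose.

```python
from typing import List, Tuple


def choose_tcx_windows(pattern: str) -> List[Tuple[int, int, str]]:
    """Map a 4-frame superframe pattern ('A' = ACELP, 'T' = TCX) to coding units.

    Returns (first_frame, frames_covered, kind) tuples: a long window only if
    all four frames are T, medium windows for pairs of adjacent T frames, and
    a short window for a leftover single T frame.
    """
    if len(pattern) != 4 or set(pattern) - {"A", "T"}:
        raise ValueError("expected four characters from {'A', 'T'}")
    if pattern == "TTTT":
        return [(0, 4, "long")]
    units = []
    i = 0
    while i < 4:
        if pattern[i] == "A":
            units.append((i, 1, "acelp"))
            i += 1
            continue
        run = 1  # length of the run of adjacent T frames starting at i
        while i + run < 4 and pattern[i + run] == "T":
            run += 1
        pos = i
        while pos + 2 <= i + run:  # as many medium windows as fit in the run
            units.append((pos, 2, "medium"))
            pos += 2
        if pos < i + run:  # one T frame left over -> short window
            units.append((pos, 1, "short"))
        i += run
    return units


print(choose_tcx_windows("ATTT"))
```

For "ATTT" the run of three T frames becomes one medium window followed by a short window, matching the rule that the third adjacent T frame is handled by a short window.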

Fig. 14F illustrates several alternatives for windows, where the window size is always 2 × the number lg of spectral coefficients, owing to the preferred 50 percent overlap. However, other overlap ratios can be applied in all coding branches, so that the relation between window size and transform length can differ from 2 and can even approach 1 when no time-domain aliasing is applied.

Fig. 14G illustrates rules for constructing a window based on the scheme given in Fig. 14F. The value ZL denotes the number of zeros at the beginning of the window. The value L denotes the number of window coefficients in the aliasing zone. The values in the M portion are "1" values, which do not cause any overlap with the adjacent window, since that window has zero values in the portion corresponding to M. The M portion is followed by the right overlap zone R, which in turn is followed by the ZR zone of zeros, corresponding to the M portion of the subsequent window.
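The ZL/L/M/R/ZR layout of Fig. 14G can be illustrated by a helper that concatenates the five zones. Sine-shaped slopes are an assumption here (a KBD slope would work the same way), and the example values (ZL = 0, L = 1024, M = 448, R = 128, ZR = 448) reproduce the AAC LONG_START window for N_l = 2048, N_s = 256.

```python
import numpy as np


def build_window(zl: int, l: int, m: int, r: int, zr: int) -> np.ndarray:
    """Concatenate ZL zeros, a rising slope of L samples, M ones,
    a falling slope of R samples and ZR zeros (Fig. 14G layout).

    Sine-shaped slopes are an assumption; a KBD slope could be used instead.
    """
    rise = np.sin(np.pi / (2 * l) * (np.arange(l) + 0.5)) if l > 0 else np.zeros(0)
    fall = np.cos(np.pi / (2 * r) * (np.arange(r) + 0.5)) if r > 0 else np.zeros(0)
    return np.concatenate([np.zeros(zl), rise, np.ones(m), fall, np.zeros(zr)])


# Example: an AAC-style LONG_START window (long left half, short right slope).
w = build_window(zl=0, l=1024, m=448, r=128, zr=448)
print(len(w))  # 2048
```

Over an overlap zone, the falling slope of one window and the rising slope of the next satisfy sin² + cos² = 1, so the aliasing cancels; the M zone of ones faces zeros in the neighbouring window and needs no cancellation.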

Reference is made to the attached appendix, which describes a preferred and detailed implementation of the advanced audio encoding/decoding scheme, in particular with respect to the decoder side.

Appendix

1. Windows and window sequences

Quantization and coding are done in the frequency domain. For this purpose, the time signal is mapped into the frequency domain in the encoder. The decoder performs the inverse mapping as described in subclause 2. Depending on the signal, the coder may change the time/frequency resolution by using three different window sizes: 2304, 2048 and 256. To switch between windows, the transition windows LONG_START_WINDOW, LONG_STOP_WINDOW, START_WINDOW_LPD, STOP_WINDOW_1152, STOP_START_WINDOW and STOP_START_WINDOW_1152 are used. Table 5.11 lists the windows, specifies the corresponding transform length and shows the shape of the windows schematically. Three transform lengths are used: 1152, 1024 (or 960) (referred to as long transform) and 128 (or 120) coefficients (referred to as short transform).

Window sequences are composed of windows in such a way that a raw_data_block always contains data representing 1024 (or 960) output samples. The data element window_sequence indicates the window sequence that is actually used. Fig. 13C lists how the window sequences are composed of individual windows. Refer to subclause 2 for more detailed information about the transform and the windows.

1.2 Scalefactor bands and grouping

See ISO/IEC 14496-3, subpart 4, subclause 4.5.2.3.4.

As explained in ISO/IEC 14496-3, subpart 4, subclause 4.5.2.3.4, the width of the scalefactor bands is built in imitation of the critical bands of the human auditory system. For that reason, the number of scalefactor bands in a spectrum and their width depend on the transform length and the sampling frequency. Tables 4.110 to 4.128 in ISO/IEC 14496-3, subpart 4, subclause 4.5.4, list the offset to the beginning of each scalefactor band for the transform lengths 1024 (960) and 128 (120) and for each sampling frequency. The tables originally designed for LONG_WINDOW, LONG_START_WINDOW and LONG_STOP_WINDOW are also used for START_WINDOW_LPD and STOP_START_WINDOW. The offset tables for STOP_WINDOW_1152 and STOP_START_WINDOW_1152 are given in Tables 4 to 10.
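How an offset table is used can be sketched as follows. The offsets in `TOY_OFFSETS` are hypothetical and merely grow with frequency in the spirit of critical bands; real values come from the ISO/IEC 14496-3 tables and Tables 4 to 10 cited above.

```python
import numpy as np


def split_into_bands(spectrum, swb_offset):
    """Split a spectrum into scalefactor bands given a band-offset table.

    swb_offset[b] is the index of the first coefficient of band b; the
    final entry marks the end of the last band.
    """
    spectrum = np.asarray(spectrum)
    if swb_offset[0] != 0 or swb_offset[-1] > len(spectrum):
        raise ValueError("offset table does not fit the spectrum")
    return [spectrum[swb_offset[b]:swb_offset[b + 1]] for b in range(len(swb_offset) - 1)]


# Hypothetical toy table: bands get wider towards high frequencies.
TOY_OFFSETS = [0, 4, 8, 16, 32]
bands = split_into_bands(np.arange(32), TOY_OFFSETS)
print([len(b) for b in bands])  # [4, 4, 8, 16]
```

Because each window type and sampling rate has its own table, the same spectrum length can be split into a different number of bands depending on which table is selected.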

1.3 Decoding of lpd_channel_stream()

The lpd_channel_stream() bitstream element contains all the information needed to decode one frame of a "linear prediction domain" coded signal. It contains the payload for one frame of an encoded signal that was coded in the LPC domain, i.e., with an LPC filtering step. The residual of this filter (the so-called "excitation") is then represented either with the help of an ACELP module or in the MDCT transform domain ("Transform Coded Excitation", TCX). In order to allow close adaptation to the signal characteristics, one frame is divided into four smaller units of equal size, each of which is coded with either the ACELP or the TCX coding scheme.

This process is similar to the coding scheme described in 3GPP TS 26.290. A slightly different terminology is inherited from that document: one "superframe" denotes a signal segment of 1024 samples, whereas a "frame" is exactly one quarter of that, i.e., 256 samples. Each of these frames is further subdivided into four "subframes" of equal length. Note that this subchapter adopts this terminology.

1.4 Definitions, data elements

acelp_core_mode: This bit field indicates the exact bit allocation if ACELP is used as the lpd coding mode.

lpd_mode: This bit field defines the coding modes for each of the four frames within one superframe of the lpd_channel_stream() (which corresponds to one AAC frame). The coding modes are stored in the array mod[] and can take values from 0 to 3. The mapping from lpd_mode to mod[] can be determined from Table 1 below.

Table 1 - Mapping of coding modes in lpd_channel_stream()

mod[0..3]: The values in the array mod[] indicate the respective coding mode in each frame.

Table 2 - Coding modes indicated by mod[]

acelp_coding(): Syntax element containing all data needed to decode one frame of ACELP excitation.

tcx_coding(): Syntax element containing all data needed to decode one frame of MDCT-based Transform Coded eXcitation (TCX).

first_tcx_flag: Flag indicating whether the currently processed TCX frame is the first in the superframe.

lpc_data(): Syntax element containing all data needed to decode all LPC filter parameter sets required to decode the current superframe.

first_lpd_flag: Flag indicating whether the current superframe is the first of a sequence of superframes coded in the LPC domain. This flag can also be determined from the history of the bitstream element core_mode (core_mode0 and core_mode1 in the case of a channel_pair_element) according to Table 3.

Table 3 - Definition of first_lpd_flag

last_lpd_mode: Indicates the lpd_mode of the previously decoded frame.

1.5 Decoding process

The decoding order in the lpd_channel_stream is:

● Get acelp_core_mode.

● Get lpd_mode and determine the content of the helper variable mod[] from it.

● Get acelp_coding or tcx_coding data, depending on the content of the helper variable mod[].

● Get lpc_data.
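The decoding order above can be sketched as a parsing loop. The reader callback, the stub reader and the helper that maps lpd_mode to mod[] are hypothetical stand-ins (Table 1 is not reproduced), and the assumption that mod[] values 1, 2 and 3 denote TCX units spanning one, two and four frames is not stated in the text; it is chosen to be consistent with the frame/superframe terminology of section 1.3.

```python
from typing import Callable, Dict, List

# Assumed meaning of mod[] values (Table 2 is not reproduced here):
# 0 = ACELP frame, 1/2/3 = one TCX unit spanning 1/2/4 frames.
FRAMES_COVERED = {0: 1, 1: 1, 2: 2, 3: 4}


def parse_lpd_channel_stream(read_field: Callable[[str], object],
                             lpd_mode_to_mod: Callable[[int], List[int]]) -> Dict[str, object]:
    acelp_core_mode = read_field("acelp_core_mode")
    lpd_mode = read_field("lpd_mode")
    mod = lpd_mode_to_mod(lpd_mode)  # Table 1 lookup (hypothetical helper)
    k = 0
    while k < 4:  # four frames per superframe
        if mod[k] == 0:
            read_field("acelp_coding")
        else:
            read_field("tcx_coding")
        k += FRAMES_COVERED[mod[k]]  # skip frames covered by a longer TCX unit
    lpc = read_field("lpc_data")
    return {"acelp_core_mode": acelp_core_mode, "mod": mod, "lpc_data": lpc}


def make_stub_reader(values):
    """Hypothetical stand-in for a bitstream reader that logs field names."""
    log = []

    def read_field(name):
        log.append(name)
        return values.get(name)

    return read_field, log


read, log = make_stub_reader({"acelp_core_mode": 3, "lpd_mode": 7})
result = parse_lpd_channel_stream(read, lambda m: [0, 1, 2, 2])  # lambda stands in for Table 1
print(log)
```

With mod[] = [0, 1, 2, 2], the loop reads one ACELP frame, one single-frame TCX unit and one two-frame TCX unit before the LPC data.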

1.6 ACELP/TCX coding mode combinations

Similar to [8], section 5.2.2, there are 26 allowed combinations of ACELP or TCX within one superframe of an lpd_channel_stream payload. One of these 26 mode combinations is signaled in the bitstream element lpd_mode. The mapping of lpd_mode to the actual coding mode of each frame in the superframe is shown in Tables 1 and 2.
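The figure of 26 can be checked by counting, under the assumption (borrowed from the TCX20/TCX40/TCX80 structure of 3GPP TS 26.290, not from Table 1) that each half of the superframe holds either two single-frame units (ACELP or single-frame TCX) or one two-frame TCX unit, and that one TCX unit may also span the whole superframe. That gives 5 × 5 + 1 = 26. The mod[] values used below and the order of the list are illustrative; the order in which lpd_mode enumerates the combinations is not implied.

```python
from itertools import product


def half_superframe_options():
    """Options for two adjacent 256-sample frames (assumed mod[] values):
    each frame ACELP (0) or 1-frame TCX (1), or one 2-frame TCX unit (2, 2)."""
    singles = [list(p) for p in product([0, 1], repeat=2)]
    return singles + [[2, 2]]


def all_mode_combinations():
    combos = [a + b for a in half_superframe_options() for b in half_superframe_options()]
    combos.append([3, 3, 3, 3])  # one TCX unit spanning the whole superframe
    return combos


combos = all_mode_combinations()
print(len(combos))  # 26
```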

Table 4 - Scalefactor bands for a window length of 2304 for STOP_START_1152_WINDOW and STOP_1152_WINDOW at 44.1 and 48 kHz

Table 5 - Scalefactor bands for a window length of 2304 for STOP_START_1152_WINDOW and STOP_1152_WINDOW at 32 kHz

Table 6 - Scalefactor bands for a window length of 2304 for STOP_START_1152_WINDOW and STOP_1152_WINDOW at 8 kHz

Table 7 - Scalefactor bands for a window length of 2304 for STOP_START_1152_WINDOW and STOP_1152_WINDOW at 11.025, 12 and 16 kHz

Table 8 - Scalefactor bands for a window length of 2304 for STOP_START_1152_WINDOW and STOP_1152_WINDOW at 22.05 and 24 kHz

Table 9 - Scalefactor bands for a window length of 2304 for STOP_START_1152_WINDOW and STOP_1152_WINDOW at 64 kHz

Table 10 - Scalefactor bands for a window length of 2304 for STOP_START_1152_WINDOW and STOP_1152_WINDOW at 88.2 and 96 kHz

1.7 Scalefactor band table references

For all other scalefactor band tables, refer to ISO/IEC 14496-3, subpart 4, section 4.5.4, Tables 4.129 to 4.147.

1.8 Quantization

For quantization of the AAC spectral coefficients in the encoder, a non-uniform quantizer is used. Therefore, the decoder must perform the inverse non-uniform quantization after the Huffman decoding of the scalefactors (see subclause 6.3) and the noiseless decoding of the spectral data (see subclause 6.1).
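For reference, the AAC inverse quantization and rescaling of ISO/IEC 14496-3 can be sketched as follows: each decoded integer q is mapped to sign(q)·|q|^(4/3) and then multiplied by the band gain 2^(0.25·(sf − 100)), where 100 is the AAC scalefactor offset. The function name and the flat array layout are illustrative; TCX coefficients, being uniformly quantized, would not go through this step.

```python
import numpy as np

SF_OFFSET = 100  # scalefactor offset used by AAC (ISO/IEC 14496-3)


def aac_dequantize(q, scalefactors, swb_offset):
    """Inverse non-uniform quantization followed by per-band rescaling."""
    q = np.asarray(q, dtype=float)
    x = np.sign(q) * np.abs(q) ** (4.0 / 3.0)  # |q|^(4/3) power law, sign kept
    out = np.empty_like(x)
    for b, sf in enumerate(scalefactors):
        lo, hi = swb_offset[b], swb_offset[b + 1]
        out[lo:hi] = x[lo:hi] * 2.0 ** (0.25 * (sf - SF_OFFSET))
    return out


# Two toy bands of two coefficients each (hypothetical offsets).
print(aac_dequantize([8, -8, 1, 0], scalefactors=[100, 104], swb_offset=[0, 2, 4]))
```

Here 8^(4/3) = 16 with unit gain in the first band, and the second band's scalefactor of 104 doubles its coefficients.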

For quantization of the TCX spectral coefficients, a uniform quantizer is used. No inverse quantization is needed in the decoder after the noiseless decoding of the spectral data.

2. Filterbank and block switching

2.1 Tool description

The time/frequency representation of the signal is mapped onto the time domain by feeding it into the filterbank module. This module consists of an inverse modified discrete cosine transform (IMDCT) and a window and overlap-add function. In order to adapt the time/frequency resolution of the filterbank to the characteristics of the input signal, a block switching tool is also adopted. N represents the window length, where N is a function of window_sequence (see subclause 1.1). For each channel, the N/2 time-frequency values X_{i,k} are transformed into the N time-domain values x_{i,n} via the IMDCT. After applying the window function, for each channel, the first half of the z_{i,n} sequence is added to the second half of the previous block's windowed sequence z_{(i-1),n} to reconstruct the output samples out_{i,n} for each channel.

2.2 Definitions

window_sequence: 2 bits indicating which window sequence (i.e., block size) is used.

window_shape: 1 bit indicating which window function is selected.

Fig. 13C shows the eight window_sequences (ONLY_LONG_SEQUENCE, LONG_START_SEQUENCE, EIGHT_SHORT_SEQUENCE, LONG_STOP_SEQUENCE, STOP_START_SEQUENCE, STOP_1152_SEQUENCE, LPD_START_SEQUENCE, STOP_START_1152_SEQUENCE).

In the following, LPD_SEQUENCE refers to all allowed window/coding mode combinations inside the so-called linear prediction domain codec (see section 1.3). In the context of decoding a frequency-domain coded frame, it is only important to know whether the following frame is encoded in the LP domain coding mode, which is represented by an LPD_SEQUENCE. The exact structure within the LPD_SEQUENCE, however, is taken care of when decoding the LP domain coded frame.

2.3 Decoding process

2.3.1 IMDCT

The analytical expression of the IMDCT is:

The synthesis window length N for the inverse transform is a function of the syntax element window_sequence and the algorithmic context. It is defined as follows:

Window length 2304:

Window length 2048:

The meaningful block transitions are as follows:

2.3.2 Windowing and block switching

Depending on the window_sequence and window_shape elements, different transform windows are used. A combination of the window halves described below offers all possible window_sequences.

For window_shape == 1, the window coefficients are given by the Kaiser-Bessel derived (KBD) window as follows:

where:

W', the Kaiser-Bessel kernel window function (see also [5]), is defined as follows:

Otherwise, for window_shape == 0, a sine window is employed as follows:
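The sketch below follows the KBD and sine window definitions of ISO/IEC 14496-3: W'(n, α) = I0(πα·sqrt(1 − ((n − N/4)/(N/4))²)) for 0 ≤ n ≤ N/2, the left window half is the square root of the normalized cumulative sum of W', and the right half mirrors it. The α values (4 for N = 2048, 6 for N = 256) are those used in AAC and are an assumption here. Both windows are checked against the Princen-Bradley condition that the 50 percent overlap-add relies on.

```python
import numpy as np


def kbd_window(N: int, alpha: float) -> np.ndarray:
    """Kaiser-Bessel derived window of length N (AAC-style definition)."""
    half = N // 2
    n = np.arange(half + 1)
    arg = np.maximum(1.0 - ((n - N / 4) / (N / 4)) ** 2, 0.0)
    kernel = np.i0(np.pi * alpha * np.sqrt(arg))  # W'(n, alpha), 0 <= n <= N/2
    cumulative = np.cumsum(kernel)
    left = np.sqrt(cumulative[:half] / cumulative[-1])  # 0 <= n < N/2
    return np.concatenate([left, left[::-1]])  # right half mirrors the left


def sine_window(N: int) -> np.ndarray:
    return np.sin(np.pi / N * (np.arange(N) + 0.5))


def princen_bradley_ok(w: np.ndarray) -> bool:
    """w(n)^2 + w(n + N/2)^2 == 1 is what makes the 50 % overlap-add alias-free."""
    h = len(w) // 2
    return bool(np.allclose(w[:h] ** 2 + w[h:] ** 2, 1.0))


for N, alpha in [(2048, 4.0), (256, 6.0)]:
    print(N, princen_bradley_ok(kbd_window(N, alpha)), princen_bradley_ok(sine_window(N)))
```

The KBD window trades a wider main lobe for stronger far-off rejection compared with the sine window, which is why window_shape lets the encoder choose between them.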

The window length N can be 2048 (1920) or 256 (240) for the KBD and the sine window. In the case of STOP_1152_SEQUENCE and STOP_START_1152_SEQUENCE, N can still be 2048 or 256; the window slopes are similar, but the flat top regions are longer.

Only in the case of LPC_START_SEQUENCE is the right part of the window a sine window of 64 samples.

How to obtain the possible window sequences is explained in parts a) to h) of this subclause.

For all kinds of window_sequences, the window_shape of the left half of the first transform window is determined by the window shape of the previous block. The following formula expresses this fact:

where:

window_shape_previous_block: the window_shape of the previous block (i-1).

For the first raw_data_block() to be decoded, the window_shape of the left and right halves of the window are identical.

a) ONLY_LONG_SEQUENCE:

window_sequence == ONLY_LONG_SEQUENCE is equal to one LONG_WINDOW with a total window length N_l of 2048 (1920).

For window_shape == 1, the window for ONLY_LONG_SEQUENCE is given as follows:

If window_shape == 0, the window for ONLY_LONG_SEQUENCE can be described as follows:

After windowing, the time-domain values z_{i,n} can be expressed as z_{i,n} = w(n) · x_{i,n}.

b) LONG_START_SEQUENCE:

The LONG_START_SEQUENCE is needed to obtain a correct overlap and add for a block transition from an ONLY_LONG_SEQUENCE to an EIGHT_SHORT_SEQUENCE.

The window lengths N_l and N_s are set to 2048 (1920) and 256 (240), respectively.

If window_shape == 1, the window for LONG_START_SEQUENCE is given as follows:

If window_shape == 0, the window for LONG_START_SEQUENCE is:

The windowed time-domain values can be calculated with the formula explained in a).

c) EIGHT_SHORT

The window_sequence == EIGHT_SHORT comprises eight overlapped and added SHORT_WINDOWs, each with a length N_s of 256 (240). The total length of the window_sequence, together with the leading and following zeros, is 2048 (1920). Each of the eight short blocks is first windowed separately. The short block number is indexed with the variable j = 0, ..., M-1 (M = N_l/N_s).

The window_shape of the previous block influences only the first of the eight short blocks (W₀(n)). If window_shape == 1, the window functions can be given as follows:

Otherwise, if window_shape == 0, the window functions can be described as:

The overlap and add between the short windows of the EIGHT_SHORT window_sequence, resulting in the windowed time-domain values z_{i,n}, can be described as follows:
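A sketch of this overlap-add, assuming the AAC placement in which short block j starts at N_l/4 − N_s/4 + j·N_s/2 (so the sequence has 448 leading and 448 trailing zeros for N_l = 2048, N_s = 256), looks like this. The input blocks are taken to be already windowed with W_j(n), the first of which uses the previous block's window_shape for its left slope; the function name is hypothetical.

```python
import numpy as np


def eight_short_overlap_add(short_blocks: np.ndarray, N_l: int = 2048, N_s: int = 256) -> np.ndarray:
    """Place M = N_l/N_s windowed short blocks into one N_l-sample frame.

    short_blocks[j] is assumed to be W_j(n) * x_{j,n} for 0 <= n < N_s.
    Block j starts at N_l/4 - N_s/4 + j*N_s/2, so the frame begins and ends
    with N_l/4 - N_s/4 zeros (448 for 2048/256, 420 for 1920/240).
    """
    M = N_l // N_s
    if short_blocks.shape != (M, N_s):
        raise ValueError(f"expected {M} blocks of {N_s} samples")
    z = np.zeros(N_l)
    first = N_l // 4 - N_s // 4
    for j in range(M):
        start = first + j * (N_s // 2)  # hop of N_s/2 gives 50 % overlap
        z[start:start + N_s] += short_blocks[j]
    return z


z = eight_short_overlap_add(np.ones((8, 256)))
```

Feeding all-ones blocks makes the layout visible: zeros at both ends, single coverage at the outer half-blocks and double coverage wherever two short windows overlap.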

d) LONG_STOP_SEQUENCE

This window_sequence is needed to switch from an EIGHT_SHORT_SEQUENCE back to an ONLY_LONG_SEQUENCE.

If window_shape == 1, the window for LONG_STOP_SEQUENCE is given as follows:

If window_shape == 0, the window for LONG_STOP_SEQUENCE is determined by:

The windowed time-domain values can be calculated with the formula explained in a).

e) STOP_START_SEQUENCE:

The STOP_START_SEQUENCE is needed to obtain a correct overlap and add for a block transition from an EIGHT_SHORT_SEQUENCE to an EIGHT_SHORT_SEQUENCE when just an ONLY_LONG_SEQUENCE is needed.

The window lengths N_l and N_s are set to 2048 (1920) and 256 (240), respectively.

If window_shape == 1, the window for STOP_START_SEQUENCE is given as follows:

If window_shape == 0, the window for STOP_START_SEQUENCE is given by:

The windowed time-domain values can be calculated with the equations described in a).

f) LPD_START_SEQUENCE:

LPD_START_SEQUENCE is needed to obtain a correct overlap-add for the block transition from an ONLY_LONG_SEQUENCE to an LPD_SEQUENCE.

The window lengths N_l and N_s are set to 2048 (1920) and 256 (240), respectively.

If window_shape == 1, the window for LPD_START_SEQUENCE is given by:

If window_shape == 0, the window for LPD_START_SEQUENCE is given by:

g) STOP_1152_SEQUENCE:

STOP_1152_SEQUENCE is needed to obtain a correct overlap-add for the block transition from an LPD_SEQUENCE to an ONLY_LONG_SEQUENCE.

The window lengths N_l and N_s are set to 2048 (1920) and 256 (240), respectively.

If window_shape == 1, the window for STOP_1152_SEQUENCE is given by:

If window_shape == 0, the window for STOP_1152_SEQUENCE is given by:

h) STOP_START_1152_SEQUENCE:

STOP_START_1152_SEQUENCE is needed to obtain a correct overlap-add for a block transition from an LPD_SEQUENCE to an EIGHT_SHORT_SEQUENCE when only a single ONLY_LONG_SEQUENCE is needed in between.

If window_shape == 0, the window for STOP_START_1152_SEQUENCE is given by:

2.3.3 Overlapping and Adding with Previous Window Sequences

Besides the overlap-add within the EIGHT_SHORT_SEQUENCE window_sequence, the first (left) part of every window_sequence is overlapped and added with the second (right) part of the previous window_sequence, resulting in the final time-domain values out_{i,n}. The mathematical expression for this operation can be described as follows.

In the case of ONLY_LONG_SEQUENCE, LONG_START_SEQUENCE, EIGHT_SHORT_SEQUENCE, LONG_STOP_SEQUENCE, STOP_START_SEQUENCE and LPD_START_SEQUENCE:

And in the case of STOP_1152_SEQUENCE and STOP_START_1152_SEQUENCE:

In the case of LPD_START_SEQUENCE, the next sequence is an LPD_SEQUENCE. A SIN or KBD window is applied to the left part of the LPD_SEQUENCE to obtain a good overlap-add.

In the case of STOP_1152_SEQUENCE and STOP_START_1152_SEQUENCE, the previous sequence is an LPD_SEQUENCE. A TDAC window is applied to the right part of the LPD_SEQUENCE to obtain a good overlap-add.
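
For the first group of sequences, the overlap-add with the previous window_sequence is assumed here to take the familiar AAC form out_{i,n} = z_{i,n} + z_{i-1,n+N/2} for 0 <= n < N/2, with N = 2048 (1920). The sketch below implements only that case; the STOP_1152 variants, whose formula is not reproduced above, are not covered. The function name is hypothetical.

```c
#include <stddef.h>

/* Assumed AAC-style overlap-add with the previous frame:
 *   out[n] = z_cur[n] + z_prev[n + N/2],  0 <= n < N/2,
 * i.e. the left half of the current windowed frame is added to the
 * right half of the previous windowed frame. n_win is N. */
void overlap_add_previous(const double *z_cur, const double *z_prev,
                          size_t n_win, double *out)
{
    const size_t half = n_win / 2;
    for (size_t n = 0; n < half; n++)
        out[n] = z_cur[n] + z_prev[n + half];
}
```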

3. IMDCT

See Subclause 2.3.1.

3.1 Windowing and Block Switching

Depending on the window_shape element, different oversampled transform window prototypes are used. The lengths of the oversampled windows are as follows:

For window_shape == 1, the window coefficients are given by the Kaiser-Bessel-derived (KBD) window as follows:

where:

Otherwise, for window_shape == 0, a sine window is used as follows:

For all kinds of window_sequences, the prototype used for the left window part is determined by the window shape of the previous block. The following formula expresses this:

Similarly, the prototype for the right window part is determined by the following formula:

Since the transition lengths have already been determined, it is only necessary to distinguish between EIGHT_SHORT_SEQUENCEs and all other sequences:

a) EIGHT_SHORT_SEQUENCE:

The following C-style code describes the windowing and the internal overlap-add of an EIGHT_SHORT_SEQUENCE:

tw_windowing_short(X[][], z[], first_pos, last_pos, warped_trans_len_left, warped_trans_len_right, left_window_shape[], right_window_shape[]) {
    /* n_long, n_short and os_factor_win are defined elsewhere in this specification */
    offset = n_long - 4*n_short - n_short/2;   /* start of the first short block */

    tr_scale_l = 0.5*n_long/warped_trans_len_left*os_factor_win;
    tr_pos_l = (warped_trans_len_left + (first_pos - n_long/2) + 0.5)*tr_scale_l;
    tr_scale_r = 8*os_factor_win;
    tr_pos_r = tr_scale_r/2;

    /* block 0, left half: shaped by the left transition window */
    for (i = 0; i < n_short; i++)
        z[offset + i] = X[0][i];
    for (i = 0; i < first_pos; i++)
        z[i] = 0.0;
    for (i = n_long - 1 - first_pos; i >= first_pos; i--) {
        z[i] *= left_window_shape[(int) floor(tr_pos_l)];
        tr_pos_l += tr_scale_l;
    }

    /* block 0, right half */
    for (i = 0; i < n_short; i++) {
        z[offset + n_short + i] = X[0][n_short + i]*right_window_shape[(int) floor(tr_pos_r)];
        tr_pos_r += tr_scale_r;
    }
    offset += n_short;

    /* blocks 1..6: window both halves and overlap-add with the previous block */
    for (k = 1; k < 7; k++) {
        tr_scale_l = 8*os_factor_win;          /* same step as tr_scale_r */
        tr_pos_l = tr_scale_l/2;
        tr_pos_r = os_factor_win*n_long - tr_pos_l;
        for (i = 0; i < n_short; i++) {
            z[offset + i] += X[k][i]*right_window_shape[(int) floor(tr_pos_r)];
            z[offset + n_short + i] = X[k][n_short + i]*right_window_shape[(int) floor(tr_pos_l)];
            tr_pos_l += tr_scale_l;
            tr_pos_r -= tr_scale_l;
        }
        offset += n_short;
    }

    /* block 7, left half */
    tr_scale_l = 8*os_factor_win;
    tr_pos_l = tr_scale_l/2;
    for (i = n_short - 1; i >= 0; i--) {
        z[offset + i] += X[7][i]*right_window_shape[(int) floor(tr_pos_l)];
        tr_pos_l += tr_scale_l;
    }

    /* block 7, right half: shaped by the right transition window */
    for (i = 0; i < n_short; i++)
        z[offset + n_short + i] = X[7][n_short + i];
    tr_scale_r = 0.5*n_long/warped_trans_len_right*os_factor_win;
    tr_pos_r = (1.5*n_long - last_pos - 0.5 + warped_trans_len_right)*tr_scale_r;
    for (i = 3*n_long - 1 - last_pos; i <= last_pos; i++) {
        z[i] *= right_window_shape[(int) floor(tr_pos_r)];
        tr_pos_r += tr_scale_r;
    }
    for (i = last_pos + 1; i < 2*n_long; i++)
        z[i] = 0.0;
}

b) All others:

tw_windowing_long(X[][], z[], first_pos, last_pos, warped_trans_len_left, warped_trans_len_right, left_window_shape[], right_window_shape[]) {
    /* N = 2*n_long is the window length; n_long and os_factor_win are defined elsewhere */
    for (i = 0; i < first_pos; i++)
        z[i] = 0.0;
    for (i = last_pos + 1; i < N; i++)
        z[i] = 0.0;

    /* left transition */
    tr_scale = 0.5*n_long/warped_trans_len_left*os_factor_win;
    tr_pos = (warped_trans_len_left + first_pos - N/4 + 0.5)*tr_scale;
    for (i = N/2 - 1 - first_pos; i >= first_pos; i--) {
        z[i] = X[0][i]*left_window_shape[(int) floor(tr_pos)];
        tr_pos += tr_scale;
    }

    /* flat part between the two transitions (window value 1) */
    for (i = N/2 - first_pos; i < 3*N/2 - 1 - last_pos; i++)
        z[i] = X[0][i];

    /* right transition */
    tr_scale = 0.5*n_long/warped_trans_len_right*os_factor_win;
    tr_pos = (3*N/4 - last_pos - 0.5 + warped_trans_len_right)*tr_scale;
    for (i = 3*N/2 - 1 - last_pos; i <= last_pos; i++) {
        z[i] = X[0][i]*right_window_shape[(int) floor(tr_pos)];
        tr_pos += tr_scale;
    }
}

4. MDCT-based TCX

4.1 Tool Description

The MDCT-based TCX is used when core-mode is equal to 1 and one or more of the three TCX modes are selected for "linear prediction-domain" coding, i.e., when one of the four array entries of mod[] is greater than 0. The MDCT-based TCX receives the quantized spectral coefficients from the arithmetic decoder. The quantized coefficients are first completed by comfort noise before an inverse MDCT is applied to obtain a time-domain weighted synthesis, which is then fed into the weighting synthesis LPC filter.

4.2 Definitions

lg: number of quantized spectral coefficients output by the arithmetic decoder

noise_factor: noise level quantization index

noise level: level of the noise injected into the reconstructed spectrum

noise[]: vector of generated noise

global_gain: rescaling gain quantization index

g: rescaling gain

rms: root mean square of the synthesized time-domain signal, x[]

x[]: synthesized time-domain signal

4.3 Decoding Process

The MDCT-based TCX requests from the arithmetic decoder a number of quantized spectral coefficients, lg, which is determined by the mod[] and last_lpd_mode values. These two values also define the window length and shape to be applied in the inverse MDCT. The window is composed of three parts: a left overlap of L samples, a middle part of M samples equal to one, and a right overlap of R samples. To obtain an MDCT window of length 2*lg, ZL zeros are added on the left side and ZR zeros on the right side, as shown in Table 3 and Figs. 14F to 14G.

Table 3 - Number of spectral coefficients as a function of last_lpd_mode and mod[]

The MDCT window is given as follows:

The quantized spectral coefficients, quant[], delivered by the arithmetic decoder are completed by comfort noise. The level of the injected noise is determined by the decoded noise_factor as follows:

noise_level = 0.0625*(8-noise_factor)

The noise vector, noise[], is then computed using a random function, random_sign(), which randomly returns the value -1 or +1:

noise[i] = random_sign()*noise_level;

The quant[] and noise[] vectors are combined to form the reconstructed spectral coefficient vector, r[], in such a way that runs of 8 consecutive zeros in quant[] are replaced by the components of noise[]. Runs of 8 non-zeros are detected according to the following formula:

The reconstructed spectrum is then obtained as follows:
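
A minimal sketch of the comfort-noise completion, assuming that the run detection amounts to replacing every 8-coefficient block of quant[] that is entirely zero by noise, with blocks counted from a hypothetical start index below which no noise is injected. The exact detection formula, including which low-frequency coefficients are exempt, is not reproduced here, and using the C library rand() inside random_sign() is likewise an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* random_sign(): -1.0 or +1.0 with (roughly) equal probability.
 * rand() is an illustrative choice of random generator. */
static double random_sign(void)
{
    return (rand() & 1) ? 1.0 : -1.0;
}

/* Fills r[0..lg-1] from quant[]; each all-zero block of 8 coefficients,
 * counted from `start`, is replaced by noise of level
 * noise_level = 0.0625 * (8 - noise_factor). */
void tcx_noise_fill(const int *quant, size_t lg, int noise_factor,
                    size_t start, double *r)
{
    const double noise_level = 0.0625 * (8 - noise_factor);

    for (size_t i = 0; i < lg; i++)
        r[i] = (double)quant[i];

    for (size_t b = start; b + 8 <= lg; b += 8) {
        int all_zero = 1;
        for (size_t k = 0; k < 8; k++) {
            if (quant[b + k] != 0) {
                all_zero = 0;
                break;
            }
        }
        if (all_zero)
            for (size_t k = 0; k < 8; k++)
                r[b + k] = random_sign() * noise_level; /* noise[i] */
    }
}
```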

Before the inverse MDCT is applied, spectral de-shaping is performed according to the following steps:

1. For each 8-dimensional block in the first quarter of the spectrum, calculate the energy E_m of the 8-dimensional block at index m.

2. Calculate the ratio R_m = sqrt(E_m/E_I), where I is the index of the block with the maximum energy among all E_m.

3. If R_m < 0.1, set R_m = 0.1.

4. If R_m < R_{m-1}, set R_m = R_{m-1}.

Each 8-dimensional block belonging to the first quarter of the spectrum is then multiplied by its factor R_m.

The reconstructed spectrum is fed into the inverse MDCT. The non-windowed output signal, x[], is rescaled by the gain g, obtained by inverse quantization of the decoded global_gain index:

where rms is calculated as:

The rescaled synthesized time-domain signal is then:

After rescaling, windowing and overlap-add are applied.

The reconstructed TCX target x(n) is then filtered through the zero-state inverse weighted synthesis filter to find the excitation signal to be applied to the synthesis filter. Note that the LP filter interpolated in each subframe is used in this filtering. Once the excitation is determined, the signal is reconstructed by filtering the excitation through the synthesis filter and then de-emphasizing it by filtering through the de-emphasis filter, as described above.
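
The filtering chain above is built from standard LP filter blocks. The sketch below shows a zero-state analysis filter A(z) = 1 + sum_k a[k] z^-k, the matching zero-state synthesis filter 1/A(z), and a first-order de-emphasis 1/(1 - mu z^-1). It is a simplification under stated assumptions: one fixed set of LP coefficients instead of per-subframe interpolation, and plain A(z) in place of the weighted filter whose exact form is not reproduced here. mu is a parameter; AMR-WB, for instance, uses 0.68. All function names are hypothetical.

```c
#include <stddef.h>

/* Zero-state LP analysis: e[n] = x[n] + sum_{k=1}^{p} a[k] x[n-k].
 * a[0] is unused (implicitly 1). */
void lp_analysis(const double *a, int p, const double *x, double *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double acc = x[i];
        for (int k = 1; k <= p && (size_t)k <= i; k++)
            acc += a[k] * x[i - k];
        e[i] = acc;
    }
}

/* Zero-state LP synthesis 1/A(z): y[n] = e[n] - sum_{k=1}^{p} a[k] y[n-k]. */
void lp_synthesis(const double *a, int p, const double *e, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double acc = e[i];
        for (int k = 1; k <= p && (size_t)k <= i; k++)
            acc -= a[k] * y[i - k];
        y[i] = acc;
    }
}

/* De-emphasis 1/(1 - mu z^-1), in place: y[n] = x[n] + mu*y[n-1].
 * *mem holds y[-1] on entry and the last output on return. */
void deemphasis(double *x, size_t n, double mu, double *mem)
{
    for (size_t i = 0; i < n; i++) {
        x[i] += mu * *mem;
        *mem = x[i];
    }
}
```

Because lp_synthesis inverts lp_analysis exactly when both start from zero state, the excitation obtained from a target signal reproduces that signal when passed back through the synthesis filter.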

Note that the excitation is also needed to update the ACELP adaptive codebook and to allow switching from TCX to ACELP in a subsequent frame. Note also that the length of the TCX synthesis is given by the TCX frame length (without the overlap): 256, 512 or 1024 samples for mod[] equal to 1, 2 or 3, respectively.
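
The mapping from mod[] to the TCX synthesis length is a simple doubling per mode. A minimal sketch; the function name and the -1 return for values outside 1 to 3 are illustrative.

```c
/* TCX synthesis length without overlap: 256, 512 or 1024 samples for
 * mod = 1, 2 or 3; -1 for any other value. */
int tcx_frame_length(int mod)
{
    if (mod < 1 || mod > 3)
        return -1;
    return 256 << (mod - 1);
}
```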

Standard References

[1] ISO/IEC 11172-3:1993, Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 3: Audio.

[2] ITU-T Rec. H.222.0 (1995) | ISO/IEC 13818-1:2000, Information technology - Generic coding of moving pictures and associated audio information - Part 1: Systems.

[3] ISO/IEC 13818-3:1998, Information technology - Generic coding of moving pictures and associated audio information - Part 3: Audio.

[4] ISO/IEC 13818-7:2004, Information technology - Generic coding of moving pictures and associated audio information - Part 7: Advanced Audio Coding (AAC).

[5] ISO/IEC 14496-3:2005, Information technology - Coding of audio-visual objects - Part 1: Systems.

[6] ISO/IEC 14496-3:2005, Information technology - Coding of audio-visual objects - Part 3: Audio.

[7] ISO/IEC 23003-1:2007, Information technology - MPEG audio technologies - Part 1: MPEG Surround.

[8] 3GPP TS 26.290 V6.3.0, Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions.

[9] 3GPP TS 26.190, Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions.

[10] 3GPP TS 26.090, Adaptive Multi-Rate (AMR) speech codec; Transcoding functions.

Definitions

Definitions can be found in ISO/IEC 14496-3, Subpart 1, Subclause 1.3 (Terms and Definitions) and in 3GPP TS 26.290, Section 3 (Definitions and Abbreviations).

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.

The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) on which the computer program for performing one of the methods described herein is recorded.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The embodiments described above are merely illustrative of the principles of the present invention. Modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

200: Switch
300/525: Signal analyzer
510: LPC processor
410: First converter
523: Second converter
421: Quantizer/coder
431: Inverse quantizer/coder
440: First controllable converter
534: Second controllable converter
540: LPC synthesis processor
600: Combiner

Claims

An audio encoder for encoding an audio signal, comprising:
a first coding branch (400) for encoding an audio signal using a first coding algorithm to obtain a first encoded signal, the first coding branch (400) comprising a first converter (410) for converting an input signal into a spectral domain;
a second coding branch (500) for encoding an audio signal using a second coding algorithm to obtain a second encoded signal, the first coding algorithm being different from the second coding algorithm, and the second coding branch comprising a domain converter for converting an input signal from an input domain into an output domain and a second converter for converting an input signal into a spectral domain;
a switch (200) for switching between the first coding branch and the second coding branch so that, for a portion of the audio input signal, either the first encoded signal or the second encoded signal is in an encoder output signal;
a signal analyzer (300, 525) for analyzing the portion of the audio signal to determine whether the portion of the audio signal is represented as the first encoded signal or as the second encoded signal in the encoder output signal, the signal analyzer (300, 525) being further configured to variably determine a respective time/frequency resolution of the first converter and the second converter when the first encoded signal or the second encoded signal representing the portion of the audio signal is generated; and
an output interface (800) for generating the encoder output signal comprising the first encoded signal, the second encoded signal, information indicating the first encoded signal and the second encoded signal, and information indicating the time/frequency resolution applied for encoding the first encoded signal and for encoding the second encoded signal.

The audio encoder of claim 1, wherein the signal analyzer (300, 525) is configured to classify the portion of the audio signal as a speech-like audio signal or a music-like audio signal, and to perform transient detection to determine the time/frequency resolution of the first converter in the case of a music signal, or to perform an analysis-by-synthesis process to determine the time/frequency resolution of the second converter (523).

The audio encoder of claim 1 or 2, wherein the first converter (410) and the second converter (523) comprise a variable windowed transformer, and
wherein the signal analyzer (300/525) is configured to control the window size and/or the transform length based on the signal analysis.

The audio encoder of any one of the preceding claims, wherein the second coding branch comprises a first processing branch (522) for processing an audio signal in the domain determined by the domain converter (510), and a second processing branch (523, 524) comprising the second converter,
wherein the signal analyzer is configured to sub-divide the portion of the audio signal into a sequence of sub-portions, and to determine the time/frequency resolution of the second converter (523) for a sub-portion processed by the second processing branch depending on the position of a sub-portion processed by the first processing branch.

The audio encoder of claim 4, wherein the first processing branch comprises an ACELP encoder (526),
the second processing branch comprises an MDCT-TCX processing unit (527), and
the signal analyzer (300/525) is configured to set the time resolution of the second converter to a high value determined by the length of a sub-portion, or to a low value determined by the length of the sub-portion multiplied by an integer greater than one.

The audio encoder of any one of the preceding claims, wherein the signal analyzer (300, 525) is configured to determine a signal classification in a fixed raster covering a plurality of equally sized blocks of audio samples, and to sub-divide a block into a variable number of sub-blocks, wherein the length of a sub-block determines the first time/frequency resolution or the second time/frequency resolution.

The audio encoder of any one of the preceding claims, wherein the signal analyzer (300, 525) is configured to determine the time/frequency resolution of the first converter to be selected from a plurality of different window lengths, the plurality of different window lengths comprising at least two of the group consisting of 2304, 2048, 256, 1920, ... samples, or
a plurality of different transform lengths, the plurality of different transform lengths comprising at least two of the group consisting of 1152, 1024, 1080, 960, 128 and 120 coefficients per transform block; or
wherein the signal analyzer (300, 525) is configured to determine the time/frequency resolution of the second converter to be selected from a plurality of different window lengths, the plurality of different window lengths comprising at least two of the group consisting of 640, 1152, 2304, 512, 1024 and 2048 samples, or
a plurality of different transform lengths, the plurality of different transform lengths comprising at least two of the group consisting of 320, 576, 1152, 256, 512 and 1024 spectral coefficients per transform block.

The audio encoder of any one of the preceding claims, wherein the second coding branch comprises: a first processing branch (522) for processing an audio signal;
a second processing branch (523, 524) comprising the second converter; and
a further switch (521) for switching between the first processing branch (522) and the second processing branch (523, 524) so that, for a portion of the audio signal input into the second coding branch, either a first processed signal or a second processed signal is in the second encoded signal.

A method of encoding an audio signal, comprising:
encoding, in a first coding branch (400), an audio signal using a first coding algorithm to obtain a first encoded signal, the first coding branch comprising a first converter (410) for converting an input signal into a spectral domain;
encoding, in a second coding branch (500), an audio signal using a second coding algorithm to obtain a second encoded signal, the first coding algorithm being different from the second coding algorithm, and the second coding branch comprising a domain converter for converting an input signal from an input domain into an output domain and a second converter for converting an input signal into a spectral domain;
switching (200), for a portion of the audio input signal, between the first coding branch and the second coding branch so that the portion is represented in the encoder output signal either by the first encoded signal or by the second encoded signal;
analyzing (300, 525) the portion of the audio signal to determine whether the portion is to be represented in the encoder output signal by the first encoded signal or by the second encoded signal;
variably determining a time/frequency resolution of each of the first and second converters to be applied when generating the first encoded signal or the second encoded signal representing the portion of the audio signal; and
generating (800) an encoder output signal comprising the first encoded signal, the second encoded signal, information indicating the first encoded signal and the second encoded signal, and information indicating the time/frequency resolution applied for encoding the first encoded signal and the second encoded signal.
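The method claim leaves the analysis rule open. A minimal sketch, assuming a zero-crossing-rate threshold for the branch decision and an energy-ratio attack detector for the resolution decision (both rules are hypothetical, as are the names `EncodedFrame`, `RESOLUTIONS`, `analyze` and `encode`), shows how one analysis step can yield both the coding mode and the time/frequency resolution that the encoder output signal must carry:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EncodedFrame:
    mode: int             # 0: first coding branch, 1: second coding branch
    window_length: int    # time/frequency resolution applied by that branch's converter
    payload: List[float]  # placeholder for the actual coded data


# Hypothetical (long, short) choices per branch, taken from the claimed window groups.
RESOLUTIONS = {0: (2048, 256), 1: (1024, 512)}


def _energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def is_transient(portion: Sequence[float], ratio: float = 10.0) -> bool:
    """Crude attack detector: the second half is much louder than the first."""
    half = len(portion) // 2
    return _energy(portion[half:]) > ratio * (_energy(portion[:half]) + 1e-12)


def zero_crossing_rate(portion: Sequence[float]) -> float:
    crossings = sum(1 for a, b in zip(portion, portion[1:]) if (a < 0) != (b < 0))
    return crossings / max(len(portion) - 1, 1)


def analyze(portion: Sequence[float], zcr_threshold: float = 0.3) -> EncodedFrame:
    """Pick a coding branch and a resolution for one portion (hypothetical rules)."""
    mode = 1 if zero_crossing_rate(portion) > zcr_threshold else 0
    long_window, short_window = RESOLUTIONS[mode]
    window = short_window if is_transient(portion) else long_window
    return EncodedFrame(mode, window, list(portion))


def encode(portions: Sequence[Sequence[float]]) -> List[EncodedFrame]:
    """Encoder output: one frame per portion, each carrying mode and resolution."""
    return [analyze(p) for p in portions]
```

For example, a smooth, steady portion yields mode 0 with the long 2048-sample window, while a noisy portion with a sudden onset yields mode 1 with the short 512-sample window.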

An audio decoder for decoding an encoded signal, the encoded signal comprising a first encoded signal, a second encoded signal, an indication indicating the first encoded signal and the second encoded signal, and time/frequency resolution information to be used for decoding the first encoded signal and the second encoded signal, the audio decoder comprising:
a first decoding branch (431, 440) for decoding the first encoded signal using a first controllable frequency/time converter (440), the first controllable frequency/time converter (440) being configured to be controlled using the time/frequency resolution information for the first encoded signal to obtain a first decoded signal;
a second decoding branch for decoding the second encoded signal using a second controllable frequency/time converter (534), the second controllable frequency/time converter (534) being configured to be controlled using the time/frequency resolution information for the second encoded signal to obtain a second decoded signal;
a controller (990) for controlling the first frequency/time converter (440) and the second frequency/time converter (534) using the time/frequency resolution information;
a domain converter (540) for generating a synthesis signal using the second decoded signal; and
a combiner (604) for combining the first decoded signal and the synthesis signal to obtain a decoded audio signal.

The audio decoder of claim 10, wherein the controller (990) is configured to control the first frequency/time converter (440) and the second frequency/time converter (534) such that
the time/frequency resolution of the first frequency/time converter (440) is selected from a plurality of different window lengths, the plurality of different window lengths comprising at least two of the group consisting of 2304, 2048, 256, 1920, 2160 and 240 samples, or
from a plurality of different transform lengths, the plurality of different transform lengths comprising at least two of the group consisting of 1152, 1024, 1080, 960, 128 and 120 coefficients per transform block, or
the time/frequency resolution of the second frequency/time converter (534) is selected from a plurality of different window lengths, the plurality of different window lengths comprising at least two of the group consisting of 640, 1152, 2304, 512, 1024 and 2048 samples, or
from a plurality of different transform lengths, the plurality of different transform lengths comprising at least two of the group consisting of 320, 576, 1152, 256, 512 and 1024 spectral coefficients per transform block.

The audio decoder of claim 10 or 11, wherein the second decoding branch comprises:
a first inverse processing branch (531) for inverse processing a first processed signal, further included in the encoded signal, to obtain a first inverse processed signal;
a second inverse processing branch comprising the second controllable frequency/time converter (534), configured to inverse process the second encoded signal to obtain a second inverse processed signal in the same domain as the first inverse processed signal; and
an additional combiner (532) for combining the first inverse processed signal and the second inverse processed signal to obtain a combined signal,
wherein the combined signal is input into the combiner (600).

The audio decoder of any one of claims 10 to 12, wherein the first frequency/time converter (440) and the second frequency/time converter (534) each comprise an inverse transform that introduces time-domain aliasing and an overlap/add unit (440c) for cancelling the time-domain aliasing.
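A frequency/time converter with an overlap/add unit can be illustrated with an MDCT/IMDCT pair. The sketch below is an assumption, not the patented converter: it uses a direct O(N²) MDCT, a sine window that satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1, and an IMDCT scaled by 2/N so that overlap-adding consecutive windowed frames (hop N) cancels the time-domain aliasing and restores the input exactly. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def sine_window(two_n: int) -> List[float]:
    """Sine window; w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(frame: Sequence[float]) -> List[float]:
    """Direct MDCT of a 2N-sample frame into N coefficients."""
    two_n = len(frame)
    n_coef = two_n // 2
    n0 = 0.5 + n_coef / 2
    return [sum(frame[n] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_coef)]


def imdct(coefs: Sequence[float]) -> List[float]:
    """Inverse MDCT: N coefficients -> 2N samples that still contain aliasing."""
    n_coef = len(coefs)
    n0 = 0.5 + n_coef / 2
    return [2.0 / n_coef * sum(coefs[k] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                                for k in range(n_coef))
            for n in range(2 * n_coef)]


def analyze_synthesize(signal: Sequence[float], n_coef: int) -> List[float]:
    """Window, MDCT, IMDCT, window again, then overlap/add with hop n_coef."""
    w = sine_window(2 * n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * (2 * n_coef)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        frame = [padded[start + i] * w[i] for i in range(2 * n_coef)]
        rec = imdct(mdct(frame))
        for i in range(2 * n_coef):
            out[start + i] += rec[i] * w[i]  # overlap/add cancels the aliasing
    return out[n_coef:n_coef + len(signal)]
```

The sketch keeps one resolution (`n_coef`) for the whole signal. A controller that switches resolution from frame to frame, as in the claims, would additionally need transition window shapes so the aliasing still cancels across the switch. Those shapes are not shown here.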

The audio decoder of any one of claims 10 to 13, wherein the encoded signal comprises coding mode information identifying whether the encoded signal is the first encoded signal or the second encoded signal, and
wherein the decoder further comprises an input interface (900) for interpreting the coding mode information to determine whether the encoded signal is to be input into the first decoding branch or into the second decoding branch.
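How the coding mode and resolution are serialized is not specified in these claims. Assuming a hypothetical per-frame header of one mode bit followed by a two-bit index into a per-branch window table (the layout, `WINDOW_TABLE`, `BitReader` and `parse_headers` are all illustrative inventions), an input interface could read and route frames like this:

```python
from typing import List, Tuple

# Hypothetical tables: a 2-bit index selects a window length per branch.
# All entries come from the window-length groups claimed for each converter.
WINDOW_TABLE = {0: (2048, 256, 1920, 240), 1: (2048, 1024, 512, 640)}


class BitReader:
    """Reads big-endian bit fields from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n_bits: int) -> int:
        value = 0
        for _ in range(n_bits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_headers(data: bytes, n_frames: int) -> List[Tuple[int, int]]:
    """Return (mode, window_length) per frame; mode 0 routes to the first decoding branch."""
    reader = BitReader(data)
    headers = []
    for _ in range(n_frames):
        mode = reader.read(1)
        window = WINDOW_TABLE[mode][reader.read(2)]
        headers.append((mode, window))
    return headers
```

For instance, the single byte `0x38` (bits `001 110 00`) decodes to a first-branch frame with a 256-sample window followed by a second-branch frame with a 512-sample window.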

The audio decoder of any one of the preceding claims, wherein the first encoded signal is arithmetically encoded and the first decoding branch comprises an arithmetic decoder.

The audio decoder of any one of the preceding claims, wherein the first decoding branch comprises an inverse quantizer for reversing the effect of a non-uniform quantization applied when generating the first encoded signal, and
wherein the second decoding branch either comprises no inverse quantizer or comprises an inverse quantizer using a different inverse quantization characteristic.
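To make the contrast between the two inverse quantization characteristics concrete, here is a sketch assuming an AAC-style power-law quantizer for the first branch (|x|^(3/4) with a 0.4054 rounding offset, inverted by |q|^(4/3)) and a plain uniform characteristic for the second. The claims do not fix either law; the function names are hypothetical.

```python
import math


def quantize_nonuniform(x: float, step: float) -> int:
    """AAC-style power-law quantizer: |x/step|^(3/4) with a 0.4054 rounding offset."""
    return int(math.copysign(math.floor(abs(x / step) ** 0.75 + 0.4054), x))


def inverse_quantize_nonuniform(q: int, step: float) -> float:
    """Undo the power law: |q|^(4/3) with the sign restored, scaled by the step size."""
    return math.copysign(abs(q) ** (4.0 / 3.0), q) * step


def inverse_quantize_uniform(q: int, step: float) -> float:
    """Uniform reconstruction, a stand-in for a different inverse characteristic."""
    return q * step
```

With the power law, the reconstruction grid widens as the index grows (the gap between indices 10 and 11 is larger than between 1 and 2), so large spectral values are coded more coarsely. The uniform characteristic keeps a constant step.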

The audio decoder of any one of the preceding claims, wherein the controller (990) is configured to control the first frequency/time converter and the second frequency/time converter by applying, for each converter, one of a plurality of different possible discrete frequency/time resolutions, wherein the number of different possible frequency/time resolutions for the second converter is greater than the number of different possible frequency/time resolutions for the first converter.

The audio decoder of any one of claims 10 to 17, wherein the domain converter is an LPC synthesis processor (544) for generating the synthesis signal using LPC filter information, the LPC filter information being included in the encoded signal.
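An LPC synthesis processor converts the second branch's excitation-domain output back to the time domain by running it through the all-pole filter 1/A(z). A minimal sketch, assuming the convention A(z) = 1 + a₁z⁻¹ + … + a_p z⁻ᵖ (some texts use 1 − Σ a_k z⁻ᵏ, which flips the coefficient signs) and a hypothetical function name:

```python
from typing import List, Sequence


def lpc_synthesis(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.

    Implements y[n] = e[n] - sum_k a[k-1] * y[n-k], starting from zero state.
    """
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out
```

For a one-pole filter with a = [-0.5], an impulse input yields the decaying response 1, 0.5, 0.25, 0.125. In a real decoder the coefficients would be decoded from the LPC filter information in the bitstream and interpolated per subframe.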

A method of decoding an encoded signal, the encoded signal comprising a first encoded signal, a second encoded signal, an indication indicating the first encoded signal and the second encoded signal, and time/frequency resolution information to be used for decoding the first encoded signal and the second encoded signal, the method comprising:
decoding, by a first decoding branch (431, 440), the first encoded signal using a first controllable frequency/time converter (440), the first controllable frequency/time converter (440) being controlled using the time/frequency resolution information for the first encoded signal to obtain a first decoded signal;
decoding, by a second decoding branch, the second encoded signal using a second controllable frequency/time converter (534), the second controllable frequency/time converter (534) being controlled using the time/frequency resolution information for the second encoded signal to obtain a second decoded signal;
controlling (990) the first frequency/time converter (440) and the second frequency/time converter (534) using the time/frequency resolution information;
generating, by a domain converter, a synthesis signal using the second decoded signal; and
combining (604) the first decoded signal and the synthesis signal to obtain a decoded audio signal.
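Because each portion of the audio signal is decoded by only one branch, the combining step essentially places each branch's output at its position on the common time axis. The sketch below assumes each decoded portion arrives as a hypothetical `(start_sample, source, samples)` tuple and adds rather than overwrites, so that already-faded transition regions from both branches sum correctly; the tuple layout and `combine` are illustrative only.

```python
from typing import List, Sequence, Tuple

# (start sample, source label, decoded samples); the layout is a hypothetical choice.
DecodedPortion = Tuple[int, str, Sequence[float]]


def combine(portions: Sequence[DecodedPortion], length: int) -> List[float]:
    """Place each decoded portion on the output timeline, summing any overlaps."""
    out = [0.0] * length
    for start, _source, samples in portions:
        for i, s in enumerate(samples):
            out[start + i] += s
    return out
```

For example, a first-branch portion covering samples 0 to 2 and a synthesis-signal portion starting at sample 2 add together at the shared sample, which is where a cross-fade would normally sit.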

An encoded audio signal, comprising:
a first encoded signal;
a second encoded signal, wherein a portion of the audio signal is represented either by the first encoded signal or by the second encoded signal;
an indication indicating the first encoded signal and the second encoded signal;
an indication of first time/frequency resolution information to be used for decoding the first encoded signal; and
an indication of second time/frequency resolution information to be used for decoding the second encoded signal.

A computer program for performing, when running on a processor, the method of claim 9 or claim 19.