TWI419148B - Multi-resolution switched audio encoding/decoding scheme

Info

Publication number: TWI419148B
Application number: TW098133982A
Authority: TW
Inventors: Max Neuendorf; Stefan Bayer; Jeremie Lecomte; Guillaume Fuchs; Julien Robilliard; Nikolaus Rettelbach; Frederik Nagel; Ralf Geiger; Markus Multrus; Bernhard Grill; Philippe Gournay; Redwan Salami
Original assignee: Fraunhofer Ges Forschung
Priority date: 2008-10-08
Filing date: 2009-10-07
Publication date: 2013-12-11
Also published as: EP2345030A2; CA2739736C; ZA201102537B; MX2011003824A; RU2011117699A; KR20130133917A; JP5555707B2; EP3640941A1; CN102177426A; AU2009301358A1; TWI520128B; CA2739736A1; BRPI0914056A2; WO2010040522A2; JP2012505423A; AU2009301358A8; KR20110081291A; KR20130069833A; TW201344679A; TW201142827A

Description

Multi-resolution switched audio encoding/decoding scheme

The present invention relates to audio coding and, in particular, to low bit rate audio coding schemes.

Frequency domain coding schemes such as MP3 or AAC are known in the art. These frequency domain encoders are based on a time domain/frequency domain conversion, a subsequent quantization stage in which the quantization error is controlled using information from a perceptual module, and an encoding stage in which the quantized spectral coefficients and the corresponding side information are entropy-encoded using code tables.

On the other hand, there are encoders that are very well suited to speech processing, such as AMR-WB+ as described in 3GPP TS 26.290. Such speech coding schemes perform a linear predictive (LP) filtering of a time domain signal. The LP filter is derived from a linear prediction analysis of the input time domain signal. The resulting LP filter coefficients are then quantized/coded and transmitted as side information. This process is known as linear predictive coding (LPC). At the output of the filter, the prediction residual signal or prediction error signal, which is also known as the excitation signal, is encoded using the analysis-by-synthesis stages of the ACELP encoder or, alternatively, using a transform encoder that uses a Fourier transform with an overlap. The decision between ACELP coding and transform coded excitation coding, which is also called TCX coding, is made using a closed-loop or an open-loop algorithm.
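The LP analysis and the resulting excitation signal can be sketched as follows. This is a minimal illustration using the textbook autocorrelation method with the Levinson-Durbin recursion; it is not the procedure specified in 3GPP TS 26.290, and the function names, the predictor order and the test signal are assumptions made for the example.

```python
import math


def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[lag] = sum_n x[n] * x[n - lag]."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err), where x_hat[n] = sum_k a[k-1] * x[n-k] and err is
    the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k_i = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k_i
        for j in range(1, i):
            a[j] = prev[j] - k_i * prev[i - j]
        err *= 1.0 - k_i * k_i
    return a[1:], err


def lp_residual(x, coeffs):
    """Prediction error (excitation) e[n] = x[n] - x_hat[n]."""
    residual = []
    for n in range(len(x)):
        pred = sum(
            c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0
        )
        residual.append(x[n] - pred)
    return residual


# Hypothetical test signal: a damped sinusoid, which is highly predictable.
signal = [math.sin(0.3 * n) * 0.99 ** n for n in range(400)]
coeffs, err = levinson_durbin(autocorrelation(signal, 2), 2)
excitation = lp_residual(signal, coeffs)
```

For a predictable signal of this kind the excitation carries far less energy than the input, which is why coding the LP coefficients as side information plus the residual can be efficient for speech-like material.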

Frequency domain audio coding schemes such as the high efficiency AAC (HE-AAC) coding scheme, which combines an AAC coding scheme with a spectral band replication (SBR) technique, can also be combined with a joint stereo or multi-channel coding tool known under the term "MPEG Surround".

On the other hand, speech encoders such as AMR-WB+ also have a high frequency extension stage and a stereo functionality.

Frequency domain coding schemes are advantageous in that they show a high quality at low bit rates for music signals. The quality of speech signals at low bit rates, however, is problematic.

Speech coding schemes show a high quality for speech signals even at low bit rates, but a poor quality for other signals at low bit rates.

It is an object of the present invention to provide an improved encoding/decoding concept.

This object is achieved by an audio encoder according to claim 1, a method of audio encoding according to claim 9, a decoder according to claim 10, a method of decoding according to claim 19, an encoded signal according to claim 20, or a computer program according to claim 21.

The present invention is based on the finding that a hybrid or dual-mode switched coding/encoding scheme is advantageous in that the best coding algorithm can always be selected for a certain signal characteristic. In other words, the present invention does not look for a single signal coding algorithm that is perfectly matched to all signal characteristics. Such a scheme would always be a compromise, as can be seen from the huge differences between state-of-the-art audio encoders on the one hand and speech encoders on the other hand. Instead, the present invention combines different coding algorithms, such as a speech coding algorithm and an audio coding algorithm, within a switched scheme so that, for each audio signal portion, the best-matching coding algorithm is selected. It is furthermore a feature of the present invention that both encoding branches comprise a time/frequency converter, but that a further domain converter, such as an LPC processor, is provided in one encoding branch. This domain converter makes sure that the second branch is better suited to a certain signal characteristic than the first encoding branch. It is, however, also a feature of the present invention that the signal output by the domain processor is itself transformed into a spectral representation.

Both converters, i.e. the first converter in the first encoding branch and the second converter in the second encoding branch, are configured to apply a multi-resolution transform coding, in which the resolution of the corresponding converter is set depending on the audio signal, and in particular depending on the audio signal actually coded in the corresponding encoding branch, so that a good compromise between quality and bit rate is obtained or, in view of a certain fixed quality, the lowest bit rate or, in view of a fixed bit rate, the highest quality.

In accordance with the present invention, the time/frequency resolutions of the two converters can preferably be set independently of each other, so that each time/frequency converter can be optimally matched to the time/frequency resolution requirements of the corresponding signal. The bit efficiency, i.e. the relation between useful bits on the one hand and side information bits on the other hand, is higher for longer block sizes/window lengths. It is therefore preferred that both converters are biased towards a longer window length since, basically, the same amount of side information then refers to a longer time portion of the audio signal than when shorter block sizes/window lengths/transform lengths are applied. Preferably, the time/frequency resolution in the encoding branches can also be influenced by other encoding/decoding tools located in these branches. Preferably, the second encoding branch comprising the domain converter, such as an LPC processor, comprises a further hybrid scheme, such as an ACELP branch on the one hand and a TCX scheme on the other hand, where the second converter is included in the TCX scheme. Preferably, the resolution of the time/frequency converter located in the TCX branch is also influenced by the encoding decision, so that a portion of the signal in the second encoding branch is processed either in the TCX branch having the second converter or in the ACELP branch not having a time/frequency converter.
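The bit-efficiency argument can be made concrete with a short calculation. The sketch below assumes a constant bit rate and a fixed number of side information bits per block; the default values (48 kHz sampling rate, 32 kbit/s, 40 side bits per block) are hypothetical and are not taken from the text.

```python
def bit_efficiency(block_len, sample_rate=48000, bitrate=32000,
                   side_bits_per_block=40):
    """Fraction of the bits of one block that is left for useful data.

    block_len is the number of new time samples covered by one block.
    All default values are illustrative assumptions.
    """
    total_bits = bitrate * block_len / sample_rate
    useful_bits = max(total_bits - side_bits_per_block, 0.0)
    return useful_bits / total_bits


efficiency_long = bit_efficiency(1024)   # roughly 0.94 under these assumptions
efficiency_short = bit_efficiency(128)   # roughly 0.53 under these assumptions
```

Under these assumptions the same 40 side bits are spread over eight times as much signal in the long block, which is the effect the paragraph describes.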

Basically, the domain converter and the second encoding branch, and in particular the first processing branch and the second processing branch within the second encoding branch, do not have to be speech-related elements such as an LPC analyzer for the domain converter, a TCX encoder for the second processing branch and an ACELP encoder for the first processing branch. Other applications are also useful when signal characteristics of an audio signal other than speech on the one hand and music on the other hand are evaluated. Any domain converter and encoding branch implementation can be used, and the best-matching algorithm can be found by an analysis-by-synthesis scheme in which, on the encoder side, all encoding alternatives are carried out for each portion of the audio signal and the best result is selected, where the best result can be found by applying a target function to the encoding results. Side information identifying, to a decoder, the underlying coding algorithm for a certain portion of the encoded audio signal is then attached to the encoded audio signal by an encoder output interface, so that the decoder does not have to care about any decisions on the encoder side or about any signal characteristics, but simply selects its decoding branch depending on the transmitted side information. Furthermore, the decoder not only selects the correct decoding branch, but also selects, based on the side information encoded in the encoded signal, which time/frequency resolution is to be applied in a corresponding first decoding branch and in a corresponding second decoding branch.
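The selection of a best-matching algorithm by trying every alternative and scoring the results with a target function can be sketched as follows. The two candidate "codecs" are toy uniform quantizers, and the bit estimate, the weighting factor `lam` and all names are hypothetical; they only stand in for real encoding branches.

```python
def make_uniform_codec(step):
    """Toy stand-in for an encoding branch: a uniform quantizer."""
    def encode(x):
        return [round(v / step) for v in x]

    def decode(indices):
        return [i * step for i in indices]

    return encode, decode


def estimated_bits(indices):
    """Crude bit estimate: magnitude bits plus one sign bit per index."""
    return sum(abs(i).bit_length() + 1 for i in indices)


def target_function(original, decoded, bits, lam=0.01):
    """Illustrative cost: mean squared error plus weighted bit count."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    return mse + lam * bits


def select_best(portion, candidates):
    """Run every candidate on the portion and keep the cheapest result."""
    results = []
    for mode, (encode, decode) in candidates.items():
        payload = encode(portion)
        cost = target_function(portion, decode(payload), estimated_bits(payload))
        results.append({"mode": mode, "payload": payload, "cost": cost})
    return min(results, key=lambda r: r["cost"]), results


candidates = {
    "coarse": make_uniform_codec(0.5),
    "fine": make_uniform_codec(0.05),
}
portion = [0.12, -0.4, 0.33, 0.9, -0.75, 0.05]
best, all_results = select_best(portion, candidates)
```

The `mode` field of the winning result plays the role of the side information: a decoder reading it needs no knowledge of how the decision was reached.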

The present invention therefore provides an encoding/decoding scheme that combines the advantages of all the different coding algorithms and avoids the disadvantages of these coding algorithms, which arise when a signal portion has to be encoded by an algorithm that is not suited to it. Furthermore, the present invention avoids any disadvantages that would arise if the different time/frequency resolution requirements raised by different audio signal portions in different encoding branches were not accounted for. Instead, owing to the variable time/frequency resolution of the time/frequency converters in both branches, any artifacts that would occur if the same time/frequency resolution were applied to both encoding branches, or if only a fixed time/frequency resolution were possible in any encoding branch, are at least reduced or even completely avoided.

The second switch again decides between two processing branches, but in a domain different from the "outer" first branch domain. Again, one "inner" branch is mainly motivated by a source model or by SNR calculations, and the other "inner" branch can be motivated by a sink model and/or a perceptual model, i.e. by masking, or at least includes frequency/spectral domain coding aspects. By way of example, one "inner" branch has a frequency domain encoder/spectral converter and the other branch has an encoder coding in another domain such as the LPC domain, where this encoder is, for example, a CELP or ACELP quantizer/scaler that processes an input signal without a spectral conversion.

A further preferred embodiment is an audio encoder comprising a first, information-sink-oriented encoding branch such as a spectral domain encoding branch, a second, information-source-oriented or SNR-oriented encoding branch such as an LPC domain encoding branch, and a switch for switching between the first encoding branch and the second encoding branch, wherein the second encoding branch comprises a converter into a specific domain different from the time domain, such as an LPC analysis stage generating an excitation signal, and wherein the second encoding branch furthermore comprises a specific domain processing branch such as an LPC domain processing branch, a specific spectral domain processing branch such as an LPC spectral domain processing branch, and an additional switch for switching between the specific domain encoding branch and the specific spectral domain encoding branch.

A further embodiment of the invention is an audio decoder comprising a first domain decoding branch such as a spectral domain decoding branch, a second domain decoding branch such as an LPC domain decoding branch for decoding a signal, such as an excitation signal, in the second domain, and a third domain decoding branch such as an LPC spectral decoder branch for decoding a signal, such as an excitation signal, in a third domain such as an LPC spectral domain, wherein the third domain is obtained by performing a frequency conversion from the second domain, wherein a first switch is provided for the second domain signal and the third domain signal, and wherein a second switch is provided for switching between the first domain decoder and the decoder for the second domain or the third domain.

Brief description of the drawings

Preferred embodiments of the present invention are subsequently described with respect to the accompanying drawings, in which:
Fig. 1a is a block diagram of an encoding scheme in accordance with a first aspect of the present invention;
Fig. 1b is a block diagram of a decoding scheme in accordance with the first aspect of the present invention;
Fig. 1c is a block diagram of an encoding scheme in accordance with a further aspect of the present invention;
Fig. 2a is a block diagram of an encoding scheme in accordance with a second aspect of the present invention;
Fig. 2b is a schematic diagram of a decoding scheme in accordance with the second aspect of the present invention;
Fig. 2c is a block diagram of an encoding scheme in accordance with a further aspect of the present invention;
Fig. 3a illustrates a block diagram of an encoding scheme in accordance with a further aspect of the present invention;
Fig. 3b illustrates a block diagram of a decoding scheme in accordance with the further aspect of the present invention;
Fig. 3c illustrates a schematic representation of the encoding apparatus/method with cascaded switches;
Fig. 3d illustrates a schematic diagram of an apparatus or method for decoding in which cascaded combiners are used;
Fig. 3e illustrates a time domain signal and a corresponding representation of the encoded signal, showing short cross-fade regions that are included in both encoded signals;
Fig. 4a illustrates a block diagram with a switch positioned before the encoding branches;
Fig. 4b illustrates a block diagram of an encoding scheme with the switch positioned after the encoding branches;
Fig. 5a illustrates the waveform of a time domain speech segment as a quasi-periodic or impulse-like signal segment;
Fig. 5b illustrates a spectrum of the segment of Fig. 5a;
Fig. 5c illustrates a time domain speech segment of unvoiced speech as an example of a noise-like segment;
Fig. 5d illustrates a spectrum of the time domain waveform of Fig. 5c;
Fig. 6 illustrates a block diagram of an analysis-by-synthesis CELP encoder;
Figs. 7a to 7d illustrate voiced/unvoiced excitation signals as an example of impulse-like signals;
Fig. 7e illustrates an encoder-side LPC stage providing short-term prediction information and the prediction error (excitation) signal;
Fig. 7f illustrates a further embodiment of an LPC device for generating a weighted signal;
Fig. 7g illustrates an implementation for transforming a weighted signal into an excitation signal by applying an inverse weighting operation and a subsequent excitation analysis, as needed in the converter 537 of Fig. 2b;
Fig. 8 illustrates a block diagram of a joint multi-channel algorithm in accordance with an embodiment of the present invention;
Fig. 9 illustrates a preferred embodiment of a bandwidth extension algorithm;
Fig. 10a illustrates a detailed description of the switch when performing an open-loop decision; and
Fig. 10b illustrates the switch when operating in a closed-loop decision mode.

Fig. 11A illustrates a block diagram of an audio encoder in accordance with another aspect of the present invention;
Fig. 11B illustrates a block diagram of another embodiment of an inventive audio decoder;
Fig. 12A illustrates another embodiment of an inventive encoder;
Fig. 12B illustrates another embodiment of an inventive decoder;
Fig. 13A illustrates the interrelation between resolution and window/transform lengths;
Fig. 13B illustrates an overview of a set of transform windows for the first encoding branch and a transition from the first to the second encoding branch;
Fig. 13C illustrates a plurality of different window sequences, including window sequences for the first encoding branch and sequences for a transition to the second branch;
Fig. 14A illustrates the framing of a preferred embodiment of the second encoding branch;
Fig. 14B illustrates short windows as applied in the second encoding branch;
Fig. 14C illustrates medium-sized windows as applied in the second encoding branch;
Fig. 14D illustrates long windows as applied in the second encoding branch;
Fig. 14E illustrates an exemplary sequence of ACELP frames and TCX frames within a superframe division;
Fig. 14F illustrates different transform lengths corresponding to different time/frequency resolutions for the second encoding branch; and
Fig. 14G illustrates the construction of a window using the definitions of Fig. 14F.

Fig. 11A illustrates an embodiment of an audio encoder for encoding an audio signal. The encoder comprises a first encoding branch 400 for encoding an audio signal using a first coding algorithm to obtain a first encoded signal.

The audio encoder furthermore comprises a second encoding branch 500 for encoding an audio signal using a second coding algorithm to obtain a second encoded signal. The first coding algorithm is different from the second coding algorithm. Additionally, a first switch is provided for switching between the first encoding branch and the second encoding branch so that, for a portion of the audio signal, either the first encoded signal or the second encoded signal is in an encoder output signal 801.

The audio encoder illustrated in Fig. 11A additionally comprises a signal analyzer 300/525, which is configured to analyze a portion of the audio signal in order to determine whether that portion is represented in the encoder output signal 801 as the first encoded signal or as the second encoded signal.

The signal analyzer 300/525 is furthermore configured to variably determine a respective time/frequency resolution of a first converter 410 in the first encoding branch 400 or of a second converter 523 in the second encoding branch 500. This time/frequency resolution is applied when the first encoded signal or the second encoded signal representing the portion of the audio signal is generated.

The audio encoder additionally comprises an output interface 800 for generating the encoder output signal 801. The encoder output signal 801 comprises an encoded representation of the portion of the audio signal, together with information indicating whether that representation is the first encoded signal or the second encoded signal and indicating the time/frequency resolution used for decoding the first encoded signal and the second encoded signal.
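One way to picture the side information written by the output interface 800 is a small header that carries a branch flag and a resolution index. The layout below (one bit for the branch, two bits indexing a table of transform lengths) is purely hypothetical and is not the bitstream syntax of this patent or of any standard.

```python
# Hypothetical table of selectable transform lengths (resolution index 0..3).
RESOLUTIONS = (256, 512, 1024, 2048)

FIRST_BRANCH = 0   # e.g. the frequency domain branch 400
SECOND_BRANCH = 1  # e.g. the branch 500 with the domain converter


def pack_header(branch, transform_len):
    """Pack the branch flag and the resolution index into three bits."""
    return (branch << 2) | RESOLUTIONS.index(transform_len)


def unpack_header(header):
    """Recover (branch, transform_len) from a packed header."""
    return (header >> 2) & 1, RESOLUTIONS[header & 0b11]


header = pack_header(SECOND_BRANCH, 512)
branch, transform_len = unpack_header(header)
```

A decoder that reads such a header can pick its decoding branch and its converter resolution without repeating any of the encoder's signal analysis.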

The second encoding branch preferably differs from the first encoding branch in that it additionally comprises a domain converter for converting the audio signal from the domain in which it is processed in the first encoding branch into a different domain. Preferably, the domain converter is an LPC processor 510, but the domain converter can be implemented in any other way as long as it is different from the first converter 410 and the second converter 523.

The first converter 410 is a time/frequency converter, which preferably comprises a windower 410a and a transformer 410b. The windower 410a applies an analysis window to the input audio signal, and the transformer 410b converts the windowed signal into a spectral representation.

Analogously, the second converter 523 preferably comprises a windower 523a and a subsequently connected transformer 523b. The windower 523a receives the signal output by the domain converter 510 and outputs a windowed representation of it. The result of the analysis window applied by the windower 523a is input into the transformer 523b to form a spectral representation. The transformer can be an FFT or, preferably, an MDCT processor implementing a corresponding algorithm in software, in hardware or in a mixed hardware/software implementation. Alternatively, the transformer can be a filter bank implementation, such as a QMF filter bank, which can be based on a real-valued or complex modulation of a prototype filter. For certain filter bank implementations, a window is applied. For other filter bank implementations, however, the windowing that is needed for a transform algorithm based on an FFT or MDCT is not necessary. When a filter bank implementation is used, the filter bank is a variable-resolution filter bank, and the resolution controls the frequency resolution of the filter bank and, additionally, the time resolution, or only the frequency resolution and not the time resolution. When the converter is implemented as an FFT, an MDCT or any other corresponding transformer, however, the frequency resolution is tied to the time resolution, because an increase in frequency resolution obtained by a larger block length automatically corresponds to a lower time resolution, and vice versa.
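The coupling between frequency resolution and time resolution for a block transform can be illustrated numerically. The sketch assumes an MDCT-like transform that produces `transform_len` coefficients spanning 0 to half the sampling rate for every `transform_len` new input samples; the 48 kHz sampling rate and the two lengths are illustrative assumptions.

```python
def mdct_resolution(transform_len, sample_rate=48000):
    """Return (frequency spacing in Hz, time step in seconds).

    transform_len is the number of spectral coefficients per block.
    """
    freq_spacing_hz = sample_rate / (2.0 * transform_len)
    time_step_s = transform_len / sample_rate
    return freq_spacing_hz, time_step_s


long_freq, long_time = mdct_resolution(1024)   # fine in frequency, coarse in time
short_freq, short_time = mdct_resolution(128)  # coarse in frequency, fine in time
```

Under this model the product of frequency spacing and time step is constant, so any gain in one resolution is paid for in the other.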

Additionally, the first encoding branch can comprise a quantizer/coder stage 421, and the second encoding branch can also comprise one or more further coding tools 524.

Importantly, the signal analyzer is configured to generate a resolution control signal for the first converter 410 and the second converter 523. An independent resolution control is thus implemented in both encoding branches in order to obtain a coding scheme that provides a low bit rate on the one hand and, in view of that low bit rate, a maximum quality on the other hand. To achieve the low bit rate goal, longer window lengths or longer transform lengths are preferred. In situations where these long lengths would result in an artifact due to the low time resolution, however, shorter window lengths and shorter transform lengths are applied, which result in a lower frequency resolution. Preferably, the signal analyzer applies a statistical analysis or any other analysis that is suited to the corresponding algorithms in the encoding branches. In one implementation mode, in which the first encoding branch is a frequency domain encoding branch such as an AAC-based encoder and the second encoding branch comprises an LPC processor 510 as a domain converter, the signal analyzer performs a speech/music discrimination and controls the switch 200 so that a speech portion of the audio signal is fed into the second encoding branch 500. A music portion of the audio signal is fed into the first encoding branch 400 by correspondingly controlling the switch 200, as indicated by the switch control lines. Alternatively, as discussed later with respect to Fig. 1C or Fig. 4B, the switch can also be positioned before the output interface 800.
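A resolution decision of this kind can be sketched with a simple energy-based transient detector: a block whose energy is concentrated in one short sub-block is given a short window, and a stationary block a long one. The detector, its threshold and the two window lengths are assumptions made for illustration and are not the analysis performed by the signal analyzer 300/525.

```python
import math


def choose_window_length(block, long_len=1024, short_len=128,
                         ratio_threshold=4.0, n_sub=8):
    """Pick a window length from the peak-to-mean sub-block energy ratio.

    All parameters are illustrative assumptions.
    """
    sub_len = len(block) // n_sub
    energies = [
        sum(v * v for v in block[i * sub_len:(i + 1) * sub_len]) + 1e-12
        for i in range(n_sub)
    ]
    peak_to_mean = max(energies) / (sum(energies) / n_sub)
    return short_len if peak_to_mean > ratio_threshold else long_len


# Stationary tone: energy is spread evenly, so a long window is chosen.
stationary = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(1024)]

# Single click in silence: energy sits in one sub-block, so a short
# window is chosen to limit the spread of quantization noise in time.
transient = [0.0] * 1024
transient[600] = 1.0

stationary_len = choose_window_length(stationary)
transient_len = choose_window_length(transient)
```

The same decision value would be sent as a resolution control signal to the converter of whichever branch codes the block.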

Furthermore, the signal analyzer can receive the audio signal input into the switch 200 or the audio signal output by the switch 200. The signal analyzer performs an analysis not only to feed the audio signal into the corresponding encoding branch, but also to determine the appropriate time/frequency resolution of the respective converter in that encoding branch, i.e. the first converter 410 or the second converter 523, as indicated by the resolution control lines connecting the signal analyzer and the converters.

Fig. 11B shows a preferred embodiment of an audio decoder matching the audio encoder of Fig. 11A.

The audio decoder in Fig. 11B is configured to decode an encoded audio signal such as the encoder output signal 801 output by the output interface 800 in Fig. 11A. The encoded signal comprises a first encoded audio signal encoded in accordance with a first coding algorithm, a second encoded signal encoded in accordance with a second coding algorithm, the second coding algorithm being different from the first coding algorithm, information indicating whether the first coding algorithm or the second coding algorithm is to be used for decoding the first encoded signal and the second encoded signal, and time/frequency resolution information for the first encoded audio signal and the second encoded audio signal.

The audio decoder comprises a first decoding branch 431, 440 for decoding the first encoded signal based on the first coding algorithm. Furthermore, the audio decoder comprises a second decoding branch for decoding the second encoded signal using the second coding algorithm.

The first decoding branch comprises a first controllable converter 440 for converting from a spectral domain into the time domain. This controllable converter is configured to be controlled using the time/frequency resolution information from the first encoded signal in order to obtain the first decoded signal.

The second decoding branch comprises a second controllable converter 534 for converting from a spectral representation into a time representation. The second controllable converter 534 is configured to be controlled using the time/frequency resolution information 991 for the second encoded signal.

The decoder additionally comprises a controller 990 for controlling the first converter 440 and the second converter 534 in accordance with the time/frequency resolution information.

Furthermore, the decoder comprises a domain converter for generating a synthesis signal using the second decoded signal in order to cancel the domain conversion applied by the domain converter 510 in the encoder of Fig. 11A.

Preferably, the domain converter 540 is an LPC synthesis processor controlled by the LPC filter information included in the encoded signal, where this LPC filter information was generated by the LPC processor 510 in Fig. 11A and written into the encoder output signal as side information. Finally, the audio decoder comprises a combiner 600 for combining the first decoded signal output by the first domain converter 440 with the synthesized signal to obtain a decoded audio signal 609.

In the preferred implementation, the first decoding branch additionally comprises a dequantizer/decoder stage 431 for reversing, or at least partly reversing, the operations performed by the corresponding encoder stage. It is clear, however, that quantization cannot be reversed, since it is a lossy operation. A dequantizer can nevertheless reverse certain non-uniformities in a quantization, such as a logarithmic or companding quantization.

In the second decoding branch, the corresponding stage 533 is applied to undo certain encoding operations applied by stage 524. Preferably, stage 524 comprises a uniform quantization. Therefore, the corresponding stage 533 will not have a specific dequantization stage for undoing a uniform quantization.

The first converter 440 and the second converter 534 may each comprise a corresponding inverse transform stage 440a, 534a, a synthesis window stage 440b, 534b and a subsequently connected overlap/add stage 440c, 534c. The overlap/add stages are required when the converters, and more specifically the transform stages 440a, 534a, implement an aliasing-introducing transform such as a modified discrete cosine transform (MDCT). The overlap/add operation then performs a time-domain aliasing cancellation (TDAC). When, however, a converter applies a transform that does not introduce aliasing, such as an inverse FFT, an overlap/add stage 440c is not required. In such an implementation, a cross-fading operation may be applied to avoid blocking artifacts.
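The chain of inverse transform, synthesis window and overlap/add described above can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the sine window, the hop size of N samples and all function names are assumptions chosen for illustration. It shows that the aliasing introduced by each MDCT block cancels once neighbouring windowed blocks are overlapped and added.

```python
import math

def sine_window(length):
    """Sine window; its squares at n and n + N sum to 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Forward MDCT: 2N input samples -> N coefficients."""
    n = len(block) // 2
    n0 = 0.5 + n / 2.0
    return [sum(block[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                for t in range(2 * n)) for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N samples that still contain aliasing."""
    n = len(coeffs)
    n0 = 0.5 + n / 2.0
    return [(2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                            for k in range(n)) for t in range(2 * n)]

def tdac_roundtrip(signal, n):
    """Analysis window, MDCT, IMDCT, synthesis window, overlap/add (hop N)."""
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + t] * w[t] for t in range(2 * n)]
        aliased = imdct(mdct(block))
        for t in range(2 * n):
            out[start + t] += aliased[t] * w[t]  # overlap/add stage
    return out

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
restored = tdac_roundtrip(signal, 8)
```

Only samples covered by two overlapping blocks (here indices 8 to 23) are reconstructed; the first and last half block lack a partner and keep their aliasing, which is why a codec needs a neighbouring block or a transition window at a switching point.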

Similarly, the combiner 600 may be a switching combiner or a cross-fading combiner; alternatively, when aliasing is used to avoid blocking artifacts, the combiner implements a transition windowing operation similar to an overlap/add stage within a branch itself.

Fig. 1a illustrates an embodiment of the invention having two cascaded switches. A mono signal, a stereo signal or a multi-channel signal is input into the switch 200. The switch 200 is controlled by the decision stage 300. The decision stage receives, as an input, the signal that is input into block 200. Optionally, the decision stage 300 may also receive side information that is included in the mono, stereo or multi-channel signal or is at least associated with such a signal, where such information exists, for example information generated when the mono, stereo or multi-channel signal was originally produced.

The decision stage 300 actuates the switch 200 in order to feed a signal either into the frequency encoding portion 400, illustrated in the upper branch of Fig. 1a, or into the LPC-domain encoding portion 500, illustrated in the lower branch of Fig. 1a. A key element of the frequency-domain encoding branch is the spectral conversion block 410, which is operative to convert the output signal of a common pre-processing stage (discussed later) into a spectral domain. The spectral conversion block may include an MDCT algorithm, a QMF, an FFT algorithm, a wavelet analysis or a filter bank, such as a critically sampled filter bank having a certain number of filter bank channels, where the subband signals in this filter bank may be real-valued or complex-valued signals. The output of the spectral conversion block 410 is encoded using a spectral audio encoder 421, which may include processing blocks as known from the AAC coding scheme.

Generally, the processing in branch 400 is a processing in a perception-based model or information sink model. Thus, this branch models the human auditory system receiving sound. In contrast, the processing in branch 500 serves to generate a signal in the excitation, residual or LPC domain. Generally, the processing in branch 500 is a processing in a speech model or an information generation model. For speech signals, this model is a model of the human speech/sound generation system that produces sound. If, however, a sound from a different source requiring a different sound generation model is to be encoded, the processing in branch 500 may be different.

In the lower encoding branch 500, a key element is an LPC device 510, which outputs LPC information used for controlling the characteristics of an LPC filter. This LPC information is transmitted to a decoder. The output signal of the LPC stage 510 is an LPC-domain signal consisting of an excitation signal and/or a weighted signal.

The LPC device generally outputs an LPC-domain signal, which can be any signal in the LPC domain, such as the excitation signal in Fig. 7e, a weighted signal as in Fig. 7f, or any other signal that has been generated by applying LPC filter coefficients to an audio signal. Furthermore, an LPC device can also determine these coefficients and can also quantize/encode them.

The decision in the decision stage can be signal-adaptive, so that the decision stage performs a music/speech discrimination and controls the switch 200 in such a way that music signals are input into the upper branch 400 and speech signals are input into the lower branch 500. In one embodiment, the decision stage feeds its decision information into an output bit stream, so that a decoder can use this decision information to perform the correct decoding operations.

Such a decoder is illustrated in Fig. 1b. After transmission, the signal output by the spectral audio encoder 421 is input into a spectral audio decoder 431. The output of the spectral audio decoder 431 is input into a time-domain converter 440. Analogously, the output of the LPC-domain encoding branch 500 of Fig. 1a is received on the decoder side and processed by elements 531, 533, 534 and 532 to obtain an LPC excitation signal. The LPC excitation signal is input into an LPC synthesis stage 540, which receives, as a further input, the LPC information generated by the corresponding LPC analysis stage 510. The output of the time-domain converter 440 and/or the output of the LPC synthesis stage 540 are input into a switch 600. The switch is controlled by a switch control signal, which is, for example, generated by the decision stage 300 or provided externally, such as by a creator of the original mono, stereo or multi-channel signal. The output of the switch 600 is a complete mono, stereo or multi-channel signal.

The input signal into the switch 200 and the decision stage 300 can be a mono signal, a stereo signal, a multi-channel signal or, generally, an audio signal. Depending on the decision, which can be derived from the input signal of switch 200 or from any external source, such as a producer of the original audio signal underlying the signal input into stage 200, the switch switches between the frequency encoding branch 400 and the LPC encoding branch 500. The frequency encoding branch 400 comprises a spectral conversion stage 410 and a subsequently connected quantization/coding stage 421. The quantization/coding stage can include any of the functionalities known from modern frequency-domain encoders such as the AAC encoder. Furthermore, the quantization operation in the quantization/coding stage 421 can be controlled by a perceptual module that generates perceptual information, such as a perceptual masking threshold, where this information is input into stage 421.

In the LPC encoding branch, the switch output signal is processed by an LPC analysis stage 510, which generates LPC side information and an LPC-domain signal. The excitation encoder inventively comprises an additional switch for switching the further processing of the LPC-domain signal between a quantization/coding operation 522 in the LPC domain and a quantization/coding stage 524 that processes values in the LPC spectral domain. To this end, a spectral converter 523 is provided at the input of the quantization/coding stage 524. The switch 521 is controlled in an open-loop or a closed-loop fashion, depending on specific settings as described, for example, in the AMR-WB+ technical specification.

For the closed-loop control mode, the encoder additionally includes an inverse quantizer/coder 531 for the LPC-domain signal, an inverse quantizer/coder 533 for the LPC spectral-domain signal and an inverse spectral converter 534 for the output of item 533. The signals that have been encoded and decoded again in the processing branches of the second encoding branch are input into the switch control device 525. In the switch control device 525, these two output signals are compared to each other and/or to a target function, or a target function is calculated based on a comparison of the distortion in both signals, so that the signal with the lower distortion is used for deciding which position the switch should take. Alternatively, if both branches provide non-constant bit rates, the branch providing the lower bit rate may be selected even when its signal-to-noise ratio is lower than that of the other branch. Alternatively, the target function could use, as an input, the signal-to-noise ratio of each signal and the bit rate of each signal and/or additional criteria in order to find the best decision for a specific goal. If, for example, the goal is that the bit rate should be sufficiently low, the target function will depend heavily on the bit rate of the two signals output by elements 531, 534. When, however, the main goal is to have the best quality for a certain bit rate, the switch control 525 might, for example, discard every signal that is above the allowed bit rate, and when both signals are below the allowed bit rate, the switch control would select the signal with the better signal-to-noise ratio, i.e. the smaller quantization/coding distortion.
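The last rule above (discard candidates over the allowed bit rate, then take the best signal-to-noise ratio) can be sketched as follows. This is a hedged illustration only: the function names, the candidate dictionary layout and the branch labels "acelp" and "tcx" are assumptions, not details taken from the patent or from AMR-WB+.

```python
import math

def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a reference and its decoded version."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def choose_branch(reference, candidates, max_bits):
    """Pick a branch in closed-loop fashion.

    candidates maps a branch name to (decoded_samples, bits_used).
    Candidates above max_bits are discarded; among the rest, the one
    with the highest SNR wins. Returns None if nothing fits the budget.
    """
    allowed = {name: c for name, c in candidates.items() if c[1] <= max_bits}
    if not allowed:
        return None
    return max(allowed, key=lambda name: snr_db(reference, allowed[name][0]))

reference = [1.0, -0.5, 0.25, 0.0]
candidates = {
    "acelp": ([0.9, -0.5, 0.3, 0.0], 100),   # coarser but cheaper
    "tcx": ([1.0, -0.5, 0.25, 0.01], 200),   # more accurate but more bits
}
```

With a budget of 150 bits only the cheaper candidate survives; with 250 bits both fit and the one with the smaller coding distortion is selected.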

The decoding scheme in accordance with the present invention is, as stated before, illustrated in Fig. 1b. For each of the three possible kinds of output signal there is a specific decoding/dequantization stage 431, 531 or 533. Stage 431 outputs a time spectrum, which is converted into the time domain using the frequency/time converter 440; stage 531 outputs an LPC-domain signal; and item 533 outputs an LPC spectrum. To ensure that the input signals into switch 532 are both in the LPC domain, the LPC-spectrum/LPC converter 534 is provided. The output data of switch 532 are transformed back into the time domain using an LPC synthesis stage 540, which is controlled by LPC information generated and transmitted on the encoder side. Subsequent to block 540, both branches carry time-domain information, which is switched in accordance with a switch control signal to finally obtain an audio signal, such as a mono signal, a stereo signal or a multi-channel signal, depending on the signal input into the encoding scheme of Fig. 1a.

Fig. 1c illustrates a further embodiment with a different arrangement of the switch 521, similar to the principle of Fig. 4b.

Fig. 2a illustrates a preferred encoding scheme in accordance with a second aspect of the invention. A common pre-processing scheme connected to the input of switch 200 may comprise a surround/joint stereo block 101, which generates, as an output, joint stereo parameters and a mono output signal produced by downmixing the input signal, which has two or more channels. Generally, the signal at the output of block 101 can also be a signal having two or more channels, but due to the downmixing functionality of block 101, the number of channels at the output of block 101 will be smaller than the number of channels input into block 101.

The common pre-processing scheme may comprise, instead of or in addition to block 101, a bandwidth extension stage 102. In the embodiment of Fig. 2a, the output of block 101 is input into the bandwidth extension block 102 which, in the encoder of Fig. 2a, outputs a band-limited signal, such as a low-band signal or a low-pass signal, at its output. Preferably, this signal is also downsampled, for example by a factor of two. Furthermore, for the high band of the signal input into block 102, bandwidth extension parameters, such as spectral envelope parameters, inverse filtering parameters and noise floor parameters as known from the HE-AAC profile of MPEG-4, are generated and forwarded to a bit stream multiplexer 800.

Preferably, the decision stage 300 receives the signal input into block 101 or input into block 102 in order to decide between, for example, a music mode and a speech mode. In the music mode, the upper encoding branch 400 is selected, while in the speech mode the lower encoding branch 500 is selected. Preferably, the decision stage additionally controls the joint stereo block 101 and/or the bandwidth extension block 102 to adapt the functionality of these blocks to the specific signal. Thus, when the decision stage determines that a certain time portion of the input signal is of the first mode, such as the music mode, specific features of block 101 and/or block 102 can be controlled by the decision stage 300. Likewise, when the decision stage 300 determines that the signal is in a speech mode or, generally, in a second LPC-domain mode, specific features of blocks 101 and 102 can be controlled in accordance with the decision stage output.

Preferably, the spectral conversion of the encoding branch 400 is done using an MDCT operation, more specifically a time-warped MDCT operation, where the strength or, generally, the warping strength can be controlled between zero and a high warping strength. At zero warping strength, the MDCT operation in block 411 is a straightforward MDCT operation as known in the art. The time-warping strength, together with time-warping side information, can be transmitted/input into the bit stream multiplexer 800 as side information.

In the LPC encoding branch, the LPC-domain encoder may include an ACELP core 526 that calculates a pitch gain, a pitch lag and/or codebook information such as a codebook index and gain. The TCX mode as known from 3GPP TS 26.290 involves processing a perceptually weighted signal in the transform domain. A Fourier-transformed weighted signal is quantized using a split multi-rate lattice quantization (algebraic VQ) with noise factor quantization. A transform is calculated in windows of 1024, 512 or 256 samples. The excitation signal is recovered by inverse filtering the quantized weighted signal through an inverse weighting filter.

In the first encoding branch, a spectral converter preferably comprises a specifically adapted MDCT operation having certain window functions, followed by a quantization/entropy coding stage. This stage may consist of a single vector quantization stage, but is preferably a combined scalar quantizer/entropy coder similar to the quantizer/coder in the frequency-domain encoding branch, i.e. item 421 in Fig. 2a.

In the second encoding branch, there is the LPC block 510 followed by a switch 521, which is in turn followed by an ACELP block 526 or a TCX block 527. ACELP is described in 3GPP TS 26.190 and TCX is described in 3GPP TS 26.290. Generally, the ACELP block 526 receives an LPC excitation signal as calculated by a procedure as described in Fig. 7e. The TCX block 527 receives a weighted signal as generated in accordance with Fig. 7f.

In TCX, the transform is applied to the weighted signal computed by filtering the input signal through an LPC-based weighting filter. The weighting filter used in the preferred embodiments of the invention is given by $(1 - A(z/\gamma))/(1 - \mu z^{-1})$. Thus, the weighted signal is an LPC-domain signal and its transform is in the LPC spectral domain. The signal processed by the ACELP block 526 is the excitation signal and differs from the signal processed by block 527, but both signals are in the LPC domain.

On the decoder side illustrated in Fig. 2b, after the inverse spectral transform in block 537, the inverse of the weighting filter is applied, that is $(1 - \mu z^{-1})/(1 - A(z/\gamma))$. The signal is then filtered through $(1 - A(z))$ to go into the LPC excitation domain. Thus, the conversion to the LPC domain in block 534 and the TCX$^{-1}$ block 537 include the inverse transform and the subsequent filtering through $(1 - A(z))$ to convert from the weighted domain to the excitation domain.
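The filtering steps named above can be sketched in a few lines of Python. This is a minimal illustration, assuming $A(z) = \sum_i a_i z^{-i}$ so that $1 - A(z)$ is the prediction error filter; the coefficient values and the choices gamma = 0.92 and mu = 0.68 in the example are illustrative assumptions, and all function names are hypothetical.

```python
def iir_filter(b, a, x):
    """Direct-form filter with zero initial state:
    a[0]*y[n] = sum_i b[i]*x[n-i] - sum_{j>=1} a[j]*y[n-j]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(acc / a[0])
    return y

def weighting_numerator(lpc, gamma):
    """Coefficients of 1 - A(z/gamma): each a_i is scaled by gamma**i."""
    return [1.0] + [-c * gamma ** (i + 1) for i, c in enumerate(lpc)]

def to_weighted(x, lpc, gamma, mu):
    """Encoder side: (1 - A(z/gamma)) / (1 - mu*z^-1)."""
    return iir_filter(weighting_numerator(lpc, gamma), [1.0, -mu], x)

def from_weighted(w, lpc, gamma, mu):
    """Decoder side: inverse weighting (1 - mu*z^-1) / (1 - A(z/gamma))."""
    return iir_filter([1.0, -mu], weighting_numerator(lpc, gamma), w)

def to_excitation(s, lpc):
    """Filter through 1 - A(z) to reach the LPC excitation domain."""
    return iir_filter([1.0] + [-c for c in lpc], [1.0], s)

lpc = [0.9, -0.2]  # assumed a_1, a_2; 1 - A(z) then has roots 0.5 and 0.4
x = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0, 0.3, 0.2]
weighted = to_weighted(x, lpc, 0.92, 0.68)
restored = from_weighted(weighted, lpc, 0.92, 0.68)
excitation = to_excitation(restored, lpc)
```

Without quantization in between, `restored` matches `x` up to rounding, which shows that the decoder-side filter is the exact inverse of the encoder-side weighting. In a real codec the quantization of the transformed weighted signal sits between the two filters.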

Although item 510 in Figs. 1a, 1c, 2a and 2c is illustrated as a single block, block 510 can output different signals as long as these signals are in the LPC domain. The actual mode of block 510, such as the excitation signal mode or the weighted signal mode, can depend on the actual switch state. Alternatively, block 510 can have two parallel processing devices, where one device is implemented similarly to Fig. 7e and the other is implemented as in Fig. 7f. Hence, the LPC domain at the output of 510 can represent the LPC excitation signal, the LPC weighted signal or any other LPC-domain signal.

In the second encoding branch (ACELP/TCX) of Fig. 2a or 2c, the signal is pre-emphasized through a filter $1 - 0.68 z^{-1}$ before encoding. At the ACELP/TCX decoder in Fig. 2b, the synthesized signal is de-emphasized with the filter $1/(1 - 0.68 z^{-1})$. The pre-emphasis can be part of the LPC block 510, where the signal is pre-emphasized before LPC analysis and quantization. Similarly, the de-emphasis can be part of the LPC synthesis block LPC$^{-1}$ 540.
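As a small sketch of this filter pair, the following Python functions implement the pre-emphasis $1 - 0.68 z^{-1}$ and the matching de-emphasis $1/(1 - 0.68 z^{-1})$ with zero initial state. The function names are hypothetical; only the coefficient 0.68 comes from the text.

```python
def pre_emphasis(x, mu=0.68):
    """y[n] = x[n] - mu * x[n-1], i.e. the FIR filter 1 - mu*z^-1."""
    out, prev = [], 0.0
    for sample in x:
        out.append(sample - mu * prev)
        prev = sample
    return out

def de_emphasis(y, mu=0.68):
    """s[n] = y[n] + mu * s[n-1], i.e. the IIR filter 1 / (1 - mu*z^-1)."""
    out, prev = [], 0.0
    for sample in y:
        prev = sample + mu * prev
        out.append(prev)
    return out

emphasized = pre_emphasis([1.0, 1.0, 1.0])
roundtrip = de_emphasis(pre_emphasis([0.2, -0.4, 0.9, 0.1]))
```

A constant input is attenuated after the first sample (to about 0.32 of its value), which illustrates the high-pass tilt of the pre-emphasis. Applying the de-emphasis afterwards restores the input up to rounding.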

Fig. 2c illustrates a further embodiment of the implementation of Fig. 2a, but with a different arrangement of the switch 521, similar to the principle of Fig. 4b.

In a preferred embodiment, the first switch 200 (see Fig. 1a or 2a) is controlled through an open-loop decision (as in Fig. 4a) and the second switch is controlled through a closed-loop decision (as in Fig. 4b).

For example, Fig. 2c has the second switch placed after the ACELP and TCX branches, as in Fig. 4b. Then, in the first processing branch, the first LPC domain represents the LPC excitation, and in the second processing branch, the second LPC domain represents the LPC weighted signal. That is, the first LPC-domain signal is obtained by filtering through $(1 - A(z))$ to convert to the LPC residual domain, while the second LPC-domain signal is obtained by filtering through the filter $(1 - A(z/\gamma))/(1 - \mu z^{-1})$ to convert to the LPC weighted domain.

Fig. 2b illustrates a decoding scheme corresponding to the encoding scheme of Fig. 2a. The bit stream generated by the bit stream multiplexer 800 of Fig. 2a is input into a bit stream demultiplexer 900. Depending on information derived, for example, from the bit stream via a mode detection block 601, a decoder-side switch 600 is controlled to forward either signals from the upper branch or signals from the lower branch to the bandwidth extension block 701. The bandwidth extension block 701 receives side information from the bit stream demultiplexer 900 and, based on this side information and the low band output by switch 600, reconstructs the high band.

The full-band signal generated by block 701 is input into the joint stereo/surround processing stage 702, which reconstructs two stereo channels or several multi-channels. Generally, block 702 will output more channels than were input into it. Depending on the application, the input into block 702 may even include two channels, such as in a stereo mode, or even more channels, as long as the output of this block has more channels than its input.

The switch 200 has been shown to switch between both branches so that only one branch receives a signal to process and the other branch does not. In an alternative embodiment, however, the switch may also be arranged subsequent to, for example, the audio encoder 421 and the excitation encoder 522, 523, 524, which means that both branches 400, 500 process the same signal in parallel. In order not to double the bit rate, however, only the signal output by one of the encoding branches 400 or 500 is selected to be written into the output bit stream. The decision stage then operates so that the signal written into the bit stream minimizes a certain cost function, where the cost function can be the generated bit rate, the generated perceptual distortion or a combined rate/distortion cost function. Therefore, either in this mode or in the modes illustrated in the figures, the decision stage can also operate in a closed-loop mode in order to make sure that, finally, only the output of that encoding branch is written into the bit stream which has the lowest bit rate for a given perceptual distortion or the lowest perceptual distortion for a given bit rate. In the closed-loop mode, the feedback input may be derived from the outputs of the three quantizer/scaler blocks 421, 522 and 424 in Fig. 1a.
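One common way to realise a combined rate/distortion cost function is a Lagrangian cost J = D + lambda * R. The text does not prescribe this form, so the sketch below is an assumption, as are the weight `lam`, the function names and the numbers in the example.

```python
def rd_cost(distortion, bits, lam):
    """Combined rate/distortion cost J = D + lam * R (assumed form)."""
    return distortion + lam * bits

def select_branch(results, lam):
    """results maps a branch name to (perceptual_distortion, bits).

    Both branches have already encoded the same signal in parallel;
    only the branch with the smallest cost is written to the bit stream.
    """
    return min(results, key=lambda name: rd_cost(*results[name], lam))

results = {
    "frequency": (0.10, 900),  # lower distortion, more bits
    "lpc": (0.25, 600),        # higher distortion, fewer bits
}
```

With `lam = 0` the decision reduces to picking the lowest distortion, while a larger `lam` shifts the decision toward the branch that spends fewer bits.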

In the implementation having two switches, i.e. the first switch 200 and the second switch 521, it is preferred that the time resolution of the first switch is lower than the time resolution of the second switch. In other words, the blocks of the input signal into the first switch that can be switched via a switch operation are larger than the blocks switched by the second switch, which operates in the LPC domain. As an example, the frequency-domain/LPC-domain switch 200 may switch blocks with a length of 1024 samples, and the second switch 521 may switch blocks of 256 samples each.
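The two switching granularities can be pictured with a small planning sketch. The block lengths 1024 and 256 come from the example above; everything else (the mode labels, the data layout and the simplification that each 256-sample block is decided independently) is an assumption for illustration.

```python
FRAME = 1024     # block length switched by the first switch 200
SUBFRAME = 256   # block length switched by the second switch 521

def segment_plan(frame_modes, lpc_submodes):
    """Build a list of (start_sample, length, mode) segments.

    frame_modes: one entry per 1024-sample frame, "freq" or "lpc".
    lpc_submodes: maps the index of each "lpc" frame to four entries,
    "acelp" or "tcx", one per 256-sample block.
    """
    plan = []
    for index, mode in enumerate(frame_modes):
        start = index * FRAME
        if mode == "freq":
            plan.append((start, FRAME, "freq"))
        else:
            for j, sub in enumerate(lpc_submodes[index]):
                plan.append((start + j * SUBFRAME, SUBFRAME, sub))
    return plan

plan = segment_plan(["freq", "lpc"], {1: ["acelp", "acelp", "tcx", "acelp"]})
```

Here the first frame is handled as one 1024-sample block in the frequency-domain branch, while the second frame is split into four 256-sample blocks that the second switch assigns individually.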

Although some of Figs. 1a through 10b are illustrated as block diagrams of an apparatus, these figures are at the same time an illustration of a method, where the block functionalities correspond to the method steps.

Fig. 3a illustrates an audio encoder for generating an encoded audio signal at an output of the first encoding branch 400 and a second encoding branch 500. Furthermore, the encoded audio signal preferably includes side information, such as pre-processing parameters from the common pre-processing stage or switch control information as discussed in connection with the preceding figures.

Preferably, the first encoding branch is operative to encode an audio intermediate signal 195 in accordance with a first coding algorithm, where the first coding algorithm has an information sink model. The first encoding branch 400 generates the first encoder output signal, which is an encoded spectral information representation of the audio intermediate signal 195.

Furthermore, the second encoding branch 500 is adapted to encode the audio intermediate signal 195 in accordance with a second coding algorithm, which has an information source model and generates, in a second encoder output signal, encoded parameters for the information source model representing the intermediate audio signal.

The audio encoder furthermore comprises a common pre-processing stage for pre-processing an audio input signal 99 to obtain the audio intermediate signal 195. Specifically, the common pre-processing stage is operative to process the audio input signal 99 so that the audio intermediate signal 195, i.e. the output of the common pre-processing algorithm, is a compressed version of the audio input signal.

A preferred method of audio encoding for generating an encoded audio signal comprises: a step of encoding 400 an audio intermediate signal 195 in accordance with a first coding algorithm, the first coding algorithm having an information sink model and generating, in a first output signal, encoded spectral information representing the audio signal; a step of encoding 500 the audio intermediate signal 195 in accordance with a second coding algorithm, the second coding algorithm having an information source model and generating, in a second output signal, encoded parameters for the information source model representing the intermediate signal 195; and a step of commonly pre-processing 100 an audio input signal 99 to obtain the audio intermediate signal 195, where, in the step of commonly pre-processing, the audio input signal 99 is processed so that the audio intermediate signal 195 is a compressed version of the audio input signal 99, and where the encoded audio signal includes, for a certain portion of the audio signal, either the first output signal or the second output signal. The method preferably includes the further step of encoding a certain portion of the audio intermediate signal either using the first coding algorithm or using the second coding algorithm, or encoding the signal using both algorithms, and outputting in an encoded signal either the result of the first coding algorithm or the result of the second coding algorithm.

Generally, the audio coding algorithm used in the first encoding branch 400 reflects and models the situation in an audio sink. The sink of audio information is normally the human ear. The human ear can be modeled as a frequency analyzer. Therefore, the first encoding branch outputs encoded spectral information. Preferably, the first encoding branch further includes a perceptual model for additionally applying a perceptual masking threshold. This perceptual masking threshold is used when quantizing audio spectral values, where preferably the quantization is performed such that the quantization noise introduced by quantizing the spectral audio values is hidden below the perceptual masking threshold.
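By way of illustration only, the following Python sketch shows one way a masking threshold could steer a uniform quantizer. It relies on the standard approximation that a uniform quantizer with step size `step` produces a noise power of about `step**2 / 12`. The function names, the band values and the threshold value are assumptions made for this example and are not taken from the described embodiments.

```python
import math

def step_for_threshold(mask_power):
    """Largest uniform step whose approximate noise power (step**2 / 12) equals mask_power."""
    return math.sqrt(12.0 * mask_power)

def quantize_band(values, mask_power):
    """Quantize the spectral values of one band; returns (indices, step)."""
    step = step_for_threshold(mask_power)
    indices = [round(v / step) for v in values]
    return indices, step

def dequantize_band(indices, step):
    """Map quantizer indices back to spectral values."""
    return [i * step for i in indices]

band = [0.93, -0.41, 0.07, 0.55]   # hypothetical spectral values of one band
indices, step = quantize_band(band, mask_power=0.003)
decoded = dequantize_band(indices, step)
max_error = max(abs(a - b) for a, b in zip(band, decoded))
```

A larger masking threshold yields a larger step and hence fewer distinct indices to entropy code, which is where the bit rate saving of such a scheme would come from.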

The second encoding branch represents an information source model, which reflects the generation of sound. Therefore, information source models may include a speech model which is reflected by an LPC analysis stage, i.e., by converting a time domain signal into an LPC domain and by subsequently processing the LPC residual signal, i.e., the excitation signal. Alternative sound source models, however, are sound source models for representing a certain instrument or any other sound generator, such as a specific sound source existing in the real world. When several sound source models are available, a selection between them can be performed, for example based on an SNR calculation, i.e., a calculation of which source model is best suited for encoding a certain time portion and/or frequency portion of an audio signal. Preferably, however, the switch between encoding branches is performed in the time domain, i.e., a certain time portion is encoded using one model and a certain different time portion of the intermediate signal is encoded using the other encoding branch.

Information source models are represented by certain parameters. Regarding the speech model, the parameters are LPC parameters and coded excitation parameters when a modern speech coder such as AMR-WB+ is considered. AMR-WB+ comprises an ACELP encoder and a TCX encoder. In this case, the coded excitation parameters can be a global gain, a noise floor, and variable length codes.

Figure 3b illustrates a decoder corresponding to the encoder illustrated in Figure 3a. Generally, Figure 3b illustrates an audio decoder for decoding an encoded audio signal to obtain a decoded audio signal 799. The decoder includes a first decoding branch 450 for decoding an encoded signal encoded in accordance with a first coding algorithm having an information sink model. The audio decoder further includes a second decoding branch 550 for decoding an encoded information signal encoded in accordance with a second coding algorithm having an information source model. The audio decoder further includes a combiner for combining the output signals from the first decoding branch 450 and the second decoding branch 550 to obtain a combined signal. The combined signal, illustrated in Figure 3b as the decoded audio intermediate signal 699, is input into a common post-processing stage for post-processing the decoded audio intermediate signal 699 (the combined signal output by the combiner 600), so that an output signal of the common post-processing stage is an expanded version of the combined signal. Thus, the decoded audio signal 799 has an enhanced information content compared to the decoded audio intermediate signal 699. This information expansion is provided by the common post-processing stage with the help of pre/post-processing parameters, which can be transmitted from an encoder to a decoder or can be derived from the decoded audio intermediate signal itself. Preferably, however, the pre/post-processing parameters are transmitted from an encoder to a decoder, since this procedure allows an improved quality of the decoded audio signal.

Figure 3c illustrates an audio encoder for encoding an audio input signal 195, which, in accordance with the preferred embodiment of the present invention, may be identical to the intermediate audio signal 195 of Figure 3a. The audio input signal 195 is present in a first domain, which can, for example, be the time domain but which can also be any other domain, such as a frequency domain, an LPC domain, an LPC spectral domain or any other domain. Generally, the conversion from one domain to another is performed by a conversion algorithm, such as any of the well-known time/frequency conversion algorithms or frequency/time conversion algorithms.

An alternative conversion from the time domain, for example into the LPC domain, is the result of LPC filtering a time domain signal, which yields an LPC residual signal or excitation signal. Any other filtering operation producing a filtered signal which has an impact on a substantial number of signal samples before the conversion can be used as a conversion algorithm, as the case may be. Therefore, weighting an audio signal using an LPC-based weighting filter is a further conversion, which generates a signal in the LPC domain. In a time/frequency conversion, the modification of a single spectral value has an impact on all time domain values before the conversion. Analogously, a modification of any time domain sample has an impact on each frequency domain sample. Similarly, a modification of a sample of the excitation signal in an LPC domain situation has, due to the length of the LPC filter, an impact on a substantial number of samples before the LPC filtering. Similarly, a modification of a sample before an LPC conversion has an impact on many samples obtained by this LPC conversion, due to the inherent memory effect of the LPC filter.
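The memory effect described above can be made concrete with a small Python sketch. It implements a generic linear-prediction analysis filter (time domain to residual) and the matching synthesis filter (residual back to time domain). The first-order predictor coefficient 0.9 and the sample values are purely hypothetical, and the function names are chosen for this example only.

```python
def lpc_residual(x, a):
    """Analysis filter: e[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e

def lpc_synthesis(e, a):
    """Synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k] (recursive, hence the memory)."""
    y = []
    for n in range(len(e)):
        pred = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + pred)
    return y

a = [0.9]                              # hypothetical first-order predictor
x = [1.0, 0.5, -0.25, 0.0, 0.3]        # hypothetical time domain samples
e = lpc_residual(x, a)                 # signal in the LPC (excitation) domain
y = lpc_synthesis(e, a)                # reproduces x up to rounding

# Changing one excitation sample alters every later time domain sample.
e_mod = list(e)
e_mod[0] += 1.0
y_mod = lpc_synthesis(e_mod, a)
differences = [ym - yo for ym, yo in zip(y_mod, y)]
```

With this predictor the difference at sample n equals 0.9 to the power n, so a single modified excitation sample decays slowly through the whole block instead of staying local.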

The audio encoder of Figure 3c includes a first encoding branch 400 which generates a first encoded signal. This first encoded signal may be in a fourth domain which is, in the preferred embodiment, the time-spectral domain, i.e., the domain obtained when a time domain signal is processed via a time/frequency conversion.

Therefore, the first encoding branch 400 for encoding an audio signal uses a first coding algorithm to obtain a first encoded signal, where this first coding algorithm may or may not include a time/frequency conversion algorithm.

The audio encoder further includes a second encoding branch 500 for encoding an audio signal. The second encoding branch 500 uses a second coding algorithm, which is different from the first coding algorithm, to obtain a second encoded signal.

The audio encoder further includes a first switch 200 for switching between the first encoding branch 400 and the second encoding branch 500 so that, for a portion of the audio input signal, either the first encoded signal at the output of block 400 or the second encoded signal at the output of the second encoding branch is included in an encoder output signal. Thus, when for a certain portion of the audio input signal 195 the first encoded signal in the fourth domain is included in the encoder output signal, the second encoded signal, which is either the first processed signal in the second domain or the second processed signal in the third domain, is not included in the encoder output signal. This ensures that the encoder is bit rate efficient. In embodiments, any time portions of the audio signal which are included in two different encoded signals are small compared to the frame length of a frame, as will be discussed in connection with Figure 3e. In the case of a switch event, these small portions are useful for a cross fade from one encoded signal to the other in order to reduce artifacts that might occur without any cross fade. Therefore, apart from the cross fade region, each time domain block is represented by an encoded signal of only a single domain.

As illustrated in Figure 3c, the second encoding branch 500 comprises a converter 510 for converting the audio signal in the first domain, i.e., signal 195, into a second domain. Furthermore, the second encoding branch 500 comprises a first processing branch 522 for processing an audio signal in the second domain to obtain a first processed signal which is also in the second domain, so that the first processing branch 522 does not perform a domain change.

The second encoding branch 500 further comprises a second processing branch 523, 524 which converts the audio signal in the second domain into a third domain and processes the audio signal in the third domain to obtain a second processed signal at the output of the second processing branch 523, 524, where the third domain is different from the first domain and also different from the second domain.

Furthermore, the second encoding branch comprises a second switch 521 for switching between the first processing branch 522 and the second processing branch 523, 524 so that, for a portion of the audio signal input into the second encoding branch, either the first processed signal in the second domain or the second processed signal in the third domain is in the second encoded signal.

Figure 3d illustrates a corresponding decoder for decoding an encoded audio signal generated by the encoder of Figure 3c. Generally, each block of the first domain audio signal is represented by either a second domain signal, a third domain signal or a fourth domain encoded signal, apart from an optional cross fade region which is preferably short compared to the length of one frame in order to obtain a system which is as close as possible to the critical sampling limit. The encoded audio signal includes the first encoded signal, a second encoded signal in a second domain and a third encoded signal in a third domain, wherein the first encoded signal, the second encoded signal and the third encoded signal all relate to different time portions of the decoded audio signal, and wherein the second domain, the third domain and the first domain for a decoded audio signal are different from each other.

The decoder comprises a first decoding branch for decoding based on the first coding algorithm. The first decoding branch is illustrated at 431, 440 in Figure 3d and preferably comprises a frequency/time converter. The first encoded signal is preferably in a fourth domain and is converted into the first domain, which is the domain of the decoded output signal.

The decoder of Figure 3d further comprises a second decoding branch which comprises several elements. These elements include a first inverse processing branch 531 for inverse processing the second encoded signal to obtain, at the output of block 531, a first inverse processed signal in the second domain. The second decoding branch further comprises a second inverse processing branch 533, 534 for inverse processing a third encoded signal to obtain a second inverse processed signal in the second domain, where the second inverse processing branch comprises a converter for converting from the third domain into the second domain.

The second decoding branch further comprises a first combiner 532 for combining the first inverse processed signal and the second inverse processed signal to obtain a signal in the second domain, where this combined signal is, at a first time instant, influenced only by the first inverse processed signal and, at a later time instant, influenced only by the second inverse processed signal.

The second decoding branch further comprises a converter 540 for converting the combined signal into the first domain.

Finally, the decoder illustrated in Figure 3d comprises a second combiner 600 for combining the decoded first signal from block 431, 440 and the output signal of the converter 540 to obtain a decoded output signal in the first domain. Again, the decoded output signal in the first domain is, at a first time instant, influenced only by the signal output by the converter 540 and, at a later time instant, influenced only by the first decoded signal output by block 431, 440.

This situation is illustrated, from an encoder perspective, in Figure 3e. The upper portion of Figure 3e illustrates, in a schematic representation, a first domain audio signal such as a time domain audio signal, where the time index increases from left to right and item 3 may be considered as a stream of audio samples representing the signal 195 in Figure 3c. Figure 3e illustrates frames 3a, 3b, 3c, 3d which may be generated by switching between the first encoded signal, the first processed signal and the second processed signal, as illustrated at item 4 in Figure 3e. The first encoded signal, the first processed signal and the second processed signal are all in different domains and, in order to make sure that the switch between the different domains does not result in an artifact on the decoder side, frames 3a, 3b of the time domain signal have an overlapping range which is indicated as a cross fade region, and such a cross fade region also exists at frames 3b and 3c. However, no such cross fade region exists between frames 3c and 3d, which means that frame 3d is also represented by a second processed signal, i.e., a signal in the third domain, and there is no domain change between frames 3c and 3d. Therefore, it is generally preferred not to provide a cross fade region where there is no domain change, and to provide a cross fade region, i.e., a portion of the audio signal which is encoded by two subsequent encoded/processed signals, when there is a domain change, i.e., a switching action of either of the two switches. Preferably, cross fades are performed for other domain changes.
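The cross fade over such an overlapping range can be sketched as follows. This is a minimal illustration assuming a linear fade and two decoder outputs that are already time aligned over the overlap; the description does not prescribe a particular fade shape, and the function name is hypothetical.

```python
def cross_fade(outgoing, incoming):
    """Blend the overlapping region of two decoded signals with complementary linear weights."""
    if len(outgoing) != len(incoming):
        raise ValueError("both overlap regions must have the same length")
    n = len(outgoing)
    blended = []
    for i in range(n):
        fade_in = (i + 0.5) / n            # weight of the signal being switched to
        blended.append((1.0 - fade_in) * outgoing[i] + fade_in * incoming[i])
    return blended

# Hypothetical overlap region decoded once by each of two branches.
from_old_branch = [1.0, 1.0, 1.0, 1.0]
from_new_branch = [0.0, 0.0, 0.0, 0.0]
overlap = cross_fade(from_old_branch, from_new_branch)
```

Because the two weights always sum to one, a region that both branches decode identically passes through unchanged, while a mismatch between the branches is smoothed over the overlap instead of appearing as a discontinuity.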

In an embodiment in which the first encoded signal or the second processed signal has been generated by MDCT processing having, e.g., 50 percent overlap, each time domain sample is included in two subsequent frames. Due to the characteristics of the MDCT, however, this does not result in an overhead, since the MDCT is a critically sampled system. In this context, critically sampled means that the number of spectral values is the same as the number of time domain values. The MDCT is advantageous in that the crossover effect is provided without a specific crossover region, so that a crossover from one MDCT block to the next MDCT block is provided without any overhead which would violate the critical sampling requirement.
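The following pure-Python sketch illustrates critical sampling and time domain aliasing cancellation for an MDCT with 50 percent overlap. It uses the textbook MDCT definition with a sine window satisfying the Princen-Bradley condition; the block size, the test signal and the zero padding at the signal edges are assumptions made for the example and are not taken from the described embodiments.

```python
import math

def sine_window(N):
    """Window of length 2N with w[n]**2 + w[n+N]**2 == 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    """2N windowed samples -> N spectral values."""
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coeffs, N):
    """N spectral values -> 2N time-aliased samples (scaling 2/N matches the window above)."""
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N)) for n in range(2 * N)]

def mdct_round_trip(x, N):
    """Analyze x with hop size N, then overlap-add the inverse transforms."""
    if len(x) % N:
        raise ValueError("len(x) must be a multiple of N")
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N      # edge padding, an assumption of this sketch
    out = [0.0] * len(padded)
    counts = []
    for b in range(len(padded) // N - 1):
        frame = [padded[b * N + n] * w[n] for n in range(2 * N)]
        spectrum = mdct(frame, N)
        counts.append(len(spectrum))              # N values per hop of N new samples
        y = imdct(spectrum, N)
        for n in range(2 * N):
            out[b * N + n] += y[n] * w[n]         # overlap-add cancels the aliasing
    return out[N:N + len(x)], counts

signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
reconstructed, counts = mdct_round_trip(signal, 4)
```

Each frame spans 2N samples but contributes only N spectral values, and the overlap-add of neighbouring frames restores the input, so the overlap costs no additional values apart from the padding at the two ends of the signal.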

Preferably, the first coding algorithm in the first encoding branch is based on an information sink model, and the second coding algorithm in the second encoding branch is based on an information source model or an SNR model. An SNR model is a model which is not specifically related to a specific sound generation mechanism but which is one coding mode that can be selected among a plurality of coding modes based, e.g., on a closed loop decision. Thus, an SNR model is any available coding model which does not necessarily have to be related to the physical constitution of the sound generator; it is any parameterized coding model, different from the information sink model, which can be selected by a closed loop decision and, specifically, by comparing the different SNR results from the different models.
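A closed loop decision of this kind can be sketched in Python as follows: every candidate mode encodes and decodes the frame, and the mode with the highest SNR is chosen. The two candidate "modes" here are toy quantizers standing in for real coding modes, all names are hypothetical, and the sketch ignores the bit cost of each mode, which a practical decision would also weigh.

```python
import math

def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a frame and its encode/decode round trip."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if noise_power == 0.0:
        return math.inf
    if signal_power == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal_power / noise_power)

def closed_loop_select(frame, round_trips):
    """Run every candidate round trip and return the name with the best SNR."""
    return max(round_trips, key=lambda name: snr_db(frame, round_trips[name](frame)))

def make_quantizer_round_trip(step):
    """Toy stand-in for a coding mode: uniform quantization followed by reconstruction."""
    def round_trip(frame):
        return [round(s / step) * step for s in frame]
    return round_trip

modes = {
    "coarse": make_quantizer_round_trip(0.5),
    "fine": make_quantizer_round_trip(0.05),
}
frame = [0.12, -0.37, 0.48, 0.91, -0.66]   # hypothetical frame
chosen = closed_loop_select(frame, modes)
```

An open loop decision, by contrast, would classify the input frame directly without running the candidate modes, which is cheaper but cannot measure the actual coding result.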

As illustrated in Figure 3c, a controller 300, 525 is provided. This controller may include the functionality of the decision stage of Figure 1a and, additionally, may include the functionality of the switch control device 525 of Figure 1a. Generally, the controller is for controlling the first switch and the second switch in a signal-adaptive way. The controller is operative to analyze a signal input into the first switch, or output by the first or the second encoding branch, or signals obtained by encoding and decoding from the first and the second encoding branch, with respect to a target function. Alternatively or additionally, the controller is operative to analyze the signal input into the second switch, or output by the first processing branch or the second processing branch, or obtained by processing and inverse processing from the first processing branch and the second processing branch, again with respect to a target function.

In one embodiment, the first encoding branch or the second encoding branch comprises an aliasing-introducing time/frequency conversion algorithm such as an MDCT or an MDST algorithm, which is different from a straightforward FFT, which does not introduce an aliasing effect. Furthermore, one or both branches comprise a quantizer/entropy coder block. Specifically, only the second processing branch of the second encoding branch includes the time/frequency converter introducing an aliasing operation, and the first processing branch of the second encoding branch comprises a quantizer and/or entropy coder and does not introduce any aliasing effects. The aliasing-introducing time/frequency converter preferably comprises a windower for applying an analysis window and an MDCT conversion algorithm. Specifically, the windower is operative to apply the window function to subsequent frames in an overlapping way so that a sample of a windowed signal occurs in at least two subsequent windowed frames.

In one embodiment, the first processing branch comprises an ACELP coder and the second processing branch comprises an MDCT spectral converter and a quantizer for quantizing spectral components to obtain quantized spectral components, where each quantized spectral component is zero or is defined by one quantizer index of a plurality of different possible quantizer indices.

Furthermore, it is preferred that the first switch 200 operates in an open loop manner and the second switch operates in a closed loop manner.

As stated before, both encoding branches are operative to encode the audio signal in a block-wise manner, in which the first switch or the second switch switches in a block-wise manner so that a switching action takes place, at the minimum, after a block of a predefined number of samples of a signal, the predefined number forming a frame length for the corresponding switch. Thus, the granule for switching by the first switch may be, for example, a block of 2048 or 1028 samples, and the frame length on which the first switch 200 bases its switching may be variable but is preferably fixed to such a comparatively long period.

Contrary thereto, the block length for the second switch 521, i.e., the granularity at which the second switch 521 switches from one mode to the other, is substantially smaller than the block length for the first switch. Preferably, the two block lengths for the switches are selected such that the longer block length is an integer multiple of the shorter block length. In the preferred embodiment, the block length of the first switch is 2048 or 1024 and the block length of the second switch is 1024 or, more preferably, 512 and, even more preferably, 256 and, even more preferably, 128 samples, so that, at the maximum, the second switch can switch 16 times when the first switch switches only a single time. A preferred maximum block length ratio, however, is 4:1.
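The arithmetic behind these block lengths can be checked with a few lines of Python. The helper name is hypothetical; the numbers are the ones given in the paragraph above.

```python
def second_switch_slots(first_block_len, second_block_len):
    """Number of second-switch blocks that fit into one first-switch block."""
    if first_block_len % second_block_len:
        raise ValueError("the longer block length must be an integer multiple of the shorter")
    return first_block_len // second_block_len

most_switches = second_switch_slots(2048, 128)    # largest ratio mentioned in the text
preferred_max = second_switch_slots(1024, 256)    # one combination with the preferred 4:1 ratio
```

Requiring an integer ratio keeps the block boundaries of the second switch aligned with those of the first switch, so a frame of the first switch always contains a whole number of second-switch blocks.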

In a further embodiment, the controller 300, 525 is operative to perform a speech/music discrimination for the first switch in such a way that a decision for speech is favored over a decision for music. In this embodiment, a decision for speech is taken even when a portion of less than 50% of a frame for the first switch is speech and the portion of more than 50% of the frame is music.

Furthermore, the controller is operative to switch to the speech mode already when a quite small portion of the first frame is speech and, specifically, when a portion of the first frame is speech which is 50% of the length of the smaller second frame. Thus, a preferred speech-favoring switching decision already switches over to speech even when, for example, only 6% or 12% of a block corresponding to the frame length of the first switch is speech.
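The speech-favoring rule can be sketched as a simple threshold test. The sketch assumes that the amount of speech in a first-switch frame is already known as a sample count and that reaching the threshold exactly counts as speech; both are assumptions of this example, as are the function names.

```python
def speech_threshold_samples(second_frame_len, fraction=0.5):
    """Speech duration (in samples) at which the first switch already selects speech."""
    return fraction * second_frame_len

def decide_speech(speech_samples, first_frame_len, second_frame_len, fraction=0.5):
    """True when the first-switch frame is to be handled by the speech branch."""
    if not 0 <= speech_samples <= first_frame_len:
        raise ValueError("speech_samples must lie within the first frame")
    return speech_samples >= speech_threshold_samples(second_frame_len, fraction)

def threshold_share_of_first_frame(first_frame_len, second_frame_len, fraction=0.5):
    """The same threshold expressed as a share of the first-switch frame."""
    return speech_threshold_samples(second_frame_len, fraction) / first_frame_len

share_256 = threshold_share_of_first_frame(2048, 256)   # 0.0625, roughly the 6% in the text
share_512 = threshold_share_of_first_frame(2048, 512)   # 0.125, roughly the 12% in the text
```

With a first frame of 2048 samples, half of a 256-sample or 512-sample second frame corresponds to 6.25% or 12.5% of the first frame, which matches the percentages quoted above.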

This procedure is preferred in order to fully exploit the bit rate saving capability of the first processing branch, which has a voiced speech core in one embodiment, and not to lose any quality even for the rest of the large first frame, which is non-speech, because the second processing branch includes a converter and is therefore also useful for audio signals which contain non-speech signals. Preferably, this second processing branch includes an overlapping MDCT, which is critically sampled and which, even at small window sizes, provides a highly efficient and aliasing-free operation due to the time domain aliasing cancellation processing, such as overlap and add on the decoder side. Furthermore, a large block length for the first encoding branch, which is preferably an AAC-like MDCT encoding branch, is useful, since non-speech signals are normally quite stationary and a long transform window provides a high frequency resolution and, therefore, high quality, and additionally provides bit rate efficiency due to a perceptually controlled quantization module, which can also be applied to the transform-based coding mode in the second processing branch of the second encoding branch.

Regarding the decoder illustration of Figure 3d, it is preferred that the transmitted signal includes an explicit indicator as side information 4a, as illustrated in Figure 3e. This side information 4a is extracted by a bit stream parser, not illustrated in Figure 3d, in order to forward the corresponding first encoded signal, first processed signal or second processed signal to the correct processor, such as the first decoding branch, the first inverse processing branch or the second inverse processing branch in Figure 3d. Therefore, an encoded signal not only has the encoded/processed signals but also includes side information relating to these signals. In other embodiments, however, there can be an implicit signaling which allows a decoder-side bit stream parser to distinguish between the certain signals. Regarding Figure 3e, it is outlined that the first processed signal or the second processed signal is the output of the second encoding branch and, therefore, the second encoded signal.
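The explicit signaling can be pictured as a per-frame mode indicator that a parser uses to dispatch each payload to the matching decoder path. The Python sketch below is schematic only: the enum values, the frame representation as (mode, payload) pairs and the placeholder handlers are assumptions of this example and do not reflect any actual bit stream syntax.

```python
from enum import Enum

class Mode(Enum):
    FIRST_ENCODED = 0      # handled by the first decoding branch
    FIRST_PROCESSED = 1    # handled by the first inverse processing branch
    SECOND_PROCESSED = 2   # handled by the second inverse processing branch

def route_frames(frames, handlers):
    """Forward each payload to the handler selected by its side information."""
    return [handlers[mode](payload) for mode, payload in frames]

# Placeholder handlers that only label the payload they received.
handlers = {
    Mode.FIRST_ENCODED: lambda p: ("first decoding branch", p),
    Mode.FIRST_PROCESSED: lambda p: ("first inverse processing branch", p),
    Mode.SECOND_PROCESSED: lambda p: ("second inverse processing branch", p),
}

stream = [(Mode.FIRST_ENCODED, b"\x01"), (Mode.SECOND_PROCESSED, b"\x02")]
routed = route_frames(stream, handlers)
```

With implicit signaling, the mode would instead be inferred by the parser from other properties of the stream, and only the first element of each pair would no longer be transmitted.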

Preferably, the first decoding branch and/or the second inverse processing branch includes an MDCT transform for converting from the spectral domain to the time domain. To this end, an overlap-adder is provided to perform a time domain aliasing cancellation functionality which, at the same time, provides a cross fade effect in order to avoid blocking artifacts. Generally, the first decoding branch converts a signal encoded in the fourth domain into the first domain, while the second inverse processing branch performs a conversion from the third domain to the second domain, and the converter subsequently connected to the first combiner provides a conversion from the second domain to the first domain so that, at the input of the combiner 600, only first domain signals are present, which represent, in the Figure 3d embodiment, the decoded output signal.

Figures 4a and 4b illustrate two different embodiments, which differ in the positioning of the switch 200. In Figure 4a, the switch 200 is positioned between an output of the common pre-processing stage 100 and the inputs of the two encoding branches 400, 500. The Figure 4a embodiment makes sure that the audio signal is input into a single encoding branch only, and the other encoding branch, which is not connected to the output of the common pre-processing stage, does not operate and is therefore switched off or in a sleep mode. This embodiment is preferable in that the non-active encoding branch does not consume power and computational resources, which is useful in particular for mobile applications, which are battery-powered and therefore have a general limitation on power consumption.

On the other hand, however, the Figure 4b embodiment may be preferable when power consumption is not an issue. In this embodiment, both encoding branches 400, 500 are active all the time, and only the output of the selected encoding branch for a certain time portion and/or a certain frequency portion is forwarded to the bit stream formatter, which may be implemented as a bit stream multiplexer 800. Therefore, in the Figure 4b embodiment, both encoding branches are active all the time, and the output of the encoding branch selected by the decision stage 300 is entered into the output bit stream, while the output of the other, non-selected encoding branch 400 is discarded, i.e., not entered into the output bit stream, i.e., the encoded audio signal.

Preferably, the second encoding rule/decoding rule is an LPC-based coding algorithm. In LPC-based speech coding, a differentiation is made between quasi-periodic impulse-like excitation signal segments or signal portions, and noise-like excitation signal segments or signal portions. This is performed for very low bit rate LPC vocoders (2.4 kbps) as in Figure 7b. However, in medium rate CELP coders, the excitation is obtained as the sum of scaled vectors from an adaptive codebook and a fixed codebook.
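The CELP excitation model mentioned above reduces to a gain-weighted sum of two codebook vectors. The following Python sketch shows only that sum; the codebook search, the vector contents and the gain values are hypothetical and chosen for illustration.

```python
def celp_excitation(adaptive_vector, fixed_vector, gain_adaptive, gain_fixed):
    """Excitation = gain_adaptive * adaptive codebook vector + gain_fixed * fixed codebook vector."""
    if len(adaptive_vector) != len(fixed_vector):
        raise ValueError("both codebook vectors must cover the same subframe")
    return [gain_adaptive * a + gain_fixed * f
            for a, f in zip(adaptive_vector, fixed_vector)]

# Hypothetical subframe of four samples.
adaptive = [0.5, -0.25, 0.0, 1.0]    # past excitation repeated at the pitch lag
fixed = [1.0, 0.0, -1.0, 0.0]        # sparse pulse pattern
excitation = celp_excitation(adaptive, fixed, gain_adaptive=0.5, gain_fixed=0.25)
```

The adaptive codebook contribution carries the quasi-periodic (pitch) part of the excitation, while the fixed codebook contribution models what the pitch prediction leaves over.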

Quasi-periodic impulse-like excitation signal segments, i.e., signal segments having a specific pitch, are coded with different mechanisms than noise-like excitation signals. While quasi-periodic impulse-like excitation signals are associated with voiced speech, noise-like signals are related to unvoiced speech.

By way of example, reference is made to Figs. 5a to 5d, which are used to discuss quasi-periodic impulse-like signal segments or portions and noise-like signal segments or portions. Specifically, voiced speech, illustrated in the time domain in Fig. 5a and in the frequency domain in Fig. 5b, is discussed as an example of a quasi-periodic impulse-like signal portion, and an unvoiced speech segment, illustrated in Figs. 5c and 5d, is discussed as an example of a noise-like signal portion. Speech can generally be classified as voiced, unvoiced or mixed. Time-domain and frequency-domain plots for sampled voiced and unvoiced segments are shown in Figs. 5a to 5d. Voiced speech is quasi-periodic in the time domain and harmonically structured in the frequency domain, while unvoiced speech is random-like and broadband. The short-time spectrum of voiced speech is characterized by its fine harmonic structure and its formant structure. The fine harmonic structure is a consequence of the quasi-periodicity of speech and may be attributed to the vibrating vocal cords. The formant structure (spectral envelope) is due to the interaction of the source and the vocal tract. The vocal tract consists of the pharynx and the mouth cavity.

The shape of the spectral envelope that "fits" the short-time spectrum of voiced speech is associated with the transfer characteristics of the vocal tract and with the spectral tilt (6 dB/octave) due to the glottal pulse. The spectral envelope is characterized by a set of peaks called formants. The formants are the resonant modes of the vocal tract. For the average vocal tract there are three to five formants below 5 kHz. The amplitudes and locations of the first three formants, which usually occur below 3 kHz, are quite important both in speech synthesis and in perception. Higher formants are also important for wideband and unvoiced speech representations. These properties of speech are related to the physical speech production system as follows. Voiced speech is produced by exciting the vocal tract with quasi-periodic glottal air pulses generated by the vibrating vocal cords. The frequency of these periodic pulses is referred to as the fundamental frequency or pitch. Unvoiced speech is produced by forcing air through a constriction in the vocal tract. Plosive sounds are produced by abruptly releasing the air pressure that was built up behind a closure in the tract.

Thus, a noise-like portion of the audio signal shows neither an impulse-like time-domain structure (Fig. 5c) nor a harmonic frequency-domain structure (Fig. 5d), and thereby differs from the quasi-periodic impulse-like portion illustrated, for example, in Figs. 5a and 5b. As outlined later, however, the difference between noise-like portions and quasi-periodic impulse-like portions can also be observed after an LPC, i.e., in the excitation signal. LPC is a method that models the vocal tract and extracts the excitation of the vocal tract from the signal.

Furthermore, quasi-periodic impulse-like portions and noise-like portions can alternate over time, i.e., one time portion of the audio signal is noisy and another time portion of the audio signal is quasi-periodic, i.e., tonal. Alternatively or additionally, the characteristic of a signal can differ between frequency bands. Thus, the determination of whether the audio signal is noisy or tonal can also be performed frequency-selectively, so that a certain frequency band or several frequency bands are considered noisy and other bands are considered tonal. In this case, a certain time portion of the audio signal may include both tonal and noisy components.

Fig. 7a illustrates a linear model of a speech production system. This system assumes a two-stage excitation, i.e., an impulse train for voiced speech as shown in Fig. 7c and random noise for unvoiced speech as shown in Fig. 7d. The vocal tract is modeled as an all-pole filter 70, which processes the pulses of Fig. 7c or Fig. 7d generated by the glottal model 72. Hence, the system of Fig. 7a can be reduced to the all-pole filter model of Fig. 7b, which has a gain stage, a forward path, a feedback path 79 and an adding stage 80. The feedback path 79 contains a prediction filter 81, and the whole source-model synthesis system illustrated in Fig. 7b can be represented by the following z-domain function:

S(z) = g/(1 - A(z)) · X(z),

where g represents the gain, A(z) is the prediction filter as determined by an LP analysis, X(z) is the excitation signal, and S(z) is the synthesized speech output.
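In the time domain, the synthesis equation above corresponds to a recursive (all-pole) filter. The following Python sketch illustrates this under one assumption that the text does not spell out: the prediction filter is taken to be A(z) = a_1 z^-1 + ... + a_p z^-p, so that s[n] = g·x[n] + a_1·s[n-1] + ... + a_p·s[n-p]. The function name `all_pole_synthesis` is hypothetical.

```python
def all_pole_synthesis(excitation, a, gain=1.0):
    """Filter the excitation x through g / (1 - A(z)).

    Assumes A(z) = sum_{k=1..p} a[k-1] * z^-k and zero initial state, so
    s[n] = gain * x[n] + sum_k a[k-1] * s[n-k].
    """
    out = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * out[n - k]
        out.append(acc)
    return out


# A single impulse through a first-order filter with a_1 = 0.5
# gives a decaying impulse response: 1, 0.5, 0.25, 0.125.
impulse_response = all_pole_synthesis([1.0, 0.0, 0.0, 0.0], a=[0.5])
```

Feeding an impulse train into the same function would mimic the voiced case of Fig. 7c, and feeding random noise would mimic the unvoiced case of Fig. 7d.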

Figs. 7c and 7d give a graphical time-domain description of voiced and unvoiced speech synthesis using the linear source-system model. The system and the excitation parameters in the above equation are unknown and must be determined from a finite set of speech samples. The coefficients of A(z) are obtained using a linear prediction of the input signal and a quantization of the filter coefficients. In a p-th order forward linear predictor, the present sample of the speech sequence is predicted from a linear combination of p past samples. The predictor coefficients can be determined by well-known algorithms such as the Levinson-Durbin algorithm or, generally, an autocorrelation method or a reflection method.
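As an illustration of how the predictor coefficients might be determined, the following is a minimal pure-Python sketch of the autocorrelation method with the Levinson-Durbin recursion. It is a textbook formulation, not the implementation of any particular codec. The function names and the convention that the prediction is the sum of a_k·s[n-k] for k = 1..p are assumptions made for this example.

```python
def autocorrelation(x, p):
    """Return r[0..p], with r[k] = sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the normal equations sum_j a_j * r[|i - j|] = r[i], i = 1..p.

    Returns (a, err), where a[k-1] is the predictor coefficient a_k and
    err is the remaining prediction error energy.
    """
    a = [0.0] * p
    err = r[0]
    for i in range(p):
        if err <= 0.0:
            break  # degenerate input, e.g. an all-zero frame
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of an ideal first-order process with a_1 = 0.5:
# the recursion recovers a_1 = 0.5 and sets a_2 to zero.
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], p=2)
```

In a real coder, the input to `autocorrelation` would be a windowed frame of speech samples, and the resulting coefficients would then be quantized before transmission, as the paragraph above notes.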

Fig. 7e illustrates a more detailed implementation of the LPC analysis block 510. The audio signal is input into a filter determination block, which determines the filter information A(z). This information is output as the short-term prediction information needed by a decoder. The short-term prediction information is also needed by the actual prediction filter 85. A subtracter 86 receives a current sample of the audio signal and subtracts a predicted value for this sample, so that the prediction error signal for this sample is generated at line 84. A sequence of such prediction error signal samples is illustrated very schematically in Fig. 7c or 7d. Therefore, Figs. 7a, 7b can be considered as a kind of rectified impulse-like signal.
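The subtraction performed at the subtracter 86 can be sketched as follows. The sketch assumes that the prediction for sample n is a_1·s[n-1] + ... + a_p·s[n-p] and that samples before the start of the signal are zero. The function name `prediction_error` is hypothetical.

```python
def prediction_error(signal, a):
    """Return e[n] = s[n] - sum_k a[k-1] * s[n-k] for every sample n.

    Samples before the start of the signal are treated as zero.
    """
    residual = []
    for n, s in enumerate(signal):
        predicted = sum(
            a_k * signal[n - k]
            for k, a_k in enumerate(a, start=1)
            if n - k >= 0
        )
        residual.append(s - predicted)
    return residual


# A signal that decays by a factor of 0.5 per sample is predicted
# perfectly by a_1 = 0.5, so only the first sample remains as excitation.
excitation = prediction_error([1.0, 0.5, 0.25, 0.125], a=[0.5])
```

Under these assumptions the operation is the inverse of all-pole synthesis with unit gain: whatever the predictor cannot explain is left in the residual, which is why a voiced segment yields an impulse-like prediction error.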

Fig. 7e illustrates a preferred way to calculate the excitation signal, while Fig. 7f illustrates a preferred way to calculate the weighted signal. In contrast to Fig. 7e, the filter 85 is different when γ is different from 1. A value smaller than 1 is preferred for γ. Furthermore, the block 87 is present, and μ is preferably a number smaller than 1. Generally, the elements in Figs. 7e and 7f can be implemented as in 3GPP TS 26.190 or 3GPP TS 26.290.

Fig. 7g illustrates an inverse processing that can be applied on the decoder side, such as in element 537 of Fig. 2b. Specifically, block 88 generates an unweighted signal from the weighted signal, and block 89 calculates an excitation from the unweighted signal. Generally, all signals in Fig. 7g except the unweighted signal are in the LPC domain, but the excitation signal and the weighted signal are different signals in the same domain. Block 89 outputs an excitation signal, which can then be used together with the output of block 536. The common inverse LPC transform can then be performed in block 540 of Fig. 2b.

Subsequently, an analysis-by-synthesis CELP encoder is discussed in connection with Fig. 6 in order to illustrate the modifications applied to this algorithm. This CELP encoder is discussed in detail in "Speech Coding: A Tutorial Review" by Andreas Spanias, Proceedings of the IEEE, Vol. 82, No. 10, October 1994, pages 1541 to 1585. The CELP encoder illustrated in Fig. 6 includes a long-term prediction component 60 and a short-term prediction component 62. Furthermore, a codebook is used, which is indicated at 64. A perceptual weighting filter W(z) is implemented at 66, and an error minimization controller is provided at 68. s(n) is the time-domain input signal. After having been perceptually weighted, the weighted signal is input into a subtracter 69, which calculates the error between the weighted synthesis signal at the output of block 66 and the original weighted signal s_w(n). Generally, the short-term prediction filter coefficients A(z) are calculated by an LP analysis stage and quantized in Â(z), as indicated in Fig. 7e. The long-term prediction information A_L(z), including the long-term prediction gain g and the vector quantization index, i.e., the codebook reference, is calculated on the prediction error signal at the output of the LPC analysis stage, referred to as 10a in Fig. 7e. The LTP parameters are the pitch delay and gain. In CELP this is usually implemented as an adaptive codebook containing the past excitation signal (not the residual). The adaptive CB delay and gain are found by minimizing the mean-squared weighted error (closed-loop pitch search).

The CELP algorithm then encodes the residual signal obtained after the short-term and long-term predictions using a codebook of, for example, Gaussian sequences. The ACELP algorithm, where the "A" stands for "algebraic", has a specific algebraically designed codebook.

A codebook may contain more or fewer vectors, where each vector is some samples long. A gain factor g scales the code vector, and the gain-scaled code vector is filtered by the long-term prediction synthesis filter and the short-term prediction synthesis filter. The "optimum" code vector is selected such that the perceptually weighted mean-square error at the output of the subtracter 69 is minimized. The search process in CELP is performed by an analysis-by-synthesis optimization, as illustrated in Fig. 6.
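The analysis-by-synthesis search can be illustrated with a deliberately simplified Python sketch. It is not the search of Fig. 6: the long-term predictor and the perceptual weighting filter are left out, the codebook is a tiny hand-written list, and all names (`synthesize`, `search_codebook`) are hypothetical. What it does show is the core loop: synthesize each candidate, compute the gain that minimizes the squared error against the target, and keep the best candidate.

```python
def synthesize(excitation, a):
    """Short-term synthesis filter 1 / (1 - A(z)) with zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * out[n - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the code vector that best matches target.

    Simplification: no long-term predictor and no perceptual weighting.
    """
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        # Gain that minimizes sum (t - gain * y)^2 for this candidate.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Toy codebook of three sparse vectors; the target was produced from
# entry 1 with gain 2, so the search should recover exactly that.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, -1.0]]
target = [2.0 * v for v in synthesize(codebook[1], a=[0.5])]
best_index, best_gain, best_error = search_codebook(target, codebook, a=[0.5])
```

The exhaustive loop is the reason codebook design matters in practice: structured codebooks such as the algebraic one of ACELP exist largely to make this search cheaper.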

In specific cases, when a frame is a mixture of unvoiced and voiced speech or when speech over music occurs, a TCX coding can be more appropriate for coding the excitation in the LPC domain. TCX coding processes the weighted signal in the frequency domain without making any assumption about how the excitation is produced. TCX is therefore more generic than CELP coding and is not restricted to a voiced or unvoiced source model of the excitation. TCX is still a source-oriented model coding that uses a linear predictive filter to model the formants of speech-like signals.

In AMR-WB+ coding, a selection between different TCX modes and ACELP takes place, as known from the AMR-WB+ description. The TCX modes differ in that the length of the block-wise discrete Fourier transform is different for different modes, and the best mode can be selected by an analysis-by-synthesis approach or by a direct "feedforward" mode.

As discussed in connection with Figs. 2a and 2b, the common preprocessing stage 100 preferably includes a joint multi-channel stage (surround/joint stereo device) 101 and, additionally, a bandwidth extension stage 102. Correspondingly, the decoder includes a bandwidth extension stage 701 and a subsequently connected joint multi-channel stage 702. Preferably, the joint multi-channel stage 101 is connected before the bandwidth extension stage 102 on the encoder side, and, on the decoder side, the bandwidth extension stage 701 is connected before the joint multi-channel stage 702 with respect to the signal processing direction. Alternatively, however, the common preprocessing stage can include a joint multi-channel stage without the subsequently connected bandwidth extension stage, or a bandwidth extension stage without a connected joint multi-channel stage.

A preferred example of a joint multi-channel stage on the encoder side 101a, 101b and on the decoder side 702a, 702b is illustrated in Fig. 8. A number E of original input channels is input into the downmixer 101a, so that the downmixer generates a number K of transmitted channels, where K is greater than or equal to one and smaller than or equal to E.

Preferably, the E input channels are input into a joint multi-channel parameter analyzer 101b, which generates parametric information. This parametric information is preferably entropy-encoded, for example by a difference encoding followed by Huffman encoding or, alternatively, by arithmetic encoding. The encoded parametric information output by block 101b is transmitted to a parameter decoder 702b, which may be part of item 702 in Fig. 2b. The parameter decoder 702b decodes the transmitted parametric information and forwards the decoded parametric information to the upmixer 702a. The upmixer 702a receives the K transmitted channels and generates a number L of output channels, where L is greater than or equal to K and smaller than or equal to E.

The parametric information may include inter-channel level differences, inter-channel time differences, inter-channel phase differences and/or inter-channel coherence measures, as known from the BCC technique or as known and described in detail in the MPEG Surround standard. The number of transmitted channels may be a single mono channel for ultra-low bit rate applications, or may correspond to a compatible stereo application or a compatible stereo signal, i.e., two channels. Typically, the number E of input channels may be five or even higher. Alternatively, the E input channels may also be E audio objects, as known in the context of spatial audio object coding (SAOC).

In one implementation, the downmixer performs a weighted or unweighted addition of the original E input channels or an addition of the E input audio objects. When audio objects are used as input channels, the joint multi-channel parameter analyzer 101b calculates audio object parameters, such as a correlation matrix between the audio objects, preferably for each time portion and even more preferably for each frequency band. To this end, the whole frequency range may be divided into at least 10 and preferably 32 or 64 frequency bands.
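A weighted or unweighted downmix to a single transmitted channel, together with one example of an inter-channel parameter, might look as follows in Python. This is a sketch for the case K = 1 only. The level difference is computed here as a power ratio in dB over the whole frame, which is an assumption made for illustration rather than the definition used by BCC or MPEG Surround, and the function names are hypothetical.

```python
import math


def downmix(channels, weights=None):
    """Add E equally long channels into one; unweighted if no weights given."""
    if weights is None:
        weights = [1.0] * len(channels)
    length = len(channels[0])
    return [
        sum(w * ch[n] for w, ch in zip(weights, channels))
        for n in range(length)
    ]


def level_difference_db(ch_a, ch_b):
    """Energy ratio of two channels in dB (assumes neither is silent)."""
    energy_a = sum(v * v for v in ch_a)
    energy_b = sum(v * v for v in ch_b)
    return 10.0 * math.log10(energy_a / energy_b)


left = [1.0, 2.0]
right = [3.0, 4.0]
mono_sum = downmix([left, right])               # unweighted addition
mono_avg = downmix([left, right], [0.5, 0.5])   # weighted addition
```

A parameter analyzer along the lines of block 101b would compute such values per time portion and per frequency band rather than once per frame, as the paragraph above describes.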

Fig. 9 illustrates a preferred embodiment of the bandwidth extension stage 102 of Fig. 2a and the corresponding bandwidth extension stage 701 of Fig. 2b. On the encoder side, the bandwidth extension block 102 preferably includes a low-pass filtering block 102b, a downsampler block, which follows the low-pass filter or is part of the inverse QMF and acts on only half of the QMF bands, and a high band analyzer 102a. The original audio signal input into the bandwidth extension block 102 is low-pass filtered to generate the low band signal, which is then input into the encoding branches and/or the switch. The low-pass filter has a cutoff frequency that can be in a range of 3 kHz to 10 kHz. Furthermore, the bandwidth extension block 102 includes a high band analyzer for calculating the bandwidth extension parameters, such as spectral envelope parameter information, noise floor parameter information, inverse filtering parameter information, further parametric information relating to certain harmonic lines in the high band, and additional parameters as discussed in detail in the MPEG-4 standard in the chapter on spectral band replication.

On the decoder side, the bandwidth extension block 701 includes a patcher 701a, an adjuster 701b and a combiner 701c. The combiner 701c combines the decoded low band signal with the reconstructed and adjusted high band signal output by the adjuster 701b. The input into the adjuster 701b is provided by the patcher, which is operated to derive the high band signal from the low band signal, for example by spectral band replication or, generally, by bandwidth extension. The patching performed by the patcher 701a may be performed in a harmonic or in a non-harmonic way. The signal generated by the patcher 701a is subsequently adjusted by the adjuster 701b using the transmitted parametric bandwidth extension information.
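The patcher, adjuster and combiner chain can be illustrated with a toy Python sketch that operates on a list of spectral magnitudes. Everything here is an assumption made for illustration: the patch is a plain copy of the low band (a non-harmonic patch), the transmitted envelope is reduced to a single target energy for the whole high band, and the function names are hypothetical. Real spectral band replication works on QMF subbands with per-band envelopes and further parameters.

```python
import math


def patch(low_band):
    """Non-harmonic patch: reuse the low-band values as raw high band."""
    return list(low_band)


def adjust(high_band, target_energy):
    """Scale the patched band so that its energy matches the target."""
    energy = sum(v * v for v in high_band)
    if energy == 0.0:
        return list(high_band)  # nothing to scale
    gain = math.sqrt(target_energy / energy)
    return [gain * v for v in high_band]


def combine(low_band, high_band):
    """Place the adjusted high band above the decoded low band."""
    return list(low_band) + list(high_band)


decoded_low = [3.0, 4.0]                      # energy 25
raw_high = patch(decoded_low)
shaped_high = adjust(raw_high, target_energy=100.0)
full_band = combine(decoded_low, shaped_high)
```

The point of the split is that the patcher needs no side information at all, while the adjuster is the only stage that consumes the transmitted bandwidth extension parameters.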

As indicated in Figs. 8 and 9, the described blocks may have a mode control input in a preferred embodiment. This mode control input is derived from the output signal of the decision stage 300. In such a preferred embodiment, a characteristic of a corresponding block may be adapted to the decision stage output, i.e., to whether a decision for speech or a decision for music is made for a certain time portion of the audio signal. Preferably, the mode control relates only to one or more of the functionalities of these blocks, not to all of them. For example, the decision may influence only the patcher 701a but not the other blocks in Fig. 9, or may influence only the joint multi-channel parameter analyzer 101b in Fig. 8 but not the other blocks in Fig. 8. This implementation is preferable in that a more flexible, higher-quality and lower bit rate output signal is obtained by providing flexibility in the common preprocessing stage. On the other hand, the use of algorithms in the common preprocessing stage for both kinds of signals allows an efficient encoding/decoding scheme to be implemented.

Figs. 10a and 10b illustrate two different implementations of the decision stage 300. Fig. 10a indicates an open-loop decision. Here, the signal analyzer 300a in the decision stage has certain rules for deciding whether a certain time portion or a certain frequency portion of the input signal has a characteristic that requires this signal portion to be encoded by the first encoding branch 400 or by the second encoding branch 500. To this end, the signal analyzer 300a may analyze the audio input signal into the common preprocessing stage, or the audio signal output by the common preprocessing stage, i.e., the audio intermediate signal, or an intermediate signal within the common preprocessing stage, such as the output of the downmixer, which may be a mono signal or a signal having K channels as indicated in Fig. 8. On the output side, the signal analyzer 300a generates the switching decision for controlling the switch 200 on the encoder side and the corresponding switch 600 or the combiner 600 on the decoder side.

Although not discussed in detail for the second switch 521, it is emphasized that the second switch 521 can be positioned in a similar way as the first switch 200, as discussed in connection with Figs. 4a and 4b. Thus, an alternative position of the switch 521 in Fig. 3c is at the output of both processing branches 522, 523, 524, so that both processing branches operate in parallel and only the output of one processing branch is written into a bit stream via a bit stream former, which is not illustrated in Fig. 3c.

Furthermore, the second combiner 600 may have a specific cross-fading functionality as discussed in connection with Fig. 4c. Alternatively or additionally, the first combiner 532 may have the same cross-fading functionality. Moreover, both combiners may have the same cross-fading functionality, may have different cross-fading functionalities, or may have no cross-fading functionality at all, so that both combiners are switches without any additional cross-fading functionality.

As discussed before, both switches can be controlled via an open-loop decision or a closed-loop decision as discussed in connection with Figs. 10a and 10b, where the controllers 300, 525 of Fig. 3c can have different or the same functionalities for both switches.

Furthermore, a signal-adaptive time warping functionality can exist not only in the first encoding branch or the first decoding branch, but also in the second processing branch of the second coding branch, on the encoder side as well as on the decoder side. Depending on the processed signal, both time warping functionalities can use the same time warping information, so that the same time warp is applied to the signals in the first domain and in the second domain. This saves processing load and may be useful in some instances, namely where subsequent blocks have a similar time warping characteristic. In alternative embodiments, however, it is preferred to have independent time warp estimators for the first coding branch and for the second processing branch in the second coding branch.

The inventive encoded audio signal can be stored on a digital storage medium or transmitted on a transmission medium, such as a wireless transmission medium or a wired transmission medium such as the Internet.

In a different embodiment, the switch 200 of Fig. 1a or 2a switches between the two encoding branches 400, 500. In a further embodiment, there can be additional encoding branches, such as a third encoding branch or even a fourth encoding branch or even more encoding branches. On the decoder side, the switch 600 of Fig. 1b or 2b switches between the two decoding branches 431, 440 and 531, 532, 533, 534, 540. In a further embodiment, there can be additional decoding branches, such as a third decoding branch or even a fourth decoding branch or even more decoding branches. Similarly, the other switches 521 or 532 may switch between more than two different coding algorithms when such additional encoding/decoding branches are provided.

Fig. 12A illustrates a preferred embodiment of an encoder implementation, and Fig. 12B illustrates a preferred embodiment of the corresponding decoder implementation. In addition to the elements discussed before under the corresponding reference numbers, the embodiment of Fig. 12A illustrates a separate perceptual module 1200 and, additionally, a preferred implementation of the further encoder tools illustrated at block 421 of Fig. 11A. These additional tools are a temporal noise shaping (TNS) tool 1201 and a mid/side (M/S) coding tool 1202. Furthermore, additional functionalities of the elements 421 and 524 are illustrated at block 421/542 as a combined implementation of scaling, noise filling analysis, quantization and arithmetic coding of spectral values.

In the corresponding decoder implementation of Fig. 12B, additional elements are illustrated, namely an M/S decoding tool 1203 and a TNS decoder tool 1204. Furthermore, a bass postfilter not illustrated in the preceding figures is indicated at 1205. The transition windowing block 532 corresponds to the element 532 in Fig. 2B, which is illustrated as a switch but performs a kind of cross-fading, which can be either an oversampled cross-fading or a critically sampled cross-fading. The latter is implemented as an MDCT operation in which two time-aliased portions are overlapped and added. This critically sampled transition processing is preferably used where appropriate, since the overall bit rate can be reduced without any loss in quality. The additional transition windowing block 600 corresponds to the combiner 600 in Fig. 2B, which is again illustrated as a switch. It is clear, however, that this element performs a kind of cross-fading, either critically sampled or non-critically sampled, in order to avoid blocking artifacts, and specifically switching artifacts, when one block has been processed in the first branch and the other block has been processed in the second branch. When the processing in both branches matches perfectly, however, the cross-fading operation can "degrade" to a hard switch, whereas a cross-fading operation is understood to be a "soft" switching between both branches.
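A non-critically sampled cross-fade of the kind mentioned above can be sketched in Python as a pair of complementary linear ramps over an overlap region. This is only an illustration: the linear ramp shape, the function name `crossfade` and the assumption that both branches deliver the same number of overlapping samples are choices made for this example. The critically sampled MDCT variant, which relies on time-domain aliasing cancellation, is not shown.

```python
def crossfade(fade_out, fade_in):
    """Blend two equally long overlapping segments with linear ramps.

    The weight of fade_in rises from 0 to 1 across the overlap, the
    weight of fade_out falls from 1 to 0, and the two always sum to 1.
    """
    n_samples = len(fade_out)
    blended = []
    for n in range(n_samples):
        w = n / (n_samples - 1) if n_samples > 1 else 1.0
        blended.append((1.0 - w) * fade_out[n] + w * fade_in[n])
    return blended


# Overlap between the end of a block from the first branch and the
# start of a block from the second branch.
transition = crossfade([2.0] * 5, [4.0] * 5)

# If both branches deliver identical samples, the fade changes nothing,
# which is the case in which a hard switch would do equally well.
unchanged = crossfade([3.0, 3.0, 3.0], [3.0, 3.0, 3.0])
```

Because the two weights sum to one at every sample, the blend never changes the level of content that both branches agree on. It only smooths over the differences between them.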

The concept in Figs. 12A and 12B permits coding of signals having an arbitrary mix of speech and audio content, and this concept performs comparably to or better than the best coding technology that might be tailored specifically to either speech or general audio content. The general structure of the encoder and decoder can be described as follows. There is a common pre-/post-processing consisting of an MPEG Surround (MPEGS) functional unit, which handles stereo or multi-channel processing, and an enhanced SBR (eSBR) unit, which handles the parametric representation of the higher audio frequencies in the input signal. Then there are two branches, one consisting of a modified Advanced Audio Coding (AAC) tool path and the other consisting of a linear prediction coding (LP or LPC domain) based path, which in turn features either a frequency-domain representation or a time-domain representation of the LPC residual. All transmitted spectra for both AAC and LPC are represented in the MDCT domain following quantization and arithmetic coding. The time-domain representation uses an ACELP excitation coding scheme. The basic structure is shown in Fig. 12A for the encoder and in Fig. 12B for the decoder. The data flow in these diagrams is from left to right and from top to bottom. The function of the decoder is to find the description of the quantized audio spectra or time-domain representation in the bit stream payload and to decode the quantized values and other reconstruction information.

In the case of transmitted spectral information, the decoder reconstructs the quantized spectra, processes the reconstructed spectra through whatever tools are active in the bitstream payload in order to arrive at the actual signal spectra described by the input bitstream payload, and finally converts the frequency-domain spectra to the time domain. Following the initial reconstruction and scaling of the spectrum reconstruction, there are optional tools that modify one or more of the spectra in order to provide more efficient coding.

In the case of a transmitted time-domain signal representation, the decoder reconstructs the quantized time signal and processes it through whatever tools are active in the bitstream payload in order to arrive at the actual time-domain signal described by the input bitstream payload.

For each of the tools that operate on the signal data, the option to "pass through" is retained, and in all cases where the processing is omitted, the spectra or time samples at the tool's input are passed directly through the tool without modification.

Where the bitstream changes its signal representation from the time domain to the frequency domain, or from the LP domain to the non-LP domain, or vice versa, the decoder facilitates the transition from one domain to the other by means of an appropriate transition overlap-add windowing.

After the transition handling, eSBR and MPEGS processing is applied in the same manner to both coding paths.

The input to the bitstream payload demultiplexer tool is a bitstream payload. The demultiplexer separates the bitstream payload into parts for each tool and provides each tool with the bitstream payload information related to that tool.

The outputs of the bitstream payload demultiplexer tool are:

● Depending on the core coding type in the current frame, either:

  ● the quantized and noiselessly coded spectra, represented by:

    ● scale factor information

    ● arithmetically coded spectral lines

  ● or: linear prediction (LP) parameters together with an excitation signal represented by one of the following:

    ● quantized and arithmetically coded spectral lines (transform coded excitation, TCX), or

    ● ACELP coded time-domain excitation

● The spectral noise filling information (optional)

● The M/S decision information (optional)

● The temporal noise shaping (TNS) information (optional)

● The filterbank control information

● The time unwarping (TW) control information (optional)

● The enhanced spectral band replication (eSBR) control information

● The MPEG Surround (MPEGS) control information

The scale factor noiseless decoding tool takes information from the bitstream payload demultiplexer, parses that information, and decodes the Huffman and DPCM coded scale factors.

The input to the scale factor noiseless decoding tool is:

● The scale factor information for the noiselessly coded spectra

The output of the scale factor noiseless decoding tool is:

● The decoded integer representation of the scale factors

The spectral noiseless decoding tool takes information from the bitstream payload demultiplexer, parses that information, decodes the arithmetically coded data, and reconstructs the quantized spectra. The input to this noiseless decoding tool is:

● The noiselessly coded spectra

The output of this noiseless decoding tool is:

● The quantized values of the spectra

The inverse quantizer tool takes the quantized values for the spectra and converts the integer values to the non-scaled, reconstructed spectra. This quantizer is a companding quantizer whose companding factor depends on the chosen core coding mode.

The input to the inverse quantizer tool is:

● The quantized values for the spectra

The output of the inverse quantizer tool is:

● The un-scaled, inversely quantized spectra

The noise filling tool is used to fill spectral gaps in the decoded spectra, which occur when spectral values are quantized to zero, e.g. due to a strong restriction on bit demand in the encoder. The use of the noise filling tool is optional.

The inputs to the noise filling tool are:

● The un-scaled, inversely quantized spectra

● Noise filling parameters

● The decoded integer representation of the scale factors

The outputs of the noise filling tool are:

● The un-scaled, inversely quantized spectral values for spectral lines that were previously quantized to zero

● Modified integer representation of the scale factors

The rescaling tool converts the integer representation of the scale factors to the actual values, and multiplies the un-scaled, inversely quantized spectra by the relevant scale factors.

The inputs to the scale factor tool are:

● The decoded integer representation of the scale factors

The output of the scale factor tool is:

For an overview of the M/S tool, refer to ISO/IEC 14496-3, subclause 4.1.1.2.

For an overview of the temporal noise shaping (TNS) tool, refer to ISO/IEC 14496-3, subclause 4.1.1.2.

The filterbank / block switching tool applies the inverse of the frequency mapping that was carried out in the encoder. An inverse modified discrete cosine transform (IMDCT) is used for the filterbank tool. The IMDCT can be configured to support 120, 128, 240, 256, 320, 480, 512, 576, 960, 1024 or 1152 spectral coefficients.

The inputs to the filterbank tool are:

● The (inversely quantized) spectra

● The filterbank control information

The output of the filterbank tool is:

● The time-domain reconstructed audio signal(s)

The time-warped filterbank / block switching tool replaces the normal filterbank / block switching tool when the time warping mode is enabled. The filterbank itself is the same (IMDCT) as the normal filterbank; additionally, the windowed time-domain samples are mapped from the warped time domain to the linear time domain by time-varying resampling.

The inputs to the time-warped filterbank tool are:

● The inversely quantized spectra

● The filterbank control information

● The time-warping control information

The output of the filterbank tool is:

● The linear time-domain reconstructed audio signal(s)

The enhanced SBR (eSBR) tool regenerates the high band of the audio signal. It is based on replication of the sequences of harmonics that were truncated during encoding. It adjusts the spectral envelope of the generated high band, applies inverse filtering, and adds noise and sinusoidal components in order to recreate the spectral characteristics of the original signal.

The inputs to the eSBR tool are:

● The quantized envelope data

● Miscellaneous control data

● A time-domain signal from the AAC core decoder

The output of the eSBR tool is either:

● a time-domain signal, or

● a QMF-domain representation of a signal, e.g. when the MPEG Surround tool is used.

The MPEG Surround (MPEGS) tool produces multiple signals from one or more input signals by applying a sophisticated upmix procedure to the input signal(s), controlled by appropriate spatial parameters. In the USAC context, MPEGS is used for coding a multi-channel signal by transmitting parametric side information alongside a transmitted downmix signal.

The input to the MPEGS tool is either:

● a downmixed time-domain signal, or

● a QMF-domain representation of a downmixed signal from the eSBR tool

The output of the MPEGS tool is:

● A multi-channel time-domain signal

The signal classifier tool analyzes the original input signal and generates from it control information that triggers the selection of the different coding modes. The analysis of the input signal is implementation dependent and attempts to choose the optimal core coding mode for a given input signal frame. The output of the signal classifier can (optionally) also be used to influence the behavior of other tools, for example MPEG Surround, enhanced SBR, the time-warped filterbank and others.

The inputs to the signal classifier tool are:

● The original unmodified input signal

● Additional implementation-dependent parameters

The output of the signal classifier tool is:

● A control signal that controls the selection of the core codec (non-LP filtered frequency-domain coding, LP filtered frequency-domain coding, or LP filtered time-domain coding)

In accordance with the present invention, the time/frequency resolution in block 410 of Fig. 12A and in the converter 523 of Fig. 12A is controlled depending on the audio signal. The interrelation between window length, transform length, time resolution and frequency resolution is illustrated in Fig. 13A, where it becomes clear that, for a long window length, the time resolution becomes low but the frequency resolution becomes high, whereas for a short window length, the time resolution is high but the frequency resolution is low.
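The trade-off described above can be made concrete with a little arithmetic. The following Python sketch is illustrative only: it assumes a 50 percent overlap (transform length equal to half the window length) and a 48 kHz sampling rate, neither of which is fixed by the paragraph above, and the function name `resolution` is hypothetical.

```python
def resolution(window_length: int, sample_rate_hz: float) -> tuple[float, float]:
    """Return (time span in ms, spacing of spectral coefficients in Hz)."""
    # Assumption: 50 percent overlap, so the transform yields window_length / 2 coefficients.
    transform_length = window_length // 2
    # A longer window spans more time, i.e. a coarser time resolution.
    time_span_ms = 1000.0 * window_length / sample_rate_hz
    # The coefficients share the band 0 .. fs/2, so more coefficients mean finer spacing.
    bin_spacing_hz = (sample_rate_hz / 2.0) / transform_length
    return time_span_ms, bin_spacing_hz


# Assumed sampling rate of 48 kHz, applied to the three window sizes named in the annex.
for n in (2304, 2048, 256):
    span, spacing = resolution(n, 48000.0)
    print(f"window {n:4d}: {span:6.2f} ms, {spacing:7.2f} Hz per coefficient")
```

Under these assumptions the 256-sample window spans one eighth of the time covered by the 2048-sample window, but its coefficients are spaced eight times further apart.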

In the first coding branch, which is preferably the AAC coding branch indicated by elements 410, 1201, 1202, 4021 of Fig. 12A, different windows can be used, where the window shape is determined by a signal analyzer that is implemented in the signal classifier block 300 but can also be a separate module. The encoder selects one of the windows illustrated in Fig. 13B, which have different time/frequency resolutions. The time/frequency resolution of the first long window, the second window, the fourth window, the fifth window and the sixth window is equal to 2048 sampling values, for a transform length of 1024. The short window illustrated in the third line of Fig. 13B has a time resolution of 256 sampling values, corresponding to its window size. This corresponds to a transform length of 128.

Similarly, the last two windows have a window length of 2304, which gives a better frequency resolution but a lower time resolution than the window in the first line. The transform length of the windows in the last two lines is equal to 1152.

In the first coding branch, different window sequences can be constructed from the transform windows in Fig. 13B. Although only one short sequence is illustrated in Fig. 13C, while the other "sequences" consist of a single window only, larger sequences consisting of more windows can also be constructed. Note that, according to Fig. 13B, for the smaller number of coefficients, i.e. 960 instead of 1024, the time resolution is also lower than for the corresponding higher number of coefficients, such as 1024.

Figs. 14A to 14G illustrate different resolutions/window sizes in the second coding branch. In a preferred embodiment of the present invention, the second coding branch has a first processing branch, which is an ACELP time-domain coder 526, and a second processing branch comprising the filterbank 523. In this branch, a superframe of, for example, 2048 samples is subdivided into frames of 256 samples. Individual frames of 256 samples can be used separately, so that a sequence of four windows, each window covering two frames, can be applied when an MDCT with 50 percent overlap is used. A high time resolution is then used, as illustrated in Fig. 14D. Alternatively, when the signal allows longer windows, the sequence in Fig. 14C can be applied, in which a doubled window size of 1024 samples per window (medium windows) is used, so that one window covers four frames and there is a 50 percent overlap.

Finally, when the signal is such that a long window can be used, this long window extends over 4096 samples, again with a 50 percent overlap.

In the preferred embodiment, in which there are two branches, one of which has an ACELP coder, the position of the ACELP frame indicated by "A" in the superframe may also determine the window size applied to two adjacent TCX frames indicated by "T" in Fig. 14E. Basically, one is interested in using long windows whenever possible. Nevertheless, short windows have to be applied when a single T frame lies between two A frames. Medium windows are applied when there are two adjacent T frames. When there are three adjacent T frames, however, a correspondingly larger window might not be efficient because of the additional complexity. Therefore, the third T frame, although not preceded by an A frame, can be processed by a short window. When the whole superframe has only T frames, a long window is applied.
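The selection rule in the preceding paragraph can be sketched as a small lookup over the superframe pattern. This is a reading of the prose only, not of Fig. 14E: the function name `tcx_windows`, the labels, and the choice that the first two frames of a run of three T frames share the medium window are assumptions, and any alignment constraints on medium windows are not modeled.

```python
def tcx_windows(frames: str) -> list[str]:
    """Map a superframe pattern such as "ATTA" to one window label per frame."""
    if len(frames) != 4 or set(frames) - {"A", "T"}:
        raise ValueError("expected four characters taken from 'A' and 'T'")
    if frames == "TTTT":
        # Whole superframe is TCX: one long window.
        return ["long"] * 4
    labels: list[str] = []
    i = 0
    while i < 4:
        if frames[i] == "A":
            labels.append("acelp")
            i += 1
            continue
        # Measure the run of adjacent T frames starting at position i.
        run = 0
        while i + run < 4 and frames[i + run] == "T":
            run += 1
        if run == 1:
            labels += ["short"]
        elif run == 2:
            labels += ["medium", "medium"]
        else:
            # Run of three: medium for two frames, short for the third (assumed order).
            labels += ["medium", "medium", "short"]
        i += run
    return labels


print(tcx_windows("ATTA"))
print(tcx_windows("TTTA"))
```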

Fig. 14F illustrates several alternatives for windows, where the window size is always twice the number lg of spectral coefficients, owing to a preferred 50 percent overlap. However, other overlap percentages can be applied for all coding branches, so that the relation between window size and transform length can also differ from two and even approach one when no time-domain aliasing is applied.

Fig. 14G illustrates the rules for constructing a window based on the rules given in Fig. 14F. The value ZL indicates the zeros at the beginning of the window. The value L indicates the number of window coefficients in an aliasing zone. The values in portion M are "1" values that do not introduce any aliasing, since portion M overlaps with the zero-valued portion of an adjacent window. Portion M is followed by a right overlap zone R, which is followed by a zone ZR of zeros, which would correspond to a portion M of a subsequent window.
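The five-zone layout described above (ZL, L, M, R, ZR) can be sketched as a simple concatenation. The sine-shaped slopes and the function name `build_window` are assumptions made for illustration; the paragraph itself only fixes the order and meaning of the zones.

```python
import math


def build_window(zl: int, left: int, m: int, right: int, zr: int) -> list[float]:
    """Concatenate ZL zeros, a rising slope, M ones, a falling slope and ZR zeros."""
    # Assumed slope shape: quarter-period sine rise and cosine fall.
    rise = [math.sin(math.pi * (n + 0.5) / (2 * left)) for n in range(left)]
    fall = [math.cos(math.pi * (n + 0.5) / (2 * right)) for n in range(right)]
    return [0.0] * zl + rise + [1.0] * m + fall + [0.0] * zr


# Hypothetical zone sizes chosen only so that the total is 512 coefficients.
w = build_window(zl=64, left=128, m=128, right=128, zr=64)
print(len(w))
```

With these slopes, a rising zone and the falling zone of a neighboring window that overlaps it satisfy the power-complementarity condition (squares summing to one), which is what allows the time-domain aliasing in zones L and R to cancel.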

Reference is made to the annex attached below, which describes a preferred and detailed implementation of an inventive audio encoding/decoding scheme, particularly with respect to the decoder side.

Annex

1. Windows and window sequences

Quantization and coding are done in the frequency domain. For this purpose, the time signal is mapped into the frequency domain in the encoder. The decoder performs the inverse mapping as described in subclause 2. Depending on the signal, the coder may change the time/frequency resolution by using three different window sizes: 2304, 2048 and 256. To switch between windows, the transition windows LONG_START_WINDOW, LONG_STOP_WINDOW, START_WINDOW_LPD, STOP_WINDOW_1152, STOP_START_WINDOW and STOP_START_WINDOW_1152 are used. Table 5.11 lists the windows, specifies the corresponding transform lengths and shows the shapes of the windows schematically. Three transform lengths are used: 1152, 1024 (or 960) (referred to as long transforms) and 128 (or 120) coefficients (referred to as short transforms).

Window sequences are composed of windows in such a way that a raw_data_block always contains data representing 1024 (or 960) output samples. The data element window_sequence indicates the window sequence that is actually used. Fig. 13C lists how the window sequences are composed of individual windows. Refer to subclause 2 for more detailed information about the transform and the windows.

1.2 Scale factor bands and grouping

See ISO/IEC 14496-3, subpart 4, subclause 4.5.2.3.4.

As explained in ISO/IEC 14496-3, subpart 4, subclause 4.5.2.3.4, the width of the scale factor bands is modeled on the critical bands of the human auditory system. For that reason, the number of scale factor bands in a spectrum and their widths depend on the transform length and the sampling frequency. Tables 4.110 to 4.128 in ISO/IEC 14496-3, subpart 4, section 4.5.4 list the offset to the beginning of each scale factor band for the transform lengths 1024 (960) and 128 (120) and for the sampling frequencies. The tables originally designed for LONG_WINDOW, LONG_START_WINDOW and LONG_STOP_WINDOW are also used for START_WINDOW_LPD and STOP_START_WINDOW. Tables 4 to 10 are the offset tables for STOP_WINDOW_1152 and STOP_START_WINDOW_1152.

1.3 Decoding of lpd_channel_stream()

The lpd_channel_stream() bitstream element contains all the information necessary to decode one frame of a "linear prediction domain" coded signal. It contains the payload for one frame of a signal coded in the LPC domain, i.e. including an LPC filtering step. The residual of this filter (the so-called "excitation") is then represented either with the help of an ACELP module or in the MDCT transform domain ("transform coded excitation", TCX). To allow close adaptation to the signal characteristics, one frame is divided into four smaller units of equal size, each of which is coded with either the ACELP or the TCX coding scheme.

This process is similar to the coding scheme described in 3GPP TS 26.290. Inherited from that document is a slightly different terminology, in which one "superframe" denotes a signal segment of 1024 samples, whereas a "frame" is exactly one fourth of that, i.e. 256 samples. Each of these frames is further subdivided into four "subframes" of equal length. Note that this subclause adopts this terminology.

1.4 Definitions, data elements

acelp_core_mode  In case ACELP is used as an lpd coding mode, this bit field indicates the exact bit allocation scheme.

lpd_mode  This bit field defines the coding modes for each of the four frames within one superframe of the lpd_channel_stream() (corresponding to one AAC frame). The coding modes are stored in the array mod[] and take values from 0 to 3. The mapping from lpd_mode to mod[] can be determined from Table 1 below.

mod[0..3]  The values in the array mod[] indicate the respective coding mode in each frame:

acelp_coding()  Syntax element that contains all data needed to decode one frame of ACELP excitation.

tcx_coding()  Syntax element that contains all data needed to decode one frame of MDCT-based transform coded excitation (TCX).

first_tcx_flag  Flag that indicates whether the TCX frame currently being processed is the first in the superframe.

lpc_data()  Syntax element that contains all data needed to decode all LPC filter parameter sets required to decode the current frame.

first_lpd_flag  Flag that indicates whether the current frame is the first of a sequence of superframes coded in the LPC domain. This flag can also be determined from the history of the bitstream element core_mode (core_mode0 and core_mode1 in the case of a channel_pair_element) according to Table 3.

last_lpd_mode  Indicates the lpd_mode of the previously decoded frame.

1.5 Decoding process

The order of decoding in the lpd_channel_stream is:

Get acelp_core_mode

Get lpd_mode and determine from it the content of the helper variable mod[]

Get acelp_coding or tcx_coding data, depending on the content of the helper variable mod[]

Get lpc_data

1.6 ACELP/TCX coding mode combinations

Analogous to [8], section 5.2.2, there are 26 allowed combinations of ACELP or TCX within one superframe of an lpd_channel_stream payload. Each of these 26 mode combinations is signaled in the bitstream element lpd_mode. The mapping of lpd_mode to the actual coding modes of each frame in a superframe is shown in Table 1 and Table 2.
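The count of 26 can be reproduced by enumeration under one reading of the mode values. The following Python sketch assumes, in line with the scheme of 3GPP TS 26.290, that mod = 0 denotes ACELP, mod = 1 a TCX covering one frame, mod = 2 a TCX covering two frames aligned to one half of the superframe, and mod = 3 a TCX covering all four frames. These meanings are not spelled out in this section, and the order in which the combinations are generated here is not the lpd_mode numbering of Table 1.

```python
from itertools import product


def allowed_mode_combinations() -> list[tuple[int, int, int, int]]:
    """Enumerate mod[0..3] patterns under the assumed mode meanings."""
    # Each half of the superframe is either two single-frame modes (0 or 1 each)
    # or one two-frame TCX, written as (2, 2).
    half_options = [(a, b) for a, b in product((0, 1), repeat=2)] + [(2, 2)]
    combos = [first + second for first, second in product(half_options, repeat=2)]
    # One TCX covering the whole superframe.
    combos.append((3, 3, 3, 3))
    return combos


print(len(allowed_mode_combinations()))
```

Five options per half give 25 patterns, and the single whole-superframe TCX brings the total to 26.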

1.7 Scale factor band table references

For all other scale factor band tables, refer to ISO/IEC 14496-3, subpart 4, section 4.5.4, Tables 4.129 to 4.147.

1.8 Quantization

For quantization of the AAC spectral coefficients in the encoder, a non-uniform quantizer is used. The decoder must therefore perform the inverse non-uniform quantization after the Huffman decoding of the scale factors (see subclause 6.3) and the noiseless decoding of the spectral data (see subclause 6.1).

For quantization of the TCX spectral coefficients, a uniform quantizer is used. No inverse quantization is needed at the decoder after the noiseless decoding of the spectral data.
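The difference between the two cases can be sketched in a few lines. The 4/3 power-law expansion used here is the one familiar from AAC and is an assumption, since this section does not state the exponent; scale factor application is deliberately left out, and both function names are hypothetical.

```python
import math


def inverse_quantize_aac(q: int) -> float:
    """Non-uniform expansion, assumed to be sign(q) * |q|^(4/3) as in AAC."""
    return math.copysign(abs(q) ** (4.0 / 3.0), q)


def inverse_quantize_tcx(q: int) -> float:
    """Uniform quantizer: the decoded integer is used as is."""
    return float(q)


for q in (-8, -1, 0, 1, 8):
    print(q, inverse_quantize_aac(q), inverse_quantize_tcx(q))
```

The expansion leaves small values almost unchanged and stretches large ones, which is the decoder-side counterpart of a quantizer whose step size grows with amplitude.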

2. Filterbank and block switching

2.1 Tool description

The time/frequency representation of the signal is mapped onto the time domain by feeding it into the filterbank module. This module consists of an inverse modified discrete cosine transform (IMDCT), and a window and an overlap-add function. To adapt the time/frequency resolution of the filterbank to the characteristics of the input signal, a block switching tool is also used. N denotes the window length, where N is a function of window_sequence (see subclause 1.1). For each channel, the N/2 time-frequency values are transformed into the N time-domain values x_i,n via the IMDCT. After applying the window function, for each channel, the first half of the z_i,n sequence is added to the second half of the previous block's windowed sequence z_(i-1),n to reconstruct the output samples out_i,n for each channel.
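The overlap-add step at the end of this paragraph can be sketched as follows. The sketch assumes that the previous and the current block have the same window length N, i.e. no block switching occurs between them; the function name `overlap_add` is hypothetical.

```python
def overlap_add(prev_windowed: list[float], curr_windowed: list[float]) -> list[float]:
    """Return N/2 output samples from two consecutive windowed IMDCT outputs."""
    n = len(curr_windowed)
    if len(prev_windowed) != n or n % 2:
        raise ValueError("both blocks must have the same even length N")
    half = n // 2
    # out_i,n = z_i,n + z_(i-1),n+N/2 for 0 <= n < N/2
    return [curr_windowed[k] + prev_windowed[half + k] for k in range(half)]


print(overlap_add([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```

Each call yields N/2 output samples, which is why a frame of N/2 spectral coefficients corresponds to N/2 new time-domain samples even though the IMDCT produces N values.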

2.2 Definitions

window_sequence  2 bits indicating which window sequence (i.e. block size) is used.

window_shape  1 bit indicating which window function is selected.

Fig. 13C shows the eight window_sequences (ONLY_LONG_SEQUENCE, LONG_START_SEQUENCE, EIGHT_SHORT_SEQUENCE, LONG_STOP_SEQUENCE, STOP_START_SEQUENCE, STOP_1152_SEQUENCE, LPD_START_SEQUENCE, STOP_START_1152_SEQUENCE).

In the following, LPD_SEQUENCE refers to all allowed window/coding mode combinations inside the so-called linear prediction domain codec (see section 1.3). In the context of decoding a frequency-domain coded frame, it is only important to know whether a following frame is coded with the LP domain coding modes, which is represented by an LPD_SEQUENCE. The exact structure within the LPD_SEQUENCE, however, is taken care of when decoding the LP domain coded frame.

2.3 Decoding process

2.3.1 IMDCT

The analytical expression of the IMDCT is:

where:

n = sample index

i = window index

k = spectral coefficient index

N = window length based on the window_sequence value

n₀ = (N/2 + 1)/2
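With the symbols defined above, the IMDCT of the AAC filterbank in ISO/IEC 14496-3 has the form x_i,n = (2/N) · Σ_{k=0}^{N/2-1} spec[i][k] · cos((2π/N)(n + n₀)(k + 1/2)) for 0 ≤ n < N. That form is assumed in the following Python sketch, since the expression itself is not reproduced in the text. The forward transform `mdct` with its factor of 2, the sine window and the helper names are likewise assumptions, included only to check that windowing plus overlap-add cancels the time-domain aliasing.

```python
import math


def _basis(n: int, k: int, n_len: int) -> float:
    """Cosine kernel with phase offset n0 = (N/2 + 1) / 2."""
    n0 = (n_len / 2 + 1) / 2
    return math.cos(2.0 * math.pi / n_len * (n + n0) * (k + 0.5))


def imdct(spec: list[float]) -> list[float]:
    """Map N/2 spectral coefficients to N time-aliased samples."""
    n_len = 2 * len(spec)
    return [(2.0 / n_len) * sum(c * _basis(n, k, n_len) for k, c in enumerate(spec))
            for n in range(n_len)]


def mdct(block: list[float]) -> list[float]:
    """Assumed encoder-side counterpart: N windowed samples to N/2 coefficients."""
    n_len = len(block)
    return [2.0 * sum(x * _basis(n, k, n_len) for n, x in enumerate(block))
            for k in range(n_len // 2)]


def sine_window(n_len: int) -> list[float]:
    return [math.sin(math.pi / n_len * (n + 0.5)) for n in range(n_len)]


def round_trip(signal: list[float], n_len: int) -> list[float]:
    """Analyze and resynthesize with 50 percent overlap; returns signal[N/2:...]."""
    half, w = n_len // 2, sine_window(n_len)
    prev, out = None, []
    for start in range(0, len(signal) - n_len + 1, half):
        block = [w[n] * signal[start + n] for n in range(n_len)]
        z = [w[n] * x for n, x in enumerate(imdct(mdct(block)))]
        if prev is not None:
            # First half of the current block plus second half of the previous one.
            out.extend(z[n] + prev[half + n] for n in range(half))
        prev = z
    return out
```

A single IMDCT output block is not the original signal; only the sum of two overlapping windowed blocks is, which is why the first N/2 samples of a stream cannot be reconstructed from the first block alone.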

The window length N used for the inverse transform is a function of the syntax element window_sequence and the algorithmic context. It is defined as follows:

Window length 2304:

Window length 2048:

The meaningful block transitions are as follows:

2.3.2 Windowing and block switching

Depending on the window_sequence and window_shape elements, different transform windows are used. A combination of the window halves described below offers all possible window_sequences.

For window_shape == 1, the window coefficients are given by the Kaiser-Bessel derived (KBD) window as follows:

where:

w', the Kaiser-Bessel kernel window function (see also [5]), is defined as follows:

Otherwise, for window_shape == 0, a sine window is employed as follows:

The window length N can be 2048 (1920) or 256 (240) for the KBD and the sine window. In the case of STOP_1152_SEQUENCE and STOP_START_1152_SEQUENCE, N can still be 2048 or 256; the window slopes are similar, but the flat-top regions are longer.
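Both window shapes can be sketched in plain Python. The construction assumed here is the usual one from AAC: the left half of the KBD window is the square root of the normalized running sum of a Kaiser-Bessel kernel of length N/2 + 1, and the right half is its mirror image. The kernel parameter values (alpha = 4 for N = 2048, alpha = 6 for N = 256) are also taken from AAC as an assumption, since this section does not reproduce them; all function names are hypothetical.

```python
import math


def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kbd_kernel(n_len: int, alpha: float) -> list[float]:
    """Kaiser-Bessel kernel w'(p, alpha) for 0 <= p <= N/2."""
    quarter = n_len / 4
    denom = bessel_i0(math.pi * alpha)
    return [
        bessel_i0(math.pi * alpha
                  * math.sqrt(max(0.0, 1.0 - ((p - quarter) / quarter) ** 2))) / denom
        for p in range(n_len // 2 + 1)
    ]


def kbd_window(n_len: int, alpha: float) -> list[float]:
    """KBD window of length N: sqrt of the normalized cumulative kernel sum."""
    kernel = kbd_kernel(n_len, alpha)
    total = sum(kernel)
    left, acc = [], 0.0
    for n in range(n_len // 2):
        acc += kernel[n]
        left.append(math.sqrt(acc / total))
    return left + left[::-1]


def sine_window(n_len: int) -> list[float]:
    """Sine window of length N."""
    return [math.sin(math.pi / n_len * (n + 0.5)) for n in range(n_len)]


short_kbd = kbd_window(256, 6.0)   # assumed alpha for short windows
long_sine = sine_window(2048)
```

Both shapes are power complementary (the squares of the two halves sum to one at each overlap position), which is the property that makes either usable with the overlap-add reconstruction.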

Only in the case of LPD_START_SEQUENCE is the right part of the window a sine window of 64 samples.

How to obtain the possible window sequences is explained in parts a) to h) of this subclause.

For all kinds of window_sequences, the window_shape of the left half of the first transform window is determined by the window shape of the previous block. The following formula expresses this fact:

where:

window_shape_previous_block: window_shape of the previous block (i-1). For the first raw_data_block() to be decoded, the window_shape of the left and right halves of the window is identical.

a) ONLY_LONG_SEQUENCE:

The window_sequence == ONLY_LONG_SEQUENCE is equal to one LONG_WINDOW with a total window length N_l of 2048 (1920).

For window_shape == 1, the window for ONLY_LONG_SEQUENCE is given as follows:

If window_shape == 0, the window for ONLY_LONG_SEQUENCE can be described as follows:

After windowing, the time-domain values (z_i,n) can be expressed as:

z_i,n = w(n) · x_i,n

b) LONG_START_SEQUENCE:

The LONG_START_SEQUENCE is needed to obtain a correct overlap and add for a block transition from an ONLY_LONG_SEQUENCE to an EIGHT_SHORT_SEQUENCE.

The window lengths N_l and N_s are set to 2048 (1920) and 256 (240), respectively.

If window_shape == 1, the window for LONG_START_SEQUENCE is given as follows:

If window_shape == 0, the window for LONG_START_SEQUENCE looks like:

The windowed time-domain values can be calculated with the formula given in a).

c) EIGHT_SHORT:

The window_sequence == EIGHT_SHORT comprises eight overlapped and added SHORT_WINDOWs, each with a length N_s of 256 (240). The total length of the window_sequence, including leading and trailing zeros, is 2048 (1920). Each of the eight short blocks is first windowed separately. The short block number is indexed with the variable j = 0, ..., M-1 (M = N_l / N_s).

The window_shape of the previous block influences only the first of the eight short blocks (W_0(n)). If window_shape == 1, the window functions can be given as follows:

Otherwise, if window_shape == 0, the window functions can be described as:

The overlap and add between the windows of the EIGHT_SHORT window_sequence, which results in the windowed time domain values z_{i,n}, is described as follows:
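
The placement of the eight short blocks inside the 2048 (1920) sample sequence can be illustrated with a short Python sketch. It assumes the block layout known from AAC: each already-windowed short block is shifted by N_s/2 relative to its predecessor, and the run of blocks is centred in the long frame, which leaves (N_l - N_s)/4 leading and trailing zeros. The function and argument names are hypothetical, and the sketch is not the formula of this subclause.

```python
def eight_short_overlap_add(blocks, n_long=2048, n_short=256):
    """Overlap-add already-windowed short blocks into one long sequence.

    Assumed layout (taken from AAC, not from the text above):
    block j starts at lead + j * n_short / 2, with lead = (n_long - n_short) / 4.
    """
    m = n_long // n_short                 # number of short blocks, M = N_l / N_s
    if len(blocks) != m:
        raise ValueError("expected %d short blocks" % m)
    lead = (n_long - n_short) // 4        # leading (and trailing) zeros
    hop = n_short // 2                    # shift between consecutive blocks
    z = [0.0] * n_long
    for j, block in enumerate(blocks):
        if len(block) != n_short:
            raise ValueError("short block %d has the wrong length" % j)
        start = lead + j * hop
        for n, value in enumerate(block):
            z[start + n] += value         # overlapping regions are summed
    return z
```

With eight blocks of 256 samples this gives 448 leading zeros, 1152 samples covered by the short blocks, and 448 trailing zeros, for a total of 2048.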

d) LONG_STOP_SEQUENCE:

This window_sequence is needed to switch from an EIGHT_SHORT_SEQUENCE back to an ONLY_LONG_SEQUENCE.

If window_shape == 1, the window for LONG_STOP_SEQUENCE is given as follows:

If window_shape == 0, the window for LONG_STOP_SEQUENCE is determined by:

The windowed time domain values can be calculated with the formula given in a).

e) STOP_START_SEQUENCE:

The STOP_START_SEQUENCE is needed to obtain a correct overlap and add for a block transition from an EIGHT_SHORT_SEQUENCE to an EIGHT_SHORT_SEQUENCE when just one ONLY_LONG_SEQUENCE is needed.

The window lengths N_l and N_s are set to 2048 (1920) and 256 (240), respectively.

If window_shape == 1, the window for STOP_START_SEQUENCE is given as follows:

If window_shape == 0, the window for STOP_START_SEQUENCE looks like:

f) LPD_START_SEQUENCE:

The LPD_START_SEQUENCE is needed to obtain a correct overlap and add for a block transition from an ONLY_LONG_SEQUENCE to an LPD_SEQUENCE.

If window_shape == 1, the window for LPD_START_SEQUENCE is given as follows:

If window_shape == 0, the window for LPD_START_SEQUENCE looks like:

g) STOP_1152_SEQUENCE:

The STOP_1152_SEQUENCE is needed to obtain a correct overlap and add for a block transition from an LPD_SEQUENCE to an ONLY_LONG_SEQUENCE.

If window_shape == 1, the window for STOP_1152_SEQUENCE is given as follows:

If window_shape == 0, the window for STOP_1152_SEQUENCE is given as follows:

h) STOP_START_1152_SEQUENCE:

The STOP_START_1152_SEQUENCE is needed to obtain a correct overlap and add for a block transition from an LPD_SEQUENCE to an EIGHT_SHORT_SEQUENCE when just one ONLY_LONG_SEQUENCE is needed.

If window_shape == 1, the window for STOP_START_1152_SEQUENCE is given as follows:

If window_shape == 0, the window for STOP_START_1152_SEQUENCE looks like:

2.3.3 Overlapping and adding with the previous window sequence

Besides the overlap and add within the EIGHT_SHORT window_sequence, the first (left) part of every window_sequence is overlapped and added with the second (right) part of the previous window_sequence, resulting in the final time domain values out_{i,n}. The mathematical expression for this operation can be described as follows:

In the case of ONLY_LONG_SEQUENCE, LONG_START_SEQUENCE, EIGHT_SHORT_SEQUENCE, LONG_STOP_SEQUENCE, STOP_START_SEQUENCE and LPD_START_SEQUENCE:

And in the case of STOP_1152_SEQUENCE and STOP_START_1152_SEQUENCE:

In the case of LPD_START_SEQUENCE, the next sequence is an LPD_SEQUENCE. A SIN or KBD window is applied to the LPD_SEQUENCE to obtain a good overlap and add.

In the case of STOP_1152_SEQUENCE and STOP_START_1152_SEQUENCE, the previous sequence is an LPD_SEQUENCE. TDAC is applied to the LPD_SEQUENCE to obtain a good overlap and add.
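
For the first group of sequences, the overlap and add with the previous window sequence can be sketched in Python as follows. The sketch assumes the usual AAC relation out_{i,n} = z_{i,n} + z_{i-1,n+N/2} for 0 <= n < N/2, which is taken from AAC rather than from the text above. The STOP_1152 variants use different offsets and are not covered. The helper name is hypothetical.

```python
def overlap_add(prev_z, cur_z):
    """Add the right half of the previous windowed sequence to the left half
    of the current one (assumed AAC-style overlap of N/2 samples)."""
    n = len(cur_z)
    if len(prev_z) != n or n % 2 != 0:
        raise ValueError("both sequences need the same even length")
    half = n // 2
    # out[k] = z_i[k] + z_{i-1}[k + N/2] for k = 0 .. N/2 - 1
    return [cur_z[k] + prev_z[k + half] for k in range(half)]
```

Each call yields N/2 finished output samples; the right half of the current sequence is kept until the next sequence arrives.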

3. IMDCT

See sub-clause 2.3.1.

3.1 Windowing and block switching

Depending on the window_shape element, different oversampled transform window prototypes are used. The length of the oversampled windows is:

N_OS = 2 · n_long · os_factor_win

For window_shape == 1, the window coefficients are given by the Kaiser-Bessel derived (KBD) window as follows:

where W', the Kaiser-Bessel kernel window function (see also [5]), is defined as follows:

α = kernel window alpha factor, α = 4

Otherwise, for window_shape == 0, a sine window is employed as follows:
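
The two window prototypes can be sketched in Python as follows. The sketch assumes the commonly published definitions: a KBD window built from the cumulative sum of a Kaiser-Bessel kernel, and a sine window sin(π/N · (n + 1/2)). These are assumptions, since the formulas of this subclause are not reproduced above, and all function names are hypothetical.

```python
import math

def oversampled_length(n_long, os_factor_win):
    """N_OS = 2 * n_long * os_factor_win, as stated in the text."""
    return 2 * n_long * os_factor_win

def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) / k          # term = (x/2)^k / k!
        total += term * term
    return total

def kbd_window(n_total, alpha=4.0):
    """Kaiser-Bessel derived window of even length n_total (assumed definition)."""
    if n_total <= 0 or n_total % 2 != 0:
        raise ValueError("window length must be positive and even")
    half = n_total // 2
    quarter = half / 2.0
    kernel = []
    for p in range(half + 1):
        ratio = (p - quarter) / quarter
        arg = math.pi * alpha * math.sqrt(max(0.0, 1.0 - ratio * ratio))
        kernel.append(bessel_i0(arg))  # common 1/I0(pi*alpha) factor cancels below
    total = sum(kernel)
    left, running = [], 0.0
    for n in range(half):
        running += kernel[n]
        left.append(math.sqrt(running / total))
    return left + left[::-1]           # right half mirrors the left half

def sine_window(n_total):
    """Sine window of length n_total."""
    return [math.sin(math.pi / n_total * (n + 0.5)) for n in range(n_total)]
```

Both prototypes satisfy w(n)^2 + w(n + N/2)^2 = 1, the condition that lets overlapping windowed blocks add up without amplitude modulation.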

For all kinds of window_sequences, the prototype used for the left window part is determined by the window shape of the previous block. The following formula expresses this fact:

Likewise, the prototype for the right window shape is determined by the following formula:

Since the transition lengths are already determined, only EIGHT_SHORT_SEQUENCE has to be distinguished from all the others:

a) EIGHT_SHORT_SEQUENCE:

The following C-like code describes the windowing and internal overlap-add of an EIGHT_SHORT_SEQUENCE:

4. MDCT-based TCX

4.1 Tool description

The MDCT-based TCX tool is used when core_mode is equal to 1 and one or more of the three TCX modes is selected as the "linear prediction domain" coding, i.e. when one of the four array entries of mod[] is greater than 0. The MDCT-based TCX receives the quantized spectral coefficients from the arithmetic decoder. The quantized coefficients are first completed with a comfort noise before an inverse MDCT is applied to obtain a time-domain weighted synthesis, which is then fed to the weighted synthesis LPC filter.

4.2 Definitions

lg: number of quantized spectral coefficients output by the arithmetic decoder

noise_factor: noise level quantization index

noise level: level of the noise injected into the reconstructed spectrum

noise[]: vector of generated noise

global_gain: rescaling gain quantization index

g: rescaling gain

rms: root mean square of the synthesized time-domain signal x[]

x[]: synthesized time-domain signal

4.3 Decoding process

The MDCT-based TCX requests from the arithmetic decoder a number of quantized spectral coefficients, lg, which is determined by the mod[] and last_lpd_mode values. These two values also define the length and shape of the window that is applied in the inverse MDCT. The window is composed of three parts: a left overlap of L samples, a middle part of M samples equal to one, and a right overlap of R samples. To obtain an MDCT window of length 2*lg, ZL zeros are added on the left side and ZR zeros on the right side, as shown in Figure 14G / Figure 14F for Table 3.

The MDCT window is given by:

The quantized spectral coefficients, quant[], delivered by the arithmetic decoder are completed with a comfort noise. The level of the injected noise is determined by the decoded noise_factor as follows:

noise_level = 0.0625 * (8 - noise_factor)

A noise vector, noise[], is then computed using a random function, random_sign(), which randomly delivers the value -1 or +1.

noise[i] = random_sign() * noise_level;

The quant[] and noise[] vectors are combined to form the reconstructed spectral coefficient vector, r[], in such a way that the runs of 8 consecutive zeros in quant[] are replaced by the components of noise[]. A run of 8 non-zeros is detected according to the formula:

The reconstructed spectrum is obtained as follows:
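
The noise filling described above can be sketched in Python as follows. The sketch assumes that runs are checked on aligned blocks of 8 coefficients and that every all-zero block is replaced, with no protected low-frequency region. The detection formula of the text is not reproduced here, so both points are assumptions, and all names other than noise_factor, quant, noise and random_sign are hypothetical.

```python
import random

def noise_level(noise_factor):
    """noise_level = 0.0625 * (8 - noise_factor), as stated in the text."""
    return 0.0625 * (8 - noise_factor)

def random_sign(rng):
    """Randomly deliver -1 or +1."""
    return rng.choice((-1, 1))

def fill_noise(quant, noise_factor, rng=None):
    """Return r[]: quant[] with all-zero aligned blocks of 8 replaced by noise."""
    rng = rng or random.Random()
    level = noise_level(noise_factor)
    noise = [random_sign(rng) * level for _ in quant]
    r = list(quant)
    for start in range(0, len(quant) - 7, 8):       # aligned blocks (assumption)
        if all(q == 0 for q in quant[start:start + 8]):
            r[start:start + 8] = noise[start:start + 8]
    return r
```

A noise_factor of 0 gives the largest level (0.5) and a noise_factor of 8 gives no injected noise.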

Before applying the inverse MDCT, a spectrum de-shaping is performed according to the following steps:

1. Calculate the energy E_m of the 8-dimensional block at index m for each 8-dimensional block of the first quarter of the spectrum.

2. Compute the ratio R_m = sqrt(E_m / E_I), where I is the block index with the maximum value of all E_m.

3. If R_m < 0.1, then set R_m = 0.1.

4. If R_m < R_{m-1}, then set R_m = R_{m-1}.

Each 8-dimensional block belonging to the first quarter of the spectrum is then multiplied by the factor R_m.
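
The four steps above can be sketched in Python as follows. The sketch assumes that step 4 is applied sequentially, so that R_{m-1} is the already adjusted ratio of the preceding block, and that an all-zero first quarter is left untouched. Both are assumptions, and the function name is hypothetical.

```python
import math

def deshape_spectrum(r, block=8):
    """Apply the spectrum de-shaping to the first quarter of r[]."""
    out = list(r)
    n_blocks = (len(r) // 4) // block
    # Step 1: energy of every 8-dimensional block in the first quarter.
    energies = [sum(v * v for v in r[m * block:(m + 1) * block])
                for m in range(n_blocks)]
    if not energies or max(energies) == 0:
        return out                        # nothing to scale (assumption)
    e_max = max(energies)                 # E_I, the maximum block energy
    prev_ratio = None
    for m, energy in enumerate(energies):
        ratio = math.sqrt(energy / e_max)     # step 2
        ratio = max(ratio, 0.1)               # step 3
        if prev_ratio is not None and ratio < prev_ratio:
            ratio = prev_ratio                # step 4
        for k in range(m * block, (m + 1) * block):
            out[k] = r[k] * ratio             # scale the block by R_m
        prev_ratio = ratio
    return out
```

Because of step 4 the ratios never decrease with increasing block index, and the block with the maximum energy and all blocks after it are scaled by 1.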

The reconstructed spectrum is fed into an inverse MDCT. The non-windowed output signal, x[], is rescaled by the gain, g, obtained by an inverse quantization of the decoded global_gain index:

g = 10^(global_gain/28) / (2 · rms)

where rms is calculated as:

The rescaled synthesized time-domain signal is then equal to:

x_w[i] = x[i] · g
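
The rescaling can be sketched in Python as follows. Because the rms formula is not reproduced above, the sketch accepts rms as an argument and, if none is given, falls back to a plain root mean square over all samples of x[]; that fallback is an assumption, and the function name is hypothetical.

```python
import math

def rescale(x, global_gain, rms=None):
    """Return (x_w, g) with g = 10^(global_gain/28) / (2 * rms)."""
    if rms is None:
        # Plain RMS over the whole signal (assumption, see lead-in).
        rms = math.sqrt(sum(v * v for v in x) / len(x))
    if rms == 0:
        raise ValueError("rms must be non-zero")
    g = 10.0 ** (global_gain / 28.0) / (2.0 * rms)
    return [v * g for v in x], g
```

Each step of 28 in global_gain changes the gain by a factor of 10, i.e. by 20 dB.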

After rescaling, the windowing and overlap-add are applied.

The reconstructed TCX target x(n) is then filtered through the zero-state inverse weighted synthesis filter to find the excitation signal to be applied to the synthesis filter. Note that the interpolated LP filter is used for each frame in this filtering. Once the excitation is determined, the signal is reconstructed by filtering the excitation through the synthesis filter and then de-emphasizing it by filtering through the filter 1/(1 - 0.68 z^{-1}) as described above.
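
The de-emphasis filter 1/(1 - 0.68 z^{-1}) corresponds to the recursion y[n] = x[n] + 0.68 · y[n-1]. A minimal Python sketch, with a hypothetical function name and a filter state that is assumed to start at zero unless supplied, is:

```python
def deemphasize(signal, mu=0.68, state=0.0):
    """First-order de-emphasis: y[n] = x[n] + mu * y[n-1]."""
    out = []
    prev = state                 # y[-1], carried over from the previous frame
    for sample in signal:
        prev = sample + mu * prev
        out.append(prev)
    return out
```

When consecutive frames are processed, the last output sample of one frame would be passed as `state` for the next so that the filter memory is continuous.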

Note that the excitation is also needed to update the ACELP adaptive codebook and to allow switching from TCX to ACELP in a subsequent frame. Note also that the length of the TCX synthesis is given by the TCX frame length (without the overlap): 256, 512 or 1024 samples for a mod[] of 1, 2 or 3, respectively.
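
The relation between a mod[] value and the TCX synthesis length can be written as a small lookup. This Python sketch uses hypothetical names and covers only the three TCX modes named above:

```python
# TCX frame length in samples (without overlap) for mod[] values 1, 2 and 3.
TCX_FRAME_LENGTH = {1: 256, 2: 512, 3: 1024}

def tcx_synthesis_length(mod_value):
    """Return the TCX synthesis length for a TCX mode."""
    if mod_value not in TCX_FRAME_LENGTH:
        raise ValueError("mod value %r is not a TCX mode" % (mod_value,))
    return TCX_FRAME_LENGTH[mod_value]
```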

Specification references

[1] ISO/IEC 11172-3:1993, Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s, Part 3: Audio.

[2] ITU-T Rec. H.222.0 (1995) | ISO/IEC 13818-1:2000, Information technology - Generic coding of moving pictures and associated audio information - Part 1: Systems.

[3] ISO/IEC 13818-3:1998, Information technology - Generic coding of moving pictures and associated audio information - Part 3: Audio.

[4] ISO/IEC 13818-7:2004, Information technology - Generic coding of moving pictures and associated audio information - Part 7: Advanced Audio Coding (AAC).

[5] ISO/IEC 14496-3:2005, Information technology - Coding of audio-visual objects - Part 1: Systems.

[6] ISO/IEC 14496-3:2005, Information technology - Coding of audio-visual objects - Part 3: Audio.

[7] ISO/IEC 23003-1:2007, Information technology - MPEG audio technologies - Part 1: MPEG Surround.

[8] 3GPP TS 26.290 V6.3.0, Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions.

[9] 3GPP TS 26.190, Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions.

[10] 3GPP TS 26.090, Adaptive Multi-Rate (AMR) speech codec; Transcoding functions.

Definitions

Definitions can be found in ISO/IEC 14496-3, Subpart 1, subclause 1.3 (Terms and definitions) and in 3GPP TS 26.290, section 3 (Definitions and abbreviations).

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.

The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that one of the methods described herein is performed.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

3, 4 ... items

3a, 3b, 3c, 3d ... frames

10a ... LPC analysis stage

60 ... long-term prediction component

62 ... short-term prediction component

64 ... codebook

66, 68 ... blocks

69, 86 ... subtractors

70 ... all-pole filter

72 ... glottal model

77 ... gain stage

78 ... forward path

79 ... feedback path

80 ... adder stage

81, 85 ... prediction filters

84 ... line

87, 88, 89, 411, 536 ... blocks

99 ... audio input signal

100 ... common pre-processing stage

101 ... joint multi-channel stage, surround/joint stereo device

101a ... downmixer

101b ... joint multi-channel parameter analyzer

102 ... bandwidth extension stage

102a ... high band analyzer

102b ... low-pass filtering block

195 ... audio intermediate signal

200, 521 ... switches

300, 525 ... signal analyzer, decision stage, controller

300a ... signal analyzer

400 ... first encoding branch

410 ... first converter

410a, 523a ... windower

410b, 523b ... converter

421 ... quantizer/scaler block, quantizer/coder stage, spectral audio encoder

424 ... quantizer/scaler block

431 ... first decoding branch, spectral audio decoder, decoder/dequantizer stage

440 ... first decoding branch, controllable converter, first domain converter, time domain converter, frequency/time converter

440a, 534a ... inverse converter stage

440b, 534b ... synthesis window stage

440c, 534c ... overlap/add stage

450 ... first decoding branch

500 ... second encoding branch

510 ... LPC processor, domain converter

522 ... quantizer/scaler block, first processing branch

523 ... second converter

524 ... further coding tools

526 ... ACELP core, ACELP block, ACELP time domain encoder

527 ... TCX block, MDCT-TCX processing device

531 ... first inverse processing branch, element, inverse quantizer/coder, decoder/dequantizer stage

532 ... combiner, element, switch

533 ... second inverse processing branch, inverse quantizer/coder, decoder/dequantizer stage, item

534 ... second inverse processing branch, inverse spectral converter, frequency/time converter, element

537 ... converter, TCX^-1 block

540 ... first converter, domain converter, LPC synthesis stage, decoding branch

550 ... second decoding branch

600 ... combiner, switch

601 ... mode detection block, mode decision

609 ... decoded audio signal

699 ... decoded audio intermediate signal

701 ... bandwidth extension block, bandwidth extension stage

701a ... patcher

701b ... adjuster

701c ... combiner

702 ... joint stereo/surround processing stage, joint multi-channel stage, item

702a ... decoder side, upmixer

702b ... decoder side, parameter decoder

799 ... decoded audio signal

800 ... output interface, bitstream multiplexer

801 ... encoder output signal

900 ... input interface, bitstream demultiplexer

1200 ... psychoacoustic module

1201 ... temporal noise shaping (TNS) tool, element

1202 ... M/S coding tool, element

1203 ... M/S decoding tool

1204 ... TNS decoder tool

1205 ... bass post-filter

Figure 1a is a block diagram of an encoding scheme in accordance with a first aspect of the present invention;

Figure 1b is a block diagram of a decoding scheme in accordance with the first aspect of the present invention;

Figure 1c is a block diagram of an encoding scheme in accordance with a further aspect of the present invention;

Figure 2a is a block diagram of an encoding scheme in accordance with a second aspect of the present invention;

Figure 2b is a schematic diagram of a decoding scheme in accordance with the second aspect of the present invention;

Figure 2c is a block diagram of an encoding scheme in accordance with a further aspect of the present invention;

Figure 3a illustrates a block diagram of an encoding scheme in accordance with a further aspect of the present invention;

Figure 3b illustrates a block diagram of a decoding scheme in accordance with the further aspect of the present invention;

Figure 3c illustrates a schematic representation of the encoding apparatus/method with cascaded switches;

Figure 3d illustrates a schematic diagram of an apparatus or method for decoding, in which cascaded combiners are used;

Figure 3e illustrates a time domain signal and a corresponding representation of the encoded signal, showing short cross-fade regions that are included in both encoded signals;

Figure 4a illustrates a block diagram with a switch positioned before the encoding branches;

Figure 4b illustrates a block diagram of an encoding scheme with the switch positioned after the encoding branches;

Figure 5a illustrates a waveform of a time domain speech segment as a quasi-periodic or impulse-like signal segment;

Figure 5b illustrates a spectrum of the segment of Figure 5a;

Figure 5c illustrates a time domain speech segment of unvoiced speech as an example of a noise-like segment;

Figure 5d illustrates a spectrum of the time domain waveform of Figure 5c;

Figure 6 illustrates a block diagram of an analysis-by-synthesis CELP encoder;

Figures 7a to 7d illustrate voiced/unvoiced excitation signals as an example of impulse-like signals;

Figure 7e illustrates an encoder-side LPC stage providing short-term prediction information and the prediction error (excitation) signal;

Figure 7f illustrates a further embodiment of an LPC device for generating a weighted signal;

Figure 7g illustrates an implementation for transforming a weighted signal into an excitation signal by applying an inverse weighting operation and a subsequent excitation analysis, as required in the converter 537 of Figure 2b;

Figure 8 illustrates a block diagram of a joint multi-channel algorithm in accordance with an embodiment of the present invention;

Figure 9 illustrates a preferred embodiment of a bandwidth extension algorithm;

Figure 10a illustrates a detailed description of the switch when performing an open-loop decision; and

Figure 10b illustrates the switch when operating in a closed-loop decision mode.

Figure 11A illustrates a block diagram of an audio encoder in accordance with another aspect of the present invention;

Figure 11B illustrates a block diagram of another embodiment of an inventive audio decoder;

Figure 12A illustrates another embodiment of an inventive encoder;

Figure 12B illustrates another embodiment of an inventive decoder;

Figure 13A illustrates the interrelation between resolution and window/transform lengths;

Figure 13B illustrates an overview of a set of transform windows for the first encoding branch and a transition from the first to the second encoding branch;

Figure 13C illustrates a plurality of different window sequences, including window sequences for the first encoding branch and sequences for a transition to the second branch;

Figure 14A illustrates the framing of a preferred embodiment of the second encoding branch;

Figure 14B illustrates short windows applied in the second encoding branch;

Figure 14C illustrates medium-sized windows applied in the second encoding branch;

Figure 14D illustrates long windows applied by the second encoding branch;

Figure 14E illustrates an exemplary sequence of ACELP frames and TCX frames within a superframe division;

Figure 14F illustrates different transform lengths corresponding to different time/frequency resolutions for the second encoding branch; and

Figure 14G illustrates a construction of a window using the definitions of Figure 14F.

Claims

An audio encoder for encoding an audio signal, comprising: a first encoding branch for encoding an audio signal using a first encoding algorithm to obtain a first encoded signal, the first encoding branch comprising a first converter for converting an input signal into a spectral domain; a second encoding branch for encoding an audio signal using a second encoding algorithm to obtain a second encoded signal, wherein the first encoding algorithm is different from the second encoding algorithm, the second encoding branch comprising a domain converter for converting an input signal from an input domain into an output domain and a second converter for converting an input signal into a spectral domain; a switch for switching between the first encoding branch and the second encoding branch so that, for a portion of the audio input signal, either the first encoded signal or the second encoded signal is in an encoder output signal; a signal analyzer for analyzing the portion of the audio signal to determine whether the portion of the audio signal is represented as the first encoded signal or the second encoded signal in the encoder output signal, wherein the signal analyzer is furthermore configured to variably determine a respective time/frequency resolution of the first converter and the second converter when the first encoded signal or the second encoded signal representing the portion of the audio signal is generated; and an output interface for generating the encoder output signal, the encoder output signal comprising an encoded representation of the portion of the audio signal, information indicating whether the representation is the first encoded signal or the second encoded signal, and information indicating the time/frequency resolution used for encoding the first encoded signal and for encoding the second encoded signal, wherein the signal analyzer is configured to determine the time/frequency resolution of the first converter as one of a plurality of different window lengths selected from a set of different window lengths, the set being 2304 samples, 2048 samples, 256 samples, 1920 samples, 2160 samples, and 240 samples, or as one of a plurality of different transform lengths comprising at least two transform lengths selected from the group consisting of 1152 coefficients per transform block, 1024 coefficients per transform block, 1080 coefficients per transform block, 960 coefficients per transform block, 128 coefficients per transform block, and 120 coefficients per transform block, or wherein the signal analyzer is configured to determine the time/frequency resolution of the second converter as one of a plurality of different window lengths comprising at least two window lengths selected from the group consisting of 640 samples, 1152 samples, 2304 samples, 512 samples, 1024 samples, and 2048 samples, or as one of a plurality of different transform lengths comprising at least two different transform lengths selected from the group consisting of 320 spectral coefficients per transform block, 576 spectral coefficients per transform block, 1152 spectral coefficients per transform block, 256 spectral coefficients per transform block, 512 spectral coefficients per transform block, and 1024 spectral coefficients per transform block.

The audio encoder of claim 1, wherein the signal analyzer is configured to classify the portion of the audio signal as a speech-like audio signal or a music-like audio signal, and, in the case of a music signal, to perform a transient detection to determine the time/frequency resolution of the first converter, or to perform an analysis-by-synthesis process to determine the time/frequency resolution of the second converter.

The audio encoder of claim 1 or 2, wherein the first converter and the second converter comprise a variable windowed conversion processor, the variable windowed conversion processor comprising a window function having a variable window size and a conversion function having a variable conversion length, and wherein the signal analyzer is configured to control the window size and/or the conversion length based on the signal analysis.

The audio encoder of claim 1, wherein the second encoding branch comprises a first processing branch for processing an audio signal in the domain determined by the domain converter and a second processing branch comprising the second converter, wherein the signal analyzer is configured to subdivide the portion of the audio signal into a sequence of sub-portions, and wherein the signal analyzer is configured to determine the time/frequency resolution of the second converter depending on the position of a sub-portion processed by the first processing branch relative to a sub-portion of the portion processed by the second processing branch.

The audio encoder of claim 4, wherein the first processing branch comprises an ACELP encoder, wherein the second processing branch comprises an MDCT-TCX processing chain, and wherein the signal analyzer is configured to set the time resolution of the second converter to a first value determined by a length of a sub-portion, or to a second value determined by the length of a sub-portion multiplied by an integer value greater than one, the second value being lower than the first value.

The audio encoder of claim 1, wherein the signal analyzer is configured to determine a signal classification in a constant raster covering a plurality of equally sized blocks of audio samples, and to subdivide a block into a variable number of sub-blocks depending on the audio signal, wherein a length of a sub-block determines the first time/frequency resolution or the second time/frequency resolution.

The audio encoder of claim 1, wherein the second encoding branch comprises: a first processing branch for processing an audio signal; a second processing branch, the second processing branch comprising the second converter; and a further switch for switching between the first processing branch and the second processing branch such that, for a portion of the audio signal input into the second encoding branch, either a first processed signal or a second processed signal is in the second encoded signal.

A method of audio encoding an audio signal, comprising the steps of: encoding, in a first encoding branch, an audio signal using a first encoding algorithm to obtain a first encoded signal, the first encoding branch comprising a first converter for converting an input signal into a spectral domain; encoding, in a second encoding branch, an audio signal using a second encoding algorithm to obtain a second encoded signal, wherein the first encoding algorithm is different from the second encoding algorithm, the second encoding branch comprising a domain converter for converting an input signal from an input domain into an output domain and a second converter for converting an input signal into a spectral domain; switching between the first encoding branch and the second encoding branch such that, for a portion of the audio input signal, either the first encoded signal or the second encoded signal is in an encoder output signal; analyzing the portion of the audio signal to determine whether the portion of the audio signal is represented in the encoder output signal as the first encoded signal or as the second encoded signal, and variably determining a respective time/frequency resolution of the first converter and of the second converter when the first encoded signal or the second encoded signal representing the portion of the audio signal is generated; and generating an encoder output signal, the encoder output signal comprising the first encoded signal and the second encoded signal, an indication indicating the first encoded signal and the second encoded signal, and an indication indicating the time/frequency resolution applied for encoding the first encoded signal and for encoding the second encoded signal, wherein the analyzing step comprises determining the time/frequency resolution of the first converter as one of a plurality of different window lengths selected from the group consisting of 2304 samples, 2048 samples, 256 samples, 1920 samples, 2160 samples, and 240 samples, or as one of a plurality of different conversion lengths, the different conversion lengths comprising at least two conversion lengths selected from the group consisting of 1152 coefficients per conversion block, 1024 coefficients per conversion block, 1080 coefficients per conversion block, 960 coefficients per conversion block, 128 coefficients per conversion block, and 120 coefficients per conversion block, or wherein the analyzing step comprises determining the time/frequency resolution of the second converter as one of a plurality of different window lengths, the plurality of different window lengths comprising at least two window lengths selected from the group consisting of 640 samples, 1152 samples, 2304 samples, 512 samples, 1024 samples, and 2048 samples, or as one of a plurality of different conversion lengths, the different conversion lengths comprising at least two conversion lengths selected from the group consisting of 320 spectral coefficients per conversion block, 576 spectral coefficients per conversion block, 1152 spectral coefficients per conversion block, 256 spectral coefficients per conversion block, 512 spectral coefficients per conversion block, and 1024 spectral coefficients per conversion block.

An audio decoder for decoding an encoded signal, the encoded signal comprising a first encoded signal, a second encoded signal, an indication indicating the first encoded signal and the second encoded signal, and a time/frequency resolution indication to be used for decoding the first encoded signal and the second encoded audio signal, the audio decoder comprising: a first decoding branch for decoding the first encoded signal using a first controllable frequency/time converter, the first controllable frequency/time converter being configured to be controlled using the time/frequency resolution indication for the first encoded signal to obtain a first decoded signal; a second decoding branch for decoding the second encoded signal using a second controllable frequency/time converter, the second controllable frequency/time converter being configured to be controlled using the time/frequency resolution indication for the second encoded signal; a controller for controlling the first frequency/time converter and the second frequency/time converter using the time/frequency resolution indication; a domain converter for generating a synthesis signal using the second decoded signal; and a combiner for combining the first decoded signal and the synthesis signal to obtain a decoded audio signal, wherein the controller is configured to control the first frequency/time converter and the second frequency/time converter such that, for the first frequency/time converter, the time/frequency resolution is selected from a plurality of different window lengths, the different window lengths being at least two of 2304, 2048, 256, 1920, 2160, and 240 samples, or is selected from a plurality of different conversion lengths, the plurality of different conversion lengths comprising at least two of the group consisting of 1152, 1024, 1080, 960, 128, and 120 coefficients per conversion block, or such that, for the second frequency/time converter, the time/frequency resolution is selected as one of a plurality of different window lengths, the plurality of different window lengths being at least two of 640, 1152, 2304, 512, 1024, and 2048 samples, or is selected from a plurality of different conversion lengths, the plurality of different conversion lengths comprising at least two of the group consisting of 320, 576, 1152, 256, 512, and 1024 spectral coefficients per conversion block.

The audio decoder of claim 9, wherein the second decoding branch comprises a first inverse processing branch for inverse processing a first processed signal additionally included in the encoded signal to obtain a first inverse processed signal; wherein the second controllable frequency/time converter is located in a second inverse processing branch, the second inverse processing branch being configured to inverse process the second encoded signal in the same domain as the first inverse processed signal to obtain a second inverse processed signal; and a further combiner for combining the first inverse processed signal and the second inverse processed signal to obtain a combined signal; and wherein the combined signal is input into the combiner.

The audio decoder of claim 9, wherein the first frequency/time converter and the second frequency/time converter are time domain aliasing cancellation converters, each having an overlap/add unit for cancelling a time domain aliasing included in the first encoded signal and the second encoded signal.

The audio decoder of claim 9, wherein the encoded signal comprises a coding mode indication identifying whether a coded signal is the first encoded signal or the second encoded signal, and wherein the decoder further comprises an input interface, the input interface being configured to interpret the coding mode indication to determine whether the coded signal is to be fed into the first decoding branch or into the second decoding branch.

The audio decoder of claim 9, wherein the first encoded signal is arithmetically encoded, and wherein the first decoding branch comprises an arithmetic decoder.

The audio decoder of claim 9, wherein the first decoding branch comprises a dequantizer having a non-uniform dequantization characteristic for undoing a result of a non-uniform quantization applied when the first encoded signal was generated, wherein the second decoding branch comprises a dequantizer using a different dequantization characteristic, or wherein the second decoding branch does not comprise a dequantizer.

The audio decoder of claim 9, wherein the controller is configured to control the first frequency/time converter and the second frequency/time converter by applying, for each converter, one discrete frequency/time resolution from a plurality of possible different discrete frequency/time resolutions, the number of possible different discrete frequency/time resolutions of the second converter being higher than the number of possible different frequency/time resolutions of the first converter.

The audio decoder of claim 9, wherein the domain converter is an LPC synthesis processor that uses LPC filter information to generate the synthesis signal, the LPC filter information being included in the encoded signal.

A method of audio decoding an encoded signal, the encoded signal comprising a first encoded signal, a second encoded signal, an indication indicating the first encoded signal and the second encoded signal, and a time/frequency resolution indication to be used for decoding the first encoded signal and the second encoded audio signal, the method comprising the steps of: decoding, by a first decoding branch, the first encoded signal using a first controllable frequency/time converter, the first controllable frequency/time converter being configured to be controlled using the time/frequency resolution indication for the first encoded signal to obtain a first decoded signal; decoding, by a second decoding branch, the second encoded signal using a second controllable frequency/time converter, the second controllable frequency/time converter being configured to be controlled using the time/frequency resolution indication for the second encoded signal; controlling the first frequency/time converter and the second frequency/time converter using the time/frequency resolution indication; generating, by a domain converter, a synthesis signal using the second decoded signal; and combining the first decoded signal and the synthesis signal to obtain a decoded audio signal, wherein the first frequency/time converter and the second frequency/time converter are controlled such that, for the first frequency/time converter, the time/frequency resolution is selected from a plurality of different window lengths, the different window lengths being at least two of 2304, 2048, 256, 1920, 2160, and 240 samples, or is selected from a plurality of different conversion lengths, the plurality of different conversion lengths comprising at least two of the group consisting of 1152, 1024, 1080, 960, 128, and 120 coefficients per conversion block, or such that, for the second frequency/time converter, the time/frequency resolution is selected as one of a plurality of different window lengths, the plurality of different window lengths being at least two of 640, 1152, 2304, 512, 1024, and 2048 samples, or is selected from a plurality of different conversion lengths, the plurality of different conversion lengths comprising at least two of the group consisting of 320, 576, 1152, 256, 512, and 1024 spectral coefficients per conversion block.

A computer program for performing, when running on a processor, the method of claim 8 or 17.
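
As an illustrative aside, not part of the claims: the window lengths and conversion lengths recited in claims 1, 8, 9 and 17 appear to pair up, each conversion length being half of one recited window length. That is the relation expected for an MDCT-style converter with 50% overlap, where a window of N samples yields N/2 coefficients per block. The following minimal Python sketch checks this reading against the recited numbers. The constant and function names are hypothetical and chosen only for this illustration, and the 50% overlap is an assumption inferred from the numbers and from the MDCT-TCX and time domain aliasing cancellation wording of claims 5 and 11.

```python
# Values copied from the claims; all names below are hypothetical.
FIRST_WINDOW_LENGTHS = (2304, 2048, 256, 1920, 2160, 240)       # samples
FIRST_CONVERSION_LENGTHS = (1152, 1024, 1080, 960, 128, 120)    # coefficients per block
SECOND_WINDOW_LENGTHS = (640, 1152, 2304, 512, 1024, 2048)      # samples
SECOND_CONVERSION_LENGTHS = (320, 576, 1152, 256, 512, 1024)    # coefficients per block


def conversion_length_for(window_length: int) -> int:
    """Return the coefficients per block for a given window length.

    Assumption: MDCT-style 50% overlap, so an N-sample window
    produces N / 2 spectral coefficients.
    """
    if window_length <= 0 or window_length % 2 != 0:
        raise ValueError("window length must be a positive even number")
    return window_length // 2


def lengths_pair_up(window_lengths, conversion_lengths) -> bool:
    """Check that halving every window length gives exactly the conversion lengths."""
    halved = sorted(conversion_length_for(w) for w in window_lengths)
    return halved == sorted(conversion_lengths)


if __name__ == "__main__":
    print(lengths_pair_up(FIRST_WINDOW_LENGTHS, FIRST_CONVERSION_LENGTHS))
    print(lengths_pair_up(SECOND_WINDOW_LENGTHS, SECOND_CONVERSION_LENGTHS))
```

Under this reading, a time/frequency resolution indication could identify either the window length or the conversion length, since one follows from the other. The claims keep the two alternatives separate, so the sketch should not be taken as limiting them.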