JP7389651B2

JP7389651B2 - Variable alphabet size in digital audio signals

Info

Publication number: JP7389651B2
Application number: JP2019558590A
Authority: JP
Inventors: Albert Chau; Antonius Kalker; Gadiel Seroussi
Original assignee: DTS Inc
Current assignee: DTS Inc
Priority date: 2017-04-25
Filing date: 2018-04-24
Publication date: 2023-11-30
Anticipated expiration: 2038-04-24
Also published as: CN110800049B; JP2020518031A; US20180308497A1; US10699723B2; KR20200012862A; EP3616199A1; CN110800049A; KR102613282B1; EP3616199A4; WO2018200426A1

Description

(Cross-reference to related applications)
This application claims priority to U.S. Patent Application No. 15/926,089, filed March 20, 2018, which claims the benefit of U.S. Provisional Patent Application No. 62/489,867, filed April 25, 2017, the entire disclosures of which are incorporated herein by reference.

(Technical field)
This disclosure relates to encoding or decoding audio signals.

An audio codec can encode a time-domain audio signal into a digital file or stream, and decode a digital file or stream into a time-domain audio signal. There is an ongoing effort to improve audio codecs, for example by reducing the size of the encoded file or stream.

One example of an encoding system can include a processor and a memory device storing instructions executable by the processor, the instructions being executable by the processor to perform a method for encoding an audio signal. The method includes: receiving a digital audio signal; parsing the digital audio signal into a plurality of frames, each frame including a specified number of audio samples; performing a transform of the audio samples of each frame to produce a plurality of frequency-domain coefficients for each frame; partitioning the plurality of frequency-domain coefficients for each frame into a plurality of bands for each frame, each band having a reshaping parameter that represents a time resolution and a frequency resolution; encoding the digital audio signal into a bitstream that includes the reshaping parameters, where the reshaping parameter for a first band is encoded using a first alphabet size and the reshaping parameter for a second band, different from the first band, is encoded using a second alphabet size different from the first alphabet size; and outputting the bitstream.

One example of a decoding system can include a processor and a memory device storing instructions executable by the processor, the instructions being executable by the processor to perform a method for decoding an encoded audio signal. The method includes: receiving a bitstream that includes a plurality of frames, each frame partitioned into a plurality of bands; extracting from the bitstream, for each band of each frame, a reshaping parameter that represents a time resolution and a frequency resolution for the band, where the reshaping parameter for a first band is embedded in the bitstream using a first alphabet size and the reshaping parameter for a second band, different from the first band, is embedded in the bitstream using a second alphabet size different from the first alphabet size; and decoding the bitstream using the reshaping parameters to generate a decoded digital audio signal.

Another example of an encoding system can include: a receiver circuit to receive a digital audio signal; a framer circuit to parse the digital audio signal into a plurality of frames, each frame including a specified number of audio samples; a transformer circuit to perform a transform of the audio samples of each frame to produce a plurality of frequency-domain coefficients for each frame; a frequency band partitioner circuit to partition the plurality of frequency-domain coefficients for each frame into a plurality of bands for each frame, each band having a reshaping parameter that represents a time resolution and a frequency resolution; an encoder circuit to encode the digital audio signal into a bitstream that includes the reshaping parameter for each band, where the reshaping parameter for a first band is encoded using a first alphabet size and the reshaping parameter for a second band, different from the first band, is encoded using a second alphabet size different from the first alphabet size; and an output circuit to output the bitstream.

FIG. 1 shows a block diagram of an example of an encoding system, according to some embodiments.
FIG. 2 shows a block diagram of another example of an encoding system, according to some embodiments.
FIG. 3 shows a block diagram of an example of a decoding system, according to some embodiments.
FIG. 4 shows a block diagram of another example of a decoding system, according to some embodiments.
FIG. 5 shows several of the quantities involved in encoding a digital audio signal, according to some embodiments.
FIG. 6 shows a flowchart of an example of a method for encoding an audio signal, according to some embodiments.
FIG. 7 shows a flowchart of an example of a method for decoding an encoded audio signal, according to some embodiments.
FIGS. 8 to 11 show examples of pseudocode for encoding and decoding an audio signal, according to some embodiments.
FIG. 12 shows a block diagram of an example of an encoding system, according to some embodiments.

Corresponding reference characters indicate corresponding elements throughout the several figures. Elements in the drawings are not necessarily drawn to scale. The configurations shown in the drawings are merely examples and should not be construed as limiting the scope of the invention in any way.

In an audio encoding and/or decoding system, such as a codec, reshaping parameters in different bands can be encoded using alphabets of different sizes. Using different alphabet sizes can allow for more compact compression in the bitstream (e.g., the encoded digital audio signal), as explained in more detail below.

FIG. 1 shows a block diagram of an example of an encoding system 100, according to some embodiments. The configuration of FIG. 1 is only one example of an encoding system; other suitable configurations can also be used.

The encoding system 100 can receive a digital audio signal 102 as input and can output a bitstream 104. The input signal 102 and the output signal 104 can each include one or more discrete files saved locally or on an accessible server, and/or one or more audio streams generated locally or on an accessible server.

The encoding system 100 can include a processor 106. The encoding system 100 can further include a memory device 108 storing instructions 110 executable by the processor 106. The processor 106 can execute the instructions 110 to perform a method for encoding an audio signal. An example of such a method is explained in detail below.

In the configuration of FIG. 1, the encoding is executed in software, typically by a processor that can also perform additional tasks in a computing device. Alternatively, the encoding can be executed in hardware, such as by a dedicated chip or dedicated processor that is hard-wired to perform the encoding. An example of such a hardware-based encoder is shown in FIG. 2.

FIG. 2 shows a block diagram of another example of an encoding system 200, according to some embodiments. The configuration of FIG. 2 is only one example of an encoding system; other suitable configurations can also be used.

The encoding system 200 can receive a digital audio signal 202 as input and can output a bitstream 204. The encoding system 200 can include a dedicated encoding processor 206, which can include a chip that is hard-wired to perform a particular encoding method. An example of such a method for encoding an audio signal is explained in detail below.

The examples of FIGS. 1 and 2 show encoding systems that can operate in software and in hardware, respectively. FIGS. 3 and 4 below show the comparable decoding systems, which can operate in software and in hardware, respectively.

FIG. 3 shows a block diagram of an example of a decoding system, according to some embodiments. The configuration of FIG. 3 is only one example of a decoding system; other suitable configurations can also be used.

The decoding system 300 can receive a bitstream 302 as input and can output a decoded digital audio signal 304. The input signal 302 and the output signal 304 can each include one or more discrete files saved locally or on an accessible server, and/or one or more audio streams generated locally or on an accessible server.

The decoding system 300 can include a processor 306. The decoding system 300 can further include a memory device 308 storing instructions 310 executable by the processor 306. The processor 306 can execute the instructions 310 to perform a method for decoding an audio signal. An example of such a method is explained in detail below.

In the configuration of FIG. 3, the decoding is executed in software, typically by a processor that can also perform additional tasks in a computing device. Alternatively, the decoding can be executed in hardware, such as by a dedicated chip or dedicated processor that is hard-wired to perform the decoding. An example of such a hardware-based decoder is shown in FIG. 4.

FIG. 4 shows a block diagram of another example of a decoding system 400, according to some embodiments. The configuration of FIG. 4 is only one example of a decoding system; other suitable configurations can also be used.

The decoding system 400 can receive a bitstream 402 as input and can output a decoded digital audio signal 404. The decoding system 400 can include a dedicated decoding processor 406, which can include a chip that is hard-wired to perform a particular decoding method. An example of such a method for decoding an audio signal is explained in detail below.

FIG. 5 shows several of the quantities involved in encoding a digital audio signal, according to some embodiments. Decoding a bitstream generally involves the same quantities as encoding the bitstream, but with the mathematical operations performed in reverse. The quantities shown in FIG. 5 are only examples of such quantities; other suitable quantities can also be used. Each of the quantities shown in FIG. 5 can be used with any of the encoders or decoders shown in FIGS. 1 to 4.

The encoder can receive a digital audio signal 502. The digital audio signal 502 is in the time domain and can include a sequence of integers or floating-point numbers that represent the amplitude of the audio signal as it evolves over time. The digital audio signal 502 can be in the form of a stream (e.g., with no specified beginning and/or end), such as a live feed from a studio. Alternatively, the digital audio signal 502 can be a discrete file (e.g., having a beginning, an end, and a specified duration), such as an audio file on a server, an uncompressed audio file ripped from a compact disc, or a mixdown file of a song in an uncompressed format.

The encoder can parse the digital audio signal 502 into a plurality of frames 504, where each frame 504 includes a specified number of audio samples 506. For example, a frame 504 can include 1024 samples 506, or another suitable value. In general, grouping the digital audio signal 502 into frames 504 lets the encoder apply its processing efficiently to a well-defined number of samples 506. In some embodiments, such processing can vary from frame to frame, so that each frame is processed independently of the other frames.
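The framing step can be sketched in a few lines of Python. This is a minimal illustration, not the encoder's actual implementation: the function name `split_into_frames` is hypothetical, and zero-padding the last partial frame is an assumption, since the text does not say how a trailing partial frame is handled.

```python
def split_into_frames(samples, frame_size=1024):
    """Split a sequence of samples into consecutive frames of frame_size samples.

    Assumption: a trailing partial frame is zero-padded to full length.
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0] * (frame_size - len(frame)))  # pad only the last frame
        frames.append(frame)
    return frames


signal = list(range(2500))       # stand-in for time-domain audio samples
frames = split_into_frames(signal)
print(len(frames), len(frames[0]))
```

With 2500 input samples and the 1024-sample frame size mentioned above, the sketch yields three frames, the last of which is mostly padding.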

The encoder can perform a transform 508 of the audio samples 506 of each frame 504. In some embodiments, the transform can be a modified discrete cosine transform. Other suitable transforms, such as Fourier or Laplace transforms, can also be used. The transform 508 converts time-domain quantities, such as the samples 506 in a frame 504, into frequency-domain quantities, such as the frequency-domain coefficients 510 for the frame 504. The transform 508 can produce a plurality of frequency-domain coefficients 510 for each frame 504. In some embodiments, the number of frequency-domain coefficients 510 produced by the transform 508 can equal the number of samples 506 in the frame, such as 1024. The frequency-domain coefficients 510 describe how much signal at a particular frequency is present in the frame.

In some embodiments, a time-domain frame can be further subdivided into sub-blocks of contiguous samples, and a transform can be applied to each sub-block. For example, a frame of 1024 samples can be divided into eight sub-blocks of 128 samples each, and each such sub-block can be transformed into a block of 128 frequency coefficients. When the frame is divided into sub-blocks, the transform may be referred to as a short transform. When the frame is not divided into sub-blocks, the transform may be referred to as a long transform.

The encoder can partition the plurality of frequency-domain coefficients 510 for each frame 504 into a plurality of bands 512 for each frame 504. In some embodiments, there can be 22 bands 512 per frame, although other values can also be used. Each band 512 represents a range of frequencies 510 in the frame 504, such that the concatenation of all the frequency ranges includes all the frequencies represented in the frame 504. When a short transform is used, each resulting block of frequency coefficients can be partitioned into the same number of bands, which can correspond one-to-one with the bands used for the long transform. In that case, the number of coefficients of a given band in a block is proportionally smaller than the number of coefficients of that band in the long transform. For example, a frame can be divided into eight sub-blocks, so that a band in a short-transform block has one-eighth the number of coefficients of the corresponding band in the long transform. A band in the long transform can have 32 coefficients; in the short transform, the same band can have four coefficients in each of the eight frequency blocks. A band in the short transform can be associated with an 8x4 matrix, having a resolution of eight in the time domain and four in the frequency domain. A band in the long transform can be associated with a 1x32 matrix, having a resolution of one in the time domain and 32 in the frequency domain. Accordingly, each band 512 can include a reshaping parameter 518 that represents a time resolution 514 and a frequency resolution 516. In some embodiments, the reshaping parameter 518 can represent the time resolution 514 and the frequency resolution 516 by providing the value of a change from default values of the time resolution 514 and the frequency resolution 516.
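The relationship between the long-transform and short-transform layouts of a band can be checked with a small sketch. The helper name `band_shape` is hypothetical; it only reproduces the arithmetic of the 1x32 and 8x4 example above and assumes the band's coefficient count divides evenly by the number of sub-blocks.

```python
def band_shape(long_band_coeffs, num_subblocks):
    """Return (time_resolution, frequency_resolution) for one band.

    long_band_coeffs: coefficients the band has in the long transform.
    num_subblocks: 1 for a long transform, e.g. 8 for a short transform.
    """
    if long_band_coeffs % num_subblocks != 0:
        raise ValueError("band size must divide evenly across sub-blocks")
    return (num_subblocks, long_band_coeffs // num_subblocks)


long_shape = band_shape(32, 1)    # one time slot, 32 frequency coefficients
short_shape = band_shape(32, 8)   # eight time slots, 4 coefficients each
print(long_shape, short_shape)
```

In both layouts the product of the two resolutions is the same 32 coefficients; only their split between time and frequency differs.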

In general, the goal of a codec is to ensure that the frequency-domain representation of a particular frame represents the time-domain representation of that frame as accurately as possible, using a limited amount of data that is governed by the particular data rate or bit rate of the encoded file. For example, data rates can include 1411 kbps (kilobits per second), 320 kbps, 256 kbps, 192 kbps, 160 kbps, 128 kbps, or other values. In general, the higher the data rate, the more accurate the representation of the frame.
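To make the "limited amount of data" concrete, the sketch below converts a data rate into an average bit budget per frame. The 44.1 kHz sample rate is an assumption (the text does not state one), chosen because 1411 kbps matches uncompressed CD audio at 44,100 samples per second, 16 bits per sample, and two channels. The function name `bits_per_frame` is hypothetical.

```python
def bits_per_frame(bit_rate_bps, frame_size=1024, sample_rate_hz=44100):
    """Average number of bits available to encode one frame.

    Assumption: sample_rate_hz defaults to 44.1 kHz (CD audio).
    """
    return bit_rate_bps * frame_size / sample_rate_hz


cd_rate_bps = 44100 * 16 * 2      # uncompressed stereo CD audio, about 1411 kbps
for rate_kbps in (320, 256, 192, 160, 128):
    print(rate_kbps, round(bits_per_frame(rate_kbps * 1000)))
```

Under these assumptions a 128 kbps stream leaves roughly 3000 bits for each 1024-sample frame, which is why every bit spent on side information such as reshaping parameters matters.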

In pursuing the goal of increasing accuracy with only a limited data rate, a codec can trade off between time resolution and frequency resolution for each band. For example, the codec can double the time resolution of a particular band while halving the frequency resolution of that band. Performing such an operation (e.g., exchanging time resolution for frequency resolution, or vice versa) may be referred to as reshaping the time-frequency structure of the band. In general, after the initial transform all bands can have the same time resolution. Because the time-frequency structure of one band in a frame can be unrelated to the time-frequency structure of the other bands in the frame, each band can be reshaped independently of the other bands.

In some embodiments, each band can have a size equal to the product of the time resolution 514 of the band and the frequency resolution 516 of the band. In some embodiments, the time resolution 514 of one band can equal eight audio samples, and the time resolution 514 of another band can equal one audio sample. Other suitable time resolutions 514 can also be used.

In some embodiments, the encoder can adjust the time resolution 514 and the frequency resolution 516 of each band of each frame in a complementary manner, without changing the size of the band (e.g., without changing the product of the time resolution 514 and the frequency resolution 516). The encoder can quantify this adjustment with the reshaping parameter.

The reshaping parameter can be a selected integer. For example, if the reshaping parameter is 3, the time resolution can be multiplied by 2^3 and the frequency resolution can be multiplied by 2^-3. Other suitable integers can be used, including positive integers (meaning that the time resolution 514 increases and the frequency resolution 516 decreases), negative integers (meaning that the time resolution decreases and the frequency resolution increases), and zero (meaning that the time resolution 514 and the frequency resolution 516 are unchanged, e.g., multiplied by 2^0).
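The effect of a reshaping parameter on a band can be sketched as follows. The function name `reshape_band` is hypothetical, and rejecting parameters that would produce a fractional resolution is an assumption about how out-of-range values would be treated.

```python
def reshape_band(time_res, freq_res, param):
    """Multiply the time resolution by 2**param and the frequency
    resolution by 2**-param, keeping their product (the band size) fixed.

    Assumption: a parameter that would yield a non-integer resolution
    is treated as invalid.
    """
    factor = 2 ** abs(param)
    if param >= 0:
        if freq_res % factor != 0:
            raise ValueError("frequency resolution cannot be reduced that far")
        return time_res * factor, freq_res // factor
    if time_res % factor != 0:
        raise ValueError("time resolution cannot be reduced that far")
    return time_res // factor, freq_res * factor


print(reshape_band(1, 32, 3))    # long-transform band pushed toward time resolution
print(reshape_band(8, 4, -3))    # short-transform band pushed toward frequency resolution
print(reshape_band(8, 4, 0))     # parameter 0 leaves the band unchanged
```

A parameter of 3 turns a 1x32 band into an 8x4 band, and a parameter of -3 reverses that, so the band always holds the same 32 coefficients.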

In some embodiments, the allowed values of the reshaping parameter can be limited to a finite set of integers. As a specific example, the allowed reshaping parameter values can include 0, 1, 2, and 3, for a total of four integers. As another specific example, the allowed values can include 0, 1, 2, 3, and 4, for a total of five integers. As another specific example, the allowed values can include 0, -1, -2, -3, and -4, for a total of five integers. As another specific example, the allowed values can include 0, -1, -2, and -3, for a total of four integers. The term used to describe these specified integer ranges is alphabet size. Specifically, the alphabet size for a range of integers is the number of allowed values in the range. In the four examples above, the alphabet size is 4 or 5.

In some embodiments, a single frame can include one or more bands whose reshaping parameters are encoded using a first alphabet size, and can further include one or more bands whose reshaping parameters are encoded using a second alphabet size different from the first alphabet size. Using different alphabet sizes in this manner can allow for more compact compression of the bitstream.
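A rough way to see why smaller alphabets help is to count the bits a plain fixed-length code would need per band. This is only an illustration: the fixed-length code and the 11/11 split of bands are assumptions made for the example, not the coding scheme of the source, and the function names are hypothetical.

```python
def fixed_length_bits(alphabet_size):
    """Bits needed to give every symbol of the alphabet a same-length codeword."""
    return (alphabet_size - 1).bit_length()


def total_bits(alphabet_sizes):
    """Total bits for one frame when each band uses its own alphabet size."""
    return sum(fixed_length_bits(n) for n in alphabet_sizes)


uniform = [5] * 22               # every band allows five reshaping values
mixed = [4] * 11 + [5] * 11      # assumed split: half the bands allow only four
print(total_bits(uniform), total_bits(mixed))
```

Under these assumptions, a band with alphabet size 4 needs 2 bits where a band with alphabet size 5 needs 3, so letting some bands use the smaller alphabet shortens the side information for the frame.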

The encoder can encode data representing the reshaping parameter for each band into the bitstream. Encoding the reshaping parameters into the bitstream lets the decoder reverse the time/frequency reshaping before applying the inverse transform. One straightforward approach is to form a reshaping sequence for each frame, in which each element is the reshaping parameter for one band in the frame. For a frame having 22 bands, this approach produces a reshaping sequence of 22 reshaping parameters. The reshaping sequence for each frame can describe the reshaping parameter for each band. In some embodiments, the encoder can normalize each entry in each reshaping sequence to a range of possible values for that entry, where each range of possible values corresponds to the specified range of the reshaping parameter for the band.

As an improvement over this straightforward approach, the encoder can reduce the size of the data needed to fully describe these 22 integers. In the improved approach, the encoder can compute the lengths of four sequences (e.g., the number of bits or integers in each of the four sequences), select the shortest of the four, and embed data representing that shortest sequence into the bitstream. The shortest sequence is the one that includes the fewest bits, i.e., the one that describes the 22 integers most compactly. The four sequences are described below.

The encoder can form a first sequence for each frame that describes the reshaping parameters for the frame, using a unary code, as a sequence representing the reshaping parameter for each band. The encoder can form a second sequence for each frame that describes the reshaping parameters for the frame, using a quasi-uniform code, as a sequence representing the reshaping parameter for each band. The encoder can form a third sequence for each frame that describes the reshaping parameters for the frame, using a unary code, as a sequence representing the differences in reshaping parameters between adjacent bands. The encoder can form a fourth sequence for each frame that describes the reshaping parameters for the frame, using a quasi-uniform code, as a sequence representing the differences in reshaping parameters between adjacent bands.

The encoder can select the shortest of the first sequence, the second sequence, the third sequence, and the fourth sequence. For each frame, the encoder can embed the selected shortest sequence into the bitstream. For each frame, the encoder can further embed into the bitstream data representing an indicator that identifies which of the four sequences is included in the bitstream.
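The four-candidate selection described above can be sketched as follows. Several details are assumptions made for illustration and are not taken from the source: the unary code writes a value as that many 1 bits followed by a 0; the quasi-uniform code is implemented as a truncated binary code; parameters are given as non-negative indices into each band's alphabet; differences between adjacent bands are taken modulo the current band's alphabet size (with the first band differenced against 0); and the indicator is a 2-bit prefix. All function names are hypothetical.

```python
def unary_code(value):
    """Assumed unary code: `value` ones followed by a terminating zero."""
    return "1" * value + "0"


def quasi_uniform_code(value, alphabet_size):
    """Assumed quasi-uniform (truncated binary) code for 0 <= value < alphabet_size."""
    if not 0 <= value < alphabet_size:
        raise ValueError("value outside alphabet")
    k = alphabet_size.bit_length() - 1           # floor(log2(alphabet_size))
    short_count = 2 ** (k + 1) - alphabet_size   # symbols that get k-bit codewords
    if value < short_count:
        return format(value, "b").zfill(k) if k else ""
    return format(value + short_count, "b").zfill(k + 1)


def candidate_sequences(params, alphabet_sizes):
    """Build the four candidate bit strings for one frame."""
    diffs, prev = [], 0
    for p, n in zip(params, alphabet_sizes):
        diffs.append((p - prev) % n)             # assumed modular difference
        prev = p
    pairs = list(zip(params, alphabet_sizes))
    diff_pairs = list(zip(diffs, alphabet_sizes))
    return [
        "".join(unary_code(p) for p, _ in pairs),                  # first sequence
        "".join(quasi_uniform_code(p, n) for p, n in pairs),       # second sequence
        "".join(unary_code(d) for d, _ in diff_pairs),             # third sequence
        "".join(quasi_uniform_code(d, n) for d, n in diff_pairs),  # fourth sequence
    ]


def encode_reshaping_params(params, alphabet_sizes):
    """Pick the shortest candidate and prefix it with a 2-bit indicator."""
    candidates = candidate_sequences(params, alphabet_sizes)
    index = min(range(4), key=lambda i: len(candidates[i]))
    return format(index, "02b") + candidates[index]


# Two alphabet sizes within one frame: four bands with size 4, two with size 5.
lengths = [len(c) for c in candidate_sequences([3, 3, 3, 3, 2, 2], [4, 4, 4, 4, 5, 5])]
print(lengths)
print(encode_reshaping_params([2] * 22, [4] * 22))
```

In the first example the direct quasi-uniform sequence is shortest; in the second, where all 22 bands share the same parameter, the differential unary sequence wins because every difference after the first is zero and costs a single bit.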

The appendix below provides rigorous mathematical definitions of the quantities discussed above.

図６は、幾つかの実施例による、オーディオ信号を符号化するための方法６００の実施例のフローチャートを示している。方法６００は、図１又は図２の符号化システム１００又は２００によって、或いは任意の他の好適な符号化システムによって実行することができる。方法６００は、オーディオ信号を符号化するための方法の一例に過ぎず、他の好適な符号化方法が、同様に使用できる。 FIG. 6 depicts a flowchart of an embodiment of a method 600 for encoding an audio signal, according to some embodiments. Method 600 may be performed by encoding system 100 or 200 of FIG. 1 or 2, or by any other suitable encoding system. Method 600 is only one example of a method for encoding an audio signal; other suitable encoding methods can be used as well.

At operation 602, the encoding system can receive a digital audio signal.

At operation 604, the encoding system can parse the digital audio signal into a plurality of frames, each frame including a specified number of audio samples.

At operation 606, the encoding system can perform a transform of the audio samples of each frame to produce a plurality of frequency-domain coefficients for each frame.

At operation 608, the encoding system can divide the plurality of frequency-domain coefficients for each frame into a plurality of bands for each frame, each band having a reshaping parameter that represents a time resolution and a frequency resolution.

At operation 610, the encoding system can encode the digital audio signal into a bitstream that includes the reshaping parameters. The reshaping parameter for a first band can be encoded using a first alphabet size. The reshaping parameter for a second band, different from the first band, can be encoded using a second alphabet size different from the first alphabet size.

At operation 612, the encoding system can output the bitstream.
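Operations 604 and 608 can be illustrated with a minimal sketch. The function names, the zero-padding of a trailing partial frame, and the use of non-overlapping frames are assumptions made for illustration only; the transform of operation 606 is not shown, and an actual MDCT-based codec would typically use overlapping windows.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_size: int) -> List[List[float]]:
    """Parse a signal into consecutive frames of frame_size samples (operation 604).

    Assumption: frames do not overlap and a trailing partial frame is zero-padded.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame += [0.0] * (frame_size - len(frame))  # pad the last frame if short
        frames.append(frame)
    return frames


def split_into_bands(coeffs: Sequence[float], band_sizes: Sequence[int]) -> List[List[float]]:
    """Divide one frame's frequency-domain coefficients into bands (operation 608)."""
    if sum(band_sizes) != len(coeffs):
        raise ValueError("band sizes must cover all coefficients exactly")
    bands, pos = [], 0
    for size in band_sizes:
        bands.append(list(coeffs[pos:pos + size]))
        pos += size
    return bands
```

Each band returned by `split_into_bands` would then carry its own reshaping parameter, which is what operation 610 writes into the bitstream.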

FIG. 7 shows a flowchart of an example of a method 700 for decoding an encoded audio signal, according to some embodiments. Method 700 can be executed by the decoding system 300 or 400 of FIG. 3 or 4, or by any other suitable decoding system. Method 700 is only one example of a method for decoding an encoded audio signal; other suitable decoding methods can also be used.

動作７０２において、復号システムは、複数の帯域に各々が分割された複数のフレームを含むビットストリームを受け取ることができる。 At operation 702, a decoding system can receive a bitstream that includes multiple frames, each divided into multiple bands.

動作７０４において、復号システムは、各フレームの各帯域に対して、ビットストリームから再成形パラメータを抽出することができ、この再成形パラメータは、帯域に関する時間分解能及び周波数分解能を表す。第１の帯域に関する再形成パラメータは、第１のアルファベットサイズを使用して、ビットストリームに埋め込むことができる。第１の帯域と異なる第２の帯域に関する再形成パラメータは、第１のアルファベットサイズと異なる第２のアルファベットサイズを使用して、ビットストリームに埋め込むことができる。 At operation 704, the decoding system may extract reshaping parameters from the bitstream for each band of each frame, where the reshaping parameters represent temporal and frequency resolution for the band. Reshaping parameters for the first band may be embedded in the bitstream using a first alphabet size. Reshaping parameters for a second band different from the first band may be embedded in the bitstream using a second alphabet size different from the first alphabet size.

動作７０６において、復号システムは、再形成パラメータを使用してビットストリームを復号して、復号されたデジタルオーディオ信号を生成することができる。 At operation 706, the decoding system may decode the bitstream using the reshaping parameters to generate a decoded digital audio signal.

図１２は、幾つかの実施例による符号化システム１２００の１つの実施例のブロック図を示している。 FIG. 12 depicts a block diagram of one embodiment of an encoding system 1200, according to some embodiments.

受信器回路１２０２は、デジタルオーディオ信号を受け取ることができる。 Receiver circuit 1202 can receive digital audio signals.

Framer circuit 1204 can parse the digital audio signal into a plurality of frames, each frame including a specified number of audio samples.

変換器回路１２０６は、各フレームのオーディオサンプルの変換を行って、各フレームに関する複数の周波数領域係数を生成することができる。 Transformer circuit 1206 may perform a transform of the audio samples of each frame to generate a plurality of frequency domain coefficients for each frame.

Frequency band divider circuit 1208 can divide the plurality of frequency-domain coefficients for each frame into a plurality of bands for each frame, each band having a reshaping parameter that represents a time resolution and a frequency resolution.

Encoder circuit 1210 can encode the digital audio signal into a bitstream that includes the reshaping parameter for each band. The reshaping parameter for a first band can be encoded using a first alphabet size. The reshaping parameter for a second band, different from the first band, can be encoded using a second alphabet size different from the first alphabet size.

出力回路１２１２は、ビットストリームを出力することができる。 The output circuit 1212 can output a bitstream.

Many other variations beyond those described herein will be apparent from this document. For example, depending on the embodiment, certain acts, events, or functions of any of the methods and algorithms described herein can be performed in a different sequence, or can be added, merged, or left out altogether (so that not all described acts or events are necessary for the practice of the methods and algorithms). Moreover, in certain embodiments, acts or events can be performed concurrently rather than sequentially, for example through multi-threaded processing, interrupt processing, multiple processors or processor cores, or other parallel architectures. In addition, different tasks or processes can be performed by different machines and computing systems that can function together.

The various illustrative logical blocks, modules, methods, and algorithm processes and sequences described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, and process actions have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and on the design constraints imposed on the overall system. The described functionality can be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of this document.

The various illustrative logical blocks and modules described in connection with the embodiments disclosed herein can be implemented or performed by a machine, such as a general purpose processor, a processing device, a computing device having one or more processing devices, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor and processing device can be a microprocessor, but in the alternative, the processor can be a controller, microcontroller, state machine, combinations thereof, or the like. A processor can also be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

Embodiments of the systems and methods described herein are operational within numerous types of general purpose or special purpose computing system environments or configurations. In general, a computing environment can include any type of computer system, including, but not limited to, a computer system based on one or more microprocessors, a mainframe computer, a digital signal processor, a portable computing device, a personal organizer, a device controller, a computational engine within an appliance, a mobile phone, a desktop computer, a mobile computer, a tablet computer, a smartphone, and appliances with an embedded computer, to name a few.

Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers, server computers, hand-held computing devices, laptop or mobile computers, communications devices such as cell phones and PDAs, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, and so forth. In some embodiments, the computing devices will include one or more processors. Each processor can be a specialized microprocessor, such as a digital signal processor (DSP), a very long instruction word (VLIW) processor, or other microcontroller, or can be a conventional central processing unit (CPU) having one or more processing cores, including specialized graphics processing unit (GPU)-based cores in a multi-core CPU.

The process actions of a method, process, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in any combination of the two. The software module can be contained in computer-readable media that can be accessed by a computing device. The computer-readable media include both volatile and nonvolatile media that are removable, non-removable, or some combination thereof. The computer-readable media are used to store information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example, and not limitation, computer-readable media can comprise computer storage media and communication media.

Computer storage media include, but are not limited to, computer-readable or machine-readable media or storage devices such as Bluray® discs (BD), digital versatile discs (DVDs), compact discs (CDs), floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM memory, ROM memory, EPROM memory, EEPROM memory, flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other device that can be used to store the desired information and that can be accessed by one or more computing devices.

ソフトウェアは、ＲＡＭメモリ、フラッシュメモリ、ＲＯＭメモリ、ＥＰＲＯＭメモリ、ＥＥＰＲＯＭメモリ、レジスタ、ハードディスク、取り外し可能ディスク、ＣＤＲＯＭ、或いは当該技術で公知の非一時的コンピュータ可読ストレージ媒体、メディア、又は物理コンピュータストレージの何らかの他の形態に存在することができる。例示的なストレージ媒体は、プロセッサがストレージ媒体から情報を読み出してそれに情報を書き込むことができるように、プロセッサに結合することができる。代替例では、ストレージ媒体は、プロセッサと一体化することができる。プロセッサ及びストレージ媒体は、特定用途向け集積回路（ＡＳＩＣ）に存在することができる。ＡＳＩＣは、ユーザ端末内に存在することができる。代替的に、プロセッサ及びストレージ媒体は、ユーザ端末内の個別コンポーネントとして存在することができる。 The software may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CDROM, or any other type of non-transitory computer readable storage medium, media, or physical computer storage known in the art. Can exist in other forms. An exemplary storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. A processor and a storage medium may reside on an application specific integrated circuit (ASIC). The ASIC may reside within the user terminal. Alternatively, the processor and the storage medium can reside as separate components within a user terminal.

The phrase "non-transitory" as used in this document means "enduring or long-lived". The phrase "non-transitory computer-readable media" includes any and all computer-readable media, with the sole exception of a transitory, propagating signal. This includes, by way of example and not limitation, non-transitory computer-readable media such as register memory, processor cache, and random-access memory (RAM).

The phrase "audio signal" refers to a signal that is representative of a physical sound.

Retention of information such as computer-readable or computer-executable instructions, data structures, program modules, and so forth can also be accomplished by using a variety of communication media to encode one or more modulated data signals, electromagnetic waves (such as carrier waves), or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. In general, these communication media refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information or instructions in the signal. For example, communication media include wired media, such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media, such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting, receiving, or both transmitting and receiving one or more modulated data signals or electromagnetic waves. Combinations of any of the above should also be included within the scope of communication media.

更に、本明細書に記載の符号化及び復号システム及び方法の様々な実施形態の一部又は全部を具現化するソフトウェア、プログラム、コンピュータプログラム製品のうちの１つ又は何れかの組み合わせ、或いはこれの一部分は、コンピュータ実行可能命令又は他のデータ構造の形式で、コンピュータ可読媒体又はマシン可読媒体又はストレージデバイス及び通信媒体の任意の所望の組み合わせに格納、受信、送信、又はこれらから読み出すことができる。 Additionally, any one or any combination of software, programs, computer program products, or combinations thereof, embodying some or all of the various embodiments of the encoding and decoding systems and methods described herein. The portions may be stored on, received from, transmitted to, or read from computer-readable or machine-readable media or any desired combination of storage devices and communication media in the form of computer-executable instructions or other data structures.

本明細書に記載のシステム及び方法の実施形態は更に、コンピューティングデバイスによって実行されるプログラムモジュールなどのコンピュータ実行可能命令という一般的状況で説明することができる。一般に、プログラムモジュールは、特定のタスクを実行するか又は特定の抽象データタイプを実装する、ルーチン、プログラム、オブジェクト、コンポーネント、データ構造などを含む。また、本明細書に記載の実施形態は、１又は２以上のリモート処理デバイスによって、又は１又は２以上のデバイスからなるクラウド内でタスクが実行される分散コンピューティング環境で実施することもでき、これらのデバイスは、１又は２以上の通信ネットワークを介してリンクされている。分散コンピューティング環境では、プログラムモジュールは、メディアストレージデバイスを含む、ローカル及びリモート両方のコンピュータストレージ媒体内に配置することができる。更に、上述した命令は、プロセッサを含むことがあるか又はプロセッサを含まないこともあるハードウェア論理回路として部分的に又は全体的に実装することができる。 Embodiments of the systems and methods described herein may also be described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Embodiments described herein can also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices or in a cloud of one or more devices. These devices are linked via one or more communication networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Furthermore, the instructions described above may be implemented in part or in whole as hardware logic circuitry that may or may not include a processor.

Conditional language used herein, such as, among others, "can," "might," "may," "e.g.," and the like, unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or states. Thus, such conditional language is not generally intended to imply that features, elements, and/or states are in any way required for one or more embodiments, or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements, and/or states are included or are to be performed in any particular embodiment. The terms "comprising," "including," "having," and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term "or" is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term "or" means one, some, or all of the elements in the list.

上記の詳細な説明は、様々な実施形態に適用される新規性のある特徴を示し、説明し、指摘しているが、本開示の趣旨から逸脱することなく、様々な省略、置換、及び変更が、例証されたデバイス又はアルゴリズムの形態及び詳細において実施できることが理解されるであろう。認識されるように、一部の特徴は、他の特徴から切り離して使用又は実施することができるので、本明細書で説明する本発明の特定の実施形態は、本明細書に示した特徴及び利点の全てを提供するとは限らない形態の範囲内で具現化することができる。 While the foregoing detailed description illustrates, describes, and points out novel features that apply to various embodiments, various omissions, substitutions, and changes may be made without departing from the spirit of this disclosure. It will be understood that the methods may be implemented in the form and details of the devices or algorithms illustrated. As will be appreciated, some features may be used or implemented separately from other features, so that certain embodiments of the invention described herein may be modified to include the features and features illustrated herein. It may be implemented within a range of forms that do not necessarily provide all of the advantages.

Moreover, although the subject matter has been described in language specific to structural features and methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

（付録） (appendix)

Embodiments of the time-frequency change sequence codec and method described herein include techniques for efficiently encoding and decoding the sequences that describe time-frequency reshaping. Embodiments of the codec and method address the efficient encoding and decoding of sequences over heterogeneous alphabets.

Some codecs produce sequences that are much more complex than those typically used in existing codecs. This complexity arises because these sequences describe a richer set of possible time-frequency reshaping transforms. In some embodiments, the complexity stems from the fact that the elements of a sequence can be drawn from four different alphabets, of different sizes or ranges, depending on the coordinate and on the context in which the audio frame is processed. A naive encoding of these sequences is costly and would negate the benefit of the richer set.

Embodiments of the codec and method describe a highly efficient approach that enables uniform treatment of the heterogeneous alphabets through various alphabet transformations and that optimizes the coding parameters to obtain the shortest possible description. Some features of embodiments of the codec and method include uniform treatment of heterogeneous alphabets, the definition of multiple coding modes, and the selection of the mode that minimizes the code length. These features are part of what provides some of the advantages of embodiments of the codec and method, including enabling the use of a richer set of time-frequency transforms.

セクション１：シーケンスの定義 Section 1: Sequence definition

The modified discrete cosine transform (MDCT) engine currently operates in two modes: the long transform (the default, used for most frames) and the short transform (used for frames deemed to contain transients). If the number of MDCT coefficients in a given band is N, then in long transform mode these coefficients are organized as one time slot containing N frequency slots (1×N). In short transform mode, the coefficients are organized as eight time slots, each containing N/8 frequency slots (8×N/8).

A time-frequency change sequence, or vector, is a sequence of integers, one per band, up to the number of effective bands in the frame. Each integer indicates how the original time/frequency structure defined by the transform is changed for the corresponding band. If the original structure of a band is $T \times F$ (T time slots, F frequency slots) and the change value is c, then applying the appropriate local transform changes the structure to $2^{c}T \times 2^{-c}F$. The range of allowed values of c is determined by integer constraints that depend on whether the original mode is the long or the short transform and on the size of the band, as well as by a limit on the number of supported time-frequency configurations.
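As a worked illustration of the structure change, the following sketch computes the reshaped slot counts for a given change value. The function name and the explicit divisibility check are assumptions for illustration; the text only states that integer constraints limit the allowed values of c.

```python
from typing import Tuple


def reshape_structure(t_slots: int, f_slots: int, c: int) -> Tuple[int, int]:
    """Return (2**c * T, 2**-c * F) for an original T x F band structure."""
    if c >= 0:
        factor = 1 << c
        if f_slots % factor:
            raise ValueError("F is not divisible by 2**c")
        return t_slots * factor, f_slots // factor
    factor = 1 << -c
    if t_slots % factor:
        raise ValueError("T is not divisible by 2**-c")
    return t_slots // factor, f_slots * factor
```

For example, a long-transform band with 16 coefficients (1×16) and c = 2 becomes 4×4, while a short-transform band organized as 8×2 with c = −3 becomes 1×16. The total number of coefficients in the band is unchanged in both cases.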

帯域は、そのサイズが１６ＭＤＣＴビンより小さい場合に、狭帯域と呼ばれる。それ以外の場合には、帯域は、広帯域と呼ばれる。全ての帯域サイズは、８の倍数とすることができ、現在の実装形態では、４８ｋＨｚのサンプリングレートにおいて、０から７で番号付けされた帯域は、狭帯域であり、８から２１で番号付けされた帯域は、広帯域とすることができ、４４ｋＨｚのサンプリングレートでは、０から５で番号付けされた帯域は、狭帯域であり、６から２１で番号付けされた帯域は、広帯域とすることができる。 A band is called narrowband if its size is smaller than 16 MDCT bins. Otherwise, the band is called wideband. All band sizes can be multiples of 8, and in the current implementation, at a sampling rate of 48 kHz, bands numbered 0 to 7 are narrow bands and bands numbered 8 to 21 are narrow bands. The bands numbered from 0 to 5 may be narrow bands and the bands numbered from 6 to 21 may be wide bands at a sampling rate of 44 kHz. .

次の段落は、長変換対短変換と狭帯域対広帯域との全ての組み合わせに対して可能性のある変更値ｃのセットを示している。 The next paragraph shows a set of possible changes c for all combinations of long versus short transforms and narrowband versus wideband.

狭帯域かつ長変換の場合、｛０、１、２、３｝である。 For narrowband and long transforms, it is {0, 1, 2, 3}.

広帯域かつ長変換の場合、｛０、１、２、３、４｝である。 For wideband and long transforms, it is {0, 1, 2, 3, 4}.

狭帯域かつ短変換の場合、｛－３、－２、－１、０｝である。 For narrowband and short transforms, it is {-3, -2, -1, 0}.

広帯域かつ短変換の場合、｛－３、－２、－１、０、１｝である。 For wideband and short transforms, it is {-3, -2, -1, 0, 1}.
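The four cases above can be collected in a small lookup, sketched here under the assumption that a band is classified as narrow when it has fewer than 16 MDCT bins, as stated earlier. The function and constant names are hypothetical.

```python
NARROW_BAND_LIMIT = 16  # bands smaller than this many MDCT bins are narrow


def allowed_changes(band_size: int, is_long: bool) -> range:
    """Return the set of permitted change values c for one band."""
    is_narrow = band_size < NARROW_BAND_LIMIT
    if is_long:
        return range(0, 4) if is_narrow else range(0, 5)
    return range(-3, 1) if is_narrow else range(-3, 2)
```

In every case the alphabet has 4 symbols for a narrow band and 5 for a wide band; only the offset of the range differs between the long and short transforms.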

セクション２：シーケンス符号化 Section 2: Sequence encoding

セクション２．１：基本要素 Section 2.1: Basic elements

The input to the encoding process is a sequence or vector $c = [c_0, c_1, \ldots, c_{M-1}]$, where M is the number of effective bands and each value $c_i$ lies in the appropriate range listed above.

From the sequence c, a first-difference sequence or vector $d = [d_0, d_1, \ldots, d_{M-1}]$ can be derived, where $d_0 = c_0$ and $d_i = c_i - c_{i-1}$ for $0 < i < M$. An encoding parameter d is also defined; it signals which sequence is encoded into the bitstream: the sequence c if the parameter d is 0, and the difference sequence d if the parameter d is 1. How the parameter d is determined is described below.
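The difference sequence can be computed in one pass, as in this minimal sketch (the function name is an assumption for illustration):

```python
from typing import List, Sequence


def difference_sequence(c: Sequence[int]) -> List[int]:
    """Return d with d[0] = c[0] and d[i] = c[i] - c[i-1] for 0 < i < M."""
    return [c[i] if i == 0 else c[i] - c[i - 1] for i in range(len(c))]
```

For instance, c = [2, 2, 3, 1] yields the difference sequence [2, 0, 1, −2]. When neighboring bands tend to share the same change value, the difference sequence contains many zeros, which is presumably why it is offered as an alternative to coding c directly.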

Given a sequence or vector $s = [s_0, s_1, \ldots, s_{M-1}]$, which can be either the sequence c or the sequence d, the following quantities are defined in order to encode it.

量ｈｅａｄ（ｓ）は、最初の座標から最後の非ゼロ座標まで延びるシーケンスｓのサブシーケンスの長さである。このサブシーケンスは、ｓのヘッドと呼ばれる。シーケンスｓが全てゼロのシーケンスである場合でその場合にのみ、ｈｅａｄ（ｓ）＝０であることに留意されたい。 The quantity head(s) is the length of the subsequence of the sequence s extending from the first coordinate to the last non-zero coordinate. This subsequence is called the head of s. Note that head(s)=0 if and only if the sequence s is a sequence of all zeros.
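The definition of head(s) translates directly into a backward scan for the last nonzero coordinate. The function name below is an assumption for illustration.

```python
from typing import Sequence


def head_length(s: Sequence[int]) -> int:
    """Length of the prefix of s that ends at its last nonzero coordinate."""
    for i in range(len(s) - 1, -1, -1):
        if s[i] != 0:
            return i + 1
    return 0  # s is empty or all zeros
```

Only the first `head_length(s)` symbols need to be written to the bitstream, since every coordinate after the head is zero.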

The quantity head(s) is encoded as follows. If head(s) equals zero, the encoder writes a zero bit and stops. In this case the zero bit represents the entire all-zero reshaping vector, so no further encoding is needed. If head(s) is greater than zero, the encoder encodes the quantity head(s)−1 using a quasi-uniform code over an alphabet of size M.

A quasi-uniform code over an alphabet of size α encodes the integers in {0, 1, ..., α−1} using either $L_1 = \lfloor \log_2 \alpha \rfloor$ bits or $L_2 = \lceil \log_2 \alpha \rceil$ bits, as follows.

Symbols x with $0 \le x < n_1$ are encoded by their binary representation in $L_1$ bits.

Symbols x with $n_1 \le x < n_1 + n_2$ are encoded by the binary representation of $x + n_1$ in $L_2$ bits.
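A quasi-uniform code of this kind can be sketched as follows, taking $L_1$ as the floor and $L_2$ as the ceiling of $\log_2 \alpha$. The text here does not give the values of $n_1$ and $n_2$; the sketch assumes the usual truncated-binary choice $n_1 = 2^{L_2} - \alpha$ and $n_2 = \alpha - n_1$, which makes the resulting code prefix-free. The function names are hypothetical.

```python
def _to_bits(value: int, width: int) -> str:
    """Binary representation of value in exactly width bits ('' when width is 0)."""
    return format(value, "b").zfill(width) if width else ""


def quasi_uniform_encode(x: int, alpha: int) -> str:
    """Encode x in {0, ..., alpha-1} with a quasi-uniform (truncated binary) code."""
    if not 0 <= x < alpha:
        raise ValueError("symbol out of range")
    l1 = alpha.bit_length() - 1      # floor(log2(alpha))
    l2 = (alpha - 1).bit_length()    # ceil(log2(alpha))
    n1 = (1 << l2) - alpha           # assumed: number of short codewords
    if x < n1:
        return _to_bits(x, l1)
    return _to_bits(x + n1, l2)
```

With α = 5 (a wide band), the codewords are 00, 01, 10, 110, 111; with α = 4 (a narrow band), the code reduces to a plain 2-bit binary code.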

ｓのヘッドでのシンボルは、シンボルごとに符号化される。符号化の前に、各シンボルは、パラメータｄ、長変換対短変換、及び狭帯域対広帯域の選択に依存するマッピングを使用してマッピングされる。このマッピングは、図８に示されている擬似コード関数ＭａｐＴＦＳｙｍｂｏｌで規定される。入力シンボルシーケンスｓ、変数ｄ、ブール量ｉｓ＿ｌｏｎｇ及びｉｓ＿ｎａｒｒｏｗが、パラメータとして与えられていると仮定する。 The symbols at the head of s are encoded symbol by symbol. Before encoding, each symbol is mapped using a mapping that depends on the parameter d, the long versus short transform, and the choice of narrowband versus wideband. This mapping is defined in the pseudocode function MapTFSymbol shown in FIG. Assume that the input symbol sequence s, the variable d, and the Boolean quantities is_long and is_narrow are given as parameters.

FIG. 8 shows a mapping that, in all cases, yields a non-negative integer in the range [0, α) (that is, in {0, 1, ..., α−1}), where α is 4 for a narrow band and 5 for a wide band. There are two choices of code for the mapped symbols, parameterized by a binary flag k.

k=0: a unary code over the alphabet of size α. This unary code encodes an integer i in {0, 1, ..., α−2} as a sequence of i "0"s followed by a "1" that marks the end of the codeword. The integer α−1 is encoded as a sequence of α−1 "0"s with no terminating "1".

k=1: a quasi-uniform code over the alphabet of size α.
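The k=0 option, a unary code truncated at the largest symbol, can be sketched as follows (the function name is an assumption for illustration):

```python
def unary_encode(i: int, alpha: int) -> str:
    """Encode i in {0, ..., alpha-1} with a unary code truncated at alpha-1."""
    if not 0 <= i < alpha:
        raise ValueError("symbol out of range")
    if i == alpha - 1:
        return "0" * (alpha - 1)  # largest symbol: no terminating '1'
    return "0" * i + "1"
```

For α = 4 the codewords are 1, 01, 001, 000. The unary code favors small mapped symbols (a single bit for 0), whereas the quasi-uniform code spends roughly the same number of bits on every symbol.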

バイナリフラグｋがどのようにして決定されるかについて以下で説明する。 How the binary flag k is determined is explained below.

セクション２．２：符号化 Section 2.2: Encoding

Assume that the parameters d and k are known. The pair (d, k) is encoded as a single symbol, obtained as shown in FIG. 9. The resulting symbol is encoded using a Golomb code. The permutation array map_dk_pair assigns indices in descending order of the probability of occurrence of the pairs (d, k), so that (d=1, k=0), the most likely pair, receives the shortest codeword.

The encoding procedure is summarized in the pseudocode of FIG. 10. The variable seq represents the input sequence c. The number of bands is available in the global variable num_bands.

Section 2.3: Parameter optimization

To determine the parameters d and k, the encoder tries all four combinations of their binary values and selects the one that yields the shortest code length. This is done using code length functions, which do not require actual encoding.
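A minimal Python sketch of this exhaustive search is given below, under several stated assumptions. The function map_symbols is a hypothetical stand-in for MapTFSymbol: d = 0 is assumed to keep the values as they are and d = 1 is assumed to code differences between adjacent bands modulo α. The quasi-uniform code length assumes a truncated binary code, and the cost of signaling the pair (d, k) itself is ignored.

```python
from itertools import product


def unary_len(i: int, alpha: int) -> int:
    """Length in bits of the unary codeword for symbol i."""
    return i + 1 if i < alpha - 1 else alpha - 1


def quasi_uniform_len(i: int, alpha: int) -> int:
    """Length in bits of the (assumed truncated-binary) quasi-uniform codeword."""
    b = (alpha - 1).bit_length()
    short = (1 << b) - alpha
    return b - 1 if i < short else b


def map_symbols(seq, d: int, alpha: int):
    """Hypothetical stand-in for MapTFSymbol (not the mapping of FIG. 8)."""
    if d == 0:
        return list(seq)
    prev, out = 0, []
    for c in seq:
        out.append((c - prev) % alpha)  # difference to the previous band
        prev = c
    return out


def choose_parameters(seq, alpha: int):
    """Try all four (d, k) pairs; return (d, k, total_bits) of the shortest."""
    best = None
    for d, k in product((0, 1), repeat=2):
        length_fn = quasi_uniform_len if k else unary_len
        total = sum(length_fn(x, alpha) for x in map_symbols(seq, d, alpha))
        if best is None or total < best[2]:
            best = (d, k, total)
    return best


if __name__ == "__main__":
    print(choose_parameters([2, 2, 2, 2, 2, 2], alpha=4))
```

Because only codeword lengths are summed, no bits are written during the search; the selected pair is then used for the actual encoding pass.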

Section 3: Sequence decoding

The decoder simply reverses the steps of the encoder, except that the decoder reads the parameters d and k from the bitstream and does not need to optimize them. The decoding procedure is summarized in the pseudocode of FIG. 11, in which the quantity num_bands is the known number of bands.
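The symbol-level part of such a decoder can be sketched in Python as follows. This is an illustration rather than the pseudocode of FIG. 11: the function names are hypothetical, the bitstream is modeled as a string of "0" and "1", the quasi-uniform code is assumed to be a truncated binary code, and the flag k is assumed to have already been read from the bitstream. Undoing the d-dependent symbol mapping is omitted.

```python
def unary_decode(bits: str, pos: int, alpha: int):
    """Decode one unary-coded symbol; return (symbol, new_pos)."""
    i = 0
    while i < alpha - 1 and bits[pos] == "0":
        i += 1
        pos += 1
    if i < alpha - 1:
        pos += 1                       # consume the terminating "1"
    return i, pos


def quasi_uniform_decode(bits: str, pos: int, alpha: int):
    """Decode one symbol of the assumed truncated-binary code."""
    b = (alpha - 1).bit_length()
    short = (1 << b) - alpha
    value = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    pos += b - 1
    if value < short:
        return value, pos              # short codeword of b - 1 bits
    value = (value << 1) | int(bits[pos])
    return value - short, pos + 1      # long codeword of b bits


def decode_symbols(bits: str, k: int, alpha: int, count: int):
    """Decode `count` mapped symbols using the code selected by k."""
    decode = quasi_uniform_decode if k else unary_decode
    pos, out = 0, []
    for _ in range(count):
        sym, pos = decode(bits, pos, alpha)
        out.append(sym)
    return out


if __name__ == "__main__":
    print(decode_symbols("1001000", k=0, alpha=4, count=3))
    print(decode_symbols("00110111", k=1, alpha=5, count=3))
```

The alphabet size α must match the one used by the encoder for the band in question (4 for narrowband, 5 for wideband), since the unary code omits the terminating "1" only for the symbol α-1.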

100 encoding system
102 digital audio signal
104 bitstream
106 processor
108 memory device
110 instructions

Claims

1. An encoding system comprising:
a processor; and
a memory device storing instructions executable by the processor, the instructions being executable by the processor to perform a method for encoding an audio signal,
the method including:
receiving a digital audio signal;
parsing the digital audio signal into a plurality of frames each including a specified number of audio samples;
transforming the audio samples of each frame to generate a plurality of frequency domain coefficients for each frame;
dividing the plurality of frequency domain coefficients for each frame into a plurality of bands for each frame, each band having default values of time resolution and frequency resolution after the transform, and each band having a reshaping parameter representing an adjusted time resolution and an adjusted frequency resolution, the reshaping parameter being a value indicating a change from the default values of the time resolution and the frequency resolution to the adjusted values of the time resolution and the frequency resolution;
encoding the parsed, transformed, and divided digital audio signal into a bitstream that includes the reshaping parameter for each of the bands, the reshaping parameter for a first band being encoded using a first alphabet size, and the reshaping parameter for a second band different from the first band being encoded using a second alphabet size different from the first alphabet size; and
outputting the bitstream.

2. The encoding system of claim 1, wherein the method further includes:
adjusting the time resolution and the frequency resolution of each band of each of the frames, a first time resolution and a first frequency resolution being complementarily adjusted by the magnitude described by the reshaping parameter, the reshaping parameter having a value that is an integer selected from one of a plurality of specified ranges of integers,
wherein the first alphabet size is equal to the number of integers in a first specified range of integers among the plurality of specified ranges of integers, and
the second alphabet size is equal to the number of integers in a second specified range of integers among the plurality of specified ranges of integers.

3. The encoding system of claim 2, wherein the first alphabet size is four and the second alphabet size is five.

4. The encoding system of claim 2, wherein, before the adjustment, the time resolution of the first band is equal to eight audio samples and the time resolution of the second band is equal to one audio sample.

5. The encoding system of claim 2, wherein each band has a size equal to the product of the time resolution of the band and the frequency resolution of the band, and
the time resolution of the band and the frequency resolution of the band are complementarily adjusted without changing the size of the band.

6. The encoding system of claim 5, wherein the time resolution is adjusted by a factor of 2^c, the frequency resolution is adjusted by a factor of 2^(-c), and the quantity c is the reshaping parameter.

7. The encoding system of any one of claims 2 to 6, wherein the method further includes:
forming a reshaping sequence for each frame that describes the reshaping parameters for each band; and
normalizing each entry in each reshaping sequence to a range of possible values for the entry,
wherein each range of possible values corresponds to the specified range of integers for the band.

8. The encoding system of claim 1, wherein the method further includes:
forming a first sequence for each frame that describes the reshaping parameters for the frame as a sequence representing the reshaping parameter for each band using a unary code;
forming a second sequence for each frame that describes the reshaping parameters for the frame as a sequence representing the reshaping parameter for each band using a quasi-uniform code;
forming a third sequence for each frame that describes the reshaping parameters for the frame as a sequence representing a difference in the reshaping parameter between adjacent bands using a unary code;
forming a fourth sequence for each frame that describes the reshaping parameters for the frame as a sequence representing a difference in the reshaping parameter between adjacent bands using a quasi-uniform code;
selecting, from among the first sequence, the second sequence, the third sequence, and the fourth sequence, the shortest sequence, namely the sequence that includes the fewest elements;
embedding the selected shortest sequence into the bitstream for each frame; and
embedding in the bitstream, for each frame, data representing an indicator of which of the four sequences is included in the bitstream.

9. The encoding system of claim 1, wherein the transform is a modified discrete cosine transform.

10. The encoding system of claim 1, wherein each frame includes exactly 1024 samples.

11. The encoding system of claim 1, wherein the number of frequency domain coefficients in each plurality of frequency domain coefficients is equal to the specified number of audio samples in each frame.

12. The encoding system of claim 1, wherein the plurality of frequency domain coefficients for each frame includes exactly 1024 frequency domain coefficients.

13. The encoding system of claim 1, wherein the plurality of bands for each frame includes exactly 22 bands.

14. The encoding system of claim 1, wherein the encoding system is included in a codec.

15. A decoding system comprising:
a processor; and
a memory device storing instructions executable by the processor, the instructions being executable by the processor to perform a method for decoding an encoded audio signal,
the method including:
receiving a bitstream including a plurality of frames each divided into a plurality of bands;
extracting from the bitstream, for each band of each frame, a reshaping parameter representing an adjusted time resolution and an adjusted frequency resolution for the band, the reshaping parameter being a value indicating a change from default values of time resolution and frequency resolution to the adjusted values of the time resolution and the frequency resolution, the reshaping parameter for a first band being embedded in the bitstream using a first alphabet size, and the reshaping parameter for a second band different from the first band being embedded in the bitstream using a second alphabet size different from the first alphabet size; and
decoding the bitstream using the reshaping parameters to generate a decoded digital audio signal,
wherein the decoding includes adjusting the adjusted time resolution and the adjusted frequency resolution of each band of each frame and subsequently applying an inverse transform, the adjusting of the time resolution and the frequency resolution using the reshaping parameter to increase one of a first time resolution and a first frequency resolution and decrease the other, or to leave both unchanged.

16. The decoding system of claim 15, wherein the method further includes extracting, for each band of each frame, data indicating:
whether the reshaping parameters in the bitstream are represented using a unary code or a quasi-uniform code; and
whether the reshaping parameters in the bitstream are represented as a sequence representing the reshaping parameter for each of the bands or as a sequence representing a difference in the reshaping parameter between adjacent bands.

17. The decoding system of claim 15 or 16, wherein the decoding system is included in a codec.

18. An encoding system comprising:
a receiver circuit for receiving a digital audio signal;
a framer circuit for parsing the digital audio signal into a plurality of frames each including a specified number of audio samples;
a transformer circuit for performing a transform on the audio samples of each frame to generate a plurality of frequency domain coefficients for each frame;
a frequency band divider circuit for dividing the plurality of frequency domain coefficients for each frame into a plurality of bands for each frame, each band having a reshaping parameter representing an adjusted time resolution and an adjusted frequency resolution, the reshaping parameter being a value indicating a change from default values of time resolution and frequency resolution to the adjusted values of the time resolution and the frequency resolution;
an encoder circuit for encoding the parsed, transformed, and divided digital audio signal into a bitstream that includes the reshaping parameter for each band, the reshaping parameter for a first band being encoded using a first alphabet size, and the reshaping parameter for a second band different from the first band being encoded using a second alphabet size different from the first alphabet size; and
an output circuit for outputting the bitstream.

19. The encoding system of claim 18, further comprising a resolution adjustment circuit for adjusting the time resolution and the frequency resolution of each band of each of the frames, a first time resolution and a first frequency resolution being complementarily adjusted by the magnitude described by the reshaping parameter, the reshaping parameter having a value that is an integer selected from one of a plurality of specified ranges of integers,
wherein the first alphabet size is equal to the number of integers in a first specified range of integers among the plurality of specified ranges of integers, and
the second alphabet size is equal to the number of integers in a second specified range of integers among the plurality of specified ranges of integers.

20. The encoding system of claim 19, wherein the time resolution is adjusted by a factor of 2^c, the frequency resolution is adjusted by a factor of 2^(-c), and the quantity c is the reshaping parameter.