JP2008032823A

JP2008032823A - Voice encoding apparatus

Info

Publication number: JP2008032823A
Application number: JP2006203417A
Authority: JP
Inventors: Shiyouko Osada; 将高長田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-07-26
Filing date: 2006-07-26
Publication date: 2008-02-14
Anticipated expiration: 2026-07-26
Also published as: JP5010197B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice encoding apparatus capable of performing sectioning suited to a voice signal with a small amount of processing.

SOLUTION: The coefficient with the maximum value is extracted from the Modified Discrete Cosine Transform (MDCT) coefficients included in each scale factor band. When the ratio of the maximum MDCT coefficients of adjoining scale factor bands is within a predetermined range, those scale factor bands are placed in the same section.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a speech encoding apparatus, and more particularly to sectioning processing.

When an audio signal compressed by a scheme such as AAC (Advanced Audio Coding) is Huffman-coded, the audio signal is first frequency-transformed, for example by the MDCT (Modified Discrete Cosine Transform). The resulting frequency spectrum, the MDCT coefficients, is quantized, and the quantized spectrum is then encoded. During this encoding, a Huffman table that yields a small amount of generated code is selected for each frequency band called a scale factor band (hereinafter, sfb), and encoding is performed with reference to the selected table.

In addition to the encoded quantized spectrum, information indicating the selected table is included in the bitstream, i.e. the encoded audio signal. Multiple Huffman tables are provided according to the maximum value of the quantized spectrum; when that maximum is smaller, a smaller table is selected, which makes the coding more efficient.

A sectioning process is also known, in which the same Huffman table is selected for a plurality of adjacent sfbs, together called a section. With sectioning, the information indicating the selected table can be shared among adjacent sfbs, which reduces the table information included in the bitstream. This reduction allows more bits to be allocated to the quantized spectrum, improving the sound quality of the encoded audio.

Sectioning is known to be performed either before or after quantization of the spectrum. To perform it before quantization, for example, the maximum value of the spectrum obtained by frequency transformation is found for each sfb, a predetermined number of sfbs with large maxima are selected, and an appropriate Huffman table is selected for each selected sfb. The sfbs neighboring a selected sfb are then placed in the same section as that sfb (see, for example, Patent Document 1).

To perform sectioning after quantization of the spectrum, for example, a Huffman table is selected for each sfb by referring to a predetermined table. After that selection, if having adjacent sfbs use the same Huffman table would reduce the table information included in the bitstream, those adjacent sfbs are placed in the same section (see, for example, Patent Document 2).

Patent Document 1: JP 2002-91498 A (page 1, FIG. 1, FIG. 3)
Patent Document 2: JP 2003-233397 A (page 1, FIG. 2)

However, in the method disclosed in Patent Document 1, the number of sections is determined independently of the audio signal, which makes sectioning suited to that signal difficult. The problem is especially pronounced when the spectrum of the audio signal is concentrated in sfbs of nearby frequency bands.

In the method disclosed in Patent Document 2, the Huffman table is selected by referring to a predetermined table, so a Huffman table suited to the audio signal may not be selected. Moreover, a bitstream is created as the encoded audio signal, and if its bit length does not meet the predetermined value, processing must return to the spectrum quantization stage and be repeated. The amount of processing may therefore become excessive.

The present invention has been made to solve the above problems, and its object is to provide a speech encoding apparatus that performs sectioning suited to the audio signal with a small amount of processing.

To achieve the above object, a speech encoding apparatus of the present invention has encoding means that converts an audio signal into a frequency spectrum classified into predetermined scale factor bands; selects, for each scale factor band, one of a plurality of Huffman tables depending on a representative value of the frequency spectrum classified into that scale factor band; encodes the frequency spectrum classified into the scale factor band with reference to the selected Huffman table; and creates an encoded audio signal including the encoded frequency spectrum and a code identifying the referenced Huffman table. The encoding means is characterized in that, when the ratio of the representative values of the frequency spectra classified into adjacent scale factor bands is within a predetermined ratio, it selects the same Huffman table for those scale factor bands.

In another aspect, a speech encoding apparatus of the present invention has encoding means that converts an audio signal into a frequency spectrum classified into predetermined scale factor bands; selects, for each scale factor band, one of a plurality of Huffman tables depending on a representative value of the frequency spectrum classified into that scale factor band; encodes the frequency spectrum classified into the scale factor band with reference to the selected Huffman table; and creates an encoded audio signal including the encoded frequency spectrum and a code identifying the referenced Huffman table. The encoding means is characterized in that, when the representative value of the frequency spectrum classified into a first scale factor band is small, and both the representative value of a second scale factor band adjacent to the first on the low-frequency side and the representative value of a third scale factor band adjacent to the first on the high-frequency side are larger than the representative value of the first scale factor band, it selects the same Huffman table for the first, second, and third scale factor bands.

According to the present invention, it is possible to provide a speech encoding apparatus that performs sectioning suited to the audio signal with a small amount of processing.

Embodiments of a speech encoding apparatus according to the present invention are described below with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram showing the configuration of a speech encoding apparatus according to the first embodiment of the present invention. The speech encoding apparatus comprises a control unit 11 that controls the entire apparatus, an audio signal storage unit 21, a time/frequency conversion unit 31, a psychoacoustic analysis unit 32, a scale factor multiplication unit 33, a sectioning unit 34, a quantization loop processing unit 41, a formatter unit 51, and an encoded audio signal storage unit 52.

The quantization loop processing unit 41 comprises a quantization unit 42, a Huffman encoding unit 43, and a generated code amount counting unit 44.

The operation of each unit of the speech encoding apparatus according to the first embodiment, configured as described above, is explained with reference to FIG. 1.

The audio signal storage unit 21 stores an audio signal that has been converted into a digital signal by PCM (Pulse Code Modulation).

The time/frequency conversion unit 31 reads the audio signal stored in the audio signal storage unit 21, applies a time/frequency transform, and creates and sends a frequency spectrum. The MDCT is used as the time/frequency transform, so MDCT coefficients are created as the frequency spectrum. The time/frequency transform is not limited to the MDCT.

The psychoacoustic analysis unit 32 stores a psychoacoustic model that includes masking effect characteristics. It receives the MDCT coefficients created by the time/frequency conversion unit 31, calculates for each sfb of the received coefficients the allowable quantization distortion to which the masking effect has been applied according to the stored psychoacoustic model, and sends it. It also calculates and sends rate control information.

For these calculations, the psychoacoustic analysis unit 32 may instead read the audio signal stored in the audio signal storage unit 21, apply a time/frequency transform different from that of the time/frequency conversion unit 31, for example an FFT (Fast Fourier Transform), and use the resulting frequency spectrum. With this approach the processing amount increases because the time/frequency conversion unit 31 and the psychoacoustic analysis unit 32 each perform a transform, but the transform best suited to each unit can be chosen independently.

The scale factor multiplication unit 33 receives the allowable quantization distortion for each sfb calculated by the psychoacoustic analysis unit 32 and calculates a scale factor for each sfb from that distortion. It then receives the MDCT coefficients created by the time/frequency conversion unit 31, multiplies them by the scale factor calculated for each sfb, and sends the resulting MDCT coefficients.

The sectioning unit 34 receives the MDCT coefficients created by the scale factor multiplication unit 33 and performs sectioning, that is, it places two or more adjacent sfbs in the same section. It then sends the received MDCT coefficients together with information indicating the sectioning it has determined. When an sfb is silent, that is, when the sfb contains no MDCT coefficients, the sectioning unit 34 sends a notification to that effect and excludes the sfb from sectioning, because such an sfb is not encoded with a Huffman table.

The quantization loop processing unit 41 repeats the operations of the quantization unit 42, the Huffman encoding unit 43, and the generated code amount counting unit 44. It stores the information on the Huffman table selected by the Huffman encoding unit 43, the code produced with that Huffman table, and the rate control information sent from the psychoacoustic analysis unit 32.

The quantization unit 42 quantizes the MDCT coefficients sent by the sectioning unit 34 non-uniformly, sfb by sfb, and sends the quantized MDCT coefficients. In other words, the quantization step, which is the divisor applied to the MDCT coefficients, depends on the sfb. When an sfb is silent, the quantization unit 42 does not quantize it.
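The per-sfb division described above can be sketched as follows. This is only a minimal illustration: the function name, the `sfb_offsets` layout (start index of each band plus a final end marker), and the use of plain rounding are assumptions made for the example, and details of a real AAC quantizer are omitted.

```python
def quantize_per_sfb(mdct, sfb_offsets, steps):
    """Divide each coefficient by the step of its band and round to an integer.

    sfb_offsets[i] is the first coefficient index of band i; the last entry
    marks the end of the final band (hypothetical layout for this sketch).
    steps[i] is the quantization step (divisor) used for band i.
    """
    quantized = []
    for band, (start, end) in enumerate(zip(sfb_offsets, sfb_offsets[1:])):
        step = steps[band]
        quantized.extend(round(c / step) for c in mdct[start:end])
    return quantized


# Two bands of two coefficients each, with a coarser step in the second band.
print(quantize_per_sfb([4.0, -9.2, 10.4, 3.0], [0, 2, 4], [2.0, 4.0]))
# [2, -5, 3, 1]
```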

The Huffman encoding unit 43 stores Huffman tables. It receives the quantized MDCT coefficients sent by the quantization unit 42, selects an appropriate Huffman table for each section created by the sectioning unit 34, or for each sfb, encodes the coefficients with it, stores the resulting code in the quantization loop processing unit 41, and also sends it. Silent sfbs are encoded without a Huffman table. The Huffman encoding unit 43 also records in the quantization loop processing unit 41 which Huffman table was used.

The generated code amount counting unit 44 measures the code amount, i.e. the number of bits, of the code produced by the Huffman encoding unit 43 and stores it in the quantization loop processing unit 41, accumulating the bit counts of the individual sfbs. It also measures and stores the number of bits of the code that identifies the Huffman table selected by the Huffman encoding unit 43, which that unit has stored in the quantization loop processing unit 41.

Furthermore, the generated code amount counting unit 44 stores the rate control information sent from the psychoacoustic analysis unit 32 in the quantization loop processing unit 41, and also measures and stores the number of bits of the code indicating the scale factor used for each sfb, as identified by that information. These are counted because the bits identifying them are included in the bitstream, i.e. the encoded audio signal.

The formatter unit 51 reads, from the quantization loop processing unit 41, the information identifying the Huffman table selected by the Huffman encoding unit 43, the code produced with that Huffman table, and the information identifying the rate control information sent from the psychoacoustic analysis unit 32. It then creates the bitstream, i.e. the encoded audio signal, in a predetermined format and stores it in the encoded audio signal storage unit 52.

The sectioning operation performed by the sectioning unit 34 of the speech encoding apparatus according to this embodiment is described in detail below.

The sectioning unit 34 receives the MDCT coefficients sent by the scale factor multiplication unit 33. FIG. 2 shows an example of the received MDCT coefficients. The horizontal axis is frequency and the vertical axis is the amplitude, i.e. the magnitude, of each MDCT coefficient. FIG. 2 also shows that the MDCT coefficients are divided into sfbs, and the coefficient with the largest amplitude in each sfb is drawn with a bold line. In FIG. 2 every sfb has the same bandwidth and contains the same number of MDCT coefficients, but this is only an example and does not limit the embodiment in any way.

The sectioning unit 34 performs sectioning by a first method and a second method. The first method is described first. The sectioning unit 34 extracts the coefficient with the maximum value from the MDCT coefficients included in each sfb. FIG. 3 shows the state after the maximum MDCT coefficient has been extracted from each sfb. The horizontal axis is the sfb and the vertical axis is the maximum amplitude of the MDCT coefficients in that sfb.
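Extracting the per-band maximum can be sketched as below. The function name and the `sfb_offsets` layout are hypothetical; the sketch takes the maximum of the absolute values, in line with the description of the vertical axis as the amplitude (magnitude) of the coefficients.

```python
def max_per_sfb(mdct, sfb_offsets):
    """Return the largest absolute MDCT coefficient in each scale factor band.

    sfb_offsets[i] is the index of the first coefficient of band i; the final
    entry marks the end of the last band (a hypothetical layout for this sketch).
    """
    return [
        max(abs(c) for c in mdct[start:end])
        for start, end in zip(sfb_offsets, sfb_offsets[1:])
    ]


coeffs = [0.5, -3.0, 1.0, 2.0, -0.25, 0.75]
print(max_per_sfb(coeffs, [0, 2, 4, 6]))  # [3.0, 2.0, 0.75]
```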

Let max_quant[sfb] denote the maximum MDCT coefficient extracted for each sfb. When the ratio of the maximum MDCT coefficients of adjacent sfbs is within a predetermined range, the sectioning unit 34 determines that those sfbs belong to the same section.

Whether the ratio of the maxima is within the predetermined range is judged, for example, by whether the following inequality 1 holds.

SEC_TH1 < max_quant[sfb+1] / max_quant[sfb] < SEC_TH2   (inequality 1)
Here 0 < SEC_TH1 < 1 and 1 < SEC_TH2.
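A minimal sketch of the first method, assuming every entry of `max_quant` is positive (silent sfbs are excluded from sectioning, as stated earlier). The function name and the output format, a list of inclusive (first_sfb, last_sfb) pairs, are choices made for this illustration, and the sample values are invented rather than read from the figures.

```python
def section_by_ratio(max_quant, sec_th1=0.5, sec_th2=2.0):
    """Group adjacent sfbs whose maxima ratio satisfies inequality 1.

    Returns a list of (first_sfb, last_sfb) pairs, both ends inclusive.
    Assumes all maxima are positive (no silent bands in the input).
    """
    if not max_quant:
        return []
    sections = []
    start = 0
    for sfb in range(len(max_quant) - 1):
        ratio = max_quant[sfb + 1] / max_quant[sfb]
        if not (sec_th1 < ratio < sec_th2):
            # Inequality 1 fails, so a new section starts at sfb + 1.
            sections.append((start, sfb))
            start = sfb + 1
    sections.append((start, len(max_quant) - 1))
    return sections


# Invented maxima for eight bands; SEC_TH1 = 1/2 and SEC_TH2 = 2.
print(section_by_ratio([8, 10, 9, 3, 4, 20, 16, 12]))
# [(0, 2), (3, 4), (5, 7)]
```

Because the section boundaries follow from the signal itself, the number of sections is not fixed in advance.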

The product of SEC_TH1 and SEC_TH2 may be set to 1. For example, SEC_TH1 is 1/2 and SEC_TH2 is 2. With these values, the test determines whether the maxima of the MDCT coefficients of the adjacent sfbs are within a factor of two of each other.

By sectioning according to whether the ratio of the maximum MDCT coefficients of adjacent sfbs is within the predetermined range, the sectioning unit 34 determines, as illustrated in FIG. 4, that sfb0 to sfb2 form one section, sfb3 and sfb4 form one section, and sfb5 to sfb7 form one section.

The larger SEC_TH1 is made and the smaller SEC_TH2 is made, the less likely inequality 1 is to hold for a given pair of maxima of adjacent sfbs. That is, adjacent sfbs are less likely to be judged to belong to the same section; in other words, the unit becomes more likely to decide that a Huffman table should be selected independently for each sfb.

When a large number of bits is acceptable for the bitstream that the apparatus creates as the encoded audio signal, a Huffman table may be selected almost independently for each sfb so that every sfb gets an appropriate table. For that purpose it is effective to make inequality 1 harder to satisfy, as described above.

Because sfbs are placed in the same section only when inequality 1 holds, the number of sections varies with the audio signal stored in the audio signal storage unit 21 and is not constant. In other words, the simple test of whether inequality 1 holds produces good sectioning that adapts to the audio signal.

In the second method, the sectioning unit 34 places an sfb whose maximum MDCT coefficient is relatively small in the same section as the adjacent sfbs whose maximum MDCT coefficients are relatively large. That is, the sfb with the relatively small maximum is added to a section that uses a relatively large Huffman table suited to the sfbs with relatively large maxima.

FIG. 5 shows an example of this situation: a relatively small Huffman table has been selected for a certain sfb (sfb1 in FIG. 5) because its maximum MDCT coefficient is relatively small, while relatively large Huffman tables have been selected for the sfbs on both sides of it (sfb0 and sfb2 in FIG. 5) because their maximum MDCT coefficients are relatively large. Here sfb2 to sfb4 have already been combined into one section by the sectioning unit 34 using the first method.

In this situation, when the relatively large maxima are at or above a predetermined threshold, the sectioning unit 34 combines that sfb and the sfbs on both sides of it into one section (sfb0 to sfb4 in FIG. 6), as shown in FIG. 6. A Huffman table suited to the sfbs on both sides is selected for this section.

As a result, the MDCT coefficients in that sfb (sfb1 in FIG. 6) are encoded with a Huffman table that is oversized relative to their maximum value. However, the number of sections is reduced by two, so two sets of information identifying a Huffman table can be removed from the bitstream, i.e. the encoded audio signal, and the bit length of the bitstream can be reduced.
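The second method can be sketched as a pass over an existing list of sections. This is an illustrative reading, not the patented implementation: the function name, the (first_sfb, last_sfb) section format, and the exact valley test (a single-sfb section whose maximum is below both adjacent maxima, with both of those at or above `threshold`) are assumptions.

```python
def merge_valleys(sections, max_quant, threshold):
    """Merge a single-sfb 'valley' section with the sections on both sides.

    sections: list of (first_sfb, last_sfb) pairs, inclusive, in frequency order.
    A valley is an sfb whose maximum is below the maxima of both adjacent sfbs
    while those adjacent maxima are at or above `threshold`.
    """
    merged = list(sections)
    i = 1
    while i < len(merged) - 1:
        first, last = merged[i]
        left, right = first - 1, last + 1
        is_valley = (
            first == last
            and max_quant[first] < max_quant[left]
            and max_quant[first] < max_quant[right]
            and min(max_quant[left], max_quant[right]) >= threshold
        )
        if is_valley:
            # Replace the three sections by one spanning all of them.
            merged[i - 1:i + 2] = [(merged[i - 1][0], merged[i + 1][1])]
        else:
            i += 1
    return merged


# Situation like FIG. 5: sfb1 is small, sfb0 and sfb2 are large,
# and sfb2 to sfb4 already form one section (values are invented).
maxima = [20, 3, 18, 16, 14]
print(merge_valleys([(0, 0), (1, 1), (2, 4)], maxima, threshold=10))
# [(0, 4)]
```

Three sections become one, which matches the reduction by two sections described above.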

The effect is particularly pronounced when the maxima of the MDCT coefficients of the two neighboring sfbs are approximately equal, so that the Huffman tables selected for them are expected to be identical. The method is not, however, limited to that case. When those Huffman tables differ, the sectioning unit 34 decides whether to perform the sectioning by comparing the increase in code amount caused by the newly selected Huffman table with the reduction in code amount obtained by removing two sets of table-identifying information, or by comparing estimates of these amounts.
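The trade-off can be expressed as a simple comparison. The text does not give the exact decision rule, so the rule below (merge only when the side information saved exceeds the growth in coded bits) and all names and numbers are assumptions for illustration.

```python
def should_merge(extra_code_bits, bits_per_table_id, sections_removed=2):
    """Return True when removing table identifiers saves more bits than the
    larger Huffman table adds to the coded spectrum (assumed decision rule)."""
    saved = bits_per_table_id * sections_removed
    return saved > extra_code_bits


# Hypothetical figures: each table identifier costs 9 bits of side information.
print(should_merge(extra_code_bits=12, bits_per_table_id=9))  # True  (18 > 12)
print(should_merge(extra_code_bits=25, bits_per_table_id=9))  # False (18 < 25)
```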

The first method and/or the second method need not be performed only once. If one sectioning pass does not reduce the number of sections sufficiently, the sectioning unit 34 can repeat the sectioning operation to reduce the number further. The repetition may be performed any number of times.

When repeating, in the first method the sectioning unit 34 sets SEC_TH1 to a value smaller than the one used in the i-th pass and SEC_TH2 to a value larger than the one used in the i-th pass, and then performs the (i+1)-th and subsequent sectioning passes. In the second method it sets the threshold to a value smaller than the one used in the i-th pass before performing the (i+1)-th and subsequent passes. Here i is an integer of 1 or more.
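A sketch of the repeated first method, assuming positive maxima. The stopping criterion (`target` sections), the multiplicative `factor` used to widen the range, and the choice to keep SEC_TH1 equal to 1 / SEC_TH2 are all hypothetical; the text only requires SEC_TH1 to shrink and SEC_TH2 to grow from one pass to the next.

```python
def count_sections(max_quant, th1, th2):
    """Number of sections produced by the ratio test (inequality 1)."""
    breaks = sum(
        1
        for a, b in zip(max_quant, max_quant[1:])
        if not (th1 < b / a < th2)
    )
    return breaks + 1


def relax_until(max_quant, target, th2=2.0, factor=1.5, max_passes=8):
    """Repeat sectioning with a wider ratio range until at most `target`
    sections remain. Returns (passes, final SEC_TH2, number of sections)."""
    n = count_sections(max_quant, 1.0 / th2, th2)
    passes = 1
    while n > target and passes < max_passes:
        th2 *= factor  # larger SEC_TH2, hence smaller SEC_TH1 = 1 / SEC_TH2
        n = count_sections(max_quant, 1.0 / th2, th2)
        passes += 1
    return passes, th2, n


print(relax_until([8, 10, 9, 4, 5, 20, 16, 12], target=2))  # (2, 3.0, 2)
```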

In the above description the sectioning unit 34 extracts the coefficient with the maximum value from the MDCT coefficients in each sfb, but the method is not limited to this. Any value that represents the MDCT coefficients may be used. For example, the average of the MDCT coefficients in each sfb may be calculated and used in place of the maximum.

Using the average means that, when an sfb contains many MDCT coefficients whose values do not differ greatly, the Huffman table is selected without depending strongly on a value that happens to be the maximum only by a slight margin. This selection yields an encoded audio signal with good sound quality and allows the bit length of the encoded bitstream to be reduced.
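The two representative values can be compared with a small helper. The function name and the `mode` argument are hypothetical, and averaging the magnitudes (rather than the signed coefficients) is an assumption of this sketch.

```python
def representative(band, mode="max"):
    """Representative value of one sfb: largest magnitude or mean magnitude."""
    mags = [abs(c) for c in band]
    if mode == "max":
        return max(mags)
    return sum(mags) / len(mags)


band = [4.0, -4.0, 4.0, 5.0]  # one coefficient is only slightly larger
print(representative(band, "max"))   # 5.0
print(representative(band, "mean"))  # 4.25
```

In this example the mean is less influenced by the single slightly larger coefficient than the maximum is.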

Next, the control operation for audio signal encoding performed by the control unit 11 is described. FIG. 7 is a flowchart showing this control operation.

The control unit 11 starts the control operation for audio signal encoding (step S11a), reads a predetermined amount of the audio signal stored in the audio signal storage unit 21, and controls each unit of the speech encoding apparatus to perform the following operations.

The control unit 11 controls the time/frequency conversion unit 31 to calculate MDCT coefficients from the frame that was read (step S11b). It then controls the psychoacoustic analysis unit 32 to calculate the allowable quantization distortion and the rate control information from the MDCT coefficients, and controls the scale factor multiplication unit 33 to calculate scale factors from the allowable quantization distortion and multiply the MDCT coefficients by them (step S11c). It then controls the sectioning unit 34 to perform sectioning on the multiplied MDCT coefficients (step S11d).

Next, the control unit 11 controls the quantization loop processing unit 41 to run the quantization loop. It controls the quantization unit 42 to quantize the MDCT coefficients that have been multiplied by the scale factors (step S11e). It then controls the Huffman encoding unit 43 to select Huffman tables according to the sectioning result and Huffman-code the quantized MDCT coefficients, and to store the code, the code amount, and the selected Huffman tables (step S11f).

The control unit 11 then controls the generated code amount counting unit 44 to measure the total generated code amount and to judge whether it is within a predetermined bit length (step S11g). If it is, the control unit 11 controls the formatter unit 51 to arrange the encoded audio signal into the predetermined bitstream format and store it in the encoded audio signal storage unit 52 (step S11h), and ends the operation (step S11i).

If, on the other hand, the code amount is not within the predetermined bit length, the control unit 11 returns to step S11e, in which the quantization unit 42 is controlled to perform quantization, and repeats the operation from there. On this pass, quantization is performed with a larger quantization step.
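The control flow of steps S11e to S11i can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the actual implementation of the apparatus: the function names, the step growth factor of 1.25, and `toy_count_bits` (a crude stand-in for Huffman encoding and for the generated code amount counting unit 44) are all hypothetical. Sectioning is taken as already done in step S11d and is passed in as coefficient index ranges.

```python
from typing import Callable, List, Tuple

Sections = List[Tuple[int, int]]  # (start, end) coefficient index ranges, end exclusive


def toy_count_bits(quantized: List[int], sections: Sections) -> int:
    """Hypothetical cost model standing in for Huffman coding.

    Each section costs 9 bits of side information plus, per coefficient,
    the bit length of the section's peak magnitude and one sign bit.
    """
    total = 0
    for start, end in sections:
        peak = max((abs(q) for q in quantized[start:end]), default=0)
        total += 9
        if peak > 0:
            total += (end - start) * (peak.bit_length() + 1)
    return total


def encode_frame(
    scaled_mdct: List[float],
    sections: Sections,
    bit_budget: int,
    count_bits: Callable[[List[int], Sections], int] = toy_count_bits,
    initial_step: float = 1.0,
    step_growth: float = 1.25,   # assumed growth factor, not from the source
    max_iterations: int = 64,
) -> Tuple[List[int], float, int]:
    """Quantize with ever larger steps until the code fits the bit budget."""
    step = initial_step
    for _ in range(max_iterations):
        quantized = [int(round(c / step)) for c in scaled_mdct]  # step S11e
        bits = count_bits(quantized, sections)                   # steps S11f, S11g
        if bits <= bit_budget:
            return quantized, step, bits                         # proceed to S11h
        step *= step_growth                                      # back to S11e, larger step
    raise RuntimeError("bit budget not met within max_iterations")


if __name__ == "__main__":
    frame = [40.0, -32.0, 8.0, 4.0, 1.0, 0.5, 0.0, 0.0]
    print(encode_frame(frame, [(0, 4), (4, 8)], bit_budget=40))
```

Because the sections are fixed before the loop starts, only the quantization step changes between iterations in this sketch.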

(Second Embodiment)
FIG. 8 is a block diagram showing the configuration of a speech encoding apparatus according to the second embodiment of the present invention. Parts of this apparatus that are the same as in the speech encoding apparatus according to the first embodiment are denoted by the same reference numerals, and their description is omitted. The speech encoding apparatus according to the second embodiment comprises a control unit 11 that controls the entire apparatus, an audio signal storage unit 21, a time/frequency conversion unit 31, a psychoacoustic analysis unit 32, a scale factor multiplication unit 33, a quantization loop processing unit 41, a formatter unit 51, and an encoded audio signal storage unit 52.

The quantization loop processing unit 41 comprises a quantization unit 42, the sectioning unit 34, a Huffman encoding unit 43, and a generated code amount counting unit 44.

The speech encoding apparatus according to the second embodiment differs from that of the first embodiment in the following respect. In the first embodiment, the sectioning unit 34 performs sectioning based on the MDCT coefficients produced by the scale factor multiplication unit 33. In the second embodiment, the sectioning unit 34 is placed inside the quantization loop processing unit 41 and performs sectioning based on the MDCT coefficients quantized by the quantization unit 42.

Accordingly, in the second embodiment, the MDCT coefficients produced by the scale factor multiplication unit 33 are sent to the quantization unit 42, and the Huffman encoding unit 43 receives the MDCT coefficients and the information indicating the sectioning, both of which are transmitted by the sectioning unit 34. In addition, when a given sfb is silent, the scale factor multiplication unit 33 notifies the sectioning unit 34 of that fact.
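As a rough illustration of sectioning performed on quantized MDCT coefficients, the following Python sketch groups adjacent sfbs whose peak magnitudes differ by no more than a given ratio, following the rule summarized in the abstract. The function name, the `sfb_offsets` layout, and the default ratio of 2.0 are assumptions made for illustration. How the apparatus uses the silence notification is not restated in this passage; the sketch simply keeps silent bands apart from non-silent ones, which is likewise an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def section_by_peak_ratio(
    quantized: Sequence[int],
    sfb_offsets: Sequence[int],
    max_ratio: float = 2.0,                    # assumed "predetermined range"
    silent: Optional[Sequence[bool]] = None,   # per-sfb silence flags (hypothetical input)
) -> List[Tuple[int, int]]:
    """Return sections as (first_sfb, last_sfb_exclusive) index pairs.

    sfb_offsets[b] is the first coefficient index of sfb b, and the last
    entry is the total number of coefficients.
    """
    n_sfb = len(sfb_offsets) - 1
    if n_sfb <= 0:
        return []

    # Peak magnitude of the quantized coefficients in each sfb.
    peaks = [
        max((abs(q) for q in quantized[sfb_offsets[b]:sfb_offsets[b + 1]]), default=0)
        for b in range(n_sfb)
    ]
    # A band counts as silent if flagged by the caller or if its peak is zero.
    quiet = [peaks[b] == 0 or bool(silent and silent[b]) for b in range(n_sfb)]

    sections: List[Tuple[int, int]] = []
    start = 0
    for b in range(1, n_sfb):
        if quiet[b] or quiet[b - 1]:
            same = quiet[b] and quiet[b - 1]   # assumption: silent only joins silent
        else:
            hi = max(peaks[b], peaks[b - 1])
            lo = min(peaks[b], peaks[b - 1])
            same = hi / lo <= max_ratio        # ratio of adjacent peaks within range
        if not same:
            sections.append((start, b))
            start = b
    sections.append((start, n_sfb))
    return sections


if __name__ == "__main__":
    coeffs = [8, 7, 6, 5, 6, 5, 4, 4, 1, 1, 0, 1, 0, 0, 0, 0]
    print(section_by_peak_ratio(coeffs, [0, 4, 8, 12, 16]))
    # prints [(0, 2), (2, 3), (3, 4)]
```

Raising `max_ratio` merges more bands into one section, so fewer section headers are needed at the cost of a possibly less well matched Huffman table.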

Next, the control operation for audio signal encoding performed by the control unit 11 of the speech encoding apparatus according to the second embodiment will be described. FIG. 9 is a flowchart showing this control operation. Steps of this control operation that are the same as in the control operation according to the first embodiment are denoted by the same reference symbols, and their description is omitted.

The control operation according to the second embodiment differs from that of the first embodiment in the following respect. In the first embodiment, the control operation of step S11d, in which the sectioning unit 34 is controlled to perform sectioning, is placed between the scale factor determination and multiplication of step S11c and the quantization of step S11e. In the second embodiment, it is placed between the quantization of step S11e and the Huffman encoding and storage of the generated code of step S11f.

As a result, the sectioning operation of step S11d is repeated whenever the determination in step S11g, of whether the total amount of generated code is within the predetermined bit length, finds that it is not. When the operation is repeated, the control unit 11 may therefore, in addition to or instead of controlling the quantization unit 42 to quantize with a larger quantization step, control the sectioning unit 34 to further reduce the number of sections, as described above.
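The reordered loop of the second embodiment can be sketched in Python as follows. This is a simplified illustration, not the control unit's actual procedure: every name, both growth factors, and the two toy helpers (one coefficient per band, and a crude bit-cost model in place of Huffman coding) are hypothetical. On overflow the sketch enlarges the quantization step and also loosens the merge ratio so that fewer sections result; setting either growth factor to 1.0 disables that adjustment, which mirrors the "in addition to or instead of" wording above.

```python
from typing import Callable, List, Tuple

Sections = List[Tuple[int, int]]  # (start, end) index ranges, end exclusive


def toy_sections(quantized: List[int], ratio: float) -> Sections:
    """Toy sectioning: one coefficient per band; neighbours merge when
    their magnitudes are equal or within `ratio` of each other."""
    sections, start = [], 0
    for i in range(1, len(quantized)):
        hi = max(abs(quantized[i - 1]), abs(quantized[i]))
        lo = min(abs(quantized[i - 1]), abs(quantized[i]))
        same = hi == lo or (lo > 0 and hi / lo <= ratio)
        if not same:
            sections.append((start, i))
            start = i
    sections.append((start, len(quantized)))
    return sections


def toy_count_bits(quantized: List[int], sections: Sections) -> int:
    """Toy cost: 9 bits per section plus peak bit length and sign per coefficient."""
    total = 0
    for start, end in sections:
        peak = max((abs(q) for q in quantized[start:end]), default=0)
        total += 9 + ((end - start) * (peak.bit_length() + 1) if peak else 0)
    return total


def encode_frame_sectioning_in_loop(
    scaled_mdct: List[float],
    bit_budget: int,
    make_sections: Callable[[List[int], float], Sections] = toy_sections,
    count_bits: Callable[[List[int], Sections], int] = toy_count_bits,
    step: float = 1.0,
    step_growth: float = 1.25,    # assumed; 1.0 keeps the step fixed
    ratio: float = 2.0,
    ratio_growth: float = 1.5,    # assumed; 1.0 keeps the sectioning rule fixed
    max_iterations: int = 64,
) -> Tuple[List[int], Sections, int]:
    for _ in range(max_iterations):
        quantized = [int(round(c / step)) for c in scaled_mdct]  # step S11e
        sections = make_sections(quantized, ratio)               # step S11d, inside the loop
        bits = count_bits(quantized, sections)                   # steps S11f, S11g
        if bits <= bit_budget:
            return quantized, sections, bits
        step *= step_growth      # coarser quantization on the next pass
        ratio *= ratio_growth    # looser merging, hence fewer sections
    raise RuntimeError("bit budget not met within max_iterations")


if __name__ == "__main__":
    frame = [40.0, -32.0, 8.0, 4.0, 1.0, 0.5, 0.0, 0.0]
    print(encode_frame_sectioning_in_loop(frame, bit_budget=30))
```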

(Other Embodiments)
In the above embodiments, the input audio signal is stored in the audio signal storage unit 21. The entire audio signal to be encoded may be stored there. Alternatively, for example, the operation in which an analog signal input through a microphone (not shown) is converted into a digital signal by PCM and stored in the audio signal storage unit 21 may be performed in parallel with the encoding operation.

Likewise, the encoded audio signal is stored in the encoded audio signal storage unit 52. The entire encoded audio signal may be stored there. Alternatively, the encoding operation may be performed in parallel with, for example, an operation in which the encoded signal is transmitted over a communication line. When these operations are performed in parallel, the permissible size of the bitstream, that is, of the encoded audio signal, naturally corresponds to the bit rate of the transmission.
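The relation between the transmission bit rate and the permissible bitstream size can be made concrete with a small Python calculation. It assumes a constant-rate channel and a frame of 1024 samples (a common AAC frame length, assumed here rather than stated in this passage); the function name is hypothetical.

```python
def frame_bit_budget(bit_rate_bps: int, sample_rate_hz: int,
                     samples_per_frame: int = 1024) -> int:
    """Average number of bits one frame may occupy on a constant-rate channel."""
    # Frame duration is samples_per_frame / sample_rate_hz seconds, so the
    # budget is bit_rate * duration, rounded down to whole bits.
    return (bit_rate_bps * samples_per_frame) // sample_rate_hz


if __name__ == "__main__":
    print(frame_bit_budget(128_000, 48_000))  # prints 2730
```

A value obtained this way could serve as the predetermined bit length checked in step S11g.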

The speech encoding apparatus according to the embodiments of the present invention may be a computer that operates by means of a program. The present invention can, of course, be applied to any apparatus that encodes audio signals. The elements described in the above embodiments may also be combined as appropriate. The present invention is not limited to the configurations described above, and various modifications are possible.

FIG. 1 is a block diagram showing the configuration of a speech encoding apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of MDCT coefficients multiplied by scale factors according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the maximum value, for each sfb, of the MDCT coefficients multiplied by scale factors according to an embodiment of the present invention.
FIG. 4 is a diagram (part 1) showing an example of a sectioning result according to an embodiment of the present invention.
FIG. 5 is a diagram (part 2) showing an example of a sectioning result according to an embodiment of the present invention.
FIG. 6 is a diagram (part 3) showing an example of a sectioning result according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the operation by which the control unit according to the first embodiment of the present invention controls speech encoding.
FIG. 8 is a block diagram showing the configuration of a speech encoding apparatus according to the second embodiment of the present invention.
FIG. 9 is a flowchart showing the operation by which the control unit according to the second embodiment of the present invention controls speech encoding.

Explanation of symbols

11 Control unit
31 Time/frequency conversion unit
32 Psychoacoustic analysis unit
33 Scale factor multiplication unit
34 Sectioning unit
41 Quantization loop processing unit
42 Quantization unit
43 Huffman encoding unit
44 Generated code amount counting unit
51 Formatter unit

Claims

A speech encoding apparatus comprising encoding means for converting an audio signal into a frequency spectrum classified into predetermined scale factor bands, selecting, for each scale factor band, one of a plurality of Huffman tables depending on a representative value of the frequency spectrum classified into that scale factor band, encoding the frequency spectrum classified into the scale factor band with reference to the selected Huffman table, and creating an encoded audio signal that includes the encoded frequency spectrum and a code identifying the referenced Huffman table,
wherein, when the ratio of the representative values of the frequency spectra classified into adjacent scale factor bands is within a predetermined ratio, the encoding means selects the same Huffman table for those scale factor bands.

A speech encoding apparatus comprising encoding means for converting an audio signal into a frequency spectrum classified into predetermined scale factor bands, selecting, for each scale factor band, one of a plurality of Huffman tables depending on a representative value of the frequency spectrum classified into that scale factor band, encoding the frequency spectrum classified into the scale factor band with reference to the selected Huffman table, and creating an encoded audio signal that includes the encoded frequency spectrum and a code identifying the referenced Huffman table,
wherein, when the representative value of the frequency spectrum classified into a first scale factor band is small, and both the representative value of the frequency spectrum classified into a second scale factor band adjacent to the first scale factor band on the low-frequency side and the representative value of the frequency spectrum classified into a third scale factor band adjacent to the first scale factor band on the high-frequency side are larger than the representative value of the first scale factor band, the encoding means selects the same Huffman table for the first, second, and third scale factor bands.

The speech encoding apparatus according to claim 1 or 2, wherein, in converting the audio signal into the frequency spectrum classified into the predetermined scale factor bands, the encoding means converts the audio signal into a frequency spectrum quantized with a quantization step that depends on the scale factor band.

The speech encoding apparatus according to claim 1 or 2, wherein, in converting the audio signal into the frequency spectrum classified into the predetermined scale factor bands, the encoding means converts the audio signal into a frequency spectrum multiplied by a scale factor that depends on the scale factor band.

The speech encoding apparatus according to claim 1 or 2, wherein the representative value of the frequency spectrum classified into a scale factor band is the maximum value or the average value of the frequency spectrum classified into that scale factor band.