JP3210165B2 - Voice encoding / decoding method and apparatus

Info

Publication number: JP3210165B2
Application number: JP04253094A
Authority: JP
Inventors: 幹男水谷; 博幸根本; 慶治江頭
Original assignee: 松下電送システム株式会社
Priority date: 1994-03-14
Filing date: 1994-03-14
Publication date: 2001-09-17
Anticipated expiration: 2016-09-17
Also published as: JPH07248799A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech encoding and decoding method and apparatus.

[0002]

[Prior Art] Conventionally, speech encoding methods of this type have operated as follows.

[0003] Codebooks corresponding to time-axis waveforms of audio data are prepared in advance. The codebook that most closely approximates the actual time-axis data is selected from among them, and the number identifying the selected codebook is sent as the compression-encoded data of the audio data.
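The conventional scheme described above amounts to a nearest-neighbour search over stored time-axis waveforms. The following is a minimal sketch only: the squared-error measure is an assumption (the text says only "closest"), and the three-entry codebook and the function name `nearest_codeword` are invented for illustration.

```python
def nearest_codeword(block, codebook):
    """Return the index of the stored waveform closest to `block`.

    Closeness is measured by squared error, which is an assumption;
    the source does not name a distance measure.
    """
    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: squared_error(block, codebook[i]))


# Hypothetical toy codebook of time-axis waveforms (a real one would exceed 1000 entries).
codebook = [
    [0, 0, 0, 0],
    [1, 2, 1, 0],
    [-1, -2, -1, 0],
]
index = nearest_codeword([0.9, 2.1, 1.2, 0.1], codebook)  # only this index is transmitted
```

Every block is compared against every entry, which is why both memory use and search time grow with the size of the codebook.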

[0004]

[Problems to be Solved by the Invention] However, such a conventional method has the following problems.

(1) Because the codebooks correspond to time-axis waveforms, and because a wide variety of waveforms appear in audio data, compression with good sound quality requires an enormous number of codebooks (usually more than 1000). The memory that stores the codebooks must therefore have a large capacity.

(2) Because an enormous number of codebooks is required, finding the codebook that matches the actual audio waveform means fitting that waveform against every one of them, which lengthens the processing time accordingly.

[0005] The present invention solves the above problems and provides a speech encoding and decoding method and apparatus capable of highly accurate speech encoding using only a small number of codebooks.

[0006]

[Means for Solving the Problems] To solve the above problems, the present invention, first, encodes audio data as follows: the audio data is subjected to a discrete cosine transform in units of a predetermined time; the maximum level of the transformed data and the position information of that maximum level are detected; from codebooks composed of frequency patterns, the codebook that approximates the transformed data around the maximum level is selected; the selected codebook is subtracted from the transformed data; and the encoded data of the audio data is generated based on the maximum level, the position information, information identifying the codebook used for the subtraction, and the number of times the processing was performed on the audio data within the predetermined time.

[0007] Second, when audio data is encoded by the above method, the number of times a codebook is subtracted from the data obtained by the discrete cosine transform can be set arbitrarily before the encoding is performed and the encoded data is generated.

[0008] Third, when audio data is encoded by the first method, the lowest level of the transformed data at which the subtraction is still performed can be set arbitrarily; codebooks are subtracted from the data obtained by the discrete cosine transform subject to that level, and the encoded data of the audio data is generated.

[0009] Fourth, when audio data is encoded by the first method, a silent state requires no special processing: encoded data for silence is generated by the same processing that is used when speech is present.

[0010] Fifth, when audio data is encoded by the first method, the subtraction is performed after matching the magnitude and position of the codebook's maximum level to the maximum level of the data obtained by the discrete cosine transform, and the encoded data of the audio data is generated.

[0011] Sixth, when audio data is encoded by the first method, the length of the generated encoded data varies with the number of times a codebook is subtracted from the data obtained by the discrete cosine transform.

[0012]

[Operation] With the method and configuration described above, the present invention first applies a discrete cosine transform to the audio data sampled within a predetermined time, converting it from time-axis data to frequency-axis data. The frequency-axis data is then divided into odd components and even components. Next, for the frequency-axis data so divided, the best-matching frequency pattern is selected from a plurality of frequency patterns stored in memory in advance, and that pattern is fitted to the frequency-axis data and subtracted from it. The subtraction using frequency patterns is repeated until the maximum level of the frequency-axis data remaining after subtraction falls to a predetermined value.
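The transform and the odd/even division can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not say which DCT variant or normalisation is used, so an unnormalised DCT-II is assumed, and "odd and even components" is read here as the odd-indexed and even-indexed coefficients. The names `dct2` and `split_even_odd` are hypothetical.

```python
import math


def dct2(samples):
    """Unnormalised DCT-II (assumed variant): time-axis samples -> frequency-axis data."""
    n = len(samples)
    return [
        sum(samples[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(n)
    ]


def split_even_odd(coeffs):
    """Divide coefficients into even-indexed and odd-indexed groups (assumed reading)."""
    return coeffs[0::2], coeffs[1::2]


samples = [1.0] * 8                 # a constant block, chosen so the result is easy to check
coeffs = dct2(samples)              # all energy lands in coefficient 0
even, odd = split_even_odd(coeffs)
```

The direct double loop costs O(N²) per block; for the 128-sample blocks of the embodiment that is small, though a fast DCT would normally be used in practice.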

[0013] When the above processing is complete, compression-encoded data is generated and sent based on information indicating (1) the maximum level of the frequency-axis data before the subtraction, (2) the number of the frequency pattern used for the subtraction, (3) the frequency position of the maximum level of the subtracted data, and (4) the number of times the subtraction was performed.

[0014] Thus, when audio data is compression-encoded, it is first converted to frequency-axis data by the discrete cosine transform, and the frequency-axis data is then divided into its odd and even components. This increases the correlation coefficient within each of the odd-component and even-component regions, so the number of frequency patterns that must be fitted to the frequency-axis data, and hence the number stored in advance, can be greatly reduced. In addition, the capacity of the memory that stores the frequency patterns can be greatly reduced, and because there are few frequency patterns, the compression-encoding process is simplified and its speed improved.

[0015] On top of these effects, the compression-encoded audio data is generated from the maximum value of the frequency-axis data before processing, the number of the subtracted frequency pattern, the position of the subtracted data, and the number of subtractions, so the compression efficiency of the audio data is increased while its reproducibility is preserved.

[0016]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings.

[0017] First, speech encoding will be described. FIG. 1 is a block diagram of a speech encoding apparatus that implements the speech encoding process according to the present invention. In FIG. 1, reference numeral 1 denotes an A/D converter that samples audio data input as analog data at predetermined intervals and outputs it. Reference numeral 2 denotes a buffer that receives the audio data sampled by the A/D converter 1, holds it until a predetermined number of samples has accumulated, and then outputs it. In this embodiment, the buffer 2 accumulates 128 samples. Reference numeral 3 denotes discrete cosine transform means that applies a discrete cosine transform to the audio data output from the buffer 2; that is, as shown in FIG. 9, it converts the audio data from time-axis data (a) to frequency-axis data (b). FIG. 9 is described later. Reference numeral 4 denotes maximum level detecting means that detects the maximum level of the audio data within the predetermined time, after conversion to frequency-axis data, and the frequency position at which that maximum level occurs. Reference numeral 5 denotes a codebook that stores patterns selected, for their high frequency of occurrence, from the patterns of frequency-axis data produced by the discrete cosine transform means 3. The codebook 5 uses frequency patterns such as those shown in FIG. 8, which shows representative examples. Frequency patterns differ in level, but patterns of similar shape recur, so selecting only the frequently occurring patterns is enough to cover the frequency-axis data that actually appears. A characteristic feature of the present invention is that frequency patterns are used as the codebook, which means that only a small number of frequency patterns need be stored in the codebook 5. In this embodiment, 16 or fewer frequency patterns are sufficient. Reference numeral 6 denotes optimum codebook selecting means that selects from the codebook 5 the frequency pattern that best approximates the data at the frequency position of the maximum level output by the maximum level detecting means 4. Reference numeral 7 denotes calculating means that compares the actual frequency-axis data with the frequency pattern selected from the codebook 5 and performs the subtraction. Reference numeral 8 denotes encoding means that generates and outputs the compression-encoded data of the audio data. The encoding means 8 generates that data based on (1) the maximum level of the audio data converted to frequency-axis data within the predetermined time (hereinafter simply "maximum level") and (2) the frequency position at which the maximum level occurs (hereinafter "position information"), both output from the maximum level detecting means 4, and (3) the number of the frequency pattern (hereinafter "code number") output from the optimum codebook selecting means 6.

[0018] The speech encoding process of the speech encoding apparatus configured as above will now be described with reference to FIGS. 2 to 10. FIG. 2 is a top-level flowchart showing one embodiment of the audio data encoding process of the present invention. FIG. 3 is a flowchart of the maximum level detection process in step 4 of FIG. 2 (steps are hereinafter abbreviated ST). FIG. 4 is a flowchart of the optimum codebook selection process in ST6 of FIG. 2. FIG. 5 is a flowchart of the calculation (subtraction) process in ST7 of FIG. 2. FIG. 6 is a flowchart of encoding process A in ST8 of FIG. 2. FIG. 7 is a flowchart of encoding process B in ST11 of FIG. 2. FIG. 8 shows representative examples of the frequency patterns stored in the codebook 5 of FIG. 1. FIG. 9 shows the state of the data within a predetermined time when audio data is converted from time-axis data to frequency-axis data by the discrete cosine transform. In FIG. 9, (a) shows the time-axis data; (b) shows the data of (a) after discrete cosine transformation to frequency-axis data; (c) shows, as time-axis data, the audio data of (a) as reproduced on the receiving side; and (d) shows the frequency-axis data before it is inverse discrete cosine transformed into the time-axis data (c). FIG. 10 shows the structure of the compression-encoded data.

[0019] First, an outline of the speech encoding process of this embodiment is given with reference to FIG. 2. Before audio data is input, the number of times the calculation process in ST7 is to be repeated is set in a counter (ST1). As described later, the larger this count, the better the reproducibility of the compression-encoded audio data; conversely, the smaller the count, the poorer the reproducibility but the higher the compression efficiency.

[0020] When audio data is input, the A/D converter 1 samples it in units of 128 samples per predetermined time and outputs the sampled audio data one sample at a time. The buffer 2 holds the sampled audio data until 128 samples have accumulated and then outputs them to the discrete cosine transform means 3. At this point the audio data is in the state shown in FIG. 9(a). The discrete cosine transform means 3 converts the input audio data from time-axis data to frequency-axis data (shown in FIG. 9(b)) by the discrete cosine transform (ST2), separates the result into odd and even components, and outputs it (ST3).
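The role of the buffer 2 can be sketched as a generator that collects samples one at a time and releases them in blocks of 128. This is only an illustration: the name `blocks` is hypothetical, and dropping an incomplete final block is an assumption, since the source does not say what happens to leftover samples.

```python
def blocks(samples, size=128):
    """Collect incoming samples and yield them in fixed-size blocks.

    An incomplete trailing block is silently dropped (assumption).
    """
    buffer = []
    for sample in samples:
        buffer.append(sample)          # one sample arrives from the A/D converter
        if len(buffer) == size:
            yield buffer               # hand a full block to the transform stage
            buffer = []


out = list(blocks(range(300)))         # 300 samples -> two full blocks, 44 samples left over
```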

[0021] Separating the frequency-axis data into odd and even components in this way increases the correlation coefficient within each of the odd-component and even-component regions, and data such as that shown in FIG. 9(b) is generated.

[0022] Next, the maximum level among the frequency-axis data for the 128 samples taken within the predetermined time is detected (ST4), together with the frequency position of the data having that maximum level. Here, "maximum level" refers to the magnitude of the data: the level detected is that of the data value with the largest absolute value.

[0023] If the maximum level is at or below a predetermined level (including 0) (ST5), the block is regarded as containing no speech, the calculation using frequency patterns is skipped, and the process proceeds to encoding process B in ST11. In this embodiment, even when the process branches for a silent state, there is no special processing for silence: the process proceeds to encoding process B in ST11, the same processing used when speech is present.

[0024] If, in ST5, the maximum level exceeds the predetermined level, the process proceeds to the calculation using frequency patterns. The maximum level detecting means 4 outputs the maximum level (1) and the position information (2) to the optimum codebook selecting means 6. The optimum codebook selecting means 6 takes frequency patterns out of the codebook 5 one after another. Assuming the codebook 5 holds five frequency patterns, each is fitted in turn, starting with the first, to the actual frequency-axis data, and the one that best approximates the actual data is selected from the five. The frequency patterns stored in the codebook 5 are each constructed so that their maximum level is 1, as shown in FIG. 8(a). When a pattern is fitted to the actual frequency-axis data, it is therefore multiplied by n so that its maximum level matches the maximum level of the actual frequency-axis data. In this way, the optimum codebook selecting means 6 selects the optimum frequency pattern from the codebook 5 (ST6) and outputs the code number (3), which is the number of that frequency pattern.
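The selection step can be sketched as trying each stored pattern in turn and keeping the one that leaves the least behind. This is a sketch under assumptions: the fit is scored by the sum of absolute residual levels, each pattern is stored with the index of its own peak, positions are 0-based, and the three patterns are invented. None of these details is fixed by the text above.

```python
def residual_sum(data, pattern, pattern_peak, peak_pos, gain):
    """Sum of absolute levels left after subtracting the scaled, peak-aligned pattern."""
    total = 0.0
    for k, value in enumerate(data):
        j = k - peak_pos + pattern_peak              # matching index inside the pattern
        p = pattern[j] if 0 <= j < len(pattern) else 0.0
        total += abs(value - gain * p)
    return total


def select_pattern(data, peak_pos, gain, codebook):
    """Return the 0-based code number of the pattern with the smallest residual."""
    best_code, best_error = None, float("inf")
    for code, (pattern, pattern_peak) in enumerate(codebook):
        error = residual_sum(data, pattern, pattern_peak, peak_pos, gain)
        if error < best_error:
            best_code, best_error = code, error
    return best_code


# Hypothetical codebook: (pattern normalised to a peak of 1, index of that peak).
codebook = [
    ([1.0], 0),                # isolated spike
    ([0.5, 1.0, 0.5], 1),      # symmetric hump
    ([1.0, 0.5, 0.25], 0),     # decaying tail
]
data = [0.0, 2.0, 4.0, 2.0, 0.0, 0.0]
code = select_pattern(data, peak_pos=2, gain=4.0, codebook=codebook)
```

Because every pattern has a peak of 1, multiplying by the detected maximum level (`gain`, the "n" of the text) is all that is needed to bring it to the scale of the data.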

[0025] In the calculation process of ST7, the optimum frequency pattern is taken from the codebook 5 according to the code number (3) output by the optimum codebook selecting means 6 and subtracted from the actual frequency-axis data output by the discrete cosine transform means 3. After the subtraction, the number of calculations performed (4) is output to the encoding means 8, and the frequency-axis data for the same predetermined time remaining after the subtraction is output to the maximum level detecting means 4.

[0026] In encoding process A of ST8, a frame forming part of the compression-encoded data is generated, as shown in FIG. 10(b), from the maximum level (1) and position information (2) input from the maximum level detecting means 4 and the code number (3) input from the optimum codebook selecting means 6. Frame 1 is generated from the first calculation performed by the calculating means 7, and frame 2 from the second. The number of frames therefore increases with the number of calculations.

[0027] When encoding process A in ST8 has been completed by the encoding means 8, the count of calculations set in ST1 is decremented (ST9), and it is determined whether the counter value has reached 0 (ST10). If it is 0, the process proceeds to ST11. If it is not 0, the process returns to the maximum level detection of ST4, the maximum level is detected in the frequency-axis data remaining after the subtraction, and the sequence ST5 → ST6 → ST7 → ST8 → ST9 → ST10 is repeated until the counter value reaches 0 in ST10.
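The control flow of ST1 and ST4 to ST10 can be sketched as the loop below. To keep the focus on the counter and the level test, the codebook search of ST6 is replaced by a stand-in that always picks a single-spike pattern (code 0), so each pass removes only the peak itself. The sketch also keeps the sign of the peak in `level`, which is an assumption, because the source records the maximum level as a magnitude. The function name `encode_block` and the frame tuple layout are hypothetical, and a non-empty block is assumed.

```python
def encode_block(coeffs, max_iterations, threshold):
    """Peel off peaks until the counter runs out or the peak is at or below the threshold."""
    residual = list(coeffs)
    frames = []
    counter = max_iterations                                   # ST1: preset number of passes
    while counter > 0:                                         # ST10: stop when counter hits 0
        pos = max(range(len(residual)), key=lambda k: abs(residual[k]))   # ST4
        level = residual[pos]
        if abs(level) <= threshold:                            # ST5: nothing significant left
            break
        code = 0                                               # ST6 stand-in: spike pattern only
        residual[pos] -= level * 1.0                           # ST7: subtract the scaled pattern
        frames.append((level, pos, code))                      # ST8: one frame per pass
        counter -= 1                                           # ST9
    return frames, residual


coeffs = [0.0, 5.0, -9.0, 1.0, 3.0]
frames, residual = encode_block(coeffs, max_iterations=2, threshold=0.5)
early, _ = encode_block(coeffs, max_iterations=10, threshold=2.0)   # stops before the counter does
```

With `max_iterations=2` the counter ends the loop after two frames; with `max_iterations=10` and a threshold of 2.0 the level test ends it after three, before the counter reaches 0.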

[0028] If a large counter value is set in ST1, the number of calculations using frequency patterns (ST7) increases, and audio data is compressed with correspondingly higher sound quality. On the other hand, the larger number of calculations increases the number of frames in the compression-encoded data shown in FIG. 10(a), lengthening the data and lowering the compression ratio of the audio data. Thus, according to the present invention, the compression ratio of the audio data can be set and changed freely.

[0029] Furthermore, even when a large counter value has been set in ST1, if the maximum level falls below the predetermined level in ST5, as happens when the frequency-axis data is almost identical to a frequency pattern in the codebook 5, the calculation is not repeated; it stops at that point even though the counter has not reached 0, and the process proceeds to encoding process B in ST11. Thus, according to the present invention, when the actual frequency-axis data closely resembles a frequency pattern, fewer calculations are performed than the preset counter value, and when it does not, calculations continue up to the preset counter value. The compression ratio can therefore be raised while the reproducibility of the audio data is assured.

[0030] The predetermined level in ST5 can be set arbitrarily. Setting it high degrades the reproducibility of the audio data accordingly, but raises the compression ratio because the calculation is cut off regardless of the counter value set in ST1. Setting it low improves the reproducibility of the audio data accordingly, but lowers the compression ratio because the calculation continues until the counter value set in ST1 reaches 0.

[0031] When the counter value reaches 0 in ST10, or the maximum level is at or below the predetermined level in ST5, the process proceeds to encoding process B in ST11. Encoding process B combines frames 1 to n, generated in encoding process A of ST8 from the results of the successive calculations, into a single unit of compression-encoded data. As shown in FIG. 10(a), header information indicating that the data is audio data for the predetermined time is placed at the head, followed by frames 1 to n in order, to form the compression-encoded data for the audio data within the predetermined time. The compression-encoded data generated in this way is then output externally.

[0032] If the input audio data is silent, the compression-encoded data generated in ST11 consists of only the header portion of FIG. 10(a). When compression-encoded data is decoded, data consisting of only the header portion is judged to be silence. As described above, even when a silent state is detected, no processing specific to silence is performed; the compression-encoded data is generated by the normal processing used when speech is present, so the amount of software is reduced accordingly. Moreover, when there is no speech, the amount of compressed information decreases, so the compression ratio of the audio data rises accordingly. The compression-encoded data for silence may instead consist of the header with one frame appended.
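The header-plus-frames layout, and the way silence falls out of it without a special case, can be sketched as below. The concrete byte layout is entirely an assumption made for illustration: a one-byte header holding the frame count, and per frame a signed 16-bit level, an 8-bit position and an 8-bit code number. The source fixes neither the header contents nor any field widths, and the function names are hypothetical.

```python
import struct


def pack_block(frames):
    """Header (frame count, assumed) followed by one 4-byte record per frame."""
    out = struct.pack(">B", len(frames))
    for level, position, code in frames:
        out += struct.pack(">hBB", level, position, code)
    return out


def unpack_block(data):
    """Recover the list of (level, position, code) frames from a packed block."""
    (count,) = struct.unpack_from(">B", data, 0)
    return [struct.unpack_from(">hBB", data, 1 + 4 * i) for i in range(count)]


def is_silent(data):
    """A block consisting of the header alone is treated as silence."""
    return len(data) == 1


voiced = pack_block([(80, 13, 2), (-30, 6, 0)])   # two passes -> header + 2 frames = 9 bytes
silent = pack_block([])                           # no passes  -> header only     = 1 byte
```

The same `pack_block` call serves both cases, and the encoded length grows by one record per subtraction pass, which is the variable data length described earlier.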

[0033] The above is an outline of the speech encoding process of this embodiment. The maximum level detection process of ST4 in FIG. 2 will now be described in detail with reference to FIG. 3.

[0034] First, a counter value is set and the maximum level is set to 0 (ST1). The counter value here is the number of samples within the predetermined time. In this embodiment, 128 samples of audio data are taken within the predetermined time, so the counter value is 128. Next, the audio data converted to frequency-axis data is input from the discrete cosine transform means 3. The following processing is repeated for the frequency-axis data in order, from the first data value to the 128th.

[0035] In ST2, the first data value is subtracted from the maximum level of 0 set in ST1. If the first data value is 0, the result of the subtraction is 0, so the process proceeds to ST5, where the counter value set in ST1 is decremented from 128 to 127.

[0036] In ST6, it is determined whether the counter value set in ST1 has reached 0. In this example the counter value is 127, so the process returns to ST2. The sequence ST2 → ST3 → ST5 → ST6 is repeated, with the maximum level remaining 0, until, when the counter value is 5, the data level is 50. In ST2, the level 50 is subtracted from the maximum level 0. The result is -50, which is less than 0, so the process proceeds to ST4. In ST4, the maximum level is set to 50 and the position information to 5. The process then proceeds ST5 → ST6 → ST2.

[0037] Next, suppose the counter value is 6 and the data level is -30. The subtraction in ST2 uses absolute values, so the level 30 is subtracted from the previously updated maximum of 50. The result is 20, which is greater than 0, so the process proceeds to ST5; that is, the maximum level is not updated. The process proceeds ST5 → ST6 → ST2.

[0038] Suppose the processing is repeated several more times and the maximum level is still 50 when the counter value is 12, and that when the counter value is 13 the data level is -80. In ST2, the absolute level 80 is subtracted from the maximum level 50. The result is -30, which is less than 0, so the process proceeds to ST4, where the maximum level is updated from 50 to 80 and the position information from 5 to 13.

[0039] The above processing is repeated up to the 128th data value, and the maximum level (1) and position information (2) of the audio data converted to frequency-axis data within the predetermined time are detected and output.
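The walk-through above can be reproduced with a short sketch. It follows the same test as ST2 and ST3 (subtract the absolute data level from the current maximum and update when the result is negative) and uses 1-based positions to match the numbers in the text. Iterating with `enumerate` instead of a down-counter, and the function name `detect_max_level`, are choices made for illustration.

```python
def detect_max_level(data):
    """Return (maximum absolute level, 1-based position) of the frequency-axis data."""
    max_level = 0                                  # ST1: start from a maximum of 0
    position = None                                # stays None if every value is 0
    for index, value in enumerate(data, start=1):
        if max_level - abs(value) < 0:             # ST2/ST3: is this value larger in magnitude?
            max_level = abs(value)                 # ST4: update the level ...
            position = index                       # ... and the position information
    return max_level, position


# The example from the text: 50 at position 5, -30 at position 6, -80 at position 13.
data = [0] * 128
data[4], data[5], data[12] = 50, -30, -80
result = detect_max_level(data)
```

The value at position 6 leaves the maximum untouched because 50 - 30 is not negative, while the value at position 13 replaces it because 50 - 80 is.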

[0040] The above is the maximum level detection process. The optimal codebook selection process of ST6 in FIG. 2 is described in detail below with reference to FIG. 4.

[0041] First, a counter value is set, the maximum level (1) and the position information (2) are input from the maximum level detecting means 4, and an arbitrarily large value is assigned to the minimum subtraction error (ST1). Here the counter value indicates the number of frequency patterns stored in codebook 5. In this embodiment there are five frequency patterns, as shown in FIG. 8(a) to (e). The minimum subtraction error is initialized to a value larger than the largest total that can arise when a frequency pattern in codebook 5 is subtracted from the actual frequency-axis data and the results are summed.

[0042] Next, the first frequency pattern stored in codebook 5 is read out (ST2). Suppose this is pattern (a) of FIG. 8. Frequency pattern (a) is multiplied by n to match the maximum level (ST3). As described earlier, the frequency patterns are stored in codebook 5 with the height of their maximum level set to 1. Therefore, if the maximum level input from the maximum level detecting means 4 is 80, frequency pattern (a) is multiplied by 80.

[0043] In ST4, the level-adjusted frequency pattern (a) is aligned so that its maximum level coincides with the maximum level of the actual frequency-axis data, and is then subtracted. As a result, the maximum level of the actual frequency-axis data disappears completely after the subtraction. The sum of the levels remaining at each frequency position after the subtraction is then stored as the subtraction error.
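The scale-and-subtract step can be illustrated with a short sketch. The source does not say how a pattern is stored or how "the sum of the levels" treats negative residues, so the list representation, the peak-alignment rule, the absolute-value sum, and the names `subtract_pattern` and `subtraction_error` are all assumptions for illustration.

```python
def subtract_pattern(spectrum, pattern, max_level, position):
    """Subtract a unit-peak pattern scaled by max_level from the spectrum.

    The pattern's own peak is aligned to the 1-based `position` of the
    spectrum's peak. Pattern bins that fall outside the spectrum are ignored.
    """
    peak_index = max(range(len(pattern)), key=lambda i: pattern[i])
    residual = list(spectrum)
    for i, value in enumerate(pattern):
        target = (position - 1) + (i - peak_index)
        if 0 <= target < len(residual):
            residual[target] -= max_level * value
    return residual


def subtraction_error(residual):
    """Sum of the remaining levels (taken here as absolute values)."""
    return sum(abs(v) for v in residual)


spectrum = [0.0, 20.0, 80.0, 30.0, 0.0]   # peak of 80 at position 3
pattern_a = [0.25, 1.0, 0.5]              # hypothetical pattern, peak height 1
residual = subtract_pattern(spectrum, pattern_a, 80, 3)
print(residual, subtraction_error(residual))
```

In this toy case the peak bin is cancelled exactly, and only the mismatch at the neighbouring bin contributes to the subtraction error.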

[0044] In ST5, the subtraction error obtained in ST4 is subtracted from the minimum subtraction error set in ST1. Suppose the minimum subtraction error is 1000 and the subtraction error is 20. The difference is 980, which is larger than 0 (ST6), so the process proceeds to ST7.

[0045] In ST7, the minimum subtraction error is updated from 1000 to 20, and 1 is stored as the code number of the frequency pattern.

[0046] In ST8, the counter value set in ST1 is decremented from 5 to 4. It is then determined whether the counter value is 0 (ST9). Since the counter value is not 0 here, the process returns to ST2.

[0047] In ST2, the second frequency pattern is read from codebook 5; here this is pattern (b) of FIG. 8. The second frequency pattern (b) is again multiplied by n to match the maximum level of the actual frequency-axis data as it was before the subtraction in the previous ST4 (ST3).

[0048] In ST4, the second frequency pattern (b) is aligned at its maximum level with the maximum level of the actual frequency-axis data as it was before the subtraction in the previous ST4, and is subtracted. The sum of the levels remaining at each frequency position of the actual frequency-axis data after the subtraction is stored as the subtraction error.

[0049] In ST5, the subtraction error is subtracted from the minimum subtraction error. The minimum subtraction error was last updated to 20; suppose the subtraction error is now 10. The difference is 10, which is larger than 0 (ST6), so the minimum subtraction error is updated from 20 to 10 and the code number is updated from frequency pattern No. 1 to No. 2. The counter value is decremented and the process proceeds ST9 → ST2. The third frequency pattern is read from codebook 5 (ST2); here this is pattern (c) of FIG. 8. The third frequency pattern (c) is again multiplied by n to match the maximum level of the original actual frequency-axis data (ST3).

[0050] In ST4, the third frequency pattern (c) is aligned at its maximum level with the maximum level of the original actual frequency-axis data and is subtracted. The sum of the levels remaining at each frequency position of the actual frequency-axis data after the subtraction is stored as the subtraction error.

[0051] In ST5, the subtraction error is subtracted from the minimum subtraction error. The minimum subtraction error was last updated to 10; suppose the subtraction error is now 30. The difference is -20, which is smaller than 0 (ST6), so the minimum subtraction error is not updated and stays at 10, and the process proceeds ST8 → ST9 → ST2.

[0052] The above processing is repeated until the counter value reaches 0 in ST9. Since codebook 5 holds five frequency patterns here, the processing is repeated five times. If, after the fifth pass, the minimum subtraction error in ST7 is 10 and the code number is 2, the second frequency pattern (b) is selected as the closest match among the frequency patterns in codebook 5. Thus, in the present invention, when a codebook is fitted to audio data for compression encoding, the audio data is first transformed from time-axis data to frequency-axis data by a discrete cosine transform, and frequency patterns are fitted to the result. The frequency patterns that appear usually have similar shapes even though their levels differ, so there is no need to store as many as 1000 patterns in advance. In the optimal codebook selection process as well, only one pattern has to be chosen from a small set of frequency patterns, so the processing is simple and far faster than the conventional method.
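The selection loop of ST5 to ST9 amounts to tracking the smallest subtraction error seen so far. The sketch below assumes the per-pattern subtraction errors have already been computed; the function name `select_code_number` is hypothetical, and the fourth and fifth error values are made up for illustration (the text gives only 20, 10, and 30).

```python
def select_code_number(errors, initial_min_error=1000):
    """Pick the 1-based code number whose subtraction error is smallest.

    `errors[i]` is the subtraction error for frequency pattern i + 1.
    `initial_min_error` must exceed any error that can occur.
    """
    min_error = initial_min_error
    code_number = None
    for number, error in enumerate(errors, start=1):
        # Update only when (min_error - error) is larger than 0.
        if min_error - error > 0:
            min_error = error
            code_number = number
    return code_number, min_error


# Errors 20, 10, 30 follow the text; 40 and 25 are hypothetical.
print(select_code_number([20, 10, 30, 40, 25]))
```

Because the update requires a strictly positive difference, a later pattern with an equal error does not displace an earlier one.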

[0053] The above is the optimal codebook selection process. The calculation process (subtraction) of ST7 in FIG. 2 is described in detail below with reference to FIG. 5.

[0054] First, the calculating means 7 receives the maximum level (1) and the position information (2) from the maximum level detecting means 4, and the code number (3) from the optimal codebook selecting means 6 (ST1). Next, the frequency pattern is read from codebook 5 based on the code number (3). The frequency pattern is then multiplied by n to match the maximum level in the actual frequency-axis data within the predetermined time.

[0055] In ST4, the frequency pattern is subtracted from the actual frequency-axis data with the frequency position of the pattern's maximum level aligned to the frequency position of the maximum level in the actual frequency-axis data, and the levels remaining in the actual frequency-axis data after the subtraction are returned to the maximum level detecting means 4.

[0056] The subtraction is performed with the frequency position of the pattern's maximum level aligned to the frequency position of the maximum level in the actual frequency-axis data. Because the subtraction is thus based on the maximum level of the data, reasonable reproducibility can be ensured, and the compression ratio can be raised, even when the counter value in ST1 of the figure is set small and the subtraction is cut off after few passes.

[0057] The above is the calculation process. Encoding process A of ST8 in FIG. 2 is described in detail below with reference to FIG. 6.

[0058] First, the relative level (1) is input from the maximum level detecting means 4 (ST1). The relative level is the maximum level; however, the maximum level changes as the processing from ST4 to ST10 in FIG. 2 is repeated, and the relative level denotes the maximum level found in each individual pass.

[0059] Next, the position information (2) corresponding to the relative level of ST1 is input from the maximum level detecting means 4 (ST2).

[0060] Next, the code number (3) is input from the optimal codebook selecting means 6. In ST4, each encoded frame is generated from the relative level (1), the position information (2), and the code number (3), as shown in FIG. 10(b).

[0061] The above is encoding process A. Encoding process B of ST10 in FIG. 2 is described in detail below with reference to FIG. 7.

[0062] Once the number of calculation passes set in ST1 of FIG. 2 has been completed, compression encoding of the audio data within the predetermined time is finished, and processing moves on to generating the compression-encoded data for the audio data within that predetermined time.

[0063] The procedure begins by generating the header information shown in FIG. 10(a). The header information consists of a block identifier, the maximum level (1), and the number of calculations (4). The block identifier marks the boundary of each unit of audio data within a predetermined time; when the audio data is reproduced, detecting the block identifier makes it possible to recognize the transition from the audio data of one predetermined time to that of the next. The maximum level (1) is the maximum level of the audio data converted to the frequency axis within the predetermined time, that is, the maximum level detected first in the maximum level detection process of ST4 in FIG. 2. The number of calculations (4) is the count value set in ST1 of FIG. 2 and indicates how many times the processing from ST3 to ST9 in FIG. 2 was repeated.

[0064] The encoding means 8 receives the block identifier (ST1), receives the maximum level (1) from the maximum level detecting means 4 (ST2), and receives the number of calculations (4) from the calculating means 7 (ST3), and generates the header information shown in FIG. 10(b).

[0065] Next, each time the processing of FIG. 2 is repeated, the frame generated by encoding process A in ST8 of FIG. 2 is input (ST4). The frames are appended in order to the header information to generate an encoded block as shown in FIG. 10(a) (ST5), and the encoded block is output externally. The number of frames appended to the header information increases with the number of times the processing of FIG. 2 is repeated. Therefore, the more often the processing of FIG. 2 is repeated, the longer the encoded block (compression-encoded data) of FIG. 10(a) becomes and the lower the compression ratio. Conversely, the fewer the repetitions, the shorter the encoded block and the higher the compression ratio. In this way, the compression ratio of the audio data can be changed by setting the counter value in ST1 of FIG. 2 as desired.
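The header-plus-frames layout described above can be modelled as a small data structure. The source gives no field widths or byte layout, so this sketch keeps the fields as plain Python values; the class names `Frame` and `EncodedBlock`, the field names, and `build_block` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    relative_level: int   # (1) maximum level found in this pass
    position: int         # (2) position of that maximum
    code_number: int      # (3) selected frequency pattern


@dataclass
class EncodedBlock:
    block_id: int           # block identifier marking the block boundary
    max_level: int          # (1) maximum level detected first
    num_calculations: int   # (4) number of subtraction passes
    frames: List[Frame] = field(default_factory=list)


def build_block(block_id, frames):
    """Assemble header information and append the frames in order."""
    first_level = frames[0].relative_level if frames else 0
    return EncodedBlock(block_id, first_level, len(frames), list(frames))


frames = [Frame(80, 13, 2), Frame(30, 40, 1)]
block = build_block(1, frames)
print(block.num_calculations, len(block.frames))
```

One frame is stored per pass, so a larger pass count directly lengthens the block, which is the trade-off between quality and compression ratio that the text describes.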

[0066] The above is encoding process B. As is apparent from the above description, according to the present invention, when audio data is compression-encoded, the audio data is first transformed from time-axis data to frequency-axis data by a discrete cosine transform, the result is separated into odd and even components, and a codebook composed of frequency patterns is fitted to it and subtracted. Because the frequency-axis data is separated into odd and even components and the codebook consists of frequency patterns, the patterns that appear all have roughly similar shapes even though their levels differ individually. Only a small number of codebook entries therefore need to be prepared in advance, the selection of the entry that fits the actual audio data can be carried out simply with few processing steps, and the selection is correspondingly faster.

[0067] The compression-encoded data of the audio data is generated from the maximum level (1), position information (2), code number (3), and number of calculations (4) obtained in the course of the subtraction process, and its data length grows in proportion to the number of subtraction passes. The data length of the compression-encoded data, that is, the compression ratio of the audio data, can therefore be changed as desired by setting the number of subtraction passes, so the compression ratio can be raised as required. Furthermore, even when the number of subtraction passes is set small to raise the compression ratio, the subtraction is based on the maximum level of the data, so reproducibility sufficient to make out the content and characteristics of the speech can be ensured.

[0068] The process of reproducing audio data compression-encoded as described above is explained below with reference to the drawings.

[0069] FIG. 11 is a block diagram of a speech decoding apparatus that implements the speech decoding process according to the present invention. In FIG. 11, reference numeral 11 denotes decoding means that extracts the maximum level (1), position information (2), code number (3), and number of calculations (4) from the input compression-encoded data. Reference numeral 12 denotes a codebook that stores frequency patterns of the same kind and number as those used in the speech encoding process, managed under the same code numbers. Reference numeral 13 denotes calculating means that reproduces the audio data as frequency-axis data by calculation (addition), based on the maximum level (1), position information (2), code number (3), and number of calculations (4) input from the decoding means and on the frequency pattern input from codebook 12. Reference numeral 14 denotes inverse discrete cosine transform means that transforms the frequency-axis data output from the calculating means 13 into time-axis data. Reference numeral 15 denotes a buffer that outputs the audio data input from the inverse discrete cosine transform means 14. Reference numeral 16 denotes a D/A converter that D/A-converts the data input from buffer 15 one sample at a time and outputs it.

[0070] The speech decoding process of the speech decoding apparatus of the present invention configured as above is described below with reference to FIGS. 12 to 15. FIG. 12 is a top-level flowchart showing one embodiment of the audio data decoding process of the present invention. FIG. 13 is a flowchart showing decoding process A in ST1 of FIG. 12. FIG. 14 is a flowchart showing decoding process B in ST3 of FIG. 12. FIG. 15 is a flowchart showing the calculation process (addition) in ST4 of FIG. 12. FIG. 16 is a flowchart showing the inverse discrete cosine transform process in ST7 of FIG. 12.

[0071] First, an outline of the speech decoding process of this embodiment is given with reference to FIG. 12. When compression-encoded data is input, decoding process A is performed (ST1). Decoding process A splits the encoded block of FIG. 10(a) into header information and frames, and extracts the maximum level (1) and the number of calculations (4) from the header information.

[0072] Next, a counter value is set based on the number of calculations (4) extracted from the header information (ST2). The addition process of ST4 is thus performed the same number of times as the subtraction process was performed when the audio data was encoded according to FIG. 2.

[0073] Next, decoding process B is performed. Decoding process B extracts, from frame n in the encoded block of FIG. 10(a), the relative level (1), position information (2), and code number (3) set in each frame.

[0074] The process proceeds to ST4, where the calculation process (addition) is performed. In the calculation process, the frequency pattern corresponding to the code number (3) input from the decoding means 11 is read from codebook 12, and the maximum level of that frequency pattern is multiplied by n based on the relative level (1) input from the decoding means 11. As in the encoding process, each frequency pattern is stored in codebook 12 with its maximum level set to 1, as shown in FIG. 8(a). The frequency pattern multiplied by n based on the relative level (1) is then added at the position indicated by the position information (2). As in the encoding process, the position information (2) indicates which of the 128 positions into which the audio data within the predetermined time is divided is meant.

[0075] When the calculation process is finished, the counter value is decremented (ST5). The processing ST3 → ST4 → ST5 → ST6 is repeated until the counter value reaches 0 (ST6). During this time, the relative level (1), position information (2), and code number (3) are extracted from each successive frame, the corresponding frequency pattern is read from codebook 12, and the frequency pattern is multiplied by n and added at the position indicated by the position information (2).

[0076] When the above processing has been repeated and the counter value reaches 0, data such as that shown in FIG. 9(d) has been generated. This data is subjected to an inverse discrete cosine transform and converted into time-axis data such as that shown in FIG. 9(c) (ST7). The data converted to the time axis is converted to analog data by the D/A converter 16 and then output.
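The transform pair used between the time axis and the frequency axis can be sketched in plain Python. The source does not state which DCT variant or normalization is used, so the orthonormal DCT-II with its DCT-III inverse is an assumption chosen because it gives an exact round trip; the names `dct` and `idct` are likewise illustrative.

```python
import math


def dct(x):
    """Orthonormal DCT-II: time-axis samples to frequency-axis coefficients."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for t in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(
            coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


samples = [1.0, 2.0, 3.0, 4.0]
restored = idct(dct(samples))
print(all(abs(a - b) < 1e-9 for a, b in zip(samples, restored)))
```

This direct form costs O(n²) operations per block, which is adequate for a 128-point illustration; a practical codec would more likely use a fast algorithm.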

[0077] The above is an outline of the speech decoding process according to this embodiment. Decoding process A of ST1 in FIG. 12 is described in detail below with reference to FIG. 13.

[0078] First, when compression-encoded data (an encoded block) is input to the decoding means 11 (ST1), the header information is extracted from the encoded block constituting the compression-encoded data, and the maximum level (1) and the number of calculations (4) are output (ST2, ST3).

[0079] The above is decoding process A. Decoding process B of ST3 in FIG. 12 is described in detail below with reference to FIG. 14.

[0080] First, the decoding means 11 analyzes the first frame in the compression-encoded data (encoded block) (ST1). As a result of the analysis, the relative level (1), position information (2), and code number (3) are output (ST2, ST3, ST4). Control then returns to the flowchart of FIG. 12 and proceeds ST4 → ST5 → ST6 → ST3. When decoding process B is entered again, the same processing is applied to the second frame in the compression-encoded data, and this is repeated until the counter value reaches 0 in ST6 of FIG. 12.

[0081] The above is decoding process B. The calculation process (addition) of ST4 in FIG. 12 is described in detail below with reference to FIG. 15.

[0082] First, the calculating means 13 receives the relative level (1), position information (2), and code number (3) from the decoding means 11 (ST1). Next, the frequency pattern corresponding to the code number (3) is read from codebook 12 (ST2). The maximum level of the frequency pattern is then multiplied by n based on the relative level (1) (ST3). The frequency pattern multiplied by n is added with the position of its maximum level aligned to the position given by the position information (2) (ST4). When the addition of the frequency pattern is finished, control returns to the flowchart of FIG. 12, and the processing ST5 → ST6 → ST3 → ST4 is repeated until the counter value reaches 0 in ST6 of FIG. 12. In the next pass through the calculation process of ST4, the relative level (1), position information (2), and code number (3) obtained by analyzing the second frame in the compression-encoded data are input, and the new frequency pattern is added on top of the previous addition result. The calculation process is repeated until the counter value reaches 0 in ST6 of FIG. 12.
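The additive reconstruction on the decoder side can be sketched as follows. The frame tuples, the dictionary codebook keyed by code number, the peak-alignment rule, and the function name `reconstruct_spectrum` are assumptions for illustration; the two patterns are made up and do not come from FIG. 8.

```python
def reconstruct_spectrum(frames, codebook, length=128):
    """Rebuild frequency-axis data by adding one scaled pattern per frame.

    Each frame is (relative_level, position, code_number) with a 1-based
    position. Every pattern is stored with a peak height of 1.
    """
    spectrum = [0.0] * length
    for relative_level, position, code_number in frames:
        pattern = codebook[code_number]
        peak_index = max(range(len(pattern)), key=lambda i: pattern[i])
        for i, value in enumerate(pattern):
            target = (position - 1) + (i - peak_index)
            if 0 <= target < length:
                # Later frames are added on top of earlier results.
                spectrum[target] += relative_level * value
    return spectrum


codebook = {1: [0.25, 1.0, 0.5], 2: [0.5, 1.0, 0.5]}   # hypothetical patterns
frames = [(80, 3, 1), (10, 4, 2)]
print(reconstruct_spectrum(frames, codebook, length=6))
```

The loop runs once per frame, which corresponds to repeating the addition as many times as the number of calculations (4) carried in the header information.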

[0083] The above is the calculation process (addition). As is apparent from the above description, according to the present invention, when compressed audio data is reproduced, the relative level (1), position information (2), code number (3), and number of calculations (4) are first extracted from the compression-encoded data, and a codebook entry is selected and added using the information obtained from the first frame of the compression-encoded data. The same processing is then performed using the information obtained from the next frame, and decoding proceeds by repeating this the number of times given by the number of calculations (4). This decoding process reliably reproduces the compression-encoded audio data.

【００８４】[0084]

[Effects of the Invention] As is apparent from the above description, when audio data is compression-encoded according to the present invention, the audio data is first transformed from time-axis data to frequency-axis data by a discrete cosine transform, the transformed data is separated into odd and even components, and a codebook composed of frequency patterns is fitted to the resulting frequency-axis data and subtracted. Because the frequency-axis data is separated into odd and even components for this subtraction, the correlation coefficient increases within each of the odd and even regions, and roughly similar patterns appear in the frequency-axis data. When the codebook is then built from frequency patterns, the patterns that appear resemble one another in shape even though their levels differ individually. The number of codebook entries prepared in advance can therefore be reduced substantially while still fitting the actual frequency-axis data, the selection of the entry that fits the actual audio data can be carried out simply with few processing steps, and the selection is correspondingly faster.

[0085] The number of subtraction passes can be set as desired. Setting it large in advance yields compressed audio data of correspondingly higher quality; on the other hand, a large setting increases the number of calculations, lengthens the compression-encoded data accordingly, and lowers the compression ratio of the audio data. The compression ratio of the audio data can thus be changed as desired.

[0086] Even when the number of subtraction passes is set large in advance, if the frequency-axis data and the selected frequency pattern have almost the same shape, the subtraction is stopped as soon as the maximum level of the actual data falls below a preset minimum level, even if the set number of passes has not been reached. Thus, when the actual frequency-axis data and the frequency patterns are close, fewer passes are performed than the preset number, and when they are not close, the preset number of passes is performed. Even with a large setting, processing stops early whenever the frequency-axis data and the selected frequency pattern have almost the same shape, so the compression ratio can be raised while the reproducibility of the audio data is maintained.
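The early-termination rule can be illustrated with a compact sketch of the whole subtraction loop. Several details here are assumptions rather than statements from the source: the peak level is kept signed so that subtraction cancels the peak, the subtraction error is a sum of absolute residues, and the names `subtract_scaled`, `encode_block`, and `min_level` are hypothetical.

```python
def subtract_scaled(residual, pattern, level, position):
    """Subtract level * pattern with the pattern peak at 1-based position."""
    peak = max(range(len(pattern)), key=lambda i: pattern[i])
    out = list(residual)
    for i, value in enumerate(pattern):
        target = (position - 1) + (i - peak)
        if 0 <= target < len(out):
            out[target] -= level * value
    return out


def encode_block(spectrum, codebook, max_count, min_level):
    """Return frames (level, position, code); stop early below min_level."""
    residual = list(spectrum)
    frames = []
    for _ in range(max_count):
        index = max(range(len(residual)), key=lambda i: abs(residual[i]))
        level = residual[index]          # signed peak (assumption)
        if abs(level) < min_level:
            break                        # remaining data is negligible
        best_code, best_error, best_residual = None, float("inf"), None
        for code, pattern in codebook.items():
            candidate = subtract_scaled(residual, pattern, level, index + 1)
            error = sum(abs(v) for v in candidate)
            if error < best_error:
                best_code, best_error, best_residual = code, error, candidate
        frames.append((level, index + 1, best_code))
        residual = best_residual
    return frames


codebook = {1: [0.25, 1.0, 0.5], 2: [0.5, 1.0, 0.5]}   # hypothetical patterns
print(encode_block([0.0, 20.0, 80.0, 40.0, 0.0], codebook, 5, 5.0))
```

In this example the first pattern matches the data exactly, so a single frame is produced even though up to five passes were allowed.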

[0087] Further, even when a silent state is detected, no processing dedicated to the silent state is performed; the compression-encoded data is generated by the same normal processing used when speech is present, so the amount of software can be reduced accordingly. Furthermore, when there is no audio data, the amount of compressed information decreases, so the compression ratio of the audio data can be raised accordingly.

[0088] Also, even when the number of subtractions is set small in order to raise the compression ratio, the subtraction is performed by fitting the codebook with reference to the maximum level of the actual data, so reproducibility sufficient to make out the content and characteristics of the speech can be ensured.

[0089] The compression-encoded data of the audio data is generated from (1) the maximum level, (2) the position information, (3) the code number, and (4) the number of calculations, all of which are obtained in the course of the subtraction processing. When the number of subtractions is increased, the data length of the compression-encoded data grows in proportion to that number. Therefore, by setting the number of subtractions, the data length of the compression-encoded data can be set and changed arbitrarily, and the compression ratio of the audio data can be raised as desired.
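
The relationship between the number of subtractions and the data length can be illustrated with a small serialization sketch. The byte layout below is an assumption made purely for this example (a 1-byte count followed, per subtraction, by a signed 16-bit level, a 1-byte position, and a 1-byte code number); the actual field widths are those of the data structure shown in FIG. 10, which this sketch does not attempt to reproduce. The names `pack_frame` and `unpack_frame` are likewise hypothetical.

```python
import struct

RECORD_FORMAT = "<hBB"  # assumed: level (int16), position (uint8), code (uint8)
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 4 bytes with this layout


def pack_frame(records):
    """Serialize the count followed by one fixed-size record per subtraction."""
    blob = struct.pack("<B", len(records))
    for level, position, code in records:
        blob += struct.pack(RECORD_FORMAT, level, position, code)
    return blob


def unpack_frame(blob):
    """Read the count first, then exactly that many records."""
    (count,) = struct.unpack_from("<B", blob, 0)
    return [struct.unpack_from(RECORD_FORMAT, blob, 1 + k * RECORD_SIZE)
            for k in range(count)]


for n in (0, 1, 3):
    records = [(100 - 10 * k, k, 0) for k in range(n)]
    print(n, len(pack_frame(records)))
```

With this assumed layout the encoded length is 1 + 4 × (number of subtractions) bytes, so it grows linearly with the count, and a frame in which no subtraction was performed occupies only the count byte.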

[Brief description of the drawings]

FIG. 1 is a block diagram of an arrangement that performs the speech encoding processing according to the present invention.

FIG. 2 is a high-level flowchart showing an embodiment of the audio data encoding processing according to the present invention.

FIG. 3 is a flowchart showing the maximum level detection processing in ST4 of FIG. 2.

FIG. 4 is a flowchart showing the optimal codebook selection processing in ST6 of FIG. 2.

FIG. 5 is a flowchart showing the calculation processing (subtraction) in ST7 of FIG. 2.

FIG. 6 is a flowchart showing encoding processing A in ST8 of FIG. 2.

FIG. 7 is a flowchart showing encoding processing B in ST11 of FIG. 2.

FIG. 8 is a diagram showing representative frequency patterns stored in the codebook 5 of FIG. 1.

FIG. 9 is a diagram showing the state of the data over a predetermined time after the audio data has been converted from time-axis data to frequency-axis data by the discrete cosine transform.

FIG. 10 is a data structure diagram showing the structure of the compression-encoded data.

FIG. 11 is a block diagram of an arrangement that realizes the speech decoding processing according to the present invention.

FIG. 12 is a high-level flowchart showing an embodiment of the audio data decoding processing according to the present invention.

FIG. 13 is a flowchart showing decoding processing A in ST1 of FIG. 12.

FIG. 14 is a flowchart showing decoding processing B in ST3 of FIG. 12.

FIG. 15 is a flowchart showing the calculation processing (addition) in ST4 of FIG. 12.

[Explanation of symbols]

3 Discrete cosine transform means
4 Maximum level detecting means
5 Codebook
6 Optimal codebook selecting means
7 Calculating means
8 Encoding means

Continuation of front page
(56) References: JP-A-5-218980 (JP, A); JP-A-5-145427 (JP, A); JP-A-6-291674 (JP, A); JP-A-4-249300 (JP, A)
(58) Fields investigated (Int. Cl.7, DB name): G10L 19/00; G10L 11/00

Claims

(57) [Claims]

1. A speech encoding method comprising: a first process of performing a discrete cosine transform on audio data in units of a predetermined time; a second process of separating the data obtained by the discrete cosine transform into an odd component and an even component; a third process of detecting the maximum level of the data and position information of the maximum level; a fourth process of selecting, from a codebook composed of frequency patterns, the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; a fifth process of subtracting the codebook from the data obtained by the discrete cosine transform; and a sixth process of generating encoded data of the audio data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of times the fifth process was performed on the audio data within the predetermined time.

2. A speech encoding apparatus comprising: means for performing a discrete cosine transform on audio data in units of a predetermined time; means for separating the data obtained by the discrete cosine transform into an odd component and an even component; means for detecting the maximum level of the separated data and position information of the maximum level; means for storing a codebook composed of frequency patterns; means for selecting the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; means for subtracting the codebook from the data around the maximum level; and means for generating encoded data of the audio data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of times of the subtraction performed on the audio data within the predetermined time.

3. A speech encoding apparatus comprising: means for performing a discrete cosine transform on audio data in units of a predetermined time; means for separating the data obtained by the discrete cosine transform into an odd component and an even component; means for detecting the maximum level of the separated data and position information of the maximum level; means for storing a codebook composed of frequency patterns; means for selecting the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; means for subtracting the codebook from the data around the maximum level; means for arbitrarily setting the number of times of the subtraction; and means for generating encoded data of the audio data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of times of the subtraction performed on the audio data within the predetermined time.

4. A speech encoding apparatus comprising: means for performing a discrete cosine transform on audio data in units of a predetermined time; means for separating the data obtained by the discrete cosine transform into an odd component and an even component; means for detecting the maximum level of the separated data and position information of the maximum level; means for storing a codebook composed of frequency patterns; means for selecting the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; means for subtracting the codebook from the data around the maximum level; means for arbitrarily setting, among the levels of the data obtained by the discrete cosine transform, a minimum level at which the subtraction is performed; and means for generating encoded data of the audio data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of times of the subtraction performed on the audio data within the predetermined time.

5. A speech encoding method comprising: a first process of performing a discrete cosine transform on audio data in units of a predetermined time; a second process of separating the data obtained by the discrete cosine transform into an odd component and an even component; a third process of detecting the maximum level of the data and position information of the maximum level; a fourth process of selecting, from a codebook composed of frequency patterns, the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; a fifth process of subtracting the codebook from the data around the maximum level; and a sixth process of generating encoded data of the audio data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of times the fifth process was performed on the audio data within the predetermined time, wherein, even in a silent state, encoded data of the silent data is generated by the sixth process.

6. The speech encoding method according to claim 1, wherein, when performing the fifth process, the codebook obtained by the fourth process is multiplied by n in accordance with the maximum level of the data obtained by the discrete cosine transform, the position of the maximum level of the codebook is aligned with the position of the maximum level of the data obtained by the discrete cosine transform, and the codebook is subtracted from the data obtained by the discrete cosine transform.

7. The speech encoding method according to claim 1, wherein, when performing the sixth process, the encoded data is generated with a data length that varies according to the number of times the fifth process was performed.

8. A speech decoding method comprising: a first process of inputting encoded data of audio data generated by the speech encoding method according to claim 1; a second process of extracting, from the encoded data, the maximum level, the position information of the maximum level, and information indicating the codebook composed of frequency patterns that was used in the encoding process, and selecting a codebook based on the information indicating the codebook; and a third process of aligning the maximum level of the selected codebook with the position indicated by the position information and performing addition, thereby reproducing audio data from the encoded data.

9. A speech decoding apparatus comprising: means for inputting encoded data of audio data generated by the speech encoding apparatus according to claim 2; means for extracting, from the encoded data, the number of operations performed on the audio data within the predetermined time, and for extracting, from the frames generated according to that number of operations, the maximum level, the position information of the maximum level, and information indicating the codebook composed of frequency patterns that was used in the encoding process; means for storing a codebook composed of frequency patterns; means for selecting a codebook based on the information indicating the codebook; means for aligning the maximum level of the codebook with the position indicated by the position information and performing addition; and means for causing the adding means to repeat the addition of the codebook for the extracted number of operations, thereby reproducing audio data from the encoded data.