JPH10260695A

JPH10260695A - Speech signal encoding device

Info

Publication number: JPH10260695A
Application number: JP9066272A
Authority: JP
Inventors: Teruo Hoshi; 照雄法師
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1997-03-19
Filing date: 1997-03-19
Publication date: 1998-09-29

Abstract

PROBLEM TO BE SOLVED: To reduce the code amount in the high-frequency band and achieve more efficient encoding by handling several frequency bands together as one bundled band and omitting the recording of the mantissa for the bundled band. SOLUTION: The block data of each audio block is formed from digital data on the frequency axis obtained by the MDCT for each prescribed period. From this frequency data, a signal indicating the intensity of each of 100 fine bands at a fixed interval is obtained. To reduce the data amount while preventing deterioration of sound quality, the intensity signals of the bands at high frequencies, where the auditory influence is small, are handled together in groups of approximately 3 to 12 bands; that is, the high-frequency bands are bundled. In addition, audio blocks are reused during reproduction to extend the recording and reproduction time per frame.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an audio signal encoding device used when digitally storing audio signals and the like, and in particular to one capable of highly efficient encoding.

[0002]

[Prior Art] Analog information signals have conventionally been encoded and recorded digitally. For example, on the CD (Compact Disc) and the MD (Mini Disc), analog audio signals are recorded as digital data.

[0003] Attempts have also been made to use a semiconductor memory as the audio recording device in telephones with an answering-machine function and similar equipment, and to record audio signals digitally in that memory. Audio recording using semiconductor memory is coming into wider use because recording and reproduction are easy and because it is convenient for miniaturizing the device. Furthermore, as semiconductor memories grow in capacity and fall in price, their use for recording music and the like has also been proposed.

[0004] In such digital recording of audio signals, the analog audio signal should be converted into a digital signal in a way that allows reproduction as faithful to the original sound as possible while keeping the amount of digital data to be recorded as small as possible. In particular, for audio recording using semiconductor memory, although memory prices have fallen, large-capacity memory remains expensive, so the encoding must be as efficient as possible to reduce the data amount.

[0005] For encoding audio signals, the PCM (Pulse Code Modulation) method, which represents the instantaneous values of the time-series audio signal as digital values, is generally used. However, the PCM method has poor encoding efficiency: raising the sampling frequency Fs or increasing the number of bits allocated to each sample in order to prevent degradation of the reproduced sound increases the data amount, so the recording time becomes insufficient. For example, with a sampling frequency Fs of 8 kHz and 8-bit quantization, the data rate is 64 kbps, and even with a 32 Mbit memory the recording time is only 8 minutes 44 seconds, which is usable only for simple voice memos. Conversely, lowering the sampling frequency Fs or the number of allocated bits reduces the data amount, but the sound quality during reproduction drops markedly.
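The PCM figures quoted in paragraph [0005] can be checked with a few lines of arithmetic. The sketch below is illustrative only; it assumes that "32 Mbit" means 32 × 2^20 bits, which is the reading that reproduces the stated 8 minutes 44 seconds. The function name is hypothetical.

```python
def pcm_recording_time(fs_hz, bits_per_sample, memory_bits):
    """Return (bit_rate_bps, seconds) for uncompressed single-channel PCM."""
    bit_rate = fs_hz * bits_per_sample
    return bit_rate, memory_bits / bit_rate


# Assumption: "32 Mbit" is taken as 32 * 2**20 bits.
MEMORY_BITS = 32 * 2**20

rate, seconds = pcm_recording_time(8_000, 8, MEMORY_BITS)
minutes, secs = divmod(int(seconds), 60)
print(rate, minutes, secs)  # 64000 8 44
```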

[0006] A method has therefore been proposed that uses a technique such as the discrete cosine transform to convert the audio data into data on the frequency axis and encodes it as data for each frequency band. With this method, the loss of fidelity caused by rounding of numerical values is small, and the influence of errors due to digitization can be reduced. Specifically, when an analog audio signal is digitized at a predetermined sampling period into time-axis data, an error in the individual samples introduces characteristic frequency components. Digital processing therefore tends to generate unwanted frequency components, which become noise and degrade the sound quality. In contrast, if the data is converted onto the frequency axis, errors merely change the balance among the frequency bands, so the degradation in sound quality can be kept to a minimum. Converting the digital data into frequency-axis data thus allows the amount of data to be reduced further and enables highly efficient encoding.

[0007]

[Problem to Be Solved by the Invention] In encoding that uses such an orthogonal transform, level data (or its difference) is obtained for each frequency band, and reducing the data amount means reducing the number of bits allocated to each frequency band. To maintain sound quality, however, the number of bits allocated to the intensity level within each band cannot be reduced very much. The overall data amount is therefore reduced by decreasing the number of bits for the low-frequency and high-frequency bands, which have little effect on sound quality. Even so, the smaller the data amount the better, and high-efficiency encoding that achieves a further reduction is called for.

[0008] The present invention has been made in view of the above problem, and its object is to provide a signal encoding device that reduces the code amount in the high-frequency band and can perform more efficient encoding.

[0009]

[Means for Solving the Problem] To achieve the above object, the present invention is an encoding device used when digitally storing an audio signal, characterized in that the audio signal is divided into a large number of frequency bands, the level of each frequency band is expressed by an exponent and a mantissa, and, for the high-frequency bands at or above a certain frequency, a plurality of frequency bands are handled together as one bundled band and the recording of the mantissa for this bundled band is omitted.

[0010] Expressing the level of each frequency band with an exponent and a mantissa, and omitting the mantissa for the high-frequency bundled bands, reduces the amount of stored data. In the high-frequency bundled bands the phase is not important, and because each bundled band combines several fine frequency bands, the frequency difference between adjacent bundled bands is large, so the waveform-synthesis effect is not important either. Omitting the mantissa of these high-frequency bundled bands therefore reduces the data amount with almost no effect on sound quality. The exponents of the frequency bands can be encoded directly, but the data amount can be reduced further by also using a scheme in which the exponents of the frequency bands and bundled bands are expressed as difference data.

[0011] In the present invention, the above audio signal encoding device is further characterized in that, for each bundled band, the absolute values of the levels of the frequency bands being bundled are averaged to give the level of the bundled band. At high frequencies there is a strong masking effect, whereby a high level in a neighboring band makes the sound in other bands hard to hear. Consequently, even if the data amount is reduced by representing a bundled band by the average of the absolute values of the levels of its frequency bands, the deterioration in sound quality remains slight.

[0012]

[Embodiments of the Invention] Embodiments of the present invention (hereinafter, "embodiments") are described below with reference to the drawings.

[0013] [Overview of the information-signal encoding scheme] In this embodiment, the information signals to be encoded are analog audio signals such as the human voice, natural sounds, and the sounds of musical instruments. The audio signal is sampled at a predetermined frequency and converted into digital data, and the digital data on the time axis is then converted into data on the frequency axis using the discrete cosine transform mentioned above, here the MDCT (Modified Discrete Cosine Transform).

[0014] FIG. 1 shows the hierarchical structure of the audio recording data in the encoding scheme of this embodiment.

[0015] In the encoding scheme of this embodiment, the smallest unit of audio recording is the audio block, and the block data of each audio block is formed from the digital data on the frequency axis obtained by the above MDCT for each 10 ms period. Specifically, from the frequency data obtained by the MDCT every 10 ms, a signal indicating the intensity of each of 100 fine bands spaced 50 Hz apart from DC to 5 kHz is first obtained. In this embodiment, to reduce the data amount while preventing a loss of sound quality, the intensity signals of the bands in the high-frequency range (for example, the band at or above 1 kHz), where the audible effect is small, are handled together in groups of about 3 to 12 bands. That is, several bands on the high-frequency side are bundled and handled as one. In addition, the components at 25 Hz and below are omitted, and the 100 fine bands in total are compressed into intensity signals for 25 bands #. Here, "band #" denotes a band as recorded. The spectrum intensity of each band # is then expressed by an exponent and a mantissa, as described later, to form the data of one audio block.

[0016] As shown in FIG. 1, one audio block consists of 4 bits of initial data, 56 bits of difference exponents, and 20 bits of mantissas. The initial data indicates the exponent of the spectrum intensity of the 75 Hz band. The difference exponents express the spectrum intensity of each of the 24 bands # that follow the 75 Hz band as difference data; 7 bits represent 3 bands #, so the spectrum intensities of all 24 bands # are covered. The mantissa represents the phase for each exponent, and is omitted for the bundled bands on the high-frequency side.

[0017] Three audio blocks make up one frame of audio information. One frame is set to a fixed length of 2ⁿ bits (for example, n = 8, i.e. 256 bits), which makes it possible to omit a synchronization signal for identifying frames. In addition, 2ⁿ frames (for example, n = 6, i.e. 64 frames) make up one superframe.

[0018] One frame holds the audio block data of three audio blocks (AB) of 80 bits each and represents at least 30 ms of audio information. When consecutive blocks have a predetermined similarity, or when audio blocks whose exponent peak level is at or below a predetermined value continue, the recording of some audio blocks, from the second block onward, is omitted.

[0019] The number of times each of the three audio blocks is used is recorded in mode data, for example 15 bits, provided in the same frame. During reproduction, the same audio block is used several times, according to the usage count indicated in this mode data, for repeat reproduction.

[0020] By reusing audio blocks during reproduction in this way, the recording and reproduction time per frame can be extended by up to 17.3 times (to 520 ms). In audio signals, a given approximate waveform often continues for some time, and in such cases the recording of similar data can be omitted. Also, when the audio level is low, an audio block can be replaced by the preceding audio block.

[0021] Furthermore, to shorten the reproduction time or for similar purposes, a setting may cause the recorded audio blocks to be reproduced fewer times than the repeat count indicated in the mode data.

[0022] In one 256-bit frame, 240 bits are allocated to the three audio blocks AB0 to AB2, 15 bits are allocated to the mode data, and the remaining 1 bit is allocated to a subcode. The subcode bits of one superframe are gathered to express a piece of information in 64 bits. One superframe thus forms a unit of audio recording, but the superframe concerns only the subcode; the recording of the audio data itself is complete within each frame.
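The frame layout of paragraph [0022] (3 × 80 + 15 + 1 = 256 bits) can be illustrated with a small pack/unpack sketch. The field order used here (AB0, AB1, AB2, mode data, subcode bit, most significant first) and all function names are assumptions made for illustration; the text fixes only the field widths.

```python
AB_BITS, MODE_BITS, SUB_BITS = 80, 15, 1
FRAME_BITS = 3 * AB_BITS + MODE_BITS + SUB_BITS  # 256


def pack_frame(blocks, mode, subcode_bit):
    """Pack three 80-bit blocks, 15-bit mode data and 1 subcode bit into 32 bytes."""
    assert len(blocks) == 3 and all(0 <= b < 1 << AB_BITS for b in blocks)
    assert 0 <= mode < 1 << MODE_BITS and subcode_bit in (0, 1)
    value = 0
    for b in blocks:
        value = (value << AB_BITS) | b
    value = (value << MODE_BITS) | mode
    value = (value << SUB_BITS) | subcode_bit
    return value.to_bytes(FRAME_BITS // 8, "big")


def unpack_frame(frame):
    """Inverse of pack_frame: return (blocks, mode, subcode_bit)."""
    value = int.from_bytes(frame, "big")
    subcode_bit = value & 1
    value >>= SUB_BITS
    mode = value & ((1 << MODE_BITS) - 1)
    value >>= MODE_BITS
    blocks = []
    for _ in range(3):
        blocks.append(value & ((1 << AB_BITS) - 1))
        value >>= AB_BITS
    blocks.reverse()  # fields were peeled off from the least significant end
    return blocks, mode, subcode_bit


def superframe_subcode(frames):
    """Collect the subcode bit of 64 consecutive frames into one 64-bit integer."""
    assert len(frames) == 64
    word = 0
    for f in frames:
        word = (word << 1) | unpack_frame(f)[2]
    return word
```

Because every frame has the same length, a reader can step through storage in 32-byte strides without any synchronization pattern, which is the property the fixed 2ⁿ-bit frame length is meant to provide.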

[0023] The subcode of each superframe can record, for example, the date and time of the audio recording, or a phrase number corresponding to a header or track number. Setting the number of frames in one superframe to 2ⁿ, as with the number of bits in each frame (although n need not be the same for frames and superframes), also makes it easy during reproduction to read out only the subcode and cue up the desired superframe.

[0024] Encoding the audio signal by the above scheme enables highly efficient encoding at a bit rate of 8.5 kbps or less in this embodiment. For example, 3932 seconds (65 minutes 32 seconds) or more of audio can be recorded in a 32 Mbit memory, which makes the scheme well suited to IC audio recording using semiconductor memory, audio recording on floppy disks, and similar uses.
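The bit rate and capacity figures in paragraph [0024] follow from the frame size and frame duration. The sketch below assumes that "32 Mbit" means 32 × 2^20 bits and that no audio blocks are repeated, so each 256-bit frame covers exactly 30 ms; block reuse would only lower the effective rate and lengthen the recording time.

```python
FRAME_BITS = 256
FRAME_MS = 30                 # three 10 ms audio blocks, no repeats
MEMORY_BITS = 32 * 2**20      # assumption: "32 Mbit" = 32 * 2**20 bits

bit_rate = FRAME_BITS * 1000 / FRAME_MS     # about 8533 bit/s, i.e. roughly 8.5 kbps
frames = MEMORY_BITS // FRAME_BITS          # number of frames that fit in memory
seconds = frames * FRAME_MS / 1000          # playing time without any repeats
minutes, secs = divmod(int(seconds), 60)
print(round(bit_rate), frames, minutes, secs)  # 8533 131072 65 32
```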

[0025] The specific encoding scheme and device configuration are described below.

[0026] [MDCT] FIG. 2 conceptually illustrates the conversion by the MDCT method. First, the analog audio signal is sampled at 10 kHz. In the MDCT method, the digital data obtained every 100 μs by this digital conversion is divided into 10 ms periods, and the signal of each period is converted into frequency-component signals by an FFT. In practice, to ensure continuity of the digital data between successive periods, the FFT is applied to 20 ms of the audio signal, with adjacent periods set to overlap each other by 50%, so that audio block data is obtained every 10 ms. Equation (1) below gives the coefficient W(m) of the cosine-wave window. The original signal x(m) of each period is multiplied by this window coefficient and subjected to FFT analysis, which yields the spectrum Z(k) for each of the fine bands as given by Equation (2) below.

[0027]

[Equation 1]

[Equation 2]

In Equations (1) and (2), M is the number of samples (200 in this embodiment) and k is the fine band number (1 to 100, although this may be cut down to 1 to 79).

[0028] In this embodiment, the MDCT computes Equations (1) and (2) above from the 200 data points x(m) of each 20 ms segment of the audio signal, thereby performing the cosine-wave windowing and the FFT at the same time. The computation is carried out with 16-bit precision.
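Equations (1) and (2) are not reproduced in this text, so the sketch below does not claim to be the patent's formulas. It substitutes the standard textbook MDCT with a sine window, using the stated parameters (M = 200 samples per 20 ms window at 10 kHz, giving 100 coefficients), purely to illustrate how one 50%-overlapped window maps to 100 fine bands. Band indices here are zero-based, unlike the 1-based k in the text, and floating point is used instead of the 16-bit arithmetic mentioned above.

```python
import math

M = 200          # samples per 20 ms window at a 10 kHz sampling rate
N = M // 2       # 100 coefficients, one per 50 Hz fine band


def sine_window(m):
    """Sine window, a common MDCT window (assumed; not taken from Equation (1))."""
    return math.sin(math.pi * (m + 0.5) / M)


def mdct(x):
    """Textbook MDCT of one 2N-sample segment, returning N coefficients."""
    assert len(x) == M
    n0 = 0.5 + N / 2
    return [
        sum(sine_window(m) * x[m] * math.cos(math.pi / N * (m + n0) * (k + 0.5))
            for m in range(M))
        for k in range(N)
    ]


def band_centre_hz(k, fs=10_000):
    """Centre frequency of zero-based coefficient k."""
    return (k + 0.5) * fs / M
```

Under this formulation the coefficient centres fall at 25 Hz, 75 Hz, 125 Hz and so on, which agrees with the text's description of a 25 Hz component that is dropped and a lowest recorded band at 75 Hz. Successive windows would be taken every 100 samples so that each audio block covers 10 ms of new signal.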

[0029] [Bundling of bands] The MDCT described above yields, for each audio block, phase and amplitude data for 100 fine bands spaced 50 Hz apart from DC to around 5 kHz. Given that the speed of sound is relatively slow at 340 m/s, the wavelength at 1 kHz is 34 cm, which is comparable to the diameter of a human head. For this reason, phase information is not important in bands at about 1 kHz and above, and the recording of the mantissa, which is needed to express the phase, is omitted for bands at 1 kHz and above. Moreover, at high frequencies there is inherently a strong masking effect, whereby only the frequency component with the greatest intensity among neighboring bands is heard. At high frequencies, therefore, bundling 2 to 12 neighboring bands causes only a slight loss of sound quality, and so several bands are bundled and represented by the average level of their intensities.

[0030] FIG. 3 shows the band numbers and assigned frequency ranges of the 25 bands # in total obtained by compressing the 100 fine bands. Information at 25 Hz and below is omitted, as shown in FIG. 3, in consideration of overall performance, and the 78 fine bands in total from 75 Hz to 3975 Hz are compressed into intensity signals for 25 bands. Band numbers 0 to 15 (75 Hz to 825 Hz) are all recorded as individual fine bands; band numbers 16 to 18 each bundle 3 consecutive bands, and band numbers 19 to 21 each bundle 6 consecutive bands. Band numbers 22 to 24, which lie further toward the high-frequency side where bundling has almost no audible effect, each bundle 12 consecutive bands. In this way, the bands # of this encoding scheme are set as shown in FIG. 3 so as to correspond to frequency, that is, to musical pitch, using 6 bands per octave as a guide.
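The band layout described for FIG. 3 can be rebuilt from the group sizes given in the text (16 single bands, then three groups each of 3, 6 and 12 fine bands). The sketch below is a reconstruction under that reading, since FIG. 3 itself is not available here; names are hypothetical. Note that these group sizes add up to 79 fine bands spanning 75 Hz to 3975 Hz, which agrees with the k range of 1 to 79 mentioned with the equations, whereas the text gives the count as 78. The averaging of absolute values follows paragraph [0011].

```python
FINE_HZ = 50            # spacing of the fine bands
FIRST_CENTRE_HZ = 75    # lowest recorded fine band
GROUP_SIZES = [1] * 16 + [3] * 3 + [6] * 3 + [12] * 3   # bands # 0..24


def band_table():
    """Return [(band_no, first_centre_hz, last_centre_hz, n_fine_bands)]."""
    table, fine = [], 0
    for band_no, size in enumerate(GROUP_SIZES):
        lo = FIRST_CENTRE_HZ + fine * FINE_HZ
        hi = lo + (size - 1) * FINE_HZ
        table.append((band_no, lo, hi, size))
        fine += size
    return table


def bundle_levels(fine_levels):
    """Average the absolute levels of the fine bands inside each band #."""
    assert len(fine_levels) == sum(GROUP_SIZES)
    out, i = [], 0
    for size in GROUP_SIZES:
        out.append(sum(abs(v) for v in fine_levels[i:i + size]) / size)
        i += size
    return out
```

For example, `band_table()[16]` gives `(16, 875, 975, 3)` and `band_table()[24]` gives `(24, 3425, 3975, 12)`.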

[0031] [Conversion of the spectrum intensity signal to exponents] In the encoding scheme of this embodiment, the spectrum intensity signal in each of the 25 bands # is expressed using an exponent b and a mantissa a. FIG. 4 shows the spectrum intensity of each band expressed as an exponent. To reduce the data amount further, for the 75 Hz band # with band number 0, the exponent of the band's spectrum intensity is itself expressed in 4 bits and used as the initial data. The exponent of the spectrum intensity and the 4-bit initial data have the correspondence shown in FIG. 5, and the initial data is determined according to the value of the exponent b (0 to 15) of the band's intensity.

[0032] For the remaining 24 bands #, the difference from the adjacent lower band is computed, starting from the low-frequency side, and is encoded and recorded. FIG. 6 shows the correspondence between the change in the exponent, which is the difference data, and the difference exponent code: exponent changes between −2 and +2 are expressed as the five values 0 to 4 of the exponent code. Here it is preferable to encode using a decoder, so that more accurate values can be computed and errors do not accumulate. For the bundled bands # with band number 16 and above as well, the exponent of the average value of each bundled band # is given, and the differences between these wide bands are obtained.
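One way to read the remark about encoding "using a decoder" is that the encoder takes each difference against the value the decoder will have reconstructed, not against the true previous exponent, so a clipped step is made up in the following bands. The sketch below illustrates that idea. The mapping of changes −2..+2 to codes 0..4 by adding 2 is an assumption, since FIG. 6 is not available here, and the function names are hypothetical.

```python
def encode_exponents(exps):
    """exps: band exponents (0..15), lowest band first.

    Returns (initial, codes), where codes holds one value in 0..4 per
    remaining band.
    """
    initial = exps[0]
    codes, decoded = [], initial
    for target in exps[1:]:
        delta = max(-2, min(2, target - decoded))  # clamp to a representable step
        codes.append(delta + 2)                    # assumed mapping -2..+2 -> 0..4
        decoded += delta                           # mirror the decoder's state
    return initial, codes


def decode_exponents(initial, codes):
    """Rebuild the exponents from the initial value and the difference codes."""
    exps = [initial]
    for c in codes:
        exps.append(exps[-1] + (c - 2))
    return exps
```

With the input `[5, 10, 10, 10]` the decoder output is `[5, 7, 9, 10]`: the jump of 5 cannot be sent in one step, but the tracking lets the reconstruction catch up and stops the error from persisting.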

[0033] The difference data for each of the 24 bands # with band numbers 1 to 24 is expressed by the difference exponent data of FIG. 6; the bands # are grouped three at a time, and each group of 3 bands # is expressed in 7 bits according to Equation (3) below. FIG. 7 shows the correspondence between the recorded code for the difference data of 3 bands # and the five-valued difference exponent codes E0, E1, and E2 of the individual bands #.

[0034]

[Equation 3]

$$\text{recorded code}\;(<2^{7}) = 5^{0}\cdot E0 + 5^{1}\cdot E1 + 5^{2}\cdot E2 \qquad (3)$$

By making the difference data five-valued and grouping 3 bands # together in this way, the difference data for 24 bands is expressed in 7 bits × (24/3), i.e. 56 bits, so that band numbers 1 to 24 can be expressed with 2.33 bits per band #.
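Equation (3) is a base-5 packing: three codes in 0..4 give a value of at most 4 + 20 + 100 = 124, which fits in 7 bits. A minimal sketch of the packing and its inverse follows; the function names are hypothetical.

```python
def pack3(e0, e1, e2):
    """Pack three five-valued codes into one 7-bit value, as in Equation (3)."""
    assert all(0 <= e <= 4 for e in (e0, e1, e2))
    return e0 + 5 * e1 + 25 * e2      # 0..124, below 2**7


def unpack3(code):
    """Recover (E0, E1, E2) from a packed 7-bit value."""
    assert 0 <= code <= 124
    return code % 5, (code // 5) % 5, code // 25


def pack_difference_codes(codes):
    """Pack 24 difference codes into eight 7-bit values (56 bits in total)."""
    assert len(codes) == 24
    return [pack3(*codes[i:i + 3]) for i in range(0, 24, 3)]
```

Packing three codes separately would need 3 bits each (9 bits per group); the joint code saves 2 bits per group, which is where the figure of 2.33 bits per band # comes from.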

[0035] The mantissa a represents the phase: it indicates whether the coefficient applied to the exponent expressed by b is one of "+1", "0", and "−1", or one of "+1" and "−1".

[0036] For the six bands # of the low-frequency fine bands with band numbers 0 to 6, a mantissa code of [0, 1, 2] is assigned according to the three-valued mantissa [−1, 0, +1], as shown in FIG. 8. As shown in FIG. 9, these three-valued mantissa codes are grouped 3 bands # at a time, and the mantissa codes M0, M1, and M2 of the three bands # are expressed together in 5 bits. As a result, for the six low-frequency bands #, which have a large effect on sound quality, the three-valued mantissas are expressed in 10 bits in total (5 bits × (6 bands # / 3)), so that more accurate difference data is represented. For the 10 bands # that follow the six low-frequency bands (band numbers 6 to 15 in FIG. 3), only 1 bit indicating the polarity is recorded as the mantissa. In this 1-bit polarity representation, "1" indicates +1 and "0" indicates −1.
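The mantissa budget can be illustrated in the same way as the exponent packing. The sketch below makes two assumptions that the text does not settle: the six three-valued bands are taken to be band numbers 0 to 5 (the reading consistent with the following ten bands being numbers 6 to 15 and with the 20-bit total), and each group of three codes is packed in base 3 (values 0..26, fitting in 5 bits), since the actual table of FIG. 9 is not available here.

```python
TERNARY_BANDS = range(0, 6)     # assumed: three-valued mantissas
POLARITY_BANDS = range(6, 16)   # one polarity bit each


def pack_mantissas(mant):
    """mant: 16 mantissas for bands # 0..15. Returns a 20-bit integer."""
    assert len(mant) == 16
    bits = 0
    for g in (0, 3):
        m0, m1, m2 = (mant[g + i] + 1 for i in range(3))   # -1, 0, +1 -> 0, 1, 2
        assert all(0 <= m <= 2 for m in (m0, m1, m2))
        bits = (bits << 5) | (m0 + 3 * m1 + 9 * m2)        # 0..26 fits in 5 bits
    for b in POLARITY_BANDS:
        assert mant[b] in (-1, 1)
        bits = (bits << 1) | (1 if mant[b] == 1 else 0)    # "1" = +1, "0" = -1
    return bits
```

The result always fits in 5 × 2 + 10 = 20 bits; bands # 16 and above contribute nothing because their mantissas are not recorded.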

[0037] For band numbers 16 and above, several bands are bundled, so the frequency difference from the adjacent band # is large and no waveform-synthesis effect arises between adjacent bands #. The recording of the mantissa is therefore omitted for these bands #, and the mantissas recorded total 20 bits: (5 bits × 2) + (1 bit × 10 bands #).

[0038] With the above encoding scheme, the initial 4 bits indicating the lowest band, the difference exponents for 24 bands #, and 20 bits of mantissa data are obtained, and these 80 bits in total make up one audio block.

[0039] [Repeating of audio blocks] Viewed in units of 10 ms or 20 ms, an audio signal often continues with the same waveform, so reproducing an audio block repeatedly several times causes little loss of reproduced sound quality. In this embodiment, therefore, the recording of certain audio blocks is omitted, and the recorded audio blocks are reproduced repeatedly during reproduction. By recording the repeat count as mode data together with the audio blocks, the recording of audio block data that is identical or similar in content, or whose intensity exponent level is low, can be omitted, reducing the amount of recorded data and extending the recording and reproduction time.

[0040] FIG. 10 shows the structure of the mode data, to which 15 bits are allocated in one frame. In this embodiment, the number of reuses (repeat count) is determined taking into account the case where a single audio block AB is reused on its own, the case where two consecutive audio blocks are reused, and combinations of these. As shown in FIG. 10, the repeat count of each of "AB0", "AB0 to AB1", "AB1", "AB1 to AB2", and "AB2" is specified in 3 bits. This allows one 256-bit frame of data to record and reproduce audio corresponding to up to 0.52 seconds.
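The 0.52 second maximum can be reproduced if each 3-bit field is read as a number of extra repetitions (0 to 7) added to one plain pass through the three blocks: 3 + 7 × (1 + 2 + 1 + 2 + 1) = 52 blocks of 10 ms. This reading, the field order, and the names below are assumptions chosen because they match the stated figures; FIG. 10 is not available here.

```python
BLOCK_MS = 10
FIELDS = ("AB0", "AB0-AB1", "AB1", "AB1-AB2", "AB2")
BLOCKS_IN_FIELD = {"AB0": 1, "AB0-AB1": 2, "AB1": 1, "AB1-AB2": 2, "AB2": 1}


def unpack_mode(mode):
    """Split 15-bit mode data into five 3-bit repeat counts (assumed MSB-first order)."""
    assert 0 <= mode < 1 << 15
    return {name: (mode >> (3 * (4 - i))) & 0b111 for i, name in enumerate(FIELDS)}


def frame_duration_ms(mode):
    """Playing time of one frame, treating each count as extra repetitions."""
    counts = unpack_mode(mode)
    extra_blocks = sum(counts[f] * BLOCKS_IN_FIELD[f] for f in FIELDS)
    return (3 + extra_blocks) * BLOCK_MS
```

Under this reading `frame_duration_ms(0)` is 30 ms and `frame_duration_ms(0x7FFF)` is 520 ms, a ratio of about 17.3, in line with paragraph [0020].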

[0041] The reuse of audio blocks described above can be carried out when all bits match between consecutive audio blocks, and also in other cases, such as when a predetermined similarity is found or when the exponent peak of the band intensities is at or below a certain value.

[0042] This embodiment assumes the following conditions (i) to (iii) for reusing an audio block; which condition to adopt, and the specific criterion to apply, can be set freely from outside at the time of recording.

[0043]

(i) all bits match between consecutive audio blocks;
(ii) the exponents and mantissas match between consecutive audio blocks from the low-frequency band # up to a predetermined band #;
(iii) the peak of the exponents of the audio block is at or below a certain value.

FIG. 11 shows the specific reuse conditions for these audio blocks, together with the 3-bit repeat mode associated with each condition. By recording this repeat mode in the subcode, the user can refer to it during reproduction, at the next recording, and so on, and can learn the relationship between repeat mode and sound quality.
By recording in the subcode, the user can refer to this at the time of reproduction or the next recording, and can know the relationship between the repeat mode and the sound quality.

[0044] If "all bits of the audio blocks match", corresponding to repeat mode "000", is set as the reuse condition, duplicate recording of consecutive identical audio blocks can be eliminated without any degradation of the reproduced sound quality.

[0045] In practice, when the data of two audio blocks differ only slightly, playing back with just one of them causes little loss of sound quality. The criterion for judging the similarity of adjacent audio block data is therefore made selectable: it can be raised when good sound quality is wanted, and lowered when recording time is to be secured even at some cost in sound quality. Because the influence on sound quality becomes smaller toward the high band # side, the similarity criterion is given seven levels associated with repeat modes "001" to "111" in FIG. 11 (eight levels in all when the exact match "000" is included). According to the level set in this way, whether to reuse an audio block is decided by checking whether the exponents and mantissas match from low band #0 up to one of bands #23 to #11. Requiring a match up to a higher band # prevents loss of sound quality, whereas judging blocks similar on a match up to only a moderate band # increases the number of audio blocks omitted and so effectively extends the recording time.
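The band-limited comparison in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `blocks_similar`, the parameter `last_band`, and the representation of a block as a list of 25 (exponent, mantissa) pairs are all assumptions made for the example. The mapping from each repeat mode to a particular band # is given only in FIG. 11, so the upper band is passed in as a parameter rather than guessed.

```python
def blocks_similar(block_a, block_b, last_band):
    """Return True when bands #0..last_band agree in exponent and mantissa.

    Each block is a list of (exponent, mantissa) pairs, one per band #.
    This layout is a hypothetical stand-in for the real audio block format.
    """
    return block_a[: last_band + 1] == block_b[: last_band + 1]


# Two hypothetical 25-band blocks that differ only at band #15.
block_a = [(5, 2)] * 25
block_b = list(block_a)
block_b[15] = (5, 3)

# A loose criterion (match up to band #11) treats the blocks as similar;
# a strict criterion (match up to band #23) does not.
loose = blocks_similar(block_a, block_b, 11)
strict = blocks_similar(block_a, block_b, 23)
```

With the loose criterion the second block would be dropped and the first repeated, which is how a lower similarity level trades sound quality for recording time.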

[0046] The similarity criterion for data between audio blocks can be set not only at recording time but also at playback time. At playback, the basic approach is to reproduce the sound faithfully using the repeat count indicated by the mode data described above. However, when the recorded speaker talks slowly, for example, silent intervals become frequent and repeats occur often. In such a case, setting a criterion at playback and reducing the number of repeat playbacks of audio blocks raises the speech rate without changing the pitch, so the playback time can be shortened without making the speech hard to follow.

[0047] The peak level of the exponents of a recorded audio block is examined, and if the peak level is low, the actual repeat count is reduced below the repeat count specified by the mode data. This shortens the playback time. If the peak-level criterion is set low, the loss of playback sound quality is extremely small; if it is set somewhat higher, sound quality drops slightly but the playback time is shortened further. When playing back a voice recording of a meeting or a lecture, for example, it is often required to skip unneeded periods in which no participant or lecturer is speaking and so shorten the playback time. If a reference value for the peak level of the audio block exponents is set at playback, then when, for example, the speaker pauses and only background noise continues, playback of the audio block data corresponding to that background noise can be omitted, and the playback time can be shortened further while loss of playback sound quality is avoided.

[0048] When audio is recorded while reusing audio blocks as described above, the recording time increases or decreases depending on the state of the input audio signal. A recorder, however, is generally used with some expected duration of the recording in mind, and it is desirable that the recordable time be known clearly. Because the audio block reuse function makes the recordable time vary with the similarity, intensity, and other properties of the input audio, it can be awkward to use where a definite recording time is required.

[0049] The criterion value for audio block reuse is therefore controlled according to how much of the recording capacity of the recording device, such as a semiconductor memory, has been consumed. That is, the memory consumption is monitored and compared continually with the planned consumption determined by the planned recording period, and the recording time is adjusted by changing criterion values such as the similarity criterion and the exponent-level criterion. Control is automatic: if consumption of recording capacity is high, the similarity criterion and the exponent-level criterion are relaxed to increase the repeat count of audio blocks, and if consumption is low, the criteria are set higher to secure better recording quality. This makes it possible to record audio at high quality within the desired time and improves ease of use as an audio recording device.
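One way to picture this control loop is the sketch below. It is illustrative only and rests on assumptions the text does not state: that the planned consumption grows linearly with elapsed time, that the criterion is an integer level where a higher number means a stricter criterion, and that the level moves one step per check. The name `adjust_level` and all of its parameters are hypothetical.

```python
def adjust_level(level, used_bits, elapsed_s, capacity_bits, planned_s,
                 min_level=0, max_level=7):
    """Return the next criterion level (higher = stricter, by assumption).

    The planned consumption is modelled as a linear schedule over the
    planned recording period; the source does not specify the schedule.
    """
    planned_bits = capacity_bits * elapsed_s / planned_s
    if used_bits > planned_bits:
        # Ahead of schedule on memory: relax so that more blocks repeat.
        return max(min_level, level - 1)
    if used_bits < planned_bits:
        # Behind schedule: tighten for better recording quality.
        return min(max_level, level + 1)
    return level


# Halfway through a 20 s plan with 1000 bits of capacity, 500 bits are planned.
relaxed = adjust_level(4, used_bits=600, elapsed_s=10,
                       capacity_bits=1000, planned_s=20)
tightened = adjust_level(4, used_bits=400, elapsed_s=10,
                         capacity_bits=1000, planned_s=20)
```

A real recorder would likely add hysteresis so the level does not oscillate around the schedule, but that refinement is outside what the text describes.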

[0050] [Subcode] One superframe is composed of 64 frames. The subcode, to which only a very small number of bits (for example, 1 bit) is allocated in each frame, is treated as a single unit of information per superframe; it thus amounts to 64 bits and has ample capacity for conveying information. When one superframe is composed of 32 frames, 2 bits are allocated to the subcode in each frame.

[0051] FIG. 12 shows the structure of the 64-bit subcode. The first 12 bits are the phrase number, the next 3 bits are the repeat mode described above, the following 33 bits are the date and time of recording, and the last 16 bits are a CRC for error detection. The components of the date and time are each recorded in binary, and the year is given as the last two digits of the calendar year in 7 bits. Because the seconds value is recorded once per superframe, it advances in steps of roughly 1 to 30 seconds from one superframe to the next. The phrase number data is used as a marking for cueing, of the kind also called a track mark. Phrase numbers can be assigned as consecutive numbers, either automatically or at the user's discretion, and the maximum density is 64 phrases per 1 Mbit. When encoded data is recorded in a large-capacity memory the number of phrases assigned also grows, but since 12 bits are allocated and about 4000 numbers can be represented, the capacity is sufficient.
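The 12 + 3 + 33 + 16 = 64 bit layout can be illustrated with a small pack/unpack sketch. Several points are assumptions for illustration: the fields are placed most-significant first in the order the text lists them, the 33-bit date and time is treated as one opaque field (only the 7-bit year width is given here), and the CRC is passed in as a ready-made value because the polynomial is not specified. The names `pack_subcode` and `unpack_subcode` are hypothetical.

```python
# Field widths in bits, in the order given for FIG. 12.
FIELDS = (("phrase", 12), ("repeat_mode", 3), ("datetime", 33), ("crc", 16))


def pack_subcode(phrase, repeat_mode, datetime_field, crc):
    """Pack the four fields into one 64-bit integer, first field on top."""
    word = 0
    values = (phrase, repeat_mode, datetime_field, crc)
    for (name, width), value in zip(FIELDS, values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_subcode(word):
    """Split a 64-bit integer back into (phrase, repeat_mode, datetime, crc)."""
    out = []
    for _name, width in reversed(FIELDS):
        out.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(out))


packed = pack_subcode(phrase=123, repeat_mode=0b010,
                      datetime_field=0x1_2345_6789, crc=0xBEEF)
fields = unpack_subcode(packed)
```

Twelve bits give 4096 distinct phrase numbers, which matches the "about 4000" stated in the text.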

[0052] [Configuration of the Encoding Apparatus] FIG. 13 shows the schematic configuration of an audio encoding apparatus that executes the encoding process described above. Unneeded low-frequency and high-frequency components are removed in advance from the input analog audio signal, which is then supplied to the analog/digital (A/D) conversion unit 10. The A/D conversion unit 10 samples the supplied analog audio signal at a sampling frequency Fs of 10 kHz and converts it into 12-bit digital data. The MDCT processing unit 12 multiplies the 12-bit digital data from the A/D conversion unit 10 by predetermined MDCT coefficients and detects, every 10 ms, the level and phase of the intensity signal for each of a total of 100 fine bands. Of the detected fine bands, the high-frequency bands with band number 16 and above are bundled in groups of 3 to 12 bands by taking the average of the absolute values of the band levels; with these bundled bands, the data is compressed to a total of 25 bands #.
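The bundling step can be sketched as below. The group sizes are hypothetical: the text says only that groups contain 3 to 12 fine bands and that the result is 25 bands #, while the actual assignment is the one shown in FIG. 3. The example assumes fine bands 0 to 15 pass through unchanged and that the remaining 84 fine bands form 9 bundles; the function name `bundle_bands` and its parameters are likewise assumptions.

```python
def bundle_bands(levels, first_bundled=16,
                 group_sizes=(4, 6, 8, 10, 10, 10, 12, 12, 12)):
    """Compress fine-band levels into bands #.

    Fine bands below `first_bundled` are copied as they are. Above that,
    each group is replaced by the mean of the absolute values of its levels.
    The default group sizes are illustrative, not taken from FIG. 3.
    """
    if first_bundled + sum(group_sizes) != len(levels):
        raise ValueError("group sizes do not cover the fine bands exactly")
    bands = list(levels[:first_bundled])
    pos = first_bundled
    for size in group_sizes:
        group = levels[pos:pos + size]
        bands.append(sum(abs(x) for x in group) / size)
        pos += size
    return bands


# 100 hypothetical fine-band levels: positive lows, negative highs.
fine = [1.0] * 16 + [-2.0] * 84
bands = bundle_bands(fine)
```

Averaging absolute values keeps a bundle from cancelling to zero when its fine bands have opposite signs, which is why the sign is dropped before the mean is taken.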

[0053] The exponent processing unit 14 encodes the levels of the intensity signals of the resulting 25 bands # into exponents, difference exponents, and mantissas.

[0054] The ABR creation unit 16 creates 80-bit audio block data for recording (ABR) from the exponents, difference exponents, and mantissas obtained, and outputs it to both the synthesizing unit 22 and the mode determination unit 18. The mode determination unit 18 judges whether to reuse an audio block from the similarity between audio blocks and the peak level of the exponents, based on the audio block reuse condition set from outside. It then expresses the repeat counts of the relevant audio block data, as determined by this judgment, in 15-bit mode data and outputs the mode data to the synthesizing unit 22.

[0055] The subcode creation unit 20 converts the recording time, and the phrase number if one has been assigned, into binary data to create the subcode, and outputs it to the synthesizing unit 22.

[0056] The synthesizing unit 22 builds one 256-bit frame from the subcode, the mode data, and three audio blocks selected according to the mode data, and outputs it as a bitstream to the recording device 24, such as a semiconductor memory.

[0057] [Recording to the Recording Device] The output data from the synthesizing unit 22 is recorded in the memory frame by frame. Because one frame is set to 2ⁿ bits as described above, a frame can be identified simply from the memory address. FIG. 14 shows the relationship between the addresses of a 1-Mbit memory and the frames. In FIG. 14, the lower 8 bits of the memory address correspond to the bit number within a frame, the 6 bits above them indicate the frame number, and the 6 bits above those indicate the superframe number.

[0058] When one frame is 256 bits, the data of each frame is stored across the lower n = 8 bits of the address. That is, the first through last bits of each frame are recorded in order at the addresses whose lower n bits run from "00...00" (all zeros) to "11...11" (all ones). As a result, specifying the frame number part of the address immediately identifies the corresponding frame of data, and no synchronization signal is needed for frame identification.
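The address layout described for FIG. 14 (6-bit superframe number, 6-bit frame number, 8-bit bit number, 20 bits in all for 1 Mbit) can be expressed with a few shifts and masks. This is a sketch for illustration; the function names `split_address` and `frame_start` are assumptions, not terms from the patent.

```python
BIT_FIELD = 8      # bit number within a 256-bit frame
FRAME_FIELD = 6    # frame number within a superframe
SUPER_FIELD = 6    # superframe number within the 1-Mbit memory


def split_address(bit_address):
    """Split a 20-bit bit address into (superframe, frame, bit_in_frame)."""
    if not 0 <= bit_address < (1 << (BIT_FIELD + FRAME_FIELD + SUPER_FIELD)):
        raise ValueError("address outside the 1-Mbit range")
    bit_in_frame = bit_address & ((1 << BIT_FIELD) - 1)
    frame = (bit_address >> BIT_FIELD) & ((1 << FRAME_FIELD) - 1)
    superframe = bit_address >> (BIT_FIELD + FRAME_FIELD)
    return superframe, frame, bit_in_frame


def frame_start(superframe, frame):
    """Return the address of the first bit of the given frame."""
    return (superframe << (BIT_FIELD + FRAME_FIELD)) | (frame << BIT_FIELD)


start = frame_start(superframe=3, frame=5)
location = split_address(start)
last = split_address((1 << 20) - 1)
```

Because every frame begins where the lower 8 address bits are all zero, a reader can jump straight to any frame without scanning for a synchronization pattern.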

[0059] [Playback] To play back the audio data recorded in the recording device 24, a decoder decodes the exponents and mantissas of the 25 bands # for each audio block of a frame that has been read out, then performs an inverse DCT using the signal intensity of each band # to convert the frequency-domain data back into time-domain digital data. For the subcode, the decoder checks for errors using the CRC data, decodes the phrase number, the repeat mode, and the recording date and time, and outputs them.

[0060] When an audio block reuse condition is set at playback time as described above, the decoder determines the number of repeat playbacks of each audio block based on that setting and reuses the audio blocks accordingly.

[0061]

[Effects of the Invention] In the present invention, when an audio signal is divided into a large number of bands and each band is encoded as an exponent and a mantissa, a plurality of bands on the high-frequency side, which have little influence on sound quality, are treated as a bundled band, and recording of the mantissa for that bundled band is omitted. This reduces the amount of stored data without degrading sound quality and achieves highly efficient encoding.

[0062] Furthermore, representing a bundled band by the average of the absolute values of its constituent bands makes it possible to reduce the amount of stored data further without degrading sound quality.

[Brief description of the drawings]

FIG. 1 is a diagram showing the data structure in the encoding method of the present invention.

FIG. 2 is a diagram conceptually showing the MDCT processing of the present invention.

FIG. 3 is a diagram showing the 25 bands # obtained by partially bundling and compressing the fine bands of the present invention, and the frequency range covered by each band #.

FIG. 4 is a diagram showing the spectrum intensity of each band obtained by the MDCT of the present invention, expressed as exponents.

FIG. 5 is a diagram showing the correspondence between the spectrum intensity exponents and the initial data in the present invention.

FIG. 6 is a diagram showing the correspondence between exponent differences and difference exponent codes in the present invention.

FIG. 7 is a diagram showing the correspondence between the 7-bit recording code that represents the difference exponent codes of three bands # and the difference exponent code of each band # in the present invention.

FIG. 8 is a diagram showing the correspondence between mantissas and mantissa codes in the present invention.

FIG. 9 is a diagram showing the correspondence between the mantissa codes of three bands # and the 5-bit recording code that represents them in the present invention.

FIG. 10 is a diagram showing the structure of the mode data of the present invention.

FIG. 11 is a diagram showing the audio block reuse conditions of the present invention and the 3-bit repeat mode corresponding to each condition.

FIG. 12 is a diagram showing the structure of the 64-bit subcode of the present invention.

FIG. 13 is a diagram showing the configuration of the audio encoding apparatus of the present invention.

FIG. 14 is a diagram showing the addresses of the memory that stores the encoded audio data of the present invention.

[Explanation of symbols]

10 A/D conversion unit, 12 MDCT processing unit, 14 exponent processing unit, 16 ABR creation unit, 18 mode determination unit, 20 subcode creation unit, 22 synthesizing unit, 24 recording device.

Claims

[Claims]

1. An audio signal encoding apparatus used for digitally storing an audio signal, wherein the audio signal is divided into a number of frequency bands and the level of each frequency band is represented by an exponent and a mantissa, and wherein, for the frequency bands of high-frequency signals at or above a certain frequency, a plurality of frequency bands are collectively treated as one bundled band and recording of the mantissa for the bundled band is omitted.

2. The audio signal encoding apparatus according to claim 1, wherein, for the bundled band, the absolute values of the levels of the frequency bands being bundled are averaged to obtain the level of the bundled band.