JP3535008B2

JP3535008B2 - Audio processing method and apparatus

Info

Publication number: JP3535008B2
Application number: JP09540798A
Authority: JP
Inventors: Seiji Sasaki
Original assignee: Hitachi Kokusai Electric Inc; Kokusai Denki Electric Inc
Current assignee: Kokusai Denki Electric Inc
Priority date: 1998-03-24
Filing date: 1998-03-24
Publication date: 2004-06-07
Anticipated expiration: 2018-03-24
Also published as: JPH11272295A

Description

Detailed Description of the Invention

[0001]

[Technical Field] The present invention relates to speech processing technology for encoding and decoding speech signals. More specifically, it relates to a technique for reconciling a mismatch between the output rate of the coded speech information bit string produced by a speech encoder and the rate at which speech samples accumulate in a speech buffer, or between the input rate of the coded speech information bit string supplied to a speech decoder and the output rate of the reproduced speech samples read out of a speech buffer.

[0002]

[Prior Art] To reduce the amount of data needed to carry a speech signal, speech communication equipment conventionally encodes the speech signal on the transmitting side and sends it over a wireless or wired transmission path, and the receiving side receives and decodes the encoded signal. FIG. 15(a) shows an example configuration of a conventional speech encoding apparatus, and FIG. 15(b) shows an example configuration of a conventional speech decoding apparatus.

[0003] In this speech encoding apparatus, speech samples a1, sampled at, for example, 8 kHz and quantized to 16 bits, are written into the input speech buffer 11 in synchronization with the sampling clock k1 and held there temporarily. The samples accumulated in the input speech buffer 11 are passed to the speech encoder 12 as b1 in blocks of, for example, 160 samples per frame (a frame being, for example, 20 ms). The speech encoder 12 encodes the input samples b1 at, for example, 2400 bps and outputs the coded speech information bit string e1 at 48 bits/frame in synchronization with a frame synchronization signal c1 (a signal marking the start of each frame interval) and a bit clock d1, both supplied by an external device. The frame synchronization signal c1 and the bit clock d1 are synchronized with each other.
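The example figures in [0003] fix both sides of the rate match for a 20 ms frame. A quick Python check (the function name and default arguments are illustrative, not taken from the patent) shows that 8 kHz sampling and a 2400 bps coder both give whole-number counts per frame:

```python
def nominal_frame_sizes(sample_rate_hz=8000, bit_rate_bps=2400, frame_ms=20):
    """Return (speech samples per frame, coded bits per frame), assuming ideal clocks."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000   # 8000 Hz * 0.020 s
    bits_per_frame = bit_rate_bps * frame_ms // 1000        # 2400 bps * 0.020 s
    return samples_per_frame, bits_per_frame

samples, bits = nominal_frame_sizes()
print(samples, bits)              # 160 48
print(samples * 16, "->", bits)   # 2560 -> 48 (16-bit PCM bits in, coded bits out, per frame)
```

Because the frame is delimited by the bit clock (48 bit periods) while samples arrive on a separate sampling clock, these two counts only stay matched if both clocks are exact.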

[0004] In the speech decoding apparatus, the coded speech information bit string h1 (48 bits/frame) is fed to the speech decoder 14 in synchronization with a frame synchronization signal f1 (a signal marking the start of each frame interval) and a bit clock g1 (for example, 2400 bps), both supplied by an external device. The speech decoder 14 decodes the bit string h1 received at 2400 bps and outputs a reproduced speech sample string i1 (for example, 160 samples/frame). The frame synchronization signal f1 and the bit clock g1 are synchronized with each other. The reproduced speech sample string i1 is written into the output speech buffer 13, and the reproduced samples accumulated there are read out as j1 in synchronization with the sampling clock l1 (for example, 8 kHz).

[0005] FIG. 16 is an input/output timing chart for the speech encoding and decoding apparatuses. FIG. 16(a) shows the frame synchronization signal (c1 or f1); in this example the "L"-level pulses that mark the synchronization timing occur every 20 ms. FIG. 16(b) shows the bit clock (d1 or g1), which in this example runs at 2400 bps, synchronized with the frame synchronization signal. FIG. 16(c) shows the coded speech information bit string (e1 or h1), which is input or output in synchronization with the frame synchronization signal and the bit clock at, in this example, 48 bits per frame (20 ms). FIG. 16(d) shows the sampling clock (k1 or l1), which is 8 kHz in this example. FIG. 16(e) shows the input speech samples (a1) or the reproduced output speech samples (j1), which are input or output in synchronization with the sampling clock at, in this example, 160 samples per frame (20 ms).

[0006] The frame synchronization signal of FIG. 16(a) and the bit clock of FIG. 16(b) are synchronized, and the frame synchronization signal becomes active ("L") once every 48 bit-clock cycles. The sampling clock of FIG. 16(d), however, is generated inside the speech encoding or decoding apparatus itself, so it is not synchronized with the frame synchronization signal or the bit clock supplied by the external device.

[0007]

[Problem to Be Solved by the Invention] In the speech encoder and decoder described above, if the bit clock supplied by the external device had no frequency error and always ran at exactly 2400 bps, the number of speech samples (160 samples/frame) and the number of coded speech information bits (48 bits/frame) would always stay matched as designed, and speech encoding and decoding would proceed normally. In practice, however, raising the frequency accuracy of the externally supplied bit clock increases cost considerably, and a certain frequency error (for example, ±2.5%) is generally tolerated in the bit clock. Fluctuations in the bit-clock frequency can therefore prevent speech encoding and speech decoding/reproduction from working correctly.

[0008] For example, if the bit-clock frequency deviates by ±2.5%, then, taking the number of coded speech information bits (48 bits/frame) as the reference, the number of speech samples input or output per frame varies within a range of 160 ± 4 samples. The number of speech samples (160 samples/frame) and the number of coded bits (48 bits/frame) can then no longer be held at their design values. In encoding, the number of input samples per frame accumulated in the input speech buffer 11 varies within 160 ± 4 samples instead of the correct 160, so the speech encoder 12 can no longer encode normally. In decoding, the speech decoder 14 writes 160 reproduced samples per frame into the output speech buffer 13, but the number of reproduced samples read out of the output speech buffer 13 per frame varies within 160 ± 4, so speech cannot be reproduced correctly from the reproduced samples.
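The ±4 figure follows from the fact that a frame lasts 48 bit-clock periods while samples arrive on the independent 8 kHz sampling clock. A minimal Python sketch, assuming the nominal values of [0003] and an ideal, jitter-free sampling clock (the function name is hypothetical):

```python
def samples_per_frame(bit_rate_error, sample_rate_hz=8000.0,
                      nominal_bit_rate=2400.0, bits_per_frame=48):
    """Sampling-clock ticks that fall in one frame when the bit clock is off by
    bit_rate_error (e.g. +0.025 for +2.5 %). A frame lasts bits_per_frame bit periods."""
    actual_bit_rate = nominal_bit_rate * (1.0 + bit_rate_error)
    frame_seconds = bits_per_frame / actual_bit_rate
    return sample_rate_hz * frame_seconds

for err in (-0.025, 0.0, 0.025):
    print(f"bit clock {err:+.1%}: {samples_per_frame(err):.2f} samples/frame")
```

A bit clock 2.5% slow stretches the frame to about 164.1 samples and one 2.5% fast shrinks it to about 156.1, which the text rounds to 160 ± 4.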

[0009] One conceivable remedy is to thin out or repeat samples held in the input buffer 11 or the output buffer 13 to bring each frame to the prescribed number of samples. However, adjusting to the prescribed 160 samples by simply dropping or duplicating samples degrades the continuity between successive samples, and the quality of the reproduced speech deteriorates as a result. The same problem can also arise when, instead of receiving the bit clock from an external device, the apparatus generates the bit clock internally from an oscillator separate from the one that drives the sampling clock.
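To see why simple thinning hurts continuity, the hypothetical sketch below drops four evenly spaced samples from a 164-sample frame of a 400 Hz tone and compares the largest sample-to-sample jump before and after. The test tone and the max-step measure are illustrative choices, not from the patent:

```python
import math

def naive_thin(samples, n_out):
    """Simple remedy criticized in [0009]: delete evenly spaced samples until n_out remain."""
    n_drop = len(samples) - n_out
    step = len(samples) / n_drop
    drop = {int(k * step) for k in range(n_drop)}
    return [s for i, s in enumerate(samples) if i not in drop]

def max_step(samples):
    """Largest absolute difference between consecutive samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

# 400 Hz tone at 8 kHz; 164 samples models a frame that arrived 4 samples long.
tone = [math.sin(2 * math.pi * 400 * i / 8000) for i in range(164)]
thinned = naive_thin(tone, 160)
print(len(thinned), round(max_step(tone), 3), round(max_step(thinned), 3))
```

Each deleted sample turns two normal steps into one, so the largest jump roughly doubles (from about 0.31 to about 0.59 here), which is the kind of discontinuity the invention aims to avoid.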

[0010] The present invention was made in view of these circumstances. One object is to provide a speech processing method and apparatus that adjust the number of speech samples output from a speech buffer for speech encoding to a prescribed number without degrading the quality of the reproduced speech, so that speech encoding works correctly. Another object is to provide a speech processing method and apparatus that adjust the number of reproduced speech samples output from a speech buffer for playback to a prescribed number without degrading the quality of the reproduced speech, so that speech decoding and reproduction work correctly. A further object is to provide a speech communication apparatus and a speech communication system that implement these methods.

[0011]

[Means for Solving the Problem] The speech processing method according to the present invention applies to speech encoding in which speech samples are accumulated frame by frame in a speech buffer in synchronization with a sampling clock, each frame of accumulated samples is input to a speech encoder, and the encoder outputs a coded speech information bit string for each frame in synchronization with a bit clock. In this processing, the number of speech samples output from the speech buffer for encoding is adjusted to a prescribed number as follows.

[0012] The speech samples of each frame accumulated in the speech buffer are oversampled. When the bit clock decreases relative to the sampling clock (including cases where either clock, or both, drift), the interval between sample points is lengthened and the speech samples at the new sample points are interpolated from the original samples, converting each frame of samples input to the speech encoder into the number of samples that matches the bit clock. When the bit clock increases relative to the sampling clock (likewise including drift in either or both clocks), the interval between sample points is shortened and the samples at the new points are again interpolated from the original samples, converting each frame of samples input to the speech encoder into the number of samples that matches the bit clock.
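The interval change in [0012] can be pictured as resampling each frame onto a new, uniformly spaced grid. The sketch below is a simplified stand-in, assuming plain linear interpolation between neighbouring original samples (the patent does not name the interpolation filter) and a single spacing for the whole frame rather than the P/M/N/L schedule of [0014]; the function name is hypothetical:

```python
def resample_frame(samples, n_out):
    """Map len(samples) input samples onto n_out output samples by linear interpolation.
    spacing > 1 lengthens the sample interval (fewer outputs); < 1 shortens it."""
    n_in = len(samples)
    spacing = n_in / n_out
    out = []
    for k in range(n_out):
        t = k * spacing                       # new sample point, in input-sample units
        i = int(t)
        frac = t - i
        nxt = samples[min(i + 1, n_in - 1)]   # hold the last sample at the frame edge
        out.append(samples[i] + (nxt - samples[i]) * frac)
    return out

slow_clock_frame = list(range(164))   # bit clock slow: 164 samples arrived
print(resample_frame(slow_clock_frame, 160)[:3])
```

With 164 buffered samples (slow bit clock) the spacing is 164/160 = 1.025, a lengthened interval; with 156 (fast bit clock) it is 0.975, a shortened one, matching the two cases above.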

[0013] The speech processing method according to the present invention also applies to processing in which input speech samples are stored in a speech buffer frame by frame (P samples/frame), the stored samples are encoded by a speech encoder, and a coded speech information bit string (Q bits/frame) is output in synchronization with a bit clock that is either supplied by an external device or generated by an internal oscillator. In this processing, the number of speech samples output from the speech buffer for encoding is adjusted to a prescribed number as follows. Throughout this specification, P, Q, M, N, and L denote positive integers.

[0014] Specifically, the number of speech samples stored in the speech buffer while Q coded speech information bits are output is counted.

If the count is (P + (M × N)), the speech samples stored in the speech buffer are oversampled by a factor of L to give (P + (M × N)) × L subdivided sample points. Then, (L × N) times, a new speech sample is interpolated from the adjacent original samples at intervals of (L + M) subdivided points and output to the speech encoder, and (P − (L × N)) times, a speech sample is output to the speech encoder at intervals of L subdivided points. In total, P speech samples are output to the speech encoder, matching its output rate.

If the count is (P − (M × N)), the stored speech samples are oversampled by a factor of L to give (P − (M × N)) × L subdivided sample points. Then, (L × N) times, a new speech sample is interpolated from the adjacent original samples at intervals of (L − M) subdivided points and output to the speech encoder, and (P − (L × N)) times, a speech sample is output to the speech encoder at intervals of L subdivided points. Again a total of P speech samples is output to the speech encoder, matching its output rate.

In both cases the two kinds of interval together span exactly the whole oversampled frame: (L × N) × (L ± M) + (P − (L × N)) × L = (P ± (M × N)) × L.
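The counting rule in [0014] can be sketched on an L-times oversampled grid, where positions are integers and each original sample sits at a multiple of L. This Python version is a hypothetical illustration under stated assumptions: the patent does not say in which order the (L ± M) and L intervals occur, so they are interleaved evenly here (Bresenham-style), and every output is linearly interpolated, which reduces to the original sample whenever a position falls on a multiple of L. It requires L > M and L × N ≤ P.

```python
def spread_strides(n_irregular, irregular, n_regular, regular):
    """Interleave n_irregular strides of size `irregular` among n_regular strides of
    size `regular` as evenly as possible (error accumulation, Bresenham-style)."""
    total = n_irregular + n_regular
    strides, acc = [], 0
    for _ in range(total):
        acc += n_irregular
        if acc >= total:
            acc -= total
            strides.append(irregular)
        else:
            strides.append(regular)
    return strides

def sample_at(samples, fine_pos, L):
    """Linearly interpolate the input at fine_pos on the L-times oversampled grid."""
    i, r = divmod(fine_pos, L)
    if i + 1 >= len(samples):          # past the last original sample: hold it
        return float(samples[-1])
    return samples[i] + (samples[i + 1] - samples[i]) * r / L

def encoder_adjust(samples, P, L, M, N):
    """Convert a frame of P + M*N or P - M*N buffered samples into exactly P encoder inputs."""
    if not (L > M and L * N <= P):
        raise ValueError("need L > M and L*N <= P")
    count = len(samples)
    if count == P + M * N:
        irregular = L + M              # longer interval absorbs the M*N surplus samples
    elif count == P - M * N:
        irregular = L - M              # shorter interval makes up the M*N missing samples
    else:
        raise ValueError("count must be P + M*N or P - M*N")
    strides = spread_strides(L * N, irregular, P - L * N, L)
    out, pos = [], 0
    for stride in strides:
        out.append(sample_at(samples, pos, L))
        pos += stride
    assert pos == count * L            # the intervals cover the whole oversampled frame
    return out

frame = list(range(164))               # a ramp makes the new sample positions visible
out = encoder_adjust(frame, P=160, L=8, M=1, N=4)
print(len(out), out[:6])               # 160 [0.0, 1.0, 2.0, 3.0, 4.0, 5.125]
```

With P = 160, L = 8, M = 1, N = 4, every fifth interval is 9 points instead of 8, so the sixth output already lands 1/8 of a sample late (5.125 on the ramp). Spreading the 32 longer intervals across the frame is what keeps each local disturbance small.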

[0015] With this encoding process, when the accumulation rate of speech samples in the speech buffer becomes too low or too high relative to the rate of the coded bit string output by the speech encoder (for example, because the bit clock has drifted), samples are partially duplicated or deleted in a way that is spread evenly over the frame rather than concentrated in one place, and the number of samples output from the speech buffer is converted to match the rate of the coded bit string output by the speech encoder. Speech encoding therefore proceeds normally, and because the effect of the adjustment is finely distributed across the samples, continuity between samples is preserved and high-quality speech with little audible degradation can be reproduced.

[0016] The speech processing method according to the present invention also applies to speech decoding and reproduction in which a coded speech information bit string for each frame is input to a speech decoder in synchronization with a bit clock, the decoder outputs decoded reproduced speech samples, each frame of reproduced samples is accumulated in a speech buffer, and the reproduced samples are output frame by frame in synchronization with a sampling clock. In this processing, the number of reproduced speech samples output from the speech buffer for playback is adjusted to a prescribed number as follows.

[0017] The reproduced speech samples of each frame accumulated in the speech buffer are oversampled. When the bit clock decreases relative to the sampling clock (including drift in either or both clocks), the interval between sample points is shortened and the samples at the new sample points are interpolated from the original samples, converting each frame of reproduced samples output from the speech buffer into the number of samples that matches the sampling clock. When the bit clock increases relative to the sampling clock (likewise), the interval between sample points is lengthened and the samples at the new points are interpolated from the original samples, converting each frame of reproduced samples output from the speech buffer into the number of samples that matches the sampling clock.
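The opposite directions in [0012] and [0017] fall out of a single ratio. In this hypothetical helper (names and defaults are assumptions built on the example figures of [0003]), the spacing is the number of original samples advanced per output sample: the encoder squeezes the counted samples into 160, while the decoder spreads 160 decoded samples over the counted output slots.

```python
def interpolation_spacing(bit_rate_error, side, nominal_samples=160,
                          sample_rate_hz=8000.0, nominal_bit_rate=2400.0,
                          bits_per_frame=48):
    """Spacing (in original-sample units) needed to reconcile one frame.
    > 1 means the sample interval is lengthened, < 1 that it is shortened."""
    frame_seconds = bits_per_frame / (nominal_bit_rate * (1.0 + bit_rate_error))
    counted = sample_rate_hz * frame_seconds   # samples per frame on the sampling clock
    if side == "encoder":    # counted buffered samples -> nominal_samples encoder inputs
        return counted / nominal_samples
    if side == "decoder":    # nominal_samples decoded samples -> counted output slots
        return nominal_samples / counted
    raise ValueError("side must be 'encoder' or 'decoder'")

for err in (-0.025, 0.025):
    enc = interpolation_spacing(err, "encoder")
    dec = interpolation_spacing(err, "decoder")
    print(f"bit clock {err:+.1%}: encoder spacing {enc:.4f}, decoder spacing {dec:.4f}")
```

A bit clock 2.5% slow gives an encoder spacing of about 1.026 (lengthened) and a decoder spacing of 0.975 (shortened), the pairing described in [0012] and [0017].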

[0018] The speech processing method according to the present invention also applies to processing in which a coded speech information bit string (Q bits/frame) is input to a speech decoder frame by frame in synchronization with a bit clock, each frame is decoded, the reproduced speech samples (P samples/frame) are stored in a speech buffer, and the reproduced samples are then output frame by frame. In this processing, the number of decoded reproduced samples output from the speech buffer is adjusted to a prescribed number as follows.

[0019] Specifically, the number of reproduced speech samples output from the speech buffer while Q coded speech information bits are input is counted.

If the count is (P + (M × N)), the P reproduced samples output by the speech decoder and stored in the speech buffer are oversampled by a factor of L to give (P × L) subdivided sample points. Then, (L × N) times, a new speech sample is interpolated from the adjacent original samples at intervals of (L − M) subdivided points and output from the speech buffer, and (P + (M × N) − (L × N)) times, a speech sample is output from the speech buffer at intervals of L subdivided points. In total, (P + (M × N)) reproduced samples are output, absorbing the variation in the decoder's output rate.

If the count is (P − (M × N)), the P stored reproduced samples are oversampled by a factor of L to give (P × L) subdivided sample points. Then, (L × N) times, a new speech sample is interpolated from the adjacent original samples at intervals of (L + M) subdivided points and output from the speech buffer, and (P − (M × N) − (L × N)) times, a speech sample is output from the speech buffer at intervals of L subdivided points. In total, (P − (M × N)) reproduced samples are output, absorbing the variation in the decoder's output rate.
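The decoder-side counts in [0019] can be checked with a little arithmetic. This bookkeeping sketch (a hypothetical helper, not a resampler) returns the interval plan for turning P decoded samples into `count` output samples and asserts that the plan covers exactly the P × L points of the oversampled frame:

```python
def decoder_plan(P, count, L, M, N):
    """Interval plan of [0019]: returns (irregular_interval, n_irregular, n_regular)."""
    if count == P + M * N:
        irregular = L - M      # shorter interval: more outputs than decoded samples
    elif count == P - M * N:
        irregular = L + M      # longer interval: fewer outputs than decoded samples
    else:
        raise ValueError("count must be P + M*N or P - M*N")
    n_irregular = L * N
    n_regular = count - L * N
    if n_regular < 0:
        raise ValueError("need L*N <= count")
    # Both cases consume exactly the P*L points of the oversampled frame.
    assert n_irregular * irregular + n_regular * L == P * L
    return irregular, n_irregular, n_regular

for count in (164, 156):
    print(count, decoder_plan(P=160, count=count, L=8, M=1, N=4))
```

For P = 160, L = 8, M = 1, N = 4 this yields 32 intervals of 7 points plus 132 of 8 when 164 output samples are needed, and 32 intervals of 9 points plus 124 of 8 when only 156 are.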

[0020] With this decoding and reproduction process, when the rate of the coded bit string input to the speech decoder varies (for example, because the bit clock has drifted) and the accumulation rate of reproduced samples in the speech buffer becomes too low or too high, samples are partially duplicated or deleted in a way that is spread evenly over the frame, and the reproduced samples output from the speech buffer are converted to the prescribed number. Speech decoding and reproduction therefore proceed normally, and because the effect of the adjustment is finely distributed across the reproduced samples, continuity between them is preserved and high-quality speech with little audible degradation can be reproduced.

[0021] The speech processing apparatus according to the present invention includes a speech buffer that stores input speech samples and a speech encoder that encodes the samples stored in the speech buffer, and outputs a coded speech information bit string for each frame from the speech encoder in synchronization with a bit clock. The apparatus further includes a speech sample counter that counts the number of speech samples stored in the speech buffer while one frame of coded bits is output from the speech encoder, and a speech sample number converter that converts the number of samples stored in the speech buffer when the counter value differs from the prescribed number of samples that should be stored in the buffer during the output of one frame of coded bits.
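The speech sample counter of [0021] only has to count sampling-clock edges between frame boundaries. The Python simulation below is an illustrative model, assuming ideal free-running clocks and exact rational timing via `fractions.Fraction` so that no floating-point drift creeps into the counts; the function name and defaults are hypothetical:

```python
from fractions import Fraction

def count_samples_per_frame(n_frames, sample_rate_hz=8000, bit_rate_bps=Fraction(2400),
                            bits_per_frame=48):
    """Count the sampling-clock ticks t = k / sample_rate_hz with start <= t < end in each
    frame-sync period. The two clocks are independent, so the count can vary per frame."""
    sample_period = Fraction(1, sample_rate_hz)
    frame_period = Fraction(bits_per_frame) / bit_rate_bps
    counts = []
    for f in range(n_frames):
        start, end = f * frame_period, (f + 1) * frame_period
        first = -((-start) // sample_period)   # ceil(start / sample_period)
        last = -((-end) // sample_period)      # ceil(end / sample_period)
        counts.append(int(last - first))
    return counts

print(count_samples_per_frame(5))                                          # nominal clocks
print(sorted(set(count_samples_per_frame(41, bit_rate_bps=Fraction(2460)))))  # +2.5 %
```

At the nominal 2400 bps every frame counts 160 samples. With a 2460 bps bit clock (+2.5%) the per-frame count alternates between 156 and 157 and totals exactly 6400 over 41 frames; this is the frame-by-frame variation the sample number converter has to absorb.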

[0022] The speech sample number converter oversamples each frame of speech samples stored in the speech buffer. If the count exceeds the prescribed number, it lengthens the interval between sample points and interpolates the samples at the new points from the original samples, converting each frame input to the speech encoder into the prescribed number of samples. If the count is below the prescribed number, it shortens the interval between sample points and interpolates in the same way, again converting each frame input to the speech encoder into the prescribed number of samples. This carries out the encoding method described above: samples are partially duplicated or deleted in a way that is spread evenly over the frame, the number of samples output from the speech buffer is matched to the rate of the coded bit string output by the speech encoder, and encoding proceeds normally. Because the effect of the adjustment is finely distributed across the samples, continuity between samples is preserved and high-quality speech can be reproduced.

[0023] The speech processing apparatus according to the present invention also includes a speech decoder that decodes the coded speech information bit string of each frame and a speech buffer that stores the decoded reproduced speech samples frame by frame and outputs them, the coded bit string being input to the speech decoder in synchronization with a bit clock. The apparatus further includes a speech sample counter that counts the number of reproduced samples output from the speech buffer while one frame of coded bits is input to the speech decoder, and a speech sample number converter that converts the number of samples stored in the speech buffer when the counter value differs from the prescribed number of reproduced samples that should be stored in the buffer during the input of one frame of coded bits.

[0024] The speech sample number converter oversamples each frame of reproduced speech samples stored in the speech buffer. If the count exceeds the prescribed number, it shortens the interval between sample points and interpolates the samples at the new points from the original samples; if the count is below the prescribed number, it lengthens the interval and interpolates in the same way. In either case, each frame of reproduced samples output from the speech buffer is converted into the number of samples that matches the sampling clock. This carries out the decoding and reproduction method described above: samples are partially duplicated or deleted in a way that is spread evenly over the frame, the reproduced samples output from the speech buffer are converted to the required number, and decoding and reproduction proceed normally. Because the effect of the adjustment is finely distributed across the reproduced samples, continuity between them is preserved and high-quality speech can be reproduced.

[0025] The speech communication apparatus according to the present invention encodes and transmits speech signals and decodes received encoded speech signals. It realizes high-quality speech communication by including the encoding-side speech processing apparatus described above in its speech signal encoding section and the decoding-side speech processing apparatus described above in its speech signal decoding section. Likewise, the speech communication system according to the present invention, in which a transmitting apparatus encodes and transmits speech signals and a receiving apparatus decodes the received encoded speech signals, realizes high-quality speech communication by including the encoding-side speech processing apparatus in the speech signal encoding section of the transmitting apparatus and the decoding-side speech processing apparatus in the speech signal decoding section of the receiving apparatus.

[0026]

[Embodiments of the Invention] The present invention will now be described concretely on the basis of embodiments. FIG. 1 shows the configuration of a speech encoding apparatus according to one embodiment of the present invention, and FIG. 2 shows the configuration of a speech decoding apparatus according to one embodiment. In a speech communication apparatus, such as a wireless communication terminal, that encodes and transmits speech signals and decodes received encoded speech signals, the speech encoding apparatus of this embodiment is used in the speech signal encoding section and the speech decoding apparatus of this embodiment is used in the speech signal decoding section. Such speech communication apparatuses together form a speech communication system in which the transmitting apparatus encodes and sends speech signals and the receiving apparatus decodes the encoded speech signals it receives.

[0027] First, consider the audio encoding apparatus shown in FIG. 1. Input audio samples a2 are sampled at, for example, 8 kHz and quantized to 16 bits. They enter the input audio buffer 1 in synchronization with the sampling clock k2 and are stored there temporarily. The stored samples are then supplied to the audio encoder 2 as b2, for example 160 samples per frame (a frame being, for example, 20 ms).

The audio encoder 2 encodes the input audio samples b2 at, for example, 2400 bps. It outputs the audio coded information bit string e2, for example 48 bits per frame, in synchronization with the frame synchronization signal c2 and the bit clock d2 supplied by an external device.

As in FIG. 16, the frame synchronization signal c2 and the bit clock d2 are synchronized with each other. The sampling clock k2, however, is generated inside the audio encoding apparatus. It is therefore not synchronized with the externally supplied frame synchronization signal c2 or bit clock d2.
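The frame sizes in [0027] follow from simple arithmetic. A 20 ms frame holds 8000 x 0.020 = 160 samples at 8 kHz and 2400 x 0.020 = 48 bits at 2400 bps.

The frame period is set by the external bit clock d2, but samples arrive on the local clock k2. Any frequency error in d2 therefore changes how many samples arrive per frame. The Python sketch below illustrates this effect.

The function name is hypothetical. The +/-2.5 % bit clock error is an exaggerated assumption, chosen only so the results land near the 156 and 164 sample counts used in the later examples. A real counter reads an integer, which may differ by one from frame to frame depending on clock phase.

```python
def samples_per_frame(fs_hz: float, bits_per_frame: int, bit_rate_bps: float) -> float:
    """Average number of locally clocked samples in one externally timed frame.

    The frame lasts bits_per_frame / bit_rate_bps seconds (set by the external
    bit clock), so the expected sample count is fs * bits / rate.
    """
    return fs_hz * bits_per_frame / bit_rate_bps


nominal = samples_per_frame(8000, 48, 2400)          # nominal 20 ms frame
fast_bit_clock = samples_per_frame(8000, 48, 2460)   # bit clock 2.5 % fast: shorter frames
slow_bit_clock = samples_per_frame(8000, 48, 2340)   # bit clock 2.5 % slow: longer frames
print(nominal, round(fast_bit_clock, 2), round(slow_bit_clock, 2))  # prints: 160.0 156.1 164.1
```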

[0028] In this audio encoding apparatus, the audio sample counter 3 counts the sampling clock k2 during one period of the frame synchronization signal c2. This gives the number of audio samples a2 input per frame, m2, which the counter passes to the audio sample count converter 4.

The audio sample count converter 4 converts the sample count m2 into a predetermined number of audio samples (160 in this example). It does so in three steps:
1. It reads out the audio sample sequence n2 stored in the input audio buffer 1.
2. It converts that sequence into the predetermined number of samples.
3. It writes the converted sequence o2 back into the input audio buffer 1, overwriting the original.

The sequence, now holding the predetermined number of samples, is then output to the audio encoder 2.

[0029] Next, consider the audio decoding apparatus shown in FIG. 2. The audio coded information bit string h2 (for example, 48 bits per frame) is input to the audio decoder 6. This happens in synchronization with the frame synchronization signal f2 and the bit clock g2 (for example, 2400 bps) supplied by an external device.

The audio decoder 6 decodes the bit string h2 received at, for example, 2400 bps. It outputs the reproduced audio sample sequence i2 (for example, 160 samples per frame) to the output audio buffer 5, where the samples are stored temporarily. The stored samples are then output as j2 to an audio reproducing device (not shown), in synchronization with the sampling clock l2 (for example, 8 kHz).

As in FIG. 16, the frame synchronization signal f2 and the bit clock g2 are synchronized with each other. The sampling clock l2, however, is generated inside the audio decoding apparatus. It is therefore not synchronized with the externally supplied frame synchronization signal f2 or bit clock g2.

In this example the bit clocks d2 and g2 are supplied by an external device. The present invention provides the same effects whenever the bit clock comes from an oscillation source other than the one driving the apparatus's own sampling clocks k2 and l2.

[0030] In this audio decoding apparatus, the audio sample counter 7 counts the sampling clock l2 during one period of the frame synchronization signal f2. This gives the number of reproduced audio samples output from the output audio buffer 5 per frame, p2, which the counter passes to the audio sample count converter 8.

Based on p2, the audio sample count converter 8 converts the reproduced samples stored in the output audio buffer 5 into a predetermined number of reproduced audio samples. It does so in three steps:
1. It reads out the reproduced audio sample sequence q2 stored in the output audio buffer 5.
2. It converts that sequence into the predetermined number of reproduced samples.
3. It writes the converted sequence r2 back into the output audio buffer 5, overwriting the original.

The reproduced sequence, now holding the predetermined number of samples, is then output to the audio reproducing device (not shown).

[0031] Next, the processing performed by the above audio encoding apparatus will be described with reference to FIGS. 3 to 10. The apparatus runs two interrupt processes:
- A digital data output interrupt process, shown in FIG. 3. It causes the audio encoder 2 to output the audio coded information bits e2 and converts the number of audio samples stored in the input audio buffer 1.
- An audio sample input interrupt process, shown in FIG. 4. It inputs the audio samples a2 into the input audio buffer 1.

[0032] The digital data output interrupt process of FIG. 3 runs at each falling edge of the bit clock d2. It first causes the audio encoder 2 to output one bit of digital data, namely one audio coded information bit e2 (step S1).

It then checks the level of the frame synchronization signal c2 (step S2). If the level is not "L", the process returns and the interrupt routine ends. If the level is "L", a frame boundary has been reached. The process then continues with the steps below, which adjust the number of audio samples output from the input audio buffer 1 where necessary.

[0033] The process first checks the count of the audio sample counter 3 (step S3).

If the count equals the predetermined value P (160 samples in this example), it matches the output rate of the audio encoder 2. No adjustment is needed, so the count is cleared (step S7) and the process returns.

If the count differs from P, the bit clock d2 has drifted and the number of samples no longer matches the current output rate of the audio encoder 2. To adjust the number of samples output from the input audio buffer 1, the process determines whether the count is larger than P (step S4).

[0034] The process then branches on the result of step S4:
- If the count is smaller than P (P - M*N samples in this example), the process runs the sample count increasing process described below (step S5).
- If the count is larger than P (P + M*N samples in this example), the process runs the sample count reducing process described below (step S6).

Either way, the number of audio samples for one frame stored in the input audio buffer 1 ends up at the predetermined number P. This sequence of P audio samples, b2, is output from the input audio buffer 1 to the audio encoder 2. The count of the audio sample counter 3 is then cleared for the next frame (step S7), and the process returns.

[0035] The audio sample input interrupt process of FIG. 4 runs at each falling edge of the sampling clock k2. It stores one input audio sample a2 in the input audio buffer 1 (step S11), increments the count of the audio sample counter 3 by one (step S12), and returns.

In other words, the input samples a2 enter the input audio buffer 1 one at a time, in step with the sampling clock k2. The audio sample counter 3 counts the sampling clock k2, and so counts how many samples have been stored.
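The two interrupt routines of FIGS. 3 and 4 can be modelled in a few lines of Python. The class and method names are hypothetical, and the actual sample count conversion by converter 4 is deliberately left out.

The patent's examples only cover counts of exactly P - M*N or P + M*N. This sketch simply branches on less-than / greater-than, as step S4 does.

```python
class EncoderFrontEnd:
    """Hypothetical model of input audio buffer 1 and audio sample counter 3.

    on_sample_clock mirrors FIG. 4 (steps S11, S12); on_frame_boundary mirrors
    steps S3 to S7 of FIG. 3, run when frame sync c2 is "L".
    """

    def __init__(self, P: int = 160):
        self.P = P
        self.frame = []   # samples received during the current frame (buffer 1)
        self.count = 0    # audio sample counter 3

    def on_sample_clock(self, sample: float) -> None:
        self.frame.append(sample)   # step S11: store one input sample a2
        self.count += 1             # step S12: increment the counter

    def on_frame_boundary(self):
        if self.count == self.P:
            action = "none"         # S3: count matches, no adjustment
        elif self.count < self.P:
            action = "increase"     # S5: e.g. P - M*N samples arrived
        else:
            action = "reduce"       # S6: e.g. P + M*N samples arrived
        frame, self.frame = self.frame, []   # hand the frame to converter 4
        self.count = 0              # S7: clear the counter for the next frame
        return action, frame
```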

[0036] The audio sample count converter 4 executes the sample count increasing process (step S5) as the subroutine shown in FIG. 5:
1. It takes the P - (M*N) audio samples, which are fewer than the predetermined number P, out of the input audio buffer 1.
2. It oversamples them by a factor of L (step S21), giving (P - (M*N))*L subdivided sample points.
3. At every (L - M) subdivided sample points, it interpolates a new audio sample from the adjacent original samples and writes it to the input audio buffer 1, overwriting its contents. This is done L*N times (step S22).
4. At every L subdivided sample points, it writes an audio sample to the input audio buffer 1, overwriting its contents. This is done P - (L*N) times (step S23).

In other words, the oversampling and interpolation raise the sequence of P - M*N samples to a total of (L*N) + (P - (L*N)) = P samples, which are stored back in the input audio buffer 1.
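As a consistency check on [0036] (added here for illustration, not stated in the patent), the strides of steps S22 and S23 add up to exactly the L(P - M*N) subdivided points produced in step S21, while the number of outputs is P:

$$
LN\,(L-M) + (P-LN)\,L \;=\; L^{2}N - LMN + PL - L^{2}N \;=\; L\,(P-MN),
\qquad LN + (P-LN) = P .
$$

With the example values L = 16, P = 160, M = 1, N = 4 of [0038], this gives 64*15 + 96*16 = 960 + 1536 = 2496 = 16*156.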

[0037] FIG. 6 illustrates the increasing process in more detail, using part of an audio sample sequence.

In the figure, A0 to A6 are the audio samples at 8 kHz before oversampling. The points marked × are the subdivided sample points produced by L-times (16-times in this example) oversampling. B0 to B6 are the positions at every (L - M) subdivided sample points.

The new audio sample at each B position is interpolated from the adjacent original samples; for example, B1 is interpolated from A0 and A1. These new samples are written to the input audio buffer 1 L*N times. After that (not shown in the figure), the following samples are written to the input audio buffer 1 at every L subdivided sample points, P - (L*N) times. A total of P audio samples is thus stored back in the input audio buffer 1.

[0038] In this example L = 16. Suppose further that P = 160, M = 1 and N = 4. Then the 156 audio samples present before conversion are written to the input audio buffer 1 as follows:
- L*N = 16*4 = 64 times, at every L - M = 15 subdivided sample points;
- P - L*N = 160 - 16*4 = 96 times, at every L = 16 subdivided sample points.

In the end, 64 + 96 = 160 audio samples (8 kHz sampling) are stored back in the input audio buffer 1 and output to the audio encoder 2.

In the first section, interpolated samples are written 64 times at every 15 subdivided sample points. The 64 output samples are thus spread evenly across that section, 15 subdivided points apart, and the number of audio samples increases by four.
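The conversion in [0036] to [0038] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:
- The function name `convert_frame` is hypothetical.
- Positions are counted from the head of the frame, following the layout of FIG. 7(a) and FIG. 10.
- Linear interpolation is used, as in [0039].

The same routine also covers the reducing process of FIG. 8: only the stride of the first section changes, from L - M to L + M.

```python
def convert_frame(samples, P=160, L=16, M=1, N=4):
    """Resample a frame of P - M*N or P + M*N samples to exactly P samples.

    L*N outputs are spaced (L - M) or (L + M) subdivided points apart, then
    P - L*N outputs are spaced L points apart. Positions are in units of
    1/L of the original sample period; off-grid values use linear interpolation.
    """
    if len(samples) == P - M * N:
        step = L - M            # increasing process (FIG. 5)
    elif len(samples) == P + M * N:
        step = L + M            # reducing process (FIG. 8)
    else:
        raise ValueError("expected P - M*N or P + M*N samples")

    positions = [k * step for k in range(L * N)]
    start = L * N * step        # a multiple of L, so later points hit original samples
    positions += [start + k * L for k in range(P - L * N)]

    out = []
    for pos in positions:
        i, frac = divmod(pos, L)
        if frac == 0:
            out.append(float(samples[i]))   # original sample, no interpolation needed
        else:
            # linear interpolation between neighbours A_i and A_{i+1}
            out.append((samples[i] * (L - frac) + samples[i + 1] * frac) / L)
    return out
```

A linear ramp makes the spacing easy to see, because each output then equals its subdivided position divided by L. For 156 input samples, the third output is 30/16 = 1.875 (point B2 of FIG. 6), and output 64 lands exactly on original sample 60.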

[0039] In this example, the audio sample at each new sample point is obtained from the original samples by linear interpolation. For example, the sample at the new point B2 is (A1 × 2/16) + (A2 × 14/16).

Other interpolation methods can of course be used in the present invention. Linear interpolation is the simplest choice. Interpolation with a sampling function is preferable when the continuity of the speech is to be better preserved.

In this example (FIG. 7(a)), the process starts from the head of the 16-times oversampled sequence. It writes samples to the input audio buffer 1 64 times at every 15 subdivided sample points, and then 96 times at every 16 subdivided sample points. Other arrangements are possible within the present invention:
- The order of the every-15 outputs and the every-16 outputs may be swapped, as shown in FIG. 7(b).
- The every-15 and every-16 outputs may each be split up and interleaved, as shown in FIG. 7(c).

With the method of FIG. 7(c), samples at new sample points must be interpolated even in the sections output at every 16 points. With the methods of FIGS. 7(a) and 7(b), the points in the every-16 sections coincide with the original sample points before oversampling. The original samples can therefore be output without any interpolation, which makes these methods preferable.
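The linear interpolation of [0039] can be written compactly. Take a subdivided point with index p, counted from the head of the oversampled sequence; this indexing is an assumption made for illustration. With i = ⌊p/L⌋ and f = p mod L:

$$
y(p) = \frac{(L-f)\,A_i + f\,A_{i+1}}{L},
\qquad i = \left\lfloor p/L \right\rfloor, \quad f = p \bmod L .
$$

For B2 in FIG. 6, p = 2(L - M) = 30, so i = 1 and f = 14. This gives y = (2A_1 + 14A_2)/16, matching the coefficients 2/16 and 14/16 stated in the text. When f = 0 the formula returns A_i unchanged, which is why the every-16 sections of FIGS. 7(a) and 7(b) need no interpolation.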

[0040] The audio sample count converter 4 executes the sample count reducing process (step S6) as the subroutine shown in FIG. 8:
1. It takes the P + (M*N) audio samples, which are more than the predetermined number P, out of the input audio buffer 1.
2. It oversamples them by a factor of L (step S31), giving (P + (M*N))*L subdivided sample points.
3. At every (L + M) subdivided sample points, it interpolates a new audio sample from the adjacent original samples and writes it to the input audio buffer 1, overwriting its contents. This is done L*N times (step S32).
4. At every L subdivided sample points, it writes an audio sample to the input audio buffer 1, overwriting its contents. This is done P - (L*N) times (step S33).

In other words, the oversampling and interpolation reduce the sequence of P + M*N samples to a total of (L*N) + (P - (L*N)) = P samples, which are stored back in the input audio buffer 1.
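For the reducing process of [0040] (a check added for illustration, not stated in the patent), the strides of steps S32 and S33 add up to exactly the L(P + M*N) subdivided points produced in step S31, while P samples are output:

$$
LN\,(L+M) + (P-LN)\,L \;=\; L\,(P+MN),
\qquad LN + (P-LN) = P .
$$

With L = 16, P = 160, M = 1, N = 4 this gives 64*17 + 96*16 = 1088 + 1536 = 2624 = 16*164.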

[0041] FIG. 9 illustrates the reducing process in more detail, using part of an audio sample sequence.

In the figure, A0 to A6 are the audio samples at 8 kHz before oversampling. The points marked × are the subdivided sample points produced by L-times (16-times in this example) oversampling. B0 to B6 are the positions at every (L + M) subdivided sample points.

The new audio sample at each B position is interpolated from the adjacent original samples. These new samples are written to the input audio buffer 1 L*N times. After that (not shown in the figure), the following samples are written to the input audio buffer 1 at every L subdivided sample points, P - (L*N) times. A total of P audio samples is thus stored back in the input audio buffer 1.

[0042] In this example L = 16. Suppose further that P = 160, M = 1 and N = 4. Then the 164 audio samples present before conversion are written to the input audio buffer 1 as follows:
- L*N = 16*4 = 64 times, at every L + M = 17 subdivided sample points;
- P - L*N = 160 - 16*4 = 96 times, at every L = 16 subdivided sample points.

In the end, 64 + 96 = 160 audio samples (8 kHz sampling) are stored back in the input audio buffer 1 and output to the audio encoder 2.

In the first section, interpolated samples are written 64 times at every 17 subdivided sample points. The 64 output samples are thus spread evenly across that section, 17 subdivided points apart, and the number of audio samples decreases by four.

[0043] The interpolation here is the same as in the increasing process described above.

In this example (FIG. 10), the process starts from the head of the 16-times oversampled sequence. It writes samples to the input audio buffer 1 64 times at every 17 subdivided sample points, and then 96 times at every 16 subdivided sample points. Within the present invention, however, the reduction can be arranged in various ways, for example as shown in FIGS. 7(b) and 7(c).

[0044] To summarize, the audio sample counter 3 counts how many audio samples enter the input audio buffer 1 while the audio encoder 2 outputs one frame of audio coded information bits e2. The increasing or reducing process runs whenever a fluctuation of the bit clock d2 pushes this count away from the proper value. The count here is the number of samples input to the input audio buffer 1, which is also the number of samples to be output to the audio encoder 2.

Therefore, even when the bit clock d2 fluctuates, the audio encoder 2 still receives the proper number of audio samples for its output rate. And because the increasing and reducing processes are spread across the frame, the continuity of the audio sample sequence is preserved.

[0045] Next, the processing performed by the above audio decoding apparatus will be described with reference to FIGS. 11 to 14. The apparatus runs two interrupt processes:
- A digital data input interrupt process, shown in FIG. 11. It inputs the audio coded information bits h2 to the audio decoder 6 and converts the number of reproduced audio samples stored in the output audio buffer 5.
- An audio sample output interrupt process, shown in FIG. 12. It outputs the audio samples j2 from the output audio buffer 5.

[0046] The digital data input interrupt process of FIG. 11 runs at each falling edge of the bit clock g2. It first inputs one bit of digital data, namely one audio coded information bit h2, to the audio decoder 6 (step S41).

It then checks the level of the frame synchronization signal f2 (step S42). If the level is not "L", the process returns and the interrupt routine ends. If the level is "L", a frame boundary has been reached. The process then continues with the steps below, which adjust the number of reproduced audio samples output from the output audio buffer 5 where necessary.

[0047] The process first checks the count of the audio sample counter 7 (step S43).

If the count equals the predetermined value P (160 samples in this example), it matches the input rate to the audio decoder 6. No adjustment of the number of reproduced samples is needed, so the count is cleared (step S47) and the process returns.

If the count differs from P, the bit clock g2 has drifted and the number of samples no longer matches the current input rate to the audio decoder 6. To adjust the number of reproduced samples output from the output audio buffer 5, the converter 8 determines whether the count is larger than P (step S44).

[0048] The process then branches on the result of step S44:
- If the count is larger than P (P + M*N in this example), the process runs the sample count increasing process described below (step S45). This supplies the reproduced samples that would otherwise be missing during playback.
- If the count is smaller than P (P - M*N in this example), the process runs the sample count reducing process described below (step S46). This removes the reproduced samples that would otherwise be left over.

Either way, the number of reproduced audio samples per frame stored in the output audio buffer 5 is adjusted to the number that audio reproduction requires. The resulting reproduced sample sequence j2 is output from the output audio buffer 5 to the audio reproducer (not shown). The count of the audio sample counter 7 is then cleared for the next frame (step S47), and the process returns.

[0049] The audio sample output interrupt process of FIG. 12 runs at each falling edge of the sampling clock l2. It outputs one reproduced audio sample j2 from the output audio buffer 5 (step S51), increments the count of the audio sample counter 7 by one (step S52), and returns.

In other words, the reproduced samples j2 leave the output audio buffer 5 one at a time, in step with the sampling clock l2. The audio sample counter 7 counts the sampling clock l2, and so counts how many samples have been output.
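The decoder-side routines of FIGS. 11 and 12 mirror the encoder, but the decision is inverted. If playback consumed more samples per frame than one decoded frame holds, the decoded frame must be lengthened.

The Python model below illustrates this. The class and method names are hypothetical, and the buffer read of step S51 is omitted. Returning the count as the required output length reflects the reading of [0050] to [0053], where the converted frame holds P + M*N or P - M*N samples.

```python
class DecoderBackEnd:
    """Hypothetical model of output audio buffer 5 and audio sample counter 7.

    on_sample_clock mirrors FIG. 12 (step S52; the buffer read of S51 is
    omitted); on_frame_boundary mirrors steps S43 to S47 of FIG. 11.
    """

    def __init__(self, P: int = 160):
        self.P = P
        self.count = 0   # audio sample counter 7

    def on_sample_clock(self) -> None:
        self.count += 1  # step S52: one reproduced sample j2 was played

    def on_frame_boundary(self):
        if self.count > self.P:
            action = "increase"   # step S45: e.g. P + M*N samples played
        elif self.count < self.P:
            action = "reduce"     # step S46: e.g. P - M*N samples played
        else:
            action = "none"       # step S43: count matches the decoder rate
        needed = self.count       # reproduced samples consumed per frame
        self.count = 0            # step S47: clear for the next frame
        return action, needed
```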

[0050] The audio sample count converter 8 executes the sample count increasing process (step S45) as the subroutine shown in FIG. 13:
1. It takes out of the output audio buffer 5 the P reproduced audio samples that the audio decoder 6 has output and stored there. These are fewer than the count value P + (M*N).
2. It oversamples them by a factor of L (step S61), giving P*L subdivided sample points.
3. At every (L - M) subdivided sample points, it interpolates a new audio sample from the adjacent original samples and writes it to the output audio buffer 5, overwriting its contents. This is done L*N times (step S62).
4. At every L subdivided sample points, it writes an audio sample to the output audio buffer 5, overwriting its contents. This is done P + (M*N) - (L*N) times (step S63).

[0051] That is, as in FIGS. 6 and 7, the oversampling and interpolation raise the sequence of P reproduced samples to a total of (L*N) + (P + (M*N) - (L*N)) = P + (M*N) reproduced samples. These are stored back in the output audio buffer 5.

As a result, the output audio buffer 5 outputs the number of reproduced samples that the audio reproduction process requires. The reproduced samples can be increased in the same ways as in the encoding case described above.
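On the decoding side ([0050] to [0053]), the input is always the P decoded samples, and the output length comes from counter 7. The Python sketch below follows that scheme. It is a minimal illustration, not the patent's implementation:
- The function name is hypothetical.
- Linear interpolation is assumed.
- Positions follow the head-first layout of FIG. 7(a).

```python
def stretch_decoded_frame(samples, n_out, L=16, M=1, N=4):
    """Convert P decoded samples to n_out = P + M*N (FIG. 13) or P - M*N (FIG. 14).

    L*N outputs are spaced (L - M) or (L + M) subdivided points apart, then the
    remaining n_out - L*N outputs are spaced L points apart.
    """
    P = len(samples)
    if n_out == P + M * N:
        step = L - M          # increase: steps S61 to S63
    elif n_out == P - M * N:
        step = L + M          # reduce: steps S71 to S73
    else:
        raise ValueError("n_out must be P + M*N or P - M*N")

    start = L * N * step      # multiple of L, so the rest lands on decoded samples
    positions = [k * step for k in range(L * N)]
    positions += [start + k * L for k in range(n_out - L * N)]

    out = []
    for pos in positions:
        i, frac = divmod(pos, L)
        nxt = samples[i + 1] if frac else 0   # neighbour only needed off-grid
        out.append((samples[i] * (L - frac) + nxt * frac) / L)  # linear interpolation
    return out
```

For a ramp of 160 samples, `stretch_decoded_frame(ramp, 164)` yields 164 samples whose second value is 15/16 = 0.9375. `stretch_decoded_frame(ramp, 156)` yields 156 samples. In both cases the last output is the last decoded sample, 159.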

[0052] The audio sample count converter 8 executes the sample count reducing process (step S46) as the subroutine shown in FIG. 14:
1. It takes out of the output audio buffer 5 the P reproduced audio samples that the audio decoder 6 has output and stored there. These are more than the count value P - (M*N).
2. It oversamples them by a factor of L (step S71), giving P*L subdivided sample points.
3. At every (L + M) subdivided sample points, it interpolates a new audio sample from the adjacent original samples and writes it to the output audio buffer 5, overwriting its contents. This is done L*N times (step S72).
4. At every L subdivided sample points, it writes an audio sample to the output audio buffer 5, overwriting its contents. This is done P - (M*N) - (L*N) times (step S73).

[0053] That is, as in FIGS. 9 and 10, the oversampling and interpolation reduce the sequence of P reproduced samples to a total of (L*N) + (P - (M*N) - (L*N)) = P - (M*N) reproduced samples. These are stored back in the output audio buffer 5.

As a result, the output audio buffer 5 outputs the number of reproduced samples that the audio reproduction process requires. The reproduced samples can be reduced in the same ways as in the encoding case described above.
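On the decoding side the input is always the P decoded samples. The check below, added for illustration and not stated in the patent, shows that both processes consume exactly P*L subdivided points. They emit P + M*N and P - M*N samples respectively:

$$
LN\,(L-M) + (P+MN-LN)\,L = LP,
\qquad
LN\,(L+M) + (P-MN-LN)\,L = LP .
$$

With L = 16, P = 160, M = 1, N = 4:
- Increase: 64*15 + 100*16 = 960 + 1600 = 2560.
- Reduction: 64*17 + 92*16 = 1088 + 1472 = 2560.

Both totals equal 16*160.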

[0054] To summarize, the audio sample counter 7 counts how many reproduced audio samples leave the output audio buffer 5 while one frame of audio coded information bits h2 is input to the audio decoder 6. The increasing or reducing process runs whenever a fluctuation of the bit clock g2 pushes the number of reproduced samples handled by the output audio buffer 5 away from the proper value. That number is also the number of reproduced samples output to the audio reproducer.

Therefore, even when the bit clock g2 fluctuates, the audio reproducer still receives the proper number of reproduced samples for its reproduction process. And because the increasing and reducing processes are spread across the frame, the reproduced sample sequence stays continuous and high-quality speech is reproduced.

[0055] In the embodiment described above, the number of audio samples is increased or reduced by reading samples from the audio buffer and writing them back over its contents. The present invention is not limited to this arrangement. For example, the audio samples accumulated in an input area of the audio buffer may be read out, converted, and then written to an output area of the audio buffer for output; alternatively, a converter may convert the number of samples as they are stored in the audio buffer or as they are output from it. Likewise, the invention places no particular limitation on how samples are duplicated or thinned out when the number of audio samples is increased or reduced. What matters is that the duplication or thinning is distributed across the oversampled sample sequence without bias.
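To make "distributed without bias" concrete, here is a minimal Python sketch of the sample-number conversion. It rests on three assumptions not stated in the text. First, the new sample is computed by linear interpolation between the two adjacent original samples; the patent says only that it is interpolated from adjacent original samples. Second, the (L * N) modified steps are spread evenly with a simple integer rule. Third, the last original sample is held at the frame edge instead of reading ahead into the next frame. `convert_frame` and `spread_indices` are hypothetical names. Positions are tracked in units of 1/L of the original sample spacing, so a plain step is L subdivided points and a modified step is L + M (reducing) or L - M (increasing), matching claims 2 and 4. The same function covers both sides: on the encoder side the frame holds P ± M * N samples and the target is P, and on the decoder side the frame holds P samples and the target is P ± M * N.

```python
def spread_indices(total: int, count: int) -> set[int]:
    """Choose `count` of the step indices 0..total-1, spaced as evenly as possible."""
    # floor((i + 0.5) * total / count): the centre of the i-th of `count` equal slots.
    return {((2 * i + 1) * total) // (2 * count) for i in range(count)}


def convert_frame(frame: list[float], target_len: int, l: int, m: int, n: int) -> list[float]:
    """Resample one frame to `target_len` samples on an L-times oversampled grid.

    Hypothetical sketch: L*N steps of L+M (reduce) or L-M (increase) subdivided
    points are spread evenly through the frame; all other steps are L points.
    """
    src_len = len(frame)
    diff = src_len - target_len
    if diff == 0:
        return list(frame)  # count matched: nothing to adjust
    if abs(diff) != m * n or not 0 < m < l:
        raise ValueError("need |len(frame) - target_len| == M*N and 0 < M < L")
    if l * n > target_len:
        raise ValueError("L*N modified steps must fit within the frame")
    modified_step = l + m if diff > 0 else l - m
    modified = spread_indices(target_len, l * n)
    out: list[float] = []
    pos = 0  # position on the oversampled grid, in units of 1/L sample
    for k in range(target_len):
        idx, frac = divmod(pos, l)
        if idx + 1 < src_len:
            # Linear interpolation between the two adjacent original samples.
            out.append(frame[idx] + (frame[idx + 1] - frame[idx]) * frac / l)
        else:
            out.append(frame[-1])  # frame edge: hold the last original sample
        pos += modified_step if k in modified else l
    return out
```

With a 10-sample ramp `[0.0, 1.0, ..., 9.0]` and L = 4, M = N = 1, `convert_frame(ramp, 9, 4, 1, 1)` returns `[0.0, 1.0, 2.25, 3.25, 4.5, 5.5, 6.75, 7.75, 9.0]`. Four of the eight steps are stretched to 5 subdivided points, and they alternate with plain steps rather than bunching at one end of the frame, which is the property [0054] relies on for continuity.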

[0056]

Effects of the Invention. As described above, according to the present invention, the number of audio samples supplied from the audio buffer to the audio encoder and the number of reproduced audio samples output from the audio buffer are adjusted with the adjustments distributed across the frame without bias. Consequently, even when, for example, the bit clock input from an external device has a frequency error, the number of audio samples can be converted to the proper number for encoding and for decoding and reproduction, while the continuity between audio samples is maintained so that high-quality audio is reproduced.

[Brief description of drawings]

FIG. 1 is a configuration diagram of a speech encoding apparatus according to an embodiment of the present invention.

FIG. 2 is a configuration diagram of a speech decoding apparatus according to an embodiment of the present invention.

FIG. 3 is a flowchart showing an example of the digital data output interrupt processing procedure in speech encoding.

FIG. 4 is a flowchart showing an example of the audio sample input interrupt processing procedure in speech encoding.

FIG. 5 is a flowchart showing an example of the procedure for increasing the number of audio samples in speech encoding.

FIG. 6 is a conceptual diagram illustrating the process of increasing the number of audio samples.

FIG. 7 is a conceptual diagram illustrating how the adjustments are distributed in the process of increasing the number of audio samples.

FIG. 8 is a flowchart showing an example of the procedure for reducing the number of audio samples in speech encoding.

FIG. 9 is a conceptual diagram illustrating the process of reducing the number of audio samples.

FIG. 10 is a conceptual diagram illustrating how the adjustments are distributed in the process of reducing the number of audio samples.

FIG. 11 is a flowchart showing an example of the digital data input interrupt processing procedure in audio decoding and reproduction.

FIG. 12 is a flowchart showing an example of the audio sample output interrupt processing procedure in audio decoding and reproduction.

FIG. 13 is a flowchart showing an example of the procedure for increasing the number of audio samples in audio decoding and reproduction.

FIG. 14 is a flowchart showing an example of the procedure for reducing the number of audio samples in audio decoding and reproduction.

FIG. 15 is a configuration diagram of a conventional speech encoding device and speech decoding device.

FIG. 16 is a timing chart showing an example of input/output timing in speech encoding and speech decoding.

[Explanation of symbols]

1 ... input voice buffer, 2 ... voice encoder,
3 ... voice sample counter, 4 ... voice sample number converter, 5 ... output voice buffer, 6 ... voice decoder, 7 ... voice sample counter, 8 ... voice sample number converter,
a2, b2 ... voice samples, k2, l2 ... sampling clocks, c2, f2 ... frame synchronization signals,
d2, g2 ... bit clocks, e2, h2 ... voice coded information bits,
i2, j2 ... reproduced voice samples

Claims

(57) [Claims]

1. An audio processing method in which audio samples are accumulated in an audio buffer for each predetermined frame in synchronization with a sampling clock, the accumulated audio samples are input to an audio encoder frame by frame, and the audio coded information bit string produced by the audio encoder is output frame by frame in synchronization with a bit clock, the method being characterized in that: the audio samples of each frame accumulated in the audio buffer are oversampled; when the bit clock decreases relative to the sampling clock, the interval between sample points is lengthened and audio samples at the new sample points are interpolated from the original audio samples, thereby converting the number of audio samples per frame input to the audio encoder into the number of audio samples that matches the bit clock; and when the bit clock increases relative to the sampling clock, the interval between sample points is shortened and audio samples at the new sample points are interpolated from the original audio samples, thereby converting the number of audio samples per frame input to the audio encoder into the number of audio samples that matches the bit clock.

2. A speech processing method in which input voice samples are stored in a voice buffer frame by frame (P samples/frame), the voice samples stored in the voice buffer are encoded by a voice encoder, and a voice coded information bit string (Q bits/frame) is output in synchronization with a bit clock, the method being characterized in that the number of voice samples stored in the voice buffer while Q bits of voice coded information are output is counted; when the count value is (P + (M * N)), the voice samples stored in the voice buffer are oversampled L times to obtain (P + (M * N)) * L subdivided sample points, after which a new voice sample is interpolated from the adjacent original voice samples every (L + M) subdivided sample points, (L * N) times, and output to the voice encoder, and an original voice sample is output to the voice encoder (P - (L * N)) times, so that a total of P voice samples is output to the voice encoder; and when the count value is (P - (M * N)), the voice samples stored in the voice buffer are oversampled L times to obtain (P - (M * N)) * L subdivided sample points, after which a new voice sample is interpolated from the adjacent original voice samples every (L - M) subdivided sample points, (L * N) times, and output to the voice encoder, and an original voice sample is output to the voice encoder (P - (L * N)) times, so that a total of P voice samples is output to the voice encoder.
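The encoder-side schedule in claim 2 can be checked the same way. The derivation below assumes, as claim 4 states explicitly for the decoder side, that the (P - (L * N)) uninterpolated samples are taken every L subdivided points. Under that assumption, both cases consume exactly the oversampled frame and always deliver P samples to the encoder:

```latex
\begin{aligned}
\text{count } P + MN:\quad & LN\,(L + M) + L\,(P - LN) = LP + LMN = (P + MN)\,L,\\
\text{count } P - MN:\quad & LN\,(L - M) + L\,(P - LN) = LP - LMN = (P - MN)\,L,\\
\text{samples output (both cases):}\quad & LN + (P - LN) = P.
\end{aligned}
```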

3. An audio processing method in which a voice coded information bit string is input to a voice decoder frame by frame in synchronization with a bit clock, decoded reproduced audio samples are output, the reproduced audio samples of each frame are stored in an audio buffer, and the reproduced audio samples are output for each predetermined frame in synchronization with a sampling clock, the method being characterized in that: the reproduced audio samples of each frame accumulated in the audio buffer are oversampled; when the bit clock decreases relative to the sampling clock, the interval between sample points is shortened and audio samples at the new sample points are interpolated from the original audio samples, thereby converting the number of reproduced audio samples per frame output from the audio buffer into the number of audio samples that matches the sampling clock; and when the bit clock increases relative to the sampling clock, the interval between sample points is lengthened and audio samples at the new sample points are interpolated from the original audio samples, thereby converting the number of reproduced audio samples per frame output from the audio buffer into the number of audio samples that matches the sampling clock.

4. An audio processing method in which a voice coded information bit string (Q bits/frame) is input to a voice decoder frame by frame in synchronization with a bit clock, voice decoding is performed frame by frame, the reproduced voice signal samples (P samples/frame) are stored in an audio buffer, and the reproduced audio samples are then output frame by frame, the method being characterized in that the number of reproduced audio samples output from the audio buffer while Q bits of voice coded information are input is counted; when the count value is (P + (M * N)), the P reproduced audio samples output from the voice decoder and stored in the audio buffer are oversampled L times into (P * L) subdivided sample points, after which a new audio sample is interpolated from the adjacent original audio samples every (L - M) subdivided sample points, (L * N) times, and output from the audio buffer, and an original audio sample is output from the audio buffer every L subdivided sample points, (P + (M * N) - (L * N)) times, so that a total of (P + (M * N)) reproduced audio samples is output; and when the count value is (P - (M * N)), the P reproduced audio samples output from the voice decoder and stored in the audio buffer are oversampled L times into (P * L) subdivided sample points, after which a new audio sample is interpolated from the adjacent original audio samples every (L + M) subdivided sample points, (L * N) times, and output from the audio buffer, and an original audio sample is output from the audio buffer every L subdivided sample points, (P - (M * N) - (L * N)) times, so that a total of (P - (M * N)) reproduced audio samples is output.
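The bookkeeping in claim 4 can also be checked mechanically. The Python sketch below is hypothetical: `FramePlan` and `decoder_plan` are illustrative names, not from the patent. It tallies only the step schedule, without touching sample values, and confirms that both cases cover exactly the P * L subdivided points of the oversampled frame while producing P ± M * N outputs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FramePlan:
    """Step schedule for one decoder frame (hypothetical representation)."""
    interpolated: int   # samples interpolated, one per modified step
    modified_step: int  # spacing of modified steps, in subdivided points
    plain: int          # original samples taken every L subdivided points
    outputs: int        # total reproduced samples output for the frame
    span: int           # subdivided points covered by the whole schedule


def decoder_plan(p: int, l: int, m: int, n: int, count: int) -> FramePlan:
    """Bookkeeping for claim 4: P decoded samples -> `count` output samples."""
    if count == p + m * n:
        step = l - m  # shorter spacing: more samples out of the same frame
    elif count == p - m * n:
        step = l + m  # longer spacing: fewer samples out of the same frame
    else:
        raise ValueError("claim 4 covers only count == P +/- M*N")
    interpolated = l * n
    plain = count - interpolated  # (P + M*N - L*N) or (P - M*N - L*N)
    if plain < 0:
        raise ValueError("L*N must not exceed the output count")
    return FramePlan(
        interpolated=interpolated,
        modified_step=step,
        plain=plain,
        outputs=interpolated + plain,
        span=interpolated * step + plain * l,
    )
```

For example, with P = 160, L = 8 and M = N = 1 (illustrative values only), `decoder_plan(160, 8, 1, 1, 161)` yields 8 interpolated samples at spacing 7 plus 153 plain samples: 161 outputs spanning 8 * 7 + 153 * 8 = 1280 = P * L subdivided points.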

5. A speech processing apparatus comprising an audio buffer that stores input audio samples and an audio encoder that encodes the audio samples stored in the audio buffer, the audio encoder outputting an audio coded information bit string frame by frame in synchronization with a bit clock, the apparatus being characterized by comprising: an audio sample counter that counts the number of audio samples stored in the audio buffer while one frame of audio coded information bits is output from the audio encoder; and an audio sample number converter that converts the number of audio samples stored in the audio buffer when the count value of the audio sample counter differs from the predetermined number of audio samples for one frame that should be stored in the audio buffer while one frame of audio coded information bits is output from the audio encoder, wherein the audio sample number converter oversamples the audio samples of each frame stored in the audio buffer; when the count value is greater than the predetermined number of samples, it lengthens the interval between sampling points and interpolates audio samples at the new sample points from the original audio samples, thereby converting the audio samples of each frame input to the audio encoder into the predetermined number of samples; and when the count value is less than the predetermined number of samples, it shortens the interval between sampling points and interpolates audio samples at the new sample points from the original audio samples, thereby converting the audio samples of each frame input to the audio encoder into the predetermined number of audio samples.

6. A speech processing apparatus comprising a voice decoder that decodes a voice coded information bit string frame by frame and an audio buffer that stores and outputs the reproduced voice signal samples decoded frame by frame, the voice coded information bit string being input to the voice decoder in synchronization with a bit clock, the apparatus being characterized by comprising: an audio sample counter that counts the number of reproduced audio samples output from the audio buffer while one frame of voice coded information bits is input to the voice decoder; and an audio sample number converter that converts the number of audio samples stored in the audio buffer when the count value of the audio sample counter differs from the predetermined number of reproduced audio samples for one frame that should be stored in the audio buffer while one frame of voice coded information bits is input to the voice decoder, wherein the audio sample number converter oversamples the reproduced audio samples of each frame stored in the audio buffer; when the count value is greater than the predetermined number of samples, it lengthens the interval between sampling points and interpolates audio samples at the new sample points from the original audio samples, thereby converting the number of reproduced audio samples per frame output from the audio buffer into the predetermined number of samples; and when the count value is smaller than the predetermined number of samples, it shortens the interval between sampling points and interpolates audio samples at the new sample points from the original audio samples, thereby converting the number of reproduced audio samples per frame output from the audio buffer into the predetermined number of audio samples.

7. A voice communication device that encodes and transmits a voice signal and decodes a received encoded voice signal, comprising the voice processing apparatus according to claim 5 in its voice signal encoding processing section and the voice processing apparatus according to claim 6 in its voice signal decoding processing section.

8. A voice communication system in which a transmitting-side device encodes and transmits a voice signal and a receiving-side device decodes the received encoded voice signal, comprising the voice processing apparatus according to claim 5 in the voice signal encoding processing section of the transmitting-side device and the voice processing apparatus according to claim 6 in the voice signal decoding processing section of the receiving-side device.