JP3468184B2

JP3468184B2 - Voice communication device and its communication method

Info

Publication number: JP3468184B2
Application number: JP36564099A
Authority: JP
Inventors: 孝行石川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-12-22
Filing date: 1999-12-22
Publication date: 2003-11-17
Anticipated expiration: 2019-12-22
Also published as: JP2001184098A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a voice communication device (a transmitting device and a receiving device) that uses a linear predictive coding (LPC) analysis-synthesis scheme, and to a communication method thereof.

[0002]

[Description of the Related Art] The pitch-excited LPC vocoder has long been known as an LPC analysis-synthesis voice communication device that uses LPC coefficients and a residual signal. FIG. 3 is a block diagram of an example of a pitch-excited LPC vocoder serving as such a conventional voice communication device (transmitting device and receiving device).

[0003] In the figure, the input voice signal on the transmitting side is band-limited by a voice-band-limiting low-pass filter (LPF) 22 to the telephone voice band, for example 300 Hz to 3.4 kHz, and is then supplied to an A/D converter 23, which converts it into voice data sampled at a predetermined sampling frequency with a predetermined number of quantization bits.

[0004] This voice data is supplied to a linear predictive analyzer (LPC analyzer) 24, where known linear predictive analysis converts it into about 8 to 12 LPC coefficients, such as k parameters or α parameters. A sound source analyzer 25 uses these LPC coefficients to extract the residual signal by a known method, and then sums the squares of the residual to calculate the sound source signal (power). The voice data is also supplied to a pitch extractor 26, which extracts the pitch frequency (vocal-cord vibration frequency) of the sound source data.
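The "known linear predictive analysis" in [0004] is not specified. One common realization, assumed here, is the autocorrelation method solved with the Levinson-Durbin recursion, which produces the reflection coefficients (k parameters) and the predictor coefficients (α parameters) together; the residual power is then the sum of squared prediction errors. This is a minimal pure-Python sketch with hypothetical function names, using the convention x̂[n] = Σ α_i x[n−i].

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of one frame x (no window applied here)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (alpha, k, err): predictor coefficients alpha[1..order] such that
    x_hat[n] = sum(alpha[i] * x[n - i]), reflection coefficients k[1..order],
    and the final prediction-error power. Index 0 of alpha and k is unused.
    """
    alpha = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        new_alpha = alpha[:]
        new_alpha[i] = k[i]
        for j in range(1, i):
            new_alpha[j] = alpha[j] - k[i] * alpha[i - j]
        alpha = new_alpha
        err *= 1.0 - k[i] ** 2
    return alpha, k, err


def residual_power(x, alpha):
    """Sum of squared prediction errors over the frame (the 'sound source power')."""
    order = len(alpha) - 1
    total = 0.0
    for n in range(len(x)):
        pred = sum(alpha[i] * x[n - i] for i in range(1, order + 1) if n - i >= 0)
        total += (x[n] - pred) ** 2
    return total
```

In practice the frame would normally be windowed before the autocorrelation is taken, and the order would be set to the 8 to 12 coefficients mentioned in [0004].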

[0005] The LPC coefficients output from the LPC analyzer 24, the sound source signal (power) output from the sound source analyzer 25, and the pitch frequency output from the pitch extractor 26 are each supplied to a multiplexer 27, where they are multiplexed and then sent over the transmission path to the synthesis side (receiving device side).

[0006] On the receiving side, a separator 28 separates the LPC coefficients, the sound source signal (power), and the pitch frequency from the input signal. A pulse train generator 29 generates an impulse train corresponding to the pitch frequency from the separator 28. A voiced/unvoiced decision unit 30 decides from the pitch frequency and power supplied by the separator 28 whether the speech is voiced or unvoiced, and supplies the decision result to a switch circuit 32 as a switching signal.

[0007] When the speech is judged voiced, the switch circuit 32, based on the switching signal, selects the impulse train that the pulse train generator 29 outputs at a fixed period (fixed interval) corresponding to the pitch period, and supplies it to a sound source demodulator 33. When the speech is judged unvoiced, there is no vocal-cord vibration and therefore no pitch frequency; so, instead of an impulse train excited at a fixed period derived from the pitch frequency, the switch circuit 32 selects, based on the switching signal, a random pulse train corresponding to the white noise signal taken from a noise generator 31, and supplies it to the sound source demodulator 33.

[0008] The sound source demodulator 33 demodulates the sound source signal from the pulse train supplied by the switch circuit 32 and the power separated by the separator 28. An LPC synthesis filter 34 is excited by this demodulated sound source signal, has its coefficients controlled by the LPC coefficients separated by the separator 28, and outputs a digital synthesized voice signal.
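The conventional receiver path of [0006] to [0008] can be sketched as follows: a fixed-period impulse train (voiced) or white noise (unvoiced) is scaled by the transmitted power and drives the all-pole LPC synthesis filter 1/(1 − Σ α_i z^−i). The gain rule (excitation energy per frame equal to the transmitted residual power) and the function names are assumptions for illustration, not details taken from the source.

```python
import math
import random


def make_excitation(n_samples, voiced, pitch_period, power, seed=0):
    """Fixed-period impulse train if voiced, white Gaussian noise if unvoiced,
    scaled so that the frame's excitation energy equals `power` (assumed rule)."""
    if voiced:
        exc = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        rng = random.Random(seed)
        exc = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    energy = sum(e * e for e in exc)
    gain = math.sqrt(power / energy) if energy > 0 else 0.0
    return [gain * e for e in exc]


def lpc_synthesis(excitation, alpha):
    """All-pole synthesis y[n] = e[n] + sum(alpha[i] * y[n - i]); alpha[0] unused."""
    order = len(alpha) - 1
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, order + 1):
            if n - i >= 0:
                acc += alpha[i] * y[n - i]
        y.append(acc)
    return y
```

Feeding `make_excitation(...)` into `lpc_synthesis(...)` frame by frame reproduces the mechanical, strictly periodic excitation that [0011] criticizes.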

[0009] This digital synthesized voice signal is supplied to a D/A converter 35 and converted into an analog signal; an LPF 36 then removes unwanted frequency components, and the result is output as a voice signal in the telephone voice band.

[0010] Thus, when synthesizing voiced speech, the conventional voice communication device analyzes the sound source information over the whole telephone voice band (300 Hz to 3.4 kHz) at once, and the pulse train generator 29 generates an impulse train at a fixed period corresponding to the pitch frequency obtained from that analysis.

[0011]

[Problems to Be Solved by the Invention] However, the vocal-cord vibration of natural (voiced) speech is not constant (fixed-period). It is quasi-periodic, with temporal and frequency fluctuations and instantaneous variations that follow changes in the live voice. The conventional device described above ignores these frequency and temporal fluctuations when analyzing the sound source information (pitch frequency): it treats the telephone voice band (300 Hz to 3.4 kHz) as a stationary signal and analyzes the whole band at once. It therefore cannot express the temporal and frequency fluctuations that accompany variation in the live voice, and can produce only mechanical-sounding synthesized speech that lacks naturalness.

[0012] In general, a drawback of frequency analysis is that it may detect a signal at a fixed multiple of the true frequency. Pitch frequency analysis likewise may detect double the true frequency (pitch doubling) or half of it (pitch halving), producing synthesized speech whose naturalness is degraded.
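The octave errors described in [0012] are easy to reproduce with an autocorrelation pitch estimator, one common method (the source does not say which analysis it has in mind). For a periodic signal the autocorrelation also peaks at every integer multiple of the true period, so a lag search that misses the true lag lands on a multiple and reports half the true pitch. The test signal, lag ranges, and function name below are illustrative assumptions.

```python
import math


def autocorr_pitch(x, fs, min_lag, max_lag):
    """Pitch frequency (Hz) at the autocorrelation peak within [min_lag, max_lag]."""
    def r(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))
    best_lag = max(range(min_lag, max_lag + 1), key=r)
    return fs / best_lag


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]  # 200 Hz, period 40 samples
print(autocorr_pitch(x, fs, 20, 60))   # 200.0 (true period found)
print(autocorr_pitch(x, fs, 60, 100))  # 100.0 (peak at lag 80: pitch halving)
```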

[0013] Conventionally, the in-band signal of the voice signal (residual signal) is judged voiced or unvoiced as a whole, but the entire band is not uniformly in one state (voiced or unvoiced): voiced and unvoiced components may coexist in different frequency bands. Moreover, even during voiced speech, the pitch period may differ from one frequency band to another.

[0014] Furthermore, in the conventional device described above, LPC analysis represents the in-band spectrum as a whole, so the 8 to 12 LPC coefficients normally used are allocated to the low-frequency band, where the energy is concentrated, and the high-frequency band is represented with insufficient accuracy. As a result, formant bandwidths are underestimated and higher-order (third) formants are poorly approximated, so the spectrum sometimes cannot be reproduced faithfully.

[0015] Furthermore, the sound source signal is in practice not stationary but accompanied by fluctuations, and the fluctuation width differs from one frequency band to another.

[0016] Spectral accuracy could be improved by increasing the number of LPC coefficients, but more LPC coefficients also means more information to transmit. Using more than 12 LPC coefficients is therefore undesirable, and in voice communication devices that actually perform narrowband communication it is currently difficult.

[0017] The present invention was made in view of the above. Its object is to provide a voice communication device and a communication method thereof that reproduce more natural synthesized speech and a more faithful spectrum (high-quality speech) without increasing the number of LPC coefficients (that is, without increasing the amount of information). This is achieved by compressing the LPC coefficients and the sound source information through vector quantization, and by compressing the information further by selecting only one representative among LPC coefficients that are highly similar to one another.

[0018]

[Means for Solving the Problems] The voice communication device of the present invention comprises a voice communication transmitting device that receives a voice signal and outputs it to a transmission path as an encoded voice signal, and a voice communication receiving device that is connected to the transmitting device via the transmission path and reproduces a synthesized voice signal from the input encoded voice signal.

The voice communication transmitting device comprises: linear predictive analysis means for dividing a predetermined voice band into a first number of bands and, for each divided band, performing linear predictive analysis on the input voice signal, framed at a predetermined period, to output linear prediction coefficients; first quantizing means for vector-quantizing the linear prediction coefficients output for each divided band by the linear predictive analysis means; inverse filter means for receiving the linear prediction coefficients of each divided band together with the input voice signal and extracting a residual signal; sound source analysis means for dividing the residual signal into a second number of bands and extracting a sound source signal for each divided band from that band's residual signal; cepstrum analysis means for extracting a cepstrum signal from the sound source signal of each divided band; sound source correction means for correcting the sound source signal based on the cepstrum signal; second quantizing means for vector-quantizing the corrected sound source signal output for each divided band by the sound source correction means; vector accumulation means for accumulating the vector-quantized linear prediction coefficients output for each divided band by the first quantizing means in units of large frames, each consisting of a predetermined number of frames; similarity determination means for separating the vector-quantized linear prediction coefficients accumulated in the vector accumulation means into those that are similar to one another and those that are not, and selecting only one of the coefficients regarded as similar as a representative vector-quantized linear prediction coefficient; and multiplexing means for multiplexing the representative vector-quantized linear prediction coefficient output by the similarity determination means, the remaining vector-quantized linear prediction coefficients not regarded as similar, and the vector-quantized sound source signal output for each divided band by the second quantizing means, and outputting the result as the encoded voice signal.

The voice communication receiving device comprises: separating means for receiving the encoded voice signal from the voice communication transmitting device; separating from it the representative vector-quantized linear prediction coefficient of each divided band, the remaining vector-quantized linear prediction coefficients not regarded as similar, and the vector-quantized sound source signal of each divided band; duplicating the representative vector-quantized linear prediction coefficient once for each frame regarded as similar; restoring, together with the remaining vector-quantized linear prediction coefficients, the values for a whole large frame; and then converting each vector value back to scalar values; and synthesizing means for reproducing a synthesized voice signal for the entire band from the linear prediction coefficients and the sound source signal of each divided band output by the separating means.

[0019] The voice communication method of the present invention is a method in which a transmitting side sends an input voice signal to a transmission path as an encoded voice signal, and a receiving side reproduces the encoded voice signal received via the transmission path as a synthesized voice signal.

On the transmitting side, the method comprises: dividing a predetermined voice band into a first number of bands and, for each divided band, performing linear predictive analysis on the input voice signal, framed at a predetermined period, to obtain linear prediction coefficients; vector-quantizing the linear prediction coefficients of each divided band; extracting a residual signal from the linear prediction coefficients of each divided band and the input voice signal; dividing the residual signal into a second number of bands and extracting a sound source signal for each divided band from that band's residual signal; extracting a cepstrum signal from the sound source signal of each divided band; correcting the sound source signal based on the cepstrum signal; vector-quantizing the corrected sound source signal of each divided band; accumulating the vector-quantized linear prediction coefficients of each divided band in units of large frames, each consisting of a predetermined number of frames; separating the accumulated vector-quantized linear prediction coefficients into those that are similar to one another and those that are not, and selecting only one of the coefficients regarded as similar as a representative vector-quantized linear prediction coefficient; and multiplexing the representative vector-quantized linear prediction coefficient, the remaining vector-quantized linear prediction coefficients not regarded as similar, and the vector-quantized sound source signal, and outputting the result to the transmission path as the encoded voice signal.

On the receiving side, the method comprises: receiving the encoded voice signal via the transmission path and separating from it the representative vector-quantized linear prediction coefficient of each divided band, the remaining vector-quantized linear prediction coefficients not regarded as similar, and the vector-quantized sound source signal of each divided band; duplicating the representative vector-quantized linear prediction coefficient once for each frame regarded as similar, restoring, together with the remaining vector-quantized linear prediction coefficients, the values for a whole large frame, and then converting each vector value back to scalar values; and reproducing a synthesized voice signal for the entire band from the restored linear prediction coefficients and sound source signal of each divided band.

[0020]

[0021]

[0022]

[0023]

[0024]

[Embodiments of the Invention] First, an outline of the present invention is given. The voice communication device of the present invention generates spectral envelope information and a sound source signal from a voice signal and transmits them as an encoded voice signal. It consists of a voice communication transmitting device, which includes several kinds of frequency analyzers, inverse filters, sound source correctors, and vector quantizers, and a voice communication receiving device, which includes a sound source signal demodulator and is connected to the transmitting device via a transmission path.

[0025] The voice communication transmitting device comprises: linear predictive analysis means that performs linear predictive analysis (LPC analysis) on the input voice signal, framed at a predetermined period, for each of the first divided bands of the voice band, and outputs linear prediction coefficients (LPC coefficients); first quantizing means that vector-quantizes the linear prediction coefficients; inverse filter means that extracts the residual signal from the linear prediction coefficients and the input voice signal; sound source analysis means that extracts a sound source signal from the residual signal for each of the second divided bands; cepstrum analysis means that extracts a cepstrum signal from the sound source signal; sound source correction means that corrects the sound source signal based on the cepstrum signal; second quantizing means that vector-quantizes the corrected sound source signal; vector accumulation means that accumulates the vector-quantized linear prediction coefficients in units of large frames, each consisting of a predetermined number of frames; similarity determination means that separates the accumulated vector-quantized linear prediction coefficients into those that are similar to one another and those that are not, and selects only one of the coefficients regarded as similar as a representative vector-quantized linear prediction coefficient; and multiplexing means that multiplexes the representative vector-quantized linear prediction coefficients, the remaining vector-quantized linear prediction coefficients not regarded as similar, and the vector-quantized sound source signal, and outputs the result as the encoded voice signal.

[0026] The voice communication receiving device comprises: separating means that receives the encoded voice signal; separates from it the representative vector-quantized linear prediction coefficient of each divided band, the remaining vector-quantized linear prediction coefficients not regarded as similar, and the vector-quantized sound source signal of each divided band; duplicates the representative vector-quantized linear prediction coefficient once for each frame regarded as similar; restores, together with the remaining vector-quantized linear prediction coefficients, the values for a whole large frame; and converts each vector value back to scalar values; and synthesizing means that reproduces a synthesized voice signal from the linear prediction coefficients and the sound source signal.

[0027] In the present invention, linear prediction coefficients (LPC coefficients) are obtained by linear predictive analysis of the input voice signal in each divided band. This improves two weaknesses commonly attributed to conventional LPC analysis: (1) underestimation of formant bandwidths and (2) poor approximation of the third formant. Using these LPC coefficients, the inverse filter can extract a highly accurate residual signal.

[0028] Further, this highly accurate residual signal is itself divided into bands, and the sound source analysis means obtains a sound source signal for each of them. Voiced and unvoiced components that coexist across bands can therefore be separated, and for voiced speech the optimal sound source information can be extracted for each band.

[0029] Moreover, because the fluctuation of the sound source signal is measured and corrected for each band, a sound source signal with fluctuations close to those of natural speech can be extracted. Since accurate sound source information matched to the individual characteristics can thus be obtained, speech analysis and synthesis faithful to the actual human vocal mechanism become possible.

[0030] In addition, rather than transmitting every vector-quantized linear prediction coefficient, the device uses a representative-vector transmission scheme in which only one representative of a group of similar vector values is sent. The amount of transmitted information can therefore be reduced while sound quality is maintained.

[0031] Next, embodiments of the present invention are described in detail with reference to the drawings.

[0032] FIG. 1 is a block diagram of an embodiment of a voice communication transmitting device 100 in the voice communication device of the present invention. In the figure, the input voice signal is band-limited by a voice-band-limiting low-pass filter (LPF) 110 to the telephone voice band, for example 300 Hz to 3.4 kHz, and is then supplied to an analog-to-digital converter (A/D converter) 120, which converts it into digital voice data sampled at a predetermined sampling frequency (for example, 8 kHz) with a predetermined number of quantization bits (for example, 16 bits). This voice data consists of consecutive frames of a predetermined period (22.5 msec in this example), and all subsequent processing is performed frame by frame.
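The example parameters in [0032] fix the frame size: at 8 kHz, a 22.5 msec frame holds 8000 × 0.0225 = 180 samples, and uncoded 16-bit PCM costs 180 × 16 = 2,880 bits per frame, or 128 kbit/s. The helpers below are a hypothetical sketch of this framing; the clipping rule and the dropping of a trailing partial frame are assumptions.

```python
def frame_length(fs_hz, frame_ms):
    """Samples per frame, e.g. 8000 Hz x 22.5 ms = 180 samples."""
    n = fs_hz * frame_ms / 1000.0
    if abs(n - round(n)) > 1e-9:
        raise ValueError("frame length is not a whole number of samples")
    return int(round(n))


def quantize_16bit(x):
    """Map samples in [-1.0, 1.0] to signed 16-bit integers, clipping outside."""
    return [max(-32768, min(32767, int(round(v * 32767)))) for v in x]


def split_frames(samples, n):
    """Consecutive non-overlapping frames of n samples; a partial tail is dropped."""
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


n = frame_length(8000, 22.5)
bits_per_frame = n * 16
print(n, bits_per_frame, bits_per_frame * 1000 / 22.5)  # 180 2880 128000.0
```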

[0033] This voice data is supplied to a bandpass filter group 130 and divided into a plurality of predetermined frequency bands. In this example it is divided in two at 1.2 kHz: the voice data is supplied to the bandpass filters (BPFs) 131 and 132 that make up the bandpass filter group 130, the BPF 131 outputting the 300 Hz to 1.2 kHz band and the BPF 132 the 1.2 kHz to 3.4 kHz band.
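The source does not say how the BPFs are designed. As an assumption, a windowed-sinc FIR band-pass filter (Hamming window) is one simple way to realize BPF 131 (300 Hz to 1.2 kHz) and BPF 132 (1.2 kHz to 3.4 kHz) at 8 kHz. The tap count and function names are hypothetical.

```python
import math


def bandpass_taps(f_lo, f_hi, fs, num_taps=101):
    """Windowed-sinc FIR band-pass: difference of two ideal low-pass kernels,
    multiplied by a Hamming window."""
    m = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - m
        if t == 0:
            h = 2 * (f_hi - f_lo) / fs
        else:
            h = (math.sin(2 * math.pi * f_hi * t / fs)
                 - math.sin(2 * math.pi * f_lo * t / fs)) / (math.pi * t)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    return taps


def fir_filter(x, taps):
    """Direct-form FIR convolution, output the same length as x."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(x))]


def gain_at(taps, f, fs):
    """Magnitude response of the FIR filter at frequency f (Hz)."""
    re = sum(h * math.cos(2 * math.pi * f * k / fs) for k, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * f * k / fs) for k, h in enumerate(taps))
    return math.hypot(re, im)


bpf_131 = bandpass_taps(300, 1200, 8000)
bpf_132 = bandpass_taps(1200, 3400, 8000)
```

A 101-tap filter adds a delay of 50 samples (6.25 msec at 8 kHz); a real design would trade this against the sharpness needed at the 1.2 kHz split.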

[0034] The band-divided voice data output from the BPFs 131 and 132 are supplied to linear predictive analyzers (LPC analyzers) 141 and 142, one provided for each divided band, and are converted by known linear predictive analysis into LPC coefficients (linear prediction coefficients) such as LSP parameters or α parameters. These coefficients are then input to an LPC inverse filter 160 together with the output voice data of the A/D converter 120.

[0035] The LPC coefficients from the LPC analyzers 141 and 142 (α parameters in this example) are input to vector quantizers 151 and 152, respectively, which compress and quantize them. Each vector quantizer calculates the vector-quantized value of its LPC coefficients (vector-quantized LPC coefficients) and sends it to the vector accumulator 181 or 182, respectively.
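Vector quantization replaces each frame's coefficient vector with the index of the nearest entry in a shared codebook. The tiny hand-made codebook below is an assumption (a real one would be trained, for example with the LBG/k-means algorithm, which the source does not name), and squared Euclidean distance is an assumed distortion measure.

```python
def vq_encode(vec, codebook):
    """Index of the codebook entry with minimum squared Euclidean distance to vec."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def vq_decode(index, codebook):
    """Receiver-side table lookup: one reading of 'restoring a vector value to scalars'."""
    return list(codebook[index])


# Hypothetical 4-entry codebook for 3 alpha parameters (real coders use far more).
codebook = [
    [1.2, -0.5, 0.1],
    [0.8, -0.2, 0.0],
    [1.6, -0.9, 0.3],
    [0.3, 0.1, -0.1],
]
idx = vq_encode([0.82, -0.25, 0.02], codebook)
print(idx, vq_decode(idx, codebook))  # 1 [0.8, -0.2, 0.0]
```

With a codebook of size M, each frame's coefficient set is sent as a log2(M)-bit index instead of one word per coefficient.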

[0036] Each of the vector accumulators 181 and 182 assembles a plurality of voice data frames into a large frame whose length, about 100 msec to 220 msec, sounds natural as a vocal-tract period of speech (in this example, one frame is 22.5 msec and 10 frames form one large frame). For each large frame, it accumulates the vector values (vector-quantized LPC coefficients) corresponding to its constituent frames and sends them to the similarity determiner 191 or 192, respectively.
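The vector accumulator's role can be sketched as a buffer that collects one vector-quantized value per frame and releases a complete large frame every 10 frames (10 × 22.5 msec = 225 msec in the example). The class name and interface are hypothetical.

```python
class LargeFrameBuffer:
    """Collects per-frame VQ results and releases them in groups of frames_per_large."""

    def __init__(self, frames_per_large=10):
        self.frames_per_large = frames_per_large
        self._pending = []

    def push(self, vq_value):
        """Add one frame's value; return the full large frame (a list) or None."""
        self._pending.append(vq_value)
        if len(self._pending) == self.frames_per_large:
            large, self._pending = self._pending, []
            return large
        return None


FRAME_MS = 22.5
buf = LargeFrameBuffer(10)
print(buf.frames_per_large * FRAME_MS)  # 225.0 (msec per large frame)
```

Buffering adds up to one large frame of latency on the transmitting side, which is the price paid for the similarity-based compression performed on each completed group.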

[0037] Each of the similarity determiners 191 and 192 extracts, from the vector signals (vector-quantized LPC coefficients) of the 10 frames making up the large frame, those signals that are similar to one another, and selects only one of them as a representative vector. Experimentally, for voiced speech, 2 to 5 of the 10 frames are often found to be similar. The determiner therefore sends to the multiplexer 10 the representative vector value (representative vector-quantized LPC coefficients), which carries 2 to 5 frames of linear prediction vector quantization data as a single frame of data, together with the vector values (vector-quantized LPC coefficients) judged not similar. Because 2 to 5 mutually similar frames of data are reduced to one frame of data, that information can be compressed to 1/2 to 1/5 of its size.
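The source defines neither the similarity measure, nor the threshold, nor how the receiver learns which frames were merged. The sketch below therefore assumes squared Euclidean distance, a hypothetical threshold, greedy grouping (each frame joins the first earlier representative within the threshold), and a per-frame group index as side information. It also shows the receiver-side restoration of [0026], in which each representative is duplicated back into every frame it stands for.

```python
def select_representatives(vectors, threshold):
    """Greedy grouping of one large frame's vectors.

    Returns (reps, frame_map): the vectors actually transmitted, and for each
    frame the index of the representative that stands in for it.
    """
    reps, frame_map = [], []
    for v in vectors:
        for r_idx, r in enumerate(reps):
            if sum((a - b) ** 2 for a, b in zip(v, r)) <= threshold:
                frame_map.append(r_idx)
                break
        else:  # no similar representative found: send this vector itself
            reps.append(v)
            frame_map.append(len(reps) - 1)
    return reps, frame_map


def restore_large_frame(reps, frame_map):
    """Receiver: duplicate each representative into every frame mapped to it."""
    return [list(reps[i]) for i in frame_map]


# Ten hypothetical 2-coefficient vectors; frames 0, 1, 3, 5 are near-identical.
large_frame = [[1.0, 0.5], [1.01, 0.5], [0.2, -0.3], [1.0, 0.49], [0.9, 0.9],
               [0.99, 0.51], [0.2, -0.31], [-0.5, 0.0], [0.6, 0.6], [0.0, 0.8]]
reps, frame_map = select_representatives(large_frame, 0.001)
print(len(reps), frame_map)  # 6 [0, 0, 1, 0, 2, 0, 1, 3, 4, 5]
```

Here 10 vectors shrink to 6, plus the per-frame map. Restoration is lossy: frames 1, 3, and 5 come back as exact copies of frame 0.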

[0038] Meanwhile, the LPC inverse filter 160 applies to the output voice data of the A/D converter 120, using the LPC coefficients of the two divided bands from the LPC analyzers 141 and 142, a filter characteristic that is the inverse of the spectral envelope characteristic obtained by linear predictive analysis, and outputs a residual signal.
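For one coefficient set, the LPC inverse filter is the FIR filter A(z) = 1 − Σ α_i z^−i, the reciprocal of the all-pole synthesis filter. The source does not say how the two bands' coefficient sets are combined inside filter 160; as a labeled assumption, the sketch applies one inverse filter per band in series. Function names are hypothetical.

```python
def inverse_filter(x, alpha):
    """Prediction-error filter e[n] = x[n] - sum(alpha[i] * x[n - i]); alpha[0] unused."""
    order = len(alpha) - 1
    return [x[n] - sum(alpha[i] * x[n - i] for i in range(1, order + 1) if n - i >= 0)
            for n in range(len(x))]


def cascade_inverse(x, alpha_sets):
    """Apply one inverse filter per band's coefficient set, in series.
    ASSUMPTION: the source does not specify how the two bands are combined."""
    e = list(x)
    for alpha in alpha_sets:
        e = inverse_filter(e, alpha)
    return e
```

Applied to the output of a first-order synthesis filter with α₁ = 0.5, the inverse filter recovers the original impulse: `inverse_filter([1.0, 0.5, 0.25, 0.125], [0.0, 0.5])` gives `[1.0, 0.0, 0.0, 0.0]`.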

[0039] This residual signal is supplied to a bandpass filter group 170 and divided into a plurality of bands. In this example, the band is divided into three at 1.5 kHz and 2.5 kHz: the residual signal is supplied to the BPFs 171, 172, and 173 that make up the bandpass filter group 170, the BPF 171 outputting the 300 Hz to 1.5 kHz component, the BPF 172 the 1.5 kHz to 2.5 kHz component, and the BPF 173 the 2.5 kHz to 3.4 kHz component.

[0040] The band-divided residual signals from the BPFs 171, 172, and 173 are supplied to the sound source analyzers 211 to 212, 221 to 222, and 231 to 232, respectively. A plurality of analyzers (two in this example) is provided for each divided band. In each analyzer, the signal is squared and summed, giving the sound source signal (power) of its divided band.
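Within each band, the squaring-and-summing step amounts to computing frame energy. This sketch assumes 180-sample frames, which is 22.5 ms at an assumed 8 kHz rate. `frame_powers` is a hypothetical helper, and any trailing partial frame is ignored.

```python
FRAME_LEN = 180  # assumed: 22.5 ms frames at an assumed 8 kHz sampling rate

def frame_powers(band_signal, frame_len=FRAME_LEN):
    """Sum of squares of one band's residual over each complete frame."""
    return [sum(v * v for v in band_signal[start:start + frame_len])
            for start in range(0, len(band_signal) - frame_len + 1, frame_len)]

print(frame_powers([1.0] * 360 + [2.0] * 180))  # [180.0, 180.0, 720.0]
```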

[0041] The sound source signals of each band from the sound source analyzers 211 to 212, 221 to 222, and 231 to 232 are supplied to the corresponding cepstrum analyzers 311 to 312, 321 to 322, and 331 to 332, respectively.

[0042] The cepstrum analyzers 311 to 312, 321 to 322, and 331 to 332 calculate, from the frequency-axis representation of each band's sound source signal, its cepstrum values as a function of quefrency. They output these values to the corresponding sound source correctors 411 to 412, 421 to 422, and 431 to 432, respectively.
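The text does not define the cepstrum computation further. A common definition is the real cepstrum, the inverse DFT of the log magnitude spectrum. It is sketched here with a naive DFT so that only the standard library is needed. Treat it as an illustration of the concept, not as the analyzers' actual algorithm. In the example, an echo at a delay of 8 samples appears as a cepstral peak at quefrency 8.

```python
import cmath
import math

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of log|DFT(frame)| (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    log_mag = [math.log(abs(s) + 1e-12) for s in spectrum]  # small floor avoids log(0)
    # log_mag is real and even for a real frame, so the inverse DFT reduces to cosines.
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(n)]

frame = [0.0] * 64
frame[0], frame[8] = 1.0, 0.5   # a pulse plus an echo 8 samples later
c = real_cepstrum(frame)
print(round(c[8], 2))  # 0.25: a peak at quefrency 8 samples
```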

[0043] The sound source correctors 411 to 412, 421 to 422, and 431 to 432 measure the variance of the cepstrum values over quefrency. A small variance means the signal is judged to be stationary; a large variance means it is judged to contain fluctuation. A sound source signal judged to contain fluctuation is corrected by adding a fluctuation synchronized with the pitch period (vocal cord signal). The outputs of the correctors then pass through the per-band decision units 501, 502, and 503, described next, to the vector quantizers 511, 512, and 513, respectively. Together these quantizers form the second vector quantizer.
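The stationary/fluctuating decision and the correction step might look like the sketch below.

- The variance threshold is a free parameter here; [0055] notes that it may also be varied with pitch frequency.
- The correction, a random gain change applied once per pitch period, is a hypothetical stand-in. The text does not specify the form of the added fluctuation.

```python
import random
import statistics

def classify_excitation(cepstrum, threshold):
    """'stationary' if the cepstral variance over quefrency is small, else 'fluctuating'."""
    return "stationary" if statistics.pvariance(cepstrum) < threshold else "fluctuating"

def add_pitch_synchronous_fluctuation(period_gains, depth, seed=0):
    """Hypothetical correction: scale each pitch period's gain by a random
    factor in [1 - depth, 1 + depth], so the variation is tied to pitch cycles."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [g * (1.0 + rng.uniform(-depth, depth)) for g in period_gains]

print(classify_excitation([0.1, 0.1, 0.1, 0.1], threshold=0.01))   # stationary
print(classify_excitation([0.0, 0.8, -0.6, 0.4], threshold=0.01))  # fluctuating
```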

[0044] Two sound source analyzers, two cepstrum analyzers, and two sound source correctors are provided for each band for the following reason. In sound source analysis, the output signal of each band from the BPF group 170 is further split into two paths, so that the calculation includes the double pitch or half pitch in that band. The decision units 501, 502, and 503 then compare the two results for each band and determine which path (one of the two) is more correct. The analysis path whose value is closer to the input signal is taken as correct, and the value it produces is used as the analysis information for that band.
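The route selection in the decision units 501 to 503 can be sketched as choosing the candidate whose output is closest to the band input. Two things are assumptions for illustration: squared error as the closeness measure, and pulse trains as the route outputs.

```python
def squared_error(reference, candidate):
    return sum((r - c) ** 2 for r, c in zip(reference, candidate))

def pick_route(band_input, route_outputs):
    """Index of the analysis route whose output is closest to the band input."""
    return min(range(len(route_outputs)),
               key=lambda i: squared_error(band_input, route_outputs[i]))

def pulse_train(period, length):
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

band_input = pulse_train(40, 320)
routes = [pulse_train(40, 320), pulse_train(80, 320)]  # correct vs double-period estimate
print(pick_route(band_input, routes))  # 0
```

A route that has locked onto double the true period misses every other pulse. Its error is therefore larger, and the correct route is chosen.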

[0045] The vector quantizers 511, 512, and 513 each calculate a vector quantized value (vector quantized sound source signal). The input is the sound source signal corrected by whichever of the sound source correctors 411 to 412, 421 to 422, and 431 to 432 was selected by the decision units 501, 502, and 503, respectively.
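Vector quantization in the quantizers 511 to 513 reduces to a nearest-neighbour codebook search. The 16-entry toy codebook below matches the 4 bits per quantizer given in [0056]. Codebook design is not described, so its contents are hypothetical, as are the names `vq_encode` and `vq_decode`.

```python
def vq_encode(vec, codebook):
    """Index of the nearest codebook entry (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))

def vq_decode(index, codebook):
    return codebook[index]

# Toy 4-bit codebook (16 entries); real codebooks would be trained on speech data.
codebook = [(0.1 * i, 0.1 * i) for i in range(16)]
idx = vq_encode((0.52, 0.48), codebook)
print(idx, vq_decode(idx, codebook))  # 5 (0.5, 0.5)
```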

[0046] The multiplexer 520 multiplexes two sets of data:
- the vector quantized sound source signals (power) of the divided bands, taken from the vector quantizers 511, 512, and 513;
- the vector quantized LPC coefficients of the two divided bands, taken from the other vector quantizers 151 and 152 (via the vector accumulators and similarity determiners).
It outputs the result to the transmission path 700 as an encoded voice signal.
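One way to picture the multiplexer's output is as a fixed-width bit field per frame. This sketch uses the field widths from the 54-bit allocation in [0056]. The field order and names are assumptions.

```python
# Field widths follow the 54-bit allocation in [0056]; the order is hypothetical.
FIELDS = [("lpc_low", 10), ("lpc_high", 10),
          ("similarity_low", 3), ("similarity_high", 3),
          ("source_1", 4), ("source_2", 4), ("source_3", 4),
          ("pitch_1", 5), ("pitch_2", 5), ("pitch_3", 5),
          ("sync", 1)]

def pack_frame(values):
    """Concatenate the fields, most significant first, into one integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_frame(word):
    """Inverse of pack_frame: peel fields off from the least significant end."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

frame = {name: i % (1 << width) for i, (name, width) in enumerate(FIELDS)}
print(sum(w for _, w in FIELDS), unpack_frame(pack_frame(frame)) == frame)  # 54 True
```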

[0047] FIG. 2 is a block diagram showing an embodiment of the voice communication receiving device 800 of the voice communication device of the present invention. In the figure, the separator 810 receives the encoded voice signal from the voice communication transmitting device 100 via the transmission path 700. It separates this signal into two parts: the vector quantized sound source signals (power) of the three divided bands, as on the transmitting side, and the vector quantized LPC coefficients of the two divided bands. It then duplicates each selected representative vector value (vector quantized LPC coefficient) as the vector values for the number of frames regarded as similar. Together with the vector values judged to have no similarity, these restore the values of the large frame (10 frames). Finally, the separator restores them to scalar values that can be handled by ordinary linear computation.
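The receiver-side duplication can be sketched as a lookup. It expands the transmitted vectors back to one entry per frame, then dequantizes them via the codebook. The argument layout (representative indices plus a per-frame label list) is a hypothetical encoding; the text does not specify how the similarity information is coded.

```python
def restore_large_frame(rep_indices, labels, codebook):
    """Receiver side: rebuild the per-frame LPC vectors of one large frame.

    rep_indices: codebook index of each transmitted (representative or unique) vector
    labels:      for every frame, which transmitted vector stands in for it
    """
    # Duplicate each representative for every frame mapped to it, then
    # dequantize by codebook lookup to get plain float coefficients.
    return [list(codebook[rep_indices[label]]) for label in labels]

codebook = [(0.1, 0.2), (0.9, 0.8), (0.5, 0.5)]
frames = restore_large_frame([0, 2, 1], [0, 0, 1, 2, 0], codebook)
print(frames)  # [[0.1, 0.2], [0.1, 0.2], [0.5, 0.5], [0.9, 0.8], [0.1, 0.2]]
```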

[0048] Of these, the LPC coefficients are supplied to the LPC interpolator 820. The interpolator first converts the LPC values (vector values) for the upper and lower parts of the voice band into scalar values that can be handled by linear computation. It then simply superimposes the two sets of LPC coefficients (scalar values) to reproduce LPC coefficients for the entire voice band. These coefficients arrive at a fixed period (for example, 22.5 msec). Linear interpolation between the previous and current input values converts them into LPC coefficients in units of, for example, 5.625 msec. In other words, LPC coefficients that change every 22.5 msec become LPC coefficients that change every 5.625 msec.
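Converting from 22.5 ms to 5.625 ms means producing four interpolated coefficient sets per frame. The sketch assumes that the last sub-frame lands exactly on the current set. `interpolate_lpc` is an illustrative name.

```python
def interpolate_lpc(prev, curr, steps=4):
    """Linearly interpolate from the previous to the current 22.5 ms coefficient set,
    producing `steps` sub-frame sets (22.5 ms / 4 = 5.625 ms each)."""
    return [[p + (c - p) * (i + 1) / steps for p, c in zip(prev, curr)]
            for i in range(steps)]

print(interpolate_lpc([0.0, 1.0], [1.0, 0.0]))
# [[0.25, 0.75], [0.5, 0.5], [0.75, 0.25], [1.0, 0.0]]
```

Interpolating predictor coefficients directly does not guarantee a stable synthesis filter. This is one reason many LPC coders interpolate an alternative representation, such as line spectral frequencies, instead.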

[0049] The sound source signals (power) of the divided bands, separated by the separator 810, are supplied to the sound source demodulator 830. There the per-band sound source signals are interpolated and restored to pitch information for the entire band (300 Hz to 3.4 kHz). The sound source demodulator 830 has a group of three bandpass filters, used to reproduce the low-, middle-, and high-band sound sources, respectively. Each filter uses the sound source information (scalar value) of its band as its filter coefficient. It is driven by the pitch information for that band, which serves as its energy input. The linear sum of the outputs of these three filters is the sound source information representing the entire voice band.
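The sketch below illustrates the three-filter structure in simplified form. Each band's scalar value scales a pitch pulse train, which drives a fixed band-pass filter, and the three outputs are summed. Several choices are assumptions:

- using the band value as a gain, rather than as the filter coefficients described above;
- the 8 kHz rate;
- the 31-tap filters.

```python
import math

FS = 8000  # assumed sampling rate (Hz)
BANDS = [(300, 1500), (1500, 2500), (2500, 3400)]  # low, middle, high

def bandpass_taps(f_lo, f_hi, num_taps=31, fs=FS):
    """Short Hamming-windowed sinc band-pass FIR."""
    mid = (num_taps - 1) / 2
    def lowpass(fc, t):
        return 2 * fc / fs if t == 0 else math.sin(2 * math.pi * fc * t / fs) / (math.pi * t)
    return [(lowpass(f_hi, n - mid) - lowpass(f_lo, n - mid))
            * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1)))
            for n in range(num_taps)]

def fir(x, taps):
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n >= k) for n in range(len(x))]

def demodulate_excitation(band_gains, pitch_period, length):
    """Drive each band filter with a pitch pulse train scaled by that band's gain,
    then take the linear sum of the three outputs as the full-band excitation."""
    pulses = [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    outputs = [fir([g * p for p in pulses], bandpass_taps(lo, hi))
               for g, (lo, hi) in zip(band_gains, BANDS)]
    return [sum(samples) for samples in zip(*outputs)]

excitation = demodulate_excitation([1.0, 0.5, 0.25], pitch_period=50, length=400)
print(len(excitation))  # 400
```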

[0050] The LPC synthesis filter 840 reproduces digital synthesized voice data. It uses the corrected LPC coefficients from the LPC interpolator 820 as its filter coefficients. Its input energy is the demodulated pitch information from the sound source demodulator 830.
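The LPC synthesis filter is the all-pole inverse of the transmitter's LPC inverse filter. The sketch shows both filters so that the round trip can be checked. The sign convention is an assumption: predictor coefficients a[k], with A(z) = 1 - sum a[k] z^-k.

```python
def inverse_filter(x, a):
    """A(z): e[n] = x[n] - sum_k a[k] x[n-k] (as on the transmitting side)."""
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """1/A(z): y[n] = e[n] + sum_k a[k] y[n-k], the all-pole LPC synthesis filter."""
    y = []
    for n, sample in enumerate(e):
        # Feedback from previously synthesized samples.
        y.append(sample + sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n >= k))
    return y

print(synthesis_filter([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```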

[0051] This digital synthesized voice data is supplied to the digital/analog converter (D/A converter) 850 and converted into an analog synthesized voice signal. The following LPF 860 removes unnecessary frequency components, and the result is output as the reproduced synthesized voice signal.

[0052] As described above, in this embodiment the LPC inverse filter 160 extracts a high-precision residual signal using the LPC coefficients obtained by dividing the voice signal band into two. This residual is further divided into three bands, and the sound source analyzers 211 to 212, 221 to 222, and 231 to 232 obtain a sound source signal for each. This separates the voiced and unvoiced sounds mixed within each band, so accurate sound source information matched to their individual characteristics can be extracted. In addition, the fluctuation of the sound source signal is measured and corrected for each band, so a sound source signal whose fluctuation is close to that of natural speech can be extracted. Voice analysis and synthesis faithful to the actual human vocal mechanism is therefore possible.

[0053] Double-period detection is a weakness of frequency analysis. To correct it, the sound source analysis is split into two paths, and the calculation for each band divided by the BPF group 170 includes the double pitch or half pitch. This keeps analysis errors to a minimum.

[0054] In addition, not every vector quantized linear prediction (LPC) coefficient, which carries vocal tract information, is transmitted. For coefficients that are similar, only one representative value is sent. This reduces the amount of communication information without degrading sound quality.

[0055] The present invention is not limited to the above embodiment. For example:
- The characteristics of the BPFs in the bandpass filter groups 130 and 170 have been described as fixed, but their center frequencies can also be varied based on the input information.
- The number of divisions can range from about two to four.
- The decision on the cepstrum variance has been described as fixed in this embodiment, but it has been experimentally confirmed that it may be varied according to the pitch frequency.

For comparison, a conventional LPC vocoder transmits 54 bits per 22.5 msec frame: 40 bits for LPC analysis, 7 bits for pitch, 6 bits for the sound source, and 1 bit for synchronization. This gives 54 bits / 22.5 ms = 2400 bps.

[0056] The present scheme allocates bits per 22.5 msec frame as follows:
- LPC coefficients: 10 bits for each of the vector quantizers (151, 152) in the LPC analysis, 20 bits in total.
- Similarity determination information, used to select representative values: 3 bits for each of the two LPC bands, 6 bits in total.
- Sound source information: 4 bits for each of the vector quantizers (511, 512, 513), 12 bits in total.
- Pitch information: 5 bits for the sound source of each band, 15 bits in total.
- Synchronization: 1 bit.

The total is 54 bits, the same as the conventional vocoder. The embodiment of the present invention therefore improves sound quality further without increasing the number of bits.
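The bit budgets in [0055] and [0056] can be checked with exact rational arithmetic. Both schemes use 54 bits per 22.5 ms frame, which is 2400 bit/s. The dictionary keys are descriptive labels, not names from the text.

```python
from fractions import Fraction

FRAME_SECONDS = Fraction(225, 10_000)  # 22.5 ms per frame, kept exact

conventional = {"LPC": 40, "pitch": 7, "sound source": 6, "sync": 1}
proposed = {"LPC (2 x 10)": 2 * 10, "similarity (2 x 3)": 2 * 3,
            "sound source (3 x 4)": 3 * 4, "pitch (3 x 5)": 3 * 5, "sync": 1}

def bits_and_rate(allocation):
    """Return (bits per frame, bits per second) for a bit allocation."""
    bits = sum(allocation.values())
    return bits, bits / FRAME_SECONDS

print(bits_and_rate(conventional))  # (54, Fraction(2400, 1))
print(bits_and_rate(proposed))      # (54, Fraction(2400, 1))
```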

【００５７】[0057]

[Effects of the Invention] As described above, according to the present invention:
- The input voice signal is subjected to linear prediction analysis for each divided band, and an inverse filter using the linear prediction coefficients (LPC coefficients) extracts a high-precision residual signal. This residual is further divided into bands, and the sound source analysis means obtains a sound source signal for each. The voiced and unvoiced sounds mixed within each band are thereby separated, so accurate sound source information matched to their individual characteristics (optimal for each frequency band) can be extracted.
- Analysis errors caused by double pitch and half pitch, a weakness of frequency analysis, can be prevented.
- The fluctuation of the sound source signal is measured and corrected for each band, so the extracted sound source signal has fluctuation close to that of natural speech. This enables voice analysis and synthesis faithful to the actual human vocal mechanism and yields more natural synthesized speech.
- For similar vector quantized linear prediction (LPC) coefficients, only one representative value is sent rather than all of them. The spectrum can therefore be reproduced more faithfully (without degrading sound quality) without increasing the LPC coefficient data (the amount of communication information is reduced), and narrowband voice communication with a higher degree of information compression becomes possible.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the voice communication transmitting device in the voice communication device of the present invention.

FIG. 2 is a block diagram showing an embodiment of the voice communication receiving device in the voice communication device of the present invention.

FIG. 3 is a block diagram showing a conventional voice communication transmitting device and voice communication receiving device.

[Explanation of symbols]

100 Voice communication transmitting device
110, 860 Low-pass filter (LPF)
120 A/D converter
130, 170 Bandpass filter group
131, 132, 171 to 173 Bandpass filter (BPF)
141, 142 LPC analyzer
151, 152, 511 to 513 Vector quantizer
160 LPC inverse filter
211, 212, 221, 222, 231, 232 Sound source analyzer
311, 312, 321, 322, 331, 332 Cepstrum analyzer
411, 412, 421, 422, 431, 432 Sound source corrector
501 to 503 Decision unit
520 Multiplexer
800 Voice communication receiving device
810 Separator
820 LPC interpolator
830 Sound source demodulator
840 LPC synthesis filter
850 D/A converter

Claims

(57) [Claims]

1. A voice communication device comprising a voice communication transmitting device that receives a voice signal and outputs it to a transmission line as an encoded voice signal, and a voice communication receiving device that is connected to the voice communication transmitting device via the transmission line and reproduces a synthesized voice signal from the input encoded voice signal, wherein
the voice communication transmitting device comprises: linear prediction analysis means for dividing a predetermined voice band into a first number of bands and, for each divided band, performing linear prediction analysis on a framed input voice signal of a predetermined period and outputting linear prediction coefficients; first quantizing means for vector quantizing the linear prediction coefficients of each divided band output from the linear prediction analysis means; inverse filter means for receiving the linear prediction coefficients of each divided band and the input voice signal and extracting a residual signal; sound source analysis means for dividing the residual signal into a second number of bands and extracting a sound source signal for each divided band based on the residual signal of that band; cepstrum analysis means for extracting a cepstrum signal from the sound source signal of each divided band; sound source correction means for correcting the sound source signal based on the cepstrum signal; second quantizing means for vector quantizing the corrected sound source signal of each divided band output from the sound source correction means; vector accumulating means for accumulating the vector quantized linear prediction coefficients of each divided band output from the first quantizing means in units of a large frame composed of a predetermined number of frames; similarity determination means for separating the plurality of vector quantized linear prediction coefficients accumulated in the vector accumulating means into those having similarity and those having none, and selecting only one of the vector quantized linear prediction coefficients regarded as similar as a representative vector quantized linear prediction coefficient; and multiplexing means for multiplexing the representative vector quantized linear prediction coefficient output from the similarity determination means, the remaining vector quantized linear prediction coefficients not regarded as similar, and the vector quantized sound source signal of each divided band output from the second quantizing means, and outputting the result as the encoded voice signal; and
the voice communication receiving device comprises: separating means for receiving the encoded voice signal, separating it into the representative vector quantized linear prediction coefficient of each divided band, the remaining vector quantized linear prediction coefficients not regarded as similar, and the vector quantized sound source signal of each divided band, duplicating the representative vector quantized linear prediction coefficient as the vector quantized linear prediction coefficients for the number of frames regarded as similar, restoring the values of the large frame together with the remaining vector quantized linear prediction coefficients not regarded as similar, and then restoring each vector value to a scalar value; and synthesizing means for reproducing a synthesized voice signal for the entire band from the linear prediction coefficients and sound source signal of each divided band output from the separating means.

2. A voice communication method in which a transmitting side sends an input voice signal to a transmission line as an encoded voice signal and a receiving side reproduces the encoded voice signal received via the transmission line as a synthesized voice signal, wherein, on the transmitting side: a predetermined voice band is divided into a first number of bands, and for each divided band a framed input voice signal of a predetermined period is subjected to linear prediction analysis to obtain linear prediction coefficients; the linear prediction coefficients of each divided band are vector quantized; a residual signal is extracted from the linear prediction coefficients of each divided band and the input voice signal; the residual signal is divided into a second number of bands and a sound source signal is extracted for each divided band based on the residual signal of that band; a cepstrum signal is extracted from the sound source signal of each divided band; the sound source signal is corrected based on the cepstrum signal; the corrected sound source signal of each divided band is vector quantized; the vector quantized linear prediction coefficients of each divided band are accumulated in units of a large frame composed of a predetermined number of frames; the accumulated vector quantized linear prediction coefficients are separated into those having similarity and those having none, and only one of the vector quantized linear prediction coefficients regarded as similar is selected as a representative vector quantized linear prediction coefficient; and the representative vector quantized linear prediction coefficient, the remaining vector quantized linear prediction coefficients not regarded as similar, and the vector quantized sound source signal of each divided band are multiplexed and output to the transmission line as the encoded voice signal; and, on the receiving side: the encoded voice signal received via the transmission line is separated into the representative vector quantized linear prediction coefficient of each divided band, the remaining vector quantized linear prediction coefficients not regarded as similar, and the vector quantized sound source signal of each divided band; the representative vector quantized linear prediction coefficient is duplicated as the vector quantized linear prediction coefficients for the number of frames regarded as similar and, together with the remaining vector quantized linear prediction coefficients not regarded as similar, restored to the values of a large frame, after which each vector value is restored to a scalar value; and a synthesized voice signal for the entire band is reproduced from the restored linear prediction coefficients and sound source signals of each divided band.