JPH0675599A

JPH0675599A - Speech coding method, speech decoding method, and speech codec

Info

Publication number: JPH0675599A
Application number: JP4230104A
Authority: JP
Inventors: Seiji Sasaki; 誠司佐々木; Osamu Watanabe; 治渡辺; Hiroki Goto; 裕樹後藤; Masayasu Miyake; 正泰三宅
Original assignee: Kokusai Electric Co Ltd
Current assignee: Kokusai Denki Electric Inc
Priority date: 1992-08-28
Filing date: 1992-08-28
Publication date: 1994-03-18
Anticipated expiration: 2016-05-14
Also published as: JP3166797B2

Abstract

PURPOSE: To improve the intelligibility of reproduced speech by using, as the sound source power value of the speech signal in unvoiced frames, subframe sound source power values obtained for each of a plurality of serially arranged subframes. CONSTITUTION: Sampled input speech a1 is divided into frames by a framer 11, giving b1, which is processed frame by frame. For an unvoiced frame, a switch 35 is controlled so that the prediction residual signal d1 is sent to a subframer 37, which divides it into, for example, four subframes. A subframe power calculator 38 computes the power of the subframed signal k3 for each subframe. A subframe power combiner 39 arranges the per-subframe powers serially to form the sound source power information m3, a parameter covering one frame, and outputs it through a switch 40. The amplitude gain is thus adjusted for each subframe, so that the amplitude changes of unvoiced regions in the reproduced speech follow the input speech more closely.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech coding method and a speech decoding method suitable for synthesizing unvoiced sounds, and to an apparatus therefor.

[0002]

[Prior Art] Low-bit-rate (about 2.4 kbps) speech encoding and decoding apparatuses (that is, speech codecs) generally use an analysis-synthesis method. The most representative analysis-synthesis method is the LPC vocoder (a speech coding and decoding method based on linear prediction analysis). In this method, the prediction residual signal is compressed by modeling it either as a pulse train or as noise: a pulse train is used when the input speech is in a voiced section, and white noise is used when it is in an unvoiced section. FIG. 2 shows the speech coding apparatus and FIG. 3 the speech decoding apparatus of this conventional example. The following description assumes a coding rate of 2.4 kbps and the use of LSPs (line spectrum pairs) as the linear prediction coefficients (spectral envelope information).

[0003] In the speech coding apparatus, input speech a1, sampled at 8 kHz for example, is divided by the framer 11 into frames of 20 ms each, giving b1, which is then processed frame by frame. The linear prediction analyzer 12 performs linear prediction analysis on b1 to obtain the linear prediction coefficients c1; these are converted to LSP coefficients, vector-quantized with 20 bits, and transmitted to the decoding apparatus of FIG. 3. The linear prediction analysis filter 13 filters b1 using c1 as its coefficients and outputs the prediction residual signal d1. The voiced/unvoiced decision unit 14 determines whether the input speech is voiced or unvoiced and sends the result e1 to the pitch period extractor 15. The pitch period extractor 15 computes the pitch period f1 from the prediction residual signal d1 and outputs it; when the decision result e1 is unvoiced, it outputs zero as the pitch period f1, so that f1 also carries the voiced/unvoiced information. The sound source power calculator 16 computes and outputs the power information (power value) of the prediction residual signal d1 for each frame.
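The encoder front end described in [0003] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the LPC order of 10, the unwindowed autocorrelation method solved by Levinson-Durbin recursion, and mean-square power as the "power value" are all assumptions, and the LSP conversion and 20-bit vector quantization are omitted.

```python
import random

SAMPLE_RATE = 8000                       # 8 kHz sampling, as in the text
FRAME_LEN = SAMPLE_RATE * 20 // 1000     # 20 ms frame -> 160 samples
LPC_ORDER = 10                           # assumption: order not stated in the source


def split_frames(x, frame_len=FRAME_LEN):
    """Framer 11: non-overlapping frames; an incomplete tail is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def autocorr(x, max_lag):
    """r[k] = sum_n x[n] x[n-k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Analyzer 12: return a[0..order], a[0] = 1, for A(z) = sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep coefficients found so far
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def analysis_filter(frame, a):
    """Filter 13: d(n) = x(n) + sum_k a[k] x(n-k); samples before the frame are 0."""
    return [sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(frame))]


def frame_power(d):
    """Calculator 16: mean-square power of the residual over one frame."""
    return sum(v * v for v in d) / len(d)


if __name__ == "__main__":
    rng = random.Random(0)
    x, prev = [], 0.0
    for _ in range(3 * FRAME_LEN):       # toy AR(1) signal: x(n) = 0.9 x(n-1) + w(n)
        prev = 0.9 * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    for frame in split_frames(x):
        a = levinson_durbin(autocorr(frame, LPC_ORDER), LPC_ORDER)
        d = analysis_filter(frame, a)
        print(f"frame power {frame_power(frame):.3f} -> residual power {frame_power(d):.3f}")
```

The residual power printed for each frame should be well below the frame's own power, which is what makes the residual cheaper to model than the raw speech.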

[0004] The voiced/unvoiced decision method of the LPC vocoder used in the decision unit 14 is as follows. In the LPC vocoder, the voiced/unvoiced decision is based on the degree of periodicity of the excitation (the prediction residual signal). The degree of periodicity is evaluated from the magnitude of the residual autocorrelation $R_{t_p}$ at a time lag $t = t_p$ equal to the pitch period $t_p$ of the input speech, normalized by the residual autocorrelation $R_0$ at zero lag ($t = 0$). The decision rule is:

$$\frac{R_{t_p}}{R_0} \ge 0.25 \;\Rightarrow\; \text{voiced section}, \qquad \frac{R_{t_p}}{R_0} < 0.25 \;\Rightarrow\; \text{unvoiced section}$$
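A minimal sketch of this decision rule follows. It assumes that the pitch lag $t_p$ is chosen as the lag maximizing the residual autocorrelation over an assumed search range of 20 to 147 samples (about 54 to 400 Hz at 8 kHz); the source gives only the 0.25 threshold. As in the pitch period extractor 15, an unvoiced frame is reported with a pitch of zero.

```python
VOICING_THRESHOLD = 0.25      # from the decision rule R_tp / R_0 >= 0.25
MIN_LAG, MAX_LAG = 20, 147    # assumption: pitch search range, ~54-400 Hz at 8 kHz


def residual_autocorr(d, lag):
    """R_t: autocorrelation of the prediction residual d at lag t."""
    return sum(d[n] * d[n - lag] for n in range(lag, len(d)))


def pitch_and_voicing(d):
    """Return (pitch_lag, voiced); pitch_lag is 0 for unvoiced frames,
    mirroring the pitch period extractor 15."""
    r0 = residual_autocorr(d, 0)
    upper = min(MAX_LAG, len(d) - 1)
    if r0 <= 0.0 or upper < MIN_LAG:
        return 0, False                       # silent or too-short frame
    t_p = max(range(MIN_LAG, upper + 1), key=lambda t: residual_autocorr(d, t))
    voiced = residual_autocorr(d, t_p) / r0 >= VOICING_THRESHOLD
    return (t_p if voiced else 0), voiced


if __name__ == "__main__":
    pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(160)]   # period 50 = 160 Hz
    print(pitch_and_voicing(pulses))   # -> (50, True)
```

For the 160-sample pulse train above, $R_0 = 4$ and $R_{50} = 3$, so the normalized value 0.75 clears the threshold; a single isolated impulse has $R_t = 0$ for every nonzero lag and is classed as unvoiced.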

[0005] In the speech decoding apparatus of FIG. 3, the excitation signal is generated as follows. The pulse generator 21 generates a pulse train b2 whose period is the transmitted pitch period a2, and the white noise generator 22 generates white noise c2. From the transmitted sound source power information d2, the pulse gain calculator 25 computes a suitable pulse gain f2 (adjusted so that the pulse amplitude is $(T\gamma)^{1/2}$, where $\gamma$ is the sound source power value and $T$ is the pitch period), and the noise gain calculator 26 computes a suitable noise gain g2 (adjusted so that the power of the white noise i2 equals the transmitted sound source power d2). If the pitch period a2 is zero, the control circuit 23 treats the frame as unvoiced and sets switches 24 and 27 to the white noise generator 22 and the noise gain calculator 26, respectively; if a2 is nonzero, it treats the frame as voiced and sets switches 24 and 27 to the pulse generator 21 and the pulse gain calculator 25, respectively. The multiplier 29 multiplies the selected excitation signal i2 by the selected gain h2 and outputs the gain-adjusted excitation signal j2. The linear prediction synthesis filter 28 filters j2, using the linear prediction coefficients k2 as filter coefficients, to generate the reproduced speech j3.
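The two gain rules can be checked with a short sketch. A pulse train of period $T$ and amplitude $(T\gamma)^{1/2}$ has mean power $\gamma$ over any whole number of periods, and scaling white noise by $(\gamma/\sigma^2)^{1/2}$, where $\sigma^2$ is the noise's measured power, also gives power $\gamma$. Measuring the noise power over each frame, the Gaussian noise source, and the function names here are assumptions for illustration.

```python
import math
import random


def pulse_excitation(gamma, T, n):
    """Pulse generator 21 x gain 25: period T, amplitude sqrt(T*gamma).
    Mean power is exactly gamma when n is a whole number of periods."""
    amp = math.sqrt(T * gamma)
    return [amp if i % T == 0 else 0.0 for i in range(n)]


def noise_excitation(gamma, n, rng):
    """Noise generator 22 x gain 26: scale noise so its measured power is gamma."""
    c = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p = sum(v * v for v in c) / n
    g = math.sqrt(gamma / p) if p > 0.0 else 0.0
    return [g * v for v in c]


def excitation(pitch, gamma, n=160, rng=None):
    """Control circuit 23: pitch 0 selects noise, otherwise the pulse train."""
    if rng is None:
        rng = random.Random(0)
    if pitch == 0:
        return noise_excitation(gamma, n, rng)
    return pulse_excitation(gamma, pitch, n)


def power(x):
    """Mean-square power of a sequence."""
    return sum(v * v for v in x) / len(x)
```

With a 160-sample frame and $T = 40$, four pulses of amplitude $\sqrt{80}$ give power $4 \cdot 80 / 160 = 2$ for $\gamma = 2$. Note that the whole 20 ms frame shares one gain, which is the limitation the invention addresses.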

[0006]

[Problems to Be Solved by the Invention] FIG. 4 shows the bit allocation of one frame (20 ms, 48 bits) in the conventional method: 1 bit for synchronization, 7 bits for the pitch period, 6 bits for the sound source power, 20 bits for the linear prediction coefficients (LSP coefficients), and 14 unused bits (the unused bits are assumed to carry some auxiliary information in voiced frames).
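As a check on the allocation above, the fields fill exactly one 48-bit frame, and one frame every 20 ms gives the 2.4 kbps coding rate assumed throughout:

$$1 + 7 + 6 + 20 + 14 = 48\ \text{bits}, \qquad \frac{48\ \text{bits}}{20\ \text{ms}} = 2400\ \text{bit/s} = 2.4\ \text{kbps}$$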

[0007] The problem with the conventional method is that only one sound source power value is transmitted per 20 ms frame, so it cannot reliably reproduce the rapid amplitude changes (occurring within time spans shorter than 20 ms) that arise in the consonant portions of speech. An example is shown in FIGS. 5(A) and 5(B). FIG. 5(A) is the input (original) speech waveform of the utterance "tsu". The rapid amplitude change in the portion enclosed by the dotted circle is important information for expressing the consonant "ts". In the speech reproduced by the conventional method, shown in FIG. 5(B), this change cannot be expressed: it is averaged over the frame and flattened, so the reproduced speech is heard as "hu".

[0008] An object of the present invention is to provide a speech coding method, a speech decoding method, and an apparatus therefor that improve the intelligibility of the reproduced speech by making the amplitude changes of unvoiced sections in the reproduced speech follow the input speech more faithfully when the input speech is unvoiced (a consonant portion).

[0009]

[Means for Solving the Problems] The present invention is a speech coding method that extracts parameters (a pitch period, a sound source power value, and linear prediction coefficients, configured so that voiced and unvoiced sound can be distinguished) from a speech signal and outputs the extracted parameters in place of the speech signal, wherein, when the speech signal is unvoiced, the sound source power value consists of unvoiced subframe sound source power values obtained for each of a plurality of serially arranged subframes (claim 1).

[0010] The present invention further receives the parameters output by the speech coding method of claim 1; for voiced sound it restores the speech signal using those parameters, and for unvoiced sound it restores the speech signal using those parameters including the subframe sound source power values for that frame (claim 2).

[0011] The present invention further comprises a speech coding unit and a speech decoding unit. The speech coding unit comprises: a framer that divides the speech signal into frames; a linear prediction analyzer that calculates linear prediction coefficients from the framer output and outputs them; a linear prediction analysis filter that obtains a prediction residual signal from these coefficients and the framer output; a decision unit that determines from the prediction residual signal whether the sound is voiced or unvoiced; a pitch period extractor that, according to the decision output, obtains the pitch period from the prediction residual signal for voiced sound, sets the pitch period to zero for unvoiced sound, and outputs it; a sound source power calculator that, when the decision unit judges the sound voiced, takes in the prediction residual signal, calculates its sound source power value, and outputs it as the voiced sound source power value; a subframer that, when the decision unit judges the sound unvoiced, takes in the prediction residual signal and divides it into subframes; a subframe power calculator that calculates the sound source power value of the prediction residual signal for each subframe; a subframe power combiner that combines these subframe powers; and output means that outputs the output of the subframe power combiner as the sound source power value when the sound is judged unvoiced. The speech decoding unit comprises: a control circuit that distinguishes voiced from unvoiced sound by the content of the pitch period from the pitch period extractor; a pulse generator that generates pulses synchronized with the voiced pitch period given by the pitch period extractor; a white noise generator; first means that, when the control circuit judges the sound voiced, takes in the voiced sound source power value from the speech coding unit, calculates a pulse gain, and multiplies it by the pulse generator output; second means that, when the control circuit judges the sound unvoiced, takes in the unvoiced sound source power values from the speech coding unit, calculates subframe noise gains, and multiplies them by the white noise generator output; and a linear prediction synthesis filter that performs linear prediction synthesis on the output of the first means for voiced sound and on the output of the second means for unvoiced sound, thereby obtaining reproduced speech (claim 3).

[0012]

[Operation] According to the present invention, the speech signal is encoded with parameters consisting of a pitch period, a sound source power value, and linear prediction coefficients, and for unvoiced sound the sound source power values obtained for each subframe are output, thereby achieving the coding of unvoiced sound (claim 1).

[0013] Further, according to the present invention, when the parameters are received and decoded, for unvoiced sound the amplitude of the excitation (white noise) is adjusted for each subframe, so that the amplitude changes of unvoiced sections follow the input signal more faithfully (claim 2).

[0014] Further, the present invention comprises a coding unit that transmits the unvoiced sound source power value of each subframe as a parameter and a decoding unit that receives and restores it, enabling synthesis in which the amplitude changes of unvoiced sections follow the input signal more faithfully (claim 3).

[0015]

[Embodiments] FIG. 1 shows the configuration of a speech coding apparatus according to the present invention, and FIG. 6 shows the configuration of a speech decoding apparatus. The description assumes a coding rate of 2.4 kbps and the use of LSPs (line spectrum pairs) as the linear prediction coefficients (spectral envelope information).

[0016] Compared with FIG. 2, the newly added parts in FIG. 1 are the switch 35, the subframer 37, the subframe power calculator 38, the subframe power combiner 39, and the switch 40. Compared with FIG. 3, the newly added parts in FIG. 6 are the switches 44 and 47; FIG. 6 also differs from FIG. 3 in that the calculator 46 calculates subframe noise gains.

[0017] In the speech coding apparatus, input speech a1, sampled at 8 kHz for example, is divided by the framer 11 into 20 ms frames, giving b1, which is then processed frame by frame. The linear prediction analyzer 12 performs linear prediction analysis on b1 to obtain the linear prediction coefficients c1, which are converted to LSP coefficients, vector-quantized with 20 bits, and transmitted to the decoder. The linear prediction analysis filter 13 filters b1 using c1 as its coefficients and outputs the prediction residual signal d1. The voiced/unvoiced decision unit 14 determines whether the input speech is voiced or unvoiced and sends the result e1 to the pitch period extractor 15. The pitch period extractor 15 computes the pitch period f3 from d1 and outputs it; when e1 is unvoiced, it outputs zero as f3, so that f3 also carries the voiced/unvoiced information. For a voiced frame, the switch 35 is set so that the sound source power calculator 16 computes the power information of d1 for each frame and outputs it via the switch 40.

[0018] For an unvoiced frame, the switch 35 is set so that the prediction residual signal d1 is sent to the subframer 37, which divides it into, for example, four subframes (5 ms each). The subframe power calculator 38 computes the power l4 of each subframe of the subframed signal k3. The subframe power combiner 39 arranges the subframe powers serially to form the sound source power information m3, a parameter covering one frame, which is output via the switch 40. According to the voiced/unvoiced information e1, the switches 35 and 40 select either the path through the sound source power calculator 16 or the path through the subframer 37, the subframe power calculator 38, and the subframe power combiner 39.
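A minimal sketch of the subframer 37, subframe power calculator 38, and subframe power combiner 39, together with the switching of switches 35 and 40, assuming mean-square power and four equal subframes of 40 samples (5 ms at 8 kHz). The function names are hypothetical. The average of the four subframe powers equals the single frame power the conventional encoder would send, so the subframe values add information about where the energy lies within the frame.

```python
SUBFRAMES = 4   # four 5 ms subframes per 20 ms frame, as in the embodiment


def subframe_powers(d, n_sub=SUBFRAMES):
    """Subframer 37, calculator 38 and combiner 39: split one residual frame
    into n_sub equal subframes and return their mean-square powers in order
    (the serial parameter m3)."""
    if len(d) % n_sub:
        raise ValueError("frame length must be a multiple of the subframe count")
    step = len(d) // n_sub
    return [sum(v * v for v in d[i:i + step]) / step for i in range(0, len(d), step)]


def encode_power(d, voiced):
    """Switches 35/40: one frame power if voiced, subframe powers if unvoiced."""
    if voiced:
        return [sum(v * v for v in d) / len(d)]
    return subframe_powers(d)
```

For example, a 160-sample residual that is 0.1 everywhere except for a burst of amplitude 1.0 in the second subframe yields subframe powers of about [0.01, 1.0, 0.01, 0.01], whereas a single frame power of 0.2575 would spread that burst evenly over the whole 20 ms.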

[0019] In the speech decoding apparatus of FIG. 6, the excitation signal is generated as follows. The pulse generator 21 generates a pulse train b4 whose period is the transmitted pitch period a4, and the white noise generator 22 generates white noise c4. From the transmitted sound source power information f4, the pulse gain calculator 25 computes a suitable pulse gain g4 (adjusted so that the pulse amplitude is $(T\gamma)^{1/2}$, where $\gamma$ is the sound source power and $T$ is the pitch period). From the same information f4, the subframe noise gain calculator 46 computes a suitable noise gain h4 for each subframe (adjusted so that the white noise power e4 equals the sound source power f4) and outputs the respective gain for each subframe. If the pitch period a4 is zero, the control circuit 23 treats the frame as unvoiced and sets switches 44 and 47 to the white noise generator 22 and the subframe noise gain calculator 46, respectively; if a4 is nonzero, it treats the frame as voiced and sets switches 44 and 47 to the pulse generator 21 and the pulse gain calculator 25, respectively. The multiplier 29 multiplies the selected excitation signal e4 by the selected gain i4 and outputs the gain-adjusted excitation signal j4. The linear prediction synthesis filter 28 filters j4, using the linear prediction coefficients k4 as filter coefficients, to generate the reproduced speech j5.
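The unvoiced decoding path can be sketched as per-subframe noise scaling followed by the all-pole synthesis filter $1/A(z)$. Matching each subframe's measured noise power to the transmitted value, the Gaussian noise source, and the convention $A(z) = 1 + \sum_k a_k z^{-k}$ (consistent with an analysis filter $d(n) = x(n) + \sum_k a_k x(n-k)$) are assumptions; dequantization of the 4-bit power indices is omitted, and the function names are hypothetical.

```python
import math
import random


def subframe_noise_excitation(powers, sub_len, rng):
    """Calculator 46 + multiplier 29: scale white noise separately in each
    subframe so that its measured power equals that subframe's power."""
    out = []
    for target in powers:
        c = [rng.gauss(0.0, 1.0) for _ in range(sub_len)]
        p_noise = sum(v * v for v in c) / sub_len
        g = math.sqrt(target / p_noise) if p_noise > 0.0 else 0.0
        out.extend(g * v for v in c)
    return out


def synthesis_filter(e, a, state=None):
    """Filter 28, all-pole 1/A(z): y(n) = e(n) - sum_{k>=1} a[k] y(n-k).
    `state` holds the last len(a)-1 outputs (most recent last) so that
    consecutive frames can be filtered without a discontinuity."""
    order = len(a) - 1
    hist = list(state) if state else [0.0] * order
    y = []
    for v in e:
        out = v - sum(a[k] * hist[-k] for k in range(1, order + 1))
        y.append(out)
        hist.append(out)
    return y, (hist[-order:] if order else [])


if __name__ == "__main__":
    rng = random.Random(0)
    e = subframe_noise_excitation([0.01, 1.0, 0.01, 0.01], 40, rng)  # burst in subframe 2
    y, state = synthesis_filter(e, [1.0, -0.9])  # toy first-order A(z) for illustration
    print([round(sum(v * v for v in e[i:i + 40]) / 40, 3) for i in range(0, 160, 40)])
```

Because each subframe gets its own gain, the burst in subframe 2 survives into the excitation instead of being averaged away; the synthesis filter then restores the spectral envelope, and passing `state` from one call to the next keeps the filter memory continuous across frames.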

[0020] FIG. 7(B) shows the bit allocation of one frame (20 ms, 48 bits) according to the present invention; FIG. 7(A) shows the conventional allocation of FIG. 4 for comparison. The frame uses 1 bit for synchronization, 7 bits for the pitch period, 16 bits for the sound source power (4 bits per subframe; 4 bits suffice for one subframe because the power varies over a smaller range than it does over a whole frame), 20 bits for the linear prediction coefficients (LSP coefficients), and 4 unused bits.
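To make the 48-bit budget concrete, the sketch below packs one unvoiced frame into a single integer. The field widths come from the allocation above; the order of the fields within the frame and the treatment of each 4-bit power field as an opaque quantizer index are assumptions, since the source specifies neither.

```python
# Field widths from the unvoiced-frame allocation above; the ORDER of the
# fields inside the 48-bit word is an assumption (the source does not give it).
FIELDS = [("sync", 1), ("pitch", 7), ("power0", 4), ("power1", 4),
          ("power2", 4), ("power3", 4), ("lsp", 20), ("unused", 4)]
FRAME_BITS = sum(width for _, width in FIELDS)   # 48


def pack_frame(values):
    """Pack field values (missing fields default to 0) into one integer, MSB first."""
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack_frame(word):
    """Inverse of pack_frame: peel fields off from the least significant end."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

`FRAME_BITS` evaluates to 48, so the new allocation keeps the same 2400 bit/s rate as the conventional one; the 16 power bits are paid for by 10 of the previously unused bits.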

[0021] In the present invention, sound source power information is transmitted for every 5 ms subframe, so rapid amplitude changes in the consonant portions of speech (occurring within time spans shorter than 20 ms) can be handled more reliably than in the prior art. An example is shown in FIGS. 8(A), 8(B), and 8(C). FIG. 8(A) is the input (original) speech waveform of the utterance "tsu". The rapid amplitude change in the portion enclosed by the dotted circle is important information for expressing the consonant "ts". That is, if the frame is divided into subframes 1, 2, 3, and 4, subframe 2 contains a pulse-like waveform and therefore has higher power than the other subframes. This pulse-like waveform is an important element in expressing "ts". As described above, the conventionally reproduced speech shown in FIG. 8(B) cannot express it and becomes flat, so it is heard as "hu". FIG. 8(C) shows the speech waveform reproduced by the present invention: the rapid amplitude change can be expressed, and the sound is heard as "tsu". Because power is sent for each subframe, the power of subframe 2 can be represented as larger than that of the other subframes, making "ts" audible.

[0022] With this embodiment, the monosyllable intelligibility (an objective evaluation method for reproduced speech: the percentage of correct answers in a listening test using randomly ordered syllables, in which most mishearings occur in consonants) improved from 50.6% with the conventional method to 55.6%, confirming an improvement in the quality of the reproduced speech.

[Effects of the Invention] According to the present invention, the frame of the excitation signal is divided into subframes and the amplitude gain is adjusted for each subframe, so that the amplitude changes of unvoiced sections in the reproduced speech follow the input speech more faithfully, improving the intelligibility of the reproduced speech.

[Brief description of drawings]

FIG. 1 is a diagram showing an embodiment of a speech coding apparatus according to the present invention.

FIG. 2 is a diagram showing a conventional speech coding apparatus.

FIG. 3 is a diagram showing a conventional speech decoding apparatus.

FIG. 4 is a diagram showing a conventional parameter transmission format.

FIG. 5 is a diagram showing conventional speech reproduction.

FIG. 6 is a diagram showing an embodiment of a speech decoding apparatus according to the present invention.

FIG. 7 is a diagram comparing the parameter transmission format of the present invention with that of the conventional example.

FIG. 8 is a diagram comparing speech reproduction according to the present invention with that of the conventional example.

[Explanation of symbols]

11 Framer
12 Linear prediction analyzer
13 Linear prediction analysis filter
14 Voiced/unvoiced decision unit
15 Pitch period extractor
16 Sound source power calculator
21 Pulse generator
22 White noise generator
23 Control circuit
25 Pulse gain calculator
28 Linear prediction synthesis filter
35, 40 Switches
37 Subframer
38 Subframe power calculator
39 Subframe power combiner
44, 47 Switches
46 Subframe noise gain calculator

Continuation of front page: (72) Inventor: Masayasu Miyake, c/o Kokusai Electric Co., Ltd., 2-3-13 Toranomon, Minato-ku, Tokyo

Claims


1. A speech coding method that extracts parameters (a pitch period, a sound source power value, and linear prediction coefficients, configured so that voiced and unvoiced sound can be distinguished) from a speech signal and outputs the extracted parameters in place of the speech signal, wherein, when the speech signal is unvoiced, the sound source power value consists of unvoiced subframe sound source power values obtained for each of a plurality of serially arranged subframes.

2. A speech decoding method that receives the parameters output by the speech coding method according to claim 1 and, for voiced sound, restores the speech signal using those parameters and, for unvoiced sound, restores the speech signal using those parameters including the subframe sound source power values for that frame.

3. A speech codec comprising a speech coding unit and a speech decoding unit, wherein
the speech coding unit comprises: a framer that divides a speech signal into frames; a linear prediction analyzer that calculates linear prediction coefficients from the framer output and outputs them; a linear prediction analysis filter that obtains a prediction residual signal from the linear prediction coefficients and the framer output; a decision unit that determines from the prediction residual signal whether the sound is voiced or unvoiced; a pitch period extractor that, according to the decision output of the decision unit, obtains the pitch period from the prediction residual signal for voiced sound, sets the pitch period to zero for unvoiced sound, and outputs it; a sound source power calculator that, when the decision unit judges the sound voiced, takes in the prediction residual signal, calculates a sound source power value, and outputs it as the voiced sound source power value; a subframer that, when the decision unit judges the sound unvoiced, takes in the prediction residual signal and divides it into subframes; a subframe power calculator that calculates the sound source power value of the prediction residual signal for each subframe; a subframe power combiner that combines the subframe powers; and means for outputting the output of the subframe power combiner as the sound source power value when the sound is judged unvoiced; and
the speech decoding unit comprises: a control circuit that distinguishes voiced from unvoiced sound by the content of the pitch period from the pitch period extractor; a pulse generator that generates pulses synchronized with the voiced pitch period given by the pitch period extractor; a white noise generator; first means that, when the control circuit judges the sound voiced, takes in the voiced sound source power value from the speech coding unit, calculates a pulse gain, and multiplies the calculated gain by the pulse generator output; second means that, when the control circuit judges the sound unvoiced, takes in the unvoiced sound source power values from the speech coding unit, calculates subframe noise gains, and multiplies the calculated gains by the white noise generator output; and a linear prediction synthesis filter that, for voiced sound, takes in the output of the first means and performs linear prediction synthesis and, when the sound is judged unvoiced, takes in the output of the second means and performs linear prediction synthesis, thereby obtaining reproduced speech.