JP3364827B2

JP3364827B2 - Audio encoding method, audio decoding method, audio encoding / decoding method, and devices therefor

Info

Publication number: JP3364827B2
Application number: JP06040997A
Authority: JP
Inventors: 正山浦; 裕久田崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1996-10-18
Filing date: 1997-03-14
Publication date: 2003-01-08
Anticipated expiration: 2017-03-14
Also published as: JPH10177399A

Abstract

PROBLEM TO BE SOLVED: To avoid degradation of the reproduced signal and to reproduce speech that is less affected by transmission errors, by ensuring that the noise code is not among the codes used to obtain the characteristic position for pitch-synchronizing a noise code vector. SOLUTION: The distance to the input speech S1 of the coded speech generated with the adaptive code, noise code, and gain selected when an adaptive codebook 11 and a noise codebook 12 are used is compared with that of the coded speech generated with the codes selected when a fixed codebook 24 and a second noise codebook 25 are used, and the adaptive code, noise code, and gain giving the smaller distance are selected. When coding is finished, the codes of the adaptive code, noise code, and gain that minimize the distortion between the input speech and the coded speech are output as the coding result S2. In this way, speech is obtained from the codes without directly encoding the characteristic-position information, and the method is configured so that the noise code of a noise codebook that generates at least a periodic time-series vector is not among the codes used to obtain the characteristic position.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech coding/decoding method that compresses and encodes a speech signal into a digital signal, and in particular to a speech coding method, a speech decoding method, and a speech coding/decoding method for reproducing speech with little quality degradation due to transmission channel errors when applied to communications.

[0002]

[Prior Art] Conventionally, representative high-efficiency speech coding methods for communications include code-excited linear prediction coding (CELP) and multi-band excitation coding (MBE). These techniques are described, respectively, in "Code-excited linear prediction (CELP): High-quality speech at 8kbps" (M. R. Schroeder and B. S. Atal, ICASSP'85, pp. 937-940, 1985) and "A real-time implementation of the improved MBE speech coder" (M. S. Brandstein, P. A. Monta, J. C. Hardwick and J. S. Lim, ICASSP'90, pp. 5-8, 1990).

[0003] CELP speech coding is described here. FIG. 12 shows an example of the overall configuration of a CELP speech coding/decoding method, in which 1 is a coding unit, 2 is a decoding unit, 3 is multiplexing means, and 4 is demultiplexing means. The coding unit 1 comprises linear prediction parameter analysis means 8, linear prediction parameter coding means 9, a synthesis filter 10, an adaptive codebook 11, a noise codebook 12, gain coding means 13, and distance calculation means 14. The decoding unit 2 comprises linear prediction parameter decoding means 15, a synthesis filter 16, an adaptive codebook 17, a noise codebook 18, and gain decoding means 19.

[0004] In CELP speech coding, a span of roughly 5 to 50 ms is treated as one frame, and the speech in that frame is coded separately as spectrum information and excitation information. The operation of the CELP speech coding/decoding method is described below. First, in the coding unit 1, the linear prediction parameter analysis means 8 analyzes the input speech S1 and extracts linear prediction parameters, which are the spectrum information of the speech. The linear prediction parameter coding means 9 encodes these linear prediction parameters and sets the encoded linear prediction parameters as the coefficients of the synthesis filter 10.

[0005] Next, the coding of the excitation information is described. The adaptive codebook 11 stores past driving excitation vectors and outputs a time-series vector obtained by periodically repeating a past driving excitation vector in accordance with the adaptive code. The noise codebook 12 stores a plurality of time-series vectors generated, for example, from random noise, and outputs the time-series vector corresponding to the noise code. The time-series vectors from the adaptive codebook 11 and the noise codebook 12 are weighted by the respective gains supplied by the gain coding means 13 and added, and the sum is supplied to the synthesis filter 10 as the driving excitation vector to obtain coded speech. The distance calculation means 14 computes the distance between the coded speech and the input speech S1 and searches for the adaptive code, noise code, and gain that minimize it. When this coding is finished, the code of the linear prediction parameters and the codes of the adaptive code, noise code, and gain that minimize the distortion between the input speech and the coded speech are output as the coding result.
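
The search described in this paragraph can be sketched as follows. This is a minimal illustration only, not the method of the cited references: the function names, the exhaustive triple loop, the plain squared-error distance, and the representation of the adaptive code as a lag in samples are all assumptions made for the example. Practical coders typically search the codebooks sequentially and use a perceptually weighted distance.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] - sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def adaptive_vector(past_excitation, lag, frame_len):
    """Periodically repeat the last `lag` samples of the past excitation."""
    segment = past_excitation[-lag:]
    return [segment[n % lag] for n in range(frame_len)]


def squared_distance(a, b):
    """Plain squared error, standing in for the distance calculation."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(target, past_excitation, lags, noise_codebook, gain_pairs, lpc):
    """Return (distance, lag, noise code, gain code) with minimum distance."""
    best = None
    frame_len = len(target)
    for lag in lags:
        adaptive = adaptive_vector(past_excitation, lag, frame_len)
        for noise_code, noise in enumerate(noise_codebook):
            for gain_code, (g_adaptive, g_noise) in enumerate(gain_pairs):
                # Weighted sum of the two time-series vectors.
                excitation = [g_adaptive * a + g_noise * c
                              for a, c in zip(adaptive, noise)]
                dist = squared_distance(synthesize(excitation, lpc), target)
                if best is None or dist < best[0]:
                    best = (dist, lag, noise_code, gain_code)
    return best
```

The decoder needs only the selected codes: it rebuilds the same weighted sum and runs it through its own synthesis filter.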

[0006] In the decoding unit 2, meanwhile, the linear prediction parameter decoding means 15 decodes the linear prediction parameters from their code and sets them as the coefficients of the synthesis filter 16. Next, the adaptive codebook 17 outputs a time-series vector obtained by periodically repeating a past driving excitation vector in accordance with the adaptive code, and the noise codebook 18 outputs the time-series vector corresponding to the noise code. These time-series vectors are weighted by the respective gains that the gain decoding means 19 decodes from the gain code and are added, and the sum is supplied to the synthesis filter 16 as the driving excitation vector to obtain the output speech S3.

[0007] A conventional speech coding/decoding method that improves on the CELP method in order to raise the quality of reproduced speech is disclosed in Japanese Unexamined Patent Publication No. H6-202699. FIG. 13, in which parts corresponding to those in FIG. 12 carry the same reference numerals, shows the configuration of this conventional method; in the figure, 20 and 22 are pitch position extraction means and 21 and 23 are pitch synchronization means. The operation of the coding/decoding method with this configuration is as follows. First, in the coding unit 1, the pitch position extraction means 20 extracts as the pitch position, from the periodically repeated time-series vector output by the adaptive codebook 11, a periodic point such as the point at which the time-series vector takes its maximum amplitude. The noise codebook 12 stores a plurality of code vectors generated, for example, from random noise, and outputs the code vector corresponding to the noise code.
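
As a rough sketch of the pitch position extraction just described, the maximum-amplitude point of the adaptive codebook output can be located and reduced modulo the pitch period to give the first pitch position in the frame. The function name and the choice to return the first in-frame position are assumptions for illustration; the cited publication may define the pitch position differently.

```python
def extract_pitch_position(adaptive_vec, pitch_period):
    """Return the first in-frame position of the periodic max-amplitude point."""
    # Index of the sample with the largest absolute amplitude.
    peak = max(range(len(adaptive_vec)), key=lambda n: abs(adaptive_vec[n]))
    # The vector is periodic, so the same point recurs every pitch_period samples.
    return peak % pitch_period
```

Because the encoder and decoder both run this on their own adaptive codebook output, no extra bits are transmitted for the pitch position.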

[0008] A pitch synchronization position is set for each code vector. The pitch synchronization means 21 cuts the code vector out to pitch period length so that its pitch synchronization position coincides with the pitch position extracted by the pitch position extraction means 20, and generates a time-series vector by periodically repeating the result. One method of generating this time-series vector, shown in "A Study of Phase-Adaptive PSI-CELP Speech Coding" (Mano and Moriya, IEICE Speech Technical Group SP94-96, pp. 37-44, 1995), cuts out the code vector according to two parameters, the offset between the frame boundary and the pitch position (the phase) and the pitch period length, and then makes it periodic. FIG. 14 shows an example of pitch synchronization processing with phase φ and pitch period L. Coded speech is then generated using the time-series vectors from the adaptive codebook 11 and the pitch synchronization means 21, and the adaptive code, noise code, and gain that minimize the distance between this coded speech and the input speech S1 are selected and encoded.
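
The cut-and-repeat operation can be sketched as below. FIG. 14 is not reproduced here, so the exact cut-out rule is an assumption: the window is taken to start `phase` samples before the code vector's synchronization position, so that the synchronization sample lands on the pitch position when the segment is repeated from the frame start. All names are hypothetical.

```python
def pitch_synchronize(code_vec, sync_pos, phase, period, frame_len):
    """Cut `period` samples from code_vec and repeat them over the frame.

    phase  : offset of the pitch position from the frame boundary (0 <= phase < period)
    period : pitch period length L in samples
    """
    start = sync_pos - phase  # assumed rule: the cut-out window depends on the phase
    if start < 0 or start + period > len(code_vec):
        raise ValueError("cut-out window falls outside the code vector")
    segment = code_vec[start:start + period]
    return [segment[n % period] for n in range(frame_len)]
```

With `code_vec = list(range(10))`, `sync_pos=5`, `phase=2`, and `period=4`, the segment is `[3, 4, 5, 6]` and the synchronization sample `5` appears at positions 2 and 6 of the output.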

[0009] Next, in the decoding unit 2, the pitch position extraction means 22 extracts the pitch position from the periodically repeated time-series vector output by the adaptive codebook 17, using the same method as the pitch position extraction means 20 of the coding unit 1. The noise codebook 18 outputs the code vector corresponding to the noise code. The code vectors have the same pitch synchronization positions as those in the noise codebook 12 of the coding unit 1, and the pitch synchronization means 23 cuts the code vector out to pitch period length so that its pitch synchronization position coincides with the pitch position extracted by the pitch position extraction means 22, and generates a time-series vector by periodically repeating the result.

[0010] The output speech S3 is then obtained using the time-series vectors from the adaptive codebook 17 and the pitch synchronization means 23. With this configuration, synchronizing the code vector from the noise codebook with the pitch improves the pitch periodicity of the reproduced speech and, in particular, the quality of the reproduced speech in voiced parts. In addition, because the pitch position used when making the code vector periodic is obtained adaptively from the time-series vector of the adaptive codebook, high-quality reproduced speech can be generated without increasing the amount of transmitted information.

[0011] In practice, speech coding/decoding methods applied to fields in which code errors occur, such as mobile communications, use error-correction coding to suppress the quality degradation of coded speech caused by code errors. When the errors cannot be fully corrected, waveform restoration processing is performed to reduce their effect.

[0012] Two kinds of restoration method have been used so far. The first, shown in "Channel coding for digital speech transmission in the Japanese digital cellular system" (M. J. McLaughlin, IEICE Radio Communication Systems Technical Group RCS90-27, pp. 41-45, 1990), generates reproduced speech by reusing the parameters of a past frame when the current frame contains a code error, while gradually attenuating the power of the reproduced speech.
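
A minimal sketch of this first restoration method follows, assuming each frame arrives as a pair of decoded parameters and an error flag. The attenuation factor of 0.5 per consecutive bad frame and the decision to output silence when no good frame has yet been received are illustrative assumptions, not values from the cited paper.

```python
def conceal(frames, attenuation=0.5):
    """Map (params, has_error) frames to (params_to_use, output_gain) pairs."""
    last_good = None
    gain = 0.0  # nothing to repeat until a good frame has been seen
    out = []
    for params, has_error in frames:
        if not has_error:
            last_good, gain = params, 1.0
        else:
            # Reuse the last good parameters and reduce the power step by step.
            gain *= attenuation
        out.append((last_good, gain))
    return out
```

For the sequence good, bad, bad, good, the output gains are 1.0, 0.5, 0.25, and 1.0, with the first frame's parameters reused during the two bad frames.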

[0013] The second restoration method, disclosed in Japanese Unexamined Patent Publication No. H6-12095, uses the code error detection information of the past, current, and future frames and restores the speech of the current frame according to the error detection state of each frame. Because the interpolation also uses information from the future frame, it can be performed with less distortion than when only past-frame information is used.

[0014] A transmission error compensation method that maintains perceived speech quality even when waveform restoration cannot be performed adequately is disclosed in Japanese Unexamined Patent Publication No. H7-36496. Depending on the state of the transmission errors, it outputs a noise signal instead of the reproduced speech. This avoids abnormal sounds even when appropriate waveform restoration is not possible.

[0015]

[Problems to Be Solved by the Invention] In the improved conventional speech coding/decoding method described above, the code vector in the noise codebook corresponding to the noise code is cut out to pitch period length so that its pitch synchronization position coincides with the pitch position obtained from the time-series vector output by the adaptive codebook. A time-series vector is generated by periodically repeating the result and is used to generate the driving excitation vector. If a transmission error occurs in the noise code, the driving excitation vector is affected by the error, and the time-series vector from the adaptive codebook, which is generated from this driving excitation vector, is affected as well, so the pitch position can no longer be obtained correctly.

[0016] Consequently, once a noise code is in error, even if correct noise codes are subsequently transmitted and the correct noise code vectors are used, the pitch position differs from the true one, so the time-series vector generated by pitch synchronization is also incorrect and the driving excitation vector cannot be generated correctly. This in turn prevents the pitch position from being obtained correctly, creating a vicious circle. The effect of a noise code error therefore propagates over a long time, and the reproduced speech degrades severely when a noise code is in error. Even when conventional error-correction coding is used to suppress the degradation of coded speech caused by code errors, the noise code is often excluded from error correction in order to reduce the amount of transmitted information spent on error-correcting codes, so the degradation caused by noise code errors must be kept as small as possible.

[0017] In the pitch synchronization processing, two parameters, the phase and the pitch period length, are used when the code vector in the noise codebook corresponding to the noise code is cut out to pitch period length. If the pitch position is wrong and the phase differs from the true one, the waveform represented by the pitch-period-length code vector differs from the correct one, and the waveform of the time-series vector obtained by periodically repeating it also differs greatly from the correct one, so the reproduced speech degrades severely. FIGS. 15(A) and 15(B) show examples of the time-series vectors generated by pitch synchronization when the code vector and pitch period length are the same and only the pitch position differs.
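
The effect can be illustrated with a small numerical sketch. It assumes, hypothetically, that the cut-out window starts `phase` samples before the code vector's synchronization position; the actual rule of FIG. 14 is not reproduced here. Under that assumption, a wrong phase does not merely shift the repeated waveform but selects a different set of samples from the code vector.

```python
def cut_segment(code_vec, sync_pos, phase, period):
    """Cut one pitch period from code_vec; the window position depends on the phase."""
    start = sync_pos - phase  # assumed phase-dependent cut-out rule
    if start < 0 or start + period > len(code_vec):
        raise ValueError("cut-out window falls outside the code vector")
    return code_vec[start:start + period]


code_vec = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
correct = cut_segment(code_vec, sync_pos=5, phase=2, period=4)  # true pitch position
wrong = cut_segment(code_vec, sync_pos=5, phase=0, period=4)    # mis-detected position
```

Here `correct` is `[3, 4, 5, 6]` while `wrong` is `[5, 6, 7, 8]`: half of the samples in the repeated segment are different, not just reordered.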

[0018] The conventional waveform restoration method for code errors reuses the parameters of past frames, but the quality of the reproduced speech is still low. For example, in a frame at a voiced onset the pitch periodicity is non-stationary, so the pitch parameters transmitted are often not suitable for the stationary voiced part that follows. If a code error occurs in a subsequent frame, reusing those pitch parameters does not yield good reproduced speech.

[0019] Even in unvoiced parts there are frames that are locally periodic. If a code error occurs in an unvoiced frame following such a frame, repeating the parameters produces a signal with a constant period, so the reproduced speech becomes a buzzer-like sound and its quality degrades severely. Also, if the spectrum generated by repeating the previous frame's parameters is inappropriate in a frame with a code error, the reproduced speech quality degrades severely. In particular, when the high-frequency power is large, for example when there is a sharp formant peak in the high band, the abnormal sound in the high band becomes conspicuous and the perceptual degradation is large.

[0020] The conventional waveform restoration method that also uses future-frame information for interpolation has the problem that the restoration increases the delay needed to reproduce the current frame. If the delay needed for speech reproduction becomes large, natural conversation becomes impossible when the method is applied to communications and calls are impaired, so the delay should be as small as possible. The conventional transmission error compensation method, which maintains perceived speech quality even when code errors occur and waveform restoration cannot be performed adequately, switches the output between the reproduced speech and a noise signal. However, the transitions from speech to noise and from noise to speech are perceived as discontinuous, which instead degrades the perceived quality of the reproduced speech. Furthermore, because a noise signal is always output when a transmission error occurs, noise is output even in parts that should be silent, which also causes perceptual degradation.

[0021] The present invention has been made to solve these problems and provides a speech coding method, a speech decoding method, and a speech coding/decoding method that reproduce speech with little quality degradation due to transmission channel errors. The invention also provides a speech decoding method and a speech coding/decoding method that, when a code error occurs, restore the speech of the current frame well without increasing delay and reproduce speech with little quality degradation due to transmission channel errors. The invention further provides a speech decoding method and a speech coding/decoding method that, when a code error occurs, superimpose noise on the reproduced speech to maintain perceived speech quality.

[0022]

[Means for Solving the Problems] To solve the problems described above, the speech coding method of the present invention comprises an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector obtained by periodically repeating a past driving excitation vector in accordance with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code. Based on any of the codes of the plurality of parameters other than the noise code, a pitch position is extracted from the characteristic positions that line up at pitch period intervals in the current frame. A second time-series vector is generated by periodically repeating the code vector, cut to pitch period length, so that the pitch synchronization position of each code vector coincides with the extracted pitch position. The first and second time-series vectors are added to generate a driving excitation vector, a synthesis filter is driven with the driving excitation vector to reproduce a speech signal, and the codes are determined by evaluating the distortion of the reproduced speech signal relative to the input speech.

[0023] The speech decoding method of the next invention comprises an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector obtained by periodically repeating a past driving excitation vector in accordance with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code. Based on any of the codes of the plurality of parameters other than the noise code, a pitch position is extracted from the characteristic positions that line up at pitch period intervals in the current frame. A second time-series vector is generated by periodically repeating the code vector, cut to pitch period length, so that the pitch synchronization position of each code vector coincides with the extracted pitch position. The first and second time-series vectors are added to generate a driving excitation vector, and a synthesis filter is driven with the driving excitation vector to generate output speech.

[0024] In the speech coding/decoding method of the next invention, the coding side comprises an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector obtained by periodically repeating a past driving excitation vector in accordance with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code. Based on any of the codes of the plurality of parameters other than the noise code, the coding side extracts a pitch position from the characteristic positions that line up at pitch period intervals in the current frame, generates a second time-series vector by periodically repeating the code vector, cut to pitch period length, so that the pitch synchronization position of each code vector coincides with the extracted pitch position, adds the first and second time-series vectors to generate a driving excitation vector, drives a synthesis filter with the driving excitation vector to reproduce a speech signal, and determines the codes by evaluating the distortion of the reproduced speech signal relative to the input speech. The decoding side comprises an adaptive codebook and a noise codebook of the same kind. Based on any of the codes of the plurality of parameters other than the noise code, the decoding side extracts a pitch position from the characteristic positions that line up at pitch period intervals in the current frame, generates a second time-series vector by periodically repeating the code vector, cut to pitch period length, so that the pitch synchronization position of each code vector coincides with the extracted pitch position, weights and adds the first and second time-series vectors to generate a driving excitation vector, and drives a synthesis filter with the driving excitation vector to generate output speech.

[0025] The speech coding method of the next invention comprises an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector obtained by periodically repeating a past driving excitation vector in accordance with an adaptive code, and a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position and a cut-out position predetermined in correspondence with the pitch period, and outputs the code vector corresponding to a noise code. A pitch position is extracted from the characteristic positions that line up at pitch period intervals in the current frame. Each code vector is cut out at the cut-out position corresponding to the pitch period, and a second time-series vector is generated by periodically repeating the cut-out code vector so that its pitch synchronization position coincides with the extracted pitch position. The first and second time-series vectors are each weighted and added to generate a driving excitation vector, a synthesis filter is driven with the driving excitation vector to reproduce a speech signal, and the codes are determined by evaluating the distortion of the reproduced speech signal relative to the input speech.
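
One way to read this arrangement is sketched below. The cut-out window is fixed by the pitch period alone, and the pitch position only rotates the repeated segment. The paragraph does not state the cut-out rule, so `cut_start_for`, which centres the window on the synchronization position, is purely a hypothetical stand-in, as are all other names.

```python
def cut_start_for(period, sync_pos):
    """Hypothetical cut-out rule: a window centred on the sync position."""
    return sync_pos - period // 2


def pitch_synchronize_fixed_cut(code_vec, sync_pos, pitch_pos, period, frame_len):
    """Repeat a period-dependent segment so its sync sample lands on pitch_pos."""
    start = cut_start_for(period, sync_pos)
    if start < 0 or start + period > len(code_vec):
        raise ValueError("cut-out window falls outside the code vector")
    segment = code_vec[start:start + period]
    offset = sync_pos - start  # index of the sync sample inside the segment
    # The pitch position only rotates the segment; it never changes its samples.
    return [segment[(n - pitch_pos + offset) % period] for n in range(frame_len)]
```

With `code_vec = list(range(10))`, `sync_pos=5`, and `period=4`, a pitch position of 2 gives `[3, 4, 5, 6, 3, 4, 5, 6]` and a pitch position of 0 gives `[5, 6, 3, 4, 5, 6, 3, 4]`. In this sketch an erroneous pitch position yields a time-shifted copy of the same waveform rather than a different one.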

[0026] Further, in the speech coding method of the next invention, the frame length is an integer multiple of the pitch period.

[0027] Further, in the speech coding method of the next invention, a fixed length serving as the reference for the frame is defined, and the integer multiple of the pitch period is chosen to be the one closest to that fixed length.
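
As a small sketch of this rule, the frame length can be computed as the multiple of the pitch period nearest to the reference length. The function name, the use of sample counts, the minimum of one pitch period, and the tie-break toward the shorter frame are assumptions made for illustration.

```python
def frame_length(pitch_period, nominal_len):
    """Integer multiple of pitch_period closest to nominal_len (at least one period)."""
    k_lo = max(1, nominal_len // pitch_period)
    candidates = (k_lo * pitch_period, (k_lo + 1) * pitch_period)
    # min() returns the first candidate on a tie, i.e. the shorter frame.
    return min(candidates, key=lambda length: abs(length - nominal_len))
```

For a reference length of 160 samples, a pitch period of 45 gives a 180-sample frame (four periods) and a pitch period of 50 gives a 150-sample frame (three periods).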

[0028] Further, in the speech coding method of the next invention, a fixed length serving as the reference for the frame is defined, and the average of the integer multiples of the pitch period is always kept at or above that fixed length.

【００２９】さらに次の発明の音声符号化方法は、フレ
ームの規準となる固定点を定め、ピッチ周期の整数倍長
はフレーム境界がその固定点に最も近いものとするよう
にした。In a further speech encoding method of the invention, a fixed point serving as the reference for the frame is defined, and the integer multiple of the pitch period is chosen so that the frame boundary is closest to that fixed point.

【００３０】また次の発明の音声復号化方法は、過去の
駆動音源ベクトルを記憶し、適応符号に対応して過去の
駆動音源ベクトルを周期的に繰り返した第１の時系列ベ
クトルを出力する適応符号帳と、予め定められたピッチ
同期位置と、予めピッチ周期に対応して定められた切り
出し位置を有する符号ベクトルを複数記憶し、雑音符号
に対応した符号ベクトルを出力する雑音符号帳を備え、
現フレームにおいてピッチ周期間隔で並ぶ特徴位置から
ピッチ位置を抽出し上記各符号ベクトルをピッチ周期に
対応した切り出し位置で切り出し、この切り出した符号
ベクトルのピッチ同期位置が上記抽出されたピッチ位置
に合うように周期的に繰り返した第２の時系列ベクトル
を生成し、上記第１の時系列ベクトルと上記第２の時系
列ベクトルをそれぞれ重み付けして加算して駆動音源ベ
クトルを生成し、上記駆動音源ベクトルで合成フィルタ
を駆動して出力音声を生成する。 A speech decoding method according to another aspect of the invention uses an adaptive codebook that stores past driving excitation vectors and outputs, in accordance with an adaptive code, a first time-series vector obtained by periodically repeating a past driving excitation vector, and a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position and a cut-out position predetermined in accordance with the pitch period, and outputs the code vector corresponding to a noise code. The method extracts a pitch position from the characteristic positions that line up at pitch-period intervals in the current frame; cuts out each code vector at the cut-out position corresponding to the pitch period; generates a second time-series vector by periodically repeating the cut-out code vector so that its pitch synchronization position coincides with the extracted pitch position; generates a driving excitation vector by weighting and adding the first and second time-series vectors; and drives a synthesis filter with the driving excitation vector to generate output speech.

【００３１】さらに次の発明の音声復号化方法は、フレ
ームはピッチ周期の整数倍長であるようにした。In a further speech decoding method of the invention, the frame length is an integer multiple of the pitch period.

【００３２】さらに次の発明の音声復号化方法は、フレ
ームの規準となる固定長を定め、ピッチ周期の整数倍長
はその固定長に最も近いものとするようにした。In a further speech decoding method of the invention, a fixed length serving as the reference for the frame is defined, and the integer multiple of the pitch period is chosen to be the one closest to that fixed length.

【００３３】さらに次の発明の音声復号化方法は、フレ
ームの規準となる固定長を定め、ピッチ周期の整数倍長
の平均は常にその固定長以上とするようにした。In a further speech decoding method of the invention, a fixed length serving as the reference for the frame is defined, and the average of the integer-multiple lengths of the pitch period is always kept at or above that fixed length.

【００３４】さらに次の発明の音声復号化方法は、フレ
ームの規準となる固定点を定め、ピッチ周期の整数倍長
はフレーム境界がその固定点に最も近いものとするよう
にした。In a further speech decoding method of the invention, a fixed point serving as the reference for the frame is defined, and the integer multiple of the pitch period is chosen so that the frame boundary is closest to that fixed point.

【００３５】また次の発明の音声符号化復号化方法は、
符号化側では、過去の駆動音源ベクトルを記憶し、適応
符号に対応して過去の駆動音源ベクトルを周期的に繰り
返した第１の時系列ベクトルを出力する適応符号帳と、
予め定められたピッチ同期位置と、予めピッチ周期に対
応して定められた切り出し位置を有する符号ベクトルを
複数記憶し、雑音符号に対応した符号ベクトルを出力す
る雑音符号帳を備え、現フレームにおいてピッチ周期間
隔で並ぶ特徴位置からピッチ位置を抽出し、上記各符号
ベクトルをピッチ周期に対応した切り出し位置で切り出
し、この切り出した符号ベクトルのピッチ同期位置が上
記抽出されたピッチ位置に合うように周期的に繰り返し
た第２の時系列ベクトルを生成し、上記第１の時系列ベ
クトルと上記第２の時系列ベクトルをそれぞれ重み付け
して加算して駆動音源ベクトルを生成し、上記駆動音源
ベクトルで合成フィルタを駆動して音声信号を再生し、
上記再生した音声信号の入力音声に対する歪を評価して
符号を決定し、復号化側では、過去の駆動音源ベクトル
を記憶し、適応符号に対応して過去の駆動音源ベクトル
を周期的に繰り返した第１の時系列ベクトルを出力する
適応符号帳と、予め定められたピッチ同期位置と、予め
ピッチ周期に対応して定められた切り出し位置を有する
符号ベクトルを複数記憶し、雑音符号に対応した符号ベ
クトルを出力する雑音符号帳を備え、現フレームにおい
てピッチ周期間隔で並ぶ特徴位置からピッチ位置を抽出
し、上記各符号ベクトルをピッチ周期に対応した切り出
し位置で切り出し、この切り出した符号ベクトルのピッ
チ同期位置が上記抽出されたピッチ位置に合うように周
期的に繰り返した第２の時系列ベクトルを生成し、上記
第１の時系列ベクトルと上記第２の時系列ベクトルをそ
れぞれ重み付けして加算して駆動音源ベクトルを生成
し、上記駆動音源ベクトルで合成フィルタを駆動して出
力音声を生成する In a speech encoding/decoding method according to another aspect of the invention, the encoding side has an adaptive codebook that stores past driving excitation vectors and outputs, in accordance with an adaptive code, a first time-series vector obtained by periodically repeating a past driving excitation vector, and a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position and a cut-out position predetermined in accordance with the pitch period, and outputs the code vector corresponding to a noise code. The encoding side extracts a pitch position from the characteristic positions that line up at pitch-period intervals in the current frame; cuts out each code vector at the cut-out position corresponding to the pitch period; generates a second time-series vector by periodically repeating the cut-out code vector so that its pitch synchronization position coincides with the extracted pitch position; generates a driving excitation vector by weighting and adding the first and second time-series vectors; drives a synthesis filter with the driving excitation vector to reproduce a speech signal; and determines the codes by evaluating the distortion of the reproduced speech signal relative to the input speech. The decoding side has an adaptive codebook that stores past driving excitation vectors and outputs, in accordance with the adaptive code, a first time-series vector obtained by periodically repeating a past driving excitation vector, and a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position and a cut-out position predetermined in accordance with the pitch period, and outputs the code vector corresponding to the noise code. The decoding side extracts a pitch position from the characteristic positions that line up at pitch-period intervals in the current frame; cuts out each code vector at the cut-out position corresponding to the pitch period; generates a second time-series vector by periodically repeating the cut-out code vector so that its pitch synchronization position coincides with the extracted pitch position; generates a driving excitation vector by weighting and adding the first and second time-series vectors; and drives a synthesis filter with the driving excitation vector to generate output speech.

【００３６】また次の発明の音声符号化装置は、過去の
駆動音源ベクトルを記憶し、適応符号に対応して過去の
駆動音源ベクトルを周期的に繰り返した第１の時系列ベ
クトルを出力する適応符号帳と、予め定められたピッチ
同期位置を有する符号ベクトルを複数記憶し、雑音符号
に対応した符号ベクトルを出力する雑音符号帳と、上記
雑音符号を除く上記複数のパラメータの符号の何れかを
基に、現フレームにおいてピッチ周期間隔で並ぶ特徴位
置からピッチ位置を抽出するピッチ位置抽出手段と、上
記抽出されたピッチ位置に上記各符号ベクトルのピッチ
同期位置が合うように、ピッチ周期長にした符号ベクト
ルを周期的に繰り返した第２の時系列ベクトルを生成す
るピッチ同期化手段と、上記第１の時系列ベクトルと上
記第２の時系列ベクトルを加算して駆動音源ベクトルを
生成する加算手段と、上記駆動音源ベクトルで音声信号
を再生する合成フィルタと、上記再生した音声信号の入
力音声に対する歪を評価して符号を決定する距離算出手
段とを備える。 A speech encoding apparatus according to another aspect of the invention comprises: an adaptive codebook that stores past driving excitation vectors and outputs, in accordance with an adaptive code, a first time-series vector obtained by periodically repeating a past driving excitation vector; a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position, and outputs the code vector corresponding to a noise code; pitch position extraction means that extracts a pitch position from the characteristic positions that line up at pitch-period intervals in the current frame, on the basis of any of the codes of the plurality of parameters other than the noise code; pitch synchronization means that generates a second time-series vector by periodically repeating a code vector cut to the pitch period length so that the pitch synchronization position of each code vector coincides with the extracted pitch position; addition means that adds the first and second time-series vectors to generate a driving excitation vector; a synthesis filter that reproduces a speech signal from the driving excitation vector; and distance calculation means that determines the codes by evaluating the distortion of the reproduced speech signal relative to the input speech.

【００３７】また次の発明の音声復号化装置は、過去の
駆動音源ベクトルを記憶し、適応符号に対応して過去の
駆動音源ベクトルを周期的に繰り返した第１の時系列ベ
クトルを出力する適応符号帳と、予め定められたピッチ
同期位置を有する符号ベクトルを複数記憶し、雑音符号
に対応した符号ベクトルを出力する雑音符号帳と、上記
雑音符号を除く上記複数のパラメータの符号の何れかを
基に、現フレームにおいてピッチ周期間隔で並ぶ特徴位
置からピッチ位置を抽出するピッチ位置抽出手段と、上
記抽出されたピッチ位置に上記各符号ベクトルのピッチ
同期位置が合うように、ピッチ周期長にした符号ベクト
ルを周期的に繰り返した第２の時系列ベクトルを生成す
るピッチ同期化手段と、上記第１の時系列ベクトルと上
記第２の時系列ベクトルを加算して駆動音源ベクトルを
生成する加算手段と、上記駆動音源ベクトルで出力音声
を生成する合成フィルタとを備える。 A speech decoding apparatus according to another aspect of the invention comprises: an adaptive codebook that stores past driving excitation vectors and outputs, in accordance with an adaptive code, a first time-series vector obtained by periodically repeating a past driving excitation vector; a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position, and outputs the code vector corresponding to a noise code; pitch position extraction means that extracts a pitch position from the characteristic positions that line up at pitch-period intervals in the current frame, on the basis of any of the codes of the plurality of parameters other than the noise code; pitch synchronization means that generates a second time-series vector by periodically repeating a code vector cut to the pitch period length so that the pitch synchronization position of each code vector coincides with the extracted pitch position; addition means that adds the first and second time-series vectors to generate a driving excitation vector; and a synthesis filter that generates output speech from the driving excitation vector.

【００３８】また次の発明の音声符号化装置は、過去の
駆動音源ベクトルを記憶し、適応符号に対応して過去の
駆動音源ベクトルを周期的に繰り返した第１の時系列ベ
クトルを出力する適応符号帳と、予め定められたピッチ
同期位置と、予めピッチ周期に対応して定められた切り
出し位置を有する符号ベクトルを複数記憶し、雑音符号
に対応した符号ベクトルを出力する雑音符号帳と、現フ
レームにおいてピッチ周期間隔で並ぶ特徴位置からピッ
チ位置を抽出するピッチ位置抽出手段と、上記各符号ベ
クトルをピッチ周期に対応した切り出し位置で切り出
し、この切り出した符号ベクトルのピッチ同期位置が上
記抽出されたピッチ位置に合うように周期的に繰り返し
た第２の時系列ベクトルを生成するピッチ同期化手段
と、上記第１の時系列ベクトルと上記第２の時系列ベク
トルを加算して駆動音源ベクトルを生成する加算手段
と、上記駆動音源ベクトルで音声信号を再生する合成フ
ィルタと、上記再生した音声信号の入力音声に対する歪
を評価して符号を決定する距離算出手段とを備える。 A speech encoding apparatus according to another aspect of the invention comprises: an adaptive codebook that stores past driving excitation vectors and outputs, in accordance with an adaptive code, a first time-series vector obtained by periodically repeating a past driving excitation vector; a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position and a cut-out position predetermined in accordance with the pitch period, and outputs the code vector corresponding to a noise code; pitch position extraction means that extracts a pitch position from the characteristic positions that line up at pitch-period intervals in the current frame; pitch synchronization means that cuts out each code vector at the cut-out position corresponding to the pitch period and generates a second time-series vector by periodically repeating the cut-out code vector so that its pitch synchronization position coincides with the extracted pitch position; addition means that adds the first and second time-series vectors to generate a driving excitation vector; a synthesis filter that reproduces a speech signal from the driving excitation vector; and distance calculation means that determines the codes by evaluating the distortion of the reproduced speech signal relative to the input speech.

【００３９】また次の発明の音声復号化装置は、過去の
駆動音源ベクトルを記憶し、適応符号に対応して過去の
駆動音源ベクトルを周期的に繰り返した第１の時系列ベ
クトルを出力する適応符号帳と、予め定められたピッチ
同期位置と、予めピッチ周期に対応して定められた切り
出し位置を有する符号ベクトルを複数記憶し、雑音符号
に対応した符号ベクトルを出力する雑音符号帳と、現フ
レームにおいてピッチ周期間隔で並ぶ特徴位置からピッ
チ位置を抽出するピッチ位置抽出手段と、上記各符号ベ
クトルをピッチ周期に対応した切り出し位置で切り出
し、この切り出した符号ベクトルのピッチ同期位置が上
記抽出されたピッチ位置に合うように周期的に繰り返し
た第２の時系列ベクトルを生成するピッチ同期化手段
と、上記第１の時系列ベクトルと上記第２の時系列ベク
トルをそれぞれ重み付けして加算して駆動音源ベクトル
を生成する加算手段と、上記駆動音源ベクトルで出力音
声を生成する合成フィルタとを備える。 A speech decoding apparatus according to another aspect of the invention comprises: an adaptive codebook that stores past driving excitation vectors and outputs, in accordance with an adaptive code, a first time-series vector obtained by periodically repeating a past driving excitation vector; a noise codebook that stores a plurality of code vectors, each having a predetermined pitch synchronization position and a cut-out position predetermined in accordance with the pitch period, and outputs the code vector corresponding to a noise code; pitch position extraction means that extracts a pitch position from the characteristic positions that line up at pitch-period intervals in the current frame; pitch synchronization means that cuts out each code vector at the cut-out position corresponding to the pitch period and generates a second time-series vector by periodically repeating the cut-out code vector so that its pitch synchronization position coincides with the extracted pitch position; addition means that weights and adds the first and second time-series vectors to generate a driving excitation vector; and a synthesis filter that generates output speech from the driving excitation vector.

【００４０】[0040]

【００４１】[0041]

【００４２】[0042]

【００４３】[0043]

【００４４】[0044]

【発明の実施の形態】以下図面を参照しながら、この発
明の実施の形態について説明する。BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention will be described below with reference to the drawings.

【００４５】実施の形態１．図１３との対応部分に同一
符号を付けて示す図１は、この発明による音声符号化方
法及び音声復号化方法の実施の形態１の全体構成を示
し、図中符号化部１において２４は適応符号帳１１と切
り替えて用いる固定符号帳、２５は固定符号帳２４と合
わせて用いる第２の雑音符号帳、２６はピッチ位置抽出
手段である。また図中復号化部２において２７は適応符
号帳１７と切り替えて用いる固定符号帳、２８は固定符
号帳２７と合わせて用いる第２の雑音符号帳、２９はピ
ッチ位置抽出手段である。Embodiment 1. FIG. 1, in which parts corresponding to those in FIG. 13 bear the same reference numerals, shows the overall configuration of Embodiment 1 of the speech encoding method and speech decoding method according to the invention. In the encoding unit 1 in the figure, 24 is a fixed codebook used by switching with the adaptive codebook 11, 25 is a second noise codebook used together with the fixed codebook 24, and 26 is pitch position extraction means. In the decoding unit 2 in the figure, 27 is a fixed codebook used by switching with the adaptive codebook 17, 28 is a second noise codebook used together with the fixed codebook 27, and 29 is pitch position extraction means.

【００４６】以下、動作を説明する。まず、符号化部１
において、線形予測パラメータ分析手段８は入力音声Ｓ
１を分析し、音声のスペクトル情報である線形予測パラ
メータを抽出する。線形予測パラメータ符号化手段９は
その線形予測パラメータを符号化し、符号化した線形予
測パラメータを合成フィルタ１０の係数として設定す
る。ここで、音源情報の符号化について説明する。音源
情報を符号化して駆動音源ベクトルを生成するには、適
応符号帳１１と雑音符号帳１２とを用いる場合と、固定
符号帳２４と第２の雑音符号帳２５とを用いる場合の２
通りの方法をとり、例えば符号化音声の入力音声との距
離がより小さくなる駆動音源ベクトルを生成する方法を
選択するとして、これら２通りの方法を切り替えて用い
る。ここでは、まず固定符号帳２４と第２の雑音符号帳
２５とを用いる場合について説明し、次に適応符号帳１
１と雑音符号帳１２とを用いる場合について説明する。The operation is described below. First, in the encoding unit 1, the linear prediction parameter analysis means 8 analyzes the input speech S1 and extracts linear prediction parameters, which are the spectral information of the speech. The linear prediction parameter encoding means 9 encodes the linear prediction parameters and sets the encoded linear prediction parameters as the coefficients of the synthesis filter 10. The encoding of the excitation information is described next. Two methods are available for encoding the excitation information and generating the driving excitation vector: one uses the adaptive codebook 11 and the noise codebook 12, and the other uses the fixed codebook 24 and the second noise codebook 25. The two methods are switched between, for example by selecting the one that generates the driving excitation vector for which the distance between the encoded speech and the input speech is smaller. The case that uses the fixed codebook 24 and the second noise codebook 25 is described first, followed by the case that uses the adaptive codebook 11 and the noise codebook 12.

【００４７】固定符号帳２４には、例えばランダム雑音
から生成した複数の時系列ベクトルが記憶されており、
適応符号に対応した時系列ベクトルを出力する。第２の
雑音符号帳２５には、例えばランダム雑音から生成した
複数の時系列ベクトルが記憶されており、雑音符号に対
応した時系列ベクトルを出力する。固定符号帳２４、第
２の雑音符号帳２５からの各時系列ベクトルはゲイン符
号化部１３から与えられるそれぞれのゲインに応じて重
み付けして加算され、その加算結果を駆動音源ベクトル
として合成フィルタ１０へ供給され符号化音声を得る。
距離計算手段１４は符号化音声と入力音声Ｓ１との距離
を求め、距離が最小となる適応符号、雑音符号、ゲイン
を探索する。The fixed codebook 24 stores a plurality of time-series vectors generated, for example, from random noise, and outputs the time-series vector corresponding to the adaptive code. The second noise codebook 25 likewise stores a plurality of time-series vectors generated, for example, from random noise, and outputs the time-series vector corresponding to the noise code. The time-series vectors from the fixed codebook 24 and the second noise codebook 25 are weighted according to the respective gains supplied by the gain encoding unit 13 and added, and the sum is supplied to the synthesis filter 10 as the driving excitation vector to obtain encoded speech. The distance calculation means 14 computes the distance between the encoded speech and the input speech S1 and searches for the adaptive code, noise code, and gain that minimize the distance.
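The search in this paragraph can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the squared-error distance, the all-pole form of the synthesis filter, the exhaustive joint search, and every name used (`synthesize`, `squared_distance`, `search_fixed_mode`, `gain_pairs`) are introduced here for illustration. The text itself only says that the two time-series vectors are weighted by gains, added, passed through the synthesis filter 10, and compared with the input speech S1.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter (assumed form): y[n] = x[n] - sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def squared_distance(a, b):
    """Assumed distance measure: sum of squared sample differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_fixed_mode(target, fixed_cb, noise_cb, gain_pairs, lpc):
    """Return (distance, adaptive_code, noise_code, gain_code) with minimum distance."""
    best = None
    for i, fixed_vec in enumerate(fixed_cb):          # indexed by the adaptive code
        for j, noise_vec in enumerate(noise_cb):      # indexed by the noise code
            for g, (g_fixed, g_noise) in enumerate(gain_pairs):
                # Weight each time-series vector by its gain and add them.
                excitation = [g_fixed * f + g_noise * c
                              for f, c in zip(fixed_vec, noise_vec)]
                d = squared_distance(synthesize(excitation, lpc), target)
                if best is None or d < best[0]:
                    best = (d, i, j, g)
    return best
```

A practical encoder would normally search the codebooks sequentially and apply perceptual weighting; the exhaustive loop above is chosen only because it is the shortest way to show the weighting, addition, synthesis, and distance steps together.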

【００４８】次に、適応符号帳１１と雑音符号帳１２と
を用いる場合について説明する。適応符号帳１１には、
過去の駆動音源ベクトルが記憶されており、適応符号に
対応して過去の駆動音源ベクトルを周期的に繰り返した
時系列ベクトルを出力する。ピッチ位置抽出手段２６
は、前フレームにおいて固定符号帳が選択されている場
合は、前フレームにおいて固定符号帳２４から出力され
た時系列ベクトルから、例えば時系列ベクトルの終端部
分で、現フレームで探索する適応符号に対応した周期長
の範囲において最大振幅をとる点を基準として、適応符
号に対応した周期で得られる点をピッチ位置として抽出
する。Next, the case that uses the adaptive codebook 11 and the noise codebook 12 is described. The adaptive codebook 11 stores past driving excitation vectors and outputs, in accordance with the adaptive code, a time-series vector obtained by periodically repeating a past driving excitation vector. When the fixed codebook was selected in the previous frame, the pitch position extraction means 26 works from the time-series vector output by the fixed codebook 24 in the previous frame: for example, it takes as a reference the point of maximum amplitude within a range at the end of that time-series vector whose length equals the period corresponding to the adaptive code being searched in the current frame, and extracts as pitch positions the points obtained from that reference at the period corresponding to the adaptive code.

【００４９】また、前フレームにおいて適応符号帳が選
択されている場合は、前フレームで求めたピッチ位置を
基準として、現フレームで探索する適応符号に対応した
周期で繰り返した点をピッチ位置として抽出する。図２
にこの実施の形態１において現フレームで探索する適応
符号に対応した周期をＬとしたときのピッチ位置抽出処
理の例を示す。図２（Ａ）は、前フレームにおいて固定
符号帳が選択されている場合、図２（Ｂ）は前フレーム
において適応符号帳が選択されている場合の例である。When the adaptive codebook was selected in the previous frame, the pitch position obtained in the previous frame is taken as the reference, and the points that repeat from it at the period corresponding to the adaptive code being searched in the current frame are extracted as pitch positions. FIG. 2 shows an example of the pitch position extraction processing in Embodiment 1, where L is the period corresponding to the adaptive code being searched in the current frame. FIG. 2(A) is an example in which the fixed codebook was selected in the previous frame, and FIG. 2(B) is an example in which the adaptive codebook was selected in the previous frame.
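The two cases of pitch position extraction can be sketched as below. The indexing convention (sample indices relative to the start of the current frame, so that points in the previous frame are negative), the tie-breaking rule for equal amplitudes, and the function names are assumptions made for this illustration; the text specifies only the maximum-amplitude reference for case (A) and the previous pitch position as the reference for case (B).

```python
def reference_from_fixed_vector(prev_vector, period):
    """Case (A): peak of the last `period` samples of the previous frame's
    fixed-codebook vector. The result is relative to the start of the
    current frame, so it is negative."""
    tail_start = max(0, len(prev_vector) - period)
    tail = prev_vector[tail_start:]
    # Index of the largest absolute amplitude (first one wins on ties).
    peak = max(range(len(tail)), key=lambda i: abs(tail[i]))
    return tail_start + peak - len(prev_vector)


def pitch_positions(reference, period, frame_len):
    """Points reference + k * period (k >= 1) that fall inside the current frame.
    For case (B), `reference` is the pitch position found in the previous frame."""
    if reference >= 0 or period <= 0:
        raise ValueError("reference must lie in the previous frame; period must be positive")
    # For a negative reference, Python's % gives the first non-negative point.
    first = reference % period
    return list(range(first, frame_len, period))
```

Because both functions depend only on the previous frame's data and the period L given by the adaptive code, a decoder can run the same computation without receiving the pitch position explicitly.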

【００５０】雑音符号帳１２には、例えばランダム雑音
から生成した複数の符号ベクトルが記憶されており、雑
音符号に対応した符号ベクトルを出力する。各符号ベク
トルにはピッチ同期位置が設定されており、ピッチ同期
化手段２１は、前記ピッチ位置抽出手段２６で抽出され
たピッチ位置にピッチ同期位置が合うように、符号ベク
トルを切り出しピッチ周期長にし、これを周期的に繰り
返した時系列ベクトルを生成する。適応符号帳１１、ピ
ッチ同期化手段２１からの各時系列ベクトルは、ゲイン
符号化部１３から与えられるそれぞれのゲインに応じて
重み付けして加算され、その加算結果を駆動音源ベクト
ルとして合成フィルタ１０へ供給され符号化音声を得
る。距離計算手段１４は符号化音声と入力音声Ｓ１との
距離を求め、距離が最小となる適応符号、雑音符号、ゲ
インを探索する。The noise codebook 12 stores a plurality of code vectors generated, for example, from random noise, and outputs the code vector corresponding to the noise code. A pitch synchronization position is set in each code vector, and the pitch synchronization means 21 cuts the code vector to the pitch period length and generates a time-series vector by repeating it periodically so that the pitch synchronization position coincides with the pitch position extracted by the pitch position extraction means 26. The time-series vectors from the adaptive codebook 11 and the pitch synchronization means 21 are weighted according to the respective gains supplied by the gain encoding unit 13 and added, and the sum is supplied to the synthesis filter 10 as the driving excitation vector to obtain encoded speech. The distance calculation means 14 computes the distance between the encoded speech and the input speech S1 and searches for the adaptive code, noise code, and gain that minimize the distance.
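A minimal sketch of the pitch synchronization step follows. The parameter `cut_start` (where the pitch-period-long segment begins inside the stored code vector) is an assumption, since this paragraph does not say how the segment is chosen; the function name and the list-based representation are likewise illustrative only.

```python
def pitch_synchronize(code_vector, sync_pos, cut_start, period, pitch_pos, frame_len):
    """Cut `period` samples out of `code_vector` starting at `cut_start`, then
    repeat them so that the sample at `sync_pos` lands on `pitch_pos`."""
    if not (0 <= cut_start <= sync_pos < cut_start + period <= len(code_vector)):
        raise ValueError("the cut-out segment must contain the pitch synchronization position")
    segment = code_vector[cut_start:cut_start + period]
    offset = sync_pos - cut_start  # where the sync position sits inside the segment
    # Sample n of the frame is the segment sample with the same phase relative to pitch_pos.
    return [segment[(n - pitch_pos + offset) % period] for n in range(frame_len)]
```

With `code_vector = list(range(10))`, `sync_pos = 4`, `cut_start = 2`, `period = 4`, and `pitch_pos = 1`, the sample of value 4 appears at frame indices 1 and 5, that is, at the pitch position and one period later.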

【００５１】そして適応符号帳１１と雑音符号帳１２と
を用いる場合と、固定符号帳２４と第２の雑音符号帳２
５とを用いる場合とで、それぞれで選択された適応符
号、雑音符号、ゲインを用いて生成される符号化音声と
入力音声Ｓ１との距離を比較し、距離が小さくなる方の
適応符号、雑音符号、ゲインが選択される。以上符号化
が終了した後、線形予測パラメータの符号、入力音声と
符号化音声との歪みを最小にする適応符号、雑音符号、
ゲインの符号を符号化結果Ｓ２として出力する。以上が
この実施の形態１の音声符号化方法に特徴的な動作であ
る。The distance between the input speech S1 and the encoded speech generated with the adaptive code, noise code, and gain selected in each case is then compared between the case that uses the adaptive codebook 11 and the noise codebook 12 and the case that uses the fixed codebook 24 and the second noise codebook 25, and the adaptive code, noise code, and gain of the case with the smaller distance are selected. After this encoding is complete, the code of the linear prediction parameters and the codes of the adaptive code, noise code, and gain that minimize the distortion between the input speech and the encoded speech are output as the encoding result S2. The above is the operation characteristic of the speech encoding method of Embodiment 1.
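The final selection between the two excitation methods amounts to a comparison of two search results. The sketch below assumes each search returns a dictionary with hypothetical keys `distance`, `adaptive_code`, `noise_code`, and `gain_code`, and that ties go to the adaptive-codebook case; neither the data layout nor the tie rule comes from the text.

```python
def select_excitation_mode(adaptive_result, fixed_result):
    """Keep the search result whose encoded speech is closer to the input speech."""
    if adaptive_result["distance"] <= fixed_result["distance"]:
        return adaptive_result
    return fixed_result


def encoding_result(lpc_code, chosen):
    """Assemble the codes output as the encoding result S2 (layout is hypothetical)."""
    return {
        "lpc_code": lpc_code,
        "adaptive_code": chosen["adaptive_code"],
        "noise_code": chosen["noise_code"],
        "gain_code": chosen["gain_code"],
    }
```

Note that the distance itself is not part of the output: only the codes are transmitted.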

【００５２】次に復号化部２について説明する。復号化
部２では、線形予測パラメータ復号化手段１５は線形予
測パラメータの符号から線形予測パラメータを復号化
し、合成フィルタ１６の係数として設定する。ここで音
源情報の復号化について説明する。音源情報を復号化し
て駆動音源ベクトルを生成するには、適応符号帳１７と
雑音符号帳１８とを用いる場合と、固定符号帳２７と第
２の雑音符号帳２８とを用いる場合の２通りの方法をと
り、適応符号、雑音符号に対応して、符号化する際に用
いられた方法を選択するとして、これら２通りの方法を
切り替えて用いる。Next, the decoding unit 2 is described. In the decoding unit 2, the linear prediction parameter decoding means 15 decodes the linear prediction parameters from their code and sets them as the coefficients of the synthesis filter 16. The decoding of the excitation information is described next. Two methods are available for decoding the excitation information and generating the driving excitation vector: one uses the adaptive codebook 17 and the noise codebook 18, and the other uses the fixed codebook 27 and the second noise codebook 28. The two methods are switched between by selecting, in accordance with the adaptive code and the noise code, the method that was used at encoding.

【００５３】ここでは、まず固定符号帳２７と第２の雑
音符号帳２８とを用いる場合について説明し、次に適応
符号帳１７と雑音符号帳１８とを用いる場合について説
明する。固定符号帳２７と第２の雑音符号帳２８とを用
いる場合、固定符号帳２７は適応符号に対応した時系列
ベクトルを出力し、また、第２の雑音符号帳２８は雑音
符号に対応した時系列ベクトルを出力する。次に、適応
符号帳１７と雑音符号帳１８とを用いる場合について説
明する。適応符号帳１７は、適応符号に対応して、過去
の駆動音源ベクトルを周期的に繰り返した時系列ベクト
ルを出力する。ピッチ位置抽出手段２９は、前フレーム
で固定符号帳が選択されている場合は、前フレームにお
いて固定符号帳２７から出力された時系列ベクトルか
ら、また、前フレームにおいて適応符号帳が選択されて
いる場合は、前フレームで求めたピッチ位置から、符号
化部１のピッチ位置抽出手段２６と同様の方法でピッチ
位置を抽出する。The case that uses the fixed codebook 27 and the second noise codebook 28 is described first, followed by the case that uses the adaptive codebook 17 and the noise codebook 18. When the fixed codebook 27 and the second noise codebook 28 are used, the fixed codebook 27 outputs the time-series vector corresponding to the adaptive code, and the second noise codebook 28 outputs the time-series vector corresponding to the noise code. When the adaptive codebook 17 and the noise codebook 18 are used, the adaptive codebook 17 outputs, in accordance with the adaptive code, a time-series vector obtained by periodically repeating a past driving excitation vector. The pitch position extraction means 29 extracts the pitch position in the same way as the pitch position extraction means 26 of the encoding unit 1: from the time-series vector output by the fixed codebook 27 in the previous frame when the fixed codebook was selected in the previous frame, and from the pitch position obtained in the previous frame when the adaptive codebook was selected in the previous frame.

【００５４】また、雑音符号帳１８は雑音符号に対応し
た符号ベクトルを出力する。符号ベクトルにはピッチ同
期位置が設定されており、ピッチ同期化手段２３は、前
記ピッチ位置抽出手段２９で抽出されたピッチ位置にピ
ッチ同期位置が合うように、符号ベクトルを切り出しピ
ッチ周期長にし、これを周期的に繰り返した時系列ベク
トルを生成する。固定符号帳２７と第２の雑音符号帳２
８からの時系列ベクトル、あるいは適応符号帳１７とピ
ッチ同期化手段２３からの時系列ベクトルは、ゲイン復
号化手段１９でゲインの符号から復号化したそれぞれの
ゲインに応じて重み付けして加算され、その加算結果を
駆動音源ベクトルとして合成フィルタ１６へ供給され出
力音声Ｓ３が得られる。以上がこの実施の形態１の音声
復号化方法に特徴的な動作である。The noise codebook 18 outputs the code vector corresponding to the noise code. A pitch synchronization position is set in each code vector, and the pitch synchronization means 23 cuts the code vector to the pitch period length and generates a time-series vector by repeating it periodically so that the pitch synchronization position coincides with the pitch position extracted by the pitch position extraction means 29. The time-series vectors from the fixed codebook 27 and the second noise codebook 28, or those from the adaptive codebook 17 and the pitch synchronization means 23, are weighted according to the respective gains that the gain decoding means 19 decodes from the gain code and are added; the sum is supplied to the synthesis filter 16 as the driving excitation vector, and the output speech S3 is obtained. The above is the operation characteristic of the speech decoding method of Embodiment 1.

【００５５】この実施の形態１によれば、雑音符号ベク
トルをピッチ同期化する際のピッチ位置を適応符号のみ
にかかわる情報から求め、雑音符号にかかわる情報には
一切無関係に決定することにより、雑音符号に伝送誤り
が発生しても、適応符号が正しく伝送されていればピッ
チ位置は誤りの影響を受けないので、雑音符号の誤りの
影響の時間的波及が小さく、再生音声の劣化を回避し、
伝送誤りによる品質劣化の少ない音声を再生することが
できる。According to Embodiment 1, the pitch position used to pitch-synchronize the noise code vector is obtained from information that involves only the adaptive code and is determined with no dependence on information that involves the noise code. Consequently, even if a transmission error occurs in the noise code, the pitch position is unaffected as long as the adaptive code is transmitted correctly. The effect of a noise code error therefore propagates little over time, degradation of the reproduced speech is avoided, and speech with little quality degradation due to transmission errors can be reproduced.

【００５６】実施の形態２．この発明による音声符号化
方法及び音声復号化方法の実施の形態２におけるピッチ
同期化処理を説明する。雑音符号帳内の各符号ベクトル
には、ピッチ周期に対応して符号ベクトルを切り出す位
置が設定されており、この位置より符号ベクトルを切り
出しピッチ周期長にする。そして、これをピッチ位置に
ピッチ同期位置が合うように周期的に繰り返した時系列
ベクトルを生成する。図３にこの実施の形態２における
ピッチ同期化処理により生成される時系列ベクトルの例
を示す。図中ピッチ周期がＬの場合、符号ベクトルを切
り出す位置はピッチ同期位置からφ（Ｌ）前とし、φ
（Ｌ）は例えばＬ／２とするなどピッチ周期のみから決
定する。図３（Ａ）、（Ｂ）は符号ベクトルとピッチ周
期長は同じで、ピッチ位置のみが異なるもので、ピッチ
周期長の符号ベクトルが表す波形は（Ａ）、（Ｂ）とも
同一であり、これを周期的に繰り返した時系列ベクトル
も同一の波形を時間的にシフトしたものになる。Embodiment 2. The pitch synchronization processing in Embodiment 2 of the speech encoding method and speech decoding method according to the invention is described. For each code vector in the noise codebook, a position at which the code vector is cut out is set in accordance with the pitch period, and the code vector is cut out from this position to the pitch period length. A time-series vector is then generated by repeating this segment periodically so that the pitch synchronization position coincides with the pitch position. FIG. 3 shows an example of the time-series vectors generated by the pitch synchronization processing in Embodiment 2. In the figure, when the pitch period is L, the position at which the code vector is cut out is φ(L) before the pitch synchronization position, and φ(L) is determined from the pitch period alone, for example as L/2. FIGS. 3(A) and 3(B) have the same code vector and pitch period length and differ only in pitch position; the waveform represented by the pitch-period-long code vector is the same in (A) and (B), and the time-series vectors obtained by repeating it periodically are also the same waveform shifted in time.
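The shift property described for FIG. 3 can be checked with a small sketch. It assumes the example rule φ(L) = L/2 from the text, implemented with integer division, and uses illustrative names (`cut_out_offset`, `pitch_synchronize`); the handling of odd L and of code vectors too short for the requested segment is an assumption.

```python
def cut_out_offset(period):
    """phi(L): depends on the pitch period only (example rule L/2, rounded down)."""
    return period // 2


def pitch_synchronize(code_vector, sync_pos, period, pitch_pos, frame_len):
    """Cut out `period` samples starting phi(L) before the sync position and
    repeat them so that the sync position lands on `pitch_pos`."""
    offset = cut_out_offset(period)
    cut_start = sync_pos - offset
    if cut_start < 0 or cut_start + period > len(code_vector):
        raise ValueError("code vector too short for this cut-out position")
    segment = code_vector[cut_start:cut_start + period]
    return [segment[(n - pitch_pos + offset) % period] for n in range(frame_len)]
```

Calling the function twice with the same code vector and period but pitch positions 1 and 2 yields the same repeating waveform, delayed by one sample in the second case. The cut-out segment never changes because `pitch_pos` is not used when selecting it.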

【００５７】この実施の形態２によれば、ピッチ同期化
処理において雑音符号に対応した雑音符号帳内の符号ベ
クトルを切り出しピッチ周期長にする際、ピッチ周期長
のみをパラメータとして用い、ピッチ位置には一切無関
係に決定するので、ピッチ位置が誤っている場合でもピ
ッチ周期長の符号ベクトルが表す波形は本来の正しいも
のと同一であり、これを周期的に繰り返した時系列ベク
トルの波形は本来の正しいものを時間的にシフトしたも
のになるので、ピッチ位置の誤りによる聴感上の劣化は
小さく、再生音声の劣化を回避し、伝送誤りによる品質
劣化の少ない音声を再生することができる。According to Embodiment 2, when the code vector in the noise codebook corresponding to the noise code is cut to the pitch period length in the pitch synchronization processing, only the pitch period length is used as a parameter, and the cut-out is determined with no dependence on the pitch position. Consequently, even when the pitch position is wrong, the waveform represented by the pitch-period-long code vector is identical to the correct one, and the waveform of the time-series vector obtained by repeating it periodically is the correct waveform shifted in time. The audible degradation caused by a pitch position error is therefore small, degradation of the reproduced speech is avoided, and speech with little quality degradation due to transmission errors can be reproduced.

【００５８】実施の形態３．上述の実施の形態２では、
雑音符号帳内の符号ベクトルを切り出す位置はピッチ同
期位置からφ（Ｌ）前とし、φ（Ｌ）は例えばＬ／２と
するなどピッチ周期によらず同一の関数を用いて決定し
ていたが、これに代え、ピッチ周期毎に符号ベクトルを
切り出す位置を決定する関数を変更しても良い。また雑
音符号帳内の符号ベクトル毎に、ピッチ周期毎に符号ベ
クトルを切り出す位置を決定する関数を変更しても良
い。この実施の形態３によれば、符号ベクトルを切り出
す位置を、ピッチ周期毎あるいは符号ベクトル毎に自由
に設定できるので、この設定を再生音声の品質が良くな
るように学習、獲得することにより、伝送誤りがない場
合であっても合成音声の品質を向上することができる。Embodiment 3. In Embodiment 2 described above, the position at which a code vector in the noise codebook is cut out is φ(L) before the pitch synchronization position, and φ(L) is determined with the same function regardless of the pitch period, for example L/2. Instead, the function that determines the cut-out position may be changed for each pitch period. The function that determines the cut-out position for each pitch period may also be changed for each code vector in the noise codebook. According to Embodiment 3, the cut-out position can be set freely for each pitch period or for each code vector, so by training this setting to improve the quality of the reproduced speech, the quality of the synthesized speech can be improved even when there are no transmission errors.
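One way to picture the freedom described in Embodiment 3 is a lookup table of cut-out offsets. The table layout, the fallback order, and the names below are assumptions for illustration only; the text does not say how the trained settings are stored.

```python
def cut_out_offset(period, noise_code, per_vector, per_period):
    """Return the cut-out offset (distance before the pitch synchronization position).

    per_vector: {(noise_code, period): offset}, settings specific to one code vector
    per_period: {period: offset}, settings shared by all code vectors
    Falls back to the Embodiment 2 example rule L/2 when nothing is stored.
    """
    if (noise_code, period) in per_vector:
        return per_vector[(noise_code, period)]
    if period in per_period:
        return per_period[period]
    return period // 2
```

Whatever table is used, it must be identical in the encoder and the decoder, and the offset still depends only on the pitch period and the noise code, never on the pitch position.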

【００５９】実施の形態４．上述の実施の形態２、３で
は、ピッチ周期によらず同一の雑音符号帳内の符号ベク
トルを用いているが、これに代え、雑音符号帳を複数個
備え、ピッチ周期に応じて用いる雑音符号帳を切り替え
るとしても良い。この実施の形態４によれば、ピッチ周
期に応じて異なる雑音符号帳を用いることができるの
で、従来１つの雑音符号帳を用いていた場合と比較して
生成できる時系列ベクトルの自由度が増し、伝送誤りが
ない場合であっても合成音声の品質を向上することがで
きる。Embodiment 4. In Embodiments 2 and 3 described above, code vectors from the same noise codebook are used regardless of the pitch period. Instead, a plurality of noise codebooks may be provided and the noise codebook to be used may be switched according to the pitch period. According to Embodiment 4, different noise codebooks can be used depending on the pitch period, so the time-series vectors that can be generated have more degrees of freedom than when a single noise codebook is used as in the conventional approach, and the quality of the synthesized speech can be improved even when there are no transmission errors.

【００６０】実施の形態５．上述の実施の形態４でさら
に、全てのピッチ周期長毎に異なる雑音符号帳を備え、
各ピッチ周期長に対応した雑音符号帳はそのピッチ周期
長の符号ベクトルから構成されるとしても良い。この実
施の形態５によれば、各ピッチ周期に対応する雑音符号
帳内の符号ベクトルは既にピッチ周期長であるので、ピ
ッチ同期化処理において符号ベクトルを切り出し、ピッ
チ周期長にする処理を省くことができる。Embodiment 5. In Embodiment 4 described above, a different noise codebook may further be provided for every pitch period length, with the noise codebook for each pitch period length composed of code vectors of that length. According to Embodiment 5, the code vectors in the noise codebook for each pitch period already have the pitch period length, so the step of cutting out the code vector to the pitch period length can be omitted from the pitch synchronization processing.
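With one noise codebook per pitch period length, pitch synchronization reduces to a periodic repetition with the right phase. The sketch below assumes the codebooks are held in a dictionary keyed by the pitch period and that the pitch synchronization position is given as an index into the stored vector; both are illustrative choices, not details from the text.

```python
def pitch_synchronize_per_period(codebooks, period, noise_code, sync_pos, pitch_pos, frame_len):
    """codebooks: {period: [code_vector, ...]}, every vector already `period` samples long."""
    segment = codebooks[period][noise_code]
    if len(segment) != period:
        raise ValueError("code vector length must equal the pitch period")
    # No cut-out step: repeat the stored vector so that sync_pos lands on pitch_pos.
    return [segment[(n - pitch_pos + sync_pos) % period] for n in range(frame_len)]
```

The price of skipping the cut-out step is memory: a separate codebook has to be stored for every pitch period length the codec supports.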

【００６１】実施の形態６．上述の実施の形態２では、
駆動音源ベクトルを生成する単位であるフレームを固定
長としているが、これに代え、フレームをピッチ周期長
に応じて可変長とし、フレームがピッチ周期の整数倍長
となるようにしても良い。図４はこの実施の形態６にお
けるフレームをピッチの整数倍長とした場合の雑音符号
ベクトルのピッチ同期化処理により生成される時系列ベ
クトルの例を示す。図４（Ａ）、（Ｂ）は符号ベクトル
とピッチ周期長は同じで、ピッチ位置のみが異なるもの
で、生成される時系列ベクトルは同一の波形であり、ピ
ッチ位置の差異はフレームの境界が時間的にずれること
に現れる。Embodiment 6. In Embodiment 2 described above, the frame, which is the unit in which the driving excitation vector is generated, has a fixed length. Instead, the frame may be made variable in length according to the pitch period length so that the frame length is an integer multiple of the pitch period. FIG. 4 shows an example of the time-series vectors generated by pitch synchronization of the noise code vector when the frame length in Embodiment 6 is an integer multiple of the pitch period. FIGS. 4(A) and 4(B) have the same code vector and pitch period length and differ only in pitch position; the generated time-series vectors are the same waveform, and the difference in pitch position appears as a time shift of the frame boundary.

【００６２】この実施の形態６によれば、駆動音源ベク
トルを生成する単位であるフレームをピッチ周期長に応
じて可変長とし、フレームがピッチ周期の整数倍長とな
るようにしたので、従来ピッチ位置が誤っている場合は
そのフレーム境界部分で雑音符号帳から生成される時系
列ベクトルの波形が本来の正しいものと異なっていたと
いう問題を解消でき、生成される雑音符号帳からの時系
列ベクトルは本来の正しいものと全く同一の波形にな
り、ピッチ位置の誤りの影響はフレームが時間的にシフ
トするだけになるので、ピッチ位置の誤りによる聴感上
の劣化は小さく、再生音声の劣化を回避し、伝送誤りに
よる品質劣化の少ない音声を再生することができる。According to Embodiment 6, the frame, which is the unit for generating the excitation vector, has a variable length that depends on the pitch period length and is an integer multiple of the pitch period. This solves the conventional problem that, when the pitch position is wrong, the waveform of the time-series vector generated from the noise codebook differs from the correct one near the frame boundary. The time-series vector generated from the noise codebook has exactly the same waveform as the correct one, and a pitch position error merely shifts the frame in time. The perceptual degradation caused by a pitch position error is therefore small, degradation of the reproduced speech is avoided, and speech with little quality degradation due to transmission errors can be reproduced.
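The difference between fixed-length and pitch-multiple frames can be sketched as follows. This is an illustrative reading of FIG. 4, not the patented implementation. The function names are hypothetical, and the sketch assumes that a variable-length frame starts at the pitch position.

```python
def fixed_frame_vector(code_vec, frame_len, pitch_pos):
    """Fixed-length frame: code_vec[0] is aligned to pitch_pos, then repeated."""
    period = len(code_vec)
    return [code_vec[(n - pitch_pos) % period] for n in range(frame_len)]

def variable_frame_vector(code_vec, multiple, pitch_pos):
    """Variable-length frame spanning whole pitch periods.

    Returns (start_time, samples); pitch_pos only moves the frame start.
    """
    return pitch_pos, code_vec * multiple

code_vec = [1.0, 0.5, -0.5, -1.0]  # one pitch period (L = 4)

# Fixed frame: a wrong pitch position changes the samples inside the frame.
good = fixed_frame_vector(code_vec, 10, pitch_pos=0)
bad = fixed_frame_vector(code_vec, 10, pitch_pos=1)

# Variable frame: the samples are identical, only the start time differs.
start_a, wave_a = variable_frame_vector(code_vec, 3, pitch_pos=0)
start_b, wave_b = variable_frame_vector(code_vec, 3, pitch_pos=1)
```

In the fixed-length case the two vectors differ sample by sample, whereas in the variable-length case the pitch position error shows up only in the start time.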

【００６３】実施の形態７．上述の実施の形態６でさら
に、ある固定長をフレームの規準として定め、ピッチ周
期の整数倍長である可変長のフレームは固定長に最も近
いものとなるように整数倍とする値を決定するとしても
良い。図５にフレーム周期の規準となる固定長をＮ、ピ
ッチ周期をＬとしたときのフレームの例を示す。この実
施の形態７によれば、符号化を行う単位であるフレーム
長がある程度一定の値に保たれるので、単位時間におけ
る符号化情報量（ビットレート）を安定にでき、同一の
時間長の音声であれば、ピッチ周期長によらずほぼ同一
の情報量で符号化することができる。Embodiment 7. In Embodiment 6 above, a certain fixed length may further be defined as the reference for the frame, and the integer multiplier may be chosen so that the variable-length frame, which is an integer multiple of the pitch period, is as close as possible to that fixed length. FIG. 5 shows an example of frames where N is the fixed length serving as the frame reference and L is the pitch period. According to Embodiment 7, the frame length, which is the unit of encoding, is kept roughly constant, so the amount of encoded information per unit time (bit rate) can be stabilized, and speech of the same duration can be encoded with almost the same amount of information regardless of the pitch period length.
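The choice of the multiplier in Embodiment 7 can be sketched as below, with N and L as in FIG. 5. The function name, the tie-breaking rule (prefer the smaller multiplier), and the sample values are assumptions for illustration.

```python
def closest_multiple(fixed_len, pitch):
    """Return k >= 1 such that k * pitch is closest to fixed_len (ties: smaller k)."""
    k_low = max(1, fixed_len // pitch)
    return min((k_low, k_low + 1), key=lambda k: abs(k * pitch - fixed_len))

N = 160  # reference frame length in samples (illustrative value)
frame_lengths = [closest_multiple(N, L) * L for L in (45, 50, 100, 200)]
# frame_lengths is [180, 150, 200, 200]
```

Every frame stays within half a pitch period of N unless the pitch period itself exceeds N, in which case a single period is used.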

【００６４】実施の形態８．上述の実施の形態６でさら
に、ある固定長をフレームの規準として定め、ピッチ周
期の整数倍長である可変長のフレームの平均は常に固定
長以上となるように整数倍とする値を決定するとしても
良い。図６にフレーム周期の規準となる固定長をＮ、ピ
ッチ周期をＬとしたときのフレームの例を示す。この実
施の形態８によれば、あるフレーム数の可変長のフレー
ムで符復号化される音声の時間長が、必ず同フレーム数
の固定長のフレームで符復号化される音声の時間長以上
になるので、通信に適用する際復号化側の遅延を小さく
することができる。また符号化を行う単位であるフレー
ムの周期の平均が常にある一定の値以上に保たれるの
で、単位時間における符号化情報量（ビットレート）の
最大値を規定でき、同一の時間長の音声であれば、ピッ
チ周期長によらず規定されるビットレートの最大値によ
って決まる情報量以下で符号化することができる。Embodiment 8. In Embodiment 6 above, a certain fixed length may further be defined as the reference for the frame, and the integer multiplier may be chosen so that the average length of the variable-length frames, each an integer multiple of the pitch period, is always at least that fixed length. FIG. 6 shows an example of frames where N is the fixed length serving as the frame reference and L is the pitch period. According to Embodiment 8, the duration of speech encoded and decoded with a given number of variable-length frames is always at least the duration encoded and decoded with the same number of fixed-length frames, so the delay on the decoding side can be reduced when the method is applied to communication. In addition, because the average frame period is always kept at or above a certain value, the maximum amount of encoded information per unit time (bit rate) can be specified, and speech of the same duration can be encoded with no more than the amount of information determined by that maximum bit rate, regardless of the pitch period length.
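One way to satisfy the condition of Embodiment 8 is to pick, for each frame, the smallest multiplier that keeps the running average at or above N. The specification only states the condition on the average; taking the smallest such multiplier, and the names and values below, are assumptions of this sketch.

```python
def multiple_for_average(fixed_len, pitch, total_len, frame_count):
    """Smallest k >= 1 keeping the average of frame_count + 1 frames >= fixed_len."""
    needed = fixed_len * (frame_count + 1) - total_len
    return max(1, -(-needed // pitch))  # ceiling division, at least one period

def frame_lengths(fixed_len, pitches):
    """Frame length chosen for each pitch period in sequence."""
    total, lengths = 0, []
    for count, pitch in enumerate(pitches):
        k = multiple_for_average(fixed_len, pitch, total, count)
        lengths.append(k * pitch)
        total += k * pitch
    return lengths

lengths = frame_lengths(160, [45, 50, 100, 70])
# lengths is [180, 150, 200, 140]; the running averages are 180, 165, 176.7, 167.5
```

An individual frame may be shorter than N (150 and 140 above) as long as earlier frames have built up enough surplus.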

【００６５】実施の形態９．上述の実施の形態６でさら
に、ピッチ周期の整数倍長である可変長のフレームは、
ある固定点をフレーム周期の規準として定め、フレーム
の境界が固定点に最も近いものとなるように整数倍とす
る値を決定するとしても良い。図７にフレーム周期の規
準となる固定点を周期Ｎ毎の点とし、ピッチ周期をＬと
したときのフレームの例を示す。この実施の形態８によ
れば、符号化を行う単位であるフレームの周期がある程
度一定の値に保たれるので、単位時間における符号化情
報量（ビットレート）を安定にでき、同一の時間長の音
声であれば、ピッチ周期長によらずほぼ同一の情報量で
符号化することができる。Embodiment 9. In Embodiment 6 above, fixed points may further be defined as the reference for the frame period, and the integer multiplier for the variable-length frame, which is an integer multiple of the pitch period, may be chosen so that the frame boundary is as close as possible to a fixed point. FIG. 7 shows an example of frames where the fixed points serving as the frame period reference are the points at every period N and L is the pitch period. According to Embodiment 9, the frame period, which is the unit of encoding, is kept roughly constant, so the amount of encoded information per unit time (bit rate) can be stabilized, and speech of the same duration can be encoded with almost the same amount of information regardless of the pitch period length.
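The boundary rule of Embodiment 9 can be sketched in the same style. The sketch assumes that the end of the j-th frame is matched against the fixed point at time (j + 1) * N. That pairing, the tie-breaking rule, and the function names are illustrative assumptions.

```python
def boundary_multiple(frame_start, pitch, fixed_point):
    """Return k >= 1 whose frame end is closest to fixed_point (ties: smaller k)."""
    k_low = max(1, (fixed_point - frame_start) // pitch)
    return min((k_low, k_low + 1),
               key=lambda k: abs(frame_start + k * pitch - fixed_point))

def frame_boundaries(period, pitches):
    """End time of each frame when boundaries track the fixed points."""
    start, ends = 0, []
    for index, pitch in enumerate(pitches):
        target = (index + 1) * period  # fixed point for this frame's end
        start += boundary_multiple(start, pitch, target) * pitch
        ends.append(start)
    return ends

ends = frame_boundaries(160, [45, 50, 100, 70])
# ends is [180, 330, 430, 640], against fixed points 160, 320, 480, 640
```

Unlike Embodiment 7, the deviation does not accumulate, because each boundary is measured against an absolute fixed point rather than against the previous boundary.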

【００６６】実施の形態１０．上述の実施の形態６〜９
では、符号化、復号化するフレームがどの時点の入力音
声を符号化したものであるかという時間情報は一切伝送
されていないため、復号化側においては復号するフレー
ムが入力音声のどの時点に対するものであるかは復号化
を開始した時点から該フレームまでの可変であるフレー
ム長の積算によってのみ求めることができる。このた
め、一度でも伝送誤りによりフレーム長を誤って復号し
た場合にはそれ以降復号化側で生成される再生音声は入
力音声と時間的にずれが生じ、このずれを解消すること
ができないという問題があった。そこで、時間情報を一
切伝送しないことに代え、少なくとも間欠的に、例えば
該フレームを符号化する時点の符号化開始時点からの時
間差を伝送するなど、時間情報を伝送するようにしても
良い。Embodiment 10. In Embodiments 6 to 9 above, no time information indicating which point of the input speech a frame encodes is transmitted. On the decoding side, the point in the input speech to which a decoded frame corresponds can therefore be obtained only by accumulating the variable frame lengths from the start of decoding up to that frame. Consequently, if the frame length is decoded incorrectly even once because of a transmission error, the reproduced speech generated on the decoding side from then on is shifted in time relative to the input speech, and this shift cannot be eliminated. Therefore, instead of transmitting no time information at all, time information may be transmitted at least intermittently, for example as the time difference between the start of encoding and the point at which the frame is encoded.

【００６７】この実施の形態１０によれば、少なくとも
間欠的に時間情報が伝送されるので、伝送誤りによりフ
レーム長を誤って復号して復号側で生成される再生音声
が入力音声と時間的にずれが生じた場合でも、伝送され
る時間情報を用いることによりその時間的なずれを補償
し、解消することができるので、伝送誤りによる品質劣
化の少ない音声を再生することができる。According to Embodiment 10, time information is transmitted at least intermittently. Even if a frame length is decoded incorrectly because of a transmission error and the reproduced speech generated on the decoding side becomes shifted in time relative to the input speech, the transmitted time information can be used to compensate for and eliminate the shift, so speech with little quality degradation due to transmission errors can be reproduced.
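The resynchronization described for Embodiment 10 can be sketched as follows. The frame representation (a decoded length plus an optional timestamp measured from the start of encoding) is a hypothetical format chosen for illustration. The specification does not define how the time information is coded.

```python
def frame_start_times(frames):
    """frames: list of (decoded_frame_len, timestamp_or_None) in decoding order."""
    position, starts = 0, []
    for frame_len, timestamp in frames:
        if timestamp is not None:
            position = timestamp  # resynchronize to the transmitted time
        starts.append(position)
        position += frame_len
    return starts

# The third frame length (true value 200) is corrupted to 999 by a transmission
# error; the fourth frame carries a timestamp and restores the alignment.
frames = [(180, None), (150, None), (999, None), (140, 530), (160, None)]
starts = frame_start_times(frames)
# starts is [0, 180, 330, 530, 670]
```

Without the timestamp the fourth frame would be placed at 1329 instead of 530, and every later frame would inherit the same offset.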

【００６８】実施の形態１１．図１との対応部分に同一
符号を付けた図８は、この発明による音声復号化方法の
実施の形態１１の構成を示し、図中３０は符号の伝送誤
りが検出された場合に線形予測パラメータを修復する線
形予測パラメータ修復手段、３１は再生音声を分析して
前フレームが有声の立ち上がりか否かを判定する立ち上
がり判定手段、３２は再生音声を分析しピッチを求める
ピッチ抽出手段、３３は符号の伝送誤りが検出された場
合に適応符号を修復する適応符号修復手段である。また
３４は再生音声を分析して前フレームが無声か否かを判
定する無声判定手段、３５は符号の伝送誤りが検出され
た場合にゲインを修復するゲイン修復手段、３６は符号
の伝送誤り検出状況に応じて再生音声のパワーを制御す
る再生音声パワー制御手段である。さらに３７は符号の
伝送誤りが検出された場合に現フレームの音声の状態を
推定する音声状態推定手段、３８は雑音信号を生成する
雑音生成手段、３９は符号の伝送誤り検出状況に応じて
雑音信号のパワーを制御する雑音パワー制御手段であ
る。Embodiment 11. FIG. 8, in which parts corresponding to those in FIG. 1 bear the same reference numerals, shows the configuration of Embodiment 11 of the speech decoding method according to the present invention. In the figure, 30 is linear prediction parameter restoration means that restores the linear prediction parameters when a code transmission error is detected; 31 is onset determination means that analyzes the reproduced speech to determine whether the previous frame is a voiced onset; 32 is pitch extraction means that analyzes the reproduced speech to obtain the pitch; and 33 is adaptive code restoration means that restores the adaptive code when a code transmission error is detected. Further, 34 is unvoiced determination means that analyzes the reproduced speech to determine whether the previous frame is unvoiced; 35 is gain restoration means that restores the gains when a code transmission error is detected; and 36 is reproduced speech power control means that controls the power of the reproduced speech according to the code transmission error detection status. Finally, 37 is speech state estimation means that estimates the speech state of the current frame when a code transmission error is detected; 38 is noise generation means that generates a noise signal; and 39 is noise power control means that controls the power of the noise signal according to the code transmission error detection status.

【００６９】復号化部２において、線形予測パラメータ
復号化手段１５は線形予測パラメータの符号から線形予
測パラメータを復号化する。線形予測パラメータ修復手
段３０は、符号の伝送誤りが検出されていないフレーム
では復号化された線形予測パラメータを、伝送誤りが検
出されたフレームでは前フレームで復号化された線形予
測パラメータから、低域のホルマント構造を保ち、高域
のホルマントピークを抑制したスペクトルとなるような
線形予測パラメータを、例えば線形予測パラメータを線
スペクトル対パラメータとして表したときに、その低域
のパラメータのみを用いるとして求め、合成フィルタ１
６の係数として設定する。図９に前フレームの線形予測
パラメータが表すスペクトル包絡と、線形予測パラメー
タ修復処理によりその低域の構造を保ち高域を抑制した
スペクトル包絡の例を示す。In the decoding section 2, the linear prediction parameter decoding means 15 decodes the linear prediction parameters from their code. The linear prediction parameter restoration means 30 sets the coefficients of the synthesis filter 16 as follows. In a frame in which no code transmission error is detected, it uses the decoded linear prediction parameters. In a frame in which a transmission error is detected, it derives, from the linear prediction parameters decoded in the previous frame, linear prediction parameters whose spectrum preserves the low-band formant structure while suppressing the high-band formant peaks, for example by expressing the linear prediction parameters as line spectrum pair parameters and using only the low-band parameters. FIG. 9 shows an example of the spectral envelope represented by the linear prediction parameters of the previous frame and the spectral envelope obtained by the linear prediction parameter restoration processing, which preserves the low-band structure and suppresses the high band.

【００７０】立ち上がり判定手段３１は符号の伝送誤り
が検出されたフレームで、前フレームの再生音声を分析
して、前フレームが有声の立ち上がりであるか否かを判
定し、判定結果をピッチ分析手段３２、適応符号修復手
段３３に出力する。ここで、有声立ち上がりか否かの判
定は、例えば前フレームにおいて、フレーム後半部の音
声のパワーがフレーム前半部のそれと比較してある閾値
以上大きい場合には有声の立ち上がりとする。ピッチ抽
出手段３２は、符号の伝送誤りが検出されたフレーム
で、前フレームが有声立ち上がりと判定された場合に、
過去の再生音声を分析して少なくとも１つ以上のピッチ
周期の候補を抽出し、適応符号修復手段３３に出力す
る。In a frame in which a code transmission error is detected, the onset determination means 31 analyzes the reproduced speech of the previous frame, determines whether the previous frame is a voiced onset, and outputs the result to the pitch extraction means 32 and the adaptive code restoration means 33. For example, the previous frame is judged to be a voiced onset when the speech power in its second half exceeds that in its first half by at least a certain threshold. In a frame in which a code transmission error is detected and the previous frame has been judged to be a voiced onset, the pitch extraction means 32 analyzes the past reproduced speech, extracts one or more pitch period candidates, and outputs them to the adaptive code restoration means 33.
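The onset test given as an example above can be sketched as a comparison of half-frame powers. Expressing the threshold as a power ratio, the default value of 4.0, and the function name are assumptions. The specification only says that the second-half power exceeds the first-half power by some threshold. The frame must contain at least two samples.

```python
def is_voiced_onset(frame, ratio_threshold=4.0):
    """True if the mean power of the second half dominates that of the first half."""
    half = len(frame) // 2
    first = sum(x * x for x in frame[:half]) / half
    second = sum(x * x for x in frame[half:]) / (len(frame) - half)
    # An all-zero frame is never an onset.
    return second > 0.0 and second >= ratio_threshold * first

rising = [0.01] * 80 + [1.0] * 80   # quiet first half, loud second half
steady = [0.5] * 160                # constant level
```

Here `is_voiced_onset(rising)` holds and `is_voiced_onset(steady)` does not.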

【００７１】適応符号修復手段３３は、符号の伝送誤り
が検出されていないフレームでは、分離手段４から入力
される適応符号を、符号誤りが検出されても前フレーム
が有声立ち上がりではない場合には、前フレームの適応
符号を、符号誤りが検出され、かつ前フレームが有声立
ち上がりの場合には、前記ピッチ抽出手段３２から入力
されたピッチ周期の候補の中から、例えば過去に正しく
伝送された適応符号の出現頻度分布と比較し、出現頻度
の高い符号に対応するピッチ周期に最も近いものを選択
し、この選択されたピッチ周期に対応する適応符号を適
応符号帳１７に出力する。適応符号帳１７は適応符号に
対応して、過去の駆動音源ベクトルを周期的に繰り返し
た時系列ベクトルを出力する。図１０（Ａ）にこの実施
の形態１１における適応符号修復処理の例を、また図１
０（Ｂ）に従来の前フレームの適応符号を繰り返して用
いる修復処理の例を示す。図に示すように、前フレーム
の適応符号に対応するピッチ周期が現フレームに不適な
場合でも、過去のフレームの情報より現フレームに適し
たピッチ周期を求めることにより、良好な波形修復処理
が可能であることが分かる。The adaptive code restoration means 33 outputs an adaptive code to the adaptive codebook 17 as follows. In a frame in which no code transmission error is detected, it outputs the adaptive code input from the separating means 4. When a code error is detected but the previous frame is not a voiced onset, it outputs the adaptive code of the previous frame. When a code error is detected and the previous frame is a voiced onset, it selects one of the pitch period candidates input from the pitch extraction means 32, for example by comparing them with the occurrence frequency distribution of adaptive codes correctly received in the past and choosing the candidate closest to the pitch period of a frequently occurring code, and outputs the adaptive code corresponding to the selected pitch period. The adaptive codebook 17 outputs a time-series vector obtained by periodically repeating the past excitation vector in accordance with the adaptive code. FIG. 10(A) shows an example of the adaptive code restoration processing in Embodiment 11, and FIG. 10(B) shows an example of the conventional restoration processing that reuses the adaptive code of the previous frame. As the figures show, even when the pitch period corresponding to the adaptive code of the previous frame is unsuitable for the current frame, good waveform restoration is possible by obtaining a pitch period suited to the current frame from the information of past frames.
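The three-way selection performed by the adaptive code restoration means can be sketched as below. For simplicity the sketch works directly on pitch periods and assumes a one-to-one mapping between adaptive codes and pitch periods. It also takes the single most frequent past period as the reference and falls back to the previous value when no candidates or history exist. These choices and all names are illustrative assumptions.

```python
from collections import Counter

def restore_pitch(error, prev_is_onset, received, previous, candidates, history):
    """Choose the pitch period to feed to the adaptive codebook for this frame."""
    if not error:
        return received                      # no error: use what was received
    if not prev_is_onset or not candidates or not history:
        return previous                      # error, no onset: repeat previous frame
    typical = Counter(history).most_common(1)[0][0]
    return min(candidates, key=lambda p: abs(p - typical))

history = [40, 41, 40, 80, 40]   # pitch periods received correctly in the past
candidates = [20, 39, 78]        # candidates extracted from past reproduced speech
chosen = restore_pitch(True, True, None, 20, candidates, history)
# chosen is 39: the candidate nearest to the most frequent past period (40)
```

Choosing against the history steers the decoder away from candidates at half or double the true period, which the previous-frame value at an onset may not rule out.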

【００７２】雑音符号帳１８は雑音符号に対応した時系
列ベクトルを出力する。ゲイン復号化手段１９はゲイン
の符号から、前記適応符号帳１７及び雑音符号帳１８よ
り出力された時系列ベクトルに対するゲインを復号化す
る。無声判定手段３４は、符号の伝送誤りが検出された
フレームで、前フレームの再生音声を分析して、前フレ
ームが無声であるか否かを判定し、判定結果をゲイン修
復手段３５に出力する。ゲイン修復手段３５は、符号の
伝送誤りが検出されていないフレームでは、ゲイン復号
化手段１９から入力されたゲインを、符号誤りが検出さ
れ、かつ、前フレームが無声でない場合は前フレームの
ゲインを、符号誤りが検出され、かつ、前フレームが無
声の場合は、前フレームのゲインに対し、適応符号帳か
らの時系列ベクトルに対するゲインをα倍、雑音符号帳
からの時系列ベクトルに対するゲインをβ倍して出力す
る。ここでα、βは、例えば、０≦α＜β≦１とする。The noise codebook 18 outputs a time-series vector corresponding to the noise code. The gain decoding means 19 decodes, from the gain code, the gains for the time-series vectors output from the adaptive codebook 17 and the noise codebook 18. In a frame in which a code transmission error is detected, the unvoiced determination means 34 analyzes the reproduced speech of the previous frame, determines whether the previous frame is unvoiced, and outputs the result to the gain restoration means 35. The gain restoration means 35 outputs gains as follows. In a frame in which no code transmission error is detected, it outputs the gains input from the gain decoding means 19. When a code error is detected and the previous frame is not unvoiced, it outputs the gains of the previous frame. When a code error is detected and the previous frame is unvoiced, it outputs the gains of the previous frame with the gain for the adaptive codebook time-series vector multiplied by α and the gain for the noise codebook time-series vector multiplied by β, where, for example, 0 ≤ α < β ≤ 1.
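The gain restoration rule can be written out as a small sketch. The function name and the default values of α and β are assumptions. The specification constrains them only by 0 ≤ α < β ≤ 1.

```python
def restore_gains(error, prev_unvoiced, decoded, previous, alpha=0.2, beta=0.8):
    """Return (adaptive_gain, noise_gain) for the current frame.

    decoded and previous are (adaptive_gain, noise_gain) pairs.
    """
    if not error:
        return decoded                        # no error: decoded gains as received
    g_adaptive, g_noise = previous
    if not prev_unvoiced:
        return (g_adaptive, g_noise)          # error, voiced: repeat previous gains
    return (alpha * g_adaptive, beta * g_noise)  # error, unvoiced: damp periodicity

gains = restore_gains(True, True, decoded=None, previous=(1.0, 0.5))
# gains is (0.2, 0.4)
```

Since α is smaller than β, the adaptive codebook contribution is reduced more strongly than the noise contribution, which weakens the pitch periodicity after an unvoiced frame.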

【００７３】適応符号帳１７、雑音符号帳１８からの各
時系列ベクトルは、ゲイン修復手段３５から出力された
それぞれのゲインに応じて重み付けして加算され、その
加算結果を駆動音源ベクトルとして合成フィルタ１６へ
供給され再生音声が得られる。再生音声パワー制御手段
３６は、符号の伝送誤り検出状況に応じて、例えば誤り
が連続するに従い徐々に抑圧量を強めるとして、再生音
声のパワーを抑圧する。音声状態推定手段３７は、符号
の伝送誤りが検出された場合に、過去の再生音声及び現
フレームの線形予測パラメータの符号から現フレームの
有音／無音を判定し、その判定結果を雑音生成手段３８
に出力する。The time-series vectors from the adaptive codebook 17 and the noise codebook 18 are weighted by the respective gains output from the gain restoration means 35 and added, and the sum is supplied to the synthesis filter 16 as the excitation vector to obtain the reproduced speech. The reproduced speech power control means 36 suppresses the power of the reproduced speech according to the code transmission error detection status, for example by gradually increasing the amount of suppression as errors continue. When a code transmission error is detected, the speech state estimation means 37 determines whether the current frame is speech or silence from the past reproduced speech and the code of the linear prediction parameters of the current frame, and outputs the result to the noise generation means 38.

【００７４】雑音生成手段３８は、符号の伝送誤り検出
状況及び音声状態判定手段３７から入力された有音／無
音判定結果に応じて、例えば誤りが検出され、かつ、有
音と推定されたフレームで雑音を生成する。雑音パワー
制御手段３９は、例えば雑音の重畳始めは徐々にパワー
を増大し、重畳終わりには徐々にパワーを減少させると
して、雑音のパワーを制御する。図１１に再生音声及び
雑音信号のパワー制御処理の例を示す。再生音声パワー
制御手段３６から出力された再生音声と雑音パワー制御
手段３９から出力された雑音は加算され、出力音声Ｓ３
が得られる。The noise generation means 38 generates noise according to the code transmission error detection status and the speech/silence determination result input from the speech state estimation means 37, for example in frames in which an error is detected and which are estimated to contain speech. The noise power control means 39 controls the noise power, for example by gradually increasing the power at the start of noise superposition and gradually decreasing it at the end. FIG. 11 shows an example of the power control processing for the reproduced speech and the noise signal. The reproduced speech output from the reproduced speech power control means 36 and the noise output from the noise power control means 39 are added to obtain the output speech S3.
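The two power controls can be sketched as per-frame gain schedules. The linear steps, the step sizes, and the function names are assumptions; FIG. 11 is not reproduced here and may show different curves. For brevity the sketch treats every errored frame as containing speech and lets the noise fade out after the errors stop.

```python
def speech_gain(consecutive_errors, step=0.25):
    """Attenuate reproduced speech more as errors continue (1.0 = no muting)."""
    return max(0.0, 1.0 - step * consecutive_errors)

def noise_gains(noise_on_flags, ramp=0.5, peak=1.0):
    """Per-frame noise gain: ramps up while noise is on, ramps down afterwards."""
    gain, gains = 0.0, []
    for on in noise_on_flags:
        gain = min(peak, gain + ramp) if on else max(0.0, gain - ramp)
        gains.append(gain)
    return gains

errors = [0, 1, 2, 3, 0]                    # consecutive-error count per frame
speech = [speech_gain(n) for n in errors]   # [1.0, 0.75, 0.5, 0.25, 1.0]
noise = noise_gains([n > 0 for n in errors] + [False])
# noise is [0.0, 0.5, 1.0, 1.0, 0.5, 0.0]
```

The output for each frame would then be the reproduced speech scaled by its gain plus the generated noise scaled by the noise gain.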

【００７５】この実施の形態１１によれば、符号が正し
く復号できなかったことを検出した場合に、前のフレー
ムの再生音声の状態を判定し、この判定に応じて現在の
フレームの音声を再生修復することにより、遅延を増大
することなく、伝送路誤りによる品質劣化の少ない音声
を再生することができる。また、符号誤りが発生した場
合に、再生する音声の状態に応じて、再生音声に雑音を
重畳することにより、聴感上の音声品質を維持すること
ができる。According to Embodiment 11, when it is detected that a code could not be decoded correctly, the state of the reproduced speech of the previous frame is determined and the speech of the current frame is restored according to that determination, so speech with little quality degradation due to transmission path errors can be reproduced without increasing delay. In addition, when a code error occurs, noise is superimposed on the reproduced speech according to the state of the speech being reproduced, so the perceived speech quality can be maintained.

【００７６】実施の形態１２．上述の実施の形態１１で
は、線形予測パラメータ修復手段３０において、伝送誤
りが発生した際は常に前フレームの線形予測パラメータ
から高域のホルマントピークを抑制したスペクトルとな
るような線形予測パラメータを求め、合成フィルタ１６
の係数としているが、これに代え、前フレームの線形予
測パラメータが表すスペクトルの高域に鋭いホルマント
ピークがある場合にのみ抑制処理を行い、その他の場合
は、前フレームの線形予測パラメータをそのまま用いる
とするなど、状態に応じて選択的に処理を行っても良
い。Embodiment 12. In Embodiment 11 above, whenever a transmission error occurs, the linear prediction parameter restoration means 30 derives, from the linear prediction parameters of the previous frame, linear prediction parameters whose spectrum suppresses the high-band formant peaks, and uses them as the coefficients of the synthesis filter 16. Instead, the processing may be applied selectively according to the state: for example, the suppression may be performed only when the spectrum represented by the linear prediction parameters of the previous frame has a sharp formant peak in the high band, and otherwise the linear prediction parameters of the previous frame may be used unchanged.

【００７７】この実施の形態１２によれば、線形予測パ
ラメータの修復処理を行う際に、高域に鋭いホルマント
ピークがあり、聴感上の品質劣化につながる可能性が高
い場合にのみその抑制処理を行い、その他の場合前フレ
ームの線形予測パラメータを繰り返して用いるので、異
音を発生すること無く合成音声の連続性が向上し、聴感
上の音声品質を向上することができる。According to Embodiment 12, when the linear prediction parameters are restored, the suppression is performed only when there is a sharp formant peak in the high band that is likely to cause perceptible quality degradation, and otherwise the linear prediction parameters of the previous frame are reused. The continuity of the synthesized speech is therefore improved without generating abnormal sounds, and the perceived speech quality can be improved.

【００７８】実施の形態１３．上述の実施の形態１１で
は、伝送誤り検出時前フレームが無声の場合、ゲイン修
復手段３５においてゲインを調整することにより駆動音
源ベクトルのピッチ周期性を抑制を実現しているが、こ
れに代え、例えば適応符号帳からの時系列ベクトルを用
いないとするなど、別の手段を用いてピッチ周期性の抑
制を実現しても良い。この実施の形態１３によれば、ピ
ッチ周期性の抑制を適応符号帳からの時系列ベクトルを
使用する／使用しないを切り替えるだけで良く、ゲイン
を調整してピッチ周期性を抑制する場合に比較し簡易に
実現できる。Embodiment 13. In Embodiment 11 above, when the frame preceding the detection of a transmission error is unvoiced, the pitch periodicity of the excitation vector is suppressed by adjusting the gains in the gain restoration means 35. Instead, the pitch periodicity may be suppressed by other means, for example by not using the time-series vector from the adaptive codebook. According to Embodiment 13, the pitch periodicity can be suppressed merely by switching the adaptive codebook time-series vector on or off, which is simpler to implement than suppressing the pitch periodicity by adjusting the gains.

【００７９】実施の形態１４．上述の実施の形態１１で
は、音声状態推定手段３７で、過去の再生音声及び現フ
レームの線形予測パラメータの符号から現フレームの有
音／無音を判定しているが、これに代え、適応符号やゲ
インの符号も判定のためのパラメータとして用いても良
い。この実施の形態１４によれば、過去の再生音声及び
線形予測パラメータの符号からだけでなく、その他の情
報も用いて有音／無音判定するので、より精度の高い判
定が可能となる。Embodiment 14. In Embodiment 11 above, the speech state estimation means 37 determines whether the current frame is speech or silence from the past reproduced speech and the code of the linear prediction parameters of the current frame. In addition, the adaptive code and the gain code may also be used as parameters for the determination. According to Embodiment 14, the speech/silence determination uses not only the past reproduced speech and the code of the linear prediction parameters but also other information, so a more accurate determination is possible.

【００８０】実施の形態１５．上述の実施の形態１１で
は、ＣＥＬＰ系音声符号化に伝送誤り時の修復処理、雑
音重畳処理を適用しているが、これに代え、ＭＢＥ系音
声符号化をはじめ他の音声符号化方式に適用しても、同
様に伝送誤り時の再生音声の聴感上の音声品質を向上す
ることができる。Embodiment 15. In Embodiment 11 above, the restoration processing and noise superposition processing for transmission errors are applied to CELP-based speech coding. They may instead be applied to other speech coding schemes, such as MBE-based speech coding, and the perceived quality of the reproduced speech under transmission errors can be improved in the same way.

【００８１】[0081]

【発明の効果】以上詳述したように、請求項１〜請求項
３及び請求項１５〜請求項１６に記載の発明によれば、
音声符号化方法及び音声復号化方法で、雑音符号ベクト
ルをピッチ同期化するための特徴位置を求める際に用い
る符号に雑音符号を含まないので、雑音符号の符号誤り
の影響の時間的波及を小さくでき、再生音声の劣化を回
避し伝送誤りによる品質劣化の少ない音声を再生するこ
とができる。As described in detail above, according to the inventions of claims 1 to 3 and claims 15 and 16, in the speech encoding method and the speech decoding method, the codes used to obtain the feature position for pitch synchronization of the noise code vector do not include the noise code. The temporal propagation of the effect of a code error in the noise code can therefore be kept small, degradation of the reproduced speech is avoided, and speech with little quality degradation due to transmission errors can be reproduced.

【００８２】また請求項４、請求項９、請求項１４、請
求項１７〜請求項１８に記載の発明によれば、音声符号
化方法及び音声復号化方法で、雑音符号ベクトルをピッ
チ同期化する際のピッチ周期長にした符号ベクトルはピ
ッチ周期のみから定まるので、雑音符号より生成される
時系列ベクトルのピッチ位置の誤りによる影響を小さく
でき、再生音声の劣化を回避し伝送誤りによる品質劣化
の少ない音声を再生することができる。According to the inventions of claims 4, 9, 14, 17 and 18, in the speech encoding method and the speech decoding method, the code vector trimmed to the pitch period length for pitch synchronization of the noise code vector is determined by the pitch period alone. The effect of a pitch position error on the time-series vector generated from the noise code can therefore be reduced, degradation of the reproduced speech is avoided, and speech with little quality degradation due to transmission errors can be reproduced.

【００８３】また請求項５、請求項１０に記載の発明に
よれば、音声符号化方法及び音声復号化方法で、フレー
ムはピッチ周期の整数倍長であるので、雑音符号より生
成される時系列ベクトルのピッチ位置の誤りによる影響
を小さくでき、再生音声の劣化を回避し伝送誤りによる
品質劣化の少ない音声を再生することができる。According to the inventions of claims 5 and 10, in the speech encoding method and the speech decoding method, the frame is an integer multiple of the pitch period, so the effect of a pitch position error on the time-series vector generated from the noise code can be reduced, degradation of the reproduced speech is avoided, and speech with little quality degradation due to transmission errors can be reproduced.

【００８４】また請求項６、請求項１１に記載の発明に
よれば、音声符号化方法及び音声復号化方法で、フレー
ムの規準となる固定長を定め、ピッチ周期の整数倍長で
ある可変長のフレームはその固定長に最も近いものとし
たので、単位時間における符号化情報量（ビットレー
ト）を安定にすることができる。According to the inventions of claims 6 and 11, in the speech encoding method and the speech decoding method, a fixed length is defined as the reference for the frame, and the variable-length frame, which is an integer multiple of the pitch period, is made as close as possible to that fixed length, so the amount of encoded information per unit time (bit rate) can be stabilized.

【００８５】また請求項７、請求項１２に記載の発明に
よれば、音声符号化方法及び音声復号化方法で、フレー
ムの規準となる固定長を定め、ピッチ周期の整数倍長で
ある可変長のフレームの平均は常に固定長以上となるよ
うにしたので、通信に適用する際復号化側の遅延を小さ
くすることができ、また単位時間における符号化情報量
（ビットレート）の最大値を規定することができる。According to the inventions of claims 7 and 12, in the speech encoding method and the speech decoding method, a fixed length is defined as the reference for the frame, and the average length of the variable-length frames, each an integer multiple of the pitch period, is always at least that fixed length. The delay on the decoding side can therefore be reduced when the method is applied to communication, and the maximum amount of encoded information per unit time (bit rate) can be specified.

【００８６】また請求項８、請求項１３に記載の発明に
よれば、音声符号化方法及び音声復号化方法で、フレー
ムの規準となる固定点を定め、フレームの境界がその固
定点に最も近いものとしたので、単位時間における符号
化情報量（ビットレート）を安定にすることができる。According to the inventions of claims 8 and 13, in the speech encoding method and the speech decoding method, fixed points are defined as the reference for the frame, and the frame boundary is made as close as possible to a fixed point, so the amount of encoded information per unit time (bit rate) can be stabilized.

【００８７】[0087]

【００８８】[0088]

【００８９】[0089]

【００９０】[0090]

【００９１】[0091]

[Brief description of drawings]

【図１】この発明による音声符号化方法及び音声復号
化方法の実施の形態１の全体構成を示すブロック図であ
る。FIG. 1 is a block diagram showing an overall configuration of a first embodiment of a voice encoding method and a voice decoding method according to the present invention.

【図２】図１の実施の形態１におけるピッチ位置抽出
処理の動作の説明に供する略線図である。FIG. 2 is a schematic diagram for explaining the operation of the pitch position extraction processing in Embodiment 1 of FIG. 1.

【図３】この発明による音声符号化方法及び音声復号
化方法の実施の形態２における雑音符号ベクトルのピッ
チ同期化処理の説明に供する略線図である。FIG. 3 is a schematic diagram for explaining the pitch synchronization processing of a noise code vector in Embodiment 2 of the speech encoding method and speech decoding method according to the present invention.

【図４】この発明による音声符号化方法及び音声復号
化方法の実施の形態６における雑音符号ベクトルのピッ
チ同期化処理の説明に供する略線図である。FIG. 4 is a schematic diagram for explaining the pitch synchronization processing of a noise code vector in Embodiment 6 of the speech encoding method and speech decoding method according to the present invention.

【図５】この発明による音声符号化方法及び音声復号
化方法の実施の形態７におけるフレーム決定の説明に供
する略線図である。FIG. 5 is a schematic diagram for explaining frame determination in Embodiment 7 of the speech encoding method and speech decoding method according to the present invention.

【図６】この発明による音声符号化方法及び音声復号
化方法の実施の形態８におけるフレーム決定の説明に供
する略線図である。FIG. 6 is a schematic diagram for explaining frame determination in Embodiment 8 of the speech encoding method and speech decoding method according to the present invention.

【図７】この発明による音声符号化方法及び音声復号
化方法の実施の形態９におけるフレーム決定の説明に供
する略線図である。FIG. 7 is a schematic diagram for explaining frame determination in Embodiment 9 of the speech encoding method and speech decoding method according to the present invention.

【図８】この発明による音声復号化方法の実施の形態
１１の構成を示すブロック図である。FIG. 8 is a block diagram showing the configuration of Embodiment 11 of the speech decoding method according to the present invention.

【図９】図８の実施の形態１１におけるスペクトル修
復処理の動作の説明に供する略線図である。FIG. 9 is a schematic diagram for explaining the operation of the spectrum restoration processing in Embodiment 11 of FIG. 8.

【図１０】図８の実施の形態１１における適応符号修
復処理の動作の説明に供する信号波形図である。FIG. 10 is a signal waveform diagram for explaining the operation of the adaptive code restoration processing in Embodiment 11 of FIG. 8.

【図１１】図８の実施の形態１１における再生音声及
び雑音信号のパワー制御処理の一例の説明に供する略線
図である。FIG. 11 is a schematic diagram for explaining an example of the power control processing for the reproduced speech and the noise signal in Embodiment 11 of FIG. 8.

【図１２】従来のＣＥＬＰ系音声符号化復号化方法の
全体構成を示すブロック図である。FIG. 12 is a block diagram showing the overall configuration of a conventional CELP-based speech encoding / decoding method.

【図１３】従来の改良されたＣＥＬＰ系音声符号化復
号化方法の全体構成を示すブロック図である。FIG. 13 is a block diagram showing an overall configuration of a conventional improved CELP audio encoding / decoding method.

【図１４】図１３のＣＥＬＰ系音声符号化復号化方法
における雑音符号ベクトルのピッチ同期化処理の説明に
供する略線図である。FIG. 14 is a schematic diagram for explaining the pitch synchronization processing of a noise code vector in the CELP-based speech encoding/decoding method of FIG. 13.

【図１５】図１３のＣＥＬＰ系音声符号化復号化方法
における位相を誤った場合の雑音符号ベクトルのピッチ
同期化処理の説明に供する略線図である。FIG. 15 is a schematic diagram for explaining the pitch synchronization processing of a noise code vector when the phase is wrong in the CELP-based speech encoding/decoding method of FIG. 13.

[Explanation of symbols]

１符号化部２復号化部３多重化手段４分離手段８線形予測パラメータ分析手段９線形予測パラメータ符号化手段１０、１６合成フィルタ１１、１７適応符号帳１２、１８雑音符号帳１３ゲイン符号化手段１４距離計算手段１５線形予測パラメータ復号化手段１９ゲイン復号化手段２０、２２ピッチ位置抽出手段２１、２３ピッチ同期化手段２４、２７固定符号帳２５、２８第２の雑音符号帳２６、２９ピッチ位置抽出手段３０線形予測パラメータ修復手段３１立ち上がり判定手段３２ピッチ抽出手段３３適応符号修復手段３４無声判定手段３５ゲイン修復手段３６再生音声パワー制御手段３７音声状態推定手段３８雑音生成手段３９雑音パワー制御手段 1: encoding section; 2: decoding section; 3: multiplexing means; 4: separating means; 8: linear prediction parameter analysis means; 9: linear prediction parameter encoding means; 10, 16: synthesis filter; 11, 17: adaptive codebook; 12, 18: noise codebook; 13: gain encoding means; 14: distance calculation means; 15: linear prediction parameter decoding means; 19: gain decoding means; 20, 22: pitch position extraction means; 21, 23: pitch synchronization means; 24, 27: fixed codebook; 25, 28: second noise codebook; 26, 29: pitch position extraction means; 30: linear prediction parameter restoration means; 31: onset determination means; 32: pitch extraction means; 33: adaptive code restoration means; 34: unvoiced determination means; 35: gain restoration means; 36: reproduced speech power control means; 37: speech state estimation means; 38: noise generation means; 39: noise power control means

Continuation of the front page
(56) References: JP-A-5-19794 (JP, A); JP-A-5-19796 (JP, A); JP-A-5-94200 (JP, A); JP-A-5-108098 (JP, A); JP-A-6-202699 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 19/00 - 19/14; H03M 7/30; H04B 14/04

Claims

(57) [Claims]

1. A speech encoding method for encoding input speech, frame by frame, into codes of a plurality of parameters, the method comprising: providing an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code; extracting a pitch position from feature positions obtained on the basis of any of the codes of the plurality of parameters excluding the noise code; generating a second time-series vector in which each code vector is periodically repeated so that the pitch synchronization position of the code vector coincides with the extracted pitch position; adding the first time-series vector and the second time-series vector to generate a driving excitation vector; driving a synthesis filter with the driving excitation vector to reproduce a speech signal; and evaluating the distortion of the reproduced speech signal with respect to the input speech to determine the codes.
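The excitation construction recited in the claim above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the use of plain Python lists, the omission of gains, and the treatment of the code vector as a single pitch period (only its first `pitch_period` samples are used, and `sync_pos` is assumed to be smaller than `pitch_period`) are all assumptions made for illustration.

```python
def adaptive_vector(past_excitation, lag, frame_len):
    """First time-series vector: periodically repeat the last `lag`
    samples of the stored past excitation (assumes lag <= len(past_excitation))."""
    start = len(past_excitation) - lag
    return [past_excitation[start + (n % lag)] for n in range(frame_len)]


def pitch_synchronize(code_vector, sync_pos, pitch_pos, pitch_period, frame_len):
    """Second time-series vector: repeat the code vector every `pitch_period`
    samples so that its sample `sync_pos` lands on `pitch_pos`
    (and on every position one or more pitch periods away from it)."""
    out = [0.0] * frame_len
    for n in range(frame_len):
        # Index into the code vector for frame sample n, wrapped to one period.
        k = (n - pitch_pos + sync_pos) % pitch_period
        if k < len(code_vector):
            out[n] = code_vector[k]
    return out


def driving_excitation(first, second):
    """Add the first and second time-series vectors sample by sample.
    Gains are omitted here (an assumption of this sketch)."""
    return [a + b for a, b in zip(first, second)]
```

In this sketch the pitch position is passed in as an argument; how it is derived from the non-noise codes is left outside the example.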

2. A speech decoding method for decoding, frame by frame, codes of a plurality of parameters to generate output speech, the method comprising: providing an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code; extracting a pitch position from feature positions obtained on the basis of any of the codes of the plurality of parameters; generating a second time-series vector in which each code vector is periodically repeated so that the pitch synchronization position of the code vector coincides with the extracted pitch position; adding the first time-series vector and the second time-series vector to generate a driving excitation vector; and driving a synthesis filter with the driving excitation vector to generate the output speech.

3. A speech encoding/decoding method for encoding input speech, frame by frame, into codes of a plurality of parameters and decoding the codes to generate output speech, wherein:
on the encoding side, there are provided an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code; a pitch position is extracted from feature positions obtained on the basis of any of the codes of the plurality of parameters excluding the noise code; a second time-series vector is generated in which each code vector is periodically repeated so that the pitch synchronization position of the code vector coincides with the extracted pitch position; the first time-series vector and the second time-series vector are added to generate a driving excitation vector; a synthesis filter is driven with the driving excitation vector to reproduce a speech signal; and the distortion of the reproduced speech signal with respect to the input speech is evaluated to determine the codes; and
on the decoding side, there are provided an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with the adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to the noise code; a pitch position is extracted from feature positions obtained on the basis of any of the codes of the plurality of parameters excluding the noise code; a second time-series vector is generated in which each code vector is periodically repeated so that the pitch synchronization position of the code vector coincides with the extracted pitch position; the first time-series vector and the second time-series vector are added to generate a driving excitation vector; and a synthesis filter is driven with the driving excitation vector to generate the output speech.
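The encoder-side step of driving the synthesis filter and evaluating distortion to determine a code can be sketched as an analysis-by-synthesis search. This is a hedged illustration only: the all-pole filter form with the sign convention A(z) = 1 + sum of a_k z^-k, the squared-error distortion measure, the exhaustive search over candidates, and all function names are assumptions, not details confirmed by the claims.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter (assumed form):
    s[n] = e[n] - sum_k lpc[k] * s[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def squared_distortion(target, reproduced):
    """Squared-error distance between input speech and reproduced speech."""
    return sum((t - r) ** 2 for t, r in zip(target, reproduced))


def search_noise_code(target, first_vector, candidates, lpc):
    """Try every candidate second time-series vector, add it to the first
    time-series vector, synthesize, and keep the index with least distortion."""
    best_code, best_dist = None, float("inf")
    for code, second_vector in enumerate(candidates):
        excitation = [a + b for a, b in zip(first_vector, second_vector)]
        dist = squared_distortion(target, synthesize(excitation, lpc))
        if dist < best_dist:
            best_code, best_dist = code, dist
    return best_code, best_dist
```

The decoder side reuses only `synthesize`: it rebuilds the same excitation from the received codes and needs no search.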

4. A speech encoding method for encoding input speech, frame by frame, into codes of a plurality of parameters, the method comprising: providing an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and a cutout position determined according to the pitch period, and outputs the code vector corresponding to a noise code; extracting a pitch position from feature positions arranged at pitch-period intervals in the current frame; cutting out each code vector at the cutout position corresponding to the pitch period; generating a second time-series vector in which the cut-out code vector is periodically repeated so that its pitch synchronization position coincides with the extracted pitch position; adding the first time-series vector and the second time-series vector to generate a driving excitation vector; driving a synthesis filter with the driving excitation vector to reproduce a speech signal; and evaluating the distortion of the reproduced speech signal with respect to the input speech to determine the codes.
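The cut-out step added in this claim can be sketched as follows. The claim only states that the cutout position is determined according to the pitch period; the specific rule used here (a window of one pitch period starting about half a period before the pitch synchronization position, zero-padded at the end) is a hypothetical choice for illustration, as are the function names.

```python
def cut_out(code_vector, sync_pos, pitch_period):
    """Cut one pitch period out of the stored code vector.
    Hypothetical rule: start half a period before the synchronization
    position, clamped to the start of the vector."""
    start = max(0, sync_pos - pitch_period // 2)
    segment = code_vector[start:start + pitch_period]
    # Zero-pad if the stored vector is shorter than the window.
    segment = segment + [0.0] * (pitch_period - len(segment))
    # Return the segment and the synchronization position inside it.
    return segment, sync_pos - start


def repeat_synchronized(segment, seg_sync, pitch_pos, frame_len):
    """Repeat the cut-out segment periodically so that its synchronization
    sample falls on the extracted pitch position."""
    period = len(segment)
    return [segment[(n - pitch_pos + seg_sync) % period] for n in range(frame_len)]
```

Because the segment is exactly one pitch period long, the repeated vector has no gaps or overlaps between consecutive periods.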

5. The speech encoding method according to claim 4, wherein the frame has a length equal to an integer multiple of the pitch period.

6. The speech encoding method according to claim 5, wherein a fixed length serving as a frame reference is defined, and the integer multiple of the pitch period is the one closest to the fixed length.

7. The speech encoding method according to claim 5, wherein a fixed length serving as a frame reference is defined, and the integer multiple of the pitch period is such that the average of the integer multiples of the pitch period is always equal to or greater than the fixed length.

8. The speech encoding method according to claim 5, wherein fixed points serving as frame references are defined, and the integer multiple of the pitch period is such that the frame boundary is closest to the fixed point.
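The frame-length rules of claims 6 and 8 can be illustrated with a small sketch. The function names, the integer sample units, the tie-breaking by rounding half up, and the way the target fixed point is chosen (the fixed point nearest to one fixed length after the frame start, with fixed points assumed to lie at multiples of the fixed length) are assumptions made for illustration.

```python
def frame_len_closest_to_fixed(pitch_period, fixed_len):
    """Claim 6 style: the multiple of the pitch period nearest to the
    fixed reference length (at least one period)."""
    n = max(1, (fixed_len + pitch_period // 2) // pitch_period)
    return n * pitch_period


def frame_len_closest_to_fixed_point(frame_start, pitch_period, fixed_len):
    """Claim 8 style: choose the multiple of the pitch period whose frame
    boundary lands nearest to a fixed point on a grid of spacing fixed_len."""
    # Fixed point nearest to where a fixed-length frame would end (assumed rule).
    k = (frame_start + fixed_len + fixed_len // 2) // fixed_len
    target = k * fixed_len
    n = max(1, (target - frame_start + pitch_period // 2) // pitch_period)
    return n * pitch_period
```

With the second rule the boundary error does not accumulate from frame to frame, because each frame is re-aligned to the fixed grid rather than to the previous frame length.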

9. A speech decoding method for decoding, frame by frame, codes of a plurality of parameters to generate output speech, the method comprising: providing an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and a cutout position determined according to the pitch period, and outputs the code vector corresponding to a noise code; extracting a pitch position from feature positions arranged at pitch-period intervals in the current frame; cutting out each code vector at the cutout position corresponding to the pitch period; generating a second time-series vector in which the cut-out code vector is periodically repeated so that its pitch synchronization position coincides with the extracted pitch position; adding the first time-series vector and the second time-series vector to generate a driving excitation vector; and driving a synthesis filter with the driving excitation vector to generate the output speech.

10. The speech decoding method according to claim 9, wherein the frame has a length equal to an integer multiple of the pitch period.

11. The speech decoding method according to claim 10, wherein a fixed length serving as a frame reference is defined, and the integer multiple of the pitch period is the one closest to the fixed length.

12. The speech decoding method according to claim 10, wherein a fixed length serving as a frame reference is defined, and the integer multiple of the pitch period is such that the average of the integer multiples of the pitch period is always equal to or greater than the fixed length.

13. The speech decoding method according to claim 10, wherein fixed points serving as frame references are defined, and the integer multiple of the pitch period is such that the frame boundary is closest to the fixed point.

14. A speech encoding/decoding method for encoding input speech, frame by frame, into codes of a plurality of parameters and decoding the codes to generate output speech, wherein:
on the encoding side, there are provided an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and a cutout position determined according to the pitch period, and outputs the code vector corresponding to a noise code; a pitch position is extracted from feature positions arranged at pitch-period intervals in the current frame; each code vector is cut out at the cutout position corresponding to the pitch period; a second time-series vector is generated in which the cut-out code vector is periodically repeated so that its pitch synchronization position coincides with the extracted pitch position; the first time-series vector and the second time-series vector are added to generate a driving excitation vector; a synthesis filter is driven with the driving excitation vector to reproduce a speech signal; and the distortion of the reproduced speech signal with respect to the input speech is evaluated to determine the codes; and
on the decoding side, there are provided an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with the adaptive code, and a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and a cutout position determined according to the pitch period, and outputs the code vector corresponding to the noise code; a pitch position is extracted from feature positions arranged at pitch-period intervals in the current frame; each code vector is cut out at the cutout position corresponding to the pitch period; a second time-series vector is generated in which the cut-out code vector is periodically repeated so that its pitch synchronization position coincides with the extracted pitch position; the first time-series vector and the second time-series vector are added to generate a driving excitation vector; and a synthesis filter is driven with the driving excitation vector to generate the output speech.

15. A speech encoding apparatus for encoding input speech, frame by frame, into codes of a plurality of parameters, the apparatus comprising: an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code; a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code; pitch position extraction means for extracting a pitch position from feature positions obtained on the basis of any of the codes of the plurality of parameters excluding the noise code; pitch synchronization means for generating a second time-series vector in which each code vector is periodically repeated so that the pitch synchronization position of the code vector coincides with the extracted pitch position; adding means for adding the first time-series vector and the second time-series vector to generate a driving excitation vector; a synthesis filter that reproduces a speech signal from the driving excitation vector; and distance calculation means for evaluating the distortion of the reproduced speech signal with respect to the input speech to determine the codes.

16. A speech decoding apparatus for decoding, frame by frame, codes of a plurality of parameters to generate output speech, the apparatus comprising: an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code; a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and outputs the code vector corresponding to a noise code; pitch position extraction means for extracting a pitch position from feature positions obtained on the basis of any of the codes of the plurality of parameters excluding the noise code; pitch synchronization means for generating a second time-series vector in which each code vector is periodically repeated so that the pitch synchronization position of the code vector coincides with the extracted pitch position; adding means for adding the first time-series vector and the second time-series vector to generate a driving excitation vector; and a synthesis filter that generates the output speech from the driving excitation vector.

17. A speech encoding apparatus for encoding input speech, frame by frame, into codes of a plurality of parameters, the apparatus comprising: an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code; a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and a cutout position determined according to the pitch period, and outputs the code vector corresponding to a noise code; pitch position extraction means for extracting a pitch position from feature positions arranged at pitch-period intervals in the current frame; pitch synchronization means for cutting out each code vector at the cutout position corresponding to the pitch period and generating a second time-series vector in which the cut-out code vector is periodically repeated so that its pitch synchronization position coincides with the extracted pitch position; adding means for adding the first time-series vector and the second time-series vector to generate a driving excitation vector; a synthesis filter that reproduces a speech signal from the driving excitation vector; and distance calculation means for evaluating the distortion of the reproduced speech signal with respect to the input speech to determine the codes.

18. A speech decoding apparatus for decoding, frame by frame, codes of a plurality of parameters to generate output speech, the apparatus comprising: an adaptive codebook that stores past driving excitation vectors and outputs a first time-series vector in which the past driving excitation vector is periodically repeated in correspondence with an adaptive code; a noise codebook that stores a plurality of code vectors each having a predetermined pitch synchronization position and a cutout position determined according to the pitch period, and outputs the code vector corresponding to a noise code; pitch position extraction means for extracting a pitch position from feature positions arranged at pitch-period intervals in the current frame; pitch synchronization means for cutting out each code vector at the cutout position corresponding to the pitch period and generating a second time-series vector in which the cut-out code vector is periodically repeated so that its pitch synchronization position coincides with the extracted pitch position; adding means for adding the first time-series vector and the second time-series vector to generate a driving excitation vector; and a synthesis filter that generates the output speech from the driving excitation vector.