JP3428594B2

JP3428594B2 - Audio encoding device, audio decoding device, audio encoding method, and audio decoding method

Info

Publication number: JP3428594B2
Application number: JP2002030538A
Authority: JP
Inventors: 利幸森井; 泰助渡辺
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2002-02-07
Filing date: 2002-02-07
Publication date: 2003-07-22
Anticipated expiration: 2018-07-22
Also published as: JP2002304200A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech encoding/decoding device used in speech codecs for digital mobile communication and in speech synthesizers for the audio output of various devices.

[0002]

[Prior Art] In the field of digital mobile communication, speech coding methods with lower bit rates are required to cope with the growing number of subscribers, and research institutes are carrying out research and development on them. At present, bit rates down to 8 kbps have been standardized, and research institutes are working toward bit rates as low as about 4.8 kbps. However, a speech encoding/decoding device with sufficient performance has not yet been obtained.

[0003] Techniques for achieving a low bit rate can be divided into the following two approaches.

[0004] The first approach, typified by CELP, encodes the speech in a given analysis section using LPC-type parameters and an excitation source, and generates the pitch component with a long-term prediction filter. This is currently the most efficient approach at medium bit rates of 6.7 kbps to 16 kbps. At 4 kbps or below, however, sound quality degrades considerably, so it is difficult to achieve a low bit rate this way.

[0005] The second approach exploits the fact that most human speech is voiced sound with a fundamental frequency: pitch analysis is performed, and only the pitch and a one-pitch waveform are transmitted. This approach is quite effective for coding at 6 kbps or below and can be regarded as the most promising way to achieve a low bit rate. However, errors inevitably occur in pitch detection. In addition, pitch detection requires a large amount of computation, so real-time pitch detection is difficult.

[0006] For these reasons, achieving a low bit rate has been considered difficult.

[0007]

[Problems to Be Solved by the Invention] As described above, it has been difficult to achieve a low bit rate with conventional speech encoding/decoding techniques. This is because long-term prediction alone does not give sufficient sound quality, and pitch-synchronous coding suffers from the difficulty of pitch detection.

[0008] In view of the above problems, an object of the present invention is to achieve a low bit rate by combining the advantages of the two approaches, using long-term prediction in pitch-synchronous coding so that pitch errors do not occur.

[0009]

[Means for Solving the Problems] To achieve this object, the present invention is a speech encoding device comprising: an A/D conversion unit that converts input speech into a digital speech signal; a synthesized waveform storage unit in which past synthesized waveforms are stored; an acoustic analysis unit that takes as input a partial section of the digital speech signal having a predetermined time length and a selected synthesized waveform selected from the past synthesized waveforms on the basis of a pitch for long-term prediction, and that determines the selected synthesized waveform, a long-term prediction coefficient, the pitch for long-term prediction, and a basic waveform and its length so as to minimize the error power between the partial section of the digital speech signal and a new synthesized waveform formed using the selected synthesized waveform, the long-term prediction coefficient, the pitch for long-term prediction, and the basic waveform and its length; a basic waveform encoding unit that encodes the basic waveform obtained by the acoustic analysis unit; and a speech synthesis unit that creates a synthesized waveform on the basis of the long-term prediction coefficient and the pitch for long-term prediction obtained by the acoustic analysis unit and the code of the basic waveform and the code of its length obtained by the basic waveform encoding unit, and outputs the synthesized waveform to the synthesized waveform storage unit. The device outputs the long-term prediction coefficient, the pitch for long-term prediction, the code of the basic waveform, and the code of its length.

[0010]

[Embodiments of the Invention] The invention according to claim 1 of the present invention is a speech encoding device comprising: an A/D conversion unit that converts input speech into a digital speech signal; a synthesized waveform storage unit in which past synthesized waveforms are stored; an acoustic analysis unit that takes as input a partial section of the digital speech signal having a predetermined time length and a selected synthesized waveform selected from the past synthesized waveforms on the basis of a pitch for long-term prediction, and that determines the selected synthesized waveform, a long-term prediction coefficient, the pitch for long-term prediction, and a basic waveform and its length so as to minimize the error power between the partial section of the digital speech signal and a new synthesized waveform formed using the selected synthesized waveform, the long-term prediction coefficient, the pitch for long-term prediction, and the basic waveform and its length; a basic waveform encoding unit that encodes the basic waveform obtained by the acoustic analysis unit; and a speech synthesis unit that creates a synthesized waveform on the basis of the long-term prediction coefficient and the pitch for long-term prediction obtained by the acoustic analysis unit and the code of the basic waveform and the code of its length obtained by the basic waveform encoding unit, and outputs the synthesized waveform to the synthesized waveform storage unit. The device outputs the long-term prediction coefficient, the pitch for long-term prediction, the code of the basic waveform, and the code of its length.

[0011] With this configuration, a low bit rate can be achieved by encoding a one-pitch basic waveform, and pitch errors can be prevented because long-term prediction is used. Low-bit-rate speech encoding can therefore be performed efficiently.

[0012] The invention according to claim 2 is the speech encoding device according to claim 1, wherein the synthesized waveform is obtained by, for each length of the basic waveform, multiplying the selected synthesized waveform based on the pitch for long-term prediction by the long-term prediction coefficient and adding the decoded basic waveform to the result, and adding these results together up to the length of the analysis section. A low bit rate can be achieved by encoding a one-pitch basic waveform, and pitch errors can be prevented because long-term prediction is used. Low-bit-rate speech encoding can therefore be performed efficiently.

[0013] The invention according to claim 3 is a speech decoding device comprising: a basic waveform decoding unit that decodes a basic waveform on the basis of an input code of the basic waveform and an input code of the length of the basic waveform; a speech waveform decoding unit that takes as input a long-term prediction coefficient, a pitch for long-term prediction, and the decoded basic waveform output by the basic waveform decoding unit together with its length, and decodes a digital speech signal of a predetermined time length; and a D/A conversion unit that converts the decoded digital speech signal output by the speech waveform decoding unit into an analog speech signal. Low-bit-rate speech decoding can thereby be performed efficiently.

[0014] The invention according to claim 4 is a speech decoding device comprising: a basic waveform decoding unit that decodes a basic waveform on the basis of an input code of the basic waveform and an input code of the length of the basic waveform; a speech waveform decoding unit that takes as input a long-term prediction coefficient, a pitch for long-term prediction, and the decoded basic waveform output by the basic waveform decoding unit together with its length, and decodes a digital speech signal of a predetermined time length; and a D/A conversion unit that converts the decoded digital speech signal output by the speech waveform decoding unit into an analog speech signal. Low-bit-rate speech encoding can thereby be performed efficiently.

[0015] The invention according to claim 5 is the speech encoding method according to claim 4, wherein the synthesized waveform is obtained by, for each length of the basic waveform, multiplying the selected synthesized waveform based on the pitch for long-term prediction by the long-term prediction coefficient and adding the decoded basic waveform to the result, and adding these results together up to the length of the analysis section. Low-bit-rate speech encoding can thereby be performed efficiently.

[0016] The invention according to claim 6 is a speech decoding method comprising: a basic waveform decoding step of decoding a basic waveform on the basis of an input code of the basic waveform and an input code of the length of the basic waveform; a speech waveform decoding step of taking as input a long-term prediction coefficient, a pitch for long-term prediction, and the decoded basic waveform obtained in the basic waveform decoding step together with its length, and decoding a digital speech signal of a predetermined time length; and a D/A conversion step of converting the decoded digital speech signal obtained in the speech waveform decoding step into an analog speech signal. Low-bit-rate speech decoding can thereby be performed efficiently.

[0017] (Embodiment 1) A first embodiment of the present invention is described below with reference to the drawings.

[0018] In FIG. 1, 1 is an A/D conversion unit, 2 is a synthesized waveform storage unit, 3 is an acoustic analysis unit, 4 is a basic waveform encoding unit, 5 is a speech synthesis unit, 6 is a basic waveform decoding unit, 7 is a speech waveform decoding unit, 8 is a D/A conversion unit, 10 is an encoder, and 11 is a decoder.

[0019] Next, the operation of the speech encoding/decoding device according to the first embodiment of the present invention is described.

[0020] First, the function of the encoder 10 is described with reference to FIG. 1. Input speech from a microphone (not shown) is converted into a digital signal by the A/D conversion unit 1. Next, the acoustic analysis unit 3 loads a fixed duration of the speech signal into RAM (not shown) and simultaneously performs two analyses: an analysis of the correlation between this analysis section and the synthesized waveform stored in the synthesized waveform storage unit 2, and a pitch analysis of the analysis section. From these it obtains the position of the correlated partial section of the synthesized waveform and a one-pitch basic waveform. The method of extracting these parameters and the basic waveform is described in detail later.

[0021] The basic waveform encoding unit 4 then encodes the basic waveform obtained by the acoustic analysis unit 3. Specific methods include vector quantization (VQ) of the phase-aligned waveform as it is, and encoding after conversion to the frequency domain.

[0022] The speech synthesis unit 5 decodes the one-pitch basic waveform on the basis of the code obtained by the basic waveform encoding unit 4, performs decoding using this basic waveform and the long-term prediction coefficient extracted by the acoustic analysis unit 3, and stores the resulting synthesized waveform in the synthesized waveform storage unit 2. The function of the speech synthesis unit 5 combines the functions of the basic waveform decoding unit 6 and the speech waveform decoding unit 7 in the decoder 11, so its details are given in the description of the decoder 11.

[0023] The method of extracting the parameters and the basic waveform in the acoustic analysis unit 3 is now described in detail. The synthesis formula of the present invention is shown in Equation 1.

[0024]

[Equation 1]

[0025] In Equation 1, n is the pitch section number, q is the pitch period, β is the long-term prediction coefficient, p is the pitch for long-term prediction, X_{nq+i} and X_{nq+i-p} are both synthesized waveforms, and Y_i is the decoded one-pitch basic waveform.
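As a reading aid, the synthesis formula can be reconstructed from the symbol definitions above and from the wording of claim 2 (the selected synthesized waveform multiplied by the long-term prediction coefficient, plus the decoded basic waveform). This is a sketch rather than the source's own Equation 1, which is not reproduced in this text; the index range for i is an assumption.

```latex
X_{nq+i} = \beta \, X_{nq+i-p} + Y_i , \qquad i = 0, 1, \dots, q-1
```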

[0026] The values of p, q, β, and Y_i are therefore determined so that the waveform synthesized by this formula is as close as possible to the original waveform.

[0027] Now, given p and q, the error power between the original waveform

[0028]

[Equation 2]

[0029] and the synthesized waveform of Equation 1 is given by Equation 3 below.

[0030]

[Equation 3]

[0031] Here, E is the error power and M is the number of pitch periods in one analysis section. The derivation uses the fact that, when E is at its minimum, its derivatives with respect to β and Y_i are all zero. First, differentiating with respect to β gives Equation 4.

[0032]

[Equation 4]

[0033] Simplifying this expression with Equations 5 and 6 below and solving for β gives Equation 7.

[0034]

[Equation 5]

[0035]

[Equation 6]

[0036]

[Equation 7]

[0037] Meanwhile, differentiating Equation 3 with respect to Y_k gives Equation 8.

[0038]

[Equation 8]

[0039] Equation 8 is then substituted into Equation 7 to obtain β, and that value is used to obtain each Y_k. This is done for every p and q, the error E is evaluated for each, and the p and q giving the smallest error are selected.
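The derivation in [0027] to [0039] can be sketched as follows. This is a hedged reconstruction, not the source's Equations 2 to 8, which are not reproduced in this text. The symbol S for the original waveform, the index ranges, and the bar notation are assumptions, and the selected synthesized waveform X_{nq+i-p} is treated as fixed (taken from the stored past waveform).

```latex
\begin{align}
E &= \sum_{n=0}^{M-1} \sum_{i=0}^{q-1}
     \bigl( S_{nq+i} - \beta X_{nq+i-p} - Y_i \bigr)^2 \\
\frac{\partial E}{\partial \beta} = 0
  &\;\Rightarrow\;
  \sum_{n=0}^{M-1} \sum_{i=0}^{q-1}
  X_{nq+i-p} \bigl( S_{nq+i} - \beta X_{nq+i-p} - Y_i \bigr) = 0 \\
\frac{\partial E}{\partial Y_k} = 0
  &\;\Rightarrow\;
  Y_k = \bar{S}_k - \beta \bar{X}_k ,
  \quad
  \bar{S}_k = \frac{1}{M} \sum_{n=0}^{M-1} S_{nq+k} ,
  \quad
  \bar{X}_k = \frac{1}{M} \sum_{n=0}^{M-1} X_{nq+k-p} \\
\beta &= \frac{\displaystyle \sum_{n=0}^{M-1} \sum_{i=0}^{q-1}
        \bigl( X_{nq+i-p} - \bar{X}_i \bigr) \bigl( S_{nq+i} - \bar{S}_i \bigr)}
       {\displaystyle \sum_{n=0}^{M-1} \sum_{i=0}^{q-1}
        \bigl( X_{nq+i-p} - \bar{X}_i \bigr)^2}
\end{align}
```

The last line follows from substituting the expression for Y_k into the condition on β. With M = 1 the denominator vanishes and Y alone reproduces the section exactly, so this form is only informative when the analysis section spans several pitch periods.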

[0040] However, an exhaustive search over p and q requires an enormous amount of computation. Ways of reducing it include narrowing down the candidates for p and q using the value of Vp before searching, and setting q to the value of p at which Vp is largest.
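The exhaustive search described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, the lag p is assumed to be at least the length of the analysis section so that every predicted sample comes from the stored past waveform, only the complete pitch periods in the section are fitted, and the Vp-based pruning is omitted because Vp is defined in equations not reproduced here.

```python
import numpy as np


def fit_beta_and_basic_waveform(target, history, p, q):
    """Least-squares beta and one-pitch waveform Y for a given lag p and period q.

    Assumptions of this sketch (not stated in the source): p >= len(target),
    so the selected waveform lies entirely in `history`, and only the
    M = len(target) // q complete pitch periods are fitted.
    """
    n = len(target)
    m = n // q
    if m < 1 or p < n or p > len(history):
        raise ValueError("need 1 <= q <= len(target) <= p <= len(history)")
    start = len(history) - p
    # Rows are pitch periods, columns are positions within a period.
    pred = np.asarray(history[start:start + m * q], dtype=float).reshape(m, q)
    s = np.asarray(target[:m * q], dtype=float).reshape(m, q)
    # Deviations from the per-position mean over the M periods.
    pred_dev = pred - pred.mean(axis=0)
    s_dev = s - s.mean(axis=0)
    denom = np.sum(pred_dev * pred_dev)
    beta = np.sum(pred_dev * s_dev) / denom if denom > 0.0 else 0.0
    y = (s - beta * pred).mean(axis=0)
    err = np.sum((s - beta * pred - y) ** 2)
    return beta, y, err


def search_lag_and_period(target, history, lags, periods):
    """Try every (p, q) pair and keep the one with the smallest error power."""
    best = None
    for p in lags:
        for q in periods:
            beta, y, err = fit_beta_and_basic_waveform(target, history, p, q)
            if best is None or err < best[4]:
                best = (p, q, beta, y, err)
    return best
```

The inner fit costs on the order of the section length per candidate pair, so the total cost grows with the product of the number of lag and period candidates, which is the burden the pruning methods in [0040] are meant to reduce.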

[0041] Next, the function of the decoder 11 is described with reference to FIG. 1. First, the basic waveform decoding unit 6 synthesizes the one-pitch basic waveform. Then the speech waveform decoding unit 7 synthesizes the speech waveform for one analysis section according to the synthesis formula (Equation 1), using the one-pitch basic waveform synthesized by the basic waveform decoding unit 6 and the long-term prediction coefficient. Finally, the D/A conversion unit 8 converts the result into an analog signal and outputs it.
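The waveform synthesis performed by the speech waveform decoding unit 7 (and by the speech synthesis unit 5 in the encoder) can be sketched as below. This is an illustrative Python sketch, not the patent's implementation: the function name and argument layout are hypothetical, and the recursion X[t] = β·X[t − p] + Y[t mod q] is the form inferred from claim 2 and the symbol definitions of Equation 1. Computing sample by sample lets a lag shorter than the section refer to samples synthesized earlier in the same section.

```python
import numpy as np


def synthesize_frame(history, beta, p, basic_waveform, frame_len):
    """Synthesize one analysis section from past output and a one-pitch waveform.

    history        : previously synthesized samples (most recent last)
    beta           : long-term prediction coefficient
    p              : pitch (lag) for long-term prediction, in samples
    basic_waveform : decoded one-pitch waveform Y; its length is the period q
    frame_len      : number of samples to synthesize
    """
    y = np.asarray(basic_waveform, dtype=float)
    q = len(y)
    base = len(history)
    if q < 1 or not 1 <= p <= base:
        raise ValueError("need a non-empty basic waveform and 1 <= p <= len(history)")
    buf = np.concatenate([np.asarray(history, dtype=float), np.zeros(frame_len)])
    for t in range(frame_len):
        # Long-term prediction from p samples back, plus the repeating basic waveform.
        buf[base + t] = beta * buf[base + t - p] + y[t % q]
    return buf[base:]
```

In a running codec the returned section would be appended to the stored synthesized waveform so that the next section can predict from it; the encoder and decoder stay in step as long as both perform the same update.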

[0042] To verify the effect of the encoding performed by the speech encoding/decoding device of the present invention, a preliminary simulation of speech encoding and decoding was carried out. The evaluation speech was the sentence "爆音が銀世界の高原に広がる" ("The roar spreads across the snow-covered plateau") uttered by one male speaker, sampled at 8 kHz and coded as 12-bit PCM. In the simulation, the basic waveform was not encoded and decoded and β was not scalar-quantized; their values were used as they were. Both p and q were coded with 7 bits. The result was a segmental S/N ratio of 13.75 dB, and no pitch errors occurred. Considering that a one-pitch waveform can be encoded at 14 to 20 dB, encoding at an S/N ratio of 9 to 12 dB is possible. The above object can therefore be achieved.
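The source reports a segmental S/N ratio without defining it. A common definition averages the per-segment S/N in decibels, as in the sketch below. The function name and the segment length of 160 samples (20 ms at 8 kHz) are assumptions; the source does not state the segment length used in the experiment.

```python
import numpy as np


def segmental_snr_db(reference, decoded, segment_len=160):
    """Mean of per-segment S/N ratios in dB over the complete segments.

    Segments with zero signal energy or zero error are skipped, since their
    ratio in dB is undefined.
    """
    ref = np.asarray(reference, dtype=float)
    dec = np.asarray(decoded, dtype=float)
    n_seg = min(len(ref), len(dec)) // segment_len
    values = []
    for k in range(n_seg):
        seg = slice(k * segment_len, (k + 1) * segment_len)
        signal = np.sum(ref[seg] ** 2)
        noise = np.sum((ref[seg] - dec[seg]) ** 2)
        if signal > 0.0 and noise > 0.0:
            values.append(10.0 * np.log10(signal / noise))
    if not values:
        raise ValueError("no segment with a defined S/N ratio")
    return float(np.mean(values))
```

Averaging in the dB domain gives quiet segments the same weight as loud ones, which is why this measure is often preferred over a single whole-utterance S/N ratio for speech.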

[0043]

[Effects of the Invention] As described above, the present invention uses long-term prediction in pitch-synchronous coding so that pitch errors do not occur, and can therefore encode and decode speech at a low bit rate without pitch errors.

[Brief description of drawings]

FIG. 1 is a block diagram of the speech encoding/decoding device according to the first embodiment of the present invention.

[Explanation of symbols]

1 A/D conversion unit
2 Synthesized waveform storage unit
3 Acoustic analysis unit
4 Basic waveform encoding unit
5 Speech synthesis unit
6 Basic waveform decoding unit
7 Speech waveform decoding unit
8 D/A conversion unit
9 Transmission line
10 Encoder
11 Decoder

Continuation of the front page

(56) References: JP-A-5-73098 (JP, A); JP-A-8-69300 (JP, A); JP-A-1-205199 (JP, A); JP-A-62-135899 (JP, A); JP-A-1-126700 (JP, A); JP-A-2-8900 (JP, A)

(58) Fields investigated (Int. Cl.⁷, DB name): G10L 11/04; G10L 13/00; G10L 19/00; G10L 19/04

Claims

(57) [Claims]

1. A speech encoding device comprising: an A/D conversion unit that converts input speech into a digital speech signal; a synthesized waveform storage unit in which past synthesized waveforms are stored; an acoustic analysis unit that takes as input a partial section of the digital speech signal having a predetermined time length and a selected synthesized waveform selected from the past synthesized waveforms on the basis of a pitch for long-term prediction, and that determines the selected synthesized waveform, a long-term prediction coefficient, the pitch for long-term prediction, and a basic waveform and its length so as to minimize the error power between the partial section of the digital speech signal and a new synthesized waveform formed using the selected synthesized waveform, the long-term prediction coefficient, the pitch for long-term prediction, and the basic waveform and its length; a basic waveform encoding unit that encodes the basic waveform obtained by the acoustic analysis unit; and a speech synthesis unit that creates a synthesized waveform on the basis of the long-term prediction coefficient and the pitch for long-term prediction obtained by the acoustic analysis unit and the code of the basic waveform and the code of its length obtained by the basic waveform encoding unit, and outputs the synthesized waveform to the synthesized waveform storage unit, the device outputting the long-term prediction coefficient, the pitch for long-term prediction, the code of the basic waveform, and the code of its length.

2. The speech encoding device according to claim 1, wherein the synthesized waveform is obtained by, for each length of the basic waveform, multiplying the selected synthesized waveform based on the pitch for long-term prediction by the long-term prediction coefficient and adding the decoded basic waveform to the result, and adding these results together up to the length of the analysis section.

3. A speech decoding device comprising: a basic waveform decoding unit that decodes a basic waveform on the basis of an input code of the basic waveform and an input code of the length of the basic waveform; a speech waveform decoding unit that takes as input a long-term prediction coefficient, a pitch for long-term prediction, and the decoded basic waveform output by the basic waveform decoding unit together with its length, and decodes a digital speech signal of a predetermined time length; and a D/A conversion unit that converts the decoded digital speech signal output by the speech waveform decoding unit into an analog speech signal.

4. A speech encoding method comprising: an A/D conversion step of converting input speech into a digital speech signal; an acoustic analysis step of taking as input a partial section of the digital speech signal having a predetermined time length and a selected synthesized waveform selected from past synthesized waveforms on the basis of a pitch for long-term prediction, and determining the selected synthesized waveform, a long-term prediction coefficient, the pitch for long-term prediction, and a basic waveform and its length so as to minimize the error power between the partial section of the digital speech signal and a new synthesized waveform formed using the selected synthesized waveform, the long-term prediction coefficient, the pitch for long-term prediction, and the basic waveform and its length; a basic waveform encoding step of encoding the basic waveform obtained in the acoustic analysis step; a speech synthesis step of creating a synthesized waveform on the basis of the long-term prediction coefficient and the pitch for long-term prediction obtained in the acoustic analysis step and the code of the basic waveform and the code of its length obtained by the basic waveform encoding section; and a synthesized waveform storing step of storing the synthesized waveform as a past synthesized waveform, the method outputting the long-term prediction coefficient, the pitch for long-term prediction, the code of the basic waveform, and the code of its length.

5. The speech encoding method according to claim 4, wherein the synthesized waveform is obtained by, for each length of the basic waveform, multiplying the selected synthesized waveform based on the pitch for long-term prediction by the long-term prediction coefficient and adding the decoded basic waveform to the result, and adding these results together up to the length of the analysis section.

6. A speech decoding method comprising: a basic waveform decoding step of decoding a basic waveform on the basis of an input code of the basic waveform and an input code of the length of the basic waveform; a speech waveform decoding step of taking as input a long-term prediction coefficient, a pitch for long-term prediction, and the decoded basic waveform obtained in the basic waveform decoding step together with its length, and decoding a digital speech signal of a predetermined time length; and a D/A conversion step of converting the decoded digital speech signal obtained in the speech waveform decoding step into an analog speech signal.