JPH01152500A

JPH01152500A - Voice synthesizer

Info

Publication number: JPH01152500A
Application number: JP62309412A
Authority: JP
Inventors: Toshio Yoshikawa; 敏雄吉川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1987-12-09
Filing date: 1987-12-09
Publication date: 1989-06-14
Anticipated expiration: 2013-09-17
Also published as: JP2797300B2

Abstract

PURPOSE: To vary the pitch continuously under simple pitch control by providing a means that interpolates the pulse train of one frame in a voiced section, padding the representative section with 0's after the pulse train when the pitch period is lengthened and overlapping the pulse trains when the pitch period is shortened. CONSTITUTION: In the voiced section, a means interpolates the pulse train of one frame from the pulse train in an arbitrary representative pitch section within the frame. The representative section is padded with 0's (no pulse) after the pulse train when the pitch period is lengthened, and the pulse trains of the representative section are overlapped when the pitch period is shortened. Namely, when pitch control information 12 is longer than a pitch period 10, an interpolating circuit 7 interpolates while lengthening the pitch period 10; when it is shorter, the interpolating circuit 7 interpolates while shortening the pitch period 10. Consequently, discontinuity at the connection points between frames can be prevented.

Description

[Detailed description of the invention]

[Field of industrial application]

The present invention relates to a speech synthesizer using the multi-pulse method, in which input speech is divided into frames of a fixed time length, the pitch is extracted from the input speech, and each pitch period is represented by a fixed number of driving sound source pulses. In particular, it relates to a speech synthesizer that performs pitch control and amplitude control.

[Prior art]

Conventional speech synthesizers use either a speech waveform coding method or a speech analysis-synthesis method. The former, the speech waveform coding method, uses the waveform itself and therefore yields natural speech quality, but controlling the pitch is difficult, and the pitch of the coded original speech waveform could not be varied continuously. For this reason, waveform data with several different pitch periods were prepared in advance and one of them was selected. Such a method is described, for example, in Fushikida, Takashima, and Mitome, "A Study of Synthesis by Rule Using an Articulatory Segment Waveform Editing Method," Proceedings of the Acoustical Society of Japan, September-October 1985, pp. 179-180.

The latter, the speech analysis-synthesis method, extracts pitch information from the original speech waveform by linear prediction, and this pitch information can be changed continuously. However, the speech analysis-synthesis method uses a pseudo sound source: voiced sounds are driven by one pulse per pitch period, and unvoiced sounds are driven by white noise. Such a method is described, for example, in Fumitada Itakura, "Speech synthesis method using line spectrum frequencies as parameters and its LSI implementation," Nikkei Electronics, Feb. 2, 1981, pp. 128-158.

[Problem that the invention seeks to solve]

The speech waveform coding method used in the conventional speech synthesizer described above uses the waveform itself and therefore yields natural speech quality, but controlling the pitch is difficult, and the pitch of the coded original speech waveform could not be varied continuously. For this reason, waveform data with several different pitch periods were prepared in advance and one of them was selected. In that case, however, the pitch is not continuously variable, and because speech waveform data must be stored in memory for each pitch period, a large-capacity memory is required.

The speech analysis-synthesis method used in the conventional speech synthesizer extracts pitch information from the original speech waveform by linear prediction, and this pitch information can be changed continuously. However, because the method uses a pseudo sound source, driving voiced sounds with one pulse per pitch period and unvoiced sounds with white noise, natural speech quality cannot be obtained.

[Means for solving the problems]

The speech synthesizer of the present invention uses the multi-pulse method, in which input speech is divided into frames of a fixed time length, the pitch is extracted from the input speech, and each pitch period is represented by a fixed number of driving sound source pulses. In a voiced section, it uses interpolation means that interpolates a pulse train for one frame from the pulse train of an arbitrary representative pitch section within the frame. When the pitch period is lengthened in the voiced section, it uses means that pads zeros after the pulse train of the representative pitch section; when the pitch period is shortened in the voiced section, it uses means that overlaps the pulse trains of the representative pitch section. It also has means for correcting the frame-to-frame change in speech power caused by the interpolation means.

[Operation]

In the present invention, a speech synthesizer using a speech waveform coding method that has a speech production model (the multi-pulse method) uses, in a voiced section, means that interpolates a pulse train for one frame from the pulse train of an arbitrary representative pitch section within the frame. When the pitch period is lengthened, 0's (no pulse) are padded after the pulse train of the representative section; when the pitch period is shortened, the pulse trains of the representative section are overlapped.

[Example]

An embodiment of the present invention is described in detail below with reference to the drawings.

The pitch-interpolation multi-pulse method using a representative pitch section pulse train is described in, for example, Kazunori Ozawa, "A real-time codec using a multi-pulse speech coding method with little degradation of sound quality even at 16 kbit/s or less," Nikkei Electronics, June 16, 1986 (no. 397), pp. 185-215, so the details are omitted here.

FIG. 1 is a block diagram showing one embodiment of the present invention.

In the figure, 1 is an audio file that stores the result of analyzing the original speech signal with the pitch-interpolation multi-pulse method. FIG. 2 shows the information stored in the audio file 1. For each frame, the audio file 1 stores a pitch period 10, PARCOR coefficients 11, and sound source information 9. In a voiced section, the sound source information 9 is the amplitude and position information of the pulse train of the representative pitch section; in an unvoiced section, it is the amplitude and position information of the pulse train of the entire frame.

2 is a voiced/unvoiced determination circuit. If the pitch period 10 is set to 0 (zero), the voiced/unvoiced determination circuit 2 judges the frame to be unvoiced and sends the sound source information 9 to the pulse restoration circuit 3. If the pitch period 10 is set to a value other than 0, it judges the frame to be voiced and sends the sound source information 9 to the pitch control circuit 4.
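The routing rule just described can be sketched in a few lines. This is only an illustration under assumptions: the dictionary layout of a frame and the names `is_voiced` and `route_frame` are hypothetical and do not come from the patent; only the rule "pitch period 0 means unvoiced" is taken from the text.

```python
def is_voiced(pitch_period):
    """A frame is treated as voiced when its stored pitch period is non-zero."""
    return pitch_period != 0

def route_frame(frame):
    """Decide which block receives the sound source information.

    `frame` is a hypothetical dict with keys 'pitch_period' and 'source_info'.
    Voiced frames go to the pitch control circuit (4 in FIG. 1),
    unvoiced frames go to the pulse restoration circuit (3 in FIG. 1).
    """
    if is_voiced(frame["pitch_period"]):
        return ("pitch_control", frame["source_info"])
    return ("pulse_restoration", frame["source_info"])
```

For example, `route_frame({"pitch_period": 0, "source_info": []})` selects the pulse restoration path.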

In an unvoiced section, the sound source information 9 is the pulse train of the entire frame, so it is input to the synthesis filter 6 as is. In a voiced section, the sound source information 9 is the pulse train of the representative pitch section.

5 is an amplitude control circuit that receives the output of the pitch control circuit 4. 7 is an interpolation circuit that receives the pitch period 10 and the pitch control information 12 and sends its output to the pitch control circuit 4. 8 is an interpolation circuit that receives the PARCOR coefficients 11 and sends its output to the synthesis filter 6. 13 is the synthesized speech output by the synthesis filter 6.

The pulse restoration circuit 3, the pitch control circuit 4, and the amplitude control circuit 5 together constitute a sound source pulse restoration section 14. The pitch control circuit 4 and the interpolation circuit 7 constitute three means: the interpolation means that, in a voiced section, interpolates a pulse train for one frame from the pulse train of an arbitrary representative pitch section within the frame; the means that pads zeros after the pulse train of the representative pitch section when the pitch period is lengthened in the voiced section; and the means that overlaps the pulse trains of the representative pitch section when the pitch period is shortened in the voiced section. The amplitude control circuit 5 constitutes the means for correcting the frame-to-frame change in speech power caused by the interpolation means.

FIG. 3 is an explanatory diagram showing the pulse search and interpolation processing in a voiced section.

In the figure, 21 is two frames of the original speech waveform, showing the k-th and (k+1)th frames. The analysis conditions are a sampling frequency of 8 kHz and a speech analysis frame length of 20 ms.
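As a quick check of these analysis conditions, the frame length in samples follows directly from the sampling frequency and the frame duration. The snippet below is illustrative only; the names `SAMPLE_RATE_HZ`, `FRAME_MS`, `FRAME_LEN`, and `periods_per_frame`, and the 40-sample example pitch period, are assumptions and do not appear in the patent.

```python
SAMPLE_RATE_HZ = 8000   # sampling frequency stated in the text
FRAME_MS = 20           # analysis frame length stated in the text

# Samples per frame: 8000 samples/s * 0.020 s
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000

def periods_per_frame(pitch_period_samples, frame_len=FRAME_LEN):
    """Number of whole pitch periods (sub-frames) that fit in one frame."""
    if pitch_period_samples <= 0:
        raise ValueError("pitch period must be positive for a voiced frame")
    return frame_len // pitch_period_samples

print(FRAME_LEN)               # 160
print(periods_per_frame(40))   # 4 (a 40-sample period is 5 ms at 8 kHz)
```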

22 is the pulse train of each frame. It is obtained as follows: using the pitch period determined for each frame, the frame is divided into sub-frame sections, one per pitch period; the synthesis filter coefficients determined for the frame are interpolated to obtain filter coefficients for each sub-frame; and a predetermined number of pulses is then computed for each sub-frame. The pulse trains of the sub-frames of the k-th frame are denoted $(a_1^{k}, a_2^{k}, a_3^{k}, a_4^{k}, a_5^{k})$, and those of the (k+1)th frame are denoted $(a_1^{k+1}, a_2^{k+1}, a_3^{k+1}, a_4^{k+1}, a_5^{k+1})$.

23 is the pulse train of the representative pitch section. It is selected by searching several candidate sections for the one that reproduces speech closest to the original speech over the entire frame. The representative pitch section pulse train of the k-th frame is denoted $(a_2^{k})$, and that of the (k+1)th frame is denoted $(a_2^{k+1})$.

Here, the pitch period 10 in the audio file 1 of FIG. 1 is the pitch period obtained by analyzing the original speech signal, and the pitch control information 12 is the value of the pitch period to which the pitch period 10 of the original speech signal is to be changed.

When the pitch control information 12 is longer than the pitch period 10, the interpolation circuit 7 interpolates while lengthening the pitch period 10; when it is shorter, the interpolation circuit 7 interpolates while shortening the pitch period 10. The interpolation method is described below.

24 in FIG. 3 is a processed waveform showing the reproduced pulse train when the pitch period is lengthened, and FIG. 4 is an enlarged view of it.

The pitch period is extended by padding 0's after pulse train I 26 $(a_2^{k+1})$ of the (k+1)th representative pitch section. Pulse train II 27 $(a_2^{k+1})$ of the (k+1)th representative pitch section is then connected after it, and this is repeated to generate the pulse train of the extended frame, $(b_1^{k+1}, b_2^{k+1}, b_3^{k+1}, b_4^{k+1})$. As this shows, no sound source pulses are set in the section by which the pitch period is extended.
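The zero-padding step can be sketched as follows. This is a minimal illustration, not the circuit described in the patent: the representation of a pulse train as `(position, amplitude)` pairs, the function name `stretch_pulses`, and the example values are all assumptions.

```python
def stretch_pulses(rep_pulses, rep_period, target_period, frame_len):
    """Repeat the representative pulse train at a longer pitch period.

    rep_pulses: list of (position, amplitude), positions in [0, rep_period).
    The extra (target_period - rep_period) samples after each copy are left
    at zero, i.e. no pulses are placed in the extended part of the period.
    """
    if target_period < rep_period:
        raise ValueError("use this sketch only when lengthening the period")
    excitation = [0.0] * frame_len
    for offset in range(0, frame_len, target_period):
        for pos, amp in rep_pulses:
            n = offset + pos
            if n < frame_len:          # drop pulses past the frame end
                excitation[n] += amp
    return excitation

# Representative period of 5 samples stretched to 8 samples in a 20-sample frame.
out = stretch_pulses([(0, 1.0), (3, -0.5)], rep_period=5, target_period=8, frame_len=20)
print([i for i, x in enumerate(out) if x != 0.0])   # [0, 3, 8, 11, 16, 19]
```

In this toy case samples 5 to 7 of each stretched period stay at zero, which corresponds to the padded part of the period.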

25 is a processed waveform showing the reproduced pulse train when the pitch period is shortened, and FIG. 5 is an enlarged view of it.

The pitch period is compressed by overlapping pulse train I 26 $(a_2^{k+1})$ of the (k+1)th representative pitch section with pulse train II 27 $(a_2^{k+1})$ of the (k+1)th representative pitch section. Further copies of the representative pitch section pulse train $(a_2^{k+1})$ are then connected in the same way, and this is repeated to generate the pulse train of the compressed frame, $(c_1^{k+1}, c_2^{k+1}, c_3^{k+1}, c_4^{k+1}, c_5^{k+1}, c_6^{k+1})$. In an overlapped section, pulses from both pulse train I 26 and pulse train II 27 of the representative pitch section may be set. This is because keeping the pulses in the overlapped section, rather than reducing the sound source information, is considered less likely to degrade speech quality.

Here, pulse train I 26 and pulse train II 27 of the representative pitch section are the same pulse train.
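The overlap step for a shortened pitch period can be sketched in the same style. The `(position, amplitude)` representation, the name `compress_pulses`, and in particular the choice to add amplitudes when two pulses land on the same sample are assumptions; the text only states that pulses of both copies may be set in the overlapped section.

```python
def compress_pulses(rep_pulses, rep_period, target_period, frame_len):
    """Repeat the representative pulse train at a shorter pitch period.

    Successive copies start every target_period samples, so the tail of one
    copy overlaps the head of the next. Pulses from both copies are kept;
    coinciding pulses are summed (an assumption of this sketch).
    """
    if not 0 < target_period <= rep_period:
        raise ValueError("use this sketch only when shortening the period")
    excitation = [0.0] * frame_len
    for offset in range(0, frame_len, target_period):
        for pos, amp in rep_pulses:
            n = offset + pos
            if n < frame_len:          # drop pulses past the frame end
                excitation[n] += amp
    return excitation

# Representative period of 6 samples compressed to 4 samples in a 12-sample frame.
out = compress_pulses([(0, 1.0), (5, 0.5)], rep_period=6, target_period=4, frame_len=12)
print([(i, x) for i, x in enumerate(out) if x != 0.0])
# [(0, 1.0), (4, 1.0), (5, 0.5), (8, 1.0), (9, 0.5)]
```

Sample 5 holds a pulse from the first copy even though the second copy has already started at sample 4, which is the overlap described above.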

At the boundary between frames, the connection is made so that the expansion or compression of the (k+1)th frame starts after the part of the pitch section, expanded or compressed in the k-th frame, that extends into the (k+1)th frame. This prevents discontinuities at the connection points between frames.
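One way to picture this boundary rule is to keep a running sample index for the start of the next pitch period and carry it across frames. The sketch below is an assumption-laden illustration: the function name `render_frames`, the per-frame tuple layout, and the summing of coinciding pulses are hypothetical choices, not details given in the patent.

```python
def render_frames(frames, frame_len):
    """Concatenate pitch-modified excitation over several voiced frames.

    frames: list of (rep_pulses, target_period), one entry per frame, where
    rep_pulses is a list of (position, amplitude).
    A pitch period that starts in frame k may extend into frame k+1; frame
    k+1 begins placing its own copies only after that overhang.
    """
    total = frame_len * len(frames)
    out = [0.0] * total
    offset = 0  # absolute index where the next pitch period begins
    for k, (rep_pulses, target_period) in enumerate(frames):
        frame_end = (k + 1) * frame_len
        while offset < frame_end:
            for pos, amp in rep_pulses:
                if offset + pos < total:
                    out[offset + pos] += amp
            offset += target_period
    return out

# Frame 0 uses a 4-sample period, frame 1 a 3-sample period, 10 samples per frame.
out = render_frames([([(0, 1.0)], 4), ([(0, 2.0)], 3)], frame_len=10)
print([i for i, x in enumerate(out) if x != 0.0])   # [0, 4, 8, 12, 15, 18]
```

The last period of frame 0 starts at sample 8 and runs to sample 12, so frame 1 places its first pulse at sample 12 rather than at its nominal start, sample 10.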

In the voiced-section interpolation described above, lengthening the pitch period reduces the number of pulses in the frame, so the sound source power decreases; shortening the pitch period increases the number of pulses in the frame, so the sound source power increases. An example of an amplitude control method that evens out this sound source power is described next.

When the pitch period is changed, the amplitude control circuit 5 in FIG. 1 multiplies the amplitudes of the pulse train by an amplitude correction value. This amplitude correction value is the number of pulses when the pitch period is not changed divided by the number of pulses when the pitch period is changed.
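The correction value defined above is a simple ratio of pulse counts. The sketch below computes it for a toy frame; the helper names `count_placed_pulses`, `amplitude_correction`, and `apply_correction`, the pulse positions, and the way pulses are counted (copies starting every pitch period, truncated at the frame end) are assumptions for illustration.

```python
def count_placed_pulses(rep_positions, period, frame_len):
    """Count pulses that fall inside the frame when the representative
    pulse positions are repeated every `period` samples."""
    return sum(
        1
        for offset in range(0, frame_len, period)
        for pos in rep_positions
        if offset + pos < frame_len
    )

def amplitude_correction(n_pulses_unchanged, n_pulses_changed):
    """Correction value: pulses without pitch change / pulses with pitch change."""
    if n_pulses_changed == 0:
        raise ValueError("no pulses in the modified frame")
    return n_pulses_unchanged / n_pulses_changed

def apply_correction(excitation, gain):
    """Multiply every pulse amplitude by the correction value."""
    return [gain * x for x in excitation]

positions = [0, 5, 12, 30]   # hypothetical pulse positions in a 40-sample period
n_orig = count_placed_pulses(positions, period=40, frame_len=160)   # 16 pulses
n_long = count_placed_pulses(positions, period=80, frame_len=160)   # 8 pulses
print(amplitude_correction(n_orig, n_long))   # 2.0
```

Doubling the pitch period halves the pulse count here, so the amplitudes are scaled by 2.0; shortening the period gives a value below 1.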

The pulse train resulting from this amplitude correction is input to the synthesis filter 6. The synthesis filter 6 synthesizes speech using the PARCOR coefficients 11 of the audio file 1, interpolated for each sub-frame by the interpolation circuit 8. PARCOR coefficients are used here as an example, but other linear prediction coefficients, such as α coefficients or LSP coefficients, may be used instead.
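For readers unfamiliar with PARCOR-based synthesis, a generic all-pole lattice filter driven by an excitation sequence can be sketched as below. This is a textbook-style structure under an assumed sign convention, not the filter specified in the patent; the function name `lattice_synthesis` and the example coefficient are hypothetical, and per-sub-frame coefficient interpolation is omitted.

```python
def lattice_synthesis(excitation, k):
    """All-pole lattice synthesis filter.

    excitation: list of input samples (the restored pulse train).
    k: reflection (PARCOR) coefficients k[0..M-1]; the sign convention used
       here gives y(n) = x(n) - k[0] * y(n-1) for M = 1 (an assumption).
    """
    order = len(k)
    b_prev = [0.0] * (order + 1)      # backward signals b_m(n-1), m = 0..M
    output = []
    for x in excitation:
        f = x                         # forward signal f_M(n)
        b = [0.0] * (order + 1)
        for m in range(order, 0, -1):
            f = f - k[m - 1] * b_prev[m - 1]        # f_{m-1}(n)
            b[m] = k[m - 1] * f + b_prev[m - 1]     # b_m(n)
        b[0] = f                      # b_0(n) = f_0(n) = y(n)
        output.append(f)
        b_prev = b
    return output

# First-order example: impulse response of y(n) = x(n) + 0.5 * y(n-1).
print(lattice_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5]))   # [1.0, 0.5, 0.25, 0.125]
```

With all coefficients set to zero the filter passes the excitation through unchanged, which is a convenient sanity check.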

[Effect of the invention]

As described above, in a speech synthesizer using a speech waveform coding method that has a speech production model (the multi-pulse method), the present invention uses, in a voiced section, means that interpolates a pulse train for one frame from the pulse train of an arbitrary representative pitch section within the frame. When the pitch period is lengthened, 0's (no pulse) are padded after the pulse train of the representative section; when the pitch period is shortened, the pulse trains of the representative section are overlapped. With this simple pitch control method, the pitch can be varied continuously.

In addition, the frame-to-frame change in speech power caused by this pitch control method, which would otherwise make the speech amplitude unstable, can be corrected.

Therefore, if reference speech is decomposed by the multi-pulse method into synthesis units such as phoneme segments or CV-VC phoneme chains and the waveform data is prepared in advance, applying this pitch control during synthesis by rule makes the pitch change smoothly and yields natural-sounding rule-synthesized speech.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing one embodiment of the present invention. FIG. 2 is an explanatory diagram showing the information stored in the audio file of FIG. 1. FIG. 3 is an explanatory diagram showing the pulse search and interpolation processing in a voiced section. FIG. 4 is an enlarged view of the processed waveform when the pitch period is lengthened. FIG. 5 is an enlarged view of the processed waveform when the pitch period is shortened.

1: audio file; 2: voiced/unvoiced determination circuit; 3: pulse restoration circuit; 4: pitch control circuit; 5: amplitude control circuit; 6: synthesis filter; 7, 8: interpolation circuits.

Patent applicant: NEC Corporation

Claims

[Claims]

A speech synthesizer using a multi-pulse method in which input speech is divided into frames of a fixed time length, the pitch is extracted from the input speech, and each pitch period is represented by a fixed number of driving sound source pulses, the speech synthesizer comprising: interpolation means for interpolating, in a voiced section, a pulse train for one frame from the pulse train of an arbitrary representative pitch section within the frame; means for padding zeros after the pulse train of the representative pitch section when the pitch period is lengthened in the voiced section; means for overlapping the pulse trains of the representative pitch section when the pitch period is shortened in the voiced section; and means for correcting the frame-to-frame change in speech power caused by the interpolation means.