JP2937322B2

JP2937322B2 - Speech synthesizer

Info

Publication number: JP2937322B2
Application number: JP63081411A
Authority: JP
Inventors: 敏雄吉川
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1988-04-04
Filing date: 1988-04-04
Publication date: 1999-08-23
Anticipated expiration: 2014-08-23
Also published as: JPH01253800A

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a speech synthesizer that performs pitch control.

[Prior Art] Conventionally, speech waveform coding uses the waveform itself and therefore yields natural-sounding speech, but controlling the pitch is difficult, and the pitch of the coded original speech waveform could not be varied continuously. For this reason, waveform data with different pitch periods were prepared in several steps (for example, five), and one was selected from among them (see, for example, Reference 1: Fushikida, Takashima, and Mitome, "A Study of Synthesis by Rule Using an Articulatory Segment Waveform Editing Method," Proceedings of the Acoustical Society of Japan, September-October 1985, pp. 179-180).

In the speech analysis-synthesis method, by contrast, pitch information is extracted from the original speech waveform by linear prediction, and that pitch information can be varied continuously. However, the analysis-synthesis method uses a pseudo sound source: voiced sounds are driven by one pulse per pitch period, and unvoiced sounds are driven by white noise (see, for example, Reference 2: Fumitada Itakura, "A Speech Synthesis Method Using Line Spectrum Frequencies as Parameters and Its LSI Implementation," Nikkei Electronics, February 2, 1981, pp. 128-158).

[Problems to Be Solved by the Invention] As noted above, speech waveform coding uses the waveform itself and therefore yields natural-sounding speech. However, controlling the pitch is difficult, and the pitch of the coded original speech waveform could not be varied continuously. Waveform data with different pitch periods were therefore prepared in several steps (five), and one was selected from among them. In this case, however, the pitch does not vary continuously, and, in addition, a large memory is required because speech waveform data must be stored for each pitch period.

The speech analysis-synthesis method, on the other hand, extracts pitch information from the original speech waveform by linear prediction, and that pitch information can be varied continuously. However, because this method uses a pseudo sound source, driving voiced sounds with one pulse per pitch period and unvoiced sounds with white noise, it has the drawback that natural-sounding speech quality cannot be obtained.

An object of the present invention is to eliminate the above drawbacks and to provide a speech synthesizer that can vary the pitch continuously while still producing natural-sounding speech.

[Means for Solving the Problems] According to the present invention, there is provided a speech synthesizer using a multi-pulse method in which, with input speech divided into frames of fixed duration, the pitch is extracted from the input speech and each pitch period is represented by a fixed number of excitation pulses. The speech synthesizer has interpolation means that linearly interpolates the amplitude information, position information, and linear prediction coefficients of the pulse trains of the representative pitch sections in frames of two speech waveforms of different pitch frequencies analyzed by the multi-pulse method (a speech waveform with a high pitch frequency and a speech waveform with a low pitch frequency), and that supplies the linearly interpolated amplitude information, the linearly interpolated position information, and the linearly interpolated linear prediction coefficients to a synthesis filter. Based on the linearly interpolated amplitude information, position information, and linear prediction coefficients, the synthesis filter outputs, as synthesized speech, a speech waveform whose pitch frequency differs from each of the pitch frequencies of the two speech waveforms.

[Embodiment] Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the present invention. Reference numerals 1 and 2 denote speech files that store speech waveforms obtained by analyzing original speech signals with the pitch-interpolation multi-pulse method using representative pitch section pulse trains; 1 is the speech file with the low pitch frequency, and 2 is the speech file with the high pitch frequency. The pitch-interpolation multi-pulse method using representative pitch section pulse trains is described in Reference 3, so its details are omitted here (Reference 3: Kazunori Ozawa, "A Real-Time Codec Using Multi-Pulse Speech Coding with Little Degradation in Sound Quality Even at 16 kbit/s or Less," Nikkei Electronics, June 16, 1986 (No. 397), pp. 185-215).

FIG. 2 shows the information stored in a speech file. For each frame, the speech file stores a pitch period 21, PARCOR coefficients 22, and sound source information 23.

FIG. 3 shows the format of the sound source information within a frame. For a voiced section, it consists of the amplitude information and position information of the pulse train of the representative pitch section; for an unvoiced section, it consists of the amplitude information and position information of the pulse train of the entire frame. Reference numeral 31 denotes the maximum pulse amplitude P_max in the frame, 32 denotes the amplitude information g_n of the pulse train in the frame, and 33 denotes the position information m_n of the pulse train in the frame. Here, n is an index corresponding to the number of pulses in the frame.
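The per-frame record of FIGS. 2 and 3 can be pictured as a small data structure. The following Python sketch is only an illustration: the class and field names, the choice of sample units for the pitch period and pulse positions, and the `is_voiced` helper are assumptions that do not appear in the patent. Only the stored items (21, 22, 23, 31, 32, 33) and the convention that a pitch period of 0 marks an unvoiced frame are taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceInfo:
    """Sound source information 23 of one frame (FIG. 3)."""
    p_max: float             # item 31: maximum pulse amplitude in the frame
    amplitudes: List[float]  # item 32: g_n, one value per pulse
    positions: List[int]     # item 33: m_n, one value per pulse (assumed unit: samples)

    def __post_init__(self) -> None:
        # Every pulse needs both an amplitude and a position.
        if len(self.amplitudes) != len(self.positions):
            raise ValueError("amplitudes and positions must have the same length")


@dataclass
class Frame:
    """One frame of a speech file (FIG. 2)."""
    pitch_period: int        # item 21 (assumed unit: samples); 0 marks an unvoiced frame
    parcor: List[float]      # item 22: PARCOR coefficients
    source: SourceInfo       # item 23

    def is_voiced(self) -> bool:
        # Hypothetical helper: a non-zero pitch period means a voiced frame.
        return self.pitch_period != 0
```

For a voiced frame the pulse lists would hold only the representative pitch section; for an unvoiced frame they would hold the pulses of the whole frame.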

Returning to FIG. 1, if the pitch period 21 (FIG. 2) of speech file 1 or 2 is set to 0, the voiced/unvoiced decision circuit 3 judges the frame to be unvoiced and feeds the sound source information 9 or 12 to the unvoiced pulse restoration circuit 4 of the sound source pulse restoration unit 16. In an unvoiced section, the sound source information 9 or 12 is the pulse train of the entire frame, so the unvoiced pulse restoration circuit 4 passes it unchanged to the synthesis filter 7. If the pitch period 21 (FIG. 2) of speech file 1 or 2 is set to a value other than 0, the voiced/unvoiced decision circuit 3 judges the frame to be voiced and feeds the sound source information 9 or 12 to the voiced pitch control circuit 5 of the sound source pulse restoration unit 16. In a voiced section, the sound source information 9 or 12 is the pulse train of the representative pitch section, so the voiced pitch control circuit 5, together with the pitch period interpolation circuit 6, which receives the pitch periods 10 and 13 of speech files 1 and 2, performs the following processing.
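The routing performed by the voiced/unvoiced decision circuit 3 can be sketched in a few lines of Python. The function name, the destination labels, and the representation of a pulse train as a list of (position, amplitude) pairs are assumptions made for illustration; the only rule taken from the text is that a pitch period of 0 selects the unvoiced path and any other value selects the voiced path.

```python
from typing import List, Tuple

Pulse = Tuple[int, float]  # (position, amplitude); representation assumed


def route_source(pitch_period: int, pulses: List[Pulse]) -> Tuple[str, List[Pulse]]:
    """Pick the block that receives the sound source information of a frame."""
    if pitch_period == 0:
        # Unvoiced: the pulses cover the whole frame and go, unchanged,
        # through the unvoiced pulse restoration circuit 4 to the synthesis filter 7.
        return ("unvoiced_pulse_restoration_4", pulses)
    # Voiced: the pulses cover only the representative pitch section and are
    # handed to the voiced pitch control circuit 5 for interpolation.
    return ("voiced_pitch_control_5", pulses)
```

The pulse train itself is returned untouched in both cases; any pitch modification happens downstream in the voiced path.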

First, the way in which the amplitude information g_n and position information m_n of the pulse train in each frame stored in speech files 1 and 2 are obtained is explained with reference to FIG. 4. FIG. 4 shows the pulse search and interpolation processing in a voiced section. In the figure, 41 denotes two frames of the original speech waveform, namely the k-th and (k+1)-th frames. The analysis conditions are a sampling frequency of 8 kHz and a speech analysis frame length of 20 ms.
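The stated analysis conditions fix the frame size in samples. As a small worked example, the Python sketch below converts the 8 kHz sampling frequency and 20 ms frame length into a sample count and converts a pitch frequency into a pitch period in samples. The function names are hypothetical, and expressing the pitch period in samples is an assumption, since the patent does not state the unit.

```python
SAMPLING_HZ = 8000  # sampling frequency given in the text
FRAME_MS = 20       # analysis frame length given in the text


def samples_per_frame(sampling_hz: int = SAMPLING_HZ, frame_ms: int = FRAME_MS) -> int:
    """Number of samples in one analysis frame."""
    return sampling_hz * frame_ms // 1000


def pitch_period_in_samples(pitch_hz: float, sampling_hz: int = SAMPLING_HZ) -> float:
    """Length of one pitch period in samples (may be fractional)."""
    if pitch_hz <= 0:
        raise ValueError("pitch frequency must be positive")
    return sampling_hz / pitch_hz
```

Under these conditions a frame holds 160 samples, and a pitch frequency of 200 Hz corresponds to a period of 40 samples, so four pitch periods fit in one frame.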

Reference numeral 42 denotes the pulse trains in the frames. They are obtained by dividing each frame into sub-frame sections, one per pitch period, using the pitch period determined for that frame; interpolating the coefficients of the synthesis filter 7 determined for the frame to obtain filter coefficients for each sub-frame; and then computing a predetermined number of pulses for each sub-frame. The pulse trains of the sub-frames of the k-th frame are written {a_1^k, a_2^k, a_3^k, a_4^k, a_5^k}, and those of the sub-frames of the (k+1)-th frame are written {a_1^(k+1), a_2^(k+1), a_3^(k+1), a_4^(k+1), a_5^(k+1)}.
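The division of a frame into pitch-period sub-frames can be sketched as follows. This is a simplified illustration, not the procedure of Reference 3: it assumes that sub-frames start at the frame boundary, that the pitch period is an integer number of samples, and that a final partial period is kept as a shorter sub-frame. The function name is hypothetical.

```python
from typing import List, Tuple


def split_into_subframes(frame_len: int, pitch_period: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, one per pitch period, covering the frame."""
    if frame_len <= 0 or pitch_period <= 0:
        raise ValueError("frame_len and pitch_period must be positive")
    bounds = []
    start = 0
    while start < frame_len:
        # The last sub-frame is truncated at the frame boundary (assumption).
        end = min(start + pitch_period, frame_len)
        bounds.append((start, end))
        start = end
    return bounds
```

With a 160-sample frame and a 32-sample pitch period this yields five sub-frames, matching the five pulse trains a_1 to a_5 shown for each frame.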

Reference numeral 43 denotes the pulse trains of the representative pitch sections. Each is selected by searching several sections for the one that allows speech closely resembling the original to be reproduced over the entire frame. The representative pitch section pulse train of the k-th frame is {a_2^k}, and that of the (k+1)-th frame is {a_2^(k+1)}.

Next, the operation of the voiced pitch control circuit 5 and the pitch period interpolation circuit 6 will be described with reference to FIGS. 5 to 7.

FIG. 5 shows the pulse trains of the representative pitch section of one frame in the speech files. In this description, the number of pulses per frame is taken to be eight. Reference numeral 51 denotes the pulse train of the representative pitch section of one frame in the speech file with the low pitch frequency (for example, pitch frequency f_a = 150 Hz). Reference numeral 52 denotes the pulse train of the representative pitch section of one frame in the speech file with the high pitch frequency (for example, pitch frequency f_b = 300 Hz); this frame corresponds to the frame of 51. The pulse amplitudes of pulse train 51 are {g_a1, g_a2, ..., g_a8}, and its pulse positions are {m_a1, m_a2, ..., m_a8}. The pulse amplitudes of pulse train 52 are {g_b1, g_b2, ..., g_b8}, and its pulse positions are {m_b1, m_b2, ..., m_b8}.

The method of changing the pitch frequency to 200 Hz is described below.

The pitch frequency is changed by interpolating the position information and amplitude information of the pulse trains of the two speech files prepared in advance, with pitch frequencies of 150 Hz and 300 Hz. The pulse amplitudes {g_c1, g_c2, ..., g_c8} and pulse positions {m_c1, m_c2, ..., m_c8} for a pitch frequency of 200 Hz are obtained by the following equations (1) and (2).

[Equations (1) and (2) are not reproduced in this text.]

Here, n = 1, 2, ..., 8.
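The interpolation of pulse amplitudes and positions can be sketched in Python. Because the exact form of equations (1) and (2) is not available in this text, the sketch assumes one plausible reading: interpolation that is linear in pitch frequency, with weight w = (f_c - f_a) / (f_b - f_a). The function names are hypothetical, and the patent may instead weight by pitch period or round positions to sample indices.

```python
from typing import List, Sequence, Tuple


def interpolation_weight(f_a: float, f_b: float, f_c: float) -> float:
    """Weight of the high-pitch file; 0 reproduces file a, 1 reproduces file b."""
    if f_a == f_b:
        raise ValueError("the two pitch frequencies must differ")
    return (f_c - f_a) / (f_b - f_a)


def interpolate_pulses(
    g_a: Sequence[float], m_a: Sequence[float],
    g_b: Sequence[float], m_b: Sequence[float],
    f_a: float, f_b: float, f_c: float,
) -> Tuple[List[float], List[float]]:
    """Linearly interpolate pulse amplitudes g and positions m, pulse by pulse."""
    if not (len(g_a) == len(m_a) == len(g_b) == len(m_b)):
        raise ValueError("both pulse trains need the same number of pulses")
    w = interpolation_weight(f_a, f_b, f_c)
    g_c = [ga + w * (gb - ga) for ga, gb in zip(g_a, g_b)]
    m_c = [ma + w * (mb - ma) for ma, mb in zip(m_a, m_b)]
    return g_c, m_c
```

For f_a = 150 Hz, f_b = 300 Hz, and f_c = 200 Hz the weight is 1/3, so each interpolated value lies one third of the way from the low-pitch value to the high-pitch value.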

FIG. 6 shows the method of interpolating the position information of the pulse train.

Reference numeral 61 denotes the pulse position information of the pulse train of the representative pitch section of one frame in the speech file with the low pitch frequency (pitch frequency f_a = 150 Hz).

Reference numeral 62 denotes the pulse position information of the pulse train of the representative pitch section of the frame corresponding to 61 in the speech file with the high pitch frequency (pitch frequency f_b = 300 Hz).

Reference numeral 63 denotes the pulse position information of the pulse train of the representative pitch section of the frame corresponding to 61 when the pitch frequency is set to 200 Hz.

FIG. 7 shows the method of interpolating the amplitude information of the pulse train.

Reference numeral 71 denotes the pulse amplitude information of the pulse train of the representative pitch section of one frame in the speech file with the low pitch frequency (pitch frequency f_a = 150 Hz).

Reference numeral 72 denotes the pulse amplitude information of the pulse train of the representative pitch section of the frame corresponding to 71 in the speech file with the high pitch frequency (pitch frequency f_b = 300 Hz).

Reference numeral 73 denotes the pulse amplitude information of the pulse train of the representative pitch section of the frame corresponding to 71 when the pitch frequency is set to 200 Hz.

The pitch frequency can be changed in the manner described above.

In this way, when f_c = 200 Hz is supplied as the pitch control information 17, the pitch period interpolation circuit 6 passes the values known in advance, f_a = 150 Hz for speech file 1 and f_b = 300 Hz for speech file 2, to the pitch control circuit 5, which then computes the pulse amplitudes g_cn and pulse positions m_cn by equations (1) and (2).

The linear prediction coefficient interpolation circuit 8 receives the PARCOR coefficients 11 and 14 of speech files 1 and 2, linearly interpolates them, and feeds the result to the synthesis filter 7. The synthesis filter 7 outputs the synthesized speech 15 from the sound source information supplied by the sound source pulse restoration unit 16 and the PARCOR coefficients supplied by the linear prediction coefficient interpolation circuit 8.
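The PARCOR interpolation step can be sketched in the same way. The text states only that the coefficients are linearly interpolated; using a single weight w between 0 and 1 (for instance the same frequency-based weight used for the pulses) is an assumption, as are the function names. The sketch also checks a standard property of lattice synthesis filters: the all-pole filter is stable when every PARCOR (reflection) coefficient has magnitude below 1, and a weighted average of two such sets with 0 <= w <= 1 keeps that property.

```python
from typing import List, Sequence


def interpolate_parcor(k_a: Sequence[float], k_b: Sequence[float], w: float) -> List[float]:
    """Linearly interpolate two PARCOR coefficient sets; w = 0 gives k_a, w = 1 gives k_b."""
    if len(k_a) != len(k_b):
        raise ValueError("both coefficient sets must have the same order")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1] to stay between the two files")
    return [ka + w * (kb - ka) for ka, kb in zip(k_a, k_b)]


def is_stable(parcor: Sequence[float]) -> bool:
    """True when every reflection coefficient has magnitude below 1."""
    return all(abs(k) < 1.0 for k in parcor)
```

Interpolating in the PARCOR domain is convenient for this reason: two stable coefficient sets cannot produce an unstable interpolated filter, which is not guaranteed when direct-form predictor coefficients are interpolated.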

[Effects of the Invention] As described above, in a speech synthesizer using a speech waveform coding method that has a speech production model (the multi-pulse method), the present invention provides means for linearly interpolating the amplitude information and position information of the pulse trains of the representative pitch sections in frames of two speech waveforms of different pitch frequencies analyzed by the multi-pulse method, and for linearly interpolating the linear prediction coefficients as well. The pitch can thereby be varied continuously by a simple pitch control method that merely changes the pitch period.

Accordingly, if reference speech is decomposed by the multi-pulse method into synthesis units such as phoneme segments or CV-VC phoneme chains and the waveform data are prepared in advance, applying this pitch control during synthesis by rule makes the pitch change smoothly and yields natural-sounding rule-synthesized speech.

Furthermore, because conventional speech waveform coding uses the waveform itself, waveform data with different pitch periods had to be prepared in several steps (five), which required a large memory. The present invention needs only two speech waveforms of different pitch frequencies (one with a high pitch frequency and one with a low pitch frequency), so less memory suffices. In addition, because the multi-pulse method compresses the speech, a compact speech synthesizer can be realized.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech synthesizer according to one embodiment of the present invention. FIG. 2 shows the format of the information stored in the speech files of the speech synthesizer of FIG. 1. FIG. 3 shows the format of the sound source information in a frame. FIG. 4 shows waveforms illustrating the processing method. FIG. 5 shows the pulse trains of the representative pitch section of one frame in the speech files. FIG. 6 shows the method of interpolating the position information of the pulse train, and FIG. 7 shows the method of interpolating the amplitude information of the pulse train.

1: speech file (low pitch frequency); 2: speech file (high pitch frequency); 3: voiced/unvoiced decision circuit; 4: unvoiced pulse restoration circuit; 5: voiced pitch control circuit; 6: pitch period interpolation circuit; 7: synthesis filter; 8: linear prediction coefficient interpolation circuit; 9: sound source information (speech file with low pitch frequency); 10: pitch period (speech file with low pitch frequency); 11: PARCOR coefficients (speech file with low pitch frequency); 12: sound source information (speech file with high pitch frequency); 13: pitch period (speech file with high pitch frequency); 14: PARCOR coefficients (speech file with high pitch frequency); 15: synthesized speech; 16: sound source pulse restoration unit; 17: pitch control information; 21: pitch period in the frame; 23: sound source information in the frame; 31: maximum pulse amplitude in the frame; 32: amplitudes of the pulse train in the frame; 33: positions of the pulse train in the frame; 41: two frames of the original speech waveform; 42: pulse trains in the frames; 43: pulse trains of the representative pitch sections; 51: pulse train of the representative pitch section (speech file with low pitch frequency); 52: pulse train of the representative pitch section (speech file with high pitch frequency); 61: pulse train of the representative pitch section (speech file with low pitch frequency); 62: pulse train of the representative pitch section (speech file with high pitch frequency); 63: pulse train of the representative pitch section (interpolated pitch frequency); 71: pulse train of the representative pitch section (speech file with low pitch frequency); 72: pulse train of the representative pitch section (speech file with high pitch frequency); 73: pulse train of the representative pitch section (interpolated pitch frequency).

Claims

(57) [Claims]

1. A speech synthesizer using a multi-pulse method in which, with input speech divided into frames of fixed duration, a pitch is extracted from the input speech and each pitch period is represented by a fixed number of excitation pulses, the speech synthesizer comprising interpolation means for linearly interpolating amplitude information, position information, and linear prediction coefficients of pulse trains of representative pitch sections in frames of two speech waveforms of different pitch frequencies analyzed by the multi-pulse method, and for supplying the linearly interpolated amplitude information, the linearly interpolated position information, and the linearly interpolated linear prediction coefficients to a synthesis filter, wherein the synthesis filter outputs, as synthesized speech, a speech waveform having a pitch frequency different from each of the pitch frequencies of the two speech waveforms, based on the linearly interpolated amplitude information, the linearly interpolated position information, and the linearly interpolated linear prediction coefficients.