JPS6159400A

JPS6159400A - Voice synthesizer

Info

Publication number: JPS6159400A
Application number: JP59181220A
Authority: JP
Inventors: 敏郎柴沼
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-08-30
Filing date: 1984-08-30
Publication date: 1986-03-26
Also published as: JPH055119B2

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a speech synthesis device that has a phoneme parameter storage section, which stores parameter time series for phonemes and similar units, and a word/phrase parameter storage section, which stores parameter time series for words, phrases, and similar units. The device can create the parameter time series for a given reading sequence by combining parameter time series that exist in the collection formed by these two storage sections.

[Conventional technology and problems]

It is well known to synthesize speech from a character string using a PACOR-type speech synthesizer or the like. In a conventional speech synthesis device, the PACOR coefficients corresponding to each character of the reading sequence are retrieved from the phoneme parameter storage section and joined to create the PACOR coefficients for the whole reading sequence. Simply joining the PACOR coefficient time series of the individual phonemes produces unnatural speech, so the time series must be joined with interpolation. Even with such interpolation, however, natural speech could not be obtained.
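The text does not say which interpolation the conventional devices used. As a minimal sketch only, assuming linear interpolation across each join and a hypothetical frame layout (a list of floats per frame), joining two per-phoneme coefficient time series might look like this. The names `Frame`, `join_with_interpolation`, and `bridge` are illustrative and do not come from the patent.

```python
from typing import List

Frame = List[float]  # one frame of synthesis coefficients (hypothetical layout)


def join_with_interpolation(a: List[Frame], b: List[Frame], bridge: int = 2) -> List[Frame]:
    """Concatenate two coefficient time series, inserting `bridge` frames that
    move linearly from the last frame of `a` to the first frame of `b`.

    Linear interpolation is an assumption made for this sketch.
    """
    if not a or not b:
        # Nothing to bridge when either side is empty.
        return list(a) + list(b)
    last, first = a[-1], b[0]
    inserted: List[Frame] = []
    for step in range(1, bridge + 1):
        t = step / (bridge + 1)  # strictly between 0 and 1
        inserted.append([(1 - t) * x + t * y for x, y in zip(last, first)])
    return list(a) + inserted + list(b)
```

This smooths only the join itself. It cannot restore coarticulation inside the stored units, which is consistent with the limitation described above.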

[Purpose of the invention]

The present invention is based on the above considerations. Its object is to provide a speech synthesis device that can synthesize speech extremely close to natural utterance.

[Means to achieve the purpose]

To that end, the speech synthesis device of the present invention comprises: a phoneme parameter storage section that stores parameter time series for phonemes and similar units from which arbitrary words can be synthesized; a word/phrase parameter storage section that stores parameter time series for words, phrases (bunsetsu), or longer units; and parameter time series creation means that, when asked to set the parameter time series for a reading sequence, creates it using the parameter time series that exist in the collection formed by the phoneme parameter storage section and the word/phrase parameter storage section. The device is characterized in that, when no parameter time series for the whole reading sequence exists in the collection and the parameter time series for the reading sequence can be expressed by more than one combination of parameter time series existing in the collection, the parameter time series creation means determines which of these combinations most improves the sound quality and creates the parameter time series for the reading sequence using the combination so determined.

[Embodiments of the invention]

The present invention is described below with reference to the drawings.

FIG. 1 shows the configuration of one embodiment of the present invention, and FIG. 2 shows the processing of the parameter combination determination section of FIG. 1.

In FIG. 1, 1 denotes a text storage section, 2 a text analysis section, 3 a prosody setting section, 4 a parameter conversion section, 5 a parameter combination determination section, 6 the phoneme parameter storage section, and 7 the word/phrase parameter storage section.

The text storage section 1 stores mixed kanji-kana text in coded form. The text analysis section 2 has a word dictionary, a grammar dictionary, and the like, and uses them to convert a character string retrieved from the text storage section 1 into a word string. A word string is a sequence of word information items, each consisting of the word's reading, its grammatical information (part-of-speech type), its mora count, its accent information, and so on. The word string output by the text analysis section 2 is sent to the prosody setting section 3 and the parameter conversion section 4. The prosody setting section 3 sets breath-group boundaries in the word string and creates a pitch pattern for each breath-group interval.
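The word string described above can be pictured as a simple record type. The following is a sketch under stated assumptions: the field names, the string encoding of the part of speech, and the integer encoding of the accent information are all hypothetical, since the text only lists which kinds of information a word string carries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordInfo:
    reading: str         # reading of the word (kana)
    part_of_speech: str  # grammatical information (part-of-speech type)
    mora_count: int      # number of morae (beats) in the word
    accent: int          # accent information; this integer encoding is an assumption


WordString = List[WordInfo]


def reading_sequence(words: WordString) -> str:
    """Concatenate the readings of a word string into one reading sequence."""
    return "".join(w.reading for w in words)
```

For instance, a word string holding the two hypothetical entries `WordInfo("おんせい", "noun", 4, 1)` and `WordInfo("ごうせい", "noun", 4, 0)` yields the reading sequence `おんせいごうせい`.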

The pitch pattern for a breath-group interval has several peaks. The pitch pattern is divided peak by peak, and the phrase (bunsetsu) boundaries corresponding to these divisions are notified to the parameter conversion section 4. The parameter conversion section 4 splits the reading sequence sent from the text analysis section 2 at the notified phrase boundaries and sends the resulting phrase reading sequences to the parameter combination determination section 5.

The parameter combination determination section 5 refers to the collection formed by the phoneme parameter storage section 6 and the word/phrase parameter storage section 7 and determines the optimal combination of parameter time series for the phrase reading sequence sent from the parameter conversion section 4. It then retrieves the parameter time series identified by this determination from the collection and sends them to the parameter conversion section 4. The parameter conversion section 4 joins the parameter time series sent from the parameter combination determination section 5 to create the parameter time series for the phrase reading sequence. This parameter time series and the corresponding pitch pattern are sent to the speech synthesis section 8.

The speech synthesis section 8 is, for example, of the PACOR type.

FIG. 2 shows the processing of the parameter combination determination section. The parameter combination determination section 5 performs the following processing.

(1) Set the variable A, which indicates a position in the reading sequence, to n, where n is the number of readings in the reading sequence. Check whether a parameter time series corresponding to the 1st through n-th readings is stored in the word/phrase parameter storage section 7. If it is, send it to the parameter conversion section 4. If not, set the variable to n-1 and check whether a parameter time series corresponding to the 1st through (n-1)-th readings is in the word/phrase parameter storage section 7. If it is, send it to the parameter conversion section 4; if not, decrement the variable by 1. Repeat this process in order. When the variable reaches 1, retrieve the parameter time series corresponding to the first reading from the phoneme parameter storage section 6 and send it to the parameter conversion section 4.

(2) After the parameter time series corresponding to the 1st through (n-i)-th readings (i = 0, 1, ..., n-1) has been sent to the parameter conversion section 4, perform the same processing as in (1) on the remaining reading sequence.

(3) Check whether the end of the phrase has been reached, that is, whether the remaining reading sequence has length 0. If not, repeat the processing of (2).

Next, the parameter combination determination of the present invention is explained with a concrete example. Suppose that, for the reading sequence 「おんせいごうせい」 (onseigousei), speech parameters are stored for 「お」「ん」「せ」「い」「ご」「う」「せ」「い」「おん」「せい」「ごう」「せい」「ごうせい」. In that case the combination 「おん」+「せい」+「ごうせい」 is selected.
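The FIG. 2 procedure and the example above can be sketched as a longest-match search from the left. This is an illustrative sketch, not the patented implementation. It assumes that one reading corresponds to one character of the string and that the two storage sections can be modeled as sets of stored unit strings. The function and parameter names are hypothetical.

```python
from typing import List, Set


def longest_match_units(reading: str, phrase_units: Set[str], phoneme_units: Set[str]) -> List[str]:
    """Split `reading` into stored units by repeatedly taking the longest
    prefix found in the word/phrase store, falling back to the phoneme store
    for a single reading when no longer unit matches."""
    chosen: List[str] = []
    rest = reading
    while rest:                        # step (3): stop when nothing remains
        a = len(rest)                  # step (1): the variable starts at n
        while a > 1 and rest[:a] not in phrase_units:
            a -= 1                     # shrink the candidate prefix by one reading
        if a == 1 and rest[0] not in phoneme_units:
            # The variable reached 1 but the phoneme store has no such unit.
            raise KeyError(f"no stored unit for {rest[0]!r}")
        chosen.append(rest[:a])
        rest = rest[a:]                # step (2): continue with the remainder
    return chosen


phrase_units = {"おん", "せい", "ごう", "ごうせい"}
phoneme_units = {"お", "ん", "せ", "い", "ご", "う"}
print(longest_match_units("おんせいごうせい", phrase_units, phoneme_units))
# ['おん', 'せい', 'ごうせい']
```

With the stored units of the example, the search first fails on every prefix longer than 「おん」, then takes 「せい」, and finally finds the whole remainder 「ごうせい」 in the word/phrase store.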

Instead of determining the combination of parameter time series as in FIG. 2, it is also possible to select the combination that has the fewest elements.
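The text does not say how the combination with the fewest elements would be found. One possible approach, shown here purely as an assumption, is dynamic programming over prefixes of the reading sequence. The function name `fewest_units` and the modeling of both storage sections as a single set of strings are hypothetical.

```python
from typing import List, Optional, Set


def fewest_units(reading: str, units: Set[str]) -> Optional[List[str]]:
    """Return a split of `reading` into stored units that uses the fewest
    units, or None if no split exists (dynamic programming over prefixes)."""
    n = len(reading)
    # best[k] holds a minimal split of reading[:k], or None if unreachable.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            head = best[start]
            piece = reading[start:end]
            if head is None or piece not in units:
                continue
            current = best[end]
            if current is None or len(head) + 1 < len(current):
                best[end] = head + [piece]
    return best[n]
```

The two criteria can disagree. With the hypothetical units `abc`, `ab`, `cde` and the single letters `a` to `e`, a left-to-right longest match on `abcde` takes `abc`, `d`, `e` (three elements), whereas `fewest_units` returns `ab`, `cde` (two elements).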

[Effect of the invention]

As is clear from the above description, the present invention makes it possible to convert any sentence into speech that is close to natural speech.

[Brief explanation of the drawings]

FIG. 1 shows the configuration of one embodiment of the present invention, and FIG. 2 shows the processing of the parameter combination determination section of FIG. 1.

1: text storage section, 2: text analysis section, 3: prosody setting section, 4: parameter conversion section, 5: parameter combination determination section, 6: phoneme parameter storage section, 7: word/phrase parameter storage section.

Claims

[Claims]

A speech synthesis device comprising: a phoneme parameter storage section that stores parameter time series for phonemes and similar units from which arbitrary words can be synthesized; a word/phrase parameter storage section that stores parameter time series for words, phrases, or longer units; and parameter time series creation means that, when asked to set the parameter time series for a reading sequence, creates the parameter time series for the reading sequence using the parameter time series existing in the collection formed by the phoneme parameter storage section and the word/phrase parameter storage section, characterized in that, when no parameter time series for the whole reading sequence exists in the collection and the parameter time series for the reading sequence can be expressed by a plurality of combinations of parameter time series existing in the collection, the parameter time series creation means determines which of these combinations most improves the sound quality and creates the parameter time series for the reading sequence using the combination so determined.