JPH07140996A

JPH07140996A - Speech rule synthesizer

Info

Publication number: JPH07140996A
Application number: JP5286863A
Authority: JP
Inventors: Yoshiaki Teramoto (寺本 良明)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-11-16
Filing date: 1993-11-16
Publication date: 1995-06-02

Abstract

PURPOSE: To make it easy to correct phoneme durations in a speech rule synthesizer. CONSTITUTION: A text analysis means 101 analyzes input text and obtains information on its phonemes and prosody. Based on this information, a pitch pattern generating means 102 generates a pitch pattern that represents the variation of the fundamental frequency over time, and a duration generating means 103 generates a sequence of durations, one for each phoneme. An input means 111 accepts a pair consisting of a phoneme sequence and the duration of each phoneme in that sequence, and a storage means 112 stores phoneme sequences together with their corresponding duration sequences. When a phoneme sequence is input, a replacement means 113 refers to the storage means 112; if a matching duration sequence is found, it replaces the duration sequence from the means 103 with the one retrieved from the means 112 and sends it to a waveform generating means 104.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech rule synthesizer used in voice response systems and the like.

[0002] A voice response system answers telephone inquiries by voice about the contents of a database that changes from moment to moment, such as job listings or ticket reservation information.

[0003] In such a voice response system, the content of the responses varies widely as the database contents change, so a speech rule synthesizer that can synthesize read-aloud speech for arbitrary input text is used.

[0004]

[Prior Art] FIG. 13 shows the configuration of a conventional Japanese speech rule synthesizer. In this synthesizer, the text analysis unit 201 receives, for example, a sentence written in mixed kanji and kana, analyzes the input text with reference to the word dictionary 202, and generates reading information and prosody information. The reading information and prosody information are sent to the pitch pattern generation unit 203 and the duration generation unit 204, and are also held in the phoneme sequence buffer 205.

[0005] Here, the reading information represents the phoneme sequence of the input text. For example, for the input text "朝早くバンガローに電報が届いた" ("A telegram arrived at the bungalow early in the morning"), reading information representing the phoneme sequence "アサハヤクバンガローニデンポウガトドイタ" is obtained. The prosody information, on the other hand, concerns the intonation and accent used when the corresponding phoneme sequence is pronounced, and is expressed, for example, by adding accent marks and the like to a phoneme sequence such as the one above.

[0006] The pitch pattern generation unit 203 analyzes the accents and phrases indicated by the reading information and prosody information, generates a pitch pattern representing the variation over time of the fundamental frequency with which the phoneme sequence is uttered, and sends it to the pitch pattern buffer 206. The duration generation unit 204, based on the reading information and prosody information, generates for each phoneme in the phoneme sequence the length of time for which that phoneme is sustained, and sends it to the duration buffer 207.

[0007] Based on the phoneme sequence in the phoneme sequence buffer 205, the waveform generation unit 208 retrieves the segment waveform corresponding to each phoneme from the waveform dictionary 209, adjusts the frequency of these segment waveforms according to the pitch pattern, generates the signal corresponding to each phoneme for its respective duration, and sends the result to the speaker 210.

[0008] Because the speech rule synthesizer thus adjusts and concatenates the segment waveforms corresponding to the individual phonemes according to predetermined rules, read-aloud speech for arbitrary input text can be obtained as long as segment waveforms for all the phonemes that make up Japanese are registered in the waveform dictionary 209.

[0009] The duration generation unit 204 described above has a correspondence table that holds, for each phoneme, the average duration intrinsic to that phoneme. When a phoneme sequence is input, it obtains the basic durations by referring to this table and then adjusts the duration of each phoneme in consideration of various conditions.

[0010] For example, the duration of each phoneme is adjusted according to the types of the adjacent phonemes and its position within the breath group (the unit uttered in a single breath), and the durations of all the phonemes in the phoneme sequence are adjusted according to the number of moras in the breath group. This brings the rhythm of the synthesized speech closer to the rhythm of human speech and yields more natural read-aloud speech.
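The conventional two-stage scheme described above (table lookup followed by condition-based adjustment) can be pictured with a minimal Python sketch. The table values, the fallback duration, and the mora-count rule with its `reference` and `strength` parameters are all assumptions made for illustration; the source does not give concrete values or formulas.

```python
# Hypothetical intrinsic average durations in milliseconds (illustrative values only).
INTRINSIC_MS = {"a": 90.0, "i": 70.0, "u": 65.0, "e": 85.0, "o": 90.0,
                "s": 100.0, "k": 60.0, "t": 55.0}
DEFAULT_MS = 75.0  # assumed fallback for units missing from the table


def base_durations(phonemes):
    """Stage 1: look up the intrinsic average duration of each unit."""
    return [INTRINSIC_MS.get(p, DEFAULT_MS) for p in phonemes]


def scale_for_mora_count(durations, mora_count, reference=8, strength=0.02):
    """Stage 2 (assumed rule): shorten every unit slightly as the breath group grows."""
    scale = 1.0 - strength * (mora_count - reference)
    scale = max(0.7, min(1.3, scale))  # keep the adjustment within a bounded range
    return [d * scale for d in durations]


durations = scale_for_mora_count(base_durations(["s", "e", "t", "u"]), mora_count=13)
```

A rule of this kind applies to every sequence alike, which is why a rule added for one exception can affect unrelated sequences.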

[0011]

[Problems to Be Solved by the Invention] Actual language, however, contains many phoneme sequences that are uttered with an irregular rhythm, so there are exceptions that cannot be fully handled by adjustments under conditions such as those described above, and rhythmically unnatural read-aloud speech is sometimes synthesized as a result. For example, in sequences such as "セツゾクタイショー" (setsuzoku taishō) and "ジョーホーカシャカイ" (jōhōka shakai), in which phones (音素, the individual sounds) with long intrinsic durations and phones with short intrinsic durations are intricately intermixed, the duration of each phoneme in the synthesized speech differs greatly from its duration when uttered by a person, and the result often sounds halting.

[0012] Unnatural passages in the read-aloud speech are usually discovered by users, but setting rules for adjusting durations requires knowledge of phonetics and related fields, so in practice users have been unable to make the necessary corrections themselves.

[0013] Conventionally, therefore, the manufacturer of the speech rule synthesizer has dealt with this when improving the device, by adding various duration-adjustment rules in response to requests from users.

[0014] However, it is extremely difficult to set duration-adjustment rules that cover every exception. Moreover, because exceptional utterances are often irregular, extracting a rule for each exception is itself difficult. Furthermore, applying a rule introduced to handle one exception may actually change the durations of other phoneme sequences for the worse.

[0015] An object of the present invention is to provide a speech rule synthesizer in which the durations corresponding to any phoneme sequence can be corrected easily.

[0016]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the speech rule synthesizers of claims 1 and 2.

[0017] The invention of claim 1 is a speech rule synthesizer in which, based on the phoneme sequence and the prosody information obtained by a text analysis means 101 analyzing input text, a pitch pattern generation means 102 and a duration generation means 103 respectively generate a pitch pattern representing the variation of the fundamental frequency over time and a sequence of durations for the individual phonemes, and a waveform generation means 104 synthesizes speech based on this information. It is characterized by comprising: an input means 111 for inputting a pair consisting of a phoneme sequence and the duration of each phoneme in that sequence; a storage means 112 for storing phoneme sequences and their corresponding duration sequences; and a replacement means 113 that refers to the storage means 112 when a phoneme sequence is input and, when a matching duration sequence is found, replaces the duration sequence from the duration generation means 103 with the duration sequence retrieved from the storage means 112 and sends it to the waveform generation means 104.
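As a rough illustration of how the storage means 112 and the replacement means 113 might cooperate, the following Python sketch keys stored duration sequences on the phoneme sequence and falls back to the rule-generated durations when nothing matches. The class and function names are hypothetical, and exact-match lookup on the whole sequence is an assumption; the source only says that a matching duration sequence is retrieved.

```python
class CorrectionStore:
    """Stands in for storage means 112: duration sequences keyed by phoneme sequence."""

    def __init__(self):
        self._entries = {}

    def register(self, phonemes, durations):
        """Accept one (phoneme sequence, duration sequence) pair, as input means 111 would."""
        if len(phonemes) != len(durations):
            raise ValueError("one duration is required per phoneme")
        self._entries[tuple(phonemes)] = list(durations)

    def find(self, phonemes):
        """Return the stored durations for this exact sequence, or None."""
        return self._entries.get(tuple(phonemes))


def select_durations(phonemes, rule_durations, store):
    """Stands in for replacement means 113: prefer stored durations when present."""
    stored = store.find(phonemes)
    return list(stored) if stored is not None else list(rule_durations)


store = CorrectionStore()
store.register(["se", "tsu", "zo", "ku"], [140.0, 90.0, 150.0, 80.0])
corrected = select_durations(["se", "tsu", "zo", "ku"], [120.0, 120.0, 120.0, 120.0], store)
untouched = select_durations(["a", "sa"], [110.0, 130.0], store)
```

Because the lookup is keyed on one specific sequence, a stored correction cannot alter the durations of any other sequence.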

[0018] The invention of claim 2 is the speech rule synthesizer of claim 1, characterized in that the input means 111 receives, from the waveform generation means 104, information indicating the phoneme sequence corresponding to the synthesized speech, and comprises: a display means 121 that displays at least the portion of the input text corresponding to that phoneme sequence; and a phoneme sequence extraction means 122 that, when a position on the display screen of the display means 121 is input, extracts the phoneme sequence displayed at that position and sends it to the storage means 112.
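One way to picture the phoneme sequence extraction means 122 is a lookup from a character position in the displayed text to the phoneme sequence shown there. The sketch below is only an assumption about how such a mapping could be held: the span table, its character offsets, and the function name are hypothetical and do not come from the source.

```python
def phrase_at_position(spans, position):
    """Return the phoneme sequence displayed at a character position, or None.

    spans: list of (start, end, phonemes) tuples with `end` exclusive.
    """
    for start, end, phonemes in spans:
        if start <= position < end:
            return phonemes
    return None


# Hypothetical span table for the displayed text "朝早くバンガローに".
spans = [
    (0, 3, "アサハヤク"),    # characters 0-2: 朝早く
    (3, 9, "バンガローニ"),  # characters 3-8: バンガローに
]
selected = phrase_at_position(spans, 4)
```

With a table like this, the user only has to point at the text instead of typing the phoneme sequence.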

[0019] FIG. 2 is a block diagram showing the principle of the speech rule synthesizer of claim 3. The invention of claim 3 is the speech rule synthesizer of claim 1, characterized in that the input means 111 comprises: a rule selection means 131 that selects the applicable rule from a plurality of preset rules in response to a selection instruction; and a correction means 132 that, when a phoneme sequence is input, corrects the durations by applying the rule selected by the rule selection means 131 and sends the resulting corrected values to the storage means 112.
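The pairing of a rule selection means 131 with a correction means 132 can be sketched as a table of named rules from which the user picks one. The three rules below and their factors are invented purely for illustration; the source does not say what the preset rules are.

```python
# Hypothetical preset rules; each maps a duration sequence to a corrected one.
PRESET_RULES = {
    "slower": lambda ds: [d * 1.2 for d in ds],
    "faster": lambda ds: [d * 0.8 for d in ds],
    "lengthen_final": lambda ds: ds[:-1] + [ds[-1] * 1.5] if ds else [],
}


def apply_rule(rule_name, durations):
    """Apply the rule the user selected to one duration sequence."""
    try:
        rule = PRESET_RULES[rule_name]
    except KeyError:
        raise ValueError(f"unknown rule: {rule_name}") from None
    return rule(list(durations))


corrected = apply_rule("lengthen_final", [100.0, 80.0])
```

The corrected sequence would then be stored for that phoneme sequence only, so the selected rule is not applied globally.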

[0020] FIG. 3 is a block diagram showing the principle of the speech rule synthesizer of claim 4. The invention of claim 4 is the speech rule synthesizer of claim 1, characterized in that the input means 111 comprises: a boundary setting means 141 that receives speech corresponding to a phoneme sequence and sets the boundary position of each phoneme in the speech waveform in response to boundary designation instructions; and a duration extraction means 142 that extracts the duration of each phoneme based on the boundary positions of the phonemes in the speech waveform.
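Once the boundaries are set, extracting durations amounts to taking differences between consecutive boundary times. A minimal Python sketch, assuming the boundaries are given in milliseconds as a list that includes the start of the first phoneme and the end of the last (the function name and this representation are assumptions):

```python
def durations_from_boundaries(boundaries_ms):
    """Convert N+1 increasing boundary times into N phoneme durations."""
    if len(boundaries_ms) < 2:
        raise ValueError("at least a start and an end boundary are required")
    pairs = list(zip(boundaries_ms, boundaries_ms[1:]))
    if any(later <= earlier for earlier, later in pairs):
        raise ValueError("boundaries must be strictly increasing")
    return [later - earlier for earlier, later in pairs]


durations = durations_from_boundaries([0, 120, 200, 350])
```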

[0021] FIG. 4 is a block diagram showing the principle of the speech rule synthesizer of claim 5. The invention of claim 5 is the speech rule synthesizer of claim 1, characterized in that the input means 111 comprises: a matching means 143 that receives speech corresponding to a phoneme sequence and the synthesized speech generated for that phoneme sequence by the waveform generation means 104, establishes the correspondence between each phoneme in the speech and each phoneme in the synthesized speech, and sets the boundary position of each phoneme in the speech; and a duration extraction means 142 that extracts the duration of each phoneme in the speech based on those boundary positions.
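This passage does not say how the matching means 143 aligns the two utterances. One common way to align two speech signals is dynamic time warping (DTW), so the sketch below uses DTW as a stand-in, not as the method of the source. It works on one hypothetical scalar feature per frame, and all names are illustrative.

```python
def dtw_path(synth, spoken):
    """Align two per-frame feature sequences; return (synth_idx, spoken_idx) pairs."""
    n, m = len(synth), len(spoken)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(synth[i - 1] - spoken[j - 1])
            cost[i][j] = dist + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Trace the cheapest path back from the final cell to the origin.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def transfer_boundaries(path, synth_boundaries):
    """Map each boundary frame in the synthesized speech to the first aligned spoken frame."""
    first_match = {}
    for synth_idx, spoken_idx in path:
        first_match.setdefault(synth_idx, spoken_idx)
    return [first_match[b] for b in synth_boundaries]


# Toy features: the second phoneme starts at frame 2 in the synthesized speech.
synth_features = [0.0, 0.0, 1.0, 1.0]
spoken_features = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
spoken_boundaries = transfer_boundaries(dtw_path(synth_features, spoken_features), [0, 2])
```

The transferred boundary frames, together with the end of the recording, could then be converted to durations by taking differences between consecutive boundaries.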

[0022]

[Operation] In the invention of claim 1, the storage means 112 stores the phoneme sequence and the corresponding duration sequence input via the input means 111, and when that phoneme sequence is subsequently input, the replacement means 113 outputs the duration sequence held in the storage means 112. Accordingly, by inputting a phoneme sequence that sounds rhythmically unnatural, in units of syllables or accent phrases, together with an appropriate duration sequence, the user can correct the durations for any phoneme sequence.

[0023] In this case there is no need to extract rules for correcting durations, so the user can correct the durations relatively easily. In addition, because the corrected durations are applied only to that phoneme sequence, they do not adversely affect the duration sequences of other phoneme sequences, unlike the case in which a duration-correction rule is added.

[0024] Furthermore, by accumulating correction values for a variety of phoneme sequences in the storage means 112, high-quality synthesized speech can be generated that adapts flexibly to diverse applications.

[0025] In the invention of claim 2, the display means 121 displays at least part of the input text in synchronization with the generation of synthesized speech by the waveform generation means 104, which shows the user explicitly how the synthesized speech corresponds to the relevant portion of the text. Furthermore, the phoneme sequence extraction means 122 extracts the phoneme sequence corresponding to the designated portion of the text based on a position on the display screen, which simplifies the task of inputting the phoneme sequence.

[0026] In the invention of claim 3, the correction means 132 corrects the relevant duration sequence according to the rule selected by the rule selection means 131. The task of entering the duration sequence itself can thus be replaced by the task of selecting a rule, which simplifies duration correction.

[0027] In the invention of claim 4, inputting boundary designation instructions to the boundary setting means 141 sets the boundary position of each phoneme in the input speech waveform, and the duration extraction means 142 can then extract the duration sequence.

[0028] In this case the duration correction values can be input as speech, so there is no need for repeated trial and error, and duration correction is simplified.

[0029] Furthermore, in the invention of claim 5, the matching means 143 establishes the correspondence between the phonemes of the input speech and those of the synthesized speech, so the boundary position of each phoneme in the input speech waveform can be set automatically, which simplifies duration correction still further.

[0030]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 5 shows the configuration of an embodiment of the speech rule synthesizer of claim 1.

[0031] In FIG. 5, the speech rule synthesizer is formed by adding a correction sequence buffer 211, a correction value creation unit 212, a correction value database 213, a duration correction unit 214, and a user interface unit 220 to the conventional speech rule synthesizer shown in FIG. 13.

[0032] Also in FIG. 5, the user interface unit 220 consists of a keyboard 221, a mouse 222, an input analysis unit 223, a display unit 224, and a display data creation unit 225. In the user interface unit 220, the input analysis unit 223 sends the necessary instructions to the text analysis unit 201, the waveform generation unit 208, and the correction value creation unit 212 in response to instructions entered from the keyboard 221 and the mouse 222. The phoneme sequence to be corrected, entered from the keyboard 221, is sent to the correction sequence buffer 211 via the input analysis unit 223. In response to an instruction to correct phoneme durations, the input analysis unit 223 also starts the display data creation unit 225, which then creates display data for a display screen (described later) that assists in correcting the phoneme durations and sends it to the display unit 224.

[0033] In response to instructions from the input analysis unit 223, the correction value creation unit 212 creates correction values for the durations of the designated phonemes by the processing described later and sends them to the duration buffer 207. In response to an instruction to finalize the correction values, it sends the duration correction values to the correction value database 213 together with the corresponding phoneme sequence.

[0034] The correction value database 213 corresponds to the storage means 112 and stores the pairs of phoneme sequences and duration correction values received from the correction value creation unit 212.

[0035] The duration correction unit 214 corresponds to the replacement means 113. When a phoneme sequence is input, it searches the correction value database 213, and if corresponding duration correction values are stored there, it replaces the durations obtained by the duration generation unit 204 with the correction values and sends them to the duration buffer 207.

[0036] In FIG. 5, the waveform generation unit 208 corresponds to the waveform generation means 104. It performs ordinary waveform generation and, once an instruction to correct durations has been given, may be configured to execute waveform generation in response to instructions from the input analysis unit 223.

[0037] The text analysis unit 201, the pitch pattern generation unit 203, and the duration generation unit 204 correspond to the text analysis means 101, the pitch pattern generation means 102, and the duration generation means 103, respectively.

[0038] The duration correction processing of the present invention is described below. FIG. 6 is a flowchart showing the operation of inputting the phoneme sequence to be corrected and the duration correction values.

[0039] On noticing rhythmically unnatural synthesized speech, the user gives an instruction to correct durations via the keyboard 221 or the mouse 222 and then enters, for example via the keyboard 221, a character string representing the phoneme sequence to be corrected, in units of accent phrases (or syllables) (step 301).

[0040] The input analysis unit 223 sends the character string entered in step 301 to the correction sequence buffer 211 and instructs the text analysis unit 201 to analyze the character string in the correction sequence buffer 211. In response, the text analysis unit 201, the pitch pattern generation unit 203, and the duration generation unit 204 generate the phoneme sequence, the pitch pattern, and the durations in the same way as for ordinary text, and hold them in the phoneme sequence buffer 205, the pitch pattern buffer 206, and the duration buffer 207, respectively.

[0041] Next, the input analysis unit 223 instructs the waveform generation unit 208 to generate synthesized speech. In response, the waveform generation unit 208 generates the synthesized speech based on the contents of the phoneme sequence buffer 205, the pitch pattern buffer 206, and the duration buffer 207, and outputs it to the speaker 210 (step 302).

[0042] At this time, the input analysis unit 223 also starts the display data creation unit 225, which creates display data based on the waveform data obtained by the waveform generation unit 208 and the contents of the duration buffer 207, and sends it to the display unit 224 (step 303).

[0043] The display data creation unit 225 may, for example, create display data representing the waveform of the synthesized speech together with display data that, based on the contents of the duration buffer 207, indicates the boundary position and the duration of each phoneme in that waveform, and send these display data to the display unit 224. The display unit 224 then presents a display screen such as that shown in FIG. 7, which assists in entering correction values.
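Deriving boundary positions from the contents of the duration buffer is a running sum. The sketch below assumes durations in milliseconds and a waveform starting at time zero; the function name and the `start_ms` parameter are hypothetical.

```python
from itertools import accumulate


def boundary_positions(durations_ms, start_ms=0.0):
    """Return the N+1 boundary times that delimit N consecutive units."""
    return [start_ms] + [start_ms + total for total in accumulate(durations_ms)]


boundaries = boundary_positions([120, 80, 150])
```

Each pair of adjacent boundary times marks where a separating line would be drawn on the waveform display.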

[0044] As an example of a synthesized speech waveform, FIG. 7 shows part of the waveform of the synthesized speech for the phoneme sequence "セツゾクタイショー". In FIG. 7, the phones (音素, the individual sounds) that make up each phoneme are written in Roman letters, and the corresponding portions of the waveform are delimited by solid lines to show their respective durations.

[0045] The user operates the mouse 222 while viewing a display screen such as that of FIG. 7 and indicates changes to the boundary positions of the phones. The input analysis unit 223 sends the position data entered by the user with the mouse 222 to both the display data creation unit 225 and the correction value creation unit 212. In response, the display data creation unit 225 updates the display data, while the correction value creation unit 212 determines from the position data the change in the boundary position of the phone concerned, converts this change in boundary position into changes in the durations of the phones, calculates the duration correction values, and updates the contents of the duration buffer 207 (steps 304 and 305).
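The conversion from a boundary shift to duration changes can be sketched as follows. It assumes that moving the boundary between two adjacent phones lengthens one by the shift and shortens the other by the same amount, so the total length is unchanged; the source does not spell out this detail, and the function name and the `minimum` guard are hypothetical.

```python
def move_boundary(durations, boundary, delta, minimum=1.0):
    """Shift the boundary between units `boundary` and `boundary + 1` by `delta`.

    A positive delta moves the boundary later, lengthening the left unit
    and shortening the right one by the same amount.
    """
    if not 0 <= boundary < len(durations) - 1:
        raise IndexError("boundary must lie between two existing units")
    left = durations[boundary] + delta
    right = durations[boundary + 1] - delta
    if left < minimum or right < minimum:
        raise ValueError("shift would make a unit shorter than the minimum")
    updated = list(durations)
    updated[boundary], updated[boundary + 1] = left, right
    return updated


updated = move_boundary([100, 60, 90], boundary=0, delta=15)
```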

[0046] In other words, the user interface unit 220 and the correction value creation unit 212 together implement the function of the input means 111, making it possible to input the phoneme sequence to be corrected and the duration correction values. In this case, the duration can be corrected for each phone (音素) that makes up each phoneme (音韻).

[0047] At this time, the input analysis unit 223 also instructs the waveform generation unit 208 to generate synthesized speech, and in response, synthesized speech for the phoneme sequence being corrected is generated using the duration correction values created in step 305 (step 306).

[0048] The user listens to the synthesized speech obtained in step 306, judges whether its rhythm is appropriate, and enters the result of that judgment with the mouse 222 or the keyboard 221.

[0049] If the user indicates that the synthesized speech is not appropriate, the determination in step 307 is negative, and processing returns to step 303 to repeat the correction with new correction values.

[0050] By repeating steps 303 to 307 described above, the user can correct the duration sequence by trial and error. When natural-sounding synthesized speech has been obtained by correcting the duration sequence in this way, the user enters a judgment that the synthesized speech is appropriate.

[0051] In this case the determination in step 307 is affirmative, and the correction value creation unit 212 sends the calculated correction values and the corresponding phoneme sequence to the correction value database 213 (step 308). The duration correction values for that phoneme sequence are thereby stored in the correction value database 213, after which the duration correction processing ends.
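The loop of steps 303 to 308 can be summarized in a short Python sketch. The user's edits, the synthesizer, and the user's judgment are passed in as callables so that the sketch stays self-contained; these callables, the dictionary standing in for the correction value database 213, and the `max_rounds` limit are all assumptions made for illustration.

```python
def run_correction_session(phonemes, durations, adjust, synthesize, approve,
                           corrections, max_rounds=20):
    """Repeat adjust -> synthesize -> judge until the user approves.

    Returns the approved durations, or None if no approval is given
    within max_rounds (the limit is an addition for this sketch).
    """
    current = list(durations)
    for _ in range(max_rounds):
        current = adjust(current)                  # steps 303-305: user edits boundaries
        audio = synthesize(phonemes, current)      # step 306: resynthesize
        if approve(audio):                         # step 307: user judges the rhythm
            corrections[tuple(phonemes)] = list(current)  # step 308: store the pair
            return current
    return None


# Stub callables standing in for the user and the synthesizer.
database = {}
approved = run_correction_session(
    ["s", "e"], [100, 100],
    adjust=lambda ds: [d + 10 for d in ds],
    synthesize=lambda ph, ds: sum(ds),
    approve=lambda total: total >= 230,
    corrections=database,
)
```

Nothing is written to the database until the user approves, so abandoned attempts leave the stored corrections untouched.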

[0052] In this way, whenever synthesized speech sounds rhythmically unnatural, the user inputs the phoneme sequence concerned and the correction values for the corresponding duration sequence, so that pairs of a phoneme sequence (written in Roman letters in Table 1) and the duration of each phone, such as those shown in Table 1, accumulate in the correction value database 213.

【００５３】[0053]

【表１】 [Table 1]

【００５４】Next, a method of correcting durations using the correction value database 213 accumulated as described above will be described. FIG. 8 is a flowchart showing the speech synthesis operation of the present invention.

【００５５】As in the conventional device, the text to be read aloud is input (step 401), and the text analysis unit 201, the pitch pattern generation unit 203, and the duration generation unit 204 respectively generate the phoneme sequence and prosodic information corresponding to the input text, the pitch pattern, and the duration of each phoneme (step 402).

【００５６】Next, for the phoneme sequence obtained in step 402, the correction processing unit 214 refers to the correction value database 213 for each accent phrase and searches for a matching correction value (step 403).

【００５７】If a matching correction value is found in step 403, the result of step 404 is affirmative: the corresponding sequence of durations in the duration buffer 207 is replaced with the correction values retrieved from the correction value database 213 (step 405), and the process proceeds to step 406. If the result of step 404 is negative, the process proceeds directly to step 406.

【００５８】After the correction processing for an accent phrase is finished in this way, it is determined whether correction has been completed for all accent phrases (step 406). If not, the process returns to step 403 and the next accent phrase is processed. If the result of step 406 is affirmative, the waveform generation unit 208 generates synthesized speech based on the contents of the phoneme sequence buffer 205, the pitch pattern buffer 206, the duration buffer 207, and the waveform dictionary 209 (step 407), and the speech synthesis process ends.
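The lookup-and-replace loop of steps 403 to 406 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the dictionary standing in for the correction value database 213, the function name `apply_corrections`, the millisecond values, and the length check are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for the correction value database 213:
# one accent phrase (as a tuple of phonemes) -> per-phoneme durations in ms.
CorrectionDB = Dict[Tuple[str, ...], List[int]]


def apply_corrections(
    phrases: Sequence[Sequence[str]],
    rule_durations: Sequence[Sequence[int]],
    db: CorrectionDB,
) -> List[List[int]]:
    """For each accent phrase, use stored durations if the database has them."""
    corrected: List[List[int]] = []
    for phonemes, durations in zip(phrases, rule_durations):
        stored = db.get(tuple(phonemes))  # step 403: search the database
        # Step 404: a hit counts only if it has one duration per phoneme
        # (this length check is an assumption of the sketch).
        if stored is not None and len(stored) == len(phonemes):
            corrected.append(list(stored))  # step 405: replace
        else:
            corrected.append(list(durations))  # keep the rule-generated values
    return corrected


# Illustrative data only; the values are not taken from Table 1.
db: CorrectionDB = {("k", "i", "t", "e"): [60, 90, 110, 120]}
phrases = [["k", "i", "t", "e"], ["k", "u", "d", "a", "s", "a", "i"]]
rule_durations = [[70, 80, 70, 100], [60, 70, 50, 90, 80, 90, 110]]
result = apply_corrections(phrases, rule_durations, db)
```

Only the first accent phrase has a database entry, so only its durations are replaced; the second phrase keeps the values produced by the rules, which reflects the point that other phoneme sequences are left unaffected.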

【００５９】Thus, for a phoneme sequence targeted for correction, replacing the sequence of durations obtained by the duration generation unit 204 with the stored duration correction values yields natural synthesized speech for that phoneme sequence without adversely affecting other phoneme sequences.

【００６０】Moreover, because there is no need to derive rules from observed phenomena, even an ordinary user can make corrections easily, and accumulating correction values in the correction value database 213 makes it possible to handle a wide variety of input texts flexibly.

【００６１】Because the user can thus easily modify phoneme durations, synthesized speech that gives an unnatural impression can be corrected effectively, and the quality of the speech produced by the speech rule synthesizer can be improved.

【００６２】Note that the analysis of the input text, the replacement of durations, and the generation of synthesized speech can each be processed independently, so these processes may be run in parallel with suitable time offsets.

【００６３】Next, an embodiment of the speech rule synthesizer of claim 2 will be described. FIG. 9 is a configuration diagram of an embodiment of the speech rule synthesizer of claim 2. In FIG. 9, the speech rule synthesizer adds a text buffer 215 to the speech rule synthesizer shown in FIG. 5 and, in place of the keyboard 221 of the user interface unit 220, includes a text display data creation unit 226 and a correction sequence extraction unit 227.

【００６４】The text buffer 215 holds the input text and sends it to the text analysis unit 201. The text display data creation unit 226 receives from the waveform generation unit 208 a notification of the phoneme sequence currently subject to waveform generation. Based on this notification and the contents of the text buffer 215, it creates text display data for displaying the text that represents that phoneme sequence, in synchronization with the speech synthesis operation of the waveform generation unit 208.

【００６５】The text display data creation unit 226 may, for example, create text display data by arranging the character codes of the text in the text buffer 215 to fit the display screen, and send it to a display data buffer (not shown) in the display unit 224. In addition, in response to the phoneme sequence notification described above, it may update the display attribute data in that display data buffer for the portion of the text display data corresponding to the relevant accent phrase or syllable.

【００６６】When the display unit 224 performs its display operation based on the text display data thus obtained, the input text can be displayed in synchronization with the speech synthesis operation of the waveform generation unit 208.

【００６７】In this case, when the user operates the mouse 222 to specify a range on the display screen, the input analysis unit 223 notifies the correction sequence extraction unit 227 of the corresponding position on the display screen. In response, the correction sequence extraction unit 227 refers to the display data buffer described above, extracts the text data corresponding to the specified range, and sends it to the correction sequence buffer 211.

【００６８】That is, the text display data creation unit 226 and the display unit 224 realize the function of the display means 121, and the mouse 222, the input analysis unit 223, and the correction sequence extraction unit 227 realize the function of the phoneme sequence extraction means 122.

【００６９】In this case, because the display unit 224 makes explicit the correspondence between the synthesized speech and the relevant portion of the text, the user can grasp intuitively, by looking at the displayed text, which part of the text corresponds to the speech synthesized by the waveform generation unit 208. Furthermore, by operating the mouse 222 to select that part, the user can easily specify an accent phrase or syllable that sounded unnatural, which simplifies the work of inputting the phoneme sequence.

【００７０】In the duration correction work described above, the durations of all phonemes in an accent phrase can be modified. Because this leaves so much freedom, correction may actually become difficult for users with no knowledge of phonetics.

【００７１】By contrast, choosing the most suitable of several candidate modifications is easy even for a user without specialized knowledge. The invention of claim 3 replaces the task of modifying phoneme durations with the task of selecting from candidate modifications, in order to provide a system that is easy to use for users without specialized knowledge.

【００７２】FIG. 10 shows a configuration diagram of an embodiment of the speech rule synthesizer of claim 3. In FIG. 10, the speech rule synthesizer includes, in place of the correction value creation unit 212 of the speech rule synthesizer shown in FIG. 5, n rule correction units 216_1 to 216_n, a selector 217, and a demultiplexer 218 (labeled DMPX in FIG. 10). The demultiplexer 218 sends the output of the selector 217 to either the duration buffer 207 or the correction value database 213.

【００７３】The rule correction units 216_1 to 216_n each receive a phoneme sequence and the corresponding sequence of durations, and each modifies the duration of each phoneme using a different rule.

【００７４】Rules that the rule correction units 216_1 to 216_n might apply include, for example, the following.
(a) Selectively modify the durations of moraic nasals (hatsuon) and geminate consonants (sokuon).
(b) Move each duration toward the average duration according to the weight set for each phoneme.
(c) Selectively shorten the durations of consecutive voiceless fricatives and voiceless plosives.
(d) Selectively lengthen the durations of consecutive voiced fricatives and voiced plosives.
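Rules (b) and (c) above might be sketched as follows. The patent gives no formulas, so everything here is an assumption for illustration: the linear interpolation used for rule (b), the reading of "consecutive" as "directly adjacent in the phoneme sequence" for rule (c), the phoneme set `VOICELESS_OBSTRUENTS`, and the function and parameter names.

```python
from typing import Dict, List, Sequence

# Assumed set of voiceless fricatives and plosives in a romanized phoneme inventory.
VOICELESS_OBSTRUENTS = {"p", "t", "k", "s", "sh", "h", "f"}


def toward_average(
    phonemes: Sequence[str],
    durations: Sequence[float],
    average: Dict[str, float],
    weights: Dict[str, float],
) -> List[float]:
    """Rule (b) sketch: move each duration toward that phoneme's average.

    A weight of 0 leaves the duration unchanged; a weight of 1 replaces it
    with the average. Phonemes without a weight or average are left as is.
    """
    out = []
    for p, d in zip(phonemes, durations):
        w = weights.get(p, 0.0)
        avg = average.get(p, d)
        out.append(d + w * (avg - d))
    return out


def shorten_voiceless_runs(
    phonemes: Sequence[str], durations: Sequence[float], factor: float
) -> List[float]:
    """Rule (c) sketch: scale voiceless obstruents that neighbour another one."""
    out = list(durations)
    for i, p in enumerate(phonemes):
        if p not in VOICELESS_OBSTRUENTS:
            continue
        prev_hit = i > 0 and phonemes[i - 1] in VOICELESS_OBSTRUENTS
        next_hit = i + 1 < len(phonemes) and phonemes[i + 1] in VOICELESS_OBSTRUENTS
        if prev_hit or next_hit:
            out[i] = durations[i] * factor
    return out


averaged = toward_average(["a", "N"], [100, 60], {"a": 80, "N": 100}, {"N": 0.5})
shortened = shorten_voiceless_runs(["a", "k", "s", "a"], [100, 80, 90, 100], 0.5)
```

Here `weights` and `factor` play the role of the user-settable parameters: the user would adjust them, listen to the result, and store the outcome once it sounds natural.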

【００７５】In this case, in response to an instruction from the input analysis unit 223, the display data creation unit 225 creates display data showing the content of the rules applied by the rule correction units 216_1 to 216_n and their settable parameters, sends it to the display unit 224, and prompts the user to select a rule and enter parameters.

【００７６】When the user operates the mouse 222 or the like to enter the rule to be selected and its parameters, the input analysis unit 223 sends the entered parameters to the corresponding rule correction unit 216 and instructs the selector 217 to select the output of that rule correction unit 216.

【００７７】As a result, modified durations are obtained by applying the rule selected by the user to the phoneme sequence to be corrected. They are sent to the duration buffer 207 via the demultiplexer 218, and the waveform generation unit 208 generates synthesized speech using these modified values.

【００７８】That is, the rule correction units 216_1 to 216_n and the selector 217 realize functions equivalent to those of the rule selection means 131 and the correction means 132.

【００７９】The user then listens to this synthesized speech to check the result, and repeats trial and error by changing the rule applied or the parameters set. When the best result is obtained, the user operates the mouse 222 or the like to indicate that the modified result should be adopted as the correction value.

【００８０】In response, the input analysis unit 223 switches the demultiplexer 218 so that the output of the selector 217 is sent to the correction value database 213, and the best modification result selected by the user is stored as the correction values for the sequence of durations.

【００８１】By letting the rule chosen by the user limit, for example, the types of phonemes subject to modification, excessive degrees of freedom are removed, and even a user without specialized knowledge can modify the sequence of durations effectively. In addition, making the parameters and weights for lengthening or shortening durations adjustable for each rule allows flexible handling of a variety of phoneme sequences.

【００８２】Alternatively, the user may utter the phoneme sequence to be corrected, and the duration correction values may be extracted from this speech. FIG. 11 shows a configuration diagram of an embodiment of the speech rule synthesizer of claim 4.

【００８３】In FIG. 11, the speech rule synthesizer includes a duration extraction unit 219 in place of the correction value creation unit 212 of the speech rule synthesizer shown in FIG. 5. A microphone 228 for speech input and a speech input buffer 229 are added to the user interface unit 220, and speech input through the microphone 228 is held in the speech input buffer 229.

【００８４】In this case, the display data creation unit 225 displays the waveform of the input speech held in the speech input buffer 229 instead of the waveform of the synthesized speech obtained by the waveform generation unit 208. Information indicating a position specified with the mouse 222 is input via the input analysis unit 223, and in response the display data creation unit 225 creates display data showing the boundary position of each phoneme in the input speech waveform.

【００８５】Thus, the display data creation unit 225 produces a display corresponding to the input speech, and the boundary position of each phoneme in the waveform is entered with the mouse 222. This realizes the function of the boundary setting means 141 and allows the boundary position of each phoneme in the input speech to be set.

【００８６】The duration extraction unit 219 corresponds to the duration extraction means 142. It obtains the boundary positions of the phonemes, set as described above, from the display data created by the display data creation unit 225, calculates the duration of each phoneme from these boundary positions, and registers the durations in the correction value database 213.
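Computing durations from boundary positions amounts to taking differences between consecutive boundaries. A minimal sketch, assuming the boundaries are given in milliseconds and include both the start of the first phoneme and the end of the last one (the function name and the sample values are illustrative, not from the patent):

```python
from typing import List, Sequence


def durations_from_boundaries(boundaries_ms: Sequence[int]) -> List[int]:
    """Return one duration per phoneme from N+1 strictly increasing boundaries."""
    pairs = list(zip(boundaries_ms, boundaries_ms[1:]))
    if any(end <= start for start, end in pairs):
        raise ValueError("boundary positions must be strictly increasing")
    return [end - start for start, end in pairs]


# Five boundaries delimit four phonemes.
durations = durations_from_boundaries([0, 55, 140, 230, 350])
```

The resulting list has one entry per phoneme and could be registered, together with the phoneme sequence, as the correction values for that sequence.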

【００８７】This allows the user to utter the phoneme sequence to be corrected, have the sequence of durations for that phoneme sequence corrected on the basis of this speech, and accumulate the correction values in the correction value database 213.

【００８８】In this case, if the phoneme boundary positions are indicated appropriately on the input speech waveform, a sequence of durations very close to human utterance is obtained, and there is no need for repeated trial and error. The duration correction work can therefore be carried out quickly and accurately.

【００８９】Furthermore, the display data creation unit 225 may create display data showing the pre-correction boundary position of each phoneme, based on the sequence of durations generated by the duration generation unit 204, and display it together with the input speech waveform in the same manner as the display screen shown in FIG. 7.

【００９０】This gives the user a rough guide to the boundary position of each phoneme in the input speech, making it easier to enter the phoneme boundaries.

【００９１】It is also possible to set the phoneme boundary positions automatically by comparing the waveform of the uncorrected synthesized speech with that of the input speech. FIG. 13 shows a configuration diagram of an embodiment of the speech rule synthesizer of claim 5.

【００９２】In FIG. 13, the speech rule synthesizer includes a matching processing unit 231 in place of the display data creation unit 225 of the speech rule synthesizer shown in FIG. 11.

【００９３】The matching processing unit 231 corresponds to the matching means 143 and, for example, uses dynamic programming (DP) to match the waveform of the input speech held in the speech input buffer 229 against the waveform of the synthesized speech generated by the waveform generation unit 208.

【００９４】Each phoneme in the input speech is thereby automatically associated with the corresponding phoneme in the synthesized speech, so that the boundary position of each phoneme in the input speech can be set. The duration extraction unit 219 then calculates each duration from the phoneme boundary positions obtained by the matching processing unit 231 and registers it in the correction value database 213.
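One common form of DP matching is dynamic time warping (DTW), and the boundary transfer described above might be sketched with it as follows. The patent does not specify the features or the cost function, so this sketch assumes that each signal has already been reduced to one scalar feature per frame (for example, frame energy) and uses absolute difference as the local cost; the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(ref: Sequence[float], inp: Sequence[float]) -> List[Tuple[int, int]]:
    """Return a minimum-cost monotonic alignment as (ref_frame, inp_frame) pairs."""
    n, m = len(ref), len(inp)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - inp[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def map_boundaries(path: Sequence[Tuple[int, int]], ref_boundaries: Sequence[int]) -> List[int]:
    """Map boundary frames of the synthesized speech to input-speech frames."""
    first_match = {}
    for ref_frame, inp_frame in path:
        first_match.setdefault(ref_frame, inp_frame)
    return [first_match[b] for b in ref_boundaries]


# Synthesized speech: phoneme boundaries at frames 2 and 4 (known from the synthesizer).
synth = [0, 0, 5, 5, 0, 0]
# Input speech: the same shape, spoken more slowly.
spoken = [0, 0, 0, 5, 5, 5, 5, 0, 0]
mapped = map_boundaries(dtw_path(synth, spoken), [2, 4])
```

In this toy example the middle segment spans 2 frames in the synthesized speech and 4 frames in the input speech. The mapped boundaries can then be converted to durations by taking differences between consecutive boundaries, given a known frame period.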

【００９５】In this case, the relevant sequence of durations can be corrected simply by specifying the accent phrase to be corrected and uttering it, so duration correction becomes even simpler and a system that is easy for users to use can be provided.

【００９６】[0096]

[Effects of the Invention] As described above, in the present invention, a phoneme sequence to be corrected and duration correction values for its phonemes are input, the phoneme sequence and correction values are stored as a pair, and the stored correction values are applied when synthesizing read-aloud speech for that phoneme sequence. Consequently, when a user finds a phoneme sequence that sounds rhythmically unnatural, it can be corrected easily without adversely affecting other phoneme sequences. This provides a speech rule synthesizer that adapts flexibly to the user's application and generates high-quality synthesized speech.

【００９７】The invention of claim 2 displays at least part of the text in synchronization with the speech synthesis operation, thereby making explicit the correspondence between the synthesized speech and the relevant portion of the text and simplifying the work of inputting the phoneme sequence to be corrected.

【００９８】The invention of claim 3 replaces the task of creating duration correction values with the task of selecting one of several candidates, further simplifying the correction work. This provides a speech rule synthesizer that is easy to use even for users without specialized knowledge.

【００９９】Furthermore, the inventions of claims 4 and 5 take as input the speech produced when the user utters the phoneme sequence to be corrected, and obtain duration correction values by extracting the boundary position of each phoneme in this input speech, either according to the user's instructions or according to the result of matching against the synthesized speech. This simplifies the correction work and provides a speech rule synthesizer that is easy to use even for users without specialized knowledge.

[Brief description of drawings]

FIG. 1 is a principle block diagram of the speech rule synthesizer of claims 1 and 2.

FIG. 2 is a principle block diagram of the speech rule synthesizer of claim 3.

FIG. 3 is a principle block diagram of the speech rule synthesizer of claim 4.

FIG. 4 is a principle block diagram of the speech rule synthesizer of claim 5.

FIG. 5 is a configuration diagram of an embodiment of the speech rule synthesizer of claim 1.

FIG. 6 is a flowchart showing the operation of inputting duration correction values.

FIG. 7 is an explanatory diagram of a display screen that assists in inputting correction values.

FIG. 8 is a flowchart showing the speech synthesis operation of the present invention.

FIG. 9 is a configuration diagram of an embodiment of the speech rule synthesizer of claim 2.

FIG. 10 is a configuration diagram of an embodiment of the speech rule synthesizer of claim 3.

FIG. 11 is a configuration diagram of an embodiment of the speech rule synthesizer of claim 4.

FIG. 12 is a configuration diagram of an embodiment of the speech rule synthesizer of claim 5.

FIG. 13 is a configuration diagram of a conventional speech rule synthesizer.

[Explanation of symbols]

101 text analysis means
102 pitch pattern generation means
103 duration generation means
104 waveform generation means
111 input means
112 storage means
113 replacement means
121 display means
122 phoneme sequence extraction means
131 rule selection means
132 correction means
141 boundary setting means
142 duration extraction means
143 matching means
201 text analysis unit
202 word dictionary
203 pitch pattern generation unit
204 duration generation unit
205 phoneme sequence buffer
206 pitch pattern buffer
207 duration buffer
208 waveform generation unit
209 waveform dictionary
210 speaker
211 correction sequence buffer
212 correction value creation unit
213 correction value database
214 duration correction unit
215 text buffer
216 rule correction unit
217 selector
218 demultiplexer (DMPX)
219 duration extraction unit
220 user interface unit
221 keyboard
222 mouse
223 input analysis unit
224 display unit
225 display data creation unit
226 text display data creation unit
227 correction sequence extraction unit
228 microphone
229 speech input buffer
231 matching processing unit

Claims

[Claims]

1. A speech rule synthesizer in which, based on information about a phoneme sequence and prosody obtained by a text analysis means (101) analyzing input text, a pitch pattern generation means (102) and a duration generation means (103) respectively generate a pitch pattern representing the temporal change of the fundamental frequency and a sequence of durations of the phonemes, and a waveform generation means (104) synthesizes speech based on this information, the speech rule synthesizer comprising: an input means (111) for inputting a pair of a phoneme sequence and the duration of each phoneme included in the phoneme sequence; a storage means (112) for storing the phoneme sequence and the corresponding sequence of durations; and a replacement means (113) that refers to the storage means (112) in response to input of a phoneme sequence and, when a corresponding sequence of durations is retrieved, replaces the sequence of durations from the duration generation means (103) with the sequence of durations retrieved from the storage means (112) and sends it to the waveform generation means (104).

2. The speech rule synthesizer according to claim 1, wherein the input means (111) comprises: a display means (121) that receives, from the waveform generation means (104), information indicating the phoneme sequence corresponding to the synthesized speech and displays at least the portion corresponding to that phoneme sequence; and a phoneme sequence extraction means (122) that, in response to input of a position on the display screen of the display means (121), extracts the phoneme sequence displayed at that position and sends it to the storage means (112).

3. The speech rule synthesizer according to claim 1, wherein the input means (111) comprises: a rule selection means (131) that selects the corresponding rule from a plurality of preset rules in response to input of a selection instruction; and a correction means (132) that, in response to input of a phoneme sequence, applies the rule selected by the rule selection means (131) to correct the durations and sends the resulting correction values to the storage means (112).

4. The speech rule synthesizer according to claim 1, wherein the input means (111) comprises: a boundary setting means (141) that receives speech corresponding to a phoneme sequence and sets the boundary position of each phoneme in the waveform of the speech in response to input of a boundary designation instruction; and a duration extraction means (142) that extracts the duration of each phoneme based on the boundary position of each phoneme in the waveform of the speech.

5. The speech rule synthesizer according to claim 1, wherein the input means (111) comprises: a matching means (143) that receives speech corresponding to a phoneme sequence and the synthesized speech generated by the waveform generation means (104) for that phoneme sequence, associates each phoneme in the speech with the corresponding phoneme in the synthesized speech, and sets the boundary position of each phoneme in the speech; and a duration extraction means (142) that extracts the duration of each phoneme in the speech based on the boundary position of each phoneme in the speech.