JPH02234198A

JPH02234198A - Text voice synthesizing system

Info

Publication number: JPH02234198A
Application number: JP1054793A
Authority: JP
Inventors: Shiyouichi Sasabe; 佐々部　昭一; Junko Komatsu; 小松　順子; Tetsuya Sakayori; 哲也酒寄
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-03-07
Filing date: 1989-03-07
Publication date: 1990-09-17
Anticipated expiration: 2014-08-23
Also published as: JP2938466B2

Abstract

PURPOSE: To obtain a natural utterance with the desired reading and accent by providing a function for editing word segmentation or the results of language analysis. CONSTITUTION: A Japanese sentence is input from a character input part 1, and the character string to be uttered is sent to a morpheme analyzing part 2, which divides the character string into word units, adds morphological information such as part of speech and inflected form, together with an accent type, reading symbols, etc., to each unit, and outputs them. An analysis result editing part 3 corrects this information as necessary or desired through input/output devices 9, 10 and an input/output control part 8. Thereafter, a dependency analyzing part 4 extracts dependency relations, syntax information, the degree of separation between phrases, etc., a prosodic symbol generating part 5 generates a prosodic symbol string, and a voice synthesizing part 6 synthesizes the voice. In this way, a synthesized voice with the desired reading and accent can be obtained.

Description

[Detailed Description of the Invention]

[Technical Field]

The present invention relates to a text-to-speech synthesis system, and more particularly to a text-to-speech synthesis system having a function for editing word segmentation or language analysis results.

[Prior Art]

To generate readings, accents, pauses, and intonation, a text-to-speech synthesis system generally performs language analysis to divide the input character string into word units, extracts the reading, accent, part-of-speech information, and so on of each word, and generates phonological and prosodic symbols from this information. However, the language analysis frequently produces incorrect analyses. The following methods have therefore conventionally been used to give the correct reading, accent, and intonation.

(1) A part of the phonological and prosodic symbol string obtained from language analysis and symbol generation is corrected so that the correct utterance is produced.

(2) A word processor dedicated to preparing input text for text-to-speech synthesis fixes the word and phrase boundaries and the readings at the time of kana-kanji conversion, and the language analysis then determines the accents.

As described above, a text-to-speech synthesis system performs language analysis to divide the input character string into word units, extracts the reading, accent, part-of-speech information, and so on of each word, and generates phonological and prosodic symbols from this information. It is impossible to register every word in the word dictionary used for this analysis. A dictionary usually holds on the order of several hundred thousand words, so abbreviations, proper nouns, and the like may be unregistered. Systems therefore generally apply unregistered-word processing during language analysis to determine the spelling, reading, accent, and so on of such words. However, this unregistered-word processing frequently produces errors. Japanese also has homographs (words with the same spelling but different readings), and it is not easy to judge these correctly and settle on a single word, so errors are frequent here as well. A means for producing the desired utterance is therefore required.

The prior art described above has the following problems.

In method (1), the phonological and prosodic symbols often form a string that is hard to understand for people who are not used to it, and finding the symbols that correspond to a given word takes time, so correcting the symbols is not easy. In method (2), a dedicated word processor must always be used for speech output, so other text files cannot be read aloud as they are. Moreover, specifying an accent still requires correcting the phonological and prosodic symbols.

[Object]

The present invention has been made to solve the drawbacks described above. Its object is to provide a text-to-speech synthesis system that, by having a function for editing word segmentation or language analysis results, allows the correct or desired reading and accent to be specified and produces more natural speech.

[Constitution]

To achieve the above object, the present invention is a text-to-speech synthesis system that converts character information into speech and outputs it, characterized in that it has a language analysis unit, a language analysis result editing unit, a phonological/prosodic symbol generation unit, and a rule-based speech synthesis unit, and in that language analysis results can be modified, added, and deleted; further, in that the language analysis result editing unit enables editing of word boundaries, word readings, accents, parts of speech, and the like; further, in that the language analysis result editing unit enables editing of prosodic information; and further, in that the language analysis result editing unit enables selection of a homograph when the language analysis detects the existence of a homograph. The invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining an embodiment of the text-to-speech synthesis system according to the present invention. In the figure, 1 is a character input unit, 2 is a morphological analysis unit, 3 is an analysis result editing unit, 4 is a dependency analysis unit, 5 is a prosodic symbol generation unit, 6 is a speech synthesis unit, 7 is a word dictionary, 8 is a display/input control unit, 9 is a display, 10 is a keyboard, and 11 is a speaker.

A Japanese sentence is input from the character input unit 1, character conversion and the like are performed if necessary, and the character string to be uttered is sent to the morphological analysis unit 2. The morphological analysis unit 2 divides the character string into word units and outputs each unit with morphological information such as part of speech and inflected form, together with an accent type, reading symbols, and so on. The analysis result editing unit 3 corrects this information as necessary or desired via the input/output devices and the input/output control unit, after which the dependency analysis unit 4 extracts dependency relations, syntactic information, the degree of separation between phrases, and so on. From the various information obtained in this way, the prosodic symbol generation unit 5 generates a prosodic symbol string containing readings, accents, pauses, intonation, and so on, and the speech synthesis unit 6 synthesizes speech according to it. The character input unit 1, the morphological analysis unit 2, the dependency analysis unit 4, the prosodic symbol generation unit 5, and the speech synthesis unit 6 can be realized with conventional techniques.

For editing the analysis results, it is sufficient to be able to display and correct at least one extracted item of utterance-related information. For example, this can be realized with an editor in which the results are shown on the display 9 as in FIG. 2 and a correction is made by moving the cursor (shown as a dotted rectangle) to the relevant location and rewriting it. In displaying this information, readings may be shown in hiragana, katakana, phonetic symbols, phonological symbols, phoneme numbers, or the like, and parts of speech, accents, and so on may be shown as codes or category names based on the classification used in the system. If, when a homograph is detected, its multiple candidates are retained and made selectable in the analysis result editing unit, the desired reading and accent can be given. Furthermore, when a word boundary is wrong and has been corrected, the correction work can be reduced by displaying suitable candidates from multiple analysis results stored in advance, or by re-analyzing the character string from that point onward and displaying the new result.

The analysis result editing unit may also be placed after the dependency analysis in FIG. 1 so that the results of the dependency analysis are edited as well. In that case, pause positions, pause lengths, breath groups, voice onsets, and so on can also be corrected as desired.

Furthermore, if the analysis results and the editing results are output as a file, and prosodic symbol generation and speech synthesis can be started by reading that file, the desired form of utterance can be saved and the desired synthesized speech can always be obtained.

[Effect]

As is clear from the above description, according to the present invention, errors or unregistered words in the language analysis results of a text-to-speech synthesis system can be corrected easily, and as a result synthesized speech with the desired reading and accent can be obtained.
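The editing operations described in the embodiment (correcting a field, selecting among homograph candidates, correcting a word boundary, and saving the edited result for later reuse) might be sketched as follows. This is only an illustration under stated assumptions: the `Morpheme` record, the function names, and the use of JSON as the saved form are hypothetical and are not specified in the patent, and the accent values in the example are placeholders rather than dictionary data.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Morpheme:
    """One word unit of an analysis result (hypothetical record layout)."""
    surface: str   # written form
    reading: str   # reading in kana
    pos: str       # part-of-speech label
    accent: int    # accent type; the values used below are illustrative
    # Alternative (reading, accent) pairs kept when a homograph is detected.
    candidates: list = field(default_factory=list)


def edit(result, index, **changes):
    """Overwrite fields of one word unit, e.g. edit(result, 0, accent=1)."""
    for name, value in changes.items():
        if name not in Morpheme.__dataclass_fields__:
            raise KeyError(f"unknown field: {name}")
        setattr(result[index], name, value)


def select_homograph(result, index, choice):
    """Adopt one of the stored homograph candidates as reading and accent."""
    word = result[index]
    word.reading, word.accent = word.candidates[choice]


def resplit(result, index, parts):
    """Correct a segmentation error: replace one unit with several units."""
    result[index:index + 1] = parts


def to_json(result):
    """Serialize the edited result so the same utterance can be reproduced."""
    return json.dumps([asdict(m) for m in result], ensure_ascii=False)


def from_json(text):
    """Rebuild the word units from text produced by to_json."""
    units = []
    for item in json.loads(text):
        # JSON has no tuples, so restore the (reading, accent) pairs.
        item["candidates"] = [tuple(c) for c in item["candidates"]]
        units.append(Morpheme(**item))
    return units


# Example: the analyzer chose "しじょう" for 市場, but "いちば" is wanted.
result = [
    Morpheme("市場", "しじょう", "noun", 0, [("しじょう", 0), ("いちば", 1)]),
    Morpheme("に", "に", "particle", 0),
    Morpheme("行く", "いく", "verb", 0),
]
select_homograph(result, 0, 1)          # pick the second stored candidate
restored = from_json(to_json(result))   # round trip through the saved form
```

In this sketch the text returned by `to_json` stands in for the saved file; writing it to disk and reading it back before prosodic symbol generation would correspond to the file-based reuse described above.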

[Brief Description of the Drawings]

FIG. 1 is a block diagram for explaining an embodiment of the text-to-speech synthesis system according to the present invention, and FIG. 2 is a diagram showing the screen of the display.

1: character input unit; 2: morphological analysis unit; 3: analysis result editing unit; 4: dependency analysis unit; 5: prosodic symbol generation unit; 6: speech synthesis unit; 7: word dictionary; 8: display/input control unit; 9: display; 10: keyboard; 11: speaker.

Claims

1. A text-to-speech synthesis system that converts character information into speech and outputs it, comprising a language analysis unit, a language analysis result editing unit, a phonological/prosodic symbol generation unit, and a rule-based speech synthesis unit, characterized in that language analysis results can be modified, added, and deleted.

2. The text-to-speech synthesis system according to claim 1, wherein the language analysis result editing unit enables editing of word boundaries, word readings, accents, parts of speech, and the like.

3. The text-to-speech synthesis system according to claim 1 or 2, wherein the language analysis result editing unit enables editing of prosodic information.

4. The text-to-speech synthesis system according to claim 1, 2, or 3, wherein the language analysis result editing unit enables selection of a homograph when the language analysis detects the existence of a homograph.