JPH08248971A

JPH08248971A - Text reading aloud and reading device

Info

Publication number: JPH08248971A
Application number: JP7049436A
Authority: JP
Inventors: Takashi Endo (遠藤 隆); Shunichi Yajima (矢島 俊一); Nobuo Nukaga (額賀 信尾); Toshiyuki Aritsuka (在塚 俊之)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-03-09
Filing date: 1995-03-09
Publication date: 1996-09-27

Abstract

PURPOSE: To provide a text reading-out device that estimates the speakers of the dialogue parts appearing in text that carries no additional information, and reads those parts out with a different voice quality for each speaker. CONSTITUTION: Text is input to the device from a text data input device 206. Under the control of a control program 120, a CPU 201 executes a syntax analysis program 101, a speaker analysis program 102, and a speaker data allocation program 103 stored in a program storage ROM 205, referring to the databases stored in a non-volatile memory 203. These programs determine which speaker each line of dialogue corresponds to, recording the result in a dialogue table 211 and a speaker table 212 in a memory 202, and which of the speaker data 113 stored in a speaker data storage device 204 is to be used for each speaker. A voice rule synthesis program 104 refers to the dialogue table 211 and the speaker table 212 and generates synthesized speech, selecting the appropriate speaker data for each part of the text. The synthesized speech is output from a D/A device 144.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a text-to-speech device that reads text information aloud. In particular, it relates to a text-to-speech device that varies the voice quality when reading aloud text containing dialogue, such as a novel, and text structures that have a layout.

[0002]

[Description of the Related Art] In recent years, the spread of word processors and electronic publishing has made digitized text widely available. Reading devices that generate and output a speech waveform from such text have been proposed as one means of accessing it.

[0003]

Previously proposed reading devices include two examples. In the first, quoted portions of a text are identified from symbols in the text and read with properties different from those of the other portions, so that the quotations can be recognized (Japanese Unexamined Patent Publication No. Hei 5-289687). In the second, voicing control codes are embedded in the text in advance and rule-based speech synthesis is performed according to those codes (Japanese Unexamined Patent Publication No. Hei 5-307396).

[0004]

[Problem to be Solved by the Invention] A conventional text-to-speech device cannot appropriately read the dialogue portions of a text containing dialogue, such as a novel, with a voice quality that changes for each speaker of the dialogue. Either this is impossible, or additional information must first be embedded in the text.

[0005]

An object of the present invention is to provide a text-to-speech device that estimates the speaker of each dialogue portion appearing in a text and reads it aloud with a different voice quality for each speaker.

[0006]

[Means for Solving the Problem] The present invention provides a text-to-speech device comprising the following elements:

- a text data input device that reads in text data;
- syntax analysis means that uses symbols appearing in the text to separate it into dialogue and narrative sentences, and detects the subject and verb of each sentence in the text;
- speaker analysis means that estimates the speaker of each line of dialogue in the text from the result of the syntax analysis;
- speaker data allocation means that decides which speaker data will actually be used to read aloud each speaker estimated from the text;
- voice rule synthesis means that generates a rule-synthesized speech waveform from the text data and the speaker data;
- a D/A conversion device that outputs the speech waveform as sound;
- control means that controls the device as a whole.

With this device, content belonging to different speakers in the text is read aloud with different synthesized voices.

[0007]

The invention further provides a text-to-speech device with two additional means. The first is classifying means that classifies the narrative sentences of a text into titles, annotations, and ordinary sentences on the basis of line breaks and parenthesis symbols in the text. The second is voice rule synthesis means that can synthesize speech with a different voice quality, speaking rate, pitch, and intonation pattern for each class. When reading the text aloud in synthesized speech, the device changes any or all of these properties for each class (title, annotation, or ordinary sentence). Differences in speaking style thus make the structure of the text easy to follow.
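The classification and per-class prosody above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation. The classification rule is an assumed heuristic: a line wrapped in parentheses is an annotation, and a stand-alone line without a sentence-final 。 is a title. The `Prosody` fields and their values are hypothetical placeholders for whatever parameters a rule-based synthesizer would accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    voice: str        # hypothetical voice-quality label
    rate: float       # speaking-rate multiplier (1.0 = normal)
    pitch: float      # pitch multiplier (1.0 = normal)
    intonation: str   # hypothetical intonation-pattern label


# Hypothetical per-class settings; the source gives no concrete values.
PROSODY = {
    "title": Prosody("announcer", 0.9, 1.1, "emphatic"),
    "annotation": Prosody("narrator", 1.2, 0.9, "flat"),
    "ordinary": Prosody("narrator", 1.0, 1.0, "neutral"),
}

OPEN_PARENS = "(（"
CLOSE_PARENS = ")）"


def classify_line(line: str) -> str:
    """Classify one narrative line using an assumed line-break/parenthesis heuristic."""
    s = line.strip()
    if s and s[0] in OPEN_PARENS and s[-1] in CLOSE_PARENS:
        return "annotation"  # the whole line is wrapped in parentheses
    if s and not s.endswith("。"):
        return "title"       # a stand-alone line with no sentence-final period
    return "ordinary"


def plan_reading(text: str) -> list[tuple[str, str, Prosody]]:
    """Split on line breaks and attach a class and its prosody to each non-empty line."""
    plan = []
    for line in text.splitlines():
        if line.strip():
            cls = classify_line(line)
            plan.append((line.strip(), cls, PROSODY[cls]))
    return plan
```

With these assumed values, a chapter heading is read slightly slower and higher than body text, and an annotation faster and lower. The sketch assumes dialogue lines have already been removed. A line such as 「どこへ行くの？」 does not end in 。 and would otherwise be classed as a title.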

[0008]

[Operation] The device processes text in the following stages:

1. The text data input means takes in text data.
2. The syntax analysis means divides the text into dialogue and narrative sentences and extracts the subject and verb of each sentence.
3. The speaker estimation means identifies the speaker of each line of dialogue from specific words appearing in the dialogue and from the subject and verb information of the narrative sentences before and after it.
4. The speaker data allocation means decides with what voice each speaker appearing in the text should be read.
5. The voice rule synthesis means converts the text into a synthesized speech waveform.
6. The D/A conversion device outputs the synthesized speech.

As a result, parts of the text spoken by different speakers are read aloud in synthesized voices of different qualities, and the conversational structure of the text is easy to follow.
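The first stage above, splitting the input into dialogue and narrative sentences, can be approximated with regular expressions. The sketch below makes two assumptions: dialogue is delimited by corner brackets 「…」 without nesting, and narrative sentences end in 。. It illustrates the idea only. The syntax analysis means itself, as the patent describes it, uses full morphological and case analysis.

```python
import re

# Assumption: dialogue is enclosed in corner brackets 「…」 and is not nested.
DIALOGUE = re.compile(r"「([^」]*)」")
# A narrative sentence: a run of characters up to and including 。 (or a line end).
SENTENCE = re.compile(r"[^。\n]+。?")


def segment(text: str) -> list[tuple[str, str]]:
    """Split text into ("narrative", ...) and ("dialogue", ...) segments in reading order."""
    segments: list[tuple[str, str]] = []

    def add_narrative(chunk: str) -> None:
        for m in SENTENCE.finditer(chunk):
            sentence = m.group().strip()
            if sentence:  # skip whitespace-only runs
                segments.append(("narrative", sentence))

    pos = 0
    for m in DIALOGUE.finditer(text):
        add_narrative(text[pos:m.start()])      # narrative before this line of dialogue
        segments.append(("dialogue", m.group(1)))
        pos = m.end()
    add_narrative(text[pos:])                   # narrative after the last dialogue
    return segments
```

This naive splitter breaks a sentence with an embedded quotation, such as 健一は「またね」と言った。, into three segments. Keeping the quotation attached to its sentence requires the fuller analysis the patent describes.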

[0009]

The classifying means sorts the narrative sentences of the text into titles, annotations, and ordinary sentences. The voice rule synthesis means can vary voice quality, speaking rate, pitch, and intonation pattern, and it changes any or all of these for each sentence class when reading the text aloud. Differences in speaking style thus make the structure of the text easy to follow.

[0010]

[Embodiments] FIG. 1 shows an example of the hardware configuration of the present invention. This embodiment consists of the following components:

- a program storage ROM (205) in which various programs are stored;
- a CPU (201) that executes the programs stored in the program storage ROM (205);
- a memory (202) that stores various data;
- a non-volatile memory (203) that stores the databases needed for speaker analysis;
- a speaker data storage device (204) that stores the speaker data used in rule synthesis;
- a text data input device (206) for inputting text data from outside;
- a user operation input device (207) for inputting user operations;
- a D/A conversion device (114) for outputting the reading voice generated from the text data.

[0011]

FIG. 2 shows the overall flow of the actual reading process. The text data (100) input from the text data input device (206) first goes through the syntax analysis process (101). This process refers to line-break symbols, corner brackets (「」), and parentheses to classify the text into four kinds of content: narrative sentences (181), dialogue (182), chapter titles (180), and annotation sentences (183). Linguistic analysis then decomposes the text into words and attaches part-of-speech information and syntactic information, such as the subject and predicate of each sentence.

The speaker analysis process (102) estimates the speaker of each line of dialogue and that speaker's attributes. It does so from the relationship between the dialogue (182) and the subjects of the narrative sentences (181), and from the sequence of dialogue lines (182). FIG. 3 shows an example of text handled by the present invention. FIG. 4 shows an example of the result of processing the text of FIG. 3.

The speaker data allocation process (103) in FIG. 2 selects speaker data matching the speaker attributes and sets the speaker data index (1008). The voice rule synthesis processing unit (104) refers to the text with speaker information (191) and the speaker attribute information (192) of FIG. 4. It then reads each part of the text aloud with the appropriate speaker, switching the speaker data (113) as it goes.
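The speaker data allocation step (103) can be sketched as a best-match search over a catalogue of voices. The following are assumptions for illustration: the catalogue, its `gender`/`age` attributes, and the rule of preferring a not-yet-used voice on ties. The source says only that speaker data matching the speaker attributes is selected and the speaker data index (1008) is set.

```python
# Hypothetical speaker-data catalogue; the speaker data index (1008) would point into it.
VOICES = [
    {"id": 0, "gender": "female", "age": "adult"},
    {"id": 1, "gender": "male", "age": "adult"},
    {"id": 2, "gender": "male", "age": "elderly"},
    {"id": 3, "gender": "female", "age": "child"},
]
ATTRS = ("gender", "age")


def score(speaker: dict, voice: dict) -> int:
    """Count the attributes known for the speaker that the voice matches."""
    return sum(1 for a in ATTRS if speaker.get(a) is not None and speaker[a] == voice[a])


def allocate(speakers: list[dict], voices: list[dict] = VOICES) -> list[dict]:
    """Set speaker_data_index on each speaker to the best-matching voice."""
    used: set[int] = set()
    for sp in speakers:
        # Highest score wins; on a tie, prefer a voice no other speaker has yet (assumption).
        best = max(voices, key=lambda v: (score(sp, v), v["id"] not in used))
        sp["speaker_data_index"] = best["id"]
        used.add(best["id"])
    return speakers
```

Attribute match outranks novelty in the sort key. As a result, two speakers with identical attributes can still receive the same voice when only one voice fits them fully. A larger catalogue or per-speaker pitch offsets would be one way to keep them apart.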

[0012]

Next, the processing flow of FIG. 2 is described in detail. The syntax analysis (101) first performs morphological analysis, for example by the minimum-number-of-phrases (bunsetsu) method. This divides the text into words and determines their parts of speech. It then performs case analysis keyed on case particles to identify the subject.
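Once the text has been tokenized and tagged, locating the subject by case particles reduces to a lookup. The sketch assumes the morphological analyzer's output is a list of `(surface, part_of_speech)` pairs with simplified, hypothetical tags such as `"noun"` and `"particle"`. It treats the noun immediately before the first が or は as the subject, which is a deliberate simplification of real case analysis.

```python
from typing import Optional

# Assumed token format: (surface, part_of_speech) pairs from a morphological analyzer.
SUBJECT_PARTICLES = {"が", "は"}  # nominative case particle and topic particle


def find_subject(tokens: list[tuple[str, str]]) -> Optional[str]:
    """Return the noun or pronoun immediately preceding the first が/は, if any."""
    for prev, cur in zip(tokens, tokens[1:]):
        if cur[1] == "particle" and cur[0] in SUBJECT_PARTICLES and prev[1] in ("noun", "pronoun"):
            return prev[0]
    return None  # no explicit subject (common in Japanese narrative)


def find_verbs(tokens: list[tuple[str, str]]) -> list[str]:
    """Return the surface forms of all verbs in the sentence."""
    return [surface for surface, pos in tokens if pos == "verb"]
```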

[0013]

The speaker analysis process (102) distinguishes the speakers of the dialogue using four cues:

- the subjects of utterance-related verbs appearing in the narrative sentences before and after each line;
- the first-person pronouns appearing in the dialogue;
- differences in the sentence-final expressions of the dialogue;
- differences in the dialect appearing in the dialogue.

It infers speaker characteristics such as age and gender from the speaker's personal name and from specific words appearing in the utterance. It then generates two tables: a dialogue table (211), which identifies the speaker of each line of dialogue, and a speaker table (212), which records characteristics such as the gender and age of each speaker.

FIG. 5 shows an example of the utterance verb database (106) used when searching for utterance-related verbs. In this embodiment it stores the dictionary (sentence-final) forms of utterance verbs and idiomatic expressions relating to utterance.

FIG. 6 shows an example of the personal pronoun database (107). In this embodiment it consists of five columns:

- a word column (401) containing the written form of each personal pronoun;
- a gender attribute column (402) storing the gender, when it can be inferred from the pronoun;
- a person attribute column (403) indicating the grammatical person of the pronoun;
- an age attribute column (404) storing the age group, when it can be inferred from the pronoun;
- a dialect attribute column (405) storing the dialect attribute, when it can be inferred from the pronoun.

FIG. 7 shows an example of the personal name attribute database (108). For many personal names, the gender can be determined from the name itself. The personal name attribute database (108) therefore consists of a word column (401) storing names and a gender attribute column (402) storing the gender inferred from each name.

FIG. 8 shows an example of the sentence-final expression attribute database (109). It stores sentence-final expressions considered useful for identifying the speaker, such as expressions that appear only in a particular dialect, expressions used only by speakers of a particular gender, and expressions used only by speakers of a particular occupation or background. In this embodiment it has a gender attribute column (402), storing the speaker gender predicted by the expression, and a dialect attribute column (405), storing the speaker's dialect when the expression identifies it.

FIG. 9 shows an example of the dialect database (110). It consists of a word column (401) storing words that appear in a particular dialect and a dialect attribute column (405) storing the dialect attribute of each word.

FIG. 10 shows an example of the age attribute database (111). In this embodiment it consists of a word column (401) storing words used by speakers of a particular age or generation and an age attribute column (406) storing the age inferred from each word.

FIG. 11 shows an example of the speaker table (212), and FIG. 12 shows an example of the dialogue table (211). FIG. 13 shows the speaker analysis process (102) in detail. The process runs in five steps:

1. It registers all lines of dialogue in the text in the dialogue table (211) (1201).
2. It extracts speakers from the registered dialogue and registers them in the speaker table (1202).
3. It uses the attributes appearing in each line to estimate which speaker in the speaker table uttered each line in the dialogue table (1203).
4. For lines whose speaker cannot be determined from their attributes alone, it estimates the speaker from the sequence of dialogue, exploiting the fact that adjacent lines of dialogue often have different speakers (1204).
5. It sets manually the speakers of any lines that could not be estimated automatically (1205).
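The utterance verb check against the database (106) might look like the following. The verb and idiom entries are hypothetical examples. The input is assumed to be the dictionary-form (lemma) tokens produced by morphological analysis, so that a conjugated form such as 言った is matched through its base form 言う.

```python
# Hypothetical excerpt of the utterance verb database (106): dictionary forms and idioms.
SPEECH_VERBS = {"言う", "話す", "答える", "尋ねる", "叫ぶ", "つぶやく"}
SPEECH_IDIOMS = ("口を開く", "声をかける")


def has_speech_verb(lemmas: list[str]) -> bool:
    """True if the sentence contains a registered utterance verb or idiom.

    `lemmas` is assumed to be the sentence as dictionary-form tokens, so that
    言った arrives as 言う + た.
    """
    if any(lemma in SPEECH_VERBS for lemma in lemmas):
        return True
    joined = "".join(lemmas)  # idioms span several tokens, so match on the joined lemmas
    return any(idiom in joined for idiom in SPEECH_IDIOMS)
```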

[0014]

Next, the processing flow of the speaker analysis process in this embodiment is described in detail, beginning with the dialogue table registration process (1201). This process registers every line of dialogue appearing in the text in the dialogue table (211) shown in FIG. 12. When the speaker is stated explicitly in the narrative sentences (181) before or after a line of dialogue, the speaker name (1002) is also registered in the dialogue table. In the present invention, the subject of an utterance verb in a narrative sentence is treated as the speaker. A verb or idiomatic expression is judged to be an utterance verb if it is registered in the utterance verb database of FIG. 5.

FIG. 15 shows the combinations of dialogue and narrative sentences from which the speaker can be determined. A narrative sentence containing an utterance verb can appear around a line of dialogue in three patterns:

- pattern 1 (1351): the narrative sentence appears before the dialogue;
- pattern 2 (1352): the narrative sentence appears after the dialogue;
- pattern 3 (1353): the dialogue appears as the object of the narrative sentence.

FIG. 16 shows the case where a narrative sentence containing an utterance verb (1362) is sandwiched between two lines of dialogue. The preceding line is dialogue 1 (1371) and the following line is dialogue 2 (1372). If the speaker of dialogue 1 has not yet been registered in the dialogue table (211), the subject (1361) of the narrative sentence is treated as the speaker of dialogue 1. If the speaker of dialogue 1 is already registered, the subject (1361) is registered as the speaker of dialogue 2.

When a narrative sentence containing an utterance verb has no subject, the subject of a narrative sentence that precedes it is taken as the subject of the utterance verb and treated as the speaker. FIG. 17 shows an example. Narrative sentence 2 (1382) has no subject for its utterance verb (1362), but the preceding narrative sentence 1 has the subject "Miwako" (1361). The subject of narrative sentence 2 (1382) is therefore "Miwako".

The procedure of the dialogue table registration process is described with reference to FIG. 14. First, the internal variables "speaker variable" and "subject variable" are cleared (1301). Sentences are then read one at a time, and each is checked to see whether it is dialogue (1302).

If it is dialogue, the dialogue attribute registration process (1308) is called to register the attributes of the dialogue in the table. Next, it is checked whether a value is set in the "speaker variable". If so, that value is set in the speaker column (1002) of the dialogue table (211) and the "speaker variable" is cleared (1312). This corresponds to registration under pattern 1 (1351) of FIG. 15.

If the sentence read in step 1302 is a narrative sentence, its subject is extracted (1303). If a subject exists, it is assigned to the "subject variable". Next, it is checked whether the sentence contains an utterance verb (1306). If it does, the sentence is checked for three cases in turn:

1. Pattern 3 (1353) of FIG. 15: the sentence is checked for embedded dialogue (1307). If it contains dialogue, the dialogue attribute registration process (1308) is called to register the attributes of that dialogue, and the value of the "subject variable" is set as its speaker (1002).
2. Pattern 2 (1352) of FIG. 15: if the sentence contains no dialogue, it is checked whether the sentence immediately follows a line of dialogue whose speaker is undetermined (1310). If so, the "subject variable" is set as the speaker (1002) of that preceding dialogue.
3. Possible pattern 1 (1351) of FIG. 15: if no such dialogue immediately precedes the sentence, the value of the "subject variable" is set in the "speaker variable".

Finally, it is checked whether all sentences have been processed (1314). If any remain, processing returns to step 1302.
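The FIG. 14 flow can be followed step by step in a short Python function. This is a sketch under stated assumptions. Each sentence is assumed to arrive already analyzed, with its subject, a flag saying whether it contains an utterance verb, and any dialogue embedded in it (pattern 3). The `Sentence` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    kind: str                        # "dialogue" or "narrative"
    text: str
    subject: Optional[str] = None    # subject found by syntax analysis (narrative only)
    has_speech_verb: bool = False    # True if an utterance verb was found (database 106)
    quote: Optional[str] = None      # dialogue embedded in a narrative sentence (pattern 3)


def register_dialogue(sentences: list[Sentence]) -> list[tuple[str, Optional[str]]]:
    """Return (dialogue text, speaker name or None) for every line, in order."""
    table: list[list] = []
    speaker_var: Optional[str] = None   # 1301: speaker waiting for the next dialogue (pattern 1)
    subject_var: Optional[str] = None   # 1301: latest subject; reused when a sentence has none
    prev_was_dialogue = False
    for s in sentences:                                      # loop closed by step 1314
        if s.kind == "dialogue":                             # 1302
            table.append([s.text, speaker_var])              # 1308, then 1312 if set
            speaker_var = None
            prev_was_dialogue = True
            continue
        if s.subject:                                        # 1303
            subject_var = s.subject
        if s.has_speech_verb:                                # 1306
            if s.quote is not None:                          # 1307: pattern 3
                table.append([s.quote, subject_var])
            elif prev_was_dialogue and table[-1][1] is None: # 1310: pattern 2
                table[-1][1] = subject_var
            else:                                            # possibly pattern 1
                speaker_var = subject_var
        prev_was_dialogue = False
    return [(text, speaker) for text, speaker in table]
```

The carried-over `subject_var` handles the FIG. 17 case. The pattern 3, pattern 2, pattern 1 order of the checks reproduces the FIG. 16 rule for a narrative sentence sandwiched between two lines of dialogue.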

[0015]

Not every line of dialogue, however, has its speaker written explicitly as in FIG. 15. The features appearing in each line of dialogue are therefore extracted, and feature matching is computed between lines without an explicit speaker and lines with one. Lines whose features match are regarded as spoken by the same speaker, which allows the speaker of a line without an explicit speaker to be estimated. In one embodiment of the present invention, the features of a line are specific words appearing in it and attribute information inferred from those words. The embodiment uses five of them: the first-person expression (1003), dialect attribute (1004), gender attribute (1005), sentence-final expression (1006), and age attribute (1007).

FIG. 18 shows an example of the actual speaker attribute registration process. The process proceeds as follows:

- First-person column (1003), step 1401: the dialogue is searched using the personal pronoun database (107), an example of which is shown in FIG. 6. Words (401) whose person attribute (403) is 1 are searched for in the dialogue text and registered if found.
- Sentence-final expression column (1006), step 1402: the dialogue text is searched for the words (401) registered in the sentence-final expression database of FIG. 8, and any expression found is registered.
- Dialect column (1004), step 1403: the dialogue is searched for the words (401) registered in three databases: the personal pronoun database (107), the sentence-final expression database (109) of FIG. 8, and the dialect database (110) of FIG. 9. When a word is found, its dialect attribute (405) is retrieved and registered in the dialect attribute column (1004) of the dialogue table (211).

As shown in FIG. 19, lines of dialogue sometimes follow one another with no narrative sentence between them. In most such cases, the two consecutive lines have different speakers, and the present invention uses this property for speaker estimation. For this purpose, when a line of dialogue is registered in the dialogue table, a dialogue connection flag (1102) records whether the preceding sentence was dialogue or narrative. The remaining steps are as follows:

- Connection flag (1102), steps 1404 to 1406: it is determined whether the sentence immediately before the current line is dialogue (1404). If it is, the flag is set to 1 (1406); otherwise it is set to 0 (1405).
- Gender column (1005), step 1407: if a speaker name is registered in the speaker name column (1002), the personal name attribute database (108) of FIG. 7 and the personal pronoun database (107) of FIG. 6 are searched by that name, and the gender attribute (402) is retrieved and registered.
- Age column (1007), step 1408: the dialogue text is searched for the words (401) registered in the personal pronoun database (107) of FIG. 6 and the age identification word database (111) of FIG. 10. If a word appearing in the dialogue exists in these databases with an age attribute (406), that attribute is retrieved from the database and registered.
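The attribute registration steps 1401 to 1408 amount to substring searches against the attribute databases. In the sketch below the databases are tiny hypothetical dictionaries. The entries are illustrative and are not taken from the patent's figures. The fallback that infers gender from words in the line when no speaker name is known is an assumption added here; the source describes gender lookup only by speaker name.

```python
from typing import Iterable, Optional

# Hypothetical miniature databases; all entries are illustrative only.
PRONOUN_DB = {  # personal pronoun database (107)
    "わたし": {"person": 1},
    "あたし": {"person": 1, "gender": "female"},
    "僕": {"person": 1, "gender": "male"},
    "わし": {"person": 1, "gender": "male", "age": "elderly"},
    "うち": {"person": 1, "gender": "female", "dialect": "Kansai"},
}
SENTENCE_END_DB = {  # sentence-final expression database (109)
    "わよ": {"gender": "female"},
    "だぜ": {"gender": "male"},
    "ねん": {"dialect": "Kansai"},
}
DIALECT_DB = {"おおきに": {"dialect": "Kansai"}, "ほんま": {"dialect": "Kansai"}}  # (110)
AGE_DB = {"じゃのう": {"age": "elderly"}}                                        # (111)
NAME_DB = {"美和子": {"gender": "female"}, "健一": {"gender": "male"}}          # (108)


def _lookup(text: str, dbs: Iterable[dict], key: str) -> Optional[str]:
    """Return attribute `key` of the first database word that occurs in `text`."""
    for db in dbs:
        for word, attrs in db.items():
            if word in text and attrs.get(key) is not None:
                return attrs[key]
    return None


def dialogue_attributes(text: str, prev_was_dialogue: bool,
                        speaker_name: Optional[str] = None) -> dict:
    """Fill the attribute columns of one dialogue-table row (FIG. 18 steps)."""
    first_person = next(
        (w for w, a in PRONOUN_DB.items() if a.get("person") == 1 and w in text), None)  # 1401
    sentence_end = next((w for w in SENTENCE_END_DB if w in text), None)                # 1402
    dialect = _lookup(text, (PRONOUN_DB, SENTENCE_END_DB, DIALECT_DB), "dialect")     # 1403
    connected = 1 if prev_was_dialogue else 0                                          # 1404-1406
    gender = None
    if speaker_name:                                                                   # 1407
        gender = (NAME_DB.get(speaker_name) or PRONOUN_DB.get(speaker_name) or {}).get("gender")
    if gender is None:  # assumption: infer from words in the line (not in the source)
        gender = _lookup(text, (PRONOUN_DB, SENTENCE_END_DB), "gender")
    age = _lookup(text, (PRONOUN_DB, AGE_DB), "age")                                   # 1408
    return {"first_person": first_person, "sentence_end": sentence_end, "dialect": dialect,
            "connected": connected, "gender": gender, "age": age}
```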

[0016]

Next, the speaker table registration process (1202) of this embodiment is described in detail with reference to FIG. 20. In outline, the process treats lines in the dialogue table whose speaker name (1002) is the same personal name, or the same first-person pronoun, as belonging to the same speaker. It then generates a list of speakers in the speaker table (212). The steps are as follows:

1. It is checked whether a personal name or first-person pronoun is set in the speaker name column (1002) of the dialogue table (1501).
2. If one is set, it is checked whether the corresponding speaker is registered in the speaker table (1502).
3. If the speaker is already registered, it is checked whether any speaker attribute has a value in the dialogue table but not in the speaker table. For each such attribute, the value from the dialogue table is copied into the speaker table.
4. If the check in step 1502 finds that the speaker is not registered in the speaker table, the speaker is registered in it.
5. Finally, it is checked whether all entries in the dialogue table have been processed (1505). If any lines remain, processing repeats from step 1501.
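The merge performed by the speaker table registration process can be expressed with one dictionary keyed by speaker name. In this sketch the dialogue table rows are assumed to be plain dicts with hypothetical keys (`speaker_name`, `first_person`, `dialect`, `gender`, `age`). A speaker's index is simply its 1-based position in the table.

```python
ATTRS = ("first_person", "dialect", "gender", "age")


def build_speaker_table(dialogues: list[dict]) -> list[dict]:
    """Merge dialogue rows that share a speaker name into one speaker entry each."""
    speakers: list[dict] = []
    by_name: dict[str, dict] = {}
    for row in dialogues:                          # loop closed by step 1505
        name = row.get("speaker_name")
        if not name:                               # 1501: no name or first-person pronoun
            continue
        entry = by_name.get(name)                  # 1502: already registered?
        if entry is None:                          # register a new speaker
            entry = {"index": len(speakers) + 1, "name": name}
            entry.update({a: row.get(a) for a in ATTRS})
            speakers.append(entry)
            by_name[name] = entry
        else:
            for a in ATTRS:                        # copy attributes the speaker still lacks
                if entry.get(a) is None and row.get(a) is not None:
                    entry[a] = row[a]
    return speakers
```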

【００１７】つぎに、台詞の属性を基にした話者推定処
理（１２０３）の詳細について図２１を用いて説明す
る。台詞の属性を基にした話者推定処理（１２０３）で
は、図１２に示される台詞テーブル（２１１）の台詞イ
ンデックス（１１０１）順に台詞の話者を推定し、推定
された話者は話者テーブル（２１２）の話者インデクス
（１００１）の値で表わし、台詞テーブル（２１１）の
話者インデックス欄（１１０３）に設定される。具体的
な処理の実施例を以下に説明する。まず、台詞テーブル
（２１２）の話者名欄（１００２）に話者名が設定され
ており、かつ話者テーブルにその話者名が登録されてい
Details of the speaker estimation processing (1203) based on dialogue attributes will be described with reference to FIG. 21. In this processing, the speakers of the dialogues are estimated in the order of the dialogue index (1101) of the dialogue table (211) shown in FIG. 12. Each estimated speaker is represented by the value of its speaker index (1001) in the speaker table (212) and is set in the speaker index column (1103) of the dialogue table (211). A specific example of the processing follows. First, if a speaker name is set in the speaker name column (1002) of the dialogue table (211), it is checked whether that name is registered in the speaker table (212) (1601). If it is registered, the speaker index number (1001) of the corresponding speaker in the speaker table (212) is set in the speaker index column (1103) of the dialogue table (211). If no speaker name is set in the speaker name column (1002) of the dialogue table, the speaker table (212) is searched for speakers whose first-person expression (1003), dialect (1004), and gender (1005) attributes match those of the dialogue (1602). In this attribute matching, only those attributes among first-person expression (1003), dialect (1004), and gender (1005) that have values set in both the speaker table (212) and the dialogue table (211) are compared. It is then checked whether exactly one speaker candidate has attributes matching those of the dialogue (1604). If only one candidate matches, the speaker index (1001) of that candidate is set in the speaker index column (1103) of the dialogue. Next, it is checked whether all dialogues have been processed (1606). If dialogues remain to be processed in the dialogue table (211), the processing from 1601 is repeated.
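The attribute-based estimation of FIG. 21 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The `Speaker` and `Dialogue` records and their field names are hypothetical. So is the choice to leave a dialogue unresolved when its speaker name is not registered, since the text does not cover that case. The sketch demonstrates two rules from the text:

- An attribute is compared only when it has a value in both tables.
- A speaker is assigned only when exactly one candidate matches.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record layouts; field names are illustrative only.
@dataclass
class Speaker:
    index: int                          # speaker index number (1001)
    name: Optional[str] = None          # speaker name column (1002)
    first_person: Optional[str] = None  # first-person expression (1003)
    dialect: Optional[str] = None       # dialect (1004)
    gender: Optional[str] = None        # gender (1005)


@dataclass
class Dialogue:
    index: int                           # dialogue index number (1101)
    speaker_name: Optional[str] = None
    first_person: Optional[str] = None
    dialect: Optional[str] = None
    gender: Optional[str] = None
    speaker_index: Optional[int] = None  # speaker index column (1103)


MATCH_ATTRS = ("first_person", "dialect", "gender")


def attributes_match(d: Dialogue, s: Speaker) -> bool:
    """Compare only the attributes that have a value in BOTH records."""
    for attr in MATCH_ATTRS:
        dv, sv = getattr(d, attr), getattr(s, attr)
        if dv is not None and sv is not None and dv != sv:
            return False
    return True


def estimate_by_attributes(dialogues: List[Dialogue], speakers: List[Speaker]) -> None:
    by_name = {s.name: s for s in speakers if s.name is not None}
    for d in dialogues:  # dialogue-index order, until all are processed (1606)
        if d.speaker_name is not None:
            s = by_name.get(d.speaker_name)  # is the name registered? (1601)
            if s is not None:
                d.speaker_index = s.index
            continue  # assumption: an unregistered name is left unresolved
        candidates = [s for s in speakers if attributes_match(d, s)]  # (1602)
        if len(candidates) == 1:  # exactly one candidate? (1604)
            d.speaker_index = candidates[0].index
```

A dialogue that carries no attributes at all matches every registered speaker. With more than one speaker it is therefore left unset for the later steps (1204, 1205).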

[0018] Next, the outline and the detailed processing flow of the speaker estimation processing (1204) based on the sequence of dialogues will be described. First, the outline of the processing will be described with reference to FIG. 23. When a dialogue is adjacent to another dialogue, the speakers of the two dialogues are different. Therefore, as shown in FIG. 23, suppose a dialogue is adjacent to the dialogue currently being processed (1754) and the speaker of that adjacent dialogue has already been set. The speaker candidates of the current dialogue can then be narrowed down by removing the speaker of the adjacent dialogue. In the example of FIG. 23, the speakers of the preceding adjacent dialogue (1753) and of the following adjacent dialogue (1755) are removed from the speaker candidates of the dialogue being processed (1754). When the candidates have been narrowed down to one, that candidate is registered as the speaker of the dialogue being processed.

Here, the narrative text before the dialogue (1752) is defined as the series of consecutive narrative sentences continuing from the nearest narrative sentence located before the dialogue being processed (1754) (see FIG. 23). Likewise, the narrative text after the dialogue (1756) is defined as the series of consecutive narrative sentences continuing from the nearest narrative sentence located after the dialogue being processed (1754) (see FIG. 23). When the candidates cannot be narrowed down to one, they are narrowed further by using the fact that the speaker of a dialogue appears as a subject in the narrative text before the dialogue (1752) and the narrative text after it (1756). Speaker candidates that are not subjects of the narrative text before the dialogue (1752) or after it (1756) are removed. When the candidates have been narrowed down to one, the remaining speaker is registered as the speaker of the dialogue being processed.

Next, the processing flow of the speaker estimation processing (1204) based on the sequence of dialogues will be described in detail with reference to FIG. 22. First, a variable i, which holds the index number of the dialogue to be processed, is initialized to 1 (1701). Then the speaker table (212) is searched for speaker candidates whose first-person expression, dialect attribute, and gender attribute match those of dialogue i (1702).

The connection flag (1102) of the dialogue table is referenced to check whether the dialogue is adjacent to the preceding dialogue (1703). If it is adjacent and a speaker has been set for the preceding dialogue, that speaker is removed from the speaker candidates (1704). Similarly, the connection flag (1102) is referenced to check whether the dialogue is adjacent to the following dialogue (1705). If it is adjacent and a speaker has been set for the following dialogue, that speaker is removed from the speaker candidates (1706).

It is then checked whether exactly one speaker candidate remains (1707). If so, the speaker index corresponding to that candidate is registered in the speaker index column (1103) of the dialogue table (211) (1710). If the number of candidates is not one, the candidates not included among the subjects of the narrative text before and after the dialogue (1752, 1756) are removed (1708). It is then checked again whether exactly one candidate remains (1709). If so, the speaker index corresponding to that candidate is registered in the speaker index column (1103) of the dialogue table (211) (1710).

The dialogue index i is then incremented by 1 (1711), and it is checked whether all dialogues have been processed (1712). If not, the processing from 1702 is repeated. After all dialogues have been processed, it is checked whether any dialogue has newly been given a speaker index (1713). If so, the dialogues adjacent to it can each lose one speaker candidate, which may make it possible to set their speakers as well. Therefore, if any dialogue has newly been given a speaker index, the processing from 1701 is repeated.
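The sequence-based estimation of FIGs. 22 and 23 can be sketched in Python as follows, under assumptions the text does not state:

- Each dialogue carries a hypothetical `connected_to_prev` flag standing in for the connection flag (1102).
- The speaker indices that appear as subjects in the surrounding narrative text (1752, 1756) are supplied precomputed in `nearby_subjects`.
- The attribute-matched candidates of step 1702 are passed in as `initial_candidates`.
- Dialogues that already have a speaker are skipped. Together with the "repeat only if something changed" rule (1713), this guarantees termination.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Line:
    connected_to_prev: bool  # hypothetical stand-in for the connection flag (1102)
    nearby_subjects: Set[int] = field(default_factory=set)  # subjects in 1752/1756
    speaker: Optional[int] = None  # speaker index column (1103)


def estimate_by_sequence(lines: List[Line], initial_candidates: List[Set[int]]) -> None:
    """initial_candidates[i] is the attribute-matched candidate set of line i (1702)."""
    changed = True
    while changed:  # repeat from 1701 while some speaker was newly set (1713)
        changed = False
        for i, line in enumerate(lines):  # steps 1701, 1711, 1712
            if line.speaker is not None:
                continue  # assumption: already-resolved dialogues are skipped
            cands = set(initial_candidates[i])
            prev_line = lines[i - 1] if i > 0 else None
            next_line = lines[i + 1] if i + 1 < len(lines) else None
            # Adjacent dialogues have different speakers (1703-1706).
            if line.connected_to_prev and prev_line is not None and prev_line.speaker is not None:
                cands.discard(prev_line.speaker)
            if next_line is not None and next_line.connected_to_prev and next_line.speaker is not None:
                cands.discard(next_line.speaker)
            if len(cands) != 1:  # (1707)
                cands &= line.nearby_subjects  # keep only subjects of nearby narrative (1708)
            if len(cands) == 1:  # (1709)
                line.speaker = next(iter(cands))  # register the speaker index (1710)
                changed = True
```

In a two-person exchange where only one dialogue's speaker is known, the adjacency rule alternates the speakers outward from it. A second pass is needed when the known dialogue comes after the unresolved ones.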

[0019] Next, the processing flow of the manual dialogue speaker designation processing (1205) will be described in detail with reference to FIG. 24. First, a variable i, which holds the index number of the dialogue to be processed, is initialized to 1 (1801). Next, it is checked whether a speaker has been set in the speaker index column (1103) of the i-th dialogue (1802). If not, the text around the dialogue and a list of speakers are displayed (1803). The user is asked to select the speaker corresponding to the dialogue, and the selected speaker is registered in the speaker index column (1103) of the dialogue table (211) (1804). Then i is incremented by 1, and it is checked whether all dialogues have been processed (1805). If not, the processing from 1802 is repeated.
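The manual designation loop of FIG. 24 can be sketched in Python as follows. This is an illustrative sketch only. The list `speaker_indices` mirrors the speaker index column (1103). The `choose` callback is hypothetical: it stands in for displaying the nearby text and speaker list (1803) and reading the user's choice through the user operation input device (207) (1804). The user is asked only about dialogues whose speaker is still unset.

```python
from typing import Callable, List, Optional


def designate_manually(
    speaker_indices: List[Optional[int]],
    choose: Callable[[int], int],
) -> List[Optional[int]]:
    """speaker_indices[i] mirrors the speaker index column (1103) of dialogue i.

    `choose` is a hypothetical callback standing in for the display (1803)
    and for accepting the user's speaker designation (1804).
    """
    for i in range(len(speaker_indices)):  # i initialized (1801); loop ends at 1805
        if speaker_indices[i] is None:  # is a speaker already set? (1802)
            speaker_indices[i] = choose(i)
    return speaker_indices
```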

[0020] Next, the speaker data allocation processing (103) will be described with reference to FIG. 25. The speaker data allocation processing determines which speaker data is used to generate the synthesized voice of each speaker extracted from the text. FIG. 25 shows an example of the contents of the speaker data attributes (112). For each speaker in the speaker data (113) prepared for voice rule synthesis (104), the speaker data attributes (112) store a dialect attribute (1004), a gender attribute (1005), and an age attribute (1007). The attributes of each speaker in the speaker table are matched against the speaker data attributes (112). The speaker data that has the closest attributes and has not already been assigned to another speaker is selected. Its speaker data index (1008) is set in the speaker data index column (1008) of the speaker table (212).
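The text says only that the "closest" unassigned speaker data is chosen; it does not define closeness. The Python sketch below makes these illustrative assumptions:

- Closeness is a hypothetical mismatch count: the number of attributes among dialect, gender, and age that are set on both sides and disagree.
- Speakers are processed greedily in speaker-table order.
- Ties go to the voice listed first.
- A speaker gets `None` once all speaker data has been used, a case the text does not address.

```python
from typing import Dict, List, Optional

Attrs = Dict[str, Optional[str]]
COMPARED = ("dialect", "gender", "age")  # attributes held in the speaker data attributes (112)


def mismatch(speaker: Attrs, voice: Attrs) -> int:
    """Hypothetical distance: attributes set on both sides that disagree."""
    return sum(
        1
        for a in COMPARED
        if speaker.get(a) is not None and voice.get(a) is not None and speaker[a] != voice[a]
    )


def allocate_speaker_data(speakers: List[Attrs], voices: Dict[int, Attrs]) -> List[Optional[int]]:
    """Return one speaker data index (1008) per speaker, in speaker-table order."""
    used = set()
    result: List[Optional[int]] = []
    for sp in speakers:
        free = [v for v in voices if v not in used]  # not yet assigned to another speaker
        if not free:
            result.append(None)  # assumption: behaviour when voices run out is unspecified
            continue
        best = min(free, key=lambda v: mismatch(sp, voices[v]))  # ties: first listed voice
        used.add(best)
        result.append(best)
    return result
```

Greedy assignment is the simplest reading of "closest and not already assigned". A global assignment, such as a minimum-cost bipartite matching, could give better overall pairings, but the text does not suggest one.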

[0021] The voice rule synthesis processing (104) takes the dialogue table (211), the speaker table (212), the speaker data (113), and the text as input and generates a voice waveform. When the text of a dialogue is read aloud, the speaker index (1103) of that dialogue is obtained from the dialogue table (211). Next, the speaker table (212) is referenced with the obtained speaker index number, and the speaker data index number (1008) corresponding to that speaker index (1001) is retrieved. This determines the speaker data to be used for reading the dialogue aloud. The determined speaker data is retrieved from the speaker data storage means (113), and a voice waveform is generated. The generated voice waveform is converted into sound by the D/A device (114).
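The two-level lookup described above can be sketched in Python as follows: dialogue, then speaker index (1103), then speaker data index (1008), then speaker data (113). The tables are modeled as plain dictionaries purely for illustration. The `default` argument is a hypothetical fallback, for example a narrator voice for dialogues whose speaker was never determined; the text does not say which voice is used in that case.

```python
from typing import Mapping, Optional, Sequence, TypeVar

V = TypeVar("V")


def select_speaker_data(
    dialogue_index: int,
    dialogue_table: Mapping[int, Optional[int]],  # dialogue index -> speaker index (1103)
    speaker_table: Mapping[int, int],             # speaker index (1001) -> speaker data index (1008)
    speaker_data: Sequence[V],                    # stored speaker data (113)
    default: Optional[V] = None,
) -> Optional[V]:
    speaker_index = dialogue_table.get(dialogue_index)
    if speaker_index is None:
        return default  # hypothetical fallback for dialogues with no estimated speaker
    return speaker_data[speaker_table[speaker_index]]
```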

【００２２】[0022]

[Effects of the Invention] As described above, the present invention can provide a text-to-speech device for text to which no additional information has been attached. The device estimates the speakers of the dialogue portions appearing in the text and reads each speaker's dialogue aloud in a different voice quality, which helps the listener understand the text. Further, according to the present invention, reading the text aloud in a voice of different sound quality for each character attribute of the text helps the listener recognize those character attributes by ear.

[Brief description of drawings]

FIG. 1 is an example of the hardware configuration of the present invention.

FIG. 2 is an example of the processing flow of the present invention.

FIG. 3 is an example of input text of the present invention.

FIG. 4 is an example of the analysis result of the speaker analysis processing of the present invention.

FIG. 5 is an example of the utterance verb database of the present invention.

FIG. 6 is an example of the personal pronoun database of the present invention.

FIG. 7 is an example of the personal name attribute database of the present invention.

FIG. 8 is an example of the sentence-ending expression attribute database of the present invention.

FIG. 9 is an example of the dialect database of the present invention.

FIG. 10 is an example of the age identification word database of the present invention.

FIG. 11 is an example of the speaker table of the present invention.

FIG. 12 is an example of the dialogue table of the present invention.

FIG. 13 is an example of the processing flow of the speaker analysis processing of the present invention.

FIG. 14 is an example of the processing flow of the dialogue table setting processing of the present invention.

FIG. 15 is an explanatory diagram of the speaker estimation method of the present invention.

FIG. 16 is an explanatory diagram of the speaker estimation method of the present invention.

FIG. 17 is an explanatory diagram of the subject estimation method of the present invention.

FIG. 18 is an example of the processing flow of the dialogue attribute registration processing of the present invention.

FIG. 19 is an explanatory diagram of the speaker estimation method of the present invention.

FIG. 20 is an example of the processing flow of the speaker table registration processing of the present invention.

FIG. 21 is an example of the processing flow of the speaker estimation processing based on dialogue attributes of the present invention.

FIG. 22 is an example of the processing flow of the speaker estimation processing based on the sequence of dialogues of the present invention.

FIG. 23 is an example of the processing flow of the speaker estimation processing based on the sequence of dialogues of the present invention.

FIG. 24 is an example of the processing flow of the manual dialogue speaker designation processing of the present invention.

FIG. 25 is an example of the speaker data attributes of the present invention.

[Explanation of symbols]

100 ... text data, 101 ... morphological analysis processing, 102 ... speaker analysis processing, 103 ... speaker data allocation processing, 104 ... voice rule synthesis processing, 106 ... utterance verb DB, 107 ... personal pronoun DB, 108 ... personal name attribute DB, 109 ... sentence-ending expression attribute DB, 110 ... dialect DB, age identification word DB, 112 ... speaker data attributes, 113 ... speaker data, 114 ... D/A device, 120 ... control program, 180 ... chapter title, 181 ... narrative text, 182 ... dialogue, 183 ... annotation, 191 ... text with speaker information, 192 ... speaker attribute information, 193 ... speaker attribute, 201 ... CPU, 202 ... memory, 203 ... non-volatile memory, 204 ... speaker data storage device, 205 ... program storage ROM, 206 ... text data input device, 207 ... user operation input device, 211 ... dialogue table, 212 ... speaker table, 213 ... work area, 401 ... word headword, 402 ... gender attribute, 403 ... person, 404 ... age attribute, 405 ... dialect attribute, 406 ... age attribute, 1001 ... speaker index number, 1002 ... speaker name column, 1003 ... first-person expression column, 1004 ... dialect column, 1005 ... gender column, 1006 ... sentence-ending expression column, 1007 ... age column, 1008 ... speaker data index, 1101 ... dialogue index number, 1102 ... dialogue connection flag, 1103 ... speaker index number, 1201 ... dialogue table registration processing, 1202 ... speaker table registration processing, 1203 ... speaker estimation processing based on dialogue attributes, 1204 ... speaker estimation processing based on the sequence of dialogues, 1205 ... manual dialogue speaker designation processing, 1301 ... variable initialization processing, 1302 ... dialogue portion detection processing, 1303 ... subject extraction processing, 1304 ... subject extraction result determination processing, 1305 ... detected subject saving processing, 1306 ... utterance verb check processing, 1307 ... in-sentence dialogue detection processing, 1308 ... dialogue attribute registration processing, 1309 ... dialogue table speaker column setting processing, 1310 ... dialogue detection processing, 1311 ... speaker variable setting processing, 1312 ... speaker variable setting check, 1313 ... dialogue table speaker column setting processing, 1314 ... processing end check, 1351 ... example of pattern 1, 1352 ... example of pattern 2, 1353 ... example of pattern 3, 1454 ... example of pattern 4, 1371 ... dialogue 1, 1372 ... dialogue 2, 1401 ... dialogue table first-person column setting processing, 1402 ... dialogue table sentence-ending expression column setting processing, 1403 ... dialogue table dialect column setting processing, 1404 ... consecutive dialogue appearance check processing, 1405 ... dialogue table dialogue connection flag column setting processing 1, 1406 ... dialogue table dialogue connection flag column setting processing 2, 1407 ... dialogue table gender column setting processing, 1408 ... dialogue table age column setting processing, 1501 ... valid speaker name setting check processing, 1502 ... speaker table registration check processing, 1503 ... speaker table blank setting processing, 1504 ... speaker setting processing for the speaker table, 1505 ... speaker registration completion check processing, 1601 ... processing for searching the speaker table by the speaker name associated with a dialogue, 1602 ... processing for searching the speaker table by matching the attributes associated with a dialogue, 1603 ... processing for setting the speaker index number of a dialogue, 1604 ... speaker candidate narrowing check processing, 1605 ... processing for setting the speaker index number of a dialogue, 1606 ... processing for checking whether all dialogues have been processed, 1701 ... initialization of the index number of the dialogue to be processed, 1702 ... speaker candidate search processing, 1703 ... processing for checking whether a dialogue with a speaker index set is adjacent before the current dialogue, 1704 ... processing for removing the speaker of the adjacent dialogue from the speaker candidates of the current dialogue, 1705 ... processing for checking whether a dialogue with a speaker index set is adjacent after the current dialogue, 1706 ... processing for removing the speaker of the adjacent dialogue from the speaker candidates of the current dialogue, 1707 ... speaker candidate count check processing, 1708 ... processing for removing speaker candidates other than the subjects of the narrative text before and after the dialogue, 1709 ... speaker candidate count check processing, 1710 ... processing for registering the speaker index in the dialogue table, 1711 ... processing for updating the index number of the dialogue to be processed, 1712 ... processing end check, 1713 ... processing for checking whether any speaker index setting has changed, 1751 ... dialogue 1, 1752 ... narrative text before the dialogue, 1753 ... preceding adjacent dialogue, 1754 ... dialogue being processed, 1755 ... following adjacent dialogue, 1756 ... narrative text after the dialogue, 1757 ... dialogue 2, 1801 ... initialization of the index number of the dialogue to be processed, 1802 ... processing for checking whether a speaker index is set for the dialogue to be processed, 1803 ... processing for displaying the text around a dialogue and the speaker candidates so that a speaker can be selected manually, 1804 ... processing for accepting the user's speaker designation input, 1805 ... processing end check.

Continuation of the front page (72) Inventor: Toshiyuki Aritsuka, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims

[Claims]

1. A text-to-speech device comprising: speaker data storage means for storing a plurality of sets of speaker data; speaker estimation means for estimating the speakers in a text; and voice rule synthesis means for generating and outputting rule-synthesized speech while switching speakers according to the estimated speaker information. The device distinguishes parts of the text by synthesized voices of a plurality of speakers, so that the structure of the text, such as conversations, is presented in an easy-to-understand manner.

2. A speaker estimation method that separates the dialogue in a text from the narrative text by means of the dialogue symbols in the text. The method then estimates the speaker of each part of the text from one or more of the following pieces of information: the personal pronouns appearing in the dialogue, the sentence-ending expressions of the dialogue, the dialect appearing in the dialogue, and the subjects and utterance verbs of the narrative sentences before and after the dialogue.

3. The text-to-speech device according to claim 1, comprising: classifying means for classifying the text into titles, comments, and normal sentences according to the line feeds and symbols appearing in the text; and voice rule synthesis means capable of reading the text aloud with a different voice quality for each class. The voice quality is varied by using a different speaker, by changing the utterance style (such as speaking rate, voice pitch, and intonation pattern), or by changing both the speaker and the utterance style. The device thereby presents the structure of the text in an easy-to-understand manner.