JP2003208191A

JP2003208191A - Speech synthesis system

Info

Publication number: JP2003208191A
Application number: JP2002005501A
Authority: JP
Inventors: Atsushi Shoji; 厚志庄司; Kimimasa Horio; 公正堀尾; Takeo Mori; 竹雄森; Michihiko Seo; 充彦瀬尾
Original assignee: Hitachi ULSI Systems Co Ltd
Current assignee: Hitachi Solutions Technology Ltd
Priority date: 2002-01-15
Filing date: 2002-01-15
Publication date: 2003-07-25

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis system in which, when a document is created on the premise that it will be read aloud by speech synthesis, the writer can easily specify and customize how it is read, and speech is synthesized as the writer specified when the document is read aloud.

SOLUTION: The speech synthesis system comprises a document editing device 100 and a document output device 200. The document editing device 100 is provided with document editors 120 and 130, which accept prosody information indicating how a word or phrase is to be read along with document input and store the document data and the prosody information in data files. The document output device 200 has a speech synthesis means 240, which synthesizes speech from the document data on the basis of the generated data files 122, 132 and 133 and follows the prosody information wherever prosody information associated with the document data exists, and document editors 220 and 230, which can display only the document represented by the document data without displaying the prosody information.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a technique that is useful when applied to a system for creating documents on the premise that they will be read aloud by speech synthesis, and that is particularly useful, for example, when documents to be speech-synthesized are exchanged among a plurality of users.

[0002]

[Description of the Related Art] There is, for example, software that reads e-mail aloud by speech synthesis. When e-mail is exchanged with such software on the premise that it will be speech-synthesized, the sender types the message on a keyboard or the like and sends it, while the recipient listens to the received message as synthesized by a speech synthesis program. A speech synthesis program generally first performs Japanese-language analysis, determining which words make up each sentence and where the boundary between subject and predicate lies. It then synthesizes the speech for the whole text by consulting dictionary data that stores speech information for individual words and associating the linguistic information obtained from the analysis with the acoustic features of each word.

[0003]

[Problems to Be Solved by the Invention] However, in an e-mail system in which e-mail is exchanged on the premise of speech synthesis as described above, a new word that is not registered in the dictionary data is not pronounced correctly. In addition, even if the sender wants to customize the intonation of a phrase for the recipient, it is very difficult to do so.

[0004] An object of the present invention is to provide a speech synthesis system in which, when a document is created on the premise that it will be read aloud by speech synthesis, the writer can easily specify and customize how the document is read, and speech is synthesized as the writer specified when the document is read aloud. The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

[0005]

[Means for Solving the Problems] A representative aspect of the invention disclosed in this application is outlined as follows. In the document editing device, prosody information for words and phrases can be entered along with document input, and the document data and the prosody information are stored in a data file. In the document output device, when the document data is speech-synthesized, any phrase for which the data file contains prosody information is synthesized in accordance with that prosody information. With this arrangement, it becomes easy to create a document so that it is read aloud in the specified manner, and the document is read aloud exactly as specified.

[0006]

[Embodiments of the Invention] Preferred embodiments of the present invention are described below with reference to the drawings. FIG. 1 outlines a speech synthesis system according to an embodiment of the present invention. The speech synthesis system of this embodiment consists of an editing device 100 for inputting documents and an output device 200 for displaying documents and outputting them as speech. The editing device 100 and the output device 200 are implemented, for example, in software on separate personal computers. Of course, a single computer may provide the functions of both the editing device 100 and the output device 200.

[0007] The speech synthesis system of FIG. 1 provides three editing and output methods. The methods are independent of one another; it is not necessary to provide all three, and a system that provides any one of them functions as a speech synthesis system according to the present invention. Each method is described in turn below.

[0008] In the first method, prosody information is entered by inserting it directly into the document as text data, and the document is displayed and speech-synthesized from a data file containing both the prosody information and the document data. In this first method, the editing device 100 uses an editor 110 that functions as an input interface and as a data file generating means, and the output device 200 uses an editor 210 that has a document display function, among others. Document input can be performed with a general-purpose text editor 110 capable of entering text data, underlines, and the like. While entering a document with the editor 110, a user who uses a new word or phrase that is not registered in the speech synthesis dictionary, or who wants to customize the pronunciation of a word or phrase, enters prosody information directly into the document as text data, as described later.

[0009] When document input is finished, the text editor 110 generates a data file 112 in which the document data and the prosody information are combined in a text data format (which may be an extended text data format that also carries underline information, HTML (Hyper Text Mark-up Language), or the like). On the output device 200 side, the modules involved in the first method include a general-purpose text editor 210 that can handle data files in this text data format, and a Japanese language processing module 240 and a rule synthesis processing module 260 that serve as speech synthesis means for Japanese speech synthesis. The text editor 210, the Japanese language processing module 240, and the rule synthesis processing module 260 are software components run on a computer. When the data file 112 is passed to the output device 200, it is first input to the text editor 210, which displays the document. The displayed content, however, includes the prosody information as well as the document data.

[0010] The data file 112 is also sent as-is to the Japanese language processing module 240, which separates the prosody information inserted in the document from the document data and performs Japanese-language analysis on the document data. Reading-prosody symbols (data containing prosody information such as kana readings, accents, and word boundaries) are then extracted from the dictionary for each word according to grammar and phrasing, and the entire document data is converted into reading-prosody symbols, producing reading-prosody symbol data 250. During this conversion, wherever prosody information has been inserted in the document, that prosody information takes priority. Within the output device 200, the reading-prosody symbol data 250 is then sent to the rule synthesis processing module 260, which performs rule-based speech synthesis according to the reading-prosody symbols, and the synthesized speech is output from a speaker 270 or the like. With this editing and output method, the user can easily specify or customize how words and phrases are read while entering the document, and on the reader's side the data file is output as synthesized speech read the way the writer specified.
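The priority rule in paragraph [0010] can be illustrated with a minimal Python sketch. It is not part of the specification: the function name `to_reading_symbols`, the token representation (a surface form paired with an optional writer-supplied reading), and the fallback for unregistered words are all assumptions made for illustration. The delimiters "／" and "．" are the word and phrase delimiters that appear in the examples of FIGS. 2 and 3.

```python
WORD_DELIMITER = "／"    # word delimiter used in the reading-prosody symbols
PHRASE_DELIMITER = "．"  # phrase delimiter that closes a phrase


def to_reading_symbols(tokens, dictionary):
    """Convert tokens to a reading-prosody symbol string.

    tokens: list of (surface, user_reading) pairs; user_reading is None
            when the writer supplied no prosody information (assumed shape).
    dictionary: maps a surface form to its registered reading.
    """
    parts = []
    for surface, user_reading in tokens:
        if user_reading is not None:
            # Writer-supplied prosody information takes priority.
            parts.append(user_reading)
        else:
            # Otherwise use the dictionary; for an unregistered word this
            # sketch simply falls back to the surface form (an assumption).
            parts.append(dictionary.get(surface, surface))
    return WORD_DELIMITER.join(parts) + PHRASE_DELIMITER


dictionary = {"ｉ": "ア’イ", "モード": "モ’ード"}
print(to_reading_symbols([("ｉ", None), ("モード", None)], dictionary))
print(to_reading_symbols([("ｉモード", "ア’イモード")], dictionary))
```

The first call reproduces the dictionary-only result of FIG. 3(A); the second shows a writer-supplied reading replacing it.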

[0011] FIGS. 2 to 5 show several examples of entering prosody information in text form in the text editor 110 of the editing device 100. The example of FIG. 2 illustrates specifying the accent of a word. As shown in the figure, in ordinary speech synthesis processing a new word such as "おっはー" may be recognized as a single word, but the position of its accent cannot be determined, so it is converted into the unaccented reading-prosody symbols "オッハー．" as in FIG. 2(A). Here "．" is a phrase delimiter symbol. To place an accent on the new word "おっはー", an accent mark (for example "’") is inserted after the character to be accented, as in FIG. 2(B). The speech synthesis processing then converts the word into the reading-prosody symbols "オ’ッハー．", in which the accent position is the one specified by the user.
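As a rough illustration of how accent marks typed into the text could be separated from the document data, the following Python sketch strips the marks and records which characters they follow. The function name and the index-based representation of accent positions are assumptions, not taken from the specification, and the sketch ignores the possibility that the same character appears in the text as an ordinary apostrophe.

```python
ACCENT_MARK = "\u2019"  # the mark "’" used as the example accent mark


def split_accent_marks(text):
    """Return (plain_text, accent_positions).

    accent_positions holds the index, within plain_text, of each
    character that was immediately followed by an accent mark.
    """
    plain = []
    accents = []
    for ch in text:
        if ch == ACCENT_MARK:
            if plain:  # a mark at the very start has nothing to attach to
                accents.append(len(plain) - 1)
        else:
            plain.append(ch)
    return "".join(plain), accents


plain, accents = split_accent_marks("お\u2019っはー")
print(plain, accents)  # plain text without the mark, accent on index 0
```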

[0012] The example of FIG. 3 illustrates specifying a word boundary so that a phrase carries a single accent. In ordinary speech synthesis processing, a new word such as "ｉモード" is recognized as the two words "ｉ" and "モード", so it is converted into the reading-prosody symbols "ア’イ／モ’ード．", split into two words with two accents, as in FIG. 3(A). Here "／" is the word delimiter symbol. To have a phrase that would be recognized as several words treated as one word, the phrase is underlined, the underline serving as a decoration symbol, as in FIG. 3(B). The speech synthesis processing then converts it into the reading-prosody symbols "ア’イモード．", in which the phrase is one word as specified by the user.
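Paragraph [0009] mentions HTML as one possible text data format. Assuming, purely for illustration, that the underline is stored as an HTML `<u>` element, the underlined spans that are to be treated as single words could be recovered with Python's standard `html.parser` module as sketched below. The class and function names are hypothetical, and the sample sentence is made up.

```python
from html.parser import HTMLParser


class UnderlineSpans(HTMLParser):
    """Collect text runs together with a flag saying whether each is underlined."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth of <u> elements
        self.spans = []  # list of (text, underlined) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "u":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "u" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        self.spans.append((data, self.depth > 0))


def underlined_spans(html_text):
    parser = UnderlineSpans()
    parser.feed(html_text)
    parser.close()
    return parser.spans


# An underlined span would be passed on as one word with a single accent.
print(underlined_spans("<u>ｉモード</u>を使う"))
```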

[0013] The example of FIG. 4 adds a correction of the reading to the example of FIG. 3. In ordinary speech synthesis processing, a term such as "ＩＴ革命" is recognized as the two words "ＩＴ" and "革命", and "ＩＴ" is moreover given a wrong pronunciation such as "イット", so the term is converted into the reading-prosody symbols "イ’ット／カクメー" as in FIG. 4(A). To correct the reading of a word that is mispronounced, its kana reading is inserted in parentheses after the word, as in FIG. 4(B). The word is then speech-synthesized with the user-specified reading. In the example of FIG. 4(B) the term is also underlined so that it is recognized as one word, and the result is the reading-prosody symbols "アイティーカ’クメー". The first accent of "革命" could also be set by entering an accent mark, but here it is the accent of "革命" as registered in the dictionary as a word taking a prefix.
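The parenthesized reading of FIG. 4(B) can be pulled out of the text with a regular expression, as in the following Python sketch. Several simplifying assumptions are made here that the specification does not state: the parentheses are full-width, the reading is written in katakana, and the annotated word is a run of Latin letters (half-width or full-width), which sidesteps the need for a morphological analyzer to find where the word begins.

```python
import re

# Latin-letter word followed by a katakana reading in full-width parentheses.
READING = re.compile(r"([A-Za-zＡ-Ｚａ-ｚ]+)（([ァ-ヶー]+)）")


def extract_readings(text):
    """Return (plain_text, readings), where readings is a list of
    (word, reading) pairs and plain_text has the annotations removed."""
    readings = []

    def keep_word(match):
        readings.append((match.group(1), match.group(2)))
        return match.group(1)  # keep the word, drop the parenthesized reading

    plain = READING.sub(keep_word, text)
    return plain, readings


print(extract_readings("ＩＴ（アイティー）革命"))
```

Ordinary parenthetical remarks that happen to be in katakana after a Latin word would also match, so a real editor would need a less ambiguous notation or a confirmation step.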

[0014] The example of FIG. 5 illustrates specifying various prosody parameters for a word or phrase. To specify prosody parameters such as pitch, loudness, and duration (output time length) for a word or phrase, a symbol or keyword predefined for specifying prosody parameters is inserted in parentheses after the word or phrase, as in FIG. 5(イ) to (ホ). The predefined keywords include, for example, "早く" (faster) and "遅く" (slower) for duration, "高く" (higher) and "低く" (lower) for pitch, "大きく" (louder) and "小さく" (softer) for loudness, and "強調" (emphasis), which makes the sound both louder and longer. Prosody parameters are specified by attaching these keywords after a word during document input. When such a specification is present, the speech synthesis processing produces reading-prosody symbols with prosody parameters, as in FIG. 5(イ) to (ニ). In the figure, "S", "P", "F", and "V" denote the types of prosody parameter, and the numbers are the parameter values.

[0015] To set a prosody parameter for a phrase made up of several words, the phrase is enclosed in corner brackets and the prosody parameter keyword is inserted after it, as in FIG. 5(ホ). In this case the speech synthesis processing produces reading-prosody symbols in which the prosody parameter applies to all of the words. The "＞" in the figure is a symbol marking the end of the range over which the prosody parameter applies when several words are covered. In this way, prosody information can be entered during document input in the text editor 110.
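The bracketed-phrase notation of paragraph [0015] lends itself to a short Python sketch. The keywords are the ones listed in paragraph [0014]; everything else is an assumption made for illustration, including the function name, the use of full-width parentheses, the parameter names `duration`, `pitch`, and `volume`, and the relative step values. The specification does not define how the keywords map to the "S", "P", "F", and "V" parameter codes of FIG. 5, so no such mapping is attempted.

```python
import re

# Keyword -> relative adjustment (hypothetical names and step sizes).
KEYWORDS = {
    "早く": {"duration": -1},
    "遅く": {"duration": +1},
    "高く": {"pitch": +1},
    "低く": {"pitch": -1},
    "大きく": {"volume": +1},
    "小さく": {"volume": -1},
    "強調": {"volume": +1, "duration": +1},  # louder and longer
}

# A phrase in corner brackets followed by a keyword in full-width parentheses.
PHRASE = re.compile(r"「([^「」]+)」（(" + "|".join(KEYWORDS) + r")）")


def extract_phrase_parameters(text):
    """Return (plain_text, found), where found lists (phrase, adjustments)."""
    found = []

    def strip_markup(match):
        found.append((match.group(1), KEYWORDS[match.group(2)]))
        return match.group(1)  # keep only the phrase itself

    return PHRASE.sub(strip_markup, text), found


print(extract_phrase_parameters("「今日は晴れ」（強調）です"))
```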

[0016] Next, the second method is described. In the second method, prosody information is entered during document input by selecting a command, which is distinct from the way document text is entered, and the document data and the prosody information are handled as a dedicated data file in which they are held in separate forms. In this second method, the editing device 100 uses a dedicated editor 120 that functions as an input interface and as a data file generating means, and the output device 200 uses a dedicated editor 220 that functions as an output interface.

[0017] FIG. 6 shows a screen image of prosody information being entered in the dedicated editor 120. The dedicated editor 120 is much like a general-purpose document editing module that, in addition to document input, can attach various kinds of extended information such as formatting and fonts. In the dedicated editor 120 of this embodiment, the extended information that can be attached to document data further includes prosody information for words and phrases. As shown in FIG. 6, prosody information is entered by selecting the word or phrase to which it is to be attached and then choosing the prosody information insertion command 126 from the menu 125 of the dedicated editor 120. Reading-prosody symbols such as those shown in FIGS. 2 to 5 are then entered in the balloon 127 that appears when the command is selected.

[0018] Instead of entering reading-prosody symbols directly in the balloon 127, prosody information may be entered using a notation defined to be easy for users to understand. The way the command is selected and the display used for entering prosody information (the balloon 127) can also be changed in various ways. When document input in the dedicated editor 120 is complete, the dedicated editor 120 stores the document data (mixed kanji and kana text) and the prosody information in a data file in separate forms, with each item of prosody information associated with the phrase it belongs to, generating a dedicated data file 122.

[0019] On receiving the data file 122, the output device 200 can display only the document, with the prosody information omitted, using the dedicated editor 220, which supports the format of the data file 122. A function that displays prosody information in a special way may also be added, for example showing the prosody information attached to a phrase in a balloon when that phrase is selected. In the output device 200, the data file 122 is also sent to the Japanese language processing module 240, which performs Japanese-language analysis of the document, extracts the reading-prosody symbols of each word in relation to the linguistic information, and converts the entire document data into reading-prosody symbols, generating reading-prosody symbol data 250. During this conversion, for any phrase associated with prosody information in the data file 122, that prosody information takes priority. The reading-prosody symbol data 250 is then sent to the rule synthesis processing module 260, which performs rule-based speech synthesis according to the reading-prosody symbols, and the synthesized speech is output from the speaker 270 or the like.

[0020] As described above, in the second method too, the user can easily specify or customize how words and phrases are read while entering the document, and on the reader's side the data file can be output as synthesized speech read the way the writer specified. Moreover, because the prosody information indicating the reading need not be displayed when the document is shown on the reader's side, the document also looks natural, which is desirable.

[0021] The third method differs from the second in the format of the data file; in other respects, such as the way prosody information is entered, it is the same as the second method. In this third method, the editing device 100 uses a dedicated editor 130 that functions as an input interface and as a data file generating means, and the output device 200 uses a dedicated editor 230 that functions as an output interface. Document data and prosody information are entered with the dedicated editor 130 in the same way as shown in FIG. 6. The dedicated editor 130, however, generates two data files: a text data file 132, which stores the document data, formatting information, and the like, and a prosody data file 133, which stores the prosody information in association with the phrases of the document data.
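The two-file arrangement of the third method can be sketched in Python as follows. The file naming scheme, the JSON encoding of the prosody file, and the function names are hypothetical; the specification only says that the text and the prosody information are stored in separate files, with the prosody information associated with phrases of the document.

```python
import json
from pathlib import Path


def save_document(directory, name, text, prosody):
    """Write the text file and the prosody file side by side.

    prosody: list of dicts such as {"start": 0, "end": 4, "symbols": "..."}
    (an assumed representation of the phrase association).
    """
    directory = Path(directory)
    text_path = directory / f"{name}.txt"
    prosody_path = directory / f"{name}.prosody.json"
    text_path.write_text(text, encoding="utf-8")
    prosody_path.write_text(json.dumps(prosody, ensure_ascii=False), encoding="utf-8")
    return text_path, prosody_path


def load_for_display(text_path):
    """Display needs only the text file."""
    return Path(text_path).read_text(encoding="utf-8")


def load_for_synthesis(text_path, prosody_path):
    """Synthesis reads both files."""
    text = Path(text_path).read_text(encoding="utf-8")
    prosody = json.loads(Path(prosody_path).read_text(encoding="utf-8"))
    return text, prosody
```

One consequence of this split is that the text file remains readable by software that knows nothing about prosody, at the cost of having to keep the two files together and in sync.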

[0022] On receiving these two data files 132 and 133, the output device 200 processes the text data file 132 with the dedicated editor 230 to display the document, and inputs both the text data file 132 and the prosody data file 133 to the Japanese language processing module 240 to perform speech synthesis in the same way as in the second method. The synthesized speech is likewise output from the speaker 270 or the like. As described above, with the speech synthesis system of the embodiment, when a document is written on the premise that it will be read aloud by speech synthesis, the writer can easily specify the intended reading while entering the document, and on the reader's side speech is synthesized in accordance with that specification.

[0023] The example of FIG. 1 shows a system configuration in which the writer's side has the editing device 100 and the reader's side has the output device 200. In general, however, the system is configured so that both the writer's side and the reader's side have an editing device 100 and an output device 200, allowing documents for speech synthesis to be exchanged in both directions. With such a configuration, users can also have their own documents synthesized and check them by ear while writing the document and specifying readings.

[0024] The invention made by the present inventors has been described above in specific terms on the basis of an embodiment, but it goes without saying that the present invention is not limited to that embodiment and can be modified in various ways without departing from its gist. For example, the speech synthesis technique described above performs Japanese-language analysis of the document, converts it into reading-prosody symbols using dictionary data, and then synthesizes speech from the reading-prosody symbols by rule; however, any speech synthesis method may be used as long as it reflects the prosody information entered by the writer. Likewise, the ways of entering prosody information shown specifically in the embodiment can be varied in many ways, as long as the user can enter the information easily along with document input.

[0025] The above description has dealt mainly with a speech synthesis system used where documents are exchanged among a plurality of users on the premise that they will be output as synthesized speech, which is the field of application that formed the background of the invention. The present invention is not limited to this, however, and can be used widely, for example in a speech synthesis system used to create speech synthesis signals or synthesized speech data.

[0026]

[Effects of the Invention] The effect obtained by a representative aspect of the invention disclosed in this application is briefly as follows. According to the present invention, when a document is written on the premise that it will be read aloud by speech synthesis, the writer can easily specify the intended reading while entering the document, and on the reader's side speech can be synthesized with the reading the writer specified.

[Brief description of drawings]

FIG. 1 is a schematic diagram illustrating a speech synthesis system according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating a first example of entering prosody information in text form.

FIG. 3 is a diagram illustrating a second example of entering prosody information in text form.

FIG. 4 is a diagram illustrating a third example of entering prosody information in text form.

FIG. 5 is a diagram illustrating a fourth example of entering prosody information in text form.

FIG. 6 is a display example showing prosody information being entered in a dedicated editor of the speech synthesis system.

[Explanation of symbols]

100 editing device
110 text editor
112 data file
120, 130 dedicated editors
122, 132, 133 data files
200 output device
220, 230 dedicated editors
240 Japanese language processing module
260 rule synthesis processing module

Continuation of front page

(72) Inventor: Kimimasa Horio, c/o Hitachi ULSI Systems Co., Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Takeo Mori, c/o Hitachi ULSI Systems Co., Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Michihiko Seo, c/o Hitachi ULSI Systems Co., Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo

F-term (reference): 5D045 AB02

Claims

[Claims]

1. A speech synthesis system comprising: a document editing device having an input interface for inputting, along with document input, prosody information indicating how a word or phrase is to be read, and a data file generating means for storing the input document data in a data file and storing the prosody information in a data file in association with the document data; and a document output device having a speech synthesis means that synthesizes speech from the document data on the basis of the data file of the document data and the prosody information and, where prosody information associated with the document data exists, synthesizes speech in accordance with that prosody information, and an output interface capable of displaying only the document represented by the document data, without displaying the prosody information, on the basis of the data file of the document data and the prosody information.

2. A speech synthesis system comprising: a document editing device having an input interface for inputting, along with document input and in the same format as the document input, prosody information indicating how a word or phrase is to be read, and a data file generating means for storing the input document data and the prosody information together in a data file in a text data format; and a document output device having a speech synthesis means that, on the basis of the data file of the document data and the prosody information, separates the document data from the prosody information, synthesizes speech from the document data, and, for any word or phrase to which prosody information has been added, synthesizes speech in accordance with that prosody information.

3. The speech synthesis system according to claim 2, wherein the prosody information includes accent delimiter information that is input in the input interface by adding a decoration symbol to a word or phrase and that causes the span to which the decoration symbol is continuously applied to be treated as a word having a single accent.

4. The speech synthesis system according to claim 2 or 3, wherein the prosody information includes accent information that is input in the input interface by adding an accent mark to a character and that indicates that the character bearing the accent mark is to be accented.

5. The speech synthesis system according to claim 2, wherein the prosody information includes prosody parameter information that sets the prosody with which a word or phrase is pronounced in accordance with a prosody parameter, and the prosody parameter information is input in the input interface by adding the prosody parameter before or after the word or phrase.