JP2002108378A

JP2002108378A - Document reading-aloud device

Info

Publication number: JP2002108378A
Application number: JP2000302679A
Authority: JP
Inventors: Tomohiro Takano; 智大高野; Shunichi Yonemura; 俊一米村; Yasuhito Kono; 泰人河野; Mio Hosoya; 未生細谷; Tomoya Kosaka; 朋也小阪
Original assignee: Nippon Telegraph and Telephone Corp; Nippon Telegraph and Telephone East Corp
Current assignee: Nippon Telegraph and Telephone Corp; Nippon Telegraph and Telephone East Corp
Priority date: 2000-10-02
Filing date: 2000-10-02
Publication date: 2002-04-10

Abstract

PROBLEM TO BE SOLVED: To provide a document reading-aloud device that makes listening more entertaining and convenient by changing the voice quality, the document content, the intonation, and the speech synthesis engine according to, for example, personal information about the document creator or information about the area in which the reading-aloud device is used. SOLUTION: The document reading-aloud device, which converts a text document into speech and reads an electronic document aloud, comprises: a creator information extraction part 3 that extracts, for the object document to be read aloud, at least one item of information indicating its creator, the creator's sex, hometown, or address, or the language of the document; a sound quality determination part 4 that determines at least one of the pitch, strength, and speaking rate of the synthesized voice, and the synthesized-voice dictionary, to be used when the object document 2 is read aloud; a voice synthesizing engine 1 that generates the synthesized voice of the object document 2 according to the sound quality determined by the sound quality determination part 4; and a synthesized voice output part 5 that outputs the synthesized voice of the voice synthesizing engine 1 to a speaker 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to a document reading-aloud device that converts a digitized text document into speech and reads it aloud.

[0002]

Description of the Related Art: Electronic document reading-aloud devices based on text-to-speech synthesis have attracted attention as a function to be implemented on personal computers. With such a device, the user can check the content of an electronic document by ear. Information that previously had to be checked on a display can therefore be checked directly by listening, which broadens the ways in which a user can use a personal computer. For example, if this function is implemented on a mobile computer, the user can check the content of an e-mail outdoors simply by wearing in-ear headphones, without having to open the display.

[0003]

Problems to Be Solved by the Invention: However, devices that play back text-synthesized speech generally give the reading insufficient intonation and always read in the same voice quality, which detracts from the entertainment value for the listener. Furthermore, when text in a language that the synthesis engine does not support (for example, English) is read aloud, it is read incorrectly, which reduces convenience for the listener.

[0004] One way to make listening more entertaining is for the document creator to use a synthesized-voice quality editing system to edit the voice quality of the synthesized speech (pitch, speaking rate, and loudness), its intonation, or part of the spoken content. For example, such an editing system can raise the pitch to make the voice sound feminine, add intonation to give sentences emotion, or partly change the spoken content so that the voice speaks in a dialect, thereby adding entertainment value for the listener. With current document reading-aloud devices, however, listeners have to change the voice quality themselves for every document they want read aloud. Such cumbersome operation markedly reduces the convenience that a document reading-aloud device should inherently offer, so it could not become a general means of making listening more entertaining.

[0005] Furthermore, devices that read text aloud with synthesized speech generally could not determine the language of the document (for example, English) and switch the synthesis engine accordingly. For example, when a sentence written in English is read by a Japanese synthesis engine, it is not read correctly, which reduces convenience for the listener.

[0006] As described above, existing document reading-aloud devices have the problem that the tone of the reading never varies, which markedly detracts from the listener's enjoyment. They also apply the Japanese synthesis engine as-is even to English documents, so such documents are not read correctly.

[0007] An object of the present invention is to provide a document reading-aloud device that improves the listener's enjoyment and convenience by changing the voice quality, the document content, the intonation, or the speech synthesis engine on the basis of any of the following, for example: personal information about the document creator, information about the area in which the reading-aloud device is used, or personal information about the person listening to the document being read.

[0008]

Means for Solving the Problems: To solve the above problems, the invention according to claim 1 is a document reading-aloud device that converts a text document into speech and reads an electronic document aloud, characterized by comprising: creator information extraction means for extracting, for a document to be read aloud, at least one item of information indicating its creator, the creator's sex, hometown, or address, or the language of the document; sound quality determination means for determining, on the basis of the information obtained from the creator information extraction means, at least one of the pitch, strength, and speaking rate of the synthesized speech, and the synthesized-voice dictionary, to be used when the document is read aloud; a speech synthesis engine that generates synthesized speech of the document on the basis of the sound quality determined by the sound quality determination means; and synthesized sound output means for outputting the synthesized speech of the speech synthesis engine to an audio device.

[0009] The invention according to claim 2 is a document reading-aloud device that converts a text document into speech and reads an electronic document aloud, characterized by comprising: a word conversion database that shows the correspondence between specific words and the words to be used when those words are actually read aloud; document conversion means for converting the document to be read aloud, and outputting the result, by using the word conversion database to replace the words to be converted in that document; a speech synthesis engine that generates synthesized speech of the document output by the document conversion means; and synthesized sound output means for outputting the synthesized speech of the speech synthesis engine to an audio device.
The invention according to claim 3 is the document reading-aloud device according to claim 2, further comprising creator information extraction means for extracting, from the information on the creator, sex, address, hometown, and language of the document to be read aloud, information that includes at least the address or hometown of the document creator, wherein the document conversion means converts and outputs the document to be read aloud by using the word conversion database to replace the words to be converted in that document according to the hometown or address of the document creator.
The invention according to claim 4 is the document reading-aloud device according to claim 2 or 3, further comprising intonation conversion data storage means for storing words to be read aloud according to the address or hometown of the document creator together with the voice intonation information to be added to them, wherein the speech synthesis engine has a function of converting the intonation of the synthesized speech with reference to the intonation conversion data storage means.
The invention according to claim 5 is the document reading-aloud device according to claim 2, further comprising listener information extraction means for extracting at least the address or hometown of the person who listens to the document being read, wherein the document conversion means converts and outputs the document to be read aloud by using the word conversion database to replace the words to be converted in that document according to the hometown or address of the listener.
The invention according to claim 6 is the document reading-aloud device according to claim 5, further comprising intonation conversion data storage means for storing words to be read aloud according to the address of the listener together with the voice intonation information to be added to them, wherein the speech synthesis engine has a function of converting the intonation of the synthesized speech with reference to the intonation conversion data storage means.

[0010] The invention according to claim 7 is a document reading-aloud device that converts a text document into speech and reads an electronic document aloud, characterized by comprising: creator information extraction means for extracting, for a document to be read aloud, at least one item of information indicating its creator, the creator's sex, address, or hometown, or the language of the document; a speech synthesis engine group that selects an appropriate speech synthesis engine from a plurality of speech synthesis engines according to the information obtained by the creator information extraction means and generates synthesized speech; and synthesized sound output means for outputting the synthesized speech output from the speech synthesis engine group to an audio device.

[0011] The invention according to claim 8 is a computer-readable recording medium on which is recorded a program executed in the document reading-aloud device according to any one of claims 1 to 7.

[0012] In the above configuration, the invention according to claim 1 extracts personal information about the document creator and changes the voice quality of the reading on the basis of that information. Thus, when the device is used to read e-mail aloud, for example, it can lower the voice for mail from men and raise it for mail from women. Such operation makes the synthesized speech more entertaining for the listener.

[0013] The invention according to claim 2 converts specific words contained in the document to be read into other words. For example, if the document contains the word "arigatō" (ありがとう, "thank you"), it is converted into "ōkini" (おおきに, the Kansai-dialect equivalent). The synthesized speech can thus read words in the dialect of the area where the device is used, making listening more entertaining. The invention according to claim 3, building on claim 2, extracts personal information about the document creator and partially changes the document on the basis of that information. For example, the document can be adapted to the dialect of the creator's hometown and then read aloud. Such operation makes the synthesized speech more entertaining for the listener. The invention according to claim 4, building on claim 3, changes the intonation of the reading on the basis of personal information about the document creator. For example, the intonation can be adapted to the dialect of the creator's hometown, making the synthesized speech still more entertaining for the listener.

[0014] The invention according to claim 5 extracts personal information about the listener and partially changes the document on the basis of that information. For example, the document can be adapted to the dialect of the listener's hometown and then read aloud. Such operation makes the synthesized speech more entertaining for the listener. The invention according to claim 6, building on claim 5, changes the intonation of the reading on the basis of personal information about the listener. For example, the intonation can be adapted to the dialect of the listener's hometown, making the synthesized speech still more entertaining for the listener.

[0015] The invention according to claim 7 extracts personal information about the document creator and changes the synthesis engine on the basis of that information. For example, the speech synthesis engine can be changed to match the dialect of the listener's hometown. Such operation makes the synthesized speech more entertaining for the listener. In addition, when the document creator writes in a foreign language (for example, English), the device can switch to a speech synthesis engine that supports that language. Such operation also makes listening more convenient.

[0016]

Embodiments of the Invention: The embodiments of the present invention can be classified into a first mode that changes the voice quality, a second mode that changes the document content, a third mode that changes both the document content and the intonation of the reading, and a fourth mode that changes the speech synthesis engine. The first mode corresponds to claim 1, the second mode to claims 2, 3, and 5, the third mode to claims 4 and 6, and the fourth mode to claim 7.

[0017] Embodiment 1: FIG. 1 is a block diagram of the first mode (changing the voice quality) of the present invention. This embodiment is described as an e-mail reading-aloud device. The document to be read aloud 2 is the body of an e-mail received by e-mail software. The creator information extraction unit 3 extracts personal information about the sender of the e-mail. The sender's name can be extracted automatically from the e-mail header. The sender's address can be extracted automatically by referring to the signature part of the e-mail, or entered manually in advance by the e-mail user in an address book. Other information, such as the sender's sex, can also be extracted by storing information that the e-mail user has entered manually in the address book in advance.

[0018] The sound quality determination unit 4 determines the pitch, speaking rate, loudness, and voice dictionary from the information provided by the creator information extraction unit 3. For example, on the basis of the sex information, it switches to a male voice dictionary for a male sender and to a female voice dictionary for a female sender. It is also possible to store in the creator information unit 3, for each document creator, settings for pitch, speaking rate, loudness, and voice dictionary that the user has entered manually in advance, and to apply those settings.
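A minimal sketch of the sound quality determination of [0018], assuming relative (unitless) pitch, rate, and volume values and hypothetical dictionary names. The sex of the sender selects a default voice, and per-creator settings entered by the user take precedence.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class VoiceSettings:
    pitch: float = 1.0           # relative pitch multiplier (assumed unit)
    rate: float = 1.0            # relative speaking rate
    volume: float = 1.0          # relative loudness
    dictionary: str = "neutral"  # hypothetical voice dictionary name


def determine_sound_quality(creator: dict,
                            overrides: Optional[Dict[str, dict]] = None) -> VoiceSettings:
    settings = VoiceSettings()
    # Default voice chosen from the creator's sex, as in the example of [0018].
    if creator.get("sex") == "male":
        settings = replace(settings, pitch=0.9, dictionary="male")
    elif creator.get("sex") == "female":
        settings = replace(settings, pitch=1.1, dictionary="female")
    # Values the user entered manually for this creator override the defaults.
    if overrides and creator.get("email") in overrides:
        settings = replace(settings, **overrides[creator["email"]])
    return settings
```

A frozen dataclass with `dataclasses.replace` keeps each step side-effect free, so the same defaults can be reused safely for every incoming mail.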

[0019] The speech synthesis engine 1 sets the parameters used to generate synthesized speech to the pitch, speaking rate, loudness, and voice dictionary determined by the sound quality determination unit 4, and converts the document to be read aloud 2 into synthesized sound data. The synthesized sound output unit 5 converts the synthesized sound data into an audio signal and outputs it, and the speaker 6, serving as the audio device, produces sound based on that audio signal.

[0020] In this and the following embodiments, each component that performs data processing, such as the document conversion unit and the speech synthesis engine, can be implemented with one or more computers (central processing units) and programs executed by them. The programs can be recorded on a computer-readable recording medium or distributed via a communication line.

[0021] Embodiment 2: FIGS. 2, 3, and 5 are block diagrams showing configurations of the second mode (changing the document content) of the present invention. Comparing the three, the configuration in FIG. 3 adds the creator information extraction unit 3 to the configuration in FIG. 2, and the configuration in FIG. 5 replaces the creator information extraction unit 3 of FIG. 3 with a listener information extraction unit 9. The description here focuses on an e-mail reading-aloud device based on the block diagram of FIG. 3 as an embodiment that changes the document content. In Embodiment 2 and the following embodiments, components that have the same basic function across embodiments, including Embodiment 1, are given the same reference numerals.

[0022] In FIG. 3, the document to be read aloud 2 is the body of an e-mail received by e-mail software. The creator information extraction unit 3 extracts personal information about the sender of the e-mail. The sender's name can be extracted automatically from the e-mail header. The sender's address can be extracted automatically by referring to the signature part of the e-mail, or entered manually in advance by the e-mail user in an address book. Other information, such as the document creator's birthplace, hometown, and sex, can also be extracted by storing information that the e-mail user has entered manually in the address book in advance.

[0023] The word conversion database 7 is a database for replacing the corresponding words in the document to be read aloud 2. FIG. 7 shows an example of the word conversion database. As the figure shows, the database assigns to a given word its dialect forms in each region.
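One possible in-memory layout for the word conversion database 7 is a two-level mapping from a standard word to its per-region forms. The layout and the region label are assumptions; only the Kansai entry comes from the example in [0024], and the actual contents of FIG. 7 are not reproduced.

```python
from typing import Dict

# Hypothetical layout: standard word -> {region: dialect form}.
# Only the Kansai entry is taken from the example in [0024].
WORD_CONVERSION_DB: Dict[str, Dict[str, str]] = {
    "ありがとう": {"Kansai": "おおきに"},
}


def lookup(word: str, region: str,
           db: Dict[str, Dict[str, str]] = WORD_CONVERSION_DB) -> str:
    """Return the regional form of word, or the word itself if none is registered."""
    return db.get(word, {}).get(region, word)
```

Falling back to the original word means that regions or words missing from the table leave the text unchanged instead of raising an error.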

[0024] On the basis of information extracted by the creator information extraction unit 3 or information set in advance, the document conversion unit 8 converts those words in the document to be read aloud 2 that can be converted into the dialect of the document creator's hometown or current region. For example, if the document to be read is "今日はメイルをくれてありがとう。" ("Thank you for your mail today.") and the sender is from or lives in the Kansai region, the document is converted into "今日はメイルをくれておおきに。", which uses the Kansai-dialect "ōkini" in place of "arigatō".
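The conversion of [0024] can be sketched as substring replacement over the whole document. Japanese text has no spaces between words, so this sketch matches substrings and tries longer entries first. That is an assumed simplification; a practical system would more likely segment the text with a morphological analyzer so that a short entry is not replaced inside an unrelated longer word.

```python
import re
from typing import Dict


def convert_document(text: str, region: str, db: Dict[str, Dict[str, str]]) -> str:
    """Replace every word that has a registered form for the given region."""
    table = {word: forms[region] for word, forms in db.items() if region in forms}
    if not table:
        return text
    # Longer entries come first in the alternation, so they win over their prefixes.
    pattern = re.compile("|".join(re.escape(w) for w in sorted(table, key=len, reverse=True)))
    return pattern.sub(lambda m: table[m.group(0)], text)


DB = {"ありがとう": {"Kansai": "おおきに"}}  # hypothetical one-entry database
print(convert_document("今日はメイルをくれてありがとう。", "Kansai", DB))
```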

[0025] The speech synthesis engine 1 reads out the document converted by the document conversion unit 8 and converts it into synthesized sound data. The synthesized sound output unit 5 converts the synthesized sound data into an audio signal and outputs it, and the speaker 6 produces sound based on that audio signal.

[0026] The reading-aloud device shown in FIG. 2 (corresponding to the invention of claim 2) operates in the same way as described above, except that the creator information extraction unit 3 is not connected. This configuration is effective, for example, in a document reading-aloud device that uses an animated character. By converting the document to be read aloud 2 into the speech style of a particular animated character before it is read, the character's personality can be emphasized, making listening more entertaining.

[0027] In the reading-aloud device shown in FIG. 5 (corresponding to the invention of claim 5), the creator's regional information extracted by the creator information extraction unit 3 is not used. Instead, the document is converted to match the dialect of the hometown or location of the listener of the document 2, as extracted by the listener information extraction unit 9. The listener information extraction unit 9 extracts the information used for the conversion by referring to the birthplace, hometown, and sex in the listener's personal information registered in advance by, for example, the user of the reading-aloud device, and outputs it to the document conversion unit 8.

[0028] Embodiment 3: FIGS. 4 and 6 are block diagrams of the third mode (changing the document content and the intonation of the reading) of the present invention. This embodiment is described as an e-mail reading-aloud device based on the block diagram of FIG. 4. Embodiment 3 operates in the same way as Embodiment 2 except that an intonation conversion data storage unit 10 is connected. The intonation conversion data storage unit 10 is described below.

[0029] The intonation conversion data storage unit 10 stores correspondences between specific words and the voice intonation information to be added to them. These correspondences are classified by regional information and sex information and can be looked up by class. The speech synthesis engine 1 converts the document on the basis of the information extracted by the creator information extraction unit 3, and then adds intonation information to the document to be read aloud with reference to the correspondences stored in the intonation conversion data storage unit 10. For example, if the intonation conversion data storage unit 10 holds the correspondence "おおきに (ōkini) → raise the end of the word" and the document to be read aloud 2 is "今日はメイルをくれておおきに。", the speech synthesis engine 1 generates synthesized speech in which the end of "ōkini" is raised.
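A minimal sketch of how the intonation conversion data of [0029] might be applied: the text is split into segments, and each segment that matches a stored word is tagged with its intonation instruction, which a synthesis engine could then realize. The class key (region only here), the tag name `rise_at_end`, and the segment format are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical store: class (here a region) -> word -> intonation tag.
INTONATION_DB: Dict[str, Dict[str, str]] = {
    "Kansai": {"おおきに": "rise_at_end"},
}


def annotate_intonation(text: str, region: str,
                        db: Dict[str, Dict[str, str]] = INTONATION_DB
                        ) -> List[Tuple[str, Optional[str]]]:
    """Split text into (segment, intonation tag or None) pairs."""
    rules = db.get(region, {})
    if not rules:
        return [(text, None)]
    words = sorted(rules, key=len, reverse=True)
    # A capturing group makes re.split keep the matched words as separate items.
    pattern = re.compile("(" + "|".join(re.escape(w) for w in words) + ")")
    return [(part, rules.get(part)) for part in pattern.split(text) if part]


print(annotate_intonation("今日はメイルをくれておおきに。", "Kansai"))
```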

[0030] In the reading-aloud device of FIG. 6 (corresponding to the invention of claim 6), the creator's regional information extracted by the creator information extraction unit 3 is not used. Instead, the document is converted to match the dialect of the hometown or location of the listener, as extracted by the listener information extraction unit 9.

[0031] Embodiment 4: FIG. 7 is a block diagram of the fourth mode (changing the speech synthesis engine) of the present invention. This embodiment is described as an e-mail reading-aloud device based on the block diagram of FIG. 7.

[0032] The document to be read aloud 2 is the body of an e-mail received by e-mail software. The creator information extraction unit 3 extracts personal information about the sender of the e-mail. The sender's name can be extracted automatically from the e-mail header. The sender's address can be extracted automatically by referring to the signature part of the e-mail, or entered manually in advance by the e-mail user in an address book. A mechanism for determining the language in which the body text is written may also be provided. Other information, such as the sender's sex, can also be extracted by storing information that the e-mail user has entered manually in the address book in advance.

[0033] The speech synthesis engine group 1a consists of a plurality of speech synthesis engines a, b, c, ..., each built for a language such as Japanese or English. An appropriate speech synthesis engine is selected based on the information from the creator information extraction unit 3, and that engine reads out the document to be read aloud 2. For example, when the body of the electronic mail is in English, selecting an English-capable speech synthesis engine makes the synthesized speech more convenient for the listener. Alternatively, the speech synthesis engines a, b, c, ... can all support the same language (for example, Japanese), with synthesis parameters and the like set differently in advance. In that case, the Japanese engines can be switched randomly for each document creator. This varies the sound quality of the reading voice and makes the synthesized speech more entertaining for the listener.
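The engine selection in [0033] can be sketched as follows, assuming the creator information already includes a language code. The engine names and the `EngineSelector` class are hypothetical. So is the choice to remember each creator's randomly drawn engine, so that one sender keeps the same voice across messages. The source says only that the engines are switched randomly for each document creator.

```python
import random


class EngineSelector:
    """Picks a synthesis engine by language, randomly but stably per creator (hypothetical)."""

    def __init__(self, engines_by_language, seed=None):
        # e.g. {"ja": ["ja_a", "ja_b", "ja_c"], "en": ["en_a"]}
        self._engines = engines_by_language
        self._rng = random.Random(seed)
        self._assigned = {}  # (creator, language) -> chosen engine name

    def select(self, creator: str, language: str) -> str:
        candidates = self._engines.get(language)
        if not candidates:
            raise ValueError(f"no synthesis engine for language {language!r}")
        key = (creator, language)
        # Draw once per creator, then reuse, so each sender gets a consistent voice.
        if key not in self._assigned:
            self._assigned[key] = self._rng.choice(candidates)
        return self._assigned[key]
```

An English mail always maps to the single English engine. Japanese mails from different creators are spread across the Japanese engines, which gives the variation in sound quality described above.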

[0034] The embodiments of the present invention have been described above with reference to the drawings. However, the invention is not limited to these embodiments and can be modified as appropriate. For example, the document to be read aloud 2 is not limited to the body of an electronic mail. Other documents can be used in the same way, provided a predetermined conversion process can turn them into text information. Examples include a document in an application-specific format attached to an electronic mail and information that contains character data as image information. Also, the speech synthesis engine 1 need not be provided on a single personal computer together with the other data processing units (the document conversion unit 8 and the synthesized sound output unit 5), the word conversion database 7, and the intonation conversion data storage unit 10. These components may instead be distributed across wired and wireless networks.

[0035]

[Effects of the Invention] As described above, the present invention makes it possible to output synthesized speech while changing the sound quality, the document content, the intonation, or the speech synthesis engine. These changes can follow the personal information of the document creator, the region where the reading device is used, or the personal information of the listener. With these functions, the invention provides a document reading device that makes listening more entertaining and more convenient.

[0036] In the first embodiment of the present invention, a document can be read aloud with a different sound quality for each document creator. This makes the reading more entertaining for the listener.

[0037] In the second embodiment of the present invention, part of the document content can be changed before the synthesized speech is output. For example, a document reading device might use an animated character. Some words of the document can then be changed so that the reading suits the character's personality, which makes it more entertaining for the listener. The device can also be made to speak in a dialect that matches the user of the device, the region where the device is used, or the document creator's place of origin or location. This, too, makes the reading more entertaining for the listener.
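The word replacement of the second embodiment can be sketched as a lookup against a word conversion database. The table below, including its Kansai-dialect entries, is a hypothetical stand-in for the database of FIG. 8. The region key could come from the creator information (claim 3) or the listener information (claim 5). A table keyed by an animated character's name would work the same way.

```python
import re

# Hypothetical word conversion database: region -> {standard word: word to read aloud}.
WORD_CONVERSION_DB = {
    "kansai": {"ありがとう": "おおきに", "本当に": "ほんまに", "だめ": "あかん"},
}


def convert_document(text: str, region: str, db=WORD_CONVERSION_DB) -> str:
    """Replace conversion-target words according to the given region (sketch)."""
    table = db.get(region)
    if not table:
        return text  # no entry for this region: read the document unchanged
    # Try longer words first so a longer entry wins over a shorter one it contains.
    words = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return pattern.sub(lambda m: table[m.group(0)], text)
```

Plain substring matching is a simplification. Unsegmented Japanese text would normally be split into words by morphological analysis before lookup.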

[0038] The third embodiment of the present invention adds to the second embodiment a function that adjusts the intonation of the reading. This, too, makes the reading more entertaining for the listener.
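The intonation adjustment of the third embodiment could be sketched as follows. An intonation conversion data store maps a dialect word to a per-mora high/low pattern, and that pattern is then turned into pitch targets for the speech synthesis engine. The "H"/"L" notation, the 1.25 high-pitch ratio, and the stored pattern are placeholders, not a transcription of any real dialect accent.

```python
# Hypothetical intonation conversion data: region -> {word: per-mora H/L pattern}.
# The pattern below is a placeholder, not the actual accent of the word.
INTONATION_DB = {
    "kansai": {"おおきに": "LHHL"},
}


def intonation_for(word, region, base_hz, db=INTONATION_DB, high_ratio=1.25):
    """Return per-mora pitch targets in Hz, or None to keep the engine default."""
    pattern = db.get(region, {}).get(word)
    if pattern is None:
        return None  # no stored intonation: fall back to the engine's own prosody
    return [base_hz * high_ratio if mark == "H" else base_hz for mark in pattern]
```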

[0039] In the fourth embodiment of the present invention, a document can be read aloud with a different speech synthesis engine for each document creator. For example, when the document to be read is in English, switching to an English-capable speech synthesis engine makes the reading more convenient for the listener. In addition, using several Japanese-language speech synthesis engines varies the sound quality from one document creator to another, which makes the reading more entertaining for the listener.

[0040] Furthermore, combining the inventions of the first to fourth embodiments can further increase the listener's convenience and entertainment.

[0041] The reading device according to the present invention is expected to be applicable to, for example, dialect-capable speech synthesis software, personal computer application software that uses an interface agent, and game creation tools. In addition, the invention could be applied to an information terminal installed at a municipal office in a particular region. This would produce a device that gives voice guidance in that region's dialect.

[Brief description of the drawings]

FIG. 1 is a diagram showing an embodiment of the reading device according to the present invention (the invention according to claim 1).

FIG. 2 is a diagram showing another embodiment of the reading device according to the present invention (the invention according to claim 2).

FIG. 3 is a diagram showing another embodiment of the reading device according to the present invention (the invention according to claim 3).

FIG. 4 is a diagram showing another embodiment of the reading device according to the present invention (the invention according to claim 4).

FIG. 5 is a diagram showing another embodiment of the reading device according to the present invention (the invention according to claim 5).

FIG. 6 is a diagram showing another embodiment of the reading device according to the present invention (the invention according to claim 6).

FIG. 7 is a diagram showing another embodiment of the reading device according to the present invention (the invention according to claim 7).

FIG. 8 is a diagram showing an example of the word conversion database 7 shown in FIG. 3 and other figures.

[Description of Reference Numerals] 1: speech synthesis engine; 1a: speech synthesis engine group; 2: document to be read aloud; 3: creator information extraction unit; 4: sound quality determination unit; 5: synthesized sound output unit; 6: speaker; 7: word conversion database; 8: document conversion unit; 9: listener information extraction unit; 10: intonation conversion data storage unit

Continued from the front page: (72) Inventor: Yasuhito Kono, c/o Nippon Telegraph and Telephone East Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. (72) Inventor: Mio Hosoya, c/o Nippon Telegraph and Telephone East Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. (72) Inventor: Tomoya Kosaka, c/o Nippon Telegraph and Telephone East Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. F-terms (reference): 5B009 RD00; 5D045 AA08 AA09 AB04 AB26

Claims


1. A document reading device that converts a text document into speech and reads an electronic document aloud, comprising: creator information extraction means for extracting at least one item of information indicating the creator, gender, place of origin, or address of the document, or the language of the document; sound quality determination means for determining, based on the information obtained by the creator information extraction means, at least one of the pitch, loudness, and utterance speed of the synthesized speech, and the synthesized speech dictionary, used when reading out the document to be read aloud; a speech synthesis engine that generates synthesized speech of the document to be read aloud based on the sound quality determined by the sound quality determination means; and synthesized speech output means for outputting the synthesized speech of the speech synthesis engine to an audio device.

2. A document reading device that converts a text document into speech and reads an electronic document aloud, comprising: a word conversion database indicating the correspondence between a specific word and the word used when it is actually read aloud; document conversion means for converting and outputting the document to be read aloud by replacing conversion-target words in it using the word conversion database; a speech synthesis engine that generates synthesized speech of the document output by the document conversion means; and synthesized speech output means for outputting the synthesized speech of the speech synthesis engine to an audio device.

3. The document reading device according to claim 2, further comprising creator information extraction means for extracting, from among information on the creator, gender, address, and place of origin of the document and the language of the document, information that includes at least the address or place-of-origin information of the document creator, wherein the document conversion means uses the word conversion database to replace conversion-target words in the document to be read aloud according to the place-of-origin and address information of the document creator, thereby converting and outputting the document.

4. The document reading device according to claim 2, further comprising intonation conversion data storage means for storing words to be read aloud according to the address or place-of-origin information of the document creator, together with voice intonation information attached to those words, wherein the speech synthesis engine has a function of converting the intonation of the synthesized speech with reference to the intonation conversion data storage means.

5. The document reading device according to claim 2, further comprising listener information extraction means for extracting at least the address or birthplace information of the person who listens to the read-aloud document, wherein the document conversion means uses the word conversion database to replace conversion-target words in the document to be read aloud according to the birthplace and address information of the listener, thereby converting and outputting the document.

6. The document reading device according to claim 5, further comprising intonation conversion data storage means for storing words to be read aloud according to the address information of the listener of the document, together with voice intonation information attached to those words, wherein the speech synthesis engine has a function of converting the intonation of the synthesized speech with reference to the intonation conversion data storage means.

7. A document reading device that converts a text document into speech and reads an electronic document aloud, comprising: creator information extraction means for extracting at least one item of information indicating the creator, gender, address, or place of origin of the document, or the language of the document; a speech synthesis engine group that selects an appropriate speech synthesis engine from a plurality of speech synthesis engines according to the information obtained by the creator information extraction means and generates synthesized speech; and synthesized speech output means for outputting the synthesized speech from the speech synthesis engine group to an audio device.

8. A computer-readable recording medium on which is recorded a program to be executed by the document reading device according to claim 1.