JP3433868B2

JP3433868B2 - E-mail communication media conversion system

Info

Publication number: JP3433868B2
Application number: JP28976795A
Authority: JP
Inventors: 久子浅野; 芳史大山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-11-08
Filing date: 1995-11-08
Publication date: 2003-08-04
Anticipated expiration: 2015-11-08
Also published as: JPH09135264A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an e-mail communication media conversion system that, in e-mail communication, allows a text-only e-mail to be read aloud in synthesized speech with accurate readings and natural prosody.

[0002]

[Prior Art] Conventional e-mail communication systems generally transmit and receive text, graphics, audio, and the like faithfully as they are. Consequently, a text-only mail, for example, could not be listened to over the telephone while the user was out.

[0003] Meanwhile, text-to-speech conversion devices that convert text into synthesized speech are commercially available in both hardware and software forms, and some of them can read mail aloud in synthesized speech at the user terminal. At present, however, the word dictionaries used for assigning readings and prosody have small vocabularies, and the reading/prosody assignment rules are inadequate. As a result, the accuracy of the readings and the naturalness of the prosody are seriously deficient, and the speech was frequently unintelligible in places or hard to listen to.

[0004] FIG. 2 shows the general flow of this kind of text-to-speech conversion device. Reference numeral 200 denotes Japanese input text. The reading/prosody assignment unit 210 has a simple word dictionary and reading/prosody assignment rules; it takes the input text 200, assigns readings and prosody information to it, and outputs reading/prosody information 230. The reading/prosody information 230 is expressed in a simple text format and carries the readings together with prosody information such as accent phrase boundaries, the accent type of each accent phrase, pause positions, and pause lengths. The format of this information differs somewhat from one text-to-speech conversion device to another, but it can easily be converted into the format of each device. The synthesized speech generation unit 240 takes the reading/prosody information 230 as input, generates synthesized speech, and outputs synthesized speech 250.

[0005] In general, a text-to-speech conversion device accepts both text and reading/prosody information as input. When the input is text, the device takes the text 200 as input, runs the reading/prosody assignment unit 210 and then the synthesized speech generation unit 240, and outputs synthesized speech 250. When the input is reading/prosody information, the device takes the reading/prosody information 230 as input, proceeds directly to the synthesized speech generation unit 240, and outputs synthesized speech 250.
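
The two input paths described above can be sketched as follows. This is a minimal illustration, not the device's actual interface: the names `ProsodyInfo`, `text_to_speech`, `annotate`, and `synthesize` are hypothetical, with `annotate` standing in for unit 210 and `synthesize` for unit 240.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class ProsodyInfo:
    """Stand-in for reading/prosody information 230 (real format is device-specific)."""
    payload: str


def text_to_speech(
    source: Union[str, ProsodyInfo],
    annotate: Callable[[str], ProsodyInfo],
    synthesize: Callable[[ProsodyInfo], bytes],
) -> bytes:
    """Run unit 210 only for plain text; annotated input goes straight to unit 240."""
    info = source if isinstance(source, ProsodyInfo) else annotate(source)
    return synthesize(info)
```

Passing the two stages in as callables mirrors the point of the paragraph: the annotation stage can be skipped, or performed elsewhere, without changing the synthesis stage.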

[0006]

[Problems to Be Solved by the Invention] In conventional e-mail communication systems, a text mail (an e-mail consisting only of text) could not be listened to as speech. Even when a text-to-speech conversion device was used to read a text mail aloud, the accuracy of the readings and the naturalness of the prosody were problematic.

[0007] The present invention has been made to solve these problems of the prior art. Its object is to provide an e-mail communication media conversion system that allows text mail to be read aloud with correct readings and natural prosody, independently of the reading/prosody assignment capability of the text-to-speech conversion device in the user's terminal, and even on user terminals with small disk capacity such as portable terminals.

[0008] A further object of the present invention is to provide an e-mail communication media conversion system that allows text mail to be read aloud with correct readings and natural prosody, with almost no increase in the amount of data received by the user terminal, even when the e-mail is read aloud using the voice output device of the user terminal.

[0009]

[Means for Solving the Problems] To achieve the above objects, the present invention concerns an e-mail communication system comprising a mail communication network that delivers e-mail and a plurality of user terminals that are connected to the mail communication network and with which users send and receive e-mail. Connected to the mail communication network are a text conversion server, which adds to a text e-mail the reading/prosody information needed to generate synthesized speech with a voice output device, and a voice output server, which acts as the synthesized speech device and generates synthesized speech when the user so specifies. The text conversion server adds reading/prosody information to the text e-mail, and the voice output server, acting as the voice output device, generates the synthesized speech. When the user terminal contains a synthesized speech device, that device may be used as the voice output device in place of the voice output server, or the voice output server on the mail communication network and the synthesized speech device in the user terminal may each be used as needed.

[0010]

[Embodiments of the Invention] FIG. 1 shows the schematic configuration of one embodiment of the e-mail communication media conversion system of the present invention. Reference numeral 100 denotes a mail communication network; any mail communication network may be assumed, and mail is delivered by whatever method that network uses. Reference numeral 110 denotes a text conversion server that has a large-scale word dictionary and high-accuracy reading/prosody assignment rules and adds reading/prosody information to text mail. Reference numeral 120 denotes a voice output server used to output synthesized speech when the user terminal has no speech synthesizer or when the user wants to hear high-quality speech. Reference numeral 130 denotes the user terminal of a user of e-mail, 140 denotes the voice output setting file held by each user, and 150 denotes a speech synthesizer that outputs synthesized speech based on reading/prosody information. The speech synthesizer 150 corresponds to the synthesized speech generation unit 240 of FIG. 2 and comes in various forms, such as software-only implementations and add-in boards. Some user terminals 130 have a speech synthesizer 150 and others do not. When the user terminal 130 has a speech synthesizer 150, the voice output server 120 may be omitted.

[0011] The voice output setting file 140 specifies whether reading/prosody information for voice output of mail is to be generated, which mails are to receive reading/prosody information, and so on. The user sets the necessary information in advance. FIG. 3 shows an example of this voice output setting file.

[0012] In FIG. 3, 311 is the voice output flag field, 312 is the sender field, 313 is the keyword field, 314 is the date/time field, 315 is the conversion server field, 316 is the voice output device field, and 317 is the automatic reading field. Each field consists of a field name and a field value. The voice output flag field 311 takes the field value "on" or "off": "on" indicates that reading/prosody information is to be added to text mail, and "off" indicates that it is not.

[0013] Fields 312 to 317 express the conditions under which reading/prosody information is added. Each field can hold multiple field values, and multiple values in one field are combined as an OR condition. When a field is absent or has no field value, every mail is treated as a target for reading/prosody information with respect to that field.

[0014] The sender field 312 restricts the addition of reading/prosody information to e-mail from senders whose address contains any of the character strings given as field values. In the example of FIG. 3, only mail from addresses such as abc@aaa.bbb.ccc.jp or yyy@ddd.eee.jp is targeted.

[0015] The keyword field 313 restricts the addition of reading/prosody information to e-mail whose body contains any of the field values. In the example of FIG. 3, only e-mail whose body contains the string "緊急" (kinkyū, "urgent") or "至急" (shikyū, "immediate") is targeted.

[0016] The date/time field 314 restricts the addition of reading/prosody information to mail that arrives at the date and time (period) indicated by the field value. In the example of FIG. 3 the field has no value, so no restriction by date and time is applied.

[0017] The conversion server field 315 takes as its field value the text conversion server that adds the reading/prosody information. When the field is absent or has no field value, the user terminal acts as the text conversion server. In the example of FIG. 3, the text conversion server conv@ooo.ppp.jp is specified.

[0018] The voice output device field 316 takes as its field value the device that generates the synthesized speech (a voice output server on the network or the speech synthesizer of the user terminal). This field value also determines the format of the reading/prosody information 230 of FIG. 2. In the example of FIG. 3, synthesized speech is generated by the speech synthesizer syaberinbo (here denoting 「しゃべりん坊ＨＧ」 (Shaberinbo HG), a product of NTT-IT).

[0019] When the field value of the automatic reading field 317 is "on", the mail is automatically read aloud in synthesized speech from the user terminal either when an e-mail with reading/prosody information arrives or when the user logs in to the user terminal and unread e-mail exists. When the field value is "off", e-mail is read aloud only when the user executes the e-mail reading command.

[0020] To summarize, in the example voice output setting file of FIG. 3, for mail that comes from a sender whose mail address contains the character string abc@aaa.bbb.ccc.jp or ddd.eee.jp and whose body contains the character string "緊急" ("urgent") or "至急" ("immediate"), reading/prosody information is added by the text conversion server conv@ooo.ppp.jp, and the speech synthesizer syaberinbo outputs synthesized speech as soon as the e-mail with the reading/prosody information arrives.
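
The selection rules of the setting file can be sketched as below: values within one field are ORed, an empty field matches every mail, and a mail must satisfy every field. The on-disk layout of the file is shown only in FIG. 3, so `VoiceSettings`, `is_target`, and the attribute names are hypothetical, and the date/time field 314 is left out because the text does not give the form of its values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceSettings:
    """Hypothetical in-memory form of voice output setting file 140."""
    voice_output: bool = False                          # field 311
    senders: List[str] = field(default_factory=list)    # field 312
    keywords: List[str] = field(default_factory=list)   # field 313
    conversion_server: str = ""                         # field 315
    output_device: str = ""                             # field 316
    auto_read: bool = False                             # field 317


def _any_contained(values: List[str], haystack: str) -> bool:
    # An absent or empty field places no restriction; otherwise values are ORed.
    return not values or any(v in haystack for v in values)


def is_target(settings: VoiceSettings, sender: str, body: str) -> bool:
    """True when the mail should receive reading/prosody information."""
    if not settings.voice_output:
        return False
    return (_any_contained(settings.senders, sender)
            and _any_contained(settings.keywords, body))


# Values taken from the FIG. 3 example as summarized in the text.
FIG3_EXAMPLE = VoiceSettings(
    voice_output=True,
    senders=["abc@aaa.bbb.ccc.jp", "ddd.eee.jp"],
    keywords=["緊急", "至急"],
    conversion_server="conv@ooo.ppp.jp",
    output_device="syaberinbo",
    auto_read=True,
)
```

With these settings, `is_target(FIG3_EXAMPLE, "yyy@ddd.eee.jp", "至急ご確認ください")` is true because the sender contains `ddd.eee.jp` and the body contains one of the keywords.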

[0021] When e-mail is centrally managed by a central management server, the voice output setting file 140 of FIG. 3 may likewise be managed centrally, per user, by that server.

[0022] FIG. 4 shows the flow of the processing by which this e-mail communication media conversion system adds reading/prosody information to e-mail. The processing of FIG. 4 is repeated each time a new e-mail arrives. It is described in detail below.

[0023] (1) A new e-mail consisting only of text arrives for some user (here called R), where the sender of the e-mail is not the text conversion server. When the mail communication network 100 is a store-and-forward relay ("bucket brigade") network such as the Internet, this means that the e-mail arrives at the receiving user's terminal. In a centrally managed system such as a PC communication service, it means that the e-mail arrives at the central e-mail management server (not shown in FIG. 1). After the e-mail arrives, processing moves to (2).

[0024] (2) The voice output flag field 311 of the voice output setting file 140 is checked. If the field is "on", processing moves to (3). If it is "off", processing ends.

[0025] (3) It is determined whether the mail satisfies the conditions of the sender field 312, the keyword field 313, and the date/time field 314 of the voice output setting file 140. If the conditions are satisfied, processing moves to (4). If they are not, processing ends.

[0026] (4) The arrived e-mail is forwarded to the text conversion server 110 specified by the conversion server field 315 of the voice output setting file 140. Processing then moves to (5).

[0027] (5) The text conversion server 110 adds reading/prosody information to the mail header. Processing then moves to (6).

[0028] (6) The e-mail with the reading/prosody information added is forwarded to user R. This completes the processing for adding reading/prosody information to the e-mail.
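
Steps (2) to (6) above can be sketched as a single function. This is an illustration under stated assumptions rather than the patented implementation: `Mail`, `on_new_mail`, and the header name `X-Reading-Prosody` are hypothetical (the text says only that the information is placed in the mail header), and the network forwarding of steps (4) and (6) is reduced to copying the mail and returning it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

PROSODY_HEADER = "X-Reading-Prosody"  # hypothetical header name, not from the patent


@dataclass
class Mail:
    sender: str
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


def on_new_mail(
    mail: Mail,
    voice_output_on: bool,
    matches_conditions: Callable[[Mail], bool],
    convert: Callable[[str], str],
) -> Mail:
    """Annotate the mail only when the flag is on and the conditions hold."""
    if not voice_output_on:                  # step (2): flag field 311 is "off"
        return mail
    if not matches_conditions(mail):         # step (3): fields 312 to 314
        return mail
    # Step (4): hand a copy to the text conversion server (modelled as `convert`).
    annotated = Mail(mail.sender, mail.body, dict(mail.headers))
    # Step (5): the server adds reading/prosody information to the header.
    annotated.headers[PROSODY_HEADER] = convert(mail.body)
    # Step (6): the annotated mail goes back to user R.
    return annotated
```

Because the body is left untouched and only a header is added, a terminal without any speech capability can still display the mail normally.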

[0029] When the mail communication network is a centrally managed system such as a PC communication service, e-mail is stored on a central management server within the network, and the user retrieves e-mail from that server to his or her own user terminal to read it. In such a network configuration, the central management server may also serve as the text conversion server 110 and the voice output server 120 of FIG. 1. In that case the forwarding steps (4) and (6) of FIG. 4 are unnecessary, and the processing is completed within the central management server.

[0030] FIG. 6 shows an example of adding reading/prosody information to an e-mail when the example fields of the voice output setting file of FIG. 3 are used.

[0031] FIG. 6(A) shows the e-mail before reading/prosody information is added. The sender of this e-mail is abc@aaa.bbb.ccc.jp, which satisfies the sender field 312 of FIG. 3. The body of the e-mail contains the character string "緊急", which satisfies the keyword field 313 of FIG. 3. Since the date/time field 314 is not specified in FIG. 3, the e-mail of FIG. 6(A) is a target for reading/prosody information.

[0032] FIG. 6(B) shows the e-mail after reading/prosody information has been added. The reading/prosody information, shown shaded in the figure, is added to the header. Its format is the syaberinbo format specified by the voice output device field 316 of FIG. 3, in which each phrase (bunsetsu) is one unit of the form "reading data [phrase separator accent]" (for details, see the accented kana sentence format on p. 72 of the 「しゃべりん坊ＨＧ操作マニュアル」 (Shaberinbo HG Operation Manual), NTT-IT). In this example, 「コンニチワ」, 「コレワ」, 「キンキューノ」, and 「メールデス」 are reading data; 「。」 (accompanied by a long pause), 「／」 (a weak join without a pause), and 「＊」 (a strong join without a pause) are phrase separators; and 「００」 and 「０４」 give the position in the reading at which the accent falls.

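The phrase unit described for FIG. 6(B) can be modelled as a small record. The sketch below captures only what the text states (a reading, one of three separators, and an accent position); the exact serialized layout is defined in the product manual and is not reproduced here, so the `Phrase` class, the `SEPARATORS` table, and the two-digit check on the accent are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

# Separator meanings as described in the text.
SEPARATORS = {
    "。": "long pause",
    "／": "weak join, no pause",
    "＊": "strong join, no pause",
}


@dataclass(frozen=True)
class Phrase:
    reading: str      # katakana reading, e.g. "コンニチワ"
    separator: str    # one of the keys of SEPARATORS
    accent: str       # accent position, e.g. "00" or "04" (two digits assumed)

    def __post_init__(self) -> None:
        if self.separator not in SEPARATORS:
            raise ValueError(f"unknown separator: {self.separator!r}")
        if not (len(self.accent) == 2 and self.accent.isdigit()):
            raise ValueError(f"accent must be two digits: {self.accent!r}")


def plain_reading(phrases: List[Phrase]) -> str:
    """Concatenate the readings only, dropping separators and accents."""
    return "".join(p.reading for p in phrases)
```

Keeping the separator and accent alongside each reading is what lets a synthesizer place pauses and pitch accents without re-analysing the original text.
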
[0033] FIG. 5 shows the flow of the voice output processing of this e-mail communication media conversion system. This voice output processing applies to any mail that has arrived for the user.

[0034] (1) It is determined whether the mail reading command has been executed at the user terminal 130. If it has, processing moves to (2). If not, processing ends. When the automatic reading field 317 of FIG. 3 is "on", this mail reading command is executed immediately after the mail with reading/prosody information is forwarded to the user in step (6) of FIG. 4.

[0035] (2) It is determined whether reading/prosody information is present in the header of the e-mail for which the mail reading command was executed. If it is absent, processing moves to (3). If it is present, processing moves to (4).

[0036] (3) The reading/prosody information processing of steps (4), (5), and (6) of FIG. 4 is performed. Processing then moves to (4).

[0037] (4) From the reading/prosody information in the e-mail header, speech is generated by the voice output device indicated by the voice output device field 316 of FIG. 3, and the synthesized speech is output from the user terminal 130.
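
Steps (2) to (4) of the voice output processing can be sketched as follows. As before, this is a hedged illustration: `read_aloud`, `convert`, `synthesize`, and the header name `X-Reading-Prosody` are hypothetical, with `convert` standing in for the round trip to the text conversion server and `synthesize` for whichever voice output device field 316 selects.

```python
from typing import Callable, Dict, Optional

PROSODY_HEADER = "X-Reading-Prosody"  # hypothetical header name, not from the patent


def read_aloud(
    headers: Dict[str, str],
    body: str,
    convert: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Reuse an existing annotation if present, otherwise request one, then synthesize."""
    info: Optional[str] = headers.get(PROSODY_HEADER)   # step (2): check the header
    if info is None:
        info = convert(body)                             # step (3): annotate on demand
        headers[PROSODY_HEADER] = info                   # keep it for later readings
    return synthesize(info)                              # step (4): generate speech
```

Storing the annotation back into the header means a second reading of the same mail does not contact the text conversion server again.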

[0038]

[Effects of the Invention] As described above, according to the present invention, mail passes through a text conversion server that has a large-scale word dictionary and high-accuracy reading/prosody assignment rules, so the user terminal does not need to hold a word dictionary. Even a terminal with small disk capacity, such as a portable terminal, can therefore obtain reading/prosody information with accurate readings and natural prosody, and e-mail can be heard in natural synthesized speech free of misreadings. Furthermore, particularly when the synthesized speech device of the user terminal is used, e-mail can be heard in natural synthesized speech free of misreadings with almost no increase in the amount of data received by the user terminal.

[Brief description of drawings]

FIG. 1 is a diagram showing the schematic configuration of one embodiment of the e-mail communication media conversion system of the present invention.

FIG. 2 is a diagram showing the processing flow of a general text-to-speech conversion device.

FIG. 3 is a diagram showing an example of the voice output setting file used in the present invention.

FIG. 4 is a diagram showing the flow of the reading/prosody information addition processing of the e-mail communication media conversion system of the present invention.

FIG. 5 is a diagram showing the flow of the voice output processing of the e-mail communication media conversion system of the present invention.

FIG. 6 is a diagram showing a specific example of the reading/prosody addition processing.

[Explanation of symbols]

100 Mail communication network
110 Text conversion server
120 Voice output server
130 User terminal
140 Voice output setting file
150 Speech synthesizer
311 Voice output flag field
312 Sender field
313 Keyword field
314 Date/time field
315 Conversion server field
316 Voice output device field
317 Automatic reading field

Continuation of front page

(58) Fields searched (Int.Cl.⁷, DB name): H04L 12/58, G06F 13/00, G06F 3/16

Claims

(57) [Claims]

1. An e-mail communication media conversion system for an e-mail communication system comprising a mail communication network that delivers e-mail and a plurality of user terminals that are connected to the mail communication network and with which users send and receive e-mail, characterized in that: a text conversion server, which adds to a text e-mail reading/prosody information for generating synthesized speech with a voice output device, and a voice output server, which acts as the synthesized speech device and generates synthesized speech when specified by the user, are connected to the mail communication network; a synthesized speech device, which acts as the voice output device and generates synthesized speech when specified by the user, is provided in the user terminal; and, in accordance with the user's setting information, the text conversion server adds reading/prosody information to the text e-mail, and the voice output server or the synthesized speech device, acting as the voice output device, generates synthesized speech.

2. An e-mail communication media conversion system for an e-mail communication system comprising a mail communication network that delivers e-mail and a plurality of user terminals that are connected to the mail communication network and with which users send and receive e-mail, characterized in that: a text conversion server, which adds to a text e-mail reading/prosody information for generating synthesized speech with a voice output device, and a voice output server, which acts as the synthesized speech device and generates synthesized speech when specified by the user, are connected to the mail communication network; and, in accordance with the user's setting information, the text conversion server adds reading/prosody information to the text e-mail, and the voice output server, acting as the voice output device, generates synthesized speech.

3. An e-mail communication media conversion system for an e-mail communication system comprising a mail communication network that delivers e-mail and a plurality of user terminals that are connected to the mail communication network and with which users send and receive e-mail, characterized in that: a text conversion server, which adds to a text e-mail reading/prosody information for generating synthesized speech with a voice output device, is connected to the mail communication network; a synthesized speech device, which acts as the voice output device and generates synthesized speech when specified by the user, is provided in the user terminal; and the text conversion server adds reading/prosody information to the text e-mail, and the synthesized speech device, acting as the voice output device, generates synthesized speech.

4. The e-mail communication media conversion system according to claim 1, 2, or 3, characterized by comprising means for setting some or all of the following information: information specifying whether reading/prosody information is to be added to e-mail; information specifying, as a condition for adding reading/prosody information, that it is to be added only to particular e-mail; information specifying the text conversion server; information specifying the voice output device; and information specifying whether e-mail is to be read aloud by voice as soon as it arrives.