JP2007241321A

JP2007241321A - Message transmission system, message transmission method, reception device, transmission device and message transmission program

Info

Publication number: JP2007241321A
Application number: JP2004062408A
Authority: JP
Inventors: Reishi Kondou; 玲史近藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2004-03-05
Filing date: 2004-03-05
Publication date: 2007-09-20
Also published as: WO2005086010A1

Abstract

PROBLEM TO BE SOLVED: To provide a message transmission system including a reception device that reads aloud a text message transmitted from a transmission device while displaying an image.

SOLUTION: The transmission device 11 transmits the text message to the reception device 13. In the reception device 13, an audio synthesis unit 21 generates synthesized audio from the received text message and outputs it from a loudspeaker 27. An image configuration information generation unit 24 generates information on an image to be displayed together with the synthesized audio, and an image display unit 22 displays an image based on that information. The audio synthesis unit 21 also generates segmentation information, which indicates the break points of the synthesized audio, and inputs it to the image display unit 22. According to the input segmentation information, the image display unit 22 changes the displayed image at the timing of each break point in the synthesized audio.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a message transmission system, a message transmission method, a reception device, a transmission device, and a message transmission program in which a reception device reads aloud a received message and displays an image.

An example of a conventional message transmission system is described in Non-Patent Document 1. FIG. 17 is a block diagram showing an example configuration of a conventional message transmission system. As shown in FIG. 17, this conventional system includes a voice synthesis unit 21 and an image display unit 22, and operates as follows.

The voice synthesis unit 21 reads aloud the content of a received e-mail. The properties of the voice used for reading (hereinafter, voice quality information) are either set by the user of the reception device 13 (hereinafter, the receiver) or selected automatically by the reception device 13. The image display unit 22 displays an image corresponding to the e-mail received by the reception device 13 and changes the image as the reading progresses.

To do this, the system uses image configuration information, which comprises the image itself and the information used to change it. Here, the image configuration information consists of an image that the receiver has stored in the reception device 13 in advance, together with position information, set by the receiver, for facial parts such as the eyes and mouth in that image. In another example, the image configuration information is stored in the terminal in advance instead of being set by the receiver.

Another system that reads aloud received e-mail is described in Patent Document 1. Based on operation control information embedded in the character string of a received message, that system reads the e-mail aloud with synthesized speech while simultaneously moving a robot. It also stores the voice data of received e-mail in association with the sender.

Patent Document 2 describes yet another method of reading aloud received e-mail. For each received e-mail, the method selects, according to the sender ID, one of several images stored in advance on the receiving side, displays it, and reads the e-mail aloud with synthesized speech. The voice quality information for the synthesized speech is obtained by analyzing the sender's voice in advance.

Non-Patent Document 1: "NTT DOCOMO Mobile Phone General Catalog, December 2003 (9th edition)", NTT DOCOMO, Inc., December 2003, p. 27.
Patent Document 1: JP 2003-308142 A (paragraphs 0059-0112, FIG. 1).
Patent Document 2: JP H07-066832 A (paragraphs 0014-0059, FIG. 1).

However, the prior art has three problems. First, when both voice quality information and image configuration information are to be set, each must be set separately. Second, the receiver must prepare in advance, for example by preselecting voice quality information or image configuration information suited to the sender of a received e-mail; there is no way for the sender to supply such information over a communication line when sending the e-mail. Third, there is no way for a third party other than the sender and the receiver to provide voice quality information or image configuration information to the receiver over a communication line.

Accordingly, a first object of the present invention is to provide a message transmission system, a message transmission method, a reception device, a transmission device, and a message transmission program that can handle voice quality information and image configuration information as a single unit. A second object is to provide such a system, method, devices, and program with which the sender of an e-mail can supply, over a communication line at the time of sending, voice quality information or image configuration information suited to the sender. A third object is to provide such a system, method, devices, and program that deliver, over a communication line, voice quality information or image configuration information provided by a third party other than the sender and the receiver.

A message transmission system according to the present invention includes: transmission means for transmitting a text message; voice synthesis means for generating synthesized speech based on the received text message; image information generation means for generating information on an image to be displayed together with the output of the synthesized speech; image display means for displaying an image based on the image information generated by the image information generation means; and expression information storage means for storing, in advance, expression information that includes image configuration information (information on the image to be displayed by the image display means) and voice quality information (information on the characteristics of the synthesized speech to be generated by the voice synthesis means). The image information generation means generates the image information based on the image configuration information. The voice synthesis means generates the synthesized speech based on the voice quality information, generates break information indicating the break points of the generated synthesized speech, and inputs it to the image display means. Based on the input break information, the image display means changes the displayed image at the timing of the breaks in the synthesized speech.
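The core mechanism above, in which the voice synthesis means emits break information that the image display means uses to switch images, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the field names of `VoiceQuality`, `ImageConfiguration`, and `ExpressionInfo`; the stand-in `synthesize` function, which splits text at punctuation and estimates timing from a speaking rate instead of running a real speech synthesizer; and the policy of cycling through image frames at each break.

```python
import re
from dataclasses import dataclass, field


@dataclass
class VoiceQuality:
    # Hypothetical voice quality parameters; the patent only speaks of
    # "characteristics of the synthesized speech".
    pitch_hz: float = 120.0
    chars_per_second: float = 8.0  # assumed speaking-rate unit


@dataclass
class ImageConfiguration:
    # Hypothetical frames cycled at each break (e.g. mouth closed / open).
    frames: list = field(default_factory=lambda: ["face_closed.png", "face_open.png"])
    # Facial part positions (x, y in pixels), as in the prior-art description.
    parts: dict = field(default_factory=lambda: {"eyes": (64, 48), "mouth": (64, 96)})


@dataclass
class ExpressionInfo:
    # Expression information bundles both kinds of information as one unit.
    image: ImageConfiguration
    voice: VoiceQuality


def synthesize(text, voice):
    """Stand-in for the voice synthesis means.

    Splits the text after punctuation and estimates when each segment ends.
    Returns (segments, breaks), where breaks are end times in milliseconds
    (the "break information").
    """
    segments = [s for s in re.split(r"(?<=[.,!?。、])\s*", text) if s]
    breaks, t = [], 0.0
    for seg in segments:
        t += len(seg) / voice.chars_per_second * 1000.0
        breaks.append(round(t))
    return segments, breaks


def image_schedule(image_cfg, breaks):
    """Stand-in for the image display means: returns (time_ms, frame) pairs,
    switching to the next frame at every break except the final one."""
    schedule = [(0, image_cfg.frames[0])]
    for i, b in enumerate(breaks[:-1], start=1):
        schedule.append((b, image_cfg.frames[i % len(image_cfg.frames)]))
    return schedule
```

For example, `synthesize("Hello, world. See you!", VoiceQuality())` yields break times `[750, 1500, 2500]`, and `image_schedule(ImageConfiguration(), ...)` then switches frames at 750 ms and 1500 ms. A real implementation would take break times from the synthesizer's actual audio output rather than estimating them from character counts.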

The system may comprise a transmission device that includes the transmission means, and a reception device that includes the voice synthesis means, the image information generation means, the image display means, and the expression information storage means. With this configuration, the reception device can output the image and voice desired by its user.

Alternatively, the system may comprise a reception device that includes the voice synthesis means, the image information generation means, and the image display means, and a transmission device that includes the transmission means, the expression information storage means, and expression information transmission means for transmitting the expression information stored in the expression information storage means to the reception device. The reception device may include voice quality information generation means for receiving the expression information and generating voice quality information from it, and the image information generation means may receive the expression information from the transmission device and generate image configuration information from it. With this configuration, the reception device can output the image and voice desired by the user of the transmission device.
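When the transmission device supplies the expression information, it travels with the text message, and the reception device derives voice quality information and image configuration information from it. The sketch below assumes a JSON envelope with hypothetical field names (`text`, `expression`, `voice`, `image`); the patent does not specify a wire format. It also falls back to expression information stored on the reception device when the sender attached none, mirroring the configuration in which the receiver stores it.

```python
import json


def build_message(text, expression):
    """Transmission device: bundle a text message with expression information.
    The JSON layout is a hypothetical choice, not a format from the patent."""
    return json.dumps({"text": text, "expression": expression})


def receive_message(payload, stored_expression=None):
    """Reception device: split received expression information into voice
    quality information and image configuration information.

    Falls back to expression information stored on the reception device
    when the sender attached none.
    """
    msg = json.loads(payload)
    expression = msg.get("expression") or stored_expression or {}
    voice_quality = expression.get("voice", {})
    image_config = expression.get("image", {})
    return msg["text"], voice_quality, image_config
```

Keeping both kinds of information under one `expression` key reflects the stated goal of handling voice quality and image configuration information as a single unit.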

The transmission device may include charging means for generating expression information charging information, that is, information on a fee corresponding to the expression information transmitted to the reception device. With this configuration, the user of the transmission device can be charged according to the expression information transmitted to the reception device.
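Generating expression information charging information can be as simple as pricing each kind of expression information sent. In this sketch the price table, the item keys, and the `ChargeRecord` structure are hypothetical; the patent states only that a fee corresponding to the transmitted expression information is generated.

```python
from dataclasses import dataclass

# Hypothetical tariff in arbitrary currency units.
PRICE_PER_ITEM = {"voice": 10, "image": 5}


@dataclass(frozen=True)
class ChargeRecord:
    user_id: str   # user being charged (here, the transmission device's user)
    items: tuple   # kinds of expression information that were sent
    amount: int    # total fee


def charge_for_expression(user_id, expression):
    """Charging means: fee for the expression information sent to a receiver."""
    items = tuple(sorted(k for k in expression if k in PRICE_PER_ITEM))
    amount = sum(PRICE_PER_ITEM[k] for k in items)
    return ChargeRecord(user_id, items, amount)
```

Unknown keys are ignored rather than rejected, so expression information can carry extra fields without affecting the fee.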

The system may further comprise an image configuration information providing device that includes external image configuration information storage means for storing all or part of the image configuration information, and image configuration information transmission means for transmitting all or part of the image configuration information stored in the external image configuration information storage means to the reception device. With this configuration, the reception device can output an image that is stored in the external image configuration information storage means and provided by a third party other than the users of the transmission and reception devices.

The reception device and the image configuration information providing device may be connected by a dedicated line. With this configuration, the communication protocol between them can be kept simple, which makes the message transmission system easy to build.

The reception device and the image configuration information providing device may be connected via a public network. With this configuration, the user of the reception device can use image configuration information stored in a plurality of image configuration information providing devices via a public network such as the Internet.

The expression information may include image index information indicating the location of all or part of the image configuration information stored in the external image configuration information storage means, and the image information generation means may, based on the image index information, request the image configuration information transmission means to transmit all or part of the image configuration information stored in the external image configuration information storage means.

In response to a request from the image information generation means, the image configuration information transmission means may transmit all or part of the image configuration information stored in the external image configuration information storage means to the reception device.
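The index-based retrieval described in the two paragraphs above might look like the following on the reception side. The `(provider_id, key)` form of the image index information, the in-memory `ImageConfigProvider`, and the merge rule (fetched entries override inline ones, so a provider can supply either all or part of the image configuration information) are assumptions made for illustration.

```python
class ImageConfigProvider:
    """Stand-in for an image configuration information providing device.
    Storage is an in-memory dict keyed by a hypothetical index string."""

    def __init__(self, store):
        self._store = store

    def handle_request(self, key):
        # Image configuration information transmission means.
        if key not in self._store:
            raise KeyError(f"no image configuration stored at {key!r}")
        return dict(self._store[key])


def resolve_image_config(expression, providers):
    """Image information generation means: combine inline image configuration
    information with any part fetched via image index information."""
    config = dict(expression.get("image", {}))
    index = expression.get("image_index")
    if index is not None:
        provider_id, key = index  # hypothetical (provider, key) pair
        config.update(providers[provider_id].handle_request(key))
    return config
```

In a deployment, `handle_request` would be a call over the dedicated line or public network rather than a local method call.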

The image configuration information providing device may include charging means for generating image charging information, that is, information on a fee corresponding to the image configuration information transmitted to the reception device. With this configuration, the user of the reception device can be charged according to the image configuration information transmitted to it.

The system may further comprise a voice quality information providing device that includes external voice quality information storage means for storing all or part of the voice quality information, and voice quality information transmission means for transmitting all or part of the voice quality information stored in the external voice quality information storage means to the reception device. With this configuration, the reception device can output synthesized speech with voice characteristics that are stored in the external voice quality information storage means and provided by a third party other than the users of the transmission and reception devices.

The reception device and the voice quality information providing device may be connected by a dedicated line. With this configuration, the communication protocol between them can be kept simple, which makes the message transmission system easy to build.

The reception device and the voice quality information providing device may be connected via a public network. With this configuration, the user of the reception device can use voice quality information stored in a plurality of voice quality information providing devices via a public network such as the Internet.

The expression information may include voice quality index information indicating the location of all or part of the voice quality information stored in the external voice quality information storage means, and the voice synthesis means may, based on the voice quality index information, request the voice quality information transmission means to transmit all or part of the voice quality information stored in the external voice quality information storage means.

In response to a request from the voice synthesis means, the voice quality information transmission means may transmit all or part of the voice quality information stored in the external voice quality information storage means to the reception device.

The voice quality information providing device may include charging means for generating voice quality charging information, that is, information on a fee corresponding to the voice quality information transmitted to the reception device. With this configuration, the user of the reception device can be charged according to the voice quality information transmitted to it.
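On the voice quality side, the providing device both serves voice quality information by index and generates charging information for each delivery. The sketch below is a hypothetical in-memory model: the index strings, per-entry prices, and the per-receiver `ledger` are assumptions, and a real provider would sit behind the dedicated line or public network described above.

```python
from collections import defaultdict


class VoiceQualityProvider:
    """Stand-in voice quality information providing device.

    Serves voice quality information by voice quality index and records
    voice quality charging information for each delivery.
    """

    def __init__(self, store, prices):
        self._store = store             # index -> voice quality information
        self._prices = prices           # index -> fee (hypothetical tariff)
        self.ledger = defaultdict(int)  # receiver id -> accumulated fee

    def request(self, receiver_id, index):
        voice = self._store[index]  # raises KeyError if the index is unknown
        # Charging means: bill the receiving user for this delivery.
        self.ledger[receiver_id] += self._prices.get(index, 0)
        return dict(voice)
```

Billing happens only after the lookup succeeds, so a request with an unknown index raises without charging the receiver.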

A message transmission method according to the present invention comprises: receiving a text message; generating synthesized speech based on the received text message and on the voice quality information of expression information, which includes image configuration information (information on the image to be displayed) and voice quality information (information on the characteristics of the synthesized speech); generating break information indicating the break points of the synthesized speech; outputting the synthesized speech; generating, based on the image configuration information, information on an image to be displayed together with the output of the synthesized speech; displaying an image based on that image information; and, based on the break information, changing the displayed image at the timing of the breaks in the synthesized speech.
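The last step of the method, changing the image at the breaks, reduces to a lookup during playback: given the current playback position and the sorted break times from the break information, count how many breaks have already passed. This Python sketch uses `bisect.bisect_right` from the standard library; advancing one frame per passed break and cycling through a frame list are illustrative assumptions, not rules stated in the patent.

```python
from bisect import bisect_right


def frame_at(t_ms, breaks_ms, frames):
    """Return the image to display at playback time t_ms.

    breaks_ms: sorted break times in milliseconds (the break information).
    frames: images to cycle through; one step per break already passed.
    """
    passed = bisect_right(breaks_ms, t_ms)  # number of breaks at or before t_ms
    return frames[passed % len(frames)]
```

Because `bisect_right` runs in logarithmic time, a display loop could call `frame_at` on every refresh without precomputing a full schedule.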

The image configuration information and the voice quality information may each be generated from the expression information. With this method, the image and voice desired by the user can be output.

Expression information stored in advance may be transmitted, and on receipt of the expression information, the image configuration information and the voice quality information may be generated from it. With this method, the image and voice desired by the user on the transmitting side can be output on the receiving side.

Expression information charging information, that is, information on a fee corresponding to the transmitted expression information, may be generated. With this method, the user on the receiving side can be charged according to the transmitted expression information.

The expression information may include image index information indicating the location at which all or part of the image configuration information is stored in external image configuration information storage means. Based on the image index information, the external image configuration information storage means may be requested to transmit all or part of the stored image configuration information, and all or part of the image configuration information may be received from it. With this method, an image of a third party or the like stored in the external image configuration information storage means can be output on the receiving side.

The external image configuration information storage means may generate image charging information, that is, information on a fee corresponding to the transmitted image configuration information. With this method, the user on the receiving side can be charged according to the image configuration information transmitted to the receiving side.

The expression information may include voice quality index information indicating the location at which all or part of the voice quality information is stored in external voice quality information storage means. Based on the voice quality index information, the external voice quality information storage means may be requested to transmit all or part of the voice quality information, and all or part of the voice quality information may be received from it. With this method, synthesized speech with the voice characteristics of a third party stored in the external voice quality information storage means can be output on the receiving side.

The external voice quality information storage means may generate voice quality charging information, that is, information on a fee corresponding to the transmitted voice quality information. With this method, the user on the receiving side can be charged according to the voice quality information transmitted to the receiving side.

A reception device according to the present invention receives a text message from a transmission device and includes: voice synthesis means for generating synthesized speech based on the text message received from the transmission device and on the voice quality information of expression information, which includes image configuration information (information on an image) and voice quality information (information on the characteristics of the synthesized speech); image information generation means for generating, based on the image configuration information, information on an image to be displayed together with the output of the synthesized speech; and image display means for displaying an image based on the image information generated by the image information generation means. The voice synthesis means generates break information indicating the break points of the generated synthesized speech and inputs it to the image display means, and the image display means changes the displayed image at the timing of the breaks in the synthesized speech, based on the input break information.

The reception device may include expression information storage means for storing expression information in advance, and the image information generation means may generate the image configuration information from the expression information. With this configuration, the image and voice desired by the user of the reception device can be output.

The reception device may include voice quality information generation means for receiving the expression information and generating voice quality information from it, and the image information generation means may receive the expression information and generate image configuration information from it. With this configuration, the reception device can output the image and voice desired by the user of the transmission device.

The reception device may receive all or part of the image configuration information from an image configuration information providing device that stores all or part of the image configuration information. With this configuration, the reception device can output an image of a third party or the like stored in the image configuration information providing device.

The expression information may include image index information indicating the location of all or part of the image configuration information stored in the image configuration information providing device. Based on the image index information, the image information generation means may request the image configuration information providing device to transmit all or part of the image configuration information it stores, and receive all or part of that information from it.

The reception device may be connected to the image configuration information providing device by a dedicated line. With this configuration, the communication protocol between the reception device and the image configuration information providing device can be kept simple.

The reception device may be connected to the image configuration information providing device via a public network. With this configuration, the user of the reception device can use image configuration information stored in a plurality of image configuration information providing devices.

The reception device may receive all or part of the voice quality information from a voice quality information providing device that stores all or part of the voice quality information. With this configuration, the reception device can output synthesized speech with the voice characteristics of a third party stored in the voice quality information providing device.

The expression information may include voice quality index information indicating the location of all or part of the voice quality information stored in the voice quality information providing device. Based on the voice quality index information, the voice synthesis means may request the voice quality information providing device to transmit all or part of the voice quality information it stores, and receive all or part of that information from it.

The reception device may be connected to the voice quality information providing device by a dedicated line. With this configuration, the communication protocol between the reception device and the voice quality information providing device can be kept simple.

The reception device may be connected to the voice quality information providing device via a public network. With this configuration, the user of the reception device can use voice quality information stored in a plurality of voice quality information providing devices.

A transmission device according to the present invention includes transmission means for transmitting a text message to be conveyed to a reception device, together with expression information that includes image configuration information (information on an image to be displayed by the reception device) and voice quality information (information on the characteristics of the synthesized speech of the text message to be generated by the reception device).

A message transmission program according to the present invention causes a computer to execute: voice synthesis processing that generates synthesized speech based on a received text message and on the voice quality information of expression information, which includes image configuration information (information on an image) and voice quality information (information on the characteristics of synthesized speech), and that generates break information indicating the break points of the synthesized speech; image information generation processing that generates, based on the image configuration information, information on an image to be displayed together with the output of the synthesized speech of the text message; and image display processing that, based on the break information, causes image display means to change the displayed image at the timing of the breaks in the synthesized speech.

The program may cause the computer to execute voice quality information generation processing that receives the expression information and generates voice quality information from it, and, in the image information generation processing, to receive the expression information and generate image configuration information from it. With this configuration, the image display means can display the image desired by the user on the transmitting side, and the voice output means can output synthesized speech with the characteristics desired by that user.

In the image information generation processing, the computer may also be caused to obtain image configuration information from an image information providing device. That device includes external image configuration information storage means, which stores all or part of the image configuration information, and image configuration information transmission means, which transmits all or part of the stored information. The computer requests the image configuration information from the providing device based on image index information. The image index information is contained in the expression information and indicates where all or part of the image configuration information is stored in the external image configuration information storage means. The computer then receives all or part of the image configuration information from the image information providing device. With such a configuration, an image of a third party or the like stored in the image information providing device can be displayed on the image display means.

In the speech synthesis processing, the computer may also be caused to obtain voice quality information from a voice quality information providing device. That device includes external voice quality information storage means, which stores all or part of the voice quality information, and voice quality information transmission means, which transmits all or part of the stored information. The computer requests the voice quality information from the providing device based on voice quality index information. The voice quality index information is contained in the expression information and indicates where all or part of the voice quality information is stored in the external voice quality information storage means. The computer then receives all or part of the voice quality information from the voice quality information providing device. With such a configuration, the audio output means can output synthesized speech with the voice characteristics of a third party stored in the voice quality information providing device.
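The index-based retrieval described in the two preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The dictionary layout of the expression information, the key names `image_index` and `voice_index`, and the `ProviderStub` class are all hypothetical. `ProviderStub` stands in for the image and voice quality information providing devices. A real system would send the request over a communication line rather than read an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ProviderStub:
    """Hypothetical in-memory stand-in for an image or voice quality
    information providing device (external storage + transmission means)."""
    store: Dict[str, Any] = field(default_factory=dict)

    def request(self, index: str) -> Any:
        # Return the configuration stored at `index`; raises KeyError if absent.
        return self.store[index]


def resolve_expression(expression: Dict[str, Any],
                       image_provider: ProviderStub,
                       voice_provider: ProviderStub) -> Dict[str, Any]:
    """Fetch image configuration / voice quality information referenced
    by index information in the expression information (hypothetical keys)."""
    resolved = dict(expression)  # leave the caller's dict untouched
    if "image_index" in expression:
        resolved["image_config"] = image_provider.request(expression["image_index"])
    if "voice_index" in expression:
        resolved["voice_quality"] = voice_provider.request(expression["voice_index"])
    return resolved


# Usage: a third party's character image and voice, referenced by index.
images = ProviderStub({"char/042": {"basic_image": "char042.png"}})
voices = ProviderStub({"voice/7": {"speaker": "Character A", "pitch": "+100Hz"}})
expr = {"image_index": "char/042", "voice_index": "voice/7"}
print(resolve_expression(expr, images, voices)["voice_quality"]["speaker"])  # Character A
```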

The present invention has three effects:
- First, an e-mail can be rendered as synthesized speech and images, using voice quality information and image configuration information associated with the sender.
- Second, using voice quality information or image configuration information supplied by the sender of the e-mail allows closer communication.
- Third, a third party can supply voice quality information or image configuration information to the receiver. For example, this may be a party holding the rights to the voices and images of celebrities or characters. This allows more varied communication between sender and receiver.

Embodiment 1.
A first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the first embodiment of the present invention.

The first embodiment of the present invention includes two devices. A transmission device 11 transmits a text message, such as an e-mail, via a communication line 12. A reception device 13 outputs synthesized speech based on the text message received from the transmission device 11. It also outputs images that correspond to the synthesized speech output.

The transmission device 11 includes a text message storage unit 31, which stores a text message, and a transmission unit 32. The transmission unit 32 transmits the text message stored in the text message storage unit 31 to the reception device 13 via the communication line 12.

The reception device 13 includes the following components:
- an expression information storage unit (expression information storage means) 34, which stores expression information containing voice quality information and image configuration information;
- a voice quality information generation unit (voice quality information generation means) 23, which generates voice quality information from the expression information;
- an image configuration information generation unit (image information generation means) 24, which generates image configuration information from the expression information;
- an image configuration information storage unit 33, which stores the image configuration information;
- a speech synthesis unit (speech synthesis means) 21, which converts the text message into synthesized speech based on the voice quality information, and generates delimiter information indicating breaks in the synthesized speech;
- an image display unit (image display means) 22, which displays an image based on the image configuration information and the delimiter information;
- a speaker 27, which outputs the synthesized speech.

The voice quality information generation unit 23 generates voice quality information from the expression information stored in the expression information storage unit 34. The voice quality information specifies the voice quality of the synthesized speech that the speech synthesis unit 21 generates from the text message. It includes one or more elements of two kinds. The first kind evokes the voice of a specific individual or of a conceptual persona, such as the speaker name, voice pitch, strength of intonation, and characteristic sentence endings. The second kind evokes the circumstances of the utterance, such as speaking rate and degree of haste.

The image configuration information generation unit 24 generates image configuration information from the expression information stored in the expression information storage unit 34. The image display unit 22 uses the image configuration information to compose and display an image of the sender, or of a character acting as the sender's proxy. The information may include a basic image from which the displayed image is generated. For a face image, it may also include the positions and shapes of the eyes, mouth, eyebrows, and so on. Images of other body parts may include equivalent information. The image configuration information generation unit 24 stores the generated image configuration information in the image configuration information storage unit 33, which holds it.

The image configuration information need not be a photograph of the sender's face or whole body. It may also be a portrait sketch, or information on a character, an inanimate object, or the like that reminds the receiver of the sender. Other forms are also possible, for example:
- parameters for synthesizing and displaying a character using computer graphics (CG);
- a set of still images bundled together, each assigned a number.

The speech synthesis unit 21 converts the text message received from the transmission device 11 into synthesized speech, based on the voice quality information generated by the voice quality information generation unit 23. It then outputs the synthesized speech to the speaker 27.
Output of synthesized speech contains various breaks: the start of output, paragraph breaks, sentence breaks, syllable breaks, and the end of output. The speech synthesis unit 21 outputs delimiter information to the image display unit 22 only at certain predetermined kinds of break.

When the speech synthesis unit 21 inputs delimiter information, the image display unit 22 generates an image from the image configuration information stored in the image configuration information storage unit 33 and displays it. The image display unit 22 stores in advance which elements of the image configuration information to read for each kind of delimiter information.
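The division of labour described in the two preceding paragraphs can be illustrated with a small dispatch sketch. The speech synthesis unit forwards only predetermined kinds of break, and the image display unit looks up which elements to read for each. The `Break` enum, the `ELEMENTS_FOR_BREAK` table, and the element names are hypothetical. Choosing the start, sentence break, and end as the forwarded kinds is likewise an assumption made for the example.

```python
from enum import Enum, auto
from typing import Callable, Dict, Iterable, List


class Break(Enum):
    """Kinds of breaks that occur while synthesized speech is output."""
    START = auto()
    PARAGRAPH = auto()
    SENTENCE = auto()
    SYLLABLE = auto()
    END = auto()


# Hypothetical table held in advance by the image display unit: which image
# configuration elements to read for each kind of reported break.
ELEMENTS_FOR_BREAK: Dict[Break, List[str]] = {
    Break.START: ["mouth_position", "mouth_opening"],
    Break.SENTENCE: ["eye_position", "eye_size"],
    Break.END: ["mouth_position", "mouth_opening"],
}


def emit_delimiters(breaks: Iterable[Break],
                    on_delimiter: Callable[[Break], None]) -> None:
    """Synthesizer side: forward only the predetermined break kinds
    (here, the keys of ELEMENTS_FOR_BREAK) as delimiter information."""
    for b in breaks:
        if b in ELEMENTS_FOR_BREAK:
            on_delimiter(b)


# Display side: record which elements would be read at each delimiter.
read_log: List[List[str]] = []
emit_delimiters(
    [Break.START, Break.SYLLABLE, Break.SENTENCE, Break.SYLLABLE, Break.END],
    lambda b: read_log.append(ELEMENTS_FOR_BREAK[b]),
)
print(len(read_log))  # 3 (syllable breaks are not forwarded)
```

Using a callback keeps the synthesizer unaware of how the display reacts. Changing which breaks animate the face then only requires editing the table.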

In this example, the transmission device 11 and the reception device 13 are mobile phones with e-mail transmission and reception functions. The communication line 12 is a mobile phone network and its associated data communication network. However, the present invention is not limited to these. It may also be applied to, for example:
- e-mail or chat systems over the general Internet;
- personal computer communication systems using a dedicated host;
- IP telephone networks;
- video communication networks.

Communication between the transmission device 11 and the reception device 13 via the communication line 12 also need not be bidirectional. It may be one-way, from the transmission device 11 to the reception device 13.

The reception device 13 carries a message transmission program that executes the following processing:
- speech synthesis processing, which generates synthesized speech from the received text message and from the voice quality information contained in the expression information, and generates delimiter information indicating breaks in that synthesized speech. The expression information contains image configuration information (information on an image) and voice quality information (information on the characteristics of synthesized speech).
- image information generation processing, which generates information on an image to be displayed while the synthesized speech of the text message is output, based on the image configuration information.
- image display processing, which uses the delimiter information to make the image display unit 22 change the displayed image at the timing of the breaks in the synthesized speech.

Next, the operation of the first embodiment of the present invention will be described. FIG. 2 is a flowchart explaining the operation of the first embodiment.

The transmission unit 32 of the transmission device 11 transmits the text message stored in the text message storage unit 31 to the reception device 13 via the communication line 12 (step S101). FIG. 3 shows an example of a text message stored in the text message storage unit 31. In this example, the stored message is 「今日は、良い天気です。」 ("The weather is nice today."). The transmission unit 32 transmits this text message to the reception device 13 via the communication line 12.

In the reception device 13, the speech synthesis unit 21 receives the text message. It then notifies the voice quality information generation unit 23 and the image configuration information generation unit 24 that the message has arrived (step S102). These two units read the expression information stored in the expression information storage unit 34 (step S103). FIG. 4 is an explanatory diagram showing an example of expression information, which in this example contains voice quality information and image configuration information.

The voice quality information generation unit 23 generates voice quality information from the expression information it has read. The image configuration information generation unit 24 likewise generates image configuration information from it (step S104). FIG. 5 is an explanatory diagram showing an example of voice quality information. In this example, it specifies the speaker, the speaking rate, and the voice pitch. The pitch is given as a reference value +100 Hz, but it may instead be given as an absolute value, such as 350 Hz.

FIG. 6 is an explanatory diagram showing an example of image configuration information. It specifies the following items:
- the basic image;
- the relative positions of the eyes;
- the sizes of the eyes;
- the relative position of the mouth;
- the mouth opening.

The relative positions of the eyes and mouth are given, for example, as coordinates normalized so that the center of the face image is the origin.
- Relative position of the right eye: for example, the X and Y coordinates of the intersection of the diagonals of the rectangle circumscribing the right eye. The left eye is defined in the same way.
- Relative position of the mouth: for example, three values taken from the rectangle circumscribing the mouth. These are the Y coordinate of its top side (the vertical position of the mouth), its width (the width of the mouth), and its height (the thickness of the mouth).
- Size of the right eye: for example, the height (thickness of the eye) and width of the rectangle circumscribing it. The left eye is defined in the same way.
- Mouth opening: a normalized value in which, for example, the height of the rectangle circumscribing the mouth when opened widest is 100. An opening of 75 therefore means the mouth is open to 75% of that maximum height.

In this embodiment, the image specified by the image configuration information is expressed numerically as described above. The present invention is not limited to this, and the image configuration information may specify the image by other methods. The image configuration information generation unit 24 stores the generated image configuration information in the image configuration information storage unit 33.
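The two numeric conventions above lend themselves to a short worked example. Voice pitch may be given either relative to a reference value or as an absolute frequency. The mouth opening is normalized so that the widest-open mouth is 100. The sketch below assumes the following, none of which comes from the patent:
- the field names;
- a reference pitch of 250 Hz, chosen only so that +100 Hz gives the 350 Hz figure from the text;
- a maximum mouth height of 40 units;
- the rule that an absolute pitch takes precedence over an offset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceQuality:
    """Voice quality information as in the FIG. 5 example (field names hypothetical)."""
    speaker: str
    speech_rate: float                          # hypothetical: 1.0 = normal speed
    pitch_offset_hz: Optional[float] = None     # relative form, e.g. +100
    pitch_absolute_hz: Optional[float] = None   # absolute form, e.g. 350

    def pitch_hz(self, reference_hz: float) -> float:
        """Resolve the pitch; an absolute value takes precedence over an offset."""
        if self.pitch_absolute_hz is not None:
            return self.pitch_absolute_hz
        return reference_hz + (self.pitch_offset_hz or 0.0)


def mouth_height(opening: float, max_height: float) -> float:
    """Convert a normalized mouth opening (100 = opened widest) into the
    height of the rectangle circumscribing the mouth."""
    if not 0.0 <= opening <= 100.0:
        raise ValueError("mouth opening must be between 0 and 100")
    return max_height * opening / 100.0


vq = VoiceQuality(speaker="Sender", speech_rate=1.0, pitch_offset_hz=100.0)
print(vq.pitch_hz(reference_hz=250.0))    # 350.0
print(mouth_height(75, max_height=40.0))  # 30.0
```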

The image display unit 22 reads the image configuration information stored in the image configuration information storage unit 33 and displays the basic image (step S105). The speech synthesis unit 21 synthesizes speech for the received text message, using the voice quality information generated by the voice quality information generation unit 23. It then starts outputting the synthesized speech to the speaker 27 (step S106).

FIG. 7 is an explanatory diagram showing an example waveform of synthesized speech. The speech synthesis unit 21 generated this speech from the received text message, using the voice quality information. The speech synthesis unit 21 holds information on timing points internally. FIG. 7 shows times A to D as example timing points in this embodiment:
- time A: the start of the utterance;
- time B: the comma (読点);
- time C: a phrase (bunsetsu) boundary recognized by the speech synthesis unit 21;
- time D: the end of the utterance.

At each timing point, the speech synthesis unit 21 generates delimiter information and outputs it to the image display unit 22. Using this delimiter information, the image display unit 22 changes the displayed image to a predetermined image at each timing point.

At time A (step S107), the image display unit 22 reads the stored information on the relative position and opening of the mouth from the image configuration information storage unit 33 (step S108). It generates an image of the mouth region from that information and overwrites the mouth region of the displayed basic image. In this embodiment, the mouth opening of the basic image is 75. The mouth region is therefore rewritten to an open-mouth image with an opening of 100 (step S109).

At time B (step S110), the image display unit 22 reads the stored information on the relative positions and sizes of the eyes from the image configuration information storage unit 33 (step S111). Each eye is then tilted about its relative position. For example:
- the image inside the rectangle circumscribing the right eye is tilted 30 degrees counterclockwise;
- the image inside the rectangle circumscribing the left eye is tilted 30 degrees clockwise.

This produces an image in which the eyes appear to be smiling. The image display unit 22 rewrites the eye region of the basic image with this smiling-eye image (step S112).
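The eye tilt at time B amounts to rotating each eye's circumscribing rectangle about its center by opposite angles. The sketch below applies a standard 2D rotation to the rectangle corners. It is a simplified geometric illustration, not the patent's image-processing method, since it rotates corner coordinates rather than pixel data. The representation of an eye as (center x, center y, width, height) and the sample coordinates are assumptions. The sketch also assumes a y-up coordinate system. Many image libraries use y-down pixel coordinates, where the visual sense of rotation flips.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (center x, center y, width, height)


def rect_corners(rect: Rect) -> List[Point]:
    """Corners of the rectangle circumscribing an eye, from its center and size."""
    cx, cy, w, h = rect
    return [(cx - w / 2, cy - h / 2), (cx + w / 2, cy - h / 2),
            (cx + w / 2, cy + h / 2), (cx - w / 2, cy + h / 2)]


def rotate(points: List[Point], center: Point, degrees: float) -> List[Point]:
    """Rotate points about `center`. Positive angles are counterclockwise,
    assuming a y-up coordinate system (flip the sign for y-down pixels)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]


def smiling_eyes(right_eye: Rect, left_eye: Rect, angle: float = 30.0):
    """Tilt the right eye counterclockwise and the left eye clockwise."""
    right = rotate(rect_corners(right_eye), (right_eye[0], right_eye[1]), +angle)
    left = rotate(rect_corners(left_eye), (left_eye[0], left_eye[1]), -angle)
    return right, left


# Hypothetical normalized eye rectangles (face-image center at the origin).
right, left = smiling_eyes((-0.3, 0.2, 0.2, 0.1), (0.3, 0.2, 0.2, 0.1))
```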

At time C (step S113), the image display unit 22 reads the stored information on the relative positions and sizes of the eyes from the image configuration information storage unit 33 (step S114). It generates an image of the eye region from that information and overwrites the displayed smiling-eye region. This restores the eye region to the basic image (step S115).

At time D (step S116), the image display unit 22 reads the stored information on the relative position and opening of the mouth from the image configuration information storage unit 33 (step S117). It generates an image of the mouth region from that information and overwrites the displayed open-mouth region. In this embodiment, the mouth opening of the basic image is 75. The open-mouth image, with an opening of 100, is therefore rewritten back to the basic image, with an opening of 75 (step S118).
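Taken together, times A to D form a small state sequence over the face image: open the mouth, smile, restore the eyes, then restore the mouth. The sketch below models only that sequence, not the image rendering. `FaceState`, the labels `"basic"` and `"smiling"`, and the `CHANGES` table are hypothetical names. The numeric values (75 and 100) and the order of changes follow steps S109, S112, S115, and S118.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class FaceState:
    mouth_opening: int  # normalized: 100 = opened widest
    eyes: str           # hypothetical labels: "basic" or "smiling"


BASIC = FaceState(mouth_opening=75, eyes="basic")

# Change applied at each timing point in the first embodiment's example.
CHANGES: Dict[str, Dict[str, object]] = {
    "A": {"mouth_opening": 100},  # start of utterance: open the mouth (S109)
    "B": {"eyes": "smiling"},     # comma: tilt the eyes (S112)
    "C": {"eyes": "basic"},       # phrase boundary: restore the eyes (S115)
    "D": {"mouth_opening": 75},   # end of utterance: restore the mouth (S118)
}


def run_schedule(points: List[str], start: FaceState = BASIC) -> List[FaceState]:
    """Return the face state shown after each timing point is reached."""
    states: List[FaceState] = []
    state = start
    for point in points:
        state = replace(state, **CHANGES[point])
        states.append(state)
    return states


for point, state in zip("ABCD", run_schedule(list("ABCD"))):
    print(point, state)
```

Each timing point changes only the fields it names, so the face ends in the basic image once D has been applied.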

The speech synthesis unit 21 ends output of the synthesized speech (step S119), and the image display unit 22 ends display of the image (step S120).

FIG. 8 is an explanatory diagram showing examples of the images output by the image display unit 22 in the first embodiment. As described in the operation above, the image output by the image display unit 22 changes over time from time A to time D.

As described above, this embodiment lets the image display unit 22 output images that appear to move in step with the spoken output of the text message.

In this embodiment, plain text was used as the example text message. Other formats may also be used, such as rich text with formatting information, or a phonetic symbol string representing the content to be spoken by speech synthesis. In that case, a speech synthesis unit 21 suited to each format is used. The operations shown in the flowchart of FIG. 2 are predetermined for each break time. They may instead be predetermined according to any of the following:
- the type of word in a sentence;
- a specific character type, such as a symbol;
- for rich text, the formatting information.

For simplicity, this example assumes a single set of expression information. When there are multiple transmission devices 11 (senders), the expression information storage unit 34 stores one set of expression information for each transmission device 11. Each transmission device 11 then sends a sender ID identifying itself together with the text message. The speech synthesis unit 21 passes the received sender ID to the voice quality information generation unit 23 and the image configuration information generation unit 24. Each of these units reads the expression information corresponding to that sender ID.
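The per-sender lookup just described can be sketched as a store keyed by sender ID. The class and method names are hypothetical. The dictionary shape of the expression information is also an assumption. The optional fallback for unknown senders is an assumption too, since the text does not say what happens when no entry exists. By default, the sketch raises `KeyError`.

```python
from typing import Any, Dict, Optional, Tuple


class ExpressionInfoStore:
    """Hypothetical receiver-side store holding one set of expression
    information per sender ID."""

    def __init__(self, default: Optional[Dict[str, Any]] = None) -> None:
        self._by_sender: Dict[str, Dict[str, Any]] = {}
        self._default = default  # assumption: optional fallback for unknown senders

    def register(self, sender_id: str, expression: Dict[str, Any]) -> None:
        self._by_sender[sender_id] = expression

    def lookup(self, sender_id: str) -> Dict[str, Any]:
        if sender_id in self._by_sender:
            return self._by_sender[sender_id]
        if self._default is None:
            raise KeyError(f"no expression information for sender {sender_id!r}")
        return self._default


def on_message(store: ExpressionInfoStore, sender_id: str,
               text: str) -> Tuple[Any, Any, str]:
    """Use the sender ID so both generators read that sender's information."""
    expression = store.lookup(sender_id)
    return expression["voice_quality"], expression["image_config"], text


store = ExpressionInfoStore()
store.register("alice", {"voice_quality": {"speaker": "Alice"},
                         "image_config": {"basic_image": "alice.png"}})
voice, image, text = on_message(store, "alice", "今日は、良い天気です。")
```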

Embodiment 2.
A second embodiment of the present invention will be described with reference to the drawings. FIG. 9 is a block diagram showing a configuration example of the second embodiment of the present invention. In this embodiment, the expression information storage unit 34 is moved from the reception device 13 of the first embodiment to the transmission device 11. The rest of the configuration is the same as in the first embodiment. Components identical to those of the first embodiment therefore carry the same reference numerals as in FIG. 1, and their description is omitted.

The transmission device 11 includes a transmission unit (expression information transmission means) 32. This unit transmits the expression information stored in the expression information storage unit 34 to the reception device 13 via the communication line 12. The expression information may be sent together with the text message stored in the text message storage unit 31, or separately from it. In the reception device 13, the voice quality information generation unit 23 and the image configuration information generation unit 24 receive the expression information.

The transmission unit 32 may send the text message and the expression information over the communication line 12 either as one unit or separately. When they are sent separately, they need not travel over the same communication line 12, either physically or logically.

The reception device 13 carries a message transmission program that executes the following processing:
- speech synthesis processing, which generates synthesized speech from the received text message and from the voice quality information contained in the expression information, and generates delimiter information indicating breaks in that synthesized speech. The expression information contains image configuration information (information on an image) and voice quality information (information on the characteristics of synthesized speech).
- image information generation processing, which generates information on an image to be displayed while the synthesized speech of the text message is output, based on the image configuration information.
- image display processing, which uses the delimiter information to make the image display unit 22 change the displayed image at the timing of the breaks in the synthesized speech.

The program may also execute voice quality information generation processing, which receives the expression information and generates voice quality information from it. Likewise, the image information generation processing may receive the expression information and generate image configuration information from it.

Next, the operation of the second embodiment of the present invention will be described with reference to the drawings. FIG. 10 is a flowchart explaining the operation of the second embodiment.

The transmission unit 32 of the transmission device 11 transmits the text message and the expression information to the reception device 13 via the communication line 12 (step S201). The transmission unit 32 attaches a common header to both and sends them over the communication line 12 as a single unit. The present invention is not limited to this, however, and the two may be sent as separate data.

After the first combined transmission, the reception device 13 may store the expression information in a storage unit (not shown). Transmission of the expression information can then be omitted until it changes. In that case, the configuration becomes the same as that of the first embodiment.
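Step S201 describes two things: bundling under a common header, and omitting the expression information after the first send until it changes. One way to sketch both is shown below. This is not the patent's protocol. The JSON message layout, the `has_expression` flag, and the `Sender`/`Receiver` classes are all hypothetical. The sender detects a change with a SHA-256 digest of the expression information, which is a design choice made only for this example. The receiver's cache plays the role of the unillustrated storage unit.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple


def build_message(sender_id: str, text: str,
                  expression: Optional[Dict[str, Any]]) -> bytes:
    """Bundle the text message and (optionally) expression information
    under one common header. The JSON layout is a hypothetical format."""
    header = {"sender_id": sender_id, "has_expression": expression is not None}
    body: Dict[str, Any] = {"text": text}
    if expression is not None:
        body["expression"] = expression
    return json.dumps({"header": header, "body": body},
                      ensure_ascii=False).encode("utf-8")


class Sender:
    """Sends expression information only when it changed since the last send."""

    def __init__(self, sender_id: str) -> None:
        self.sender_id = sender_id
        self._last_digest: Dict[str, str] = {}  # recipient -> digest last sent

    def compose(self, recipient: str, text: str,
                expression: Dict[str, Any]) -> bytes:
        digest = hashlib.sha256(
            json.dumps(expression, sort_keys=True).encode("utf-8")).hexdigest()
        changed = self._last_digest.get(recipient) != digest
        if changed:
            self._last_digest[recipient] = digest
        return build_message(self.sender_id, text, expression if changed else None)


class Receiver:
    """Caches the most recent expression information per sender."""

    def __init__(self) -> None:
        self._cache: Dict[str, Dict[str, Any]] = {}

    def receive(self, raw: bytes) -> Tuple[str, Dict[str, Any]]:
        msg = json.loads(raw.decode("utf-8"))
        sender_id = msg["header"]["sender_id"]
        if msg["header"]["has_expression"]:
            self._cache[sender_id] = msg["body"]["expression"]
        return msg["body"]["text"], self._cache[sender_id]


sender, receiver = Sender("alice"), Receiver()
expr = {"voice_quality": {"speaker": "Alice"}, "image_config": {"basic_image": "alice.png"}}
first = sender.compose("bob", "今日は、良い天気です。", expr)
second = sender.compose("bob", "また明日。", expr)  # expression info omitted
```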

In the reception device 13, the voice quality information generation unit 23 and the image configuration information generation unit 24 receive the expression information. The speech synthesis unit 21 receives the text message (step S202).

The voice quality information generation unit 23 generates voice quality information from the received expression information, and the image configuration information generation unit 24 generates image configuration information from the received expression information (step S203). The image configuration information generation unit 24 stores the generated image configuration information in the image configuration information storage unit 33.

The image display unit 22 reads the image configuration information stored in the image configuration information storage unit 33 and displays a basic image (step S204). The speech synthesis unit 21 synthesizes speech for the received text message on the basis of the voice quality information generated by the voice quality information generation unit 23. It then starts outputting the synthesized speech to the speaker 27 (step S205).

The operations from step S206 onward (steps S206 to S219) are the same as those from step S107 onward (steps S107 to S120) in the first embodiment, so their description is omitted.

As described above, in the second embodiment of the present invention the transmission device 11 transmits the expression information to the reception device 13. As a result, the speaker 27 of the reception device 13 can output synthesized speech with the voice quality desired by the user of the transmission device 11. The image display unit 22 can also display an image chosen by that user in step with the output of the synthesized speech.

Embodiment 3.
A third embodiment of the present invention will be described with reference to the drawings. FIG. 11 is a block diagram showing a configuration example of the third embodiment of the present invention. This embodiment differs from the second embodiment in one respect. It includes a server (image configuration information providing device, voice quality information providing device) 15, which is connected via a communication line 14 to the image configuration information generation unit 24 of the reception device 13 of the second embodiment. The server 15 includes a server image configuration information storage unit (external image configuration information storage means) 35, which stores image configuration information in advance. It also includes an image configuration information transmission unit (image configuration information transmission means) 25. This unit transmits the image configuration information stored in the server image configuration information storage unit 35 to the image configuration information generation unit 24 of the reception device 13 via the communication line 14. The rest of the configuration is the same as in the second embodiment. Circuits and other components identical to those of the second embodiment are therefore given the same reference numerals as in FIG. 9, and their description is omitted. The communication line 14 may connect the image configuration information generation unit 24 and the server 15 through a dedicated line such as a LAN (Local Area Network) or through a public network such as the Internet. When the communication line 14 is a public network such as the Internet, it may partly share the communication line 12.

FIG. 12 is an explanatory diagram showing an example of the expression information in the third embodiment. In the example shown in FIG. 12, the expression information includes voice quality information and an index of the image configuration information. The index identifies the device that stores the image configuration information and the location within that device at which it is stored. In this embodiment, the server image configuration information storage unit 35 of the server 15 stores the image configuration information. The index therefore consists of the IP address of the server 15 and a position number, which indicates where in the server 15 the image configuration information is stored. The index is not limited to a pair of a server IP address and a position number within the server 15. A pair of a server name and a full path name, a URI (Uniform Resource Identifier), or the like may be used instead.
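The patent leaves the textual encoding of the index open. The sketch below therefore assumes two hypothetical forms. The first is `"<IP address>#<position number>"`, covering the IP address and position number pair. The second is a plain URI. Both are normalized into one "device plus location" record:

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ConfigIndex:
    host: str     # device that stores the configuration information
    locator: str  # location within that device (position number, path, ...)


def parse_index(text: str) -> ConfigIndex:
    """Parse an index given in one of two hypothetical textual forms.

    * "<IPv4 or IPv6 address>#<position number>", e.g. "192.0.2.15#42"
    * a URI, e.g. "https://srv.example/img/cat"
    """
    if "://" in text:
        uri = urlparse(text)
        if not uri.hostname:
            raise ValueError(f"URI has no host: {text!r}")
        return ConfigIndex(uri.hostname, uri.path or "/")
    address, sep, position = text.rpartition("#")
    if not sep or not position.isdigit():
        raise ValueError(f"expected '<address>#<position number>': {text!r}")
    ipaddress.ip_address(address)  # raises ValueError if not a valid IP address
    return ConfigIndex(address, position)
```

The same record could serve as the index of voice quality information. Splitting on the last `#` keeps IPv6 addresses, which contain colons, intact.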

When the image configuration information generation unit 24 receives expression information containing an index of the image configuration information, it uses that index to connect to the server 15 via the communication line 14. It then transmits the position number of the image configuration information. The image configuration information transmission unit 25 of the server 15 reads the image configuration information at that position number from the server image configuration information storage unit 35. It then transmits that information to the image configuration information generation unit 24 via the communication line 14.

Here, the reception device 13 is equipped with a message transmission program that executes three processes. First, a speech synthesis process generates synthesized speech from a received text message and from the voice quality information contained in expression information. The expression information comprises image configuration information, which is information about the image, and voice quality information, which is information about the characteristics of the synthesized speech. The speech synthesis process also generates delimiter information indicating the break points of the synthesized speech. Second, an image information generation process generates, from the image configuration information, information on the image to be displayed while the synthesized speech of the text message is output. Third, an image display process uses the delimiter information to make the image display unit 22 change the displayed image at the timing of each break point in the synthesized speech.
In the image information generation process, the program may request image configuration information from the server 15. The server 15 includes the server image configuration information storage unit 35, which stores all or part of the image configuration information, and the image configuration information transmission unit 25, which transmits all or part of that stored information. The request asks for all or part of the stored image configuration information. It is based on the index of the image configuration information, which is included in the expression information and indicates where the information is stored in the server image configuration information storage unit 35. The program may then receive all or part of the image configuration information from the server 15. Similarly, in the speech synthesis process, the program may request voice quality information from the server 15. In this case the server 15 includes an external voice quality information storage unit (not shown), which stores all or part of the voice quality information, and a voice quality information transmission unit (not shown), which transmits it. The request asks for all or part of the stored voice quality information and is based on an index of the voice quality information included in the expression information. That index indicates where the information is stored in the external voice quality information storage unit. The program may then receive all or part of the voice quality information from the server 15.

Next, the operation of the third embodiment of the present invention will be described. FIG. 13 is a sequence diagram illustrating the operation performed when the image configuration information generation unit 24 receives image configuration information from the image configuration information transmission unit 25.

When the image configuration information generation unit 24 receives the index of the image configuration information (step S301), it requests a connection to the image configuration information transmission unit 25 of the server 15 via the communication line 14 (step S302). The image configuration information transmission unit 25 permits the connection (step S303).

The image configuration information generation unit 24 transmits the position number included in the index to the image configuration information transmission unit 25 via the communication line 14 and requests transmission of the image configuration information (step S304). The image configuration information transmission unit 25 receives the position number (step S305). It reads the image configuration information at that position number from the server image configuration information storage unit 35 (step S306). It then transmits that information to the image configuration information generation unit 24 via the communication line 14 (step S307).

On receiving the image configuration information (step S308), the image configuration information generation unit 24 requests the image configuration information transmission unit 25 to disconnect (step S309). The image configuration information transmission unit 25 permits the disconnection (step S310). The image configuration information generation unit 24 then disconnects from the image configuration information transmission unit 25 (step S311).
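The connect, fetch, and disconnect exchange of steps S301 to S311 can be modelled in-process. The sketch below is one way to do it. The class and method names are hypothetical, and a plain dictionary stands in for the server image configuration information storage unit 35. No particular network protocol is implied.

```python
from __future__ import annotations


class ImageConfigServer:
    """Stand-in for server 15: storage unit 35 plus transmission unit 25."""

    def __init__(self, storage: dict[int, dict]) -> None:
        self._storage = storage  # position number -> image configuration information
        self._sessions: set[int] = set()
        self._next_id = 1

    def connect(self) -> int:
        """S302 (connection request) and S303 (connection permitted)."""
        session = self._next_id
        self._next_id += 1
        self._sessions.add(session)
        return session

    def fetch(self, session: int, position: int) -> dict:
        """S305 to S307: receive a position number, read storage, send the result."""
        if session not in self._sessions:
            raise PermissionError("not connected")
        return self._storage[position]  # KeyError if nothing is stored there

    def disconnect(self, session: int) -> None:
        """S309 (disconnect request) and S310 (disconnection permitted)."""
        self._sessions.discard(session)


def retrieve(server: ImageConfigServer, position: int) -> dict:
    """Client side (generation unit 24): steps S302 to S311 for one index."""
    session = server.connect()
    try:
        return server.fetch(session, position)  # S304 send position, S308 receive
    finally:
        server.disconnect(session)  # S309 to S311, even if the fetch failed
```

The `try`/`finally` closes the session on failure as well. The patent only describes the success path, so that is a design choice of this sketch.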

Next, the operations of the transmission device 11, the reception device 13, and the server 15 in the third embodiment of the present invention will be described. FIG. 14 is a flowchart explaining the operation of the third embodiment of the present invention.

The transmission unit 32 of the transmission device 11 transmits the text message and the expression information to the reception device 13 via the communication line 12 (step S401). In the reception device 13, the voice quality information generation unit 23 and the image configuration information generation unit 24 receive the expression information, and the speech synthesis unit 21 receives the text message (step S402).

The image configuration information generation unit 24 receives the image configuration information from the server 15 via the communication line 14 (step S403). It locates the information using the index of the image configuration information included in the expression information. In doing so, it performs the operations shown in the sequence diagram of FIG. 13. The image configuration information generation unit 24 then stores the received image configuration information in the image configuration information storage unit 33. Finally, it notifies the speech synthesis unit 21 that the image configuration information has been received.
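Step S403 involves an ordering constraint: the image configuration information is stored first, and only then is the speech synthesis unit notified. The sketch below shows one way to enforce this, assuming the two units run concurrently. It uses a `threading.Event` as the notification. The `fetch_image_config` callable is a hypothetical stand-in for the exchange with the server 15 and is not an interface defined in the patent.

```python
from __future__ import annotations

import threading


class ReceiverPipeline:
    """Sketch of step S403: synthesis proceeds only after image config is stored."""

    def __init__(self, fetch_image_config) -> None:
        self._fetch = fetch_image_config       # hypothetical: index -> image config
        self.image_config_store: dict = {}     # image configuration storage unit 33
        self._ready = threading.Event()        # "image config received" notification
        self.log: list[str] = []

    def generation_unit(self, index: str) -> None:
        """Role of unit 24: fetch, store, then notify."""
        self.image_config_store["current"] = self._fetch(index)
        self.log.append("stored image config")
        self._ready.set()                      # notify speech synthesis unit 21

    def synthesis_unit(self, text: str, timeout: float = 5.0) -> None:
        """Role of unit 21: wait for the notification before synthesizing."""
        if not self._ready.wait(timeout):
            raise TimeoutError("image configuration never arrived")
        self.log.append(f"synthesizing: {text}")
```

Because the event is set only after the store is written, any thread that returns from `wait()` sees the stored configuration.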

The voice quality information generation unit 23 generates voice quality information from the received expression information (step S404). The image display unit 22 reads the image configuration information stored in the image configuration information storage unit 33 and displays a basic image (step S405). The speech synthesis unit 21 synthesizes speech for the received text message on the basis of the voice quality information generated by the voice quality information generation unit 23. It then starts outputting the synthesized speech to the speaker 27 (step S406).

The operations from step S407 onward (steps S407 to S420) are the same as those from step S107 onward (steps S107 to S120) in the first embodiment, so their description is omitted.

In the example described for the third embodiment, the image configuration information generation unit 24 receives all of the image configuration information from the server 15. However, the present invention is not limited to this. Part of the image configuration information may be received from the transmission device 11 as expression information, and the rest may be received from the server 15. Two cases illustrate this. In the first, the image configuration information in the expression information lacks part of what the image display unit needs to display an image, such as the basic image. In the second, it contains information indicating that there is no basic image, together with an index indicating where the basic image is stored. In either case, the image configuration information generation unit 24 may connect to the server 15 and receive the basic image. It may then generate the image configuration information by combining that basic image with the other information contained in the expression information.
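The partial-retrieval case can be sketched as a merge. This assumes a hypothetical convention: a missing basic image is marked by `basic_image` being absent or `None`, and the fields are otherwise as named below. The `fetch` callable stands in for the exchange with the server 15.

```python
from __future__ import annotations

from typing import Callable


def build_image_config(partial: dict, fetch: Callable[[str], object]) -> dict:
    """Complete image configuration info received as part of the expression info.

    Hypothetical convention: a missing basic image is signalled by
    partial["basic_image"] being absent or None, with partial["basic_image_index"]
    saying where the server keeps it.
    """
    config = dict(partial)  # copy, so the sender-supplied parts stay untouched
    if config.get("basic_image") is None:
        index = config.pop("basic_image_index", None)
        if index is None:
            raise ValueError("basic image is missing and no index was given")
        config["basic_image"] = fetch(index)  # only the missing part comes from the server
    return config
```

Only the missing piece triggers a server round-trip. Complete image configuration information passes through unchanged.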

Also, in the example described for the third embodiment, all or part of the image configuration information is received from the server 15. Alternatively, the server 15 may store voice quality information. The voice quality information generation unit 23 may then be connected to the server 15 via the communication line 14, receive all or part of the voice quality information from it, and generate the voice quality information. In that case, the expression information contains an index of the voice quality information. The two approaches may also be combined. The image configuration information generation unit 24 may receive all or part of the image configuration information from the server 15, while the voice quality information generation unit 23 receives all or part of the voice quality information from the server 15.

As described above, in this embodiment the image displayed by the image display unit 22, or the synthesized speech generated by the speech synthesis unit 21, is generated from image configuration information or voice quality information stored in the server 15. Consider, for example, a third party that holds the rights to use the images or voices of celebrities or characters. That party can store image configuration information or voice quality information in the server 15 and have it transmitted to the reception device 13. This lets the users of the transmission device 11 and the reception device 13 communicate in more varied ways.

Further, if the communication line 14 is a dedicated line such as a LAN, the communication protocol between the image configuration information generation unit 24 and the server 15 can be kept simple, which makes the system easy to build. If the communication line 14 is a public network such as the Internet, the image configuration information generation unit 24 can connect to other servers and receive the image configuration information or voice quality information stored there. The user of the reception device 13 can therefore choose among image configuration information or voice quality information stored on multiple servers.

Embodiment 4.
A fourth embodiment of the present invention will be described with reference to the drawings. FIG. 15 is a block diagram showing a configuration example of the fourth embodiment of the present invention. This embodiment differs from the third embodiment in one respect. The server 15 of the third embodiment includes a billing information generation unit (billing means) 26, which is connected to an external billing processing system via a communication line 16. The image configuration information transmission unit 25 transmits image configuration information stored in the server image configuration information storage unit 35 to the image configuration information generation unit 24 of the reception device 13 via the communication line 14. When it does so, it outputs the transmitted image configuration information and information identifying the destination reception device 13 to the billing information generation unit 26. The rest of the configuration is the same as in the third embodiment. Circuits and other components identical to those of the third embodiment are therefore given the same reference numerals as in FIG. 11, and their description is omitted.

Next, the operation of the fourth embodiment of the present invention will be described. FIG. 16 is a sequence diagram illustrating two operations. In the first, the image configuration information generation unit 24 receives image configuration information from the image configuration information transmission unit 25. In the second, the image configuration information transmission unit 25 outputs the transmitted image configuration information and information identifying the destination reception device 13 to the billing information generation unit 26.

When the image configuration information generation unit 24 receives the index of the image configuration information (step S501), it transmits information identifying the reception device 13 to the image configuration information transmission unit 25 of the server 15 and requests a connection (step S502). The image configuration information transmission unit 25 authenticates the reception device 13 (step S503) and permits the connection (step S504).

The image configuration information generation unit 24 transmits the position number included in the index to the image configuration information transmission unit 25 (step S505). The image configuration information transmission unit 25 receives the position number (step S506). It reads the image configuration information at that position number from the server image configuration information storage unit 35 (step S507). It then transmits that information to the image configuration information generation unit 24 (step S508).

The image configuration information transmission unit 25 outputs the transmitted image configuration information and information identifying the destination reception device 13 to the billing information generation unit 26 (step S509). From these, the billing information generation unit 26 determines the amount to charge for the reception device 13. It generates billing information consisting of the determined amount and the information identifying the reception device 13 (step S510). It then transmits the billing information to the external billing processing system via the communication line 16 (step S511). The external billing processing system bills the user of the reception device 13 according to the received billing information. The amount charged may instead be determined by the external billing processing system. In that case, the billing information includes the transmitted image configuration information and the information identifying the destination reception device 13.
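Step S510 and its variant can be sketched as follows. The per-item price table, the integer amounts, and the rule that unlisted items cost nothing are all illustrative assumptions. Passing `None` for the table models the variant in which the external billing processing system decides the amount.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BillingInfo:
    receiver: str       # information identifying the destination reception device
    item: str           # which image configuration information was transmitted
    amount: int | None  # None: the external billing system decides the amount


def make_billing_info(receiver: str, item: str, price_table: dict[str, int] | None) -> BillingInfo:
    """Sketch of step S510 in billing information generation unit 26.

    price_table maps an item to a hypothetical charge. Passing None models the
    variant in which only the item and the receiver are reported.
    """
    if price_table is None:
        return BillingInfo(receiver, item, None)
    return BillingInfo(receiver, item, price_table.get(item, 0))  # assumption: unlisted items are free
```

Keeping the item in the record in both variants lets the external system audit or re-price charges later.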

On receiving the image configuration information (step S512), the image configuration information generation unit 24 requests the image configuration information transmission unit 25 to disconnect (step S513). The image configuration information transmission unit 25 permits the disconnection (step S514). The image configuration information generation unit 24 then disconnects from the image configuration information transmission unit 25 (step S515).

In the example above, the billing information generation unit 26 determines the amount to charge for the reception device 13 from the transmitted image configuration information and the information identifying the destination reception device 13. Alternatively, the server 15 may store voice quality information. The amount charged for the reception device 13 may then be determined from the transmitted voice quality information and the information identifying the destination reception device 13.

As described above, this embodiment enables billing based on how often, and which types of, image configuration information and voice quality information the user of the reception device 13 uses. For example, voice quality information and image configuration information for celebrities or characters can be provided to the user of the reception device 13 for a fee. Further, if advertising or promotional elements are included in the image configuration information or voice quality information, the advertiser can be charged according to the number of times such information is used.
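Charging advertisers by usage count amounts to a tally. The sketch below assumes a log that records the item sent on each transmission, a hypothetical mapping from advertising items to advertisers, and a flat per-use rate. None of these are specified in the patent.

```python
from __future__ import annotations

from collections import Counter


def advertiser_charges(usage_log: list[str], sponsor_of: dict[str, str], rate: int) -> dict[str, int]:
    """Charge each advertiser a flat fee per use of the information it sponsors.

    usage_log lists the item transmitted on each use; sponsor_of maps an
    advertising item to its advertiser; rate is a hypothetical per-use fee.
    Items without a sponsor are ignored here.
    """
    uses = Counter(sponsor_of[item] for item in usage_log if item in sponsor_of)
    return {advertiser: count * rate for advertiser, count in uses.items()}
```

A real deployment would likely price items individually. A flat rate keeps the example focused on counting uses.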

In this embodiment, the billing information generation unit 26 is included in the server 15. Alternatively, the transmission device 11 may include the billing information generation unit 26. In that case, whenever the transmission unit 32 transmits expression information to the reception device 13, the billing information generation unit 26 charges the user of the reception device 13 according to the transmitted expression information.

Industrial Applicability
The present invention can be applied to applications that send and receive text messages, such as e-mail, electronic conferencing, and chat. It can also be applied to applications that output machine-generated messages, such as man-machine interfaces.

Brief Description of Drawings
FIG. 1 is a block diagram showing a configuration example of the first embodiment of the present invention.
FIG. 2 is a flowchart explaining the operation of the first embodiment of the present invention.
FIG. 3 shows an example of a text message stored in the text message storage unit.
FIG. 4 is an explanatory diagram showing an example of expression information.
FIG. 5 is an explanatory diagram showing an example of voice quality information.
FIG. 6 is an explanatory diagram showing an example of image configuration information.
FIG. 7 is an explanatory diagram showing an example of the speech waveform of synthesized speech.
FIG. 8 is an explanatory diagram showing examples of images output by the image display unit.
FIG. 9 is a block diagram showing a configuration example of the second embodiment of the present invention.
FIG. 10 is a flowchart explaining the operation of the second embodiment of the present invention.
FIG. 11 is a block diagram showing a configuration example of the third embodiment of the present invention.
FIG. 12 is an explanatory diagram showing an example of expression information in the third embodiment.
FIG. 13 is a sequence diagram explaining the operation when the image configuration information generation unit receives image configuration information from the image configuration information transmission unit.
FIG. 14 is a flowchart explaining the operation of the third embodiment of the present invention.
FIG. 15 is a block diagram showing a configuration example of the fourth embodiment of the present invention.
FIG. 16 is a sequence diagram explaining the operation when the image configuration information generation unit receives image configuration information from the image configuration information transmission unit, and the image configuration information transmission unit outputs the transmitted image configuration information and information identifying the destination reception device to the billing information generation unit.
FIG. 17 is a block diagram showing a configuration example of a conventional message transmission system.

Explanation of Symbols

11 Transmission device
12, 14, 16 Communication line
13 Reception device
15 Server
21 Speech synthesis unit
22 Image display unit
23 Voice quality information generation unit
24 Image configuration information generation unit
25 Image configuration information transmission unit
26 Billing information generation unit
27 Speaker
31 Text message storage unit
32 Transmission unit
33 Image configuration information storage unit
34 Expression information storage unit
35 Server image configuration information storage unit

Claims

A message transmission system comprising:
transmission means for transmitting a text message;
speech synthesis means for generating synthesized speech based on the received text message;
image information generation means for generating information on an image to be displayed together with the output of the synthesized speech;
image display means for displaying an image based on the image information generated by the image information generation means; and
expression information storage means for storing in advance expression information that includes image configuration information, which is information on the image to be displayed by the image display means, and voice quality information, which is information on the characteristics of the synthesized speech to be generated by the speech synthesis means;
wherein the image information generation means generates the image information based on the image configuration information,
the speech synthesis means generates the synthesized speech based on the voice quality information, generates delimiter information indicating the delimiters (break points) of the generated synthesized speech, and inputs the delimiter information to the image display means, and
the image display means changes the displayed image at the timing of each delimiter of the synthesized speech, based on the input delimiter information.
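The claim above relies on the speech synthesis means reporting where the synthesized speech breaks, so that the image display means can switch images at exactly those points. The following is a minimal sketch of that hand-off under stated assumptions: no real audio is produced, delimiters are assumed to fall after sentence-ending punctuation, and durations are estimated from a fixed, hypothetical speaking rate (`ms_per_char`). The names `Segment`, `synthesize`, `delimiter_info`, and `image_schedule` are illustrative only and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of synthesized speech between two delimiters (hypothetical)."""
    text: str
    duration_ms: int  # estimated playback length of this stretch


def synthesize(text: str, ms_per_char: int = 100) -> list[Segment]:
    """Stand-in for speech synthesis unit 21 (hypothetical).

    No audio is produced. The text is cut after sentence-ending punctuation
    (an assumption about where delimiters fall), and each piece is given a
    duration proportional to its length at the assumed speaking rate.
    """
    segments: list[Segment] = []
    current = ""
    for ch in text:
        current += ch
        if ch in ".!?。！？":
            piece = current.strip()
            segments.append(Segment(piece, len(piece) * ms_per_char))
            current = ""
    if current.strip():  # trailing text without final punctuation
        piece = current.strip()
        segments.append(Segment(piece, len(piece) * ms_per_char))
    return segments


def delimiter_info(segments: list[Segment]) -> list[int]:
    """Delimiter information: the playback time (ms) at which each segment starts."""
    times: list[int] = []
    elapsed = 0
    for seg in segments:
        times.append(elapsed)
        elapsed += seg.duration_ms
    return times


def image_schedule(delimiters: list[int], images: list[str]) -> list[tuple[int, str]]:
    """Image display unit 22 (sketch): show the next image at each delimiter.

    Images are reused in rotation when there are more segments than images.
    """
    return [(t, images[i % len(images)]) for i, t in enumerate(delimiters)]
```

For example, `image_schedule(delimiter_info(synthesize("Hello. How are you?")), ["smile.png", "wave.png"])` shows the first image from 0 ms and switches to the second at 600 ms, when the second sentence starts. Driving the display from timestamps produced by the synthesizer, rather than from a separate timer, is what keeps the image changes aligned with the speech.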

The message transmission system according to claim 1, comprising:
a transmission device including the transmission means; and
a receiving device including the speech synthesis means, the image information generation means, the image display means, and the expression information storage means.

The message transmission system according to claim 1, comprising:
a receiving device including the speech synthesis means, the image information generation means, and the image display means; and
a transmission device including the transmission means, the expression information storage means, and expression information transmission means for transmitting the expression information stored in the expression information storage means to the receiving device;
wherein the receiving device includes voice quality information generation means for receiving the expression information and generating the voice quality information, and
the image information generation means receives the expression information from the transmission device and generates the image configuration information from it.
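In this arrangement the expression information travels with the text message from the transmission device, and the receiving device derives both the voice quality information and the image configuration information from it. A minimal sketch, assuming a JSON payload with hypothetical `voice` (`pitch`, `rate`) and `images` fields; the patent does not specify a wire format, field names, or these class and function names.

```python
import json
from dataclasses import dataclass


@dataclass
class VoiceQuality:
    pitch: float  # hypothetical: relative pitch multiplier
    rate: float   # hypothetical: relative speaking-rate multiplier


@dataclass
class ImageConfig:
    images: list[str]  # hypothetical: image identifiers to display in turn


def encode_message(text: str, expression: dict) -> str:
    """Transmission device (sketch): bundle the text message with stored expression info."""
    return json.dumps({"text": text, "expression": expression})


def voice_quality_from(expression: dict) -> VoiceQuality:
    """Voice quality information generation unit 23 (sketch)."""
    v = expression["voice"]
    return VoiceQuality(pitch=float(v["pitch"]), rate=float(v["rate"]))


def image_config_from(expression: dict) -> ImageConfig:
    """Image configuration information generation unit 24 (sketch)."""
    return ImageConfig(images=list(expression["images"]))


def receive(payload: str) -> tuple[str, VoiceQuality, ImageConfig]:
    """Receiving device (sketch): split one payload into text, voice, and image info."""
    msg = json.loads(payload)
    expr = msg["expression"]
    return msg["text"], voice_quality_from(expr), image_config_from(expr)
```

Keeping the two derivations in separate functions mirrors the claim, where voice quality information and image configuration information are generated by different means from the same received expression information.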

The message transmission system according to claim 3, wherein the transmission device includes billing means for generating expression information billing information, which is fee information corresponding to the expression information transmitted to the receiving device.

The message transmission system according to claim 3, further comprising an image configuration information providing device including external image configuration information storage means for storing all or part of the image configuration information, and image configuration information transmission means for transmitting all or part of the image configuration information stored in the external image configuration information storage means to the receiving device.

The message transmission system according to claim 5, wherein the receiving device and the image configuration information providing device are connected by a dedicated line.

The message transmission system according to claim 5, wherein the receiving device and the image configuration information providing device are connected by a public network.

The message transmission system according to any one of claims 7 to 9, wherein
the expression information includes image index information indicating the location of all or part of the image configuration information stored in the external image configuration information storage means, and
the image information generation means requests the image configuration information transmission means to transmit all or part of the image configuration information stored in the external image configuration information storage means, based on the image index information.

The message transmission system according to claim 8, wherein the image configuration information transmitting unit transmits all or part of the image configuration information stored in the external image configuration information storage unit to the receiving device in response to a request from the image information generation unit.

The message transmission system according to any one of claims 5 to 9, wherein the image configuration information providing device includes billing means for generating image billing information, which is fee information corresponding to the image configuration information transmitted to the receiving device.
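The preceding claims describe the receiving device requesting image configuration information by index from an image configuration information providing device, which records fee information for each item it transmits to a given receiver. The sketch below models that exchange in memory; the string indices, the integer `fees` table, and all class and function names are assumptions for illustration, and a real system would carry each request over the dedicated line or public network named in the claims. The same request-and-bill pattern would apply to the voice quality information providing device.

```python
from dataclasses import dataclass, field


@dataclass
class ImageConfigProvider:
    """Image configuration information providing device (hypothetical sketch).

    `store` stands in for the external image configuration information
    storage means; `fees` is an assumed per-item price table.
    """
    store: dict[str, bytes]
    fees: dict[str, int]
    # Billing records: (receiver id, image index, fee) for each transmission.
    billing: list[tuple[str, str, int]] = field(default_factory=list)

    def request(self, receiver_id: str, index: str) -> bytes:
        """Transmit the item at `index` and record billing information for it."""
        data = self.store[index]  # raises KeyError if the index is unknown
        self.billing.append((receiver_id, index, self.fees.get(index, 0)))
        return data

    def total_due(self, receiver_id: str) -> int:
        """Sum of recorded fees for one receiving device."""
        return sum(fee for rid, _, fee in self.billing if rid == receiver_id)


def fetch_images(provider: ImageConfigProvider, receiver_id: str,
                 image_index: list[str]) -> list[bytes]:
    """Receiving device side: resolve each image index entry by requesting it."""
    return [provider.request(receiver_id, idx) for idx in image_index]
```

Recording the destination receiver alongside each transmitted item is what allows fees to be attributed per receiving device, as in the billing claims.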

The message transmission system according to any one of claims 3 to 10, further comprising a voice quality information providing device including external voice quality information storage means for storing all or part of the voice quality information, and voice quality information transmission means for transmitting all or part of the voice quality information stored in the external voice quality information storage means to the receiving device.

The message transmission system according to claim 11, wherein the receiving device and the voice quality information providing device are connected by a dedicated line.

The message transmission system according to claim 11, wherein the receiving device and the voice quality information providing device are connected by a public network.

The message transmission system according to any one of the above claims, wherein
the expression information includes voice quality index information indicating the location of all or part of the voice quality information stored in the external voice quality information storage means, and
the speech synthesis means requests the voice quality information transmission means to transmit all or part of the voice quality information stored in the external voice quality information storage means, based on the voice quality index information.

The message transmission system according to claim 14, wherein the voice quality information transmitting means transmits all or part of the voice quality information stored in the external voice quality information storage means to the receiving device in response to a request from the voice synthesis means.

The message transmission system according to any one of claims 11 to 15, wherein the voice quality information providing device includes billing means for generating voice quality billing information which is information on a fee according to voice quality information transmitted to the receiving device.

A message transmission method comprising:
receiving a text message;
generating synthesized speech based on the received text message and on the voice quality information of expression information that includes image configuration information, which is information on an image to be displayed, and voice quality information, which is information on the characteristics of the synthesized speech;
generating delimiter information indicating the delimiters of the synthesized speech;
outputting the synthesized speech;
generating, based on the image configuration information, information on an image to be displayed together with the output of the synthesized speech; and
displaying an image based on the image information;
wherein the displayed image is changed at the timing of each delimiter of the synthesized speech, based on the delimiter information.

The message transmission method according to claim 17, wherein image configuration information is generated from the expression information, and voice quality information is generated from the expression information.

The message transmission method according to claim 17, wherein
pre-stored expression information is transmitted, and
when the expression information is received, image configuration information is generated from the received expression information and voice quality information is generated from the received expression information.

The message transmission method according to claim 19, wherein expression information billing information that is fee information according to the transmitted expression information is generated.

The message transmission method according to claim 19 or 20, wherein
the expression information includes image index information indicating the location at which all or part of the image configuration information is stored in external image configuration information storage means that stores all or part of the image configuration information,
transmission of all or part of the image configuration information is requested from the external image configuration information storage means based on the image index information, and
all or part of the image configuration information is received from the external image configuration information storage means.

The message transmission method according to claim 21, wherein the external image configuration information storage means generates image billing information which is fee information according to the transmitted image configuration information.

The message transmission method according to any one of claims 19 to 22, wherein
the expression information includes voice quality index information indicating the location at which all or part of the voice quality information is stored in external voice quality information storage means that stores all or part of the voice quality information,
transmission of all or part of the voice quality information is requested from the external voice quality information storage means based on the voice quality index information, and
all or part of the voice quality information is received from the external voice quality information storage means.

The message transmission method according to claim 23, wherein the external voice quality information storage means generates voice quality billing information, which is fee information corresponding to the transmitted voice quality information.

A receiving device that receives a text message from a transmission device, comprising:
speech synthesis means for generating synthesized speech based on the text message received from the transmission device and on the voice quality information of expression information that includes image configuration information, which is image information, and voice quality information, which is information on characteristics of synthesized speech; image information generation means for generating, based on the image configuration information, information on an image to be displayed together with the output of the synthesized speech; and image display means for displaying an image based on the image information generated by the image information generation means;
wherein the speech synthesis means generates delimiter information indicating the delimiters of the generated synthesized speech and inputs the delimiter information to the image display means, and
the image display means changes the displayed image at the timing of each delimiter of the synthesized speech, based on the input delimiter information.

The receiving device according to claim 25, further comprising expression information storage means for storing expression information in advance,
wherein the image information generation means generates the image configuration information from the expression information.

The receiving device according to claim 25, further comprising voice quality information generation means for receiving expression information and generating voice quality information from it,
wherein the image information generation means receives the expression information and generates image configuration information from it.

The receiving device according to claim 26 or 27, which receives all or part of the image configuration information from an image configuration information providing device that stores all or part of the image configuration information.

The receiving device according to claim 28, wherein
the expression information includes image index information indicating the location of all or part of the image configuration information stored in the image configuration information providing device, and
the image information generation means, based on the image index information, requests the image information providing device to transmit all or part of the image configuration information stored in it, and receives that information from the image information providing device.

The receiving device according to claim 28 or 29, which is connected to the image information providing device by a dedicated line.

The receiving device according to claim 28 or 29, which is connected to the image information providing device via a public network.

The receiving device according to any one of claims 26 to 31, wherein the receiving device receives all or part of the voice quality information from a voice quality information providing device that stores all or part of the voice quality information.

The receiving device according to claim 32, wherein
the expression information includes voice quality index information indicating the location of all or part of the voice quality information stored in the voice quality information providing device, and
the speech synthesis means, based on the voice quality index information, requests the voice quality information providing device to transmit all or part of the voice quality information stored in it, and receives that information from the voice quality information providing device.

The receiving device according to claim 32 or 33, wherein the receiving device is connected to the voice quality information providing device via a dedicated line.

The receiving device according to claim 32 or 33, wherein the receiving device is connected to the voice quality information providing device via a public network.

A transmission device comprising transmission means for transmitting, to a receiving device, a text message and expression information that includes image configuration information, which is information on an image to be displayed on the receiving device, and voice quality information, which is information on the characteristics of the synthesized speech of the text message to be generated by the receiving device.

A message transmission program for causing a computer to execute:
speech synthesis processing of generating synthesized speech based on a received text message and on the voice quality information of expression information that includes image configuration information, which is image information, and voice quality information, which is information on characteristics of synthesized speech, and of generating delimiter information indicating the delimiters of the synthesized speech;
image information generation processing of generating, based on the image configuration information, information on an image to be displayed together with the output of the synthesized speech of the text message; and
image display processing of causing image display means that displays an image to change the displayed image at the timing of each delimiter of the synthesized speech, based on the delimiter information.

The message transmission program according to claim 37, causing the computer to execute:
voice quality information generation processing of receiving expression information and generating voice quality information from the expression information; and,
in the image information generation processing, processing of receiving the expression information and generating image configuration information from the expression information.

The message transmission program according to claim 38, causing the computer to execute, in the image information generation processing:
processing of requesting an image information providing device, which includes external image configuration information storage means for storing all or part of the image configuration information and image configuration information transmission means for transmitting all or part of the image configuration information stored in the external image configuration information storage means, to transmit all or part of the image configuration information stored in the image information providing device, based on image index information that is included in the expression information and indicates the location of all or part of the image configuration information stored in the external image configuration information storage means; and
processing of receiving all or part of the image configuration information from the image information providing device.

The message transmission program according to claim 38 or 39, causing the computer to execute, in the speech synthesis processing:
processing of requesting a voice quality information providing device, which includes external voice quality information storage means for storing all or part of the voice quality information and voice quality information transmission means for transmitting all or part of the voice quality information stored in the external voice quality information storage means, to transmit all or part of the voice quality information stored in the voice quality information providing device, based on voice quality index information that is included in the expression information and indicates the location of all or part of the voice quality information stored in the external voice quality information storage means; and
processing of receiving all or part of the voice quality information from the voice quality information providing device.