JP6677596B2 - Communication terminal

Info

Publication number: JP6677596B2
Application number: JP2016136819A
Authority: JP
Inventors: 武石原
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2016-07-11
Filing date: 2016-07-11
Publication date: 2020-04-08
Anticipated expiration: 2036-07-11
Also published as: JP2018011116A

Description

The present invention relates to a communication terminal.

Devices that can handle a telephone call without the user speaking have been proposed, such as the one described in Patent Document 1. In the device described in Patent Document 1, response texts are stored in advance, and the text selected at the time of response is converted into a voice signal by speech synthesis and output. The voice signals themselves may also be stored in the device in advance. Preparing response texts in advance in this way is said to allow an immediate response and a smooth conversation.

Patent Document 1: JP-A-2003-99081

If the terminal making the call performs the above speech synthesis itself in order to generate the synthesized speech used for the call, the synthesized speech may sound unnatural because of the terminal's processing capability and the application that performs the synthesis. On the other hand, creating the synthesized speech on a server, which surpasses the terminal in computing power and storage, makes more natural synthesized speech available. A terminal can therefore make calls with more natural synthesized speech by acquiring and storing, in advance, synthesized speech created by a server. However, acquiring synthesized speech requires communication between the terminal and the server, so uniformly acquiring synthesized speech for every text that is a candidate for use in a call can involve transferring a large amount of data. Uniform acquisition of synthesized speech is therefore not always appropriate.

The present invention has been made in view of the above, and its object is to provide a communication terminal that can appropriately acquire the voice data used for communication.

To achieve the above object, a communication terminal according to the present invention is a communication terminal that converts text into voice data and uses the converted voice data for communication with another communication terminal, and comprises: input means for inputting, prior to communication, a text to be converted into voice data; determination means for determining, based on at least one of the state of the terminal itself, the text itself input by the input means, and the communication history of the terminal itself, whether or not to acquire voice data corresponding to the input text prior to communication; and acquisition means for acquiring, from another device, the voice data corresponding to the text input by the input means, based on the determination by the determination means.

In the communication terminal according to the present invention, whether or not to acquire voice data corresponding to a text prior to communication is determined based on at least one of the state of the terminal itself, the text itself, and the communication history of the terminal itself. The communication terminal according to the present invention can therefore appropriately acquire the voice data used for communication.

The determination means determines, when the terminal itself receives an incoming call, that the voice data corresponding to the text is to be acquired prior to communication, and the acquisition means performs control to prohibit establishment of the communication for the incoming call until the voice data corresponding to the text has been acquired from the other device. With this configuration, voice data can be acquired appropriately when the terminal itself receives an incoming call.

Alternatively, the determination means determines whether or not to acquire the voice data corresponding to the text prior to communication based on the number of communications indicated in the communication history. With this configuration, voice data can be acquired appropriately according to the number of communications of the terminal itself.

Alternatively, the communication terminal further comprises conversion means that accepts designation of a text when communication is performed, converts the designated text into the voice data that was acquired by the acquisition means and corresponds to the designated text, and transmits the voice data to the communication terminal of the communication partner. With this configuration, voice data can be acquired more appropriately according to at least one of the state of the terminal itself and the communication history of the terminal itself.

Alternatively, the communication terminal further comprises conversion means that receives information indicating a text from the communication terminal of the communication partner when communication is performed, converts the text indicated by the received information into the voice data that was acquired by the acquisition means and corresponds to that text, and performs voice output based on the converted voice data. With this configuration, the voice data corresponding to the text designated at the terminal itself can be transmitted to the terminal of the communication partner.

Alternatively, the input means receives the text from another communication terminal, and the determination means determines whether or not to acquire the voice data corresponding to the text prior to communication based on information, stored in the terminal itself, about the other communication terminal that sent the text. With this configuration, voice data can be acquired appropriately according to the other communication terminal that sent the text.

Further, a communication terminal according to the present invention is a communication terminal that communicates with a partner communication terminal that converts text into voice data and uses the converted voice data for communication with another communication terminal, and comprises: input means for inputting, prior to communication, a text to be converted into voice data; and transmission means for transmitting, prior to communication, the text input by the input means to the communication terminal of the communication partner.

The communication terminal according to the present invention described above is the counterpart of the communication terminal, described earlier, that acquires voice data. With this configuration, the text can be input appropriately and reliably to the communication terminal that acquires the voice data.

The transmission means may determine whether or not to transmit the text to the communication terminal of the communication partner based on information, stored in the terminal itself, about that communication terminal. With this configuration, text can be transmitted appropriately according to the communication terminal of the communication partner.

According to the present invention, the voice data used for communication can be acquired appropriately.

A diagram showing the configuration of the communication terminal according to the first embodiment of the present invention.

A diagram showing the hardware configuration of the communication terminal according to the embodiments of the present invention.

A flowchart showing the processing executed when voice data is acquired and stored in the communication terminal according to the first embodiment of the present invention.

A flowchart showing the processing executed when communication is performed using the stored voice data in the communication terminal according to the first embodiment of the present invention.

A diagram showing the configuration of the communication terminal according to the second embodiment of the present invention.

A flowchart showing the processing executed when a fixed phrase is transmitted to another communication terminal in the communication terminal according to the second embodiment of the present invention.

A flowchart showing the processing executed when communication is performed using the stored voice data in the communication terminal according to the second embodiment of the present invention.

Embodiments of the communication terminal according to the present invention are described in detail below with reference to the drawings. In the description of the drawings, identical elements are given identical reference signs, and duplicate description is omitted.

<First embodiment>
FIG. 1 shows a communication terminal 10 according to the first embodiment. The communication terminal 10 is a device that is used by a user and communicates with another communication terminal 20 via a communication network N. Specifically, the communication terminal 10 corresponds to a mobile phone, a smartphone, or the like. As communication with the other communication terminal 20, the communication terminal 10 performs telephone communication, in which a call connection (a communication connection for voice communication) is established between the communication terminals 10 and 20 and voice data is transmitted and received. The communication network N is, for example, a mobile communication network or a wireless LAN (local area network). The communication network N may also consist of a plurality of networks, including a fixed telephone network. The communication terminal 10 communicates wirelessly.

As a function according to the present embodiment, the communication terminal 10 has a function of communicating by text. Specifically, the communication terminal 10 accepts designation of a text from the user when communication is performed, converts the designated text into voice data, and transmits the voice data to the communication terminal 20 of the communication partner. That is, the communication terminal 10 converts text into voice data and uses the converted voice data for communication with the other communication terminal 20. With this function, the user of the communication terminal 10 can make a telephone call without speaking. The user can thus make a call without speaking even when, for example, riding public transport such as a train or a bus, attending a meeting or a class, or in a public place such as a restroom or an elevator.

Apart from the function according to the present embodiment, the communication terminal 10 may also have a function of accepting voice (speech) from the user and transmitting voice data of that speech to the communication terminal 20 of the communication partner. That is, the communication terminal 10 may have the function of making ordinary telephone calls.

The voice data transmitted from the communication terminal 20 of the communication partner and received by the communication terminal 10 may be played to the user as sound as it is, for example through earphones, or may be converted into text by speech recognition and displayed.

The communication terminal 10 may also have the conventional functions of an ordinary mobile phone or smartphone. Some of the functions of the communication terminal 10 according to the present embodiment, described below, presuppose such conventional functions. When the communication terminal 10 has such a function according to the present embodiment, it also has the conventional function that the function presupposes. The details are given in the description of the functions according to the present embodiment.

Next, the functions of the communication terminal 10 according to the present embodiment are described. As shown in FIG. 1, the communication terminal 10 comprises a conversion unit 11, an input unit 12, a determination unit 13, and an acquisition unit 14.

The conversion unit 11 is conversion means that accepts designation of a text when communication is performed (while communication is in progress), converts the designated text into the corresponding voice data, and transmits the voice data to the other communication terminal 20 of the communication partner. Before communication, the conversion unit 11 stores texts that are fixed phrases in association with voice data. The voice data corresponding to a text is the sound of that text being read aloud.

During communication, the conversion unit 11 presents the fixed phrases that are candidates for designation to the user, for example by displaying them on the display device of the communication terminal 10. These candidate fixed phrases are texts stored in advance. The conversion unit 11 accepts the user operation that designates a fixed phrase in response to the presentation of the candidates. The conversion unit 11 reads out the voice data stored in association with the designated fixed phrase and transmits it to the other communication terminal 20 over the communication connection established for the communication. The voice data stored by the conversion unit 11 is acquired by the acquisition unit 14, as described later.

If voice data were generated and acquired during communication, the time required for generation and acquisition would cause a time lag of about one second to several seconds. Storing the voice data corresponding to the fixed phrases in advance, as described above, and using it for communication prevents this time lag.

The conversion unit 11 may also accept input of a text other than the fixed phrases stored in advance. In that case, the conversion unit 11 acquires the voice data corresponding to the text entered at that time and transmits it to the other communication terminal 20 of the communication partner. The conversion unit 11 acquires the voice data corresponding to the text from the speech synthesis device 30 in the same way as the acquisition unit 14, described later. Alternatively, the conversion unit 11 may acquire the voice data by a method different from that of the acquisition unit 14, for example with a speech synthesis application on the terminal 10 itself. Voice data acquired in this way may also be stored in association with the text so that it can be used in later communications, as above.
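The lookup behavior of the conversion unit 11 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class name `PhraseConverter`, the `synthesize` callback (a stand-in for a request to the speech synthesis device 30 or to an on-terminal synthesis application), and the use of an in-memory dictionary are all hypothetical and do not appear in the description.

```python
from typing import Callable, Dict


class PhraseConverter:
    """Hypothetical sketch: maps a text to stored voice data, synthesizing on a miss."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        # `synthesize` is an assumed stand-in for the speech synthesis device 30.
        self._synthesize = synthesize
        # Fixed phrase -> voice data acquired before the call.
        self._store: Dict[str, bytes] = {}

    def preload(self, phrase: str, voice_data: bytes) -> None:
        """Store voice data acquired in advance for a fixed phrase."""
        self._store[phrase] = voice_data

    def convert(self, text: str) -> bytes:
        """Return stored voice data, or synthesize and keep it for later calls."""
        voice_data = self._store.get(text)
        if voice_data is None:
            # Free text entered during the call: this path incurs the time lag
            # that pre-stored fixed phrases avoid.
            voice_data = self._synthesize(text)
            self._store[text] = voice_data
        return voice_data
```

Keeping the synthesized result in the same store mirrors the remark that voice data acquired during a call may be retained for later communications.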

The input unit 12 is input means that inputs, prior to communication, a fixed phrase, which is a text to be converted into voice data. The input unit 12 inputs a fixed phrase by, for example, accepting a text entry operation from the user. The input unit 12 inputs fixed phrases while no communication is in progress, that is, while no communication connection is established.

Fixed phrases may be created (edited or updated) on a device other than the communication terminal 10. For example, the user may operate a PC (personal computer) or the like to enter a fixed phrase on the PC, and the input unit 12 may receive and input the fixed phrase transmitted from the PC to the communication terminal 10. Alternatively, the fixed phrases used for communication may be prepared by a telecommunications carrier. In this case, the input unit 12 may receive and input the fixed phrases transmitted from the carrier's server. The fixed phrases in this case may be created according to attributes of the user of the communication terminal 10. The input unit 12 outputs the input fixed phrases to the determination unit 13.

The determination unit 13 is determination means that determines, based on at least one of the state of the terminal 10 itself, the fixed phrase itself input by the input unit 12, and the communication history of the terminal 10 itself, whether or not to acquire the voice data corresponding to the input fixed phrase prior to communication. As described later, the voice data is acquired from the speech synthesis device 30, so acquiring it requires communication. In mobile communication, there may be a cap on data volume (for example, so many GB per so many days) or charges that depend on data volume. Transferring a large amount of data also places a heavy load on the communication network N. The determination unit 13 therefore causes voice data to be acquired only when acquiring it is reasonable. Specifically, the determination unit 13 makes the determination as follows.

The determination unit 13 refers to the communication state of the terminal 10 itself and, if the terminal 10 can communicate over a wireless LAN (for example, Wi-Fi), that is, if the terminal 10 is within a wireless LAN area, determines that the voice data is to be acquired prior to communication. This is because communication over a wireless LAN is not subject to the caps and charges of mobile communication.

If the terminal 10 cannot communicate over a wireless LAN and only mobile communication is available, the determination unit 13 makes the determination as follows. The determination unit 13 determines that the voice data is to be acquired prior to communication when the terminal 10 receives an incoming call (at the time of the incoming call). When the communication terminal 10 receives an incoming call, the user is notified, for example by sounding a ringtone. When the user then performs an answer operation on the communication terminal 10, the communication terminal 10 performs the processing that establishes a communication connection with the other communication terminal 20 of the communication partner, and communication starts.

When there is an incoming call, voice data may be used in the communication, so the voice data is acquired prior to the communication. In this case, establishment of the communication connection may also be controlled, as described later. Also in this case, the determination unit 13 may decide to acquire voice data depending on the fixed phrases for which voice data would be acquired. For example, it may decide to acquire the voice data only when the amount of fixed phrases is no more than a preset threshold. Specifically, it may decide to acquire the voice data only when there is a single fixed phrase. Alternatively, it may decide to acquire voice data only for fixed phrases that have been set in advance as phrases to acquire in this case. For example, fixed phrases considered likely to be used in communication are set in advance as phrases to acquire. This setting is made, for example, by the user of the communication terminal 10 or by the creator of the fixed phrases.
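The acquisition decision described up to this point can be sketched in Python. This is a simplified illustration, not the claimed implementation: the function name `phrases_to_fetch`, its parameters, and the way the two alternatives (a limit on the number of fixed phrases, and a preset list of phrases) are combined into one function are assumptions. The history-based determination that the description covers separately is deliberately left out.

```python
from typing import List, Set


def phrases_to_fetch(
    wifi_available: bool,
    incoming_call: bool,
    phrases: List[str],
    preselected: Set[str],
    max_phrases: int = 1,
) -> List[str]:
    """Return the fixed phrases whose voice data should be acquired now.

    Assumed combination of the alternatives in the text:
    - on wireless LAN, acquire everything (no mobile caps or charges);
    - on mobile only, acquire at the time of an incoming call, and then
      either all phrases if there are few enough, or only preset ones.
    """
    if wifi_available:
        return list(phrases)
    if not incoming_call:
        # Mobile only and no incoming call: left to the history-based check,
        # which is outside this sketch.
        return []
    if len(phrases) <= max_phrases:
        return list(phrases)
    return [p for p in phrases if p in preselected]
```

For example, with three stored phrases, mobile-only connectivity, and an incoming call, only the phrases in `preselected` would be requested.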

This setting may also be made according to the usage frequency of fixed phrases, obtained as statistical information. For example, the setting may be based on the number of times the speech synthesis device 30 has generated the voice data. Fixed phrases for which this count is equal to or greater than a threshold are set as phrases whose voice data is acquired in advance. Alternatively, fixed phrases whose count, as a proportion of the sum of the counts of all fixed phrases, is equal to or greater than a threshold are set as phrases whose voice data is acquired in advance. This setting is made, for example, when a telecommunications carrier prepares the fixed phrases, with the statistical information being referred to at the speech synthesis device 30.
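The two frequency-based criteria above (an absolute count threshold, or a threshold on the share of all generations) can be sketched as follows. The function name `select_preset_phrases` and the parameter names are hypothetical, and the threshold values in the usage are arbitrary examples rather than values given in the description.

```python
from typing import Dict, Optional, Set


def select_preset_phrases(
    counts: Dict[str, int],
    min_count: Optional[int] = None,
    min_share: Optional[float] = None,
) -> Set[str]:
    """Pick fixed phrases to acquire in advance from generation counts.

    `counts` maps each fixed phrase to the number of times its voice data
    was generated. Exactly one of `min_count` / `min_share` must be given.
    """
    if (min_count is None) == (min_share is None):
        raise ValueError("give exactly one of min_count or min_share")
    if min_count is not None:
        return {p for p, n in counts.items() if n >= min_count}
    total = sum(counts.values())
    if total == 0:
        return set()  # no statistics yet, so nothing qualifies
    return {p for p, n in counts.items() if n / total >= min_share}
```

A share-based threshold stays meaningful as overall usage grows, whereas an absolute count threshold would eventually admit every phrase.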

When voice data is acquired before communication is established, acquiring it selectively according to the fixed phrases, as described above, allows the terminal to take the call from the user of the other communication terminal 20 with acquisition of the voice data already complete.

The determination unit 13 determines whether or not to acquire the voice data corresponding to a fixed phrase prior to communication based on the number of communications indicated in the communication history. Specifically, the determination unit 13 makes the determination as follows.

When a communication (call) takes place, the communication terminal 10 stores, as the communication history, the time at which the communication took place, its duration, whether it was outgoing or incoming, and information indicating the other communication terminal 20 of the communication partner. The determination unit 13 uses this information to make the determination. The voice data is acquired prior to communication when the communication history suggests that a new communication is likely to take place.

For example, the determination unit 13 refers to the communication history stored in the terminal 10 itself and calculates a value indicating the probability of an incoming call from the following expressions.
[Total call time within the past 24 hours] ÷ 24
MIN([Number of incoming calls in the past week] ÷ 30, 1)
MIN([Number of incoming calls in the past week within one hour before or after the current time of day] ÷ 7, 1)
MIN([Number of incoming calls in the past month on the current day of the week] ÷ 4, 1)
Here, MIN(X, Y) is a function that returns the smaller of X and Y. The determination unit 13 compares the calculated value with a preset threshold, which is the criterion for the above determination. If the calculated value is equal to or greater than the threshold, the determination unit 13 determines that the voice data is to be acquired.
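The four expressions can be evaluated as in the following Python sketch. The text lists the four terms but does not state how they are combined into a single value, so the averaging in `incoming_call_score` is purely an assumption made for illustration, as are the function and parameter names and the assumption that the call time is measured in hours.

```python
from typing import Tuple


def history_terms(
    talk_hours_24h: float,
    calls_past_week: int,
    calls_week_near_now: int,
    calls_month_same_weekday: int,
) -> Tuple[float, float, float, float]:
    """Evaluate the four listed expressions; each result lies in [0, 1]
    provided talk_hours_24h is at most 24."""
    return (
        talk_hours_24h / 24,                    # total call time in past 24 h / 24
        min(calls_past_week / 30, 1),           # MIN(weekly incoming calls / 30, 1)
        min(calls_week_near_now / 7, 1),        # MIN(calls within +/- 1 h of now / 7, 1)
        min(calls_month_same_weekday / 4, 1),   # MIN(calls on this weekday / 4, 1)
    )


def incoming_call_score(*args) -> float:
    """ASSUMPTION: combine the terms by averaging; the source does not say how."""
    terms = history_terms(*args)
    return sum(terms) / len(terms)


def should_acquire(score: float, threshold: float) -> bool:
    """Acquire voice data when the value is equal to or greater than the threshold."""
    return score >= threshold
```

With 12 hours of call time, 15 incoming calls in the week, 7 near the current time of day, and 2 on the current weekday, the terms evaluate to 0.5, 0.5, 1.0, and 0.5.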

In addition to the above, the determination unit 13 may make the determination based on the state (status) of the terminal 10 itself or of the user, as follows. For example, the determination unit 13 makes the determination based on the position of the terminal 10. In this case, the communication terminal 10 estimates its own position and inputs position information indicating the estimated position to the determination unit 13. The position of the terminal 10 can be estimated by a conventional method, for example using GPS (Global Positioning System) or base station signals. The determination unit 13 determines whether or not the position indicated by the position information falls within a preset specific range. If it determines that the position is within the specific range, the determination unit 13 may make a decision to acquire the voice data more likely, for example by doubling the calculated value or by lowering the threshold (for example, halving it). The specific range is, for example, a location where the user is considered to be in transit, such as a station or a bus stop, or the location of a meeting room. These are locations where the user cannot speak and is likely to make a call using fixed phrases.

The determination unit 13 may also make a similar determination based on the speed or acceleration of the terminal 10. The speed or acceleration of the terminal 10 can be obtained from the change over time of the position indicated by the position information, or from a sensor that detects them. If the determination unit 13 determines that the speed or acceleration is equal to or greater than a preset threshold, it may make a decision to acquire the voice data more likely, as above. When the speed or acceleration is at or above a certain level, the user is considered to be in transit.
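The position-based and speed-based adjustments can be sketched together in Python. The names `is_boosted` and `should_fetch`, the `mode` strings, and the use of speed alone (rather than speed or acceleration) are assumptions for illustration; the doubling and halving factors are the examples given in the text.

```python
def is_boosted(in_specific_range: bool, speed: float, speed_threshold: float) -> bool:
    """True when the terminal is in a preset range (station, meeting room, ...)
    or is moving at or above the preset speed threshold."""
    return in_specific_range or speed >= speed_threshold


def should_fetch(
    score: float,
    threshold: float,
    boosted: bool,
    mode: str = "double_score",
) -> bool:
    """Compare the history-based value with the threshold, after making
    acquisition more likely when `boosted` is set."""
    if boosted:
        if mode == "double_score":
            score *= 2          # example from the text: double the calculated value
        elif mode == "halve_threshold":
            threshold /= 2      # example from the text: halve the threshold
        else:
            raise ValueError(f"unknown mode: {mode}")
    return score >= threshold
```

The two modes are not interchangeable in general once further adjustments are stacked, so a real implementation would probably fix one of them.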

また、上記に加えてあるいは代えて、判断部１３は、ユーザのスケジュールに基づいて判断を行うこととしてもよい。この場合、通信端末１０は、スケジューラ等でユーザのスケジュールを示す情報を記憶している。ユーザのスケジュールは、従来の方法を利用して、メールの本文の解析等から得ることとしてもよい。スケジュールを示す情報としては、スケジュールの時間帯、スケジュールの内容等が対応付けられた情報である。判断部１３は、現時点の時刻、あるいは現時点の時刻から予め設定した一定時間後の時刻が、何らかのスケジュールの時間帯に含まれていると判断した場合、上記と同様に音声データを取得すると判断しやすくすることとしてもよい。あるいは、予め設定された特定のスケジュール（例えば、会議及び移動中）の時間帯に上記の時刻が含まれていると判断した場合、音声データを取得すると判断しやすくすることとしてもよい。また、スケジュールが別の通信端末２０のユーザから参照可能である場合には、当該ユーザが何もスケジュールが入っていない時間帯に発信を行うことも考えられる。そのため、何もスケジュールが入っていない時間帯に上記の時刻が含まれていると判断した場合、音声データを取得すると判断しやすくすることとしてもよい。 Further, in addition to or instead of the above, the determination unit 13 may make the determination based on the schedule of the user. In this case, the communication terminal 10 stores information indicating a user's schedule by a scheduler or the like. The user's schedule may be obtained from analysis of the body of the mail using a conventional method. The information indicating the schedule is information in which the time zone of the schedule, the content of the schedule, and the like are associated. When the determining unit 13 determines that the current time or the time after a predetermined time set from the current time is included in the time zone of any schedule, the determining unit 13 determines that the voice data is to be acquired in the same manner as described above. It may be easier. Alternatively, when it is determined that the above-mentioned time is included in the time zone of a specific schedule set in advance (for example, during a meeting or during a move), it may be easier to determine that audio data is acquired. Further, when the schedule can be referred to by a user of another communication terminal 20, it is conceivable that the user makes a call during a time period when no schedule is entered. Therefore, when it is determined that the above time is included in a time zone where no schedule is included, it may be easier to determine that audio data is acquired.
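
The schedule check described above can be sketched as follows. The function name `time_in_schedule`, the tuple layout `(start, end, title)`, and the 10-minute lead time are assumptions made for illustration; the description only requires that the current time, or a time a preset interval later, falls within a schedule entry.

```python
from datetime import datetime, timedelta

def time_in_schedule(entries, now, lead=timedelta(minutes=10), titles=None):
    """Return True if `now` or `now + lead` falls inside a schedule entry.

    entries: iterable of (start, end, title) tuples.
    titles:  optional set restricting the check to specific kinds of
             schedule (for example meetings or travel).
    """
    for start, end, title in entries:
        if titles is not None and title not in titles:
            continue
        if any(start <= t < end for t in (now, now + lead)):
            return True
    return False
```

Passing `titles={"meeting", "travel"}` corresponds to the variant that reacts only to preset specific schedules.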

これらの判断により、定型文による通信が行われる可能性が高いにもかかわらず、事前に音声データが取得されていないという可能性を下げることができる。 With these determinations, it is possible to reduce the possibility that voice data has not been acquired in advance, although there is a high possibility that communication using fixed phrases will be performed.

また、通信端末１０の移動体通信が速度制限中である場合にも、音声データを取得すると判断しやすくすることとしてもよい。なぜなら、速度制限中の事前取得は、制限中でない場合の事前取得よりも重要だからである。速度制限中は取得の為にかかる時間がより多く必要になり、事前取得していないと、返答が大きく更に遅れてしまう。なお、速度制限中であるか否かを示す情報については、例えば、移動体通信網の通信事業者によって設けられたサーバから取得することができる。また、別の通信端末２０に発信をしたが、相手（別の通信端末２０のユーザ）が出なかった等の、通信端末１０に電話がかかってくる可能性が高い時も上記と同様にしてもよい。また、上記の条件が重なる場合、例えば、電話をかけたが相手が出なかった際に自身が移動中になった場合には、個別に計算する以上に音声データを取得すると判断しやすくすることとしてもよい。例えば、算出した値を２倍×２倍以上の６倍にしてもよい。また、音声データを取得すると判断しやすくすると判断した上記の場合に、音声データを取得すると判断しやすくするのではなく、音声データを必ず取得すると判断することとしてもよい。 Acquisition of the voice data may also be made more likely when the mobile communication of the communication terminal 10 is subject to a speed limit. This is because acquisition in advance matters more while the speed is limited than while it is not: acquisition takes longer under the speed limit, so without acquisition in advance the response is delayed even further. Information indicating whether the speed limit is in effect can be acquired, for example, from a server provided by the carrier of the mobile communication network. The same may apply when a call to the communication terminal 10 is likely, for example when the user called another communication terminal 20 but the other party (the user of the other communication terminal 20) did not answer. When several of the above conditions overlap, for example when the user placed a call that was not answered and then started moving, acquisition of the voice data may be made even more likely than applying each condition separately would make it. For example, the calculated value may be multiplied by 6, which is more than the 2 × 2 = 4 obtained by applying the two doublings separately. Further, in the above cases in which acquisition of the voice data would be made more likely, it may instead be determined that the voice data is always acquired.
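
The arithmetic for overlapping conditions can be illustrated with a short sketch. The function name `boosted_score` and its parameters are hypothetical; the factors 2 and 6 are the example values given in the description, where 6 is chosen to exceed the 2 × 2 = 4 that two independent doublings would give.

```python
def boosted_score(score, active_conditions, single_boost=2.0, combined_boost=6.0):
    """Scale the calculated value according to how many conditions hold.

    active_conditions: names of the conditions currently satisfied, for
    example "throttled", "unanswered_call" or "moving" (hypothetical labels).
    """
    count = len(active_conditions)
    if count == 0:
        return score
    if count == 1:
        return score * single_boost
    # Overlapping conditions are boosted beyond single_boost ** 2.
    return score * combined_boost
```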

また、判断部１３は、自端末１０の電池残量に基づいて判断を行うこととしてもよい。通信端末１０は、通常、充電可能な電池により動作する。判断部１３は、自端末１０の電池残量を参照して、電池残量が予め設定された閾値以下であるか否かを判断する。判断部１３は、電池残量が閾値以下であると判断した場合、例えば、閾値を上げる等で音声データを取得すると判断しにくくすることとしてもよい。この場合、事前の音声データを取得して電池を消費して通話自体ができなくなる危険性を下げることができる。 Further, the determination unit 13 may make the determination based on the remaining battery power of the terminal 10. The communication terminal 10 normally operates with a rechargeable battery. The determining unit 13 refers to the remaining battery level of the terminal 10 and determines whether the remaining battery level is equal to or less than a preset threshold. When the determination unit 13 determines that the remaining battery level is equal to or less than the threshold, the determination unit 13 may make it difficult to determine that audio data is acquired, for example, by increasing the threshold. In this case, it is possible to reduce the risk that a call itself cannot be made by consuming the battery by acquiring the audio data in advance.

判断部１３は、上記の判断を、例えば、入力部１２から定型文が入力されたタイミングで行う。あるいは、一定時間毎に上記の判断を行うこととしてもよい。着信時の判断については、通信端末１０に着信があった場合に行う。また、判断部１３は、定型文に応じた判断を行う場合には、定型文毎に判断を行う。なお、上記では、複数の判断方法を示したが、それらを全て行う必要はなく、本発明としては何れかの判断が行われていればよい。また、複数の条件を組み合わせて判断してもよい。例えば、何れかの条件を満たせば音声データを取得すると判断してもよいし、複数の条件を全て満たした場合に音声データを取得すると判断してもよい。 The determination unit 13 performs the above determination at the timing when, for example, a fixed phrase is input from the input unit 12. Alternatively, the above determination may be made at regular intervals. The determination at the time of the incoming call is made when the communication terminal 10 receives the incoming call. Further, when making a determination according to a fixed phrase, the determination unit 13 performs the determination for each fixed phrase. Although a plurality of determination methods have been described above, it is not necessary to perform all of them, and any determination may be made as the present invention. Also, the determination may be made by combining a plurality of conditions. For example, it may be determined that audio data is acquired if any of the conditions are satisfied, or it may be determined that audio data is acquired if all of a plurality of conditions are satisfied.
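
Combining several conditions, as described above, can be sketched as follows. The function name `decide` and the `mode` values are assumptions for illustration: "any" corresponds to acquiring when any one condition holds, and "all" to acquiring only when every condition holds.

```python
def decide(conditions, mode="any"):
    """Combine individual condition results into one acquisition decision.

    conditions: mapping from a condition name to its boolean result.
    """
    results = list(conditions.values())
    if not results:
        return False  # nothing was evaluated, so do not acquire
    if mode == "any":
        return any(results)
    if mode == "all":
        return all(results)
    raise ValueError(f"unknown mode: {mode!r}")
```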

判断部１３は、入力された定型文に対応する音声データを通信に先立って取得すると判断した場合、当該定型文を取得部１４に出力する。判断部１３は、着信時の判断である場合には、あわせてその旨も取得部１４に出力する。 When determining that the voice data corresponding to the input fixed phrase is to be acquired prior to the communication, the determination unit 13 outputs the fixed phrase to the acquisition unit 14. If the determination is for an incoming call, the determination unit 13 also outputs the fact to the acquisition unit 14.

取得部１４は、判断部１３による判断に基づいて、定型文に対応する音声データを通信端末１０とは別の装置である音声合成装置３０から取得する取得手段である。 The obtaining unit 14 is an obtaining unit that obtains voice data corresponding to a fixed phrase from a voice synthesizing device 30 that is different from the communication terminal 10 based on the determination by the determining unit 13.

音声合成装置３０は、テキストを通信端末１０から受信して、音声合成を行って受信したテキストに対応する音声データを生成する（テキストの読み上げを行う）装置である。音声合成装置３０は、生成した音声データを通信端末１０に送信する。音声合成自体は、従来の方法によって行われる。音声合成装置３０では、処理能力及びアプリケーション等の相違から、通信端末１０で作成されるよりも自然な合成音声を作成することができる。通信端末１０と音声合成装置３０とは、通信網Ｎを介して互いに情報の送受信を行うことができる。 The voice synthesizer 30 is a device that receives a text from the communication terminal 10 and performs voice synthesis to generate voice data corresponding to the received text (to read out the text). The voice synthesizer 30 transmits the generated voice data to the communication terminal 10. Speech synthesis itself is performed by a conventional method. The speech synthesizer 30 can create a synthesized speech that is more natural than that created by the communication terminal 10 due to differences in processing capability, applications, and the like. The communication terminal 10 and the speech synthesizer 30 can transmit and receive information to and from each other via the communication network N.

取得部１４は、定型文に対応する音声データを音声合成装置３０からダウンロードする。具体的には、取得部１４は、判断部１３から定型文を入力すると、当該定型文を音声合成装置３０に送信する。取得部１４は、当該送信に応じて音声合成装置３０から送信された定型文に対応する音声データを受信して取得する。取得部１４は、定型文及び取得した音声データを対応付けて変換部１１に出力し、変換部１１に記憶させる。 The acquisition unit 14 downloads voice data corresponding to the fixed phrase from the voice synthesizer 30. Specifically, upon inputting a fixed phrase from the determining unit 13, the acquiring unit 14 transmits the fixed phrase to the speech synthesizer 30. The acquiring unit 14 receives and acquires the voice data corresponding to the fixed phrase sent from the voice synthesizer 30 in response to the transmission. The acquisition unit 14 outputs the fixed phrase and the acquired voice data to the conversion unit 11 in association with each other, and causes the conversion unit 11 to store the data.

判断部１３による判断が着信時のものである場合、取得部１４は、音声データを取得するまで当該着信に係る通信の確立を禁止する制御を行う。即ち、取得部１４は、この場合、音声合成装置３０から音声データを受信するまで、ユーザから応答の操作があった場合でも、自端末１０において通信接続を確立する処理を禁止する制御を行う。これにより、音声データを取得した状態で通話を開始することができる。但し、取得部１４は、上記の禁止の制御をせずに（通信接続の確立とは独立に）音声データを取得することとしてもよい。この場合、通信接続が確立された後に音声データが取得されてもよい。 If the determination by the determining unit 13 is for an incoming call, the obtaining unit 14 performs control to prohibit establishment of communication related to the incoming call until voice data is obtained. That is, in this case, the acquisition unit 14 performs control to prohibit the process of establishing a communication connection in the own terminal 10 even if a response operation is performed by the user until the voice data is received from the voice synthesizer 30. As a result, a call can be started in a state where the voice data has been acquired. However, the acquisition unit 14 may acquire the audio data without performing the above-described prohibition control (independently of the establishment of the communication connection). In this case, audio data may be acquired after a communication connection is established.
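
The prohibition control for incoming calls can be sketched as a small state holder. The class `IncomingCallGate` and its method names are hypothetical and model only the decision logic, not an actual telephony API.

```python
class IncomingCallGate:
    """Tracks whether an incoming call may be connected yet."""

    def __init__(self, block_until_audio=True):
        # block_until_audio=False models the variant in which acquisition
        # proceeds independently of connection establishment.
        self.block_until_audio = block_until_audio
        self.audio_ready = False
        self.connected = False

    def on_audio_received(self):
        """Called when the voice data has arrived from the synthesizer."""
        self.audio_ready = True

    def on_user_answer(self):
        """Handle the user's answer operation; return True if connected."""
        if self.block_until_audio and not self.audio_ready:
            return False  # establishment is prohibited for now
        self.connected = True
        return True
```

With the default setting, an answer operation made before the voice data arrives is refused, and the same operation succeeds once `on_audio_received` has been called.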

この場合、上述したように定型文に応じたタイミングで音声データを取得することとしてもよい。例えば、「お世話になっております。」のような通話の最初に使われるテキストについては、通信接続の確立前に音声データを取得しておくこととするのがよい。一方で、「よろしくお願い致します。」のような通話の最初に使われないテキストであれば、通信接続の確立後に（通信接続の確立とは独立に）音声データを取得してもよい。上記の制御は、上述したように定型文毎の設定により行われ得る。以上が、本実施形態に係る通信端末１０の機能である。 In this case, as described above, the voice data may be acquired at a timing that depends on the fixed phrase. For example, for text used at the beginning of a call, such as 「お世話になっております。」 (a customary opening greeting, roughly "Thank you for your continued support."), the voice data should preferably be acquired before the communication connection is established. On the other hand, for text that is not used at the beginning of a call, such as 「よろしくお願い致します。」 (a customary closing, roughly "Thank you in advance."), the voice data may be acquired after the communication connection is established (independently of its establishment). This control can be performed by the per-phrase settings described above. The above are the functions of the communication terminal 10 according to the present embodiment.

図２に本実施形態に係る通信端末１０のハードウェア構成を示す。図２に示すように、通信端末１０は、１つ以上のＣＰＵ（Central Processing Unit）１０１、主記憶装置であるＲＡＭ（Random Access Memory）１０２及びＲＯＭ（Read Only Memory）１０３、操作モジュール１０４、無線通信モジュール１０５、アンテナ１０６、マイク１０７、スピーカ１０８並びにディスプレイ１０９等のハードウェアにより構成されている。これらの構成要素がプログラム等により動作することにより、上述した通信端末１０の各機能が発揮される。以上が、通信端末１０の構成である。 FIG. 2 shows the hardware configuration of the communication terminal 10 according to the present embodiment. As shown in FIG. 2, the communication terminal 10 is composed of hardware such as one or more CPUs (Central Processing Units) 101, a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103 serving as main storage devices, an operation module 104, a wireless communication module 105, an antenna 106, a microphone 107, a speaker 108, and a display 109. The functions of the communication terminal 10 described above are realized when these components operate under a program or the like. The above is the configuration of the communication terminal 10.

引き続いて、図３及び図４のフローチャートを用いて、通信端末１０で実行される処理を説明する。まず、図３のフローチャートを用いて、音声データが取得されて保存される際の処理を説明する。 Next, the processing executed by the communication terminal 10 will be described with reference to the flowcharts of FIGS. 3 and 4. First, the processing by which voice data is acquired and stored will be described with reference to the flowchart of FIG. 3.

本処理では、入力部１２によって、通信に先立って、音声データへの変換の対象となるテキストである定型文が入力される（Ｓ０１）。入力された定型文は、入力部１２から判断部１３に出力される。続いて、判断部１３によって、自端末１０の状態、入力部１２によって入力された定型文自体、及び自端末１０における通信の履歴の少なくとも何れかに基づいて、当該入力された定型文に対応する音声データを通信に先立って取得するか否か、即ち、音声データの事前ダウンロードを行うか否かが判断される（Ｓ０２）。なお、この判断は、上述したように例えば、定型文が入力されたタイミング、一定時間毎、及び通信端末１０に着信があった場合に行われる。 In this processing, the input unit 12 inputs, prior to communication, a fixed phrase, which is text to be converted into voice data (S01). The input fixed phrase is output from the input unit 12 to the determination unit 13. Subsequently, the determination unit 13 determines, based on at least one of the state of the own terminal 10, the fixed phrase itself input by the input unit 12, and the communication history of the own terminal 10, whether to acquire the voice data corresponding to the input fixed phrase prior to communication, that is, whether to pre-download the voice data (S02). As described above, this determination is made, for example, when a fixed phrase is input, at regular intervals, and when the communication terminal 10 receives an incoming call.

音声データの事前ダウンロードを行わないと判断された場合（Ｓ０３のＮＯ）は、本処理は終了する。なお、一定時間毎の判断を行う場合は、上記の判断から一定時間経過後、再度Ｓ０２から処理を再開する。音声データの事前ダウンロードを行うと判断された場合（Ｓ０３のＹＥＳ）は、定型文が、判断部１３から取得部１４に出力される。続いて、取得部１４によって、定型文に対応する音声データが音声合成装置３０から事前ダウンロードされて取得される（Ｓ０４）。なお、判断部１３による判断が、通信端末１０に着信があった場合のものであれば、上述したようにこの際に、音声データが取得されるまで、取得部１４によって当該着信に係る通信の確立を禁止する制御が行われることとしてもよい。定型文及び取得された音声データは、対応付けられて、取得部１４から変換部１１に出力されて、変換部１１によって保存される（Ｓ０５）。以上が、音声データが取得されて保存される際の処理である。 If it is determined that the voice data is not to be pre-downloaded (NO in S03), this processing ends. When the determination is made at regular intervals, the processing resumes from S02 once the fixed time has elapsed since the above determination. If it is determined that the voice data is to be pre-downloaded (YES in S03), the fixed phrase is output from the determination unit 13 to the acquisition unit 14. Subsequently, the acquisition unit 14 pre-downloads and thereby acquires the voice data corresponding to the fixed phrase from the voice synthesizer 30 (S04). If the determination by the determination unit 13 was made upon an incoming call to the communication terminal 10, the acquisition unit 14 may at this point, as described above, perform control that prohibits establishment of the communication for that incoming call until the voice data has been acquired. The fixed phrase and the acquired voice data are associated with each other, output from the acquisition unit 14 to the conversion unit 11, and stored by the conversion unit 11 (S05). The above is the processing by which voice data is acquired and stored.
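
Steps S01 to S05 of FIG. 3 can be sketched as a single function. The names `prefetch_flow`, `judge`, `synthesize` and `store` are hypothetical: `judge` stands in for the determination unit 13, `synthesize` for the round trip to the voice synthesizer 30, and `store` for the phrase-to-audio storage held by the conversion unit 11.

```python
def prefetch_flow(phrase, judge, synthesize, store):
    """Run the pre-download flow for one fixed phrase.

    phrase:     the fixed phrase that was input (S01)
    judge:      callable returning True if voice data should be pre-downloaded
    synthesize: callable returning voice data (bytes) for a text
    store:      dict mapping fixed phrases to voice data
    Returns True if voice data was acquired and stored.
    """
    if not judge(phrase):        # S02 and S03: NO ends the processing
        return False
    audio = synthesize(phrase)   # S04: pre-download from the synthesizer
    store[phrase] = audio        # S05: store phrase and audio together
    return True
```

For the periodic variant, a caller would invoke this function again after the fixed interval whenever it returned False.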

引き続いて、図４のフローチャートを用いて、保存された音声データが用いられて通信が行われる際の処理を説明する。本処理は、通信端末１０と別の通信端末２０との間で通信接続が確立されており、通信端末１０から別の通信端末２０に音声データが送信される際のものである。 Subsequently, processing when communication is performed using the stored audio data will be described with reference to the flowchart in FIG. This processing is performed when a communication connection is established between the communication terminal 10 and another communication terminal 20 and voice data is transmitted from the communication terminal 10 to another communication terminal 20.

通信端末１０では、通信において通信相手の通信端末２０への送信に係る入力が行われる（Ｓ１１）。当該入力が、ユーザ自身の発話によるもの、即ち、音声の入力である場合（Ｓ１２のＹＥＳ）には、当該音声による音声データが、通信端末１０から通信相手の通信端末２０に送信される（Ｓ１３）。当該入力が、音声の入力でない場合（Ｓ１２のＮＯ）、当該入力はテキストに係る情報の入力であり、当該情報は変換部１１に入力される。 In the communication terminal 10, an input for transmission to the communication terminal 20 of the communication partner is made during communication (S11). If the input is the user's own utterance, that is, a voice input (YES in S12), the voice data of that utterance is transmitted from the communication terminal 10 to the communication terminal 20 of the communication partner (S13). If the input is not a voice input (NO in S12), the input is information relating to text, and that information is input to the conversion unit 11.

当該情報が、変換部１１に音声データに対応付けられて記憶されている定型文を指定するものであった場合（Ｓ１４のＹＥＳ）、変換部１１によって、指定された定型文に対応付けられて記憶している音声データが読み出されて、取得される（Ｓ１５）。続いて、取得された音声データが、変換部１１から通信相手の通信端末２０に送信される（Ｓ１３）。 If the information designates a fixed phrase stored in the conversion unit 11 in association with voice data (YES in S14), the conversion unit 11 reads out and thereby acquires the voice data stored in association with the designated fixed phrase (S15). Subsequently, the acquired voice data is transmitted from the conversion unit 11 to the communication terminal 20 of the communication partner (S13).

当該情報が、変換部１１に音声データに対応付けられて記憶されている定型文を指定するものでなかった場合（Ｓ１４のＮＯ）、当該情報は、変換部１１に音声データに対応付けられて記憶されている定型文以外のテキストである。その場合、変換部１１によって、入力されたテキストに対応する音声データが、音声合成装置３０からダウンロードされて取得される（Ｓ１６）。続いて、取得された音声データが、変換部１１から通信相手の通信端末２０に送信される（Ｓ１３）。以上が、保存された音声データが用いられて通信が行われる際の処理である。 If the information does not designate a fixed phrase stored in the conversion unit 11 in association with voice data (NO in S14), the information is text other than the fixed phrases so stored. In that case, the conversion unit 11 downloads and thereby acquires the voice data corresponding to the input text from the voice synthesizer 30 (S16). Subsequently, the acquired voice data is transmitted from the conversion unit 11 to the communication terminal 20 of the communication partner (S13). The above is the processing by which communication is performed using the stored voice data.
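
The branching of FIG. 4 (S11 to S16) can be sketched as follows. The function name `audio_to_send`, the `(kind, payload)` input format, and the `store` and `synthesize` parameters are hypothetical stand-ins for the user input, the storage of the conversion unit 11, and the voice synthesizer 30.

```python
def audio_to_send(user_input, store, synthesize):
    """Return the voice data to transmit to the partner terminal (S13).

    user_input: ("voice", audio_bytes) or ("text", text)   (S11)
    store:      dict mapping fixed phrases to stored voice data
    synthesize: callable returning voice data for arbitrary text
    """
    kind, payload = user_input
    if kind == "voice":            # S12 YES: the user's own utterance
        return payload
    if payload in store:           # S14 YES: a stored fixed phrase
        return store[payload]      # S15: read out the stored voice data
    return synthesize(payload)     # S16: download at the time of use
```

Only the last branch requires a network round trip during the call, which is the delay that pre-downloading is meant to avoid.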

上述したように本実施形態では、自端末１０の状態、テキスト自体、及び自端末１０における通信の履歴の少なくとも何れかに基づいて、当該テキストに対応する音声データを通信に先立って取得するか否かが判断される。従って、本実施形態によれば、例えば、通信に音声データを利用する可能性が高いタイミングで取得を行うことができる。あるいは、通信に利用する可能性が高いテキストについて音声データを取得することができる。このように、本実施形態によれば、通信に利用される音声データの取得を適切に行うことができる。 As described above, in the present embodiment, whether to acquire the voice data corresponding to a text prior to communication is determined based on at least one of the state of the own terminal 10, the text itself, and the communication history of the own terminal 10. Therefore, according to the present embodiment, the acquisition can be performed, for example, at a timing when the voice data is likely to be used for communication. Alternatively, voice data can be acquired for text that is likely to be used for communication. In this way, according to the present embodiment, voice data used for communication can be acquired appropriately.

更に具体的には、着信時に音声データを取得することとしてもよい。この構成によれば、自端末１０に着信があった場合に音声データの取得を適切に行うことができる。また、この際に通信の確立を禁止する制御を行うことで、音声データを取得した状態で通話を開始することができる。 More specifically, voice data may be acquired at the time of an incoming call. According to this configuration, when there is an incoming call to the own terminal 10, it is possible to appropriately acquire the voice data. At this time, by performing control to prohibit the establishment of communication, it is possible to start a call in a state where voice data is acquired.

また、本実施形態のように、通信の履歴に示される通信の回数に基づいて判断を行うこととしてもよい。この構成によれば、自端末の通信の回数に応じて、通信に音声データを利用する可能性が高いタイミングで適切に取得を行うことができる。 Further, as in the present embodiment, the determination may be made based on the number of times of communication indicated in the communication history. According to this configuration, acquisition can be appropriately performed at a timing when there is a high possibility that voice data will be used for communication in accordance with the number of times of communication of the own terminal.

また、本実施形態のようにテキストの指定を受け付けて、音声データへの変換を行って、当該音声データを通信相手の通信端末２０に送信することとしてもよい。即ち、通信相手の通信端末２０に送信する音声データが、指定されたテキストから変換されたものであってもよい。この構成によれば、自端末１０において指定されたテキストに対応する音声データを通信相手の端末に送信することができ、確実にテキストを用いた通話を行うことができる。 Also, as in the present embodiment, the designation of a text may be received, converted into voice data, and the voice data may be transmitted to the communication terminal 20 of the communication partner. That is, the voice data to be transmitted to the communication terminal 20 of the communication partner may be converted from the specified text. According to this configuration, the voice data corresponding to the text specified in the own terminal 10 can be transmitted to the communication partner terminal, and the telephone call using the text can be reliably performed.

なお、本実施形態では、通信端末１０は、無線によって通信を行う装置としたが、本発明に係る通信端末は、有線によって通信を行う装置であってもよい。 In the present embodiment, the communication terminal 10 is a device that performs wireless communication, but the communication terminal according to the present invention may be a device that performs wired communication.

＜第２実施形態＞ <Second embodiment>
引き続いて、第２実施形態を説明する。第１実施形態では、送信側の通信端末１０においてテキストから音声データへの変換が行われていた。本実施形態では、受信側の通信端末においてテキストから音声データへの変換が行われる。一方で、送信側の通信端末では、通話に係るテキストが送信される。 Next, a second embodiment will be described. In the first embodiment, the transmitting-side communication terminal 10 converts text into voice data. In the present embodiment, the receiving-side communication terminal converts text into voice data, while the transmitting-side communication terminal transmits the text for the call.

図５に第２実施形態に係る通信端末４０，５０を示す。通信端末４０が、通話時にテキストを受信する受信側の通信端末であり、通信端末５０が、通話時にテキストを送信する送信側の通信端末である。即ち、通信端末４０は、テキストを音声データに変換して、変換された音声データを別の通信端末との間での通信に利用する通信端末である。また、通信端末５０は、当該通信端末４０と通信を行う通信端末である。なお、特に説明を行わない点については、通信端末４０，５０は、第１実施形態の通信端末１０と同様の構成をとることとしてもよい。また、一つの通信端末が、通信端末４０，５０のそれぞれの機能を有していてもよい。 FIG. 5 shows communication terminals 40 and 50 according to the second embodiment. The communication terminal 40 is a receiving communication terminal that receives a text during a call, and the communication terminal 50 is a transmitting communication terminal that transmits a text during a call. That is, the communication terminal 40 is a communication terminal that converts text into voice data and uses the converted voice data for communication with another communication terminal. The communication terminal 50 is a communication terminal that communicates with the communication terminal 40. Note that the communication terminals 40 and 50 may have the same configuration as the communication terminal 10 of the first embodiment, unless specifically described. Further, one communication terminal may have the respective functions of the communication terminals 40 and 50.

引き続いて、本実施形態に係る通信端末４０，５０の機能を説明する。図５に示すように通信端末４０は、変換部４１と、入力部４２と、判断部４３と、取得部４４とを備えて構成される。 Subsequently, functions of the communication terminals 40 and 50 according to the present embodiment will be described. As shown in FIG. 5, the communication terminal 40 includes a conversion unit 41, an input unit 42, a determination unit 43, and an acquisition unit 44.

変換部４１は、通信を行う際（通信を行っている際）に通信相手の通信端末５０からテキストを示す情報を受信して、当該受信した情報によって示されるテキストを対応する音声データに変換して、変換した音声データに基づく音声出力を行う変換手段である。変換部４１は、通信の前に予め定型文であるテキストと音声データとを対応付けて記憶（保存）している。テキストに対応する音声データは、当該テキストが読み上げられた音声に係るものである。 The conversion unit 41 is conversion means that, during communication, receives information indicating text from the communication terminal 50 of the communication partner, converts the text indicated by the received information into the corresponding voice data, and outputs audio based on the converted voice data. The conversion unit 41 stores (saves) fixed-phrase text and voice data in association with each other in advance, before communication. The voice data corresponding to a text is the audio of that text being read aloud.

変換部４１は、通信時に確立されている通信接続を介して通信相手の通信端末５０から、定型文を示す情報を受信する。定型文を示す情報は、定型文（テキスト）自体であってもよいし、定型文の識別子等の定型文を特定する情報であってもよい。変換部４１は、受信した情報によって示される定型文に対応付けられて記憶している音声データを読み出して、自端末４０が備えるスピーカ等によって音声出力を行う。あるいは、音声出力は、自端末４０に接続されたイヤフォンによって行われてもよい。変換部４１が記憶している音声データは、後述するように取得部４４によって取得されるものである。 The conversion unit 41 receives information indicating a fixed phrase from the communication terminal 50 of the communication partner via the communication connection established for the communication. The information indicating the fixed phrase may be the fixed phrase (text) itself, or information that identifies the fixed phrase, such as an identifier of the fixed phrase. The conversion unit 41 reads out the voice data stored in association with the fixed phrase indicated by the received information and outputs the audio through a speaker or the like provided in the own terminal 40. Alternatively, the audio may be output through an earphone connected to the own terminal 40. The voice data stored in the conversion unit 41 is acquired by the acquisition unit 44, as described later.

また、変換部４１は、予め記憶された定型文以外のテキストを、通信相手の通信端末５０から受信してもよい。その場合、変換部４１は、その時点で入力したテキストに対応する音声データを取得して、音声出力を行う。変換部４１は、後述する取得部４４による方法と同様に、音声合成装置３０からテキストに対応する音声データを取得する。あるいは、変換部４１は、取得部４４による方法とは異なる方法、例えば、自端末４０における音声合成アプリケーション等で音声データを取得してもよい。このように取得された音声データも、上記と同様にそれ以降の通信で利用できるようテキストと対応付けて記憶しておいてもよい。 Further, the conversion unit 41 may receive a text other than the fixed phrase stored in advance from the communication terminal 50 of the communication partner. In that case, the conversion unit 41 obtains audio data corresponding to the text input at that time, and performs audio output. The conversion unit 41 acquires speech data corresponding to a text from the speech synthesis device 30 in the same manner as the method by the acquisition unit 44 described below. Alternatively, the conversion unit 41 may acquire the voice data by a method different from the method by the acquisition unit 44, for example, by a voice synthesis application or the like in the own terminal 40. The voice data obtained in this manner may be stored in association with the text so that it can be used in the subsequent communication as described above.

入力部４２は、通信に先立って、音声データへの変換の対象となるテキストである定型文を入力する入力手段である。入力部４２は、自端末４０とは別の通信端末５０から定型文を受信する。定型文の送受信は、通話のための音声通信の通信接続（発着信により確立される通信接続）とは異なる方法、例えば、データ通信によるメッセージの送受信によって行われる。入力部４２は、入力した定型文を判断部４３に出力する。また、入力部４２は、定型文の送信元の通信端末５０を示す情報を判断部４３に出力する。送信元の通信端末５０を示す情報は、例えば、通信端末５０の電話番号又は予め通信端末５０に設定されたＩＤである。情報を送受信する通信端末は、従来と同様の方法で特定することができる。 The input unit 42 is an input unit for inputting a fixed phrase, which is a text to be converted into voice data, prior to communication. The input unit 42 receives a fixed phrase from a communication terminal 50 different from the own terminal 40. The transmission and reception of fixed phrases is performed by a method different from the communication connection of voice communication for communication (communication connection established by outgoing / incoming calls), for example, by transmitting and receiving a message by data communication. The input unit 42 outputs the input fixed phrase to the determination unit 43. Further, the input unit 42 outputs information indicating the communication terminal 50 that is the transmission source of the fixed phrase to the determination unit 43. The information indicating the communication terminal 50 of the transmission source is, for example, a telephone number of the communication terminal 50 or an ID preset in the communication terminal 50. The communication terminal that transmits and receives information can be specified by a method similar to the conventional method.

判断部４３は、自端末４０の状態、入力部４２によって入力された定型文自体、及び自端末４０における通信の履歴の少なくとも何れかに基づいて、当該入力された定型文に対応する音声データを通信に先立って取得するか否かを判断する判断手段である。判断部４３は、第１実施形態の判断部１３と同様に判断を行う。 The determination unit 43 is determination means that determines, based on at least one of the state of the own terminal 40, the fixed phrase itself input by the input unit 42, and the communication history of the own terminal 40, whether to acquire the voice data corresponding to the input fixed phrase prior to communication. The determination unit 43 makes the determination in the same manner as the determination unit 13 of the first embodiment.

また、本実施形態では、上記の判断に代えて、あるいは加えて、判断部４３は、自端末４０に記憶されている、定型文の送信元の通信端末５０に係る情報に基づいて、定型文に対応する音声データを通信に先立って取得するか否かを判断することとしてもよい。自端末４０に記憶されている通信端末５０に係る情報は、例えば、電話帳（アドレス帳）、通信の履歴及び電子メールの送受信履歴である。 In the present embodiment, instead of or in addition to the above determination, the determination unit 43 may determine whether to acquire the voice data corresponding to the fixed phrase prior to communication based on information, stored in the own terminal 40, relating to the communication terminal 50 that sent the fixed phrase. Such information is, for example, a telephone directory (address book), a communication history, and an e-mail transmission and reception history.

判断部４３は、入力部４２から入力された定型文の送信元の通信端末５０を示す情報が、自端末４０に記憶されているものか否かを判断して、当該判断に基づき音声データを通信に先立って取得するか否かを判断する。例えば、定型文の送信元の通信端末５０が電話帳に登録されている、当該通信端末５０との通信履歴が存在する、当該通信端末５０との間で電子メールの送信又は受信が行われている場合、判断部４３は、音声データを通信に先立って取得すると判断する。また、第１実施形態における通信の履歴（通信の回数）を用いた判断において、定型文の送信元の通信端末５０に係る履歴のみを用いて判断することとしてもよい。 The determination unit 43 determines whether the information indicating the communication terminal 50 that sent the fixed phrase, input from the input unit 42, is stored in the own terminal 40, and based on that result determines whether to acquire the voice data prior to communication. For example, when the communication terminal 50 that sent the fixed phrase is registered in the telephone directory, when a communication history with that communication terminal 50 exists, or when e-mail has been sent to or received from that communication terminal 50, the determination unit 43 determines that the voice data is to be acquired prior to communication. Further, the determination using the communication history (number of communications) of the first embodiment may be made using only the history relating to the communication terminal 50 that sent the fixed phrase.
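
The sender-based check can be sketched as a membership test. The function name `sender_is_known` and the use of sets keyed by a telephone number or ID are assumptions for illustration; the three sources are the ones named in the description.

```python
def sender_is_known(sender_id, phone_book, call_history, mail_history):
    """True if the sender appears in any record kept on the own terminal.

    sender_id:    telephone number or preset ID of the sending terminal
    phone_book:   set of registered IDs (telephone directory)
    call_history: set of IDs with which calls were made or received
    mail_history: set of IDs with which e-mail was exchanged
    """
    return (sender_id in phone_book
            or sender_id in call_history
            or sender_id in mail_history)
```

A fixed phrase notified by an unknown sender would then be skipped, so that no voice data is downloaded for a call that is unlikely to happen.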

また、上記以外にも、定型文の送信元の通信端末５０に係る情報を用いた判断を行うこととしてもよい。例えば、自端末４０において記憶されている電子メールの本文に、当該通信端末５０のユーザを示す記載がある場合に音声データを通信に先立って取得すると判断する。例えば、電子メール中に「通信端末５０保有者に聞いてみる」という記載がある場合である。また、公開されている通信端末４０のユーザのスケジュールを、定型文の送信元の通信端末５０のユーザが参照した場合に音声データを通信に先立って取得すると判断する。この場合、例えば、スケジュールを参照したユーザが通信端末４０に通知される。 Further, in addition to the above, the determination using the information related to the communication terminal 50 of the transmission source of the fixed phrase may be performed. For example, when the body of the e-mail stored in the own terminal 40 includes a description indicating the user of the communication terminal 50, it is determined that the voice data is to be obtained prior to the communication. For example, there is a case in which an e-mail includes a description "Ask the owner of the communication terminal 50". In addition, when the user of the communication terminal 50 that has transmitted the fixed phrase refers to the published schedule of the user of the communication terminal 40, it is determined that the voice data is to be acquired prior to the communication. In this case, for example, the user who referred to the schedule is notified to the communication terminal 40.

また、通信端末４０のユーザが宅配業者のドライバーであって、通信端末４０に荷物の届け先として、定型文の送信元の通信端末５０のユーザの電話番号が登録されている場合に音声データを通信に先立って取得すると判断する。 Also, when the user of the communication terminal 40 is a driver for a delivery company and the telephone number of the user of the communication terminal 50 that sent the fixed phrase is registered in the communication terminal 40 as a delivery destination, it is determined that the voice data is to be acquired prior to communication.

また、例えば、電話帳に格納されている、定型文の送信元の通信端末５０のユーザの属性（例えば、肩書）に基づいて判断を行うこととしてもよい。例えば、通信端末５０のユーザの肩書が上司である場合に音声データを通信に先立って取得すると判断する。なお、上記の属性は、従来の技術等を用いて、例えば、自端末４０において記憶されている電子メールから判断されてもよい。また、ユーザの属性と定型文との組み合わせ毎に、予め音声データを取得すべきか否かの設定をしておき、当該設定に基づいて判断してもよい。 Further, for example, the determination may be made based on an attribute (for example, a title) of the user of the communication terminal 50 that is the transmission source of the fixed phrase stored in the telephone directory. For example, when the title of the user of the communication terminal 50 is the boss, it is determined that the voice data is acquired prior to the communication. The above attribute may be determined from, for example, an e-mail stored in the own terminal 40 using a conventional technique or the like. Alternatively, for each combination of the attribute of the user and the fixed phrase, whether or not to acquire the voice data may be set in advance, and the determination may be made based on the setting.

If the determination is made according to the communication terminal 50 that is the transmission source of the fixed phrase as described above, voice data need not be acquired in response to notification of a fixed phrase from a user with whom a call is unlikely.
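The sender-based conditions described above can be sketched as a single predicate. This is a minimal illustration only: the `SenderInfo` record, the `should_prefetch` function, and the `priority_titles` parameter are hypothetical names that do not appear in the patent, and a real determination unit 43 might combine or weight the conditions differently.

```python
from dataclasses import dataclass


@dataclass
class SenderInfo:
    """Hypothetical record of what terminal 40 knows about the user of terminal 50."""
    name: str
    phone: str
    title: str = ""                # attribute from the telephone directory
    viewed_schedule: bool = False  # sender referred to the published schedule


def should_prefetch(sender, emails, delivery_phones,
                    priority_titles=frozenset({"boss"})):
    """Return True if voice data should be acquired prior to communication."""
    # An e-mail stored on the own terminal mentions the sender's user.
    if any(sender.name in body for body in emails):
        return True
    # The sender referred to the published schedule of the own terminal's user.
    if sender.viewed_schedule:
        return True
    # The sender's telephone number is registered as a delivery destination.
    if sender.phone in delivery_phones:
        return True
    # The sender's attribute (title) is one for which prefetching is configured.
    return sender.title in priority_titles
```

A sender that matches none of the conditions yields `False`, which corresponds to skipping acquisition for users with whom a call is unlikely.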

When the determination unit 43 determines that the voice data corresponding to the input fixed phrase is to be acquired prior to communication, it outputs the fixed phrase to the acquisition unit 44.

The acquisition unit 44 is acquisition means for acquiring voice data corresponding to a fixed phrase from the speech synthesis device 30, based on the determination by the determination unit 43. The acquisition unit 44 acquires voice data from the speech synthesis device 30 in the same manner as the acquisition unit 14 of the first embodiment. The acquisition unit 44 outputs the fixed phrase and the acquired voice data to the conversion unit 41 in association with each other, and causes the conversion unit 41 to store them. The voice data acquired may depend on the communication terminal 50 that is the transmission source of the input fixed phrase. For example, voice data may be acquired according to the gender of the user of the communication terminal 50: voice data of a male voice if the user of the communication terminal 50 is male, and voice data of a female voice if the user is female. When voice data specific to the communication terminal 50 has been acquired, the conversion unit 41 likewise converts to voice data according to the communication terminal 50 of the communication partner. The above are the functions of the communication terminal 40 according to the present embodiment.
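The acquire-and-store step, including the choice of voice according to the sender, might look like the following sketch. Everything here is assumed for illustration: `PhraseVoiceStore` stands in for the storage of the conversion unit 41, `synthesize` is a caller-supplied function standing in for a request to the speech synthesis device 30, and the voice labels are placeholders.

```python
from typing import Callable, Dict, Optional, Tuple


class PhraseVoiceStore:
    """Hypothetical stand-in for the phrase-to-voice-data storage of conversion unit 41."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], bytes] = {}

    def put(self, phrase: str, voice: str, audio: bytes) -> None:
        self._data[(phrase, voice)] = audio

    def get(self, phrase: str, voice: str) -> Optional[bytes]:
        return self._data.get((phrase, voice))


def voice_for(gender: str) -> str:
    """Pick a voice label from the sender's gender; unknown values fall back to a default."""
    return {"male": "male", "female": "female"}.get(gender, "default")


def acquire(phrase: str, sender_gender: str,
            synthesize: Callable[[str, str], bytes],
            store: PhraseVoiceStore) -> bytes:
    """Fetch voice data for the phrase and store it keyed by phrase and voice."""
    voice = voice_for(sender_gender)
    audio = synthesize(phrase, voice)  # assumed round trip to the synthesis device
    store.put(phrase, voice, audio)
    return audio
```

Keying the store by both phrase and voice lets the same fixed phrase be held with different voices for different communication partners.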

As shown in FIG. 5, the communication terminal 50 includes an input unit 51 and a transmission unit 52.

The input unit 51 is input means for inputting, prior to communication, a fixed phrase, which is a text to be converted into voice data. The input unit 51 inputs a fixed phrase in the same manner as the input unit 12 of the first embodiment. The input unit 51 outputs the input fixed phrase to the transmission unit 52.

The transmission unit 52 is transmitting means for transmitting, prior to communication, the fixed phrase input by the input unit 51 to the communication terminal 40 of the communication partner. The destination communication terminal 40 can be, for example, a communication terminal 40 registered in the telephone directory of the own terminal 50. Alternatively, when the own terminal 50 knows which communication terminals 40 are capable of communication using fixed phrases, those communication terminals 40 may be the destinations.

The transmission unit 52 may also determine the destination communication terminal 40 and whether transmission should take place. For example, the transmission unit 52 may determine whether to transmit a fixed phrase to the communication terminal 40 of a communication partner based on information related to that communication terminal 40 stored in the own terminal 50. This determination is made, for example, for each communication terminal 40 registered in the telephone directory of the own terminal 50. Specifically, the transmission unit 52 makes the same kind of determination as the determination unit 43: under conditions equivalent to those under which the determination unit 43 would determine that voice data is to be acquired prior to communication, the transmission unit 52 determines that the fixed phrase is to be transmitted to that communication terminal 40.

The fixed phrase transmitted by the transmission unit 52 is stored in the communication terminal 50 and used during communication. Specifically, during communication the communication terminal 50 accepts a user operation designating a fixed phrase and transmits information indicating that fixed phrase to the communication terminal 40 of the communication partner. The above are the functions of the communication terminal 50 according to the present embodiment.

Next, the processing executed by the communication terminals 40 and 50 will be described with reference to the flowcharts of FIGS. 6 and 7. First, the processing in the communication terminal 50 when a fixed phrase is transmitted to the communication terminal 40 will be described with reference to the flowchart of FIG. 6.

In this processing, the input unit 51 inputs, prior to communication, a fixed phrase, which is a text to be converted into voice data (S21). The input fixed phrase is output from the input unit 51 to the transmission unit 52. The transmission unit 52 then determines the destination communication terminal 40 and whether transmission should take place (S22).

If it is determined that there is no destination (NO in S23), the processing ends. If it is determined that there is a destination (YES in S23), the fixed phrase is transmitted from the transmission unit 52 to the destination communication terminal 40 (S24). The transmitted fixed phrase is stored in the communication terminal 50. The above is the processing when a fixed phrase is transmitted.
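Steps S21 to S24 can be sketched as follows. The function and parameter names are hypothetical: `should_send` stands in for the per-destination determination of the transmission unit 52, `transmit` for the actual sending over the communication network, and `local_store` for the storage of the communication terminal 50.

```python
from typing import Callable, List


def send_fixed_phrase(phrase: str,
                      directory: List[str],
                      should_send: Callable[[str], bool],
                      transmit: Callable[[str, str], None],
                      local_store: List[str]) -> List[str]:
    """Send a fixed phrase (already input in S21) and return the destinations used."""
    # S22: decide, for each terminal in the telephone directory, whether to send.
    destinations = [terminal for terminal in directory if should_send(terminal)]
    # S23: with no destination, the processing ends without sending or storing.
    if not destinations:
        return []
    # S24: transmit to every selected destination.
    for terminal in destinations:
        transmit(terminal, phrase)
    # The transmitted phrase is kept on the sending terminal for use during communication.
    local_store.append(phrase)
    return destinations
```

Passing the decision and transport as functions keeps the sketch independent of how the directory information or the network is actually represented.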

When the fixed phrase is transmitted, the destination communication terminal 40 performs the processing for acquiring and storing voice data. This processing is the same as the processing described with the flowchart of FIG. 3 of the first embodiment, except for the following points. In this processing, the input unit 42 receives and inputs the fixed phrase transmitted from the communication terminal 50, and the determination unit 43 makes a determination according to the communication terminal 50 that is the transmission source of the fixed phrase.

Next, the processing in the communication terminal 40 when communication is performed using the stored voice data will be described with reference to the flowchart of FIG. 7. This processing takes place when a communication connection has been established between the communication terminals 40 and 50 and information is transmitted from the communication terminal 50.

In the communication terminal 40, the information transmitted from the communication terminal 50 is received (S31). If what is received is voice data (YES in S32), voice output based on that voice data is performed (S33). If what is received is not voice data (NO in S32), it is information related to a text, and that information is input to the conversion unit 41.

If the information designates a fixed phrase that is stored in the conversion unit 41 in association with voice data (YES in S34), the conversion unit 41 reads out and acquires the voice data stored in association with the designated fixed phrase (S35). The conversion unit 41 then performs voice output based on the acquired voice data (S33).

If the information does not designate a fixed phrase stored in the conversion unit 41 in association with voice data (NO in S34), the information is a text other than the stored fixed phrases. In that case, the conversion unit 41 downloads and acquires the voice data corresponding to the input text from the speech synthesis device 30 (S36). The conversion unit 41 then performs voice output based on the acquired voice data (S33). The above is the processing in the communication terminal 40 when communication is performed using the stored voice data.
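The branching of FIG. 7 can be summarized in a short sketch. The message layout (a dict with `kind` and `payload` keys), the `stored_audio` mapping, and the `download` and `play` callables are all assumptions made for illustration; in particular, the patent speaks of "information indicating a fixed phrase", which is simplified here to the phrase text itself.

```python
from typing import Callable, Dict


def handle_reception(message: dict,
                     stored_audio: Dict[str, bytes],
                     download: Callable[[str], bytes],
                     play: Callable[[bytes], None]) -> bytes:
    """Process one received message (S31) and output the resulting voice data."""
    if message["kind"] == "audio":
        # S32 YES: the reception is voice data, which is output as is.
        audio = message["payload"]
    else:
        text = message["payload"]
        if text in stored_audio:
            # S34 YES -> S35: read the voice data stored for the fixed phrase.
            audio = stored_audio[text]
        else:
            # S34 NO -> S36: download voice data for the text from the synthesis device.
            audio = download(text)
    # S33: voice output based on the voice data.
    play(audio)
    return audio
```

Only the last branch requires communication with the speech synthesis device during the call, which is the cost that acquiring voice data in advance is meant to avoid.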

As described above, even in a mode in which information related to fixed phrases is transmitted and received during communication, the communication terminal 40 of the present embodiment can appropriately acquire the voice data used for communication.

In addition, if the determination is made according to the communication terminal 50 that is the transmission source of the fixed phrase, as in the present embodiment, voice data can be acquired appropriately for that transmission source. For example, as described above, not acquiring voice data in response to notification of a fixed phrase from a user with whom a call is unlikely makes it possible to reduce data communication charges, among other benefits.

Furthermore, the communication terminal 50 of the present embodiment, which transmits fixed phrases, allows a fixed phrase to be input appropriately and reliably to the communication terminal 40 that acquires voice data. If whether to transmit the fixed phrase is determined according to the destination communication terminal 40, the text can be transmitted appropriately for the communication terminal 40 of the communication partner.

In the present embodiment, the voice data stored in the communication terminal 40 corresponds to fixed phrases transmitted from the communication terminal 50, which is separate from the communication terminal 40, but it may correspond to other fixed phrases. For example, as in the first embodiment, a fixed phrase may be input by accepting an operation in which the user of the communication terminal 40 enters text on the own terminal 40.

The functions of the communication terminal 10 of the first embodiment and the functions of the communication terminal 40 and the communication terminal 50 of the second embodiment may be combined and realized as a single communication terminal.

10, 20, 40, 50: communication terminal; 11: conversion unit; 12: input unit; 13: determination unit; 14: acquisition unit; 101: CPU; 102: RAM; 103: ROM; 104: operation module; 105: wireless communication module; 106: antenna; 107: microphone; 108: speaker; 109: display; 30: speech synthesis device; 41: conversion unit; 42: input unit; 43: determination unit; 44: acquisition unit; 51: input unit; 52: transmission unit; N: communication network.

Claims

A communication terminal that converts text to voice data and uses the converted voice data for communication with another communication terminal, comprising:
input means for inputting, prior to the communication, a text to be converted to voice data;
determining means for determining, based on at least one of the state of the own terminal, the text itself input by the input means, and the communication history of the own terminal, whether or not to acquire voice data corresponding to the input text prior to communication; and
acquiring means for acquiring, from another device, voice data corresponding to the text input by the input means, based on the determination by the determining means,
wherein the determining means determines, when there is an incoming call to the own terminal, that the voice data corresponding to the text is to be acquired prior to communication, and
the acquiring means performs control to prohibit establishment of communication regarding the incoming call until the voice data corresponding to the text is acquired from the other device.

A communication terminal that converts text to voice data and uses the converted voice data for communication with another communication terminal, comprising:
input means for inputting, prior to the communication, a text to be converted to voice data;
determining means for determining, based on at least one of the state of the own terminal, the text itself input by the input means, and the communication history of the own terminal, whether or not to acquire voice data corresponding to the input text prior to communication; and
acquiring means for acquiring, from another device, voice data corresponding to the text input by the input means, based on the determination by the determining means,
wherein the determining means determines whether or not to acquire voice data corresponding to the text prior to communication based on the number of times of communication indicated in the communication history.

A communication terminal that converts text to voice data and uses the converted voice data for communication with another communication terminal, comprising:
input means for inputting, prior to the communication, a text to be converted to voice data;
determining means for determining, based on at least one of the state of the own terminal, the text itself input by the input means, and the communication history of the own terminal, whether or not to acquire voice data corresponding to the input text prior to communication; and
acquiring means for acquiring, from another device, voice data corresponding to the text input by the input means, based on the determination by the determining means,
the communication terminal further comprising conversion means for accepting, when communication is performed, a designation of a text, converting the designated text into the voice data that was acquired by the acquiring means and corresponds to the designated text, and transmitting the voice data to the communication terminal of the communication partner.

A communication terminal that converts text to voice data and uses the converted voice data for communication with another communication terminal, comprising:
input means for inputting, prior to the communication, a text to be converted to voice data;
determining means for determining, based on at least one of the state of the own terminal, the text itself input by the input means, and the communication history of the own terminal, whether or not to acquire voice data corresponding to the input text prior to communication; and
acquiring means for acquiring, from another device, voice data corresponding to the text input by the input means, based on the determination by the determining means,
the communication terminal further comprising conversion means for receiving, when communication is performed, information indicating a text from the communication terminal of the communication partner, converting the text indicated by the received information into the voice data that was acquired by the acquiring means and corresponds to that text, and performing voice output based on the converted voice data.

A communication terminal that converts text to voice data and uses the converted voice data for communication with another communication terminal, comprising:
input means for inputting, prior to the communication, a text to be converted to voice data;
determining means for determining, based on at least one of the state of the own terminal, the text itself input by the input means, and the communication history of the own terminal, whether or not to acquire voice data corresponding to the input text prior to communication; and
acquiring means for acquiring, from another device, voice data corresponding to the text input by the input means, based on the determination by the determining means,
wherein the input means receives and inputs a text from another communication terminal, and
the determining means determines whether or not to acquire voice data corresponding to the text prior to communication based on information, stored in the own terminal, related to the other communication terminal that is the transmission source of the text.

A communication terminal that performs communication with a communication terminal of a communication partner, the communication terminal of the communication partner converting text to voice data and using the converted voice data for communication with another communication terminal, comprising:
input means for inputting, prior to the communication, a text to be converted to voice data; and
transmitting means for transmitting, prior to the communication, the text input by the input means to the communication terminal of the communication partner.

The communication terminal according to claim 6, wherein the transmitting means determines whether to transmit the text to the communication terminal of the communication partner based on information on the communication terminal of the communication partner stored in the own terminal.