JP2015018327A

JP2015018327A - Terminal device, communication system, communication method, and program

Info

Publication number: JP2015018327A
Application number: JP2013143750A
Authority: JP
Inventors: Chiaki Morita; Kotaro Nagase; Hiroyuki Yamamoto; Kazunari Suzuki; Daisuke Sato
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2013-07-09
Filing date: 2013-07-09
Publication date: 2015-01-29
Anticipated expiration: 2033-07-09
Also published as: JP6120708B2

Abstract

PROBLEM TO BE SOLVED: To provide a terminal device capable of maintaining communication with a user even when the terminal device cannot communicate with another device.
SOLUTION: A terminal device 10 picks up the voice of a question uttered by a user. When the terminal device 10 can communicate with a server device 20, it transmits the user's voice signal to the server device 20. The server device 20 generates information corresponding to the voice signal and transmits it to the terminal device 10, which outputs the received information as voice and as text. If the terminal device 10 cannot communicate with the server device 20 when it picks up the voice of the question, it outputs, as voice and as text, words intended to keep up communication with the user.

Description

The present invention relates to a technique for responding to information from a user.

Patent Document 1 discloses a technique for receiving services from a plurality of agents. In the system disclosed in Patent Document 1, a MY agent runs on a PC (Personal Computer) connected to a local network. A device on which a service agent runs is connected to the Internet, to which the local network is also connected. When the user speaks to the PC, the MY agent running on the PC responds. In addition, when the user speaks to the PC, the service agent running on the device connected to the Internet responds as necessary.

JP 2008-90545 A

Incidentally, suppose that the PC in Patent Document 1 is a smartphone that communicates with the service agent via a mobile communication network. While the smartphone is traveling on a train, for example, it may go out of service range, so that the smartphone and the device connected to the Internet can no longer communicate. In this case, the smartphone operated by the user cannot communicate with the service agent running on the device connected to the Internet, and the user's utterance receives no response.

The present invention has been made against the above background, and its object is to provide a technique for continuing communication with a user even when a terminal device is unable to communicate with another device.

The present invention provides a terminal device comprising: acquisition means for acquiring first information from a user; first output means for outputting second information for communicating with the user when communication with a server device is not possible and the acquisition means acquires the first information; first transmission means for transmitting the first information to the server device when communication with the server device is possible and the acquisition means acquires the first information; and second output means for receiving third information transmitted from the server device in response to the first information transmitted by the first transmission means, and outputting the received third information.

The present invention also provides a communication system comprising a terminal device and a server device. The terminal device comprises: acquisition means for acquiring first information from a user; first output means for outputting second information for communicating with the user when communication with the server device is not possible and the acquisition means acquires the first information; first transmission means for transmitting the first information to the server device when communication with the server device is possible and the acquisition means acquires the first information; and second output means for receiving third information transmitted from the server device in response to the first information transmitted by the first transmission means, and outputting the received third information. The server device comprises: reception means for receiving the first information transmitted from the first transmission means; generation means for generating third information corresponding to the first information received by the reception means; and second transmission means for transmitting the third information to the terminal device.

In the present invention, the terminal device may include control means for storing the first information in storage means when communication with the server device is not possible and the acquisition means acquires the first information, and the first transmission means may transmit the first information stored in the storage means to the server device when the state changes from one in which communication with the server device is not possible to one in which it is possible.
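The store-and-forward behavior described in this paragraph can be illustrated with a minimal Python sketch. The class name PendingFirstInfo, the send callback, and the on_reachability_changed hook are hypothetical names introduced only for illustration; the specification does not prescribe any particular data structure or API.

```python
from collections import deque
from typing import Callable, Deque


class PendingFirstInfo:
    """Keeps first information acquired while the server is unreachable."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send                      # hypothetical transport callback
        self._queue: Deque[bytes] = deque()    # stands in for the storage means
        self._reachable = True

    def on_first_info(self, data: bytes) -> None:
        # Send immediately when the server is reachable, otherwise store.
        if self._reachable:
            self._send(data)
        else:
            self._queue.append(data)

    def on_reachability_changed(self, reachable: bool) -> None:
        # Flush stored items, oldest first, on the unreachable -> reachable edge.
        was_reachable = self._reachable
        self._reachable = reachable
        if reachable and not was_reachable:
            while self._queue:
                self._send(self._queue.popleft())
```

A first-in, first-out queue is used here so that stored utterances reach the server in the order the user spoke them; that ordering is an assumption of the sketch, not a requirement stated in the text.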

In the present invention, the first transmission means may transmit predetermined first information to the server device when communication with the server device is possible at a predetermined timing.

In the present invention, the first information may be a voice signal representing a voice uttered by the user, or text data representing the speech recognition result of that voice.

In the present invention, the first output means may output the second information as voice, and the second output means may output the third information as voice.

In the present invention, the voice output by the first output means may differ from the voice output by the second output means.

The present invention also provides a communication method comprising: an acquisition step of acquiring first information from a user; a first output step of outputting second information for communicating with the user when communication with a server device is not possible and the first information is acquired in the acquisition step; a transmission step of transmitting the first information to the server device when communication with the server device is possible and the first information is acquired in the acquisition step; and a second output step of receiving third information transmitted from the server device in response to the first information transmitted in the transmission step, and outputting the received third information.

The present invention also provides a program for causing a computer to function as: acquisition means for acquiring first information from a user; first output means for outputting second information for communicating with the user when communication with a server device is not possible and the acquisition means acquires the first information; first transmission means for transmitting the first information to the server device when communication with the server device is possible and the acquisition means acquires the first information; and second output means for receiving third information transmitted from the server device in response to the first information transmitted by the first transmission means, and outputting the received third information.

According to the present invention, communication with the user can be continued even when the terminal device is unable to communicate with another device.

FIG. 1 is a diagram showing the devices constituting the communication system 1.
FIG. 2 is a block diagram showing the hardware configuration of the terminal device 10.
FIG. 3 is a block diagram showing the configuration of the functions realized in the terminal device 10.
FIG. 4 is a block diagram showing the hardware configuration of the server device 20.
FIG. 5 is a block diagram showing the configuration of the functions realized in the server device 20.
FIG. 6 is a flowchart showing the flow of processing performed by the control unit 101.
FIG. 7 is a flowchart showing the flow of processing performed by the control unit 101.
FIG. 8 is a flowchart showing the flow of processing performed by the control unit 201.
FIG. 9 is a diagram showing an example of a screen displayed by the terminal device 10.
FIG. 10 is a diagram showing an example of an image displayed in a modification.

[Embodiment]
(Overall configuration)
FIG. 1 is a diagram showing the devices constituting a communication system 1 according to an embodiment of the present invention. The communication system 1 responds to information given by a user and presents the user with information corresponding to the given information. The communication system 1 according to this embodiment consists of a server device 20 and a terminal device 10. It responds to voice that the user speaks to the terminal device 10 (an example of information given by the user) and presents the user with information corresponding to the content of that voice. In other words, it is a system in which the user and the device communicate by conveying information to each other.

The communication network 2 provides communication services such as voice communication and data communication to the terminal device 10, which is for example a smartphone. The communication network 2 may also include the Internet, a fixed telephone network, a public wireless LAN (Local Area Network), and the like. The server device 20 and the terminal device 10 perform data communication via the communication network 2.

In this embodiment the terminal device 10 is a smartphone, and it performs voice communication and data communication via the communication network 2. The terminal device 10 is not limited to a smartphone; it may be a tablet PC, a feature phone, a PDA (Personal Digital Assistant), or any other device capable of data communication via the communication network 2. In this embodiment, the terminal device 10 implements the function of a software agent that responds to information given by the user. Although the communication system 1 includes a plurality of terminal devices 10, FIG. 1 shows only one of them to keep the drawing simple.

The server device 20 is a device having the function of a software agent that responds to information sent from the terminal device 10. The server device 20 analyzes the information sent from the terminal device 10, obtains information corresponding to it, and transmits that information to the terminal device 10.

(Configuration of terminal device 10)
FIG. 2 is a block diagram showing an example of the hardware configuration of the terminal device 10. The control unit 101 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), and a nonvolatile memory. When the CPU executes a program stored in the nonvolatile memory, the basic functions of a smartphone are realized.

The display unit 103 includes a liquid crystal display and displays screens for operating the terminal device 10 as well as various messages. The operation unit 104 includes a plurality of keys for operating the terminal device 10. The operation unit 104 also includes a touch panel that is provided on the surface of the display unit 103, lets the image displayed by the display unit 103 show through, and detects the position touched by a finger. The communication unit 105 functions as a communication interface for communicating via the communication network 2, and receives and transmits various kinds of information.

The voice processing unit 107 has a microphone and a speaker. When terminal devices 10 are in a voice call with each other and a digital signal carrying the other party's voice is supplied from the communication unit 105, the voice processing unit 107 converts the supplied digital signal into an analog signal. This analog signal is supplied to the speaker, which emits the other party's voice. When the microphone picks up sound, the voice processing unit 107 converts the picked-up sound into a digital signal. During a voice call, the voice processing unit 107 supplies the digital signal converted from the user's voice to the communication unit 105. This digital signal is sent from the communication unit 105 to the communication network 2 and on to the other party's terminal device 10. When the control unit 101 is executing application A, described later, the voice processing unit 107 converts the user's voice picked up by the microphone into a digital signal and supplies this digital signal to the control unit 101.

The storage unit 102 is a nonvolatile memory and stores various application programs. In this embodiment, the storage unit 102 stores an application program (hereinafter referred to as application A) that implements the function of a software agent that responds to information given by the user. When the CPU of the control unit 101 executes application A stored in the storage unit 102, the function of a software agent that responds to information given by the user is realized. The software agent responds to the voice (information) that the user speaks to the terminal device 10 and, in cooperation with the server device 20, presents the user with information corresponding to the content of that voice.

The storage unit 102 also stores an acoustic model MA1 and a language model MA2, which the software agent uses when recognizing the user's voice. The acoustic model MA1 is a model of the correspondence between speech feature values and phonemes, and represents the frequency characteristics of each phoneme. A well-known hidden Markov model is used as the acoustic model. The language model MA2 represents constraints on how readily a morpheme connects to the morphemes before and after it and on how phonemes may be arranged.

FIG. 3 is a block diagram showing the configuration of the functions characteristic of the present invention, among the functions realized in the control unit 101 when it executes application A. The acquisition means 1001 acquires the digital signal output by the voice processing unit 107, that is, the signal representing the user's voice picked up by the microphone. The first transmission means 1002 controls the communication unit 105 to transmit the user's voice signal obtained by the acquisition means 1001 to the server device 20 when communication with the server device 20 is possible. The first output means 1003 outputs voice and character strings for communicating with the user when communication with the server device 20 is not possible. The second output means 1004 receives the information transmitted from the server device 20 in response to the signal transmitted by the first transmission means 1002, and outputs the received information as voice and as text.
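The division of labor among the four means in FIG. 3 can be sketched in Python as follows. This is only an illustrative arrangement: the class TerminalAgent and its three callbacks (is_server_reachable, send_to_server, local_reply) are hypothetical stand-ins for the communication unit 105 and the processing inside application A, and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TerminalAgent:
    is_server_reachable: Callable[[], bool]   # hypothetical connectivity check
    send_to_server: Callable[[bytes], str]    # hypothetical: returns the server's reply
    local_reply: Callable[[bytes], str]       # hypothetical: reply generated on the terminal
    outputs: List[str] = field(default_factory=list)

    def acquire(self, voice_signal: bytes) -> None:
        """Plays the role of the acquisition means 1001, then dispatches."""
        if self.is_server_reachable():
            # First transmission means 1002 followed by second output means 1004.
            self.outputs.append(self.send_to_server(voice_signal))
        else:
            # First output means 1003: respond locally to keep the conversation going.
            self.outputs.append(self.local_reply(voice_signal))
```

For example, an agent built with a reachability check that flips from true to false will append first a server reply and then a local reply to outputs, one per acquired voice signal.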

(Configuration of server device 20)
FIG. 4 is a block diagram showing an example of the hardware configuration of the server device 20. The display unit 203 includes a liquid crystal display and displays screens for operating the server device 20, information stored in the storage unit 202, and the like. The operation unit 204 includes a keyboard and a mouse, and the server device 20 operates according to operations performed on them. The communication unit 205 functions as a communication interface for communicating via the communication network 2, and exchanges information with the terminal device 10 via the communication network 2. Although the server device 20 in this embodiment includes the display unit 203 and the operation unit 204, it may be configured without them.

The storage unit 202 has a hard disk device and stores a program that implements the function of a software agent that obtains information corresponding to the voice information transmitted from the terminal device 10 and transmits it to the terminal device 10. The storage unit 202 also stores an acoustic model MB1 and a language model MB2, which the software agent uses when recognizing the user's voice. Like the acoustic model MA1, the acoustic model MB1 is a model of the correspondence between speech feature values and phonemes. Like the language model MA2, the language model MB2 represents constraints on how readily a morpheme connects to the morphemes before and after it and on how phonemes may be arranged.

The control unit 201 includes a CPU, a ROM, and a RAM. When the control unit 201 executes the program stored in the storage unit 202, the function of a software agent that obtains information corresponding to the voice information transmitted from the terminal device 10 and transmits it to the terminal device 10 is realized.

FIG. 5 is a block diagram showing the configuration of the functions relevant to the present invention, among the functions realized in the server device 20. The reception means 2001 cooperates with the communication unit 205 to receive the signal representing the user's voice transmitted by the terminal device 10. The generation means 2002 analyzes the signal received by the reception means 2001 and generates information corresponding to that signal. The second transmission means 2003 transmits the information generated by the generation means 2002 to the terminal device 10.

(Operation example of the embodiment)
Next, an operation example of this embodiment will be described. The description first covers operation in a first state, in which the terminal device 10 and the server device 20 can communicate. It then covers operation in a second state, in which the terminal device 10 and the server device 20 cannot communicate, and operation when the state changes from the second state to the first state.

(Operation example in the first state)
First, when the user of the terminal device 10 wants to look something up, for example, the user says what he or she wants to know to the terminal device 10. The user's voice is picked up by the microphone of the voice processing unit 107. The voice processing unit 107 converts the voice picked up by the microphone into a digital signal (hereinafter referred to as a user voice signal) and supplies this user voice signal to the control unit 101.

FIG. 6 is a flowchart showing the flow of processing performed by the control unit 101. The control unit 101 (acquisition means 1001) acquires the user voice signal supplied from the voice processing unit 107 (step SA1). On acquiring the user voice signal, the control unit 101 determines whether the terminal device 10 is within range of a radio base station of the communication network 2. If the communication unit 105 is receiving the control information transmitted by a radio base station, the control unit 101 determines that the terminal device 10 is within range; if the communication unit 105 is not receiving that control information, the control unit 101 determines that the terminal device 10 is not within range. If the terminal device 10 is within range of a radio base station of the communication network 2 (YES in step SA2), the control unit 101 (first transmission means 1002) transmits the user voice signal supplied from the voice processing unit 107 to the server device 20 via the communication unit 105 (step SA3).
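The in-range determination of step SA2 can be sketched as a small Python function. The text only says that the decision depends on whether the communication unit 105 is receiving base-station control information; treating "receiving" as "last received within max_age_s seconds" is an assumption of this sketch, and both the function name is_in_range and the default of 5 seconds are hypothetical.

```python
from typing import Optional


def is_in_range(last_control_info_at: Optional[float],
                now: float,
                max_age_s: float = 5.0) -> bool:
    """Return True if base-station control information was received recently.

    last_control_info_at: time (seconds) of the most recent control information,
        or None if none has been received. The freshness window max_age_s is an
        assumption made for illustration, not a value from the specification.
    """
    if last_control_info_at is None:
        return False
    return 0.0 <= now - last_control_info_at <= max_age_s
```

With this helper, step SA3 (transmit to the server device 20) would run only when is_in_range returns True, and the processing for the second state would run otherwise.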

In the server device 20, when the communication unit 205 receives the user voice signal transmitted from the terminal device 10, the received user voice signal is supplied to the control unit 201. The control unit 201 (reception means 2001) acquires the user voice signal. When the user voice signal is supplied, the control unit 201 executes the processing shown in FIG. 8. The control unit 201 (generation means 2002) recognizes the voice represented by the supplied user voice signal and converts it into text data (step SB1). To convert the voice signal into text data, it uses a well-known technique such as that described in "Speech recognition technology and application development for improving the convenience of terminal functions and services", NTT DOCOMO Technical Journal, January 2012, Vol. 19, No. 4, pp. 74-76. The control unit 201 analyzes the frequency characteristics of the user voice signal and extracts speech feature values. Having extracted the feature values, the control unit 201 identifies the phonemes of the voice represented by the user voice signal using the acoustic model MB1. Having identified the phonemes, the control unit 201 identifies the morpheme sequence of the voice represented by the user voice signal using the language model MB2. From the identified morpheme sequence, the control unit 201 generates text data that renders the user's utterance as a character string.

Having generated the text data of the user's utterance, the control unit 201 (generation means 2002) generates information corresponding to the utterance on the basis of the text data (step SB2). Information corresponding to the user's utterance can be generated using a knowledge database or a search engine, for example with the well-known techniques described in "Question answering technology in Shabette Concier", NTT Technical Journal, February 2013, Vol. 25, No. 2, pp. 56-59, and in "Knowledge Q&A for realizing direct answers to natural-language questions", NTT DOCOMO Technical Journal, January 2013, Vol. 20, No. 4, pp. 6-11.
For example, if the user's utterance is the question "The height of Mt. Fuji?", the control unit 201 generates, as information corresponding to the question, text data reading "The height of Mt. Fuji is 3776 m" (hereinafter referred to as answer data) and a digital voice signal saying "It is 3776 m" (hereinafter referred to as an answer voice signal), both of which answer the question.
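The Mt. Fuji example can be made concrete with a toy Python sketch of step SB2 that returns both the displayed answer and the shorter spoken answer. The in-memory dictionary stands in for the knowledge database or search engine mentioned above, the names Answer and generate_answer are hypothetical, and speech synthesis is left out: spoken_text is the sentence that would be synthesized into the answer voice signal.

```python
from typing import NamedTuple, Optional


class Answer(NamedTuple):
    answer_text: str   # full sentence shown on the display (answer data)
    spoken_text: str   # shorter sentence to be synthesized (answer voice signal)


# Toy knowledge base keyed by normalized question text (an illustration only).
_KNOWLEDGE = {
    "the height of mt. fuji": ("The height of Mt. Fuji", "3776 m"),
}


def generate_answer(question_text: str) -> Optional[Answer]:
    """Look up a recognized question and build the text and spoken answers."""
    key = question_text.strip().rstrip("?").strip().lower()
    entry = _KNOWLEDGE.get(key)
    if entry is None:
        return None
    subject, value = entry
    return Answer(answer_text=f"{subject} is {value}",
                  spoken_text=f"It is {value}")
```

Returning None for an unknown question is a design choice of this sketch; the specification does not say how the server device 20 handles questions it cannot answer.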

The control unit 201 (second transmission means 2003) transmits the answer data and the answer voice signal to the terminal device 10 via the communication unit 205 (step SB3). In the terminal device 10, when the communication unit 105 receives the answer data and the answer voice signal, they are supplied to the control unit 101. The control unit 101 (second output means 1004) acquires the answer data and the answer voice signal (step SA4). The control unit 101 (second output means 1004) controls the display unit 103 so that the character string represented by the acquired answer data is displayed (step SA5). As a result, as illustrated in FIG. 9, the character string "The height of Mt. Fuji is 3776 m" is displayed on the display unit 103. The control unit 101 also supplies the answer voice signal to the voice processing unit 107. The voice processing unit 107 converts the supplied answer voice signal into an analog signal. This analog signal is supplied to the speaker, which emits the voice "It is 3776 m" (step SA6).

Thus, in this embodiment, if the terminal device 10 and the server device 20 can communicate when the user speaks to the terminal device 10, the communication system 1 responds to the user's voice and provides the user with information corresponding to what was said.

（第２状態のときの動作例）
次に、ユーザが端末装置１０に話しかけたときに端末装置１０とサーバ装置２０とが通信できない状態である場合の動作例について説明する。例えば、電車での移動中においては、端末装置１０は、一時的に無線基地局の圏外となり、通信網２を介した通信を行えなくなる場合がある。制御部１０１は、ユーザ音声信号が供給されたときに端末装置１０が無線基地局の圏内に位置していない場合（ステップＳＡ２でＮＯ）、ユーザ音声信号を記憶部１０２に記憶させる（ステップＳＡ７）。また、制御部１０１（第１出力手段１００３）は、サーバ装置２０と同様にユーザ音声信号が表す音声を認識してテキストデータを生成する（ステップＳＡ８）。具体的には、制御部１０１は、ユーザ音声信号の周波数特性を分析し、音声の特徴量を抽出する。制御部１０１は、音声の特徴量を抽出すると、ユーザ音声信号が表す音声の音素を音響モデルＭＡ１を用いて特定する。制御部１０１は、音素を特定すると、言語モデルＭＡ２を用いてユーザ音声信号が表す音声の形態素列を特定する。制御部１０１は、特定した形態素列からユーザが発話した音声を文字列にしたテキストデータを生成する。 (Operation example in the second state)
Next, an operation example for the case where the terminal device 10 and the server device 20 cannot communicate when the user speaks to the terminal device 10 will be described. For example, while traveling on a train, the terminal device 10 may temporarily be out of range of the radio base station and unable to communicate via the communication network 2. If the terminal device 10 is not within range of the radio base station when the user voice signal is supplied (NO in step SA2), the control unit 101 stores the user voice signal in the storage unit 102 (step SA7). The control unit 101 (first output unit 1003) also recognizes the voice represented by the user voice signal and generates text data in the same manner as the server device 20 (step SA8). Specifically, the control unit 101 analyzes the frequency characteristics of the user voice signal and extracts voice feature amounts. Having extracted the feature amounts, the control unit 101 identifies the phonemes of the voice represented by the user voice signal using the acoustic model MA1. Having identified the phonemes, the control unit 101 uses the language model MA2 to identify the morpheme string of the voice represented by the user voice signal. From the identified morpheme string, the control unit 101 generates text data that renders the user's utterance as a character string.

制御部１０１（第１出力手段１００３）は、ユーザが発話した音声のテキストデータを生成すると、ユーザの発話に対応する応答を生成する（ステップＳＡ９）。ここで生成する応答としては、例えば、コミュニケーションの間を保つ自然文などがある。制御部１０１は、生成した自然文のテキストデータと、生成した自然文を発話したときの音声を表す応答音声信号を生成する。制御部１０１（第１出力手段１００３）は、生成したテキストデータが表す文字列が表示されるように表示部１０３を制御する（ステップＳＡ１０）。また、制御部１０１（第１出力手段１００３）は、応答音声信号を音声処理部１０７へ供給する。音声処理部１０７は、供給された応答音声信号をアナログ信号に変換する。このアナログ信号は、スピーカへ供給され、スピーカからは生成した自然文を発話したときの音声が放音される（ステップＳＡ１１）。 When the control unit 101 (first output unit 1003) has generated the text data of the user's utterance, it generates a response corresponding to the utterance (step SA9). The response generated here is, for example, a natural-language filler sentence that bridges the pause in the conversation. The control unit 101 generates text data of the filler sentence and a response voice signal representing the voice of the sentence being spoken. The control unit 101 (first output unit 1003) controls the display unit 103 so that the character string represented by the generated text data is displayed (step SA10). The control unit 101 (first output unit 1003) also supplies the response voice signal to the voice processing unit 107. The voice processing unit 107 converts the supplied response voice signal into an analog signal. This analog signal is supplied to the speaker, and the voice of the generated sentence being spoken is emitted from the speaker (step SA11).

例えば、ユーザの発話の内容が「富士山の高さは？」という質問文であった場合、制御部１０１は、コミュニケーションの間を保つ文として「それについては・・・」という自然文のテキストデータと、この自然文を発話したときの音声を表す応答音声信号を生成する。制御部１０１は、生成したテキストデータが表す文字列が表示されるように表示部１０３を制御する。これにより、「それについては・・・」という文字列が表示部１０３に表示される。また、制御部１０１は、応答音声信号を音声処理部１０７へ供給する。音声処理部１０７は、供給された回答音声信号をアナログ信号に変換する。このアナログ信号は、スピーカへ供給され、スピーカからは「それについては」という音声が放音される。また、制御部１０１は、さらにコミュニケーションの間を保つ文として「少しまってね」という自然文のテキストデータと、この自然文を発話したときの応答音声信号を生成する。制御部１０１は、生成したテキストデータが表す文字列が表示されるように表示部１０３を制御する。これにより、「少しまってね」という文字列が表示部１０３に表示される。また、制御部１０１は、応答音声信号を音声処理部１０７へ供給する。これにより、スピーカからは「少しまってね」という音声が放音される。 For example, when the user's utterance is the question “What is the height of Mt. Fuji?”, the control unit 101 generates, as a filler sentence that bridges the pause in the conversation, text data of the natural sentence “About that...” and a response voice signal representing the voice of that sentence being spoken. The control unit 101 controls the display unit 103 so that the character string represented by the generated text data is displayed. As a result, the character string “About that...” is displayed on the display unit 103. The control unit 101 also supplies the response voice signal to the voice processing unit 107. The voice processing unit 107 converts the supplied voice signal into an analog signal. This analog signal is supplied to the speaker, and the voice “About that” is emitted from the speaker. The control unit 101 then generates, as a further filler sentence, text data of the natural sentence “Please wait a moment” and a response voice signal for that sentence. The control unit 101 controls the display unit 103 so that the character string represented by the generated text data is displayed. As a result, the character string “Please wait a moment” is displayed on the display unit 103. The control unit 101 also supplies the response voice signal to the voice processing unit 107. As a result, the voice “Please wait a moment” is emitted from the speaker.

なお、本実施形態においては、コミュニケーションの間を保つ文は、予めアプリＡが記憶する構成であるが、この構成に限定されるものではない。例えば、コミュニケーションの間を保つ文のデータベースを記憶部１０２に記憶させ、ユーザの発話の内容に対応した文を制御部１０１がデータベースから取得する構成であってもよい。また、コミュニケーションの間を保つ文としては、挨拶や相槌などであってもよい。 In the present embodiment, the filler sentences are stored in advance by the app A, but the configuration is not limited to this. For example, a database of filler sentences may be stored in the storage unit 102, and the control unit 101 may acquire from the database a sentence corresponding to the content of the user's utterance. A filler sentence may also be a greeting or a backchannel response (aizuchi).

このように本実施形態においては、端末装置１０が通信網２の無線基地局の圏外となってサーバ装置２０との通信ができない状態にある場合、ユーザの音声に対して端末装置１０が応答し、コミュニケーションを継続する。 As described above, in the present embodiment, when the terminal device 10 is out of range of the radio base station of the communication network 2 and cannot communicate with the server device 20, the terminal device 10 itself responds to the user's voice and continues the communication.
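The second-state behavior described above (steps SA7 and SA9 to SA11) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `Terminal`, the `in_range` flag, the `pending` list standing in for the storage unit 102, and the English filler strings are all assumptions made for the example, and voice recognition and synthesis are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical filler sentences; the description gives two example sentences
# that the app stores in advance, rendered here in English.
FILLERS = ["About that...", "Please wait a moment"]


@dataclass
class Terminal:
    in_range: bool                               # True = first state, False = second state
    pending: list = field(default_factory=list)  # stands in for storage unit 102

    def on_utterance(self, voice_signal: bytes) -> list:
        """Return the sentences to display and speak locally for one utterance."""
        if self.in_range:
            # First state: the signal would go to the server (not modeled here).
            return []
        # Second state: keep the signal for later (step SA7) and
        # respond locally with filler sentences (steps SA9 to SA11).
        self.pending.append(voice_signal)
        return list(FILLERS)
```

In this sketch the stored signals simply accumulate in `pending` until connectivity returns.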

（第２状態から第１状態へ変化したときの動作例）
次に、上述したように第２状態でユーザの発話に対して端末装置１０が応答した後、第１状態に変化したときの動作例について説明する。例えば、上述したように電車での移動中においては、端末装置１０は、一時的に無線基地局の圏外となって第２状態になるが、さらに移動すると第２状態から第１状態に戻る。制御部１０１は、第２状態から第１状態になると、図７に示した処理を実行し、第２状態のときに記憶部１０２に記憶されたユーザ音声信号を、通信部１０５を介してサーバ装置２０へ送信する（ステップＳＣ１）。 (Operation example when changed from the second state to the first state)
Next, an operation example for the case where the terminal device 10 has responded to the user's utterance in the second state as described above and then changes to the first state will be described. For example, while traveling on a train as described above, the terminal device 10 temporarily goes out of range of the radio base station and enters the second state, but returns from the second state to the first state as it moves further. When the second state changes to the first state, the control unit 101 executes the processing illustrated in FIG. 7 and transmits the user voice signal that was stored in the storage unit 102 during the second state to the server device 20 via the communication unit 105 (step SC1).

例えば、上述したように第２状態においてユーザの発話の内容が「富士山の高さは？」という質問であった場合、記憶部１０２には、この発話のユーザ音声信号が記憶されている。制御部１０１は、第２状態から第１状態になると、このユーザ音声信号を通信部１０５を介してサーバ装置２０へ送信する。 For example, if the user's utterance in the second state was the question “What is the height of Mt. Fuji?” as described above, the user voice signal of this utterance is stored in the storage unit 102. When the second state changes to the first state, the control unit 101 transmits this user voice signal to the server device 20 via the communication unit 105.

サーバ装置２０においては、端末装置１０から送信されたユーザ音声信号を通信部２０５が受信すると、上述した第１状態のときの動作例と同様に、制御部２０１が質問への回答となる回答データと回答音声信号を生成する。制御部２０１は、生成した回答データと回答音声信号とを通信部２０５を介して端末装置１０へ送信する。制御部１０１は、サーバ装置２０が送信した回答データと回答音声信号とを取得する（ステップＳＣ２）。受信した回答データが表す文字列を表示部１０３に表示し（ステップＳＣ３）、回答音声信号が示す音声をスピーカから放音する（ステップＳＣ４）。 In the server device 20, when the communication unit 205 receives the user voice signal transmitted from the terminal device 10, the control unit 201 generates answer data and an answer voice signal that answer the question, in the same manner as in the operation example for the first state described above. The control unit 201 transmits the generated answer data and answer voice signal to the terminal device 10 via the communication unit 205. The control unit 101 acquires the answer data and the answer voice signal transmitted by the server device 20 (step SC2). It then displays the character string represented by the received answer data on the display unit 103 (step SC3) and emits the voice indicated by the answer voice signal from the speaker (step SC4).
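The flow of steps SC1 to SC4 amounts to a store-and-forward flush once the first state is restored. The following is a minimal sketch under assumptions: `flush_pending` and the `ask_server` callback are hypothetical names, and the callback stands in for the whole round trip through the communication unit 105 and the server device 20.

```python
from typing import Callable, List, Tuple

Answer = Tuple[str, bytes]  # (answer data as text, answer voice signal)


def flush_pending(pending: List[bytes],
                  ask_server: Callable[[bytes], Answer]) -> List[Answer]:
    """Send every stored user voice signal and collect the answers in order."""
    answers = []
    while pending:
        signal = pending.pop(0)             # oldest utterance first
        answers.append(ask_server(signal))  # steps SC1 and SC2
    return answers                          # caller displays and plays them (SC3, SC4)
```

Sending the oldest utterance first is a design choice of the sketch; the description does not state an ordering.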

以上説明したように本実施形態によれば、端末装置１０とサーバ装置２０とが一時的に通信を行えない状態となっても、端末装置１０が音声を発してユーザとのコミュニケーションを継続するため、通信が行えずにサーバ装置２０から情報を取得できない状態をユーザに意識させないことができる。また、本実施形態によれば、端末装置１０とサーバ装置２０とが通信を行えない状態から通信可能な状態に戻ると、端末装置１０とサーバ装置２０とが通信を行うため、ユーザの発話に対して端末装置１０では回答できない情報をサーバ装置２０からユーザに提示することができる。 As described above, according to the present embodiment, even if the terminal device 10 and the server device 20 are temporarily unable to communicate, the terminal device 10 emits voice and continues communicating with the user, so the user can be kept unaware that communication is unavailable and information cannot be acquired from the server device 20. Further, according to the present embodiment, when the terminal device 10 and the server device 20 return from a state in which they cannot communicate to a state in which they can, they communicate with each other, so information that the terminal device 10 alone cannot provide in answer to the user's utterance can be presented to the user from the server device 20.

［変形例］
以上、本発明の実施形態について説明したが、本発明は上述した実施形態に限定されることなく、他の様々な形態で実施可能である。例えば、上述の実施形態を以下のように変形して本発明を実施してもよい。なお、上述した実施形態及び以下の変形例は、各々を組み合わせてもよい。 [Modification]
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various other forms. For example, the present invention may be implemented by modifying the above-described embodiment as follows. The above-described embodiment and the following modifications may also be combined with one another.

上述した実施形態においては、ユーザが発話した音声のユーザ音声信号をサーバ装置２０へ送信し、このユーザ音声信号の内容に対応した情報をサーバ装置２０が端末装置１０へ送信する構成となっているが、この構成に限定されるものではない。例えば、天気や交通機関の運行情報などは、日や時間によって変化するため、ユーザがよく質問するものである。端末装置１０は、天気や運行情報などの予め定められた情報については、これらを問い合わせる質問文を予め定めた時間が経過する毎又は予め定めた時刻にサーバ装置２０へ送信し、サーバ装置２０から回答データと回答音声信号とを取得しておくようにしてもよい。そして、端末装置１０とサーバ装置２０とが通信不可の状態のときに、ユーザの発話の内容が天気や交通機関の運行情報などを質問するものである場合、予めサーバ装置２０から取得した回答データと回答音声信号とに基いて、天気の情報や運行情報をユーザに提示してもよい。
なお、予め回答データと回答音声信号とを取得する構成においては、例えばユーザが端末装置１０のＷｅｂブラウザを使用して検索エンジンで検索した文字列について、回答データと回答音声信号とを周期的又は予め定めた時刻に取得するようにしてもよい。
また、回数が多いユーザの質問を端末装置１０がユーザの音声信号から解析し、回数が多いと特定した質問文を、予め定めた時間が経過する毎又は予め定めた時刻にサーバ装置２０へ送信し、サーバ装置２０から回答データと回答音声信号とを取得しておくようにしてもよい。例えば、端末装置１０は、一日に同じ質問が３回以上された場合、この質問を回数が多い質問と特定する。そして、端末装置１０とサーバ装置２０とが通信不可の状態のときのユーザの発話の内容が、予め回数が多いと特定した質問文である場合、予めサーバ装置２０から取得した回答データと回答音声信号とに基いて、質問文への回答をユーザに提示してもよい。例えば、株価の情報についての質問回数が多いユーザについては、端末装置１０は、予め定められた時間（前場と後場の開始時間及び終了時間）に予め株価の情報をサーバ装置２０から取得しておいてもよい。
また、端末装置１０は、予め取得した回答データと回答音声信号とで第２状態において応答した後に第１状態となった場合、記憶部１０２に記憶しておいたユーザ音声信号をサーバ装置２０へ送信し、サーバ装置２０から取得した回答データと回答音声データとに基いて画面の表示と放音とを行うようにしてもよい。
なお、回答データと回答音声信号とを予め取得した時刻と、第２状態から第１状態になった時刻との差が予め定められた閾値未満である場合、記憶部１０２に記憶しておいたユーザ音声信号をサーバ装置２０へ送信しないようにしてもよい。また、回答データと回答音声信号とを予め取得する構成においては、端末装置１０は、第２状態にある場合、例えば近距離無線通信で近隣の端末装置１０と通信を行い、他の携帯端末が取得している回答データと回答音声信号とを取得するようにしてもよい。 In the embodiment described above, the user voice signal of the voice uttered by the user is transmitted to the server device 20, and the server device 20 transmits information corresponding to the content of the user voice signal to the terminal device 10; however, the configuration is not limited to this. For example, weather and transportation operation information change with the day and time, so users ask about them often. For predetermined information such as weather and operation information, the terminal device 10 may transmit a question sentence inquiring about it to the server device 20 every time a predetermined period elapses or at a predetermined time, and acquire the answer data and the answer voice signal from the server device 20 in advance. Then, if the user's utterance asks about the weather, transportation operation information, or the like while the terminal device 10 and the server device 20 cannot communicate, the terminal device 10 may present the weather information or operation information to the user based on the answer data and the answer voice signal acquired in advance from the server device 20.
In the configuration in which the answer data and the answer voice signal are acquired in advance, the answer data and the answer voice signal may be acquired, periodically or at a predetermined time, for character strings that the user has searched for with a search engine using the Web browser of the terminal device 10, for example.
Further, the terminal device 10 may analyze the user's voice signals to find questions the user asks frequently, transmit a question sentence identified as frequent to the server device 20 every time a predetermined period elapses or at a predetermined time, and acquire the answer data and the answer voice signal from the server device 20 in advance. For example, when the same question is asked three or more times in one day, the terminal device 10 identifies it as a frequent question. Then, if the user's utterance while the terminal device 10 and the server device 20 cannot communicate is a question sentence previously identified as frequent, the terminal device 10 may present an answer to the question to the user based on the answer data and the answer voice signal acquired in advance from the server device 20. For example, for a user who frequently asks about stock prices, the terminal device 10 may acquire stock price information from the server device 20 in advance at predetermined times (the opening and closing times of the morning and afternoon trading sessions).
Further, when the terminal device 10 enters the first state after having responded in the second state with the answer data and the answer voice signal acquired in advance, it may transmit the user voice signal stored in the storage unit 102 to the server device 20 and perform screen display and sound emission based on the answer data and the answer voice data acquired from the server device 20.
If the difference between the time at which the answer data and the answer voice signal were acquired in advance and the time at which the second state changed to the first state is less than a predetermined threshold, the user voice signal stored in the storage unit 102 need not be transmitted to the server device 20. In the configuration in which the answer data and the answer voice signal are acquired in advance, the terminal device 10, when in the second state, may also communicate with a nearby terminal device 10 by, for example, short-range wireless communication and acquire the answer data and the answer voice signal that the other mobile terminal has acquired.
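Two rules in this modification lend themselves to a short sketch: identifying a question as frequent when it is asked three or more times in a day, and skipping the re-query when the prefetched answer is recent enough. The function names and the 600-second threshold below are assumptions for illustration; the description specifies only "a predetermined threshold".

```python
from collections import Counter

FREQUENT_PER_DAY = 3      # "three or more times in one day" from the description
FRESHNESS_LIMIT_S = 600   # assumed value; the description gives no number


def frequent_questions(todays_questions):
    """Return the set of questions asked at least FREQUENT_PER_DAY times today."""
    counts = Counter(todays_questions)
    return {q for q, n in counts.items() if n >= FREQUENT_PER_DAY}


def should_resend(prefetched_at: float, reconnected_at: float) -> bool:
    """Re-query the server only if the prefetched answer is not recent enough.

    The stored user voice signal is not sent when the time difference is
    below the threshold.
    """
    return (reconnected_at - prefetched_at) >= FRESHNESS_LIMIT_S
```

Questions are compared here as already-recognized text; how two spoken questions are judged to be "the same" is left open by the description.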

なお、ユーザがよくする質問について端末装置１０が予めサーバ装置２０から回答データと回答音声信号とを取得する構成においては、周期的又は予め定めた時刻に取得する構成に限定されるものではない。例えば、端末装置１０は、無線基地局の電波強度を監視し、電波強度が予め定められた閾値未満となると、ユーザがよくする質問についてサーバ装置２０から回答データと回答音声信号とを取得する構成としてもよい。また、端末装置１０は、圏外となる位置をＧＰＳ（Global Positioning System）により予め特定しておき、第１状態のときに特定した位置から予め定められた範囲内の位置に入ると、ユーザがよくする質問についてサーバ装置２０から回答データと回答音声信号とを取得する構成としてもよい。
また、例えばショッピングモールやデパートなどの大型の建物に入ると、端末装置１０が第２状態となる虞があるため、端末装置１０は、自身の位置を監視し、このような建物に近づいた場合には、ユーザがよくする質問についてはサーバ装置２０から回答データと回答音声信号とを予め取得する構成としてもよい。 In the configuration in which the terminal device 10 acquires the answer data and the answer voice signal from the server device 20 in advance for questions the user often asks, acquisition is not limited to being periodic or at a predetermined time. For example, the terminal device 10 may monitor the radio field intensity of the radio base station and, when the intensity falls below a predetermined threshold, acquire the answer data and the answer voice signal from the server device 20 for questions the user often asks. The terminal device 10 may also identify in advance, by GPS (Global Positioning System), positions where it goes out of range, and, on entering a position within a predetermined range of such an identified position while in the first state, acquire the answer data and the answer voice signal from the server device 20 for questions the user often asks.
In addition, because the terminal device 10 may enter the second state on entering a large building such as a shopping mall or a department store, the terminal device 10 may monitor its own position and, when it approaches such a building, acquire in advance from the server device 20 the answer data and the answer voice signal for questions the user often asks.
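The prefetch triggers in this modification (weak signal, or proximity to a known out-of-range position or large building) can be combined into one predicate. The sketch below is illustrative only: the threshold of -105 dBm, the 200 m radius, and the function names are assumptions, and the distance uses a simple equirectangular approximation that is adequate only over short ranges.

```python
import math

RSSI_THRESHOLD_DBM = -105.0   # assumed signal-strength threshold
RADIUS_M = 200.0              # assumed "predetermined range"
EARTH_RADIUS_M = 6371000.0


def distance_m(a, b):
    """Approximate ground distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def should_prefetch(rssi_dbm, position, dead_zones):
    """True if the signal is weak or the terminal is near a known dead zone."""
    if rssi_dbm < RSSI_THRESHOLD_DBM:
        return True
    return any(distance_m(position, z) <= RADIUS_M for z in dead_zones)
```

Here `dead_zones` would hold positions recorded earlier when the terminal went out of range, or the positions of large buildings.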

上述した実施形態においては、端末装置１０が第２状態の場合、端末装置１０は、ユーザの発話に対してコミュニケーションの間を保つ自然文で応答するが、この構成に限定されるものではない。例えば、端末装置１０は、発話の内容をユーザに詳細に問い合わせる構成であってもよい。例えば、ユーザが「ランチ食べたい」と発話した場合、「何を食べたい？」、「価格は？」、「場所は？」など、発話に対する回答を絞り込むのに有用な会話を端末装置１０が行うようにしてもよい。
この場合、端末装置１０は、各質問に対して発した音声のユーザ音声信号を記憶部１０２に記憶し、第１状態に戻ったときは、記憶した各ユーザ音声信号をサーバ装置２０へ送信する。サーバ装置２０は、各ユーザ音声信号に対して音声認識を行い、各音声認識結果から得られた文字列でユーザの発話に対応する情報を取得する構成としてもよい。 In the above-described embodiment, when the terminal device 10 is in the second state, it responds to the user's utterance with a natural-language filler sentence that bridges the pause in the conversation, but the configuration is not limited to this. For example, the terminal device 10 may ask the user for details about the content of the utterance. For example, when the user says “I want to eat lunch”, the terminal device 10 may carry on a conversation useful for narrowing down the answer to the utterance, such as “What do you want to eat?”, “What price range?”, or “Where?”.
In this case, the terminal device 10 stores in the storage unit 102 the user voice signal of the voice uttered in reply to each question and, on returning to the first state, transmits each stored user voice signal to the server device 20. The server device 20 may perform voice recognition on each user voice signal and acquire information corresponding to the user's utterance using the character strings obtained from the recognition results.

上述した実施形態においては、端末装置１０は、第２状態である場合にユーザの発話に対して応答しているが、この構成に限定されるものではない。例えば、端末装置１０が無線基地局の圏内にあっても、通信網２又はサーバ装置２０で障害が発生し、端末装置１０がサーバ装置２０と通信を行えない場合が生じ得る。端末装置１０は、無線基地局の圏内にあるときにサーバ装置２０と通信可能であるか周期的に検知し、圏内であっても通信不可の場合には、上述した実施形態と同様に、ユーザの発話に対して端末装置１０が応答するようにしてもよい。 In the above-described embodiment, the terminal device 10 responds to the user's utterance itself when it is in the second state, but the configuration is not limited to this. For example, even when the terminal device 10 is within range of the radio base station, a failure may occur in the communication network 2 or the server device 20, leaving the terminal device 10 unable to communicate with the server device 20. The terminal device 10 may periodically check whether it can communicate with the server device 20 while within range of the radio base station and, if communication is not possible even within range, respond to the user's utterance itself in the same manner as in the above-described embodiment.

上述した実施形態においては、テキストデータが表す文字列を表示するときに、図１０に例示したように、エージェントのアバターを表示し、表示したアバターからの吹き出しの中にテキストデータが表す文字列を表示してもよい。また、回答音声信号や応答音声信号の音声を放音するときに、アバターが発話しているようにアバターの口元をアニメーションで表示するようにしてもよい。
また、端末装置１０は、サーバ装置２０から取得した回答データを表示し、回答音声データの音声を放音する場合には、大人のアバターを表示し、第２状態においてコミュニケーションの間を保つ自然文の表示と音声の放音を行う場合には、子供のアバターを表示するようにしてもよい。また、この変形例にあっては、端末装置１０は、大人のアバターを表示しているときには大人の音声で放音し、子供のアバターを表示しているときには子供の音声で放音するようにしてもよい。また、アバターを表示する構成においては、端末装置１０は、アバターを複数種類有し、ユーザの会話の内容に応じてアバターを変更するようにしてもよい。 In the embodiment described above, when the character string represented by the text data is displayed, as illustrated in FIG. 10, the agent avatar is displayed, and the character string represented by the text data is displayed in the balloon from the displayed avatar. It may be displayed. Further, when the voice of the answer voice signal or the response voice signal is emitted, the mouth of the avatar may be displayed as an animation so that the avatar speaks.
In addition, the terminal device 10 may display an adult avatar when displaying the answer data acquired from the server device 20 and emitting the voice of the answer voice data, and display a child avatar when displaying a filler sentence and emitting its voice in the second state. In this modification, the terminal device 10 may emit sound in an adult voice while the adult avatar is displayed and in a child's voice while the child avatar is displayed. In a configuration that displays avatars, the terminal device 10 may also have multiple types of avatars and change the avatar according to the content of the user's conversation.

上述した実施形態においては、第２状態においてコミュニケーションの間を保つ自然文の表示及び音声の放音が端末装置１０で行われた後、ユーザが情報の取得を中止する発話した場合、端末装置１０は、記憶部１０２に記憶されたユーザ音声信号をサーバ装置２０へ送信しないようにしてもよい。
また、端末装置１０は、第２状態においてコミュニケーションの間を保つ自然文の表示及び音声の放音が端末装置１０で行われた後、第２状態の継続時間が予め定められた時間を越えた場合、ユーザの発話に対して直ぐに応答できないことを報知してもよい。また、この場合、端末装置１０は、ユーザの発話に対して応答できないことを報知し、第１状態に戻っても、記憶したユーザ音声信号をサーバ装置２０へ送信しないようにしてもよい。また、端末装置１０は、第２状態においてコミュニケーションの間を保つ自然文の表示及び音声の放音が端末装置１０で行われた後、第２状態の継続時間が予め定められた時間を越えてから第１状態となった場合、質問文への回答を行うか否かユーザに問い合わせる構成としてもよい。端末装置１０は、ユーザが回答を希望した場合、ユーザ音声信号をサーバ装置２０へ送信し、回答を希望しなかった場合、ユーザ音声信号をサーバ装置２０へ送信しないようにしてもよい。 In the above-described embodiment, if the user makes an utterance to cancel the acquisition of information after the terminal device 10 has displayed a filler sentence and emitted its voice in the second state, the terminal device 10 need not transmit the user voice signal stored in the storage unit 102 to the server device 20.
In addition, if the duration of the second state exceeds a predetermined time after the terminal device 10 has displayed a filler sentence and emitted its voice in the second state, the terminal device 10 may notify the user that it cannot respond to the utterance immediately. In this case, the terminal device 10 may instead notify the user that it cannot respond to the utterance, and refrain from transmitting the stored user voice signal to the server device 20 even after returning to the first state. Further, if the first state is reached after the duration of the second state has exceeded the predetermined time following the display and sound emission of a filler sentence, the terminal device 10 may ask the user whether to answer the question. The terminal device 10 may transmit the user voice signal to the server device 20 if the user wants an answer, and refrain from transmitting it if the user does not.
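One way to read the alternatives in this modification is as a small decision rule applied when the terminal returns to the first state. The sketch below picks one combination of the options (cancel, short outage, long outage with a confirmation prompt); the function name, argument names, and return labels are all hypothetical, and other combinations described above are equally permitted.

```python
def action_on_reconnect(user_cancelled, outage_s, limit_s, user_wants_answer=None):
    """Return 'send', 'ask_user', or 'discard' for a stored user voice signal."""
    if user_cancelled:
        return "discard"    # the user said to stop acquiring the information
    if outage_s <= limit_s:
        return "send"       # short outage: forward the stored signal as usual
    if user_wants_answer is None:
        return "ask_user"   # long outage: ask whether an answer is still wanted
    return "send" if user_wants_answer else "discard"
```

For example, `action_on_reconnect(False, 120, 60)` yields `"ask_user"`, and calling it again with the user's reply resolves to `"send"` or `"discard"`.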

上述した実施形態においては、端末装置１０は、第２状態となってから経過した時間を計時し、計時した時間が予め定められた閾値以上のときにユーザが発話した場合、ユーザの発話に対して直ぐに応答できないことを報知してもよい。また、端末装置１０は、第１状態から第２状態となったときには、ユーザの発話に対して直ぐに応答できないことを報知し、第２状態から第１状態となったときには、ユーザの発話に対して直ぐに応答できることを報知してもよい。 In the above-described embodiment, the terminal device 10 may measure the time elapsed since entering the second state and, if the user speaks when the measured time is equal to or greater than a predetermined threshold, notify the user that it cannot respond to the utterance immediately. The terminal device 10 may also notify the user that it cannot respond to utterances immediately when the first state changes to the second state, and notify the user that it can respond immediately when the second state changes to the first state.

上述した実施形態においては、ユーザが発話した音声に対して端末装置１０やサーバ装置２０が応答しているが、ユーザからの入力は音声に限定されるものではない。例えば、ユーザが端末装置１０において会話や質問の文章を入力し、入力された文章に対応した情報をユーザに提示するようにしてもよい。この構成によれば、チャットのように文字の入力でコミュニケーションを図ることができる。 In the embodiment described above, the terminal device 10 and the server device 20 respond to the voice spoken by the user, but the input from the user is not limited to the voice. For example, the user may input a sentence of a conversation or a question on the terminal device 10 and present information corresponding to the input sentence to the user. According to this configuration, communication can be achieved by inputting characters as in chat.

上述した実施形態においては、コミュニケーションシステム１は、端末装置１０とサーバ装置２０との構成に限定されるものではなく他の構成であってもよい。例えば、サーバ装置２０については、上記の刊行物の「自然文質問への直接回答を実現する知識Ｑ＆Ａ」に記載されているように、端末装置１０から送信された音声信号を受信するフロントサーバと、データベース型Ｑ＆Ａサーバと、検索型Ｑ＆Ａサーバとで構成してもよい。 In the above-described embodiment, the communication system 1 is not limited to the configuration of the terminal device 10 and the server device 20 and may have another configuration. For example, as described in the above-mentioned publication “Knowledge Q & A for Realizing Direct Answers to Natural Sentence Questions”, the server device 20 may be composed of a front server that receives the voice signal transmitted from the terminal device 10, a database type Q & A server, and a search type Q & A server.

フロントサーバは、端末装置１０から送信されたユーザ音声信号を音声認識してテキストデータを生成する。フロントサーバは、生成したテキストデータをデータベース型Ｑ＆Ａサーバへ送信し、データベース型Ｑ＆Ａサーバで回答を得られた場合には、得られた回答を端末装置１０へ送信する。また、フロントサーバは、データベース型Ｑ＆Ａサーバで回答を得られなかった場合には、生成したテキストデータを検索型Ｑ＆Ａサーバへ送信する。フロントサーバは、検索型Ｑ＆Ａサーバで得られた回答を端末装置１０へ送信する。 The front server recognizes the user voice signal transmitted from the terminal device 10 and generates text data. The front server transmits the generated text data to the database type Q & A server, and when an answer is obtained by the database type Q & A server, the obtained answer is transmitted to the terminal device 10. Further, when the front server cannot obtain an answer from the database type Q & A server, the front server transmits the generated text data to the search type Q & A server. The front server transmits the answer obtained by the search type Q & A server to the terminal device 10.

データベース型Ｑ＆Ａサーバは、知識データベースを有するサーバである。データベース型Ｑ＆Ａサーバは、フロントサーバから送られたテキストデータが表す質問を解析し、質問の対象と属性を抽出する。データベース型Ｑ＆Ａサーバは、抽出した対象と属性を知識データベースにおいて検査する。例えば、質問の内容が「エベレストの高さは？」という質問である場合、データベース型Ｑ＆Ａサーバは、「エベレスト」という対象と、「標高」という属性を抽出する。知識データベースにおいては、富士山やエベレスト、キリマンジャロなどの山の名称と標高とが対応付けて格納されており、データベース型Ｑ＆Ａサーバは、知識データベースからエベレストの標高を抽出し、抽出した標高をフロントサーバへ送信する。 The database type Q & A server is a server having a knowledge database. The database type Q & A server analyzes the question represented by the text data sent from the front server and extracts the object and the attribute of the question. The database type Q & A server then looks up the extracted object and attribute in the knowledge database. For example, when the question is “What is the height of Everest?”, the database type Q & A server extracts the object “Everest” and the attribute “elevation”. The knowledge database stores the names of mountains such as Mt. Fuji, Everest, and Kilimanjaro in association with their elevations; the database type Q & A server extracts the elevation of Everest from the knowledge database and transmits it to the front server.

検索型Ｑ＆Ａサーバは、検索エンジンを用いてユーザの発話に対する回答を得るサーバである。検索型Ｑ＆Ａサーバは、フロントサーバから送られたテキストデータから検索エンジンへ送るキーワードを抽出し、抽出したキーワードを検索エンジンへ送る。検索型Ｑ＆Ａサーバは、検索エンジンの検索結果からユーザの発話に対する回答を生成し、生成した回答をフロントサーバへ送信する。 The search-type Q & A server is a server that obtains an answer to a user's utterance using a search engine. The search-type Q & A server extracts keywords to be sent to the search engine from the text data sent from the front server, and sends the extracted keywords to the search engine. The search-type Q & A server generates an answer to the user's utterance from the search engine search result, and transmits the generated answer to the front server.
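The division of labor among the three servers is essentially a lookup with a fallback: the front server tries the database type Q & A server first and turns to the search type Q & A server only when no answer is found. A minimal sketch follows, assuming the question has already been reduced to an (object, attribute) pair; the names `KNOWLEDGE`, `database_qa`, `front_server`, and the `search_qa` callback are hypothetical, and the only stored fact is the Mt. Fuji elevation given earlier in the description.

```python
from typing import Callable, Optional

# Toy knowledge database keyed by (object, attribute).
KNOWLEDGE = {("Mt. Fuji", "elevation"): "3776 m"}


def database_qa(obj: str, attr: str) -> Optional[str]:
    """Database type Q & A: return the stored value, or None if unknown."""
    return KNOWLEDGE.get((obj, attr))


def front_server(obj: str, attr: str, search_qa: Callable[[str], str]) -> str:
    """Try the knowledge database first, then fall back to search type Q & A."""
    answer = database_qa(obj, attr)
    if answer is not None:
        return answer
    return search_qa(f"{obj} {attr}")   # keywords handed to the search side
```

Voice recognition and question analysis, which the description assigns to the front server and the database type Q & A server respectively, are outside this sketch.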

上述した実施形態においては、端末装置１０は、第２状態の場合にコミュニケーションの間を保つ音声を放音するが、この構成に限定されるものではない。例えば、端末装置１０に記憶されている楽曲のデータを再生してコミュニケーションの間を保つようにしてもよい。 In the embodiment described above, the terminal device 10 emits a sound that keeps communication during the second state, but is not limited to this configuration. For example, music data stored in the terminal device 10 may be reproduced to keep communication.

上述した実施形態においては、端末装置１０は、第２状態のときにはコミュニケーションの間を保つ自然文を出力する構成となっているが、この構成に限定されるものではない。例えば、予め定めた時間帯や予め定めた位置など、所定の条件に合致する場合には第１状態であってもコミュニケーションの間を保つ自然文を出力する構成としてもよい。 In the above-described embodiment, the terminal device 10 outputs a natural-language filler sentence when it is in the second state, but the configuration is not limited to this. For example, when a predetermined condition such as a predetermined time period or a predetermined position is met, the terminal device 10 may output a filler sentence even in the first state.

上述した実施形態においては、端末装置１０内において仮想化ネットワークを構築し、ユーザ音声信号を仮想化ネットワークに構築されたデータ保管部に記憶させるようにしてもよい。この構成においては、端末装置１０は、第２状態にある場合、ユーザ音声信号を仮想化ネットワークに構築されたデータ保管部に記憶させる。端末装置１０は、第２状態から第１状態になった場合、データ保管部から仮想化ネットワークを介してユーザ音声信号を読み出し、読み出したユーザ音声信号をサーバ装置２０へ送信する。 In the above-described embodiment, a virtual network may be constructed in the terminal device 10, and the user voice signal may be stored in a data storage unit constructed in the virtual network. In this configuration, when the terminal device 10 is in the second state, the terminal device 10 stores the user voice signal in the data storage unit built in the virtual network. When the terminal device 10 changes from the second state to the first state, the terminal device 10 reads the user voice signal from the data storage unit via the virtual network, and transmits the read user voice signal to the server device 20.

上述した実施形態においては、ユーザ音声信号を端末装置１０からサーバ装置２０へ送信しているが、端末装置１０においてユーザ音声信号を音声認識してテキストデータを生成し、生成したテキストデータをサーバ装置２０へ送信する構成としてもよい。この構成においては、サーバ装置２０は、送信されたテキストデータに基いて、ユーザの発話に対応する回答を生成する。 In the above-described embodiment, the user voice signal is transmitted from the terminal device 10 to the server device 20. Alternatively, the terminal device 10 may perform voice recognition on the user voice signal to generate text data and transmit the generated text data to the server device 20. In this configuration, the server device 20 generates an answer corresponding to the user's utterance based on the transmitted text data.
また、上述した実施形態においては、回答音声信号をサーバ装置２０から端末装置１０へ送信しているが、端末装置１０が放音する音声を示すテキストデータをサーバ装置２０から端末装置１０へ送信する構成としてもよい。この構成においては、端末装置１０は、サーバ装置２０から送信されたテキストデータから音声合成を行い、テキストデータの内容を発話する。 Also, in the above-described embodiment, the answer voice signal is transmitted from the server device 20 to the terminal device 10. Alternatively, text data indicating the voice to be emitted by the terminal device 10 may be transmitted from the server device 20 to the terminal device 10. In this configuration, the terminal device 10 performs speech synthesis on the text data transmitted from the server device 20 and utters the content of the text data.
また、端末装置１０とサーバ装置２０との間でやり取りする情報は、音声信号やテキストデータに限定されるものではなく、ユーザの音声やサーバ装置２０からの回答を符号化してもよい。例えば、「おはよう」という挨拶を「Ａ０１」、「こんにちは」という挨拶を「Ａ０２」、「今晩は」という挨拶を「Ａ０３」と符号化し、符号化後のデータを通信先の装置へ送信してもよい。端末装置１０とサーバ装置２０は、符号化された情報と符号化される前の情報との対応関係を記憶しており、符号化された情報を取得した装置は、記憶している対応関係を参照し、取得した情報をテキストデータに変換して処理する。 Further, the information exchanged between the terminal device 10 and the server device 20 is not limited to voice signals or text data; the user's voice or the answer from the server device 20 may be encoded. For example, the greeting "good morning" may be encoded as "A01", the greeting "hello" as "A02", and the greeting "good evening" as "A03", and the encoded data may be transmitted to the destination device. The terminal device 10 and the server device 20 each store the correspondence between the encoded information and the information before encoding. A device that acquires encoded information refers to the stored correspondence, converts the acquired information into text data, and processes it.
また、質問に関する音声を符号化し、日時に関する音声をパラメータとするようにしてもよい。例えば、ユーザの音声が「今日の天気は？」という音声である場合、「今日」という日について「天気」の質問をしていることとなる。この場合、端末装置１０は、天気の質問を「Ｂ０１」と符号化し、「今日」という音声を「today」というパラメータに変換してサーバ装置２０へ送信する。端末装置１０とサーバ装置２０は、符号化された情報及びパラメータと、符号化される前の情報との対応関係を記憶しており、符号化された情報やパラメータを取得した装置は、記憶している対応関係を参照し、取得した情報をテキストデータに変換して処理する。例えば、サーバ装置２０は、「Ｂ０１」という情報と「today」という情報を取得すると、今日の天気についての質問と解釈し、今日の天気についての情報を端末装置１０へ送信する。 In addition, the voice relating to a question may be encoded, and the voice relating to a date and time may be treated as a parameter. For example, if the user's voice is "What is the weather today?", the user is asking a "weather" question about the day "today". In this case, the terminal device 10 encodes the weather question as "B01", converts the voice "today" into the parameter "today", and transmits both to the server device 20. The terminal device 10 and the server device 20 each store the correspondence between the encoded information and parameters and the information before encoding. A device that acquires encoded information or parameters refers to the stored correspondence, converts the acquired information into text data, and processes it. For example, when the server device 20 acquires the information "B01" and the information "today", it interprets them as a question about today's weather and transmits information about today's weather to the terminal device 10.
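The shared correspondence tables described above can be illustrated with a small sketch. The codes "A01" to "A03", "B01", and the parameter "today" come from the text; the function names `encode` and `decode`, the substring matching, and the form of the reconstructed text are assumptions made only for this example.

```python
from typing import Optional, Tuple

# Correspondence tables that both devices are assumed to hold.
GREETING_CODES = {"おはよう": "A01", "こんにちは": "A02", "今晩は": "A03"}
QUESTION_CODES = {"天気": "B01"}   # "weather" question
DATE_PARAMS = {"今日": "today"}    # "today" date expression

# Reverse tables used by the receiving side.
CODE_TO_TEXT = {
    code: text
    for table in (GREETING_CODES, QUESTION_CODES)
    for text, code in table.items()
}
PARAM_TO_TEXT = {param: word for word, param in DATE_PARAMS.items()}


def encode(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (code, parameter) for a recognized utterance, or None."""
    if text in GREETING_CODES:
        return GREETING_CODES[text], None
    for topic, code in QUESTION_CODES.items():
        if topic in text:
            # Pick the first date expression found in the utterance, if any.
            param = next((p for word, p in DATE_PARAMS.items() if word in text), None)
            return code, param
    return None


def decode(code: str, param: Optional[str] = None) -> str:
    """Convert a received code and optional parameter back into text data."""
    text = CODE_TO_TEXT[code]
    if param is not None:
        return f"{PARAM_TO_TEXT[param]}の{text}"
    return text
```

Under these assumptions, `encode("今日の天気は？")` yields the pair `("B01", "today")`, which the receiving side can interpret as a question about today's weather.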

上述した実施形態においては、ステップＳＡ１とステップＳＡ２の順番を逆にしてもよく、端末装置１０は、無線基地局の圏内か否かを判断した後に、ユーザ音声信号を取得し、無線基地局の圏内の場合には、取得したユーザ音声信号をサーバ装置２０へ送信し、無線基地局の圏外の場合には、取得したユーザ音声信号を記憶部１０２に記憶させるようにしてもよい。 In the above-described embodiment, the order of step SA1 and step SA2 may be reversed. That is, the terminal device 10 may first determine whether it is within range of a radio base station and then acquire the user voice signal. If it is within range, it transmits the acquired user voice signal to the server device 20; if it is out of range, it stores the acquired user voice signal in the storage unit 102.
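The reversed step order can be sketched as a single function. The function name `handle_utterance` and its four callbacks are hypothetical; they stand in for the coverage check, voice acquisition, transmission to the server device 20, and storage in the storage unit 102.

```python
from typing import Callable


def handle_utterance(
    in_range: Callable[[], bool],
    acquire: Callable[[], bytes],
    send: Callable[[bytes], None],
    store: Callable[[bytes], None],
) -> str:
    """Check base-station coverage first, then acquire and route the signal."""
    reachable = in_range()   # coverage is determined before acquisition
    signal = acquire()       # the user voice signal is acquired afterwards
    if reachable:
        send(signal)         # within range: transmit to the server device
        return "sent"
    store(signal)            # out of range: keep the signal locally
    return "stored"
```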

本発明に係る機能を実現するプログラムは、磁気記録媒体（磁気テープ、磁気ディスク（ＨＤＤ（Hard Disk Drive）、ＦＤ（Flexible Disk））など）、光記録媒体（光ディスクなど）、光磁気記録媒体、半導体メモリなどのコンピュータ読取り可能な記録媒体に記憶した状態で提供し、各装置にインストールしてもよい。また、通信網２を介してプログラムをダウンロードして各装置にインストールしてもよい。 The program for realizing the functions according to the present invention includes a magnetic recording medium (magnetic tape, magnetic disk (HDD (Hard Disk Drive), FD (Flexible Disk)), etc.), optical recording medium (optical disk, etc.), magneto-optical recording medium, It may be provided in a state stored in a computer-readable recording medium such as a semiconductor memory and installed in each device. Alternatively, the program may be downloaded via the communication network 2 and installed in each device.

１…コミュニケーションシステム、２…通信網、１０…端末装置、２０…サーバ装置、１０１…制御部、１０２…記憶部、１０３…表示部、１０４…操作部、１０５…通信部、１０７…音声処理部、２０１…制御部、２０２…記憶部、２０３…表示部、２０４…操作部、２０５…通信部、１００１…取得手段、１００２…第１送信手段、１００３…第１出力手段、１００４…第２出力手段、２００１…受信手段、２００２…生成手段、２００３…第２送信手段 DESCRIPTION OF SYMBOLS 1 ... Communication system, 2 ... Communication network, 10 ... Terminal device, 20 ... Server device, 101 ... Control unit, 102 ... Storage unit, 103 ... Display unit, 104 ... Operation unit, 105 ... Communication unit, 107 ... Voice processing unit, 201 ... Control unit, 202 ... Storage unit, 203 ... Display unit, 204 ... Operation unit, 205 ... Communication unit, 1001 ... Acquisition means, 1002 ... First transmission means, 1003 ... First output means, 1004 ... Second output means, 2001 ... Receiving means, 2002 ... Generating means, 2003 ... Second transmission means

Claims

A terminal device comprising:
acquisition means for acquiring first information from a user;
first output means for outputting second information for communicating with the user when the acquisition means acquires the first information while communication with a server device is impossible;
first transmission means for transmitting the first information to the server device when the acquisition means acquires the first information while communication with the server device is possible; and
second output means for receiving third information transmitted from the server device as a response to the first information transmitted by the first transmission means, and outputting the received third information.

A communication system comprising a terminal device and a server device, wherein
the terminal device includes:
acquisition means for acquiring first information from a user;
first output means for outputting second information for communicating with the user when the acquisition means acquires the first information while communication with the server device is impossible;
first transmission means for transmitting the first information to the server device when the acquisition means acquires the first information while communication with the server device is possible; and
second output means for receiving third information transmitted from the server device as a response to the first information transmitted by the first transmission means, and outputting the received third information,
and the server device includes:
receiving means for receiving the first information transmitted from the first transmission means;
generating means for generating third information corresponding to the first information received by the receiving means; and
second transmission means for transmitting the third information to the terminal device.

The communication system, wherein the terminal device includes control means for storing the first information in storage means when the acquisition means acquires the first information while communication with the server device is impossible, and
the first transmission means transmits the first information stored in the storage means to the server device when communication with the server device changes from the impossible state to the possible state.

The communication system according to claim 2, wherein the first transmission means transmits predetermined first information to the server device when communication with the server device is possible at a predetermined timing.

The communication system according to any one of claims 2 to 4, wherein the first information is a voice signal indicating a voice uttered by a user or text data indicating a voice recognition result of the voice.

The communication system according to any one of claims 2 to 5, wherein the first output means outputs the second information by voice, and
the second output means outputs the third information by voice.

The communication system according to claim 6, wherein the sound output from the first output means is different from the sound output from the second output means.

A communication method comprising:
an acquisition step of acquiring first information from a user;
a first output step of outputting second information for communicating with the user when the first information is acquired in the acquisition step while communication with a server device is impossible;
a transmission step of transmitting the first information to the server device when the first information is acquired in the acquisition step while communication with the server device is possible; and
a second output step of receiving third information transmitted from the server device as a response to the first information transmitted in the transmission step, and outputting the received third information.

A program for causing a computer to function as:
acquisition means for acquiring first information from a user;
first output means for outputting second information for communicating with the user when the acquisition means acquires the first information while communication with a server device is impossible;
first transmission means for transmitting the first information to the server device when the acquisition means acquires the first information while communication with the server device is possible; and
second output means for receiving third information transmitted from the server device as a response to the first information transmitted by the first transmission means, and outputting the received third information.