JP2017152913A - Communication system, communication terminal, server device, and information processing method

Info

Publication number: JP2017152913A
Application number: JP2016033333A
Authority: JP
Inventors: 長谷川　進 (Susumu Hasegawa)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2016-02-24
Filing date: 2016-02-24
Publication date: 2017-08-31

Abstract

PROBLEM TO BE SOLVED: To provide a communication system that allows a communication terminal to have, at a predetermined location, a conversation with a user that is suitable for that location. SOLUTION: A portable communication terminal 100, which is capable of conversation and can autonomously image its surroundings, determines whether it has arrived at a predetermined location based on positional information representing the current position of the communication terminal 100. At the predetermined location, the communication terminal 100 images its surroundings using a camera. The communication terminal 100 transmits the positional information of the predetermined location and the image data obtained by the imaging to a server device 200. The server device 200 generates dictionary data for voice recognition based on the positional information and the image data sent from the communication terminal 100, and transmits the generated dictionary data to the communication terminal 100. Upon receiving the dictionary data, the communication terminal 100 has a conversation with a user 700 using the dictionary data. SELECTED DRAWING: Figure 3

Description

本発明は、サーバ装置および通信端末を備えた通信システムと、当該通信端末と、当該サーバ装置と、これらのシステム、端末および装置における情報処理方法とに関する。 The present invention relates to a communication system including a server device and a communication terminal, the communication terminal, the server device, and an information processing method in these systems, terminals, and devices.

従来、音声認識システムが知られている。音声認識システムでは、音声認識用の辞書データが用いられる。 Conventionally, a voice recognition system is known. In the speech recognition system, dictionary data for speech recognition is used.

たとえば、特開２０１５−０７９２３７号公報（特許文献１）に開示された音声認識システムは、場所に関する辞書データである場所ボキャブラリを生成し、当該生成された場所ボキャブラリを用いて音声認識が行われる。 For example, a speech recognition system disclosed in Japanese Patent Laying-Open No. 2015-079237 (Patent Document 1) generates a location vocabulary that is dictionary data related to a location, and performs speech recognition using the generated location vocabulary.

また、特開２００６−１９５３０２号公報（特許文献２）に開示された音声認識システムは、車両が走行している場所（詳しくは、市街地、山間部、および高速道路の各区分）と、当該車両の搭乗者の口元の動きを撮像することにより得られた画像データとに基づいて、音声認識用の辞書データを切り替える構成が開示されている。 In addition, Japanese Patent Laying-Open No. 2006-195302 (Patent Document 2) discloses a speech recognition system configured to switch dictionary data for speech recognition based on the place where a vehicle is traveling (specifically, one of the categories urban area, mountainous area, and expressway) and on image data obtained by imaging the mouth movements of a passenger of the vehicle.

また、特開２００１−１５４６９３号公報（特許文献３）には、可搬かつ会話可能なロボットが開示されている。当該ロボットは、音声認識結果に基づいて行動する。また、当該ロボットは、ロボットの周囲を撮像し、撮像により得られた画像を認識する。さらに、当該ロボットは、画像認識結果に基づいて、音声認識の対象となっている単語に対する重み付けを制御する。 Japanese Patent Laying-Open No. 2001-154693 (Patent Document 3) discloses a portable robot capable of conversation. The robot acts based on speech recognition results. The robot also images its surroundings and recognizes the images obtained by the imaging. Further, based on the image recognition results, the robot controls the weighting of the words that are targets of speech recognition.

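特開２０１５−０７９２３７号公報 Patent Document 1: Japanese Patent Laying-Open No. 2015-079237
特開２００６−１９５３０２号公報 Patent Document 2: Japanese Patent Laying-Open No. 2006-195302
特開２００１−１５４６９３号公報 Patent Document 3: Japanese Patent Laying-Open No. 2001-154693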

しかしながら、従来のシステム等では、現在いる場所に適した会話をユーザと行なうことはできない。 However, in a conventional system or the like, a conversation suitable for the current location cannot be performed with the user.

本発明は、上記の問題点に鑑みなされたものであって、その目的は、現在いる場所に適した会話を通信端末がユーザと行なうことが可能な通信システムと、当該通信システムを構成する通信端末およびサーバ装置と、当該通信システム、通信端末、およびサーバ装置における情報処理方法を提供することにある。 The present invention has been made in view of the above problem, and an object thereof is to provide a communication system in which a communication terminal can have a conversation with a user that is suitable for the current location, a communication terminal and a server device that constitute the communication system, and information processing methods in the communication system, the communication terminal, and the server device.

本発明のある局面に従うと、通信システムは、サーバ装置と、会話が可能であるとともに自律的に周囲を撮像可能な可搬型の通信端末とを備える。前記通信端末は、前記通信端末の現在位置を表す位置情報に基づき、前記通信端末が予め定められた場所に到着したか否かを判断する。通信端末は、前記予め定められた場所において、前記通信端末の周囲をカメラによって撮像する。通信端末は、前記予め定められた場所の位置情報と、前記撮像により得られた第１の画像データとを、前記サーバ装置に送信する。 According to an aspect of the present invention, a communication system includes a server device and a portable communication terminal that can talk and autonomously image the surroundings. The communication terminal determines whether the communication terminal has arrived at a predetermined location based on position information indicating the current position of the communication terminal. The communication terminal images the surroundings of the communication terminal with a camera at the predetermined location. The communication terminal transmits the position information of the predetermined location and the first image data obtained by the imaging to the server device.

前記サーバ装置は、前記通信端末から送られてきた前記位置情報および前記第１の画像データに基づいて、音声認識用の辞書データを生成する。サーバ装置は、生成された前記辞書データを前記通信端末に送信する。前記通信端末は、前記辞書データを受信したことに基づき、前記辞書データを用いて前記通信端末のユーザと会話を行なう。 The server device generates dictionary data for speech recognition based on the position information and the first image data sent from the communication terminal. The server device transmits the generated dictionary data to the communication terminal. The communication terminal has a conversation with the user of the communication terminal using the dictionary data based on the reception of the dictionary data.

本発明の他の局面に従うと、通信端末は、会話が可能であるとともに自律的に周囲を撮像可能な可搬型の端末である。通信端末は、音声入力部と音声出力部とを用いて、前記通信端末のユーザと会話を行なうように構成された会話制御手段と、前記通信端末の現在位置を表す位置情報に基づき、前記通信端末が予め定められた場所に到着したか否かを判断する判断手段と、前記予め定められた場所において、前記通信端末の周囲をカメラによって撮像する撮像手段と、前記予め定められた場所の位置情報と、前記撮像により得られた画像データとを、サーバ装置に送信する送信手段と、前記位置情報および前記画像データに基づいて生成された音声認識用の辞書データを、前記サーバ装置から受信する受信手段とを備える。前記会話制御手段は、前記辞書データを用いて前記通信端末のユーザと会話を行なう。 According to another aspect of the present invention, a communication terminal is a portable terminal that is capable of conversation and can autonomously image its surroundings. The communication terminal includes: conversation control means configured to have a conversation with a user of the communication terminal using a voice input unit and a voice output unit; determination means for determining, based on position information indicating a current position of the communication terminal, whether the communication terminal has arrived at a predetermined location; imaging means for imaging the surroundings of the communication terminal with a camera at the predetermined location; transmitting means for transmitting the position information of the predetermined location and image data obtained by the imaging to a server device; and receiving means for receiving, from the server device, dictionary data for speech recognition generated based on the position information and the image data. The conversation control means has a conversation with the user of the communication terminal using the dictionary data.

本発明のさらに他の局面に従うと、サーバ装置は、会話が可能であるとともに自律的に周囲を撮像可能な可搬型の通信端末と通信する。サーバ装置は、前記通信端末が予め定められた場所において撮像した被写体の画像データと、前記予め定められた場所の位置情報とを、前記通信端末から受信する受信手段と、前記通信端末から送られてきた前記位置情報および前記画像データに基づいて、音声認識用の辞書データを生成する辞書データ生成手段と、生成された前記辞書データを前記通信端末に送信する送信手段とを備える。 According to still another aspect of the present invention, a server device communicates with a portable communication terminal that is capable of conversation and can autonomously image its surroundings. The server device includes: receiving means for receiving, from the communication terminal, image data of a subject imaged by the communication terminal at a predetermined location and position information of the predetermined location; dictionary data generating means for generating dictionary data for speech recognition based on the position information and the image data sent from the communication terminal; and transmitting means for transmitting the generated dictionary data to the communication terminal.

本発明のさらに他の局面に従うと、情報処理方法は、サーバ装置と、会話が可能であるとともに自律的に周囲を撮像可能な可搬型の通信端末とを備えた通信システムにおいて実行される。情報処理方法は、前記通信端末が、前記通信端末の現在位置を表す位置情報に基づき、前記通信端末が予め定められた場所に到着したか否かを判断するステップと、前記通信端末が、前記予め定められた場所において、前記通信端末の周囲をカメラによって撮像するステップと、前記通信端末が、前記予め定められた場所の位置情報と、前記撮像により得られた第１の画像データとを、前記サーバ装置に送信するステップと、前記サーバ装置が、前記通信端末から送られてきた前記位置情報および前記第１の画像データに基づいて、音声認識用の辞書データを生成するステップと、前記サーバ装置が、生成された前記辞書データを前記通信端末に送信するステップと、前記通信端末が、前記辞書データを受信したことに基づき、前記辞書データを用いて前記通信端末のユーザと会話を行なうステップとを備える。 According to still another aspect of the present invention, an information processing method is executed in a communication system including a server device and a portable communication terminal that is capable of conversation and can autonomously image its surroundings. The information processing method includes the steps of: the communication terminal determining, based on position information indicating a current position of the communication terminal, whether the communication terminal has arrived at a predetermined location; the communication terminal imaging its surroundings with a camera at the predetermined location; the communication terminal transmitting the position information of the predetermined location and first image data obtained by the imaging to the server device; the server device generating dictionary data for speech recognition based on the position information and the first image data sent from the communication terminal; the server device transmitting the generated dictionary data to the communication terminal; and the communication terminal, upon receiving the dictionary data, having a conversation with a user of the communication terminal using the dictionary data.

本発明のさらに他の局面に従うと、情報処理方法は、会話が可能であるとともに自律的に周囲を撮像可能な可搬型の通信端末において実行される。情報処理方法は、音声入力部と音声出力部とを用いて、前記通信端末のユーザと会話を行なうステップと、前記通信端末の現在位置を表す位置情報に基づき、前記通信端末が予め定められた場所に到着したか否かを判断するステップと、前記予め定められた場所において、前記通信端末の周囲をカメラによって撮像するステップと、前記予め定められた場所の位置情報と、前記撮像により得られた画像データとを、サーバ装置に送信するステップと、前記位置情報および前記画像データに基づいて生成された音声認識用の辞書データを、前記サーバ装置から受信するステップとを備える。前記会話を行なうステップでは、前記辞書データを用いて前記通信端末のユーザと会話を行なう。 According to still another aspect of the present invention, an information processing method is executed in a portable communication terminal that is capable of conversation and can autonomously image its surroundings. The information processing method includes the steps of: having a conversation with a user of the communication terminal using a voice input unit and a voice output unit; determining, based on position information indicating a current position of the communication terminal, whether the communication terminal has arrived at a predetermined location; imaging the surroundings of the communication terminal with a camera at the predetermined location; transmitting the position information of the predetermined location and image data obtained by the imaging to a server device; and receiving, from the server device, dictionary data for speech recognition generated based on the position information and the image data. In the step of having the conversation, the conversation with the user of the communication terminal is carried out using the dictionary data.

本発明のさらに他の局面に従うと、情報処理方法は、会話が可能であるとともに自律的に周囲を撮像可能な可搬型の通信端末と通信するサーバ装置において実行される。情報処理方法は、前記通信端末が予め定められた場所において撮像した被写体の画像データと、前記予め定められた場所の位置情報とを、前記通信端末から受信するステップと、前記通信端末から送られてきた前記位置情報および前記画像データに基づいて、音声認識用の辞書データを生成するステップと、生成された前記辞書データを前記通信端末に送信するステップとを備える。 According to still another aspect of the present invention, an information processing method is executed in a server device that communicates with a portable communication terminal that is capable of conversation and can autonomously image its surroundings. The information processing method includes the steps of: receiving, from the communication terminal, image data of a subject imaged by the communication terminal at a predetermined location and position information of the predetermined location; generating dictionary data for speech recognition based on the position information and the image data sent from the communication terminal; and transmitting the generated dictionary data to the communication terminal.

本発明によれば、現在いる場所に適した会話を通信端末がユーザと行なうことが可能となる。 According to the present invention, it becomes possible for a communication terminal to perform a conversation suitable for a current location with a user.

本実施の形態にかかる通信システムの概略構成を説明するための図である。 FIG. 1 is a diagram for explaining the schematic configuration of the communication system according to the present embodiment.
通信端末がレストランに到着した場合の処理を説明するための図である。 FIG. 2 is a diagram for explaining the processing when the communication terminal arrives at a restaurant.
レストランで行われる処理を詳しく説明するための図である。 FIG. 3 is a diagram for explaining in detail the processing performed at the restaurant.
通信端末のハードウェア構成の典型例を表した図である。 FIG. 4 is a diagram showing a typical example of the hardware configuration of the communication terminal.
サーバ装置のハードウェア構成の典型例を表した図である。 FIG. 5 is a diagram showing a typical example of the hardware configuration of the server device.
スケジュールデータにしたがって、ユーザが自宅から目的地であるレストランに向かっている状態を表した図である。 FIG. 6 is a diagram showing a state in which the user is heading from home to the destination restaurant in accordance with schedule data.
通信端末の機能的構成を説明するための機能ブロック図である。 FIG. 7 is a functional block diagram for explaining the functional configuration of the communication terminal.
記憶部に記憶されている訪問履歴情報の概略構成を説明するための図である。 FIG. 8 is a diagram for explaining the schematic configuration of the visit history information stored in the storage unit.
サーバ装置の機能的構成を説明するための機能ブロック図である。 FIG. 9 is a functional block diagram for explaining the functional configuration of the server device.
サーバ装置の記憶部に記憶されているデータテーブルの概略構成を説明するための図である。 FIG. 10 is a diagram for explaining the schematic configuration of the data table stored in the storage unit of the server device.
通信システムにおける処理の流れを説明するためのシーケンスチャートである。 FIG. 11 is a sequence chart for explaining the flow of processing in the communication system.
通信端末で行われる処理の流れを説明するためのフローチャートである。 FIG. 12 is a flowchart for explaining the flow of processing performed in the communication terminal.
サーバ装置で行われる処理の流れを説明するためのフローチャートである。 FIG. 13 is a flowchart for explaining the flow of processing performed in the server device.

以下、図面を参照しつつ、本発明の実施の形態について説明する。以下の説明では、同一の部品には同一の符号を付してある。それらの名称および機能も同じである。したがって、それらについての詳細な説明は繰り返さない。 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description, the same parts are denoted by the same reference numerals. Their names and functions are also the same. Therefore, detailed description thereof will not be repeated.

［実施の形態１］ [Embodiment 1]
＜Ａ．システム構成＞ <A. System configuration>
図１は、本実施の形態にかかる通信システムの概略構成を説明するための図である。図１を参照して、通信システム１は、会話が可能であるとともに自律的に周囲を撮像可能な可搬型の通信端末１００と、サーバ装置２００とを含んで構成される。以下では、通信端末１００として、プログラムの実行により、筐体を構成する可動部を自動的に動かすことが可能な端末（いわゆる、ロボット型の端末）を例に挙げて説明する。 FIG. 1 is a diagram for explaining a schematic configuration of a communication system according to the present embodiment. Referring to FIG. 1, the communication system 1 includes a portable communication terminal 100 that can talk and that can autonomously image the surroundings, and a server device 200. Hereinafter, as the communication terminal 100, a terminal (so-called robot-type terminal) that can automatically move a movable part that configures a housing by executing a program will be described as an example.

具体的には、通信端末１００は、手、足、頭部、胴部等を備える。通信端末１００は、典型的には、歩行可能な自律型の移動体として構成されている。頭部は、胴部に対して所定の角度内において回転可能に構成されている。また、頭部には、カメラが内蔵されている。なお、通信端末１００は、上記のような人型のロボットに限定されるものではない。 Specifically, the communication terminal 100 includes a hand, a foot, a head, a torso, and the like. The communication terminal 100 is typically configured as an autonomous mobile body capable of walking. The head is configured to be rotatable within a predetermined angle with respect to the trunk. The head has a built-in camera. The communication terminal 100 is not limited to the humanoid robot as described above.

通信端末１００は、ユーザ７００によって持ち運ばれることにより、様々な場所で利用される。通信端末１００は、基地局５００およびネットワーク６００を介して、サーバ装置２００と通信する。 The communication terminal 100 is used in various places by being carried by the user 700. Communication terminal 100 communicates with server apparatus 200 via base station 500 and network 600.

＜Ｂ．処理の概要＞ <B. Outline of processing>
以下、通信システム１における処理の概要について説明する。通信端末１００は、通信端末１００の現在地を表す位置情報に基づき、通信端末１００が予め定められた場所に到着したか否かを判断する。以下では、予め定められた場所として、ユーザ７００の目的地の一つであるレストラン３００を例に挙げて説明する。 Hereinafter, an outline of processing in the communication system 1 will be described. The communication terminal 100 determines whether or not the communication terminal 100 has arrived at a predetermined location based on position information indicating the current location of the communication terminal 100. Hereinafter, the restaurant 300, which is one of the destinations of the user 700, will be described as an example of the predetermined place.

図２は、通信端末１００がレストラン３００に到着した場合の処理を説明するための図である。図２を参照して、通信端末１００は、レストラン３００において、通信端末１００の周囲を内蔵カメラによって撮像する。 FIG. 2 is a diagram for explaining processing when the communication terminal 100 arrives at the restaurant 300. Referring to FIG. 2, communication terminal 100 captures an image of surroundings of communication terminal 100 with a built-in camera in restaurant 300.

通信端末１００は、内蔵されたカメラの向きを変えることにより、レストラン３００の店内を撮像する。このような撮像により、通信端末１００は、たとえば、テーブル９１１に置かれたメニュー９２１、壁に掛けられたメニュー９２３、テーブルに置かれた料理、テーブルに置かれた皿、テーブルに置かれたグラス、スタッフの名札、その他、店内の状況等が写り込んだ画像データを得ることができる。 The communication terminal 100 images the inside of the restaurant 300 by changing the direction of the built-in camera. Through such imaging, the communication terminal 100 can obtain image data showing, for example, a menu 921 placed on a table 911, a menu 923 hung on a wall, dishes, plates, and glasses placed on the tables, staff name tags, and other conditions inside the restaurant.

なお、通信端末１００は、カメラを内蔵した頭部を胴部に対して回転することにより、歩行を伴うことなくカメラの向きを変えることができる。また、通信端末１００は、歩行と頭部の回転等との組み合わせにより、カメラの位置および向きを変更してもよい。 Note that the communication terminal 100 can change the orientation of the camera without walking by rotating the head with a built-in camera relative to the torso. The communication terminal 100 may change the position and orientation of the camera by a combination of walking and head rotation.

図３は、レストラン３００で行われる処理を詳しく説明するための図である。図３を参照して、通信端末１００がレストラン３００に到着すると、予め定められた報知処理を実行する。 FIG. 3 is a diagram for explaining in detail the processing performed in the restaurant 300. Referring to FIG. 3, when communication terminal 100 arrives at restaurant 300, a predetermined notification process is executed.

具体的には、通信端末１００は、カメラによる撮像許可を通信端末１００のユーザ７００から得るための発話を行なう。詳しくは、通信端末１００は、通信端末１００の周囲の撮像許可をユーザ７００から得るための発話を行なう。 Specifically, communication terminal 100 performs an utterance for obtaining permission for imaging by a camera from user 700 of communication terminal 100. Specifically, the communication terminal 100 performs an utterance to obtain permission for imaging around the communication terminal 100 from the user 700.

たとえば、ユーザ７００が所持するバッグあるいは衣服のポケットに通信端末１００が入っているため、通信端末１００が発話してもユーザ７００に音声が届かないような場合には、通信端末１００は、発話の前に、内蔵されたバイブレータによって振動してもよい。あるいは、このような場合、通信端末１００は、所定のアラームを鳴動させてもよい。通信端末１００がバッグ内であるか否かは、一例として、通信端末１００の周囲の明るさによって判定することができる。 For example, when the communication terminal 100 is inside a bag carried by the user 700 or in a pocket of the user's clothes, so that the voice would not reach the user 700 even if the communication terminal 100 spoke, the communication terminal 100 may vibrate using a built-in vibrator before speaking. Alternatively, in such a case, the communication terminal 100 may sound a predetermined alarm. Whether or not the communication terminal 100 is inside a bag can be determined, as one example, from the brightness around the communication terminal 100.
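
The bag check described above can be sketched as follows. This is only a minimal illustration: the function name `plan_permission_prompt`, the lux threshold, and the assumption that an alarm is followed by the spoken request are all hypothetical and not specified in the description.

```python
from typing import List


def plan_permission_prompt(ambient_lux: float,
                           dark_threshold_lux: float = 5.0,
                           use_alarm: bool = False) -> List[str]:
    """Return the ordered actions to take before asking for imaging permission.

    ambient_lux        -- brightness measured around the terminal (assumed sensor value)
    dark_threshold_lux -- hypothetical threshold below which the terminal is
                          treated as being inside a bag or pocket
    use_alarm          -- if True, sound the alarm instead of vibrating
    """
    in_bag = ambient_lux < dark_threshold_lux
    if not in_bag:
        # The user can hear the terminal, so it simply speaks.
        return ["speak"]
    # Get the user's attention first, then make the permission utterance.
    attention = "alarm" if use_alarm else "vibrate"
    return [attention, "speak"]
```

A real terminal would also have to distinguish a bag from a dark room; brightness is named in the description only as one example of a criterion.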

ユーザ７００が通信端末１００に対して撮像許可を与える発話を行った場合、通信端末１００は、通信端末１００の周囲の撮像を開始する。なお、撮像条件は、事前に、適宜設定可能である。たとえば、撮像の時間間隔（撮像周期）を、予め設定された時間間隔とすることができる。当該時間間隔は、カメラの向きの変更時間に基づき設定されていてもよい。また、撮像枚数の上限を規定することも可能である。さらには、撮像を許可する前に、発話により、撮像条件を設定可能なように、通信端末１００を構成してもよい。 When the user 700 makes an utterance that gives an imaging permission to the communication terminal 100, the communication terminal 100 starts imaging around the communication terminal 100. The imaging conditions can be set as appropriate in advance. For example, the imaging time interval (imaging cycle) can be set to a preset time interval. The time interval may be set based on the change time of the camera orientation. It is also possible to define an upper limit for the number of images to be captured. Furthermore, the communication terminal 100 may be configured so that the imaging conditions can be set by speaking before allowing the imaging.
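
As a rough sketch of the imaging conditions mentioned above (a preset imaging interval and an upper limit on the number of images), the following loop may help. The names `ImagingConditions` and `capture_series` are hypothetical, and the camera is represented by an injected `capture` callable rather than any real camera API.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImagingConditions:
    interval_s: float  # imaging cycle in seconds (e.g. time needed to turn the camera)
    max_images: int    # upper limit on the number of images


def capture_series(capture: Callable[[], bytes],
                   conditions: ImagingConditions,
                   sleep: Callable[[float], None] = time.sleep) -> List[bytes]:
    """Capture up to max_images frames, waiting interval_s between shots."""
    images: List[bytes] = []
    for index in range(conditions.max_images):
        if index > 0:
            # Wait one imaging cycle before every shot except the first.
            sleep(conditions.interval_s)
        images.append(capture())
    return images
```

Passing `sleep` as a parameter is only a design convenience so that the loop can be exercised without real delays.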

なお、撮像許可を与える発話は、通常使用される言葉（たとえば、“良いよ”、“ＯＫ”等）のみならず、予め設定された特別なフレーズであってもよい。 Note that the utterance giving the imaging permission may be a special phrase set in advance, as well as a commonly used word (for example, “good”, “OK”, etc.).
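
One possible way to check a recognized utterance against both commonly used words and a preset special phrase is sketched below. It assumes that speech recognition has already produced text; the function name, the default phrase list, and the use of Unicode NFKC normalization (so that full-width "ＯＫ" matches "OK") are illustrative assumptions.

```python
import unicodedata
from typing import Iterable, Optional

# Hypothetical defaults, taken from the examples given in the description.
DEFAULT_PHRASES = ("良いよ", "OK")


def _normalize(text: str) -> str:
    """Fold full-width characters and letter case, and trim whitespace."""
    return unicodedata.normalize("NFKC", text).strip().casefold()


def is_permission_granted(recognized_text: str,
                          common_phrases: Iterable[str] = DEFAULT_PHRASES,
                          special_phrase: Optional[str] = None) -> bool:
    """True if the utterance matches a common phrase or the preset special phrase."""
    text = _normalize(recognized_text)
    if special_phrase is not None and text == _normalize(special_phrase):
        return True
    return text in {_normalize(phrase) for phrase in common_phrases}
```

Exact matching is the simplest choice here; a deployed system would more likely rely on the recognizer's own intent handling.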

通信端末１００は、撮像を行なうと、通信端末１００の現在地の位置情報とともに、当該撮像により得られた画像データを、サーバ装置２００に送信する。画像データの送信は、位置情報に関連付けられていればよく、全ての撮像が終了してから全ての画像データを一括してサーバ装置に送ってもよいし、あるいは撮像ごとに画像データを逐次、サーバ装置２００に送信してもよい。 When imaging is performed, the communication terminal 100 transmits image data obtained by the imaging to the server device 200 together with the position information of the current location of the communication terminal 100. The transmission of the image data only needs to be associated with the position information, and all the image data may be sent collectively to the server device after all imaging is completed, or the image data is sequentially transmitted for each imaging, You may transmit to the server apparatus 200.
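
The two transmission modes described above (one batch after all imaging, or one message per image) can be illustrated as follows. The payload layout and the name `build_uploads` are assumptions made for this sketch; the description only requires that the image data be associated with the position information.

```python
from typing import Dict, List, Sequence, Tuple

Position = Tuple[float, float]  # (latitude, longitude) in degrees


def build_uploads(position: Position,
                  images: Sequence[bytes],
                  batch: bool) -> List[Dict[str, object]]:
    """Build the messages sent to the server, each tagged with the position.

    batch=True  -- one message carrying all images (sent after imaging ends)
    batch=False -- one message per image (sent as each image is captured)
    """
    if batch:
        return [{"position": position, "images": list(images)}]
    return [{"position": position, "images": [image]} for image in images]
```

In either mode every message carries the same position, so the server can associate each image with the location.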

サーバ装置２００は、通信端末１００から送られてきた位置情報および画像データに基づいて、音声認識用のユーザ辞書データを生成する。サーバ装置２００は、生成されたユーザ辞書データを通信端末１００に送信する。 The server device 200 generates user dictionary data for voice recognition based on the position information and image data sent from the communication terminal 100. The server device 200 transmits the generated user dictionary data to the communication terminal 100.

詳しくは、サーバ装置２００は、位置情報と画像データとを用いて、予め定められた場所であるレストラン３００に関連するキーワードを決定する。より詳しくは、サーバ装置２００は、位置情報と画像データと予め用意されたデータベース（図１０）とを用いて、レストラン３００に関連するキーワードを決定する。さらに、サーバ装置２００は、決定されたキーワードを用いてユーザ辞書データを生成し、当該ユーザ辞書データを通信端末１００に送信する。 Specifically, the server device 200 determines a keyword related to the restaurant 300 that is a predetermined place using the position information and the image data. More specifically, the server device 200 determines a keyword related to the restaurant 300 using position information, image data, and a database prepared in advance (FIG. 10). Further, the server device 200 generates user dictionary data using the determined keyword, and transmits the user dictionary data to the communication terminal 100.
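
The keyword determination above might be sketched as below. The layout of the database of FIG. 10 is not given in this section, so the table structure, the coordinate tolerance, and the function names are hypothetical. Image recognition is also out of scope: `image_words` stands for words already extracted from the image data (for example, text read from a menu).

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the prepared database (FIG. 10).
PLACE_TABLE: List[Dict[str, object]] = [
    {"name": "Restaurant 300", "lat": 35.0000, "lon": 135.0000,
     "keywords": ["menu", "order", "dessert"]},
]


def find_place(lat: float, lon: float, table: List[Dict[str, object]],
               tolerance_deg: float = 0.0005) -> Optional[Dict[str, object]]:
    """Return the first table row whose coordinates lie within the tolerance."""
    for row in table:
        if abs(row["lat"] - lat) <= tolerance_deg and abs(row["lon"] - lon) <= tolerance_deg:
            return row
    return None


def build_user_dictionary(lat: float, lon: float, image_words: List[str],
                          table: List[Dict[str, object]]) -> Dict[str, object]:
    """Merge place keywords from the table with words found in the images."""
    place = find_place(lat, lon, table)
    words = list(place["keywords"]) if place else []
    for word in image_words:
        if word not in words:  # keep order, avoid duplicates
            words.append(word)
    return {"place": place["name"] if place else None, "words": words}
```

Using both sources is the point of the configuration: the table contributes keywords tied to the place, while the images contribute words specific to what is actually present there.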

通信端末１００は、ユーザ辞書データをサーバ装置２００から受信する。通信端末１００は、当該ユーザ辞書データを用いた音声認識を行なうことにより、ユーザ７００と会話を行なう。 The communication terminal 100 receives user dictionary data from the server device 200. The communication terminal 100 has a conversation with the user 700 by performing voice recognition using the user dictionary data.

以上のように、サーバ装置２００は、予め定められた場所の位置情報と当該場所の画像データとに基づいて決定されたキーワードを利用してユーザ辞書データを生成する。それゆえ、サーバ装置２００は、当該位置情報および当該画像データの少なくとも一方を利用せずにキーワードを決定する構成よりも、予め定められた場所に関連するデータを多く含んだユーザ辞書データを生成することが可能となる。 As described above, the server device 200 generates user dictionary data using keywords determined based on the position information of the predetermined place and the image data of that place. Therefore, the server device 200 can generate user dictionary data containing more data related to the predetermined place than a configuration that determines keywords without using at least one of the position information and the image data.

したがって、通信端末１００は、このようなユーザ辞書データを利用することにより、当該ユーザ辞書データを使用しない構成に比べて、予め定められた場所に適した会話をユーザと行なうことが可能となる。 Therefore, by using such user dictionary data, the communication terminal 100 can perform a conversation with a user that is suitable for a predetermined place as compared to a configuration that does not use the user dictionary data.

なお、上記においては、レストラン３００内のみを撮像する場合を例に挙げているがこれに限定されるものではない。通信端末１００がレストラン３００の前に到着した時点で、ユーザ７００に対して撮像許可を得るための発話を行なってもよい。 In addition, in the above, the case where only the inside of the restaurant 300 is imaged is described as an example, but the present invention is not limited to this. When the communication terminal 100 arrives in front of the restaurant 300, the user 700 may make an utterance for obtaining an imaging permission.

また、撮影許可の発話を行なう場所は、レストラン３００から離れていた場所であってもよい。すなわち、通信端末１００が撮像開始する場所は、レストラン３００から離れた場所であってもよい。 Further, the place where the utterance of permission for photographing is performed may be a place away from the restaurant 300. That is, the place where the communication terminal 100 starts imaging may be a place away from the restaurant 300.

以下では、上記のように、通信端末１００がレストラン３００の前に到着するよりも前に、ユーザ７００に対して撮像許可を得るための発話を行なう構成について説明する。詳しくは、通信端末１００がレストラン３００から所定の距離だけ離れた場所に到達した場合、通信端末１００が、撮像許可を得たことを条件に、撮像を開始する構成について説明する。 The following describes a configuration in which, as noted above, the communication terminal 100 makes an utterance to obtain imaging permission from the user 700 before arriving in front of the restaurant 300. Specifically, a configuration is described in which, when the communication terminal 100 reaches a place a predetermined distance away from the restaurant 300, the communication terminal 100 starts imaging on the condition that imaging permission has been obtained.
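
As a minimal sketch of the distance check implied above, the great-circle (haversine) distance between the current position and the destination can be compared with a trigger distance. The haversine formula is a swapped-in, standard technique; the description does not state how the distance is computed, and the function names and the 50 m default are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_ask_permission(current: Tuple[float, float],
                          destination: Tuple[float, float],
                          trigger_distance_m: float = 50.0) -> bool:
    """True once the terminal is within trigger_distance_m of the destination."""
    return haversine_m(*current, *destination) <= trigger_distance_m
```

For example, two points that differ by 0.001 degrees of latitude are roughly 111 m apart, so the default 50 m trigger would not yet fire at that distance.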

＜Ｃ．ハードウェア構成＞ <C. Hardware configuration>
図４は、通信端末１００のハードウェア構成の典型例を表した図である。図４を参照して、通信端末１００は、主たる構成要素として、プログラムを実行するＣＰＵ１５１と、データを不揮発的に格納するＲＯＭ１５２と、ＣＰＵ１５１によるプログラムの実行により生成されたデータ、又は入力装置を介して入力されたデータを揮発的に格納するＲＡＭ１５３と、データを不揮発的に格納するフラッシュメモリ１５４と、ＬＥＤ１５５と、操作キー１５６と、スイッチ１５７と、ＧＰＳ（Global Positioning System）受信機１５８と、通信ＩＦ１５９と、電源回路１６０と、タッチスクリーン１６１と、マイク１６２と、スピーカ１６３と、カメラ１６４と、駆動装置１６５と、アンテナ１５８１，１５９１とを含む。各構成要素は、相互にデータバスによって接続されている。 FIG. 4 is a diagram illustrating a typical example of the hardware configuration of the communication terminal 100. Referring to FIG. 4, communication terminal 100 includes, as main components, CPU 151 that executes a program, ROM 152 that stores data in a nonvolatile manner, RAM 153 that stores, in a volatile manner, data generated by execution of the program by CPU 151 or data input via an input device, flash memory 154 that stores data in a nonvolatile manner, LED 155, operation keys 156, switch 157, GPS (Global Positioning System) receiver 158, communication IF 159, power supply circuit 160, touch screen 161, microphone 162, speaker 163, camera 164, driving device 165, and antennas 1581 and 1591. The components are connected to one another by a data bus.

タッチスクリーン１６１は、ディスプレイ１６１１と、タッチパネル１６１２により構成される。アンテナ１５８１は、ＧＰＳ受信機１５８用のアンテナである。アンテナ１５９１は、通信ＩＦ１５９用のアンテナである。 The touch screen 161 includes a display 1611 and a touch panel 1612. The antenna 1581 is an antenna for the GPS receiver 158. The antenna 1591 is an antenna for the communication IF 159.

ＬＥＤ１５５は、通信端末１００の動作状態を表す各種の表示ランプである。たとえば、ＬＥＤ１５５は、通信端末１００の主電源のオンまたはオフ状態、およびフラッシュメモリ１５４への読み出しまたは書き込み状態等を表す。 The LEDs 155 are various indicator lamps that show the operating state of the communication terminal 100. For example, the LEDs 155 indicate whether the main power supply of the communication terminal 100 is on or off, and whether the flash memory 154 is being read from or written to.

操作キー１５６は、通信端末１００のユーザが主電源のオンまたはオフ等するためのキー（操作ボタン）である。スイッチ１５７は、電源回路１６０に給電を行なうか否かを切替えるための主電源用のスイッチ、およびその他の各種の押しボタンスイッチである。 The operation key 156 is a key (operation button) for the user of the communication terminal 100 to turn on or off the main power. The switch 157 is a main power switch for switching whether or not to supply power to the power circuit 160 and other various push button switches.

ＧＰＳ受信機１５８は、ＧＰＳ衛星からの電波に基づき、通信端末１００の現在位置の位置情報を取得する。ＧＰＳ受信機１５８によって取得された位置情報は、通信ＩＦ１５９を介して、サーバ装置２００に送信される。通信端末１００による位置情報の取得の開始タイミングについては、後述する。 The GPS receiver 158 acquires position information on the current position of the communication terminal 100 based on radio waves from GPS satellites. The position information acquired by the GPS receiver 158 is transmitted to the server device 200 via the communication IF 159. The timing at which the communication terminal 100 starts acquiring position information will be described later.

通信ＩＦ１５９は、サーバ装置２００に対するデータの送信処理およびサーバ装置２００から送信されたデータの受信処理を行なう。 The communication IF 159 performs processing for transmitting data to the server device 200 and processing for receiving data transmitted from the server device 200.

電源回路１６０は、コンセントを介して受信した商用電源の電圧を降圧し、通信端末１００の各部に電源供給を行なう回路である。 The power supply circuit 160 is a circuit that steps down the voltage of the commercial power received via the outlet and supplies power to each part of the communication terminal 100.

タッチスクリーン１６１は、各種のデータを表示および入力を受け付けるためのデバイスである。ディスプレイ１６１１は、画像を表示するための画面を含んで構成されている。 The touch screen 161 is a device for displaying various data and receiving input. The display 1611 includes a screen for displaying an image.

マイク１６２は、通信端末１００の周囲の音を集音する。たとえば、マイク１６２は、ユーザ７００の発話に基づく音声を集音する。 The microphone 162 collects sounds around the communication terminal 100. For example, the microphone 162 collects sound based on the utterance of the user 700.

スピーカ１６３は、音声を出力する。スピーカ１６３は、ある局面においては、ユーザ等とのコミュニケーションのために、発話を行なう。 The speaker 163 outputs sound. The speaker 163 speaks for communication with a user or the like in a certain aspect.

カメラ１６４は、通信端末１００の周囲の被写体を撮像するための撮像装置である。カメラ１６４による撮像により得られた画像データは、通信ＩＦ１５９を介して、サーバ装置２００に送信される。 The camera 164 is an imaging device for imaging subjects around the communication terminal 100. Image data obtained by imaging with the camera 164 is transmitted to the server device 200 via the communication IF 159.

駆動装置１６５は、通信端末１００の手、足、頭部を駆動させるための駆動機構である。なお、駆動装置１６５により足が駆動されることにより、通信端末１００は歩行する。また、駆動装置１６５によって頭部が胴部に対して回転することにより、カメラ１６４の向きが代わる。 The driving device 165 is a driving mechanism for driving the hand, foot, and head of the communication terminal 100. Note that the communication terminal 100 walks when the foot is driven by the driving device 165. Further, the direction of the camera 164 is changed when the head is rotated with respect to the trunk by the driving device 165.

通信端末１００における処理は、各ハードウェアおよびＣＰＵ１５１により実行されるソフトウェアによって実現される。このようなソフトウェアは、フラッシュメモリ１５４に予め記憶されている場合がある。また、ソフトウェアは、その他の記憶媒体に格納されて、プログラムプロダクトとして流通している場合もある。あるいは、ソフトウェアは、いわゆるインターネットに接続されている情報提供事業者によってダウンロード可能なプログラムプロダクトとして提供される場合もある。このようなソフトウェアは、読取装置によりその記憶媒体から読み取られて、あるいは、通信ＩＦ１５９等を介してダウンロードされた後、フラッシュメモリ１５４に一旦格納される。そのソフトウェアは、ＣＰＵ１５１によってフラッシュメモリ１５４から読み出され、ＲＡＭ１５３に実行可能なプログラムの形式で格納される。ＣＰＵ１５１は、そのプログラムを実行する。 The processing in the communication terminal 100 is realized by each hardware and software executed by the CPU 151. Such software may be stored in advance in the flash memory 154. The software may be stored in other storage media and distributed as a program product. Alternatively, the software may be provided as a program product that can be downloaded by an information provider connected to the so-called Internet. Such software is read from the storage medium by the reading device or downloaded via the communication IF 159 or the like and then temporarily stored in the flash memory 154. The software is read from the flash memory 154 by the CPU 151 and stored in the RAM 153 in the form of an executable program. CPU 151 executes the program.

同図に示される通信端末１００を構成する各構成要素は、一般的なものである。したがって、本発明の本質的な部分は、ＲＡＭ１５３、フラッシュメモリ１５４、記憶媒体に格納されたソフトウェア、あるいはネットワークを介してダウンロード可能なソフトウェアであるともいえる。なお、通信端末１００の各ハードウェアの動作は周知であるので、詳細な説明は繰り返さない。 Each of the components of the communication terminal 100 shown in the figure is a common one. Therefore, it can be said that the essential part of the present invention is the software stored in the RAM 153, the flash memory 154, or a storage medium, or software that can be downloaded via a network. Since the operation of each piece of hardware of the communication terminal 100 is well known, a detailed description is not repeated.

The recording medium is not limited to a DVD-RAM. It may be any medium that carries the program in a fixed manner, such as a DVD-ROM, a CD-ROM, an FD, a hard disk, a magnetic tape, a cassette tape, an optical disc, or a semiconductor memory such as an EEPROM or a flash ROM. The recording medium is a non-transitory medium from which a computer can read the program and the like. The term "program" here covers not only programs directly executable by the CPU but also programs in source form, compressed programs, encrypted programs, and the like.

FIG. 5 is a diagram illustrating a typical example of the hardware configuration of the server device 200. Referring to FIG. 5, the server device 200 includes, as main components, a CPU 251 that executes programs, a ROM 252 that stores data in a nonvolatile manner, a RAM 253 that stores, in a volatile manner, data generated by the CPU 251 executing a program or data input via an input device, an HDD 254 that stores data in a nonvolatile manner, an LED 255, a switch 256, a communication IF (Interface) 257, a power supply circuit 258, a display 259, and operation keys 260. The components are connected to one another by a data bus.

電源回路２５８は、コンセントを介して受信した商用電源の電圧を降圧し、サーバ装置２００の各部に電源供給を行なう回路である。スイッチ２５６は、電源回路２５８に給電を行なうか否かを切替えるための主電源用のスイッチ、およびその他の各種の押しボタンスイッチである。ディスプレイ２５９は、各種のデータを表示するためのデバイスである。 The power supply circuit 258 is a circuit that steps down the voltage of the commercial power received via the outlet and supplies power to each part of the server device 200. The switch 256 is a main power switch for switching whether or not to supply power to the power circuit 258 and other various push button switches. The display 259 is a device for displaying various data.

通信ＩＦ２５７は、通信端末１００に対するデータの送信処理および通信端末１００から送信されたデータの受信処理を行なう。 Communication IF 257 performs a data transmission process for communication terminal 100 and a data reception process for data transmitted from communication terminal 100.

The LED 255 comprises various indicator lamps that show the operating state of the server device 200. For example, the LED 255 indicates whether the main power supply of the server device 200 is on or off and whether data is being read from or written to the HDD 254. The operation keys 260 are keys (a keyboard) that the user of the server device 200 uses to input data to the server device 200.

サーバ装置２００における処理は、各ハードウェアおよびＣＰＵ２５１により実行されるソフトウェアによって実現される。このようなソフトウェアは、ＨＤＤ２５４に予め記憶されている場合がある。また、ソフトウェアは、その他の記憶媒体に格納されて、プログラムプロダクトとして流通している場合もある。あるいは、ソフトウェアは、いわゆるインターネットに接続されている情報提供事業者によってダウンロード可能なプログラムプロダクトとして提供される場合もある。このようなソフトウェアは、読取装置によりその記憶媒体から読み取られて、あるいは、通信ＩＦ２５７等を介してダウンロードされた後、ＨＤＤ２５４に一旦格納される。そのソフトウェアは、ＣＰＵ２５１によってＨＤＤ２５４から読み出され、ＲＡＭ２５３に実行可能なプログラムの形式で格納される。ＣＰＵ２５１は、そのプログラムを実行する。 The processing in the server device 200 is realized by each hardware and software executed by the CPU 251. Such software may be stored in the HDD 254 in advance. The software may be stored in other storage media and distributed as a program product. Alternatively, the software may be provided as a program product that can be downloaded by an information provider connected to the so-called Internet. Such software is read from the storage medium by a reading device or downloaded via the communication IF 257 or the like and then temporarily stored in the HDD 254. The software is read from the HDD 254 by the CPU 251 and stored in the RAM 253 in the form of an executable program. The CPU 251 executes the program.

The components of the server device 200 shown in the figure are conventional. It can therefore be said that the essential part of the present invention is the software stored in the RAM 253, the HDD 254, or a storage medium, or software that can be downloaded via a network. Because the operation of each piece of hardware in the server device 200 is well known, a detailed description is not repeated.

<D. Details of processing>
(d1. Data transmission by communication terminal 100)
FIG. 6 is a diagram showing the user 700 heading from the home 800 to the restaurant 300, which is the destination (a predetermined place), in accordance with the schedule data.

Referring to FIG. 6, the user 700 leaves the home 800 for the restaurant 300 while carrying the communication terminal 100 (position P0). FIG. 6 shows the user 700 carrying the communication terminal 100 inside a dedicated case 400.

When the communication terminal 100 determines that the time difference t between the current time T1 and the scheduled time T0 registered in the schedule book (for example, the reservation time at the restaurant 300) has become equal to or less than the threshold Th1 (position P1), it starts acquiring position information at a predetermined cycle. The communication terminal 100 also transmits to the server device 200 the name of the destination, number-of-times information indicating how many times the destination has been visited in the past, and number-of-days information indicating the number of days elapsed since the previous visit. The destination name, the number-of-times information, and the number-of-days information need be transmitted only once.

通信端末１００は、現在地と目的地との間の距離Ｄが閾値Ｔｈ２以下になったと判断すると（位置Ｐ２）、撮像の許可を得るための発話（報知処理）を行なう。通信端末１００は、撮像の許可をユーザ７００から得たことを条件に、１分あたりＭ枚（Ｍは自然数）の撮像が行われる周期（タイミング）で、通信端末１００の周囲を撮像する。 If the communication terminal 100 determines that the distance D between the current location and the destination is equal to or less than the threshold Th2 (position P2), the communication terminal 100 performs an utterance (notification process) for obtaining permission for imaging. The communication terminal 100 images the surroundings of the communication terminal 100 at a cycle (timing) in which M images (M is a natural number) are imaged per minute on the condition that permission for imaging is obtained from the user 700.

通信端末１００は、サーバ装置２００に対して、撮像を行なった時の位置情報と、当該撮像により得られた複数の画像データとの送信を開始する。また、位置情報と画像データの送信は、通信端末１００が目的地に到着するまで行われる。 Communication terminal 100 starts transmission of position information when imaging is performed and a plurality of pieces of image data obtained by imaging to server device 200. Further, the transmission of the position information and the image data is performed until the communication terminal 100 arrives at the destination.

More specifically, the communication terminal 100 calculates the time T2 required to reach the destination from its moving speed Vm and the threshold Th2, and increases the number of images captured per minute at a rate of (N - M) / T2. Here, N is a natural number larger than M.
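The ramp described above can be sketched as follows. This is only an illustration under stated assumptions: the text says that T2 is calculated from Vm and Th2 but does not give the formula, so the sketch assumes T2 = Th2 / Vm with T2 expressed in minutes, and it assumes the rate rises linearly from M at the threshold distance to N on arrival. The function and parameter names are hypothetical.

```python
def time_to_destination(th2_m: float, vm_m_per_min: float) -> float:
    """Estimated minutes until arrival, assuming T2 = Th2 / Vm (an assumption)."""
    if vm_m_per_min <= 0:
        raise ValueError("moving speed must be positive")
    return th2_m / vm_m_per_min


def capture_rate(elapsed_min: float, m: int, n: int, t2_min: float) -> float:
    """Images per minute after `elapsed_min` minutes of imaging.

    Starts at M and increases by (N - M) / T2 per minute, capped at N.
    """
    slope = (n - m) / t2_min
    return min(float(n), m + slope * elapsed_min)
```

With Th2 = 500 m and Vm = 50 m/min, T2 is 10 minutes, so a terminal starting at M = 2 images per minute would reach N = 12 images per minute at the expected arrival time.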

(d2. Functional configuration of communication terminal 100)
FIG. 7 is a functional block diagram for explaining the functional configuration of the communication terminal 100. Referring to FIG. 7, the communication terminal 100 includes a control unit 111, a storage unit 112, a position information acquisition unit 113, an imaging unit 114, a drive unit 115, an audio input unit 116, an audio output unit 117, and a communication processing unit 118.

制御部１１１は、通信端末１００の全体の動作を制御する。制御部１１１は、表示制御部１１１１と、会話制御部１１１２と、駆動制御部１１１３と、間隔算出部１１１４と、判断部１１１５とを有する。 The control unit 111 controls the overall operation of the communication terminal 100. The control unit 111 includes a display control unit 1111, a conversation control unit 1112, a drive control unit 1113, an interval calculation unit 1114, and a determination unit 1115.

通信処理部１１８は、ネットワーク６００を介したサーバ装置２００との通信に用いられる。通信処理部１１８は、データをサーバ装置２００に送信するための送信部１１８１と、データをサーバ装置２００から受信するための受信部１１８２とを有する。 The communication processing unit 118 is used for communication with the server device 200 via the network 600. The communication processing unit 118 includes a transmission unit 1181 for transmitting data to the server device 200 and a reception unit 1182 for receiving data from the server device 200.

記憶部１１２は、オペレーティングシステム、複数のアプリケーションプログラム、訪問履歴情報Ｄ８、目的地の位置情報、スケジュールデータ、閾値Ｔｈ１，Ｔｈ２を記憶している。さらに、記憶部１１２は、撮像部１１４によって撮像された被写体の画像データを記憶する。また、記憶部１１２は、サーバ装置２００から音声認識用のユーザ辞書データを受信したことに基づき、当該ユーザ辞書データを記憶する。なお、ユーザ辞書データが記憶部１１２に既に記憶されている場合には、サーバ装置２００から新たに受信したユーザ辞書データによって、記憶部１１２に既に記憶されているユーザ辞書データが更新される。ユーザ辞書データの更新は、ユーザ辞書データの置換であってもよいし、あるいはデータの差分の追加であってもよい。 The storage unit 112 stores an operating system, a plurality of application programs, visit history information D8, destination position information, schedule data, and threshold values Th1 and Th2. Further, the storage unit 112 stores image data of the subject imaged by the imaging unit 114. The storage unit 112 stores the user dictionary data based on the reception of the user dictionary data for speech recognition from the server device 200. If the user dictionary data is already stored in the storage unit 112, the user dictionary data already stored in the storage unit 112 is updated with the user dictionary data newly received from the server device 200. The update of the user dictionary data may be a replacement of the user dictionary data or an addition of a data difference.

スケジュールデータは、ユーザがスケジュール帳のアプリケーション等を用いて入力したデータである。スケジュールデータには、日時の情報、目的地の情報等が記載されている。たとえば、スケジュールデータには、「２０１６年３月１日の１３時にレストラン３００で食事」といった内容が記述されている。 The schedule data is data input by the user using a schedule book application or the like. The schedule data includes date and time information, destination information, and the like. For example, in the schedule data, contents such as “meal at the restaurant 300 at 13:00 on March 1, 2016” are described.

目的地の位置情報には、一例として、スケジュールデータで設定された訪問場所（たとえば、レストラン３００）の位置情報が記憶されている。当該位置情報は、通信処理部１１８を介して、ネットワーク６００を介して各種のサーバ装置から取得される。具体的には、制御部１１１がスケジュールデータに記載の訪問場所を読み出して、当該訪問場所の位置情報を取得する。なお、訪問場所の位置情報は、緯度および経度情報であってもよいし、住所情報であってもよい。 In the destination location information, as an example, location information of a visited location (for example, restaurant 300) set in the schedule data is stored. The position information is acquired from various server devices via the network 600 via the communication processing unit 118. Specifically, the control unit 111 reads out the visited place described in the schedule data, and acquires the position information of the visited place. The location information of the visited place may be latitude and longitude information or address information.

訪問履歴情報Ｄ８は、スケジュールデータと、目的地の位置情報とに基づき、生成される。訪問履歴情報Ｄ８には、過去に訪問した場所の名称に対して、訪問日時と位置情報とが関連付けて記憶されている（図８）。 The visit history information D8 is generated based on the schedule data and the location information of the destination. In the visit history information D8, the visit date / time and position information are stored in association with the name of the place visited in the past (FIG. 8).

位置情報取得部１１３は、上述した時間差ｔが閾値Ｔｈ１以下になると、通信端末１００の現在位置の位置情報の取得を開始する。位置情報取得部１１３は、取得された位置情報を、制御部１１１に送る。この場合、制御部１１１の判断部１１１５は、取得された位置情報に基づき、通信端末１００が予め定められた場所（たとえば、レストラン３００等の目的地）に到着したか否かを判断する。 The position information acquisition unit 113 starts acquiring the position information of the current position of the communication terminal 100 when the time difference t described above becomes equal to or less than the threshold Th1. The position information acquisition unit 113 sends the acquired position information to the control unit 111. In this case, the determination unit 1115 of the control unit 111 determines whether or not the communication terminal 100 has arrived at a predetermined location (for example, a destination such as the restaurant 300) based on the acquired position information.

撮像部１１４は、距離Ｄが閾値Ｔｈ２以下になると、通信端末１００の周囲の撮像を開始し、当該撮像により得られた画像データを制御部１１１に送る。なお、撮像部１１４による撮像処理は、制御部１１１からの指令に基づき行われる。たとえば、制御部１１１は撮像部１１４の向きを逐次変更して、撮像部１１４に撮像を行なわせる。また、制御部１１１は、撮像のタイミングを上述したように制御する。 When the distance D becomes equal to or smaller than the threshold Th2, the imaging unit 114 starts imaging around the communication terminal 100 and sends image data obtained by the imaging to the control unit 111. Note that the imaging processing by the imaging unit 114 is performed based on a command from the control unit 111. For example, the control unit 111 sequentially changes the orientation of the imaging unit 114 and causes the imaging unit 114 to perform imaging. Further, the control unit 111 controls the imaging timing as described above.

When the imaging unit 114 starts imaging, the control unit 111 starts a process of associating the image data obtained by the imaging with the position information of the place where the imaging was performed and transmitting them to the server device 200 via the transmission unit 1181.

音声入力部１１６は、通信端末１００の周囲の音声を集音し、集音された音声をデジタルデータとして制御部１１１に送る。 The voice input unit 116 collects voices around the communication terminal 100 and sends the collected voices to the control unit 111 as digital data.

The display control unit 1111 of the control unit 111 causes the display unit 119 to display various types of information.
The drive control unit 1113 executes control for driving the drive unit 115. This allows the communication terminal 100 to move its movable parts. For example, when the communication terminal 100 arrives at a destination (for example, the restaurant 300), it can walk or rotate its head relative to its torso. The communication terminal 100 can thereby change the position and orientation of the imaging unit 114.

制御部１１１は、訪問履歴情報Ｄ８に基づいて、今回の目的地を過去に訪問した回数を算出する。算出された結果は、記憶部１１２に記憶される。このように、通信端末１００は、ユーザが目的地を過去に訪れた回数を表す回数情報を記憶する。 Based on the visit history information D8, the control unit 111 calculates the number of times the current destination has been visited in the past. The calculated result is stored in the storage unit 112. As described above, the communication terminal 100 stores the frequency information indicating the number of times the user has visited the destination in the past.

間隔算出部１１１４は、今回の訪問と、前回の訪問との間の日にちの間隔を算出する。つまり、間隔算出部１１１４は、前回訪れた日からの経過日数を算出する。 The interval calculation unit 1114 calculates the date interval between the current visit and the previous visit. That is, the interval calculation unit 1114 calculates the number of days that have elapsed since the last visit.

The control unit 111 associates the number-of-times information and the number-of-days information, which indicates the number of days elapsed since the previous visit, with the image data obtained by imaging and the position information of the place where the imaging was performed, and transmits them to the server device 200 via the communication processing unit 118.

After transmitting the above data and information to the server device 200, the control unit 111 receives from the server device 200, via the reception unit 1182, the user dictionary data that the server device 200 generated from them. The user dictionary data is stored in the storage unit 112.

会話制御部１１１２は、ユーザとの間の会話を制御する。また、会話制御部１１１２は、音声入力部１１６から入力された音声データを解析し、解析結果に基づいた内容を音声出力部１１７に出力（発話）させる。詳しくは、会話制御部１１１２は、サーバ装置２００から取得した上記ユーザ辞書データを利用した会話を開始する。 The conversation control unit 1112 controls the conversation with the user. Further, the conversation control unit 1112 analyzes the voice data input from the voice input unit 116 and causes the voice output unit 117 to output (speak) the contents based on the analysis result. Specifically, the conversation control unit 1112 starts a conversation using the user dictionary data acquired from the server device 200.

The position information acquisition unit 113, imaging unit 114, drive unit 115, audio input unit 116, audio output unit 117, communication processing unit 118, and display unit 119 correspond, respectively, to the GPS receiver 158, camera 164, driving device 165, microphone 162, speaker 163, communication IF 159, and display 1611 in FIG. 4. The control unit 111 is realized by the CPU 151 executing the operating system and various programs stored in memory. The storage unit 112 corresponds to the memory in FIG. 4. How the thresholds Th1 and Th2 are used is described later.

図８は、記憶部１１２に記憶されている訪問履歴情報Ｄ８の概略構成を説明するための図である。図８を参照して、訪問履歴情報Ｄ８は、訪問先の名称と、訪問日時と、緯度経度情報とを含む。なお、訪問履歴情報Ｄ８は、緯度経度情報とともに、あるいは緯度経度情報の代わりに、住所情報を含んでいてもよい。上述したように、制御部１１１は、訪問履歴情報Ｄ８に基づき、回数情報および日数情報を生成する。 FIG. 8 is a diagram for explaining a schematic configuration of the visit history information D8 stored in the storage unit 112. Referring to FIG. 8, the visit history information D8 includes a name of a visited place, a visit date / time, and latitude / longitude information. The visit history information D8 may include address information together with the latitude / longitude information or instead of the latitude / longitude information. As described above, the control unit 111 generates the number of times information and the number of days information based on the visit history information D8.
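As a rough illustration of how the number-of-times information and the number-of-days information could be derived from records shaped like the visit history information D8, consider the sketch below. The record layout (name, visit date and time, latitude, longitude) follows the description of FIG. 8; the class name, field names, and the choice to count whole calendar days are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Visit:
    """One row of a visit history shaped like D8 (field names are hypothetical)."""
    name: str
    visited_at: datetime
    latitude: float
    longitude: float


def visit_stats(history: List[Visit], destination: str,
                today: datetime) -> Tuple[int, Optional[int]]:
    """Return (number of past visits, days since the previous visit or None)."""
    past = [v for v in history
            if v.name == destination and v.visited_at < today]
    if not past:
        return 0, None  # never visited, so no elapsed-days value exists
    last = max(v.visited_at for v in past)
    # Count whole calendar days between the previous visit and today.
    return len(past), (today.date() - last.date()).days
```

A destination that has never been visited yields a count of 0 and no elapsed-days value; how the described system treats that case is not stated in the text.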

(d3. Functional configuration of server device 200)
FIG. 9 is a functional block diagram for explaining the functional configuration of the server device 200. Referring to FIG. 9, the server device 200 includes a control unit 211, a storage unit 212, and a communication processing unit 213.

制御部２１１は、サーバ装置２００の全体の動作を制御する。制御部２１１は、キーワード決定部２１１１と、辞書データ生成部２１１２とを有する。 The control unit 211 controls the overall operation of the server device 200. The control unit 211 includes a keyword determination unit 2111 and a dictionary data generation unit 2112.

記憶部２１２は、オペレーティングシステム、複数のアプリケーションプログラム、データテーブルＤ１０、辞書データテーブルを記憶している。 The storage unit 212 stores an operating system, a plurality of application programs, a data table D10, and a dictionary data table.

通信処理部２１３は、ネットワーク６００を介した、通信端末１００との間の通信処理を行なう。通信処理部２１３は、送信部２１３１と、受信部２１３２とを有する。具体的には、ある局面において、受信部２１３２は、通信端末１００から、位置情報、画像データ、回数情報、および日数情報を受信する。これらのデータ（位置情報、画像データ、回数情報、日数情報）は、制御部２１１によって記憶部２１２に記憶される。 The communication processing unit 213 performs communication processing with the communication terminal 100 via the network 600. The communication processing unit 213 includes a transmission unit 2131 and a reception unit 2132. Specifically, in a certain aspect, receiving unit 2132 receives position information, image data, number-of-times information, and number-of-days information from communication terminal 100. These data (position information, image data, number of times information, number of days information) are stored in the storage unit 212 by the control unit 211.

The keyword determination unit 2111 of the control unit 211 determines keywords using the various data received from the communication terminal 100 (position information, image data, number-of-times information, and number-of-days information) together with the data table D10 (FIG. 10). A specific example of the keyword determination method is described later.

辞書データ生成部２１１２は、キーワード決定部２１１１によって決定されたキーワードと、記憶部２１２に記憶された辞書データベースとに基づいて、ユーザ辞書データを生成する。なお、キーワードからユーザ辞書データを生成することは、従来知られている手法を用いるため、ここでは説明しない。 The dictionary data generation unit 2112 generates user dictionary data based on the keyword determined by the keyword determination unit 2111 and the dictionary database stored in the storage unit 212. Note that generation of user dictionary data from a keyword uses a conventionally known method, and thus will not be described here.

制御部２１１は、生成されたユーザ辞書データを、送信部２１３１を介して、通信端末１００に送信する。 The control unit 211 transmits the generated user dictionary data to the communication terminal 100 via the transmission unit 2131.

図１０は、サーバ装置２００の記憶部２１２に記憶されているデータテーブルＤ１０の概略構成を説明するための図である。図１０を参照して、データテーブルＤ１０は、ジャンル毎にデータテーブルを備えている。 FIG. 10 is a diagram for explaining a schematic configuration of the data table D10 stored in the storage unit 212 of the server device 200. Referring to FIG. 10, data table D10 includes a data table for each genre.

Each data table is partitioned by the interval in days and the number of visits. For example, the interval is divided into three ranges: less than one month; one month or more and less than one year; and one year or more. The number of visits is likewise divided into three ranges: first visit, second visit, and third or later visit. Each data table thus has nine sections, and each section, as a rule, stores a plurality of items.

Within any one section, the items are assigned priorities. For example, in the section for an interval of less than one month and a first visit (the upper-left cell), "scenery around the building" has priority 1 and "exterior of the building" has priority 2.

Details of the keyword determination method using the data table D10 will be described later.

(d4. Control structure)
FIG. 11 is a sequence chart for explaining the flow of processing in the communication system 1. Referring to FIG. 11, in sequence SQ2, when the communication terminal 100 determines that the time difference t has become equal to or less than the threshold Th1, it starts acquiring position information at a predetermined cycle. In sequence SQ4, when the communication terminal 100 determines that the distance D has become equal to or less than the threshold Th2, it starts imaging its surroundings. In sequence SQ6, the communication terminal 100 transmits the position information, image data, number-of-times information, and number-of-days information to the server device 200.

シーケンスＳＱ８において、サーバ装置２００は、位置情報、画像データ、回数情報、および日数情報と、データテーブルＤ１０とを用いて、キーワードの決定を行なう。シーケンスＳＱ１０において、サーバ装置２００は、キーワードを用いてユーザ辞書データの生成を行なう。シーケンスＳＱ１２において、サーバ装置２００は、生成されたユーザ辞書データを通信端末１００に送信する。 In sequence SQ8, server device 200 determines a keyword using position information, image data, number-of-times information, number-of-days information, and data table D10. In sequence SQ10, server device 200 generates user dictionary data using a keyword. In sequence SQ12, server device 200 transmits the generated user dictionary data to communication terminal 100.

シーケンスＳＱ１４において、通信端末１００は、サーバ装置２００から受信したユーザ辞書データを用いた会話を行なう。 In sequence SQ14, communication terminal 100 performs a conversation using the user dictionary data received from server device 200.

図１２は、通信端末１００で行われる処理の流れを説明するためのフローチャートである。図１２を参照して、ステップＳ１０２において、通信端末１００は、現在時刻Ｔ１と予定時刻Ｔ０との時間差ｔが閾値Ｔｈ１以下になったか否かを判断する。通信端末１００は、時間差ｔが閾値Ｔｈ１以下になったと判断すると（ステップＳ１０２においてＹＥＳ）、ステップＳ１０４において、現在地の位置情報の取得を開始する。通信端末１００は、時間差ｔが閾値Ｔｈ１以下でないと判断すると（ステップＳ１０２においてＮＯ）、処理をステップＳ１０２に戻す。 FIG. 12 is a flowchart for explaining the flow of processing performed in communication terminal 100. With reference to FIG. 12, in step S102, the communication terminal 100 determines whether or not the time difference t between the current time T1 and the scheduled time T0 is equal to or less than a threshold value Th1. When communication terminal 100 determines that time difference t has become equal to or smaller than threshold value Th1 (YES in step S102), it starts acquiring position information of the current location in step S104. If communication terminal 100 determines that time difference t is not equal to or smaller than threshold value Th1 (NO in step S102), the process returns to step S102.

In step S106, the communication terminal 100 obtains the destination name, calculates the number of times the destination has been visited in the past and the number of days elapsed since the previous visit, and transmits the results to the server device 200. In other words, the communication terminal 100 transmits the number-of-times information and the number-of-days information to the server device 200. The transmission timing of this information is not limited to this point. For example, the communication terminal 100 may be configured to transmit the destination name, the number-of-times information, and the number-of-days information together with the position information and the image data.

ステップＳ１０８において、通信端末１００は、現在位置と目的地との間の距離Ｄが閾値Ｔｈ２以下であるか否かを判断する。通信端末１００は、距離Ｄが閾値Ｔｈ２以下であると判断すると（ステップＳ１０８においてＹＥＳ）、ステップＳ１１０において、撮像の許可を得るための発話を行なう。通信端末１００は、距離Ｄが閾値Ｔｈ２以下でないと判断すると（ステップＳ１０８においてＮＯ）、処理をステップＳ１０８に戻す。 In step S108, the communication terminal 100 determines whether or not the distance D between the current position and the destination is equal to or less than the threshold value Th2. When communication terminal 100 determines that distance D is equal to or smaller than threshold value Th2 (YES in step S108), in step S110, utterance for obtaining permission for imaging is performed. If communication terminal 100 determines that distance D is not equal to or smaller than threshold value Th2 (NO in step S108), the process returns to step S108.
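The text does not say how the distance D is computed. As one possible sketch, assuming both the current position and the destination are available as latitude and longitude pairs and that a great-circle approximation is acceptable, the check in step S108 could look like this. The haversine formula, the Earth radius constant, and all names here are choices made for the example, not details taken from the source.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_threshold(current: tuple, destination: tuple, th2_m: float) -> bool:
    """True when the distance D to the destination is at most Th2 (step S108)."""
    return haversine_m(*current, *destination) <= th2_m
```

If the destination is stored only as an address, as the description of the destination position information allows, it would first have to be converted to coordinates by some means not covered here.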

In step S112, on condition that permission for imaging has been obtained, the communication terminal 100 images its surroundings at a rate of M images per minute. In step S114, the communication terminal 100 starts transmitting to the server device 200 the acquired position information and the image data obtained by imaging.

In step S116, the communication terminal 100 calculates the time T2 required to reach the destination from its moving speed Vm and the threshold Th2, and increases the number of images captured per minute at a rate of (N - M) / T2.

ステップＳ１１８において、通信端末１００は、サーバ装置２００から、ユーザ辞書データを受信する。ステップＳ１２０において、通信端末１００は、受信したユーザ辞書データを用いた会話を開始する。 In step S 118, the communication terminal 100 receives user dictionary data from the server device 200. In step S120, the communication terminal 100 starts a conversation using the received user dictionary data.

In step S122, the communication terminal 100 uses the position information of its current location to determine whether it has moved a predetermined distance away from the destination. If it determines that it has (YES in step S122), it ends the acquisition of position information and the imaging in step S124. If it determines that it has not (NO in step S122), the process returns to step S122.

FIG. 13 is a flowchart for explaining the flow of processing performed in the server device 200. Referring to FIG. 13, in step S202, the server device 200 receives, from the communication terminal 100, the name of the destination, the number of times the user 700 has visited the destination in the past (specifically, the number-of-times information), and the number of days elapsed since the previous visit (specifically, the number-of-days information).

In step S204, the server device 200 starts receiving the destination name, position information, and image data sent from the communication terminal 100. The destination name may include a genre. In step S206, the server device 200 determines whether a predetermined time has elapsed since receiving the position information and the image data. If the server device 200 determines that the predetermined time has elapsed (YES in step S206), then in step S208 it extracts a predetermined number (K) of pieces of image data from the received image data, using the destination name, the data table D10, the number of visits (number-of-times information), and the number of elapsed days (number-of-days information). If the server device 200 determines that the predetermined time has not elapsed (NO in step S206), the process returns to step S206.

The processing in step S208 is described in detail below.
The server device 200 identifies the genre of the place (restaurant, amusement park, hospital, ...) from the destination name. The server device 200 then uses the data table (FIG. 10) for the identified genre. For example, when the restaurant genre is selected, the server device 200 uses the data table shown in FIG. 10. If the genre cannot be identified from the destination name alone, the server device 200 may identify it using the position information acquired from the communication terminal 100.

Once the genre is identified, the server device 200 refers to the one section, among the nine sections of the data table, that corresponds to the number of visits indicated by the number-of-times information and the number of days indicated by the number-of-days information. Specifically, the server device 200 reads the plurality of items listed in that section.
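As an illustration of this lookup, the following Python sketch maps a visit count and an elapsed-day count to one of nine sections. The 3 x 3 layout and the bin boundaries (`VISIT_EDGES`, `DAY_EDGES`) are assumptions made for the example; the actual boundaries are those of the data table in FIG. 10, which this passage does not reproduce.

```python
from bisect import bisect_right

# Hypothetical bin edges, chosen only for illustration.
VISIT_EDGES = [2, 3]     # visits: 1 | 2 | 3 or more
DAY_EDGES = [30, 365]    # elapsed days: under 30 | 30 to 364 | 365 or more


def select_section(visit_count: int, elapsed_days: int) -> tuple[int, int]:
    """Return (row, column) of the table section matching the visit history."""
    row = bisect_right(VISIT_EDGES, visit_count)
    col = bisect_right(DAY_EDGES, elapsed_days)
    return row, col


# Second visit, less than a month since the previous one.
print(select_section(2, 10))  # (1, 0)
```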

The server device 200 determines which of the received pieces of image data correspond to the items it has read. More specifically, from the image data that correspond to those items, the server device 200 extracts a predetermined number (K) of pieces in order of priority. An example follows.

For example, suppose that the number of visits is two and the visit interval is less than one month, and let K be 3. Suppose further that the received image data include image data of the menu (recommendations), which has priority 1; image data of the dish in front of the user, which has priority 3; image data of tableware, which has priority 6; image data of a name tag, which has priority 7; and several pieces of image data of objects not listed in the section for two visits and a visit interval of less than one month. In this case, the server device 200 extracts three pieces of image data from the received image data in descending order of priority. Specifically, it extracts the image data of the menu (recommendations) with priority 1, the image data of the dish in front of the user with priority 3, and the image data of the tableware with priority 6.
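The worked example above can be sketched in Python as follows. Only the priorities 1, 3, 6 and 7 come from the example; the item names, the image identifiers, and the idea that each image already carries a label from some upstream recognizer are assumptions made for illustration.

```python
# Hypothetical contents of one table section: item -> priority (1 is highest).
SECTION_ITEMS = {"menu": 1, "dish": 3, "tableware": 6, "name_tag": 7}


def extract_top_k(images, section_items, k):
    """Pick up to k images whose label is listed in the section, best priority first.

    images is a list of (image_id, label) pairs. Images whose label is not
    listed in the section are ignored.
    """
    listed = [(section_items[label], image_id)
              for image_id, label in images if label in section_items]
    listed.sort()  # a lower priority number means a higher priority
    return [image_id for _, image_id in listed[:k]]


received = [("img1", "tableware"), ("img2", "poster"), ("img3", "menu"),
            ("img4", "name_tag"), ("img5", "dish")]
print(extract_top_k(received, SECTION_ITEMS, k=3))  # ['img3', 'img5', 'img1']
```

With K = 3 the name tag (priority 7) and the unlisted poster are left out, matching the outcome described in the example.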

In step S210, the server device 200 determines keywords using the extracted image data and the position information. Position information is sent to the server device 200 successively in association with the image data; of these multiple pieces of position information, the server device 200 preferably uses at least the position information of the destination (the predetermined place) to determine the keywords. Alternatively, the server device 200 (specifically, the keyword determination unit 2111) may be configured to determine the keywords using the extracted image data together with the position information of the place where that image data was captured (the shooting location).

Keyword determination is described in more detail below. The server device 200 analyzes the K extracted pieces of image data. For example, it applies character recognition processing or pattern matching processing to the K pieces of image data. As a result, the server device 200 typically obtains a plurality of keywords.

Further, the server device 200 uses the position information to extract, from these keywords, the keywords to be used for creating the user dictionary data. For example, when the server device 200 can determine that the place identified by the position information is a restaurant, it extracts from the keywords those that are highly related to restaurants. As one example, the server device 200 stores, for a plurality of words and phrases, the degree of association between one word and another, and extracts the words (keywords) whose degree of association is high.

Alternatively, the server device 200 may use the position information to delete unnecessary keywords from these keywords. For example, when the server device 200 can determine that the place identified by the position information is a restaurant and "hospital" is included among the keywords, it excludes "hospital" from the keywords. As one example, the server device 200 stores, for a plurality of words and phrases, the degree of association between one word and another, and excludes the words (keywords) whose degree of association is low.
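Both variants, keeping highly related keywords or dropping weakly related ones, amount to thresholding a stored degree of association. The Python sketch below shows one way this might look; the `ASSOCIATION` table, its scores, and the 0.5 threshold are hypothetical values chosen for the example and are not given in the description.

```python
# Hypothetical association scores between a place genre and a keyword (0.0 to 1.0).
ASSOCIATION = {
    ("restaurant", "pasta"): 0.9,
    ("restaurant", "dessert"): 0.8,
    ("restaurant", "hospital"): 0.1,
}


def filter_keywords(genre, keywords, threshold=0.5):
    """Keep the keywords whose association with the genre meets the threshold.

    Pairs missing from the table are treated as unrelated (score 0.0).
    """
    return [kw for kw in keywords
            if ASSOCIATION.get((genre, kw), 0.0) >= threshold]


print(filter_keywords("restaurant", ["pasta", "hospital", "dessert"]))
# ['pasta', 'dessert']
```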

In step S212, the server device 200 generates user dictionary data using the dictionary database and the determined keywords. In step S214, the server device 200 transmits the generated user dictionary data to the communication terminal 100. In step S216, the server device 200 determines whether it has stopped receiving position information and image data from the communication terminal 100.

If the server device 200 determines that it has stopped receiving them (YES in step S216), it ends the series of processes. If it determines that it is still receiving them (NO in step S216), then in step S218 it extracts a predetermined number (K) of pieces of image data from the received image data (excluding image data already extracted), using the destination name, the data table D10, the number of visits (number-of-times information), and the number of elapsed days (number-of-days information). The server device 200 then advances the process to step S210.
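The repeated extraction in steps S208 and S218, where each pass skips image data that was already extracted, can be sketched as a simple loop. This is a simplified model under stated assumptions: each element of `batches` stands for the image identifiers that arrived before one pass, and `select_k` is a caller-supplied ranking function; neither name appears in the description.

```python
def run_batches(batches, select_k, k):
    """Yield one group of up to k image ids per pass (steps S208 and S218).

    The loop ends when no further batch arrives, which corresponds to
    YES in step S216.
    """
    received, extracted = [], set()
    for batch in batches:
        received.extend(batch)
        # Step S218: already extracted image data is excluded from the candidates.
        candidates = [i for i in received if i not in extracted]
        chosen = select_k(candidates, k)
        extracted.update(chosen)
        # Keyword determination (S210), dictionary generation (S212) and
        # transmission (S214) would follow for each yielded group.
        yield chosen


first_k = lambda candidates, k: candidates[:k]  # trivial stand-in for the ranking
print(list(run_batches([["a", "b", "c"], ["d"]], first_k, 2)))
# [['a', 'b'], ['c', 'd']]
```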

[Embodiment 2]
In the above description, the predetermined place (destination) is a place specified through an input operation by the user 700 of the communication terminal 100. For example, as described above, the predetermined place is a place that the user 700 has entered in a schedule book.

However, the present invention is not limited to this. The communication terminal 100 may acquire, from the server device 200, a recommended spot or a spot that is currently popular on the network as the information on the predetermined place.

[Embodiment 3]
In the above description, a configuration in which the communication terminal 100 captures images while automatically changing the orientation of the camera 164 was given as an example, but the present invention is not limited to this. Because the communication terminal 100 is capable of conversation, it may instruct the user to change the orientation of the camera 164. In that case, the communication terminal 100 may be configured to capture an image on the condition that the camera 164 has been turned to the orientation given in the instruction.

[Embodiment 4]
The notification process described above may include not only vibration and speech but also movement of a movable part, such as raising and lowering an arm.

[Embodiment 5]
When the communication terminal 100 has made an utterance asking for permission to capture images and the user 700 has not granted permission, the communication terminal 100 is preferably configured to make the utterance asking for permission again at a predetermined timing (for example, after a predetermined time has elapsed since the utterance). In this case, the second and subsequent utterances asking for permission may differ from the first in content and in tone (intonation).
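A minimal Python sketch of this retry behavior follows. The callbacks `speak` and `listen_for_yes`, the prompt texts, the waiting time, and the cap on attempts (`max_attempts`) are all assumptions made for illustration; the description only states that the request is repeated at a predetermined timing and that later requests may be worded differently.

```python
import time


def request_permission(speak, listen_for_yes, prompts,
                       wait_s=60.0, max_attempts=3, sleep=time.sleep):
    """Ask for imaging permission, repeating with a different prompt if refused.

    speak(text) and listen_for_yes() -> bool are caller-supplied stand-ins for
    the terminal's voice output and speech recognition.
    """
    for attempt in range(max_attempts):
        # Use a different wording on each retry; reuse the last one if prompts run out.
        speak(prompts[min(attempt, len(prompts) - 1)])
        if listen_for_yes():
            return True
        if attempt < max_attempts - 1:
            sleep(wait_s)  # the predetermined time before asking again
    return False
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays, for example by supplying `sleep=lambda s: None`.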

[Embodiment 6]
If the communication terminal 100 arrives at the destination without having obtained permission to capture images, the communication terminal 100 preferably makes an utterance asking for permission to capture images.

[Summary]
The main processes among those described above, and the advantages they provide, are set out below.
(1) The communication terminal 100 determines, based on position information indicating its current position, whether it has arrived at a predetermined place (for example, a destination such as the restaurant 300). At the predetermined place, the communication terminal 100 images its surroundings with the camera 164 (imaging unit 114). The communication terminal 100 transmits to the server device 200 at least the position information of the predetermined place and the image data obtained by the imaging (hereinafter also referred to as "first image data").

The server device 200 generates user dictionary data for speech recognition based on the position information and the first image data sent from the communication terminal 100. The server device 200 transmits the generated user dictionary data to the communication terminal 100.

At least at the predetermined place, the communication terminal 100 converses with its user using the user dictionary data transmitted from the server device 200.

According to the above configuration, the server device 200 generates user dictionary data using keywords determined based on the position information of the predetermined place and the image data of that place. The server device 200 can therefore generate user dictionary data containing more data related to the predetermined place than a configuration that determines keywords without using at least one of the position information and the image data.

Therefore, by using such user dictionary data, the communication terminal 100 can hold a conversation with the user that is better suited to the predetermined place than a configuration that does not use the user dictionary data.
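The arrival determination in item (1) comes down to comparing the distance between the current position and the destination against a radius. The Python sketch below uses the haversine formula for that distance; the function names and the 30 m arrival radius are assumptions made for illustration and do not appear in the description.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_arrived(current, destination, arrival_radius_m=30.0):
    """True when the terminal is within arrival_radius_m of the destination.

    current and destination are (latitude, longitude) pairs in degrees.
    """
    return distance_m(*current, *destination) <= arrival_radius_m
```

The same `distance_m` value could also be compared against a larger threshold to decide when imaging on the way to the destination should begin.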
(2) The communication terminal 100 further transmits to the server device 200 number-of-times information indicating the number of times the user of the communication terminal 100 has visited the predetermined place in the past. The server device 200 generates the user dictionary data based on the position information, the first image data, and the number-of-times information.

According to the above configuration, the server device 200 generates the user dictionary data taking the user's number of visits into account. It can therefore generate user dictionary data that makes the conversation with the user 700 richer than when it generates the user dictionary data without using the number-of-times information.
(3) The communication terminal 100 further stores date information indicating the dates on which the user 700 visited the predetermined place in the past. Based on the date information, the communication terminal 100 calculates the number of days elapsed since the previous visit to the predetermined place. The communication terminal 100 further transmits number-of-days information indicating the calculated number of elapsed days to the server device 200. The server device 200 generates the user dictionary data based on the position information, the first image data, the number-of-times information, and the number-of-days information.

According to the above configuration, the server device 200 additionally takes into account the number of days elapsed since the previous visit when generating the user dictionary data. It can therefore generate user dictionary data that makes the conversation with the user 700 richer than when it generates the user dictionary data without using the number of elapsed days (number-of-days information).
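The elapsed-day calculation in item (3) is plain date arithmetic. A minimal Python sketch, assuming the stored date information is available as a list of `datetime.date` values (the storage format is not specified in the description):

```python
from datetime import date


def elapsed_days(visit_dates, today):
    """Days since the most recent past visit, or None if there is no past visit."""
    past = [d for d in visit_dates if d < today]
    if not past:
        return None
    return (today - max(past)).days


history = [date(2016, 1, 10), date(2016, 2, 1)]
print(elapsed_days(history, date(2016, 2, 24)))  # 23
```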
(4) The communication terminal 100 starts imaging its surroundings on the condition that the distance between the communication terminal 100 and the predetermined place has become equal to or less than a threshold. The communication terminal 100 further transmits to the server device 200 second image data obtained by imaging performed until it reaches the predetermined place. The server device 200 generates the user dictionary data based on the position information, the first image data, the second image data, the number-of-times information, and the number-of-days information.

According to the above configuration, the server device 200 generates the user dictionary data taking into account the second image data obtained by imaging the area around the predetermined place. It can therefore generate user dictionary data that makes the conversation with the user 700 richer than when it generates the user dictionary data without using the second image data.
(5) The communication terminal 100 shortens the imaging time interval as it approaches a predetermined location.

According to the above configuration, the server device 200 receives more image data from places near the predetermined place than from places far from it. The server device 200 can therefore generate user dictionary data containing more data related to the predetermined place than a configuration in which the communication terminal 100 captures images at a constant time interval.
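One way to realize the behavior of item (5) is to make the imaging interval a monotonic function of the remaining distance. The Python sketch below uses a linear mapping; the linear shape and the numbers (`threshold_m`, `min_s`, `max_s`) are assumptions made for illustration, since the summary only states that the interval becomes shorter as the terminal approaches.

```python
def imaging_interval_s(distance_m, threshold_m=500.0, min_s=2.0, max_s=30.0):
    """Seconds between shots: max_s at the threshold distance, min_s at the destination.

    Returns None while the terminal is farther away than the threshold,
    because imaging has not started yet.
    """
    if distance_m > threshold_m:
        return None
    ratio = max(distance_m, 0.0) / threshold_m
    return min_s + (max_s - min_s) * ratio


for d in (500, 250, 0):
    print(d, imaging_interval_s(d))  # 500 30.0, then 250 16.0, then 0 2.0
```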
(6) The server device 200 determines a keyword related to a predetermined location using the position information, the first image data, and the second image data. The server device 200 generates user dictionary data using the determined keyword.

According to the above configuration, keywords related to the predetermined place, which are used to generate the user dictionary data, can be generated using the position information of the predetermined place, the first image data obtained by imaging at the predetermined place, and the second image data obtained by imaging at places near it.
(7) The server device 200 stores a database in which, for each of a plurality of combinations of a number of visits and an interval between visit dates, a plurality of items are stored with priorities assigned. Using at least the data table D10 and the number-of-times information and number-of-days information transmitted from the communication terminal 100, the server device 200 extracts a predetermined number of pieces of image data from the first image data and the second image data. The server device 200 determines the keywords using the extracted image data and the position information.

According to the above configuration, the server device 200 can extract image data matching the number-of-times information and the number-of-days information in descending order of priority. It can therefore generate user dictionary data that makes the conversation with the user 700 richer than a configuration that does not extract image data according to the number-of-times information and the number-of-days information in this way.
(8) When the communication terminal 100 determines that the distance between the communication terminal 100 and a predetermined location is equal to or less than the threshold value, the communication terminal 100 executes a predetermined notification process.

According to the above configuration, the user of the communication terminal 100 can know that the predetermined place is near, that is, that it is time for the communication terminal 100 to start imaging. The user 700 can therefore change the state of the communication terminal 100 so that it can image the scenery around the user 700. For example, if the communication terminal 100 has been put away in a bag, the notification process gives the user a reason to take it out of the bag.
(9) The predetermined notification process includes an utterance for obtaining permission for imaging around the communication terminal 100 from the user of the communication terminal 100. The communication terminal 100 images the surroundings of the communication terminal 100 with a camera on the condition that permission for imaging is obtained from the user 700.

According to the above configuration, the communication terminal 100 can be prevented from capturing images without permission from the user 700. In addition, the utterance by the communication terminal 100 lets the user 700 recognize the need for imaging.
(10) The communication terminal 100 receives position information of a predetermined location from the server device 200.
(11) The predetermined place is a place designated by an input operation by the user of the communication terminal 100.
(12) The communication terminal 100 can change the orientation of the camera 164. When the communication terminal 100 determines that the communication terminal 100 has arrived at a predetermined location, the communication terminal 100 changes the camera direction and captures a plurality of images around the communication terminal 100. The communication terminal 100 transmits position information of a predetermined place and a plurality of first image data to the server device 200.

According to the above configuration, the server device 200 can generate the user dictionary data using a plurality of pieces of image data obtained by imaging the predetermined place from a plurality of angles.
(13) After imaging its surroundings with the camera at the predetermined place, the communication terminal 100 makes an utterance prompting the user to change the orientation of the communication terminal 100.

According to the above configuration, when the user changes the orientation of the communication terminal 100, the communication terminal 100 can image the predetermined place from a plurality of angles.
(14) The communication terminal 100 is a portable terminal capable of conversation and of autonomously imaging its surroundings. The communication terminal 100 includes: a conversation control unit 1112 configured to converse with the user of the communication terminal 100 using the voice input unit 116 and the voice output unit 117; a determination unit 1115 that determines, based on position information indicating the current position of the communication terminal 100, whether the communication terminal 100 has arrived at a predetermined place; an imaging unit 114 that images the surroundings of the communication terminal 100 with a camera at the predetermined place; a transmission unit 1181 that transmits the position information of the predetermined place and the image data obtained by the imaging to the server device 200; and a reception unit 1182 that receives from the server device 200 user dictionary data for speech recognition generated based on the position information and the image data. The conversation control unit 1112 converses with the user of the communication terminal 100 using the user dictionary data.
(15) The server device 200 communicates with the portable communication terminal 100, which is capable of conversation and of autonomously imaging its surroundings. The server device 200 includes: a reception unit 2132 that receives from the communication terminal 100 image data of a subject imaged by the communication terminal 100 at a predetermined place and position information of the predetermined place; a dictionary data generation unit 2112 that generates user dictionary data for speech recognition based on the position information and the image data sent from the communication terminal 100; and a transmission unit 2131 that transmits the generated user dictionary data to the communication terminal 100.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

DESCRIPTION OF SYMBOLS: 1 communication system, 100 communication terminal, 111, 211 control unit, 112, 212 storage unit, 113 position information acquisition unit, 114 imaging unit, 115 drive unit, 116 voice input unit, 117 voice output unit, 118, 213 communication processing unit, 119 display unit, 158 GPS receiver, 162 microphone, 163 speaker, 164 camera, 165 drive device, 200 server device, 300 restaurant, 400 case, 500 base station, 600 network, 700 user, 800 home, 911 table, 921 menu, 1112 conversation control unit, 1113 drive control unit, 1114 interval calculation unit, 1115 determination unit, 1181, 2131 transmission unit, 1182, 2132 reception unit, 2111 keyword determination unit, 2112 dictionary data generation unit, D8 visit history information, D10 data table, P0, P1, P2 positions.

Claims

A communication system comprising a server device and a portable communication terminal capable of talking and autonomously imaging the surroundings,
The communication terminal is
Based on the location information representing the current location of the communication terminal, determine whether the communication terminal has arrived at a predetermined location,
In the predetermined place, the surroundings of the communication terminal are imaged by a camera,
Transmitting the position information of the predetermined location and the first image data obtained by the imaging to the server device;
The server device
Based on the position information and the first image data sent from the communication terminal, generate dictionary data for speech recognition,
Transmitting the generated dictionary data to the communication terminal;
The communication terminal is
A communication system for carrying out a conversation with a user of the communication terminal using the dictionary data based on reception of the dictionary data.

The communication terminal further transmits, to the server device, number-of-times information representing the number of times the user of the communication terminal has visited the predetermined place in the past,
The communication system according to claim 1, wherein the server device generates the dictionary data based on the position information, the first image data, and the number-of-times information.

The communication terminal is
Further storing date information representing a date the user has visited the predetermined location in the past;
Based on the date information, calculate the number of days elapsed since the previous visit to the predetermined place,
Further transmitting number-of-days information representing the calculated elapsed days to the server device,
The server device
The communication system according to claim 2, wherein the dictionary data is generated based on the position information, the first image data, the number-of-times information, and the number-of-days information.

The communication terminal is
On the condition that the distance between the communication terminal and the predetermined location is equal to or less than a threshold, start imaging around the communication terminal,
The second image data obtained by imaging until reaching the predetermined location is further transmitted to the server device,
The server device
The communication system according to claim 3, wherein the dictionary data is generated based on the position information, the first image data, the second image data, the number-of-times information, and the number-of-days information.

The communication system according to claim 4, wherein the communication terminal shortens the time interval of the imaging as it approaches the predetermined location.

The server device
Using the position information, the first image data, and the second image data, a keyword related to the predetermined location is determined,
The communication system according to claim 5, wherein the dictionary data is generated using the determined keyword.

The server device
For each of a plurality of combinations of a number of visits and an interval between dates, a database storing a plurality of items with priorities is stored,
A predetermined number of pieces of image data are extracted from the first image data and the second image data using the database and the number-of-times information and the number-of-days information transmitted from the communication terminal,
The communication system according to claim 6, wherein the keyword is determined using the extracted image data and the position information.

The communication system according to any one of claims 1 to 7, wherein, when the communication terminal determines that the distance between the communication terminal and the predetermined location is equal to or less than a threshold value, the communication terminal executes a predetermined notification process.

The predetermined notification process includes an utterance for obtaining permission for imaging around the communication terminal from a user of the communication terminal,
The communication system according to claim 8, wherein the communication terminal captures an image of the surroundings of the communication terminal with the camera on the condition that permission for imaging is obtained from the user.

The communication system according to any one of claims 1 to 9, wherein the communication terminal receives position information of the predetermined location from the server device.

The communication system according to any one of claims 1 to 9, wherein the predetermined location is a location designated by an input operation by a user of the communication terminal.

The communication system according to any one of claims 1 to 11, wherein the communication terminal:
is capable of changing the orientation of the camera;
when it determines that the communication terminal has arrived at the predetermined location, captures a plurality of images of the surroundings of the communication terminal while changing the orientation of the camera; and
transmits the position information of the predetermined location and a plurality of the first image data to the server device.

The communication system according to any one of claims 1 to 11, wherein, when imaging the surroundings of the communication terminal with the camera at the predetermined location, the communication terminal makes an utterance prompting the user to change the orientation of the communication terminal.

A portable communication terminal capable of conversation and of autonomously imaging its surroundings, the communication terminal comprising:
conversation control means configured to have a conversation with a user of the communication terminal using a voice input unit and a voice output unit;
determining means for determining whether or not the communication terminal has arrived at a predetermined location based on position information indicating the current position of the communication terminal;
imaging means for imaging the surroundings of the communication terminal with a camera at the predetermined location;
transmitting means for transmitting the position information of the predetermined location and the image data obtained by the imaging to a server device; and
receiving means for receiving, from the server device, dictionary data for speech recognition generated based on the position information and the image data,
wherein the conversation control means has the conversation with the user of the communication terminal using the dictionary data.

A server device that communicates with a portable communication terminal capable of conversation and of autonomously imaging its surroundings, the server device comprising:
receiving means for receiving, from the communication terminal, image data of a subject imaged at a predetermined location by the communication terminal and position information of the predetermined location;
dictionary data generating means for generating dictionary data for speech recognition based on the position information and the image data sent from the communication terminal; and
transmitting means for transmitting the generated dictionary data to the communication terminal.

An information processing method in a communication system comprising a server device and a portable communication terminal capable of conversation and of autonomously imaging its surroundings, the method comprising the steps of:
the communication terminal determining whether the communication terminal has arrived at a predetermined location based on position information representing a current position of the communication terminal;
the communication terminal imaging the surroundings of the communication terminal with a camera at the predetermined location;
the communication terminal transmitting the position information of the predetermined location and the first image data obtained by the imaging to the server device;
the server device generating dictionary data for speech recognition based on the position information and the first image data sent from the communication terminal;
the server device transmitting the generated dictionary data to the communication terminal; and
the communication terminal, upon receiving the dictionary data, having a conversation with a user of the communication terminal using the dictionary data.
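
The sequence of steps in the preceding method claim can be traced with a small in-memory Python sketch. It is not the patented implementation. The classes `Server` and `Terminal`, the lookup tables standing in for place search and image recognition, the fixed image identifiers, the exact-match arrival test, and the substring-based `recognize` method are all simplifying assumptions chosen so that the example runs without a camera, network, or speech recognizer.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    # Hypothetical lookup tables standing in for place search and image recognition.
    place_names: dict
    image_labels: dict

    def generate_dictionary(self, position, images):
        """Build a word list from the location and the received images."""
        words = []
        place = self.place_names.get(position)
        if place:
            words.append(place)
        for image in images:
            words.extend(self.image_labels.get(image, []))
        return list(dict.fromkeys(words))  # deduplicate, preserving order

@dataclass
class Terminal:
    target: tuple
    dictionary: list = field(default_factory=list)

    def has_arrived(self, position):
        # Simplification: exact match instead of a distance threshold.
        return position == self.target

    def capture(self):
        # Placeholder for camera capture; returns fixed image identifiers.
        return ["img_001", "img_002"]

    def step(self, position, server):
        """Run determine -> image -> transmit -> receive for one position update."""
        if not self.has_arrived(position):
            return False
        images = self.capture()
        self.dictionary = server.generate_dictionary(self.target, images)
        return True

    def recognize(self, utterance):
        """Return dictionary words found in the utterance (toy recognizer)."""
        return [w for w in self.dictionary if w in utterance]
```

In this sketch the direct call to `server.generate_dictionary` collapses the transmit and receive steps into one function call; in a real system those would be separate network messages.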

An information processing method in a portable communication terminal capable of conversation and of autonomously imaging its surroundings, the method comprising the steps of:
having a conversation with a user of the communication terminal using a voice input unit and a voice output unit;
determining whether the communication terminal has arrived at a predetermined location based on position information representing a current position of the communication terminal;
imaging the surroundings of the communication terminal with a camera at the predetermined location;
transmitting the position information of the predetermined location and the image data obtained by the imaging to a server device; and
receiving, from the server device, dictionary data for speech recognition generated based on the position information and the image data,
wherein, in the step of having a conversation, the conversation with the user of the communication terminal is performed using the dictionary data.

An information processing method in a server device that communicates with a portable communication terminal capable of conversation and of autonomously imaging its surroundings, the method comprising the steps of:
receiving, from the communication terminal, image data of a subject imaged at a predetermined location by the communication terminal and position information of the predetermined location;
generating dictionary data for speech recognition based on the position information and the image data sent from the communication terminal; and
transmitting the generated dictionary data to the communication terminal.