JP7459020B2

JP7459020B2 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7459020B2
Application number: JP2021101353A
Authority: JP
Inventors: 浩二西田; 安昭兵藤; 瑠衣有地; 康大助光
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2024-04-01
Anticipated expiration: 2041-06-18
Also published as: JP2023000493A

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

A technique has been disclosed that extracts, from history information on a user's behavior, outlier information indicating outlier behavior that deviates from the user's normal behavior, and then estimates, based on the extracted outlier information, a user characteristic that is revealed by the outlier behavior.

JP 2019-49946 A

However, the conventional technology described above can estimate the characteristics of domestic users, who are normally active in Japan and whose behavior can be traced from history information, but it has no history information for users who visit temporarily from overseas (inbound visitors and the like) and therefore cannot trace their behavior. As a result, it cannot estimate the user characteristics of, for example, a visitor who lives overseas and is visiting Japan for the first time.

The present application was made in view of the above, and aims to appropriately acquire data on inbound visitors.

The information processing device according to the present application includes: an acquisition unit that acquires a user's voice when the user interacts through any one of smart speakers installed at a plurality of locations; an identification unit that identifies, through voice recognition using the user's voice, that users who interacted at different locations or at different dates and times are the same person; a management unit that associates the location and the date and time at which the user's voice was acquired and accumulates them as the user's behavior history; an estimation unit that estimates the language used by the user through voice recognition using the user's voice, and estimates from that language that the user is a foreign visitor to Japan; and a provision unit that provides information according to the language used by the user via a smart speaker or display device installed at the location where the user's voice was acquired.

According to one aspect of the embodiment, data on inbound visitors can be appropriately acquired.

FIG. 1 is an explanatory diagram showing an overview of an information processing method according to an embodiment.
FIG. 2 is a diagram showing a configuration example of an information processing system according to the embodiment.
FIG. 3 is a diagram showing a configuration example of an information providing device according to the embodiment.
FIG. 4 is a diagram showing an example of a visitor information database.
FIG. 5 is a diagram showing an example of a provided information database.
FIG. 6 is a flowchart showing a processing procedure according to the embodiment.
FIG. 7 is a diagram showing an example of a hardware configuration.

Below, modes for carrying out the information processing device, information processing method, and information processing program according to the present application (hereinafter referred to as "embodiments") are described in detail with reference to the drawings. The information processing device, information processing method, and information processing program according to the present application are not limited by these embodiments. In the following embodiments, identical parts are given identical reference numerals, and duplicate descriptions are omitted.

[1. Overview of information processing method]
First, an overview of the information processing method performed by the information processing device according to the embodiment is described with reference to FIG. 1. FIG. 1 is an explanatory diagram showing an overview of the information processing method according to the embodiment. FIG. 1 is described using, as an example, a case in which information is provided to visitors to Japan from overseas and their behavior is tracked.

As shown in FIG. 1, the information processing system 1 includes a terminal device 10, an information providing device 100, a smart speaker 200, and a camera 300. The information providing device 100, the smart speaker 200, and the camera 300 are connected via a network N (see FIG. 2), by wire or wirelessly, so that they can communicate with one another. In this embodiment, the information providing device 100 cooperates with the smart speaker 200 and the camera 300. Smart speakers 200 and cameras 300 are each installed at a plurality of locations.

The terminal device 10 is a smart device, such as a smartphone or tablet, used by a user U such as a visitor to Japan from overseas. It is a mobile terminal device that can communicate with any server device via a wireless communication network such as 4G (Generation) or LTE (Long Term Evolution). The terminal device 10 has a screen, such as a liquid crystal display, with a touch panel function, and accepts various operations on display data such as content (for example, tap, slide, and scroll operations) made by the user U with a finger, a stylus, or the like. An operation performed on the area of the screen where content is displayed may be treated as an operation on that content. The terminal device 10 is not limited to a smart device and may be a portable information processing device such as a notebook PC (Personal Computer).

In this embodiment, the terminal device 10 is not essential. The user U may be a person who does not carry a terminal device 10 (for example, a child).

The information providing device 100 is an information processing device that collects information about a user U, such as a visitor to Japan from overseas, and is realized by a server device, a cloud system, or the like. In this embodiment, the information providing device 100 cooperates with the smart speaker 200 and the camera 300 and collects information about the user U from them.

The smart speaker 200 is a speaker with an AI assistant function that supports interactive voice operation, and it recognizes speech with a built-in microphone. The smart speaker 200 also identifies the user U through voice recognition from the voice data obtained in dialogue with the user U. In addition, the smart speaker 200 provides information to the user U through dialogue with the user U.

The smart speaker 200 also estimates that the user U is a foreign visitor to Japan based on the language the user U used in the dialogue. For example, the smart speaker 200 estimates the language used by the user U from the user U's voice data, and estimates that the user U is a foreign visitor from a country or region where that language is an official language. However, languages and countries are not necessarily in a one-to-one relationship (for example, English, Spanish, and Arabic are official languages in many countries and regions), so estimating the user U's place of origin (home country, etc.) from the language used is not essential.
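
The estimation described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the language codes, the `LOCAL_LANGUAGES` set, the `OFFICIAL_IN` table, and the function name `estimate_visitor` are all assumptions introduced here, and the language is assumed to have already been detected from the voice data. Because one language can map to many countries, the sketch returns a list of candidate origins rather than a single answer.

```python
from dataclasses import dataclass
from typing import List

# Assumption: languages treated as "local" at the installation sites.
LOCAL_LANGUAGES = {"ja"}

# Hypothetical, deliberately incomplete table of where a language is official.
OFFICIAL_IN = {
    "en": ["US", "GB", "AU", "IN"],
    "es": ["ES", "MX", "AR"],
    "ko": ["KR"],
}


@dataclass
class VisitorEstimate:
    language: str
    is_foreign_visitor: bool
    candidate_origins: List[str]  # empty when the origin cannot be narrowed down


def estimate_visitor(language: str) -> VisitorEstimate:
    """Estimate visitor status from an already-detected language code."""
    is_visitor = language not in LOCAL_LANGUAGES
    # Origin estimation is optional: a language may be official in many places.
    origins = OFFICIAL_IN.get(language, []) if is_visitor else []
    return VisitorEstimate(language, is_visitor, origins)
```

For a language such as English the candidate list has several entries, which mirrors the point that the place of origin cannot be settled from the language alone.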

The smart speaker 200 may also elicit information related to the user U's place of origin (home country, etc.) during the dialogue (conversation) with the user U. For example, rather than asking directly for the name of the place of origin, it may touch on topics related to that place (trends, celebrities, local specialties, and so on). Alternatively, the smart speaker 200 may estimate the user U's place of origin from the content of the conversation.

The smart speaker 200 may also subdivide a single language further according to differences in dialect, accent, and the like. That is, the language used is not limited to a standard language and may be a regional (local) language. For example, even when two users speak what is broadly the same language, the smart speaker 200 may treat them as using different languages if their dialects or accents differ according to their places of origin. The smart speaker 200 may also estimate the user U's country of origin (the United States, China, etc.) or region (which part of the United States, etc.) from the dialect or accent in the language used.

The smart speaker 200 may also have the user U specify (select) a language or place of origin before the dialogue begins. In general, the user U is presumed to prefer dialogue and guidance in the language he or she is accustomed to.

The smart speaker 200 may also estimate, from the content of the dialogue (conversation) with the user U, whether the purpose of the user U's visit to Japan is business or sightseeing, and may change the information provided to the user U accordingly. The information the user U wants is presumed to differ depending on the purpose of the visit.

In practice, the smart speaker 200 may provide the voice data of the user U it interacted with to the information providing device 100, and the information providing device 100 may use that voice data to identify the user U through voice recognition, estimate the language used by the user U, and estimate that the user U is a foreign visitor to Japan.

The camera 300 is an imaging device mounted on the smart speaker 200 or installed near it, and it photographs the user U who is interacting with the smart speaker 200. The camera 300 also identifies the user U through image recognition from the captured image data. Image recognition is only one example; in practice, biometric authentication such as fingerprint, face, iris, or vein authentication may be used.

For example, some individuals can use more than one language (bilingual or trilingual speakers, and so on). Some people also try to communicate in English even though they are not from an English-speaking region. Even when the same person uses different languages, the person may be treated as the same person if image recognition estimates that this is the case. For example, even if the user U uses a different language at each location, when the camera 300 identifies the person as the user U through image recognition, the person is treated as the user U regardless of the language used.

Furthermore, the camera 300 may determine the user U's appearance, clothing, and belongings through image recognition using the user U's image data, and estimate from them that the user U is a foreign visitor to Japan. The camera 300 may also estimate the user U's preferences (objects of interest) from the user U's appearance, clothing, belongings, or behavior. The smart speaker 200 may then cooperate with the camera 300 and change the information provided to the user U according to the estimated preferences. For example, the smart speaker 200 may suggest products that match the estimated preferences of the user U.

The camera 300 may also estimate the user U's country of origin (the United States, China, etc.) or region (which part of the United States, etc.) from the user U's appearance, clothing, belongings, and the like. For example, if the user U has sunglasses from a particular manufacturer, the camera 300 estimates that the user U comes from the region where they are sold. Likewise, if the user U is carrying an item that is owned mainly by residents of a particular region, the camera 300 estimates that the user U comes from that region.

In practice, the camera 300 may provide the captured image data of the user U to the information providing device 100, and the information providing device 100 may use that image data to identify the user U through image recognition and estimate that the user U is a foreign visitor to Japan.

In the example shown in FIG. 1, for ease of identification, the smart speaker 200 and camera 300 installed at location A are referred to as smart speaker 200A and camera 300A, those installed at location B as smart speaker 200B and camera 300B, and those installed at location C as smart speaker 200C and camera 300C.

The example in FIG. 1 shows a smart speaker 200 and camera 300 pair at every location. In practice, however, it is sufficient for both to be present at the location of first contact with the user U (location A); at locations of second and subsequent contact (location B, location C), at least one of the smart speaker 200 and the camera 300 is enough. At a location where only the smart speaker 200 is installed, the user U is identified solely by voice recognition through dialogue with the user U. At a location where only the camera 300 is installed, the user U is identified solely by image recognition on a captured image of the user U.
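
The device arrangement described above can be expressed as a small check. This is a sketch under stated assumptions: the `Site` class and `identification_modalities` function are hypothetical names, and treating "both devices at the first-contact location" as a hard requirement is an interpretation made here for illustration (the text only says that having both there is sufficient).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str
    has_speaker: bool
    has_camera: bool


def identification_modalities(site: Site, first_contact: bool) -> List[str]:
    """Return which recognition methods can identify a user at this site."""
    modalities = []
    if site.has_speaker:
        modalities.append("voice")   # voice recognition through dialogue
    if site.has_camera:
        modalities.append("image")   # image recognition on a captured image
    if not modalities:
        raise ValueError(f"{site.name}: no smart speaker or camera installed")
    # Assumption: enrol both a voiceprint and an image at first contact so that
    # later sites with only one device can still match the same user.
    if first_contact and len(modalities) < 2:
        raise ValueError(f"{site.name}: first-contact site needs both devices")
    return modalities
```

Under this sketch, location A would report both modalities, while a camera-only location B would report image recognition alone.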

The camera 300 may also photograph a user U approaching the smart speaker 200, identify the user U by image recognition alone, and provide information about the identified user U (identification information, etc.) to the smart speaker 200. This allows the smart speaker 200 to address the approaching user U in the language that user uses.

The smart speaker 200 and camera 300 described above may be portable units set up in a simple, temporary manner at each location. For example, they may be mounted on or connected to a smart device, a notebook PC, or the like. The smart speaker 200 and camera 300 may also be wireless devices.

The smart speaker 200 and camera 300 may also be mounted in the same in-store terminal or digital signage (electronic signboard). They may also be mounted in a robot equipped with AI (Artificial Intelligence), such as an autonomous conversational robot.

The terminal device 10 of the user U may also serve as the smart speaker 200 and camera 300. For example, when the terminal device 10 of the user U reads a two-dimensional code such as a QR code (registered trademark) placed at a storefront or other specific location (or displayed on an in-store terminal), an app (application program) that lets the terminal device 10 function as the smart speaker 200 and camera 300 may be installed on it.

The locations where the smart speaker 200 and camera 300 are installed (location A, location B, location C) are, for example, individual stores: retail stores such as convenience stores, supermarkets, drugstores, home improvement stores, and discount stores, and eating and drinking establishments such as cafes, restaurants, and bars. A location may also be, for example, a large commercial facility (shopping center, outlet mall, underground shopping arcade), an entertainment facility (theme park, amusement park, game arcade, movie theater, zoo, aquarium, swimming pool, bathing facility), a cultural facility (hall, theater, library, art museum, museum), a multi-purpose complex, a sports facility, a parking lot, an expressway service area (SA) or parking area (PA), or a railway station, roadside station, airport, or port (boarding point). A location may also be, for example, a tourist information center, tourist spot, landmark, castle or castle ruin, temple or shrine, park, garden, famous site, historic site, scenic spot, hot spring, campsite, cottage, lodge, or bungalow, a facility around a mountain, coast, lakeside, or riverbed, or a tourist farm or ranch. Locations that visitors to Japan from overseas are likely to visit, or that are popular with them, are particularly preferable.

Beyond the above, the locations where the smart speaker 200 and camera 300 are installed may be, for example, housing complexes such as condominiums and apartments, detached houses, corporate office buildings, commercial facilities such as stores, accommodation facilities such as hotels, educational institutions such as schools, medical institutions such as hospitals, research institutions such as laboratories, industrial plants such as factories, or logistics bases such as distribution centers.

[1-1. Providing information to and tracking the behavior of visitors to Japan from overseas]
Here, the provision of information to, and behavior tracking of, visitors to Japan from overseas is described with reference to FIG. 1.

As shown in FIG. 1, the smart speaker 200 installed at location A acquires voice data of the user U it interacted with, and the camera 300 installed at location A acquires image data of the user U it photographed (step S1). At this point, the smart speaker 200 and the camera 300 may each transmit the acquired voice data and image data of the user U to the information providing device 100.

Next, the smart speaker 200 identifies the user U through voice recognition using the user U's voice data, and the camera 300 identifies the user U through image recognition using the user U's image data (step S2). In practice, the information providing device 100 may identify the user U through voice recognition and image recognition using the user U's voice data and image data. Identification information for distinguishing the identified user U is also assigned.
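
One common way to decide whether two utterances come from the same speaker is to compare fixed-length voice embeddings. The sketch below illustrates step S2 under that assumption; the source does not say which recognition method is used. The `UserRegistry` class, the 0.8 similarity threshold, and the idea that an upstream model has already turned audio into an embedding vector are all hypothetical. An unmatched voice is enrolled under a new random identifier, which holds no personal attributes.

```python
import math
import uuid
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class UserRegistry:
    """Maps voice embeddings to anonymous user identifiers."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value; would need tuning
        self.voiceprints: Dict[str, List[float]] = {}

    def identify(self, embedding: List[float]) -> str:
        """Return the matching user ID, enrolling a new one if none matches."""
        best_id, best_score = None, self.threshold
        for user_id, known in self.voiceprints.items():
            score = cosine(embedding, known)
            if score >= best_score:
                best_id, best_score = user_id, score
        if best_id is None:
            best_id = uuid.uuid4().hex  # new anonymous identification information
            self.voiceprints[best_id] = embedding
        return best_id
```

Calling `identify` at location A and again at location B with similar embeddings returns the same identifier, which is what allows visits at different locations to be linked to one person.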

Next, the information providing device 100 associates the identification information of location A (location information is also acceptable), where the smart speaker 200 and camera 300 acquired the voice data and image data of the user U, and information on the date and time at which the data was acquired with the identification information of the identified user U, and accumulates them as the user U's behavior history (step S3).
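
Step S3 amounts to appending a (location, date and time) record under the user's identifier. A minimal in-memory sketch follows; the `Visit` and `BehaviorHistory` names are hypothetical, and a real system would presumably persist the records in something like the visitor information database of FIG. 4, whose schema is not given in this section.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import DefaultDict, List


@dataclass(frozen=True)
class Visit:
    location_id: str       # identification (or position) information of the site
    observed_at: datetime  # date and time the voice or image data was acquired


class BehaviorHistory:
    """Accumulates visits keyed by anonymous user identifier."""

    def __init__(self) -> None:
        self._visits: DefaultDict[str, List[Visit]] = defaultdict(list)

    def record(self, user_id: str, location_id: str, observed_at: datetime) -> None:
        self._visits[user_id].append(Visit(location_id, observed_at))

    def route(self, user_id: str) -> List[str]:
        """Return the user's visited locations in chronological order."""
        visits = sorted(self._visits.get(user_id, []), key=lambda v: v.observed_at)
        return [v.location_id for v in visits]
```

Sorting by timestamp at read time means records can arrive from different sites in any order and still yield a consistent movement path.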

In practice, the information providing device 100 may acquire, from a smart speaker 200 (or an in-store terminal) that performed short-range wireless communication with the terminal device 10 of the user U it interacted with, the location information of the smart speaker 200 (or information on its installation site) and the identification information of the user U's terminal device 10. It may then identify the user U from the identification information of the terminal device 10, associate the location information of the smart speaker 200 with the date and time at which the identification information was acquired, and accumulate them as the user U's behavior history.

Next, the smart speaker 200 estimates the language used by the user U through voice recognition using the user U's voice data, and estimates from that language that the user U is a foreign visitor to Japan (step S4). In practice, the information providing device 100 may estimate the language used by the user U through voice recognition using the voice data transmitted from the smart speaker 200, and estimate from that language that the user U is a foreign visitor to Japan. Furthermore, the camera 300 may determine the user U's appearance, clothing, and belongings through image recognition using the user U's image data, and estimate from them that the user U is a foreign visitor to Japan.

Next, the smart speaker 200 cooperates with the information providing device 100 to provide information according to the language used by the user U (step S5). That is, the smart speaker 200 provides different content for each language (it changes the content provided depending on the language used). Information provided according to the user U's language includes, for example, guidance on sightseeing spots, products, and services recommended for visitors who use that language. Alternatively, it may be guidance to stores, facilities, and the like that meet the wishes (requests) the user U expressed during the dialogue. In this embodiment, the smart speaker 200 provides information aimed at visitors to Japan who use the language in question.
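
Step S5 can be pictured as a lookup keyed by language, optionally narrowed by a request raised in the dialogue. The sketch below is illustrative only: the `InfoItem` records, the sample messages, the topic labels, and the English fallback are assumptions, and the actual provided information database (FIG. 5) is not described in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InfoItem:
    language: str  # language the item is written for
    topic: str     # hypothetical topic label such as "sightseeing"
    message: str


# Hypothetical catalogue entries, for illustration only.
CATALOG = [
    InfoItem("en", "sightseeing", "A castle popular with English-speaking visitors is nearby."),
    InfoItem("en", "food", "This restaurant offers an English menu."),
    InfoItem("zh", "shopping", "附近的药妆店提供免税服务。"),
]

FALLBACK_LANGUAGE = "en"  # assumption: used when nothing exists for the language


def select_information(language: str, requested_topic: Optional[str] = None) -> List[InfoItem]:
    """Pick items for the user's language, preferring a topic the user asked about."""
    items = [i for i in CATALOG if i.language == language]
    if not items:
        items = [i for i in CATALOG if i.language == FALLBACK_LANGUAGE]
    if requested_topic is not None:
        matching = [i for i in items if i.topic == requested_topic]
        if matching:
            items = matching
    return items
```

Keeping the content per language, rather than translating one shared list, reflects the point that different language groups are offered different recommendations.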

For example, the information provided to the user U may be guidance to other locations (location B, location C, etc.), advertisements for products and services, or the issuing of coupons. It may be information on activities (recreation and experiences) that are popular among visitors who use the language in question. It may also be information on places that few visitors using that language go to (little-known spots and the like), since some visitors presumably want to avoid places where people from their own country gather. The information may also be an emergency alert or a warning for visitors who use the language in question.

Here, the smart speaker 200 provides information to the user U through voice guidance in the language the user U uses. In practice, the information providing device 100 may provide the information through the smart speaker 200 by voice guidance in that language. If a display device is mounted on or connected to the smart speaker 200, the smart speaker 200 may also provide information by showing guidance on that display device. That is, guidance in the user U's language may be voice guidance, displayed guidance using text and images, or both.

At this point, the information providing device 100 may perform machine learning on the behavior histories of other users who use the same language as the user U, and estimate information suited to the user U. For example, a learning model is built with a machine learning technique such as a neural network, using as the dataset the languages used and the places visited by the users of each language. The information providing device 100 then inputs the user U's language into the trained model and obtains, as its output, information on the place to recommend. Here, the information providing device 100 inputs the user U's language into a guidance destination estimation model generated through machine learning using, for example, an RNN (Recurrent Neural Network) or LSTM (Long Short-Term Memory), and obtains information on the place to recommend as the model's output.
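
To make the input and output of the guidance destination estimation model concrete, the sketch below substitutes a much simpler technique: a per-language visit count, not the RNN or LSTM named in the text. It is a baseline for illustration only. The `DestinationBaseline` class and its methods are hypothetical; the training records are (language, location) pairs of the kind the behavior histories would supply.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Tuple


class DestinationBaseline:
    """Frequency baseline: recommend the places most visited by users of a language."""

    def __init__(self) -> None:
        self._counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def fit(self, records: Iterable[Tuple[str, str]]) -> None:
        """Count visits from (language, location_id) pairs."""
        for language, location_id in records:
            self._counts[language][location_id] += 1

    def recommend(self, language: str, k: int = 3) -> List[str]:
        """Return up to k locations, most visited first."""
        counts = self._counts.get(language, Counter())
        return [location for location, _ in counts.most_common(k)]
```

A sequence model would additionally take the order of visits into account; this baseline ignores order and only shows the language-in, place-out interface.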

The RNN or LSTM may be a neural network based on an attention mechanism. Attention can handle data, such as sentences, in which the order of elements matters. The information providing device 100 may also use a similar natural language processing model. By using such a model to estimate the place to recommend from various information including the user U's language, the estimation can place greater weight on the order of the information.
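
For readers unfamiliar with attention, the core operation in its widely used scaled dot-product form is sketched below in plain Python. This is the standard textbook formulation, not something specified in the source; the function names are chosen here for illustration. Note that this operation on its own weighs items by relevance; order information is normally supplied separately, for example through positional encodings or a recurrent layer.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], keys: List[List[float]], values: List[List[float]]) -> List[float]:
    """Scaled dot-product attention for a single query vector."""
    d_k = len(query)
    # Similarity of the query to each key, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

When every key is equally similar to the query, the weights are uniform and the output is simply the average of the value vectors.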

また、情報提供装置１００は、利用者Ｕの音声データや画像データから利用者Ｕを特定する精度を向上させるために機械学習を行ってもよい。音声データや画像データは、機械学習の手法と相性が良い。利用者Ｕの識別情報と利用者Ｕの音声データや画像データとの組を正解データとして学習を重ねることで、利用者Ｕを特定する精度が向上してくことが期待できる。利用者Ｕの音声データから、利用者Ｕの使用言語を推定する場合についても同様である。 Further, the information providing device 100 may perform machine learning in order to improve the accuracy of identifying the user U from the user U's voice data and image data. Audio data and image data are compatible with machine learning methods. By repeating learning using a set of user U's identification information and user U's audio data and image data as correct data, it can be expected that the accuracy of identifying user U will improve. The same applies to the case where the language used by the user U is estimated from the user U's voice data.

これにより、海外からの訪日客等の言語コミュニケーションギャップや情報不足を解決するとともに、国内でサービスを展開する企業等が取得できない海外からの訪日客等の人流やニーズを取得することができる。 This resolves the language communication gap and lack of information faced by visitors to Japan from overseas, and also makes it possible to capture the movement flows and needs of such visitors, which companies providing services domestically are otherwise unable to obtain.

また、本実施形態では、利用者Ｕの属性等の個人情報は収集しない。そのため、仮に利用者Ｕの特定や使用言語の推定に失敗したとしても、利用者Ｕの個人情報の流出や漏洩の心配はない。利用者Ｕの特定や使用言語の推定に失敗した場合には、利用者Ｕに対する情報提供の精度が低下するだけである。 Furthermore, in this embodiment, personal information such as attributes of the user U is not collected. Therefore, even if identification of the user U or estimation of the language used fails, there is no risk of the user U's personal information being leaked or disclosed. If identification of the user U or estimation of the language used fails, the only consequence is that the accuracy of the information provided to the user U decreases.

〔２．情報処理システムの構成例〕 [2. Configuration example of information processing system]
次に、図２を用いて、実施形態に係る情報提供装置１００が含まれる情報処理システム１の構成について説明する。図２は、実施形態に係る情報処理システム１の構成例を示す図である。図２に示すように、実施形態に係る情報処理システム１は、情報提供装置１００とスマートスピーカ２００とカメラ３００とを含む。これらの各種装置は、ネットワークＮを介して、有線又は無線により通信可能に接続される。ネットワークＮは、例えば、ＬＡＮ（Local Area Network）や、インターネット等のＷＡＮ（Wide Area Network）である。 Next, the configuration of the information processing system 1 including the information providing device 100 according to the embodiment will be described using FIG. 2. FIG. 2 is a diagram showing a configuration example of the information processing system 1 according to the embodiment. As shown in FIG. 2, the information processing system 1 according to the embodiment includes an information providing device 100, a smart speaker 200, and a camera 300. These devices are communicably connected via a network N by wire or wirelessly. The network N is, for example, a LAN (Local Area Network) or a WAN (Wide Area Network) such as the Internet.

また、図２に示す情報処理システム１に含まれる各装置の数は図示したものに限られない。例えば、図２では、図示の簡略化のため、スマートスピーカ２００とカメラ３００とをそれぞれ３台だけ示したが、これはあくまでも例示であって限定されるものではなく、４台以上であってもよい。 Furthermore, the number of devices included in the information processing system 1 shown in FIG. 2 is not limited to what is illustrated. For example, FIG. 2 shows only three smart speakers 200 and three cameras 300 to simplify the illustration, but this is merely an example and not a limitation; there may be four or more of each.

また、かかるスマートスピーカ２００及びカメラ３００は、ＬＴＥ（Long Term Evolution）、４Ｇ（4th Generation）、５Ｇ（5th Generation：第５世代移動通信システム）等の無線通信網や、Ｂｌｕｅｔｏｏｔｈ（登録商標）、無線ＬＡＮ（Local Area Network）等の近距離無線通信を介してネットワークＮに接続し、情報提供装置１００と通信することも可能である。また、スマートスピーカ２００は、近距離無線通信を介して端末装置１０と通信することも可能である。 The smart speaker 200 and the camera 300 can also connect to the network N and communicate with the information providing device 100 via a wireless communication network such as LTE (Long Term Evolution), 4G (4th Generation), or 5G (5th Generation mobile communication system), or via short-range wireless communication such as Bluetooth (registered trademark) or wireless LAN (Local Area Network). The smart speaker 200 can also communicate with the terminal device 10 via short-range wireless communication.

情報提供装置１００は、例えばＰＣやサーバ装置、あるいはメインフレーム又はワークステーション等である。なお、情報提供装置１００は、クラウドコンピューティングにより実現されてもよい。 The information providing device 100 is, for example, a PC, a server device, a mainframe, a workstation, etc. The information providing device 100 may be realized by cloud computing.

〔３．情報提供装置の構成例〕 [3. Configuration example of information providing device]
次に、図３を用いて、実施形態に係る情報提供装置１００の構成について説明する。図３は、実施形態に係る情報提供装置１００の構成例を示す図である。図３に示すように、情報提供装置１００は、通信部１１０と、記憶部１２０と、制御部１３０とを有する。 Next, the configuration of the information providing device 100 according to the embodiment will be described using FIG. 3. FIG. 3 is a diagram showing a configuration example of the information providing device 100 according to the embodiment. As shown in FIG. 3, the information providing device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.

（通信部１１０） (Communication unit 110)
通信部１１０は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。また、通信部１１０は、ネットワークＮ（図２参照）と有線又は無線で接続される。 The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to the network N (see FIG. 2) by wire or wirelessly.

（記憶部１２０） (Storage unit 120)
記憶部１２０は、例えば、ＲＡＭ（Random Access Memory）、フラッシュメモリ（Flash Memory）等の半導体メモリ素子、又は、ＨＤＤ（Hard Disk Drive）、ＳＳＤ（Solid State Drive）、光ディスク等の記憶装置によって実現される。図３に示すように、記憶部１２０は、訪日客情報データベース１２１と、提供情報データベース１２２とを有する。 The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk. As shown in FIG. 3, the storage unit 120 includes a visitor information database 121 and a provided information database 122.

（訪日客情報データベース１２１） (Visitor information database 121)
訪日客情報データベース１２１は、海外からの訪日客（来訪者）等である利用者Ｕに関する各種情報を記憶する。図４は、訪日客情報データベース１２１の一例を示す図である。図４に示した例では、訪日客情報データベース１２１は、「利用者ＩＤ（Identifier）」、「音声」、「画像」、「取得場所」、「取得日時」、「使用言語」といった項目を有する。 The visitor information database 121 stores various information regarding users U, such as visitors to Japan from overseas. FIG. 4 is a diagram showing an example of the visitor information database 121. In the example shown in FIG. 4, the visitor information database 121 has items such as "user ID (identifier)", "audio", "image", "acquisition location", "acquisition date and time", and "language used".

「利用者ＩＤ」は、利用者Ｕを識別するための識別情報を示す。なお、「利用者ＩＤ」は、情報提供装置１００、又はスマートスピーカ２００及びカメラ３００が自動的に割り当てた識別番号であってもよい。 “User ID” indicates identification information for identifying user U. Note that the "user ID" may be an identification number automatically assigned by the information providing device 100 or the smart speaker 200 and camera 300.

また、「音声」は、利用者Ｕと対話したスマートスピーカ２００により取得された利用者Ｕの音声データを示す。なお、実際には、その音声データの所在（格納場所：ディレクトリ及びファイル名、ＵＲＬ（Uniform Resource Locator）等）を示す情報であってもよい。 Moreover, "audio" indicates the user U's voice data acquired by the smart speaker 200 that interacted with the user U. Note that actually, the information may be information indicating the location of the audio data (storage location: directory and file name, URL (Uniform Resource Locator), etc.).

また、「画像」は、利用者Ｕを撮影したカメラ３００により取得された利用者Ｕの画像データを示す。なお、実際には、その画像データの所在（格納場所：ディレクトリ及びファイル名、ＵＲＬ等）を示す情報であってもよい。 In addition, "image" refers to image data of user U acquired by the camera 300 that photographed user U. Note that, in reality, it may also be information indicating the location of the image data (storage location: directory and file name, URL, etc.).

また、「取得場所」は、スマートスピーカ２００が音声データを取得した場所や、カメラ３００が画像データを取得した場所に関する情報を示す。すなわち、スマートスピーカ２００及びカメラ３００の設置場所に関する情報を示す。なお、場所に関する情報は、位置情報や住所等であってもよいし、設置場所の名称や識別情報等であってもよい。 Further, "acquisition location" indicates information regarding the location where the smart speaker 200 acquired the audio data and the location where the camera 300 acquired the image data. That is, information regarding the installation locations of the smart speaker 200 and camera 300 is shown. Note that the information regarding the location may be location information, address, etc., or may be the name of the installation location, identification information, etc.

また、「取得日時」は、スマートスピーカ２００が音声データを取得した日時や、カメラ３００が画像データを取得した日時に関する情報を示す。ここでは、取得場所と取得日時とに関する情報は紐づけられている。 Furthermore, the “acquisition date and time” indicates information regarding the date and time when the smart speaker 200 acquired the audio data and the date and time when the camera 300 acquired the image data. Here, information regarding the acquisition location and acquisition date and time are linked.

また、「使用言語」は、利用者Ｕの音声データ等から推定された利用者Ｕの使用言語の種別を示す。使用言語は、例えば、英語、フランス語、スペイン語、中国語、ロシア語、アラビア語等、各国又は各地域で使用されている言語である。また、使用言語は、方言や訛り等の違いにより、さらに細分化されてもよい。ここでは、音声データを取得した場所ごとに、使用言語を毎回推定するものとしているが、実際には最初の１回だけでもよい。また、利用者Ｕが複数の言語を使用した場合には、使用言語は複数であってもよい。 Furthermore, "language used" indicates the type of language used by user U that is estimated from user U's voice data, etc. Examples of languages used include English, French, Spanish, Chinese, Russian, Arabic, and other languages used in each country or region. Languages used may also be further subdivided based on differences in dialects, accents, and the like. Here, the language used is estimated each time for each location where voice data is obtained, but in reality it may be estimated only the first time. Furthermore, if user U uses multiple languages, the number of languages used may be multiple.

例えば、図４に示す例において、利用者ＩＤ「Ｕ１」により識別される利用者Ｕの音声データ「音声＃１Ａ」及び画像データ「画像＃１Ａ」が取得された場所は「場所Ａ」であり、取得された日時は「2021/6/5」であり、利用者Ｕの使用言語は「中国語」であることを示す。なお、中国語は、北京語や広東語のようにさらに細分化されていてもよい。 For example, in the example shown in FIG. 4, the location where the audio data "audio #1A" and the image data "image #1A" of the user U identified by the user ID "U1" are acquired is "location A". , the acquired date and time is "2021/6/5", indicating that the language used by user U is "Chinese". Note that Chinese may be further subdivided into Mandarin and Cantonese.

ここで、図４に示す例では、「Ｕ１」、「音声＃１Ａ」、「画像＃１Ａ」及び「場所Ａ」といった抽象的な値を用いて図示するが、「Ｕ１」、「音声＃１Ａ」、「画像＃１Ａ」及び「場所Ａ」には、具体的な文字列や数値等の情報が記憶されるものとする。以下、他の情報に関する図においても、抽象的な値を図示する場合がある。 Here, in the example shown in FIG. 4, abstract values such as "U1", "audio #1A", "image #1A", and "location A" are used for illustration, but "U1", "audio #1A" ", "Image #1A" and "Location A" store information such as specific character strings and numerical values. Below, abstract values may be illustrated in diagrams related to other information as well.

なお、訪日客情報データベース１２１は、上記に限らず、目的に応じて種々の情報を記憶してもよい。例えば、訪日客情報データベース１２１は、利用者Ｕの端末装置１０に関する各種情報を記憶してもよい。また、訪日客情報データベース１２１は、利用者Ｕの使用言語や容姿等から推定された利用者Ｕの出身地（母国等）に関する情報を記憶してもよい。 Note that the visitor information database 121 is not limited to the above, and may store various information depending on the purpose. For example, the visitor information database 121 may store various information regarding the terminal device 10 of the user U. The visitor information database 121 may also store information regarding the user U's place of origin (home country, etc.), estimated from the language used by the user U, the user U's appearance, and the like.
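
The items of FIG. 4 can be pictured as a simple record type. The following is a minimal sketch, not the patent's implementation; the class name `VisitorRecord` and its field names are hypothetical, and the audio and image fields hold references (path or URL) rather than the data itself, as the description allows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class VisitorRecord:
    """One row of the visitor information database (field names are assumptions)."""
    user_id: str                 # automatically assigned identifier, e.g. "U1"
    audio_ref: str               # where the audio data is stored (path or URL)
    image_ref: str               # where the image data is stored (path or URL)
    location: str                # installation place of the speaker/camera
    acquired_on: date            # acquisition date, linked to the location
    languages: Tuple[str, ...]   # one or more estimated languages


# The example row described for FIG. 4, using its abstract placeholder values.
record = VisitorRecord(
    user_id="U1",
    audio_ref="audio#1A",
    image_ref="image#1A",
    location="Location A",
    acquired_on=date(2021, 6, 5),
    languages=("Chinese",),
)
```

Note that no personal attributes appear in the record, which matches the embodiment's statement that such information is not collected.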

（提供情報データベース１２２） (Provided information database 122)
提供情報データベース１２２は、同じ言語を使用する訪日客に対して提供すべき情報を記憶する。図５は、提供情報データベース１２２の一例を示す図である。図５に示した例では、提供情報データベース１２２は、「使用言語」、「滞在地」、「訪問先」、「情報」といった項目を有する。 The provided information database 122 stores information to be provided to visitors to Japan who use the same language. FIG. 5 is a diagram showing an example of the provided information database 122. In the example shown in FIG. 5, the provided information database 122 has items such as "language used", "place of stay", "destination visited", and "information".

「使用言語」は、海外からの訪日客（来訪者）等が使用する言語を示す。 “Language used” indicates the language used by foreign visitors to Japan (visitors).

また、「滞在地」は、過去に当該使用言語を使用する訪日客がスマートスピーカ２００と対話した場所を示す。すなわち、滞在地は、過去に当該使用言語を使用する他の訪日客に対して情報提供を行った場所を示す。なお、「滞在地」は、現在、当該使用言語を使用して訪日客と対話中のスマートスピーカ２００の所在地（当該使用言語で対話している訪日客の現在地）と照合される。 The "place of stay" indicates a place where visitors to Japan who use that language have interacted with a smart speaker 200 in the past. That is, the place of stay indicates a place where information was previously provided to other visitors who use that language. The "place of stay" is matched against the location of the smart speaker 200 that is currently interacting with a visitor in that language (that is, the current location of the visitor who is conversing in that language).

また、「訪問先」は、過去に当該使用言語を使用する訪日客が滞在地から次に向かった場所（目的地等）を示す。すなわち、訪問先は、過去に当該使用言語を使用する他の訪日客が滞在地から次に向かった場所を示す。なお、訪問先は、過去に当該使用言語を使用する訪日客の集団の行動履歴から抽出される。このとき、各訪問先へ向かった過去の訪日客の人数等から、各訪問先に順位をつけて順位の高い順に並べてもよい（訪問先のランキング）。 The "destination visited" indicates the place (destination, etc.) to which visitors to Japan who use that language went next from the place of stay in the past. That is, it indicates where other visitors who use that language previously went next from the place of stay. The destinations visited are extracted from the past action histories of the group of visitors who use that language. The destinations may be ranked, for example by the number of past visitors who went to each one, and arranged in descending order of rank (destination ranking).

また、「情報」は、現在対話している訪日客（当該使用言語を使用している訪日客）が滞在地から次に向かうと予想される場所（訪問先）に関する情報を示す。すなわち、情報は、現在対話している訪日客に提供すべき情報を示す。このとき、情報も訪問先のランキングの順に並べて、訪問先のランキングの順に訪日客に情報提供されるようにしてもよい。なお、情報は、次に向かうと予想される場所（訪問先）に関する情報に限らず、旬な情報やおすすめ情報等であってもよい。また、情報は、当該使用言語を使用する訪日客に対する緊急速報や注意喚起等であってもよい。 The "information" indicates information regarding the place (destination visited) to which the visitor currently in conversation (a visitor using that language) is expected to go next from the place of stay. That is, it is the information to be provided to the visitor currently in conversation. The information may also be arranged in the order of the destination ranking and provided to the visitor in that order. The information is not limited to information about the expected next destination; it may also be timely information, recommendations, and the like. It may also be an emergency alert or a warning for visitors who use that language.

例えば、図５に示す例において、使用言語「中国語」を使用する訪日客は、滞在地が「場所Ａ」である場合、次の訪問先として「場所Ｂ」に向かう傾向にあるため、情報提供として場所Ｂの最新情報や場所Ａから場所Ｂまでのルート案内及びその周辺の店舗・施設等に関する「情報Ｂ」を提供すると好ましいことを示す。また、情報Ｂは、場所Ｂと直接関係なく、場所Ｂに関連するカテゴリーの情報であってもよい。 For example, in the example shown in Figure 5, if a visitor to Japan whose language is "Chinese" stays at "Location A," he/she tends to head to "Location B" as his/her next destination. This indicates that it is preferable to provide the latest information on location B, route guidance from location A to location B, and "information B" regarding surrounding stores and facilities. Further, the information B may be information of a category related to the place B without being directly related to the place B.

ここで、図５に示す例では、「場所Ａ」、「場所Ｂ」及び「情報Ｂ」といった抽象的な値を用いて図示するが、「場所Ａ」、「場所Ｂ」及び「情報Ｂ」には、具体的な文字列や数値等の情報が記憶されるものとする。以下、他の情報に関する図においても、抽象的な値を図示する場合がある。 Here, the example shown in FIG. 5 is illustrated using abstract values such as "location A", "location B", and "information B", but it is assumed that specific information such as character strings and numerical values is stored in "location A", "location B", and "information B". Abstract values may likewise be used in the figures relating to other information below.

なお、提供情報データベース１２２は、上記に限らず、目的に応じて種々の情報を記憶してもよい。例えば、提供情報データベース１２２は、情報提供される訪問先の人気度（訪問人数）に関する情報を記憶してもよい。また、提供情報データベース１２２は、当該使用言語を使用する訪日客のうち、情報提供された訪問先へ実際に訪れた訪日客の人数や割合に関する情報を記憶してもよい。また、提供情報データベース１２２は、訪問先に限らず、当該使用言語を使用する訪日客に人気のサービスやアクティビティ（遊び・体験）等に関する情報を記憶してもよい。また、提供情報データベース１２２は、当該使用言語を使用する訪日客があまりいない場所（穴場等）に関する情報を記憶してもよい。また、提供情報データベース１２２は、当該使用言語を使用する訪日客に関する情報を、訪日の目的（ビジネス／観光）によって分類して記憶してもよい。 The provided information database 122 may store various information according to the purpose, not limited to the above. For example, the provided information database 122 may store information on the popularity (number of visitors) of the destination for which information is provided. The provided information database 122 may also store information on the number and percentage of visitors to Japan who use the language in question who actually visited the destination for which information is provided. The provided information database 122 may also store information on services and activities (play/experiences) that are popular among visitors to Japan who use the language in question, in addition to the destination. The provided information database 122 may also store information on places (hidden spots, etc.) where there are not many visitors to Japan who use the language in question. The provided information database 122 may also store information on visitors to Japan who use the language in question, classified according to the purpose of their visit to Japan (business/tourism).
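
The destination ranking described for the provided information database can be sketched as follows. This is a plain frequency count over past action histories, offered only as an illustration under assumptions; it is not the neural-network model mentioned earlier in the description, and the function name `build_destination_ranking` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def build_destination_ranking(histories):
    """Rank next destinations per (language, place of stay).

    histories: iterable of (language, [place, place, ...]) pairs, where each
    list gives the places one past visitor was identified at, in visiting order.
    Returns {(language, stay): [destinations, most frequent first]}.
    """
    counts = defaultdict(Counter)
    for language, places in histories:
        # Each consecutive pair is one "place of stay -> next destination" move.
        for stay, next_place in zip(places, places[1:]):
            counts[(language, stay)][next_place] += 1
    return {
        key: [place for place, _ in counter.most_common()]
        for key, counter in counts.items()
    }


# Hypothetical histories of past visitors, using the abstract place names.
histories = [
    ("Chinese", ["Location A", "Location B"]),
    ("Chinese", ["Location A", "Location B", "Location C"]),
    ("Chinese", ["Location A", "Location C"]),
    ("English", ["Location A", "Location D"]),
]
ranking = build_destination_ranking(histories)
```

With these inputs, the top-ranked entry for a Chinese-speaking visitor at "Location A" is "Location B", which corresponds to the FIG. 5 example in which "information B" would be provided.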

（制御部１３０） (Control unit 130)
図３に戻り、説明を続ける。制御部１３０は、コントローラ（Controller）であり、例えば、ＣＰＵ（Central Processing Unit）、ＭＰＵ（Micro Processing Unit）、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等によって、情報提供装置１００の内部の記憶装置に記憶されている各種プログラム（情報処理プログラムの一例に相当）がＲＡＭ等の記憶領域を作業領域として実行されることにより実現される。図３に示す例では、制御部１３０は、取得部１３１と、特定部１３２と、管理部１３３と、推定部１３４と、提供部１３５とを有する。 Returning to FIG. 3, the description continues. The control unit 130 is a controller and is realized by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array) executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information providing device 100, using a storage area such as a RAM as a work area. In the example shown in FIG. 3, the control unit 130 includes an acquisition unit 131, an identification unit 132, a management unit 133, an estimation unit 134, and a providing unit 135.

（取得部１３１） (Acquisition unit 131)
取得部１３１は、スマートスピーカ２００が利用者Ｕ（ユーザ）と対話した際に、スマートスピーカ２００及びカメラ３００から、通信部１１０を介して、利用者Ｕの音声データ及び画像データを取得する。例えば、取得部１３１は、利用者Ｕと対話した店頭設置端末から利用者Ｕの音声データ及び画像データを取得する。あるいは、取得部１３１は、スマートスピーカ２００及びカメラ３００としての機能を有する利用者Ｕの端末装置１０を介して利用者Ｕと対話した際に、利用者Ｕの端末装置１０から利用者Ｕの音声データ及び画像データを取得する。 When the smart speaker 200 interacts with the user U, the acquisition unit 131 acquires the voice data and image data of the user U from the smart speaker 200 and the camera 300 via the communication unit 110. For example, the acquisition unit 131 acquires the voice data and image data of the user U from an in-store terminal that interacted with the user U. Alternatively, when the interaction with the user U takes place via the user U's terminal device 10, which functions as the smart speaker 200 and the camera 300, the acquisition unit 131 acquires the voice data and image data of the user U from that terminal device 10.

また、取得部１３１は、利用者Ｕの端末装置１０と近距離無線通信を行った店頭設置端末から、店頭設置端末の所在情報（位置情報、識別情報）と利用者Ｕの端末装置１０の識別情報とを取得してもよい。 The acquisition unit 131 may also acquire, from an in-store terminal that has performed short-range wireless communication with the terminal device 10 of the user U, the location information (position information, identification information) of the in-store terminal and the identification information of the terminal device 10 of the user U.

（特定部１３２） (Identification unit 132)
特定部１３２は、利用者Ｕの音声データに基づく音声認識、及び利用者Ｕの画像データに基づく画像認識により、利用者Ｕを特定する。また、特定部１３２は、利用者Ｕの端末装置１０の識別情報から利用者Ｕを特定してもよい。 The identification unit 132 identifies the user U through voice recognition based on the voice data of the user U and image recognition based on the image data of the user U. The identification unit 132 may also identify the user U from the identification information of the terminal device 10 of the user U.

（管理部１３３） (Management unit 133)
管理部１３３は、利用者Ｕを特定した場所と日時とを紐づけて利用者Ｕの行動履歴として蓄積する。すなわち、管理部１３３は、利用者Ｕの音声データ及び画像データを取得した場所及び日時を紐づけて利用者Ｕの行動履歴として蓄積する。このとき、管理部１３３は、利用者Ｕを特定した場所ごとに、利用者Ｕを特定した場所と日時とを紐づけて利用者Ｕの行動履歴として蓄積することで、利用者Ｕが提供部１３５による情報提供で案内された場所を訪れたことを示す履歴情報を蓄積する。 The management unit 133 associates the location where the user U was identified with the date and time, and accumulates them as the action history of the user U. That is, the management unit 133 associates the location and the date and time at which the voice data and image data of the user U were acquired, and accumulates them as the action history of the user U. By doing so for each location where the user U is identified, the management unit 133 accumulates history information indicating that the user U visited a place to which the user was guided by the information provided by the providing unit 135.

また、管理部１３３は、店頭設置端末の所在情報と利用者Ｕの端末装置１０の識別情報を取得した日時とを紐づけて利用者Ｕの行動履歴として蓄積してもよい。 Further, the management unit 133 may associate the location information of the in-store terminal with the date and time when the identification information of the terminal device 10 of the user U is acquired, and store the information as the user U's action history.
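
The accumulation performed by the management unit 133 can be illustrated with a small in-memory store. This is a sketch under assumptions, not the patent's implementation; the class `ActionHistory` and its method names are hypothetical. It links each identification to a location and timestamp, and reports whether the location is one the user was previously guided to.

```python
from collections import defaultdict
from datetime import datetime


class ActionHistory:
    """In-memory action history keyed by user ID (illustrative only)."""

    def __init__(self):
        self._visits = defaultdict(list)  # user_id -> [(timestamp, location)]
        self._guided = defaultdict(set)   # user_id -> places recommended so far

    def record_guidance(self, user_id, place):
        """Remember that the user was guided to `place`."""
        self._guided[user_id].add(place)

    def record_identification(self, user_id, location, when):
        """Store one (location, date and time) entry for the user.

        Returns True when the location is one the user was previously
        guided to, i.e. the guidance was apparently followed.
        """
        self._visits[user_id].append((when, location))
        return location in self._guided[user_id]

    def visits(self, user_id):
        """Return the user's entries in chronological order."""
        return sorted(self._visits[user_id])


history = ActionHistory()
first = history.record_identification("U1", "Location A", datetime(2021, 6, 5, 10, 0))
history.record_guidance("U1", "Location B")
second = history.record_identification("U1", "Location B", datetime(2021, 6, 5, 13, 0))
```

Here `first` is False because no guidance had been given yet, while `second` is True because "Location B" was recommended before the user was identified there.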

（推定部１３４） (Estimation unit 134)
推定部１３４は、利用者Ｕの音声データから、利用者Ｕの使用言語を推定する。このとき、推定部１３４は、利用者Ｕとの対話（会話）の内容から、利用者Ｕの使用言語を推定してもよい。また、推定部１３４は、利用者Ｕが入力又は指定（選択）した文字の画像データから、利用者Ｕの使用言語を推定してもよい。あるいは、推定部１３４は、利用者Ｕが手にしているガイドブックや観光パンフレット等に記載されている言語から、利用者Ｕの使用言語を推定してもよい。また、推定部１３４は、利用者Ｕが入力又は指定（選択）した使用言語又は出身地に関する情報から、利用者Ｕの使用言語を推定してもよい。 The estimation unit 134 estimates the language used by the user U from the voice data of the user U. At this time, the estimation unit 134 may estimate the language used by the user U from the content of the dialogue (conversation) with the user U. The estimation unit 134 may also estimate the language used by the user U from image data of characters input or specified (selected) by the user U. Alternatively, the estimation unit 134 may estimate the language used by the user U from the language written in a guidebook, tourist pamphlet, or the like held by the user U. The estimation unit 134 may also estimate the language used by the user U from information on the language used or the place of origin input or specified (selected) by the user U.
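
For the case where the language is estimated from characters the user entered, one very rough approach is to look at the writing script of those characters. The sketch below is an assumption-heavy heuristic, not the method of the patent: it uses Unicode character names from Python's standard `unicodedata` module, the script-to-language table is hypothetical, and Latin script alone cannot distinguish English, French, Spanish, and so on, so it is reported as undetermined.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from Unicode name prefixes to a candidate language.
_SCRIPT_TO_LANGUAGE = [
    ("HIRAGANA", "Japanese"),
    ("KATAKANA", "Japanese"),
    ("HANGUL", "Korean"),
    ("CYRILLIC", "Russian"),
    ("ARABIC", "Arabic"),
    ("CJK", "Chinese"),
    ("LATIN", "Latin-script (undetermined)"),
]


def guess_language(text):
    """Guess a candidate language from the script of the letters in `text`.

    Returns None when the text contains no letters at all.
    """
    votes = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        for prefix, language in _SCRIPT_TO_LANGUAGE:
            if name.startswith(prefix):
                votes[language] += 1
                break
    if not votes:
        return None
    # Kana alongside ideographs indicates Japanese rather than Chinese.
    if votes["Japanese"]:
        return "Japanese"
    return votes.most_common(1)[0][0]
```

A production system would more plausibly rely on a trained language identification model, particularly for speech; the sketch only shows how input characters can carry a usable signal.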

また、推定部１３４は、利用者Ｕの使用言語により利用者Ｕが訪日外国人であると推定する。このとき、推定部１３４は、利用者Ｕの使用言語と利用者Ｕの容姿とにより利用者Ｕが訪日外国人であると推定してもよい。なお、容姿は、利用者Ｕの年齢層（年代）や性別、人種、その他の利用者Ｕの身体的特徴を含む。また、推定部１３４は、利用者Ｕの使用言語と利用者Ｕの衣装及び所持品とにより利用者Ｕが訪日外国人であると推定してもよい。なお、衣装は、衣服、被り物（帽子等）、履物（靴等）等、身にまとうもの全般を含む。また、所持品は、鞄や小物、装飾品、携行品も含む。例えば、推定部１３４は、所持品に付帯する記号（マーク）や象徴（シンボル）、所持品の形状や模様等から利用者Ｕが訪日外国人であると推定してもよい。 The estimation unit 134 also estimates, from the language used by the user U, that the user U is a foreign visitor to Japan. At this time, the estimation unit 134 may make this estimation from the language used by the user U together with the user U's appearance. Here, appearance includes the user U's age group, gender, race, and other physical characteristics. The estimation unit 134 may also make this estimation from the language used by the user U together with the user U's clothing and belongings. Clothing includes anything worn on the body, such as clothes, headwear (hats, etc.), and footwear (shoes, etc.). Belongings include bags, small articles, ornaments, and other carried items. For example, the estimation unit 134 may estimate that the user U is a foreign visitor to Japan from marks and symbols on the belongings, or from the shape and pattern of the belongings.

（提供部１３５） (Providing unit 135)
提供部１３５は、利用者Ｕの使用言語による案内で利用者Ｕに情報提供する。このとき、提供部１３５は、利用者Ｕの使用言語による音声案内及び案内表示のうち少なくとも一方で利用者Ｕに情報提供してもよい。また、提供部１３５は、利用者Ｕの使用言語と、利用者Ｕの容姿、衣装及び所持品のうち少なくとも１つとに応じて、利用者Ｕへの情報提供の内容を変更する。 The providing unit 135 provides information to the user U by guidance in the language used by the user U. At this time, the providing unit 135 may provide the information by at least one of voice guidance and displayed guidance in the language used by the user U. In addition, the providing unit 135 changes the content of the information provided to the user U depending on the language used by the user U and at least one of the appearance, clothing, and belongings of the user U.

〔４．処理手順〕 [4. Processing procedure]
次に、図６を用いて実施形態に係る情報提供装置１００、スマートスピーカ２００及びカメラ３００による処理手順について説明する。図６は、実施形態に係る処理手順を示すフローチャートである。なお、以下に示す処理手順は、情報提供装置１００、スマートスピーカ２００及びカメラ３００によって繰り返し実行される。 Next, the processing procedure performed by the information providing device 100, the smart speaker 200, and the camera 300 according to the embodiment will be described using FIG. 6. FIG. 6 is a flowchart showing the processing procedure according to the embodiment. Note that the processing procedure shown below is repeatedly executed by the information providing device 100, the smart speaker 200, and the camera 300.

図６に示すように、スマートスピーカ２００は、海外からの訪日客等である利用者Ｕと対話する（ステップＳ１０１）。このとき、スマートスピーカ２００は、情報提供装置１００と連携して、利用者Ｕとの対話の内容を情報提供装置１００に通知してもよい。例えば、情報提供装置１００がスマートスピーカ２００を介して利用者Ｕと対話してもよい。 As shown in FIG. 6, the smart speaker 200 interacts with the user U, who is a visitor from overseas, etc. (step S101). At this time, the smart speaker 200 may cooperate with the information providing device 100 to notify the information providing device 100 of the content of the interaction with the user U. For example, the information providing device 100 may interact with the user U via the smart speaker 200.

続いて、スマートスピーカ２００は、利用者Ｕと対話することにより、利用者Ｕの音声データを取得する（ステップＳ１０２）。このとき、情報提供装置１００の取得部１３１は、通信部１１０を介して、スマートスピーカ２００から利用者Ｕの音声データを取得する。 Next, the smart speaker 200 acquires user U's voice data by interacting with the user U (step S102). At this time, the acquisition unit 131 of the information providing device 100 acquires user U's audio data from the smart speaker 200 via the communication unit 110.

続いて、カメラ３００は、スマートスピーカ２００と対話中の利用者Ｕを撮影することにより、利用者Ｕの画像データを取得する（ステップＳ１０３）。このとき、情報提供装置１００の取得部１３１は、通信部１１０を介して、カメラ３００から利用者Ｕの画像データを取得する。 Subsequently, the camera 300 acquires image data of the user U by photographing the user U who is interacting with the smart speaker 200 (step S103). At this time, the acquisition unit 131 of the information providing device 100 acquires image data of the user U from the camera 300 via the communication unit 110.

続いて、スマートスピーカ２００は利用者Ｕの音声データに基づく音声認識により、カメラ３００は利用者Ｕの画像データに基づく画像認識により、利用者Ｕを特定する（ステップＳ１０４）。このとき、情報提供装置１００の特定部１３２は、利用者Ｕの音声データに基づく音声認識、及び利用者Ｕの画像データに基づく画像認識により、利用者Ｕを特定してもよい。また、取得部１３１は、対話した利用者Ｕの端末装置１０と近距離無線通信を行ったスマートスピーカ２００及び／又はカメラ３００から、利用者Ｕの端末装置１０の識別情報を取得してもよい。そして、特定部１３２は、利用者Ｕの端末装置１０の識別情報から利用者Ｕを特定してもよい。あるいは、取得部１３１は、利用者Ｕを特定したスマートスピーカ２００及び／又はカメラ３００から、特定された利用者Ｕに関する情報を取得してもよい。 Next, the smart speaker 200 identifies the user U through voice recognition based on the voice data of the user U, and the camera 300 identifies the user U through image recognition based on the image data of the user U (step S104). At this time, the identification unit 132 of the information providing device 100 may identify the user U through voice recognition based on the voice data of the user U and image recognition based on the image data of the user U. The acquisition unit 131 may also acquire the identification information of the terminal device 10 of the user U from the smart speaker 200 and/or the camera 300 that performed short-range wireless communication with the terminal device 10 of the user U who took part in the interaction. The identification unit 132 may then identify the user U from the identification information of the terminal device 10 of the user U. Alternatively, the acquisition unit 131 may acquire information regarding the identified user U from the smart speaker 200 and/or the camera 300 that identified the user U.

続いて、情報提供装置１００の管理部１３３は、利用者Ｕを特定した場所と日時とを紐づけて利用者Ｕの行動履歴として蓄積する（ステップＳ１０５）。このとき、管理部１３３は、利用者Ｕを特定した場所ごとに、利用者Ｕを特定した場所と日時とを紐づけて利用者Ｕの行動履歴として蓄積することで、利用者Ｕが情報提供で案内された場所を訪れたことを示す履歴情報を蓄積する。 Next, the management unit 133 of the information providing device 100 associates the location where the user U was identified with the date and time, and accumulates them as the action history of the user U (step S105). By doing so for each location where the user U is identified, the management unit 133 accumulates history information indicating that the user U visited a place to which the user was guided by the provided information.

続いて、情報提供装置１００の推定部１３４は、利用者Ｕの音声データから、利用者Ｕの使用言語を推定する（ステップＳ１０６）。このとき、推定部１３４は、利用者Ｕとの対話（会話）の内容から、利用者Ｕの使用言語を推定してもよい。また、推定部１３４は、利用者Ｕが入力又は指定（選択）した文字の画像データから、利用者Ｕの使用言語を推定してもよい。 Subsequently, the estimation unit 134 of the information providing device 100 estimates the language used by the user U from the voice data of the user U (step S106). At this time, the estimating unit 134 may estimate the language used by the user U based on the content of the interaction (conversation) with the user U. Furthermore, the estimation unit 134 may estimate the language used by the user U from the image data of characters input or specified (selected) by the user U.

続いて、情報提供装置１００の推定部１３４は、利用者Ｕの使用言語により利用者Ｕが訪日外国人であると推定する（ステップＳ１０７）。このとき、推定部１３４は、利用者Ｕの使用言語と利用者Ｕの容姿、衣装及び所持品とにより利用者Ｕが訪日外国人であると推定してもよい。 Subsequently, the estimation unit 134 of the information providing device 100 estimates that the user U is a foreigner visiting Japan based on the language used by the user U (step S107). At this time, the estimation unit 134 may estimate that the user U is a foreigner visiting Japan based on the language used by the user U and the user U's appearance, costume, and belongings.

続いて、情報提供装置１００の提供部１３５は、利用者Ｕの使用言語に応じた情報提供を行う（ステップＳ１０８）。本実施形態では、提供部１３５は、当該使用言語を使用する訪日客を対象とした情報提供を行う。例えば、提供部１３５は、スマートスピーカ２００を介して、利用者Ｕの使用言語による音声案内で利用者Ｕに情報提供する。このとき、提供部１３５は、スマートスピーカ２００に対して、利用者Ｕへの情報提供に用いられるデータを提供する。また、スマートスピーカ２００は、表示装置がスマートスピーカ２００に搭載又は接続されている場合には、当該表示装置に案内表示を行うことで情報提供してもよい。すなわち、利用者Ｕの使用言語による案内は、音声案内であってもよいし、文字及び画像による案内表示であってもよい（両方でもよい）。 Then, the providing unit 135 of the information providing device 100 provides information according to the language used by the user U (step S108). In this embodiment, the providing unit 135 provides information targeted at visitors to Japan who use that language. For example, the providing unit 135 provides information to the user U by voice guidance in the language used by the user U via the smart speaker 200. At this time, the providing unit 135 provides the smart speaker 200 with data used to provide information to the user U. Furthermore, if a display device is mounted on or connected to the smart speaker 200, the smart speaker 200 may provide information by displaying guidance on the display device. That is, the guidance in the language used by the user U may be voice guidance or may be a guidance display using text and images (or both).
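
The server-side part of the procedure of FIG. 6 (steps S105, S107, and S108) can be sketched end to end as below. This is an illustration under stated assumptions, not the patent's implementation: the class `InformationProvider`, the `GUIDANCE` table, and the rule "any language other than Japanese indicates a visitor" are all hypothetical, and user identification (S104) and language estimation (S106) are assumed to have been done upstream.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical provided-information table: (language, place of stay) -> information.
GUIDANCE = {
    ("Chinese", "Location A"): "Information B",
}


@dataclass
class InformationProvider:
    """Toy stand-in for the information providing device 100."""
    history: dict = field(default_factory=dict)  # user_id -> [(when, location)]

    def handle_interaction(self, user_id, language, location, when):
        # S105: link the location with the date and time as action history.
        self.history.setdefault(user_id, []).append((when, location))
        # S107: assumed rule; a non-Japanese language suggests a foreign visitor.
        is_visitor = language != "Japanese"
        # S108: select information according to the language and place of stay.
        info = GUIDANCE.get((language, location)) if is_visitor else None
        return {
            "user_id": user_id,
            "is_visitor": is_visitor,
            "language": language,     # guidance would be rendered in this language
            "information": info,      # None when nothing is registered
        }


provider = InformationProvider()
result = provider.handle_interaction(
    "U1", "Chinese", "Location A", datetime(2021, 6, 5, 10, 0)
)
```

In this sketch the returned dictionary stands for the data that the providing unit 135 would send to the smart speaker 200, which then presents it as voice guidance, displayed guidance, or both.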

〔５．変形例〕 [5. Modified examples]
上述した端末装置１０及び情報提供装置１００は、上記実施形態以外にも種々の異なる形態にて実施されてよい。そこで、以下では、実施形態の変形例について説明する。 The terminal device 10 and the information providing device 100 described above may be implemented in various forms other than the above embodiment. Modified examples of the embodiment are therefore described below.

上記の実施形態において、情報提供装置１００が実行している処理の一部又は全部は、実際には、スマートスピーカ２００が実行してもよい。例えば、スタンドアローン（Stand-alone）で（スマートスピーカ２００単体で）処理が完結してもよい。この場合、スマートスピーカ２００に、上記の実施形態における情報提供装置１００の機能が備わっているものとする。また、スマートスピーカ２００は、ネットワークＮを介して、他のスマートスピーカ２００と連携している。なお、上記の実施形態では、スマートスピーカ２００は情報提供装置１００と連携しているため、利用者Ｕから見れば、情報提供装置１００の処理もスマートスピーカ２００が実行しているように見える。すなわち、他の観点では、スマートスピーカ２００は、情報提供装置１００を備えているともいえる。 In the above embodiment, some or all of the processing performed by the information providing device 100 may actually be performed by the smart speaker 200. For example, the processing may be completed in a stand-alone manner (by the smart speaker 200 alone). In this case, the smart speaker 200 is assumed to have the functions of the information providing device 100 in the above embodiment. In addition, the smart speaker 200 is linked to other smart speakers 200 via the network N. Note that in the above embodiment, since the smart speaker 200 is linked to the information providing device 100, from the perspective of the user U, the processing of the information providing device 100 also appears to be performed by the smart speaker 200. In other words, from another perspective, it can be said that the smart speaker 200 is equipped with the information providing device 100.

また、上記の実施形態において、スマートスピーカ２００や情報提供装置１００は、対話における利用者Ｕの音声データから利用者Ｕの使用言語を推定しているが、実際は、カメラ３００や情報提供装置１００が、筆談等において利用者Ｕが記載又は選択した文字に関するデータから利用者Ｕの使用言語を推定してもよい。例えば、タッチパネル式の店頭設置端末等が、利用者Ｕから指やスタイラス等により文字入力（又は文字選択）を受け付けた際に、当該文字から利用者Ｕの使用言語を推定してもよい。 Furthermore, in the above embodiment, the smart speaker 200 and the information providing device 100 estimate the language used by the user U from the voice data of the user U during the dialogue, but in practice the camera 300 or the information providing device 100 may estimate the language used by the user U from data on characters written or selected by the user U, for example in written communication. For example, when a touch-panel in-store terminal or the like receives character input (or character selection) from the user U by finger, stylus, or the like, it may estimate the language used by the user U from those characters.
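One simple way to estimate a language from entered characters is to look at the writing script of each character. The following is a sketch under stated assumptions, not the method of the embodiment: the table `SCRIPT_TO_LANGUAGE` and the function `guess_language_from_text` are hypothetical, a script does not uniquely determine a language (Latin script in particular is mapped to English purely for illustration), and CJK ideographs are left unmapped because they are shared by several languages.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical mapping from the leading word of a Unicode character name
# to a language guess. Latin -> "en" is a deliberate simplification.
SCRIPT_TO_LANGUAGE = {
    "HANGUL": "ko",
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
    "CYRILLIC": "ru",
    "THAI": "th",
    "ARABIC": "ar",
    "LATIN": "en",
}


def guess_language_from_text(text: str) -> Optional[str]:
    """Return the most frequent language guess, or None if nothing matched."""
    counts: Counter = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")  # e.g. "HANGUL SYLLABLE AN"
        for script, lang in SCRIPT_TO_LANGUAGE.items():
            if name.startswith(script):
                counts[lang] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A production system would more plausibly use a trained language identifier; the script heuristic only shows how written input, rather than voice, can carry the language signal.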

また、上記の実施形態において、カメラ３００や情報提供装置１００が、訪日客である利用者Ｕが手にしているガイドブックや観光パンフレット等に記載されている言語から、利用者Ｕの使用言語を推定してもよい。訪日客は、自身の使用言語に対応したガイドブックや観光パンフレット等を利用するものと推測される。 In addition, in the above embodiment, the camera 300 and the information providing device 100 may estimate the language used by the user U, who is a visitor to Japan, from the language written in a guidebook, tourist brochure, etc. held by the user U. It is assumed that visitors to Japan will use guidebooks, tourist brochures, etc. that correspond to the language they use.

〔６．効果〕 [6. Effects]
上述してきたように、本願に係る情報処理装置（情報提供装置１００）は、利用者Ｕ（ユーザ）と対話した際の利用者Ｕの音声を取得する取得部１３１と、音声認識により利用者Ｕを特定する特定部１３２と、利用者Ｕを特定した場所と日時とを紐づけて利用者Ｕの行動履歴として蓄積する管理部１３３と、利用者Ｕの使用言語により利用者Ｕが訪日外国人であると推定する推定部１３４と、利用者Ｕの使用言語に応じた情報提供をする提供部１３５と、を備える。 As described above, the information processing device (information providing device 100) according to the present application includes: an acquisition unit 131 that acquires the voice of the user U when interacting with the user U; an identification unit 132 that identifies the user U through voice recognition; a management unit 133 that associates the location at which the user U was identified with the date and time and accumulates them as the action history of the user U; an estimation unit 134 that estimates, from the language used by the user U, that the user U is a foreigner visiting Japan; and a providing unit 135 that provides information according to the language used by the user U.
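The division of roles among the units 131 to 135 can be sketched as a small pipeline. This is an illustrative sketch only: the class and field names are hypothetical, the voiceprint match and the spoken-language estimate are represented by precomputed fields rather than real recognition, and treating any language other than Japanese as indicating a visitor is a simplifying assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

JAPANESE = "ja"


@dataclass
class Utterance:
    """What the acquisition unit 131 is assumed to hand over."""
    speaker_key: str   # stand-in for a voice recognition match (assumption)
    language: str      # stand-in for a spoken-language estimate (assumption)
    location: str      # where the smart speaker is installed
    spoken_at: datetime


@dataclass
class InformationProvider:
    # user id -> list of (location, date and time): the action history
    history: Dict[str, List[Tuple[str, datetime]]] = field(default_factory=dict)

    def identify(self, u: Utterance) -> str:
        # Identification unit 132: same key means same person.
        return u.speaker_key

    def record(self, user_id: str, u: Utterance) -> None:
        # Management unit 133: associate location with date and time.
        self.history.setdefault(user_id, []).append((u.location, u.spoken_at))

    def is_visitor(self, u: Utterance) -> bool:
        # Estimation unit 134: simplified rule, an assumption of this sketch.
        return u.language != JAPANESE

    def handle(self, u: Utterance) -> dict:
        user_id = self.identify(u)
        self.record(user_id, u)
        # Providing unit 135: guidance is given in the user's language.
        return {"user": user_id, "visitor": self.is_visitor(u),
                "guide_in": u.language}
```

Calling `handle` once per dialogue at each smart speaker builds up, per user, the list of places and times that the later paragraphs refer to as the action history.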

さらに、取得部１３１は、利用者Ｕと対話した際の利用者Ｕの画像を取得する。特定部１３２は、画像認識により利用者Ｕを特定する。 Further, the acquisition unit 131 acquires an image of the user U when interacting with the user U. The identification unit 132 identifies the user U through image recognition.

例えば、推定部１３４は、利用者Ｕの使用言語と利用者Ｕの容姿とにより利用者Ｕが訪日外国人であると推定する。 For example, the estimation unit 134 estimates that the user U is a foreigner visiting Japan based on the language used by the user U and the user U's appearance.

また、推定部１３４は、利用者Ｕの使用言語と利用者Ｕの衣装及び所持品とにより利用者Ｕが訪日外国人であると推定する。 Furthermore, the estimation unit 134 estimates that the user U is a foreigner visiting Japan based on the language used by the user U and the clothing and belongings of the user U.

また、提供部１３５は、利用者Ｕの使用言語と、利用者Ｕの容姿、衣装及び所持品のうち少なくとも１つとに応じて、利用者Ｕへの情報提供の内容を変更する。 In addition, the providing unit 135 changes the content of the information provided to the user U depending on the language used by the user U and at least one of the user U's appearance, clothing, and belongings.

また、提供部１３５は、利用者Ｕの使用言語による音声案内及び案内表示のうち少なくとも一方で利用者Ｕに情報提供する。 Further, the providing unit 135 provides information to the user U through at least one of voice guidance and a guidance display in the language used by the user U.

また、管理部１３３は、利用者Ｕを特定した場所ごとに、利用者Ｕを特定した場所と日時とを紐づけて利用者Ｕの行動履歴として蓄積することで、利用者Ｕが提供部１３５による情報提供で案内された場所を訪れたことを示す履歴情報を蓄積する。 In addition, for each location where the user U is identified, the management unit 133 associates that location with the date and time of identification and accumulates them as the action history of the user U, thereby accumulating history information indicating that the user U visited a place to which the user U was guided by the information provided by the providing unit 135.
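How an accumulated action history can show that a user visited a guided place can be sketched as below. The function name `visited_after_guidance` and the representation of the history as a list of (place, date and time) pairs are assumptions made for illustration; the embodiment does not specify a data layout.

```python
from datetime import datetime
from typing import List, Tuple


def visited_after_guidance(history: List[Tuple[str, datetime]],
                           guided_place: str,
                           guided_at: datetime) -> bool:
    """Return True if the history contains the guided place at a time
    later than the time the guidance was given."""
    return any(place == guided_place and when > guided_at
               for place, when in history)
```

For example, a history of `[("Station", 10:00), ("Temple", 13:00)]` with guidance to "Temple" given at 10:05 yields True, while guidance to a place that never appears afterwards yields False.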

また、取得部１３１は、利用者Ｕと対話した店頭設置端末から利用者Ｕの音声を取得する。 Furthermore, the acquisition unit 131 acquires the voice of the user U from the in-store terminal with which the user U interacted.

また、取得部１３１は、利用者Ｕの端末装置を介して利用者Ｕと対話した際に、利用者Ｕの端末装置から利用者Ｕの音声を取得する。 The acquisition unit 131 also acquires the voice of the user U from the terminal device of the user U when interacting with the user U via that terminal device.

また、取得部１３１は、利用者Ｕの端末装置と近距離無線通信を行った店頭設置端末から、店頭設置端末の所在情報と利用者Ｕの端末装置の識別情報とを取得する。特定部１３２は、利用者Ｕの端末装置の識別情報から利用者Ｕを特定する。管理部１３３は、店頭設置端末の所在情報と利用者Ｕの端末装置の識別情報を取得した日時とを紐づけて利用者Ｕの行動履歴として蓄積する。推定部１３４は、利用者Ｕが指定した使用言語により利用者Ｕが訪日外国人であると推定する。提供部１３５は、利用者Ｕが指定した使用言語による案内で利用者Ｕに情報提供する。 Further, the acquisition unit 131 acquires the location information of the in-store terminal and the identification information of the user U's terminal device from the in-store terminal that has performed short-range wireless communication with the user U's terminal device. The identification unit 132 identifies the user U from the identification information of the user U's terminal device. The management unit 133 associates the location information of the in-store terminal with the date and time when the identification information of the user U's terminal device was acquired, and stores the information as the user U's action history. The estimation unit 134 estimates that the user U is a foreigner visiting Japan based on the language specified by the user U. The providing unit 135 provides information to the user U with guidance in the language specified by the user U.

上述した各処理のいずれかもしくは組合せにより、本願に係る情報処理装置は、インバウンドの来訪者のデータを適切に取得することができる。 Through any one or a combination of the above-described processes, the information processing device according to the present application can appropriately acquire data on inbound visitors.

〔７．ハードウェア構成〕 [7. Hardware configuration]
また、上述した実施形態に係る情報提供装置１００は、例えば図７に示すような構成のコンピュータ１０００によって実現される。以下、情報提供装置１００を例に挙げて説明する。図７は、ハードウェア構成の一例を示す図である。コンピュータ１０００は、出力装置１０１０、入力装置１０２０と接続され、演算装置１０３０、一次記憶装置１０４０、二次記憶装置１０５０、出力Ｉ／Ｆ（Interface）１０６０、入力Ｉ／Ｆ１０７０、ネットワークＩ／Ｆ１０８０がバス１０９０により接続された形態を有する。 The information providing device 100 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 7. The information providing device 100 is described below as an example. FIG. 7 is a diagram showing an example of the hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output I/F (Interface) 1060, an input I/F 1070, and a network I/F 1080 are connected by a bus 1090.

演算装置１０３０は、一次記憶装置１０４０や二次記憶装置１０５０に格納されたプログラムや入力装置１０２０から読み出したプログラム等に基づいて動作し、各種の処理を実行する。演算装置１０３０は、例えばＣＰＵ（Central Processing Unit）、ＭＰＵ（Micro Processing Unit）、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等により実現される。 The arithmetic device 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The arithmetic device 1030 is realized by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like.

一次記憶装置１０４０は、ＲＡＭ（Random Access Memory）等、演算装置１０３０が各種の演算に用いるデータを一次的に記憶するメモリ装置である。また、二次記憶装置１０５０は、演算装置１０３０が各種の演算に用いるデータや、各種のデータベースが登録される記憶装置であり、ＲＯＭ（Read Only Memory）、ＨＤＤ（Hard Disk Drive）、ＳＳＤ（Solid State Drive）、フラッシュメモリ等により実現される。二次記憶装置１０５０は、内蔵ストレージであってもよいし、外付けストレージであってもよい。また、二次記憶装置１０５０は、ＵＳＢ（Universal Serial Bus）メモリやＳＤ（Secure Digital）メモリカード等の取り外し可能な記憶媒体であってもよい。また、二次記憶装置１０５０は、クラウドストレージ（オンラインストレージ）やＮＡＳ（Network Attached Storage）、ファイルサーバ等であってもよい。 The primary storage device 1040 is a memory device such as a RAM (Random Access Memory) that primarily stores data used by the arithmetic device 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various calculations and various databases are registered, and is realized by a ROM (Read Only Memory), a HDD (Hard Disk Drive), a SSD (Solid State Drive), a flash memory, or the like. The secondary storage device 1050 may be an internal storage device or an external storage device. The secondary storage device 1050 may be a removable storage medium such as a USB (Universal Serial Bus) memory or a SD (Secure Digital) memory card. The secondary storage device 1050 may be a cloud storage device (online storage device), a NAS (Network Attached Storage), a file server, or the like.

出力Ｉ／Ｆ１０６０は、ディスプレイ、プロジェクタ、及びプリンタ等といった各種の情報を出力する出力装置１０１０に対し、出力対象となる情報を送信するためのインターフェースであり、例えば、ＵＳＢ（Universal Serial Bus）やＤＶＩ（Digital Visual Interface）、ＨＤＭＩ（登録商標）（High Definition Multimedia Interface）といった規格のコネクタにより実現される。また、入力Ｉ／Ｆ１０７０は、マウス、キーボード、キーパッド、ボタン、及びスキャナ等といった各種の入力装置１０２０から情報を受信するためのインターフェースであり、例えば、ＵＳＢ等により実現される。 The output I/F 1060 is an interface for transmitting information to be output to the output device 1010, which outputs various kinds of information and may be a display, a projector, a printer, or the like. It is realized by, for example, a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input I/F 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, a keypad, buttons, and a scanner, and is realized by, for example, USB or the like.

また、出力Ｉ／Ｆ１０６０及び入力Ｉ／Ｆ１０７０はそれぞれ出力装置１０１０及び入力装置１０２０と無線で接続してもよい。すなわち、出力装置１０１０及び入力装置１０２０は、ワイヤレス機器であってもよい。 Further, the output I/F 1060 and the input I/F 1070 may be wirelessly connected to the output device 1010 and the input device 1020, respectively. That is, output device 1010 and input device 1020 may be wireless devices.

また、出力装置１０１０及び入力装置１０２０は、タッチパネルのように一体化していてもよい。この場合、出力Ｉ／Ｆ１０６０及び入力Ｉ／Ｆ１０７０も、入出力Ｉ／Ｆとして一体化していてもよい。 Moreover, the output device 1010 and the input device 1020 may be integrated like a touch panel. In this case, the output I/F 1060 and the input I/F 1070 may also be integrated as an input/output I/F.

なお、入力装置１０２０は、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、又は半導体メモリ等から情報を読み出す装置であってもよい。 Note that the input device 1020 may be a device that reads information from, for example, an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, a semiconductor memory, or the like.

ネットワークＩ／Ｆ１０８０は、ネットワークＮを介して他の機器からデータを受信して演算装置１０３０へ送り、また、ネットワークＮを介して演算装置１０３０が生成したデータを他の機器へ送信する。 The network I/F 1080 receives data from other devices via the network N and sends it to the arithmetic device 1030, and also transmits data generated by the arithmetic device 1030 to other devices via the network N.

演算装置１０３０は、出力Ｉ／Ｆ１０６０や入力Ｉ／Ｆ１０７０を介して、出力装置１０１０や入力装置１０２０の制御を行う。例えば、演算装置１０３０は、入力装置１０２０や二次記憶装置１０５０からプログラムを一次記憶装置１０４０上にロードし、ロードしたプログラムを実行する。 Arithmetic device 1030 controls output device 1010 and input device 1020 via output I/F 1060 and input I/F 1070. For example, the arithmetic device 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040, and executes the loaded program.

例えば、コンピュータ１０００が情報提供装置１００として機能する場合、コンピュータ１０００の演算装置１０３０は、一次記憶装置１０４０上にロードされたプログラムを実行することにより、制御部１３０の機能を実現する。また、コンピュータ１０００の演算装置１０３０は、ネットワークＩ／Ｆ１０８０を介して他の機器から取得したプログラムを一次記憶装置１０４０上にロードし、ロードしたプログラムを実行してもよい。また、コンピュータ１０００の演算装置１０３０は、ネットワークＩ／Ｆ１０８０を介して他の機器と連携し、プログラムの機能やデータ等を他の機器の他のプログラムから呼び出して利用してもよい。 For example, when the computer 1000 functions as the information providing device 100, the arithmetic device 1030 of the computer 1000 realizes the functions of the control unit 130 by executing a program loaded onto the primary storage device 1040. The arithmetic device 1030 of the computer 1000 may also load a program obtained from another device via the network I/F 1080 onto the primary storage device 1040 and execute the loaded program. Furthermore, the arithmetic device 1030 of the computer 1000 may cooperate with other devices via the network I/F 1080 and call and use program functions, data, and the like from other programs on those devices.

〔８．その他〕 [8. Others]
以上、本願の実施形態を説明したが、これら実施形態の内容により本発明が限定されるものではない。また、前述した構成要素には、当業者が容易に想定できるもの、実質的に同一のもの、いわゆる均等の範囲のものが含まれる。さらに、前述した構成要素は適宜組み合わせることが可能である。さらに、前述した実施形態の要旨を逸脱しない範囲で構成要素の種々の省略、置換又は変更を行うことができる。 Although the embodiments of the present application have been described above, the present invention is not limited by the contents of these embodiments. The components described above include those that a person skilled in the art could easily conceive of, those that are substantially the same, and those within a so-called range of equivalents. The components described above can also be combined as appropriate. Furthermore, various omissions, substitutions, or modifications of the components can be made without departing from the gist of the embodiments described above.

また、上記実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部又は一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部又は一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 Among the processes described in the above embodiment, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above text and drawings may be changed arbitrarily unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部又は一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的又は物理的に分散・統合して構成することができる。 In addition, each component of each device shown in the figure is a functional concept, and does not necessarily have to be physically configured as shown in the figure. In other words, the specific form of distribution and integration of each device is not limited to that shown in the figure, and all or part of them can be functionally or physically distributed and integrated in any unit depending on various loads, usage conditions, etc.

例えば、上述した情報提供装置１００は、複数のサーバコンピュータで実現してもよく、また、機能によっては外部のプラットフォーム等をＡＰＩ（Application Programming Interface）やネットワークコンピューティング等で呼び出して実現するなど、構成は柔軟に変更できる。 For example, the information providing device 100 described above may be realized by a plurality of server computers, and, depending on the function, may be realized by calling an external platform or the like through an API (Application Programming Interface), network computing, or the like; the configuration can thus be changed flexibly.

また、上述してきた実施形態及び変形例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 Furthermore, the above-described embodiments and modifications can be combined as appropriate within a range that does not conflict with the processing contents.

また、上述してきた「部（section、module、unit）」は、「手段」や「回路」などに読み替えることができる。例えば、取得部は、取得手段や取得回路に読み替えることができる。 Further, the above-mentioned "section, module, unit" can be read as "means", "circuit", etc. For example, the acquisition unit can be read as an acquisition means or an acquisition circuit.

１情報処理システム 1 Information processing system
１０端末装置 10 Terminal device
１００情報提供装置 100 Information providing device
１１０通信部 110 Communication unit
１２０記憶部 120 Storage unit
１２１訪日客情報データベース 121 Visitor information database
１２２提供情報データベース 122 Provided information database
１３０制御部 130 Control unit
１３１取得部 131 Acquisition unit
１３２特定部 132 Identification unit
１３３管理部 133 Management unit
１３４推定部 134 Estimation unit
１３５提供部 135 Providing unit

Claims

An information processing device comprising:
an acquisition unit that acquires a user's voice when interacting with the user through one of smart speakers installed at a plurality of locations;
an identification unit that identifies, through voice recognition using the user's voice, that users who interacted at different locations or at different dates and times are the same person;
a management unit that associates the location at which the user's voice was acquired with the date and time and stores them as the user's action history;
an estimation unit that estimates the language used by the user through voice recognition using the user's voice, and estimates that the user is a foreigner visiting Japan based on the language used by the user; and
a providing unit that provides information according to the language used by the user via a smart speaker or display device installed at the location where the user's voice was acquired.

The information processing device according to claim 1, wherein
the acquisition unit further acquires an image of the user through a camera installed together with the smart speaker when interacting with the user through one of the smart speakers installed at the plurality of locations,
the identification unit further identifies, through image recognition using the image of the user, that users who interacted at different locations or at different dates and times are the same person, and
the estimation unit further determines the user's appearance, clothing, and belongings through image recognition using the image of the user, and estimates from the user's appearance, clothing, and belongings that the user is a foreigner visiting Japan.

The information processing device according to claim 2, wherein the identification unit, when it determines through image recognition that users are the same person, treats them as the same person even if the languages they use differ.

The information processing device according to claim 2 or 3, wherein
the estimation unit estimates the user's object of interest from the user's appearance, clothing, belongings, or behavior, and
the providing unit changes the content of the information provided to the user according to the estimated object of interest of the user.

The information processing device according to any one of claims 1 to 4, wherein the providing unit performs, for the user, any of: providing information on places, products, or services recommended for visitors to Japan who use the language; issuing coupons; providing information on places, products, or services that meet a request made by the user during the dialogue; providing information on activities popular among visitors to Japan who use the language; providing information on little-known places with few visitors who use the language; or providing emergency bulletins and alerts for visitors who use the language.

The information processing device according to any one of claims 1 to 5, wherein
the estimation unit performs machine learning on the action histories of other users who use the same language as the user, and estimates information suitable for the user, and
the providing unit provides the information suitable for the user.

The information processing device according to any one of claims 1 to 6, wherein the management unit, for each location where the user's voice was acquired, associates that location with the date and time and stores them as the user's action history, thereby accumulating the location where the user received the information and the location visited by the user after receiving the information as history information indicating that the user visited a recommended place as a result of being guided to it by the information provided by the providing unit.

The information processing device according to any one of claims 1 to 7, wherein
the acquisition unit acquires the user's voice from an in-store terminal that has interacted with the user, and
the management unit associates the installation location of the in-store terminal that acquired the user's voice with the date and time when the user's voice was acquired, and stores them as the user's action history.

The information processing device according to any one of claims 1 to 8, wherein
the acquisition unit acquires the user's voice from the user's terminal device when interacting with the user via the user's terminal device, and
the management unit associates the location at which the user's terminal device acquired the user's voice with the date and time, and stores them as the user's action history.

The information processing device according to any one of claims 1 to 9, wherein, when the user is estimated at a first location to be a foreigner visiting Japan and the user moves from the first location to a second location where a smart speaker is installed, the providing unit provides information via a smart speaker or a display device installed at the second location.

An information processing method executed by an information processing device, the method comprising:
an acquisition step of acquiring a user's voice when interacting with the user through one of smart speakers installed at a plurality of locations;
an identification step of identifying, through voice recognition using the user's voice, that users who interacted at different locations or at different dates and times are the same person;
a management step of associating the location at which the user's voice was acquired with the date and time and accumulating them as the user's action history;
an estimation step of estimating the language used by the user through voice recognition using the user's voice, and estimating that the user is a foreigner visiting Japan based on the language used by the user; and
a providing step of providing information according to the language used by the user via a smart speaker or a display device installed at the location where the user's voice was acquired.

An information processing program for causing a computer to execute:
an acquisition procedure of acquiring a user's voice when interacting with the user through one of smart speakers installed at a plurality of locations;
an identification procedure of identifying, through voice recognition using the user's voice, that users who interacted at different locations or at different dates and times are the same person;
a management procedure of associating the location at which the user's voice was acquired with the date and time and accumulating them as the user's action history;
an estimation procedure of estimating the language used by the user through voice recognition using the user's voice, and estimating that the user is a foreigner visiting Japan based on the language used by the user; and
a providing procedure of providing information according to the language used by the user via a smart speaker or a display device installed at the location where the user's voice was acquired.