JP6172353B2

JP6172353B2 - Terminal apparatus, object identification method, information processing apparatus, and program

Info

Publication number: JP6172353B2
Application number: JP2016138147A
Authority: JP
Inventors: 隆之芦ヶ原; 福地　正樹; 正樹福地
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2016-07-13
Filing date: 2016-07-13
Publication date: 2017-08-02
Anticipated expiration: 2031-03-25
Also published as: JP2016186817A

Description

The present invention relates to a terminal device, an object identification method, an information processing device, and a program.

In recent years, image recognition technology has advanced, making it possible to identify the position and orientation of an object shown in a camera input image by matching image feature quantities. One application of such object identification is the augmented reality (AR) application. In an AR application, various information (for example, advertisement information, navigation information, or information for a game) can be additionally displayed in association with an object in an image showing a building, road, or other object existing in the real world.

Patent Document 1 below proposes a feature quantity extraction algorithm for object identification with improved robustness against changes in viewpoint, changes in brightness, and noise. Patent Document 2 below proposes a feature quantity extraction algorithm called the Random Ferns method, which can operate at high speed with a lower processing cost.

Patent Document 1: Japanese Patent No. 4492036
Patent Document 2: Mustafa Oezuysal, "Fast Keypoint Recognition using Random Ferns", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, Nr. 3, pp. 448-461, March 2010

As described above, there are various feature quantity extraction algorithms for identifying an object shown in an image. In general, however, an algorithm that achieves higher identification performance requires a higher processing cost. For this reason, when object identification is performed on a device with few processing resources, such as a mobile terminal, the identification performance (such as the accuracy of identification and the number of objects that can be identified simultaneously) is constrained. Moreover, if every frame is transferred to a server having abundant processing resources so that object identification is executed there, the delay while waiting for the server's response can impair the responsiveness of the application.

The present invention therefore aims to provide a terminal device, an object identification method, an information processing device, and a program that can deliver higher object identification performance on a device with few processing resources.

According to an embodiment of the present invention, there is provided a terminal device including: an image acquisition unit that acquires a captured input image; a transmission unit that transmits the input image and auxiliary information to a server having a first feature dictionary that is collated with feature quantities of the input image in order to identify an object shown in the input image; and a reception unit that receives, from the server, a second feature dictionary that is acquired by the server according to the auxiliary information and the result of object identification based on the first feature dictionary, and that includes feature quantities for co-occurring objects that are highly likely to co-occur with the object identified based on the first feature dictionary, wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.

According to another embodiment of the present invention, there is provided an object identification method including: acquiring a captured input image; transmitting the input image and auxiliary information to a server having a first feature dictionary that is collated with feature quantities of the input image in order to identify an object shown in the input image; and receiving, from the server, a second feature dictionary that is acquired by the server according to the auxiliary information and the result of object identification based on the first feature dictionary, and that includes feature quantities for co-occurring objects that are highly likely to co-occur with the object identified based on the first feature dictionary, wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.

According to another embodiment of the present invention, there is provided an information processing device including: a storage unit that stores a first feature dictionary, which is a set of feature quantities of known object images; a reception unit that receives an input image captured by a terminal device together with auxiliary information; an identification unit that identifies an object shown in the input image by collating feature quantities of the input image with the first feature dictionary; a dictionary acquisition unit that acquires, according to the auxiliary information and the identification result of the identification unit, a second feature dictionary including feature quantities for co-occurring objects that are highly likely to co-occur with the object identified based on the first feature dictionary; and a transmission unit that transmits the second feature dictionary acquired by the dictionary acquisition unit to the terminal device, wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.

Further, a program is provided that causes a computer to realize: a function of acquiring a captured input image; a function of transmitting the input image and auxiliary information to a server having a first feature dictionary that is collated with feature quantities of the input image in order to identify an object shown in the input image; and a function of receiving, from the server, a second feature dictionary that is acquired by the server according to the auxiliary information and the result of object identification based on the first feature dictionary, and that includes feature quantities for co-occurring objects that are highly likely to co-occur with the object identified based on the first feature dictionary, wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.

As described above, the terminal device, the object identification method, the information processing device, and the program according to the present invention make it possible to deliver higher object identification performance on a device with few processing resources.

An explanatory diagram for describing an overview of a system according to one embodiment.
An explanatory diagram for describing an image that can be displayed on the screen of the terminal device.
A block diagram showing an example of the hardware configuration of the terminal device according to one embodiment.
A block diagram showing an example of the configuration of the logical functions of the terminal device according to one embodiment.
A block diagram showing an example of the hardware configuration of the dictionary server according to one embodiment.
A block diagram showing an example of the configuration of the logical functions of the dictionary server according to one embodiment.
An explanatory diagram for describing an example of the feature dictionary stored by the dictionary server.
An explanatory diagram for describing a first example of a subset of the dictionary acquired by the dictionary server.
An explanatory diagram for describing a second example of a subset of the dictionary acquired by the dictionary server.
An explanatory diagram for describing a third example of a subset of the dictionary acquired by the dictionary server.
An explanatory diagram for describing a fourth example of a subset of the dictionary acquired by the dictionary server.
An explanatory diagram for describing an example of the data stored by the additional information database.
A block diagram showing an example of the configuration of the logical functions of the dictionary server according to a first modification.
A first explanatory diagram for describing generation of the feature dictionary in the first modification.
A second explanatory diagram for describing generation of the feature dictionary in the first modification.
A block diagram showing an example of the configuration of the logical functions of the dictionary server according to a second modification.
An explanatory diagram for describing an example of a subset of the dictionary acquired in the second modification.
A flowchart showing an example of the flow of processing by the terminal device according to one embodiment.
A flowchart showing an example of the flow of processing by the dictionary server according to one embodiment.

Preferred embodiments of the present invention will be described below in detail with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

The embodiments for carrying out the invention are described in the following order.
1. System overview
2. Configuration example of terminal device according to one embodiment
2-1. Hardware configuration
2-2. Logical configuration
3. Configuration example of dictionary server according to one embodiment
3-1. Hardware configuration
3-2. Logical configuration
3-3. Modifications
4. Flow of processing according to one embodiment
4-1. Processing on terminal side
4-2. Processing on server side
5. Summary

<1. System overview>
FIG. 1 is an explanatory diagram showing an overview of an object identification system to which the technology disclosed in this specification can be applied. FIG. 1 shows an object identification system 1 according to one embodiment. The object identification system 1 includes a terminal device 100 and a dictionary server 200.

The terminal device 100 is a device that identifies an object shown in an image captured by an imaging device. The terminal device 100 may be a mobile terminal carried by a user, such as a smartphone or a PDA (Personal Digital Assistant). The terminal device 100 may also be another type of device, such as a PC (Personal Computer), a digital home appliance, a game device, or a work robot. The imaging device may be built into the terminal device 100. Alternatively, the imaging device may be provided outside the terminal device 100 and connected to it by wire or wirelessly.

To identify an object shown in an image, the terminal device 100 collates feature quantities extracted from the image with a feature dictionary, which is a set of known feature quantities for one or more objects. The terminal device 100 then identifies which object is shown in the image based on a score calculated by the collation (hereinafter referred to as a matching score). In this specification, a "high" matching score for the feature quantities of a known object means that the object is likely to be shown in the input image. For example, if the difference between the known feature quantities and the feature quantities of the input image is close to zero at a specific position and orientation, the object corresponding to those feature quantities is likely to appear in the input image at that position and orientation. This specification calls the matching score in such a situation "high", even though the evaluated value of the difference itself is small. In other words, the terminal device 100 can also identify the position and orientation, within the image, of the object shown in it. Various applications that use the result of such object identification can be installed in the terminal device 100. This specification mainly describes an example in which an AR application that uses the result of object identification is installed in the terminal device 100. However, an application in the terminal device 100 with another purpose (for example, monitoring, environment recognition, or work support) may also use the result of object identification.
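
The scoring convention described above (a small difference gives a high matching score) can be illustrated with a minimal sketch. It assumes that feature quantities are fixed-length numeric vectors and that the score is defined as 1 / (1 + d), where d is the Euclidean distance. Both choices, as well as the names `matching_score`, `identify`, and the threshold value, are assumptions made for illustration and are not taken from this specification. The sketch also ignores position and orientation.

```python
import math


def matching_score(known, observed):
    """Map a feature difference to a score in (0, 1].

    A difference close to zero yields a score close to 1 ("high").
    The 1 / (1 + d) mapping is an illustrative assumption.
    """
    if len(known) != len(observed):
        raise ValueError("feature vectors must have the same length")
    difference = math.sqrt(sum((k - o) ** 2 for k, o in zip(known, observed)))
    return 1.0 / (1.0 + difference)


def identify(dictionary, observed, threshold=0.5):
    """Return (object_id, score) for the best match at or above threshold, else None.

    `dictionary` maps a hypothetical object ID to its known feature vector.
    """
    best = None
    for object_id, known in dictionary.items():
        score = matching_score(known, observed)
        if score >= threshold and (best is None or score > best[1]):
            best = (object_id, score)
    return best
```

For example, `identify({"B1": [0.0, 0.0], "B2": [1.0, 1.0]}, [0.9, 1.0])` selects `"B2"`, because its difference from the observed vector is the smallest and its score therefore the highest.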

The dictionary server 200 is an information processing device that provides the terminal device 100 with a feature dictionary for object identification. The dictionary server 200 communicates with the terminal device 100 via a network 3. The network 3 may be any kind of network, such as the Internet, a provider network, or a corporate network. In the present embodiment, the dictionary server 200 receives an image from the terminal device 100. The dictionary server 200 then identifies an object shown in the received image and provides the terminal device 100 with a feature dictionary that depends on the identification result.

FIG. 2 is an explanatory diagram for describing an image that can be displayed on the screen of the terminal device 100 in the present embodiment. The image illustrated in FIG. 2 is an image of an AR application. In FIG. 2, an image showing a building 10 existing in the real space is displayed on the screen of the terminal device 100. Additional information 12 is superimposed on the image. The additional information 12 indicates the name and rating of a restaurant operating in the building 10. Such additional information is selected based on the result of object identification in the terminal device 100 and is superimposed at the position of the object in the image. In the present embodiment, the database of additional information superimposed on images in this way is also provided from the dictionary server 200 to the terminal device 100.

<2. Configuration example of terminal device according to one embodiment>
[2-1. Hardware configuration]
FIG. 3 is a block diagram showing an example of the hardware configuration of the terminal device 100 according to the present embodiment. As shown in FIG. 3, the terminal device 100 includes an imaging unit 102, a sensor unit 104, an input unit 106, a storage unit 108, a display unit 112, a communication unit 114, a bus 118, and a control unit 120.

(Imaging unit)
The imaging unit 102 is a camera module that captures images. The imaging unit 102 generates an input image for object identification by imaging the real space using an image sensor such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) sensor.

(Sensor unit)
The sensor unit 104 is a group of sensors that supports recognition of the position and orientation of the terminal device 100. For example, the sensor unit 104 may include a GPS sensor that receives GPS (Global Positioning System) signals and measures the latitude, longitude, and altitude of the terminal device 100. The sensor unit 104 may also include a positioning sensor that measures the position of the terminal device 100 based on the strength of wireless signals received from wireless access points. In addition, the sensor unit 104 may include a gyro sensor that measures the tilt angle of the terminal device 100, an acceleration sensor that measures three-axis acceleration, or a geomagnetic sensor that measures azimuth. If the terminal device 100 has position estimation and orientation estimation functions based on image recognition, the sensor unit 104 may be omitted from the configuration of the terminal device 100.

(Input unit)
The input unit 106 is an input device that the user uses to operate the terminal device 100 or to input information to the terminal device 100. The input unit 106 may include, for example, a keyboard, a keypad, a mouse, buttons, switches, or a touch panel. The input unit 106 may include a gesture recognition module that recognizes a user's gesture shown in the input image. The input unit 106 may also include a line-of-sight detection module that detects, as user input, the gaze direction of a user wearing an HMD (Head Mounted Display).

(Storage unit)
The storage unit 108 is configured with a storage medium such as a semiconductor memory or a hard disk, and stores programs and data for processing by the terminal device 100. For example, the storage unit 108 temporarily stores the input image generated by the imaging unit 102 and the sensor data measured by the sensor unit 104. The storage unit 108 also stores data received from the dictionary server 200 via the communication unit 114. Examples of the data received from the dictionary server 200 are described in detail later.

(Display unit)
The display unit 112 is a display module configured with an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), a CRT (Cathode Ray Tube), or the like. The display unit 112 displays on its screen, for example, the input image captured by the imaging unit 102 or an image of an application that uses the result of object identification (for example, the AR application image illustrated in FIG. 2). The display unit 112 may be a part of the terminal device 100 or may be provided outside the terminal device 100. The display unit 112 may also be an HMD worn by the user.

(Communication unit)
The communication unit 114 is a communication interface that mediates communication between the terminal device 100 and the dictionary server 200. The communication unit 114 supports an arbitrary wireless or wired communication protocol and establishes a communication connection with the dictionary server 200. This allows the terminal device 100 to transmit images to the dictionary server 200 and to receive feature dictionaries from the dictionary server 200.

(Bus)
The bus 118 interconnects the imaging unit 102, the sensor unit 104, the input unit 106, the storage unit 108, the display unit 112, the communication unit 114, and the control unit 120.

(Control unit)
The control unit 120 corresponds to a processor such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor). By executing programs stored in the storage unit 108 or another storage medium, the control unit 120 operates the various functions of the terminal device 100 described later.

[2-2. Logical configuration]
FIG. 4 is a block diagram showing an example of the configuration of the logical functions realized by the storage unit 108 and the control unit 120 of the terminal device 100 shown in FIG. 3. As shown in FIG. 4, the terminal device 100 includes an image acquisition unit 130, a transmission unit 140, a reception unit 150, a dictionary cache 160, an identification unit 170, an additional information cache 180, and a display control unit 190.

(Image acquisition unit)
The image acquisition unit 130 acquires the input image generated by the imaging unit 102. The image acquisition unit 130 then sequentially outputs the acquired input images to the transmission unit 140 and the identification unit 170.

(Transmission unit)
When a predetermined trigger event is detected, the transmission unit 140 transmits the input image supplied from the image acquisition unit 130 to the dictionary server 200 via the communication unit 114. As described above, the dictionary server 200 is a server that holds a feature dictionary, which is a set of image feature quantities for known objects.

The trigger event that prompts the transmission unit 140 to transmit an input image may be, for example, one or more of the following events.
a) Arrival of a periodic timing: an input image is transmitted periodically, for example once every n frames or once every t seconds. The period is typically set in advance so that input images are transmitted less frequently than the identification unit 170 identifies objects.
b) User instruction: an input image is transmitted in response to an explicit instruction from the user via the input unit 106.
c) Frame-out of a tracked object: an input image is transmitted when an object that had been identified by object identification leaves the frame of the image.
d) Frame-in of a new object: an input image is transmitted when object identification detects that a new object is present in the image. New objects can include known objects that were not previously present in the image as well as unknown objects. A known object may be detected using a simple recognition technique such as face recognition. Further, for example, when a moving object is recognized from the difference between the current frame and the previous frame (which may be a difference taken after motion compensation is applied) and it cannot be identified what kind of object the moving object is, it can be determined that an unknown new object has entered the frame. Such detection of a moving object based on a difference image may be performed, for example, only when object identification identifies no object in the image.
e) Movement of the terminal device: an input image is transmitted when a change in the position or orientation of the terminal device 100, an increase in its speed, or the like is detected.
Periodic transmission of input images can be adopted when it is desirable to perform object identification continuously regardless of the content of the image. Transmission of an input image in response to a user instruction can be adopted, for example, when the user wishes to identify or track an object displayed on the screen. The other trigger events assume that a new object is likely to appear in the image. Transmitting an input image in response to these trigger events, and having the dictionary server 200 provide a feature dictionary in return, makes it possible to identify the new object appropriately.
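
The trigger events a) to e) can be sketched as a simple per-frame check. This is a minimal illustration under stated assumptions: the `FrameState` fields, the function name `fired_triggers`, and the default period of 30 frames are hypothetical and do not appear in this specification. How each flag is computed (tracking, difference images, sensor data) is outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class FrameState:
    """Hypothetical per-frame observations used to evaluate the triggers."""
    frame_index: int
    user_requested: bool = False                     # b) explicit user instruction
    tracked_ids: frozenset = frozenset()             # objects identified in this frame
    previous_tracked_ids: frozenset = frozenset()    # objects identified in the last frame
    unknown_moving_object: bool = False              # d) unidentified moving object
    terminal_moved: bool = False                     # e) position/orientation/speed change


def fired_triggers(state, period_frames=30):
    """Return the names of the trigger events that fire for this frame."""
    triggers = []
    if state.frame_index % period_frames == 0:               # a) periodic timing
        triggers.append("periodic")
    if state.user_requested:                                 # b) user instruction
        triggers.append("user_instruction")
    if state.previous_tracked_ids - state.tracked_ids:       # c) tracked object left
        triggers.append("frame_out")
    if (state.tracked_ids - state.previous_tracked_ids) or state.unknown_moving_object:
        triggers.append("frame_in")                          # d) new object entered
    if state.terminal_moved:                                 # e) terminal movement
        triggers.append("terminal_moved")
    return triggers
```

In this sketch the transmission unit would send the input image whenever the returned list is non-empty. Choosing `period_frames` larger than 1 reflects the statement that transmission is typically less frequent than local identification.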

Together with the input image, the transmission unit 140 may transmit to the dictionary server 200 auxiliary information that assists the dictionary server 200 in acquiring a feature dictionary. The auxiliary information may include, for example, at least one of the position (of the terminal device 100 or the imaging device) at the time the input image was captured, the date and time of capture, and capability information of the terminal device 100. The position and the date and time can be used when the dictionary server 200 filters the feature dictionary. The capability information of the terminal device 100 can be used when the dictionary server 200 determines the data amount of the feature dictionary to be provided to the terminal device 100. The use of this auxiliary information is described further later.
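
One way to picture the auxiliary information is as an optional metadata record sent alongside the image. The sketch below is only an assumption about a possible encoding: the field names, the use of JSON, and the representation of capability information as a maximum dictionary size in bytes (`max_dictionary_bytes`) are hypothetical and are not specified in this document.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AuxiliaryInfo:
    """Hypothetical auxiliary information; every field is optional."""
    latitude: Optional[float] = None             # position at capture time
    longitude: Optional[float] = None
    captured_at: Optional[str] = None            # date and time, e.g. ISO 8601 text
    max_dictionary_bytes: Optional[int] = None   # assumed form of capability information


def build_request_metadata(aux):
    """Serialize only the fields that are actually present."""
    present = {key: value for key, value in asdict(aux).items() if value is not None}
    return json.dumps(present, sort_keys=True)
```

Omitting absent fields mirrors the statement that the auxiliary information includes at least one of the listed items rather than all of them.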

(Reception unit)
After an input image has been transmitted from the transmission unit 140 to the dictionary server 200, the reception unit 150 receives from the dictionary server 200 a feature dictionary that the dictionary server 200 acquires according to the result of object identification for that input image. The feature dictionary received by the reception unit 150 has a smaller data amount than the feature dictionary held by the dictionary server 200. How the dictionary server 200 acquires the feature dictionary provided to the terminal device 100 is described further later.

On receiving a feature dictionary, the receiving unit 150 stores it in the dictionary cache 160. In the present embodiment, each feature quantity in the feature dictionary is associated with an identifier that uniquely identifies an object (hereinafter, the object ID). When the receiving unit 150 newly receives a feature quantity having the same object ID as a feature quantity already stored in the dictionary cache 160, it may update the cached feature quantity with the newly received one. The receiving unit 150 may also attach a reception time stamp to each received feature quantity and automatically delete from the dictionary cache 160 any feature quantity for which a predetermined period has elapsed since its reception time stamp. Alternatively, deletion from the dictionary cache 160 may be triggered by the amount of movement of the terminal device 100 or by the associated object moving out of the frame of the image.
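The cache behaviour described above (overwrite on a matching object ID, deletion once a predetermined period has elapsed) can be sketched as follows. This is only an illustration under stated assumptions: the class names, the `ttl_seconds` parameter, and the representation of a feature quantity as a list of floats are hypothetical and do not come from the specification.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CacheEntry:
    features: List[float]   # feature quantities for one object (assumed representation)
    received_at: float      # reception time stamp, in seconds


@dataclass
class DictionaryCache:
    ttl_seconds: float      # the "predetermined period" (assumed to be given in seconds)
    entries: Dict[str, CacheEntry] = field(default_factory=dict)

    def store(self, received: Dict[str, List[float]], now: Optional[float] = None) -> None:
        """Store a received dictionary; an existing object ID is overwritten."""
        stamp = time.time() if now is None else now
        for object_id, features in received.items():
            self.entries[object_id] = CacheEntry(features, stamp)

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Delete entries older than ttl_seconds and return their object IDs."""
        current = time.time() if now is None else now
        stale = [oid for oid, entry in self.entries.items()
                 if current - entry.received_at > self.ttl_seconds]
        for oid in stale:
            del self.entries[oid]
        return stale
```

Deletion triggered by device movement or by an object leaving the frame would need additional inputs and is not covered by this sketch.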

Furthermore, in the present embodiment, the receiving unit 150 receives from the dictionary server 200 an additional information database that the dictionary server 200 acquires according to the object identification result. The additional information database received by the receiving unit 150 has a smaller data amount than the additional information database held in advance by the dictionary server 200. The receiving unit 150 stores the received additional information database in the additional information cache 180.

(Dictionary cache)
The dictionary cache 160 uses the storage unit 108 shown in FIG. 3 to store the feature dictionary received by the receiving unit 150. The feature dictionary stored in the dictionary cache 160 is referenced when the identification unit 170 identifies objects.

(Identification unit)
The identification unit 170 extracts feature quantities from the input image supplied by the image acquisition unit 130 and collates them with the feature dictionary stored in the dictionary cache 160, thereby identifying objects shown in the input image. The feature extraction algorithm used by the identification unit 170 may be, for example, the Random Ferns method described in Non-Patent Document 2 above, or the SURF method described in “SURF: Speeded Up Robust Features” (H. Bay, A. Ess, T. Tuytelaars and L. V. Gool, Computer Vision and Image Understanding (CVIU), Vol.110, No.3, pp.346-359, 2008). These are lightweight algorithms that can run at high speed with relatively low processing cost. Object identification by the identification unit 170 typically yields the object ID of each object shown in the input image, together with the position and orientation of that object within the input image. The identification unit 170 then outputs the object identification result to the display control unit 190.

(Additional information cache)
The additional information cache 180 uses the storage unit 108 shown in FIG. 3 to store the additional information database received by the receiving unit 150. The display control unit 190, described next, selects the additional information to be superimposed on the input image from the additional information database stored in the additional information cache 180.

(Display control unit)
The display control unit 190 acquires additional information related to the objects identified by the identification unit 170 from the additional information database stored in the additional information cache 180, and generates an output image by superimposing the acquired additional information on the input image. The display control unit 190 then outputs the generated output image to the display unit 112.

The additional information superimposed on the input image may be any kind of information. For example, it may be advertisement information and rating information associated with a building shown in the input image, as illustrated in FIG. 2. Other examples of additional information are described further below.

<3. Configuration Example of Dictionary Server According to One Embodiment>
[3-1. Hardware configuration]
FIG. 5 is a block diagram showing an example of the hardware configuration of the dictionary server 200 according to the present embodiment. Referring to FIG. 5, the dictionary server 200 includes a storage unit 208, a communication unit 214, a bus 218, and a control unit 220.

(Storage unit)
The storage unit 208 is composed of a storage medium such as a semiconductor memory or a hard disk, and stores programs and data for processing by the dictionary server 200. The storage unit 208 may have a larger storage capacity than the storage unit 108 of the terminal device 100. The storage unit 208 stores in advance the feature dictionaries and the additional information database described later.

(Communication unit)
The communication unit 214 is a communication interface that mediates communication between the dictionary server 200 and the terminal device 100. The communication unit 214 supports any wireless or wired communication protocol and establishes a communication connection with the terminal device 100. This allows the dictionary server 200 to receive images from the terminal device 100 and to transmit the feature dictionary and the additional information database to the terminal device 100.

(Bus)
The bus 218 interconnects the storage unit 208, the communication unit 214, and the control unit 220.

(Control unit)
The control unit 220 corresponds to a processor such as a CPU or a DSP. The control unit 220 may have higher computational performance than the control unit 120 of the terminal device 100. By executing programs stored in the storage unit 208 or another storage medium, the control unit 220 operates the various functions of the dictionary server 200 described later.

[3-2. Logical configuration]
FIG. 6 is a block diagram showing an example of the configuration of the logical functions realized by the storage unit 208 and the control unit 220 of the dictionary server 200 shown in FIG. 5. Referring to FIG. 6, the dictionary server 200 includes a receiving unit 230, a feature dictionary 240 for a first algorithm (Arg1), a feature dictionary 242 for a second algorithm (Arg2), an identification unit 250, a dictionary acquisition unit 260, an additional information database (DB) 270, an additional information acquisition unit 280, and a transmission unit 290.

(Receiving unit)
The receiving unit 230 waits for an input image transmitted from the terminal device 100. On receiving an input image via the communication unit 214, the receiving unit 230 outputs it to the identification unit 250. When the auxiliary information described above is received together with the input image, the receiving unit 230 outputs that auxiliary information to the identification unit 250 and the dictionary acquisition unit 260.

(Feature dictionaries)
The feature dictionary (Arg1) 240 and the feature dictionary (Arg2) 242 are sets of feature quantities stored in advance in the storage unit 208. Each feature quantity in the feature dictionary (Arg1) 240 is extracted from a known object image according to the first algorithm. Each feature quantity in the feature dictionary (Arg2) 242 is extracted from similar known object images according to the second algorithm. Typically, the first algorithm is a feature extraction algorithm that enables more accurate object identification than the second algorithm, while the second algorithm is a feature extraction algorithm that can be executed faster than the first. The first algorithm may be, for example, the feature extraction algorithm described in Patent Document 1 above. Alternatively, the first algorithm may be the algorithm described in “Shape Matching and Object Recognition Using Shape Contexts” (S. Belongie, J. Malik, and J. Puzicha, IEEE Trans. Pattern Analysis and Machine Intelligence, vol.24, no.4, pp.509-522, April 2002), or the algorithm described in “Distinctive image features from scale-invariant keypoints” (D. G. Lowe, International Journal of Computer Vision, 60, 2, pp.91-110, January 2004). The second algorithm is the feature extraction algorithm that is also used by the identification unit 170 of the terminal device 100 for object identification (for example, the Random Ferns method or the SURF method). In the following description, the first algorithm is referred to as the high-precision algorithm and the second algorithm as the lightweight algorithm.

The feature quantities in the feature dictionary (Arg1) 240 and those in the feature dictionary (Arg2) 242 are linked by common object IDs. That is, feature quantities for the same object ID are included in both the feature dictionary (Arg1) 240 and the feature dictionary (Arg2) 242.

FIG. 7 is an explanatory diagram for describing an example of the feature dictionaries stored by the dictionary server 200. Referring to FIG. 7, the feature dictionary (Arg1) 240 contains, for each of a plurality of objects including the eight objects B_1 to B_8, feature quantities extracted in advance from known object images according to the high-precision algorithm. Each object is given a name. The feature dictionary (Arg2) 242 likewise contains, for each of a plurality of objects including the same eight objects B_1 to B_8, feature quantities extracted in advance according to the lightweight algorithm. The object ID of each object is common to the two feature dictionaries. That is, for example, the feature quantity for the object B_1 in the feature dictionary 242 is, like the feature quantity for the object B_1 in the feature dictionary 240, a feature quantity extracted from an image of building A.

The feature dictionaries 240 and 242 are not limited to the example of FIG. 7 and may contain additional data. In some of the examples described later, the feature dictionary 240 contains additional data that supports efficient acquisition of the feature dictionary to be provided to the terminal device 100. However, the feature dictionary 242 may contain such additional data instead of (or in addition to) the feature dictionary 240.

(Identification unit)
The identification unit 250 extracts feature quantities from the input image received by the receiving unit 230 according to the high-precision algorithm and collates them with the feature dictionary (Arg1) 240, thereby identifying one or more objects shown in the input image. The identification unit 250 then outputs the object IDs and matching scores of the identified objects to the dictionary acquisition unit 260 and the additional information acquisition unit 280.

(Dictionary acquisition unit)
The dictionary acquisition unit 260 acquires the feature dictionary to be provided to the terminal device 100 according to the identification result from the identification unit 250. The feature dictionary acquired by the dictionary acquisition unit 260 is a subset of the feature dictionary (Arg2) 242 and has a smaller data amount than either the feature dictionary (Arg1) 240 or the feature dictionary (Arg2) 242 described above. Four examples of how the dictionary acquisition unit 260 acquires a dictionary subset are described below with reference to FIGS. 8 to 11.

(1) First example
FIG. 8 is an explanatory diagram for describing a first example of a dictionary subset acquired by the dictionary acquisition unit 260. FIG. 8 shows, for each object ID in the feature dictionary (Arg1) 240, the rank of the matching score obtained as the identification result of the identification unit 250. In the example of FIG. 8, the object B_1 has the highest matching score and is ranked first. The object B_2 has the next highest matching score and is ranked second. The matching score of the object B_6 is ranked k-th. The dictionary acquisition unit 260 acquires from the feature dictionary (Arg2) 242, for example, the feature quantities for the k objects with the highest matching scores. The dictionary acquisition unit 260 then outputs a feature dictionary subset 242a containing the acquired feature quantities to the transmission unit 290 as the feature dictionary to be provided to the terminal device 100.

The amount of data to be included in the feature dictionary subset 242a (for example, the number k of feature quantities) may be determined dynamically according to the capability information of the terminal device 100, which is received from the terminal device 100 as auxiliary information. The capability of the terminal device 100 can be expressed, for example, by the number of data items it can process, the number of processor cores, or the memory capacity.
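As a rough illustration of the first example, the sketch below selects the top k objects by matching score and looks up their lightweight (Arg2) feature quantities, with k derived from capability information. The function names, the `max_objects` capability field, and the sample scores are assumptions made for this sketch, not details given in the specification.

```python
from typing import Dict, List


def choose_k(capability: Dict[str, int], default_k: int = 3) -> int:
    """Derive k from capability information ("max_objects" is a hypothetical field)."""
    return int(capability.get("max_objects", default_k))


def top_k_subset(scores: Dict[str, float],
                 lightweight_dict: Dict[str, List[str]],
                 k: int) -> Dict[str, List[str]]:
    """Return the Arg2 entries for the k objects with the highest matching scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return {oid: lightweight_dict[oid] for oid in ranked if oid in lightweight_dict}


# Matching scores from the high-precision (Arg1) identification; values are made up.
scores = {"B1": 0.92, "B2": 0.85, "B3": 0.10, "B6": 0.70}
# Lightweight (Arg2) dictionary keyed by the same object IDs; contents are placeholders.
arg2_dictionary = {oid: [oid + "-lightweight-features"] for oid in scores}

subset_242a = top_k_subset(scores, arg2_dictionary, choose_k({"max_objects": 3}))
```

Because both dictionaries share object IDs, ranking can be done on the Arg1 results while the returned data comes only from the Arg2 dictionary.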

(2) Second example
FIG. 9 is an explanatory diagram for describing a second example of a dictionary subset acquired by the dictionary acquisition unit 260. In the second example, the feature dictionary (Arg1) 240 has, for each object, predefined “co-occurring object” data in addition to the “object ID”, “name”, and “feature quantity”. The “co-occurring object” data is a list of objects that are likely to co-occur with the object. In this specification, a first object and a second object are said to “co-occur” when the second object exists in the vicinity of the first object. In the example of FIG. 9, the co-occurring objects of the object B_4 are the objects B_5 and B_9. This means that when the object B_4 (traffic light D) is identified in an input image, the object B_5 (automobile E) or the object B_9 (sign I) is likely to appear in subsequent input images. Using such data, the dictionary acquisition unit 260 can acquire feature quantities not only for objects already shown in the input image but also for objects predicted to appear in subsequent input images. In the example of FIG. 9, the dictionary acquisition unit 260 acquires from the feature dictionary (Arg2) 242, in addition to the feature quantity for the object B_4, whose matching score ranked high, the feature quantities for the objects B_5 and B_9, which are predicted to appear in subsequent input images. The dictionary acquisition unit 260 then outputs a feature dictionary subset 242b containing the acquired feature quantities to the transmission unit 290.
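The co-occurrence expansion of the second example can be sketched as below. The function name and the dictionary layout are hypothetical; only the B_4 to B_5/B_9 relationship is taken from the example of FIG. 9.

```python
from typing import Dict, List


def expand_with_cooccurrence(top_ids: List[str],
                             cooccurrence: Dict[str, List[str]],
                             lightweight_dict: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Add objects listed as co-occurring with each high-scoring object."""
    selected: List[str] = []
    for oid in top_ids:
        for candidate in [oid, *cooccurrence.get(oid, [])]:
            # Keep first-seen order and skip IDs with no lightweight entry.
            if candidate not in selected and candidate in lightweight_dict:
                selected.append(candidate)
    return {oid: lightweight_dict[oid] for oid in selected}


# "Co-occurring object" data as in FIG. 9: B4 co-occurs with B5 and B9.
cooccurrence = {"B4": ["B5", "B9"]}
# Placeholder Arg2 dictionary contents.
arg2_dictionary = {oid: [oid + "-features"] for oid in ("B1", "B4", "B5", "B9")}

subset_242b = expand_with_cooccurrence(["B4"], cooccurrence, arg2_dictionary)
```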

(3) Third example
FIG. 10 is an explanatory diagram for describing a third example of a dictionary subset acquired by the dictionary acquisition unit 260. In the third example as well, the dictionary acquisition unit 260 acquires feature quantities not only for objects already shown in the input image but also for objects predicted to appear in subsequent input images. In the third example, however, an object predicted to appear in subsequent input images is an object that is determined from position data to be located in the vicinity of an object already shown in the input image. Referring to FIG. 10, the feature dictionary (Arg1) 240 has position data (latitude and longitude, or other coordinate data) for each object. For example, the position of the object B_1 is X_1, the position of the object B_2 is X_2, and the position of the object B_3 is X_3. Suppose that the distance between the positions X_1 and X_2 is smaller than a threshold D. When the matching score of the object B_1 ranks high, the dictionary acquisition unit 260 acquires from the feature dictionary (Arg2) 242, on the basis of such position data, the feature quantity for the object B_2 located in the vicinity of the object B_1 in addition to the feature quantity for the object B_1. The dictionary acquisition unit 260 then outputs a feature dictionary subset 242c containing the acquired feature quantities to the transmission unit 290.

The position data illustrated in FIG. 10 may also be used to filter the feature dictionary. For example, the dictionary acquisition unit 260 may acquire only the feature quantities for those of the top k objects by matching score that are located in the vicinity of the terminal device 100. The identification unit 250 may also restrict collation with the feature quantities extracted from the input image to the feature quantities for objects located in the vicinity of the terminal device 100. The position of the terminal device 100 can be recognized from the auxiliary information received from the terminal device 100.
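The vicinity test against the threshold D can be sketched as follows, assuming positions are (latitude, longitude) pairs in degrees. The haversine formula is one common choice for the distance and is not prescribed by the specification; the coordinates and the 200 m threshold are made-up values.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


def distance_m(p: LatLon, q: LatLon) -> float:
    """Great-circle (haversine) distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def neighbours(object_id: str, positions: Dict[str, LatLon], threshold_d: float) -> List[str]:
    """Return IDs of objects closer than threshold_d metres to the given object."""
    origin = positions[object_id]
    return [oid for oid, pos in positions.items()
            if oid != object_id and distance_m(origin, pos) < threshold_d]


# Illustrative positions X_1, X_2, X_3 (made-up coordinates).
positions = {"B1": (35.6586, 139.7454), "B2": (35.6590, 139.7460), "B3": (35.7101, 139.8107)}
nearby = neighbours("B1", positions, threshold_d=200.0)
```

The same `distance_m` helper could be applied to the terminal position taken from the auxiliary information when filtering by proximity to the terminal device.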

(4) Fourth example
FIG. 11 is an explanatory diagram for describing a fourth example of a dictionary subset acquired by the dictionary acquisition unit 260. Referring to FIG. 11, the feature dictionary (Arg1) 240 has, for each object, “illumination condition” data in addition to the “object ID”, “name”, and “feature quantity”. The “illumination condition” may be, for example, a category representing the illumination condition under which the known object image was captured. Illumination conditions are distinguished from one another by, for example, time-related conditions such as the time of day or the season of capture, or by weather-related conditions. The feature dictionary (Arg1) 240 may contain multiple kinds of feature quantities extracted from images of the same object captured under different illumination conditions. In the example of FIG. 11, the feature dictionary (Arg1) 240 contains, for the object B_2, a feature quantity corresponding to illumination condition L1 (for example, “morning” or “sunny”), a feature quantity corresponding to illumination condition L2 (for example, “daytime” or “cloudy”), and a feature quantity corresponding to illumination condition L3 (for example, “evening” or “rain”). Likewise, for the object B_3, feature quantities corresponding to illumination conditions L1, L2, and L3 are contained in the feature dictionary (Arg1) 240. Because the feature dictionary (Arg1) 240 thus contains multiple feature quantities for the same object captured under different illumination conditions, object identification by the identification unit 250 is less affected by differences in the appearance of an object caused by differences in illumination. In the example of FIG. 11, when an input image showing the object B_2 is received, the matching scores between the feature quantities of the input image and the feature quantities for illumination conditions L1 and L2 are low, but the matching score with the feature quantity for illumination condition L3 is high, so the feature quantity for the object B_2 is appropriately included in the feature dictionary subset 242d.

The illumination condition data illustrated in FIG. 11 may also be used to filter the feature dictionary. For example, among the feature quantities of the top k objects by matching score, the dictionary acquisition unit 260 may exclude from the feature subset 242d those corresponding to an illumination condition different from the one to which the date and time of capture of the input image belongs. The identification unit 250 may also restrict collation with the feature quantities extracted from the input image to the feature quantities corresponding to the illumination condition to which that date and time belongs. The date and time at which the input image was captured can be recognized from the auxiliary information received from the terminal device 100.
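The illumination-based filtering can be sketched as below. The mapping from capture time to the labels L1, L2, and L3 is purely an assumption for illustration (the specification names the categories but not their boundaries), as are the function names and the record layout.

```python
from datetime import datetime
from typing import Dict, List


def illumination_condition(captured_at: datetime) -> str:
    """Map a capture time to a condition label (hypothetical hour boundaries)."""
    hour = captured_at.hour
    if 5 <= hour < 10:
        return "L1"   # e.g. morning
    if 10 <= hour < 16:
        return "L2"   # e.g. daytime
    return "L3"       # e.g. evening; all remaining hours in this sketch


def filter_by_illumination(entries: List[Dict[str, str]],
                           captured_at: datetime) -> List[Dict[str, str]]:
    """Keep only entries whose condition matches the capture time of the input image."""
    condition = illumination_condition(captured_at)
    return [entry for entry in entries if entry["condition"] == condition]


# One record per (object, illumination condition), as in FIG. 11; features omitted.
entries = [{"object_id": oid, "condition": cond}
           for oid in ("B2", "B3") for cond in ("L1", "L2", "L3")]

kept = filter_by_illumination(entries, datetime(2016, 7, 13, 18, 30))
```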

(Additional information DB)
The additional information DB 270 is a set of additional information associated with objects existing in real space. In the field of AR, additional information is also called annotation. FIG. 12 is an explanatory diagram for describing an example of the data stored in the additional information DB. Referring to FIG. 12, in the additional information DB 270, additional information comprising two data items, “type” and “content”, is associated with the object ID of each object. “Type” represents the kind of each piece of additional information. “Content” may be text data, graphic data, image data, or the like constituting the substance of each piece of additional information. In the example of FIG. 12, advertisement information and rating information are associated with the object B_1. Advertisement information, alert information, and vehicle type information are associated with the objects B_2, B_4, and B_5, respectively.

(Additional information acquisition unit)
The additional information acquisition unit 280 acquires from the additional information DB 270 the additional information to be provided to the terminal device 100 according to the identification result from the identification unit 250, and generates a subset of the additional information database with a smaller data amount. The additional information acquisition unit 280 then outputs the generated subset to the transmission unit 290. Typically, the additional information acquisition unit 280 acquires from the additional information DB 270 the set of additional information whose object IDs are shared with the feature dictionary subset acquired by the dictionary acquisition unit 260. That is, the additional information acquisition unit 280 may likewise acquire from the additional information DB 270 the set of additional information corresponding to the top k objects by matching score. The additional information acquisition unit 280 may further acquire from the additional information DB 270 additional information corresponding to objects predicted to appear in subsequent input images.
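A minimal sketch of building the additional information subset by object ID follows. The record layout mirrors the “type” and “content” items of FIG. 12, but the function name and the content strings are placeholders invented for this illustration.

```python
from typing import Dict, Iterable, List

# Records shaped after FIG. 12; the content strings are placeholders.
additional_info_db: List[Dict[str, str]] = [
    {"object_id": "B1", "type": "advertisement", "content": "(ad text)"},
    {"object_id": "B1", "type": "rating", "content": "(rating)"},
    {"object_id": "B2", "type": "advertisement", "content": "(ad text)"},
    {"object_id": "B4", "type": "alert", "content": "(alert text)"},
    {"object_id": "B5", "type": "vehicle type", "content": "(vehicle type)"},
]


def additional_info_subset(db: List[Dict[str, str]],
                           object_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only records whose object ID appears in the feature dictionary subset."""
    wanted = set(object_ids)
    return [record for record in db if record["object_id"] in wanted]


# Object IDs shared with a feature dictionary subset such as 242b (B4, B5, B9).
info_subset = additional_info_subset(additional_info_db, ["B4", "B5", "B9"])
```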

(Transmission unit)
The transmission unit 290 transmits the feature dictionary subset acquired by the dictionary acquisition unit 260 to the terminal device 100 via the communication unit 214. In doing so, the transmission unit 290 may determine whether the identified objects include a new object different from those identified in the past, and transmit the feature dictionary subset for the new object to the terminal device 100 only when a new object has been identified. As a result, when the same object continues to appear in the input images, redundant transmission of the feature dictionary is omitted and the traffic load is reduced. The transmission unit 290 also transmits to the terminal device 100 the subset of the additional information database generated by the additional information acquisition unit 280. The subset of the additional information database may likewise be transmitted only when a new object has been identified.
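The "transmit only for new objects" behaviour can be sketched as a small stateful filter. The class name is hypothetical, and the sketch assumes one filter instance is kept per terminal device, which the specification does not state.

```python
from typing import Dict, List, Set


class NewObjectFilter:
    """Remembers which object IDs have already been sent to one terminal."""

    def __init__(self) -> None:
        self._sent: Set[str] = set()

    def select(self, subset: Dict[str, List[float]]) -> Dict[str, List[float]]:
        """Return only the entries for objects not sent before, and record them."""
        fresh = {oid: feats for oid, feats in subset.items() if oid not in self._sent}
        self._sent.update(fresh)
        return fresh


sender = NewObjectFilter()
first = sender.select({"B1": [0.1], "B2": [0.2]})   # both objects are new
second = sender.select({"B1": [0.1], "B4": [0.4]})  # only B4 is new
third = sender.select({"B1": [0.1]})                # nothing new, nothing to send
```

An empty result corresponds to the case where the same objects continue to appear and transmission can be skipped.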

[3-3. Modifications]
Next, two modifications of the dictionary server 200 are described.

(1) First modification
FIG. 13 is a block diagram showing an example of the logical function configuration of the dictionary server 200 according to a first modification. Referring to FIG. 13, the dictionary server 200 includes a receiving unit 232, a feature dictionary 240 for the high-precision algorithm (Arg1), a feature dictionary 242 for the lightweight algorithm (Arg2), an identification unit 252, a dictionary acquisition unit 262, an additional information DB 270, an additional information acquisition unit 280, and a transmission unit 290.

The receiving unit 232 waits for an input image transmitted from the terminal device 100. When it receives an input image via the communication unit 214, the receiving unit 232 outputs the received input image to the identification unit 252 and the dictionary acquisition unit 262.

The identification unit 252 extracts feature quantities of the input image received by the receiving unit 232 according to the high-precision algorithm and collates the extracted feature quantities with the feature dictionary (Arg1) 240, thereby identifying one or more objects shown in the input image. The identification unit 252 also identifies the position and orientation of each object shown in the input image. The identification unit 252 then outputs the object ID, position, and orientation of each identified object to the dictionary acquisition unit 262, and outputs the object ID of each identified object to the additional information acquisition unit 280.

The dictionary acquisition unit 262 generates the feature dictionary to be provided to the terminal device 100 according to the identification result from the identification unit 252. More specifically, the dictionary acquisition unit 262 first recognizes the positions, in the input image, of the objects identified by the identification unit 252, and cuts out from the input image a partial image of the region in which each object appears. The dictionary acquisition unit 262 then extracts feature quantities from the cut-out partial images according to the lightweight algorithm. It associates the object ID input from the identification unit 252 with the feature quantities of each object extracted in this way, and generates a feature dictionary for the lightweight algorithm. In this case, the feature dictionary 242 for the lightweight algorithm (Arg2) may be omitted from the configuration of the dictionary server 200. Alternatively, the dictionary acquisition unit 262 may generate a new feature dictionary by adding the feature quantities extracted from the partial images (that is, additionally learned feature quantities) to a subset of the feature quantities acquired from the feature dictionary 242. The dictionary acquisition unit 262 outputs the feature dictionary generated in this way to the transmission unit 290 and causes the transmission unit 290 to transmit it to the terminal device 100.

The dictionary acquisition unit 262 may further generate variations of the feature quantities extracted according to the lightweight algorithm by changing parameters such as color, brightness, or degree of blur. These variations can also form part of the new feature dictionary.

FIGS. 14 and 15 are explanatory diagrams for describing the generation of a feature dictionary by the dictionary acquisition unit 262 in the first modification. Referring to FIG. 14, objects B1 and B4 shown in the input image Im1 are identified using the feature dictionary 240 and the high-precision algorithm. The dictionary acquisition unit 262 then cuts out from the input image Im1, as shown in FIG. 15, a partial image A1 in which the object B1 appears and a partial image A2 in which the object B4 appears. The dictionary acquisition unit 262 extracts feature quantities from the partial image A1 and the partial image A2 according to the lightweight algorithm. It also generates, from the extracted feature quantities, variations that differ in parameters such as color or brightness. Finally, the dictionary acquisition unit 262 attaches an object ID to each feature quantity to form a new feature dictionary 242d to be provided to the terminal device 100.

According to the first modification, a feature dictionary dynamically generated from the input image by the dictionary server 200 is provided to the terminal device 100. This feature dictionary has a small data volume and contains feature quantities particularly suited to the environment in which the terminal device 100 is located (such as the imaging environment or the lighting environment). The terminal device 100 can therefore identify, in subsequent input images, the objects shown and their positions and orientations with high accuracy and at low processing cost.
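The generation procedure of the first modification (cut out partial images, extract lightweight feature quantities, add variations, attach object IDs) can be sketched as follows. Everything here is an illustrative assumption: images are plain nested lists of grayscale values, `extract_lightweight` is a stand-in that returns per-row mean intensities rather than any real lightweight algorithm, and the brightness variation is applied to the partial image before extraction as a simplification.

```python
def crop(image, box):
    """Cut out the partial image for box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def extract_lightweight(patch):
    """Stand-in for the lightweight algorithm: per-row mean intensity."""
    return tuple(sum(row) / len(row) for row in patch)


def adjust_brightness(patch, delta):
    """Shift every pixel by delta, clamped to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in patch]


def build_dictionary(image, detections, deltas=(0, -20, 20)):
    """Return (object_id, feature) entries, one per object and brightness variation.

    detections: list of (object_id, box) pairs from the high-precision identification.
    """
    entries = []
    for object_id, box in detections:
        patch = crop(image, box)
        for delta in deltas:
            feature = extract_lightweight(adjust_brightness(patch, delta))
            entries.append((object_id, feature))
    return entries


image = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120]]
detections = [("B1", (0, 0, 2, 2)), ("B4", (1, 2, 3, 4))]
new_dictionary = build_dictionary(image, detections, deltas=(0, 10))
```

Because the entries come from the terminal's own input image, they reflect the actual imaging and lighting conditions, which is the advantage the first modification claims over a dictionary prepared in advance.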

(2) Second Modification
In the examples described so far, a subset of the feature dictionary for the lightweight algorithm is provided from the dictionary server 200 to the terminal device 100. However, as in the second modification described in this section, the dictionary server 200 may instead provide the terminal device 100 with a subset of the feature dictionary for the high-precision algorithm.

FIG. 16 is a block diagram illustrating an example of the configuration of the logical functions of the dictionary server 200 according to the second modification. Referring to FIG. 16, the dictionary server 200 includes a receiving unit 230, a feature dictionary 240 for the high-precision algorithm (Arg1), an identification unit 250, a dictionary acquisition unit 264, an additional information DB 270, an additional information acquisition unit 280, and a transmission unit 290.

The dictionary acquisition unit 264 acquires, from the feature dictionary (Arg1) 240, the subset of the feature dictionary to be provided to the terminal device 100 according to the identification result from the identification unit 250. For example, FIG. 17 again shows the ranks of the matching scores obtained as the identification result from the identification unit 250. In the example of FIG. 17, the matching score of the object B1 ranks first, that of the object B2 ranks second, and that of the object B6 ranks k-th. The dictionary acquisition unit 264 acquires, for example, the feature quantities for the k objects with the highest matching scores from the feature dictionary (Arg1) 240 and forms a subset 240a of the feature dictionary containing the acquired feature quantities. The transmission unit 290 then transmits the subset 240a of the feature dictionary to the terminal device 100.
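Selecting the top-k objects by matching score can be sketched as below. The function name `top_k_subset`, the dictionary layout (object ID mapped to a list of feature quantities), and the sample scores are assumptions made for illustration only.

```python
def top_k_subset(scores, feature_dictionary, k):
    """Return the dictionary entries for the k objects with the highest scores.

    scores: mapping of object ID to matching score.
    feature_dictionary: mapping of object ID to its feature quantities.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return {oid: feature_dictionary[oid] for oid in ranked
            if oid in feature_dictionary}


scores = {"B1": 0.92, "B2": 0.81, "B3": 0.10, "B6": 0.55}
feature_dictionary = {oid: [f"{oid}-feature"] for oid in scores}
subset_240a = top_k_subset(scores, feature_dictionary, k=3)
```

With these sample scores and k = 3, the subset holds B1, B2, and B6, matching the ranking described for FIG. 17.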

When the second modification is applied, the identification unit 170 of the terminal device 100 extracts feature quantities from the input image according to the high-precision algorithm and collates the extracted feature quantities with the subset of the feature dictionary provided from the dictionary server 200. In this case, the processing cost that the terminal device 100 incurs for extracting feature quantities is higher than in the examples that use the lightweight algorithm. However, the dictionary cache 160 stores only a subset of the feature dictionary of the dictionary server 200, not the entire dictionary. Compared with the case where the terminal device 100 holds the entire feature dictionary, the processing cost of feature matching and the memory resources consumed in the terminal device 100 are therefore far smaller.

The description here has mainly covered an example in which the transmission unit 140 of the terminal device 100 transmits the input image to the dictionary server 200. However, instead of transmitting the input image, the transmission unit 140 of the terminal device 100 may transmit to the dictionary server 200 the feature quantities extracted from the input image by the identification unit 170. In that case, the identification unit 250 of the dictionary server 200 can collate the feature quantities of the input image received by the receiving unit 230 with the feature dictionary (Arg1) 240.

<4. Flow of processing according to one embodiment>
[4-1. Terminal-side processing]
FIG. 18 is a flowchart illustrating an example of the flow of processing by the terminal device 100 according to the present embodiment.

Referring to FIG. 18, first, the image acquisition unit 130 of the terminal device 100 acquires an input image (step S102). Next, the transmission unit 140 determines whether the predetermined trigger event described above (for example, the arrival of a periodic timing or a user instruction) has been detected (step S104). If no trigger event has been detected, the processing of the subsequent steps S106 to S110 is skipped. If a trigger event has been detected, the transmission unit 140 transmits the input image (and auxiliary information as necessary) to the dictionary server 200 (step S106). The receiving unit 150 then receives a feature dictionary from the dictionary server 200 (step S108). The feature dictionary received here is stored in the dictionary cache 160. The receiving unit 150 also receives an additional information DB from the dictionary server 200 (step S110). The additional information DB received here is stored in the additional information cache 180. Next, the identification unit 170 identifies the objects shown in the input image using the feature dictionary in the dictionary cache 160 (step S112). Next, the display control unit 190 acquires additional information related to the objects identified by the identification unit 170 from the additional information cache 180 and generates an output image by superimposing the acquired additional information on the input image (step S114). The position and orientation of the additional information in the input image can be adjusted, for example, to match the position and orientation of the object identified by the identification unit 170. The display control unit 190 then causes the display unit 112 to display the generated output image (step S116).

This processing is repeated for each of the series of input images acquired by the image acquisition unit 130.
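The terminal-side loop of steps S102 to S116 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the class `Terminal`, the `query_server` callable, the periodic trigger, and the representation of a frame as a set of feature tokens are all hypothetical.

```python
class Terminal:
    """Hypothetical terminal that caches what the dictionary server returns."""

    def __init__(self, query_server, period=2):
        self.query_server = query_server  # callable: frame -> (dictionary, info)
        self.period = period              # trigger fires every `period` frames
        self.frame_count = 0
        self.dictionary_cache = {}        # object ID -> feature token
        self.info_cache = {}              # object ID -> additional information

    def process(self, frame_features):
        # S104: periodic trigger; otherwise S106 to S110 are skipped.
        if self.frame_count % self.period == 0:
            dictionary, info = self.query_server(frame_features)  # S106
            self.dictionary_cache.update(dictionary)              # S108
            self.info_cache.update(info)                          # S110
        self.frame_count += 1
        # S112: identify objects using only the cached dictionary.
        identified = [oid for oid, feature in self.dictionary_cache.items()
                      if feature in frame_features]
        # S114: pair each identified object with its additional information.
        return [(oid, self.info_cache.get(oid)) for oid in identified]


calls = []

def fake_server(frame_features):
    """Stand-in for the dictionary server 200; records each request."""
    calls.append(frame_features)
    return {"B1": "f1", "B2": "f2"}, {"B1": "Cafe", "B2": "Station"}

terminal = Terminal(fake_server, period=2)
overlay_1 = terminal.process({"f1"})        # trigger fires, server is queried
overlay_2 = terminal.process({"f1", "f2"})  # no trigger, cache is reused
```

The second frame is handled entirely from the caches, which reflects the point that the terminal does not have to contact the server for every frame.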

[4-2. Server-side processing]
FIG. 19 is a flowchart illustrating an example of the flow of processing by the dictionary server 200 according to the present embodiment.

Referring to FIG. 19, first, the receiving unit 230 of the dictionary server 200 waits to receive an input image from the terminal device 100 (step S202). When the receiving unit 230 receives an input image, the identification unit 250 extracts feature quantities from the input image according to the high-precision algorithm (step S204). Next, the identification unit 250 collates the extracted feature quantities of the input image with each feature quantity in the feature dictionary (Arg1) 240 and identifies the objects shown in the input image (step S206). If a new object different from the objects identified in previously received input images has been identified, the processing proceeds to step S210 (step S208). If no new object has been identified, the processing of the subsequent steps S210 to S214 may be skipped. When the identification unit 250 has identified a new object, a subset of the feature dictionary is acquired (or a new feature dictionary with a small data volume is generated) according to the identification result (step S210). Next, the additional information acquisition unit 280 acquires a subset of the additional information DB from the additional information DB 270 according to the object identification result from the identification unit 250 (step S212). Next, the transmission unit 290 transmits the subset of the feature dictionary and the subset of the additional information DB to the terminal device 100 (step S214).
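The server-side flow of steps S204 to S214 can be sketched as a single request handler. The function `handle_request` and its parameters are hypothetical. As a simplification, the subset returned here contains only the entries for the newly identified objects, whereas the specification also allows richer subsets.

```python
def handle_request(image, identify, known_ids, feature_dictionary, info_db):
    """Return (dictionary_subset, info_subset), or None when nothing is new.

    identify: callable standing in for high-precision extraction and matching.
    known_ids: set of object IDs identified in earlier requests (mutated here).
    """
    identified = identify(image)                                    # S204, S206
    new_ids = [oid for oid in identified if oid not in known_ids]   # S208
    if not new_ids:
        return None                        # S210 to S214 are skipped
    known_ids.update(new_ids)
    dictionary_subset = {oid: feature_dictionary[oid]               # S210
                         for oid in new_ids if oid in feature_dictionary}
    info_subset = {oid: info_db[oid]                                # S212
                   for oid in new_ids if oid in info_db}
    return dictionary_subset, info_subset                           # S214


feature_dictionary = {"B1": ["f1"], "B4": ["f4"], "B6": ["f6"]}
info_db = {"B1": "Cafe", "B4": "Station"}
known = set()
reply_1 = handle_request("frame-1", lambda img: ["B1", "B4"], known,
                         feature_dictionary, info_db)
reply_2 = handle_request("frame-2", lambda img: ["B1", "B4"], known,
                         feature_dictionary, info_db)
```

A `None` reply corresponds to the branch in which the same objects are still in view and no transmission is needed.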

The feature dictionary and the additional information DB provided from the dictionary server 200 to the terminal device 100 through this processing are used for object identification in the terminal device 100.

<5. Summary>
Up to this point, one embodiment of the technology disclosed in this specification and two modifications of it have been described in detail with reference to FIGS. 1 to 19. According to the embodiment described above, the feature dictionary used by the terminal device 100 to identify objects in an input image is provided to the terminal device 100 from the dictionary server 200, which stores in advance a feature dictionary containing a richer set of feature quantities. The feature dictionary provided to the terminal device 100 is acquired in the dictionary server 200 according to the result of identifying objects in the input image. Therefore, even if the terminal device 100, which has few processing resources, does not hold an enormous feature dictionary in advance, it can identify objects with higher accuracy by using a feature dictionary suited to the situation in which it is placed.

According to the present embodiment, objects can be identified in the dictionary server 200 using a high-precision feature extraction algorithm and in the terminal device 100 using a lightweight feature extraction algorithm. Therefore, even on the terminal device 100 with few processing resources, an application involving object identification that requires real-time performance, such as an AR application, can run quickly and with high accuracy.

According to the present embodiment, a database of additional information that can be superimposed on an image in an AR application is stored in advance by the dictionary server 200, and a subset of it is provided to the terminal device 100. The additional information provided from the dictionary server 200 to the terminal device 100 is also acquired according to the result of identifying objects in the input image at the dictionary server 200. Resources for storing and processing additional information in the terminal device 100 are therefore saved as well.

According to the present embodiment, the feature dictionary provided from the dictionary server 200 to the terminal device 100 includes not only feature quantities for the objects shown in the latest input image but also feature quantities for objects predicted to appear in subsequent input images. The terminal device 100 can therefore continue to use a feature dictionary provided once by the dictionary server 200 for a certain period of time. Once the feature dictionary has been provided, subsequent object identification in the terminal device 100 does not have to wait for data to be received, so the real-time performance of applications running on the terminal device 100 improves. In addition, because the terminal device 100 does not have to transmit the input image to the dictionary server 200 every frame, the traffic load is also reduced.

According to the first modification, new feature quantities generated in the dictionary server 200 from partial images of the input image are provided to the terminal device 100. Compared with the case where a subset of a feature dictionary prepared in advance under standard conditions is provided, the terminal device 100 can then use a feature dictionary particularly suited to the environment in which it is located (such as the imaging environment or the lighting environment). In the second modification as well, the processing cost of feature matching and the memory resources consumed in the terminal device 100 can be reduced.

The object identification technology described above may be used not only for AR or other applications but also, for example, for initializing or calibrating a coordinate system when estimating the position and orientation of the terminal device 100 using SLAM (Simultaneous Localization and Mapping) technology. For SLAM technology, see "Real-Time Simultaneous Localization and Mapping with a Single Camera" (A. J. Davison, Proceedings of the 9th IEEE International Conference on Computer Vision, Volume 2, 2003, pp. 1403-1410).

The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention pertains can conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present invention.

DESCRIPTION OF SYMBOLS
1 Object identification system
100 Terminal device
108 Storage unit
130 Image acquisition unit
140 Transmission unit
150 Receiving unit
170 Identification unit
190 Display control unit
200 Dictionary server (information processing apparatus)
230 Receiving unit
250 Identification unit
260, 262, 264 Dictionary acquisition unit
280 Additional information acquisition unit
290 Transmission unit

Claims

A terminal device comprising:
an image acquisition unit that acquires a captured input image;
a transmission unit that transmits the input image and auxiliary information to a server having a first feature dictionary that is collated with feature quantities of the input image to identify an object shown in the input image; and
a receiving unit that receives, from the server, a second feature dictionary that is acquired by the server according to the result of identifying the object based on the first feature dictionary and according to the auxiliary information, and that includes feature quantities of a co-occurrence object highly likely to co-occur with the object identified based on the first feature dictionary,
wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.

The terminal device according to claim 1, wherein the co-occurrence object includes an object that exists in the vicinity of an object identified based on the first feature dictionary.

The terminal device according to claim 1, wherein the co-occurrence object includes an object that is not reflected in the input image.

The terminal device according to claim 1, wherein the co-occurrence object includes an object that is predicted to be reflected in a subsequent input image.

The terminal device according to claim 1, wherein the auxiliary information includes information on a position where the input image is captured.

The terminal device according to claim 1, wherein the object identified based on the first feature dictionary is an object having a higher collation score based on the first feature dictionary.

The terminal device according to any one of claims 1 to 6, wherein the co-occurrence object includes at least one of a traffic signal, a building, a moving body, a signboard, and a sign.

An object identification method comprising:
acquiring a captured input image;
transmitting the input image and auxiliary information to a server having a first feature dictionary that is collated with feature quantities of the input image to identify an object shown in the input image; and
receiving, from the server, a second feature dictionary that is acquired by the server according to the result of identifying the object based on the first feature dictionary and according to the auxiliary information, and that includes feature quantities of a co-occurrence object highly likely to co-occur with the object identified based on the first feature dictionary,
wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.

An information processing apparatus comprising:
a storage unit that stores a first feature dictionary that is a set of feature quantities of known object images;
a receiving unit that receives an input image captured by a terminal device and auxiliary information;
an identification unit that identifies an object shown in the input image by collating feature quantities of the input image with the first feature dictionary;
a dictionary acquisition unit that acquires, according to the identification result from the identification unit and the auxiliary information, a second feature dictionary including feature quantities of a co-occurrence object highly likely to co-occur with the object identified based on the first feature dictionary; and
a transmission unit that transmits the second feature dictionary acquired by the dictionary acquisition unit to the terminal device,
wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.

A program for causing a computer to realize:
a function of acquiring a captured input image;
a function of transmitting the input image and auxiliary information to a server having a first feature dictionary that is collated with feature quantities of the input image to identify an object shown in the input image; and
a function of receiving, from the server, a second feature dictionary that is acquired by the server according to the result of identifying the object based on the first feature dictionary and according to the auxiliary information, and that includes feature quantities of a co-occurrence object highly likely to co-occur with the object identified based on the first feature dictionary,
wherein the accuracy of the algorithm used to extract the feature quantities included in the first feature dictionary differs from the accuracy of the algorithm used to extract the feature quantities included in the second feature dictionary.