JP2013105373A

JP2013105373A - Device, method, and program for data acquisition

Info

Publication number: JP2013105373A
Application number: JP2011249676A
Authority: JP
Inventors: Masajiro Iwasaki; 雅二郎岩崎
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2011-11-15
Filing date: 2011-11-15
Publication date: 2013-05-30
Anticipated expiration: 2031-11-15
Also published as: JP5430636B2

Abstract

PROBLEM TO BE SOLVED: To acquire data suitable for recognizing a target object from a set of images. SOLUTION: A cluster is generated for each local feature quantity of the image first selected from a set of images given as relating to a target object. Local feature quantities of subsequently selected images are classified into the clusters with similar feature values, updating the clusters, and the source images and feature values are acquired as data for object recognition on the basis of the clusters having the larger numbers of elements. Because general features common to many images are emphasized and minority features are ignored, data suitable for recognizing the target object can be acquired from the set of images. In particular, selecting an appropriate image first orients the classification into clusters correctly and improves the accuracy of object recognition.

Description

The present invention relates to the recognition of objects.

In the field of so-called object recognition, in which a computer recognizes objects (for example, buildings, articles, people, faces, animals, plants, and various other targets) in images, it is known that recognition accuracy can be improved by preparing a plurality of images containing the object to be recognized and learning from them. To collect a large number of images containing the target object as learning data, an image search such as “Yahoo! Search (image)” (see Non-Patent Document 1) can be used with the name of the target object as the query, that is, the search keyword. A technique is also known, as in Patent Document 1, in which an image (a query image) rather than a keyword is used as the search key and learning examples, that is, learning data, are collected by similar-image search.

Patent Document 1: JP 2010-92413 A

Non-Patent Document 1: Yahoo Japan Corporation, "Yahoo! Search (image)", [online], [retrieved October 14, 2011], Internet <URL: http://image-search.yahoo.co.jp/>

However, image sets collected by keyword-based image search and the like often include images in which the target object is not properly captured. Examples of such unsuitable images are varied: images of an entirely different subject that is merely related to the topic, night views in which the object is not clearly visible, images in which other elements of the scenery stand out, partially enlarged images, and images produced with special viewpoints, techniques, or special visual effects. If an image set that still contains such unsuitable images is used as learning data, the accuracy of object recognition decreases, so acquiring accurate learning data is an important problem.

In this regard, with the technique of Patent Document 1 the feature amount serving as the collection criterion is fixed and the collected images depend on the query image, which limits the accuracy of the learning data.

In view of the above problem, an object of the present invention is to obtain learning data that improves the accuracy of object recognition.

In light of the above object, a data acquisition device according to one aspect (1) of the present invention comprises: feature amount extraction means for sequentially selecting images from an image set and extracting a plurality of local feature amounts from each selected image; cluster generation means for generating a cluster for each local feature amount extracted from the image first selected from the image set; feature value setting means for calculating a feature value of each cluster from all elements classified into that cluster and setting it for the cluster; cluster update means for, each time an image is selected from the image set, updating the clusters by classifying, when a predetermined number or more of the local feature amounts extracted from that image approximate feature values of the clusters, those local feature amounts into the clusters for which the approximated feature values are set; and data acquisition means for acquiring data for object recognition on the basis of the clusters having a large number of elements.

A data acquisition method according to another aspect (7) of the present invention expresses the above aspect in the category of a method. In it, a computer executes: feature amount extraction processing of sequentially selecting images from an image set and extracting a plurality of local feature amounts from each selected image; cluster generation processing of generating a cluster for each local feature amount extracted from the image first selected from the image set; feature value setting processing of calculating a feature value of each cluster from all elements classified into that cluster and setting it for the cluster; cluster update processing of, each time an image is selected from the image set, updating the clusters by classifying, when a predetermined number or more of the local feature amounts extracted from that image approximate feature values of the clusters, those local feature amounts into the clusters for which the approximated feature values are set; and data acquisition processing of acquiring data for object recognition on the basis of the clusters having a large number of elements.

A data acquisition computer program according to another aspect (8) of the present invention expresses the above aspect in the category of a computer program. It causes a computer to: sequentially select images from an image set and extract a plurality of local feature amounts from each selected image; generate a cluster for each local feature amount extracted from the image first selected from the image set; calculate a feature value of each cluster from all elements classified into that cluster and set it for the cluster; each time an image is selected from the image set, update the clusters by classifying, when a predetermined number or more of the local feature amounts extracted from that image approximate feature values of the clusters, those local feature amounts into the clusters for which the approximated feature values are set; and acquire data for object recognition on the basis of the clusters having a large number of elements.

In the above aspects of the present invention, clusters are generated, one per local feature amount, with the image first selected from an image set given as relating to the target object as the reference; local feature amounts of subsequently selected images are classified into the clusters with approximating feature values, thereby updating the clusters; and the source images or feature values are obtained as data for object recognition on the basis of the clusters having a large number of elements.

As a result, general features common to many images are emphasized and minority features are ignored, so that data suitable for recognizing the target object can be obtained from the image set. In particular, if an appropriate image is selected first, the classification into clusters is oriented correctly and the accuracy of object recognition can be improved.

In another aspect (2) of the present invention, in any of the above aspects, when no feature value approximates a local feature amount extracted from an image selected second or later, the cluster update means generates a cluster corresponding to that local feature amount.

In another aspect (3) of the present invention, in any of the above aspects, the cluster update means updates the clusters that have been updated on the basis of the images of the image set once more on the basis of the images of the same image set.

In another aspect (4) of the present invention, any of the above aspects further comprises: re-execution means for treating the data acquired by the data acquisition means as candidate data and causing the feature amount extraction means, the cluster generation means, the feature value setting means, the cluster update means, and the data acquisition means to acquire new candidate data from a new image set obtained by removing the images on which the candidate data is based from the image set; and result selection means for comparing the acquired candidate data and selecting the candidate with the larger number of images as the final data for object recognition.

In another aspect (5) of the present invention, in any of the above aspects, the data acquisition means acquires, as learning data for object recognition, the source images from which the local feature amounts classified into the clusters having a large number of elements were extracted.

In another aspect (6) of the present invention, in any of the above aspects, the data acquisition means acquires the feature values of the clusters having a large number of elements as the data for object recognition.

Each of the above aspects can also be understood in other categories not explicitly stated (method, program, system, and so on). For the method and program categories, the “means” given in the device category is to be read as “processing” or “step” as appropriate. The order of processes and steps is not limited to what is directly stated in the present application and may be changed, for example by reordering them or by executing some processes together or piecemeal as needed.

The computer, such as a terminal, that implements and executes the individual means, processes, and steps may be shared, or may differ for each means, process, step, or timing. All or any part of the above “means” may also be read as “part” (unit, section, module, and so on).

According to the present invention, learning data that improves the accuracy of object recognition can be obtained.

FIG. 1 is a functional block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a diagram showing an example of data in the embodiment of the present invention.
FIG. 3 is a flowchart showing the processing procedure in the embodiment of the present invention.
FIG. 4 is a conceptual diagram showing an image set in the embodiment of the present invention.
FIG. 5 is a conceptual diagram showing classification into clusters in the embodiment of the present invention.

Next, a mode for carrying out the present invention (referred to as the “embodiment”) is illustrated with reference to the drawings. Premises shared with what was already described in the background art and the problem are omitted as appropriate.

[1. Configuration]
This embodiment relates to the data acquisition device 1 shown in the configuration diagram of FIG. 1 (hereinafter also called “the device 1” or “the device”). As its computer configuration, the device 1 has at least an arithmetic control unit 6 such as a CPU, a storage device 7 such as a main memory and an auxiliary storage device, and communication means 8 (a communication circuit for a mobile communication network, a wireless LAN adapter, or the like) for communicating with a communication network N (for example, a mobile communication network such as a mobile phone, PHS, or public wireless LAN network, or the Internet).

In the device 1, the arithmetic control unit 6 executes a predetermined computer program stored in the storage device 7, thereby realizing the elements shown in FIG. 1, such as the individual means (20, 30, and others). Of the realized elements, the information storage means may take any form: they can be realized in any data format, such as files on the storage device 7, or as remote storage through network computing (cloud).

The storage means may include not only data storage areas but also functions such as data input/output and management. The units of storage means shown in this application are for convenience of explanation; they may be divided or integrated as appropriate, and, besides the storage means explicitly shown, storage means for the data being processed and the processing results of each means are used as appropriate.

In the flowcharts and other figures of this application, a cylinder representing a storage means does not imply a limitation to direct-access storage, and a hexagon represents a decision rather than a preparation. Arrows in the figures (for example, FIG. 1) supplementarily indicate the main direction of the flow of data, control, and the like; they neither deny other flows nor limit the direction. For example, a data request or an acknowledgment (ACK) may flow in the opposite direction before or after data is acquired in a given direction.

Each means other than the storage means is a processing means that realizes and executes the information processing functions and operations described below. These are functional units organized for explanation and need not correspond to actual hardware elements or software modules.

[2. Operation and Effects]
FIG. 3 is a flowchart of the processing procedure in the device 1 configured as described above.

[2-1. Overview]
To begin with an overview drawn from FIG. 3: the feature amount extraction means 20 sequentially selects images from the image set stored in the image set storage means 15 and extracts a plurality of local feature amounts from each image (step S11). The cluster generation means 30 then generates a cluster for each local feature amount extracted from the image first selected from the image set (step S12). The feature value setting means 40 calculates the feature value of each cluster from all elements classified into that cluster and sets it for the cluster (step S19).

Each time an image is selected from the image set S, the cluster update means 50 examines the local feature amounts extracted from that image (step S13); if a predetermined number or more of them approximate cluster feature values (step S15: “YES”), it updates the clusters by classifying those local feature amounts into the clusters for which the approximated feature values are set (step S16). As a result of these updates, the data acquisition means 60 finally acquires data for object recognition on the basis of the clusters having a large number of elements (for example, a predetermined ratio or more) (step S22).
Each process is described concretely below.

[2-2. Image Selection and Cluster Generation]
First, at the start of processing for a given image set S, the feature amount extraction means 20 selects one image from the image set S (call it image a) and extracts local feature amounts (for example, SIFT) from image a (step S11). The image set S may be supplied to the device 1 by an operator on a storage medium or the like; alternatively, the device 1 may transmit the name of the target (for example, “Tokyo Tower”) to an image search server 2 or the like and use the images received as search results as the image set S.

The cluster generation means 30 then generates a cluster for each local feature amount extracted from image a, the image first selected from the image set S (step S12). A cluster consists of a set of local feature amounts, and data representing the clusters is stored in the cluster storage means 35 (for example, FIG. 2); the specific data structure and the way the registered contents are realized are not restricted.

For example, if the number of elements, that is, the number of local feature amounts classified into the cluster, is stored for each cluster as in the example of FIG. 2, it becomes easy to delete clusters or acquire data cluster by cluster according to the number of elements. The way clusters are associated with the local feature amounts classified into them is also not restricted: for example, each cluster may hold a list of local feature amounts, or, conversely, each extracted local feature amount may hold data such as a cluster ID indicating the cluster into which it was classified.
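The two association styles described above can be sketched as follows. This is only an illustration: the class names, field names, and the `cluster_of_feature` helper are hypothetical and do not come from the specification, which leaves the data structure open.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LocalFeature:
    """One local feature amount (all field names are illustrative)."""
    image_id: str                    # identifies the source image
    descriptor: tuple[float, ...]    # e.g. a SIFT-like vector
    xy: tuple[float, float]          # coordinate position in the source image


@dataclass
class Cluster:
    cluster_id: int
    members: list[LocalFeature] = field(default_factory=list)

    @property
    def element_count(self) -> int:
        # Number of local feature amounts classified into this cluster.
        return len(self.members)


def cluster_of_feature(clusters: list[Cluster]) -> dict[LocalFeature, int]:
    """Inverse view: map each local feature amount to its cluster ID."""
    return {f: c.cluster_id for c in clusters for f in c.members}
```

Keeping the member list on the cluster makes the element count a simple length lookup, while the inverse mapping corresponds to the alternative in which each feature carries its cluster ID.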

In the example of FIG. 5 (a conceptual diagram), local feature amounts extracted from image a, such as “there is an upward-pointing tip” (local feature amount a1, indicated by a broken-line circle in FIG. 4) and “there is a rightward protrusion” (local feature amount a2, indicated by a broken-line circle in FIG. 4), are classified into clusters 01, 02, and 03, respectively. In practice, the cluster data may also include, for each local feature amount, identification information indicating its source image. Immediately after the clusters are generated, one local feature amount is classified into each cluster, and that local feature amount also serves as the feature value of the cluster. Extracted local feature amounts and cluster feature values are matched and otherwise processed taking their coordinate positions in the image into account. By using, in matching coordinate positions, the positional relationship among the coordinates of feature amounts whose values nearly coincide, differences in the angle and size at which the object appears in the image can also be handled.

Clusters may be generated for each local feature amount extracted from a single image, but it is preferable to first select two images a and b from the image set S, associate the similar local feature amounts extracted from the two images by matching their coordinates with a matching technique such as RANSAC, and create a cluster for each local feature amount associated by the matching. In this case, if the number of local feature amounts associated with clusters is large, it can be judged that the same object is present, and it can be estimated that the object lies in the coordinate region over which those local feature amounts are distributed.
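The specification names RANSAC only as an example of a matching technique and gives no details. The sketch below is one possible reading, assuming the two images are related by a rotation, a scale change, and a shift, which can be written with complex numbers as q = a·p + b. The function names, the tolerance `tol`, and the iteration count are all hypothetical.

```python
import random


def fit_similarity(p1, p2, q1, q2):
    """Solve q = a * p + b (complex a, b) from two point pairs."""
    a = (q2 - q1) / (p2 - p1)
    return a, q1 - a * p1


def ransac_similarity(pairs, tol=3.0, iters=200, seed=0):
    """Return indices of the pairs consistent with the best transform found.

    pairs: list of (p, q) complex points; p lies in image a, q in image b,
    and each pair was already formed by descriptor similarity.
    """
    rng = random.Random(seed)
    best = []
    if len(pairs) < 2:
        return best
    for _ in range(iters):
        (p1, q1), (p2, q2) = rng.sample(pairs, 2)
        if p1 == p2:
            continue  # degenerate sample, cannot define a transform
        a, b = fit_similarity(p1, p2, q1, q2)
        inliers = [i for i, (p, q) in enumerate(pairs)
                   if abs(a * p + b - q) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Under this reading, each returned index would seed one cluster, and a small number of consistent pairs would suggest that the two images do not show the same object.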

If the number of local feature amounts that could be associated between the initially selected images a and b is zero or at most a predetermined number, so that the same object is considered not to be present, another image c is selected from the image set S in place of image a or b, and the extraction and association of local feature amounts are redone. Alternatively, local feature amounts may be matched across all the images in the set S.

[2-3. Classification and New Cluster Generation]
After the clusters have been generated at the start of processing as described above, each time an image is selected from the image set S, the cluster update means 50 matches the local feature amounts extracted from that image (step S13) against the feature values of all the clusters, using a comparison of coordinates (step S14). That is, a matching technique such as RANSAC can determine the presence or absence of an object from the positional relationship of the coordinates at which local feature amounts occur, and such a technique is used to match both coordinates and values between each local feature amount extracted from the image and the feature value of each cluster.

If, as a result of the matching, a predetermined number or more (one may suffice) of the local feature amounts extracted from the image approximate cluster feature values and also agree in the positional relationship of their coordinates (step S15: “YES”), the clusters are updated by classifying those local feature amounts into the clusters for which the approximated feature values are set (step S16). Taking into account, in judging approximation, the positional relationship between the coordinates of a local feature amount and the centroid of the coordinates in the image that forms part of the cluster feature value improves the accuracy of determining which images contain the same object.

Clusters are generated (step S12) from the first selected image a (step S11); thereafter, for any local feature amount extracted from a second or later selected image (step S13) that was not classified into a cluster (step S17: “NO”), the cluster update means 50 newly generates a cluster corresponding to that local feature amount (step S18).
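Steps S14 to S18 can be illustrated with a simplified sketch. It compares descriptors only and omits the coordinate check described above; the dictionary layout, the Euclidean distance, and the parameters `max_dist` and `min_matches` are assumptions made for illustration. It also assumes that, when an image has too few matches, all of its features become new clusters, which is one reading of the flow.

```python
import math


def update_clusters(clusters, features, max_dist=0.5, min_matches=1):
    """Classify one image's features into clusters; spawn clusters for the rest.

    clusters: list of dicts {"value": descriptor, "members": [descriptor, ...]}
    features: descriptors (tuples of floats) extracted from one image
    """
    # Step S14: pair each feature with its nearest cluster value, if close enough.
    matches = {}
    for i, f in enumerate(features):
        if not clusters:
            break
        j = min(range(len(clusters)),
                key=lambda k: math.dist(f, clusters[k]["value"]))
        if math.dist(f, clusters[j]["value"]) <= max_dist:
            matches[i] = j

    # Step S15: accept the matches only if the image has enough of them.
    if len(matches) < min_matches:
        matches = {}

    for i, f in enumerate(features):
        if i in matches:
            clusters[matches[i]]["members"].append(f)      # step S16
        else:
            clusters.append({"value": f, "members": [f]})  # step S18
    return clusters
```

The sketch does not recompute the cluster feature values; that corresponds to the separate feature value setting of step S19.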

[2-4. Setting of Feature Values and Related Processing]
The feature value setting means 40 calculates the feature value of each cluster from all elements classified into that cluster and sets it for the cluster (step S19). Specifically, for the local feature amounts classified into each cluster, it calculates the centroid in the feature space and the centroid of the coordinate positions in the image, and sets them as the new feature value of the cluster. To accommodate varied image sizes and the rotation, reduction, and enlargement of the object within images, the centroid of the coordinate positions is computed from coordinates that have been normalized after calculating, for each image, a rotation angle and a scaling factor relative to the coordinate positions in the first image.

Using these centroids as the cluster feature value makes it possible to classify into clusters on the basis of what kind of local feature amount (the centroid in the feature space) lies roughly where in the image (the centroid of the coordinate positions in the image). In particular, including the coordinate position of a local feature amount in the image in the judgment increases processing accuracy.
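The two centroids can be computed as in the following sketch. It assumes that each image's rotation, scale, and shift relative to the first image are already known and are expressed as a pair of complex numbers (a, b), so that a·z + b maps a point z into the first image's frame. That representation, and the names `centroid`, `cluster_feature_value`, and `to_reference`, are illustrative and not taken from the specification.

```python
def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def cluster_feature_value(members, to_reference):
    """Return (descriptor centroid, centroid of normalized coordinates).

    members: list of (image_id, descriptor, (x, y))
    to_reference: image_id -> (a, b), complex numbers such that a * z + b maps
    a point z of that image into the first image's coordinate frame.
    """
    descriptors = [d for _, d, _ in members]
    normalized = []
    for image_id, _, (x, y) in members:
        a, b = to_reference[image_id]
        z = a * complex(x, y) + b  # undo rotation, scale, and shift
        normalized.append((z.real, z.imag))
    return centroid(descriptors), centroid(normalized)
```

For instance, a feature at (5, 5) in an image shown at half the scale of the first image would be mapped to (10, 10) before averaging.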

The above processing is repeated until no unprocessed images remain (step S20: “NO”), after which processing proceeds to the acquisition of data for object recognition (step S22). Before that, clusters whose number of elements is at or below a predetermined number may be deleted (step S21). In this way, for example when the clusters are re-updated as described below, features that occur only a few times are excluded from classification, so that processing capacity can be concentrated on the main features.
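The optional deletion in step S21 amounts to a filter on the element count. A minimal sketch, with a hypothetical function name and a hypothetical default threshold `max_small`:

```python
def prune_small_clusters(clusters, max_small=1):
    """Drop clusters whose element count is at or below max_small (step S21).

    clusters: list of dicts with a "members" list, as an assumed layout.
    """
    return [c for c in clusters if len(c["members"]) > max_small]
```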

[2-5. Re-updating the Clusters]
Because the cluster feature values vary with the order in which images are selected in step S13, repeating the processing from step S13 in FIG. 3 on the already-processed cluster group, and thereby re-updating it, makes it possible to generate feature values that represent the characteristics of the object accurately. In this case, the cluster update means 50 updates the clusters that were updated on the basis of the images of the image set once more on the basis of the images of the same image set.
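The specification does not say how the repeated pass is carried out. One possible interpretation, sketched below, reassigns every feature to the nearest existing cluster value and recomputes each value as a centroid, which resembles a k-means style iteration. The function name `reupdate`, the descriptor-only comparison, and the parameters are assumptions.

```python
import math


def reupdate(values, features, passes=2, max_dist=1.0):
    """Re-run assignment of the same features against existing cluster values.

    values: list of cluster feature values (descriptor tuples)
    features: all descriptors extracted from the same image set
    """
    if not values:
        return values
    for _ in range(passes):
        groups = [[] for _ in values]
        for f in features:
            j = min(range(len(values)), key=lambda k: math.dist(f, values[k]))
            if math.dist(f, values[j]) <= max_dist:
                groups[j].append(f)
        # Recompute each value as the centroid of its group; keep it if empty.
        values = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else v
            for g, v in zip(groups, values)
        ]
    return values
```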

[2-6. Modes of Data Acquisition]
As a result of updating the clusters as described above, the data acquisition means 60 finally acquires data for object recognition on the basis of the clusters having a large number of elements (for example, a predetermined ratio or more) (step S22).

The specific content and format of the data for object recognition are not restricted. In a first example, the data acquisition means 60 acquires, as learning data for object recognition, the source images from which the local feature amounts classified into the clusters having a large number of elements were extracted.

For example, if image a is selected first from the image set S of FIG. 4, images a, c, and f, in which the target object (for example, “XX Tower”) appears clearly, are acquired. Image b, which shows only part of “XX Tower”; image d, a night view in which it appears indistinctly; image e, in which it appears as part of the scenery; and image g, in which other objects stand out more, are not acquired (for explanation, these are marked with ×).

In a second example, the data acquisition means 60 acquires, as the data for object recognition, the feature values of the clusters having a large number of elements, or the local feature amounts classified into those clusters. In the state shown in the example of FIG. 5, the feature value of cluster 02, which has the most elements (three), for example the centroid in the feature space and the centroid of the coordinate positions in the image, becomes the data for object recognition.
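Both examples of acquired data can be read off the same cluster records, as the sketch below shows. The specification does not define what the “predetermined ratio” is measured against; here it is assumed to be the element count divided by the number of images, and the function name, dictionary layout, and `min_ratio` default are hypothetical.

```python
def acquire_data(clusters, total_images, min_ratio=0.5):
    """Return (source image IDs, feature values) taken from large clusters.

    clusters: list of {"value": ..., "members": [(image_id, descriptor), ...]}
    total_images: number of images in the image set (assumed to be > 0)
    """
    large = [c for c in clusters
             if len(c["members"]) / total_images >= min_ratio]
    # First example: the source images of the features in large clusters.
    image_ids = sorted({img for c in large for img, _ in c["members"]})
    # Second example: the feature values of the large clusters themselves.
    values = [c["value"] for c in large]
    return image_ids, values
```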

[2-7. Acquisition of multiple candidate data]
To further improve the object recognition accuracy, multiple sets of data for object recognition may be acquired. In this case, the re-execution means 70 treats the data acquired by the data acquisition means 60 as candidate data, removes the images on which that candidate data is based from the image set, and causes the feature quantity extraction means 20, the cluster generation means 30, the feature value setting means 40, the cluster update means 50, and the data acquisition means 60 to acquire new candidate data from the resulting new image set.

The result selection means 80 then compares the candidate data acquired first with the newly acquired candidate data and selects the one based on the larger number of images as the final data for object recognition.
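The interplay of re-execution and result selection might be sketched as below. The function `select_final_candidate`, the stand-in pipeline `toy_acquire`, and the `LABELS` table are hypothetical. In particular, `toy_acquire` only imitates the real processing by returning the images that show the same thing as the first image, and keeping the first candidate on a tie is an assumption.

```python
from typing import Callable, List, Sequence


def select_final_candidate(
    images: Sequence[str],
    acquire: Callable[[Sequence[str]], List[str]],
) -> List[str]:
    """Run the pipeline twice and keep the candidate with more images."""
    first = acquire(images)
    used = set(first)
    # New image set: the original set minus the images of the first candidate.
    remaining = [img for img in images if img not in used]
    second = acquire(remaining) if remaining else []
    # Assumed tie-break: prefer the first candidate.
    return first if len(first) >= len(second) else second


# Hypothetical ground truth used only by the stand-in pipeline below.
LABELS = {"x": "sign", "a": "tower", "c": "tower", "f": "tower", "y": "sign"}


def toy_acquire(images: Sequence[str]) -> List[str]:
    """Stand-in for the pipeline: keep images showing what the first one shows."""
    if not images:
        return []
    reference = LABELS[images[0]]
    return [img for img in images if LABELS[img] == reference]


final = select_final_candidate(["x", "a", "c", "f", "y"], toy_acquire)
print(final)  # ['a', 'c', 'f']
```

Here the first run starts from image x, which happens to show a different object, and yields two images. The second run on the remaining set yields three images, so the second candidate is selected.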

[3. Effects]
(1) As described above, in the present embodiment, the image first selected from an image set given as relating to the target object is used as a reference, and a cluster is generated for each of its local feature quantities (for example, FIG. 5 and FIG. 2; step S12 in FIG. 3). The local feature quantities of the images selected thereafter (step S13) are classified into the clusters whose feature values they approximate, and the clusters are updated accordingly (steps S14 to S16). The source images and feature values are then obtained as data for object recognition based on the clusters having many elements (step S22).

As a result, features common to the object are emphasized and minority features are ignored as the recognition data is collected, so data suited to recognizing the object can be obtained from the image set. In particular, selecting an appropriate image first steers the classification into clusters in the right direction and improves the accuracy of the data used for object recognition.

(2) In particular, in the present embodiment, when no cluster corresponds to a local feature quantity of the second or a later selected image (step S17: "NO"), a corresponding cluster is newly generated (step S18). Features that may also be contained in other images thus become targets of subsequent cluster classification, and clusters representing features of the object are not left ungenerated, so the accuracy of object recognition can be improved.
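The update and new-cluster steps described in (1) and (2) can be illustrated with a small sketch. Several points are assumptions rather than statements of the embodiment: "approximate" is modelled as Euclidean distance of at most `max_dist` to the cluster centroid, the predetermined number of approximating features is `min_matches`, an image below that number is skipped entirely, and the `Cluster` class and all values are invented for illustration.

```python
import math
from typing import List, Optional, Tuple

Feature = Tuple[float, ...]


class Cluster:
    def __init__(self, first: Feature) -> None:
        self.members: List[Feature] = [first]
        self.centroid: Feature = first

    def add(self, feature: Feature) -> None:
        self.members.append(feature)
        n = len(self.members)
        # Recompute the feature value (centroid) from all members.
        self.centroid = tuple(
            sum(m[i] for m in self.members) / n for i in range(len(feature))
        )


def nearest(clusters: List[Cluster], feature: Feature,
            max_dist: float) -> Optional[Cluster]:
    """Return the closest cluster within max_dist, or None."""
    best, best_d = None, max_dist
    for c in clusters:
        d = math.dist(c.centroid, feature)
        if d <= best_d:
            best, best_d = c, d
    return best


def update_clusters(clusters: List[Cluster], features: List[Feature],
                    max_dist: float = 1.0, min_matches: int = 2) -> bool:
    """Classify one image's features; return True if the image was accepted."""
    matches = [(f, nearest(clusters, f, max_dist)) for f in features]
    if sum(1 for _, c in matches if c is not None) < min_matches:
        return False  # too few approximating features: skip this image
    for f, c in matches:
        if c is not None:
            c.add(f)
        else:
            clusters.append(Cluster(f))  # no corresponding cluster: create one
    return True


# One cluster per local feature of the first image.
first_image = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
clusters = [Cluster(f) for f in first_image]
accepted = update_clusters(clusters, [(0.5, 0.0), (10.0, 0.5), (20.0, 20.0)])
rejected = update_clusters(clusters, [(50.0, 50.0), (60.0, 60.0)])
print(accepted, rejected, [len(c.members) for c in clusters])
# True False [2, 2, 1, 1]
```

The second image has two approximating features and is accepted, and its unmatched feature opens a fourth cluster. The third image has none and leaves the clusters unchanged.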

(3) Further, in the present embodiment, the clusters classified in a first round over the image set are updated again in a second round over the same image set. The centroid in the feature space and the centroid in image coordinates of each cluster, which serve as the criteria for judging whether a local feature quantity approximates the cluster, are already reasonably settled after the first round. Correcting the feature values of the clusters from that state feeds into more accurate cluster classification, so the object recognition accuracy can be improved further.
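One way to picture a second round is a single reassignment step in the style of k-means, shown below. This is a technique swapped in for illustration and not necessarily how the embodiment performs the second round: the function `second_pass`, the rule of resetting memberships while keeping the first-round centroids, and the threshold `max_dist` are all assumptions.

```python
import math
from typing import List, Tuple

Feature = Tuple[float, ...]


def centroid(points: List[Feature]) -> Feature:
    """Component-wise mean of equally sized vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def second_pass(centroids: List[Feature], features: List[Feature],
                max_dist: float):
    """Reassign every feature to the nearest first-round centroid, then
    recompute each centroid from the features it attracted."""
    groups: List[List[Feature]] = [[] for _ in centroids]
    for f in features:
        dists = [math.dist(c, f) for c in centroids]
        i = min(range(len(centroids)), key=dists.__getitem__)
        if dists[i] <= max_dist:
            groups[i].append(f)
    # Keep the old centroid when a cluster attracted no feature.
    new_centroids = [centroid(g) if g else c for g, c in zip(groups, centroids)]
    return new_centroids, groups


# Hypothetical first-round centroids and the features of the same image set.
round_one = [(0.0, 0.0), (3.0, 0.0)]
features = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (5.0, 0.0)]
round_two, groups = second_pass(round_one, features, max_dist=2.0)
print(round_two)  # [(0.5, 0.0), (4.0, 0.0)]
```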

(4) Further, in the present embodiment, the re-execution means 70 acquires another set of candidate data by further processing the image set from which the first candidate data has been removed, and the result selection means 80 compares the two sets of candidate data and selects the one based on the larger number of images as the final data for object recognition.

As a result, even if the first selected image happens to show an object other than the target object, the candidate data that retains more images of the target object can be taken as the final data for object recognition. Selecting the candidate data with the larger number of images therefore yields a correct processing result and improves object recognition.

(5) In addition, in the present embodiment, the images from which the local feature quantities added to the clusters having many elements were extracted (for example, images a, c, and f in FIG. 7) are acquired as learning data for object recognition. Any algorithm suited to the application or other circumstances can then be applied flexibly to the actual object recognition based on those images.

(6) Above all, in the present embodiment, the feature values of a cluster having many elements (for example, cluster 02 in FIG. 5) are acquired as data for object recognition, so that the data can be used smoothly as a recognition target in actual object recognition.

[4. Other embodiments]
The above embodiment is merely an example, and the present invention also covers the variations illustrated below as well as other embodiments. For example, the configuration diagrams, data diagrams, flowcharts, and the like in the present application are merely examples, and the presence or absence of each element, its arrangement, the order of processing, and the specific contents can be changed as appropriate. For example, the image selected first in step S11 may be chosen from among a plurality of candidate images. Specifically, each candidate image is taken as a reference image and its local feature quantities are matched against those of each image in the set S (the processing of step S15). If the reference image that matches the most images is used as the first image, learning data can be obtained from an image in which a more standard instance of the object appears.
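Choosing the first image from several candidates could look like the following sketch. The matching rule is an assumption: two images are taken to match when at least `min_matches` features of the other image lie within Euclidean distance `max_dist` of some feature of the reference image. The names `count_matches` and `choose_first_image` and all feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, ...]


def count_matches(ref_features: List[Feature], other_features: List[Feature],
                  max_dist: float) -> int:
    """Number of features in other_features close to some reference feature."""
    return sum(
        1 for f in other_features
        if any(math.dist(f, r) <= max_dist for r in ref_features)
    )


def choose_first_image(candidates: Sequence[str],
                       image_set: Dict[str, List[Feature]],
                       max_dist: float = 1.0, min_matches: int = 2) -> str:
    """Pick the candidate whose features match the most images in the set."""
    def matched_images(name: str) -> int:
        ref = image_set[name]
        return sum(
            1 for other, feats in image_set.items()
            if other != name
            and count_matches(ref, feats, max_dist) >= min_matches
        )
    return max(candidates, key=matched_images)


# Hypothetical local features per image.
image_set = {
    "a": [(0.0, 0.0), (10.0, 0.0)],
    "c": [(0.2, 0.0), (10.0, 0.3)],
    "f": [(0.0, 0.4), (9.8, 0.0)],
    "g": [(50.0, 50.0), (60.0, 60.0)],
}
print(choose_first_image(["g", "a"], image_set))  # a
```

Image a matches images c and f, whereas image g matches none, so image a is chosen as the first image.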

The individual means constituting the device 1 may be realized in any manner, and the configuration of the present invention can be changed flexibly, for example by calling functions provided by an external server through an API (application program interface) or network computing (so-called cloud computing). Furthermore, each element such as the means of the present invention is not limited to the arithmetic control unit of a computer and may be realized by another information processing mechanism such as a physical electronic circuit.

01, 02, 03 Clusters
1 Data acquisition device
2 Image search server
6 Arithmetic control unit
7 Storage device
8 Communication means
15 Image set storage means
20 Feature quantity extraction means
30 Cluster generation means
35 Cluster storage means
40 Feature value setting means
50 Cluster update means
60 Data acquisition means
70 Re-execution means
80 Result selection means
a to g Images
a1, a2, a3, c1, c2, d2, d3 Local feature quantities
N Communication network
S Image set

Claims

1. A data acquisition device comprising:
feature quantity extraction means for sequentially selecting images from an image set and extracting a plurality of local feature quantities from each image;
cluster generation means for generating a cluster for each of the local feature quantities extracted from the image first selected from the image set;
feature value setting means for calculating a feature value of each cluster from all elements classified into that cluster and setting the feature value in the cluster;
cluster update means for, each time an image is selected from the image set, updating the clusters by classifying the local feature quantities extracted from that image into the clusters whose feature values they approximate, when at least a predetermined number of those local feature quantities approximate feature values of the clusters; and
data acquisition means for acquiring data for object recognition based on the clusters having a large number of elements.

2. The data acquisition device according to claim 1, wherein the cluster update means generates a cluster corresponding to a local feature quantity extracted from the second or a later selected image when no cluster has a feature value that the local feature quantity approximates.

3. The data acquisition device according to claim 1, wherein the cluster update means updates again, based on the images of the same image set, the clusters that were updated based on the images of the image set.

4. The data acquisition device according to any one of claims 1 to 3, further comprising:
re-execution means for treating the data acquired by the data acquisition means as candidate data and causing the feature quantity extraction means, the cluster generation means, the feature value setting means, the cluster update means, and the data acquisition means to acquire new candidate data from a new image set obtained by removing the images on which the candidate data is based from the image set; and
result selection means for comparing the acquired candidate data and selecting the candidate data having the larger number of images as the final data for object recognition.

5. The data acquisition device according to any one of claims 1 to 4, wherein the data acquisition means acquires, as learning data for object recognition, the source images from which the local feature quantities classified into the clusters having the large number of elements were extracted.

6. The data acquisition device according to claim 1, wherein the data acquisition means acquires the feature values of the clusters having the large number of elements as the data for object recognition.

7. A data acquisition method in which a computer executes:
feature quantity extraction processing for sequentially selecting images from an image set and extracting a plurality of local feature quantities from each image;
cluster generation processing for generating a cluster for each of the local feature quantities extracted from the image first selected from the image set;
feature value setting processing for calculating a feature value of each cluster from all elements classified into that cluster and setting the feature value in the cluster;
cluster update processing for, each time an image is selected from the image set, updating the clusters by classifying the local feature quantities extracted from that image into the clusters whose feature values they approximate, when at least a predetermined number of those local feature quantities approximate feature values of the clusters; and
data acquisition processing for acquiring data for object recognition based on the clusters having a large number of elements.

8. A computer program for data acquisition that causes a computer to:
sequentially select images from an image set and extract a plurality of local feature quantities from each image;
generate a cluster for each of the local feature quantities extracted from the image first selected from the image set;
calculate a feature value of each cluster from all elements classified into that cluster and set the feature value in the cluster;
each time an image is selected from the image set, update the clusters by classifying the local feature quantities extracted from that image into the clusters whose feature values they approximate, when at least a predetermined number of those local feature quantities approximate feature values of the clusters; and
acquire data for object recognition based on the clusters having a large number of elements.