JP2019021100A - Image search device, merchandise recognition device, and image search program

Info

Publication number: JP2019021100A
Application number: JP2017139858A
Authority: JP
Inventors: 山本 直史 (Tadashi Yamamoto)
Original assignee: Toshiba TEC Corp
Current assignee: Toshiba TEC Corp
Priority date: 2017-07-19
Filing date: 2017-07-19
Publication date: 2019-02-07

Abstract

To provide an image search device, a merchandise recognition device, and an image search program that perform highly accurate image search processing.

SOLUTION: An image search device comprises an image acquisition part and a processor. The image acquisition part acquires an input image. The processor divides an image to be searched, included in the input image, into a plurality of small regions and extracts feature points from each small region up to a predetermined upper limit. Based on the feature points in the image to be searched and the feature points of a plurality of registration images registered in a dictionary, the processor calculates a degree of similarity between the image to be searched and each registration image stored in the dictionary, and specifies the registration image having the maximum degree of similarity.

SELECTED DRAWING: Figure 2

Description

Embodiments of the present invention relate to an image search device, a product recognition device, and an image search program.

Conventionally, image recognition techniques that use image search processing to retrieve, from a database (dictionary), the image that matches an input image (an image of the same kind) fall into two classes: category recognition (general object recognition) and specific object recognition. Category recognition targets objects whose image patterns and shapes vary between individuals, chiefly animals and plants. Specific object recognition targets objects with almost no individual variation. For example, specific object recognition is applied to identify a product type from an image of its package.

Specific object recognition requires a dictionary that stores image information and feature information for every type of object to be recognized. The similarity between a captured image of an object and the image of each object registered in the dictionary is calculated, and the object with the highest similarity is selected. Methods for calculating similarity include template matching, which compares images directly, and feature point comparison, which compares local feature points in the images. Because template matching compares entire images, it suffers from long computation times and a very large dictionary, so specific object recognition often uses feature point comparison.

In feature point comparison, a number of feature points are extracted from an image, and a feature amount and attributes are calculated for each feature point. Similarity is determined by comparing these with the list of feature points of each image registered in the dictionary and taking the number or ratio of corresponding feature points. A feature point is a point that can be uniquely identified locally from the distribution of gray values in the image; for example, a local maximum of the second derivative of the density value is used. The feature amount is calculated from, for example, the density pattern in the neighborhood of the feature point. Methods for extracting feature points and calculating feature amounts include SIFT, SURF, and ORB.

As described above, feature point comparison, which compares local feature amounts, is an effective way to search for an identical image. It judges image similarity using information from feature points that make up only a small part of the whole image. Its computational cost is therefore far lower than that of methods that compare entire images, such as pattern matching.

However, specific object recognition using image search by feature point comparison has several problems. For example, in product recognition processing applied as specific object recognition in a conventional checkout device (POS terminal), it can be difficult to distinguish products that share a design. Products of the same kind from the same manufacturer often carry the same mark, such as a trademark, and products in the same series often use similar or shared designs. Conventional feature point comparison extracts feature points from the entire image to be recognized in descending order of intensity. As a result, it is difficult to accurately distinguish a group of products for which the high-intensity feature points are extracted from the similar or shared design portions.

Japanese Unexamined Patent Application Publication No. 2007-140613 (JP 2007-140613 A)

To solve the above problems, an image search device, a product recognition device, and an image search program capable of highly accurate image search processing are provided.

According to an embodiment, an image search device includes an image acquisition unit and a processor. The image acquisition unit acquires an input image. The processor divides an image to be searched, included in the input image, into a plurality of small regions; extracts feature points from each small region up to a predetermined upper limit; calculates, based on the feature points in the image to be searched and the feature points of a plurality of registered images registered in a dictionary, the similarity between the image to be searched and each registered image stored in the dictionary; and identifies the registered image with the maximum similarity.

FIG. 1 is an external view illustrating a configuration example of a recognition device as an image search device according to an embodiment.
FIG. 2 is a block diagram illustrating a configuration example of a control system of the recognition device as the image search device according to the embodiment.
FIG. 3 is a flowchart for explaining the flow of product recognition processing, including image search processing, in the recognition device as the image search device according to the embodiment.
FIG. 4 is a flowchart for explaining the flow of product recognition processing, including image search processing, in the recognition device as the image search device according to the embodiment.
FIG. 5 is a diagram illustrating an example of a product image to be processed by the recognition device as the image search device according to the embodiment.
FIG. 6 is a diagram illustrating an example in which the recognition device as the image search device according to the embodiment extracts high-intensity feature points from the entire product image illustrated in FIG. 5.
FIG. 7 is a diagram illustrating an example in which the recognition device as the image search device according to the embodiment extracts feature points from each divided region of the product image illustrated in FIG. 5.
FIG. 8 is a diagram illustrating a configuration example of a dictionary used for image recognition processing in the recognition device as the image search device according to the embodiment.

Hereinafter, embodiments will be described with reference to the drawings.
First, the configuration of a product recognition system including a recognition device as an image search device according to the present embodiment will be described.
FIG. 1 is a diagram illustrating a configuration example of a product recognition system 10.
In the configuration example shown in FIG. 1, the product recognition system 10 includes a product stand 101, an illumination device 102, a camera 103, and a recognition device 105. The product stand 101 is a stand on which products to be recognized are placed. For example, a basket containing the products to be recognized may be placed on the product stand 101. One or more products are assumed to be in the basket placed on the product stand 101.

The illumination device 102 emits light toward the product stand 101, irradiating the products placed on the product stand 101 from above. The camera 103 captures an image whose imaging range includes the product stand 101, that is, an image of the area containing the products placed on the product stand 101. The camera 103 converts the captured image into an image signal and transmits it to the recognition device 105. The illumination device 102 and the camera 103 constitute an imaging device 104.

The recognition device 105 performs product recognition processing (image recognition processing) that includes image search processing for finding similar images. The recognition device 105 executes processing such as product recognition based on the image captured by the camera 103. The recognition device 105 can be implemented by a computer such as a general von Neumann-type computer. For example, the recognition device 105 outputs information identifying a product, such as a product code, as the recognition result. The recognition device 105 may also refer to a database that stores product data and output information such as the price of the product. In the following embodiment, the recognition device 105 is described as the image search device, but a configuration including both the imaging device 104 and the recognition device 105 may serve as the image search device.

Next, the configuration of the recognition device 105 as an image search device will be described.
FIG. 2 is a block diagram illustrating a configuration example of the recognition device 105 according to the embodiment.
As shown in FIG. 2, in the recognition device 105, a processor 201, a memory 202, a nonvolatile memory 203, an image processing accelerator 204, an input/output interface (I/F) 205, a console 206, a NIC 207, and the like are connected via a system bus 208.

The processor 201 is, for example, a CPU. The processor 201 implements various processing functions by executing programs. The memory 202 stores working data. The nonvolatile memory 203 is a rewritable nonvolatile storage device, implemented by, for example, an HDD or an SSD. The nonvolatile memory 203 may also include a ROM or the like. The nonvolatile memory 203 stores programs, data, and the like.

The processor 201 executes a program (program code) stored in the nonvolatile memory 203. Specifically, the processor 201 loads the program code stored in the nonvolatile memory 203 into the memory 202 and executes the loaded code. In the present embodiment, the program for executing the processing described later is stored in the nonvolatile memory 203 and executed by the processor 201.

The nonvolatile memory 203 also has a dictionary (database) 209 that stores registration information (dictionary information) for all types of products to be recognized. The registration information for each product includes an image (registered image) and feature information that serve as the dictionary used in the product recognition processing (image search processing) described later. A configuration example of the dictionary 209 will be described in detail later. The nonvolatile memory 203 also stores management information for the data stored in the dictionary 209, as well as information such as the image input from the camera 103 (input image) and the results of the product recognition processing.

The image processing accelerator 204 is a processing unit that executes image processing. The input/output interface (I/F) 205 connects the imaging device 104, which includes the camera 103 and the illumination device 102. The input/output I/F 205 serves as the interface (image acquisition unit) that acquires the image captured by the camera 103 as the input image. The processor 201 also sends control instructions to the camera 103 and the illumination device 102 via the input/output I/F 205. The console 206 is used by an operator, such as an administrator, to enter operation instructions. The NIC 207 is an interface for communicating with external devices. The NIC 207 is, for example, a network interface that communicates with external devices via an external network.

Next, the flow of product recognition processing by the product recognition system 10 according to the present embodiment will be outlined.
Products to be recognized by the product recognition system 10 are placed on the product stand 101. In the present embodiment, when there are multiple products, they are assumed to be arranged on the product stand 101 so that they do not overlap one another. The imaging device 104 illuminates the product stand 101 with the illumination device 102 and uses the camera 103 to capture an image whose imaging area includes the top of the product stand 101. The camera 103 supplies the captured image to the recognition device 105.

The recognition device 105 acquires the image captured by the camera 103 as the input image. The recognition device 105 cuts out the region of each individual product from the input image. From the image cut out for each product (product image), the recognition device 105 extracts feature points and calculates feature amounts. The recognition device 105 then compares the feature point information extracted from the product image with the feature point information of each product image registered in the dictionary 209 and searches for images whose feature point information is similar.

In other words, as image search processing, the recognition device 105 searches the dictionary 209 for the product whose image has features similar to those of the product image. The recognition device 105 according to the present embodiment uses the local feature method to judge whether images are similar. The local feature method makes this judgment from the number of corresponding feature points. The recognition device 105 stores, in the dictionary 209, feature point information calculated in advance for the images of all target products, combined with the product information as one data set. The dictionary 209 only needs to be accessible to the processor 201 of the recognition device 105 during product recognition processing. For example, the dictionary 209 may be provided in an external device that can be reached via the NIC 207.

A feature point is a point that can be uniquely identified locally in an image. Examples are points that are uniquely determined within a local region, such as a corner in the image or an extremum (local maximum or minimum) of the second derivative of the image density. Feature point information includes, in addition to the coordinates (x, y) giving the position, a scale indicating size and an orientation indicating direction. The feature amount of a feature point is defined, for example, according to the density pattern near the feature point. As a specific example, the feature amount is a value obtained by compressing the density pattern in the neighborhood of the feature point into low-dimensional information of several tens to several hundreds of values.

In the present embodiment, these pieces of information are collectively called the attributes (attribute information) of a feature point. The attributes are, for example, the position coordinates of the feature point, the orientation representing its direction, and the scale representing its size. Because the feature point information includes scale and orientation, the same feature amount can be obtained even between images related by rotation and similarity transformation.
Methods for extracting feature points and calculating feature amounts include SIFT, SURF, and ORB.

Compared with methods that use information from the entire image, such as template matching, the local feature method reduces both the data size of the dictionary and the amount of computation. In addition, because the feature amount of a feature point is invariant under rotation and similarity transformation, a search is possible from images captured at any angle and distance.

However, the position, orientation, shooting distance, and so on of a product differ between the image captured during recognition (the product image in the input image) and the image captured when the dictionary was created. For this reason, the product image extracted from the input image (input product image) and the product image used to create the dictionary (product image for registration, or registered image) have different image coordinate systems even when they show the same product. These coordinate systems are related by a similarity transformation that combines translation, rotation, and scaling.
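As an illustration of the relationship described above, the following minimal sketch applies a 2D similarity transformation (scaling, rotation, then translation) to one point. The function name `similarity_transform` and its parameters are hypothetical and do not appear in the source; the order in which scaling, rotation, and translation are composed is also an assumption.

```python
import math


def similarity_transform(point, scale, angle_rad, tx, ty):
    """Map (x, y) through uniform scaling, rotation, then translation.

    Hypothetical helper for illustration only; the composition order
    (scale and rotate first, translate last) is an assumption.
    """
    x, y = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Rotate by angle_rad, scale uniformly, then shift by (tx, ty).
    return (scale * (c * x - s * y) + tx,
            scale * (s * x + c * y) + ty)
```

For example, with a scale of 2, a rotation of 90 degrees, and a translation of (3, 4), the point (1, 0) maps to approximately (3, 6).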

In the product recognition processing (image search processing), the recognition device 105 according to the present embodiment obtains candidates for the similarity transformation matrix that relates the coordinate system of the input product image to that of the product image in the dictionary. The recognition device 105 selects a valid transformation matrix from the candidates, applies the coordinate transformation given by that matrix, and then counts the corresponding feature points. The recognition device 105 takes the ratio of the number of corresponding feature points to the number of feature points in the dictionary entry as the similarity. The recognition device 105 takes the dictionary product whose similarity is the highest and is at or above a predetermined threshold as the recognition (search) result matching the input product image.
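The similarity score and the selection rule described above can be sketched as follows. This is only an illustration under assumptions: the names `similarity` and `best_match`, the dictionary-of-tuples input format, and the product codes in the usage example are hypothetical, and the step that finds corresponding feature points is assumed to have run already.

```python
def similarity(num_matched, num_dict_points):
    """Ratio of corresponding feature points to the dictionary entry's count."""
    if num_dict_points == 0:
        return 0.0
    return num_matched / num_dict_points


def best_match(match_counts, threshold):
    """Pick the dictionary product with the highest similarity.

    match_counts maps a product code to a tuple
    (corresponding feature points, feature points in the dictionary entry).
    Returns None when no product reaches the threshold.
    """
    best_code, best_sim = None, 0.0
    for code, (matched, total) in match_counts.items():
        sim = similarity(matched, total)
        if sim > best_sim:
            best_code, best_sim = code, sim
    if best_code is None or best_sim < threshold:
        return None
    return best_code
```

With the hypothetical input `{"A": (30, 100), "B": (80, 100)}` and a threshold of 0.5, product "B" is selected; if every product falls below the threshold, no match is reported.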

Next, the product recognition processing (image search processing) performed by the recognition device 105 will be described in detail.
FIGS. 3 and 4 are flowcharts for explaining the flow of the product recognition processing (image search processing).
The camera 103 captures an image of an imaging range that includes the product stand 101 on which the products to be recognized are placed. The recognition device 105 acquires the image captured by the camera 103 (input image) through the input/output I/F 205 (ACT 11). The processor 201 of the recognition device 105 stores the image acquired from the camera 103 through the input/output I/F 205 in the nonvolatile memory 203 as the input image. Here, the input image is assumed to be a monochrome lightness signal, that is, an image signal in which the lightness of each pixel is expressed as a value from 0 to 255, with larger values for brighter pixels. However, the input image is not limited to a monochrome lightness signal and may be, for example, a color image signal such as RGB.

After acquiring the input image, the processor 201 of the recognition device 105 extracts the region of each individual product from the input image (ACT 12). The processor 201 cuts out the region of each product from the image input from the camera 103 and extracts an image of each product (product image). In the present embodiment, the product image is the smallest rectangular region enclosing the region of an individual product, and the region of the product image is expressed by the coordinate values of the four vertices of the rectangle. However, the representation of the product image is not limited to the coordinates of four vertices. For example, a separate two-dimensional array corresponding to the image area may be prepared, with the values of the array indicating the product type.
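The four-vertex representation described above can be sketched as follows. The helper name `bounding_rectangle` is hypothetical, and the sketch assumes an axis-aligned rectangle computed from a list of pixel coordinates already known to belong to one product; the source does not state how the product region is segmented or whether the rectangle may be rotated.

```python
def bounding_rectangle(pixels):
    """Return the four vertices of the smallest axis-aligned rectangle
    enclosing the given (x, y) pixel coordinates.

    Vertices are listed as top-left, top-right, bottom-right, bottom-left,
    with y increasing downward as in image coordinates (an assumption).
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]
```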

When at least one product image has been extracted from the input image, the processor 201 initializes a variable i (i = 0) (ACT 13). After initializing the variable i, the processor 201 increments it (i = i + 1) (ACT 14). The processor 201 then selects the i-th extracted product image as the image to be processed.

After selecting the i-th product image, the processor 201 obtains the feature points for the whole of the selected product image. In the present embodiment, the processor 201 divides the product image into a plurality of small regions (divided regions) and extracts feature points from each of them.

To extract feature points for each divided region, the processor 201 first divides the product image into a plurality of regions (ACT 15). FIG. 5 is a diagram illustrating an example in which a product image is divided into nine regions. The processor 201 sets, from the product image, a rectangular image region (entire region) that contains the whole product. For example, the processor 201 takes the rectangle circumscribing the product (circumscribed rectangle) as the entire region of the product. In the example illustrated in FIG. 5, the processor 201 sets virtual boundary lines that are parallel to one pair of opposite sides of the rectangular entire region and divide the other sides into three equal parts. As shown in FIG. 5, two boundary lines are thus set in each of the vertical and horizontal directions, and these boundary lines divide the entire region of the product into nine regions (divided regions).
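The division of the entire region into a grid can be sketched as follows. The function name `split_into_grid`, the `(left, top, right, bottom)` cell format, and the row-by-row cell ordering are assumptions made for this illustration; the defaults reproduce the 3 x 3 division of FIG. 5.

```python
def split_into_grid(x0, y0, width, height, rows=3, cols=3):
    """Divide a rectangle into rows x cols equal cells.

    (x0, y0) is the top-left corner of the entire region.
    Returns cells as (left, top, right, bottom), ordered row by row.
    """
    cells = []
    for r in range(rows):
        top = y0 + height * r / rows
        bottom = y0 + height * (r + 1) / rows
        for c in range(cols):
            left = x0 + width * c / cols
            right = x0 + width * (c + 1) / cols
            cells.append((left, top, right, bottom))
    return cells
```

Passing different values for `rows` and `cols` covers other grid shapes, such as a 4 x 4 division or unequal vertical and horizontal division counts.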

Although FIG. 5 shows an example in which the entire region of the product is divided into nine, the method of dividing the region is not limited to this. For example, sixteen divided regions may be set by dividing each of the vertical and horizontal directions into four. The numbers of vertical and horizontal divisions may also differ. Furthermore, the number of divisions may be changed according to the size of the product so that the area of each divided region is roughly the same regardless of the product size.

After dividing the entire region of the product, the processor 201 determines the upper limit Np on the number of feature points to be extracted from each divided region (ACT 16). For example, the processor 201 takes the number of feature points to be extracted for one product image divided by the number of divided regions as the upper limit Np for each divided region. If the number of feature points for one product image is 500, the upper limit Np for each of the nine divided regions is 500 ÷ 9 ≈ 56.
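The arithmetic for the upper limit Np can be sketched as follows. The source gives only the example 500 ÷ 9 = 56 and does not state a rounding rule; rounding to the nearest integer is an assumption here (rounding up would give the same result for this example). The function name `per_region_limit` is hypothetical.

```python
def per_region_limit(total_points, num_regions):
    """Upper limit Np of feature points per divided region.

    Uses integer arithmetic to round total_points / num_regions to the
    nearest integer, with halves rounded up (an assumed rounding rule).
    """
    return (2 * total_points + num_regions) // (2 * num_regions)
```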

After determining the upper limit Np for each divided region, the processor 201 extracts the feature points whose intensity is at or above a predetermined value from the entire region of the product (ACT 17). Here, the predetermined intensity is a threshold for excluding points that should not be extracted as feature points. For example, a low-intensity feature point is likely to be an insignificant point that arises by chance from dirt on the product or noise at the time of shooting. Such feature points should not be extracted even in divided regions where the change in shading is small. A minimum intensity threshold is therefore set, and feature points whose intensity is below it are not extracted. This prevents insignificant feature points from being extracted in divided regions that contain no feature points, such as regions with almost no shading, or that contain only feature points of extremely low intensity.
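The minimum-intensity filter can be sketched as follows. Representing a feature point as an `(x, y, intensity)` tuple and the name `filter_weak_points` are assumptions for this illustration; a real detector would supply its own response values and a threshold suited to them.

```python
def filter_weak_points(points, min_intensity):
    """Keep only feature points whose intensity is at or above the threshold.

    Each point is an (x, y, intensity) tuple (a hypothetical layout).
    """
    return [p for p in points if p[2] >= min_intensity]
```

A region with no shading contributes no points after this step, so the later per-region selection cannot pad it with noise.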

After extracting the feature points at or above the predetermined intensity from the entire region, the processor 201 determines the divided region in which each extracted feature point lies (ACT 18). For example, the processor 201 determines from the coordinate values of each extracted feature point which divided region it falls in. That is, the processor 201 decides which divided region each feature point extracted from the entire region of the product belongs to.
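Mapping a feature point to its divided region from its coordinates can be sketched as follows. The function name `region_index`, the row-major numbering of regions from 0, and the clamping of points that lie exactly on the right or bottom edge are assumptions; the point is assumed to lie within the entire region.

```python
def region_index(x, y, x0, y0, width, height, rows=3, cols=3):
    """Return the row-major index of the grid cell containing (x, y).

    (x0, y0) is the top-left corner of the entire region. Points on the
    right or bottom edge are clamped into the last column or row.
    """
    col = min(int((x - x0) * cols / width), cols - 1)
    row = min(int((y - y0) * rows / height), rows - 1)
    return row * cols + col
```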

After determining the divided region to which each feature point belongs, the processor 201 sorts the feature points of each divided region in descending order of strength (ACT 19). The processor 201 then extracts, in each divided region, up to the upper limit Np of feature points in descending order of strength (ACT 20). For example, when the upper limit is 56, the processor 201 extracts the 56 strongest feature points in each divided region. However, when a divided region contains fewer feature points than the upper limit Np, the processor 201 may extract fewer than Np feature points from that region.
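The selection in ACT 17 to ACT 20 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes a uniform grid of `cols` by `rows` divided regions, and the names `Keypoint`, `select_per_region`, `n_p`, and `min_strength` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Keypoint(NamedTuple):
    x: float
    y: float
    strength: float


def select_per_region(keypoints, width, height, cols, rows, n_p, min_strength):
    """Keep at most n_p of the strongest keypoints in each grid cell."""
    cell_w = width / cols
    cell_h = height / rows
    buckets = defaultdict(list)
    for kp in keypoints:
        # ACT 17: discard keypoints below the minimum strength threshold.
        if kp.strength < min_strength:
            continue
        # ACT 18: derive the cell from the coordinates (clamped to the last cell).
        col = min(int(kp.x / cell_w), cols - 1)
        row = min(int(kp.y / cell_h), rows - 1)
        buckets[(row, col)].append(kp)
    selected = []
    for cell in sorted(buckets):
        # ACT 19: sort by descending strength; ACT 20: keep at most n_p.
        ranked = sorted(buckets[cell], key=lambda kp: kp.strength, reverse=True)
        selected.extend(ranked[:n_p])
    return selected


points = [
    Keypoint(10, 10, 0.9), Keypoint(20, 20, 0.8), Keypoint(30, 30, 0.7),
    Keypoint(40, 40, 0.05), Keypoint(60, 60, 0.5), Keypoint(90, 10, 0.05),
]
kept = select_per_region(points, width=100, height=100, cols=2, rows=2,
                         n_p=2, min_strength=0.1)
```

A cell with fewer than `n_p` surviving keypoints simply contributes what it has, and a cell with none contributes nothing.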

Through the above processing (ACT 15 to ACT 20), the processor 201 can divide the product image into a plurality of divided regions and extract up to the upper limit of feature points in each divided region.
FIG. 6 is a diagram illustrating an example in which feature points are extracted in descending order of strength from the entire region of the product image illustrated in FIG. 5. FIG. 7 is a diagram illustrating an example in which feature points are extracted in descending order of strength from each divided region of the product image illustrated in FIG. 5. FIGS. 6 and 7 draw each extracted feature point as a circle whose center indicates the position of the feature point and whose size indicates its scale.

As is apparent from FIGS. 6 and 7, extracting feature points for each divided region prevents the feature points from concentrating in a specific region of the product image. The strength of a feature point is high where the change in shading is large. For this reason, strong feature points generally concentrate in text portions rather than in photograph portions. Product images are often designed so that specific parts such as the product name or logo stand out, and feature points tend to be extracted from such regions. Feature points also tend to be extracted around high-contrast text, such as black characters on a white background.

The example in FIG. 6 shows that when feature points are extracted from the entire product image in descending order of strength, they concentrate in the logo text at the top of the product image and in the high-contrast text at the lower right. Products in the same series often have similarly designed packages, and the product name, logo, and the like often share the same design. Consequently, when feature points are extracted from the entire product image in descending order of strength, it becomes difficult to distinguish accurately between products whose packages contain identical or similar design elements. In addition, a product whose feature points concentrate in a specific region becomes difficult to recognize when that region is covered by a sticker or another product, because most of its feature points are hidden.

In the example shown in FIG. 7, by contrast, the extracted feature points are distributed over the whole product image, and feature points are also scattered in regions such as the photograph at the lower left. Product recognition (image search) using feature points such as those in FIG. 7 therefore discriminates better even between similar products (images). It also lowers the risk that most of the feature points are hidden when part of the product is hidden. Extracting feature points for each divided region of the product image thus improves product recognition accuracy and makes highly accurate product search processing possible.

After extracting the feature points of the i-th product image, the processor 201 calculates a feature amount (local feature amount) for each extracted feature point (ACT 21). The local feature amount may be, for example, a fixed-length code or numerical value representing the shading pattern in the neighborhood of the feature point. The feature amount can be calculated by any of several methods; the present embodiment is described on the assumption that the ORB feature amount is used. The ORB feature amount is represented as 256-dimensional binary information. The closer the shapes of two patterns, the smaller the Hamming distance between their 256-dimensional feature amounts, and for identical patterns the Hamming distance takes its minimum value of 0. The Hamming distance is the sum, over the 256 dimensions, of the bitwise exclusive OR. The distance between ORB feature amounts can therefore be computed far faster than a Euclidean distance (L2 norm) or a Manhattan distance (L1 norm).
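The Hamming distance described above can be illustrated with a short sketch. It assumes that each 256-bit feature amount is held as a 32-byte `bytes` object; the function name `hamming_distance` is hypothetical and is not taken from the embodiment.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length descriptors differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 in every bit position where the descriptors differ.
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    # The distance is the number of 1 bits in the XOR result.
    return bin(diff).count("1")


zeros = bytes(32)                  # 256 bits, all 0
one_byte = b"\xff" + bytes(31)     # differs from zeros in 8 bits
ones = b"\xff" * 32                # differs from zeros in all 256 bits
```

Only an XOR and a bit count are needed, with no multiplication or square root, which is why the comparison is cheap relative to L1 or L2 distances on real-valued vectors.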

For each extracted feature point, the processor 201 determines attribute information including the X coordinate, Y coordinate, scale, orientation, and feature amount. The processor 201 holds this information in the memory 202 as the attribute information of the extracted feature point. The processor 201 also calculates a logarithmic scale, obtained by converting the logarithm of the scale into an integer, and holds it in the memory 202. The scale and the logarithmic scale correspond one to one, so the latter is redundant information. The attribute information described above is calculated for each feature point by the processing described later. As a result, the memory 202 stores the attribute information of the feature points of the i-th product image as an array.

After storing the feature point information (attribute information) in the memory 202, the processor 201 initializes a variable k (k = 0) (ACT 22). After initializing the variable k, the processor 201 increments it (k = k + 1) (ACT 23). The processor 201 then reads the registration information of the k-th product stored in the dictionary 209 (ACT 24). In this way, the processor 201 reads the registration information of each product stored in the dictionary 209 in order. The dictionary 209 is assumed, for example, to store the registration information for each registered product (each of the various products to be recognized) in the form of an array table.

FIG. 8 is a diagram illustrating a configuration example of the dictionary 209.
In the example illustrated in FIG. 8, the dictionary 209 consists of as many blocks as there are registered products (product types). Each block is the registration information (dictionary information) for one product. In the example shown in FIG. 8, the number of registered products is Ni, and the dictionary has Ni blocks. Each block stores the product name, the product code (identification information uniquely corresponding to the product), the number of feature points in the product image, and the attribute information of each feature point (the X and Y coordinates, scale, orientation, and feature amount of the feature point). For example, as described above, the feature amount is 256-dimensional binary information represented by a 32-digit hexadecimal number.
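One possible in-memory layout for a dictionary block is sketched below. The class and field names (`FeaturePoint`, `DictionaryBlock`, and so on) are hypothetical, the sample values are placeholders, and holding the feature amount as 32 bytes is an assumption made here as one straightforward way to store 256 bits.

```python
from dataclasses import dataclass, field


@dataclass
class FeaturePoint:
    x: float
    y: float
    scale: float
    orientation: float
    descriptor: bytes  # 256-bit binary feature amount, stored as 32 bytes


@dataclass
class DictionaryBlock:
    name: str   # product name
    code: str   # identification information unique to the product
    features: list = field(default_factory=list)

    @property
    def num_features(self) -> int:
        # The feature point count is derived from the list rather than stored.
        return len(self.features)


# Placeholder values for illustration only.
block = DictionaryBlock(name="sample product", code="0000000000000")
block.features.append(
    FeaturePoint(x=12.0, y=34.0, scale=1.2, orientation=90.0, descriptor=bytes(32))
)
dictionary = [block]  # one block per registered product
```

Deriving the feature point count from the list avoids the stored count and the stored points drifting apart; a serialized dictionary could still write the count explicitly.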

The feature points registered in the dictionary 209 are assumed to be obtained by applying, to a product image for registration, processing similar to that described above (the processing of ACT 15 to ACT 20). In other words, the dictionary 209 registers information on the feature points extracted from each divided region after the product image for registration is divided into predetermined small regions.

After reading the information of the k-th product from the dictionary, the processor 201 compares the feature point information calculated from the product image with the feature point information of the k-th product and selects a group of similar feature point pairs (ACT 25). Pairs of feature points whose feature amounts are close are selected as similar feature point pairs. For example, the processor 201 calculates the Hamming distance Hd for every combination of a feature point of the input product image and a feature point of the k-th product (block) in the dictionary. The processor 201 regards two feature points as similar when the calculated Hamming distance Hd is smaller than a predetermined threshold Thd (for example, about 64). The processor 201 obtains all pairs of feature points for which the Hamming distance Hd is smaller than the threshold Thd.
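The brute-force pairing of ACT 25 can be sketched as follows, assuming 32-byte descriptors and the example threshold of 64. The function name `match_pairs` and the returned tuple layout are hypothetical.

```python
def match_pairs(query, registered, thd=64):
    """Return (query index, registered index, Hd) for every pair with Hd < thd."""
    pairs = []
    for qi, q in enumerate(query):
        for ri, r in enumerate(registered):
            # Hamming distance: number of 1 bits in the XOR of the descriptors.
            hd = bin(int.from_bytes(q, "big") ^ int.from_bytes(r, "big")).count("1")
            if hd < thd:
                pairs.append((qi, ri, hd))
    return pairs


query = [bytes(32), b"\xff" * 32]
registered = [bytes(32), b"\x0f" + bytes(31)]
pairs = match_pairs(query, registered)
```

Because every combination is tested, one input feature point can appear in several pairs, which is one reason the later geometric check is needed to reject accidental matches.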

After selecting the similar feature point pairs, the processor 201 executes matrix calculation processing that calculates a coordinate transformation matrix for converting between the coordinate system of the dictionary and that of the input product image (ACT 26). That is, the processor 201 calculates, from the set of feature point pairs, the coordinate transformation matrix between the input product image and the image of the k-th product in the dictionary. If the product image and the dictionary image show the same product, a point on the product image and the corresponding point on the dictionary image are related by a similarity coordinate transformation matrix. In that case, the feature points are also positioned so that they correspond to each other through that coordinate transformation matrix. This step obtains such a coordinate transformation matrix.

For example, if all feature point pairs are true corresponding points, it is efficient to obtain the coordinate transformation matrix by a method such as regression analysis using the least squares method. In many cases, however, an image contains several similar patterns, or points with close feature amounts are paired by chance, so not all feature point pairs are described by the same coordinate transformation matrix. The RANSAC method can be used as a robust method of estimating the transformation matrix in such cases. Here, the coordinate transformation matrix is assumed to be obtained with a variant of RANSAC that uses the likelihood of the feature point pairs.
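The robust estimation can be illustrated with plain RANSAC. Note that this sketch is not the likelihood-based variant mentioned above: it samples pairs uniformly. It further assumes a similarity transform (rotation, uniform scale, translation) written with complex numbers as q = a·p + b, where p is a dictionary point and q the corresponding input point; all names and parameter values are hypothetical.

```python
import random


def fit_similarity(p1, q1, p2, q2):
    """Solve q = a*p + b from two point pairs given as complex numbers."""
    if p1 == p2:
        return None  # degenerate sample: the two source points coincide
    a = (q1 - q2) / (p1 - p2)  # rotation and scale
    b = q1 - a * p1            # translation
    return a, b


def ransac_similarity(pairs, tol=3.0, iterations=200, seed=0):
    """Return ((a, b), inlier count) of the best model found, or (None, 0)."""
    if len(pairs) < 2:
        return None, 0
    rng = random.Random(seed)
    best_model, best_inliers = None, 0
    for _ in range(iterations):
        (p1, q1), (p2, q2) = rng.sample(pairs, 2)
        model = fit_similarity(p1, q1, p2, q2)
        if model is None:
            continue
        a, b = model
        # A pair is an inlier when the transformed point lands within tol.
        inliers = sum(1 for p, q in pairs if abs(a * p + b - q) <= tol)
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


true_a, true_b = 2j, 10 + 5j  # rotate by 90 degrees, scale by 2, then shift
sources = [0j, 10 + 0j, 10j, 10 + 10j, 5 + 2j, 3 + 8j]
pairs = [(p, true_a * p + true_b) for p in sources]
pairs += [(1 + 1j, 100 + 100j), (7 + 3j, -50j)]  # two accidental matches
model, count = ransac_similarity(pairs)
```

A likelihood-weighted variant would bias the sampling toward pairs with a small Hamming distance instead of drawing them uniformly.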

After calculating the coordinate transformation matrix, the processor 201 applies it to all of the selected feature point pairs. The processor 201 calculates the transformation error of each result. The processor 201 then counts the number of feature point pairs whose transformation error under the coordinate transformation matrix is at or below a predetermined value (ACT 27).

The processor 201 calculates the similarity between the input product image and the k-th product in the dictionary based on the counted number of feature point pairs whose transformation error is at or below the predetermined value (ACT 28). For example, for every feature point pair, multiplying the position coordinates of the feature point in the dictionary image by the coordinate transformation matrix gives an estimate of the coordinates of the corresponding point in the product image. The distance between this estimate and the coordinates of the paired feature point in the product image is then obtained. If the distance is at or below a predetermined threshold, the feature point pair is considered to correspond correctly in position as well. The similarity is defined as the number of feature point pairs whose distance is at or below the threshold divided by the total number of feature point pairs. The higher the similarity, the closer the product image is to the dictionary image and the more likely the two are the same image.
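The similarity of ACT 27 and ACT 28 reduces to an inlier ratio. The sketch below assumes the transform is given in the complex form q = a·p + b (a stand-in for the coordinate transformation matrix) and that an empty pair list yields a similarity of 0; the name `similarity` and the tolerance value are hypothetical.

```python
def similarity(pairs, a, b, tol=3.0):
    """Fraction of pairs whose transformed dictionary point lies within tol."""
    if not pairs:
        return 0.0  # assumption: no pairs means no evidence of a match
    # ACT 27: count pairs whose transformation error is at or below tol.
    good = sum(1 for p, q in pairs if abs(a * p + b - q) <= tol)
    # ACT 28: normalize by the total number of feature point pairs.
    return good / len(pairs)


pairs = [(0j, 5 + 0j), (1 + 1j, 6 + 1j), (2 + 0j, 50 + 0j), (3j, 5 + 3j)]
score = similarity(pairs, a=1 + 0j, b=5 + 0j)
```

Here three of the four pairs agree with the transform, so the third pair is treated as an accidental match.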

After calculating the similarity for the k-th product, the processor 201 determines whether the variable k has reached the total number of products registered in the dictionary 209 (the number of products in the dictionary) (ACT 29). If the variable k has not reached the number of products in the dictionary 209, that is, if there is a product for which the similarity has not yet been calculated (ACT 29, NO), the processor 201 returns to ACT 23 and calculates the similarity for the next product in the dictionary 209 (the (k + 1)-th product).

When the variable k has reached the number of products registered in the dictionary 209, that is, when the similarity has been calculated for all products (ACT 29, YES), the processor 201 determines the product recognition result (image search result) from the similarities for the individual products (ACT 30). For example, the processor 201 identifies the product with the maximum similarity. If the maximum similarity is at or above a predetermined value (identification threshold), the processor 201 takes that product as the product recognition result (image search result) for the input product image. If the maximum similarity is below the predetermined value (identification threshold), the processing result is that no product (image) corresponds to the i-th product image.
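The decision of ACT 30 can be sketched as a thresholded maximum. The function name `decide`, the use of product codes as dictionary keys, and the sample scores are hypothetical.

```python
def decide(similarities, threshold):
    """Return the product code with the maximum similarity, or None."""
    if not similarities:
        return None
    code, best = max(similarities.items(), key=lambda item: item[1])
    # Accept the best candidate only if it reaches the identification threshold.
    return code if best >= threshold else None


scores = {"A": 0.2, "B": 0.8, "C": 0.5}
result = decide(scores, threshold=0.6)
```

Returning `None` corresponds to the case in which no registered product matches the input product image.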

The processor 201 performs the processing of ACT 14 to ACT 30 on every product image extracted from the input image. To this end, the processor 201 determines whether the variable i has reached the total number of product images (the number of product regions) extracted from the input image (ACT 31). If the variable i has not reached the total number of product images extracted from the input image, that is, if there is an input product image on which the product recognition processing (image search processing) has not yet been performed (ACT 31, NO), the processor 201 returns to ACT 14 and executes the product recognition processing for the next product image (the (i + 1)-th input product image). When the variable i has reached the number of product regions, that is, when the product recognition processing (image search processing) has been completed for all product images (ACT 31, YES), the processor 201 ends the product recognition processing (image search processing) for the input image supplied from the camera 103.

As described above, the product recognition system according to the present embodiment divides the product image extracted from the input image into small regions and extracts up to a predetermined number of feature points from each divided region. This prevents the feature points from concentrating in a specific region of the product image. Because the feature points are distributed over the whole product image, discrimination performance improves even between products whose names, logos, and the like are similar or identical. It also lowers the risk that, when part of the product is hidden, most of the feature points are hidden and discrimination performance drops sharply. As a result, the product recognition system (image search device, product recognition device) according to the present embodiment discriminates products (images) more accurately and can provide highly accurate product recognition (image search) processing.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 10 ... Product recognition system, 103 ... Camera, 105 ... Recognition device (image search device, product recognition device), 201 ... Processor, 202 ... Memory, 203 ... Non-volatile memory, 205 ... Input/output interface (image acquisition unit), 209 ... Dictionary.

Claims

An image search device comprising:
an image acquisition unit that acquires an input image; and
a processor that divides an image to be searched, included in the input image, into a plurality of small regions,
extracts feature points for each of the divided small regions with a predetermined number as an upper limit,
calculates, based on a plurality of feature points in the image to be searched and feature points of a plurality of registered images registered in a dictionary, a similarity between the image to be searched and each registered image stored in the dictionary, and identifies the registered image having the maximum similarity.

The processor extracts feature points whose intensities are equal to or higher than a predetermined threshold, with a predetermined number as an upper limit for each of the small regions.
The image search device according to claim 1.

A product recognition apparatus comprising:
an image acquisition unit that acquires an input image; and
a processor that divides a product image included in the input image into a plurality of small regions,
extracts feature points for each of the divided small regions with a predetermined number as an upper limit,
calculates, based on a plurality of feature points in the product image and feature points of a plurality of products registered in a dictionary, a similarity between the product and each product stored in the dictionary, and identifies the product having the maximum similarity.

The processor extracts feature points whose intensities are equal to or higher than a predetermined threshold, with a predetermined number as an upper limit for each of the small regions.
The product recognition apparatus according to claim 3.

The image acquisition unit acquires, as the input image, an image captured by an imaging device of a region in which at least one product to be recognized is placed;
The processor extracts an image of the product to be recognized from the input image;
The product recognition apparatus according to claim 3 or 4.

An image search program that causes a computer to execute:
a process of acquiring an input image;
a process of dividing an image to be searched, included in the input image, into a plurality of small regions;
a process of extracting feature points for each of the divided small regions with a predetermined number as an upper limit;
a process of calculating, based on a plurality of feature points in the image to be searched and feature points of a plurality of registered images registered in a dictionary, a similarity between the image to be searched and each registered image stored in the dictionary; and
a process of identifying the registered image having the maximum similarity among the calculated similarities.