JP6393495B2 - Image processing apparatus and object recognition method

Info

Publication number: JP6393495B2
Application number: JP2014058194A
Authority: JP
Inventors: 宏将武井
Original assignee: Nihon Unisys Ltd
Current assignee: Nihon Unisys Ltd
Priority date: 2014-03-20
Filing date: 2014-03-20
Publication date: 2018-09-19
Anticipated expiration: 2034-03-20
Also published as: JP2015184743A

Description

The present invention relates to an image processing apparatus and an object recognition method, and more particularly to a technique for recognizing the shape and posture of an object from a two-dimensional image in which the object is captured.

Techniques for detecting or recognizing a predetermined object in a two-dimensional image captured by a camera are widely used. Applications are broad and include surveillance cameras that track a specific person, driving support systems that detect obstacles around a vehicle and issue warnings, and gesture input that controls a computer through body or hand movements. Many of these applications require detecting the posture of an object in addition to its shape and position.

A technique has been proposed that recognizes the shape, position, and posture of a hand from images so that a target device can be operated hands-free by hand gestures (see, for example, Patent Document 1). The information input device described in Patent Document 1 uses observation data of an environment that includes a user to separate the foreground, which includes the user, from the background, which consists of the rest of the environment, and learns three-dimensional models. It then estimates where in the environment each already-modeled foreground model is located, that is, its position and posture.

Patent Document 1: JP 2013-205983 A

When, as in the technique described in Patent Document 1, an image of a target object extracted from a captured image (hereinafter, the target image) is compared with a plurality of modeled object images (hereinafter, comparison images) to recognize the shape or posture of the target object, recognition methods based on feature amounts are frequently used. In such a method, feature amounts computed from the target image and from each comparison image are compared, and the shape or posture of the target object is recognized from the comparison image whose feature amount is closest to that of the target image.

However, when the comparison uses only one or a few feature amounts computed from the images, misrecognition becomes frequent. For example, when the comparison uses feature amounts of only one to several representative points set on the object, the feature amounts take similar values whenever the images are similar around the representative points. As a result, a comparison image that is actually incorrect may be selected as the image closest to the target image even though the similarity of the parts other than the representative points is low.

In particular, when comparison images of a single object in various postures are generated and compared with the target image to recognize the posture of the target object, the shapes in the generated comparison images resemble one another. Consequently, a simple comparison that uses feature amounts computed only for representative points on the object, or a single feature amount computed from the overall shape of the object, often selects an incorrect comparison image as the one closest to the target image, and the correct posture frequently cannot be recognized.

The present invention has been made to solve this problem. Its object is to enable the shape and posture of a target object appearing in a captured image to be recognized more accurately by comparing feature amounts of the object computed from the target image and from comparison images prepared in advance.

To solve the above problem, the present invention extracts a target object from a captured image, normalizes its size to obtain a target image, and calculates, for the target image, each pixel position on the boundary of the object and a feature amount at each of those pixel positions. Based on the calculated pixel positions and feature amounts, and on pixel positions and feature amounts calculated in advance in the same manner for a plurality of comparison images, the comparison image with the highest degree of matching is searched for, and the shape or posture of the object in that comparison image is specified as the shape or posture of the object appearing in the captured image. When the feature amount of one pixel located on the extracted boundary is calculated, a binary feature amount is calculated as the feature amount; it represents, as a bit string, the pattern in which other boundary pixels appear around that pixel. When the comparison image with the highest degree of matching to the target image is searched for, it is determined, for each comparison image and for each pixel position on the extracted boundary, whether a pixel exists in the vicinity whose binary feature amount has a Hamming distance from the binary feature amount of the target image's pixel that is equal to or less than a predetermined threshold. The comparison image with the largest number of pixel positions for which such a pixel is determined to exist is detected as the comparison image with the highest degree of matching.

According to the present invention configured as described above, feature amounts are compared between the target image extracted from the captured image and each comparison image at every pixel position that makes up the boundary of the target object. The comparison image with the highest degree of matching is retrieved as the image closest to the target image, and the shape or posture of the target object is specified from the retrieved comparison image. This allows a more accurate comparison than one based on feature amounts of only representative points on the object, or on a single feature amount computed from the overall shape of the object, so the shape and posture of the target object appearing in the captured image can be recognized more accurately.

FIG. 1 is a block diagram showing an example of the functional configuration of the image processing apparatus according to the present embodiment.
FIG. 2 is a diagram for explaining the principal component directions analyzed by the principal component analysis unit of the present embodiment.
FIG. 3 is a diagram for explaining the method of calculating the binary feature amount according to the present embodiment.
FIG. 4 is a diagram for explaining a processing example of the image matching unit according to the present embodiment.
FIG. 5 is a flowchart showing, as an operation example of the image processing apparatus according to the present embodiment, the procedure for generating and storing comparison data.
FIG. 6 is a flowchart showing, as an operation example of the image processing apparatus according to the present embodiment, the procedure for recognizing the shape or posture of a target object from a captured image.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an example of the functional configuration of the image processing apparatus 100 according to the present embodiment. As shown in FIG. 1, the image processing apparatus 100 of the present embodiment includes, as its functional configuration, a captured image acquisition unit 11, an object extraction unit 12, a principal component analysis unit 13, a normalization processing unit 14, a boundary extraction unit 15, a feature amount calculation unit 16, a comparison image acquisition unit 17, a comparison data generation unit 18, an image matching unit 19, and a comparison data storage unit 20.

Each of the functional blocks 11 to 19 can be implemented in hardware, by a DSP (Digital Signal Processor), or in software. When implemented in software, for example, the functional blocks 11 to 19 are in practice built around the CPU, RAM, ROM, and the like of a computer, and are realized by running a program stored in a recording medium such as the RAM, the ROM, a hard disk, or a semiconductor memory.

The captured image acquisition unit 11 acquires a two-dimensional image generated by photographing real space with a monocular camera 200. In the example of FIG. 1, the monocular camera 200 is connected to the image processing apparatus 100, such as a personal computer, and the captured image acquisition unit 11 acquires the two-dimensional images captured by the monocular camera 200 in real time, but the present invention is not limited to this. For example, a two-dimensional image captured by the monocular camera 200 may be stored in a memory, and the captured image acquisition unit 11 may read the stored image from the memory later.

The object extraction unit 12 removes the background from the captured image acquired by the captured image acquisition unit 11 and extracts the target image, which is the image of the target object. For example, the object extraction unit 12 performs foreground extraction on the captured image to extract the image of the object to be recognized that appears in it. A known method can be used for the foreground extraction; for example, foreground extraction using graph cuts can be applied. Alternatively, images of the target object and images of the background may be used as training data, and foreground extraction may be performed using learned data generated from that training data.

The principal component analysis unit 13 performs principal component analysis on an object image (the target image extracted by the object extraction unit 12, or a comparison image acquired by the comparison image acquisition unit 17 described later) and determines a first principal component direction and a second principal component direction. Principal component analysis is a known process that summarizes information expressed by many original explanatory variables into a few principal components. In the present embodiment, each pixel value of the object image is treated as an explanatory variable, and a coordinate transformation is applied to these explanatory variables to generate two variables that serve as overall indices (the first principal component direction and the second principal component direction). As shown in FIG. 2, the first principal component direction is approximately the long direction of the object, and the second principal component direction is approximately its short direction.
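As a rough illustration of how the two principal component directions might be obtained, the following sketch runs a two-dimensional principal component analysis on the coordinates of the foreground pixels. Using pixel coordinates as the input, and the function name `principal_directions`, are assumptions made for this example; the description itself only states that pixel values serve as the explanatory variables.

```python
import math

def principal_directions(points):
    """Return (axis1, axis2): unit vectors of the first and second principal components.

    `points` is a list of (x, y) foreground pixel coordinates (an assumed input format).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the symmetric 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    axis1 = (math.cos(theta), math.sin(theta))    # roughly the long direction
    axis2 = (-math.sin(theta), math.cos(theta))   # perpendicular, roughly the short direction
    return axis1, axis2

# A 10 x 2 horizontal bar: the long direction lies along the x axis.
bar = [(x, y) for x in range(10) for y in range(2)]
axis1, axis2 = principal_directions(bar)
```

The sign of each axis is arbitrary in this sketch; only the line it spans is meaningful.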

The normalization processing unit 14 normalizes the object image so that it has a normalized size. Specifically, the normalization processing unit 14 normalizes the object image using the first and second principal component directions specified by the principal component analysis unit 13. Normalization here means enlarging or reducing the object image so that the magnitude of the feature vector with respect to the first and second principal component directions becomes 1. As a result, when the same target object is photographed, the object images are brought to substantially the same size after normalization even if, for example, differences in shooting distance cause the object to appear at different sizes in the captured images.
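One possible reading of this normalization is sketched below: the object is scaled uniformly about its centroid so that its root-mean-square spread along the first principal component direction becomes 1. The choice of RMS spread as the "magnitude", the uniform scaling, and the name `normalize_size` are all assumptions for illustration; the description does not fix these details.

```python
import math

def normalize_size(points, axis1):
    """Scale (x, y) points about their centroid to unit RMS spread along `axis1`.

    `axis1` is a unit vector such as the first principal component direction.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Signed extent of each point along the first principal direction.
    along = [(x - cx) * axis1[0] + (y - cy) * axis1[1] for x, y in points]
    # Guard against a degenerate object with zero extent.
    spread = math.sqrt(sum(a * a for a in along) / n) or 1.0
    return [((x - cx) / spread, (y - cy) / spread) for x, y in points]

# The same bar photographed at two distances ends up the same size.
small = [(x, y) for x in range(10) for y in range(2)]
large = [(3 * x, 3 * y) for x, y in small]
norm_small = normalize_size(small, (1.0, 0.0))
norm_large = normalize_size(large, (1.0, 0.0))
```

Uniform scaling is used here so that the aspect ratio of the object, which carries shape information, is preserved.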

The boundary extraction unit 15 extracts the boundary (silhouette) of the object from the object image normalized by the normalization processing unit 14. The boundary can be extracted, for example, by so-called edge detection, that is, processing that identifies locations where the brightness or color of the image changes sharply (discontinuously).
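For a binary foreground mask, a simple stand-in for edge detection is to mark every foreground pixel that touches the background. The sketch below takes that shortcut; it is not the edge detector of the embodiment, and the mask format (a list of rows of 0/1 values) and the name `extract_boundary` are assumptions.

```python
def extract_boundary(mask):
    """Return the set of (x, y) foreground pixels that have a background 4-neighbour."""
    height, width = len(mask), len(mask[0])
    boundary = set()
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                # Pixels outside the image are treated as background.
                outside = not (0 <= nx < width and 0 <= ny < height)
                if outside or not mask[ny][nx]:
                    boundary.add((x, y))
                    break
    return boundary

# A 3 x 3 square inside a 5 x 5 image: every square pixel except the centre is boundary.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
silhouette = extract_boundary(mask)
```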

The feature amount calculation unit 16 calculates each pixel position on the boundary extracted by the boundary extraction unit 15 and a feature amount at each of those pixel positions. Any calculation method may be used as long as a feature amount is obtained for each boundary pixel position. For example, the feature amount calculation unit 16 calculates the binary feature amount described below for each boundary pixel position. FIG. 3 is a diagram for explaining the method of calculating the binary feature amount.

First, as shown in FIG. 3(a), the feature amount calculation unit 16 sets an n × n arrangement of voxels 32 (n is any integer of 2 or more) centered on the pixel of interest 31 in the silhouette image of the object. The example of FIG. 3(a) shows the voxels 32 set with n = 5. The pixel of interest 31 is one of the pixels in the silhouette image through which the boundary line 33 of the object passes.

Next, as shown in FIG. 3(b), the feature amount calculation unit 16 encodes each voxel 32. Specifically, a voxel 32 that contains a pixel of the boundary line 33 is given the value 1, and a voxel 32 that does not is given the value 0. The encoding may also be inverted, so that a voxel 32 containing a pixel of the boundary line 33 is given the value 0 and a voxel 32 not containing one is given the value 1.

Further, as shown in FIG. 3(c), the feature amount calculation unit 16 reads the values of the encoded voxels 32 along four directions (horizontal, vertical, and the two diagonals) and arranges them in order to generate a 20-dimensional binary string. This string is the binary feature amount. The four directions given here and the order in which they are arranged are only an example, and the invention is not limited to them. The feature amount calculation unit 16 moves the pixel of interest 31 one pixel at a time along the boundary line 33 of the object and calculates a binary feature amount for each pixel position on the boundary line 33.
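The encoding can be sketched as follows, under several assumptions that the description leaves open: each voxel corresponds to exactly one pixel, n is odd so that the window can be centred, each direction is read as the line of n cells through the centre (which gives 4 × 5 = 20 bits for n = 5), and the directions are concatenated in the order horizontal, vertical, diagonal, anti-diagonal. The function name `binary_feature` is illustrative.

```python
def binary_feature(boundary, x, y, n=5):
    """Return the binary feature (list of 4 * n bits) of boundary pixel (x, y).

    `boundary` is a set of (x, y) boundary pixel positions.
    """
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd window size")
    half = n // 2
    offsets = range(-half, half + 1)
    lines = (
        [(d, 0) for d in offsets],    # horizontal line through the centre
        [(0, d) for d in offsets],    # vertical line through the centre
        [(d, d) for d in offsets],    # first diagonal
        [(d, -d) for d in offsets],   # second diagonal
    )
    bits = []
    for line in lines:
        for dx, dy in line:
            # 1 if the cell contains a boundary pixel, otherwise 0.
            bits.append(1 if (x + dx, y + dy) in boundary else 0)
    return bits

# A horizontal boundary segment: the horizontal line is all ones, and the
# other three lines hit the boundary only at the centre pixel.
segment = {(x, 2) for x in range(5)}
feature = binary_feature(segment, 2, 2)
```

Calling `binary_feature` once for every pixel in the boundary set yields the per-pixel features that the matching stage consumes.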

The comparison image acquisition unit 17 acquires, as comparison images, images of the object that is to be recognized in the captured image. For example, the comparison image acquisition unit 17 acquires CG images of an object generated by a personal computer or the like. The comparison images may be acquired in any manner. For example, a personal computer may be connected to the image processing apparatus 100 so that the comparison image acquisition unit 17 acquires the comparison images directly from it. Alternatively, comparison images generated by a personal computer or the like may be stored in a memory, and the comparison image acquisition unit 17 may read them from the memory.

To recognize the shape of a target object from a captured image, the comparison image acquisition unit 17 acquires comparison images of various objects with different shapes. To recognize the posture of a target object (the direction it is facing), the comparison image acquisition unit 17 acquires various comparison images of the same object in different postures. To recognize both the shape and the posture, the comparison image acquisition unit 17 acquires, for each of various objects with different shapes, various comparison images in different postures. In every case, the comparison image acquisition unit 17 acquires each comparison image together with information indicating at least one of the shape and the posture of the object it represents.

The principal component analysis unit 13, the normalization processing unit 14, the boundary extraction unit 15, and the feature amount calculation unit 16 described above perform the same processing on the comparison images acquired by the comparison image acquisition unit 17. This is done in advance, before the target object to be recognized is photographed by the monocular camera 200. In this case, the principal component analysis unit 13 performs principal component analysis on each comparison image acquired by the comparison image acquisition unit 17 and determines the first and second principal component directions.

The normalization processing unit 14 normalizes each comparison image acquired by the comparison image acquisition unit 17 so that it has the normalized size. As a result, the object in the normalized comparison image and the object image extracted from the captured image and normalized are brought to substantially the same size. The boundary extraction unit 15 extracts the boundary (silhouette) of the object from the comparison image normalized by the normalization processing unit 14. The feature amount calculation unit 16 calculates each pixel position on the boundary extracted by the boundary extraction unit 15 and the binary feature amount at each of those pixel positions.

The comparison data generation unit 18 generates comparison data by combining, for each of the plurality of comparison images of objects, the set of pixel positions and binary feature amounts generated in advance through the processing of the principal component analysis unit 13, the normalization processing unit 14, the boundary extraction unit 15, and the feature amount calculation unit 16 with the information indicating at least one of the shape and the posture of the object represented by that comparison image. It then stores the generated comparison data in the comparison data storage unit 20.
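One way to picture a single entry of the comparison data is as a record that pairs the per-pixel binary features with the shape and posture information. The class name `ComparisonRecord`, its field names, the string labels, and the use of a plain list as a stand-in for the comparison data storage unit 20 are all hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Pixel = Tuple[int, int]
Bits = Tuple[int, ...]

@dataclass
class ComparisonRecord:
    """Comparison data for one comparison image (hypothetical layout)."""
    features: Dict[Pixel, Bits]      # boundary pixel position -> binary feature amount
    shape: Optional[str] = None      # shape information, if recorded
    posture: Optional[str] = None    # posture information, if recorded

    def __post_init__(self):
        # At least one of shape or posture must accompany each comparison image.
        if self.shape is None and self.posture is None:
            raise ValueError("a record needs shape or posture information")

# Plain list standing in for the comparison data storage unit 20.
storage = []
storage.append(ComparisonRecord({(3, 4): (1, 0, 1, 1)}, shape="hand", posture="front"))
```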

The image matching unit 19 searches for the comparison image with the highest degree of matching, based on the pixel positions and binary feature amounts calculated for the target image by the feature amount calculation unit 16 and on the pixel positions and binary feature amounts stored in advance in the comparison data storage unit 20 for each of the plurality of comparison images. It then specifies the shape or posture of the object in the retrieved comparison image as the shape or posture of the target object appearing in the captured image.

Specifically, the image matching unit 19 considers the differences between the pixel positions calculated for the target image and those calculated for a comparison image, and the differences between the binary feature amounts calculated for the target image and those calculated for the comparison image. Based on these, it searches for the comparison image for which the differences in binary feature amount at the pixel positions are smallest overall, and treats it as the comparison image with the highest degree of matching.

FIG. 4 is a diagram for explaining a processing example of the image matching unit 19. FIG. 4(a) shows the silhouette image of the target object extracted from the captured image. FIGS. 4(b) to 4(d) show silhouette images of comparison objects extracted from a plurality of comparison images.

First, as shown in FIG. 4(a), the image matching unit 19 obtains a pixel position (x1, y1) calculated for the target image by the feature amount calculation unit 16 and the binary feature amount P11 calculated at that pixel position. Next, for the comparison images shown in FIGS. 4(b) to 4(d), the image matching unit 19 searches for a comparison image that has, at a pixel position within a predetermined distance of (x1, y1) (the range indicated by reference numeral 41), a binary feature amount whose Hamming distance from the binary feature amount P11 is equal to or less than a predetermined threshold.

The Hamming distance is the number of bit positions at which the 20-bit binary feature amounts calculated from the target image and from the comparison image differ, obtained by comparing their corresponding bits. When a comparison image whose Hamming distance is equal to or less than the predetermined threshold is found, the image matching unit 19 adds to the score of that comparison image.
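The Hamming distance between two binary feature amounts can be computed by counting mismatched positions, as in the following minimal sketch. Representing a feature as a sequence of 0/1 integers is an assumption of the example.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("binary features must have the same length")
    return sum(1 for p, q in zip(a, b) if p != q)

# The sequences differ in the second and fourth bits.
distance = hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])
```

If the features were packed into integers instead, the same value could be obtained by counting the set bits of their XOR.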

The comparison image shown in FIG. 4(b) has substantially the same shape and size as the target image of FIG. 4(a) (the sizes have been normalized), and it has, at a pixel position within the predetermined distance of (x1, y1), a binary feature amount whose Hamming distance from the binary feature amount P11 is equal to or less than the predetermined threshold. The score of this comparison image is therefore increased. In contrast, the comparison images shown in FIGS. 4(c) and 4(d) differ in shape from the target image of FIG. 4(a), so they have no binary feature amount within the predetermined distance of (x1, y1) whose Hamming distance from P11 is equal to or less than the predetermined threshold. The scores of these comparison images are therefore not increased.

The image matching unit 19 performs the above processing in turn for each pixel position on the boundary line of the target object shown in FIG. 4(a). The comparison image that ends up with the highest score is extracted as the comparison image with the highest degree of matching to the target image. The shape or posture of the object in the extracted comparison image (stored in the comparison data storage unit 20) is then specified as the shape or posture of the target object appearing in the captured image.
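The scoring loop described above can be sketched as follows. The dictionary layout of the features, the square search window used for "within a predetermined distance", and the default values of `radius` and `max_hamming` are assumptions for illustration; the description does not specify the distance metric or the threshold values.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit sequences."""
    return sum(p != q for p, q in zip(a, b))

def best_match(target, candidates, radius=2, max_hamming=3):
    """Return (best_label, scores).

    target:     dict mapping boundary pixel (x, y) -> bit sequence
    candidates: dict mapping label -> dict of the same form as `target`
    """
    scores = {}
    for label, features in candidates.items():
        score = 0
        for (tx, ty), tbits in target.items():
            # One point if any nearby boundary pixel of the candidate
            # carries a sufficiently similar binary feature.
            if any(
                abs(cx - tx) <= radius
                and abs(cy - ty) <= radius
                and hamming(tbits, cbits) <= max_hamming
                for (cx, cy), cbits in features.items()
            ):
                score += 1
        scores[label] = score
    return max(scores, key=scores.get), scores

target = {(0, 0): (1, 0, 1, 0), (5, 5): (0, 1, 1, 0)}
candidates = {
    "A": {(1, 0): (1, 0, 1, 1), (5, 6): (0, 1, 1, 0)},
    "B": {(0, 0): (0, 1, 0, 1), (9, 9): (0, 1, 1, 0)},
}
label, scores = best_match(target, candidates, radius=1, max_hamming=1)
```

In this toy data, candidate "A" matches both target pixels while "B" matches neither, so "A" is returned. The sketch scans every candidate pixel for every target pixel; a spatial index would be a natural refinement for large comparison sets.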

FIGS. 5 and 6 are flowcharts showing operation examples of the image processing apparatus 100 according to the present embodiment configured as described above. FIG. 5 shows the procedure for generating and storing comparison data in advance. FIG. 6 shows the procedure for recognizing the shape or posture of a target object from a captured image.

In FIG. 5, the comparison image acquisition unit 17 first acquires, as comparison images, CG images of objects generated by a personal computer or the like (step S1). For example, the comparison image acquisition unit 17 acquires, for various objects with different shapes, various comparison images in different postures, together with information indicating the shape and posture of the object represented by each comparison image. Next, the principal component analysis unit 13 performs principal component analysis on each comparison image acquired by the comparison image acquisition unit 17 and determines the first and second principal component directions (step S2).

The normalization processing unit 14 then normalizes each comparison image acquired by the comparison image acquisition unit 17 so that it has the normalized size (step S3). Next, the boundary extraction unit 15 extracts the boundary (silhouette) of the object from the comparison image normalized by the normalization processing unit 14 (step S4). The feature amount calculation unit 16 then calculates each pixel position on the boundary extracted by the boundary extraction unit 15 and the binary feature amount at each of those pixel positions (step S5).

Finally, the comparison data generation unit 18 stores in the comparison data storage unit 20, for each of the plurality of comparison images, the set of pixel positions and binary feature amounts generated by the feature amount calculation unit 16, together with the information indicating the shape and posture of the object represented by that comparison image (step S6). This completes the preparation for recognizing the shape and posture of a target object from a captured image.

In FIG. 6, the captured image acquisition unit 11 acquires a two-dimensional image generated by photographing real space with the monocular camera 200 (step S11). The object extraction unit 12 then removes the background from the captured image acquired by the captured image acquisition unit 11 and extracts a target image of the object (step S12). Next, the principal component analysis unit 13 performs principal component analysis on the target image extracted by the object extraction unit 12 and determines a first principal component direction and a second principal component direction (step S13).

Further, the normalization processing unit 14 normalizes the target image extracted by the object extraction unit 12 so that it has a normalized size (step S14). The boundary extraction unit 15 then extracts the boundary (silhouette) of the object from the target image normalized by the normalization processing unit 14 (step S15). Subsequently, the feature amount calculation unit 16 calculates each pixel position on the boundary extracted by the boundary extraction unit 15 and the feature amount at each of those pixel positions (step S16).

Finally, the image matching unit 19 searches for the comparison image with the highest degree of coincidence, based on the pixel positions and binary feature amounts calculated for the target image by the feature amount calculation unit 16 and the pixel positions and binary feature amounts stored in advance in the comparison data storage unit 20 for each of the plurality of comparison images. It then identifies the shape or posture of the object associated with the retrieved comparison image as the shape or posture of the target object in the captured image (step S17).
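
The search of step S17 can be sketched as follows, under stated assumptions. Each image is represented as a dictionary mapping a boundary pixel position (x, y) to an integer-encoded binary feature amount. For every boundary pixel of the target image, the sketch checks whether the comparison image has, within a small window around the same position, a pixel whose Hamming distance is at most a threshold, and the comparison image with the most such positions is selected. The function names and the default values of radius and threshold are illustrative assumptions, not values given in the embodiment.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded bit strings."""
    return bin(a ^ b).count("1")

def match_score(target, comparison, radius=1, threshold=1):
    """Count target boundary pixels that have a similar comparison pixel nearby."""
    score = 0
    for (x, y), feat in target.items():
        found = False
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                other = comparison.get((x + dx, y + dy))
                if other is not None and hamming(feat, other) <= threshold:
                    found = True
                    break
            if found:
                break
        score += found  # True counts as 1, False as 0
    return score

def best_match(target, comparison_data):
    """comparison_data: list of (label, features) pairs; returns the best label."""
    return max(comparison_data, key=lambda item: match_score(target, item[1]))[0]
```

Here the label returned by best_match stands in for the shape or posture information stored with the selected comparison image.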

As described in detail above, in the present embodiment, each pixel position on the boundary and the feature amount at each of those pixel positions are calculated for the target image, which is obtained by extracting the target object from the captured image and normalizing its size. Then, based on the calculated pixel positions and feature amounts and on the pixel positions and feature amounts calculated in advance in the same manner for a plurality of comparison images, the comparison image with the highest degree of coincidence is retrieved, and the shape or posture of the object associated with the retrieved comparison image is identified as the shape or posture of the target object in the captured image.

According to the present embodiment configured in this way, feature amounts are compared between the target image of the object extracted from the captured image and each comparison image at every pixel position constituting the boundary of the target object. The comparison image with the highest degree of coincidence is retrieved as the image closest to the target image, and the shape or posture of the object is identified from the retrieved comparison image. This allows a more accurate comparison than conventional approaches that compare feature amounts only at representative points on the object, or that compare a single feature amount calculated from the overall shape of the object, so that the shape and posture of the target object in the captured image can be recognized more correctly.

In the above embodiment, an example has been described in which principal component analysis is performed to identify the first principal component direction and the second principal component direction, and the object image is normalized using these two directions; however, the present invention is not limited to this. As long as the object image can be normalized, a method based on principal component analysis need not be used.
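
As one example of normalization that does not rely on principal component analysis, the object pixels could be translated and uniformly scaled according to their bounding box. This alternative is an assumption introduced purely for illustration and is not described in the embodiment; the function name normalize_by_bounding_box and the default size of 64 are hypothetical.

```python
def normalize_by_bounding_box(pixels, size=64):
    """Translate and uniformly scale (x, y) pixels so the longer bounding-box side spans `size`."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1  # avoid division by zero
    scale = (size - 1) / extent
    return {(round((x - min_x) * scale), round((y - min_y) * scale)) for x, y in pixels}
```

Unlike the method based on the principal component directions, this sketch does not compensate for in-plane rotation of the object, and enlarging a sparse pixel set in this way leaves gaps that a practical implementation would need to fill.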

In the above embodiment, an example has also been described in which the Hamming distance is calculated in image matching; however, the present invention is not limited to this. Any technique capable of calculating the degree of similarity between the feature amount calculated from the target image and the feature amount calculated from a comparison image can be applied to the present invention.
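
By way of example only, one such alternative similarity measure for bit-string feature amounts is the Jaccard (Tanimoto) coefficient, sketched below for integer-encoded features. The embodiment does not name this measure; it is shown only as an assumed instance of a technique that calculates a degree of similarity between two feature amounts.

```python
def jaccard_similarity(a, b):
    """Ratio of shared set bits to all set bits in either integer-encoded feature."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty patterns are treated as identical
    return bin(a & b).count("1") / union
```

With this measure a larger value means greater similarity, so a matching step would test whether the value is at or above a threshold rather than at or below one as with the Hamming distance.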

In the above embodiment, an example has also been described in which the processing for calculating the feature amount of the object from the target image and the processing for extracting the feature amount of the object from the comparison image are both performed using the principal component analysis unit 13, the normalization processing unit 14, the boundary extraction unit 15, and the feature amount calculation unit 16 of the same image processing apparatus 100; however, the present invention is not limited to this. For example, the processing for extracting the feature amount of the object from the comparison image may be performed by a personal computer or the like separate from the image processing apparatus 100, and the resulting comparison data may be stored in the comparison data storage unit 20 in advance.

In addition, each of the above embodiments is merely one example of how the present invention may be embodied, and the technical scope of the present invention should not be interpreted restrictively on that basis. That is, the present invention can be implemented in various forms without departing from its gist or its main features.

DESCRIPTION OF SYMBOLS
11 Captured image acquisition unit
12 Object extraction unit
13 Principal component analysis unit
14 Normalization processing unit
15 Boundary extraction unit
16 Feature amount calculation unit
17 Comparison image acquisition unit
18 Comparison data generation unit
19 Image matching unit
20 Comparison data storage unit
100 Image processing apparatus
200 Monocular camera

Claims

An image processing apparatus comprising:
an object extraction unit that extracts a target image, which is an image of a target object, by removing the background from a captured image;
a normalization processing unit that normalizes the target image so that the target image has a normalized size;
a boundary extraction unit that extracts the boundary of the object from the target image normalized by the normalization processing unit;
a feature amount calculation unit that calculates each pixel position on the boundary extracted by the boundary extraction unit and the feature amount at each of those pixel positions;
a comparison data storage unit that stores, for each of a plurality of comparison images relating to the object, a set of pixel positions and feature amounts generated in advance by applying the same processing as the normalization processing unit, the boundary extraction unit, and the feature amount calculation unit to the comparison images, together with information indicating at least one of the shape and posture of the object represented by the comparison image; and
an image matching unit that searches for the comparison image having the highest degree of coincidence, based on the pixel positions and feature amounts calculated for the target image by the feature amount calculation unit and the pixel positions and feature amounts stored for each of the plurality of comparison images in the comparison data storage unit, and identifies the shape or posture of the object associated with the retrieved comparison image as the shape or posture of the object in the captured image,
wherein the feature amount calculation unit,
when calculating the feature amount of one pixel among the pixels located on the boundary extracted by the boundary extraction unit, calculates, as the feature amount, a binary feature amount that represents as a bit string the pattern in which other pixels located on the boundary appear around the one pixel, and
wherein the image matching unit,
when searching for the comparison image having the highest degree of coincidence with the target image, determines, for each comparison image and for each pixel position on the boundary extracted by the boundary extraction unit, whether a pixel having a binary feature amount whose Hamming distance from the binary feature amount of the pixel of the target image is equal to or less than a predetermined threshold exists in the vicinity, and detects, as the comparison image having the highest degree of coincidence, the comparison image having the largest number of pixel positions for which such a pixel is determined to exist in the vicinity.

An object recognition method comprising:
a first step in which an object extraction unit of an image processing apparatus extracts a target image, which is an image of a target object, by removing the background from a captured image;
a second step in which a normalization processing unit of the image processing apparatus normalizes the target image extracted by the object extraction unit so that the target image has a normalized size;
a third step in which a boundary extraction unit of the image processing apparatus extracts the boundary of the object from the target image normalized by the normalization processing unit;
a fourth step in which a feature amount calculation unit of the image processing apparatus calculates each pixel position on the boundary extracted by the boundary extraction unit and the feature amount at each of those pixel positions; and
a fifth step in which an image matching unit of the image processing apparatus searches for the comparison image having the highest degree of coincidence, based on the pixel positions and feature amounts calculated for the target image by the feature amount calculation unit and the pixel positions and feature amounts stored in advance in a comparison data storage unit for each of a plurality of comparison images, and identifies the shape or posture of the object associated with the retrieved comparison image as the shape or posture of the object in the captured image,
wherein, in the fourth step, when calculating the feature amount of one pixel among the pixels located on the boundary extracted by the boundary extraction unit, the feature amount calculation unit calculates, as the feature amount, a binary feature amount that represents as a bit string the pattern in which other pixels located on the boundary appear around the one pixel, and
wherein, in the fifth step, when searching for the comparison image having the highest degree of coincidence with the target image, the image matching unit determines, for each comparison image and for each pixel position on the boundary extracted by the boundary extraction unit, whether a pixel having a binary feature amount whose Hamming distance from the binary feature amount of the pixel of the target image is equal to or less than a predetermined threshold exists in the vicinity, and detects, as the comparison image having the highest degree of coincidence, the comparison image having the largest number of pixel positions for which such a pixel is determined to exist in the vicinity.