JP7088618B2

JP7088618B2 - Information processing equipment and programs

Info

Publication number: JP7088618B2
Application number: JP2020192507A
Authority: JP
Inventors: Haruhisa Kato
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2022-06-21
Anticipated expiration: 2037-06-22
Also published as: JP2021015654A

Description

The present invention relates to an information processing apparatus and a program capable of calculating feature information suitable for recognizing an object in an image with high accuracy.

A device that recognizes an object in an image can convert analog information printed on media that are easy to distribute and present into digital information, which improves user convenience. For example, in Non-Patent Document 1, feature points are detected in an image and local image features are calculated from the neighborhood of each feature point. These features are then collated with local image features stored in advance, which identifies the type of the object and its relative positional relationship.

Several further devices have been published that improve recognition accuracy or add robustness. In Non-Patent Document 2, SIFT features are extracted after the image is deformed by two-axis rotation, which makes feature point correspondences more accurate. In Patent Document 1, erroneous feature point correspondences are reduced by Delaunay triangulation with the feature points as vertices, combined with topology evaluation, and this raises the recognition rate. Patent Document 2 discloses a method that selects effective features by evaluating how well they discriminate. In Patent Document 3, the similarity of the positional relationships between feature points and edges is evaluated to make feature point correspondences more accurate.

Patent Document 1: Japanese Unexamined Patent Publication No. 2014-127068
Patent Document 2: Japanese Unexamined Patent Publication No. 2014-134858
Patent Document 3: Japanese Unexamined Patent Publication No. 2014-126893

Non-Patent Document 1: D. G. Lowe, "Object recognition from local scale-invariant features", Proc. of IEEE International Conference on Computer Vision (ICCV), pp. 1150-1157, 1999.
Non-Patent Document 2: J.-M. Morel and G. Yu, "ASIFT: A new framework for fully affine invariant image comparison", SIAM J. Imaging Sciences, vol. 2, no. 2, pp. 438-469, 2009.

However, each of the conventional techniques described above has the following problems with respect to highly accurate recognition.

Non-Patent Document 1 and Patent Documents 1, 2, and 3 share a problem: accuracy drops when too few feature points can be detected. In particular, accuracy drops significantly for objects that inherently have few feature points. When similar objects exist, there is also a risk of misrecognition if no features are calculated from the regions in which those objects differ. Non-Patent Document 2 partially solves this problem, because rotational deformation can make feature points easier to detect. However, it increases processing time in two ways: the rotational deformation itself takes time, and the number of feature points grows. It also cannot handle images that contain multiple targets.

In view of the above problems of the prior art, an object of the present invention is to provide an information processing apparatus and a program capable of calculating feature information suitable for recognizing an object in an image with high accuracy.

To achieve the above object, the present invention provides an information processing apparatus comprising three units. A processing unit applies a plurality of processing operations to an image in which an imaging target is captured and obtains a processed image for each operation. A calculation unit calculates feature information from each of the processed images. A recognition unit collates the calculated feature information with feature information stored in advance for a plurality of predetermined recognition targets, and thereby recognizes which of those recognition targets the imaging target in the image corresponds to. The present invention is also a program that causes a computer to function as the information processing apparatus.

According to the present invention, the use of processed images makes it possible to calculate feature information suitable for recognizing an object in an image with high accuracy.

FIG. 1 is a functional block diagram of an information processing apparatus according to an embodiment.
FIG. 2 is a flowchart of the operation of the information processing apparatus according to an embodiment.
FIG. 3 shows an example to which the second embodiment is well suited: the same three-dimensional recognition target is made recognizable from various angles.
FIG. 4 corresponds to the example of FIG. 3 and shows planar projective transformations defined hierarchically for each k-th loop iteration.
FIG. 5 is a flowchart of an embodiment that performs additional processing on top of the embodiment of FIG. 2.
FIG. 6 is a schematic example of applying the embodiment of FIG. 5 to an imaging target configured as a housing.
FIG. 7 is a schematic example of applying the embodiment of FIG. 5 to a first recognition target configured as a front wheel and a second recognition target configured as a rear wheel.

The present invention is described below with reference to the drawings. It is well suited to cases in which a mobile terminal serves as the information processing apparatus 10 and the imaging target is an arbitrary object, for example when realizing an augmented reality display. However, the information processing apparatus of the present invention is not limited to a mobile terminal. It may be any information processing apparatus that has the imaging unit 1, such as a desktop, laptop, or other computer. Some or all of the components other than the imaging unit 1 may also be placed on a server, with information exchanged over a network as needed. Likewise, the imaging unit 1 may be an external component. For example, the present invention may be applied to an image acquired from a server on a network.

FIG. 1 is a functional block diagram of the information processing apparatus 10 according to an embodiment. The information processing apparatus 10 includes an imaging unit 1, a processing unit 2, a calculation unit 3, a recognition unit 4, and a storage unit 5. Each unit is outlined below.

The imaging unit 1 captures the imaging target, for example when the user performs an imaging operation, and outputs the captured image to the processing unit 2. The imaging target includes a recognition target that is known in advance. The recognition target may be, for example, a marker, a printed matter, or a three-dimensional object bearing a pattern whose features are known. The imaging unit 1 can be implemented with a digital camera, which is often standard equipment on mobile terminals.

The processing unit 2 applies one or more types of processing operation to the image captured by the imaging unit 1. It obtains one processed image per applied operation and outputs the processed images to the calculation unit 3. For example, consider an embodiment preset to apply three processing operations M101, M102, and M103. These operations are applied to the captured image P to obtain three processed images Q101, Q102, and Q103, which are output to the calculation unit 3. The set of processing operations may include one that does nothing, i.e., one that passes the captured image through unchanged as a processed image.
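A minimal Python sketch of the processing unit 2 follows. It assumes an image is a list of rows of 8-bit grayscale values. The operations `flip_horizontal` and `invert` are hypothetical placeholders; the embodiments below use resolution conversion and homographies instead. `identity` corresponds to the "no processing" operation mentioned above.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale image: rows of 8-bit pixel values (assumption)
Operation = Callable[[Image], Image]


def identity(img: Image) -> Image:
    """The 'do nothing' operation: the captured image becomes a processed image unchanged."""
    return [row[:] for row in img]


def flip_horizontal(img: Image) -> Image:
    """Hypothetical placeholder operation: mirror each row."""
    return [row[::-1] for row in img]


def invert(img: Image) -> Image:
    """Hypothetical placeholder operation: invert 8-bit intensities."""
    return [[255 - v for v in row] for row in img]


def apply_processing(img: Image, ops: Dict[str, Operation]) -> Dict[str, Image]:
    """Apply each configured operation M_i to the captured image P and return
    one processed image Q_i per operation, keyed by the operation's name."""
    return {name: op(img) for name, op in ops.items()}


P = [[0, 10], [20, 30]]
Q = apply_processing(P, {"M101": identity, "M102": flip_horizontal, "M103": invert})
```

Keying the results by operation name keeps track of which operation produced which processed image. The recognition result needs that link later.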

The types of processing operation to apply, and the order in which they are applied, are set in advance for each embodiment. Examples are a predetermined set of operation types, or operations applied in an order given by a predetermined rule. One such rule is shown by line L2 in FIG. 1: the recognition unit 4 instructs the processing unit 2 on the basis of its recognition result. Each embodiment of the processing unit 2 is described in detail later.

The calculation unit 3 first detects feature points of the recognition target in each processed image obtained by the processing unit 2. Characteristic points such as corners of the recognition target can serve as feature points. Existing detectors such as SIFT (Scale-Invariant Feature Transform) or SURF (Speeded Up Robust Features) can be used. When integrated feature information is calculated (described later), the detected feature point coordinates may be shared among the processed images to reduce missed detections.

Next, the calculation unit 3 calculates, from each processed image, local image features centered on the detected feature point coordinates. The feature points and local image features obtained in this way are output to the recognition unit 4 as the feature information of each processed image. For example, when three processed images Q101, Q102, and Q103 have been obtained, the three corresponding pieces of feature information f101, f102, and f103 are output to the recognition unit 4. Existing methods such as SIFT or SURF can be used to calculate the local image features.
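The per-image feature information can be sketched as follows. For illustration only, SIFT/SURF descriptors are replaced by a hypothetical `patch_descriptor`. It takes the pixel patch around each keypoint, subtracts the mean, and L2-normalizes the result. Keypoints are taken as given rather than detected.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Keypoint = Tuple[int, int]  # (x, y)


def patch_descriptor(img: Image, kp: Keypoint, radius: int = 1) -> List[float]:
    """Hypothetical stand-in for a SIFT/SURF descriptor: the (2r+1)x(2r+1) patch
    centred on kp (border pixels clamped), mean-subtracted and L2-normalised."""
    h, w = len(img), len(img[0])
    x0, y0 = kp
    vals = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y = min(max(y0 + dy, 0), h - 1)
            x = min(max(x0 + dx, 0), w - 1)
            vals.append(float(img[y][x]))
    mean = sum(vals) / len(vals)
    centred = [v - mean for v in vals]
    norm = math.sqrt(sum(c * c for c in centred))
    # A perfectly flat patch has no structure; return the all-zero vector as-is.
    return [c / norm for c in centred] if norm > 0 else centred


def feature_info(img: Image, keypoints: List[Keypoint], radius: int = 1):
    """Feature information f_i of one processed image: each keypoint paired with its descriptor."""
    return [(kp, patch_descriptor(img, kp, radius)) for kp in keypoints]
```

Mean subtraction and normalization make this toy descriptor insensitive to uniform brightness and contrast changes. Real descriptors such as SIFT go further and add scale and rotation invariance.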

The recognition unit 4 computes the similarity between the feature information calculated by the calculation unit 3 and the feature information stored in the storage unit 5. The stored entry with the maximum similarity becomes the recognition result for the imaging target in the image captured by the imaging unit 1. In other words, the recognition unit identifies which of the one or more predetermined imaging targets stored in the storage unit 5 as references corresponds to the object in the captured image (the query). Various well-known methods can be used to compare feature information. For example, RANSAC (Random Sample Consensus) or a similar method attempts to match the individual features that make up the feature information while rejecting outliers, and thereby identifies the best overall match.
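Below is a hedged sketch of one possible score(f, F): nearest-neighbor descriptor matching followed by RANSAC. To keep it short, it assumes a translation-only geometric model; a real system would more likely fit a homography. The inlier count serves as the similarity score. The function names and thresholds are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Feature = Tuple[Point, Sequence[float]]  # (keypoint (x, y), descriptor)


def _desc_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_nearest(query: List[Feature], reference: List[Feature],
                  max_desc_dist: float) -> List[Tuple[Point, Point]]:
    """Pair each query keypoint with the reference keypoint whose descriptor is nearest,
    discarding pairs whose descriptor distance exceeds max_desc_dist."""
    if not reference:
        return []
    pairs = []
    for q_pt, q_desc in query:
        r_pt, r_desc = min(reference, key=lambda r: _desc_dist(q_desc, r[1]))
        if _desc_dist(q_desc, r_desc) <= max_desc_dist:
            pairs.append((q_pt, r_pt))
    return pairs


def ransac_translation_inliers(pairs: List[Tuple[Point, Point]],
                               iters: int = 100, tol: float = 2.0, seed: int = 0) -> int:
    """RANSAC with a translation-only model: each hypothesis comes from one random
    correspondence; returns the largest inlier count found, so outliers do not count."""
    if not pairs:
        return 0
    rng = random.Random(seed)
    best = 0
    for _ in range(iters):
        (qx, qy), (rx, ry) = rng.choice(pairs)
        tx, ty = rx - qx, ry - qy
        inliers = sum(1 for (ax, ay), (bx, by) in pairs
                      if math.hypot(ax + tx - bx, ay + ty - by) <= tol)
        best = max(best, inliers)
    return best


def similarity_score(query: List[Feature], reference: List[Feature],
                     max_desc_dist: float = 0.5) -> int:
    """score(f, F): number of geometrically consistent descriptor matches."""
    return ransac_translation_inliers(match_nearest(query, reference, max_desc_dist))
```

In the test below, three query features are shifted consistently and one has the right descriptor but the wrong position. RANSAC keeps the three consistent ones and ignores the outlier.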

The imaging unit 1 acquires the captured image P as a query, and processing P yields one or more processed images Qi (i = 1, 2, ...) with feature information fi. The storage unit 5 holds reference targets Oj (j = 1, 2, ...) with feature information Fj. The recognition unit 4 computes the similarity score score(fi, Fj) for each pair. It then obtains its recognition result by identifying the pair (processed image Qmax_i, target Omax_j) that gives the maximum score:

$$(\mathrm{max\_i},\ \mathrm{max\_j}) = \operatorname*{arg\,max}_{(i,\,j)} \ \mathrm{score}(f_i, F_j) \qquad (1)$$

In the present invention, therefore, the recognition result contains two facts. The first is that the object corresponding to the captured image P is the target Omax_j stored in the storage unit 5, which is the recognition result in the usual sense. The second is that the maximum similarity comes from the processed image Qmax_i obtained from P, and hence that the corresponding processing operation in the processing unit 2 is Mmax_i.
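Equation (1) amounts to an exhaustive search over (processed image, reference target) pairs. The minimal sketch below assumes feature information is stored in dictionaries keyed by operation name and target name. `score` can be any similarity function; the usage example uses a toy set-overlap score. Returning the operation key alongside the target reflects the point above: the recognition result also records which processing operation won.

```python
from typing import Callable, Dict, Hashable, Tuple


def recognize(query_feats: Dict[Hashable, object],
              ref_feats: Dict[Hashable, object],
              score: Callable[[object, object], float]) -> Tuple[Hashable, Hashable, float]:
    """Embodiment A: evaluate score(f_i, F_j) for every processed image i and
    reference target j; return (i_max, j_max, best score)."""
    best = None
    for i, f in query_feats.items():
        for j, F in ref_feats.items():
            s = score(f, F)
            if best is None or s > best[2]:
                best = (i, j, s)
    if best is None:
        raise ValueError("need at least one processed image and one reference target")
    return best


# Toy usage: feature information as sets of visual-word ids, score = overlap size.
query = {"M1": {1, 2, 3}, "M2": {3, 4, 5, 6}}
refs = {"O1": {1, 2}, "O2": {4, 5, 6, 7}}
result = recognize(query, refs, lambda f, F: len(f & F))
```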

This form of recognition result applies to Embodiment A, described later. Embodiment B, also described later, yields its recognition result in a different form.

The storage unit 5 stores, in advance, the feature information calculated from each recognition target that serves as a reference. The recognition unit 4 consults this information during the recognition processing described above. The stored feature information is calculated with the same type of method as the calculation unit 3 uses.

FIG. 2 is a flowchart of the operation of the information processing apparatus 10 according to an embodiment. As shown, steps S2 to S5 and S7 form a loop. In the description of each step, the iteration count at that point is denoted k (k = 1, 2, ...). The processing in each step is as follows.

In step S1, the imaging unit 1 captures the recognition target to obtain the captured image P and outputs it to the processing unit 2. The process then proceeds to step S2. In step S2, the processing unit 2 applies a plurality of predetermined processing operations to the captured image P, obtains a processed image for each operation, and outputs them to the calculation unit 3. The process then proceeds to step S3. The operations in step S2 are the K(k) predetermined processing operations M(k,i) (i = 1, 2, ..., K(k)) defined for the k-th iteration in each embodiment of the processing unit 2. One processed image Q[M(k,i)] is output per operation. Embodiments of the processing unit 2, which serve as concrete examples of the operations M(k,i), are described later.

In step S3, the calculation unit 3 calculates feature information f[M(k,i)] from each processed image Q[M(k,i)] obtained in the most recent step S2 and outputs it to the recognition unit 4. The process then proceeds to step S4. In step S4, the recognition unit 4 compares each piece of feature information f[M(k,i)] from the most recent step S3 with the feature information Fj stored in the storage unit 5 for each recognition target Oj. Following equation (1) above, it finds the pair that maximizes the similarity score score(f[M(k,i)], Fj), which gives the recognition result, and the process proceeds to step S5. The recognition result of step S4 in the k-th iteration is written as this maximizing pair: the processed image Q(k,imax(k)) of the processing operation M(k,imax(k)), and the reference recognition target Ojmax(k).

Steps S3 and S4 may also be carried out in a different way. In step S3, the feature information f[M(k,i)] obtained over the series of processing operations M(k,i) (i = 1, 2, ..., K(k)) is integrated into a single piece of integrated feature information f[k] for the captured image P, which is output to the recognition unit 4. In step S4, the recognition unit 4 then computes the similarity score score(f[k], Fj) between f[k] and the feature information Fj stored for each recognition target Oj. The target that maximizes this score (j = jmax(k)) is output as the recognition result Ojmax(k).

For clarity, the two variants of steps S3 and S4 are named as follows. Embodiment A is the former, which evaluates similarity using the individual feature information f[M(k,i)] of each processing operation. Embodiment B is the latter, which evaluates similarity using the feature information f[k] integrated over a series of processing operations.

In step S5, the recognition unit 4 determines whether a final recognition result has been obtained as of the k-th iteration. If it has, the process proceeds to step S6; if not, it proceeds to step S7.

Depending on the embodiment, the determination in step S5 may use either of two criteria. One is whether the iteration count k has reached a predetermined upper limit. The other is whether any of the maximum similarity scores obtained in step S4 up to the k-th iteration exceeds a predetermined threshold. The upper limit on k may also be set to 1, in which case no second or subsequent iteration via step S7 is performed.

In step S6, the recognition unit 4 selects the final recognition result from the results obtained in step S4 over the k iterations and outputs it, and the flow of FIG. 2 ends. The selected final result may be the result of the last step S4, or the result with the maximum similarity score across all executions of step S4. In FIG. 1, the output of the selected final recognition result is shown by line L1.
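Steps S5 and S6 can be sketched as two small functions. The sketch assumes each iteration's recognition result is kept as a (processing operation, reference target, maximum score) tuple. `k_max` and `threshold` correspond to the predetermined upper limit and threshold above. The policy names "last" and "max" are hypothetical labels for the two selection options.

```python
from typing import List, Optional, Tuple

Result = Tuple[str, str, float]  # (processing operation, reference target, best score) of one iteration


def should_stop(history: List[Result], k_max: int, threshold: Optional[float] = None) -> bool:
    """Step S5: stop once k (the number of completed iterations) reaches k_max,
    or once any maximum score obtained so far exceeds the threshold."""
    if len(history) >= k_max:
        return True
    return threshold is not None and any(s > threshold for _, _, s in history)


def final_result(history: List[Result], policy: str = "max") -> Result:
    """Step S6: return the last iteration's result ("last") or the highest-scoring one ("max")."""
    if policy == "last":
        return history[-1]
    if policy == "max":
        return max(history, key=lambda r: r[2])
    raise ValueError(f"unknown policy: {policy}")
```

Setting `k_max=1` reproduces the single-pass case described above, because `should_stop` returns True after the first iteration.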

In step S7 under Embodiment A, the recognition unit 4 takes the recognition result obtained for the k-th iteration in the most recent step S4. That result states that the similarity score was maximized between the processed image Q(k,imax(k)) and the reference recognition target Ojmax(k). Based on it, the recognition unit 4 tells the processing unit 2 which processing operations M(k+1,i) (i = 1, 2, ..., K(k+1)) to use in the next, (k+1)-th, iteration, and the process returns to step S2. In step S2 of the (k+1)-th iteration, the processing unit 2 applies the specified operations M(k+1,i) to obtain the processed images. The subsequent steps S3 and S4 then proceed in the same way with the processed images and feature information of the (k+1)-th iteration.
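For Embodiment A, the whole loop of FIG. 2 can be sketched as below. How the next operations M(k+1, i) are derived from the k-th winner depends on the embodiment, so the sketch passes it in as a hypothetical `propose_next_ops` callback. This version stops after `k_max` iterations or when a score exceeds `threshold`, and in step S6 it returns the highest-scoring result. All names are illustrative assumptions.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Result = Tuple[Hashable, Hashable, float]  # (operation key, reference key, score)


def recognition_loop(image,
                     initial_ops: Dict[Hashable, Callable],
                     extract: Callable,
                     refs: Dict[Hashable, object],
                     score: Callable[[object, object], float],
                     propose_next_ops: Callable[[Hashable, Hashable], Dict[Hashable, Callable]],
                     k_max: int = 3,
                     threshold: float = float("inf")) -> Tuple[Result, List[Result]]:
    """Embodiment A loop of FIG. 2 (steps S2-S7). propose_next_ops is a hypothetical
    hook that turns the k-th winner into the operations M(k+1, i)."""
    ops = initial_ops
    history: List[Result] = []
    for k in range(1, k_max + 1):
        # S2 + S3: one processed image and its feature information per operation M(k, i)
        feats = {key: extract(op(image)) for key, op in ops.items()}
        # S4: the (operation, reference target) pair with the maximum similarity score
        best = max(((key, j, score(f, F)) for key, f in feats.items() for j, F in refs.items()),
                   key=lambda t: t[2])
        history.append(best)
        # S5: stop at the iteration limit or once the score exceeds the threshold
        if k == k_max or best[2] > threshold:
            break
        # S7: derive the next operations from the winning operation and target
        ops = propose_next_ops(best[0], best[1])
    # S6: here, the highest-scoring result over all iterations
    return max(history, key=lambda t: t[2]), history
```

As a toy usage, let the "image" be a number and the operations add offsets to it. A `propose_next_ops` that tries offsets slightly beyond the current winner then performs a coarse-to-fine search.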

In step S7 under Embodiment B, the processing operations M(k+1,i) (i = 1, 2, ..., K(k+1)) for the (k+1)-th iteration may likewise be specified from the recognition result, as in Embodiment A. Here the recognition result is that target Ojmax(k) gave the maximum similarity when scores were computed against the integrated feature information f[k]. An example of such a specification is the embodiment of Supplement (2), described later.

This concludes the description of the flowchart of FIG. 2. The flow is an embodiment that gives the general form of the processing performed by the processing unit 2 and of how the recognition unit 4 specifies those operations. Specific embodiments of this processing and specification are described below.

In the first embodiment, the flow proceeds from step S5 to step S6 at the first iteration (k = 1), so no second or subsequent iteration is performed. Embodiment B is applied.

Specifically, in the first embodiment, the processing unit 2 applies one or more preset resolution conversions as its processing operations in step S2. This is based on the following consideration. The distribution of the feature points calculated by the calculation unit 3 in the subsequent step S3 changes with slight differences in the image. Suppose the total number of feature points is fixed at N. Calculating N/M feature points from each of M processed images converted to different resolutions can then be expected to yield feature points that are more effective for matching than calculating all N from the single captured image P.
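A minimal sketch of the first embodiment's processing step follows. It assumes nearest-neighbor resampling, since the text does not specify a filter. A total budget of N keypoints is split as evenly as possible over the M resolution-converted images, about N/M each. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Nearest-neighbour resolution conversion (a stand-in for any resampling filter)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def multi_resolution(img: Image, sizes: Sequence[Tuple[int, int]],
                     total_keypoints: int) -> List[Tuple[Image, int]]:
    """One processed image per (width, height) in sizes, each paired with its share
    of a total keypoint budget N split as evenly as possible over M images (about N/M each)."""
    base, extra = divmod(total_keypoints, len(sizes))
    return [(resize_nearest(img, w, h), base + (1 if idx < extra else 0))
            for idx, (w, h) in enumerate(sizes)]
```

Each budget would then cap how many of the strongest keypoints the detector keeps in that image. The total stays at N, so the later matching cost does not grow.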

In step S3 of the first embodiment, the integrated feature information may be calculated by merging features whose coordinates coincide. The correspondence of coordinate positions between the resolution-converted processed images is known. Take a first feature point in a first processed image at a first resolution and a second feature point in a second processed image at a second resolution. If, after the coordinate mapping is applied, a threshold test judges the two points to be at the same position, they are treated as corresponding points. The feature of this merged point is then a single feature obtained by averaging the feature of the first point and the feature of the second point. Integration between two processed images is described here, but averaged features for three or more images are obtained in exactly the same way, by threshold tests on the coordinate correspondences. This integration keeps only the minimum necessary features and prevents the number of feature points from growing, which can also be expected to shorten processing time.
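The coordinate-based integration can be sketched under the following assumptions. Each processed image's keypoints are mapped back to the captured image's coordinates by simple scaling. A keypoint within `tol` pixels of an already-merged keypoint joins that keypoint, and the first keypoint seen fixes the merged position. The merged descriptor is the element-wise mean. The function name and tolerance are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[Tuple[float, float], List[float]]  # ((x, y), descriptor)


def integrate(per_image: Sequence[Tuple[Tuple[int, int], List[Feature]]],
              base_size: Tuple[int, int], tol: float = 1.5) -> List[Feature]:
    """per_image holds ((width, height), features) for each resolution-converted image.
    Keypoints are rescaled to base_size coordinates; a keypoint within tol pixels of an
    existing merged keypoint is merged into it and the descriptors are averaged."""
    bw, bh = base_size
    clusters = []  # each entry: [anchor (x, y) in base coordinates, list of descriptors]
    for (w, h), feats in per_image:
        sx, sy = bw / w, bh / h
        for (x, y), desc in feats:
            p = (x * sx, y * sy)
            for c in clusters:
                if math.dist(c[0], p) <= tol:
                    c[1].append(desc)
                    break
            else:
                clusters.append([p, [desc]])
    merged = []
    for anchor, descs in clusters:
        n = len(descs)
        merged.append((anchor, [sum(v) / n for v in zip(*descs)]))
    return merged
```

The scaling step ignores sub-pixel offsets introduced by the resampling filter, which is acceptable as long as `tol` is at least about one pixel.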

Some embodiments have the recognition unit 4 also use geometric verification, an existing technique. In those embodiments, the resolution conversions applied by the processing unit 2 in step S2 of the first embodiment should preferably preserve the aspect ratio of the captured image. However, the threshold settings of geometric verification can tolerate a slight change in aspect ratio. The aspect ratio may therefore be varied deliberately within a predetermined range to induce changes in the feature point distribution. For example, the image may be converted to several resolutions that differ by only one pixel. If the captured image is 640×480 pixels, processed images at 641×480, 640×481, and 641×481 pixels may be output. A processed image reduced by one pixel, such as 639×480 pixels, may also be output.
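The one-pixel variations in the example can be enumerated and checked against an aspect-ratio tolerance. The relative tolerance `max_rel_aspect_change` is a hypothetical stand-in for whatever deviation the geometric verification threshold allows.

```python
from itertools import product
from typing import List, Tuple


def near_sizes(w: int, h: int, max_rel_aspect_change: float = 0.005) -> List[Tuple[int, int]]:
    """Sizes that differ from (w, h) by at most one pixel per axis, excluding (w, h) itself,
    kept only if the aspect ratio changes by at most the given relative tolerance."""
    base = w / h
    out = []
    for dw, dh in product((-1, 0, 1), repeat=2):
        if (dw, dh) == (0, 0):
            continue
        nw, nh = w + dw, h + dh
        if abs(nw / nh - base) / base <= max_rel_aspect_change:
            out.append((nw, nh))
    return out
```

With the default tolerance, all eight ±1-pixel neighbors of 640×480 pass, including the 641×480, 640×481, 641×481, and 639×480 sizes mentioned above. A tighter tolerance of 0.002 keeps only 639×479, 639×480, 641×480, and 641×481.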

The second embodiment applies Embodiment A.

Specifically, in the second embodiment, the processing operations are one or more preset planar projective transformations (transformations by a homography matrix). This is based on the following consideration. The features calculated by the calculation unit 3 in the subsequent stage change with the relative positional relationship, such as the pose of the target relative to the camera that constitutes the imaging unit 1. The stored features in the storage unit 5 were calculated under a particular positional relationship. Features calculated from a processed image that has been projectively transformed to reproduce that relationship can therefore be expected to match more effectively than features calculated directly from the captured image.
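A planar projective transformation can be applied by inverse mapping: each output pixel is traced back through H^-1 into the captured image. The sketch below assumes nearest-neighbor sampling and a constant fill value outside the source image. It also assumes that H maps captured-image coordinates to processed-image coordinates. Function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def invert3(m: Matrix) -> List[List[float]]:
    """Inverse of a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("homography is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def warp_homography(img: List[List[int]], H: Matrix, out_w: int, out_h: int,
                    fill: int = 0) -> List[List[int]]:
    """Processed image Q with Q(x', y') = P(H^-1 (x', y', 1)), nearest-neighbour sampled."""
    Hi = invert3(H)
    h, w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            X = Hi[0][0] * x + Hi[0][1] * y + Hi[0][2]
            Y = Hi[1][0] * x + Hi[1][1] * y + Hi[1][2]
            Z = Hi[2][0] * x + Hi[2][1] * y + Hi[2][2]
            if Z == 0:
                row.append(fill)  # maps to a point at infinity: no source pixel
                continue
            sx, sy = round(X / Z), round(Y / Z)
            row.append(img[sy][sx] if 0 <= sx < w and 0 <= sy < h else fill)
        out.append(row)
    return out
```

Inverse mapping ensures every output pixel receives exactly one value. Forward mapping would leave holes wherever the transformation stretches the image.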

For example, suppose images of the same three-dimensional recognition target are captured from various angles. The features calculated from them are stored in the storage unit 5 as belonging to different poses of that target, formally as distinct recognition targets. In a captured image of the imaging target, only the features from regions that directly face the camera can be judged correctly. Features from regions that do not face the camera will not match the features held in the storage unit 5. Applying to the captured image projective transformations that simulate imaging from an oblique direction produces processed images whose features can be expected to match more effectively in the recognition unit 4.

FIG. 3 shows a preferred example of the second embodiment in which the same three-dimensional recognition target can be recognized from various angles. Suppose the captured image P shows, as the recognition target, a home appliance with a rectangular-parallelepiped housing. Suppose further that the preset planar projective transformations of the second embodiment include M1, M2, and M3, and that applying each of them (a transformation by a homography matrix) to P yields the processed images Q1, Q2, and Q3 shown in the figure. In this case, Q1, Q2, and Q3 bring the top face, the left side face, and the right side face of the housing, respectively, closer to squarely facing the camera than they are in P. Therefore, if the storage unit 5 holds in advance feature information computed from reference images of the housing in which the top, left, and right faces each squarely face the camera, the second embodiment can recognize each of these faces with high accuracy.

In general, the pose of the imaged object in the captured image P is unknown. By presetting a series of planar projective transformation matrices, the second embodiment increases the probability that at least one of them roughly rectifies the object to face the camera, as in FIG. 3, so that feature information similar to that stored in the storage unit 5 is obtained. The preset series preferably includes at least one matrix that changes the orientation of the plane, that is, one containing a rotation component about an axis other than the camera's optical axis, so that the distance from the camera center varies across the plane.
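One common way to build such a preset series, assumed here rather than taken from the source, is to model each homography as a pure camera rotation H = K R K^-1 with a pinhole intrinsic matrix K. Rotations about the x (tilt) or y (pan) axis change the apparent orientation of the plane and produce non-zero perspective terms in the bottom row of H, whereas a rotation about the optical (z) axis does not. The focal length, principal point, and angle grid below are illustrative values only.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def matmul3(a: Matrix3, b: Matrix3) -> Matrix3:
    """3x3 matrix product a @ b."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def rot_x(deg: float) -> Matrix3:
    """Tilt: rotation about the camera x axis."""
    t = math.radians(deg)
    return [[1, 0, 0], [0, math.cos(t), -math.sin(t)], [0, math.sin(t), math.cos(t)]]


def rot_y(deg: float) -> Matrix3:
    """Pan: rotation about the camera y axis."""
    t = math.radians(deg)
    return [[math.cos(t), 0, math.sin(t)], [0, 1, 0], [-math.sin(t), 0, math.cos(t)]]


def rot_z(deg: float) -> Matrix3:
    """Roll: rotation about the optical (z) axis."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0], [math.sin(t), math.cos(t), 0], [0, 0, 1]]


def rotation_homography(r: Matrix3, f: float, cx: float, cy: float) -> Matrix3:
    """H = K R K^-1 for a pinhole camera with focal length f and principal point (cx, cy)."""
    k = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
    k_inv = [[1 / f, 0, -cx / f], [0, 1 / f, -cy / f], [0, 0, 1]]
    return matmul3(matmul3(k, r), k_inv)


def changes_plane_orientation(h: Matrix3, eps: float = 1e-9) -> bool:
    """For H = K R K^-1, a non-zero perspective row means R rotates off the optical axis
    (ignoring the degenerate 180-degree flip)."""
    return abs(h[2][0]) > eps or abs(h[2][1]) > eps


def build_bank(angles_deg: Sequence[float], f: float = 500.0, cx: float = 320.0,
               cy: float = 240.0) -> List[Tuple[float, float, Matrix3]]:
    """Preset bank: one homography per (tilt, pan) pair on a coarse angle grid."""
    return [(tilt, pan, rotation_homography(matmul3(rot_y(pan), rot_x(tilt)), f, cx, cy))
            for tilt in angles_deg for pan in angles_deg]


if __name__ == "__main__":
    bank = build_bank([-30.0, 0.0, 30.0])
    print(sum(changes_plane_orientation(h) for _, _, h in bank), "of", len(bank))
```

In this sketch every grid entry except (0, 0) changes the plane orientation, which is the property the preceding paragraph recommends for at least one member of the series.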

When the second embodiment is applied to loop iterations k ≥ 2 in the flow of FIG. 2, defining the planar projective transformations applied at each iteration k as a tree allows the recognition unit 4 to improve recognition accuracy stepwise and iteratively. That is, in step S7 of FIG. 2, the processing to apply at iteration k+1 should be defined as transformations in the neighborhood of the processing M(k, imax(k)), the planar projective transformation that gave the maximum similarity score at iteration k. In other words, for each processing M(k, i) (i = 1, 2, ..., K(k)) at iteration k, the processing that produces transformations in its neighborhood is defined as the processing to apply at iteration k+1.

FIG. 4, which corresponds to the example of FIG. 3, shows how the planar projective transformations can be defined hierarchically for each loop iteration k. For example, if at iteration k = 1 the processed image Q1 described with FIG. 3 (produced by the transformation M1, which roughly rectifies the top face of the housing to face the camera) gives the maximum similarity, then at iteration k = 2, setting, for example, three transformations M11, M12, and M13 in the neighborhood of M1 as the processing to apply increases the chance that one of Q11, Q12, and Q13 brings the top face, already roughly facing the camera, even closer to squarely facing it.

As the example of FIG. 4 makes clear, the processing at each iteration k should be defined as a tree such that, while k is small, several coarse planar projective transformations with large changes are applied, and as k grows, planar projective transformations with smaller changes are applied in the neighborhood of the transformation selected by maximum similarity at iteration k−1.
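The coarse-to-fine selection can be sketched as a greedy descent through a tree of candidate transformations. In this sketch each node is identified only by hypothetical (tilt, pan) angles, each level uses a 3 x 3 grid of offsets whose step shrinks by an assumed factor of 0.3 per level, and `score` stands in for the similarity score produced by the recognition unit 4; in a real system each node would carry a homography matrix.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Params = Tuple[float, float]  # hypothetical (tilt, pan) in degrees


@dataclass
class TransformNode:
    params: Params
    children: List["TransformNode"] = field(default_factory=list)


def make_level(center: Params, step: float, depth: int, shrink: float = 0.3) -> List[TransformNode]:
    """Candidates for one iteration k: a 3x3 grid of offsets around `center`.

    Each candidate's children form a finer grid around that candidate, so the step
    shrinks as k grows (large changes first, small changes later).
    """
    nodes = []
    for dt in (-step, 0.0, step):
        for dp in (-step, 0.0, step):
            p = (center[0] + dt, center[1] + dp)
            kids = make_level(p, step * shrink, depth - 1, shrink) if depth > 1 else []
            nodes.append(TransformNode(p, kids))
    return nodes


def coarse_to_fine(level: List[TransformNode], score: Callable[[Params], float]):
    """At each iteration keep the best-scoring candidate and descend into its children."""
    path = []
    while level:
        best = max(level, key=lambda n: score(n.params))
        path.append((best.params, score(best.params)))
        level = best.children
    return path


if __name__ == "__main__":
    true_pose = (27.0, -12.0)  # unknown in practice; used only by this toy score

    def toy_score(p: Params) -> float:
        return -(abs(p[0] - true_pose[0]) + abs(p[1] - true_pose[1]))

    for params, s in coarse_to_fine(make_level((0.0, 0.0), step=40.0, depth=3), toy_score):
        print(params, s)
```

Because each child grid includes the parent itself (offset 0), the best score in this sketch can never decrease from one iteration to the next.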

The transformations M11, M12, and M13 may be defined so that applying them to the original captured image P yields Q11, Q12, and Q13, respectively, or so that applying them to the preceding processed image Q1 yields those images. The processing unit 2 then applies them in the same manner in which they were defined.
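The two definitions are interchangeable: a child defined relative to Q1 can be converted into one applied directly to P by matrix multiplication. The minimal sketch below (helper names hypothetical) checks that warping by M1 and then by M11 maps points exactly like the single product M11 M1.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]


def matmul3(a: Matrix3, b: Matrix3) -> Matrix3:
    """3x3 matrix product a @ b."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def apply_h(m: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through a homography, including the homogeneous divide."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def to_absolute(m_child_rel: Matrix3, m_parent: Matrix3) -> Matrix3:
    """Convert a child transform defined on Q1 = M1(P) into one applied directly to P.

    Warping P by M1 and then Q1 by M11 sends a point p to M11(M1(p)), so the single
    equivalent transform on P is the product M11 @ M1 (the right-most factor acts first).
    """
    return matmul3(m_child_rel, m_parent)


if __name__ == "__main__":
    M1 = [[1.0, 0.2, 5.0], [0.0, 1.1, -3.0], [0.0005, 0.0003, 1.0]]  # illustrative values
    M11 = [[1.05, 0.0, -2.0], [0.0, 1.05, 1.0], [0.0, 0.0, 1.0]]     # small refinement
    print(apply_h(to_absolute(M11, M1), 100.0, 50.0))
    print(apply_h(M11, *apply_h(M1, 100.0, 50.0)))
```

A practical reason to prefer the absolute form when warping real images is that composing the matrices first resamples P only once, whereas chaining two warps resamples twice and compounds interpolation loss.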

FIG. 5 is a flowchart of an embodiment that performs additional processing on top of the embodiment of FIG. 2. Based on the result of recognizing one part of the imaged object in the captured image P (the "first object" described below), this embodiment applies processing step by step so that another part of the imaged object (the "second object" described below), in particular a part that was difficult to recognize in the original captured image P, can also be recognized. Each step of FIG. 5 is described below.

In step S11, the information processing apparatus 10 performs the entire flow of FIG. 2 to obtain the recognition result for the first object in the captured image P, and then proceeds to step S12. Step S11 is preferably realized by the second embodiment, in which the processing in the flow of FIG. 2 consists of planar projective transformations.

In step S12, the processing unit 2 processes the captured image P obtained by the imaging unit 1 (the image obtained in step S1 of the corresponding flow of FIG. 2) into a processed image suited to recognizing the second object, and then proceeds to step S13.

The processing in step S12 may be performed as follows. As a premise, while recognizing the first object in step S11, the recognition unit 4 additionally obtains a planar projective transformation H1 that represents the position and orientation of the first object relative to the camera center of the imaging unit 1. H1 gives the mapping between the coordinates in the feature information stored in the storage unit 5 and the corresponding coordinates of the recognition target in the captured image P, and can be obtained by geometric verification (an existing technique) or by the same method used for square-marker pose estimation in the field of augmented reality (also an existing technique). Alternatively, in another embodiment, the planar projective transformation matrix of the processing that gave the highest similarity when recognizing the first object in step S11 may be adopted as an approximation of H1. This other embodiment makes the additional computation of the coordinate mapping described above unnecessary.
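The source only states that H1 comes from geometric verification or square-marker pose estimation. For the square-marker case, a minimal sketch is the four-point direct linear transform (DLT) below, assuming the four corners of the first object have already been located in P; the corner coordinates and the 200-pixel reference square are illustrative. With many noisy correspondences, a robust estimator such as RANSAC would normally be used instead of an exact four-point solve.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b (a is n x n) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix3:
    """DLT with h33 fixed to 1: returns H such that H maps each src point to its dst point."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h7 x + h8 y + 1) = h1 x + h2 y + h3, rearranged to be linear in h
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(m: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through a homography, including the homogeneous divide."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


if __name__ == "__main__":
    corners_in_P = [(212.0, 148.0), (402.0, 170.0), (389.0, 331.0), (198.0, 305.0)]
    reference = [(0.0, 0.0), (200.0, 0.0), (200.0, 200.0), (0.0, 200.0)]
    H1 = homography_from_4_points(corners_in_P, reference)  # P -> frontal first-object frame
    print(apply_h(H1, 212.0, 148.0))
```

Here H1 maps coordinates in P to the frontal reference frame of the first object, which is the direction needed for H1(P) to show the first object squarely facing the camera.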

Also as a premise, the storage unit 5 registers in advance, linked to the first object recognized in step S11, information on a planar projective transformation matrix H2 that rectifies the second object to face the camera. That is, H2 is registered so that applying the product H2*H1 of the two planar projective transformations to the image P rectifies (and enlarges) the second object. H2 is thus determined by the positional relationship between the first and second objects: the image H1(P), obtained by transforming the captured image P with H1, shows the first object at a predetermined size squarely facing the camera (the camera center of the imaging unit 1), and transforming it further with H2 yields the image H2*H1(P), which shows the second object at a predetermined size squarely facing the camera (imaging unit 1).

In step S12, the processing unit 2 applies the product H2*H1 of the registered matrix H2 and the matrix H1 obtained in step S11 to the captured image P, obtains the processed image H2*H1(P) in which the second object is rectified and enlarged, and then proceeds to step S13.
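A minimal sketch of step S12 follows, assuming the storage unit 5 can be modelled as a dictionary from a first-object label to its registered H2. The label `housing_top_F1` and the matrix values are hypothetical: this toy H2 simply enlarges by 2x and shifts the region just below a 200-pixel top face to the origin, whereas a real H2 for an adjacent face would generally contain perspective terms.

```python
from typing import Dict, List, Tuple

Matrix3 = List[List[float]]

# Hypothetical registry in storage unit 5: first-object label -> H2, mapping the
# first object's camera-facing frame H1(P) to the second object's camera-facing frame.
H2_REGISTRY: Dict[str, Matrix3] = {
    "housing_top_F1": [[2.0, 0.0, 0.0], [0.0, 2.0, -400.0], [0.0, 0.0, 1.0]],
}


def matmul3(a: Matrix3, b: Matrix3) -> Matrix3:
    """3x3 matrix product a @ b."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def apply_h(m: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through a homography, including the homogeneous divide."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def second_object_transform(first_label: str, h1: Matrix3) -> Matrix3:
    """Step S12: look up H2 for the recognised first object and return the product H2*H1."""
    return matmul3(H2_REGISTRY[first_label], h1)


if __name__ == "__main__":
    H1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # first object already frontal
    H = second_object_transform("housing_top_F1", H1)
    print(apply_h(H, 100.0, 250.0))  # a point on the side face, enlarged into view
```

The resulting matrix would then be handed to whatever warping routine produces the processed image H2*H1(P); only the matrix composition is shown here.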

In step S13, the calculation unit 3 calculates feature information from the processed image H2*H1(P), and the recognition unit 4 identifies the second object by similarity matching between the calculated feature information and the feature information stored in the storage unit 5. In step S14, the recognition unit 4 outputs the recognition result for the second object obtained as above, and the flow of FIG. 5 ends.

For the first object, the storage unit 5 may register a predetermined matrix H2 together with the feature information of several objects that are candidates for the corresponding second object, and step S13 then identifies which of these candidates the second object is. That is, several second objects are registered and the recognition result is obtained from among them, while a single matrix H2 common to all of the registered candidates is stored in the storage unit 5.
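Choosing among several registered second-object candidates that share one H2 can be sketched as follows. The similarity measure here (count of query descriptors whose nearest stored descriptor lies within a distance threshold) is a stand-in chosen for illustration; the candidate names, descriptors, and threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Descriptor = Sequence[float]


def count_matches(query: List[Descriptor], stored: List[Descriptor], max_dist: float) -> int:
    """Number of query descriptors whose nearest stored descriptor lies within max_dist."""
    if not stored:
        return 0
    hits = 0
    for q in query:
        nearest = min(math.dist(q, s) for s in stored)
        if nearest <= max_dist:
            hits += 1
    return hits


def identify_second_object(query: List[Descriptor],
                           candidates: Dict[str, List[Descriptor]],
                           max_dist: float = 0.5) -> Tuple[str, Dict[str, int]]:
    """Pick the registered candidate with the most matches (ties go to the first registered)."""
    scores = {name: count_matches(query, feats, max_dist) for name, feats in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    candidates = {
        "side_panel_A": [[0.0, 0.0], [1.0, 1.0]],
        "side_panel_B": [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]],
    }
    query = [[5.1, 5.0], [6.0, 5.2], [9.0, 9.0]]  # descriptors from H2*H1(P)
    print(identify_second_object(query, candidates))
```

Because H2 is shared, the rectified image H2*H1(P) is computed once and only the comparison loop runs per candidate.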

FIGS. 6 and 7 each show a schematic example of applying the embodiment of FIG. 5. In FIG. 6, the processing M1 is applied to the captured image P, and the top face F1 of the housing (the same housing as in the examples of FIGS. 3 and 4) is recognized in the image Q1, in which F1 roughly faces the camera. This recognition yields the matrix H1, and using the matrix H2 stored in advance for the recognized top face F1 (the first object) gives the processed image Q20 = H2*H1(P), in which the side face F2, the second object, is enlarged and faces the camera. In Q20, the side face F2 is recognized with high accuracy as the second object.

In the example of FIG. 6, the first and second objects are adjacent faces of the housing that share an edge (the top face F1 and the side face F2). In exactly the same way, the embodiment of FIG. 5 can also recognize first and second objects that are spatially separated. FIG. 7 shows such an example. The front wheel W1 of a vehicle (not shown) is recognized as the first object in the processed image WQ1 (FIG. 7 shows the case in which the rectifying matrix H1 coincides with the matrix that produces WQ1). The rear wheel W2, which was hard to recognize in the original captured image P because it appeared small and distorted toward the rear, then becomes recognizable in the processed image WQ2 = H2*H1(P), in which it is enlarged and rectified.

In an example such as FIG. 7, the matrix H2 that maps the enlarged, camera-facing state of the front wheel W1 to the enlarged, camera-facing state of the rear wheel W2 is registered in the storage unit 5 as a fixed matrix, and the type of the front wheel W1 (first object) and that of the rear wheel W2 (second object) are each stored in the storage unit 5 as one of several candidates, so that the information processing apparatus 10 can perform the recognition. As a further application, superimposition by augmented reality (an existing technique) can render a high-accuracy overlay on the rear wheel W2 even though it appears small and distorted in the captured image P.

As described above, according to the present invention, a plurality of processed images are generated by processing the captured image, and the feature information calculated from each of them is used, so that the imaged object in the captured image can be recognized with high accuracy. Supplementary explanations of the present invention follow.

(1) The embodiments of the present invention can be combined in various ways. For example, the second embodiment (planar projective transformation) may be applied using the processing information M201 with which the recognition unit 4 obtained the best recognition result (maximum similarity score) in the first embodiment (resolution conversion). In this case, the captured image P that the imaging unit 1 outputs for recognition in the second embodiment is replaced with the image obtained by applying the processing information M201, and the second embodiment is then applied as described above.

(2) The first embodiment (resolution conversion) was described as proceeding from step S5 to step S6 at the first iteration k = 1, without a second or later loop iteration. As a modification, iterations k ≥ 2 can also be performed for resolution conversion. In this case, the feature information f[k] is obtained by integrating the processed images of the respective resolutions, while recording from which resolution conversion each piece of integrated information was generated. A tree structure like the one described for planar projective transformations with reference to FIG. 4 is then preset for resolution conversion, so that at the next iteration k+1, processing at resolutions neighboring the resolution that contributes the most feature information judged to be inliers by RANSAC or a similar method is applied. If f[k] was integrated by averaging over the processed images of the respective resolutions, the resolution contributing the most inlier feature information can be determined by summing the weighting coefficients used in the averaging.
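The provenance bookkeeping described in (2) can be sketched as follows. Assumptions not stated in the source: corresponding points across processed images are already keyed by a shared point ID (for example rounded coordinates in the original image), averaging uses equal weights, and the inlier set comes from a RANSAC-style geometric verification that is not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

Descriptor = Sequence[float]
PointId = Hashable  # hypothetical key shared by corresponding points across processed images


def integrate(per_source: Dict[str, Dict[PointId, Descriptor]]
              ) -> List[Tuple[PointId, List[float], Dict[str, float]]]:
    """Average descriptors of corresponding points across processed images.

    Returns (point_id, mean_descriptor, weights) tuples, where weights records how much
    each source (e.g. each resolution) contributed to that average.
    """
    by_point: Dict[PointId, Dict[str, Descriptor]] = defaultdict(dict)
    for source, feats in per_source.items():
        for pid, desc in feats.items():
            by_point[pid][source] = desc
    integrated = []
    for pid in sorted(by_point):
        descs = by_point[pid]
        n = len(descs)
        dim = len(next(iter(descs.values())))
        mean = [sum(d[j] for d in descs.values()) / n for j in range(dim)]
        integrated.append((pid, mean, {source: 1.0 / n for source in descs}))
    return integrated


def dominant_source(integrated, inliers: Set[PointId]) -> Optional[str]:
    """Sum averaging weights over inlier features; return the source with the largest total."""
    totals: Dict[str, float] = defaultdict(float)
    for pid, _, weights in integrated:
        if pid in inliers:
            for source, w in weights.items():
                totals[source] += w
    return max(totals, key=totals.get) if totals else None


if __name__ == "__main__":
    per_source = {
        "0.5x": {(10, 10): [1.0, 0.0], (20, 20): [0.0, 1.0]},
        "1.0x": {(10, 10): [1.0, 0.0], (30, 30): [1.0, 1.0]},
        "2.0x": {(30, 30): [1.0, 1.0], (40, 40): [0.0, 0.0]},
    }
    merged = integrate(per_source)
    print(dominant_source(merged, inliers={(30, 30), (40, 40)}))
```

The returned label would select which subtree of neighboring resolutions to explore at iteration k+1. The same bookkeeping applies unchanged to (3) if the source labels name planar projective transformations instead of resolutions.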

(3) In the second embodiment, in which planar projective transformations are applied repeatedly and hierarchically in a tree, the integrated feature information f[k] of the first embodiment may also be used. In this case too, as in (2) above for repeated resolution conversion, record from which planar projective transformation each piece of integrated feature information was generated (when averaging, record its weight), and at the next iteration k+1, apply the planar projective transformations neighboring the one that contributes the most feature information judged to be inliers by RANSAC or a similar method.

(4) The information processing apparatus 10 can be implemented as a computer with a general configuration, namely one comprising a CPU (central processing unit), a main memory that provides a work area for the CPU, an auxiliary storage device such as a hard disk or SSD, an input interface such as a keyboard, mouse, or touch panel that receives user input, a communication interface for connecting to a network, a display, a camera, and a bus connecting these components. The processing of each unit of the information processing apparatus 10 shown in FIG. 1 can be realized by the CPU reading and executing a program that performs that processing, although any part of the processing may instead be realized by a separate dedicated circuit or the like (including a GPU). The imaging unit 1 can be realized by the camera of that hardware.

10 … Information processing apparatus, 1 … Imaging unit, 2 … Processing unit, 3 … Calculation unit, 4 … Recognition unit, 5 … Storage unit

Claims

An information processing apparatus comprising: a processing unit that obtains processed images by applying a plurality of processing operations to an image in which an imaging target is captured;
a calculation unit that calculates feature information from each of the processed images; and
a recognition unit that recognizes which of a plurality of faces of the same object the imaging target in the image corresponds to, by collating the calculated feature information with feature information calculated and stored from a plurality of images of that object, which has a plurality of faces with different orientations, each image being captured with one of the faces squarely facing the camera,
wherein the processing unit obtains the processed images by performing conversion processing using a plurality of planar projective transformations.

The information processing apparatus according to claim 1, wherein the plurality of planar projective transformations includes at least one that changes the orientation of a plane.

The information processing apparatus according to claim 1 or 2, wherein the recognition unit collates the feature information of each of the processed images with feature information stored in advance for each of a predetermined plurality of recognition targets and, based on the similarity between each processed image and each recognition target, recognizes which of the predetermined plurality of recognition targets the imaging target in the image corresponds to.

The information processing apparatus according to claim 1 or 2, wherein the calculation unit calculates feature points and feature values from each of the processed images and, by averaging the feature values at feature points whose coordinates correspond across the processed images, calculates feature information corresponding to an image into which the plurality of processed images are integrated; and
the recognition unit recognizes which of the plurality of faces the imaging target corresponds to based on the feature information corresponding to the integrated image.

The information processing apparatus according to any one of claims 1 to 4, wherein the stored feature information against which the recognition unit collates is calculated only from a plurality of images captured with each of the plurality of faces squarely facing the camera.

A program that causes a computer to function as the information processing apparatus according to any one of claims 1 to 5.