JP2018010359A

JP2018010359A - Information processor, information processing method, and program

Info

Publication number: JP2018010359A
Application number: JP2016136928A
Authority: JP
Inventors: 檜垣　欣成; Kinsei Higaki; 欣成檜垣
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-07-11
Filing date: 2016-07-11
Publication date: 2018-01-18

Abstract

PROBLEM TO BE SOLVED: To provide an information processor capable of finding corresponding points between images with high precision and high density while suppressing calculation costs.
SOLUTION: An information processor has: acquisition means for acquiring a first single-viewpoint image and a second single-viewpoint image obtained by imaging the same subject from different viewpoints; generation means for generating a plurality of feature quantity maps for each of the first and second single-viewpoint images by applying, in stages and with the filters changed at each stage, filters for detecting a specific structure in an image; and search means for searching for corresponding points between the first and second single-viewpoint images based on the generated feature quantity maps. The filters that the generation means applies at each stage are a plurality of mutually different filters.
SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique for searching for corresponding points between images taken from different viewpoints.

There are techniques that acquire information about the distance and shape of a subject from a plurality of images of the same subject viewed from different viewpoints (multi-viewpoint images). There are also techniques that estimate the position and orientation of a camera from multi-viewpoint images. Furthermore, there are techniques that combine a plurality of images for purposes such as panorama creation, noise reduction, and super-resolution. All of these techniques require searching for points that correspond to each other across the images (corresponding points).

Patent Document 1 discloses a method of searching for corresponding points between two input images using a neural network. In Patent Document 1, each of the two input images is divided into a plurality of rectangular regions, a feature vector is calculated for each rectangular region, and the process of searching, based on the calculated feature vectors, for the rectangular region of the second input image that corresponds to a rectangular region of the first input image is repeated.

Patent Document 1: JP 2009-205553 A

In Patent Document 1, calculating the feature vector of each rectangular region requires neural-network learning to be performed for every input image, so the calculation cost is high. In addition, because corresponding points are searched for in units of rectangular regions, the density of corresponding points is low, and a low density of corresponding points inevitably lowers the matching accuracy. Raising the density requires dividing the input images more finely and repeating the learning-based feature generation, which further increases an already high calculation cost. Patent Document 1 therefore has the problem that it is difficult to obtain corresponding points between input images with high accuracy and high density while suppressing calculation cost.

Accordingly, an object of the present invention is to provide an information processing apparatus capable of obtaining corresponding points between images with high accuracy and high density while suppressing calculation cost.

The present invention is an information processing apparatus comprising: acquisition means for acquiring a first single-viewpoint image and a second single-viewpoint image obtained by imaging the same subject from different viewpoints; creation means for creating a plurality of feature maps for each of the first single-viewpoint image and the second single-viewpoint image by applying, in stages and with the filters changed at each stage, filters for detecting a specific structure in an image to each of the two images; and search means for searching for corresponding points between the first single-viewpoint image and the second single-viewpoint image based on the created feature maps, wherein the filters applied by the creation means at each stage are a plurality of mutually different filters.

According to the present invention, corresponding points between images can be obtained with high accuracy and high density while suppressing calculation cost.

FIG. 1 is a block diagram showing the hardware configuration of the information processing apparatus in Example 1.
FIG. 2 is a block diagram showing the functional configuration of the information processing apparatus in Example 1.
FIG. 3 is a flowchart showing the flow of processing by the information processing apparatus in Example 1.
FIG. 4 is a block diagram showing the functional configuration of the feature generation unit.
FIG. 5 is a flowchart showing the flow of processing by the feature generation unit.
FIG. 6 is a block diagram showing the functional configuration of the feature generation unit.
FIG. 7 is a flowchart showing the flow of processing by the feature generation unit.
FIG. 8 is a diagram showing the input images used in Example 1.
FIG. 9 is a diagram showing the filters used in Example 1.
FIG. 10 is a diagram showing the results of parallax estimation.
FIG. 11 is a block diagram showing the functional configuration of the information processing apparatus in Example 2.
FIG. 12 is a flowchart showing the flow of processing by the information processing apparatus in Example 2.

Preferred embodiments of the present invention are described below by way of example with reference to the drawings. The relative arrangement of components, the shapes of devices, and the like described below are merely examples and are not intended to limit the scope of the invention to them. It should be understood that embodiments obtained by appropriately modifying or improving the embodiments described below, based on the ordinary knowledge of those skilled in the art and without departing from the spirit of the invention, also fall within the scope of the present invention.

[Example 1]
This example describes the case of creating a parallax map from a multi-viewpoint image (a plurality of images). The multi-viewpoint image may be acquired with a single camera that captures the subject from several viewpoints simultaneously (such as a plenoptic camera or a multi-lens camera) or with a plurality of appropriately placed cameras. It may also be acquired by capturing the subject while moving a single camera. Hereinafter, the image for each viewpoint included in the multi-viewpoint image is referred to as a single-viewpoint image.

<Configuration of information processing apparatus>
The configuration of the information processing apparatus in Example 1 is described below. FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing apparatus in Example 1. The information processing apparatus 100 in Example 1 (hereinafter abbreviated as processing apparatus 100) comprises a CPU 101, a RAM 102, a ROM 103, a secondary storage device 104, an input interface 105 (hereinafter, interface is abbreviated as IF), and an output IF 106. These components are connected to one another by a system bus 107. The processing apparatus 100 is connected to an external storage device 108 and an operation unit 110 via the input IF 105, and to the external storage device 108 and a display device 109 via the output IF 106.

The CPU 101 executes programs stored in the ROM 103 using the RAM 102 as work memory, and controls each component of the processing apparatus 100 via the system bus 107. The various processes described later are executed in this way. The secondary storage device 104 stores the various data handled by the processing apparatus 100; an HDD is used in this example. The CPU 101 writes data to and reads data from the secondary storage device 104 via the system bus 107. Besides an HDD, various storage devices such as an optical disk drive or flash memory can be used as the secondary storage device 104.

The input IF 105 includes a serial bus IF such as USB or IEEE 1394, and data, commands, and the like are input from external devices to the processing apparatus 100 via the input IF 105. Specifically, the processing apparatus 100 acquires data from the external storage device 108 via the input IF 105. A hard disk, memory card, CF card, SD card, USB memory, or the like can be used as the external storage device 108. The processing apparatus 100 also acquires, via the input IF 105, commands that the user enters with the operation unit 110. The operation unit 110 is a device for inputting user instructions to the processing apparatus 100 and includes, for example, a mouse and a keyboard.

In addition to a serial bus interface such as USB or IEEE 1394, like the input IF 105, the output IF 106 includes a video output terminal such as DVI or HDMI (registered trademark). Data and the like are output from the processing apparatus 100 to external devices via the output IF 106. The processing apparatus 100 displays images by outputting processed images and the like to the display device 109 (such as a liquid crystal display) via the output IF 106. The processing apparatus 100 has components other than those described above, but they are not the focus of the present invention and their description is omitted.

<Outline of processing executed by information processing apparatus>
The following outlines the processing, executed by the processing apparatus 100 in this example, that creates a parallax map from a multi-viewpoint image (hereinafter, this processing).

First, the CPU 101 reads filters (filter data) from the external storage device 108. These filters are acquired before this processing and may have been learned in advance from training images that are separate from the multi-viewpoint image to be processed. The training images are preferably images of a subject identical or similar to the subject of the multi-viewpoint image to be processed, or of a scene identical or similar to its scene. A first specific example of a filter learning method is as follows. A large number of partial images are extracted from the training images, a covariance matrix is generated for each extracted partial image, and the average of these covariance matrices (the mean covariance matrix) is calculated. The filters are then obtained by applying principal component analysis via singular value decomposition, or eigenvalue analysis, to this mean covariance matrix. A second specific example is to learn the filters from the training images using a known learning algorithm for convolutional neural networks. The filter learning method is not limited to these examples. Moreover, besides filters learned from training images, the image to be processed or an analytically given function (for example, a discrete cosine transform basis) may be used as a filter.
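The first learning method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `learn_pca_filters`, the patch size, the number of patches, the random sampling, and the choice of the outer product of the mean-removed patch as the per-patch covariance are all hypothetical.

```python
import numpy as np

def learn_pca_filters(train_img, patch=5, n_filters=4, n_patches=500, seed=0):
    """Learn n_filters filters of size patch x patch from one training image."""
    rng = np.random.default_rng(seed)
    h, w = train_img.shape
    d = patch * patch
    mean_cov = np.zeros((d, d))
    for _ in range(n_patches):
        # Sample the top-left corner of a partial image (assumed: uniform random).
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        p = train_img[y:y + patch, x:x + patch].astype(float).ravel()
        p -= p.mean()                       # remove the patch mean
        mean_cov += np.outer(p, p)          # per-patch covariance (assumed form)
    mean_cov /= n_patches                   # mean covariance matrix
    # Eigenvalue analysis of the symmetric matrix; eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(mean_cov)
    order = np.argsort(eigvals)[::-1][:n_filters]
    # Each leading eigenvector, reshaped to patch x patch, is used as one filter.
    return eigvecs[:, order].T.reshape(n_filters, patch, patch)
```

The returned filters are orthonormal when flattened, so each one responds to a different dominant structure of the training patches.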

Next, as shown in Equation (1), a plurality of first filters are convolved with (applied to) each single-viewpoint image included in the multi-viewpoint image to obtain first feature maps. In this specification, a filter that is convolved with a single-viewpoint image in the first stage is called a first filter, and there are a plurality of first filters.

In Equation (1), F_1i(x, y) is the coefficient of the i-th first filter at coordinates (x, y), I_k(x, y) is the pixel value of the k-th single-viewpoint image at coordinates (x, y), and T_ik(x, y) is the pixel value at coordinates (x, y) of the first feature map obtained by convolving the i-th first filter with the k-th single-viewpoint image. The pixel value of a single-viewpoint image is a value such as luminance, color difference (that is, a component other than luminance in a color space such as YUV or Lab), or a color channel (for example, RGB).
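Equation (1) itself does not survive in this text. From the definitions above it is presumably a two-dimensional convolution of the following form; the summation variables (u, v) and the exact index convention are assumptions.

```latex
T_{ik}(x, y) = (F_{1i} * I_k)(x, y)
             = \sum_{u}\sum_{v} F_{1i}(u, v)\, I_k(x - u,\; y - v) \tag{1}
```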

Convolving a filter with an input image detects a specific structure in the single-viewpoint image. For example, when a Sobel filter that has a gradient in the horizontal direction and a uniform distribution of values in the vertical direction is convolved with an input image, the pixel values of the output image (feature map) become large at the positions of vertical edges in the input image. A feature map thus represents the spatial distribution of the structure that corresponds to the convolved filter.
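The Sobel example can be checked with a small sketch. The helper `conv2d_same`, the edge-replicating border handling, and the synthetic step-edge image are assumptions made for illustration; the patent does not specify how image borders are treated.

```python
import numpy as np

def conv2d_same(img, kernel):
    """2-D convolution (odd-sized kernel) with edge-replicated borders."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]            # convolution flips the kernel
    padded = np.pad(img.astype(float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape)
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + img.shape[0],
                                            dx:dx + img.shape[1]]
    return out

# Horizontal-gradient Sobel kernel: responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Synthetic image: dark left half, bright right half (a vertical edge).
img = np.zeros((6, 8))
img[:, 4:] = 1.0

feature_map = np.abs(conv2d_same(img, sobel_x))
```

In `feature_map`, only the two columns adjacent to the step (columns 3 and 4) are non-zero, which is the spatial distribution of the vertical-edge structure.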

Next, as shown in Equation (2), second filters are convolved with the first feature maps to obtain second feature maps. In this specification, a filter that is convolved in the second stage (that is, after a first filter) is called a second filter, and there are a plurality of second filters.

In Equation (2), F_2j(x, y) is the coefficient of the j-th second filter at coordinates (x, y), and O_ijk(x, y) is the pixel value of the second feature map at coordinates (x, y).
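Equation (2) is likewise missing from this text. Given the definitions above and the description of the first stage, it presumably has the following form; the summation variables (u, v) are an assumption.

```latex
O_{ijk}(x, y) = (F_{2j} * T_{ik})(x, y)
              = \sum_{u}\sum_{v} F_{2j}(u, v)\, T_{ik}(x - u,\; y - v) \tag{2}
```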

A nonlinear transformation may be applied to the obtained feature maps. The purpose of the nonlinear transformation is to improve the search accuracy for corresponding points by emphasizing edges in the image and amplifying differences between pixels. Examples of nonlinear transformations include statistical filtering, thresholding, and per-pixel value conversion by a nonlinear function. Unless otherwise noted, the following description assumes that no nonlinear transformation is applied.

In the example above, the second feature maps are obtained as the final result by convolving the first filters and the second filters with each single-viewpoint image, but the feature maps finally obtained are not limited to the second feature maps. That is, the same process of convolving an (n+1)-th filter with an n-th feature map (n is a natural number) to obtain an (n+1)-th feature map may be executed any number of times, and the resulting (n+1)-th feature maps may be used as the final feature maps. A single convolution stage is also acceptable, but increasing the number of stages makes it possible to extract more complex structures in the image.
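The staged filtering described above generalizes naturally to any number of stages. The following is a minimal sketch under assumptions: `conv2d_same` and `cascade_feature_maps` are hypothetical names, borders are edge-replicated, and every map from one stage is convolved with every filter of the next stage.

```python
import numpy as np

def conv2d_same(img, kernel):
    """2-D convolution (odd-sized kernel) with edge-replicated borders."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    padded = np.pad(img.astype(float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape)
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + img.shape[0],
                                            dx:dx + img.shape[1]]
    return out

def cascade_feature_maps(img, filter_banks):
    """Apply the filter banks stage by stage.

    filter_banks[0] holds the first filters, filter_banks[1] the second
    filters, and so on. With M first and N second filters the result is a
    list of M * N maps ordered (i=1, j=1), (i=1, j=2), ..., (i=M, j=N).
    """
    maps = [img.astype(float)]
    for bank in filter_banks:
        maps = [conv2d_same(m, f) for m in maps for f in bank]
    return maps
```

Stacking the result with `np.stack(maps, axis=-1)` yields one M × N-dimensional feature vector per pixel, which is the form used in the corresponding-point search.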

Various local features, such as SIFT (Scale-Invariant Feature Transform), are known as features commonly used in this technical field. Because such local features are calculated only at positions in the image that satisfy specific conditions, the density of corresponding points found with them is far lower than the image resolution (pixel density). In contrast, the present invention can in principle calculate a feature for every pixel, so the density of corresponding points can equal the image resolution. Furthermore, because the present invention calculates features by filtering alone, the calculation cost per point can be kept very low.

In this example, after acquiring the second feature maps, the processing apparatus 100 searches for corresponding points between the single-viewpoint images based on those maps. The following description takes as an example the case of creating a parallax map from two single-viewpoint images acquired by a stereo camera, namely a first single-viewpoint image I_1 and a second single-viewpoint image I_2. The parallax map in this example is image data in bitmap format whose pixel value at each pixel position is the parallax value for that position. A stereo camera captures a subject simultaneously from a plurality of different directions and can therefore also acquire information in the depth direction; the horizontal directions of the two single-viewpoint images acquired by a stereo camera coincide.

To estimate the parallax at coordinates (x, y) of the single-viewpoint image I_1, the values at coordinates (x, y) of the second feature maps O_ij1 for I_1 are first arranged in the order of i and j to obtain a feature vector V_1(x, y). For example, V_1(x, y) = (O_111(x, y), O_121(x, y), O_131(x, y), ...). When there are M first filters and N second filters, the dimension of V_1(x, y) is M × N.

Next, the values at coordinates (x', y') of the second feature maps O_ij2 for the single-viewpoint image I_2 are arranged in the order of i and j to obtain a feature vector V_2(x', y'). The ordering of i and j used here is the same as the ordering used to build V_1(x, y) from the values of O_ij1 at coordinates (x, y). The feature vector V_2(x', y') is acquired repeatedly while the coordinates (x', y') are varied. In this example the multi-viewpoint image is acquired by a stereo camera, so the range over which (x', y') moves can be limited to the horizontal line in I_2 that passes through coordinates (x, y).

Next, the similarity between the feature vector V_1(x, y) and the feature vector V_2(x', y') is quantified. Examples of measures that can be used include various commonly used distances (Euclidean distance, Manhattan distance, Hamming distance, and so on) and the cross-correlation coefficient.
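A few of the measures named above can be written down directly. The function names are hypothetical. Note that a distance is a dissimilarity, so "maximum similarity" corresponds to minimum distance, whereas for the cross-correlation coefficient larger values mean greater similarity.

```python
import numpy as np

def euclidean(v1, v2):
    """Euclidean (L2) distance; smaller means more similar."""
    return float(np.linalg.norm(np.asarray(v1, float) - np.asarray(v2, float)))

def manhattan(v1, v2):
    """Manhattan (L1) distance; smaller means more similar."""
    return float(np.abs(np.asarray(v1, float) - np.asarray(v2, float)).sum())

def cross_correlation(v1, v2):
    """Normalized cross-correlation coefficient in [-1, 1]; larger is more similar."""
    a = np.asarray(v1, float) - np.mean(v1)
    b = np.asarray(v2, float) - np.mean(v2)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A constant vector has no variation; returning 0 here is a convention.
    return float(a @ b / denom) if denom > 0 else 0.0
```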

Next, the coordinates (x', y') that maximize the similarity are derived, and the distance between the derived coordinates (x', y') and the coordinates (x, y) is output as the estimated parallax value. Executing this processing for all coordinates of the single-viewpoint image I_1 yields the parallax map.
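For the stereo case, the search described above can be sketched as a brute-force scan along each row. This is an illustration under assumptions, not the patent's implementation: the name `disparity_map`, the squared Euclidean distance as the cost, the search limit `max_disp`, and the search direction x' = x - d (image 1 taken as the left view) are all hypothetical choices.

```python
import numpy as np

def disparity_map(feat1, feat2, max_disp):
    """Estimate a per-pixel disparity for image 1.

    feat1, feat2: arrays of shape (H, W, D) holding a D-dimensional feature
    vector per pixel (for example, the stacked second feature maps).
    """
    h, w, _ = feat1.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            best_d, best_cost = 0, np.inf
            # Candidates lie on the same row of image 2 (assumed: to the left).
            for d in range(0, min(max_disp, x) + 1):
                cost = np.sum((feat1[y, x] - feat2[y, x - d]) ** 2)
                if cost < best_cost:        # minimum distance = maximum similarity
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Because a feature vector exists for every pixel, the resulting map has the same resolution as the input images.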

Although the example above describes acquiring the multi-viewpoint image with a stereo camera, the means of acquiring the multi-viewpoint image is not limited to a stereo camera. When a stereo camera is not used, the same processing is performed with a wider range of movement for the coordinates (x', y') when acquiring the feature vector V_2(x', y'). Alternatively, when information on the position and orientation of the camera at the time each single-viewpoint image was captured is available, the range of movement of (x', y') can be limited to the epipolar line that is uniquely determined from this information.
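Restricting the search to an epipolar line can be sketched as follows, assuming the camera information has already been converted into a 3 × 3 fundamental matrix `F` such that a point p in image 1 maps to the line F p in image 2. The patent does not describe this conversion; the function name `epipolar_candidates` and the sampling along integer x' are hypothetical.

```python
import numpy as np

def epipolar_candidates(F, x, y, width):
    """Return candidate (x', y') positions in image 2 on the epipolar line of (x, y).

    The line is a*x' + b*y' + c = 0 with (a, b, c) = F @ (x, y, 1).
    """
    a, b, c = np.asarray(F, float) @ np.array([x, y, 1.0])
    if abs(b) < 1e-12:
        raise ValueError("near-vertical epipolar line; sample along y' instead")
    xs = np.arange(width, dtype=float)
    ys = -(a * xs + c) / b
    return np.stack([xs, ys], axis=1)

# For an ideal rectified stereo pair the epipolar line is the row y' = y.
F_rectified = np.array([[0.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])
candidates = epipolar_candidates(F_rectified, x=10, y=7, width=16)
```

With `F_rectified`, every candidate has y' = 7, which reproduces the horizontal-line search of the stereo camera case.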

<Processing executed by information processing apparatus>
The specific processing executed by the processing apparatus 100 in this example is described below with reference to FIG. 2 and FIG. 3. FIG. 2 is a block diagram showing the functional configuration of the processing apparatus 100 in this example. As illustrated, the processing apparatus 100 has an acquisition unit 201, a feature generation unit 202, a corresponding point search unit 203, and an output unit 204. These components are realized when the CPU 101 of the processing apparatus 100 loads a control program stored in the ROM 103 into the RAM 102 and executes it. Alternatively, the processing apparatus 100 may be configured with a dedicated processing circuit for each component.

The acquisition unit 201 acquires a multi-viewpoint image and outputs it to the feature generation unit 202. In this example, the acquisition unit 201 acquires a first single-viewpoint image, which is an image of the subject viewed from a first viewpoint, and a second single-viewpoint image, which is an image of the same subject viewed from a second viewpoint different from the first. The first single-viewpoint image is acquired by imaging the subject from the first viewpoint, and the second single-viewpoint image by imaging the subject from the second viewpoint. The first and second single-viewpoint images may be data input from an external device or data stored in the secondary storage device 104.

Using the filters acquired in advance, the feature generation unit 202 creates a plurality of feature maps for the first single-viewpoint image and outputs them to the corresponding point search unit 203. Using the same filters, it also creates a plurality of feature maps for the second single-viewpoint image and outputs them to the corresponding point search unit 203.

The corresponding point search unit 203 searches for corresponding points between the first single-viewpoint image and the second single-viewpoint image based on the plurality of feature maps for the first single-viewpoint image and the plurality of feature maps for the second single-viewpoint image.

The output unit 204 outputs a parallax map for the first single-viewpoint image and the second single-viewpoint image based on the search result of the corresponding point search unit 203.

FIG. 3 is a flowchart of the processing executed by the processing apparatus 100 in this embodiment. In step S301, the acquisition unit 201 acquires the multi-viewpoint image to be processed, either via the input interface 105 or from the secondary storage device 104, and outputs it to the feature amount generation unit 202. This embodiment is described for the case where the multi-viewpoint image acquired by the acquisition unit 201 consists of two single-viewpoint images. However, the number of single-viewpoint images in the multi-viewpoint image is not limited to two and may be three or more. In that case, one or more pairs of single-viewpoint images are formed, and the subsequent processing is performed on each pair to create a parallax map. The parallax map is not limited to image data in bitmap format; it may also be output as a table that defines the relationship between pixel position and parallax value. When a plurality of parallax maps are created for one single-viewpoint image, they are combined and a single parallax map is finally output. The maps may be combined, for example, by averaging the pixel values of the parallax maps at each coordinate, or by computing a weighted sum of those pixel values at each coordinate.
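The two combination methods mentioned above (per-coordinate averaging and per-coordinate weighted addition) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `merge_parallax_maps` is hypothetical, and normalizing the weights so that they sum to 1 is an assumption the text does not state.

```python
import numpy as np

def merge_parallax_maps(maps, weights=None):
    """Combine parallax maps of identical shape, coordinate by coordinate.

    Without weights this is a plain average. With weights it is a weighted
    sum; the weights are normalized to sum to 1 (an assumption made here).
    """
    stack = np.stack([np.asarray(m, dtype=np.float64) for m in maps])
    if weights is None:
        return stack.mean(axis=0)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    # Contract the weight vector against the first (map) axis of the stack.
    return np.tensordot(w, stack, axes=1)

# Two toy parallax maps with constant values 4 and 6.
map_a = np.full((2, 3), 4.0)
map_b = np.full((2, 3), 6.0)
mean_map = merge_parallax_maps([map_a, map_b])
weighted_map = merge_parallax_maps([map_a, map_b], weights=[3, 1])
```

Here `mean_map` is 5 everywhere and `weighted_map` is 0.75 × 4 + 0.25 × 6 = 4.5 everywhere.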

In step S302, the feature amount generation unit 202 creates a plurality of feature amount maps for each single-viewpoint image by sequentially convolving a plurality of filters with each of the single-viewpoint images input from the acquisition unit 201. Hereinafter, such processing is referred to as sequential or stepwise filter processing. The filters used in this step are read from the external storage device 108. FIG. 4 is a functional block diagram of the feature amount generation unit 202 for the case where a first filter and a second filter are sequentially convolved with a single-viewpoint image. As shown in FIG. 4, the feature amount generation unit 202 has a filter processing unit 211, which convolves the first filter with each single-viewpoint image, and a filter processing unit 213, which convolves the second filter with the output of the filter processing unit 211. FIG. 5 is a flowchart of the processing executed by the feature amount generation unit 202 shown in FIG. 4. As shown in FIG. 5, in step S311 the filter processing unit 211 convolves the first filter with the single-viewpoint image (that is, executes the first filter processing). Next, in step S313, the filter processing unit 213 convolves the second filter with the output of the filter processing unit 211 (that is, executes the second filter processing). Although the case of convolving a first filter and a second filter is described here, the number of sequentially convolved filters is not limited to two and may be three or more. For example, when a third filter is also convolved, third filter processing is executed after the second filter processing.
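As a rough illustration of the two-stage filter processing of steps S311 and S313, the sketch below convolves every first filter with an image and then every second filter with each first-stage output, giving one feature amount map per filter combination. The helper names `convolve2d_same` and `staged_feature_maps`, the zero padding at the image border, and the random stand-in filters are all assumptions made for this example.

```python
import numpy as np

def convolve2d_same(image, kernel):
    """2-D convolution with zero padding; the output has the size of `image`.

    Assumes an odd-sized kernel. Convolution is computed as correlation
    with the kernel flipped in both directions.
    """
    flipped = kernel[::-1, ::-1]
    kh, kw = flipped.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def staged_feature_maps(image, first_filters, second_filters):
    """Return one feature amount map per (first filter, second filter) pair."""
    maps = []
    for f1 in first_filters:
        stage1 = convolve2d_same(image, f1)            # corresponds to step S311
        for f2 in second_filters:
            maps.append(convolve2d_same(stage1, f2))   # corresponds to step S313
    return np.stack(maps)

rng = np.random.default_rng(0)
image = rng.random((12, 16))
first_filters = rng.standard_normal((8, 5, 5))    # stand-ins for real filters
second_filters = rng.standard_normal((8, 5, 5))
features = staged_feature_maps(image, first_filters, second_filters)
```

With 8 first filters and 8 second filters, `features` holds 64 maps of the same size as the input, so `features[:, y, x]` is a 64-dimensional feature vector for pixel `(y, x)`.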

Note that the nonlinear transformation processing described above may be executed after the filter processing. FIG. 6 is a functional block diagram of the feature amount generation unit 202 for the case where nonlinear transformation processing is executed in addition to the two-stage filter processing, and FIG. 7 is a flowchart of the processing executed by the feature amount generation unit 202 shown in FIG. 6. As the nonlinear transformation processing in step S312 or S314 in FIG. 7, a known transformation used in neural networks, such as tanh, sigmoid, or ReLU, may be used.
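The three nonlinear transformations named above are standard element-wise functions. A minimal sketch of applying one of them to a filter output follows; the dictionary `NONLINEARITIES` and the function `apply_nonlinearity` are hypothetical names introduced only for this example.

```python
import numpy as np

# Element-wise nonlinear transformations commonly used in neural networks.
NONLINEARITIES = {
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "relu": lambda x: np.maximum(x, 0.0),
}

def apply_nonlinearity(feature_map, kind="relu"):
    """Apply the chosen transformation to every element of a filter output."""
    return NONLINEARITIES[kind](np.asarray(feature_map, dtype=np.float64))

sample = np.array([[-2.0, 0.0, 3.0]])
relu_out = apply_nonlinearity(sample, "relu")
sigmoid_out = apply_nonlinearity(sample, "sigmoid")
tanh_out = apply_nonlinearity(sample, "tanh")
```

In a pipeline like the one in FIG. 7, such a function would be called on the output of each filter stage before the next filter is applied.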

Returning to the description of FIG. 3, in step S303 the feature amount generation unit 202 determines whether creation of the plurality of feature amount maps for each single-viewpoint image is complete, that is, whether feature amount maps corresponding to all combinations of the first filter and the second filter have been created for each single-viewpoint image. If the determination result in step S303 is true, the process proceeds to step S304; if it is false, the process returns to step S302.

In step S304, the corresponding point search unit 203 searches for corresponding points between the first single-viewpoint image and the second single-viewpoint image based on the feature amount maps created in step S302. Here, the corresponding point search unit 203 may adaptively change the corresponding point search range for the target position (target pixel position) based on the corresponding point search results at neighboring positions. For example, when a parallax map is first acquired with coarse sampling (low resolution) and parallax values are then calculated (corresponding points are searched for) at positions between the sampling positions, candidate values are determined from the parallax values already calculated at neighboring positions, and the parallax value is calculated within the range of those candidates. As another example, when the sampling positions are scanned and parallax values are calculated sequentially, the parallax value at the sampling position with the highest similarity, chosen from among the parallax values of the new sampling position and its neighboring sampling positions, may be adopted as the parallax value of the target position. Yet another example is a method of calculating the parallax value that minimizes a cost function based on a Markov random field.
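The first example above (restricting the search to candidates derived from neighboring parallax values) might look like the following sketch. It assumes purely horizontal parallax, the sign convention that pixel `x` in the first image corresponds to pixel `x + d` in the second, and squared Euclidean distance between feature vectors as the dissimilarity measure. The function names and the `margin` parameter are hypothetical.

```python
import numpy as np

def search_parallax(feat_a, feat_b, y, x, candidates):
    """Return the candidate d minimizing the squared distance between
    feat_a[:, y, x] and feat_b[:, y, x + d] (smaller distance = more similar)."""
    best_d, best_cost = None, np.inf
    for d in candidates:
        xb = x + d
        if 0 <= xb < feat_b.shape[2]:
            cost = float(np.sum((feat_a[:, y, x] - feat_b[:, y, xb]) ** 2))
            if cost < best_cost:
                best_d, best_cost = d, cost
    return best_d

def narrowed_candidates(neighbor_parallaxes, margin=1):
    """Candidate range spanned by already-computed neighboring values."""
    lo = min(neighbor_parallaxes) - margin
    hi = max(neighbor_parallaxes) + margin
    return range(lo, hi + 1)

# Toy feature maps: the second view is the first shifted by 5 pixels.
rng = np.random.default_rng(1)
feat_a = rng.standard_normal((8, 6, 20))
feat_b = np.roll(feat_a, 5, axis=2)

full_search = search_parallax(feat_a, feat_b, y=3, x=4, candidates=range(-8, 9))
local_search = search_parallax(feat_a, feat_b, y=3, x=4,
                               candidates=narrowed_candidates([4, 5, 6]))
```

Both searches recover the shift of 5, but the narrowed search evaluates 5 candidates instead of 17, which is the point of adapting the range to nearby results.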

In step S305, the output unit 204 converts the corresponding point search result into a format such as a parallax map and outputs it.

The above is the processing for obtaining corresponding points between single-viewpoint images in this embodiment. According to this embodiment, the complex structures in each single-viewpoint image can be extracted effectively during parallax estimation between single-viewpoint images, so the accuracy of parallax estimation improves and the estimation results become more stable.

<Effects of this embodiment>
To illustrate the effects of this embodiment, an example in which the above processing was actually performed is shown below. In this example, two images with parallel optical axes and a parallax of 5 pixels in the horizontal direction only are used as input images. FIGS. 8(a) and 8(b) show the input images used in this example. The two input images shown in FIG. 8 are an artificially created pair of parallax images, obtained by applying a uniform parallax to the same original image and then applying different blur and luminance modulation to each.

FIG. 9(a) shows eight first filters obtained by extracting 80,000 partial images of 5 × 5 pixels from a large number of natural images prepared as training images, calculating a mean covariance matrix, and performing principal component analysis on that matrix. FIG. 9(b) shows eight second filters obtained by convolving each of the eight first filters with the 80,000 extracted partial images to obtain 640,000 partial images, calculating a mean covariance matrix, and performing principal component analysis on that matrix. As shown, all of the filters are 5 × 5 in size.
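The derivation of the first filters can be sketched as a principal component analysis of image patches. The code below is only an approximation of the procedure described: it interprets the "mean covariance matrix" as a single covariance matrix computed over all mean-subtracted patches, samples patches at random positions, and uses far fewer patches than the 80,000 in the example. The function name `learn_pca_filters` and all of its parameters are hypothetical.

```python
import numpy as np

def learn_pca_filters(images, patch=5, n_filters=8, n_patches=2000, seed=0):
    """Derive filters as the leading principal components of image patches."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - patch + 1)
        x = rng.integers(img.shape[1] - patch + 1)
        rows.append(img[y:y + patch, x:x + patch].ravel())
    data = np.asarray(rows, dtype=np.float64)
    data -= data.mean(axis=0)                  # remove the mean patch
    cov = data.T @ data / len(data)            # (patch*patch) x (patch*patch)
    _, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_filters]      # keep the largest components
    return top.T.reshape(n_filters, patch, patch)

# Random arrays stand in for the natural training images.
rng = np.random.default_rng(42)
training_images = [rng.random((32, 32)) for _ in range(3)]
filters = learn_pca_filters(training_images)
```

Second-stage filters could be obtained the same way by first convolving the patches with each first filter and then repeating the analysis on the results, as the text describes for FIG. 9(b).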

In this example, the parallax estimation error is evaluated as the mean square of the difference from the true value over the region excluding the image edges (a 5-pixel-wide band at the top, bottom, left, and right), where convolution errors occur. The Euclidean distance (sum of squared differences) is used as the similarity measure between feature amount vectors, and the block size is 5 × 5.
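The error measure described above can be written down directly. The sketch below computes the mean squared difference from the true parallax while skipping a border band; the function name `interior_mean_square_error` is hypothetical, and the code follows the text literally in not taking a square root.

```python
import numpy as np

def interior_mean_square_error(estimated, truth, border=5):
    """Mean squared difference from the true value, ignoring a band of
    `border` pixels on every side of the image (border must be positive)."""
    est = np.asarray(estimated, dtype=np.float64)
    ref = np.broadcast_to(np.asarray(truth, dtype=np.float64), est.shape)
    inner = (slice(border, -border), slice(border, -border))
    return float(np.mean((est[inner] - ref[inner]) ** 2))

# A 20 x 20 estimate that is correct (5 pixels) except at two positions.
estimate = np.full((20, 20), 5.0)
estimate[0, 0] = 100.0     # inside the excluded border band, so ignored
estimate[10, 10] = 7.0     # interior error of 2 pixels
error = interior_mean_square_error(estimate, truth=5.0)
```

The interior is 10 × 10 pixels with a single squared error of 4, so `error` is 4 / 100 = 0.04; the large value in the corner has no effect.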

Each panel of FIG. 10 is a parallax map obtained as a corresponding point search result, with the estimated parallax value at each pixel position represented by gray level. FIG. 10(a) is a parallax map derived by conventional block matching based on the sum of squared differences of pixel values; the processing time required to derive it is 0.4 seconds, and the error of the estimated parallax is 8.23 pixels. FIG. 10(b) is a parallax map derived using only the first filters; the processing time is 0.2 seconds and the error is 1.56 pixels. FIG. 10(c) is a parallax map derived using both the first and second filters; the processing time is 1.8 seconds and the error is 0.10 pixels. Thus, increasing the number of convolution stages according to the method of this embodiment improves the accuracy of parallax estimation.

For some input images, the accuracy of parallax estimation improves when the filter and block sizes are made larger. In the example above, both the filters and the block are 5 × 5, but the processing may instead be performed with, for example, 15 × 15 filters and blocks. In that case, the processing time required to derive the parallax map is 3.2 seconds for block matching, 0.2 seconds when only the first filters are used, and 1.9 seconds when the first and second filters are used. Thus, this embodiment can not only improve the accuracy of parallax estimation but also speed up the processing. The reason is as follows. Block matching searches for corresponding points by comparing vectors whose dimension equals the number of pixels in the block (225 (= 15 × 15) dimensions in this example). In contrast, this embodiment searches by comparing vectors whose dimension equals the number of filters (8 or 64 dimensions in this example), so the vectors being compared have fewer dimensions. Because the calculation cost in this embodiment depends mainly on the number of filters rather than on the filter size, the processing time remains substantially constant even if the filter size is changed from image to image.
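The dimension argument can be checked with a few lines of arithmetic. The sketch below only counts the length of the vectors compared per candidate position; it is not a timing model, and the function name `comparison_dimensions` is hypothetical.

```python
def comparison_dimensions(block_size, n_first, n_second):
    """Length of the vectors compared at each candidate position."""
    return {
        "block_matching": block_size * block_size,   # one entry per block pixel
        "first_only": n_first,                       # one entry per first filter
        "first_and_second": n_first * n_second,      # one entry per filter pair
    }

dims_15 = comparison_dimensions(block_size=15, n_first=8, n_second=8)
dims_5 = comparison_dimensions(block_size=5, n_first=8, n_second=8)
```

For 15 × 15 blocks the counts are 225, 8, and 64, matching the figures in the text. For 5 × 5 blocks, block matching compares only 25 values, fewer than the 64 of the two-stage method, which is consistent with the two-stage method being the slower of the two in the 5 × 5 results reported for FIG. 10.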

Furthermore, this embodiment is more robust than the prior art: corresponding points between input images can be obtained accurately even when the brightness of the images has been changed.

[Example 2]
In the second embodiment, a case where the filters are created based on the multi-viewpoint image to be processed will be described with reference to FIGS. 11 and 12. Description of content identical to the first embodiment is omitted.

FIG. 11 is a block diagram showing the functional configuration of the processing apparatus 100 in this embodiment. As illustrated, the processing apparatus 100 has an acquisition unit 201, a feature amount generation unit 202, a corresponding point search unit 203, an output unit 204, and a filter creation unit 205. The filter creation unit 205 creates filters based on the multi-viewpoint image.

FIG. 12 is a flowchart of the processing executed by the processing apparatus 100 in this embodiment. In step S1201, the acquisition unit 201 acquires the multi-viewpoint image to be processed, either via the input interface 105 or from the secondary storage device 104, and outputs it to the filter creation unit 205.

In step S1202, the filter creation unit 205 creates a plurality of filters based on the multi-viewpoint image input from the acquisition unit 201. The filters are created in the same way as described in the first embodiment. Images other than the input multi-viewpoint image may also be used in creating the filters; in that case, the images used for filter creation, previously calculated mean covariance matrices, previously created filters, and the like are read from the external storage device 108.

In step S1203, the feature amount generation unit 202 performs sequential (stepwise) filter processing on each of the single-viewpoint images acquired by the acquisition unit 201, using the filters created by the filter creation unit 205. This processing creates feature amount maps for each single-viewpoint image.

In step S1204, the feature amount generation unit 202 determines whether creation of the plurality of feature amount maps for each single-viewpoint image is complete, that is, whether feature amount maps corresponding to all combinations of the sequentially convolved filters have been created for each single-viewpoint image. If the determination result in step S1204 is true, the process proceeds to step S1205; if it is false, the process returns to step S1203.

In step S1205, the corresponding point search unit 203 searches for corresponding points between the single-viewpoint images based on the feature amount maps created in step S1203.

In step S1206, the output unit 204 converts the corresponding point search result into a format such as a parallax map and outputs it.

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100 Information processing apparatus
201 Acquisition unit
202 Feature amount generation unit
203 Corresponding point search unit

Claims

An information processing apparatus comprising:
acquisition means for acquiring a first single-viewpoint image and a second single-viewpoint image obtained by imaging the same subject from different viewpoints;
creating means for creating a plurality of feature amount maps for each of the first single-viewpoint image and the second single-viewpoint image by performing, on each of those images, processing that applies a filter for detecting a specific structure in an image, in stages while changing the filter; and
search means for searching for corresponding points between the first single-viewpoint image and the second single-viewpoint image based on the plurality of created feature amount maps,
wherein the filters applied by the creating means at each stage are a plurality of mutually different filters.

The information processing apparatus according to claim 1, further comprising an output unit that outputs a parallax map based on a search result obtained by the search unit.

3. The information processing apparatus, wherein the search means:
derives a first feature amount vector at a target pixel position of the first single-viewpoint image based on the plurality of feature amount maps for the first single-viewpoint image;
derives a second feature amount vector for each pixel position in a search range of the second single-viewpoint image based on the plurality of feature amount maps for the second single-viewpoint image;
derives the similarity between the first feature amount vector and the second feature amount vector for each pixel position in the search range; and
determines, as the pixel position of the second single-viewpoint image corresponding to the target pixel position, the pixel position having the highest of the derived similarities.

The information processing apparatus according to claim 3, wherein the search unit changes the search range based on a corresponding point search result in the vicinity of the target pixel position.

The first single-view image and the second single-view image are image data of the same size,
The feature quantity map is bitmap format data having a feature quantity as a pixel value, and has the same size as the first single-viewpoint image. The information processing apparatus described.

6. The information processing apparatus according to claim 1, further comprising execution means for executing a nonlinear transformation each time a filter is applied when the creating means performs the filter application processing in stages.

The information processing apparatus according to any one of claims 1 to 6, further comprising a creation unit that creates the plurality of filters based on the first single-viewpoint image and the second single-viewpoint image.

An information processing method comprising:
an acquiring step of acquiring a first single-viewpoint image and a second single-viewpoint image obtained by imaging the same subject from different viewpoints;
a creating step of creating a plurality of feature amount maps for each of the first single-viewpoint image and the second single-viewpoint image by performing, on each of those images, processing that applies a filter for detecting a specific structure in an image, in stages while changing the filter; and
a searching step of searching for corresponding points between the first single-viewpoint image and the second single-viewpoint image based on the plurality of created feature amount maps,
wherein the filters applied at each stage in the creating step are a plurality of mutually different filters.

A program for causing a computer to execute the method according to claim 8.