JP5166335B2 - Apparatus, method and program for extracting features

Info

Publication number: JP5166335B2
Application number: JP2009074582A
Authority: JP
Inventors: 寛服部
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-03-25
Filing date: 2009-03-25
Publication date: 2013-03-21
Anticipated expiration: 2029-03-25
Also published as: JP2010225107A

Description

The present invention relates to an apparatus, a method, and a program for extracting features.

A technique for detecting a specific pattern contained in an input image, or for classifying a plurality of patterns into known classes, is called pattern recognition (pattern identification). Pattern recognition consists of two stages: feature extraction, which extracts from an image only the features that are effective for identification, and identification, which uses the extracted features. The feature extraction stage is extremely important for achieving high identification performance with a small amount of computation. Non-Patent Document 1 proposes a feature amount extracted by an analysis method called CoHOG (Co-occurrence Histograms of Oriented Gradients).

Non-Patent Document 1: T. Watanabe et al., “Co-occurrence Histograms of Oriented Gradients for Pedestrian Detection”, T. Wada, F. Huang, and S. Lin (Eds.): PSIVT 2009, LNCS 5414, Springer-Verlag Berlin Heidelberg, pp. 37-47, 2009.

However, because conventional methods generally calculate the feature amount from a single image, the identification process sometimes cannot achieve high identification performance. That is, when detecting or recognizing a specific target object, it is desirable to use information across different images, but the method of Non-Patent Document 1, for example, does not consider calculating feature amounts from a plurality of different images in this way.

The present invention has been made in view of the above, and an object thereof is to provide an apparatus, a method, and a program capable of extracting a feature amount having high identification performance.

To solve the above problem and achieve the object, one aspect of the present invention includes: a detection unit that detects partial images corresponding to each other among a plurality of images; a pixel feature calculation unit that calculates a pixel feature amount for each of the plurality of partial images and for each pixel included in the partial image; an in-image feature calculation unit that, for each partial image and for each pixel included in the partial image, calculates the co-occurrence frequency between the pixel feature amount calculated for that pixel and the pixel feature amount calculated for another pixel separated from that pixel by a predetermined distance, and calculates an in-image feature amount that includes, as elements, the co-occurrence frequencies calculated for the respective pixels; and a generation unit that generates an image feature amount representing a feature amount that includes elements obtained by integrating into one, for each corresponding pixel, the co-occurrence frequencies included in the plurality of in-image feature amounts calculated for the plurality of partial images.

According to the present invention, a feature amount having high identification performance can be extracted.

FIG. 1 is a block diagram showing an example of the configuration of the feature extraction apparatus according to the present embodiment.
FIG. 2 is a diagram showing the relationship between the feature extraction apparatus and the object to be detected.
FIG. 3 is a diagram showing an example of the stereo camera coordinate system used in the present embodiment.
FIG. 4 is a diagram showing an example of regions detected and set by the region setting unit.
FIG. 5 is a schematic diagram for explaining an example of a parallax calculation method.
FIG. 6 is a diagram showing an example of rectangles generated in the stereo processing region.
FIG. 7 is a schematic diagram for explaining a rectangle evaluation method.
FIG. 8 is a diagram explaining an example of a method of quantizing the luminance gradient direction.
FIG. 9 is a diagram showing an example of a method of dividing a candidate region.
FIG. 10 is an enlarged view of the mesh of FIG. 9.
FIG. 11 is a diagram showing an example of a matrix representing a co-occurrence histogram.
FIG. 12 is a diagram showing an example of a plurality of displacement vectors whose Chebyshev distance from the pixel of interest is 1.
FIG. 13 is a diagram showing an example of a pixel of interest when an inter-image feature vector is calculated.
FIG. 14 is a diagram showing an example of reference pixels when an inter-image feature vector is calculated.
FIG. 15 is a diagram showing an example of a matrix representing the co-occurrence histogram when an inter-image feature vector is calculated.
FIG. 16 is a diagram showing the relationship among the calculated feature vectors.
FIG. 17 is a flowchart showing the overall flow of the feature extraction processing in the present embodiment.
FIG. 18 is a diagram illustrating the configuration of a device that implements the functions of the feature extraction apparatus according to the present embodiment.

Exemplary embodiments of a feature extraction apparatus, a feature extraction method, and a feature extraction program according to the present invention are described below in detail with reference to the accompanying drawings.

The feature extraction apparatus according to the present embodiment takes a plurality of corresponding partial images, detected from respective pieces of image data as candidate regions for object detection, and calculates for each of them an in-image feature amount representing the co-occurrence frequency of combinations of pixel feature amounts of pixels within the same partial image. It then generates an image feature amount by integrating the in-image feature amounts calculated for the corresponding partial images. This yields a feature amount that takes a plurality of images into account and has high identification capability.

Furthermore, the feature extraction apparatus according to the present embodiment calculates an inter-image feature amount representing the co-occurrence frequency of combinations of the pixel feature amount of a pixel in one of the corresponding partial images and the pixel feature amount of a pixel in another partial image. It then generates an image feature amount in which this inter-image feature amount is combined. This yields a feature amount that takes a plurality of images into account and has even higher identification capability.

FIG. 1 is a block diagram showing an example of the configuration of the feature extraction apparatus 100 according to the present embodiment. As shown in FIG. 1, the feature extraction apparatus 100 includes image input units 101a and 101b, image storage units 102a and 102b, a region setting unit 110, a normalization unit 103, a pixel feature calculation unit 104, an in-image feature calculation unit 105, an inter-image feature calculation unit 106, a generation unit 107, and an output unit 108. The image input units 101a and 101b and the image storage units 102a and 102b may instead be arranged outside the feature extraction apparatus 100 and connected to it.

In the present embodiment, as shown in FIG. 2, it is assumed that two cameras 11a and 11b, mounted at the front of a moving body 201 (an automobile) and functioning as the image input units 101a and 101b, are used to extract the feature amounts used for detecting a pedestrian 211 on the road surface 210 in the traveling direction of the automobile.

Returning to FIG. 1, the image input units 101a and 101b input a plurality of images captured from different viewpoints using at least two cameras. As long as the fields of view overlap, the relative positions and orientations of the cameras can be set arbitrarily. In the present embodiment, two identical cameras 11a and 11b are arranged side by side in parallel to capture stereo images.

FIG. 3 is a diagram showing an example of the stereo camera coordinate system used in the present embodiment. In the present embodiment, the origin O of the stereo camera coordinate system is placed at the viewpoint (lens center) of the right camera (camera 11b). The straight line connecting the viewpoints of the left and right cameras is the X axis, the Y axis points vertically downward, and the Z axis is set along the optical axis of the camera.

Letting B be the distance between the cameras (baseline length), the position of the left camera is (−B, 0, 0). For simplicity, if the road surface is modeled as a plane and its lateral inclination is neglected as very small, the equation of the road surface (road plane) is given by equation (1) below.
Y = αZ + β ... (1)

α and β represent the inclination of the road plane as observed from the stereo camera and the height of the stereo camera above the road surface, respectively. Hereinafter, α and β are collectively called the surface parameters. In general, the inclination of the road surface differs from place to place, and the cameras vibrate while the moving body is traveling, so the surface parameters change from moment to moment as the moving body moves.

The coordinate system for each image of the stereo pair is described next. An image coordinate system is set for each image constituting the stereo image. For the image captured by the right camera (camera 11b) (hereinafter the "right image"), the x axis and the y axis are set in the horizontal and vertical directions, respectively. Similarly, for the image captured by the left camera (camera 11a) (hereinafter the "left image"), the x' axis and the y' axis are set in the horizontal and vertical directions, respectively. The horizontal direction of each image (the x axis and the x' axis) is assumed to coincide with the X-axis direction of the stereo camera coordinate system.

In this case, if the point on the left image corresponding to a point (x, y) on the right image is (x', y'), then y = y'. Therefore, only the difference in horizontal position needs to be considered. Hereinafter, the difference between the horizontal positions of corresponding points in the right image and the left image is called the parallax. With the right image as the reference image, the parallax is written d = x' − x, so that the corresponding point on the left image is (x', y) = (x + d, y).

Returning to FIG. 1, the image storage units 102a and 102b store the image data acquired by the image input units 101a and 101b, respectively. The image storage units 102a and 102b can be configured with any commonly used storage medium, such as an HDD (Hard Disk Drive), an optical disk, a memory card, or a RAM (Random Access Memory).

The region setting unit 110 sets regions (candidate regions) that are candidates for detecting a three-dimensional object on the road. FIG. 4 is a diagram showing an example of regions detected and set by the region setting unit 110. As shown in FIG. 4, the region setting unit 110 extracts regions that correspond between the stereo images (the left image and the right image) as pairs. FIG. 4 shows an example in which the rectangles R_1 to R_3 of the right image, drawn with dotted lines, and the corresponding rectangles R'_1 to R'_3 of the left image are set as candidate regions.

As shown in FIG. 1, the region setting unit 110 includes, as its detailed configuration, a parallax calculation unit 111, a parameter calculation unit 112, and a detection unit 113.

The parallax calculation unit 111 calculates the parallax between the stereo images acquired by the image input units 101a and 101b and stored in the image storage units 102a and 102b. Specifically, the parallax calculation unit 111 finds corresponding points between the images constituting the stereo image and, for each corresponding point found, calculates the parallax representing the difference between its positions on the images.

Next, the parallax calculation method used by the parallax calculation unit 111 is described with reference to FIG. 5. FIG. 5 is a schematic diagram for explaining an example of a parallax calculation method. For an arbitrary point (x, y) of the right image, which is the reference image, the parallax calculation unit 111 searches along the same scan line of the left image to find the corresponding point (x', y) = (x + d, y).

Since d ≥ 0, the parallax calculation unit 111 needs to examine only points to the right of the same coordinates on the left image when searching, as shown in FIG. 5(a). More specifically, as shown in FIG. 5(b), the parallax calculation unit 111 sets a window around the point (x, y) on the right image and finds, on the same scan line of the left image, the point whose surrounding luminance pattern is most similar to the luminance pattern inside that window.

As a measure of the similarity of luminance patterns, the normalized cross-correlation C can be used, for example. Letting the size of the search window be (2w + 1) × (2w + 1) pixels and the luminance within the windows set on the left and right images be f(ξ, η) and g(ξ, η), respectively, the normalized cross-correlation C is given by equation (2) below.

By searching for the corresponding point of every point on the reference image using the normalized cross-correlation C, a parallax map containing the parallax for all points is obtained.
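The scan-line search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: equation (2) is not reproduced in this text, so the zero-mean form of normalized cross-correlation used here is an assumption, and the names `ncc`, `window`, `disparity_at`, and the search limit `d_max` are hypothetical.

```python
import math

def ncc(f_win, g_win):
    """Zero-mean normalized cross-correlation of two equal-size windows (flat lists).
    Assumed form; the exact equation (2) is not reproduced in the text."""
    n = len(f_win)
    f_mean = sum(f_win) / n
    g_mean = sum(g_win) / n
    num = sum((a - f_mean) * (b - g_mean) for a, b in zip(f_win, g_win))
    f_var = sum((a - f_mean) ** 2 for a in f_win)
    g_var = sum((b - g_mean) ** 2 for b in g_win)
    denom = math.sqrt(f_var * g_var)
    return num / denom if denom > 0 else 0.0  # flat windows carry no pattern

def window(img, x, y, w):
    """Flattened (2w+1) x (2w+1) luminance window centred on (x, y)."""
    return [img[y + dy][x + dx] for dy in range(-w, w + 1) for dx in range(-w, w + 1)]

def disparity_at(right, left, x, y, w=1, d_max=8):
    """Search the same scan line of the left image, only to the right of x (d >= 0)."""
    width = len(left[0])
    ref = window(right, x, y, w)
    best_d, best_c = 0, -2.0
    for d in range(d_max + 1):
        if x + d + w >= width:  # window would leave the image
            break
        c = ncc(ref, window(left, x + d, y, w))
        if c > best_c:
            best_d, best_c = d, c
    return best_d
```

Calling `disparity_at` for every point of the reference image would give a parallax map in the sense of the text; a practical system would also reject matches whose correlation is low.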

Returning to FIG. 1, the parameter calculation unit 112 calculates the surface parameters α and β using the parallax map calculated by the parallax calculation unit 111. First, a method of obtaining the position (X, Y, Z) in three-dimensional space of a point (x, y) on the reference image from its parallax d is described. Between a point (X, Y, Z) in space and its projections (x', y) and (x, y) onto the left and right images, the relation of equation (5) below holds.

Solving equation (5) for X, Y, and Z yields equation (6) below.
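The step from parallax to three-dimensional position can be illustrated with a short sketch. Equations (5) and (6) are not reproduced in this text, so the code assumes an ideal pinhole model with a focal length `f` in pixels, image coordinates measured from the principal point, and projections x = fX/Z, y = fY/Z, x' = f(X + B)/Z. Under that assumption d = x' − x = fB/Z. The function name `triangulate` and the parameter `f` are hypothetical.

```python
def triangulate(x, y, d, f, B):
    """Recover (X, Y, Z) in the right-camera frame from image point (x, y) and parallax d.

    Assumes a pinhole model (not stated explicitly in the text):
      x = f*X/Z, y = f*Y/Z, x' = f*(X + B)/Z, hence d = x' - x = f*B/Z.
    """
    if d <= 0:
        raise ValueError("parallax must be positive for a point at finite depth")
    Z = f * B / d   # depth along the optical axis
    X = x * Z / f   # lateral position
    Y = y * Z / f   # vertical position (Y axis points downward)
    return X, Y, Z
```

For example, with f = 500 pixels and B = 0.5, a point with parallax 5 lies at depth Z = 50 in the same unit as B.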

Using the above equations, the parameter calculation unit 112 obtains the position in three-dimensional space (three-dimensional position) of each point on the reference image for which the parallax calculation unit 111 obtained a parallax. The parameter calculation unit 112 then selects, from the three-dimensional positions obtained, the points that are close to the road surface and substitutes them into equation (1) (Y = αZ + β), the equation of the road surface, to calculate the surface parameters α and β.

The parameter calculation unit 112 extracts, as the points close to the road surface, the points that satisfy the condition of expression (7) below.

Here, ΔY is a threshold, for which an appropriate value is set in advance. Y_p is the Y coordinate of the intersection between the reference road surface and the straight line that passes through the point (X, Y, Z) parallel to the Y axis. The reference road surface is a predetermined road surface that serves as a reference, such as a flat road. The parameters of the reference road surface are measured, for example, on a flat road while the moving body is stationary. Letting α′ and β′ be the parameters of the reference road surface, Y_p is given by equation (8) below.
Y_p = α′Z + β′ ... (8)
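A possible way to carry out the selection and fitting is sketched below. Two things are assumptions rather than statements of the text: expression (7) is taken to be |Y − Y_p| ≤ ΔY, and the "substitution" into equation (1) is carried out as an ordinary least-squares line fit of Y against Z. The function name `fit_road_plane` is hypothetical.

```python
def fit_road_plane(points, alpha_ref, beta_ref, delta_y):
    """Estimate (alpha, beta) of Y = alpha*Z + beta from 3-D points near the road.

    points: iterable of (X, Y, Z).
    A point is kept when |Y - (alpha_ref*Z + beta_ref)| <= delta_y
    (assumed form of expression (7)). Returns None if the fit is undetermined.
    """
    near = [(Y, Z) for (_, Y, Z) in points
            if abs(Y - (alpha_ref * Z + beta_ref)) <= delta_y]
    n = len(near)
    if n < 2:
        return None
    sz = sum(Z for _, Z in near)
    sy = sum(Y for Y, _ in near)
    szz = sum(Z * Z for _, Z in near)
    szy = sum(Z * Y for Y, Z in near)
    denom = n * szz - sz * sz
    if denom == 0:          # all points at the same depth: slope is undefined
        return None
    alpha = (n * szy - sz * sy) / denom   # least-squares slope (assumed fitting method)
    beta = (sy - alpha * sz) / n          # least-squares intercept
    return alpha, beta
```

A point well above the reference road surface, such as a point on a pedestrian's torso, fails the distance test and does not influence the estimate.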

Using the calculated parameters, the detection unit 113 detects partial images (candidate regions) that correspond to each other among the plurality of images as regions for detecting an object. The detection unit 113 detects the candidate regions by the following procedure.

The detection unit 113 first sets, for an arbitrary point (x, y) in the stereo processing region, a rectangle having that point as the midpoint of its lower side. FIG. 6 is a diagram showing an example of rectangles generated in the stereo processing region. Using the surface parameters α and β, the detection unit 113 sets rectangles whose lower side touches the road surface and whose size is determined in consideration of the height and width of a person.

For example, letting H and W be representative values of human height and width, respectively, the height h and the width w of the rectangle on the image can be obtained as follows. From equation (1) (Y = αZ + β), the equation of the road surface, and equation (5), equation (9) below holds, and hence equation (10) below is obtained.

Thus, the on-image size of the rectangles generated by the detection unit 113 varies with the vertical position on the image, that is, with the y coordinate. To handle people of various sizes, the detection unit 113 also generates several types of rectangles for each point (x, y) on the image, as shown in FIG. 6, which illustrates an example with three types of rectangles.
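How the rectangle size depends on the y coordinate can be sketched under a stated assumption. Equations (5), (9), and (10) are not reproduced in this text, so the code assumes the pinhole relation y = fY/Z with a hypothetical focal length `f` in pixels. For a road point, Y = αZ + β then gives Z = fβ / (y − fα), and an object of physical size H × W at that depth projects to h = fH/Z and w = fW/Z. The function name `rectangle_size` is hypothetical.

```python
def rectangle_size(y, f, alpha, beta, H, W):
    """On-image height and width of an H x W object standing on the road at image row y.

    Assumes y = f*Y/Z (pinhole model) and the road plane Y = alpha*Z + beta,
    so that f/Z = (y - f*alpha) / beta.
    """
    scale = (y - f * alpha) / beta   # equals f / Z for a point on the road
    if scale <= 0:
        raise ValueError("row y does not lie on the road in front of the camera")
    return H * scale, W * scale
```

Calling the function with several (H, W) pairs for the same point would produce the multiple rectangle types mentioned in the text; rows closer to the bottom of the image (larger y) yield larger rectangles.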

Next, from the parallax within the rectangle set at each point in this way, the detection unit 113 evaluates the likelihood that the rectangle contains a person (pedestrian) and detects rectangles with a high likelihood as candidate regions. Specifically, the detection unit 113 judges a rectangle to be likely to contain a person when its uniformity, which represents how uniform the parallax within the rectangle is, exceeds a predetermined threshold. FIG. 7 is a schematic diagram for explaining a rectangle evaluation method.

As shown in FIG. 7(a), when a rectangle 901 contains a person, the depth within the rectangle is almost uniform, so the corresponding parallax is also uniform. In addition, since the point (x, y) touches the road surface, the parallax at this point is given by equation (11) below.

Therefore, letting d_i be the parallax of an arbitrary point within the rectangle, the detection unit 113 can evaluate the uniformity of the parallax by the number N of points that satisfy expression (12) below. Here, Δd is a predetermined threshold on the parallax difference (difference threshold).

To account for the influence of rectangle size, the detection unit 113 normalizes the number N by the area S = w × h of the rectangle, and registers as candidate regions the rectangles whose normalized number N satisfies expression (13) below.

Here, N_min is a threshold, for which an appropriate value is set in advance. As shown in FIG. 7(b), even a rectangle 902 that is smaller than the object to be detected (a person such as a pedestrian) may have uniform parallax within it and satisfy expression (13). To prevent such rectangles from becoming candidates, the detection unit 113 may be configured so that, when rectangles of several sizes at a given point satisfy expression (13), only the rectangle with the largest area is selected as a candidate region. For example, when both the rectangle 901 and the rectangle 902 qualify as candidate regions as in FIG. 7(b), only the rectangle 901 may be selected.
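The evaluation above can be sketched as follows. Because expressions (11) to (13) are not reproduced in this text, three forms are assumed: the road parallax is d = B(y − fα)/β (which follows from a pinhole model with a hypothetical focal length `f`), expression (12) is |d_i − d| ≤ Δd, and expression (13) is N/S ≥ N_min. All function names are hypothetical.

```python
def road_parallax(y, f, B, alpha, beta):
    """Parallax of a road point at image row y (assumed form of equation (11))."""
    return B * (y - f * alpha) / beta

def is_candidate(disparities, d_road, delta_d, w, h, n_min):
    """Uniformity test for one w x h rectangle.

    disparities: parallax values d_i of the points inside the rectangle.
    Counts points with |d_i - d_road| <= delta_d (assumed expression (12)) and
    compares the count normalized by the area S = w*h with n_min (assumed (13)).
    """
    n = sum(1 for d_i in disparities if abs(d_i - d_road) <= delta_d)
    return n / (w * h) >= n_min

def largest_candidate(rects):
    """Among (w, h) rectangles that passed the test at one point, keep the largest area."""
    return max(rects, key=lambda r: r[0] * r[1]) if rects else None
```

Keeping only the largest passing rectangle corresponds to the optional configuration in which the rectangle 901 is preferred over the smaller rectangle 902.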

The uniformity evaluation method described above is only an example; any evaluation method that can evaluate the uniformity of the parallax within a rectangle can be applied. The apparatus may also be configured to use a uniformity measure in which a smaller value represents a higher degree of uniformity.

Through the above processing, the detection unit 113 detects N (N ≥ 0) candidate regions R_1 to R_N. FIG. 4, described above, shows an example in which three candidate regions R_1, R_2, and R_3 are detected. The detection unit 113 further extracts the candidate regions R'_1 to R'_N in the left image that correspond to the candidate regions detected in the right image. Because the lower side of each rectangle touches the road surface and its parallax is given by equation (11), the candidate regions R'_1 to R'_N are detected from the y coordinate of the lower side of each rectangle. In the following, the N (≥ 0) pairs of corresponding regions, each associating a candidate region R_1 to R_N detected in the right image with the corresponding candidate region R'_1 to R'_N of the left image, are denoted (R_1, R'_1) to (R_N, R'_N).

The normalization unit 103 normalizes each candidate region included in the N pairs of corresponding regions set by the region setting unit 110 to a predetermined size. The normalized size is arbitrary; in the present embodiment, the regions are made uniform as vertically long rectangles of 48 × 24 pixels (48 pixels high and 24 pixels wide).
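A minimal sketch of the size normalization is given below. The text does not state which interpolation is used, so nearest-neighbour sampling is an assumption chosen for brevity; the function name `normalize_region` is hypothetical.

```python
def normalize_region(img, out_h=48, out_w=24):
    """Resample a 2-D list of luminance values to out_h x out_w pixels.

    Nearest-neighbour sampling (an assumption; the text does not specify
    the interpolation method).
    """
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]
```

Applying the function to both regions of a pair (R_i, R'_i) gives two images of identical size, so that pixels at the same coordinates can be treated as corresponding pixels in later stages.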

The pixel feature calculation unit 104 calculates, for each pixel, a feature amount of the image data of the candidate regions normalized by the normalization unit 103. For example, the pixel feature calculation unit 104 calculates the luminance gradient direction as the pixel feature amount. The luminance gradient direction is robust against illumination changes and the like, and remains effective even in environments where the brightness varies greatly. When the brightness varies relatively little, the luminance value itself may be used as the feature amount. The following describes the case where the luminance gradient direction is used as the pixel feature amount. The pixel feature calculation unit 104 also quantizes the calculated luminance gradient direction into discrete values in an appropriate range. FIG. 8 is a diagram explaining an example of a method of quantizing the luminance gradient direction; it illustrates quantization into eight directions, 0 to 7.
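The quantization step can be sketched as follows. The gradient operator (central differences clamped at the border) and the assignment of directions to the labels 0 to 7 are assumptions, since FIG. 8 is not reproduced here; the function name `quantized_gradient_directions` is hypothetical.

```python
import math

def quantized_gradient_directions(img, bins=8):
    """Per-pixel luminance gradient direction quantized into `bins` labels.

    Assumptions: central differences (clamped at the image border), and bins
    centred on multiples of 360/bins degrees with label 0 for a gradient along
    +x. The image y axis points downward, so label 2 means "brighter below".
    Pixels with zero gradient fall into label 0.
    """
    h, w = len(img), len(img[0])
    step = 2 * math.pi / bins
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            theta = math.atan2(gy, gx) % (2 * math.pi)       # angle in [0, 2*pi)
            out[y][x] = int((theta + step / 2) / step) % bins  # nearest bin centre
    return out
```

Because only the direction is kept, multiplying all luminance values by a positive constant leaves the labels unchanged, which illustrates the robustness to illumination changes mentioned in the text.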

The in-image feature calculation unit 105 calculates an in-image feature vector for each normalized region. The in-image feature vector is a feature amount (in-image feature amount) expressed as a vector whose elements are the co-occurrence frequencies of the pixel feature amounts calculated for a plurality of pixels in the region.

The in-image feature calculation unit 105 calculates one feature vector for each candidate region. Since 2N normalized candidate regions have been extracted, the in-image feature calculation unit 105 calculates 2N feature vectors. The in-image feature calculation unit 105 first divides each candidate region normalized by the normalization unit 103 into a plurality of partial regions. FIG. 9 is a diagram showing an example of a method of dividing a candidate region; the candidate region is partitioned in a mesh pattern into a plurality of square partial regions (square regions).

In this example, the in-image feature calculation unit 105 next calculates a co-occurrence histogram for each mesh (square region) produced by the division. The number of meshes is arbitrary; here, for example, the region is divided into 8 × 4 meshes as shown in FIG. 9. Since the whole region is 48 × 24 pixels, each mesh (one partial region) is a square region of 6 × 6 pixels.

The co-occurrence histogram is described below. FIG. 10 is an enlarged view of a mesh of FIG. 9. In FIG. 10, each rectangle represents one pixel in the mesh, and each arrow indicates the luminance gradient direction calculated for that pixel. First, consider a pixel of interest r = (x, y) and the pixel r + δ separated from it by a displacement (displacement vector) δ = (δx, δy). FIG. 10 shows the case δ = (1, 0).

Next, a matrix as shown in FIG. 11 is defined, with the pixel feature amounts of r and r + δ denoted by i and j, respectively. In the case of FIG. 10, the combination of the pixel feature amount i of r and the pixel feature amount j of r + δ is (i, j) = (0, 1), which corresponds to the matrix element h01. In general, the matrix hij is defined as in equation (14):

hij = #{ x | I(x) = i, I(x + δ) = j }   (14)

Here, "#" denotes the number of elements of the set in the brackets, and I(x) and the like denote the pixel feature amount at a point x on the image. The matrix hij is a two-dimensional histogram (co-occurrence histogram) showing the distribution, within the mesh, of combinations of the pixel feature amounts at two pixels defined by a given displacement vector δ. Since the matrix hij is defined for one displacement vector, D two-dimensional histograms (co-occurrence histograms) are generated when D displacement vectors are used.
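The counting that defines the co-occurrence histogram can be sketched as below. This is an illustrative sketch only: the function name `cooccurrence_histogram` is hypothetical, and skipping pairs whose reference pixel falls outside the mesh is an assumption, since the text does not state how the mesh boundary is handled.

```python
def cooccurrence_histogram(features, delta, levels=8):
    """Count pairs (i, j) = (feature at r, feature at r + delta) in one mesh.

    features: 2-D list of quantized pixel feature amounts in range(levels).
    delta:    displacement vector (dx, dy).
    """
    dx, dy = delta
    height, width = len(features), len(features[0])
    h = [[0] * levels for _ in range(levels)]
    for y in range(height):
        for x in range(width):
            xr, yr = x + dx, y + dy
            # Assumption: ignore pairs whose reference pixel leaves the mesh.
            if 0 <= xr < width and 0 <= yr < height:
                h[features[y][x]][features[yr][xr]] += 1
    return h


# Tiny hypothetical mesh with displacement (1, 0), as in FIG. 10.
mesh = [[0, 1],
        [1, 1]]
h = cooccurrence_histogram(mesh, (1, 0))
print(h[0][1], h[1][1])
```

Calling the function once per displacement vector gives the D histograms mentioned in the text.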

FIG. 12 shows an example of displacement vectors whose Chebyshev distance from the pixel of interest is 1. Here, a reference pixel is a pixel separated from the pixel of interest by a displacement vector. A displacement such as δ = (−1, 0) can be replaced by δ = (1, 0) by interchanging the pixel of interest and the reference pixel. Consequently, as shown in FIG. 12, there are four displacement vectors with a Chebyshev distance of 1, δ_1 to δ_4.

The in-image feature calculation unit 105 calculates, for example, four co-occurrence histograms, one for each of the four displacement vectors shown in FIG. 12. The method of generating displacement vectors and their number are not limited to this example. For instance, displacement vectors may be generated using the Manhattan distance or the Euclidean distance.
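One way to enumerate such displacement vectors is sketched below. The function name `half_displacements` is hypothetical, and which member of each pair {δ, −δ} is kept is an arbitrary choice made here; the four vectors actually drawn in FIG. 12 may differ in sign.

```python
def half_displacements(distance=1):
    """List displacement vectors at a given Chebyshev distance,
    keeping only one vector of each pair {delta, -delta}."""
    vectors = []
    for dy in range(-distance, distance + 1):
        for dx in range(-distance, distance + 1):
            if max(abs(dx), abs(dy)) != distance:
                continue  # not at the requested Chebyshev distance
            # Keep the vector pointing "down", or "right" when dy == 0;
            # its negation is covered by swapping the two pixels.
            if dy > 0 or (dy == 0 and dx > 0):
                vectors.append((dx, dy))
    return vectors


deltas = half_displacements(1)
print(len(deltas))  # 4
```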

When the luminance gradient direction is quantized into 8 levels (see FIG. 8) and four displacement vectors are used, the in-image feature calculation unit 105 calculates an 8 × 8 × 4 = 256-dimensional feature vector for each mesh in FIG. 9. In this example, the total number of meshes is 8 × 4 = 32, so a 256 × 32 = 8192-dimensional feature vector is generated from one candidate region. This 8192-dimensional feature vector corresponds to the in-image feature vector.
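The dimension count above can be checked with a few lines. The function name `in_image_dimension` and its parameter names are hypothetical; the default values are the ones given in the text.

```python
def in_image_dimension(levels=8, displacements=4, mesh_rows=8, mesh_cols=4):
    """Return (dimensions per mesh, dimensions per candidate region)."""
    # One levels x levels histogram per displacement vector in each mesh.
    per_mesh = levels * levels * displacements
    return per_mesh, per_mesh * mesh_rows * mesh_cols


per_mesh, per_region = in_image_dimension()
print(per_mesh, per_region)  # 256 8192
```

Reducing the quantization to 4 levels with the same displacements and meshes would give 4 × 4 × 4 × 32 = 2048 dimensions.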

上述のように左右の画像から１対の候補領域が抽出されるため、画像内特徴算出部１０５は、候補領域それぞれについて１個の画像内特徴ベクトルを算出する。以下では、左画像の候補領域から算出される画像内特徴ベクトルをλ、右画像の候補領域から算出される画像内特徴ベクトルをμと表記する。 Since a pair of candidate areas is extracted from the left and right images as described above, the in-image feature calculation unit 105 calculates one in-image feature vector for each candidate area. In the following, the in-image feature vector calculated from the candidate area of the left image is denoted as λ, and the in-image feature vector calculated from the candidate area of the right image is denoted as μ.

The inter-image feature calculation unit 106 calculates an inter-image feature vector for each corresponding region. The inter-image feature vector is a feature amount (inter-image feature amount) expressed in vector form whose elements are the co-occurrence frequencies of pixel feature amounts calculated for a plurality of pixels contained in different images. Specifically, the inter-image feature calculation unit 106 takes the pixel of interest r = (x, y) and the reference pixel r + δ from different images, generates co-occurrence histograms by the same method as the in-image feature calculation unit 105, and uses them as elements of the feature vector. As in the example above, when the luminance gradient direction has 8 levels, four displacement vectors are used, and the region is divided into 32 meshes, the inter-image feature calculation unit 106 calculates an 8192-dimensional feature vector as the inter-image feature vector. Hereinafter, the inter-image feature vector is denoted ψ.

FIG. 13 and FIG. 14 show an example of a pixel of interest and a reference pixel, respectively, used when calculating an inter-image feature vector. FIG. 13 shows a pixel of interest in the candidate region R detected from the right image. FIG. 14 shows a reference pixel in the candidate region R′ corresponding to the candidate region R of FIG. 13. FIGS. 13 and 14 show the case of the displacement vector δ = (1, 0).

FIG. 15 shows an example of a matrix representing a co-occurrence histogram used when calculating an inter-image feature vector. In the example of FIGS. 13 and 14, the combination of the pixel feature amount of r and the pixel feature amount of r + δ is (i, j) = (0, 1), which corresponds to the element g01 of the matrix shown in FIG. 15. In general, the matrix gij representing a co-occurrence histogram between images is defined as in equation (15).
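The inter-image histogram differs from the in-image one only in where the reference pixel is read from. A minimal sketch follows; the function name `inter_image_cooccurrence` is hypothetical, both regions are assumed to be normalized to the same size, and out-of-range reference pixels are skipped as an assumption not stated in the text.

```python
def inter_image_cooccurrence(target, reference, delta, levels=8):
    """Count pairs (i, j) where i is the feature at r in one region
    and j is the feature at r + delta in the corresponding region."""
    if len(target) != len(reference) or len(target[0]) != len(reference[0]):
        raise ValueError("regions must have the same size")
    dx, dy = delta
    height, width = len(target), len(target[0])
    g = [[0] * levels for _ in range(levels)]
    for y in range(height):
        for x in range(width):
            xr, yr = x + dx, y + dy
            # Assumption: skip pairs whose reference pixel is out of range.
            if 0 <= xr < width and 0 <= yr < height:
                g[target[y][x]][reference[yr][xr]] += 1
    return g


# Hypothetical 2 x 2 meshes: pixels of interest from R, reference pixels from R'.
mesh_r = [[0, 0],
          [2, 2]]
mesh_r_prime = [[1, 1],
                [3, 3]]
g = inter_image_cooccurrence(mesh_r, mesh_r_prime, (1, 0))
print(g[0][1], g[2][3])
```

Passing the same mesh for both arguments reduces this to the in-image case.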

図１６は、算出された特徴ベクトルの関係を示す図である。図１６に示すように、左画像の候補領域Ｒ’および右画像の候補領域Ｒからは、それぞれ画像内特徴ベクトルλおよびμが算出される。また、候補領域Ｒ’および候補領域Ｒから、画像間特徴ベクトルψが算出される。 FIG. 16 is a diagram illustrating the relationship between the calculated feature vectors. As shown in FIG. 16, in-image feature vectors λ and μ are calculated from the candidate region R ′ for the left image and the candidate region R for the right image, respectively. Further, an inter-image feature vector ψ is calculated from the candidate region R ′ and the candidate region R.

The generation unit 107 generates, as the image feature amount for each candidate pattern, a new feature vector Ψ from the three feature vectors shown in FIG. 16. For example, the generation unit 107 concatenates the three vectors as in equation (16) to generate the feature vector Ψ. Here, |λ|, |μ|, and |ψ| denote the numbers of dimensions of the feature vectors λ, μ, and ψ, respectively.

Note that the generation unit 107 may first integrate the two in-image feature vectors λ and μ and then concatenate the result with the inter-image feature vector ψ. Letting Λ be the vector obtained by integrating λ and μ, Λ is expressed by equation (17), and the feature vector Ψ becomes a (|Λ| + |ψ|)-dimensional feature vector as in equation (18).

生成部１０７は、例えば、画像内特徴ベクトルλおよびμを、要素ごとに相加平均または相乗平均を算出することにより統合することができる。なお、画像内特徴ベクトルの統合方法はこれらに限られるものではない。このように画像内特徴ベクトルを統合することにより、最終的に算出される特徴ベクトルΨのサイズを減少させることができる。 For example, the generation unit 107 can integrate the in-image feature vectors λ and μ by calculating an arithmetic mean or a geometric mean for each element. Note that the method of integrating the feature vectors in the image is not limited to these. By integrating the in-image feature vectors in this way, the size of the finally calculated feature vector Ψ can be reduced.
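The two ways of building Ψ can be sketched as follows. The function names `concatenate` and `integrate` and the short example vectors are hypothetical; only concatenation and the element-wise arithmetic or geometric mean named in the text are shown.

```python
import math


def concatenate(*vectors):
    """Join feature vectors end to end."""
    combined = []
    for vector in vectors:
        combined.extend(vector)
    return combined


def integrate(lam, mu, mode="arithmetic"):
    """Merge two in-image feature vectors element by element."""
    if len(lam) != len(mu):
        raise ValueError("vectors must have the same number of dimensions")
    if mode == "arithmetic":
        return [(a + b) / 2 for a, b in zip(lam, mu)]
    if mode == "geometric":
        return [math.sqrt(a * b) for a, b in zip(lam, mu)]
    raise ValueError("mode must be 'arithmetic' or 'geometric'")


# Hypothetical short vectors standing in for the 8192-dimensional ones.
lam, mu, psi = [1, 4], [3, 4], [5, 6]
full = concatenate(lam, mu, psi)               # |λ| + |μ| + |ψ| dimensions
reduced = concatenate(integrate(lam, mu), psi)  # |Λ| + |ψ| dimensions
print(len(full), len(reduced))  # 6 4
```

With the 8192-dimensional vectors of the example, integration would shrink Ψ from 24576 to 16384 dimensions.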

出力部１０８は、生成部１０７が生成した特徴ベクトルΨを候補パターンごとに出力する。 The output unit 108 outputs the feature vector Ψ generated by the generation unit 107 for each candidate pattern.

Next, the feature extraction processing performed by the feature extraction apparatus 100 according to the present embodiment, configured as described above, will be described with reference to FIG. 17. FIG. 17 is a flowchart showing the overall flow of the feature extraction processing in the present embodiment.

まず、画像入力部１０１ａ、１０１ｂが、ステレオカメラを構成するカメラ１１ａ及び１１ｂにより撮像された左右画像を入力する（ステップＳ１７０１）。入力された左右画像は画像記憶部１０２ａ、１０２ｂに記憶される。 First, the image input units 101a and 101b input left and right images captured by the cameras 11a and 11b constituting the stereo camera (step S1701). The input left and right images are stored in the image storage units 102a and 102b.

次に、視差算出部１１１が、入力した左画像と右画像の対応点を求め各点に対する視差を算出して視差マップを作成する（ステップＳ１７０２）。次に、パラメータ算出部１１２が、視差マップを用いて道路平面の面パラメータα及びβを算出する（ステップＳ１７０３）。 Next, the parallax calculation unit 111 obtains corresponding points between the input left image and right image, calculates the parallax for each point, and creates a parallax map (step S1702). Next, the parameter calculation unit 112 calculates the surface parameters α and β of the road plane using the parallax map (step S1703).

Next, the detection unit 113 generates candidate regions for the stereo processing region and, among the generated candidate regions, detects those whose parallax uniformity is greater than a threshold (step S1704). As a result, N pairs of corresponding regions (R_1, R′_1) to (R_N, R′_N) are obtained, each associating one of the candidate regions R_1 to R_N detected in the right image with the corresponding candidate region R′_1 to R′_N of the left image.

次に、正規化部１０３が、各対応領域に含まれる候補領域それぞれを正規化する（ステップＳ１７０５）。 Next, the normalization unit 103 normalizes each candidate area included in each corresponding area (step S1705).

次に、画素特徴算出部１０４が、正規化された候補領域の画像データの画素特徴量を画素ごとに算出する（ステップＳ１７０６）。また、画像内特徴算出部１０５が、算出された画素特徴量を用いて、正規化された候補領域ごとに画像内特徴量（画像内特徴ベクトル）を算出する（ステップＳ１７０７）。さらに、画像間特徴算出部１０６が、算出された画素特徴量を用いて、正規化された候補領域ごとに画像間特徴量（画像間特徴ベクトル）を算出する（ステップＳ１７０８）。 Next, the pixel feature calculation unit 104 calculates the pixel feature amount of the image data of the normalized candidate area for each pixel (step S1706). Also, the in-image feature calculation unit 105 calculates an in-image feature amount (intra-image feature vector) for each normalized candidate region using the calculated pixel feature amount (step S1707). Further, the inter-image feature calculation unit 106 calculates an inter-image feature amount (inter-image feature vector) for each normalized candidate area using the calculated pixel feature amount (step S1708).

次に、生成部１０７が、右画像の候補領域から算出された画像内特徴量と、対応する左画像の候補領域から算出された画像内特徴量とを統合し、統合した特徴量と画像間特徴量とを結合することにより、候補パターンごとの画像特徴量を生成する(ステップＳ１７０９)。そして、出力部１０８が、生成された画像特徴量を出力し（ステップＳ１７１０）、特徴抽出処理を終了する。 Next, the generation unit 107 integrates the in-image feature amount calculated from the candidate region of the right image and the in-image feature amount calculated from the corresponding candidate region of the left image. By combining the feature quantity, an image feature quantity for each candidate pattern is generated (step S1709). Then, the output unit 108 outputs the generated image feature amount (step S1710) and ends the feature extraction process.

以上のようにして、同一画像内で抽出される画像内特徴量と、互いに異なる画像間で抽出される画像間特徴量とを算出し、両者を記述した画像特徴量を算出することができる。これにより、パターン識別に用いた場合に極めて高い識別性能を実現できる特徴量を算出可能となる。 As described above, the in-image feature quantity extracted in the same image and the inter-image feature quantity extracted between different images can be calculated, and an image feature quantity describing both can be calculated. Thus, it is possible to calculate a feature amount that can realize extremely high discrimination performance when used for pattern discrimination.

In the present embodiment, the case of detecting a pedestrian in the traveling direction using cameras installed at the front of a moving body has been described, but the cameras may instead be installed at the side or rear of the moving body. The embodiment is also applicable when the cameras are mounted on another kind of moving body, such as a mobile robot. Furthermore, it is not limited to cameras installed on a moving body; it can also be applied to detecting an object from images captured by a surveillance camera installed at a fixed position.

Although the case of detecting a person (pedestrian) has been described, the detection target is not limited to this, and the embodiment is applicable to other detection targets. It can also be applied to detecting objects of multiple classes, such as the simultaneous detection of persons and moving bodies.

Although stereo vision with two cameras arranged side by side in parallel has been described, the number of cameras is arbitrary as long as there are two or more, and the cameras may be arranged in any manner as long as their fields of view overlap.

Although the case where the region setting unit 110 generates candidate regions based on stereo parallax has been described, the apparatus may instead be configured to calculate the motion of each point on time-series images obtained with a single fixed camera and to extract candidate regions from the motion information. In the example above, three-dimensional objects (objects having height) on the road are extracted as candidate regions, whereas in this configuration moving objects are extracted as candidates.

また、画素特徴算出部１０４によって算出する画素特徴量として、輝度勾配方向を８段階に量子化した場合を説明したが、量子化の個数はこれに限られるものではない。例えば、上下（図８の２および６）、左右（図８の０および４）等を同一視した４方向に量子化しても良い。また、輝度勾配の方向ではなく、輝度勾配の大きさを用いても良い。また、画素特徴量として、ガウシアンフィルタ、ソーベルフィルタ、およびラプラシアンフィルタ等のフィルタの出力値を用いても良い。さらに、これらの内のいくつかを組み合わせ、各画素について複数種類の画素特徴量を算出しても良い。 In addition, although the case where the luminance gradient direction is quantized in eight steps has been described as the pixel feature amount calculated by the pixel feature calculation unit 104, the number of quantization is not limited to this. For example, quantization may be performed in four directions in which the top and bottom (2 and 6 in FIG. 8), left and right (0 and 4 in FIG. 8), etc. are identified. Also, the magnitude of the luminance gradient may be used instead of the direction of the luminance gradient. Further, output values of filters such as a Gaussian filter, a Sobel filter, and a Laplacian filter may be used as the pixel feature amount. Furthermore, some of these may be combined to calculate a plurality of types of pixel feature amounts for each pixel.
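The 4-direction variant can be derived from the 8-level quantization by treating opposite directions as the same. The sketch below assumes that opposite directions in FIG. 8 differ by 4, which is consistent with the pairs (2, 6) and (0, 4) named in the text; the function name `merge_opposite_directions` is hypothetical.

```python
def merge_opposite_directions(direction8):
    """Map an 8-level gradient direction (0..7) to 4 levels (0..3)
    by identifying each direction with its opposite."""
    if not 0 <= direction8 < 8:
        raise ValueError("direction must be in 0..7")
    # Assumption: direction d and its opposite are d and d + 4.
    return direction8 % 4


# Hypothetical row of 8-level pixel feature amounts.
row8 = [0, 4, 2, 6, 1, 5, 3, 7]
row4 = [merge_opposite_directions(d) for d in row8]
print(row4)  # [0, 0, 2, 2, 1, 1, 3, 3]
```

With 4 levels, each co-occurrence histogram shrinks from 8 × 8 to 4 × 4 entries.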

As described above, the feature extraction apparatus according to the present embodiment can calculate, for the partial images detected from each of a plurality of image data, in-image feature amounts representing the co-occurrence frequencies of combinations of pixel feature amounts of pixels within the same partial image, and can generate an image feature amount that integrates the plurality of in-image feature amounts. Furthermore, it can calculate an inter-image feature amount representing the co-occurrence frequencies of combinations of the pixel feature amount of a pixel in one partial image and the pixel feature amount of a pixel in another partial image, and can generate an image feature amount into which this inter-image feature amount is concatenated. This makes it possible to obtain a feature amount that takes a plurality of images into account and has higher discrimination ability.

In addition, since the feature extraction apparatus according to the present embodiment can extract feature amounts effective for identification with a relatively small amount of computation, the identification performance can be greatly improved when the apparatus is applied to, for example, object recognition or object detection.

FIG. 18 illustrates an example configuration of a device, such as a computer, that implements the functions of the feature extraction apparatus 100 according to the present embodiment. The computer of FIG. 18 includes, for example, a main processing unit 400, an input unit 410, a display unit 420, a storage unit 490, an image input unit 101, an input I/F 419, a display I/F 429, an image input I/F 489, and a memory I/F 499.

The main processing unit 400 implements the functions of the feature extraction apparatus 100 by executing a program that carries out the procedures of FIG. 17. The main processing unit 400 includes, for example, a CPU 401, a ROM 408, and a RAM 409. The CPU 401 controls the devices of the computer by executing the program. The ROM 408 stores, for example, programs and parameters and supplies them to the CPU 401. The RAM 409 serves, for example, as a work memory when the CPU 401 executes a program, and can also function as the image storage unit 102 of FIG. 1.

入力部４１０は、例えば、キーボードやマウス等の入力手段であり、コンピュータに対する指示が入力される。表示部４２０は、ＣＰＵ４０１の処理結果等を表示する。 The input unit 410 is input means such as a keyboard and a mouse, for example, and inputs instructions to the computer. The display unit 420 displays the processing result of the CPU 401 and the like.

The input I/F 419, the display I/F 429, the memory I/F 499, and the image input I/F 489 are the interfaces through which the input unit 410, the display unit 420, the storage unit 490, and the image input unit 101, respectively, are connected to the main processing unit 400 via a bus.

本実施の形態に係る特徴抽出装置１００によって処理される画像データは、例えば、画像入力部１０１で取得するか、又は、ネットワークを経由して外部から入力される。従って、本実施の形態に係る特徴抽出装置１００は、画像入力部１０１を内部に備えてもよいし、外部の画像入力部と通信可能に接続されてもよい。本実施の形態に係る特徴抽出装置１００によって処理される画像データはまた、例えば、不図示の駆動装置に挿入された記録媒体から読み出されてもよい。 The image data processed by the feature extraction device 100 according to the present embodiment is acquired by the image input unit 101 or input from the outside via a network, for example. Therefore, the feature extraction apparatus 100 according to the present embodiment may include the image input unit 101 inside, or may be connected to an external image input unit so as to be communicable. The image data processed by the feature extraction device 100 according to the present embodiment may also be read from, for example, a recording medium inserted in a driving device (not shown).

本実施の形態に係る特徴抽出装置１００によって抽出される画像データの特徴量は、例えば、表示部４２０、又は、ネットワークから出力される。本実施の形態に係る特徴抽出装置１００によって抽出される画像データの特徴量はまた、例えば、駆動部に挿入された記録媒体、又は、記憶部４９０に記録されてもよい。 The feature amount of the image data extracted by the feature extraction apparatus 100 according to the present embodiment is output from the display unit 420 or the network, for example. The feature amount of the image data extracted by the feature extraction device 100 according to the present embodiment may also be recorded in, for example, a recording medium inserted in the drive unit or the storage unit 490.

図１の特徴抽出装置１００のプログラムは、ＲＯＭ４０８、及び、記憶部４９０等の不揮発性の記憶装置に格納される他に、ＣＤ，ＤＶＤ等の記録媒体に記録され、駆動部に挿入されることにより、コンピュータがそのプログラムを読み取って実行されてもよい。 The program of the feature extraction apparatus 100 in FIG. 1 is stored in a non-volatile storage device such as the ROM 408 and the storage unit 490, or is recorded on a recording medium such as a CD or DVD, and is inserted into the drive unit. Thus, the computer may read and execute the program.

なお、本発明は、上記実施の形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化することができる。また、上記実施の形態に開示されている複数の構成要素の適宜な組み合わせにより、種々の発明を形成することができる。例えば、実施の形態に示される全構成要素からいくつかの構成要素を削除してもよい。さらに、異なる実施の形態にわたる構成要素を適宜組み合わせても良い。 It should be noted that the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying the constituent elements without departing from the scope of the invention in the implementation stage. In addition, various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in the embodiment. Furthermore, constituent elements over different embodiments may be appropriately combined.

104 pixel feature calculation unit
105 in-image feature calculation unit
107 generation unit
113 detection unit

Claims

A detection unit for detecting partial images corresponding to each other between a plurality of images;
A pixel feature calculation unit that calculates a pixel feature amount for each of the plurality of partial images and for each pixel included in the partial image;
An in-image feature calculation unit that calculates, for each partial image and for each pixel included in the partial image, a co-occurrence frequency between the calculated pixel feature amount and the pixel feature amount calculated for another pixel separated from the pixel by a predetermined distance, and calculates an in-image feature amount whose elements are the co-occurrence frequencies calculated for the respective pixels;
A generation unit that generates an image feature amount representing a feature amount that includes elements obtained by integrating into one, for each corresponding pixel, the co-occurrence frequencies included in each of the plurality of in-image feature amounts calculated for the plurality of partial images;
A feature extraction apparatus comprising:

Further comprising an inter-image feature calculation unit that calculates a co-occurrence frequency between the pixel feature amount calculated for a pixel included in a first partial image among the plurality of partial images and the pixel feature amount calculated for a pixel included in a second partial image among the plurality of partial images, and calculates an inter-image feature amount representing a feature amount that includes the calculated co-occurrence frequency,
Wherein the generation unit generates the image feature amount including the plurality of in-image feature amounts calculated for the plurality of partial images and the inter-image feature amount;
The feature extraction apparatus according to claim 1.

A parallax calculator that calculates parallax between the plurality of images;
The detection unit is a partial image included in the plurality of images, and detects the partial image in which the parallax uniformity of each point included in the partial image is greater than a predetermined threshold;
The feature extraction apparatus according to claim 1.

A feature extraction program for causing a computer to function as:
A detection unit for detecting partial images corresponding to each other between a plurality of images;
A pixel feature calculation unit that calculates a pixel feature amount for each of the plurality of partial images and for each pixel included in the partial image;
An in-image feature calculation unit that calculates, for each partial image and for each pixel included in the partial image, a co-occurrence frequency between the calculated pixel feature amount and the pixel feature amount calculated for another pixel separated from the pixel by a predetermined distance, and calculates an in-image feature amount whose elements are the co-occurrence frequencies calculated for the respective pixels; and
A generation unit that generates an image feature amount representing a feature amount that includes elements obtained by integrating into one, for each corresponding pixel, the co-occurrence frequencies included in each of the plurality of in-image feature amounts calculated for the plurality of partial images.

A detection step in which the detection unit detects partial images corresponding to each other between the plurality of images;
A pixel feature calculation step in which a pixel feature calculation unit calculates a pixel feature amount for each of the plurality of partial images and for each pixel included in the partial image;
An in-image feature calculation step in which an in-image feature calculation unit calculates, for each partial image and for each pixel included in the partial image, a co-occurrence frequency between the calculated pixel feature amount and the pixel feature amount calculated for another pixel separated from the pixel by a predetermined distance, and calculates an in-image feature amount whose elements are the co-occurrence frequencies calculated for the respective pixels;
A generation step in which a generation unit generates an image feature amount representing a feature amount that includes elements obtained by integrating into one, for each corresponding pixel, the co-occurrence frequencies included in each of the plurality of in-image feature amounts calculated for the plurality of partial images;
A feature extraction method comprising: