JP2011253354A

JP2011253354A - Image processing apparatus, method and program

Info

Publication number: JP2011253354A
Application number: JP2010126779A
Authority: JP
Inventors: Takuo Kawai; 拓郎川合; Yasutaka Hirasawa; 康孝平澤
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2010-06-02
Filing date: 2010-06-02
Publication date: 2011-12-15

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of recognizing the category of an object in an image. SOLUTION: A distance detecting section 32 acquires, per pixel of an image acquired by an image acquisition section 31, information on the distance of an object from the imaging device at the time the image was captured. A region dividing section 41 included in an object recognition section 36 divides the image, based on the image and the distance information, into a foreground region containing the object and a background region containing everything else. A background-effect reducing feature amount extraction section 42 extracts a feature amount that reduces the effect of the background on category recognition of the object, based on any one or all of the image, the distance information, and the divided background region. A recognition processing section 43 recognizes the category of the foreground object in the image based on the extracted feature amount. The invention is applicable to an image processing apparatus.

Description

The present invention relates to an image processing apparatus, method, and program, and more particularly to an image processing apparatus, method, and program that improve the accuracy of recognizing the type of an object that is the subject of an image.

Conventional object recognition techniques include one that takes a human palm as the recognition target and identifies the shape of the hand, such as rock, scissors, or paper.

To improve the accuracy of such object recognition techniques, some techniques use specific conditions or additional information together with the image. As one example, a technique has been proposed that uses skin color information to make the hand region easier to obtain and thereby improve the accuracy of hand shape recognition (see Patent Document 1).

Patent Document 1: JP 2007-148663 A

However, because the technique of Patent Document 1 uses skin color information, the accuracy of its recognition processing is easily affected by lighting and similar conditions.

The present invention has been made in view of these circumstances and, in particular, makes it possible to recognize with high accuracy the type of an object that is the subject of an image, even for images captured under various environments.

An image processing apparatus according to one aspect of the present invention includes: image acquisition means for acquiring an image; distance acquisition means for acquiring, in units of pixels of the image, information on the distance of a subject captured in the image from the imaging device at the time the image was captured; region dividing means for dividing the image, based on the image and the distance information, into a foreground region containing the subject and a background region containing everything else; background influence reducing feature amount extraction means for extracting, based on all or any of the image, the distance information, and the information on the background region divided by the region dividing means, a feature amount that reduces the influence of the background on recognition of the type of the subject; and recognition means for recognizing the type of the subject in the foreground of the image based on the feature amount extracted by the background influence reducing feature amount extraction means.

The background influence reducing feature amount extraction means may include smoothing means for smoothing, based on the image and the information on the background region divided by the region dividing means, the pixels of the image that correspond to the background region. It may then extract the feature amount that reduces the influence of the background on recognition of the type of the subject from the image in which those background pixels have been smoothed.

The smoothing means may smooth each pixel of the background region divided by the region dividing means with a strength that depends on the ratio of background pixels to foreground pixels among its neighboring pixels.

The background influence reducing feature amount extraction means may include feature amount extraction means for extracting a feature amount of the image, and weight setting means for setting a weight based on the image, the distance information, and the information on the background region divided by the region dividing means, and for applying that weight to the feature amount. The weighted feature amount may then be extracted as the feature amount that reduces the influence of the background on recognition of the type of the subject.

The feature amount extraction means may extract HOG feature amounts.

An image processing method according to one aspect of the present invention is an image processing method for an image processing apparatus that includes: image acquisition means for acquiring an image; distance acquisition means for acquiring, in units of pixels of the image, information on the distance of a subject captured in the image from the imaging device at the time the image was captured; region dividing means for dividing the image, based on the image and the distance information, into a foreground region containing the subject and a background region containing everything else; background influence reducing feature amount extraction means for extracting, based on all or any of the image, the distance information, and the information on the background region divided by the region dividing means, a feature amount that reduces the influence of the background on recognition of the type of the subject; and recognition means for recognizing the type of the subject in the foreground of the image based on the feature amount extracted by the background influence reducing feature amount extraction means. The method includes: an image acquisition step in which the image acquisition means acquires the image; a distance acquisition step in which the distance acquisition means acquires, in units of pixels of the image, information on the distance of the subject captured in the image from the imaging device at the time the image was captured; a region dividing step in which the region dividing means divides the image, based on the image and the distance information, into a foreground region containing the subject and a background region containing everything else; a background influence reducing feature amount extraction step in which the background influence reducing feature amount extraction means extracts, based on all or any of the image, the distance information, and the information on the background region divided in the region dividing step, a feature amount that reduces the influence of the background on recognition of the type of the subject; and a recognition step in which the recognition means recognizes the type of the subject in the foreground of the image based on the feature amount extracted in the background influence reducing feature amount extraction step.

A program according to one aspect of the present invention is for a computer that controls an image processing apparatus including: image acquisition means for acquiring an image; distance acquisition means for acquiring, in units of pixels of the image, information on the distance of a subject captured in the image from the imaging device at the time the image was captured; region dividing means for dividing the image, based on the image and the distance information, into a foreground region containing the subject and a background region containing everything else; background influence reducing feature amount extraction means for extracting, based on all or any of the image, the distance information, and the information on the background region divided by the region dividing means, a feature amount that reduces the influence of the background on recognition of the type of the subject; and recognition means for recognizing the type of the subject in the foreground of the image based on the feature amount extracted by the background influence reducing feature amount extraction means. The program causes the computer to execute processing that includes: an image acquisition step in which the image acquisition means acquires the image; a distance acquisition step in which the distance acquisition means acquires, in units of pixels of the image, information on the distance of the subject captured in the image from the imaging device at the time the image was captured; a region dividing step in which the region dividing means divides the image, based on the image and the distance information, into a foreground region containing the subject and a background region containing everything else; a background influence reducing feature amount extraction step in which the background influence reducing feature amount extraction means extracts, based on all or any of the image, the distance information, and the information on the background region divided in the region dividing step, a feature amount that reduces the influence of the background on recognition of the type of the subject; and a recognition step in which the recognition means recognizes the type of the subject in the foreground of the image based on the feature amount extracted in the background influence reducing feature amount extraction step.

In one aspect of the present invention, an image is acquired; information on the distance of the subject captured in the image from the imaging device at the time the image was captured is acquired in units of pixels of the image; the image is divided, based on the image and the distance information, into a foreground region containing the subject and a background region containing everything else; a feature amount that reduces the influence of the background on recognition of the type of the subject is extracted based on all or any of the image, the distance information, and the information on the divided background region; and the type of the subject in the foreground of the image is recognized based on the extracted feature amount.

The image processing apparatus of the present invention may be an independent apparatus or a block that performs image processing.

According to one aspect of the present invention, it is possible to improve the accuracy of identifying the type of an object that is the subject of images captured under various environments.

FIG. 1 is a block diagram showing a configuration example of one embodiment of an image processing apparatus to which the present invention is applied.
FIG. 2 is a flowchart explaining the object recognition processing performed by the image processing apparatus of FIG. 1.
FIG. 3 is a diagram explaining the object recognition processing performed by the image processing apparatus of FIG. 1.
FIG. 4 is a flowchart explaining the background influence reducing feature amount extraction processing performed by the image processing apparatus of FIG. 1.
FIG. 5 is a flowchart explaining the image processing performed by the image processing apparatus of FIG. 1.
FIG. 6 is a flowchart explaining the feature amount extraction processing performed by the image processing apparatus of FIG. 1.
FIG. 7 is a diagram explaining the feature amount extraction processing.
FIG. 8 is a block diagram showing a configuration example of another embodiment of an image processing apparatus to which the present invention is applied.
FIG. 9 is a flowchart explaining the background influence reducing feature amount extraction processing performed by the image processing apparatus of FIG. 8.
FIG. 10 is a block diagram showing a configuration example of yet another embodiment of an image processing apparatus to which the present invention is applied.
FIG. 11 is a flowchart explaining the feature amount extraction processing performed by the image processing apparatus of FIG. 10.
FIG. 12 is a diagram explaining a configuration example of a general-purpose personal computer.

The best mode for carrying out the invention (hereinafter referred to as the embodiment) is described below. The description is given in the following order.
1. First embodiment (example in which the loss of recognition accuracy is reduced by both image processing and feature amount extraction processing)
2. Second embodiment (example in which the loss of recognition accuracy is reduced by feature amount extraction processing alone)
3. Third embodiment (example in which the loss of recognition accuracy is reduced by image processing alone)

<1. First Embodiment>
[Configuration example of image processing apparatus]
FIG. 1 shows a configuration example of one embodiment of the hardware of an image processing apparatus to which the present invention is applied. The image processing apparatus 11 in FIG. 1 recognizes the type of an object that is the subject of images captured under various environments.

The image processing apparatus 11 includes an image acquisition unit 31, a distance detection unit 32, a target position detection unit 33, enlargement/reduction units 34 and 35, and an object recognition unit 36.

The image acquisition unit 31 captures a plurality of images in real time, for example as stereo images, using an image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, and supplies them to the distance detection unit 32 and the target position detection unit 33. Instead of stereo images captured in real time, the image acquisition unit 31 may acquire at least one image captured in advance; in that case, it supplies the acquired image only to the target position detection unit 33.

When the image acquisition unit 31 captures a plurality of images such as stereo images in real time, the distance detection unit 32 obtains the parallax at corresponding points by a corresponding-point search between the captured images. From that parallax, the distance detection unit 32 obtains, in units of pixels, the distance from the image acquisition unit 31 to the subject included in the captured image, and supplies it as distance information to the target position detection unit 33 and the enlargement/reduction unit 35. When stereo images or the like cannot be acquired in real time from the image acquisition unit 31 and a previously captured image is supplied instead, the distance detection unit 32 detects the distance to the subject at the time of imaging by a method that does not rely on the images, for example the TOF (Time of Flight) method, and supplies it as distance information to the target position detection unit 33 and the enlargement/reduction unit 35. Furthermore, the distance detection unit 32 acquires, as distance information, the distance from the imaging position to the subject in the image as detected by, for example, a distance measuring sensor at the time the image acquired by the image acquisition unit 31 was captured, and supplies it to the target position detection unit 33 and the enlargement/reduction unit 35.
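For the stereo case, the step from parallax to per-pixel distance can be sketched as follows. This is a minimal illustration, assuming a rectified stereo pair and the standard pinhole relation Z = f * B / d; the patent text does not state a formula, and the names `disparity_to_distance`, `focal_length_px`, and `baseline_m` are hypothetical.

```python
def disparity_to_distance(disparity, focal_length_px, baseline_m):
    """Convert a per-pixel disparity map (pixels) into distances (meters).

    Assumes a rectified stereo pair, so that Z = f * B / d holds.
    Pixels with no valid disparity (d <= 0) are reported as infinitely far.
    """
    distance = []
    for row in disparity:
        distance.append([
            focal_length_px * baseline_m / d if d > 0 else float("inf")
            for d in row
        ])
    return distance


# Hypothetical 2x2 disparity map, focal length, and baseline.
disparity = [[8.0, 4.0], [0.0, 16.0]]
depth = disparity_to_distance(disparity, focal_length_px=800.0, baseline_m=0.1)
```

Larger disparities map to nearer points, so the resulting map can be used directly as the distance information supplied to the later units.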

The target position detection unit 33 detects a target position in the image based on the image supplied from the image acquisition unit 31 and the distance information supplied from the distance detection unit 32, cuts out the required region of the image based on the target position information, and supplies it to the enlargement/reduction unit 34. Here, the target is the object, as a subject, that is to be recognized, and the target position is the set of coordinates indicating the position of that subject in three-dimensional coordinate space. For example, when the image acquisition unit 31 is an image sensor, the target position detection unit 33 obtains as the target position the region (or point) closest to the image sensor or distance measuring sensor, or a region satisfying a predetermined distance condition, by estimation using an arbitrary probabilistic model such as the sequential Monte Carlo method. The target position detection unit 33 determines the required region of the image acquired by the image acquisition unit 31 and cuts it out, but because its purpose is only to specify the processing region, it is not essential. In particular, when the subject is expected to occupy more than a predetermined region of the acquired image, this processing is unnecessary, and the target position detection unit 33 may be omitted from the configuration.

The enlargement/reduction units 34 and 35 enlarge or reduce the image and the distance information, respectively, as required by the processing, and supply them to the object recognition unit 36. More specifically, the enlargement/reduction units 34 and 35 enlarge or reduce the image and the distance information by the nearest neighbor method, the bilinear method, the bicubic method, a Lanczos filter, or the like. Because enlargement or reduction is performed only when needed, the enlargement/reduction units 34 and 35 may also output their inputs at the original size.
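Of the resampling methods listed, the nearest neighbor method is the simplest, and a minimal sketch is given here. The function name `resize_nearest` and the index mapping are illustrative assumptions, not taken from the patent. One reason it may suit the distance information is that it only copies existing values, so it does not create intermediate distances at the boundary between a near object and a far background.

```python
def resize_nearest(grid, new_h, new_w):
    """Resize a 2-D list to new_h x new_w by nearest-neighbor sampling.

    Each output pixel copies the source pixel whose index is the scaled
    output index, rounded down and clamped to the source bounds.
    """
    h, w = len(grid), len(grid[0])
    return [
        [grid[min(h - 1, y * h // new_h)][min(w - 1, x * w // new_w)]
         for x in range(new_w)]
        for y in range(new_h)
    ]


# Hypothetical 2x2 distance map enlarged to 4x4.
small = [[1.0, 2.0], [3.0, 4.0]]
large = resize_nearest(small, 4, 4)
```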

The object recognition unit 36 identifies and recognizes the type of the object that is the subject of the image, based on the image supplied from the enlargement/reduction unit 34 and the distance information supplied from the enlargement/reduction unit 35, and outputs the result as a recognition result. More specifically, the object recognition unit 36 includes a region division unit 41, a background influence reducing feature amount extraction unit 42, and a recognition processing unit 43.

The region division unit 41 divides the image supplied from the enlargement/reduction unit 34 into a foreground region and a background region based on the distance information supplied from the enlargement/reduction unit 35, and supplies the result as region division information to the background influence reducing feature amount extraction unit 42. More specifically, the region division unit 41, for example, obtains the median of the distances of all pixels in the distance information, treats pixels nearer than the median as the foreground region, and treats pixels farther than the median as the background region. Other methods of dividing foreground and background may also be used. For example, pixels nearer than a predetermined distance may be treated as the foreground region and pixels farther than that as the background region. Alternatively, distance may be combined with any object extraction method such as image segmentation, so that, for example, pixels nearer than the distance of a predetermined object are treated as the foreground region and the rest as the background region. The region division information is flag information indicating, for each pixel, whether it belongs to the foreground region or the background region. Accordingly, when the entire image is divided into the two regions of foreground and background, the region division information consists of flags indicating the foreground region and flags indicating the background region. The region division unit 41 may also divide the image into more regions according to distance, beyond foreground and background; in that case, the region division information consists of flags indicating the regions into which the image is divided according to distance.
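The median-based division described above can be sketched as follows. This is a minimal illustration; the flag values `FOREGROUND` and `BACKGROUND` and the function name `split_by_median` are hypothetical, and assigning a pixel whose distance exactly equals the median to the background is an assumption, since the text only covers pixels nearer or farther than the median.

```python
from statistics import median

FOREGROUND, BACKGROUND = 1, 0  # hypothetical flag values


def split_by_median(distance):
    """Return per-pixel flags: pixels nearer than the median are foreground."""
    threshold = median(d for row in distance for d in row)
    return [
        [FOREGROUND if d < threshold else BACKGROUND for d in row]
        for row in distance
    ]


# Hypothetical distance map (meters): a near object in the top-left corner.
distance = [
    [1.0, 1.2, 5.0],
    [1.1, 4.8, 5.2],
    [4.9, 5.1, 5.3],
]
labels = split_by_median(distance)
```

Replacing `threshold` with a fixed value gives the alternative in which pixels nearer than a predetermined distance form the foreground region.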

Based on the image supplied from the enlargement/reduction unit 34, the distance information supplied from the enlargement/reduction unit 35, and the region division information supplied from the region division unit 41, the background influence reducing feature amount extraction unit 42 extracts a feature amount that reduces the loss of accuracy, caused by the background region, in recognizing the type of the object in the foreground region, and outputs it to the recognition processing unit 43. More specifically, the background influence reducing feature amount extraction unit 42 includes an image processing unit 51 and a feature amount extraction unit 52.

The image processing unit 51 applies smoothing to the pixels of the image that belong to the background region, based on the region division information and the distance information. More specifically, the image processing unit 51 includes a ratio calculation unit 61, a ratio strength calculation unit 62, a distance strength calculation unit 63, and a smoothing unit 64. The ratio calculation unit 61 sets a local region for each pixel belonging to the background region and obtains, as the ratio, the proportion of pixels in that local region that belong to the background region. The ratio strength calculation unit 62 calculates the smoothing strength to be used by the smoothing unit 64, as a ratio strength, based on the ratio obtained by the ratio calculation unit 61. The distance strength calculation unit 63 calculates the smoothing strength to be used by the smoothing unit 64, as a distance strength, based on the distance information. The smoothing unit 64 smooths the pixels belonging to the background region with a strength corresponding to the ratio strength and the distance strength.
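The ratio calculation and ratio-dependent smoothing can be sketched as follows. This is only an illustration under stated assumptions: this section gives no formula for the ratio strength or the distance strength, so the sketch uses the background ratio itself as the blend strength, omits the distance strength entirely, and uses a box average over a square window as the smoother. The names `background_ratio` and `smooth_background` and the `radius` parameter are hypothetical.

```python
BACKGROUND = 0  # hypothetical flag value; any other value counts as foreground


def window(h, w, y, x, radius):
    """Yield the coordinates of a square local region clipped to the image."""
    for j in range(max(0, y - radius), min(h, y + radius + 1)):
        for i in range(max(0, x - radius), min(w, x + radius + 1)):
            yield j, i


def background_ratio(labels, y, x, radius=1):
    """Proportion of pixels in the local region that belong to the background."""
    coords = list(window(len(labels), len(labels[0]), y, x, radius))
    return sum(labels[j][i] == BACKGROUND for j, i in coords) / len(coords)


def smooth_background(image, labels, radius=1):
    """Blend each background pixel toward its local mean; keep the foreground."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if labels[y][x] != BACKGROUND:
                continue
            values = [image[j][i] for j, i in window(h, w, y, x, radius)]
            mean = sum(values) / len(values)
            # Assumed mapping: strength equals the background ratio.
            strength = background_ratio(labels, y, x, radius)
            out[y][x] = (1.0 - strength) * image[y][x] + strength * mean
    return out


# Hypothetical 2x2 image with one bright foreground pixel.
image = [[100.0, 0.0], [0.0, 0.0]]
labels = [[1, 0], [0, 0]]
smoothed = smooth_background(image, labels)
```

With this assumed mapping, background pixels surrounded mostly by other background pixels are smoothed most strongly, while those adjacent to the foreground keep more of their original value.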

Based on the region division information and the distance information, the feature amount extraction unit 52 extracts HOG (Histogram of Oriented Gradients) feature amounts, as the feature amounts required for recognition processing, from the image whose background region has been smoothed by the image processing unit 51. More specifically, the feature amount extraction unit 52 includes a gradient calculation unit 71, a ratio calculation unit 72, a ratio weight calculation unit 73, a distance weight calculation unit 74, a histogram generation unit 75, and a normalization unit 76. The gradient calculation unit 71 obtains the magnitude and direction of the gradient for each pixel of the image supplied from the image processing unit 51. Like the ratio calculation unit 61, the ratio calculation unit 72 sets a local region for each pixel and obtains, as the ratio, the proportion of pixels in that local region that belong to the background region. The ratio weight calculation unit 73 sets a ratio weight to be applied to the gradient magnitude of each pixel, based on the ratio calculated by the ratio calculation unit 72. The distance weight calculation unit 74 sets a distance weight to be applied to the gradient magnitude of each pixel according to the distance of that pixel, based on the distance information. The histogram generation unit 75 obtains a frequency for each gradient direction, in units of cells made up of a plurality of pixels, by applying the ratio weight and the distance weight to the gradient magnitude, and generates a histogram from the obtained frequencies. The normalization unit 76 normalizes the generated histograms in units of blocks made up of a plurality of cells and supplies the result to the recognition processing unit 43 as the HOG feature amount. Although this example extracts HOG feature amounts, any feature amount usable for object recognition may be used instead, for example Haar-like feature amounts, SIFT feature amounts, or Gabor feature amounts.
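The weighted histogram voting can be sketched for a single cell as follows. This is a simplified illustration, not the patent's implementation: it assumes central-difference gradients, nine unsigned orientation bins over 0 to 180 degrees, and L2 normalization of one histogram rather than of a block of several cells. The per-pixel `weights` argument stands in for the product of the ratio weight and the distance weight, whose formulas this section does not give. All function names are hypothetical.

```python
import math


def weighted_cell_histogram(image, weights, bins=9):
    """Orientation histogram of one cell, with each vote scaled by a weight."""
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            index = min(int(angle / (180.0 / bins)), bins - 1)
            # The vote is the gradient magnitude scaled by the pixel weight.
            hist[index] += magnitude * weights[y][x]
    return hist


def l2_normalize(hist, eps=1e-6):
    """L2-normalize a histogram; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in hist) + eps)
    return [v / norm for v in hist]


# Hypothetical 3x3 cell containing a horizontal intensity ramp.
cell = [[0.0, 1.0, 2.0]] * 3
weights = [[1.0, 1.0, 1.0], [1.0, 0.5, 1.0], [1.0, 1.0, 1.0]]
hist = weighted_cell_histogram(cell, weights)
feature = l2_normalize(hist)
```

A weight near zero for pixels that lie in or near the background suppresses their gradient votes, which is how the weighting reduces the background's contribution to the feature amount.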

Based on the background influence reducing feature amount supplied from the background influence reducing feature amount extraction unit 42, the recognition processing unit 43 executes recognition processing that identifies the type of the object, using for example an SVM (Support Vector Machine) or Boosting, and outputs the recognition result.

[Object recognition processing]
Next, the object recognition processing is described with reference to the flowchart of FIG. 2.

In step S11, the image acquisition unit 31 acquires an image that includes the object serving as the subject and supplies it to the distance detection unit 32 and the target position detection unit 33.

In step S12, the distance detection unit 32 detects per-pixel distance information for each object, as a subject, included in the image acquired by the image acquisition unit 31, and supplies it to the target position detection unit 33 and the enlargement/reduction unit 35. More specifically, when, for example, the image acquisition unit 31 includes an image sensor and captures the same subject in real time from a plurality of angles, as with stereo images, the distance detection unit 32 performs a corresponding-point search across the plurality of images, uses the resulting parallax to obtain the distance for each pixel, and detects it as distance information.

In step S13, the target position detection unit 33 detects a target position in the image based on the image supplied from the image acquisition unit 31 and the distance information supplied from the distance detection unit 32. In step S14, it cuts out and adjusts the necessary region of the image based on the target position information and supplies the result to the enlargement/reduction unit 34.

In step S15, the enlargement/reduction units 34 and 35 enlarge or reduce, respectively, the image adjusted by the target position detection unit 33 based on the target position and the distance information to a size suitable for processing, and supply them to the object recognition unit 36. The image and distance information supplied from the enlargement/reduction units 34 and 35 have merely had their sizes adjusted over a region suitable for processing, so they are hereinafter referred to simply as the image and the distance information.

In step S16, the region division unit 41 in the object recognition unit 36 divides the image, pixel by pixel, into a foreground region and a background region based on the distance information. More specifically, the region division unit 41 classifies, for example, each pixel whose distance is shorter than a predetermined distance into the foreground region and every other pixel into the background region. For example, when the image A1 in FIG. 3 is acquired and the distance information A2 in FIG. 3 is detected, the region division information A3 in FIG. 3 is obtained. Here, the image A1 is an image in which an object Z1, a palm that is the subject, exists in front of a background Z2. In the distance information A2, the entire region where the palm object Z1 exists is drawn in a shade Z11 indicating a predetermined distance, and the background Z2 is drawn in a shade Z12 that becomes blacker as the distance of each pixel in the space increases. In the region division information A3, the palm, which is nearer than a predetermined distance set slightly behind the region where the palm object Z1 exists, is displayed as a white foreground region Z21, and everything beyond that distance is displayed in gray as a background region Z22. As a result, the region corresponding to the palm object Z1 is separated as the foreground region Z21, and the rest is separated as the background region Z22.
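The thresholding in step S16 can be sketched as follows. This is only an illustration of the rule "nearer than a predetermined distance is foreground"; the names `divide_regions`, `FOREGROUND`, and `BACKGROUND` and the sample values are hypothetical.

```python
FOREGROUND = 1
BACKGROUND = 0


def divide_regions(distance_map, threshold):
    """Label each pixel as foreground if it is nearer than `threshold`."""
    return [
        [FOREGROUND if d < threshold else BACKGROUND for d in row]
        for row in distance_map
    ]


# Hypothetical distance map: a near object (0.5) in front of a far background.
distance_map = [
    [3.0, 3.0, 2.5],
    [3.0, 0.5, 0.5],
    [2.8, 0.5, 0.5],
]
region_map = divide_regions(distance_map, threshold=1.0)
```

Placing the threshold slightly behind the object, as the description does for the palm, keeps the whole object in the foreground even if its surface is not flat.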

In step S17, the background effect reduction feature amount extraction unit 42 executes the background effect reduction feature amount extraction processing and extracts the background effect reduction feature amount from the image based on the distance information and the region division information.

[Background effect reduction feature amount extraction processing]
Here, the background effect reduction feature amount extraction processing will be described with reference to the flowchart of FIG. 4. This processing consists of the image processing executed by the image processing unit 51 in step S31 and the feature amount extraction processing executed by the feature amount extraction unit 52 in step S32.

[Image processing]
Here, the image processing will be described with reference to the flowchart of FIG. 5.

In step S51, the image processing unit 51 initializes to 0 the counter y of the vertical coordinate of the image, that is, the y coordinate.

In step S52, the image processing unit 51 initializes to 0 the counter x of the horizontal coordinate of the image, that is, the x coordinate.

In step S53, the image processing unit 51 determines, based on the distance information, whether the pixel I(x, y) to be processed in the image belongs to the background region; if, for example, the pixel carries a flag indicating the background region, the processing proceeds to step S54.

In step S54, the image processing unit 51 controls the ratio calculation unit 61 to set a local region corresponding to the pixel I(x, y). That is, the ratio calculation unit 61 sets, for example, a square region of 6 × 6 pixels centered on the pixel I(x, y) to be processed as the local region.

In step S55, the ratio calculation unit 61 calculates the ratio q between the number of pixels belonging to the foreground region and the number of pixels belonging to the background region among the pixels in the local region.

In step S56, the image processing unit 51 controls the ratio intensity calculation unit 62 to calculate the ratio intensity St_q of the smoothing for the pixel I(x, y) to be processed, based on the ratio q. For example, when the ratio q indicates a high proportion of foreground, few background pixels that affect object recognition are expected around the pixel I(x, y), so the ratio intensity calculation unit 62 makes the ratio intensity St_q weaker. Conversely, when the ratio q indicates a high proportion of background, many background pixels that affect object recognition are expected around the pixel I(x, y), so the ratio intensity calculation unit 62 makes the ratio intensity St_q stronger.

In step S57, the image processing unit 51 controls the distance intensity calculation unit 63 to calculate the distance intensity St_d of the smoothing for the pixel I(x, y) to be processed, based on the distance given by the distance information. That is, the farther the pixel I(x, y) is, the more likely it belongs to the background region, so the distance intensity calculation unit 63 makes the distance intensity St_d stronger; conversely, the nearer the pixel is, the weaker the distance intensity St_d used for smoothing.

In step S58, the image processing unit 51 controls the smoothing unit 64 to smooth the pixel I(x, y) to be processed with a strength corresponding to the ratio intensity St_q and the distance intensity St_d. Since the smoothing strength thus depends on both the ratio q and the distance, the smoothing unit 64 may, for example, set the smoothing strength based on the product St_q × St_d of the ratio intensity St_q and the distance intensity St_d.

On the other hand, if it is determined in step S53 that the pixel I(x, y) to be processed does not carry the background region flag, steps S54 to S58 are skipped.

For example, when the pixel I(x, y) to be processed is the pixel B1 in the region division information A4 in FIG. 3, the pixel B1 belongs to the foreground region, so it is determined in step S53 that the pixel does not carry the background region flag, and steps S54 to S58 are skipped.

On the other hand, when the pixel I(x, y) to be processed is the pixel B2 or B3 in the region division information A4 in FIG. 3, the pixels B2 and B3 belong to the background region, so each is smoothed according to the proportion of background in its local region Z102 or Z103. The local region Z103 consists entirely of background, whereas only part of the local region Z102 is background. The pixel B3 is therefore smoothed more strongly than the pixel B2.

In step S59, the image processing unit 51 increments by 1 the counter x of the horizontal coordinate of the image, that is, the x coordinate.

In step S60, the image processing unit 51 determines whether the counter x is smaller than the horizontal size Sx of the image; if it is, the processing returns to step S53. That is, steps S53 to S60 are repeated while the counter x is smaller than the horizontal size Sx. When it is determined in step S60 that the counter x is not smaller than the horizontal size Sx, the processing proceeds to step S61.

In step S61, the image processing unit 51 increments by 1 the counter y of the vertical coordinate of the image, that is, the y coordinate.

In step S62, the image processing unit 51 determines whether the counter y is smaller than the vertical size Sy of the image; if it is, the processing returns to step S52. That is, steps S52 to S62 are repeated while the counter y is smaller than the vertical size Sy. When it is determined in step S62 that the counter y is not smaller than the vertical size Sy, the processing ends.

That is, based on the region division information, each pixel in the background region is smoothed with a strength that depends on the ratio between background and foreground in its local region and on the distance given by the distance information. This reduces the influence of background pixels on the recognition of the object type; because the feature amount is then extracted from an image in which that influence has been reduced, the recognition rate of the object type recognition processing can be improved. The smoothing strength may also be controlled by the ratio alone or by the distance alone, since either one by itself suppresses the loss of recognition accuracy caused by the background region.

[Feature amount extraction processing]
Next, the feature amount extraction processing will be described with reference to the flowchart of FIG. 6.
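The image processing of steps S51 to S62 can be sketched as follows. This is a minimal illustration under stated assumptions, not the described implementation: the description does not give formulas for St_q, St_d, or the smoothing filter, so this sketch assumes St_q is the background fraction of the local region, St_d is the distance clipped to [0, 1] after division by a hypothetical `max_distance`, and smoothing blends the pixel toward the local mean with weight St_q × St_d. A (2 × radius + 1) square window clipped at the image border stands in for the 6 × 6 local region.

```python
def local_window(x, y, width, height, radius):
    """Coordinates of the square local region around (x, y), clipped to the image."""
    xs = range(max(0, x - radius), min(width, x + radius + 1))
    ys = range(max(0, y - radius), min(height, y + radius + 1))
    return [(i, j) for j in ys for i in xs]


def smooth_background(image, distance, is_foreground, radius=1, max_distance=10.0):
    """Smooth only background pixels, more strongly where background dominates."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):                      # S51, S61, S62: loop over rows
        for x in range(width):                   # S52, S59, S60: loop over columns
            if is_foreground[y][x]:              # S53: foreground pixels are skipped
                continue
            window = local_window(x, y, width, height, radius)           # S54
            n_bg = sum(1 for i, j in window if not is_foreground[j][i])  # S55
            st_q = n_bg / len(window)                                    # S56 (assumed form)
            st_d = min(1.0, distance[y][x] / max_distance)               # S57 (assumed form)
            strength = st_q * st_d                                       # S58: St_q x St_d
            mean = sum(image[j][i] for i, j in window) / len(window)
            out[y][x] = (1.0 - strength) * image[y][x] + strength * mean
    return out


# Hypothetical 3x3 image with one bright pixel, all background, all far away.
image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
far = [[10.0] * 3 for _ in range(3)]
all_background = [[False] * 3 for _ in range(3)]
all_foreground = [[True] * 3 for _ in range(3)]
smoothed = smooth_background(image, far, all_background)
untouched = smooth_background(image, far, all_foreground)
```

With these assumptions a pixel surrounded only by distant background is replaced by its local mean, while foreground pixels pass through unchanged.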

In step S71, the feature amount extraction unit 52 initializes to 0 the counter y of the vertical coordinate of the image, that is, the y coordinate.

In step S72, the feature amount extraction unit 52 initializes to 0 the counter x of the horizontal coordinate of the image, that is, the x coordinate.

In step S73, the feature amount extraction unit 52 controls the gradient calculation unit 71 to calculate the angle and magnitude of the gradient for the pixel I(x, y) to be processed. More specifically, when, for example, the pixel I(x, y) to be processed is the pixel C1 in FIG. 7, the gradient calculation unit 71 obtains the gradient by calculating the following equation (1) using the pixels I(x, y+1) and I(x, y-1) adjacent in the vertical direction 131 and the pixels I(x+1, y) and I(x-1, y) adjacent in the horizontal direction 132. The upper left of FIG. 7 shows the arrangement used for the gradient of the pixel I(x, y) to be processed, and the upper center of FIG. 7 shows a configuration example of a cell 102 of m × n pixels containing the pixel 121, which is the processing target pixel C1. The upper right of FIG. 7 shows a configuration example of a block 101 composed of M × N cells 111, each corresponding to the cell 102, and the lower part shows an example of a histogram 103 composed of the frequencies for each of the gradient directions L1 to LL of the pixels constituting the cell 102.
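Equation (1) is not reproduced in this text. A reconstruction consistent with the four adjacent pixels named in step S73 and with the definitions of gx(x, y) and gy(x, y), assuming plain central differences without a scaling factor, is:

```latex
\begin{aligned}
g_x(x, y) &= I(x+1, y) - I(x-1, y) \\
g_y(x, y) &= I(x, y+1) - I(x, y-1)
\end{aligned}
\tag{1}
```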

Here, gx(x, y) and gy(x, y) are the horizontal gradient and the vertical gradient, respectively.

The gradient calculation unit 71 then calculates the magnitude and the angle of the gradient by calculating the following equations (2) and (3), respectively.

Here, r(x, y) is the magnitude of the gradient, and θ(x, y) is the angle of the gradient.
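Equations (2) and (3) are likewise not reproduced in this text. Assuming the usual magnitude and angle of the gradient vector (gx, gy), they can be reconstructed as:

```latex
r(x, y) = \sqrt{g_x(x, y)^2 + g_y(x, y)^2}
\tag{2}
```

```latex
\theta(x, y) = \tan^{-1}\!\left(\frac{g_y(x, y)}{g_x(x, y)}\right)
\tag{3}
```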
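The gradient computation of step S73 can be sketched as follows. This is a minimal illustration assuming central differences and an unsigned orientation folded into [0, 180) degrees, which is common for HOG but is not stated in the description; the function name `gradient` is hypothetical, and the pixel must not lie on the image border.

```python
import math


def gradient(image, x, y):
    """Return (magnitude, angle in degrees within [0, 180)) at an interior pixel."""
    gx = image[y][x + 1] - image[y][x - 1]   # horizontal difference
    gy = image[y + 1][x] - image[y - 1][x]   # vertical difference
    r = math.hypot(gx, gy)
    # atan2 avoids division by zero when gx == 0; folding to [0, 180) is an assumption.
    theta = math.degrees(math.atan2(gy, gx)) % 180.0
    return r, theta


# Hypothetical 3x3 patch; the gradient is evaluated at the center pixel (1, 1).
patch = [[0.0, 0.0, 0.0], [0.0, 0.0, 3.0], [0.0, 4.0, 0.0]]
r, theta = gradient(patch, 1, 1)
```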

In step S74, the feature amount extraction unit 52 controls the ratio calculation unit 72 to set a local region corresponding to the pixel I(x, y) to be processed in the image. That is, the ratio calculation unit 72 sets, for example, a square region of u × v pixels in the neighborhood centered on the pixel I(x, y) to be processed as the local region.

In step S75, the ratio calculation unit 72 calculates the ratio q between the number of pixels belonging to the foreground region and the number of pixels belonging to the background region among the pixels in the local region.

In step S76, the feature amount extraction unit 52 controls the ratio weight calculation unit 73 to calculate the ratio weight wk based on the ratio q.

In step S77, the feature amount extraction unit 52 controls the distance weight calculation unit 74 to calculate the distance weight wd based on the distance of the pixel I(x, y) to be processed in the distance information.

In step S78, the feature amount extraction unit 52 increments by 1 the counter x of the horizontal coordinate of the image, that is, the x coordinate.

In step S79, the feature amount extraction unit 52 determines whether the counter x is smaller than the horizontal size Sx of the image; if it is, the processing returns to step S73. That is, steps S73 to S79 are repeated while the counter x is smaller than the horizontal size Sx. When it is determined in step S79 that the counter x is not smaller than the horizontal size Sx, the processing proceeds to step S80.

In step S80, the feature amount extraction unit 52 increments by 1 the counter y of the vertical coordinate of the image, that is, the y coordinate.

In step S81, the feature amount extraction unit 52 determines whether the counter y is smaller than the vertical size Sy of the image; if it is, the processing returns to step S72. That is, steps S72 to S81 are repeated while the counter y is smaller than the vertical size Sy. When it is determined in step S81 that the counter y is not smaller than the vertical size Sy, the processing proceeds to step S82.

In step S82, the histogram generation unit 75 sets an unprocessed cell as the cell to be processed. That is, as shown in the upper center of FIG. 7, cells each composed of a group of pixels are set in the image, and one of the unprocessed cells is set as the cell to be processed. The upper center of FIG. 7 shows an example in which a cell 102 of m × n pixels is set.

In step S83, the histogram generation unit 75 sets, for the cell to be processed, the frequency for each gradient direction including the ratio weight wk and the distance weight wd. That is, the HOG feature amount is based on a histogram obtained by dividing the gradient direction of each pixel of the cell 102 shown in the upper center of FIG. 7 into the directions L1 to LL and taking the frequency belonging to each gradient direction.

That is, the histogram generation unit 75 sets the frequency fl' for the gradient direction Ll of each pixel as shown in the following equation (4).
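Equation (4) is not reproduced in this text. One reconstruction consistent with the symbol definitions that accompany it, assuming that the two weights multiply the gradient magnitude (the description only says that the weights are applied to it), is:

```latex
f_l' = w_k \, w_d \, r_l(x, y)
\tag{4}
```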

Here, fl' is the frequency for the direction Ll, and wk is the ratio weight, obtained in step S76, based on the ratio between the foreground region and the background region in the local region of each pixel. Further, wd is the distance weight, obtained in step S77, set according to the distance of each pixel in the distance information, and rl(x, y) is the magnitude of the gradient.

The ratio weight wk is set based on the ratio b between the foreground region and the background region, which is given by the following equation (5).

Here, δ(D) is a function that generalizes the region division information D(x, y) and is given by the following equation (6). Further, u and v indicate that the local region is a region of u × v pixels.

That is, the function δ(D) takes the value 1 in the foreground region and 0 in the background region.
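Equations (5) and (6) are not reproduced in this text. A reconstruction consistent with the statements that δ(D) is 1 in the foreground and 0 in the background, that the local region has u × v pixels, and that a larger b means a larger proportion of foreground, is given below. The notation R(x, y) for the u × v local region around the pixel is an assumption introduced here.

```latex
b(x, y) = \frac{1}{u\,v} \sum_{(i, j) \in R(x, y)} \delta\bigl(D(i, j)\bigr)
\tag{5}
```

```latex
\delta(D) =
\begin{cases}
1 & \text{if } D \text{ indicates the foreground region} \\
0 & \text{if } D \text{ indicates the background region}
\end{cases}
\tag{6}
```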

For example, when the ratio weight wk is set in k stages, the ratio weight calculation unit 73 sets the weighted frequency under these conditions as shown, for example, in the following equation (7).

Here, th1, th2, ..., thk are thresholds for setting the k-stage weights, and the ratio weight wk and the threshold thk satisfy the relationship shown in the following equation (8).

That is, the ratio weight wk increases as the ratio b increases: the larger the proportion of the foreground region, the larger the ratio weight wk, and conversely, the smaller the proportion of the foreground region, the smaller the ratio weight wk. Because the ratio weight wk is set so as to reduce the histogram frequencies contributed by background pixels that affect object identification, the influence of background pixels on the feature amount is suppressed and the recognition accuracy can be improved.

Similarly, the distance weight wd is set for each pixel according to the distance information: the larger the distance, the smaller the weight, and conversely, the smaller the distance, the larger the weight. Because the distance weight wd is likewise set so as to reduce the histogram frequencies contributed by background pixels that affect object identification, the influence of background pixels on the feature amount is suppressed and the recognition accuracy can be improved.
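The two weights can be sketched as follows. This is an illustration under stated assumptions: the concrete thresholds, weight values, and the linear fall-off of the distance weight are hypothetical, since the text only requires that wk grow with the foreground ratio b in steps and that wd shrink with distance. For simplicity the sketch uses one fewer threshold than weights.

```python
from bisect import bisect_right


def ratio_weight(b, thresholds, weights):
    """Stepwise ratio weight: a larger foreground ratio b selects a larger weight.

    Assumption: `thresholds` and `weights` are ascending and
    len(weights) == len(thresholds) + 1.
    """
    return weights[bisect_right(thresholds, b)]


def distance_weight(distance, max_distance):
    """Distance weight that decreases linearly from 1 (near) to 0 (far)."""
    return max(0.0, 1.0 - distance / max_distance)


# Two-stage example (k = 2): mostly-background neighborhoods get a small weight.
w_low = ratio_weight(0.2, thresholds=[0.5], weights=[0.2, 1.0])
w_high = ratio_weight(0.9, thresholds=[0.5], weights=[0.2, 1.0])
```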

In step S84, the histogram generation unit 75 generates a histogram for the cell to be processed based on the frequencies that include the ratio weight wk and the distance weight wd obtained in this way. That is, the histogram classifies the gradient direction of each pixel into a predetermined number of directions and holds the sum of the frequencies for each classified gradient direction; it is expressed, for example, by the following equation (9).

Here, F is the histogram of each cell, and fL1, fL2, ..., fLL are the frequencies for the gradient directions L1 to LL.
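Equation (9) is not reproduced in this text. From the accompanying definitions of F and the per-direction frequencies, it can be reconstructed, assuming a simple vector of the L direction frequencies, as:

```latex
F = \left[\, f_{L1},\; f_{L2},\; \ldots,\; f_{LL} \,\right]
\tag{9}
```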
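The weighted cell histogram of steps S83 and S84 can be sketched as follows. This is a minimal illustration assuming that each pixel contributes wk × wd × r to the bin of its gradient direction and that directions are unsigned angles in [0, 180) split into equal bins; the function name, the tuple layout, and the default of 9 bins are hypothetical.

```python
def cell_histogram(pixels, num_bins=9):
    """Accumulate weighted gradient magnitudes per direction bin for one cell.

    `pixels` is an iterable of (r, theta_deg, wk, wd) tuples, with theta_deg
    in [0, 180).
    """
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    for r, theta, wk, wd in pixels:
        index = int(theta // bin_width) % num_bins
        hist[index] += wk * wd * r   # background or distant pixels contribute less
    return hist


# Hypothetical cell: two pixels in the first direction bin, one far pixel in the last.
cell = [
    (2.0, 10.0, 1.0, 1.0),    # foreground, near: full contribution
    (4.0, 15.0, 0.5, 0.5),    # mixed neighborhood, mid distance: quarter contribution
    (3.0, 170.0, 1.0, 0.0),   # very far: suppressed entirely
]
hist = cell_histogram(cell)
```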

In step S85, the histogram generation unit 75 determines whether an unprocessed cell exists; if one exists, the processing returns to step S82. That is, steps S82 to S85 are repeated until histograms have been generated for all cells. When histograms have been obtained for all cells and it is determined in step S85 that no unprocessed cell exists, the processing proceeds to step S86.

In step S86, the normalization unit 76 sets an unprocessed block as the block to be processed. A block is a region composed of a plurality of cells, for example a region of M × N cells as shown in the upper right of FIG. 7.

In step S87, the normalization unit 76 normalizes the histograms obtained for each cell in units of the block to be processed, which is set around each cell, generates the HOG feature amount from the block-normalized histograms, and outputs it to the recognition processing unit 43. That is, the block-unit histogram V set for each cell is given by the following equation (10).

The normalization unit 76 normalizes, for every cell, the histogram V of the block set for that cell by calculating the following equation (11). As a result, the block-unit normalization result set for each cell is obtained as the HOG feature amount.
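Equations (10) and (11) are not reproduced in this text. A reconstruction under stated assumptions is given below: V is taken to be the concatenation of the M × N cell histograms in the block, and the normalization is assumed to be the L2 form commonly used for HOG, in which the constant ε keeps the denominator away from zero. The indexing of the cell histograms F is an assumption introduced here.

```latex
V = \left[\, F_{1},\; F_{2},\; \ldots,\; F_{M \times N} \,\right]
\tag{10}
```

```latex
\bar{V} = \frac{V}{\sqrt{\lVert V \rVert^{2} + \varepsilon^{2}}}
\tag{11}
```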

Here, V-bar denotes the normalized block-unit histogram V. Further, ε is a constant that prevents the result from diverging when the block-unit histogram V is zero.

In step S88, the normalization unit 76 determines whether an unprocessed block exists; if one exists, the processing returns to step S86. That is, steps S86 to S88 are repeated until all blocks have been normalized. When all blocks have been normalized and it is determined in step S88 that no unprocessed block exists, the processing ends.

When only two kinds of region, the foreground region and the background region, are set in the image, the ratio weight wk corresponds to the case k = 2. Accordingly, even when regions other than the foreground region and the background region are set based on the distance information so that k > 2, increasing the ratio weight wk as the proportion of near regions increases, and increasing the distance weight wd for near pixels, makes it possible to exclude distant pixels, that is, elements regarded as belonging to the background region that affect the recognition processing, so that the type of the object can be identified and recognized with higher accuracy.

The description now returns to the flowchart of FIG. 2.

In step S18, the recognition processing unit 43 identifies and recognizes the type of the object included in the image by, for example, an SVM (Support Vector Machine) or Boosting, based on the background effect reduction feature amount supplied by the processing described above, and outputs the recognition result.

Through the above processing, the pixels belonging to the background region of the image are smoothed with a strength that depends on the ratio between the foreground region and the background region in the local region, and the feature amount is then extracted with, for each pixel, a ratio weight set according to the ratio between the foreground region and the background region in the local region and a distance weight set based on the distance information of the pixel. As a result, the influence of the background region, which lowers the recognition accuracy in the object recognition processing, can be reduced, and the object recognition accuracy can be improved.
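The block normalization of step S87 can be sketched as follows. This is a minimal illustration assuming L2 normalization with a small constant `eps` in the denominator, a common choice for HOG that the text does not confirm; the function name and the default value of `eps` are hypothetical.

```python
import math


def normalize_block(cell_histograms, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize the result."""
    v = [f for hist in cell_histograms for f in hist]
    # eps keeps the denominator non-zero when every frequency in the block is 0.
    norm = math.sqrt(sum(f * f for f in v) + eps * eps)
    return [f / norm for f in v]


# Hypothetical block of two single-bin cells, and an empty (all-zero) block.
unit = normalize_block([[3.0], [4.0]], eps=0.0)
empty = normalize_block([[0.0, 0.0], [0.0, 0.0]])
```

An all-zero block, such as one lying entirely in heavily down-weighted background, normalizes to zeros instead of raising a division error.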

<2. Second Embodiment>
[Another configuration example of the image processing apparatus]
The above has described an example in which both the image processing and the feature amount extraction processing reduce the influence of background pixels, which lowers object recognition accuracy; however, either process alone is also effective. Accordingly, even in a configuration that omits the image processing unit 51, as in the image processing apparatus 11 shown in FIG. 8, the influence of background pixels can be reduced by the feature amount extraction processing. In that case, the background effect reduction feature amount extraction processing consists only of the feature amount extraction processing of step S101, as shown in FIG. 9. The feature amount extraction processing is the same as that described with reference to the flowchart of FIG. 6, so its description is omitted.

<3. Third Embodiment>
[Still another configuration example of the image processing apparatus]
Alternatively, as in the image processing apparatus 11 shown in FIG. 10, the influence of background pixels, which lowers object recognition accuracy, may be reduced by the image processing alone. That is, in the image processing apparatus 11 of FIG. 10, the ratio calculation unit 72, the ratio weight calculation unit 73, and the distance weight calculation unit 74 are removed from the feature amount extraction unit 52 of the background effect reduction feature amount extraction unit 42. In this case, as shown in FIG. 11, the feature amount extraction processing does not include the processing of steps S75 to S77 in FIG. 6 that sets the ratio weight and the distance weight, and in step S130 a frequency that does not take the ratio weight and the distance weight into account is set. The processing of steps S121 to S129 and S131 to S135 in the flowchart of FIG. 11 is the same as that of steps S71 to S74, S78 to S82, and S84 to S88 in FIG. 6, so its description is omitted.

According to the above, feature amounts can be extracted while excluding the elements of the background area that lower the accuracy of the recognition processing that recognizes the type of the object that is the subject of an image. This makes it possible to improve the accuracy of recognizing the type of that object.
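All of the configurations above depend on first dividing the image into a foreground area and a background area. A minimal Python sketch of such a division is shown below. It assumes a simple rule in which a pixel is foreground when its distance lies within a tolerance of the distance at the target position. The function name, the `tolerance` parameter, and the use of distance information alone (the area dividing unit 41 uses both the image and the distance information) are assumptions for illustration only.

```python
def divide_regions(depth, target_depth, tolerance):
    """Return a mask with 1 for foreground pixels and 0 for background pixels.

    depth is a 2-D list of per-pixel distances from the imaging device.
    A pixel is treated as foreground when its distance is within
    `tolerance` of `target_depth` (an assumed, simplified rule).
    """
    return [[1 if abs(d - target_depth) <= tolerance else 0 for d in row]
            for row in depth]
```

For example, with a hand held about 0.5 m from the camera in front of a wall about 3 m away, `divide_regions(depth, 0.5, 0.2)` would mark only the pixels near 0.5 m as foreground.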

The series of processes described above can be executed by hardware, but can also be executed by software. When the series of processes is executed by software, the program constituting the software is installed from a recording medium into a computer built into dedicated hardware, or into, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

FIG. 12 shows a configuration example of a general-purpose personal computer. This personal computer incorporates a CPU (Central Processing Unit) 1001. An input/output interface 1005 is connected to the CPU 1001 via a bus 1004. A ROM (Read Only Memory) 1002 and a RAM (Random Access Memory) 1003 are connected to the bus 1004.

Connected to the input/output interface 1005 are an input unit 1006 made up of input devices such as a keyboard and a mouse with which a user enters operation commands, an output unit 1007 that outputs a processing operation screen and images of processing results to a display device, a storage unit 1008 made up of a hard disk drive or the like that stores programs and various data, and a communication unit 1009 made up of a LAN (Local Area Network) adapter or the like that executes communication processing via a network such as the Internet. Also connected is a drive 1010 that reads data from and writes data to a removable medium 1011 such as a magnetic disk (including a flexible disk), an optical disk (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk (including an MD (Mini Disc)), or a semiconductor memory.

The CPU 1001 executes various processes according to a program stored in the ROM 1002, or according to a program that has been read from the removable medium 1011 (a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like), installed in the storage unit 1008, and loaded from the storage unit 1008 into the RAM 1003. The RAM 1003 also stores, as appropriate, data that the CPU 1001 needs in order to execute the various processes.

In this specification, the steps describing the program recorded on the recording medium include not only processes performed in time series in the described order, but also processes that are executed in parallel or individually without necessarily being processed in time series.

DESCRIPTION OF SYMBOLS: 11 image processing apparatus, 31 image acquisition unit, 32 distance detection unit, 33 target position detection unit, 34, 35 enlargement/reduction units, 36 object recognition unit, 41 area dividing unit, 42 background influence reduction feature amount extraction unit, 43 recognition processing unit, 51 image processing unit, 52 feature amount extraction unit, 61 ratio calculation unit, 62 ratio intensity calculation unit, 63 distance intensity calculation unit, 64 smoothing unit, 71 gradient calculation unit, 72 ratio calculation unit, 73 ratio weight calculation unit, 74 distance weight calculation unit, 75 histogram generation unit, 76 normalization unit

Claims

An image processing apparatus comprising:
image acquisition means for acquiring an image;
distance acquisition means for acquiring, in units of pixels of the image, information on the distance from the imaging device to the subject captured in the image at the time the image was captured;
area dividing means for dividing the image into a foreground area, which is the subject in the image, and a background area other than the foreground area, based on the image and the distance information;
background influence reduction feature amount extraction means for extracting a feature amount that reduces the influence of the background on recognition of the type of the subject, based on any or all of the image, the distance information, and information on the background area divided by the area dividing means; and
recognition means for recognizing the type of the subject that is the foreground in the image, based on the feature amount extracted by the background influence reduction feature amount extraction means.

The image processing apparatus according to claim 1, wherein the background influence reduction feature amount extraction means
includes smoothing means for smoothing pixels corresponding to the background area of the image, based on the image and the information on the background area divided by the area dividing means, and
extracts, from the image in which the pixels corresponding to the background area have been smoothed by the smoothing means, a feature amount that reduces the influence of the background on recognition of the type of the subject.

The image processing apparatus, wherein the smoothing means smooths each pixel of the background area divided by the area dividing means with a strength that depends on the ratio between background pixels and foreground pixels in the neighborhood of that pixel.

The image processing apparatus according to claim 1, wherein the background influence reduction feature amount extraction means
includes feature amount extraction means for extracting a feature amount of the image, and
weight setting means for setting a weight based on the image, the distance information, and the information on the background area divided by the area dividing means, and applying the weight to the feature amount, and
extracts the feature amount to which the weight set by the weight setting means has been applied as a feature amount that reduces the influence of the background on recognition of the type of the subject.

The image processing apparatus according to claim 1, wherein the feature amount extraction means extracts a HOG feature amount.

An image processing method for an image processing apparatus that includes
image acquisition means for acquiring an image,
distance acquisition means for acquiring, in units of pixels of the image, information on the distance from the imaging device to the subject captured in the image at the time the image was captured,
area dividing means for dividing the image into a foreground area, which is the subject in the image, and a background area other than the foreground area, based on the image and the distance information,
background influence reduction feature amount extraction means for extracting a feature amount that reduces the influence of the background on recognition of the type of the subject, based on any or all of the image, the distance information, and information on the background area divided by the area dividing means, and
recognition means for recognizing the type of the subject that is the foreground in the image, based on the feature amount extracted by the background influence reduction feature amount extraction means, the method comprising:
an image acquisition step of acquiring the image by the image acquisition means;
a distance acquisition step of acquiring, by the distance acquisition means and in units of pixels of the image, information on the distance from the imaging device to the subject captured in the image at the time the image was captured;
an area dividing step of dividing, by the area dividing means, the image into a foreground area, which is the subject in the image, and a background area other than the foreground area, based on the image and the distance information;
a background influence reduction feature amount extraction step of extracting, by the background influence reduction feature amount extraction means, a feature amount that reduces the influence of the background on recognition of the type of the subject, based on any or all of the image, the distance information, and information on the background area divided in the area dividing step; and
a recognition step of recognizing, by the recognition means, the type of the subject that is the foreground in the image, based on the feature amount extracted in the background influence reduction feature amount extraction step.

A program that causes a computer controlling an image processing apparatus that includes
image acquisition means for acquiring an image,
distance acquisition means for acquiring, in units of pixels of the image, information on the distance from the imaging device to the subject captured in the image at the time the image was captured,
area dividing means for dividing the image into a foreground area, which is the subject in the image, and a background area other than the foreground area, based on the image and the distance information,
background influence reduction feature amount extraction means for extracting a feature amount that reduces the influence of the background on recognition of the type of the subject, based on any or all of the image, the distance information, and information on the background area divided by the area dividing means, and
recognition means for recognizing the type of the subject that is the foreground in the image, based on the feature amount extracted by the background influence reduction feature amount extraction means, to execute processing comprising:
an image acquisition step of acquiring the image by the image acquisition means;
a distance acquisition step of acquiring, by the distance acquisition means and in units of pixels of the image, information on the distance from the imaging device to the subject captured in the image at the time the image was captured;
an area dividing step of dividing, by the area dividing means, the image into a foreground area, which is the subject in the image, and a background area other than the foreground area, based on the image and the distance information;
a background influence reduction feature amount extraction step of extracting, by the background influence reduction feature amount extraction means, a feature amount that reduces the influence of the background on recognition of the type of the subject, based on any or all of the image, the distance information, and information on the background area divided in the area dividing step; and
a recognition step of recognizing, by the recognition means, the type of the subject that is the foreground in the image, based on the feature amount extracted in the background influence reduction feature amount extraction step.