WO2017168889A1 - Object detection device and vehicle having the object detection device - Google Patents

Object detection device and vehicle having the object detection device Download PDF

Info

Publication number
WO2017168889A1
WO2017168889A1 PCT/JP2016/088698 JP2016088698W WO2017168889A1 WO 2017168889 A1 WO2017168889 A1 WO 2017168889A1 JP 2016088698 W JP2016088698 W JP 2016088698W WO 2017168889 A1 WO2017168889 A1 WO 2017168889A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
gradient
gradient magnitude
filter
magnitude
Prior art date
Application number
PCT/JP2016/088698
Other languages
French (fr)
Inventor
Yoshiki KURANUKI
Ioannis PATRAS
Original Assignee
Yamaha Hatsudoki Kabushiki Kaisha
Queen Mary University Of London
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Hatsudoki Kabushiki Kaisha, Queen Mary University Of London filed Critical Yamaha Hatsudoki Kabushiki Kaisha
Publication of WO2017168889A1 publication Critical patent/WO2017168889A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • the present invention relates to an object detection device which detects an object by image recognition and to a vehicle having the object detection device.
  • Non-patent Document 1 proposes recognition of a pedestrian from images through feature values in ten channels formed of an LUV color, a normalized gradient magnitude image and images of six different histograms of oriented gradients (HOG).
  • non-patent Document 1 a combination of a plurality of types of filters is proposed as Checkerboards, with which data in the ten channels is converted into a feature vector, from which a pedestrian is recognized with a recognition method.
  • Non-patent Document D2 proposes a filter corresponding to the contour of a person.
  • a filter extracted from the contour of a person is not necessarily the optimum filter. There is an apprehension of a reduction in processing speed when this method (such the filter) is used.
  • HOG features With low-level features such as HOG features or Haar-like features, it is difficult to detect an object such as a pedestrian or an object which can change in shape (vary in shape attributes) or in appearance for example due to variations in clothing.
  • CoHOG features have also been proposed such that co-occurrences between HOG features are obtained.
  • the number of dimensions of the feature vector is increased and the time taken to perform detection and learning is extended. That is, the amount of computation (processing time) is increased and it is difficult to incorporate an object detection device using such features in a small CPU.
  • An objective of the present invention is to provide an object detection device which is capable of reducing the amount of data for processing, and the processing time without reducing the recognition performance and a vehicle having the object detection device.
  • An object detection device has: an LUV converter which converts an RGB image into an LUV color image; a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the LUV color image on the basis of the intensity values of the LUV color image obtained by the LUV converter; a gradient magnitude operation part which computes gradient magnitudes of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part; a gradient direction operation part which computes gradient directions of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part; a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the LUV color image obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradient
  • the object detection device may further have an image capturing part.
  • an RGB image or a grayscale image may be taken with a CMOS sensor.
  • the object detection device may have a scanning part which slides a rectangular area of a predetermined size on the entire RGB image to repeat the process from processing performed by the LUV converter to processing performed by the object recognition part.
  • the scanning part performs cropping process on the entire image by sliding in four-pixel steps.
  • the object detection device may have an image shrinking part which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into an image with size smaller than the size of the RGB image.
  • the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions are of an image with size smaller than the size of the RGB image.
  • the RGB image size may have 64 x 128 pixels
  • each size of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions may have 32 x 64 pixels.
  • the LUV converter may convert the RGB image into an image with size smaller than the size of the RGB image before conversion into the LUV color image.
  • the normalized maximum gradient magnitude image preparing part may prepare the image in an image with size smaller than the size of the RGB image.
  • the magnitude image for each gradient direction preparing part may prepare the image in an image with size smaller than the size of the RGB image.
  • another color space can be used instead of the LUV color image.
  • an image converter which converts an RGB image into an HSV image may be provided. Processing in each component can be executed by using an HSV image instead of the LUV color image.
  • An object detection device in this mode of implementation has: an image converter which converts an RGB image into an input image with a predetermined color space (an LUV color image or an HSV image); a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the input image on the basis of the intensity values of the input image obtained by the image converter; a gradient magnitude operation part which computes gradient magnitudes of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part; a gradient direction operation part which computes gradient directions of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part; a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the input image obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical
  • a grayscale image can be used instead of the LUV color image.
  • an image converter which converts an RGB image into a grayscale image may be provided. The need to compute the maximum gradient magnitude from the gradient magnitudes in the two or more color spaces (color expressions) is eliminated, so that processing in the maximum gradient magnitude operation part can be removed.
  • An object detection device in this mode of implementation has: an image converter which converts an RGB image into a grayscale image; a gradient operation part which computes a horizontal gradient in a horizontal direction and a vertical gradient in a vertical direction of the grayscale image on the basis of the intensity value of the grayscale image obtained by the image converter; a gradient magnitude operation part which computes a gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part; a gradient direction operation part which computes a gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part; a normalized gradient magnitude image preparing part which prepares a normalized gradient magnitude image on the basis of the gradient magnitude obtained by the gradient magnitude operation part; a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part and the gradient magnitude obtained by the gradient magnitude operation part; a convolution part which enhances
  • an arrangement using an RGB image and not having the LUV converter may be provided.
  • An object detection device in this mode of implementation has: a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of an RGB image on the basis of the intensity values of the RGB image; a gradient magnitude operation part which computes gradient magnitudes of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part; a gradient direction operation part which computes gradient directions of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part; a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the RGB image obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part; a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each
  • the object recognition part may have a prelearned recognition device to which the feature vector is provided.
  • the recognition device may be, for example, Adaboost (Adaptive Boosting).
  • the object recognition part may recognize that the object to be detected is detected when an output score which is the sum total as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
  • the selected feature vector and a threshold value set in advance are compared at each node of the decision tree.
  • a terminal node is detected by a depth priority search method and scores for the decision trees are computed.
  • weighting values set in advance for each of the decision trees and the scores for each of the decision trees are multiplied with the corresponding weighting values and the total sum of the multiplication results is computed.
  • the total sum is the output score.
  • the object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
  • the filters may have a rectangular shape, a square shape or a different shape (e.g., L-shape) including a plurality of pixels.
  • the filters may be uniform, horizontal, vertical pattern or non-uniform pattern and may be a check pattern or an inclination pattern. For example, when the filters are uniform, the values at all pixels are "1". When the filters are horizontal, the pixels above a boundary line are “1" and the pixels below the boundary line are “-1". When the filters are vertical, the pixels on the left-hand side of a boundary line are "1" and the pixels on the right-hand side of low the boundary line are "-1".
  • At least two of them differ in filter size from each other or all of them differ in filter size from each other.
  • the shapes of the filters may be not similar to a rectangular image.
  • Rhomboid filters may be used for an oblong image or object.
  • the filters may be constituted of a square uniform filter, a square horizontal filter and a square vertical filter, and these filters may differ in filter size from each other.
  • the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters may be in a filter size relationship shown by the following equations: U-filter ⁇ H-filter ⁇ V-filter or H-filter ⁇ U-filter ⁇ V-filter
  • the object detection device is used to detect a person, particularly a pedestrian.
  • the filters are constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters are in a filter size relationship shown by the following equations: U-filter ⁇ H-filter ⁇ V-filter or H-filter ⁇ U-filter ⁇ V-filter
  • Three or four of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter) and a square check pattern filter (C-filter) may be selected as the filters.
  • the selected filters may have the same or different filter sizes and may be in a size relationship shown by the following equations: H-filter ⁇ U-filter ⁇ C-filter ⁇ V-filter
  • a square uniform filter U-filter
  • H-filter square horizontal filter
  • V-filter square vertical filter
  • C-filter square check pattern filter
  • I-filter square inclination pattern filter
  • L-filter L-shaped filter
  • a gradient magnitude gm (x, y, c) can be obtained by the following equation:
  • a maximum gradient magnitude gm max (x, y) can be obtained by the following equation from gradient magnitudes gm (x, y, c) of the LUV color image obtained by the gradient magnitude operation part.
  • gm max (x, y) max ⁇ gm (x, y, L), gm (x, y, U), gm (x, y, V) ⁇ (4)
  • a gradient direction ⁇ (x, y) when the gradient magnitude is the maximum gradient magnitude in the LUV color image can be obtained by the following equation:
  • a normalized maximum gradient magnitude ngm max (x, y) can be obtained by the following equation: where Sum (A) is the sum total in a predetermined pixel area containing gm max (x, y) at its center.
  • the pixel area is, for example an 11 x 11 rectangle centered to the pixel in gm max (x, y).
  • the invention in another aspect is a vehicle having the above-described object detection device.
  • the vehicle may be a saddle-ride vehicle or straddled vehicle.
  • the object detection device is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance.
  • the vehicle according to the present invention can be provided with the above-described object detection device.
  • Fig.1 is a functional block diagram of an object detection device according to a first embodiment of the present invention.
  • Fig.2 is a process flow chart of the object detection device according to the first embodiment.
  • Fig.3A is an explanation view related to sliding windows.
  • Fig.3B is an explanation view related to sliding windows.
  • Fig.3C is an explanation view related to sliding windows.
  • Fig.4A is an explanation view related to a gradient.
  • Fig.4B is an explanation view related to a gradient.
  • Fig.5 is an explanation view of generating 10 channel features from an RGB image.
  • Fig.6 is a drawing showing the type of filters.
  • Fig.7 is a drawing showing an evaluation result of Miss Rate.
  • Fig.8 is a functional block diagram of an object detection device according to a second embodiment of the present invention.
  • Fig.9 is a functional block diagram of an object detection device according to a third embodiment of the present invention.
  • FIG. 1 is a functional block diagram of an object detection device according to the first embodiment.
  • An object detection device 1 includes an image input part 10, a scan part 11, an LUV converter 12, a gradient operation part 13, a gradient magnitude operation part 14, a gradient direction operation part 15, a maximum gradient magnitude operation part 16, a maximum gradient magnitude direction operation part 17, a normalized maximum gradient magnitude operation part 18, a normalized maximum gradient magnitude image preparing part 19, a gradient magnitude image for each gradient direction preparing part 20, an image shrinking part 21, a convolution part 22, a feature vector converter 23 and an object recognition part 24.
  • the object detection device 1 may be constituted by a single configuration such as a dedicated communication circuit, firmware and a processing device or a combination thereof. The above elements of the object detection device 1 may be achieved by a combination of software and hardware.
  • the image input part 10 may be an image capturing part which captures an image, a reading part which can read image data or a receiving part which receives image data (irrespective of wireless or wired).
  • the image input part 10 may include an image converter which performs conversion into an image of a predetermined color space when the input image is not an image of a predetermined color space.
  • the image input part 10 may include an RGB image converter which performs conversion into an RGB image in a case that the original image color space is not RGB, for example.
  • the scan part 11 scans a rectangular area of a predetermined size on the entire input image (for example, an RGB image). By the unit of a rectangular area of a predetermined size, the process from processing performed by the LUV converter to processing performed by the object recognition part which will be explained later is performed.
  • a cropping area (a rectangular area of a predetermined size) can be arbitrarily set, and for example, 8 x 8, 8 x 16 can be listed.
  • the scan part 11 performs cropping process by sliding the cropping area (8 x 18) in four-pixel steps from the left end to the right end of an RGB image (64 x 128). As shown in FIG.
  • the next line is scanned by returning to the left end side of the RGB image and sliding in four-pixel steps downward.
  • the cropping processing is performed by sliding the cropping area by four-pixel steps from the left end to the right end.
  • sliding windows is performed from an upper left corner to a lower right corner of the RGB image, that is, on the entire RGB image.
  • the size of the cropping area is not limited to constant, but may be dynamically changed at the time of executing sliding windows.
  • the cropping area may be enlarged toward diagonally lower right in stages, and the number of pixels to be slid may be proportional to the size of the rectangle.
  • the LUV converter 12 converts an RGB image (three channels of image) into LUV color images (three channels of image).
  • the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of V image) of each of the LUV color images obtained by the LUV converter 12.
  • the gradient operation part 13 computes as follows. If the intensity of a color channel (c) in a pixel at coordinates (x, y) is l (x, y, c) in an intensity image, a gradient lx (x, y, c) in the horizontal direction (x direction) and a gradient ly (x, y, c) in the vertical direction of a color (c) in a pixel at coordinates (x, y) can be obtained by the following equations. The arrangement relationship of the coordinate (x, y) is shown in FIG. 4A.
  • the above equations can be directly used. Since an LUV color image is used in this embodiment, the gradient is obtained for each of the LUV color images. If the color of an L image is cL, the color of a U image is cU and the color of a V image is cV, the horizontal gradient and the vertical gradient can be obtained as follows.
  • lx (x, y, cV) l (x+1, y, cV) - l (x-1, y, cV) (1-c)
  • the gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
  • the gradient magnitude operation part 14 obtains the gradient magnitude gm (x, y, c) by the following equation.
  • the above equation can be directly used.
  • the gradient magnitude can be obtained as follows. In a case of the L image, In a case of the U image, In a case of the V image,
  • the gradient direction operation part 15 computes the gradient direction of each of the LUV color images (the gradient direction of the L image, the gradient direction of the U image and the gradient direction of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
  • FIG. 4B shows the direction for each 30° as the gradient direction.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of V image) obtained by the gradient magnitude operation part 14.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude gm max (x, y) by the following equation.
  • gm max (x, y) max ⁇ gm (x, y, cL), gm (x, y, cU), gm (x, y, cV) ⁇ (4-1)
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images obtained by the gradient operation part 13.
  • the maximum gradient magnitude direction operation part 17 obtains the gradient direction ⁇ (x, y) of the maximum gradient magnitude by the following equation:
  • the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • the normalized maximum gradient magnitude operation part 18 obtains the normalized maximum gradient magnitude ngm max (x, y) by the following equation: where Sum (A) is the total sum in a predetermined pixel area containing gm max (x, y) at its center.
  • the pixel area is, for example, an area formed by eleven pixels in the vertical direction and eleven pixels in the horizontal direction in the first embodiment.
  • the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° to 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • the image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image (an input image). For example, if the RGB image size is 64 x 128 pixels, each of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for six gradient directions is converted into 32 x 64 pixels.
  • the convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature of each image is enhanced and recognition with high precision can be performed in the recognition part which will be explained later. That is, by using training data with conversion so that the features of each image are enhanced, a recognition device with higher recognition precision can be made.
  • the number of filters is 10 or less, preferably 4 or less, and more preferably, 3.
  • the shape of the filter may be a rectangular shape, a square shape or a different shape (for example, the shape of L) including a plurality of pixels.
  • the pattern of the filter (a) Uniform, (b) Horizontal, (c) Vertical, (d) Check, (e) Inclination, (f) Different shape (L-shape) are illustrated.
  • white pixels are "1" and black pixels are "-1.”
  • the filter size for example, 2 x 2, 4 x 4, 6 x 6 and 8 x 8 pixels are listed.
  • the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and preferably the filters are in a filter size relationship shown by the following equations: U-filter ⁇ H-filter ⁇ V-filter or H-filter ⁇ U-filter ⁇ V-filter
  • three or four types of the square uniform filter (U-filter), the square horizontal filter (H-filter), the square vertical filter (V-filter) and a square check filter (C-filter) may be selected as the filters.
  • the selected filters may have the same or different filter sizes and preferably are in a filter size relationship shown by the following equations: H-filter ⁇ U-filter ⁇ C-filter ⁇ V-filter
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22 are 64 x 128 pixels, all information has 64 x 128 x 10 pixels, that is, 81920 pixels.
  • the feature vector converter 23 performs processing of converting the information of 64 x 128 x 10 into one dimension of 1 x 81920.
  • the object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • the object recognition part 24 may have a prelearned recognition device to which the feature vector is provided.
  • the recognition device may implement , for example, Adaboost.
  • the object recognition part 24 may recognize that the object to be detected is detected when an output score which is the total sum as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
  • the object recognition part 24 performs processing of the following (1) to (4).
  • a selected value in feature vector and a threshold value set in advance are compared at each node of the decision tree.
  • a terminal node is reached by a depth-first search method and scores for each of the decision trees are computed.
  • Weighting values set in advance for each of the decision trees and the scores for each of the decision trees are multiplied, and then the total sum of the multiplication results is computed. The total sum is the output score.
  • the object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
  • step S1 the image input part 10 having a CMOS sensor captures an image.
  • FIG. 5(a) shows an example of an input image.
  • the captured image is stored in a memory 30 by the unit of a frame. For each frame, the following processing is performed.
  • step S2 the scan part 11 scans an RGB image by the unit of a rectangular area of a predetermined size. Thereafter, each processing of step S3 to step S14 is performed by the unit of a rectangular area of a predetermined size.
  • step S3 the LUV converter 12 converts an RGB image into LUV color images.
  • FIG. 5(b) shows an example of the LUV color image.
  • step S4 the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of a V image) of each of the LUV color images.
  • step S5 the gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images.
  • step S6 the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image).
  • step S7 the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • step S8 the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • FIG. 5(c) shows an example of the normalized maximum gradient magnitude image.
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images.
  • step S10 the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° to 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • FIG. 5(d) shows an example of the gradient magnitude image for each of six gradient directions.
  • step S9 or step S10 which will be explained later may be executed after step S4 and step S6, may be simultaneously executed with step S7 or step S8, or may be executed before step S7 or step S8.
  • step S11 the image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • FIG. 5(e) shows the reduced LUV color image, the reduced normalized maximum gradient magnitude image and the reduced gradient magnitude image for each of six gradient directions.
  • the convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10. While 3 or 4 types of filters are preferably used for convolution in this embodiment, using 3 types of filters for convolution is preferable since the processing time can be reduced without deteriorating the recognition property of the object.
  • step S13 the feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • step S14 the object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • step S15 whether sliding windows is completed or not is judged in all the image areas of one frame. If it is not completed, the process returns to step S2, and the scan area (the cropping area) is slid in accordance with a predetermined rule to repeat the processing from step S3 to step S14. When sliding windows is completed, the same processing is performed to the next frame.
  • Example A pedestrian detection was evaluated using the object detection device 1 according to the first embodiment.
  • a filter As a filter, three types of filters, which are, a square uniform filter (U-filter, (Fig. 6(a)), a square horizontal filter (H-filter, (Fig. 6(b)) and a square vertical filter (V-filter, (Fig. 6(c)) were used. Since the three types of filters were in four different sizes (2 x 2, 4 x 4, 6 x 6, 8 x 8 pixels), the number of combinations of all the filters is 16.
  • MFCF is a filter combination of the first embodiment.
  • ACF-Caltech+ uses in total 10 images (channels) that is the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions as the feature in the same manner as the first embodiment. However, “ACF-Caltech+” does not have any convolution part as in the first embodiment, and directly sends 10 images to the feature vector converter and use it as a feature vector.
  • ACF-Caltech+ uses an image of 32 x 64 pixels as an input
  • ACF-ours uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
  • LDCF and LDCF-ours use total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further They include a convolution part.
  • the difference from the first embodiment is the type of the filter.
  • "LDCF” and “LDCF-ours” use four different types of 5-by-5-pixel-filters to each images: the LUV color image, the normalized maximum gradient magnitude image, and the gradient magnitude image for each of six gradient directions. That is, 40 types of filters are used.
  • binary numerical values such as "1" or "-1" as in the first embodiment are not used, but real numbers from -1.0 to 1.0 are used.
  • LDCF uses an image of 32 x 64 pixels as an input
  • LDCF-ours uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
  • Checkerboards is a method of non-patent document 1.
  • “Checkerboards” uses total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further it includes a convolution part. The difference from the first embodiment is the type of the filter.
  • “Checkerboards” uses 39 filters of 4 x 3 cells (1 cell is 6 x 6 pixels), the 39 filters use all reproducible combinations in 3 x 4 cells (a U-filter, a V-filter, a H-filter and a checkerboard pattern).
  • “Checkerboards” uses an image of 60 x 120 pixels as an input image.
  • a filter combination of the first embodiment detected a pedestrian with higher precision than in the conventional person detecting method.
  • U-filter square uniform filter
  • H-filter square horizontal filter
  • V-filter square vertical filter
  • C-filter square check filter
  • I-filter square inclined pattern filter
  • L-filter L-shaped filter
  • FIG. 7 shows an evaluation result by Miss Rate in each of filter combinations. Miss Rate is generally used for evaluation of Caltech dataset. As shown in FIG. 7, MFCF248, MFCF248C and MFCF2-8C had Miss Rates of less than 19%. MFCF248, which is a filter combination in the first embodiment, had more excellent result than any other published conventional method.
  • FIG. 8 is a functional block diagram of the object detection device according to the second embodiment.
  • the object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 40, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24.
  • elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
  • the image conversion part 40 converts an RGB image into input images (HSV images) of a predetermined color space (HSV).
  • the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of each of the input images obtained by the image conversion part 40.
  • the gradient magnitude operation part 14 computes the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
  • the gradient direction operation part 15 computes the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operation part 14.
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the input images obtained by the gradient operation part 13.
  • the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • the image shrinking part 21 converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • the convolution part 22 enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total sum of the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • FIG. 9 is a functional block diagram of the object detection device according to the third embodiment.
  • the third embodiment includes an image conversion part 50 which converts an RGB image into a grayscale image. The need to compute the maximum gradient magnitude from the gradient magnitudes in the two or more color space is eliminated, so that processing in the maximum gradient magnitude operation part can be removed.
  • the object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 50, the gradient operation part 41, the gradient magnitude operation part 42, the gradient direction operation part 43, the normalized gradient magnitude image preparing part 45, the gradient magnitude image for each gradient direction preparing part 46, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24.
  • elements having the same reference numerals as those of the first or second embodiment have the same function, so their explanations are omitted or they are briefly explained.
  • the image conversion part 50 converts an RGB image into a grayscale image.
  • the gradient operation part 41 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the grayscale image on the basis of the intensity values of the grayscale image obtained by the image conversion part 50.
  • the gradient magnitude operation part 42 computes the gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
  • the gradient direction operation part 43 computes the gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
  • the normalized gradient magnitude image preparing part 45 prepares the normalized gradient magnitude image from the gradient magnitude obtained by the gradient magnitude operation part 42.
  • the gradient magnitude image for each gradient direction preparing part 46 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 43 and the gradient magnitude obtained by the gradient magnitude operation part 42.
  • the image shrinking part 21 converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • the convolution part 22 enhances the image by convolving the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 8 corresponding to the total of the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • step S6 and step S7 in the processing flow (FIG. 2) of the first embodiment, and a change of the process content of replacing the LUV color image with the grayscale image is reflected.
  • a fourth embodiment directly uses an RGB image as an input image.
  • the LUV converter and the image conversion part in the first to third embodiments can be omitted.
  • the object detection device 1 includes the image input part 10, the scan part 11, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24.
  • elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
  • the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the RGB images on the basis of the intensity values of each of the RGB images.
  • the gradient magnitude operation part 14 computes the gradient magnitude of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
  • the gradient direction operation part 15 computes the gradient direction of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the RGB images obtained by the gradient magnitude operation part 14.
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
  • the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • the image shrinking part 21 converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • the convolution part 22 enhances the image by convolving the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • step S3 in the processing flow (FIG. 2) of the first embodiment, and a change of the process content of replacing the LUV color image with the RGB image is reflected.
  • the image shrinking part is not necessarily included, and the function of the image shrinking part may be included by each of the LUV converter, the image conversion part, the normalized maximum gradient magnitude image preparing part, the gradient magnitude image for each gradient direction preparing part and the normalized gradient magnitude image preparing part.
  • An object detection method performs scanning a rectangular area with a predetermined size to all areas of an input image, and performs the following steps (1) to (12) by the unit of an area cropped by sliding windows.
  • the object detection method comprises the following. (1) A gradient operating step of computing the horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of color space of each of the input images (an RGB image, an LUV image and an HSV image) having a plurality of color spaces (RGB, LUV and HSV). (2) A gradient magnitude operating step of operating the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step. (3) A gradient direction operating step of operating the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
  • the object detection method may include an image converting step of converting the RGB image into the input images of a predetermined color space (an LUV color image or an HSV image) before the gradient operating step for execution.
  • a predetermined color space an LUV color image or an HSV image
  • An object detection program scans all areas of the input image with a rectangular area of a predetermined size, and allows a computer to execute the following steps (1) to (12) by the unit of an area cropped by sliding window approach.
  • the object detection program allows the computer to execute the following.
  • the object detection program may allow the computer to execute an image converting step of converting the RGB image into the input images of a predetermined color space (an LUV color image or an HSV image) before the gradient operating step.
  • a predetermined color space an LUV color image or an HSV image
  • a seventh embodiment is a memory medium which stores the object detection program according to the sixth embodiment.
  • the memory medium is not particularly limited, and all the conventional memory media are included.
  • An object detection device comprises: a gradient operation part which computes the horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of color space of each of the input images (for example, an RGB image, an LUV image and an HSV image) having a plurality of color spaces (for example, RGB, LUV and HSV); a gradient magnitude operation part which operates the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part; a gradient direction operation part which operates the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part; a maximum gradient magnitude operation part which operates the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which operates the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part on
  • the object detection device may have the input image of the RGB image, the LUV color image or the HSV color image.
  • the object detection device may further include a scan part which repeats the process from processing performed by the gradient operation part to processing performed by the object recognition part by scanning of the entire input image in a rectangular area with a predetermined size.
  • the object detection device may further include an image conversion part which converts the RGB image into an input image of a predetermined color space (the LUV color image or the HSV image).
  • the object detection device may further include an image shrinking part which, before the processing by the convolution part, converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into images whose size is smaller than the size of the input image used in the processing by the gradient operation part.
Vehicle
  • A vehicle according to the present invention is a vehicle including the object detection device 1 according to the first to fourth embodiments.
  • the vehicle is not particularly limited, and may be a saddle-ride or straddled vehicle, a two-wheel vehicle, a three-wheel vehicle or a four-wheel vehicle.
  • Any feature hereinbefore described as a “part” may alternatively be described as a “means” or “means for”. Accordingly, the words/phrases “part” and “means” and “means for” are herein used interchangeably.

Abstract

An object detection device which is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance is provided. The object detection device includes: a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of input images; a gradient magnitude operation part which operates gradient magnitudes of each of the input images; a gradient direction operation part which operates gradient directions of each of the input images; a maximum gradient magnitude operation part which operates a maximum gradient magnitude with respect to each of pixels; a maximum gradient magnitude direction operation part which operates a gradient direction of the maximum gradient magnitude for each of pixels obtained by the maximum gradient magnitude operation part; a normalized maximum gradient magnitude operation part which operates a normalized maximum gradient magnitude; a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude; a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions; a convolution part which enhances an image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10; a feature vector converter which makes conversion into a feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part; and an object recognition part which recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter.

Description

OBJECT DETECTION DEVICE AND VEHICLE HAVING THE OBJECT DETECTION DEVICE
The present invention relates to an object detection device which detects an object by image recognition and to a vehicle having the object detection device.
Description of the Related Art
"Filtered channel features for pedestrian detection" by S. Zhang, R. Benenson, and B. Schiele, CVPR, 2015 (non-patent Document 1) proposes recognition of a pedestrian from images through feature values in ten channels formed of an LUV color, a normalized gradient magnitude image and images of six different histograms of oriented gradients (HOG).
In non-patent Document 1, a combination of a plurality of types of filters is proposed as Checkerboards, with which data in the ten channels is converted into a feature vector, from which a pedestrian is recognized with a recognition method.
"Informed Haar-like features improve pedestrian detection" by S. Zhang, R. Bauckhage, and A. B. Cremers, CVPR, 2014 (non-patent Document D2) proposes a filter corresponding to the contour of a person. A filter extracted from the contour of a person, however, is not necessarily the optimum filter. There is an apprehension of a reduction in processing speed when this method (such the filter) is used.
With low-level features such as HOG features or Haar-like features, it is difficult to detect an object, such as a pedestrian, which can change in shape or in appearance, for example due to variations in clothing. To address this difficulty, CoHOG features, which capture co-occurrences between HOG features, have also been proposed. However, when such features are used, the number of dimensions of the feature vector increases and the time taken for detection and learning is extended. That is, the amount of computation (processing time) increases, and it is difficult to incorporate an object detection device using such features in a small CPU.
In the methods described in non-patent Documents 1 and 2, the number of channels or the number of filters is large and as a result the amount of data for processing and, hence, the processing time is increased.
An objective of the present invention is to provide an object detection device which is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance, and a vehicle having the object detection device.
Various aspects of the present invention are defined in the independent claims appended hereto. Some optional and/or preferred features are defined in the dependent claims appended hereto.
According to a first aspect of the present invention there is provided an object detection device according to appended claims 1 to 6.
According to a second aspect of the present invention there is provided a vehicle according to claim 7.
An object detection device according to the present invention has:
an LUV converter which converts an RGB image into an LUV color image;
a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the LUV color image on the basis of the intensity values of the LUV color image obtained by the LUV converter;
a gradient magnitude operation part which computes gradient magnitudes of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part;
a gradient direction operation part which computes gradient directions of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part;
a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the LUV color image obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel;
a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation part with respect to each pixel;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation part and the gradient magnitudes obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In the present invention, the object detection device may further have an image capturing part. For example, an RGB image or a grayscale image may be taken with a CMOS sensor.
In the present invention, the object detection device may have a scan part which slides a rectangular area of a predetermined size over the entire RGB image to repeat the process from the processing performed by the LUV converter to the processing performed by the object recognition part. For example, in a case where the image size is 64 x 128, the scan part performs the cropping process on the entire image by sliding in four-pixel steps.
In the present invention, the object detection device may have an image shrinking part which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into an image with size smaller than the size of the RGB image.
According to this arrangement, the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions are images whose size is smaller than the size of the RGB image. For example, in a case where the RGB image has 64 x 128 pixels, each of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions may have 32 x 64 pixels.
In another mode of implementation not having the image shrinking part, the LUV converter may convert the RGB image into an image whose size is smaller than that of the RGB image before conversion into the LUV color image.
When preparing a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude, the normalized maximum gradient magnitude image preparing part may prepare the image with a size smaller than the size of the RGB image.
When preparing a gradient magnitude image for each of the six gradient directions, the gradient magnitude image for each gradient direction preparing part may prepare each image with a size smaller than the size of the RGB image.
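As one illustrative sketch (the embodiments do not fix a particular resampling scheme, so block averaging is an assumption here), a 64 x 128 channel image can be reduced to 32 x 64 by simple 2 x 2 block averaging:

```python
import numpy as np

def shrink_half(channel):
    """Halve a single-channel image by 2 x 2 block averaging
    (assumes even height and width)."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# e.g. a 64 x 128 image stored as a 128 x 64 array becomes a 64 x 32 array,
# that is, a 32 x 64 image
```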
In still another mode of implementation of the present invention, another color space (color expression) can be used instead of the LUV color image. For example, an image converter which converts an RGB image into an HSV image may be provided. Processing in each component can be executed by using an HSV image instead of the LUV color image.
An object detection device in this mode of implementation has:
an image converter which converts an RGB image into an input image with a predetermined color space (an LUV color image or an HSV image);
a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the input image on the basis of the intensity values of the input image obtained by the image converter;
a gradient magnitude operation part which computes gradient magnitudes of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part;
a gradient direction operation part which computes gradient directions of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part;
a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the input image obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel;
a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation part with respect to each pixel;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation part and the gradient magnitudes obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In a further mode of implementation of the present invention, a grayscale image can be used instead of the LUV color image. For example, an image converter which converts an RGB image into a grayscale image may be provided. The need to compute the maximum gradient magnitude from the gradient magnitudes in the two or more color spaces (color expressions) is eliminated, so that processing in the maximum gradient magnitude operation part can be removed.
An object detection device in this mode of implementation has:
an image converter which converts an RGB image into a grayscale image;
a gradient operation part which computes a horizontal gradient in a horizontal direction and a vertical gradient in a vertical direction of the grayscale image on the basis of the intensity value of the grayscale image obtained by the image converter;
a gradient magnitude operation part which computes a gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part;
a gradient direction operation part which computes a gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part;
a normalized gradient magnitude image preparing part which prepares a normalized gradient magnitude image on the basis of the gradient magnitude obtained by the gradient magnitude operation part;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part and the gradient magnitude obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 8 corresponding to the total of the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In yet another mode of implementation of the present invention, an arrangement using an RGB image and not having the LUV converter may be provided.
An object detection device in this mode of implementation has:
a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of an RGB image on the basis of the intensity values of the RGB image;
a gradient magnitude operation part which computes gradient magnitudes of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part;
a gradient direction operation part which computes gradient directions of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part;
a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the RGB image obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel;
a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation part with respect to each pixel;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation part and the gradient magnitudes obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In the present invention, the object recognition part may have a prelearned recognition device to which the feature vector is provided.
The recognition device may be, for example, Adaboost (Adaptive Boosting).
The object recognition part may recognize that the object to be detected is detected when an output score which is the sum total as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
For example, the selected value in the feature vector and a threshold value set in advance are compared at each node of the decision tree. A terminal node is reached by a depth-first search method and a score is computed for each of the decision trees. Over all the decision trees, the score for each decision tree is multiplied by the weighting value set in advance for that decision tree, and the total sum of the multiplication results is computed. This total sum is the output score. The object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
In the present invention, the filters may have a rectangular shape, a square shape or a different shape (e.g., L-shape) including a plurality of pixels.
The filters may have a uniform, horizontal, vertical or other non-uniform pattern, and may be a check pattern or an inclination pattern. For example, when a filter is uniform, the values at all pixels are "1". When a filter is horizontal, the pixels above a boundary line are "1" and the pixels below the boundary line are "-1". When a filter is vertical, the pixels on the left-hand side of a boundary line are "1" and the pixels on the right-hand side of the boundary line are "-1".
Among the three or more types of filters, at least two may differ in filter size from each other, or all may differ in filter size from each other.
The shapes of the filters need not be similar to the shape of a rectangular image. Rhomboid filters may be used for an oblong image or object.
In the present invention, the filters may be constituted of a square uniform filter, a square horizontal filter and a square vertical filter, and these filters may differ in filter size from each other.
In the present invention, the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters may be in a filter size relationship shown by the following equations:
U-filter ≦ H-filter < V-filter or
H-filter ≦ U-filter < V-filter
Preferably, the object detection device according to the present invention is used to detect a person, particularly a pedestrian.
Preferably, the filters are constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters are in a filter size relationship shown by the following equations:
U-filter ≦ H-filter < V-filter or
H-filter ≦ U-filter < V-filter
Three or four of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter) and a square check pattern filter (C-filter) may be selected as the filters. The selected filters may have the same or different filter sizes and may be in a size relationship shown by the following equations:
H-filter ≦ U-filter ≦ C-filter < V-filter
Also, three or more but fewer than ten of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter), a square check pattern filter (C-filter), a square inclination pattern filter (I-filter) and an L-shaped filter (L-filter) may be selected as the filters. Not only filters having different patterns but also filters differing in filter size may be counted as different types of filters.
If the intensity of a color (c) in a pixel at coordinates (x, y) is l (x, y, c), a gradient lx (x, y, c) of the color (c) in the horizontal direction and a gradient ly (x, y, c) of the color (c) in the vertical direction can be obtained by the following equations:
lx (x, y, c) = l (x+1, y, c) - l (x-1, y, c) (1)
ly (x, y, c) = l (x, y+1, c) - l (x, y-1, c) (2)
A gradient magnitude gm (x, y, c) can be obtained by the following equation:
gm (x, y, c) = sqrt( lx (x, y, c)^2 + ly (x, y, c)^2 ) (3)
A maximum gradient magnitude gmmax (x, y) can be obtained by the following equation from gradient magnitudes gm (x, y, c) of the LUV color image obtained by the gradient magnitude operation part.
gmmax (x, y) = max {gm (x, y, L), gm (x, y, U), gm (x, y, V)} (4)
Also, a gradient direction θ (x, y) when the gradient magnitude is the maximum gradient magnitude in the LUV color image can be obtained by the following equation:
θ (x, y) = arctan( ly (x, y, cmax) / lx (x, y, cmax) ) (5)
where cmax is the color whose gradient magnitude is the maximum at the pixel (x, y).
A normalized maximum gradient magnitude ngmmax (x, y) can be obtained by the following equation:
ngmmax (x, y) = gmmax (x, y) / Sum (A) (6)
where Sum (A) is the sum total in a predetermined pixel area containing gmmax (x, y) at its center. The pixel area is, for example, an 11 x 11 rectangle centered on the pixel (x, y) of gmmax.
The invention in another aspect is a vehicle having the above-described object detection device.
The vehicle may be a saddle-ride vehicle or straddled vehicle.
The object detection device according to the present invention is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance. The vehicle according to the present invention can be provided with the above-described object detection device.
It will be appreciated that features analogous to those described in relation to the above aspect or optional features may be individually and separably or in combination applicable to any of the other aspects or optional features.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, which follow.
Fig. 1 is a functional block diagram of an object detection device according to a first embodiment of the present invention.
Fig. 2 is a process flow chart of the object detection device according to the first embodiment.
Figs. 3A, 3B and 3C are explanation views related to sliding windows.
Figs. 4A and 4B are explanation views related to a gradient.
Fig. 5 is an explanation view of generating 10 channel features from an RGB image.
Fig. 6 is a drawing showing the types of filters.
Fig. 7 is a drawing showing an evaluation result of Miss Rate.
Fig. 8 is a functional block diagram of an object detection device according to a second embodiment of the present invention.
Fig. 9 is a functional block diagram of an object detection device according to a third embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
First embodiment
Hereinafter, the first embodiment according to the present invention will be explained. Additionally, the present invention is not limited to embodiments which will be explained below, and naturally includes other embodiments. FIG. 1 is a functional block diagram of an object detection device according to the first embodiment.
Explanation of function
An object detection device 1 includes an image input part 10, a scan part 11, an LUV converter 12, a gradient operation part 13, a gradient magnitude operation part 14, a gradient direction operation part 15, a maximum gradient magnitude operation part 16, a maximum gradient magnitude direction operation part 17, a normalized maximum gradient magnitude operation part 18, a normalized maximum gradient magnitude image preparing part 19, a gradient magnitude image for each gradient direction preparing part 20, an image shrinking part 21, a convolution part 22, a feature vector converter 23 and an object recognition part 24. The object detection device 1 may be constituted by a single configuration such as a dedicated communication circuit, firmware or a processing device, or a combination thereof. The above elements of the object detection device 1 may also be achieved by a combination of software and hardware.
The image input part 10 may be an image capturing part which captures an image, a reading part which can read image data or a receiving part which receives image data (whether wireless or wired). The image input part 10 may include an image converter which performs conversion into an image of a predetermined color space when the input image is not an image of the predetermined color space. The image input part 10 may include an RGB image converter which performs conversion into an RGB image in a case where the original image color space is not RGB, for example.
The scan part 11 scans a rectangular area of a predetermined size over the entire input image (for example, an RGB image). By the unit of a rectangular area of a predetermined size, the process from the processing performed by the LUV converter to the processing performed by the object recognition part, which will be explained later, is performed. The cropping area (a rectangular area of a predetermined size) can be arbitrarily set; for example, 8 x 8 or 8 x 16 can be listed. As shown in FIG. 3A, the scan part 11 performs the cropping process by sliding the cropping area (8 x 16) in four-pixel steps from the left end to the right end of an RGB image (64 x 128). As shown in FIG. 3B, the next line is scanned by returning to the left end side of the RGB image and sliding downward in four-pixel steps. In the same manner, the cropping process is performed by sliding the cropping area in four-pixel steps from the left end to the right end. In this way, the sliding-window process is performed from the upper left corner to the lower right corner of the RGB image, that is, over the entire RGB image. Additionally, the size of the cropping area is not limited to a constant size, but may be dynamically changed at the time of executing the sliding windows. For example, in FIG. 3C, the cropping area may be enlarged in stages toward the diagonally lower right, and the number of pixels to be slid may be proportional to the size of the rectangle.
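As a concrete illustration, the following is a minimal Python sketch of this sliding-window scan, assuming a fixed 8 x 16 cropping area, four-pixel steps and a 64 x 128 RGB frame; the function and variable names are illustrative and not part of the embodiment.

```python
import numpy as np

def sliding_windows(image, win_w=8, win_h=16, step=4):
    """Yield (x, y, crop) for each window position, scanning from the
    upper left corner to the lower right corner in fixed steps."""
    height, width = image.shape[:2]
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]

rgb = np.zeros((128, 64, 3), dtype=np.uint8)  # a 64 x 128 RGB frame
for x, y, crop in sliding_windows(rgb):
    pass  # each crop would be passed through steps S3 to S14
```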
The LUV converter 12 converts an RGB image (three channels of image) into LUV color images (three channels of image).
The gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of V image) of each of the LUV color images obtained by the LUV converter 12.
The gradient operation part 13 computes as follows.
If the intensity of a color channel (c) in a pixel at coordinates (x, y) is l (x, y, c) in an intensity image, a gradient lx (x, y, c) in the horizontal direction (x direction) and a gradient ly (x, y, c) in the vertical direction of a color (c) in a pixel at coordinates (x, y) can be obtained by the following equations. The arrangement relationship of the coordinate (x, y) is shown in FIG. 4A.

lx (x, y, c) = l (x+1, y, c) - l (x-1, y, c) (1)
ly (x, y, c) = l (x, y+1, c) - l (x, y-1, c) (2)
In a case of a grayscale image, the above equations can be directly used.
Since an LUV color image is used in this embodiment, the gradient is obtained for each of the LUV color images. If the color of an L image is cL, the color of a U image is cU and the color of a V image is cV, the horizontal gradient and the vertical gradient can be obtained as follows.
In a case of the L image,
lx (x, y, cL) = l (x+1, y, cL) - l (x-1, y, cL) (1-a)
ly (x, y, cL) = l (x, y+1, cL) - l (x, y-1, cL) (2-a)

In a case of the U image,
lx (x, y, cU) = l (x+1, y, cU) - l (x-1, y, cU) (1-b)
ly (x, y, cU) = l (x, y+1, cU) - l (x, y-1, cU) (2-b)

In a case of the V image,
lx (x, y, cV) = l (x+1, y, cV) - l (x-1, y, cV) (1-c)
ly (x, y, cV) = l (x, y+1, cV) - l (x, y-1, cV) (2-c)
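A minimal NumPy sketch of equations (1) and (2) for one color channel follows; leaving the border pixels at zero is an assumption of the sketch, since the embodiment does not specify border handling.

```python
import numpy as np

def gradients(channel):
    """Central differences per equations (1) and (2)."""
    channel = channel.astype(np.float32)
    lx = np.zeros_like(channel)
    ly = np.zeros_like(channel)
    lx[:, 1:-1] = channel[:, 2:] - channel[:, :-2]  # l(x+1, y, c) - l(x-1, y, c)
    ly[1:-1, :] = channel[2:, :] - channel[:-2, :]  # l(x, y+1, c) - l(x, y-1, c)
    return lx, ly

# For an H x W x 3 LUV image, the function is applied channel by channel,
# e.g. lx_L, ly_L = gradients(luv[:, :, 0])
```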
The gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
The gradient magnitude operation part 14 obtains the gradient magnitude gm (x, y, c) by the following equation.
gm (x, y, c) = sqrt( lx (x, y, c)^2 + ly (x, y, c)^2 ) (3)
In a case of a grayscale image, the above equation can be directly used.
In each of the LUV color images, the gradient magnitude can be obtained as follows.
In a case of the L image,
gm (x, y, cL) = sqrt( lx (x, y, cL)^2 + ly (x, y, cL)^2 ) (3-a)
In a case of the U image,
gm (x, y, cU) = sqrt( lx (x, y, cU)^2 + ly (x, y, cU)^2 ) (3-b)
In a case of the V image,
gm (x, y, cV) = sqrt( lx (x, y, cV)^2 + ly (x, y, cV)^2 ) (3-c)
The gradient direction operation part 15 computes the gradient direction of each of the LUV color images (the gradient direction of the L image, the gradient direction of the U image and the gradient direction of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
FIG. 4B shows the gradient directions in increments of 30°.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of V image) obtained by the gradient magnitude operation part 14.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude gmmax (x, y) by the following equation.

gmmax (x, y) = max {gm (x, y, cL), gm (x, y, cU), gm (x, y, cV)} (4-1)
The maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images obtained by the gradient operation part 13.
The maximum gradient magnitude direction operation part 17 obtains the gradient direction θ (x, y) of the maximum gradient magnitude by the following equation:
θ (x, y) = arctan( ly (x, y, cmax) / lx (x, y, cmax) ) (5)
where cmax is the color channel giving the maximum gradient magnitude gmmax (x, y).
The normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
The normalized maximum gradient magnitude operation part 18 obtains the normalized maximum gradient magnitude ngmmax (x, y) by the following equation:
ngmmax (x, y) = gmmax (x, y) / Sum (A) (6)
where Sum (A) is the total sum in a predetermined pixel area containing gmmax (x, y) at its center. In the first embodiment, the pixel area is, for example, an area formed by eleven pixels in the vertical direction and eleven pixels in the horizontal direction.
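Tying equations (3), (4-1), (5) and (6) together, the following NumPy sketch computes the per-channel magnitudes, the per-pixel maximum over the three channels, the direction of that maximum and the 11 x 11 normalization. The use of scipy's uniform_filter as a box sum and the small epsilon guarding the division are assumptions of the sketch, not part of the embodiment.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def max_gradient_features(lx, ly):
    """lx, ly: H x W x 3 horizontal and vertical gradients of the LUV image."""
    gm = np.sqrt(lx ** 2 + ly ** 2)                        # equation (3), per channel
    cmax = np.argmax(gm, axis=2)[..., None]                # channel of the maximum
    gmmax = np.take_along_axis(gm, cmax, axis=2)[..., 0]   # equation (4-1)
    lx_max = np.take_along_axis(lx, cmax, axis=2)[..., 0]
    ly_max = np.take_along_axis(ly, cmax, axis=2)[..., 0]
    theta = np.arctan2(ly_max, lx_max) % np.pi             # equation (5), in [0, pi)
    # equation (6): Sum(A) over an 11 x 11 area centered on each pixel
    area_sum = uniform_filter(gmmax, size=11, mode='nearest') * (11 * 11)
    ngmmax = gmmax / (area_sum + 1e-6)                     # epsilon is an assumption
    return gmmax, theta, ngmmax
```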
The normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
The gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° and 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
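A minimal sketch of this binning follows; it assumes, consistently with the channel images of FIG. 5, that the maximum gradient magnitude and its direction are what get scattered into the six images, with each direction quantized to its 30° bin.

```python
import numpy as np

def direction_channels(theta, gmmax, n_bins=6):
    """theta in [0, pi); produce one magnitude image per 30-degree bin."""
    bins = np.minimum((theta / (np.pi / n_bins)).astype(int), n_bins - 1)
    channels = np.zeros(theta.shape + (n_bins,), dtype=np.float32)
    rows, cols = np.indices(theta.shape)
    channels[rows, cols, bins] = gmmax
    return channels  # H x W x 6, one image per direction
```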
The image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image (an input image). For example, if the RGB image size is 64 x 128 pixels, each of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for six gradient directions is converted into 32 x 64 pixels.
The convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
By convolving with different types of filters, the features of each image are enhanced and recognition with high precision can be performed in the recognition part which will be explained later. That is, by training on data converted so that the features of each image are enhanced, a recognition device with higher recognition precision can be built.
In this embodiment, from the viewpoint of the object detection precision and the processing speed, the number of filters is 10 or less, preferably 4 or less, and more preferably 3.
The shape of the filter may be a rectangular shape, a square shape or a different shape (for example, the shape of L) including a plurality of pixels.
As shown in FIG. 6, the filter patterns (a) Uniform, (b) Horizontal, (c) Vertical, (d) Check, (e) Inclination and (f) Different shape (L-shape) are illustrated. In the drawing, white pixels are "1" and black pixels are "-1". As the filter size, for example, 2 x 2, 4 x 4, 6 x 6 and 8 x 8 pixels are listed.
In the first embodiment, the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and preferably the filters are in a filter size relationship shown by the following equations:

U-filter ≦ H-filter < V-filter or
H-filter ≦ U-filter < V-filter
Moreover, three or four types of the square uniform filter (U-filter), the square horizontal filter (H-filter), the square vertical filter (V-filter) and a square check filter (C-filter) may be selected as the filters. The selected filters may have the same or different filter sizes and preferably are in a filter size relationship shown by the following equations:

H-filter ≦ U-filter ≦ C-filter < V-filter
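The snippet below is a hedged sketch of the three square filters with binary "1"/"-1" values (per FIG. 6) and their application to the channel images; the 2-, 4- and 8-pixel sizes are merely one plausible choice satisfying the size relationships above.

```python
import numpy as np
from scipy.signal import convolve2d

def u_filter(n):                           # uniform: all pixels are 1
    return np.ones((n, n), dtype=np.float32)

def h_filter(n):                           # horizontal: 1 above, -1 below
    f = np.ones((n, n), dtype=np.float32)
    f[n // 2:, :] = -1.0
    return f

def v_filter(n):                           # vertical: 1 on the left, -1 on the right
    f = np.ones((n, n), dtype=np.float32)
    f[:, n // 2:] = -1.0
    return f

filters = [u_filter(2), h_filter(4), v_filter(8)]  # illustrative sizes

def convolve_channels(channels, filters):
    """Convolve each of the C channel images (H x W x C) with each filter."""
    return [convolve2d(channels[:, :, c], f, mode='same')
            for c in range(channels.shape[2]) for f in filters]
```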
The feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22. For example, when the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22 are 64 x 128 pixels, all information has 64 x 128 x 10 pixels, that is, 81920 pixels. The feature vector converter 23 performs processing of converting the information of 64 x 128 x 10 into one dimension of 1 x 81920.
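For instance, a minimal sketch of this conversion under the stated 64 x 128 x 10 assumption:

```python
import numpy as np

channels = np.zeros((128, 64, 10), dtype=np.float32)  # ten processed channel images
feature_vector = channels.reshape(1, -1)              # 1 x 81920
```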
The object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23. The object recognition part 24 may have a prelearned recognition device to which the feature vector is provided. The recognition device may implement, for example, Adaboost.
The object recognition part 24 may recognize that the object to be detected is detected when an output score which is the total sum as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
The object recognition part 24 performs processing of the following (1) to (4).
(1) A selected value in the feature vector and a threshold value set in advance are compared at each node of the decision tree.
(2) A terminal node is reached by a depth-first search method and scores for each of the decision trees are computed.
(3) In all the decision trees, weighting values set in advance for each of the decision trees and the scores for each of the decision trees are multiplied, and then the total sum of the multiplication results is computed. The total sum is the output score.
(4) The object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
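A minimal sketch of steps (1) to (4) follows, using a hypothetical nested-dictionary tree representation; the embodiment does not fix any particular data structure.

```python
def tree_score(tree, feature_vector):
    """Steps (1)-(2): walk one decision tree depth-first; each internal node
    compares one selected feature value against a preset threshold, and the
    reached terminal node holds the tree's score."""
    node = tree
    while 'leaf' not in node:
        if feature_vector[node['feature']] < node['threshold']:
            node = node['left']
        else:
            node = node['right']
    return node['leaf']

def is_detected(trees, weights, feature_vector, output_threshold):
    """Steps (3)-(4): the output score is the weighted sum of tree scores,
    and the object is detected when it reaches the preset threshold."""
    output_score = sum(w * tree_score(t, feature_vector)
                       for t, w in zip(trees, weights))
    return output_score >= output_threshold
```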
Processing Flow
Using FIG. 2, a processing flow of the first embodiment will be explained. First, in step S1, the image input part 10 having a CMOS sensor captures an image. FIG. 5(a) shows an example of an input image. The captured image is stored in a memory 30 by the unit of a frame. For each frame, the following processing is performed.
In step S2, the scan part 11 scans an RGB image by the unit of a rectangular area of a predetermined size. Thereafter, each processing of step S3 to step S14 is performed by the unit of a rectangular area of a predetermined size.
In step S3, the LUV converter 12 converts an RGB image into LUV color images. FIG. 5(b) shows an example of the LUV color image.
In step S4, the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of a V image) of each of the LUV color images.
In step S5, the gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images.
In step S6, the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image).
In step S7, the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
In step S8, the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18. FIG. 5(c) shows an example of the normalized maximum gradient magnitude image.
In step S9, the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images.
In step S10, the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° to 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14. FIG. 5(d) shows an example of the gradient magnitude image for each of six gradient directions.
Additionally, step S9 or step S10 may be executed after step S4 and step S6, may be executed simultaneously with step S7 or step S8, or may be executed before step S7 or step S8.
In step S11, the image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image. FIG. 5(e) shows the reduced LUV color image, the reduced normalized maximum gradient magnitude image and the reduced gradient magnitude image for each of six gradient directions.
In step S12, the convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10. In this embodiment, 3 or 4 types of filters are preferably used for convolution; using 3 types is particularly preferable since the processing time can be reduced without deteriorating the recognition performance for the object.
In step S13, the feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
In step S14, the object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23.
In step S15, it is judged whether the sliding-window process has been completed over the entire image area of one frame. If it has not been completed, the process returns to step S2, and the scan area (the cropping area) is slid in accordance with a predetermined rule to repeat the processing from step S3 to step S14. When the sliding-window process has been completed, the same processing is performed on the next frame.
Example
Pedestrian detection was evaluated using the object detection device 1 according to the first embodiment.
As filters, three types, namely a square uniform filter (U-filter, FIG. 6(a)), a square horizontal filter (H-filter, FIG. 6(b)) and a square vertical filter (V-filter, FIG. 6(c)), were used. Since the three types of filters were each available in four different sizes (2 x 2, 4 x 4, 6 x 6 and 8 x 8 pixels), the number of combinations of all the filters is 16.
As input images, images from the Caltech reasonable dataset were used.
For each filter combination, the performance was evaluated using the Miss Rate generally used for evaluation of the Caltech dataset. Table 1 shows a comparison between the top five filter combinations and the conventional person detection methods.
"MFCF" is a filter combination of the first embodiment.
"ACF-Caltech+" uses in total 10 images (channels) that is the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions as the feature in the same manner as the first embodiment. However, "ACF-Caltech+" does not have any convolution part as in the first embodiment, and directly sends 10 images to the feature vector converter and use it as a feature vector.
The difference between "ACF-Caltech+" and "ACF-ours" is the size of the input image.
"ACF-Caltech+" uses an image of 32 x 64 pixels as an input, while "ACF-ours" uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
"LDCF" and "LDCF-ours" use total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further They include a convolution part. The difference from the first embodiment is the type of the filter. "LDCF" and "LDCF-ours" use four different types of 5-by-5-pixel-filters to each images: the LUV color image, the normalized maximum gradient magnitude image, and the gradient magnitude image for each of six gradient directions. That is, 40 types of filters are used. Moreover, also regarding the numerical values of an inner portion of the filter, binary numerical values such as "1" or "-1" as in the first embodiment are not used, but real numbers from -1.0 to 1.0 are used. The difference between "LDCF" and "LDCF-ours" is the size of the input image. "LDCF" uses an image of 32 x 64 pixels as an input, while "LDCF-ours" uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
"Checkerboards" is a method of non-patent document 1. "Checkerboards" uses total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further it includes a convolution part. The difference from the first embodiment is the type of the filter. "Checkerboards" uses 39 filters of 4 x 3 cells (1 cell is 6 x 6 pixels), the 39 filters use all reproducible combinations in 3 x 4 cells (a U-filter, a V-filter, a H-filter and a checkerboard pattern). "Checkerboards" uses an image of 60 x 120 pixels as an input image.
As shown in Table 1, the filter combinations of the first embodiment detected a pedestrian with higher precision than the conventional person detection methods.

Table 1 (Miss Rate comparison between the top five filter combinations and the conventional person detection methods; presented as an image in the source)
Next, combinations of three or four types of filters out of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter), a square check filter (C-filter), a square inclination pattern filter (I-filter) and an L-shaped filter (L-filter) were evaluated.
As input images, the Caltech reasonable dataset was used.
FIG. 7 shows the evaluation result by Miss Rate for each of the filter combinations. The Miss Rate is generally used for evaluation of the Caltech dataset. As shown in FIG. 7, MFCF248, MFCF248C and MFCF2-8C had Miss Rates of less than 19%.
MFCF248, which is a filter combination of the first embodiment, achieved a better result than any other published conventional method.
Second Embodiment
Hereinafter, the second embodiment according to the present invention will be explained. FIG. 8 is a functional block diagram of the object detection device according to the second embodiment.
Explanation of Function
The object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 40, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24. Hereinafter, elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
The image conversion part 40 converts an RGB image into input images (HSV images) of a predetermined color space (HSV).
The gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of each of the input images obtained by the image conversion part 40.
The gradient magnitude operation part 14 computes the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
The gradient direction operation part 15 computes the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operation part 14.
The maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the input images obtained by the gradient operation part 13.
The normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
The normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
The gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
The image shrinking part 21 converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
The convolution part 22 enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
The feature vector converter 23 makes conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
The object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
Additionally, the process flow of the second embodiment has the same processing steps as the process flow of the first embodiment (FIG. 2), with the LUV color image replaced by the HSV color image in the process content.
Third embodiment
Hereinafter, the third embodiment according to the present invention will be explained. FIG. 9 is a functional block diagram of the object detection device according to the third embodiment. The third embodiment includes an image conversion part 50 which converts an RGB image into a grayscale image. The need to compute the maximum gradient magnitude from the gradient magnitudes in two or more color spaces is eliminated, so that the processing in the maximum gradient magnitude operation part can be removed.
Explanation of Function
The object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 50, a gradient operation part 41, a gradient magnitude operation part 42, a gradient direction operation part 43, a normalized gradient magnitude image preparing part 45, a gradient magnitude image for each gradient direction preparing part 46, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24. Hereinafter, elements having the same reference numerals as those of the first or second embodiment have the same function, so their explanations are omitted or they are briefly explained.
The image conversion part 50 converts an RGB image into a grayscale image.
The gradient operation part 41 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the grayscale image on the basis of the intensity values of the grayscale image obtained by the image conversion part 50.
The gradient magnitude operation part 42 computes the gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
The gradient direction operation part 43 computes the gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
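A compact sketch of these three operations on a grayscale image, assuming centered finite differences for the derivatives (the actual derivative kernel is not specified by the embodiment):

    import numpy as np

    def grayscale_gradients(gray):
        # Sketch: horizontal/vertical gradients, then per-pixel
        # gradient magnitude and gradient direction.
        gx = np.gradient(gray, axis=1)  # horizontal gradient
        gy = np.gradient(gray, axis=0)  # vertical gradient
        magnitude = np.hypot(gx, gy)
        direction = np.arctan2(gy, gx)
        return magnitude, direction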
The normalized gradient magnitude image preparing part 45 prepares the normalized gradient magnitude image from the gradient magnitude obtained by the gradient magnitude operation part 42.
The gradient magnitude image for each gradient direction preparing part 46 prepares the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 43 and the gradient magnitude obtained by the gradient magnitude operation part 42.
The image shrinking part 21 converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the RGB image.
The convolution part 22 enhances the image by convolving the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters, or with more than three but no more than eight types of filters, corresponding to the total of the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
The feature vector converter 23 makes conversion into the feature vector on the basis of the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part 22.
The object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
The process flow of the third embodiment omits steps S6 and S7 of the processing flow (FIG. 2) of the first embodiment, and replaces the LUV color image with the grayscale image.
Fourth embodiment
A fourth embodiment uses an RGB image directly as the input image, so the LUV converter and the image conversion part of the first to third embodiments can be omitted.
Explanation of Function
The object detection device 1 includes the image input part 10, the scan part 11, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24. Hereinafter, elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
The gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the RGB images on the basis of the intensity values of each of the RGB images.
The gradient magnitude operation part 14 computes the gradient magnitude of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
The gradient direction operation part 15 computes the gradient direction of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude for each pixel on the basis of the gradient magnitude of each of the RGB images obtained by the gradient magnitude operation part 14.
The maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
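The per-pixel maximum over the colour channels and its accompanying direction may be sketched as follows; the channel-stacked gradient arrays are an assumed input layout.

    import numpy as np

    def max_gradient_over_channels(gx, gy):
        # Sketch: per pixel, keep the largest gradient magnitude over
        # the R, G and B channels and the direction of that channel.
        # gx, gy: arrays of shape (3, H, W).
        magnitude = np.hypot(gx, gy)
        winner = magnitude.argmax(axis=0)      # winning channel per pixel
        rows, cols = np.indices(winner.shape)
        max_mag = magnitude[winner, rows, cols]
        direction = np.arctan2(gy[winner, rows, cols],
                               gx[winner, rows, cols])
        return max_mag, direction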
The normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part 16.
The normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operation part 18.
The gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
The image shrinking part 21 converts the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input RGB image.
The convolution part 22 enhances the image by convolving the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
The feature vector converter 23 makes conversion into the feature vector on the basis of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part 22.
The object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
The process flow of the fourth embodiment omits step S3 of the processing flow (FIG. 2) of the first embodiment, and replaces the LUV color image with the RGB image.
Other embodiments
In the first to fourth embodiments, the image shrinking part is not necessarily included as a separate element; its function may instead be incorporated into each of the LUV converter, the image conversion part, the normalized maximum gradient magnitude image preparing part, the gradient magnitude image for each gradient direction preparing part and the normalized gradient magnitude image preparing part.
Fifth embodiment
An object detection method according to a fifth embodiment scans a rectangular area of a predetermined size over the entire input image, and performs the following steps (1) to (12) on each area cropped by the sliding window (a simplified sketch of this scanning loop is given after the steps below).
The object detection method comprises the following.
(1) A gradient operating step of computing the horizontal gradients in a horizontal direction and the vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of the color space of each of the input images (an RGB image, an LUV image or an HSV image) having a plurality of color channels (RGB, LUV or HSV).
(2) A gradient magnitude operating step of operating the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(3) A gradient direction operating step of operating the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(4) A maximum gradient magnitude operating step of operating the maximum gradient magnitude for each pixel on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operating step.
(5) A maximum gradient magnitude direction operating step of operating the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(6) A normalized maximum gradient magnitude operating step of operating the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step.
(7) A normalized maximum gradient magnitude image preparing step of preparing the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operating step.
(8) A gradient magnitude image for each gradient direction preparing step of preparing the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operating step and the gradient magnitude obtained by the gradient magnitude operating step.
(9) An image shrinking step of converting the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input image used in the gradient operating step, before the convolving step.
(10) The convolving step of enhancing the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking step, with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
(11) A feature vector converting step of making conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolving step.
(12) An object recognizing step of recognizing an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converting step.
Also, the object detection method may include an image converting step of converting the RGB image into an input image of a predetermined color space (an LUV color image or an HSV image), executed before the gradient operating step.
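The scanning loop referred to above can be sketched as follows. The window size, the stride and the three helper functions (compute_channels, to_feature_vector, classify), which stand in for steps (1) to (12), are hypothetical.

    def detect(image, compute_channels, to_feature_vector, classify,
               window=(128, 64), stride=8):
        # Sketch: slide a fixed-size window over the input image and run
        # each crop through the channel, feature and recognition steps.
        win_h, win_w = window
        detections = []
        for top in range(0, image.shape[0] - win_h + 1, stride):
            for left in range(0, image.shape[1] - win_w + 1, stride):
                crop = image[top:top + win_h, left:left + win_w]
                channels = compute_channels(crop)   # steps (1) to (10)
                fv = to_feature_vector(channels)    # step (11)
                if classify(fv):                    # step (12)
                    detections.append((top, left, win_h, win_w))
        return detections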
Sixth embodiment
An object detection program according to a sixth embodiment scans the entire input image with a rectangular area of a predetermined size, and allows a computer to execute the following steps (1) to (12) on each area cropped by the sliding-window approach.
The object detection program allows the computer to execute the following.
(1) A gradient operating step of computing the horizontal gradients in a horizontal direction and the vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of the color space of each of the input images (an RGB image, an LUV image or an HSV image) having a plurality of color channels (RGB, LUV or HSV).
(2) A gradient magnitude operating step of operating the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(3) A gradient direction operating step of operating the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(4) A maximum gradient magnitude operating step of operating the maximum gradient magnitude for each pixel on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operating step.
(5) A maximum gradient magnitude direction operating step of operating the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(6) A normalized maximum gradient magnitude operating step of operating the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step.
(7) A normalized maximum gradient magnitude image preparing step of preparing the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operating step.
(8) A gradient magnitude image for each gradient direction preparing step of preparing the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operating step and the gradient magnitude obtained by the gradient magnitude operating step.
(9) An image shrinking step of converting the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input image used in the gradient operating step, before the convolving step.
(10) The convolving step of enhancing the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking step, with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
(11) A feature vector converting step of making conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolving step.
(12) An object recognizing step of recognizing an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converting step.
Also, the object detection program may allow the computer to execute an image converting step of converting the RGB image into an input image of a predetermined color space (an LUV color image or an HSV image) before the gradient operating step.
Seventh embodiment
A seventh embodiment is a memory medium which stores the object detection program according to the sixth embodiment. The memory medium is not particularly limited, and any conventional memory medium may be used.
Eighth embodiment
An object detection device according to an eighth embodiment comprises:
a gradient operation part which computes the horizontal gradients in a horizontal direction and the vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of the color space of each of the input images (for example, an RGB image, an LUV image or an HSV image) having a plurality of color channels (for example, RGB, LUV or HSV);
a gradient magnitude operation part which operates the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part;
a gradient direction operation part which operates the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part;
a maximum gradient magnitude operation part which operates the maximum gradient magnitude for each pixel on the basis of the gradient magnitude of each of the input images obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which operates the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which operates the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part;
a normalized maximum gradient magnitude image preparing part which prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operation part;
a gradient magnitude image for each gradient direction preparing part which prepares the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part and the gradient magnitude obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which makes conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part; and
an object recognition part which recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter.
The input image of the object detection device may be an RGB image, an LUV color image or an HSV color image.
The object detection device may further include a scan part which repeats the process from the processing performed by the gradient operation part to the processing performed by the object recognition part while scanning a rectangular area of a predetermined size over the entire input image.
The object detection device may further include an image conversion part which converts the RGB image into an input image of a predetermined color space (the LUV color image or the HSV image).
The object detection device may further include an image shrinking part which converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input image used in the processing by the gradient operation part, before the processing by the convolution part.
Vehicle
A vehicle according to the present invention is a vehicle including the object detection device 1 according to any of the first to fourth embodiments. The vehicle is not particularly limited, and may be a saddle-ride or straddled vehicle, a two-wheel vehicle, a three-wheel vehicle or a four-wheel vehicle.
Reference Signs List
1 object detection device
10 image input part
11 scan part
12 LUV converter
13 gradient operation part
14 gradient magnitude operation part
15 gradient direction operation part
16 maximum gradient magnitude operation part
17 maximum gradient magnitude direction operation part
18 normalized maximum gradient magnitude operation part
19 normalized maximum gradient magnitude image preparing part
20 gradient magnitude image for each gradient direction preparing part
21 image shrinking part
22 convolution part
23 feature vector converter
24 object recognition part

Any feature hereinbefore described as a “part” may alternatively be described as a “means” or “means for”. Accordingly, the words/phrases “part” and “means” and “means for” are herein used interchangeably.

Claims (7)


  1. An object detection device comprising:
    an LUV converter which converts an RGB image into an LUV color image;
    a gradient operation means which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the LUV color image on the basis of the intensity values of the LUV color image obtained by the LUV converter;
    a gradient magnitude operation means which computes gradient magnitudes of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation means;
    a gradient direction operation means which computes gradient directions of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation means;
    a maximum gradient magnitude operation means which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the LUV color image obtained by the gradient magnitude operation means;
    a maximum gradient magnitude direction operation means which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation means with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation means;
    a normalized maximum gradient magnitude operation means which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation means with respect to each pixel;
    a normalized maximum gradient magnitude image preparing means which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation means with respect to each pixel;
    a gradient magnitude image for each gradient direction preparing means which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation means and the gradient magnitudes obtained by the gradient magnitude operation means;
    a convolution means which enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
    a feature vector converter which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution means into a feature vector; and
    an object recognition means which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.

  2. The object detection device according to Claim 1, further comprising a scanning means which scans a rectangular area of a predetermined size over the entire RGB image to repeat the process from the processing performed by the LUV converter to the processing performed by the object recognition means.

  3. The object detection device according to Claim 1 or 2, further comprising an image shrinking means which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than the size of the RGB image.

  4. The object detection device according to any one of Claims 1 to 3, wherein the object recognition means recognizes that the object to be detected is present when an output score, which is the total sum of the products of weighting values set in advance for the decision trees and the scores of the decision trees, is equal to or larger than a threshold value set in advance.

  5. The object detection device according to any one of Claims 1 to 4, wherein the filters are constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and the filters are in a filter size relationship:
    U-filter ≦ H-filter < V-filter or
    H-filter ≦ U-filter < V-filter.

  6. The object detection device according to any one of Claims 1 to 5, wherein the filters are constituted of three or four types selected from a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter) and a square check filter (C-filter), and the filters are in a filter size relationship:
    H-filter ≦ U-filter ≦ C-filter < V-filter.

  7. A vehicle comprising the object detection device according to any one of Claims 1 to 6.

PCT/JP2016/088698 2016-04-01 2016-12-26 Object detection device and vehicle having the object detection device WO2017168889A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1605589.9 2016-04-01
GB201605589 2016-04-01

Publications (1)

Publication Number Publication Date
WO2017168889A1 true WO2017168889A1 (en) 2017-10-05

Family

ID=57851286

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/088698 WO2017168889A1 (en) 2016-04-01 2016-12-26 Object detection device and vehicle having the object detection device

Country Status (1)

Country Link
WO (1) WO2017168889A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673489A (en) * 2021-10-21 2021-11-19 之江实验室 Video group behavior identification method based on cascade Transformer

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130279745A1 (en) * 2012-02-01 2013-10-24 c/o Honda elesys Co., Ltd. Image recognition device, image recognition method, and image recognition program
US20150206319A1 (en) * 2014-01-17 2015-07-23 Microsoft Corporation Digital image edge detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG, SHANSHAN ET AL.: "Filtered channel features for pedestrian detection", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 7 June 2015, pages 1751-1760, XP032793632, DOI: 10.1109/CVPR.2015.7298784 *
ZHANG, SHANSHAN ET AL.: "Informed Haar-Like Features Improve Pedestrian Detection", 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 23 June 2014, pages 947-954, XP032649097, DOI: 10.1109/CVPR.2014.126 *


Legal Events

NENP: Non-entry into the national phase; Ref country code: DE
121: EP: the EPO has been informed by WIPO that EP was designated in this application; Ref document number: 16828989; Country of ref document: EP; Kind code of ref document: A1
122: EP: PCT application non-entry in European phase; Ref document number: 16828989; Country of ref document: EP; Kind code of ref document: A1