WO2017168889A1 - Object detection device and vehicle having the object detection device - Google Patents

Object detection device and vehicle having the object detection device Download PDF

Info

Publication number
WO2017168889A1
WO2017168889A1 PCT/JP2016/088698 JP2016088698W WO2017168889A1 WO 2017168889 A1 WO2017168889 A1 WO 2017168889A1 JP 2016088698 W JP2016088698 W JP 2016088698W WO 2017168889 A1 WO2017168889 A1 WO 2017168889A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
gradient
gradient magnitude
filter
magnitude
Prior art date
Application number
PCT/JP2016/088698
Other languages
French (fr)
Inventor
Yoshiki KURANUKI
Ioannis PATRAS
Original Assignee
Yamaha Hatsudoki Kabushiki Kaisha
Queen Mary University Of London
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Hatsudoki Kabushiki Kaisha, Queen Mary University Of London filed Critical Yamaha Hatsudoki Kabushiki Kaisha
Publication of WO2017168889A1 publication Critical patent/WO2017168889A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • the present invention relates to an object detection device which detects an object by image recognition and to a vehicle having the object detection device.
  • Non-patent Document 1 proposes recognition of a pedestrian from images through feature values in ten channels formed of an LUV color, a normalized gradient magnitude image and images of six different histograms of oriented gradients (HOG).
  • non-patent Document 1 a combination of a plurality of types of filters is proposed as Checkerboards, with which data in the ten channels is converted into a feature vector, from which a pedestrian is recognized with a recognition method.
  • Non-patent Document D2 proposes a filter corresponding to the contour of a person.
  • a filter extracted from the contour of a person is not necessarily the optimum filter. There is an apprehension of a reduction in processing speed when this method (such the filter) is used.
  • HOG features With low-level features such as HOG features or Haar-like features, it is difficult to detect an object such as a pedestrian or an object which can change in shape (vary in shape attributes) or in appearance for example due to variations in clothing.
  • CoHOG features have also been proposed such that co-occurrences between HOG features are obtained.
  • the number of dimensions of the feature vector is increased and the time taken to perform detection and learning is extended. That is, the amount of computation (processing time) is increased and it is difficult to incorporate an object detection device using such features in a small CPU.
  • An objective of the present invention is to provide an object detection device which is capable of reducing the amount of data for processing, and the processing time without reducing the recognition performance and a vehicle having the object detection device.
  • An object detection device has: an LUV converter which converts an RGB image into an LUV color image; a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the LUV color image on the basis of the intensity values of the LUV color image obtained by the LUV converter; a gradient magnitude operation part which computes gradient magnitudes of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part; a gradient direction operation part which computes gradient directions of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part; a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the LUV color image obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradient
  • the object detection device may further have an image capturing part.
  • an RGB image or a grayscale image may be taken with a CMOS sensor.
  • the object detection device may have a scanning part which slides a rectangular area of a predetermined size on the entire RGB image to repeat the process from processing performed by the LUV converter to processing performed by the object recognition part.
  • the scanning part performs cropping process on the entire image by sliding in four-pixel steps.
  • the object detection device may have an image shrinking part which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into an image with size smaller than the size of the RGB image.
  • the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions are of an image with size smaller than the size of the RGB image.
  • the RGB image size may have 64 x 128 pixels
  • each size of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions may have 32 x 64 pixels.
  • the LUV converter may convert the RGB image into an image with size smaller than the size of the RGB image before conversion into the LUV color image.
  • the normalized maximum gradient magnitude image preparing part may prepare the image in an image with size smaller than the size of the RGB image.
  • the magnitude image for each gradient direction preparing part may prepare the image in an image with size smaller than the size of the RGB image.
  • another color space can be used instead of the LUV color image.
  • an image converter which converts an RGB image into an HSV image may be provided. Processing in each component can be executed by using an HSV image instead of the LUV color image.
  • An object detection device in this mode of implementation has: an image converter which converts an RGB image into an input image with a predetermined color space (an LUV color image or an HSV image); a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the input image on the basis of the intensity values of the input image obtained by the image converter; a gradient magnitude operation part which computes gradient magnitudes of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part; a gradient direction operation part which computes gradient directions of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part; a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the input image obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical
  • a grayscale image can be used instead of the LUV color image.
  • an image converter which converts an RGB image into a grayscale image may be provided. The need to compute the maximum gradient magnitude from the gradient magnitudes in the two or more color spaces (color expressions) is eliminated, so that processing in the maximum gradient magnitude operation part can be removed.
  • An object detection device in this mode of implementation has: an image converter which converts an RGB image into a grayscale image; a gradient operation part which computes a horizontal gradient in a horizontal direction and a vertical gradient in a vertical direction of the grayscale image on the basis of the intensity value of the grayscale image obtained by the image converter; a gradient magnitude operation part which computes a gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part; a gradient direction operation part which computes a gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part; a normalized gradient magnitude image preparing part which prepares a normalized gradient magnitude image on the basis of the gradient magnitude obtained by the gradient magnitude operation part; a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part and the gradient magnitude obtained by the gradient magnitude operation part; a convolution part which enhances
  • an arrangement using an RGB image and not having the LUV converter may be provided.
  • An object detection device in this mode of implementation has: a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of an RGB image on the basis of the intensity values of the RGB image; a gradient magnitude operation part which computes gradient magnitudes of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part; a gradient direction operation part which computes gradient directions of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part; a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the RGB image obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part; a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each
  • the object recognition part may have a prelearned recognition device to which the feature vector is provided.
  • the recognition device may be, for example, Adaboost (Adaptive Boosting).
  • the object recognition part may recognize that the object to be detected is detected when an output score which is the sum total as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
  • the selected feature vector and a threshold value set in advance are compared at each node of the decision tree.
  • a terminal node is detected by a depth priority search method and scores for the decision trees are computed.
  • weighting values set in advance for each of the decision trees and the scores for each of the decision trees are multiplied with the corresponding weighting values and the total sum of the multiplication results is computed.
  • the total sum is the output score.
  • the object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
  • the filters may have a rectangular shape, a square shape or a different shape (e.g., L-shape) including a plurality of pixels.
  • the filters may be uniform, horizontal, vertical pattern or non-uniform pattern and may be a check pattern or an inclination pattern. For example, when the filters are uniform, the values at all pixels are "1". When the filters are horizontal, the pixels above a boundary line are “1" and the pixels below the boundary line are “-1". When the filters are vertical, the pixels on the left-hand side of a boundary line are "1" and the pixels on the right-hand side of low the boundary line are "-1".
  • At least two of them differ in filter size from each other or all of them differ in filter size from each other.
  • the shapes of the filters may be not similar to a rectangular image.
  • Rhomboid filters may be used for an oblong image or object.
  • the filters may be constituted of a square uniform filter, a square horizontal filter and a square vertical filter, and these filters may differ in filter size from each other.
  • the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters may be in a filter size relationship shown by the following equations: U-filter ⁇ H-filter ⁇ V-filter or H-filter ⁇ U-filter ⁇ V-filter
  • the object detection device is used to detect a person, particularly a pedestrian.
  • the filters are constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters are in a filter size relationship shown by the following equations: U-filter ⁇ H-filter ⁇ V-filter or H-filter ⁇ U-filter ⁇ V-filter
  • Three or four of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter) and a square check pattern filter (C-filter) may be selected as the filters.
  • the selected filters may have the same or different filter sizes and may be in a size relationship shown by the following equations: H-filter ⁇ U-filter ⁇ C-filter ⁇ V-filter
  • a square uniform filter U-filter
  • H-filter square horizontal filter
  • V-filter square vertical filter
  • C-filter square check pattern filter
  • I-filter square inclination pattern filter
  • L-filter L-shaped filter
  • a gradient magnitude gm (x, y, c) can be obtained by the following equation:
  • a maximum gradient magnitude gm max (x, y) can be obtained by the following equation from gradient magnitudes gm (x, y, c) of the LUV color image obtained by the gradient magnitude operation part.
  • gm max (x, y) max ⁇ gm (x, y, L), gm (x, y, U), gm (x, y, V) ⁇ (4)
  • a gradient direction ⁇ (x, y) when the gradient magnitude is the maximum gradient magnitude in the LUV color image can be obtained by the following equation:
  • a normalized maximum gradient magnitude ngm max (x, y) can be obtained by the following equation: where Sum (A) is the sum total in a predetermined pixel area containing gm max (x, y) at its center.
  • the pixel area is, for example an 11 x 11 rectangle centered to the pixel in gm max (x, y).
  • the invention in another aspect is a vehicle having the above-described object detection device.
  • the vehicle may be a saddle-ride vehicle or straddled vehicle.
  • the object detection device is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance.
  • the vehicle according to the present invention can be provided with the above-described object detection device.
  • Fig.1 is a functional block diagram of an object detection device according to a first embodiment of the present invention.
  • Fig.2 is a process flow chart of the object detection device according to the first embodiment.
  • Fig.3A is an explanation view related to sliding windows.
  • Fig.3B is an explanation view related to sliding windows.
  • Fig.3C is an explanation view related to sliding windows.
  • Fig.4A is an explanation view related to a gradient.
  • Fig.4B is an explanation view related to a gradient.
  • Fig.5 is an explanation view of generating 10 channel features from an RGB image.
  • Fig.6 is a drawing showing the type of filters.
  • Fig.7 is a drawing showing an evaluation result of Miss Rate.
  • Fig.8 is a functional block diagram of an object detection device according to a second embodiment of the present invention.
  • Fig.9 is a functional block diagram of an object detection device according to a third embodiment of the present invention.
  • FIG. 1 is a functional block diagram of an object detection device according to the first embodiment.
  • An object detection device 1 includes an image input part 10, a scan part 11, an LUV converter 12, a gradient operation part 13, a gradient magnitude operation part 14, a gradient direction operation part 15, a maximum gradient magnitude operation part 16, a maximum gradient magnitude direction operation part 17, a normalized maximum gradient magnitude operation part 18, a normalized maximum gradient magnitude image preparing part 19, a gradient magnitude image for each gradient direction preparing part 20, an image shrinking part 21, a convolution part 22, a feature vector converter 23 and an object recognition part 24.
  • the object detection device 1 may be constituted by a single configuration such as a dedicated communication circuit, firmware and a processing device or a combination thereof. The above elements of the object detection device 1 may be achieved by a combination of software and hardware.
  • the image input part 10 may be an image capturing part which captures an image, a reading part which can read image data or a receiving part which receives image data (irrespective of wireless or wired).
  • the image input part 10 may include an image converter which performs conversion into an image of a predetermined color space when the input image is not an image of a predetermined color space.
  • the image input part 10 may include an RGB image converter which performs conversion into an RGB image in a case that the original image color space is not RGB, for example.
  • the scan part 11 scans a rectangular area of a predetermined size on the entire input image (for example, an RGB image). By the unit of a rectangular area of a predetermined size, the process from processing performed by the LUV converter to processing performed by the object recognition part which will be explained later is performed.
  • a cropping area (a rectangular area of a predetermined size) can be arbitrarily set, and for example, 8 x 8, 8 x 16 can be listed.
  • the scan part 11 performs cropping process by sliding the cropping area (8 x 18) in four-pixel steps from the left end to the right end of an RGB image (64 x 128). As shown in FIG.
  • the next line is scanned by returning to the left end side of the RGB image and sliding in four-pixel steps downward.
  • the cropping processing is performed by sliding the cropping area by four-pixel steps from the left end to the right end.
  • sliding windows is performed from an upper left corner to a lower right corner of the RGB image, that is, on the entire RGB image.
  • the size of the cropping area is not limited to constant, but may be dynamically changed at the time of executing sliding windows.
  • the cropping area may be enlarged toward diagonally lower right in stages, and the number of pixels to be slid may be proportional to the size of the rectangle.
  • the LUV converter 12 converts an RGB image (three channels of image) into LUV color images (three channels of image).
  • the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of V image) of each of the LUV color images obtained by the LUV converter 12.
  • the gradient operation part 13 computes as follows. If the intensity of a color channel (c) in a pixel at coordinates (x, y) is l (x, y, c) in an intensity image, a gradient lx (x, y, c) in the horizontal direction (x direction) and a gradient ly (x, y, c) in the vertical direction of a color (c) in a pixel at coordinates (x, y) can be obtained by the following equations. The arrangement relationship of the coordinate (x, y) is shown in FIG. 4A.
  • the above equations can be directly used. Since an LUV color image is used in this embodiment, the gradient is obtained for each of the LUV color images. If the color of an L image is cL, the color of a U image is cU and the color of a V image is cV, the horizontal gradient and the vertical gradient can be obtained as follows.
  • lx (x, y, cV) l (x+1, y, cV) - l (x-1, y, cV) (1-c)
  • the gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
  • the gradient magnitude operation part 14 obtains the gradient magnitude gm (x, y, c) by the following equation.
  • the above equation can be directly used.
  • the gradient magnitude can be obtained as follows. In a case of the L image, In a case of the U image, In a case of the V image,
  • the gradient direction operation part 15 computes the gradient direction of each of the LUV color images (the gradient direction of the L image, the gradient direction of the U image and the gradient direction of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
  • FIG. 4B shows the direction for each 30° as the gradient direction.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of V image) obtained by the gradient magnitude operation part 14.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude gm max (x, y) by the following equation.
  • gm max (x, y) max ⁇ gm (x, y, cL), gm (x, y, cU), gm (x, y, cV) ⁇ (4-1)
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images obtained by the gradient operation part 13.
  • the maximum gradient magnitude direction operation part 17 obtains the gradient direction ⁇ (x, y) of the maximum gradient magnitude by the following equation:
  • the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • the normalized maximum gradient magnitude operation part 18 obtains the normalized maximum gradient magnitude ngm max (x, y) by the following equation: where Sum (A) is the total sum in a predetermined pixel area containing gm max (x, y) at its center.
  • the pixel area is, for example, an area formed by eleven pixels in the vertical direction and eleven pixels in the horizontal direction in the first embodiment.
  • the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° to 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • the image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image (an input image). For example, if the RGB image size is 64 x 128 pixels, each of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for six gradient directions is converted into 32 x 64 pixels.
  • the convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature of each image is enhanced and recognition with high precision can be performed in the recognition part which will be explained later. That is, by using training data with conversion so that the features of each image are enhanced, a recognition device with higher recognition precision can be made.
  • the number of filters is 10 or less, preferably 4 or less, and more preferably, 3.
  • the shape of the filter may be a rectangular shape, a square shape or a different shape (for example, the shape of L) including a plurality of pixels.
  • the pattern of the filter (a) Uniform, (b) Horizontal, (c) Vertical, (d) Check, (e) Inclination, (f) Different shape (L-shape) are illustrated.
  • white pixels are "1" and black pixels are "-1.”
  • the filter size for example, 2 x 2, 4 x 4, 6 x 6 and 8 x 8 pixels are listed.
  • the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and preferably the filters are in a filter size relationship shown by the following equations: U-filter ⁇ H-filter ⁇ V-filter or H-filter ⁇ U-filter ⁇ V-filter
  • three or four types of the square uniform filter (U-filter), the square horizontal filter (H-filter), the square vertical filter (V-filter) and a square check filter (C-filter) may be selected as the filters.
  • the selected filters may have the same or different filter sizes and preferably are in a filter size relationship shown by the following equations: H-filter ⁇ U-filter ⁇ C-filter ⁇ V-filter
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22 are 64 x 128 pixels, all information has 64 x 128 x 10 pixels, that is, 81920 pixels.
  • the feature vector converter 23 performs processing of converting the information of 64 x 128 x 10 into one dimension of 1 x 81920.
  • the object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • the object recognition part 24 may have a prelearned recognition device to which the feature vector is provided.
  • the recognition device may implement , for example, Adaboost.
  • the object recognition part 24 may recognize that the object to be detected is detected when an output score which is the total sum as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
  • the object recognition part 24 performs processing of the following (1) to (4).
  • a selected value in feature vector and a threshold value set in advance are compared at each node of the decision tree.
  • a terminal node is reached by a depth-first search method and scores for each of the decision trees are computed.
  • Weighting values set in advance for each of the decision trees and the scores for each of the decision trees are multiplied, and then the total sum of the multiplication results is computed. The total sum is the output score.
  • the object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
  • step S1 the image input part 10 having a CMOS sensor captures an image.
  • FIG. 5(a) shows an example of an input image.
  • the captured image is stored in a memory 30 by the unit of a frame. For each frame, the following processing is performed.
  • step S2 the scan part 11 scans an RGB image by the unit of a rectangular area of a predetermined size. Thereafter, each processing of step S3 to step S14 is performed by the unit of a rectangular area of a predetermined size.
  • step S3 the LUV converter 12 converts an RGB image into LUV color images.
  • FIG. 5(b) shows an example of the LUV color image.
  • step S4 the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of a V image) of each of the LUV color images.
  • step S5 the gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images.
  • step S6 the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image).
  • step S7 the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • step S8 the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • FIG. 5(c) shows an example of the normalized maximum gradient magnitude image.
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images.
  • step S10 the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° to 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • FIG. 5(d) shows an example of the gradient magnitude image for each of six gradient directions.
  • step S9 or step S10 which will be explained later may be executed after step S4 and step S6, may be simultaneously executed with step S7 or step S8, or may be executed before step S7 or step S8.
  • step S11 the image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • FIG. 5(e) shows the reduced LUV color image, the reduced normalized maximum gradient magnitude image and the reduced gradient magnitude image for each of six gradient directions.
  • the convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10. While 3 or 4 types of filters are preferably used for convolution in this embodiment, using 3 types of filters for convolution is preferable since the processing time can be reduced without deteriorating the recognition property of the object.
  • step S13 the feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • step S14 the object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • step S15 whether sliding windows is completed or not is judged in all the image areas of one frame. If it is not completed, the process returns to step S2, and the scan area (the cropping area) is slid in accordance with a predetermined rule to repeat the processing from step S3 to step S14. When sliding windows is completed, the same processing is performed to the next frame.
  • Example A pedestrian detection was evaluated using the object detection device 1 according to the first embodiment.
  • a filter As a filter, three types of filters, which are, a square uniform filter (U-filter, (Fig. 6(a)), a square horizontal filter (H-filter, (Fig. 6(b)) and a square vertical filter (V-filter, (Fig. 6(c)) were used. Since the three types of filters were in four different sizes (2 x 2, 4 x 4, 6 x 6, 8 x 8 pixels), the number of combinations of all the filters is 16.
  • MFCF is a filter combination of the first embodiment.
  • ACF-Caltech+ uses in total 10 images (channels) that is the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions as the feature in the same manner as the first embodiment. However, “ACF-Caltech+” does not have any convolution part as in the first embodiment, and directly sends 10 images to the feature vector converter and use it as a feature vector.
  • ACF-Caltech+ uses an image of 32 x 64 pixels as an input
  • ACF-ours uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
  • LDCF and LDCF-ours use total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further They include a convolution part.
  • the difference from the first embodiment is the type of the filter.
  • "LDCF” and “LDCF-ours” use four different types of 5-by-5-pixel-filters to each images: the LUV color image, the normalized maximum gradient magnitude image, and the gradient magnitude image for each of six gradient directions. That is, 40 types of filters are used.
  • binary numerical values such as "1" or "-1" as in the first embodiment are not used, but real numbers from -1.0 to 1.0 are used.
  • LDCF uses an image of 32 x 64 pixels as an input
  • LDCF-ours uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
  • Checkerboards is a method of non-patent document 1.
  • “Checkerboards” uses total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further it includes a convolution part. The difference from the first embodiment is the type of the filter.
  • “Checkerboards” uses 39 filters of 4 x 3 cells (1 cell is 6 x 6 pixels), the 39 filters use all reproducible combinations in 3 x 4 cells (a U-filter, a V-filter, a H-filter and a checkerboard pattern).
  • “Checkerboards” uses an image of 60 x 120 pixels as an input image.
  • a filter combination of the first embodiment detected a pedestrian with higher precision than in the conventional person detecting method.
  • U-filter square uniform filter
  • H-filter square horizontal filter
  • V-filter square vertical filter
  • C-filter square check filter
  • I-filter square inclined pattern filter
  • L-filter L-shaped filter
  • FIG. 7 shows an evaluation result by Miss Rate in each of filter combinations. Miss Rate is generally used for evaluation of Caltech dataset. As shown in FIG. 7, MFCF248, MFCF248C and MFCF2-8C had Miss Rates of less than 19%. MFCF248, which is a filter combination in the first embodiment, had more excellent result than any other published conventional method.
  • FIG. 8 is a functional block diagram of the object detection device according to the second embodiment.
  • the object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 40, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24.
  • elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
  • the image conversion part 40 converts an RGB image into input images (HSV images) of a predetermined color space (HSV).
  • the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of each of the input images obtained by the image conversion part 40.
  • the gradient magnitude operation part 14 computes the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
  • the gradient direction operation part 15 computes the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operation part 14.
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the input images obtained by the gradient operation part 13.
  • the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • the image shrinking part 21 converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • the convolution part 22 enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total sum of the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • FIG. 9 is a functional block diagram of the object detection device according to the third embodiment.
  • the third embodiment includes an image conversion part 50 which converts an RGB image into a grayscale image. The need to compute the maximum gradient magnitude from the gradient magnitudes in the two or more color space is eliminated, so that processing in the maximum gradient magnitude operation part can be removed.
  • the object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 50, the gradient operation part 41, the gradient magnitude operation part 42, the gradient direction operation part 43, the normalized gradient magnitude image preparing part 45, the gradient magnitude image for each gradient direction preparing part 46, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24.
  • elements having the same reference numerals as those of the first or second embodiment have the same function, so their explanations are omitted or they are briefly explained.
  • the image conversion part 50 converts an RGB image into a grayscale image.
  • the gradient operation part 41 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the grayscale image on the basis of the intensity values of the grayscale image obtained by the image conversion part 50.
  • the gradient magnitude operation part 42 computes the gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
  • the gradient direction operation part 43 computes the gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
  • the normalized gradient magnitude image preparing part 45 prepares the normalized gradient magnitude image from the gradient magnitude obtained by the gradient magnitude operation part 42.
  • the gradient magnitude image for each gradient direction preparing part 46 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 43 and the gradient magnitude obtained by the gradient magnitude operation part 42.
  • the image shrinking part 21 converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • the convolution part 22 enhances the image by convolving the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 8 corresponding to the total of the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • step S6 and step S7 in the processing flow (FIG. 2) of the first embodiment, and a change of the process content of replacing the LUV color image with the grayscale image is reflected.
  • a fourth embodiment directly uses an RGB image as an input image.
  • the LUV converter and the image conversion part in the first to third embodiments can be omitted.
  • the object detection device 1 includes the image input part 10, the scan part 11, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24.
  • elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
  • the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the RGB images on the basis of the intensity values of each of the RGB images.
  • the gradient magnitude operation part 14 computes the gradient magnitude of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
  • the gradient direction operation part 15 computes the gradient direction of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
  • the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the RGB images obtained by the gradient magnitude operation part 14.
  • the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
  • the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
  • the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
  • the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
  • the image shrinking part 21 converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
  • the convolution part 22 enhances the image by convolving the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions whose sizes are reduced by the image shrinking part 21 with at least three types of filter or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
  • the feature vector converter 23 makes conversion into the feature vector on the basis of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
  • the object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
  • step S3 in the processing flow (FIG. 2) of the first embodiment, and a change of the process content of replacing the LUV color image with the RGB image is reflected.
  • the image shrinking part is not necessarily included, and the function of the image shrinking part may be included by each of the LUV converter, the image conversion part, the normalized maximum gradient magnitude image preparing part, the gradient magnitude image for each gradient direction preparing part and the normalized gradient magnitude image preparing part.
  • An object detection method performs scanning a rectangular area with a predetermined size to all areas of an input image, and performs the following steps (1) to (12) by the unit of an area cropped by sliding windows.
  • the object detection method comprises the following. (1) A gradient operating step of computing the horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of color space of each of the input images (an RGB image, an LUV image and an HSV image) having a plurality of color spaces (RGB, LUV and HSV). (2) A gradient magnitude operating step of operating the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step. (3) A gradient direction operating step of operating the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
  • the object detection method may include an image converting step of converting the RGB image into the input images of a predetermined color space (an LUV color image or an HSV image) before the gradient operating step for execution.
  • a predetermined color space an LUV color image or an HSV image
  • An object detection program scans all areas of the input image with a rectangular area of a predetermined size, and allows a computer to execute the following steps (1) to (12) by the unit of an area cropped by sliding window approach.
  • the object detection program allows the computer to execute the following.
  • the object detection program may allow the computer to execute an image converting step of converting the RGB image into the input images of a predetermined color space (an LUV color image or an HSV image) before the gradient operating step.
  • a predetermined color space an LUV color image or an HSV image
  • a seventh embodiment is a memory medium which stores the object detection program according to the sixth embodiment.
  • the memory medium is not particularly limited, and all the conventional memory media are included.
  • An object detection device comprises: a gradient operation part which computes the horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of color space of each of the input images (for example, an RGB image, an LUV image and an HSV image) having a plurality of color spaces (for example, RGB, LUV and HSV); a gradient magnitude operation part which operates the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part; a gradient direction operation part which operates the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part; a maximum gradient magnitude operation part which operates the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operation part; a maximum gradient magnitude direction operation part which operates the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part on
  • the object detection device may have the input image of the RGB image, the LUV color image or the HSV color image.
  • the object detection device may further include a scan part which repeats the process from processing performed by the gradient operation part to processing performed by the object recognition part by scanning of the entire input image in a rectangular area with a predetermined size.
  • the object detection device may further include an image conversion part which converts the RGB image into an input image of a predetermined color space (the LUV color image or the HSV image).
  • the object detection device may further include an image shrinking part which, before the processing by the convolution part, converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into images whose size is smaller than the size of the input image used in the processing by the gradient operation part.
Vehicle
  • A vehicle according to the present invention is a vehicle including the object detection device 1 according to the first to fourth embodiments.
  • the vehicle is not particularly limited, and may be a saddle-ride or straddled vehicle, a two-wheel vehicle, a three-wheel vehicle or a four-wheel vehicle.
  • Any feature hereinbefore described as a “part” may alternatively be described as a “means” or “means for”. Accordingly, the words/phrases “part” and “means” and “means for” are herein used interchangeably.

Abstract

An object detection device which is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance is provided. The object detection device includes: a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of input images; a gradient magnitude operation part which operates gradient magnitudes of each of the input images; a gradient direction operation part which operates gradient directions of each of the input images; a maximum gradient magnitude operation part which operates a maximum gradient magnitude with respect to each of pixels; a maximum gradient magnitude direction operation part which operates a gradient direction of the maximum gradient magnitude for each of pixels obtained by the maximum gradient magnitude operation part; a normalized maximum gradient magnitude operation part which operates a normalized maximum gradient magnitude; a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude; a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions; a convolution part which enhances an image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10; a feature vector converter which makes conversion into a feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part; and an object recognition part which recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter.

Description

OBJECT DETECTION DEVICE AND VEHICLE HAVING THE OBJECT DETECTION DEVICE
The present invention relates to an object detection device which detects an object by image recognition and to a vehicle having the object detection device.
Description of the Related Art
"Filtered channel features for pedestrian detection" by S. Zhang, R. Benenson, and B. Schiele, CVPR, 2015 (non-patent Document 1) proposes recognition of a pedestrian from images through feature values in ten channels formed of an LUV color, a normalized gradient magnitude image and images of six different histograms of oriented gradients (HOG).
In non-patent Document 1, a combination of a plurality of types of filters is proposed as Checkerboards, with which data in the ten channels is converted into a feature vector, from which a pedestrian is recognized with a recognition method.
"Informed Haar-like features improve pedestrian detection" by S. Zhang, R. Bauckhage, and A. B. Cremers, CVPR, 2014 (non-patent Document D2) proposes a filter corresponding to the contour of a person. A filter extracted from the contour of a person, however, is not necessarily the optimum filter. There is an apprehension of a reduction in processing speed when this method (such the filter) is used.
With low-level features such as HOG features or Haar-like features, it is difficult to detect an object, such as a pedestrian, which can change in shape or in appearance, for example due to variations in clothing. To address this difficulty, CoHOG features, which capture co-occurrences between HOG features, have also been proposed. However, when such features are used, the number of dimensions of the feature vector increases and the time taken for detection and learning is extended. That is, the amount of computation (processing time) increases, and it is difficult to incorporate an object detection device using such features in a small CPU.
In the methods described in non-patent Documents 1 and 2, the number of channels or the number of filters is large and as a result the amount of data for processing and, hence, the processing time is increased.
An objective of the present invention is to provide an object detection device which is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance, and a vehicle having the object detection device.
Various aspects of the present invention are defined in the independent claims appended hereto. Some optional and/or preferred features are defined in the dependent claims appended hereto.
According to a first aspect of the present invention there is provided an object detection device according to appended claims 1 to 6.
According to a second aspect of the present invention there is provided a vehicle according to claim 7.
An object detection device according to the present invention has:
an LUV converter which converts an RGB image into an LUV color image;
a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the LUV color image on the basis of the intensity values of the LUV color image obtained by the LUV converter;
a gradient magnitude operation part which computes gradient magnitudes of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part;
a gradient direction operation part which computes gradient directions of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part;
a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the LUV color image obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel;
a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation part with respect to each pixel;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation part and the gradient magnitudes obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In the present invention, the object detection device may further have an image capturing part. For example, an RGB image or a grayscale image may be taken with a CMOS sensor.
In the present invention, the object detection device may have a scan part which slides a rectangular area of a predetermined size over the entire RGB image to repeat the process from the processing performed by the LUV converter to the processing performed by the object recognition part. For example, in a case where the image size is 64 x 128, the scan part performs the cropping process on the entire image by sliding in four-pixel steps.
In the present invention, the object detection device may have an image shrinking part which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into an image with size smaller than the size of the RGB image.
According to this arrangement, the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions are images whose size is smaller than the size of the RGB image. For example, in a case where the RGB image has 64 x 128 pixels, each of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions may have 32 x 64 pixels.
In another mode of implementation not having the image shrinking part, the LUV converter may convert the RGB image into an image whose size is smaller than that of the RGB image before conversion into the LUV color image.
When preparing a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude, the normalized maximum gradient magnitude image preparing part may prepare the image with a size smaller than the size of the RGB image.
When preparing a gradient magnitude image for each of the six gradient directions, the gradient magnitude image for each gradient direction preparing part may prepare each image with a size smaller than the size of the RGB image.
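As one illustrative sketch (the embodiments do not fix a particular resampling scheme, so block averaging is an assumption here), a 64 x 128 channel image can be reduced to 32 x 64 by simple 2 x 2 block averaging:

```python
import numpy as np

def shrink_half(channel):
    """Halve a single-channel image by 2 x 2 block averaging
    (assumes even height and width)."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# e.g. a 64 x 128 image stored as a 128 x 64 array becomes a 64 x 32 array,
# that is, a 32 x 64 image
```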
In still another mode of implementation of the present invention, another color space (color expression) can be used instead of the LUV color image. For example, an image converter which converts an RGB image into an HSV image may be provided. Processing in each component can be executed by using an HSV image instead of the LUV color image.
An object detection device in this mode of implementation has:
an image converter which converts an RGB image into an input image with a predetermined color space (an LUV color image or an HSV image);
a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the input image on the basis of the intensity values of the input image obtained by the image converter;
a gradient magnitude operation part which computes gradient magnitudes of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part;
a gradient direction operation part which computes gradient directions of the input image on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part;
a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the input image obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the input image obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel;
a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation part with respect to each pixel;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation part and the gradient magnitudes obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In a further mode of implementation of the present invention, a grayscale image can be used instead of the LUV color image. For example, an image converter which converts an RGB image into a grayscale image may be provided. The need to compute the maximum gradient magnitude from the gradient magnitudes in the two or more color spaces (color expressions) is eliminated, so that processing in the maximum gradient magnitude operation part can be removed.
An object detection device in this mode of implementation has:
an image converter which converts an RGB image into a grayscale image;
a gradient operation part which computes a horizontal gradient in a horizontal direction and a vertical gradient in a vertical direction of the grayscale image on the basis of the intensity value of the grayscale image obtained by the image converter;
a gradient magnitude operation part which computes a gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part;
a gradient direction operation part which computes a gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part;
a normalized gradient magnitude image preparing part which prepares a normalized gradient magnitude image on the basis of the gradient magnitude obtained by the gradient magnitude operation part;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part and the gradient magnitude obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 8 corresponding to the total of the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In yet another mode of implementation of the present invention, an arrangement using an RGB image and not having the LUV converter may be provided.
An object detection device in this mode of implementation has:
a gradient operation part which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of an RGB image on the basis of the intensity values of the RGB image;
a gradient magnitude operation part which computes gradient magnitudes of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part;
a gradient direction operation part which computes gradient directions of the RGB image on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part;
a maximum gradient magnitude operation part which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the RGB image obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the RGB image obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation part with respect to each pixel;
a normalized maximum gradient magnitude image preparing part which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation part with respect to each pixel;
a gradient magnitude image for each gradient direction preparing part which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation part and the gradient magnitudes obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which converts the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part into a feature vector; and
an object recognition part which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.
In the present invention, the object recognition part may have a prelearned recognition device to which the feature vector is provided.
The recognition device may be, for example, Adaboost (Adaptive Boosting).
The object recognition part may recognize that the object to be detected is detected when an output score which is the sum total as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
For example, the selected value in the feature vector and a threshold value set in advance are compared at each node of the decision tree. A terminal node is reached by a depth-first search method and a score is computed for each of the decision trees. Over all the decision trees, the score for each decision tree is multiplied by the weighting value set in advance for that decision tree, and the total sum of the multiplication results is computed. This total sum is the output score. The object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
In the present invention, the filters may have a rectangular shape, a square shape or a different shape (e.g., L-shape) including a plurality of pixels.
The filters may have a uniform, horizontal, vertical or other non-uniform pattern, and may be a check pattern or an inclination pattern. For example, when a filter is uniform, the values at all pixels are "1". When a filter is horizontal, the pixels above a boundary line are "1" and the pixels below the boundary line are "-1". When a filter is vertical, the pixels on the left-hand side of a boundary line are "1" and the pixels on the right-hand side of the boundary line are "-1".
Among the three or more types of filters, at least two may differ in filter size from each other, or all may differ in filter size from each other.
The shapes of the filters need not be similar to the shape of a rectangular image. Rhomboid filters may be used for an oblong image or object.
In the present invention, the filters may be constituted of a square uniform filter, a square horizontal filter and a square vertical filter, and these filters may differ in filter size from each other.
In the present invention, the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters may be in a filter size relationship shown by the following equations:
U-filter ≦ H-filter < V-filter or
H-filter ≦ U-filter < V-filter
Preferably, the object detection device according to the present invention is used to detect a person, particularly a pedestrian.
Preferably, the filters are constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and these filters are in a filter size relationship shown by the following equations:
U-filter ≦ H-filter < V-filter or
H-filter ≦ U-filter < V-filter
Three or four of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter) and a square check pattern filter (C-filter) may be selected as the filters. The selected filters may have the same or different filter sizes and may be in a size relationship shown by the following equations:
H-filter ≦ U-filter ≦ C-filter < V-filter
Also, three or more but fewer than ten of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter), a square check pattern filter (C-filter), a square inclination pattern filter (I-filter) and an L-shaped filter (L-filter) may be selected as the filters. Not only filters having different patterns but also filters differing in filter size may be counted as different types of filters.
If the intensity of a color (c) in a pixel at coordinates (x, y) is l (x, y, c), a gradient lx (x, y, c) of the color (c) in the horizontal direction and a gradient ly (x, y, c) of the color (c) in the vertical direction can be obtained by the following equations:
lx (x, y, c) = l (x+1, y, c) - l (x-1, y, c) (1)
ly (x, y, c) = l (x, y+1, c) - l (x, y-1, c) (2)
A gradient magnitude gm (x, y, c) can be obtained by the following equation:
gm (x, y, c) = sqrt( lx (x, y, c)^2 + ly (x, y, c)^2 ) (3)
A maximum gradient magnitude gmmax (x, y) can be obtained by the following equation from gradient magnitudes gm (x, y, c) of the LUV color image obtained by the gradient magnitude operation part.
gmmax (x, y) = max {gm (x, y, L), gm (x, y, U), gm (x, y, V)} (4)
Also, a gradient direction θ (x, y) when the gradient magnitude is the maximum gradient magnitude in the LUV color image can be obtained by the following equation:
θ (x, y) = arctan( ly (x, y, cmax) / lx (x, y, cmax) ) (5)
where cmax is the color whose gradient magnitude is the maximum at the pixel (x, y).
A normalized maximum gradient magnitude ngmmax (x, y) can be obtained by the following equation:
ngmmax (x, y) = gmmax (x, y) / Sum (A) (6)
where Sum (A) is the sum total in a predetermined pixel area containing gmmax (x, y) at its center. The pixel area is, for example, an 11 x 11 rectangle centered on the pixel (x, y) of gmmax.
The invention in another aspect is a vehicle having the above-described object detection device.
The vehicle may be a saddle-ride vehicle or straddled vehicle.
The object detection device according to the present invention is capable of reducing the amount of data for processing and the processing time without reducing the recognition performance. The vehicle according to the present invention can be provided with the above-described object detection device.
It will be appreciated that features analogous to those described in relation to the above aspect or optional features may be individually and separably or in combination applicable to any of the other aspects or optional features.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, which follow.
Fig. 1 is a functional block diagram of an object detection device according to a first embodiment of the present invention.
Fig. 2 is a process flow chart of the object detection device according to the first embodiment.
Figs. 3A, 3B and 3C are explanation views related to sliding windows.
Figs. 4A and 4B are explanation views related to a gradient.
Fig. 5 is an explanation view of generating 10 channel features from an RGB image.
Fig. 6 is a drawing showing the types of filters.
Fig. 7 is a drawing showing an evaluation result of Miss Rate.
Fig. 8 is a functional block diagram of an object detection device according to a second embodiment of the present invention.
Fig. 9 is a functional block diagram of an object detection device according to a third embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
First embodiment
Hereinafter, the first embodiment according to the present invention will be explained. Additionally, the present invention is not limited to embodiments which will be explained below, and naturally includes other embodiments. FIG. 1 is a functional block diagram of an object detection device according to the first embodiment.
Explanation of function
An object detection device 1 includes an image input part 10, a scan part 11, an LUV converter 12, a gradient operation part 13, a gradient magnitude operation part 14, a gradient direction operation part 15, a maximum gradient magnitude operation part 16, a maximum gradient magnitude direction operation part 17, a normalized maximum gradient magnitude operation part 18, a normalized maximum gradient magnitude image preparing part 19, a gradient magnitude image for each gradient direction preparing part 20, an image shrinking part 21, a convolution part 22, a feature vector converter 23 and an object recognition part 24. The object detection device 1 may be constituted by a single configuration such as a dedicated communication circuit, firmware or a processing device, or a combination thereof. The above elements of the object detection device 1 may also be achieved by a combination of software and hardware.
The image input part 10 may be an image capturing part which captures an image, a reading part which can read image data or a receiving part which receives image data (whether wireless or wired). The image input part 10 may include an image converter which performs conversion into an image of a predetermined color space when the input image is not an image of the predetermined color space. The image input part 10 may include an RGB image converter which performs conversion into an RGB image in a case where the original image color space is not RGB, for example.
The scan part 11 scans a rectangular area of a predetermined size over the entire input image (for example, an RGB image). By the unit of a rectangular area of a predetermined size, the process from the processing performed by the LUV converter to the processing performed by the object recognition part, which will be explained later, is performed. The cropping area (a rectangular area of a predetermined size) can be arbitrarily set; for example, 8 x 8 or 8 x 16 can be listed. As shown in FIG. 3A, the scan part 11 performs the cropping process by sliding the cropping area (8 x 16) in four-pixel steps from the left end to the right end of an RGB image (64 x 128). As shown in FIG. 3B, the next line is scanned by returning to the left end side of the RGB image and sliding downward in four-pixel steps. In the same manner, the cropping process is performed by sliding the cropping area in four-pixel steps from the left end to the right end. In this way, the sliding-window process is performed from the upper left corner to the lower right corner of the RGB image, that is, over the entire RGB image. Additionally, the size of the cropping area is not limited to a constant size, but may be dynamically changed at the time of executing the sliding windows. For example, in FIG. 3C, the cropping area may be enlarged in stages toward the diagonally lower right, and the number of pixels to be slid may be proportional to the size of the rectangle.
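As a concrete illustration, the following is a minimal Python sketch of this sliding-window scan, assuming a fixed 8 x 16 cropping area, four-pixel steps and a 64 x 128 RGB frame; the function and variable names are illustrative and not part of the embodiment.

```python
import numpy as np

def sliding_windows(image, win_w=8, win_h=16, step=4):
    """Yield (x, y, crop) for each window position, scanning from the
    upper left corner to the lower right corner in fixed steps."""
    height, width = image.shape[:2]
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]

rgb = np.zeros((128, 64, 3), dtype=np.uint8)  # a 64 x 128 RGB frame
for x, y, crop in sliding_windows(rgb):
    pass  # each crop would be passed through steps S3 to S14
```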
The LUV converter 12 converts an RGB image (three channels of image) into LUV color images (three channels of image).
The gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of V image) of each of the LUV color images obtained by the LUV converter 12.
The gradient operation part 13 computes as follows.
If the intensity of a color channel (c) in a pixel at coordinates (x, y) is l (x, y, c) in an intensity image, a gradient lx (x, y, c) in the horizontal direction (x direction) and a gradient ly (x, y, c) in the vertical direction of a color (c) in a pixel at coordinates (x, y) can be obtained by the following equations. The arrangement relationship of the coordinate (x, y) is shown in FIG. 4A.

lx (x, y, c) = l (x+1, y, c) - l (x-1, y, c) (1)
ly (x, y, c) = l (x, y+1, c) - l (x, y-1, c) (2)
In a case of a grayscale image, the above equations can be directly used.
Since an LUV color image is used in this embodiment, the gradient is obtained for each of the LUV color images. If the color of an L image is cL, the color of a U image is cU and the color of a V image is cV, the horizontal gradient and the vertical gradient can be obtained as follows.
In a case of the L image,
lx (x, y, cL) = l (x+1, y, cL) - l (x-1, y, cL) (1-a)
ly (x, y, cL) = l (x, y+1, cL) - l (x, y-1, cL) (2-a)

In a case of the U image,
lx (x, y, cU) = l (x+1, y, cU) - l (x-1, y, cU) (1-b)
ly (x, y, cU) = l (x, y+1, cU) - l (x, y-1, cU) (2-b)

In a case of the V image,
lx (x, y, cV) = l (x+1, y, cV) - l (x-1, y, cV) (1-c)
ly (x, y, cV) = l (x, y+1, cV) - l (x, y-1, cV) (2-c)
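A minimal NumPy sketch of equations (1) and (2) for one color channel follows; leaving the border pixels at zero is an assumption of the sketch, since the embodiment does not specify border handling.

```python
import numpy as np

def gradients(channel):
    """Central differences per equations (1) and (2)."""
    channel = channel.astype(np.float32)
    lx = np.zeros_like(channel)
    ly = np.zeros_like(channel)
    lx[:, 1:-1] = channel[:, 2:] - channel[:, :-2]  # l(x+1, y, c) - l(x-1, y, c)
    ly[1:-1, :] = channel[2:, :] - channel[:-2, :]  # l(x, y+1, c) - l(x, y-1, c)
    return lx, ly

# For an H x W x 3 LUV image, the function is applied channel by channel,
# e.g. lx_L, ly_L = gradients(luv[:, :, 0])
```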
The gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
The gradient magnitude operation part 14 obtains the gradient magnitude gm (x, y, c) by the following equation.
gm (x, y, c) = sqrt( lx (x, y, c)^2 + ly (x, y, c)^2 ) (3)
In a case of a grayscale image, the above equation can be directly used.
In each of the LUV color images, the gradient magnitude can be obtained as follows.
In a case of the L image,
gm (x, y, cL) = sqrt( lx (x, y, cL)^2 + ly (x, y, cL)^2 ) (3-a)
In a case of the U image,
gm (x, y, cU) = sqrt( lx (x, y, cU)^2 + ly (x, y, cU)^2 ) (3-b)
In a case of the V image,
gm (x, y, cV) = sqrt( lx (x, y, cV)^2 + ly (x, y, cV)^2 ) (3-c)
The gradient direction operation part 15 computes the gradient direction of each of the LUV color images (the gradient direction of the L image, the gradient direction of the U image and the gradient direction of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images obtained by the gradient operation part 13.
FIG. 4B shows the gradient directions in increments of 30°.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of V image) obtained by the gradient magnitude operation part 14.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude gmmax (x, y) by the following equation.

gmmax (x, y) = max {gm (x, y, cL), gm (x, y, cU), gm (x, y, cV)} (4-1)
The maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images obtained by the gradient operation part 13.
The maximum gradient magnitude direction operation part 17 obtains the gradient direction θ (x, y) of the maximum gradient magnitude by the following equation:
θ (x, y) = arctan( ly (x, y, cmax) / lx (x, y, cmax) ) (5)
where cmax is the color channel giving the maximum gradient magnitude gmmax (x, y).
The normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
The normalized maximum gradient magnitude operation part 18 obtains the normalized maximum gradient magnitude ngmmax (x, y) by the following equation:
ngmmax (x, y) = gmmax (x, y) / Sum (A) (6)
where Sum (A) is the total sum in a predetermined pixel area containing gmmax (x, y) at its center. In the first embodiment, the pixel area is, for example, an area formed by eleven pixels in the vertical direction and eleven pixels in the horizontal direction.
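Tying equations (3), (4-1), (5) and (6) together, the following NumPy sketch computes the per-channel magnitudes, the per-pixel maximum over the three channels, the direction of that maximum and the 11 x 11 normalization. The use of scipy's uniform_filter as a box sum and the small epsilon guarding the division are assumptions of the sketch, not part of the embodiment.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def max_gradient_features(lx, ly):
    """lx, ly: H x W x 3 horizontal and vertical gradients of the LUV image."""
    gm = np.sqrt(lx ** 2 + ly ** 2)                        # equation (3), per channel
    cmax = np.argmax(gm, axis=2)[..., None]                # channel of the maximum
    gmmax = np.take_along_axis(gm, cmax, axis=2)[..., 0]   # equation (4-1)
    lx_max = np.take_along_axis(lx, cmax, axis=2)[..., 0]
    ly_max = np.take_along_axis(ly, cmax, axis=2)[..., 0]
    theta = np.arctan2(ly_max, lx_max) % np.pi             # equation (5), in [0, pi)
    # equation (6): Sum(A) over an 11 x 11 area centered on each pixel
    area_sum = uniform_filter(gmmax, size=11, mode='nearest') * (11 * 11)
    ngmmax = gmmax / (area_sum + 1e-6)                     # epsilon is an assumption
    return gmmax, theta, ngmmax
```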
The normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
The gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° and 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
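A minimal sketch of this binning follows; it assumes, consistently with the channel images of FIG. 5, that the maximum gradient magnitude and its direction are what get scattered into the six images, with each direction quantized to its 30° bin.

```python
import numpy as np

def direction_channels(theta, gmmax, n_bins=6):
    """theta in [0, pi); produce one magnitude image per 30-degree bin."""
    bins = np.minimum((theta / (np.pi / n_bins)).astype(int), n_bins - 1)
    channels = np.zeros(theta.shape + (n_bins,), dtype=np.float32)
    rows, cols = np.indices(theta.shape)
    channels[rows, cols, bins] = gmmax
    return channels  # H x W x 6, one image per direction
```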
The image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image (an input image). For example, if the RGB image size is 64 x 128 pixels, each of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for six gradient directions is converted into 32 x 64 pixels.
The convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
By convolving with different types of filters, the features of each image are enhanced and recognition with high precision can be performed in the recognition part which will be explained later. That is, by training on data converted so that the features of each image are enhanced, a recognition device with higher recognition precision can be built.
In this embodiment, from the viewpoint of the object detection precision and the processing speed, the number of filters is 10 or less, preferably 4 or less, and more preferably 3.
The shape of the filter may be a rectangular shape, a square shape or a different shape (for example, the shape of L) including a plurality of pixels.
As shown in FIG. 6, the filter patterns (a) Uniform, (b) Horizontal, (c) Vertical, (d) Check, (e) Inclination and (f) Different shape (L-shape) are illustrated. In the drawing, white pixels are "1" and black pixels are "-1". As the filter size, for example, 2 x 2, 4 x 4, 6 x 6 and 8 x 8 pixels are listed.
In the first embodiment, the filters may be constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and preferably the filters are in a filter size relationship shown by the following equations:

U-filter ≦ H-filter < V-filter or
H-filter ≦ U-filter < V-filter
Moreover, three or four types of the square uniform filter (U-filter), the square horizontal filter (H-filter), the square vertical filter (V-filter) and a square check filter (C-filter) may be selected as the filters. The selected filters may have the same or different filter sizes and preferably are in a filter size relationship shown by the following equations:

H-filter ≦ U-filter ≦ C-filter < V-filter
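The snippet below is a hedged sketch of the three square filters with binary "1"/"-1" values (per FIG. 6) and their application to the channel images; the 2-, 4- and 8-pixel sizes are merely one plausible choice satisfying the size relationships above.

```python
import numpy as np
from scipy.signal import convolve2d

def u_filter(n):                           # uniform: all pixels are 1
    return np.ones((n, n), dtype=np.float32)

def h_filter(n):                           # horizontal: 1 above, -1 below
    f = np.ones((n, n), dtype=np.float32)
    f[n // 2:, :] = -1.0
    return f

def v_filter(n):                           # vertical: 1 on the left, -1 on the right
    f = np.ones((n, n), dtype=np.float32)
    f[:, n // 2:] = -1.0
    return f

filters = [u_filter(2), h_filter(4), v_filter(8)]  # illustrative sizes

def convolve_channels(channels, filters):
    """Convolve each of the C channel images (H x W x C) with each filter."""
    return [convolve2d(channels[:, :, c], f, mode='same')
            for c in range(channels.shape[2]) for f in filters]
```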
The feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22. For example, when the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22 are 64 x 128 pixels, all information has 64 x 128 x 10 pixels, that is, 81920 pixels. The feature vector converter 23 performs processing of converting the information of 64 x 128 x 10 into one dimension of 1 x 81920.
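For instance, a minimal sketch of this conversion under the stated 64 x 128 x 10 assumption:

```python
import numpy as np

channels = np.zeros((128, 64, 10), dtype=np.float32)  # ten processed channel images
feature_vector = channels.reshape(1, -1)              # 1 x 81920
```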
The object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23. The object recognition part 24 may have a prelearned recognition device to which the feature vector is provided. The recognition device may implement, for example, Adaboost.
The object recognition part 24 may recognize that the object to be detected is detected when an output score which is the total sum as a result of multiplication between weighting values set in advance for each of the decision trees and scores for each of the decision trees is equal to or larger than a threshold value set in advance.
The object recognition part 24 performs processing of the following (1) to (4).
(1) A selected value in the feature vector and a threshold value set in advance are compared at each node of the decision tree.
(2) A terminal node is reached by a depth-first search method and scores for each of the decision trees are computed.
(3) In all the decision trees, weighting values set in advance for each of the decision trees and the scores for each of the decision trees are multiplied, and then the total sum of the multiplication results is computed. The total sum is the output score.
(4) The object recognition part recognizes that the object to be detected is detected when the output score is equal to or larger than a threshold value set in advance.
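A minimal sketch of steps (1) to (4) follows, using a hypothetical nested-dictionary tree representation; the embodiment does not fix any particular data structure.

```python
def tree_score(tree, feature_vector):
    """Steps (1)-(2): walk one decision tree depth-first; each internal node
    compares one selected feature value against a preset threshold, and the
    reached terminal node holds the tree's score."""
    node = tree
    while 'leaf' not in node:
        if feature_vector[node['feature']] < node['threshold']:
            node = node['left']
        else:
            node = node['right']
    return node['leaf']

def is_detected(trees, weights, feature_vector, output_threshold):
    """Steps (3)-(4): the output score is the weighted sum of tree scores,
    and the object is detected when it reaches the preset threshold."""
    output_score = sum(w * tree_score(t, feature_vector)
                       for t, w in zip(trees, weights))
    return output_score >= output_threshold
```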
Processing Flow
Using FIG. 2, a processing flow of the first embodiment will be explained. First, in step S1, the image input part 10 having a CMOS sensor captures an image. FIG. 5(a) shows an example of an input image. The captured image is stored in a memory 30 by the unit of a frame. For each frame, the following processing is performed.
In step S2, the scan part 11 scans an RGB image by the unit of a rectangular area of a predetermined size. Thereafter, each processing of step S3 to step S14 is performed by the unit of a rectangular area of a predetermined size.
In step S3, the LUV converter 12 converts an RGB image into LUV color images. FIG. 5(b) shows an example of the LUV color image.
In step S4, the gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the LUV color images on the basis of the intensity values (an intensity value of an L image, an intensity value of a U image and an intensity value of a V image) of each of the LUV color images.
In step S5, the gradient magnitude operation part 14 computes the gradient magnitude of each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image) on the basis of the horizontal gradient and the vertical gradient of each of the LUV color images.
In step S6, the maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the LUV color images (the gradient magnitude of the L image, the gradient magnitude of the U image and the gradient magnitude of the V image).
In step S7, the normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
In step S8, the normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18. FIG. 5(c) shows an example of the normalized maximum gradient magnitude image.
In step S9, the maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the LUV color images.
In step S10, the gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions (0°, 30°, 60°, 90°, 120°, 150°) between 0° to 180° on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14. FIG. 5(d) shows an example of the gradient magnitude image for each of six gradient directions.
Additionally, step S9 or step S10 may be executed after step S4 and step S6, may be executed simultaneously with step S7 or step S8, or may be executed before step S7 or step S8.
In step S11, the image shrinking part 21 converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image. FIG. 5(e) shows the reduced LUV color image, the reduced normalized maximum gradient magnitude image and the reduced gradient magnitude image for each of six gradient directions.
In step S12, the convolution part 22 enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10. In this embodiment, 3 or 4 types of filters are preferably used for convolution; using 3 types is particularly preferable since the processing time can be reduced without deteriorating the recognition performance for the object.
In step S13, the feature vector converter 23 makes conversion into the feature vector on the basis of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
In step S14, the object recognition part 24 recognizes an object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter 23.
In step S15, it is judged whether the sliding-window process has been completed over the entire image area of one frame. If it has not been completed, the process returns to step S2, and the scan area (the cropping area) is slid in accordance with a predetermined rule to repeat the processing from step S3 to step S14. When the sliding-window process has been completed, the same processing is performed on the next frame.
Example
Pedestrian detection was evaluated using the object detection device 1 according to the first embodiment.
As filters, three types, namely a square uniform filter (U-filter, FIG. 6(a)), a square horizontal filter (H-filter, FIG. 6(b)) and a square vertical filter (V-filter, FIG. 6(c)), were used. Since the three types of filters were each available in four different sizes (2 x 2, 4 x 4, 6 x 6 and 8 x 8 pixels), the number of combinations of all the filters is 16.
As input images, images from the Caltech reasonable dataset were used.
For each filter combination, the performance was evaluated using the Miss Rate generally used for evaluation of the Caltech dataset. Table 1 shows a comparison between the top five filter combinations and the conventional person detection methods.
"MFCF" is a filter combination of the first embodiment.
"ACF-Caltech+" uses in total 10 images (channels) that is the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions as the feature in the same manner as the first embodiment. However, "ACF-Caltech+" does not have any convolution part as in the first embodiment, and directly sends 10 images to the feature vector converter and use it as a feature vector.
The difference between "ACF-Caltech+" and "ACF-ours" is the size of the input image.
"ACF-Caltech+" uses an image of 32 x 64 pixels as an input, while "ACF-ours" uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
"LDCF" and "LDCF-ours" use total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further They include a convolution part. The difference from the first embodiment is the type of the filter. "LDCF" and "LDCF-ours" use four different types of 5-by-5-pixel-filters to each images: the LUV color image, the normalized maximum gradient magnitude image, and the gradient magnitude image for each of six gradient directions. That is, 40 types of filters are used. Moreover, also regarding the numerical values of an inner portion of the filter, binary numerical values such as "1" or "-1" as in the first embodiment are not used, but real numbers from -1.0 to 1.0 are used. The difference between "LDCF" and "LDCF-ours" is the size of the input image. "LDCF" uses an image of 32 x 64 pixels as an input, while "LDCF-ours" uses an image of 64 x 128 pixels as an input in the same manner as the first embodiment.
"Checkerboards" is a method of non-patent document 1. "Checkerboards" uses total 10 images (channels) of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions in the same manner as the first embodiment, and further it includes a convolution part. The difference from the first embodiment is the type of the filter. "Checkerboards" uses 39 filters of 4 x 3 cells (1 cell is 6 x 6 pixels), the 39 filters use all reproducible combinations in 3 x 4 cells (a U-filter, a V-filter, a H-filter and a checkerboard pattern). "Checkerboards" uses an image of 60 x 120 pixels as an input image.
As shown in Table 1, the filter combinations of the first embodiment detected a pedestrian with higher precision than the conventional person detection methods.

Table 1 (Miss Rate comparison between the top five filter combinations and the conventional person detection methods; presented as an image in the source)
Next, combinations of three or four types of filters out of a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter), a square check filter (C-filter), a square inclination pattern filter (I-filter) and an L-shaped filter (L-filter) were evaluated.
As input images, the Caltech reasonable dataset was used.
FIG. 7 shows the evaluation result by Miss Rate for each of the filter combinations. The Miss Rate is generally used for evaluation of the Caltech dataset. As shown in FIG. 7, MFCF248, MFCF248C and MFCF2-8C had Miss Rates of less than 19%.
MFCF248, which is a filter combination of the first embodiment, achieved a better result than any other published conventional method.
Second Embodiment
Hereinafter, the second embodiment according to the present invention will be explained. FIG. 8 is a functional block diagram of the object detection device according to the second embodiment.
Explanation of Function
The object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 40, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24. Hereinafter, elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
The image conversion part 40 converts an RGB image into input images (HSV images) of a predetermined color space (HSV).
The gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of each of the input images obtained by the image conversion part 40.
The gradient magnitude operation part 14 computes the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
The gradient direction operation part 15 computes the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part 13.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operation part 14.
The maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient for each of the input images obtained by the gradient operation part 13.
The normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude with respect to each of pixels on the basis of the maximum gradient magnitude with respect to each of pixels obtained by the maximum gradient magnitude operation part 16.
The normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude with respect to each of pixels obtained by the normalized maximum gradient magnitude operation part 18.
The gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
The image shrinking part 21 converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions into the image with size which is smaller than the size of the RGB image.
The convolution part 22 enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters or a number of types of filters larger than 3 but equal to or smaller than 10 corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for the six gradient directions, where the filters are stored in a memory in advance.
The feature vector converter 23 makes conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude image for each of six gradient directions processed by the convolution part 22.
The object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
Additionally, the process flow of the second embodiment has the same processing steps as the process flow of the first embodiment (FIG. 2), with the LUV color image replaced by the HSV color image in the process content.
Third embodiment
Hereinafter, the third embodiment according to the present invention will be explained. FIG. 9 is a functional block diagram of the object detection device according to the third embodiment. The third embodiment includes an image conversion part 50 which converts an RGB image into a grayscale image. The need to compute the maximum gradient magnitude from the gradient magnitudes in two or more color spaces is eliminated, so that the processing in the maximum gradient magnitude operation part can be removed.
Explanation of Function
The object detection device 1 includes the image input part 10, the scan part 11, an image conversion part 50, a gradient operation part 41, a gradient magnitude operation part 42, a gradient direction operation part 43, a normalized gradient magnitude image preparing part 45, a gradient magnitude image for each gradient direction preparing part 46, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24. Hereinafter, elements having the same reference numerals as those of the first or second embodiment have the same function, so their explanations are omitted or they are briefly explained.
The image conversion part 50 converts an RGB image into a grayscale image.
The gradient operation part 41 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the grayscale image on the basis of the intensity values of the grayscale image obtained by the image conversion part 50.
The gradient magnitude operation part 42 computes the gradient magnitude of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
The gradient direction operation part 43 computes the gradient direction of the grayscale image on the basis of the horizontal gradient and the vertical gradient of the grayscale image obtained by the gradient operation part 41.
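A compact sketch of these three operations on a grayscale image, assuming centered finite differences for the derivatives (the actual derivative kernel is not specified by the embodiment):

    import numpy as np

    def grayscale_gradients(gray):
        # Sketch: horizontal/vertical gradients, then per-pixel
        # gradient magnitude and gradient direction.
        gx = np.gradient(gray, axis=1)  # horizontal gradient
        gy = np.gradient(gray, axis=0)  # vertical gradient
        magnitude = np.hypot(gx, gy)
        direction = np.arctan2(gy, gx)
        return magnitude, direction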
The normalized gradient magnitude image preparing part 45 prepares the normalized gradient magnitude image from the gradient magnitude obtained by the gradient magnitude operation part 42.
The gradient magnitude image for each gradient direction preparing part 46 prepares the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 43 and the gradient magnitude obtained by the gradient magnitude operation part 42.
The image shrinking part 21 converts the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the RGB image.
The convolution part 22 enhances the image by convolving the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters, or with more than three but no more than eight types of filters, corresponding to the total of the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
The feature vector converter 23 makes conversion into the feature vector on the basis of the grayscale image, the normalized gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part 22.
The object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
The process flow of the third embodiment omits steps S6 and S7 of the processing flow (FIG. 2) of the first embodiment, and replaces the LUV color image with the grayscale image.
Fourth embodiment
A fourth embodiment uses an RGB image directly as the input image, so the LUV converter and the image conversion part of the first to third embodiments can be omitted.
Explanation of Function
The object detection device 1 includes the image input part 10, the scan part 11, the gradient operation part 13, the gradient magnitude operation part 14, the gradient direction operation part 15, the maximum gradient magnitude operation part 16, the maximum gradient magnitude direction operation part 17, the normalized maximum gradient magnitude operation part 18, the normalized maximum gradient magnitude image preparing part 19, the gradient magnitude image for each gradient direction preparing part 20, the image shrinking part 21, the convolution part 22, the feature vector converter 23 and the object recognition part 24. Hereinafter, elements having the same reference numerals as those of the first embodiment have the same function, so their explanations are omitted or they are briefly explained.
The gradient operation part 13 computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of each of the RGB images on the basis of the intensity values of each of the RGB images.
The gradient magnitude operation part 14 computes the gradient magnitude of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
The gradient direction operation part 15 computes the gradient direction of each of the RGB images on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
The maximum gradient magnitude operation part 16 computes the maximum gradient magnitude for each pixel on the basis of the gradient magnitude of each of the RGB images obtained by the gradient magnitude operation part 14.
The maximum gradient magnitude direction operation part 17 computes the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part 16 on the basis of the horizontal gradient and the vertical gradient of each of the RGB images obtained by the gradient operation part 13.
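The per-pixel maximum over the colour channels and its accompanying direction may be sketched as follows; the channel-stacked gradient arrays are an assumed input layout.

    import numpy as np

    def max_gradient_over_channels(gx, gy):
        # Sketch: per pixel, keep the largest gradient magnitude over
        # the R, G and B channels and the direction of that channel.
        # gx, gy: arrays of shape (3, H, W).
        magnitude = np.hypot(gx, gy)
        winner = magnitude.argmax(axis=0)      # winning channel per pixel
        rows, cols = np.indices(winner.shape)
        max_mag = magnitude[winner, rows, cols]
        direction = np.arctan2(gy[winner, rows, cols],
                               gx[winner, rows, cols])
        return max_mag, direction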
The normalized maximum gradient magnitude operation part 18 computes the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part 16.
The normalized maximum gradient magnitude image preparing part 19 prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operation part 18.
The gradient magnitude image for each gradient direction preparing part 20 prepares the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part 15 and the gradient magnitude obtained by the gradient magnitude operation part 14.
The image shrinking part 21 converts the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input RGB image.
The convolution part 22 enhances the image by convolving the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking part 21, with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
The feature vector converter 23 makes conversion into the feature vector on the basis of the RGB image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part 22.
The object recognition part 24 recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter 23.
The process flow of the fourth embodiment omits step S3 of the processing flow (FIG. 2) of the first embodiment, and replaces the LUV color image with the RGB image.
Other embodiments
In the first to fourth embodiments, the image shrinking part is not necessarily included as a separate element; its function may instead be incorporated into each of the LUV converter, the image conversion part, the normalized maximum gradient magnitude image preparing part, the gradient magnitude image for each gradient direction preparing part and the normalized gradient magnitude image preparing part.
Fifth embodiment
An object detection method according to a fifth embodiment scans a rectangular area of a predetermined size over the entire input image, and performs the following steps (1) to (12) on each area cropped by the sliding window (a simplified sketch of this scanning loop is given after the steps below).
The object detection method comprises the following.
(1) A gradient operating step of computing the horizontal gradients in a horizontal direction and the vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of the color space of each of the input images (an RGB image, an LUV image or an HSV image) having a plurality of color channels (RGB, LUV or HSV).
(2) A gradient magnitude operating step of operating the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(3) A gradient direction operating step of operating the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(4) A maximum gradient magnitude operating step of operating the maximum gradient magnitude for each pixel on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operating step.
(5) A maximum gradient magnitude direction operating step of operating the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(6) A normalized maximum gradient magnitude operating step of operating the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step.
(7) A normalized maximum gradient magnitude image preparing step of preparing the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operating step.
(8) A gradient magnitude image for each gradient direction preparing step of preparing the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operating step and the gradient magnitude obtained by the gradient magnitude operating step.
(9) An image shrinking step of converting the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input image used in the gradient operating step, before the convolving step.
(10) The convolving step of enhancing the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking step, with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
(11) A feature vector converting step of making conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolving step.
(12) An object recognizing step of recognizing an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converting step.
Also, the object detection method may include an image converting step of converting the RGB image into an input image of a predetermined color space (an LUV color image or an HSV image), executed before the gradient operating step.
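The scanning loop referred to above can be sketched as follows. The window size, the stride and the three helper functions (compute_channels, to_feature_vector, classify), which stand in for steps (1) to (12), are hypothetical.

    def detect(image, compute_channels, to_feature_vector, classify,
               window=(128, 64), stride=8):
        # Sketch: slide a fixed-size window over the input image and run
        # each crop through the channel, feature and recognition steps.
        win_h, win_w = window
        detections = []
        for top in range(0, image.shape[0] - win_h + 1, stride):
            for left in range(0, image.shape[1] - win_w + 1, stride):
                crop = image[top:top + win_h, left:left + win_w]
                channels = compute_channels(crop)   # steps (1) to (10)
                fv = to_feature_vector(channels)    # step (11)
                if classify(fv):                    # step (12)
                    detections.append((top, left, win_h, win_w))
        return detections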
Sixth embodiment
An object detection program according to a sixth embodiment scans the entire input image with a rectangular area of a predetermined size, and allows a computer to execute the following steps (1) to (12) on each area cropped by the sliding-window approach.
The object detection program allows the computer to execute the following.
(1) A gradient operating step of computing the horizontal gradients in a horizontal direction and the vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of the color space of each of the input images (an RGB image, an LUV image or an HSV image) having a plurality of color channels (RGB, LUV or HSV).
(2) A gradient magnitude operating step of operating the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(3) A gradient direction operating step of operating the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(4) A maximum gradient magnitude operating step of operating the maximum gradient magnitude for each pixel on the basis of the gradient magnitude for each of the input images obtained by the gradient magnitude operating step.
(5) A maximum gradient magnitude direction operating step of operating the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operating step.
(6) A normalized maximum gradient magnitude operating step of operating the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operating step.
(7) A normalized maximum gradient magnitude image preparing step of preparing the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operating step.
(8) A gradient magnitude image for each gradient direction preparing step of preparing the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operating step and the gradient magnitude obtained by the gradient magnitude operating step.
(9) An image shrinking step of converting the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input image used in the gradient operating step, before the convolving step.
(10) The convolving step of enhancing the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, whose sizes are reduced by the image shrinking step, with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions; the filters are stored in a memory in advance.
(11) A feature vector converting step of making conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolving step.
(12) An object recognizing step of recognizing an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converting step.
Also, the object detection program may allow the computer to execute an image converting step of converting the RGB image into an input image of a predetermined color space (an LUV color image or an HSV image) before the gradient operating step.
Seventh embodiment
A seventh embodiment is a memory medium which stores the object detection program according to the sixth embodiment. The memory medium is not particularly limited, and any conventional memory medium may be used.
Eighth embodiment
An object detection device according to an eighth embodiment comprises:
a gradient operation part which computes the horizontal gradients in a horizontal direction and the vertical gradients in a vertical direction of each of the input images on the basis of the intensity values of the color space of each of the input images (for example, an RGB image, an LUV image or an HSV image) having a plurality of color channels (for example, RGB, LUV or HSV);
a gradient magnitude operation part which operates the gradient magnitude of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part;
a gradient direction operation part which operates the gradient direction of each of the input images on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part;
a maximum gradient magnitude operation part which operates the maximum gradient magnitude for each pixel on the basis of the gradient magnitude of each of the input images obtained by the gradient magnitude operation part;
a maximum gradient magnitude direction operation part which operates the gradient direction of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part on the basis of the horizontal gradient and the vertical gradient of each of the input images obtained by the gradient operation part;
a normalized maximum gradient magnitude operation part which operates the normalized maximum gradient magnitude for each pixel on the basis of the maximum gradient magnitude for each pixel obtained by the maximum gradient magnitude operation part;
a normalized maximum gradient magnitude image preparing part which prepares the normalized maximum gradient magnitude image from the normalized maximum gradient magnitude for each pixel obtained by the normalized maximum gradient magnitude operation part;
a gradient magnitude image for each gradient direction preparing part which prepares the gradient magnitude image for each of the six gradient directions on the basis of the gradient direction obtained by the gradient direction operation part and the gradient magnitude obtained by the gradient magnitude operation part;
a convolution part which enhances the image by convolving the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
a feature vector converter which makes conversion into the feature vector on the basis of the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution part; and
an object recognition part which recognizes an object by a method using decision trees on the basis of the feature vector calculated by the feature vector converter.
The input image of the object detection device may be an RGB image, an LUV color image or an HSV color image.
The object detection device may further include a scan part which repeats the process from the processing performed by the gradient operation part to the processing performed by the object recognition part while scanning a rectangular area of a predetermined size over the entire input image.
The object detection device may further include an image conversion part which converts the RGB image into an input image of a predetermined color space (the LUV color image or the HSV image).
The object detection device may further include an image shrinking part which converts the input image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than that of the input image used in the processing by the gradient operation part, before the processing by the convolution part.
Vehicle
A vehicle according to the present invention is a vehicle including the object detection device 1 according to any of the first to fourth embodiments. The vehicle is not particularly limited, and may be a saddle-ride or straddled vehicle, a two-wheel vehicle, a three-wheel vehicle or a four-wheel vehicle.
Reference Signs List
1 object detection device
10 image input part
11 scan part
12 LUV converter
13 gradient operation part
14 gradient magnitude operation part
15 gradient direction operation part
16 maximum gradient magnitude operation part
17 maximum gradient magnitude direction operation part
18 normalized maximum gradient magnitude operation part
19 normalized maximum gradient magnitude image preparing part
20 gradient magnitude image for each gradient direction preparing part
21 image shrinking part
22 convolution part
23 feature vector converter
24 object recognition part

Any feature hereinbefore described as a “part” may alternatively be described as a “means” or “means for”. Accordingly, the words/phrases “part” and “means” and “means for” are herein used interchangeably.

Claims (7)


  1. An object detection device comprising:
    an LUV converter which converts an RGB image into an LUV color image;
    a gradient operation means which computes horizontal gradients in a horizontal direction and vertical gradients in a vertical direction of the LUV color image on the basis of the intensity values of the LUV color image obtained by the LUV converter;
    a gradient magnitude operation means which computes gradient magnitudes of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation means;
    a gradient direction operation means which computes gradient directions of the LUV color image on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation means;
    a maximum gradient magnitude operation means which computes a maximum gradient magnitude with respect to each of pixels on the basis of the gradient magnitudes of the LUV color image obtained by the gradient magnitude operation means;
    a maximum gradient magnitude direction operation means which computes a gradient direction of the maximum gradient magnitude obtained by the maximum gradient magnitude operation means with respect to each pixel on the basis of the horizontal gradients and the vertical gradients of the LUV color image obtained by the gradient operation means;
    a normalized maximum gradient magnitude operation means which computes a normalized maximum gradient magnitude with respect to each pixel on the basis of the maximum gradient magnitude obtained by the maximum gradient magnitude operation means with respect to each pixel;
    a normalized maximum gradient magnitude image preparing means which prepares a normalized maximum gradient magnitude image from the normalized maximum gradient magnitude obtained by the normalized maximum gradient magnitude operation means with respect to each pixel;
    a gradient magnitude image for each gradient direction preparing means which prepares a gradient magnitude image for each of six gradient directions on the basis of the gradient directions obtained by the gradient direction operation means and the gradient magnitudes obtained by the gradient magnitude operation means;
    a convolution means which enhances the image by convolving the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions with at least three types of filters, or with more than three but no more than ten types of filters, corresponding to the total of the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions, where the filters are stored in a memory in advance;
    a feature vector converter which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions processed by the convolution means into a feature vector; and
    an object recognition means which recognizes the object by a method that uses decision trees on the basis of the feature vector calculated by the feature vector converter.

  2. The object detection device according to Claim 1, further comprising a scanning means which scans a rectangular area of a predetermined size over the entire RGB image to repeat the process from the processing performed by the LUV converter to the processing performed by the object recognition means.

  3. The object detection device according to Claim 1 or 2, further comprising an image shrinking means which converts the LUV color image, the normalized maximum gradient magnitude image and the gradient magnitude images for the six gradient directions into images of a size smaller than the size of the RGB image.

  4. The object detection device according to any one of Claims 1 to 3, wherein the object recognition means recognizes that the object to be detected is present when an output score, which is the total sum of the products of weighting values set in advance for the decision trees and the scores of the decision trees, is equal to or larger than a threshold value set in advance.

  5. The object detection device according to any one of Claims 1 to 4, wherein the filters are constituted of a square uniform filter (U-filter), a square horizontal filter (H-filter) and a square vertical filter (V-filter), and the filters are in a filter size relationship:
    U-filter ≦ H-filter < V-filter or
    H-filter ≦ U-filter < V-filter.

  6. The object detection device according to any one of Claims 1 to 5, wherein the filters are constituted of three or four types selected from a square uniform filter (U-filter), a square horizontal filter (H-filter), a square vertical filter (V-filter) and a square check filter (C-filter), and the filters are in a filter size relationship:
    H-filter ≦ U-filter ≦ C-filter < V-filter.

  7. A vehicle comprising the object detection device according to any one of Claims 1 to 6.

PCT/JP2016/088698 2016-04-01 2016-12-26 Object detection device and vehicle having the object detection device WO2017168889A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1605589.9 2016-04-01
GB201605589 2016-04-01

Publications (1)

Publication Number Publication Date
WO2017168889A1 true WO2017168889A1 (en) 2017-10-05

Family

ID=57851286

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/088698 WO2017168889A1 (en) 2016-04-01 2016-12-26 Object detection device and vehicle having the object detection device

Country Status (1)

Country Link
WO (1) WO2017168889A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673489A (en) * 2021-10-21 2021-11-19 之江实验室 Video group behavior identification method based on cascade Transformer

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130279745A1 (en) * 2012-02-01 2013-10-24 c/o Honda elesys Co., Ltd. Image recognition device, image recognition method, and image recognition program
US20150206319A1 (en) * 2014-01-17 2015-07-23 Microsoft Corporation Digital image edge detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG, SHANSHAN ET AL.: "Filtered channel features for pedestrian detection", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 7 June 2015, pages 1751-1760, XP032793632, DOI: 10.1109/CVPR.2015.7298784 *
ZHANG, SHANSHAN ET AL.: "Informed Haar-Like Features Improve Pedestrian Detection", 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 23 June 2014, pages 947-954, XP032649097, DOI: 10.1109/CVPR.2014.126 *


Legal Events

NENP: Non-entry into the national phase; Ref country code: DE
121: EP: the EPO has been informed by WIPO that EP was designated in this application; Ref document number: 16828989; Country of ref document: EP; Kind code of ref document: A1
122: EP: PCT application non-entry in European phase; Ref document number: 16828989; Country of ref document: EP; Kind code of ref document: A1