WO2022244333A1 - Object recognition device and object recognition method - Google Patents

Object recognition device and object recognition method

Info

Publication number
WO2022244333A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature amount
texture
dimensional
object recognition
image
Prior art date
Application number
PCT/JP2022/004511
Other languages
French (fr)
Japanese (ja)
Inventor
健 遠藤
春樹 的野
健 永崎
Original Assignee
Hitachi Astemo, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Astemo, Ltd.
Priority to DE112022001417.2T (published as DE112022001417T5)
Publication of WO2022244333A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/00: Scenes; scene-specific elements
    • G06V 20/60: Type of objects
    • G06V 20/64: Three-dimensional objects
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Definitions

  • The present invention relates to an object recognition device and an object recognition method that analyze an image captured by an in-vehicle camera and recognize objects around the vehicle.
  • The Lane Keep Assist System (LKAS), which prevents deviation from the roadway, is expected to serve as a driving support system that prevents single-vehicle accidents caused by drowsiness or inattention while driving.
  • As a prerequisite for realizing appropriate steering support control, this lane-keeping support system is equipped with a function that analyzes the image captured by the vehicle-mounted camera and divides the area within the image into a drivable area and an undrivable area.
  • Patent Document 1 (JP 2006-127024 A) is known as a prior-art document that discloses a region segmentation method.
  • The abstract of this document states that 'an acquired target image is divided into regions using mutually different feature quantities, and a generated-region group consisting of a plurality of generated regions characterized by the feature quantity used for the division is produced for each type of feature quantity (S130). One of the generated-region groups is defined as the basic region group, and the feature quantities of generated regions that overlap a basic region are incorporated into the feature quantities of that basic region, thereby producing a single basic-region group consisting of basic regions characterized by multiple types of feature quantities (S140). As an index of the similarity between a basic region of interest Rtar and an adjacent basic region Rk, the weighted Euclidean distance between the two regions in the multidimensional feature space represented by the feature quantities is calculated, and the two regions are merged when this distance satisfies an integration condition (S150).' The document thus discloses a region segmentation method that uses not only the texture information of the target image but also distance information (a weighted Euclidean distance).
  • In this way, Patent Document 1 extracts feature quantities from the target image and the distance information and performs the association based on distances weighted for each feature quantity. More specifically, as described in FIG. 3 and paragraphs 0046 to 0050 of that document, the weight is calculated from the brightness of the entire target image, and the same weight is used for every region when segmenting a given image.
  • When the drivable area is segmented as preprocessing for lane keeping support control, the regions must be segmented correctly even in scenes in which many objects are captured, and how heavily texture information and distance information should each be weighted is considered to depend on the type of object being segmented. Therefore, in Patent Document 1, in which a single target image is segmented using a single weight, the weight cannot be changed for each imaged object, and when a weight that deviates from the weight that should be used is applied, it is difficult to correctly segment the target image into regions.
  • In view of this problem, the present invention aims to provide an object recognition device and an object recognition method that recognize the travelable area with high accuracy by changing the weight for each image region.
  • To this end, the object recognition device of the present invention comprises: an input signal acquisition unit that acquires texture information and three-dimensional information of an image; a feature amount calculation unit that calculates a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region; a weight parameter generation unit that generates a weight parameter for each partial region; and an object recognition unit that generates an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter, and recognizes the target object in the image based on the integrated feature amount.
  • According to the object recognition device and the object recognition method of the present invention, the travelable area can be recognized with high accuracy by changing the weight for each image region.
  • FIG. 1: Functional block diagram of the object recognition device of Example 1.
  • FIG. 2: Processing flowchart of the object recognition device of Example 1.
  • FIG. 3A: Example of a texture image acquired by the object recognition device of Example 1.
  • FIG. 3B: Data structure of the texture information of each pixel of the texture image of FIG. 3A.
  • FIG. 3C: Example of a three-dimensional image acquired by the object recognition device of Example 1.
  • FIG. 3D: Data structure of the three-dimensional information of each pixel of the three-dimensional image of FIG. 3C.
  • FIG. 4: Schematic explanatory diagram of a neural network of the object recognition device of Example 1.
  • FIG. 5A: Texture feature amount extraction processing using the neural network of Example 1.
  • FIG. 5B: Three-dimensional feature amount extraction processing using the neural network of Example 1.
  • FIG. 6: Example of a method for determining a reference feature amount in the object recognition device of Example 1.
  • FIG. 7: Diagram explaining the travelability determination processing of the object recognition device of Example 1.
  • FIG. 8: Example of the weight calculation processing flow of the object recognition device of Example 1.
  • FIG. 9: Example of the weight calculation processing flow of the object recognition device of Example 2.
  • FIG. 10: Example of a method for determining a reference feature amount in the object recognition device of Example 2.
  • FIG. 11: Example of a method for calculating a weight from a reference feature amount in the object recognition device of Example 2.
  • FIG. 12: Explanatory diagram of a neural network of the object recognition device of Example 3.
  • FIG. 13: Diagram showing the weight calculation layers in the neural network of FIG. 12.
  • FIG. 14: Example of the weight calculation processing flow of the object recognition device of Example 3.
  • FIG. 15: Example of the weight calculation processing flow of the object recognition device of Example 4.
  • An object recognition device 100 according to Example 1 of the present invention will be described below with reference to FIGS. 1 to 8.
  • FIG. 1 is a functional block diagram showing the configuration of the object recognition device 100 of Example 1.
  • The object recognition device 100 includes, as hardware, an external sensor such as an in-vehicle camera, an arithmetic device such as a CPU, and a storage device such as a semiconductor memory; the arithmetic device executes a control program stored in the storage device, whereby the various functions shown in the figure operate. Since implementing functions by executing programs is well-known technology, the specific operation of hardware such as the arithmetic device is not described further below.
  • As shown in FIG. 1, the object recognition device 100 includes, as functional units realized by the above hardware, an input signal acquisition unit 1, a feature amount calculation unit 2, a storage unit 3, a weight parameter generation unit 4, and an object recognition unit 5. Each unit is described in order below.
  • The input signal acquisition unit 1 has an image acquisition unit 11 and a three-dimensional information acquisition unit 12.
  • The image acquisition unit 11 acquires a texture image F_t for each frame captured by the vehicle-mounted camera. If the vehicle-mounted camera is a monocular camera, the image acquisition unit 11 acquires one texture image F_t per captured frame; if it is a stereo camera, it acquires two (left and right) texture images F_t per captured frame.
  • If the image acquisition unit 11 acquires the two left and right texture images F_t from a stereo camera, the three-dimensional information acquisition unit 12 generates three-dimensional information I_d for each pixel using a well-known parallax calculation method. If the image acquisition unit 11 acquires a single texture image F_t from a monocular camera, the three-dimensional information acquisition unit 12 acquires the per-pixel three-dimensional information I_d from a millimeter-wave radar or LiDAR installed alongside the monocular camera.
  • The feature amount calculation unit 2 has a texture feature amount calculation unit 21 and a three-dimensional feature amount calculation unit 22.
  • The texture feature amount calculation unit 21 calculates a texture feature amount fet_t from the texture image F_t acquired by the image acquisition unit 11. For this feature amount, a HoG feature using edges may be used, or an ICF feature based on machine learning may be used. Alternatively, a feature extracted by a convolutional neural network (hereinafter simply referred to as 'neural network N'), described later, may be used.
  • The three-dimensional feature amount calculation unit 22 calculates a three-dimensional feature amount fet_d from the three-dimensional information I_d acquired by the three-dimensional information acquisition unit 12. For this feature amount, a HoG feature computed on a distance image obtained by projecting the distance information onto the image, an ICF feature of a three-dimensional image F_d in which the three-dimensional information is stored as image channels, or a feature extracted by a neural network that takes the three-dimensional information I_d as input may be used.
  • The storage unit 3 has a texture reference feature amount storage unit 31 and a three-dimensional reference feature amount storage unit 32.
  • The texture reference feature amount storage unit 31 stores a texture reference feature amount B_t extracted from the texture information I_t of the texture image F_t, and the three-dimensional reference feature amount storage unit 32 stores a three-dimensional reference feature amount B_d extracted from the three-dimensional information I_d.
  • As described in detail in Example 2, the texture reference feature amount storage unit 31 can store a plurality of texture reference feature amounts B_t, and the three-dimensional reference feature amount storage unit 32 can store a plurality of three-dimensional reference feature amounts B_d.
  • The reference feature amounts B stored in both storage units are determined from the viewpoint of the recognition rate on a verification data set, by constructing classifiers that use the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22, respectively. Specifically, a feature amount that was recognized successfully, or the feature amount with the maximum recognition score, is stored in each storage unit as the reference feature amount B. Alternatively, a kernel computed by training a neural network that includes the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22 as part of its network configuration may be stored as the reference feature amount B.
  • The weight parameter generation unit 4 calculates the weight w for the feature amount fet calculated by the feature amount calculation unit 2, using the reference feature amount B stored in the storage unit 3. Specifically, the inner product of the outputs of the texture feature amount calculation unit 21 and the texture reference feature amount storage unit 31 is computed and used as the weight w_t of the texture feature amount fet_t. Similarly, the inner product of the outputs of the three-dimensional feature amount calculation unit 22 and the three-dimensional reference feature amount storage unit 32 is computed and used as the weight w_d of the three-dimensional feature amount fet_d.
  • Because the inner product expresses the correlation, that is, the similarity, between vectors, the weight w is calculated by focusing on the similarity with the reference feature amount B.
  • Instead of the inner product, a quantity such as the L2 distance or the Bhattacharyya distance used as the exponent of an exponential function may be used as the weight w.
  • When a plurality of reference feature amounts B are stored in each of the texture reference feature amount storage unit 31 and the three-dimensional reference feature amount storage unit 32, the average of the inner-product values with the respective reference feature amounts can be used as the weight.
  • When the reference feature amount B is a kernel of a neural network, the weight w may be calculated by further applying a convolution operation to the result of the inner product of the feature amount and the reference feature amount.
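As an illustration only, and not part of the patent text, the inner-product weighting described above might be sketched in Python with NumPy as follows; the function and variable names are hypothetical, and the multi-reference case simply averages the inner products as stated above.

```python
import numpy as np

def weight_from_reference(fet: np.ndarray, refs: list[np.ndarray]) -> float:
    """Weight of a feature vector, taken as its similarity to stored reference features.

    fet  : feature vector (e.g. fet_t or fet_d), shape (D,)
    refs : one or more reference feature vectors B of the same dimension D
    Returns the inner product (or its average over several reference features).
    """
    sims = [float(np.dot(fet, b)) for b in refs]  # inner product = similarity
    return sum(sims) / len(sims)

# Hypothetical usage with a texture feature and a single texture reference feature
fet_t = np.random.rand(128)
B_t = [np.random.rand(128)]
w_t = weight_from_reference(fet_t, B_t)
```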
  • The object recognition unit 5 recognizes the target object based on the feature amount obtained by integrating the feature amounts calculated by the feature amount calculation unit 2 using the weights w generated by the weight parameter generation unit 4.
  • Specifically, the integrated feature amount fet_C is generated by adding the texture feature amount fet_t and the three-dimensional feature amount fet_d according to the weights w generated by the weight parameter generation unit 4. A classifier that uses the integrated feature amount fet_C then recognizes the drivable area around the vehicle.
  • The travelable area recognized by the object recognition unit 5 is output to an ECU (Electronic Control Unit) via a CAN (Controller Area Network), not shown. The ECU then executes lane keeping support control by assisting the control of the steering system so that the vehicle does not deviate from the drivable area around it.
  • As shown in the processing flowchart of FIG. 2, the object recognition device 100 of this embodiment performs, in order, an input information acquisition process (step S1), a texture feature amount extraction process (step S2), a three-dimensional feature amount extraction process (step S3), a weight calculation process (step S4), a feature amount integration process (step S5), and a type determination process (step S6).
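For orientation only, the six-step flow can be pictured as in the following Python sketch; every function name here is a placeholder for the processing described in steps S1 to S6, not an implementation taken from the patent.

```python
def recognize_drivable_area(left_img, right_img):
    # S1: input information acquisition (texture image F_t and 3-D image F_d)
    F_t, F_d = acquire_inputs(left_img, right_img)        # hypothetical helper
    for R_t, R_d in iter_local_regions(F_t, F_d):         # partial regions of the image
        fet_t = extract_texture_feature(R_t)              # S2: texture feature amount
        fet_d = extract_3d_feature(R_d)                   # S3: 3-D feature amount
        w_t, w_d = compute_weights(fet_t, fet_d)          # S4: per-region weights
        fet_c = w_t * fet_t + w_d * fet_d                 # S5: feature integration
        yield R_t, classify(fet_c)                        # S6: travelable or not
```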
  • First, in the input information acquisition process (step S1), the two left and right texture images F_t are acquired from the left and right cameras. The texture image F_t acquired from the right camera is illustrated in FIG. 3A, and FIG. 3B shows the data structure of the texture information I_t of each pixel of the texture image F_t of FIG. 3A.
  • The texture information I_t illustrated in FIG. 3B is data that defines the color of each pixel of the texture image F_t by a combination of R, G, and B values, but the method of defining the colors is not limited to this example.
  • In this step, a parallax image is also generated from the two acquired texture images F_t by scanning the left camera image against the right camera image, which is used as the reference. For the parallax calculation, for example, SAD (Sum of Absolute Differences) is used.
  • Then, referring to the focal length of the camera, the size of the image sensor, and the baseline length of the camera, the depth distance Z, the horizontal distance X, and the vertical distance Y from the camera are calculated from the parallax image, and a three-dimensional image F_d is generated with the three-dimensional information I_d as image channels.
  • The three-dimensional image F_d generated in this way is illustrated in FIG. 3C, and the data structure of the three-dimensional information I_d of each pixel of the three-dimensional image F_d is shown in FIG. 3D.
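The patent does not spell out the conversion formulas, but one common way to obtain Z, X, and Y from a disparity value is the standard pinhole stereo model for a rectified camera pair; the following sketch is that textbook formulation, not a quotation from the document, and all parameter names are assumptions.

```python
def disparity_to_xyz(u, v, disparity, f_px, baseline_m, cx, cy):
    """Standard pinhole-stereo conversion (assumed, not quoted from the patent).

    u, v       : pixel coordinates in the reference (right) image
    disparity  : disparity in pixels at (u, v)
    f_px       : focal length expressed in pixels (focal length / pixel pitch)
    baseline_m : camera baseline length in meters
    cx, cy     : principal point of the reference camera
    Returns (X, Y, Z): horizontal, vertical, and depth distance from the camera.
    """
    if disparity <= 0:
        return None                      # 3-D information invalid for this pixel
    Z = f_px * baseline_m / disparity    # depth distance
    X = (u - cx) * Z / f_px              # horizontal distance
    Y = (v - cy) * Z / f_px              # vertical distance
    return X, Y, Z
```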
  • In the following, the procedure for determining whether the texture region R_t shown in FIG. 3A is a travelable region or a non-travelable region is described, also taking into account the three-dimensional information I_d of the three-dimensional region R_d shown in FIG. 3C.
  • In the texture feature amount extraction process (step S2), the texture feature amount fet_t is extracted using the information acquired in step S1.
  • The neural network N_t of FIG. 4 is trained so that, when an arbitrary local region R of the same size as the texture region R_t of FIG. 3A is input, it can determine whether or not the local region R is a travelable region.
  • For this training, a learning data set to which correct labels have been assigned is used.
  • The neural network N_t of FIG. 4 consists of a layer N1t for feature extraction at the front stage and a layer N2t for identification processing at the rear stage.
  • The front-stage layer N1t consists of a number of convolution kernels and ReLU activation functions, and extracts from the local region R a texture feature amount fet_t that is effective for discrimination. The rear-stage layer N2t determines whether or not the local region R is a travelable region by applying a fully connected layer and a Softmax activation function to the texture feature amount fet_t extracted by the front-stage layer N1t.
  • By inputting the texture region R_t into the front-stage layer N1t, as shown in FIG. 5A, the texture feature amount fet_t of the texture region R_t can be calculated.
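Purely as an illustration of the kind of network described here (a front-stage convolution-plus-ReLU feature extractor N1t and a rear-stage fully connected-plus-Softmax classifier N2t), a minimal PyTorch-style sketch might look as follows; the layer sizes, feature dimension, and class count are assumptions rather than values given in the patent.

```python
import torch
import torch.nn as nn

class TextureNet(nn.Module):
    """Sketch of N_t: N1t extracts fet_t, N2t decides travelable / not travelable."""

    def __init__(self, in_channels: int = 3, feat_dim: int = 128):
        super().__init__()
        # N1t: convolution kernels + ReLU (feature extraction stage)
        self.n1t = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # N2t: fully connected layer + Softmax (identification stage)
        self.n2t = nn.Sequential(nn.Linear(feat_dim, 2), nn.Softmax(dim=1))

    def forward(self, region: torch.Tensor):
        fet_t = self.n1t(region)        # texture feature amount fet_t
        travelable = self.n2t(fet_t)    # probabilities: travelable / not travelable
        return fet_t, travelable
```

A network of the same shape with the three-dimensional image channels as input would play the role of N_d (layers N1d and N2d) described next.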
  • In the three-dimensional feature amount extraction process (step S3), similarly to step S2, the three-dimensional feature amount fet_d is extracted using the information acquired in step S1.
  • In this embodiment, a neural network N_d that determines travelability from the three-dimensional image F_d is used.
  • This neural network N_d is trained so that, when an arbitrary local region R of the same size as the three-dimensional region R_d of FIG. 3C is input, it can determine whether the local region R is a travelable region. Like the neural network N_t of FIG. 4, it consists of a front-stage feature extraction layer N1d and a rear-stage identification processing layer N2d.
  • By inputting the three-dimensional region R_d into the layer N1d, as shown in FIG. 5B, the three-dimensional feature amount fet_d of the three-dimensional region R_d can be calculated.
  • Note that the layer N1t for the texture region R_t (FIG. 5A) and the layer N1d for the three-dimensional region R_d (FIG. 5B) are assumed to extract feature amounts of the same number of dimensions.
  • In the weight calculation process (step S4), the weight w_t for the texture feature amount fet_t extracted in step S2 and the weight w_d for the three-dimensional feature amount fet_d extracted in step S3 are calculated.
  • In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
  • In this embodiment, the texture reference feature amount B_t is used to calculate the weight w_t of the texture feature amount fet_t.
  • The neural network N_t of FIG. 4 is used to determine the texture reference feature amount B_t.
  • R1, R2, and R3 in FIG. 6 indicate local regions of the verification data set used for calculating the recognition rate.
  • E1, E2, and E3 indicate the recognition results of the neural network N_t when the local regions R1, R2, and R3 are input, respectively. In FIG. 6, the recognition result E2 for the local region R2 is correct, while the recognition results E1 and E3 for the local regions R1 and R3 are wrong.
  • In this case, the texture feature amount fet_t2 output by the front-stage layer N1t when the local region R2 is input is determined to be the texture reference feature amount B_t and is stored in the texture reference feature amount storage unit 31.
  • Note that the texture reference feature amount B_t determined in this way is not a variable that changes according to the position of the texture region R_t but a constant that is set in advance, before step S4 is executed.
  • In step S4, the inner product is calculated according to (Equation 1), using the texture feature amount fet_t extracted in step S2 for an arbitrary texture region R_t and the texture reference feature amount B_t, which is a constant.
  • In this way, the weight w_t for the texture feature amount fet_t of the texture region R_t can be calculated.
  • D in (Equation 1) denotes the number of dimensions of the texture reference feature amount B_t.
  • The calculation result of (Equation 1) represents the correlation value, that is, the degree of similarity, between the texture feature amount fet_t and the texture reference feature amount B_t.
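The body of (Equation 1) is not reproduced in this text. Based on the surrounding description, an inner product over D dimensions whose result serves as the weight, it is presumably the standard dot product; the following is a reconstruction under that assumption, not a verbatim copy of the patent's equation.

```latex
% (Equation 1), reconstructed: inner product of the texture feature amount
% and the texture reference feature amount over their D dimensions
w_t = \sum_{i=1}^{D} \mathrm{fet}_t[i] \, B_t[i]
```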
  • In the feature amount integration process (step S5), the texture feature amount fet_t extracted in step S2 and the three-dimensional feature amount fet_d extracted in step S3 are combined using the weights w_t and w_d calculated in step S4 to compute the integrated feature amount fet_C. The integration of the feature amounts is calculated according to (Equation 3) below.
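(Equation 3) is likewise not reproduced here. Consistent with the earlier statement that the two feature amounts are added according to their weights, it presumably has the following form; this is a reconstruction under that assumption.

```latex
% (Equation 3), reconstructed: weighted addition of the two feature vectors
\mathrm{fet}_C = w_t \, \mathrm{fet}_t + w_d \, \mathrm{fet}_d
```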
  • In the type determination process (step S6), whether or not the region is a travelable region is determined based on the integrated feature amount fet_C calculated in step S5.
  • In this embodiment, a neural network N3 is used to determine the travelable area from the integrated feature amount fet_C.
  • FIG. 7 shows a conceptual diagram of this step. As shown there, the integrated feature amount fet_C is used as the input to the neural network N3 to determine whether or not the region is a travelable region.
  • The neural network N3 consists of a feature extractor made up of a number of convolution layers and ReLU activation functions and a discrimination processor made up of a fully connected layer and a Softmax activation function.
  • Note that while the neural network layer N2t described above is trained with a data set whose inputs are texture feature amounts fet_t and the neural network layer N2d is trained with a data set whose inputs are three-dimensional feature amounts fet_d, the neural network N3 of this step is trained with a data set whose inputs are integrated feature amounts fet_C.
  • As described above, the object recognition device 100 of this embodiment can change the weight of the feature amounts for each pixel of the image. As a result, even when the texture image F_t contains both an object that should be judged by actively using the texture information I_t and an object that should be judged by actively using the three-dimensional information I_d, different weights can be set for each object, and the recognition accuracy can be improved.
  • In this embodiment, the weight of a feature amount is calculated by comparing the feature amount fet with the reference feature amount B determined in advance. This eliminates the need for additional image analysis processing for the weight calculation, such as analyzing the luminance values of the entire image, and makes the weight calculation more efficient.
  • In this embodiment, the weight is calculated as the inner product of the reference feature amount and the feature amount. Since the inner product can be computed with sum-of-products operations alone, the weight can be calculated with a small amount of computation.
  • In this embodiment, different neural network layers N1t and N1d are used to calculate the reference feature amounts B for the texture information I_t and the three-dimensional information I_d.
  • Therefore, the texture reference feature amount B_t can be determined from the texture information I_t alone and the three-dimensional reference feature amount B_d from the three-dimensional information I_d alone, so the weights can be calculated more accurately.
  • In this embodiment, the reference feature amount B is generated based on the recognition rate on the verification data, so a feature amount that was recognized successfully can be selected as the reference feature amount. As a result, a weight that actively favors feature amounts similar to one that was successfully recognized can be calculated, enabling recognition with higher accuracy.
  • In step S4 as described above, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are always calculated.
  • Alternatively, the validity of the acquired three-dimensional information I_d may first be determined (step S41). If the three-dimensional information I_d could not be acquired for the pixel, or if the cost of the parallax calculation is equal to or greater than a predetermined value, the three-dimensional image F_d is determined to be invalid. If it is determined to be invalid, the process proceeds to step S6 without calculating the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d; that is, in step S6, the layer N2t of the neural network of FIG. 4 is used to determine travelability based only on the texture feature amount fet_t.
  • If the three-dimensional information I_d is determined to be valid, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are calculated (step S42), and the processing from step S5 onward is executed using the weight w_t and the weight w_d.
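A minimal sketch of this validity gate, assuming hypothetical helper functions for the steps named above (none of the names come from the patent):

```python
def classify_region(R_t, R_d, cost_threshold: float):
    fet_t = extract_texture_feature(R_t)                  # step S2
    # Step S41: validity check of the 3-D information
    if R_d is None or parallax_cost(R_d) >= cost_threshold:
        return classify_texture_only(fet_t)               # step S6 using N2t alone
    # Step S42: compute both weights, then integrate and classify as usual
    fet_d = extract_3d_feature(R_d)                       # step S3
    w_t, w_d = compute_weights(fet_t, fet_d)              # step S4
    fet_c = w_t * fet_t + w_d * fet_d                     # step S5
    return classify(fet_c)                                # step S6 using N3
```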
  • In the object recognition device of Example 2, the weight calculation process consists of a reference feature amount inner-product calculation (step S4a) and an average value calculation process (step S4b).
  • In step S4a, the inner product with the feature amount fet is calculated for each of a plurality of reference feature amounts B set for that feature amount fet.
  • A method for setting a plurality of reference feature amounts B is described with reference to FIG. 10.
  • In the following, the method of setting the texture reference feature amounts B_t corresponding to the texture image F_t is described; the three-dimensional reference feature amounts B_d can be set in the same way, so their description is omitted.
  • The meaning of each symbol in FIG. 10 is the same as in FIG. 6. The difference between the two figures is that, in FIG. 10, both local regions R1 and R2 are identified successfully.
  • In this case, both the texture feature amounts fet_t1 and fet_t2 obtained from the successfully identified local regions R1 and R2 are set as texture reference feature amounts B_t1 and B_t2.
  • Now consider calculating the weight w_t of the texture feature amount fet_t of a texture region R_t. In step S4a, as shown in FIG. 11, the inner product value S_t1 of the texture feature amount fet_t extracted from the texture region R_t and the texture reference feature amount B_t1, and the inner product value S_t2 of the texture feature amount fet_t and the texture reference feature amount B_t2, are calculated. That is, similarity information between each reference feature amount and the texture feature amount fet_t of the texture region R_t is calculated.
  • In step S4b, the weight w_t for the texture feature amount fet_t is calculated from the plurality of inner product values calculated in step S4a.
  • In this embodiment, the following (Equation 4) is used to calculate the weight w_t.
  • Here, A_t denotes the index set of the texture reference feature amounts B_t.
  • By this calculation, the average of the inner product values with the texture reference feature amounts B_t is obtained.
  • In this embodiment, the average of the inner product values calculated from the plurality of texture reference feature amounts B_t is used as the weight w_t for the texture feature amount fet_t.
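(Equation 4) is not reproduced in this text. From the description, the average of the inner-product values over the index set A_t, it presumably takes the following form; this is a reconstruction, not a quotation.

```latex
% (Equation 4), reconstructed: average of the inner products with the
% texture reference feature amounts B_{ta}, a in A_t
w_t = \frac{1}{\lvert A_t \rvert} \sum_{a \in A_t} \langle \mathrm{fet}_t , B_{ta} \rangle
    = \frac{1}{\lvert A_t \rvert} \sum_{a \in A_t} S_{ta}
```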
  • The method of calculating the weight w_t of the texture feature amount fet_t has been described above; for the three-dimensional feature amount fet_d as well, the inner products with the three-dimensional reference feature amounts B_d are calculated and the weight w_d is obtained from their average in the same way.
  • As described above, in this embodiment a plurality of reference feature amounts B are set for each feature amount fet, and the average of the respective inner-product values is used as the weight w of the feature amount fet.
  • In Example 1, neural networks are used in each of the three steps shown in FIG. 2: the texture feature amount extraction process (step S2), the three-dimensional feature amount extraction process (step S3), and the type determination process (step S6).
  • That is, in Example 1, three separate neural networks are used to execute the processing of FIG. 2.
  • In contrast, in Example 3, a single neural network N that incorporates the functions of the networks of Example 1 as layers is used to execute the processing of FIG. 2.
  • FIG. 12 shows the configuration of the neural network N of this embodiment.
  • The correspondence between the processing flow of FIG. 2 and the neural network N is described below.
  • Since the input information acquisition process (step S1) is the same as in Example 1, only the subsequent processes are described.
  • The neural network N shown in FIG. 12 first receives the texture region R_t and the three-dimensional region R_d as inputs and performs the texture feature amount extraction process (step S2) and the three-dimensional feature amount extraction process (step S3).
  • In the texture feature amount extraction process (step S2), the layer N1t of the neural network N is used to extract the texture feature amount fet_t.
  • This layer N1t consists of a number of convolution layers and ReLU activation functions.
  • Similarly, in the three-dimensional feature amount extraction process (step S3), the layer N1d of the neural network N is used to extract the three-dimensional feature amount fet_d.
  • The numbers of dimensions of the texture feature amount fet_t and the three-dimensional feature amount fet_d extracted by the layers N1t and N1d are made equal.
  • In the weight calculation process (step S4), the layers N4t and N5t of the neural network N are used to calculate the weight w_t of the texture feature amount fet_t, and the layers N4d and N5d are used to calculate the weight w_d of the three-dimensional feature amount fet_d. In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
  • FIG. 13 shows the details of the configuration of the layer N4t and the layer N5t, which calculate the weight w_t for the texture feature amount fet_t.
  • The processing in the layers N4t and N5t corresponds to the processing flow shown in FIG. 14.
  • The processing by the layer N4t corresponds to the reference feature amount inner-product calculation (step S4c), and the processing by the layer N5t corresponds to the reference similarity inner-product calculation (step S4d).
  • The texture reference feature amounts B_t1, B_t2, ..., B_tn shown in the layer N4t are kernels of the layer N4t.
  • The texture reference feature amounts B_t1, B_t2, ..., B_tn are estimated by training the neural network N, described later.
  • In the layer N4t, the inner products of the texture feature amount fet_t with the texture reference feature amounts B_t1, B_t2, ..., B_tn are calculated, and the resulting inner-product values constitute the vector vec.
  • This processing can be realized by a convolution operation using a 1 × 1 kernel.
  • The reference similarity C in the layer N5t is a vector of the same dimension as the vector vec, and stores, for each reference feature amount, its relationship to the feature amounts that should be actively used, that is, whose weight should be increased.
  • For example, the first element of the vector vec stores the degree of similarity between the texture feature amount fet_t and the texture reference feature amount B_t1, and the second element stores the degree of similarity between the texture feature amount fet_t and the texture reference feature amount B_t2. If the condition under which the calculated feature amount should be actively used, that is, under which the weight w_t should be increased, is that the feature amount is similar to the texture reference feature amount B_t1 but not similar to the texture reference feature amount B_t2, then the first element of the reference similarity C stores a positive value and the second element stores a negative value.
  • The reference similarity C is estimated by the training described later.
  • In the layer N5t, the inner product of the vector vec and the reference similarity C is calculated.
  • This processing is also realized by a convolution operation using a 1 × 1 kernel. The inner-product value of the vector vec and the reference similarity C is used as the weight w_t.
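As an illustrative PyTorch-style sketch only (the feature dimension and the number of reference features are assumptions): the two 1 × 1 convolutions described for the layers N4t and N5t amount to an inner product of the per-position feature vector with n learned reference features, followed by an inner product of the resulting n-vector with a learned reference similarity.

```python
import torch
import torch.nn as nn

class WeightLayers(nn.Module):
    """Sketch of the N4t / N5t weight-calculation layers described for Example 3."""

    def __init__(self, feat_dim: int = 128, n_refs: int = 8):
        super().__init__()
        # N4t: its kernels play the role of the reference feature amounts B_t1..B_tn;
        # a 1x1 convolution computes their inner products with fet_t, giving vec
        self.n4t = nn.Conv2d(feat_dim, n_refs, kernel_size=1, bias=False)
        # N5t: its kernel plays the role of the reference similarity C;
        # a 1x1 convolution computes the inner product of vec and C, giving w_t
        self.n5t = nn.Conv2d(n_refs, 1, kernel_size=1, bias=False)

    def forward(self, fet_t: torch.Tensor) -> torch.Tensor:
        vec = self.n4t(fet_t)    # similarity of fet_t to each reference feature
        w_t = self.n5t(vec)      # scalar weight per spatial position
        return w_t

# Hypothetical usage on a feature map of shape (batch, feat_dim, H, W)
w_t = WeightLayers()(torch.randn(1, 128, 16, 16))   # shape (1, 1, 16, 16)
```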
  • A similar process is performed using the layer N4d and the layer N5d with the three-dimensional feature amount fet_d as input, to calculate the weight w_d corresponding to the three-dimensional feature amount fet_d.
  • In the feature amount integration process (step S5), the feature amounts are integrated by the layer N6 of the neural network N.
  • The layer N6 of the neural network N performs the same calculation as (Equation 3) described above and outputs the integrated feature amount fet_C.
  • In the type determination process (step S6), the layer N3 of the neural network N is used to determine whether or not the region is a travelable area from the integrated feature amount fet_C.
  • The layer N3 consists of a part composed of convolution layers and ReLU activation functions and a part composed of a fully connected layer and a Softmax activation function, and performs the type determination.
  • The neural network N is configured so that all of its layers are differentiable, and it can be trained by updating the kernel parameters so as to reduce an error function defined on the output of the layer N3.
  • By this training, the reference feature amounts used in the layers N4t and N4d and the reference similarities used in the layers N5t and N5d are estimated so as to minimize the error function. That is, the reference feature amounts and reference similarities that maximize the recognition rate on the training data can be estimated.
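A minimal end-to-end training sketch under the same assumptions as the previous snippets; the network constructor, the data loader, and the choice of loss are placeholders, not details given in the patent.

```python
import torch
import torch.nn as nn

model = build_network_N()                      # placeholder: assembled network N
criterion = nn.CrossEntropyLoss()              # error function on the N3 output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for R_t, R_d, label in training_loader:        # placeholder data loader
    logits = model(R_t, R_d)                   # forward pass through all layers
    loss = criterion(logits, label)            # error defined on the layer N3 output
    optimizer.zero_grad()
    loss.backward()                            # every layer is differentiable, so the
    optimizer.step()                           # kernels B and C are updated as well
```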
  • As described above, in Example 3 the weight calculation uses not only the reference feature amounts but also the reference similarity. This makes it possible to calculate weights that reflect not only similarity to a reference feature amount but also dissimilarity to a reference feature amount, enabling weighting that corresponds to more complicated conditions, and recognition performance can be improved.
  • In this embodiment, the drivable area is determined by a single neural network in which the reference feature amounts and the reference similarities are used as kernels, and training is performed by defining an error function on the output of the neural network. As a result, the reference feature amounts and the reference similarities can be estimated so as to maximize the final recognition rate, so recognition can be performed with higher accuracy.
  • Furthermore, in Example 3, a single neural network performs the feature amount calculation, the weight estimation, and the type estimation of the object. This eliminates the need to train a plurality of neural networks individually, shortening the training time and reducing the work cost for the designer.
  • Since the difference between Example 1 and Example 4 lies in the content of the weight calculation process (step S4), step S4 is described below.
  • In Example 1, the weight w is calculated by processing the texture image F_t of each captured frame independently. In Example 4, by contrast, the weight to be used in the current frame is calculated using weights calculated in past frames.
  • FIG. 15 shows the weight calculation process (step S4) of this embodiment.
  • Step S4 of this embodiment consists of a past-frame position calculation process (step S4e) and a weight average value calculation process (step S4f).
  • In step S4e, the position in the image of a past frame that corresponds to the image region to be recognized in the current frame is calculated. The corresponding position in the past frame may be predicted from information such as the vehicle speed and yaw rate, or the position in the past frame may be identified by extracting feature points from the images and calculating the camera motion from the correspondence of the feature points.
  • In step S4f, the weights of the image regions surrounding the past-frame image region identified in step S4e are used to calculate the weight of the image region to be recognized in the current frame.
  • Specifically, a radius Rpix is defined around the identified past-frame image region, and the average of the past weights contained within that radius is used as the weight for the current frame.
  • The above processing is performed for the weight calculation of both the texture feature amount fet_t and the three-dimensional feature amount fet_d.
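A rough sketch of this temporal reuse, assuming hypothetical helpers for ego-motion compensation; the names and data formats are illustrative only.

```python
import numpy as np

def weight_from_past(current_pos, past_weight_map, ego_shift, r_pix: int) -> float:
    """Reuse past weights for the current frame's recognition target region.

    current_pos     : (u, v) pixel position of the region in the current frame
    past_weight_map : 2-D array of weights computed for the past frame
    ego_shift       : (du, dv) predicted image displacement from vehicle speed /
                      yaw rate or from feature-point matching (assumed format)
    r_pix           : radius Rpix around the corresponding past-frame position
    """
    # Step S4e: position in the past frame corresponding to the current region
    u = current_pos[0] - ego_shift[0]
    v = current_pos[1] - ego_shift[1]
    # Step S4f: average the past weights within radius Rpix of that position
    h, w = past_weight_map.shape
    ys, xs = np.ogrid[:h, :w]
    mask = (xs - u) ** 2 + (ys - v) ** 2 <= r_pix ** 2
    return float(past_weight_map[mask].mean())
```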
  • As described above, in this embodiment the weights to be used in the current frame are determined based on weights calculated in the past. This eliminates the need to calculate the weights anew in the current frame, thereby reducing the processing load.
  • In each of the above examples, the reference feature amount B is selected based on information indicating whether recognition of the verification data succeeded or failed, but the reference feature amount may instead be selected based on the recognition score. Specifically, when one reference feature amount is to be determined, the feature amount with the maximum identification score may be used as the reference feature amount, and when N reference feature amounts are to be determined, the feature amounts with the top N identification scores may be used as the reference feature amounts.
  • 100: object recognition device, 1: input signal acquisition unit, 11: image acquisition unit, 12: three-dimensional information acquisition unit, 2: feature amount calculation unit, 21: texture feature amount calculation unit, 22: three-dimensional feature amount calculation unit, 3: storage unit, 31: texture reference feature amount storage unit, 32: three-dimensional reference feature amount storage unit, 4: weight parameter generation unit, 5: object recognition unit, F_t: texture image, R_t: texture region, fet_t: texture feature amount, B_t: texture reference feature amount, w_t: weight of the texture feature amount, F_d: three-dimensional image, R_d: three-dimensional region, fet_d: three-dimensional feature amount, B_d: three-dimensional reference feature amount, w_d: weight of the three-dimensional feature amount, fet_C: integrated feature amount, w: weight

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The purpose of the present invention is to provide an object recognition device for recognizing a travelable area with high accuracy. This object recognition device is characterized by including: an input signal acquisition unit that acquires texture information and three-dimensional information about an image; a feature value calculation unit that calculates a texture feature value based on texture information about a partial area of the image, and a three-dimensional feature value based on three-dimensional information about the partial area; a weight parameter generation unit that generates a weight parameter for each partial area; and a target object recognition unit that performs weighting with the weight parameter to thereby generate an integrated feature value obtained by integrating the texture feature value and the three-dimensional feature value, and that recognizes a target object in the image on the basis of the integrated feature value.

Description

Object recognition device and object recognition method

The present invention relates to an object recognition device and an object recognition method that analyze an image captured by an in-vehicle camera and recognize objects around the vehicle.

The Lane Keep Assist System (LKAS), which prevents deviation from the roadway, is expected to serve as a driving support system that prevents single-vehicle accidents caused by drowsiness or inattention while driving. As a prerequisite for realizing appropriate steering support control, this lane-keeping support system is equipped with a function that analyzes the image captured by the vehicle-mounted camera and divides the area within the image into a drivable area and an undrivable area.

Patent Document 1 is known as a prior-art document that discloses a region segmentation method. The abstract of this document states that 'an acquired target image is divided into regions using mutually different feature quantities, and a generated-region group consisting of a plurality of generated regions characterized by the feature quantity used for the division is produced for each type of feature quantity (S130). One of the generated-region groups is defined as the basic region group, and the feature quantities of generated regions that overlap a basic region are incorporated into the feature quantities of that basic region, thereby producing a single basic-region group consisting of basic regions characterized by multiple types of feature quantities (S140). As an index of the similarity between a basic region of interest Rtar and an adjacent basic region Rk, the weighted Euclidean distance between the two regions in the multidimensional feature space represented by the feature quantities is calculated, and the two regions are merged when this distance satisfies an integration condition (S150).' The document thus discloses a region segmentation method that uses not only the texture information of the target image but also distance information (a weighted Euclidean distance).

In this way, Patent Document 1 extracts feature quantities from the target image and the distance information and performs the association based on distances weighted for each feature quantity. More specifically, as described in FIG. 3 and paragraphs 0046 to 0050 of that document, the weight is calculated from the brightness of the entire target image, and the same weight is used for every region when segmenting a given image.

Patent Document 1: JP 2006-127024 A

When the drivable area is segmented as preprocessing for lane keeping support control, the regions must be segmented correctly even in scenes in which many objects are captured. Possible approaches to the segmentation include actively using texture information and actively using distance information, but how heavily each type of information should be weighted is considered to depend on the type of object being segmented.

Therefore, in Patent Document 1, in which a single target image is segmented using a single weight, the weight cannot be changed for each imaged object, and when a weight that deviates from the weight that should be used is applied, it is difficult to correctly segment the target image into regions.

In view of this problem, the present invention aims to provide an object recognition device and an object recognition method that recognize the travelable area with high accuracy by changing the weight for each image region.

To this end, the object recognition device of the present invention comprises: an input signal acquisition unit that acquires texture information and three-dimensional information of an image; a feature amount calculation unit that calculates a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region; a weight parameter generation unit that generates a weight parameter for each partial region; and an object recognition unit that generates an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter, and recognizes the target object in the image based on the integrated feature amount.

According to the object recognition device and the object recognition method of the present invention, the travelable area can be recognized with high accuracy by changing the weight for each image region.

The drawings are as follows:
FIG. 1: Functional block diagram of the object recognition device of Example 1.
FIG. 2: Processing flowchart of the object recognition device of Example 1.
FIG. 3A: Example of a texture image acquired by the object recognition device of Example 1.
FIG. 3B: Data structure of the texture information of each pixel of the texture image of FIG. 3A.
FIG. 3C: Example of a three-dimensional image acquired by the object recognition device of Example 1.
FIG. 3D: Data structure of the three-dimensional information of each pixel of the three-dimensional image of FIG. 3C.
FIG. 4: Schematic explanatory diagram of a neural network of the object recognition device of Example 1.
FIG. 5A: Texture feature amount extraction processing using the neural network of Example 1.
FIG. 5B: Three-dimensional feature amount extraction processing using the neural network of Example 1.
FIG. 6: Example of a method for determining a reference feature amount in the object recognition device of Example 1.
FIG. 7: Diagram explaining the travelability determination processing of the object recognition device of Example 1.
FIG. 8: Example of the weight calculation processing flow of the object recognition device of Example 1.
FIG. 9: Example of the weight calculation processing flow of the object recognition device of Example 2.
FIG. 10: Example of a method for determining a reference feature amount in the object recognition device of Example 2.
FIG. 11: Example of a method for calculating a weight from a reference feature amount in the object recognition device of Example 2.
FIG. 12: Explanatory diagram of a neural network of the object recognition device of Example 3.
FIG. 13: Diagram showing the weight calculation layers in the neural network of FIG. 12.
FIG. 14: Example of the weight calculation processing flow of the object recognition device of Example 3.
FIG. 15: Example of the weight calculation processing flow of the object recognition device of Example 4.

An embodiment of the object recognition device 100 of the present invention will be described in detail below with reference to the drawings.

First, an object recognition device 100 according to Example 1 of the present invention will be described with reference to FIGS. 1 to 8.

FIG. 1 is a functional block diagram showing the configuration of the object recognition device 100 of Example 1. The object recognition device 100 includes, as hardware, an external sensor such as an in-vehicle camera, an arithmetic device such as a CPU, and a storage device such as a semiconductor memory; the arithmetic device executes a control program stored in the storage device, whereby the various functions shown in the figure operate. Since implementing functions by executing programs is well-known technology, the specific operation of hardware such as the arithmetic device is not described further below.

As shown in FIG. 1, the object recognition device 100 includes, as functional units realized by the above hardware, an input signal acquisition unit 1, a feature amount calculation unit 2, a storage unit 3, a weight parameter generation unit 4, and an object recognition unit 5. Each unit is described in order below.
<Input signal acquisition unit 1>

The input signal acquisition unit 1 has an image acquisition unit 11 and a three-dimensional information acquisition unit 12.

The image acquisition unit 11 acquires a texture image F_t for each frame captured by the vehicle-mounted camera. If the vehicle-mounted camera is a monocular camera, the image acquisition unit 11 acquires one texture image F_t per captured frame; if it is a stereo camera, it acquires two (left and right) texture images F_t per captured frame.

If the image acquisition unit 11 acquires the two left and right texture images F_t from a stereo camera, the three-dimensional information acquisition unit 12 generates three-dimensional information I_d for each pixel using a well-known parallax calculation method. If the image acquisition unit 11 acquires a single texture image F_t from a monocular camera, the three-dimensional information acquisition unit 12 acquires the per-pixel three-dimensional information I_d from a millimeter-wave radar or LiDAR installed alongside the monocular camera.
<Feature amount calculation unit 2>

The feature amount calculation unit 2 has a texture feature amount calculation unit 21 and a three-dimensional feature amount calculation unit 22.

The texture feature amount calculation unit 21 calculates a texture feature amount fet_t from the texture image F_t acquired by the image acquisition unit 11. For this feature amount, a HoG feature using edges may be used, or an ICF feature based on machine learning may be used. Alternatively, a feature extracted by a convolutional neural network (hereinafter simply referred to as 'neural network N'), described later, may be used.

The three-dimensional feature amount calculation unit 22 calculates a three-dimensional feature amount fet_d from the three-dimensional information I_d acquired by the three-dimensional information acquisition unit 12. For this feature amount, a HoG feature computed on a distance image obtained by projecting the distance information onto the image, an ICF feature of a three-dimensional image F_d in which the three-dimensional information is stored as image channels, or a feature extracted by a neural network that takes the three-dimensional information I_d as input may be used.
<Storage unit 3>

The storage unit 3 has a texture reference feature amount storage unit 31 and a three-dimensional reference feature amount storage unit 32. The texture reference feature amount storage unit 31 stores a texture reference feature amount B_t extracted from the texture information I_t of the texture image F_t, and the three-dimensional reference feature amount storage unit 32 stores a three-dimensional reference feature amount B_d extracted from the three-dimensional information I_d. As described in detail in Example 2, the texture reference feature amount storage unit 31 can store a plurality of texture reference feature amounts B_t, and the three-dimensional reference feature amount storage unit 32 can store a plurality of three-dimensional reference feature amounts B_d.

The reference feature amounts B stored in both storage units are determined from the viewpoint of the recognition rate on a verification data set, by constructing classifiers that use the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22, respectively. Specifically, a feature amount that was recognized successfully, or the feature amount with the maximum recognition score, is stored in each storage unit as the reference feature amount B. Alternatively, a kernel computed by training a neural network that includes the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22 as part of its network configuration may be stored as the reference feature amount B.
<Weight parameter generation unit 4>

The weight parameter generation unit 4 calculates the weight w for the feature amount fet calculated by the feature amount calculation unit 2, using the reference feature amount B stored in the storage unit 3. Specifically, the inner product of the outputs of the texture feature amount calculation unit 21 and the texture reference feature amount storage unit 31 is computed and used as the weight w_t of the texture feature amount fet_t. Similarly, the inner product of the outputs of the three-dimensional feature amount calculation unit 22 and the three-dimensional reference feature amount storage unit 32 is computed and used as the weight w_d of the three-dimensional feature amount fet_d.

Because the inner product expresses the correlation, that is, the similarity, between vectors, the weight w is calculated by focusing on the similarity with the reference feature amount B. Instead of the inner product, a quantity such as the L2 distance or the Bhattacharyya distance used as the exponent of an exponential function may be used as the weight w. When a plurality of reference feature amounts B are stored in each of the texture reference feature amount storage unit 31 and the three-dimensional reference feature amount storage unit 32, the average of the inner-product values with the respective reference feature amounts can be used as the weight. When the reference feature amount B is a kernel of a neural network, the weight w may be calculated by further applying a convolution operation to the result of the inner product of the feature amount and the reference feature amount.
<Object recognition unit 5>

The object recognition unit 5 recognizes the target object based on the feature amount obtained by integrating the feature amounts calculated by the feature amount calculation unit 2 using the weights w generated by the weight parameter generation unit 4. Specifically, the integrated feature amount fet_C is generated by adding the texture feature amount fet_t and the three-dimensional feature amount fet_d according to the weights w generated by the weight parameter generation unit 4. A classifier that uses the integrated feature amount fet_C then recognizes the drivable area around the vehicle.

The travelable area recognized by the object recognition unit 5 is output to an ECU (Electronic Control Unit) via a CAN (Controller Area Network), not shown. The ECU then executes lane keeping support control by assisting the control of the steering system so that the vehicle does not deviate from the drivable area around it.
<Operation example>
 Next, an operation example of the object recognition device 100 configured as described above will be described in detail with reference to the flowchart of FIG. 2. The following operation example concerns the object recognition device 100 using a stereo camera installed so as to monitor the area ahead of the vehicle. Since the stereo camera consists of a left camera and a right camera, two texture images F_t (left and right) are captured for each frame; in the following, the drivable area is estimated in the texture image F_t captured by the right camera.
 The object recognition device 100 of this embodiment executes, in order, an input information acquisition process (step S1), a texture feature amount extraction process (step S2), a three-dimensional feature amount extraction process (step S3), a weight calculation process (step S4), a feature amount integration process (step S5), and a type determination process (step S6).
 First, in the input information acquisition process (step S1), the two left and right texture images F_t are acquired from the left camera and the right camera. The texture image F_t acquired from the right camera is illustrated in FIG. 3A, and the data structure of the texture information I_t of each pixel of the texture image F_t in FIG. 3A is shown in FIG. 3B. The texture information I_t illustrated in FIG. 3B defines the color of each pixel of the texture image F_t as a combination of R, G, and B values, but the method of defining color is not limited to this example.
 In this step, a parallax image is also generated from the two acquired texture images F_t by scanning the left camera image with the right camera image as the reference. The parallax is calculated using, for example, SAD (Sum of Absolute Differences). Then, referring to the focal length of the camera, the size of the image sensor, and the baseline length of the camera, the depth distance Z, the lateral distance X, and the vertical distance Y from the camera are calculated from the parallax image, and a three-dimensional image F_d is generated in which the three-dimensional information I_d forms the channels of the image. The three-dimensional image F_d generated in this way is illustrated in FIG. 3C, and the data structure of the three-dimensional information I_d of each pixel of the three-dimensional image F_d is shown in FIG. 3D. In the following, the procedure for determining whether the texture region R_t shown in FIG. 3A is a drivable area or a non-drivable area, taking into account the three-dimensional information I_d of the three-dimensional region R_d shown in FIG. 3C, will be described.
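 As a concrete illustration of how the three-dimensional image F_d could be assembled from the parallax image, the sketch below applies the standard pinhole stereo relations. The parameter names, the handling of invalid pixels, and the channel ordering are assumptions for the example; the description above only states that X, Y, and Z are derived from the parallax, the focal length, the image sensor size, and the baseline length.

```python
import numpy as np

def disparity_to_xyz(disparity, f_px, baseline_m, cx, cy):
    """Convert a disparity map (in pixels) into per-pixel X, Y, Z channels,
    forming a 3-D image F_d of shape (h, w, 3).

    f_px       : focal length expressed in pixels
    baseline_m : stereo baseline in meters
    cx, cy     : principal point of the reference (right) camera
    Pixels with non-positive disparity are treated as invalid (NaN)."""
    h, w = disparity.shape
    u = np.tile(np.arange(w), (h, 1)).astype(np.float64)
    v = np.tile(np.arange(h)[:, None], (1, w)).astype(np.float64)
    valid = disparity > 0
    Z = np.full((h, w), np.nan)
    Z[valid] = f_px * baseline_m / disparity[valid]   # depth distance Z
    X = (u - cx) * Z / f_px                           # lateral distance X
    Y = (v - cy) * Z / f_px                           # vertical distance Y
    return np.stack([X, Y, Z], axis=-1)
```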
 In the texture feature amount extraction process (step S2), the texture feature amount fet_t is extracted using the information acquired in step S1.
 First, the outline of the neural network Nt that determines drivability from the texture image F_t will be described with reference to FIG. 4. The neural network Nt in FIG. 4 has been trained so that, when an arbitrary local region R of the same size as the texture region R_t in FIG. 3A is input, it can determine whether that local region R is a drivable area. A training data set with ground-truth labels is used for this training. The neural network Nt in FIG. 4 consists of a feature extraction layer N1t in the first stage and a classification layer N2t in the second stage. The first-stage layer N1t is composed of many convolution kernels and ReLU activation functions, and extracts from the local region R a texture feature amount fet_t that is effective for classification. The second-stage layer N2t determines whether the local region R is a drivable area by applying fully connected layers and a softmax activation function to the texture feature amount fet_t extracted by the first-stage layer N1t.
 Therefore, as shown in FIG. 5A, the texture feature amount fet_t of the texture region R_t can be calculated by inputting the texture information I_t of the texture region R_t to the feature extraction layer N1t.
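 The following is a minimal PyTorch sketch of a network of the kind described for Nt: a convolution-plus-ReLU feature extraction stage standing in for the layer N1t and a fully connected softmax stage standing in for the layer N2t. The patch size, channel counts, and feature dimension are assumptions for the example, not values taken from the disclosure.

```python
import torch
import torch.nn as nn

class N1t(nn.Module):
    """Feature extraction stage: convolution + ReLU layers producing fet_t."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, feat_dim)

    def forward(self, patch):                # patch: (B, 3, 32, 32) RGB local region
        x = self.conv(patch).flatten(1)      # (B, 32)
        return self.proj(x)                  # fet_t: (B, feat_dim)

class N2t(nn.Module):
    """Classification stage: fully connected layer + softmax (drivable / not)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2)

    def forward(self, fet_t):
        return torch.softmax(self.fc(fet_t), dim=1)
```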
 In the three-dimensional feature amount extraction process (step S3), as in step S2, the three-dimensional feature amount fet_d is extracted using the information acquired in step S1. This step uses a neural network Nd that determines drivability from the three-dimensional image F_d. The neural network Nd has been trained so that, when an arbitrary local region R of the same size as the three-dimensional region R_d in FIG. 3C is input, it can determine whether that local region R is a drivable area; like the neural network Nt in FIG. 4, it consists of a feature extraction layer N1d in the first stage and a classification layer N2d in the second stage.
 Therefore, as shown in FIG. 5B, the three-dimensional feature amount fet_d of the three-dimensional region R_d can be calculated by inputting to the feature extraction layer N1d the three-dimensional information I_d of the three-dimensional region R_d in FIG. 3C, which corresponds to the texture region R_t in FIG. 3A. The layer N1t for the texture region R_t (FIG. 5A) and the layer N1d for the three-dimensional region R_d (FIG. 5B) extract feature amounts of the same number of dimensions.
 In the weight calculation process (step S4), the weight w_t for the texture feature amount fet_t extracted in step S2 and the weight w_d for the three-dimensional feature amount fet_d extracted in step S3 are calculated. In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
 The texture reference feature amount B_t is used to calculate the weight w_t of the texture feature amount fet_t. First, the method of determining the texture reference feature amount B_t is explained with reference to FIG. 6; the neural network Nt of FIG. 4 is used for this determination. In FIG. 6, R1, R2, and R3 denote local regions of the verification data set used for calculating the recognition rate, and E1, E2, and E3 denote the recognition results of the neural network Nt when the local regions R1, R2, and R3 are input. In FIG. 6, the recognition result E2 for the local region R2 is correct, while the recognition results E1 and E3 for the local regions R1 and R3 are wrong. In this case, the texture feature amount fet_t2 output by the first-stage layer N1t when the local region R2 was input is chosen as the texture reference feature amount B_t and stored in the texture reference feature amount storage unit 31. The texture reference feature amount B_t determined in this way is not a variable that changes with the position of the texture region R_t, but a constant set in advance before step S4 is executed.
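 As an illustration of this selection, the sketch below keeps only the feature amounts of correctly recognized validation regions and, when recognition scores are available, returns the one with the highest score (the score-based variant is mentioned again at the end of the embodiments). The function name and the tie-breaking rule without scores are assumptions for the example.

```python
import numpy as np

def pick_reference_feature(features, predictions, labels, scores=None):
    """Choose a reference feature amount B from validation results: keep the
    features of correctly recognized regions and, if recognition scores are
    given, take the one with the highest score (cf. FIG. 6)."""
    correct = [i for i, (p, y) in enumerate(zip(predictions, labels)) if p == y]
    if not correct:
        return None                          # no successful recognition
    if scores is not None:
        best = max(correct, key=lambda i: scores[i])
    else:
        best = correct[0]                    # assumed tie-breaking rule
    return np.asarray(features[best])
```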
 Therefore, in this step, the weight w_t for the texture feature amount fet_t in an arbitrary texture region R_t can be calculated by taking the inner product, according to (Equation 1), of the texture feature amount fet_t extracted in step S2 and the constant texture reference feature amount B_t.
 $w_t = \sum_{i=1}^{D} \mathrm{fet}_{t,i}\, B_{t,i}$   (Equation 1)
 Here, D in (Equation 1) denotes the number of dimensions of the texture reference feature amount B_t. The result of (Equation 1) represents the correlation value between the texture feature amount fet_t and the texture reference feature amount B_t, that is, their similarity.
 By performing the same procedure on the three-dimensional region R_d of the three-dimensional image F_d, the three-dimensional reference feature amount B_d is obtained, and the weight w_d for the three-dimensional feature amount fet_d extracted in step S3 can be calculated using (Equation 2).
 $w_d = \sum_{i=1}^{D} \mathrm{fet}_{d,i}\, B_{d,i}$   (Equation 2)
 In the feature amount integration process (step S5), the integrated feature amount fet_C is calculated by combining the texture feature amount fet_t extracted in step S2 and the three-dimensional feature amount fet_d extracted in step S3, using the weights w_t and w_d calculated in step S4. The feature amounts are integrated according to (Equation 3) below.
 $\mathrm{fet}_C = w_t\, \mathrm{fet}_t + w_d\, \mathrm{fet}_d$   (Equation 3)
 In the type determination process (step S6), whether the region is a drivable area is determined based on the integrated feature amount fet_C calculated in step S5. A neural network N3 is used for this determination. FIG. 7 shows a conceptual diagram of this step: the integrated feature amount fet_C is input to the neural network N3, which judges whether the region is a drivable area. The neural network N3 consists of a feature extraction part made up of many convolution layers and ReLU activation functions, and a classification part made up of fully connected layers and a softmax activation function. Whereas the layer N2t described above was trained on a data set that takes the texture feature amount fet_t as input, and the layer N2d was trained on a data set that takes the three-dimensional feature amount fet_d as input, the neural network N3 in FIG. 7 is assumed to be trained on a data set that takes the integrated feature amount fet_C as input.
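 The sketch below ties steps S5 and S6 together: the integration of (Equation 3) followed by a stand-in for the neural network N3. Because the integrated feature amount is treated here as a flat vector, fully connected layers are used in place of the convolutional part of N3; this simplification, the layer sizes, and the two-class output are assumptions for the example.

```python
import torch
import torch.nn as nn

def integrate(fet_t, fet_d, w_t, w_d):
    """(Equation 3): weighted sum of the texture and 3-D feature amounts."""
    return w_t * fet_t + w_d * fet_d         # fet_C, same dimension as the inputs

class N3(nn.Module):
    """Stand-in for N3: classifies the integrated feature amount fet_C."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(),
            nn.Linear(32, 2),                # drivable / non-drivable
        )

    def forward(self, fet_c):
        return torch.softmax(self.head(fet_c), dim=1)
```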
 By performing steps S1 through S6 in order, it can be determined whether the texture region R_t in FIG. 3A is drivable, while the weights w_t and w_d are adjusted as appropriate. By likewise performing steps S1 through S6 for regions other than the texture region R_t in FIG. 3A, the drivable-area determination can be carried out over the entire texture image F_t.
 As described above, the object recognition device 100 of this embodiment can change the weights of the feature amounts for each pixel of the image. As a result, even when the texture image F_t contains both objects that should be judged mainly from the texture information I_t and objects that should be judged mainly from the three-dimensional information I_d, different weights can be set for each object, and the recognition accuracy can be improved.
 In the object recognition device 100 of this embodiment, the weight of a feature amount is calculated by comparing the feature amount fet with the reference feature amount B determined in advance. This eliminates the need for additional image analysis, such as analyzing the luminance values of the entire image, for weight calculation, making the weight calculation more efficient.
 In the object recognition device 100 of this embodiment, the weights are calculated as the inner product of the reference feature amount and the feature amount. Since the inner product can be computed with multiply-accumulate operations alone, the weights can be calculated with a small amount of computation.
 In the object recognition device 100 of this embodiment, different neural network layers N1t and N1d are used to calculate the reference feature amounts for the texture information I_t and the three-dimensional information I_d, respectively. The texture reference feature amount B_t can be determined from the texture information I_t alone, and the three-dimensional reference feature amount from the three-dimensional information I_d alone, so the weights can be calculated more accurately.
 In the object recognition device 100 of this embodiment, as illustrated in FIG. 6, the reference feature amount B is generated based on the recognition rate on the verification data, so a successfully recognized feature amount can be selected as the reference feature amount. This makes it possible to calculate weights that favor feature amounts similar to those that were successfully recognized, enabling more accurate recognition.
 In the weight calculation process (step S4) of this embodiment, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are always calculated, but the process can be changed to the weight calculation method shown in FIG. 8. First, in the three-dimensional information validity determination process (step S41), the validity of the acquired three-dimensional information I_d is judged. For pixels for which the three-dimensional information I_d could not be obtained, or for which the cost of the parallax calculation is equal to or greater than a predetermined value, the three-dimensional image F_d is judged to be invalid. If it is judged invalid, the process proceeds to step S6 without calculating the weight w_t of the texture information I_t or the weight w_d of the three-dimensional information I_d. In this case, step S6 uses the layer N2t of the neural network in FIG. 4 to determine drivability based only on the texture feature amount fet_t.
 On the other hand, when the three-dimensional image F_d is judged to be valid, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are calculated (step S42), and the processing from step S5 onward is executed using the weights w_t and w_d.
 In this way, when the three-dimensional information cannot be acquired, or when the reliability of the acquired three-dimensional information is extremely low, the processing load can be reduced by not calculating the weights.
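 A compact sketch of the branch in FIG. 8 is given below. The cost threshold, the classifiers passed in as callables, and the reference feature amounts given as arguments are assumptions for the example.

```python
import numpy as np

def classify_region(fet_t, fet_d, disparity_cost, B_t, B_d,
                    texture_only_clf, integrated_clf, cost_threshold=50.0):
    """When the 3-D information is missing or its matching cost is too high,
    skip the weight calculation and fall back to a texture-only decision."""
    if fet_d is None or disparity_cost >= cost_threshold:
        return texture_only_clf(fet_t)        # layer N2t, texture feature only
    w_t = float(np.dot(fet_t, B_t))           # step S42
    w_d = float(np.dot(fet_d, B_d))
    fet_c = w_t * fet_t + w_d * fet_d         # step S5 (Equation 3)
    return integrated_clf(fet_c)              # step S6
```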
 Next, the object recognition device 100 according to a second embodiment of the present invention will be described with reference to FIGS. 9 to 11. Descriptions of points in common with the first embodiment are omitted.
 In the first embodiment, one reference feature amount B was set for each feature amount fet (see FIG. 6), whereas in the second embodiment a plurality of reference feature amounts can be set for each feature amount fet. Accordingly, in this embodiment the weight calculation process (step S4) consists of a reference feature amount inner product calculation (step S4a) and an average value calculation process (step S4b), as shown in FIG. 9.
 First, in the reference feature amount inner product calculation (step S4a), the inner product with the feature amount fet is calculated for each of the plurality of reference feature amounts B set for that feature amount. The method of setting the plurality of reference feature amounts B is explained with reference to FIG. 10. In the following, the method of setting the texture reference feature amounts B_t corresponding to the texture image F_t is described; the three-dimensional reference feature amounts B_d can be set in the same way, so their description is omitted. The symbols in FIG. 10 have the same meaning as those in FIG. 6. The difference between the two figures is that in FIG. 6 only the recognition result E2 based on the local region R2 was correct and both recognition results E1 and E3 based on the local regions R1 and R3 were wrong, whereas in FIG. 10 both recognition results E1 and E2 based on the local regions R1 and R2 are correct and only the recognition result E3 based on the local region R3 is wrong.
 In this embodiment, therefore, both of the texture feature amounts fet_t1 and fet_t2 derived from the successfully classified local regions R1 and R2 are set as the texture reference feature amounts B_t1 and B_t2, and both are used to compute the weight w_t of the texture feature amount fet_t of an arbitrary texture region R_t. In step S4a, as shown in FIG. 11, the inner product value S_t1 of the texture feature amount fet_t extracted from the texture region R_t and the texture reference feature amount B_t1, and the inner product value S_t2 of the texture feature amount fet_t and the texture reference feature amount B_t2, are calculated. That is, similarity information between each reference feature amount and the texture feature amount fet_t of the texture region R_t is computed.
 Next, in the average value calculation process (step S4b), the weight w_t for the texture feature amount fet_t is calculated from the plurality of inner product values calculated in step S4a, using (Equation 4) below.
 $w_t = \dfrac{1}{|A_t|} \sum_{a \in A_t} \sum_{i=1}^{D} \mathrm{fet}_{t,i}\, B^{a}_{t,i}$   (Equation 4)
 Here, A_t denotes the index set of the texture reference feature amounts B_t. Calculating according to (Equation 4) yields the average of the inner product values with the texture reference feature amounts B_t. In the second embodiment, this average of the inner product values calculated from the plurality of texture reference feature amounts B_t is used as the weight w_t for the texture feature amount fet_t. The calculation of the weight w_t of the texture feature amount fet_t has been described above; the same procedure is applied to the three-dimensional feature amount fet_d, calculating a plurality of three-dimensional reference feature amounts B_d and obtaining the weight w_d as the average of the inner products using (Equation 5).
 $w_d = \dfrac{1}{|A_d|} \sum_{a \in A_d} \sum_{i=1}^{D} \mathrm{fet}_{d,i}\, B^{a}_{d,i}$   (Equation 5)
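 The following is a short NumPy sketch of (Equation 4) and (Equation 5): the weight is the mean of the inner products between the feature amount and every reference feature amount in the index set. The 64-dimensional placeholder vectors are assumptions for the example.

```python
import numpy as np

def averaged_weight(fet, B_refs):
    """Average of the inner products between the feature amount `fet` and
    every reference feature amount in the list `B_refs` (Equations 4 and 5)."""
    B = np.stack(B_refs)                 # shape (|A|, D)
    return float(np.mean(B @ fet))       # mean of the |A| inner products

# Hypothetical usage with two texture reference feature amounts (embodiment 2).
fet_t = np.random.rand(64)
w_t = averaged_weight(fet_t, [np.random.rand(64), np.random.rand(64)])
```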
 In this embodiment, a plurality of reference feature amounts B are set for each feature amount fet, and the average of their inner product values is used as the weight w of that feature amount. This makes it possible to calculate the weights from multiple reference feature amounts rather than a single one, so the weight calculation can be performed more robustly.
 Next, the object recognition device 100 according to a third embodiment of the present invention will be described with reference to FIGS. 12 to 14. Descriptions of points in common with the above embodiments are omitted.
 In the first embodiment, a neural network was used in each of the three steps shown in FIG. 2: the texture feature amount extraction process (step S2), the three-dimensional feature amount extraction process (step S3), and the type determination process (step S6). In other words, the first embodiment used three separate neural networks to execute the processing of FIG. 2. In this embodiment, by contrast, the processing of FIG. 2 is executed with a single neural network N that incorporates the functions of those networks as layers.
 FIG. 12 shows the configuration of the neural network N of this embodiment. The correspondence between the processing flow of FIG. 2 and the neural network N is described below; since the input information acquisition process is the same as in the first embodiment, the description starts from the subsequent processes.
 The neural network N shown in FIG. 12 first takes the texture region R_t and the three-dimensional region R_d as inputs and performs the texture feature amount extraction process (step S2) and the three-dimensional feature amount extraction process (step S3). In the texture feature amount extraction process (step S2), the layer N1t of the neural network N is used to extract the texture feature amount fet_t. This layer N1t is composed of many convolution layers and ReLU activation functions. Similarly, in the three-dimensional feature amount extraction process (step S3), the layer N1d of the neural network N is used to extract the three-dimensional feature amount fet_d. The texture feature amount fet_t and the three-dimensional feature amount fet_d extracted by the layers N1t and N1d have the same number of dimensions.
 In the weight calculation process (step S4), the layers N4t and N5t of the neural network N are used to calculate the weight w_t of the texture feature amount fet_t, and the layers N4d and N5d are used to calculate the weight w_d of the three-dimensional feature amount fet_d. In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
 FIG. 13 shows the details of the configuration of the layers N4t and N5t that calculate the weight w_t for the texture feature amount fet_t. The processing in the layers N4t and N5t follows the processing flow shown in FIG. 14: the processing in the layer N4t corresponds to the reference feature amount inner product calculation (step S4c), and the processing in the layer N5t corresponds to the reference similarity inner product calculation (step S4d).
 First, the reference feature amount inner product calculation (step S4c) is described. The texture reference feature amounts B_t1, B_t2, ..., B_tn shown for the layer N4t are the kernels of that layer; they are estimated by training the neural network N as described later. The layer N4t calculates the inner products of the texture feature amount fet_t with B_t1, B_t2, ..., B_tn, and outputs a vector vec whose elements are the inner product values with the respective kernels. Since each element of vec is an inner product with, that is, a correlation value against, a reference feature amount, vec is in effect a vector expressing the similarity to each reference feature amount. This processing can be realized by a convolution operation with 1x1 kernels.
 Next, the reference similarity inner product calculation (step S4d) is performed. The reference similarity C in the layer N5t is a vector of the same dimension as vec, and it stores the relationship between each reference feature amount and the feature amounts that should be actively used, that is, given a larger weight. Specifically, the first element of vec stores the similarity between the texture feature amount fet_t and the texture reference feature amount B_t1, and the second element stores the similarity between the texture feature amount fet_t and the texture reference feature amount B_t2. If the condition for actively using the calculated feature amount, that is, for increasing the weight w_t, is that it is similar to the texture reference feature amount B_t1 but not similar to the texture reference feature amount B_t2, then the first element of the reference similarity C holds a positive value and the second element holds a negative value. The reference similarity C is estimated by the training described later. The layer N5t calculates the inner product of the vector vec and the reference similarity C; this processing is also realized by a convolution operation with a 1x1 kernel, and the resulting inner product value is the weight w_t. The same processing is applied to the three-dimensional feature amount fet_d using the layers N4d and N5d to calculate the corresponding weight w_d.
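 The sketch below shows how the layers N4t and N5t can be expressed as two 1x1 convolutions in PyTorch: the first holds the reference feature amounts B_t1, ..., B_tn as its kernels and outputs the similarity vector vec, and the second holds the reference similarity C as its kernel and outputs the weight w_t. The feature dimension and the number of reference feature amounts are assumptions for the example.

```python
import torch
import torch.nn as nn

feat_dim, n_refs = 64, 8                 # assumed sizes
n4t = nn.Conv2d(feat_dim, n_refs, kernel_size=1, bias=False)  # kernels = B_t1..B_tn
n5t = nn.Conv2d(n_refs, 1, kernel_size=1, bias=False)         # kernel  = C

fet_t = torch.randn(1, feat_dim, 1, 1)   # texture feature amount of one local region
vec = n4t(fet_t)                         # inner products with each reference feature
w_t = n5t(vec)                           # inner product of vec with C -> weight w_t
```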
 In the feature amount integration process (step S5), the feature amounts are integrated by the layer N6 of the neural network N. The layer N6 performs the same calculation as (Equation 3) above and outputs the integrated feature amount fet_C.
 In the type determination process (step S6), the layer N3 of the neural network N is used to determine from the integrated feature amount fet_C whether the region is a drivable area. The layer N3 consists of layers made up of convolution layers and ReLU activation functions, followed by fully connected layers and a softmax activation, and performs the type determination.
 Next, the training method of the neural network N is described. In training, the cross entropy between the output value of the layer N3 and the ground-truth value is used as the error function. Since all layers of the neural network N are differentiable, the network can be trained by updating the kernel parameters so as to reduce the error function defined on the output of the layer N3. As a result, the reference feature amounts used in the layers N4t and N4d and the reference similarities used in the layers N5t and N5d are estimated so that the error function is minimized; that is, reference feature amounts and reference similarities that maximize the recognition rate on the training data can be estimated.
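 A minimal training sketch under these assumptions is shown below. The model, data loader, optimizer, and hyperparameters are placeholders; the model is assumed to return the pre-softmax output of the layer N3, since CrossEntropyLoss applies the softmax internally.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """End-to-end training: all layers of N (N1t, N1d, N4t/N5t, N4d/N5d,
    N6, N3) sit inside one differentiable model, so the reference feature
    amounts and reference similarities are updated with the other kernels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()          # cross entropy against ground truth
    for _ in range(epochs):
        for patch_t, patch_d, label in loader:
            logits = model(patch_t, patch_d) # pre-softmax output of layer N3
            loss = loss_fn(logits, label)
            opt.zero_grad()
            loss.backward()                  # gradients reach B_t*, C, ...
            opt.step()
```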
 In the third embodiment, the weights are calculated based not only on the reference feature amounts but also on the reference similarities. This makes it possible to compute weights that take into account not only similarity to a reference feature amount but also dissimilarity to it, enabling weighting that handles more complex conditions and improving performance.
 In the third embodiment, the drivable area is also determined by a single neural network whose kernels include the reference feature amounts and the reference similarities, and training is performed with an error function defined on the output of the network. As a result, the reference feature amounts and reference similarities can be estimated so as to maximize the final recognition rate, enabling more accurate recognition.
 Furthermore, in the third embodiment, a single neural network is used to calculate the feature amounts, estimate the weights, and estimate the object type. This eliminates the need to train multiple neural networks individually, shortening the training time and reducing the work required of the designer.
 Next, the object recognition device 100 according to a fourth embodiment of the present invention will be described with reference to FIG. 15. Descriptions of points in common with the above embodiments are omitted.
 The difference between the first and fourth embodiments lies in the weight calculation process (step S4), so only step S4 is described below. In the first embodiment, the weights w were calculated by processing the texture image F_t of each captured frame independently; in this embodiment, the weights of the current frame are calculated by referring to weight information calculated in the past.
 FIG. 15 shows the weight calculation process (step S4) of this embodiment. As shown there, step S4 of this embodiment consists of a past frame position calculation process (step S4e) and a weight average value calculation process (step S4f).
 First, in the past frame position calculation process (step S4e), the position in the image of a past frame that corresponds to the image region to be recognized in the current frame is calculated. The corresponding position in the past frame may be predicted from information such as the vehicle speed and yaw rate, or it may be identified by extracting feature points from the images, associating the feature points between the previous time and the current time, and computing the camera motion from these correspondences.
 In the weight average value calculation process (step S4f), the weight of the image region to be recognized in the current frame is calculated using the weights around the image region of the past frame identified in step S4e. A radius Rpix is defined around the identified image region of the past frame, and the average of the past weights contained within that radius is used as the weight for the current frame. This processing is carried out for the weight calculation of both the texture feature amount fet_t and the three-dimensional feature amount fet_d.
 In the fourth embodiment, the weights used for the current frame are determined from weights calculated in the past. This eliminates the need to calculate the weights anew in the current frame, reducing the processing load.
 In the first and second embodiments, the reference feature amount B was selected based on whether recognition of the verification data succeeded or failed, but the reference feature amount may instead be selected based on the recognition score. Specifically, when one reference feature amount is to be determined, the feature amount with the highest classification score may be used, and when N reference feature amounts are to be determined, the feature amounts with the top N classification scores may be used.
 Although the present invention has been described above, the present invention is not limited to the above embodiments. Various modifications that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the spirit of the invention.
100: object recognition device, 1: input signal acquisition unit, 11: image acquisition unit, 12: three-dimensional information acquisition unit, 2: feature amount calculation unit, 21: texture feature amount calculation unit, 22: three-dimensional feature amount calculation unit, 3: storage unit, 31: texture reference feature amount storage unit, 32: three-dimensional reference feature amount storage unit, 4: weight parameter generation unit, 5: object recognition unit, F_t: texture image, R_t: texture region, fet_t: texture feature amount, B_t: texture reference feature amount, F_d: three-dimensional image, R_d: three-dimensional region, fet_d: three-dimensional feature amount, B_d: three-dimensional reference feature amount, fet_C: integrated feature amount, w: weight

Claims (10)

  1.  An object recognition device comprising:
     an input signal acquisition unit that acquires texture information and three-dimensional information of an image;
     a feature amount calculation unit that calculates a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region;
     a weight parameter generation unit that generates a weight parameter for each partial region; and
     an object recognition unit that generates an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter, and recognizes an object in the image based on the integrated feature amount.
  2.  The object recognition device according to claim 1, further comprising a storage unit that stores a texture reference feature amount corresponding to the texture feature amount and a three-dimensional reference feature amount corresponding to the three-dimensional feature amount,
     wherein the weight parameter generation unit generates the weight parameter for each partial region based on results of comparing the texture feature amount and the three-dimensional feature amount with the texture reference feature amount and the three-dimensional reference feature amount, respectively.
  3.  The object recognition device according to claim 2, wherein the storage unit stores a plurality of the texture reference feature amounts and a plurality of the three-dimensional reference feature amounts, and
     the weight parameter generation unit obtains the weight parameter for each partial region based on results of comparing the texture feature amount and the three-dimensional feature amount with the plurality of texture reference feature amounts and the plurality of three-dimensional reference feature amounts, respectively.
  4.  The object recognition device according to claim 2, wherein the weight calculation unit uses, as the weight parameters, the inner product values of the texture feature amount with the texture reference feature amount and of the three-dimensional feature amount with the three-dimensional reference feature amount, respectively.
  5.  The object recognition device according to claim 2, wherein the texture reference feature amount is calculated using a texture classifier that uses the texture feature amount, and
     the three-dimensional reference feature amount is calculated using a three-dimensional classifier that uses the three-dimensional feature amount.
  6.  The object recognition device according to claim 2, wherein the feature amount calculation unit, the weight parameter generation unit, the storage unit, and the object recognition unit are constituted by a single neural network.
  7.  The object recognition device according to claim 2, wherein the texture reference feature amount and the three-dimensional reference feature amount are generated in advance based on a recognition rate for verification data.
  8.  The object recognition device according to claim 1, wherein the weight calculation unit calculates the weight parameter for regions from which the three-dimensional information is acquired, and
     the object recognition unit recognizes the object in the image based on the texture feature amount for regions from which the three-dimensional information is not acquired.
  9.  The object recognition device according to claim 1, wherein the weight parameter generation unit determines the current weight parameter from past weight parameters.
  10.  An object recognition method comprising the steps of:
     acquiring texture information and three-dimensional information of an image;
     calculating a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region of the image;
     generating a weight parameter for each partial region;
     generating an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter; and
     recognizing an object in the image based on the integrated feature amount.
PCT/JP2022/004511 2021-05-21 2022-02-04 Object recognition device and object recognition method WO2022244333A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
DE112022001417.2T DE112022001417T5 (en) 2021-05-21 2022-02-04 OBJECT RECOGNITION DEVICE AND OBJECT RECOGNITION METHOD

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021086154A JP2022178981A (en) 2021-05-21 2021-05-21 Object recognition device and object recognition method
JP2021-086154 2021-05-21

Publications (1)

Publication Number Publication Date
WO2022244333A1 true WO2022244333A1 (en) 2022-11-24

Family

ID=84140182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/004511 WO2022244333A1 (en) 2021-05-21 2022-02-04 Object recognition device and object recognition method

Country Status (3)

Country Link
JP (1) JP2022178981A (en)
DE (1) DE112022001417T5 (en)
WO (1) WO2022244333A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014006588A (en) * 2012-06-21 2014-01-16 Toyota Central R&D Labs Inc Road surface boundary estimation device and program
JP2018124177A (en) * 2017-02-01 2018-08-09 トヨタ自動車株式会社 Floor surface determination method
JP2019148889A (en) * 2018-02-26 2019-09-05 株式会社Soken Road boundary detection device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4506409B2 (en) 2004-10-27 2010-07-21 株式会社デンソー Region dividing method and apparatus, image recognition processing apparatus, program, and recording medium


Also Published As

Publication number Publication date
JP2022178981A (en) 2022-12-02
DE112022001417T5 (en) 2024-01-11

Similar Documents

Publication Publication Date Title
US11145078B2 (en) Depth information determining method and related apparatus
US11657525B2 (en) Extracting information from images
JP6832504B2 (en) Object tracking methods, object tracking devices and programs
EP2757527B1 (en) System and method for distorted camera image correction
US20230252662A1 (en) Extracting information from images
CN111932580A (en) Road 3D vehicle tracking method and system based on Kalman filtering and Hungary algorithm
Hoang et al. Enhanced detection and recognition of road markings based on adaptive region of interest and deep learning
JP2016029564A (en) Target detection method and target detector
US10789515B2 (en) Image analysis device, neural network device, learning device and computer program product
WO2016179808A1 (en) An apparatus and a method for face parts and face detection
CN106611147B (en) Car tracing method and apparatus
Huang et al. ES-Net: An efficient stereo matching network
Kim et al. Adversarial confidence estimation networks for robust stereo matching
WO2022244333A1 (en) Object recognition device and object recognition method
CN111291607B (en) Driver distraction detection method, driver distraction detection device, computer equipment and storage medium
Alvarez et al. Novel index for objective evaluation of road detection algorithms
CN116434156A (en) Target detection method, storage medium, road side equipment and automatic driving system
Zabihi et al. Frame-rate vehicle detection within the attentional visual area of drivers
KR102609829B1 (en) Stereo Matching Confidence Estimation Apparatus And Method Using Generative Adversarial Network
CN113379787B (en) Target tracking method based on 3D convolution twin neural network and template updating
Onkarappa et al. On-board monocular vision system pose estimation through a dense optical flow
US20240153120A1 (en) Method to determine the depth from images by self-adaptive learning of a neural network and system thereof
WO2021024905A1 (en) Image processing device, monitoring device, control system, image processing method, computer program, and recording medium
US20230401733A1 (en) Method for training autoencoder, electronic device, and storage medium
CN115063594B (en) Feature extraction method and device based on automatic driving

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22804255

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 112022001417

Country of ref document: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22804255

Country of ref document: EP

Kind code of ref document: A1