WO2022244333A1 - Object recognition device and object recognition method - Google Patents

Object recognition device and object recognition method

Info

Publication number
WO2022244333A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature amount
texture
dimensional
object recognition
image
Prior art date
Application number
PCT/JP2022/004511
Other languages
French (fr)
Japanese (ja)
Inventor
健 遠藤
春樹 的野
健 永崎
Original Assignee
Hitachi Astemo, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Astemo, Ltd.
Priority to DE112022001417.2T (published as DE112022001417T5)
Publication of WO2022244333A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/00: Scenes; scene-specific elements
    • G06V 20/60: Type of objects
    • G06V 20/64: Three-dimensional objects
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Definitions

  • The present invention relates to an object recognition device and an object recognition method that analyze an image captured by an in-vehicle camera and recognize objects around the vehicle.
  • The Lane Keep Assist System (LKAS), which prevents deviation from the roadway, is expected to serve as a driving support system that prevents single-vehicle accidents caused by drowsiness or inattention while driving.
  • As a prerequisite for realizing appropriate steering support control, this lane-keeping support system is equipped with a function that analyzes the image captured by the vehicle-mounted camera and divides the area within the image into a drivable area and an undrivable area.
  • Patent Document 1 (JP 2006-127024 A) is known as a prior-art document that discloses a region segmentation method.
  • The abstract of this document states that 'an acquired target image is divided into regions using mutually different feature quantities, and a generated-region group consisting of a plurality of generated regions characterized by the feature quantity used for the division is produced for each type of feature quantity (S130). One of the generated-region groups is defined as the basic region group, and the feature quantities of generated regions that overlap a basic region are incorporated into the feature quantities of that basic region, thereby producing a single basic-region group consisting of basic regions characterized by multiple types of feature quantities (S140). As an index of the similarity between a basic region of interest Rtar and an adjacent basic region Rk, the weighted Euclidean distance between the two regions in the multidimensional feature space represented by the feature quantities is calculated, and the two regions are merged when this distance satisfies an integration condition (S150).' The document thus discloses a region segmentation method that uses not only the texture information of the target image but also distance information (a weighted Euclidean distance).
  • In this way, Patent Document 1 extracts feature quantities from the target image and the distance information and performs the association based on distances weighted for each feature quantity. More specifically, as described in FIG. 3 and paragraphs 0046 to 0050 of that document, the weight is calculated from the brightness of the entire target image, and the same weight is used for every region when segmenting a given image.
  • When the drivable area is segmented as preprocessing for lane keeping support control, the regions must be segmented correctly even in scenes in which many objects are captured, and how heavily texture information and distance information should each be weighted is considered to depend on the type of object being segmented. Therefore, in Patent Document 1, in which a single target image is segmented using a single weight, the weight cannot be changed for each imaged object, and when a weight that deviates from the weight that should be used is applied, it is difficult to correctly segment the target image into regions.
  • In view of this problem, the present invention aims to provide an object recognition device and an object recognition method that recognize the travelable area with high accuracy by changing the weight for each image region.
  • To this end, the object recognition device of the present invention comprises: an input signal acquisition unit that acquires texture information and three-dimensional information of an image; a feature amount calculation unit that calculates a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region; a weight parameter generation unit that generates a weight parameter for each partial region; and an object recognition unit that generates an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter, and recognizes the target object in the image based on the integrated feature amount.
  • According to the object recognition device and the object recognition method of the present invention, the travelable area can be recognized with high accuracy by changing the weight for each image region.
  • FIG. 1: Functional block diagram of the object recognition device of Example 1.
  • FIG. 2: Processing flowchart of the object recognition device of Example 1.
  • FIG. 3A: Example of a texture image acquired by the object recognition device of Example 1.
  • FIG. 3B: Data structure of the texture information of each pixel of the texture image of FIG. 3A.
  • FIG. 3C: Example of a three-dimensional image acquired by the object recognition device of Example 1.
  • FIG. 3D: Data structure of the three-dimensional information of each pixel of the three-dimensional image of FIG. 3C.
  • FIG. 4: Schematic explanatory diagram of a neural network of the object recognition device of Example 1.
  • FIG. 5A: Texture feature amount extraction processing using the neural network of Example 1.
  • FIG. 5B: Three-dimensional feature amount extraction processing using the neural network of Example 1.
  • FIG. 6: Example of a method for determining a reference feature amount in the object recognition device of Example 1.
  • FIG. 7: Diagram explaining the travelability determination processing of the object recognition device of Example 1.
  • FIG. 8: Example of the weight calculation processing flow of the object recognition device of Example 1.
  • FIG. 9: Example of the weight calculation processing flow of the object recognition device of Example 2.
  • FIG. 10: Example of a method for determining a reference feature amount in the object recognition device of Example 2.
  • FIG. 11: Example of a method for calculating a weight from a reference feature amount in the object recognition device of Example 2.
  • FIG. 12: Explanatory diagram of a neural network of the object recognition device of Example 3.
  • FIG. 13: Diagram showing the weight calculation layers in the neural network of FIG. 12.
  • FIG. 14: Example of the weight calculation processing flow of the object recognition device of Example 3.
  • FIG. 15: Example of the weight calculation processing flow of the object recognition device of Example 4.
  • An object recognition device 100 according to Example 1 of the present invention will be described below with reference to FIGS. 1 to 8.
  • FIG. 1 is a functional block diagram showing the configuration of the object recognition device 100 of Example 1.
  • The object recognition device 100 includes, as hardware, an external sensor such as an in-vehicle camera, an arithmetic device such as a CPU, and a storage device such as a semiconductor memory; the arithmetic device executes a control program stored in the storage device, whereby the various functions shown in the figure operate. Since implementing functions by executing programs is well-known technology, the specific operation of hardware such as the arithmetic device is not described further below.
  • As shown in FIG. 1, the object recognition device 100 includes, as functional units realized by the above hardware, an input signal acquisition unit 1, a feature amount calculation unit 2, a storage unit 3, a weight parameter generation unit 4, and an object recognition unit 5. Each unit is described in order below.
  • The input signal acquisition unit 1 has an image acquisition unit 11 and a three-dimensional information acquisition unit 12.
  • The image acquisition unit 11 acquires a texture image F_t for each frame captured by the vehicle-mounted camera. If the vehicle-mounted camera is a monocular camera, the image acquisition unit 11 acquires one texture image F_t per captured frame; if it is a stereo camera, it acquires two (left and right) texture images F_t per captured frame.
  • If the image acquisition unit 11 acquires the two left and right texture images F_t from a stereo camera, the three-dimensional information acquisition unit 12 generates three-dimensional information I_d for each pixel using a well-known parallax calculation method. If the image acquisition unit 11 acquires a single texture image F_t from a monocular camera, the three-dimensional information acquisition unit 12 acquires the per-pixel three-dimensional information I_d from a millimeter-wave radar or LiDAR installed alongside the monocular camera.
  • The feature amount calculation unit 2 has a texture feature amount calculation unit 21 and a three-dimensional feature amount calculation unit 22.
  • The texture feature amount calculation unit 21 calculates a texture feature amount fet_t from the texture image F_t acquired by the image acquisition unit 11. For this feature amount, a HoG feature using edges may be used, or an ICF feature based on machine learning may be used. Alternatively, a feature extracted by a convolutional neural network (hereinafter simply referred to as 'neural network N'), described later, may be used.
  • The three-dimensional feature amount calculation unit 22 calculates a three-dimensional feature amount fet_d from the three-dimensional information I_d acquired by the three-dimensional information acquisition unit 12. For this feature amount, a HoG feature computed on a distance image obtained by projecting the distance information onto the image, an ICF feature of a three-dimensional image F_d in which the three-dimensional information is stored as image channels, or a feature extracted by a neural network that takes the three-dimensional information I_d as input may be used.
  • The storage unit 3 has a texture reference feature amount storage unit 31 and a three-dimensional reference feature amount storage unit 32.
  • The texture reference feature amount storage unit 31 stores a texture reference feature amount B_t extracted from the texture information I_t of the texture image F_t, and the three-dimensional reference feature amount storage unit 32 stores a three-dimensional reference feature amount B_d extracted from the three-dimensional information I_d.
  • As described in detail in Example 2, the texture reference feature amount storage unit 31 can store a plurality of texture reference feature amounts B_t, and the three-dimensional reference feature amount storage unit 32 can store a plurality of three-dimensional reference feature amounts B_d.
  • The reference feature amounts B stored in both storage units are determined from the viewpoint of the recognition rate on a verification data set, by constructing classifiers that use the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22, respectively. Specifically, a feature amount that was recognized successfully, or the feature amount with the maximum recognition score, is stored in each storage unit as the reference feature amount B. Alternatively, a kernel computed by training a neural network that includes the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22 as part of its network configuration may be stored as the reference feature amount B.
  • The weight parameter generation unit 4 calculates the weight w for the feature amount fet calculated by the feature amount calculation unit 2, using the reference feature amount B stored in the storage unit 3. Specifically, the inner product of the outputs of the texture feature amount calculation unit 21 and the texture reference feature amount storage unit 31 is computed and used as the weight w_t of the texture feature amount fet_t. Similarly, the inner product of the outputs of the three-dimensional feature amount calculation unit 22 and the three-dimensional reference feature amount storage unit 32 is computed and used as the weight w_d of the three-dimensional feature amount fet_d.
  • Because the inner product expresses the correlation, that is, the similarity, between vectors, the weight w is calculated by focusing on the similarity with the reference feature amount B.
  • Instead of the inner product, a quantity such as the L2 distance or the Bhattacharyya distance used as the exponent of an exponential function may be used as the weight w.
  • When a plurality of reference feature amounts B are stored in each of the texture reference feature amount storage unit 31 and the three-dimensional reference feature amount storage unit 32, the average of the inner-product values with the respective reference feature amounts can be used as the weight.
  • When the reference feature amount B is a kernel of a neural network, the weight w may be calculated by further applying a convolution operation to the result of the inner product of the feature amount and the reference feature amount.
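As an illustration only, and not part of the patent text, the inner-product weighting described above might be sketched in Python with NumPy as follows; the function and variable names are hypothetical, and the multi-reference case simply averages the inner products as stated above.

```python
import numpy as np

def weight_from_reference(fet: np.ndarray, refs: list[np.ndarray]) -> float:
    """Weight of a feature vector, taken as its similarity to stored reference features.

    fet  : feature vector (e.g. fet_t or fet_d), shape (D,)
    refs : one or more reference feature vectors B of the same dimension D
    Returns the inner product (or its average over several reference features).
    """
    sims = [float(np.dot(fet, b)) for b in refs]  # inner product = similarity
    return sum(sims) / len(sims)

# Hypothetical usage with a texture feature and a single texture reference feature
fet_t = np.random.rand(128)
B_t = [np.random.rand(128)]
w_t = weight_from_reference(fet_t, B_t)
```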
  • The object recognition unit 5 recognizes the target object based on the feature amount obtained by integrating the feature amounts calculated by the feature amount calculation unit 2 using the weights w generated by the weight parameter generation unit 4.
  • Specifically, the integrated feature amount fet_C is generated by adding the texture feature amount fet_t and the three-dimensional feature amount fet_d according to the weights w generated by the weight parameter generation unit 4. A classifier that uses the integrated feature amount fet_C then recognizes the drivable area around the vehicle.
  • The travelable area recognized by the object recognition unit 5 is output to an ECU (Electronic Control Unit) via a CAN (Controller Area Network), not shown. The ECU then executes lane keeping support control by assisting the control of the steering system so that the vehicle does not deviate from the drivable area around it.
  • As shown in the processing flowchart of FIG. 2, the object recognition device 100 of this embodiment performs, in order, an input information acquisition process (step S1), a texture feature amount extraction process (step S2), a three-dimensional feature amount extraction process (step S3), a weight calculation process (step S4), a feature amount integration process (step S5), and a type determination process (step S6).
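For orientation only, the six-step flow can be pictured as in the following Python sketch; every function name here is a placeholder for the processing described in steps S1 to S6, not an implementation taken from the patent.

```python
def recognize_drivable_area(left_img, right_img):
    # S1: input information acquisition (texture image F_t and 3-D image F_d)
    F_t, F_d = acquire_inputs(left_img, right_img)        # hypothetical helper
    for R_t, R_d in iter_local_regions(F_t, F_d):         # partial regions of the image
        fet_t = extract_texture_feature(R_t)              # S2: texture feature amount
        fet_d = extract_3d_feature(R_d)                   # S3: 3-D feature amount
        w_t, w_d = compute_weights(fet_t, fet_d)          # S4: per-region weights
        fet_c = w_t * fet_t + w_d * fet_d                 # S5: feature integration
        yield R_t, classify(fet_c)                        # S6: travelable or not
```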
  • First, in the input information acquisition process (step S1), the two left and right texture images F_t are acquired from the left and right cameras. The texture image F_t acquired from the right camera is illustrated in FIG. 3A, and FIG. 3B shows the data structure of the texture information I_t of each pixel of the texture image F_t of FIG. 3A.
  • The texture information I_t illustrated in FIG. 3B is data that defines the color of each pixel of the texture image F_t by a combination of R, G, and B values, but the method of defining the colors is not limited to this example.
  • In this step, a parallax image is also generated from the two acquired texture images F_t by scanning the left camera image against the right camera image, which is used as the reference. For the parallax calculation, for example, SAD (Sum of Absolute Differences) is used.
  • Then, referring to the focal length of the camera, the size of the image sensor, and the baseline length of the camera, the depth distance Z, the horizontal distance X, and the vertical distance Y from the camera are calculated from the parallax image, and a three-dimensional image F_d is generated with the three-dimensional information I_d as image channels.
  • The three-dimensional image F_d generated in this way is illustrated in FIG. 3C, and the data structure of the three-dimensional information I_d of each pixel of the three-dimensional image F_d is shown in FIG. 3D.
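The patent does not spell out the conversion formulas, but one common way to obtain Z, X, and Y from a disparity value is the standard pinhole stereo model for a rectified camera pair; the following sketch is that textbook formulation, not a quotation from the document, and all parameter names are assumptions.

```python
def disparity_to_xyz(u, v, disparity, f_px, baseline_m, cx, cy):
    """Standard pinhole-stereo conversion (assumed, not quoted from the patent).

    u, v       : pixel coordinates in the reference (right) image
    disparity  : disparity in pixels at (u, v)
    f_px       : focal length expressed in pixels (focal length / pixel pitch)
    baseline_m : camera baseline length in meters
    cx, cy     : principal point of the reference camera
    Returns (X, Y, Z): horizontal, vertical, and depth distance from the camera.
    """
    if disparity <= 0:
        return None                      # 3-D information invalid for this pixel
    Z = f_px * baseline_m / disparity    # depth distance
    X = (u - cx) * Z / f_px              # horizontal distance
    Y = (v - cy) * Z / f_px              # vertical distance
    return X, Y, Z
```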
  • In the following, the procedure for determining whether the texture region R_t shown in FIG. 3A is a travelable region or a non-travelable region is described, also taking into account the three-dimensional information I_d of the three-dimensional region R_d shown in FIG. 3C.
  • In the texture feature amount extraction process (step S2), the texture feature amount fet_t is extracted using the information acquired in step S1.
  • The neural network N_t of FIG. 4 is trained so that, when an arbitrary local region R of the same size as the texture region R_t of FIG. 3A is input, it can determine whether or not the local region R is a travelable region.
  • For this training, a learning data set to which correct labels have been assigned is used.
  • The neural network N_t of FIG. 4 consists of a layer N1t for feature extraction at the front stage and a layer N2t for identification processing at the rear stage.
  • The front-stage layer N1t consists of a number of convolution kernels and ReLU activation functions, and extracts from the local region R a texture feature amount fet_t that is effective for discrimination. The rear-stage layer N2t determines whether or not the local region R is a travelable region by applying a fully connected layer and a Softmax activation function to the texture feature amount fet_t extracted by the front-stage layer N1t.
  • By inputting the texture region R_t into the front-stage layer N1t, as shown in FIG. 5A, the texture feature amount fet_t of the texture region R_t can be calculated.
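Purely as an illustration of the kind of network described here (a front-stage convolution-plus-ReLU feature extractor N1t and a rear-stage fully connected-plus-Softmax classifier N2t), a minimal PyTorch-style sketch might look as follows; the layer sizes, feature dimension, and class count are assumptions rather than values given in the patent.

```python
import torch
import torch.nn as nn

class TextureNet(nn.Module):
    """Sketch of N_t: N1t extracts fet_t, N2t decides travelable / not travelable."""

    def __init__(self, in_channels: int = 3, feat_dim: int = 128):
        super().__init__()
        # N1t: convolution kernels + ReLU (feature extraction stage)
        self.n1t = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # N2t: fully connected layer + Softmax (identification stage)
        self.n2t = nn.Sequential(nn.Linear(feat_dim, 2), nn.Softmax(dim=1))

    def forward(self, region: torch.Tensor):
        fet_t = self.n1t(region)        # texture feature amount fet_t
        travelable = self.n2t(fet_t)    # probabilities: travelable / not travelable
        return fet_t, travelable
```

A network of the same shape with the three-dimensional image channels as input would play the role of N_d (layers N1d and N2d) described next.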
  • In the three-dimensional feature amount extraction process (step S3), similarly to step S2, the three-dimensional feature amount fet_d is extracted using the information acquired in step S1.
  • In this embodiment, a neural network N_d that determines travelability from the three-dimensional image F_d is used.
  • This neural network N_d is trained so that, when an arbitrary local region R of the same size as the three-dimensional region R_d of FIG. 3C is input, it can determine whether the local region R is a travelable region. Like the neural network N_t of FIG. 4, it consists of a front-stage feature extraction layer N1d and a rear-stage identification processing layer N2d.
  • By inputting the three-dimensional region R_d into the layer N1d, as shown in FIG. 5B, the three-dimensional feature amount fet_d of the three-dimensional region R_d can be calculated.
  • Note that the layer N1t for the texture region R_t (FIG. 5A) and the layer N1d for the three-dimensional region R_d (FIG. 5B) are assumed to extract feature amounts of the same number of dimensions.
  • In the weight calculation process (step S4), the weight w_t for the texture feature amount fet_t extracted in step S2 and the weight w_d for the three-dimensional feature amount fet_d extracted in step S3 are calculated.
  • In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
  • In this embodiment, the texture reference feature amount B_t is used to calculate the weight w_t of the texture feature amount fet_t.
  • The neural network N_t of FIG. 4 is used to determine the texture reference feature amount B_t.
  • R1, R2, and R3 in FIG. 6 indicate local regions of the verification data set used for calculating the recognition rate.
  • E1, E2, and E3 indicate the recognition results of the neural network N_t when the local regions R1, R2, and R3 are input, respectively. In FIG. 6, the recognition result E2 for the local region R2 is correct, while the recognition results E1 and E3 for the local regions R1 and R3 are wrong.
  • In this case, the texture feature amount fet_t2 output by the front-stage layer N1t when the local region R2 is input is determined to be the texture reference feature amount B_t and is stored in the texture reference feature amount storage unit 31.
  • Note that the texture reference feature amount B_t determined in this way is not a variable that changes according to the position of the texture region R_t but a constant that is set in advance, before step S4 is executed.
  • In step S4, the inner product is calculated according to (Equation 1), using the texture feature amount fet_t extracted in step S2 for an arbitrary texture region R_t and the texture reference feature amount B_t, which is a constant.
  • In this way, the weight w_t for the texture feature amount fet_t of the texture region R_t can be calculated.
  • D in (Equation 1) denotes the number of dimensions of the texture reference feature amount B_t.
  • The calculation result of (Equation 1) represents the correlation value, that is, the degree of similarity, between the texture feature amount fet_t and the texture reference feature amount B_t.
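The body of (Equation 1) is not reproduced in this text. Based on the surrounding description, an inner product over D dimensions whose result serves as the weight, it is presumably the standard dot product; the following is a reconstruction under that assumption, not a verbatim copy of the patent's equation.

```latex
% (Equation 1), reconstructed: inner product of the texture feature amount
% and the texture reference feature amount over their D dimensions
w_t = \sum_{i=1}^{D} \mathrm{fet}_t[i] \, B_t[i]
```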
  • In the feature amount integration process (step S5), the texture feature amount fet_t extracted in step S2 and the three-dimensional feature amount fet_d extracted in step S3 are combined using the weights w_t and w_d calculated in step S4 to compute the integrated feature amount fet_C. The integration of the feature amounts is calculated according to (Equation 3) below.
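(Equation 3) is likewise not reproduced here. Consistent with the earlier statement that the two feature amounts are added according to their weights, it presumably has the following form; this is a reconstruction under that assumption.

```latex
% (Equation 3), reconstructed: weighted addition of the two feature vectors
\mathrm{fet}_C = w_t \, \mathrm{fet}_t + w_d \, \mathrm{fet}_d
```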
  • In the type determination process (step S6), whether or not the region is a travelable region is determined based on the integrated feature amount fet_C calculated in step S5.
  • In this embodiment, a neural network N3 is used to determine the travelable area from the integrated feature amount fet_C.
  • FIG. 7 shows a conceptual diagram of this step. As shown there, the integrated feature amount fet_C is used as the input to the neural network N3 to determine whether or not the region is a travelable region.
  • The neural network N3 consists of a feature extractor made up of a number of convolution layers and ReLU activation functions and a discrimination processor made up of a fully connected layer and a Softmax activation function.
  • Note that while the neural network layer N2t described above is trained with a data set whose inputs are texture feature amounts fet_t and the neural network layer N2d is trained with a data set whose inputs are three-dimensional feature amounts fet_d, the neural network N3 of this step is trained with a data set whose inputs are integrated feature amounts fet_C.
  • As described above, the object recognition device 100 of this embodiment can change the weight of the feature amounts for each pixel of the image. As a result, even when the texture image F_t contains both an object that should be judged by actively using the texture information I_t and an object that should be judged by actively using the three-dimensional information I_d, different weights can be set for each object, and the recognition accuracy can be improved.
  • In this embodiment, the weight of a feature amount is calculated by comparing the feature amount fet with the reference feature amount B determined in advance. This eliminates the need for additional image analysis processing for the weight calculation, such as analyzing the luminance values of the entire image, and makes the weight calculation more efficient.
  • In this embodiment, the weight is calculated as the inner product of the reference feature amount and the feature amount. Since the inner product can be computed with sum-of-products operations alone, the weight can be calculated with a small amount of computation.
  • In this embodiment, different neural network layers N1t and N1d are used to calculate the reference feature amounts B for the texture information I_t and the three-dimensional information I_d.
  • Therefore, the texture reference feature amount B_t can be determined from the texture information I_t alone and the three-dimensional reference feature amount B_d from the three-dimensional information I_d alone, so the weights can be calculated more accurately.
  • In this embodiment, the reference feature amount B is generated based on the recognition rate on the verification data, so a feature amount that was recognized successfully can be selected as the reference feature amount. As a result, a weight that actively favors feature amounts similar to one that was successfully recognized can be calculated, enabling recognition with higher accuracy.
  • In step S4 as described above, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are always calculated.
  • Alternatively, the validity of the acquired three-dimensional information I_d may first be determined (step S41). If the three-dimensional information I_d could not be acquired for the pixel, or if the cost of the parallax calculation is equal to or greater than a predetermined value, the three-dimensional image F_d is determined to be invalid. If it is determined to be invalid, the process proceeds to step S6 without calculating the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d; that is, in step S6, the layer N2t of the neural network of FIG. 4 is used to determine travelability based only on the texture feature amount fet_t.
  • If the three-dimensional information I_d is determined to be valid, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are calculated (step S42), and the processing from step S5 onward is executed using the weight w_t and the weight w_d.
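A minimal sketch of this validity gate, assuming hypothetical helper functions for the steps named above (none of the names come from the patent):

```python
def classify_region(R_t, R_d, cost_threshold: float):
    fet_t = extract_texture_feature(R_t)                  # step S2
    # Step S41: validity check of the 3-D information
    if R_d is None or parallax_cost(R_d) >= cost_threshold:
        return classify_texture_only(fet_t)               # step S6 using N2t alone
    # Step S42: compute both weights, then integrate and classify as usual
    fet_d = extract_3d_feature(R_d)                       # step S3
    w_t, w_d = compute_weights(fet_t, fet_d)              # step S4
    fet_c = w_t * fet_t + w_d * fet_d                     # step S5
    return classify(fet_c)                                # step S6 using N3
```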
  • In the object recognition device of Example 2, the weight calculation process consists of a reference feature amount inner-product calculation (step S4a) and an average value calculation process (step S4b).
  • In step S4a, the inner product with the feature amount fet is calculated for each of a plurality of reference feature amounts B set for that feature amount fet.
  • A method for setting a plurality of reference feature amounts B is described with reference to FIG. 10.
  • In the following, the method of setting the texture reference feature amounts B_t corresponding to the texture image F_t is described; the three-dimensional reference feature amounts B_d can be set in the same way, so their description is omitted.
  • The meaning of each symbol in FIG. 10 is the same as in FIG. 6. The difference between the two figures is that, in FIG. 10, both local regions R1 and R2 are identified successfully.
  • In this case, both the texture feature amounts fet_t1 and fet_t2 obtained from the successfully identified local regions R1 and R2 are set as texture reference feature amounts B_t1 and B_t2.
  • Now consider calculating the weight w_t of the texture feature amount fet_t of a texture region R_t. In step S4a, as shown in FIG. 11, the inner product value S_t1 of the texture feature amount fet_t extracted from the texture region R_t and the texture reference feature amount B_t1, and the inner product value S_t2 of the texture feature amount fet_t and the texture reference feature amount B_t2, are calculated. That is, similarity information between each reference feature amount and the texture feature amount fet_t of the texture region R_t is calculated.
  • In step S4b, the weight w_t for the texture feature amount fet_t is calculated from the plurality of inner product values calculated in step S4a.
  • In this embodiment, the following (Equation 4) is used to calculate the weight w_t.
  • Here, A_t denotes the index set of the texture reference feature amounts B_t.
  • By this calculation, the average of the inner product values with the texture reference feature amounts B_t is obtained.
  • In this embodiment, the average of the inner product values calculated from the plurality of texture reference feature amounts B_t is used as the weight w_t for the texture feature amount fet_t.
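(Equation 4) is not reproduced in this text. From the description, the average of the inner-product values over the index set A_t, it presumably takes the following form; this is a reconstruction, not a quotation.

```latex
% (Equation 4), reconstructed: average of the inner products with the
% texture reference feature amounts B_{ta}, a in A_t
w_t = \frac{1}{\lvert A_t \rvert} \sum_{a \in A_t} \langle \mathrm{fet}_t , B_{ta} \rangle
    = \frac{1}{\lvert A_t \rvert} \sum_{a \in A_t} S_{ta}
```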
  • The method of calculating the weight w_t of the texture feature amount fet_t has been described above; for the three-dimensional feature amount fet_d as well, the inner products with the three-dimensional reference feature amounts B_d are calculated and the weight w_d is obtained from their average in the same way.
  • As described above, in this embodiment a plurality of reference feature amounts B are set for each feature amount fet, and the average of the respective inner-product values is used as the weight w of the feature amount fet.
  • In Example 1, neural networks are used in each of the three steps shown in FIG. 2: the texture feature amount extraction process (step S2), the three-dimensional feature amount extraction process (step S3), and the type determination process (step S6).
  • That is, in Example 1, three separate neural networks are used to execute the processing of FIG. 2.
  • In contrast, in Example 3, a single neural network N that incorporates the functions of the networks of Example 1 as layers is used to execute the processing of FIG. 2.
  • FIG. 12 shows the configuration of the neural network N of this embodiment.
  • The correspondence between the processing flow of FIG. 2 and the neural network N is described below.
  • Since the input information acquisition process (step S1) is the same as in Example 1, only the subsequent processes are described.
  • The neural network N shown in FIG. 12 first receives the texture region R_t and the three-dimensional region R_d as inputs and performs the texture feature amount extraction process (step S2) and the three-dimensional feature amount extraction process (step S3).
  • In the texture feature amount extraction process (step S2), the layer N1t of the neural network N is used to extract the texture feature amount fet_t.
  • This layer N1t consists of a number of convolution layers and ReLU activation functions.
  • Similarly, in the three-dimensional feature amount extraction process (step S3), the layer N1d of the neural network N is used to extract the three-dimensional feature amount fet_d.
  • The numbers of dimensions of the texture feature amount fet_t and the three-dimensional feature amount fet_d extracted by the layers N1t and N1d are made equal.
  • In the weight calculation process (step S4), the layers N4t and N5t of the neural network N are used to calculate the weight w_t of the texture feature amount fet_t, and the layers N4d and N5d are used to calculate the weight w_d of the three-dimensional feature amount fet_d. In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
  • FIG. 13 shows the details of the configuration of the layer N4t and the layer N5t, which calculate the weight w_t for the texture feature amount fet_t.
  • The processing in the layers N4t and N5t corresponds to the processing flow shown in FIG. 14.
  • The processing by the layer N4t corresponds to the reference feature amount inner-product calculation (step S4c), and the processing by the layer N5t corresponds to the reference similarity inner-product calculation (step S4d).
  • The texture reference feature amounts B_t1, B_t2, ..., B_tn shown in the layer N4t are kernels of the layer N4t.
  • The texture reference feature amounts B_t1, B_t2, ..., B_tn are estimated by training the neural network N, described later.
  • In the layer N4t, the inner products of the texture feature amount fet_t with the texture reference feature amounts B_t1, B_t2, ..., B_tn are calculated, and the resulting inner-product values constitute the vector vec.
  • This processing can be realized by a convolution operation using a 1 × 1 kernel.
  • The reference similarity C in the layer N5t is a vector of the same dimension as the vector vec, and stores, for each reference feature amount, its relationship to the feature amounts that should be actively used, that is, whose weight should be increased.
  • For example, the first element of the vector vec stores the degree of similarity between the texture feature amount fet_t and the texture reference feature amount B_t1, and the second element stores the degree of similarity between the texture feature amount fet_t and the texture reference feature amount B_t2. If the condition under which the calculated feature amount should be actively used, that is, under which the weight w_t should be increased, is that the feature amount is similar to the texture reference feature amount B_t1 but not similar to the texture reference feature amount B_t2, then the first element of the reference similarity C stores a positive value and the second element stores a negative value.
  • The reference similarity C is estimated by the training described later.
  • In the layer N5t, the inner product of the vector vec and the reference similarity C is calculated.
  • This processing is also realized by a convolution operation using a 1 × 1 kernel. The inner-product value of the vector vec and the reference similarity C is used as the weight w_t.
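As an illustrative PyTorch-style sketch only (the feature dimension and the number of reference features are assumptions): the two 1 × 1 convolutions described for the layers N4t and N5t amount to an inner product of the per-position feature vector with n learned reference features, followed by an inner product of the resulting n-vector with a learned reference similarity.

```python
import torch
import torch.nn as nn

class WeightLayers(nn.Module):
    """Sketch of the N4t / N5t weight-calculation layers described for Example 3."""

    def __init__(self, feat_dim: int = 128, n_refs: int = 8):
        super().__init__()
        # N4t: its kernels play the role of the reference feature amounts B_t1..B_tn;
        # a 1x1 convolution computes their inner products with fet_t, giving vec
        self.n4t = nn.Conv2d(feat_dim, n_refs, kernel_size=1, bias=False)
        # N5t: its kernel plays the role of the reference similarity C;
        # a 1x1 convolution computes the inner product of vec and C, giving w_t
        self.n5t = nn.Conv2d(n_refs, 1, kernel_size=1, bias=False)

    def forward(self, fet_t: torch.Tensor) -> torch.Tensor:
        vec = self.n4t(fet_t)    # similarity of fet_t to each reference feature
        w_t = self.n5t(vec)      # scalar weight per spatial position
        return w_t

# Hypothetical usage on a feature map of shape (batch, feat_dim, H, W)
w_t = WeightLayers()(torch.randn(1, 128, 16, 16))   # shape (1, 1, 16, 16)
```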
  • A similar process is performed using the layer N4d and the layer N5d with the three-dimensional feature amount fet_d as input, to calculate the weight w_d corresponding to the three-dimensional feature amount fet_d.
  • In the feature amount integration process (step S5), the feature amounts are integrated by the layer N6 of the neural network N.
  • The layer N6 of the neural network N performs the same calculation as (Equation 3) described above and outputs the integrated feature amount fet_C.
  • In the type determination process (step S6), the layer N3 of the neural network N is used to determine whether or not the region is a travelable area from the integrated feature amount fet_C.
  • The layer N3 consists of a part composed of convolution layers and ReLU activation functions and a part composed of a fully connected layer and a Softmax activation function, and performs the type determination.
  • The neural network N is configured so that all of its layers are differentiable, and it can be trained by updating the kernel parameters so as to reduce an error function defined on the output of the layer N3.
  • By this training, the reference feature amounts used in the layers N4t and N4d and the reference similarities used in the layers N5t and N5d are estimated so as to minimize the error function. That is, the reference feature amounts and reference similarities that maximize the recognition rate on the training data can be estimated.
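A minimal end-to-end training sketch under the same assumptions as the previous snippets; the network constructor, the data loader, and the choice of loss are placeholders, not details given in the patent.

```python
import torch
import torch.nn as nn

model = build_network_N()                      # placeholder: assembled network N
criterion = nn.CrossEntropyLoss()              # error function on the N3 output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for R_t, R_d, label in training_loader:        # placeholder data loader
    logits = model(R_t, R_d)                   # forward pass through all layers
    loss = criterion(logits, label)            # error defined on the layer N3 output
    optimizer.zero_grad()
    loss.backward()                            # every layer is differentiable, so the
    optimizer.step()                           # kernels B and C are updated as well
```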
  • As described above, in Example 3 the weight calculation uses not only the reference feature amounts but also the reference similarity. This makes it possible to calculate weights that reflect not only similarity to a reference feature amount but also dissimilarity to a reference feature amount, enabling weighting that corresponds to more complicated conditions, and recognition performance can be improved.
  • In this embodiment, the drivable area is determined by a single neural network in which the reference feature amounts and the reference similarities are used as kernels, and training is performed by defining an error function on the output of the neural network. As a result, the reference feature amounts and the reference similarities can be estimated so as to maximize the final recognition rate, so recognition can be performed with higher accuracy.
  • Furthermore, in Example 3, a single neural network performs the feature amount calculation, the weight estimation, and the type estimation of the object. This eliminates the need to train a plurality of neural networks individually, shortening the training time and reducing the work cost for the designer.
  • Since the difference between Example 1 and Example 4 lies in the content of the weight calculation process (step S4), step S4 is described below.
  • In Example 1, the weight w is calculated by processing the texture image F_t of each captured frame independently. In Example 4, by contrast, the weight to be used in the current frame is calculated using weights calculated in past frames.
  • FIG. 15 shows the weight calculation process (step S4) of this embodiment.
  • Step S4 of this embodiment consists of a past-frame position calculation process (step S4e) and a weight average value calculation process (step S4f).
  • In step S4e, the position in the image of a past frame that corresponds to the image region to be recognized in the current frame is calculated. The corresponding position in the past frame may be predicted from information such as the vehicle speed and yaw rate, or the position in the past frame may be identified by extracting feature points from the images and calculating the camera motion from the correspondence of the feature points.
  • In step S4f, the weights of the image regions surrounding the past-frame image region identified in step S4e are used to calculate the weight of the image region to be recognized in the current frame.
  • Specifically, a radius Rpix is defined around the identified past-frame image region, and the average of the past weights contained within that radius is used as the weight for the current frame.
  • The above processing is performed for the weight calculation of both the texture feature amount fet_t and the three-dimensional feature amount fet_d.
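A rough sketch of this temporal reuse, assuming hypothetical helpers for ego-motion compensation; the names and data formats are illustrative only.

```python
import numpy as np

def weight_from_past(current_pos, past_weight_map, ego_shift, r_pix: int) -> float:
    """Reuse past weights for the current frame's recognition target region.

    current_pos     : (u, v) pixel position of the region in the current frame
    past_weight_map : 2-D array of weights computed for the past frame
    ego_shift       : (du, dv) predicted image displacement from vehicle speed /
                      yaw rate or from feature-point matching (assumed format)
    r_pix           : radius Rpix around the corresponding past-frame position
    """
    # Step S4e: position in the past frame corresponding to the current region
    u = current_pos[0] - ego_shift[0]
    v = current_pos[1] - ego_shift[1]
    # Step S4f: average the past weights within radius Rpix of that position
    h, w = past_weight_map.shape
    ys, xs = np.ogrid[:h, :w]
    mask = (xs - u) ** 2 + (ys - v) ** 2 <= r_pix ** 2
    return float(past_weight_map[mask].mean())
```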
  • As described above, in this embodiment the weights to be used in the current frame are determined based on weights calculated in the past. This eliminates the need to calculate the weights anew in the current frame, thereby reducing the processing load.
  • In each of the above examples, the reference feature amount B is selected based on information indicating whether recognition of the verification data succeeded or failed, but the reference feature amount may instead be selected based on the recognition score. Specifically, when one reference feature amount is to be determined, the feature amount with the maximum identification score may be used as the reference feature amount, and when N reference feature amounts are to be determined, the feature amounts with the top N identification scores may be used as the reference feature amounts.
  • 100: object recognition device, 1: input signal acquisition unit, 11: image acquisition unit, 12: three-dimensional information acquisition unit, 2: feature amount calculation unit, 21: texture feature amount calculation unit, 22: three-dimensional feature amount calculation unit, 3: storage unit, 31: texture reference feature amount storage unit, 32: three-dimensional reference feature amount storage unit, 4: weight parameter generation unit, 5: object recognition unit, F_t: texture image, R_t: texture region, fet_t: texture feature amount, B_t: texture reference feature amount, w_t: weight of the texture feature amount, F_d: three-dimensional image, R_d: three-dimensional region, fet_d: three-dimensional feature amount, B_d: three-dimensional reference feature amount, w_d: weight of the three-dimensional feature amount, fet_C: integrated feature amount, w: weight

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The purpose of the present invention is to provide an object recognition device for recognizing a travelable area with high accuracy. This object recognition device is characterized by including: an input signal acquisition unit that acquires texture information and three-dimensional information about an image; a feature value calculation unit that calculates a texture feature value based on texture information about a partial area of the image, and a three-dimensional feature value based on three-dimensional information about the partial area; a weight parameter generation unit that generates a weight parameter for each partial area; and a target object recognition unit that performs weighting with the weight parameter to thereby generate an integrated feature value obtained by integrating the texture feature value and the three-dimensional feature value, and that recognizes a target object in the image on the basis of the integrated feature value.

Description

Object recognition device and object recognition method

The present invention relates to an object recognition device and an object recognition method that analyze an image captured by an in-vehicle camera and recognize objects around the vehicle.

The Lane Keep Assist System (LKAS), which prevents deviation from the roadway, is expected to serve as a driving support system that prevents single-vehicle accidents caused by drowsiness or inattention while driving. As a prerequisite for realizing appropriate steering support control, this lane-keeping support system is equipped with a function that analyzes the image captured by the vehicle-mounted camera and divides the area within the image into a drivable area and an undrivable area.

Patent Document 1 is known as a prior-art document that discloses a region segmentation method. The abstract of this document states that 'an acquired target image is divided into regions using mutually different feature quantities, and a generated-region group consisting of a plurality of generated regions characterized by the feature quantity used for the division is produced for each type of feature quantity (S130). One of the generated-region groups is defined as the basic region group, and the feature quantities of generated regions that overlap a basic region are incorporated into the feature quantities of that basic region, thereby producing a single basic-region group consisting of basic regions characterized by multiple types of feature quantities (S140). As an index of the similarity between a basic region of interest Rtar and an adjacent basic region Rk, the weighted Euclidean distance between the two regions in the multidimensional feature space represented by the feature quantities is calculated, and the two regions are merged when this distance satisfies an integration condition (S150).' The document thus discloses a region segmentation method that uses not only the texture information of the target image but also distance information (a weighted Euclidean distance).

In this way, Patent Document 1 extracts feature quantities from the target image and the distance information and performs the association based on distances weighted for each feature quantity. More specifically, as described in FIG. 3 and paragraphs 0046 to 0050 of that document, the weight is calculated from the brightness of the entire target image, and the same weight is used for every region when segmenting a given image.

Patent Document 1: JP 2006-127024 A

When the drivable area is segmented as preprocessing for lane keeping support control, the regions must be segmented correctly even in scenes in which many objects are captured. Possible approaches to the segmentation include actively using texture information and actively using distance information, but how heavily each type of information should be weighted is considered to depend on the type of object being segmented.

Therefore, in Patent Document 1, in which a single target image is segmented using a single weight, the weight cannot be changed for each imaged object, and when a weight that deviates from the weight that should be used is applied, it is difficult to correctly segment the target image into regions.

In view of this problem, the present invention aims to provide an object recognition device and an object recognition method that recognize the travelable area with high accuracy by changing the weight for each image region.

To this end, the object recognition device of the present invention comprises: an input signal acquisition unit that acquires texture information and three-dimensional information of an image; a feature amount calculation unit that calculates a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region; a weight parameter generation unit that generates a weight parameter for each partial region; and an object recognition unit that generates an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter, and recognizes the target object in the image based on the integrated feature amount.

According to the object recognition device and the object recognition method of the present invention, the travelable area can be recognized with high accuracy by changing the weight for each image region.

The drawings are as follows:
FIG. 1: Functional block diagram of the object recognition device of Example 1.
FIG. 2: Processing flowchart of the object recognition device of Example 1.
FIG. 3A: Example of a texture image acquired by the object recognition device of Example 1.
FIG. 3B: Data structure of the texture information of each pixel of the texture image of FIG. 3A.
FIG. 3C: Example of a three-dimensional image acquired by the object recognition device of Example 1.
FIG. 3D: Data structure of the three-dimensional information of each pixel of the three-dimensional image of FIG. 3C.
FIG. 4: Schematic explanatory diagram of a neural network of the object recognition device of Example 1.
FIG. 5A: Texture feature amount extraction processing using the neural network of Example 1.
FIG. 5B: Three-dimensional feature amount extraction processing using the neural network of Example 1.
FIG. 6: Example of a method for determining a reference feature amount in the object recognition device of Example 1.
FIG. 7: Diagram explaining the travelability determination processing of the object recognition device of Example 1.
FIG. 8: Example of the weight calculation processing flow of the object recognition device of Example 1.
FIG. 9: Example of the weight calculation processing flow of the object recognition device of Example 2.
FIG. 10: Example of a method for determining a reference feature amount in the object recognition device of Example 2.
FIG. 11: Example of a method for calculating a weight from a reference feature amount in the object recognition device of Example 2.
FIG. 12: Explanatory diagram of a neural network of the object recognition device of Example 3.
FIG. 13: Diagram showing the weight calculation layers in the neural network of FIG. 12.
FIG. 14: Example of the weight calculation processing flow of the object recognition device of Example 3.
FIG. 15: Example of the weight calculation processing flow of the object recognition device of Example 4.

An embodiment of the object recognition device 100 of the present invention will be described in detail below with reference to the drawings.

First, an object recognition device 100 according to Example 1 of the present invention will be described with reference to FIGS. 1 to 8.

FIG. 1 is a functional block diagram showing the configuration of the object recognition device 100 of Example 1. The object recognition device 100 includes, as hardware, an external sensor such as an in-vehicle camera, an arithmetic device such as a CPU, and a storage device such as a semiconductor memory; the arithmetic device executes a control program stored in the storage device, whereby the various functions shown in the figure operate. Since implementing functions by executing programs is well-known technology, the specific operation of hardware such as the arithmetic device is not described further below.

As shown in FIG. 1, the object recognition device 100 includes, as functional units realized by the above hardware, an input signal acquisition unit 1, a feature amount calculation unit 2, a storage unit 3, a weight parameter generation unit 4, and an object recognition unit 5. Each unit is described in order below.
<Input signal acquisition unit 1>

The input signal acquisition unit 1 has an image acquisition unit 11 and a three-dimensional information acquisition unit 12.

The image acquisition unit 11 acquires a texture image F_t for each frame captured by the vehicle-mounted camera. If the vehicle-mounted camera is a monocular camera, the image acquisition unit 11 acquires one texture image F_t per captured frame; if it is a stereo camera, it acquires two (left and right) texture images F_t per captured frame.

If the image acquisition unit 11 acquires the two left and right texture images F_t from a stereo camera, the three-dimensional information acquisition unit 12 generates three-dimensional information I_d for each pixel using a well-known parallax calculation method. If the image acquisition unit 11 acquires a single texture image F_t from a monocular camera, the three-dimensional information acquisition unit 12 acquires the per-pixel three-dimensional information I_d from a millimeter-wave radar or LiDAR installed alongside the monocular camera.
<Feature amount calculation unit 2>

The feature amount calculation unit 2 has a texture feature amount calculation unit 21 and a three-dimensional feature amount calculation unit 22.

The texture feature amount calculation unit 21 calculates a texture feature amount fet_t from the texture image F_t acquired by the image acquisition unit 11. For this feature amount, a HoG feature using edges may be used, or an ICF feature based on machine learning may be used. Alternatively, a feature extracted by a convolutional neural network (hereinafter simply referred to as 'neural network N'), described later, may be used.

The three-dimensional feature amount calculation unit 22 calculates a three-dimensional feature amount fet_d from the three-dimensional information I_d acquired by the three-dimensional information acquisition unit 12. For this feature amount, a HoG feature computed on a distance image obtained by projecting the distance information onto the image, an ICF feature of a three-dimensional image F_d in which the three-dimensional information is stored as image channels, or a feature extracted by a neural network that takes the three-dimensional information I_d as input may be used.
<Storage unit 3>

The storage unit 3 has a texture reference feature amount storage unit 31 and a three-dimensional reference feature amount storage unit 32. The texture reference feature amount storage unit 31 stores a texture reference feature amount B_t extracted from the texture information I_t of the texture image F_t, and the three-dimensional reference feature amount storage unit 32 stores a three-dimensional reference feature amount B_d extracted from the three-dimensional information I_d. As described in detail in Example 2, the texture reference feature amount storage unit 31 can store a plurality of texture reference feature amounts B_t, and the three-dimensional reference feature amount storage unit 32 can store a plurality of three-dimensional reference feature amounts B_d.

The reference feature amounts B stored in both storage units are determined from the viewpoint of the recognition rate on a verification data set, by constructing classifiers that use the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22, respectively. Specifically, a feature amount that was recognized successfully, or the feature amount with the maximum recognition score, is stored in each storage unit as the reference feature amount B. Alternatively, a kernel computed by training a neural network that includes the texture feature amount calculation unit 21 and the three-dimensional feature amount calculation unit 22 as part of its network configuration may be stored as the reference feature amount B.
<Weight parameter generation unit 4>

The weight parameter generation unit 4 calculates the weight w for the feature amount fet calculated by the feature amount calculation unit 2, using the reference feature amount B stored in the storage unit 3. Specifically, the inner product of the outputs of the texture feature amount calculation unit 21 and the texture reference feature amount storage unit 31 is computed and used as the weight w_t of the texture feature amount fet_t. Similarly, the inner product of the outputs of the three-dimensional feature amount calculation unit 22 and the three-dimensional reference feature amount storage unit 32 is computed and used as the weight w_d of the three-dimensional feature amount fet_d.

Because the inner product expresses the correlation, that is, the similarity, between vectors, the weight w is calculated by focusing on the similarity with the reference feature amount B. Instead of the inner product, a quantity such as the L2 distance or the Bhattacharyya distance used as the exponent of an exponential function may be used as the weight w. When a plurality of reference feature amounts B are stored in each of the texture reference feature amount storage unit 31 and the three-dimensional reference feature amount storage unit 32, the average of the inner-product values with the respective reference feature amounts can be used as the weight. When the reference feature amount B is a kernel of a neural network, the weight w may be calculated by further applying a convolution operation to the result of the inner product of the feature amount and the reference feature amount.
<Object recognition unit 5>

The object recognition unit 5 recognizes the target object based on the feature amount obtained by integrating the feature amounts calculated by the feature amount calculation unit 2 using the weights w generated by the weight parameter generation unit 4. Specifically, the integrated feature amount fet_C is generated by adding the texture feature amount fet_t and the three-dimensional feature amount fet_d according to the weights w generated by the weight parameter generation unit 4. A classifier that uses the integrated feature amount fet_C then recognizes the drivable area around the vehicle.

The travelable area recognized by the object recognition unit 5 is output to an ECU (Electronic Control Unit) via a CAN (Controller Area Network), not shown. The ECU then executes lane keeping support control by assisting the control of the steering system so that the vehicle does not deviate from the drivable area around it.
<Operation example>
 Next, an operation example of the object recognition device 100 configured as described above will be described in detail with reference to the flowchart of FIG. 2. The following operation example concerns the object recognition device 100 using a stereo camera installed so as to monitor the area ahead of the vehicle. Since the stereo camera consists of a left camera and a right camera, two texture images F_t (left and right) are captured for each frame; in the following, the drivable area is estimated in the texture image F_t captured by the right camera.
 The object recognition device 100 of this embodiment executes, in order, an input information acquisition process (step S1), a texture feature amount extraction process (step S2), a three-dimensional feature amount extraction process (step S3), a weight calculation process (step S4), a feature amount integration process (step S5), and a type determination process (step S6).
 First, in the input information acquisition process (step S1), the two left and right texture images F_t are acquired from the left camera and the right camera. The texture image F_t acquired from the right camera is illustrated in FIG. 3A, and the data structure of the texture information I_t of each pixel of the texture image F_t in FIG. 3A is shown in FIG. 3B. The texture information I_t illustrated in FIG. 3B defines the color of each pixel of the texture image F_t as a combination of R, G, and B values, but the method of defining color is not limited to this example.
 In this step, a parallax image is also generated from the two acquired texture images F_t by scanning the left camera image with the right camera image as the reference. The parallax is calculated using, for example, SAD (Sum of Absolute Differences). Then, referring to the focal length of the camera, the size of the image sensor, and the baseline length of the camera, the depth distance Z, the lateral distance X, and the vertical distance Y from the camera are calculated from the parallax image, and a three-dimensional image F_d is generated in which the three-dimensional information I_d forms the channels of the image. The three-dimensional image F_d generated in this way is illustrated in FIG. 3C, and the data structure of the three-dimensional information I_d of each pixel of the three-dimensional image F_d is shown in FIG. 3D. In the following, the procedure for determining whether the texture region R_t shown in FIG. 3A is a drivable area or a non-drivable area, taking into account the three-dimensional information I_d of the three-dimensional region R_d shown in FIG. 3C, will be described.
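 As a concrete illustration of how the three-dimensional image F_d could be assembled from the parallax image, the sketch below applies the standard pinhole stereo relations. The parameter names, the handling of invalid pixels, and the channel ordering are assumptions for the example; the description above only states that X, Y, and Z are derived from the parallax, the focal length, the image sensor size, and the baseline length.

```python
import numpy as np

def disparity_to_xyz(disparity, f_px, baseline_m, cx, cy):
    """Convert a disparity map (in pixels) into per-pixel X, Y, Z channels,
    forming a 3-D image F_d of shape (h, w, 3).

    f_px       : focal length expressed in pixels
    baseline_m : stereo baseline in meters
    cx, cy     : principal point of the reference (right) camera
    Pixels with non-positive disparity are treated as invalid (NaN)."""
    h, w = disparity.shape
    u = np.tile(np.arange(w), (h, 1)).astype(np.float64)
    v = np.tile(np.arange(h)[:, None], (1, w)).astype(np.float64)
    valid = disparity > 0
    Z = np.full((h, w), np.nan)
    Z[valid] = f_px * baseline_m / disparity[valid]   # depth distance Z
    X = (u - cx) * Z / f_px                           # lateral distance X
    Y = (v - cy) * Z / f_px                           # vertical distance Y
    return np.stack([X, Y, Z], axis=-1)
```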
 In the texture feature amount extraction process (step S2), the texture feature amount fet_t is extracted using the information acquired in step S1.
 First, the outline of the neural network Nt that determines drivability from the texture image F_t will be described with reference to FIG. 4. The neural network Nt in FIG. 4 has been trained so that, when an arbitrary local region R of the same size as the texture region R_t in FIG. 3A is input, it can determine whether that local region R is a drivable area. A training data set with ground-truth labels is used for this training. The neural network Nt in FIG. 4 consists of a feature extraction layer N1t in the first stage and a classification layer N2t in the second stage. The first-stage layer N1t is composed of many convolution kernels and ReLU activation functions, and extracts from the local region R a texture feature amount fet_t that is effective for classification. The second-stage layer N2t determines whether the local region R is a drivable area by applying fully connected layers and a softmax activation function to the texture feature amount fet_t extracted by the first-stage layer N1t.
 Therefore, as shown in FIG. 5A, the texture feature amount fet_t of the texture region R_t can be calculated by inputting the texture information I_t of the texture region R_t to the feature extraction layer N1t.
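 The following is a minimal PyTorch sketch of a network of the kind described for Nt: a convolution-plus-ReLU feature extraction stage standing in for the layer N1t and a fully connected softmax stage standing in for the layer N2t. The patch size, channel counts, and feature dimension are assumptions for the example, not values taken from the disclosure.

```python
import torch
import torch.nn as nn

class N1t(nn.Module):
    """Feature extraction stage: convolution + ReLU layers producing fet_t."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, feat_dim)

    def forward(self, patch):                # patch: (B, 3, 32, 32) RGB local region
        x = self.conv(patch).flatten(1)      # (B, 32)
        return self.proj(x)                  # fet_t: (B, feat_dim)

class N2t(nn.Module):
    """Classification stage: fully connected layer + softmax (drivable / not)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2)

    def forward(self, fet_t):
        return torch.softmax(self.fc(fet_t), dim=1)
```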
 In the three-dimensional feature amount extraction process (step S3), as in step S2, the three-dimensional feature amount fet_d is extracted using the information acquired in step S1. This step uses a neural network Nd that determines drivability from the three-dimensional image F_d. The neural network Nd has been trained so that, when an arbitrary local region R of the same size as the three-dimensional region R_d in FIG. 3C is input, it can determine whether that local region R is a drivable area; like the neural network Nt in FIG. 4, it consists of a feature extraction layer N1d in the first stage and a classification layer N2d in the second stage.
 Therefore, as shown in FIG. 5B, the three-dimensional feature amount fet_d of the three-dimensional region R_d can be calculated by inputting to the feature extraction layer N1d the three-dimensional information I_d of the three-dimensional region R_d in FIG. 3C, which corresponds to the texture region R_t in FIG. 3A. The layer N1t for the texture region R_t (FIG. 5A) and the layer N1d for the three-dimensional region R_d (FIG. 5B) extract feature amounts of the same number of dimensions.
 In the weight calculation process (step S4), the weight w_t for the texture feature amount fet_t extracted in step S2 and the weight w_d for the three-dimensional feature amount fet_d extracted in step S3 are calculated. In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
 The texture reference feature amount B_t is used to calculate the weight w_t of the texture feature amount fet_t. First, the method of determining the texture reference feature amount B_t is explained with reference to FIG. 6; the neural network Nt of FIG. 4 is used for this determination. In FIG. 6, R1, R2, and R3 denote local regions of the verification data set used for calculating the recognition rate, and E1, E2, and E3 denote the recognition results of the neural network Nt when the local regions R1, R2, and R3 are input. In FIG. 6, the recognition result E2 for the local region R2 is correct, while the recognition results E1 and E3 for the local regions R1 and R3 are wrong. In this case, the texture feature amount fet_t2 output by the first-stage layer N1t when the local region R2 was input is chosen as the texture reference feature amount B_t and stored in the texture reference feature amount storage unit 31. The texture reference feature amount B_t determined in this way is not a variable that changes with the position of the texture region R_t, but a constant set in advance before step S4 is executed.
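 As an illustration of this selection, the sketch below keeps only the feature amounts of correctly recognized validation regions and, when recognition scores are available, returns the one with the highest score (the score-based variant is mentioned again at the end of the embodiments). The function name and the tie-breaking rule without scores are assumptions for the example.

```python
import numpy as np

def pick_reference_feature(features, predictions, labels, scores=None):
    """Choose a reference feature amount B from validation results: keep the
    features of correctly recognized regions and, if recognition scores are
    given, take the one with the highest score (cf. FIG. 6)."""
    correct = [i for i, (p, y) in enumerate(zip(predictions, labels)) if p == y]
    if not correct:
        return None                          # no successful recognition
    if scores is not None:
        best = max(correct, key=lambda i: scores[i])
    else:
        best = correct[0]                    # assumed tie-breaking rule
    return np.asarray(features[best])
```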
 Therefore, in this step, the weight w_t for the texture feature amount fet_t in an arbitrary texture region R_t can be calculated by taking the inner product, according to (Equation 1), of the texture feature amount fet_t extracted in step S2 and the constant texture reference feature amount B_t.
 $w_t = \sum_{i=1}^{D} \mathrm{fet}_{t,i}\, B_{t,i}$   (Equation 1)
 Here, D in (Equation 1) denotes the number of dimensions of the texture reference feature amount B_t. The result of (Equation 1) represents the correlation value between the texture feature amount fet_t and the texture reference feature amount B_t, that is, their similarity.
 By performing the same procedure on the three-dimensional region R_d of the three-dimensional image F_d, the three-dimensional reference feature amount B_d is obtained, and the weight w_d for the three-dimensional feature amount fet_d extracted in step S3 can be calculated using (Equation 2).
 $w_d = \sum_{i=1}^{D} \mathrm{fet}_{d,i}\, B_{d,i}$   (Equation 2)
 In the feature amount integration process (step S5), the integrated feature amount fet_C is calculated by combining the texture feature amount fet_t extracted in step S2 and the three-dimensional feature amount fet_d extracted in step S3, using the weights w_t and w_d calculated in step S4. The feature amounts are integrated according to (Equation 3) below.
 $\mathrm{fet}_C = w_t\, \mathrm{fet}_t + w_d\, \mathrm{fet}_d$   (Equation 3)
 In the type determination process (step S6), whether the region is a drivable area is determined based on the integrated feature amount fet_C calculated in step S5. A neural network N3 is used for this determination. FIG. 7 shows a conceptual diagram of this step: the integrated feature amount fet_C is input to the neural network N3, which judges whether the region is a drivable area. The neural network N3 consists of a feature extraction part made up of many convolution layers and ReLU activation functions, and a classification part made up of fully connected layers and a softmax activation function. Whereas the layer N2t described above was trained on a data set that takes the texture feature amount fet_t as input, and the layer N2d was trained on a data set that takes the three-dimensional feature amount fet_d as input, the neural network N3 in FIG. 7 is assumed to be trained on a data set that takes the integrated feature amount fet_C as input.
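 The sketch below ties steps S5 and S6 together: the integration of (Equation 3) followed by a stand-in for the neural network N3. Because the integrated feature amount is treated here as a flat vector, fully connected layers are used in place of the convolutional part of N3; this simplification, the layer sizes, and the two-class output are assumptions for the example.

```python
import torch
import torch.nn as nn

def integrate(fet_t, fet_d, w_t, w_d):
    """(Equation 3): weighted sum of the texture and 3-D feature amounts."""
    return w_t * fet_t + w_d * fet_d         # fet_C, same dimension as the inputs

class N3(nn.Module):
    """Stand-in for N3: classifies the integrated feature amount fet_C."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(),
            nn.Linear(32, 2),                # drivable / non-drivable
        )

    def forward(self, fet_c):
        return torch.softmax(self.head(fet_c), dim=1)
```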
 By performing steps S1 through S6 in order, it can be determined whether the texture region R_t in FIG. 3A is drivable, while the weights w_t and w_d are adjusted as appropriate. By likewise performing steps S1 through S6 for regions other than the texture region R_t in FIG. 3A, the drivable-area determination can be carried out over the entire texture image F_t.
 As described above, the object recognition device 100 of this embodiment can change the weights of the feature amounts for each pixel of the image. As a result, even when the texture image F_t contains both objects that should be judged mainly from the texture information I_t and objects that should be judged mainly from the three-dimensional information I_d, different weights can be set for each object, and the recognition accuracy can be improved.
 In the object recognition device 100 of this embodiment, the weight of a feature amount is calculated by comparing the feature amount fet with the reference feature amount B determined in advance. This eliminates the need for additional image analysis, such as analyzing the luminance values of the entire image, for weight calculation, making the weight calculation more efficient.
 In the object recognition device 100 of this embodiment, the weights are calculated as the inner product of the reference feature amount and the feature amount. Since the inner product can be computed with multiply-accumulate operations alone, the weights can be calculated with a small amount of computation.
 In the object recognition device 100 of this embodiment, different neural network layers N1t and N1d are used to calculate the reference feature amounts for the texture information I_t and the three-dimensional information I_d, respectively. The texture reference feature amount B_t can be determined from the texture information I_t alone, and the three-dimensional reference feature amount from the three-dimensional information I_d alone, so the weights can be calculated more accurately.
 In the object recognition device 100 of this embodiment, as illustrated in FIG. 6, the reference feature amount B is generated based on the recognition rate on the verification data, so a successfully recognized feature amount can be selected as the reference feature amount. This makes it possible to calculate weights that favor feature amounts similar to those that were successfully recognized, enabling more accurate recognition.
 In the weight calculation process (step S4) of this embodiment, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are always calculated, but the process can be changed to the weight calculation method shown in FIG. 8. First, in the three-dimensional information validity determination process (step S41), the validity of the acquired three-dimensional information I_d is judged. For pixels for which the three-dimensional information I_d could not be obtained, or for which the cost of the parallax calculation is equal to or greater than a predetermined value, the three-dimensional image F_d is judged to be invalid. If it is judged invalid, the process proceeds to step S6 without calculating the weight w_t of the texture information I_t or the weight w_d of the three-dimensional information I_d. In this case, step S6 uses the layer N2t of the neural network in FIG. 4 to determine drivability based only on the texture feature amount fet_t.
 On the other hand, when the three-dimensional image F_d is judged to be valid, the weight w_t of the texture information I_t and the weight w_d of the three-dimensional information I_d are calculated (step S42), and the processing from step S5 onward is executed using the weights w_t and w_d.
 In this way, when the three-dimensional information cannot be acquired, or when the reliability of the acquired three-dimensional information is extremely low, the processing load can be reduced by not calculating the weights.
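 A compact sketch of the branch in FIG. 8 is given below. The cost threshold, the classifiers passed in as callables, and the reference feature amounts given as arguments are assumptions for the example.

```python
import numpy as np

def classify_region(fet_t, fet_d, disparity_cost, B_t, B_d,
                    texture_only_clf, integrated_clf, cost_threshold=50.0):
    """When the 3-D information is missing or its matching cost is too high,
    skip the weight calculation and fall back to a texture-only decision."""
    if fet_d is None or disparity_cost >= cost_threshold:
        return texture_only_clf(fet_t)        # layer N2t, texture feature only
    w_t = float(np.dot(fet_t, B_t))           # step S42
    w_d = float(np.dot(fet_d, B_d))
    fet_c = w_t * fet_t + w_d * fet_d         # step S5 (Equation 3)
    return integrated_clf(fet_c)              # step S6
```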
 Next, the object recognition device 100 according to a second embodiment of the present invention will be described with reference to FIGS. 9 to 11. Descriptions of points in common with the first embodiment are omitted.
 In the first embodiment, one reference feature amount B was set for each feature amount fet (see FIG. 6), whereas in the second embodiment a plurality of reference feature amounts can be set for each feature amount fet. Accordingly, in this embodiment the weight calculation process (step S4) consists of a reference feature amount inner product calculation (step S4a) and an average value calculation process (step S4b), as shown in FIG. 9.
 First, in the reference feature amount inner product calculation (step S4a), the inner product with the feature amount fet is calculated for each of the plurality of reference feature amounts B set for that feature amount. The method of setting the plurality of reference feature amounts B is explained with reference to FIG. 10. In the following, the method of setting the texture reference feature amounts B_t corresponding to the texture image F_t is described; the three-dimensional reference feature amounts B_d can be set in the same way, so their description is omitted. The symbols in FIG. 10 have the same meaning as those in FIG. 6. The difference between the two figures is that in FIG. 6 only the recognition result E2 based on the local region R2 was correct and both recognition results E1 and E3 based on the local regions R1 and R3 were wrong, whereas in FIG. 10 both recognition results E1 and E2 based on the local regions R1 and R2 are correct and only the recognition result E3 based on the local region R3 is wrong.
 In this embodiment, therefore, both of the texture feature amounts fet_t1 and fet_t2 derived from the successfully classified local regions R1 and R2 are set as the texture reference feature amounts B_t1 and B_t2, and both are used to compute the weight w_t of the texture feature amount fet_t of an arbitrary texture region R_t. In step S4a, as shown in FIG. 11, the inner product value S_t1 of the texture feature amount fet_t extracted from the texture region R_t and the texture reference feature amount B_t1, and the inner product value S_t2 of the texture feature amount fet_t and the texture reference feature amount B_t2, are calculated. That is, similarity information between each reference feature amount and the texture feature amount fet_t of the texture region R_t is computed.
 Next, in the average value calculation process (step S4b), the weight w_t for the texture feature amount fet_t is calculated from the plurality of inner product values calculated in step S4a, using (Equation 4) below.
 $w_t = \dfrac{1}{|A_t|} \sum_{a \in A_t} \sum_{i=1}^{D} \mathrm{fet}_{t,i}\, B^{a}_{t,i}$   (Equation 4)
 Here, A_t denotes the index set of the texture reference feature amounts B_t. Calculating according to (Equation 4) yields the average of the inner product values with the texture reference feature amounts B_t. In the second embodiment, this average of the inner product values calculated from the plurality of texture reference feature amounts B_t is used as the weight w_t for the texture feature amount fet_t. The calculation of the weight w_t of the texture feature amount fet_t has been described above; the same procedure is applied to the three-dimensional feature amount fet_d, calculating a plurality of three-dimensional reference feature amounts B_d and obtaining the weight w_d as the average of the inner products using (Equation 5).
 $w_d = \dfrac{1}{|A_d|} \sum_{a \in A_d} \sum_{i=1}^{D} \mathrm{fet}_{d,i}\, B^{a}_{d,i}$   (Equation 5)
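 The following is a short NumPy sketch of (Equation 4) and (Equation 5): the weight is the mean of the inner products between the feature amount and every reference feature amount in the index set. The 64-dimensional placeholder vectors are assumptions for the example.

```python
import numpy as np

def averaged_weight(fet, B_refs):
    """Average of the inner products between the feature amount `fet` and
    every reference feature amount in the list `B_refs` (Equations 4 and 5)."""
    B = np.stack(B_refs)                 # shape (|A|, D)
    return float(np.mean(B @ fet))       # mean of the |A| inner products

# Hypothetical usage with two texture reference feature amounts (embodiment 2).
fet_t = np.random.rand(64)
w_t = averaged_weight(fet_t, [np.random.rand(64), np.random.rand(64)])
```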
 In this embodiment, a plurality of reference feature amounts B are set for each feature amount fet, and the average of their inner product values is used as the weight w of that feature amount. This makes it possible to calculate the weights from multiple reference feature amounts rather than a single one, so the weight calculation can be performed more robustly.
 Next, the object recognition device 100 according to a third embodiment of the present invention will be described with reference to FIGS. 12 to 14. Descriptions of points in common with the above embodiments are omitted.
 In the first embodiment, a neural network was used in each of the three steps shown in FIG. 2: the texture feature amount extraction process (step S2), the three-dimensional feature amount extraction process (step S3), and the type determination process (step S6). In other words, the first embodiment used three separate neural networks to execute the processing of FIG. 2. In this embodiment, by contrast, the processing of FIG. 2 is executed with a single neural network N that incorporates the functions of those networks as layers.
 FIG. 12 shows the configuration of the neural network N of this embodiment. The correspondence between the processing flow of FIG. 2 and the neural network N is described below; since the input information acquisition process is the same as in the first embodiment, the description starts from the subsequent processes.
 The neural network N shown in FIG. 12 first takes the texture region R_t and the three-dimensional region R_d as inputs and performs the texture feature amount extraction process (step S2) and the three-dimensional feature amount extraction process (step S3). In the texture feature amount extraction process (step S2), the layer N1t of the neural network N is used to extract the texture feature amount fet_t. This layer N1t is composed of many convolution layers and ReLU activation functions. Similarly, in the three-dimensional feature amount extraction process (step S3), the layer N1d of the neural network N is used to extract the three-dimensional feature amount fet_d. The texture feature amount fet_t and the three-dimensional feature amount fet_d extracted by the layers N1t and N1d have the same number of dimensions.
 In the weight calculation process (step S4), the layers N4t and N5t of the neural network N are used to calculate the weight w_t of the texture feature amount fet_t, and the layers N4d and N5d are used to calculate the weight w_d of the three-dimensional feature amount fet_d. In the following, the method of calculating the weight w_t for the texture feature amount fet_t is described; the weight w_d for the three-dimensional feature amount fet_d can be calculated in the same way, so its description is omitted.
 FIG. 13 shows the details of the configuration of the layers N4t and N5t that calculate the weight w_t for the texture feature amount fet_t. The processing in the layers N4t and N5t follows the processing flow shown in FIG. 14: the processing in the layer N4t corresponds to the reference feature amount inner product calculation (step S4c), and the processing in the layer N5t corresponds to the reference similarity inner product calculation (step S4d).
 First, the reference feature amount inner product calculation (step S4c) is described. The texture reference feature amounts B_t1, B_t2, ..., B_tn shown for the layer N4t are the kernels of that layer; they are estimated by training the neural network N as described later. The layer N4t calculates the inner products of the texture feature amount fet_t with B_t1, B_t2, ..., B_tn, and outputs a vector vec whose elements are the inner product values with the respective kernels. Since each element of vec is an inner product with, that is, a correlation value against, a reference feature amount, vec is in effect a vector expressing the similarity to each reference feature amount. This processing can be realized by a convolution operation with 1x1 kernels.
 Next, the reference similarity inner product calculation (step S4d) is performed. The reference similarity C in the layer N5t is a vector of the same dimension as vec, and it stores the relationship between each reference feature amount and the feature amounts that should be actively used, that is, given a larger weight. Specifically, the first element of vec stores the similarity between the texture feature amount fet_t and the texture reference feature amount B_t1, and the second element stores the similarity between the texture feature amount fet_t and the texture reference feature amount B_t2. If the condition for actively using the calculated feature amount, that is, for increasing the weight w_t, is that it is similar to the texture reference feature amount B_t1 but not similar to the texture reference feature amount B_t2, then the first element of the reference similarity C holds a positive value and the second element holds a negative value. The reference similarity C is estimated by the training described later. The layer N5t calculates the inner product of the vector vec and the reference similarity C; this processing is also realized by a convolution operation with a 1x1 kernel, and the resulting inner product value is the weight w_t. The same processing is applied to the three-dimensional feature amount fet_d using the layers N4d and N5d to calculate the corresponding weight w_d.
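 The sketch below shows how the layers N4t and N5t can be expressed as two 1x1 convolutions in PyTorch: the first holds the reference feature amounts B_t1, ..., B_tn as its kernels and outputs the similarity vector vec, and the second holds the reference similarity C as its kernel and outputs the weight w_t. The feature dimension and the number of reference feature amounts are assumptions for the example.

```python
import torch
import torch.nn as nn

feat_dim, n_refs = 64, 8                 # assumed sizes
n4t = nn.Conv2d(feat_dim, n_refs, kernel_size=1, bias=False)  # kernels = B_t1..B_tn
n5t = nn.Conv2d(n_refs, 1, kernel_size=1, bias=False)         # kernel  = C

fet_t = torch.randn(1, feat_dim, 1, 1)   # texture feature amount of one local region
vec = n4t(fet_t)                         # inner products with each reference feature
w_t = n5t(vec)                           # inner product of vec with C -> weight w_t
```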
 In the feature amount integration process (step S5), the feature amounts are integrated by the layer N6 of the neural network N. The layer N6 performs the same calculation as (Equation 3) above and outputs the integrated feature amount fet_C.
 In the type determination process (step S6), the layer N3 of the neural network N is used to determine from the integrated feature amount fet_C whether the region is a drivable area. The layer N3 consists of layers made up of convolution layers and ReLU activation functions, followed by fully connected layers and a softmax activation, and performs the type determination.
 Next, the training method of the neural network N is described. In training, the cross entropy between the output value of the layer N3 and the ground-truth value is used as the error function. Since all layers of the neural network N are differentiable, the network can be trained by updating the kernel parameters so as to reduce the error function defined on the output of the layer N3. As a result, the reference feature amounts used in the layers N4t and N4d and the reference similarities used in the layers N5t and N5d are estimated so that the error function is minimized; that is, reference feature amounts and reference similarities that maximize the recognition rate on the training data can be estimated.
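 A minimal training sketch under these assumptions is shown below. The model, data loader, optimizer, and hyperparameters are placeholders; the model is assumed to return the pre-softmax output of the layer N3, since CrossEntropyLoss applies the softmax internally.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """End-to-end training: all layers of N (N1t, N1d, N4t/N5t, N4d/N5d,
    N6, N3) sit inside one differentiable model, so the reference feature
    amounts and reference similarities are updated with the other kernels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()          # cross entropy against ground truth
    for _ in range(epochs):
        for patch_t, patch_d, label in loader:
            logits = model(patch_t, patch_d) # pre-softmax output of layer N3
            loss = loss_fn(logits, label)
            opt.zero_grad()
            loss.backward()                  # gradients reach B_t*, C, ...
            opt.step()
```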
 In the third embodiment, the weights are calculated based not only on the reference feature amounts but also on the reference similarities. This makes it possible to compute weights that take into account not only similarity to a reference feature amount but also dissimilarity to it, enabling weighting that handles more complex conditions and improving performance.
 In the third embodiment, the drivable area is also determined by a single neural network whose kernels include the reference feature amounts and the reference similarities, and training is performed with an error function defined on the output of the network. As a result, the reference feature amounts and reference similarities can be estimated so as to maximize the final recognition rate, enabling more accurate recognition.
 Furthermore, in the third embodiment, a single neural network is used to calculate the feature amounts, estimate the weights, and estimate the object type. This eliminates the need to train multiple neural networks individually, shortening the training time and reducing the work required of the designer.
 Next, the object recognition device 100 according to a fourth embodiment of the present invention will be described with reference to FIG. 15. Descriptions of points in common with the above embodiments are omitted.
 The difference between the first and fourth embodiments lies in the weight calculation process (step S4), so only step S4 is described below. In the first embodiment, the weights w were calculated by processing the texture image F_t of each captured frame independently; in this embodiment, the weights of the current frame are calculated by referring to weight information calculated in the past.
 FIG. 15 shows the weight calculation process (step S4) of this embodiment. As shown there, step S4 of this embodiment consists of a past frame position calculation process (step S4e) and a weight average value calculation process (step S4f).
 First, in the past frame position calculation process (step S4e), the position in the image of a past frame that corresponds to the image region to be recognized in the current frame is calculated. The corresponding position in the past frame may be predicted from information such as the vehicle speed and yaw rate, or it may be identified by extracting feature points from the images, associating the feature points between the previous time and the current time, and computing the camera motion from these correspondences.
 In the weight average value calculation process (step S4f), the weight of the image region to be recognized in the current frame is calculated using the weights around the image region of the past frame identified in step S4e. A radius Rpix is defined around the identified image region of the past frame, and the average of the past weights contained within that radius is used as the weight for the current frame. This processing is carried out for the weight calculation of both the texture feature amount fet_t and the three-dimensional feature amount fet_d.
 In the fourth embodiment, the weights used for the current frame are determined from weights calculated in the past. This eliminates the need to calculate the weights anew in the current frame, reducing the processing load.
 In the first and second embodiments, the reference feature amount B was selected based on whether recognition of the verification data succeeded or failed, but the reference feature amount may instead be selected based on the recognition score. Specifically, when one reference feature amount is to be determined, the feature amount with the highest classification score may be used, and when N reference feature amounts are to be determined, the feature amounts with the top N classification scores may be used.
 Although the present invention has been described above, the present invention is not limited to the above embodiments. Various modifications that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the spirit of the invention.
100: object recognition device, 1: input signal acquisition unit, 11: image acquisition unit, 12: three-dimensional information acquisition unit, 2: feature amount calculation unit, 21: texture feature amount calculation unit, 22: three-dimensional feature amount calculation unit, 3: storage unit, 31: texture reference feature amount storage unit, 32: three-dimensional reference feature amount storage unit, 4: weight parameter generation unit, 5: object recognition unit, F_t: texture image, R_t: texture region, fet_t: texture feature amount, B_t: texture reference feature amount, F_d: three-dimensional image, R_d: three-dimensional region, fet_d: three-dimensional feature amount, B_d: three-dimensional reference feature amount, fet_C: integrated feature amount, w: weight

Claims (10)

  1.  An object recognition device comprising:
     an input signal acquisition unit that acquires texture information and three-dimensional information of an image;
     a feature amount calculation unit that calculates a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region;
     a weight parameter generation unit that generates a weight parameter for each partial region; and
     an object recognition unit that generates an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter, and recognizes an object in the image based on the integrated feature amount.
  2.  The object recognition device according to claim 1, further comprising a storage unit that stores a texture reference feature amount corresponding to the texture feature amount and a three-dimensional reference feature amount corresponding to the three-dimensional feature amount,
     wherein the weight parameter generation unit generates the weight parameter for each partial region based on results of comparing the texture feature amount and the three-dimensional feature amount with the texture reference feature amount and the three-dimensional reference feature amount, respectively.
  3.  The object recognition device according to claim 2, wherein the storage unit stores a plurality of the texture reference feature amounts and a plurality of the three-dimensional reference feature amounts, and
     the weight parameter generation unit obtains the weight parameter for each partial region based on results of comparing the texture feature amount and the three-dimensional feature amount with the plurality of texture reference feature amounts and the plurality of three-dimensional reference feature amounts, respectively.
  4.  The object recognition device according to claim 2, wherein the weight calculation unit uses, as the weight parameters, the inner product values of the texture feature amount with the texture reference feature amount and of the three-dimensional feature amount with the three-dimensional reference feature amount, respectively.
  5.  The object recognition device according to claim 2, wherein the texture reference feature amount is calculated using a texture classifier that uses the texture feature amount, and
     the three-dimensional reference feature amount is calculated using a three-dimensional classifier that uses the three-dimensional feature amount.
  6.  The object recognition device according to claim 2, wherein the feature amount calculation unit, the weight parameter generation unit, the storage unit, and the object recognition unit are constituted by a single neural network.
  7.  The object recognition device according to claim 2, wherein the texture reference feature amount and the three-dimensional reference feature amount are generated in advance based on a recognition rate for verification data.
  8.  The object recognition device according to claim 1, wherein the weight calculation unit calculates the weight parameter for regions from which the three-dimensional information is acquired, and
     the object recognition unit recognizes the object in the image based on the texture feature amount for regions from which the three-dimensional information is not acquired.
  9.  The object recognition device according to claim 1, wherein the weight parameter generation unit determines the current weight parameter from past weight parameters.
  10.  An object recognition method comprising the steps of:
     acquiring texture information and three-dimensional information of an image;
     calculating a texture feature amount based on the texture information of a partial region of the image and a three-dimensional feature amount based on the three-dimensional information of the partial region of the image;
     generating a weight parameter for each partial region;
     generating an integrated feature amount by integrating the texture feature amount and the three-dimensional feature amount through weighting with the weight parameter; and
     recognizing an object in the image based on the integrated feature amount.
PCT/JP2022/004511 2021-05-21 2022-02-04 Object recognition device and object recognition method WO2022244333A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
DE112022001417.2T DE112022001417T5 (en) 2021-05-21 2022-02-04 OBJECT RECOGNITION DEVICE AND OBJECT RECOGNITION METHOD

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021086154A JP2022178981A (en) 2021-05-21 2021-05-21 Object recognition device and object recognition method
JP2021-086154 2021-05-21

Publications (1)

Publication Number Publication Date
WO2022244333A1 true WO2022244333A1 (en) 2022-11-24

Family

ID=84140182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/004511 WO2022244333A1 (en) 2021-05-21 2022-02-04 Object recognition device and object recognition method

Country Status (3)

Country Link
JP (1) JP2022178981A (en)
DE (1) DE112022001417T5 (en)
WO (1) WO2022244333A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014006588A (en) * 2012-06-21 2014-01-16 Toyota Central R&D Labs Inc Road surface boundary estimation device and program
JP2018124177A (en) * 2017-02-01 2018-08-09 トヨタ自動車株式会社 Floor surface determination method
JP2019148889A (en) * 2018-02-26 2019-09-05 株式会社Soken Road boundary detection device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4506409B2 (en) 2004-10-27 2010-07-21 株式会社デンソー Region dividing method and apparatus, image recognition processing apparatus, program, and recording medium


Also Published As

Publication number Publication date
JP2022178981A (en) 2022-12-02
DE112022001417T5 (en) 2024-01-11

Similar Documents

Publication Publication Date Title
US11145078B2 (en) Depth information determining method and related apparatus
US11657525B2 (en) Extracting information from images
JP6832504B2 (en) Object tracking methods, object tracking devices and programs
EP2757527B1 (en) System and method for distorted camera image correction
US20230252662A1 (en) Extracting information from images
CN111932580A (en) Road 3D vehicle tracking method and system based on Kalman filtering and Hungary algorithm
Hoang et al. Enhanced detection and recognition of road markings based on adaptive region of interest and deep learning
JP2016029564A (en) Target detection method and target detector
US10789515B2 (en) Image analysis device, neural network device, learning device and computer program product
WO2016179808A1 (en) An apparatus and a method for face parts and face detection
CN106611147B (en) Car tracing method and apparatus
Huang et al. ES-Net: An efficient stereo matching network
Kim et al. Adversarial confidence estimation networks for robust stereo matching
WO2022244333A1 (en) Object recognition device and object recognition method
CN111291607B (en) Driver distraction detection method, driver distraction detection device, computer equipment and storage medium
Alvarez et al. Novel index for objective evaluation of road detection algorithms
CN116434156A (en) Target detection method, storage medium, road side equipment and automatic driving system
Zabihi et al. Frame-rate vehicle detection within the attentional visual area of drivers
KR102609829B1 (en) Stereo Matching Confidence Estimation Apparatus And Method Using Generative Adversarial Network
CN113379787B (en) Target tracking method based on 3D convolution twin neural network and template updating
Onkarappa et al. On-board monocular vision system pose estimation through a dense optical flow
US20240153120A1 (en) Method to determine the depth from images by self-adaptive learning of a neural network and system thereof
WO2021024905A1 (en) Image processing device, monitoring device, control system, image processing method, computer program, and recording medium
US20230401733A1 (en) Method for training autoencoder, electronic device, and storage medium
CN115063594B (en) Feature extraction method and device based on automatic driving

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22804255

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 112022001417

Country of ref document: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22804255

Country of ref document: EP

Kind code of ref document: A1