CN116012806B - Vehicle detection method, device, detector, system and model training method - Google Patents

Vehicle detection method, device, detector, system and model training method

Info

Publication number
CN116012806B
CN116012806B
Authority
CN
China
Prior art keywords
map
image
road
feature
vehicle
Prior art date
Legal status
Active
Application number
CN202310318133.2A
Other languages
Chinese (zh)
Other versions
CN116012806A (en)
Inventor
赵云
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310318133.2A
Publication of CN116012806A
Application granted
Publication of CN116012806B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The application relates to the technical field of automatic driving and discloses a vehicle detection method, a device, a detector, a system and a model training method, comprising the steps of: obtaining a feature depth map corresponding to an image feature map, the image feature map corresponding to an external environment image of the vehicle; acquiring a road layout aerial view of the current position of the automatic driving vehicle, the road layout aerial view comprising road elements; estimating the weight of each image feature in the image feature map belonging to a road element; generating a corrected image of the road elements for the feature depth map according to the road layout aerial view; correcting the feature depth map by using the corrected image and the weights to obtain a corrected feature depth map; and performing vehicle detection outside the automatic driving vehicle according to the corrected feature depth map and the image feature map, and outputting a 3D vehicle detection result. Because the road layout aerial view containing road elements is used to assist the detection process, the feature conversion accuracy of non-vehicle areas in the feature depth map is improved, which in turn improves the accuracy of the vehicle detection result.

Description

Vehicle detection method, device, detector, system and model training method
Technical Field
The present application relates to the field of autopilot technology, and in particular, to a vehicle detection method, apparatus, detector, system, model training method, and computer readable storage medium.
Background
The automatic driving technology can sense the environment and navigate without manual operation, thereby realizing automatic driving of the vehicle. An autonomous vehicle needs to accurately identify vehicles in the surrounding environment, locate them, and predict their speed. Image data captured by vehicle-mounted cameras, which is low-cost and information-rich, is widely used for detecting the surroundings of autonomous vehicles.
When the automatic driving system detects vehicles outside the autonomous vehicle, it reads image information of the external environment and outputs the position, length, width, height and speed of vehicles in three-dimensional space. The existing detection process first obtains image features of the external environment image through a deep neural network, maps the image features into three-dimensional space through a neural network or camera parameters to form 3D spatial features or bird's-eye-view features, and then completes detection of vehicles in 3D space or under a bird's-eye-view coordinate system. However, accurate depth estimation is difficult to accomplish based on image information alone, and the feature mapping process often causes misalignment and confusion between 3D spatial features and real objects, which affects the detection result.
Therefore, how to solve the above technical problems should be of great interest to those skilled in the art.
Disclosure of Invention
The purpose of the application is to provide a vehicle detection method, a device, a detector, a system, a model training method and a computer readable storage medium, so as to improve the accuracy of mapping image features to a 3D space, and further enable a vehicle detection result to be more accurate.
In order to solve the above technical problems, the present application provides a vehicle detection method, including:
acquiring a feature depth map corresponding to the image feature map; the image feature map corresponds to an external environment image of the vehicle;
acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements;
estimating the weight of the image feature belonging to the road element in the image feature map;
generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view;
correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map;
and detecting the exterior of the automatic driving vehicle according to the corrected feature depth map and the image feature map, and outputting a 3D vehicle detection result.
Optionally, generating a corrected image of the road element for the feature depth map according to the road layout aerial view includes:
generating a road element depth feature map and a road element mask map according to the road layout aerial view map;
correspondingly, correcting the feature depth map by using the corrected image and the weight, and obtaining the corrected feature depth map includes:
and correcting the feature depth map through weighted summation and masking by using the road element depth feature map, the road element mask map and the weights to obtain a corrected feature depth map.
Optionally, using the road element depth feature map, the road element mask map, and the weights, correcting the feature depth map through weighted summation and masking, and obtaining a corrected feature depth map includes:
correcting the characteristic depth map according to a preset formula, wherein the preset formula is as follows:

D_m(u,v) = (1 - W_r(u,v)·M_r(u,v))·D_C(u,v) + W_r(u,v)·M_r(u,v)·D_r(u,v)

where D_m(u,v) is the depth feature of point (u,v) in the corrected feature depth map, W_r(u,v) is the value corresponding to point (u,v) in the road element weight map corresponding to the image feature map, M_r(u,v) is the value corresponding to point (u,v) in the weight map M_r corresponding to the road element mask map, D_r(u,v) is the feature corresponding to point (u,v) of the road element depth feature map, and D_C(u,v) is the feature corresponding to point (u,v) of the feature depth map.
Optionally, generating the road element depth feature map and the road element mask map according to the road layout aerial view includes:
quantifying the road layout aerial view to obtain a road information map;
determining coordinates of points corresponding to all the road elements in the road information graph in a world coordinate system to form a road element point set;
converting each coordinate in the road element point set into an image coordinate system to obtain a road element depth map and the road element mask map;
and obtaining the road element depth feature map according to the road element depth map.
Optionally, converting each of the coordinates in the set of road element points into an image coordinate system includes:
and converting each coordinate in the road element point set into an image coordinate system through an image acquisition device parameter, a rotation translation relation between the image acquisition device and the automatic driving vehicle, and a rotation translation relation between the pose of the automatic driving vehicle and the world coordinate system.
Optionally, converting each of the coordinates in the set of road element points into an image coordinate system includes:
and converting each coordinate in the road element point set into an image coordinate system by matrix operation.
Optionally, obtaining the road element depth feature map according to the road element depth map includes:
expanding each element value in the road element depth map into a D-dimensional vector; the element values are depth values corresponding to points in the road element depth map in an image acquisition equipment coordinate system;
the D-dimensional vector is normalized such that the sum of all values in the D-dimensional vector is equal to 1.
Optionally, quantifying the road layout aerial view, and obtaining the road information map includes:
screening points in the road layout aerial view, which are located in the road element range, and points which are not located in the road element range;
setting the value of a point in the road element range to be 1, and setting the value of a point which is not in the road element range to be 0, thereby obtaining a road information graph.
Optionally, when the road element is a road surface, determining coordinates of points corresponding to all the road elements in the road information map in a world coordinate system includes:
And determining X values, Y values and Z values of points corresponding to all road surfaces in the road information map in a world coordinate system, wherein the Z value is equal to 0.
Optionally, the method further comprises:
enhancing the image features in the image feature map to obtain an enhanced image feature map;
correspondingly, the vehicle detection according to the corrected feature depth map and the image feature map comprises the following steps:
and detecting the vehicle according to the corrected characteristic depth map and the enhanced image characteristic map.
Optionally, the detecting the vehicle according to the corrected feature depth map and the image feature map includes:
generating a 3D feature map under an image coordinate system by the corrected feature depth map and the image feature map;
converting the 3D feature map to a BEV feature map;
and detecting the vehicle by using the BEV characteristic diagram.
Optionally, converting the 3D feature map to a BEV feature map includes:
converting the characteristics of each point in the 3D characteristic map into a vehicle coordinate system through the parameters of the image acquisition equipment and the rotation translation relation between the image acquisition equipment and the vehicle;
carrying out voxelization on points in the vehicle coordinate system, accumulating the characteristic points falling into the same voxel grid, setting the characteristic of the voxel grid without the characteristic points falling into 0, and forming a 3D characteristic map in the vehicle coordinate system;
And accumulating the features corresponding to the voxel grids at all heights in the height dimension to obtain the BEV feature map.
Optionally, the method further comprises:
acquiring the vehicle external environment image acquired by vehicle-mounted image acquisition equipment;
and extracting image features of the vehicle external environment image to obtain the image feature map, wherein the image feature map comprises the image features.
Optionally, acquiring the feature depth map corresponding to the image feature map includes:
and inputting the image feature map into a first preset neural network model to perform depth estimation, and obtaining the feature depth map.
Optionally, estimating the weights of the image features belonging to the road elements in the image feature map includes:
and inputting the image features into a second preset neural network model to obtain the weights of the image features belonging to the road elements.
Optionally, the method further comprises:
acquiring the external environment image of the vehicle; the number of the external environment images of the vehicle is at least two, and each external environment image of the vehicle is acquired by vehicle-mounted image acquisition equipment at different positions;
correspondingly, correcting the feature depth map by using the corrected image and the weight, and obtaining the corrected feature depth map includes:
Correcting each characteristic depth map by using the corrected image and the weight to obtain a plurality of corrected characteristic depth maps;
according to the corrected feature depth map and the image feature map, vehicle detection is carried out on the outside of the automatic driving vehicle, and the output of the 3D vehicle detection result comprises:
according to each corrected feature depth map and the image feature map corresponding to the corrected feature depth map, vehicle detection is carried out on the exterior of the automatic driving vehicle, and a 3D vehicle detection result to be combined is obtained;
and combining all the 3D vehicle detection results to be combined, and outputting a 3D vehicle detection result.
The application also provides a vehicle detection device, including:
the first acquisition module is used for acquiring a characteristic depth map corresponding to the image characteristic map; the image feature map corresponds to an external environment image of the vehicle;
the second acquisition module is used for acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements;
the estimating module is used for estimating the weight of the image feature belonging to the road element in the image feature map;
the first generation module is used for generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view;
The correction module is used for correcting the characteristic depth map by utilizing the corrected image and the weight to obtain a corrected characteristic depth map;
and the detection module is used for detecting the vehicle outside the automatic driving vehicle according to the corrected characteristic depth map and the image characteristic map and outputting a 3D vehicle detection result.
The application also provides a vehicle detection model training method, wherein the vehicle detection model comprises a first preset neural network model and a second preset neural network model, and the method comprises the following steps:
inputting an image feature map into the first preset neural network model for depth estimation to obtain a feature depth map corresponding to the image feature map; the image feature map corresponds to an external environment image of the vehicle;
acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements;
inputting the image features in the image feature map into the second preset neural network model, and estimating the weights of the image features belonging to the road elements;
generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view;
correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map;
According to the corrected feature depth map and the image feature map, detecting the exterior of the automatic driving vehicle, and outputting a 3D vehicle detection result;
and training the vehicle detection model based on the loss function to obtain a trained vehicle detection model.
Optionally, the method further comprises:
setting the region with the road element labeling error in the road element mask map to 0, and generating a corrected mask map;
training the second preset neural network model by using the corrected mask map.
Optionally, training the vehicle detection model based on the loss function includes:
based on
Figure SMS_8
Training the vehicle detection model;
in the formula ,
Figure SMS_10
estimating the loss for the depth of the image feature,/->
Figure SMS_13
Loss for vehicle classification->
Figure SMS_15
Estimating losses for vehicle position, length, width, speed,/->
Figure SMS_9
Training loss of road weight corresponding to image feature map, < ->
Figure SMS_12
Estimating the lost weight for the depth of image feature, +.>
Figure SMS_14
Weight lost for vehicle classification, +.>
Figure SMS_16
Weight lost for vehicle position, length, width, speed estimation, < >>
Figure SMS_11
And training the lost weight for the road weight corresponding to the image feature map.
The present application also provides a detector comprising:
a memory for storing a computer program;
And the processor is used for realizing the steps of any one of the vehicle detection methods and the steps of any one of the vehicle detection model training methods when executing the computer program.
The present application also provides a vehicle detection system, comprising:
a detector as described above;
and the image acquisition device is connected with the detector.
The present application also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any one of the above-mentioned vehicle detection methods, or the steps of any one of the above-mentioned vehicle detection model training methods.
The vehicle detection method provided by the application comprises the following steps: acquiring a feature depth map corresponding to the image feature map; the image feature map corresponds to an external environment image of the vehicle; acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements; estimating the weight of the image feature belonging to the road element in the image feature map; generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view; correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map; and detecting the exterior of the automatic driving vehicle according to the corrected feature depth map and the image feature map, and outputting a 3D vehicle detection result.
Therefore, when the vehicle is detected, a road layout aerial view of the current position of the automatic driving vehicle and the weights of the image features in the image feature map belonging to road elements are obtained, a corrected image for the feature depth map is generated from the road layout aerial view, and the feature depth map is then corrected using the corrected image and the weights. Since the road layout aerial view includes road elements, the corrected image improves the feature conversion accuracy of non-vehicle areas in the feature depth map and reduces the influence of background information on vehicle detection, thereby improving the accuracy of the feature depth map, the accuracy of mapping image features to 3D space, and ultimately the accuracy of the vehicle detection result.
In addition, the application also provides a device, a detector, a system, a model training method and a computer readable storage medium having the above advantages.
Drawings
For a clearer description of embodiments of the present application or of the prior art, the drawings that are used in the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description that follow are only some embodiments of the present application, and that other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart of a vehicle detection method according to an embodiment of the present application;
FIG. 2 is a flow chart of another vehicle detection method according to an embodiment of the present application;
FIG. 3 is a flow chart of another vehicle detection method according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a vehicle detection method according to an embodiment of the present application;
fig. 5 is a block diagram of a vehicle detection device according to an embodiment of the present application;
FIG. 6 is a block diagram of a detector provided in an embodiment of the present application;
fig. 7 is a block diagram of a vehicle detection system according to an embodiment of the present application.
Detailed Description
In order to provide a better understanding of the present application, the present application is described in further detail below with reference to the drawings and specific embodiments. It is apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the present application.
As described in the background section, when the automatic driving system detects the surrounding vehicle, depth estimation is completed based on the external environment image information, and in the process of mapping the features into the three-dimensional space, dislocation and confusion between the 3D space features and the real objects are often caused, so that the detection result is affected.
In view of this, the present application provides a vehicle detection method, please refer to fig. 1, including:
step S101: acquiring a feature depth map corresponding to the image feature map; the image feature map corresponds to an image of the environment outside the vehicle.
As an implementation manner, acquiring a feature depth map corresponding to an image feature map includes: acquiring an image feature map corresponding to an external environment image of a vehicle; and acquiring a feature depth map corresponding to the image feature map. Alternatively, the feature depth map that has been processed is obtained directly.
The acquiring of the image feature map corresponding to the vehicle external environment image includes:
and inputting the external environment image of the vehicle into a third preset neural network model for image feature extraction to obtain an image feature map. The third preset neural network model is a multi-layer convolutional neural network, and the number of layers and the number of channels are set in a feasible manner.
Optionally, before acquiring the feature depth map corresponding to the image feature map, the method may further include:
and acquiring the vehicle external environment image.
The vehicle external environment image may be acquired by an in-vehicle image acquisition device. In the present application, the number of the external environment images of the vehicle is not limited, and may be one or two or more, as the case may be.
Optionally, acquiring the feature depth map corresponding to the image feature map includes:
and inputting the image feature map into a first preset neural network model to perform depth estimation, and obtaining the feature depth map.
The image features in the vehicle external environment image are obtained by locally extracting the features of the vehicle external environment image.
The process of obtaining the feature depth map D_C from the image features F_0 through the first preset neural network model may refer to the related art and is not described in detail here. The first preset neural network model is a multi-layer convolutional neural network whose number of layers and channels can be set as needed.

The feature depth map D_C ∈ R^(H_F×W_F×D), where R denotes the real numbers, H_F and W_F are the dimensions of the image features, and D is the number of depth quantization bins: the designated depth range [depth_min, depth_max] is divided into D bins, and the value of the i-th bin (1 ≤ i ≤ D) represents the probability that the depth of the current feature point lies within the depth range of the i-th bin.
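As an illustration of this representation, the following is a minimal PyTorch sketch of a depth head that outputs such a per-location depth distribution; the layer sizes and bin count are illustrative assumptions rather than values specified by this application:

import torch
import torch.nn as nn

class DepthHead(nn.Module):
    # Sketch of the first preset neural network model: predicts a D-bin
    # depth distribution for every feature location. Layer sizes and
    # num_bins are illustrative assumptions.
    def __init__(self, in_channels: int = 256, num_bins: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, num_bins, 1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H_F, W_F) image feature map F_0
        logits = self.net(feat)       # (B, D, H_F, W_F)
        # softmax over the depth dimension: each (u, v) location holds D
        # probabilities that sum to 1, one per bin of [depth_min, depth_max]
        return logits.softmax(dim=1)  # feature depth map D_C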
Step S102: acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout bird's eye view includes road elements.
Road elements refer to objects that are present or appear on a road.
It should be noted that, the road elements are not limited in this application, and include static road elements and dynamic road elements. Wherein, the static road elements can be pavements, crosswalks, signal lamps and the like; the dynamic road element may be a pedestrian, a bicycle, a traveling vehicle, etc.
In the present application, the method for obtaining the road layout bird's eye view is not limited, and may be used as appropriate. For example, the road map can be obtained by means of aerial photography, mapping, labeling of static road elements and the like.
The dynamic road elements may be obtained by a dynamic road element self-positioning system or by a dynamic object detection instrument external to the autonomous vehicle, the detected dynamic road elements including the location and size of the dynamic road elements.
Step S103: and estimating the weight of the image feature belonging to the road element in the image feature map.
The image feature map is an image formed by extracting features of an external environment image of the vehicle.
As an implementation manner, estimating the weights of the image features belonging to the road elements in the image feature map includes:
and inputting the image features into a second preset neural network model to obtain the weights of the image features belonging to the road elements.
The weights W_r ∈ R^(H_F×W_F), where R denotes the real numbers and H_F and W_F are the dimensions of the image features. The weight W_r represents the probability that an image feature belongs to a road element, and the weight of each image feature in the image feature map is between 0 and 1.
The second preset neural network model is a multi-layer convolutional neural network whose number of layers and channels can be set as needed.
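A minimal sketch of such a weight-estimating head, again assuming a PyTorch-style implementation with illustrative layer sizes; the sigmoid keeps each weight between 0 and 1:

import torch
import torch.nn as nn

class RoadWeightHead(nn.Module):
    # Sketch of the second preset neural network model: estimates, for
    # every feature location, the probability W_r that the feature
    # belongs to a road element. Layer sizes are assumptions.
    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 2, 1, 1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H_F, W_F) -> W_r: (B, 1, H_F, W_F), values in (0, 1)
        return torch.sigmoid(self.net(feat))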
Step S104: and generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view.
Step S105: and correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map.
The corrected feature depth map D_m ∈ R^(H_F×W_F×D) is obtained after correcting the feature depth map D_C, where R denotes the real numbers, H_F and W_F are the dimensions of the image features, and D is the number of depth quantization bins.
Step S106: and detecting the exterior of the automatic driving vehicle according to the corrected feature depth map and the image feature map, and outputting a 3D vehicle detection result.
The 3D vehicle detection results include a 3D position, a length, a width, a speed, and a classification score of a target vehicle outside the autonomous vehicle.
It should be noted that the image feature map utilized in this step may be a feature map obtained by directly extracting features from the external environment image of the vehicle, or an enhanced image feature map obtained by further enhancing such a feature map; both fall within the protection scope of the present application.
In this embodiment, when detecting vehicles, a road layout aerial view of the current position of the autonomous vehicle and the weights of the image features in the image feature map belonging to road elements are obtained, and a corrected image for the feature depth map is generated from the road layout aerial view. The feature depth map is then corrected using the corrected image and the weights. Since the road layout aerial view includes road elements, the corrected image improves the feature conversion accuracy of non-vehicle areas in the feature depth map and reduces the influence of background information on vehicle detection, thereby improving the accuracy of the feature depth map, the accuracy of mapping image features to 3D space, and ultimately the accuracy of the vehicle detection result.
On the basis of the above embodiments, in one embodiment of the present application, a vehicle detection method includes:
step S201: acquiring a feature depth map corresponding to the image feature map; the image feature map corresponds to an image of the environment outside the vehicle.
Step S202: acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout bird's eye view includes road elements.
Step S203: and estimating the weight of the image feature belonging to the road element in the image feature map.
Step S204: and generating a road element depth feature map and a road element mask map according to the road layout aerial view.
The corrected image in this embodiment includes a road element depth feature map and a road element mask map.
The road element depth feature map is beneficial to improving feature conversion accuracy of non-vehicle regions in image features, reducing influence of background information on vehicle detection, and further improving accuracy of vehicle detection.
As one embodiment, generating the road element depth feature map and the road element mask map from the road layout bird's eye view map includes:
step S2041: and quantifying the road layout aerial view to obtain a road information map.
The road information map is obtained based on the coordinate system of the automatic driving vehicle.
Optionally, quantifying the road layout aerial view, and obtaining the road information map includes:
screening points in the road layout aerial view, which are located in the road element range, and points which are not located in the road element range;
setting the value of a point in the road element range to be 1, and setting the value of a point which is not in the road element range to be 0, thereby obtaining a road information graph.
A rectangular area of preset length and width under the bird's-eye view can be quantized into a road information map of size H_r × W_r, where H_r and W_r are the height and width of the road information map, respectively. The value of each point in the road information map is 0 or 1: the value is 1 when the point is located within a road element and 0 when it is not.
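A minimal NumPy sketch of this quantization; inside_road(x, y) is a hypothetical helper that tests whether a bird's-eye-view point lies within a road element of the layout, and the ranges and resolution are illustrative:

import numpy as np

def build_road_info_map(inside_road, x_min, x_max, y_min, y_max, res=0.5):
    # Quantize a rectangular BEV area into a 0/1 road information map.
    # inside_road(x, y) -> bool is an assumed membership test.
    xs = np.arange(x_min, x_max, res)
    ys = np.arange(y_min, y_max, res)
    info = np.zeros((len(ys), len(xs)), dtype=np.uint8)  # H_r x W_r
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            info[i, j] = 1 if inside_road(x, y) else 0
    return info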
Step S2042: and determining coordinates of points corresponding to all the road elements in the road information graph in a world coordinate system to form a road element point set.
As an implementation manner, when the road element is a road surface, determining coordinates of points corresponding to all the road elements in the road information map in a world coordinate system includes:
and determining X values, Y values and Z values of points corresponding to all road surfaces in the road information map in a world coordinate system, wherein the Z value is equal to 0.
Since the road surface is horizontal, Z = 0 in the world coordinate system. Correspondingly, the set of road surface points is {(x, y, 0)}, where (x, y, 0) are the coordinates of a road surface point in the world coordinate system, taken over all points whose corresponding value in the road surface information map is 1.
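Continuing the sketch above, the road element point set can be collected from the road information map as follows (grid origin and resolution are assumed parameters):

import numpy as np

def road_points_from_info_map(info, x_min, y_min, res=0.5):
    # Collect world coordinates (X, Y, Z=0) of every point whose value
    # in the road information map is 1 (horizontal-road assumption).
    ii, jj = np.nonzero(info)
    xs = x_min + jj * res
    ys = y_min + ii * res
    zs = np.zeros_like(xs, dtype=float)
    return np.stack([xs, ys, zs], axis=1)  # (N, 3) road element point set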
Step S2043: and converting each coordinate in the road element point set into an image coordinate system to obtain a road element depth map and the road element mask map.
The road element depth map is of size H_F × W_F, where H_F and W_F are the dimensions of the image features; each of its values represents the depth value of a road element point in the image coordinate system.

The road element mask map M_r is of size H_F × W_F. Each value of the road element mask map is 0 or 1, where 0 represents that no road element point exists at the current location and 1 represents that a road element point exists at the current location.
Optionally, converting each of the coordinates in the set of road element points into an image coordinate system includes:
and converting each coordinate in the road element point set into an image coordinate system through an image acquisition device parameter, a rotation translation relation between the image acquisition device and the automatic driving vehicle, and a rotation translation relation between the pose of the automatic driving vehicle and the world coordinate system.
Optionally, converting each of the coordinates in the set of road element points into an image coordinate system includes:
and converting each coordinate in the road element point set into an image coordinate system by matrix operation.
The specific process of converting each coordinate in the road element point set to the image coordinate system may refer to the related art, and will not be described in detail herein.
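A hedged sketch of this conversion chain: K is the 3x3 camera intrinsic matrix, T_cam_ego and T_ego_world are 4x4 rotation-translation matrices, and stride is an assumed pixel-to-feature downsampling factor; all names are illustrative:

import numpy as np

def project_road_points(points_w, K, T_cam_ego, T_ego_world, hf, wf, stride=8):
    # Convert world-frame road element points into the image coordinate
    # system by matrix operations, producing a road element depth map and
    # mask map at feature resolution (H_F x W_F).
    n = points_w.shape[0]
    pts = np.concatenate([points_w, np.ones((n, 1))], axis=1).T  # (4, N)
    cam = (T_cam_ego @ (T_ego_world @ pts))[:3]                  # camera frame
    z = cam[2]
    front = z > 0.1                      # keep points in front of the camera
    uv = (K @ cam[:, front]) / z[front]  # homogeneous -> pixel coordinates
    u = np.floor(uv[0] / stride).astype(int)
    v = np.floor(uv[1] / stride).astype(int)
    ok = (u >= 0) & (u < wf) & (v >= 0) & (v < hf)
    depth_map = np.zeros((hf, wf))
    mask = np.zeros((hf, wf), dtype=np.uint8)
    depth_map[v[ok], u[ok]] = z[front][ok]  # last write wins; a min-depth
    mask[v[ok], u[ok]] = 1                  # z-buffer would also be reasonable
    return depth_map, mask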
Step S2044: and obtaining the road element depth feature map according to the road element depth map.
Optionally, obtaining the road element depth feature map according to the road element depth map includes:
step S2044a: expanding each element value in the road element depth map into a D-dimensional vector; the element values are depth values corresponding to points in the road element depth map in an image acquisition equipment coordinate system.
The dimension D in the D-dimensional vector is the same value as the depth quantization number D.
The values of the D-dimensional vector follow a Gaussian function whose mean is the element value (the depth of the corresponding road element point) and whose variance is a preset value.
Step S2044b: the D-dimensional vector is normalized such that the sum of all values in the D-dimensional vector is equal to 1.
The obtained road element depth feature map D_r ∈ R^(H_F×W_F×D), where R denotes the real numbers, H_F and W_F are the dimensions of the image features, and D is the dimension of the vector into which each element value of the road element depth map is expanded.
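A minimal NumPy sketch of this expansion and normalization; sigma is an assumed hyperparameter, and gating by the mask map is an assumption for tidiness (formula (1) only consults D_r where the mask is 1):

import numpy as np

def expand_depth_to_bins(depth_map, mask, depth_min, depth_max, num_bins, sigma=1.0):
    # Expand each road depth value into a normalized D-dimensional vector
    # whose entries follow a Gaussian centered on the depth value.
    centers = np.linspace(depth_min, depth_max, num_bins)    # bin centers
    d = depth_map[..., None]                                 # (H_F, W_F, 1)
    feat = np.exp(-((centers - d) ** 2) / (2 * sigma ** 2))  # Gaussian per bin
    feat /= feat.sum(axis=-1, keepdims=True) + 1e-9          # bins sum to 1
    feat *= mask[..., None]                                  # zero non-road cells
    return feat                                              # D_r: (H_F, W_F, D)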
Step S205: and correcting the feature depth map through weighted summation and masking by using the road element depth feature map, the road element mask map and the weights to obtain a corrected feature depth map.
Optionally, using the road element depth feature map, the road element mask map, and the weights, correcting the feature depth map through weighted summation and masking, and obtaining a corrected feature depth map includes:
correcting the characteristic depth map according to a preset formula, wherein the preset formula is as follows:

D_m(u,v) = (1 - W_r(u,v)·M_r(u,v))·D_C(u,v) + W_r(u,v)·M_r(u,v)·D_r(u,v)  (1);

where D_m(u,v) is the depth feature of point (u,v) in the corrected feature depth map, W_r(u,v) is the value corresponding to point (u,v) in the road element weight map corresponding to the image feature map, M_r(u,v) is the value corresponding to point (u,v) in the weight map M_r corresponding to the road element mask map, D_r(u,v) is the feature corresponding to point (u,v) of the road element depth feature map, and D_C(u,v) is the feature corresponding to point (u,v) of the feature depth map. Since every point of the road element mask map participates in the operation and takes only the value 0 or 1, these values act as weights and form a weight map; the weight map M_r corresponding to the road element mask map is therefore essentially the road element mask map itself.

In the corrected feature depth map, positions without road elements (M_r(u,v) = 0) are dominated by the depth estimated from the image features, while positions with road elements (M_r(u,v) = 1) are a weighted combination of the image feature depth estimate and the road element depth feature map.
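In code, formula (1) reduces to a per-location weighted sum, as in the following minimal NumPy sketch (shapes follow the definitions above):

import numpy as np

def correct_feature_depth(d_c, d_r, w_r, m_r):
    # Weighted-sum-and-mask correction of the feature depth map.
    # d_c, d_r: (H_F, W_F, D); w_r, m_r: (H_F, W_F).
    w = (w_r * m_r)[..., None]        # combined road weight per location
    return (1.0 - w) * d_c + w * d_r  # corrected feature depth map D_m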
Step S206: and detecting the exterior of the automatic driving vehicle according to the corrected feature depth map and the image feature map, and outputting a 3D vehicle detection result.
On the basis of any one of the foregoing embodiments, in one embodiment of the present application, the vehicle detection method further includes:
acquiring the vehicle external environment image acquired by vehicle-mounted image acquisition equipment;
and extracting image features of the vehicle external environment image to obtain the image feature map, wherein the image feature map comprises the image features.
Image feature extraction can be performed through a deep neural network to obtain image features F_0 ∈ R^(H_F×W_F×C), where R denotes the real numbers, H_F and W_F are the dimensions of the image features, and C is the number of channels of the image features. The specific extraction process may refer to the related art and is not described in detail here.
On the basis of any one of the foregoing embodiments, in one embodiment of the present application, the vehicle detection method further includes:
enhancing the image features in the image feature map to obtain an enhanced image feature map;
correspondingly, the vehicle detection according to the corrected feature depth map and the image feature map comprises the following steps:
and detecting the vehicle according to the corrected characteristic depth map and the enhanced image characteristic map.
The image features in the image feature map are input into a fourth preset neural network model to obtain the enhanced image feature map. The fourth preset neural network model is a multi-layer convolutional neural network whose number of layers and channels can be set as needed.

The enhanced image feature map F_c ∈ R^(H_F×W_F×C), where R denotes the real numbers, H_F and W_F are the dimensions of the image features, and C is the number of channels of the image features.
In the embodiment, the image characteristics are enhanced, so that the expression capability of the image characteristics can be further improved, and the accuracy of vehicle detection is further improved.
Referring to fig. 3, in one embodiment of the present application, the vehicle detection method includes:
step S301: acquiring a feature depth map corresponding to the image feature map; the image feature map corresponds to an image of the environment outside the vehicle.
Step S302: acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout bird's eye view includes road elements.
Step S303: and estimating the weight of the image feature belonging to the road element in the image feature map.
Step S304: and generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view.
Step S305: and correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map.
Step S306: and generating a 3D feature map under an image coordinate system by the corrected feature depth map and the image feature map.
The process of generating the 3D feature map from the corrected feature depth map D_m and the image feature map may refer to the related art and is not described in detail here. The 3D feature map G_m ∈ R^(H_F×W_F×D×C) in the image coordinate system has, at each point, the feature G_m(u,v,i) = D_m(u,v,i) × F_C(u,v), where R denotes the real numbers, H_F and W_F are the dimensions of the image features, D is the number of depth quantization bins, and C is the number of channels of the image features.
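This per-point product is an outer product between the depth distribution and the feature vector at each location; a minimal PyTorch sketch:

import torch

def lift_to_3d(d_m: torch.Tensor, f_c: torch.Tensor) -> torch.Tensor:
    # G_m(u, v, i) = D_m(u, v, i) * F_c(u, v).
    # d_m: (H_F, W_F, D) corrected depth distribution,
    # f_c: (H_F, W_F, C) image features -> G_m: (H_F, W_F, D, C).
    return d_m.unsqueeze(-1) * f_c.unsqueeze(-2)  # broadcast outer product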
Step S307: the 3D feature map is converted to a BEV feature map.
As an implementation manner, converting the 3D feature map into the BEV feature map includes:
converting the characteristics of each point in the 3D characteristic map into a vehicle coordinate system through the parameters of the image acquisition equipment and the rotation translation relation between the image acquisition equipment and the vehicle;
Carrying out voxelization on points in the vehicle coordinate system, accumulating the characteristic points falling into the same voxel grid, setting the characteristic of the voxel grid without the characteristic points falling into 0, and forming a 3D characteristic map in the vehicle coordinate system;
and accumulating the features corresponding to the voxel grids at all heights in the height dimension to obtain the BEV feature map.
The specific process of converting the 3D feature map into the BEV feature map may refer to the related art, and will not be described in detail herein.
The BEV (Bird's-Eye-View) feature map is of size X × Y × C, where X and Y represent the dimensions of the BEV feature map and C is the number of channels of the image features.
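A minimal PyTorch sketch of the voxelization and height accumulation described above; the ranges and voxel size are illustrative assumptions:

import torch

def splat_to_bev(points_ego, feats, x_range, y_range, z_range, voxel=0.5):
    # Voxelize 3D feature points in the vehicle frame, sum the features
    # falling into the same voxel grid (empty voxels stay 0), then
    # accumulate over the height dimension to obtain the BEV feature map.
    # points_ego: (N, 3) xyz in the vehicle frame; feats: (N, C).
    nx = int((x_range[1] - x_range[0]) / voxel)
    ny = int((y_range[1] - y_range[0]) / voxel)
    nz = int((z_range[1] - z_range[0]) / voxel)
    ix = torch.floor((points_ego[:, 0] - x_range[0]) / voxel).long()
    iy = torch.floor((points_ego[:, 1] - y_range[0]) / voxel).long()
    iz = torch.floor((points_ego[:, 2] - z_range[0]) / voxel).long()
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < nz)
    ix, iy, iz, feats = ix[keep], iy[keep], iz[keep], feats[keep]
    grid = torch.zeros(nx, ny, nz, feats.shape[1])
    flat = grid.view(-1, feats.shape[1])
    flat.index_add_(0, (ix * ny + iy) * nz + iz, feats)  # sum per voxel
    return grid.sum(dim=2)                               # (X, Y, C) BEV map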
Step S308: and detecting the vehicle by using the BEV characteristic diagram.
And (3) detecting the vehicle through a neural network by utilizing the BEV characteristic diagram, and outputting the 3D position, the length, the width, the height, the speed and the classification score of the target vehicle outside the automatic driving vehicle. The detection process may refer to the related art, and will not be described in detail herein.
On the basis of any one of the foregoing embodiments, in one embodiment of the present application, the vehicle detection method further includes:
acquiring the external environment image of the vehicle; the number of the external environment images of the vehicle is at least two, and each external environment image of the vehicle is acquired by vehicle-mounted image acquisition equipment at different positions;
Correspondingly, correcting the feature depth map by using the corrected image and the weight, and obtaining the corrected feature depth map includes:
correcting each characteristic depth map by using the corrected image and the weight to obtain a plurality of corrected characteristic depth maps;
according to the corrected feature depth map and the image feature map, vehicle detection is carried out on the outside of the automatic driving vehicle, and the output of the 3D vehicle detection result comprises:
according to each corrected feature depth map and the image feature map corresponding to the corrected feature depth map, vehicle detection is carried out on the exterior of the automatic driving vehicle, and a 3D vehicle detection result to be combined is obtained;
and combining all the 3D vehicle detection results to be combined, and outputting a 3D vehicle detection result.
It can be understood that the number of the feature depth maps is equal to the number of the external environment images of the vehicle, and accordingly, how many external environment images of the vehicle exist can obtain how many corrected feature depth maps. And carrying out one-time vehicle detection by utilizing the corrected feature depth map and the image feature map corresponding to the same vehicle external environment image, correspondingly obtaining a 3D vehicle detection result to be combined, and finally combining all the 3D vehicle detection results to be combined and outputting.
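The application does not prescribe a specific merging rule; as one hedged illustration, the per-camera detection lists can be concatenated and de-duplicated by score-ordered suppression of boxes whose bird's-eye-view centers nearly coincide:

import numpy as np

def merge_detections(det_lists, dist_thresh=2.0):
    # Merge per-camera 3D detections: keep the highest-scoring detection
    # among those whose BEV centers are closer than dist_thresh meters.
    # Each detection is a dict with at least 'center' (x, y) and 'score';
    # this greedy rule is an assumption, not mandated by the application.
    dets = sorted((d for lst in det_lists for d in lst),
                  key=lambda d: d["score"], reverse=True)
    merged = []
    for d in dets:
        c = np.asarray(d["center"], dtype=float)
        if all(np.linalg.norm(c - np.asarray(m["center"], dtype=float)) > dist_thresh
               for m in merged):
            merged.append(d)
    return merged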
The following describes a vehicle detection method in the present application, taking a road element as an example of a road surface. The corresponding architecture diagram of the vehicle detection method is shown in fig. 4.
Step 1: acquire the vehicle external environment image captured by a camera, and extract image features from it through a third preset deep neural network to obtain an image feature map F_0 ∈ R^(H_F×W_F×C), where R denotes the real numbers, H_F and W_F are the dimensions of the image features, and C is the number of channels of the image features;
step 2, image feature map F 0 Inputting the image characteristics into a fourth preset neural network model, and carrying out image characteristic enhancement on the image characteristics to obtain an enhanced image characteristic diagram
Figure SMS_47
Wherein R is a real number, H F and WF C is the channel number of the image feature;
step 3, image feature F 0 Inputting the image feature map into a first preset neural network model to obtain a feature depth map corresponding to the image feature map
Figure SMS_48
Wherein R is a real number, H F and WF The dimension of the image features is D, namely the number of depth quantization;
step 4, image feature map F 0 Inputting a second preset neural network model to obtain a road weight corresponding to the image feature map
Figure SMS_49
The weight of each image feature is between 0 and 1, and the road weight W r Representing the probability that the image features belong to the road surface;
step 5, obtaining road surface images through aerial photography, mapping or marking, and quantifying rectangular areas with specified length and width under bird views into road surface information maps based on an automatic driving vehicle coordinate system
Figure SMS_50
Wherein R is a real number, H r and Wr The height and the width of the pavement information map are respectively;
step 6, calculating the world coordinate system of all the points of the road surface in the road surface information mapThe X and Y values below and the Z value is set to 0 (representing a horizontal road surface) to form a set of road surface points in the world coordinate system
Figure SMS_51
;/>
Step 7: using the available camera intrinsic parameters, the rotation-translation relation between the camera and the vehicle, and the rotation-translation relation between the vehicle pose and the world coordinate system, convert the road surface points into the image coordinate system through matrix operations to form a road surface depth map and a road surface mask map, both of size H_F × W_F, where H_F and W_F are the dimensions of the image features;
step 8, mapping the road surface depth
Figure SMS_54
Expanding each element value into a D-dimensional vector, and normalizing to obtain a pavement depth feature map ++>
Figure SMS_55
Wherein R is a real number, H F and WF For the scale of the image features, D is the road element depth map +. >
Figure SMS_56
The dimension of the vector of each element value expansion;
step 9, inputting a characteristic depth map D C Road surface depth feature map D r Road weight W corresponding to image feature map r With road mask pattern
Figure SMS_57
Obtaining a corrected depth map +.>
Figure SMS_58
R is a real number, H F and WF For image featuresD is the number of quantized depths, and each value in the modified depth map is obtained by using formula (1):
Figure SMS_59
(1);
step 10, correcting the depth map D by the prior art m And enhanced image feature map F c Generating 3D feature maps in an image coordinate system
Figure SMS_60
Wherein R is a real number, H F and WF The dimension of the image features is that D is the number of depth quantization and C is the number of channels of the image features;
step 11, 3D characteristic diagram G m Conversion to BEV feature maps
Figure SMS_61
Wherein R is a real number, X, Y represents the dimension of the BEV feature map, and C is the number of channels of the image feature;
and 12, using the BEV characteristic diagram, performing vehicle detection through a neural network, and outputting the 3D position, the length, the width, the height, the speed and the classification score of the target vehicles around the automatic driving vehicle.
The application also provides a vehicle detection model training method, wherein the vehicle detection model comprises a first preset neural network model and a second preset neural network model, and the method comprises the following steps:
Step S401: inputting an image feature map into the first preset neural network model for depth estimation to obtain a feature depth map corresponding to the image feature map; the image feature map corresponds to an external environment image of the vehicle;
step S402: acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements;
step S403: inputting the image features in the image feature map into the second preset neural network model, and estimating the weights of the image features belonging to the road elements;
step S404: generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view;
step S405: correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map;
step S406: according to the corrected feature depth map and the image feature map, detecting the exterior of the automatic driving vehicle, and outputting a 3D vehicle detection result;
step S407: and training the vehicle detection model based on the loss function to obtain a trained vehicle detection model.
On the basis of the above embodiment, in one embodiment of the present application, the vehicle detection model training method further includes:
Setting the region with the road element labeling error in the road element mask map to 0, and generating a corrected mask map;
and training a second preset neural network model by using the corrected mask map.
Each value of the road surface mask map M_r is 0 or 1, where 0 represents that no road element point exists at the current location and 1 represents that a road element point exists at the current location. A region with a road element labeling error is a region that is not actually a road element but is labeled as one in the road element mask map.
In this embodiment, the second preset neural network model is trained through the corrected mask map, so that the weight obtained through the trained second preset neural network model is more accurate, and further the vehicle detection accuracy is improved.
On the basis of any one of the foregoing embodiments, in one embodiment of the present application, training the vehicle detection model based on a loss function includes:
training the vehicle detection model based on a preset formula; the preset formula is:
L = w_1·L_d + w_2·L_cls + w_3·L_reg + w_4·L_r  (2);

where L_d is the depth estimation loss of the image features, L_cls is the vehicle classification loss, L_reg is the vehicle position, length, width and speed estimation loss, L_r is the training loss of the road weight corresponding to the image feature map, w_1 is the weight of the depth estimation loss of the image features, w_2 is the weight of the vehicle classification loss, w_3 is the weight of the vehicle position, length, width and speed estimation loss, and w_4 is the weight of the training loss of the road weight corresponding to the image feature map.
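A one-line sketch of formula (2); the individual loss terms come from the respective heads, and the weight values here are illustrative assumptions:

import torch

def total_loss(l_d, l_cls, l_reg, l_r, w=(1.0, 1.0, 1.0, 1.0)):
    # Formula (2): weighted sum of the image feature depth estimation
    # loss, vehicle classification loss, vehicle position/size/speed
    # estimation loss, and road weight training loss.
    return w[0] * l_d + w[1] * l_cls + w[2] * l_reg + w[3] * l_r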
The following describes a vehicle detection device provided in an embodiment of the present application, and the vehicle detection device described below and the vehicle detection method described above may be referred to correspondingly to each other.
Fig. 5 is a block diagram of a vehicle detection device according to an embodiment of the present application, and referring to fig. 5, the vehicle detection device may include:
a first obtaining module 100, configured to obtain a feature depth map corresponding to an image feature map; the image feature map corresponds to an external environment image of the vehicle;
a second obtaining module 200, configured to obtain a road layout aerial view of a current position of the autopilot vehicle; the road layout aerial view comprises road elements;
an estimating module 300, configured to estimate a weight of the road element to which the image feature belongs in the image feature map;
a first generation module 400, configured to generate a corrected image of the road element for the feature depth map according to the road layout aerial view;
the correction module 500 is configured to correct the feature depth map by using the corrected image and the weight, so as to obtain a corrected feature depth map;
And the detection module 600 is configured to perform vehicle detection on the exterior of the autonomous vehicle according to the corrected feature depth map and the image feature map, and output a 3D vehicle detection result.
The vehicle detection apparatus of this embodiment is used to implement the foregoing vehicle detection method, so its specific implementation can be found in the corresponding parts of the vehicle detection method examples above. For example, the first acquisition module 100, the second acquisition module 200, the estimation module 300, the first generation module 400, the correction module 500, and the detection module 600 are respectively used to implement steps S101, S102, S103, S104, S105, and S106 of the vehicle detection method, so their specific implementations may refer to the descriptions of the corresponding examples and are not repeated here.
When the device of this embodiment detects vehicles, a road layout aerial view of the current position of the autonomous vehicle and the weights of the image features in the image feature map belonging to road elements are obtained, and a corrected image for the feature depth map is generated from the road layout aerial view. The feature depth map is then corrected using the corrected image and the weights. Since the road layout aerial view includes road elements, the corrected image improves the feature conversion accuracy of non-vehicle areas in the feature depth map and reduces the influence of background information on vehicle detection, thereby improving the accuracy of the feature depth map, the accuracy of mapping image features to 3D space, and ultimately the accuracy of the vehicle detection result.
Optionally, the first generating module 400 is specifically configured to: generating a road element depth feature map and a road element mask map according to the road layout aerial view map;
accordingly, the correction module 500 is specifically configured to: and correcting the feature depth map through weighted summation and masking by using the road element depth feature map, the road element mask map and the weights to obtain a corrected feature depth map.
Optionally, the correction module 500 is specifically configured to:
correcting the characteristic depth map according to a preset formula, wherein the preset formula is as follows:

D_m(u,v) = (1 - W_r(u,v)·M_r(u,v))·D_C(u,v) + W_r(u,v)·M_r(u,v)·D_r(u,v)

where D_m(u,v) is the depth feature of point (u,v) in the corrected feature depth map, W_r(u,v) is the value corresponding to point (u,v) in the road element weight map corresponding to the image feature map, M_r(u,v) is the value corresponding to point (u,v) in the weight map M_r corresponding to the road element mask map, D_r(u,v) is the feature corresponding to point (u,v) of the road element depth feature map, and D_C(u,v) is the feature corresponding to point (u,v) of the feature depth map.
Optionally, the first generating module 400 includes:
the quantization sub-module is used for quantizing the road layout aerial view to obtain a road information map;
the determining submodule is used for determining coordinates of points corresponding to all the road elements in the road information graph in a world coordinate system to form a road element point set;
The first conversion sub-module is used for converting each coordinate in the road element point set into an image coordinate system to obtain a road element depth map and the road element mask map;
and the obtaining submodule is used for obtaining the road element depth feature map according to the road element depth map.
Optionally, the first conversion sub-module is specifically configured to convert each coordinate in the road element point set into the image coordinate system through the image acquisition device parameters, the rotation-translation relation between the image acquisition device and the autonomous vehicle, and the rotation-translation relation between the pose of the autonomous vehicle and the world coordinate system.
Optionally, when converting each coordinate in the road element point set into the image coordinate system, the first conversion sub-module specifically performs the conversion by matrix operation.
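As an illustration of such a matrix-based conversion, the following sketch chains the calibration matrices into a standard pinhole projection; the matrix names, shapes, and coordinate conventions are assumptions for the example.

```python
import numpy as np

def world_to_image(points_w, K, T_cam_from_veh, T_veh_from_world):
    """Project world-frame road element points into the image plane.

    points_w: (N, 3) points in the world coordinate system.
    K: (3, 3) camera intrinsic matrix (image acquisition device parameters).
    T_cam_from_veh, T_veh_from_world: (4, 4) homogeneous rotation-translation
    matrices (camera<-vehicle and vehicle<-world), assumed known from
    calibration and the vehicle pose.
    Returns (N, 2) pixel coordinates (u, v) and (N,) camera-frame depths.
    """
    N = points_w.shape[0]
    pts_h = np.hstack([points_w, np.ones((N, 1))])               # homogeneous coords
    pts_cam = (T_cam_from_veh @ T_veh_from_world @ pts_h.T)[:3]  # camera frame
    depth = pts_cam[2]                                           # Z in camera frame
    uv = (K @ pts_cam)[:2] / depth                               # perspective divide
    return uv.T, depth
```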
Optionally, the obtaining submodule includes:
the expansion unit is used for expanding each element value in the road element depth map into a D-dimensional vector; the element values are depth values corresponding to points in the road element depth map in an image acquisition equipment coordinate system;
And the normalization unit is used for normalizing the D-dimensional vector so that the sum of all values in the D-dimensional vector is equal to 1.
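A sketch of this expansion follows, assuming a one-hot encoding over uniformly spaced depth bins, which is one simple scheme that satisfies the sum-to-1 normalization; the bin range and bin count are illustrative assumptions.

```python
import numpy as np

def depth_to_distribution(depth_map, d_min=1.0, d_max=60.0, D=48):
    """Expand each depth value in the road element depth map into a
    D-dimensional vector over discrete depth bins, normalized to sum to 1.
    depth_map: (H, W) depths in the image acquisition device coordinate system.
    """
    bins = np.linspace(d_min, d_max, D)
    idx = np.abs(depth_map[..., None] - bins).argmin(axis=-1)  # nearest bin index
    dist = np.zeros(depth_map.shape + (D,))
    np.put_along_axis(dist, idx[..., None], 1.0, axis=-1)      # one-hot: sums to 1
    return dist
```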
Optionally, the quantization submodule includes:
a screening unit, configured to screen points in the road layout aerial view that are located in the road element range and points that are not located in the road element range;
and the setting unit is used for setting the value of a point in the road element range to be 1 and setting the value of a point which is not in the road element range to be 0, so as to obtain a road information graph.
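A minimal sketch of this quantization, assuming the aerial view has been rasterized into a grid whose cell values indicate road membership (the threshold is an illustrative assumption):

```python
import numpy as np

def quantize_road_layout(bev_map, road_threshold=0.5):
    """Quantize the road layout bird's eye view into a binary road information
    map: points within a road element range become 1, all other points 0."""
    return (bev_map >= road_threshold).astype(np.uint8)

road_info = quantize_road_layout(np.array([[0.9, 0.1], [0.7, 0.0]]))  # [[1 0] [1 0]]
```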
Optionally, when the road element is a road surface, the determining submodule is specifically configured to determine the X, Y, and Z values, in the world coordinate system, of the points corresponding to all road surfaces in the road information map, where the Z value is equal to 0.
Optionally, the vehicle detection device further includes:
the feature enhancement module is used for enhancing the image features in the image feature map to obtain an enhanced image feature map;
accordingly, the detection module 600 is specifically configured to perform vehicle detection according to the corrected feature depth map and the enhanced image feature map.
Optionally, the detection module 600 includes:
the generation submodule is used for generating a 3D feature map under an image coordinate system from the corrected feature depth map and the image feature map;
A second conversion sub-module for converting the 3D feature map to a BEV feature map;
and the detection sub-module is used for detecting the vehicle by utilizing the BEV characteristic diagram.
Optionally, the second conversion submodule includes:
the conversion unit is used for converting the characteristics of each point in the 3D characteristic map into a vehicle coordinate system through the parameters of the image acquisition equipment and the rotation translation relation between the image acquisition equipment and the vehicle;
the voxelization unit is used for voxelization of points in the vehicle coordinate system, accumulating the characteristic points falling into the same voxel grid, setting the characteristic of the voxel grid without the characteristic point to be 0, and forming a 3D characteristic map in the vehicle coordinate system;
and the accumulation unit is used for accumulating the characteristics corresponding to the voxel grids at all heights in the height dimension to obtain the BEV characteristic diagram.
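The following sketch illustrates this voxelization and height-dimension accumulation with numpy; the grid definition and feature layout are assumptions for the example.

```python
import numpy as np

def features_to_bev(points_veh, feats, grid_min, voxel_size, grid_shape):
    """Voxelize vehicle-frame feature points and accumulate over height.

    points_veh: (N, 3) positions of 3D feature points in the vehicle frame.
    feats: (N, C) features attached to those points.
    grid_min, voxel_size, grid_shape: the (x, y, z) voxel grid definition.
    Features falling into the same voxel grid are accumulated, voxels with
    no feature point stay 0, and summing over the height dimension yields
    the BEV feature map.
    """
    idx = np.floor((points_veh - grid_min) / voxel_size).astype(int)
    X, Y, Z = grid_shape
    valid = np.all((idx >= 0) & (idx < np.array([X, Y, Z])), axis=1)
    idx, feats = idx[valid], feats[valid]
    voxels = np.zeros((X, Y, Z, feats.shape[1]))
    np.add.at(voxels, (idx[:, 0], idx[:, 1], idx[:, 2]), feats)  # per-voxel sums
    return voxels.sum(axis=2)                                    # (X, Y, C) BEV map
```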
Optionally, the vehicle detection device further includes:
the third acquisition module is used for acquiring the vehicle external environment image acquired by the vehicle-mounted image acquisition equipment;
the feature extraction module is used for extracting image features of the vehicle external environment image to obtain the image feature map, wherein the image feature map comprises the image features.
Optionally, the first obtaining module 100 is specifically configured to input the image feature map into a first preset neural network model for depth estimation, so as to obtain the feature depth map.
Optionally, the estimation module 300 is specifically configured to input the image feature into a second preset neural network model, so as to obtain a weight of the road element to which the image feature belongs.
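As one illustrative form such a second preset neural network model could take, the sketch below uses a small convolutional head with a sigmoid output so each pixel's road element weight lies in [0, 1]; the layer sizes are assumptions, not the disclosed design.

```python
import torch.nn as nn

class RoadWeightHead(nn.Module):
    """Maps an image feature map to a per-pixel weight of belonging to a
    road element. Channel counts are illustrative assumptions."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
            nn.Sigmoid())                 # keeps weights in [0, 1]

    def forward(self, feat):              # feat: (B, C, H, W) image feature map
        return self.net(feat)             # (B, 1, H, W) road element weight map
```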
Optionally, the vehicle detection device further includes:
a fourth acquisition module configured to acquire the vehicle external environment image; the number of the external environment images of the vehicle is at least two, and each external environment image of the vehicle is acquired by vehicle-mounted image acquisition equipment at different positions;
accordingly, the correction module 500 is specifically configured to correct each feature depth map by using the corrected image and the weights, to obtain a plurality of corrected feature depth maps;
the detection module 600 includes:
the detection sub-module is used for performing vehicle detection on the exterior of the autonomous vehicle according to each corrected feature depth map and its corresponding image feature map, to obtain 3D vehicle detection results to be combined;
and the combination sub-module is used for combining all the 3D vehicle detection results to be combined and outputting the 3D vehicle detection results.
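A minimal sketch of such a combination step, assuming per-camera detections arrive as lists of records and that an optional cross-camera non-maximum suppression hook handles vehicles seen by more than one camera:

```python
def merge_3d_detections(per_camera_results, nms_fn=None):
    """Combine 3D vehicle detection results from all cameras into one output.

    per_camera_results: a list with one list of detections per camera (the
    detection record format is an assumption). Results are concatenated; the
    optional NMS hook can drop duplicates where fields of view overlap.
    """
    merged = [det for cam_dets in per_camera_results for det in cam_dets]
    return nms_fn(merged) if nms_fn is not None else merged
```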
The following describes a detector provided in an embodiment of the present application, and the detector described below and the vehicle detection method described above may be referred to correspondingly to each other.
Fig. 6 is a block diagram of a detector according to an embodiment of the present application, where the detector may include:
a memory 11 for storing a computer program;
the processor 12 is configured to implement the steps of the vehicle detection method according to any one of the above embodiments and the steps of the vehicle detection model training method according to any one of the above embodiments when executing the computer program.
When the detector of this embodiment performs vehicle detection, it obtains a road layout aerial view of the current position of the autonomous vehicle and the weights with which the image features in the image feature map belong to road elements, and uses the road layout aerial view to generate a corrected image for the feature depth map. The feature depth map is then corrected with the corrected image and the weights. Because the road layout aerial view contains road elements, the corrected image improves the feature conversion precision of non-vehicle areas in the feature depth map and reduces the influence of background information on vehicle detection. This improves the accuracy of the feature depth map and, in turn, the accuracy of mapping image features into 3D space, so the vehicle detection result is more accurate.
The following describes a vehicle detection system provided in an embodiment of the present application, and the vehicle detection system described below and the vehicle detection method described above may be referred to correspondingly.
Referring to fig. 7, the present application further provides a vehicle detection system, including:
the detector 1 described in the above embodiment;
an image acquisition device 2 connected to the detector.
The image acquisition device is mounted on the autonomous vehicle for acquiring an image of a vehicle exterior environment outside the autonomous vehicle. The image capturing device may be a camera, a video camera, or the like.
It should be noted that the number of image capturing devices is not limited in this application; there may be one, or two or more. When there are two or more image capturing devices, they are set at different positions of the autonomous vehicle.
When the vehicle detection system of this embodiment performs vehicle detection, it obtains a road layout aerial view of the current position of the autonomous vehicle and the weights with which the image features in the image feature map belong to road elements, and uses the road layout aerial view to generate a corrected image for the feature depth map. The feature depth map is then corrected with the corrected image and the weights. Because the road layout aerial view contains road elements, the corrected image improves the feature conversion precision of non-vehicle areas in the feature depth map and reduces the influence of background information on vehicle detection. This improves the accuracy of the feature depth map and, in turn, the accuracy of mapping image features into 3D space, so the vehicle detection result is more accurate.
The present application further provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps of the vehicle detection method according to any one of the above embodiments, and the steps of the vehicle detection model training method according to any one of the above embodiments.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and reference may be made to the description of the method section for the relevant points.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The vehicle detection method, apparatus, detector, system, model training method, and computer readable storage medium provided by the present application are described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application; the description of the above examples is only intended to assist in understanding the method of the present application and its core ideas. It should be noted that those skilled in the art can make various improvements and modifications to the present application without departing from the principles of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (22)

1. A vehicle detection method, characterized by comprising:
acquiring a feature depth map corresponding to the image feature map; the image feature map corresponds to an external environment image of the vehicle;
Acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements;
estimating the weight of the image feature belonging to the road element in the image feature map;
generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view;
correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map;
according to the corrected feature depth map and the image feature map, detecting the exterior of the automatic driving vehicle, and outputting a 3D vehicle detection result;
generating a corrected image of the road element for the feature depth map according to the road layout aerial view comprises:
generating a road element depth feature map and a road element mask map according to the road layout aerial view map;
correspondingly, correcting the feature depth map by using the corrected image and the weight, and obtaining the corrected feature depth map includes:
and correcting the feature depth map through weighted summation and masking by using the road element depth feature map, the road element mask map and the weights to obtain a corrected feature depth map.
2. The vehicle detection method of claim 1, wherein correcting the feature depth map through weighted summation and masking by using the road element depth feature map, the road element mask map, and the weights to obtain a corrected feature depth map comprises:
correcting the characteristic depth map according to a preset formula, wherein the preset formula is as follows:
$$D'(u,v) = W(u,v)\,M_r(u,v)\,D_r(u,v) + \bigl(1 - W(u,v)\,M_r(u,v)\bigr)\,D(u,v)$$

wherein $D'(u,v)$ is the depth feature of point $(u,v)$ in the corrected feature depth map; $W(u,v)$ is the value at point $(u,v)$ of the road element weight map corresponding to the image feature map; $M_r(u,v)$ is the value at point $(u,v)$ of the weight map $M_r$ corresponding to the road element mask map; $D_r(u,v)$ is the feature of the road element depth feature map at point $(u,v)$; and $D(u,v)$ is the feature of the feature depth map at point $(u,v)$.
3. The vehicle detection method according to claim 1, wherein generating a road element depth feature map and a road element mask map from the road layout bird's eye view map includes:
quantifying the road layout aerial view to obtain a road information map;
determining coordinates of points corresponding to all the road elements in the road information graph in a world coordinate system to form a road element point set;
Converting each coordinate in the road element point set into an image coordinate system to obtain a road element depth map and the road element mask map;
and obtaining the road element depth feature map according to the road element depth map.
4. The vehicle detection method of claim 3, wherein converting each of the coordinates in the set of road element points into an image coordinate system comprises:
and converting each coordinate in the road element point set into an image coordinate system through an image acquisition device parameter, a rotation translation relation between the image acquisition device and the automatic driving vehicle, and a rotation translation relation between the pose of the automatic driving vehicle and the world coordinate system.
5. The vehicle detection method of claim 4, wherein converting each of the coordinates in the set of road element points into an image coordinate system comprises:
and converting each coordinate in the road element point set into an image coordinate system by matrix operation.
6. The vehicle detection method of claim 3, wherein obtaining the road element depth feature map from the road element depth map comprises:
Expanding each element value in the road element depth map into a D-dimensional vector; the element values are depth values corresponding to points in the road element depth map in an image acquisition equipment coordinate system;
the D-dimensional vector is normalized such that the sum of all values in the D-dimensional vector is equal to 1.
7. The vehicle detection method according to claim 3, wherein quantifying the road layout bird's eye view map includes:
screening points in the road layout aerial view, which are located in the road element range, and points which are not located in the road element range;
setting the value of a point in the road element range to be 1, and setting the value of a point which is not in the road element range to be 0, thereby obtaining a road information graph.
8. The vehicle detection method according to claim 3, wherein when the road element is a road surface, determining coordinates in a world coordinate system of points corresponding to all the road elements in the road information map includes:
and determining X values, Y values and Z values of points corresponding to all road surfaces in the road information map in a world coordinate system, wherein the Z value is equal to 0.
9. The vehicle detection method according to claim 1, characterized by further comprising:
enhancing the image features in the image feature map to obtain an enhanced image feature map;
correspondingly, the vehicle detection according to the corrected feature depth map and the image feature map comprises the following steps:
and detecting the vehicle according to the corrected characteristic depth map and the enhanced image characteristic map.
10. The vehicle detection method according to claim 1, wherein performing vehicle detection based on the corrected feature depth map and the image feature map includes:
generating a 3D feature map under an image coordinate system by the corrected feature depth map and the image feature map;
converting the 3D feature map to a BEV feature map;
and detecting the vehicle by using the BEV characteristic diagram.
11. The vehicle detection method of claim 10, wherein converting the 3D signature to a BEV signature comprises:
converting the characteristics of each point in the 3D characteristic map into a vehicle coordinate system through the parameters of the image acquisition equipment and the rotation translation relation between the image acquisition equipment and the vehicle;
performing voxelization on the points in the vehicle coordinate system, accumulating the features of the feature points falling into the same voxel grid, setting the feature of any voxel grid into which no feature point falls to 0, and forming a 3D feature map in the vehicle coordinate system;
And accumulating the features corresponding to the voxel grids at all heights in the height dimension to obtain the BEV feature map.
12. The vehicle detection method according to claim 1, characterized by further comprising:
acquiring the vehicle external environment image acquired by vehicle-mounted image acquisition equipment;
and extracting image features of the vehicle external environment image to obtain the image feature map, wherein the image feature map comprises the image features.
13. The vehicle detection method according to claim 12, wherein acquiring a feature depth map corresponding to the image feature map includes:
and inputting the image feature map into a first preset neural network model to perform depth estimation, and obtaining the feature depth map.
14. The vehicle detection method according to claim 1, wherein estimating weights of image features belonging to the road elements in the image feature map includes:
and inputting the image features into a second preset neural network model to obtain the weights of the image features belonging to the road elements.
15. The vehicle detection method according to any one of claims 1 to 14, characterized by further comprising:
acquiring the external environment image of the vehicle; the number of the external environment images of the vehicle is at least two, and each external environment image of the vehicle is acquired by vehicle-mounted image acquisition equipment at different positions;
Correspondingly, correcting the feature depth map by using the corrected image and the weight, and obtaining the corrected feature depth map includes:
correcting each characteristic depth map by using the corrected image and the weight to obtain a plurality of corrected characteristic depth maps;
according to the corrected feature depth map and the image feature map, vehicle detection is carried out on the outside of the automatic driving vehicle, and the output of the 3D vehicle detection result comprises:
according to each corrected feature depth map and the image feature map corresponding to the corrected feature depth map, vehicle detection is carried out on the exterior of the automatic driving vehicle, and a 3D vehicle detection result to be combined is obtained;
and combining all the 3D vehicle detection results to be combined, and outputting a 3D vehicle detection result.
16. A vehicle detection apparatus, characterized by comprising:
the first acquisition module is used for acquiring a characteristic depth map corresponding to the image characteristic map; the image feature map corresponds to an external environment image of the vehicle;
the second acquisition module is used for acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements;
the estimating module is used for estimating the weight of the image feature belonging to the road element in the image feature map;
The first generation module is used for generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view;
the correction module is used for correcting the characteristic depth map by utilizing the corrected image and the weight to obtain a corrected characteristic depth map;
the detection module is used for detecting the vehicle outside the automatic driving vehicle according to the corrected feature depth map and the image feature map and outputting a 3D vehicle detection result;
the first generation module is specifically configured to: generating a road element depth feature map and a road element mask map according to the road layout aerial view map;
correspondingly, the correction module is specifically configured to: and correcting the feature depth map through weighted summation and masking by using the road element depth feature map, the road element mask map and the weights to obtain a corrected feature depth map.
17. A vehicle detection model training method, wherein the vehicle detection model comprises a first preset neural network model and a second preset neural network model, the method comprising:
inputting an image feature map into the first preset neural network model for depth estimation to obtain a feature depth map corresponding to the image feature map; the image feature map corresponds to an external environment image of the vehicle;
Acquiring a road layout aerial view of the current position of the automatic driving vehicle; the road layout aerial view comprises road elements;
inputting the image features in the image feature map into the second preset neural network model, and estimating the weights of the image features belonging to the road elements;
generating a corrected image of the road element for the characteristic depth map according to the road layout aerial view;
correcting the characteristic depth map by using the corrected image and the weight to obtain a corrected characteristic depth map;
according to the corrected feature depth map and the image feature map, detecting the exterior of the automatic driving vehicle, and outputting a 3D vehicle detection result;
training the vehicle detection model based on a loss function to obtain a trained vehicle detection model;
generating a corrected image of the road element for the feature depth map according to the road layout aerial view comprises:
generating a road element depth feature map and a road element mask map according to the road layout aerial view map;
correspondingly, correcting the feature depth map by using the corrected image and the weight, and obtaining the corrected feature depth map includes:
And correcting the feature depth map through weighted summation and masking by using the road element depth feature map, the road element mask map and the weights to obtain a corrected feature depth map.
18. The vehicle detection model training method of claim 17, further comprising:
setting the region with the road element labeling error in the road element mask map to 0, and generating a corrected mask map;
training the second preset neural network model by using the corrected mask map.
19. The vehicle detection model training method according to claim 17 or 18, characterized in that training the vehicle detection model based on a loss function includes:
based on

$$L = \lambda_1 L_{\mathrm{depth}} + \lambda_2 L_{\mathrm{cls}} + \lambda_3 L_{\mathrm{box}} + \lambda_4 L_{\mathrm{road}}$$

training the vehicle detection model;

in the formula, $L_{\mathrm{depth}}$ is the image feature depth estimation loss; $L_{\mathrm{cls}}$ is the vehicle classification loss; $L_{\mathrm{box}}$ is the vehicle position, length-width-height, and speed estimation loss; $L_{\mathrm{road}}$ is the training loss of the road weight corresponding to the image feature map; $\lambda_1$ is the weight of the image feature depth estimation loss; $\lambda_2$ is the weight of the vehicle classification loss; $\lambda_3$ is the weight of the vehicle position, length-width-height, and speed estimation loss; and $\lambda_4$ is the weight of the road weight training loss.
20. A detector, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the vehicle detection method according to any one of claims 1 to 15 and the steps of the vehicle detection model training method according to any one of claims 17 to 19 when executing the computer program.
21. A vehicle detection system, characterized by comprising:
the detector of claim 20;
and the image acquisition device is connected with the detector.
22. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the vehicle detection method according to any one of claims 1 to 15 and the steps of the vehicle detection model training method according to any one of claims 17 to 19.
CN202310318133.2A 2023-03-29 2023-03-29 Vehicle detection method, device, detector, system and model training method Active CN116012806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310318133.2A CN116012806B (en) 2023-03-29 2023-03-29 Vehicle detection method, device, detector, system and model training method

Publications (2)

Publication Number Publication Date
CN116012806A CN116012806A (en) 2023-04-25
CN116012806B true CN116012806B (en) 2023-06-13

Family

ID=86035830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310318133.2A Active CN116012806B (en) 2023-03-29 2023-03-29 Vehicle detection method, device, detector, system and model training method

Country Status (1)

Country Link
CN (1) CN116012806B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180038475A (en) * 2015-08-03 2018-04-16 톰톰 글로벌 콘텐트 비.브이. METHODS AND SYSTEMS FOR GENERATING AND USING POSITIONING REFERENCE DATA
DE102019106625A1 (en) * 2019-03-15 2020-09-17 HELLA GmbH & Co. KGaA Method and device for determining a source of danger on a roadway
CN113936139A (en) * 2021-10-29 2022-01-14 江苏大学 Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN114998856B (en) * 2022-06-17 2023-08-08 苏州浪潮智能科技有限公司 3D target detection method, device, equipment and medium for multi-camera image
CN115578702B (en) * 2022-09-26 2023-12-05 北京百度网讯科技有限公司 Road element extraction method and device, electronic equipment, storage medium and vehicle
CN115797454B (en) * 2023-02-08 2023-06-02 深圳佑驾创新科技有限公司 Multi-camera fusion sensing method and device under bird's eye view angle

Also Published As

Publication number Publication date
CN116012806A (en) 2023-04-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant