CN116311148A - Large vehicle target detection method and device, electronic equipment and automatic driving automobile

Large vehicle target detection method and device, electronic equipment and automatic driving automobile

Info

Publication number
CN116311148A
Authority
CN
China
Prior art keywords
target detection
corner
visible
detection frame
features
Prior art date
Legal status
Pending
Application number
CN202211707518.XA
Other languages
Chinese (zh)
Inventor
段凯文
董嘉蓉
王昊
陈钊苇
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211707518.XA priority Critical patent/CN116311148A/en
Publication of CN116311148A publication Critical patent/CN116311148A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The disclosure relates to the technical field of automatic driving, in particular to the field of computer vision, and specifically to a large vehicle target detection method and device, electronic equipment, and an automatic driving automobile. The specific implementation scheme is as follows: acquiring point cloud data of a target detection object; predicting a first target detection frame and a plurality of visible corner points, respectively, based on the point cloud data; adjusting the vertex positions of the first target detection frame based on the positions of the visible corner points to obtain a second target detection frame; and performing recognition based on the second target detection frame to obtain a target detection result of the target detection object. The target detection frame and the visible corner points are predicted separately from the point cloud data, and the visible corner points are used to adjust the target detection frame. Because corner points lie at the edges of an object and carry richer semantic information, the corner-adjusted target detection frame is more accurate, which benefits downstream tasks such as vehicle speed prediction and vehicle tracking.

Description

Large vehicle target detection method and device, electronic equipment and automatic driving automobile
Technical Field
The disclosure relates to the technical field of automatic driving, in particular to the field of computer vision, and specifically to a large vehicle target detection method and device, electronic equipment, media, and an automatic driving automobile.
Background
During the running of an unmanned vehicle, attributes such as the position, size, orientation, and speed of surrounding vehicles need to be accurately identified so that driving decisions can be anticipated correctly, thereby improving driving safety and riding experience. For example, when the unmanned vehicle detects a vehicle ahead on its left and predicts that vehicle's position, size, and orientation, it can infer the vehicle's intention to change into the right lane and decelerate early to avoid the risk of a rear-end collision. Existing 3D object detection techniques typically use features near the center of an object to predict the object's center coordinates, length, width, height, and heading angle, and thereby determine its position, size, and orientation. However, this approach is not well suited to lidar-based vehicle detection, especially for large vehicles (typically vehicles longer than 10 meters). Because lidar points are reflected when the beam sweeps over the vehicle surface, the semantic information of the vehicle is mostly concentrated at the vehicle edges rather than at the object center. Although this problem can be alleviated by enlarging the detection receptive field, for a large vehicle the distance from the head or tail to the center is long, so enlarging the receptive field brings only a limited gain, and inaccurate detection of position, size, and orientation is still likely to occur. In addition, because a large vehicle is very long, its head and tail amplify any orientation error, which affects subsequent vehicle speed tracking and the driving planning of the unmanned vehicle.
Disclosure of Invention
The disclosure provides a large vehicle target detection method and device, a training method and training apparatus of a target detection model, electronic equipment, storage media, and an automatic driving automobile.
According to a first aspect of the present disclosure, there is provided a target detection method including:
acquiring point cloud data of a target detection object;
and predicting, based on the point cloud data, a first target detection frame and a plurality of visible corner points, respectively: extracting semantic features of the point cloud data; extracting corner features from the semantic features through a corner heatmap, and predicting the positions of a plurality of visible corner points based on the corner features; extracting target detection features based on the semantic features, and constructing the first target detection frame based on the target detection features;
adjusting the vertex position of the first target detection frame based on the position of the visible corner point to obtain a second target detection frame;
and identifying based on the second target detection frame to obtain a target detection result of the target detection object.
According to a second aspect of the present disclosure, there is provided an object detection apparatus including:
the acquisition module is configured to acquire point cloud data of a target detection object;
The prediction module is configured to respectively predict and obtain a first target detection frame and a plurality of visible corner points based on the point cloud data;
the prediction module includes:
a feature extraction unit configured to extract semantic features of the point cloud data;
a corner prediction unit configured to extract corner features from the semantic features through a corner heatmap, and to predict the positions of a plurality of visible corner points based on the corner features;
a target detection frame prediction unit configured to extract target detection features based on the semantic features, and construct the first target detection frame based on the target detection features;
the adjusting module is configured to adjust the vertex position of the first target detection frame based on the position of the visible corner point to obtain a second target detection frame;
and the target detection module is configured to recognize based on the second target detection frame to obtain a target detection result of the target detection object.
According to a third aspect of the present disclosure, there is provided a training method of a target detection model, including:
acquiring point cloud data of a target detection object;
acquiring a plurality of visible corner points as training labels;
inputting the point cloud data into a neural network for feature extraction to obtain semantic features, and respectively predicting to obtain a predicted target detection frame and a plurality of predicted visible corner points based on the semantic features;
And calculating the loss between the predicted visible corner point and the training label, and adjusting the weight of the neural network based on the loss to obtain a target detection model.
According to a fourth aspect of the present disclosure, there is provided a training apparatus of an object detection model, including:
the point cloud acquisition module is configured to acquire point cloud data of a target detection object;
the training label generation module is configured to acquire a plurality of visible corner points as training labels;
the training module is configured to input the point cloud data into a neural network for feature extraction to obtain semantic features, and respectively predict a predicted target detection frame and a plurality of predicted visible corner points based on the semantic features;
and the calculating module is configured to calculate the loss between the predicted visible corner point and the training label, and adjust the weight of the neural network based on the loss to obtain a target detection model.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of the above technical solutions.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the above-mentioned technical solutions.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the above-mentioned technical solutions.
According to an eighth aspect of the present disclosure, there is provided an automatic driving automobile, including the electronic device in the above technical solution.
The invention provides a large vehicle target detection method and device, electronic equipment, a medium, and an automatic driving automobile. A target detection frame and visible corner points are predicted separately from point cloud data, and the visible corner points are used to adjust the target detection frame. Because corner points lie at the edges of an object and carry richer semantic information, the corner-adjusted target detection frame is more accurate, which benefits downstream tasks such as vehicle speed prediction and vehicle tracking.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic step diagram of a method of large vehicle target detection in an embodiment of the present disclosure;
FIG. 2 is a frame diagram of an object detection model in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the relationship between a point cloud coordinate system and a heatmap in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a corner optimization target detection box in an embodiment of the present disclosure;
FIG. 5a is a schematic view of a ground-truth heatmap in an embodiment of the present disclosure;
FIG. 5b is a schematic view of a predicted heatmap obtained by training in an embodiment of the disclosure;
FIG. 6 is a schematic illustration of selection of corner points visible at the BEV viewing angle in an embodiment of the present disclosure;
FIG. 7 is a functional block diagram of a large vehicle object detection device in an embodiment of the present disclosure;
FIG. 8a is a training flow diagram of a target detection model in an embodiment of the present disclosure;
FIG. 8b is a detection flow diagram of a target detection model in an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a training apparatus of a target detection model in an embodiment of the present disclosure;
fig. 10 is a schematic block diagram of an example electronic device in an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Aiming at the problem of inaccurate detection of a large vehicle in the existing method, the disclosure provides a large vehicle target detection method, as shown in fig. 1, comprising:
step S101, acquiring point cloud data of a target detection object. The point cloud data may be acquired via radar. As shown in the target detection flowchart of fig. 2, the point cloud data acquired by the radar is input into the convolutional neural network.
Step S102, a first target detection frame and a plurality of visible corner points are predicted separately based on the point cloud data. The first target detection frame is a 3D target detection frame, and the visible corner points are corner points of the object under the BEV (bird's eye view, i.e., top view) view angle. A corner point can be defined from two different perspectives: a corner point is the intersection of two edges; a corner point is a feature point whose neighborhood has two principal directions. The areas where corner points are located are also typically stable, information-rich areas of the image, and these areas may have properties such as rotational invariance, scale invariance, affine invariance, and illumination invariance. A visible corner point is a vertex of the 3D target detection frame whose minimum distance l_i to the point cloud is smaller than the distance threshold θ; otherwise the corner point is invisible. First, semantic features of the point cloud data are extracted by a shared convolutional neural network, and then the 3D target detection frame of the object and the visible corner points of the object under the BEV view angle are predicted by two branches of the convolutional neural network. The branch that predicts the 3D target detection frame uses a common 3D target detection structure, and the prediction of the visible corner points uses the thermodynamic diagram (heatmap) structure shown in fig. 3.
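The two-branch prediction described in this step can be sketched roughly as follows. This is only a minimal illustrative sketch, not the network disclosed in the patent: the backbone depth, channel sizes, and the names TwoBranchDetector, box_head, corner_heatmap, corner_offset, and corner_center are assumptions introduced for the example.

```python
# Illustrative sketch (assumed architecture): a shared backbone over a BEV
# pseudo-image of the point cloud feeds two branches, one predicting 3D boxes
# and one predicting a visible-corner heatmap with offsets and center vectors.
import torch
import torch.nn as nn

class TwoBranchDetector(nn.Module):
    def __init__(self, in_channels=64, num_classes=1):
        super().__init__()
        # shared convolutional feature extractor over the BEV grid
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        # branch 1: conventional 3D detection head
        # (class scores + box parameters x, y, z, l, w, h, yaw per location)
        self.box_head = nn.Conv2d(128, num_classes + 7, 1)
        # branch 2: visible-corner prediction head
        self.corner_heatmap = nn.Conv2d(128, 1, 1)  # probability of a visible corner
        self.corner_offset = nn.Conv2d(128, 2, 1)   # sub-cell offset compensation
        self.corner_center = nn.Conv2d(128, 2, 1)   # vector pointing to the object center

    def forward(self, bev_features):
        shared = self.backbone(bev_features)
        boxes = self.box_head(shared)
        heatmap = torch.sigmoid(self.corner_heatmap(shared))
        offsets = self.corner_offset(shared)
        center_vecs = self.corner_center(shared)
        return boxes, heatmap, offsets, center_vecs

# usage on a dummy BEV pseudo-image (batch of 1, 64 channels, 200 x 200 grid)
model = TwoBranchDetector()
boxes, heatmap, offsets, center_vecs = model(torch.randn(1, 64, 200, 200))
```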
Step S103, the vertex positions of the first target detection frame are adjusted based on the positions of the visible corner points to obtain a second target detection frame. The vertex coordinates of the 3D target detection frame are fine-tuned using the positions of the predicted corner points; because the 3D target detection frame is a rigid body, adjusting its coordinates also fine-tunes the orientation of the target frame.
Step S104, recognition is performed based on the second target detection frame to obtain the target detection result of the target detection object. As shown in fig. 4, the first target detection frame (solid-line frame) formed by the original vertices A, B, C, D is adjusted using the predicted visible corner points E, F to obtain the second target detection frame (dashed-line frame) with E, F as vertices. The face of the target detection frame toward the host vehicle O is fine-tuned; because corner points lie at the edges of the object and carry richer semantic information, the corner-adjusted target detection frame is more accurate, which benefits downstream tasks such as vehicle speed prediction and vehicle tracking.
As an optional implementation manner, predicting the first target detection frame and the visible corner points separately based on the point cloud data includes: extracting semantic features of the point cloud data; extracting corner features from the semantic features through a corner heatmap, and predicting the positions of a plurality of visible corner points based on the corner features; extracting target detection features based on the semantic features, and constructing the first target detection frame based on the target detection features. The point cloud features are extracted by a shared convolutional neural network, and two small branches then predict the 3D target frame of the object and the visible corner points of the object under the BEV view angle, respectively. The branch associated with corner prediction is in fact a heatmap, which determines whether each location is a visible corner and gives the position of the visible corner. Fig. 3 shows the relationship between the point cloud and the heatmap under the BEV view angle. The heatmap is a two-dimensional matrix of shape (H/n) × (W/n), where H and W are the length and width of the visible range of the point cloud, respectively, and n is the reduction factor from the visible range of the point cloud to the heatmap size. The value at a position (W_i, H_i) is a confidence that represents the probability that this position is a visible corner. Fig. 5 (a) shows the ground-truth heatmap generated from the visible corner labels, in which the response at a corner location is maximal and the response in the corner's neighborhood decreases in a Gaussian manner. Fig. 5 (b) shows the predicted heatmap obtained by training; ideally the response around a corner is maximal and the response at other positions approaches zero. During training, let p_ij be the prediction confidence at a position on the heatmap and y_ij the label value at the corresponding position on the ground-truth heatmap; the designed loss function is as follows:
$$L_{hm}=-\frac{1}{N}\sum_{ij}\begin{cases}\left(1-p_{ij}\right)^{\alpha}\log p_{ij}, & y_{ij}=1\\ \left(1-y_{ij}\right)^{\beta}\,p_{ij}^{\alpha}\log\left(1-p_{ij}\right), & \text{otherwise}\end{cases}\qquad(1)$$

where N is the number of objects in the input frame, and α and β are two weight parameters.
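A short sketch of this corner-heatmap supervision is given below. It is an illustration only, written under the assumption that the loss takes the penalty-reduced focal-loss form shown above; the Gaussian radius, grid size, and helper names draw_gaussian and corner_heatmap_loss are not from the patent.

```python
# Illustrative sketch: build a Gaussian ground-truth heatmap around visible
# corners and evaluate a focal-style loss of the assumed form (1).
import numpy as np

def draw_gaussian(heatmap, cx, cy, sigma=2.0):
    """Write a Gaussian peak (response 1 at the corner, decaying around it)."""
    h, w = heatmap.shape
    ys, xs = np.ogrid[:h, :w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)

def corner_heatmap_loss(pred, gt, num_objects, alpha=2.0, beta=4.0, eps=1e-6):
    """Full penalty at corner peaks (gt == 1), reduced penalty elsewhere."""
    pos = gt == 1.0
    neg = ~pos
    pos_loss = ((1 - pred[pos]) ** alpha * np.log(pred[pos] + eps)).sum()
    neg_loss = ((1 - gt[neg]) ** beta * pred[neg] ** alpha *
                np.log(1 - pred[neg] + eps)).sum()
    return -(pos_loss + neg_loss) / max(num_objects, 1)

# ground-truth heatmap for two visible corners of one object on a 50 x 50 grid
gt = np.zeros((50, 50))
for cx, cy in [(10, 12), (30, 40)]:
    draw_gaussian(gt, cx, cy)
pred = np.clip(gt + 0.05 * np.random.rand(50, 50), 0.0, 1.0)  # a fake prediction
print(corner_heatmap_loss(pred, gt, num_objects=1))
```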
As an optional implementation manner, before extracting corner features from the semantic features through the corner heatmap and predicting the information of the plurality of visible corner points based on the corner features, the method further includes: predicting a set of compensation vectors for each corner point, which compensate for the information lost when each corner point is mapped from the point cloud coordinate system onto the corner heatmap.
In this embodiment, in order to predict the corner positions more accurately, a set of compensation vectors (offsets) needs to be predicted for each corner point. The offsets represent the information lost when a corner point is mapped from the point cloud coordinate system to the heatmap, and they compensate for that loss when the corner coordinates are mapped from the heatmap back to the original point cloud coordinate system. Corner prediction is performed on the heatmap, and the visible range of the point cloud coordinate system is a proportionally scaled version of the heatmap size, so information is lost when the input is mapped to the heatmap. For example, when the visible range of the point cloud coordinate system is 4 times the heatmap size, the position (102, 102) in the original point cloud coordinate system maps to the heatmap coordinate (⌊102/4⌋, ⌊102/4⌋) = (25, 25). If no compensation is added, this coordinate becomes (25×4, 25×4) = (100, 100) when mapped back to the input. The network therefore needs to predict offsets to compensate for the lost information. For example, with offset = (Δx, Δy), the coordinates mapped back to the input become ((25+Δx)×4, (25+Δy)×4). During training, the offsets are supervised by training the network with a Smooth L1 loss:
$$L_{off}=\frac{1}{N}\sum_{k=1}^{N}\operatorname{SmoothL1}\left(o_k,\hat{o}_k\right)\qquad(2)$$

where

$$o_k=\left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right),$$

x_k and y_k denote the abscissa and the ordinate of the k-th ground-truth corner point, n denotes the scaling factor (i.e., the stride) from the visible range of the point cloud coordinate system to the heatmap size, and ô_k is the predicted offset value.
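The stride-4 example above can be made concrete with a small sketch; the helper names encode_corner and decode_corner are introduced only for this illustration.

```python
# Sketch of the offset compensation: a corner at (102, 102) in the point cloud
# grid falls into heatmap cell (25, 25) at stride 4; the fractional remainder is
# the ground-truth offset that restores the lost precision when mapping back.
def encode_corner(x, y, stride=4):
    cell = (int(x // stride), int(y // stride))
    offset = (x / stride - cell[0], y / stride - cell[1])  # ground-truth offset target
    return cell, offset

def decode_corner(cell, offset, stride=4):
    # mapping back with the (predicted) offset compensates the quantization loss
    return ((cell[0] + offset[0]) * stride, (cell[1] + offset[1]) * stride)

cell, gt_offset = encode_corner(102, 102)    # -> cell (25, 25), offset (0.5, 0.5)
print(decode_corner(cell, (0.0, 0.0)))       # (100.0, 100.0): error without compensation
print(decode_corner(cell, gt_offset))        # (102.0, 102.0): recovered exactly
```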
As an optional implementation manner, extracting corner features from the semantic features through the corner heatmap and predicting the positions of a plurality of visible corner points based on the corner features further includes: predicting, based on the corner features, a vector from each visible corner point to the center of the target detection object, where the vector is used to judge whether visible corner points belong to the same object.
In this embodiment, in order to avoid ambiguity between two corner points that are close to each other but belong to different objects, each corner point additionally predicts a vector pointing to the center of its own object, and training is constrained with a Smooth L1 loss:
$$L_{ce}=\frac{1}{N}\sum_{k=1}^{N}\operatorname{SmoothL1}\left(v_k,\hat{v}_k\right)\qquad(3)$$

where

$$v_k=\left(ce_x-x_k,\;ce_y-y_k\right),$$

ce_x and ce_y denote the coordinates of the center point to which the k-th ground-truth corner point points, and v̂_k is the predicted vector.
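A sketch of how such a center vector might be used to decide whether a nearby corner belongs to a given object is shown below; the tolerance value and the helper corner_belongs_to_box are assumptions for illustration only.

```python
# Illustrative check: a corner plus its predicted center vector should land near
# the center of the object it belongs to, which disambiguates nearby corners
# that belong to different objects.
import math

def corner_belongs_to_box(corner_xy, center_vec, box_center_xy, tol=1.0):
    """True if the corner's predicted center vector points at the box center (BEV)."""
    pointed = (corner_xy[0] + center_vec[0], corner_xy[1] + center_vec[1])
    return math.dist(pointed, box_center_xy) <= tol

# a corner at (4.0, 2.0) whose vector points to about (9.0, 3.0):
print(corner_belongs_to_box((4.0, 2.0), (5.0, 1.0), (9.0, 3.0)))  # True: matches this object
print(corner_belongs_to_box((4.0, 2.0), (5.0, 1.0), (4.5, 7.0)))  # False: belongs elsewhere
```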
As an optional embodiment, step S103 of adjusting the vertex positions of the first target detection frame based on the positions of the visible corner points to obtain the second target detection frame includes: acquiring the positions of the several vertices of the first target detection frame facing the host vehicle; for each vertex, traversing all visible corner points, screening out the positions of two visible corner points that meet preset conditions, and adjusting the positions of the corresponding vertices of the first target detection frame to obtain the second target detection frame. The first target detection frame is a 3D target detection frame. As shown in fig. 4, for each first target detection frame the vertices a, b, c facing the host vehicle O are obtained, and after adjustment with the predicted visible corner points e, f, a second target detection frame (dashed-line frame) with e, f as vertices is obtained. The face of the target detection frame toward the host vehicle O is fine-tuned; because corner points lie at the edges of the object and carry richer semantic information, the corner-adjusted target detection frame is more accurate, which benefits downstream tasks such as vehicle speed prediction and vehicle tracking. Because the target detection frame is a rigid body, adjusting its coordinates also fine-tunes its orientation. Since the corner predictions are more accurate, the target detection frame fine-tuned with the corner points is more accurate; the effect is especially evident for large vehicles.
Further, the preset conditions that a predicted visible corner point must meet are: (1) the minimum distance between the visible corner point and the corresponding vertex is smaller than a preset distance threshold; (2) the prediction confidence of the visible corner point is larger than a set confidence threshold; (3) the visible corner point points to the center of the object to which the vertex belongs. The branch for target frame regression predicts a set of 3D target detection frames, while the branch for corner detection predicts a set of corner positions. First, the coordinates of the 3D target detection frame under the BEV view angle are obtained; for each target detection frame the three vertices facing the host vehicle are obtained, and for each vertex v_i (i = 0, 1, 2) all predicted corner points are traversed. Corner points that meet all three conditions are kept, and those that do not are discarded. If the 3 vertices of the object select 3 qualifying corner points, the two corner points closest to the host vehicle are chosen, and the two corresponding vertices of the first target detection frame are aligned to them, which fine-tunes the predicted target frame; if the 3 vertices select 2 qualifying corner points, these two corner points are used to fine-tune the first target detection frame; if fewer than 2 qualifying corner points are selected, the predicted first target detection frame is not fine-tuned. Fig. 8b is a flowchart of target detection in an embodiment of the disclosure: the point cloud data is input into a pre-trained 3D target detection model, which contains a heatmap branch and a 3D target detection branch; the visible corner points are predicted by the heatmap, the 3D target detection frame (i.e., the first target detection frame) is predicted by the 3D target detection structure, and the 3D target detection frame is then optimized using the visible corner points. During corner optimization, the height of the 3D target detection frame can be removed to convert it into a 2D target detection frame, the vertices of the 2D target detection frame are adjusted as shown in fig. 4, and the adjusted 2D target detection frame is then given back its height and converted into a 3D target detection frame, which reduces the computation in the corner optimization process.
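The screening and fine-tuning procedure above can be sketched in BEV coordinates as follows. This is an illustrative simplification under assumptions: the thresholds are arbitrary, the corner data layout is invented for the example, and the rigid alignment of the box is reduced to snapping the matched vertices to their corner points.

```python
# Illustrative BEV refinement sketch for the three screening conditions above.
import math

def refine_box_bev(box_vertices, host_xy, corners,
                   dist_thr=0.5, conf_thr=0.3, center_tol=1.0):
    """box_vertices: the four BEV vertices (x, y) of the first detection frame.
    corners: predicted visible corners as dicts {"xy", "conf", "center_vec"}.
    Returns vertices adjusted to qualifying corners, or the originals when fewer
    than two corners qualify (mirroring the rule described above)."""
    box_center = (sum(v[0] for v in box_vertices) / 4.0,
                  sum(v[1] for v in box_vertices) / 4.0)
    # the three vertices facing the host vehicle
    facing = sorted(box_vertices, key=lambda v: math.dist(v, host_xy))[:3]
    matches = []
    for v in facing:
        for c in corners:
            near_vertex = math.dist(v, c["xy"]) < dist_thr                   # condition (1)
            confident = c["conf"] > conf_thr                                 # condition (2)
            pointed = (c["xy"][0] + c["center_vec"][0],
                       c["xy"][1] + c["center_vec"][1])
            points_to_center = math.dist(pointed, box_center) < center_tol   # condition (3)
            if near_vertex and confident and points_to_center:
                matches.append((v, c["xy"]))
                break
    if len(matches) < 2:
        return list(box_vertices)         # fewer than 2 qualifying corners: no fine-tuning
    if len(matches) == 3:                 # keep the two corners closest to the host vehicle
        matches = sorted(matches, key=lambda m: math.dist(m[1], host_xy))[:2]
    snap = {v: c for v, c in matches}     # simplification: snap matched vertices to corners
    return [snap.get(v, v) for v in box_vertices]

# a long vehicle in BEV whose near side is observed by two predicted corners
box = [(10.0, 5.0), (10.0, 7.5), (22.0, 5.0), (22.0, 7.5)]
corners = [{"xy": (10.3, 4.9), "conf": 0.8, "center_vec": (5.7, 1.3)},
           {"xy": (10.2, 7.6), "conf": 0.7, "center_vec": (5.8, -1.4)}]
print(refine_box_bev(box, host_xy=(0.0, 6.0), corners=corners))
```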
Through this technical scheme, the target detection frame is fine-tuned by screening visible corner points that meet the preset conditions. Because corner points lie at the edges of objects and carry richer semantic information, and because a heatmap-based method predicts more accurate corner positions than a regression-based method, the corner-optimized face, i.e., the face of the object toward the host vehicle, is more accurate, which benefits downstream tasks such as vehicle speed prediction and vehicle tracking. At the same time, since the predicted target frame is a rigid body, the adjustment also fine-tunes its angle.
The present disclosure also provides an object detection apparatus 700, as shown in fig. 7, comprising:
an acquisition module 701 configured to acquire point cloud data of a target detection object. The point cloud data may be acquired via radar. As shown in the target detection flowchart of fig. 2, the point cloud data acquired by the radar is input into the convolutional neural network.
The prediction module 702 is configured to predict and obtain a first target detection frame and a plurality of visible corner points based on the point cloud data, respectively. The first object detection frame is a 3D object detection frame, the visible corner may be an object corner under a BEV (bird eye of view) angle, and the corner may be defined from two different angles: the corner point is the intersection of two edges; corner points are feature points within a neighborhood with two main directions. The areas where the corner points are located are also typically stable, informative areas in the image, which areas may have certain properties like rotational invariance, scale invariance, affine invariance and illumination intensity invariance. Wherein the visible corner point refers to the minimum distance l from the vertex of the 3D object detection frame i And (3) corner points smaller than the distance threshold value theta, otherwise, corner points which are invisible. Firstly, semantic features of point cloud data are extracted through a shared convolutional neural network, and then a 3D target detection frame of an object and visible corner points of the object under a BEV view angle are respectively predicted through two branches of the convolutional neural network. Wherein, the liquid crystal display device comprises a liquid crystal display device,branches of the predicted 3D object detection frame use a general 3D object detection structure, and predictions of visible corner points use a thermodynamic diagram (hetmap) structure shown in fig. 3.
An adjustment module 703 is configured to adjust the vertex position of the first target detection frame based on the position of the visible corner to obtain a second target detection frame. And fine-tuning the vertex coordinates of the 3D target detection frame by utilizing the positions of the prediction corner points, and adjusting the coordinates of the 3D target detection frame simultaneously achieves the aim of fine-tuning the orientation of the target frame due to the property that the 3D target detection frame is a rigid body.
The target detection module 704 is configured to identify based on the second target detection frame to obtain a target detection result of the target detection object. As shown in fig. 4, a first target detection frame (solid line frame) composed of the original vertices A, B, C, D is adjusted by the predicted visible corner E, F to obtain a second target detection frame (broken line frame) using E, F as the vertices. The surface of the target detection frame facing the host vehicle O is finely adjusted, and because the corner points are positioned at the edges of the object and contain richer semantic information, the target detection frame after corner point adjustment is more accurate, and the method is beneficial to the tasks of follow-up vehicle speed prediction, vehicle tracking and the like after lifting.
As an alternative embodiment, the prediction module 702 includes: a feature extraction unit configured to extract semantic features of the point cloud data; a corner prediction unit configured to extract corner features from the semantic features through a corner heatmap and predict the positions of a plurality of visible corner points based on the corner features; and a target detection frame prediction unit configured to extract target detection features based on the semantic features and construct the first target detection frame based on the target detection features. The point cloud features are extracted by a shared convolutional neural network, and two small branches then predict the 3D target frame of the object and the visible corner points of the object under the BEV view angle, respectively. The branch associated with corner prediction is in fact a heatmap, which determines whether each location is a visible corner and gives the position of the visible corner. Fig. 3 shows the relationship between the point cloud and the heatmap under the BEV view angle. The heatmap is a two-dimensional matrix of shape (H/n) × (W/n), where H and W are the length and width of the visible range of the point cloud, respectively, and n is the reduction factor from the visible range of the point cloud to the heatmap size. The value at a position (W_i, H_i) is a confidence that represents the probability that this position is a visible corner. Fig. 5 (a) shows the ground-truth heatmap generated from the visible corner labels, in which the response at a corner location is maximal and the response in the corner's neighborhood decreases in a Gaussian manner. Fig. 5 (b) shows the predicted heatmap obtained by training; ideally the response around a corner is maximal and the response at other positions approaches zero. During training, let p_ij be the prediction confidence at a position on the heatmap and y_ij the label value at the corresponding position on the ground-truth heatmap; the designed loss function is as follows:
$$L_{hm}=-\frac{1}{N}\sum_{ij}\begin{cases}\left(1-p_{ij}\right)^{\alpha}\log p_{ij}, & y_{ij}=1\\ \left(1-y_{ij}\right)^{\beta}\,p_{ij}^{\alpha}\log\left(1-p_{ij}\right), & \text{otherwise}\end{cases}\qquad(1)$$

where N is the number of objects in the input frame, and α and β are two weight parameters.
As an optional implementation manner, before the corner prediction unit extracts corner features from the semantic features through the corner heatmap and predicts the information of the plurality of visible corner points based on the corner features, it further includes: predicting a set of compensation vectors for each corner point, which compensate for the information lost when each corner point is mapped from the point cloud coordinate system onto the corner heatmap.
In this embodiment, in order to predict the corner positions more accurately, a set of compensation vectors (offsets) needs to be predicted for each corner point. The offsets represent the information lost when a corner point is mapped from the point cloud coordinate system to the heatmap, and they compensate for that loss when the corner coordinates are mapped from the heatmap back to the original point cloud coordinate system. Corner prediction is performed on the heatmap, and the visible range of the point cloud coordinate system is a proportionally scaled version of the heatmap size, so information is lost when the input is mapped to the heatmap. For example, when the visible range of the point cloud coordinate system is 4 times the heatmap size, the position (102, 102) in the original point cloud coordinate system maps to the heatmap coordinate (⌊102/4⌋, ⌊102/4⌋) = (25, 25). If no compensation is added, this coordinate becomes (25×4, 25×4) = (100, 100) when mapped back to the input. The network therefore needs to predict offsets to compensate for the lost information. For example, with offset = (Δx, Δy), the coordinates mapped back to the input become ((25+Δx)×4, (25+Δy)×4). During training, the offsets are supervised by training the network with a Smooth L1 loss:
$$L_{off}=\frac{1}{N}\sum_{k=1}^{N}\operatorname{SmoothL1}\left(o_k,\hat{o}_k\right)\qquad(2)$$

where

$$o_k=\left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right),$$

x_k and y_k denote the abscissa and the ordinate of the k-th ground-truth corner point, n denotes the scaling factor (i.e., the stride) from the visible range of the point cloud coordinate system to the heatmap size, and ô_k is the predicted offset value.
As an optional implementation manner, the corner prediction unit extracting corner features from the semantic features through the corner heatmap and predicting the positions of a plurality of visible corner points based on the corner features further includes: predicting, based on the corner features, a vector from each visible corner point to the center of the target detection object, where the vector is used to judge whether visible corner points belong to the same object.
In this embodiment, in order to avoid ambiguity between two corner points that are close to each other but belong to different objects, each corner point additionally predicts a vector pointing to the center of its own object, and training is constrained with a Smooth L1 loss:
$$L_{ce}=\frac{1}{N}\sum_{k=1}^{N}\operatorname{SmoothL1}\left(v_k,\hat{v}_k\right)\qquad(3)$$

where

$$v_k=\left(ce_x-x_k,\;ce_y-y_k\right),$$

ce_x and ce_y denote the coordinates of the center point to which the k-th ground-truth corner point points, and v̂_k is the predicted vector.
As an optional implementation manner, the adjustment module 703 adjusting the vertex positions of the first target detection frame based on the positions of the visible corner points to obtain the second target detection frame includes: acquiring the positions of the several vertices of the first target detection frame facing the host vehicle; for each vertex, traversing all visible corner points, screening out the positions of two visible corner points that meet preset conditions, and adjusting the positions of the corresponding vertices of the first target detection frame to obtain the second target detection frame. The first target detection frame is a 3D target detection frame. As shown in fig. 4, for each first target detection frame the vertices a, b, c facing the host vehicle O are obtained, and after adjustment with the predicted visible corner points e, f, a second target detection frame (dashed-line frame) with e, f as vertices is obtained. The face of the target detection frame toward the host vehicle O is fine-tuned; because corner points lie at the edges of the object and carry richer semantic information, the corner-adjusted target detection frame is more accurate, which benefits downstream tasks such as vehicle speed prediction and vehicle tracking. Because the target detection frame is a rigid body, adjusting its coordinates also fine-tunes its orientation. Since the corner predictions are more accurate, the target detection frame fine-tuned with the corner points is more accurate; the effect is especially evident for large vehicles.
Further, the preset conditions that a predicted visible corner point must meet are: (1) the minimum distance between the visible corner point and the corresponding vertex is smaller than a preset distance threshold; (2) the prediction confidence of the visible corner point is larger than a set confidence threshold; (3) the visible corner point points to the center of the object to which the vertex belongs. The branch for target frame regression predicts a set of 3D target detection frames, while the branch for corner detection predicts a set of corner positions. First, the coordinates of the 3D target detection frame under the BEV view angle are obtained; for each target detection frame the three vertices facing the host vehicle are obtained, and for each vertex v_i (i = 0, 1, 2) all predicted corner points are traversed. Corner points that meet all three conditions are kept, and those that do not are discarded. If the 3 vertices of the object select 3 qualifying corner points, the two corner points closest to the host vehicle are chosen, and the two corresponding vertices of the first target detection frame are aligned to them, which fine-tunes the predicted target frame; if the 3 vertices select 2 qualifying corner points, these two corner points are used to fine-tune the first target detection frame; if fewer than 2 qualifying corner points are selected, the predicted first target detection frame is not fine-tuned.
Through this technical scheme, the target detection frame is fine-tuned by screening visible corner points that meet the preset conditions. Because corner points lie at the edges of objects and carry richer semantic information, and because a heatmap-based method predicts more accurate corner positions than a regression-based method, the corner-optimized face, i.e., the face of the object toward the host vehicle, is more accurate, which benefits downstream tasks such as vehicle speed prediction and vehicle tracking. At the same time, since the predicted target frame is a rigid body, the adjustment also fine-tunes its angle.
The present disclosure provides a training method of a target detection model, as shown in fig. 8a, including:
and acquiring point cloud data of the target detection object. In the training process, point cloud data needs to be acquired as a training sample.
And obtaining a plurality of visible corner points as training labels. Visible corner points that satisfy the minimum distance condition (i.e., the distance from the vertex to the nearest point cloud point is smaller than the distance threshold) are selected as training labels. To predict the positions of visible corner points, training labels for the visible corner points must first be produced. Given the 3D target frame annotation of an object, the four corner coordinates of the 2D target frame under the BEV view angle (i.e., the four vertex coordinates A, B, C, D of the rectangle in top view) are obtained first. At the same time, the point cloud coordinates are also mapped to the BEV view angle, as shown in fig. 6. A distance threshold θ is then set; for each corner point c_i = (x_i, y_i), the nearest point cloud coordinate is searched, and the minimum distance is l_i. If l_i ≤ θ, the corner point is a visible corner point (A and B are visible corners); otherwise it is an invisible corner point (C and D are invisible corners). The visible corner points found in this way serve as training labels, so that a target detection model capable of predicting visible corner points can be trained; a sketch of this labeling procedure is given after the training steps below.
And inputting the point cloud data into a neural network for feature extraction to obtain semantic features, and respectively predicting based on the semantic features to obtain a predicted target detection frame and a plurality of predicted visible corner points.
And calculating the loss between the predicted visible corner points and the training labels, and adjusting the weights of the neural network based on the loss to obtain the target detection model. The training of the model is supervised: the visible corner labels can be computed from the target frame annotations, and the labels are not fed into the network; after the model outputs a result, the loss is computed between the labels and the output, and the weights of the model are adjusted according to the computed loss.
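A sketch of the visible-corner labeling rule described in these steps (compare the distance from each BEV box vertex to the nearest point cloud point against the threshold θ) is given below; the box layout, point coordinates, and threshold value are illustrative assumptions.

```python
# Illustrative sketch: a BEV box vertex is labeled a visible corner when its
# distance to the nearest point cloud point is at most the threshold theta.
import numpy as np

def label_visible_corners(bev_corners, bev_points, theta=0.5):
    """bev_corners: (4, 2) array of box vertices under the BEV view angle.
    bev_points: (N, 2) array of point cloud coordinates mapped to the BEV view.
    Returns a boolean mask, True where the vertex is a visible corner."""
    visible = []
    for corner in bev_corners:
        l_i = np.linalg.norm(bev_points - corner, axis=1).min()
        visible.append(l_i <= theta)            # l_i <= theta  ->  visible corner
    return np.array(visible)

# toy example: lidar points lie only along the near side of the vehicle,
# so the two near corners are visible and the two far corners are not
corners = np.array([[10.0, 5.0], [10.0, 7.5], [22.0, 5.0], [22.0, 7.5]])
points = np.array([[10.1, 5.0], [10.0, 7.4], [12.0, 5.1], [15.0, 5.0]])
print(label_visible_corners(corners, points))   # [ True  True False False]
```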
Specifically, the training process in this embodiment includes: 1) Data processing: 3D point cloud data are input, and hand-crafted point cloud features under the BEV view angle are generated from information such as the geometric relations among points and the reflection intensity of the points. 2) Feature extraction: semantic information of the point cloud is extracted through the shared convolution layers; this feature contains the semantic information used for both 3D target detection and visible corner prediction. 3) 3D target detection branch and visible corner prediction branch: each module has its own convolution layers to extract the semantic information useful to it and finally predicts the 3D target frame and the visible corner points, respectively. 4) Loss functions: each module is optimized by a corresponding loss function. The 3D target detection module optimizes the predicted category of the target frame through a cross-entropy classification loss and regresses the coordinates of the target frame through a Smooth L1 loss. For the visible corner prediction module, corner prediction is jointly optimized through formula (1), formula (2), and formula (3).
The present disclosure provides a training apparatus for a target detection model, as shown in fig. 9, including:
the point cloud acquisition module 901 is configured to acquire point cloud data of a target detection object. In the training process, point cloud data needs to be acquired as a training sample.
The training label generation module 902 is configured to obtain a plurality of visible corner points as training labels. Visible corner points that satisfy the minimum distance condition (i.e., the distance from the vertex to the nearest point cloud point is smaller than the distance threshold) are selected as training labels. To predict the positions of visible corner points, training labels for the visible corner points must first be produced. Given the 3D target frame annotation of an object, the four corner coordinates of the 2D target frame under the BEV view angle (i.e., the four vertex coordinates of the rectangle in top view) are obtained first. At the same time, the point cloud coordinates are also mapped to the BEV view angle, as shown in fig. 6. A distance threshold θ is then set; for each corner point c_i = (x_i, y_i), the nearest point cloud coordinate is searched, and the minimum distance is l_i. If l_i ≤ θ, the corner point is a visible corner point; otherwise it is an invisible corner point. The visible corner points found in this way serve as training labels, so that a target detection model capable of predicting visible corner points can be trained.
The training module 903 is configured to input the point cloud data into a neural network to perform feature extraction to obtain semantic features, and predict the semantic features to obtain a predicted target detection frame and a plurality of predicted visible corner points respectively.
A calculation module 904 configured to calculate the loss between the predicted visible corner points and the training labels, and adjust the weights of the neural network based on the loss to obtain the target detection model. The training of the model is supervised: the visible corner labels can be computed from the target frame annotations, and the labels are not fed into the network; after the model outputs a result, the loss is computed between the labels and the output, and the weights of the model are adjusted according to the computed loss.
Specifically, the training process in this embodiment includes: 1) Data processing: 3D point cloud data are input, and hand-crafted point cloud features under the BEV view angle are generated from information such as the geometric relations among points and the reflection intensity of the points. 2) Feature extraction: semantic information of the point cloud is extracted through the shared convolution layers; this feature contains the semantic information used for both 3D target detection and visible corner prediction. 3) 3D target detection branch and visible corner prediction branch: each module has its own convolution layers to extract the semantic information useful to it and finally predicts the 3D target frame and the visible corner points, respectively. 4) Loss functions: each module is optimized by a corresponding loss function. The 3D target detection module optimizes the predicted category of the target frame through a cross-entropy classification loss and regresses the coordinates of the target frame through a Smooth L1 loss. For the visible corner prediction module, corner prediction is jointly optimized through formula (1), formula (2), and formula (3).
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the related user personal information all conform to the regulations of related laws and regulations, and the public sequence is not violated.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 10 illustrates a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, such as the target detection method. For example, in some embodiments, the object detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM1002 and/or communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the above-described object detection method may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the target detection method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (16)

1. A large-scale vehicle target detection method, comprising:
acquiring point cloud data of a target detection object;
predicting a first target detection frame and a plurality of visible corner points respectively based on the point cloud data, which comprises: extracting semantic features of the point cloud data; extracting corner features from the semantic features through a corner heatmap, and predicting the positions of a plurality of visible corner points based on the corner features; and extracting target detection features based on the semantic features, and constructing the first target detection frame based on the target detection features;
adjusting the vertex position of the first target detection frame based on the position of the visible corner point to obtain a second target detection frame;
and performing recognition based on the second target detection frame to obtain a target detection result of the target detection object.
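As a reading aid only, the following is a minimal sketch in Python of the inference flow described in claim 1. The backbone and the two prediction heads are hypothetical placeholders passed in as callables, and decoding visible corner points by simple thresholding of the heatmap is an assumption; none of this is the patent's actual implementation.

```python
import numpy as np

def detect_large_vehicle(points, backbone, corner_head, box_head, refine_fn):
    """points: (N, 3) point cloud of the target detection object."""
    semantic_features = backbone(points)                 # extract semantic features
    corner_heatmap = corner_head(semantic_features)      # corner features as a heatmap
    visible_corners = decode_corners(corner_heatmap)     # positions of visible corner points
    first_box = box_head(semantic_features)              # first target detection frame
    second_box = refine_fn(first_box, visible_corners)   # adjust vertices using visible corners
    return second_box                                    # input to the final recognition step

def decode_corners(heatmap, score_thresh=0.3):
    # Keep heatmap cells above a confidence threshold as visible corner candidates.
    ys, xs = np.where(heatmap > score_thresh)
    return np.stack([xs, ys], axis=1)
```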
2. The method of claim 1, wherein extracting corner features from the semantic features through the corner heatmap, before predicting information of the plurality of visible corner points based on the corner features, further comprises:
predicting a group of compensation vectors for each corner point, so as to compensate for the information lost when each corner point is mapped from the point cloud coordinate system onto the corner heatmap.
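A minimal sketch of what such a compensation vector can look like, assuming a bird's-eye-view heatmap with cell size `voxel` and origin `pc_min`; the names and the 2D setting are illustrative assumptions, not taken from the patent. Snapping a continuous coordinate to an integer heatmap cell discards the sub-cell fraction, and predicting that fraction as an offset restores it at decode time.

```python
import numpy as np

def corner_offset_target(corner_xy, pc_min, voxel):
    """Offset lost when a continuous corner position is snapped to a heatmap cell."""
    grid = (corner_xy - pc_min) / voxel       # continuous grid coordinates
    cell = np.floor(grid).astype(int)         # integer heatmap cell the corner falls into
    return cell, grid - cell                  # compensation vector in [0, 1)^2

def corner_from_cell(cell, offset, pc_min, voxel):
    """Recover the corner position in the point cloud frame using the predicted offset."""
    return (cell + offset) * voxel + pc_min
```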
3. The method according to claim 1 or 2, wherein extracting corner features from the semantic features through the corner heatmap and predicting the positions of the plurality of visible corner points based on the corner features further comprises:
predicting, based on the corner features, a vector from each visible corner point to the center of the target detection object, wherein the vector is used for judging whether visible corner points belong to the same object.
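A hedged illustration of how the corner-to-center vector could support the grouping test mentioned in the claim: two corners whose predicted vectors land on approximately the same center are treated as belonging to the same object. The tolerance value is an arbitrary illustrative assumption.

```python
import numpy as np

def same_object(corner_a, vec_a, corner_b, vec_b, tol=1.0):
    """Each corner plus its predicted vector should land on its object's center."""
    center_a = corner_a + vec_a
    center_b = corner_b + vec_b
    return np.linalg.norm(center_a - center_b) < tol  # same center implies same object
```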
4. The method according to any one of claims 1-3, wherein adjusting the vertex position of the first target detection frame based on the position of the visible corner point to obtain the second target detection frame comprises:
acquiring the positions of a plurality of vertices of the first target detection frame facing the host vehicle;
and, for each vertex, traversing all the visible corner points, screening out the positions of the two visible corner points that meet preset conditions, and adjusting the positions of the corresponding vertices of the first target detection frame to obtain the second target detection frame.
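A non-authoritative sketch of the traversal in claim 4. `satisfies_conditions` stands in for the screening of claim 5 (sketched after that claim), and replacing a vertex directly with the closest qualifying corner position is an assumption about how the adjustment is applied, not the patent's stated rule.

```python
import numpy as np

def refine_box(front_vertices, visible_corners, satisfies_conditions):
    """front_vertices: (M, 2) vertices of the first box facing the host vehicle.
    visible_corners: list of predicted visible corner records."""
    refined = front_vertices.copy()
    for i, vertex in enumerate(front_vertices):
        candidates = [c for c in visible_corners if satisfies_conditions(vertex, c)]
        if candidates:
            # Use the closest qualifying visible corner to correct the vertex position.
            best = min(candidates, key=lambda c: np.linalg.norm(c["xy"] - vertex))
            refined[i] = best["xy"]
    return refined
```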
5. The method of claim 4, wherein the preset conditions comprise:
the distance between the visible corner point and the corresponding vertex is the minimum, and the minimum distance is smaller than a preset distance threshold;
the prediction confidence of the visible corner point is greater than a set confidence threshold;
and the visible corner point points toward the center of the object to which it belongs.
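One possible reading of the three preset conditions as a single predicate, offered only as a sketch: the thresholds, the tolerance used for the "points toward the same center" test, and the per-corner record layout (`xy`, `score`, `center_vec`) are all assumptions.

```python
import numpy as np

def satisfies_conditions(vertex, corner, box_center,
                         dist_thresh=0.8, conf_thresh=0.5, center_tol=1.0):
    close_enough = np.linalg.norm(corner["xy"] - vertex) < dist_thresh   # minimum-distance condition
    confident = corner["score"] > conf_thresh                            # confidence condition
    same_center = np.linalg.norm(corner["xy"] + corner["center_vec"] - box_center) < center_tol
    return close_enough and confident and same_center
```

Bound to a concrete box center, e.g. `functools.partial(satisfies_conditions, box_center=center)`, this predicate can serve as the `satisfies_conditions` callable in the claim 4 sketch above.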
6. A large-scale vehicle target detection apparatus, comprising:
an acquisition module configured to acquire point cloud data of a target detection object;
a prediction module configured to respectively predict a first target detection frame and a plurality of visible corner points based on the point cloud data;
wherein the prediction module comprises:
a feature extraction unit configured to extract semantic features of the point cloud data;
a corner prediction unit configured to extract corner features from the semantic features through a corner heatmap, and to predict the positions of a plurality of visible corner points based on the corner features;
a target detection frame prediction unit configured to extract target detection features based on the semantic features, and to construct the first target detection frame based on the target detection features;
an adjusting module configured to adjust the vertex position of the first target detection frame based on the position of the visible corner point to obtain a second target detection frame;
and a target detection module configured to perform recognition based on the second target detection frame to obtain a target detection result of the target detection object.
7. The apparatus of claim 6, wherein the corner prediction unit is further configured to, before predicting information of the plurality of visible corner points based on the corner features:
predict a group of compensation vectors for each corner point, so as to compensate for the information lost when each corner point is mapped from the point cloud coordinate system onto the corner heatmap.
8. The apparatus according to claim 6 or 7, wherein the corner prediction unit is further configured to:
predict, based on the corner features, a vector from each visible corner point to the center of the target detection object, wherein the vector is used for judging whether visible corner points belong to the same object.
9. The apparatus of any one of claims 6-8, wherein the adjusting module is configured to:
acquire the positions of a plurality of vertices of the first target detection frame facing the host vehicle;
and, for each vertex, traverse all the visible corner points, screen out the positions of the two visible corner points that meet preset conditions, and adjust the positions of the corresponding vertices of the first target detection frame to obtain the second target detection frame.
10. The apparatus of claim 9, wherein the preset conditions comprise:
the distance between the visible corner point and the corresponding vertex is the minimum, and the minimum distance is smaller than a preset distance threshold;
the prediction confidence of the visible corner point is greater than a set confidence threshold;
and the visible corner point points toward the center of the object to which it belongs.
11. A training method of a target detection model, comprising:
acquiring point cloud data of a target detection object;
acquiring a plurality of visible corner points as training labels;
inputting the point cloud data into a neural network for feature extraction to obtain semantic features, and respectively predicting a predicted target detection frame and a plurality of predicted visible corner points based on the semantic features;
and calculating the loss between the predicted visible corner points and the training labels, and adjusting the weights of the neural network based on the loss to obtain a target detection model.
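A minimal training-step sketch for the flow in claim 11, assuming a PyTorch model whose forward pass returns a corner heatmap and a box prediction. Using a plain L1 loss against a heatmap built from the visible corner labels, and omitting any box loss term, are simplifying assumptions rather than the patent's loss design.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, points, corner_labels):
    """corner_labels: ground-truth heatmap built from the visible corner labels."""
    pred_heatmap, pred_box = model(points)            # semantic features -> corner and box heads
    loss = F.l1_loss(pred_heatmap, corner_labels)     # loss between predicted corners and labels
    optimizer.zero_grad()
    loss.backward()                                   # adjust network weights based on the loss
    optimizer.step()
    return loss.item()
```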
12. A training device for a target detection model, comprising:
a point cloud acquisition module configured to acquire point cloud data of a target detection object;
a training label generation module configured to acquire a plurality of visible corner points as training labels;
a training module configured to input the point cloud data into a neural network for feature extraction to obtain semantic features, and to respectively predict a predicted target detection frame and a plurality of predicted visible corner points based on the semantic features;
and a calculating module configured to calculate the loss between the predicted visible corner points and the training labels, and to adjust the weights of the neural network based on the loss to obtain a target detection model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-5.
16. An automatic driving automobile comprising the electronic device of claim 13.
CN202211707518.XA 2022-12-28 2022-12-28 Large-scale vehicle target detection method and device, electronic equipment and automatic driving automobile Pending CN116311148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211707518.XA CN116311148A (en) 2022-12-28 2022-12-28 Large-scale vehicle target detection method and device, electronic equipment and automatic driving automobile

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211707518.XA CN116311148A (en) 2022-12-28 2022-12-28 Large-scale vehicle target detection method and device, electronic equipment and automatic driving automobile

Publications (1)

Publication Number Publication Date
CN116311148A true CN116311148A (en) 2023-06-23

Family

ID=86793087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211707518.XA Pending CN116311148A (en) 2022-12-28 2022-12-28 Large-scale vehicle target detection method and device, electronic equipment and automatic driving automobile

Country Status (1)

Country Link
CN (1) CN116311148A (en)

Similar Documents

Publication Publication Date Title
CN109635685B (en) Target object 3D detection method, device, medium and equipment
CN112966587B (en) Training method of target detection model, target detection method and related equipment
CN113902897B (en) Training of target detection model, target detection method, device, equipment and medium
CN112528858A (en) Training method, device, equipment, medium and product of human body posture estimation model
CN116503803A (en) Obstacle detection method, obstacle detection device, electronic device and storage medium
CN116740355A (en) Automatic driving image segmentation method, device, equipment and storage medium
CN115797736A (en) Method, device, equipment and medium for training target detection model and target detection
CN114882198A (en) Target determination method, device, equipment and medium
CN115100741A (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
CN114677653A (en) Model training method, vehicle key point detection method and corresponding devices
CN113989721A (en) Target detection method and training method and device of target detection model
CN113688730A (en) Obstacle ranging method, apparatus, electronic device, storage medium, and program product
CN117372928A (en) Video target detection method and device and related equipment
CN116311148A (en) Large-scale vehicle target detection method and device, electronic equipment and automatic driving automobile
CN113920273B (en) Image processing method, device, electronic equipment and storage medium
CN115205806A (en) Method and device for generating target detection model and automatic driving vehicle
CN115147809A (en) Obstacle detection method, device, equipment and storage medium
CN115272465A (en) Object positioning method, device, autonomous mobile device and storage medium
CN113688920A (en) Model training and target detection method and device, electronic equipment and road side equipment
CN114119990A (en) Method, apparatus and computer program product for image feature point matching
CN116663650B (en) Training method of deep learning model, target object detection method and device
CN116229209B (en) Training method of target model, target detection method and device
CN117392000B (en) Noise removing method and device, electronic equipment and storage medium
CN117647852B (en) Weather state detection method and device, electronic equipment and storage medium
CN113984072B (en) Vehicle positioning method, device, equipment, storage medium and automatic driving vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination