CN112507862A - Vehicle orientation detection method and system based on multitask convolutional neural network


Info

Publication number: CN112507862A
Authority: CN (China)
Prior art keywords: vehicle, neural network, real, convolutional neural, wheel
Legal status: Granted
Application number: CN202011411157.5A
Other languages: Chinese (zh)
Other versions: CN112507862B (en)
Inventors: 陈智磊, 乔文龙, 李泽彬
Current Assignee: Dongfeng Motor Corp
Original Assignee: Dongfeng Motor Corp
Application filed by Dongfeng Motor Corp
Priority to CN202011411157.5A
Publication of CN112507862A
Application granted
Publication of CN112507862B
Status: Active

Classifications

    • G06V20/647 Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Combinations of networks
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/90 Determination of colour characteristics
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • Y02T10/40 Engine management systems


Abstract

The invention relates to a vehicle orientation detection method and system based on a multitask convolutional neural network. The method comprises the following steps: establishing a mature multi-stage multitask convolutional neural network model; acquiring a real-time RGB image of the front vehicle and inputting it into the model to obtain the front vehicle's real-time vehicle position, real-time vehicle type, real-time wheel positions and real-time wheel-ground intersection positions; and optimizing the real-time wheel positions and wheel-ground intersection positions, then obtaining the real-time vehicle orientation angle of the front vehicle from the wheel-ground intersection positions. The method solves two problems: that a 3D detection frame of the front vehicle cannot be trained through deep learning, and that TTC calculated from the change in vehicle pixel width is inaccurate when the front vehicle's orientation information is lacking.

Description

Vehicle orientation detection method and system based on multitask convolutional neural network
Technical Field
The invention relates to the technical field of automatic driving, in particular to a vehicle orientation detection method and system based on a multitask convolutional neural network.
Background
In ADAS (Advanced Driver Assistance System) monocular vision, accurate detection of the front vehicle is an essential function of the system. 2D (two-dimensional) front vehicle detection cannot meet the requirement of further improving the intelligence of the ADAS system, so 3D (three-dimensional) frame detection of the front vehicle in monocular vision needs to be studied. The step from 2D to 3D vehicle detection can be realized by calculating the orientation of the front vehicle, and the orientation angle can be effectively calculated from the key points where the wheels of the front vehicle intersect the ground. In addition, the Time To Collision (TTC) of the vehicle is often calculated from the change in the pixel width of the front vehicle, and the orientation of the front vehicle must be considered: if the orientation changes continuously during driving, the change in pixel width is no longer determined only by the change in distance, and the TTC calculation becomes inaccurate. The calculation can be corrected by detecting the orientation angle of the front vehicle, so that the TTC is computed more accurately. In the paper "Vision-based ACC with a Single Camera: Bounds on Range and Range Rate Accuracy", the orientation of the leading vehicle is not considered when calculating the time to collision, so the error between the calculated TTC and the true value increases when the orientation of the leading vehicle changes significantly.
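As a numeric illustration of the pixel-width TTC estimate that this correction targets, consider the following minimal sketch (Python); the function name and values are illustrative only, and it assumes constant closing speed and a fixed front-vehicle orientation:

def ttc_from_pixel_width(w_prev, w_curr, dt):
    # Projected width is inversely proportional to distance only while the
    # front vehicle's orientation stays fixed; if the vehicle is turning,
    # the width also changes with heading and this estimate becomes biased.
    if w_curr <= w_prev:
        return float("inf")  # not closing in on the front vehicle
    return dt * w_prev / (w_curr - w_prev)

# width grows from 80 px to 84 px over 0.1 s -> TTC = 2.0 s
print(ttc_from_pixel_width(80.0, 84.0, 0.1))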
In a Forward Collision Warning (FCW) product based on monocular vision, the front vehicle needs to be monitored, the distance, orientation and relative speed between the front vehicle and the ego vehicle are judged, and the driver is warned when a potential collision danger exists. For the vehicle detection problem, a deep-learning-based method is generally adopted to detect the position and type of the vehicle. For vehicle orientation, deep learning is mostly adopted as well, obtaining orientation information through a regression algorithm with three-dimensional vehicle information as training data. For example, the monocular method of "Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss" takes a single RGB image as input and, through a CNN (Convolutional Neural Network), directly outputs the predicted target type, 2D (two-dimensional) frame position, target distance, target deflection angle, and the length, width and height of the 3D frame; it projects the eight vertices of the 3D frame to 2D coordinates, extracts the best 2D frame through NMS (Non-Maximum Suppression), converts the 3D box fitting into three kinds of information (target type, 2D frame and 3D frame) that correspond one-to-one with the labeled ground truth (true values), and optimizes a single IoU loss for network regression. This method requires data annotated with ground-truth information such as length, width, height and orientation.
However, training a 3D target detection frame with deep learning requires a large amount of ground-truth data; if the data are scarce, the trained model tends to overfit and generalizes poorly. On an ordinary road there are many front vehicles, and when collecting data, the actual physical length, width, height and orientation of the front vehicles are difficult to obtain with monocular cameras alone, without fusing lidar data. Lidar is expensive, and even after joint calibration of the lidar and the monocular camera, data fusion is still needed later to obtain the actual physical size and orientation information of the front vehicle. Therefore, three-dimensional data of the front vehicle are difficult to acquire in quantities that meet the training-data requirement.
Disclosure of Invention
The invention provides a vehicle orientation detection method and system based on a multitask convolutional neural network, and solves the problems that a front vehicle 3D detection frame cannot be trained through deep learning, and TTC calculation is inaccurate due to lack of front vehicle orientation information when TTC is calculated by utilizing vehicle pixel width change.
In a first aspect, the present invention provides a vehicle orientation detection method based on a multitask convolutional neural network, including the following steps:
establishing a mature multi-stage multitask convolution neural network model;
acquiring a real-time RGB image of a front vehicle, and inputting the real-time RGB image of the front vehicle into a multi-stage multi-task convolutional neural network model to obtain a real-time vehicle position, a real-time vehicle type, a real-time wheel position and real-time wheel and ground intersection point position information of the front vehicle;
and optimizing the position of the real-time wheel and the position information of the intersection point of the real-time wheel and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle according to the position information of the intersection point of the real-time wheel and the ground.
In some embodiments, the step of "establishing a mature multi-stage multitask convolutional neural network model" specifically includes the following steps:
establishing an original multi-stage multitask convolution neural network model;
acquiring a training picture, and inputting the training picture into an original multi-stage multi-task convolutional neural network model to obtain a total loss function of the original multi-stage multi-task convolutional neural network model;
and obtaining a mature multi-stage multi-task convolutional neural network model according to the original multi-stage multi-task convolutional neural network model and the total loss function thereof.
In some embodiments, the step of "establishing an original multi-stage multitask convolutional neural network model" specifically includes the following steps:
acquiring an original picture, and scaling the original picture to different scales to form an image pyramid;
performing multi-level convolution operation according to the image pyramid to obtain a vehicle detection frame, a vehicle type, a wheel detection frame and position information of a wheel and ground intersection point;
acquiring a vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame and the position information of the intersection point of the wheel and the ground;
and obtaining an original multi-stage multitask convolution neural network model according to the steps.
In some embodiments, the step of performing multi-level convolution operation according to the image pyramid to obtain the vehicle detection frame, the vehicle type, the wheel detection frame, and the wheel and ground intersection position information includes the following steps:
extracting preliminary features and calibration frames of the images on the image pyramid with a dense sliding window through a first-level multitask convolutional neural network, performing frame regression adjustment to obtain candidate frames, calibrating the candidate frame windows, and then merging highly overlapped candidate frames through non-maximum suppression to obtain candidate frame images that coarsely locate the vehicle and wheels;
taking a candidate frame image obtained after combining the highly overlapped candidate boundary frames as input, performing second-level multi-task convolutional neural network operation, and rejecting a false detection target through boundary frame regression adjustment and non-maximum value suppression;
and zooming the candidate frame image without the false detection target to be used as input, and performing third-level multitask convolutional neural network operation to obtain a final vehicle detection frame, a vehicle type, a wheel detection frame and wheel and ground intersection point position information.
In some embodiments, the step of obtaining the vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame, and the intersection point position information of the wheel and the ground includes the following steps:
connecting the wheel-ground intersection points on the image according to the vehicle detection frame, the vehicle type, the wheel detection frame and the key-point positions of the wheel-ground intersections;
and screening out the intersection points with the ground of two wheels that have a small horizontal position difference and belong to the same vehicle, and mapping the two intersection points into the world coordinate system using the camera's intrinsic and extrinsic parameters to obtain the orientation angle of the vehicle.
In some embodiments, the step of "mapping the two wheel-ground intersection points into the world coordinate system using the camera's intrinsic and extrinsic parameters to obtain the orientation angle of the vehicle" includes the following steps:
respectively establishing an image coordinate system, a camera coordinate system, a vehicle coordinate system and a world coordinate system;
calibrating the intrinsic and extrinsic parameters of the camera using Zhang Zhengyou's checkerboard method, and establishing the mapping relation from image coordinates to the world coordinate system;
and mapping the image coordinates of the intersection points of the two wheels and the ground to a world coordinate system, acquiring the included angle between the straight line of the intersection points of the two wheels and the ground and the axis of the world coordinate system along the lane direction, and calculating the orientation angle of the vehicle.
In some embodiments, the step of obtaining a training picture and inputting the training picture into an original multi-stage multi-task convolutional neural network model to obtain a total loss function of the original multi-stage multi-task convolutional neural network model specifically includes the following steps:
carrying out data annotation on the training picture, and marking a vehicle detection frame, a vehicle type, a wheel detection frame and information of intersection points of wheels and the ground;
inputting the marked vehicle detection frame and the vehicle type into an original multi-stage multi-task convolutional neural network model for training, and adjusting parameters to obtain a converged multi-stage multi-task convolutional neural network model;
and inputting the marked vehicle detection frame, the vehicle type, the wheel detection frame and the intersection point information of the wheels and the ground into the converged multistage multitask convolutional neural network model for training to obtain a total loss function of the converged multistage multitask convolutional neural network model.
In some embodiments, the step of obtaining the total loss function of the converged multi-stage multitask convolutional neural network model specifically includes the following steps:
The loss function for the vehicle class of the converged multi-stage multitask convolutional neural network model is the cross-entropy loss:

class_loss = -\sum_{i=1}^{K} y_i \log(p_i)    formula (1);

where K is the number of categories, y_i is the label, and p_i is the predicted probability that the target belongs to category i.
The loss function of the vehicle detection frame of the converged multi-stage multitask convolutional neural network model is the squared Euclidean regression loss:

carbox_loss = \| \hat{y}^{car} - y^{car} \|_2^2    formula (2);

The loss function of the wheel detection frame of the converged multi-stage multitask convolutional neural network model is:

wheelbox_loss = \| \hat{y}^{wheel} - y^{wheel} \|_2^2    formula (3);

where \hat{y} denotes the frame regressed by the neural network and y the ground-truth coordinates.
The loss function of the wheel-ground intersection point of the converged multi-stage multitask convolutional neural network model is:

point_loss = \| \hat{y}^{point} - y^{point} \|_2^2    formula (4);

where \hat{y}^{point} is the wheel-ground intersection output by the neural network and y^{point} the true intersection coordinates.
Proper weights are set for each loss function, the training pictures are input into the converged multi-stage multitask convolutional neural network model for training, and the total loss function is obtained as:

loss = W1*class_loss + W2*carbox_loss + W3*wheelbox_loss + W4*point_loss    formula (5);

where W1, W2, W3 and W4 are the weights of the respective loss functions.
In some embodiments, the step of obtaining the real-time vehicle orientation angle of the leading vehicle from the real-time wheel and ground intersection position information includes the following steps:
connecting the real-time wheel-ground intersection points on the image using the real-time wheel-ground intersection positions of the front vehicle;
and screening out the intersection points with the ground of two real-time wheels that have a small horizontal position difference and belong to the front vehicle, and mapping the two intersection points into the world coordinate system using the camera's intrinsic and extrinsic parameters to obtain the real-time vehicle orientation angle of the front vehicle.
In a second aspect, the present invention provides a vehicle orientation detection system based on a multitask convolutional neural network, comprising:
the model establishing module is used for establishing a multi-stage multitask convolution neural network model;
the method comprises the steps that key point positions are obtained and touched, real-time RGB images of a front vehicle are obtained, the real-time RGB images of the front vehicle are input into a multi-stage multi-task convolutional neural network model, and real-time vehicle positions, real-time vehicle types, real-time wheel positions and real-time key point position information of intersection points of wheels and the ground of the front vehicle are obtained; and the number of the first and second groups,
and the vehicle orientation angle acquisition module is used for optimizing the position of the real-time wheels and the position information of the intersection points of the real-time wheels and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle through the position information of the intersection points of the real-time wheels and the ground.
The technical scheme provided by the invention has the beneficial effects that:
the embodiment of the invention provides a vehicle orientation detection method based on a multitask convolutional neural network, which comprises the steps of taking an acquired RGB (red, green and blue) image of a front vehicle as input, utilizing a multi-stage multitask convolutional neural network to simultaneously detect the position information of a vehicle, the type of the vehicle, the position of a wheel and the position of a key point of a wheel and ground intersection, optimizing the position of the wheel and the position of the key point of the wheel and ground intersection through post-processing, and finally calculating the orientation angle of the vehicle through the position of the key point.
Aiming at the problem that a 3D detection frame of the front vehicle cannot be trained directly through deep learning for lack of three-dimensional training data, deep learning is used to detect the 2D frame of the front vehicle and the front-vehicle orientation angle required by the 2D-to-3D conversion of the vehicle frame. By detecting the wheel-ground intersection points inside the 2D frame, the orientation angle of the vehicle in the image is calculated, providing strong support for the subsequent 2D-to-3D conversion of the vehicle frame. Consequently, only data collected by a monocular camera need to be annotated for training, yet the width, height and heading angle of the front vehicle can still be calculated, reducing data collection cost and time. Moreover, by acquiring the vehicle orientation angle of the front vehicle, the inaccuracy of TTC calculated from the change in vehicle pixel width due to the lack of front-vehicle orientation is resolved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flowchart illustrating steps of a vehicle orientation detection method based on a multitask convolutional neural network according to an embodiment of the present invention;
fig. 2 is a detailed flowchart illustrating the step S100 of the vehicle orientation detection method based on the multitask convolutional neural network according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a detailed step of step S110 of the vehicle orientation detection method based on the multitask convolutional neural network according to another embodiment of the present invention;
fig. 4 is a flowchart illustrating the detailed step of step S120 of the vehicle orientation detection method based on the multitask convolutional neural network according to the embodiment of the present invention;
fig. 5 is a schematic training diagram of the multi-stage multitask convolutional neural network of the vehicle orientation detection method based on the multitask convolutional neural network according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the specific embodiments, it will be understood that they are not intended to limit the invention to the embodiments described. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. It should be noted that the method steps described herein may be implemented by any functional block or functional arrangement, and that any functional block or functional arrangement may be implemented as a physical entity or a logical entity, or a combination of both.
In order that those skilled in the art will better understand the present invention, the following detailed description of the invention is provided in conjunction with the accompanying drawings and the detailed description of the invention.
Note that: the example to be described next is only a specific example, and does not limit the embodiments of the present invention necessarily to the following specific steps, values, conditions, data, orders, and the like. Those skilled in the art can, upon reading this specification, utilize the concepts of the present invention to construct more embodiments than those specifically described herein.
The invention provides a vehicle orientation detection method and system based on a multitask convolutional neural network, and solves the problems that a front vehicle 3D detection frame cannot be trained through deep learning in the related technology, and TTC calculation is inaccurate due to lack of front vehicle orientation information when TTC is calculated by utilizing vehicle pixel width change.
Specifically, as shown in fig. 1, the present invention provides a vehicle orientation detection method based on a multitask convolutional neural network, including the following steps:
s100, establishing a mature multi-stage multi-task convolutional neural network model;
s200, acquiring a real-time RGB image of the front vehicle, and inputting the real-time RGB image of the front vehicle into a multi-stage multi-task convolutional neural network model to obtain the real-time vehicle position, the real-time vehicle type, the real-time wheel position and the real-time wheel and ground intersection point position information of the front vehicle;
and S300, optimizing the position of the real-time wheel and the position information of the intersection point of the real-time wheel and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle according to the position information of the intersection point of the real-time wheel and the ground.
According to the vehicle orientation detection method based on the multitask convolutional neural network, the collected RGB images of the front vehicle are used as input, the multistage multitask convolutional neural network is utilized to simultaneously detect the position information of the vehicle, the type of the vehicle, the position of the wheel and the position of the key point of the intersection point of the wheel and the ground, then the position of the wheel and the key point of the intersection point of the wheel and the ground are optimized through post-processing, and finally the vehicle orientation angle is calculated through the key point position.
As shown in fig. 2, the step S100, namely the step of establishing a mature multi-stage multitask convolutional neural network model, specifically includes the following steps:
s110, establishing an original multi-stage multitask convolution neural network model;
s120, acquiring a training picture, and inputting the training picture into an original multi-stage multi-task convolutional neural network model to obtain a total loss function of the original multi-stage multi-task convolutional neural network model;
and S130, obtaining a mature multi-stage multi-task convolutional neural network model according to the original multi-stage multi-task convolutional neural network model and the total loss function thereof.
Before obtaining the mature multi-stage multi-task convolutional neural network model, a preliminary (original) multi-stage multi-task convolutional neural network model needs to be established, and the preliminary multi-stage multi-task convolutional neural network model needs to be trained to obtain various parameters (including various loss functions and loss weights) of the multi-stage multi-task convolutional neural network model, so that the mature multi-stage multi-task convolutional neural network model is obtained.
Further, as shown in fig. 3, the step S110, namely the step of "establishing an original multi-stage multitask convolutional neural network model", specifically includes the following steps:
and S112, acquiring the original picture, and scaling the original picture to different scales to form an image pyramid.
Specifically, the acquired original RGB image may be transformed to different scales to construct a 7-level image pyramid, adapting the detection to vehicles of different sizes; the scale factors are 2, 4, 8, 16, 32, 64 and 128, respectively.
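A minimal sketch of this pyramid construction with OpenCV follows; reading the seven listed values as downscale factors is an assumption of the sketch, since the text only lists the scales:

import cv2

def build_pyramid(image_rgb, factors=(2, 4, 8, 16, 32, 64, 128)):
    # One resized copy per scale factor, coarsest last; INTER_AREA is the
    # usual interpolation choice for downscaling.
    h, w = image_rgb.shape[:2]
    pyramid = []
    for f in factors:
        size = (max(1, w // f), max(1, h // f))
        pyramid.append(cv2.resize(image_rgb, size, interpolation=cv2.INTER_AREA))
    return pyramid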
And S114, performing multi-level convolution operation according to the image pyramid to obtain a vehicle detection frame, a vehicle type, a wheel detection frame and the position information of the intersection point of the wheel and the ground.
Step S114, namely the step of performing multi-level convolution operations according to the image pyramid to obtain the vehicle detection frame, vehicle type, wheel detection frame and wheel-ground intersection position information, specifically includes the following steps:
S1142, extracting preliminary features and calibration frames of the images on the image pyramid with a dense sliding window through a first-level multitask convolutional neural network, performing frame regression adjustment to obtain candidate frames, calibrating the candidate frame windows, and then merging highly overlapped candidate frames through non-maximum suppression to obtain candidate frame images that coarsely locate the vehicle and wheels.
Specifically, a dense 12 × 12 sliding window is applied on all pyramid images; a three-layer fully convolutional network (FCN) performs preliminary feature extraction and frame calibration, and frame regression adjustment and non-maximum suppression (NMS) are performed to filter out most of the windows.
For example, the model input is a 12 × 12 × 3 picture. 10 convolution kernels of 3 × 3 × 3 followed by a 2 × 2 max pooling (stride = 2) operation generate 10 feature maps of 5 × 5; 16 convolution kernels of 3 × 3 × 10 generate 16 feature maps of 3 × 3; 32 convolution kernels of 3 × 3 × 16 generate 32 feature maps of 1 × 1. From the 32 feature maps of 1 × 1: 12 convolution kernels of 1 × 1 × 32 generate 12 feature maps of 1 × 1 for classification (11 vehicle categories plus 1 non-target category); 4 convolution kernels of 1 × 1 × 32 generate 4 feature maps of 1 × 1 for regressing the vehicle frame; 4 convolution kernels of 1 × 1 × 32 generate 4 feature maps of 1 × 1 for regressing the wheel frame; and 2 convolution kernels of 1 × 1 × 32 generate 2 feature maps of 1 × 1 for regressing the vehicle landing point (wheel-ground intersection).
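The first-stage network described above can be sketched in PyTorch as follows; the layer sizes follow the text, while the activation choice and attribute names are assumptions for illustration, not the patent's implementation:

import torch
import torch.nn as nn

class PNet(nn.Module):
    # Backbone mirrors the sizes in the text: 12x12x3 in, 10/16/32 channels,
    # then four 1x1 task heads (class, vehicle box, wheel box, ground point).
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 10, 3), nn.PReLU(10),   # 12x12 -> 10x10
            nn.MaxPool2d(2, 2),                  # 10x10 -> 5x5
            nn.Conv2d(10, 16, 3), nn.PReLU(16),  # 5x5 -> 3x3
            nn.Conv2d(16, 32, 3), nn.PReLU(32),  # 3x3 -> 1x1
        )
        self.cls = nn.Conv2d(32, 12, 1)          # 11 vehicle types + 1 non-target
        self.car_box = nn.Conv2d(32, 4, 1)
        self.wheel_box = nn.Conv2d(32, 4, 1)
        self.ground_pt = nn.Conv2d(32, 2, 1)

    def forward(self, x):
        f = self.backbone(x)
        return self.cls(f), self.car_box(f), self.wheel_box(f), self.ground_pt(f)

# Fully convolutional: a whole pyramid level yields one prediction per dense
# 12x12 window, while a single 12x12 crop yields 1x1 output maps.
outs = PNet()(torch.randn(1, 3, 12, 12))
print([tuple(o.shape) for o in outs])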
And S1144, taking the candidate frame image obtained after the highly overlapped candidate boundary frames are combined as input, performing second-level multitask convolution neural network operation, and rejecting the false detection target through boundary frame regression adjustment and non-maximum value suppression.
Specifically, the candidate frame images obtained in step S1142 are taken as input and scaled to 24 × 24. A four-layer convolutional neural network ending in a fully connected layer filters out a large number of poor candidate frame images, and the retained candidates are then refined through bounding-box regression adjustment and non-maximum suppression (NMS), sketched below, to further optimize the prediction result.
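The text does not spell out the NMS variant it uses; a minimal sketch of the standard greedy IoU-based suppression assumed here is:

import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; returns indices of kept boxes.
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]        # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop highly overlapped boxes
    return keep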
For example, the model input is a 24 × 24 × 3 picture. 28 convolution kernels of 3 × 3 × 3 with a 2 × 2 max pooling (stride = 2) operation generate 28 feature maps of 11 × 11; 48 convolution kernels of 3 × 3 × 28 with a 2 × 2 max pooling (stride = 2) operation generate 48 feature maps of 4 × 4; 64 convolution kernels of 2 × 2 × 48 generate 64 feature maps of 3 × 3; a fully connected operation then converts these into a 128-dimensional fully connected layer. From the 128-dimensional feature, 12 kernels of 1 × 1 × 128 generate 12 outputs for classification (11 vehicle categories plus 1 non-target category); 4 kernels of 1 × 1 × 128 generate 4 outputs for regressing the vehicle frame; 4 kernels of 1 × 1 × 128 generate 4 outputs for regressing the wheel frame; and 2 kernels of 1 × 1 × 128 generate 2 outputs for regressing the vehicle landing point (wheel-ground intersection).
S1146, zooming the candidate frame image without the false detection target to be used as input, and performing third-level multitask convolution neural network operation to obtain final vehicle detection frame, vehicle type, wheel detection frame and wheel and ground intersection point position information.
Specifically, the candidate frame images obtained in step S1144 are taken as input and scaled to 48 × 48; on a five-layer convolutional neural network ending in a fully connected layer, the final vehicle detection frame, vehicle category, wheel detection frame and wheel-ground intersections are output.
For example, the model input is a 48 × 48 × 3 picture, which is transformed into 32 feature maps of 23 × 23 by 32 convolution kernels of 3 × 3 × 3 and 3 × 3 max pooling (stride = 2); into 64 feature maps of 10 × 10 by 64 convolution kernels of 3 × 3 × 32 and 3 × 3 max pooling (stride = 2); into 64 feature maps of 4 × 4 by 64 convolution kernels of 3 × 3 × 64 and 3 × 3 max pooling (stride = 2); and into 128 feature maps of 3 × 3 by 128 convolution kernels of 2 × 2 × 64. A fully connected operation then converts these into a 256-dimensional fully connected layer. From the 256-dimensional feature, 12 kernels of 1 × 1 × 256 generate 12 outputs for classification (11 vehicle categories plus 1 non-target category); 4 kernels of 1 × 1 × 256 generate 4 outputs for regressing the vehicle frame; 4 kernels of 1 × 1 × 256 generate 4 outputs for regressing the wheel frame; and 2 kernels of 1 × 1 × 256 generate 2 outputs for regressing the vehicle landing points (wheel-ground intersections).
Steps S1142, S1144, and S1146 are three cascade networks, and when performing model training in the subsequent steps, models and parameters of the three cascade networks may be trained together on the basis of pre-training.
And S116, acquiring a vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame and the intersection point position information of the wheels and the ground.
In step S116, that is, the step of obtaining the vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame and the wheel-ground intersection position information, the following steps are included:
S1162, connecting the wheel-ground intersection points on the image according to the vehicle detection frame, the vehicle type, the wheel detection frame and the key-point positions of the wheel-ground intersections;
S1164, screening out the intersection points with the ground of two wheels that have a small horizontal position difference and belong to the same vehicle, and mapping the two intersection points into the world coordinate system using the camera's intrinsic and extrinsic parameters to obtain the orientation angle of the vehicle.
Further, in step S1164, the step of mapping the two wheel-ground intersection points into the world coordinate system using the camera's intrinsic and extrinsic parameters to obtain the orientation angle of the vehicle includes the following steps:
respectively establishing an image coordinate system, a camera coordinate system, a vehicle coordinate system and a world coordinate system;
calibrating the intrinsic and extrinsic parameters of the camera using Zhang Zhengyou's checkerboard method, and establishing the mapping relation from image coordinates to the world coordinate system;
and mapping the image coordinates of the intersection points of the two wheels and the ground to a world coordinate system, acquiring the included angle between the straight line of the intersection points of the two wheels and the ground and the axis of the world coordinate system along the lane direction, and calculating the orientation angle of the vehicle.
A world coordinate system is defined in which the z_w axis points upward through the roof, perpendicular to the ground, the y_w axis runs along the straight lane line, and the x_w axis points to the right of the straight lane line, giving the projection relation:

k · [x, y, 1]^T = A · R · [x_w, y_w, z_w]^T,  with  A = [[f_x, 0, u_x], [0, f_y, u_y], [0, 0, 1]]

where x, y are image coordinates, f_x and f_y are the camera focal lengths in pixels, (u_x, u_y) are the optical-center coordinates, R is the 3 × 3 rotation matrix obtained from the three Euler angles of the camera installation, and k is a scale factor that can be calculated from the camera mounting height.
The world-coordinate points of the front-wheel and rear-wheel ground intersections are connected by a straight line, and the vehicle orientation angle is the arctangent of the slope of this line.
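As a concrete illustration of this mapping, the sketch below (Python) back-projects the two wheel-ground image points onto the ground plane z_w = 0 and takes the angle of their connecting line against the lane axis y_w; it assumes the intrinsic matrix A, the camera-to-world rotation and the mounting height come from the Zhang calibration above, and that the camera sits at x_w = y_w = 0:

import numpy as np

def image_to_ground(pt, A, R_wc, cam_height):
    # Back-project pixel (u, v) and intersect the viewing ray with the
    # ground plane z_w = 0; the camera center is at (0, 0, cam_height).
    ray_cam = np.linalg.inv(A) @ np.array([pt[0], pt[1], 1.0])
    ray_w = R_wc @ ray_cam               # ray direction in world coordinates
    s = -cam_height / ray_w[2]           # plays the role of the scale factor k
    return (np.array([0.0, 0.0, cam_height]) + s * ray_w)[:2]

def heading_angle_deg(front_pt, rear_pt, A, R_wc, cam_height):
    # Orientation angle: arctangent of the wheel-contact line's slope,
    # measured against the lane axis y_w.
    f = image_to_ground(front_pt, A, R_wc, cam_height)
    r = image_to_ground(rear_pt, A, R_wc, cam_height)
    d = f - r
    return np.degrees(np.arctan2(d[0], d[1]))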
And S118, obtaining an original multi-stage multitask convolution neural network model according to the steps.
In addition, as shown in fig. 4 and fig. 5, the step S120, namely, the step of obtaining a training picture and inputting the training picture into the original multi-stage multitask convolutional neural network model to obtain a total loss function of the original multi-stage multitask convolutional neural network model, specifically includes the following steps:
s122, carrying out data annotation on the training picture, and annotating a vehicle detection frame, a vehicle type, a wheel detection frame and information of intersection points of wheels and the ground;
and S124, inputting the marked vehicle detection frame and the vehicle type into the original multi-stage multi-task convolutional neural network model for training, and adjusting parameters to obtain the converged multi-stage multi-task convolutional neural network model.
Firstly, the vehicle detection task is supervised and trained using the annotations of the vehicle detection frame and vehicle type, and the parameters are adjusted until the model converges. In this step only the vehicle detection network is trained: the parameters of the other branches are kept from learning, the vehicle detection parameters are adjusted, the loss values under different parameter settings are printed, the trend of the loss curve is analyzed, the parameters are adjusted further, and convergence is finally reached.
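A hedged sketch of this staged scheme, reusing the illustrative PNet module from the earlier sketch (its attribute names are that sketch's assumptions, not the patent's code):

import torch

net = PNet()
for head in (net.wheel_box, net.ground_pt):
    for p in head.parameters():
        p.requires_grad = False  # these branches are kept from learning

# hand only the still-trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    (p for p in net.parameters() if p.requires_grad), lr=1e-3)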
And S126, inputting the marked vehicle detection frame, the vehicle type, the wheel detection frame and the intersection point information of the wheels and the ground into the converged multi-stage multi-task convolutional neural network model for training to obtain the total loss function of the converged multi-stage multi-task convolutional neural network model.
On the basis of the model trained in step S124, model training is performed using all labeled information (i.e., vehicle detection frame, vehicle type, wheel detection frame, and wheel-ground intersections). During training, the losses of all three stages are accumulated into the target loss function; specifically, the losses output by the three cascaded networks of steps S1142, S1144 and S1146 are added with weights, where the loss weight of the step S1146 output > that of the step S1144 output > that of the step S1142 output.
Moreover, in some embodiments, in step S126, the step of obtaining the total loss function of the converged multi-stage multitask convolutional neural network model specifically includes the following steps:
The loss function for the vehicle class of the converged multi-stage multitask convolutional neural network model is the cross-entropy loss:

class_loss = -\sum_{i=1}^{K} y_i \log(p_i)    formula (1);

where K is the number of categories, y_i is the label, and p_i is the predicted probability that the target belongs to category i.
The loss function of the vehicle detection frame of the converged multi-stage multitask convolutional neural network model is the squared Euclidean regression loss:

carbox_loss = \| \hat{y}^{car} - y^{car} \|_2^2    formula (2);

The loss function of the wheel detection frame of the converged multi-stage multitask convolutional neural network model is:

wheelbox_loss = \| \hat{y}^{wheel} - y^{wheel} \|_2^2    formula (3);

where \hat{y} denotes the frame regressed by the neural network and y the ground-truth coordinates.
The loss function of the wheel-ground intersection point of the converged multi-stage multitask convolutional neural network model is:

point_loss = \| \hat{y}^{point} - y^{point} \|_2^2    formula (4);

where \hat{y}^{point} is the wheel-ground intersection output by the neural network and y^{point} the true intersection coordinates.
Proper weights are set for each loss function, the training pictures are input into the converged multi-stage multitask convolutional neural network model for training, and the total loss function is obtained as:

loss = W1*class_loss + W2*carbox_loss + W3*wheelbox_loss + W4*point_loss    formula (5);

where W1, W2, W3 and W4 are the weights of the respective loss functions. Through experimental adjustment, the losses of the individual tasks can be kept on the same order of magnitude.
Specifically, the adjustment process is as follows: first set initial values and train the model; after obtaining results, observe the magnitude of each task's loss and adjust the loss weights W so that the losses are on the same order of magnitude. Then fine-tune with small steps: write a permutation-and-combination matrix of parameters, automatically train every combination, collect the data and parameters of each training run, and finally evaluate the recall rate to obtain the optimal parameters and model.
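A compact sketch of formulas (1) to (5) in PyTorch, with cross-entropy for the class head and squared-error terms for the two boxes and the ground point; the numeric weights are placeholders for the experimentally tuned W1 to W4:

import torch.nn.functional as F

W1, W2, W3, W4 = 1.0, 0.5, 0.5, 1.0  # illustrative values, tuned in practice

def total_loss(cls_logits, cls_target, car_pred, car_gt,
               wheel_pred, wheel_gt, pt_pred, pt_gt):
    class_loss = F.cross_entropy(cls_logits, cls_target)   # formula (1)
    carbox_loss = F.mse_loss(car_pred, car_gt)             # formula (2)
    wheelbox_loss = F.mse_loss(wheel_pred, wheel_gt)       # formula (3)
    point_loss = F.mse_loss(pt_pred, pt_gt)                # formula (4)
    return (W1 * class_loss + W2 * carbox_loss             # formula (5)
            + W3 * wheelbox_loss + W4 * point_loss)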
In addition, in some embodiments, in step S300, the step of obtaining the real-time vehicle orientation angle of the front vehicle through the real-time wheel-ground intersection position information includes the following steps:
S310, connecting the real-time wheel-ground intersection points on the image using the real-time wheel-ground intersection positions of the front vehicle;
S320, screening out the intersection points with the ground of two real-time wheels that have a small horizontal position difference and belong to the front vehicle, and mapping the two intersection points into the world coordinate system using the camera's intrinsic and extrinsic parameters to obtain the real-time vehicle orientation angle of the front vehicle.
Finally, when the real-time vehicle heading angle of the leading vehicle is obtained, the method is basically the same as the step S116, and is not described herein again.
In addition, the present invention provides a vehicle orientation detection system based on a multitask convolutional neural network, comprising:
the model establishing module is used for establishing a multi-stage multitask convolution neural network model;
the method comprises the steps that key point positions are obtained and touched, real-time RGB images of a front vehicle are obtained, the real-time RGB images of the front vehicle are input into a multi-stage multi-task convolutional neural network model, and real-time vehicle positions, real-time vehicle types, real-time wheel positions and real-time key point position information of intersection points of wheels and the ground of the front vehicle are obtained; and the number of the first and second groups,
and the vehicle orientation angle acquisition module is used for optimizing the position of the real-time wheels and the position information of the intersection points of the real-time wheels and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle through the position information of the intersection points of the real-time wheels and the ground.
The vehicle orientation detection system based on the multitask convolutional neural network described in this embodiment corresponds to the vehicle orientation detection method based on the multitask convolutional neural network described above, and functions of each module in the vehicle orientation detection system based on the multitask convolutional neural network in this embodiment are explained in detail in the corresponding method embodiment, and are not described one by one here.
The technical scheme provided by the invention addresses the problem that a 3D detection frame of the front vehicle cannot be trained directly through deep learning for lack of three-dimensional training data: deep learning is used to detect the 2D frame of the front vehicle and the front-vehicle orientation angle required by the 2D-to-3D conversion of the vehicle frame. By detecting the wheel-ground intersection points inside the 2D frame, the orientation angle of the vehicle in the image is calculated, providing strong support for the subsequent 2D-to-3D conversion of the vehicle frame. Consequently, only data collected by a monocular camera need to be annotated for training, yet the width, height and heading angle of the front vehicle can still be calculated, reducing data collection cost and time. Moreover, by acquiring the vehicle orientation angle of the front vehicle, the inaccuracy of TTC calculated from the change in vehicle pixel width due to missing front-vehicle orientation information is resolved.
Based on the same inventive concept, the embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements all or part of the method steps of the above method.
The present invention can implement all or part of the processes of the above methods, which can also be realized by instructing related hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, the steps of the above method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content of the computer-readable medium may be suitably increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program running on the processor, and the processor executes the computer program to implement all or part of the method steps in the method.
The Processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the various parts of the overall computer device through various interfaces and lines.
The memory may be used to store computer programs and/or models, and the processor implements the various functions of the computer device by running or executing the computer programs and/or models stored in the memory and by invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (e.g., a sound playing function, an image playing function); the data storage area may store data created according to use (e.g., audio data, video data). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, server, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), servers and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A vehicle orientation detection method based on a multitask convolutional neural network is characterized by comprising the following steps:
establishing a mature multi-stage multitask convolution neural network model;
acquiring a real-time RGB image of a front vehicle, and inputting the real-time RGB image of the front vehicle into a multi-stage multi-task convolutional neural network model to obtain a real-time vehicle position, a real-time vehicle type, a real-time wheel position and real-time wheel and ground intersection point position information of the front vehicle;
and optimizing the position of the real-time wheel and the position information of the intersection point of the real-time wheel and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle according to the position information of the intersection point of the real-time wheel and the ground.
2. The vehicle orientation detection method based on the multitask convolutional neural network according to claim 1, wherein the step of establishing a mature multistage multitask convolutional neural network model specifically comprises the following steps:
establishing an original multi-stage multitask convolution neural network model;
acquiring a training picture, and inputting the training picture into an original multi-stage multi-task convolutional neural network model to obtain a total loss function of the original multi-stage multi-task convolutional neural network model;
and obtaining a mature multi-stage multi-task convolutional neural network model according to the original multi-stage multi-task convolutional neural network model and the total loss function thereof.
3. The vehicle orientation detection method based on the multitask convolutional neural network according to claim 2, wherein the step of establishing an original multistage multitask convolutional neural network model specifically comprises the following steps:
acquiring an original picture, and scaling the original picture to different scales to form an image pyramid;
performing multi-level convolution operation according to the image pyramid to obtain a vehicle detection frame, a vehicle type, a wheel detection frame and position information of a wheel and ground intersection point;
acquiring a vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame and the position information of the intersection point of the wheel and the ground;
and obtaining the original multi-stage multitask convolutional neural network model according to the above steps.
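The pyramid step of claim 3 can be illustrated as follows; the 0.709 scale factor and 12-pixel floor are values commonly used in MTCNN-style cascades and are assumptions of this sketch, not values fixed by the patent.

```python
import cv2  # OpenCV, assumed available for resizing

def build_image_pyramid(image, min_size=12, scale_factor=0.709):
    """Scale the original picture to successively smaller sizes (claim 3)."""
    pyramid, scale = [], 1.0
    h, w = image.shape[:2]
    while min(h, w) * scale >= min_size:
        resized = cv2.resize(image, (int(w * scale), int(h * scale)))
        pyramid.append((scale, resized))  # keep the scale to map boxes back
        scale *= scale_factor
    return pyramid
```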
4. The vehicle orientation detection method based on the multitask convolutional neural network as claimed in claim 3, wherein the step of performing multi-level convolution operation according to the image pyramid to obtain the vehicle detection frame, the vehicle type, the wheel detection frame and the wheel and ground intersection position information specifically comprises the following steps:
extracting preliminary features and calibration frames from the images of the image pyramid with a dense sliding window through the first-level multitask convolutional neural network, performing bounding-box regression adjustment to obtain candidate frames, calibrating the candidate frame windows, and then merging highly overlapped candidate frames through non-maximum suppression to obtain candidate frame images that coarsely locate the vehicle and the wheels;
taking the candidate frame images obtained after merging the highly overlapped candidate bounding boxes as input, performing the second-level multitask convolutional neural network operation, and rejecting falsely detected targets through bounding-box regression adjustment and non-maximum suppression;
and scaling the candidate frame images from which the falsely detected targets have been removed as input, and performing the third-level multitask convolutional neural network operation to obtain the final vehicle detection frame, vehicle type, wheel detection frame and wheel-ground intersection point position information.
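The non-maximum suppression used in the first and second stages of claim 4 can be sketched as below; the greedy IoU formulation and the 0.5 overlap threshold are common defaults assumed here, since the patent does not specify them.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; boxes is (N, 4) as [x1, y1, x2, y2]."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Overlap of box i with every remaining candidate box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # keep only lightly-overlapping boxes
    return keep
```

Keeping the highest-scoring box at each round and discarding its heavy overlaps is what merges the densely sampled sliding-window candidates into one frame per object.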
5. The vehicle orientation detection method based on the multitask convolutional neural network as claimed in claim 3, wherein the step of obtaining the vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame and the intersection point position information of the wheel and the ground comprises the following specific steps:
connecting the wheel-ground intersection points on the image according to the vehicle detection frame, the vehicle type, the wheel detection frame and the key point position information of the wheel-ground intersection points;
and screening out the ground intersection points of two wheels which have a small horizontal position difference and belong to the same vehicle, and mapping the two wheel-ground intersection points into the world coordinate system in combination with the intrinsic and extrinsic parameters of the camera to obtain the orientation angle of the vehicle.
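One reading of the claim-5 screening step is sketched below: among the wheel-ground points inside one vehicle detection frame, the pair whose image columns lie closest together is taken, since the same-side front and rear wheels of a vehicle ahead project to nearby columns. This geometric reading, and the function below, are interpretations rather than the patent's own formulation.

```python
from itertools import combinations
import numpy as np

def pick_wheel_pair(points_uv):
    """points_uv: (N, 2) wheel-ground intersection points (u, v) of one
    vehicle; return the pair with the smallest horizontal (u) difference."""
    pts = np.asarray(points_uv, float)
    pairs = combinations(range(len(pts)), 2)
    i, j = min(pairs, key=lambda ij: abs(pts[ij[0], 0] - pts[ij[1], 0]))
    return pts[i], pts[j]
```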
6. The vehicle orientation detection method based on the multitask convolutional neural network as claimed in claim 5, wherein the step of mapping the two wheel-ground intersection points into the world coordinate system in combination with the intrinsic and extrinsic parameters of the camera to obtain the orientation angle of the vehicle specifically comprises the following steps:
respectively establishing an image coordinate system, a camera coordinate system, a vehicle coordinate system and a world coordinate system;
calibrating the intrinsic and extrinsic parameters of the camera by the Zhang Zhengyou checkerboard calibration method, and establishing a mapping relation from image coordinates to the world coordinate system;
and mapping the image coordinates of the two wheel-ground intersection points into the world coordinate system, obtaining the included angle between the straight line through the two intersection points and the world-coordinate-system axis along the lane direction, and calculating the orientation angle of the vehicle.
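Assuming the Zhang Zhengyou calibration (realized in OpenCV by, e.g., cv2.calibrateCamera) has been reduced to a single 3x3 ground-plane homography H — a reduction that is valid for points on the road plane but is an assumption of this sketch — the mapping and angle computation of claim 6 look like:

```python
import numpy as np

def image_to_world(point_uv, H):
    """Project an image pixel (u, v) onto the ground plane via homography H."""
    u, v = point_uv
    x, y, w = H @ np.array([u, v, 1.0])
    return np.array([x / w, y / w])

def orientation_angle_deg(p1_uv, p2_uv, H):
    """Angle between the two-point ground line and the world axis along the
    lane direction (taken here as the world X axis)."""
    dx, dy = image_to_world(p2_uv, H) - image_to_world(p1_uv, H)
    return np.degrees(np.arctan2(dy, dx))
```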
7. The vehicle orientation detection method based on the multitask convolutional neural network according to claim 2, wherein the step of obtaining a training picture, inputting the training picture into an original multi-stage multitask convolutional neural network model, and obtaining a total loss function of the original multi-stage multitask convolutional neural network model specifically comprises the following steps:
carrying out data annotation on the training picture, and marking a vehicle detection frame, a vehicle type, a wheel detection frame and information of intersection points of wheels and the ground;
inputting the marked vehicle detection frame and the vehicle type into an original multi-stage multi-task convolutional neural network model for training, and adjusting parameters to obtain a converged multi-stage multi-task convolutional neural network model;
and inputting the marked vehicle detection frame, the vehicle type, the wheel detection frame and the intersection point information of the wheels and the ground into the converged multistage multitask convolutional neural network model for training to obtain a total loss function of the converged multistage multitask convolutional neural network model.
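A PyTorch-flavoured sketch of this two-phase schedule follows; the output/target dictionary keys, optimizer, and epoch counts are illustrative assumptions, and the loss forms anticipate those of claim 8.

```python
import torch
import torch.nn.functional as F

def train_two_phase(model, loader, phase1_epochs=20, phase2_epochs=20):
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    for epoch in range(phase1_epochs + phase2_epochs):
        joint = epoch >= phase1_epochs  # phase 2: all four tasks together
        for images, t in loader:
            p = model(images)  # dict of task outputs (assumed interface)
            # Phase 1: vehicle detection frame + vehicle type only.
            loss = (F.cross_entropy(p["class_logits"], t["class"])
                    + F.mse_loss(p["car_box"], t["car_box"]))
            if joint:
                # Detector has converged; add wheel boxes and ground keypoints.
                loss = (loss
                        + F.mse_loss(p["wheel_box"], t["wheel_box"])
                        + F.mse_loss(p["ground_point"], t["ground_point"]))
            opt.zero_grad()
            loss.backward()
            opt.step()
```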
8. The vehicle orientation detection method based on the multitask convolutional neural network according to claim 7, wherein the step of obtaining the total loss function of the converged multistage multitask convolutional neural network model specifically comprises the following steps:
the penalty function for the vehicle class resulting in the converged multi-stage multitask convolutional neural network model is as follows:
Figure FDA0002818528180000041
where K is the number of categories, y is the label, and p is the probability that the designated category is i;
the loss function of the vehicle detection frame of the converged multi-stage multitask convolutional neural network model is obtained as follows:
Figure FDA0002818528180000042
the loss function of the wheel detection box of the converged multi-stage multitask convolutional neural network model is obtained as follows:
Figure FDA0002818528180000043
wherein ,
Figure FDA0002818528180000044
is the result of the neural network regression frame,
Figure FDA0002818528180000045
coordinates that are true values;
the loss function of the wheel and ground intersection point of the converged multi-stage multitask convolution neural network model is obtained as follows:
Figure FDA0002818528180000046
wherein ,
Figure FDA0002818528180000047
the intersection point of the wheel and the ground which is the output of the neural network,
Figure FDA0002818528180000048
the real intersection point coordinates are obtained;
Figure FDA0002818528180000049
setting proper weights for the loss functions, inputting a training picture into a converged multistage multitask convolution neural network model for training, and obtaining a total loss function as follows;
w1 class _ loss + W2 carbon _ loss + W3 wheelbox _ loss + W4 point _ loss formula (5);
wherein, W1, W2, W3 and W4 are weights of the loss functions respectively.
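Since the original equation images are not reproduced in this text, the loss forms above are one plausible reconstruction from the surrounding definitions (cross-entropy for the class, squared-error regression for boxes and keypoints). Under that reading, formula (5) could be sketched as below; the weights and dictionary keys are assumptions, not the patent's interfaces.

```python
import torch.nn.functional as F

def total_loss(pred, target, w=(1.0, 0.5, 0.5, 1.0)):
    """Weighted sum of the four task losses (formula (5) of claim 8)."""
    class_loss = F.cross_entropy(pred["class_logits"], target["class"])
    carbox_loss = F.mse_loss(pred["car_box"], target["car_box"])
    wheelbox_loss = F.mse_loss(pred["wheel_box"], target["wheel_box"])
    point_loss = F.mse_loss(pred["ground_point"], target["ground_point"])
    W1, W2, W3, W4 = w
    return (W1 * class_loss + W2 * carbox_loss
            + W3 * wheelbox_loss + W4 * point_loss)
```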
9. The vehicle orientation detection method based on the multitask convolutional neural network as claimed in claim 1, wherein the step of obtaining the real-time vehicle orientation angle of the front vehicle through the real-time wheel and ground intersection position information specifically comprises the following steps:
connecting the real-time wheel-ground intersection points on the image according to the real-time wheel-ground intersection point position information of the front vehicle;
and screening out the ground intersection points of two real-time wheels which have a small horizontal position difference and belong to the front vehicle, and mapping the two real-time wheel-ground intersection points into the world coordinate system in combination with the intrinsic and extrinsic parameters of the camera to obtain the real-time vehicle orientation angle of the front vehicle.
10. A vehicle orientation detection system based on a multitasking convolutional neural network, comprising:
the model establishing module is used for establishing a multi-stage multitask convolutional neural network model;
the method comprises the steps that key point positions are obtained and touched, real-time RGB images of a front vehicle are obtained, the real-time RGB images of the front vehicle are input into a multi-stage multi-task convolutional neural network model, and real-time vehicle positions, real-time vehicle types, real-time wheel positions and real-time key point position information of intersection points of wheels and the ground of the front vehicle are obtained; and the number of the first and second groups,
and the vehicle orientation angle acquisition module is used for optimizing the position of the real-time wheels and the position information of the intersection points of the real-time wheels and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle through the position information of the intersection points of the real-time wheels and the ground.
CN202011411157.5A 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network Active CN112507862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011411157.5A CN112507862B (en) 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011411157.5A CN112507862B (en) 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network

Publications (2)

Publication Number Publication Date
CN112507862A true CN112507862A (en) 2021-03-16
CN112507862B CN112507862B (en) 2023-05-26

Family

ID=74971868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011411157.5A Active CN112507862B (en) 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network

Country Status (1)

Country Link
CN (1) CN112507862B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147905A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc Systems and methods for end-to-end object detection
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
CN111491093A (en) * 2019-01-26 2020-08-04 初速度(苏州)科技有限公司 Method and device for adjusting field angle of camera
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110033002A (en) * 2019-04-19 2019-07-19 福州大学 Detection method of license plate based on multitask concatenated convolutional neural network
CN110738181A (en) * 2019-10-21 2020-01-31 东软睿驰汽车技术(沈阳)有限公司 method and device for determining vehicle orientation information
CN112016532A (en) * 2020-10-22 2020-12-01 腾讯科技(深圳)有限公司 Vehicle detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HU Congkun; HUANG Dongjun: "License plate recognition using a multi-task cascaded convolutional neural network", Enterprise Technology Development (企业技术开发) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052123A (en) * 2021-04-09 2021-06-29 地平线(上海)人工智能技术有限公司 Method and device for determining orientation of object, electronic equipment and storage medium
CN113313098A (en) * 2021-07-30 2021-08-27 阿里云计算有限公司 Video processing method, device, system and storage medium
CN113313098B (en) * 2021-07-30 2022-01-04 阿里云计算有限公司 Video processing method, device, system and storage medium
CN113954867A (en) * 2021-09-29 2022-01-21 广州文远知行科技有限公司 Method, device, equipment and storage medium for quickly calculating object-to-collision time
CN113954867B (en) * 2021-09-29 2023-10-20 广州文远知行科技有限公司 Method, device, equipment and storage medium for rapidly calculating time from object to collision
CN114882727A (en) * 2022-03-15 2022-08-09 深圳市德驰微视技术有限公司 Parking space detection method based on domain controller, electronic device and storage medium
CN114882727B (en) * 2022-03-15 2023-09-05 深圳市德驰微视技术有限公司 Parking space detection method based on domain controller, electronic equipment and storage medium
WO2023184868A1 (en) * 2022-04-02 2023-10-05 合众新能源汽车股份有限公司 Obstacle orientation determination method, apparatus and system, and device, medium and product
CN115171072A (en) * 2022-06-18 2022-10-11 感知信息科技(浙江)有限责任公司 Vehicle 3D detection method realized based on FPGA vehicle detection tracking algorithm
CN117765285A (en) * 2024-02-22 2024-03-26 杭州汇萃智能科技有限公司 Contour matching method, system and medium with anti-noise function

Also Published As

Publication number Publication date
CN112507862B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN112507862B (en) Vehicle orientation detection method and system based on multitasking convolutional neural network
CN114708585B (en) Attention mechanism-based millimeter wave radar and vision fusion three-dimensional target detection method
CN109087510B (en) Traffic monitoring method and device
CN112560774A (en) Obstacle position detection method, device, equipment and storage medium
CN110378919B (en) Narrow-road passing obstacle detection method based on SLAM
CN111310708B (en) Traffic signal lamp state identification method, device, equipment and storage medium
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN110197106A (en) Object designation system and method
CN114419098A (en) Moving target trajectory prediction method and device based on visual transformation
CN116830164A (en) LiDAR decorrelated object detection system and method
CN115187964A (en) Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
CN114821507A (en) Multi-sensor fusion vehicle-road cooperative sensing method for automatic driving
CN111967396A (en) Processing method, device and equipment for obstacle detection and storage medium
CN113095152A (en) Lane line detection method and system based on regression
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN111144361A (en) Road lane detection method based on binaryzation CGAN network
CN113611008B (en) Vehicle driving scene acquisition method, device, equipment and medium
CN113793371B (en) Target segmentation tracking method, device, electronic equipment and storage medium
CN118043864A (en) Obstacle recognition method and device, storage medium and electronic equipment
CN113834463A (en) Intelligent vehicle side pedestrian/vehicle monocular depth distance measuring method based on absolute size
CN114999183B (en) Traffic intersection vehicle flow detection method
CN114779271B (en) Target detection method and device, electronic equipment and storage medium
CN111815667B (en) Method for detecting moving target with high precision under camera moving condition
US20230386231A1 (en) Method for detecting three-dimensional objects in relation to autonomous driving and electronic device
Berrio et al. Semantic sensor fusion: From camera to sparse LiDAR information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant