CN112507862B - Vehicle orientation detection method and system based on multitasking convolutional neural network - Google Patents

Vehicle orientation detection method and system based on multitasking convolutional neural network

Info

Publication number
CN112507862B
CN112507862B (application CN202011411157.5A)
Authority
CN
China
Prior art keywords
vehicle
neural network
convolutional neural
real
ground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011411157.5A
Other languages
Chinese (zh)
Other versions
CN112507862A (en)
Inventor
陈智磊
乔文龙
李泽彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongfeng Motor Corp
Original Assignee
Dongfeng Motor Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongfeng Motor Corp filed Critical Dongfeng Motor Corp
Priority to CN202011411157.5A priority Critical patent/CN112507862B/en
Publication of CN112507862A publication Critical patent/CN112507862A/en
Application granted granted Critical
Publication of CN112507862B publication Critical patent/CN112507862B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/64 - Three-dimensional objects
    • G06V 20/647 - Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/90 - Determination of colour characteristics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a vehicle orientation detection method and system based on a multitasking convolutional neural network. The vehicle orientation detection method based on the multitasking convolutional neural network comprises the following steps: establishing a mature multistage multitasking convolutional neural network model; acquiring a real-time RGB image of the front vehicle and inputting it into the multi-stage multi-task convolutional neural network model to obtain the real-time vehicle position, real-time vehicle type, real-time wheel positions and real-time wheel-ground intersection point position information of the front vehicle; and optimizing the real-time wheel positions and the real-time wheel-ground intersection point position information, and acquiring the real-time vehicle orientation angle of the front vehicle from the wheel-ground intersection point position information. The invention solves the problems that a 3D detection frame of the front vehicle cannot be trained through deep learning for lack of three-dimensional training data, and that TTC calculated from the change in the vehicle's pixel width is inaccurate when front-vehicle orientation information is lacking.

Description

Vehicle orientation detection method and system based on multitasking convolutional neural network
Technical Field
The invention relates to the technical field of automatic driving, in particular to a vehicle orientation detection method and system based on a multitasking convolutional neural network.
Background
In ADAS (Advanced Driving Assistance System) monocular vision, accurate detection of the vehicle ahead is an indispensable function, but 2D (two-dimensional) detection of the vehicle ahead cannot meet the requirement of further improving the intelligence of the ADAS system, so 3D (three-dimensional) frame detection of the vehicle ahead in monocular vision needs to be studied. Conversion from 2D detection to 3D detection of a vehicle can be realized by calculating the orientation of the vehicle ahead, and the orientation angle of the vehicle ahead can be effectively calculated from the key points where the wheels of the vehicle ahead meet the ground. In addition, when calculating the time to collision (TTC, Time to Collision) of a vehicle, the calculation is often performed from the change in the pixel width of the front vehicle, and the orientation of the front vehicle needs to be considered: by identifying the orientation angle of the detected front vehicle, the width change can be corrected so that the TTC calculation is more accurate. In the paper "Vision-based ACC with a Single Camera: Bounds on Range and Range Rate Accuracy", the orientation of the front vehicle is not considered when calculating TTC, so the error between the calculated TTC and the true value increases when the front vehicle orientation changes greatly.
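As an illustration of width-based TTC estimation under a constant relative speed assumption, a minimal sketch is given below; the frame interval, the pixel widths and the variable names are illustrative and are not taken from the cited paper.

```python
def ttc_from_pixel_width(w_prev: float, w_curr: float, dt: float) -> float:
    """Estimate time-to-collision from the change in the front vehicle's pixel
    width between two frames, assuming constant relative speed.

    Under a pinhole camera the pixel width w is inversely proportional to the
    range Z, so Z_prev / Z_curr = w_curr / w_prev and TTC ~= dt / (w_curr / w_prev - 1).
    If the front vehicle rotates, w changes even at constant range, which biases
    this estimate; hence the need to correct it with the detected orientation angle.
    """
    scale = w_curr / w_prev
    if scale <= 1.0:          # not closing in; no finite collision time
        return float("inf")
    return dt / (scale - 1.0)

# example: width grows from 100 px to 104 px over 0.1 s -> TTC of about 2.5 s
print(ttc_from_pixel_width(100.0, 104.0, 0.1))
```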
In FCW (Forward Collision Warning) products based on monocular vision, it is necessary to monitor the vehicle ahead, determine the distance, direction and relative speed between the host vehicle and the vehicle ahead, and warn the driver when there is a potential collision risk. For the vehicle detection problem, a deep-learning-based method is generally adopted to detect the position and type of the vehicle. For the vehicle orientation, deep learning is likewise mostly adopted: three-dimensional information of the vehicle is used as training data, and the vehicle orientation information is obtained through a regression algorithm. In 3D (three-dimensional) target detection from a single RGB image (Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss), a single RGB image is input, and a CNN (Convolutional Neural Network) directly outputs the predicted target category, the 2D (two-dimensional) frame position, the target distance, the target deflection angle, the length, width and height of the 3D frame, and the 2D coordinate positions of the eight vertices of the 3D frame; the optimal 2D frame is extracted through NMS (Non-Maximum Suppression), 3D box fitting converts the outputs into three kinds of information (target category, 2D frame and 3D frame) in one-to-one correspondence with the labeled ground truth (true values), and an IoU loss is optimized for network regression training. This method requires the data to be labeled with ground-truth information such as length, width, height and orientation.
However, in practice, deep-learning-based methods require a large amount of ground-truth data in order to train a target 3D detection frame; if the data are scarce, the trained model tends to over-fit and its generalization ability is weak. On ordinary roads there are many vehicles ahead, and when collecting such data, without fusion with lidar data it is difficult to obtain the actual physical length, width and height and the orientation of the vehicles ahead from a monocular camera alone. Lidar is expensive, and the actual physical dimensions and orientation of the front vehicle can only be obtained by jointly calibrating the lidar and the monocular camera and then fusing their data. Therefore, three-dimensional data on the vehicle ahead is difficult to acquire, and the training data requirement is difficult to meet.
Disclosure of Invention
The invention provides a vehicle orientation detection method and system based on a multitasking convolutional neural network, which solves the problems that a front vehicle 3D detection frame cannot be trained through deep learning, and TTC calculation is inaccurate due to lack of front vehicle orientation information when TTC is calculated by utilizing vehicle pixel width variation.
In a first aspect, the present invention provides a method for detecting a vehicle orientation based on a multitasking convolutional neural network, including the steps of:
Establishing a mature multistage multitasking convolutional neural network model;
acquiring a real-time RGB image of a front vehicle, and inputting the real-time RGB image of the front vehicle into a multi-stage multi-task convolutional neural network model to obtain real-time vehicle position, real-time vehicle type, real-time wheel position and real-time wheel and ground intersection point position information of the front vehicle;
and optimizing the real-time wheel position and the position information of the intersection point of the real-time wheel and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle through the position information of the intersection point of the real-time wheel and the ground.
In some embodiments, the step of establishing the mature multi-stage multitasking convolutional neural network model specifically includes the following steps:
establishing an original multi-stage multitasking convolutional neural network model;
acquiring a training picture, and inputting the training picture into an original multi-stage multi-task convolutional neural network model to obtain a total loss function of the original multi-stage multi-task convolutional neural network model;
and obtaining a mature multi-stage multi-task convolutional neural network model according to the original multi-stage multi-task convolutional neural network model and the total loss function thereof.
In some embodiments, the step of establishing the original multi-stage multitasking convolutional neural network model specifically includes the following steps:
Obtaining an original picture, and scaling the original picture to different scales to form an image pyramid;
performing multi-level convolution operation according to the image pyramid to obtain the position information of the vehicle detection frame, the vehicle type, the wheel detection frame, the intersection point of the wheel and the ground;
acquiring a vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame and the intersection point position information of the wheels and the ground;
according to the steps, an original multi-stage multitasking convolutional neural network model is obtained.
In some embodiments, the step of performing a multi-level convolution operation according to the image pyramid to obtain the position information of the intersection point of the vehicle detection frame, the vehicle type, the wheel detection frame, the wheel and the ground specifically includes the following steps:
extracting preliminary features and calibration frames of an image through a first-level multitask convolutional neural network by adopting a dense sliding window on an image pyramid, carrying out boundary frame regression adjustment processing to obtain candidate boundary frames, calibrating windows of the candidate boundary frames, and then merging the highly overlapped candidate boundary frames through non-maximum suppression to obtain a rough positioning candidate frame image of a vehicle and wheels;
taking a candidate frame image obtained after combining the highly overlapped candidate boundary frames as input, performing second-level multitasking convolutional neural network operation, and eliminating false detection targets through boundary frame regression adjustment and non-maximum suppression;
And scaling the candidate frame images with the false detection targets removed, taking the scaled candidate frame images as input, and performing third-level multitasking convolutional neural network operation to obtain final vehicle detection frames, vehicle types, wheel detection frames and intersection point position information of wheels and the ground.
In some embodiments, the step of acquiring the vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame, the intersection point position information of the wheel and the ground specifically includes the following steps:
connecting each wheel with the ground intersection point on the image according to the vehicle detection frame, the vehicle type, the wheel detection frame and the key point position information of the intersection point of the wheel and the ground;
screening out the intersection points of two wheels with the ground that belong to the same vehicle and whose horizontal positions do not differ greatly, and mapping the two wheel-ground intersection points to the world coordinate system by combining the intrinsic and extrinsic parameters of the camera, so as to obtain the vehicle orientation angle.
In some embodiments, the step of mapping the two wheel-ground intersection points to the world coordinate system by combining the intrinsic and extrinsic parameters of the camera to obtain the vehicle orientation angle specifically includes the following steps:
respectively establishing an image coordinate system, a camera coordinate system, a vehicle coordinate system and a world coordinate system;
Calibrating the internal and external parameters of the camera according to a Zhang Zhengyou checkerboard mode, and establishing a mapping relation from an image coordinate to a world coordinate system;
and according to the image coordinates of the intersection points of the two wheels and the ground, mapping the image coordinates to a world coordinate system, obtaining the included angle between the straight line of the intersection points of the two wheels and the ground and the world coordinate system axis along the lane direction, and calculating the vehicle orientation angle.
In some embodiments, the step of obtaining a training picture, inputting the training picture into the original multi-stage multi-task convolutional neural network model to obtain the total loss function of the original multi-stage multi-task convolutional neural network model specifically includes the following steps:
marking the training pictures with data, and marking the vehicle detection frame, the vehicle type, the wheel detection frame and the intersection point information of the wheels and the ground;
inputting the marked vehicle detection frame and vehicle type into an original multi-stage multi-task convolutional neural network model for training, and adjusting parameters to obtain a converged multi-stage multi-task convolutional neural network model;
and inputting the marked vehicle detection frame, the vehicle type, the wheel detection frame and the intersection information of the wheels and the ground into the converged multi-stage multi-task convolutional neural network model for training to obtain the total loss function of the converged multi-stage multi-task convolutional neural network model.
In some embodiments, the step of obtaining the total loss function of the converged multi-stage multitasking convolutional neural network model specifically includes the following steps:
the loss function of the vehicle class of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

class_loss = -∑_{i=1}^{K} y_i log(p_i)

formula (1);

where K is the number of categories, y_i is the label, and p_i is the predicted probability that the given sample belongs to category i;

the loss function of the vehicle detection frame of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

carbox_loss = ||carbox_pred - carbox_gt||^2

formula (2);

the loss function of the wheel detection frame of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

wheelbox_loss = ||wheelbox_pred - wheelbox_gt||^2

formula (3);

wherein the subscript pred denotes the frame coordinates regressed by the neural network and the subscript gt denotes the true-value coordinates;

the loss function of the intersection point of the wheel and the ground of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

point_loss = ||point_pred - point_gt||^2

formula (4);

wherein point_pred is the wheel-ground intersection point output by the neural network and point_gt is the true intersection point coordinate;

setting proper weights for the above loss functions and inputting training pictures into the converged multi-stage multi-task convolutional neural network model for training, the total loss function is obtained as follows:

loss=W1*class_loss+W2*carbox_loss+W3*wheelbox_loss+W4*point_loss

formula (5);

wherein W1, W2, W3, W4 are the weights of the respective loss functions.
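A minimal sketch of how formulas (1) to (5) can be combined in training code is given below; the PyTorch usage, the tensor shapes and the default weight values are assumptions and are not part of the original disclosure.

```python
import torch.nn.functional as F

def total_loss(cls_logits, car_box, wheel_box, point,        # network outputs
               cls_gt, car_box_gt, wheel_box_gt, point_gt,    # labelled ground truth
               w1=1.0, w2=0.5, w3=0.5, w4=1.0):               # W1..W4, tuned experimentally
    class_loss = F.cross_entropy(cls_logits, cls_gt)          # formula (1)
    carbox_loss = F.mse_loss(car_box, car_box_gt)             # formula (2)
    wheelbox_loss = F.mse_loss(wheel_box, wheel_box_gt)       # formula (3)
    point_loss = F.mse_loss(point, point_gt)                  # formula (4)
    return (w1 * class_loss + w2 * carbox_loss                # formula (5)
            + w3 * wheelbox_loss + w4 * point_loss)
```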
In some embodiments, the step of acquiring the real-time vehicle orientation angle of the front vehicle through the position information of the intersection point of the real-time wheel and the ground specifically includes the following steps:
connecting each real-time wheel with the ground intersection point on the image through the position information of the intersection point of the real-time wheel of the front vehicle and the ground;
and screening out the intersection points of two real-time wheels with the ground that belong to the front vehicle and whose horizontal positions do not differ greatly, and mapping the two real-time wheel-ground intersection points to the world coordinate system by combining the intrinsic and extrinsic parameters of the camera, so as to obtain the real-time vehicle orientation angle of the front vehicle.
In a second aspect, the present invention provides a vehicle orientation detection system based on a multi-tasking convolutional neural network, comprising:
the model building module is used for building a multi-stage multitasking convolutional neural network model;
the key point position acquisition module is used for acquiring a real-time RGB image of the front vehicle, inputting the real-time RGB image of the front vehicle into the multi-stage multi-task convolutional neural network model, and obtaining real-time vehicle position, real-time vehicle type, real-time wheel position and real-time key point position information of intersection points of wheels and the ground; the method comprises the steps of,
the vehicle orientation angle acquisition module is used for optimally processing the real-time wheel positions and the real-time intersection point position information of the wheels and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle through the real-time intersection point position information of the wheels and the ground.
The technical scheme provided by the invention has the beneficial effects that:
the embodiment of the invention provides a vehicle orientation detection method based on a multi-task convolutional neural network, which is characterized in that an acquired front vehicle RGB image is used as input, a multi-stage multi-task convolutional neural network is utilized to detect the position of a vehicle, the type of the vehicle, the position of a wheel, the position information of the key point of the intersection point of the wheel and the ground, the position of the wheel and the position of the key point of the intersection point of the wheel and the ground are optimized through post-processing, and finally the orientation angle of a vehicle is calculated through the key point.
Aiming at the problem that the 3D detection frame of the front vehicle cannot be trained directly through deep learning due to the lack of three-dimensional training data of the front vehicle, the method adopts deep learning to detect the 2D frame detection of the front vehicle and the front vehicle orientation angle required by the 2D-3D conversion of the vehicle frame. By detecting the intersection point of the wheels and the ground in the 2D frame, the orientation angle of the vehicle in the image is calculated, and powerful support is provided for the subsequent conversion from the 2D frame to the 3D frame of the vehicle. Therefore, only the data collected by the monocular camera is needed for training, the front vehicle width height and the direction angle can be calculated, and the data collection cost and the time period are reduced. Moreover, by acquiring the vehicle orientation angle of the preceding vehicle, the problem that TTC calculation is inaccurate due to lack of the preceding vehicle orientation when the TTC is calculated by utilizing the change of the pixel width of the vehicle is solved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of steps of a method for detecting a vehicle orientation based on a multitasking convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a detailed step flow diagram of step S100 of a vehicle orientation detection method based on a multi-tasking convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a detailed step flow chart of step S110 of a vehicle orientation detection method based on a multi-tasking convolutional neural network according to another embodiment of the present invention;
FIG. 4 is a detailed step flow chart of step S120 of a vehicle orientation detection method based on a multi-task convolutional neural network according to an embodiment of the present invention;
Fig. 5 is a training schematic diagram of the multi-level multi-task convolutional neural network in a vehicle orientation detection method based on the multitasking convolutional neural network according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the specific embodiments, it will be understood that they are not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. It should be noted that the method steps described herein may be implemented by any functional block or arrangement of functions, and any functional block or arrangement of functions may be implemented as a physical entity or a logical entity, or a combination of both.
The present invention will be described in further detail below with reference to the drawings and detailed description for the purpose of enabling those skilled in the art to understand the invention better.
Note that: the examples to be described below are only one specific example, and not as limiting the embodiments of the present invention necessarily to the following specific steps, values, conditions, data, sequences, etc. Those skilled in the art can, upon reading the present specification, make and use the concepts of the invention to construct further embodiments not mentioned in the specification.
The invention provides a vehicle orientation detection method and system based on a multitasking convolutional neural network, which solve the problems that a front vehicle 3D detection frame cannot be trained through deep learning in the related art, and TTC calculation is inaccurate due to lack of front vehicle orientation information when TTC is calculated by utilizing vehicle pixel width change.
Specifically, as shown in fig. 1, the invention provides a vehicle orientation detection method based on a multitasking convolutional neural network, which comprises the following steps:
s100, establishing a mature multistage multitasking convolutional neural network model;
s200, acquiring a real-time RGB image of a front vehicle, and inputting the real-time RGB image of the front vehicle into a multi-stage multi-task convolutional neural network model to obtain real-time vehicle position, real-time vehicle type, real-time wheel position and real-time wheel and ground intersection point position information of the front vehicle;
s300, optimizing and processing the real-time wheel positions and the position information of the intersection points of the real-time wheels and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle through the position information of the intersection points of the real-time wheels and the ground.
According to the vehicle orientation detection method based on the multi-task convolutional neural network, the collected RGB image of the front vehicle is used as input, the multi-stage multi-task convolutional neural network is utilized, meanwhile, the position information of key points of the vehicle, the vehicle type, the wheel position and the wheel and the ground intersection point is detected, then the wheel position and the key point position of the wheel and the ground intersection point are optimized through post-processing, and finally the vehicle orientation angle is calculated through the key point position.
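The overall inference flow of steps S200 to S300 can be sketched as follows; the model object and the helper functions select_same_vehicle_points and image_to_world are hypothetical placeholders for the components described later in this document.

```python
import math

def detect_front_vehicle_orientation(rgb_image, model, camera_params):
    """Sketch of steps S200-S300: run the trained multi-stage multi-task CNN,
    post-process the wheel-ground keypoints, and compute the orientation angle."""
    # S200: real-time vehicle position, vehicle type, wheel positions, wheel-ground points
    vehicle_box, vehicle_type, wheel_boxes, ground_points = model.predict(rgb_image)

    # S300 (part 1): keep the two wheel-ground points that belong to this vehicle
    p1, p2 = select_same_vehicle_points(ground_points, vehicle_box)

    # S300 (part 2): map both image points to the world coordinate system and
    # take the arctangent of the slope of the line joining them
    xw1, yw1 = image_to_world(p1, camera_params)
    xw2, yw2 = image_to_world(p2, camera_params)
    orientation = math.atan2(xw2 - xw1, yw2 - yw1)   # angle relative to the lane (y_w) axis
    return vehicle_box, vehicle_type, orientation
```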
As shown in fig. 2, the step S100 is the step of establishing a mature multi-stage multi-task convolutional neural network model, and specifically includes the following steps:
s110, an original multi-stage multitasking convolutional neural network model is established;
s120, acquiring a training picture, and inputting the training picture into an original multi-stage multi-task convolutional neural network model to obtain a total loss function of the original multi-stage multi-task convolutional neural network model;
s130, obtaining a mature multi-stage multi-task convolutional neural network model according to the original multi-stage multi-task convolutional neural network model and the total loss function.
Before the mature multi-stage multi-task convolutional neural network model is obtained, a preliminary (original) multi-stage multi-task convolutional neural network model is firstly established, and the preliminary multi-stage multi-task convolutional neural network model is required to be trained so as to obtain various parameters (including various loss functions and loss weights) of the multi-stage multi-task convolutional neural network model, so that the mature multi-stage multi-task convolutional neural network model is obtained.
Further, as shown in fig. 3, the step S110 is the step of "building an original multi-stage and multi-task convolutional neural network model", and specifically includes the following steps:
S112, acquiring an original picture, and scaling the original picture to different scales to form an image pyramid.
Specifically, the collected original RGB images can be transformed to different scales to construct a 7-level image pyramid so as to adapt to vehicle detection at different sizes. The scales are 2, 4, 8, 16, 32, 64 and 128, respectively.
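A minimal sketch of this pyramid construction is shown below; it assumes the listed values are downsampling factors applied to the original image and uses OpenCV resizing as one possible implementation.

```python
import cv2

def build_image_pyramid(image, factors=(2, 4, 8, 16, 32, 64, 128)):
    """Build the 7-level pyramid by shrinking the original RGB image by each factor."""
    h, w = image.shape[:2]
    pyramid = []
    for f in factors:
        resized = cv2.resize(image, (max(1, w // f), max(1, h // f)),
                             interpolation=cv2.INTER_LINEAR)
        pyramid.append(resized)
    return pyramid
```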
And S114, carrying out multi-level convolution operation according to the image pyramid to obtain the position information of the vehicle detection frame, the vehicle type, the wheel detection frame, and the intersection point of the wheel and the ground.
Step S114, namely the step of performing multi-level convolution operation according to the image pyramid to obtain the position information of the intersection point of the vehicle detection frame, the vehicle type, the wheel detection frame, the wheel and the ground, specifically includes the following steps:
s1142, extracting preliminary features and calibration frames of an image through a first-level multi-task convolutional neural network by adopting a dense sliding window on an image pyramid, carrying out bounding box regression adjustment processing to obtain candidate bounding boxes, calibrating windows of the candidate bounding boxes, and then merging the highly overlapped candidate bounding boxes through non-maximum suppression to obtain a rough positioning candidate frame image of the vehicle and the wheels.
Specifically, a dense 12×12 sliding window is applied on all pyramid images; preliminary features and calibration frames of the image are extracted by a three-layer fully convolutional network (FCN), and bounding-box regression adjustment and non-maximum suppression (NMS) are performed to filter out most of the windows.
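The non-maximum suppression used here to merge highly overlapped candidates can be sketched as a standard greedy IoU-based NMS; the 0.5 IoU threshold below is an assumed value, not taken from the original text.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop remaining candidates whose
    IoU with it exceeds the threshold, and repeat. boxes: (N, 4) as [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        if rest.size == 0:
            break
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]
    return keep
```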
For example, the model input is a 12×12×3 picture. Ten 3×3 convolution kernels followed by max pooling (stride=2) generate 10 feature maps of 5×5; 16 convolution kernels of 3×3×10 generate 16 feature maps of 3×3; 32 convolution kernels of 3×3×16 generate 32 feature maps of 1×1. From the 32 feature maps of 1×1, 12 convolution kernels of 1×1×32 generate 12 feature maps of 1×1 for classification (11 vehicle types plus 1 non-target class); 4 convolution kernels of 1×1×32 generate 4 feature maps of 1×1 for regressing the vehicle frame; 4 convolution kernels of 1×1×32 generate 4 feature maps of 1×1 for regressing the wheel frame; and 2 convolution kernels of 1×1×32 generate 2 feature maps of 1×1 for regressing the vehicle grounding point (the intersection of the wheel and the ground).
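A sketch of this first-stage network in PyTorch is given below; the PReLU activations and the 2×2 pooling kernel are assumptions, since the description above fixes only the kernel counts, kernel sizes, stride and output heads.

```python
import torch.nn as nn

class StageOneNet(nn.Module):
    """First-stage (coarse proposal) network sketch: 12x12x3 input, three small
    convolutions, and four 1x1-convolution heads."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3), nn.PReLU(10),    # 12x12 -> 10x10
            nn.MaxPool2d(kernel_size=2, stride=2),             # 10x10 -> 5x5
            nn.Conv2d(10, 16, kernel_size=3), nn.PReLU(16),    # 5x5 -> 3x3
            nn.Conv2d(16, 32, kernel_size=3), nn.PReLU(32),    # 3x3 -> 1x1
        )
        self.cls_head = nn.Conv2d(32, 12, kernel_size=1)       # 11 vehicle types + 1 non-target
        self.car_box_head = nn.Conv2d(32, 4, kernel_size=1)    # vehicle frame regression
        self.wheel_box_head = nn.Conv2d(32, 4, kernel_size=1)  # wheel frame regression
        self.point_head = nn.Conv2d(32, 2, kernel_size=1)      # wheel-ground intersection point

    def forward(self, x):
        feat = self.backbone(x)
        return (self.cls_head(feat), self.car_box_head(feat),
                self.wheel_box_head(feat), self.point_head(feat))
```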
S1144, taking the candidate frame images obtained after combining the highly overlapped candidate boundary frames as input, performing second-level multitasking convolutional neural network operation, and eliminating false detection targets through boundary frame regression adjustment and non-maximum suppression.
Specifically, the candidate frame images obtained in step S1142 are taken as input and scaled to 24×24; on a four-layer convolutional neural network with a fully connected layer, a large number of relatively poor candidate frame images are filtered out, and the selected candidate frame images are then subjected to bounding-box regression adjustment and non-maximum suppression (NMS) to further optimize the prediction result.
For example, the model input is a 24×24×3 picture. 28 convolution kernels of 3×3 followed by 2×2 max pooling (stride=2) generate 28 feature maps of 11×11; 48 convolution kernels of 3×3×28 followed by max pooling (stride=2) generate 48 feature maps of 4×4; 64 convolution kernels of 3×3×48 generate 64 feature maps of 3×3, which are converted to a 128-dimensional fully connected layer through a full-connection operation. From the 128×1 feature, 12 convolution kernels of 1×1×128 generate 12 outputs for classification (11 vehicle types plus 1 non-target class); 4 convolution kernels of 1×1×128 generate 4 outputs for regressing the vehicle frame; 4 convolution kernels of 1×1×128 generate 4 outputs for regressing the wheel frame; and 2 convolution kernels of 1×1×128 generate 2 outputs for regressing the vehicle grounding point (the wheel-ground intersection point).
S1146, scaling the candidate frame images with the false detection targets removed and then taking the scaled candidate frame images as input, and performing third-level multitasking convolutional neural network operation to obtain final vehicle detection frames, vehicle types, wheel detection frames and intersection point position information of wheels and the ground.
Specifically, the candidate frame images obtained in step S1144 can be scaled to 48×48 and taken as input; on a five-layer convolutional neural network with a fully connected layer, the final vehicle detection frame, vehicle category, wheel detection frame and wheel-ground intersection point are output.
For example, the model input is a 48×48×3 picture. 32 convolution kernels of 3×3 followed by 3×3 max pooling (stride=2) give 32 feature maps of 23×23; 64 convolution kernels of 3×3×32 followed by 3×3 max pooling (stride=2) convert these to 64 feature maps of 10×10; 64 convolution kernels of 3×3×64 followed by 3×3 max pooling (stride=2) convert these to 64 feature maps of 4×4; 128 convolution kernels of 2×2×64 convert these to 128 feature maps of 3×3, which are converted to a 256-dimensional fully connected layer through a full-connection operation. From the 256×1 feature vector, 12 convolution kernels of 1×1×256 generate 12 outputs for classification (11 vehicle types plus 1 non-target class); 4 convolution kernels of 1×1×256 generate 4 outputs for regressing the vehicle frame; 4 convolution kernels of 1×1×256 generate 4 outputs for regressing the wheel frame; and 2 convolution kernels of 1×1×256 generate 2 outputs for regressing the vehicle grounding points (wheel-ground intersection points).
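A corresponding sketch of the third-stage network is given below; as before, the activations and the ceil-mode pooling are assumptions, and only the layer dimensions stated above come from the description.

```python
import torch.nn as nn

class StageThreeNet(nn.Module):
    """Third-stage network sketch: 48x48x3 input, convolutional backbone,
    256-dimensional fully connected layer, and four task heads."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.PReLU(32),             # 48 -> 46
            nn.MaxPool2d(3, 2, ceil_mode=True),             # 46 -> 23
            nn.Conv2d(32, 64, 3), nn.PReLU(64),             # 23 -> 21
            nn.MaxPool2d(3, 2, ceil_mode=True),             # 21 -> 10
            nn.Conv2d(64, 64, 3), nn.PReLU(64),             # 10 -> 8
            nn.MaxPool2d(3, 2, ceil_mode=True),             # 8 -> 4
            nn.Conv2d(64, 128, 2), nn.PReLU(128),           # 4 -> 3
        )
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(128 * 3 * 3, 256), nn.PReLU(256))
        self.cls_head = nn.Linear(256, 12)        # 11 vehicle types + 1 non-target
        self.car_box_head = nn.Linear(256, 4)     # vehicle frame regression
        self.wheel_box_head = nn.Linear(256, 4)   # wheel frame regression
        self.point_head = nn.Linear(256, 2)       # wheel-ground intersection point

    def forward(self, x):
        feat = self.fc(self.backbone(x))
        return (self.cls_head(feat), self.car_box_head(feat),
                self.wheel_box_head(feat), self.point_head(feat))
```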
Steps S1142, S1144, S1146 are three cascaded networks, and the models and parameters of these three cascaded networks may be trained together on a pre-trained basis when model training is performed in a subsequent step.
S116, acquiring a vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame, and the intersection point position information of the wheels and the ground.
The step S116, namely, the step of acquiring the vehicle orientation angle according to the vehicle detection frame, the vehicle type, the wheel detection frame, the intersection point position information of the wheel and the ground, specifically includes the following steps:
s1162, connecting each wheel with the ground intersection point on the image according to the vehicle detection frame, the vehicle type, the wheel detection frame and the key point position information of the wheel and the ground intersection point;
S1164, screening out the intersection points of two wheels with the ground that belong to the same vehicle and whose horizontal positions do not differ greatly, and mapping the two wheel-ground intersection points to the world coordinate system by combining the intrinsic and extrinsic parameters of the camera, so as to obtain the vehicle orientation angle.
Further, step S1164, namely the step of mapping the two wheel-ground intersection points to the world coordinate system by combining the intrinsic and extrinsic parameters of the camera to obtain the vehicle orientation angle, specifically includes the following steps:
Respectively establishing an image coordinate system, a camera coordinate system, a vehicle coordinate system and a world coordinate system;
calibrating the internal and external parameters of the camera according to a Zhang Zhengyou checkerboard mode, and establishing a mapping relation from an image coordinate to a world coordinate system;
and according to the image coordinates of the intersection points of the two wheels and the ground, mapping the image coordinates to a world coordinate system, obtaining the included angle between the straight line of the intersection points of the two wheels and the ground and the world coordinate system axis along the lane direction, and calculating the vehicle orientation angle.
A world coordinate system is defined in which the z_w axis passes through the vehicle roof, is perpendicular to the ground and points upward, the y_w axis is along the straight lane line direction, and the x_w axis points to the right side of the straight lane line, so that the following relation between image coordinates and world coordinates is obtained:

k [x, y, 1]^T = [[f_x, 0, u_x], [0, f_y, u_y], [0, 0, 1]] [R | T] [x_w, y_w, z_w, 1]^T

wherein x, y are the image coordinates, f_x, f_y are the camera pixel focal lengths, u_x, u_y are the optical-center coordinates, R is a 3*3 rotation matrix obtained from the three Euler angles of the camera installation, T is the translation of the camera installation, and k is a scale factor that can be calculated from the installation height of the camera.

The world coordinates of the intersection points of the front and rear wheels with the ground are connected by a straight line, and the vehicle orientation angle is obtained as the arctangent of the slope of that line.
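A sketch of this image-to-world mapping and of the angle computation is given below; it assumes the intrinsic matrix K and the extrinsics R, T come from the checkerboard calibration described above (for example via cv2.calibrateCamera), and that the wheel-ground intersection points lie on the ground plane z_w = 0.

```python
import numpy as np

def image_point_to_ground(p_img, K, R, T):
    """Map an image point lying on the ground plane (z_w = 0) to world coordinates.
    Solves k * [x, y, 1]^T = K (R X_w + T) for X_w = [x_w, y_w, 0]."""
    x, y = p_img
    M = K @ R
    b = K @ np.asarray(T).reshape(3)
    # unknowns are x_w, y_w and the scale factor k
    A = np.column_stack((M[:, 0], M[:, 1], -np.array([x, y, 1.0])))
    xw, yw, _k = np.linalg.solve(A, -b)
    return np.array([xw, yw])

def vehicle_orientation_angle(front_pt_img, rear_pt_img, K, R, T):
    """Orientation angle: angle between the line joining the front and rear
    wheel-ground points (in world coordinates) and the y_w axis along the lane."""
    p_front = image_point_to_ground(front_pt_img, K, R, T)
    p_rear = image_point_to_ground(rear_pt_img, K, R, T)
    dx, dy = p_front - p_rear
    return np.arctan2(dx, dy)   # arctangent of the slope relative to the lane direction
```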
S118, according to the steps, an original multi-stage multitasking convolutional neural network model is obtained.
In addition, as shown in fig. 4 and 5, step S120, namely the step of acquiring a training picture, inputting the training picture into the original multi-stage and multi-task convolutional neural network model to obtain the total loss function of the original multi-stage and multi-task convolutional neural network model, specifically includes the following steps:
S122, marking the training pictures with data, and marking the vehicle detection frames, the vehicle types, the wheel detection frames and the intersection point information of the wheels and the ground;
s124, inputting the marked vehicle detection frame and the marked vehicle type into the original multi-stage multi-task convolutional neural network model for training, and adjusting parameters to obtain the converged multi-stage multi-task convolutional neural network model.
First, the vehicle detection task is trained with supervision using the annotation information of the vehicle detection frame and the vehicle type, and parameters are adjusted until the model converges. Concretely, the vehicle detection branch is trained while the parameters of the other branches are kept frozen (not learned); the loss function values under different parameters are printed, the curve trend is analyzed, the parameters are adjusted further, and finally the parameters converge.
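This branch-wise warm-up can be sketched as follows; the PyTorch training loop, the module attribute names (wheel_box_head, point_head, etc.) and the optimizer settings are assumptions mirroring the hypothetical network sketches above, not part of the original disclosure.

```python
import torch
import torch.nn.functional as F

def warmup_vehicle_detection(model, loader, epochs=10, lr=1e-3):
    """Train only the vehicle classification and vehicle-box branches while the
    wheel-box and keypoint branches stay frozen (their parameters do not learn)."""
    for p in model.wheel_box_head.parameters():
        p.requires_grad = False
    for p in model.point_head.parameters():
        p.requires_grad = False

    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)

    for epoch in range(epochs):
        for images, cls_gt, car_box_gt in loader:
            cls_logits, car_box, _, _ = model(images)
            loss = (F.cross_entropy(cls_logits, cls_gt)
                    + F.mse_loss(car_box, car_box_gt))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")   # inspect the loss curve trend
```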
S126, inputting the marked vehicle detection frame, the vehicle type, the wheel detection frame and the intersection information of the wheels and the ground into the converged multi-stage multi-task convolutional neural network model for training, and obtaining the total loss function of the converged multi-stage multi-task convolutional neural network model.
On the basis of the model trained in step S124, model training is then carried out using all annotation information (vehicle detection frame and vehicle type, wheel detection frame, and wheel-ground intersection points); during training, the losses of all three stages are accumulated into the target loss function. Specifically, the loss here is the loss over the outputs of the three cascaded networks: the losses of the outputs of steps S1142, S1144 and S1146 are weighted and accumulated, with the loss weight of the step S1146 output greater than that of the step S1144 output, which in turn is greater than that of the step S1142 output.
Moreover, in some embodiments, in the step S126, the step of obtaining the total loss function of the converged multi-stage multi-task convolutional neural network model specifically includes the following steps:
the loss function of the vehicle class of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

class_loss = -∑_{i=1}^{K} y_i log(p_i)

formula (1);

where K is the number of categories, y_i is the label, and p_i is the predicted probability that the given sample belongs to category i;

the loss function of the vehicle detection frame of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

carbox_loss = ||carbox_pred - carbox_gt||^2

formula (2);

the loss function of the wheel detection frame of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

wheelbox_loss = ||wheelbox_pred - wheelbox_gt||^2

formula (3);

wherein the subscript pred denotes the frame coordinates regressed by the neural network and the subscript gt denotes the true-value coordinates;

the loss function of the intersection point of the wheel and the ground of the converged multi-stage multitasking convolutional neural network model is obtained as follows:

point_loss = ||point_pred - point_gt||^2

formula (4);

wherein point_pred is the wheel-ground intersection point output by the neural network and point_gt is the true intersection point coordinate;

setting proper weights for the above loss functions and inputting training pictures into the converged multi-stage multi-task convolutional neural network model for training, the total loss function is obtained as follows:

loss=W1*class_loss+W2*carbox_loss+W3*wheelbox_loss+W4*point_loss

formula (5);

wherein W1, W2, W3, W4 are the weights of the respective loss functions. Through experimental adjustment, the losses of the individual tasks can be brought to the same order of magnitude.
Specifically, the adjustment process is as follows: first, initial values are set and the model is trained; after the result is obtained, the magnitude of each task's loss is inspected, and on this basis the loss weights W are adjusted so that the losses are on the same order of magnitude. Fine tuning with small steps is then carried out on this basis: the candidate steps are written into a permutation matrix, the parameters of all permutations and combinations are trained automatically, the data and parameters of all these training runs are collected, and finally the recall rate is evaluated, so that the optimal parameters and model are obtained.
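The weight adjustment described above can be sketched as a small grid (permutation) search over candidate weight combinations; train_and_evaluate_recall is a hypothetical helper standing in for a full training run followed by recall evaluation.

```python
import itertools

def search_loss_weights(w1_candidates, w2_candidates, w3_candidates, w4_candidates,
                        train_and_evaluate_recall):
    """Train a model for every combination of candidate loss weights and keep
    the combination that yields the best recall."""
    best_recall, best_weights = -1.0, None
    for weights in itertools.product(w1_candidates, w2_candidates,
                                     w3_candidates, w4_candidates):
        recall = train_and_evaluate_recall(*weights)   # full training + evaluation run
        if recall > best_recall:
            best_recall, best_weights = recall, weights
    return best_weights, best_recall
```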
In addition, in some embodiments, in the step S300, the step of acquiring the real-time vehicle orientation angle of the front vehicle through the real-time intersection point position information of the wheels and the ground specifically includes the following steps:
s310, connecting each real-time wheel with the ground intersection point on the image through the position information of the intersection point of the real-time wheel of the front vehicle and the ground;
S320, screening out the intersection points of two real-time wheels with the ground that belong to the front vehicle and whose horizontal positions do not differ greatly, and mapping the two real-time wheel-ground intersection points to the world coordinate system by combining the intrinsic and extrinsic parameters of the camera, so as to obtain the real-time vehicle orientation angle of the front vehicle.
Finally, when the real-time vehicle orientation angle of the preceding vehicle is obtained, the method is substantially the same as the step S116, and will not be described herein.
In addition, the invention provides a vehicle orientation detection system based on a multitasking convolutional neural network, comprising:
the model building module is used for building a multi-stage multitasking convolutional neural network model;
the key point position acquisition module is used for acquiring a real-time RGB image of the front vehicle, inputting the real-time RGB image of the front vehicle into the multi-stage multi-task convolutional neural network model, and obtaining real-time vehicle position, real-time vehicle type, real-time wheel position and real-time key point position information of intersection points of wheels and the ground; the method comprises the steps of,
the vehicle orientation angle acquisition module is used for optimally processing the real-time wheel positions and the real-time intersection point position information of the wheels and the ground, and acquiring the real-time vehicle orientation angle of the front vehicle through the real-time intersection point position information of the wheels and the ground.
The vehicle orientation detection system based on the multi-task convolutional neural network in this embodiment corresponds to the above vehicle orientation detection method based on the multi-task convolutional neural network, and functions of each module in the vehicle orientation detection system based on the multi-task convolutional neural network in this embodiment are described in detail in the corresponding method embodiments, which are not described here one by one.
According to the technical scheme provided by the invention, the problem that the front-vehicle 3D detection frame cannot be trained directly through deep learning, due to the lack of three-dimensional training data of the front vehicle, is solved by adopting deep learning for 2D frame detection of the front vehicle and for the front-vehicle orientation angle required by the 2D-to-3D conversion of the vehicle frame. By detecting the wheel-ground intersection points inside the 2D frame, the orientation angle of the vehicle in the image is calculated, providing strong support for the subsequent conversion from the 2D frame to the 3D frame of the vehicle. Therefore, only data collected by a monocular camera is needed for training, the width, height and orientation angle of the front vehicle can be calculated, and the data collection cost and time are reduced. Moreover, by acquiring the vehicle orientation angle of the preceding vehicle, the problem that TTC calculation is inaccurate due to the lack of front-vehicle orientation when TTC is calculated from the change in the vehicle's pixel width is solved.
Based on the same inventive concept, the embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements all or part of the method steps of the above method.
The present invention may be implemented by implementing all or part of the above-described method flow, or by instructing the relevant hardware by a computer program, which may be stored in a computer readable storage medium, and which when executed by a processor, may implement the steps of the above-described method embodiments. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, executable files or in some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of the jurisdiction's jurisdiction and the patent practice, for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunication signals according to the jurisdiction and the patent practice.
Based on the same inventive concept, the embodiments of the present application further provide an electronic device, including a memory and a processor, where the memory stores a computer program running on the processor, and when the processor executes the computer program, the processor implements all or part of the method steps in the above method.
The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor being a control center of the computer device, and the various interfaces and lines connecting the various parts of the overall computer device.
The memory may be used to store computer programs and/or models, and the processor implements various functions of the computer device by running or executing the computer programs and/or models stored in the memory, and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (e.g., a sound playing function, an image playing function, etc.); the storage data area may store data (e.g., audio data, video data, etc.) created according to the use of the handset. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, smart Media Card (SMC), secure Digital (SD) Card, flash Card (Flash Card), at least one disk storage device, flash memory device, or other volatile solid-state storage device.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, server, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), servers and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. The vehicle orientation detection method based on the multitasking convolutional neural network is characterized by comprising the following steps of:
establishing a mature multistage multitasking convolutional neural network model;
acquiring a real-time RGB image of a front vehicle, and inputting the real-time RGB image of the front vehicle into a multi-stage multi-task convolutional neural network model to obtain real-time vehicle position, real-time vehicle type, real-time wheel position and real-time wheel and ground intersection point position information of the front vehicle;
optimizing real-time wheel positions and real-time wheel and ground intersection point position information, and acquiring real-time vehicle orientation angles of the front vehicle through the real-time wheel and ground intersection point position information;
the step of establishing the mature multi-stage multi-task convolutional neural network model specifically comprises the following steps:
establishing an original multi-stage multi-task convolutional neural network model;
acquiring training pictures, and inputting the training pictures into the original multi-stage multi-task convolutional neural network model to obtain a total loss function of the original multi-stage multi-task convolutional neural network model;
obtaining the mature multi-stage multi-task convolutional neural network model according to the original multi-stage multi-task convolutional neural network model and its total loss function;
the step of establishing the original multi-stage multi-task convolutional neural network model specifically comprises the following steps:
obtaining an original picture, and scaling the original picture to different scales to form an image pyramid;
performing multi-level convolution operations on the image pyramid to obtain the vehicle detection box, the vehicle type, the wheel detection box, and the wheel-ground intersection point positions;
acquiring the vehicle orientation angle according to the vehicle detection box, the vehicle type, the wheel detection box, and the wheel-ground intersection point positions;
obtaining the original multi-stage multi-task convolutional neural network model according to the above steps;
the step of performing multi-level convolution operations on the image pyramid to obtain the vehicle detection box, the vehicle type, the wheel detection box, and the wheel-ground intersection point positions specifically comprises the following steps:
sliding a dense window over the image pyramid and extracting preliminary features and calibration boxes of the image through a first-level multi-task convolutional neural network, performing bounding box regression to obtain candidate bounding boxes, calibrating the windows of the candidate bounding boxes, and then merging highly overlapping candidate bounding boxes through non-maximum suppression to obtain coarsely localized candidate box images of the vehicle and wheels;
taking the candidate box images obtained after merging the highly overlapping candidate bounding boxes as input, performing a second-level multi-task convolutional neural network operation, and eliminating falsely detected targets through bounding box regression and non-maximum suppression;
and scaling the candidate box images from which the falsely detected targets have been removed, taking the scaled candidate box images as input, and performing a third-level multi-task convolutional neural network operation to obtain the final vehicle detection box, vehicle type, wheel detection box, and wheel-ground intersection point positions.
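The cascaded detection in claim 1 rests on two generic building blocks: an image pyramid that rescales the input to multiple sizes, and non-maximum suppression that merges highly overlapping candidate boxes. The following Python sketch illustrates both; the scale factor, minimum size, and IoU threshold are illustrative assumptions, not values taken from the patent.

```python
import numpy as np
import cv2  # OpenCV, assumed available for resizing


def build_image_pyramid(image, min_size=12, scale_factor=0.709):
    """Scale the input RGB image to successively smaller sizes (parameters are illustrative)."""
    pyramid = []
    h, w = image.shape[:2]
    scale = 1.0
    while min(h * scale, w * scale) >= min_size:
        resized = cv2.resize(image, (int(w * scale), int(h * scale)))
        pyramid.append((scale, resized))
        scale *= scale_factor
    return pyramid


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Merge highly overlapping candidate boxes, keeping the highest-scoring ones.
    boxes: (N, 4) float array of [x1, y1, x2, y2]; scores: (N,) float array."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current box with every remaining candidate.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter)
        order = order[1:][iou <= iou_threshold]
    return keep
```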
2. The vehicle orientation detection method based on the multitasking convolutional neural network according to claim 1, wherein the step of acquiring the vehicle orientation angle according to the vehicle detection box, the vehicle type, the wheel detection box, and the wheel-ground intersection point positions specifically comprises the following steps:
associating each wheel with its ground intersection point in the image according to the vehicle detection box, the vehicle type, the wheel detection box, and the key-point positions of the wheel-ground intersection points;
selecting two wheel-ground intersection points that belong to the same vehicle and whose horizontal positions differ only slightly, and mapping these two intersection points, combined with the intrinsic and extrinsic parameters of the camera, to the world coordinate system to obtain the vehicle orientation angle.
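Claim 2 screens, per detected vehicle, a pair of wheel-ground intersection points whose horizontal positions are close. A minimal Python sketch of such a screening rule follows; the pixel threshold and the pair-selection rule (smallest horizontal separation inside the vehicle box) are illustrative assumptions.

```python
def select_wheel_ground_pair(vehicle_box, ground_points, max_dx=80):
    """Keep the wheel-ground intersection points inside the vehicle box, then pick the pair
    whose horizontal separation is smallest and below max_dx pixels (threshold is illustrative).
    vehicle_box = (x1, y1, x2, y2); ground_points = [(u, v), ...] in image coordinates."""
    x1, y1, x2, y2 = vehicle_box
    inside = [p for p in ground_points if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
    best_pair, best_dx = None, max_dx
    for i in range(len(inside)):
        for j in range(i + 1, len(inside)):
            dx = abs(inside[i][0] - inside[j][0])
            if dx <= best_dx:
                best_pair, best_dx = (inside[i], inside[j]), dx
    return best_pair  # None if no suitable pair is found
```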
3. The vehicle orientation detection method based on the multitasking convolutional neural network according to claim 2, wherein the step of mapping the two wheel-ground intersection points, combined with the intrinsic and extrinsic parameters of the camera, to the world coordinate system to obtain the vehicle orientation angle comprises the following steps:
establishing an image coordinate system, a camera coordinate system, a vehicle coordinate system, and a world coordinate system, respectively;
calibrating the intrinsic and extrinsic parameters of the camera using the Zhang Zhengyou checkerboard calibration method, and establishing a mapping from the image coordinate system to the world coordinate system;
and mapping the image coordinates of the two wheel-ground intersection points to the world coordinate system, obtaining the angle between the line through the two intersection points and the world coordinate axis along the lane direction, and thereby calculating the vehicle orientation angle.
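Claim 3 back-projects image points to the world coordinate system through the calibrated intrinsic and extrinsic parameters and reads the orientation angle off the line through the two intersection points. Assuming the ground is the world plane Z = 0, the projection of that plane reduces to a homography H = K·[r1 r2 t]; the numpy sketch below inverts it to map image points to the ground and computes the angle. The function names, and the assumption that the world X axis is aligned with the lane direction, are illustrative (the calibration itself could be obtained, for example, with OpenCV's calibrateCamera).

```python
import numpy as np


def ground_homography(K, R, t):
    """Homography from the ground plane Z = 0 to the image: H = K [r1 r2 t].
    K: (3, 3) intrinsic matrix; R: (3, 3) rotation; t: (3,) translation.
    Returns the inverse, i.e. the image-to-ground mapping."""
    H_world_to_img = K @ np.column_stack((R[:, 0], R[:, 1], t))
    return np.linalg.inv(H_world_to_img)


def image_to_ground(H_img_to_world, pt):
    """Map an image point (u, v) to ground-plane world coordinates (X, Y)."""
    u, v = pt
    w = H_img_to_world @ np.array([u, v, 1.0])
    return w[:2] / w[2]


def heading_angle_deg(H_img_to_world, pt_a, pt_b):
    """Angle (degrees) between the line through two wheel-ground points and the world X axis,
    assumed here to run along the lane direction."""
    A = image_to_ground(H_img_to_world, pt_a)
    B = image_to_ground(H_img_to_world, pt_b)
    dx, dy = B - A
    return np.degrees(np.arctan2(dy, dx))
```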
4. The vehicle orientation detection method based on the multitasking convolutional neural network according to claim 1, wherein the step of acquiring training pictures and inputting the training pictures into the original multi-stage multi-task convolutional neural network model to obtain the total loss function of the original multi-stage multi-task convolutional neural network model comprises the following steps:
annotating the training pictures, labeling the vehicle detection box, the vehicle type, the wheel detection box, and the wheel-ground intersection points;
inputting the labeled vehicle detection boxes and vehicle types into the original multi-stage multi-task convolutional neural network model for training, and adjusting the parameters to obtain a converged multi-stage multi-task convolutional neural network model;
and inputting the labeled vehicle detection boxes, vehicle types, wheel detection boxes, and wheel-ground intersection points into the converged multi-stage multi-task convolutional neural network model for training to obtain the total loss function of the converged multi-stage multi-task convolutional neural network model.
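Claim 4 trains in two passes: first on the vehicle box and vehicle class labels only, then on all four tasks once the model has converged. The PyTorch-style sketch below shows how such staged training could look; the head names, the dict-returning model, the loss-function mapping, and the data loader are assumptions used only for illustration.

```python
import torch


def train_stage(model, loader, optimizer, loss_fns, heads, epochs):
    """Train only the prediction heads listed in `heads`.
    Stage 1: heads = ("vehicle_box", "vehicle_cls"), run until convergence.
    Stage 2: heads = ("vehicle_box", "vehicle_cls", "wheel_box", "wheel_ground_point").
    `model(images)` is assumed to return a dict mapping head name -> prediction tensor,
    and `loss_fns` maps head name -> loss function; these interfaces are illustrative."""
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            preds = model(images)
            loss = sum(loss_fns[h](preds[h], targets[h]) for h in heads)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```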
5. The vehicle orientation detection method based on the multitasking convolutional neural network according to claim 4, wherein the step of obtaining the total loss function of the converged multi-stage multi-task convolutional neural network model comprises the following steps:
the loss function of the vehicle class of the converged multi-stage multi-task convolutional neural network model is obtained as:

Loss_cls = −∑_{i=1}^{K} y_i · log(p_i)    formula (1);

wherein K is the number of classes, y_i is the label for class i, and p_i is the predicted probability that the class is i;
the loss function of the vehicle detection box of the converged multi-stage multi-task convolutional neural network model is obtained as:

Loss_vehicle = ‖ ŷ_vehicle − y_vehicle ‖²    formula (2);

the loss function of the wheel detection box of the converged multi-stage multi-task convolutional neural network model is obtained as:

Loss_wheel = ‖ ŷ_wheel − y_wheel ‖²    formula (3);

wherein ŷ denotes the box coordinates regressed by the neural network and y denotes the ground-truth box coordinates;
the loss function of the wheel-ground intersection point of the converged multi-stage multi-task convolutional neural network model is obtained as:

Loss_point = ‖ ŷ_point − y_point ‖²    formula (4);

wherein ŷ_point is the wheel-ground intersection point output by the neural network and y_point is the true intersection point coordinates;
setting appropriate weights for the above loss functions and inputting the training pictures into the converged multi-stage multi-task convolutional neural network model for training, the total loss function is obtained as:

Loss = W1·Loss_cls + W2·Loss_vehicle + W3·Loss_wheel + W4·Loss_point    formula (5);

wherein W1, W2, W3, and W4 are the weights of the respective loss functions.
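For concreteness, the numpy sketch below evaluates the four task losses and their weighted sum in cross-entropy and squared-error form, consistent with formulas (1) to (5); the weight values and sample numbers are illustrative assumptions, not values from the patent.

```python
import numpy as np


def cross_entropy_loss(p, y):
    """Formula (1): multi-class cross entropy; p = predicted class probabilities,
    y = one-hot label vector over K classes."""
    return -np.sum(y * np.log(p + 1e-12))


def l2_loss(pred, true):
    """Formulas (2)-(4): squared Euclidean distance between predicted and ground-truth coordinates."""
    return np.sum((np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)) ** 2)


def total_loss(l_cls, l_vbox, l_wbox, l_pt, w=(1.0, 0.5, 0.5, 1.0)):
    """Formula (5): weighted sum of the four task losses; the weights here are illustrative."""
    return w[0] * l_cls + w[1] * l_vbox + w[2] * l_wbox + w[3] * l_pt


# Example with made-up values for a single training sample.
loss = total_loss(
    cross_entropy_loss(np.array([0.7, 0.2, 0.1]), np.array([1, 0, 0])),  # vehicle class
    l2_loss([0.10, 0.10, 0.90, 0.90], [0.12, 0.08, 0.88, 0.91]),         # vehicle box
    l2_loss([0.20, 0.60, 0.40, 0.90], [0.21, 0.62, 0.41, 0.88]),         # wheel box
    l2_loss([0.30, 0.90], [0.31, 0.92]),                                 # wheel-ground point
)
```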
6. The vehicle orientation detection method based on the multitasking convolutional neural network according to claim 1, wherein the step of obtaining the real-time vehicle orientation angle of the preceding vehicle from the real-time wheel-ground intersection point positions specifically comprises the following steps:
associating each real-time wheel with its ground intersection point in the image according to the real-time wheel-ground intersection point positions of the preceding vehicle;
and selecting two real-time wheel-ground intersection points that belong to the preceding vehicle and whose horizontal positions differ only slightly, and mapping these two intersection points, combined with the intrinsic and extrinsic parameters of the camera, to the world coordinate system to obtain the real-time vehicle orientation angle of the preceding vehicle.
CN202011411157.5A 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network Active CN112507862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011411157.5A CN112507862B (en) 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011411157.5A CN112507862B (en) 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network

Publications (2)

Publication Number Publication Date
CN112507862A CN112507862A (en) 2021-03-16
CN112507862B true CN112507862B (en) 2023-05-26

Family

ID=74971868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011411157.5A Active CN112507862B (en) 2020-12-04 2020-12-04 Vehicle orientation detection method and system based on multitasking convolutional neural network

Country Status (1)

Country Link
CN (1) CN112507862B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052123A (en) * 2021-04-09 2021-06-29 地平线(上海)人工智能技术有限公司 Method and device for determining orientation of object, electronic equipment and storage medium
CN113313098B (en) * 2021-07-30 2022-01-04 阿里云计算有限公司 Video processing method, device, system and storage medium
CN113954867B (en) * 2021-09-29 2023-10-20 广州文远知行科技有限公司 Method, device, equipment and storage medium for rapidly calculating time from object to collision
CN114882727B (en) * 2022-03-15 2023-09-05 深圳市德驰微视技术有限公司 Parking space detection method based on domain controller, electronic equipment and storage medium
CN114863388A (en) * 2022-04-02 2022-08-05 合众新能源汽车有限公司 Method, device, system, equipment, medium and product for determining obstacle orientation
CN115171072B (en) * 2022-06-18 2023-04-21 感知信息科技(浙江)有限责任公司 Vehicle 3D detection method based on FPGA vehicle detection tracking algorithm
CN117765285A (en) * 2024-02-22 2024-03-26 杭州汇萃智能科技有限公司 Contour matching method, system and medium with anti-noise function

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
CN110033002A (en) * 2019-04-19 2019-07-19 福州大学 Detection method of license plate based on multitask concatenated convolutional neural network
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
CN111491093B (en) * 2019-01-26 2021-12-31 魔门塔(苏州)科技有限公司 Method and device for adjusting field angle of camera
CN110738181B (en) * 2019-10-21 2022-08-05 东软睿驰汽车技术(沈阳)有限公司 Method and device for determining vehicle orientation information
CN112016532B (en) * 2020-10-22 2021-02-05 腾讯科技(深圳)有限公司 Vehicle detection method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110033002A (en) * 2019-04-19 2019-07-19 福州大学 Detection method of license plate based on multitask concatenated convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
License plate recognition using a multi-task cascaded convolutional neural network; Hu Congkun; Huang Dongjun; Enterprise Technology Development (Issue 02); full text *

Also Published As

Publication number Publication date
CN112507862A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN112507862B (en) Vehicle orientation detection method and system based on multitasking convolutional neural network
WO2022083402A1 (en) Obstacle detection method and apparatus, computer device, and storage medium
CN108638999B (en) Anti-collision early warning system and method based on 360-degree look-around input
CN115082924B (en) Three-dimensional target detection method based on monocular vision and radar pseudo-image fusion
CN110378919B (en) Narrow-road passing obstacle detection method based on SLAM
CN109741241B (en) Fisheye image processing method, device, equipment and storage medium
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN110197106A (en) Object designation system and method
CN114419098A (en) Moving target trajectory prediction method and device based on visual transformation
US20190213427A1 (en) Detection and Validation of Objects from Sequential Images of a Camera
CN111967396A (en) Processing method, device and equipment for obstacle detection and storage medium
CN115147328A (en) Three-dimensional target detection method and device
CN115526990A (en) Target visualization method and device for digital twins and electronic equipment
Liu et al. Vehicle-related distance estimation using customized YOLOv7
CN114919584A (en) Motor vehicle fixed point target distance measuring method and device and computer readable storage medium
CN115187941A (en) Target detection positioning method, system, equipment and storage medium
CN114255443A (en) Monocular positioning method, device, equipment and storage medium for traffic vehicle
CN113256709A (en) Target detection method, target detection device, computer equipment and storage medium
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN116343165A (en) 3D target detection system, method, terminal equipment and storage medium
CN114897987B (en) Method, device, equipment and medium for determining vehicle ground projection
Xiong et al. Fast and robust approaches for lane detection using multi‐camera fusion in complex scenes
CN113902047B (en) Image element matching method, device, equipment and storage medium
CN115497061A (en) Method and device for identifying road travelable area based on binocular vision
CN113834463A (en) Intelligent vehicle side pedestrian/vehicle monocular depth distance measuring method based on absolute size

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant