CN112215071A - Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow

Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow

Info

Publication number
CN112215071A
CN112215071A
Authority
CN
China
Prior art keywords
model
deep learning
vehicle
training
establishing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010948875.XA
Other languages
Chinese (zh)
Inventor
万千 (Wan Qian)
谢振友 (Xie Zhenyou)
彭国庆 (Peng Guoqing)
林初染 (Lin Churan)
龙朝党 (Long Chaodang)
陆盛康 (Lu Shengkang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hualan Design Group Co ltd
Guilin University of Electronic Technology
Original Assignee
Hualan Design Group Co ltd
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hualan Design Group Co ltd, Guilin University of Electronic Technology filed Critical Hualan Design Group Co ltd
Priority to CN202010948875.XA priority Critical patent/CN112215071A/en
Publication of CN112215071A publication Critical patent/CN112215071A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow. The method establishes a vehicle-mounted embedded application resource scheduler: model pruning and restoration processing are performed on a driving scene recognition model, and a resource allocation framework supporting dynamic deep learning model applications is established; a deep learning model run-time resource allocation scheduler is then established, which flexibly allocates resources to the concurrently running deep learning models and outputs an optimized scheduling scheme. The invention reduces the memory footprint and model-switching energy consumption of deep learning models on mobile vision devices, provides a flexible trade-off between resource allocation and accuracy, reduces processing delay, and improves the multi-target recognition efficiency of autonomous vehicles, making vehicle-road cooperation more timely and accurate during automatic driving and thereby improving the safety of autonomous vehicles and their application prospects in the traffic field.

Description

Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow
Technical Field
The invention relates to the fields of computer vision and intelligent transportation, and in particular to a vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow.
Background
With the development of deep learning technology, artificial intelligence applications of all kinds are attracting strong attention. Automatic driving is an important field for realizing intelligent transportation and building a strong transportation nation, and it draws ever more attention. Safety has always been the key consideration of automatic driving technology, and automatic driving under vehicle-road cooperation is essential to improving that safety. Visual perception is one of the key technologies in automatic driving: cameras and sensors acquire data from the environment outside the vehicle and transmit them to a processor; the recognition algorithm in the processor identifies driving scene targets such as people, vehicles, roads, traffic signs and marking lines; and on the basis of this perception and understanding of the environment, the autonomous vehicle can drive safely on the road. The application of automatic driving can comprehensively improve driving safety and comfort, and the market demand is great.
Although deep learning algorithms achieve good computation rates on PCs and servers, most current embedded mobile devices have far less computing power, so many well-performing deep learning network models cannot be deployed on them. However, an autonomous vehicle requires real-time visual recognition, often without cloud support, while the computing resources of the embedded mobile device are limited. Compressing a deep learning model reduces its resource requirements but also reduces its accuracy, and the resource budget of a compressed model is fixed, i.e. static. Consequently, on the one hand, the resources required by the vehicle-mounted scene recognition system change dynamically at run time; when concurrent applications reach the maximum available resources, they compete with one another and the frame rate of streaming video processing drops. On the other hand, if the mobile vision system has spare resources, a compressed model cannot use them to recover its reduced accuracy.
In a mixed traffic scene, many targets must be identified. The cameras and sensors of an autonomous vehicle are distributed over the front, rear, left and right of the vehicle and on the roof, and the streaming video transmitted by every camera must be processed simultaneously and in real time by the mobile vision system. Under the limited computing resources of the vehicle-mounted embedded device, the resource requirements of well-performing deep learning network models cannot be met, so the accuracy of scene recognition by the autonomous vehicle drops sharply, processing delay increases, the accuracy on targets suffers, and real-time performance cannot be achieved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow. Aimed at embedded mobile devices that support deep learning models, such as autonomous vehicles, it integrates and optimizes existing algorithm frameworks and improves the resource allocation efficiency of deep learning models on the embedded mobile terminal, realizing real-time visual processing for the autonomous vehicle through cameras and similar devices in mixed traffic scenes, improving the accuracy on targets such as people, vehicles and roads, and reducing processing delay, which is of great significance for improving driving safety and the efficiency of the traffic system.
The purpose of the invention is realized by the following technical solution: the identification and tracking method comprises establishing a vehicle-mounted embedded application resource scheduler through the following steps:
performing model pruning and restoration processing on the driving scene recognition model, and establishing a resource allocation framework supporting dynamic deep learning model applications;
and establishing a deep learning model operation resource allocation scheduler, flexibly allocating resources for the deep learning models operated concurrently through the resource scheduler, and outputting an optimized scheduling scheme.
Further, the identification and tracking method also comprises the step of establishing and training a driving scene identification model before establishing the vehicle-mounted embedded application program resource scheduler; the step of establishing and training a driving scene recognition model comprises the following steps: and establishing a driving scene sample data set and a recognition tracking model for training multiple targets in a driving scene.
Further, the performing model pruning and restoration processing on the driving scene recognition model and establishing a resource allocation framework supporting deep learning model application dynamics includes:
pruning neurons of the deep learning model, and compressing the deep learning model;
restoring the deep learning model to generate a multi-capacity model;
and analyzing the given vehicle-mounted traffic scene recognition system model, selecting the optimal resource-accuracy trade-off according to the requirement of each traffic scene recognition model, and optimizing the inference accuracy, memory footprint and processing delay of each derived model.
Further, the pruning neurons of the deep learning model, and compressing the deep learning model includes:
setting the weight matrix of each convolution layer as |M_ij|, where M_ij denotes a specific weight value in each convolution kernel, and summing the absolute values of the weight values of each convolution kernel, i.e.

S_i = Σ_{j=1}^{m_i} |M_ij|,

where m_i is the number of channels of the convolution kernel;
ranking the convolution kernels by importance S_i, setting a minimum threshold, pruning the kernels whose S_i falls below the threshold, and deleting the corresponding feature maps of the next layer;
and creating a new matrix and generating new weights for the layer following the pruned convolutional layer, integrating the model structure to complete pruning, retraining the pruned model to complete model compression, and taking the minimum pruned model as the seed model.
Further, the restoring the deep learning model and the generating the multi-capacity model include:
starting from the seed model, iterating while freezing (solidifying) the filter parameters and saving the existing model parameter data;
according to the neuron pruning path recorded during model compression, applying filter growing in the reverse pruning direction to add the pruned filters back, increasing the capacity of the model, and then raising the accuracy of the model through training;
and repeatedly iterating on the previous model to generate new derived models, until a derived model containing the parameter capacities of all previous models is produced, which yields the multi-capacity model.
Further, establishing the deep learning model run-time resource allocation scheduler, flexibly allocating resources to the concurrently running deep learning models through the resource scheduler, and outputting the optimized scheduling scheme comprises:
setting, in the vehicle-mounted driving scene recognition system, the cost function

C(m_v, u_v) = (A_min(v) - A(m_v)) + alpha * max(0, L(m_v) / u_v - L_max(v)),

where m_v is the derived model selected for application v, u_v ∈ [0, 1] is the proportion of computing resources allocated to v, A(m_v) and L(m_v) are the inference accuracy of m_v and its processing delay when allocated all computing resources, A_min(v) and L_max(v) are the minimum acceptable accuracy and the maximum acceptable delay, and alpha weighs accuracy against delay;
Analyzing all concurrently running application programs, and selecting an optimal derivative model for each application program;
the method comprises the steps of designing a scheduling scheme for the deep learning model in the parallel driving scene, minimizing the total cost of the application programs of the deep learning model running concurrently, carrying out optimization constraint on a resource perception scheduler and reasonably distributing the running resources of the parallel application programs so as to balance the running performance.
Further, the training of the recognition tracking model of multiple targets in the driving scene comprises:
training the driving scene multi-target recognition model based on the YOLOv3 algorithm using the Darknet deep learning framework, and importing the training and validation sets;
setting the training parameters: weight decay 0.0005, momentum 0.9, batch size 64, and an initial learning rate of 0.001, reduced successively to 0.0001 and 0.00001; the network iterates 25000, 20000 and 15000 times at the respective learning rates, 60000 iterations in total;
adopting the CIoU loss as the loss function; when the descending loss curve levels off, model training ends and the model parameters of the best training result are saved;
importing the trained model into the DeepSORT algorithm model, modifying classes to the number of model recognition classes (16), and setting filters to 3 × (5 + len(classes));
and training the combined YOLOv3 and DeepSORT model, and saving the model on completion.
The invention has the following advantages: model pruning, compression, restoration and analysis are performed on the visual recognition deep learning model based on the NestDNN framework, and a resource scheduler is built, which guarantees the correctness and effectiveness of compressing the visual recognition deep learning model and provides technical support for embedded vehicle-mounted automatic driving equipment. Experimental results show that the vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow reduces the memory footprint and switching energy consumption of deep learning models on mobile vision devices, provides a flexible trade-off between resource allocation and accuracy, reduces processing delay, and improves the multi-target recognition efficiency of autonomous vehicles, making vehicle-road cooperation more timely and accurate during automatic driving and thereby improving the safety of autonomous vehicles and their application prospects in the traffic field.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a graph of model training process loss variation;
FIG. 3 is a schematic view of the model before and after pruning;
FIG. 4 is a schematic diagram of pruning compression for a deep learning model network;
FIG. 5 is a graph of model accuracy after applying the framework method;
FIG. 6 is a graph of model processing time after applying the framework method.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the invention relates to a vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow, which relates to the field of automatic driving automobiles with computer vision, and specifically comprises the following steps:
s1, training a driving scene recognition model:
S11, collecting a driving scene data set. For ease of illustrating the implementation, this embodiment uses the German computer vision benchmark data set KITTI, selecting 9000, 1000 and 1000 images as the training set, validation set and test set respectively.
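The 9000/1000/1000 split described above can be sketched as a reproducible shuffle-and-slice; the file names, total image count, and the random seed below are illustrative assumptions, not values given in the patent.

```python
import random

def split_dataset(image_paths, n_train, n_val, n_test, seed=42):
    """Shuffle the image list reproducibly and cut it into
    train / validation / test subsets of the requested sizes."""
    assert n_train + n_val + n_test <= len(image_paths)
    rng = random.Random(seed)
    paths = list(image_paths)
    rng.shuffle(paths)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Illustrative: 11000 hypothetical KITTI frame names, split 9000/1000/1000.
frames = [f"kitti_{i:06d}.png" for i in range(11000)]
train, val, test = split_dataset(frames, 9000, 1000, 1000)
```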
S12, training a multi-target recognition tracking model in a driving scene, specifically comprising:
S121, training the driving scene multi-target recognition model based on the YOLOv3 algorithm using the Darknet deep learning framework, and importing the training and validation sets;
S122, setting the training parameters: weight decay 0.0005, momentum 0.9, batch size 64, and an initial learning rate (learning_rate) of 0.001, reduced successively to 0.0001 and 0.00001. At the respective learning rates the network iterates 25000, 20000 and 15000 times, 60000 iterations in total;
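The step schedule above (0.001 for 25000 iterations, 0.0001 for the next 20000, 0.00001 for the final 15000) can be expressed as a simple function of the iteration counter; this is a sketch of the schedule only, not of the Darknet training loop itself.

```python
def learning_rate(iteration):
    """Step learning-rate schedule from the embodiment:
    0.001 for iterations [0, 25000), 0.0001 for [25000, 45000),
    and 0.00001 for [45000, 60000)."""
    if iteration < 25000:
        return 0.001
    if iteration < 45000:   # 25000 + 20000
        return 0.0001
    return 0.00001          # final 15000 iterations, up to 60000 total
```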
And S123, adopting the CIoU loss as the loss function, formulated as

L_CIoU = 1 - IoU + rho^2(b, b^gt) / c^2 + alpha * v,

where b and b^gt denote the center points of the prediction box B and the ground-truth box B^gt, rho(·) is the Euclidean distance between them, and c is the diagonal length of the smallest box enclosing both boxes;

alpha = v / ((1 - IoU) + v)

is a weight function; and

v = (4 / pi^2) * (arctan(w^gt / h^gt) - arctan(w / h))^2

is a parameter measuring aspect-ratio similarity. In particular, with width and height normalized to [0, 1], the gradient term 1 / (w^2 + h^2) is taken as the value 1.
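A minimal Python sketch of the CIoU loss used in this step, written for a single pair of axis-aligned boxes; the (cx, cy, w, h) box format and the function name are illustrative assumptions rather than part of the patent.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss for boxes given as (cx, cy, w, h):
    1 - IoU + rho^2 / c^2 + alpha * v."""
    px, py, pw, ph = box_p
    gx, gy, gw, gh = box_g
    # Corner coordinates.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # Intersection over union.
    iw = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    ih = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union
    # Squared centre distance over squared enclosing-box diagonal.
    cw = max(p_x2, g_x2) - min(p_x1, g_x1)
    ch = max(p_y2, g_y2) - min(p_y1, g_y1)
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term and its weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v

loss_same = ciou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))  # identical boxes -> 0.0
```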
As shown in fig. 2, YOLOv3 model training ends when the loss function descent gradient tends to level off.
S124, after the training is finished, storing the model parameter frame of the optimal training result to obtain a trained YOLOv3 model;
S125, copying the trained model weights (.h5) into the model_data folder of the deep_sort_yolov3 project; generating the mars-small128.pb file and placing it into the model_data folder of the deep_sort_yolov3 model; modifying classes to the number of model recognition classes (16), and setting filters to 3 × (5 + len(classes));
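The filters setting above follows the usual YOLOv3 output-layer rule; a small sketch makes the arithmetic explicit. The placeholder class names are assumptions, since the patent does not enumerate its 16 classes.

```python
def yolo_output_filters(classes, anchors_per_scale=3):
    """YOLOv3 output-layer filter count: each of the 3 anchors per scale
    predicts 4 box offsets + 1 objectness score + one score per class."""
    return anchors_per_scale * (5 + len(classes))

# Placeholder names only fix len(classes) == 16, as in the embodiment.
classes = [f"class_{i}" for i in range(16)]
filters = yolo_output_filters(classes)  # 3 * (5 + 16) = 63
```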
s126, training the model of YOLOv3 combined with Deepsort, and storing the model after finishing training.
S2, establishing a vehicle-mounted embedded application program resource scheduler:
s21, constructing a NestDNN resource scheduling framework;
s211, performing model compression on the driving scene recognition model;
As shown in figs. 3 and 4, the convolutional layers are pruned layer by layer, removing the neurons of lesser importance from the trained deep learning model, and the network is retrained; removing the unimportant parameters reduces the parameter count and completes model compression.
Specifically, the weight matrix of each convolution layer is written |M_ij|, where M_ij denotes a specific weight value in each convolution kernel, and the absolute values of the weight values of each kernel are summed, i.e.

S_i = Σ_{j=1}^{m_i} |M_ij|,

where m_i is the number of channels of the convolution kernel. The kernels are ranked by importance S_i, a minimum threshold is set, the kernels whose S_i falls below it are pruned, and the corresponding feature maps of the next layer are deleted. A new matrix and new weights are created for the pruned convolutional layer and the layer following it, and the model structure is integrated to complete pruning. The pruned model is retrained, and pruning and retraining are repeated over the whole deep learning network to obtain the seed model and complete model compression;
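The per-kernel importance score S_i and threshold pruning can be sketched without any deep learning framework; the nested-list weight layout and the tiny example layer below are illustrative assumptions, a framework-free stand-in for real convolution tensors.

```python
def kernel_importance(kernel):
    """S_i: sum of the absolute weight values over all channels of one kernel."""
    return sum(abs(w) for channel in kernel for w in channel)

def prune_layer(kernels, threshold):
    """Keep the kernels whose L1 importance reaches the threshold and return
    (kept_indices, kept_kernels). Callers must also drop the matching
    feature maps / input channels of the next layer, as the text describes."""
    scores = [kernel_importance(k) for k in kernels]
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    return kept, [kernels[i] for i in kept]

# Illustrative layer: 3 kernels, each with 2 channels of 2 weights.
layer = [
    [[0.5, -0.5], [0.5, -0.5]],   # S_0 = 2.0
    [[0.01, 0.0], [0.0, 0.02]],   # S_1 = 0.03 -> pruned
    [[1.0, 1.0], [-1.0, -1.0]],   # S_2 = 4.0
]
kept_idx, seed_layer = prune_layer(layer, threshold=0.1)
```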
s212, performing model restoration on the driving scene recognition model;
Starting from the seed model, iteration is carried out while the filter parameters are frozen (solidified) and the existing model parameter data are saved. According to the neuron pruning path recorded during model compression, filter growing is applied in the reverse pruning direction to add the pruned filters back, and the model accuracy is increased by retraining. The new models generated by nesting on the seed model share model parameters; through repeated iteration, each previous model generates a new derived model until iteration ends and a derived model containing the parameter capacities of all previous models is generated, yielding the multi-capacity model;
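The freeze-and-grow iteration can be sketched as growing a nested sequence of filter-index sets in the reverse of the recorded pruning order, so that every derived model contains (and shares the parameters of) all smaller ones; the concrete filter indices and pruning record below are illustrative assumptions.

```python
def multi_capacity_models(all_filters, pruning_order, steps):
    """Start from the seed (everything pruned away except the survivors),
    then add filters back in reverse pruning order, `steps` filters per
    derived model, so each model is a superset of the previous one."""
    seed = [f for f in all_filters if f not in pruning_order]
    models = [set(seed)]
    regrow = list(reversed(pruning_order))  # grow in the reverse pruning direction
    for start in range(0, len(regrow), steps):
        grown = set(models[-1]) | set(regrow[start:start + steps])
        models.append(grown)
    return models

# Illustrative: filters 0..7, of which 0, 3, 5, 6 were pruned (in that order).
models = multi_capacity_models(list(range(8)), [0, 3, 5, 6], steps=2)
# models[0] is the seed; models[-1] has full capacity and nests all others.
```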
And S213, performing model analysis on the driving scene recognition model: the video captured by each vehicle-mounted camera is fed to a deep learning model for traffic scene recognition, the optimal resource-accuracy trade-off is selected according to the requirement of each traffic scene recognition model, and the inference accuracy, memory footprint and processing delay of each derived model are optimized.
S22, establishing a driving scene recognition resource perception scheduler;
s221, establishing a deep learning model operation resource allocation scheduler, and setting a cost function C for the vehicle-mounted driving scene recognition application program, wherein the formula is as follows:
C(m_v, u_v) = (A_min(v) - A(m_v)) + alpha * max(0, L(m_v) / u_v - L_max(v)),

where v is an application of the vehicle-mounted scene recognition system, V is the set of applications running concurrently in the vehicle-mounted recognition system, and v ∈ V; m_v is the derived model selected for v; u_v ∈ [0, 1] is the proportion of computing resources allocated to v; A(m_v) is the inference accuracy of m_v; L(m_v) is the processing delay of v when allocated all computing resources; A_min(v) is the minimum processing accuracy; L_max(v) is the maximum processing delay target; and alpha weighs accuracy against delay.
The inference accuracy and processing delay of each scene recognition application are thus converted into a cost, providing the basis for the resource scheduling scheme.
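The cost function of S221 can be sketched directly: accuracy shortfall plus a penalty whenever the latency, scaled up by the allocated resource fraction, exceeds the delay target. The value of alpha and all numbers in the example are assumptions, since the patent does not fix them.

```python
def cost(a_model, l_model, u, a_min, l_max, alpha=1.0):
    """NestDNN-style cost of running derived model m_v for application v:
    (A_min - A(m_v)) + alpha * max(0, L(m_v)/u_v - L_max).
    alpha trades accuracy against delay and is an assumed knob."""
    accuracy_term = a_min - a_model
    latency_term = max(0.0, l_model / u - l_max)
    return accuracy_term + alpha * latency_term

# A model 2 points above its accuracy floor that meets the delay target
# with half of the resources gets a negative (i.e. good) cost.
c = cost(a_model=0.92, l_model=40.0, u=0.5, a_min=0.90, l_max=100.0)  # -0.02
```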
S222, establishing the deep learning model run-time resource scheduling scheme: the minimum inference accuracy and the maximum processing delay are set according to the different recognition scenes and targets, the scheduling scheme of the parallel scene recognition applications is designed for those targets, the total cost of the concurrently running deep learning applications is minimized, and the resource-aware scheduler distributes the run-time resources of the parallel applications under optimization constraints so as to balance running performance. The optimization problem is formulated as

min_{m_v, u_v} Σ_{v ∈ V} C(m_v, u_v)

s.t. Σ_{v ∈ V} u_v ≤ 1.
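The minimize-total-cost scheduling of S222 can be sketched as an exhaustive search over one derived model and one resource fraction per application, subject to the fractions summing to at most 1. This brute-force search (exponential in the number of applications), the alpha = 1 choice, and the two-camera example data are all illustrative assumptions, not the patent's actual optimizer.

```python
from itertools import product

def schedule(apps, u_grid):
    """Pick one derived model and one resource fraction per application so
    that the fractions sum to <= 1 and the total cost is minimal.
    `apps` maps name -> (variants, a_min, l_max), where
    variants = [(accuracy, latency_at_full_resources), ...]."""
    names = list(apps)
    best, best_choice = float("inf"), None
    variant_axes = [range(len(apps[n][0])) for n in names]
    u_axes = [u_grid for _ in names]
    for variants in product(*variant_axes):
        for us in product(*u_axes):
            if sum(us) > 1.0 + 1e-9:      # resource constraint: sum u_v <= 1
                continue
            total = 0.0
            for name, vi, u in zip(names, variants, us):
                models, a_min, l_max = apps[name]
                a, l = models[vi]
                # Cost with alpha = 1 (assumed).
                total += (a_min - a) + max(0.0, l / u - l_max)
            if total < best:
                best, best_choice = total, dict(zip(names, zip(variants, us)))
    return best, best_choice

# Two cameras, each with a small and a large derived model (accuracy, latency).
apps = {
    "front": ([(0.85, 30.0), (0.92, 60.0)], 0.80, 100.0),
    "rear":  ([(0.85, 30.0), (0.92, 60.0)], 0.80, 100.0),
}
best_cost, choice = schedule(apps, u_grid=[0.25, 0.5, 0.75])
```

With these numbers, the large model can only meet its delay target with 75% of the resources, which the two applications cannot both have, so the scheduler gives each camera the small model at half the resources.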
As shown in figs. 5 and 6, the effectiveness of the method is demonstrated by vehicle-mounted scene recognition experiments on the NestDNN-framework-based deep learning models ResNet_50, Inception_v3, VGG_16, MobileNet and NMT. Specifically, when the NestDNN framework is used to establish a resource-aware run-time scheduler for the deep learning models, the model performance changes markedly compared with deep models that do not consider optimized resource allocation: as shown in the table below, model memory footprint and model switching overhead are reduced by 41.7%, accuracy is improved by 1.39%, and processing time is improved by 155%.
TABLE 1 Comparison of model memory footprint and model switching overhead

Model        | Multi-capacity model size (MB) | Original model size (MB) | Memory reduction (MB)
ResNet_50    | 22.3  | 33.8   | 11.5
Inception_v3 | 81.4  | 234.5  | 153.1
VGG_16       | 127.0 | 261.2  | 134.2
MobileNet    | 6.7   | 9.7    | 3.0
NMT          | 293.0 | 532.7  | 139.7
Total        | 530.4 | 1071.9 | 441.5
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow, characterized in that the identification and tracking method comprises establishing a vehicle-mounted embedded application resource scheduler through the following steps:
performing model pruning and restoration processing on the driving scene recognition model, and establishing a resource allocation framework supporting dynamic deep learning model applications;
and establishing a deep learning model operation resource allocation scheduler, flexibly allocating resources for the deep learning models operated concurrently through the resource scheduler, and outputting an optimized scheduling scheme.
2. The vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow according to claim 1, characterized in that the identification and tracking method further comprises establishing and training a driving scene recognition model before establishing the vehicle-mounted embedded application resource scheduler, the step comprising: establishing a driving scene sample data set and training a recognition and tracking model for multiple targets in the driving scene.
3. The vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow according to claim 1, characterized in that performing model pruning and restoration processing on the driving scene recognition model and establishing a resource allocation framework supporting dynamic deep learning model applications comprises:
pruning neurons of the deep learning model, and compressing the deep learning model;
restoring the deep learning model to generate a multi-capacity model;
and analyzing the given vehicle-mounted traffic scene recognition system model, selecting the optimal resource-accuracy trade-off according to the requirement of each traffic scene recognition model, and optimizing the inference accuracy, memory footprint and processing delay of each derived model.
4. The vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow according to claim 3, characterized in that pruning the neurons of the deep learning model and compressing the deep learning model comprises:
setting the weight matrix of each convolution layer as |M_ij|, where M_ij denotes a specific weight value in each convolution kernel, and summing the absolute values of the weight values of each convolution kernel, i.e.
S_i = Σ_{j=1}^{m_i} |M_ij|,
where m_i is the number of channels of the convolution kernel;
ranking the convolution kernels by importance S_i, setting a minimum threshold, pruning the kernels whose S_i falls below the threshold, and deleting the corresponding feature maps of the next layer;
and creating a new matrix and generating new weights for the layer following the pruned convolutional layer, integrating the model structure to complete pruning, retraining the pruned model to complete model compression, and taking the minimum pruned model as the seed model.
5. The vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow according to claim 3, characterized in that restoring the deep learning model and generating the multi-capacity model comprise:
starting from the seed model, iterating while freezing (solidifying) the filter parameters and saving the existing model parameter data;
according to the neuron pruning path recorded during model compression, applying filter growing in the reverse pruning direction to add the pruned filters back, increasing the capacity of the model, and then raising the accuracy of the model through training;
and repeatedly iterating on the previous model to generate new derived models, until a derived model containing the parameter capacities of all previous models is produced, which yields the multi-capacity model.
6. The vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow according to claim 1, characterized in that establishing the deep learning model run-time resource allocation scheduler, flexibly allocating resources to the concurrently running deep learning models through the resource scheduler, and outputting the optimized scheduling scheme comprises:
setting, in the vehicle-mounted driving scene recognition system, the cost function
C(m_v, u_v) = (A_min(v) - A(m_v)) + alpha * max(0, L(m_v) / u_v - L_max(v));
Analyzing all concurrently running application programs, and selecting an optimal derivative model for each application program;
the method comprises the steps of designing a scheduling scheme for the deep learning model in the parallel driving scene, minimizing the total cost of the application programs of the deep learning model running concurrently, carrying out optimization constraint on a resource perception scheduler and reasonably distributing the running resources of the parallel application programs so as to balance the running performance.
7. The vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow according to claim 2, wherein training the identification and tracking model for multiple targets in the driving scene comprises the following steps:
training a driving-scene multi-target recognition model based on the YOLOv3 algorithm with the Darknet deep learning framework, and importing the training set and the validation set;
setting the training parameters: weight decay 0.0005, momentum 0.9, and batch size 64; the learning rate is initially 0.001 and is reduced in turn to 0.0001 and 0.00001, the network iterating 25000, 20000, and 15000 times at the respective learning rates, for 60000 iterations in total;
adopting CIoU loss as the loss function; when the descending gradient of the loss function levels off, model training is complete, and the model parameters of the best training result are saved;
importing the trained model into the DeepSort algorithm model, modifying classes to the number of model recognition classes, 16, and setting filters to 3 × (5 + len(classes));
training the combined YOLOv3 and DeepSort model, and saving it upon completion.
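The hyper-parameters listed in claim 7 can be summarized in a short sketch: the YOLOv3 output-layer filter count 3 × (5 + classes) and the three-stage learning-rate schedule. The helper names `yolo_filters` and `lr_at` are illustrative, not from the patent or Darknet.

```python
# Sketch of the training configuration from claim 7 (illustrative helpers).

def yolo_filters(num_classes, num_anchors=3):
    # Each of the 3 anchors per YOLO head predicts 4 box coordinates,
    # 1 objectness score, and one score per class: 3 * (5 + classes).
    return num_anchors * (5 + num_classes)

def lr_at(iteration):
    """Piecewise-constant learning-rate schedule from the claim:
    0.001 for the first 25000 iterations, then 0.0001 for the next 20000,
    then 0.00001 for the final 15000 (60000 iterations in total)."""
    if iteration < 25000:
        return 0.001
    if iteration < 45000:
        return 0.0001
    return 0.00001

# Remaining hyper-parameters stated in the claim.
TRAIN_CFG = {"decay": 0.0005, "momentum": 0.9, "batch": 64,
             "max_batches": 60000, "loss": "CIoU"}
```

With the 16 recognition classes of the claim, the output layers therefore use 3 × (5 + 16) = 63 filters.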
CN202010948875.XA 2020-09-10 2020-09-10 Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow Pending CN112215071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010948875.XA CN112215071A (en) 2020-09-10 2020-09-10 Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010948875.XA CN112215071A (en) 2020-09-10 2020-09-10 Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow

Publications (1)

Publication Number Publication Date
CN112215071A true CN112215071A (en) 2021-01-12

Family

ID=74049951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010948875.XA Pending CN112215071A (en) 2020-09-10 2020-09-10 Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow

Country Status (1)

Country Link
CN (1) CN112215071A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784748A (en) * 2021-01-22 2021-05-11 大连海事大学 Microalgae identification method based on improved YOLOv3
CN113470070A (en) * 2021-06-24 2021-10-01 国汽(北京)智能网联汽车研究院有限公司 Driving scene target tracking method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034235A1 (en) * 2017-12-28 2019-01-31 Shao-Wen Yang Privacy-preserving distributed visual data processing
CN109523017A (en) * 2018-11-27 2019-03-26 广州市百果园信息技术有限公司 Compression method, device, equipment and the storage medium of deep neural network
CN109934844A (en) * 2019-01-28 2019-06-25 中国人民解放军战略支援部队信息工程大学 A kind of multi-object tracking method and system merging geospatial information
WO2019217129A1 (en) * 2018-05-10 2019-11-14 Microsoft Technology Licensing, Llc Efficient data encoding for deep neural network training
CN111191772A (en) * 2020-01-02 2020-05-22 中国航空工业集团公司西安航空计算技术研究所 Intelligent computing general acceleration system facing embedded environment and construction method thereof
CN111476756A (en) * 2020-03-09 2020-07-31 重庆大学 Method for identifying casting DR image loose defects based on improved YO L Ov3 network model
CN111554105A (en) * 2020-05-29 2020-08-18 浙江科技学院 Intelligent traffic identification and statistics method for complex traffic intersection


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
AIHGF: "GitHub project: real-time multi-person tracking based on YOLOv3 and DeepSort", 《HTTPS://CLOUD.TENCENT.COM/DEVELOPER/ARTICLE/1396326》 *
BIYI FANG et al.: "NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision", 《MOBILE COMPUTING AND NETWORKING》 *
HAO LI et al.: "PRUNING FILTERS FOR EFFICIENT CONVNETS", 《ARXIV》 *
ZHU JINFENG: "Research on an expressway traffic incident detection system based on YOLO-v3", 《WANFANG DATA》 *
WANG LEI et al.: "A survey of deep neural network model compression techniques for embedded applications", 《JOURNAL OF BEIJING JIAOTONG UNIVERSITY》 *
QIN YANYAN et al.: "Stability analysis and fundamental diagram model of heterogeneous traffic flow mixed with cooperative adaptive cruise control vehicles", 《ACTA PHYSICA SINICA》 *
鸡啄米的时光机: "Multi-person target tracking with YOLOv3 + deep_sort", 《HTTPS://BLOG.CSDN.NET/QQ_33221533/ARTICLE/DETAILS/100849251》 *


Similar Documents

Publication Publication Date Title
CN112163465B (en) Fine-grained image classification method, fine-grained image classification system, computer equipment and storage medium
CN110321910B (en) Point cloud-oriented feature extraction method, device and equipment
CN109389244B (en) GRU-based multi-factor perception short-term scenic spot visitor number prediction method
WO2020042658A1 (en) Data processing method, device, apparatus, and system
CN112445823A (en) Searching method of neural network structure, image processing method and device
CN110942637B (en) SCATS system road traffic flow prediction method based on airspace map convolutional neural network
CN110956202B (en) Image training method, system, medium and intelligent device based on distributed learning
CN112215071A (en) Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow
US20230131518A1 (en) Model Generation Method and Apparatus, Object Detection Method and Apparatus, Device, and Storage Medium
CN117157678A (en) Method and system for graph-based panorama segmentation
WO2021174370A1 (en) Method and system for splitting and bit-width assignment of deep learning models for inference on distributed systems
CN113283404B (en) Pedestrian attribute identification method and device, electronic equipment and storage medium
CN112766578A (en) Vehicle use identification method and system based on vehicle network and storage medium
CN109982088B (en) Image processing method and device
CN112598062A (en) Image identification method and device
CN115272894A (en) Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium
CN115018039A (en) Neural network distillation method, target detection method and device
CN117056785A (en) Federal learning classification model training method based on self-adaptive model disturbance
CN113420651B (en) Light weight method, system and target detection method for deep convolutional neural network
CN112819157B (en) Neural network training method and device, intelligent driving control method and device
CN111932690B (en) Pruning method and device based on 3D point cloud neural network model
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
KR102143073B1 (en) Smart cctv apparatus for analysis of parking
EP4032028A1 (en) Efficient inferencing with fast pointwise convolution
CN111931768A (en) Vehicle identification method and system capable of self-adapting to sample distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210112