CN116229410A - Lightweight neural network road scene detection method integrating multidimensional information pooling - Google Patents


Info

Publication number: CN116229410A
Application number: CN202211550617.1A
Authority: CN (China)
Prior art keywords: neural network, task, module, lightweight, feature
Other languages: Chinese (zh)
Inventors: Cen Ming (岑明), Zhou Cong (周聪)
Original and current assignee: Chongqing University of Post and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed 2022-12-05 by Chongqing University of Post and Telecommunications; priority to CN202211550617.1A (priority date 2022-12-05)
Publication of CN116229410A: 2023-06-06
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G06V 20/58 — Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/588 — Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
    • G06V 10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N 3/084 — Computing arrangements based on biological models: neural networks; learning methods: backpropagation, e.g. using gradient descent
    • G06V 2201/07 — Indexing scheme relating to image or video recognition or understanding: target detection
    • G06V 2201/08 — Indexing scheme relating to image or video recognition or understanding: detecting or categorising vehicles
    • Y02T 10/40 — Climate change mitigation technologies related to transportation: road transport; internal combustion engine [ICE] based vehicles; engine management systems

Abstract

The invention relates to a lightweight neural network road scene detection method integrating multidimensional information pooling, and belongs to the technical field of intelligent automobile driving safety. In the method, the lightweight neural network model adopts CSPDarknet as the backbone feature-extraction layer for extracting features from the input image; an FPN+PAN bidirectional feature-pyramid structure serves as the target-detection task head; the feature layer is upsampled by bilinear interpolation to serve as the semantic-segmentation task head. A multidimensional information pooling module is fused into the neural network structure, and the lightweight Ghost Module replaces the ordinary convolution modules, realizing a compact, efficient lightweight neural network with both vehicle-detection and lane-line-segmentation functions. The invention can detect small, distant vehicle targets in real time in actual driving scenes and reduces the missed-detection rate.

Description

Lightweight neural network road scene detection method integrating multidimensional information pooling
Technical Field
The invention belongs to the technical field of intelligent automobile driving safety; it relates to deep learning, computer vision, driver assistance, and image processing, and in particular to a lightweight neural network road scene detection method integrating multidimensional information pooling.
Background
Whether for driver assistance, automated driving, or future all-weather, all-area unmanned driving, each step up in driving automation demands more refined intelligent detection algorithms, and reliable perception of the traffic environment is the essential foundation of every link in the chain. In traffic scenes, and especially in car-following scenes, which account for a large share of driving, vehicles and lane lines are the key targets that define the scene, so their detection is a key technology.
In recent years, driver assistance and unmanned driving have been a focus of research. Real-time, accurate detection and identification of the vehicles around an intelligent automobile and of the driving lane lines is an important component of automated driving and a precondition for safe operation of unmanned vehicles in urban environments. Most existing vehicle and lane-line detection and identification models handle only a single task and cannot perform joint multi-task detection in the same scene, especially a car-following scene, which limits their theoretical and practical value. If a separate object-detection network and semantic-segmentation network run simultaneously on the vehicle-mounted chip, they consume a large share of its computing resources and adversely affect other functions. In addition, current target-detection networks struggle to detect small, distant vehicle targets in actual driving scenes, and their missed-detection rate is high, posing a serious safety hazard for intelligent vehicles travelling at high speed.
In summary, the problems of the prior art are: running multiple neural networks simultaneously on one vehicle-mounted chip consumes excessive computing resources, and current target detection performs poorly on small targets, with a high missed-detection rate.
Disclosure of Invention
Therefore, the invention aims to provide a lightweight neural network road scene detection method integrating multidimensional information pooling that can detect small, distant vehicle targets in actual driving scenes in real time and reduce the missed-detection rate. The method fuses a multidimensional information pooling module into the neural network structure and uses the lightweight Ghost Module to replace the ordinary convolution modules, realizing a compact, efficient lightweight neural network with vehicle-detection and lane-line-segmentation functions.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a lightweight neural network road scene detection method integrating multidimensional information pooling is characterized in that a multidimensional information pooling Module is integrated into a neural network structure, and a lightweight convolution Module Ghost Module is used for optimizing a common convolution Module to achieve compact and efficient vehicle detection and a lightweight neural network with lane line segmentation functions. The working flow of the method is as follows: firstly, extracting picture features of an acquired video image by a feature extraction layer, then respectively transmitting extracted feature data to a target detection layer and a semantic segmentation layer for data decoding, and finally transmitting detected data to a vehicle-mounted display for displaying detection results.
The method specifically comprises the following steps:
S1: construct a lightweight multi-task neural network with the two functions of target detection and semantic segmentation (here, vehicle detection and lane-line segmentation): adopt an architecture in which multiple task heads share the same backbone feature-extraction layer; replace the ordinary convolution modules in the feature-extraction layer with the lightweight Ghost Module to greatly reduce the inference cost; build the target-detection task head by multi-scale feature prediction with a bidirectional feature-pyramid structure; and build the semantic-segmentation task head by upsampling the feature layer with bilinear interpolation.
S2: construct a multidimensional information pooling module (MCPB) and fuse it into the lightweight multi-task neural network to improve the detection of small targets and reduce the missed-detection rate.
S3: design the loss function of the lightweight multi-task neural network.
S4: build a dataset matching the output format of the lightweight multi-task neural network; the dataset combines the public BDD100K dataset with a self-collected road-scene dataset and is used to train the network.
S5: use the trained lightweight multi-task neural network for vehicle detection and lane-line segmentation.
Further, in step S1, the lightweight multi-task neural network is constructed by the following steps:
S11: construct the lightweight neural network on a model architecture in which multiple task heads share the same backbone feature-extraction layer; the task branches share the encoding result of the backbone, i.e., the image features extracted by the backbone are passed to each task branch; the network input is a road-scene image recorded by a camera, and the function of the proposed lightweight multi-task neural network is to detect the position information of the vehicles and lane lines in that image;
S12: construct the feature-extraction layer of the lightweight multi-task neural network with reference to the CSPDarknet structure; it consists of three modules: the input module, the backbone network (Backbone), and the network neck (Neck);
S13: the backbone network consists of two convolution units, the Standard Convolution Block (SCB) and the cross-stage connection residual convolution block (CSPX); the standard convolution block comprises convolution (Conv), batch normalization (BN), and the Mish activation function; the cross-stage connection residual convolution block is a specially designed group of convolutions that improves on the residual convolution structure;
S14: construct the task heads, divided into a target-detection task head and a lane-line-detection task head; the vehicle-detection task head is built on the multi-scale feature-prediction idea using a bidirectional feature-pyramid structure, and the lane-line-detection task head upsamples the feature layer by bilinear interpolation;
S15: since the ordinary convolution process is computationally heavy, the invention adopts the lightweight Ghost Module in place of the ordinary convolution modules in the backbone feature-extraction layer to greatly reduce the inference cost.
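As a concrete illustration of this shared-backbone, two-head layout, the following is a minimal PyTorch-style sketch; the module contents are placeholders standing in for the patent's CSPDarknet backbone, FPN+PAN detection head, and bilinear-upsampling segmentation head, not its exact configuration:

```python
import torch.nn as nn

class SharedBackboneMultiTaskNet(nn.Module):
    """Minimal sketch: the backbone encodes the image once, and the
    detection head and the segmentation head both decode the shared features."""

    def __init__(self, backbone: nn.Module, det_head: nn.Module, seg_head: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g. a CSPDarknet-style encoder with Ghost Modules
        self.det_head = det_head  # FPN+PAN bidirectional feature-pyramid head
        self.seg_head = seg_head  # bilinear-interpolation upsampling head

    def forward(self, x):
        feats = self.backbone(x)  # features are extracted once and shared
        return self.det_head(feats), self.seg_head(feats)
```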
Further, the step S2 specifically includes the following steps:
S21: construct the MCPB module: decompose two-dimensional global pooling into a pair of one-dimensional feature encodings that aggregate features along the two spatial directions separately, so that long-range dependencies are captured along one spatial direction while precise position information is preserved along the other; then encode the generated feature maps into a pair of direction-aware, position-sensitive attention maps that enhance the representation of the objects of interest;
S22: introduce the MCPB module in front of the target-detection task head of the network from step S1; the MCPB is a lightweight, general-purpose module that can be integrated into any convolutional neural network with negligible added overhead.
Further, the step S3 specifically includes the following steps:
S31: design the loss function of the lightweight multi-task neural network; it comprises a target-detection loss and a semantic-segmentation loss, with the expression:
L_all = α1 · L_det + α2 · L_ll-seg
where α1 and α2 are hyperparameters, set manually and tuned experimentally; L_det is the target-detection loss, composed of a classification loss, a confidence loss, and a bounding-box loss; L_ll-seg is the semantic-segmentation loss, composed of a cross-entropy loss and an IoU loss;
S32: optimize the network model parameters by backpropagation using the loss function designed in step S31, so that the model reaches its best performance.
Further, the step S4 specifically includes the following steps:
S41: according to the output structure of the target-detection and semantic-segmentation multi-task network, divide the training-set labels into two types: VOC-format labels for target detection and mask-format labels for semantic segmentation;
S42: annotate the target-detection dataset with the LabelImg tool to produce VOC-format labels, and annotate the semantic-segmentation dataset with the Labelme tool to produce mask-format labels; the self-made dataset is then combined with the public BDD100K dataset to train the multi-task neural network.
Further, step S5 specifically comprises: deploying the lightweight neural network for vehicle detection and lane-line segmentation on the vehicle-mounted embedded platform; the model running on the embedded platform detects the position information of the vehicles and lane lines in front of the intelligent vehicle and sends the detected information to the embedded platform's control center; specifically:
step S51: the input end of the network needs to preprocess the image first, the size of the input picture is modified into 416 x 416 picture size required by the network by using the nearest neighbor interpolation method, and then the pixel value of the picture is converted from 0 to 255 into 0 to 1 value by normalization processing.
Step S52: in the post-processing program, the lightweight neural network detects the vehicle position information and lane-line position information in the input image, and an image-processing software library is called to highlight the detected vehicles and lane lines respectively.
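For illustration, a minimal sketch of the preprocessing in step S51 (the resize method, input size, and normalization follow the text; the function and variable names are ours):

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    """Resize to the 416 x 416 network input with nearest-neighbor
    interpolation, then normalize pixel values from [0, 255] to [0, 1]."""
    resized = cv2.resize(image_bgr, (416, 416), interpolation=cv2.INTER_NEAREST)
    return resized.astype(np.float32) / 255.0
```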
The invention has the beneficial effects that:
1) Compared with traditional single-task vehicle or lane-line detection and identification models, the proposed lightweight multi-task neural network performs joint multi-task detection in the same scene, especially the car-following scene, detecting vehicles and lane lines together; this greatly improves the detection speed while maintaining good detection accuracy.
2) The lightweight multi-task neural network with both vehicle-detection and lane-line-segmentation functions has fewer parameters than separate single-task networks and consumes fewer computing resources when running on the vehicle-mounted chip.
3) Fusing the multidimensional information pooling module into the lightweight multi-task neural network strengthens the detection of small, distant vehicle targets in actual driving scenes and reduces the network's missed-detection rate.
4) The Ghost Module designed to replace the ordinary convolution modules in the feature-extraction layer greatly reduces the inference cost and thus improves the inference speed of the multi-task neural network model.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions, and advantages of the present invention more apparent, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a lightweight multi-tasking neural network with two functions of vehicle detection and lane segmentation constructed in accordance with the present invention;
FIG. 2 is a schematic diagram of a lightweight neural network incorporating a multidimensional information pooling module in accordance with the present invention;
FIG. 3 is a schematic view of a Ghost Module lightweight Module structure employed in the present invention;
FIG. 4 is a schematic diagram of a multi-dimensional information pooling module structure employed in the present invention;
FIG. 5 is an original view of an experimental road scene provided by the invention;
FIG. 6 is a schematic diagram of recognition results output by the lightweight multi-tasking neural network designed by the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely explain the basic idea of the invention schematically, and the following embodiments and their features may be combined with one another provided they do not conflict.
Referring to fig. 1 to 6, the present invention provides a lightweight neural network road scene detection method integrating a multidimensional information pooling Module (MCPB). A workflow diagram of a lightweight neural network is shown in fig. 1.
The embodiment of the invention provides a lightweight multi-task neural network for the two functions of vehicle detection and lane-line segmentation on an intelligent vehicle, constructed by the following steps:
step 1: a lightweight multi-task neural network with two functions of target detection and semantic segmentation (which can be used for vehicle detection and lane line segmentation in a road scene) is constructed.
The lightweight neural network structure fusing the multidimensional information pooling module is shown in fig. 2 (each layer of each module in fig. 2 represents a different convolution layer). The proposed lightweight neural network adopts a model architecture that shares one backbone feature-extraction layer: the image features extracted by a single feature-extraction layer are passed to several task-head branches. Because the network must realize two functions, two task heads are constructed: a vehicle-detection task head and a lane-line-detection task head.
The feature-extraction layer of the proposed lightweight neural network follows the CSPDarknet structure and consists of three modules: the input module, the backbone network module, and the network neck module. The input end first preprocesses the image: the input picture is resized to the 416 × 416 size required by the network using nearest-neighbor interpolation, and the pixel values are normalized from the range 0-255 to 0-1. The backbone network consists of two convolution units, the Standard Convolution Block (SCB) and the cross-stage connection residual convolution block (CSPX). The standard convolution block comprises convolution (Conv), batch normalization (BN), and the Mish activation function; the cross-stage connection residual convolution block is a specially designed group of convolutions that improves on the residual convolution structure.

The vehicle-detection task head is built on the multi-scale feature-prediction idea using a bidirectional feature pyramid, here an FPN+PAN structure. The FPN is top-down: high-level features are upsampled and fused with low-level features to obtain the feature maps used for prediction, which strengthens semantic information but does not propagate localization information. The added PAN is a bottom-up feature pyramid that propagates rich localization information; the feature maps of corresponding sizes from the two pyramids are fused, and the detection information of the targets is finally output. The input image passes through the feature-extraction layer, then through the multidimensional information pooling module, which forms an attention map, and the resulting feature information is passed to the vehicle-detection task head. The head extracts feature information from feature maps of sizes 112 × 112, 28 × 28, and 7 × 7; the larger feature maps mainly detect relatively small targets, and the small feature maps mainly detect large targets. Feature maps of different sizes are then expanded to the same size by upsampling and spliced, and non-maximum suppression finally outputs the detection box that best overlaps the real target.

The lane-line-detection task head upsamples the feature layer by bilinear interpolation: the input image is passed through the feature-extraction layer, the multidimensional information pooling module forms an attention map rich in accurate semantic information, and the upsampling operations of the task head finally locate the lane lines in the image precisely.
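To make the building blocks concrete, the following is a minimal PyTorch sketch of the Standard Convolution Block (Conv + BN + Mish) and of the bilinear-interpolation upsampling used by the lane-line head; the kernel size, padding, and scale factor are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCB(nn.Module):
    """Standard Convolution Block: convolution + batch normalization + Mish."""

    def __init__(self, c_in: int, c_out: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.Mish()  # Mish(x) = x * tanh(softplus(x))

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

def upsample_bilinear(x: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """One bilinear-interpolation upsampling step of the segmentation head."""
    return F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
```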
Since the ordinary convolution process is computationally heavy, the invention designs the lightweight Ghost Module to replace the ordinary convolution modules in the feature-extraction layer and greatly reduce the inference cost. Fig. 3 shows the structure of the Ghost Module. For a given input feature layer, part of the feature layer (the real feature layer) is generated by an ordinary convolution operation, the remaining part (the phantom feature layer) is obtained by applying a cheap linear operation to the real feature layer, and the real and phantom feature layers are then spliced together into the complete feature layer. This design greatly reduces computing-resource consumption while maintaining good feature-extraction accuracy. The network inference cost is as follows:
Computation cost of ordinary convolution: Cost_1 = h′ × w′ × n × c × k × k;
Computation cost of the Ghost module: Cost_2 = h′ × w′ × (n/s) × c × k × k + (s − 1) × (n/s) × h′ × w′ × d × d;
Theoretical speed-up ratio: r = Cost_1 / Cost_2 = (s · c · k²) / (c · k² + (s − 1) · d²) ≈ (s · c) / (c + s − 1) ≈ s;
where the input feature map has size h × w × c, the output feature map has size h′ × w′ × n, the ordinary convolution kernel has size k × k, d × d is the kernel size of the cheap linear operation (set as a depthwise convolution in the phantom module), and s is a hyperparameter.
Therefore, in theory, using the Ghost Module speeds up the computation by a factor of about s and reduces the parameter count by a factor of about s.
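A minimal PyTorch sketch of a Ghost-Module-style block consistent with the description above; it follows the public GhostNet design, and the ratio s, primary kernel k, cheap depthwise kernel d, and Mish activation are assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Produce c_out channels as c_out/s 'real' channels from an ordinary
    convolution plus the remaining 'phantom' channels from a cheap depthwise
    convolution, then splice the two parts (assumes c_out divisible by s)."""

    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 2, d: int = 3):
        super().__init__()
        c_real = c_out // s            # intrinsic (real) feature channels
        c_ghost = c_out - c_real       # phantom feature channels
        self.primary = nn.Sequential(  # ordinary convolution -> real features
            nn.Conv2d(c_in, c_real, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_real), nn.Mish())
        self.cheap = nn.Sequential(    # cheap linear op: depthwise convolution
            nn.Conv2d(c_real, c_ghost, d, padding=d // 2, groups=c_real, bias=False),
            nn.BatchNorm2d(c_ghost), nn.Mish())

    def forward(self, x):
        real = self.primary(x)
        ghost = self.cheap(real)
        return torch.cat([real, ghost], dim=1)  # complete feature layer
```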
Step 2: a multidimensional information pooling Module (MCPB) is constructed and fused into a lightweight multi-tasking neural network.
To address the difficulty current neural networks have in detecting small, distant vehicle targets in actual driving scenes and their high missed-detection rate, the invention fuses a multidimensional information pooling module into the lightweight neural network to improve small-target detection and reduce the missed-detection rate. Fig. 4 shows the structure of the multidimensional information pooling module: two-dimensional global pooling is decomposed into a pair of one-dimensional feature encodings that aggregate features along the two spatial directions separately, so that long-range dependencies are captured along one spatial direction while precise position information is preserved along the other. The generated feature maps are then encoded into a pair of direction-aware, position-sensitive attention maps that enhance the representation of the objects of interest. The proposed module is lightweight and can be integrated into any convolutional neural network with negligible added overhead.
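The two-direction pooling and the pair of direction-aware, position-sensitive attention maps described here match a coordinate-attention-style design; the following is a minimal PyTorch sketch under that assumption (the reduction ratio and layer names are ours, not from the patent):

```python
import torch
import torch.nn as nn

class MCPB(nn.Module):
    """Multidimensional information pooling block (coordinate-attention style):
    pool along H and W separately, encode the two encodings jointly, then emit
    a pair of direction-aware, position-sensitive attention maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # aggregate along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # aggregate along height
        self.encode = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.Mish())
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        xh = self.pool_h(x)                          # (n, c, h, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)      # (n, c, w, 1)
        y = self.encode(torch.cat([xh, xw], dim=2))  # joint encoding
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.attn_h(yh))                      # (n, c, h, 1)
        aw = torch.sigmoid(self.attn_w(yw.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * ah * aw  # reweight features with both attention maps
```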
Step 3: design a loss-function mathematical model suited to the lightweight neural network.
The proposed lightweight neural network contains two task heads, vehicle detection and lane-line detection, so the network loss also comprises two parts, a target-detection loss and a lane-line-segmentation loss; the final network loss is the weighted sum of the two:
L_all = α1 · L_det + α2 · L_ll-seg
L_det = L_class + L_conf + L_box
L_ll-seg = L_ce + L_IoU
where α1 and α2 are hyperparameters, set manually and tuned experimentally. L_det is the target-detection loss, composed of the classification loss L_class, the confidence loss L_conf, and the bounding-box loss L_box; L_ll-seg is the lane-line-segmentation loss, composed of the cross-entropy loss L_ce and the IoU loss L_IoU. L_class and L_conf use Focal Loss, which reduces the loss of well-classified samples and forces the network to concentrate on hard samples. L_box uses the CIoU loss, which accounts for the distance, overlap rate, scale, and aspect-ratio similarity between the predicted and ground-truth boxes. L_ce uses a cross-entropy loss, and L_IoU uses an IoU loss.
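A minimal PyTorch sketch of this loss design; the focal-loss and soft-IoU forms below are standard constructions matching the description, while γ, α, and the default weights are assumptions (the patent tunes α1 and α2 experimentally, and L_box would come from a CIoU implementation in a detection library):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0, alpha: float = 0.25):
    """Focal loss: down-weights well-classified samples so training
    concentrates on hard samples (used for L_class and L_conf)."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob. of the true class
    a_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing
    return (a_t * (1.0 - p_t) ** gamma * ce).mean()

def soft_iou_loss(probs, masks, eps: float = 1e-6):
    """Soft IoU loss over the lane-line mask (used for L_IoU)."""
    inter = (probs * masks).sum()
    union = probs.sum() + masks.sum() - inter
    return 1.0 - (inter + eps) / (union + eps)

def total_loss(l_class, l_conf, l_box, l_ce, l_iou, alpha1=1.0, alpha2=1.0):
    """L_all = alpha1 * (L_class + L_conf + L_box) + alpha2 * (L_ce + L_IoU)."""
    return alpha1 * (l_class + l_conf + l_box) + alpha2 * (l_ce + l_iou)
```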
Step 4: build the dataset according to the output format of the lightweight multi-task neural network.
Vehicle targets in the pictures are annotated with LabelImg, generating VOC-format xml files; lane lines in the pictures are annotated with Labelme, generating json label files, which are then converted by a script into single-channel grayscale png masks. The self-made dataset is then combined with the public BDD100K dataset to train the multi-task neural network.
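A minimal sketch of the json-to-png conversion step; the field names follow the standard Labelme json format, while the paths, fill value, and use of filled polygons are our assumptions:

```python
import json
import cv2
import numpy as np

def labelme_json_to_mask(json_path: str, png_path: str, fill_value: int = 255) -> None:
    """Rasterize Labelme lane-line polygons into a single-channel grayscale
    png mask: background pixels are 0, lane-line pixels are fill_value."""
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    mask = np.zeros((ann["imageHeight"], ann["imageWidth"]), dtype=np.uint8)
    for shape in ann["shapes"]:  # one entry per annotated lane-line polygon
        pts = np.array(shape["points"], dtype=np.int32)
        cv2.fillPoly(mask, [pts], color=int(fill_value))
    cv2.imwrite(png_path, mask)
```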
Step 5: deploy the lightweight multi-task neural network for vehicle detection and lane-line segmentation on the vehicle-mounted embedded platform.
The lightweight neural network is deployed on a development board with a deep-learning framework. The neural network model overlays the detected lane lines with lines to highlight them and highlights the detected vehicles with rectangular boxes. Fig. 5 is an original view of an experimental road scene, and fig. 6 is a schematic diagram of the recognition results output by the proposed lightweight neural network.
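As an illustration of the highlighting described above, a minimal OpenCV sketch (the box format, colors, and line thickness are assumptions):

```python
import cv2
import numpy as np

def draw_results(frame: np.ndarray, boxes, lane_mask: np.ndarray) -> np.ndarray:
    """Overlay detected lane-line pixels in green and draw a red rectangle
    around each detected vehicle; boxes are (x1, y1, x2, y2) pixel corners."""
    out = frame.copy()
    out[lane_mask > 0] = (0, 255, 0)  # paint lane-line pixels green (BGR)
    for (x1, y1, x2, y2) in boxes:
        cv2.rectangle(out, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
    return out
```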
Finally, it is noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art will understand that modifications and equivalent substitutions may be made without departing from the spirit and scope of the technical solution, all of which are intended to be covered by the claims of the present invention.

Claims (5)

1. A lightweight neural network road scene detection method integrating multidimensional information pooling is characterized by comprising the following steps:
S1: constructing a lightweight multi-task neural network with the two functions of target detection and semantic segmentation: adopting an architecture in which multiple task heads share the same backbone feature-extraction layer; adopting the lightweight Ghost Module to replace the ordinary convolution modules in the feature-extraction layer; performing multi-scale feature prediction with a bidirectional feature-pyramid structure to construct the target-detection task head; and upsampling the feature layer by bilinear interpolation to serve as the semantic-segmentation task head;
s2: constructing an MCPB module and fusing the MCPB module into a lightweight multi-task neural network; wherein, the MCPB module represents a multidimensional information pooling module;
s3: designing a loss function of the lightweight multitasking neural network;
s4: according to the output format of the lightweight multi-task neural network, a data set is manufactured and used for training the lightweight multi-task neural network;
s5: the trained lightweight multi-task neural network is used for vehicle detection and lane line segmentation.
2. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein in step S1 the lightweight multi-task neural network is constructed by the following steps:
S11: constructing the lightweight neural network on a model architecture in which multiple task heads share the same backbone feature-extraction layer, wherein the task branches of the network share the encoding result of the backbone, i.e., the image features extracted by the backbone are passed to the several task branches;
S12: constructing the feature-extraction layer of the lightweight multi-task neural network with reference to the CSPDarknet structure, wherein the feature-extraction layer consists of three modules: an input module, a backbone network module, and a network neck module;
S13: the backbone network consists of two convolution units, a standard convolution block and a cross-stage connection residual convolution block, wherein the standard convolution block comprises convolution, batch normalization, and the Mish activation function;
S14: constructing the task heads, divided into a target-detection task head and a lane-line-detection task head, wherein the vehicle-detection task head is built on the multi-scale feature-prediction idea using a bidirectional feature-pyramid structure, and the lane-line-detection task head upsamples the feature layer by bilinear interpolation;
S15: adopting the lightweight Ghost Module to replace the ordinary convolution modules in the backbone feature-extraction layer.
3. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein step S2 specifically comprises the following steps:
S21: constructing the MCPB module: decomposing two-dimensional global pooling into a pair of one-dimensional feature encodings that aggregate features along the two spatial directions separately, and then encoding the generated feature maps into a pair of direction-aware, position-sensitive attention maps;
s22: the MCPB module is introduced in front of the target detection task head of the network in step S1.
4. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein step S3 specifically comprises the following steps:
S31: designing the loss function of the lightweight multi-task neural network, wherein the loss function comprises a target-detection loss and a semantic-segmentation loss, with the expression:
L_all = α1 · L_det + α2 · L_ll-seg
where α1 and α2 are hyperparameters, set manually and tuned experimentally; L_det is the target-detection loss, composed of a classification loss, a confidence loss, and a bounding-box loss; L_ll-seg is the semantic-segmentation loss, composed of a cross-entropy loss and an IoU loss;
S32: optimizing the network model parameters by backpropagation using the loss function designed in step S31, so that the model reaches its best performance.
5. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein step S4 specifically comprises the following steps:
S41: according to the output structure of the target-detection and semantic-segmentation multi-task network, dividing the training-set labels into two types: VOC-format labels for target detection and mask-format labels for semantic segmentation;
S42: annotating the target-detection dataset with the LabelImg tool to produce VOC-format labels and the semantic-segmentation dataset with the Labelme tool to produce mask-format labels; then combining the self-made dataset with the public BDD100K dataset to train the multi-task neural network.
CN202211550617.1A (filed 2022-12-05, priority 2022-12-05) — Lightweight neural network road scene detection method integrating multidimensional information pooling — Pending — CN116229410A (en)

Priority Applications (1)

Application Number: CN202211550617.1A · Priority Date: 2022-12-05 · Filing Date: 2022-12-05 · Title: Lightweight neural network road scene detection method integrating multidimensional information pooling


Publications (1)

Publication Number: CN116229410A · Publication Date: 2023-06-06

Family ID: 86581284

Family Applications (1)

Application Number: CN202211550617.1A · Publication: CN116229410A (en) · Priority Date: 2022-12-05 · Filing Date: 2022-12-05 · Title: Lightweight neural network road scene detection method integrating multidimensional information pooling

Country Status (1)

Country: CN — CN116229410A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078967A (en) * 2023-09-04 2023-11-17 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method
CN117078967B (en) * 2023-09-04 2024-03-01 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method

Similar Documents

Publication Publication Date Title
CN109117718B (en) Three-dimensional semantic map construction and storage method for road scene
US20210142095A1 (en) Image disparity estimation
CN111563909B (en) Semantic segmentation method for complex street view image
CN110084139B (en) Vehicle weight recognition method based on multi-branch deep learning
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN112395951B (en) Complex scene-oriented domain-adaptive traffic target detection and identification method
WO2012139228A1 (en) Video-based detection of multiple object types under varying poses
CN116453121B (en) Training method and device for lane line recognition model
CN112613434A (en) Road target detection method, device and storage medium
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
CN116229410A (en) Lightweight neural network road scene detection method integrating multidimensional information pooling
CN116071747A (en) 3D point cloud data and 2D image data fusion matching semantic segmentation method
CN114596548A (en) Target detection method, target detection device, computer equipment and computer-readable storage medium
CN114048536A (en) Road structure prediction and target detection method based on multitask neural network
CN116563553B (en) Unmanned aerial vehicle image segmentation method and system based on deep learning
CN115588188A (en) Locomotive, vehicle-mounted terminal and driver behavior identification method
CN117372991A (en) Automatic driving method and system based on multi-view multi-mode fusion
Zheng et al. A method of traffic police detection based on attention mechanism in natural scene
CN116935361A (en) Deep learning-based driver distraction behavior detection method
CN116311154A (en) Vehicle detection and identification method based on YOLOv5 model optimization
CN115965783A (en) Unstructured road segmentation method based on point cloud and image feature fusion
CN115565155A (en) Training method of neural network model, generation method of vehicle view and vehicle
CN117011722A (en) License plate recognition method and device based on unmanned aerial vehicle real-time monitoring video
Di et al. Spatial prior for nonparametric road scene parsing
Liu et al. L2-LiteSeg: A Real-Time Semantic Segmentation Method for End-to-End Autonomous Driving

Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination