CN116229410A - Lightweight neural network road scene detection method integrating multidimensional information pooling - Google Patents


Info

Publication number: CN116229410A
Application number: CN202211550617.1A
Authority: CN (China)
Prior art keywords: neural network, task, module, lightweight, feature
Other languages: Chinese (zh)
Inventors: Cen Ming (岑明), Zhou Cong (周聪)
Original and current assignee: Chongqing University of Post and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed 2022-12-05 by Chongqing University of Post and Telecommunications; priority to CN202211550617.1A (priority date 2022-12-05)
Publication of CN116229410A: 2023-06-06
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G06V 20/58 — Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/588 — Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
    • G06V 10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N 3/084 — Computing arrangements based on biological models: neural networks; learning methods: backpropagation, e.g. using gradient descent
    • G06V 2201/07 — Indexing scheme relating to image or video recognition or understanding: target detection
    • G06V 2201/08 — Indexing scheme relating to image or video recognition or understanding: detecting or categorising vehicles
    • Y02T 10/40 — Climate change mitigation technologies related to transportation: road transport; internal combustion engine [ICE] based vehicles; engine management systems

Abstract

The invention relates to a lightweight neural network road scene detection method integrating multidimensional information pooling, and belongs to the technical field of intelligent automobile driving safety. In the method, the lightweight neural network model adopts CSPDarknet as the backbone feature-extraction layer for extracting features from the input image; an FPN+PAN bidirectional feature-pyramid structure serves as the target-detection task head; the feature layer is upsampled by bilinear interpolation to serve as the semantic-segmentation task head. A multidimensional information pooling module is fused into the neural network structure, and the lightweight Ghost Module replaces the ordinary convolution modules, realizing a compact, efficient lightweight neural network with both vehicle-detection and lane-line-segmentation functions. The invention can detect small, distant vehicle targets in real time in actual driving scenes and reduces the missed-detection rate.

Description

Lightweight neural network road scene detection method integrating multidimensional information pooling
Technical Field
The invention belongs to the technical field of intelligent automobile driving safety; it relates to deep learning, computer vision, driver assistance, and image processing, and in particular to a lightweight neural network road scene detection method integrating multidimensional information pooling.
Background
Whether for driver assistance, automated driving, or future all-weather, all-area unmanned driving, each step up in driving automation demands more refined intelligent detection algorithms, and reliable perception of the traffic environment is the essential foundation of every link in the chain. In traffic scenes, and especially in car-following scenes, which account for a large share of driving, vehicles and lane lines are the key targets that define the scene, so their detection is a key technology.
In recent years, driver assistance and unmanned driving have been a focus of research. Real-time, accurate detection and identification of the vehicles around an intelligent automobile and of the driving lane lines is an important component of automated driving and a precondition for safe operation of unmanned vehicles in urban environments. Most existing vehicle and lane-line detection and identification models handle only a single task and cannot perform joint multi-task detection in the same scene, especially a car-following scene, which limits their theoretical and practical value. If a separate object-detection network and semantic-segmentation network run simultaneously on the vehicle-mounted chip, they consume a large share of its computing resources and adversely affect other functions. In addition, current target-detection networks struggle to detect small, distant vehicle targets in actual driving scenes, and their missed-detection rate is high, posing a serious safety hazard for intelligent vehicles travelling at high speed.
In summary, the problems of the prior art are: running multiple neural networks simultaneously on one vehicle-mounted chip consumes excessive computing resources, and current target detection performs poorly on small targets, with a high missed-detection rate.
Disclosure of Invention
Therefore, the invention aims to provide a lightweight neural network road scene detection method integrating multidimensional information pooling that can detect small, distant vehicle targets in actual driving scenes in real time and reduce the missed-detection rate. The method fuses a multidimensional information pooling module into the neural network structure and uses the lightweight Ghost Module to replace the ordinary convolution modules, realizing a compact, efficient lightweight neural network with vehicle-detection and lane-line-segmentation functions.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a lightweight neural network road scene detection method integrating multidimensional information pooling is characterized in that a multidimensional information pooling Module is integrated into a neural network structure, and a lightweight convolution Module Ghost Module is used for optimizing a common convolution Module to achieve compact and efficient vehicle detection and a lightweight neural network with lane line segmentation functions. The working flow of the method is as follows: firstly, extracting picture features of an acquired video image by a feature extraction layer, then respectively transmitting extracted feature data to a target detection layer and a semantic segmentation layer for data decoding, and finally transmitting detected data to a vehicle-mounted display for displaying detection results.
The method specifically comprises the following steps:
S1: construct a lightweight multi-task neural network with the two functions of target detection and semantic segmentation (here, vehicle detection and lane-line segmentation): adopt an architecture in which multiple task heads share the same backbone feature-extraction layer; replace the ordinary convolution modules in the feature-extraction layer with the lightweight Ghost Module to greatly reduce the inference cost; build the target-detection task head by multi-scale feature prediction with a bidirectional feature-pyramid structure; and build the semantic-segmentation task head by upsampling the feature layer with bilinear interpolation.
S2: construct a multidimensional information pooling module (MCPB) and fuse it into the lightweight multi-task neural network to improve the detection of small targets and reduce the missed-detection rate.
S3: design the loss function of the lightweight multi-task neural network.
S4: build a dataset matching the output format of the lightweight multi-task neural network; the dataset combines the public BDD100K dataset with a self-collected road-scene dataset and is used to train the network.
S5: use the trained lightweight multi-task neural network for vehicle detection and lane-line segmentation.
Further, in step S1, the lightweight multi-task neural network is constructed by the following steps:
S11: construct the lightweight neural network on a model architecture in which multiple task heads share the same backbone feature-extraction layer; the task branches share the encoding result of the backbone, i.e., the image features extracted by the backbone are passed to each task branch; the network input is a road-scene image recorded by a camera, and the function of the proposed lightweight multi-task neural network is to detect the position information of the vehicles and lane lines in that image;
S12: construct the feature-extraction layer of the lightweight multi-task neural network with reference to the CSPDarknet structure; it consists of three modules: the input module, the backbone network (Backbone), and the network neck (Neck);
S13: the backbone network consists of two convolution units, the Standard Convolution Block (SCB) and the cross-stage connection residual convolution block (CSPX); the standard convolution block comprises convolution (Conv), batch normalization (BN), and the Mish activation function; the cross-stage connection residual convolution block is a specially designed group of convolutions that improves on the residual convolution structure;
S14: construct the task heads, divided into a target-detection task head and a lane-line-detection task head; the vehicle-detection task head is built on the multi-scale feature-prediction idea using a bidirectional feature-pyramid structure, and the lane-line-detection task head upsamples the feature layer by bilinear interpolation;
S15: since the ordinary convolution process is computationally heavy, the invention adopts the lightweight Ghost Module in place of the ordinary convolution modules in the backbone feature-extraction layer to greatly reduce the inference cost.
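As a concrete illustration of this shared-backbone, two-head layout, the following is a minimal PyTorch-style sketch; the module contents are placeholders standing in for the patent's CSPDarknet backbone, FPN+PAN detection head, and bilinear-upsampling segmentation head, not its exact configuration:

```python
import torch.nn as nn

class SharedBackboneMultiTaskNet(nn.Module):
    """Minimal sketch: the backbone encodes the image once, and the
    detection head and the segmentation head both decode the shared features."""

    def __init__(self, backbone: nn.Module, det_head: nn.Module, seg_head: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g. a CSPDarknet-style encoder with Ghost Modules
        self.det_head = det_head  # FPN+PAN bidirectional feature-pyramid head
        self.seg_head = seg_head  # bilinear-interpolation upsampling head

    def forward(self, x):
        feats = self.backbone(x)  # features are extracted once and shared
        return self.det_head(feats), self.seg_head(feats)
```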
Further, the step S2 specifically includes the following steps:
S21: construct the MCPB module: decompose two-dimensional global pooling into a pair of one-dimensional feature encodings that aggregate features along the two spatial directions separately, so that long-range dependencies are captured along one spatial direction while precise position information is preserved along the other; then encode the generated feature maps into a pair of direction-aware, position-sensitive attention maps that enhance the representation of the objects of interest;
S22: introduce the MCPB module in front of the target-detection task head of the network from step S1; the MCPB is a lightweight, general-purpose module that can be integrated into any convolutional neural network with negligible added overhead.
Further, the step S3 specifically includes the following steps:
S31: design the loss function of the lightweight multi-task neural network; it comprises a target-detection loss and a semantic-segmentation loss, with the expression:
L_all = α1 · L_det + α2 · L_ll-seg
where α1 and α2 are hyperparameters, set manually and tuned experimentally; L_det is the target-detection loss, composed of a classification loss, a confidence loss, and a bounding-box loss; L_ll-seg is the semantic-segmentation loss, composed of a cross-entropy loss and an IoU loss;
S32: optimize the network model parameters by backpropagation using the loss function designed in step S31, so that the model reaches its best performance.
Further, the step S4 specifically includes the following steps:
S41: according to the output structure of the target-detection and semantic-segmentation multi-task network, divide the training-set labels into two types: VOC-format labels for target detection and mask-format labels for semantic segmentation;
S42: annotate the target-detection dataset with the LabelImg tool to produce VOC-format labels, and annotate the semantic-segmentation dataset with the Labelme tool to produce mask-format labels; the self-made dataset is then combined with the public BDD100K dataset to train the multi-task neural network.
Further, step S5 specifically comprises: deploying the lightweight neural network for vehicle detection and lane-line segmentation on the vehicle-mounted embedded platform; the model running on the embedded platform detects the position information of the vehicles and lane lines in front of the intelligent vehicle and sends the detected information to the embedded platform's control center; specifically:
step S51: the input end of the network needs to preprocess the image first, the size of the input picture is modified into 416 x 416 picture size required by the network by using the nearest neighbor interpolation method, and then the pixel value of the picture is converted from 0 to 255 into 0 to 1 value by normalization processing.
Step S52: in the post-processing program, the lightweight neural network detects the vehicle position information and lane-line position information in the input image, and an image-processing software library is called to highlight the detected vehicles and lane lines respectively.
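For illustration, a minimal sketch of the preprocessing in step S51 (the resize method, input size, and normalization follow the text; the function and variable names are ours):

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    """Resize to the 416 x 416 network input with nearest-neighbor
    interpolation, then normalize pixel values from [0, 255] to [0, 1]."""
    resized = cv2.resize(image_bgr, (416, 416), interpolation=cv2.INTER_NEAREST)
    return resized.astype(np.float32) / 255.0
```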
The invention has the beneficial effects that:
1) Compared with traditional single-task vehicle or lane-line detection and identification models, the proposed lightweight multi-task neural network performs joint multi-task detection in the same scene, especially the car-following scene, detecting vehicles and lane lines together; this greatly improves the detection speed while maintaining good detection accuracy.
2) The lightweight multi-task neural network with both vehicle-detection and lane-line-segmentation functions has fewer parameters than separate single-task networks and consumes fewer computing resources when running on the vehicle-mounted chip.
3) Fusing the multidimensional information pooling module into the lightweight multi-task neural network strengthens the detection of small, distant vehicle targets in actual driving scenes and reduces the network's missed-detection rate.
4) The Ghost Module designed to replace the ordinary convolution modules in the feature-extraction layer greatly reduces the inference cost and thus improves the inference speed of the multi-task neural network model.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions, and advantages of the present invention more apparent, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a lightweight multi-tasking neural network with two functions of vehicle detection and lane segmentation constructed in accordance with the present invention;
FIG. 2 is a schematic diagram of a lightweight neural network incorporating a multidimensional information pooling module in accordance with the present invention;
FIG. 3 is a schematic view of a Ghost Module lightweight Module structure employed in the present invention;
FIG. 4 is a schematic diagram of a multi-dimensional information pooling module structure employed in the present invention;
FIG. 5 is an original view of an experimental road scene provided by the invention;
FIG. 6 is a schematic diagram of recognition results output by the lightweight multi-tasking neural network designed by the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely explain the basic idea of the invention schematically, and the following embodiments and their features may be combined with one another provided they do not conflict.
Referring to fig. 1 to 6, the present invention provides a lightweight neural network road scene detection method integrating a multidimensional information pooling Module (MCPB). A workflow diagram of a lightweight neural network is shown in fig. 1.
The embodiment of the invention provides a lightweight multi-task neural network for the two functions of vehicle detection and lane-line segmentation on an intelligent vehicle, constructed by the following steps:
step 1: a lightweight multi-task neural network with two functions of target detection and semantic segmentation (which can be used for vehicle detection and lane line segmentation in a road scene) is constructed.
The lightweight neural network structure fusing the multidimensional information pooling module is shown in fig. 2 (each layer of each module in fig. 2 represents a different convolution layer). The proposed lightweight neural network adopts a model architecture that shares one backbone feature-extraction layer: the image features extracted by a single feature-extraction layer are passed to several task-head branches. Because the network must realize two functions, two task heads are constructed: a vehicle-detection task head and a lane-line-detection task head.
The feature-extraction layer of the proposed lightweight neural network follows the CSPDarknet structure and consists of three modules: the input module, the backbone network module, and the network neck module. The input end first preprocesses the image: the input picture is resized to the 416 × 416 size required by the network using nearest-neighbor interpolation, and the pixel values are normalized from the range 0-255 to 0-1. The backbone network consists of two convolution units, the Standard Convolution Block (SCB) and the cross-stage connection residual convolution block (CSPX). The standard convolution block comprises convolution (Conv), batch normalization (BN), and the Mish activation function; the cross-stage connection residual convolution block is a specially designed group of convolutions that improves on the residual convolution structure.

The vehicle-detection task head is built on the multi-scale feature-prediction idea using a bidirectional feature pyramid, here an FPN+PAN structure. The FPN is top-down: high-level features are upsampled and fused with low-level features to obtain the feature maps used for prediction, which strengthens semantic information but does not propagate localization information. The added PAN is a bottom-up feature pyramid that propagates rich localization information; the feature maps of corresponding sizes from the two pyramids are fused, and the detection information of the targets is finally output. The input image passes through the feature-extraction layer, then through the multidimensional information pooling module, which forms an attention map, and the resulting feature information is passed to the vehicle-detection task head. The head extracts feature information from feature maps of sizes 112 × 112, 28 × 28, and 7 × 7; the larger feature maps mainly detect relatively small targets, and the small feature maps mainly detect large targets. Feature maps of different sizes are then expanded to the same size by upsampling and spliced, and non-maximum suppression finally outputs the detection box that best overlaps the real target.

The lane-line-detection task head upsamples the feature layer by bilinear interpolation: the input image is passed through the feature-extraction layer, the multidimensional information pooling module forms an attention map rich in accurate semantic information, and the upsampling operations of the task head finally locate the lane lines in the image precisely.
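To make the building blocks concrete, the following is a minimal PyTorch sketch of the Standard Convolution Block (Conv + BN + Mish) and of the bilinear-interpolation upsampling used by the lane-line head; the kernel size, padding, and scale factor are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCB(nn.Module):
    """Standard Convolution Block: convolution + batch normalization + Mish."""

    def __init__(self, c_in: int, c_out: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.Mish()  # Mish(x) = x * tanh(softplus(x))

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

def upsample_bilinear(x: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """One bilinear-interpolation upsampling step of the segmentation head."""
    return F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
```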
Since the ordinary convolution process is computationally heavy, the invention designs the lightweight Ghost Module to replace the ordinary convolution modules in the feature-extraction layer and greatly reduce the inference cost. Fig. 3 shows the structure of the Ghost Module. For a given input feature layer, part of the feature layer (the real feature layer) is generated by an ordinary convolution operation, the remaining part (the phantom feature layer) is obtained by applying a cheap linear operation to the real feature layer, and the real and phantom feature layers are then spliced together into the complete feature layer. This design greatly reduces computing-resource consumption while maintaining good feature-extraction accuracy. The network inference cost is as follows:
Computation cost of ordinary convolution: Cost_1 = h′ × w′ × n × c × k × k;
Computation cost of the Ghost module: Cost_2 = h′ × w′ × (n/s) × c × k × k + (s − 1) × (n/s) × h′ × w′ × d × d;
Theoretical speed-up ratio: r = Cost_1 / Cost_2 = (s · c · k²) / (c · k² + (s − 1) · d²) ≈ (s · c) / (c + s − 1) ≈ s;
where the input feature map has size h × w × c, the output feature map has size h′ × w′ × n, the ordinary convolution kernel has size k × k, d × d is the kernel size of the cheap linear operation (set as a depthwise convolution in the phantom module), and s is a hyperparameter.
Therefore, in theory, using the Ghost Module speeds up the computation by a factor of about s and reduces the parameter count by a factor of about s.
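A minimal PyTorch sketch of a Ghost-Module-style block consistent with the description above; it follows the public GhostNet design, and the ratio s, primary kernel k, cheap depthwise kernel d, and Mish activation are assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Produce c_out channels as c_out/s 'real' channels from an ordinary
    convolution plus the remaining 'phantom' channels from a cheap depthwise
    convolution, then splice the two parts (assumes c_out divisible by s)."""

    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 2, d: int = 3):
        super().__init__()
        c_real = c_out // s            # intrinsic (real) feature channels
        c_ghost = c_out - c_real       # phantom feature channels
        self.primary = nn.Sequential(  # ordinary convolution -> real features
            nn.Conv2d(c_in, c_real, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_real), nn.Mish())
        self.cheap = nn.Sequential(    # cheap linear op: depthwise convolution
            nn.Conv2d(c_real, c_ghost, d, padding=d // 2, groups=c_real, bias=False),
            nn.BatchNorm2d(c_ghost), nn.Mish())

    def forward(self, x):
        real = self.primary(x)
        ghost = self.cheap(real)
        return torch.cat([real, ghost], dim=1)  # complete feature layer
```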
Step 2: a multidimensional information pooling Module (MCPB) is constructed and fused into a lightweight multi-tasking neural network.
To address the difficulty current neural networks have in detecting small, distant vehicle targets in actual driving scenes and their high missed-detection rate, the invention fuses a multidimensional information pooling module into the lightweight neural network to improve small-target detection and reduce the missed-detection rate. Fig. 4 shows the structure of the multidimensional information pooling module: two-dimensional global pooling is decomposed into a pair of one-dimensional feature encodings that aggregate features along the two spatial directions separately, so that long-range dependencies are captured along one spatial direction while precise position information is preserved along the other. The generated feature maps are then encoded into a pair of direction-aware, position-sensitive attention maps that enhance the representation of the objects of interest. The proposed module is lightweight and can be integrated into any convolutional neural network with negligible added overhead.
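The two-direction pooling and the pair of direction-aware, position-sensitive attention maps described here match a coordinate-attention-style design; the following is a minimal PyTorch sketch under that assumption (the reduction ratio and layer names are ours, not from the patent):

```python
import torch
import torch.nn as nn

class MCPB(nn.Module):
    """Multidimensional information pooling block (coordinate-attention style):
    pool along H and W separately, encode the two encodings jointly, then emit
    a pair of direction-aware, position-sensitive attention maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # aggregate along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # aggregate along height
        self.encode = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.Mish())
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        xh = self.pool_h(x)                          # (n, c, h, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)      # (n, c, w, 1)
        y = self.encode(torch.cat([xh, xw], dim=2))  # joint encoding
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.attn_h(yh))                      # (n, c, h, 1)
        aw = torch.sigmoid(self.attn_w(yw.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * ah * aw  # reweight features with both attention maps
```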
Step 3: design a loss-function mathematical model suited to the lightweight neural network.
The proposed lightweight neural network contains two task heads, vehicle detection and lane-line detection, so the network loss also comprises two parts, a target-detection loss and a lane-line-segmentation loss; the final network loss is the weighted sum of the two:
L_all = α1 · L_det + α2 · L_ll-seg
L_det = L_class + L_conf + L_box
L_ll-seg = L_ce + L_IoU
where α1 and α2 are hyperparameters, set manually and tuned experimentally. L_det is the target-detection loss, composed of the classification loss L_class, the confidence loss L_conf, and the bounding-box loss L_box; L_ll-seg is the lane-line-segmentation loss, composed of the cross-entropy loss L_ce and the IoU loss L_IoU. L_class and L_conf use Focal Loss, which reduces the loss of well-classified samples and forces the network to concentrate on hard samples. L_box uses the CIoU loss, which accounts for the distance, overlap rate, scale, and aspect-ratio similarity between the predicted and ground-truth boxes. L_ce uses a cross-entropy loss, and L_IoU uses an IoU loss.
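A minimal PyTorch sketch of this loss design; the focal-loss and soft-IoU forms below are standard constructions matching the description, while γ, α, and the default weights are assumptions (the patent tunes α1 and α2 experimentally, and L_box would come from a CIoU implementation in a detection library):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0, alpha: float = 0.25):
    """Focal loss: down-weights well-classified samples so training
    concentrates on hard samples (used for L_class and L_conf)."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob. of the true class
    a_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing
    return (a_t * (1.0 - p_t) ** gamma * ce).mean()

def soft_iou_loss(probs, masks, eps: float = 1e-6):
    """Soft IoU loss over the lane-line mask (used for L_IoU)."""
    inter = (probs * masks).sum()
    union = probs.sum() + masks.sum() - inter
    return 1.0 - (inter + eps) / (union + eps)

def total_loss(l_class, l_conf, l_box, l_ce, l_iou, alpha1=1.0, alpha2=1.0):
    """L_all = alpha1 * (L_class + L_conf + L_box) + alpha2 * (L_ce + L_IoU)."""
    return alpha1 * (l_class + l_conf + l_box) + alpha2 * (l_ce + l_iou)
```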
Step 4: build the dataset according to the output format of the lightweight multi-task neural network.
Vehicle targets in the pictures are annotated with LabelImg, generating VOC-format xml files; lane lines in the pictures are annotated with Labelme, generating json label files, which are then converted by a script into single-channel grayscale png masks. The self-made dataset is then combined with the public BDD100K dataset to train the multi-task neural network.
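A minimal sketch of the json-to-png conversion step; the field names follow the standard Labelme json format, while the paths, fill value, and use of filled polygons are our assumptions:

```python
import json
import cv2
import numpy as np

def labelme_json_to_mask(json_path: str, png_path: str, fill_value: int = 255) -> None:
    """Rasterize Labelme lane-line polygons into a single-channel grayscale
    png mask: background pixels are 0, lane-line pixels are fill_value."""
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    mask = np.zeros((ann["imageHeight"], ann["imageWidth"]), dtype=np.uint8)
    for shape in ann["shapes"]:  # one entry per annotated lane-line polygon
        pts = np.array(shape["points"], dtype=np.int32)
        cv2.fillPoly(mask, [pts], color=int(fill_value))
    cv2.imwrite(png_path, mask)
```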
Step 5: deploy the lightweight multi-task neural network for vehicle detection and lane-line segmentation on the vehicle-mounted embedded platform.
The lightweight neural network is deployed on a development board with a deep-learning framework. The neural network model overlays the detected lane lines with lines to highlight them and highlights the detected vehicles with rectangular boxes. Fig. 5 is an original view of an experimental road scene, and fig. 6 is a schematic diagram of the recognition results output by the proposed lightweight neural network.
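As an illustration of the highlighting described above, a minimal OpenCV sketch (the box format, colors, and line thickness are assumptions):

```python
import cv2
import numpy as np

def draw_results(frame: np.ndarray, boxes, lane_mask: np.ndarray) -> np.ndarray:
    """Overlay detected lane-line pixels in green and draw a red rectangle
    around each detected vehicle; boxes are (x1, y1, x2, y2) pixel corners."""
    out = frame.copy()
    out[lane_mask > 0] = (0, 255, 0)  # paint lane-line pixels green (BGR)
    for (x1, y1, x2, y2) in boxes:
        cv2.rectangle(out, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
    return out
```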
Finally, it is noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art will understand that modifications and equivalent substitutions may be made without departing from the spirit and scope of the technical solution, all of which are intended to be covered by the claims of the present invention.

Claims (5)

1. A lightweight neural network road scene detection method integrating multidimensional information pooling is characterized by comprising the following steps:
S1: constructing a lightweight multi-task neural network with the two functions of target detection and semantic segmentation: adopting an architecture in which multiple task heads share the same backbone feature-extraction layer; adopting the lightweight Ghost Module to replace the ordinary convolution modules in the feature-extraction layer; performing multi-scale feature prediction with a bidirectional feature-pyramid structure to construct the target-detection task head; and upsampling the feature layer by bilinear interpolation to serve as the semantic-segmentation task head;
s2: constructing an MCPB module and fusing the MCPB module into a lightweight multi-task neural network; wherein, the MCPB module represents a multidimensional information pooling module;
s3: designing a loss function of the lightweight multitasking neural network;
s4: according to the output format of the lightweight multi-task neural network, a data set is manufactured and used for training the lightweight multi-task neural network;
s5: the trained lightweight multi-task neural network is used for vehicle detection and lane line segmentation.
2. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein in step S1 the lightweight multi-task neural network is constructed by the following steps:
S11: constructing the lightweight neural network on a model architecture in which multiple task heads share the same backbone feature-extraction layer, wherein the task branches of the network share the encoding result of the backbone, i.e., the image features extracted by the backbone are passed to the several task branches;
S12: constructing the feature-extraction layer of the lightweight multi-task neural network with reference to the CSPDarknet structure, wherein the feature-extraction layer consists of three modules: an input module, a backbone network module, and a network neck module;
S13: the backbone network consists of two convolution units, a standard convolution block and a cross-stage connection residual convolution block, wherein the standard convolution block comprises convolution, batch normalization, and the Mish activation function;
S14: constructing the task heads, divided into a target-detection task head and a lane-line-detection task head, wherein the vehicle-detection task head is built on the multi-scale feature-prediction idea using a bidirectional feature-pyramid structure, and the lane-line-detection task head upsamples the feature layer by bilinear interpolation;
S15: adopting the lightweight Ghost Module to replace the ordinary convolution modules in the backbone feature-extraction layer.
3. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein step S2 specifically comprises the following steps:
S21: constructing the MCPB module: decomposing two-dimensional global pooling into a pair of one-dimensional feature encodings that aggregate features along the two spatial directions separately, and then encoding the generated feature maps into a pair of direction-aware, position-sensitive attention maps;
s22: the MCPB module is introduced in front of the target detection task head of the network in step S1.
4. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein step S3 specifically comprises the following steps:
S31: designing the loss function of the lightweight multi-task neural network, wherein the loss function comprises a target-detection loss and a semantic-segmentation loss, with the expression:
L_all = α1 · L_det + α2 · L_ll-seg
where α1 and α2 are hyperparameters, set manually and tuned experimentally; L_det is the target-detection loss, composed of a classification loss, a confidence loss, and a bounding-box loss; L_ll-seg is the semantic-segmentation loss, composed of a cross-entropy loss and an IoU loss;
S32: optimizing the network model parameters by backpropagation using the loss function designed in step S31, so that the model reaches its best performance.
5. The lightweight neural network road scene detection method integrating multidimensional information pooling according to claim 1, wherein step S4 specifically comprises the following steps:
S41: according to the output structure of the target-detection and semantic-segmentation multi-task network, dividing the training-set labels into two types: VOC-format labels for target detection and mask-format labels for semantic segmentation;
S42: annotating the target-detection dataset with the LabelImg tool to produce VOC-format labels and the semantic-segmentation dataset with the Labelme tool to produce mask-format labels; then combining the self-made dataset with the public BDD100K dataset to train the multi-task neural network.
CN202211550617.1A (filed 2022-12-05, priority 2022-12-05) — Lightweight neural network road scene detection method integrating multidimensional information pooling — Pending — CN116229410A (en)

Priority Applications (1)

Application Number: CN202211550617.1A · Priority Date: 2022-12-05 · Filing Date: 2022-12-05 · Title: Lightweight neural network road scene detection method integrating multidimensional information pooling


Publications (1)

Publication Number: CN116229410A · Publication Date: 2023-06-06

Family ID: 86581284

Family Applications (1)

Application Number: CN202211550617.1A · Publication: CN116229410A (en) · Priority Date: 2022-12-05 · Filing Date: 2022-12-05 · Title: Lightweight neural network road scene detection method integrating multidimensional information pooling

Country Status (1)

Country: CN — CN116229410A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078967A (en) * 2023-09-04 2023-11-17 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method
CN117078967B (en) * 2023-09-04 2024-03-01 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method

Similar Documents

Publication Publication Date Title
CN109117718B (en) Three-dimensional semantic map construction and storage method for road scene
US20210142095A1 (en) Image disparity estimation
CN111563909B (en) Semantic segmentation method for complex street view image
CN110084139B (en) Vehicle weight recognition method based on multi-branch deep learning
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN112395951B (en) Complex scene-oriented domain-adaptive traffic target detection and identification method
WO2012139228A1 (en) Video-based detection of multiple object types under varying poses
CN116453121B (en) Training method and device for lane line recognition model
CN112613434A (en) Road target detection method, device and storage medium
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
CN116229410A (en) Lightweight neural network road scene detection method integrating multidimensional information pooling
CN116071747A (en) 3D point cloud data and 2D image data fusion matching semantic segmentation method
CN114596548A (en) Target detection method, target detection device, computer equipment and computer-readable storage medium
CN114048536A (en) Road structure prediction and target detection method based on multitask neural network
CN116563553B (en) Unmanned aerial vehicle image segmentation method and system based on deep learning
CN115588188A (en) Locomotive, vehicle-mounted terminal and driver behavior identification method
CN117372991A (en) Automatic driving method and system based on multi-view multi-mode fusion
Zheng et al. A method of traffic police detection based on attention mechanism in natural scene
CN116935361A (en) Deep learning-based driver distraction behavior detection method
CN116311154A (en) Vehicle detection and identification method based on YOLOv5 model optimization
CN115965783A (en) Unstructured road segmentation method based on point cloud and image feature fusion
CN115565155A (en) Training method of neural network model, generation method of vehicle view and vehicle
CN117011722A (en) License plate recognition method and device based on unmanned aerial vehicle real-time monitoring video
Di et al. Spatial prior for nonparametric road scene parsing
Liu et al. L2-LiteSeg: A Real-Time Semantic Segmentation Method for End-to-End Autonomous Driving

Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination