CN110941995A - Real-time target detection and semantic segmentation multi-task learning method based on lightweight network - Google Patents
- Publication number
- CN110941995A (application CN201911060977.1A)
- Authority
- CN
- China
- Prior art keywords
- module
- loss
- semantic segmentation
- target detection
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/588—Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
Abstract
The invention relates to a real-time target detection and semantic segmentation multi-task learning method based on a lightweight network. The network comprises a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module. The feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are sent to the semantic segmentation module to segment the drivable road area and the alternatively drivable area, and to the target detection module to detect the objects appearing in the road scene. The multi-scale receptive field module enlarges the receptive field of the feature maps and uses convolutions of different dilation rates to address the multi-scale problem; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model. Compared with the prior art, the method completes two common perception tasks of autonomous driving, road object detection and drivable-area segmentation, more quickly and accurately.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a real-time target detection and semantic segmentation multi-task learning method based on a lightweight network.
Background
Computer vision is becoming increasingly important in autonomous driving, largely due to the rise of deep learning techniques based on neural networks. The availability of more public datasets and more powerful hardware has spurred related research and further pushed the development of computer vision technology. Many computer vision tasks are used in autonomous vehicles, such as object detection and road segmentation, which are crucial for perceiving the driving environment. The current trend is to continuously improve the accuracy of these tasks while keeping the inference time as short as possible. A model that is accurate but slow poses great danger to the decision making of an unmanned vehicle: when a sudden incident occurs, it cannot be processed in time. The model must therefore predict fast enough to guarantee the vehicle sufficient time to make decisions. In addition, the hardware resources of an autonomous vehicle are limited, so fully utilizing them is also an important task. Moreover, objects in road scenes differ greatly in scale, and a conventional model cannot accurately perceive large and small objects at the same time, which exposes many potential problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a lightweight-network-based multi-task learning method for real-time target detection and semantic segmentation, which completes two common perception tasks of autonomous driving, road object detection and drivable-area segmentation, more quickly and accurately.
In order to solve the above technical problems, the invention adopts the following technical scheme: a multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network, comprising a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module. The feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the alternatively drivable area, and to the target detection module in the lower branch to detect the objects appearing in the road scene. The multi-scale receptive field module enlarges the receptive field of the feature maps and uses convolutions of different dilation rates to address the multi-scale problem; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model.
Further, the feature extraction module extracts features from the RGB image through the lightweight convolutional neural network MobileNet. MobileNet replaces conventional convolutions with depthwise separable convolutions to reduce the number of model parameters, which shortens prediction time and lowers the demand on hardware resources; the network is small, computationally cheap and accurate, giving it great advantages among lightweight neural networks. As feature extraction proceeds, the feature maps become smaller, their receptive fields larger, and their semantic information richer.
Further, the SSD detection algorithm is taken as the detection baseline model, and a multi-scale receptive field module is added to the target detection module. The module is composed of parallel dilated (atrous) convolutions with different dilation rates, which enlarge the receptive field at multiple scales without changing the feature-map size, thereby addressing the multi-scale problem. Dilated convolutions with rates 5 and 7 enlarge the receptive field for large objects, a dilated convolution with rate 3 enlarges it for small objects, and the outputs of these differently dilated convolution layers are finally merged together, handling well the multi-scale problem common in road scenes.
Furthermore, the features extracted by the backbone network MobileNet are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the alternatively drivable area. The first two levels of feature maps are merged, which enriches semantic information while preserving the feature-map scale. A multi-scale receptive field module is likewise added to the semantic segmentation module: the second-level feature maps pass through dilated convolutions with rates 1, 3 and 6 to address the multi-scale problem, and the resulting feature maps are merged and then decoded to complete the segmentation of the road driving area.
Furthermore, in the multi-scale receptive field module added to the target detection module, dilated convolutions with rates 5 and 7 enlarge the receptive field for large objects, a dilated convolution with rate 3 enlarges it for small objects, and the convolution layers of different sizes are finally merged together.
Furthermore, in the multi-scale receptive field module added to the semantic segmentation module, dilated convolutions with rates 1, 3 and 6 are selected to address the multi-scale problem, and the feature maps are finally merged and then decoded to complete the segmentation of the road driving area.
Further, the loss function of the multi-task learning is obtained by weighted summation of the loss functions of the branches. The loss of the detection branch is the classification loss plus the regression loss: Loss_detection = Loss_classification + Loss_regression. The loss of the segmentation branch is a class-weighted cross-entropy: Loss_segmentation = weight[class] · CrossEntropyLoss(x, class). The total loss is Loss_total = Loss_detection + Loss_segmentation. The total loss is optimized by iterative training with back propagation until it converges and model training is complete. To balance the losses of the two labels, the drivable area and the alternatively drivable area, experiments showed that the best segmentation result is obtained with weight[class = alternatively drivable area] = 3.
Further, the training of the model comprises the following steps:
S1. The BDD100K dataset published by Berkeley is used as training data; the road object detection task provides 2D bounding boxes for 10 classes, and the drivable-area segmentation task has two different classes, "directly drivable" area and "alternatively drivable" area; the data are divided in an 8:1:1 ratio into corresponding training, validation and test data. BDD100K is a well-annotated dataset for road object detection, instance segmentation, drivable-area segmentation and lane marking detection.
S2. Features are extracted by the lightweight convolutional neural network MobileNet, and the parameters of the MobileNet backbone, the detection branch and the segmentation branch are trained;
S3. After every ten training iterations, one validation pass is performed on the validation set, and the model that performs best on the validation set is taken as the final model;
S4. The final model is tested on the test set; the test performance is consistent with that on the validation set.
After training is complete and testing reveals no problems, the model can be compressed and deployed on the unmanned vehicle; even uncompressed, the model is only 34 MB, which saves hardware resources well.
Compared with the prior art, the beneficial effects are:
1. The multi-task learning method, based on MobileNet with joint training of target detection and semantic segmentation, feeds the extracted features into both a detection branch and a segmentation branch, solving road object detection and drivable-area segmentation simultaneously with a single model;
2. When perceiving the road environment, object detection is relatively time-consuming. The method adopts a single-stage detector and, targeting the large size differences among objects in road scenes, selects the SSD detection method as the baseline, detecting road objects quickly and accurately;
3. Before target detection and semantic segmentation, a multi-scale receptive field module is introduced. It is composed of convolution layers of different sizes with corresponding dilated convolutions of different rates, and performs multi-scale feature fusion, which handles the multi-scale problem well, for example the inability to accurately detect at the same time objects with large scale differences, such as a pedestrian and a bus on the road;
4. In conclusion, compared with the prior art, the method completes two common perception tasks of autonomous driving, road object detection and road drivable-area segmentation, more quickly and accurately.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a structural diagram of the multi-scale receptive field module of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in fig. 1 and 2, a multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network comprises a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module. The feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the alternatively drivable area, and to the target detection module in the lower branch to detect the objects appearing in the road scene. The multi-scale receptive field module enlarges the receptive field of the feature maps and uses convolutions of different dilation rates to address the multi-scale problem; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model.
Specifically, the feature extraction module extracts features from the RGB image through the lightweight convolutional neural network MobileNet. MobileNet replaces conventional convolutions with depthwise separable convolutions to reduce the number of model parameters, which shortens prediction time and lowers the demand on hardware resources; the network is small, computationally cheap and accurate, giving it great advantages among lightweight neural networks. As feature extraction proceeds, the feature maps become smaller, their receptive fields larger, and their semantic information richer.
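For illustration, a minimal PyTorch sketch of a depthwise separable convolution block follows. It is a hedged illustration of the building unit MobileNet substitutes for standard convolution, not the patent's exact layer configuration; the channel sizes in the usage example are arbitrary.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv, MobileNet style."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise: 1x1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# A standard 3x3 conv from 128 to 256 channels holds 3*3*128*256 = 294,912
# weights; the separable version holds 3*3*128 + 128*256 = 33,920, about 8.7x fewer.
x = torch.randn(1, 128, 38, 38)
print(DepthwiseSeparableConv(128, 256)(x).shape)  # torch.Size([1, 256, 38, 38])
```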
The SSD detection algorithm is taken as the detection baseline model, and a multi-scale receptive field module is added to the target detection module, as shown in fig. 2. The module is composed of parallel dilated (atrous) convolutions with different dilation rates, which enlarge the receptive field at multiple scales without changing the feature-map size, thereby addressing the multi-scale problem. Dilated convolutions with rates 5 and 7 enlarge the receptive field for large objects, a dilated convolution with rate 3 enlarges it for small objects, and the outputs of these differently dilated convolution layers are finally merged together, handling well the multi-scale problem common in road scenes.
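The multi-scale receptive field idea can be sketched as below: parallel 3x3 dilated convolutions with rates 3, 5 and 7 cover increasingly large contexts at unchanged spatial resolution, and their outputs are concatenated and fused. The branch width and the 1x1 fusion convolution are assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class MultiScaleReceptiveField(nn.Module):
    """Parallel dilated convolutions whose outputs are concatenated and fused."""
    def __init__(self, in_ch: int, branch_ch: int, rates=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding=rate keeps HxW unchanged for a 3x3 dilated conv.
                nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(branch_ch * len(rates), in_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feat = torch.randn(1, 256, 19, 19)  # e.g. one SSD feature map
print(MultiScaleReceptiveField(256, 128)(feat).shape)  # torch.Size([1, 256, 19, 19])
```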
In addition, the features extracted by the backbone network MobileNet are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the alternatively drivable area. As shown in fig. 1, the first two levels of feature maps are merged, which enriches semantic information while preserving the feature-map scale. A multi-scale receptive field module is likewise added to the semantic segmentation module: the second-level feature maps pass through dilated convolutions with rates 1, 3 and 6 to address the multi-scale problem, and the resulting feature maps are merged and then decoded to complete the segmentation of the road driving area.
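A sketch of such a segmentation head is given below: the deeper feature map is upsampled and merged with the shallower one, passed through parallel dilated convolutions with rates 1, 3 and 6, and decoded back to input resolution. The channel counts, the bilinear upsampling, the assumed stride of 8 and the three-class output (background, directly drivable, alternatively drivable) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationHead(nn.Module):
    """Merge two pyramid levels, apply dilated convs (rates 1, 3, 6), decode."""
    def __init__(self, shallow_ch=256, deep_ch=512, mid_ch=128, n_classes=3):
        super().__init__()
        self.reduce = nn.Conv2d(shallow_ch + deep_ch, mid_ch, 1)
        self.dilated = nn.ModuleList([
            nn.Conv2d(mid_ch, mid_ch, 3, padding=r, dilation=r) for r in (1, 3, 6)
        ])
        self.classifier = nn.Conv2d(mid_ch * 3, n_classes, 1)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Merge the two levels at the shallower (larger) resolution.
        deep_up = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                                align_corners=False)
        x = F.relu(self.reduce(torch.cat([shallow, deep_up], dim=1)))
        # Parallel dilated convolutions with rates 1, 3 and 6, then concatenation.
        x = torch.cat([F.relu(d(x)) for d in self.dilated], dim=1)
        logits = self.classifier(x)
        # Decode back to input resolution (stride 8 assumed here).
        return F.interpolate(logits, scale_factor=8, mode="bilinear",
                             align_corners=False)

s, d = torch.randn(1, 256, 40, 80), torch.randn(1, 512, 20, 40)
print(SegmentationHead()(s, d).shape)  # torch.Size([1, 3, 320, 640])
```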
The loss function of the multi-task learning is the weighted sum of the loss functions of the branches. The loss of the detection branch is the classification loss plus the regression loss: Loss_detection = Loss_classification + Loss_regression. The loss of the segmentation branch is a class-weighted cross-entropy: Loss_segmentation = weight[class] · CrossEntropyLoss(x, class). The total loss is Loss_total = Loss_detection + Loss_segmentation. The total loss is optimized by iterative training with back propagation until it converges and model training is complete. To balance the losses of the two labels, the drivable area and the alternatively drivable area, experiments showed that the best segmentation result is obtained with weight[class = alternatively drivable area] = 3.
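A minimal sketch of this combined loss follows, assuming an SSD-style detection loss of cross-entropy plus smooth-L1 (omitting the hard-negative mining a full SSD loss applies) and a class-weighted cross-entropy for segmentation; the class ordering is hypothetical.

```python
import torch
import torch.nn.functional as F

def total_loss(cls_logits, cls_targets, box_preds, box_targets,
               seg_logits, seg_targets):
    # Detection branch: Loss_detection = Loss_classification + Loss_regression.
    loss_classification = F.cross_entropy(cls_logits, cls_targets)
    loss_regression = F.smooth_l1_loss(box_preds, box_targets)
    loss_detection = loss_classification + loss_regression

    # Segmentation branch: class-weighted cross-entropy; assumed class order
    # 0 = background, 1 = directly drivable, 2 = alternatively drivable.
    seg_weights = torch.tensor([1.0, 1.0, 3.0], device=seg_logits.device)
    loss_segmentation = F.cross_entropy(seg_logits, seg_targets, weight=seg_weights)

    # Loss_total = Loss_detection + Loss_segmentation.
    return loss_detection + loss_segmentation
```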
In this embodiment, the training of the model comprises the following steps:
S1. The BDD100K dataset published by Berkeley is used as training data; the road object detection task provides 2D bounding boxes for 10 classes, and the drivable-area segmentation task has two different classes, "directly drivable" area and "alternatively drivable" area; the data are divided in an 8:1:1 ratio into corresponding training, validation and test data. BDD100K is a well-annotated dataset for road object detection, instance segmentation, drivable-area segmentation and lane marking detection.
S2. Features are extracted by the lightweight convolutional neural network MobileNet, and the parameters of the MobileNet backbone, the detection branch and the segmentation branch are trained;
S3. After every ten training iterations, one validation pass is performed on the validation set, and the model that performs best on the validation set is taken as the final model (see the training-loop sketch after these steps);
S4. The final model is tested on the test set; the test performance is consistent with that on the validation set.
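A hedged sketch of the schedule in steps S2 and S3 follows, with one validation pass every ten iterations and the best validation checkpoint kept. The evaluate() helper, the data loaders and the target handling are assumptions, and mean IoU is only one plausible selection metric.

```python
import copy
import torch

def train(model, train_loader, val_loader, optimizer, criterion,
          evaluate, device="cuda"):
    """Train and keep the checkpoint that scores best on the validation set."""
    best_score, best_state, it = float("-inf"), None, 0
    model.to(device).train()
    for images, targets in train_loader:
        it += 1
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), targets)  # target handling schematic
        loss.backward()
        optimizer.step()
        if it % 10 == 0:  # one validation pass every ten training iterations
            score = evaluate(model, val_loader, device)  # e.g. mean IoU (assumed)
            if score > best_score:
                best_score = score
                best_state = copy.deepcopy(model.state_dict())
            model.train()
    return best_state
```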
After training is complete and testing reveals no problems, the model can be compressed and deployed on the unmanned vehicle; even uncompressed, the model is only 34 MB, which saves hardware resources well.
Example 1
In implementing the multi-task learning method for real-time target detection and semantic segmentation, training data, validation data and test data are first prepared and processed, the model is then trained and tested, and finally the model is deployed on the unmanned vehicle.
1) Preparation and processing of the training, validation and test data:
Step 1: divide the BDD100K dataset in an 8:1:1 ratio to obtain the corresponding training, validation and test sets;
Step 2: collect statistics on the scale of the detected objects in each image of the training set, to facilitate subsequent verification;
Step 3: apply data augmentation to the training data, including image flipping, image cropping, brightness and saturation changes, and normalization, to make full use of the data (a data-preparation sketch follows).
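The 8:1:1 split and the augmentations above might look as follows in PyTorch. The BDD100KDataset wrapper, its path and the crop size are hypothetical, and for brevity the transforms act on the image only; in a real detection/segmentation pipeline the boxes and masks must be transformed jointly.

```python
import torch
from torch.utils.data import random_split
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),                       # image flipping
    transforms.RandomCrop((640, 640)),                       # image cropping (assumed size)
    transforms.ColorJitter(brightness=0.3, saturation=0.3),  # brightness/saturation change
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],         # normalization with
                         std=[0.229, 0.224, 0.225]),         # ImageNet statistics
])

dataset = BDD100KDataset("/data/bdd100k", transform=train_tf)  # hypothetical wrapper
n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0))  # fixed seed for a reproducible split
```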
2) Detailed process of model training:
Step 1: use PyTorch as the deep learning framework, pre-train MobileNet on ImageNet-1K, and select the best-performing MobileNet model as the pre-trained model;
Step 2: four Titan Xp GPUs are used for training, each with 12 GB of video memory; more GPUs allow a larger batch_size, which yields a better trained model;
Step 3: the model parameters are mainly obtained through transfer learning: the MobileNet backbone parameters are fine-tuned, while the parameters of the detection branch and the segmentation branch are randomly initialized from a Gaussian distribution and trained from scratch;
Step 4: gradient descent uses SGD with a batch_size of 28 per GPU, a weight decay of 0.0005 and a learning rate of 0.004 for 30 rounds of training; the model loss function is the weighted sum of the detection and segmentation loss functions, and repeated experiments verified that setting the segmentation loss coefficient to 3 gives the best model result (see the configuration sketch after these steps);
Step 5: select the model with the best result on the validation set as the final model; if necessary, the model can further be compressed to reduce the hardware requirement.
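The hyper-parameters of step 4 as a PyTorch configuration sketch; the momentum value and the construction of the actual model and data loaders are assumptions not stated above.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3)  # stand-in for the real multi-task network (assumed)
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.004,             # learning rate from step 4
                            momentum=0.9,         # assumed; momentum is not stated
                            weight_decay=0.0005)  # weight decay from step 4

batch_size_per_gpu = 28       # on each of the four Titan Xp GPUs (12 GB each)
epochs = 30                   # 30 rounds of training
segmentation_loss_weight = 3  # coefficient on the segmentation loss term
```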
3) The trained model is deployed on the unmanned vehicle and verified in road scenes. By debugging and observing the model's detection and segmentation metrics on each object class, the classes with poor metrics are further optimized; after debugging, detection of road objects and segmentation of the drivable and alternatively drivable areas of the road ahead can be completed through the camera.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its implementation. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (8)
1. A multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network, characterized by comprising a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module; the feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the alternatively drivable area, and to the target detection module in the lower branch to detect the objects appearing in the road scene; the multi-scale receptive field module enlarges the receptive field of the feature maps and uses convolutions of different dilation rates to address the multi-scale problem; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model.
2. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network according to claim 1, characterized in that the feature extraction module extracts features from the RGB image through the lightweight convolutional neural network MobileNet; MobileNet replaces conventional convolutions with depthwise separable convolutions to reduce the number of model parameters.
3. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network according to claim 1, characterized in that the SSD detection algorithm is taken as the detection baseline model and a multi-scale receptive field module is added to the target detection module; the multi-scale receptive field module is composed of dilated convolutions with different dilation rates, which enlarge the receptive field at multiple scales without changing the feature-map size, to address the multi-scale problem.
4. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network according to claim 3, characterized in that the features extracted by the backbone network MobileNet are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the alternatively drivable area; the first two levels of feature maps are merged, a multi-scale receptive field module is also added to the semantic segmentation module, and dilated convolutions with different rates are applied to the second-level feature maps.
5. The method according to claim 3, characterized in that in the multi-scale receptive field module added to the target detection module, dilated convolutions with rates 5 and 7 enlarge the receptive field for large objects, a dilated convolution with rate 3 enlarges it for small objects, and the convolution layers of different sizes are finally merged together.
6. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network according to claim 4, characterized in that in the multi-scale receptive field module added to the semantic segmentation module, dilated convolutions with rates 1, 3 and 6 are selected to address the multi-scale problem, and the feature maps are finally merged and then decoded to complete the segmentation of the road driving area.
7. The method according to any one of claims 2 to 6, characterized in that the loss function of the multi-task learning is obtained by weighted summation of the loss functions of the branches; the loss of the detection branch is the classification loss plus the regression loss, Loss_detection = Loss_classification + Loss_regression; the loss of the segmentation branch is Loss_segmentation = weight[class] · CrossEntropyLoss(x, class); the total loss is Loss_total = Loss_detection + Loss_segmentation; the total loss is optimized by iterative training with back propagation until it converges and model training is complete.
8. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network according to claim 7, characterized in that the training of the model comprises the following steps:
S1. The BDD100K dataset published by Berkeley is used as training data; the road object detection task provides 2D bounding boxes for 10 classes, and the drivable-area segmentation task has two different classes, "directly drivable" area and "alternatively drivable" area; the data are divided in an 8:1:1 ratio into corresponding training, validation and test data;
S2. Features are extracted by the lightweight convolutional neural network MobileNet, and the parameters of the MobileNet backbone, the detection branch and the segmentation branch are trained;
S3. After every ten training iterations, one validation pass is performed on the validation set, and the model that performs best on the validation set is taken as the final model;
S4. The final model is tested on the test set; the test performance is consistent with that on the validation set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911060977.1A CN110941995A (en) | 2019-11-01 | 2019-11-01 | Real-time target detection and semantic segmentation multi-task learning method based on lightweight network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911060977.1A CN110941995A (en) | 2019-11-01 | 2019-11-01 | Real-time target detection and semantic segmentation multi-task learning method based on lightweight network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110941995A true CN110941995A (en) | 2020-03-31 |
Family
ID=69907282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911060977.1A Pending CN110941995A (en) | 2019-11-01 | 2019-11-01 | Real-time target detection and semantic segmentation multi-task learning method based on lightweight network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110941995A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111695494A (en) * | 2020-06-10 | 2020-09-22 | 上海理工大学 | Three-dimensional point cloud data classification method based on multi-view convolution pooling |
CN111783784A (en) * | 2020-06-30 | 2020-10-16 | 创新奇智(合肥)科技有限公司 | Method and device for detecting building cavity, electronic equipment and storage medium |
CN111797717A (en) * | 2020-06-17 | 2020-10-20 | 电子科技大学 | High-speed high-precision SAR image ship detection method |
CN111882620A (en) * | 2020-06-19 | 2020-11-03 | 江苏大学 | Road drivable area segmentation method based on multi-scale information |
CN111898439A (en) * | 2020-06-29 | 2020-11-06 | 西安交通大学 | Deep learning-based traffic scene joint target detection and semantic segmentation method |
CN112084864A (en) * | 2020-08-06 | 2020-12-15 | 中国科学院空天信息创新研究院 | Model optimization method and device, electronic equipment and storage medium |
CN112101366A (en) * | 2020-09-11 | 2020-12-18 | 湖南大学 | Real-time segmentation system and method based on hybrid expansion network |
CN112183395A (en) * | 2020-09-30 | 2021-01-05 | 深兰人工智能(深圳)有限公司 | Road scene recognition method and system based on multitask learning neural network |
CN112257794A (en) * | 2020-10-27 | 2021-01-22 | 东南大学 | YOLO-based lightweight target detection method |
CN112528982A (en) * | 2020-11-18 | 2021-03-19 | 燕山大学 | Method, device and system for detecting water gauge line of ship |
CN112634276A (en) * | 2020-12-08 | 2021-04-09 | 西安理工大学 | Lightweight semantic segmentation method based on multi-scale visual feature extraction |
CN112633086A (en) * | 2020-12-09 | 2021-04-09 | 西安电子科技大学 | Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet |
CN112733662A (en) * | 2020-12-31 | 2021-04-30 | 上海智臻智能网络科技股份有限公司 | Feature detection method and device |
CN113486718A (en) * | 2021-06-08 | 2021-10-08 | 天津大学 | Fingertip detection method based on deep multitask learning |
CN113554156A (en) * | 2021-09-22 | 2021-10-26 | 中国海洋大学 | Multi-task learning model construction method based on attention mechanism and deformable convolution |
CN113902896A (en) * | 2021-09-24 | 2022-01-07 | 西安电子科技大学 | Infrared target detection method based on enlarged receptive field |
CN116012953A (en) * | 2023-03-22 | 2023-04-25 | 南京邮电大学 | Lightweight double-task sensing method based on CSI |
CN116612122A (en) * | 2023-07-20 | 2023-08-18 | 湖南快乐阳光互动娱乐传媒有限公司 | Image significance region detection method and device, storage medium and electronic equipment |
CN117746264A (en) * | 2023-12-07 | 2024-03-22 | 河北翔拓航空科技有限公司 | Multitasking implementation method for unmanned aerial vehicle detection and road segmentation |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709568A (en) * | 2016-12-16 | 2017-05-24 | 北京工业大学 | RGB-D image object detection and semantic segmentation method based on deep convolution network |
CN107133616A (en) * | 2017-04-02 | 2017-09-05 | 南京汇川图像视觉技术有限公司 | A kind of non-division character locating and recognition methods based on deep learning |
CN107564034A (en) * | 2017-07-27 | 2018-01-09 | 华南理工大学 | The pedestrian detection and tracking of multiple target in a kind of monitor video |
CN108875595A (en) * | 2018-05-29 | 2018-11-23 | 重庆大学 | A kind of Driving Scene object detection method merged based on deep learning and multilayer feature |
CN109145769A (en) * | 2018-08-01 | 2019-01-04 | 辽宁工业大学 | The target detection network design method of blending image segmentation feature |
CN109325534A (en) * | 2018-09-22 | 2019-02-12 | 天津大学 | A kind of semantic segmentation method based on two-way multi-Scale Pyramid |
CN109635694A (en) * | 2018-12-03 | 2019-04-16 | 广东工业大学 | A kind of pedestrian detection method, device, equipment and computer readable storage medium |
CN109685017A (en) * | 2018-12-26 | 2019-04-26 | 中山大学 | A kind of ultrahigh speed real-time target detection system and detection method based on light weight neural network |
CN109741318A (en) * | 2018-12-30 | 2019-05-10 | 北京工业大学 | The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field |
CN110222593A (en) * | 2019-05-18 | 2019-09-10 | 四川弘和通讯有限公司 | A kind of vehicle real-time detection method based on small-scale neural network |
- 2019-11-01: Application CN201911060977.1A filed in China (CN); publication CN110941995A, status Pending.
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709568A (en) * | 2016-12-16 | 2017-05-24 | 北京工业大学 | RGB-D image object detection and semantic segmentation method based on deep convolution network |
CN107133616A (en) * | 2017-04-02 | 2017-09-05 | 南京汇川图像视觉技术有限公司 | A kind of non-division character locating and recognition methods based on deep learning |
CN107564034A (en) * | 2017-07-27 | 2018-01-09 | 华南理工大学 | The pedestrian detection and tracking of multiple target in a kind of monitor video |
CN108875595A (en) * | 2018-05-29 | 2018-11-23 | 重庆大学 | A kind of Driving Scene object detection method merged based on deep learning and multilayer feature |
CN109145769A (en) * | 2018-08-01 | 2019-01-04 | 辽宁工业大学 | The target detection network design method of blending image segmentation feature |
CN109325534A (en) * | 2018-09-22 | 2019-02-12 | 天津大学 | A kind of semantic segmentation method based on two-way multi-Scale Pyramid |
CN109635694A (en) * | 2018-12-03 | 2019-04-16 | 广东工业大学 | A kind of pedestrian detection method, device, equipment and computer readable storage medium |
CN109685017A (en) * | 2018-12-26 | 2019-04-26 | 中山大学 | A kind of ultrahigh speed real-time target detection system and detection method based on light weight neural network |
CN109741318A (en) * | 2018-12-30 | 2019-05-10 | 北京工业大学 | The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field |
CN110222593A (en) * | 2019-05-18 | 2019-09-10 | 四川弘和通讯有限公司 | A kind of vehicle real-time detection method based on small-scale neural network |
Non-Patent Citations (1)
Title |
---|
BAI Jie et al., "Traffic scene understanding using image semantic segmentation with lightweight convolutional neural networks", Journal of Automotive Safety and Energy * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111695494A (en) * | 2020-06-10 | 2020-09-22 | 上海理工大学 | Three-dimensional point cloud data classification method based on multi-view convolution pooling |
CN111797717A (en) * | 2020-06-17 | 2020-10-20 | 电子科技大学 | High-speed high-precision SAR image ship detection method |
CN111797717B (en) * | 2020-06-17 | 2022-03-15 | 电子科技大学 | High-speed high-precision SAR image ship detection method |
CN111882620B (en) * | 2020-06-19 | 2024-08-02 | 江苏大学 | Road drivable area segmentation method based on multi-scale information |
CN111882620A (en) * | 2020-06-19 | 2020-11-03 | 江苏大学 | Road drivable area segmentation method based on multi-scale information |
CN111898439A (en) * | 2020-06-29 | 2020-11-06 | 西安交通大学 | Deep learning-based traffic scene joint target detection and semantic segmentation method |
CN111783784A (en) * | 2020-06-30 | 2020-10-16 | 创新奇智(合肥)科技有限公司 | Method and device for detecting building cavity, electronic equipment and storage medium |
CN112084864A (en) * | 2020-08-06 | 2020-12-15 | 中国科学院空天信息创新研究院 | Model optimization method and device, electronic equipment and storage medium |
CN112101366A (en) * | 2020-09-11 | 2020-12-18 | 湖南大学 | Real-time segmentation system and method based on hybrid expansion network |
CN112183395A (en) * | 2020-09-30 | 2021-01-05 | 深兰人工智能(深圳)有限公司 | Road scene recognition method and system based on multitask learning neural network |
CN112257794A (en) * | 2020-10-27 | 2021-01-22 | 东南大学 | YOLO-based lightweight target detection method |
CN112528982A (en) * | 2020-11-18 | 2021-03-19 | 燕山大学 | Method, device and system for detecting water gauge line of ship |
CN112634276A (en) * | 2020-12-08 | 2021-04-09 | 西安理工大学 | Lightweight semantic segmentation method based on multi-scale visual feature extraction |
CN112634276B (en) * | 2020-12-08 | 2023-04-07 | 西安理工大学 | Lightweight semantic segmentation method based on multi-scale visual feature extraction |
CN112633086A (en) * | 2020-12-09 | 2021-04-09 | 西安电子科技大学 | Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet |
CN112633086B (en) * | 2020-12-09 | 2024-01-26 | 西安电子科技大学 | Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet |
CN112733662A (en) * | 2020-12-31 | 2021-04-30 | 上海智臻智能网络科技股份有限公司 | Feature detection method and device |
CN113486718A (en) * | 2021-06-08 | 2021-10-08 | 天津大学 | Fingertip detection method based on deep multitask learning |
CN113554156A (en) * | 2021-09-22 | 2021-10-26 | 中国海洋大学 | Multi-task learning model construction method based on attention mechanism and deformable convolution |
CN113902896A (en) * | 2021-09-24 | 2022-01-07 | 西安电子科技大学 | Infrared target detection method based on enlarged receptive field |
CN116012953A (en) * | 2023-03-22 | 2023-04-25 | 南京邮电大学 | Lightweight double-task sensing method based on CSI |
CN116612122A (en) * | 2023-07-20 | 2023-08-18 | 湖南快乐阳光互动娱乐传媒有限公司 | Image significance region detection method and device, storage medium and electronic equipment |
CN116612122B (en) * | 2023-07-20 | 2023-10-10 | 湖南快乐阳光互动娱乐传媒有限公司 | Image significance region detection method and device, storage medium and electronic equipment |
CN117746264A (en) * | 2023-12-07 | 2024-03-22 | 河北翔拓航空科技有限公司 | Multitasking implementation method for unmanned aerial vehicle detection and road segmentation |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200331 |