CN110941995A - Real-time target detection and semantic segmentation multi-task learning method based on lightweight network - Google Patents

Real-time target detection and semantic segmentation multi-task learning method based on lightweight network

Info

Publication number
CN110941995A
CN110941995A
Authority
CN
China
Prior art keywords
module
loss
semantic segmentation
target detection
segmentation
Prior art date
Legal status
Pending
Application number
CN201911060977.1A
Other languages
Chinese (zh)
Inventor
侯舟帆 (Hou Zhoufan)
陈龙 (Chen Long)
张亚琛 (Zhang Yachen)
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911060977.1A
Publication of CN110941995A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a real-time target detection and semantic segmentation multi-task learning method based on a lightweight network. The system comprises a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module. The feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are fed to the semantic segmentation module to segment the drivable road area and the selectable driving area, and to the target detection module to detect objects appearing in the road scene. A multi-scale receptive field module enlarges the receptive field of the feature maps and addresses the multi-scale problem with convolutions of different scales; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model. Compared with the prior art, the method completes two common perception tasks in autonomous driving, road object detection and drivable-area segmentation, more quickly and accurately.

Description

Real-time target detection and semantic segmentation multi-task learning method based on lightweight network
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a real-time target detection and semantic segmentation multi-task learning method based on a lightweight network.
Background
Computer vision is becoming increasingly important in autonomous driving, mainly due to the rise of deep learning techniques based on neural networks. The growing availability of public data sets and capable hardware has spurred related research and further pushed the development of computer vision technology. Many computer vision tasks are used in autonomous vehicles, such as object detection and road segmentation, which are crucial for perceiving the driving environment. The current trend is to continuously improve the accuracy of these tasks while keeping inference time as short as possible. A model that is accurate but slow endangers the decision making of an autonomous vehicle: it cannot react in time to sudden events, so the model must predict quickly enough to leave the vehicle sufficient time to decide. In addition, the hardware resources on an autonomous vehicle are limited, so using them efficiently is also important. Finally, objects in road scenes differ greatly in scale, and conventional models cannot accurately perceive large and small objects at the same time, which causes many practical problems.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a lightweight-network-based multi-task learning method for real-time target detection and semantic segmentation that completes two common perception tasks in autonomous driving, road object detection and drivable-area segmentation, more quickly and accurately.
To solve these technical problems, the invention adopts the following technical scheme. A multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network comprises a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module. The feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the selectable driving area, and to the target detection module in the lower branch to detect objects appearing in the road scene. A multi-scale receptive field module enlarges the receptive field of the feature maps and addresses the multi-scale problem with convolutions of different scales; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model.
Further, the feature extraction module extracts features from the RGB image through the lightweight convolutional neural network MobileNet. MobileNet replaces conventional convolutions with depthwise separable convolutions to reduce the number of model parameters, which shortens prediction time and lowers the demand on hardware resources. The MobileNet network is small, computationally cheap and accurate, giving it great advantages among lightweight neural networks. During feature extraction, the deeper a feature map is in the network, the smaller its spatial size, the larger its receptive field, and the richer its semantic information.
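The parameter saving behind the depthwise separable convolution can be checked by direct arithmetic. The sketch below is illustrative only (not part of the patent text), and the kernel and channel sizes are hypothetical:

```python
def standard_conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution, as used in MobileNet."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
std = standard_conv_params(3, 32, 64)   # 18432 parameters
sep = separable_conv_params(3, 32, 64)  # 2336 parameters
print(std, sep, round(sep / std, 3))
```

For this layer the separable form needs roughly one eighth of the parameters, which is the source of MobileNet's smaller size and faster prediction.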
Further, the SSD detection algorithm is taken as the detection baseline model, and a multi-scale receptive field module is added to the target detection module. The module is composed of dilated (atrous) convolutions with different dilation rates; these enlarge the receptive field at multiple scales without changing the feature-map size, addressing the multi-scale problem. Dilated convolutions with rates 5 and 7 enlarge the receptive field for large objects, a dilated convolution with rate 3 enlarges it for small objects, and the outputs of these differently sized convolution layers are finally merged together, which handles well the multi-scale problem common in road scenes.
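Interpreting the module's ratios as dilation rates of 3x3 dilated convolutions (an assumption; the patent does not state the base kernel size), the enlarged receptive field can be computed with the standard formula k + (k - 1)(d - 1), while the parameter count stays that of a plain 3x3 kernel:

```python
def effective_kernel(k, dilation):
    """Effective receptive-field size of a k x k convolution with the
    given dilation rate: k + (k - 1) * (dilation - 1). The number of
    weights stays k * k regardless of the dilation rate."""
    return k + (k - 1) * (dilation - 1)

# 3x3 dilated convolutions at the rates named in the text (3, 5, 7).
for d in (3, 5, 7):
    print(d, effective_kernel(3, d))  # 3 -> 7, 5 -> 11, 7 -> 15
```

So the rate-3 branch sees a 7x7 region (small objects) and the rate-5 and rate-7 branches see 11x11 and 15x15 regions (large objects), all with 3x3-sized kernels.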
Furthermore, the features extracted by the MobileNet backbone are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the selectable driving area. The feature maps of the first two stages are merged, which enriches semantic information while preserving the feature-map scale. A multi-scale receptive field module is likewise added to the semantic segmentation module: dilated convolutions with rates 1, 3 and 6 are applied to the second-stage feature map to address the multi-scale problem, and the resulting feature maps are merged and then decoded to complete the segmentation of the road driving area.
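The shape bookkeeping of the two-stage merge can be sketched as follows; the 512x512 input, the stride values and the channel counts are hypothetical, since the patent does not specify them:

```python
def feature_map_sizes(input_size, strides):
    """Spatial size of each backbone stage for a square input image."""
    return [input_size // s for s in strides]

# Hypothetical 512x512 input; two MobileNet stages at strides 16 and 32.
deep, deeper = feature_map_sizes(512, [16, 32])  # 32, 16

# Upsample the deeper (semantically richer) map by 2x so it matches the
# shallower map's spatial scale, then concatenate along channels.
upsampled = deeper * 2
assert upsampled == deep
merged_channels = 256 + 256  # hypothetical per-stage channel counts
print(deep, upsampled, merged_channels)
```

The point of the merge is visible in the arithmetic: the deeper map contributes richer semantics, while upsampling it back to the shallower map's size keeps the feature-map scale needed for decoding.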
Furthermore, in the multi-scale receptive field module added to the target detection module, dilated convolutions with rates 5 and 7 enlarge the receptive field for large objects, a dilated convolution with rate 3 enlarges it for small objects, and the differently sized convolution layers are finally merged together.
Furthermore, in the multi-scale receptive field module added to the semantic segmentation module, dilated convolutions with rates 1, 3 and 6 are selected to address the multi-scale problem; finally the feature maps are merged and then decoded to complete the segmentation of the road driving area.
Further, the loss function of the multi-task learning is a weighted sum of the loss functions of the branches. The loss function of the detection branch is the classification loss plus the regression loss: Loss_detection = Loss_classification + Loss_regression. The loss function of the segmentation branch is Loss_segmentation = weight[class] * CrossEntropyLoss(x, class). The total loss is Loss_total = Loss_detection + Loss_segmentation. The total loss is optimized by iterative training and back propagation until the loss converges and model training is finished. To balance the losses of the two labels, the drivable area and the selectable driving area, experiments showed that the best segmentation result is obtained when weight[class = selectable driving area] = 3.
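The loss combination above can be sketched with scalar stand-ins (a minimal illustration; in practice these terms are tensors averaged over boxes and pixels, and the example class probabilities and weights are hypothetical):

```python
import math

def cross_entropy(probs, target):
    """-log p(target) for one pixel's predicted class distribution."""
    return -math.log(probs[target])

def segmentation_loss(probs, target, class_weight):
    """Loss_segmentation = weight[class] * CrossEntropyLoss(x, class)."""
    return class_weight[target] * cross_entropy(probs, target)

def total_loss(loss_cls, loss_reg, loss_seg):
    """Loss_total = (Loss_classification + Loss_regression) + Loss_seg."""
    return (loss_cls + loss_reg) + loss_seg

# Hypothetical pixel over classes [background, drivable, selectable],
# with the weight of 3 on the selectable driving area from the text.
weights = [1.0, 1.0, 3.0]
seg = segmentation_loss([0.2, 0.1, 0.7], 2, weights)  # 3 * -log(0.7)
print(round(total_loss(0.5, 0.4, seg), 4))
```

With the weight fixed at 3, errors on the rarer selectable driving area are penalized three times as heavily, which is what balances the two segmentation labels.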
Further, the training step of the model comprises:
S1. The public Berkeley data set BDD100K is used as training data. The road object detection task provides 2D bounding boxes for 10 classes, and the drivable-area segmentation task contains two different classes: "directly drivable" areas and "other drivable" areas. The data are split 8:1:1 into training, validation and test data. BDD100K is a well-annotated data set for road object detection, instance segmentation, drivable-area segmentation and lane-marking detection.
S2. Features are extracted through the lightweight convolutional neural network MobileNet, and the parameters of the MobileNet backbone, the detection branch and the segmentation branch are trained.
S3. After every ten training iterations, one validation pass is run on the validation set, and the model that performs best on the validation set is taken as the final model.
S4. The final model is tested on the test set; the test performance is consistent with that on the validation set.
Once training is complete and testing shows no problems, the model can be compressed and deployed on the autonomous vehicle; even uncompressed, the model is only 34 MB, which saves hardware resources.
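The 8:1:1 split in step S1 can be sketched as follows (integer indices stand in for the image files; the slicing scheme is one plausible implementation, not the patent's stated one):

```python
def split_811(items):
    """Split a dataset 8:1:1 into train / validation / test subsets."""
    n = len(items)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# BDD100K contains 100k images; stand-in indices illustrate the sizes.
train, val, test = split_811(list(range(100_000)))
print(len(train), len(val), len(test))  # 80000 10000 10000
```

In practice the items would be shuffled before slicing so each subset reflects the overall scene distribution.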
Compared with the prior art, the beneficial effects are:
1. The multi-task learning method, based on MobileNet with jointly trained target detection and semantic segmentation, feeds the extracted features into both a detection branch and a segmentation branch, solving road object detection and drivable-area segmentation with a single model.
2. When perceiving the road environment, object detection is relatively time-consuming. The method adopts a single-stage detector, and for the problem of large scale differences among road objects, selects the SSD detection method as the baseline, detecting road objects quickly and accurately.
3. Before target detection and semantic segmentation, a multi-scale receptive field module is introduced. It is composed of convolution layers of different sizes with correspondingly different dilation rates and performs multi-scale feature fusion, which handles the multi-scale problem well, for example the inability to accurately detect, at the same time, objects whose scales differ greatly, such as a pedestrian and a bus on the road.
4. In conclusion, compared with the prior art, the method completes two common perception tasks in autonomous driving, road object detection and drivable-area segmentation, more quickly and accurately.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a diagram of the multi-scale receptive field module of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in fig. 1 and 2, a lightweight-network-based multi-task learning method for real-time target detection and semantic segmentation comprises a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module. The feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the selectable driving area, and to the target detection module in the lower branch to detect objects appearing in the road scene. A multi-scale receptive field module enlarges the receptive field of the feature maps and addresses the multi-scale problem with convolutions of different scales; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model.
Specifically, the feature extraction module extracts features from the RGB image through the lightweight convolutional neural network MobileNet. MobileNet replaces conventional convolutions with depthwise separable convolutions to reduce the number of model parameters, which shortens prediction time and lowers the demand on hardware resources. The MobileNet network is small, computationally cheap and accurate, giving it great advantages among lightweight neural networks. During feature extraction, the deeper a feature map is in the network, the smaller its spatial size, the larger its receptive field, and the richer its semantic information.
The SSD detection algorithm is taken as the detection baseline model, and a multi-scale receptive field module is added to the target detection module, as shown in fig. 2. The module is composed of dilated (atrous) convolutions with different dilation rates; these enlarge the receptive field at multiple scales without changing the feature-map size, addressing the multi-scale problem. Dilated convolutions with rates 5 and 7 enlarge the receptive field for large objects, a dilated convolution with rate 3 enlarges it for small objects, and the outputs of these differently sized convolution layers are finally merged together, which handles well the multi-scale problem common in road scenes.
In addition, as shown in FIG. 1, the features extracted by the MobileNet backbone are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the selectable driving area. The feature maps of the first two stages are merged, which enriches semantic information while preserving the feature-map scale. A multi-scale receptive field module is likewise added to the semantic segmentation module: dilated convolutions with rates 1, 3 and 6 are applied to the second-stage feature map to address the multi-scale problem, and the resulting feature maps are merged and then decoded to complete the segmentation of the road driving area.
The loss function of the multi-task learning is a weighted sum of the loss functions of the branches. The loss function of the detection branch is the classification loss plus the regression loss: Loss_detection = Loss_classification + Loss_regression. The loss function of the segmentation branch is Loss_segmentation = weight[class] * CrossEntropyLoss(x, class). The total loss is Loss_total = Loss_detection + Loss_segmentation. The total loss is optimized by iterative training and back propagation until the loss converges and model training is finished. To balance the losses of the two labels, the drivable area and the selectable driving area, experiments showed that the best segmentation result is obtained when weight[class = selectable driving area] = 3.
In this embodiment, the training step of the model includes:
S1. The public Berkeley data set BDD100K is used as training data. The road object detection task provides 2D bounding boxes for 10 classes, and the drivable-area segmentation task contains two different classes: "directly drivable" areas and "other drivable" areas. The data are split 8:1:1 into training, validation and test data. BDD100K is a well-annotated data set for road object detection, instance segmentation, drivable-area segmentation and lane-marking detection.
S2. Features are extracted through the lightweight convolutional neural network MobileNet, and the parameters of the MobileNet backbone, the detection branch and the segmentation branch are trained.
S3. After every ten training iterations, one validation pass is run on the validation set, and the model that performs best on the validation set is taken as the final model.
S4. The final model is tested on the test set; the test performance is consistent with that on the validation set.
Once training is complete and testing shows no problems, the model can be compressed and deployed on the autonomous vehicle; even uncompressed, the model is only 34 MB, which saves hardware resources.
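The model-selection rule in step S3 (validate every ten iterations, keep the best checkpoint) can be sketched as a loop; the scoring values below are hypothetical:

```python
def train_with_validation(num_iters, validate, every=10):
    """Run num_iters training iterations; after every `every` iterations
    run one validation pass and remember the best-scoring iteration.
    `validate(i)` returns a score (higher is better) at iteration i."""
    best_iter, best_score = None, float("-inf")
    for i in range(1, num_iters + 1):
        # ... one training iteration would run here ...
        if i % every == 0:
            score = validate(i)
            if score > best_score:
                best_iter, best_score = i, score
    return best_iter, best_score

# Hypothetical validation curve that peaks at iteration 30.
scores = {10: 0.52, 20: 0.61, 30: 0.67, 40: 0.64, 50: 0.63}
print(train_with_validation(50, scores.__getitem__))  # (30, 0.67)
```

In a real run, `validate` would compute the segmentation and detection metrics on the validation set and the best checkpoint's weights would be saved to disk.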
Example 1
When the multi-task learning method for real-time target detection and semantic segmentation is implemented, training, validation and test data are first prepared, then the model is trained and tested, and finally the model is deployed on an autonomous vehicle.
1) Preparation and processing of training, validation and test data:
Step 1: split the BDD100K data set in the proportion 8:1:1 to obtain the corresponding training, validation and test sets;
Step 2: count the scales of the detection objects in each training image, to facilitate subsequent verification;
Step 3: apply data augmentation to the training data, including image flipping, image cropping, brightness and saturation changes, and normalization, to make full use of the data.
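The flip augmentation in step 3 must mirror the detection boxes along with the pixels. A minimal sketch (toy nested-list image and a hypothetical box; real pipelines operate on arrays):

```python
def hflip_image(image):
    """Horizontally flip an image given as rows of pixel values."""
    return [list(reversed(row)) for row in image]

def hflip_box(box, width):
    """Mirror an (xmin, ymin, xmax, ymax) box across the vertical axis:
    the old right edge becomes the new left edge."""
    xmin, ymin, xmax, ymax = box
    return (width - xmax, ymin, width - xmin, ymax)

img = [[1, 2, 3],
       [4, 5, 6]]
print(hflip_image(img))                 # [[3, 2, 1], [6, 5, 4]]
print(hflip_box((10, 5, 30, 25), 100))  # (70, 5, 90, 25)
```

Cropping requires analogous box clipping, while brightness/saturation changes and normalization leave the box coordinates untouched.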
2) Detailed process of model training:
Step 1: with PyTorch as the deep learning framework, pre-train MobileNet on ImageNet-1K and select the best-performing MobileNet model as the pre-trained model;
Step 2: the training equipment uses 4 Titan Xp GPUs, each with 12 GB of video memory; more GPUs allow a larger batch_size and generally a better-trained model;
Step 3: model training mainly transfers the pre-trained MobileNet backbone and fine-tunes its parameters, while the parameters of the detection branch and the segmentation branch are randomly initialized from a Gaussian distribution and trained from scratch;
Step 4: use SGD for gradient descent with a batch_size of 28 per GPU, weight decay of 0.0005 and a learning rate of 0.004 for 30 epochs; the model loss function is a weighted sum of the detection and segmentation loss functions, and repeated experiments showed that setting the segmentation loss coefficient to 3 gives the best model;
Step 5: select the model with the best result on the validation set as the final model, and compress it further if needed to reduce the hardware requirements.
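The SGD update with the hyperparameters of step 4 can be written out for a single scalar weight (a sketch of plain SGD with L2 weight decay; momentum, if any, is not specified in the text):

```python
def sgd_step(w, grad, lr=0.004, weight_decay=0.0005):
    """One SGD update with L2 weight decay, using the learning rate and
    decay from step 4: w <- w - lr * (grad + weight_decay * w)."""
    return w - lr * (grad + weight_decay * w)

w = 1.0
w = sgd_step(w, grad=0.5)
print(w)  # 1.0 - 0.004 * (0.5 + 0.0005) = 0.997998
```

This matches how `torch.optim.SGD` treats its `weight_decay` argument, as an L2 penalty folded into the gradient.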
3) The trained model is deployed on an autonomous vehicle and verified in road scenes. By debugging and observing the detection and segmentation metrics of the model on each object class, the classes with poor metrics are further optimized; after debugging, the system can detect road objects and segment the drivable area and selectable driving area of the road ahead through the camera.
It should be understood that the above-described embodiments are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention falls within the protection scope of the claims.

Claims (8)

1. A multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network, characterized by comprising a feature extraction module, a semantic segmentation module, a target detection module and a multi-scale receptive field module; the feature extraction module uses the lightweight convolutional neural network MobileNet to extract features, which are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the selectable driving area, and to the target detection module in the lower branch to detect objects appearing in the road scene; a multi-scale receptive field module enlarges the receptive field of the feature maps and addresses the multi-scale problem with convolutions of different scales; finally, the loss functions of the semantic segmentation module and the target detection module are weighted and summed to optimize the overall model.
2. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network as claimed in claim 1, characterized in that the feature extraction module extracts features from the RGB image through the lightweight convolutional neural network MobileNet; MobileNet replaces conventional convolutions with depthwise separable convolutions to reduce the number of model parameters.
3. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network as claimed in claim 1, characterized in that the SSD detection algorithm is used as the detection baseline model and a multi-scale receptive field module is added to the target detection module; the module is composed of dilated convolutions with different dilation rates, which enlarge the receptive field at multiple scales without changing the feature-map size, addressing the multi-scale problem.
4. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network as claimed in claim 3, characterized in that the features extracted by the MobileNet backbone are sent to the semantic segmentation module in the upper branch to segment the drivable road area and the selectable driving area; the feature maps of the first two stages are merged, a multi-scale receptive field module is also added to the semantic segmentation module, and dilated convolutions with different rates are applied to the second-stage feature map.
5. The method as claimed in claim 3, characterized in that the multi-scale receptive field module added to the target detection module uses dilated convolutions with rates 5 and 7 to enlarge the receptive field for large objects and a dilated convolution with rate 3 to enlarge it for small objects, and finally merges the differently sized convolution layers.
6. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network as claimed in claim 4, characterized in that in the multi-scale receptive field module added to the semantic segmentation module, dilated convolutions with rates 1, 3 and 6 are selected to address the multi-scale problem; finally the feature maps are merged and then decoded to complete the segmentation of the road driving area.
7. The method according to any one of claims 2 to 6, wherein the loss function of the multi-task learning is obtained by weighted summation of the loss functions of the branches; the detection-branch loss is the classification loss plus the regression loss, Loss_detection = Loss_classification + Loss_regression; the segmentation-branch loss is Loss_segmentation = weight[class] * CrossEntropyLoss(x, class); and the total loss is Loss_total = Loss_detection + Loss_segmentation. The total loss is optimized by iterative training with back propagation until the loss converges, at which point model training is complete.
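The loss combination in claim 7 can be sketched in plain Python. The weighted cross-entropy below mirrors the claimed weight[class] * CrossEntropyLoss(x, class) form for a single prediction; the function names and example values are illustrative, not from the patent:

```python
import math

def cross_entropy(logits, target, class_weights):
    """Weighted cross-entropy for one prediction:
    weight[class] * (log-sum-exp(logits) - logits[class])."""
    m = max(logits)  # subtract max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return class_weights[target] * (log_sum - logits[target])

def total_loss(cls_loss, reg_loss, seg_loss):
    """Loss_total = (Loss_classification + Loss_regression) + Loss_segmentation."""
    return (cls_loss + reg_loss) + seg_loss

# Illustrative values: detection losses plus one weighted segmentation term.
seg = cross_entropy([2.0, 0.5], 0, [1.0, 1.0])
print(round(total_loss(0.4, 0.2, seg), 4))
```

In practice each term would be averaged over boxes and pixels before summation, but the additive structure is exactly the one the claim states.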
8. The multi-task learning method for real-time target detection and semantic segmentation based on a lightweight network according to claim 7, wherein the model training step comprises:
s1, using the public Berkeley dataset BDD100K as training data; the road object detection task provides 2D bounding boxes for 10 classes, and the drivable-area segmentation task comprises two classes, "directly drivable" areas and "alternatively drivable" areas; the data are split 8:1:1 into the corresponding training, validation and test sets;
s2, extracting features with the lightweight convolutional neural network MobileNet, and training the parameters of the backbone network MobileNet, the detection branch and the segmentation branch;
s3, running one validation pass on the validation set after every ten training iterations, and keeping the model that performs best on the validation set as the final model;
and s4, testing the final model on the test set, where the test performance is consistent with the performance on the validation set.
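Step s1's 8:1:1 partition can be sketched with the standard library alone; the seeded shuffle makes the split reproducible. The function name is illustrative, not from the patent:

```python
import random

def split_dataset(samples, seed=0):
    """Split samples 8:1:1 into train / validation / test sets, per step s1."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# BDD100K contains 100k images, giving an 80k / 10k / 10k split.
train, val, test = split_dataset(range(100000))
print(len(train), len(val), len(test))
```

Any remainder from the integer arithmetic falls into the test split, so no sample is dropped.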
CN201911060977.1A 2019-11-01 2019-11-01 Real-time target detection and semantic segmentation multi-task learning method based on lightweight network Pending CN110941995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911060977.1A CN110941995A (en) 2019-11-01 2019-11-01 Real-time target detection and semantic segmentation multi-task learning method based on lightweight network

Publications (1)

Publication Number Publication Date
CN110941995A true CN110941995A (en) 2020-03-31

Family

ID=69907282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911060977.1A Pending CN110941995A (en) 2019-11-01 2019-11-01 Real-time target detection and semantic segmentation multi-task learning method based on lightweight network

Country Status (1)

Country Link
CN (1) CN110941995A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709568A (en) * 2016-12-16 2017-05-24 北京工业大学 RGB-D image object detection and semantic segmentation method based on deep convolution network
CN107133616A (en) * 2017-04-02 2017-09-05 南京汇川图像视觉技术有限公司 A kind of non-division character locating and recognition methods based on deep learning
CN107564034A (en) * 2017-07-27 2018-01-09 华南理工大学 The pedestrian detection and tracking of multiple target in a kind of monitor video
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 A kind of Driving Scene object detection method merged based on deep learning and multilayer feature
CN109145769A (en) * 2018-08-01 2019-01-04 辽宁工业大学 The target detection network design method of blending image segmentation feature
CN109325534A (en) * 2018-09-22 2019-02-12 天津大学 A kind of semantic segmentation method based on two-way multi-Scale Pyramid
CN109635694A (en) * 2018-12-03 2019-04-16 广东工业大学 A kind of pedestrian detection method, device, equipment and computer readable storage medium
CN109685017A (en) * 2018-12-26 2019-04-26 中山大学 A kind of ultrahigh speed real-time target detection system and detection method based on light weight neural network
CN109741318A (en) * 2018-12-30 2019-05-10 北京工业大学 The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field
CN110222593A (en) * 2019-05-18 2019-09-10 四川弘和通讯有限公司 A kind of vehicle real-time detection method based on small-scale neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bai Jie et al.: "Traffic scene understanding via image semantic segmentation with a lightweight convolutional neural network", Journal of Automotive Safety and Energy *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695494A (en) * 2020-06-10 2020-09-22 上海理工大学 Three-dimensional point cloud data classification method based on multi-view convolution pooling
CN111797717A (en) * 2020-06-17 2020-10-20 电子科技大学 High-speed high-precision SAR image ship detection method
CN111797717B (en) * 2020-06-17 2022-03-15 电子科技大学 High-speed high-precision SAR image ship detection method
CN111882620B (en) * 2020-06-19 2024-08-02 江苏大学 Road drivable area segmentation method based on multi-scale information
CN111882620A (en) * 2020-06-19 2020-11-03 江苏大学 Road drivable area segmentation method based on multi-scale information
CN111898439A (en) * 2020-06-29 2020-11-06 西安交通大学 Deep learning-based traffic scene joint target detection and semantic segmentation method
CN111783784A (en) * 2020-06-30 2020-10-16 创新奇智(合肥)科技有限公司 Method and device for detecting building cavity, electronic equipment and storage medium
CN112084864A (en) * 2020-08-06 2020-12-15 中国科学院空天信息创新研究院 Model optimization method and device, electronic equipment and storage medium
CN112101366A (en) * 2020-09-11 2020-12-18 湖南大学 Real-time segmentation system and method based on hybrid expansion network
CN112183395A (en) * 2020-09-30 2021-01-05 深兰人工智能(深圳)有限公司 Road scene recognition method and system based on multitask learning neural network
CN112257794A (en) * 2020-10-27 2021-01-22 东南大学 YOLO-based lightweight target detection method
CN112528982A (en) * 2020-11-18 2021-03-19 燕山大学 Method, device and system for detecting water gauge line of ship
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction
CN112634276B (en) * 2020-12-08 2023-04-07 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction
CN112633086A (en) * 2020-12-09 2021-04-09 西安电子科技大学 Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet
CN112633086B (en) * 2020-12-09 2024-01-26 西安电子科技大学 Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet
CN112733662A (en) * 2020-12-31 2021-04-30 上海智臻智能网络科技股份有限公司 Feature detection method and device
CN113486718A (en) * 2021-06-08 2021-10-08 天津大学 Fingertip detection method based on deep multitask learning
CN113554156A (en) * 2021-09-22 2021-10-26 中国海洋大学 Multi-task learning model construction method based on attention mechanism and deformable convolution
CN113902896A (en) * 2021-09-24 2022-01-07 西安电子科技大学 Infrared target detection method based on enlarged receptive field
CN116012953A (en) * 2023-03-22 2023-04-25 南京邮电大学 Lightweight double-task sensing method based on CSI
CN116612122A (en) * 2023-07-20 2023-08-18 湖南快乐阳光互动娱乐传媒有限公司 Image significance region detection method and device, storage medium and electronic equipment
CN116612122B (en) * 2023-07-20 2023-10-10 湖南快乐阳光互动娱乐传媒有限公司 Image significance region detection method and device, storage medium and electronic equipment
CN117746264A (en) * 2023-12-07 2024-03-22 河北翔拓航空科技有限公司 Multitasking implementation method for unmanned aerial vehicle detection and road segmentation

Similar Documents

Publication Publication Date Title
CN110941995A (en) Real-time target detection and semantic segmentation multi-task learning method based on lightweight network
CN109118467B (en) Infrared and visible light image fusion method based on generation countermeasure network
CN108764065B (en) Pedestrian re-recognition feature fusion aided learning method
CN106897714B (en) Video motion detection method based on convolutional neural network
CN110059586B (en) Iris positioning and segmenting system based on cavity residual error attention structure
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN108960059A (en) A kind of video actions recognition methods and device
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN110232361B (en) Human behavior intention identification method and system based on three-dimensional residual dense network
CN107563349A (en) A kind of Population size estimation method based on VGGNet
CN110070029A (en) A kind of gait recognition method and device
CN114332473B (en) Object detection method, device, computer apparatus, storage medium, and program product
CN111401116B (en) Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network
CN109740656A (en) A kind of ore method for separating based on convolutional neural networks
CN110569780A (en) high-precision face recognition method based on deep transfer learning
CN114360073B (en) Image recognition method and related device
CN113963170A (en) RGBD image saliency detection method based on interactive feature fusion
CN109919246A (en) Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN112669343A (en) Zhuang minority nationality clothing segmentation method based on deep learning
CN117011883A (en) Pedestrian re-recognition method based on pyramid convolution and transducer double branches
CN117611994A (en) Remote sensing image target detection method based on attention mechanism weighting feature fusion
CN117351487A (en) Medical image segmentation method and system for fusing adjacent area and edge information
CN111310720A (en) Pedestrian re-identification method and system based on graph metric learning
CN113762166A (en) Small target detection improvement method and system based on wearable equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200331