CN113111722A - Automatic driving target identification method based on improved Mask R-CNN - Google Patents

Automatic driving target identification method based on improved Mask R-CNN Download PDF

Info

Publication number
CN113111722A
CN113111722A (application number CN202110287700.3A)
Authority
CN
China
Prior art keywords
frame
recommendation
automatic driving
cnn
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110287700.3A
Other languages
Chinese (zh)
Inventor
董恩增
杨启娟
佟吉钢
冯进峰
张祖锋
于航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Technology filed Critical Tianjin University of Technology
Priority to CN202110287700.3A priority Critical patent/CN113111722A/en
Publication of CN113111722A publication Critical patent/CN113111722A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of machine vision and relates to an automatic driving target identification method based on improved Mask R-CNN. S1, reading the picture information and preprocessing it to obtain a feature map of the picture; S2, inputting the feature map into the region proposal network module to obtain recommendation frames; S3, judging through a classification layer whether a target exists in each recommendation frame, distinguishing target from background within the recommendation frame, determining the target position by bounding-box regression, determining the region of interest (ROI) from the screened feature map, and removing redundant recommendation frames with the non-maximum suppression (NMS) algorithm to obtain accurate recommendation frames; S4, after the ROI is processed, the mask module segments each ROI with an FCN and outputs a feature map; S5, the classification and frame regression module collects the ROI area, computes the classification loss and the Kullback-Leibler-loss-based bounding-box regression loss within the module, and determines the accurate recommendation frame with the NMS method, thereby realizing identification and segmentation of the targets in the picture.

Description

Automatic driving target identification method based on improved Mask R-CNN
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to an automatic driving target identification method based on improved Mask R-CNN.
Background
Automobile automatic driving technology started with Google's driverless car project. In recent years, automatic driving has developed rapidly along with the continuous application of deep learning in image processing, making road environment perception methods based on deep learning feasible for automatic driving. However, in the field of automatic driving, road traffic conditions are complex, vehicle density varies widely, and vehicle speeds fluctuate considerably, so the requirements for identifying vehicles and their surrounding environment are high. Visual perception is the most important component of automatic driving, yet severe weather such as rain, snow and haze, complex road conditions, and dense flows of pedestrians and vehicles still pose challenges to visual perception algorithms.
Research shows that, compared with other deep-learning-based image segmentation methods such as SDS, CFM and MNC, the Mask R-CNN-based image segmentation method can detect and segment different individuals of the same category and greatly improves segmentation accuracy. Mask R-CNN is one of the mainstream frameworks for CNN (convolutional neural network)-based target identification and segmentation: a ResNet-101 + FPN feature extraction network extracts features from the input picture, 9 anchor boxes are then predicted for each pixel of the feature maps, and the 300 anchor boxes with the highest classification scores are selected as the final ROI regions. Finally, the ROI regions are sent to the Mask module, the Classification module and the Bounding box regression module to determine the target category and obtain the accurate target position. Mask R-CNN is a computer vision algorithm relatively close to real human visual perception and has high application value in the field of automatic driving. However, the Mask R-CNN algorithm suffers from poor edge segmentation and poor segmentation of small targets when target bounding boxes are ambiguous.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an automatic driving target identification method based on improved Mask R-CNN. By improving the Mask R-CNN algorithm, the method copes well with complex and changeable traffic environments and accurately detects and segments the vehicle and the road conditions around it, and the improved method has good applicability in automatic driving applications.
Automatic detection, segmentation and identification of large, medium and small targets, especially vehicles and pedestrians, is a key technology for visual perception in automatic driving. Aiming at complex road traffic conditions and the high-precision requirements that changing vehicle speeds place on target segmentation and identification, the invention provides an improved Mask R-CNN algorithm based on KL loss. A ResNet-101 + FPN feature extraction network is adopted to obtain better feature maps and make full use of the features extracted at each stage; KL loss, a network structure that estimates position confidence, is adopted for bounding-box regression, which greatly improves segmentation accuracy for targets with ambiguous bounding boxes while adding almost no extra computational cost. To improve the detection and segmentation of the algorithm in severe weather such as rain and snow, the model is trained on a combination of 8000 pictures from the automatic driving data set Cityscapes and 1942 pictures from the MS-COCO data set. Experimental results show that, compared with Mask R-CNN, the proposed algorithm markedly improves segmentation precision and recall and exhibits good generalization ability and practicability in automatic driving scenes.
In order to achieve the purpose, the invention adopts the following technical scheme:
the automatic driving target identification method based on the improved Mask R-CNN comprises the following steps,
S1, reading the picture information and preprocessing it to obtain a feature map of the picture;
S2, inputting the feature map into the region proposal network module to obtain recommendation frames;
S3, judging through a classification layer whether a target exists in each recommendation frame, distinguishing target from background within the recommendation frame, determining the target position by bounding-box regression, determining the region of interest (ROI) from the screened feature map, and removing redundant recommendation frames with the non-maximum suppression (NMS) algorithm to obtain accurate recommendation frames;
S4, after the ROI is processed, the mask module segments each ROI with the FCN and outputs a feature map;
S5, the classification and frame regression module collects the ROI area, computes the classification loss and the Kullback-Leibler-loss-based bounding-box regression loss within the module, and determines the accurate recommendation frame with the NMS method, realizing identification and segmentation of the targets in the picture.
In a further optimization of the present technical solution, in step S1 the picture is first scaled, then input into the residual network 101 + feature pyramid feature extraction network of the feature extraction network module, and the feature map of the picture is extracted after passing through the full convolution network.
In a further optimization of the present technical solution, in step S2 the region proposal network module traverses the feature map using a sliding window and predicts a plurality of anchor frames for each pixel to generate recommendation frames.
In a further optimization of the present technical solution, the size of the sliding window is 3 × 3.
In a further optimization of the technical scheme, each pixel predicts anchor frames at 6 scales {2, 4, 8, 16, 64, 256} and 9 aspect ratios {0.3:1, 0.5:1, 0.7:1, 0.9:1, 1:1, 1.5:1, 2:1, 2.5:1, 3:1}, for a total of 54 anchor frames.
In a further optimization of the technical scheme, the reference window of the anchor frame is set to 16 × 16, so the area S_k of the anchor frame is

S_k = (16 · 2^k)²,  k ∈ [1, 6]   (1)

With the anchor-frame aspect ratio a:1, the width W_k and height H_k of each anchor frame are

W_k = √(S_k / a)   (2)
H_k = √(S_k · a)   (3)
In a further optimization of the technical solution, the threshold screening formula of the NMS algorithm in step S3 is as follows,

s_i = { s_i,  IoU(M, b_i) < N_t ;  0,  IoU(M, b_i) ≥ N_t }   (4)

where B = {b_1, b_2, ..., b_n} is the series of initial detection frames, S = {s_1, s_2, ..., s_n} are their corresponding classification scores, M is the detection frame with the highest current score, and N_t is the overlap (IoU) threshold.
In a further optimization of the present technical solution, the processing of the ROI in step S4 is to perform a bilinear interpolation alignment (RoIAlign) operation on the ROI and fix the ROI to a uniform size.
In a further optimization of the present technical solution, in step S5 the classification result of the target object and the offsets of the boundary regression are obtained through the full connection layer, as shown in formulas (5) and (6),

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   (5)

L_reg = D_KL(P_D(x) ∥ P_θ(x)) = (x_g − x_e)² / (2σ²) + (1/2) log σ² + (1/2) log 2π − H(P_D(x))   (6)

In formula (5), L_cls(p_i, p_i*) represents the classification loss, defined as L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)], where p_i is the probability that the region recommendation is predicted to be a target object, p_i* is the label of the real calibration frame, N_cls and N_reg are normalization terms, and λ is a balancing weight; L_reg(t_i, t_i*) represents the bounding-box regression loss, defined as smooth_L1(t − t*), where smooth_L1(x) = 0.5x² if |x| < 1 and |x| − 0.5 otherwise. In formula (6), x_g is the basic GT bounding-box position, x_e is the position of the bounding box to be estimated, D_KL is the KL distance, P_D is the basic GT Dirac function, P_θ is the predicted Gaussian distribution, and H(P) is the information entropy.
In a further optimization of the present technical solution, in step S5 the boundary regression loss is defined as the KL distance between the predicted distribution and the distribution of the real calibration frame, and the bounding-box position and the standard deviation of the bounding-box position are used together to evaluate the KL loss and to regress the bounding box.
Different from the prior art, the technical scheme has the following advantages:
A) The automatic driving technology imposes stricter requirements on the detection accuracy and miss rate for tiny targets and occluded objects. The feature extraction network is ResNet-101 + FPN: ResNet adopts cross-layer connections, which makes training easier, and FPN achieves better feature-map fusion. The method obtains feature maps through a bottom-up convolutional neural network and then fuses them via top-down and lateral connections, so that each fused network layer carries both deep and shallow features and the features of every stage are fully exploited.
B) The aspect ratios and scales of the anchor boxes in the Region Proposal Network module are modified to match the field-of-view requirements of automatic driving and the fact that target objects in automatic driving are often small and numerous. The modified anchor boxes improve the detection capability of the RPN and show a notably better recall rate in the detection and segmentation of small targets.
C) In its segmentation task the original Mask R-CNN algorithm depends on regional information, adopts an average cross-entropy loss function, and locates targets by bounding-box regression, so ambiguous bounding boxes degrade boundary segmentation accuracy, which in turn affects the vehicle's driving decisions and the timeliness of its actions during automatic driving. The method adopts KL loss, a network structure that estimates position confidence, to regress the bounding box. This loss function can greatly improve the accuracy of various frameworks at almost no extra computational cost and improves segmentation accuracy for small targets with ambiguous bounding boxes, so that timely and effective decisions can be made during automatic driving.
D) The network model is trained in a staged training mode, and parameter adjustment is performed according to each stage, so that invalid training is avoided, and video memory is saved.
E) The training set adopts a mixed data set of an automatic driving data set Cityscapes and MS-COCO, and the detection segmentation precision and the generalization capability of the model under various weather conditions and complex traffic environments are effectively improved.
Drawings
FIG. 1 is a diagram of an improved Mask R-CNN network;
FIG. 2 is a schematic diagram of a ResNet-101+ FPN feature extraction network;
FIG. 3 is a diagram of a network structure for estimating confidence KL loss of a bounding box position;
FIG. 4 is a graph of Precision-Recall relationship;
FIG. 5 is a graph of the effect of segmentation detection in snowy weather with air disturbances;
FIG. 6 is a graph of the effect of segmentation detected in rainy weather with air disturbances;
FIG. 7 is a diagram of segmentation effect on various vehicles and pedestrians in a complex road scene;
FIG. 8 is a segmentation effect diagram of the target vehicle with occlusion and truncation;
FIG. 9 is a segmentation effect diagram of a target vehicle under different degrees of shielding or smaller targets;
fig. 10 is a diagram showing the effect of segmentation at an intersection where vehicles and pedestrians are comparatively concentrated.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
The invention provides an automatic driving target identification method based on improved Mask R-CNN, which comprises the following steps:
S1, the picture information is read and preprocessed: the picture is scaled to 1600 × 700 and then enters the residual network 101 + feature pyramid (ResNet-101 + FPN) feature extraction network in the Feature extraction network module; fig. 1 is the improved Mask R-CNN network structure diagram. After passing through the 91-layer full convolution network formed by Conv1, Conv2_x, Conv3_x and Conv4_x of ResNet-101 + FPN, the feature maps of the picture are extracted.
S2, the feature map output by the feature extraction network module enters the Region Proposal Network module; fig. 2 is a schematic diagram of the ResNet-101 + FPN feature extraction network. The Region Proposal Network module traverses the feature maps with a 3 × 3 sliding window, predicts multiple anchor boxes for each pixel, and generates proposal boxes. So that the anchor boxes essentially cover the various scales and shapes of the target objects, after extensive experimental verification the invention sets 6 anchor scales {2, 4, 8, 16, 64, 256} and 9 aspect ratios {0.3:1, 0.5:1, 0.7:1, 0.9:1, 1:1, 1.5:1, 2:1, 2.5:1, 3:1} per pixel, giving 54 anchor boxes in total. The reference window of the anchor box is set to 16 × 16, so the area S_k of the anchor box is given by formula (1).
S_k = (16 · 2^k)²,  k ∈ [1, 6]   (1)

With the anchor-box aspect ratio a:1, the width W_k and height H_k of each anchor box are given by formulas (2) and (3).

W_k = √(S_k / a)   (2)
H_k = √(S_k · a)   (3)
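For illustration, the following is a minimal Python sketch of how the 54 anchor shapes per pixel could be enumerated from formulas (1)-(3); the function name and the printed check are illustrative and not part of the patented implementation.

```python
import numpy as np

def generate_anchor_shapes(base=16,
                           ratios=(0.3, 0.5, 0.7, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0)):
    """Enumerate (width, height) for 6 scales x 9 aspect ratios = 54 anchors per pixel."""
    shapes = []
    for k in range(1, 7):                 # k in [1, 6]
        area = (base * 2 ** k) ** 2       # formula (1): S_k = (16 * 2^k)^2
        for a in ratios:                  # aspect ratio a : 1
            w = np.sqrt(area / a)         # formula (2)
            h = np.sqrt(area * a)         # formula (3)
            shapes.append((w, h))
    return np.array(shapes)               # shape (54, 2)

anchors = generate_anchor_shapes()
# The extreme shapes roughly reproduce the 56 x 16 and 1774 x 590 anchors quoted later.
print(anchors.min(axis=0), anchors.max(axis=0))
```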
S3, whether a target exists in each generated proposal box is judged through a binary classification (Softmax) layer, target and background are distinguished within the proposal boxes, the positions of the proposal boxes are determined by bounding-box regression (BBoxes regression), and the final Region of Interest (ROI) areas are determined from the roughly 300 screened proposal boxes. Redundant target boxes are removed with the Non-Maximum Suppression (NMS) algorithm to obtain accurate proposal boxes; the threshold screening method of the NMS algorithm is shown in formula (4).
s_i = { s_i,  IoU(M, b_i) < N_t ;  0,  IoU(M, b_i) ≥ N_t }   (4)

where B = {b_1, b_2, ..., b_n} is the series of initial detection boxes, S = {s_1, s_2, ..., s_n} are their corresponding classification scores, M is the detection box with the highest current score, and N_t is the overlap (IoU) threshold.
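A compact NumPy sketch of the hard-threshold NMS screening of formula (4) follows; the box format [x1, y1, x2, y2] and the default threshold value are assumptions chosen only for illustration.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one box and an array of boxes, all given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, nt=0.7):
    """Greedy NMS per formula (4): drop b_i whenever IoU(M, b_i) >= N_t."""
    order = np.argsort(scores)[::-1]      # highest score first
    keep = []
    while order.size > 0:
        m = order[0]                      # M: current highest-scoring box
        keep.append(int(m))
        rest = order[1:]
        overlaps = box_iou(boxes[m], boxes[rest])
        order = rest[overlaps < nt]       # boxes with IoU >= N_t get score 0, i.e. removed
    return keep
```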
S4, a bilinear interpolation alignment (RoIAlign) operation is performed on the ROI regions, and the ROI regions are fixed to a uniform size.
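A sketch of this bilinear-interpolation alignment using TensorFlow's crop-and-resize primitive is shown below; using tf.image.crop_and_resize as a stand-in for RoIAlign is an approximation chosen for illustration, not the exact operator of the patent.

```python
import tensorflow as tf

def roi_align(feature_map, rois, output_size=(7, 7)):
    """Fix every ROI to a uniform size by bilinear interpolation.

    feature_map: [1, H, W, C] tensor from the backbone.
    rois: [N, 4] boxes given as normalised (y1, x1, y2, x2) coordinates.
    """
    box_indices = tf.zeros([tf.shape(rois)[0]], dtype=tf.int32)  # all ROIs come from image 0
    return tf.image.crop_and_resize(feature_map, rois, box_indices, output_size)
```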
S5, the Mask module segments each ROI region with the FCN network and outputs K m × m feature maps, i.e. m × m binary masks for K classes, which gives the m × m spatial layout and thereby the segmentation mask of the target.
S6, the Classification and Bounding box regression module collects the ROI areas; within the module the ROI areas are used to compute the classification loss and the Kullback-Leibler loss (KL loss)-based bounding-box regression loss; fig. 3 is the network structure diagram of the KL loss that estimates the confidence of the bounding-box position. The classification result of the target object and the offsets of the boundary regression are obtained through the fully connected layers (FC layers), as shown in formulas (5) and (6).
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   (5)

L_reg = D_KL(P_D(x) ∥ P_θ(x)) = (x_g − x_e)² / (2σ²) + (1/2) log σ² + (1/2) log 2π − H(P_D(x))   (6)

In formula (5), L_cls(p_i, p_i*) represents the classification loss, defined as L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)], where p_i is the probability that the region proposal is predicted to be a target object, p_i* is the ground-truth (GT) label, N_cls and N_reg are normalization terms, and λ is a balancing weight. L_reg(t_i, t_i*) represents the bounding-box regression loss, defined as smooth_L1(t − t*), where smooth_L1(x) = 0.5x² if |x| < 1 and |x| − 0.5 otherwise.

On this basis, the boundary regression loss is defined as the KL distance between the predicted distribution and the GT distribution, and the BBox position and the standard deviation of the BBox position are used together to estimate the KL loss and to regress the bounding box. In formula (6), x_g is the basic GT bounding-box position; x_e is the bounding-box position to be estimated; D_KL is the KL distance; P_D is the basic GT Dirac function, P_θ is the predicted Gaussian distribution, and H(P) is the information entropy, which is typically small and fixed.
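As a concrete illustration of formula (6), the following TensorFlow sketch computes the Gaussian-form KL regression loss from the predicted box offset and a predicted log-variance alpha = log(σ²); the constant terms are dropped and the variable names are illustrative assumptions, not the patent's own code.

```python
import tensorflow as tf

def kl_bbox_loss(x_g, x_e, alpha):
    """KL-divergence bounding-box regression loss, formula (6) up to constants.

    x_g: ground-truth box offsets, x_e: predicted box offsets,
    alpha: predicted log-variance log(sigma^2) from the extra std-dev branch.
    The terms log(2*pi)/2 and H(P_D(x)) are constant and omitted because they
    do not influence the gradients.
    """
    sigma_sq = tf.exp(alpha)
    loss = 0.5 * tf.square(x_g - x_e) / sigma_sq + 0.5 * alpha
    return tf.reduce_mean(loss)

# Example usage: loss for a single coordinate with error 0.4 and log-variance -2.
print(kl_bbox_loss(tf.constant([1.0]), tf.constant([0.6]), tf.constant([-2.0])))
```

In the published KL-loss formulation the quadratic term is replaced by a smooth-L1-style term for large errors; this sketch keeps the plain Gaussian form of formula (6) for clarity.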
Establishing the training data set: 8000 training images from the automatic driving data set Cityscapes are selected as the training data set; they contain real image data of street scenes under different driving conditions in different cities. To improve the target segmentation precision of the trained model under severe weather and complex traffic conditions, 1942 training pictures from the MS-COCO data set are added, covering weather conditions such as "snow", "rain" and "sunny" and complex traffic conditions such as "heavy traffic flow" and "traffic congestion". In the experiments, so that the mixed data set could be used by the improved algorithm, it was converted to the MS-COCO data-set format.
Training the network model: ResNet-101 + FPN in the Feature extraction network module is pre-trained on ImageNet, the resulting network model is used as the pre-trained model, and fine-tuning is performed on the hybrid automatic driving data set. Training is divided into three phases: the first phase trains the network heads; the second phase fine-tunes ResNet stage 4 and up; the third phase fine-tunes all layers. The learning rate of the algorithm is set to 0.01, and the parameters are set so that the learning rate decays during the iterations. During initialization, the weights of the fully connected layers are initialized with a random Gaussian, with the standard deviation and mean set to 0.0001 and 0 respectively, so that the KL loss is similar to the standard smooth L1 loss in the initial training phase.
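The optimizer and initializer settings described above can be expressed, for example, with standard TensorFlow/Keras utilities; the decay boundaries, momentum value and layer width below are assumptions used only to make the sketch runnable.

```python
import tensorflow as tf

# Learning rate starts at 0.01 and decays stepwise during the iterations.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=10000, decay_rate=0.9, staircase=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

# Fully connected layers are initialised with a random Gaussian
# (mean 0, standard deviation 0.0001) so that the KL loss behaves like the
# standard smooth L1 loss at the start of training.
fc_init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.0001)
fc_layer = tf.keras.layers.Dense(1024, kernel_initializer=fc_init)
```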
The purpose of the algorithm improvement of the invention is to remedy the deficiencies of the Mask R-CNN algorithm so that it meets the technical indicators of the automatic driving task: the aspect ratios and scales of the anchor boxes in the Region Proposal Network module suit medium and large targets but fall short in the detection of small targets; incomplete features extracted by the feature extraction network cause missed detections; and the model generalizes poorly in automatic driving scenes.
In a preferred embodiment of the invention, the automatic driving target identification method based on the improved Mask R-CNN comprises the following steps,
(1) The input picture is first scaled to 1600 × 700 and then passes through Conv1, Conv2_x, Conv3_x and Conv4_x of the ResNet-101 + FPN feature extraction network to obtain the picture feature maps; fig. 2 is a schematic diagram of the ResNet-101 + FPN feature extraction network.
(2) Each pixel of the feature maps obtained in (1) is traversed with its center point as the anchor reference; each anchor point predicts anchor boxes at 6 scales {2, 4, 8, 16, 64, 256} and 9 ratios {0.3:1, 0.5:1, 0.7:1, 0.9:1, 1:1, 1.5:1, 2:1, 2.5:1, 3:1}, and these 54 anchor boxes serve as the initial detection boxes. The largest anchor box is 1774 × 590 and the smallest is 56 × 16, so the 54 anchor boxes essentially cover the various scales and shapes of the target objects. Whether a target exists inside is judged through the Softmax layer, target and background are distinguished within the proposal boxes, their positions are determined by BBoxes regression, and the final ROI areas are then determined from the roughly 300 screened proposal boxes. Finally, redundant target boxes are removed with the NMS algorithm to obtain accurate proposal boxes.
(3) The feature maps obtained in (1) and the proposal boxes obtained in (2) are combined and sent to the Mask module and the Classification & Bounding box regression module: the Mask module segments each ROI area with the FCN to obtain the segmentation mask of the target; the Classification & Bounding box regression module calculates the bounding-box regression loss and classification loss through the KL loss and the standard smooth L1 loss. At the same time, the accurate proposal boxes are determined with the NMS method, completing the identification and segmentation of the targets in one picture.
(4) Establishing the training data set: 8000 training images from the automatic driving data set Cityscapes are selected as the training data set; they contain real image data of street scenes under different driving conditions in different cities. To improve the target segmentation precision of the trained model under severe weather and complex traffic conditions, 1942 training pictures from the MS-COCO data set are added. In the experiments, so that the mixed data set could be used by the improved algorithm, it was converted to the MS-COCO data-set format.
(5) Training the network model: ResNet-101 + FPN in the Feature extraction network module is pre-trained on ImageNet, the resulting network model is used as the pre-trained model, and fine-tuning is performed on the hybrid automatic driving data set. Training is divided into three phases: the first phase trains the network heads; the second phase fine-tunes ResNet stage 4 and up; the third phase fine-tunes all layers. The learning rate of the algorithm is set to 0.01, and the parameters are set so that the learning rate decays during the iterations. During initialization, the weights of the fully connected layers are initialized with a random Gaussian, with the standard deviation and mean set to 0.0001 and 0 respectively.
Results and analysis of the experiments
Experimental environment and parameters
The experiments of the invention were carried out under the Ubuntu 16.04 operating system; the server uses an Intel Xeon Silver 4110 2.10 GHz 8-core CPU and is equipped with 2 Hynix 64 GB DDR4-2666 MHz memory modules and 2 GTX 2080 Ti 11 GB graphics cards. A TensorFlow deep learning framework was built on this basis, and training and testing of the network were implemented in the Python language.
Qualitative and quantitative analysis model accuracy
Accepted evaluation indicators in the target detection and identification task are the Precision-Recall relationship curve, the AP (interpolated Average Precision) value and the mAP (mean Average Precision) value.
The Precision-Recall relationship curve plots Precision on the vertical axis against Recall on the horizontal axis; by adjusting the threshold and observing how the curve changes, the quality of the system's classification of each class of object is evaluated qualitatively.
Precision in the Precision-Recall relationship curve reflects the proportion of true positives among the targets identified as positive; the calculation formula is shown in formula (7),

Precision = TP / (TP + FP)   (7)

where TP denotes true positives and FP denotes false positives.
Recall reflects the proportion of a class of objects that is correctly identified; the calculation formula is shown in formula (8).

Recall = TP / (TP + FN)   (8)

where TP denotes true positives and FN denotes false negatives.
FIG. 4 shows the Precision-Recall curve used for qualitative analysis of the present algorithm. The curves of the various objects bulge toward the upper-right corner of the Precision-Recall graph, which shows that the algorithm identifies targets well and with high accuracy.
The invention uses the AP (interpolated Average Precision) values of the various object classes for quantitative analysis of model precision, and the mAP (mean Average Precision) value to evaluate the detection and identification effect of the algorithm on the data set. The AP value is the area under the Precision-Recall relationship curve and is used to quantify model precision. To avoid a low AP value caused by the instability of the PR curve, the invention uses the interpolated Average Precision calculation: at each recall threshold, the maximum precision at that recall level or beyond is multiplied by the recall increment, and the products obtained over all thresholds are accumulated, as shown in formula (9).

AP = Σ_n (R_n − R_{n−1}) · max_{R ≥ R_n} P(R)   (9)
Wherein P is Precision. R is Recall.
In multi-target detection and identification of pictures, the mAP value is used to measure the quality of the model in the classification task across all object classes. The mAP is the average of the AP values of the several object classes; the larger the value, the higher the detection precision and the better the algorithm's performance.
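A short Python sketch of the interpolated-AP computation of formulas (7)-(9) and of the mAP average follows; it assumes precision and recall arrays have already been computed by sweeping the score threshold, and the toy values at the end are illustrative only.

```python
import numpy as np

def interpolated_ap(recall, precision):
    """Area under the interpolated Precision-Recall curve, per formula (9)."""
    r = np.concatenate(([0.0], np.asarray(recall, float), [1.0]))
    p = np.concatenate(([0.0], np.asarray(precision, float), [0.0]))
    # At each recall level keep the maximum precision at that recall or beyond.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Accumulate precision x recall-increment wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_ap(ap_per_class):
    """mAP: the average of the per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))

print(interpolated_ap([0.2, 0.5, 1.0], [1.0, 0.8, 0.5]))   # toy example
```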
The data set is the hybrid automatic driving data set, and the measurement indicators are AP_small (target pixel area less than 32²), AP_medium (target pixel area greater than 32² and less than 96²) and AP_large (target pixel area greater than 96²). The experimental results are shown in Table 1.
TABLE 1 AP value comparison for large, medium and small targets

                         | mAP  | AP_s | AP_m | AP_l
BBox reg                 | 23.6 | 33.2 | 32.1 | 37.1
BBox reg + BBox reg Std  | 30.9 | 34.9 | 33.5 | 40.2
Table 2 shows the comparison of the maps of the algorithm of the present invention under different network structures.
Table 2 mAP value comparison under different network architectures
The following conclusions can be drawn from the experimental results: compared with the original method without the FPN network, adding the FPN network to the basic ResNet-101 network structure gives the algorithm a clear improvement in mAP value; although the ResNeXt-101 + FPN combination can raise the mAP value further, its larger number of network parameters limits inference speed and cannot meet the quick-response requirement of automatic driving.
Results of the experiment
The test results of the algorithm of the invention after training on the automatic driving hybrid data set are shown in figs. 5-10. Figs. 5 and 6 show the detection and segmentation effect of the trained network model under airborne interference in snowy and rainy weather. Fig. 7 shows that the algorithm segments various vehicles and pedestrians well in a complex road scene; in fig. 8 the leftmost target vehicle is occluded, but the algorithm of the invention can still lock onto and segment it accurately; for the target vehicles in fig. 9 that are blurred by tree occlusion on the left, the segmentation algorithm can still overcome the occlusion and accurately detect and identify them. Fig. 10 shows that at an intersection where vehicles and pedestrians are concentrated, the precision of the segmentation algorithm of the invention does not drop, and bicycles and pedestrians are segmented accurately.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrases "comprising … …" or "comprising … …" does not exclude the presence of additional elements in a process, method, article, or terminal that comprises the element. Further, herein, "greater than," "less than," "more than," and the like are understood to exclude the present numbers; the terms "above", "below", "within" and the like are to be understood as including the number.
Although the embodiments have been described, once the basic inventive concept is obtained, other variations and modifications of these embodiments can be made by those skilled in the art, so that the above embodiments are only examples of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes using the contents of the present specification and drawings, or any other related technical fields, which are directly or indirectly applied thereto, are included in the scope of the present invention.

Claims (10)

1. The automatic driving target identification method based on the improved Mask R-CNN is characterized by comprising the following steps,
S1, reading the picture information and preprocessing it to obtain a feature map of the picture;
S2, inputting the feature map into the region proposal network module to obtain recommendation frames;
S3, judging through a classification layer whether a target exists in each recommendation frame, distinguishing target from background within the recommendation frame, determining the target position by bounding-box regression, determining the region of interest (ROI) from the screened feature map, and removing redundant recommendation frames with the non-maximum suppression (NMS) algorithm to obtain accurate recommendation frames;
S4, after the ROI is processed, the mask module segments each ROI with the FCN and outputs a feature map;
S5, the classification and frame regression module collects the ROI area, computes the classification loss and the Kullback-Leibler-loss-based bounding-box regression loss within the module, and determines the accurate recommendation frame with the NMS method, realizing identification and segmentation of the targets in the picture.
2. The automatic driving target identification method based on improved Mask R-CNN as claimed in claim 1, wherein in step S1 the picture is first scaled, then input into the residual network 101 + feature pyramid feature extraction network of the feature extraction network module, and the feature map of the picture is extracted after passing through the full convolution network.
3. The automatic driving target identification method based on improved Mask R-CNN as claimed in claim 1, wherein in step S2 the region proposal network module traverses the feature map using a sliding window and predicts a plurality of anchor frames for each pixel to generate recommendation frames.
4. The improved Mask R-CNN based automatic driving target recognition method according to claim 3, wherein the size of the sliding window is 3 x 3.
5. The automatic driving target identification method based on improved Mask R-CNN as claimed in claim 3, wherein each pixel predicts anchor frames at 6 scales {2, 4, 8, 16, 64, 256} and 9 aspect ratios {0.3:1, 0.5:1, 0.7:1, 0.9:1, 1:1, 1.5:1, 2:1, 2.5:1, 3:1}, for a total of 54 anchor frames.
6. The automatic driving target identification method based on improved Mask R-CNN as claimed in claim 3, wherein the reference window of the anchor frame is set to 16 × 16, so the area S_k of the anchor frame is

S_k = (16 · 2^k)²,  k ∈ [1, 6]   (1)

and, with the anchor-frame aspect ratio a:1, the width W_k and height H_k of each anchor frame are

W_k = √(S_k / a)   (2)
H_k = √(S_k · a)   (3)
7. The automatic driving target identification method based on improved Mask R-CNN as claimed in claim 1, wherein the threshold screening formula of the NMS algorithm in step S3 is as follows,

s_i = { s_i,  IoU(M, b_i) < N_t ;  0,  IoU(M, b_i) ≥ N_t }   (4)

where B = {b_1, b_2, ..., b_n} is the series of initial detection frames, S = {s_1, s_2, ..., s_n} are their corresponding classification scores, M is the detection frame with the highest current score, and N_t is the overlap threshold.
8. The improved Mask R-CNN based automatic driving target recognition method according to claim 1, wherein the processing of the ROI in step S4 is to perform bilinear interpolation alignment operation on the ROI, and fix the size of the ROI to a uniform size.
9. The automatic driving target identification method based on improved Mask R-CNN according to claim 1, wherein in step S5 the classification result of the target object and the offsets of the boundary regression are obtained through the fully connected layers, as shown in formulas (5) and (6),

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   (5)

L_reg = D_KL(P_D(x) ∥ P_θ(x)) = (x_g − x_e)² / (2σ²) + (1/2) log σ² + (1/2) log 2π − H(P_D(x))   (6)

wherein in formula (5) L_cls(p_i, p_i*) represents the classification loss, defined as L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)], where p_i is the probability that the region recommendation is predicted to be a target object, p_i* is the label of the real calibration frame, N_cls and N_reg are normalization terms, and λ is a balancing weight; L_reg(t_i, t_i*) represents the bounding-box regression loss, defined as smooth_L1(t − t*), where smooth_L1(x) = 0.5x² if |x| < 1 and |x| − 0.5 otherwise; and in formula (6) x_g is the basic GT bounding-box position, x_e is the position of the bounding box to be estimated, D_KL is the KL distance, P_D is the basic GT Dirac function, P_θ is the predicted Gaussian distribution, and H(P) is the information entropy.
10. The automatic driving target identification method based on improved Mask R-CNN according to claim 1, wherein in step S5 the boundary regression loss is defined as the KL distance between the predicted distribution and the distribution of the real calibration frame, and the standard deviation of the bounding-box position and the bounding-box position are used together to evaluate the KL loss and to regress the bounding box.
CN202110287700.3A 2021-03-17 2021-03-17 Automatic driving target identification method based on improved Mask R-CNN Pending CN113111722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110287700.3A CN113111722A (en) 2021-03-17 2021-03-17 Automatic driving target identification method based on improved Mask R-CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110287700.3A CN113111722A (en) 2021-03-17 2021-03-17 Automatic driving target identification method based on improved Mask R-CNN

Publications (1)

Publication Number Publication Date
CN113111722A true CN113111722A (en) 2021-07-13

Family

ID=76711910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110287700.3A Pending CN113111722A (en) 2021-03-17 2021-03-17 Automatic driving target identification method based on improved Mask R-CNN

Country Status (1)

Country Link
CN (1) CN113111722A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657174A (en) * 2021-07-21 2021-11-16 北京中科慧眼科技有限公司 Vehicle pseudo-3D information detection method and device and automatic driving system
CN113705387A (en) * 2021-08-13 2021-11-26 国网江苏省电力有限公司电力科学研究院 Method for detecting and tracking interferent for removing foreign matters on overhead line by laser
CN115063594A (en) * 2022-08-19 2022-09-16 清驰(济南)智能科技有限公司 Feature extraction method and device based on automatic driving
CN116106899A (en) * 2023-04-14 2023-05-12 青岛杰瑞工控技术有限公司 Port channel small target identification method based on machine learning
CN116469014A (en) * 2023-01-10 2023-07-21 南京航空航天大学 Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447018A (en) * 2018-11-08 2019-03-08 天津理工大学 A kind of road environment visual perception method based on improvement Faster R-CNN
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111862119A (en) * 2020-07-21 2020-10-30 武汉科技大学 Semantic information extraction method based on Mask-RCNN
CN112241950A (en) * 2020-10-19 2021-01-19 福州大学 Detection method of tower crane crack image
CN112508168A (en) * 2020-09-25 2021-03-16 上海海事大学 Frame regression neural network construction method based on automatic correction of prediction frame

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447018A (en) * 2018-11-08 2019-03-08 天津理工大学 A kind of road environment visual perception method based on improvement Faster R-CNN
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111862119A (en) * 2020-07-21 2020-10-30 武汉科技大学 Semantic information extraction method based on Mask-RCNN
CN112508168A (en) * 2020-09-25 2021-03-16 上海海事大学 Frame regression neural network construction method based on automatic correction of prediction frame
CN112241950A (en) * 2020-10-19 2021-01-19 福州大学 Detection method of tower crane crack image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Q. Yang et al.: "An Instance Segmentation Algorithm Based on Improved Mask R-CNN", 2020 Chinese Automation Congress (CAC), pages 4804-4809 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657174A (en) * 2021-07-21 2021-11-16 北京中科慧眼科技有限公司 Vehicle pseudo-3D information detection method and device and automatic driving system
CN113705387A (en) * 2021-08-13 2021-11-26 国网江苏省电力有限公司电力科学研究院 Method for detecting and tracking interferent for removing foreign matters on overhead line by laser
CN113705387B (en) * 2021-08-13 2023-11-17 国网江苏省电力有限公司电力科学研究院 Interference object detection and tracking method for removing overhead line foreign matters by laser
CN115063594A (en) * 2022-08-19 2022-09-16 清驰(济南)智能科技有限公司 Feature extraction method and device based on automatic driving
CN115063594B (en) * 2022-08-19 2022-12-13 清驰(济南)智能科技有限公司 Feature extraction method and device based on automatic driving
CN116469014A (en) * 2023-01-10 2023-07-21 南京航空航天大学 Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN
CN116469014B (en) * 2023-01-10 2024-04-30 南京航空航天大学 Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN
CN116106899A (en) * 2023-04-14 2023-05-12 青岛杰瑞工控技术有限公司 Port channel small target identification method based on machine learning
CN116106899B (en) * 2023-04-14 2023-06-23 青岛杰瑞工控技术有限公司 Port channel small target identification method based on machine learning

Similar Documents

Publication Publication Date Title
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
CN109977812B (en) Vehicle-mounted video target detection method based on deep learning
CN109447018B (en) Road environment visual perception method based on improved Faster R-CNN
CN113111722A (en) Automatic driving target identification method based on improved Mask R-CNN
CN109033950B (en) Vehicle illegal parking detection method based on multi-feature fusion cascade depth model
CN108596055B (en) Airport target detection method of high-resolution remote sensing image under complex background
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN111709416A (en) License plate positioning method, device and system and storage medium
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN108804992B (en) Crowd counting method based on deep learning
CN114973002A (en) Improved YOLOv 5-based ear detection method
CN106778633B (en) Pedestrian identification method based on region segmentation
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN117094914B (en) Smart city road monitoring system based on computer vision
CN112861970B (en) Fine-grained image classification method based on feature fusion
KR101941043B1 (en) Method for Object Detection Using High-resolusion Aerial Image
CN113343985B (en) License plate recognition method and device
CN111582339A (en) Vehicle detection and identification method based on deep learning
CN104766065A (en) Robustness prospect detection method based on multi-view learning
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN112766273A (en) License plate recognition method
CN110176022B (en) Tunnel panoramic monitoring system and method based on video detection
CN111553337A (en) Hyperspectral multi-target detection method based on improved anchor frame
CN110751670B (en) Target tracking method based on fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination