CN109871903B - Target detection method based on end-to-end deep network and adversarial learning - Google Patents

Info

Publication number
CN109871903B
CN109871903B CN201910179602.0A CN201910179602A CN109871903B CN 109871903 B CN109871903 B CN 109871903B CN 201910179602 A CN201910179602 A CN 201910179602A CN 109871903 B CN109871903 B CN 109871903B
Authority
CN
China
Prior art keywords
feature map
feature
layer
candidate
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910179602.0A
Other languages
Chinese (zh)
Other versions
CN109871903A (en)
Inventor
韩光
周旺
杨超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201910179602.0A
Publication of CN109871903A
Application granted
Publication of CN109871903B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

A target detection method based on an end-to-end deep network and adversarial learning. Building on SSD, the method exploits the small local receptive field of the lower convolutional layers and uses a deconvolution structure to fuse the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps, thereby improving the average precision of the detection algorithm. In addition, coarse-grained candidate box information is obtained through an RPN: after the candidate boxes are generated on the base feature layers, a binary classification is added, and a conventional regression branch then refines the boxes to obtain more accurate detection box information. Meanwhile, to address the poor performance of the SSD algorithm on partially occluded targets, the method adds an occlusion mask to the feature map so that part of the features is occluded, achieving the effect of adversarial learning.

Description

Target detection method based on end-to-end deep network and adversarial learning
Technical Field
The invention relates to the technical field of target detection, and in particular to a target detection method based on an end-to-end deep network and adversarial learning.
Background
With the continuous development of computer technology and the growing demand for intelligent video analysis, target detection has become one of the important and challenging research directions in computer vision. Target detection is a prerequisite for many higher-level vision tasks, including activity or event recognition and scene content understanding. It is also applied in many practical tasks, such as intelligent video surveillance, content-based image retrieval, robot navigation, and augmented reality. Target detection is therefore significant both for the field of computer vision and for practical applications.
At present, mainstream target detection algorithms are mainly based on deep learning models and fall into two categories: (1) two-stage detection algorithms, which divide detection into two stages, first generating candidate regions and then classifying them; typical representatives are the region-proposal-based R-CNN family, such as R-CNN and Fast R-CNN; (2) one-stage detection algorithms, which need no region proposal stage and directly produce the class probabilities and position coordinates of objects; typical examples are YOLO and SSD.
Although current mainstream detection techniques achieve high accuracy on large and medium-sized targets, they perform poorly on small targets and on partially occluded targets.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, in view of the above deficiencies in the prior art, a target detection method based on an end-to-end deep network and adversarial learning. Building on SSD, the method exploits the small local receptive field of the lower convolutional layers and uses a deconvolution structure to fuse the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps, thereby improving the average precision of the detection algorithm. In addition, coarse-grained candidate box information is obtained through an RPN: after the candidate boxes are generated on the base feature layers, a binary classification is added, and a conventional regression branch then refines the boxes to obtain more accurate detection box information. Meanwhile, to address the poor performance of the SSD algorithm on partially occluded targets, the method adds an occlusion mask to the feature map so that part of the features is occluded, achieving the effect of adversarial learning.
A target detection method based on an end-to-end deep network and adversarial learning comprises the following steps:
step 1, introducing a deconvolution structure into the SSD algorithm, using deconvolution to fuse the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps, and thereby strengthening the feature extraction capability of the lower layers of the network;
step 2, obtaining coarse-grained candidate box information through an RPN, adding a binary classification after the candidate boxes are generated on the base feature layers, and then refining the boxes through a conventional regression branch to obtain more accurate detection box information;
step 3, mapping the candidate boxes at different scales screened in step 2 onto the fusion layers generated in step 1 by fusing features at different scales, then enlarging or shrinking the feature map regions corresponding to all candidate boxes to a fixed size through an ROIPooling operation, and partially occluding the features by adding an occlusion mask to the feature map, so as to achieve the effect of adversarial learning;
and step 4, passing the feature map occluded by the mask through two fully connected layers and a Softmax classifier to regress the detection boxes and predict the classes.
Further, step 1 specifically comprises:
first, inputting the image data into the SSD network to extract image features, and selecting four feature maps of different resolutions from the network structure;
then, deconvolving the low-resolution, high-semantic feature maps in the SSD and fusing each deconvolved feature map with the corresponding original feature map; that is, the feature layer of the next layer is deconvolved so that its high-scale information is passed to the previous layer, and this is repeated layer by layer, finally yielding four fusion layers of different resolutions.
Further, step 2 is specifically as follows:
for the four feature maps of different resolutions generated in step 1, candidate boxes of different sizes are generated on the four feature maps, some negative samples are removed according to the IoU between each candidate box and the ground-truth box, and two branches are finally obtained from the four feature layers: one is the coordinate regression branch for the candidate boxes, and the other is the binary classification branch for the candidate boxes.
Further, step 3 specifically comprises:
first, mapping the positive and negative sample candidate boxes obtained in step 2 on the feature maps of different resolutions onto the four fusion layers of different resolutions obtained in step 1;
then, performing an ROIPooling operation on the four fusion layers of different resolutions, scaling the feature map region corresponding to each candidate box to a uniform size;
then, generating a mask with a fully connected layer to determine which parts of the feature map should be occluded, so that the resulting hard samples are, as far as possible, misjudged by the detector; the mask is adjusted automatically according to the loss function.
Further, in step 4, the feature map occluded by the mask is passed through two fully connected layers and a Softmax classifier to regress the detection boxes and predict the classes; specifically, the results are screened by non-maximum suppression (NMS) and a confidence threshold to obtain the final predicted class, probability, and localization result.
In summary, the invention uses deconvolution to fuse the high-resolution, low-semantic feature maps with the low-resolution, high-semantic feature maps, which improves the small-target detection capability of the SSD algorithm, and introduces adversarial learning to strengthen the detection of partially occluded targets.
Drawings
Fig. 1 is a network structure diagram of the target detection method based on an end-to-end deep network and adversarial learning according to the present invention.
Fig. 2 is a structural diagram of the deconvolution feature fusion proposed by the present invention.
Fig. 3 is an example of features occluded by the binarized mask according to the present invention.
Detailed Description
The technical solution of the invention is explained in further detail below with reference to the accompanying drawings.
A target detection method based on an end-to-end deep network and adversarial learning comprises the following steps:
Step 1: a deconvolution structure is introduced into the SSD algorithm, and deconvolution is used to fuse the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps, strengthening the feature extraction capability of the lower layers of the network.
Step 1 specifically comprises the following:
First, the image data is input into the SSD network to extract image features, and four feature maps of different resolutions are selected from the network structure.
Then, the low-resolution, high-semantic feature maps in the SSD are deconvolved, and each deconvolved feature map is fused with the corresponding original feature map. That is, the feature layer of the next layer is deconvolved so that its high-scale information is passed to the previous layer; this is repeated layer by layer, finally yielding four fusion layers of different resolutions.
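The top-down fusion in step 1 can be illustrated with a minimal NumPy sketch. It is not the patented network: it assumes single-channel maps whose side length halves from one level to the next, a fixed 2x2 kernel in place of learned deconvolution weights, and element-wise addition as the fusion operation, none of which is specified in the text. The names `transposed_conv2x` and `fuse_top_down` are hypothetical.

```python
import numpy as np


def transposed_conv2x(x, kernel):
    """Stride-2 transposed convolution of a single-channel map.

    x is (H, W) and kernel is (2, 2); the result is (2H, 2W).
    Because the stride equals the kernel size, every input pixel
    writes one non-overlapping, kernel-shaped patch to the output,
    which is exactly a Kronecker product.
    """
    return np.kron(x, kernel)


def fuse_top_down(feature_maps, kernel):
    """Fuse maps ordered from highest to lowest resolution.

    Assumption: each map is half the size of the previous one and
    fusion is an element-wise sum (the source does not say which
    fusion operator is used).
    """
    fused = [None] * len(feature_maps)
    fused[-1] = feature_maps[-1]  # the deepest map has nothing above it
    for i in range(len(feature_maps) - 2, -1, -1):
        upsampled = transposed_conv2x(fused[i + 1], kernel)
        fused[i] = feature_maps[i] + upsampled
    return fused


# Four toy "feature maps" of different resolutions.
maps = [np.ones((s, s)) for s in (8, 4, 2, 1)]
fusion_layers = fuse_top_down(maps, np.ones((2, 2)))
print([f.shape for f in fusion_layers])
```

With all-ones inputs, the highest-resolution fusion layer accumulates a contribution from every deeper level, which mirrors the layer-by-layer propagation described above. Real SSD feature maps do not halve exactly, so a practical implementation would need padding or cropping to align sizes.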
Step 2: coarse-grained candidate box information is obtained through an RPN; after the candidate boxes are generated on the base feature layers, a binary classification is added, and a conventional regression branch then refines the boxes to obtain more accurate detection box information.
Step 2 is specifically as follows:
For the four feature maps of different resolutions generated in step 1, candidate boxes of different sizes are generated on the four feature maps, and some negative samples are removed according to the IoU between each candidate box and the ground-truth box. Two branches are finally obtained from the four feature layers: one is the coordinate regression branch for the candidate boxes, and the other is the binary classification branch for the candidate boxes.
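The IoU-based screening of candidate boxes in step 2 can be sketched as follows. The text does not state which negatives are removed or what thresholds are used, so the two thresholds and the rule of dropping ambiguous candidates between them are assumptions made only for illustration; `iou` and `label_candidates` are hypothetical helper names, and the regression and classification branches themselves are not shown.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates, gt_boxes, pos_thresh=0.5, neg_thresh=0.3):
    """Assign a binary label to each candidate box.

    Label 1 marks a positive sample and label 0 a negative sample.
    Assumption: candidates whose best IoU falls between the two
    thresholds are treated as ambiguous negatives and removed.
    """
    labeled = []
    for box in candidates:
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labeled.append((box, 1))
        elif best < neg_thresh:
            labeled.append((box, 0))
        # Otherwise the candidate is dropped from training.
    return labeled


ground_truth = [(0, 0, 10, 10)]
boxes = [(0, 0, 10, 8), (5, 0, 15, 10), (20, 20, 30, 30)]
print(label_candidates(boxes, ground_truth))
```

In this toy call the first box overlaps the ground truth strongly and is kept as a positive, the second overlaps it only partly and is discarded, and the third does not overlap at all and is kept as a negative.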
Step 3: the candidate boxes at different scales screened in step 2 are mapped onto the fusion layers generated in step 1 by fusing features at different scales; the feature map regions corresponding to all candidate boxes are then enlarged or shrunk to a fixed size through an ROIPooling operation, and the features are partially occluded by adding an occlusion mask to the feature map, so as to achieve the effect of adversarial learning.
Step 3 specifically comprises the following:
First, the positive and negative sample candidate boxes obtained in step 2 on the feature maps of different resolutions are mapped onto the four fusion layers of different resolutions obtained in step 1.
Then, an ROIPooling operation is performed on the four fusion layers of different resolutions, scaling the feature map region corresponding to each candidate box to a uniform size.
A mask is then generated by a fully connected layer to determine which parts of the feature map should be occluded, so that the resulting hard samples are, as far as possible, misjudged by the detector; the mask is adjusted automatically according to the loss function.
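The ROIPooling and occlusion operations of step 3 can be sketched in NumPy as below. This is only an illustration under stated assumptions: a single-channel feature map, max pooling inside each bin, integer box coordinates with exclusive right and bottom edges, and a mask that occludes the k highest-scoring cells. In the method described above the mask comes from a fully connected layer trained through the loss function; here an arbitrary score array stands in for that layer's output. All function names are hypothetical.

```python
import numpy as np


def roi_max_pool(feature, box, out_size=4):
    """Pool the region box = (x1, y1, x2, y2) to (out_size, out_size).

    Regions larger than out_size are shrunk and smaller ones are
    enlarged, because every output bin covers at least one cell.
    """
    x1, y1, x2, y2 = box
    region = feature[y1:y2, x1:x2]
    h, w = region.shape
    ys = np.linspace(0, h, out_size + 1)  # bin edges along rows
    xs = np.linspace(0, w, out_size + 1)  # bin edges along columns
    out = np.empty((out_size, out_size), dtype=feature.dtype)
    for i in range(out_size):
        r0 = int(np.floor(ys[i]))
        r1 = max(int(np.ceil(ys[i + 1])), r0 + 1)
        for j in range(out_size):
            c0 = int(np.floor(xs[j]))
            c1 = max(int(np.ceil(xs[j + 1])), c0 + 1)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out


def topk_mask(scores, k):
    """Binary mask with 1 at the k highest-scoring cells (1 = occlude)."""
    flat = scores.ravel()
    mask = np.zeros(flat.shape, dtype=scores.dtype)
    mask[np.argsort(flat)[-k:]] = 1
    return mask.reshape(scores.shape)


def occlude(pooled, mask):
    """Zero out the cells selected by the mask."""
    return pooled * (1 - mask)


feature = np.arange(64, dtype=float).reshape(8, 8)
pooled = roi_max_pool(feature, (0, 0, 8, 8), out_size=4)
mask = topk_mask(pooled, k=4)  # stand-in for the learned mask
print(occlude(pooled, mask))
```

Using the pooled activations themselves as scores simply hides the strongest responses. A learned mask would instead be chosen to make the detector's loss as large as possible, which is what makes the occluded samples hard.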
Step 4: the feature map occluded by the mask is passed through two fully connected layers and a Softmax classifier to regress the detection boxes and predict the classes.
In step 4, specifically, the outputs of the two fully connected layers and the Softmax classifier are screened by non-maximum suppression (NMS) and a confidence threshold to obtain the final predicted class, probability, and localization result.
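The final screening in step 4 can be sketched in plain Python: Softmax turns class scores into probabilities, a confidence threshold discards weak detections, and greedy NMS removes overlapping boxes. The threshold values and the per-class greedy variant of NMS are assumptions, since the text names the operations but gives no parameters; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.45, conf_thresh=0.5):
    """Greedy NMS for one class.

    detections is a list of (box, score) pairs. Detections below
    conf_thresh are dropped first; the rest are visited in order of
    decreasing score and kept only if they do not overlap an
    already kept box by more than iou_thresh.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= conf_thresh),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # overlaps the first box heavily
    ((20, 20, 30, 30), 0.7),
    ((40, 40, 50, 50), 0.2),  # below the confidence threshold
]
print(nms(detections))
```

In this toy input the second box is suppressed by the first and the fourth is removed by the confidence threshold, leaving two detections.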
The above is only a preferred embodiment of the present invention. The scope of protection is not limited to this embodiment; equivalent modifications or changes made by those skilled in the art on the basis of this disclosure fall within the scope of protection set forth in the appended claims.

Claims (3)

1. A target detection method based on an end-to-end deep network and adversarial learning, characterized by comprising the following steps:
step 1, introducing a deconvolution structure into the SSD algorithm, using deconvolution to fuse the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps, and thereby strengthening the feature extraction capability of the lower layers of the network;
step 2, obtaining coarse-grained candidate box information through an RPN, adding a binary classification after the candidate boxes are generated on the base feature layers, and then refining the boxes through a conventional regression branch to obtain more accurate detection box information;
step 2 is specifically as follows:
for the four feature maps of different resolutions generated in step 1, generating candidate boxes of different sizes on the four feature maps, removing some negative samples according to the IoU between each candidate box and the ground-truth box, and finally obtaining two branches from the four feature layers, wherein one branch is the coordinate regression branch for the candidate boxes and the other is the classification branch for the candidate boxes;
step 3, mapping the candidate boxes at different scales screened in step 2 onto the fusion layers generated in step 1 by fusing features at different scales, then enlarging or shrinking the feature map regions corresponding to all candidate boxes to a fixed size through an ROIPooling operation, and partially occluding the features by adding an occlusion mask to the feature map, so as to achieve the effect of adversarial learning;
step 3 specifically comprises the following:
first, mapping the positive and negative sample candidate boxes obtained in step 2 on the feature maps of different resolutions onto the four fusion layers of different resolutions obtained in step 1;
then, performing an ROIPooling operation on the four fusion layers of different resolutions, scaling the feature map region corresponding to each candidate box to a uniform size;
then, generating a mask with a fully connected layer to determine which parts of the feature map should be occluded, so that the resulting hard samples are misjudged by the detector, the mask being adjusted automatically according to the loss function;
and step 4, passing the feature map occluded by the mask through two fully connected layers and a Softmax classifier to regress the detection boxes and predict the classes.
2. The method of claim 1, characterized in that step 1 specifically comprises the following:
first, inputting the image data into the SSD network to extract image features, and selecting four feature maps of different resolutions from the network structure;
then, deconvolving the low-resolution, high-semantic feature maps in the SSD and fusing each deconvolved feature map with the corresponding original feature map; that is, the feature layer of the next layer is deconvolved so that its high-scale information is passed to the previous layer, and this is repeated layer by layer, finally yielding four fusion layers of different resolutions.
3. The method of claim 1, characterized in that in step 4 the feature map occluded by the mask is passed through two fully connected layers and a Softmax classifier to regress the detection boxes and predict the classes, and specifically the results are screened by a non-maximum suppression threshold and a confidence threshold to obtain the final predicted class, probability, and localization result.
CN201910179602.0A 2019-03-11 2019-03-11 Target detection method based on end-to-end deep network and adversarial learning Active CN109871903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910179602.0A CN109871903B (en) 2019-03-11 2019-03-11 Target detection method based on end-to-end deep network and adversarial learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910179602.0A CN109871903B (en) 2019-03-11 2019-03-11 Target detection method based on end-to-end deep network and adversarial learning

Publications (2)

Publication Number Publication Date
CN109871903A (en) 2019-06-11
CN109871903B (en) 2022-08-26

Family

ID=66920150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910179602.0A Active CN109871903B (en) 2019-03-11 2019-03-11 Target detection method based on end-to-end deep network and adversarial learning

Country Status (1)

Country Link
CN (1) CN109871903B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852241B (en) * 2019-11-06 2022-08-16 西安交通大学 Small target detection method applied to nursing robot
CN115082758B (en) * 2022-08-19 2022-11-11 深圳比特微电子科技有限公司 Training method of target detection model, target detection method, device and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288075A (en) * 2018-02-02 2018-07-17 沈阳工业大学 A kind of lightweight small target detecting method improving SSD
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102592076B1 (en) * 2015-12-14 2023-10-19 삼성전자주식회사 Appartus and method for Object detection based on Deep leaning, apparatus for Learning thereof

Also Published As

Publication number Publication date
CN109871903A (en) 2019-06-11

Similar Documents

Publication Publication Date Title
CN109344701B (en) Kinect-based dynamic gesture recognition method
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
CN109635666B (en) Image target rapid detection method based on deep learning
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN111275688A (en) Small target detection method based on context feature fusion screening of attention mechanism
WO2021238019A1 (en) Real-time traffic flow detection system and method based on ghost convolutional feature fusion neural network
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN111967313B (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN111209887A (en) SSD model optimization method for small target detection
CN111914720B (en) Method and device for identifying insulator burst of power transmission line
CN108875754B (en) Vehicle re-identification method based on multi-depth feature fusion network
CN114495029A (en) Traffic target detection method and system based on improved YOLOv4
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN111768415A (en) Image instance segmentation method without quantization pooling
CN109871903B (en) Target detection method based on end-to-end deep network and adversarial learning
Zhang et al. Application research of YOLO v2 combined with color identification
Han et al. A method based on multi-convolution layers joint and generative adversarial networks for vehicle detection
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
CN113076889B (en) Container lead seal identification method, device, electronic equipment and storage medium
CN113762162A (en) Fire early warning method and system based on semantic segmentation and recognition
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN111275733A (en) Method for realizing rapid tracking processing of multiple ships based on deep learning target detection technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant