CN109871903B - Target detection method based on end-to-end deep network and adversarial learning - Google Patents
- Publication number: CN109871903B (Application CN201910179602.0A)
- Authority: CN (China)
- Prior art keywords: feature map, feature, layer, candidate, fusion
- Legal status: Active (an assumption, not a legal conclusion)
- Classification: Image Analysis
Abstract
A target detection method based on an end-to-end deep network and adversarial learning. Building on SSD, the method exploits the small local receptive fields of the lower convolutional layers and fuses the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps through a deconvolution structure, thereby improving the average precision of the target detection algorithm. In addition, coarse-grained candidate box information is obtained through an RPN: a binary classification judgment is added after candidate boxes are generated on the basic feature layers, followed by further regression through a conventional regression branch, yielding more accurate detection box information. Finally, to address the poor performance of the SSD algorithm on partially occluded targets, an occlusion mask is applied to the feature maps to partially occlude the features, thereby achieving adversarial learning.
Description
Technical Field
The invention relates to the technical field of target detection, and in particular to a target detection method based on an end-to-end deep network and adversarial learning.
Background
With the continuous development of computer technology and the growing demand for intelligent video analysis, target detection has become one of the most important and challenging research directions in computer vision. Object detection is a prerequisite for many high-level visual tasks, including activity or event recognition and scene content understanding. It is also applied in many practical tasks, such as intelligent video surveillance, content-based image retrieval, robot navigation, and augmented reality. Target detection is therefore of great significance both to the field of computer vision and to practical applications.
At present, mainstream target detection algorithms are mostly based on deep learning models and fall into two categories: (1) two-stage detection algorithms, which divide detection into two stages, first generating candidate regions (region proposals) and then classifying them; typical representatives are the region-proposal-based R-CNN family, such as R-CNN and Fast R-CNN; (2) one-stage detection algorithms, which require no region proposal stage and directly predict the class probabilities and position coordinates of objects; typical algorithms include YOLO and SSD.
Although current mainstream target detection techniques achieve high accuracy on large and medium targets, their performance on small targets and partially occluded targets remains poor.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a target detection method based on an end-to-end deep network and adversarial learning, aiming at the above deficiencies in the prior art. The method builds on SSD, exploits the small local receptive fields of the lower convolutional layers, and fuses the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps through a deconvolution structure, thereby improving the average precision of the target detection algorithm. In addition, coarse-grained candidate box information is obtained through an RPN: a binary classification judgment is added after candidate boxes are generated on the basic feature layers, followed by further regression through a conventional regression branch, yielding more accurate detection box information. Finally, to address the poor performance of the SSD algorithm on partially occluded targets, an occlusion mask is applied to the feature maps to partially occlude the features, thereby achieving adversarial learning.
A target detection method based on an end-to-end deep network and adversarial learning comprises the following steps:
step 1, introducing a deconvolution structure into the SSD algorithm, fusing the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps by deconvolution, and increasing the feature extraction capability of the lower layers of the network;
step 2, obtaining coarse-grained candidate box information through an RPN: a binary classification judgment is added after candidate boxes are generated on the basic feature layers, followed by further regression through a conventional regression branch to obtain more accurate detection box information;
step 3, matching the candidate boxes at different scales screened in step 2 with the multi-scale fusion layers generated in step 1; then scaling all feature map regions corresponding to the candidate boxes to a fixed size through an ROI Pooling operation, and partially occluding the features by applying occlusion masks to the feature maps, so as to achieve adversarial learning;
step 4, regressing the feature map occluded by the occlusion mask to the detection box and class through two fully connected layers and a Softmax classifier.
Further, step 1 specifically comprises:
firstly, inputting the image data into an SSD network to extract image features, and selecting four feature maps of different resolutions from the network structure;
then, deconvolving the low-resolution, high-semantic feature maps in the SSD and fusing each deconvolved feature map with the original feature map at the corresponding scale. In this fusion scheme, a deconvolution operation is applied to the feature layer of the deeper level so that its high-level semantic information is propagated to the preceding layer; this information is passed on layer by layer, finally yielding four fusion layers of different resolutions.
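The layer-by-layer deconvolution fusion described above can be sketched in PyTorch as follows. The channel counts, the batch-norm placement, and the element-wise addition are illustrative assumptions; the patent does not specify the exact fusion arithmetic:

```python
import torch
import torch.nn as nn

class DeconvFusion(nn.Module):
    """Fuse a low-resolution, high-semantic feature map into the next
    higher-resolution, low-semantic map via transposed convolution."""
    def __init__(self, deep_ch, shallow_ch):
        super().__init__()
        # 2x upsampling of the deeper feature map to the shallow resolution
        self.deconv = nn.ConvTranspose2d(deep_ch, shallow_ch,
                                         kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(shallow_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, deep, shallow):
        up = self.relu(self.bn(self.deconv(deep)))
        return shallow + up  # element-wise fusion (an assumed scheme)

deep = torch.randn(1, 512, 5, 5)       # low-resolution, high-semantic
shallow = torch.randn(1, 256, 10, 10)  # high-resolution, low-semantic
fused = DeconvFusion(512, 256)(deep, shallow)
print(fused.shape)  # torch.Size([1, 256, 10, 10])
```

Applying this module repeatedly from the deepest selected layer upward yields the four fusion layers of different resolutions.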
Further, step 2 is specifically as follows:
for the four feature maps of different resolutions generated in step 1, candidate boxes of different sizes are generated on the four feature maps, and part of the negative samples are removed according to the IoU between each candidate box and the ground-truth box. Two branches are finally obtained on the four feature layers: one performs coordinate regression of the candidate boxes, and the other performs binary classification of the candidate boxes.
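A minimal PyTorch sketch of the two-branch head described above. The shared channel width and the anchor count `k` are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Per-level RPN head: a shared 3x3 conv followed by a binary
    objectness (classification) branch and a 4-coordinate regression
    branch, with k anchors per spatial location."""
    def __init__(self, in_ch, k=6):
        super().__init__()
        self.shared = nn.Conv2d(in_ch, 256, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(256, 2 * k, kernel_size=1)  # object / background
        self.reg = nn.Conv2d(256, 4 * k, kernel_size=1)  # dx, dy, dw, dh

    def forward(self, x):
        h = torch.relu(self.shared(x))
        return self.cls(h), self.reg(h)

fmap = torch.randn(1, 512, 10, 10)        # one of the four feature maps
cls_out, reg_out = RPNHead(512, k=6)(fmap)
print(cls_out.shape, reg_out.shape)
```

One such head would be attached to each of the four feature maps; IoU-based filtering of negative samples happens on the anchors before these outputs are used for training.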
Further, step 3 specifically comprises:
firstly, matching the four fusion layers of different resolutions obtained in step 1 with the positive and negative candidate boxes on the feature maps of different resolutions obtained in step 2;
then, performing an ROI Pooling operation on the four fusion layers to scale the feature map region corresponding to each candidate box to a uniform size;
finally, generating a mask from a fully connected layer to determine which parts of the feature map should be occluded, so that the hard samples thus generated tend to be misclassified by the detector; the mask is automatically adjusted according to the loss function.
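The fully-connected mask generator can be sketched as follows. The pooled size, channel count, and the fixed top-k occlusion ratio are illustrative assumptions; in the patent the mask is adjusted through the loss function during adversarial training rather than by a fixed rule:

```python
import torch
import torch.nn as nn

class OcclusionMaskNet(nn.Module):
    """Generate a binary occlusion mask for a pooled ROI feature via a
    fully connected layer, then zero out the selected spatial cells."""
    def __init__(self, ch=256, size=7, drop_ratio=1.0 / 3):
        super().__init__()
        self.size = size
        self.drop_ratio = drop_ratio
        # one occlusion score per spatial cell of the pooled feature
        self.fc = nn.Linear(ch * size * size, size * size)

    def forward(self, x):
        n = x.size(0)
        scores = self.fc(x.flatten(1))            # (n, size*size)
        k = int(self.size * self.size * self.drop_ratio)
        _, idx = scores.topk(k, dim=1)            # cells most worth hiding
        mask = torch.ones_like(scores)
        mask.scatter_(1, idx, 0.0)                # 0 = occluded cell
        mask = mask.view(n, 1, self.size, self.size)
        return x * mask                           # partially occluded feature

roi_feat = torch.randn(2, 256, 7, 7)              # pooled ROI features
occluded = OcclusionMaskNet()(roi_feat)
print(occluded.shape)  # torch.Size([2, 256, 7, 7])
```

Occluding the cells with the highest scores produces hard examples that the downstream classifier is likely to misjudge, which is the adversarial signal the method exploits.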
Further, in step 4, the feature map occluded by the occlusion mask is regressed to the detection box and class through two fully connected layers and a Softmax classifier; specifically, the final predicted class, probability, and localization results are obtained after screening by non-maximum suppression (NMS) and a confidence threshold.
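The final screening step, confidence thresholding followed by greedy NMS, can be sketched in NumPy as follows; the threshold values are illustrative defaults, not values from the patent:

```python
import numpy as np

def nms_filter(boxes, scores, iou_thr=0.45, score_thr=0.5):
    """Confidence-threshold screening followed by greedy non-maximum
    suppression. Boxes are (x1, y1, x2, y2)."""
    keep_conf = scores >= score_thr              # confidence screening
    boxes, scores = boxes[keep_conf], scores[keep_conf]
    order = scores.argsort()[::-1]               # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the top box with each remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_thr]              # drop heavily overlapping boxes
    return boxes[keep], scores[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.6])
kept_boxes, kept_scores = nms_filter(boxes, scores)
print(kept_scores)  # [0.9 0.6]
```

The second box overlaps the first with IoU ≈ 0.68, above the 0.45 threshold, so it is suppressed; the distant third box survives.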
In conclusion, the invention fuses the high-resolution, low-semantic feature maps with the low-resolution, high-semantic feature maps using deconvolution, improving the small-target detection capability of the SSD algorithm, and introduces adversarial learning to enhance the algorithm's detection of partially occluded targets.
Drawings
Fig. 1 is a network structure diagram of the target detection method based on an end-to-end deep network and adversarial learning according to the present invention.
Fig. 2 is a structural diagram of the deconvolution feature fusion proposed by the present invention.
FIG. 3 is an example of features occluded by the binarization mask according to the present invention.
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to the accompanying drawings.
A target detection method based on an end-to-end deep network and adversarial learning comprises the following steps.
Step 1, a deconvolution structure is introduced into the SSD algorithm: the low-resolution, high-semantic feature maps are fused with the high-resolution, low-semantic feature maps by deconvolution, increasing the feature extraction capability of the lower layers of the network.
Step 1 specifically comprises the following steps:
firstly, the image data is input into an SSD network to extract image features, and four feature maps of different resolutions are selected from the network structure.
Then, the low-resolution, high-semantic feature maps in the SSD are deconvolved, and each deconvolved feature map is fused with the original feature map at the corresponding scale. In this fusion scheme, a deconvolution operation is applied to the feature layer of the deeper level so that its high-level semantic information is propagated to the preceding layer; this information is passed on layer by layer, finally yielding four fusion layers of different resolutions.
Step 2, coarse-grained candidate box information is obtained through an RPN: a binary classification judgment is added after candidate boxes are generated on the basic feature layers, followed by further regression through a conventional regression branch to obtain more accurate detection box information.
Step 2 is specifically as follows:
for the four feature maps of different resolutions generated in step 1, candidate boxes of different sizes are generated on the four feature maps, and part of the negative samples are removed according to the IoU between each candidate box and the ground-truth box. Two branches are finally obtained on the four feature layers: one performs coordinate regression of the candidate boxes, and the other performs binary classification of the candidate boxes.
Step 3, the candidate boxes at different scales screened in step 2 are matched with the multi-scale fusion layers generated in step 1; then all feature map regions corresponding to the candidate boxes are scaled to a fixed size through an ROI Pooling operation, and the features are partially occluded by applying occlusion masks to the feature maps, so as to achieve adversarial learning.
Step 3 specifically comprises the following steps:
firstly, the four fusion layers of different resolutions obtained in step 1 are matched with the positive and negative candidate boxes on the feature maps of different resolutions obtained in step 2.
Then, an ROI Pooling operation is performed on the four fusion layers to scale the feature map region corresponding to each candidate box to a uniform size.
Finally, a mask is generated from a fully connected layer to determine which parts of the feature map should be occluded, so that the hard samples thus generated tend to be misclassified by the detector; the mask is automatically adjusted according to the loss function.
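The ROI Pooling step above, which scales each candidate-box region to a uniform size, can be sketched naively with adaptive max pooling. Box coordinates are assumed to already be in feature-map pixels; production implementations typically use an optimized ROI pooling operator (e.g. torchvision's `roi_pool`) with a spatial scale factor instead:

```python
import torch
import torch.nn.functional as F

def roi_pool_naive(feature, rois, out_size=7):
    """Crop each ROI from the feature map and resize it to a fixed
    out_size x out_size grid via adaptive max pooling.
    rois: list of (x1, y1, x2, y2) in feature-map pixel coordinates."""
    crops = []
    for x1, y1, x2, y2 in rois:
        crop = feature[:, :, y1:y2, x1:x2]          # ROI region
        crops.append(F.adaptive_max_pool2d(crop, out_size))
    return torch.cat(crops, dim=0)                   # one entry per ROI

fm = torch.randn(1, 256, 38, 38)                     # a fusion-layer map
rois = [(0, 0, 10, 12), (5, 5, 30, 30)]              # hypothetical candidate boxes
pooled = roi_pool_naive(fm, rois)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```

Every candidate box, regardless of its original size, is thus represented by a fixed-size tensor, which is what the subsequent mask generator and fully connected layers require.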
Step 4, the feature map occluded by the occlusion mask is regressed to the detection box and class through two fully connected layers and a Softmax classifier.
Specifically, the final predicted class, probability, and localization results are obtained after screening by non-maximum suppression (NMS) and a confidence threshold.
The above description is only a preferred embodiment of the present invention; the scope of the invention is not limited to this embodiment, and equivalent modifications or changes made by those skilled in the art in light of the present disclosure are intended to fall within the scope of the appended claims.
Claims (3)
1. A target detection method based on an end-to-end deep network and adversarial learning, characterized in that it comprises the following steps:
step 1, introducing a deconvolution structure into the SSD algorithm, fusing the low-resolution, high-semantic feature maps with the high-resolution, low-semantic feature maps by deconvolution, and increasing the feature extraction capability of the lower layers of the network;
step 2, obtaining coarse-grained candidate box information through an RPN: a binary classification judgment is added after candidate boxes are generated on the basic feature layers, followed by further regression through a conventional regression branch to obtain more accurate detection box information;
step 2 is specifically as follows:
for the four feature maps of different resolutions generated in step 1, generating candidate boxes of different sizes on the four feature maps, removing part of the negative samples according to the IoU between each candidate box and the ground-truth box, and finally obtaining two branches on the four feature layers, wherein one branch performs coordinate regression of the candidate boxes and the other performs binary classification of the candidate boxes;
step 3, matching the candidate boxes at different scales screened in step 2 with the multi-scale fusion layers generated in step 1; then scaling all feature map regions corresponding to the candidate boxes to a fixed size through an ROI Pooling operation, and partially occluding the features by applying occlusion masks to the feature maps, so as to achieve adversarial learning;
step 3 specifically comprises the following steps:
firstly, matching the four fusion layers of different resolutions obtained in step 1 with the positive and negative candidate boxes on the feature maps of different resolutions obtained in step 2;
then, performing an ROI Pooling operation on the four fusion layers to scale the feature map region corresponding to each candidate box to a uniform size;
then, generating a mask through the fully connected layer to determine which parts of the feature map should be occluded, so that the hard samples thus generated tend to be misclassified by the detector, the mask being automatically adjusted according to the loss function;
step 4, regressing the feature map occluded by the occlusion mask to the detection box and class through two fully connected layers and a Softmax classifier.
2. The method according to claim 1, characterized in that step 1 specifically comprises the following steps:
firstly, inputting the image data into an SSD network to extract image features, and selecting four feature maps of different resolutions from the network structure;
then, deconvolving the low-resolution, high-semantic feature maps in the SSD and fusing each deconvolved feature map with the original feature map at the corresponding scale, wherein a deconvolution operation is applied to the feature layer of the deeper level so that its high-level semantic information is propagated to the preceding layer, this information being passed on layer by layer, finally yielding four fusion layers of different resolutions.
3. The method according to claim 1, characterized in that in step 4, the feature map occluded by the occlusion mask is regressed to the detection box and class through two fully connected layers and a Softmax classifier; specifically, the final predicted class, probability, and localization results are obtained after screening by a non-maximum suppression threshold and a confidence threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910179602.0A (CN109871903B) | 2019-03-11 | 2019-03-11 | Target detection method based on end-to-end deep network and adversarial learning
Publications (2)
Publication Number | Publication Date
---|---
CN109871903A | 2019-06-11
CN109871903B | 2022-08-26
Family
ID=66920150
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852241B (en) * | 2019-11-06 | 2022-08-16 | 西安交通大学 | Small target detection method applied to nursing robot |
CN115082758B (en) * | 2022-08-19 | 2022-11-11 | 深圳比特微电子科技有限公司 | Training method of target detection model, target detection method, device and medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108288075A (en) * | 2018-02-02 | 2018-07-17 | 沈阳工业大学 | A kind of lightweight small target detecting method improving SSD |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102592076B1 (en) * | 2015-12-14 | 2023-10-19 | 삼성전자주식회사 | Appartus and method for Object detection based on Deep leaning, apparatus for Learning thereof |
Legal Events
Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant