CN109902800B - Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network - Google Patents
Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network
- Publication number
- CN109902800B
- Authority
- CN
- China
- Prior art keywords
- backbone network
- stage
- network
- backbone
- feedback
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a general object detection method based on a multi-level backbone network. A multi-level backbone network based on a quasi-feedback neural network is established, and the connections among the backbone networks are used to simulate a feedback mechanism within a deep neural network, which enhances the extraction of general object features and improves the precision of object detection. The invention can be applied to a variety of object detectors: the backbone network of the detector is replaced by the multi-level backbone network provided by the invention, and the network structure of the other parts of the detector does not need to be changed. The method is simple and convenient, and the object detection precision is high.
Description
Technical Field
The invention belongs to the technical field of object detection, relates to computer vision and deep learning technology, and particularly relates to a general object detection method using a multi-stage backbone network based on a quasi-feedback neural network.
Background
General object detection is one of the most fundamental tasks in computer vision and is widely used in practice, for example in automatic driving, intelligent video monitoring, and remote sensing. In recent years, general object detection has advanced greatly thanks to the rapid development of deep neural networks.
Currently, deep-learning-based general object detectors fall into two types. One type is the single-stage detector, such as SSD (Single Shot MultiBox Detector) and RetinaNet (Focal Loss for Dense Object Detection). The other type is the two-stage detector, such as Faster R-CNN (Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks), FPN (Feature Pyramid Networks for Object Detection), Mask R-CNN, and Cascade R-CNN (Cascade R-CNN: Delving into High Quality Object Detection).
However, all of the above detectors use a unidirectional feedforward neural network to detect general objects. During both training and testing, the features pass straight through the whole feedforward network to the output, and the network contains no feedback mechanism. This is because gradient descent in a deep neural network relies on back propagation, so the connections of the network cannot contain a loop. The human visual system, by contrast, does contain feedback loops, and a feedback mechanism can correct errors in the extracted features and further enhance feature extraction. Existing detectors that rely on a unidirectional feedforward neural network therefore face a technical bottleneck in general object detection, and their detection accuracy and precision are limited.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a general object detection method using a multi-stage backbone network based on a quasi-feedback neural network. The method establishes the multi-stage backbone network based on the quasi-feedback neural network and simulates a feedback mechanism in a deep neural network by using the connections among a plurality of backbone networks, thereby enhancing the extraction of general object features and improving the precision of object detection.
The technical scheme of the invention is as follows:
A multi-stage backbone network based on a quasi-feedback neural network is established, and a feedback mechanism is simulated in a deep neural network by utilizing the connections among a plurality of backbone networks, so that the extraction of general object features is enhanced and the object detection precision is improved. The method comprises the following steps:
1) Establishing a multi-stage backbone network based on the quasi-feedback neural network.
The multi-stage backbone network may contain 2, 3, ... backbone networks (levels). The backbone networks have the same structure, and each may be ResNet (residual network) or ResNeXt (multi-branch residual network);
each backbone network comprises a plurality of (typically 4) convolutional blocks (also called backbone network stages), and each stage comprises a plurality of convolutional layers.
For each backbone network, the output of each stage is fed as input to the same stage of the next-level backbone network, forming the quasi-feedback connection.
The structure of the quasi-feedback connection comprises a 1 × 1 convolutional layer and an up-sampling operation. The 1 × 1 convolutional layer aligns the number of channels of the output features of a given stage of the previous-level backbone network with the number of channels of the input features of the corresponding stage of the next-level backbone network, and the up-sampling operation aligns the spatial sizes of those two sets of features. The quasi-feedback connection of the lowest stage does not require an up-sampling operation, because the input and output features of that stage have the same spatial size.
The output of the last-level backbone network is taken as the output of the multi-stage backbone network.
2) Inputting a general object image to be detected into a detector, such as Mask R-CNN or Cascade R-CNN;
3) Sending the general object image into the multi-stage backbone network based on the quasi-feedback neural network established in step 1) to extract features, wherein the output of the multi-stage backbone network is the extracted features;
4) Feeding the features extracted by the multi-stage backbone network into the detector modules that follow the backbone network, which may be an RPN (region proposal network) or detection heads, depending on the specific detector;
5) Taking the output of the modules that follow the multi-stage backbone network as the detection result of the detector.
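The data flow of steps 1) to 5) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `run_multilevel_backbone`, `levels` and `connect` are hypothetical, every backbone is assumed to start from the same input image, and scalars stand in for feature maps so that the control flow is easy to follow.

```python
def run_multilevel_backbone(image, levels, connect):
    """Run several identically structured backbones one after another.

    levels:  list of backbones; each backbone is a list of stage callables
    connect: hypothetical merge function (prev_stage_output, stage_input)
             that returns the new stage input
    Returns the per-stage outputs of the last backbone.
    """
    prev_outputs = None
    for stages in levels:
        x = image  # assumption: every backbone starts from the same input
        outputs = []
        for index, stage in enumerate(stages):
            if prev_outputs is not None:
                # quasi-feedback connection: same stage, previous backbone
                x = connect(prev_outputs[index], x)
            x = stage(x)
            outputs.append(x)
        prev_outputs = outputs
    # the last backbone's outputs are the output of the whole network
    return prev_outputs


# Toy stand-ins: scalars instead of feature maps.
def toy_stage(x):
    return x + 1


def toy_connect(prev_out, x):
    return x + prev_out


two_level = [[toy_stage, toy_stage], [toy_stage, toy_stage]]
features = run_multilevel_backbone(0, two_level, toy_connect)
```

Because each backbone only reads the outputs of the one before it, the unrolled computation contains no loop, so ordinary back propagation still applies. The returned features would then be passed to the RPN or detection heads of the chosen detector.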
The general object detection method can be widely applied to detectors used in practical applications such as automatic driving, intelligent video monitoring, and remote sensing object recognition, improving the precision of object detection.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a general object detection method using a multi-stage backbone network based on a quasi-feedback neural network. The multi-stage backbone network based on the quasi-feedback neural network is established, and a feedback mechanism is simulated in the neural network by utilizing the connections between the backbone networks, so that the extraction of general object features is enhanced and the accuracy of object detection is improved.
The method departs from the conventional practice of using a purely feedforward network and establishes a multi-level backbone network based on a quasi-feedback neural network to extract features. It can be applied to a variety of object detectors: the backbone network of the detector is replaced by the multi-level backbone network provided by the invention, and the network structure of the other parts of the detector does not need to be changed. The method is simple and convenient, and the object detection precision is high. Experiments on MSCOCO show that, with an input image size of 800 × 1333 in both training and testing, modifying the backbone network of the detector to the corresponding two-level backbone network (e.g., replacing the ResNet101 backbone network with a two-level ResNet101 backbone network, and replacing the ResNeXt152 backbone network with a two-level ResNeXt152 backbone network) can increase the box mAP on the test-dev set from 39.4% to 41.0% for ResNet101-based FPN, from 40.1% to 41.8% for ResNet101-based Mask R-CNN, from 42.8% to 44.3% for ResNet101-based Cascade R-CNN, and from 48.3% to 50.0% for ResNeXt152-based Cascade Mask R-CNN. Modifying the backbone network of the detector to the corresponding three-level backbone network (e.g., replacing the ResNet101 backbone network with a three-level ResNet101 backbone network, and replacing the ResNeXt152 backbone network with a three-level ResNeXt152 backbone network) can increase the box mAP from 39.4% to 42.0% for ResNet101-based FPN and from 48.3% to 51.2% for ResNeXt152-based Cascade Mask R-CNN.
(Note: MSCOCO is a large-scale data set covering tasks such as object detection and segmentation, see http://cocodataset.org/#home. The box mAP value is an index for measuring detection performance, see http://cocodataset.org/#detection-eval.)
Drawings
Fig. 1 is a flow chart of the general object detection method provided by the present invention.
Fig. 2 is a schematic diagram of a conventional backbone network structure.
Fig. 3 is a schematic diagram of a connection structure between two adjacent backbone networks according to the present invention.
Fig. 4 is a schematic structural diagram of a feedback connection in an embodiment of the present invention.
Detailed Description
The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.
The present invention proposes a multi-level backbone network for general object detection, as shown in Fig. 1. Existing general object detection frameworks have only one backbone network, as shown in Fig. 2; the most commonly used backbone network at present is ResNet (residual network). To introduce feedback into general object detection, the embodiment of the invention uses a plurality of backbone networks as the detection network and simulates a feedback mechanism in a deep neural network through connections among these backbone networks, so as to enhance the extraction of features. The backbone networks are structurally identical and can be either ResNet (residual network) or ResNeXt (multi-branch residual network). Each backbone network has a plurality of convolutional blocks (stages), and each convolutional block contains a plurality of convolutional layers. The output of each convolutional block of a given level of backbone network is connected to the input of the same convolutional block of the next level of backbone network to form a quasi-feedback connection, as shown in Fig. 3. The structure of the quasi-feedback connection (also called feedback connection for convenience) is shown in Fig. 4. The connection comprises a 1 × 1 convolutional layer and an up-sampling operation: the 1 × 1 convolutional layer aligns the number of channels of the output features of a convolutional block of the previous-level backbone network with the number of channels of the input features of the corresponding convolutional block of the next-level backbone network, and the up-sampling operation aligns the spatial sizes of the two. Note that the feedback connection of the lowest stage does not require an up-sampling operation, because the input and output features of that stage have the same spatial size.
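The feedback connection described above can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions rather than the embodiment itself: feature maps are nested lists indexed as [channel][row][column], the 1 × 1 convolution has no bias, up-sampling is nearest-neighbour with an integer factor, and the aligned feature is merged into the next backbone's stage input by element-wise addition (the text above does not fix the merge operation). The function names are hypothetical.

```python
def conv1x1(feat, weight):
    """1x1 convolution: mix channels independently at every pixel.

    feat:   list of C_in channels, each an H x W nested list
    weight: C_out x C_in nested list (bias omitted for brevity)
    """
    height, width = len(feat[0]), len(feat[0][0])
    return [
        [[sum(w * feat[i][y][x] for i, w in enumerate(row))
          for x in range(width)]
         for y in range(height)]
        for row in weight
    ]


def upsample_nearest(feat, scale):
    """Nearest-neighbour up-sampling of every channel by an integer factor."""
    return [
        [[row[x // scale] for x in range(len(row) * scale)]
         for row in channel for _ in range(scale)]
        for channel in feat
    ]


def quasi_feedback(prev_out, next_in, weight, scale):
    """Align prev_out to next_in (channels, then size) and merge them."""
    aligned = conv1x1(prev_out, weight)
    if scale > 1:  # the lowest stage would use scale == 1 (no up-sampling)
        aligned = upsample_nearest(aligned, scale)
    # assumption: merge by element-wise addition
    return [
        [[a + b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(aligned, next_in)
    ]


# Previous backbone: 2 channels of size 1 x 2; next stage input: 1 channel of size 2 x 4.
prev_out = [[[1.0, 2.0]], [[3.0, 4.0]]]
next_in = [[[0.0] * 4, [0.0] * 4]]
merged = quasi_feedback(prev_out, next_in, weight=[[0.5, 0.5]], scale=2)
```

In this toy case the 1 × 1 convolution reduces two channels to one and the up-sampling doubles the height and width, which mirrors how a stage output (more channels, smaller size) is matched to the same stage's input (fewer channels, larger size).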
As the flow chart in Fig. 1 shows, for a detection network to be improved, its ordinary backbone network (such as ResNet or ResNeXt) is directly replaced by the multi-stage backbone network of the invention.
Taking FPN (Feature Pyramid Networks for Object Detection) as an example, the ResNet101 part of the network is replaced with the two-level ResNet101 backbone network of the invention, that is, the first-level and second-level backbone networks are both ResNet101. After this improvement, with training and test images of size 800 × 1333, the object detection mAP on the MSCOCO test-dev set is improved. When the ResNet101 part of the network is instead replaced with the three-level ResNet101 backbone network structure of the invention, that is, the first-level, second-level, and third-level backbone networks are all ResNet101, the object detection mAP on the MSCOCO test-dev set is also improved.
Specifically, the experimental results on MSCOCO show that, with an input image size of 800 × 1333 in both training and testing, modifying the backbone network of the detector to the corresponding two-level backbone network (e.g., replacing the ResNet101 backbone network with a two-level ResNet101 backbone network, and replacing the ResNeXt152 backbone network with a two-level ResNeXt152 backbone network) can increase the box mAP on the test-dev set from 39.4% to 41.0% for ResNet101-based FPN, from 40.1% to 41.8% for ResNet101-based Mask R-CNN, from 42.8% to 44.3% for ResNet101-based Cascade R-CNN, and from 48.3% to 50.0% for ResNeXt152-based Cascade Mask R-CNN. Modifying the backbone network of the detector to the corresponding three-level backbone network (e.g., replacing the ResNet101 backbone network with a three-level ResNet101 backbone network, and replacing the ResNeXt152 backbone network with a three-level ResNeXt152 backbone network) can increase the box mAP from 39.4% to 42.0% for ResNet101-based FPN and from 48.3% to 51.2% for ResNeXt152-based Cascade Mask R-CNN.
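For quick comparison, the absolute gains implied by the figures above can be computed as follows. The numbers are copied from the text (backbone names as in the parenthetical examples, ResNet101 and ResNeXt152); the dictionary layout and variable names are illustrative assumptions.

```python
# (detector, backbone, number of backbone levels) -> (baseline box mAP, improved box mAP)
results = {
    ("FPN", "ResNet101", 2): (39.4, 41.0),
    ("Mask R-CNN", "ResNet101", 2): (40.1, 41.8),
    ("Cascade R-CNN", "ResNet101", 2): (42.8, 44.3),
    ("Cascade Mask R-CNN", "ResNeXt152", 2): (48.3, 50.0),
    ("FPN", "ResNet101", 3): (39.4, 42.0),
    ("Cascade Mask R-CNN", "ResNeXt152", 3): (48.3, 51.2),
}

# Absolute gain in percentage points, rounded to one decimal place.
gains = {key: round(after - before, 1) for key, (before, after) in results.items()}

for (detector, backbone, levels), gain in gains.items():
    print(f"{detector} ({backbone}, {levels}-level): +{gain} box mAP")
```

On these figures the two-level variants gain 1.5 to 1.7 points, and the three-level variants gain 2.6 and 2.9 points over the same single-backbone baselines.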
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.
Claims (8)
1. A general object detection method based on a multi-level backbone network, wherein a multi-level backbone network based on a quasi-feedback neural network is established, and a feedback mechanism is simulated in a deep neural network by utilizing the connections among a plurality of backbone networks, so that the extraction of general object features is enhanced and the object detection precision is improved; the method comprises the following steps:
1) establishing a multi-stage backbone network based on a quasi-feedback neural network; each level consists of one backbone network; the backbone networks of all levels have the same structure; each backbone network comprises a plurality of stages; each stage comprises a plurality of convolutional layers; for each backbone network, the output of each stage is sent as input to the same stage of the next-level backbone network, forming a quasi-feedback connection;
2) collecting an image of a general object to be detected, and inputting the image into a detector;
3) sending the image into the multi-stage backbone network based on the quasi-feedback neural network established in the step 1) to extract features, wherein the output of the multi-stage backbone network is the extracted features;
4) the features extracted from the multilevel backbone network are sent to a subsequent detector module of the multilevel backbone network for detection;
5) taking the output of the subsequent detector module of the multilevel backbone network as the detection result of the detector.
2. The method for detecting the universal object based on the multi-level backbone network according to claim 1, wherein the detection method is applied to an automatic driving detector, an intelligent video monitoring detector or an object remote sensing identification detector.
3. The method according to claim 1, wherein the subsequent detector module of the multi-stage backbone network is a region proposal network (RPN).
4. The method for detecting the universal object based on the multi-stage backbone network as claimed in claim 1, wherein each level of backbone network employs a residual network ResNet or a multi-branch residual network ResNeXt.
5. The method for detecting a generic object based on a multi-stage backbone network as claimed in claim 1, wherein each level of backbone network comprises 4 stages.
6. The method according to claim 1, wherein the structure of the quasi-feedback connection comprises a 1 × 1 convolutional layer and an upsampling operation; the 1 × 1 convolutional layer aligns the number of channels of the output features of a stage of the previous-level backbone network with the number of channels of the input features of the corresponding stage of the next-level backbone network, and the upsampling operation aligns the spatial sizes of the features of the corresponding stages of the two levels of backbone networks.
7. The method for detecting a generic object based on a multi-stage backbone network as claimed in claim 6, wherein the input features and the output features of the lowest stage of each level of backbone network have the same spatial size, and the quasi-feedback connection of that stage does not include an upsampling operation.
8. The method as claimed in claim 1, wherein the detector includes but is not limited to Mask R-CNN or Cascade R-CNN.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910058187.3A CN109902800B (en) | 2019-01-22 | 2019-01-22 | Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910058187.3A CN109902800B (en) | 2019-01-22 | 2019-01-22 | Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109902800A (en) | 2019-06-18 |
CN109902800B (en) | 2020-11-27 |
Family
ID=66943968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910058187.3A Active CN109902800B (en) | 2019-01-22 | 2019-01-22 | Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902800B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144407A (en) * | 2019-12-22 | 2020-05-12 | 浪潮(北京)电子信息产业有限公司 | Target detection method, system, device and readable storage medium |
CN111161260A (en) * | 2020-01-02 | 2020-05-15 | 中冶赛迪重庆信息技术有限公司 | Hot-rolled strip steel surface defect detection method and device based on deep learning |
CN111739062B (en) * | 2020-06-05 | 2021-05-25 | 北京航空航天大学 | Target detection method and system based on feedback mechanism |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301400A (en) * | 2017-06-23 | 2017-10-27 | 深圳市唯特视科技有限公司 | A kind of semantic semi-supervised video picture segmentation method being oriented to |
CN108399362A (en) * | 2018-01-24 | 2018-08-14 | 中山大学 | A kind of rapid pedestrian detection method and device |
CN108550162A (en) * | 2018-03-27 | 2018-09-18 | 清华大学 | A kind of object detecting method based on deeply study |
CN109086678A (en) * | 2018-07-09 | 2018-12-25 | 天津大学 | A kind of pedestrian detection method extracting image multi-stage characteristics based on depth supervised learning |
CN109165660A (en) * | 2018-06-20 | 2019-01-08 | 扬州大学 | A kind of obvious object detection method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
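Cascade R-CNN: Delving into High Quality Object Detection; Cai, Z. et al.; CVPR; 2018-12-31; full text * |
A Survey of Image Object Classification and Detection Algorithms; Huang Kaiqi et al.; Chinese Journal of Computers (《计算机学报》); 2014-06-30; Vol. 37, No. 6; full text * |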
Also Published As
Publication number | Publication date |
---|---|
CN109902800A (en) | 2019-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110084124B (en) | Feature enhancement target detection method based on feature pyramid network | |
CN109902800B (en) | Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network | |
CN110633610B (en) | Student state detection method based on YOLO | |
US10867244B2 (en) | Method and apparatus for machine learning | |
JP6209879B2 (en) | Convolutional neural network classifier system, training method, classification method and use thereof | |
CN110390340B (en) | Feature coding model, training method and detection method of visual relation detection model | |
CN104217216A (en) | Method and device for generating detection model, method and device for detecting target | |
CN107423278B (en) | Evaluation element identification method, device and system | |
CN111507370A (en) | Method and device for obtaining sample image of inspection label in automatic labeling image | |
US20230360390A1 (en) | Transmission line defect identification method based on saliency map and semantic-embedded feature pyramid | |
CN111949480B (en) | Log anomaly detection method based on component perception | |
JP2020024534A (en) | Image classifier and program | |
CN109144852A (en) | Scan method, device, computer equipment and the storage medium of static code | |
CN114758255A (en) | Unmanned aerial vehicle detection method based on YOLOV5 algorithm | |
KR101825689B1 (en) | Object recognition apparatus, learning method thereof and object recognition method using the same | |
CN117115715A (en) | Video anomaly detection method based on combination of stream reconstruction and frame prediction | |
CN116481791A (en) | Steel structure connection stability monitoring system and method thereof | |
CN113128412B (en) | Fire trend prediction method based on deep learning and fire monitoring video | |
KR20210011822A (en) | Method of detecting abnormal log based on artificial intelligence and system implementing thereof | |
CN111310611A (en) | Method for detecting cell visual field map and storage medium | |
CN116797586A (en) | Automatic paper cup defect detection method and system | |
CN115143128B (en) | Fault diagnosis method and system for small-sized submersible electric pump | |
CN116523711A (en) | Education supervision system and method based on artificial intelligence | |
CN115439446A (en) | Appearance defect detection method and device, storage medium and electronic equipment | |
CN115546682A (en) | Dynamic smoke detection method based on video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||