CN109902800B - Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network - Google Patents

Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network

Info

Publication number
CN109902800B
Authority
CN
China
Prior art keywords
backbone network
stage
network
backbone
feedback
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910058187.3A
Other languages
Chinese (zh)
Other versions
CN109902800A (en)
Inventor
刘玉栋
王勇涛
汤帜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201910058187.3A
Publication of CN109902800A
Application granted
Publication of CN109902800B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for detecting general objects based on a multi-level backbone network. A multi-level backbone network based on a quasi-feedback neural network is established, and a feedback mechanism is simulated in a deep neural network by using the connections among a plurality of backbone networks, so that the extraction of general object features is enhanced and the precision of object detection is improved. The invention can be applied to various object detectors: the backbone network of the detector is replaced with the multi-level backbone network provided by the invention, and the network structure of the other parts of the detector does not need to be changed. The method is simple and convenient, and the object detection precision is high.

Description

Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network
Technical Field
The invention belongs to the technical field of object detection, relates to computer vision and deep learning technology, and particularly relates to a method for detecting general objects using a multi-stage backbone network based on a quasi-feedback neural network.
Background
General object detection is one of the most basic tasks in the field of computer vision and is widely used in practice, for example in automatic driving, intelligent video monitoring, and remote sensing. In recent years, general object detection has advanced greatly thanks to the rapid development of deep neural networks.
Currently, general object detectors based on deep learning fall into two classes. One class is single-stage detectors, such as SSD (SSD: Single Shot MultiBox Detector) and RetinaNet (Focal Loss for Dense Object Detection). The other class is two-stage detectors, such as Faster R-CNN (Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks), FPN (Feature Pyramid Networks for Object Detection), Mask R-CNN, and Cascade R-CNN (Cascade R-CNN: Delving into High Quality Object Detection).
However, all of the above detectors use a unidirectional feedforward neural network to detect general objects: during training and testing, features pass straight through the whole feedforward network and are output, and the network contains no feedback mechanism. This is because gradient descent in deep neural networks relies on back propagation, so the network connections cannot contain a loop. The human visual system, by contrast, does contain feedback loops, and a feedback mechanism can correct errors in the extracted features and further enhance feature extraction. Existing detectors that rely on a unidirectional feedforward neural network therefore face a technical bottleneck in general object detection, and their detection accuracy and precision are limited.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method for detecting general objects using a multi-stage backbone network based on a quasi-feedback neural network. The method establishes the multi-stage backbone network based on the quasi-feedback neural network and simulates a feedback mechanism in a deep neural network by using the connections among a plurality of backbone networks, thereby enhancing the extraction of general object features and improving the precision of object detection.
The technical scheme of the invention is as follows:
A multi-stage backbone network based on a quasi-feedback neural network is established, and a feedback mechanism is simulated in a deep neural network by utilizing the connections among a plurality of backbone networks, so that the extraction of general object features is enhanced and the object detection precision is improved. The method comprises the following steps:
1) Establishing a multi-stage backbone network based on the quasi-feedback neural network.
The number of backbone network levels can be 2, 3, …; the backbone networks have the same structure, and each can be ResNet (residual network) or ResNeXt (multi-branch residual network).
Each backbone network comprises a plurality of (typically 4) convolutional blocks (also called backbone network stages), each stage comprising a plurality of convolutional layers.
The output of each stage of each backbone network is fed as input to the same stage of the next-level backbone network, forming the quasi-feedback connection.
The quasi-feedback connection comprises a 1 × 1 convolutional layer and an upsampling operation. The 1 × 1 convolutional layer aligns the number of channels of the output features of a given stage of the previous-level backbone network with the number of channels of the input features of the corresponding stage of the next-level backbone network, and the upsampling operation aligns the spatial sizes of those two features. The quasi-feedback connection of the lowest stage does not require an upsampling operation, because the input and output features of that stage have the same spatial size.
The output of the last-level backbone network is taken as the output of the multi-stage backbone network.
2) Inputting a general object image to be detected into a detector, such as Mask R-CNN or Cascade R-CNN.
3) Sending the general object image into the multi-stage backbone network based on the quasi-feedback neural network established in step 1) to extract features; the output of the multi-stage backbone network is the extracted features.
4) Feeding the features extracted by the multi-stage backbone network into the modules that follow the backbone network, which may be a region proposal network (RPN) or a detection head, depending on the specific detector.
5) Taking the output of the modules that follow the multi-stage backbone network as the detection result of the detector.
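The data flow of step 1) can be sketched at the level of feature shapes. The sketch below is only an illustration under stated assumptions, not the patented implementation: the per-stage channel counts and strides are ResNet-101-style values chosen for the example, the names `FeatureShape`, `STAGES`, `run_stage`, `quasi_feedback`, and `multi_level_backbone` are hypothetical, and the text does not say how the fed-back feature is fused with the stage input (an element-wise sum is assumed, which requires identical shapes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureShape:
    channels: int
    height: int
    width: int


# (in_channels, out_channels, spatial stride) for the 4 stages.
# ResNet-101-style values, assumed here only for illustration.
STAGES = [(64, 256, 1), (256, 512, 2), (512, 1024, 2), (1024, 2048, 2)]


def run_stage(x, spec):
    """Shape effect of one convolutional block (stage)."""
    c_in, c_out, stride = spec
    assert x.channels == c_in
    return FeatureShape(c_out, x.height // stride, x.width // stride)


def quasi_feedback(y, spec):
    """Shape effect of the quasi-feedback connection:
    1x1 convolution (channel alignment), then upsampling (spatial alignment).
    For the lowest stage the stride is 1, so no upsampling takes place."""
    c_in, c_out, stride = spec
    assert y.channels == c_out
    return FeatureShape(c_in, y.height * stride, y.width * stride)


def multi_level_backbone(stem_output, num_levels):
    """Run num_levels identical backbones; return the stage outputs of the last one."""
    previous = None  # stage outputs of the previous-level backbone
    for _ in range(num_levels):
        x = stem_output
        current = []
        for k, spec in enumerate(STAGES):
            if previous is not None:
                fed_back = quasi_feedback(previous[k], spec)
                # Assumed fusion: element-wise sum, so shapes must match exactly.
                assert fed_back == x
            x = run_stage(x, spec)
            current.append(x)
        previous = current
    return previous
```

The example sizes must be divisible by the total stride (8 across the four stages) for the shapes to line up exactly; a real network would have to handle rounding when the input size is not divisible.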
The general object detection method can be widely applied to detectors used in practical applications such as automatic driving, intelligent video monitoring, and remote sensing object recognition, improving the precision of object detection.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method for detecting a general object of a multi-stage backbone network based on a quasi-feedback neural network, which is characterized in that the multi-stage backbone network based on the quasi-feedback neural network is established, and a feedback mechanism is simulated in the neural network by utilizing the connection between the backbone networks, so that the extraction of the characteristics of the general object is enhanced, and the accuracy of object detection is improved.
The method departs from the conventional practice of using a purely feedforward network: it extracts features with a multi-level backbone network based on a quasi-feedback neural network. It can be applied to various object detectors: only the backbone network of the detector is replaced with the multi-level backbone network provided by the invention, and the network structure of the other parts of the detector does not need to be changed, so the method is simple and convenient and the object detection precision is high. Experiments on MSCOCO, with an input image size of 800 × 1333 in both training and testing, show the following. After the backbone network of the detector is replaced with the corresponding two-level backbone network (e.g., the ResNet101 backbone network is replaced with the two-level ResNet101 backbone network, and the ResNeXt152 backbone network with the two-level ResNeXt152 backbone network), the box mAP on the test-dev set increases from 39.4% to 41.0% for FPN based on ResNet101, from 40.1% to 41.8% for Mask R-CNN based on ResNet101, from 42.8% to 44.3% for Cascade R-CNN based on ResNet101, and from 48.3% to 50.0% for Cascade Mask R-CNN based on ResNeXt152. After the backbone network of the detector is replaced with the corresponding three-level backbone network (e.g., the ResNet101 backbone network is replaced with the three-level ResNet101 backbone network, and the ResNeXt152 backbone network with the three-level ResNeXt152 backbone network), the box mAP increases from 39.4% to 42.0% for FPN based on ResNet101 and from 48.3% to 51.2% for Cascade Mask R-CNN based on ResNeXt152.
(Note: MSCOCO is a large-scale dataset covering tasks such as object detection and segmentation; see http://cocodataset.org/#home. Box mAP is a metric for measuring detection performance; see http://cocodataset.org/#detection-eval.)
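To make the size of the reported improvements easier to compare, the short sketch below computes the absolute box mAP gain for each detector from the figures quoted above. The numbers are copied from the text; the dictionary and function names are illustrative, and the last detector is labelled with ResNeXt152 following the backbone named in the text's parenthetical examples.

```python
# Box mAP (%) on MSCOCO test-dev as quoted in the text: (baseline, multi-level backbone)
TWO_LEVEL = {
    "FPN, ResNet101": (39.4, 41.0),
    "Mask R-CNN, ResNet101": (40.1, 41.8),
    "Cascade R-CNN, ResNet101": (42.8, 44.3),
    "Cascade Mask R-CNN, ResNeXt152": (48.3, 50.0),
}
THREE_LEVEL = {
    "FPN, ResNet101": (39.4, 42.0),
    "Cascade Mask R-CNN, ResNeXt152": (48.3, 51.2),
}


def absolute_gains(results):
    """Return the gain in percentage points, rounded to one decimal place."""
    return {name: round(after - before, 1) for name, (before, after) in results.items()}


if __name__ == "__main__":
    for label, table in (("two-level", TWO_LEVEL), ("three-level", THREE_LEVEL)):
        for name, gain in absolute_gains(table).items():
            print(f"{label:11s} {name:32s} +{gain} mAP")
```

By these figures, the two-level backbone adds between 1.5 and 1.7 points of box mAP, and the three-level backbone adds 2.6 to 2.9 points over the single-backbone baseline.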
Drawings
Fig. 1 is a flow chart diagram of a general object detection method provided by the present invention.
Fig. 2 is a schematic diagram of a conventional backbone network structure.
Fig. 3 is a schematic diagram of a connection structure between two adjacent backbone networks according to the present invention.
Fig. 4 is a schematic structural diagram of a feedback connection in an embodiment of the present invention.
Detailed Description
The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.
The present invention proposes a multi-level backbone network for general object detection, as shown in Fig. 1. In the existing general object detection framework there is only one backbone network, as shown in Fig. 2; the most commonly used backbone network at present is ResNet (residual network). To introduce feedback into general object detection, the embodiment of the invention uses a plurality of backbone networks as the detection network and simulates a feedback mechanism in a deep neural network through connections among these backbone networks, so as to enhance feature extraction. The backbone networks are structurally identical and can be either ResNet (residual network) or ResNeXt (multi-branch residual network). Each backbone network has a plurality of convolutional blocks (stages), each containing a plurality of convolutional layers. The output of each convolutional block of each backbone network is connected to the input of the same convolutional block of the next-level backbone network to form a quasi-feedback connection, as shown in Fig. 3. The structure of the quasi-feedback connection (also called the feedback connection for convenience) is shown in Fig. 4. It comprises a 1 × 1 convolutional layer and an upsampling operation: the 1 × 1 convolutional layer aligns the number of channels of the output features of a given convolutional block of the previous-level backbone network with the number of channels of the input features of the corresponding convolutional block of the next-level backbone network, and the upsampling operation aligns the spatial sizes of the two. Note that the feedback connection of the lowest stage does not require an upsampling operation, because the spatial sizes of its input and output features are the same.
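The two operations in the feedback connection can be illustrated numerically on a tiny feature map. This is a minimal sketch in plain Python rather than an implementation from the patent: features are nested lists indexed as `[channel][row][column]`, the 1 × 1 convolution has no bias, and nearest-neighbour interpolation is assumed for the upsampling because the text does not name an interpolation method. All function names are hypothetical.

```python
def conv1x1(feature, weight):
    """1x1 convolution without bias.
    feature: [C_in][H][W]; weight: [C_out][C_in]; returns [C_out][H][W]."""
    c_in = len(feature)
    h, w = len(feature[0]), len(feature[0][0])
    return [
        [[sum(row[c] * feature[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for row in weight
    ]


def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of each channel by an integer factor."""
    return [
        [[row[j // factor] for j in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in feature
    ]


def feedback_connection(feature, weight, factor):
    """Align channels with a 1x1 convolution, then align spatial size.
    With factor == 1 (the lowest stage) the upsampling step is skipped."""
    aligned = conv1x1(feature, weight)
    return aligned if factor == 1 else upsample_nearest(aligned, factor)


# Two input channels of size 1x2, reduced to one channel and upsampled 2x.
stage_output = [[[1, 2]], [[3, 4]]]
weights = [[1, 1]]  # one output channel that sums the two input channels
fed_back = feedback_connection(stage_output, weights, factor=2)
# fed_back == [[[4, 4, 6, 6], [4, 4, 6, 6]]]
```

The 1 × 1 convolution only mixes channels at each pixel, which is why it can change the channel count without touching the spatial layout; the upsampling then restores the resolution that the stage's own downsampling removed.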
FIG. 1 is a flow chart of a general object detection method provided by the present invention; for the detection network to be improved, a general backbone network (such as ResNet, ResNeXt) is directly replaced by the multi-stage backbone network in the invention.
MSCOCO is a large-scale dataset covering tasks such as object detection and segmentation; see http://cocodataset.org/#home. Box mAP is a metric for measuring detection performance; see http://cocodataset.org/#detection-eval.
Taking FPN (Feature Pyramid Networks for Object Detection) as an example, the ResNet101 part of the network is replaced with the two-level ResNet101 backbone network of the invention, i.e., the first-level and second-level backbone networks are both ResNet101. With training and test images of size 800 × 1333, this improves the object detection mAP on the MSCOCO test-dev set. When the ResNet101 part of the network is instead replaced with the three-level ResNet101 backbone network of the invention, i.e., the first-level, second-level, and third-level backbone networks are all ResNet101, the object detection mAP on the MSCOCO test-dev set is likewise improved.
Specifically, the experimental results on MSCOCO, with an input image size of 800 × 1333 in both training and testing, are as follows. After the backbone network of the detector is replaced with the corresponding two-level backbone network (e.g., the ResNet101 backbone network is replaced with the two-level ResNet101 backbone network, and the ResNeXt152 backbone network with the two-level ResNeXt152 backbone network), the box mAP on the test-dev set increases from 39.4% to 41.0% for FPN based on ResNet101, from 40.1% to 41.8% for Mask R-CNN based on ResNet101, from 42.8% to 44.3% for Cascade R-CNN based on ResNet101, and from 48.3% to 50.0% for Cascade Mask R-CNN based on ResNeXt152. After the backbone network of the detector is replaced with the corresponding three-level backbone network (e.g., the ResNet101 backbone network is replaced with the three-level ResNet101 backbone network, and the ResNeXt152 backbone network with the three-level ResNeXt152 backbone network), the box mAP increases from 39.4% to 42.0% for FPN based on ResNet101 and from 48.3% to 51.2% for Cascade Mask R-CNN based on ResNeXt152.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Claims (8)

1. A general object detection method based on a multi-level backbone network, characterized in that a multi-level backbone network based on a quasi-feedback neural network is established, and a feedback mechanism is simulated in a deep neural network by utilizing the connections among a plurality of backbone networks, so that the extraction of general object features is enhanced and the object detection precision is improved; the method comprises the following steps:
1) establishing a multi-stage backbone network based on a quasi-feedback neural network; there is one backbone network at each level; the structure of each level of backbone network is the same; each backbone network comprises a plurality of stages; each stage comprises a plurality of convolutional layers; for each backbone network, the output of each stage is used as input and is sent to the same stage of the next-level backbone network to form a quasi-feedback connection;
2) collecting an image of a general object to be detected, and inputting the image into a detector;
3) sending the image into the multi-stage backbone network based on the quasi-feedback neural network established in the step 1) to extract features, wherein the output of the multi-stage backbone network is the extracted features;
4) the features extracted from the multilevel backbone network are sent to a subsequent detector module of the multilevel backbone network for detection;
5) and taking the output of the subsequent detector module of the multilevel backbone network as the detection result of the detector.
2. The method for detecting the universal object based on the multi-level backbone network according to claim 1, wherein the detection method is applied to an automatic driving detector, an intelligent video monitoring detector or an object remote sensing identification detector.
3. The method according to claim 1, wherein the subsequent detector module of the multi-stage backbone network is a region proposal network (RPN).
4. The method for detecting the universal object based on the multi-stage backbone network as claimed in claim 1, wherein each level of the backbone network employs a residual network ResNet or a multi-branch residual network ResNeXt.
5. The method for detecting a generic object based on a multi-stage backbone network as claimed in claim 1, wherein each stage of the backbone network comprises 4 stages.
6. The method according to claim 1, wherein the structure of the quasi-feedback connection comprises a 1 × 1 convolutional layer and an upsampling operation; the 1 × 1 convolutional layer aligns the number of channels of the output features of a stage of the previous-level backbone network with the number of channels of the input features of the corresponding stage of the next-level backbone network, and the upsampling operation aligns the spatial sizes of the features of the corresponding stages of the two backbone networks.
7. The method for detecting a generic object based on a multi-stage backbone network as claimed in claim 6, wherein the input features and the output features of the lowest stage of each stage of the backbone network have the same spatial size; the quasi-feedback connection does not include an upsampling operation.
8. The method as claimed in claim 1, wherein the detector includes but is not limited to Mask R-CNN or Cascade R-CNN.
CN201910058187.3A 2019-01-22 2019-01-22 Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network Active CN109902800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910058187.3A CN109902800B (en) 2019-01-22 2019-01-22 Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910058187.3A CN109902800B (en) 2019-01-22 2019-01-22 Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network

Publications (2)

Publication Number Publication Date
CN109902800A CN109902800A (en) 2019-06-18
CN109902800B CN109902800B (en) 2020-11-27

Family

ID=66943968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910058187.3A Active CN109902800B (en) 2019-01-22 2019-01-22 Method for detecting general object by using multi-stage backbone network based on quasi-feedback neural network

Country Status (1)

Country Link
CN (1) CN109902800B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144407A (en) * 2019-12-22 2020-05-12 浪潮(北京)电子信息产业有限公司 Target detection method, system, device and readable storage medium
CN111161260A (en) * 2020-01-02 2020-05-15 中冶赛迪重庆信息技术有限公司 Hot-rolled strip steel surface defect detection method and device based on deep learning
CN111739062B (en) * 2020-06-05 2021-05-25 北京航空航天大学 Target detection method and system based on feedback mechanism

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301400A (en) * 2017-06-23 2017-10-27 深圳市唯特视科技有限公司 A kind of semantic semi-supervised video picture segmentation method being oriented to
CN108399362A (en) * 2018-01-24 2018-08-14 中山大学 A kind of rapid pedestrian detection method and device
CN108550162A (en) * 2018-03-27 2018-09-18 清华大学 A kind of object detecting method based on deeply study
CN109165660A (en) * 2018-06-20 2019-01-08 扬州大学 A kind of obvious object detection method based on convolutional neural networks
CN109086678A (en) * 2018-07-09 2018-12-25 天津大学 A kind of pedestrian detection method extracting image multi-stage characteristics based on depth supervised learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cascade r-cnn: Delving into high quality object detection;Cai, Z等;《CVPR》;20181231;全文 *
图像物体分类与检测算法综述;黄凯奇等;《计算机学报》;20140630;第37卷(第6期);全文 *

Also Published As

Publication number Publication date
CN109902800A (en) 2019-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant