CN109472298B - Deep bidirectional feature pyramid enhanced network for small-scale target detection - Google Patents

Deep bidirectional feature pyramid enhanced network for small-scale target detection

Info

Publication number
CN109472298B
CN109472298B (application CN201811219005.8A)
Authority
CN
China
Prior art keywords
output
scale
target
pyramid
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201811219005.8A
Other languages
Chinese (zh)
Other versions
CN109472298A (en
Inventor
Pang Yanwei (庞彦伟)
Zhu Hailong (朱海龙)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201811219005.8A priority Critical patent/CN109472298B/en
Publication of CN109472298A publication Critical patent/CN109472298A/en
Application granted granted Critical
Publication of CN109472298B publication Critical patent/CN109472298B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a deep bidirectional feature pyramid enhancement network for small-scale target detection, which comprises the following steps: determining the backbone network at the network coding end; designing a bottom-up feature pyramid; designing a top-down feature pyramid; building the target detection sub-network: the two-stage detection strategy of Faster R-CNN is adopted, consisting of a candidate-box extraction stage and a target classification stage; in the RPN stage, a convolution with a 3x3 kernel is applied to the output feature map of each scale of the top-down feature pyramid to regress target boxes and to predict the probability that each box contains a target; ROI-pooling is then performed between the screened candidate target boxes and the output feature map of the top-down feature pyramid at the corresponding scale, and finally two fully-connected layers perform box refinement and classification of the specific target category; and outputting the object detection result.

Description

Deep bidirectional feature pyramid enhanced network for small-scale target detection
Technical Field
The invention belongs to a target detection technology in the fields of computer vision, pattern recognition, deep learning, artificial intelligence and the like, and particularly relates to a technology for detecting a target in a scene by using a deep convolutional neural network in an image or a video.
Background
In the field of deep object detection, as overall detection performance has continuously improved, small-scale object detection has become a new bottleneck, and new network structures have been proposed to address it. The feature pyramid network (FPN) [1] is representative. FPN introduces the pyramid idea, widely applied in traditional image processing, into the deep object detection framework, and greatly improves detection over a large scale range, particularly the detection performance on small-scale objects. The feature pyramid in FPN is constructed in a top-down manner, is integrated with the backbone network, and can be used in single-stage or two-stage object detection methods. The feature pyramid structure in DSSD [2] is similar to FPN but more complex to construct, and is used in single-stage object detection. The authors of BlitzNet [3] attempted to solve the multi-task problem of object detection and semantic segmentation simultaneously, using a feature pyramid structure for single-stage object detection. In DSOD [4], the authors propose a bottom-up, densely connected network architecture that merges more shallow-network features in the forward direction. Although these methods improve small-object detection to a certain extent, they still fall short of the requirements of practical scenarios.
Most existing methods fuse the backbone-network features of the previous scale with the current-scale features of the pyramid network through a skip-connection module. Some use a top-down feature pyramid structure, others a bottom-up one, and the features of different scales and different semantic levels are insufficiently exploited.
One challenge facing the computer vision field is recognizing objects over a large scale range. At present, object detection methods based on deep neural networks hold an overwhelming performance advantage in the field. However, most existing methods detect large-scale objects well, while their performance on small-scale objects is unsatisfactory. A common small-scale detection failure is shown in fig. 1. The reason is that, as the network deepens and the resolution of the feature maps correspondingly decreases, the information of small-scale objects is gradually submerged in the context during feature extraction. Yet in scenarios such as autonomous driving, the performance requirements for small-scale objects are very stringent.
References:
[1] Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017, July). Feature pyramid networks for object detection. In CVPR (Vol. 1, No. 2, p. 4).
[2] Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., & Berg, A.C. (2017). DSSD: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659.
[3] Dvornik, N., Shmelkov, K., Mairal, J., & Schmid, C. (2017, October). BlitzNet: A real-time deep network for scene understanding. In ICCV 2017 - International Conference on Computer Vision (p. 11).
[4] Shen, Z., Liu, Z., Li, J., Jiang, Y.G., Chen, Y., & Xue, X. (2017, October). DSOD: Learning deeply supervised object detectors from scratch. In The IEEE International Conference on Computer Vision (ICCV) (Vol. 3, No. 6, p. 7).
Disclosure of the Invention
In order to solve the problem that small-scale object information is gradually submerged by the background as the network deepens, the invention provides a deep bidirectional feature pyramid enhancement network for small-scale target detection, so as to improve the scale robustness of object detection algorithms. The technical scheme is as follows:
a deep bidirectional feature pyramid enhancement network for small scale target detection, comprising:
(1) Determining the backbone network at the network coding end: a residual network is used as the backbone network; it comprises 5 convolution modules, each of which starts with a pooling layer or a convolution with a stride of two (strided convolution).
(2) Designing a bottom-up feature pyramid: during construction of the bottom-up feature pyramid, each feature fusion operation adds two paths of features element-wise. One path is the output of the current-scale pooling layer or stride-two convolution layer of the backbone network, passed through a convolution layer with a 1x1 kernel that fuses channel features and adjusts the channel dimension to 256. The other path is the output of the previous feature fusion in the bottom-up pyramid, passed through a convolution layer with a 3x3 kernel, a stride of two and 256 output channels. Fusion is performed successively at three scales, starting from the third convolution module of the backbone network;
(3) Designing a top-down feature pyramid: during construction of the top-down feature pyramid, each feature fusion operation adds three paths of features element-wise. The first path is the output of the last layer of the backbone convolution module whose output scale matches the current fusion module, passed through a convolution layer with a 1x1 kernel that fuses channel features and adjusts the channel dimension to 256. The second path is the output of the feature fusion module on the bottom-up pyramid at 1/2 the output scale of the current fusion module, passed through a convolution layer with a 3x3 kernel and 256 output channels and then upsampled by a factor of 2. The third path is the output of the feature fusion module on the top-down pyramid at 1/2 the output scale of the current fusion module, likewise passed through a convolution layer with a 3x3 kernel and 256 output channels and then upsampled by a factor of 2. Fusion is performed successively at three scales, starting from the output of the last convolution module of the backbone network;
(4) Building the target detection sub-network: the two-stage detection strategy of Faster R-CNN is adopted, consisting of a candidate-box extraction stage and a target classification stage. In the RPN stage, a convolution with a 3x3 kernel is applied to the output feature map of each scale of the top-down feature pyramid to regress target boxes and to predict the probability that each box contains a target. ROI-pooling is then performed between the screened candidate target boxes and the output feature map of the top-down feature pyramid at the corresponding scale, and finally two fully-connected layers perform box refinement and classification of the specific target category;
(5) Outputting the object detection result: given an input image, features are extracted by the backbone network and fused by the bottom-up and top-down feature pyramids; candidate target boxes are extracted and classified on the feature maps fused by the top-down feature pyramid, and the position and scale of each target are output. The target classification stage refines the regressed position information of the candidate boxes output by the RPN stage and outputs the final position and scale, while the target category is determined by the output of the classification stage. By fusing the multi-scale feature space and semantic space at the decoding end, a high-resolution prediction map is obtained and upsampled to the scale of the input image, yielding a pixel-level semantic segmentation map of the input image.
Compared with FPN, the network structure provided by the invention integrates bidirectional pyramids, both top-down and bottom-up, and can retain more small-scale object information from the shallower, detail-rich layers of the network. Since the bottom-up pyramid structure uses a smaller number of channels, it adds only a small amount of extra computation. The bottom-up pyramid performs feature fusion in the forward part of the backbone network and adds more channels for information transmission, thereby mitigating the information loss of small objects during forward propagation. The feature fusion module in the top-down feature pyramid draws on three information sources whose features sit at different semantic levels, which increases the diversity of information and helps retain information about small-scale objects.
Drawings
Figure 1 illustrates the small-scale object detection problem. In the left picture, the seated person in black clothes is missed; in the right picture, the two innermost children are missed
FIG. 2 depicts a deep bidirectional feature pyramid enhanced network for small scale target detection proposed by the present invention
FIG. 3 depicts feature fusion operations in a bottom-up feature pyramid and a top-down feature pyramid
FIG. 4 illustrates an overall object detection network architecture
FIG. 5 shows some experimental results of Resnet50-FPN compared to PEN proposed by the present invention.
FIG. 6 shows a general embodiment of the present invention
Detailed Description
The invention provides a deep bidirectional feature pyramid enhancement network for small-scale target detection; the network structure, shown in figure 2, enhances the forward transmission of features, in particular the retention of small-scale target information. The proposed network comprises a backbone convolutional neural network, a bottom-up feature pyramid and a top-down feature pyramid. Each fusion module of the top-down feature pyramid has three input sources: the backbone network, the bottom-up feature pyramid and the top-down pyramid itself. The feature fusion modules of the pyramid structures are shown in fig. 3. The fusion module in the top-down feature pyramid comprises upsampling, convolution and element-wise addition; the fusion module in the bottom-up feature pyramid comprises downsampling, convolution and element-wise addition. In the present invention, the bottom-up feature pyramid is used to enhance an FPN-like top-down feature pyramid, so that richer small-scale target information is retained during the forward pass of the network. The enhanced bidirectional feature pyramid network can be combined with a target detection sub-network (such as Faster R-CNN) to form the overall target detection network shown in fig. 4. Both the bottom-up and the top-down pyramid structures help mitigate the information loss of small-scale targets during the forward pass of a deep network, thereby improving small-scale detection performance.
The invention mainly comprises two aspects of overall network construction and parameter learning of the network. These two aspects will be described in detail below.
The first aspect is the construction of the overall network, which can be divided into a backbone network, a bottom-up feature pyramid, a top-down feature pyramid and a target detection sub-network.
Backbone network: in our implementation, a classical residual network is used as the backbone. A specific implementation may select a suitable backbone according to the application scenario and device constraints: for scenarios with high speed requirements and limited computing performance, a lightweight backbone such as ResNet-18 may be used; when device and efficiency requirements are loose but performance requirements are strict, a more complex backbone can be adopted. Here we take ResNet-50 as an example: the 50-layer residual network contains 5 convolution modules, each starting with a pooling layer or a convolution with a stride of two (strided convolution).
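As a purely illustrative sketch (the input size of 224 is an assumption for illustration; the halving-per-module behavior follows from each module opening with a stride-two layer, as stated above), the resolution schedule of the five convolution modules can be traced as follows:

```python
def backbone_scales(input_size, num_modules=5):
    """Side length of the feature map after each backbone convolution
    module; each module opens with a pooling layer or a stride-two
    convolution and therefore halves the spatial resolution."""
    sizes = []
    size = input_size
    for _ in range(num_modules):
        size //= 2  # the stride-2 entry of each module
        sizes.append(size)
    return sizes

print(backbone_scales(224))  # [112, 56, 28, 14, 7]
```

The last three entries correspond to the three scales fused by the pyramids described below.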
Bottom-up feature pyramid: the constructing operation of the bottom-up pyramid is completed by the operation of the left graph in fig. 3, in the constructing process of the bottom-up feature pyramid, each time the feature fusion operation completes the adding operation of two paths of features by corresponding elements, one path of output of a pooling layer of a backbone network with the current scale or a convolution layer with the step length of two convolution layers is subjected to channel feature fusion and channel direction dimension adjustment by one convolution layer with the convolution kernel of 1x1, the adjusted channels are uniform 256, the other path of output is output of the convolution layer with the channel number of 256 after the previous feature fusion in the bottom-up pyramid structure is subjected to 3x3, the step length is 2. Thus, the three scales are continuously fused from the third convolution module of the backbone network.
Top-down feature pyramid: the construction of the top-down feature pyramid is completed by the operation of the right diagram in fig. 3, and in the construction process of the top-down feature pyramid, three paths of features are fused by adding corresponding elements in each feature fusion operation. The first path is output after the last layer of convolution module of the backbone network with the same output scale as the current fusion module fuses channel features through a convolution layer with convolution kernel of 1 and adjusts the direction dimension of the channel to be uniform 256, the second path is output after the output of the feature fusion module corresponding to 1/2 scale of the output scale of the current fusion module on the bottom-up pyramid passes through a convolution kernel of 3x3, the number of output channels is 256 convolution layers, and then passes through an up-sampling output with multiple of 2, and the third path is output after the output of the feature fusion module corresponding to 1/2 scale of the output scale of the current fusion module on the top-down pyramid passes through a convolution kernel of 3x3, the number of output channels is 256 convolution layers, and then passes through the up-sampling output with multiple of 2. And continuously performing fusion of three scales from the output of the last convolution module of the backbone network in the top-down feature pyramid.
Target detection sub-network: similar to FPN, a strategy of two-stage detection in the fast-rcnn is adopted, and the two-stage detection is respectively a candidate frame extraction stage and a target classification stage. The rpn (regional pro-posal network) stage performs regression of the target box and prediction of the probability of being a target or not on the output feature map of each scale of the top-down feature pyramid using convolution with a convolution kernel of 3 × 3. And performing ROI-posing on the screened candidate target frame and an output feature map of a corresponding scale top-down feature pyramid, and finally performing frame adjustment and target specific category classification by using two full-connection layers. However, there is a difference from FPN in that the fused features of each scale of the pyramid in FPN are extracted by a convolutional layer with a convolution kernel of 3 × 3 and then detected. The proposed network followed each merge by a 3x3 convolutional layer and examined on the convolutional layer's output profile.
The second aspect is the learning of the network parameters, which can be divided into the following three parts.
Training and test data preparation: to demonstrate the effectiveness of the proposed network, a database is selected and divided into a training set for learning the network parameters and a test set for verifying the overall performance of the network. Given our interest in small-scale object detection, Microsoft's public COCO dataset is a good choice: its training and test splits are already defined and it provides objective evaluation criteria. All that remains is to process the dataset into the input format required by the chosen deep learning development framework (such as Caffe, TensorFlow, Caffe2, MXNet or PyTorch) and to apply some data augmentation operations. All our experiments are based on this dataset.
Network initialization and training hyper-parameter settings: we use a ResNet-50 model pre-trained on the ImageNet image recognition database as the initial values of the backbone parameters; the remaining parameters are initialized randomly. Training is performed on a single NVIDIA Titan X GPU. The training hyper-parameters are: the number of epochs over the dataset is set to 20, the initial learning rate to 1e-2 and reduced to 1/10 of its previous value after epochs 12 and 18, and the batch size to 2.
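Assuming the usual step-decay form for "reduced to 1/10 after epochs 12 and 18" (an interpretation, not spelled out in the text), the schedule can be written as:

```python
def learning_rate(epoch, base_lr=1e-2, milestones=(12, 18)):
    """Step-decay schedule matching the description above: the rate
    is divided by 10 after each milestone epoch has finished."""
    lr = base_lr
    for m in milestones:
        if epoch > m:
            lr /= 10.0
    return lr

print(learning_rate(1))   # 0.01 for epochs 1-12
print(learning_rate(13))  # dropped by 10x for epochs 13-18
print(learning_rate(19))  # dropped by another 10x for epochs 19-20
```

With 20 epochs this gives the three plateaus 1e-2, 1e-3 and 1e-4.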
Selection of the training strategy: a two-stage training strategy is adopted. In the first stage the backbone parameters are fixed and the parameters of the bottom-up and top-down feature pyramids and of the detection sub-network are tuned until convergence; in the second stage the whole network is fine-tuned.
Effects of the embodiments: with ResNet-50 as the backbone, the overall performance of the proposed network (PEN for short) and FPN on the COCO dataset is compared in Table 1. It can be seen that the proposed PEN increases the scale robustness of detection; in particular, the detection performance on small-scale targets is markedly improved.
Fig. 5 shows comparison results between the proposed network (PEN) and FPN. With the same backbone network (ResNet-50), the proposed PEN has a clear advantage over FPN in detecting small-scale objects (such as the people and cars in the examples).
Table 1 comparison of test performance on COCO Minival dataset
[Table 1 is reproduced as an image in the original publication.]

Claims (1)

1. A small-scale target detection method based on a deep bidirectional feature pyramid enhancement network, comprising the following steps:
(1) determining the backbone network at the network coding end: a residual network is used as the backbone network; it comprises 5 convolution modules, each of which starts with a pooling layer or a convolution with a stride of two (strided convolution);
(2) designing a bottom-up feature pyramid: during construction of the bottom-up feature pyramid, each feature fusion operation adds two paths of features element-wise; one path is the output of the current-scale pooling layer or stride-two convolution layer of the backbone network, passed through a convolution layer with a 1x1 kernel that fuses channel features and adjusts the channel dimension to 256; the other path is the output of the previous feature fusion in the bottom-up pyramid, passed through a convolution layer with a 3x3 kernel, a stride of two and 256 output channels; fusion is performed successively at three scales, starting from the third convolution module of the backbone network;
(3) designing a top-down feature pyramid: during construction of the top-down feature pyramid, each feature fusion operation adds three paths of features element-wise; the first path is the output of the last layer of the backbone convolution module whose output scale matches the current fusion module, passed through a convolution layer with a 1x1 kernel that fuses channel features and adjusts the channel dimension to 256; the second path is the output of the feature fusion module on the bottom-up pyramid at 1/2 the output scale of the current fusion module, passed through a convolution layer with a 3x3 kernel and 256 output channels and then upsampled by a factor of 2; the third path is the output of the feature fusion module on the top-down pyramid at 1/2 the output scale of the current fusion module, likewise passed through a convolution layer with a 3x3 kernel and 256 output channels and then upsampled by a factor of 2; fusion is performed successively at three scales, starting from the output of the last convolution module of the backbone network;
(4) building the target detection sub-network: the two-stage detection strategy of Faster R-CNN is adopted, consisting of a candidate-box extraction stage and a target classification stage; in the RPN stage, a convolution with a 3x3 kernel is applied to the output feature map of each scale of the top-down feature pyramid to regress target boxes and to predict the probability that each box contains a target; ROI-pooling is then performed between the screened candidate target boxes and the output feature map of the top-down feature pyramid at the corresponding scale, and finally two fully-connected layers perform box refinement and classification of the specific target category;
(5) outputting the object detection result: given an input image, features are extracted by the backbone network and fused by the bottom-up and top-down feature pyramids; candidate target boxes are extracted and classified on the feature maps fused by the top-down feature pyramid, and the position and scale of each target are output; the target classification stage refines the regressed position information of the candidate boxes output by the RPN stage and outputs the final position and scale, while the target category is determined by the output of the classification stage; by fusing the multi-scale feature space and semantic space at the decoding end, a high-resolution prediction map is obtained and upsampled to the scale of the input image, yielding a pixel-level semantic segmentation map of the input image.
CN201811219005.8A 2018-10-19 2018-10-19 Deep bidirectional feature pyramid enhanced network for small-scale target detection Expired - Fee Related CN109472298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811219005.8A CN109472298B (en) 2018-10-19 2018-10-19 Deep bidirectional feature pyramid enhanced network for small-scale target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811219005.8A CN109472298B (en) 2018-10-19 2018-10-19 Deep bidirectional feature pyramid enhanced network for small-scale target detection

Publications (2)

Publication Number Publication Date
CN109472298A CN109472298A (en) 2019-03-15
CN109472298B true CN109472298B (en) 2021-06-01

Family

ID=65664134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811219005.8A Expired - Fee Related CN109472298B (en) 2018-10-19 2018-10-19 Deep bidirectional feature pyramid enhanced network for small-scale target detection

Country Status (1)

Country Link
CN (1) CN109472298B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858539A (en) * 2019-01-24 2019-06-07 武汉精立电子技术有限公司 A kind of ROI region extracting method based on deep learning image, semantic parted pattern
CN110084816B (en) * 2019-03-21 2021-04-06 深圳大学 Object segmentation method, device, computer-readable storage medium and computer equipment
CN109903339B (en) * 2019-03-26 2021-03-05 南京邮电大学 Video group figure positioning detection method based on multi-dimensional fusion features
CN110084124B (en) * 2019-03-28 2021-07-09 北京大学 Feature enhancement target detection method based on feature pyramid network
CN110580699A (en) * 2019-05-15 2019-12-17 徐州医科大学 Pathological image cell nucleus detection method based on improved fast RCNN algorithm
CN110334622B (en) * 2019-06-24 2022-04-19 电子科技大学 Pedestrian retrieval method based on adaptive feature pyramid
CN110348384B (en) * 2019-07-12 2022-06-17 沈阳理工大学 Small target vehicle attribute identification method based on feature fusion
CN110378297B (en) * 2019-07-23 2022-02-11 河北师范大学 Remote sensing image target detection method and device based on deep learning and storage medium
CN111104962B (en) * 2019-11-05 2023-04-18 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN111695398A (en) * 2019-12-24 2020-09-22 珠海大横琴科技发展有限公司 Small target ship identification method and device and electronic equipment
CN111242122B (en) * 2020-01-07 2023-09-08 浙江大学 Lightweight deep neural network rotating target detection method and system
CN111460926B (en) * 2020-03-16 2022-10-14 华中科技大学 Video pedestrian detection method fusing multi-target tracking clues
CN111539435A (en) * 2020-04-15 2020-08-14 创新奇智(合肥)科技有限公司 Semantic segmentation model construction method, image segmentation equipment and storage medium
CN113591872A (en) * 2020-04-30 2021-11-02 华为技术有限公司 Data processing system, object detection method and device
CN111898615A (en) * 2020-06-16 2020-11-06 济南浪潮高新科技投资发展有限公司 Feature extraction method, device, equipment and medium of object detection model
CN112528976B (en) * 2021-02-09 2021-09-21 北京世纪好未来教育科技有限公司 Text detection model generation method and text detection method
CN112634273B (en) * 2021-03-10 2021-08-13 四川大学 Brain metastasis segmentation system based on deep neural network and construction method thereof
CN113011442A (en) * 2021-03-26 2021-06-22 山东大学 Target detection method and system based on bidirectional adaptive feature pyramid
CN113111736A (en) * 2021-03-26 2021-07-13 浙江理工大学 Multi-stage characteristic pyramid target detection method based on depth separable convolution and fusion PAN
CN113705320A (en) * 2021-05-24 2021-11-26 中国科学院深圳先进技术研究院 Training method, medium, and apparatus for surgical motion recognition model
CN113378815B (en) * 2021-06-16 2023-11-24 南京信息工程大学 Scene text positioning and identifying system and training and identifying method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063623A (en) * 2010-12-28 2011-05-18 中南大学 Method for extracting image region of interest by combining bottom-up and top-down ways
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 A video-based activity recognition method using bidirectional long short-term memory units
CN107391609A (en) * 2017-07-01 2017-11-24 南京理工大学 An image captioning method based on bidirectional multi-modal recurrent networks
CN107798691A (en) * 2017-08-30 2018-03-13 西北工业大学 A vision-based real-time landmark detection and tracking method for autonomous UAV landing
CN108171752A (en) * 2017-12-28 2018-06-15 成都阿普奇科技股份有限公司 A deep-learning-based method for video detection and tracking of marine ships

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Bidirectional Feature Pyramid Network with Recurrent Attention Residual Modules for Shadow Detection; Lei Zhu et al.; Computer Vision - ECCV 2018; 2018-09-14; pp. 122-136 *
BlitzNet: A Real-Time Deep Network for Scene Understanding; Nikita Dvornik et al.; arXiv:1708.02813v1 [cs.CV]; 2017-08-09; pp. 1-10 *
Feature Pyramid Networks for Object Detection; Tsung-Yi Lin et al.; arXiv:1612.03144v2 [cs.CV]; 2017-04-19; pp. 1-10 *
Hausdorff Distance Matching Using an Edge Pyramid Structure; Wei Yanfeng et al.; Journal of Computer-Aided Design & Computer Graphics; 2004-12-31; pp. 492-497 *

Also Published As

Publication number Publication date
CN109472298A (en) 2019-03-15

Similar Documents

Publication Publication Date Title
CN109472298B (en) Deep bidirectional feature pyramid enhanced network for small-scale target detection
CN111126472B (en) Improved target detection method based on SSD (Single Shot Detector)
Xu et al. Learning deep structured multi-scale features using attention-gated crfs for contour prediction
Fu et al. Foreground gating and background refining network for surveillance object detection
CN110188239B (en) Double-current video classification method and device based on cross-mode attention mechanism
Zhuang et al. Unsupervised learning from video with deep neural embeddings
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN111460914A (en) Pedestrian re-identification method based on global and local fine-grained features
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
Zhou et al. Regional attention with architecture-rebuilt 3d network for rgb-d gesture recognition
Xu et al. BANet: A balanced atrous net improved from SSD for autonomous driving in smart transportation
Zhao et al. Multifeature fusion action recognition based on key frames
Murase et al. Algan: Anomaly detection by generating pseudo anomalous data via latent variables
Kan et al. A GAN-based input-size flexibility model for single image dehazing
Cao et al. A new region proposal network for far-infrared pedestrian detection
CN113361466B (en) Multispectral target detection method based on multi-mode cross guidance learning
Peng et al. Motion boundary emphasised optical flow method for human action recognition
CN110728238A (en) Personnel re-detection method of fusion type neural network
CN113450297A (en) Fusion model construction method and system for infrared image and visible light image
Xue et al. Multi‐scale pedestrian detection with global–local attention and multi‐scale receptive field context
Han et al. Feature fusion and adversary occlusion networks for object detection
CN115861861A (en) Lightweight acceptance method based on unmanned aerial vehicle distribution line inspection
CN115761220A (en) Target detection method for enhancing detection of occluded target based on deep learning
CN111582057B (en) Face verification method based on local receptive field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210601

Termination date: 20211019