CN111144407A - Target detection method, system, device and readable storage medium - Google Patents

Publication number
CN111144407A
Authority
CN
China
Prior art keywords
network
independent
backbone
backbone network
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911332544.7A
Other languages
Chinese (zh)
Inventor
张润泽
郭振华
赵雅倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201911332544.7A
Publication of CN111144407A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a target detection method, system, device and readable storage medium. The method includes: acquiring an image to be detected; inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network includes K independent backbone networks, each including N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, where 1≤i<K and 1<j≤N; and inputting the output feature map of each level of the K-th independent backbone network into a target detection network. The application does not train a backbone network from scratch; instead it cascades mature independent backbone networks so that high-level and low-level backbone features are fused, which improves target detection accuracy and saves the cost of training a backbone network.

Description

Target detection method, system, device and readable storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method, a system, an apparatus, and a readable storage medium for target detection.
Background
Target detection is a fundamental task in the field of computer vision and is currently one of its research hotspots.
Generally, a target detection framework includes a backbone network (Backbone), a Feature Pyramid Network (FPN), a Region Proposal Network (RPN), and task-specific head networks (Heads). If the backbone network can extract more representative features, the corresponding target detection performance is better. However, designing a complex backbone network that can extract powerful features is extremely costly. How to solve this technical problem is a problem that those skilled in the art need to address.
Disclosure of Invention
In view of the above, the present invention provides a method, a system, a device and a readable storage medium for object detection. The specific scheme is as follows:
a method of target detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a cascaded backbone network; the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, wherein 1 ≤ i < K and 1 < j ≤ N;
and inputting the output feature map of each level of the K-th independent backbone network into the target detection network.
Preferably, the process of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically includes:
preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that it is consistent in resolution and channel number with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
Preferably, the process of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically includes:
performing a 1 × 1 convolution operation on the output feature map of the j-th level network module of the i-th independent backbone network;
and performing an up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
Preferably, the process of performing the up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network specifically includes:
performing nearest neighbor interpolation, bilinear interpolation or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
Preferably, the independent backbone network is specifically resnet50, resnet101, resnext152 or senet154.
Preferably, the network module is specifically a stage network module;
each of the independent backbone networks further comprises:
a stem network module located in front of the N network modules.
Preferably, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Correspondingly, the invention also discloses a target detection system, which comprises:
the input module is used for acquiring an image to be detected;
the cascade backbone module is used for inputting the image to be detected into a cascaded backbone network; the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, wherein 1 ≤ i < K and 1 < j ≤ N;
and the target detection module is used for inputting the output feature map of each level of the K-th independent backbone network into the target detection network.
Correspondingly, the invention also discloses a target detection device, which comprises:
a memory for storing a computer program;
a processor for implementing the steps of the object detection method as claimed in any one of the above when executing the computer program.
Accordingly, the present invention also discloses a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
The application discloses a target detection method, which comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, wherein 1 ≤ i < K and 1 < j ≤ N; and inputting the output feature map of each level of the K-th independent backbone network into the target detection network. The application does not train the backbone network from scratch; instead, mature independent backbone networks are cascaded so that the high-level and low-level features of the backbone networks are fused, which improves target detection accuracy and saves the cost of training the backbone network.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart illustrating steps of a method for detecting a target according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a network architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a target detection system according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a target detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Generally, the target detection framework comprises a backbone network, an FPN, an RPN and Heads, and if the backbone network can extract more representative features, the performance of corresponding target detection is better. But the cost of designing a complex backbone network that can extract powerful features is extremely high. According to the method and the device, the backbone network does not need to be trained from the beginning, but the mature independent backbone networks are cascaded, so that the high-level features and the low-level features of the backbone networks are fused, the target detection precision is improved, and the cost for training the backbone networks is saved.
The embodiment of the invention discloses a target detection method, which is shown in figure 1 and comprises the following steps:
S1: acquiring an image to be detected;
S2: inputting the image to be detected into a cascaded backbone network; the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, wherein 1 ≤ i < K and 1 < j ≤ N;
It can be understood that the cascaded backbone network in this embodiment is composed of a plurality of independent backbone networks with the same network structure. The first K-1 independent backbone networks serve as auxiliary backbone networks, the K-th independent backbone network serves as the main backbone network, and the output feature maps of the K-th backbone network serve as the input feature maps of the target detection network. Each independent backbone network may be any model with weights pre-trained on the ImageNet data set, such as resnet50 or resnet101; if video memory is sufficient, a deeper network model such as resnext152 or senet154 may be selected. Each independent backbone network comprises N network modules, specifically stage network modules, and further comprises a stem network module in front of the N network modules. Taking resnet50 as an example and referring to the network structure shown in FIG. 2, in each independent backbone network conv1 denotes the stem network module, which receives the image to be detected; the image then passes through four stage network modules, each composed of several residual modules, and the resolution of the output feature map of each stage network module is half that of its input feature map.
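The resolution bookkeeping described above can be illustrated with a small shape calculation. This is only a sketch: the channel widths are the usual ResNet-50 stage widths, the stem output size of 128 × 128 is a hypothetical value, and the halving at every stage follows the description in this embodiment (some implementations count the first halving as part of the stem's pooling layer instead).

```python
# Channel widths of the four ResNet-50 stages (standard values for that architecture).
RESNET50_STAGE_CHANNELS = [256, 512, 1024, 2048]


def stage_output_shapes(stem_hw, stage_channels=RESNET50_STAGE_CHANNELS):
    """Return (channels, height, width) for the output of each stage.

    Assumes, as stated in the description, that every stage outputs a feature
    map with half the resolution of its input. `stem_hw` is the (height, width)
    of the stem output and is a hypothetical value chosen for illustration.
    """
    h, w = stem_hw
    shapes = []
    for channels in stage_channels:
        h, w = h // 2, w // 2  # each stage halves the resolution
        shapes.append((channels, h, w))
    return shapes


shapes = stage_output_shapes((128, 128))
for level, shape in enumerate(shapes, start=1):
    print(f"stage {level}: channels={shape[0]}, height={shape[1]}, width={shape[2]}")
```

The table of shapes makes the mismatch visible: the output of stage j has half the resolution and, for ResNet-50, twice the channels of the output of stage j-1, which is why the preprocessing step has to adjust both.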
A high-resolution feature map carries weaker semantic information but loses less information, whereas a low-resolution feature map carries stronger semantic information but loses more information because of feature selection. This embodiment therefore fuses high-resolution and low-resolution features through the cascade connections between independent backbone networks, specifically:
$x_{i+1}^{j} = F_{i+1}^{j}\left(x_{i+1}^{j-1} + g\left(x_{i}^{j}\right)\right)$
where $x_{i}^{j}$ is the output feature map of the j-th level network module of the i-th independent backbone network, $g\left(x_{i}^{j}\right)$ is the result of preprocessing $x_{i}^{j}$, $x_{i+1}^{j-1}$ is the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, $x_{i+1}^{j}$ is the output feature map of the j-th level network module of the (i+1)-th independent backbone network, and $F_{i+1}^{j}(\cdot)$ is the internal computation that the j-th level network module of the (i+1)-th independent backbone network performs on its input feature map.
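The fusion rule above (preprocess the output of level j in backbone i, add it to the output of level j-1 in backbone i+1, and feed the sum to level j of backbone i+1) can be traced with a toy example. The sketch below is not the embodiment's network: each stage is replaced by a hypothetical stand-in (2 × 2 average pooling, which halves the resolution), feature maps have a single channel so the 1 × 1 convolution is omitted, and all backbones are assumed to share the same stem output.

```python
def avg_pool2(fmap):
    """Stand-in for a stage module: 2x2 average pooling halves the resolution."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[2 * r][2 * c] + fmap[2 * r][2 * c + 1]
             + fmap[2 * r + 1][2 * c] + fmap[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w // 2)
        ]
        for r in range(h // 2)
    ]


def upsample2_nearest(fmap):
    """Stand-in for the preprocessing: nearest-neighbour upsampling by 2."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[r // 2][c // 2] for c in range(2 * w)] for r in range(2 * h)]


def add(a, b):
    """Element-wise sum of two feature maps of identical size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def cascade_forward(stem_out, num_backbones, num_stages):
    """Return outputs[i][j], the output of stage j+1 of backbone i+1 (0-based lists)."""
    outputs = []
    for i in range(num_backbones):
        stage_outs = []
        x = stem_out  # input to the first stage; no fusion happens at level 1
        for j in range(num_stages):
            if i > 0 and j > 0:
                # Fuse: same-level output of the previous backbone, upsampled so
                # that it matches the resolution of x (the previous level's output).
                x = add(x, upsample2_nearest(outputs[i - 1][j]))
            x = avg_pool2(x)
            stage_outs.append(x)
        outputs.append(stage_outs)
    return outputs


stem_out = [[1] * 8 for _ in range(8)]  # hypothetical 8x8 single-channel stem output
outputs = cascade_forward(stem_out, num_backbones=2, num_stages=3)
detector_inputs = outputs[-1]  # every level of the K-th (last) backbone
for level, fmap in enumerate(detector_inputs, start=1):
    print(f"level {level}: {len(fmap)}x{len(fmap[0])}, first value {fmap[0][0]}")
```

Only the outputs of the last backbone are passed on, which matches the role of the K-th backbone as the main backbone; the earlier backbones contribute solely through the summed connections.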
The process of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically includes:
preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that it is consistent in resolution and channel number with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
Further, the process of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically includes:
performing a 1 × 1 convolution operation on the output feature map of the j-th level network module of the i-th independent backbone network;
and performing an up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
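A 1 × 1 convolution changes only the channel number: at every pixel it applies the same linear map across channels and leaves the resolution untouched. A minimal pure-Python sketch follows; the function name `conv1x1`, the nested-list layout and the weight values are illustrative assumptions, and the bias term is omitted.

```python
def conv1x1(fmap, weight):
    """Apply a 1x1 convolution without bias.

    fmap:   nested list indexed as [c_in][row][col]
    weight: nested list indexed as [c_out][c_in]
    Returns a nested list indexed as [c_out][row][col].
    """
    c_in = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(weight[o][k] * fmap[k][r][c] for k in range(c_in)) for c in range(w)]
            for r in range(h)
        ]
        for o in range(len(weight))
    ]


# Hypothetical 4-channel feature map with a 1x2 spatial size.
fmap = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
# Hypothetical weights reducing 4 channels to 2.
weight = [
    [1, 0, 0, 0],      # output channel 0 copies input channel 0
    [0.5, 0.5, 0, 0],  # output channel 1 averages input channels 0 and 1
]
reduced = conv1x1(fmap, weight)
print(reduced)
```

In the cascade this step would reduce the channel number of a deeper level to that of the shallower level it is summed with; the up-sampling step then takes care of the resolution.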
The process of performing the up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network specifically includes:
performing nearest neighbor interpolation, bilinear interpolation or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
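To make the difference between two of the interpolation options concrete, the sketch below resizes a single-channel feature map with nearest neighbor and with bilinear interpolation. It is an illustration under assumptions: the bilinear variant places sample points at half-pixel centres and clamps them at the borders, which is one common convention and not necessarily the one used in the embodiment; bicubic interpolation is left out for brevity.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize: every output pixel copies one input pixel."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def resize_bilinear(fmap, out_h, out_w):
    """Bilinear resize with half-pixel sample centres, clamped at the borders."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for r in range(out_h):
        y = min(max((r + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = min(max((c + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along the width first, then along the height.
            top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
            bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


small = [[0.0, 4.0]]  # hypothetical 1x2 feature map
nearest = resize_nearest(small, 1, 4)
bilinear = resize_bilinear(small, 1, 4)
print(nearest)
print(bilinear)
```

Nearest neighbor keeps the original values and produces blocky maps, while bilinear interpolation blends neighbouring values; either one doubles the resolution as required before the summation.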
It is understood that the main purpose of the preprocessing is to keep the resolution and the number of channels of the two output feature maps consistent, and the preprocessing can be related to other optimization purposes, and is not limited herein; when the resolution and the channel number of two output characteristic graphs are unified, the purpose can be realized by other processing modes in the preprocessing besides the convolution operation and the up-sampling operation; further, besides the interpolation calculations, the upsampling operation may also be performed by another calculation method.
S3: inputting the output feature map of each level of the K-th independent backbone network into the target detection network.
It is understood that the target detection network in this embodiment may adopt any target detection framework, and may specifically include an FPN network and/or an RPN network and/or a Heads network. The subsequent processing steps are completed by the target detection network; the specific operations belong to the prior art and are not described here.
The embodiment of the application discloses a target detection method, which comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, wherein 1 ≤ i < K and 1 < j ≤ N; and inputting the output feature map of each level of the K-th independent backbone network into the target detection network. The backbone network does not need to be trained from scratch; mature independent backbone networks are cascaded so that the high-level and low-level features of the backbone networks are fused, which improves target detection accuracy.
The embodiment of the invention discloses a specific target detection system, and compared with the previous embodiment, the technical scheme is further explained and optimized in the embodiment. Specifically, the method comprises the following steps:
In this embodiment, the model network is built with the mainstream deep learning framework PyTorch and the deep learning model library torchvision, and the pre-trained weights of the independent backbone networks are loaded from the torchvision library without retraining.
This example was run in an experimental environment with 8 V100 GPUs, using the COCO dataset. Each GPU processes 2 images at a time, and the initial learning rate is 0.01. During training, horizontal flipping is used for data augmentation, with the short edge of each image set to 800 and the long edge set to 1333. During testing, for a fair comparison, the Soft-NMS method is not used.
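The training settings above imply a total batch size and a resize rule that can be worked through numerically. The sketch below assumes the common detection convention that the image is scaled, keeping its aspect ratio, so that the short edge reaches 800 unless that would push the long edge beyond 1333; the text only states the two numbers, so this interpretation and the function name `resize_keep_ratio` are assumptions.

```python
NUM_GPUS = 8        # from the experimental setup
IMAGES_PER_GPU = 2  # from the experimental setup
total_batch = NUM_GPUS * IMAGES_PER_GPU


def resize_keep_ratio(height, width, short_edge=800, long_edge=1333):
    """Scale (height, width) so the short edge is 800 unless the long edge would exceed 1333."""
    scale = min(short_edge / min(height, width), long_edge / max(height, width))
    return round(height * scale), round(width * scale)


print("images per iteration:", total_batch)
print("480x640  ->", resize_keep_ratio(480, 640))
print("500x1500 ->", resize_keep_ratio(500, 1500))
```

For a typical 480 × 640 image the short-edge target is the binding constraint, while for a very elongated image the long-edge limit takes over.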
The COCO test-2017 data set results are shown in Table 1.
Table 1: COCO test-2017 dataset results
Method                            Backbone    APbox    AP50   AP75
Cascade RCNN                      Resnet101   42.8     62.1   46.3
Cascaded backbone network (K=2)   Resnet101   44.1     62.3   47.9

Method                            Backbone    APmask   AP50   AP75
Mask RCNN                         Resnet101   35.9     57.9   38.0
Cascaded backbone network (K=2)   Resnet101   36.9     59.5   39.2
Two sets of experiments were performed to compare this embodiment with other algorithms on target detection and instance segmentation, respectively. The first set uses the Cascade RCNN network model, which performs well on target detection, as the baseline; the second set uses the Mask RCNN network model, which performs well on instance segmentation. All backbone networks are Resnet101. Table 1 shows that, with all other network configurations held the same, the method of this embodiment improves AP by about one percentage point over the baseline in both tasks (from 42.8 to 44.1 box AP and from 35.9 to 36.9 mask AP).
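The size of the improvement can be checked directly from the Table 1 figures. The short calculation below copies those reported values into a dictionary (the dictionary layout and key names are illustrative) and prints the per-metric gains of the cascaded backbone over each baseline.

```python
# Values copied from Table 1, in the order (AP, AP50, AP75).
table1 = {
    "box": {"baseline": (42.8, 62.1, 46.3), "cascaded": (44.1, 62.3, 47.9)},
    "mask": {"baseline": (35.9, 57.9, 38.0), "cascaded": (36.9, 59.5, 39.2)},
}


def gains(row):
    """Per-metric improvement of the cascaded backbone, rounded to one decimal."""
    return tuple(round(c - b, 1) for b, c in zip(row["baseline"], row["cascaded"]))


box_gains = gains(table1["box"])
mask_gains = gains(table1["mask"])
print("box  (AP, AP50, AP75) gains:", box_gains)
print("mask (AP, AP50, AP75) gains:", mask_gains)
```

The headline AP rises by 1.3 points for detection and 1.0 point for segmentation; the AP50 gain for detection is much smaller than the AP75 gain, so most of the detection improvement comes at the stricter overlap threshold.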
Correspondingly, the present invention also discloses a target detection system, as shown in fig. 3, including:
the input module 01 is used for acquiring an image to be detected;
a cascade backbone module 02, configured to input the image to be detected into a cascaded backbone network; the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, wherein 1 ≤ i < K and 1 < j ≤ N;
and the target detection module 03, configured to input the output feature map of each level of the K-th independent backbone network into the target detection network.
In the embodiment, the backbone network does not need to be trained from the beginning, but the mature independent backbone networks are cascaded, so that the high-level features and the low-level features of the backbone networks are fused, and the target detection precision is improved.
In some specific embodiments, the cascade backbone module 02 is specifically configured to:
preprocess the output feature map of the j-th level network module of the i-th independent backbone network so that it is consistent in resolution and channel number with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
In some specific embodiments, the cascade backbone module 02 is specifically configured to:
perform a 1 × 1 convolution operation on the output feature map of the j-th level network module of the i-th independent backbone network;
and perform an up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, the cascade backbone module 02 is specifically configured to:
perform nearest neighbor interpolation, bilinear interpolation or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network module is specifically a stage network module;
each independent backbone network further comprises:
a stem network module located in front of the N network modules.
In some specific embodiments, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Correspondingly, the invention also discloses a target detection device, which is shown in FIG. 4 and comprises a processor 11 and a memory 12; wherein the processor 11 implements the following steps when executing the computer program stored in the memory 12:
acquiring an image to be detected;
inputting the image to be detected into a cascaded backbone network; the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, wherein 1 ≤ i < K and 1 < j ≤ N;
and inputting the output feature map of each level of the K-th independent backbone network into the target detection network.
According to the method and the device, the backbone network does not need to be trained from the beginning, the mature independent backbone networks are cascaded, the high-level features and the low-level features of the backbone networks are fused, and the target detection precision is improved.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented: preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that it is consistent in resolution and channel number with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
performing a 1 × 1 convolution operation on the output feature map of the j-th level network module of the i-th independent backbone network;
and performing an up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
performing nearest neighbor interpolation, bilinear interpolation or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network module is specifically a stage network module;
each independent backbone network further comprises:
a stem network module located in front of the N network modules.
In some specific embodiments, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Further, the target detection apparatus in this embodiment may further include:
the input interface 13 is configured to obtain a computer program imported from the outside, store the obtained computer program in the memory 12, and further be configured to obtain various instructions and parameters transmitted by an external terminal device, and transmit the instructions and parameters to the processor 11, so that the processor 11 performs corresponding processing by using the instructions and parameters. In this embodiment, the input interface 13 may specifically include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, a hard disk reading interface, and the like.
And an output interface 14, configured to output various data generated by the processor 11 to a terminal device connected thereto, so that other terminal devices connected to the output interface 14 can acquire various data generated by the processor 11. In this embodiment, the output interface 14 may specifically include, but is not limited to, a USB interface, a serial interface, and the like.
A communication unit 15, configured to establish a remote communication connection between the target detection device and an external server, so that the target detection device can mount the image file to the external server. In this embodiment, the communication unit 15 may specifically include, but is not limited to, a remote communication unit based on a wireless communication technology or a wired communication technology.
A keyboard 16, configured to acquire parameter data or instructions that a user enters in real time through keystrokes.
And the display 17 is used for displaying relevant information of the target detection process in real time so that a user can know the target detection condition in time.
The mouse 18 may be used to assist the user in entering data and to simplify the user's operation.
Further, embodiments of the present application also disclose a computer-readable storage medium, where the computer-readable storage medium includes Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable hard disk, CD-ROM, or any other form of storage medium known in the art. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be detected;
inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks and each independent backbone network comprises N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 < j ≤ N;
and inputting the output feature map of each level of the K-th independent backbone network into the target detection network.
In the present application, the backbone network does not need to be trained from scratch: mature independent backbone networks are cascaded and their high-level and low-level features are fused, which improves target detection accuracy.
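The data flow of the cascaded backbone can be illustrated with a minimal sketch. It is not the applicant's implementation: the names `cascade_forward` and `preprocess` are hypothetical, feature maps are replaced by plain numbers so the wiring is easy to follow, the stem module is not modelled, and the first-level module of every backbone is assumed to receive the input directly, since the text only defines the fusion for j > 1.

```python
from typing import Callable, List, Sequence

Feature = float  # stand-in for a feature map; a real system would use tensors
Module = Callable[[Feature], Feature]


def cascade_forward(
    image: Feature,
    backbones: Sequence[Sequence[Module]],
    preprocess: Callable[[Feature, int], Feature],
) -> List[Feature]:
    """Run K backbones of N modules each; return the N outputs of backbone K.

    backbones[i][j] is module j+1 of backbone i+1 (0-based lists, 1-based text).
    preprocess(x, j) adapts the level-j output of the previous backbone so it
    can be added to the level-(j-1) output of the current backbone.
    """
    prev_outputs: List[Feature] = []  # outputs of the previous backbone
    for modules in backbones:
        outputs: List[Feature] = []
        x = image
        for level, module in enumerate(modules, start=1):
            if prev_outputs and level > 1:
                # x currently holds this backbone's level-(j-1) output;
                # add the preprocessed level-j output of the previous backbone.
                x = x + preprocess(prev_outputs[level - 1], level)
            x = module(x)
            outputs.append(x)
        prev_outputs = outputs
    # Every level of the last backbone is handed to the detection network.
    return prev_outputs


def increment(x: Feature) -> Feature:
    """Toy module used only to make the wiring visible."""
    return x + 1.0


# Two backbones (K = 2) with three modules each (N = 3), identity preprocessing.
features = cascade_forward(0.0, [[increment] * 3, [increment] * 3], lambda x, j: x)
```

With real networks, each module would be a backbone stage and `preprocess` would align channel count and resolution before the sum.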
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following step may be specifically implemented: preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that it matches the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network in resolution and number of channels.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
performing a 1 × 1 convolution operation on the output feature map of the j-th level network module of the i-th independent backbone network;
and performing an up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
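The two preprocessing operations can be sketched in plain Python as follows. In ResNet-style backbones a deeper stage typically has more channels and a lower resolution than the stage before it, so the 1 × 1 convolution is used here to change the channel count and the up-sampling to raise the resolution. The function names, the `[channel][row][col]` list layout, the factor of 2, and the order (convolution first, then up-sampling) are assumptions made for illustration; the text does not fix them.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def conv1x1(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: mix channels at each pixel; weights is [out_ch][in_ch]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(w * fmap[c][y][x] for c, w in enumerate(out_weights))
             for x in range(width)]
            for y in range(height)
        ]
        for out_weights in weights
    ]


def upsample_nearest(fmap: FeatureMap, factor: int = 2) -> FeatureMap:
    """Repeat every pixel factor x factor times (nearest-neighbor up-sampling)."""
    return [
        [
            [value for value in row for _ in range(factor)]
            for row in channel
            for _ in range(factor)
        ]
        for channel in fmap
    ]


def preprocess(fmap: FeatureMap, weights: List[List[float]], factor: int = 2) -> FeatureMap:
    """Match channel count with a 1x1 convolution, then match resolution."""
    return upsample_nearest(conv1x1(fmap, weights), factor)


# Two input channels of size 1x2 are reduced to one channel and doubled in size.
deep = [[[1.0, 2.0]], [[3.0, 4.0]]]
aligned = preprocess(deep, weights=[[1.0, 1.0]])
```

After this step the result has the same shape as the shallower feature map it is summed with, so the addition can be done element by element.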
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
performing nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
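To make the difference between the interpolation options concrete, the following sketch resizes a single-channel grid with nearest-neighbor and bilinear interpolation (bicubic is omitted for brevity). The function names are hypothetical, and the bilinear version assumes corner-aligned sampling, which is only one of several conventions in use.

```python
from typing import List

Grid = List[List[float]]  # one channel: [row][col]


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Copy the value of the source pixel each output pixel falls into."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def resize_bilinear(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Blend the four surrounding source pixels (corner-aligned sampling)."""
    in_h, in_w = len(grid), len(grid[0])

    def source_coord(dst: int, in_size: int, out_size: int) -> float:
        # Map an output index to a fractional input index; corners map to corners.
        return dst * (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0

    result: Grid = []
    for y in range(out_h):
        sy = source_coord(y, in_h, out_h)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row: List[float] = []
        for x in range(out_w):
            sx = source_coord(x, in_w, out_w)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        result.append(row)
    return result


blocky = resize_nearest([[1.0, 2.0], [3.0, 4.0]], 4, 4)
smooth = resize_bilinear([[0.0, 2.0]], 1, 3)
```

Nearest-neighbor interpolation is the cheapest and produces blocky maps, while bilinear and bicubic interpolation give smoother results at a higher computational cost.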
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network module is specifically a stage network module, and each independent backbone network further comprises a stem network module located before the N network modules.
In some specific embodiments, the target detection network specifically includes an FPN network and/or an RPN network and/or a HEADS network.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The target detection method, system, device and readable storage medium have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of these examples is only intended to help in understanding the method and the core idea of the present invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as a limitation on the present invention.

Claims (10)

1. A target detection method, characterized by comprising:
acquiring an image to be detected;
inputting the image to be detected into a cascaded backbone network, wherein the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 < j ≤ N;
inputting the output feature map of each level of the K-th independent backbone network into a target detection network.

2. The target detection method according to claim 1, characterized in that the process of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that it is consistent with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network in resolution and number of channels.

3. The target detection method according to claim 2, characterized in that the process of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing a 1 × 1 convolution operation on the output feature map of the j-th level network module of the i-th independent backbone network;
performing an up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.

4. The target detection method according to claim 3, characterized in that the process of performing an up-sampling operation on the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.

5. The target detection method according to any one of claims 1 to 4, characterized in that the independent backbone network is specifically resnet50, resnet101, resnext152 or senet154.

6. The target detection method according to claim 5, characterized in that:
the network module is specifically a stage network module;
each independent backbone network further comprises a stem network module located before the N network modules.

7. The target detection method according to claim 6, characterized in that the target detection network specifically comprises an FPN network and/or an RPN network and/or a HEADS network.

8. A target detection system, characterized by comprising:
an input module, configured to acquire an image to be detected;
a cascaded backbone module, configured to input the image to be detected into a cascaded backbone network, wherein the cascaded backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 < j ≤ N;
a target detection module, configured to input the output feature map of each level of the K-th independent backbone network into a target detection network.

9. A target detection device, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the target detection method according to any one of claims 1 to 7 when executing the computer program.

10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and the computer program, when executed by a processor, implements the steps of the target detection method according to any one of claims 1 to 7.
CN201911332544.7A 2019-12-22 2019-12-22 Target detection method, system, device and readable storage medium Withdrawn CN111144407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911332544.7A CN111144407A (en) 2019-12-22 2019-12-22 Target detection method, system, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911332544.7A CN111144407A (en) 2019-12-22 2019-12-22 Target detection method, system, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN111144407A true CN111144407A (en) 2020-05-12

Family

ID=70519271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911332544.7A Withdrawn CN111144407A (en) 2019-12-22 2019-12-22 Target detection method, system, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111144407A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348121A (en) * 2020-12-01 2021-02-09 吉林大学 Target detection method, target detection equipment and computer storage medium
CN113779318A (en) * 2021-08-09 2021-12-10 中国中医科学院中医药信息研究所 Backbone network extraction method and device, computer equipment and storage medium
CN117218580A (en) * 2023-09-13 2023-12-12 杭州像素元科技有限公司 Expressway cross-camera multi-vehicle tracking method and system combining multiple models

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902800A (en) * 2019-01-22 2019-06-18 北京大学 A method for detecting general objects based on a multi-level backbone network based on quasi-feedback neural network
CN110232316A (en) * 2019-05-05 2019-09-13 杭州电子科技大学 A kind of vehicle detection and recognition method based on improved DSOD model
CN110264466A (en) * 2019-06-28 2019-09-20 广州市颐创信息科技有限公司 A kind of reinforcing bar detection method based on depth convolutional neural networks
US20190370648A1 (en) * 2018-05-29 2019-12-05 Google Llc Neural architecture search for dense image prediction tasks
CN110543912A (en) * 2019-09-02 2019-12-06 李肯立 A method for automatically obtaining cardiac cycle video from ultrasound video of key fetal views

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370648A1 (en) * 2018-05-29 2019-12-05 Google Llc Neural architecture search for dense image prediction tasks
CN109902800A (en) * 2019-01-22 2019-06-18 北京大学 A method for detecting general objects based on a multi-level backbone network based on quasi-feedback neural network
CN110232316A (en) * 2019-05-05 2019-09-13 杭州电子科技大学 A kind of vehicle detection and recognition method based on improved DSOD model
CN110264466A (en) * 2019-06-28 2019-09-20 广州市颐创信息科技有限公司 A kind of reinforcing bar detection method based on depth convolutional neural networks
CN110543912A (en) * 2019-09-02 2019-12-06 李肯立 A method for automatically obtaining cardiac cycle video from ultrasound video of key fetal views

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUDONG LIU ET AL.: ""CBNet: A Novel Composite Backbone Network Architecture for Object Detection"", 《ARXIV》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348121A (en) * 2020-12-01 2021-02-09 吉林大学 Target detection method, target detection equipment and computer storage medium
CN113779318A (en) * 2021-08-09 2021-12-10 中国中医科学院中医药信息研究所 Backbone network extraction method and device, computer equipment and storage medium
CN113779318B (en) * 2021-08-09 2024-03-19 中国中医科学院中医药信息研究所 Backbone network extraction method, backbone network extraction device, computer equipment and storage medium
CN117218580A (en) * 2023-09-13 2023-12-12 杭州像素元科技有限公司 Expressway cross-camera multi-vehicle tracking method and system combining multiple models

Similar Documents

Publication Publication Date Title
CN109934197B (en) Training method and device for face recognition model and computer readable storage medium
CN112257815B (en) Model generation method, target detection method, device, electronic device and medium
CN106778928B (en) Image processing method and device
CN109583340B (en) A video object detection method based on deep learning
CN111160533A (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN108596258A (en) A kind of image classification method based on convolutional neural networks random pool
CN112132847A (en) Model training method, image segmentation method, apparatus, electronic device and medium
CN110427819B (en) A method and related equipment for identifying PPT borders in images
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN111144407A (en) Target detection method, system, device and readable storage medium
CN113724271A (en) Semantic segmentation model training method for scene understanding of mobile robot in complex environment
CN113888501B (en) Attention positioning network-based reference-free image quality evaluation method
CN111898693A (en) Training method, visibility estimation method and device for visibility classification model
CN110399826B (en) End-to-end face detection and identification method
CN115170403A (en) Font repairing method and system based on deep meta learning and generation countermeasure network
CN113807237A (en) Training of in vivo detection model, in vivo detection method, computer device, and medium
CN109101984B (en) Image identification method and device based on convolutional neural network
CN111680577A (en) Face detection method and device
CN107992944A (en) It is a kind of based on be originally generated confrontation network model multiple dimensioned convolution method
CN115187456A (en) Text recognition method, device, equipment and medium based on image enhancement processing
CN114612989A (en) Method and device for generating face recognition data set, electronic equipment and storage medium
CN111368898B (en) Image description generation method based on long-time and short-time memory network variant
CN116665217B (en) Ancient book text restoration method and system based on dual generative adversarial networks
CN117830835A (en) A satellite remote sensing image segmentation method based on deep learning
CN113221870B (en) OCR (optical character recognition) method, device, storage medium and equipment for mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200512)