CN111144407A - Target detection method, system, device and readable storage medium - Google Patents

Target detection method, system, device and readable storage medium

Info

Publication number
CN111144407A
Authority
CN
China
Prior art keywords
network
output characteristic
independent
backbone network
level
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911332544.7A
Other languages
Chinese (zh)
Inventor
张润泽
郭振华
赵雅倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201911332544.7A
Publication of CN111144407A
Withdrawn (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a target detection method, system, device and readable storage medium. The method comprises: acquiring an image to be detected; inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N; and inputting the output feature map of each level of the K-th independent backbone network into a target detection network. Instead of training a backbone network from scratch, the method cascades mature independent backbone networks so that their high-level and low-level features are fused, which improves target detection accuracy and saves the cost of training a backbone network.

Description

Target detection method, system, device and readable storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method, a system, an apparatus, and a readable storage medium for target detection.
Background
Target detection occupies a very important position in computer vision: it is one of the field's fundamental tasks and a current research hotspot.
Generally, a target detection framework includes a backbone network (Backbone), a Feature Pyramid Network (FPN), a Region Proposal Network (RPN), and task-specific head networks (Heads). The more representative the features the backbone network can extract, the better the corresponding target detection performance. However, designing a complex backbone network that can extract powerful features is extremely costly, and solving this technical problem is an issue that those skilled in the art need to address.
Disclosure of Invention
In view of the above, the present invention provides a target detection method, system, device and readable storage medium. The specific scheme is as follows:
A target detection method, comprising:
acquiring an image to be detected;
inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
and inputting the output feature map of each level of the K-th independent backbone network into a target detection network.
Preferably, preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
preprocessing that output feature map so that its resolution and number of channels are consistent with those of the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
Preferably, preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing a 1 × 1 convolution on the output feature map of the j-th level network module of the i-th independent backbone network;
and performing an upsampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
Preferably, performing the upsampling operation on the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
Preferably, the independent backbone network is specifically resnet50, resnet101, resnext152 or senet154.
Preferably, the network modules are specifically stage network modules;
each of the independent backbone networks further comprises:
a stem network module located before the N network modules.
Preferably, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Correspondingly, the invention also discloses a target detection system, comprising:
an input module, configured to acquire an image to be detected;
a cascaded backbone module, configured to input the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
and a target detection module, configured to input the output feature map of each level of the K-th independent backbone network into the target detection network.
Correspondingly, the invention also discloses a target detection device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of any one of the target detection methods described above when executing the computer program.
Accordingly, the present invention also discloses a readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the target detection methods described above.
The application discloses a target detection method comprising: acquiring an image to be detected; inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N; and inputting the output feature map of each level of the K-th independent backbone network into the target detection network. Instead of training a backbone network from scratch, the method cascades mature independent backbone networks so that their high-level and low-level features are fused, which improves target detection accuracy and saves the cost of training a backbone network.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the steps of a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a target detection system according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a target detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Generally, a target detection framework comprises a backbone network, an FPN, an RPN and Heads; the more representative the features the backbone network extracts, the better the target detection performance. However, designing a complex backbone network that can extract powerful features is extremely costly. In the present application, instead of training a backbone network from scratch, mature independent backbone networks are cascaded so that their high-level and low-level features are fused, which improves target detection accuracy and saves the cost of training a backbone network.
An embodiment of the invention discloses a target detection method which, as shown in FIG. 1, comprises the following steps:
S1: acquiring an image to be detected;
S2: inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
It can be understood that the cascaded backbone network in this embodiment consists of several independent backbone networks with identical network structures. The first K-1 independent backbone networks serve as auxiliary backbone networks, the K-th serves as the main backbone network, and the output feature maps of the K-th backbone network serve as the input feature maps of the target detection network. The independent backbone network may be any model whose weights were trained on the ImageNet dataset, such as resnet50 or resnet101; if GPU memory is sufficient, a deeper model such as resnext152 or senet154 may be chosen. Each independent backbone network comprises N network modules, specifically stage network modules, and further comprises a stem network module in front of them. Taking resnet50 as an example and referring to the network structure shown in FIG. 2, in each independent backbone network the stem network module, labeled conv1, receives the image to be detected, which then passes through four stage network modules. Each stage network module consists of several residual modules, and the output feature map of each stage network module has half the resolution of its input feature map.
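To make the resolution and channel bookkeeping concrete, the sketch below computes, for a resnet50-style backbone as described above, the shape of each stage-j output of backbone i and the shape of the level-(j-1) output of backbone i+1 that it is added to. The channel widths (64 for the stem, 256/512/1024/2048 for the four stages) are the usual ResNet-50 values, and rounding odd sizes up at each stride-2 layer is likewise an assumption not stated in this document; `level_shapes` and `fusion_pairs` are hypothetical helpers.

```python
import math
from typing import List, Tuple

# Assumed resnet50-style layout: level 0 is the stem (conv1, stride 2) and
# levels 1..4 are the four stage modules, each halving the resolution as the
# text describes. Channel widths are the usual ResNet-50 values (assumption).
LEVEL_CHANNELS = [64, 256, 512, 1024, 2048]

Shape = Tuple[int, int, int]  # (channels, height, width)


def level_shapes(height: int, width: int) -> List[Shape]:
    """Output shapes of levels 0..4 of one independent backbone."""
    shapes = []
    h, w = height, width
    for channels in LEVEL_CHANNELS:
        # A stride-2 layer with padding (k - 1) / 2 rounds odd sizes up.
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        shapes.append((channels, h, w))
    return shapes


def fusion_pairs(height: int, width: int) -> List[Tuple[int, Shape, Shape]]:
    """For j = 1..4: (j, shape of the stage-j output of backbone i,
    shape of the level-(j-1) output of backbone i+1 it is added to)."""
    shapes = level_shapes(height, width)
    return [(j, shapes[j], shapes[j - 1]) for j in range(1, len(shapes))]


for j, (c_src, h_src, w_src), (c_dst, h_dst, w_dst) in fusion_pairs(800, 1333):
    plain_2x = (2 * h_src, 2 * w_src) == (h_dst, w_dst)
    print(f"j={j}: 1x1 conv {c_src}->{c_dst}, resize {h_src}x{w_src} -> "
          f"{h_dst}x{w_dst}, plain 2x upsampling matches: {plain_2x}")
```

Under these assumptions, an 800 × 1333 input makes a fixed 2× upsampling miss the target width at j = 1 and j = 3 (668 vs 667, 168 vs 167), because the odd widths were rounded up on the way down. Resizing to the exact target size avoids this, as does padding the input to a multiple of 32, which many detection codebases do.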
A high-resolution feature map carries weaker semantic information but suffers less information loss, whereas a low-resolution feature map carries stronger semantic information but suffers more information loss because of feature selection. This embodiment therefore uses the cascade connections between the independent backbone networks to fuse high-resolution and low-resolution features, specifically:
$$x_{i+1}^{j} = F_{i+1}^{j}\left(x_{i+1}^{j-1} + g\left(x_{i}^{j}\right)\right)$$
where $x_{i}^{j}$ is the output feature map of the j-th level network module of the i-th independent backbone network; $g(x_{i}^{j})$ denotes the preprocessing applied to $x_{i}^{j}$; $x_{i+1}^{j-1}$ is the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network; $x_{i+1}^{j}$ is the output feature map of the j-th level network module of the (i+1)-th independent backbone network; and $F_{i+1}^{j}(\cdot)$ denotes the internal computation that the j-th level network module of the (i+1)-th independent backbone network performs on its input feature map.
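The fusion rule above can be sketched as a framework-agnostic forward pass. The structure here is an assumption for illustration rather than the patent's implementation: each backbone is given as a list `[stem, stage_1, ..., stage_N]` of callables, the stem output plays the role of level j - 1 = 0 when j = 1, `preprocess` stands in for the 1 × 1 convolution plus upsampling, and the function returns the N stage outputs of the K-th backbone. `cascaded_forward` is a hypothetical name.

```python
from typing import Any, Callable, List, Sequence

Module = Callable[[Any], Any]


def cascaded_forward(
    image: Any,
    backbones: Sequence[Sequence[Module]],
    preprocess: Callable[[int, int, Any, Any], Any],
    add: Callable[[Any, Any], Any],
) -> List[Any]:
    """Run K cascaded backbones and return the stage outputs of the last one.

    backbones[b] is [stem, stage_1, ..., stage_N] for backbone b (0-based).
    preprocess(b, j, src, dst) maps src, the stage-j output of backbone b,
    onto the resolution and channel count of dst, the level-(j-1) output of
    backbone b + 1.
    """
    if not backbones:
        raise ValueError("need at least one backbone")
    prev = None  # level outputs [level 0, ..., level N] of the previous backbone
    for b, modules in enumerate(backbones):
        stem, stages = modules[0], modules[1:]
        outs = [stem(image)]  # level 0: every backbone's stem sees the image
        for j, stage in enumerate(stages, start=1):
            x = outs[j - 1]
            if prev is not None:
                # Input of stage j = own level-(j-1) output
                #                  + preprocessed stage-j output of backbone b - 1.
                x = add(x, preprocess(b - 1, j, prev[j], x))
            outs.append(stage(x))
        prev = outs
    return prev[1:]  # levels 1..N of the K-th (main) backbone
```

With real feature maps, `add` would be element-wise tensor addition and `preprocess` a learned 1 × 1 convolution followed by upsampling for each (b, j) pair. As a toy check of the data flow, a stem `x + 1`, stages `2 * x`, identity preprocessing and K = 2 turn an input of 1 into `[12, 40]`.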
Preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
preprocessing that output feature map so that its resolution and number of channels are consistent with those of the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
Further, preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing a 1 × 1 convolution on the output feature map of the j-th level network module of the i-th independent backbone network;
and performing an upsampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
Performing the upsampling operation on the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
It can be understood that the main purpose of the preprocessing is to make the resolution and number of channels of the two output feature maps consistent; the preprocessing may also serve other optimization purposes, which are not limited here. Besides the convolution and upsampling operations, other processing methods may be used in the preprocessing to unify the resolution and channel number of the two output feature maps, and besides the interpolation methods listed above, the upsampling operation may be implemented by other calculation methods.
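The preprocessing step described above can be sketched without any framework as a 1 × 1 convolution followed by resizing to the exact target size. This is a minimal pure-Python sketch on nested lists shaped [channels][height][width]; it assumes no bias term and uses the half-pixel (align_corners=False) convention for bilinear interpolation, and it omits bicubic interpolation. The names `conv1x1`, `upsample_nearest`, `upsample_bilinear` and `preprocess` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channels][height][width]


def conv1x1(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    """1 x 1 convolution: each output pixel is weight (C_out x C_in) times the
    input pixel's channel vector; no bias."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(row[c] * x[c][y][u] for c in range(c_in)) for u in range(w)]
         for y in range(h)]
        for row in weight
    ]


def upsample_nearest(x: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Nearest-neighbour resize: output (y, u) copies input (y*h//out_h, u*w//out_w)."""
    h, w = len(x[0]), len(x[0][0])
    return [[[ch[y * h // out_h][u * w // out_w] for u in range(out_w)]
             for y in range(out_h)] for ch in x]


def _source(i: int, n_in: int, n_out: int):
    """Half-pixel source coordinate -> (lower index, upper index, upper weight)."""
    s = max((i + 0.5) * n_in / n_out - 0.5, 0.0)
    i0 = min(int(s), n_in - 1)
    return i0, min(i0 + 1, n_in - 1), s - i0


def upsample_bilinear(x: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Bilinear resize with half-pixel centres (align_corners=False convention)."""
    h, w = len(x[0]), len(x[0][0])
    rows = [_source(y, h, out_h) for y in range(out_h)]
    cols = [_source(u, w, out_w) for u in range(out_w)]
    out = []
    for ch in x:
        plane = []
        for y0, y1, fy in rows:
            line = []
            for u0, u1, fu in cols:
                top = ch[y0][u0] * (1 - fu) + ch[y0][u1] * fu
                bottom = ch[y1][u0] * (1 - fu) + ch[y1][u1] * fu
                line.append(top * (1 - fy) + bottom * fy)
            plane.append(line)
        out.append(plane)
    return out


def preprocess(x: FeatureMap, weight: List[List[float]], out_h: int, out_w: int,
               mode: str = "nearest") -> FeatureMap:
    """Match channels with a 1x1 conv, then match resolution by resizing."""
    resize = {"nearest": upsample_nearest, "bilinear": upsample_bilinear}[mode]
    return resize(conv1x1(x, weight), out_h, out_w)
```

Applying the 1 × 1 convolution before upsampling, in the order listed above, keeps the convolution at the lower resolution, so it processes a quarter as many pixels as it would after a 2× upsampling. In a PyTorch implementation the same step would typically be a `torch.nn.Conv2d(c_in, c_out, kernel_size=1)` followed by `torch.nn.functional.interpolate(..., size=target_hw, mode=...)`.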
S3: inputting the output feature map of each level of the K-th independent backbone network into the target detection network.
It can be understood that the target detection network in this embodiment may adopt any target detection framework and may specifically include an FPN network and/or an RPN network and/or a Heads network. The subsequent processing steps are completed by the target detection network; these operations are prior art and are not described further here.
This embodiment of the application discloses a target detection method comprising: acquiring an image to be detected; inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N; and inputting the output feature map of each level of the K-th independent backbone network into the target detection network. Instead of training a backbone network from scratch, mature independent backbone networks are cascaded so that their high-level and low-level features are fused, which improves target detection accuracy.
An embodiment of the invention discloses a specific target detection system; compared with the previous embodiment, this embodiment further explains and optimizes the technical solution. Specifically:
In this embodiment, the network model is built with the mainstream deep learning framework PyTorch and the deep learning model library torchvision, and the pre-trained weights of the independent backbone networks are loaded from torchvision without retraining.
This embodiment was evaluated in an experimental environment with 8 V100 GPUs on the COCO dataset. Each GPU processes 2 images, and the initial learning rate is 0.01. During training, horizontal flipping is used for data augmentation, and images are resized so that the short side is 800 and the long side is 1333. For a fair comparison, Soft-NMS is not used during testing.
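The training settings above can be collected into a small configuration object. This sketch is hypothetical: the field names are illustrative, and the resize rule (scale the short side to 800 unless that would push the long side past 1333, keeping the aspect ratio) is a common convention assumed here, since the text only states the two numbers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Settings stated for this embodiment (field names are hypothetical)."""
    num_gpus: int = 8             # 8 V100 GPUs
    images_per_gpu: int = 2
    base_lr: float = 0.01
    horizontal_flip: bool = True  # data augmentation during training
    short_side: int = 800
    long_side: int = 1333
    soft_nms: bool = False        # not used at test time, for a fair comparison

    @property
    def total_batch_size(self) -> int:
        return self.num_gpus * self.images_per_gpu


def resize_keep_ratio(width: int, height: int,
                      cfg: TrainConfig = TrainConfig()) -> Tuple[int, int]:
    """Assumed rule: scale the short side to cfg.short_side unless the long
    side would then exceed cfg.long_side; the aspect ratio is preserved."""
    scale = min(cfg.short_side / min(width, height),
                cfg.long_side / max(width, height))
    return round(width * scale), round(height * scale)
```

Under this assumed rule, a 640 × 480 image becomes 1067 × 800, while a 1920 × 1080 image is limited by the long side and becomes 1333 × 750; the total batch size is 8 × 2 = 16 images.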
The COCO test-2017 data set results are shown in Table 1.
Table 1: COCO test-2017 dataset results
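| Method | Backbone | AP (box) | AP50 | AP75 |
|---|---|---|---|---|
| Cascade RCNN | Resnet101 | 42.8 | 62.1 | 46.3 |
| Cascaded backbone network (K = 2) | Resnet101 | 44.1 | 62.3 | 47.9 |

| Method | Backbone | AP (mask) | AP50 | AP75 |
|---|---|---|---|---|
| Mask RCNN | Resnet101 | 35.9 | 57.9 | 38 |
| Cascaded backbone network (K = 2) | Resnet101 | 36.9 | 59.5 | 39.2 |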
Two groups of experiments were performed to compare the algorithm of this embodiment with baseline algorithms on target detection and on instance segmentation, respectively. The first group uses the Cascade RCNN model, which performs well in target detection; the second uses the Mask RCNN model, which performs well in instance segmentation; all backbone networks are Resnet101. Table 1 shows that, with all other network configurations of the model kept the same, the method of this embodiment improves the main AP metric over the baseline by about one percentage point in both target detection (box AP +1.3) and instance segmentation (mask AP +1.0).
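The gains discussed above can be recomputed directly from Table 1. This short script only re-derives differences from the reported numbers; `RESULTS` and `ap_gains` are illustrative names.

```python
# AP values copied from Table 1 (COCO test-2017, Resnet101 backbones).
RESULTS = {
    "box": {"baseline": (42.8, 62.1, 46.3), "cascaded_k2": (44.1, 62.3, 47.9)},
    "mask": {"baseline": (35.9, 57.9, 38.0), "cascaded_k2": (36.9, 59.5, 39.2)},
}
METRICS = ("AP", "AP50", "AP75")


def ap_gains(results=RESULTS):
    """Absolute gain, in percentage points, of the cascaded backbone (K = 2)."""
    return {
        task: {m: round(new - old, 1)
               for m, old, new in zip(METRICS, r["baseline"], r["cascaded_k2"])}
        for task, r in results.items()
    }


for task, gains in ap_gains().items():
    print(task, gains)
```

This gives box gains of 1.3 / 0.2 / 1.6 and mask gains of 1.0 / 1.6 / 1.2 points for AP / AP50 / AP75, so the improvement is largest on the stricter AP75 metric for detection and on AP50 for segmentation.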
Correspondingly, the present invention also discloses a target detection system which, as shown in FIG. 3, comprises:
an input module 01, configured to acquire an image to be detected;
a cascaded backbone module 02, configured to input the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
and a target detection module 03, configured to input the output feature map of each level of the K-th independent backbone network into the target detection network.
In this embodiment, instead of training a backbone network from scratch, mature independent backbone networks are cascaded so that their high-level and low-level features are fused, which improves target detection accuracy.
In some specific embodiments, the cascaded backbone module 02 is specifically configured to:
preprocess the output feature map of the j-th level network module of the i-th independent backbone network so that its resolution and number of channels are consistent with those of the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
In some specific embodiments, the cascaded backbone module 02 is specifically configured to:
perform a 1 × 1 convolution on the output feature map of the j-th level network module of the i-th independent backbone network;
and perform an upsampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, the cascaded backbone module 02 is specifically configured to:
perform nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network modules are specifically stage network modules;
each independent backbone network further comprises:
a stem network module located before the N network modules.
In some specific embodiments, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Correspondingly, the invention also discloses a target detection device which, as shown in FIG. 4, comprises a processor 11 and a memory 12, wherein the processor 11 implements the following steps when executing the computer program stored in the memory 12:
acquiring an image to be detected;
inputting the image to be detected into a cascaded backbone network, where the cascaded backbone network comprises K independent backbone networks, each comprising N network modules; the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed and then summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and the sum is input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
and inputting the output feature map of each level of the K-th independent backbone network into the target detection network.
In the present application, instead of training a backbone network from scratch, mature independent backbone networks are cascaded so that their high-level and low-level features are fused, which improves target detection accuracy.
In some specific embodiments, when executing the computer subprogram stored in the memory 12, the processor 11 may specifically implement the following step: preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that its resolution and number of channels are consistent with those of the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
In some specific embodiments, when executing the computer subprogram stored in the memory 12, the processor 11 may specifically implement the following steps:
performing a 1 × 1 convolution on the output feature map of the j-th level network module of the i-th independent backbone network;
and performing an upsampling operation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, when executing the computer subprogram stored in the memory 12, the processor 11 may specifically implement the following step:
performing nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network modules are specifically stage network modules;
each independent backbone network further comprises:
a stem network module located before the N network modules.
In some specific embodiments, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Further, the target detection device in this embodiment may further include:
An input interface 13, configured to obtain a computer program imported from outside and store it in the memory 12, and also to obtain instructions and parameters transmitted by an external terminal device and pass them to the processor 11, so that the processor 11 performs the corresponding processing with them. In this embodiment, the input interface 13 may include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, a hard disk reading interface, and the like.
An output interface 14, configured to output data generated by the processor 11 to the terminal devices connected to it, so that other terminal devices connected to the output interface 14 can obtain that data. In this embodiment, the output interface 14 may include, but is not limited to, a USB interface, a serial interface, and the like.
A communication unit 15, configured to establish a remote communication connection between the target detection device and an external server, so that the target detection device can mount image files on the external server. In this embodiment, the communication unit 15 may include, but is not limited to, a remote communication unit based on wireless or wired communication technology.
A keyboard 16, configured to obtain parameter data or instructions entered by the user through keystrokes in real time.
A display 17, configured to display information about the target detection process in real time so that the user is kept informed of the detection status.
A mouse 18, which may be used to assist the user in entering data and to simplify the user's operations.
Further, embodiments of the present application also disclose a computer-readable storage medium, where the computer-readable storage medium includes Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable hard disk, CD-ROM, or any other form of storage medium known in the art. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be detected;
inputting the image to be detected into a cascade backbone network, wherein the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
and inputting the output feature map of each level of the K-th independent backbone network into an object detection network.
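The cascade data flow in these steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: feature maps are replaced by plain numbers so that only the wiring is visible, and `cascade_forward`, the `(stem, stages)` tuples and the `preprocess(feature, level)` callback are hypothetical names. It also assumes that the stem output serves as level 0, so the j = 1 stage of backbone i + 1 receives its own stem output plus the preprocessed level-1 output of backbone i.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Toy stand-in for a feature map; a real model would pass tensors.
Feature = float
Module = Callable[[Feature], Feature]
Backbone = Tuple[Module, Sequence[Module]]  # hypothetical: (stem, [stage_1, ..., stage_N])


def cascade_forward(
    image: Feature,
    backbones: Sequence[Backbone],
    preprocess: Callable[[Feature, int], Feature],
) -> List[Feature]:
    """Run K cascaded backbones and return the N level outputs of the K-th one."""
    prev: Optional[List[Feature]] = None  # level 1..N outputs of backbone i
    for stem, stages in backbones:
        x = stem(image)  # level-0 output (assumed input for j - 1 = 0)
        outputs: List[Feature] = []
        for j, stage in enumerate(stages, start=1):
            if prev is not None:
                # Add the preprocessed level-j output of the previous backbone
                # to this backbone's level-(j-1) output, then run stage j.
                x = x + preprocess(prev[j - 1], j)
            x = stage(x)
            outputs.append(x)
        prev = outputs
    if prev is None:
        raise ValueError("need at least one backbone")
    return prev
```

Because stage j of backbone i + 1 consumes the stage-j output of backbone i, the backbones are evaluated in order, and only the K-th backbone's level outputs are handed to the detection network.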
With this method, the backbone network does not need to be trained from scratch: existing, mature independent backbone networks are cascaded and their high-level and low-level features are fused, which improves object detection accuracy.
In some specific embodiments, when the processor 11 executes the computer program stored in the memory 12, the following step may specifically be implemented: preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that its resolution and number of channels match those of the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
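To make the matching requirement concrete, the sketch below derives, for each level j, the channel mapping and upsampling factor that such preprocessing would need. It assumes every independent backbone is a standard ResNet-50, whose stem output has 64 channels at stride 4 and whose four stages output 256, 512, 1024 and 2048 channels at strides 4, 8, 16 and 32; `preprocess_specs` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (channels, stride) for level 0 (stem) and levels 1..4 (stages) of a
# standard ResNet-50; these figures are an assumption about the backbone.
RESNET50_LEVELS: List[Tuple[int, int]] = [
    (64, 4), (256, 4), (512, 8), (1024, 16), (2048, 32),
]


def preprocess_specs(levels: List[Tuple[int, int]]) -> List[Dict[str, object]]:
    """For each level j >= 1, describe how a level-j output of backbone i must be
    reshaped to match the level-(j-1) output of backbone i + 1."""
    specs: List[Dict[str, object]] = []
    for j in range(1, len(levels)):
        c_src, s_src = levels[j]      # level-j output of backbone i
        c_dst, s_dst = levels[j - 1]  # level-(j-1) output of backbone i + 1
        if s_src % s_dst:
            raise ValueError(f"level {j}: stride {s_src} is not a multiple of {s_dst}")
        specs.append({
            "level": j,
            "conv1x1": (c_src, c_dst),   # input/output channels of the 1x1 conv
            "upsample": s_src // s_dst,  # spatial upsampling factor
        })
    return specs
```

Under these assumptions, level 1 needs only a channel reduction (the stem and the first stage share stride 4), while every higher level also needs 2× upsampling.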
In some specific embodiments, when the processor 11 executes the computer program stored in the memory 12, the following steps may specifically be implemented:
performing a 1 × 1 convolution on the output feature map of the j-th level network module of the i-th independent backbone network;
and upsampling the output feature map of the j-th level network module of the i-th independent backbone network.
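A minimal pure-Python sketch of these two operations follows. It assumes feature maps stored as nested lists indexed `[channel][row][column]` and an integer upsampling factor; `conv1x1`, `upsample_nearest` and `preprocess` are illustrative names, and nearest-neighbor upsampling is used here for brevity.

```python
from typing import List

FMap = List[List[List[float]]]  # [channels][height][width]


def conv1x1(fmap: FMap, weight: List[List[float]], bias: List[float]) -> FMap:
    """1x1 convolution: a per-pixel linear map over channels.
    weight is indexed [out_channel][in_channel]."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(weight[o][c] * fmap[c][y][x] for c in range(c_in)) + bias[o]
          for x in range(w)]
         for y in range(h)]
        for o in range(len(weight))
    ]


def upsample_nearest(fmap: FMap, scale: int) -> FMap:
    """Nearest-neighbor upsampling by an integer factor."""
    return [
        [[ch[y // scale][x // scale] for x in range(len(ch[0]) * scale)]
         for y in range(len(ch) * scale)]
        for ch in fmap
    ]


def preprocess(fmap: FMap, weight: List[List[float]], bias: List[float],
               scale: int) -> FMap:
    # Convolve first, then upsample: the 1x1 conv runs on the smaller map.
    return upsample_nearest(conv1x1(fmap, weight, bias), scale)
```

Because a 1 × 1 convolution acts on each pixel independently and nearest-neighbor upsampling only copies pixels, the two steps commute; applying the convolution first keeps its cost at the lower resolution.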
In some specific embodiments, when the processor 11 executes the computer program stored in the memory 12, the following step may specifically be implemented:
performing nearest-neighbor, bilinear or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
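Bilinear interpolation, one of the options listed, can be sketched as below for a single channel. The sketch assumes half-pixel-center sampling (the convention PyTorch's `interpolate` uses with `align_corners=False`) and an integer scale factor; bicubic interpolation is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Plane = List[List[float]]  # one channel, indexed [row][column]


def _source_coord(dst: int, scale: int, size: int) -> Tuple[int, int, float]:
    """Map an output index to its two input neighbors and the weight of the
    second one, using half-pixel centers clamped at the border."""
    src = max((dst + 0.5) / scale - 0.5, 0.0)
    i0 = min(int(math.floor(src)), size - 1)
    i1 = min(i0 + 1, size - 1)
    return i0, i1, src - i0


def upsample_bilinear(plane: Plane, scale: int) -> Plane:
    """Bilinear upsampling of one channel by an integer factor."""
    h, w = len(plane), len(plane[0])
    out: Plane = []
    for y in range(h * scale):
        y0, y1, wy = _source_coord(y, scale, h)
        row = []
        for x in range(w * scale):
            x0, x1, wx = _source_coord(x, scale, w)
            # Interpolate along x on the two neighboring rows, then along y.
            top = (1 - wx) * plane[y0][x0] + wx * plane[y0][x1]
            bottom = (1 - wx) * plane[y1][x0] + wx * plane[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out
```

For example, upsampling the row `[0.0, 4.0]` by 2 gives `[0.0, 1.0, 3.0, 4.0]`: interior samples blend their two neighbors, and border samples are clamped to the edge values.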
In some specific embodiments, the independent backbone network is ResNet-50, ResNet-101, ResNeXt-152 or SENet-154.
In some specific embodiments, the network module is specifically a stage network module;
each independent backbone network further comprises:
a stem network module located before the N network modules.
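The stem/stage split can be illustrated for the two ResNet depths listed above. This sketch assumes the standard bottleneck ResNet layout (a stem producing 64 channels at stride 4, followed by four stages whose output channels double from 256 while the stride doubles from 4); `describe_backbone` is hypothetical, and ResNeXt-152 and SENet-154 are omitted rather than guessed.

```python
from typing import Dict, List, Tuple

# Bottleneck blocks per stage for the standard ResNet depths.
BLOCKS_PER_STAGE: Dict[str, List[int]] = {
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
}


def describe_backbone(name: str) -> Dict[str, object]:
    """Describe a ResNet-style backbone as one stem module followed by N stage
    modules; channel and stride figures assume the standard bottleneck design."""
    blocks = BLOCKS_PER_STAGE[name]
    stages: List[Tuple[int, int, int]] = []  # (blocks, out_channels, stride)
    for j, n_blocks in enumerate(blocks):
        stages.append((n_blocks, 256 * 2 ** j, 4 * 2 ** j))
    return {"stem": (64, 4), "stages": stages, "N": len(stages)}
```

Both depths give N = 4 stage modules with identical per-level output shapes; they differ only in the number of blocks in the third stage.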
In some specific embodiments, the object detection network specifically includes an FPN (feature pyramid network) and/or an RPN (region proposal network) and/or a HEADS (detection head) network.
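As one illustration of how the multi-level outputs of the K-th backbone might be consumed, the sketch below implements the top-down pathway of an FPN. It assumes the lateral 1 × 1 convolutions have already brought every level to the same channel count and that adjacent levels differ by a factor of 2 in resolution; the 3 × 3 output convolutions of a full FPN, and the RPN and detection heads, are omitted. The function names are hypothetical.

```python
from typing import List

FMap = List[List[List[float]]]  # [channels][height][width]


def upsample2x(fmap: FMap) -> FMap:
    """Nearest-neighbor 2x upsampling: repeat each row and each column twice."""
    return [
        [[row[x // 2] for x in range(2 * len(row))] for row in ch for _ in range(2)]
        for ch in fmap
    ]


def add(a: FMap, b: FMap) -> FMap:
    """Element-wise sum of two feature maps of the same shape."""
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def top_down(laterals: List[FMap]) -> List[FMap]:
    """FPN-style top-down merge. laterals[0] is the finest level and
    laterals[-1] the coarsest; the result keeps the same ordering."""
    outs: List[FMap] = [laterals[-1]]  # the coarsest level passes through
    for lat in reversed(laterals[:-1]):
        # Upsample the coarser merged map and add the current lateral map.
        outs.append(add(lat, upsample2x(outs[-1])))
    return outs[::-1]
```

Each merged level thus carries the coarser, semantically stronger information down to the finer resolutions before detection.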
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The object detection method, system, device and readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of these examples is only intended to help understand the method and core idea of the present invention. Meanwhile, those skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method of object detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a cascade backbone network, wherein the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
and inputting the output feature map of each level of the K-th independent backbone network into an object detection network.
2. The object detection method of claim 1, wherein the step of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
preprocessing the output feature map of the j-th level network module of the i-th independent backbone network so that its resolution and number of channels are consistent with those of the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network.
3. The object detection method of claim 2, wherein the step of preprocessing the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing a 1 × 1 convolution on the output feature map of the j-th level network module of the i-th independent backbone network;
and upsampling the output feature map of the j-th level network module of the i-th independent backbone network.
4. The object detection method according to claim 3, wherein upsampling the output feature map of the j-th level network module of the i-th independent backbone network specifically comprises:
performing nearest-neighbor, bilinear or bicubic interpolation on the output feature map of the j-th level network module of the i-th independent backbone network.
5. The object detection method according to any one of claims 1 to 4, wherein the independent backbone network is ResNet-50, ResNet-101, ResNeXt-152 or SENet-154.
6. The object detection method according to claim 5,
the network module is specifically a stage network module;
each of the independent backbone networks further comprises:
a stem network module located before the N network modules.
7. The object detection method according to claim 6, wherein the object detection network specifically comprises an FPN network and/or an RPN network and/or a HEADS network.
8. An object detection system, comprising:
an input module configured to acquire an image to be detected;
a cascade backbone module configured to input the image to be detected into a cascade backbone network, wherein the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, and the output feature map of the j-th level network module of the i-th independent backbone network is preprocessed, summed with the output feature map of the (j-1)-th level network module of the (i+1)-th independent backbone network, and input into the j-th level network module of the (i+1)-th independent backbone network, where 1 ≤ i < K and 1 ≤ j ≤ N;
and an object detection module configured to input the output feature map of each level of the K-th independent backbone network into an object detection network.
9. An object detection device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the object detection method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the object detection method according to any one of claims 1 to 7.
CN201911332544.7A 2019-12-22 2019-12-22 Target detection method, system, device and readable storage medium Withdrawn CN111144407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911332544.7A CN111144407A (en) 2019-12-22 2019-12-22 Target detection method, system, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN111144407A true CN111144407A (en) 2020-05-12

Family

ID=70519271

Country Status (1)

Country Link
CN (1) CN111144407A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902800A (en) * 2019-01-22 2019-06-18 北京大学 The method of multistage backbone network detection generic object based on quasi- Feedback Neural Network
CN110232316A (en) * 2019-05-05 2019-09-13 杭州电子科技大学 A kind of vehicle detection and recognition method based on improved DSOD model
CN110264466A (en) * 2019-06-28 2019-09-20 广州市颐创信息科技有限公司 A kind of reinforcing bar detection method based on depth convolutional neural networks
US20190370648A1 (en) * 2018-05-29 2019-12-05 Google Llc Neural architecture search for dense image prediction tasks
CN110543912A (en) * 2019-09-02 2019-12-06 李肯立 Method for automatically acquiring cardiac cycle video in fetal key section ultrasonic video

Non-Patent Citations (1)

Title
YUDONG LIU ET AL.: "CBNet: A Novel Composite Backbone Network Architecture for Object Detection", arXiv *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN112348121A (en) * 2020-12-01 2021-02-09 吉林大学 Target detection method, target detection equipment and computer storage medium
CN113779318A (en) * 2021-08-09 2021-12-10 中国中医科学院中医药信息研究所 Backbone network extraction method and device, computer equipment and storage medium
CN113779318B (en) * 2021-08-09 2024-03-19 中国中医科学院中医药信息研究所 Backbone network extraction method, backbone network extraction device, computer equipment and storage medium
CN117218580A (en) * 2023-09-13 2023-12-12 杭州像素元科技有限公司 Expressway cross-camera multi-vehicle tracking method and system combining multiple models

Similar Documents

Publication Publication Date Title
CN109934197B (en) Training method and device for face recognition model and computer readable storage medium
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
CN109993040A (en) Text recognition method and device
CN111144407A (en) Target detection method, system, device and readable storage medium
CN111259940A (en) Target detection method based on space attention map
CN107464217B (en) Image processing method and device
CN111160533A (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN115457531A (en) Method and device for recognizing text
CN114005012A (en) Training method, device, equipment and storage medium of multi-mode pre-training model
CN112084920B (en) Method, device, electronic equipment and medium for extracting hotwords
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
CN108960425B (en) Rendering model training method, system, equipment, medium and rendering method
CN111383232A (en) Matting method, matting device, terminal equipment and computer-readable storage medium
CN110399826B (en) End-to-end face detection and identification method
KR102593835B1 (en) Face recognition technology based on heuristic Gaussian cloud transformation
CN109978139B (en) Method, system, electronic device and storage medium for automatically generating description of picture
US20230122927A1 (en) Small object detection method and apparatus, readable storage medium, and electronic device
CN116363261A (en) Training method of image editing model, image editing method and device
CN115861462A (en) Training method and device for image generation model, electronic equipment and storage medium
CN112329808A (en) Optimization method and system of Deeplab semantic segmentation algorithm
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
CN113592881B (en) Picture designability segmentation method, device, computer equipment and storage medium
CN114037772A (en) Training method of image generator, image generation method and device
CN113822521A (en) Method and device for detecting quality of question library questions and storage medium
CN111768406A (en) Cell image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200512