CN111144407A - Target detection method, system, device and readable storage medium - Google Patents
- Publication number
- CN111144407A (application number CN201911332544.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- output characteristic
- independent
- backbone network
- level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The application discloses a target detection method, a system, a device and a readable storage medium, comprising the following steps: acquiring an image to be detected; inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j-th level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a j-1 level network module of an (i + 1) th independent backbone network and input into the j-th level network module of the (i + 1) th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N; and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network. According to the method and the device, the backbone network does not need to be trained from the beginning, but the mature independent backbone networks are cascaded, so that the high-level features and the low-level features of the backbone networks are fused, the target detection precision is improved, and the cost for training the backbone networks is saved.
Description
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method, a system, an apparatus, and a readable storage medium for target detection.
Background
Target detection occupies a very important position in computer vision: it is one of the field's fundamental tasks and is currently an active research topic.
Generally, a target detection framework includes a backbone network (Backbone), a Feature Pyramid Network (FPN), a Region Proposal Network (RPN), and task-specific head networks (Heads). The more representative the features extracted by the backbone network, the better the corresponding target detection performance. However, the cost of designing a complex backbone network that can extract powerful features is extremely high. How to solve this technical problem is an issue that those skilled in the art need to address.
Disclosure of Invention
In view of the above, the present invention provides a method, a system, a device and a readable storage medium for object detection. The specific scheme is as follows:
a method of target detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j-th level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a (j-1)th level network module of an (i+1)th independent backbone network and input into the j-th level network module of the (i+1)th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N;
and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network.
Preferably, the process of preprocessing the output characteristic diagram of the jth level network module of the ith independent backbone network specifically includes:
and preprocessing an output characteristic diagram of the jth level network module of the ith independent backbone network so as to keep it consistent with the output characteristic diagram of the (j-1)th level network module of the (i+1)th independent backbone network in terms of resolution and channel number.
Preferably, the step of preprocessing the output characteristic diagram of the jth level network module of the ith independent backbone network specifically includes:
performing 1 × 1 convolution operation on an output characteristic diagram of a jth level network module of an ith independent backbone network;
and performing up-sampling operation on the output characteristic diagram of the j-th level network module of the ith independent backbone network.
Preferably, the process of performing an upsampling operation on the output characteristic diagram of the jth level network module of the ith independent backbone network specifically includes:
and performing nearest neighbor interpolation calculation or bilinear interpolation calculation or bicubic interpolation calculation on the output characteristic graph of the j-th level network module of the ith independent backbone network.
Preferably, the independent backbone network is specifically resnet50, resnet101, resnext152 or senet154.
Preferably, the network module is specifically a stage network module;
each of the independent backbone networks further comprises:
and the stem network module is positioned in front of the N network modules.
Preferably, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Correspondingly, the invention also discloses a target detection system, which comprises:
the input module is used for acquiring an image to be detected;
the cascade backbone module is used for inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j-th level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a (j-1)th level network module of an (i+1)th independent backbone network and input into the j-th level network module of the (i+1)th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N;
and the target detection module is used for inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network.
Correspondingly, the invention also discloses a target detection device, which comprises:
a memory for storing a computer program;
a processor for implementing the steps of the object detection method as claimed in any one of the above when executing the computer program.
Accordingly, the present invention also discloses a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the object detection method as described in any one of the above.
The application discloses a target detection method, which comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j-th level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a (j-1)th level network module of an (i+1)th independent backbone network and input into the j-th level network module of the (i+1)th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N; and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network. According to the method and the device, the backbone network does not need to be trained from the beginning; instead, mature independent backbone networks are cascaded, so that the high-level features and the low-level features of the backbone networks are fused, the target detection precision is improved, and the cost for training the backbone networks is saved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart illustrating steps of a method for detecting a target according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a network architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a target detection system according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a target detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Generally, the target detection framework comprises a backbone network, an FPN, an RPN and Heads, and if the backbone network can extract more representative features, the performance of corresponding target detection is better. But the cost of designing a complex backbone network that can extract powerful features is extremely high. According to the method and the device, the backbone network does not need to be trained from the beginning, but the mature independent backbone networks are cascaded, so that the high-level features and the low-level features of the backbone networks are fused, the target detection precision is improved, and the cost for training the backbone networks is saved.
The embodiment of the invention discloses a target detection method, which is shown in figure 1 and comprises the following steps:
S1: acquiring an image to be detected;
S2: inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j-th level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a (j-1)th level network module of an (i+1)th independent backbone network and input into the j-th level network module of the (i+1)th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N;
It can be understood that the cascaded backbone network in this embodiment is composed of a plurality of independent backbone networks, each with the same network structure. The first K-1 independent backbone networks serve as auxiliary backbone networks, the Kth independent backbone network serves as the main backbone network, and the output feature maps of the Kth backbone network serve as the input feature maps of the target detection network. Each independent backbone network may be any model pre-trained on the ImageNet data set, such as resnet50 or resnet101; if GPU memory is sufficient, a deeper network model such as resnext152 or senet154 may be selected. Each independent backbone network comprises N network modules, specifically stage network modules, and further comprises a stem network module in front of the N network modules. Taking resnet50 as an example, and referring to the schematic diagram of the network structure shown in fig. 2, in each independent backbone network conv1 denotes the stem network module, which receives the image to be detected; the image then passes through four stage network modules. Each stage network module is composed of a plurality of residual modules, and the resolution of the output feature map of each stage network module is half that of its input feature map.
A high-resolution feature map carries weaker semantic information but loses less information, whereas a low-resolution feature map carries stronger semantic information but loses more information because of feature selection. This embodiment therefore fuses high-resolution and low-resolution features through the cascade connections between the independent backbone networks, specifically:
$$x_j^{i+1} = F_j^{i+1}\left(x_{j-1}^{i+1} + g\left(x_j^{i}\right)\right)$$
where $x_j^{i}$ is the output characteristic diagram of the j-th level network module of the i-th independent backbone network, $g(\cdot)$ is the preprocessing applied to $x_j^{i}$, $x_{j-1}^{i+1}$ is the output characteristic diagram of the (j-1)th level network module of the (i+1)th independent backbone network, $x_j^{i+1}$ is the output characteristic diagram of the j-th level network module of the (i+1)th independent backbone network, and $F_j^{i+1}(\cdot)$ is the internal calculation that the j-th level network module of the (i+1)th independent backbone network performs on its input characteristic diagram.
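The fusion rule can be illustrated with a minimal Python sketch. It is not the implementation of this embodiment: feature maps are replaced by plain floats, and the names `cascade_forward`, `stems`, `stages` and `preprocess` are hypothetical. The sketch only shows the order in which the stage outputs of the previous backbone are added to the inputs of the next one.

```python
from typing import Callable, List, Sequence

FeatureMap = float  # toy stand-in: a scalar plays the role of a feature map
Module = Callable[[FeatureMap], FeatureMap]


def cascade_forward(
    image: FeatureMap,
    stems: Sequence[Module],               # one stem module per backbone (K entries)
    stages: Sequence[Sequence[Module]],    # stages[i][j]: stage j+1 of backbone i+1
    preprocess: Module,                    # g(.): aligns resolution and channels
) -> List[FeatureMap]:
    """Return the N stage outputs of the K-th (main) backbone."""
    prev_outputs: List[FeatureMap] = []    # stage outputs of the previous backbone
    for stem, backbone in zip(stems, stages):
        x = stem(image)                    # level-0 output of this backbone
        outputs: List[FeatureMap] = []
        for j, stage in enumerate(backbone):
            if prev_outputs:
                # previous level of this backbone + preprocessed same-level
                # output of the previous backbone
                x = x + preprocess(prev_outputs[j])
            x = stage(x)
            outputs.append(x)
        prev_outputs = outputs
    return prev_outputs


# Toy usage: K = 2 backbones, N = 2 stages, every stage adds 1.
identity = lambda x: x
add_one = lambda x: x + 1.0
print(cascade_forward(1.0, [identity, identity],
                      [[add_one, add_one], [add_one, add_one]], identity))
# [4.0, 8.0]
```

With K = 1 the function reduces to an ordinary single-backbone forward pass, which matches the role of the first K-1 backbones as auxiliary networks.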
The process of preprocessing the output characteristic diagram of the jth level network module of the ith independent backbone network specifically includes:
and preprocessing the output characteristic diagram of the j-th level network module of the ith independent backbone network so as to keep the output characteristic diagram consistent with the output characteristic diagram of the j-1 level network module of the (i + 1) th independent backbone network in terms of resolution and channel number.
Further, the process of preprocessing the output characteristic diagram of the jth level network module of the ith independent backbone network specifically includes:
performing 1 × 1 convolution operation on an output characteristic diagram of a jth level network module of an ith independent backbone network;
and performing up-sampling operation on the output characteristic diagram of the j-th level network module of the ith independent backbone network.
The process of performing upsampling operation on the output characteristic diagram of the jth level network module of the ith independent backbone network specifically includes:
and performing nearest neighbor interpolation calculation or bilinear interpolation calculation or bicubic interpolation calculation on the output characteristic graph of the j-th level network module of the ith independent backbone network.
It is understood that the main purpose of the preprocessing is to keep the resolution and the number of channels of the two output feature maps consistent, and the preprocessing can be related to other optimization purposes, and is not limited herein; when the resolution and the channel number of two output characteristic graphs are unified, the purpose can be realized by other processing modes in the preprocessing besides the convolution operation and the up-sampling operation; further, besides the interpolation calculations, the upsampling operation may also be performed by another calculation method.
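As a rough illustration of the two preprocessing operations named above, the following standard-library Python sketch applies a 1 × 1 convolution (per-pixel channel mixing) followed by nearest neighbor upsampling to a feature map stored as nested lists. The function names, the `[channels][height][width]` layout and the example weights are assumptions made for the illustration; an actual implementation would use a deep learning framework.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # assumed layout: [channels][height][width]


def conv1x1(x: Tensor3D, weight: List[List[float]]) -> Tensor3D:
    """1x1 convolution without bias; weight has shape [out_channels][in_channels]."""
    in_ch, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w_row[c] * x[c][r][col] for c in range(in_ch))
          for col in range(width)]
         for r in range(height)]
        for w_row in weight
    ]


def upsample_nearest(x: Tensor3D, scale: int = 2) -> Tensor3D:
    """Nearest neighbor upsampling: repeat every pixel `scale` times per axis."""
    return [
        [[row[col // scale] for col in range(len(row) * scale)]
         for row in channel for _ in range(scale)]
        for channel in x
    ]


def preprocess(x: Tensor3D, weight: List[List[float]], scale: int = 2) -> Tensor3D:
    """Match channel count (1x1 conv) and resolution (upsampling)."""
    return upsample_nearest(conv1x1(x, weight), scale)


# Toy usage: 2 channels of size 1x2 are reduced to 1 channel of size 2x4.
x = [[[1.0, 2.0]], [[3.0, 4.0]]]
print(preprocess(x, weight=[[1.0, 1.0]]))
# [[[4.0, 4.0, 6.0, 6.0], [4.0, 4.0, 6.0, 6.0]]]
```

Running the convolution before the upsampling, as here, means the channel mixing is done on the smaller map; the text does not prescribe an order, so this is a design choice of the sketch.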
S3: and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network.
It is understood that the target detection network in this embodiment may adopt any target detection framework, and may specifically include an FPN network and/or an RPN network and/or a Heads network. The subsequent processing steps are completed by the target detection network; the specific operations belong to the prior art and are not described herein again.
The embodiment of the application discloses a target detection method, which comprises the following steps: acquiring an image to be detected; inputting an image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a j-1 level network module of an (i + 1) th independent backbone network and input into the j level network module of the (i + 1) th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N; and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network. According to the method and the device, the backbone network does not need to be trained from the beginning, the mature independent backbone networks are cascaded, the high-level features and the low-level features of the backbone networks are fused, and the target detection precision is improved.
The embodiment of the invention discloses a specific target detection system, and compared with the previous embodiment, the technical scheme is further explained and optimized in the embodiment. Specifically, the method comprises the following steps:
In this embodiment, the model network is constructed with the mainstream deep learning framework PyTorch and the model library torchvision; the pre-trained weights of the independent backbone networks are loaded from torchvision without retraining.
This example was run in an experimental environment with 8 V100 GPUs, using the COCO dataset. Each GPU processes 2 images per batch, and the initial learning rate is 0.01. During training, horizontal flipping is used for data augmentation, the short edge of the image is set to 800, and the long edge is set to 1333. During testing, for a fair comparison, the Soft-NMS method is not used.
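The image-size setting and the flip augmentation can be sketched in plain Python. The sketch assumes the convention common in detection code bases, namely that the aspect ratio is kept, the short edge is scaled to 800, and the scale is reduced if the long edge would exceed 1333; the text itself only gives the two numbers. The function names `resize_shape` and `hflip_box` are hypothetical.

```python
from typing import Tuple


def resize_shape(height: int, width: int,
                 short_edge: int = 800, long_edge: int = 1333) -> Tuple[int, int]:
    """Target (height, width) under the assumed keep-aspect-ratio rule."""
    scale = short_edge / min(height, width)
    if max(height, width) * scale > long_edge:
        # Scaling the short edge to 800 would overshoot the long-edge cap.
        scale = long_edge / max(height, width)
    return round(height * scale), round(width * scale)


def hflip_box(box: Tuple[float, float, float, float],
              image_width: float) -> Tuple[float, float, float, float]:
    """Horizontally flip an (x1, y1, x2, y2) box together with its image."""
    x1, y1, x2, y2 = box
    return image_width - x2, y1, image_width - x1, y2


print(resize_shape(480, 640))             # (800, 1067)
print(resize_shape(400, 1000))            # (533, 1333)
print(hflip_box((10, 20, 50, 60), 100))   # (50, 20, 90, 60)
```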
The COCO test-2017 data set results are shown in Table 1.
Table 1: COCO test-2017 dataset results
Method | Backbone | APbox | AP50 | AP75 |
Cascade RCNN | Resnet101 | 42.8 | 62.1 | 46.3 |
Cascaded backbone network (K = 2) | Resnet101 | 44.1 | 62.3 | 47.9 |
Method | Backbone | APmask | AP50 | AP75 |
Mask RCNN | Resnet101 | 35.9 | 57.9 | 38 |
Cascaded backbone network (K = 2) | Resnet101 | 36.9 | 59.5 | 39.2 |
Two sets of experiments were performed to compare the algorithm of this embodiment with other algorithms on target detection and instance segmentation, respectively. The baseline in the first group is the Cascade RCNN network model, which performs well on target detection; the baseline in the second group is the Mask RCNN network model, which performs well on instance segmentation; the backbone network in all cases is Resnet101. Table 1 shows that, with the rest of the model configuration unchanged, the method of this embodiment improves on the baseline by about one AP point on both tasks: APbox rises from 42.8 to 44.1 and APmask rises from 35.9 to 36.9.
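The size of the improvement can be checked directly from the AP values in Table 1. The short Python snippet below only restates that arithmetic; the dictionary keys are labels chosen for this illustration.

```python
# AP values copied from Table 1 (Resnet101 backbone).
results = {
    "APbox (vs. Cascade RCNN)": {"baseline": 42.8, "cascaded": 44.1},
    "APmask (vs. Mask RCNN)": {"baseline": 35.9, "cascaded": 36.9},
}

# Absolute gain in AP points, rounded to one decimal place.
gains = {
    task: round(ap["cascaded"] - ap["baseline"], 1)
    for task, ap in results.items()
}

for task, gain in gains.items():
    print(f"{task}: +{gain} AP")
# APbox (vs. Cascade RCNN): +1.3 AP
# APmask (vs. Mask RCNN): +1.0 AP
```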
Correspondingly, the present invention also discloses a target detection system, as shown in fig. 3, including:
the input module 01 is used for acquiring an image to be detected;
a cascade backbone module 02 for inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a j-1 level network module of an (i + 1) th independent backbone network and input into the j level network module of the (i + 1) th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N;
and the target detection module 03 is configured to input the output feature map of each stage of the kth independent backbone network into the target detection network.
In the embodiment, the backbone network does not need to be trained from the beginning, but the mature independent backbone networks are cascaded, so that the high-level features and the low-level features of the backbone networks are fused, and the target detection precision is improved.
In some specific embodiments, the cascade backbone module 02 is specifically configured to:
and preprocessing the output characteristic diagram of the j-th level network module of the ith independent backbone network so as to keep the output characteristic diagram consistent with the output characteristic diagram of the j-1 level network module of the (i + 1) th independent backbone network in terms of resolution and channel number.
In some specific embodiments, the cascade backbone module 02 is specifically configured to:
performing 1 × 1 convolution operation on an output characteristic diagram of a jth level network module of an ith independent backbone network;
and performing up-sampling operation on the output characteristic diagram of the j-th level network module of the ith independent backbone network.
In some specific embodiments, the cascade backbone module 02 is specifically configured to:
and performing nearest neighbor interpolation calculation or bilinear interpolation calculation or bicubic interpolation calculation on the output characteristic graph of the j-th level network module of the ith independent backbone network.
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network module is specifically a stage network module;
each independent backbone network further comprises:
and the stem network module is positioned in front of the N network modules.
In some specific embodiments, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Correspondingly, the invention also discloses a target detection device, which is shown in fig. 4 and comprises a processor 11 and a memory 12; wherein the processor 11 implements the following steps when executing the computer program stored in the memory 12:
acquiring an image to be detected;
inputting an image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a j-1 level network module of an (i + 1) th independent backbone network and input into the j level network module of the (i + 1) th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N;
and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network.
According to the method and the device, the backbone network does not need to be trained from the beginning, the mature independent backbone networks are cascaded, the high-level features and the low-level features of the backbone networks are fused, and the target detection precision is improved.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented: and preprocessing the output characteristic diagram of the j-th level network module of the ith independent backbone network so as to keep the output characteristic diagram consistent with the output characteristic diagram of the j-1 level network module of the (i + 1) th independent backbone network in terms of resolution and channel number.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
performing 1 × 1 convolution operation on an output characteristic diagram of a jth level network module of an ith independent backbone network;
and performing up-sampling operation on the output characteristic diagram of the j-th level network module of the ith independent backbone network.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
and performing nearest neighbor interpolation calculation or bilinear interpolation calculation or bicubic interpolation calculation on the output characteristic graph of the j-th level network module of the ith independent backbone network.
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network module is specifically a stage network module;
each independent backbone network further comprises:
and the stem network module is positioned in front of the N network modules.
In some specific embodiments, the target detection network specifically includes an FPN network and/or an RPN network and/or a Heads network.
Further, the target detection apparatus in this embodiment may further include:
the input interface 13 is configured to obtain a computer program imported from the outside, store the obtained computer program in the memory 12, and further be configured to obtain various instructions and parameters transmitted by an external terminal device, and transmit the instructions and parameters to the processor 11, so that the processor 11 performs corresponding processing by using the instructions and parameters. In this embodiment, the input interface 13 may specifically include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, a hard disk reading interface, and the like.
And an output interface 14, configured to output various data generated by the processor 11 to a terminal device connected thereto, so that other terminal devices connected to the output interface 14 can acquire various data generated by the processor 11. In this embodiment, the output interface 14 may specifically include, but is not limited to, a USB interface, a serial interface, and the like.
A communication unit 15, configured to establish a remote communication connection between the target detection device and an external server, so that the target detection device can mount the image file to the external server. In this embodiment, the communication unit 15 may specifically include, but is not limited to, a remote communication unit based on a wireless communication technology or a wired communication technology.
And the keyboard 16 is used for acquiring various parameter data or instructions that a user enters in real time through keystrokes.
And the display 17 is used for displaying relevant information of the target detection process in real time so that a user can know the target detection condition in time.
The mouse 18 may be used to assist the user in entering data and to simplify the user's operation.
Further, embodiments of the present application also disclose a computer-readable storage medium, where the computer-readable storage medium includes Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable hard disk, CD-ROM, or any other form of storage medium known in the art. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be detected;
inputting an image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a j-1 level network module of an (i + 1) th independent backbone network and input into the j level network module of the (i + 1) th independent backbone network, wherein i is more than or equal to 1 and less than K, and j is more than or equal to 1 and less than or equal to N;
and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network.
According to the method and the device, the backbone network does not need to be trained from the beginning, the mature independent backbone networks are cascaded, the high-level features and the low-level features of the backbone networks are fused, and the target detection precision is improved.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented: and preprocessing the output characteristic diagram of the j-th level network module of the ith independent backbone network so as to keep the output characteristic diagram consistent with the output characteristic diagram of the j-1 level network module of the (i + 1) th independent backbone network in terms of resolution and channel number.
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
performing a 1 × 1 convolution operation on the output characteristic diagram of the j-th level network module of the i-th independent backbone network;
and performing an up-sampling operation on the output characteristic diagram of the j-th level network module of the i-th independent backbone network.
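A minimal pure-Python sketch of these two operations follows, assuming a feature map stored as nested lists indexed `[channel][row][column]`. The functions `conv1x1` and `upsample_nearest` are hypothetical helpers written for this illustration; a 1 × 1 convolution is shown without bias, and nearest-neighbor repetition is chosen only as the simplest up-sampling variant.

```python
from typing import List

Map3D = List[List[List[float]]]  # [channel][row][column]


def conv1x1(fmap: Map3D, weights: List[List[float]]) -> Map3D:
    """Apply a bias-free 1 x 1 convolution; weights has shape [C_out][C_in]."""
    c_in, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(w_row[c] * fmap[c][y][x] for c in range(c_in)) for x in range(width)]
            for y in range(height)
        ]
        for w_row in weights
    ]


def upsample_nearest(fmap: Map3D, factor: int) -> Map3D:
    """Enlarge each channel by an integer factor by repeating rows and columns."""
    return [
        [
            [row[x // factor] for x in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)
        ]
        for channel in fmap
    ]


# Two input channels of size 1 x 2 are mixed into one channel, then enlarged 2x.
reduced = conv1x1([[[1.0, 2.0]], [[3.0, 4.0]]], [[1.0, 1.0]])
print(reduced)                       # [[[4.0, 6.0]]]
print(upsample_nearest(reduced, 2))  # [[[4.0, 4.0, 6.0, 6.0], [4.0, 4.0, 6.0, 6.0]]]
```

Running the 1 × 1 convolution before the up-sampling, in the order the steps are listed, means the channel mixing is computed on the smaller map.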
In some specific embodiments, when the processor 11 executes the computer subprogram stored in the memory 12, the following steps may be specifically implemented:
performing nearest neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output characteristic diagram of the j-th level network module of the i-th independent backbone network.
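To make the difference between two of these interpolation options concrete, the sketch below resizes a single-channel map with nearest neighbor and with bilinear interpolation. Both functions are hypothetical helpers; the bilinear version assumes a corner-aligned sampling grid, which is only one of several common conventions, and bicubic interpolation is omitted for brevity.

```python
from typing import List

Map2D = List[List[float]]  # [row][column]


def resize_nearest(img: Map2D, out_h: int, out_w: int) -> Map2D:
    """Pick, for every output pixel, the source pixel it maps onto."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def resize_bilinear(img: Map2D, out_h: int, out_w: int) -> Map2D:
    """Blend the four surrounding source pixels (corner-aligned grid assumed)."""
    in_h, in_w = len(img), len(img[0])

    def src_pos(i: int, n_out: int, n_in: int) -> float:
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for y in range(out_h):
        fy = src_pos(y, out_h, in_h)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = src_pos(x, out_w, in_w)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


img = [[0.0, 2.0], [4.0, 6.0]]
print(resize_nearest(img, 4, 4)[0])  # [0.0, 0.0, 2.0, 2.0]
print(resize_bilinear(img, 3, 3))    # [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

Nearest neighbor copies existing values and produces blocky maps, while bilinear interpolation produces intermediate values between neighboring pixels.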
In some specific embodiments, the independent backbone network is specifically resnet50, resnet101, resnext152, or senet154.
In some specific embodiments, the network module is specifically a stage network module;
each independent backbone network further comprises:
and the stem network module is positioned in front of the N network modules.
In some specific embodiments, the object detection network specifically includes an FPN network and/or an RPN network and/or a HEADS network.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The target detection method, system, device and readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A method of object detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j-th level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a j-1 level network module of an (i + 1) th independent backbone network and input into the j-th level network module of the (i + 1) th independent backbone network, wherein i is more than or equal to 1 and is less than or equal to K, and j is more than or equal to 1 and is less than or equal to N;
and inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network.
2. The object detection method of claim 1, wherein the step of preprocessing the output characteristic diagram of the jth network module of the ith independent backbone network specifically comprises:
preprocessing the output characteristic diagram of the j-th level network module of the i-th independent backbone network so that it is consistent in resolution and channel number with the output characteristic diagram of the (j-1)-th level network module of the (i+1)-th independent backbone network.
3. The object detection method of claim 2, wherein the step of preprocessing the output characteristic map of the jth network module of the ith independent backbone network specifically comprises:
performing a 1 × 1 convolution operation on the output characteristic diagram of the j-th level network module of the i-th independent backbone network;
and performing an up-sampling operation on the output characteristic diagram of the j-th level network module of the i-th independent backbone network.
4. The method according to claim 3, wherein the process of performing an upsampling operation on the output characteristic map of the jth network module of the ith independent backbone network specifically includes:
performing nearest neighbor interpolation, bilinear interpolation, or bicubic interpolation on the output characteristic diagram of the j-th level network module of the i-th independent backbone network.
5. The object detection method according to any one of claims 1 to 4, characterized in that the independent backbone network is specifically resnet50, resnet101, resnext152 or senet154.
6. The object detection method according to claim 5,
the network module is specifically a stage network module;
each of the independent backbone networks further comprises:
and the stem network module is positioned in front of the N network modules.
7. The object detection method according to claim 6, wherein the object detection network specifically comprises an FPN network and/or an RPN network and/or an HEADS network.
8. An object detection system, comprising:
the input module is used for acquiring an image to be detected;
the cascade backbone module is used for inputting the image to be detected into a cascade backbone network; the cascade backbone network comprises K independent backbone networks, each independent backbone network comprises N network modules, an output characteristic diagram of a j-th level network module of an ith independent backbone network is preprocessed and then summed with an output characteristic diagram of a j-1 level network module of an (i + 1) th independent backbone network and input into the j-th level network module of the (i + 1) th independent backbone network, wherein i is more than or equal to 1 and is less than or equal to K, and j is more than or equal to 1 and is less than or equal to N;
and the target detection module is used for inputting the output characteristic diagram of each level of the Kth independent backbone network into the target detection network.
9. An object detection device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the object detection method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the object detection method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911332544.7A CN111144407A (en) | 2019-12-22 | 2019-12-22 | Target detection method, system, device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911332544.7A CN111144407A (en) | 2019-12-22 | 2019-12-22 | Target detection method, system, device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111144407A (en) | 2020-05-12 |
Family
ID=70519271
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911332544.7A (Withdrawn) CN111144407A (en) | 2019-12-22 | 2019-12-22 | Target detection method, system, device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111144407A (en) |
2019-12-22: CN application CN201911332544.7A (patent/CN111144407A/en), status: not active, Withdrawn
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190370648A1 (en) * | 2018-05-29 | 2019-12-05 | Google Llc | Neural architecture search for dense image prediction tasks |
CN109902800A (en) * | 2019-01-22 | 2019-06-18 | 北京大学 | The method of multistage backbone network detection generic object based on quasi- Feedback Neural Network |
CN110232316A (en) * | 2019-05-05 | 2019-09-13 | 杭州电子科技大学 | A kind of vehicle detection and recognition method based on improved DSOD model |
CN110264466A (en) * | 2019-06-28 | 2019-09-20 | 广州市颐创信息科技有限公司 | A kind of reinforcing bar detection method based on depth convolutional neural networks |
CN110543912A (en) * | 2019-09-02 | 2019-12-06 | 李肯立 | Method for automatically acquiring cardiac cycle video in fetal key section ultrasonic video |
Non-Patent Citations (1)
Title |
---|
Yudong Liu et al.: "CBNet: A Novel Composite Backbone Network Architecture for Object Detection", arXiv * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112348121A (en) * | 2020-12-01 | 2021-02-09 | 吉林大学 | Target detection method, target detection equipment and computer storage medium |
CN113779318A (en) * | 2021-08-09 | 2021-12-10 | 中国中医科学院中医药信息研究所 | Backbone network extraction method and device, computer equipment and storage medium |
CN113779318B (en) * | 2021-08-09 | 2024-03-19 | 中国中医科学院中医药信息研究所 | Backbone network extraction method, backbone network extraction device, computer equipment and storage medium |
CN117218580A (en) * | 2023-09-13 | 2023-12-12 | 杭州像素元科技有限公司 | Expressway cross-camera multi-vehicle tracking method and system combining multiple models |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109934197B (en) | Training method and device for face recognition model and computer readable storage medium | |
WO2021022521A1 (en) | Method for processing data, and method and device for training neural network model | |
CN109993040A (en) | Text recognition method and device | |
CN111144407A (en) | Target detection method, system, device and readable storage medium | |
CN111259940A (en) | Target detection method based on space attention map | |
CN107464217B (en) | Image processing method and device | |
CN111160533A (en) | Neural network acceleration method based on cross-resolution knowledge distillation | |
CN115457531A (en) | Method and device for recognizing text | |
CN114005012A (en) | Training method, device, equipment and storage medium of multi-mode pre-training model | |
CN112084920B (en) | Method, device, electronic equipment and medium for extracting hotwords | |
CN111108508B (en) | Face emotion recognition method, intelligent device and computer readable storage medium | |
CN108960425B (en) | Rendering model training method, system, equipment, medium and rendering method | |
CN111383232A (en) | Matting method, matting device, terminal equipment and computer-readable storage medium | |
CN110399826B (en) | End-to-end face detection and identification method | |
KR102593835B1 (en) | Face recognition technology based on heuristic Gaussian cloud transformation | |
CN109978139B (en) | Method, system, electronic device and storage medium for automatically generating description of picture | |
US20230122927A1 (en) | Small object detection method and apparatus, readable storage medium, and electronic device | |
CN116363261A (en) | Training method of image editing model, image editing method and device | |
CN115861462A (en) | Training method and device for image generation model, electronic equipment and storage medium | |
CN112329808A (en) | Optimization method and system of Deeplab semantic segmentation algorithm | |
CN115713462A (en) | Super-resolution model training method, image recognition method, device and equipment | |
CN113592881B (en) | Picture designability segmentation method, device, computer equipment and storage medium | |
CN114037772A (en) | Training method of image generator, image generation method and device | |
CN113822521A (en) | Method and device for detecting quality of question library questions and storage medium | |
CN111768406A (en) | Cell image processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | | Application publication date: 20200512 |