CN112396126A - Target detection method and system based on detection backbone and local feature optimization

Publication number
CN112396126A
Authority
CN
China
Prior art keywords
training, network, data, target detection, module
Legal status
Granted
Application number
CN202011388976.2A
Other languages
Chinese (zh)
Other versions
CN112396126B (en)
Inventor
郑慧诚
严志伟
黄梓轩
李烨
陈绿然
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Application filed by Sun Yat Sen University
Priority to CN202011388976.2A
Publication of CN112396126A
Application granted
Publication of CN112396126B
Current status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a target detection method and system based on a detection backbone and local feature optimization. The method comprises the following steps: acquiring training data and preprocessing the training data to obtain preprocessed data; constructing a target detection network based on a long-neck backbone architecture and a local feature optimization module; training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network; and acquiring data to be detected, inputting the data to be detected into the trained target detection network, and outputting a detection result. The system comprises a preprocessing module, a network construction module, a training module and a detection module. The invention ensures that the detector achieves satisfactory performance while keeping computational requirements modest. The target detection method and system based on a detection backbone and local feature optimization can be widely applied in the field of target detection networks.

Description

Target detection method and system based on detection backbone and local feature optimization
Technical Field
The invention belongs to the field of target detection networks, and particularly relates to a target detection method and a target detection system based on detection backbone and local feature optimization.
Background
Target detection is a basic task of computer vision with wide application, and it is an active research area in both academia and industry. With the rise of deep learning, the field of target detection has developed greatly. However, current detectors perform poorly on small-scale targets, mainly because information is lost quickly in the backbone network and because the detection head models local information insufficiently.
The backbone network is the basic structure for feature extraction and has a significant effect on target detection performance. Because training samples for target detection are generally scarce, most current detectors employ backbones pre-trained on large image classification datasets. The difference between the two tasks causes a domain shift problem when the network is fine-tuned, and adopting a pre-trained network also limits the structural design space of the backbone to a certain extent. Commonly adopted backbone networks perform pooling too early, so spatial detail information is lost, which is unfavorable for the feature expression of small targets.
On the other hand, the detection head of current mainstream detectors usually takes a feature pyramid as input. Shallow features in the pyramid lack semantic information, while deep features have lost much of their spatial information. How to enhance the feature expression and detection of small-scale targets at the detection layers is therefore a problem that needs to be solved.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a target detection method and system based on a detection backbone and local feature optimization, which ensure that the detector achieves satisfactory performance while keeping computational requirements modest.
The first technical scheme adopted by the invention is as follows: a target detection method based on a detection backbone and local feature optimization, comprising the following steps:
acquiring training data and preprocessing the training data to obtain preprocessed data;
constructing a target detection network based on a long-neck backbone architecture and a local feature optimization module;
training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
and acquiring data to be detected, inputting the data to be detected to the trained target detection network, and outputting a detection result.
Further, the step of acquiring training data and preprocessing the training data to obtain preprocessed data specifically includes:
collecting training data according to the problem domain and annotating the training data to obtain annotated training data;
the training data comprises public data sets from the Internet and real-shot images, and the information in the training data comprises the original pictures and annotation records of the positions and categories of the targets in the pictures.
Further, the target detection network comprises a long-neck residual backbone network and a local feature optimization module, the long-neck residual backbone network comprises six feature extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module.
Further, the feature extraction convolution modules include an Inception module, and the Inception module comprises two branches.
Further, the local fusion module comprises a detail re-guiding branch, a local context branch and an original input mapping branch, wherein the detail re-guiding branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a max pooling layer, a 3 × 3 convolutional layer and a batch normalization layer, the local context branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a deconvolution layer, a 3 × 3 convolutional layer and a batch normalization layer, and the original input mapping branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a batch normalization layer.
Further, the step of training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network specifically includes:
dividing the data into a training set, a validation set and a test set according to a certain proportion;
during training of the target detection network, taking the training set as input and computing the network output through operations such as convolution to obtain a set of prediction boxes;
according to the classification subtask and the localization subtask, each prediction box in the set of prediction boxes comprises a category vector and a position vector;
for the classification subtask, using the cross entropy between the category vector of the prediction box and the category vector of the annotation box as the loss function;
for the localization subtask, calculating the position loss between the prediction box and the annotation box through a Smooth L1 loss function;
calculating the gradients of the parameters of the convolutional layers layer by layer according to the calculated loss and the stochastic gradient descent method, and updating the parameters of each layer in the network;
during training, evaluating the generalization of the network at a fixed interval of iterations by taking the validation set as input;
and after training is finished, evaluating the performance of the network by taking the test set as the input of the network, and saving parameters of the network such as convolution kernels and biases to obtain the trained target detection network.
Further, the step of acquiring data to be detected, inputting the data to be detected into the trained target detection network, and outputting a detection result specifically includes:
acquiring the data to be detected to obtain an image of the target to be detected;
inputting the image of the target to be detected into the trained target detection network, and outputting, through convolutional layers, a sequence of 4-dimensional vectors representing the positions of the prediction boxes and a sequence of N-dimensional vectors representing the class predictions;
discarding, by the detector, a part of the low-quality results according to the sequence of N-dimensional class prediction vectors and a manually preset class confidence threshold to obtain the remaining detection results;
and de-duplicating the remaining detection results with a non-maximum suppression algorithm, based on the confidence of each prediction box and the overlap ratio between prediction boxes calculated from the 4-dimensional position vectors, to obtain and output the final detection result of the detector.
The method and the system have the following beneficial effects: a local feature optimization module for fusing spatially local information is designed, which both enhances the semantic information of the detection layers and preserves the spatially local information of the detection head features, and is particularly beneficial for small target detection. In addition, to overcome the performance reduction that occurs when backbone network parameters are randomly initialized, a suitable learning strategy is provided, ensuring that the detector achieves satisfactory performance while keeping computational requirements modest.
Drawings
FIG. 1 is a network architecture diagram of a target detection network based on a detection backbone and local feature optimization according to the present invention;
FIG. 2 is a flowchart illustrating the steps of a target detection method based on a detection backbone and local feature optimization according to the present invention;
FIG. 3 is a block diagram of a target detection system based on a detection backbone and local feature optimization according to the present invention;
FIG. 4 illustrates a branch structure in a local fusion module according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
As shown in fig. 1 and fig. 2, the present invention provides a target detection method based on a detection backbone and local feature optimization, which includes the following steps:
S1, acquiring training data and preprocessing the training data to obtain preprocessed data;
S2, constructing a target detection network based on the long-neck backbone architecture and the local feature optimization module;
S3, training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
Specifically, in order to overcome the performance reduction caused by the absence of pre-training, the invention optimizes the training strategy so that similar or even better performance is obtained with the same training resources. The specific improvements are as follows. (1) Differentiated learning rates: the part of the network before the local Inception module is consistent with the existing ResNet structure, and low-level visual features generalize well, so pre-trained parameters can be used to initialize this part. For the pre-trained part of the network, a smaller learning rate is adopted to preserve the pre-trained knowledge; for randomly initialized parameters, a larger learning rate is adopted to facilitate the search of the network in the parameter space. With this differentiated learning strategy, the detection network retains the generalization performance brought by pre-training while also converging faster. (2) Strengthening stability in the initial stage of training: the network adopts a feature pyramid structure for target detection, which improves robustness to target scale, but the high-resolution feature maps in the detection layers easily produce excessively large gradients at the start of training, which affects the convergence of the learning process. The invention adopts a warm-up technique: the learning rate is gradually increased in the initial training stage so that the network is optimized gradually and does not deviate too far from the optimization target early on, making the learning process more stable.
With warm-up, the statistical characteristics obtained by the network at the initial training stage are more accurate, which removes the dependence of existing randomly initialized target detection networks on large-batch training, so that satisfactory performance can be obtained with smaller computing resource requirements.
S4, acquiring the data to be detected, inputting the data to be detected into the trained target detection network, and outputting a detection result.
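The training strategy described for step S3 (a smaller learning rate for pre-trained parameters, a larger one for randomly initialized parameters, and a gradual increase of the learning rate at the start of training) can be sketched as follows. This is a minimal illustration only: the linear ramp shape and the values `base_lr`, `pretrained_scale`, `warmup_steps` and `start_factor` are assumptions, not values given in the text.

```python
def warmup_factor(step, warmup_steps, start_factor=0.1):
    """Linearly ramp a multiplier from start_factor to 1.0 over warmup_steps.

    The linear shape and start_factor are illustrative assumptions.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return 1.0
    alpha = step / warmup_steps
    return start_factor + (1.0 - start_factor) * alpha


def learning_rates(step, base_lr=0.01, pretrained_scale=0.1, warmup_steps=500):
    """Return the learning rate of each parameter group at a training step.

    "pretrained" stands for the layers initialized from pre-trained weights,
    "random_init" for the randomly initialized layers (hypothetical names).
    """
    factor = warmup_factor(step, warmup_steps)
    return {
        "pretrained": base_lr * pretrained_scale * factor,  # smaller rate
        "random_init": base_lr * factor,                    # larger rate
    }


if __name__ == "__main__":
    for step in (0, 250, 500, 1000):
        print(step, learning_rates(step))
```

Both groups share the same warm-up multiplier, so the ratio between the two learning rates stays constant while the overall step size grows during the first iterations.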
Further, as a preferred embodiment of the method, the step of acquiring the training data and preprocessing the training data to obtain preprocessed data specifically includes:
collecting training data according to the problem domain and annotating the training data to obtain annotated training data;
the training data comprises public data sets from the Internet and real-shot images, and the information in the training data comprises the original pictures and annotation records of the positions and categories of the targets in the pictures.
Specifically, this step generates annotation boxes, each containing a category vector and a position vector.
Further, as a preferred embodiment of the method, the target detection network includes a long-neck residual backbone network and a local feature optimization module, the long-neck residual backbone network includes six feature extraction convolution modules, and the local feature optimization module includes a local fusion module and a scale supervision module.
Specifically, as shown in the upper half of fig. 1 ("long-neck residual backbone network"), the backbone network basically adopts a residual structure but differs from the conventional ResNet in two respects: (1) a local Inception module is added to obtain receptive fields with multiple aspect ratios; (2) the neck is longer, so that richer spatial detail features can be extracted.
In addition, as shown in the upper left of fig. 1, the long-neck backbone is based on a residual network and mainly includes 6 convolution levels responsible for feature extraction, one of which is the local Inception module. Unlike the ordinary residual network, the long-neck backbone network removes the max pooling layer after the conv1 level, which multiplies the resolution of the input feature map of the conv2_x level and of the subsequent backbone levels. Removing the pooling layer also slows the growth of the receptive field in the backbone, which facilitates the capture of fine-grained features.
Simply removing the pooling layer increases the feature resolution, which increases the amount of computation. The invention therefore also provides a simplified version of the long-neck residual backbone network (LN-ResNet-light). Compared with LN-ResNet, LN-ResNet-light keeps the max pooling layer after conv1 of the original ResNet structure and reduces the convolution stride of the first residual block of conv3_x to 1, thereby reducing the overall computation.
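The effect of the two variants on feature map resolution can be checked with simple stride arithmetic. The sketch below assumes the per-level strides of a standard ResNet (conv1 and the max pooling layer with stride 2, conv2_x with stride 1, conv3_x to conv5_x with stride 2) and a hypothetical 512 × 512 input; the local Inception level is left out because its position and stride are not specified in this passage.

```python
# Strides of a standard ResNet, listed level by level (assumed layout).
RESNET = [
    ("conv1", 2), ("maxpool", 2), ("conv2_x", 1),
    ("conv3_x", 2), ("conv4_x", 2), ("conv5_x", 2),
]


def make_variant(stages, drop=(), override=None):
    """Copy a stage list, dropping some stages and overriding some strides."""
    override = override or {}
    return [(name, override.get(name, stride))
            for name, stride in stages if name not in drop]


# LN-ResNet: the max pooling layer after conv1 is removed.
LN_RESNET = make_variant(RESNET, drop=("maxpool",))
# LN-ResNet-light: pooling is kept, the first block of conv3_x uses stride 1.
LN_RESNET_LIGHT = make_variant(RESNET, override={"conv3_x": 1})


def resolutions(stages, input_size):
    """Return the output side length of every stage for a square input."""
    size = input_size
    result = {}
    for name, stride in stages:
        size //= stride
        result[name] = size
    return result


if __name__ == "__main__":
    for label, stages in (("ResNet", RESNET), ("LN-ResNet", LN_RESNET),
                          ("LN-ResNet-light", LN_RESNET_LIGHT)):
        print(label, resolutions(stages, 512))
```

Under these assumptions, LN-ResNet doubles the side length of every level from conv2_x onward, while LN-ResNet-light keeps conv2_x at the original ResNet resolution and matches LN-ResNet only from conv3_x onward, which is where the saving in computation comes from.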
The long-neck backbone network (LN-ResNet) provided by the invention is mainly used for extracting fine-grained spatial information from an image. By increasing the depth of the neck (the convolutional layers in front of the detection layers), the network enhances the extraction of high-resolution features, alleviates the excessively fast loss of spatial detail information in common backbone networks, and enhances the feature expression of small-scale targets.
Further, as a preferred embodiment of the method, the feature extraction convolution modules include an Inception module, and the Inception module includes two branches.
Specifically, the local Inception module comprises two branches. In each branch, the input features first pass through a 1 × 1 convolutional layer that compresses the number of channels to reduce computation.
After that, the two branches contain a 1 × 3 convolution and a 3 × 1 convolution respectively. These two convolutional layers run in parallel, unlike the serial arrangement in the common Inception module, and are mainly used to obtain receptive field information with different aspect ratios, so that targets with different aspect ratios are expressed and modeled more effectively. In addition, these convolutional layers help to expand the receptive field and deepen the network, thereby enhancing semantic expression.
Finally, the output features of the two branches are concatenated and fused through a 3 × 3 convolutional layer. The fused output is added to the input of the whole module to form a residual structure, which ensures effective propagation of the gradient.
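As a rough check of the data flow through the local Inception module (the two-branch module described above), the following sketch traces tensor shapes only; it does not implement the convolutions. The padding values, the channel counts and the helper names are assumptions chosen so that the residual addition is shape-compatible.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_shape(shape, out_channels, kernel_hw, padding_hw):
    """Shape (C, H, W) after a stride-1 convolution with the given kernel."""
    _, h, w = shape
    (kh, kw), (ph, pw) = kernel_hw, padding_hw
    return (out_channels, conv_out(h, kh, 1, ph), conv_out(w, kw, 1, pw))


def local_inception_shape(shape, reduced_channels):
    """Trace shapes through the two branches, the fusion and the residual add."""
    in_channels = shape[0]
    # Branch A: 1x1 channel compression, then a 1x3 (wide) convolution.
    a = conv2d_shape(shape, reduced_channels, (1, 1), (0, 0))
    a = conv2d_shape(a, reduced_channels, (1, 3), (0, 1))
    # Branch B: 1x1 channel compression, then a 3x1 (tall) convolution.
    b = conv2d_shape(shape, reduced_channels, (1, 1), (0, 0))
    b = conv2d_shape(b, reduced_channels, (3, 1), (1, 0))
    assert a[1:] == b[1:], "branches must agree spatially before concatenation"
    concat = (a[0] + b[0], a[1], a[2])
    # 3x3 fusion convolution restores the input channel count (assumption).
    fused = conv2d_shape(concat, in_channels, (3, 3), (1, 1))
    assert fused == tuple(shape), "residual addition needs identical shapes"
    return fused


if __name__ == "__main__":
    print(local_inception_shape((256, 64, 64), reduced_channels=64))
```

The asymmetric padding, (0, 1) for the 1 × 3 kernel and (1, 0) for the 3 × 1 kernel, is what keeps both branches at the input resolution so that they can be concatenated and then added back to the module input.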
Further, as a preferred embodiment of the method, the local fusion module includes a detail re-guiding branch, a local context branch and an original input mapping branch. The detail re-guiding branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a max pooling layer, a 3 × 3 convolutional layer and a batch normalization layer; the local context branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a deconvolution layer, a 3 × 3 convolutional layer and a batch normalization layer; and the original input mapping branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a batch normalization layer.
Specifically, as shown in fig. 4, the three branches work as follows. Detail re-guiding branch: this branch is designed primarily to alleviate the loss of detail information caused by pooling. Its input is the shallowest feature map in the level immediately preceding the detection layer, which has twice the spatial resolution, so that spatial detail is preserved as much as possible. The input feature map first passes through a convolutional layer that compresses the channels, and a max pooling layer then reduces the resolution to obtain a feature map with the same resolution as the middle branch. Finally, a convolutional layer and a batch normalization (BN) layer perform further feature transformation. Local context branch: this branch assists the localization and recognition of the target by introducing local context information of the target. Its input comes from the level following the current detection layer, and its spatial resolution is half that of the detection layer feature map. The input feature map first passes through a 1 × 1 convolutional layer to reduce the number of channels, the deconvolution layer then up-samples the feature map to the same spatial resolution as the detection layer, and the result finally passes through a 3 × 3 convolutional layer and a batch normalization layer. Unlike the common hourglass structure, the input of this branch is a feature layer adjacent to the detection layer, so the semantics of the detection layer are enhanced while the locality of the context features is preserved. Original input mapping branch: before fusion, this branch passes the original feature map through a 1 × 1 convolutional layer for channel compression and a 3 × 3 convolutional layer for feature transformation, so as to limit the additional computation that the local fusion module may introduce and to fuse better with the features of the other two branches.
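The resolution bookkeeping of the three branches of the local fusion module can be verified with the standard output-size formulas for convolution, pooling and transposed convolution. In this sketch, the 2 × 2 stride-2 max pooling and the 2 × 2 stride-2 deconvolution are assumptions (the text gives only the order of the layers), and batch normalization is omitted because it does not change the spatial size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel


def detail_reguiding(size):
    """Input has twice the detection-layer resolution."""
    size = conv_out(size, 1)              # 1x1 conv: channel compression
    size = conv_out(size, 2, stride=2)    # max pooling (assumed 2x2, stride 2)
    return conv_out(size, 3, padding=1)   # 3x3 conv


def local_context(size):
    """Input has half the detection-layer resolution."""
    size = conv_out(size, 1)              # 1x1 conv: channel reduction
    size = deconv_out(size, 2, stride=2)  # deconvolution (assumed 2x2, stride 2)
    return conv_out(size, 3, padding=1)   # 3x3 conv


def original_mapping(size):
    """Input is the detection-layer feature map itself."""
    size = conv_out(size, 1)              # 1x1 conv: channel compression
    return conv_out(size, 3, padding=1)   # 3x3 conv


def branch_outputs(detection_size):
    """Output side lengths of the three branches for one detection layer."""
    return (
        detail_reguiding(detection_size * 2),
        original_mapping(detection_size),
        local_context(detection_size // 2),
    )


if __name__ == "__main__":
    print(branch_outputs(38))
```

With an even detection-layer size such as 38, all three branches end at the same resolution, which is the precondition for fusing them.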
Further, as a preferred embodiment of the method, the step of training the target detection network based on the preprocessed data and a preset training strategy to obtain the trained target detection network specifically includes:
dividing the data into a training set, a validation set and a test set according to a certain proportion;
during training of the target detection network, taking the training set as input and computing the network output through operations such as convolution to obtain a set of prediction boxes;
specifically, before training, a series of preprocessing rules for the input image is set. The mandatory preprocessing operations are image normalization, for stable training, and image resizing, to control computational complexity. During training, in addition to these mandatory operations, random preprocessing operations such as random cropping are introduced for data augmentation to improve the performance of the network.
According to the classification subtask and the localization subtask, each prediction box in the set of prediction boxes comprises a category vector and a position vector;
for the classification subtask, using the cross entropy between the category vector of the prediction box and the category vector of the annotation box as the loss function;
for the localization subtask, calculating the position loss between the prediction box and the annotation box through a Smooth L1 loss function;
calculating the gradients of the parameters of the convolutional layers layer by layer according to the calculated loss and the stochastic gradient descent method, and updating the parameters of each layer in the network;
during training, evaluating the generalization of the network at a fixed interval of iterations by taking the validation set as input, so as to detect and limit the effect of overfitting;
and after training is finished, evaluating the performance of the network by taking the test set as the input of the network, and saving parameters of the network such as convolution kernels and biases to obtain the trained target detection network.
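The two loss terms and the parameter update named in the steps above can be sketched for a single prediction box as follows. The sketch assumes a one-hot annotation category vector (so the cross entropy reduces to the negative log-probability of the annotated class), a Smooth L1 threshold `beta` of 1, and a plain weighted sum of the two terms; the matching of prediction boxes to annotation boxes is not shown.

```python
import math


def cross_entropy(logits, target_index):
    """Cross entropy between softmax(logits) and a one-hot target."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over the coordinates of a position vector."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def detection_loss(class_logits, target_class, pred_box, target_box,
                   loc_weight=1.0):
    """Classification loss plus weighted localization loss (assumed sum)."""
    return (cross_entropy(class_logits, target_class)
            + loc_weight * smooth_l1(pred_box, target_box))


def sgd_step(params, grads, lr):
    """One stochastic gradient descent update on a flat parameter list."""
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    loss = detection_loss([2.0, 0.5, -1.0], 0,
                          [0.1, 0.2, 0.9, 1.1], [0.0, 0.0, 1.0, 1.0])
    print(loss)
```

In practice the gradients passed to `sgd_step` would come from backpropagating this loss through the network layer by layer; here they are simply inputs.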
Specifically, in actual detection, the trained model can be restored simply by assigning each saved parameter value, by parameter name, to the parameter of the corresponding layer in the network; the restored model is then used to produce detection results in the subsequent detection process.
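Restoring a model by parameter name can be illustrated with a plain dictionary that maps names to values. This is only a sketch of the idea: the JSON file format, the parameter names such as `conv1.weight`, and the helper functions are hypothetical and stand in for whatever checkpoint mechanism the actual framework provides.

```python
import json
import os
import tempfile


def save_parameters(named_params, path):
    """Write a {parameter name: list of values} mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(named_params, f)


def restore_parameters(model_params, path):
    """Assign saved values to the model parameters with matching names."""
    with open(path, encoding="utf-8") as f:
        saved = json.load(f)
    missing = sorted(name for name in model_params if name not in saved)
    if missing:
        raise KeyError(f"parameters missing from checkpoint: {missing}")
    for name in model_params:
        model_params[name] = saved[name]
    return model_params


def roundtrip_demo():
    """Save trained values, then restore them into a freshly built model."""
    trained = {"conv1.weight": [0.5, -0.25], "conv1.bias": [0.1]}
    fresh = {"conv1.weight": [0.0, 0.0], "conv1.bias": [0.0]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "detector_params.json")
        save_parameters(trained, path)
        return restore_parameters(fresh, path)


if __name__ == "__main__":
    print(roundtrip_demo())
```

Matching by name rather than by position means the restore step fails loudly when the network definition and the saved parameters disagree.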
Further, as a preferred embodiment of the method, the step of acquiring the data to be detected, inputting the data to be detected into the trained target detection network, and outputting the detection result specifically includes:
acquiring the data to be detected to obtain an image of the target to be detected;
inputting the image of the target to be detected into the trained target detection network, and outputting, through convolutional layers, a sequence of 4-dimensional vectors representing the positions of the prediction boxes and a sequence of N-dimensional vectors representing the class predictions;
discarding, by the detector, a part of the low-quality results according to the sequence of N-dimensional class prediction vectors and a manually preset class confidence threshold to obtain the remaining detection results;
and de-duplicating the remaining detection results with a non-maximum suppression algorithm, based on the confidence of each prediction box and the overlap ratio between prediction boxes calculated from the 4-dimensional position vectors, to obtain and output the final detection result of the detector.
Specifically, the detector first discards a portion of the low-quality results from the sequence of N-dimensional class predictions using a manually preset class confidence threshold. The remaining detection boxes are then de-duplicated with the non-maximum suppression (NMS) algorithm, using the confidence of each prediction box and the overlap ratio between prediction boxes calculated from the 4-dimensional position vectors. The prediction boxes that remain are the detection result of the detector.
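The post-processing described above (confidence thresholding followed by non-maximum suppression) can be sketched as follows. The box format `(x1, y1, x2, y2)`, the use of intersection over union as the overlap ratio, the per-class suppression, and both threshold values are assumptions for illustration; any background class in the N-dimensional vector is ignored here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, class_scores, conf_threshold=0.5, iou_threshold=0.5):
    """Filter predictions by confidence, then apply per-class NMS.

    boxes: list of 4-dimensional position vectors.
    class_scores: list of N-dimensional class score vectors.
    Returns a list of (score, class index, box), highest score first.
    """
    candidates = []
    for box, scores in zip(boxes, class_scores):
        cls = max(range(len(scores)), key=scores.__getitem__)
        if scores[cls] >= conf_threshold:  # discard low-quality results
            candidates.append((scores[cls], cls, tuple(box)))
    candidates.sort(key=lambda c: c[0], reverse=True)

    kept = []
    for score, cls, box in candidates:
        # Keep a box unless a higher-scoring kept box of the same class
        # overlaps it by more than the IoU threshold.
        if all(k[1] != cls or iou(box, k[2]) <= iou_threshold for k in kept):
            kept.append((score, cls, box))
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
    scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.3]]
    for detection in postprocess(boxes, scores):
        print(detection)
```

In the usage example, the second box is suppressed because it overlaps the first box of the same class too strongly, and the fourth box is dropped earlier by the confidence threshold.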
As shown in fig. 3, a target detection system based on a detection backbone and local feature optimization includes the following modules:
the preprocessing module is used for acquiring training data and preprocessing the training data to obtain preprocessed data;
the network construction module is used for constructing a target detection network based on the long-neck backbone architecture and the local feature optimization module;
the training module is used for training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
and the detection module is used for acquiring data to be detected, inputting the data to be detected into the trained target detection network and outputting a detection result.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A target detection method based on main detection and local feature optimization is characterized by comprising the following steps:
acquiring training data and preprocessing the training data to obtain preprocessed data;
constructing a target detection network based on a long-neck backbone architecture and a local feature optimization module;
training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
and acquiring data to be detected, inputting the data to be detected to the trained target detection network, and outputting a detection result.
2. The method according to claim 1, wherein the step of obtaining training data and preprocessing the training data to obtain preprocessed data specifically comprises:
collecting training data according to the problem domain and marking the training data to obtain marked training data;
the training data comprises public data sets and solid shot images from the Internet, and information in the training data comprises original material pictures, and annotation records of target positions and categories in the pictures.
3. The method according to claim 2, wherein the target detection network comprises a long-neck residual error trunk network and a local feature optimization module, the long-neck residual error trunk network comprises six feature extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module.
4. The method of claim 3, wherein the feature extraction convolution module comprises an inclusion module, and the inclusion module comprises two branches.
5. The method of claim 4, wherein the local fusion module comprises a detail re-directing branch, a local context branch and an original input mapping branch, the detail re-directing branch sequentially passes the input feature map through a 1 x 1 convolutional layer, a maximum pooling layer, a 3 x 3 convolutional layer and a batch normalization layer, the local up-down branch sequentially passes the input feature map through a 1 x 1 convolutional layer, an anti-convolutional layer, a 3 x 3 convolutional layer and a batch normalization layer, and the original input mapping branch sequentially passes the input feature map through a 1 x 1 convolutional layer, a 3 x 3 convolutional layer and a batch normalization layer.
6. The method according to claim 5, wherein the step of training the target detection network based on the preprocessed data and the preset training strategy to obtain the trained target detection network specifically comprises:
dividing the data into a training set, a validation set and a test set according to a certain proportion;
during training of the target detection network, taking the training set as input and calculating the network output through convolution and other operations to obtain a set of prediction boxes;
according to the classification subtask and the localization subtask, each prediction box in the set comprises a category vector and a position vector;
for the classification subtask, using the cross entropy between the category vector of the prediction box and the category vector of the annotation box as the loss function;
for the localization subtask, calculating the position loss between the prediction box and the annotation box through a Smooth L1 loss function;
calculating the gradients of the convolutional layer parameters layer by layer from the calculated loss using stochastic gradient descent, and updating the parameters of each layer in the network;
during training, evaluating the generalization of the network at fixed iteration intervals by taking the validation set as input;
and after the training is finished, evaluating the performance of the network by taking the test set as the input of the network, and saving parameters such as the convolution kernels and biases in the network to obtain the trained target detection network.
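The two loss terms and the parameter update named in claim 6 can be sketched for a single prediction box as follows. This is a minimal illustration, not the patented training procedure: it assumes a one-hot annotation category vector (so the cross entropy reduces to the negative log-probability of the annotated class), a Smooth L1 threshold `beta` of 1.0, and an unweighted sum of the two terms. The claim specifies none of these, and all function names are hypothetical. Gradients are taken as given, since backpropagation is outside the scope of the sketch.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross entropy against a one-hot annotation category vector."""
    return -math.log(softmax(logits)[target_index])


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over the coordinates of a position vector."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def detection_loss(logits, target_index, pred_box, target_box, loc_weight=1.0):
    """Classification loss plus weighted localization loss (weighting is assumed)."""
    return (cross_entropy(logits, target_index)
            + loc_weight * smooth_l1(pred_box, target_box))


def sgd_step(params, grads, lr=0.01):
    """One stochastic gradient descent update; gradients are supplied by the caller."""
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    loss = detection_loss([2.0, 0.5, -1.0], 0,
                          [0.1, 0.2, 0.9, 1.1], [0.0, 0.0, 1.0, 1.0])
    print(loss)
```

In a full detector these terms would be averaged over all matched prediction boxes in a batch; the matching rule between prediction boxes and annotation boxes is not described in the claim.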
7. The method according to claim 3, wherein the step of obtaining the data to be detected, inputting the data to be detected to the trained target detection network, and outputting the detection result specifically comprises:
acquiring data to be detected to obtain an image of a target to be detected;
inputting the image of the target to be detected into the trained target detection network, and outputting, through a convolutional layer, a sequence of 4-dimensional vectors representing the positions of the prediction boxes and a sequence of N-dimensional vectors representing the class predictions;
the detector discards low-quality results by applying a manually preset class confidence threshold to the N-dimensional class prediction vectors, to obtain the remaining detection results;
and for the remaining detection results, calculating the overlap rate between prediction boxes using the confidence of each prediction box and its 4-dimensional position vector, and removing duplicate prediction boxes with a non-maximum suppression algorithm to obtain and output the final detection result of the detector.
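The post-processing in claim 7 (confidence thresholding followed by non-maximum suppression) can be sketched as below. The claim does not fix a box format or threshold values, so this sketch assumes boxes are `(x1, y1, x2, y2)` corner coordinates, measures overlap rate as intersection over union, applies suppression per class, and uses illustrative thresholds. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, class_scores, conf_threshold=0.5, iou_threshold=0.45):
    """Filter by class confidence, then remove duplicates with per-class NMS."""
    candidates = []
    for box, scores in zip(boxes, class_scores):
        label = max(range(len(scores)), key=scores.__getitem__)
        if scores[label] >= conf_threshold:  # discard low-confidence results
            candidates.append((scores[label], label, box))

    # Visit boxes from most to least confident.
    candidates.sort(key=lambda c: c[0], reverse=True)

    kept = []
    for score, label, box in candidates:
        # Keep a box only if it does not overlap a kept box of the same class too much.
        if all(k_label != label or iou(box, k_box) <= iou_threshold
               for _, k_label, k_box in kept):
            kept.append((score, label, box))
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]
    for score, label, box in postprocess(boxes, scores):
        print(label, score, box)
```

Sorting by confidence before suppression means that, within a group of overlapping boxes of the same class, the most confident box is the one retained.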
8. A target detection system based on detection backbone and local feature optimization, characterized by comprising the following modules:
a preprocessing module, used for acquiring training data and preprocessing the training data to obtain preprocessed data;
a network construction module, used for constructing a target detection network based on the long-neck backbone architecture and the local feature optimization module;
a training module, used for training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
and a detection module, used for acquiring data to be detected, inputting the data to be detected into the trained target detection network and outputting a detection result.
CN202011388976.2A 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization Active CN112396126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011388976.2A CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011388976.2A CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization

Publications (2)

Publication Number Publication Date
CN112396126A CN112396126A (en) 2021-02-23
CN112396126B CN112396126B (en) 2023-09-22

Family

ID=74604938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011388976.2A Active CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization

Country Status (1)

Country Link
CN (1) CN112396126B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554125A (en) * 2021-09-18 2021-10-26 四川翼飞视科技有限公司 Object detection apparatus, method and storage medium combining global and local features

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 Driving scene target detection method based on deep learning and multi-layer feature fusion
CN109784386A (en) * 2018-12-29 2019-05-21 天津大学 Method for assisting object detection with semantic segmentation
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 Semi-supervised video object segmentation method based on modulation network and feature attention pyramid
CN110188720A (en) * 2019-06-05 2019-08-30 上海云绅智能科技有限公司 Target detection method and system based on convolutional neural network
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 Small target detection and recognition method with enhanced feature learning
CN111144329A (en) * 2019-12-29 2020-05-12 北京工业大学 Light-weight rapid crowd counting method based on multiple labels

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554125A (en) * 2021-09-18 2021-10-26 四川翼飞视科技有限公司 Object detection apparatus, method and storage medium combining global and local features
CN113554125B (en) * 2021-09-18 2021-12-17 四川翼飞视科技有限公司 Object detection apparatus, method and storage medium combining global and local features

Also Published As

Publication number Publication date
CN112396126B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
WO2021227366A1 (en) Method for automatically and accurately detecting plurality of small targets
CN112597941B (en) Face recognition method and device and electronic equipment
CN111476302B (en) fast-RCNN target object detection method based on deep reinforcement learning
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN111696110B (en) Scene segmentation method and system
CN112541508A (en) Fruit segmentation and recognition method and system and fruit picking robot
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN114943876A (en) Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN112016512A (en) Remote sensing image small target detection method based on feedback type multi-scale training
CN114170570A (en) Pedestrian detection method and system suitable for crowded scene
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN115222750A (en) Remote sensing image segmentation method and system based on multi-scale fusion attention
CN116740362A (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN114333062A (en) Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN112396126A (en) Target detection method and system based on detection of main stem and local feature optimization
CN116110005A (en) Crowd behavior attribute counting method, system and product
CN116246147A (en) Cross-species target detection method based on cross-layer feature fusion and linear attention optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant