CN112396126B - Target detection method and system based on detection trunk and local feature optimization - Google Patents

Info

Publication number: CN112396126B
Application number: CN202011388976.2A
Other versions: CN112396126A
Original language: Chinese (zh)
Inventors: 郑慧诚, 严志伟, 黄梓轩, 李烨, 陈绿然
Assignee: Sun Yat Sen University (original and current)
Legal status: Active (granted)

Classifications

    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/25: Fusion techniques
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06V2201/07: Target detection

Abstract

The application discloses a target detection method and system based on detection trunk and local feature optimization. The method comprises the following steps: acquiring training data and preprocessing it to obtain preprocessed data; constructing a target detection network based on a long-neck trunk architecture and a local feature optimization module; training the target detection network with the preprocessed data under a preset training strategy to obtain a trained target detection network; and acquiring data to be detected, inputting it into the trained target detection network, and outputting a detection result. The system comprises a preprocessing module, a network construction module, a training module and a detection module. The application ensures satisfactory detector performance at a modest computational cost. The target detection method and system based on detection trunk and local feature optimization can be widely applied in the field of target detection networks.

Description

Target detection method and system based on detection trunk and local feature optimization
Technical Field
The application belongs to the field of target detection networks, and particularly relates to a target detection method and system based on detection trunk and local feature optimization.
Background
Object detection, as a fundamental task of computer vision, is widely applied and is a hot research area in both academia and industry. With the rise of deep learning, the field of target detection has developed greatly. However, current detectors perform poorly on small-scale targets, mainly because information is lost too quickly in the backbone network and the detection head models local information insufficiently.
The backbone network, as the basic feature-extraction structure, plays a decisive role in the target detection effect. Because training samples for target detection are generally scarce, current detectors mostly adopt a backbone pre-trained on a large image-classification dataset. The task difference causes a domain-shift problem during network fine-tuning, and adopting a pre-trained network also limits the structural design space of the backbone to a certain extent. Moreover, the commonly adopted backbones perform pooling too early, losing spatial detail information, which is unfavorable for the feature expression of small targets.
On the other hand, the detection head of current mainstream detectors generally takes a feature pyramid as input; the semantic information of the shallow features in the pyramid is insufficient, and the spatial information of the deep features is severely lost. How to enhance the feature expression of the detection layers and the detection of small-scale targets is therefore a problem that needs to be solved.
Disclosure of Invention
In order to solve the above technical problems, the application aims to provide a target detection method and system based on detection trunk and local feature optimization, which ensure that the detector achieves satisfactory performance at a modest computational cost.
The first technical scheme adopted by the application is as follows: a target detection method based on detection trunk and local feature optimization comprises the following steps:
acquiring training data and preprocessing the training data to obtain preprocessed data;
constructing a target detection network based on a long-neck trunk architecture and a local feature optimization module;
training the target detection network based on the preprocessing data and a preset training strategy to obtain a trained target detection network;
and acquiring data to be detected, inputting the data to the trained target detection network, and outputting a detection result.
Further, the step of obtaining training data and preprocessing the training data to obtain preprocessed data specifically includes:
collecting training data according to the problem domain and marking to obtain marked training data;
the training data comprises a public data set from the Internet and an in-situ photographed image, and the information in the training data comprises original material pictures and labeling records of target positions and categories in the pictures.
Further, the target detection network comprises a long-neck residual backbone network and a local feature optimization module; the long-neck residual backbone network comprises six feature-extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module.
Further, the feature-extraction convolution module comprises an Inception module, and the Inception module comprises two branches.
Further, the local fusion module comprises a detail re-direction branch, a local context branch and an original input mapping branch. The detail re-direction branch passes the input feature map sequentially through a 1×1 convolution layer, a max-pooling layer, a 3×3 convolution layer and a batch-normalization layer; the local context branch passes the input feature map sequentially through a 1×1 convolution layer, a deconvolution layer, a 3×3 convolution layer and a batch-normalization layer; and the original input mapping branch passes the input feature map sequentially through a 1×1 convolution layer, a 3×3 convolution layer and a batch-normalization layer.
Further, the step of training the target detection network based on the preprocessing data and a preset training strategy to obtain a trained target detection network specifically includes:
dividing the data into a training set, a verification set and a test set according to a certain proportion;
the training set is used as input in the target detection network training process, and the network output is calculated through convolution and other operations to obtain a prediction frame set;
according to the classification subtask and the positioning subtask, each prediction frame in the prediction frame set comprises a category vector and a position vector;
for the classification subtasks, using cross entropy between the prediction frame class vector and the annotation frame class vector as a loss function;
for a positioning subtask, calculating the position loss of the prediction frame and the annotation frame through a Smooth L1 loss function;
calculating the gradient of the parameters in the convolution layer by layer according to the calculated loss and a random gradient descent method, and updating the parameters of each layer in the network;
in the training process, evaluating the generalization ability of the network on the validation set every fixed number of iterations;
after training, the performance of the network is evaluated with the test set as input, and parameters such as the convolution kernels and biases in the network are saved, yielding the trained target detection network.
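The two loss functions named in the training steps above can be sketched as minimal scalar functions (a hypothetical illustration in plain Python, not the patent's actual network code; the example values are arbitrary):

```python
import math

def cross_entropy(pred_probs, true_class):
    # Classification subtask: cross entropy between the prediction-frame
    # class vector and the annotation-frame class (negative log-likelihood
    # of the annotated class).
    return -math.log(pred_probs[true_class])

def smooth_l1(pred_box, gt_box):
    # Positioning subtask: Smooth L1 loss summed over the 4 coordinates
    # of the position vector; quadratic for small errors, linear otherwise.
    total = 0.0
    for p, g in zip(pred_box, gt_box):
        d = abs(p - g)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

# Illustrative values: a fairly confident correct class prediction
# and a prediction frame close to the annotation frame.
cls_loss = cross_entropy([0.1, 0.8, 0.1], true_class=1)
loc_loss = smooth_l1([10.0, 10.0, 50.0, 50.0], [10.5, 10.0, 50.0, 52.0])
```

In the actual network these losses are computed over the whole prediction-frame set and combined into the objective that drives the stochastic gradient descent step above.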
Further, the step of acquiring the data to be detected and inputting the data to the trained target detection network and outputting the detection result specifically includes:
acquiring the data to be detected to obtain an image of the target to be detected;
inputting an image of a target to be detected into a trained target detection network, and outputting a 4-dimensional vector sequence representing the position of a predicted frame and an N-dimensional vector sequence expressing category prediction through a convolution layer;
according to the N-dimensional vector sequence of category predictions, the detector discards a part of the low-quality results through a manually preset category confidence threshold, obtaining the remaining detection results;
and de-duplicating the prediction frames with a non-maximum suppression algorithm, using the prediction-frame confidences and the overlap rates between prediction frames calculated from the 4-dimensional position vectors, to obtain and output the final detection result of the detector.
The beneficial effects of the method and system are as follows: the local feature optimization module, designed for spatial local information fusion, not only enhances the semantic information of the detection layers but also preserves the spatial local information of the detection-head features, which is particularly beneficial for small-target detection. A suitable learning strategy is further provided to overcome the performance degradation caused by random initialization of the backbone parameters, ensuring that the detector achieves satisfactory performance at a modest computational cost.
Drawings
FIG. 1 is a network architecture of a target detection network based on detection backbone and local feature optimization of the present application;
FIG. 2 is a flow chart of steps of a target detection method based on detection of backbone and local feature optimization in accordance with the present application;
FIG. 3 is a block diagram of a target detection system based on detection backbone and local feature optimization in accordance with the present application;
FIG. 4 is a branching structure in a local fusion module in accordance with an embodiment of the present application.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
As shown in fig. 1 and 2, the present application provides a target detection method based on detection of a trunk and local feature optimization, the method comprising the steps of:
s1, acquiring training data and preprocessing the training data to obtain preprocessed data;
s2, constructing a target detection network based on a long-neck trunk architecture and a local feature optimization module;
s3, training the target detection network based on the preprocessing data and a preset training strategy to obtain a trained target detection network;
specifically, in order to overcome performance degradation caused by no pre-training, the training strategy is optimized to ensure that similar or even better performance is obtained under the same training resources, and the specific improvement is as follows: (1) differential learning rate: the part of the network in front of the local acceptance module is consistent with the existing ResNet structure, and meanwhile, the low-level visual features have stronger generalization capability, so that the pre-training initialization parameters can be adopted. For the pre-trained network part, adopting a smaller learning rate to maintain pre-training knowledge; for randomly initialized parameters, a large learning rate is employed to facilitate searching of the network in the parameter space. By adopting the differential learning strategy, the detection network not only can have generalization performance brought by pre-training, but also can ensure faster learning convergence speed. (2) enhancing initial training stability: the network adopts the feature pyramid structure to detect the target, which is beneficial to enhancing the robustness to the target scale, but the high-resolution feature map in the detection layer easily generates overlarge gradient in the initial stage of training, and influences the convergence of the learning process. The application adopts the preheating technology, ensures the gradual optimization of the network by gradually increasing the learning rate in the initial stage of training, and prevents the deviation from the optimization target too far in the initial stage, thereby ensuring the learning process to be more stable. 
With warm-up, the statistics the network obtains at the initial stage of training are more accurate, which alleviates the dependence of existing randomly initialized target detection networks on large-batch learning, so satisfactory performance can be obtained with smaller computational resource requirements.
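The warm-up and differential learning rate described above might be combined into a single schedule function like the following sketch (all names, step counts and ratios are illustrative assumptions, not values from the patent):

```python
def learning_rate(step, group, base_lr=0.01, warmup_steps=500,
                  pretrained_scale=0.1):
    # Warm-up: ramp the learning rate linearly from ~0 to base_lr over
    # the first warmup_steps iterations, so the high-resolution detection
    # layers do not produce destabilizing gradients early in training.
    warm = min(1.0, (step + 1) / warmup_steps)
    # Differential learning rate: pre-trained layers (before the local
    # Inception module) learn slowly to retain pre-trained knowledge;
    # randomly initialized layers learn at the full rate.
    scale = pretrained_scale if group == "pretrained" else 1.0
    return base_lr * warm * scale

lr_random_end = learning_rate(499, "random")        # full rate after warm-up
lr_pretrained_start = learning_rate(0, "pretrained")  # tiny initial rate
```

Each parameter group would be assigned its rate at every optimizer step; the exact ramp shape and scale factor are design choices left open by the text.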
S4, acquiring data to be detected, inputting the data to be detected to a trained target detection network, and outputting a detection result.
Further as a preferred embodiment of the method, the step of obtaining training data and preprocessing the training data to obtain preprocessed data specifically includes:
collecting training data according to the problem domain and marking to obtain marked training data;
the training data comprises a public data set from the Internet and an in-situ photographed image, and the information in the training data comprises original material pictures and labeling records of target positions and categories in the pictures.
Specifically, a label box is generated here, containing a label box category vector and a location vector.
Further as a preferred embodiment of the method, the target detection network includes a long-neck residual backbone network and a local feature optimization module, the long-neck residual backbone network includes six feature extraction convolution modules, and the local feature optimization module includes a local fusion module and a scale supervision module.
Specifically, as shown in the upper half of fig. 1, the backbone is a long-neck residual backbone network. It basically adopts a residual structure but differs from a general ResNet in two points: (1) a local Inception module is added to obtain receptive fields with multiple aspect ratios; (2) the neck part is longer, which facilitates the extraction of richer spatial detail features;
in addition, as shown in the upper left of fig. 1, the long-neck trunk architecture is based on a residual network and mainly comprises six convolution levels responsible for feature extraction, one of which is the local Inception module. Unlike a common residual network, the long-neck backbone removes the max-pooling layer after the conv1 level, doubling the resolution of the input feature maps of conv2_x and all subsequent levels. Removing the pooling layer also slows the growth of the receptive field in the backbone, which facilitates the capture of fine-grained features.
Simply removing the pooling layer increases feature resolution and therefore increases computation to a certain extent. The application also provides a simplified version of the long-neck residual backbone (LN-ResNet-light). Compared with LN-ResNet, LN-ResNet-light retains the max-pooling layer after conv1 from the original ResNet structure, while reducing the convolution stride of the first residual block of conv3_x to 1, thereby reducing the overall computational cost.
The long-neck backbone network (LN-ResNet) provided by the application is mainly used for extracting fine-grained spatial information from an image. By extending the depth of the neck (the convolution layers before each detection layer), the network strengthens the extraction of high-resolution features, alleviates the overly fast loss of spatial detail information in common backbone networks, and strengthens the feature expression of small-scale targets.
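The resolution effect of removing the max-pooling layer can be checked with simple stride arithmetic (a hypothetical sketch; the 512-pixel input size is an arbitrary example, not from the patent):

```python
def feature_size(size, strides):
    # Spatial size after a sequence of downsampling stages,
    # each dividing the resolution by its stride.
    for s in strides:
        size //= s
    return size

# Standard ResNet stem: conv1 (stride 2) followed by a max-pooling
# layer (stride 2), so conv2_x sees a 4x-downsampled feature map.
resnet_conv2_in = feature_size(512, [2, 2])
# LN-ResNet removes the max-pool after conv1, so conv2_x and all
# subsequent levels see feature maps of twice the resolution.
ln_resnet_conv2_in = feature_size(512, [2])
```

This doubled resolution is exactly what preserves the fine-grained spatial detail for small targets, at the computational cost that LN-ResNet-light then trades back.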
Further, as a preferred embodiment of the method, the feature-extraction convolution module includes an Inception module, and the Inception module includes two branches.
Specifically, the local Inception module contains two branches. In both branches, the input features first pass through a 1×1 convolution layer, which compresses the number of channels to reduce computation.
After that, the two branches contain a 1×3 convolution and a 3×1 convolution respectively. These two parallel convolution layers, unlike the serial arrangement in a common Inception module, are mainly used to obtain receptive fields with different aspect ratios, and thus model targets of different aspect ratios more effectively. In addition, these convolution layers help enlarge the receptive field and deepen the network, thereby enhancing semantic expression.
Finally, the output features of the two branches are concatenated and then fused through a 3×3 convolution layer. The fused output is added to the input of the whole module to form a residual structure, ensuring effective propagation of gradients.
Further, as a preferred embodiment of the method, the local fusion module includes a detail re-direction branch, a local context branch and an original input mapping branch. The detail re-direction branch passes the input feature map sequentially through a 1×1 convolution layer, a max-pooling layer, a 3×3 convolution layer and a batch-normalization layer; the local context branch passes the input feature map sequentially through a 1×1 convolution layer, a deconvolution layer, a 3×3 convolution layer and a batch-normalization layer; and the original input mapping branch passes the input feature map sequentially through a 1×1 convolution layer, a 3×3 convolution layer and a batch-normalization layer.
Specifically, as shown in fig. 4, the detail re-direction branch is designed mainly to alleviate the loss of detail information caused by pooling. It takes as input the shallowest feature map of the level immediately preceding the detection layer, which has twice the spatial resolution, so as to preserve spatial detail as much as possible. The input feature map first passes through a 1×1 convolution layer to compress the channels, and a max-pooling layer then reduces the resolution to match that of the detection layer. Finally, a convolution layer and a batch-normalization (BN) layer further transform the features. The local context branch assists the localization and identification of the target by introducing its local context information. Its input comes from the level following the current detection layer, with half the spatial resolution of the detection-layer feature map. The number of channels of the input feature map is first reduced by a 1×1 convolution layer; a deconvolution layer then upsamples the feature map to the same spatial resolution as the detection layer; finally, the feature map passes through a 3×3 convolution layer and a batch-normalization layer. Unlike a common hourglass structure, the input of this branch is a feature level adjacent to the detection layer, which enhances the semantics of the detection layer while keeping the context features local. The original input mapping branch feeds the original feature map through a 1×1 convolution layer and a 3×3 convolution layer for channel compression and feature transformation before fusion, so as to control the additional computation the local fusion module may introduce and to fuse better with the features of the other two branches.
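The three branches must produce feature maps of equal resolution before fusion; the alignment follows from simple stride arithmetic (a hypothetical sketch with an illustrative detection-layer size of 64):

```python
def maxpool2x(size):
    # A stride-2 max-pooling layer halves spatial resolution.
    return size // 2

def deconv2x(size):
    # A stride-2 deconvolution layer doubles spatial resolution.
    return size * 2

det = 64                  # detection-layer resolution (illustrative)
detail_in = 2 * det       # detail re-direction input: shallower level, 2x resolution
context_in = det // 2     # local context input: deeper level, half resolution

# After each branch's resolution-changing layer, all three branches
# (the original input mapping branch needs no change) match the
# detection layer, so their outputs can be fused.
aligned = (maxpool2x(detail_in), deconv2x(context_in), det)
```

The max-pooling and deconvolution layers thus serve opposite roles: one pulls detail down from the shallower level, the other lifts context up from the deeper level.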
Further as a preferred embodiment of the method, the step of training the target detection network based on the preprocessing data and a preset training policy to obtain a trained target detection network specifically includes:
dividing the data into a training set, a verification set and a test set according to a certain proportion;
the training set is used as input in the target detection network training process, and the network output is calculated through convolution and other operations to obtain a prediction frame set;
specifically, a series of preprocessing rules for the input image are set prior to training, wherein the preprocessing operations that must be involved include stabilizing the trained image normalization and controlling the changing image size of the computational complexity. During training, a series of random preprocessing operations such as random clipping are introduced on the basis of necessary operations to achieve the purpose of data augmentation and enhance the performance of the network.
According to the classification subtask and the positioning subtask, each prediction frame in the prediction frame set comprises a category vector and a position vector;
for the classification subtasks, using cross entropy between the prediction frame class vector and the annotation frame class vector as a loss function;
for a positioning subtask, calculating the position loss of the prediction frame and the annotation frame through a Smooth L1 loss function;
calculating the gradient of the parameters in the convolution layer by layer according to the calculated loss and a random gradient descent method, and updating the parameters of each layer in the network;
in the training process, the generalization ability of the network is evaluated on the validation set every fixed number of iterations, to guard against overfitting;
after training, the performance of the network is evaluated with the test set as input, and parameters such as the convolution kernels and biases in the network are saved, yielding the trained target detection network.
Specifically, in actual detection, the trained model can be restored simply by assigning the saved values to the corresponding layer parameters by name; these parameters then serve as the basis for outputting detection results in the subsequent detection process.
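The essential and random preprocessing operations described in the training steps above (normalization, resizing, random cropping) can be sketched as follows (a hypothetical minimal illustration; the mean/std values and crop sizes are arbitrary examples, not from the patent):

```python
import random

def normalize(pixels, mean, std):
    # Image normalization: standardize pixel values to stabilize
    # training (flattened single-channel list for illustration).
    return [(p - mean) / std for p in pixels]

def random_crop(width, height, crop_w, crop_h, rng=random):
    # Data augmentation: pick a crop window uniformly at random
    # inside the image bounds.
    x = rng.randint(0, width - crop_w)
    y = rng.randint(0, height - crop_h)
    return x, y, crop_w, crop_h

norm = normalize([0, 128, 255], mean=127.5, std=127.5)
crop = random_crop(100, 100, 50, 50)
```

A full pipeline would also resize the cropped image to a fixed input size and remap annotation-frame coordinates into the crop, steps omitted here for brevity.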
Further, as a preferred embodiment of the method, the step of acquiring the data to be tested, inputting the data to the trained target detection network, and outputting the detection result specifically includes:
obtaining an image of a target to be detected by taking data to be detected;
inputting an image of a target to be detected into a trained target detection network, and outputting a 4-dimensional vector sequence representing the position of a predicted frame and an N-dimensional vector sequence expressing category prediction through a convolution layer;
the detector discards a part of the low-quality results through a manually preset category confidence threshold according to the N-dimensional vector sequence of category predictions, obtaining the remaining detection results;
and de-duplicating the prediction frames with a non-maximum suppression algorithm, using the prediction-frame confidences and the overlap rates between prediction frames calculated from the 4-dimensional position vectors, to obtain and output the final detection result of the detector.
Specifically, the detector first discards a portion of the low-quality results from the N-dimensional sequence of category-prediction vectors using a manually preset category confidence threshold. The remaining detection results are then de-duplicated with a non-maximum suppression (NMS) algorithm, using the prediction-frame confidences and the overlap rate between prediction frames computed from the 4-dimensional position vectors. The prediction frames that remain constitute the final detection result of the detector.
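The confidence filtering and NMS de-duplication just described can be sketched in plain Python (a hypothetical minimal implementation; the 0.5 thresholds are illustrative, not values from the patent):

```python
def iou(a, b):
    # Overlap rate (intersection over union) of two prediction frames
    # given as (x1, y1, x2, y2) position vectors.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, score_thr=0.5, iou_thr=0.5):
    # 1) Discard low-quality results below the confidence threshold.
    # 2) Greedily keep the highest-scoring remaining frame, suppressing
    #    any other frame whose overlap with a kept frame exceeds iou_thr.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping frames of the same object collapse to the single higher-confidence frame, while a distant frame survives untouched.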
As shown in fig. 3, a target detection system based on detection trunk and local feature optimization includes the following modules:
the preprocessing module is used for acquiring training data and preprocessing the training data to obtain preprocessed data;
the network construction module is used for constructing a target detection network based on the long-neck trunk architecture and the local feature optimization module;
the training module is used for training the target detection network based on the preprocessing data and a preset training strategy to obtain a trained target detection network;
the detection module is used for acquiring the data to be detected, inputting the data to the trained target detection network and outputting a detection result.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (5)

1. The target detection method based on detection trunk and local feature optimization is characterized by comprising the following steps of:
acquiring training data and preprocessing the training data to obtain preprocessed data;
constructing a target detection network based on a long-neck trunk architecture and a local feature optimization module;
training the target detection network based on the preprocessing data and a preset training strategy to obtain a trained target detection network;
obtaining data to be detected, inputting the data to a trained target detection network, and outputting a detection result;
the target detection network comprises a long-neck residual backbone network and a local feature optimization module, wherein the long-neck residual backbone network comprises six feature extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module;
the local fusion module comprises a detail re-direction branch, a local context branch and an original input mapping branch, wherein the detail re-direction branch sequentially passes an input feature map through a 1×1 convolution layer, a max-pooling layer, a 3×3 convolution layer and a batch normalization layer, the local context branch sequentially passes the input feature map through a 1×1 convolution layer, a deconvolution layer, a 3×3 convolution layer and a batch normalization layer, and the original input mapping branch sequentially passes the input feature map through a 1×1 convolution layer, a 3×3 convolution layer and a batch normalization layer;
the step of acquiring the data to be detected and inputting the data to the trained target detection network and outputting the detection result specifically comprises the following steps:
acquiring data to be detected to obtain an image of a target to be detected;
inputting an image of a target to be detected into a trained target detection network, and outputting a 4-dimensional vector sequence representing the position of a predicted frame and an N-dimensional vector sequence expressing category prediction through a convolution layer;
the detector discards a part of low-quality results through a manually preset category confidence threshold value according to the N-dimensional vector sequence of category prediction to obtain residual detection results;
and calculating the overlapping rate between the prediction frames through the confidence coefficient of the prediction frames and the position 4-dimensional vector, de-duplicating the prediction frames based on a non-maximum suppression algorithm, and obtaining and outputting the final detection result of the detector.
2. The target detection method based on a detection backbone and local feature optimization according to claim 1, wherein the step of acquiring training data and preprocessing it to obtain preprocessed data specifically comprises:
collecting training data according to the problem domain and annotating it to obtain annotated training data;
the training data comprises public datasets from the Internet and images photographed on site, and the information in the training data comprises the original material pictures and annotation records of the target positions and categories in the pictures.
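An annotation record of the kind described in claim 2 (an original picture plus the positions and categories of its targets) might be represented as below; the field names and the (x1, y1, x2, y2) box convention are illustrative assumptions, not part of the claim.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One labeled target: a category name and its box (x1, y1, x2, y2)."""
    category: str
    box: tuple

@dataclass
class TrainingSample:
    """One annotated training image, per the record described in claim 2."""
    image_path: str
    annotations: list = field(default_factory=list)

# Hypothetical example record for one on-site photograph.
sample = TrainingSample("images/0001.jpg",
                        [Annotation("person", (12, 30, 84, 190))])
```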
3. The target detection method based on a detection backbone and local feature optimization according to claim 2, wherein the feature extraction convolution module comprises an Inception module, and the Inception module comprises two branches.
4. The target detection method based on a detection backbone and local feature optimization according to claim 3, wherein the step of training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network specifically comprises the following steps:
dividing the data into a training set, a validation set and a test set in a predetermined ratio;
using the training set as input during target detection network training, and computing the network output through convolution and other operations to obtain a set of prediction boxes;
according to the classification subtask and the localization subtask, each prediction box in the set comprises a category vector and a position vector;
for the classification subtask, using the cross entropy between the prediction box category vector and the annotation box category vector as the loss function;
for the localization subtask, computing the position loss between the prediction box and the annotation box with the Smooth L1 loss function;
computing the gradients of the convolution parameters layer by layer from the calculated loss via stochastic gradient descent, and updating the parameters of each layer in the network;
during training, evaluating the generalization ability of the network on the validation set every fixed number of iterations;
after training, evaluating the performance of the network with the test set as input, and saving the network parameters such as convolution kernels and biases, thereby obtaining the trained target detection network.
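A minimal sketch of the two per-box loss terms named in claim 4: cross entropy for the classification subtask and Smooth L1 for the localization subtask. The anchor-matching scheme and the weighting between the two terms are not given in the claims, so only the loss formulas themselves are illustrated.

```python
import math

def cross_entropy(pred_probs, true_class):
    """Classification loss: negative log-likelihood of the annotated class."""
    return -math.log(max(pred_probs[true_class], 1e-12))

def smooth_l1(pred_box, true_box):
    """Localization loss: Smooth L1 summed over the 4 position coordinates.

    Quadratic for small errors (|d| < 1), linear for large ones, which makes
    it less sensitive to outlier boxes than a plain L2 loss.
    """
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total
```

In a full training loop these two values would be summed per matched box and backpropagated through the network by stochastic gradient descent, as the claim's subsequent step describes.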
5. A target detection system based on a detection backbone and local feature optimization, characterized by comprising the following modules:
the preprocessing module is used for acquiring training data and preprocessing the training data to obtain preprocessed data;
the network construction module is used for constructing a target detection network based on a long-neck backbone architecture and a local feature optimization module;
the training module is used for training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
the detection module is used for acquiring data to be detected, inputting it into the trained target detection network and outputting the detection result;
the target detection network comprises a long-neck residual backbone network and a local feature optimization module, wherein the long-neck residual backbone network comprises six feature extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module;
the local fusion module comprises a detail re-leading branch, a local context branch and an original input mapping branch, wherein the detail re-leading branch sequentially passes the input feature map through a 1×1 convolution layer, a max-pooling layer, a 3×3 convolution layer and a batch normalization layer; the local context branch sequentially passes the input feature map through a 1×1 convolution layer, a deconvolution layer, a 3×3 convolution layer and a batch normalization layer; and the original input mapping branch sequentially passes the input feature map through a 1×1 convolution layer, a 3×3 convolution layer and a batch normalization layer;
the acquiring of the data to be detected, inputting it into the trained target detection network and outputting the detection result specifically comprises the following steps:
acquiring the data to be detected to obtain an image of the target to be detected; inputting the image of the target to be detected into the trained target detection network, which outputs, through a convolution layer, a sequence of 4-dimensional vectors representing predicted box positions and a sequence of N-dimensional vectors representing category predictions; according to the N-dimensional category prediction vectors, the detector discards low-quality results whose category confidence falls below a manually preset threshold, retaining the remaining detection results; and computing the overlap between prediction boxes from their 4-dimensional position vectors, removing duplicate boxes by a non-maximum suppression algorithm according to their confidence scores, and obtaining and outputting the final detection result of the detector.
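The three parallel branches of the local fusion module described in claim 5 can be sketched in PyTorch roughly as follows. The channel counts, the pooling/deconvolution strides, and the way the branch outputs are merged (resizing to the input resolution and summing is assumed here) are not specified in the claims, so they are illustrative choices only.

```python
import torch
import torch.nn as nn

class LocalFusionModule(nn.Module):
    """Sketch of the three-branch local fusion module (illustrative sizes)."""

    def __init__(self, in_ch: int, mid_ch: int = 64):
        super().__init__()
        # Detail re-leading branch: 1x1 conv -> max pool -> 3x3 conv -> BN.
        self.detail = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_ch),
        )
        # Local context branch: 1x1 conv -> deconvolution -> 3x3 conv -> BN.
        self.context = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
            nn.ConvTranspose2d(mid_ch, mid_ch, kernel_size=2, stride=2),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_ch),
        )
        # Original input mapping branch: 1x1 conv -> 3x3 conv -> BN.
        self.identity = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_ch),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        # Resize the down-/up-sampled branch outputs back to the input size
        # so the three outputs can be fused (here: element-wise sum).
        d = nn.functional.interpolate(self.detail(x), size=(h, w))
        c = nn.functional.interpolate(self.context(x), size=(h, w))
        return d + c + self.identity(x)
```

The intent of the structure is that the pooled branch re-emphasizes fine detail at a coarser scale while the deconvolution branch gathers local context at a finer scale, with the 1×1/3×3 branch preserving the original mapping.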
CN202011388976.2A 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization Active CN112396126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011388976.2A CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization


Publications (2)

Publication Number Publication Date
CN112396126A CN112396126A (en) 2021-02-23
CN112396126B true CN112396126B (en) 2023-09-22

Family

ID=74604938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011388976.2A Active CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization

Country Status (1)

Country Link
CN (1) CN112396126B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554125B (en) * 2021-09-18 2021-12-17 四川翼飞视科技有限公司 Object detection apparatus, method and storage medium combining global and local features

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 A kind of Driving Scene object detection method merged based on deep learning and multilayer feature
CN109784386A (en) * 2018-12-29 2019-05-21 天津大学 A method of it is detected with semantic segmentation helpers
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature
CN110188720A (en) * 2019-06-05 2019-08-30 上海云绅智能科技有限公司 A kind of object detection method and system based on convolutional neural networks
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A kind of small target deteection of Enhanced feature study and recognition methods
CN111144329A (en) * 2019-12-29 2020-05-12 北京工业大学 Light-weight rapid crowd counting method based on multiple labels



Similar Documents

Publication Publication Date Title
CN111126472B (en) SSD (solid State disk) -based improved target detection method
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
WO2021227366A1 (en) Method for automatically and accurately detecting plurality of small targets
CN114022432B (en) Insulator defect detection method based on improved yolov5
US11538286B2 (en) Method and apparatus for vehicle damage assessment, electronic device, and computer storage medium
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN109472193A (en) Method for detecting human face and device
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN112381763A (en) Surface defect detection method
CN112529931B (en) Method and system for foreground segmentation
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN114943876A (en) Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium
CN111368634A (en) Human head detection method, system and storage medium based on neural network
CN112200772A (en) Pox check out test set
CN112396126B (en) Target detection method and system based on detection trunk and local feature optimization
CN113627504B (en) Multi-mode multi-scale feature fusion target detection method based on generation of countermeasure network
CN111027542A (en) Target detection method improved based on fast RCNN algorithm
CN112633100B (en) Behavior recognition method, behavior recognition device, electronic equipment and storage medium
CN111160282B (en) Traffic light detection method based on binary Yolov3 network
CN117058716A (en) Cross-domain behavior recognition method and device based on image pre-fusion
CN110956097A (en) Method and module for extracting occluded human body and method and device for scene conversion
CN116110005A (en) Crowd behavior attribute counting method, system and product
CN115147727A (en) Method and system for extracting impervious surface of remote sensing image
CN115937565A (en) Hyperspectral image classification method based on self-adaptive L-BFGS algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant