CN112396126A - Target detection method and system based on detection backbone and local feature optimization - Google Patents

Target detection method and system based on detection backbone and local feature optimization

Info

Publication number
CN112396126A
CN112396126A
Authority
CN
China
Prior art keywords
network
target detection
training
data
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011388976.2A
Other languages
Chinese (zh)
Other versions
CN112396126B (en)
Inventor
郑慧诚
严志伟
黄梓轩
李烨
陈绿然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011388976.2A priority Critical patent/CN112396126B/en
Publication of CN112396126A publication Critical patent/CN112396126A/en
Application granted granted Critical
Publication of CN112396126B publication Critical patent/CN112396126B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method and system based on a detection backbone and local feature optimization, wherein the method comprises the following steps: acquiring training data and preprocessing the training data to obtain preprocessed data; constructing a target detection network based on a long-neck backbone architecture and a local feature optimization module; training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network; and acquiring data to be detected, inputting the data to be detected into the trained target detection network, and outputting a detection result. The system comprises: a preprocessing module, a network construction module, a training module and a detection module. The invention ensures that the detector achieves satisfactory performance while remaining computationally friendly. The target detection method and system based on a detection backbone and local feature optimization can be widely applied to the field of target detection networks.

Description

Target detection method and system based on detection backbone and local feature optimization
Technical Field
The invention belongs to the field of target detection networks, and particularly relates to a target detection method and system based on a detection backbone and local feature optimization.
Background
Target detection, as a fundamental task of computer vision, has wide applications and is an active research area in both academia and industry. With the rise of deep learning, the field of target detection has advanced considerably. However, current detectors still perform poorly on small-scale targets, mainly because spatial information is lost too quickly in the backbone network and local information is insufficiently modeled by the detection head.
The backbone network, as the basic structure for feature extraction, plays a significant role in detection performance. Because training samples for target detection are generally scarce, most current detectors employ backbones pre-trained on large image classification datasets. The mismatch between the two tasks causes domain shift when the network is fine-tuned, and adopting a pre-trained network also restricts the structural design space of the backbone to a certain extent. Moreover, the commonly adopted backbone networks apply pooling too early, which loses spatial detail information and is unfavorable for the feature representation of small targets.
On the other hand, the detection head of current mainstream detectors usually takes a feature pyramid as input; the shallow features in the pyramid lack semantic information, while the spatial information of the deep features is severely degraded. How to enhance the feature representation and detection ability of the detection layers for small-scale targets is therefore a problem that urgently needs to be solved.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a target detection method and system based on a detection backbone and local feature optimization, which ensure that the detector achieves satisfactory performance while remaining computationally friendly.
The first technical scheme adopted by the invention is as follows: a target detection method based on a detection backbone and local feature optimization, comprising the following steps:
acquiring training data and preprocessing the training data to obtain preprocessed data;
constructing a target detection network based on a long-neck backbone architecture and a local feature optimization module;
training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
and acquiring data to be detected, inputting the data to be detected to the trained target detection network, and outputting a detection result.
Further, the step of obtaining training data and preprocessing the training data to obtain preprocessed data specifically includes:
collecting training data according to the problem domain and marking the training data to obtain marked training data;
the training data comprise public datasets from the Internet and images captured in the field, and the information in the training data comprises the original material pictures together with annotation records of the target positions and categories in the pictures.
Further, the target detection network comprises a long-neck residual backbone network and a local feature optimization module; the long-neck residual backbone network comprises six feature extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module.
Further, the feature extraction convolution module comprises an Inception module, and the Inception module comprises two branches.
Further, the local fusion module comprises a detail re-guiding branch, a local context branch and an original input mapping branch, wherein the detail re-guiding branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a max pooling layer, a 3 × 3 convolutional layer and a batch normalization layer; the local context branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a deconvolution layer, a 3 × 3 convolutional layer and a batch normalization layer; and the original input mapping branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a batch normalization layer.
Further, the step of training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network specifically includes:
dividing the data into a training set, a validation set and a test set according to a certain proportion;
taking the training set as the input of the target detection network during training and computing the network output through convolution and related operations to obtain a set of prediction boxes;
according to the classification subtask and the localization subtask, each prediction box in the set of prediction boxes comprises a category vector and a position vector;
for the classification subtask, using the cross entropy between the prediction box category vector and the annotation box category vector as the loss function;
for the localization subtask, calculating the position loss between the prediction box and the annotation box with the Smooth L1 loss function;
calculating the gradients of the convolutional-layer parameters layer by layer from the computed loss using stochastic gradient descent, and updating the parameters of each layer in the network;
during training, evaluating the generalization of the network at fixed iteration intervals with the validation set as input;
and after training is finished, evaluating the performance of the network with the test set as input, while saving parameters such as the convolution kernels and biases in the network to obtain the trained target detection network.
Further, the step of acquiring data to be detected, inputting the data to be detected to the trained target detection network, and outputting a detection result specifically includes:
acquiring the data to be detected to obtain an image of the target to be detected;
inputting the image of the target to be detected into the trained target detection network, and outputting through the convolutional layers a sequence of 4-dimensional vectors representing prediction box positions and a sequence of N-dimensional vectors representing category predictions;
the detector discards a portion of low-quality results according to the N-dimensional category prediction vectors using a manually preset category confidence threshold, obtaining the remaining detection results;
and for the remaining detection results, computing the overlap between prediction boxes from the 4-dimensional position vectors and, together with the prediction box confidences, removing duplicate prediction boxes with a non-maximum suppression algorithm to obtain and output the final detection result of the detector.
The method and the system have the following beneficial effects: a local feature optimization module for fusing spatial local information is designed, which not only enhances the semantic information of the detection layers but also preserves the spatial local information of the detection-head features, and is particularly beneficial for small-target detection; in addition, to overcome the performance drop caused by random initialization of the backbone parameters, a suitable learning strategy is provided, ensuring that the detector achieves satisfactory performance while remaining computationally friendly.
Drawings
FIG. 1 is a network architecture of a target detection network based on detection backbone and local feature optimization according to the present invention;
FIG. 2 is a flowchart illustrating the steps of a target detection method based on a detection backbone and local feature optimization according to the present invention;
FIG. 3 is a block diagram of a target detection system based on a detection backbone and local feature optimization according to the present invention;
FIG. 4 illustrates a branch structure in a local fusion module according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
As shown in fig. 1 and fig. 2, the present invention provides a target detection method based on a detection backbone and local feature optimization, which includes the following steps:
s1, acquiring training data and preprocessing the training data to obtain preprocessed data;
s2, constructing a target detection network based on the long-neck backbone architecture and the local feature optimization module;
s3, training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
specifically, in order to overcome the performance reduction caused by no pre-training, the invention optimizes the training strategy to ensure that similar or even better performance is obtained under the same training resources, and the specific improvement is as follows: (1) differentiation learning rate: the part of the network before the local inclusion module is consistent with the existing ResNet structure, and meanwhile, the lower-layer visual features have stronger generalization capability, so that pre-training initialization parameters can be adopted. For the pre-trained network part, a smaller learning rate is adopted to keep the pre-training knowledge; for randomly initialized parameters, a large learning rate is employed to facilitate the search of the network in the parameter space. By adopting the difference learning strategy, the detection network not only can have generalization performance brought by pre-training, but also can ensure faster learning convergence speed. (2) Strengthening the stability of the initial stage of training: the network adopts a characteristic pyramid structure to carry out target detection, which is beneficial to enhancing the robustness of a target scale, but a high-resolution characteristic diagram in a detection layer easily generates overlarge gradient at the initial training stage, and the convergence of a learning process is influenced. The invention adopts the preheating technology, ensures the gradual optimization of the network by gradually increasing the learning rate in the initial training stage, and prevents the network from deviating from the optimization target too far in the initial stage, thereby ensuring the learning process to be more stable. By adopting preheating, the statistical characteristics obtained by the network at the initial training stage are more accurate, and the problem that the existing randomly initialized target detection network depends on large-batch learning is solved, so that the satisfactory performance can be obtained under the condition of smaller computing resource requirements.
And S4, acquiring the data to be detected, inputting the data to be detected to the trained target detection network, and outputting a detection result.
Further, as a preferred embodiment of the method, the step of obtaining the training data and preprocessing the training data to obtain preprocessed data specifically includes:
collecting training data according to the problem domain and marking the training data to obtain marked training data;
the training data comprise public datasets from the Internet and images captured in the field, and the information in the training data comprises the original material pictures together with annotation records of the target positions and categories in the pictures.
Specifically, an annotation box is generated here, containing an annotation box category vector and a position vector.
Further, as a preferred embodiment of the method, the target detection network comprises a long-neck residual backbone network and a local feature optimization module; the long-neck residual backbone network comprises six feature extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module.
Specifically, as shown in the upper half of fig. 1 ("long-neck residual backbone network"), the backbone basically adopts a residual structure but differs from a conventional ResNet in two places: (1) a local Inception module is added to obtain receptive fields with multiple aspect ratios; (2) the neck is longer, so richer spatial detail features can be extracted;
in addition, as shown in the upper left of fig. 1, the architecture of the long-neck trunk is based on a residual error network, and mainly includes 6 convolution levels responsible for feature extraction, one of which is a local inclusion module. Unlike the normal residual network, the long-neck backbone network cancels one of the largest pooling layers after the conv1 level, resulting in multiplication of the input profile resolution of the conv2_ x level and thereafter the backbone network. In addition, removal of the pooling layer also slows down the increase of the receptive field in the trunk, thereby facilitating capture of fine-grained features.
Simply removing the pooling layer increases the feature resolution and therefore adds a certain amount of computation. The invention thus also provides a lightweight version of the long-neck residual backbone network (LN-ResNet-light). Compared with LN-ResNet, LN-ResNet-light keeps the max pooling layer after conv1 from the original ResNet structure and reduces the convolution stride of the first residual block of conv3_x to 1, thereby reducing the overall computation.
The long-neck backbone network (LN-ResNet) provided by the invention is mainly used to extract fine-grained spatial information from an image. By extending the depth of the neck (the convolutional layers before the detection layers), the network enhances the extraction of high-resolution features, alleviates the overly fast loss of spatial detail information found in common backbone networks, and strengthens the feature representation of small-scale targets.
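As a rough, non-limiting sketch of the long-neck idea, the snippet below modifies a standard torchvision ResNet-50 by dropping the max pooling layer after conv1, which doubles the spatial resolution of conv2_x and all later stages. This approximates LN-ResNet but is not the exact network definition of the invention; the LN-ResNet-light variant would instead keep the pooling layer and set the stride of the first conv3_x block to 1.

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_long_neck_resnet(pretrained_lower_layers=True):
    # Illustrative approximation of the "long neck": start from a standard
    # ResNet-50 and remove the max pooling layer after conv1 so spatial
    # detail is preserved longer and the receptive field grows more slowly.
    net = resnet50(weights="IMAGENET1K_V1" if pretrained_lower_layers else None)
    net.maxpool = nn.Identity()
    return net
```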
Further, as a preferred embodiment of the method, the feature extraction convolution module comprises an Inception module, and the Inception module comprises two branches.
Specifically, the local Inception module comprises two branches. In both branches, the input features first pass through a 1 × 1 convolutional layer that compresses the number of channels to reduce computation.
The two branches then apply a 1 × 3 convolution and a 3 × 1 convolution respectively. Unlike the serial arrangement in a common Inception module, these two convolutional layers operate in parallel and are mainly used to obtain receptive fields with different aspect ratios, so that targets of different aspect ratios are represented and modeled more effectively. In addition, these convolutional layers help enlarge the receptive field and deepen the network, thereby strengthening semantic representation.
Finally, the output features of the two branches are concatenated and fused through a 3 × 3 convolutional layer. The fused output is added to the input of the whole module to form a residual structure, ensuring effective propagation of gradients.
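A minimal PyTorch sketch of such a two-branch local Inception module is given below; the channel sizes and the placement of activations are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LocalInception(nn.Module):
    # Two parallel branches (1x1 compression followed by 1x3 and 3x1
    # convolutions), concatenation, 3x3 fusion and a residual connection.
    def __init__(self, channels, mid_channels=None):
        super().__init__()
        mid = mid_channels or channels // 4
        self.branch_a = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(inplace=True),
        )
        self.branch_b = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(inplace=True),
        )
        # 3x3 convolution fusing the concatenated branch outputs back to the
        # input channel count so the residual addition is well defined.
        self.fuse = nn.Conv2d(2 * mid, channels, kernel_size=3, padding=1)

    def forward(self, x):
        fused = self.fuse(torch.cat([self.branch_a(x), self.branch_b(x)], dim=1))
        return fused + x  # residual connection for effective gradient flow
```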
As a further preferred embodiment of the method, the local fusion module comprises a detail re-guiding branch, a local context branch and an original input mapping branch; the detail re-guiding branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a max pooling layer, a 3 × 3 convolutional layer and a batch normalization layer; the local context branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a deconvolution layer, a 3 × 3 convolutional layer and a batch normalization layer; and the original input mapping branch passes the input feature map sequentially through a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a batch normalization layer.
Specifically, as shown in fig. 4, the detail re-guiding branch is designed primarily to alleviate the loss of detail information caused by pooling. It takes as input the shallowest feature map in the stage adjacent to and preceding the detection layer, whose spatial resolution is twice that of the detection layer, so as to preserve spatial detail as much as possible. The input feature map first passes through a 1 × 1 convolutional layer to compress the channels, and its resolution is then reduced with a max pooling layer (Maxpooling) to obtain a feature map with the same resolution as the middle branch; finally, a convolutional layer and a batch normalization (BN) layer perform further feature transformation. The local context branch assists the localization and identification of the target by introducing local context information around it. Its input comes from the stage following the current detection layer, whose spatial resolution is half that of the detection-layer feature map. The input feature map first passes through a 1 × 1 convolutional layer to reduce the number of channels, a deconvolution layer then up-samples it to the same spatial resolution as the detection layer, and finally it passes through a 3 × 3 convolutional layer and a batch normalization layer. Unlike a common hourglass structure, the input of this branch is a feature stage adjacent to the detection layer, so the locality of the context features is preserved while the detection-layer semantics are enhanced. The original input mapping branch feeds the original feature map through a 1 × 1 convolutional layer for channel compression and a 3 × 3 convolutional layer for feature transformation before fusion, so as to control the extra computation the local fusion module might introduce and to fuse better with the features of the other two branches.
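The following sketch illustrates the three branches described above, assuming an element-wise sum as the fusion operator (the text above does not fix the operator) and treating all channel numbers as illustrative assumptions.

```python
import torch.nn as nn

class LocalFusion(nn.Module):
    # Inputs: a shallower map at twice the detection-layer resolution
    # (detail re-guiding branch), the detection-layer map itself (original
    # input mapping branch) and a deeper map at half the resolution
    # (local context branch).
    def __init__(self, shallow_ch, current_ch, deep_ch, out_ch):
        super().__init__()
        # Detail re-guiding branch: 1x1 conv -> max pooling -> 3x3 conv -> BN.
        self.detail = nn.Sequential(
            nn.Conv2d(shallow_ch, out_ch, kernel_size=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # Local context branch: 1x1 conv -> deconvolution (x2) -> 3x3 conv -> BN.
        self.context = nn.Sequential(
            nn.Conv2d(deep_ch, out_ch, kernel_size=1),
            nn.ConvTranspose2d(out_ch, out_ch, kernel_size=2, stride=2),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # Original input mapping branch: 1x1 conv -> 3x3 conv -> BN.
        self.identity = nn.Sequential(
            nn.Conv2d(current_ch, out_ch, kernel_size=1),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, shallow, current, deep):
        # All three outputs share the detection-layer resolution, so they can
        # be fused element-wise before being passed to the detection head.
        return self.detail(shallow) + self.identity(current) + self.context(deep)
```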
As a preferred embodiment of the method, the step of training the target detection network based on the preprocessed data and a preset training strategy to obtain the trained target detection network specifically includes:
dividing the data into a training set, a validation set and a test set according to a certain proportion;
taking the training set as the input of the target detection network during training and computing the network output through convolution and related operations to obtain a set of prediction boxes;
specifically, before training, a series of preprocessing rules for the input image are set, wherein the preprocessing operations that must be included include image normalization for stable training and changing image size to control computational complexity. During training, on the basis of necessary operation, a series of random preprocessing operations such as random clipping are introduced to achieve the purpose of data augmentation and enhance the performance of the network.
According to the classification subtask and the localization subtask, each prediction box in the set of prediction boxes comprises a category vector and a position vector;
for the classification subtask, using the cross entropy between the prediction box category vector and the annotation box category vector as the loss function;
for the localization subtask, calculating the position loss between the prediction box and the annotation box with the Smooth L1 loss function (a minimal sketch of these two losses is given below);
calculating the gradients of the convolutional-layer parameters layer by layer from the computed loss using stochastic gradient descent, and updating the parameters of each layer in the network;
during training, evaluating the generalization of the network at fixed iteration intervals with the validation set as input, so as to guard against overfitting;
and after training is finished, evaluating the performance of the network with the test set as input, while saving parameters such as the convolution kernels and biases in the network to obtain the trained target detection network.
Specifically, in actual detection, the trained model can be restored simply by assigning each saved parameter value, by parameter name, to the parameter of the corresponding layer in the network; the restored model then serves as the basis for outputting detection results in the subsequent detection process.
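A minimal sketch of the two training losses mentioned above (cross entropy for classification, Smooth L1 for localization) is shown below; the matching of prediction boxes to annotation boxes and the loss weighting are assumed to be handled elsewhere and are not specified here.

```python
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, box_preds, box_targets, loc_weight=1.0):
    # cls_logits: (num_boxes, num_classes); cls_targets: (num_boxes,) class ids.
    cls_loss = F.cross_entropy(cls_logits, cls_targets)
    # box_preds / box_targets: (num_matched_boxes, 4) position vectors.
    loc_loss = F.smooth_l1_loss(box_preds, box_targets)
    return cls_loss + loc_weight * loc_loss
```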
Further, as a preferred embodiment of the method, the step of acquiring the data to be detected, inputting the data to be detected to the trained target detection network, and outputting the detection result specifically includes:
acquiring the data to be detected to obtain an image of the target to be detected;
inputting the image of the target to be detected into the trained target detection network, and outputting through the convolutional layers a sequence of 4-dimensional vectors representing prediction box positions and a sequence of N-dimensional vectors representing category predictions;
the detector discards a portion of low-quality results according to the N-dimensional category prediction vectors using a manually preset category confidence threshold, obtaining the remaining detection results;
and for the remaining detection results, computing the overlap between prediction boxes from the 4-dimensional position vectors and, together with the prediction box confidences, removing duplicate prediction boxes with a non-maximum suppression algorithm to obtain and output the final detection result of the detector.
Specifically, the detector first discards a portion of the low-quality results from the N-dimensional category prediction sequence using a manually preset category confidence threshold. The remaining detection results are then de-duplicated with a non-maximum suppression (NMS) algorithm, using the prediction box confidences and the overlap between prediction boxes computed from the 4-dimensional position vectors. The prediction boxes that remain constitute the final detection result of the detector.
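A minimal sketch of this post-processing, using the NMS implementation from torchvision, is shown below; the confidence and IoU thresholds are illustrative assumptions.

```python
from torchvision.ops import nms

def postprocess(boxes, class_scores, score_thresh=0.5, iou_thresh=0.45):
    # boxes: (num_boxes, 4) as (x1, y1, x2, y2); class_scores: (num_boxes, N).
    scores, labels = class_scores.max(dim=1)
    keep = scores > score_thresh              # discard low-quality results
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    kept = nms(boxes, scores, iou_thresh)     # de-duplicate overlapping boxes
    return boxes[kept], scores[kept], labels[kept]
```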
As shown in fig. 3, a target detection system based on a detection backbone and local feature optimization comprises the following modules:
the preprocessing module is used for acquiring training data and preprocessing the training data to obtain preprocessed data;
the network construction module is used for constructing a target detection network based on the long-neck backbone architecture and the local feature optimization module;
the training module is used for training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network;
and the detection module is used for acquiring data to be detected, inputting the data to be detected into the trained target detection network and outputting a detection result.
The contents of the above method embodiments all apply to the present system embodiment; the functions implemented by the present system embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are likewise the same.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A target detection method based on a detection backbone and local feature optimization, characterized by comprising the following steps: acquiring training data and preprocessing the training data to obtain preprocessed data; constructing a target detection network based on a long-neck backbone architecture and a local feature optimization module; training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network; and acquiring data to be detected, inputting the data to be detected into the trained target detection network, and outputting a detection result.

2. The target detection method based on a detection backbone and local feature optimization according to claim 1, characterized in that the step of acquiring training data and preprocessing the training data to obtain preprocessed data specifically comprises: collecting training data according to the problem domain and annotating the training data to obtain annotated training data; the training data comprise public datasets from the Internet and images captured in the field, and the information in the training data comprises the original material pictures together with annotation records of the target positions and categories in the pictures.

3. The target detection method based on a detection backbone and local feature optimization according to claim 2, characterized in that the target detection network comprises a long-neck residual backbone network and a local feature optimization module, the long-neck residual backbone network comprises six feature extraction convolution modules, and the local feature optimization module comprises a local fusion module and a scale supervision module.

4. The target detection method based on a detection backbone and local feature optimization according to claim 3, characterized in that the feature extraction convolution module comprises an Inception module, and the Inception module comprises two branches.

5. The target detection method based on a detection backbone and local feature optimization according to claim 4, characterized in that the local fusion module comprises a detail re-guiding branch, a local context branch and an original input mapping branch; the detail re-guiding branch passes the input feature map sequentially through a 1×1 convolutional layer, a max pooling layer, a 3×3 convolutional layer and a batch normalization layer; the local context branch passes the input feature map sequentially through a 1×1 convolutional layer, a deconvolution layer, a 3×3 convolutional layer and a batch normalization layer; and the original input mapping branch passes the input feature map sequentially through a 1×1 convolutional layer, a 3×3 convolutional layer and a batch normalization layer.

6. The target detection method based on a detection backbone and local feature optimization according to claim 5, characterized in that the step of training the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network specifically comprises: dividing the data into a training set, a validation set and a test set according to a certain proportion; taking the training set as the input of the target detection network during training and computing the network output through convolution and related operations to obtain a set of prediction boxes; according to the classification subtask and the localization subtask, each prediction box in the set of prediction boxes comprises a category vector and a position vector; for the classification subtask, using the cross entropy between the prediction box category vector and the annotation box category vector as the loss function; for the localization subtask, calculating the position loss between the prediction box and the annotation box with the Smooth L1 loss function; calculating the gradients of the convolutional-layer parameters layer by layer from the computed loss using stochastic gradient descent, and updating the parameters of each layer in the network; during training, evaluating the generalization of the network at fixed iteration intervals with the validation set as input; and after training is finished, evaluating the performance of the network with the test set as input, while saving parameters such as the convolution kernels and biases in the network to obtain the trained target detection network.

7. The target detection method based on a detection backbone and local feature optimization according to claim 3, characterized in that the step of acquiring the data to be detected, inputting the data to be detected into the trained target detection network, and outputting a detection result specifically comprises: acquiring the data to be detected to obtain an image of the target to be detected; inputting the image of the target to be detected into the trained target detection network, and outputting through the convolutional layers a sequence of 4-dimensional vectors representing prediction box positions and a sequence of N-dimensional vectors representing category predictions; the detector discards a portion of low-quality results according to the N-dimensional category prediction vectors using a manually preset category confidence threshold, obtaining the remaining detection results; and for the remaining detection results, computing the overlap between prediction boxes from the 4-dimensional position vectors and, together with the prediction box confidences, removing duplicate prediction boxes with a non-maximum suppression algorithm to obtain and output the final detection result of the detector.

8. A target detection system based on a detection backbone and local feature optimization, characterized by comprising the following modules: a preprocessing module, configured to acquire training data and preprocess the training data to obtain preprocessed data; a network construction module, configured to construct a target detection network based on a long-neck backbone architecture and a local feature optimization module; a training module, configured to train the target detection network based on the preprocessed data and a preset training strategy to obtain a trained target detection network; and a detection module, configured to acquire data to be detected, input the data to be detected into the trained target detection network, and output a detection result.
CN202011388976.2A 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization Active CN112396126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011388976.2A CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011388976.2A CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization

Publications (2)

Publication Number Publication Date
CN112396126A true CN112396126A (en) 2021-02-23
CN112396126B CN112396126B (en) 2023-09-22

Family

ID=74604938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011388976.2A Active CN112396126B (en) 2020-12-02 2020-12-02 Target detection method and system based on detection trunk and local feature optimization

Country Status (1)

Country Link
CN (1) CN112396126B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554125A (en) * 2021-09-18 2021-10-26 四川翼飞视科技有限公司 Object detection apparatus, method and storage medium combining global and local features
CN114818931A (en) * 2022-04-27 2022-07-29 重庆邮电大学 Fruit image classification method based on small sample element learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 A kind of Driving Scene object detection method merged based on deep learning and multilayer feature
CN109784386A (en) * 2018-12-29 2019-05-21 天津大学 A method of it is detected with semantic segmentation helpers
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature
CN110188720A (en) * 2019-06-05 2019-08-30 上海云绅智能科技有限公司 A kind of object detection method and system based on convolutional neural networks
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A Small Target Detection and Recognition Method Based on Enhanced Feature Learning
CN111144329A (en) * 2019-12-29 2020-05-12 北京工业大学 A lightweight and fast crowd counting method based on multi-label

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 A kind of Driving Scene object detection method merged based on deep learning and multilayer feature
CN109784386A (en) * 2018-12-29 2019-05-21 天津大学 A method of it is detected with semantic segmentation helpers
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature
CN110188720A (en) * 2019-06-05 2019-08-30 上海云绅智能科技有限公司 A kind of object detection method and system based on convolutional neural networks
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A Small Target Detection and Recognition Method Based on Enhanced Feature Learning
CN111144329A (en) * 2019-12-29 2020-05-12 北京工业大学 A lightweight and fast crowd counting method based on multi-label

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554125A (en) * 2021-09-18 2021-10-26 四川翼飞视科技有限公司 Object detection apparatus, method and storage medium combining global and local features
CN113554125B (en) * 2021-09-18 2021-12-17 四川翼飞视科技有限公司 Object detection apparatus, method and storage medium combining global and local features
CN114818931A (en) * 2022-04-27 2022-07-29 重庆邮电大学 Fruit image classification method based on small sample element learning

Also Published As

Publication number Publication date
CN112396126B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
AU2019213369B2 (en) Non-local memory network for semi-supervised video object segmentation
CN111598174B (en) Model training method and image change analysis method based on semi-supervised adversarial learning
WO2021227366A1 (en) Method for automatically and accurately detecting plurality of small targets
CN111860235B (en) Generation method and system of attention remote sensing image description based on high and low level feature fusion
CN104217225B (en) A kind of sensation target detection and mask method
CN112541904B (en) Unsupervised remote sensing image change detection method, storage medium and computing device
CN111476302A (en) Faster-RCNN target object detection method based on deep reinforcement learning
CN111563508A (en) A Semantic Segmentation Method Based on Spatial Information Fusion
CN111460936A (en) Remote sensing image building extraction method, system and electronic equipment based on U-Net network
CN112541508A (en) Fruit segmentation and recognition method and system and fruit picking robot
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN110008853B (en) Pedestrian detection network and model training method, detection method, medium, equipment
CN108182260B (en) Multivariate time sequence classification method based on semantic selection
CN105678284A (en) Fixed-position human behavior analysis method
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
JP2024513596A (en) Image processing method and apparatus and computer readable storage medium
CN113505670B (en) Weakly supervised building extraction method from remote sensing images based on multi-scale CAM and superpixels
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN115410059B (en) Remote sensing image part supervision change detection method and device based on contrast loss
CN113283282A (en) Weak supervision time sequence action detection method based on time domain semantic features
CN117033657A (en) An information retrieval method and device
CN112396126A (en) Target detection method and system based on detection of main stem and local feature optimization
CN117872127A (en) Motor fault diagnosis method and equipment
CN116434076A (en) A Target Recognition Method of Remote Sensing Image Integrating Prior Knowledge
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant