CN114693966A - Target detection method based on deep learning - Google Patents


Info

Publication number
CN114693966A
CN114693966A (application CN202210259711.5A)
Authority
CN
China
Prior art keywords
model
data
data set
training
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210259711.5A
Other languages
Chinese (zh)
Inventor
潘晓光
王小华
陈亮
张雅娜
张娜
姚珊珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Sanyouhe Smart Information Technology Co Ltd
Original Assignee
Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Sanyouhe Smart Information Technology Co Ltd filed Critical Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority to CN202210259711.5A priority Critical patent/CN114693966A/en
Publication of CN114693966A publication Critical patent/CN114693966A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of machine learning, and particularly relates to a target detection method based on deep learning, comprising the following steps. Data acquisition: collect the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set to construct an original data set, label the categories of the original data set, and complete the construction of the data set required for model training. Data preprocessing: preprocess the data, dividing the different types of original images with different data segmentation methods to ensure the training effect of the model. Model construction: build a recognition and classification model with a low-dimensional convolutional neural network, input the training data, and complete the construction of the parameter model. Model saving: when the loss function of the model no longer decreases, save the model. Model evaluation: input the test data into the saved network model to complete the evaluation of model performance. The invention proposes a new model (G-RCN) that separates the classification and localization tasks, optimizes for the difference between the two tasks, and achieves improved performance.

Description

Target detection method based on deep learning
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a target detection method based on deep learning.
Background
At present, deep neural networks have achieved super-human performance on 1000-class object recognition tasks, but detection networks have not achieved human-comparable performance on 80-class object detection tasks, which indicates a gap between the recognition task and the detection task. Object detection requires both strong classification performance and the ability to precisely localize an object among an essentially unlimited number of candidate locations.
Problems or disadvantages of the prior art: current target detection models use a shared feature map to perform the classification and localization tasks simultaneously, but the original Faster R-CNN and R-CNN variants that partially separate the feature maps achieve only comparable performance, which remains unsatisfactory.
Disclosure of Invention
Based on this, a target detection method based on deep learning is provided. The construction of an original data set is completed by collecting the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set, and the images in the COCO data set are classified according to pixel size. After data collection, the data are preprocessed, including segmentation, normalization, and the like. The preprocessed data are input into the constructed G-RCN fusion network model to train the network; the model is saved once its loss function no longer decreases, completing model construction. The test data are then input into the saved network model to complete model evaluation and measure the model's performance.
The application discloses a target detection method based on deep learning, which comprises the following steps,
s1, data acquisition: collecting a COCO 2014 detection data set and a PASCAL VOC 2007+2012 data set to complete the construction of an original data set, labeling the categories of the original data set, and completing the construction of the data set required by model training;
s2, preprocessing data: preprocessing data, and dividing different types of original data pictures by different data segmentation methods to ensure the training effect of the model;
s3, model construction: building a recognition classification model by adopting a low-dimensional convolutional neural network, inputting training data, and completing the building of a parameter model;
s4, model storage: when the loss function of the model is not reduced any more, the model is saved;
s5, model evaluation: and inputting the test data into the stored network model to complete the evaluation of the model performance.
Further, in step S1, the images in the original data set are used for performance testing of the model, and the images in the COCO data set are classified according to pixel size, the classification being based on image sizes smaller than 32 × 32, between 32 × 32 and 96 × 96, and larger than 96 × 96; the PASCAL VOC data set includes a training set and a testing set, and the two data sets are integrated to construct the training and testing sets suitable for the network model.
Further, in step S2: the method comprises data segmentation and image scaling, wherein the data segmentation divides the obtained original data set into a training set and a testing set in a 7:3 ratio; the training set is used for training the model, and the remaining images are used for the final model performance evaluation;
the image scaling is as follows: the pixel sizes of the images are unified to 600 × 600 before the images are input.
Further, in step S3: a gap-optimized region-based convolutional network is constructed to realize recognition of minimal texture without adding any additional module or information. G-RCN is adopted on Faster R-CNN with a ResNet101 backbone: ResNet101 is split at the last 6 bottlenecks of the conv4 block, while the conv5 block is still used as the original block in the head. The first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is modified to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block, and the conv5 stride is modified to 1. All layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer; the backbone is composed of all layers in the first four blocks and generates a shared feature map for classification and localization.
Further, in step S5, the evaluation indexes PA (pixel accuracy) and MPA (mean pixel accuracy) are used, with the following formulas:

$$PA = \frac{\sum_{i=0}^{N} p_{ii}}{\sum_{i=0}^{N}\sum_{j=0}^{N} p_{ij}}$$

where $p_{ii}$ represents the number of correctly classified pixels, $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$, and $i$, $j$ are class indices;

$$MPA = \frac{1}{N+1}\sum_{i=0}^{N} \frac{p_{ii}}{\sum_{j=0}^{N} p_{ij}}$$

where $N$ is the number of categories.
Compared with the prior art, the invention has the following beneficial effects:
the invention designs a target detection method based on deep learning, which analyzes the classification and positioning tasks of a target detection model based on a region and respectively researches the effects of the two tasks, so that the high-level feature sharing of the classification and positioning tasks is suboptimal, the large span is beneficial to classification but not beneficial to positioning, and the global context information can improve the classification performance. On the basis, a new model (G-RCN) for separating classification and positioning tasks is provided, the difference between the classification and positioning tasks is optimized, and the performance is improved.
Drawings
FIG. 1 is a block flow diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without creative effort, shall fall within the protection scope of the present invention.
The application discloses a target detection method based on deep learning, as shown in fig. 1, comprising the following steps,
s1, data acquisition: collecting a COCO 2014 detection data set and a PASCAL VOC 2007+2012 data set to complete the construction of an original data set, labeling the categories of the original data set, and completing the construction of the data set required by model training;
Collecting the COCO 2014 detection data set and the PASCAL VOC 2007+2012 data set completes the construction of the original data set. The COCO data set comprises 80 categories; 83,000 images are used for training and 40,000 images for performance testing of the model, and the images in the COCO data set are classified according to pixel size as smaller than 32 × 32, between 32 × 32 and 96 × 96, and larger than 96 × 96. The PASCAL VOC data set comprises 11,000 training images and 5,000 test images. The two data sets are integrated to construct a dedicated data set suitable for the experiment, used for training and testing the network model.
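The size-based classification above can be sketched as follows. This is an illustrative sketch, not from the patent: the function and bucket names are assumptions, and image area (width × height) is used as the grouping criterion, mirroring the 32 × 32 and 96 × 96 thresholds.

```python
def size_bucket(width, height):
    """Assign an image to a size bucket by pixel area, using the
    32x32 and 96x96 thresholds described in the text."""
    area = width * height
    if area < 32 * 32:
        return "small"
    if area <= 96 * 96:
        return "medium"
    return "large"

def bucket_dataset(sizes):
    """Group a list of (width, height) pairs into the three size buckets."""
    buckets = {"small": [], "medium": [], "large": []}
    for w, h in sizes:
        buckets[size_bucket(w, h)].append((w, h))
    return buckets
```

For example, a 20 × 20 crop lands in `small`, a 64 × 64 crop in `medium`, and a 128 × 128 crop in `large`.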
S2, preprocessing data: preprocessing data, and dividing different types of original data pictures by different data segmentation methods to ensure the training effect of the model;
data normalization: Min-Max normalization was performed for each piece of data.
Data segmentation: the obtained original data set is divided into a training set and a test set in a 7:3 ratio; the training set is used for training the model, and the remaining images are used for the final model performance evaluation.
Image scaling: because the images in the original data come from different sources, their sizes differ. To meet the input requirement of the training model, the pixel sizes of the images are unified to 600 × 600 before input, which improves the model's detection performance.
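The 7:3 split can be sketched as a shuffled partition (the function name and fixed seed are illustrative assumptions; the target size constant mirrors the 600 × 600 requirement above):

```python
import random

TARGET_SIZE = (600, 600)  # all images are resized to this before input

def split_7_3(items, seed=0):
    """Shuffle a dataset deterministically and split it 7:3 into
    training and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.7)
    return items[:cut], items[cut:]
```

With 10 items the split yields 7 training and 3 test samples, and every item appears in exactly one of the two subsets.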
S3, the specific method for model construction is as follows: a recognition and classification model is built with a low-dimensional convolutional neural network, training data are input, and the construction of the parameter model is completed. A Gap-optimized Region-based Convolutional Network (G-RCN) is constructed, realizing recognition of minimal texture without adding any extra module or information. G-RCN is adopted on Faster R-CNN with a ResNet101 backbone, providing a method for separating the two tasks. ResNet101 is split at the last 6 bottlenecks of the conv4 block, while the conv5 block is still used as the original block in the head. The first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is modified to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block. The RoI alignment output size is 7 × 7 instead of 14 × 14 to save memory at the conv5 level, and the conv5 stride is modified to 1 to improve localization performance. All layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer. The backbone consists of all layers in the first four blocks and generates a shared feature map for classification and localization. In addition, using the conv5 block as the head achieves better performance than placing the conv5 block into the feature extraction network and using a shared MLP at the head, while obtaining stronger and more abstract feature representations, which benefits classification. The Faster R-CNN used learns low-level features by means of multi-task learning, splits from the last few layers of the backbone, and updates the feature map according to the correlation between local parts and the global context.
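The branch split described above can be sketched structurally in plain Python. This is a hedged reading of the (somewhat ambiguous) text, not a runnable network: the class and variable names are illustrative, plain objects stand in for ResNet101 stages, and the stride assignment reflects one interpretation in which the localization branch's first split bottleneck keeps stride 1 while the classification branch keeps the original stride 2. ResNet101's conv4 stage has 23 bottlenecks, which is consistent with sharing the first 17 and splitting the last 6.

```python
CONV4_BOTTLENECKS = 23    # bottlenecks in ResNet101's conv4 stage
SHARED_BOTTLENECKS = 17   # shared by classification and localization
SPLIT_BOTTLENECKS = CONV4_BOTTLENECKS - SHARED_BOTTLENECKS  # last 6, duplicated

class Bottleneck:
    """Stand-in for a residual bottleneck; only records its stride."""
    def __init__(self, stride=1):
        self.stride = stride

def build_g_rcn_conv4():
    """Return (shared, classification branch, localization branch)
    as lists of stand-in bottlenecks."""
    shared = [Bottleneck() for _ in range(SHARED_BOTTLENECKS)]
    # Localization branch: stride lowered from 2 to 1 to keep spatial
    # resolution for precise box regression (per the text above).
    loc = [Bottleneck(stride=1)] + [Bottleneck() for _ in range(SPLIT_BOTTLENECKS - 1)]
    cls = [Bottleneck(stride=2)] + [Bottleneck() for _ in range(SPLIT_BOTTLENECKS - 1)]
    return shared, cls, loc
```

The point of the sketch is the parameter accounting: 17 bottlenecks stay shared, only the last 6 are duplicated per branch, and a shared conv5 head caps both branches, so separation costs few extra parameters.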
Meanwhile, the stride of the localization branch is reduced to improve model performance, and the classification and localization branches share one head to prevent a significant increase in parameters. The Gap-optimized Region approach improves performance without additional functions through slight modifications of the paradigm, mainly in three aspects: separation, global context for detection, and stride. Placing the classification and localization tasks on different parts of the detection model improves the performance of the whole target detection model. Global context information is rich; adding more auxiliary and diversified information can significantly improve classification. Pooling layers are widely used in convolutional neural networks: in classification, segmentation, and detection tasks, the pooling layer is an important component of the feature extraction network, playing an important role in reducing resolution, reducing parameters, and propagating the main information. However, pooling layers also have drawbacks, including loss of information and alignment inaccuracies; therefore a 2 × 2 convolution kernel with a stride of 2 is used instead of pooling layers, preserving information while reducing the resolution of the feature maps.
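The pooling replacement above — a 2 × 2 convolution with stride 2 — can be sketched in pure Python. This is a toy single-kernel, single-channel version (real networks use many learned kernels per channel); the function name is an illustrative assumption:

```python
def conv2x2_stride2(feature, kernel):
    """Apply one 2x2 kernel with stride 2 to a 2-D feature map
    (list of lists). Like a pooling layer, it halves each spatial
    dimension, but with learnable weights instead of a fixed
    max/average rule."""
    h, w = len(feature), len(feature[0])
    out = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            row.append(sum(feature[i + di][j + dj] * kernel[di][dj]
                           for di in range(2) for dj in range(2)))
        out.append(row)
    return out
```

With the kernel set to all 0.25 this reduces to 2 × 2 average pooling, which makes the "pooling is a special case of this convolution" intuition concrete; training lets the network learn a better weighting.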
S4 model saving: when the loss function of the model is no longer decreasing, the model is saved.
S5, model evaluation: the test data are input into the saved network model to complete the evaluation of model performance. To complete this evaluation, indexes such as PA (pixel accuracy) and MPA (mean pixel accuracy) are used, with the following formulas:

$$PA = \frac{\sum_{i=0}^{N} p_{ii}}{\sum_{i=0}^{N}\sum_{j=0}^{N} p_{ij}}$$

where $p_{ii}$ represents the number of correctly classified pixels and $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$;

$$MPA = \frac{1}{N+1}\sum_{i=0}^{N} \frac{p_{ii}}{\sum_{j=0}^{N} p_{ij}}$$

where $N$ is the number of categories.
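The two metrics above can be computed directly from a confusion matrix `conf`, where `conf[i][j]` counts pixels of class `i` predicted as class `j`. A minimal pure-Python sketch (function names are illustrative assumptions):

```python
def pixel_accuracy(conf):
    """PA: correctly classified pixels (the diagonal) over all pixels."""
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(len(conf)))
    return correct / total

def mean_pixel_accuracy(conf):
    """MPA: per-class pixel accuracy, averaged over all classes."""
    n = len(conf)
    return sum(conf[i][i] / sum(conf[i]) for i in range(n)) / n
```

For example, with `conf = [[4, 0], [2, 2]]`, PA is 6/8 = 0.75 and MPA is (4/4 + 2/4)/2 = 0.75; the two differ as soon as the classes are imbalanced.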
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are included in the scope of the present invention.

Claims (5)

1. A target detection method based on deep learning is characterized by comprising the following steps,
s1, data acquisition: collecting a COCO 2014 detection data set and a PASCAL VOC 2007+2012 data set to complete the construction of an original data set, labeling the categories of the original data set, and completing the construction of the data set required by model training;
s2, preprocessing data: preprocessing data, and dividing different types of original data pictures by different data segmentation methods to ensure the training effect of the model;
s3, model construction: building a recognition classification model by adopting a low-dimensional convolutional neural network, inputting training data, and completing the building of a parameter model;
s4, model storage: when the loss function of the model is not reduced any more, the model is saved;
s5, model evaluation: and inputting the test data into the stored network model to complete the evaluation of the model performance.
2. The method for detecting an object based on deep learning of claim 1, wherein in step S1, the images in the raw data set are used for performance testing of the model, and the images in the COCO data set are classified according to pixel size, the classification being based on sizes smaller than 32 × 32, between 32 × 32 and 96 × 96, and larger than 96 × 96; the PASCAL VOC data set includes a training set and a testing set, and the two data sets are integrated to construct the training and testing sets suitable for the network model.
3. The deep learning-based object detection method according to claim 2, wherein in step S2: the method comprises data segmentation and image scaling, wherein the data segmentation divides the obtained original data set into a training set and a testing set in a 7:3 ratio; the training set is used for training the model, and the remaining images are used for the final model performance evaluation;
the image scaling is as follows: the pixel sizes of the images are unified to 600 × 600 before the images are input.
4. The deep learning-based object detection method according to claim 3, wherein in step S3: a gap-optimized region-based convolutional network is constructed to realize recognition of minimal texture without adding any additional module or information; G-RCN is adopted on Faster R-CNN with a ResNet101 backbone, ResNet101 is split at the last 6 bottlenecks of the conv4 block, and the conv5 block is used as the original block in the head; the first convolution layer of the first bottleneck in the conv4 block originally has a stride of 2, which is modified to 1 in the localization branch; the classification and localization branches share the kernels of the first 17 bottlenecks of the conv4 block, and the conv5 stride is modified to 1; all layers of the conv5 block in ResNet101 are used as the head, appended to the RoI pooling layer; the backbone is composed of all layers in the first four blocks and generates a shared feature map for classification and localization.
5. The method for detecting an object based on deep learning of claim 4, wherein in step S5, the evaluation indexes PA and MPA are used, with the following formulas:

$$PA = \frac{\sum_{i=0}^{N} p_{ii}}{\sum_{i=0}^{N}\sum_{j=0}^{N} p_{ij}}$$

where $p_{ii}$ represents the number of correctly classified pixels, $p_{ij}$ represents the number of pixels of class $i$ predicted as class $j$, and $i$, $j$ are class indices;

$$MPA = \frac{1}{N+1}\sum_{i=0}^{N} \frac{p_{ii}}{\sum_{j=0}^{N} p_{ij}}$$

where $N$ is the number of categories.
CN202210259711.5A 2022-03-16 2022-03-16 Target detection method based on deep learning Pending CN114693966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210259711.5A CN114693966A (en) 2022-03-16 2022-03-16 Target detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210259711.5A CN114693966A (en) 2022-03-16 2022-03-16 Target detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN114693966A true CN114693966A (en) 2022-07-01

Family

ID=82139098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210259711.5A Pending CN114693966A (en) 2022-03-16 2022-03-16 Target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN114693966A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115902814A (en) * 2023-03-09 2023-04-04 中国人民解放军国防科技大学 Target recognition model performance evaluation method and device based on information space measurement
CN117741070A (en) * 2024-02-21 2024-03-22 山东多瑞电子科技有限公司 Deep learning-based gas safety intelligent detection method
CN117741070B (en) * 2024-02-21 2024-05-03 山东多瑞电子科技有限公司 Deep learning-based gas safety intelligent detection method

Similar Documents

Publication Publication Date Title
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN109190752A (en) The image, semantic dividing method of global characteristics and local feature based on deep learning
CN114067143B (en) Vehicle re-identification method based on double sub-networks
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN114693966A (en) Target detection method based on deep learning
CN111832453B (en) Unmanned scene real-time semantic segmentation method based on two-way deep neural network
CN113011288A (en) Mask RCNN algorithm-based remote sensing building detection method
CN110598788A (en) Target detection method and device, electronic equipment and storage medium
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN110334709A (en) Detection method of license plate based on end-to-end multitask deep learning
US12087046B2 (en) Method for fine-grained detection of driver distraction based on unsupervised learning
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN108363962B (en) Face detection method and system based on multi-level feature deep learning
CN111462090A (en) Multi-scale image target detection method
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
CN113160291A (en) Change detection method based on image registration
CN117912058A (en) Cattle face recognition method
CN117218643A (en) Fruit identification method based on lightweight neural network
CN117422998A (en) Improved river float identification algorithm based on YOLOv5s
CN116797830A (en) Image risk classification method and device based on YOLOv7
US20240005635A1 (en) Object detection method and electronic apparatus
CN116758610A (en) Attention mechanism and feature fusion-based light-weight human ear recognition method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination