CN113591901A - Target detection method based on anchor frame - Google Patents

Target detection method based on anchor frame

Info

Publication number
CN113591901A
CN113591901A (application CN202110649575.6A)
Authority
CN
China
Prior art keywords
anchor
frame
detection method
target detection
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110649575.6A
Other languages
Chinese (zh)
Inventor
赵丹萍
唐伟
杨泽文
江倩
杜卉
高世旺
曹宝龙
汤兆鑫
徐瑞东
田媛
王振楠
马永峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHINA AEROSPACE TIMES ELECTRONICS CO LTD
Original Assignee
CHINA AEROSPACE TIMES ELECTRONICS CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHINA AEROSPACE TIMES ELECTRONICS CO LTD filed Critical CHINA AEROSPACE TIMES ELECTRONICS CO LTD
Priority to CN202110649575.6A priority Critical patent/CN113591901A/en
Publication of CN113591901A publication Critical patent/CN113591901A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an anchor-frame-based target detection method that detects targets on an SSD (Single Shot MultiBox Detector) framework. Adopting a MobileNet network architecture in the feature extraction part effectively increases detection speed while keeping accuracy high, so the method balances the speed and accuracy of position detection well and is robust. Extracting feature maps from shallow convolution layers improves the detection accuracy of small targets, and the method has wide market application potential.

Description

Target detection method based on anchor frame
Technical Field
The invention belongs to the field of image target detection, and relates to a target detection method based on an anchor frame.
Background
With the continuous development of productivity and the acceleration of urbanization, labor shortages have become increasingly prominent in agricultural planting, and agricultural robots are urgently needed to fully or partially replace people in completing complex tasks efficiently, safely, and reliably. In the flower planting industry, for example, picking consumes a great deal of manpower, the current number of pickers falls far short of demand, and picking efficiency is low. A picking robot that replaces or assists people is therefore urgently needed to reduce labor costs. For a flower picking robot, acquiring visual information about the planted flowers is an essential prerequisite for its operation, so accurate localization and classification of flower targets is of great importance.
In recent years, with the rapid development of artificial intelligence, deep-learning-based target detection has gradually become the mainstream approach in image target detection. Current deep learning methods in this field fall into two main types: two-stage and one-stage target detection algorithms. A two-stage algorithm first generates a series of candidate frames as samples and then classifies and regresses them with a convolutional neural network; common examples include RCNN and Fast RCNN. A one-stage algorithm skips candidate-frame generation and directly casts target-frame localization as a regression problem; common examples include YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector).
The two-stage and one-stage methods trade off detection precision against detection speed: in general, two-stage methods are more accurate but slower than one-stage methods.
The form and size of samples in agricultural planting are complex, targets of different sizes must be detected, and the picking stage of the flower planting industry must identify flowers at three different growth stages. Existing target detection methods have difficulty simultaneously meeting the precision, accuracy, and speed requirements of the agricultural planting field.
Disclosure of Invention
The invention aims to overcome the defects and provides a target detection method based on an anchor frame.
In order to achieve the above purpose, the invention provides the following technical scheme:
an anchor frame-based target detection method comprises the following steps:
s1, collecting a target image Data set to be detected, and labeling to obtain a Data set Data 0;
s2, amplifying Data0 to obtain a Data set Data 1;
s3, dividing Data1 into a training set and a test set, wherein the number of samples contained in the training set is larger than that of the samples contained in the test set;
s4, constructing a target detection neural network model, and training by using a training set to obtain a training model;
s5, the image position is predicted by the training model, and the classification result is output.
Further, the number of samples in data set Data0 in step S1 is at least 200; a single sample includes target images of multiple categories.
Further, the labeling in step S1 is manual: the target area is marked with a rectangular frame, and the four vertex coordinates of the rectangle are arranged clockwise starting from the upper-left corner of the image.
Further, the target detection neural network model in step S4 comprises a feature extraction module, a target prediction module, and a non-maximum suppression module;
the feature extraction module is used for extracting feature maps from the input image stage by stage;
the target prediction module is used for presetting anchor frames on each feature map and predicting positions and categories after feature fusion;
the non-maximum suppression module is used for removing redundant or invalid anchor frames from the image to be detected, retaining the anchor frame with the highest target-class probability, and outputting its position.
Further, the anchor frames are a plurality of bounding boxes of different sizes and aspect ratios generated centered on each pixel; the sizes and aspect ratios are set according to a statistical analysis of the data in the data set.
Furthermore, the target detection neural network model adopts the SSD target detection model architecture, with DIOU_loss as the loss function and the Mish function as the activation function.
Further, the feature extraction part adopts a MobileNet network architecture and the normalization part adopts the CmBN algorithm; the non-maximum suppression module adopts the DIoU-NMS algorithm.
Further, the target prediction module selects the feature maps after convolution layers 3 to 13 in the MobileNet architecture for position prediction.
Further, in step S3, the ratio of the training set Data11 to the test set Data12 is 9:1, 8:2, or 7:3.
Further, in step S2, the augmentation methods are Mosaic data enhancement and self-adversarial training. Mosaic data enhancement stitches four pictures together: on each pass, four pictures are read, each is randomly flipped, scaled, or color-shifted, and the four are placed into the four quadrants so that the pictures, and their anchor frames, are combined. Self-adversarial training has two phases: in the first, the neural network modifies the original image; in the second, the network is trained to perform the target detection task on the modified image in the normal way.
Compared with the prior art, the invention has the following beneficial effects:
(1) the method detects targets on the SSD architecture; adopting a MobileNet network structure in the feature extraction part effectively increases detection speed while keeping accuracy high, balancing the speed and accuracy of position detection well, with strong robustness.
(2) The invention combines the DIOU_loss loss function, the Mish activation function, CmBN normalization, and the DIoU-NMS algorithm in the non-maximum suppression module, improving both the accuracy and the speed of target detection;
(3) the method extracts feature maps from shallow convolution layers, improving the detection accuracy of small targets.
(4) The invention has high applicability and wide market application potential.
Drawings
FIG. 1 is a flowchart of an anchor frame-based target detection method according to the present invention.
Detailed Description
The features and advantages of the present invention will become more apparent and appreciated from the following detailed description of the invention.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. While aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
An anchor-frame-based target detection method comprises the following steps:
S1, collecting a target image data set to be detected and labeling it to obtain data set Data0;
S2, augmenting Data0 to obtain data set Data1;
S3, dividing Data1 into a training set and a test set, where the training set contains more samples than the test set;
S4, constructing a target detection neural network model and training it with the training set to obtain a trained model;
S5, predicting image positions with the trained model and outputting the classification results.
Further, the number of samples in data set Data0 in step S1 is at least 200; a single sample includes target images of multiple categories.
Further, the labeling in step S1 is manual: LabelImg software is used to label the position and class of each target frame, the target area is marked with a rectangular frame, and the four vertex coordinates of the rectangle are arranged clockwise starting from the upper-left corner of the image.
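As an illustration of the labeling convention described above, a minimal sketch of the clockwise vertex ordering; it assumes LabelImg-style corner coordinates (x_min, y_min, x_max, y_max) in image coordinates where y grows downward, which is an assumption rather than something the patent specifies:

```python
def rect_vertices_clockwise(x_min, y_min, x_max, y_max):
    """Return the four vertices of an axis-aligned rectangle, ordered
    clockwise starting from the top-left corner (image coordinates,
    y increases downward)."""
    return [
        (x_min, y_min),  # top-left
        (x_max, y_min),  # top-right
        (x_max, y_max),  # bottom-right
        (x_min, y_max),  # bottom-left
    ]
```

For a box with top-left (10, 20) and bottom-right (50, 60), this yields the four corners in clockwise order starting from (10, 20).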
Further, the target detection neural network model in step S4 comprises a feature extraction module, a target prediction module, and a non-maximum suppression module;
the feature extraction module is used for extracting feature maps from the input image stage by stage;
the target prediction module is used for presetting anchor frames on each feature map and predicting positions and categories after feature fusion;
the non-maximum suppression module is used for removing redundant or invalid anchor frames from the image to be detected, retaining the anchor frame with the highest target-class probability, and outputting its position.
Further, the anchor frames are a plurality of bounding boxes of different sizes and aspect ratios generated centered on each pixel; the sizes and aspect ratios are set according to a statistical analysis of the data in the data set.
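The anchor generation described above can be sketched as follows. The stride, sizes, and ratios passed in are illustrative placeholders, since the patent leaves the statistically chosen values unspecified:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, sizes, ratios):
    """Generate (cx, cy, w, h) anchor boxes centered on every cell of a
    feature map. `sizes` are anchor side lengths in input-image pixels and
    `ratios` are width/height aspect ratios; in the patented method both
    would come from a statistical analysis of the data set."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # center of this feature-map cell, in input-image coordinates
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in sizes:
                for r in ratios:
                    # keep the anchor area roughly s*s while varying shape
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)
```

For a 2x2 feature map with stride 16, one size, and two ratios, this produces 8 anchors, two per cell.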
Furthermore, the target detection neural network model adopts the SSD target detection model architecture, with DIOU_loss as the loss function and the Mish function as the activation function.
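A minimal, framework-free sketch of the two functions named here: a scalar Mish activation and a single-pair DIoU loss (1 - IoU plus the squared center distance normalized by the squared diagonal of the smallest enclosing box). A real training loop would use batched tensor versions, but the arithmetic is the same:

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Numerically naive for
    large |x|; fine for illustration."""
    return x * math.tanh(math.log1p(math.exp(x)))

def diou_loss(box1, box2):
    """DIoU loss for two (x1, y1, x2, y2) boxes:
    1 - IoU + d^2 / c^2, where d is the distance between box centers
    and c is the diagonal of the smallest enclosing box."""
    # intersection and union for IoU
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (area1 + area2 - inter)
    # squared distance between box centers
    d2 = ((box1[0] + box1[2]) / 2 - (box2[0] + box2[2]) / 2) ** 2 \
       + ((box1[1] + box1[3]) / 2 - (box2[1] + box2[3]) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    c2 = (max(box1[2], box2[2]) - min(box1[0], box2[0])) ** 2 \
       + (max(box1[3], box2[3]) - min(box1[1], box2[1])) ** 2
    return 1.0 - iou + d2 / c2
```

Unlike plain IoU loss, the distance term still produces a gradient signal when the predicted and ground-truth boxes do not overlap at all.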
Further, the feature extraction part adopts a MobileNet network architecture and the normalization part adopts the CmBN algorithm; the non-maximum suppression module adopts the DIoU-NMS algorithm.
The feature extraction part uses a MobileNet network to keep the model lightweight, and the non-maximum suppression module uses the DIoU-NMS algorithm, which can improve model recall and the detection results. In the MobileNet architecture, the feature maps after convolution layers 3 to 13 are selected for prediction, and anchor frames of different sizes are preset on each feature map.
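The DIoU-NMS step can be sketched as follows: it runs like ordinary greedy NMS, except the suppression criterion subtracts the normalized center distance from the IoU, so overlapping boxes with well-separated centers are more likely to both survive. The 0.5 threshold is an illustrative assumption, not a value from the patent:

```python
def diou_nms(boxes, scores, threshold=0.5):
    """Greedy DIoU-NMS sketch. boxes: list of (x1, y1, x2, y2);
    scores: matching confidence list. Returns indices of kept boxes."""
    def diou(a, b):
        # standard IoU
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        iou = inter / union
        # penalty: squared center distance over squared enclosing diagonal
        d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
           + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
        c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
           + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
        return iou - d2 / c2

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining box
        keep.append(i)
        # suppress boxes whose DIoU with the kept box exceeds the threshold
        order = [j for j in order if diou(boxes[i], boxes[j]) <= threshold]
    return keep
```

In the example below, the second box heavily overlaps the first and is suppressed, while the distant third box survives.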
Further, the target prediction module selects the feature maps after convolution layers 3 to 13 in the MobileNet architecture for position prediction.
Further, in step S3, the ratio of the training set Data11 to the test set Data12 is 9:1, 8:2, or 7:3.
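A simple sketch of the split described above, defaulting to the 9:1 ratio (8:2 and 7:3 work by changing `train_ratio`); the shuffle-then-slice strategy and the fixed seed are implementation assumptions:

```python
import random

def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle a sample list and split it so the training set holds
    `train_ratio` of the data (9:1 by default)."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]
```

With the 200-sample data set mentioned in step S1, a 9:1 split yields 180 training and 20 test samples.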
Further, in step S2, the augmentation methods are Mosaic data enhancement and self-adversarial training. Mosaic data enhancement stitches four pictures together: on each pass, four pictures are read, each is randomly flipped, scaled, or color-shifted, and the four are placed into the four quadrants so that the pictures, and their anchor frames, are combined. Self-adversarial training has two phases: in the first, the neural network modifies the original image; in the second, the network is trained to perform the target detection task on the modified image in the normal way.
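A much-simplified Mosaic sketch: it only tiles four resized images into the quadrants of one canvas around a random split point. The per-tile flips, scaling, color-gamut changes, and anchor-frame remapping described above are deliberately omitted, and the 512-pixel output size is an assumption borrowed from the embodiment:

```python
import numpy as np

def mosaic_combine(imgs, out_size=512, seed=0):
    """Tile four HxWx3 uint8 images into the four quadrants of one
    out_size x out_size canvas around a random split point (cx, cy)."""
    rng = np.random.default_rng(seed)
    # random split point, kept away from the borders
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    quads = [(0, cy, 0, cx), (0, cy, cx, out_size),
             (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(imgs, quads):
        h, w = y1 - y0, x1 - x0
        # nearest-neighbour resize via index sampling (no cv2 dependency)
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas
```

A full implementation would also transform each tile's bounding boxes into canvas coordinates and clip them at the quadrant edges.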
Example 1
Fig. 1 shows a target detection method for a flower picking robot; the specific steps are as follows.
S1: the data set used in this embodiment consists of 200 rose images collected manually in-house for training the target detection model, where each image contains rose targets of multiple categories. The images were shot omnidirectionally from multiple angles, and three growth-stage categories were collected: flower buds, unopened blossoms, and large open flowers.
The acquired data set is labeled manually to obtain data set Data0. During labeling, each flower area is marked with a rectangular frame, i.e., the four vertex coordinates of the target are recorded, arranged clockwise starting from the upper-left corner of the image.
S2: the labeled data set is augmented by applying Mosaic data enhancement and self-adversarial training to Data0; in the Mosaic method, four pictures are randomly flipped, scaled, or color-shifted and then combined, and the output is the augmented data set Data1 at a fixed augmentation factor.
S3: the augmented data set Data1 is divided into a training set and a test set at a ratio of 9:1.
S4: a target detection neural network model is constructed and trained with the labeled training set to obtain the optimal trained model. The neural network comprises three modules: a feature extraction module, a target prediction module, and a non-maximum suppression module. The model adopts the SSD (Single Shot MultiBox Detector) architecture; in MobileNet, the feature maps after convolution layers 3 to 13 are selected for prediction, and several anchor frames of different sizes are preset on each feature map. The loss function during training is DIOU_loss.
S5: based on the trained neural network model, flower positions are predicted in flower images.
As shown in Table 1, the invention adopts the MobileNet architecture within the SSD algorithm for image feature extraction. A single qualifying image is input to the MobileNet network; in this embodiment, the training-set images are first resized to a width, height, and channel count of 512 × 512 × 3. Exploiting MobileNet's depthwise separable convolution, the original standard 3 × 3 convolution is replaced by two new convolutions: a depthwise convolution, in which a 3 × 3 kernel convolves each input channel separately so that each channel outputs its own feature map, and a pointwise convolution, in which 1 × 1 kernels fuse those per-channel feature maps into the final output. The numbers of kernels in the depthwise convolution layers are 32, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024; the numbers of kernels in the pointwise convolutions are 32, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 256, 1024. The output of the 14 depthwise separable convolution layers is fed to an average pooling layer, whose result is fed to a fully connected layer, producing a 1 × 1000 feature vector; a classifier then classifies the extracted flower features.
Table 1 MobileNet network configuration diagram
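The depthwise-separable factorization described above (and summarized in Table 1) can be sketched numerically. This naive NumPy version uses stride 1 and no padding purely to show how the per-channel depthwise step and the 1 × 1 pointwise fusion compose; it is an illustration, not the patent's implementation:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable convolution sketch (stride 1, no padding).
    x: (H, W, C_in) input; dw_kernels: (k, k, C_in), one kxk filter per
    input channel; pw_kernels: (C_in, C_out), the 1x1 pointwise filters
    that fuse the per-channel maps into the final output."""
    H, W, C_in = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    # depthwise step: each channel is convolved with its own kernel
    dw = np.zeros((Ho, Wo, C_in))
    for i in range(Ho):
        for j in range(Wo):
            patch = x[i:i + k, j:j + k, :]             # (k, k, C_in)
            dw[i, j] = (patch * dw_kernels).sum(axis=(0, 1))
    # pointwise step: a 1x1 convolution is a matrix product over channels
    return dw @ pw_kernels                             # (Ho, Wo, C_out)
```

The factorization replaces the k*k*C_in*C_out multiplications of a standard convolution per output position with k*k*C_in + C_in*C_out, which is the source of MobileNet's speedup.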
The method has the advantages of high detection speed and high precision. Because a flower picking robot must pick in real time, it places high demands on image target detection speed; adopting a MobileNet network architecture in the feature extraction part of the constructed target detection model greatly increases detection speed. The invention also meets the application requirements for detection precision and has strong robustness.
The method can effectively detect the flower images and has wide market application potential.
The invention has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to be construed in a limiting sense. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the technical solution of the present invention and its embodiments without departing from the spirit and scope of the present invention, which fall within the scope of the present invention. The scope of the invention is defined by the appended claims.
Those skilled in the art will appreciate that those matters not described in detail in the present specification are well known in the art.

Claims (10)

1. An anchor-frame-based target detection method, characterized by comprising the following steps:
S1, collecting a target image data set to be detected and labeling it to obtain data set Data0;
S2, augmenting Data0 to obtain data set Data1;
S3, dividing Data1 into a training set and a test set, where the training set contains more samples than the test set;
S4, constructing a target detection neural network model and training it with the training set to obtain a trained model;
S5, predicting image positions with the trained model.
2. The anchor-frame-based target detection method as claimed in claim 1, wherein the number of samples in data set Data0 in step S1 is at least 200, and a single sample includes target images of multiple categories.
3. The anchor-frame-based target detection method according to claim 1, wherein the labeling in step S1 is manual: the target area is marked with a rectangular frame, and the four vertex coordinates of the rectangle are arranged clockwise starting from the upper-left corner of the image.
4. The anchor-frame-based target detection method of claim 1, wherein the augmentation in step S2 uses Mosaic data enhancement and self-adversarial training; in Mosaic data enhancement, four pictures are randomly flipped, scaled, or color-shifted, and then the pictures and their anchor frames are combined.
5. The anchor-frame-based target detection method of claim 1, wherein the target detection neural network model in step S4 comprises a feature extraction module, a target prediction module, and a non-maximum suppression module;
the feature extraction module is used for extracting feature maps from the input image stage by stage;
the target prediction module is used for presetting anchor frames on each feature map and predicting positions and categories after feature fusion;
the non-maximum suppression module is used for removing redundant or invalid anchor frames from the image to be detected, retaining the anchor frame with the highest target-class probability, and outputting its position.
6. The anchor-frame-based target detection method according to claim 5, wherein the anchor frames are a plurality of bounding boxes of different sizes and aspect ratios generated centered on each pixel, the sizes and aspect ratios being set according to a statistical analysis of the data in the data set.
7. The anchor-frame-based target detection method of claim 1, wherein the target detection neural network model adopts the SSD target detection model architecture, the model loss function is DIOU_loss, and the activation function in the convolution layers is the Mish function.
8. The anchor-frame-based target detection method according to claim 5, wherein the feature extraction part adopts a MobileNet network architecture and the normalization part adopts the CmBN algorithm; the non-maximum suppression module adopts the DIoU-NMS algorithm.
9. The anchor-frame-based target detection method according to claim 5, wherein the target prediction module is configured to select the feature maps after convolution layers 3 to 13 in the MobileNet architecture for position prediction.
10. The anchor-frame-based target detection method as claimed in claim 1, wherein in step S3, the ratio of the training set Data11 to the test set Data12 is 9:1, 8:2, or 7:3.
CN202110649575.6A 2021-06-10 2021-06-10 Target detection method based on anchor frame Pending CN113591901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110649575.6A CN113591901A (en) 2021-06-10 2021-06-10 Target detection method based on anchor frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110649575.6A CN113591901A (en) 2021-06-10 2021-06-10 Target detection method based on anchor frame

Publications (1)

Publication Number Publication Date
CN113591901A true CN113591901A (en) 2021-11-02

Family

ID=78243673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110649575.6A Pending CN113591901A (en) 2021-06-10 2021-06-10 Target detection method based on anchor frame

Country Status (1)

Country Link
CN (1) CN113591901A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263819A (en) * 2019-05-28 2019-09-20 中国农业大学 A kind of object detection method and device for shellfish image
CN110378420A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 A kind of image detecting method, device and computer readable storage medium
CN111178182A (en) * 2019-12-16 2020-05-19 深圳奥腾光通系统有限公司 Real-time detection method for garbage loss behavior
CN112215861A (en) * 2020-09-27 2021-01-12 深圳市优必选科技股份有限公司 Football detection method and device, computer readable storage medium and robot
CN112270381A (en) * 2020-11-16 2021-01-26 电子科技大学 People flow detection method based on deep learning
CN112836623A (en) * 2021-01-29 2021-05-25 北京农业智能装备技术研究中心 Facility tomato farming decision auxiliary method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhaohui Zheng et al.: "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression", arXiv:1911.08287v1 [cs.CV], pages 142-157 *

Similar Documents

Publication Publication Date Title
Fu et al. Fast and accurate detection of kiwifruit in orchard using improved YOLOv3-tiny model
Sadeghi-Tehran et al. DeepCount: in-field automatic quantification of wheat spikes using simple linear iterative clustering and deep convolutional neural networks
Rustia et al. Automatic greenhouse insect pest detection and recognition based on a cascaded deep learning classification method
Chen et al. Detecting citrus in orchard environment by using improved YOLOv4
Wang et al. YOLOv3‐Litchi Detection Method of Densely Distributed Litchi in Large Vision Scenes
CN106372648A (en) Multi-feature-fusion-convolutional-neural-network-based plankton image classification method
CN110770752A (en) Automatic pest counting method combining multi-scale feature fusion network with positioning model
CN111340141A (en) Crop seedling and weed detection method and system based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN112215795B (en) Intelligent detection method for server component based on deep learning
CN112529090B (en) Small target detection method based on improved YOLOv3
Wen et al. Wheat spike detection and counting in the field based on SpikeRetinaNet
CN113487576B (en) Insect pest image detection method based on channel attention mechanism
Wang et al. Field rice panicle detection and counting based on deep learning
CN111178177A (en) Cucumber disease identification method based on convolutional neural network
CN110059539A (en) A kind of natural scene text position detection method based on image segmentation
US10304194B2 (en) Method for quantifying produce shape
CN114821102A (en) Intensive citrus quantity detection method, equipment, storage medium and device
CN114781514A (en) Floater target detection method and system integrating attention mechanism
CN109242826B (en) Mobile equipment end stick-shaped object root counting method and system based on target detection
CN114140665A (en) Dense small target detection method based on improved YOLOv5
CN114548208A (en) Improved plant seed real-time classification detection method based on YOLOv5
CN116385374A (en) Cell counting method based on convolutional neural network
Lu et al. Citrus green fruit detection via improved feature network extraction
CN109615610B (en) Medical band-aid flaw detection method based on YOLO v2-tiny

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination