CN114973002A - Improved YOLOv5-based ear detection method - Google Patents

Improved YOLOv5-based ear detection method

Info

Publication number
CN114973002A
CN114973002A
Authority
CN
China
Prior art keywords
network model
improved
ear
detection
yolov5 network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210705045.3A
Other languages
Chinese (zh)
Inventor
赵晋陵
戴飞杰
雷雨
黄林生
汪传建
梁栋
黄文江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202210705045.3A priority Critical patent/CN114973002A/en
Publication of CN114973002A publication Critical patent/CN114973002A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/188Vegetation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention relates to an improved YOLOv5-based ear detection method comprising the following steps: obtaining wheat ear images and labeling them to obtain a wheat ear image data set, and dividing the data set into a training set and a testing set; constructing a YOLOv5 network model; improving the YOLOv5 network model to obtain an improved YOLOv5 network model; inputting the training set into the improved YOLOv5 network model and training it; and evaluating and testing the improved YOLOv5 network model. The invention uses four-scale feature detection, adding a shallow detection scale to improve the recognition accuracy of small targets; it introduces a CA attention mechanism to improve the feature extraction capability of the algorithm; and it introduces CIOU_Loss as the Bounding Box Regression Loss of the algorithm's loss function, improving the accuracy of the detection boxes.

Description

Improved YOLOv5-based ear detection method
Technical Field
The invention relates to the technical field of smart agriculture and information-based agriculture, in particular to an improved YOLOv5-based ear detection method.
Background
Wheat is the grain crop with the largest share of world trade and one of the main grain crops in China. To ensure national grain security and establish reasonable grain prices and macro-regulation policies, the expected yield of crops such as wheat must be predicted and estimated accurately and in time. At present, wheat ear counting mostly relies on manual field surveys, which are time-consuming, labor-intensive, costly and limited in coverage. Accurate, efficient and non-destructive identification of wheat ears is therefore of great practical significance for wheat production, breeding and related work.
In recent years, deep learning-based methods have been applied more and more widely in fields such as vision systems, speech detection and document analysis. Compared with manual feature extraction, deep learning can process images with multilayer neural networks to obtain both local and deep information. Among current deep learning detectors, YOLOv5 is widely used for its excellent performance, real-time speed and accuracy. However, the original YOLOv5 still suffers from low precision when detecting small or occluded targets. A target detection algorithm that is fast and accurate on small, occluded targets is therefore required.
Disclosure of Invention
The invention aims to provide an improved YOLOv5-based ear detection method which can improve the recognition accuracy of small targets and the accuracy of the detection boxes.
In order to achieve this purpose, the invention adopts the following technical scheme: an improved YOLOv5-based ear detection method comprising the sequential steps of:
(1) the method comprises the steps of obtaining a wheat ear image, marking the wheat ear image to obtain a wheat ear image data set, and dividing the wheat ear image data set into a training set and a testing set;
(2) constructing a YOLOv5 network model;
(3) improving a YOLOv5 network model to obtain an improved YOLOv5 network model;
(4) inputting the training set into the improved YOLOv5 network model, and training the improved YOLOv5 network model;
(5) the improved YOLOv5 network model was evaluated and tested.
The step (1) specifically comprises the following steps:
(1a) a DJI Phantom 4 Pro unmanned aerial vehicle carrying an FC6310 camera is used to collect 1500 wheat ear images with a resolution of 5472 × 3648 pixels;
(1b) the wheat ear images are annotated with a data labeling tool: the position of each wheat ear in the image is marked with a rectangular box and saved in YOLO format, yielding the wheat ear image data set;
(1c) the labeled wheat ear image data set is divided into a training set and a testing set at a ratio of 7:3.
In step (2), the YOLOv5 network model includes:
the Input end, which performs Mosaic data enhancement and adaptive picture scaling;
the Backbone network, which comprises a Conv structure, a C3 structure and an SPP structure;
the Neck network, which adopts the FPN + PAN structure: the FPN structure transmits strong semantic feature information from top to bottom, and the PAN structure adds an upward feature pyramid behind the FPN structure to transmit strong localization information from bottom to top;
the Prediction output layer, which comprises the Bounding Box loss function and NMS non-maximum suppression, the Bounding Box loss function adopting the GIOU_Loss function.
The step (3) specifically comprises the following steps:
(3a) introducing a CA attention mechanism between the C3 structure and the SPP structure of the Backbone network of the YOLOv5 network model;
(3b) adding a 160 × 160 detection scale to the YOLOv5 network model, expanding detection from three scales to four; convolution layers and upsampling are added after the 80 × 80 feature layer, the twice-upsampled feature layer is fused with the 160 × 160 feature layer to obtain the fourth, 160 × 160 detection scale, and the anchor boxes are generated automatically with YOLOv5's built-in K-Means algorithm;
(3c) the Bounding Box loss function adopts the CIOU_Loss function, whose formula is:

CIOU\_Loss = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v

where \alpha is a weight coefficient, v measures the consistency of the aspect ratios of the detection box and the ground-truth box, b and b^{gt} denote the center points of the prediction box and the ground-truth box respectively, \rho denotes the Euclidean distance, c denotes the diagonal length of the smallest enclosing rectangle of the two boxes, and IoU denotes the ratio of the intersection area of the two boxes to their union area. The expressions for \alpha and v are:

\alpha = \frac{v}{(1 - IoU) + v}

v = \frac{4}{\pi^{2}}\left(\arctan\frac{\omega^{gt}}{h^{gt}} - \arctan\frac{\omega}{h}\right)^{2}

where \omega^{gt} is the width of the ground-truth box, h^{gt} its height, \omega the width of the detection box, and h its height.
The step (4) specifically comprises the following steps:
(4a) setting the training parameters, namely the network training parameters, which comprise the number of iterations, the batch size, the learning rate and the momentum: the number of iterations is 200, the batch size is 8, the learning rate is 0.01 and the momentum is 0.937;
(4b) setting the YOLOv5 network model parameters, namely the data enhancement parameters, which comprise hsv_h, hsv_s, hsv_v, degrees, flipud and fliplr: hsv_h is 0.015, hsv_s is 0.7, hsv_v is 0.4, degrees is 1.0, flipud is 0.01 and fliplr is 0.5;
the improved YOLOv5 network model is then trained using the network training parameters, the data enhancement parameters and the training set.
The step (5) specifically comprises the following steps: evaluating the improved YOLOv5 network model with the model evaluation indexes of precision P, recall R and mean average precision mAP, whose formulas are:

P = \frac{TP}{TP + FP}

R = \frac{TP}{TP + FN}

AP = \int_{0}^{1} P(R)\,dR

mAP = \frac{1}{M}\sum_{i=1}^{M} AP_{i}

where AP is the average precision, TP is the number of ears correctly predicted by the model, FP is the number of samples in which non-ear regions are identified as ears, FN is the number of samples in which ears are identified as non-ear regions, M is the number of categories, R is the recall, P(R) is the precision corresponding to recall R, and AP_i is the average precision of the i-th category;
finally, the test set is input into the trained improved YOLOv5 network model for testing.
In step (3a), the CA attention mechanism comprises:
(3a1) coordinate information embedding: each channel is encoded along the horizontal and vertical coordinates using pooling layers, yielding a pair of direction-aware feature maps;
(3a2) coordinate information feature map generation: the extracted feature information is first concatenated and transformed with a 1 × 1 convolution to obtain an intermediate feature map; the intermediate feature map is then split along the spatial dimension into two separate tensors, each converted by its own convolution into a tensor with the same number of channels as the input; finally, the outputs are expanded and used as the attention weight values.
According to the technical scheme, the beneficial effects of the invention are as follows: first, the invention uses four-scale feature detection, adding a shallow detection scale to improve the recognition accuracy of small targets; second, it introduces a CA attention mechanism, improving the feature extraction capability of the algorithm; third, it introduces CIOU_Loss as the Bounding Box Regression Loss of the algorithm's loss function, improving the accuracy of the detection boxes.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a block diagram of a YOLOv5 network;
FIG. 3 is a block diagram of a modified YOLOv5 network;
FIG. 4 is a block diagram of a CA attention mechanism;
FIG. 5 is a diagram illustrating the variation of various parameters during the training process.
Detailed Description
As shown in fig. 1, an improved YOLOv5-based ear detection method comprises the following sequential steps:
(1) the method comprises the steps of obtaining a wheat ear image, marking the wheat ear image to obtain a wheat ear image data set, and dividing the wheat ear image data set into a training set and a testing set;
(2) constructing a YOLOv5 network model;
(3) improving a YOLOv5 network model to obtain an improved YOLOv5 network model;
(4) inputting the training set into the improved YOLOv5 network model, and training the improved YOLOv5 network model;
(5) the improved YOLOv5 network model was evaluated and tested.
The step (1) specifically comprises the following steps:
(1a) a DJI Phantom 4 Pro unmanned aerial vehicle carrying an FC6310 camera is used to collect 1500 wheat ear images with a resolution of 5472 × 3648 pixels;
(1b) the wheat ear images are annotated with a data labeling tool: the position of each wheat ear in the image is marked with a rectangular box and saved in YOLO format, yielding the wheat ear image data set;
(1c) the labeled wheat ear image data set is divided into a training set and a testing set at a ratio of 7:3.
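The 7:3 split of step (1c) can be sketched as follows; the file-name pattern and the fixed random seed are illustrative assumptions, not details from the patent:

```python
import random

def split_dataset(image_names, train_ratio=0.7, seed=42):
    """Shuffle the labeled wheat-ear images and split them 7:3
    into a training set and a testing set."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * train_ratio)
    return names[:n_train], names[n_train:]

# With 1500 images, a 7:3 split yields 1050 training and 450 testing images.
images = [f"ear_{i:04d}.jpg" for i in range(1500)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))  # 1050 450
```

Fixing the seed makes the split reproducible across training runs, which matters when comparing the original and improved models on the same test set.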
As shown in fig. 2, in step (2), the YOLOv5 network model includes:
the Input end, which performs Mosaic data enhancement and adaptive picture scaling;
the Backbone network, which comprises a Conv structure, a C3 structure and an SPP structure;
the Neck network, which adopts the FPN + PAN structure: the FPN structure transmits strong semantic feature information from top to bottom, and the PAN structure adds an upward feature pyramid behind the FPN structure to transmit strong localization information from bottom to top;
the Prediction output layer, which comprises the Bounding Box loss function and NMS non-maximum suppression, the Bounding Box loss function adopting the GIOU_Loss function.
Although the YOLOv5 network model performs well in detection accuracy and speed, it still struggles with small and occluded objects.
To address these problems, the invention improves the YOLOv5 network model, raising the precision, recall and average precision and detecting wheat ears better.
The step (3) specifically comprises the following steps:
(3a) a CA attention mechanism is introduced between the C3 structure and the SPP structure of the Backbone network of the YOLOv5 network model, as shown in FIG. 4.
In step (3a), the CA attention mechanism comprises:
(3a1) coordinate information embedding: each channel is encoded along the horizontal and vertical coordinates using pooling layers, yielding a pair of direction-aware feature maps;
(3a2) coordinate information feature map generation: the extracted feature information is first concatenated and transformed with a 1 × 1 convolution to obtain an intermediate feature map; the intermediate feature map is then split along the spatial dimension into two separate tensors, each converted by its own convolution into a tensor with the same number of channels as the input; finally, the outputs are expanded and used as the attention weight values.
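The two CA stages described above can be sketched numerically with NumPy; the shapes, the random 1 × 1 convolution weights (plain matrices here) and the reduction factor are illustrative assumptions — the real module is a trained network layer, not this toy:

```python
import numpy as np

def coordinate_attention(x, reduction=4, rng=None):
    """Sketch of the CA attention mechanism on a (C, H, W) feature map.
    Stage 1: pool along each direction to get a pair of direction-aware maps.
    Stage 2: concatenate, 1x1-transform, split, and expand into attention weights."""
    rng = rng or np.random.default_rng(0)
    C, H, W = x.shape
    mid = max(C // reduction, 1)

    # Stage 1: coordinate information embedding (average pooling per direction).
    pool_h = x.mean(axis=2)  # (C, H): each channel encoded along the vertical axis
    pool_w = x.mean(axis=1)  # (C, W): each channel encoded along the horizontal axis

    # Stage 2: concatenate, then transform with a 1x1 convolution (a matrix here).
    f = np.concatenate([pool_h, pool_w], axis=1)  # (C, H + W)
    w1 = rng.standard_normal((mid, C)) * 0.1
    y = np.maximum(w1 @ f, 0)                     # (mid, H + W) after ReLU

    # Split back into the two directions and restore the channel count.
    y_h, y_w = y[:, :H], y[:, H:]
    w2 = rng.standard_normal((C, mid)) * 0.1
    a_h = 1 / (1 + np.exp(-(w2 @ y_h)))           # (C, H) sigmoid attention along H
    a_w = 1 / (1 + np.exp(-(w2 @ y_w)))           # (C, W) sigmoid attention along W

    # Expand the weights over the missing axis and apply them to the input.
    return x * a_h[:, :, None] * a_w[:, None, :]

out = coordinate_attention(np.ones((8, 4, 4)))
print(out.shape)  # (8, 4, 4)
```

Because the attention weights are factored into a per-row and a per-column map, the module keeps positional information in both directions at far lower cost than a full H × W attention map.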
(3b) A 160 × 160 detection scale is added to the YOLOv5 network model, expanding detection from three scales to four, as shown in fig. 3; convolution layers and upsampling are added after the 80 × 80 feature layer, the twice-upsampled feature layer is fused with the 160 × 160 feature layer to obtain the fourth, 160 × 160 detection scale, and the anchor boxes are generated automatically with YOLOv5's built-in K-Means algorithm;
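The anchor generation mentioned in (3b) clusters the labeled box sizes with K-Means; a simplified plain-Python sketch follows (the box sizes are made-up illustrative values, and YOLOv5's actual autoanchor routine is more elaborate than this):

```python
import random

def kmeans_anchors(boxes, k=3, iters=20, seed=0):
    """Cluster (width, height) pairs of labeled boxes into k anchor sizes,
    a simplified stand-in for YOLOv5's built-in anchor K-Means."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # Assign each box to the nearest center by squared distance.
            j = min(range(k),
                    key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2)
            clusters[j].append((w, h))
        for i, cl in enumerate(clusters):
            if cl:  # Recompute each center as the mean of its cluster.
                centers[i] = (sum(w for w, _ in cl) / len(cl),
                              sum(h for _, h in cl) / len(cl))
    return sorted(centers)

# Illustrative box sizes for small, medium and large wheat ears.
boxes = [(10, 14), (12, 16), (30, 34), (32, 38), (60, 70), (64, 72)]
print(kmeans_anchors(boxes, k=3))
```

Each resulting center becomes one anchor size; with a fourth detection scale, a fourth group of (typically smaller) anchors is assigned to the new 160 × 160 head.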
(3c) the Bounding Box loss function adopts the CIOU_Loss function, whose formula is:

CIOU\_Loss = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v

where \alpha is a weight coefficient, v measures the consistency of the aspect ratios of the detection box and the ground-truth box, b and b^{gt} denote the center points of the prediction box and the ground-truth box respectively, \rho denotes the Euclidean distance, c denotes the diagonal length of the smallest enclosing rectangle of the two boxes, and IoU denotes the ratio of the intersection area of the two boxes to their union area. The expressions for \alpha and v are:

\alpha = \frac{v}{(1 - IoU) + v}

v = \frac{4}{\pi^{2}}\left(\arctan\frac{\omega^{gt}}{h^{gt}} - \arctan\frac{\omega}{h}\right)^{2}

where \omega^{gt} is the width of the ground-truth box, h^{gt} its height, \omega the width of the detection box, and h its height.
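The CIOU_Loss above can be computed directly from box coordinates; a plain-Python sketch, with boxes given as (x1, y1, x2, y2) and illustrative values:

```python
import math

def ciou_loss(box, gt):
    """CIOU_Loss = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha*v for a prediction
    box `box` and a ground-truth box `gt`, both as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    g1, h1, g2, h2 = gt

    # IoU: intersection area over union area.
    iw = max(0.0, min(x2, g2) - max(x1, g1))
    ih = max(0.0, min(y2, h2) - max(y1, h1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (g2 - g1) * (h2 - h1) - inter
    iou = inter / union

    # rho^2: squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (g1 + g2) / 2) ** 2 + ((y1 + y2) / 2 - (h1 + h2) / 2) ** 2
    # c^2: squared diagonal of the smallest enclosing rectangle.
    c2 = (max(x2, g2) - min(x1, g1)) ** 2 + (max(y2, h2) - min(y1, h1)) ** 2

    # v: aspect-ratio consistency term; alpha: its weight coefficient.
    v = (4 / math.pi ** 2) * (math.atan((g2 - g1) / (h2 - h1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v

# Identical boxes give zero loss; a shifted box gives a positive loss.
print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0
print(ciou_loss((0, 0, 10, 10), (2, 2, 12, 12)))
```

Unlike GIOU_Loss, the center-distance and aspect-ratio terms keep the gradient informative even when the predicted box already overlaps the ground truth, which is why the patent swaps the loss in step (3c).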
The step (4) specifically comprises the following steps:
(4a) setting the training parameters, namely the network training parameters, which comprise the number of iterations, the batch size, the learning rate and the momentum: the number of iterations is 200, the batch size is 8, the learning rate is 0.01 and the momentum is 0.937;
(4b) setting the YOLOv5 network model parameters, namely the data enhancement parameters, which comprise hsv_h, hsv_s, hsv_v, degrees, flipud and fliplr: hsv_h is 0.015, hsv_s is 0.7, hsv_v is 0.4, degrees is 1.0, flipud is 0.01 and fliplr is 0.5;
the improved YOLOv5 network model is then trained using the network training parameters, the data enhancement parameters and the training set.
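The parameters of steps (4a)–(4b) can be collected into two configuration dictionaries; the dictionary key names for the training parameters are illustrative choices, while the augmentation keys (hsv_h, hsv_s, hsv_v, degrees, flipud, fliplr) are the names quoted in the patent:

```python
# Network training parameters from step (4a).
train_cfg = {
    "epochs": 200,      # number of training iterations
    "batch_size": 8,
    "lr": 0.01,         # initial learning rate
    "momentum": 0.937,
}

# Data enhancement parameters from step (4b).
augment_cfg = {
    "hsv_h": 0.015,  # hue augmentation fraction
    "hsv_s": 0.7,    # saturation augmentation fraction
    "hsv_v": 0.4,    # value (brightness) augmentation fraction
    "degrees": 1.0,  # random rotation range
    "flipud": 0.01,  # probability of a vertical flip
    "fliplr": 0.5,   # probability of a horizontal flip
}

print(train_cfg["epochs"], augment_cfg["fliplr"])  # 200 0.5
```

Keeping both groups in one place makes the training run reproducible and makes ablations (e.g. disabling a single augmentation) a one-line change.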
The step (5) specifically comprises the following steps: evaluating the improved YOLOv5 network model with the model evaluation indexes of precision P, recall R and mean average precision mAP, whose formulas are:

P = \frac{TP}{TP + FP}

R = \frac{TP}{TP + FN}

AP = \int_{0}^{1} P(R)\,dR

mAP = \frac{1}{M}\sum_{i=1}^{M} AP_{i}

where AP is the average precision, TP is the number of ears correctly predicted by the model, FP is the number of samples in which non-ear regions are identified as ears, FN is the number of samples in which ears are identified as non-ear regions, M is the number of categories, R is the recall, P(R) is the precision corresponding to recall R, and AP_i is the average precision of the i-th category;
finally, the test set is input into the trained improved YOLOv5 network model for testing.
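Under these definitions, the single-class case (M = 1, so mAP equals the wheat-ear AP) can be sketched in plain Python; the detection counts below are illustrative numbers chosen to reproduce P ≈ 0.911 and R = 0.849, not the patent's actual confusion matrix:

```python
def precision(tp, fp):
    """P = TP / (TP + FP): fraction of predicted ears that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """R = TP / (TP + FN): fraction of true ears that are detected."""
    return tp / (tp + fn)

def average_precision(recalls, precisions):
    """AP = integral of P(R) dR, approximated with the trapezoidal rule
    over a monotone list of (recall, precision) points."""
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
    return ap

def mean_average_precision(aps):
    """mAP = (1/M) * sum of AP_i over the M categories."""
    return sum(aps) / len(aps)

# Illustrative counts: 849 ears found, 83 false detections, 151 ears missed.
print(round(precision(849, 83), 3), round(recall(849, 151), 3))  # 0.911 0.849
```

In practice each (recall, precision) point comes from sweeping the confidence threshold over the ranked detections; the sketch above only shows how the four formulas compose.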
The changes of the evaluation indexes of the improved YOLOv5 network model during training are shown in FIG. 5. After training, the improved YOLOv5 network model is tested with the test set: the precision P reaches 91.1%, the recall R reaches 84.9%, mAP(0.5) reaches 91.1% and mAP(0.5:0.9) reaches 48%. Compared with the original YOLOv5, the precision P, recall R, mAP(0.5) and mAP(0.5:0.9) are improved by 0.2%, 1.3%, 1.9% and 1.8% respectively, demonstrating the feasibility of the method.
TABLE 2 Comparison of algorithm performance

Method                       Precision P   Recall R   mAP(0.5)   mAP(0.5:0.9)
YOLOv5                       0.909         0.836      0.892      0.462
Algorithm of the invention   0.911         0.849      0.911      0.48
The results show that, compared with the original algorithm, the algorithm of the invention achieves a better effect.
In summary, the invention uses four-scale feature detection, adding a shallow detection scale to improve the recognition accuracy of small targets; it introduces a CA attention mechanism to improve the feature extraction capability of the algorithm; and it introduces CIOU_Loss as the Bounding Box Regression Loss of the algorithm's loss function, improving the accuracy of the detection boxes.

Claims (7)

1. An improved YOLOv5-based ear detection method, characterized in that the method comprises the following steps in sequence:
(1) the method comprises the steps of obtaining a wheat ear image, marking the wheat ear image to obtain a wheat ear image data set, and dividing the wheat ear image data set into a training set and a testing set;
(2) constructing a YOLOv5 network model;
(3) improving a YOLOv5 network model to obtain an improved YOLOv5 network model;
(4) inputting the training set into the improved YOLOv5 network model, and training the improved YOLOv5 network model;
(5) the improved YOLOv5 network model was evaluated and tested.
2. The improved YOLOv5-based ear detection method according to claim 1, wherein: the step (1) specifically comprises the following steps:
(1a) a DJI Phantom 4 Pro unmanned aerial vehicle carrying an FC6310 camera is used to collect 1500 wheat ear images with a resolution of 5472 × 3648 pixels;
(1b) the wheat ear images are annotated with a data labeling tool: the position of each wheat ear in the image is marked with a rectangular box and saved in YOLO format, yielding the wheat ear image data set;
(1c) the labeled wheat ear image data set is divided into a training set and a testing set at a ratio of 7:3.
3. The improved YOLOv5-based ear detection method according to claim 1, wherein: in step (2), the YOLOv5 network model includes:
the Input end, which performs Mosaic data enhancement and adaptive picture scaling;
the Backbone network, which comprises a Conv structure, a C3 structure and an SPP structure;
the Neck network, which adopts the FPN + PAN structure: the FPN structure transmits strong semantic feature information from top to bottom, and the PAN structure adds an upward feature pyramid behind the FPN structure to transmit strong localization information from bottom to top;
the Prediction output layer, which comprises the Bounding Box loss function and NMS non-maximum suppression, the Bounding Box loss function adopting the GIOU_Loss function.
4. The improved YOLOv5-based ear detection method according to claim 1, wherein: the step (3) specifically comprises the following steps:
(3a) introducing a CA attention mechanism between the C3 structure and the SPP structure of the Backbone network of the YOLOv5 network model;
(3b) adding a 160 × 160 detection scale to the YOLOv5 network model, expanding detection from three scales to four; convolution layers and upsampling are added after the 80 × 80 feature layer, the twice-upsampled feature layer is fused with the 160 × 160 feature layer to obtain the fourth, 160 × 160 detection scale, and the anchor boxes are generated automatically with YOLOv5's built-in K-Means algorithm;
(3c) the Bounding Box loss function adopts the CIOU_Loss function, whose formula is:

CIOU\_Loss = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v

where \alpha is a weight coefficient, v measures the consistency of the aspect ratios of the detection box and the ground-truth box, b and b^{gt} denote the center points of the prediction box and the ground-truth box respectively, \rho denotes the Euclidean distance, c denotes the diagonal length of the smallest enclosing rectangle of the two boxes, and IoU denotes the ratio of the intersection area of the two boxes to their union area. The expressions for \alpha and v are:

\alpha = \frac{v}{(1 - IoU) + v}

v = \frac{4}{\pi^{2}}\left(\arctan\frac{\omega^{gt}}{h^{gt}} - \arctan\frac{\omega}{h}\right)^{2}

where \omega^{gt} is the width of the ground-truth box, h^{gt} its height, \omega the width of the detection box, and h its height.
5. The improved YOLOv5-based ear detection method according to claim 1, wherein: the step (4) specifically comprises the following steps:
(4a) setting the training parameters, namely the network training parameters, which comprise the number of iterations, the batch size, the learning rate and the momentum: the number of iterations is 200, the batch size is 8, the learning rate is 0.01 and the momentum is 0.937;
(4b) setting the YOLOv5 network model parameters, namely the data enhancement parameters, which comprise hsv_h, hsv_s, hsv_v, degrees, flipud and fliplr: hsv_h is 0.015, hsv_s is 0.7, hsv_v is 0.4, degrees is 1.0, flipud is 0.01 and fliplr is 0.5;
the improved YOLOv5 network model is then trained using the network training parameters, the data enhancement parameters and the training set.
6. The improved YOLOv5-based ear detection method according to claim 1, wherein: the step (5) specifically comprises the following steps: evaluating the improved YOLOv5 network model with the model evaluation indexes of precision P, recall R and mean average precision mAP, whose formulas are:

P = \frac{TP}{TP + FP}

R = \frac{TP}{TP + FN}

AP = \int_{0}^{1} P(R)\,dR

mAP = \frac{1}{M}\sum_{i=1}^{M} AP_{i}

where AP is the average precision, TP is the number of ears correctly predicted by the model, FP is the number of samples in which non-ear regions are identified as ears, FN is the number of samples in which ears are identified as non-ear regions, M is the number of categories, R is the recall, P(R) is the precision corresponding to recall R, and AP_i is the average precision of the i-th category;
finally, the test set is input into the trained improved YOLOv5 network model for testing.
7. The improved YOLOv5-based ear detection method according to claim 4, wherein: in step (3a), the CA attention mechanism comprises:
(3a1) coordinate information embedding: each channel is encoded along the horizontal and vertical coordinates using pooling layers, yielding a pair of direction-aware feature maps;
(3a2) coordinate information feature map generation: the extracted feature information is first concatenated and transformed with a 1 × 1 convolution to obtain an intermediate feature map; the intermediate feature map is then split along the spatial dimension into two separate tensors, each converted by its own convolution into a tensor with the same number of channels as the input; finally, the outputs are expanded and used as the attention weight values.
CN202210705045.3A 2022-06-21 2022-06-21 Improved YOLOv5-based ear detection method Pending CN114973002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210705045.3A CN114973002A (en) 2022-06-21 2022-06-21 Improved YOLOv 5-based ear detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210705045.3A CN114973002A (en) 2022-06-21 2022-06-21 Improved YOLOv 5-based ear detection method

Publications (1)

Publication Number Publication Date
CN114973002A true CN114973002A (en) 2022-08-30

Family

ID=82965561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210705045.3A Pending CN114973002A (en) 2022-06-21 2022-06-21 Improved YOLOv 5-based ear detection method

Country Status (1)

Country Link
CN (1) CN114973002A (en)


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439765B (en) * 2022-09-17 2024-02-02 艾迪恩(山东)科技有限公司 Marine plastic garbage rotation detection method based on machine learning unmanned aerial vehicle visual angle
CN115439765A (en) * 2022-09-17 2022-12-06 艾迪恩(山东)科技有限公司 Marine plastic garbage rotation detection method based on machine learning unmanned aerial vehicle visual angle
CN115294556B (en) * 2022-09-28 2022-12-13 西南石油大学 Improved YOLOv5-based method for detecting abnormal flow state fluid on closed vibrating screen
CN115294556A (en) * 2022-09-28 2022-11-04 西南石油大学 Improved YOLOv5-based method for detecting abnormal flow state fluid on closed vibrating screen
CN115578602A (en) * 2022-11-09 2023-01-06 北京信息科技大学 Natural tree species identification method based on improved YOLOv7
CN115861186A (en) * 2022-11-15 2023-03-28 佳源科技股份有限公司 Electric power tower detection method and device based on deep learning and unmanned aerial vehicle equipment
CN116563205A (en) * 2023-03-10 2023-08-08 兰州理工大学 Wheat spike counting detection method based on small target detection and improved YOLOv5
CN116205895A (en) * 2023-03-16 2023-06-02 四川轻化工大学 Transformer oil leakage detection method based on improved YOLOv5
CN116205895B (en) * 2023-03-16 2024-04-02 四川轻化工大学 Transformer oil leakage detection method based on improved YOLOv5
CN116935221A (en) * 2023-07-21 2023-10-24 山东省计算中心(国家超级计算济南中心) Plant protection unmanned aerial vehicle weed deep learning detection method based on Internet of things
CN116978052A (en) * 2023-07-21 2023-10-31 安徽省交通规划设计研究总院股份有限公司 Subgraph layout recognition method of bridge design diagram based on improved YOLOv5
CN116935221B (en) * 2023-07-21 2024-02-13 山东省计算中心(国家超级计算济南中心) Plant protection unmanned aerial vehicle weed deep learning detection method based on Internet of things
CN116978052B (en) * 2023-07-21 2024-04-09 安徽省交通规划设计研究总院股份有限公司 Subgraph layout recognition method of bridge design diagram based on improved YOLOv5

Similar Documents

Publication Publication Date Title
CN114973002A (en) Improved YOLOv5-based ear detection method
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109165623B (en) Rice disease spot detection method and system based on deep learning
CN109345547B (en) Traffic lane line detection method and device based on deep learning multitask network
CN111724355B (en) Image measuring method for abalone body type parameters
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN110310305B (en) Target tracking method and device based on BSSD detection and Kalman filtering
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN115147418B (en) Compression training method and device for defect detection model
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
CN113111722A (en) Automatic driving target identification method based on improved Mask R-CNN
CN115953408B (en) YOLOv 7-based lightning arrester surface defect detection method
CN113313047A (en) Lane line detection method and system based on lane structure prior
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN117576597B (en) Visual identification method and system based on unmanned aerial vehicle driving
CN114821316A (en) Three-dimensional ground penetrating radar crack disease identification method and system
CN115830302B (en) Multi-scale feature extraction fusion power distribution network equipment positioning identification method
CN113011308A (en) Pedestrian detection method introducing attention mechanism
CN116452899A (en) Deep learning-based echocardiographic standard section identification and scoring method
CN115937736A (en) Small target detection method based on attention and context awareness
CN114596433A (en) Insulator identification method
CN114111647A (en) Artificial intelligence-based method and system for measuring damaged area of insulator umbrella skirt
CN114120359A (en) Method for measuring body size of group-fed pigs based on stacked hourglass network
CN113673534A (en) RGB-D image fruit detection method based on fast RCNN
CN114463678A (en) Rainfall type identification method using camera video image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination