CN115019226A - Tea leaf picking and identifying method based on improved YoloV4 model - Google Patents
- Publication number
- CN115019226A (application number CN202210523294.0A)
- Authority
- CN
- China
- Prior art keywords
- feature
- model
- tea
- layer
- picking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06N3/02, G06N3/08—Neural networks; learning methods
- G06V10/40—Extraction of image or video features
- G06V10/764—Recognition using machine-learning classification, e.g. of video objects
- G06V10/766—Recognition using machine-learning regression, e.g. by projecting features on hyperplanes
- G06V10/774—Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Recognition using neural networks
- G06V2201/07—Target detection
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a tea leaf picking and identifying method based on an improved YoloV4 model, and belongs to the technical field of image target detection. The method optimizes the feature extraction of the traditional model, reduces the model calculation amount, realizes image identification on a small control board, reduces the size of the picking equipment's identification module, and is convenient to apply to the classified picking of high-grade tea leaves.
Description
Technical Field
The invention belongs to the technical field of image target detection, and particularly relates to a tea leaf picking and identifying method based on an improved Yolov4 model.
Background
At present, with the continuously increasing demand for tea, large-scale tea plantations are gradually adopting automatic tea picking equipment. Existing tea picking equipment uses a compound cutting blade, with manual assistance, to cut tea leaves growing close together; its picking efficiency is high, and it greatly relieves the labor intensity of tea picking. High-quality tea leaves are divided into multiple picking grades, such as one bud and one leaf, one bud and two leaves, and one bud and three leaves, but existing tea picking equipment generally has no recognition function: its picking mode is undifferentiated, the quality of the picked tea leaves is not selected, and high-quality famous teas still need to be picked manually. To improve the identification accuracy of picking equipment, some equipment applies a traditional visual identification algorithm to grade the picked tea images, but the traditional algorithm has a large model, a large amount of operation data and high requirements on the identification and processing equipment: an industrial personal computer must be configured for image identification, which makes the picking equipment too bulky to work in a tea garden with high planting density. Grading the tea leaves to be picked by identification and judgment on a cloud server places high requirements on the network, and the network infrastructure of some mountain tea gardens is poor, which affects the normal work of the picking equipment.
Disclosure of Invention
In order to overcome the problems in the background art, the invention provides a tea leaf picking and identifying method based on an improved YoloV4 model, which optimizes the feature extraction of the traditional model, reduces the model calculation amount, can realize image identification on a small control board, reduces the size of the picking equipment's identification module, and is convenient to apply to the classified picking of high-grade tea leaves.
In order to achieve the purpose, the invention is realized by the following technical scheme: a tea leaf picking and identifying method based on an improved YoloV4 model comprises the following steps: step 1: collecting tea picture samples, and manually labeling the tea pictures to complete the production of the data set;
step 2: dividing an initial picture sample set and a marked picture sample set into a training set, a verification set and a test set;
step 3: constructing a tea leaf picking grade target recognition model, wherein the tea leaf picking grade target recognition model is an improved YoloV4 target recognition model, and a MobilenetV3 feature extraction network is used for replacing the CSPDarkNet53 feature capture network;
step 4: importing the feature values captured by the feature capture network MobilenetV3 into the feature layer for three convolution operations, importing the feature layer into a spatial pyramid pooling layer, and pooling the feature layer by using maximum pooling layers of different sizes;
step 5: stacking the pooled results and performing convolution three times again, up-sampling the convolved feature layer and stacking it with feature layer 1 and feature layer 2 in the trunk feature extraction network to realize feature fusion, and performing the second-stage down-sampling after the construction of the feature pyramid is completed;
step 6: setting a loss function, adding a cosine annealing attenuation function, and performing iterative training on the tea picking grade target identification model by using a training set until the loss function is converged to obtain a trained tea picking grade target identification model;
step 7: performing performance evaluation on the trained tea picking grade target identification model by using the verification set, and testing again by using the test set after the evaluation reaches the standard;
step 8: importing the evaluated tea leaf picking grade target identification model into a controller, and performing real-time video prediction on the picked tea leaves.
Further, the step 3 comprises the following steps:
step 3.1: setting the convolution blocks in the YoloV4 trunk feature network to depthwise separable convolutions, adopting the Bneck structure, and setting the activation function to H-swish;
step 3.2: resizing the input layer pictures to a uniform size before inputting them into the feature capture network;
step 3.3: changing the original picture into 224 × 224 × 3 as a first feature layer by using the convolution network conv2d structure in MobilenetV3;
step 3.4: changing the first feature layer into 112 × 112 × 16 as a second feature layer by using the residual network bneck3 × 3 structure in MobilenetV3;
step 3.5: changing the second feature layer into 56 × 56 × 24 as a third feature layer by using the residual network bneck5 × 5 structure in MobilenetV3;
step 3.6: changing the third feature layer into 28 × 28 × 40 as a fourth feature layer by using the residual network bneck3 × 3 structure in MobilenetV3;
step 3.7: changing the fourth feature layer into 14 × 14 × 112 as a fifth feature layer by using the residual network bneck3 × 3 structure in MobilenetV3;
step 3.8: changing the fifth feature layer into 1 × 1 × 1280 as the feature output layer by using the pooling structure and the convolution network conv2d, NBN structure in MobilenetV3.
Further, the step 6 comprises the following steps:
step 6.1: setting a loss function according to the data set;
step 6.2: setting the number of iterations to 10000;
step 6.3: the training is divided into two stages, namely a freezing stage and a thawing stage, wherein the first 5000 iterations are set as the freezing stage, and the second 5000 iterations are set as the thawing stage;
step 6.4: and finishing setting and starting training, storing the trained model after 10000 times of iteration, drawing a loss function curve in the iteration process, and selecting the optimal model as a tea leaf picking grade target identification model according to the loss function curve.
Further, the step 6.1 comprises the following steps:
step 6.1.1: using y_true, take out the positions of the points in the feature layer where targets really exist and the corresponding classes of those points;
step 6.1.2: according to the objects assigned to each cluster, repeatedly calculate and update the cluster centers;
step 6.1.3: for each image, calculate the IoU between all real boxes and all predicted boxes;
step 6.1.4: calculate LOSS_CIOU as the regression loss function.
Further, the calculation formula of step 6.1.4 is:

LOSS_CIOU = 1 − IoU + ρ²(b, b_gt)/c² + αv

where ρ²(b, b_gt) represents the Euclidean distance between the center points of the prediction frame b and the real frame b_gt, c represents the diagonal distance of the minimum closure area which can simultaneously contain the prediction frame and the real frame, and α and v are penalty terms for the length-width ratio. The formula of α is as follows:

α = v / ((1 − IoU) + v)

The formula for v is as follows:

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²

where w_gt and h_gt are the width and height of the real box, respectively, and w and h are the width and height of the predicted box, respectively.
The invention has the beneficial effects that: the tea leaf picking and identifying method based on the improved YoloV4 model optimizes the feature extraction of the traditional model, reduces the model calculation amount, can realize image identification on a small control board, reduces the size of the picking equipment's identification module, and is convenient to apply to the classified picking of high-grade tea leaves.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a diagram of the original YoloV4 framework;
FIG. 3 is a schematic structural diagram of MobilenetV3 according to the present invention;
FIG. 4 is a block diagram of the improved YOLOV4 of the present invention;
FIG. 5 is a graph of an example loss function;
FIG. 6 is a graph comparing model performance.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings to facilitate understanding of the skilled person.
The invention discloses a tea leaf picking and identifying method based on an improved YOLOV4 model, which comprises the following steps:
step 1: collecting a tea picture sample, and manually labeling the tea picture to complete data set manufacturing;
In the embodiment, tea picture samples are collected, and the images of tea at the different picking grades (one bud and one leaf, one bud and two leaves, and one bud and three leaves) are manually marked with LabelImg, ensuring that the tea in each image is located at the center of its marking frame. The generated XML file corresponding to each picture is stored in a label folder, completing the production of the data set;
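As an illustration of the annotation format, LabelImg writes Pascal VOC-style XML files; the sketch below shows how such a file could be parsed. The file name, class label and coordinates are invented for the example and are not taken from the patent.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_text):
    """Parse a LabelImg (Pascal VOC) annotation string into (class, box) pairs."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        # Box corners are stored as four child elements of <bndbox>
        coords = tuple(int(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects

# A minimal hypothetical annotation for one tea sample picture
sample = """<annotation>
  <filename>tea_0001.jpg</filename>
  <object><name>one_bud_one_leaf</name>
    <bndbox><xmin>34</xmin><ymin>58</ymin><xmax>120</xmax><ymax>190</ymax></bndbox>
  </object>
</annotation>"""
```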
step 2: dividing the initial picture sample set and the labeled picture sample set into a training set, a verification set and a test set in an appropriate proportion;
In the embodiment, the data set is randomly divided into a training set, a verification set and a test set in the ratio 6:2:2, with the three sets mutually independent. In specific identification, the training set is used for training the model, the verification set is used for verifying the performance of the model after training is completed, and the test set is used for drawing the loss function curve.
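The 6:2:2 random split described above can be sketched in a few lines of pure Python; the function name and seed are arbitrary choices for the illustration.

```python
import random

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split samples into disjoint train / verification / test subsets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(100))
```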
Step 3: a tea leaf picking grade target recognition model is constructed; the tea leaf picking grade target recognition model is an improved YoloV4 target recognition model in which a MobilenetV3 feature extraction network replaces the CSPDarkNet53 feature capture network. A schematic structural diagram of the MobilenetV3 adopted in this embodiment is shown in fig. 3.
In the traditional YoloV4 target recognition model, CSPDarkNet53 is selected as the feature extraction network; the structure of the original YoloV4 target recognition model is shown in FIG. 2. The CSPDarkNet53 feature extraction network model is large, poorly targeted and demanding on processing and analysis equipment, so the CSPDarkNet53 feature capture network is replaced by the MobilenetV3 feature extraction network. This optimizes the feature extraction of the traditional model, reduces the model operation amount, allows image recognition to be realized on a small control board, reduces the size of the picking equipment's recognition module, and makes the model convenient to apply to the classified picking of high-grade tea leaves.
Step 3.1: the convolution blocks in the YoloV4 backbone feature network are set to depthwise separable convolutions, the Bneck structure is adopted, and the activation function is set to H-swish. In the trunk feature extraction network CSPDarknet53 of YoloV4, the convolution block is DarknetConv2D_BN_Mish with the Mish activation function; this convolution block is replaced by a depthwise separable convolution adopting the Bneck structure, and the H-swish activation function replaces the Mish activation function of CSPDarknet53. The structure of the MobilenetV3 of the invention is shown schematically in FIG. 3, and the framework of the improved YoloV4 of the invention is shown in FIG. 4, in which the larger CSPDarknet53 feature capture network of YoloV4 is replaced by MobilenetV3.
Step 3.2: resizing the input layer pictures to a uniform size before inputting them into the feature capture network;
step 3.3: changing the original picture into 224 × 224 × 3 as a first feature layer by using the convolution network conv2d structure in MobilenetV3;
step 3.4: changing the first feature layer into 112 × 112 × 16 as a second feature layer by using the residual network bneck3 × 3 structure in MobilenetV3;
step 3.5: changing the second feature layer into 56 × 56 × 24 as a third feature layer by using the residual network bneck5 × 5 structure in MobilenetV3;
step 3.6: changing the third feature layer into 28 × 28 × 40 as a fourth feature layer by using the residual network bneck3 × 3 structure in MobilenetV3;
step 3.7: changing the fourth feature layer into 14 × 14 × 112 as a fifth feature layer by using the residual network bneck3 × 3 structure in MobilenetV3;
step 3.8: changing the fifth feature layer into 1 × 1 × 1280 as the feature output layer by using the pooling structure and the convolution network conv2d, NBN structure in MobilenetV3.
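Step 3.1 above replaces the Mish activation with H-swish. As a small numpy sketch (illustrative only, not the patent's code), the hard-swish activation used throughout MobilenetV3 can be written as:

```python
import numpy as np

def relu6(x):
    """ReLU clipped at 6, the building block of hard activations."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    """Hard swish: x * ReLU6(x + 3) / 6, a cheap approximation of swish."""
    return x * relu6(x + 3.0) / 6.0
```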
Step 4: the feature values captured by the feature capture network MobilenetV3 are imported into the feature layer for three convolution operations; the feature layer is then imported into a spatial pyramid pooling layer and pooled by maximum pooling layers of different sizes;
In this embodiment, the feature value captured by the feature capture network MobilenetV3 is introduced into the feature layer for three convolution operations, the feature layer is introduced into a spatial pyramid pooling (SPP) layer, and the feature layer is pooled by maximum pooling layers of different kernel sizes (5 × 5, 9 × 9 and 13 × 13).
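The SPP step above amounts to stride-1 max pooling at the three kernel sizes, concatenated with the input along the channel axis. The following is a slow but readable numpy illustration under that reading; the function names are invented.

```python
import numpy as np

def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on an (H, W, C) feature map."""
    pad = k // 2
    # -inf padding so padded cells never win the max
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), constant_values=-np.inf)
    h, w, c = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp_block(x, kernels=(5, 9, 13)):
    """SPP: concatenate the input with its pooled copies along the channel axis."""
    return np.concatenate([x] + [max_pool_same(x, k) for k in kernels], axis=-1)

feat = np.zeros((13, 13, 2))   # toy feature map
pooled = spp_block(feat)       # channels grow 4x: identity + three pool sizes
```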
Step 5: the pooled results are stacked and convolved three times again; the convolved feature layer is up-sampled and stacked with feature layer 1 and feature layer 2 in the trunk feature extraction network to realize feature fusion, and the second-stage down-sampling is performed after the construction of the feature pyramid is completed. The purpose of the repeated up-sampling and down-sampling is to stack the feature layers so as to obtain better features.
Step 6: setting a loss function, adding a cosine annealing attenuation function, and performing iterative training on the tea picking grade target identification model by using a training set until the loss function is converged to obtain a trained tea picking grade target identification model;
In this example, step 6.1: a loss function is set according to the data set.
Step 6.1.1: using y_true, the positions of the points in the feature layer where targets really exist, and the corresponding classes of those points, are taken out.
Step 6.1.2: according to the class cluster objects distributed to the class clusters, repeatedly calculating and updating the class cluster clustering center;
step 6.1.3: when for each graph, IoU for all real and predicted blocks are calculated;
step 6.1.4: calculating LOSS CIOU As a function of the loss.
The IoU parameter calculated in step 6.1.3 represents the cross-over ratio, which is the most common index in target detection, IoU can be used to determine the positive sample and the negative sample, and can also be used to evaluate the distance between the prediction frame and the real frame, but IoU cannot accurately reflect the overlap ratio of the real frame and the prediction frame, LOSS CIOU And considering the distance, the overlapping rate, the scale and the penalty term between the prediction box and the real box, so that the prediction box regression becomes more stable.
The calculation formula of step 6.1.4 is:

LOSS_CIOU = 1 − IoU + ρ²(b, b_gt)/c² + αv

where ρ²(b, b_gt) represents the Euclidean distance between the center points of the prediction frame b and the real frame b_gt, c represents the diagonal distance of the minimum closure area which can simultaneously contain the prediction frame and the real frame, and α and v are penalty terms for the length-width ratio. The formula of α is as follows:

α = v / ((1 − IoU) + v)

The formula for v is as follows:

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²

where w_gt and h_gt are the width and height of the real box, respectively, and w and h are the width and height of the predicted box, respectively.
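The CIoU loss described here can be written out directly for a pair of axis-aligned boxes. This is a plain-Python sketch following the standard CIoU definition, not code from the patent.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss for boxes given as (x1, y1, x2, y2): prediction vs ground truth."""
    # Intersection and union -> IoU
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area(box_p) + area(box_g) - inter)
    # Squared center distance rho^2 and enclosing-box diagonal c^2
    cx_p, cy_p = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cx_g, cy_g = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    ex1, ey1 = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    ex2, ey2 = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # Aspect-ratio penalty v and its weight alpha
    w_p, h_p = box_p[2] - box_p[0], box_p[3] - box_p[1]
    w_g, h_g = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is zero, and it grows as the boxes drift apart in position, scale or aspect ratio.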
Step 6.2: setting the number of iterations to 10000;
step 6.3: the training is divided into two stages, a freezing stage and a thawing stage; the first 5000 iterations are set as the freezing stage and the last 5000 iterations as the thawing stage. During the freezing stage the feature extraction network is not changed, the occupied video memory is small, and the network is only fine-tuned; during the thawing stage the backbone of the model is no longer frozen, all parameters of the network are updated, and the video memory occupied by the network increases. This iteration mode reduces the amount of data to be processed during training and lowers the model's requirement on the processor.
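The freeze/thaw schedule can be illustrated with a toy parameter list. In a real framework one would flip requires_grad or layer.trainable; the dict layout and layer names here are invented for the illustration.

```python
def set_backbone_trainable(params, trainable):
    """Toggle the 'trainable' flag on every backbone parameter entry."""
    for p in params:
        if p["layer"].startswith("backbone"):
            p["trainable"] = trainable
    return params

model = [{"layer": "backbone.conv1", "trainable": True},
         {"layer": "head.conv", "trainable": True}]

# Freezing stage (first 5000 iterations): backbone fixed, only the head fine-tuned
set_backbone_trainable(model, False)
# Thawing stage (last 5000 iterations) would call set_backbone_trainable(model, True)
```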
Step 6.4: and finishing setting and starting training, storing the trained model after 10000 times of iteration, drawing a loss function curve in the iteration process, and selecting the optimal model as a tea leaf picking grade target identification model according to the loss function curve.
In this embodiment, the loss function curve is shown in fig. 5, and the loss function curve is divided into a Train loss function curve and a val loss function curve, where the Train loss function curve represents the loss value of the whole training set; the val loss function curve represents the loss value for the entire test set. When the model is trained, the calculated loss function curve has approximately the following relationship:
when Train loss decreases, val loss stabilizes: network overfitting;
when Train loss stabilizes, val loss decreases: if the data set has serious problems, whether the label file has annotation errors or the data set is poor in quality can be checked, and the tea sample picture is reselected for annotation;
when Train loss decreases, val _ loss decreases: training is normal, and the model can be selected as a tea picking grade target identification model.
And 7: performing performance evaluation on the trained tea picking grade target identification model by using a verification set, and testing again by using a test set after the evaluation reaches the standard;
In this embodiment, the verification set is used to evaluate the performance of the trained tea picking grade target identification model; the comparison between the improved algorithm and the original YoloV4 is shown in Table 1.
TABLE 1 evaluation table comparing improved algorithm with original YoloV4 test result
According to the experimental results, the redesigned tea picking grade target identification model is improved in both detection speed and accuracy: compared with the original model, the accuracy is improved by 6.89% and the detection speed by 6.4 times. The improved YoloV4 can therefore detect the tea leaf picking grade more accurately and effectively in different detection scenes.
Step 8: the evaluated tea leaf picking grade target identification model is imported into a controller, and real-time video prediction is performed on the picked tea leaves.
In this example, a Raspberry Pi is adopted as the controller. The Raspberry Pi is a small ARM-based microcomputer board, convenient to install in small tea picking equipment for automatic picking grade identification of high-quality tea leaves. After the internal system of the Raspberry Pi is installed, the identification system adopts the officially provided Raspbian OS; development is carried out with the Python libraries in Raspbian OS, a camera detection algorithm is written and called on the basis of the model calling algorithm, and real-time video detection and identification are performed by connecting a camera sensor to the Raspberry Pi's USB port.
The model generated in step 6.4 is loaded into a specified folder on the Raspberry Pi for model calling. The generated model is imported into the Raspberry Pi, the Raspberry Pi is connected to a camera through its USB interface, and the camera and the previously generated model program are called in the Raspberry Pi environment; all operations of real-time video prediction are realized through the pictures captured by the camera in real time and calls to the previously generated model.
Finally, it is noted that the above preferred embodiments are merely illustrative of the technical solutions of the invention and not restrictive, and that, although the invention has been described in detail with reference to the above preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention.
Claims (5)
1. A tea leaf picking and identifying method based on an improved YoloV4 model is characterized by comprising the following steps:
Step 1: collecting tea picture samples, and manually labeling the tea pictures to complete production of the data set;
Step 2: dividing the initial picture sample set and the labeled picture sample set into a training set, a verification set and a test set;
Step 3: constructing a tea leaf picking grade target recognition model, wherein the tea leaf picking grade target recognition model is an improved YoloV4 target recognition model in which the MobilenetV3 feature extraction network replaces the CSPDarkNet53 feature extraction network;
Step 4: importing the feature values extracted by the MobilenetV3 feature extraction network into a feature layer, performing 3 convolution operations, importing the result into a spatial pyramid pooling layer, and pooling the feature layer with maximum pooling layers of different sizes;
Step 5: stacking the pooled results and performing 3 further convolutions, up-sampling the convolved feature layer and stacking it with feature layer 1 and feature layer 2 of the trunk feature extraction network to realize feature fusion, and performing the second-stage down-sampling after the construction of the feature pyramid is completed;
Step 6: setting a loss function, adding a cosine annealing decay function, and iteratively training the tea leaf picking grade target recognition model with the training set until the loss function converges, to obtain a trained tea leaf picking grade target recognition model;
Step 7: evaluating the performance of the trained tea leaf picking grade target recognition model with the verification set, and, after the evaluation reaches the standard, testing again with the test set;
Step 8: importing the evaluated tea leaf picking grade target recognition model into a controller, and performing real-time video prediction on the tea leaves to be picked.
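Steps 4 and 5 describe YoloV4's spatial pyramid pooling (SPP) neck: pooling one feature layer with several maximum pooling kernels and stacking the results. A minimal NumPy sketch of that idea follows; the kernel sizes 5, 9, 13 are the usual YoloV4 defaults and are an assumption here, since the claim only specifies "maximum pooling layers with different sizes".

```python
import numpy as np

def max_pool_same(x, k):
    """Max-pool a (H, W, C) feature map with odd kernel k,
    stride 1 and 'same' padding, so the spatial size is kept."""
    pad = k // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)),
                constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x, kernels=(5, 9, 13)):
    """Spatial pyramid pooling: pool with several kernel sizes and
    stack the results (plus the input itself) along the channel axis."""
    pooled = [max_pool_same(x, k) for k in kernels]
    return np.concatenate(pooled + [x], axis=-1)

feat = np.random.rand(13, 13, 8)   # small stand-in feature layer
out = spp(feat)
print(out.shape)                   # channels are multiplied by 4
```

Because every pooling branch keeps the spatial size, only the channel dimension grows, which is what allows the stacked result to be convolved again in step 5.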
2. The tea leaf picking recognition method based on the improved YoloV4 model as claimed in claim 1, wherein step 3 comprises the following steps:
Step 3.1: setting the convolution blocks in the YoloV4 backbone feature network as depthwise separable convolutions, adopting the Bneck structure, and setting the activation function as H-swish;
Step 3.2: resizing the input pictures to a uniform size before feeding them into the feature extraction network;
Step 3.3: changing the original picture into 224 × 224 × 3 as the first feature layer, using the convolution network conv2d structure in MobilenetV3;
Step 3.4: changing the first feature layer into 112 × 112 × 16 as the second feature layer, using the residual network bneck3 × 3 structure in MobilenetV3;
Step 3.5: changing the second feature layer into 56 × 56 × 24 as the third feature layer, using the residual network bneck5 × 5 structure in MobilenetV3;
Step 3.6: changing the third feature layer into 28 × 28 × 40 as the fourth feature layer, using the residual network bneck3 × 3 structure in MobilenetV3;
Step 3.7: changing the fourth feature layer into 14 × 14 × 112 as the fifth feature layer, using the residual network bneck3 × 3 structure in MobilenetV3;
Step 3.8: changing the fifth feature layer into 1 × 1 × 1280 as the feature output layer, using pooling and the convolution networks conv2d and NBN in MobilenetV3.
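The substitution in step 3.1 is the main source of the model's weight reduction. A small sketch (illustrative only, not the patented code) shows the H-swish activation of MobilenetV3 and the parameter saving of a depthwise separable convolution versus a standard 3 × 3 convolution at the 112-channel stage:

```python
import numpy as np

def h_swish(x):
    """H-swish activation used in MobilenetV3: x * ReLU6(x + 3) / 6."""
    return x * np.clip(x + 3, 0, 6) / 6

def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def dw_separable_params(c_in, c_out, k):
    """Depthwise (k x k per input channel) + pointwise (1 x 1) parameters."""
    return k * k * c_in + c_in * c_out

standard = conv_params(112, 112, 3)       # 112896 weights
separable = dw_separable_params(112, 112, 3)  # 13552 weights
print(standard, separable, round(standard / separable, 1))
```

At this stage the separable form needs roughly an eighth of the weights, which is why swapping the CSPDarkNet53 backbone for MobilenetV3 shrinks the model without changing the detection head.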
3. The tea leaf picking recognition method based on the improved YoloV4 model as claimed in claim 1, wherein step 6 comprises the following steps:
Step 6.1: setting a loss function according to the data set;
Step 6.2: setting the number of iterations to 10000;
Step 6.3: dividing training into two stages, a freezing stage and a thawing stage, with the first 5000 iterations as the freezing stage and the last 5000 iterations as the thawing stage;
Step 6.4: completing the settings and starting training; after 10000 iterations, storing the trained model, drawing the loss function curve over the iteration process, and selecting the optimal model as the tea leaf picking grade target recognition model according to the loss function curve.
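The training schedule of steps 6.2–6.3, combined with the cosine annealing decay of claim 1, can be sketched as follows. The learning-rate bounds `lr_max` and `lr_min` are assumptions for illustration; the claims do not specify them.

```python
import math

FREEZE_STEPS, TOTAL_STEPS = 5000, 10000

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine annealing: decay the learning rate from lr_max to lr_min
    along a half cosine over total_steps iterations."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

def backbone_frozen(step):
    """First 5000 iterations: backbone frozen; last 5000: thawed."""
    return step < FREEZE_STEPS

for step in (0, 4999, 5000, 9999):
    print(step, backbone_frozen(step),
          f"{cosine_annealed_lr(step, TOTAL_STEPS):.2e}")
```

Freezing the backbone at first lets the new detection head stabilize on the pretrained MobilenetV3 features; thawing then fine-tunes the whole network at an already-reduced learning rate.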
4. The tea leaf picking recognition method based on the improved YoloV4 model as claimed in claim 3, wherein step 6.1 comprises the following steps:
Step 6.1.1: using y_true to extract, from the feature layer, the positions of the points where targets truly exist and the classes corresponding to those points;
Step 6.1.2: according to the objects assigned to each class cluster, iteratively recalculating and updating the cluster centers;
Step 6.1.3: for each picture, calculating the IoU between all ground-truth boxes and predicted boxes;
Step 6.1.4: calculating LOSS_CIoU as the regression loss function.
5. The tea leaf picking recognition method based on the improved YoloV4 model as claimed in claim 4, wherein the calculation formula of step 6.1.4 is:
LOSS_CIoU = 1 - IoU + ρ²(b, b^gt)/c² + αv
where ρ(b, b^gt) represents the Euclidean distance between the center points of the prediction box and the ground-truth box, c represents the diagonal distance of the smallest closure region that can contain both the prediction box and the ground-truth box, and α and v are penalty terms for the aspect ratio; the formula for α is:
α = v / ((1 - IoU) + v)
The formula for v is:
v = (4/π²) · (arctan(w^gt/h^gt) - arctan(w/h))²
where w^gt and h^gt are respectively the width and height of the ground-truth box, and w and h are respectively the width and height of the prediction box.
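The CIoU regression loss of claim 5 can be sketched in plain Python for boxes in (x1, y1, x2, y2) corner format; this is an illustrative re-implementation of the standard CIoU definition, not the patented code.

```python
import math

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def ciou_loss(pred, gt):
    """CIoU loss: 1 - IoU + center-distance term + aspect-ratio term."""
    u = iou(pred, gt)
    # squared distance between box centers (rho^2)
    cxp, cyp = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cxg, cyg = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2
    # squared diagonal of the smallest enclosing box (c^2)
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # aspect-ratio penalty v and its weight alpha
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - u) + v) if v > 0 else 0.0
    return 1 - u + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))   # identical boxes -> 0.0
```

Identical boxes give a loss of exactly zero; shifting or reshaping either box increases the loss through the center-distance and aspect-ratio terms even when the IoU stays unchanged, which is the motivation for using CIoU rather than plain IoU as the regression loss.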
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210523294.0A CN115019226A (en) | 2022-05-13 | 2022-05-13 | Tea leaf picking and identifying method based on improved YoloV4 model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115019226A true CN115019226A (en) | 2022-09-06 |
Family
ID=83069223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210523294.0A Pending CN115019226A (en) | 2022-05-13 | 2022-05-13 | Tea leaf picking and identifying method based on improved YoloV4 model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115019226A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112001339A (en) * | 2020-08-27 | 2020-11-27 | 杭州电子科技大学 | Pedestrian social distance real-time monitoring method based on YOLO v4 |
CN113269132A (en) * | 2021-06-15 | 2021-08-17 | 成都恒创新星科技有限公司 | Vehicle detection method and system based on YOLOV4 optimization algorithm |
CN113674226A (en) * | 2021-07-31 | 2021-11-19 | 河海大学 | Tea leaf picking machine tea leaf bud tip detection method based on deep learning |
CN113887395A (en) * | 2021-09-29 | 2022-01-04 | 浙江工业大学 | Depth separable convolution YOLOv4 model-based filter bag opening position detection method |
US20220114759A1 (en) * | 2020-12-25 | 2022-04-14 | Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. | Target detection method, electronic device and medium |
CN114387520A (en) * | 2022-01-14 | 2022-04-22 | 华南农业大学 | Precision detection method and system for intensive plums picked by robot |
Non-Patent Citations (2)
Title |
---|
XINTING LIAO, SHENGPING LV: "YOLOv4-MN3 for PCB Surface Defect Detection", Applied Sciences *
CHEN Long: "Research on Visual Recognition and Picking Technology of Tea Buds", China Master's Theses Full-text Database, Agricultural Science and Technology Section *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110619385B (en) | Structured network model compression acceleration method based on multi-stage pruning | |
CN104063686B (en) | Crop leaf diseases image interactive diagnostic system and method | |
CN109284760B (en) | Furniture detection method and device based on deep convolutional neural network | |
CN112949704B (en) | Tobacco leaf maturity state identification method and device based on image analysis | |
CN115171165A (en) | Pedestrian re-identification method and device with global features and step-type local features fused | |
CN111382808A (en) | Vehicle detection processing method and device | |
CN110599459A (en) | Underground pipe network risk assessment cloud system based on deep learning | |
CN113850136A (en) | Yolov5 and BCNN-based vehicle orientation identification method and system | |
CN113901928A (en) | Target detection method based on dynamic super-resolution, and power transmission line component detection method and system | |
CN115019226A (en) | Tea leaf picking and identifying method based on improved YoloV4 model | |
CN111881803A (en) | Livestock face recognition method based on improved YOLOv3 | |
CN117132802A (en) | Method, device and storage medium for identifying field wheat diseases and insect pests | |
CN114359359B (en) | Multitask optical and SAR remote sensing image registration method, equipment and medium | |
CN110852398A (en) | Cotton aphid identification method based on convolutional neural network | |
CN115761356A (en) | Image recognition method and device, electronic equipment and storage medium | |
CN112767427A (en) | Low-resolution image recognition algorithm for compensating edge information | |
CN111093140A (en) | Method, device, equipment and storage medium for detecting defects of microphone and earphone dust screen | |
CN112257706A (en) | Flower identification method based on local characteristics of pistils | |
CN114120023A (en) | Method and device for identifying copied image and computer readable storage medium | |
CN206236111U (en) | A kind of leaf image plant automatic identification equipment based on interactive voice | |
CN117372787B (en) | Image multi-category identification method and device | |
CN116188834B (en) | Full-slice image classification method and device based on self-adaptive training model | |
CN114049254B (en) | Low-pixel ox-head image reconstruction and identification method, system, equipment and storage medium | |
CN117392440B (en) | Textile fabric retrieval method and system based on tissue structure and color classification | |
CN112184056B (en) | Data feature extraction method and system based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220906 |