CN115272791A - Multi-target detection positioning method for tea based on YoloV5 - Google Patents

Multi-target detection positioning method for tea based on YoloV5

Info

Publication number
CN115272791A
CN115272791A
Authority
CN
China
Prior art keywords
tea
module
cuboid
dimensional point
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210866833.0A
Other languages
Chinese (zh)
Other versions
CN115272791B (en)
Inventor
朱立学
张智浩
林桂潮
张世昂
陈品岚
官金炫
陈明杰
林深凯
吴天骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkai University of Agriculture and Engineering filed Critical Zhongkai University of Agriculture and Engineering
Priority to CN202210866833.0A priority Critical patent/CN115272791B/en
Publication of CN115272791A publication Critical patent/CN115272791A/en
Application granted granted Critical
Publication of CN115272791B publication Critical patent/CN115272791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The invention provides a YoloV5-based multi-target detection and positioning method for tea, which specifically comprises the steps of: S01, constructing a tea tender shoot image data set; S02, improving a YoloV5 detection network; S03, obtaining the three-dimensional point cloud of the tea tender shoots, fitting the minimum external cuboid of the tea tender shoots, and obtaining the tea tender shoot picking points. The method can effectively carry out multi-target detection and positioning of tea tender shoots, so that the positions of the tea tender shoots are accurately and effectively identified; in cooperation with a picking tool, the tea tender shoots are picked intelligently, which improves picking efficiency, saves picking time and reduces labor cost.

Description

YoloV5-based multi-target detection and positioning method for tea
Technical Field
The invention relates to the technical field of tea positioning, and in particular to a YoloV5-based multi-target detection and positioning method for tea.
Background
Tea processing, also known as tea making, is the process of picking fresh leaves from tea trees and processing them into various semi-finished or finished tea products through a series of processing procedures; picking tea tender shoots (or fresh leaves) is one of the important links in tea production. At present, tea tender shoots (or fresh leaves) are mainly picked by hand, but manual picking is inefficient, increases the labor intensity of workers and wastes a large amount of labor cost. Tea picking machines also exist in the prior art, but most of them work in a non-selective 'one-knife-cut' reciprocating cutting mode; because tea trees are affected by various environmental factors (such as illumination, gravity, temperature and humidity), tea shoots grow to very uneven heights, and this reciprocating cutting mode easily causes missed picking, wrong picking and even damage to the tea shoots. Therefore, how to identify and judge the positions of the tea shoots, so as to realize accurate, mechanized tea picking while avoiding missed picking, wrong picking and damage to the tea leaves, is one of the challenges currently faced by intelligent tea picking.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a YoloV5-based multi-target detection and positioning method for tea, which can effectively perform multi-target detection and positioning of tea tender shoots, so that the positions of the tea tender shoots are accurately and effectively identified; in cooperation with a picking tool, intelligent picking of the tea tender shoots is realized, which improves picking efficiency, saves picking time and reduces labor cost.
The purpose of the invention is realized by the following technical scheme:
a tea multi-target detection positioning method based on YoloV5 is characterized in that: the method specifically comprises the following steps:
s01, constructing a tea tender shoot image data set;
s02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea tender shoots;
s03, obtaining tea three-dimensional point cloud based on the training result of the YoloV5 target detection network model in the step S02; then screening out tea tender shoot three-dimensional point clouds from the tea three-dimensional point clouds; and finally fitting the minimum external cuboid of the tea tender shoot to obtain the accurate position and picking point of the tea tender shoot.
For further optimization, the step S01 specifically includes: firstly, collecting tea tender shoot image data by using an RGB-D camera to obtain a color image and a depth image of tea tender shoots; then, labeling the color image by using a labeling tool, performing data set enhancement operation, expanding the number of samples in the data set, and constructing a tea tender shoot image data set; and finally, dividing the data set into a training set, a testing set and a verification set.
Preferably, the labeling tool is a Labelimg labeling tool.
For further optimization, the YoloV5 detection network comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the input pictures, an SPP module and a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales by using a grid anchor-based method.
Preferably, the YoloV5 detection network adopts the network model with the smallest model file size and the smallest depth and feature-map width.
For further optimization, the step S02 specifically includes:
s21, firstly, preprocessing the images in the training set of the step S01 and unifying the resolution of all the images in the training set; inputting the preprocessed images into the Backbone module to obtain feature maps of different sizes;
s22, inputting the feature maps of different sizes from the step S21 into the Neck module, where a bidirectional Feature Pyramid Network (BiFPN) is adopted for multi-feature fusion in place of the original Path Aggregation Network (PANet) of the Neck module; the feature maps are successively up-sampled and down-sampled, and spliced through an Efficient Channel Attention (ECA) mechanism to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
s23, performing back propagation with a combination of multiple loss functions, and updating the gradients to adjust the weight parameters of the model;
and S24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain the YoloV5 target detection network model.
For further optimization, the step S03 specifically includes:
s31, firstly, obtaining the coordinates of a detection frame according to the result of the YoloV5 target detection network model in the step S02, and generating a Region of Interest (ROI) of a color image and a corresponding depth image;
s32, obtaining corresponding mapped color image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image and through the coordinate values, the pixel values and the recording distance of the depth image;
s33, obtaining a tea three-dimensional point cloud through coordinate fusion of the color image and the depth image, specifically:
[pinhole-model back-projection formula, shown as an image in the original, mapping the color-image pixel coordinates and the depth value to three-dimensional point-cloud coordinates]
in the formula, (x, y, z) denotes the coordinate system of the three-dimensional point cloud; (u, v) denotes the coordinate system of the color image; d denotes the depth value, obtained from the depth image; f_x, f_y denote the camera focal lengths;
s34, because the generated tea three-dimensional point cloud comprises both the tea tender shoots and their background point cloud, the average distance of the tea three-dimensional point cloud is calculated and taken as a distance threshold; the background points beyond this distance threshold are then filtered out to obtain a preliminarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then applied, with the parameter radius Eps and the minimum number of samples M_p required within the neighborhood set, to cluster the preliminarily segmented three-dimensional point cloud and screen out the tea tender shoot three-dimensional point cloud;
s35, fitting the minimum external cuboid of the tea tender shoot at the position by adopting a Principal Component Analysis (PCA) according to the growth posture of the tea tender shoot; then calculating to obtain coordinates of each vertex of the cuboid; and obtaining the coordinate of the central point of the bottom surface of the cuboid by calculating the average value of the four vertexes of the bottom surface of the cuboid, and taking the point as a picking point of the tender shoots of the tea leaves.
For further optimization, the step S35 specifically includes:
firstly, screening three main directions, namely x, y and z directions, of the tea tender shoot three-dimensional point cloud by adopting a principal component analysis method, and calculating a mass center and covariance to obtain a covariance matrix; the method specifically comprises the following steps:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i)
in the formula, P_c denotes the centroid coordinates of the three-dimensional point cloud; n denotes the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) denotes the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (P_i − P_c)(P_i − P_c)^T, where P_i = (x_i, y_i, z_i)
in the formula, C_p denotes the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors; the specific formula is:
C_p = U_p · D_p · V_p^T
in the formula, U_p denotes the eigenvector matrix of C_p C_p^T; D_p denotes the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p denotes the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the maximum eigenvalue is the direction of the main axis of the cuboid;
then, the coordinate points are projected onto the direction vectors: the inner product of each point's position coordinate P_i with the unit vectors of the coordinate axes is calculated to obtain the maximum and minimum values in each direction; letting a, b and c be the averages of the maximum and minimum values along x, y and z respectively gives the center point O and the side lengths L of the cuboid, so that the tightest cuboid enclosing the tea tender shoot is generated;
the specific formulas are:
a = (max(P_i·X) + min(P_i·X)) / 2,  b = (max(P_i·Y) + min(P_i·Y)) / 2,  c = (max(P_i·Z) + min(P_i·Z)) / 2;
O = aX + bY + cZ;
L_x = max(P_i·X) − min(P_i·X),  L_y = max(P_i·Y) − min(P_i·Y),  L_z = max(P_i·Z) − min(P_i·Z);
in the formulas, X is the unit vector of the coordinate axes in the x direction, Y is the unit vector in the y direction, and Z is the unit vector in the z direction; L_x, L_y and L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices with the smallest coordinates in the y direction of the cuboid are determined and taken as the four vertices of the bottom face of the cuboid; finally, the coordinate of the center point of the bottom face, i.e. the picking point, is obtained as the average of the coordinates of these four vertices.
The invention has the following technical effects:
According to the method, a feature map with rich semantic information is constructed by adopting a bidirectional feature pyramid network and a channel attention mechanism, and an improved YoloV5 target detection network model is built, so that more features are fused without additional cost and the semantic expression and positioning capability at multiple scales are enhanced; this improves the probability of correctly judging an object and the detection precision of the model, makes the method effective for recognizing tea tender shoots, which are small targets in complex environments, and avoids the misjudgment, unclear recognition or even failure of recognition caused by the small proportion of tea tender shoots in the whole image. Then, by fitting the minimum external cuboid of the tea tender shoot and taking the center point of the bottom face of this cuboid as the picking point, accurate positioning of the tea tender shoot is realized; in cooperation with an automatic picking tool, the problems of mechanized picking easily damaging the tea leaves and easily producing wrong or missed picking are effectively avoided, and the picking efficiency of the tea leaves is effectively improved.
Drawings
Fig. 1 is a schematic diagram of a picture labeled by a labeling tool in the embodiment of the present invention.
Fig. 2 is a multi-scale feature fusion structure diagram based on a bidirectional feature pyramid network structure in the embodiment of the present invention.
Fig. 3 is a flowchart of a multi-target detection positioning method according to an embodiment of the present application.
Detailed Description
The foregoing aspects of the present invention are described in further detail below by way of an example, but it should not be understood that the scope of the above subject matter is limited to the following example; various modifications and/or substitutions made to the above techniques still fall within the scope of the present invention.
Example:
a tea multi-target detection positioning method based on YoloV5 is characterized in that: the method specifically comprises the following steps:
s01, constructing a tea tender shoot image data set; the method specifically comprises the following steps:
firstly, tea tender shoot image data are collected with an RGB-D camera to obtain color images and depth images of the tea tender shoots; then, the color images are labeled with a labeling tool, for example the Labelimg labeling tool (as shown in fig. 1), and data set enhancement operations are performed (conventional means such as spatial transformation and color transformation, which are well known to those skilled in the art) to expand the number of samples and construct the tea tender shoot image data set; finally, the data set is divided into a training set, a testing set and a verification set.
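A minimal Python sketch of such a split is given below; the 7:2:1 ratio, the directory layout and the function name are illustrative assumptions, since the patent does not specify the split proportions:

import random
from pathlib import Path

def split_dataset(image_dir, train=0.7, test=0.2, seed=42):
    # Randomly partition the labelled shoot images into training, testing
    # and validation sets; ratios and layout are illustrative assumptions.
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_test = int(n * train), int(n * test)
    return (images[:n_train],                      # training set
            images[n_train:n_train + n_test],      # testing set
            images[n_train + n_test:])             # validation set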
S02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea tender shoots;
The YoloV5 detection network adopts the network model with the smallest model file size and the smallest depth and feature-map width, and comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the input pictures, an SPP module and a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales by using a grid anchor-based method;
the method comprises the following steps:
s21, firstly, preprocessing the images in the training set of the step S01 and unifying the resolution of all the images in the training set; inputting the preprocessed images into the Backbone module to obtain feature maps of different sizes;
s22, inputting the feature maps of different sizes from the step S21 into the Neck module, where a bidirectional Feature Pyramid Network (BiFPN) is adopted for multi-feature fusion in place of the original Path Aggregation Network (PANet) of the Neck module; the feature maps are successively up-sampled and down-sampled, and spliced through an Efficient Channel Attention (ECA) mechanism to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
In the YoloV5 detection network (i.e. the original YoloV5 network structure), BiFPN is used for enhanced feature extraction: P5_in is up-sampled and stacked with P4_in by BiFPN_Concat to obtain P4_td; P4_td is then up-sampled and stacked with P3_in by BiFPN_Concat to obtain P3_out; P3_out is then down-sampled and stacked with P4_td by BiFPN_Concat to obtain P4_out; P4_out is then down-sampled and stacked with P5_in to obtain P5_out. The method uses efficient bidirectional cross connections for feature fusion, removes the nodes in the PANet that contribute little to feature fusion, and adds an extra connection between the input and output nodes at the same level, so that more features are fused without additional cost and the semantic expression and positioning capability at multiple scales are enhanced, as shown in FIG. 2.
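A minimal PyTorch sketch of one such BiFPN fusion node is shown below; the class name, the learnable fast-normalized weights and the way the node is wired into YoloV5 are illustrative assumptions rather than the patent's exact implementation:

import torch
import torch.nn as nn

class BiFPN_Concat(nn.Module):
    # Weighted concatenation used at a BiFPN fusion node: each incoming
    # feature map receives a trainable, normalized weight before stacking.
    def __init__(self, num_inputs=2, dim=1, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.dim = dim
        self.eps = eps

    def forward(self, xs):
        w = torch.relu(self.w)             # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)       # fast normalized fusion
        return torch.cat([w[i] * x for i, x in enumerate(xs)], dim=self.dim)

# Example of the P4_td node described above (spatial sizes must already match):
# p4_td = csp_block(BiFPN_Concat(2)([upsample(p5_in), p4_in]))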
Then, ECA is added after the 9th layer: the ECA module performs Global Average Pooling on the input feature map, changing it from an [h, w, c] matrix into a [1, c] vector; the size kernel_size of an adaptive one-dimensional convolution kernel is then calculated and used in a one-dimensional convolution to obtain the weight of each channel of the feature map; the normalized weights are multiplied channel by channel with the original input feature map to generate the weighted feature map.
The attention mechanism uses this convolution after the global average pooling layer in place of a fully connected layer, which avoids dimensionality reduction and effectively captures cross-channel interaction, and finally improves the probability of correctly judging an object and the detection precision of the model; the specific formula is:
k = | log_2(C)/γ + b/γ |_odd
wherein C denotes the channel dimension; k denotes the size of the one-dimensional convolution kernel; | · |_odd denotes taking the nearest odd number; γ and b take the values 2 and 1 respectively;
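A minimal PyTorch sketch of such an ECA block, built directly from the formula above, is given below; the class name and the exact layer at which it is inserted into YoloV5 are illustrative assumptions:

import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    # Efficient Channel Attention: global average pooling followed by a
    # 1-D convolution whose kernel size k is derived from the channel
    # count C via k = |log2(C)/gamma + b/gamma| rounded to an odd number.
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                      # force an odd kernel size
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)                           # [b, c, h, w] -> [b, c, 1, 1]
        y = self.conv(y.squeeze(-1).transpose(-1, -2)) # cross-channel 1-D convolution
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y                                   # re-weight the channels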
s23, back propagation is carried out with a combination of multiple loss functions (such as classification loss, localization loss and confidence loss), and the gradients in the model are updated to adjust the weight parameters;
and S24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain a YoloV5 target detection network model.
S03, obtaining tea three-dimensional point cloud based on the training result of the YoloV5 target detection network model in the step S02; then screening out tea tender shoot three-dimensional point clouds from the tea three-dimensional point clouds; and finally fitting the minimum external cuboid of the tea tender shoot to obtain the accurate position and picking point of the tea tender shoot.
The method comprises the following specific steps:
s31, firstly, obtaining the coordinates of a detection frame according to the result of the YoloV5 target detection network model in the step S02, and generating a Region of Interest (ROI) of a color image and a corresponding depth image;
s32, obtaining corresponding mapped color image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image and through the coordinate values, the pixel values and the recording distance of the depth image;
s33, obtaining the tea three-dimensional point cloud through coordinate fusion of the color image and the depth image, specifically:
[pinhole-model back-projection formula, shown as an image in the original, mapping the color-image pixel coordinates and the depth value to three-dimensional point-cloud coordinates]
in the formula, (x, y, z) denotes the coordinate system of the three-dimensional point cloud; (u, v) denotes the coordinate system of the color image; d denotes the depth value, obtained from the depth image; f_x, f_y denote the camera focal lengths;
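A minimal NumPy sketch of this back-projection is given below; the principal-point offsets cx, cy, the depth scale and the function name are illustrative assumptions, since the patent gives the formula only as an image:

import numpy as np

def backproject_roi(depth_roi, fx, fy, cx, cy, depth_scale=0.001):
    # Pinhole-model back-projection of an aligned depth-image ROI:
    # x = (u - cx) * d / fx, y = (v - cy) * d / fy, z = d.
    h, w = depth_roi.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    d = depth_roi.astype(np.float32) * depth_scale   # raw depth units -> metres
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    points = np.stack([x, y, d], axis=-1).reshape(-1, 3)
    return points[d.reshape(-1) > 0]                 # drop invalid zero-depth pixels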
s34, because the generated tea three-dimensional point cloud comprises both the tea tender shoots and their background point cloud, the average distance of the tea three-dimensional point cloud is calculated and taken as a distance threshold; the background points beyond this distance threshold are then filtered out to obtain a preliminarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then applied, with the parameter radius Eps and the minimum number of samples M_p required within the neighborhood set, to cluster the preliminarily segmented three-dimensional point cloud and screen out the tea tender shoot three-dimensional point cloud;
The DBSCAN clustering algorithm randomly selects a data sample in the space and determines whether the number of samples distributed within its neighborhood radius Eps is greater than or equal to the minimum-sample-number threshold M_p, so as to decide whether it is a core object:
if so, all the points in the neighborhood are assigned to the same cluster, and on the basis of this cluster all density-reachable samples are found by breadth-first search and assigned to the cluster;
if the data sample is not a core object, it is marked as a noise point and removed;
the formula is specifically:
N_Eps(p) = { q ∈ D | dist(p, q) ≤ Eps };
in the formula, D denotes the point cloud sample set; p and q denote sample points in the sample set;
for any p ∈ D, if its Eps-neighborhood N_Eps(p) contains at least M_p samples, then p is a core object; if q lies within the Eps-neighborhood of p and p is a core object, then q is directly density-reachable from p;
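A minimal scikit-learn sketch of the distance-threshold pre-segmentation followed by DBSCAN clustering is given below; the numeric values of Eps and M_p and the choice of keeping the largest cluster are illustrative assumptions:

import numpy as np
from sklearn.cluster import DBSCAN

def segment_shoot_points(points, eps=0.01, min_samples=30):
    # Filter out background points farther than the mean distance, then
    # cluster the remaining points and keep the largest cluster as the
    # tea tender shoot point cloud.  eps ~ Eps, min_samples ~ M_p.
    dist = np.linalg.norm(points, axis=1)
    kept = points[dist <= dist.mean()]             # distance-threshold segmentation
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(kept)
    valid = labels[labels >= 0]                    # label -1 marks noise points
    if valid.size == 0:
        return np.empty((0, 3))
    largest = np.bincount(valid).argmax()
    return kept[labels == largest]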
s35, fitting the minimum external cuboid of the tea tender shoot at the position by adopting a Principal Component Analysis (PCA) according to the growth posture of the tea tender shoot; then calculating to obtain coordinates of each vertex of the cuboid; then obtaining the coordinate of the central point of the bottom surface of the cuboid by calculating the average value of the four vertexes of the bottom surface of the cuboid, and taking the point as a picking point of the tender bud of the tea, wherein the method specifically comprises the following steps:
firstly, screening out three main directions, namely x, y and z directions, of the tea tender shoot three-dimensional point cloud by adopting a principal component analysis method, and calculating a mass center and covariance to obtain a covariance matrix; the method specifically comprises the following steps:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i)
in the formula, P_c denotes the centroid coordinates of the three-dimensional point cloud; n denotes the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) denotes the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (P_i − P_c)(P_i − P_c)^T, where P_i = (x_i, y_i, z_i)
in the formula, C_p denotes the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors; the specific formula is:
C_p = U_p · D_p · V_p^T
in the formula, U_p denotes the eigenvector matrix of C_p C_p^T; D_p denotes the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p denotes the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the maximum eigenvalue is the direction of the main axis of the cuboid;
then, the coordinate points are projected onto the direction vectors: the inner product of each point's position coordinate P_i with the unit vectors of the coordinate axes is calculated to obtain the maximum and minimum values in each direction; letting a, b and c be the averages of the maximum and minimum values along x, y and z respectively gives the center point O and the side lengths L of the cuboid, so that the tightest cuboid enclosing the tea tender shoot is generated;
the specific formulas are:
a = (max(P_i·X) + min(P_i·X)) / 2,  b = (max(P_i·Y) + min(P_i·Y)) / 2,  c = (max(P_i·Z) + min(P_i·Z)) / 2;
O = aX + bY + cZ;
L_x = max(P_i·X) − min(P_i·X),  L_y = max(P_i·Y) − min(P_i·Y),  L_z = max(P_i·Z) − min(P_i·Z);
in the formulas, X is the unit vector of the coordinate axes in the x direction, Y is the unit vector in the y direction, and Z is the unit vector in the z direction; L_x, L_y and L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices with the smallest coordinates in the y direction of the cuboid are determined and taken as the four vertices of the bottom face of the cuboid; finally, the coordinate of the center point of the bottom face, i.e. the picking point, is obtained as the average of the coordinates of these four vertices.
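A minimal NumPy sketch of the PCA-based cuboid fitting and picking-point computation described in step S35 is given below; the function name and the convention of taking the bottom face along the second principal axis are illustrative assumptions:

import numpy as np

def fit_cuboid_and_picking_point(points):
    # Fit a PCA-aligned bounding cuboid to the shoot point cloud and return
    # its eight vertices, its side lengths and the picking point (centre of
    # the bottom face, i.e. the face with the minimum local y coordinate).
    centroid = points.mean(axis=0)
    cov = np.cov((points - centroid).T)            # covariance matrix C_p
    _, _, vt = np.linalg.svd(cov)                  # rows of vt: principal axes X, Y, Z
    proj = (points - centroid) @ vt.T              # inner products with the axes
    p_min, p_max = proj.min(axis=0), proj.max(axis=0)
    lengths = p_max - p_min                        # L_x, L_y, L_z
    corners = np.array([[x, y, z] for x in (p_min[0], p_max[0])
                                  for y in (p_min[1], p_max[1])
                                  for z in (p_min[2], p_max[2])])
    vertices = corners @ vt + centroid             # back to camera coordinates
    bottom = corners[corners[:, 1] == p_min[1]] @ vt + centroid
    picking_point = bottom.mean(axis=0)            # centre of the bottom face
    return vertices, lengths, picking_point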
The above description is only a preferred embodiment of the present invention, and should not be taken as limiting the invention in any way, and any person skilled in the art can make any simple modification, equivalent replacement, and improvement on the above embodiment without departing from the technical spirit of the present invention, and still fall within the protection scope of the technical solution of the present invention.

Claims (6)

1. A tea multi-target detection positioning method based on YoloV5 is characterized in that: the method specifically comprises the following steps:
s01, constructing a tea tender shoot image data set;
s02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea tender shoots;
s03, obtaining tea three-dimensional point cloud based on the training result of the YoloV5 target detection network model in the step S02; then screening out three-dimensional point clouds of tender buds of the tea leaves from the three-dimensional point clouds of the tea leaves; and finally fitting the minimum external cuboid of the tea tender shoot to obtain the accurate position and picking point of the tea tender shoot.
2. The YoloV 5-based multi-target detection and positioning method for tea leaves as claimed in claim 1, wherein: the step S01 is specifically: firstly, collecting tea tender shoot image data by using an RGB-D camera to obtain a color image and a depth image of tea tender shoots; then, marking the color image by using a marking tool, performing data set enhancement operation, expanding the number of data sets, and constructing a tea tender shoot image data set; and finally, dividing the data set into a training set, a testing set and a verification set.
3. The tea multi-target detection and positioning method based on YoloV5 as claimed in claim 1 or 2, characterized in that: the YoloV5 detection network comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the input pictures, an SPP module and a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales by using a grid anchor-based method.
4. The tea multi-target detection and positioning method based on YoloV5 as claimed in claim 1 or 3, characterized in that: the step S02 specifically includes:
s21, firstly, preprocessing the images in the training set in the step S01 and unifying the resolution of all the images in the training set; inputting the preprocessed image into a Backbone module to obtain characteristic graphs with different sizes;
s22, inputting the feature maps of different sizes from the step S21 into the Neck module, where a bidirectional feature pyramid network is adopted for multi-feature fusion in place of the original path aggregation network of the Neck module; the feature maps are successively up-sampled and down-sampled, and spliced through a channel attention mechanism to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
s23, performing back propagation with a combination of multiple loss functions, and updating the gradients to adjust the weight parameters of the model;
and S24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain the YoloV5 target detection network model.
5. The YoloV5-based multi-target detection and positioning method for tea leaves as claimed in claim 1 or 4, wherein: the step S03 specifically includes:
s31, firstly, obtaining the coordinates of the detection frame according to the result of the YoloV5 target detection network model in the step S02, and generating the region of interest of the color image and the corresponding depth image;
s32, obtaining corresponding mapped color image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image and through the coordinate values, the pixel values and the recording distance of the depth image;
s33, obtaining a tea three-dimensional point cloud through coordinate fusion of the color image and the depth image, specifically:
[pinhole-model back-projection formula, shown as an image in the original, mapping the color-image pixel coordinates and the depth value to three-dimensional point-cloud coordinates]
in the formula, (x, y, z) denotes the coordinate system of the three-dimensional point cloud; (u, v) denotes the coordinate system of the color image; d denotes the depth value, obtained from the depth image; f_x, f_y denote the camera focal lengths;
s34, because the generated tea three-dimensional point cloud comprises both the tea tender shoots and their background point cloud, the average distance of the tea three-dimensional point cloud is calculated and taken as a distance threshold; the background points beyond this distance threshold are then filtered out to obtain a preliminarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then applied, with the parameter radius Eps and the minimum number of samples M_p required within the neighborhood set, to cluster the preliminarily segmented three-dimensional point cloud and screen out the tea tender shoot three-dimensional point cloud;
s35, fitting the minimum external cuboid of the tea tender shoots at the position by adopting a principal component analysis method according to the growth postures of the tea tender shoots; then calculating to obtain coordinates of each vertex of the cuboid; and obtaining the coordinate of the central point of the bottom surface of the cuboid by calculating the average value of the four vertexes of the bottom surface of the cuboid, and taking the point as a picking point of the tender shoots of the tea leaves.
6. The YoloV5-based multi-target detection and positioning method for tea leaves as claimed in claim 5, wherein: the step S35 specifically includes:
firstly, screening three main directions, namely x, y and z directions, of the tea tender shoot three-dimensional point cloud by adopting a principal component analysis method, and calculating a mass center and covariance to obtain a covariance matrix; the method specifically comprises the following steps:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i)
in the formula, P_c denotes the centroid coordinates of the three-dimensional point cloud; n denotes the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) denotes the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (P_i − P_c)(P_i − P_c)^T, where P_i = (x_i, y_i, z_i)
in the formula, C_p denotes the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors; the specific formula is:
C_p = U_p · D_p · V_p^T
in the formula, U_p denotes the eigenvector matrix of C_p C_p^T; D_p denotes the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p denotes the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the maximum eigenvalue is the direction of the main axis of the cuboid;
thereafter, the coordinate points are projected onto the direction vectors: the inner product of each point's position coordinate P_i with the unit vectors of the coordinate axes is calculated to obtain the maximum and minimum values in each direction; letting a, b and c be the averages of the maximum and minimum values along x, y and z respectively gives the center point O and the side lengths L of the cuboid, so that the tightest cuboid enclosing the tea tender shoot is generated;
the specific formulas are:
a = (max(P_i·X) + min(P_i·X)) / 2,  b = (max(P_i·Y) + min(P_i·Y)) / 2,  c = (max(P_i·Z) + min(P_i·Z)) / 2;
O = aX + bY + cZ;
L_x = max(P_i·X) − min(P_i·X),  L_y = max(P_i·Y) − min(P_i·Y),  L_z = max(P_i·Z) − min(P_i·Z);
in the formulas, X is the unit vector of the coordinate axes in the x direction, Y is the unit vector in the y direction, and Z is the unit vector in the z direction; L_x, L_y and L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices with the smallest coordinates in the y direction of the cuboid are determined and taken as the four vertices of the bottom face of the cuboid; finally, the coordinate of the center point of the bottom face, i.e. the picking point, is obtained as the average of the coordinates of these four vertices.
CN202210866833.0A 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves Active CN115272791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210866833.0A CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210866833.0A CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Publications (2)

Publication Number Publication Date
CN115272791A true CN115272791A (en) 2022-11-01
CN115272791B CN115272791B (en) 2023-05-26

Family

ID=83768705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210866833.0A Active CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Country Status (1)

Country Link
CN (1) CN115272791B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115943809A (en) * 2023-03-09 2023-04-11 四川省农业机械研究设计院 Tea picking optimization method and system based on quality evaluation
CN116138036A (en) * 2023-03-24 2023-05-23 仲恺农业工程学院 Secondary positioning method for picking young buds of famous tea

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080693A (en) * 2019-11-22 2020-04-28 天津大学 Robot autonomous classification grabbing method based on YOLOv3
CN113223091A (en) * 2021-04-29 2021-08-06 达闼机器人有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN113901874A (en) * 2021-09-09 2022-01-07 江苏大学 Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm
CN114529799A (en) * 2022-01-06 2022-05-24 浙江工业大学 Aircraft multi-target tracking method based on improved YOLOV5 algorithm
CN114731840A (en) * 2022-04-07 2022-07-12 仲恺农业工程学院 Double-mechanical-arm tea picking robot based on machine vision

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080693A (en) * 2019-11-22 2020-04-28 天津大学 Robot autonomous classification grabbing method based on YOLOv3
CN113223091A (en) * 2021-04-29 2021-08-06 达闼机器人有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN113901874A (en) * 2021-09-09 2022-01-07 江苏大学 Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm
CN114529799A (en) * 2022-01-06 2022-05-24 浙江工业大学 Aircraft multi-target tracking method based on improved YOLOV5 algorithm
CN114731840A (en) * 2022-04-07 2022-07-12 仲恺农业工程学院 Double-mechanical-arm tea picking robot based on machine vision

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YATAO LI ET AL: "In-field tea shoot detection and 3D localization using an RGB-D camera", 《COMPUTERS AND ELECTRONICS IN AGRICULTURE》 *
张泽坤等: "Multi-stereo-camera object manipulation system for logistics sorting" (面向物流分拣的多立体摄像头物体操作系统), 《计算机应用》 *
罗陆锋等: "Automatic positioning of picking points for grape harvesting robot in natural environment" (自然环境下葡萄采摘机器人采摘点的自动定位), 《农业工程学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115943809A (en) * 2023-03-09 2023-04-11 四川省农业机械研究设计院 Tea picking optimization method and system based on quality evaluation
CN116138036A (en) * 2023-03-24 2023-05-23 仲恺农业工程学院 Secondary positioning method for picking young buds of famous tea
CN116138036B (en) * 2023-03-24 2024-04-02 仲恺农业工程学院 Secondary positioning method for picking young buds of famous tea

Also Published As

Publication number Publication date
CN115272791B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN109934115B (en) Face recognition model construction method, face recognition method and electronic equipment
CN115272791B (en) YoloV5-based multi-target detection and positioning method for tea leaves
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN115187803B (en) Positioning method for picking process of famous tea tender shoots
CN109102543A (en) Object positioning method, equipment and storage medium based on image segmentation
Wang et al. YOLOv3-Litchi detection method of densely distributed litchi in large vision scenes
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN109325504A (en) A kind of underwater sea cucumber recognition methods and system
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN113409252B (en) Obstacle detection method for overhead transmission line inspection robot
CN112560623B (en) Unmanned aerial vehicle-based rapid mangrove plant species identification method
CN113901874A (en) Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm
CN113408584A (en) RGB-D multi-modal feature fusion 3D target detection method
Zhang et al. Research on spatial positioning system of fruits to be picked in field based on binocular vision and SSD model
CN115082815A (en) Tea bud picking point positioning method and device based on machine vision and picking system
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN113569981A (en) Power inspection bird nest detection method based on single-stage target detection network
CN115719445A (en) Seafood identification method based on deep learning and raspberry type 4B module
CN111241905A (en) Power transmission line nest detection method based on improved SSD algorithm
CN109657540A (en) Withered tree localization method and system
CN116630828B (en) Unmanned aerial vehicle remote sensing information acquisition system and method based on terrain environment adaptation
CN113723833B (en) Method, system, terminal equipment and storage medium for evaluating quality of forestation actual results
CN115995017A (en) Fruit identification and positioning method, device and medium
CN115880477A (en) Apple detection positioning method and system based on deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant