CN115272791A - Multi-target detection positioning method for tea based on YoloV5 - Google Patents

Multi-target detection positioning method for tea based on YoloV5

Info

Publication number
CN115272791A
CN115272791A
Authority
CN
China
Prior art keywords
tea
module
cuboid
dimensional point
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210866833.0A
Other languages
Chinese (zh)
Other versions
CN115272791B (en)
Inventor
朱立学
张智浩
林桂潮
张世昂
陈品岚
官金炫
陈明杰
林深凯
吴天骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkai University of Agriculture and Engineering filed Critical Zhongkai University of Agriculture and Engineering
Priority to CN202210866833.0A priority Critical patent/CN115272791B/en
Publication of CN115272791A publication Critical patent/CN115272791A/en
Application granted granted Critical
Publication of CN115272791B publication Critical patent/CN115272791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The invention provides a YoloV5-based multi-target detection and positioning method for tea, which specifically comprises the steps of: S01, constructing a tea tender shoot image data set; S02, improving a YoloV5 detection network; S03, obtaining the three-dimensional point cloud of the tea tender shoots, fitting the minimum external cuboid of the tea tender shoots, and obtaining the tea tender shoot picking points. The method can effectively carry out multi-target detection and positioning of tea tender shoots, so that the positions of the tea tender shoots are accurately and effectively identified; in cooperation with a picking tool, the tea tender shoots are picked intelligently, which improves picking efficiency, saves picking time and reduces labor cost.

Description

YoloV5-based multi-target detection and positioning method for tea
Technical Field
The invention relates to the technical field of tea positioning, and in particular to a YoloV5-based multi-target detection and positioning method for tea.
Background
Tea processing, also known as tea making, is the process of picking fresh leaves from tea trees and processing them into various semi-finished or finished tea products through a series of processing procedures; picking tea tender shoots (or fresh leaves) is one of the important links in tea production. At present, tea tender shoots (or fresh leaves) are mainly picked by hand, but manual picking is inefficient, increases the labor intensity of workers and wastes a large amount of labor cost. Tea picking machines also exist in the prior art, but most of them work in a non-selective 'one-knife-cut' reciprocating cutting mode; because tea trees are affected by various environmental factors (such as illumination, gravity, temperature and humidity), tea shoots grow to very uneven heights, and this reciprocating cutting mode easily causes missed picking, wrong picking and even damage to the tea shoots. Therefore, how to identify and judge the positions of the tea shoots, so as to realize accurate, mechanized tea picking while avoiding missed picking, wrong picking and damage to the tea leaves, is one of the challenges currently faced by intelligent tea picking.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a YoloV5-based multi-target detection and positioning method for tea, which can effectively perform multi-target detection and positioning of tea tender shoots, so that the positions of the tea tender shoots are accurately and effectively identified; in cooperation with a picking tool, intelligent picking of the tea tender shoots is realized, which improves picking efficiency, saves picking time and reduces labor cost.
The purpose of the invention is realized by the following technical scheme:
a tea multi-target detection positioning method based on YoloV5 is characterized in that: the method specifically comprises the following steps:
s01, constructing a tea tender shoot image data set;
s02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea tender shoots;
s03, obtaining tea three-dimensional point cloud based on the training result of the YoloV5 target detection network model in the step S02; then screening out tea tender shoot three-dimensional point clouds from the tea three-dimensional point clouds; and finally fitting the minimum external cuboid of the tea tender shoot to obtain the accurate position and picking point of the tea tender shoot.
For further optimization, the step S01 specifically includes: firstly, collecting tea tender shoot image data by using an RGB-D camera to obtain a color image and a depth image of tea tender shoots; then, labeling the color image by using a labeling tool, performing data set enhancement operation, expanding the number of samples in the data set, and constructing a tea tender shoot image data set; and finally, dividing the data set into a training set, a testing set and a verification set.
Preferably, the labeling tool is a Labelimg labeling tool.
For further optimization, the YoloV5 detection network comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the input pictures, an SPP module and a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales by using a grid anchor-based method.
Preferably, the YoloV5 detection network adopts the network model with the smallest model file size and the smallest depth and feature-map width.
For further optimization, the step S02 specifically includes:
s21, firstly, preprocessing the images in the training set of the step S01 and unifying the resolution of all the images in the training set; inputting the preprocessed images into the Backbone module to obtain feature maps of different sizes;
s22, inputting the feature maps of different sizes from the step S21 into the Neck module, where a bidirectional Feature Pyramid Network (BiFPN) is adopted for multi-feature fusion in place of the original Path Aggregation Network (PANet) of the Neck module; the feature maps are successively up-sampled and down-sampled, and spliced through an Efficient Channel Attention (ECA) mechanism to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
s23, performing back propagation with a combination of multiple loss functions, and updating the gradients to adjust the weight parameters of the model;
and S24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain the YoloV5 target detection network model.
For further optimization, the step S03 specifically includes:
s31, firstly, obtaining the coordinates of a detection frame according to the result of the YoloV5 target detection network model in the step S02, and generating a Region of Interest (ROI) of a color image and a corresponding depth image;
s32, obtaining corresponding mapped color image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image and through the coordinate values, the pixel values and the recording distance of the depth image;
s33, obtaining a tea three-dimensional point cloud through coordinate fusion of the color image and the depth image, specifically:
[pinhole-model back-projection formula, shown as an image in the original, mapping the color-image pixel coordinates and the depth value to three-dimensional point-cloud coordinates]
in the formula, (x, y, z) denotes the coordinate system of the three-dimensional point cloud; (u, v) denotes the coordinate system of the color image; d denotes the depth value, obtained from the depth image; f_x, f_y denote the camera focal lengths;
s34, because the generated tea three-dimensional point cloud comprises both the tea tender shoots and their background point cloud, the average distance of the tea three-dimensional point cloud is calculated and taken as a distance threshold; the background points beyond this distance threshold are then filtered out to obtain a preliminarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then applied, with the parameter radius Eps and the minimum number of samples M_p required within the neighborhood set, to cluster the preliminarily segmented three-dimensional point cloud and screen out the tea tender shoot three-dimensional point cloud;
s35, fitting the minimum external cuboid of the tea tender shoot at the position by adopting a Principal Component Analysis (PCA) according to the growth posture of the tea tender shoot; then calculating to obtain coordinates of each vertex of the cuboid; and obtaining the coordinate of the central point of the bottom surface of the cuboid by calculating the average value of the four vertexes of the bottom surface of the cuboid, and taking the point as a picking point of the tender shoots of the tea leaves.
For further optimization, the step S35 specifically includes:
firstly, screening three main directions, namely x, y and z directions, of the tea tender shoot three-dimensional point cloud by adopting a principal component analysis method, and calculating a mass center and covariance to obtain a covariance matrix; the method specifically comprises the following steps:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i)
in the formula, P_c denotes the centroid coordinates of the three-dimensional point cloud; n denotes the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) denotes the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (P_i − P_c)(P_i − P_c)^T, where P_i = (x_i, y_i, z_i)
in the formula, C_p denotes the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors; the specific formula is:
C_p = U_p · D_p · V_p^T
in the formula, U_p denotes the eigenvector matrix of C_p C_p^T; D_p denotes the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p denotes the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the maximum eigenvalue is the direction of the main axis of the cuboid;
then, the coordinate points are projected onto the direction vectors: the inner product of each point's position coordinate P_i with the unit vectors of the coordinate axes is calculated to obtain the maximum and minimum values in each direction; letting a, b and c be the averages of the maximum and minimum values along x, y and z respectively gives the center point O and the side lengths L of the cuboid, so that the tightest cuboid enclosing the tea tender shoot is generated;
the specific formulas are:
a = (max(P_i·X) + min(P_i·X)) / 2,  b = (max(P_i·Y) + min(P_i·Y)) / 2,  c = (max(P_i·Z) + min(P_i·Z)) / 2;
O = aX + bY + cZ;
L_x = max(P_i·X) − min(P_i·X),  L_y = max(P_i·Y) − min(P_i·Y),  L_z = max(P_i·Z) − min(P_i·Z);
in the formulas, X is the unit vector of the coordinate axes in the x direction, Y is the unit vector in the y direction, and Z is the unit vector in the z direction; L_x, L_y and L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices with the smallest coordinates in the y direction of the cuboid are determined and taken as the four vertices of the bottom face of the cuboid; finally, the coordinate of the center point of the bottom face, i.e. the picking point, is obtained as the average of the coordinates of these four vertices.
The invention has the following technical effects:
According to the method, a feature map with rich semantic information is constructed by adopting a bidirectional feature pyramid network and a channel attention mechanism, and an improved YoloV5 target detection network model is built, so that more features are fused without additional cost and the semantic expression and positioning capability at multiple scales are enhanced; this improves the probability of correctly judging an object and the detection precision of the model, makes the method effective for recognizing tea tender shoots, which are small targets in complex environments, and avoids the misjudgment, unclear recognition or even failure of recognition caused by the small proportion of tea tender shoots in the whole image. Then, by fitting the minimum external cuboid of the tea tender shoot and taking the center point of the bottom face of this cuboid as the picking point, accurate positioning of the tea tender shoot is realized; in cooperation with an automatic picking tool, the problems of mechanized picking easily damaging the tea leaves and easily producing wrong or missed picking are effectively avoided, and the picking efficiency of the tea leaves is effectively improved.
Drawings
Fig. 1 is a schematic diagram of a picture labeled by a labeling tool in the embodiment of the present invention.
Fig. 2 is a multi-scale feature fusion structure diagram based on a bidirectional feature pyramid network structure in the embodiment of the present invention.
Fig. 3 is a flowchart of a multi-target detection positioning method according to an embodiment of the present application.
Detailed Description
The foregoing aspects of the present invention are described in further detail below by way of an example, but it should not be understood that the scope of the above subject matter is limited to the following example; various modifications and/or substitutions made to the above techniques still fall within the scope of the present invention.
Example:
a tea multi-target detection positioning method based on YoloV5 is characterized in that: the method specifically comprises the following steps:
s01, constructing a tea tender shoot image data set; the method specifically comprises the following steps:
firstly, tea tender shoot image data are collected with an RGB-D camera to obtain color images and depth images of the tea tender shoots; then, the color images are labeled with a labeling tool, for example the Labelimg labeling tool (as shown in fig. 1), and data set enhancement operations are performed (conventional means such as spatial transformation and color transformation, which are well known to those skilled in the art) to expand the number of samples and construct the tea tender shoot image data set; finally, the data set is divided into a training set, a testing set and a verification set.
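A minimal Python sketch of such a split is given below; the 7:2:1 ratio, the directory layout and the function name are illustrative assumptions, since the patent does not specify the split proportions:

import random
from pathlib import Path

def split_dataset(image_dir, train=0.7, test=0.2, seed=42):
    # Randomly partition the labelled shoot images into training, testing
    # and validation sets; ratios and layout are illustrative assumptions.
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_test = int(n * train), int(n * test)
    return (images[:n_train],                      # training set
            images[n_train:n_train + n_test],      # testing set
            images[n_train + n_test:])             # validation set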
S02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea tender shoots;
The YoloV5 detection network adopts the network model with the smallest model file size and the smallest depth and feature-map width, and comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the input pictures, an SPP module and a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales by using a grid anchor-based method;
the method comprises the following steps:
s21, firstly, preprocessing the images in the training set of the step S01 and unifying the resolution of all the images in the training set; inputting the preprocessed images into the Backbone module to obtain feature maps of different sizes;
s22, inputting the feature maps of different sizes from the step S21 into the Neck module, where a bidirectional Feature Pyramid Network (BiFPN) is adopted for multi-feature fusion in place of the original Path Aggregation Network (PANet) of the Neck module; the feature maps are successively up-sampled and down-sampled, and spliced through an Efficient Channel Attention (ECA) mechanism to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
In the YoloV5 detection network (i.e. the original YoloV5 network structure), BiFPN is used for enhanced feature extraction: P5_in is up-sampled and stacked with P4_in by BiFPN_Concat to obtain P4_td; P4_td is then up-sampled and stacked with P3_in by BiFPN_Concat to obtain P3_out; P3_out is then down-sampled and stacked with P4_td by BiFPN_Concat to obtain P4_out; P4_out is then down-sampled and stacked with P5_in to obtain P5_out. The method uses efficient bidirectional cross connections for feature fusion, removes the nodes in the PANet that contribute little to feature fusion, and adds an extra connection between the input and output nodes at the same level, so that more features are fused without additional cost and the semantic expression and positioning capability at multiple scales are enhanced, as shown in FIG. 2.
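A minimal PyTorch sketch of one such BiFPN fusion node is shown below; the class name, the learnable fast-normalized weights and the way the node is wired into YoloV5 are illustrative assumptions rather than the patent's exact implementation:

import torch
import torch.nn as nn

class BiFPN_Concat(nn.Module):
    # Weighted concatenation used at a BiFPN fusion node: each incoming
    # feature map receives a trainable, normalized weight before stacking.
    def __init__(self, num_inputs=2, dim=1, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.dim = dim
        self.eps = eps

    def forward(self, xs):
        w = torch.relu(self.w)             # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)       # fast normalized fusion
        return torch.cat([w[i] * x for i, x in enumerate(xs)], dim=self.dim)

# Example of the P4_td node described above (spatial sizes must already match):
# p4_td = csp_block(BiFPN_Concat(2)([upsample(p5_in), p4_in]))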
Then, ECA is added after the 9th layer: the ECA module performs Global Average Pooling on the input feature map, changing it from an [h, w, c] matrix into a [1, c] vector; the size kernel_size of an adaptive one-dimensional convolution kernel is then calculated and used in a one-dimensional convolution to obtain the weight of each channel of the feature map; the normalized weights are multiplied channel by channel with the original input feature map to generate the weighted feature map.
The attention mechanism uses this convolution after the global average pooling layer in place of a fully connected layer, which avoids dimensionality reduction and effectively captures cross-channel interaction, and finally improves the probability of correctly judging an object and the detection precision of the model; the specific formula is:
k = | log_2(C)/γ + b/γ |_odd
wherein C denotes the channel dimension; k denotes the size of the one-dimensional convolution kernel; | · |_odd denotes taking the nearest odd number; γ and b take the values 2 and 1 respectively;
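A minimal PyTorch sketch of such an ECA block, built directly from the formula above, is given below; the class name and the exact layer at which it is inserted into YoloV5 are illustrative assumptions:

import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    # Efficient Channel Attention: global average pooling followed by a
    # 1-D convolution whose kernel size k is derived from the channel
    # count C via k = |log2(C)/gamma + b/gamma| rounded to an odd number.
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                      # force an odd kernel size
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)                           # [b, c, h, w] -> [b, c, 1, 1]
        y = self.conv(y.squeeze(-1).transpose(-1, -2)) # cross-channel 1-D convolution
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y                                   # re-weight the channels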
s23, back propagation is carried out with a combination of multiple loss functions (such as classification loss, localization loss and confidence loss), and the gradients in the model are updated to adjust the weight parameters;
and S24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain a YoloV5 target detection network model.
S03, obtaining tea three-dimensional point cloud based on the training result of the YoloV5 target detection network model in the step S02; then screening out tea tender shoot three-dimensional point clouds from the tea three-dimensional point clouds; and finally fitting the minimum external cuboid of the tea tender shoot to obtain the accurate position and picking point of the tea tender shoot.
The method comprises the following specific steps:
s31, firstly, obtaining the coordinates of a detection frame according to the result of the YoloV5 target detection network model in the step S02, and generating a Region of Interest (ROI) of a color image and a corresponding depth image;
s32, obtaining corresponding mapped color image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image and through the coordinate values, the pixel values and the recording distance of the depth image;
s33, obtaining the tea three-dimensional point cloud through coordinate fusion of the color image and the depth image, specifically:
[pinhole-model back-projection formula, shown as an image in the original, mapping the color-image pixel coordinates and the depth value to three-dimensional point-cloud coordinates]
in the formula, (x, y, z) denotes the coordinate system of the three-dimensional point cloud; (u, v) denotes the coordinate system of the color image; d denotes the depth value, obtained from the depth image; f_x, f_y denote the camera focal lengths;
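A minimal NumPy sketch of this back-projection is given below; the principal-point offsets cx, cy, the depth scale and the function name are illustrative assumptions, since the patent gives the formula only as an image:

import numpy as np

def backproject_roi(depth_roi, fx, fy, cx, cy, depth_scale=0.001):
    # Pinhole-model back-projection of an aligned depth-image ROI:
    # x = (u - cx) * d / fx, y = (v - cy) * d / fy, z = d.
    h, w = depth_roi.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    d = depth_roi.astype(np.float32) * depth_scale   # raw depth units -> metres
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    points = np.stack([x, y, d], axis=-1).reshape(-1, 3)
    return points[d.reshape(-1) > 0]                 # drop invalid zero-depth pixels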
s34, because the generated tea three-dimensional point cloud comprises both the tea tender shoots and their background point cloud, the average distance of the tea three-dimensional point cloud is calculated and taken as a distance threshold; the background points beyond this distance threshold are then filtered out to obtain a preliminarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then applied, with the parameter radius Eps and the minimum number of samples M_p required within the neighborhood set, to cluster the preliminarily segmented three-dimensional point cloud and screen out the tea tender shoot three-dimensional point cloud;
The DBSCAN clustering algorithm randomly selects a data sample in the space and determines whether the number of samples distributed within its neighborhood radius Eps is greater than or equal to the minimum-sample-number threshold M_p, so as to decide whether it is a core object:
if so, all the points in the neighborhood are assigned to the same cluster, and on the basis of this cluster all density-reachable samples are found by breadth-first search and assigned to the cluster;
if the data sample is not a core object, it is marked as a noise point and removed;
the formula is specifically:
N_Eps(p) = { q ∈ D | dist(p, q) ≤ Eps };
in the formula, D denotes the point cloud sample set; p and q denote sample points in the sample set;
for any p ∈ D, if its Eps-neighborhood N_Eps(p) contains at least M_p samples, then p is a core object; if q lies within the Eps-neighborhood of p and p is a core object, then q is directly density-reachable from p;
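A minimal scikit-learn sketch of the distance-threshold pre-segmentation followed by DBSCAN clustering is given below; the numeric values of Eps and M_p and the choice of keeping the largest cluster are illustrative assumptions:

import numpy as np
from sklearn.cluster import DBSCAN

def segment_shoot_points(points, eps=0.01, min_samples=30):
    # Filter out background points farther than the mean distance, then
    # cluster the remaining points and keep the largest cluster as the
    # tea tender shoot point cloud.  eps ~ Eps, min_samples ~ M_p.
    dist = np.linalg.norm(points, axis=1)
    kept = points[dist <= dist.mean()]             # distance-threshold segmentation
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(kept)
    valid = labels[labels >= 0]                    # label -1 marks noise points
    if valid.size == 0:
        return np.empty((0, 3))
    largest = np.bincount(valid).argmax()
    return kept[labels == largest]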
s35, fitting the minimum external cuboid of the tea tender shoot at the position by adopting a Principal Component Analysis (PCA) according to the growth posture of the tea tender shoot; then calculating to obtain coordinates of each vertex of the cuboid; then obtaining the coordinate of the central point of the bottom surface of the cuboid by calculating the average value of the four vertexes of the bottom surface of the cuboid, and taking the point as a picking point of the tender bud of the tea, wherein the method specifically comprises the following steps:
firstly, screening out three main directions, namely x, y and z directions, of the tea tender shoot three-dimensional point cloud by adopting a principal component analysis method, and calculating a mass center and covariance to obtain a covariance matrix; the method specifically comprises the following steps:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i)
in the formula, P_c denotes the centroid coordinates of the three-dimensional point cloud; n denotes the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) denotes the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (P_i − P_c)(P_i − P_c)^T, where P_i = (x_i, y_i, z_i)
in the formula, C_p denotes the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors; the specific formula is:
C_p = U_p · D_p · V_p^T
in the formula, U_p denotes the eigenvector matrix of C_p C_p^T; D_p denotes the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p denotes the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the maximum eigenvalue is the direction of the main axis of the cuboid;
then, the coordinate points are projected onto the direction vectors: the inner product of each point's position coordinate P_i with the unit vectors of the coordinate axes is calculated to obtain the maximum and minimum values in each direction; letting a, b and c be the averages of the maximum and minimum values along x, y and z respectively gives the center point O and the side lengths L of the cuboid, so that the tightest cuboid enclosing the tea tender shoot is generated;
the specific formulas are:
a = (max(P_i·X) + min(P_i·X)) / 2,  b = (max(P_i·Y) + min(P_i·Y)) / 2,  c = (max(P_i·Z) + min(P_i·Z)) / 2;
O = aX + bY + cZ;
L_x = max(P_i·X) − min(P_i·X),  L_y = max(P_i·Y) − min(P_i·Y),  L_z = max(P_i·Z) − min(P_i·Z);
in the formulas, X is the unit vector of the coordinate axes in the x direction, Y is the unit vector in the y direction, and Z is the unit vector in the z direction; L_x, L_y and L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices with the smallest coordinates in the y direction of the cuboid are determined and taken as the four vertices of the bottom face of the cuboid; finally, the coordinate of the center point of the bottom face, i.e. the picking point, is obtained as the average of the coordinates of these four vertices.
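A minimal NumPy sketch of the PCA-based cuboid fitting and picking-point computation described in step S35 is given below; the function name and the convention of taking the bottom face along the second principal axis are illustrative assumptions:

import numpy as np

def fit_cuboid_and_picking_point(points):
    # Fit a PCA-aligned bounding cuboid to the shoot point cloud and return
    # its eight vertices, its side lengths and the picking point (centre of
    # the bottom face, i.e. the face with the minimum local y coordinate).
    centroid = points.mean(axis=0)
    cov = np.cov((points - centroid).T)            # covariance matrix C_p
    _, _, vt = np.linalg.svd(cov)                  # rows of vt: principal axes X, Y, Z
    proj = (points - centroid) @ vt.T              # inner products with the axes
    p_min, p_max = proj.min(axis=0), proj.max(axis=0)
    lengths = p_max - p_min                        # L_x, L_y, L_z
    corners = np.array([[x, y, z] for x in (p_min[0], p_max[0])
                                  for y in (p_min[1], p_max[1])
                                  for z in (p_min[2], p_max[2])])
    vertices = corners @ vt + centroid             # back to camera coordinates
    bottom = corners[corners[:, 1] == p_min[1]] @ vt + centroid
    picking_point = bottom.mean(axis=0)            # centre of the bottom face
    return vertices, lengths, picking_point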
The above description is only a preferred embodiment of the present invention, and should not be taken as limiting the invention in any way, and any person skilled in the art can make any simple modification, equivalent replacement, and improvement on the above embodiment without departing from the technical spirit of the present invention, and still fall within the protection scope of the technical solution of the present invention.

Claims (6)

1. A tea multi-target detection positioning method based on YoloV5 is characterized in that: the method specifically comprises the following steps:
s01, constructing a tea tender shoot image data set;
s02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea tender shoots;
s03, obtaining tea three-dimensional point cloud based on the training result of the YoloV5 target detection network model in the step S02; then screening out three-dimensional point clouds of tender buds of the tea leaves from the three-dimensional point clouds of the tea leaves; and finally fitting the minimum external cuboid of the tea tender shoot to obtain the accurate position and picking point of the tea tender shoot.
2. The YoloV 5-based multi-target detection and positioning method for tea leaves as claimed in claim 1, wherein: the step S01 is specifically: firstly, collecting tea tender shoot image data by using an RGB-D camera to obtain a color image and a depth image of tea tender shoots; then, marking the color image by using a marking tool, performing data set enhancement operation, expanding the number of data sets, and constructing a tea tender shoot image data set; and finally, dividing the data set into a training set, a testing set and a verification set.
3. The tea multi-target detection and positioning method based on YoloV5 as claimed in claim 1 or 2, characterized in that: the YoloV5 detection network comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the input pictures, an SPP module and a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales by using a grid anchor-based method.
4. The tea multi-target detection and positioning method based on YoloV5 as claimed in claim 1 or 3, characterized in that: the step S02 specifically includes:
s21, firstly, preprocessing the images in the training set in the step S01 and unifying the resolution of all the images in the training set; inputting the preprocessed image into a Backbone module to obtain characteristic graphs with different sizes;
s22, inputting the feature maps of different sizes from the step S21 into the Neck module, where a bidirectional feature pyramid network is adopted for multi-feature fusion in place of the original path aggregation network of the Neck module; the feature maps are successively up-sampled and down-sampled, and spliced through a channel attention mechanism to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
s23, performing back propagation with a combination of multiple loss functions, and updating the gradients to adjust the weight parameters of the model;
and S24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain the YoloV5 target detection network model.
5. The YoloV5-based multi-target detection and positioning method for tea leaves as claimed in claim 1 or 4, wherein: the step S03 specifically includes:
s31, firstly, obtaining the coordinates of the detection frame according to the result of the YoloV5 target detection network model in the step S02, and generating the region of interest of the color image and the corresponding depth image;
s32, obtaining corresponding mapped color image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image and through the coordinate values, the pixel values and the recording distance of the depth image;
s33, obtaining a tea three-dimensional point cloud through coordinate fusion of the color image and the depth image, specifically:
[pinhole-model back-projection formula, shown as an image in the original, mapping the color-image pixel coordinates and the depth value to three-dimensional point-cloud coordinates]
in the formula, (x, y, z) denotes the coordinate system of the three-dimensional point cloud; (u, v) denotes the coordinate system of the color image; d denotes the depth value, obtained from the depth image; f_x, f_y denote the camera focal lengths;
s34, because the generated tea three-dimensional point cloud comprises both the tea tender shoots and their background point cloud, the average distance of the tea three-dimensional point cloud is calculated and taken as a distance threshold; the background points beyond this distance threshold are then filtered out to obtain a preliminarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then applied, with the parameter radius Eps and the minimum number of samples M_p required within the neighborhood set, to cluster the preliminarily segmented three-dimensional point cloud and screen out the tea tender shoot three-dimensional point cloud;
s35, fitting the minimum external cuboid of the tea tender shoots at the position by adopting a principal component analysis method according to the growth postures of the tea tender shoots; then calculating to obtain coordinates of each vertex of the cuboid; and obtaining the coordinate of the central point of the bottom surface of the cuboid by calculating the average value of the four vertexes of the bottom surface of the cuboid, and taking the point as a picking point of the tender shoots of the tea leaves.
6. The YoloV5-based multi-target detection and positioning method for tea leaves as claimed in claim 5, wherein: the step S35 specifically includes:
firstly, screening three main directions, namely x, y and z directions, of the tea tender shoot three-dimensional point cloud by adopting a principal component analysis method, and calculating a mass center and covariance to obtain a covariance matrix; the method specifically comprises the following steps:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i)
in the formula, P_c denotes the centroid coordinates of the three-dimensional point cloud; n denotes the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) denotes the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (P_i − P_c)(P_i − P_c)^T, where P_i = (x_i, y_i, z_i)
in the formula, C_p denotes the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors; the specific formula is:
C_p = U_p · D_p · V_p^T
in the formula, U_p denotes the eigenvector matrix of C_p C_p^T; D_p denotes the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p denotes the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the maximum eigenvalue is the direction of the main axis of the cuboid;
thereafter, the coordinate points are projected onto the direction vectors: the inner product of each point's position coordinate P_i with the unit vectors of the coordinate axes is calculated to obtain the maximum and minimum values in each direction; letting a, b and c be the averages of the maximum and minimum values along x, y and z respectively gives the center point O and the side lengths L of the cuboid, so that the tightest cuboid enclosing the tea tender shoot is generated;
the specific formulas are:
a = (max(P_i·X) + min(P_i·X)) / 2,  b = (max(P_i·Y) + min(P_i·Y)) / 2,  c = (max(P_i·Z) + min(P_i·Z)) / 2;
O = aX + bY + cZ;
L_x = max(P_i·X) − min(P_i·X),  L_y = max(P_i·Y) − min(P_i·Y),  L_z = max(P_i·Z) − min(P_i·Z);
in the formulas, X is the unit vector of the coordinate axes in the x direction, Y is the unit vector in the y direction, and Z is the unit vector in the z direction; L_x, L_y and L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices with the smallest coordinates in the y direction of the cuboid are determined and taken as the four vertices of the bottom face of the cuboid; finally, the coordinate of the center point of the bottom face, i.e. the picking point, is obtained as the average of the coordinates of these four vertices.
CN202210866833.0A 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves Active CN115272791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210866833.0A CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210866833.0A CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Publications (2)

Publication Number Publication Date
CN115272791A true CN115272791A (en) 2022-11-01
CN115272791B CN115272791B (en) 2023-05-26

Family

ID=83768705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210866833.0A Active CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Country Status (1)

Country Link
CN (1) CN115272791B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115943809A (en) * 2023-03-09 2023-04-11 四川省农业机械研究设计院 Tea picking optimization method and system based on quality evaluation
CN116138036A (en) * 2023-03-24 2023-05-23 仲恺农业工程学院 Secondary positioning method for picking young buds of famous tea

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080693A (en) * 2019-11-22 2020-04-28 天津大学 Robot autonomous classification grabbing method based on YOLOv3
CN113223091A (en) * 2021-04-29 2021-08-06 达闼机器人有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN113901874A (en) * 2021-09-09 2022-01-07 江苏大学 Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm
CN114529799A (en) * 2022-01-06 2022-05-24 浙江工业大学 Aircraft multi-target tracking method based on improved YOLOV5 algorithm
CN114731840A (en) * 2022-04-07 2022-07-12 仲恺农业工程学院 Double-mechanical-arm tea picking robot based on machine vision

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080693A (en) * 2019-11-22 2020-04-28 天津大学 Robot autonomous classification grabbing method based on YOLOv3
CN113223091A (en) * 2021-04-29 2021-08-06 达闼机器人有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN113901874A (en) * 2021-09-09 2022-01-07 江苏大学 Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm
CN114529799A (en) * 2022-01-06 2022-05-24 浙江工业大学 Aircraft multi-target tracking method based on improved YOLOV5 algorithm
CN114731840A (en) * 2022-04-07 2022-07-12 仲恺农业工程学院 Double-mechanical-arm tea picking robot based on machine vision

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YATAO LI ET AL: "In-field tea shoot detection and 3D localization using an RGB-D camera", 《COMPUTERS AND ELECTRONICS IN AGRICULTURE》 *
张泽坤等: "Multi-stereo-camera object manipulation system for logistics sorting" (面向物流分拣的多立体摄像头物体操作系统), 《计算机应用》 *
罗陆锋等: "Automatic positioning of picking points for grape harvesting robot in natural environment" (自然环境下葡萄采摘机器人采摘点的自动定位), 《农业工程学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115943809A (en) * 2023-03-09 2023-04-11 四川省农业机械研究设计院 Tea picking optimization method and system based on quality evaluation
CN116138036A (en) * 2023-03-24 2023-05-23 仲恺农业工程学院 Secondary positioning method for picking young buds of famous tea
CN116138036B (en) * 2023-03-24 2024-04-02 仲恺农业工程学院 Secondary positioning method for picking young buds of famous tea

Also Published As

Publication number Publication date
CN115272791B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN109934115B (en) Face recognition model construction method, face recognition method and electronic equipment
CN115272791B (en) YoloV5-based multi-target detection and positioning method for tea leaves
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN115187803B (en) Positioning method for picking process of famous tea tender shoots
CN109102543A (en) Object positioning method, equipment and storage medium based on image segmentation
Wang et al. YOLOv3-Litchi detection method of densely distributed litchi in large vision scenes
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN109325504A (en) A kind of underwater sea cucumber recognition methods and system
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN113409252B (en) Obstacle detection method for overhead transmission line inspection robot
CN112560623B (en) Unmanned aerial vehicle-based rapid mangrove plant species identification method
CN113901874A (en) Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm
CN113408584A (en) RGB-D multi-modal feature fusion 3D target detection method
Zhang et al. Research on spatial positioning system of fruits to be picked in field based on binocular vision and SSD model
CN115082815A (en) Tea bud picking point positioning method and device based on machine vision and picking system
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN113569981A (en) Power inspection bird nest detection method based on single-stage target detection network
CN115719445A (en) Seafood identification method based on deep learning and raspberry type 4B module
CN111241905A (en) Power transmission line nest detection method based on improved SSD algorithm
CN109657540A (en) Withered tree localization method and system
CN116630828B (en) Unmanned aerial vehicle remote sensing information acquisition system and method based on terrain environment adaptation
CN113723833B (en) Method, system, terminal equipment and storage medium for evaluating quality of forestation actual results
CN115995017A (en) Fruit identification and positioning method, device and medium
CN115880477A (en) Apple detection positioning method and system based on deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant