CN114511849A

CN114511849A - Grape thinning identification method based on graph attention network

Info

Publication number: CN114511849A
Application number: CN202111666428.6A
Authority: CN
Inventors: 苏家仪; 韦光亮; 王筱东; 张玉国; 申智辉; 顾小宁
Original assignee: Guangxi Talentcloud Information Technology Co ltd
Current assignee: Guangxi Talentcloud Information Technology Co ltd
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-05-17
Anticipated expiration: 2041-12-30
Also published as: CN114511849B

Abstract

The invention belongs to the field of grape image identification, and particularly relates to a grape thinning fruit identification method based on a graph attention network. According to the method, the segmentation result of the grape fruit grains is obtained through an example segmentation algorithm, a graph structure is established according to the self characteristics of the grape fruit grains and the correlation characteristics between the fruit grains, wherein the self characteristics of the fruit grains are represented by fruit grain position and size information, the size consistency characteristics are obtained through graph attention network learning, the correlation characteristics between the fruit grains are represented by fruit grain distances, the fruit grain gap characteristics are obtained through graph attention network learning, the graph attention network is used for carrying out whole graph classification on grape thinning, the correlation between the graph structure of the fruit grains and the judgment of whether the grape thinning exists is mined, and grape thinning identification is achieved. Compared with the existing grape thinning fruit identification method, the accuracy of automatic identification of the grape thinning fruit is improved.

Description

Grape thinning identification method based on graph attention network

Technical Field

The invention belongs to the field of grape image identification, and particularly relates to a grape thinning fruit identification method based on a graph attention network.

Background

Grape fruit thinning is a very important agricultural operation in the grape planting process, fruit thinning is usually performed in the flowering and fruit setting period, and scissors are used for removing plant diseases and insect pests, deformity and small fruit grains, because the fruit grains consume nutrients, normal growth of other fruit grains is influenced, the fruit grains are different in size and irregular in shape, the fruit is finally influenced, and the economic benefit is reduced.

The fruit thinning condition of the grapes in the garden is judged to be very important for large-scale grape planting enterprises, the traditional method is that the orchard is patrolled manually, the fruit thinning rate of fruit ears is counted, the mode wastes time and energy, and the efficiency is lower. Along with the development of intelligent agricultural machinery, unmanned garden tour car fungible manual work patrols the garden, gathers the grape picture through the camera, carries out fruit thinning discernment and statistical analysis.

The general image classification-based thinning fruit identification method divides grape cluster images into thinned fruits and unshuffled fruits, and annotators perform two-classification annotation by observing the size consistency and the fruit grain gaps of fruit grains, but a model based on a general convolutional neural network is difficult to learn two key unquantized characteristics of the size consistency and the fruit grain gaps of the fruit grains, so that the identification accuracy is low.

Disclosure of Invention

In order to solve the problems, the invention provides a grape thinning fruit identification method based on a graph attention network, which comprises the steps of obtaining a segmentation result of grape fruit grains through an example segmentation algorithm, establishing a graph structure according to self characteristics of the grape fruit grains and correlation characteristics between the grape fruit grains, wherein the self characteristics of the fruit grains are represented by fruit grain positions and size information, learning through the graph attention network to obtain size consistency characteristics, the correlation characteristics between the fruit grains are represented by fruit grain distances, learning through the graph attention network to obtain fruit grain gap characteristics, classifying the whole graph of the grape thinning fruit by using the graph attention network, and mining the correlation between the graph structure of the fruit grains and the judgment of whether thinning fruit exists to realize the grape thinning fruit identification. The specific technical scheme is as follows:

a grape thinning fruit identification method based on a graph attention network comprises the following steps:

step S1, example division data set creation: collecting grape cluster picture data, carrying out instance segmentation and labeling on the fruit grains, labeling a polygonal area of each fruit grain in a grape cluster picture by using a labeling tool, and dividing the collected grape cluster picture data into a training set, a verification set and a test set;

step S2, drawing attention to network data set creation: respectively carrying out thinning classification labeling on the training set and the verification set in the step S1, carrying out analysis and judgment on the grape cluster picture by a labeling person, labeling the grape cluster picture as thinned fruit or not thinned fruit, and establishing a graph structure to obtain the training set and the verification set of the graph attention network model; the grape ear picture is a complete picture, the complete picture is set to comprise N vertexes, all the vertexes are connected with each other through one edge, and each vertex represents one fruit grain;

step S3, training an example segmentation model: inputting the training set in the step S1 into an example segmentation model for training, and inputting the verification set in the step S1 into a trained example segmentation intermediate model for verification in the training process; when the recognition accuracy of the trained example segmentation intermediate model is greater than or equal to a preset value, outputting the trained example segmentation intermediate model as a final example segmentation model, and if the recognition accuracy of the example segmentation intermediate model is smaller than the preset value, repeating the step S3 until the recognition accuracy of the trained example segmentation intermediate model is greater than or equal to the preset value;

step S4, the attention network construction: constructing a graph attention network model, wherein the graph attention network comprises an input layer, a graph attention module and an output layer; the input layer is of a graph structure containing vertex features, each vertex contains F features, and the features at least comprise an abscissa of a normalized fruit grain center point, an ordinate of the normalized fruit grain center point and a normalized fruit grain pixel area; representing the size consistency of the fruit grains through the pixel area of the normalized fruit grains, and representing the gap of the fruit grains through the distance between the normalized fruit grains; the distance between the normalized fruit grains is obtained by calculating the abscissa of the center point of the normalized fruit grains and the ordinate of the center point of the normalized fruit grains;

the graph attention module realizes feature learning in a multi-head attention mechanism weighted summation mode; the output layer is used for identifying whether the grapes are thinning fruits or not;

step S5, the graph is aware of the force network loss function construction: adopting cross entropy as a loss function of the whole graph classification training;

step S6, attention network model training: inputting the graph attention network training set processed in the step S2 into the graph attention network model constructed in the step S4, and performing supervised training by adopting the loss function constructed in the step S5, wherein in the training process, the graph attention network verification set in the step S2 is input into the trained graph attention network intermediate model for verification;

when the recognition accuracy of the trained graph attention network intermediate model is larger than or equal to a preset value, outputting the trained graph attention network intermediate model as a final graph attention network model, and if the recognition accuracy of the graph attention network intermediate model is smaller than the preset value, repeating the step S6 until the recognition accuracy of the trained graph attention network intermediate model is larger than or equal to the preset value;

step S7, model reasoning: inputting the grape cluster pictures of the test set in the step S1 into the example segmentation model trained in the step S3 for reasoning to obtain an example segmentation result; and (4) selecting the characteristics of the graph structure of the example segmentation result, and finally inputting the graph structure and the characteristics thereof into the graph attention network model trained in the step S6 to judge whether fruit thinning is carried out, so as to finally obtain a grape fruit thinning whole graph classification result.

Preferably, in the step S1, the collected grape ear picture data is obtained according to a ratio of 0.8: 0.1: the scale of 0.1 is divided into a training set, a validation set and a test set.

Preferably, the step S2 of drawing attention to the creation of the network data set specifically includes: segmenting and labeling the corresponding example of each grape ear picture in the step S1 by adopting a contexHull function of opencv to calculate an ear profile, obtaining the area A of the ear profile through a contextuearea function, and obtaining the width w of the ear profile and the height h of the ear profile through a minArearect function; calculating by using moments function of opencv to obtain an abscissa x of the center point of the fruit grain; calculating by using moments function of opencv to obtain a longitudinal coordinate y of the center point of the fruit grain; the pixel area a of a single fruit grain is calculated by a contourArea function of opencv.

Preferably, the Mask R-CNN model of ResNet-50 is selected by the example segmentation model adopted in the step S3.

Preferably, the characteristics of the input layer in step S4 are calculated by:

the abscissa of the normalized fruit grain center point is calculated by the following formula:

wherein x is the horizontal coordinate of the center of the fruit in the grape fruit ear picture, and w is the width of the outline of the fruit ear in the grape fruit ear picture;

the vertical coordinate of the normalized fruit grain center point is calculated by the following formula:

wherein y is the longitudinal coordinate of the center of the fruit grain in the grape fruit ear picture, and h is the height of the fruit ear profile in the grape fruit ear picture;

the normalized area of the fruit grain pixel is calculated by the following formula:

wherein a is the pixel area of a single fruit grain in the grape fruit ear picture, and A is the area of the fruit ear contour in the grape fruit ear picture.

Preferably, the attention module in step S4 includes L stacked attention layers, and the ith vertex dimension of the (L + 1) th attention layer is F^l+1New feature vector of

The formula is adopted to calculate the following formula:

wherein K represents the number of heads of the multi-head attention mechanism; n is a radical of_iA set of neighbor vertices representing the ith vertex,

attention coefficients of a vertex i and a vertex j in a kth head of an ith graph attention layer are noticed; w^lkA weight matrix representing the kth head of the ith graph attention layer;

representing the jth vertex dimension of the ith graph attention layer as F^lThe feature vector of (2);

attention coefficients of vertex i and vertex j in the k-th head

Calculated by the following formula:

wherein,

representing the ith vertex dimension of the ith graph attention layer as F^lThe feature vector of (2);

representing an attention weight vector; the operator | | represents the feature concatenation; lambda [ alpha ]_ijExpressing the normalized distance between the ith fruit grain and the jth fruit grain;

the normalized distance lambda of the ith fruit grain and the jth fruit grain_ijCalculated by the following formula:

wherein x is_i、x_jRespectively representing the abscissa of the center point of the ith fruit grain and the abscissa of the center point of the jth fruit grain in the grape fruit ear picture; y is_i、y_jRespectively representing the ordinate of the center point of the ith fruit grain and the ordinate of the center point of the jth fruit grain in the grape fruit ear picture.

Preferably, in step S4, the output layer includes 1 full-connection layer for classification, the number of categories is C, the grape thinning fruit is identified as a binary task, C is 2, the output feature of the last graph attention layer of the graph attention module and the weight matrix of the full-connection layer are subjected to matrix multiplication, and normalized to 0-1 by a sigmoid activation function σ, and finally the probability of the grape thinning fruit and the grape non-thinning fruit is obtained, where the calculation method is as follows:

p_c＝σ(W_fcM)； (7)

wherein p is_cRepresenting the probability of being identified as class c, in the range 0-1, c being 1, 2; w_fcRepresenting a full connection weight matrix; m represents the feature mean vector of all vertices of the output feature of the last graph attention layer of the graph attention module.

Preferably, the loss function in step S5 is specifically as follows:

wherein C represents the number of classes, C is 2, C is 1, 2; q. q.s_cOne-hot encoded class labels, q, representing the c-th class_cE {0,1}, q when c is a true class_c1, otherwise q_c＝0；p_cThe probability of being identified as class c is represented, and the range is 0 to 1.

The beneficial effects of the invention are as follows: compared with the existing grape thinning fruit identification method, the grape thinning fruit identification method based on the graph attention network utilizes the example segmentation and the graph attention network whole graph classification algorithm, the size consistency of the fruit grains is represented through the pixel area of the normalized fruit grains, the gap of the fruit grains is represented through the distance between the normalized fruit grains, the relevance between the structure of the graph of the fruit grains and the judgment of whether thinning fruit exists is mined, and the accuracy of automatic grape thinning fruit identification is improved.

Drawings

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings used in the detailed description or the prior art description will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a diagram of the graph attention network architecture of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As shown in fig. 1, the embodiment of the present invention provides a grape thinning fruit identification method based on a graph attention network, including the following steps:

step S1, example division data set creation: collecting grape cluster picture data, carrying out instance segmentation and labeling on the fruit grains, labeling a polygonal area of each fruit grain in a grape cluster picture by using a labeling tool, and dividing the collected grape cluster picture data into a training set, a verification set and a test set; specifically, the collected grape ear picture data is calculated according to the following ratio of 0.8: 0.1: the scale of 0.1 is divided into a training set, a validation set and a test set.

Step S2, drawing attention to network data set creation: and (4) respectively carrying out fruit thinning classification and labeling on the training set and the verification set in the step (S1), analyzing and judging the grape ear pictures by a labeling person, labeling the grape ear pictures as thinned fruits or not thinned fruits, and segmenting and labeling the examples corresponding to each grape ear picture in the step (S1) to establish an ear picture structure to obtain the training set and the verification set of the graph attention network model. The grape cluster picture is a complete picture, all vertexes are connected with each other by one edge, and each vertex represents a fruit grain; assuming that the complete graph includes N vertices, E represents the number of edges, and the number of edges E is calculated by the following formula:

the drawing attention network data set production specifically comprises the following steps: segmenting and labeling the corresponding example of each grape ear picture in the step S1 by adopting a contexHull function of opencv to calculate an ear profile, obtaining the area A of the ear profile through a contextuearea function, and obtaining the width w of the ear profile and the height h of the ear profile through a minArearect function; calculating by using moments function of opencv to obtain an abscissa x of the center point of the fruit grain; calculating by using moments function of opencv to obtain a longitudinal coordinate y of the center point of the fruit grain; the pixel area a of a single fruit grain is calculated through the contourArea function of opencv.

Step S3, training an example segmentation model: inputting the training set in the step S1 into an example segmentation model for training, and inputting the verification set in the step S1 into a trained example segmentation intermediate model for verification in the training process; and if the recognition accuracy of the example segmentation intermediate model is smaller than the preset value, repeating the step S3 until the recognition accuracy of the trained example segmentation intermediate model is larger than or equal to the preset value. The Mask R-CNN model of ResNet-50 is selected according to the adopted example segmentation model.

Step S4, the attention network construction: constructing a graph attention network model, wherein the graph attention network comprises an input layer, a graph attention module and an output layer; the input layer is of a graph structure containing vertex features, each vertex contains F features, and the features at least comprise an abscissa of a normalized fruit grain center point, an ordinate of the normalized fruit grain center point and a normalized fruit grain pixel area; representing the size consistency of the fruit grains through the pixel area of the normalized fruit grains, and representing the gap of the fruit grains through the distance between the normalized fruit grains; and the distance between the normalized fruit grains is calculated by the abscissa of the center point of the normalized fruit grains and the ordinate of the center point of the normalized fruit grains.

The characteristics of the input layer are calculated in the following way:

The graph attention module realizes feature learning in a multi-head attention mechanism weighted summation mode; the method comprises the following specific steps:

the graph attention module comprises 3 stacked graph attention layers, wherein the ith vertex dimension of the L +1 th graph attention layer is F^l+1New feature vector of

Feature learning is realized by a multi-head attention mechanism weighted summation mode, and an ELU activation function is acted onThe multi-head attention mechanism introduces nonlinearity, and is calculated by adopting the following formula:

where L ∈ {1, …, L },

k represents the number of heads of the multi-head attention mechanism, and K is 8 in the embodiment; n is a radical of_iRepresenting the set of neighbor vertices of the ith vertex, wherein the neighbor vertices of the vertex i are all vertices on the graph structure because the grape cluster graph structure constructed in the step S2 is a complete graph;

attention coefficients of the vertex i and the vertex j in the kth head of the ith map attention layer are noted,

W^lka weight matrix representing the kth head of the ith graph attention layer,

representing the jth vertex dimension of the ith graph attention layer as F^lIs determined by the feature vector of (a),

F^lrepresenting the input dimension of the ith graph attention layer, F^l+1The output dimension of the i +1 th graph attention layer is represented.

Attention coefficients of vertex i and vertex j in kth head of ith attention layer

Calculated by the following formula:

wherein,

representing the ith vertex dimension of the ith graph attention layer as F^lIs determined by the feature vector of (a),

a vector of attention weight is represented that is,

the operator | | represents the feature concatenation; lambda [ alpha ]_ijRepresenting the weight between the ith fruit grain and the jth fruit grain, and representing the weight by the distance between the normalized fruit grains, wherein the weight is prior knowledge introduced aiming at the gap characteristics of the grape fruit grains;

the correlation between vertex i and vertex j,

the LeakyReLU activation function introduces non-linearity to the correlation learning and uses softmax for the normalization operation, i.e., the exp part in equation (6).

wherein x is_i、x_jRespectively representing the abscissa of the central point of the ith fruit grain and the abscissa of the central point of the jth fruit grain; y is_i、y_jRespectively representing the ordinate of the center point of the ith fruit grain and the center point of the jth fruit grain.

Attention Module As shown in FIG. 2, the output sizes of the input layers are (B, N, F)¹) B represents the number of sample batches of a single iteration, N represents the number of vertices of the complete graph of the grape ear picture, and F represents the number of vertices of the complete graph of the grape ear picture in the embodiment¹That is, in the first graph attention layer, each vertex includes 3 features, i.e., the abscissa of the normalized fruit piece center point, the ordinate of the normalized fruit piece center point, and the normalized fruit piece pixel area, but may include other features, i.e., F¹If > 3, then:

the input size of the 1 st graph attention layer is (B, N, F)¹) 1 st figure attention layer weight matrix W¹Size is (F)¹,F²K) The output size of the 1 st attention layer is (B, N, F)²K)；

The input size of the 2 nd graph attention layer is (B, N, F)²K) The 2 nd graph attention layer weight matrix W²Size is (F)²K,F³K) The output size of the 2 nd graph attention layer is (B, N, F)³K)；

The input size of the 3 rd graph attention layer is (B, N, F)³K) Weight matrix W for the 3 rd graph attention layer³Size is (F)³K,F³K) The output size of the 3 rd graph attention layer is (B, N, F)³K)；

In this embodiment, F²＝256，F³256, the hidden layer feature dimension of the graph attention network model self-learning is represented, and the input size and the output size of the last layer are the same.

The output layer is used for identifying whether the grapes are thinning fruits or not; the output layer comprises 1 full-connection layer for classification, the category number is C, grape thinning is identified as a binary task, C is 2, the last graph attention layer of the graph attention module, namely the output characteristic of the 3 rd graph attention layer and the weight matrix of the full-connection layer are subjected to matrix multiplication operation, the result is normalized to 0-1 through a sigmoid activation function sigma, and the probability that grape has been thinned and has not been thinned is finally obtained, wherein the calculation mode is as follows:

p_c＝σ(W_fcM)； (8)

wherein p is_cRepresenting the probability of being identified as class c, in the range 0-1, c being 1, 2; w_fcRepresents a linear transformation layer weight matrix that is,

m represents a feature mean vector of all vertices of the output feature of the last graph attention layer of the graph attention module, obtained by calculating a feature mean of all vertices of the output feature of the last graph attention layer of the graph attention module by a dgl.

Finally, obtaining a classification result p with the size of (B, C) through matrix multiplication and activation function calculation_c。

Step S5, the graph is aware of the force network loss function construction: adopting cross entropy as a loss function of the whole graph classification training; the loss function is specifically as follows:

Those of ordinary skill in the art will appreciate that the elements of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A grape thinning fruit identification method based on a graph attention network is characterized by comprising the following steps: the method comprises the following steps:

2. The grape thinning fruit identification method based on the graph attention network as claimed in claim 1, wherein: in the step S1, the collected grape ear picture data is calculated according to a ratio of 0.8: 0.1: the scale of 0.1 is divided into a training set, a validation set and a test set.

3. The grape thinning fruit identification method based on the graph attention network as claimed in claim 1, wherein: the step S2 of making the attention network data set specifically includes: segmenting and labeling the corresponding example of each grape ear picture in the step S1 by adopting a contexHull function of opencv to calculate an ear profile, obtaining the area A of the ear profile through a contextuearea function, and obtaining the width w of the ear profile and the height h of the ear profile through a minArearect function; calculating by using moments function of opencv to obtain an abscissa x of the center point of the fruit grain; calculating by using moments function of opencv to obtain a longitudinal coordinate y of the center point of the fruit grain; the pixel area a of a single fruit grain is calculated by a contourArea function of opencv.

4. The grape thinning fruit identification method based on the graph attention network as claimed in claim 1, wherein: the Mask R-CNN model of ResNet-50 is selected by the example segmentation model adopted in the step S3.

5. The grape thinning fruit identification method based on the graph attention network as claimed in claim 1, wherein: the characteristics of the input layer in step S4 are calculated in the following manner:

wherein x is the abscissa of the center point of the fruit grain in the grape fruit ear picture, and w is the width of the outline of the fruit ear in the grape fruit ear picture;

wherein y is the longitudinal coordinate of the center point of the fruit grain in the grape fruit ear picture, and h is the height of the fruit ear profile in the grape fruit ear picture;

6. The grape thinning fruit identification method based on the graph attention network as claimed in claim 5, wherein: the figure attention module in the step S4 includes L stacked figure attention layers, and the ith vertex dimension of the (L + 1) th figure attention layer is F^l+1New feature vector of

The formula is adopted to calculate the following formula:

the kth head middle vertexi and attention coefficient of vertex j

Calculated by the following formula:

wherein,

7. The grape thinning fruit identification method based on the graph attention network as claimed in claim 6, wherein: in step S4, the output layer includes 1 full connection layer for classification, the number of categories is C, the grape thinning is identified as a binary task, where C is 2, the output feature of the last graph attention layer of the graph attention module and the weight matrix of the full connection layer are subjected to matrix multiplication, and normalized to 0-1 by a sigmoid activation function σ, and finally the probability of the grape thinned fruit and un-thinned fruit is obtained, where the calculation method is as follows:

p_c＝σ(W_fcM)； (7)

8. The grape thinning fruit identification method based on the graph attention network as claimed in claim 1, wherein: the loss function in step S5 is specifically as follows: