CN116703932A - CBAM-HRNet model wheat spike grain segmentation and counting method based on convolution attention mechanism - Google Patents


Info

Publication number
CN116703932A
CN116703932A (application number CN202310604004.XA)
Authority
CN
China
Prior art keywords
wheat
image
model
counting
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310604004.XA
Other languages
Chinese (zh)
Inventor
许鑫 (Xu Xin)
耿庆 (Geng Qing)
马新明 (Ma Xinming)
乔红波 (Qiao Hongbo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Agricultural University
Original Assignee
Henan Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Agricultural University filed Critical Henan Agricultural University
Priority to CN202310604004.XA
Publication of CN116703932A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30181 Earth observation
    • G06T2207/30188 Vegetation; Agriculture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30242 Counting objects in image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a CBAM-HRNet model wheat spike grain segmentation and counting method based on a convolutional attention mechanism, relating to the technical field of segmentation and counting. The method comprises the following steps: first, data-set images are collected and preprocessed to form a wheat spike data set; then wheat grains are segmented with a deep learning segmentation network, a prediction model is obtained through training, the prediction model is called to test the test set, and prediction results are output; finally, a spike grain counting model is constructed by combining the output prediction results with image processing techniques, realizing accurate prediction and counting of wheat spike grains. The invention constructs a CBAM-HRNet deep learning model for wheat spike grain segmentation and counting based on a convolutional attention mechanism, and builds the grain counting model using an image processing algorithm and the texture characteristics of wheat spikes. The method achieves a better wheat grain segmentation effect, better robustness and segmentation precision, and stronger generalization capability.

Description

CBAM-HRNet model wheat spike grain segmentation and counting method based on convolution attention mechanism
Technical Field
The invention belongs to the technical field of segmentation and counting, and particularly relates to a CBAM-HRNet model wheat spike grain segmentation and counting method based on a convolutional attention mechanism.
Background
Scientific and accurate prediction of wheat yield helps ensure the security of the grain supply and social stability. Traditional wheat yield estimation is obtained by manually surveying the spike number and grains per spike before harvest and multiplying them by the customary thousand-grain weight; the estimate is often affected by human factors, and the process is time-consuming, labor-intensive and inefficient, restricting the timeliness and accuracy of wheat yield estimation. The grain number is an important parameter in estimating crop yield and has become a key scientific problem for intelligent yield estimation.
With the development of image processing and machine learning technologies, important monitoring means have become available for segmenting and identifying wheat spikes and grains. Although image processing techniques are widely applied to identifying wheat spikes and grains, these methods focus on extracting texture, color and morphological features, and have problems in efficiency and practical application.
the Fernandez-Gallego and the like calculate the wheat head number for the RGB color image of the field by adopting a local maximum peak method, and the counting success rate is higher than 90 percent. Current image processing techniques require a large number of artificial image feature extraction, which places high demands on the environment and technology, while machine learning has proven to be a significant advantage in the field of image segmentation and recognition. Liu Zhe and the like propose a wheat spike counting algorithm based on color feature K-means clustering, and the recognition accuracy reaches 94%. Xu X and the like automatically extract the contour features of the wheat ears based on a K-means clustering algorithm, and further construct a CNN model to improve the identification accuracy of the wheat ears to 98.3%. However, the traditional image processing technology and the machine learning method still have the defects of long recognition and segmentation time and low efficiency; the problems of poor recognition and segmentation effects of complex images and the like;
the existing deep learning method is widely adopted to identify the wheat spike number and the wheat spike number, so that higher-precision image segmentation and identification can be realized, but professional equipment such as a CMOS camera is required for acquiring the wheat spike image, and the method is difficult to apply in complex production; and the problems of image recognition and segmentation of dense small targets, adhesion between targets easily occurs, and accuracy is difficult to improve are solved.
Disclosure of Invention
Aiming at the problems that existing wheat grain recognition, segmentation and counting methods take a long time to recognize and segment, are inefficient, perform poorly on complex images, and that in images of dense small targets adhesion easily occurs between targets and accuracy is difficult to improve, the invention provides a CBAM-HRNet model wheat spike grain segmentation and counting method based on a convolutional attention mechanism.
The invention solves the technical problems by adopting the scheme that: a CBAM-HRNet model wheat grain segmentation and counting method based on a convolution attention mechanism comprises the following steps:
step one, data acquisition: respectively selecting a plurality of wheat strains of different varieties, acquiring a plurality of original wheat spike images of various varieties, and creating a wheat spike information table based on the different varieties;
step two, preprocessing the wheat ear image in the step one: the method comprises the steps of data normalization processing and data enhancement to form a wheat head data set;
step three, performing image segmentation on wheat grains with a deep learning segmentation network, training to obtain a prediction model, calling the prediction model to test the test set, and outputting prediction results: CBAM-HRNet (based on a convolutional attention mechanism), HRNet, PSPNet, DeeplabV3+ and U-Net segmentation models are constructed and compared on wheat grain segmentation; the CBAM-HRNet realizes information interaction among branches through parallel multi-resolution branches, achieving strong semantic information and accurate position information while avoiding the loss of a large amount of effective information during repeated up- and down-sampling; the convolutional attention mechanism is added to the up-sampling process of the characterization branches;
step four, constructing a spike grain counting model by combining the prediction results output in step three with image processing techniques, realizing accurate prediction and counting of wheat spike grains: after the wheat spike samples selected for each variety are predicted by the deep learning segmentation model, some grains adhere to one another, and the overlapping and adhering parts need to be eliminated by image processing, comprising the following steps:
s401, GRAY processing is carried out on the prediction result in the step three, and the color space conversion is changed from RGB to GRAY;
s402, performing binarization processing on the image threshold value to remove an overlapped part; the binarized image needs corrosion transformation to eliminate noise points, and calculates the distance from a pixel point in the image to the nearest zero pixel point, and the skeleton of the outline is obtained after the distance transformation;
s403, converting the dimensionalized expression into a dimensionless expression by using normalization on the binarized image to form a scalar, and normalizing to obtain the gray value of the image between 0 and 1.0;
s404, processing the gray level image into a binarized image through binarization and on operation;
s405, extracting the contour according to the boundary point drawing shape provided by the binarized image, wherein the extracted contour is the spike grain number at one side of the wheat.
According to the above CBAM-HRNet model wheat grain segmentation and counting method based on the convolutional attention mechanism, when acquiring the wheat spike image in step one, the image acquisition equipment is held parallel to the wheat spike and the object distance is changed by adjusting the vertical height until the spike appears completely within the field of view of the mobile device's lens and a clear spike image is displayed.
In the above CBAM-HRNet model wheat spike grain segmentation and counting method based on the convolutional attention mechanism, the wheat spike information table comprises the nitrogen fertilizer treatment, shooting background, shooting date, weather, resolution, image size, shooting equipment, focal length and number of images.
According to the above CBAM-HRNet model wheat grain segmentation and counting method based on the convolutional attention mechanism, in step two the image size is normalized to 480 × 480, which reduces the model's computation and the risk of overfitting.
According to the above CBAM-HRNet model wheat grain segmentation and counting method based on the convolutional attention mechanism, in step two data enhancement expands the original data set to solve the problem of insufficient image data; meanwhile, Gaussian blur is used to reduce image noise and detail, enhancing the image's appearance at different scales.
In the above CBAM-HRNet model wheat grain segmentation and counting method based on the convolutional attention mechanism, in step two the two segmentation classes, wheat grain and background, are labeled manually with the Labelme image annotation tool and converted into mask images from the labeling information; the spike images and mask images form the data set required by the deep learning segmentation model; the images of the different treatments are evenly distributed in the data set, and the wheat spike data set is divided proportionally into a training set and a validation set.
According to the CBAM-HRNet model wheat grain segmentation and counting method based on the convolution attention mechanism, a network main body of the CBAM-HRNet comprises four stages and four parallel convolution branches, and the resolutions are 1/4, 1/8, 1/16 and 1/32 respectively; the first stage contains 4 bottleneck layer residual units, each unit is followed by a 3 x 3 convolution, the number of feature maps is changed to 32, and the other stages are the same; each module contains 4 residual units, each unit providing two 3 x 3 convolutions for each resolution, followed by a BN layer and a nonlinear activation function ReLU, with a multi-resolution fusion module at the end of each stage.
According to the above CBAM-HRNet model wheat spike grain segmentation and counting method based on the convolutional attention mechanism, in step four the prediction result is read in through OpenCV and NumPy and gray-processed; a threshold of 120 is set for binarization.
According to the above CBAM-HRNet model wheat grain segmentation and counting method based on the convolutional attention mechanism, the total grain number is counted in two ways: either twice the grain count on one side of the spike, or the sum of the grain counts of the two sides.
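The two counting schemes above amount to simple arithmetic on the per-side counts; a minimal sketch (function names are illustrative):

```python
def total_from_one_side(one_side: int) -> int:
    # Scheme 1: spikes are roughly symmetric, so double the one-side count
    return 2 * one_side

def total_from_both_sides(side_a: int, side_b: int) -> int:
    # Scheme 2: sum the grain counts of the two sides
    return side_a + side_b
```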
Compared with the prior art, the invention has the beneficial effects that:
the invention constructs a CBAM-HRNet deep learning model for wheat spike grain segmentation and counting based on a convolutional attention mechanism, and builds a grain counting model using an image processing algorithm and the texture characteristics of wheat spikes, realizing predictive counting of wheat grains. Compared with traditional methods such as the HRNet, PSPNet, DeeplabV3+ and U-Net segmentation models, the method has a better wheat grain segmentation effect, better robustness, further improved segmentation precision, stronger generalization capability and richer semantic information; it alleviates the difficulty of segmenting small-target images and underfitting during training, and the grain counting model predicts wheat grain numbers more quickly and accurately, providing algorithmic support for efficient, intelligent wheat yield estimation;
aiming at problems such as the complex semantic information of wheat spike images and severe adhesion and occlusion among grains, the invention adds a convolutional attention mechanism to the original HRNet model, improving the efficiency of feature extraction, preventing the weights from being too random and accelerating training; the prediction of the CBAM-HRNet model based on the convolutional attention mechanism is clearly better than that of the compared network models, and the model is more robust;
the method can be used to estimate wheat grain numbers per spike and improves the efficiency of wheat yield estimation; meanwhile, it can provide agricultural workers with a fast, automatic, high-throughput wheat grain counting system, improving work efficiency. The method is suitable for segmenting and counting wheat grains, can also be applied to segmenting and counting other plants, and therefore has a wide range of application.
Drawings
FIG. 1 is an overall workflow diagram of the present invention;
FIG. 2 is a diagram of a wheat grain segmentation network based on CBAM-HRNet;
FIG. 3 is a view of a PSPNet-based wheat grain segmentation network;
FIG. 4 is a diagram of a wheat grain segmentation network structure based on DeeplabV3+;
FIG. 5 is a diagram of a wheat grain segmentation network structure based on U-Net;
FIG. 6 is a diagram of the training process of the CBAM-HRNet model of the present invention;
FIG. 7 is a diagram of an HRNet training process;
FIG. 8 is a PSPNet training process diagram;
FIG. 9 is a diagram of the DeeplabV3+ training process;
FIG. 10 is a diagram of a U-Net training process;
FIG. 11 is a flow chart of the spike grain counting model of step four of the present invention;
FIG. 12 is a graph showing the segmentation effect of different models on a test set according to the present invention;
FIG. 13 is a graph showing a count result analysis of the method of the present invention;
FIG. 14 is a graph showing analysis of the results of the second counting method of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
Referring to figs. 1-14, the invention provides a technical scheme for a CBAM-HRNet model wheat grain segmentation and counting method based on a convolutional attention mechanism. The invention designed field trials with three varieties, Bainong 307, Xinmai 26 and millet 336, photographing wheat spike images with a mobile terminal to construct a deep learning segmentation model of wheat spike grain number. The segmentation results are processed with image processing techniques and combined with the grain characteristics of the wheat spike to construct a spike grain counting model, realizing predictive counting of wheat grains, so as to obtain fast and efficient segmentation and counting results and accurately estimate wheat yield.
Embodiment one:
the embodiment provides a CBAM-HRNet model wheat grain segmentation and counting method based on a convolution attention mechanism, which comprises the following steps:
step one, data acquisition: several wheat plants of different varieties are selected and imaged with an image acquisition device. In this embodiment, to increase the complexity and diversity of the data set and improve the generalization of the model, two image acquisition modes are adopted: one is detached (ex-situ) sampling, i.e., imaging the wheat spikes in a laboratory environment; the other is in-situ sampling, i.e., imaging the wheat spikes in a field environment.
When acquiring images, the mobile acquisition device is held parallel to the wheat spike, and the object distance is changed by adjusting the vertical height until the spike appears completely within the field of view of the device's lens and a clear spike image is displayed; the original wheat spike images are obtained in this way. To eliminate the influence of different shooting distances on the counted grain number, the distance between the mobile device and the wheat spike is kept essentially constant during acquisition to ensure the consistency of the spike images. For each variety, 30-40 wheat plants under each nitrogen-fertilizer treatment and with different shooting backgrounds were selected, yielding 660 original images; meanwhile, the wheat spike information table shown in Table 1 below was constructed:
table 1: wheat head information table
Step two, preprocessing the wheat ear image acquired in the step one and constructing a wheat ear data set: wherein the image preprocessing includes data normalization processing and data enhancement, in particular,
(1) Data normalization
Data normalization converts all images to a uniform size, which facilitates model training. Because the original wheat spike images are too large and place excessive demands on the equipment, the original images need to be normalized in light of the computing capacity of the equipment and the number and quality of the images; normalizing the image size in the data set to 480 × 480 before model training reduces the model's computation and the risk of overfitting.
(2) Data enhancement
Because the number of images affects model training and too few images easily degrade test accuracy, the original data set is expanded through data enhancement to solve the problem of insufficient images: the data set is enhanced by rotations of 90°, 180° and 270° and by horizontal and vertical flips. Meanwhile, Gaussian blur is used to reduce image noise and detail, and the most suitable blur is found by adjusting the size of the Gaussian convolution kernel; by comparison, the kernel size is set to 5 × 5, enhancing the image's appearance at different scales.
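A hedged NumPy sketch of the augmentations described above (rotations by 90°/180°/270°, horizontal and vertical flips, and a separable 5 × 5 Gaussian blur). The patent presumably applies these with an image library; the function names here are illustrative, and the blur uses reflection padding as an assumption.

```python
import numpy as np

def gaussian_kernel_1d(size=5, sigma=1.0):
    # 1-D Gaussian weights, normalized to sum to 1
    x = np.arange(size) - size // 2
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, size=5, sigma=1.0):
    # Separable Gaussian blur: horizontal pass then vertical pass,
    # with edges handled by reflection padding
    k = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect").astype(float)
    tmp = np.zeros_like(padded)
    for i, w in enumerate(k):                    # horizontal pass
        tmp += w * np.roll(padded, i - pad, axis=1)
    out = np.zeros_like(padded)
    for i, w in enumerate(k):                    # vertical pass
        out += w * np.roll(tmp, i - pad, axis=0)
    return out[pad:-pad, pad:-pad]

def augment(img):
    # The six augmentations named in the embodiment
    return [np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3),
            np.fliplr(img), np.flipud(img), gaussian_blur(img)]
```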
After this image processing, the data set is constructed. This embodiment adopts supervised learning, i.e., a deep learning model is trained with manually labeled data samples to obtain a network model with a certain generalization capability, realizing computer vision tasks such as object classification, object detection and image segmentation.
In this embodiment, when labeling the data samples, the two segmentation classes, wheat grain and background, are labeled manually with the Labelme image annotation tool and converted into mask images from the labeling information. The spike images and mask images form the data set required by the deep learning segmentation model; the images of each of the 3 varieties are evenly distributed in the data set, and the wheat spike data set is split into training and validation sets at a 9:1 ratio, giving 612 training images, 62 validation images and 56 test images.
Thirdly, performing image segmentation on wheat grains by using a deep learning segmentation network, training to obtain a prediction model, calling the prediction model to test a test set, and outputting a prediction result: the segmentation of wheat grains is a classification task under a complex background, pixel points among the wheat grains are similar, and image adhesion is serious, so that the requirements on the analysis force and the global information acquisition capability of a deep learning model are higher. In order to ensure the segmentation precision and the calculation efficiency, the embodiment constructs a CBAM-HRNet, HRNet, PSPNet, deeplabV & lt3+ & gt segmentation model based on a convolution attention mechanism and a U-Net to segment wheat grains,
It should be noted that when the weights are too random during model training, training the network from scratch performs poorly and feature extraction is ineffective; therefore, following the idea of transfer learning, a freeze/unfreeze mechanism is constructed to improve the training effect.
As will be further described below with respect to the above model,
1) With respect to CBAM-HRNet
The CBAM-HRNet is used for realizing information interaction among different branches through parallel branches with multiple resolutions, achieving the purposes of strong semantic information and accurate position information, and avoiding a large amount of effective information from being lost in the continuous up-down sampling process.
The network body of the CBAM-HRNet comprises four stages and four parallel convolution branches with resolutions of 1/4, 1/8, 1/16 and 1/32, respectively. The first stage contains 4 bottleneck layer residual units, each followed by a 3 x 3 convolution, changing the number of feature maps to 32, and the other stages are similar. Each module contains 4 residual units, each providing two 3 x 3 convolutions for each resolution, followed by a BN layer (Batch Normalization) and a nonlinear activation function ReLU, with a multi-resolution fusion module at the end of each stage.
Because of the different resolutions, a low-resolution characterization receiving high-resolution information reduces the resolution by setting the convolution stride, while a high-resolution characterization receiving low-resolution information increases the resolution and channel number through bilinear upsampling followed by a 1 × 1 convolution. For the semantic segmentation task, the characterization branches are structured so that the low-resolution parts are upsampled bilinearly, stacked together, and then fused by a 1 × 1 convolution.
In the up-sampling process of characterizing the branches, a convolution attention mechanism (CBAM) is added, and the CBAM is a combination of a channel attention mechanism and a spatial attention mechanism, so that better effect can be achieved compared with the attention mechanism which only focuses on the channel or the space.
The CBAM processes the input feature layer with the channel attention mechanism and the spatial attention mechanism in turn. The channel attention mechanism performs global average pooling and global max pooling on the input feature layer separately; the two results are each passed through a shared fully-connected layer, added together, and passed through a Sigmoid activation to obtain the weight (between 0 and 1) of each channel of the input feature layer.
After the channel weights are obtained, they are multiplied with the original input feature layer. The spatial attention mechanism then takes the maximum and average values over the channels of each feature point of its input; the results are stacked, the channel number is adjusted with a convolution having a single output channel, and a Sigmoid activation yields the weight (between 0 and 1) of each feature point of the input feature layer.
After the spatial weights are obtained, they are likewise multiplied with their input feature layer. In summary, the input feature map is first multiplied by the channel attention weights, the result is fed to the spatial attention mechanism, and multiplying by the normalized spatial attention weights yields the final feature map.
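The channel-then-spatial attention described above can be sketched in NumPy. This is a simplified illustration, not the patented implementation: the shared fully-connected layers are passed in as plain weight matrices `w1`/`w2` (illustrative names), and a fixed 0.5/0.5 mix of the max and mean maps stands in for the learned 7 × 7 convolution of the spatial branch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W). Global average- and max-pool to (C,), pass each through the
    # shared two-layer MLP, add, and squash to per-channel weights in (0, 1).
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    shared = lambda v: w2 @ np.maximum(w1 @ v, 0)   # shared FC layers with ReLU
    weights = sigmoid(shared(avg) + shared(mx))
    return x * weights[:, None, None]

def spatial_attention(x):
    # Max and mean over the channel axis; a fixed mix replaces the learned conv.
    mx = x.max(axis=0)
    avg = x.mean(axis=0)
    weights = sigmoid(0.5 * mx + 0.5 * avg)         # (H, W) weights in (0, 1)
    return x * weights[None, :, :]

def cbam(x, w1, w2):
    # Channel attention first, then spatial attention, as described above
    return spatial_attention(channel_attention(x, w1, w2))
```

Because both weight maps lie in (0, 1), the module can only attenuate features, never amplify them, which is what makes it a re-weighting (attention) step.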
Referring to fig. 2, for the network structure of the CBAM-HRNet, the CBAM-HRNet can maintain high resolution from beginning to end, information interaction of different branches can supplement information loss caused by reduction of channel number, and can realize self-adaptive attention of a network, and the network architecture design has a remarkable effect on a position-sensitive semantic segmentation task.
2) With respect to PSPNet
PSPNet improves on the FCN: the input image passes through a feature extraction network which, to enlarge the receptive field, adopts a ResNet with dilated (atrous) convolutions, and the extracted features serve as the input of a pyramid pooling module (Pyramid Pooling Module).
The module builds a feature pyramid of depth 4: pooling operations at different scales produce features of different depths from the input features, and a 1 × 1 convolution layer then reduces the feature dimension to 1/4 of the original. Finally, the pyramid features are upsampled directly to the same size as the input features and concatenated with them to obtain the final output feature map.
Referring to fig. 3, a network structure diagram of the PSPNet is shown, wherein the above-mentioned feature merging process is a process of merging a detail feature (shallow feature) and a global feature (deep feature, i.e., context information) of a target.
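A hedged NumPy sketch of the pyramid pooling module described above: pooling at bin sizes 1, 2, 3 and 6 (the standard PSPNet bins, an assumption here), reducing each branch to C/4 channels (channel-group averaging stands in for the learned 1 × 1 convolution), nearest-neighbour upsampling in place of bilinear, and concatenation with the input, so the output has 2C channels.

```python
import numpy as np

def adaptive_avg_pool(x, out_size):
    # x: (C, H, W) -> (C, out_size, out_size) via block averaging
    c, h, w = x.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[:, i, j] = x[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean(axis=(1, 2))
    return out

def upsample_nearest(x, out_h, out_w):
    # Nearest-neighbour upsampling back to the input resolution
    c, h, w = x.shape
    yi = np.arange(out_h) * h // out_h
    xi = np.arange(out_w) * w // out_w
    return x[:, yi][:, :, xi]

def pyramid_pooling(x, bins=(1, 2, 3, 6)):
    c, h, w = x.shape
    branch_c = c // len(bins)        # each branch reduced to C/4 channels
    feats = [x]
    for b in bins:
        pooled = adaptive_avg_pool(x, b)
        # stand-in for the learned 1x1 conv: average channel groups to C/4
        reduced = pooled.reshape(len(bins), branch_c, b, b).mean(axis=0)
        feats.append(upsample_nearest(reduced, h, w))
    return np.concatenate(feats, axis=0)   # (2C, H, W): input + 4 branches
```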
3) DeeplabV3+ segmentation model
As shown in fig. 4, the network structure of the DeeplabV3+ segmentation model is an encoder-decoder structure. The encoder mainly comprises two parts: a backbone and Atrous Spatial Pyramid Pooling (ASPP). The backbone outputs two feature maps: one from the convolution of its last layer, and the other a low-level feature map from an intermediate layer. The ASPP module takes the first output of the backbone as input, applies four atrous convolution blocks with different dilation rates together with a global average pooling block to obtain five groups of feature maps, fuses them, passes the result through a 1 × 1 convolution block, and finally sends it to the decoder module.
The decoder receives the low-level feature map from the Backbone's intermediate layer and the output of the ASPP module as inputs. First, a 1×1 convolution reduces the channel dimension of the low-level feature map. Then, the ASPP output is upsampled by interpolation to the same size as the low-level feature map. Next, the channel-reduced low-level feature map and the interpolated feature map are concatenated and processed by a group of 3×3 convolution blocks. Finally, linear interpolation upsamples the result once more to obtain a prediction map with the same resolution as the original image.
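The atrous convolutions that the ASPP module relies on can be illustrated with a minimal single-channel NumPy sketch (an assumption-laden toy, not the model's implementation): the dilation rate spaces out the kernel taps, enlarging the receptive field without adding weights.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate=1):
    """Valid-mode 2-D correlation of a single-channel image `x` with `kernel`,
    sampling kernel taps `rate` pixels apart (rate=1 is ordinary convolution)."""
    kh, kw = kernel.shape
    # effective footprint of the dilated kernel on the input
    eff_h, eff_w = (kh - 1) * rate + 1, (kw - 1) * rate + 1
    h, w = x.shape
    out = np.zeros((h - eff_h + 1, w - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eff_h:rate, j:j + eff_w:rate]
            out[i, j] = (patch * kernel).sum()
    return out
```

A 3×3 kernel with rate 2 covers a 5×5 footprint; ASPP runs several such rates in parallel so one layer sees context at multiple scales.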
4)U-Net
The network structure of U-Net is shown in fig. 5. U-Net also has an encoder-decoder structure. In the encoder, features are extracted by convolution modules; each convolution module contains two convolution layers, each followed by a BN layer. After each convolution module, a pooling layer halves the spatial resolution of the features while keeping the number of channels unchanged. After 5 convolution modules, high-level feature vectors of the input image are obtained and passed into the decoder.
In the decoder, deconvolution is realized by combining upsampling with convolution to increase the resolution of the feature map, and the extracted features are decoded by 4 decoding modules. The low-level features obtained by the encoder are fused in, ensuring they are fully used and improving the network's segmentation of small targets. Once the features are restored to the input image size, the feature map is sent to a softmax layer to obtain the probability that each pixel belongs to each category; a probability threshold is determined for each category, and a pixel whose probability exceeds the threshold is assigned to that category. This yields the final image segmentation result.
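The per-pixel softmax and thresholding step just described can be sketched as follows. This is a minimal NumPy sketch; the 0.5 threshold is an assumed illustrative value, since the text leaves the per-class threshold to be determined:

```python
import numpy as np

def softmax(logits):
    """Softmax over the class axis of a (num_classes, H, W) logit map."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))  # stabilised
    return e / e.sum(axis=0, keepdims=True)

def classify_pixels(logits, threshold=0.5):
    """Assign each pixel to its most probable class, or -1 (unassigned)
    when the winning probability does not clear the threshold."""
    probs = softmax(logits)
    labels = probs.argmax(axis=0)
    labels[probs.max(axis=0) < threshold] = -1
    return labels
```

For the two-class wheat-grain/background case, the resulting label map is exactly the segmentation mask the counting pipeline consumes.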
The same wheat-ear training set was used to train the CBAM-HRNet model based on the convolutional attention mechanism, HRNet, PSPNet, the DeeplabV3+ segmentation model, and U-Net; figs. 6-10 compare the mIoU and loss values during training. The results show that:
over the iterations, the mIoU values of the models rise steadily and gradually converge as the number of iterations increases. The mIoU of the CBAM-HRNet model based on the convolutional attention mechanism stabilizes at about 0.85, indicating a good segmentation effect on the wheat-ear dataset. Its loss on both the training and validation sets drops rapidly and converges to about 0.021; the network converges quickly, the error never increases abruptly, and the error curve is smooth. The small gap between the errors on the two sets shows that, during gradient computation, the model quickly and accurately finds a suitable gradient direction, with stable performance and a good learning effect. The loss curves of the training and validation sets follow essentially the same trend, indicating good generalization ability.
Step four, constructing a spike-grain counting model by combining the prediction result of step three with image processing techniques, so as to accurately predict and count wheat grains: referring to fig. 11, after the ear samples selected for each variety are predicted by the deep learning segmentation model, some grains remain adhered to one another, and the overlapping and adhered regions must be removed by image processing; the method comprises the following steps:
s401, the prediction result is read in with OpenCV and NumPy and converted from the RGB color space to grayscale. The grayscale image is then binarized with a threshold of 120: pixels above the threshold are set to 255 (white) and pixels below it to 0 (black);
s402, noise points in the binarized image are removed by erosion, and the distance from each pixel to the nearest zero-valued pixel is computed; this distance transform yields the skeleton of the contour, and the subsequent binarization effectively removes the overlapping parts;
s403, the binarized image is normalized, converting the dimensional representation into a dimensionless one so that the gray values of the image lie between 0 and 1.0;
s404, the grayscale image is turned back into a binarized image by thresholding followed by a morphological opening operation;
s405, contours are extracted from the boundary points of the binarized image; the number of extracted contours is the grain count on one side of the wheat ear;
according to the geometric and texture characteristics of wheat, grains generally occur in pairs on the two sides of the rachis, so the grains can be counted in two ways. In the first method, twice the grain count on one side of the ear is taken as the total grain number; in the second, the total is the sum of the counts on the two sides. Combined with the extracted one-side contours, either method gives the grain count directly. The counting model can thus predict the grain number of a wheat ear quickly and accurately, providing algorithmic support for efficient, intelligent wheat yield estimation.
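Steps s401-s405 and counting method 1 can be sketched in pure NumPy. This is a minimal stand-in under stated assumptions: in practice OpenCV's cvtColor, threshold, erode, distanceTransform and findContours would do this work; here connected-component counting stands in for contour extraction, and the distance-transform/normalization/opening refinements of s402-s404 are omitted:

```python
import numpy as np

def rgb_to_gray(img):
    """s401: (H, W, 3) RGB -> grayscale with the usual luma weights."""
    return img[..., 0] * 0.299 + img[..., 1] * 0.587 + img[..., 2] * 0.114

def binarize(gray, threshold=120):
    """s401: pixels above the threshold become 255 (white), the rest 0."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

def erode(binary, k=3):
    """s402: k x k square erosion; a pixel stays white only if its whole
    neighbourhood is white, which removes noise and thins adhered regions."""
    pad = k // 2
    padded = np.pad(binary, pad)
    h, w = binary.shape
    out = np.zeros_like(binary)
    for i in range(h):
        for j in range(w):
            if padded[i:i + k, j:j + k].min() == 255:
                out[i, j] = 255
    return out

def count_regions(binary):
    """s405 stand-in: count 4-connected white regions (one per grain)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for i in range(h):
        for j in range(w):
            if binary[i, j] == 255 and not seen[i, j]:
                count += 1
                stack = [(i, j)]
                seen[i, j] = True
                while stack:            # flood fill one region
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                           and binary[ny, nx] == 255 and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
    return count

def total_grains(one_side_count):
    """Counting method 1: grains come in pairs, so double one side's count."""
    return 2 * one_side_count
```

Counting method 2 simply replaces the final doubling with `count_regions(front) + count_regions(back)` over the two side images.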
Embodiment two:
on the basis of the first embodiment, the present embodiment describes the evaluation of the segmentation accuracy of the segmentation model in the first embodiment:
the segmentation model is mainly evaluated by accuracy, recall rate, class average pixel accuracy and average cross-merging comparison segmentation accuracy. The evaluation index is calculated by the parameters in the confusion matrix. In model accuracy evaluation, a confusion matrix is mainly used to compare a predicted value with a true value, and is calculated by comparing the position of each real pixel with the position of the predicted pixel, specifically,
the accuracy is the proportion of samples whose predicted value equals the true value among all samples:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
the recall is the proportion of samples whose predicted value equals the true value among all positive examples:

Recall = TP / (TP + FN)
the intersection over union (IoU) is a standard metric for evaluating semantic segmentation accuracy:

IoU = TP / (TP + FP + FN)
class-average pixel accuracy is the mean, over classes, of the proportion of correctly classified pixels in each class:

mPA = (1/k) · Σ_{i=1}^{k} P_i
the mean intersection over union is the average of the IoU over all classes:

mIoU = (1/k) · Σ_{i=1}^{k} IoU_i
where TP is a correctly classified positive example; TN a correctly classified negative example; FP a negative example misclassified as positive; FN a positive example misclassified as negative; k the total number of segmentation classes; and P_k the pixel accuracy of class k.
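The indices above can be computed directly from a confusion matrix. A minimal NumPy sketch (note that for segmentation the per-class pixel accuracy P_k equals the per-class recall, TP/(TP+FN)):

```python
import numpy as np

def segmentation_metrics(cm):
    """Accuracy, per-class recall, mPA and mIoU from a (k, k) confusion
    matrix whose rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class i but actually not
    fn = cm.sum(axis=1) - tp          # class i pixels predicted as something else
    recall = tp / (tp + fn)           # per-class pixel accuracy P_k
    iou = tp / (tp + fp + fn)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "recall": recall,
        "mPA": recall.mean(),
        "mIoU": iou.mean(),
    }
```

For the two-class wheat-grain/background task, k = 2 and the matrix is the familiar [[TP, FN], [FP, TN]] layout per class.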
The manually counted ear and grain numbers of each sample are taken as true values, and the numbers obtained by the image segmentation algorithm and the counting model as predicted values. The accuracy of the counting model is quantified by root mean square error, mean absolute error, mean relative error and the coefficient of determination.
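These four counting-accuracy indices can be sketched as follows (standard definitions; the array names are illustrative):

```python
import numpy as np

def counting_metrics(y_true, y_pred):
    """RMSE, MAE, mean relative error and R^2 between manual counts
    (y_true) and model-predicted counts (y_pred)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mre = np.mean(np.abs(err) / y_true)            # relative to manual count
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                     # coefficient of determination
    return rmse, mae, mre, r2
```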
Embodiment three:
this embodiment describes the freeze-thaw mechanism described in embodiment one:
the pre-training weights of the model are common to the different data sets, as the features extracted by the neural network backbone feature extraction portion are common. The pre-training weight is needed to be used in most cases, otherwise, the weight of a trunk part is too random, the feature extraction effect is not obvious, and the network training result is also not good. The training can be quickened by freezing the training, and the weight can be prevented from being damaged.
In the freezing stage, the backbone of the model is frozen and the feature extraction network does not change; the video memory footprint is small and only the rest of the network is fine-tuned, which accommodates machines of different performance. In the thawing stage, the backbone is unfrozen, the feature extraction network changes, the memory footprint is larger, and all parameters of the network are updated.
The training parameters of the freezing stage are as follows: the training generation (init_epoch=0) from which the model currently starts, the iteration number of model Freeze training (freeze_epoch=50), and the batch size of model Freeze training (freeze_batch_size=16);
training parameters of the thawing phase are: the model total number of iterations trained (unfreeze_epoch=300) and the model batch size after thawing (unfreeze_batch_size=8).
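The two-stage schedule with the parameters above can be sketched in plain Python. This is an illustrative skeleton only — `ParamGroup` is a hypothetical stand-in for a layer's weights (in a real framework one would toggle something like PyTorch's `requires_grad`), and the actual optimisation step is elided:

```python
class ParamGroup:
    """Hypothetical container standing in for one layer's weights."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def train(backbone, head, init_epoch=0, freeze_epoch=50, unfreeze_epoch=300,
          freeze_batch_size=16, unfreeze_batch_size=8):
    """Epochs [init_epoch, freeze_epoch): backbone frozen, larger batch.
    Epochs [freeze_epoch, unfreeze_epoch): everything trains, smaller batch."""
    log = []
    for epoch in range(init_epoch, unfreeze_epoch):
        frozen = epoch < freeze_epoch
        for p in backbone:
            p.trainable = not frozen       # freeze/unfreeze the backbone
        for p in head:
            p.trainable = True             # the head always trains
        batch_size = freeze_batch_size if frozen else unfreeze_batch_size
        log.append((epoch, frozen, batch_size))
        # ... one epoch of optimisation over the trainable parameters ...
    return log
```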
Embodiment four:
in order to measure the effectiveness of the model of the present invention, this embodiment trains the different segmentation models with different backbone networks, optimizers and learning rates, and compares their training results and performance with several evaluation indices, as shown in the following table:
according to the above table, the CBAM-HRNet model based on the convolutional attention mechanism, with HRNetV2_W32 as backbone network and Adam as optimizer, achieves the best segmentation accuracy (mIoU = 0.8521); the HRNet model with HRNetV2_W32 as backbone and Adam as optimizer is second (mIoU = 0.851); and PSPNet with MobileNetV2 as backbone and SGD as optimizer is lowest (mIoU = 0.7718). For all four models, training with the Adam optimizer outperforms training with SGD, because Adam adapts the learning rate by combining first-order and second-order momentum, avoiding SGD's slow descent and its tendency to settle in local optima.
Embodiment five:
on the basis of the fourth embodiment, with the optimal backbone networks and Adam as the optimizer, the segmentation ability of the CBAM-HRNet model based on the convolutional attention mechanism, HRNet, U-Net, PSPNet and the DeeplabV3+ segmentation model is compared and analyzed on the wheat-ear test set; the segmentation results are shown in fig. 12;
the figure shows that, for the input wheat-ear image, the segmentation results of PSPNet and the DeeplabV3+ model are not ideal: although ears and background can be separated, the grains are severely adhered, because the gray-level features of neighboring grains are similar, so pixels with similar values are easily merged into one region, whereas the background gray value is far from that of the ear and is easily separated.
HRNet and U-Net segment well but lose some details in complex environments. In contrast, the CBAM-HRNet model based on the convolutional attention mechanism, with HRNetV2_W32 as backbone network and Adam as optimizer, segments the wheat-ear image strongly and is not easily affected by other noise; on this basis it can accurately segment the grains and compute their number.
Embodiment six:
in order to further verify the accuracy of the method, 30 sample images are selected for each wheat variety; grain segmentation and counting are performed on them with the CBAM-HRNet model based on the convolutional attention mechanism and the image processing algorithm, the grain number of each ear is computed by the two counting methods of the first embodiment, and the results are compared with manual counts; figs. 13 and 14 show the counting results of the counting model under the two methods, and the following table evaluates the counting accuracy of wheat grains under the two methods:
Compared with traditional image processing algorithms, the method counts wheat grains more accurately, with lower mean absolute error and mean relative error, and the predicted values fit the true values better. Processing ears of different varieties with this method therefore greatly improves grain-counting precision and realizes automatic, high-precision grain counting for individual wheat ears.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (9)

1. A CBAM-HRNet model wheat spike grain segmentation and counting method based on a convolution attention mechanism is characterized by comprising the following steps of: the method comprises the following steps:
step one, data acquisition: respectively selecting a plurality of wheat strains of different varieties, acquiring a plurality of original wheat spike images of various varieties, and creating a wheat spike information table based on the different varieties;
step two, preprocessing the wheat ear image in the step one: the method comprises the steps of data normalization processing and data enhancement to form a wheat head data set;
thirdly, performing image segmentation on wheat grains by using a deep learning segmentation network, training to obtain a prediction model, calling the prediction model to test a test set, and outputting a prediction result: constructing a CBAM-HRNet, HRNet, PSPNet, deeplabV3+ segmentation model based on a convolution attention mechanism and a U-Net to respectively segment wheat grains and compare results; the CBAM-HRNet is used for realizing information interaction among different branches through parallel branches with multiple resolutions, achieving the purposes of strong semantic information and accurate position information and avoiding a large amount of effective information from being lost in the continuous up-down sampling process; adding the convolution attention mechanism to realize an up-sampling process for characterizing branches;
step four, constructing a spike-grain counting model by combining the prediction result of step three with image processing techniques, so as to accurately predict and count wheat grains: after the ear samples selected for each variety are predicted by the deep learning segmentation model, some grains remain adhered to one another, and the overlapping and adhered parts are removed by image processing, comprising the following steps:
s401, GRAY processing is carried out on the prediction result in the step three, and the color space conversion is changed from RGB to GRAY;
s402, performing binarization processing on the image threshold value to remove an overlapped part; the binarized image needs corrosion transformation to eliminate noise points, and calculates the distance from a pixel point in the image to the nearest zero pixel point, and the skeleton of the outline is obtained after the distance transformation;
s403, converting the dimensionalized expression into a dimensionless expression by using normalization on the binarized image to form a scalar, and normalizing to obtain the gray value of the image between 0 and 1.0;
s404, processing the gray level image into a binarized image through binarization and on operation;
s405, extracting the contour according to the boundary point drawing shape provided by the binarized image, wherein the extracted contour is the spike grain number at one side of the wheat.
2. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: in the first step, when acquiring the wheat-ear image, the image acquisition equipment is held parallel to the wheat ear, and the object distance is changed by adjusting the vertical height, ensuring that the wheat ear appears completely within the field of view of the mobile device's lens and that a clear ear image is displayed.
3. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: the wheat ear data table created in the first step comprises nitrogenous fertilizer treatment, shooting background, shooting date, weather, resolution, image size, shooting equipment, focal length and image quantity information of the wheat ears.
4. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: in the second step, the image size is normalized to 480×80, so as to reduce the model calculation amount and reduce the overfitting risk.
5. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: in the second step, the data enhancement is expanded by utilizing the image in the original data set so as to solve the problem of insufficient image data; and meanwhile, gaussian blur is used for reducing image noise and reducing detail level, so that the image effect of the image under different scale sizes is enhanced.
6. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: in the second step, two types of segmentation objects, namely wheat grains and background, are marked manually through a Labelme image marking tool, and are converted into mask images through marking information; the ear image and the mask image form a data set required by the deep learning segmentation model; the number of the images which are processed differently in the dataset is evenly distributed, and the wheat head dataset is divided into a training set and a verification set according to proportion.
7. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: the network main body of the CBAM-HRNet comprises four stages and four parallel convolution branches, and the resolution is 1/4, 1/8, 1/16 and 1/32 respectively; the first stage contains 4 bottleneck layer residual units, each unit is followed by a 3 x 3 convolution, the number of feature maps is changed to 32, and the other stages are the same; each module contains 4 residual units, each unit providing two 3 x 3 convolutions for each resolution, followed by a BN layer and a nonlinear activation function ReLU, with a multi-resolution fusion module at the end of each stage.
8. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: in the fourth step, the prediction result is read in through OpenCV and NumPy and converted to grayscale; a threshold of 120 is set for binarization.
9. The method for cutting and counting wheat grains based on a CBAM-HRNet model of a convolution attention mechanism according to claim 1, wherein the method comprises the following steps of: the total grain number of the wheat ears is counted in two ways, namely, the double of the grain number of one side of the wheat ears is the total grain number of the wheat ears; and the sum of the grain numbers of the two sides is the total grain number.
CN202310604004.XA 2023-05-22 2023-05-22 CBAM-HRNet model wheat spike grain segmentation and counting method based on convolution attention mechanism Pending CN116703932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310604004.XA CN116703932A (en) 2023-05-22 2023-05-22 CBAM-HRNet model wheat spike grain segmentation and counting method based on convolution attention mechanism


Publications (1)

Publication Number Publication Date
CN116703932A true CN116703932A (en) 2023-09-05

Family

ID=87830406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310604004.XA Pending CN116703932A (en) 2023-05-22 2023-05-22 CBAM-HRNet model wheat spike grain segmentation and counting method based on convolution attention mechanism

Country Status (1)

Country Link
CN (1) CN116703932A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409403A (en) * 2023-12-15 2024-01-16 南京农业大学三亚研究院 Rice spike maturity estimation method based on deep learning
CN117409403B (en) * 2023-12-15 2024-03-19 南京农业大学三亚研究院 Rice spike maturity estimation method based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination