CN114821182A - Rice growth stage image recognition method - Google Patents

Rice growth stage image recognition method

Info

Publication number
CN114821182A
CN114821182A (application CN202210494136.7A)
Authority
CN
China
Prior art keywords
swin
odrl
transformer
rice
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210494136.7A
Other languages
Chinese (zh)
Inventor
吴琪
吴云志
曾涛
乐毅
张友华
余克健
胡楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Agricultural University AHAU
Original Assignee
Anhui Agricultural University AHAU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Agricultural University AHAU filed Critical Anhui Agricultural University AHAU
Priority to CN202210494136.7A
Publication of CN114821182A
Legal status: Pending

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image recognition method for the rice growth stage, comprising the following steps: step 1, collecting rice pictures in the field and from the network as a data set; step 2, dividing the data set into a training set and a test set, and performing preprocessing and data enhancement; step 3, constructing an ODRL-Swin Transformer model; step 4, setting the configuration parameters of the ODRL-Swin Transformer model to the optimal configuration parameters through training; and step 5, outputting the final rice prediction recognition result from the ODRL-Swin Transformer model. The method can improve the accuracy of rice image recognition.

Description

Rice growth stage image recognition method
Technical Field
The invention relates to the field of crop image identification methods, in particular to a rice growth stage image identification method.
Background
To cultivate rice well, its growth stages need to be known, and corresponding planting measures must be taken at each stage. In current smart agriculture, images are generally collected at each growth stage of the rice and then recognized to determine which growth stage the rice is currently in, and corresponding planting measures are taken based on the recognition result. How to determine the current growth state of rice by recognizing rice growth stage images is therefore an important factor in the intelligent management and planting of rice.
As computer technology continues to evolve, more and more of it can be integrated with other fields. In agriculture, accurate, rapid and convenient recognition technology can reduce labor costs and has a positive influence on crop yield. In the prior art, convolutional networks are mostly used as models for detecting and identifying crops; they are convenient to use, but models built mainly on convolutional networks suffer from low accuracy. Transformer-based models perform better than traditional convolutional networks, but training them requires a large data set as support. In agriculture, data sets are often scarce: most must be collected and manually annotated, and producing a large data set requires a substantial investment of manpower and material resources, making it costly. A recognition model that achieves high accuracy on a small-scale data set is therefore currently lacking.
Disclosure of Invention
The invention aims to provide a rice growth stage image recognition method that solves the prior-art problems of low rice recognition accuracy and the need for large data sets during training.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a rice growth stage image recognition method comprises the following steps:
step 1, acquiring a plurality of image data of rice at different growth stages as a data set;
step 2, dividing the data set obtained in the step 1 into a training set and a testing set, respectively preprocessing the data in the training set and the testing set, and then respectively enhancing the data;
step 3, based on the Swin Transformer model, in which the Swin Transformer module consists of block (patch) partition, linear embedding and a plurality of Swin Transformer Blocks, adding optimized dense relative localization (ODRL) to the Swin Transformer Blocks in the Swin Transformer model and adding a dense relative localization loss to the loss function, thereby constructing the ODRL-Swin Transformer model;
the original Swin Transformer module divides the feature map into non-overlapping windows according to the window size, and the elements in a window are called blocks; by learning the relative positions between blocks, optimized dense relative localization enables the Swin Transformer to fuse more spatial information without additional annotation information. In the ODRL-Swin Transformer model, optimized dense relative localization densely samples blocks within an original Swin Transformer window: during sampling, two blocks in the window are selected at random (called an embedding pair), the geometric relative position distance of the embedding pair is calculated, and a multilayer perceptron (MLP) predicts that relative position distance, thereby collecting spatial information; a spatial relative position loss added to the Swin Transformer loss function further guides the calculation of the relative position of the embedding pair;
step 4, training the ODRL-Swin Transformer model constructed in step 3 with the training set from step 2, and adjusting the parameters of the ODRL-Swin Transformer model by combining the training result with the test set from step 2 until the parameters of the ODRL-Swin Transformer model reach the optimal configuration;
and step 5, inputting the rice growth stage image data to be identified into the ODRL-Swin Transformer model under the optimal configuration parameters obtained in step 4, and outputting the predicted recognition result of the rice growth stage image through the ODRL-Swin Transformer model.
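As a rough illustration of the embedding-pair sampling described above, the sketch below draws a random pair of block positions from a window grid and computes a normalized relative-position target. The function names are hypothetical, and the simple |difference|/size normalization stands in for the patent's piecewise sign/α mapping, which appears only as an image in the original filing.

```python
import random

def sample_embedding_pair(H, W, rng=random):
    """Pick two random block positions inside an H x W window grid (an 'embedding pair')."""
    i, j = rng.randrange(H), rng.randrange(W)
    p, h = rng.randrange(H), rng.randrange(W)
    return (i, j), (p, h)

def target_offset(i, j, p, h, H, W):
    """Normalized relative-position target in [0, 1]^2 (simplified stand-in for the
    patent's piecewise mapping with breakpoint alpha)."""
    return abs(i - p) / H, abs(j - h) / W
```

During training the MLP is asked to reproduce these targets from the pair of block embeddings, which is what forces the network to encode spatial layout.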
Further, the preprocessing in step 2 comprises removing duplicate and damaged pictures from the obtained image data of rice at different growth stages, deleting unmatched information in the annotation files, and dividing the data set into a training set, a test set and a validation set in the proportion 7:2:1.
Further, the data enhancement in step 2 includes Mosaic data augmentation, random flipping, scaling, and random cropping.
Furthermore, regarding step 3: the detection accuracy of the Swin Transformer is greatly improved compared with that of a conventional convolutional network, but its detection accuracy on small-scale data sets is not high; optimized dense relative localization and the dense relative localization loss enable the Swin Transformer to perform well on small-scale data sets too.
Further, during the training in step 4, the training set data are input into the ODRL-Swin Transformer model to obtain an output result, the error between the output result and the test set is calculated, and the configuration parameters of the ODRL-Swin Transformer model are then adjusted based on the error until it meets expectations, at which point the configuration parameters of the ODRL-Swin Transformer model are the optimal configuration parameters.
The method detects rice pictures to be recognized based on the ODRL-Swin Transformer model, thereby detecting the different stages of rice. The ODRL-Swin Transformer model used here adds optimized dense relative localization to the Swin Transformer Block of the Swin Transformer, and outputs the final predicted recognition result of the rice growth stage.
The method ensures the accuracy of rice image recognition, achieves accurate recognition of rice from a small-scale data set, and efficiently and accurately identifies the rice growth stage, so that farmers can take the most reasonable planting measures and improve rice yield.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
FIG. 2 is a structural diagram of the ODRL-Swin Transformer model of the present invention.
FIG. 3 is a structural diagram of the Optimized Dense Relative Localization part of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, the image recognition method for rice growth stage of the present invention comprises the following steps:
(1) preparing a data set:
collecting picture data of rice at different growth stages on site and on line, thereby constructing a data set;
(2) processing the data set:
Firstly, the specific labels and the number of images of each rice growth stage are obtained from the data set in step 1, and duplicate and abnormal data are removed. The data are then preprocessed: duplicate and damaged pictures are removed from the collected image data of rice at different growth stages, unmatched information in the annotation files is deleted, and the data set is divided into a training set, a test set and a validation set in the proportion 7:2:1.
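The 7:2:1 split can be sketched as follows. This is an illustrative helper (names and seed are assumptions, not the patent's code):

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle a list of samples and split it into train / test / validation subsets
    in the given proportions (7:2:1 by default, as in the description)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])
```

A fixed seed keeps the split reproducible across training runs.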
(3) Data enhancement:
Data enhancement is performed on the data in the training set and the test set respectively, including Mosaic data augmentation, random flipping (RandomFlip), scaling (Resize) and random cropping (RandomCrop). After data enhancement, the data are padded (Pad) to avoid feature loss and preserve the features of the rice data set.
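Of these augmentations, Mosaic is the least standard, so a minimal sketch may help: four images are cropped and tiled onto one 2 × 2 canvas. The helper names are hypothetical, and real Mosaic implementations typically also randomize the join point and merge the annotations, which is omitted here.

```python
import numpy as np

def center_crop(img, size):
    """Crop a square `size` x `size` patch from the center of an (H, W, C) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def mosaic4(imgs, size=64):
    """Mosaic augmentation sketch: crop four images and tile them on a 2 x 2 canvas."""
    assert len(imgs) == 4
    s = size // 2
    tiles = [center_crop(im, s) for im in imgs]
    top = np.concatenate(tiles[:2], axis=1)     # upper half: left | right
    bottom = np.concatenate(tiles[2:], axis=1)  # lower half: left | right
    return np.concatenate([top, bottom], axis=0)
```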
(4) An ODRL-Swin Transformer model is constructed on the basis of Swin Transformer:
Optimized dense relative localization (ODRL) is added to the Swin Transformer Block in the Swin Transformer model, and a dense relative localization loss is added to the loss function, thereby constructing the ODRL-Swin Transformer model.
In an original Swin Transformer Block, the feature map obtained after block partition and linear embedding is divided into non-overlapping windows according to the window size, and the elements in a window are called blocks. By learning relative positions between blocks, optimized dense relative localization enables the Swin Transformer to fuse more spatial information without additional annotation information. In our ODRL-Swin Transformer model, optimized dense relative localization is implemented by densely sampling blocks in the original Swin Transformer window: two blocks in the window are randomly selected as an embedding pair, the geometric relative position distance of the pair is calculated, and an MLP predicts that distance, thereby collecting spatial information. The implementation is as follows:
given an image x, a Swin Transformer Block in a Swin Transformer model divides a feature graph obtained by Block division and linear embedding by the size of a window 7 × 7, and divides an input feature graph into H × W windows with the same size, wherein H is the number of rows of the divided windows, and W is the number of columns of the divided windows. One of which may be denoted G x ={e i,j } 1≤i≤H,1≤j≤W In which e is i,j ∈R D ,e i,j Representing an embedded block, D is the dimension dividing the window space. i denotes an embedded block of an ith row, and j denotes a jth column of the second embedded block.
For each G_x, the optimized dense relative localization module randomly samples embedding pairs, and for each sampled pair (e_{i,j}, e_{p,h}) computes a 2D normalized translation offset (t_u, t_v)^T ∈ [0, 1]^2 from the row offset (p − i) and the column offset (h − j). [The offset formula is rendered only as an image in the original publication and is not reproduced here.] It is a piecewise function built on sign(), which returns 1 for a positive argument, −1 for a negative argument and 0 for zero; α is the breakpoint of the piecewise function, with a default value of 4. Here p denotes the row index and h the column index of the second embedded block in the pair.
The selected embedding vectors e_{i,j} and e_{p,h} are concatenated, and the concatenated vector is input into the multilayer perceptron (MLP) of the optimized dense relative localization module. The MLP has two hidden layers and two output neurons and predicts the relative distance between position (i, j) and position (p, h) on the grid, where d_u is the predicted distance along the abscissa and d_v the predicted distance along the ordinate. The structure is shown in FIG. 3, and the calculation is:

(d_u, d_v)^T = f(e_{i,j}, e_{p,h})^T
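A minimal NumPy sketch of this predictor f follows: two hidden layers, two output neurons, fed with the concatenation of the two block embeddings. The names, hidden width and initialization are assumptions; in practice this head would be a small trainable module inside the network.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden=64):
    """Two hidden layers and two output neurons (d_u, d_v), as in the description."""
    dims = [d_in, d_hidden, d_hidden, 2]
    return [(rng.standard_normal((a, b)) * 0.02, np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)   # ReLU on the hidden layers
    return x                          # (d_u, d_v)

def predict_offset(params, e_ij, e_ph):
    """Predict the relative offset of an embedding pair from the concatenated vectors."""
    return mlp_forward(params, np.concatenate([e_ij, e_ph]))
```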
Let B denote the batch of pictures processed simultaneously during parallel computation of the model. The dense relative localization loss provided by the invention is:

L_drloc = (1/|B|) Σ_{x∈B} Σ_{(i,j),(p,h)} ( |d_u − t_u| + |d_v − t_v| )

This loss is added to the standard cross-entropy loss L_ce of the Swin Transformer, giving the final total loss:

L = L_ce + λ · L_drloc

λ = 0.5 is used initially in the present invention. Introducing this regularization loss enables the Swin Transformer to learn spatial information without additional manual annotation, so the spatial relationships in an image can be learned effectively without relying on a large data set.
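The loss combination above can be written directly as code. This is a sketch of the stated formulas (an L1 penalty on the predicted offsets plus a weighted sum with cross entropy); the function names are assumptions:

```python
import numpy as np

def drloc_loss(pred, target):
    """L1 dense relative localization loss, averaged over a batch of sampled pairs.
    `pred` and `target` are (N, 2) arrays of (d_u, d_v) and (t_u, t_v)."""
    pred, target = np.asarray(pred), np.asarray(target)
    return np.abs(pred - target).sum(axis=-1).mean()

def total_loss(ce_loss, pred, target, lam=0.5):
    """Total loss = standard cross entropy + lambda * dense relative localization loss,
    with lambda = 0.5 as stated in the description."""
    return ce_loss + lam * drloc_loss(pred, target)
```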
According to the invention, an Optimized Dense Relative Localization module is added to the Swin Transformer Block in the Swin Transformer model, turning it into an ODRL-Swin Transformer Block; the structure of the modified ODRL-Swin Transformer network is shown in FIG. 2. By densely sampling multiple embedding pairs per image and requiring the network to predict their relative positions, the Swin Transformer can learn spatial information without additional manual annotation.
(5) Training an ODRL-Swin transducer model, and setting the optimal configuration parameters of the model:
The output of the trained ODRL-Swin Transformer model is taken as the predicted recognition result of the rice growth stage. The classification error and regression error between the model output and the test set are calculated, and the configuration parameters of the trained ODRL-Swin Transformer model are adjusted to the optimal configuration according to the validation and test results. The rice growth stage images to be identified are then input into the ODRL-Swin Transformer model with the optimal configuration parameters, and the model outputs the final rice prediction recognition result.
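The parameter-adjustment loop described here, train, evaluate against the test set, adjust until the error meets expectations, can be sketched generically. This is a hypothetical helper, not the patent's procedure; `train_fn` and `eval_fn` stand in for model training and test-set error evaluation:

```python
def tune(train_fn, eval_fn, configs, target_error):
    """Try candidate configurations in order; keep the one with the lowest test error,
    stopping early once the error meets the target."""
    best_cfg, best_err = None, float("inf")
    for cfg in configs:
        model = train_fn(cfg)          # train a model under this configuration
        err = eval_fn(model)           # error against the held-out test set
        if err < best_err:
            best_cfg, best_err = cfg, err
        if best_err <= target_error:   # expectation met: stop adjusting
            break
    return best_cfg, best_err
```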
By recognizing images of rice at different stages, the rice image data to be identified are input into the final ODRL-Swin Transformer model, which outputs the detection result, i.e. the growth stage to which the input image belongs, thereby achieving accurate recognition and detection.
The embodiments of the present invention are described only for the preferred embodiments of the present invention, and not for the limitation of the concept and scope of the present invention, and various modifications and improvements made to the technical solution of the present invention by those skilled in the art without departing from the design concept of the present invention shall fall into the protection scope of the present invention, and the technical content of the present invention which is claimed is fully set forth in the claims.

Claims (5)

1. A rice growth stage image recognition method is characterized by comprising the following steps:
step 1, acquiring a plurality of image data of rice at different growth stages as a data set;
step 2, dividing the data set obtained in the step 1 into a training set and a testing set, respectively preprocessing the data in the training set and the testing set, and then respectively enhancing the data;
step 3, based on the Swin Transformer model, in which the Swin Transformer module consists of block (patch) partition, linear embedding and a plurality of Swin Transformer Blocks; adding optimized dense relative localization to the Swin Transformer Block in the Swin Transformer model, and adding a dense relative localization loss to the loss function, thereby constructing the ODRL-Swin Transformer model;
the original Swin Transformer module divides the feature map into non-overlapping windows according to the window size, and the elements in a window are called blocks; by learning the relative positions between blocks, optimized dense relative localization enables the Swin Transformer to fuse more spatial information without additional annotation information. In the ODRL-Swin Transformer model, optimized dense relative localization densely samples blocks within an original Swin Transformer window: during sampling, two blocks in the window are selected at random (called an embedding pair), the geometric relative position distance of the embedding pair is calculated, and a multilayer perceptron (MLP) predicts that relative position distance, thereby collecting spatial information; a spatial relative position loss added to the Swin Transformer loss function further guides the calculation of the relative position of the embedding pair;
step 4, training the ODRL-Swin Transformer model constructed in step 3 with the training set from step 2, and adjusting the parameters of the ODRL-Swin Transformer model by combining the training result with the test set from step 2 until the parameters of the ODRL-Swin Transformer model reach the optimal configuration;
and step 5, inputting the rice growth stage image data to be identified into the ODRL-Swin Transformer model under the optimal configuration parameters obtained in step 4, and outputting the predicted recognition result of the rice growth stage image through the ODRL-Swin Transformer model.
2. The rice growth stage image recognition method as claimed in claim 1, wherein the preprocessing in step 2 comprises removing duplicate and damaged pictures from the obtained image data of rice at different growth stages, deleting unmatched information in the annotation files, and dividing the data set into a training set, a test set and a validation set in the proportion 7:2:1.
3. The rice growth stage image recognition method as claimed in claim 1, wherein the data enhancement in step 2 comprises Mosaic data augmentation, random flipping, scaling and random cropping.
4. The rice growth stage image recognition method as claimed in claim 1, wherein, regarding step 3, the detection accuracy of the Swin Transformer is greatly improved compared with that of a conventional convolutional network, but its detection accuracy on small-scale data sets is not high; optimized dense relative localization and the dense relative localization loss enable the Swin Transformer to perform well on small-scale data sets too.
5. The rice growth stage image recognition method as claimed in claim 1, wherein during the training in step 4, the training set data are input into the ODRL-Swin Transformer model to obtain an output result, the error between the output result and the test set is calculated, and the configuration parameters of the ODRL-Swin Transformer model are then adjusted based on the error until it meets expectations, at which point the configuration parameters of the ODRL-Swin Transformer model are the optimal configuration parameters.
CN202210494136.7A 2022-05-05 2022-05-05 Rice growth stage image recognition method Pending CN114821182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210494136.7A CN114821182A (en) 2022-05-05 2022-05-05 Rice growth stage image recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210494136.7A CN114821182A (en) 2022-05-05 2022-05-05 Rice growth stage image recognition method

Publications (1)

Publication Number Publication Date
CN114821182A (en) 2022-07-29

Family

ID=82512188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210494136.7A Pending CN114821182A (en) 2022-05-05 2022-05-05 Rice growth stage image recognition method

Country Status (1)

Country Link
CN (1) CN114821182A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863403A (en) * 2023-07-11 2023-10-10 仲恺农业工程学院 Crop big data environment monitoring method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351790A1 (en) * 2016-06-06 2017-12-07 The Climate Corporation Data assimilation for calculating computer-based models of crop growth
CN109740483A (en) * 2018-12-26 2019-05-10 南宁五加五科技有限公司 A kind of rice growing season detection method based on deep-neural-network
US20210183045A1 (en) * 2018-08-30 2021-06-17 Ntt Data Ccs Corporation Server of crop growth stage determination system, growth stage determination method, and storage medium storing program
CN113505810A (en) * 2021-06-10 2021-10-15 长春工业大学 Pooling vision-based method for detecting weed growth cycle by using Transformer
CN113610108A (en) * 2021-07-06 2021-11-05 中南民族大学 Rice pest identification method based on improved residual error network
CN114066820A (en) * 2021-10-26 2022-02-18 武汉纺织大学 Fabric defect detection method based on Swin-Transformer and NAS-FPN

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351790A1 (en) * 2016-06-06 2017-12-07 The Climate Corporation Data assimilation for calculating computer-based models of crop growth
US20210183045A1 (en) * 2018-08-30 2021-06-17 Ntt Data Ccs Corporation Server of crop growth stage determination system, growth stage determination method, and storage medium storing program
CN109740483A (en) * 2018-12-26 2019-05-10 南宁五加五科技有限公司 A kind of rice growing season detection method based on deep-neural-network
CN113505810A (en) * 2021-06-10 2021-10-15 长春工业大学 Pooling vision-based method for detecting weed growth cycle by using Transformer
CN113610108A (en) * 2021-07-06 2021-11-05 中南民族大学 Rice pest identification method based on improved residual error network
CN114066820A (en) * 2021-10-26 2022-02-18 武汉纺织大学 Fabric defect detection method based on Swin-Transformer and NAS-FPN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAHUI LIU et al.: "Efficient Training of Visual Transformers with Small Datasets", Retrieved from the Internet <URL:https://arxiv.org/abs/2106.03746v2> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863403A (en) * 2023-07-11 2023-10-10 仲恺农业工程学院 Crop big data environment monitoring method and device and electronic equipment
CN116863403B (en) * 2023-07-11 2024-01-02 仲恺农业工程学院 Crop big data environment monitoring method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN106874688B (en) Intelligent lead compound based on convolutional neural networks finds method
CN111739075A (en) Deep network lung texture recognition method combining multi-scale attention
CN111582401B (en) Sunflower seed sorting method based on double-branch convolutional neural network
CN101520847A (en) Pattern identification device and method
CN110245683B (en) Residual error relation network construction method for less-sample target identification and application
CN110287806A (en) A kind of traffic sign recognition method based on improvement SSD network
CN113256636A (en) Bottom-up parasite species development stage and image pixel classification method
CN114155474A (en) Damage identification technology based on video semantic segmentation algorithm
CN111652039A (en) Hyperspectral remote sensing ground object classification method based on residual error network and feature fusion module
CN112036249A (en) Method, system, medium and terminal for end-to-end pedestrian detection and attribute identification
CN115984543A (en) Target detection algorithm based on infrared and visible light images
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN114821182A (en) Rice growth stage image recognition method
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
Shi et al. Vision-based apple quality grading with multi-view spatial network
CN112949723A (en) Endometrium pathology image classification method
CN116630700A (en) Remote sensing image classification method based on introduction channel-space attention mechanism
CN114519402A (en) Citrus disease and insect pest detection method based on neural network model
CN115511798A (en) Pneumonia classification method and device based on artificial intelligence technology
Jin et al. Intelligent tea sorting system based on computer vision
CN111144422A (en) Positioning identification method and system for aircraft component
CN117593514B (en) Image target detection method and system based on deep principal component analysis assistance
CN117765410B (en) Remote sensing image double-branch feature fusion solid waste identification method and system and electronic equipment
CN116258914B (en) Remote Sensing Image Classification Method Based on Machine Learning and Local and Global Feature Fusion
CN111881743B (en) Facial feature point positioning method based on semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination