CN116205879A - Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method

Info

Publication number
CN116205879A
CN116205879A
Authority
CN
China
Prior art keywords
wheat
area
lodging
mask
cnn model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310167936.2A
Other languages
Chinese (zh)
Inventor
陈鹏
庞春晖
章军
夏懿
王俊峰
张明年
张波
杜健铭
王儒敬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Intelligent Agriculture Collaborative Innovation Research Institute Of China Science And Technology
Original Assignee
Hefei Intelligent Agriculture Collaborative Innovation Research Institute Of China Science And Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Intelligent Agriculture Collaborative Innovation Research Institute Of China Science And Technology filed Critical Hefei Intelligent Agriculture Collaborative Innovation Research Institute Of China Science And Technology
Priority to CN202310167936.2A priority Critical patent/CN116205879A/en
Publication of CN116205879A publication Critical patent/CN116205879A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation
    • G06T2207/30188Vegetation; Agriculture

Abstract

The invention relates to a wheat lodging area estimation method based on unmanned aerial vehicle images and deep learning, which comprises the following steps: collecting wheat field images by an unmanned aerial vehicle; performing preprocessing and data enhancement; constructing a deep learning model, improving the Mask R-CNN model, evaluating it with evaluation indexes, and adjusting the training parameters; inputting the wheat field image to be evaluated into the optimal Mask R-CNN model, which outputs the number of lodged wheat pixels; and obtaining an estimate of the actual lodged area of the wheat from the number of lodged pixels and the actual area represented by a single pixel, calculated from a calibration area. The invention accurately identifies and segments the wheat lodging area and counts its pixels, so that the area of the lodged region can be calculated and the degree of disaster estimated; it can meet the high-throughput operation requirements of the wheat field environment and provides technical support for subsequent disaster treatment and loss assessment.

Description

Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method
Technical Field
The invention relates to the technical field of deep learning and unmanned aerial vehicle remote sensing, in particular to a wheat lodging area estimation method based on unmanned aerial vehicle images and deep learning.
Background
Traditional crop growth monitoring relies on manually entering the field to collect information. This is time-consuming and labor-intensive, has poor timeliness, is limited by geography, weather and other factors, and cannot meet actual needs. On the one hand, the measurements are not necessarily accurate, are strongly affected by subjectivity, and are unsuited to assessing lodging disaster area over large plots; on the other hand, measuring manually deep in the field can cause secondary damage to the wheat.
In addition, the existing approaches to wheat lodging area assessment based on satellite spectral data, on unmanned aerial vehicles, and on traditional machine learning also have shortcomings:
In crop lodging remote-sensing monitoring, the data acquisition times of most medium- and high-resolution satellite remote sensing images are limited, satellite revisit periods are long, and the spatial resolution of the data varies with the sensor. Because of revisit periods, low ground resolution and data acquisition costs, satellite images often cannot provide the high-resolution data needed for precision agriculture in time. In addition, satellite image acquisition is easily affected by cloud cover, and in some agricultural areas the pixel scale of a satellite image is sometimes larger than a single field, which can introduce large errors; such methods therefore cannot achieve real-time, accurate monitoring of crop lodging.
Image data acquired from an unmanned aerial vehicle platform are easily affected by factors such as solar angle, wind speed and cloud shading, and the boundary between lodged and non-lodged areas is blurred and hard to distinguish.
Traditional crop lodging information extraction uses shallow machine learning methods (support vector machines, maximum likelihood, K-means and the like) to extract the lodged area, and these generalize poorly across the different phenotypes produced by crop variety, planting region, fertilization, irrigation and weather factors.
Disclosure of Invention
The invention aims to provide a wheat lodging area estimation method based on unmanned aerial vehicle images and deep learning, which reduces the cost of lodging assessment, greatly shortens its time span, and improves the real-time performance and accuracy of disaster monitoring.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a wheat lodging area estimation method based on unmanned aerial vehicle images and deep learning, comprising the following steps in sequence:
(1) Collecting wheat field images by an unmanned aerial vehicle;
(2) Preprocessing the collected wheat field images;
(3) Performing data enhancement on the preprocessed wheat field images;
(4) Forming a data set from the enhanced data and dividing it in a 7:2:1 ratio into a training set, a validation set and a test set;
(5) Constructing a deep learning model for wheat lodging segmentation by using a Mask R-CNN model, improving the Mask R-CNN model, and training the improved Mask R-CNN model by using a training set to obtain a trained Mask R-CNN model;
(6) Testing the trained Mask R-CNN model by adopting a test set, evaluating the trained Mask R-CNN model by adopting an evaluation index, and adjusting training parameters of the trained Mask R-CNN model according to an evaluation result to obtain an optimal Mask R-CNN model;
(7) Inputting the wheat field image to be evaluated into the optimal Mask R-CNN model, which outputs the number of lodged wheat pixels;
(8) Obtaining an estimate of the actual lodged area of the wheat from the number of lodged pixels and the actual area represented by a single pixel, calculated from the calibration area.
The step (2) specifically comprises the following steps:
(2a) Random cropping: randomly cropping each wheat field image into at least 25 patches of 800 × 800 pixels;
(2b) Image screening: screening the randomly cropped images and removing those that contain no wheat lodging;
(2c) Labeling: marking the wheat lodging areas with the labelme tool by placing points one by one around the region to be labeled and connecting the edge points into a line that finally forms a closed loop, thereby obtaining the annotation information and generating a json file; the lodging area is marked as foreground (denoted by 1) and the other areas as background (denoted by 0), to be used as labels for segmentation training or evaluation.
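As a non-limiting illustration, the following Python sketch rasterizes one such labelme json file into the 0/1 foreground/background mask described in (2c); the use of PIL and numpy, and the 800 × 800 fallback size, are assumptions for this sketch rather than part of the invention.

```python
import json

import numpy as np
from PIL import Image, ImageDraw

def labelme_json_to_mask(json_path: str) -> np.ndarray:
    """Rasterize labelme polygons into a mask: 1 = lodged wheat, 0 = background."""
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    h = ann.get("imageHeight", 800)   # labelme records the image size; 800 x 800 crops here
    w = ann.get("imageWidth", 800)
    mask = Image.new("L", (w, h), 0)  # background = 0
    draw = ImageDraw.Draw(mask)
    for shape in ann.get("shapes", []):
        pts = [tuple(p) for p in shape["points"]]  # the closed-loop edge points
        draw.polygon(pts, outline=1, fill=1)       # foreground = 1
    return np.asarray(mask, dtype=np.uint8)

# Hypothetical usage: mask = labelme_json_to_mask("plot_001.json"); mask.sum() is the lodged-pixel count.
```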
The step (3) specifically comprises the following steps:
(3a) Brightness balance: applying brightness transformations to eliminate brightness deviations caused by illumination changes in the field and by sensor differences;
(3b) Contrast transformation: increasing the contrast of the image;
(3c) Gaussian filtering: applying a Gaussian filter to blur the image;
(3d) Geometric transformation: flipping and scaling the image.
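A minimal sketch of the four enhancement operations above, assuming OpenCV; the particular gain, bias, kernel and scale values are illustrative, not the invention's settings, and geometric transforms must be applied to the label masks as well so that images and labels stay aligned.

```python
import cv2
import numpy as np

def augment(img: np.ndarray) -> list[np.ndarray]:
    """Return augmented variants of one image (apply the flips/scales to masks too)."""
    return [
        cv2.convertScaleAbs(img, alpha=1.0, beta=30),  # (3a) brightness shift
        cv2.convertScaleAbs(img, alpha=1.4, beta=0),   # (3b) contrast stretch
        cv2.GaussianBlur(img, (5, 5), sigmaX=1.5),     # (3c) Gaussian blurring
        cv2.flip(img, 1),                              # (3d) horizontal flip
        cv2.resize(img, None, fx=0.8, fy=0.8),         # (3d) scaling
    ]
```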
In step (5), the Mask R-CNN model comprises a backbone network for extracting a feature map from the input image, a region proposal network, a region-of-interest alignment layer and a region convolutional neural network. The model uses a residual network with a feature pyramid as the backbone for feature extraction. The backbone outputs the feature map to the region proposal network, which generates regions of interest and proposes candidate object bounding boxes. The region-of-interest alignment layer matches each region of interest with the backbone's feature map, aggregates the features and pools them to a fixed size, and then feeds them through a fully connected layer to the region convolutional neural network. This network has three branches: the first classifies wheat lodging with a softmax classifier; the second achieves more accurate target localization with a bounding-box regressor; the third completes the contour segmentation of the lodged wheat with a fully convolutional network and generates a mask. Finally, the outputs of the branches are combined into an image containing the categories, localization bounding boxes and segmentation masks;
the improvement of the Mask R-CNN model specifically comprises the following steps:
(5a) Improvement of the region proposal network: to the 3 scales and 3 aspect ratios of the original region proposal network, which yield 9 different target boxes, a 64×64 scale is added, so that 4×3 = 12 anchor boxes are generated (an illustrative anchor-generation sketch follows this list);
(5b) Improvement of the feature pyramid network: the Mask R-CNN model transfers features from bottom to top, shortening the information transfer path.
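As a non-limiting illustration of (5a), the numpy sketch below generates the 4 × 3 anchor grid; the exact scale set (64, 128, 256, 512) and aspect ratios (0.5, 1, 2) are assumptions consistent with common region proposal network defaults, not values confirmed by the patent.

```python
import numpy as np

def make_anchors(scales=(64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> np.ndarray:
    """Anchors as (x1, y1, x2, y2) centered at the origin, one row per scale/ratio pair."""
    rows = []
    for s in scales:
        for r in ratios:                 # r = height / width; w * h stays s * s
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            rows.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(rows)

print(make_anchors().shape)              # (12, 4): 4 scales x 3 aspect ratios
```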
In step (6), the evaluation indexes comprise precision, recall, F1 value, IoU and accuracy. Precision, recall, the F1 value and IoU evaluate the performance of the trained Mask R-CNN model, while accuracy quantifies the extraction of the lodged area. Precision is the proportion of the predicted lodging area that is actually lodged; recall is the proportion of the actual lodging area that is predicted; the F1 value is the harmonic mean of precision and recall; IoU is the overlap ratio between the predicted and the actual lodging area; accuracy is the proportion of correctly identified area in the total extracted area. Each index takes a value between 0 and 1, and a larger value indicates a better effect;
the formula of the accuracy is as follows:
\[ \text{Accuracy} = \frac{L_t + N_t}{L_t + N_t + L_f + N_f} \]
where L_t is the area correctly identified as lodged wheat, N_t is the area correctly identified as non-lodged wheat, L_f is the lodged wheat area misidentified as non-lodged wheat, and N_f is the non-lodged wheat area misidentified as lodged wheat.
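As a non-limiting illustration, the five indexes can be computed from a predicted and a ground-truth binary mask as in the following numpy sketch (which assumes both classes are present, so no denominator is zero):

```python
import numpy as np

def lodging_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred and gt are equally sized 0/1 masks (1 = lodged wheat)."""
    lt = float(np.sum((pred == 1) & (gt == 1)))  # correctly identified lodged area
    nt = float(np.sum((pred == 0) & (gt == 0)))  # correctly identified non-lodged area
    lf = float(np.sum((pred == 0) & (gt == 1)))  # lodged wheat missed as non-lodged
    nf = float(np.sum((pred == 1) & (gt == 0)))  # non-lodged flagged as lodged
    precision = lt / (lt + nf)
    recall = lt / (lt + lf)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),  # harmonic mean
        "iou": lt / (lt + lf + nf),
        "accuracy": (lt + nt) / (lt + nt + lf + nf),
    }
```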
Step (8) specifically means: multiplying the number of lodged wheat pixels by the actual area represented by a single pixel, calculated from the calibration area, to obtain the estimate of the actual lodged area of the wheat.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, unmanned aerial vehicle photographs of lodged wheat serve as the data set for deep learning; this is rapid, accurate, economical and practical, and enables non-destructive monitoring of wheat lodging. Second, the improved Mask R-CNN model segments the lodged area with strong feature-extraction capability; the trained model segments finely and suits the actual application scenario. Third, combining unmanned aerial vehicle image information with deep learning to assess the lodged area has practical value: compared with other assessment modes, the method improves model accuracy, reduces the cost of lodging assessment and greatly shortens its time span, making it suitable for scenarios with high demands on the real-time performance and accuracy of disaster monitoring, such as post-disaster emergency treatment and insurance claims.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the augmented image data set;
FIG. 3 is a diagram of a Mask R-CNN network framework in accordance with the present invention;
FIG. 4 is a schematic diagram of the results of the model test.
Detailed Description
As shown in fig. 1, a wheat lodging area estimation method based on unmanned aerial vehicle image and deep learning includes the following steps in sequence:
(1) Collecting wheat field images by an unmanned aerial vehicle;
(2) Preprocessing the collected wheat field images;
(3) Performing data enhancement on the preprocessed wheat field images;
(4) Forming a data set from the enhanced data and dividing it in a 7:2:1 ratio into a training set, a validation set and a test set;
(5) Constructing a deep learning model for wheat lodging segmentation by using a Mask R-CNN model, improving the Mask R-CNN model, and training the improved Mask R-CNN model by using a training set to obtain a trained Mask R-CNN model;
(6) Testing the trained Mask R-CNN model with the test set, identifying the wheat lodging areas in the test-set pictures and evaluating the related qualitative and quantitative indexes; the test effect is shown in figure 4. The trained Mask R-CNN model is evaluated with the evaluation indexes, and its training parameters are adjusted according to the evaluation result to obtain the optimal Mask R-CNN model;
(7) Inputting the wheat field image to be evaluated into the optimal Mask R-CNN model, which outputs the number of lodged wheat pixels;
(8) Obtaining an estimate of the actual lodged area of the wheat from the number of lodged pixels and the actual area represented by a single pixel, calculated from the calibration area.
The step (2) specifically comprises the following steps:
(2a) Random cropping: randomly cropping each wheat field image into at least 25 patches of 800 × 800 pixels;
(2b) Image screening: screening the randomly cropped images and removing those that contain no wheat lodging;
(2c) Labeling: marking the wheat lodging areas with the labelme tool by placing points one by one around the region to be labeled and connecting the edge points into a line that finally forms a closed loop, thereby obtaining the annotation information and generating a json file; the lodging area is marked as foreground (denoted by 1) and the other areas as background (denoted by 0), to be used as labels for segmentation training or evaluation.
The step (3) specifically comprises the following steps:
(3a) Brightness balance: applying brightness transformations to eliminate brightness deviations caused by illumination changes in the field and by sensor differences;
(3b) Contrast transformation: increasing the contrast of the image;
(3c) Gaussian filtering: applying a Gaussian filter to blur the image;
(3d) Geometric transformation: flipping and scaling the image.
In step (5), as shown in fig. 3, the Mask R-CNN model comprises a backbone network for extracting a feature map from the input image, a region proposal network, a region-of-interest alignment layer and a region convolutional neural network. The model uses a residual network with a feature pyramid as the backbone for feature extraction. The backbone outputs the feature map to the region proposal network, which generates regions of interest and proposes candidate object bounding boxes. The region-of-interest alignment layer matches each region of interest with the backbone's feature map, aggregates the features and pools them to a fixed size, and then feeds them through a fully connected layer to the region convolutional neural network. This network has three branches: the first classifies wheat lodging with a softmax classifier; the second achieves more accurate target localization with a bounding-box regressor; the third completes the contour segmentation of the lodged wheat with a fully convolutional network and generates a mask. Finally, the outputs of the branches are combined into an image containing the categories, localization bounding boxes and segmentation masks;
the improvement of the Mask R-CNN model specifically comprises the following steps:
(5a) Improvement of the region proposal network: to the 3 scales and 3 aspect ratios of the original region proposal network, which yield 9 different target boxes, a 64×64 scale is added, so that 4×3 = 12 anchor boxes are generated;
(5b) Improvement of the feature pyramid network: the Mask R-CNN model transfers features from bottom to top, shortening the information transfer path. The original Mask R-CNN model transfers high-level semantic features top-down and fuses them with low-level spatial features to improve the classification capability of the feature pyramid. For the accurate localization that wheat lodging requires, this is changed to a bottom-up method: the low-level spatial features are transferred upward, the information transfer path is shortened, and the ability of the bottom-level features to localize accurately is improved (a hedged sketch of this bottom-up path follows).
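A hedged PyTorch sketch of such a bottom-up path, in the spirit of path aggregation networks: low-level spatial features are propagated upward along a shortened path and fused with the pyramid outputs. The channel width of 256 and the stride-2 3 × 3 convolutions are assumptions, not the invention's exact layers.

```python
import torch
import torch.nn as nn

class BottomUpPath(nn.Module):
    """Fuse FPN levels p2..p5 bottom-up into n2..n5 (PANet-style sketch)."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # a stride-2 conv halves the resolution when stepping up one pyramid level
        self.down = nn.ModuleList([nn.Conv2d(channels, channels, 3, stride=2, padding=1)
                                   for _ in range(3)])
        self.fuse = nn.ModuleList([nn.Conv2d(channels, channels, 3, padding=1)
                                   for _ in range(3)])

    def forward(self, p2, p3, p4, p5):
        n2 = p2                                   # start from the highest-resolution level
        n3 = self.fuse[0](self.down[0](n2) + p3)  # pass bottom-level detail upward
        n4 = self.fuse[1](self.down[1](n3) + p4)
        n5 = self.fuse[2](self.down[2](n4) + p5)
        return n2, n3, n4, n5

# Example with assumed FPN shapes for an 800 x 800 input:
feats = BottomUpPath()(torch.rand(1, 256, 200, 200), torch.rand(1, 256, 100, 100),
                       torch.rand(1, 256, 50, 50), torch.rand(1, 256, 25, 25))
```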
In step (6), the evaluation indexes comprise precision, recall, F1 value, IoU and accuracy. Precision, recall, the F1 value and IoU evaluate the performance of the trained Mask R-CNN model, while accuracy quantifies the extraction of the lodged area. Precision is the proportion of the predicted lodging area that is actually lodged; recall is the proportion of the actual lodging area that is predicted; the F1 value is the harmonic mean of precision and recall; IoU is the overlap ratio between the predicted and the actual lodging area; accuracy is the proportion of correctly identified area in the total extracted area. Each index takes a value between 0 and 1, and a larger value indicates that the Mask R-CNN model segments the wheat lodging area better.
The formula of the accuracy is as follows:
\[ \text{Accuracy} = \frac{L_t + N_t}{L_t + N_t + L_f + N_f} \]
where L_t is the area correctly identified as lodged wheat, N_t is the area correctly identified as non-lodged wheat, L_f is the lodged wheat area misidentified as non-lodged wheat, and N_f is the non-lodged wheat area misidentified as lodged wheat.
Step (8) specifically means: multiplying the number of lodged wheat pixels by the actual area represented by a single pixel, calculated from the calibration area, to obtain the estimate of the actual lodged area of the wheat.
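As a non-limiting illustration of steps (7) and (8), the sketch below runs a stock torchvision Mask R-CNN (not the improved network of the invention) on one 800 × 800 crop and counts the lodged pixels; the 0.5 confidence and mask thresholds and the commented weight path are assumptions.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(num_classes=2)   # background + lodged wheat
model.eval()
# model.load_state_dict(torch.load("lodging_maskrcnn.pth"))  # trained weights (hypothetical path)

image = torch.rand(3, 800, 800)                # stand-in for one preprocessed 800 x 800 crop
with torch.no_grad():
    pred = model([image])[0]                   # dict with boxes, labels, scores, masks

keep = pred["scores"] > 0.5                    # drop low-confidence detections
masks = pred["masks"][keep] > 0.5              # soft masks -> binary, shape (N, 1, H, W)
lodged_pixels = masks.any(dim=0).sum().item()  # union of instances, each pixel counted once
print(lodged_pixels)                           # input to the area estimate of step (8)
```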
Example 1
1. Image acquisition
Based on the actual situation of the invention combined with the experience of domestic and foreign experts, the experiment uses a DJI Phantom 4 Pro unmanned aerial vehicle with a 350 mm wheelbase, a 20-megapixel camera with a 1-inch CMOS image sensor, a lens with an 84° FOV and an 8.8 mm focal length (24 mm in 35 mm equivalent), and an aperture of f/2.8-f/11. It carries GPS/GLONASS dual-mode positioning; the captured images have a resolution of 5472 × 3078 pixels and an aspect ratio of 16:9. Images were acquired at 10 a.m. in clear, cloudless weather, shooting vertically at a flight height of 30 m, a flight speed of 2 m/s and a flight duration of 20 min, with 80% forward overlap and 80% side overlap; the camera shot at equal time intervals, and 100 original images were finally acquired.
2. Image preprocessing
The photos collected by the unmanned aerial vehicle are screened, images with abnormal attitude angles or imaging problems are removed, and the images shot on the 1st flight route are selected to construct the training and validation sets for image segmentation. The specific steps are as follows:
(1) Random cropping: the invention uses high-throughput unmanned aerial vehicle data with a resolution of 5472 × 3078 pixels; because of hardware limits, images of excessive resolution consume the computer's GPU resources during training. To reduce GPU consumption while keeping the feature information and spatial resolution of the wheat lodging images unaffected, each unmanned aerial vehicle image is randomly cropped into 800 × 800 pixel patches; more than 25 valid images can be obtained from a single frame, and finally 2500 small images are obtained from the 100 originals after random cropping. This lowers the hardware requirements and also improves labeling accuracy, because the details of a small image are magnified when the lodging area is annotated, so the edges of the lodged wheat are easier to observe and trace.
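A minimal sketch of this random cropping, assuming PIL; the patch count of 25 per frame follows the text, while the file handling is illustrative.

```python
import random

from PIL import Image

def random_crops(image_path: str, n: int = 25, size: int = 800) -> list[Image.Image]:
    """Cut n random size x size patches out of one UAV frame (e.g. 5472 x 3078)."""
    img = Image.open(image_path)
    w, h = img.size
    patches = []
    for _ in range(n):
        x = random.randint(0, w - size)   # top-left corner chosen uniformly
        y = random.randint(0, h - size)
        patches.append(img.crop((x, y, x + size, y + size)))
    return patches
```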
(2) Image screening: the training effect of a deep learning model is inseparable from the quality of the data fed into it, so the images obtained in step (1) are screened. The main purpose is to remove images that do not meet the model's requirements, such as images without wheat lodging, or images disturbed during acquisition that show distortion, local overexposure and similar problems.
(3) Labeling: the wheat lodging areas are labeled manually with the labelme tool, tracing the region to be annotated point by point and connecting the edge points into a line that finally forms a closed loop, thereby obtaining the annotation information and generating a json file. The lodging area is marked as foreground (denoted by 1) and the other areas as background (denoted by 0), as labels for segmentation training or evaluation.
3. Data augmentation
To improve the robustness of the segmentation model, reduce its dependence on particular attributes, and avoid overfitting caused by an insufficient amount of training data, several augmentation methods are applied to the original images and labels after annotation is finished. The specific operations are:
(1) Brightness balance: brightness transformations at different levels eliminate brightness deviations caused by illumination changes in the field and by sensor differences;
(2) Contrast transformation: increasing the image contrast so that details at the edges of the lodged wheat are expressed better;
(3) Gaussian filtering: applying a Gaussian filter to blur the image, enhancing the model's generalization to blurred images;
(4) Geometric transformation: flipping, scaling and similar operations improve detection performance by letting the model learn rotation-invariant, multi-scale features.
During annotation, 1800 wheat images were labeled in total, and after augmentation the number of wheat lodging images reached 18000; 12600 of these samples were randomly selected as the training set, and the validation and test sets contain 3600 and 1800 samples respectively. Fig. 2 shows augmented images of 4 wheat plots.
The data set for model training and testing consists of images with a resolution of 5472 × 3078 pixels and an aspect ratio of 16:9, taken at a height of 30 meters with the DJI Phantom 4 Pro unmanned aerial vehicle. To calculate the actual lodged area from the model output, with the height and other factors kept constant, a small 6 m × 1.5 m plot, i.e. the calibration area, corresponds to 257500 pixels of the unmanned aerial vehicle remote-sensing image. From this it can be calculated that about 28600 pixels in the image correspond to an actual area of 1 m², i.e. a single pixel in the image corresponds to about 0.35 cm² on the ground.
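The calibration arithmetic can be checked with a short sketch (the numbers are those given above; the million-pixel usage example is hypothetical):

```python
CALIB_AREA_M2 = 6.0 * 1.5          # calibration strip on the ground, m^2
CALIB_PIXELS = 257_500             # pixels covering that strip in the image

area_per_pixel_m2 = CALIB_AREA_M2 / CALIB_PIXELS   # ~3.5e-5 m^2, i.e. ~0.35 cm^2
pixels_per_m2 = CALIB_PIXELS / CALIB_AREA_M2       # ~28,600 pixels per m^2

def lodged_area_m2(lodged_pixel_count: int) -> float:
    """Estimated actual lodged area = lodged pixels x area per pixel (step (8))."""
    return lodged_pixel_count * area_per_pixel_m2

print(lodged_area_m2(1_000_000))   # e.g. ~34.95 m^2 for a million lodged pixels
```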
In summary, high-definition unmanned aerial vehicle images of lodged wheat are used as the data source, and a deep learning model for wheat lodging segmentation is built on Mask R-CNN; the lodged area is accurately identified and segmented, its pixels are counted, its area is calculated and the degree of disaster is estimated. The invention uses an unmanned aerial vehicle to photograph wheat lodging in the field, uses data enhancement to simulate the complex field environment under natural conditions, and estimates the lodged area; combining unmanned aerial vehicle image information with deep learning can meet the high-throughput operation requirements of the wheat field environment and provides technical support for subsequent disaster treatment and loss assessment.

Claims (6)

1. A wheat lodging area estimation method based on unmanned aerial vehicle images and deep learning, characterized by comprising the following steps in sequence:
(1) Collecting wheat field images by an unmanned aerial vehicle;
(2) Preprocessing the collected wheat field images;
(3) Performing data enhancement on the preprocessed wheat field images;
(4) Forming a data set from the enhanced data and dividing it in a 7:2:1 ratio into a training set, a validation set and a test set;
(5) Constructing a deep learning model for wheat lodging segmentation by using a Mask R-CNN model, improving the Mask R-CNN model, and training the improved Mask R-CNN model by using a training set to obtain a trained Mask R-CNN model;
(6) Testing the trained Mask R-CNN model by adopting a test set, evaluating the trained Mask R-CNN model by adopting an evaluation index, and adjusting training parameters of the trained Mask R-CNN model according to an evaluation result to obtain an optimal Mask R-CNN model;
(7) Inputting the wheat field image to be evaluated into the optimal Mask R-CNN model, which outputs the number of lodged wheat pixels;
(8) Obtaining an estimate of the actual lodged area of the wheat from the number of lodged pixels and the actual area represented by a single pixel, calculated from the calibration area.
2. The unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method of claim 1, characterized in that step (2) specifically comprises the following steps:
(2a) Random cropping: randomly cropping each wheat field image into at least 25 patches of 800 × 800 pixels;
(2b) Image screening: screening the randomly cropped images and removing those that contain no wheat lodging;
(2c) Labeling: marking the wheat lodging areas with the labelme tool by placing points one by one around the region to be labeled and connecting the edge points into a line that finally forms a closed loop, thereby obtaining the annotation information and generating a json file; the lodging area is marked as foreground (denoted by 1) and the other areas as background (denoted by 0), to be used as labels for segmentation training or evaluation.
3. The unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method of claim 1, characterized in that step (3) specifically comprises the following steps:
(3a) Brightness balance: applying brightness transformations to eliminate brightness deviations caused by illumination changes in the field and by sensor differences;
(3b) Contrast transformation: increasing the contrast of the image;
(3c) Gaussian filtering: applying a Gaussian filter to blur the image;
(3d) Geometric transformation: flipping and scaling the image.
4. The unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method of claim 1, characterized in that in step (5) the Mask R-CNN model comprises a backbone network for extracting a feature map from the input image, a region proposal network, a region-of-interest alignment layer and a region convolutional neural network; the model uses a residual network with a feature pyramid as the backbone for feature extraction; the backbone outputs the feature map to the region proposal network, which generates regions of interest and proposes candidate object bounding boxes; the region-of-interest alignment layer matches each region of interest with the backbone's feature map, aggregates the features and pools them to a fixed size, and then feeds them through a fully connected layer to the region convolutional neural network; the region convolutional neural network comprises a first branch, a second branch and a third branch, wherein the first branch classifies wheat lodging with a softmax classifier, the second branch achieves more accurate target localization with a bounding-box regressor, and the third branch completes the contour segmentation of the lodged wheat with a fully convolutional network and generates a mask; finally, the outputs of the branches are combined into an image containing the categories, localization bounding boxes and segmentation masks;
the improvement of the Mask R-CNN model specifically comprises the following steps:
(5a) Improvement of the region proposal network: to the 3 scales and 3 aspect ratios of the original region proposal network, which yield 9 different target boxes, a 64×64 scale is added, so that 4×3 = 12 anchor boxes are generated;
(5b) Improvement of the feature pyramid network: the Mask R-CNN model transfers features from bottom to top, shortening the information transfer path.
5. The unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method of claim 1, characterized in that in step (6) the evaluation indexes comprise precision, recall, F1 value, IoU and accuracy; precision, recall, the F1 value and IoU evaluate the performance of the trained Mask R-CNN model, and accuracy quantifies the extraction of the lodged area; precision is the proportion of the predicted lodging area that is actually lodged; recall is the proportion of the actual lodging area that is predicted; the F1 value is the harmonic mean of precision and recall; IoU is the overlap ratio between the predicted and the actual lodging area; accuracy is the proportion of correctly identified area in the total extracted area; each index takes a value between 0 and 1, and a larger value indicates a better effect;
the formula of the accuracy is as follows:
\[ \text{Accuracy} = \frac{L_t + N_t}{L_t + N_t + L_f + N_f} \]
where L_t is the area correctly identified as lodged wheat, N_t is the area correctly identified as non-lodged wheat, L_f is the lodged wheat area misidentified as non-lodged wheat, and N_f is the non-lodged wheat area misidentified as lodged wheat.
6. The unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method of claim 1, characterized in that step (8) specifically means: multiplying the number of lodged wheat pixels by the actual area represented by a single pixel, calculated from the calibration area, to obtain the estimate of the actual lodged area of the wheat.
CN202310167936.2A 2023-02-22 2023-02-22 Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method Pending CN116205879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310167936.2A CN116205879A (en) 2023-02-22 2023-02-22 Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310167936.2A CN116205879A (en) 2023-02-22 2023-02-22 Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method

Publications (1)

Publication Number Publication Date
CN116205879A true CN116205879A (en) 2023-06-02

Family

ID=86514365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310167936.2A Pending CN116205879A (en) 2023-02-22 2023-02-22 Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method

Country Status (1)

Country Link
CN (1) CN116205879A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117789067A (en) * 2024-02-27 2024-03-29 山东字节信息科技有限公司 Unmanned aerial vehicle crop monitoring method and system based on machine learning
CN117789067B (en) * 2024-02-27 2024-05-10 山东字节信息科技有限公司 Unmanned aerial vehicle crop monitoring method and system based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination