CN113705478A - Improved YOLOv5-based mangrove forest single tree target detection method - Google Patents

Improved YOLOv5-based mangrove forest single tree target detection method

Info

Publication number
CN113705478A
Authority
CN
China
Prior art keywords
yolov5
target
mangrove forest
model
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111009370.8A
Other languages
Chinese (zh)
Other versions
CN113705478B (en)
Inventor
马永康
凌成星
刘华
赵峰
张雨桐
曾浩威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Research Institute Of Forest Resource Information Techniques Chinese Academy Of Forestry
Original Assignee
Research Institute Of Forest Resource Information Techniques Chinese Academy Of Forestry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Research Institute Of Forest Resource Information Techniques Chinese Academy Of Forestry filed Critical Research Institute Of Forest Resource Information Techniques Chinese Academy Of Forestry
Priority to CN202111009370.8A priority Critical patent/CN113705478B/en
Publication of CN113705478A publication Critical patent/CN113705478A/en
Application granted granted Critical
Publication of CN113705478B publication Critical patent/CN113705478B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

A mangrove forest single tree target detection method based on improved YOLOv5, combining target detection and image processing techniques in deep learning with a young mangrove single-tree recognition technique using an improved YOLOv5 algorithm, belongs to the field of mangrove forest single tree target detection in forestry scientific research. Target trees in the selected unmanned aerial vehicle images are labeled in sequence with the open-source software LabelImg to construct a mangrove forest single tree data set; YOLOv5 is selected as the basic target detection model and is optimized and improved for the dense distribution and small size of the target trees; the CSPDarknet53 backbone network is improved with an efficient channel attention (ECA) mechanism, which avoids dimensionality reduction while enabling cross-channel interaction and enhances the feature expression capability; and a SoftPool pooling operation is introduced into the SPP module to retain more detailed feature information and improve the automatic target detection precision.

Description

Improved YOLOv5-based mangrove forest single tree target detection method
Technical Field
The invention relates to a mangrove forest single tree target detection method based on improved YOLOv5, which combines target detection and image processing techniques in deep learning with a young mangrove single-tree recognition technique based on an improved YOLOv5 algorithm, and belongs to the field of mangrove forest single tree target detection in forestry scientific research.
Background
Mangrove forests are special forest communities of tropical and subtropical coastal zones and play an important role in improving ecological conditions, maintaining biodiversity and safeguarding ecological security in coastal areas. Mangrove resource surveys and dynamic monitoring are the basis and precondition of scientific protection and management of mangroves, yet the mangrove area in China has been shrinking day by day due to human land reclamation and other causes. To better protect the mangrove ecosystem and enlarge the mangrove area, artificial afforestation has been carried out in many mangrove nature reserves in recent years according to plan, and supplementary planting and artificial afforestation within the original mangrove forests have become the main measures for restoring mangrove resources. The geographical locations of mangrove distribution are special: most mangroves grow on intertidal shoals and are easily submerged by seawater at high tide, and seedlings that do not survive are washed away by the water flow because their root systems have no grip, so the remaining individual mangrove saplings are the surviving trees. In such a special environment, low-altitude unmanned aerial remote sensing is very suitable for image acquisition over mangrove regions because of its flexible data acquisition, low cost and ability to rapidly acquire ultrahigh-resolution images over small areas. Applying unmanned aerial vehicles to the acquisition of mangrove seedling data is therefore one effective way to improve the accuracy of mangrove seedling monitoring.
Therefore, how to detect individual mangrove trees quickly and accurately from unmanned aerial vehicle images has become an urgent problem in checking the survival rate of newly planted mangroves. With the development of target detection and object recognition technologies, deep learning has begun to be widely applied to target detection, with face recognition and speech recognition among the most common applications. To address problems such as insufficient feature extraction capability and the slowdown caused by prediction-box processing, an Anchor-Free lightweight detection algorithm with multi-scale feature fusion has been proposed. A defect detection method based on a scale-invariant feature pyramid has been proposed for the low inspection defect detection accuracy of existing target detection algorithms in complex inspection scenes. However, these methods only improve the algorithms for their corresponding specific scenes. In recent years, deep learning has also gradually been applied to the forestry industry, enabling more accurate, rapid and intelligent forestry monitoring. In an article on a deep-learning-based method for detecting small damaged trees, Zhou Yan proposed a deep-learning-based small-target detection method for damaged trees, aimed at problems in unmanned aerial vehicle forest images such as small tree scale, dense growth and irregular distribution. However, the SSD algorithm (Single Shot MultiBox Detector, a target detection algorithm proposed by Wei Liu at ECCV 2016) discards the low-level features that contain rich information and has low robustness for small-target detection.
The current mainstream deep learning target detection algorithms are divided into two-stage and single-stage algorithms. Unlike two-stage detection algorithms represented by the R-CNN series, YOLO completes feature extraction, candidate-box regression and classification directly in the same unbranched convolutional network, so the network structure is simple and can meet the requirements of real-time detection tasks. The YOLO target detection model has gone through several update iterations, successively addressing problems in multiple dimensions such as multi-target detection, small-target detection, repairing missed detections and multi-scale prediction.
Given the advantages of target detection such as automatic feature learning, high speed and high efficiency, the method applies YOLOv5, the latest algorithm of this series, to individual mangrove tree detection to realize intelligent monitoring of mangrove seedling conditions.
Disclosure of Invention
The invention aims to provide a mangrove forest single tree target detection method based on improved YOLOv5, built on a deep learning method, which addresses problems such as the small and densely distributed individual mangrove tree targets in current unmanned aerial vehicle images and the low degree of automation and low efficiency of their detection, provides technical support for the automatic detection of surviving mangrove seedlings, further improves the precision and efficiency of checking the survival rate of newly planted mangroves, and realizes intelligent monitoring of mangrove seedling conditions.
Aiming at problems such as the small size and dense distribution of individual mangrove tree targets in unmanned aerial vehicle images and the low degree of automation and low efficiency of their detection, a method for detecting individual mangrove tree targets based on improved YOLOv5 is provided on the basis of deep learning, so that individual mangrove trees in unmanned aerial vehicle images can be identified and located quickly, accurately and automatically.
A mangrove forest single tree target detection method based on improved YOLOv5 comprises the following steps:
and sequentially marking target trees on the selected unmanned aerial vehicle images by utilizing open source software LabelImg to construct a mangrove forest single tree data set.
The method selects YOLOv5 as the basic target detection model and optimizes and improves it for the dense distribution and small size of the target trees: the CSPDarknet53 backbone network is improved with an efficient channel attention (ECA) mechanism, which avoids dimensionality reduction while enabling cross-channel interaction and enhances the feature expression capability, and a SoftPool pooling operation is introduced into the SPP module to retain more detailed feature information and improve the automatic target detection precision.
The technical scheme adopted by the invention to solve the technical problems is as follows: a mangrove forest single tree target detection method based on improved YOLOv5 comprises the following steps:
Step 1, because young mangrove forest single trees are in special geographic locations, densely distributed and small in size, data are acquired by means of the unique temporal flexibility and high spatial resolution of the unmanned aerial vehicle to construct the required data set.
Step 2, introducing an efficient channel attention (ECA) module into the CSPDarknet53 feature extraction network of YOLOv5 to improve the YOLOv5 network, and renaming the improved model YOLOv5-ECA.
Step 3, further improving on the basis of step 2 by introducing the SoftPool improved pooling operation into the SPP module of the YOLOv5 network to retain more detailed feature information.
Step 4, inputting the acquired unmanned aerial vehicle mangrove single-tree data set images into the YOLOv5 feature extraction network for feature extraction to obtain feature maps of different scales.
Step 5, classifying and regressing the feature maps obtained in step 4, performing a feature reconstruction operation on the regression result to obtain more refined feature maps, and performing classification and regression again on this basis to calculate the loss.
Step 6, after model training is completed, testing the test set of the divided data set to realize target detection of young mangrove single trees and evaluate the detection effect of the model.
The efficient channel attention of the present invention reduces the complexity of the model by avoiding dimensionality reduction while allowing cross-channel information interaction, and at the same time enhances the expressive power of the features. SoftPool can preserve the expressiveness of the features and its operation is differentiable, so the feature information of the whole receptive field is retained and the accuracy of the algorithm is improved. The method can detect newly planted mangrove single trees quickly, accurately and automatically and has obvious advantages over existing traditional unmanned aerial vehicle mangrove detection methods; with the improvement the target detection precision rises and the training loss is lower, so rapid, accurate and automatic detection of mangrove single-tree targets can be realized and the recognition and positioning capability for mangrove single trees is improved.
Drawings
A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein the accompanying drawings are included to provide a further understanding of the invention and form a part of this specification, and wherein the illustrated embodiments of the invention and the description thereof are intended to illustrate and not limit the invention, as illustrated in the accompanying drawings, in which:
fig. 1 is a schematic diagram of a YOLOv5 network structure.
Fig. 2 is a structure diagram of the efficient channel attention (ECA) module that is introduced.
Fig. 3 is a structure diagram of the SoftPool pooling operation introduced into the SPP module.
FIG. 4 is a schematic illustration of the position of SoftPool in YOLOv5-ECA.
Fig. 5a is a collected image of a portion of a mangrove forest single-tree unmanned aerial vehicle.
Fig. 5b is a collected image of a portion of a mangrove forest single-tree unmanned aerial vehicle.
FIG. 6a is a target label graph.
Fig. 6b is a graph of normalized target position.
Fig. 6c is a graph of normalized target size.
FIG. 7 is a graph of the YOLOv5 and YOLOv5-ECA loss function values as a function of the number of training rounds.
FIG. 8a is a comparison graph of convergence results of various parameters of YOLOv5 and YOLOv5-ECA.
FIG. 8b is a comparison graph of convergence results of various parameters of YOLOv5 and YOLOv5-ECA.
FIG. 8c is a comparison graph of convergence results of various parameters of YOLOv5 and YOLOv5-ECA.
FIG. 8d is a comparison graph of convergence results of various parameters of YOLOv5 and YOLOv5-ECA.
FIG. 8e is a comparison graph of convergence results of various parameters of YOLOv5 and YOLOv5-ECA.
FIG. 8f is a comparison graph of convergence results of various parameters of YOLOv5 and YOLOv5-ECA.
FIG. 9a is a graph showing the detection result of the YOLOv5 detection target.
FIG. 9b is a graph showing the results of detection of a target by YOLOv5.
FIG. 9c is a graph showing the results of detection of the YOLOv5-ECA detection target.
FIG. 9d is a graph showing the results of detection of the YOLOv5-ECA detection target.
FIG. 10 is a flow chart of the steps of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
It will be apparent that those skilled in the art can make many modifications and variations based on the spirit of the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element, component or section is referred to as being "connected" to another element, component or section, it can be directly connected to the other element, component or section, or intervening elements or sections may also be present. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art.
The following examples are further illustrative in order to facilitate the understanding of the embodiments, and the present invention is not limited to the examples.
Example 1: as shown in fig. 1, fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7, fig. 8, fig. 9 and fig. 10, a mangrove forest single tree target detection method based on improved YOLOv5 includes the following steps:
Step 1, because young mangrove forest single trees are in special geographic locations, densely distributed and small in size, data are acquired by means of the unique temporal flexibility and high spatial resolution of the unmanned aerial vehicle to construct the required data set.
Step 2, introducing an efficient channel attention (ECA) module into the CSPDarknet53 feature extraction network of YOLOv5 to improve the YOLOv5 network, and renaming the improved model YOLOv5-ECA.
Step 3, further improving on the basis of step 2 by introducing the SoftPool improved pooling operation into the SPP module of the YOLOv5 network to retain more detailed feature information.
Step 4, inputting the acquired unmanned aerial vehicle mangrove single-tree data set images into the YOLOv5 feature extraction network for feature extraction to obtain feature maps of different scales.
Step 5, classifying and regressing the feature maps obtained in step 4, performing a feature reconstruction operation on the regression result to obtain more refined feature maps, and performing classification and regression again on this basis to calculate the loss.
Step 6, after model training is completed, testing the test set of the divided data set to realize target detection of young mangrove single trees and evaluate the detection effect of the model.
In step 1, to ensure that each detection target in the image is complete, the conventional LabelImg open-source software is used to label the individual mangrove seedlings in the unmanned aerial vehicle images in sequence; the labeled content is the coordinates of a rectangular bounding box around each newly planted mangrove tree, stored as an XML text file for training and testing the YOLOv5 model.
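To make this preprocessing step concrete, the sketch below converts one LabelImg annotation in Pascal VOC XML format into the normalized text format commonly used to train YOLOv5. This conversion step, the file names and the single class id are illustrative assumptions, not part of the patented method.

```python
import xml.etree.ElementTree as ET

def voc_xml_to_yolo(xml_path, class_id=0):
    """Convert one LabelImg (Pascal VOC) XML annotation to YOLO label lines.

    Each output line is "class x_center y_center width height" with all
    coordinates normalized by the image size (assumed single-class data set).
    """
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)

    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        x_c = (xmin + xmax) / 2.0 / img_w
        y_c = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines

# Hypothetical usage: write a YOLO label file next to the XML annotation.
# with open("tile_0001.txt", "w") as f:
#     f.write("\n".join(voc_xml_to_yolo("tile_0001.xml")))
```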
The structure of YOLOv5 is shown in fig. 1. The YOLOv5 network mainly comprises 4 parts: box 1 is the Input end (Input), box 2 is the reference network (Backbone), box 3 is the feature fusion part (Neck), and box 4 is the Prediction part (Prediction). Box 1, the Input end, comprises the preprocessing stage of the training data set images; the Mosaic data enhancement of YOLOv4 is adopted at the input to improve the training speed and network precision of the model.
Box 2 is the reference network (Backbone); the Backbone of the detection network is used to extract the high-, middle- and low-level features of the image.
Box 3 is the feature fusion part (Neck); the Neck is mainly used to generate a feature pyramid. The feature pyramid enhances the model's detection of objects at different scales, so that the same object can be recognized at different sizes and scales.
Box 4 is the Prediction part (Prediction), where convolution is applied again to obtain the prediction result.
In addition to the above 4 parts, the specific contents of blocks A, B, C, D, E and F are as follows:
the input end of the YOLOv5 comprises a preprocessing stage of training data set images, and the Mosaic data enhancement of YOLOv4 is adopted at the input end to improve the training speed and the network precision of the model.
In the reference network, YOLOv5 differs in that a slice structure (Focus) is newly added; as shown in block E of fig. 1, the data are cut by slicing to extract general features.
Block A is the CBL structure, composed of a convolution (Conv), batch normalization (BN) and a Leaky ReLU activation function.
Block B is CSP1_X, consisting of three convolutional layers and X concatenated Res unit modules.
The CSP2_X structure is used in the feature fusion part (Neck); the Res unit module is no longer used and is replaced by CBL, as indicated in block D.
Block C is the SPP module, which mainly performs pooling; the subsequent improvement mainly replaces the original max pooling with SoftPool.
The prediction part is convolved again to obtain a prediction result.
Block F is a residual component in which the Res unit module is composed of two CBL modules; the number of modules is determined according to specific requirements.
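To make the block naming above concrete, here is a minimal PyTorch sketch of the CBL structure and of the Res unit of block F; it illustrates the described composition, and the class names and default kernel sizes are assumptions, not code from the patent.

```python
import torch.nn as nn

class CBL(nn.Module):
    """CBL block: Conv + BatchNorm + Leaky ReLU, the basic unit referred to above."""

    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResUnit(nn.Module):
    """Residual component of block F: two CBL modules plus a skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(CBL(channels, channels, k=1),
                                   CBL(channels, channels, k=3))

    def forward(self, x):
        return x + self.block(x)
```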
The most prominent characteristic of the efficient channel attention (ECA) in step 2 is that dimensionality reduction is avoided while cross-channel interaction is realized, which reduces the complexity of the model and enhances the feature expression capability. The structure of the ECA attention module is shown in FIG. 2. Its principle is that the ECA module generates channel attention through a fast one-dimensional convolution of size k, where the kernel size is determined adaptively by a function of the channel dimension. The feature image x is input with its dimension kept unchanged; after global average pooling of all channels, the ECA module learns features with a weight-sharing one-dimensional convolution, and when learning features each channel interacts with its k neighbors to capture cross-channel interaction. k denotes the kernel size of the fast one-dimensional convolution; the adaptive value of k is obtained from the proportional relation between the coverage of cross-channel information interaction and the channel dimension C, calculated as in formula (1):
$k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \quad (1)$

In the formula: $\gamma = 2$, $b = 1$, $|\cdot|_{\mathrm{odd}}$ denotes the nearest odd number, and C is the channel dimension.
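A minimal PyTorch sketch of an ECA block consistent with the description above is given below: global average pooling followed by a weight-sharing one-dimensional convolution whose kernel size follows formula (1). The class name and default parameters are illustrative, not code from the patent; where such a block is inserted in the CSPDarknet53 backbone is a design choice of the implementation.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: no dimensionality reduction, local cross-channel interaction."""

    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size k = |log2(C)/gamma + b/gamma|_odd, as in formula (1).
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B, C, H, W) -> per-channel descriptor of shape (B, C, 1, 1).
        y = self.avg_pool(x)
        # Weight-sharing 1-D convolution across the channel axis captures the
        # interaction of each channel with its k neighbours.
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        # Re-weight the input feature map channel by channel.
        return x * self.sigmoid(y)

# Example: attention over a 256-channel feature map.
# out = ECA(256)(torch.randn(1, 256, 64, 64))
```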
In step 3, for newly planted mangrove seedlings whose targets are small and whose pixel counts are very low, max pooling and average pooling easily lose information useful for detection when performing feature mapping. Therefore, SoftPool is introduced into the SPP module to improve the pooling operation and retain more detailed feature information, i.e., pooling is realized by applying softmax over the pooling region of the activation feature map. The SoftPool structure is shown in FIG. 3; from left to right, the SoftPool downsampling process consists of: activating the feature map, calculating the final SoftPool pooling result within the region, and outputting the result. The position where it is fused into YOLOv5, in the SPP module, is shown in fig. 4.
Unlike other pooling methods, SoftPool uses softmax (the normalized exponential function) for weighted pooling, which preserves the expressiveness of the features and is differentiable. The gradient can be updated at each back-propagation, and SoftPool comprehensively utilizes every activation factor of the pooling kernel while adding only a small memory footprint. It increases the discrimination of similar feature information while retaining the feature information of the whole receptive field, improving the accuracy of the algorithm.
The core idea of SoftPool is the use of softmax; the feature-value weights of a region R are calculated from the nonlinear feature values:
$w_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}}$

In the formula: $w_i$ is the weight of the i-th activation, $a_i$ is the i-th activation value, R is the pooling region, and e is the mathematical constant that is the base of the natural logarithm.
The weight $w_i$ ensures the transmission of important features, and the feature values in the region R have at least a preset minimum gradient during back-propagation. After obtaining the weight $w_i$, the output is obtained by weighting the feature values in the region R:
$\tilde{a} = \sum_{i \in R} w_i \cdot a_i$

In the formula: R is the pooling region and $\tilde{a}$ is the output value of SoftPool, namely the weighted sum of all the activation factors of the pooling kernel.
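The following sketch implements the SoftPool weighting and weighted-sum formulas above as a straightforward, non-optimized PyTorch function. The function name and default kernel size are assumptions for illustration; inside an SPP module the pooling would normally be applied with stride 1 and padding so that the spatial size is preserved, which is omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def soft_pool2d(x, kernel_size=2, stride=None):
    """SoftPool: softmax-weighted pooling over each pooling region R.

    The weights w_i = exp(a_i) / sum_j exp(a_j) are computed within each
    region, and the output is the weighted sum of the activations.
    """
    stride = stride or kernel_size
    area = kernel_size * kernel_size
    e_x = torch.exp(x)
    # avg_pool2d multiplied by the window area gives the per-region sums.
    weighted_sum = F.avg_pool2d(x * e_x, kernel_size, stride) * area  # sum_i a_i * e^{a_i}
    normalizer = F.avg_pool2d(e_x, kernel_size, stride) * area        # sum_i e^{a_i}
    return weighted_sum / normalizer

# Example: halve the spatial resolution of a 512-channel feature map.
# y = soft_pool2d(torch.randn(1, 512, 32, 32))  # -> shape (1, 512, 16, 16)
```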
In step 4, before feature extraction with the YOLOv5 feature extraction network, mosaic enhancement is applied to the data, the data are then uniformly scaled to a standard size for the Focus slicing operation, and finally they are input into the YOLOv5 feature extraction network for feature extraction.
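For reference, the Focus slicing operation mentioned here rearranges every 2x2 spatial neighborhood into the channel dimension before the first convolution; a minimal sketch mirroring the public YOLOv5 implementation follows, where the subsequent convolution is omitted and the function name is illustrative.

```python
import torch

def focus_slice(x):
    """Focus slicing: (B, C, H, W) -> (B, 4C, H/2, W/2) by interleaved sub-sampling."""
    return torch.cat(
        [x[..., ::2, ::2],     # top-left pixels of every 2x2 block
         x[..., 1::2, ::2],    # bottom-left pixels
         x[..., ::2, 1::2],    # top-right pixels
         x[..., 1::2, 1::2]],  # bottom-right pixels
        dim=1,
    )

# A 512 x 512 RGB input becomes a 12-channel 256 x 256 tensor before the first convolution.
# focus_slice(torch.randn(1, 3, 512, 512)).shape  # torch.Size([1, 12, 256, 256])
```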
In step 6, the detection effect and performance of the model are evaluated with the average precision AP (Average Precision) and the mean average precision mAP, where AP is the area under the precision (Precision)-recall (Recall) curve; since the method detects a single target class, AP is equivalent to mAP.
Area intersection over union (IoU): the position prediction capability of the model is measured by calculating the intersection over union of the rectangular region of the model-predicted target and the rectangular region labeled for the target in the validation set.
Precision represents the proportion of correctly detected targets among all targets detected by the model and reflects the accuracy of the model's target detection.
$P = \frac{TP}{TP + FP}$
Recall represents the proportion of targets correctly detected by the model among the total number of targets and reflects the recall capability of model recognition.
$R = \frac{TP}{TP + FN}$

$AP = \int_0^1 P(R)\,\mathrm{d}R$

In the formula: TP (true positive) is the number of correctly detected positive samples, namely the class of the prediction box is the same as that of the labeled box and IoU is greater than 0.5; FP (false positive) is the number of incorrectly detected positive samples; FN (false negative) is the number of missed targets; R is the recall; AP is the area under the precision-recall curve.
Because precision and recall are both affected by the confidence threshold, evaluating model performance with precision and recall alone would be somewhat unscientific and limited, so the average precision AP is introduced into the experiment as an evaluation index of the recognition performance of the model; it is currently one of the most important indices for evaluating the performance of mainstream target detection algorithms.
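As a simple illustration of these evaluation metrics, the sketch below computes precision, recall and a single-class AP from a list of scored detections and the ground-truth count. The IoU matching at a 0.5 threshold is assumed to have been done beforehand, the integration of the precision-recall curve is a simplified, non-interpolated version, and all names are hypothetical.

```python
import numpy as np

def precision_recall_ap(scores, is_tp, n_ground_truth):
    """Single-class precision, recall and AP (area under the precision-recall curve).

    scores: confidence of each detection; is_tp: 1 if the detection matched a
    ground-truth box with IoU > 0.5, else 0; n_ground_truth: total labeled trees.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp

    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    recall = cum_tp / max(n_ground_truth, 1)

    # Approximate AP = integral of precision over recall (no interpolation).
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return float(precision[-1]), float(recall[-1]), ap

# Example: three detections, two of them correct, four labeled trees in total.
# precision_recall_ap([0.9, 0.8, 0.3], [1, 1, 0], 4)
```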
Example 2: as shown in fig. 1, fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7, fig. 8, fig. 9 and fig. 10, the mangrove forest single tree target detection method based on the improved YOLOv5 algorithm, including the improved YOLOv5 network model, is further described below by taking newly planted mangrove seedlings in the Zhanjiang Mangrove Nature Reserve in Guangdong as an example.
A mangrove forest single tree target detection method based on improved YOLOv5 comprises the following steps:
step 1, data acquisition and data set construction:
In view of the special geographical location and the short, small stature of the newly planted mangroves in the Zhanjiang Mangrove Nature Reserve, 2348 unmanned aerial vehicle images were obtained by continuous multi-breakpoint flight data acquisition over the flight mission area at a height of 120 meters, with a theoretical spatial resolution of 0.05 meters; individual mangrove seedlings can be clearly distinguished in the images, as shown in figs. 5a and 5b, and these images were used to construct the data set required by the invention. The unmanned aerial vehicle, a DJI Phantom 4 RTK produced by Shenzhen DJI Innovation Technology Co., Ltd. and carrying a CCD sensor, is light in design, convenient to operate and performs well. When constructing the data set, clearer original images with a resolution of 5472 x 3648 pixels were selected and segmented into 597 pictures of 512 x 512 pixels by a splitting program written in Python. The commonly used LabelImg open-source software was then used to label the individual mangrove seedlings in the unmanned aerial vehicle images in sequence; the general outline of the target trees is shown in figs. 6a, 6b and 6c, and the relative position distribution and relative target size of the labeled boxes can be seen in figs. 6b and 6c: the target trees are evenly distributed in the images, with target widths mostly accounting for 2%-5% of the picture width and target heights mostly 1%-4%. About 3 million target trees were labeled; the labeled content is the coordinates of the rectangular bounding boxes of the newly planted mangroves, stored as XML text files for training and testing the YOLOv5 model. The constructed data set was then divided into a training set, a validation set and a test set at a ratio of 8:1:1, comprising 477 training images, 60 validation images and 60 test images.
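A sketch of the kind of Python splitting program described here is shown below: it cuts a large unmanned aerial vehicle image into 512 x 512 tiles and partitions a tile list at an 8:1:1 ratio. The file paths, the use of OpenCV for image I/O and the random shuffling are assumptions, not details taken from the patent.

```python
import random
import cv2  # OpenCV, assumed available for image I/O

def split_into_tiles(image_path, tile=512):
    """Cut a large UAV image into non-overlapping tile x tile crops."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(img[y:y + tile, x:x + tile])
    return tiles

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and divide items into train / validation / test subsets (8:1:1)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Hypothetical usage:
# tiles = split_into_tiles("uav_scene_001.jpg")
# train, val, test = split_dataset(tiles)
```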
Step 2, model training environment:
the experimental platform is an autonomous configuration server, a 64-bit Windows10 operating system, an Intel (R) Xeon (R) CPU E5-2630 v3@2.40GHz, an NVIDIATesla K40c video card, a video memory 12G and a memory 16 GB. The invention constructs a network model based on a PyTorch deep learning framework, and the development environment is PyTorch1.4, cuda10.1 and python 3.7. The training process uses an Adam optimizer for training, the initial learning rate is set to be 0.01, single-scale training is adopted in all experiments, and the image input size is 512 x 512 pixels. According to the characteristics of the model, epochs of Yolov5 and Yolov5-ECA training are both 200, and a pre-training model is YOLOv5 x.
Step 3, model improvement and training:
An efficient channel attention (ECA) module is introduced into the CSPDarknet53 feature extraction network of YOLOv5; it avoids dimensionality reduction while capturing cross-channel interaction, reduces the complexity of the model and enhances the feature expression capability, improving the YOLOv5 network, and the improved model is renamed YOLOv5-ECA. SoftPool is introduced into the SPP module of the YOLOv5 network to improve the pooling operation and retain more detailed feature information. The acquired unmanned aerial vehicle mangrove single-tree data set images are input into the YOLOv5 feature extraction network for feature extraction to obtain feature maps of different scales; the obtained feature maps are classified and regressed, a feature reconstruction operation is performed on the regression result to obtain more refined feature maps, classification and regression are performed again on this basis, the loss is calculated, and target detection of the newly planted mangrove single trees of Zhanjiang, Guangdong is completed.
Step 4, analyzing the detection result:
on the basis of completing the model training of the invention, the test set is tested, and the detection result is evaluated and analyzed through the loss value, the parameter convergence result, the model detection performance and other aspects in the training process.
The loss function is one of the criteria for evaluating the quality of model training. It can be seen that the improved and optimized model YOLOv5-ECA is clearly superior to the original model YOLOv5: YOLOv5-ECA has lower loss function values for the same number of training rounds, and at the same time the improved model loses less detail and has stronger feature learning capability. The curve of the loss function values against the number of training rounds is shown in figure 7.
As shown in figs. 8a, 8b, 8c, 8d, 8e and 8f, during YOLOv5 training the validation objectness loss fluctuates greatly around epoch 50, and the result parameters Recall, mAP@0.5 and mAP@0.5:0.95 all drop suddenly at epoch 180 and then rise smoothly again to convergence. Precision fluctuates to varying degrees between epochs 0-60 and then remains essentially stable. The loss and result parameters of YOLOv5-ECA are relatively stable, with small fluctuations and no sudden changes, converging smoothly, and its result parameters are higher than those of YOLOv5.
The single-tree target detection results for the mangrove forest are shown in figs. 9a, 9b, 9c and 9d: figs. 9a and 9b show the detection results of the original model YOLOv5 and figs. 9c and 9d show those of the improved model YOLOv5-ECA, whose overall precision is improved by 3.2% compared with the original model. As can be seen from the figures, because the newly planted mangroves are small, densely distributed and not clear enough in the images, both models show some missed detections; a comprehensive comparison shows that YOLOv5-ECA misses fewer detections and has a higher probability of detecting objects at the image edges and blurred objects.
As described above, although the embodiments of the present invention have been described in detail, it will be apparent to those skilled in the art that many modifications are possible without substantially departing from the spirit and scope of the present invention. Therefore, such modifications are also all included in the scope of protection of the present invention.

Claims (7)

1. A mangrove forest single tree target detection method based on improved YOLOv5 is characterized by comprising the following steps:
sequentially marking target trees on the selected unmanned aerial vehicle images by utilizing open source software LabelImg to construct a mangrove forest single tree data set;
the method selects YOLOv5 as the basic target detection model and optimizes and improves it for the dense distribution and small size of the target trees: the CSPDarknet53 backbone network is improved with an efficient channel attention (ECA) mechanism, which avoids dimensionality reduction while enabling cross-channel interaction and enhances the feature expression capability, and a SoftPool pooling operation is introduced into the SPP module to retain more detailed feature information and improve the automatic target detection precision.
2. The method for detecting mangrove forest single tree targets based on improved YOLOv5 as claimed in claim 1, characterized by comprising the following steps:
step 1, because young mangrove forest single trees are in special geographic locations, densely distributed and small in size, acquiring data by means of the unique temporal flexibility and high spatial resolution of an unmanned aerial vehicle to construct the required data set;
step 2, introducing an efficient channel attention (ECA) module into the CSPDarknet53 feature extraction network of YOLOv5 to improve the YOLOv5 network, and renaming the improved model YOLOv5-ECA;
step 3, further improving on the basis of step 2 by introducing the SoftPool improved pooling operation into the SPP module of the YOLOv5 network to retain more detailed feature information;
step 4, inputting the acquired unmanned aerial vehicle mangrove single-tree data set images into the YOLOv5 feature extraction network for feature extraction to obtain feature maps of different scales;
step 5, classifying and regressing the feature maps obtained in step 4, performing a feature reconstruction operation on the regression result to obtain more refined feature maps, and performing classification and regression again on this basis to calculate the loss;
and step 6, after model training is completed, testing the test set of the divided data set to realize target detection of young mangrove single trees and evaluate the detection effect of the model.
3. The method as claimed in claim 2, wherein in step 1, in order to ensure the integrity of each detected target in the image, the individual mangrove seedlings in the unmanned aerial vehicle images are labeled in sequence using the LabelImg open-source software, the labeled content is the coordinates of the rectangular bounding boxes of the newly planted mangroves, stored as XML text files for training and testing of the YOLOv5 model; the structure of YOLOv5 comprises an Input end, a reference network Backbone, a feature fusion part Neck and a detection Head part; the Input end of YOLOv5 comprises a preprocessing stage of the training data set images, where the Mosaic data enhancement of YOLOv4 is adopted to improve the training speed and network precision of the model and an adaptive anchor-box calculation program is newly added; in the Backbone network part, YOLOv5 adds a Focus structure for extracting general features and constructs two kinds of CSP structures accordingly, the Focus structure performing a slicing operation on the picture; in the feature fusion part between the backbone network and head detection, YOLOv5 adds a combined FPN + PAN structure; meanwhile, a corresponding splitting program is written to cut the pictures to the required size, labeled target boxes are attached to the target trees in the cut pictures, and the target objects are checked for missed labels so that the labeling is complete.
4. The method as claimed in claim 2, wherein in step 2 the efficient channel attention (ECA) module generates channel attention through a fast one-dimensional convolution of size k, where the kernel size is determined adaptively by a function of the channel dimension; the feature image χ is input with its dimension kept unchanged, and after global average pooling of all channels the ECA module learns features with a weight-sharing one-dimensional convolution, each channel interacting with its k neighbors to capture cross-channel interaction when learning features; k denotes the kernel size of the fast one-dimensional convolution, and the adaptive value of k is obtained from the proportional relation between the coverage of cross-channel information interaction and the channel dimension C, calculated as in formula (1):
$k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \quad (1)$

in the formula: $\gamma = 2$, $b = 1$, $|\cdot|_{\mathrm{odd}}$ denotes the nearest odd number, and C is the channel dimension.
5. The method as claimed in claim 2, wherein in step 3, for newly planted mangrove seedlings whose targets are small and whose pixel counts are very low, max pooling and average pooling easily lose information useful for detection when performing feature mapping, so the SoftPool improved pooling operation is introduced into the SPP module to retain more detailed feature information, i.e., pooling is realized by applying softmax over the pooling region of the activation feature map; from left to right, the SoftPool downsampling process consists of: activating the feature map, calculating the final SoftPool pooling result within the region, and outputting the result,
SoftPool uses softmax to calculate the feature-value weights of the region R from the nonlinear feature values:
$w_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}}$

in the formula: $w_i$ is the weight of the i-th activation, $a_i$ is the i-th activation value, R is the pooling region, and e is the mathematical constant that is the base of the natural logarithm,
the weight $w_i$ ensures the transmission of important features, and the feature values in the region R have at least a preset minimum gradient during back-propagation; after obtaining the weight $w_i$, the output is obtained by weighting the feature values in the region R:
$\tilde{a} = \sum_{i \in R} w_i \cdot a_i$

in the formula: R is the pooling region and $\tilde{a}$ is the output value of SoftPool, namely the weighted sum of all the activation factors of the pooling kernel.
6. The method for detecting mangrove forest single tree targets based on improved YOLOv5 as claimed in claim 2, wherein in step 4, before feature extraction with the YOLOv5 feature extraction network, mosaic enhancement is applied to the data, the data are then uniformly scaled to a standard size for the Focus slicing operation, and then input into the YOLOv5 feature extraction network for feature extraction.
7. The method as claimed in claim 2, wherein in step 6 the average precision AP (Average Precision) and the mean average precision mAP are used to evaluate the detection effect and performance of the model, AP being the area under the precision (Precision)-recall (Recall) curve,
area intersection over union IoU: the position prediction capability of the model is measured by calculating the intersection over union of the rectangular region of the model-predicted target and the rectangular region labeled for the target in the validation set,
precision (Precision), which represents the proportion of correctly detected targets among all targets detected by the model and reflects the accuracy of the model's target detection,
$P = \frac{TP}{TP + FP}$
recall (Recall), which represents the proportion of targets correctly detected by the model among the total number of targets and reflects the recall capability of model recognition,
$R = \frac{TP}{TP + FN}$

$AP = \int_0^1 P(R)\,\mathrm{d}R$

in the formula: TP (true positive) is the number of correctly detected positive samples, namely the class of the prediction box is the same as that of the labeled box and IoU is greater than 0.5; FP (false positive) is the number of incorrectly detected positive samples; FN (false negative) is the number of missed targets; R is the recall; AP is the area under the precision-recall curve.
CN202111009370.8A 2021-08-31 2021-08-31 Mangrove single wood target detection method based on improved YOLOv5 Active CN113705478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111009370.8A CN113705478B (en) 2021-08-31 2021-08-31 Mangrove single wood target detection method based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111009370.8A CN113705478B (en) 2021-08-31 2021-08-31 Mangrove single wood target detection method based on improved YOLOv5

Publications (2)

Publication Number Publication Date
CN113705478A true CN113705478A (en) 2021-11-26
CN113705478B CN113705478B (en) 2024-02-27

Family

ID=78657578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111009370.8A Active CN113705478B (en) 2021-08-31 2021-08-31 Mangrove single wood target detection method based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN113705478B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092820A (en) * 2022-01-20 2022-02-25 城云科技(中国)有限公司 Target detection method and moving target tracking method applying same
CN114255350A (en) * 2021-12-23 2022-03-29 四川大学 Method and system for measuring thickness of soft and hard tissues of palate part
CN114372968A (en) * 2021-12-31 2022-04-19 江南大学 Defect detection method combining attention mechanism and adaptive memory fusion network
CN114462555A (en) * 2022-04-13 2022-05-10 国网江西省电力有限公司电力科学研究院 Multi-scale feature fusion power distribution network equipment identification method based on raspberry pi
CN114627447A (en) * 2022-03-10 2022-06-14 山东大学 Road vehicle tracking method and system based on attention mechanism and multi-target tracking
US20220207868A1 (en) * 2020-12-29 2022-06-30 Tsinghua University All-weather target detection method based on vision and millimeter wave fusion
CN115082695A (en) * 2022-05-31 2022-09-20 中国科学院沈阳自动化研究所 Transformer substation insulator string modeling and detecting method based on improved Yolov5
CN115226650A (en) * 2022-06-02 2022-10-25 南京农业大学 Sow oestrus state automatic detection system based on interactive features
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN115270943A (en) * 2022-07-18 2022-11-01 青软创新科技集团股份有限公司 Knowledge tag extraction model based on attention mechanism
CN115937991A (en) * 2023-03-03 2023-04-07 深圳华付技术股份有限公司 Human body tumbling identification method and device, computer equipment and storage medium
CN117011555A (en) * 2023-10-07 2023-11-07 广东海洋大学 Mangrove forest ecological detection method based on remote sensing image recognition
CN117274824A (en) * 2023-11-21 2023-12-22 岭南设计集团有限公司 Mangrove growth state detection method and system based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807488A (en) * 2019-11-01 2020-02-18 北京芯盾时代科技有限公司 Anomaly detection method and device based on user peer-to-peer group
CN111898688A (en) * 2020-08-04 2020-11-06 沈阳建筑大学 Airborne LiDAR data tree species classification method based on three-dimensional deep learning
CN112307903A (en) * 2020-09-29 2021-02-02 江西裕丰智能农业科技有限公司 Rapid single-tree extraction, positioning and counting method in fruit forest statistics
CN112733749A (en) * 2021-01-14 2021-04-30 青岛科技大学 Real-time pedestrian detection method integrating attention mechanism
CN112861837A (en) * 2020-12-30 2021-05-28 北京大学深圳研究生院 Unmanned aerial vehicle-based mangrove forest ecological information intelligent extraction method
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807488A (en) * 2019-11-01 2020-02-18 北京芯盾时代科技有限公司 Anomaly detection method and device based on user peer-to-peer group
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN111898688A (en) * 2020-08-04 2020-11-06 沈阳建筑大学 Airborne LiDAR data tree species classification method based on three-dimensional deep learning
CN112307903A (en) * 2020-09-29 2021-02-02 江西裕丰智能农业科技有限公司 Rapid single-tree extraction, positioning and counting method in fruit forest statistics
CN112861837A (en) * 2020-12-30 2021-05-28 北京大学深圳研究生院 Unmanned aerial vehicle-based mangrove forest ecological information intelligent extraction method
CN112733749A (en) * 2021-01-14 2021-04-30 青岛科技大学 Real-time pedestrian detection method integrating attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
岳慧慧 (Yue Huihui); 白瑞林 (Bai Ruilin): "Research on wood knot defect detection method based on improved YOLOv3" (基于改进YOLOv3的木结缺陷检测方法研究), 自动化仪表, no. 03 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11380089B1 (en) * 2020-12-29 2022-07-05 Tsinghua University All-weather target detection method based on vision and millimeter wave fusion
US20220207868A1 (en) * 2020-12-29 2022-06-30 Tsinghua University All-weather target detection method based on vision and millimeter wave fusion
CN114255350A (en) * 2021-12-23 2022-03-29 四川大学 Method and system for measuring thickness of soft and hard tissues of palate part
CN114255350B (en) * 2021-12-23 2023-08-04 四川大学 Method and system for measuring thickness of soft and hard tissues of palate
CN114372968A (en) * 2021-12-31 2022-04-19 江南大学 Defect detection method combining attention mechanism and adaptive memory fusion network
CN114372968B (en) * 2021-12-31 2022-12-27 江南大学 Defect detection method combining attention mechanism and adaptive memory fusion network
WO2023138300A1 (en) * 2022-01-20 2023-07-27 城云科技(中国)有限公司 Target detection method, and moving-target tracking method using same
CN114092820A (en) * 2022-01-20 2022-02-25 城云科技(中国)有限公司 Target detection method and moving target tracking method applying same
CN114627447A (en) * 2022-03-10 2022-06-14 山东大学 Road vehicle tracking method and system based on attention mechanism and multi-target tracking
CN114462555A (en) * 2022-04-13 2022-05-10 国网江西省电力有限公司电力科学研究院 Multi-scale feature fusion power distribution network equipment identification method based on raspberry pi
US11631238B1 (en) 2022-04-13 2023-04-18 Iangxi Electric Power Research Institute Of State Grid Method for recognizing distribution network equipment based on raspberry pi multi-scale feature fusion
CN115082695A (en) * 2022-05-31 2022-09-20 中国科学院沈阳自动化研究所 Transformer substation insulator string modeling and detecting method based on improved Yolov5
CN115226650A (en) * 2022-06-02 2022-10-25 南京农业大学 Sow oestrus state automatic detection system based on interactive features
CN115226650B (en) * 2022-06-02 2023-08-08 南京农业大学 Sow oestrus state automatic detection system based on interaction characteristics
CN115270943A (en) * 2022-07-18 2022-11-01 青软创新科技集团股份有限公司 Knowledge tag extraction model based on attention mechanism
CN115270943B (en) * 2022-07-18 2023-06-30 青软创新科技集团股份有限公司 Knowledge tag extraction model based on attention mechanism
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN115937991A (en) * 2023-03-03 2023-04-07 深圳华付技术股份有限公司 Human body tumbling identification method and device, computer equipment and storage medium
CN117011555A (en) * 2023-10-07 2023-11-07 广东海洋大学 Mangrove forest ecological detection method based on remote sensing image recognition
CN117011555B (en) * 2023-10-07 2023-12-01 广东海洋大学 Mangrove forest ecological detection method based on remote sensing image recognition
CN117274824A (en) * 2023-11-21 2023-12-22 岭南设计集团有限公司 Mangrove growth state detection method and system based on artificial intelligence
CN117274824B (en) * 2023-11-21 2024-02-27 岭南设计集团有限公司 Mangrove growth state detection method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN113705478B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN113705478A (en) Improved YOLOv 5-based mangrove forest single tree target detection method
CN111738124B (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN110378909B (en) Single wood segmentation method for laser point cloud based on Faster R-CNN
Klodt et al. Field phenotyping of grapevine growth using dense stereo reconstruction
CN109325395A (en) The recognition methods of image, convolutional neural networks model training method and device
CN110765865A (en) Underwater target detection method based on improved YOLO algorithm
Li et al. A comparison of deep learning methods for airborne lidar point clouds classification
CN114387520A (en) Precision detection method and system for intensive plums picked by robot
CN114049325A (en) Construction method and application of lightweight face mask wearing detection model
CN115359366A (en) Remote sensing image target detection method based on parameter optimization
CN113344045A (en) Method for improving SAR ship classification precision by combining HOG characteristics
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN112084860A (en) Target object detection method and device and thermal power plant detection method and device
CN115099297A (en) Soybean plant phenotype data statistical method based on improved YOLO v5 model
Garcia-D'Urso et al. Efficient instance segmentation using deep learning for species identification in fish markets
CN115761463A (en) Shallow sea water depth inversion method, system, equipment and medium
CN115527234A (en) Infrared image cage dead chicken identification method based on improved YOLOv5 model
Pamungkas et al. Segmentation of Enhalus acoroides seagrass from underwater images using the Mask R-CNN method
CN116311086B (en) Plant monitoring method, training method, device and equipment for plant monitoring model
CN117456287B (en) Method for observing population number of wild animals by using remote sensing image
CN115205853B (en) Image-based citrus fruit detection and identification method and system
CN116503737B (en) Ship detection method and device based on space optical image
CN115861824B (en) Remote sensing image recognition method based on improved transducer
Linlong et al. Optimized Detection Method for Siberian Crane (Grus Leucogeranus) Based on Yolov5
CN116883992A (en) Strawberry fruit detection method based on improved YOLOv5x

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant