CN113705478B

CN113705478B - Mangrove single wood target detection method based on improved YOLOv5

Info

Publication number: CN113705478B
Application number: CN202111009370.8A
Authority: CN
Inventors: Ma Yongkang (马永康); Ling Chengxing (凌成星); Liu Hua (刘华); Zhao Feng (赵峰); Zhang Yutong (张雨桐); Zeng Haowei (曾浩威)
Original assignee: Research Institute of Forest Resource Information Techniques, Chinese Academy of Forestry
Current assignee: Research Institute of Forest Resource Information Techniques, Chinese Academy of Forestry
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2024-02-27
Anticipated expiration: 2041-08-31
Also published as: CN113705478A

Abstract

A mangrove single wood (individual tree) target detection method based on improved YOLOv5 applies deep-learning target detection and image processing to identify young individual mangrove trees, and belongs to the field of mangrove individual-tree detection in forestry research. The method comprises: annotating target trees in the selected unmanned aerial vehicle (UAV) images one by one with the open-source software LabelImg to build a mangrove individual-tree data set; selecting YOLOv5 as the base detection model and optimizing it for targets that are small and densely distributed; improving the CSPDarknet53 backbone network with the Efficient Channel Attention (ECA) mechanism, which enhances feature representation while avoiding dimensionality reduction and still capturing cross-channel interaction; and introducing the SoftPool pooling operation into the SPP module to retain more detailed feature information, thereby improving automatic detection accuracy.

Description

Mangrove single wood target detection method based on improved YOLOv5

Technical Field

The invention relates to a mangrove individual-tree target detection method based on improved YOLOv5. It involves deep-learning target detection and image processing, specifically the identification of young individual mangrove trees with an improved YOLOv5 algorithm, and belongs to the field of mangrove individual-tree detection in forestry research.

Background

Mangroves are special forest communities in tropical and subtropical coastal zones and play an important role in improving ecological conditions, maintaining biodiversity and safeguarding the ecological security of coastal areas. Mangrove resource surveys and dynamic monitoring are the basis and precondition for the scientific protection and management of mangroves. To better protect the mangrove ecosystem and enlarge the mangrove area, many mangrove nature reserves have carried out planned artificial afforestation in recent years, and replanting within existing mangroves and new afforestation have become the main measures for restoring mangrove resources. Mangroves grow in special geographic locations and are often submerged by seawater in the intertidal zone. A seedling that fails to survive has no root anchorage and is washed away by the water, so the individual mangrove seedlings that remain can be regarded as surviving trees. In this environment, low-altitude UAV remote sensing is well suited to image acquisition over mangrove areas because it is flexible, low in cost and able to quickly capture ultra-high-resolution images of small areas. Using UAVs to image mangrove seedlings is therefore an effective way to improve the precision of seedling monitoring.

Therefore, how to detect individual mangrove trees quickly and accurately from UAV images is a problem to be solved when checking the survival rate of newly planted mangroves. With the development of target detection and object recognition, deep learning has been widely applied; target detection, face recognition and speech recognition are among its most common applications. To address insufficient feature extraction capability and the reduced detection speed caused by prediction-box processing, an anchor-free lightweight detection algorithm with multi-scale feature fusion has been proposed. To address the low defect-detection precision of existing detection algorithms in complex inspection scenes, a defect detection method based on a scale-invariant feature pyramid has been proposed. However, these methods only improve algorithms for their specific scenarios. In recent years, deep learning has also gradually been applied in forestry, enabling more accurate, rapid and intelligent forest monitoring. In the paper "Detection method of small target disaster-affected trees based on deep learning", Zhou proposed a deep-learning method for detecting small disaster-affected trees, addressing the small size, dense growth and irregular distribution of trees in UAV forest images. However, the SSD algorithm (Single Shot MultiBox Detector, an object detection algorithm proposed by Wei Liu et al. at ECCV 2016) discards low-level features that contain rich information and is less robust in detecting small objects.

Current mainstream deep-learning detection algorithms are divided into two-stage and single-stage methods. Unlike two-stage algorithms represented by the R-CNN series, YOLO performs feature extraction, candidate-box regression and classification within a single unbranched convolutional network, which simplifies the network structure and meets the requirements of real-time detection. The YOLO model has gone through several iterations that successively addressed multi-object detection, small-object detection, missed detections in complex scenes, multi-scale prediction and other problems.

Given the advantages of deep-learning target detection in automatic feature learning, speed and efficiency, the invention applies YOLOv5, the latest version of the series, to individual mangrove tree detection so as to enable intelligent monitoring of mangrove seedlings.

Disclosure of Invention

The invention aims to provide a deep-learning method for detecting individual mangrove trees based on improved YOLOv5. It addresses the small size and dense distribution of individual mangrove trees in existing UAV images and the low automation and low efficiency of their detection; provides technical support for automatically detecting surviving mangrove seedlings; improves the precision and efficiency of survival-rate checks for newly planted mangroves; and enables intelligent monitoring of seedling conditions.

To address the small size and dense distribution of individual mangrove trees in UAV images and the low automation and efficiency of their detection, a deep-learning method for detecting individual mangrove trees based on improved YOLOv5 is provided, so that individual mangrove trees in UAV images can be identified and located automatically, quickly and accurately.

A mangrove single wood target detection method based on improved YOLOv5 comprises the following steps:

Annotating target trees in the selected UAV images one by one with the open-source software LabelImg, and constructing a mangrove individual-tree data set.

Selecting YOLOv5 as the base detection model and optimizing it for the dense distribution and small size of the target trees: improving the CSPDarknet53 backbone network with the Efficient Channel Attention mechanism, which avoids dimensionality reduction while capturing cross-channel interaction and enhances feature representation; and introducing the SoftPool pooling operation into the SPP module to retain more detailed feature information and improve automatic detection accuracy.

The technical scheme adopted for solving the technical problems is as follows: a mangrove single wood target detection method based on improved YOLOv5 comprises the following steps:

Step 1: Because young mangrove trees grow in special geographic locations, are densely distributed and appear as small targets, data are acquired by UAV, exploiting its flexible acquisition timing and high spatial resolution, to construct the required data set.

Step 2: Introduce the Efficient Channel Attention mechanism into the CSPDarknet53 feature extraction network of YOLOv5 to improve the YOLOv5 network; the improved model is named YOLOv5-ECA.

Step 3: Building on step 2, introduce the SoftPool pooling operation into the SPP module of the YOLOv5 network to retain more detailed feature information.

Step 4: Input the images of the UAV mangrove individual-tree data set into the YOLOv5 feature extraction network to extract feature maps at different scales.

Step 5: Perform classification and regression on the feature maps obtained in step 4, apply a feature reconstruction operation to the regression results to obtain finer feature maps, perform classification and regression again on this basis, and calculate the loss.

Step 6: After training, run the model on the test set of the divided data set to detect young individual mangrove trees, and evaluate the model's detection performance.

The Efficient Channel Attention module of the invention reduces model complexity by avoiding dimensionality reduction while still enabling cross-channel information interaction, and at the same time enhances feature representation. SoftPool preserves the expressiveness of features and is differentiable, so the feature information of the whole receptive field is retained and the accuracy of the algorithm is improved. Compared with existing UAV-based mangrove detection methods, the method detects newly planted individual mangrove trees rapidly, accurately and automatically, with higher detection precision and lower training loss, and better identifies and locates individual mangrove trees.

Drawings

A more complete understanding of the invention and many of its attendant advantages will be obtained by reference to the following detailed description considered together with the accompanying drawings. The drawings are included to provide a further understanding of the invention; they illustrate embodiments of the invention and are not to be construed as limiting it.

In the drawings:

Fig. 1 is a schematic diagram of the YOLOv5 network structure.

Fig. 2 is a structural diagram of the introduced Efficient Channel Attention module.

Fig. 3 is a schematic diagram of SoftPool incorporated in the SPP module.

Fig. 4 is a schematic diagram of the position of SoftPool in YOLOv5-ECA.

Fig. 5a is part of a UAV image of individual mangrove trees.

Fig. 5b is part of a UAV image of individual mangrove trees.

Fig. 6a shows the target annotations.

Fig. 6b is a plot of normalized target positions.

Fig. 6c is a plot of normalized target sizes.

Fig. 7 is a graph of the loss function values of YOLOv5 and YOLOv5-ECA versus the number of training epochs.

Fig. 8a is a graph comparing the convergence of the parameters of YOLOv5 and YOLOv5-ECA.

Fig. 8b is a graph comparing the convergence of the parameters of YOLOv5 and YOLOv5-ECA.

Fig. 8c is a graph comparing the convergence of the parameters of YOLOv5 and YOLOv5-ECA.

Fig. 8d is a graph comparing the convergence of the parameters of YOLOv5 and YOLOv5-ECA.

Fig. 8e is a graph comparing the convergence of the parameters of YOLOv5 and YOLOv5-ECA.

Fig. 8f is a graph comparing the convergence of the parameters of YOLOv5 and YOLOv5-ECA.

Fig. 9a shows detection results of YOLOv5.

Fig. 9b shows detection results of YOLOv5.

Fig. 9c shows detection results of YOLOv5-ECA.

Fig. 9d shows detection results of YOLOv5-ECA.

Fig. 10 is a flow chart of the steps of the invention.

Detailed Description

The invention will be further described with reference to the drawings and examples.

It will be apparent to those skilled in the art that many modifications and variations are possible within the scope of the invention based on the teachings herein.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element or component is referred to as being "connected" to another element or component, it can be directly connected to the other element or component or intervening elements or components may also be present. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art.

In order to facilitate an understanding of the embodiments, the following description will be given in conjunction with the accompanying drawings, and the various embodiments do not constitute a limitation of the present invention.

Example 1: As shown in Figs. 1 to 10, the method for detecting individual mangrove tree targets based on improved YOLOv5 comprises the following steps:

In step 1, to ensure that every detection target in the image is complete, the individual mangrove seedlings in the UAV images are annotated one by one with the commonly used open-source software LabelImg. Each annotation records the coordinates of the rectangular bounding box of a newly planted mangrove seedling, and the coordinates are saved as XML text files for training and testing the YOLOv5 model.
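LabelImg's XML output (Pascal VOC style) stores each box as pixel corners (xmin, ymin, xmax, ymax), whereas the open-source YOLOv5 training code reads one text label file per image with normalized centre coordinates. The patent does not describe this conversion; the following is a minimal sketch of it using only the Python standard library, assuming a single class whose name, `mangrove`, is hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text, class_names=("mangrove",)):
    """Convert one LabelImg (Pascal VOC) XML annotation into YOLO label lines.

    Each line is "class x_center y_center width height", with coordinates
    normalized by the image width and height. class_names is hypothetical.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(tag).text)
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        # Pixel corners -> normalized centre and size.
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


example = """<annotation>
  <size><width>512</width><height>512</height><depth>3</depth></size>
  <object>
    <name>mangrove</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>116</xmax><ymax>212</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_yolo(example))
```

Writing each returned list to a `.txt` file with the same file stem as its image gives one label file per image, which is the layout YOLOv5 reads.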

The structure of YOLOv5 is shown in Fig. 1. The YOLOv5 network consists mainly of four parts: the input (Input, box 1), the backbone network (Backbone, box 2), the feature fusion network (Neck, box 3) and the prediction part (Prediction, box 4). Box 1, the input, includes the preprocessing of training-set images; at the input, the Mosaic data augmentation introduced in YOLOv4 is used to improve the training speed and accuracy of the model.

Box 2 is the backbone of the detection network, which extracts high-, mid- and low-level features of the image.

Box 3 is the feature fusion network (Neck), used mainly to generate feature pyramids. Feature pyramids strengthen the model's detection of objects at different scales, so that the same kind of object can be recognized at different sizes and scales.

Box 4 is the prediction part (Prediction), which applies further convolutions to obtain the prediction results.
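Mosaic augmentation stitches four training images around a random centre on a canvas twice the input size and shifts each image's boxes into canvas coordinates. The sketch below reproduces only that placement geometry, modelled on the quadrant layout used in the open-source YOLOv5 code; the random affine transform and final crop that YOLOv5 applies afterwards are omitted, and the function names are hypothetical.

```python
import random


def mosaic_offsets(s, tile_sizes, center=None, rng=random):
    """Place four images (top-left, top-right, bottom-left, bottom-right) on a
    2s x 2s canvas around a centre (xc, yc).

    tile_sizes: four (w, h) pairs. Returns, per tile, the canvas region
    (x1, y1, x2, y2) it occupies and the (padw, padh) shift for its boxes.
    """
    if center is None:
        # Random centre in [0.5s, 1.5s) on each axis.
        center = (int(rng.uniform(s / 2, 1.5 * s)), int(rng.uniform(s / 2, 1.5 * s)))
    xc, yc = center
    placements = []
    for i, (w, h) in enumerate(tile_sizes):
        if i == 0:  # top-left: the image's bottom-right corner sits at the centre
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            x1b, y1b = w - (x2a - x1a), h - (y2a - y1a)
        elif i == 1:  # top-right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, 2 * s), yc
            x1b, y1b = 0, h - (y2a - y1a)
        elif i == 2:  # bottom-left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(yc + h, 2 * s)
            x1b, y1b = w - (x2a - x1a), 0
        else:  # bottom-right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, 2 * s), min(yc + h, 2 * s)
            x1b, y1b = 0, 0
        # (x1b, y1b) is the top-left of the part of the source image that stays visible.
        placements.append(((x1a, y1a, x2a, y2a), (x1a - x1b, y1a - y1b)))
    return placements


def shift_boxes(boxes, shift, s):
    """Move pixel boxes (x1, y1, x2, y2) into canvas coordinates and clip them."""
    padw, padh = shift

    def clip(v):
        return min(max(v, 0), 2 * s)

    return [(clip(x1 + padw), clip(y1 + padh), clip(x2 + padw), clip(y2 + padh))
            for x1, y1, x2, y2 in boxes]


layout = mosaic_offsets(512, [(512, 512)] * 4, center=(600, 400))
print(layout[0])
print(shift_boxes([(0, 112, 10, 122)], layout[0][1], 512))
```

Boxes that end up partly outside a tile's visible region are clipped here; the real pipeline additionally filters boxes that become too small.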

Boxes A to F in Fig. 1, outside the four parts, are as follows:

A distinctive feature of the YOLOv5 backbone is the added slicing structure (Focus), shown as box E in Fig. 1, which slices the input data to extract general features.

Box A is the CBL structure: convolution (Conv), batch normalization (BN) and the Leaky ReLU activation function.

Box B is CSP1_X, which consists of three convolutional layers and X Res unit modules joined by concatenation (Concat).

The feature fusion network (Neck) uses the CSP2_X structure, shown as box D, in which CBL modules are used in place of the Res unit modules.

Box C is the SPP module, which is built mainly from pooling operations; the subsequent improvement mainly replaces its original max pooling with SoftPool.

Box F is the Res unit module, a residual component formed by two CBL modules; the number of modules can be chosen as required.
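The SPP module (box C) pools the same feature map with several kernel sizes at stride 1 and concatenates the results with the input along the channel axis. The sketch assumes the kernel sizes 5, 9 and 13 used by the open-source YOLOv5 SPP layer, represents a feature map as nested Python lists, and omits the 1 × 1 convolutions that surround the pooling; it shows the max-pooling baseline that the invention replaces with SoftPool.

```python
def max_pool_same(channel, k):
    """Stride-1 k x k max pooling; border windows are clipped to the image, which
    matches padding with -inf, so the output keeps the input's H x W."""
    h, w = len(channel), len(channel[0])
    r = k // 2
    return [[max(channel[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]


def spp(feature_map, kernels=(5, 9, 13)):
    """Concatenate the input channels with their pooled versions (channel axis).

    feature_map: list of C channels, each an H x W nested list.
    Returns (1 + len(kernels)) * C channels of the same H x W.
    """
    out = [[row[:] for row in ch] for ch in feature_map]
    for k in kernels:
        out.extend(max_pool_same(ch, k) for ch in feature_map)
    return out


fmap = [[[float(4 * y + x) for x in range(4)] for y in range(4)]]  # 1 channel, 4 x 4
pooled = spp(fmap)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))
```

Because every branch keeps the spatial size, the branches can be concatenated directly; only the pooling function changes when SoftPool is substituted.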

The most prominent feature of the Efficient Channel Attention (ECA) used in step 2 is that it avoids dimensionality reduction while still capturing cross-channel interaction, which reduces model complexity and enhances feature representation. The structure of the ECA module is shown in Fig. 2. The ECA module generates channel attention through a fast one-dimensional convolution of size k, where the kernel size is determined adaptively by a function of the channel dimension. The input feature map χ keeps its dimensions unchanged: after global average pooling of every channel, the ECA module learns features with a weight-shared one-dimensional convolution, in which each channel captures cross-channel interaction with its k neighbours. Here k is the kernel size of the fast one-dimensional convolution; it is determined adaptively from the proportional relationship between the coverage of cross-channel information interaction and the channel dimension C, as given in formula (1):

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd} \qquad (1)$$

where γ = 2, b = 1, |t|_odd denotes the odd number nearest to t, and C is the channel dimension.
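Formula (1) and the ECA attention path described above can be sketched in plain Python. The rounding rule (truncate, then move an even result up to the next odd number) is the one commonly used in public ECA implementations and is an assumption here; in a real network the k convolution weights are learned, whereas the example fixes them by hand.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Formula (1): k = |log2(C)/gamma + b/gamma|_odd (truncate, then make odd)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Efficient Channel Attention on a C x H x W feature map (nested lists).

    weights: the k weights of the 1-D convolution shared by all channels.
    """
    k = len(weights)
    pad = (k - 1) // 2
    c = len(feature_map)
    # Global average pooling: one descriptor per channel, no dimensionality reduction.
    y = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    out = []
    for i, ch in enumerate(feature_map):
        # 1-D convolution across channels: channel i sees its k nearest channels
        # (zero padding at both ends of the channel axis).
        z = sum(weights[j] * y[i + j - pad] for j in range(k) if 0 <= i + j - pad < c)
        a = 1.0 / (1.0 + math.exp(-z))  # sigmoid -> attention weight in (0, 1)
        # Rescale the channel; the output has the same shape as the input.
        out.append([[v * a for v in row] for row in ch])
    return out


print([eca_kernel_size(c) for c in (64, 128, 256, 512, 1024)])
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]], [[2.0, 2.0], [2.0, 2.0]]]
print(eca(fmap, [0.0, 1.0, 0.0]))
```

The only learnable parameters are the k shared weights, which is why the module adds so little complexity compared with attention blocks that use fully connected layers.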

In step 3, newly planted mangrove seedlings are small targets that occupy few pixels, so important detection information is easily lost when feature maps are downsampled with max pooling or average pooling. SoftPool is therefore introduced into the SPP module to improve the pooling operation and retain more detailed feature information; that is, pooling is performed with softmax weighting within each pooling region of the activation feature map. The structure of SoftPool is shown in Fig. 3, where the SoftPool downsampling process runs from left to right: the activation feature map, the SoftPool computation within each region, and the final pooled output. Fig. 4 shows where SoftPool is fused into YOLOv5, namely in the SPP module.

Unlike other pooling methods, SoftPool uses softmax (the normalized exponential function) for weighted pooling, which preserves the expressiveness of features and is differentiable. Gradients are updated in every backpropagation pass, and SoftPool makes use of every activation within the pooling kernel while adding only a small memory overhead. It increases the discrimination between similar feature information, retains the feature information of the whole receptive field, and improves the accuracy of the algorithm.

The core idea of SoftPool is to compute the weight of each feature value in region R from the nonlinear feature values using softmax:

$$w_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}}$$

where w_i is the weight of the i-th activation, a_i is the i-th activation value, R is the pooling region, and e is the base of the natural logarithm.

The weight w_i ensures that important features are passed on, and every feature value in region R receives at least a preset minimum gradient during backpropagation. Once the weights w_i are obtained, the output is computed as the weighted sum of the feature values in region R:

$$\tilde{a} = \sum_{i \in R} w_i \, a_i$$

where R is the pooling region and ã is the output value of SoftPool, i.e. the weighted sum of all activations within the pooling kernel.
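The SoftPool weighting and weighted sum described above translate directly into code. The sketch below computes the weights w_i and the output ã for one region, then slides the region over a channel at stride 1 so that it could stand in for max pooling inside the SPP module. Clipping windows at the border is an assumption of this sketch and may differ from published SoftPool code.

```python
import math


def soft_pool_region(region):
    """SoftPool over one region R.

    w_i = e^{a_i} / sum_j e^{a_j}   and   output = sum_i w_i * a_i.
    """
    m = max(region)  # subtracting the max avoids overflow; the weights are unchanged
    exps = [math.exp(a - m) for a in region]
    total = sum(exps)
    weights = [e / total for e in exps]
    output = sum(w * a for w, a in zip(weights, region))
    return output, weights


def soft_pool_same(channel, k):
    """Stride-1 k x k SoftPool with windows clipped at the border (same H x W),
    usable in place of max pooling inside an SPP block."""
    h, w = len(channel), len(channel[0])
    r = k // 2
    return [[soft_pool_region([channel[yy][xx]
                               for yy in range(max(0, y - r), min(h, y + r + 1))
                               for xx in range(max(0, x - r), min(w, x + r + 1))])[0]
             for x in range(w)]
            for y in range(h)]


value, weights = soft_pool_region([1.0, 2.0, 3.0, 4.0])
print(round(value, 4), [round(w, 4) for w in weights])
print(soft_pool_same([[1.0, 2.0], [3.0, 4.0]], 3))
```

Unlike max pooling, every activation in the region gets a nonzero weight (and therefore a nonzero gradient), while larger activations dominate the output; when all activations are equal, SoftPool reduces to average pooling.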

Before feature extraction, the data undergo Mosaic augmentation, are uniformly scaled to a standard size for the Focus slicing operation, and are then input to the YOLOv5 feature extraction network.
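The Focus slicing step takes every second pixel in each direction, producing four sub-images that are stacked along the channel axis, so a C × H × W input becomes 4C × H/2 × W/2 (for example, a 3 × 512 × 512 tile becomes 12 × 256 × 256) before the first convolution. A minimal plain-Python sketch, following the slice order of the open-source YOLOv5 Focus layer:

```python
def focus_slice(x):
    """Focus slicing (space-to-depth): C x H x W -> 4C x H/2 x W/2.

    Slice order: (even rows, even cols), (odd rows, even cols),
    (even rows, odd cols), (odd rows, odd cols).
    x is a list of C channels, each an H x W nested list with even H and W.
    """
    out = []
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for ch in x:
            out.append([row[col_off::2] for row in ch[row_off::2]])
    return out


image = [[[4 * y + x for x in range(4)] for y in range(4)]]  # 1 channel, 4 x 4
for sub in focus_slice(image):
    print(sub)
```

No pixel is discarded: the slicing only rearranges the input, halving the spatial size while quadrupling the channels.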

In step 6, average precision (AP) and mean average precision (mAP) are commonly used in target detection to evaluate the detection performance of a model. AP is the area under the precision-recall curve; since this task involves a single target class, AP equals mAP.

Intersection over union (IoU): the ratio of the intersection area to the union area of the rectangle predicted by the model and the rectangle annotated for the target in the validation set; it measures the model's ability to predict target positions.

Precision: the ratio of the number of correctly detected targets to the total number of detections made by the model; it represents how accurate the model's detections are.

Recall: the ratio of the number of correctly detected targets to the total number of annotated targets; it represents how completely the model finds the targets.

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}, \qquad AP = \int_0^1 P(r)\,dr$$

where TP (true positive) is the number of correct detections, i.e. the predicted box and the annotated box have the same class and IoU > 0.5; FP (false positive) is the number of incorrect detections; FN (false negative) is the number of annotated targets that are not detected; r is the recall, ranging over [0, 1], and P(r) is the precision at recall r; AP is the area under the precision-recall curve.

Because precision and recall both depend on the confidence threshold, evaluating model performance with precision and recall alone is limited. Average precision (AP) is therefore introduced in the experiments as the metric for recognition performance; it is currently one of the most important metrics for evaluating mainstream target detection algorithms.
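The IoU criterion and the AP computation described above can be sketched as follows. The sketch assumes detections have already been matched to annotated boxes (TP when IoU > 0.5) and uses all-point interpolation of the precision-recall curve, one common convention; the patent does not state which interpolation it uses, and YOLOv5's own evaluation code may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, n_gt):
    """All-point interpolated AP for a single class.

    detections: (confidence, is_tp) pairs, where is_tp is True when the
    detection matched a not-yet-matched annotated box with IoU > 0.5.
    n_gt: number of annotated targets (TP + FN), assumed > 0.
    Returns (AP, precision, recall) at the lowest confidence considered.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)            # Recall = TP / (TP + FN)
        precisions.append(tp / (tp + fp))    # Precision = TP / (TP + FP)
    # Add sentinels and make precision non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [1.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the stepwise precision-recall curve.
    ap = sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
             for i in range(len(mrec) - 1) if mrec[i + 1] != mrec[i])
    return ap, (precisions[-1] if precisions else 0.0), (recalls[-1] if recalls else 0.0)


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], n_gt=2))
```

Ranking by confidence is what removes the dependence on a single threshold: AP summarizes precision over all recall levels the ranking reaches.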

Example 2: As shown in Figs. 1 to 10, the method for detecting individual mangrove tree targets based on the improved YOLOv5 algorithm includes improving the YOLOv5 network model and related steps. The method is further described below, taking newly planted mangrove seedlings in the Zhanjiang Mangrove Nature Reserve, Guangdong, as an example.

Step 1, data acquisition and data set construction:

Given the special geographic location of the newly planted mangroves in the Zhanjiang Mangrove Nature Reserve and their low stature, 2348 UAV images were acquired over the mission area by repeated flights at an altitude of 120 m, resuming from breakpoints; the theoretical spatial resolution is 0.05 m. Some of the UAV images are shown in Figs. 5a and 5b; individual mangrove seedlings can be clearly distinguished in them, so they are suitable for constructing the required data set. The UAV is a DJI Phantom 4 RTK produced by SZ DJI Technology Co., Ltd.; it carries a CCD sensor and is lightweight, easy to operate and performs well. To construct the data set, clear original images with a resolution of 5472 × 3648 pixels were selected and cut into 597 images of 512 × 512 pixels by a segmentation program written in Python. The individual mangrove seedlings in these images were annotated one by one with the commonly used open-source software LabelImg, as shown in Fig. 6a. Figs. 6b and 6c show the relative positions of the annotation boxes in the images and the relative sizes of the targets: the target trees are uniformly distributed across the images, and most targets are 2%-5% of the image width and 1%-4% of the image height. About 30,000 target trees were annotated; each annotation records the coordinates of the rectangular bounding box of a newly planted mangrove, saved as an XML text file for training and testing the YOLOv5 model.
The constructed data set was then divided into a training set, a validation set and a test set at a ratio of 8:1:1, containing 477, 60 and 60 images respectively.
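The tiling and 8:1:1 split described above can be sketched as below. The patent does not describe its segmentation program, so the non-overlapping tiling, the dropping of partial edge tiles and the random seed are assumptions; with these choices one 5472 × 3648 image yields 70 full tiles, and the split reproduces the 477/60/60 counts for 597 images.

```python
import random


def tile_origins(width, height, tile=512):
    """Top-left corners of non-overlapping tile x tile crops; partial tiles at
    the right and bottom edges are dropped (an assumption)."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def split_8_1_1(items, seed=0):
    """Shuffle reproducibly, then split into train/validation/test at about 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * 0.8)
    n_val = (len(items) - n_train) // 2
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


print(len(tile_origins(5472, 3648)))  # 10 columns x 7 rows = 70 tiles
train_set, val_set, test_set = split_8_1_1(range(597))
print(len(train_set), len(val_set), len(test_set))  # 477 60 60
```

Splitting at the image level, after tiling, keeps each 512 × 512 image in exactly one subset.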

Step 2, model training environment:

The experimental platform is a self-configured server running 64-bit Windows 10, with an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40 GHz, an NVIDIA Tesla K40c graphics card with 12 GB of video memory, and 16 GB of RAM. The network model is built on the PyTorch deep learning framework, with a development environment of PyTorch 1.4, CUDA 10.1 and Python 3.7. Training uses the Adam optimizer with an initial learning rate of 0.01; all experiments use single-scale training with an input image size of 512 × 512 pixels. Given the characteristics of the models, both YOLOv5 and YOLOv5-ECA are trained for 200 epochs, starting from the pretrained YOLOv5x model.

Step 3, model improvement and training:

The Efficient Channel Attention mechanism is introduced into the CSPDarknet53 feature extraction network of YOLOv5, avoiding dimensionality reduction while capturing cross-channel interaction, reducing model complexity and enhancing feature representation; the improved model is named YOLOv5-ECA. The SoftPool pooling operation is also introduced into the SPP module of the YOLOv5 network to retain more detailed feature information. The images of the UAV mangrove individual-tree data set are input into the YOLOv5 feature extraction network to obtain feature maps at different scales. Classification and regression are performed on these feature maps, a feature reconstruction operation is applied to the regression results to obtain finer feature maps, classification and regression are performed again on this basis, and the loss is calculated, completing the detection of newly planted individual mangrove trees in Zhanjiang, Guangdong.

Step 4, analysis of detection results:

After model training is complete, the model is run on the test set, and the detection results are evaluated in terms of the training loss, parameter convergence, detection performance and other aspects.

The loss value is one criterion for evaluating training quality, and on this criterion the improved model YOLOv5-ECA clearly outperforms the original YOLOv5: at the same number of training epochs YOLOv5-ECA has a lower loss value, and the improved model loses less detail and learns features more strongly. The loss values versus the number of training epochs are shown in Fig. 7.

As shown in Figs. 8a to 8f, during YOLOv5 training the validation objectness loss (val Objectness Loss) fluctuates strongly around epoch 50, and the result metrics Recall, mAP@0.5 and mAP@0.5:0.95 all drop sharply at epoch 180 before rising steadily again to convergence. Precision fluctuates to varying degrees between epochs 0 and 60 and is then essentially stable. In contrast, the loss and result metrics of YOLOv5-ECA are stable, show little abrupt fluctuation, converge smoothly, and are higher than those of YOLOv5.

The detection results for individual mangrove trees are shown in Figs. 9a to 9d: Figs. 9a and 9b show the results of the original YOLOv5 model, and Figs. 9c and 9d show those of the improved YOLOv5-ECA model, whose overall accuracy is 3.2% higher than that of the original model. Because newly planted mangroves are small, densely distributed and not sharply imaged, both models miss some targets; overall, however, YOLOv5-ECA misses fewer targets and is more likely to detect edge targets and blurred targets.

As described above, the embodiments of the present invention have been described in detail, but it will be apparent to those skilled in the art that many modifications can be made without departing from the spirit and effect of the present invention. Accordingly, such modifications are also entirely within the scope of the present invention.

Claims

1. The mangrove single wood target detection method based on improved YOLOv5 is characterized by comprising the following steps of:

sequentially marking target trees on the selected unmanned aerial vehicle images by using open source software LabelImg, and constructing a mangrove single wood data set;

selecting YOLOv5 as the base detection model and optimizing it for the dense distribution and small size of the target trees, improving the CSPDarknet53 backbone network with the Efficient Channel Attention mechanism, which avoids dimensionality reduction while capturing cross-channel interaction and enhances feature representation, and introducing the SoftPool operation into the SPP module to retain more detailed feature information and improve automatic detection accuracy,

in step 2, the Efficient Channel Attention module generates channel attention through a fast one-dimensional convolution of size k, wherein the kernel size is determined adaptively by a function of the channel dimension; the input feature map χ keeps its dimensions unchanged; after global average pooling of all channels, the ECA module learns features through a weight-shared one-dimensional convolution, in which each channel captures cross-channel interaction with its k neighbours; k represents the kernel size of the fast one-dimensional convolution, and the adaptive value of k is determined from the proportional relationship between the coverage of cross-channel information interaction and the channel dimension C, as calculated by formula (1):

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd} \qquad (1)$$

wherein γ = 2, b = 1, |t|_odd denotes the odd number nearest to t, and C is the channel dimension,

in step 3, newly planted mangrove seedlings are small targets occupying very few pixels, so important detection information is easily lost when feature mapping uses max pooling or average pooling. The SoftPool pooling operation is therefore introduced into the SPP module to retain more detailed feature information; that is, pooling is performed in a softmax manner within each pooling region of the activation feature map. The SoftPool sampling process proceeds from left to right as follows: the final pooled output of SoftPool is calculated within the activation region of the feature map,

SoftPool uses softmax to calculate the weight of each feature value in region R from the nonlinear feature values:

$$w_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}}$$

wherein: w_i is the weight of the i-th activated element, a_i is the i-th activation value, R is the pooling region, and e is the mathematical constant that is the base of the natural logarithm,

the weight w_i ensures that important features are propagated, and every feature value in region R receives at least a preset minimum gradient during backpropagation. After the weights w_i are obtained, the output is computed as the weighted sum of the feature values in region R:

$$\tilde{a} = \sum_{i \in R} w_i \, a_i$$
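The two SoftPool formulas above can be checked with the following minimal Python sketch. It assumes a single-channel feature map stored as a list of rows and non-overlapping 2 × 2 windows (stride 2) for downsampling; the function names are hypothetical and the window size is an illustrative choice, not a value taken from the patent.

```python
import math
from typing import List, Sequence


def softpool_region(region: Sequence[float]) -> float:
    """SoftPool over one pooling region R: softmax weights, then weighted sum."""
    m = max(region)  # subtracting the max avoids overflow; weights are unchanged
    exps = [math.exp(a - m) for a in region]
    total = sum(exps)
    weights = [e / total for e in exps]                  # w_i = e^{a_i} / sum_j e^{a_j}
    return sum(w * a for w, a in zip(weights, region))  # sum_i w_i * a_i


def softpool2d(fmap: List[List[float]], k: int = 2, stride: int = 2) -> List[List[float]]:
    """Slide a k x k window over a single-channel map (no padding)."""
    h, w = len(fmap), len(fmap[0])
    return [[softpool_region([fmap[y + i][x + j] for i in range(k) for j in range(k)])
             for x in range(0, w - k + 1, stride)]
            for y in range(0, h - k + 1, stride)]
```

For a window holding the activations 0, 0, 0 and 10, max pooling returns 10 and average pooling returns 2.5, whereas `softpool_region` returns a value just below 10: the dominant activation prevails, yet the smaller ones keep non-zero weights and therefore still receive gradients during backpropagation.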

2. The method for detecting the mangrove single wood target based on the improved YOLOv5 as set forth in claim 1, comprising the steps of:

step 1, because young mangroves grow in special geographic locations, are densely distributed, and present small targets, acquiring data with an unmanned aerial vehicle, whose temporal flexibility and high spatial resolution are particularly advantageous, to construct the required data set;

step 2, introducing the Efficient Channel Attention (ECA) module into the CSPDarknet53 feature extraction network of YOLOv5 to improve the YOLOv5 network, and naming the improved model YOLOv5-ECA;

step 3, building on step 2, introducing the SoftPool pooling operation into the SPP module of the YOLOv5 network to retain more detailed feature information;
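Step 3 does not say exactly how SoftPool replaces the pooling inside the SPP module. The sketch below assumes the YOLOv5 SPP layout, an identity branch plus stride-1 pooling branches with kernel sizes 5, 9 and 13 whose outputs are concatenated on the channel axis, and swaps each max-pooling branch for a stride-1 SoftPool. Border handling (clipping windows instead of padding), the single-channel list representation and the function names are all assumptions made for illustration.

```python
import math
from typing import List

Grid = List[List[float]]  # one channel, H x W


def softpool_same(fmap: Grid, k: int) -> Grid:
    """Stride-1 SoftPool with a k x k window; output has the input's size.

    Assumption: windows are clipped at the borders (out-of-range positions
    are ignored) instead of padded, so padding values never enter the softmax.
    """
    h, w, r = len(fmap), len(fmap[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            region = [fmap[i][j]
                      for i in range(max(0, y - r), min(h, y + r + 1))
                      for j in range(max(0, x - r), min(w, x + r + 1))]
            m = max(region)  # stabilises exp(); the softmax weights are unchanged
            exps = [math.exp(a - m) for a in region]
            out[y][x] = sum(e * a for e, a in zip(exps, region)) / sum(exps)
    return out


def spp_softpool(fmap: Grid, kernels=(5, 9, 13)) -> List[Grid]:
    """Return the SPP branches: identity plus one SoftPool branch per kernel.

    In YOLOv5 these branches are concatenated on the channel axis between two
    1x1 convolutions (omitted here); the original branches use max pooling.
    """
    return [fmap] + [softpool_same(fmap, k) for k in kernels]
```

Because every branch keeps the spatial size of its input, the branches can be stacked channel-wise exactly as the max-pooling branches they replace.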

step 4, inputting the unmanned aerial vehicle images of the mangrove single wood data set into the YOLOv5 feature extraction network for feature extraction to obtain feature maps at different scales;

step 5, performing classification and regression on the feature maps obtained in step 4, applying a feature reconstruction operation to the regression results to obtain finer feature maps, performing classification and regression again on that basis, and calculating the loss;

and step 6, after the model is trained, testing it on the test set of the partitioned data set to detect individual young mangrove trees and evaluate the detection performance of the model.

3. The method, characterized in that, in step 1, to ensure that each detection target in an image is complete, the open-source software LabelImg is used to label the individual mangrove seedlings in the unmanned aerial vehicle images in sequence; the labeled content is the rectangular bounding-box coordinates of the newly planted mangroves, which are saved as XML text files for training and testing the YOLOv5 model. The structure of YOLOv5 comprises an input end (Input), a backbone network (Backbone), a feature fusion part (Neck), and a detection head (Head). The input end of YOLOv5 covers the preprocessing of the training data set images: the Mosaic data augmentation of YOLOv4 is used to improve the training speed and network precision of the model, and an adaptive anchor box calculation is added at the same time. In the Backbone, a Focus structure is added to extract general features and two CSP structures are constructed; the Focus structure slices the input image before its features are extracted.
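LabelImg saves each annotation as Pascal VOC XML, whereas the YOLOv5 training code reads one text line per box containing a class index and the normalised centre coordinates, width and height. The patent does not describe this conversion; the following standard-library sketch shows one way to perform it, with a hypothetical function name and a hypothetical class label `mangrove`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

YoloLabel = Tuple[int, float, float, float, float]  # class, cx, cy, w, h (normalised)


def voc_xml_to_yolo(xml_text: str, class_ids: Dict[str, int]) -> List[YoloLabel]:
    """Convert one LabelImg (Pascal VOC) XML annotation to YOLO-style labels."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    img_w = float(size.findtext("width"))
    img_h = float(size.findtext("height"))
    labels = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.findtext(t))
                                  for t in ("xmin", "ymin", "xmax", "ymax"))
        labels.append((
            class_ids[obj.findtext("name")],
            (xmin + xmax) / 2 / img_w,  # box centre x, relative to image width
            (ymin + ymax) / 2 / img_h,  # box centre y, relative to image height
            (xmax - xmin) / img_w,      # box width, relative
            (ymax - ymin) / img_h,      # box height, relative
        ))
    return labels
```

Each tuple would then be written as one space-separated line of a `.txt` file sharing the image's base name.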

4. The mangrove single wood target detection method based on improved YOLOv5 according to claim 2, wherein, in step 4, before feature extraction by the YOLOv5 feature extraction network, the data undergo Mosaic data augmentation, are then uniformly scaled to a standard size for the Focus slicing operation, and are then input into the YOLOv5 feature extraction network for feature extraction.
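The Focus slicing mentioned in claims 3 and 4 is a space-to-depth rearrangement: every second pixel is taken at four phase offsets and the four sub-images are stacked on the channel axis, so a C × H × W input becomes 4C × H/2 × W/2 before the Focus convolution. The plain-Python sketch below reproduces only the slicing, in the same offset order as YOLOv5's Focus layer; the nested-list representation and the function name are illustrative assumptions.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # C x H x W


def focus_slice(x: Tensor3) -> Tensor3:
    """Space-to-depth slicing: (C, H, W) -> (4C, H/2, W/2), H and W even."""
    def take(ch: List[List[float]], dy: int, dx: int) -> List[List[float]]:
        # Every second row starting at dy, every second column starting at dx.
        return [row[dx::2] for row in ch[dy::2]]

    # Offset order follows x[..., ::2, ::2], x[..., 1::2, ::2],
    # x[..., ::2, 1::2], x[..., 1::2, 1::2]; each slice keeps all C channels.
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return [take(ch, dy, dx) for dy, dx in offsets for ch in x]
```

With a 3 × 640 × 640 input, for example, the slices form a 12 × 320 × 320 tensor; 640 is YOLOv5's default input size and is used here only as an illustration, since the patent does not state its standard size. No pixel is discarded: the resolution is halved while the information moves into the channel dimension.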

5. The method for detecting single-tree objects in mangrove forest based on improved YOLOv5 according to claim 2, wherein, in step 6, the average precision AP (Average Precision) and the mean average precision mAP are used to evaluate the detection effect and performance of the model, AP being the area under the precision-recall curve,

Intersection over Union (IoU): the ratio of the intersection area to the union area of the rectangle predicted by the model for a target and the rectangle annotated for that target in the validation set, measuring the model's ability to predict target positions,

Precision: the proportion of correctly detected targets among all targets detected by the model, indicating how accurate the model's detections are,

Recall: the proportion of all targets that are detected by the model, indicating how completely the model finds the targets,
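The metrics defined in claim 5 can be illustrated with the following Python sketch. Boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates, a detection is assumed to count as a true positive when it matches a not-yet-matched annotated box (a common convention is IoU ≥ 0.5), and AP is computed by all-point interpolation of the precision-recall curve. The patent does not specify the matching threshold or the interpolation scheme, so these choices and all function names are assumptions; other toolkits may interpolate differently.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], n_gt: int) -> float:
    """Area under the precision-recall curve (all-point interpolation).

    scores: confidence of each detection; is_tp: whether that detection was
    matched to a ground-truth box; n_gt: number of ground-truth targets (> 0).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    rec, prec = [], []
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        rec.append(tp / n_gt)
        prec.append(tp / (tp + fp))
    mrec = [0.0] + rec + [1.0]
    mpre = [0.0] + prec + [0.0]
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))
```

With a single class (mangrove), mAP equals AP; with several classes, mAP is the mean of the per-class AP values.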