CN113591729A - Urban forest single-tree crown detection method combining RGB-DSM image and deep learning

Info

Publication number: CN113591729A (other version: CN113591729B, granted)
Application number: CN202110885400.5A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: detection, image, deep learning, RGB, tree crown
Inventors: 夏凯, 王昊, 冯海林, 杨垠晖, 徐流畅
Assignee: Zhejiang A&F University (ZAFU)
Filing date: 2021-08-03
Publication dates: CN113591729A on 2021-11-02; CN113591729B on 2023-07-21
Legal status: Active (granted)

Classifications

    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • Y02T 10/40: Engine management systems (climate-change mitigation tag)


Abstract

The invention discloses an urban forest single-tree crown detection method combining RGB-DSM images and deep learning, belonging to the field of target detection. The method detects urban forest single tree crowns by combining the two kinds of images with deep learning, and shows a clear precision advantage over a deep learning network trained on the orthophoto map or the digital surface model alone.

Description

Urban forest single-tree crown detection method combining RGB-DSM image and deep learning
Technical Field
The invention belongs to the field of target detection, and particularly relates to an urban forest single-tree crown detection method combining an RGB-DSM image and deep learning.
Background
Urban forests play important roles in regulating urban climate, absorbing toxic and harmful gases, improving the living environment, and maintaining biodiversity. As urbanization intensifies, the construction of urban forests receives increasing attention. Single-tree crown detection (ITCD), an important method for sustainable urban forest management, is used not only to obtain basic information about trees but also has important applications in diseased-tree detection, urban green-volume monitoring, and the like.
The ITCD task can be divided into two steps: position detection and edge delineation. Common algorithms for position detection include the local maximum method, template matching, and scale analysis; common algorithms for edge delineation include region growing, valley tracking, and watershed segmentation. These are unsupervised algorithms better suited to homogeneous landscapes. Urban forests, however, are located in cities and surrounded by many types of ground features, and in such a complex environment it is difficult for the above algorithms to achieve good detection results.
With the growth of computing power and data volume, the strong feature-learning capability of deep learning gives it clear advantages in adapting to complex backgrounds, generalizing models, and lowering the entry barrier, and these advantages have led to its wide application in fields such as face recognition and autonomous driving. In forestry, combining deep learning with high-definition UAV imagery has brought the ITCD task into many application scenarios.
Research has shown that elevation information helps to improve the performance of computer vision tasks, and some researchers have used elevation data in ITCD studies. However, those studies used elevation data only in non-deep-learning ITCD methods. Color data is a three-channel image and elevation data is a single-channel image (four channels in total), while commonly used deep learning networks accept three-channel input; before the two types of data can be combined in a deep-learning-based ITCD task, this mismatch in channel count between the data and the network must be resolved.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a method for detecting urban forest single-tree crowns by combining RGB-DSM images and deep learning. The method combines the unmanned aerial vehicle RGB image and the DSM to detect the urban forest single tree crown so as to improve the accuracy of the detection result.
The invention adopts the following specific technical scheme:
a city forest single tree crown detection method combining RGB-DSM images and deep learning comprises the following steps:
s1: acquiring an unmanned aerial vehicle image of a target detection area, and generating a dense point cloud by using the acquired unmanned aerial vehicle image so as to obtain an RGB three-channel orthophoto map representing the reflectivity of a ground object and a single-channel digital surface model representing the height of the ground object;
s2: acquiring a first target detection model and a second target detection model which are respectively trained based on a deep learning network; the first target detection model takes an orthophoto map as input and outputs a detection frame of the single tree crown; the second target detection model takes a three-channel high-level diagram formed by repeated superposition of single-channel digital surface models as input and outputs a detection frame of the single-tree crown;
s3: taking the RGB three-channel orthophoto map obtained in the S1 as the input of the first target detection model to obtain a group of first detection box sets representing all single tree crowns in the input image; taking the single-channel digital surface model obtained in the step S1 as the input of the second target detection model to obtain another group of second detection box sets representing all single-tree crowns in the input image;
s4: and traversing the two groups of detection boxes for confidence level resetting and redundancy elimination:
if two detection frames belong to two groups of detection frames respectively and the intersection ratio is greater than or equal to the intersection ratio threshold value, the two detection frames are considered to represent the same crown, the confidence degrees of the two detection frames are weighted according to a preset weight and then assigned to one detection frame with a higher original confidence degree, and then the other detection frame is removed;
if the union ratio between one detection frame in one group of detection frame sets and all detection frames in the other group of detection frame sets is smaller than the union ratio threshold, the single tree crown does not exist in the other group of detection frame sets, the detection frame is reserved, but the confidence coefficient is reduced according to a preset proportion;
s5: and after the confidence coefficient resetting and redundancy elimination of the S4 are completed, obtaining a group of new detection boxes as a final single-tree crown detection result.
Preferably, when the unmanned aerial vehicle image is acquired, the forward (heading) overlap and side overlap are kept at no less than 90%.
Preferably, when the unmanned aerial vehicle is used for acquiring the unmanned aerial vehicle image, the flight area is larger than the target detection area, and then the unmanned aerial vehicle image of the target detection area is obtained by removing the edge part of the image.
Preferably, the deep learning network comprises a feature extraction network and a region proposal network; the network accepts three-channel images as input, and a set of feature maps is obtained after feature extraction of the input image; the region proposal network then generates a large number of regions of interest on the feature maps, and the detection boxes are obtained after the regions of interest are regressed, classified and screened.
Preferably, the deep learning network is Faster R-CNN.
Preferably, the IoU threshold is 0.5.
Preferably, the preset weights are equal, that is, the confidences of the two detection boxes are each weighted by 0.5.
Preferably, the first target detection model and the second target detection model are both trained in advance by using labeled sample data which is subjected to data enhancement.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides an urban forest single-tree crown detection method combining RGB-DSM images and deep learning. Compared with a deep learning network trained by only using RGB images or DSM images, the method has obvious precision advantage.
Drawings
FIG. 1 is a flow chart of steps of a method for detecting urban forest single-tree crowns according to the present invention;
fig. 2 is a schematic structural diagram of a dual-branch detection network.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
As shown in FIG. 1, the present invention provides an urban forest single-tree crown detection method combining RGB-DSM images and deep learning, comprising the following steps:
s1: acquiring an unmanned aerial vehicle image of a target detection area, and generating dense point cloud by using the acquired unmanned aerial vehicle image to further obtain an Orthophoto Map (DOM) representing the reflectivity of a ground object and a Digital Surface Model (DSM) representing the height of the ground object. Wherein the orthophoto map is an RGB three-channel color image, and the digital surface model is single-channel elevation data.
When the unmanned aerial vehicle is used for obtaining the unmanned aerial vehicle image, the flight area is larger than the target detection area, and then the unmanned aerial vehicle image of the target detection area is obtained by removing the edge part of the image.
For the DOM and DSM, during dense point cloud generation it is difficult to build an accurate point cloud at the edges of the flight area because of viewing-angle limitations, so the edge regions of the images are discarded and only the non-edge regions are used for training the deep learning network. In view of this, when the UAV aerial photography task is executed, the flight area should be set slightly larger than the target detection area, and the UAV image of the target detection area is then obtained by removing the edge portions of the images. In addition, to produce the DOM, a certain forward and side overlap must be maintained during image acquisition; both were set to 90%.
S2: acquiring a first target detection model and a second target detection model, both trained on the same deep learning network framework but with different inputs and outputs. The first target detection model takes the orthophoto map as input and outputs detection boxes for the single tree crowns in it; the second target detection model takes as input a three-channel elevation map formed by stacking the single-channel digital surface model three times, and outputs detection boxes for the single tree crowns in the digital surface model.
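The patent does not provide code for this channel stacking; the following is a minimal NumPy sketch (the function name and array layout are illustrative assumptions, not from the patent):

```python
import numpy as np

def dsm_to_elevation_map(dsm: np.ndarray) -> np.ndarray:
    """Stack a single-channel DSM of shape (H, W) into a pseudo
    three-channel elevation map of shape (H, W, 3) by repeating the
    elevation band, so it fits a detector that expects RGB-shaped input."""
    if dsm.ndim != 2:
        raise ValueError("expected a single-channel (H, W) DSM")
    return np.repeat(dsm[:, :, np.newaxis], 3, axis=2)
```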
The deep learning network can adopt any general framework for target detection and may comprise a feature extraction network and a region proposal network. The network accepts three-channel images as input, and feature extraction of the input image yields a set of feature maps. The region proposal network then generates a large number of regions of interest (ROIs) on the feature maps, and after regression, classification and screening of the ROIs, detection boxes are obtained; finally the detection boxes are evaluated to produce the evaluation indices. Each detection box has a corresponding category and confidence. The confidence is the model's self-assessment of the box's accuracy: it represents the probability that the object in the box belongs to the assigned category, with a value in [0, 1].
In the subsequent embodiments of the invention, Faster R-CNN is used as the deep learning network, and the first and second target detection models are obtained through training. The concrete network structure of Faster R-CNN belongs to the prior art and is not described in detail. The network can be trained with labeled sample data, which can be produced by the method in S1 and labeled manually with annotation software; data enhancement can be applied if the sample size is insufficient.
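For illustration only, the sketch below builds two such single-class crown detectors with torchvision's Faster R-CNN implementation; this is a stand-in assumption, since the embodiment below actually uses a TensorFlow 1.9 Faster R-CNN with a ResNet101 backbone rather than PyTorch:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_crown_detector(num_classes: int = 2):
    """One Faster R-CNN branch; num_classes = background + tree crown."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

# Two independent branches: one trained on RGB orthophotos, one on
# three-channel elevation maps stacked from the DSM.
rgb_model = build_crown_detector()
dsm_model = build_crown_detector()
```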
S3: taking the RGB three-channel orthophoto map obtained in S1 as the input of the first target detection model to obtain a first set of detection boxes representing all single tree crowns in the input image, and taking the single-channel digital surface model obtained in S1 as the input of the second target detection model to obtain a second set of detection boxes representing all single tree crowns in the input image.
S4: because the two sets of detection boxes are obtained from different input data and detection models, they must be traversed for confidence resetting and redundancy elimination, as follows:
if two detection boxes belong to the two different sets and their IoU is greater than or equal to the IoU threshold, the two boxes are considered to represent the same crown; their confidences are weighted by a preset weight and assigned to the box with the higher original confidence, and the other box is removed;
if the IoU between a detection box A in one set and every detection box in the other set is below the IoU threshold, the single tree crown corresponding to A is considered absent from the other set; box A is retained and its confidence is reduced by a preset proportion.
In the confidence resetting and redundancy elimination process, the IoU threshold is the criterion for judging whether two detection boxes represent the same tree crown, and its specific value is tuned according to the actual situation; in the subsequent embodiments it is set to 0.5. Likewise, the preset weights must be tuned in practice; in the subsequent embodiments they are set equal, that is, the confidences of the two detection boxes are each weighted by 0.5.
S5: after all detection boxes in the two sets have been traversed for confidence resetting and redundancy elimination, a new set of detection boxes is obtained by merging, which serves as the final single-tree crown detection result for the target detection area.
The methods of S1-S5 are applied to an embodiment to show the specific implementation and technical effects of the present invention.
Examples
In this embodiment, the implementation process of the urban forest single-tree crown detection method combining the RGB-DSM image and deep learning is as follows:
1. Unmanned aerial vehicle image data acquisition
The area around the campus of Zhejiang A&F University in Hangzhou, Zhejiang Province, was taken as the target detection area, with the representative camphor trees in the area as the detection objects.
In this embodiment, a DJI (SZ DJI Technology Co., Ltd.) Phantom 4 RTK unmanned aerial vehicle with its built-in camera was used as the image acquisition system; detailed parameters are listed in Table 1. Images were acquired by automatic cruise with timed shooting under clear or cloudy weather and low wind speed from March 3 to March 15, 2021. In total 31 flight missions were executed, with flight parameters not entirely identical: considering the building heights in the study area, missions inside the target detection area flew at 50 m and missions outside it at 70 m. To produce the DOM, both forward and side overlap were set to 90%. These missions yielded 31 batches totaling 11377 UAV aerial images.
Table 1. Unmanned aerial vehicle image acquisition system parameters (the table is provided as an image in the original publication).
2. Dual branch detection network construction
To make full use of the information contained in the RGB image and the DSM for the ITCD task, this embodiment proposes a dual-branch detection network model. The model is built on Faster R-CNN, with the structure shown in fig. 2. The RGB three-channel orthophoto map representing ground-object reflectance and the single-channel digital surface model representing ground-object height are fed into their respective branch deep learning networks. Each branch extracts features to obtain feature maps, a region proposal network generates regions of interest on the feature maps, and regression and classification of these regions yield a set of candidate detection boxes (since this embodiment detects only one class, crowns, no category information is labeled, and the numbers on the detection boxes denote confidence). The deep learning networks of the two branches are recorded as the first and second target detection models: the first takes the orthophoto map as input and outputs detection boxes for single tree crowns; the second takes as input a three-channel elevation map formed by stacking the single-channel digital surface model three times and outputs detection boxes for single tree crowns. Correspondingly, the detection box sets produced by the two models are recorded as the first and second detection box sets. Confidence weighting and redundancy elimination are then applied to the two groups of candidate boxes to obtain one final set of detection boxes, which is evaluated to produce the evaluation indices (Precision, Recall and Average Precision).
The two groups of candidate detection boxes are processed as follows: if two boxes belong to the two different groups and their IoU is greater than or equal to the IoU threshold, they are considered to represent the same crown; their confidences are weighted by a preset weight and assigned to the box with the higher original confidence, and the other box is removed. If the IoU between a box in one group and every box in the other group is below the IoU threshold, the corresponding single tree crown is considered absent from the other group; the box is retained and its confidence is reduced by the preset proportion.
The sole criterion for judging whether two detection boxes A and B represent the same crown is the Intersection over Union (IoU) between them: if IoU is greater than or equal to 0.5, the two boxes are considered to represent the same camphor tree crown, and vice versa. The IoU is calculated as in formula 1.
IoU(A, B) = Area(A ∩ B) / Area(A ∪ B)  (1)
The procedure is implemented as follows. Pool the two groups of candidate detection boxes; let the box with the highest confidence be α and let the remaining boxes form the set Boxes, as in formula 2.
Boxes = {box1, box2, ...}  (2)
Compute the IoU between detection box α and every box in the set Boxes, and take the maximum, as in formula 3.
Max_IoU(α, Boxes) = max{IoU(α, box1), IoU(α, box2), ...}  (3)
Suppose the IoU between α and a box β is this maximum. Formula 4 determines the weight of β: if the IoU is greater than or equal to 0.5, α and β are considered to represent the same camphor tree crown and β is given weight 0.5; if it is less than 0.5, α and β are not considered to represent the same crown and β is given weight 0.
Weight(Max_IoU(α, Boxes)) = 0.5, if Max_IoU(α, Boxes) ≥ 0.5; 0, if Max_IoU(α, Boxes) < 0.5  (4)
Finally, as in formula 5, the confidences of α and β are weighted to obtain a new confidence, which is assigned to detection box α; it represents the probability that the object in α is a camphor tree crown in the detection result of this scheme (hereinafter the dual-branch network scheme), and detection box β is removed. Here conf_α and conf_β denote the confidences of α and β, respectively. The process is repeated until all detection boxes have been handled.
Conf(α, Boxes) = 0.5 · conf_α + Weight(Max_IoU(α, Boxes)) · conf_β  (5)
When the detection result is evaluated, the evaluation indices depend on the confidence of each detection box. The confidence weighting step above judges each box using elevation and color information jointly, raising the confidence of positive samples and lowering that of negative samples, thereby improving the detection result.
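Formulas 1-5 map directly to code. The following is a minimal Python sketch of the confidence resetting and redundancy elimination step (the box format and function names are illustrative assumptions; the patent itself gives only the formulas):

```python
def iou(a, b):
    """Formula 1: intersection over union of boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fuse(rgb_boxes, dsm_boxes, iou_thresh=0.5, w=0.5):
    """Confidence resetting and redundancy elimination (formulas 2-5).
    Boxes are (x1, y1, x2, y2, conf) tuples; the two branch outputs are pooled."""
    pool = sorted(rgb_boxes + dsm_boxes, key=lambda b: b[4], reverse=True)
    fused = []
    while pool:
        alpha = pool.pop(0)                       # highest-confidence box
        conf = w * alpha[4]                       # 0.5 * conf_alpha (formula 5)
        if pool:
            ious = [iou(alpha, b) for b in pool]  # formula 3
            j = max(range(len(ious)), key=ious.__getitem__)
            if ious[j] >= iou_thresh:             # same crown (formula 4)
                beta = pool.pop(j)
                conf += w * beta[4]               # weighted merge; beta removed
        fused.append(alpha[:4] + (conf,))
    return fused
```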
3. Data processing and dataset creation
Data preprocessing: the 31 batches of images from the UAV missions were loaded separately into Agisoft Metashape Professional to generate dense point clouds, from which 31 DOMs (RGB images, hereinafter "RGB image") and 31 DSMs were produced and exported. Because the full images are too large, each DSM was resampled to the same resolution as the corresponding RGB image, and all local areas containing camphor tree crowns were cropped out, yielding 90 pairs of mutually corresponding RGB and DSM local images.
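The patent does not say which tool performed the resampling; as one possibility, the sketch below uses the rasterio library to read a DSM resampled to the pixel dimensions of the matching RGB orthophoto (the paths, the helper name, and the bilinear choice are assumptions):

```python
import rasterio
from rasterio.enums import Resampling

def read_dsm_matching(rgb_path: str, dsm_path: str):
    """Read the DSM resampled to the RGB orthophoto's pixel grid so the
    two local images correspond pixel-for-pixel (assumes equal extents)."""
    with rasterio.open(rgb_path) as rgb:
        out_shape = (rgb.height, rgb.width)
    with rasterio.open(dsm_path) as dsm:
        return dsm.read(1, out_shape=out_shape, resampling=Resampling.bilinear)
```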
Dataset preparation: the RGB and DSM local images were each divided into training and test images, with an identical split for the two modalities: 46 test images containing 602 camphor tree crowns and 44 training images containing 617 camphor tree crowns. To increase the data volume and obtain better detection results, the training images were augmented, expanding them to 308 images containing 4319 camphor tree crowns by clockwise rotations of 90°, 180° and 270°, left-right and top-bottom mirroring, and a 0.8× brightness transform, as sketched below.
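The six transforms listed above turn each training image into seven (44 × 7 = 308). A minimal Pillow sketch of the image side of this augmentation (bounding-box labels must be transformed correspondingly, which is omitted here):

```python
from PIL import Image, ImageEnhance

def augment(img: Image.Image):
    """Yield the original plus the six augmented variants used above."""
    yield img
    yield img.transpose(Image.Transpose.ROTATE_270)       # 90 deg clockwise
    yield img.transpose(Image.Transpose.ROTATE_180)       # 180 deg
    yield img.transpose(Image.Transpose.ROTATE_90)        # 270 deg clockwise
    yield img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)  # left-right mirror
    yield img.transpose(Image.Transpose.FLIP_TOP_BOTTOM)  # top-bottom mirror
    yield ImageEnhance.Brightness(img).enhance(0.8)       # 0.8x brightness
```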
The species, position and size of each single tree crown were determined through field investigation, with areas where crowns overlap surveyed as key regions to ensure that every tree was recorded correctly. On this basis, labeling was completed with an open-source annotation tool, yielding a labeled RGB dataset and a labeled DSM dataset.
4. Network training
The computer used in this embodiment runs Windows 10 (64-bit, Education edition) with an Intel Xeon E3-1225 v5 @ 3.30 GHz CPU and an Nvidia GeForce GTX 1080 Ti GPU with 11 GB of video memory. The programming language is Python 3.5.2 with the TensorFlow framework (version 1.9.0). ResNet101 is used as the backbone network, and to accelerate model convergence a corresponding pre-trained model was obtained from the official TensorFlow open-source project for transfer learning; detailed network parameters are listed in Table 2. The networks in the two branches of the dual-branch detection network were trained with the RGB dataset and the DSM dataset respectively until convergence.
Table 2. Detailed network parameters (the table is provided as an image in the original publication).
5. Evaluation index
After training of the dual-branch detection network is complete, the test images are fed in and the results verified. The RGB local images are input to the first target detection model, giving a first set of detection boxes representing all single tree crowns in the input images; the DSM local images are input to the second target detection model, giving a second set. Following the procedure above, the two sets are traversed for confidence resetting and redundancy elimination to obtain the new set of detection boxes used as the final single-tree crown detection result. Recall, Precision and Average Precision (AP) are used as evaluation indices, and model quality is judged by their values on the test data. Recall indicates how many of the camphor tree crowns actually present were detected by the network; precision indicates how many of the crowns detected by the network are correct. Because precision and recall vary with the confidence threshold, a P-R curve can be drawn, and the average precision equals the area under this curve. The corresponding formulas are as follows:
Precision = TP / (TP + FP)  (6)

Recall = TP / (TP + FN)  (7)

AP = ∫₀¹ P(R) dR  (8)
In these formulas, TP (true positive) denotes a detection box judged to be a camphor tree crown that actually is one; FP (false positive) denotes a detection box judged to be a crown that actually is not; FN (false negative) denotes a camphor tree crown that exists but was not detected. The determination proceeds as follows: take all rectangular label boxes in the label file corresponding to the image and compute the IoU of each detection box with the label boxes one by one. Take the largest IoU and its label box; if this IoU is greater than or equal to 0.5, the detection box and the label box are judged to represent the same camphor tree crown and the detection is correct; if it is less than 0.5, no camphor tree crown is judged to exist at the position of the detection box and the detection is wrong.
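A minimal sketch of this TP/FP/FN determination (it reuses the iou() helper above and additionally assumes each label box may be matched by at most one detection, a common convention the patent does not spell out):

```python
def count_tp_fp_fn(detections, labels, iou_thresh=0.5):
    """detections: (x1, y1, x2, y2, conf) tuples; labels: (x1, y1, x2, y2)."""
    matched = set()
    tp = fp = 0
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, lab in enumerate(labels):
            if j in matched:
                continue
            v = iou(det, lab)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= iou_thresh:   # detection and label show the same crown
            tp += 1
            matched.add(best_j)
        else:                        # no crown at the detected position
            fp += 1
    fn = len(labels) - len(matched)  # crowns present but not detected
    return tp, fp, fn
```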
This embodiment compares the detection results of the dual-branch network, the first target detection model using RGB local images alone, and the second target detection model using DSM local images alone, as shown in Table 3. Precision, Recall, TP, FP and FN are computed over detection boxes with confidence greater than or equal to 0.5.
Table 3. Comparison of the experimental results of the schemes (the table is provided as an image in the original publication).
The dual-branch network scheme achieves the best result: its average precision reaches 86.79%, an improvement of 7.28% over the RGB-only scheme, and its precision and recall are respectively the best and second best among the schemes. The dual-branch scheme of the invention effectively combines elevation and color information for single-tree crown detection and forms a new detection result through confidence resetting and redundancy elimination, thereby improving ITCD performance.
The above embodiments are merely preferred embodiments of the present invention and should not be construed as limiting it. Various changes and modifications may be made by those of ordinary skill in the art without departing from the spirit and scope of the invention; technical solutions obtained by equivalent substitution or equivalent transformation all fall within the protection scope of the invention.

Claims (8)

1. An urban forest single-tree crown detection method combining RGB-DSM images and deep learning, characterized by comprising the following steps:
s1: acquiring unmanned aerial vehicle images of a target detection area and generating a dense point cloud from them, so as to obtain an RGB three-channel orthophoto map representing ground-object reflectance and a single-channel digital surface model representing ground-object height;
s2: acquiring a first target detection model and a second target detection model, each trained on a deep learning network; the first target detection model takes the orthophoto map as input and outputs detection boxes for single tree crowns; the second target detection model takes as input a three-channel elevation map formed by stacking the single-channel digital surface model three times, and outputs detection boxes for single tree crowns;
s3: taking the RGB three-channel orthophoto map obtained in s1 as the input of the first target detection model to obtain a first set of detection boxes representing all single tree crowns in the input image; taking the single-channel digital surface model obtained in s1 as the input of the second target detection model to obtain a second set of detection boxes representing all single tree crowns in the input image;
s4: traversing the two sets of detection boxes for confidence resetting and redundancy elimination:
if two detection boxes belong to the two different sets and their intersection-over-union (IoU) is greater than or equal to the IoU threshold, the two boxes are considered to represent the same crown; their confidences are weighted by a preset weight and assigned to the box with the higher original confidence, and the other box is removed;
if the IoU between a detection box in one set and every detection box in the other set is below the IoU threshold, the corresponding single tree crown is considered absent from the other set; the box is retained but its confidence is reduced by a preset proportion;
s5: after the confidence resetting and redundancy elimination of s4 are completed, obtaining a new set of detection boxes as the final single-tree crown detection result.
2. The urban forest single-tree crown detection method combining RGB-DSM images and deep learning according to claim 1, wherein when the unmanned aerial vehicle image is acquired, the forward (heading) overlap and side overlap are kept at no less than 90%.
3. The urban forest single-tree crown detection method combining RGB-DSM images and deep learning according to claim 1, wherein when the unmanned aerial vehicle image is acquired, the flight area is larger than the target detection area, and the unmanned aerial vehicle image of the target detection area is then obtained by removing the edge portion of the image.
4. The urban forest single-tree crown detection method combining RGB-DSM images and deep learning according to claim 1, wherein the deep learning network comprises a feature extraction network and a region proposal network; the network accepts three-channel images as input, and a set of feature maps is obtained after feature extraction of the input image; the region proposal network then generates a large number of regions of interest on the feature maps, and the detection boxes are obtained after the regions of interest are regressed, classified and screened.
5. The urban forest single-tree crown detection method combining RGB-DSM images and deep learning according to claim 1, wherein the deep learning network is Faster R-CNN.
6. The urban forest single-tree crown detection method combining RGB-DSM images and deep learning according to claim 1, wherein the IoU threshold is 0.5.
7. The urban forest single-tree crown detection method combining RGB-DSM images and deep learning according to claim 1, wherein the preset weights are equal, that is, the confidences of the two detection boxes are each weighted by 0.5.
8. The urban forest single-tree crown detection method combining RGB-DSM images and deep learning according to claim 1, wherein the first target detection model and the second target detection model are both trained in advance with labeled sample data that has undergone data enhancement.
CN202110885400.5A 2021-08-03 2021-08-03 Urban forest single-tree crown detection method combining RGB-DSM image and deep learning Active CN113591729B (en)

Priority Applications (1)

CN202110885400.5A (priority date 2021-08-03, filing date 2021-08-03, granted as CN113591729B): Urban forest single-tree crown detection method combining RGB-DSM image and deep learning

Publications (2)

CN113591729A, published 2021-11-02
CN113591729B, published 2023-07-21

Family

ID=78254267

Family Applications (1)

CN202110885400.5A (priority date 2021-08-03, filing date 2021-08-03): Active, granted as CN113591729B

Country Status (1)

CN: CN113591729B granted


Patent Citations (3)

* Cited by examiner, † Cited by third party

CN105894501A * (priority 2016-03-29, published 2016-08-24): Single-tree detection and crown describing method for high-resolution remote sensing image
AU2020103026A4 * (priority 2020-10-27, published 2020-12-24): A Single Tree Crown Segmentation Algorithm Based on Super-pixels and Topological Features in Aerial Images
CN112907520A * (priority 2021-02-05, published 2021-06-04): Single tree crown detection method based on end-to-end deep learning method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

AMPATZIDIS Y: "UAV-based high throughput phenotyping in citrus utilizing multispectral imaging and artificial intelligence", Remote Sensing, vol. 11, no. 4, XP055906094, DOI: 10.3390/rs11040410 *
罗巍, 王东亮, 夏列钢, 陈曙东: "Forestry resource survey method based on deep learning" (基于深度学习的林业资源调查方法), 林业科技通讯 (Forestry Science and Technology Newsletter), no. 08 *
薛月菊, 黄宁, 涂淑琴, 毛亮, 杨阿庆, 朱勋沐, 杨晓帆, 陈鹏飞: "Improved YOLOv2 recognition method for immature mangoes" (未成熟芒果的改进YOLOv2识别方法), 农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering), no. 07 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311587A (en) * 2022-09-07 2022-11-08 无锡海纳智能科技有限公司 Intelligent generation method for photovoltaic power station inspection waypoints
CN115311587B (en) * 2022-09-07 2024-05-31 无锡海纳智能科技有限公司 Intelligent generation method for photovoltaic power station inspection waypoints
WO2024100777A1 (en) * 2022-11-08 2024-05-16 日本電信電話株式会社 Inference device, training device, inference method, training method, and computer program
CN117372885A (en) * 2023-09-27 2024-01-09 中国人民解放军战略支援部队信息工程大学 Multi-mode remote sensing data change detection method and system based on twin U-Net neural network

Also Published As

CN113591729B, published 2023-07-21

Similar Documents

Publication Publication Date Title
CN110796168B (en) Vehicle detection method based on improved YOLOv3
CN113591729B (en) Urban forest single-tree crown detection method combining RGB-DSM image and deep learning
US11263434B2 (en) Fast side-face interference resistant face detection method
CN111553200A (en) Image detection and identification method and device
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN108921057B (en) Convolutional neural network-based prawn form measuring method, medium, terminal equipment and device
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN111899172A (en) Vehicle target detection method oriented to remote sensing application scene
CN107341523A (en) Express delivery list information identifying method and system based on deep learning
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN111126399A (en) Image detection method, device and equipment and readable storage medium
CN110298227B (en) Vehicle detection method in unmanned aerial vehicle aerial image based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN110287798B (en) Vector network pedestrian detection method based on feature modularization and context fusion
CN111860359B (en) Point cloud classification method based on improved random forest algorithm
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN117409339A (en) Unmanned aerial vehicle crop state visual identification method for air-ground coordination
CN112946679A (en) Unmanned aerial vehicle surveying and mapping jelly effect detection method and system based on artificial intelligence
CN115032648A (en) Three-dimensional target identification and positioning method based on laser radar dense point cloud
CN115272876A (en) Remote sensing image ship target detection method based on deep learning
CN117392382A (en) Single tree fruit tree segmentation method and system based on multi-scale dense instance detection
Larsen Individual tree top position estimation by template voting
CN114821370A (en) Single-tree crown detection and segmentation method based on unmanned aerial vehicle image and U-Net
CN111524098B (en) Neural network output layer cutting and template frame size determining method based on self-organizing clustering

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant