CN110929722A - Tree detection method based on whole tree image - Google Patents

Tree detection method based on whole tree image

Info

Publication number
CN110929722A
CN110929722A CN201911065758.2A
Authority
CN
China
Prior art keywords
tree
image
data set
method based
whole
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911065758.2A
Other languages
Chinese (zh)
Inventor
冯海林
钱峥
武斌
杜晓晨
夏凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang A&F University ZAFU
Original Assignee
Zhejiang A&F University ZAFU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang A&F University ZAFU filed Critical Zhejiang A&F University ZAFU
Priority to CN201911065758.2A priority Critical patent/CN110929722A/en
Publication of CN110929722A publication Critical patent/CN110929722A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tree detection method based on whole tree images, which comprises: collecting whole tree images by autonomous shooting and online crawling, establishing a data set, and performing data enhancement processing on the data set images so that the number of samples of each tree class is consistent; then performing illumination processing on the tree images in the data set after data enhancement; and finally performing feature extraction, candidate region generation and classification on the illumination-processed tree images with the Faster R-CNN algorithm, and outputting the detection result. Based on a data set of whole tree images, the method obtains the tree class and specific position, weakens the influence of illumination and occlusion on tree detection in complex site environments, makes tree detection work easier and more convenient, can adapt to complex terrain, and is not limited by maintenance mode or area.

Description

Tree detection method based on whole tree image
Technical Field
The invention belongs to the technical field of tree detection, and particularly relates to a tree detection method based on a whole tree image.
Background
Trees are of immeasurable importance to the global environment and human life: they are an important source of oxygen and of natural air filtration, and a vital protective umbrella for nature. Trees are an integral part of the forestry ecosystem and play a vital role in improving the human living environment, providing social services, producing materials, and so on. Tree detection has important ecological and practical value for forest management; however, most tree detection research mainly uses the crown as the entry point, studied in combination with unmanned aerial vehicle imagery. With the widespread application of neural networks, tree detection has gained new ideas; for example, Arida Susilowati et al. proposed two machine learning methods, realizing automatic clove tree detection and yield estimation using remote sensing data and methods, and large-scale palm tree detection using convolutional neural networks.
However, the shape and state of a tree change continuously, and during growth a tree is vulnerable to insect damage or infection by other organisms: in mild cases development is impaired, and in severe cases the tree dies. Accurate detection of tree position is therefore key to evaluating the health state of trees and maintaining them. One of the main challenges in tree detection is insufficient detection of smaller trees: single-tree detection achieves high accuracy for trees visible from above, but cannot reliably detect trees in the lower canopy. In a complex ground environment, the accuracy of aerial detection work depends mainly on the spatial configuration of the trees, and many forest lands in the world, particularly public lands, are managed for structural diversity at the stand and landscape scales. To solve the detection problems existing in complex site environments and promote tree maintenance work, whole-tree detection is an option.
In summary, the traditional tree detection method has low efficiency, is greatly influenced by the environmental factors of the sample plot, needs to rely on experienced observers for anything beyond the single-tree level, and its large equipment is not easy to carry and is expensive. Considering the high professional requirements, high cost and poor recognition effect of traditional forestry work, a suitable solution is needed. With the rapid development of information technology, the broad application of deep learning provides a new way to solve these problems and improves the efficiency and precision of forestry operations. However, due to the characteristics of trees themselves and the influence of complex field environments (such as illumination and terrain), the accuracy and applicability of tree detection remain to be improved. Little work at home or abroad involves whole-tree detection in natural environments; it mainly uses unmanned aerial vehicles and crowns as research entry points, and little of it uses whole tree images from cameras.
Disclosure of Invention
The invention aims to provide a tree detection method based on whole tree images, which detects trees from autonomously shot and network-crawled tree image data, obtains the overall distribution and position information of trees in an area or plot, reduces the cost of manual classification and positioning, solves the problem of obvious differences in detection effect caused by illumination and occlusion, and makes forestry investigation work safer and more convenient.
In order to achieve the purpose, the technical scheme of the application is as follows:
a tree detection method based on a whole tree image comprises the following steps:
collecting the whole image of the tree, establishing a data set, and performing data enhancement processing on the data set to ensure that the number of samples of each type of tree is consistent;
performing illumination processing on the tree image in the data set after the data enhancement processing;
and performing feature extraction, candidate area generation and classification on the tree image subjected to illumination processing by adopting a Faster R-CNN algorithm, and outputting a detection result.
Further, the illumination processing is performed on the tree image in the data set after the data enhancement processing, wherein a formula for performing the illumination processing on the tree image is as follows:
J(x) = (I(x) - A)/max(t(x), t0) + A

wherein I(x) is the tree image to be processed, J(x) is the image after illumination processing, A is the image brightness, t(x) is the transmittance, and t0 is a preset threshold.
Further, the image brightness a is obtained by the following method:
solving the minimum value in three channels of RGB of the input tree image, namely solving a dark primary color channel image, carrying out mean value filtering on the dark primary color channel image, and then solving the point with the maximum gray value;
then, finding the channel image whose median value is largest among the three RGB channels of the input image, and then finding its point with the maximum gray value;
the average of the gray values of the two points having the largest gray value is taken as the image luminance a.
Further, the transmittance t (x) is obtained according to the following formula:
t(x) = 1 - ω·min_{y∈Ω(x)} ( min_C ( I^C(y)/A^C ) )

where Ω(x) represents a window centered on pixel x, superscript C represents the three R/G/B color channels, and ω is a factor between [0,1].
Further, the loss function of the Faster R-CNN algorithm is as follows:
L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·L_reg(t_i, t_i*) + α·L_RepGT + β·L_RepBox

wherein N_cls is the total number of target classification anchors; N_reg is the total number of frame regression anchors; p_i is the probability that an anchor is predicted to be a target, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when the anchor is a negative sample; t_i represents the 4 parameterized coordinates of the predicted candidate box; t_i* is the coordinate vector of the real box corresponding to the positive sample; L_cls is the logarithmic loss of target versus non-target, and smooth_L1 is a robust loss function;

wherein:

L_cls(p_i, p_i*) = -log[p_i*·p_i + (1 - p_i*)·(1 - p_i)]

L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*)

smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise

L_RepGT = Σ_{t_i ∈ P+} smooth_L1(IoG(B^{t_i}, G^{t_i})) / |P+|

L_RepBox = Σ_{t_iM ≠ t_iN} smooth_L1(IoU(B^{t_iM}, B^{t_iN})) / (Σ_{t_iM ≠ t_iN} 1[IoU(B^{t_iM}, B^{t_iN}) > 0] + k)

IoG(B, G) = area(B ∩ G) / area(B)

P+ is the set of positive samples (those with p_i* = 1), B^{t_i} is the candidate box corresponding to anchor t_i, t_iM and t_iN denote anchors assigned to different targets, G^{t_i} is the repulsion target box of anchor t_i, 1[·] is the indicator function, k is a small constant that prevents the denominator from being 0, and α and β are hyperparameters used to balance the weights of the different effects.
The application provides a tree detection method based on a whole tree image, the whole tree image is concentrated on the basis of data, the tree type and the specific position condition are obtained, and the influence of illumination and shielding on tree detection in a complex ground environment is weakened. The method enables tree detection work to be easier and more convenient, can adapt to complex terrains, is not limited by maintenance modes and areas, and reduces a series of inestimable problems such as tree damage, yield reduction, death, ecological damage and even destruction caused by untimely acquisition of tree position information.
Drawings
FIG. 1 is a flowchart of a tree detection method based on an overall image of a tree according to the present application;
FIG. 2 is a schematic diagram of the Faster R-CNN algorithm according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the false-recall and missed-detection cases for proposals P and targets T according to an embodiment of the present application;
fig. 4 is a schematic diagram of the experimental results of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Tree detection is explained taking as an example the East Lake campus of Zhejiang A&F University in Lin'an District, Hangzhou, Zhejiang. The area lies in the subtropical monsoon zone; the vegetation is mainly subtropical evergreen broad-leaved forest; the four seasons are distinct and rainfall is abundant, with an annual average temperature of 16.4 °C, 1847.3 hours of annual sunshine, 1628.6 mm of annual precipitation, and an average relative humidity of 70.3%.
In one embodiment, as shown in fig. 1, a tree detection method based on an overall image of a tree is provided, including:
and S1, collecting the whole image of the tree, establishing a data set, and performing data enhancement processing on the data set to ensure that the number of samples of each type of tree is consistent.
In this application, simple equipment such as a camera is used to photograph four tree species in the area: ginkgo, maple, sweet osmanthus and soapberry, obtaining whole images of trees, hereinafter referred to as tree images. The whole image of a tree in this application means that, for any tree, the collected tree image includes the crown and the trunk, so that the full view of the tree is visible. Image data sets for plant identification published on the internet, such as PlantNet, Flavia and Leafsnap, are mostly plant leaf image data sets; there is no published whole-tree image data set. Therefore, in this application a data set was constructed from April to May 2019 by combining self-shooting and network crawling. Four tree species, ginkgo, maple, sweet osmanthus and soapberry, were selected as research objects, and 1533 pictures were collected in total: 610 shot independently and 923 crawled from the network. The crawled pictures are mainly single and multiple plants on sunny days, while, considering the richness of the experimental data, the independently shot pictures are mainly single and multiple plants under different weather conditions.
After screening the data set, it was found that the number of samples of each tree class differed, so that when training batches are divided the neural network easily suffers from severely uneven label distribution, biasing the training parameters and reducing the classification effect. In order to improve the classification accuracy of the network and prevent problems such as overfitting, this application enhances the data by diagonal flipping, affine transformation, color adjustment, local cropping and blurring; the method is not limited to a specific data enhancement technique. Data enhancement increases the number of samples in the data set so that the sample counts of the tree classes are consistent, i.e. differ within a preset range.
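As a purely illustrative sketch (the application does not fix a particular implementation), the five enhancement modes named above could be realized with OpenCV roughly as follows; the function name, rotation angle, saturation gain, crop ratio and kernel size are all assumptions:

```python
import cv2
import numpy as np

def augment(img: np.ndarray) -> list:
    """Apply the five enhancement modes named above to one BGR tree image."""
    h, w = img.shape[:2]
    out = []
    # diagonal flip: flip around both axes
    out.append(cv2.flip(img, -1))
    # affine transformation: small rotation about the image center
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), 10.0, 1.0)
    out.append(cv2.warpAffine(img, M, (w, h)))
    # color adjustment: boost saturation in HSV space
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * 1.2, 0, 255)
    out.append(cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR))
    # local cropping: central 80% region, resized back to the original size
    ch, cw = int(h * 0.8), int(w * 0.8)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    out.append(cv2.resize(img[y0:y0 + ch, x0:x0 + cw], (w, h)))
    # blurring
    out.append(cv2.GaussianBlur(img, (5, 5), 0))
    return out
```

Each input image then yields five additional samples, which can be applied selectively per class until the sample counts are balanced.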
Step S2, performing illumination processing on the tree images in the data set after the data enhancement processing.
Complicated illumination conditions in different site environments are a challenge for tree detection, and over-bright or dark light affects detection to different degrees, so illumination processing is very critical.
In most non-sky local regions of the tree images in the data set of the present application, some pixels have at least one color channel with a very low value, which can be expressed by the following formula:
J_dark(x) = min_{y∈Ω(x)} ( min_{c∈{r,g,b}} J^c(y) )   (1)
In the formula, J^c represents each channel of the color image and Ω(x) represents a window centered on pixel x. The minimum value of the RGB components of each pixel is obtained from the input tree image and stored in a gray-scale image of the same size as the original; minimum-value filtering is then performed on this gray-scale image, with the filter radius determined by the window size, generally W = 2·Radius + 1. Since J_dark tends to 0, the image after illumination processing is given by formula (2):
J(x) = (I(x) - A)/t(x) + A   (2)

wherein I(x) is the image to be processed, i.e. the tree image in the data set, J(x) is the image after illumination processing, A is the image brightness, t(x) is the transmittance, and superscript C denotes the three R/G/B color channels.
The image brightness a of this embodiment may use a gray value of an average brightness of pixel points in the image as an estimation value. Preferably calculated by the following method: firstly, the minimum value in three channels of RGB of an input tree image is obtained, namely a dark primary channel image is obtained, then mean filtering is carried out on the dark primary channel image, then the point with the maximum gray value is obtained, then the channel image with the maximum median value in the three channels of the input tree image is obtained, then the point with the maximum gray value is obtained, and then the mean value of the gray values of the two points with the maximum gray value is used as A.
In estimating the transmittance t(x), it is assumed that t(x) and A are constant within each window, and the minimum operation is applied twice to both sides of the above equation. Preferably, considering that light differs in real life and across viewing angles of the same object, a certain degree of illumination needs to be retained, so a factor ω between [0,1] is introduced into t(x); since J_dark tends to 0, we obtain:

t(x) = 1 - ω·min_{y∈Ω(x)} ( min_C ( I^C(y)/A^C ) )   (3)
The final processing result is shown below:

J(x) = (I(x) - A)/max(t(x), t0) + A   (4)

wherein t0 is a preset threshold.
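A minimal sketch of the whole illumination pipeline of formulas (1)-(4) in NumPy/OpenCV follows; the window radius and the values ω = 0.95 and t0 = 0.1 are common choices assumed here, not values fixed by the application:

```python
import cv2
import numpy as np

def illumination_process(img, radius=7, omega=0.95, t0=0.1):
    """Dark-channel illumination processing following formulas (1)-(4)."""
    I = img.astype(np.float64) / 255.0
    win = 2 * radius + 1                                   # W = 2*Radius + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (win, win))
    # (1) dark channel: per-pixel minimum over RGB, then a window minimum filter
    dark = cv2.erode(I.min(axis=2), kernel)
    # brightness A: brightest point of the mean-filtered dark channel ...
    smooth_dark = cv2.blur(dark, (win, win))
    p1 = np.unravel_index(np.argmax(smooth_dark), smooth_dark.shape)
    # ... averaged with the brightest point of the channel whose median is largest
    c = int(np.argmax([np.median(I[..., i]) for i in range(3)]))
    p2 = np.unravel_index(np.argmax(I[..., c]), I[..., c].shape)
    A = (smooth_dark[p1] + I[..., c][p2]) / 2.0
    # (3) transmittance, keeping some illumination via the factor omega
    t = 1.0 - omega * cv2.erode((I / A).min(axis=2), kernel)
    # (4) J(x) = (I(x) - A) / max(t(x), t0) + A
    J = (I - A) / np.maximum(t, t0)[..., None] + A
    return np.clip(J * 255.0, 0.0, 255.0).astype(np.uint8)
```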
Step S3, performing feature extraction, candidate region generation and classification on the illumination-processed tree images using the Faster R-CNN algorithm, and outputting the detection result.
The Faster R-CNN algorithm is a commonly used object detection algorithm, as shown in fig. 2. In the convolutional layers (Conv layers), Faster R-CNN extracts the feature maps of the image using a set of basic conv + relu + pooling layers; the Conv layers of this embodiment comprise 13 conv layers + 13 relu layers + 4 pooling layers, and the feature maps are shared by the subsequent RPN layer and the fully connected layers.
The Region Proposal Network (RPN) is used to generate candidate regions. The RPN is actually divided into 2 lines: one classifies the anchors into foreground and background via softmax (the detection target being foreground), and the other computes bounding box regression offsets for the anchors to obtain accurate candidate boxes. The final Proposal layer is responsible for synthesizing the foreground anchors and the bounding box regression offsets into proposals, while eliminating proposals that are too small or exceed the image boundary. The RPN structure is summarized as: generate anchors -> softmax classifier extracts foreground anchors -> bounding box regression of foreground anchors -> Proposal layer generates proposals.
The ROI pooling layer is responsible for collecting the input proposals, computing the proposal feature maps of the candidate boxes, and sending these feature maps to the subsequent fully connected layers to determine the target category.
The fully connected layers compute the specific category of each proposal from the obtained proposal feature maps through fully connected layers and softmax, outputting a probability vector; at the same time, bounding box regression is used again to obtain the final accurate position of each candidate box.
In this embodiment, tree detection work such as feature extraction, candidate region generation and classification is unified into one deep network framework through Faster R-CNN. A VGG16 network is adopted. First the input tree image is size-normalized, setting the image size to 224 × 224; if an image is smaller than 224 × 224, edge zero-padding is adopted, i.e. black borders are added to the small image, and convolutional feature extraction is then performed on it. The process can be described as:
x_i = act(x_{i-1} ⊗ k_i + b_i)   (5)

where x_i is the feature map of the i-th layer of the convolutional neural network, k_i is the i-th layer convolution kernel, b_i is the bias vector of the i-th layer, ⊗ is the convolution operator, and act(·) is the activation function. Common activation functions include sigmoid, tanh, ReLU, LeakyReLU, etc.; the ReLU activation function is adopted in this application, with the formula:
act(x)=max(0,x) (6)
in the entire convolutional layer Conv layers, 1/16 when the tree image passes through 4 pooling layers and becomes input, the feature map becomes spatially smaller, deeper, i.e., 14 × 14, and 512(224/16 is 14 ).
To train the RPN, each anchor needs to be assigned a class label (target or non-target). The Faster R-CNN network has 2 sibling output layers, both fully connected, in a multi-task arrangement: one for classification and one for adjusting the candidate box position. The final total loss is a weighted sum of the classification loss function and the loss function estimating the candidate box position.
In RPN training, whether a target exists in an anchor is judged by the following rules: 1) if an anchor has the maximum IoU with some target region, the anchor is judged to contain a target; 2) if the IoU of an anchor with any target region is greater than 0.7, a target is judged to exist; 3) if the IoU of an anchor with every target region is < 0.3, it is judged to be background. (The IoU is the coverage between the candidate box and the real box; its value equals the intersection of the two box areas divided by the union of the two box areas.)
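A small sketch of the IoU computation and the three labeling rules just given; the [x1, y1, x2, y2] box format is an assumption, and treating anchors that satisfy neither rule as ignored follows the usual Faster R-CNN convention rather than anything stated here:

```python
def iou(a, b):
    """Intersection over union of boxes a, b = [x1, y1, x2, y2]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / max(area_a + area_b - inter, 1e-9)

def label_anchor(anchor, gt_boxes, is_max_iou_anchor):
    """Rules 1)-3): 1 = target, 0 = background, -1 = not used in training."""
    best = max(iou(anchor, g) for g in gt_boxes)
    if is_max_iou_anchor or best > 0.7:   # rule 1) or rule 2)
        return 1
    if best < 0.3:                        # rule 3)
        return 0
    return -1                             # 0.3 <= IoU <= 0.7: ignored
```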
Multi-task loss is followed in training and the objective function is minimized. The loss function is defined as:

L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·L_reg(t_i, t_i*)   (7)

The loss function is divided into two parts, corresponding to the two branches of the RPN: the classification error (target or not) and the regression error of the frame.

Wherein N is the total number of anchor points; N_cls is the total number of target classification anchors; N_reg is the total number of frame regression anchors; p_i is the probability that an anchor is predicted to be a target; the ground-truth (GT) label p_i* = 1 when the anchor is a positive sample and p_i* = 0 when the anchor is a negative sample; t_i = {t_x, t_y, t_w, t_h} is a vector representing the 4 parameterized coordinates of the predicted candidate box; t_i* is the coordinate vector of the ground-truth box corresponding to the positive sample; L_cls(p_i, p_i*) is the logarithmic loss of target versus non-target, L_reg(t_i, t_i*) is the regression loss, and smooth_L1 is a robust loss function:

L_cls(p_i, p_i*) = -log[p_i*·p_i + (1 - p_i*)·(1 - p_i)]   (8)

L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*)   (9)

smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise   (10)
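As a sketch, formulas (8)-(10) can be written in NumPy as follows (names are illustrative; t and t* hold the 4 parameterized coordinates):

```python
import numpy as np

def smooth_l1(x):
    """Formula (10), applied elementwise."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def l_cls(p, p_star):
    """Formula (8): log loss of target vs. non-target."""
    return -np.log(p_star * p + (1 - p_star) * (1 - p))

def l_reg(t, t_star):
    """Formula (9): smooth-L1 regression loss, summed over tx, ty, tw, th."""
    return smooth_l1(np.asarray(t) - np.asarray(t_star)).sum()
```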
The goal of bounding box regression in Faster R-CNN is to make the candidate boxes approach the real ground-truth window. The original proposal is mapped by regression or fine tuning to obtain a regression window closer to the real box; the mapping translates and then scales the proposal window.
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a   (11)

t_w = log(w/w_a),  t_h = log(h/h_a)   (12)

t_x* = (x* - x_a)/w_a,  t_y* = (y* - y_a)/h_a   (13)

t_w* = log(w*/w_a),  t_h* = log(h*/h_a)   (14)
wherein x, y, w and h are the center coordinates of a box and its width and height, respectively; x is the predicted box coordinate, x_a is the coordinate of the anchor box generated by a pixel on the feature map, and x* is the coordinate of the corresponding ground-truth point; the other symbols follow by analogy.
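A sketch of the parameterization (11)-(14) and its inverse, the decode step that recovers a regressed window from predicted offsets; the (cx, cy, w, h) tuple format and function names are assumptions:

```python
import numpy as np

def encode(box, anchor):
    """Formulas (11)-(12): box and anchor given as (cx, cy, w, h)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def decode(t, anchor):
    """Inverse mapping: recover (cx, cy, w, h) from offsets t."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return np.array([tx * wa + xa, ty * ha + ya,
                     wa * np.exp(tw), ha * np.exp(th)])
```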
When the difference between the input proposal and the ground truth is small, linear regression is used to fine-tune the window and remove redundant proposals. The phenomena of false recall and missed detection also need to be considered. As shown in fig. 3, P2 contains a partial region of T2, so the proposal corresponding to P2 is easily interfered with by T2, causing the regressed proposal to contain partial regions of both targets T1 and T2 and thus to be inaccurate. If P1 is absent, then when the IoU of P2 and P3 is greater than the threshold, P2 will be filtered out in the NMS stage, causing T1 to be missed.
In tree detection, in order to suppress such false detections, the NMS threshold cannot simply be raised, otherwise true detections that are close to each other may be wrongly suppressed. For this problem, the following formula (15) considers both the case where a proposal shifts toward another target and the case where proposals of different targets approach each other, suppressing the false detections and missed recalls caused by excessive closeness and making the detection result more robust to the NMS algorithm.
L = L_Attr + α·L_RepGT + β·L_RepBox   (15)

L_Attr = Σ_{t_i ∈ P+} smooth_L1(B^{t_i} - G_Attr^{t_i}) / |P+|   (16)

L_RepGT = Σ_{t_i ∈ P+} smooth_L1(IoG(B^{t_i}, G_Rep^{t_i})) / |P+|   (17)

L_RepBox = Σ_{t_iM ≠ t_iN} smooth_L1(IoU(B^{t_iM}, B^{t_iN})) / (Σ_{t_iM ≠ t_iN} 1[IoU(B^{t_iM}, B^{t_iN}) > 0] + k)   (18)

IoG(B, G) = area(B ∩ G) / area(B)   (19)
wherein t_i = {t_x, t_y, t_w, t_h} is a vector representing the 4 parameterized coordinates of the predicted candidate box; t_i* is the coordinate vector of the real box corresponding to a positive-sample anchor; P+ is the set of positive samples (those with p_i* = 1); B^{t_i} is the candidate box corresponding to anchor t_i; t_iM and t_iN denote anchors assigned to different targets; G_Attr^{t_i} and G_Rep^{t_i} are the attraction and repulsion target boxes of anchor t_i; 1[·] is the indicator function; k is a small constant that prevents the denominator from being 0; α and β are hyperparameters used to balance the weights of the different effects.
G_Attr^{t_i}: the IoU of the candidate box of each positive-sample anchor with all real boxes is computed, and the real box with the maximum IoU is taken as the target box of L_Attr.

G_Rep^{t_i}: the real box with the second-largest IoU is taken as the target box of L_RepGT.

L_Attr applies the smooth_L1 distance to measure the closeness between the candidate box corresponding to an anchor and its target box; it requires the candidate box to approach the target box, and all anchors of each image must be considered.

L_RepGT applies smooth_L1 to the ratio of the intersection of the candidate box corresponding to an anchor and its repulsion target box to the area of the candidate box.

L_RepBox handles missed recalls caused by candidate boxes corresponding to different targets being too close: candidate boxes of different targets are separated, and the smaller the IoU between candidate boxes of different targets, the better.
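The following sketch shows how the three terms of formula (15) could be assembled, following the repulsion-loss construction described above (and in the cited Wang et al. paper); the box format, the α, β, k values and the target-assignment details are assumptions for illustration:

```python
import numpy as np

def smooth_l1(x):
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / max(union, 1e-9)

def iog(b, g):
    """Formula (19): intersection over the area of the candidate box b."""
    ix = max(0.0, min(b[2], g[2]) - max(b[0], g[0]))
    iy = max(0.0, min(b[3], g[3]) - max(b[1], g[1]))
    return ix * iy / max((b[2] - b[0]) * (b[3] - b[1]), 1e-9)

def repulsion_loss(boxes, gts, alpha=0.5, beta=0.5, k=1e-7):
    """Formula (15): L = L_Attr + alpha*L_RepGT + beta*L_RepBox.
    boxes: candidate boxes of the positive anchors, [x1, y1, x2, y2];
    gts: real (ground-truth) boxes."""
    ious = np.array([[iou(b, g) for g in gts] for b in boxes])
    attr = ious.argmax(axis=1)            # G_Attr: real box with maximum IoU
    # (16) attraction: pull each candidate box toward its own target box
    l_attr = float(np.mean([smooth_l1(np.asarray(b) - np.asarray(gts[a])).sum()
                            for b, a in zip(boxes, attr)]))
    # (17) repulsion from G_Rep, the real box with the second-largest IoU
    l_repgt = 0.0
    if len(gts) > 1:
        for i, b in enumerate(boxes):
            rep = int(np.argsort(ious[i])[-2])
            l_repgt += float(smooth_l1(iog(b, gts[rep])))
        l_repgt /= len(boxes)
    # (18) repulsion between candidate boxes assigned to different targets
    num, den = 0.0, k                     # k keeps the denominator nonzero
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if attr[i] != attr[j]:
                v = iou(boxes[i], boxes[j])
                num += float(smooth_l1(v))
                den += float(v > 0)
    return l_attr + alpha * l_repgt + beta * num / den
```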
This is applied to the loss function in Faster R-CNN; by optimizing the model's localization of targets, it provides a better processing result for the tree occlusion problem.
The method is evaluated through experiments, with mAP (mean Average Precision), the average of the per-class Average Precision (AP), selected as the performance measure. First the AP value of one tree species class is calculated: an IoU threshold is set, all ground truths (GT) and detections (DT) are divided by class, and one performance value (the AP) is computed over all GT and DT of the same class; the performances of the four tree species studied in this application are then averaged to give the performance (mAP) under that IoU threshold.
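A compact sketch of this evaluation, computing the AP of one class at a fixed IoU threshold and mAP as the mean over classes; the greedy score-ordered matching and the area-under-the-PR-curve integration are common conventions assumed here, not details fixed by the application:

```python
import numpy as np

def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / max(union, 1e-9)

def average_precision(dts, gts, iou_thr=0.5):
    """AP of one class. dts: list of (image_id, score, box);
    gts: dict image_id -> list of real boxes of this class."""
    dts = sorted(dts, key=lambda d: -d[1])           # by descending score
    used = {img: [False] * len(b) for img, b in gts.items()}
    n_gt = sum(len(b) for b in gts.values())
    tp = np.zeros(len(dts))
    fp = np.zeros(len(dts))
    for i, (img, _, box) in enumerate(dts):
        cand = gts.get(img, [])
        ious = [iou(box, g) for g in cand]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thr and not used[img][j]:
            tp[i] = 1                                # matched a new GT
            used[img][j] = True
        else:
            fp[i] = 1                                # duplicate or low overlap
    rec = np.cumsum(tp) / max(n_gt, 1)
    prec = np.cumsum(tp) / np.arange(1, len(dts) + 1)
    ap, last_r = 0.0, 0.0
    for r, p in zip(rec, prec):                      # area under the PR curve
        ap += (r - last_r) * p
        last_r = r
    return ap

# mAP under the chosen IoU threshold: the mean AP of the four tree species
# m_ap = np.mean([average_precision(d, g) for d, g in per_class_data])
```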
The experiments, as shown in fig. 4, give the detection accuracy of the four tree species classes and the corresponding averages: the first column is the detection accuracy without processing, and the second, third and fourth columns are the comparison data after illumination processing, after occlusion processing, and after both, respectively.
It can be found that after illumination processing the mAP of the four tree species improves from 87.25% to 90.25%. For the tree occlusion problem, the interference of a neighboring ground truth with the candidate box is mainly controlled, placing a certain constraint on the candidate box and making the detection more robust; with occlusion processing the mAP of the four tree species improves from 87.25% to 91.5%. In addition, owing to the morphological differences between species, the overall change in detection accuracy is relatively small for sweet osmanthus and relatively large for ginkgo, maple and soapberry. The AP values of all four tree species improve after illumination processing and occlusion processing together, and the precision of tree detection rises to 93.25%.
The illumination processing of this application searches for the regions of weakest illumination intensity in the input image: the minimum of the RGB components of each pixel is taken, stored in a gray-scale image and minimum-filtered, and the point with the maximum gray value is found; then the channel image whose median value is largest among the RGB channels is taken, its maximum gray value point is found, and the gray values of the two points are averaged. Considering the existence of "weak" illumination in real life, a certain amount of light is retained according to the weather conditions when an image undergoes illumination processing. By processing the dark or bright parts of the image, local shadows and illumination changes are effectively reduced and the image quality is improved to a certain extent; for images in which the target object and the background are fused, the framing of tree edges is obviously improved, and the mAP of the four tree species improves to different extents after illumination processing.
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (5)

1. A tree detection method based on a whole tree image is characterized by comprising the following steps:
collecting the whole image of the tree, establishing a data set, and performing data enhancement processing on the data set to ensure that the number of samples of each type of tree is consistent;
performing illumination processing on the tree image in the data set after the data enhancement processing;
and performing feature extraction, candidate area generation and classification on the tree image subjected to illumination processing by adopting a Faster R-CNN algorithm, and outputting a detection result.
2. The tree detection method based on the whole tree image according to claim 1, wherein the tree image in the data set after the data enhancement processing is subjected to illumination processing, wherein an equation for performing illumination processing on the tree image is as follows:
J(x) = (I(x) - A)/max(t(x), t0) + A

wherein I(x) is the tree image to be processed, J(x) is the image after illumination processing, A is the image brightness, t(x) is the transmittance, and t0 is a preset threshold.
3. The tree detection method based on the whole tree image according to claim 2, wherein the image brightness a is obtained by:
solving the minimum value in three channels of RGB of the input tree image, namely solving a dark primary color channel image, carrying out mean value filtering on the dark primary color channel image, and then solving the point with the maximum gray value;
then, finding the channel image whose median value is largest among the three RGB channels of the input image, and then finding its point with the maximum gray value;
the average of the gray values of the two points having the largest gray value is taken as the image luminance a.
4. The tree detection method based on the whole tree image according to claim 2, wherein the transmittance t (x) is obtained according to the following formula:
t(x) = 1 - ω·min_{y∈Ω(x)} ( min_C ( I^C(y)/A^C ) )

where Ω(x) represents a window centered on pixel x, superscript C represents the three R/G/B color channels, and ω is a factor between [0,1].
5. The tree detection method based on the whole tree image according to claim 1, wherein the loss function of the Faster R-CNN algorithm is:
L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·L_reg(t_i, t_i*) + α·L_RepGT + β·L_RepBox

wherein N_cls is the total number of target classification anchors; N_reg is the total number of frame regression anchors; p_i is the probability that an anchor is predicted to be a target, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when the anchor is a negative sample; t_i represents the 4 parameterized coordinates of the predicted candidate box; t_i* is the coordinate vector of the real box corresponding to the positive sample; L_cls is the logarithmic loss of target versus non-target, and smooth_L1 is a robust loss function;

wherein:

L_cls(p_i, p_i*) = -log[p_i*·p_i + (1 - p_i*)·(1 - p_i)]

L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*)

smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise

L_RepGT = Σ_{t_i ∈ P+} smooth_L1(IoG(B^{t_i}, G^{t_i})) / |P+|

L_RepBox = Σ_{t_iM ≠ t_iN} smooth_L1(IoU(B^{t_iM}, B^{t_iN})) / (Σ_{t_iM ≠ t_iN} 1[IoU(B^{t_iM}, B^{t_iN}) > 0] + k)

IoG(B, G) = area(B ∩ G) / area(B)

P+ is the set of positive samples (those with p_i* = 1), B^{t_i} is the candidate box corresponding to anchor t_i, t_iM and t_iN denote anchors assigned to different targets, G^{t_i} is the repulsion target box of anchor t_i, 1[·] is the indicator function, k is a small constant that prevents the denominator from being 0, and α and β are hyperparameters used to balance the weights of the different effects.
CN201911065758.2A 2019-11-04 2019-11-04 Tree detection method based on whole tree image Pending CN110929722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911065758.2A CN110929722A (en) 2019-11-04 2019-11-04 Tree detection method based on whole tree image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911065758.2A CN110929722A (en) 2019-11-04 2019-11-04 Tree detection method based on whole tree image

Publications (1)

Publication Number Publication Date
CN110929722A true CN110929722A (en) 2020-03-27

Family

ID=69850206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911065758.2A Pending CN110929722A (en) 2019-11-04 2019-11-04 Tree detection method based on whole tree image

Country Status (1)

Country Link
CN (1) CN110929722A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049888A (en) * 2012-12-07 2013-04-17 西安电子科技大学 Image/video demisting method based on combination of dark primary color of atmospheric scattered light
US8655070B1 (en) * 2009-11-04 2014-02-18 Google Inc. Tree detection form aerial imagery
CN106530246A (en) * 2016-10-28 2017-03-22 大连理工大学 Image dehazing method and system based on dark channel and non-local prior
US20170132771A1 (en) * 2014-06-13 2017-05-11 Board Of Regents Of The University Of Texas System Systems and methods for automated hierarchical image representation and haze removal
JP2017138647A (en) * 2016-02-01 2017-08-10 三菱電機株式会社 Image processing device, image processing method, video photographing apparatus, video recording reproduction apparatus, program and recording medium
CN107451966A (en) * 2017-07-25 2017-12-08 四川大学 A kind of real-time video defogging method realized using gray-scale map guiding filtering
CN108154492A (en) * 2017-12-25 2018-06-12 北京航空航天大学 A kind of image based on non-local mean filtering goes haze method
CN108765336A (en) * 2018-05-25 2018-11-06 长安大学 Image defogging method based on dark bright primary colors priori with auto-adaptive parameter optimization
CN109508650A (en) * 2018-10-23 2019-03-22 浙江农林大学 A kind of wood recognition method based on transfer learning
US20190287219A1 (en) * 2018-03-15 2019-09-19 National Chiao Tung University Video dehazing device and method
CN110378909A (en) * 2019-06-24 2019-10-25 南京林业大学 Single wooden dividing method towards laser point cloud based on Faster R-CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xinlong Wang et al., "Repulsion Loss: Detecting Pedestrians in a Crowd", arXiv *

Similar Documents

Publication Publication Date Title
US11455794B2 (en) System and method for orchard recognition on geographic area
Hu et al. UAV remote sensing monitoring of pine forest diseases based on improved Mask R-CNN
JP4887130B2 (en) Farm district drawing data creation system
CN109886267A (en) A kind of soft image conspicuousness detection method based on optimal feature selection
Nguyen et al. Spreading algorithm for efficient vegetation detection in cluttered outdoor environments
CN110414438A (en) High spectrum image recognition methods based on space clustering Information revision
CN115861686A (en) Litchi key growth period identification and detection method and system based on edge deep learning
Zhao et al. An entropy and MRF model-based CNN for large-scale landsat image classification
Yang et al. Improving vegetation segmentation with shadow effects based on double input networks using polarization images
Yuan et al. Sensitivity examination of YOLOv4 regarding test image distortion and training dataset attribute for apple flower bud classification
Li et al. A novel approach for the 3D localization of branch picking points based on deep learning applied to longan harvesting UAVs
Ji et al. Exploring the solutions via Retinex enhancements for fruit recognition impacts of outdoor sunlight: a case study of navel oranges
CN111612797B (en) Rice image information processing system
CN110929722A (en) Tree detection method based on whole tree image
CN112967286B (en) Method and device for detecting newly added construction land
Cui et al. Optimal spatial resolution of remote-sensing imagery for monitoring cantaloupe greenhouses
Nanare et al. Remote sensing satellite image analysis for deforestation in yavatmal district, maharashtra, india
CN115063437A (en) Mangrove canopy visible light image index characteristic analysis method and system
Zhang et al. An improved target detection method based on YOLOv5 in natural orchard environments
CN114494324A (en) Target extraction method of flower and fruit images based on significance
Chen et al. Comparative analysis of fuzzy approaches to remote sensing image classification
Bruce Object oriented classification: case studies using different image types with different spatial resolutions
Prystavka et al. Devising Information Technology for Determining the Redundant Information Content of a Digital Image
Huang et al. Extraction of agricultural plastic film mulching in karst fragmented arable lands based on unmanned aerial vehicle visible light remote sensing
Chen et al. Spectral‐Spatial Hyperspectral Image Semisupervised Classification by Fusing Maximum Noise Fraction and Adaptive Random Multigraphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200327